Python Encoding

Discussion in 'General Programming Chat' started by NetCrime, Jan 15, 2015.

  1. NetCrime

    NetCrime Regular Member

    Location:
    Lithuania
    I'm trying to print data scraped from a website that contains Lithuanian letters such as ĄČĘĖĮŠŲŪ.

    My code:
    Code:
    # -*- coding: utf-8 -*-
    
    import requests
    from bs4 import BeautifulSoup
    import codecs
    import sys
    
    
    def autoplus():
        url = "http://auto.plius.lt/skelbimai/krovininis-transportas/sunkvezimiai?make_date_from=1989&make_date_to=1997&make_id=4169"
        r = requests.get(url)
        plain_text = r.content
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('h2',{'class':'title-list'}):
              print(link.a.text.encode(sys.stdout.encoding, 'replace'))
              
    autoplus()
    
    
    
    Output:

    Code:
    b'Mercedes-Benz, 308, savivar?iai'
    b'Mercedes-Benz, 609, kieta\x9aoniai'
    b'Mercedes-Benz, 308, bortiniai'
    b'Mercedes-Benz, 609, \x8aaldytuvai'
    b'Mercedes-Benz, 609, \x8aaldytuvai'
    b'Mercedes-Benz, 1114, va\x9eiuokl?s'
    b'Mercedes-Benz, 3344AK, savivar?iai'
    b'Mercedes-Benz, 814, va\x9eiuokl?s'
    b'Mercedes-Benz, 308, \x8aaldytuvai'
    b'Mercedes-Benz, 609, dviguba kabina'
    b'Mercedes-Benz, 609, \x8aaldytuvai'
    
    How do I make Python show actual letters?
     
  2. xNotch

    xNotch Registered Member

    What are the letters actually for? Are they needed, or could you just delete them? If you need them, maybe try changing the default character encoding (it currently seems to be set to UTF-8).
     
  3. MrBlue

    MrBlue Senior Member

    Occupation:
    Web/Bot Developer
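    This should fix your issue. On Python 3, print() on a bytes object shows its b'...' representation, so write the UTF-8 bytes to sys.stdout.buffer instead:
    Code:
    # -*- coding: utf-8 -*-
    
    import sys
    
    import requests
    from bs4 import BeautifulSoup
    
    
    def autoplus():
        url = "http://auto.plius.lt/skelbimai/krovininis-transportas/sunkvezimiai?make_date_from=1989&make_date_to=1997&make_id=4169"
        r = requests.get(url)
        # Pass the raw bytes so BeautifulSoup can detect the page encoding itself.
        soup = BeautifulSoup(r.content, 'html.parser')
        for link in soup.find_all('h2', {'class': 'title-list'}):
            # Encode explicitly as UTF-8 and bypass the console's own encoding.
            sys.stdout.buffer.write(link.a.text.encode('utf8') + b'\n')
    
    autoplus()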
    
    Now you can just output the results directly to a text file like this:
    Code:
    $ python script.py > output.txt
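
If you would rather not rely on shell redirection, a rough alternative is to open the file from Python with an explicit encoding. This is only a sketch: `save_titles` and `sample_titles` are made-up names, and the sample strings stand in for the scraped `link.a.text` values. It assumes Python 3.

```python
import os
import tempfile


def save_titles(titles, path):
    """Write one title per line, always encoded as UTF-8."""
    # An explicit encoding means the result does not depend on the console
    # or on the platform's default locale encoding.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for title in titles:
            handle.write(title + "\n")


# Hypothetical sample data in place of the scraped listing titles.
sample_titles = [
    "Mercedes-Benz, 308, savivarčiai",
    "Mercedes-Benz, 609, šaldytuvai",
]

output_path = os.path.join(tempfile.mkdtemp(), "output.txt")
save_titles(sample_titles, output_path)

# Read the file back with the same encoding to confirm nothing was lost.
with open(output_path, encoding="utf-8") as handle:
    round_trip = handle.read().splitlines()
```

Open the resulting file in an editor that reads UTF-8 and the Lithuanian letters should appear intact.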
     
  4. NetCrime

    Actually, it was my Windows terminal that could not print the correct characters. I installed PyCharm and everything fixed itself.
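
For anyone who lands here with the same symptom, here is a small sketch of what was probably going on. The bytes in my output (\x9a, \x8a, \x9e) match cp1252, so I am assuming that was the console encoding; that is an inference from the output, not something I checked at the time. The helper name `use_utf8_stdout` is made up, and `reconfigure` is only available on Python 3.7 and later.

```python
import sys

# Words taken from the listing titles above.
titles = ["kietašoniai", "savivarčiai", "važiuoklės"]

# Assumed console encoding: cp1252. It has š and ž but not č or ė,
# so the 'replace' error handler turns those two into '?'.
encoded = [title.encode("cp1252", "replace") for title in titles]
print(encoded)


def use_utf8_stdout():
    """Switch stdout to UTF-8 where supported; return True on success."""
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        # Older Python, or stdout replaced by an object without reconfigure().
        return False
    reconfigure(encoding="utf-8", errors="replace")
    return True
```

After calling `use_utf8_stdout()`, a plain `print(link.a.text)` should work without any manual `.encode(...)`, as long as the terminal itself can display UTF-8, which is presumably what PyCharm's console was doing for me.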