chimse242 Proxy Scraper in Python 3

kuhis · Feb 16, 2017

Hi there, I'm a new programmer and a new member here.
I'd like to share a scraper that scrapes a thread where free SOCKS4/5 proxies are posted. The thread is updated daily.
I hope you like it. Here is the code.

Code:

#!/usr/bin/python3

# thread link: https://www.blackhatworld.com/seo/free-socks4-5-update-daily.887767/

class chimse242():
    def __init__(self, debug=False):
        self.debug = debug
        self.s = requests.Session()
        self.baseurl = "https://www.blackhatworld.com/"


    def proxy_scrape(self, html_data, start_id=""):
        soup = BeautifulSoup(html_data, 'html.parser')
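        msg_list = soup.find('ol', class_="messageList")
        # find() returns None when the list is missing; find_all() never returns None
        if msg_list is None:
            return "Failed to scrape the data"
        msgs = msg_list.find_all("li", class_="message")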
       
        len_msgs = len(msgs)
        if self.debug:
            print("[+] Found "+str(len_msgs)+" messages")

        for msg_num in range(len_msgs-1,-1,-1):
            msg = msgs[msg_num]
            if msg['data-author'].strip() != "chimse242":
                continue
            id = msg['id']
            msg_time = msg.div.find("div", class_="privateControls").a.find('abbr').string
           
            prmlink = msg.div.find("div", class_="publicControls").a.string
            content = msg.find("div", class_="messageInfo primaryContent").find('div', class_="messageContent")
            content = content.find('article').find('blockquote').find('div', class_="bbCodeBlock")
            content = content.find('pre').string
            return (id, msg_time, content)

           
    def page_nav_scrape(self, html_data):
        soup = BeautifulSoup(html_data, 'html.parser')
        pages = soup.find("div", class_="PageNav")
        span = pages.find('span', class_="pageNavHeader")
        data = span.string.strip()
        if self.debug:
            print("[+] Found "+ data.split(" ")[-1] +" pages")
        next_url = pages.find('nav').find_all('a')[-2]['href'].strip()
        return next_url

    def http_reg(self, url):
        if self.debug:
            print("[+] HTTP Req on: "+ url)
        req = self.s.get(url)

        if req.status_code != 200:
            if self.debug:
                print("Fail on http req")
            # status_code is an int, so convert it before concatenating
            print("[ERR] Response code: " + str(req.status_code))
            return

        return req.text

def main():
    scraper = chimse242()
    url_1 = "https://www.blackhatworld.com/seo/free-socks4-5-update-daily.887767/"
   
    data_1 = scraper.http_reg(url_1)
   
    url_2 = scraper.baseurl
    url_2 += scraper.page_nav_scrape(data_1)
   
    data_2 = scraper.http_reg(url_2)
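    result = scraper.proxy_scrape(data_2)

    # proxy_scrape returns an (id, time, content) tuple only on success
    if not isinstance(result, tuple):
        print("[ERR] No proxy post found")
        return

    post_id, msg_time, content = result
    print("ID  : " + post_id.strip())
    print("Time: " + msg_time.strip())
    print(content.strip())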


if __name__ == '__main__':
    main()

Note: you need the requests and BeautifulSoup4 modules.
Disclaimer: this is free and made for educational purposes only. The author doesn't intend any harm. Enjoy.
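
The script prints the scraped post as one block of text. A minimal sketch of splitting that text into (host, port) pairs follows. It assumes the post lists one host:port entry per line, which this thread does not confirm, and parse_proxies and PROXY_RE are hypothetical names that are not part of the script above.

```python
import re

# Assumed format: one "host:port" entry per line (not confirmed by the thread).
PROXY_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$")


def parse_proxies(content):
    """Return a list of (host, port) tuples found in the scraped text."""
    proxies = []
    # Treat a missing result (None) as empty text.
    for line in (content or "").splitlines():
        match = PROXY_RE.match(line.strip())
        if match:
            proxies.append((match.group(1), int(match.group(2))))
    return proxies


# Made-up sample data, only for illustration.
sample = "1.2.3.4:1080\n\nnot a proxy\n 5.6.7.8:4145 \n"
print(parse_proxies(sample))
```

Lines that do not match the assumed pattern are skipped, so blank lines or stray text in the post should not break the parsing.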

patrick007 · Feb 16, 2017

Nice python scraper, but the source code is missing these lines at the beginning of the file

Code:

from bs4 import BeautifulSoup
import requests

gman777 · Feb 16, 2017

@patrick007 Lol, your profile pic goes so well with your post.

kuhis · Feb 17, 2017

patrick007 said:
Nice python scraper, but the source code is missing these lines at the beginning of the file

Code:

from bs4 import BeautifulSoup
import requests

@patrick007: ah, that's right.
Thanks for the correction.

Turkhero · Mar 2, 2017

kuhis · Mar 2, 2017

Thank you
Make sure you add the 2 import lines that @patrick007 pointed out.

Tier.Net · Mar 2, 2017

That's a nice script, well done.

kuhis · Mar 2, 2017

Thank you @Tier.Net
