Expired Domain Scraper And PA/DA Checker [Code]

yellowcat

Regular Member
Joined
Aug 27, 2015
Messages
364
Reaction score
249
Expireddomains scraper
Code:
#all cats are yellow
#Python 2.7
import requests, time
from bs4 import BeautifulSoup

keyword = "cats"
main_url = "https://www.expireddomains.net"
url = "%s/domain-name-search/?o=bl&r=d&ftlds[]=2&q=%s" % (main_url, keyword)
FILENAME = 'data.txt'
listy = []

def save(domains):
    # Append everything collected since the last save to data.txt
    with open(FILENAME, "a") as fout:
        for x in domains:
            fout.write(x + "\n")

while True:
    try:
        r = requests.get(url)
        html = BeautifulSoup(r.text, "lxml")
        # Link to the next results page; this raises when there is no next page
        url = main_url + html.find("div", class_="right").find("a")["href"]
        print url

        # Each result row carries the domain in the link's title attribute
        for x in html.find_all(class_="field_domain"):
            listy.append(x.find("a")["title"])
            print x.find("a")["title"]

        print "Total Urls Found ", len(listy)
    except:
        if "You hit the rate limiter. Slow down!" in html.text:
            # Rate limited: save the batch, clear it so nothing is written twice,
            # then sleep for however long the page says
            save(listy)
            print "Total Urls Found ", len(listy)
            print "Sleeping..."
            del listy[:]
            time.sleep(int(html.text.split('You need to wait').pop().split('seconds')[0].strip()))
        else:
            # No next page (or an unexpected error): save what's left and stop
            save(listy)
            print "Breaking"
            break

checkmoz PA/DA checker
It's a decent website, but it skimps on results sometimes
Code:
#All cats are wolley
#Python 2.7, needs PhantomJS on your PATH
from selenium import webdriver
from bs4 import BeautifulSoup
import unicodecsv
import time

# Minimum scores a domain must beat to be saved
DA = 1
PA = 1

fout = open('loot.csv', 'wb')
writer = unicodecsv.writer(fout)
# Column order matches the rows written below
header = ['Website', 'DA', 'PA', 'MR', 'BL']
writer.writerow(header)

with open('data.txt', 'r') as f:
    sites = [x.strip() for x in f.readlines()]

driver = webdriver.PhantomJS()

county = 0
tested = 0
error = 0
killswitch = False

while True:
    # Feed checkmoz batches of 100 domains at a time
    batch = []
    for x in range(0, 100):
        try:
            batch.append(sites.pop(0))
        except IndexError:
            print "nothing left!"
            killswitch = True
            break

    if not batch:
        break

    driver.get("http://www.checkmoz.com")
    driver.find_element_by_css_selector("#f_urls").send_keys("\n".join(batch))
    driver.find_element_by_css_selector("#fMain > center > input:nth-child(1)").click()
    time.sleep(10)  # give checkmoz a moment to render the results table

    html = BeautifulSoup(driver.page_source, 'lxml')
    table = html.find('table').find_all('tr')
    table.pop(0)  # drop the header row
    for tr in table:
        try:
            tds = tr.find_all('td')
            site = tds[0].text
            da = tds[1].text
            pa = tds[2].text
            rank = tds[3].text
            backlinks = tds[4].text
            row = [site, da, pa, rank, backlinks]
            print row
            tested += 1
            try:
                # Keep anything that beats either threshold
                if float(da) > DA or float(pa) > PA:
                    writer.writerow(row)
                    fout.flush()
                    county += 1
            except Exception, e:
                error += 1
                print e
        except Exception, e:
            error += 1
            print e

    print "\nCounty => ", county
    print "Tested => ", tested
    print "Errors => ", error
    if killswitch:
        break

driver.quit()
fout.close()
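
One thing to keep in mind: the scraper appends to data.txt on every run, so scraping several keywords can feed checkmoz the same domain twice. If that bothers you, a quick order-preserving dedupe you can run between the two scripts (works on 2.7 or 3):
Code:
# Dedupe data.txt in place, keeping first-seen order
seen = set()
unique = []
with open("data.txt") as f:
    for line in f:
        domain = line.strip()
        if domain and domain not in seen:
            seen.add(domain)
            unique.append(domain)

with open("data.txt", "w") as f:
    for domain in unique:
        f.write(domain + "\n")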
 

freakkz

Junior Member
Joined
Jan 15, 2012
Messages
135
Reaction score
26
Can you give us an executable app for this?
 

jasmine.davis.123

Regular Member
Joined
Feb 11, 2013
Messages
329
Reaction score
53
How can I run these scripts?
 

jasmine.davis.123

Regular Member
Joined
Feb 11, 2013
Messages
329
Reaction score
53
Installed Python, saved the script as .py and ran it, and a window flashes and disappears...??? Am I missing something?
 

mantinis

Junior Member
Joined
Nov 25, 2013
Messages
123
Reaction score
28
Website
prowpsites.com
Thank you for your share! I am not sure how to run the scripts. I already downloaded and installed Python and created both scripts. What now? Thanks in advance.
 

murdock477

Junior Member
Joined
Aug 22, 2016
Messages
136
Reaction score
72
Installed Python, saved the script as .py and ran it, and a window flashes and disappears...??? Am I missing something?
You most likely don't have BeautifulSoup installed. The window flashes because the script crashes on the missing import before you can read the error; run it from cmd and the error stays on screen. Navigate to Python's Scripts folder, open cmd inside that folder, then type pip install bs4 and you're good to go.
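The scripts also need requests, lxml, selenium and unicodecsv, so while you're at it you can check everything in one go. Run this from cmd the same way you'd run the scripts (Python 2 here, same as the OP's code):
Code:
# Prints which of the scripts' dependencies are installed and which are missing
for module in ["requests", "bs4", "lxml", "unicodecsv", "selenium"]:
    try:
        __import__(module)
        print module, "OK"
    except ImportError:
        print module, "MISSING -> pip install", module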
 

homerepa

Newbie
Joined
Apr 26, 2018
Messages
22
Reaction score
9
@yellowcat Thanks for this scraper code. I had some errors running it with Python 3.6. I changed the Expired Domain scraper by wrapping the print statements in parentheses and it worked.

For the Moz Check, the code also gave me some issues. The following lines are giving an error message:

driver.find_element_by_css_selector("#f_urls").send_keys(text)
driver.find_element_by_css_selector("#fMain > center > input:nth-child(1)").click()

The message is:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"#f_urls"}
(Session info: headless chrome=67.0.3396.99)

I changed #f_urls to textarea and that line stopped erroring, but I'm still having problems with the next line.

Not sure if this is the way to post this, so forgive me if I got it all wrong.
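
In case it helps anyone else, this is the direction I went: wait for the elements instead of grabbing them straight away. A sketch only, not a guaranteed fix; it assumes the page has a single textarea and an input-type submit button, which may not match checkmoz's current markup:
Code:
# Sketch: wait for the form instead of assuming it is already loaded.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=options)  # needs chromedriver on PATH
driver.get("http://www.checkmoz.com")

wait = WebDriverWait(driver, 10)
# Assumes exactly one <textarea> on the page
box = wait.until(EC.presence_of_element_located((By.TAG_NAME, "textarea")))
box.send_keys("\n".join(["example.com", "example.org"]))  # your batch here
# Assumes the submit control is an <input type="submit">
button = wait.until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, "input[type='submit']")))
button.click()
print(driver.page_source[:200])  # sanity check that the page responded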
 

Buzzika

Supreme Member
Joined
Jul 8, 2009
Messages
1,432
Reaction score
1,768
I changed #f_urls to textarea and that line stopped erroring, but I'm still having problems with the next line.

The code appears to be for Python 2.7 and not 3.

I will be firing up this baby on an Ubuntu VPS and see if I get anything.
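
Since half the thread is stuck on this, here is a rough Python 3 port of the scraper: same logic, just print() calls and a plain "in" check. Treat it as a sketch and compare it against the 2.7 original before trusting it:
Code:
#all cats are yellow, Python 3 edition
import time

import requests
from bs4 import BeautifulSoup

keyword = "cats"
main_url = "https://www.expireddomains.net"
url = "%s/domain-name-search/?o=bl&r=d&ftlds[]=2&q=%s" % (main_url, keyword)
FILENAME = "data.txt"
listy = []

def save(domains):
    # Append the collected domains to data.txt
    with open(FILENAME, "a") as fout:
        for x in domains:
            fout.write(x + "\n")

while True:
    try:
        r = requests.get(url)
        html = BeautifulSoup(r.text, "lxml")
        # Next-page link; raises when there is no next page
        url = main_url + html.find("div", class_="right").find("a")["href"]
        print(url)

        for x in html.find_all(class_="field_domain"):
            listy.append(x.find("a")["title"])
            print(x.find("a")["title"])

        print("Total Urls Found", len(listy))
    except Exception:
        if "You hit the rate limiter. Slow down!" in html.text:
            # Rate limited: save, clear the batch, sleep as long as the page says
            save(listy)
            print("Total Urls Found", len(listy))
            print("Sleeping...")
            del listy[:]
            time.sleep(int(html.text.split("You need to wait").pop().split("seconds")[0].strip()))
        else:
            # No next page: save what's left and stop
            save(listy)
            print("Breaking")
            break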
 