[TUT] Web scraping and how I made my first buck :D

WebScraping
Newbie · Joined: Jun 19, 2012 · Messages: 31 · Reaction score: 14
First of all, thanks to all of you for keeping me motivated and for sharing your stories about how you made your first dime online. Here is how I made mine, a little something I'm willing to share with the rest of the BH world :)


Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information from websites, and it can bring in some money. But I guess you all know what that is. A few days ago I posted a gig on Fiverr saying I will scrape any website for data. At first I thought this would never work, but within a few hours I got 7 gig requests to fill. Well, that was fast! Many of them take more than one day to finish, so I charge 1-2 gigs per day of work :)

So let's start with how easy this is. First of all, thanks to the Selenium developers for giving us such a great tool to work with! You can check out the project at SeleniumHQ, or, if you use Ubuntu, you can install it right away with:

Code:
pip install selenium

If you don't have pip installed, you can install it with:

Code:
sudo apt-get install python3-pip

You could also go to SeleniumHQ and install Selenium RC as a server on your machine, but I found that only worthwhile for larger scraping projects. It has pretty neat built-in functionality for running on multiple machines, etc., but like I said, it's useless to me for now :)

For this example I will write the code in Python, since it is my programming language of choice, and explain everything line by line.

Code:
from selenium import webdriver
Here we import webdriver, which we will use to control the browser.
Code:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
import xlwt
from pyvirtualdisplay import Display

xlwt is for creating Excel sheets, and pyvirtualdisplay runs the Firefox browser in a virtual display, so you can work on other stuff while scraping.
Code:
class Fiverrgig():
    index = 1
    wbk = xlwt.Workbook()  # create the .xls workbook
    sheet = wbk.add_sheet('sheet 1')  # add a worksheet to it

    def setUp(self):
        self.display = Display(visible=0, size=(800, 600))  # create a hidden 800x600 virtual display
        self.display.start()
        self.visit_urls = []  # detail-page URLs collected from the search results
        self.driver = webdriver.Firefox()  # the browser we will be using is Firefox
        self.driver.implicitly_wait(30)  # keep retrying element lookups for up to 30 seconds
        self.base_url = ""  # base URL of the target site (left blank in the original post)

    def test_fiverrgig(self):
        self.driver.get(self.base_url + "/uitgebreid-zoeken")
        self.driver.find_element(By.ID, "Form_Name_Lid__prefixbtnSubmit").click()
OK, this is where it gets a little tricky. How do you find the button you need to click? You can:
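a) Install Firebug (or use Firefox's built-in Inspect Element), right-click the element and choose Inspect Element with Firebug. Once you have found your element, you can copy its XPath or whatever else you need. An XPath is like a route to your element on the web page. There are many other ways to locate an element, such as driver.find_element_by_class_name, driver.find_element_by_id and others (Selenium 4 replaced these with driver.find_element(By.CLASS_NAME, ...), driver.find_element(By.ID, ...) and so on).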
A great way to discover the function you need is Eclipse with PyDev and its autocompletion, and if you want to try things out before writing your bug-free code :rolleyes: you can do that interactively in IPython.
b) Use Selenium IDE, which can be installed as a Firefox add-on. Basically, you start the plugin and click through the web pages, and the tool generates the code for you, which you can export and work with. It can export C#, Python, Ruby and JUnit, and you can choose whether you want WebDriver or Selenium RC code.
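To get a feel for what an XPath "route" looks like without starting a browser, here is a small sketch using Python's standard xml.etree.ElementTree on a made-up, well-formed HTML fragment. Note that ElementTree supports only a limited subset of XPath (no contains(), for example); in the real script Selenium evaluates full XPath against the live page. The markup, id and class names here are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (real-world HTML is often not valid XML).
PAGE = """
<html>
  <body>
    <div class="kolom">
      <a href="http://example.com/shop">example.com/shop</a>
    </div>
    <form>
      <input id="btnSubmit" type="submit" value="Search"/>
    </form>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# Route to an element by attribute value, similar to //input[@id='btnSubmit'].
button = root.find(".//input[@id='btnSubmit']")

# Route through a parent with a given class to its link, similar to //div[@class='kolom']/a.
link = root.find(".//div[@class='kolom']/a")

print(button.get("value"))  # Search
print(link.get("href"), link.text)  # http://example.com/shop example.com/shop
```

Copying the XPath from the inspector gives you the same kind of route, just written against the real page.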

So the point is: first you test it interactively, then you write your code. That is why I like Python for scraping.
Code:
        linkovi = self.driver.find_elements(By.LINK_TEXT, "Meer informatie")  # get all the "Meer informatie" links
        for link in linkovi:
            self.visit_urls.append(link.get_attribute('href'))

        for link_ in self.visit_urls:
            print(str(self.index) + "-->" + link_)
            self.driver.get(link_)
            self.driver.find_element(By.XPATH, "//a[contains(text(),'Contact')]").click()  # locate and click the Contact link
            self.scrape()
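The loop above first gathers every href and only then visits them one by one, which also avoids reusing page elements after the browser has navigated away. Here is a minimal sketch of the collecting step without a browser, using the standard library's html.parser and urljoin, assuming static HTML. The class name, base URL and markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextCollector(HTMLParser):
    """Collect absolute hrefs of <a> tags whose visible text equals link_text."""

    def __init__(self, base_url, link_text):
        super().__init__()
        self.base_url = base_url
        self.link_text = link_text
        self.urls = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip() == self.link_text:
                # resolve relative links against the page URL
                self.urls.append(urljoin(self.base_url, self._href))
            self._href = None


page = """
<ul>
  <li><a href="/lid/1">Meer informatie</a></li>
  <li><a href="/lid/2">Meer informatie</a></li>
  <li><a href="/contact">Contact</a></li>
</ul>
"""

collector = LinkTextCollector("http://example.com/uitgebreid-zoeken", "Meer informatie")
collector.feed(page)
for index, url in enumerate(collector.urls, start=1):
    print(str(index) + "-->" + url)
```

This only works when the links are in the HTML the server sends; pages that build their content with JavaScript are exactly where a real browser like Selenium's Firefox earns its keep.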
OK, here is where I parse the text. First I find the element by class name, then I take its text and parse it with some simple tricks.
Code:
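    def scrape(self):
        website = self.driver.find_element(By.CLASS_NAME, 'kolom')
        url = website.find_element(By.TAG_NAME, 'a').text  # text of the first link in that column
        html = self.driver.find_element(By.ID, 'tab_1_2')  # block holding the contact details
        lines = html.text.splitlines()
        phone = lines[1]  # second line holds the phone number
        address = lines[-1].split(",")  # last line is a comma-separated address
        city = address[-1]
        PN = address[1]  # postal code
        self.zapisi(url, phone, city, PN)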
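Because the parsing only works on a string, it can be pulled out of scrape() and tested without a browser. A minimal sketch, assuming the same layout the tricks above rely on: the phone number on the second line and a "street, postal code, city" address on the last line. Unlike the original, it strips surrounding spaces and falls back to "N/A" when a part is missing; the function name and sample text are hypothetical.

```python
def parse_contact_text(text):
    """Return (phone, city, postal_code) from a contact block's text.

    Assumes the layout scrape() relies on: the phone number is on the second
    line and the last line is "street, postal code, city".
    """
    lines = text.splitlines()
    phone = lines[1].strip() if len(lines) > 1 else "N/A"
    address = lines[-1].split(",") if lines else []
    city = address[-1].strip() if address else "N/A"
    postal_code = address[1].strip() if len(address) > 1 else "N/A"
    return phone, city, postal_code


sample = "Contact\n020-1234567\nDamrak 1, 1012 LG, Amsterdam"  # made-up example
print(parse_contact_text(sample))  # ('020-1234567', 'Amsterdam', '1012 LG')
```

Inside scrape() you would then call parse_contact_text(html.text) and pass the result on to zapisi().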
After parsing the data, I write it to an .xls file.

Code:
    def zapisi(self, url, telefon, grad="N/A", postanski_broj="N/A"):
        # zapisi = "write": one row of url, city (grad), postal code (postanski_broj), phone (telefon)
        self.sheet.write(self.index, 0, url)
        self.sheet.write(self.index, 1, grad)
        self.sheet.write(self.index, 2, postanski_broj)
        self.sheet.write(self.index, 3, telefon)
        self.index += 1

    def is_element_present(self, how, what):
        try:
            self.driver.find_element(by=how, value=what)
        except NoSuchElementException:
            return False
        return True

    def tearDown(self):
        self.driver.quit()
        self.wbk.save("fiverr.xls")
        self.display.stop()

a = Fiverrgig()
a.setUp()
try:
    a.test_fiverrgig()
finally:
    a.tearDown()  # always close the browser and save the sheet, even after an error
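xlwt only writes the old .xls format. If your client is fine with a CSV file (Excel opens those too), the standard library's csv module can write the same four columns with no extra dependency. This is a swapped-in alternative, not what the script above uses; the function name, file name and sample row are hypothetical.

```python
import csv


def write_rows(path, rows):
    """Write a header plus (url, city, postal_code, phone) rows to a CSV file."""
    # newline="" is what the csv docs recommend, so row endings are not doubled on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "city", "postal_code", "phone"])
        writer.writerows(rows)


rows = [("example.com/shop", "Amsterdam", "1012 LG", "020-1234567")]  # made-up data
write_rows("fiverr.csv", rows)
```

Collecting the rows in a list and writing them once at the end keeps the file consistent; with xlwt the workbook is likewise only saved in tearDown().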


Final summary:

With this set of tools and a little work you can probably scrape anything, since it is a real browser that is browsing the web. You can also build your own email account creator, Twitter or Pinterest account creator, and so on. Once you learn the basics, the rest comes quickly.

Well then, good luck guys, and happy money making :china:
 
This seems a little bit too complicated for me, as I'm only 15 :(
Isn't there an easier way of making money?
Thanks anyway for this :)
 
Well, these tools are all out there and they are free. You can start by learning them if you are interested in programming :)
 
How much do you make from a single scraping project on Fiverr?
I charge from $100 to $750 (the average is $200), depending on complexity and the amount of data. However, the competition on freelance websites is a killer.
I complete ~7-10 projects per month.
 
It all depends. If it is simple, then 2 gigs, and by simple I mean about 1 hour of work. For now I have only one bigger project; I asked for 3 gigs per day, and I think I can finish it in a week working 3-4 hours a day, but it's all under $100. I think I'm too cheap :)

Like you said, it's all about how complex it gets.
 
Can I do that using ScrapeBox?

What code/footprint should I use?
 