
[TUT] Web scraping and how I made my first buck :D

Discussion in 'Making Money' started by WebScraping, Jun 19, 2012.

  1. WebScraping

    WebScraping Newbie

    Joined:
    Jun 19, 2012
    Messages:
    31
    Likes Received:
    14
    First of all, thanks to all of you for keeping me motivated and for sharing your stories about how you made your first dime online. Here is how I made mine: a little something I'm willing to share with the rest of the BH world :)


    Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information from websites, and it can bring in some money. I guess you all know what it is. A few days ago I posted on Fiverr that I will scrape any website for data. At first I thought this would never work, but within a few hours I had 7 gig requests to fill. That was fast, and many of them will take more than one day to finish, so I charge them 1-2 gigs a day :)

    So let's start with how easy this is. First of all, thanks to the Selenium developers for giving us such a great tool to work with! You can check out the project at seleniumhq, or if you use Ubuntu you can install it right away with:

    Code:
    pip install selenium
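    and if you don't have pip installed, you can install it with (the package is called python3-pip on current Ubuntu releases; older releases call it python-pip)

    Code:
    sudo apt-get install python3-pip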
    Now you have the option of going to seleniumhq and installing Selenium RC as a server on your machine, but I found this is only worthwhile if you have larger scraping projects. It has pretty neat built-in functionality for running on multiple machines, etc., but it's of no use to me for now :)

    For this example I will be writing the code in Python, as it is my programming language of choice, and I'll explain everything line by line.

    Code:
    from selenium import webdriver
    
    Here we import the webdriver, which we will control later.
    Code:
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select
    from selenium.common.exceptions import NoSuchElementException
    import xlwt
    from pyvirtualdisplay import Display
    
    xlwt is for making Excel sheets, and pyvirtualdisplay is for running the Firefox browser in a virtual display so you can work on other stuff while scraping.
    Code:
    class Fiverrgig():
        index = 1  # next row to write (row 0 is left free)
        wbk = xlwt.Workbook()  # create the xls document
        sheet = wbk.add_sheet('sheet 1')  # add a sheet to it

        def setUp(self):
            self.display = Display(visible=0, size=(800, 600))  # create the virtual display
            self.display.start()
            self.posjeti_urlove = []  # URLs to visit, filled in test_fiverrgig
            self.driver = webdriver.Firefox()  # tell our program that the browser we will use is Firefox
            self.driver.implicitly_wait(30)  # wait up to 30 seconds for elements to appear
            self.base_url = ""

        def test_fiverrgig(self):

            self.driver.get(self.base_url + "/uitgebreid-zoeken")
            # find_element(By.ID, ...) replaces find_element_by_id, which newer Selenium releases removed
            self.driver.find_element(By.ID, "Form_Name_Lid__prefixbtnSubmit").click()
    
    OK, this is where it gets a little tricky. How do you find the button you need to click? Well, you can:
    a) Install Firebug, right-click -> inspect element with Firebug, and once you have found your element you can copy its XPath or whatever you need. An XPath is like a route to your element on the web page. There are many more ways to search for a desired element, like find_element_by_class_name, find_element_by_id, and others (newer Selenium versions write these as find_element(By.CLASS_NAME, ...) and find_element(By.ID, ...)).
    A great way to find the function you need is with Eclipse and PyDev and its autocompletion, and if you need to do some testing before you write your bug-free code :rolleyes::rolleyes: you can do that with a program called IPython.
    b) You can use the Selenium IDE, which can be installed in Firefox. Basically you start the plugin and click through web pages, and the tool generates code for you that you can export and work with. The code can be exported as C#, Python, Ruby, or JUnit, and you can choose whether to use WebDriver control or Selenium RC.
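
    If you want to get a feel for what an XPath "route" looks like without starting a browser, here is a minimal sketch using Python's built-in xml.etree.ElementTree, which understands only a small subset of XPath. The page snippet, the hrefs and the helper names are made up for this illustration; it is not part of the scraper itself, and Selenium's own XPath support is much richer.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page fragment used only for this demo.
PAGE = """
<html>
  <body>
    <div class="kolom">
      <a href="/bedrijf/1">Meer informatie</a>
      <a href="/bedrijf/1/contact">Contact</a>
    </div>
    <form>
      <input id="btnSubmit" type="submit" />
    </form>
  </body>
</html>
""".strip()


def find_by_id(root, element_id):
    """Return the first element with the given id attribute, or None."""
    return root.find(".//*[@id='%s']" % element_id)


def find_hrefs_by_link_text(root, text):
    """Return the href of every <a> whose text is exactly `text`."""
    return [a.get("href") for a in root.iter("a")
            if (a.text or "").strip() == text]


root = ET.fromstring(PAGE)

# "Route" to every link inside the div with class "kolom".
kolom_links = root.findall(".//div[@class='kolom']/a")

button = find_by_id(root, "btnSubmit")
more_info = find_hrefs_by_link_text(root, "Meer informatie")
```

    ElementTree has no contains() function, so the link-text lookup is done in plain Python here instead of in the XPath itself.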

    So the point is: first you test it, and then you write your code. That is why I like Python for scraping.
    Code:
            linkovi = self.driver.find_elements(By.LINK_TEXT, "Meer informatie")  # get all the matching elements (links)
            for link in linkovi:
                self.posjeti_urlove.append(link.get_attribute('href'))  # keep the URLs as plain strings

            for link_ in self.posjeti_urlove:
                print(str(self.index) + "-->" + link_)
                self.driver.get(link_)
                self.driver.find_element(By.XPATH, "//a[contains(text(),'Contact')]").click()  # again we are locating a web element
                self.scrape()
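
    The loop above first saves every href as a plain string and only then starts visiting pages. That order matters: an element object belongs to the page it was found on, so it stops being usable once the browser navigates away. Here is a rough offline sketch of the same collect-first idea, using Python's built-in html.parser instead of Selenium. The sample HTML, the company names and the LinkCollector class are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> whose visible text equals wanted_text."""

    def __init__(self, wanted_text):
        super().__init__()
        self.wanted_text = wanted_text
        self.hrefs = []
        self._current_href = None  # href of the <a> we are inside, if any
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            if "".join(self._text_parts).strip() == self.wanted_text:
                self.hrefs.append(self._current_href)
            self._current_href = None


# Made-up listing page.
SAMPLE = """
<ul>
  <li>Bakkerij Jansen <a href="/bedrijf/1">Meer informatie</a></li>
  <li>Garage De Vries <a href="/bedrijf/2">Meer informatie</a></li>
  <li><a href="/contact">Contact</a></li>
</ul>
"""

collector = LinkCollector("Meer informatie")
collector.feed(SAMPLE)

# Step 1: plain strings, safe to keep around while navigating.
visit_urls = list(collector.hrefs)

# Step 2: visit each saved URL (here we only print it).
for number, url in enumerate(visit_urls, start=1):
    print(str(number) + "-->" + url)
```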
     
    OK, here is where I parse the text. First I find the element by class name, then I look at the text it contains and parse it with some simple tricks.
    Code:
        def scrape(self):
            website = self.driver.find_element(By.CLASS_NAME, 'kolom')
            url = website.find_element(By.TAG_NAME, 'a').text
            html = self.driver.find_element(By.ID, 'tab_1_2')
            lines = html.text.splitlines()
            phone = lines[1]  # the phone number is on the second line
            adresa = lines[-1].split(",")  # the last line holds the comma-separated address
            city = adresa[-1]
            PN = adresa[1]  # postal code
            self.zapisi(url, phone, city, PN)
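
    The parsing in scrape() can be tried out without a browser. This is a minimal sketch, assuming the same layout as above: the phone number on the second line, and a last line holding a comma-separated address with the postal code as the second field and the city as the last one. The sample text and the function name parse_contact_block are invented for this example. Unlike the original, it skips blank lines and falls back to "N/A" when a field is missing.

```python
def parse_contact_block(text):
    """Split a contact block into phone, city and postal code."""
    # Drop empty lines so stray blank lines do not shift the positions.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        return {"phone": "N/A", "city": "N/A", "postal_code": "N/A"}

    # Last line: "street, postal code, city" (assumed layout).
    address = [part.strip() for part in lines[-1].split(",")]
    return {
        "phone": lines[1],
        "city": address[-1] if address[-1] else "N/A",
        "postal_code": address[1] if len(address) > 1 else "N/A",
    }


# Made-up sample of what the text of the contact tab might look like.
SAMPLE = "Contact\n020-5551234\nVoorbeeldstraat 1, 1234 AB, Amsterdam"

result = parse_contact_block(SAMPLE)
print(result)
```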
      
    After I have parsed the data, I write it to an xls file.

    Code:
          
        def zapisi(self, url, telefon, grad="N/A", postanski_broj="N/A"):
            # write one row: url, city, postal code, phone
            self.sheet.write(self.index, 0, url)
            self.sheet.write(self.index, 1, grad)
            self.sheet.write(self.index, 2, postanski_broj)
            self.sheet.write(self.index, 3, telefon)
            self.index += 1

        def is_element_present(self, how, what):
            try:
                self.driver.find_element(by=how, value=what)
            except NoSuchElementException:
                return False
            return True

        def tearDown(self):
            self.driver.quit()
            self.wbk.save("fiverr.xls")  # save the workbook with an .xls extension
            self.display.stop()

    a = Fiverrgig()
    a.setUp()
    a.test_fiverrgig()
    a.tearDown()
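
    If the client is happy with plain text instead of an Excel file, the same rows could be written with Python's built-in csv module instead of xlwt. This swaps in a different technique from the one used above, and the sample row and the function name rows_to_csv are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Return CSV text: a header row, then one line per scraped record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Same column order as the xls sheet: url, city, postal code, phone.
    writer.writerow(["url", "city", "postal_code", "phone"])
    for row in rows:
        writer.writerow([
            row.get("url", "N/A"),
            row.get("city", "N/A"),
            row.get("postal_code", "N/A"),
            row.get("phone", "N/A"),
        ])
    return buffer.getvalue()


# Made-up record in the shape the scraper collects.
rows = [
    {"url": "www.example.com", "city": "Amsterdam",
     "postal_code": "1234 AB", "phone": "020-5551234"},
]

csv_text = rows_to_csv(rows)
print(csv_text)
```

    To save it to disk, write csv_text to a file such as fiverr.csv; Excel and LibreOffice can both open CSV files.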
    
    

    Final summary:

    With this set of tools and a little work you can probably scrape anything, as it is a real browser that is browsing the web. You can make your own custom creator for email, Twitter, or Pinterest accounts, and once you learn the basics, you pick up the rest quite fast.

    Well then, good luck guys and happy money making :china:
     
    Last edited: Jun 19, 2012
  2. Alext96

    Alext96 Newbie

    Joined:
    Apr 13, 2011
    Messages:
    13
    Likes Received:
    2
    This seems a little bit too complicated for me as I'm only 15 :(
    Isn't there an easier way of making money?
    Thanks anyway for this :)
     
  3. WebScraping

    WebScraping Newbie

    Joined:
    Jun 19, 2012
    Messages:
    31
    Likes Received:
    14
    Well, these tools are all out there and they are free. You can start by learning them if you have an interest in programming :)
     
  4. EXtraHand

    EXtraHand Junior Member

    Joined:
    Jan 26, 2012
    Messages:
    111
    Likes Received:
    62
    Age doesn't matter.
     
  5. theMagicNumber

    theMagicNumber Regular Member

    Joined:
    May 13, 2010
    Messages:
    345
    Likes Received:
    195
    How much do you make from a single scraping project on Fiverr?
    I charge from $100 to $750 (the average is $200), depending on the complexity and the amount of data. However, the competition on freelance websites is a killer.
    I complete ~7-10 projects per month.
     
  6. WebScraping

    WebScraping Newbie

    Joined:
    Jun 19, 2012
    Messages:
    31
    Likes Received:
    14
    It all depends. If it is simple, then 2 gigs; by simple I mean 1 hour of work. For now I have only one bigger project, for which I asked 3 gigs per day, and I think I can do it in a week working 3-4 hours a day, but it's all under $100. I think I'm too cheap :)

    Like you said, it's all about how complex it gets.
     
  7. theMagicNumber

    theMagicNumber Regular Member

    Joined:
    May 13, 2010
    Messages:
    345
    Likes Received:
    195
    I will give Fiverr a try, thanks for the input.
     
  8. bdboy8

    bdboy8 Newbie

    Joined:
    Aug 6, 2012
    Messages:
    12
    Likes Received:
    1
    Occupation:
    No Idea !
    Location:
    Bangladesh
    Home Page:
    Can I do that using ScrapeBox?

    What code/footprint should I use?
     
  9. xdrvirusx

    xdrvirusx Newbie

    Joined:
    Oct 26, 2016
    Messages:
    16
    Likes Received:
    1
    Gender:
    Male
    Hello guys, I have 2 questions.
    1- How is money made after harvesting a website? I still don't understand this.
    2- I'm an IPv6 SOCKS5 and IPv6 proxy maker/seller, and I can create around 30,000 different IPv6 IPs for SOCKS5 proxies, HTTP proxies, or both.

    I'm not sure how that will help?

    Cheers