Want to make a tool that scrapes the first 10 results from Google

Discussion in 'General Programming Chat' started by forvenz, Jan 6, 2014.

  1. forvenz

    forvenz Regular Member

    Joined:
    Jan 25, 2013
    Messages:
    454
    Likes Received:
    56
    Hi,
    I'd prefer to do this in JavaScript or Java, but if another language is better suited to the job, that's no problem.
    I want to make a tool that takes an input (X keywords) and returns the first 10 Google URLs for each of those keywords.
    How can I do that? Do I need to use the Google API?
     
  2. virtualc08

    virtualc08 Supreme Member

    Joined:
    Mar 23, 2010
    Messages:
    1,380
    Likes Received:
    951
    Why don't you just buy Scrapebox and use that?
     
  3. DiamonMike

    DiamonMike Regular Member

    Joined:
    Aug 22, 2013
    Messages:
    217
    Likes Received:
    63
    Or try Market Samurai....it does exactly what you want
     
  4. divok

    divok Senior Member

    Joined:
    Jul 21, 2010
    Messages:
    1,015
    Likes Received:
    634
    Location:
    http://twitter.com/divok
    Install Scraper for Google Chrome if you only plan to use it personally.
    Python has better libraries for scraping, and you can easily convert a script to a Windows program or run it on Linux.
    Also try Zenno and UBot.
     
  5. Shirko

    Shirko Junior Member

    Joined:
    Aug 11, 2012
    Messages:
    193
    Likes Received:
    172
    Location:
    adding monkeys to my papal
    Any language is good for doing this task... this is way too simple.

    You just need to know programming or hire someone to do it for you.
     
  6. forvenz

    forvenz Regular Member

    Joined:
    Jan 25, 2013
    Messages:
    454
    Likes Received:
    56
    I know a little programming, but I need some direction on how to do it.
    And Scrapebox won't scrape the first 10 results.
     
  7. forvenz

    forvenz Regular Member

    Joined:
    Jan 25, 2013
    Messages:
    454
    Likes Received:
    56
    Anyone?
     
  8. mypmmail

    mypmmail Junior Member

    Joined:
    Jan 31, 2008
    Messages:
    111
    Likes Received:
    27
    What you can do is look at the HTML source of the Google results page.

    e.g.
    when viewed in Firefox, the results markup looks like this:
    Code:
    <div id="ires" data-async-context="query:your_query_string">
    <ol id="rso" eid="lkdjhaaifubiuad">
    <li class="g">
    <div data-hveid="44" class="rc"><span class="altcts"></span><h3 class="r"><a href="......">1st result title</a></h3></div>
    </li>
    
    <!-- ...... the <li> will repeat for each result -->
    
    </ol>
    </div>
    
    With this known format, you can sift through the source by looking for the string data-async-context and extract the results as you wish from there.
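
A minimal sketch of that extraction in Python, using only the standard library. It assumes the markup has the shape shown above (each result link sits inside an `<h3 class="r">`); the names `ResultLinkParser` and `extract_result_links` and the `example.com` URLs are made up for illustration, and the real page markup may differ or change.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect the href of each <a> found inside an <h3 class="r">."""

    def __init__(self, limit=10):
        super().__init__()
        self.limit = limit
        self.links = []
        self._in_result_heading = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and "r" in (attrs.get("class") or "").split():
            self._in_result_heading = True
        elif tag == "a" and self._in_result_heading:
            href = attrs.get("href")
            if href and len(self.links) < self.limit:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_result_heading = False


def extract_result_links(html, limit=10):
    """Return up to `limit` result URLs, in page order."""
    parser = ResultLinkParser(limit)
    parser.feed(html)
    return parser.links


# Hypothetical sample in the format described above.
sample = """
<div id="ires" data-async-context="query:your_query_string">
<ol id="rso">
<li class="g"><div class="rc"><h3 class="r"><a href="http://example.com/1">1st result title</a></h3></div></li>
<li class="g"><div class="rc"><h3 class="r"><a href="http://example.com/2">2nd result title</a></h3></div></li>
</ol>
</div>
"""
print(extract_result_links(sample))
```

A real HTML parser is used here rather than plain string searching because it copes better with attribute order and whitespace changes.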


    hth
     
  9. forvenz

    forvenz Regular Member

    Joined:
    Jan 25, 2013
    Messages:
    454
    Likes Received:
    56
    Yes, but how can I access the data?
    What function gives me access to the contents of a web page?
     
  10. Asif WILSON Khan

    Asif WILSON Khan Executive VIP Premium Member

    Joined:
    Nov 10, 2012
    Messages:
    10,139
    Likes Received:
    28,607
    Gender:
    Male
    Occupation:
    Fun Lovin' Criminal
    Location:
    London
    Home Page:
  11. garthor

    garthor Newbie

    Joined:
    Mar 24, 2013
    Messages:
    48
    Likes Received:
    13
    This task can be done in many different languages. If I were you, I would use Visual Basic.
    You can take advantage of the WebBrowser component: simply make it navigate to the page you'd like,
    and then it's all about parsing the HTML document in the WebBrowser.
     
  12. mypmmail

    mypmmail Junior Member

    Joined:
    Jan 31, 2008
    Messages:
    111
    Likes Received:
    27
    If you are using PHP, then you can use cURL.

    If you are using Java, then use URLConnection. Or, if you are using a library, use HttpClient.

    If you are using JavaScript, then you need to make an Ajax call using XMLHttpRequest.
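
For comparison, a rough Python equivalent of the same "fetch the page" step, using only the standard library. This is a sketch under assumptions: `SEARCH_URL`, `USER_AGENT`, `build_search_request` and `fetch_html` are illustrative names and values, not part of any official API, and the search engine may block or challenge automated requests.

```python
import urllib.parse
import urllib.request

# Assumed values for illustration only.
SEARCH_URL = "https://www.google.com/search"
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0"


def build_search_request(keyword):
    """Build a GET request for one keyword, URL-encoding the query string."""
    query = urllib.parse.urlencode({"q": keyword})
    return urllib.request.Request(
        SEARCH_URL + "?" + query,
        headers={"User-Agent": USER_AGENT},
    )


def fetch_html(request, timeout=5):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Building the request does not touch the network; fetch_html(request) does.
request = build_search_request("bla bla")
print(request.full_url)
```

Calling `fetch_html(request)` returns the page source as a string, which can then be parsed for the result links.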

    hth
     
    Last edited: Jan 8, 2014
  13. tratata

    tratata Newbie

    Joined:
    Jul 26, 2013
    Messages:
    14
    Likes Received:
    5
    Better to use specialized software, because once the simple script is written you would still need proxy support, captcha solving, etc.
     
  14. termseo

    termseo Junior Member

    Joined:
    Nov 4, 2010
    Messages:
    103
    Likes Received:
    160
    Occupation:
    Software engineer
    It can be done in most languages. Do you want to code it from scratch, or are you looking for existing software that does the task? Most tools already do it.
     
  15. bytzu

    bytzu Registered Member

    Joined:
    Jun 30, 2011
    Messages:
    96
    Likes Received:
    138
    You can harvest the first 10 results from Google with Scrapebox. You can see it here:

    http://www.youtube.com/watch?v=LyCLfL_ffqQ

    Go to 2:46.

    Good luck
     
  16. justone

    justone Elite Member

    Joined:
    Oct 12, 2008
    Messages:
    1,516
    Likes Received:
    1,037
    Occupation:
    -
    Location:
    Europe
    You will find full-featured PHP source code for scraping Google at http://google-rank-checker.squabbel.com
    To scrape only the first 10 results, set it to 1 result page and 10 results per page.

    It is open source, so you can use the code for your own work. Converting it to JavaScript or Java would be difficult, but PHP is a very nice language and easy to understand.
     
  17. jamb0ss

    jamb0ss Junior Member

    Joined:
    Feb 9, 2012
    Messages:
    125
    Likes Received:
    45
    Occupation:
    Bots programming
  18. hamd01

    hamd01 Jr. VIP Jr. VIP Premium Member

    Joined:
    Mar 29, 2010
    Messages:
    560
    Likes Received:
    127
  19. FJX

    FJX Jr. VIP Jr. VIP Premium Member

    Joined:
    Oct 13, 2011
    Messages:
    356
    Likes Received:
    186
    Location:
    0x90
    This is just a lazy snippet that I've written using Python.

    Code:
    # import libraries
    import time
    
    import requests
    from bs4 import BeautifulSoup as BS
    
    # settings
    keywords = ['bla bla', 'ho ho ho']
    links_amount = 10
    timeout = 5
    # initialize session
    s = requests.Session()
    
    # set headers (update() keeps the session's other default headers)
    s.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    })
    t = time.time()
    for keyword in keywords:
        # passing the query via params lets requests URL-encode the keyword
        get_results = s.get('https://www.google.com.ph/search',
                            params={'q': keyword, 'safe': 'off'}, timeout=timeout)
        # name the parser explicitly so the result does not depend on what is installed
        soup = BS(get_results.content, 'html.parser')
        results = []
        # get the text results, exclude images.
        for srg in soup.find_all(attrs={'class': 'srg'}):
            for li in srg.find_all('li', attrs={'class': 'g'}):
                if len(results) >= links_amount:
                    break
                if li.h3 is not None:  # skip entries without a title
                    results.append(li.h3.text)
        # display result for this keyword
        print('Results for keyword:', keyword)
        for text in results:
            print(text)
    print('Elapsed time:', time.time() - t)
    
    Code:
    Results for keyword: bla bla
    blabla kids
    Knit Dolls - Blabla
    Easy car sharing with BlaBlaCar, the UK's leading low cost ...
    BLA BLA - NFB/interactive - National Film Board of Canada
    Urban Dictionary: bla bla bla
    Bla Bla - Wikipedia, the free encyclopedia
    Gigi D' Agostino: Blablabla - YouTube
    Chat Blá Blá : sala aeiou
    Bla Bla Music | Facebook
    Bla Bla :: Beatport
    Results for keyword: ho ho ho
    Ho ho ho - Wikipedia, the free encyclopedia
    Ho, Ho, Ho - Wikipedia, the free encyclopedia
    Santa Ho Ho - YouTube
    Ho Ho Ho! - YouTube
    Urban Dictionary: ho ho ho
    Yo ho ho (1981) - IMDb
    Ho Ho Ho (2009) - IMDb
    Santa Claus' Christmas home | SantaClaus.com since 1994
    Download: Ho! Ho! Ho! Canada 5 | The Line Of Best Fit
    Download: Ho! Ho! Ho! Canada 4 | The Line Of Best Fit
    Elapsed time: 8.77699995041
    
    It can be done with fewer lines of code, but it's clean, so whatever.

    Good luck!
     
  20. Gogol

    Gogol Elite Member

    Joined:
    Sep 10, 2010
    Messages:
    3,066
    Likes Received:
    2,872
    Gender:
    Male
    Not sure how you would scrape Google with just JavaScript, but I did make a keyword tracker project in PHP that could be useful here. You will need to modify it to return the URLs instead of the position of your domain. The basic fetch algorithm will remain the same.
    Check the thread here:
    http://www.blackhatworld.com/blackh...serp-position-checker-script-written-php.html

    Let me know if you are unsure about something there. :)