Google Keyword Tool Data Scraped - how to market this?

Discussion in 'Black Hat SEO' started by rnc505, Oct 18, 2009.

  1. rnc505

    rnc505 Regular Member

    Joined:
    Oct 28, 2008
    Messages:
    229
    Likes Received:
    109
    hey guys,

    For a few days I've been trying to make sense of the source of the Google Keyword Tool pages, and I have found how to scrape the data with PHP cURL, captcha included. Now, with this goldmine of knowledge, I was wondering what I could do with it, and I was thinking of a membership site. I would make people pay monthly (or per search) to have the data for their chosen keywords scraped, alongside the competition in the Google search results for each keyword and the other parameters I now have access to (AdWords PPC $, AdWords competition, etc.). What would be good prices or business models for this? Then I realized this would be a huge project, because I would have to integrate proxy switching with PHP cURL, which is tough, so I was also thinking of selling the knowledge to the SENuke developers.

    Anybody have any ideas about what I can do with this?

    If I turn it into a paid membership site, I promise to give discounts to BHWers!
    Posted via Mobile Device
     
  2. trophaeum

    trophaeum Senior Member

    Joined:
    Dec 21, 2007
    Messages:
    1,189
    Likes Received:
    706
    I already got a list from the Russians. Trust me, it's not worth trying to do it yourself; just buy the lists that the Russians already have, lol.
     
  3. mehdvirus

    mehdvirus Junior Member

    Joined:
    Dec 19, 2008
    Messages:
    117
    Likes Received:
    107
    Aren't there dozens of apps for this already?
    Or does this one have different functionality?
     
  4. trophaeum

    trophaeum Senior Member

    Joined:
    Dec 21, 2007
    Messages:
    1,189
    Likes Received:
    706
    There are apps that'll scrape in realtime, and there are databases you can buy and search. Personally I've bought databases (I like realtime processing).
     
  5. netmktg

    netmktg Newbie

    Joined:
    Jan 13, 2009
    Messages:
    15
    Likes Received:
    1
    Did you just claim to solve Google's Captcha? :fish2:

    Or did you just figure out how to use Curl to scrape keywords and think that's a "goldmine of knowledge" in itself?

    Switching proxies does not have to be done in Curl itself. I use a Curl-based remote proxy setup: a Curl class serializes the POST vars and switches between several web proxies. The web proxies themselves just run on cheap shared hosting.
     
  6. youngguy

    youngguy Senior Member

    Joined:
    Apr 11, 2009
    Messages:
    1,053
    Likes Received:
    1,560
    Location:
    Hell
    Hmm? So what is so NEW? Scraping this stuff is really easy, nothing special... hmm, maybe I have "goldmine knowledge" too? :haha:
     
  7. rnc505

    rnc505 Regular Member

    Joined:
    Oct 28, 2008
    Messages:
    229
    Likes Received:
    109

    I wish to make a bet with you. I will send you $10 via PayPal right now if you can make a script to scrape the data from the Google Keyword Tool. It's coded in AJAX and the data isn't in the source, so good luck...
    Posted via Mobile Device
     
  8. netmktg

    netmktg Newbie

    Joined:
    Jan 13, 2009
    Messages:
    15
    Likes Received:
    1
    You're kinda new to this whole Curl gizmo and excited about what it can do. YES, it can do AJAX calls, and I've been doing it for ages. Actually, there is no such thing as an Ajax request... Ajax just uses standard HTTP, it just doesn't return HTML.

    Google uses AJAX even for Google Suggest and SKtool. I've been scraping those for a long time.

    Yes, the data isn't in the actual HTML source; it's Ajax after all. The data isn't in the actual HTML source even for Google Suggest.

    But Curl returns the actual Ajax data from Keyword tool, you decode it (UTF-8 / UTF-16) and that's about it.
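
As a rough illustration of the decoding step mentioned above, here is a minimal Python sketch that turns raw response bytes into text whether they arrive as UTF-8 or UTF-16. The function name `decode_payload` and the sample bytes are assumptions made for illustration; the thread does not show what the actual response looks like.

```python
import codecs


def decode_payload(raw: bytes) -> str:
    """Decode a response body that may be UTF-16 (with a BOM) or UTF-8.

    Hypothetical helper: it only inspects the byte-order mark and does not
    know anything about the real Keyword Tool response format.
    """
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to pick the byte order and strips it.
        return raw.decode("utf-16")
    # "utf-8-sig" strips a UTF-8 BOM if one is present, otherwise acts like UTF-8.
    return raw.decode("utf-8-sig")


# Sample bytes standing in for a response body (made up for this example).
utf16_body = "example keyword".encode("utf-16")
utf8_body = "caf\u00e9 keyword".encode("utf-8")

print(decode_payload(utf16_body))
print(decode_payload(utf8_body))
```

Checking for a byte-order mark is only one possible way to tell the two encodings apart; a response that declares its charset in the `Content-Type` header would be better decoded from that.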
     
    Last edited: Oct 19, 2009
  9. netmktg

    netmktg Newbie

    Joined:
    Jan 13, 2009
    Messages:
    15
    Likes Received:
    1
    Are you even using Firefox with LiveHttpHeaders plugin... anyone who uses that can clearly see the Ajax calls being made.
     
  10. rnc505

    rnc505 Regular Member

    Joined:
    Oct 28, 2008
    Messages:
    229
    Likes Received:
    109
    Okay - so you found that url to go to, but you don't know how to scrape the data/the parameters to do it. And yes I use LiveHttpHeaders.
     
  11. netmktg

    netmktg Newbie

    Joined:
    Jan 13, 2009
    Messages:
    15
    Likes Received:
    1
    Umm... the parameters are in the request itself.

    And did you even read my earlier post? I'm ALREADY scraping keywords from the Keyword Tool. And many, many folks over at Syndk8 are doing the same thing.
     
  12. insider

    insider Regular Member

    Joined:
    Jul 5, 2009
    Messages:
    344
    Likes Received:
    134
    Location:
    Europe
    You can easily see AJAX calls using firebug
     
  13. redstar504

    redstar504 Newbie

    Joined:
    Jun 4, 2009
    Messages:
    19
    Likes Received:
    48
    So I thought I had found the source of the keyword data, but when I loaded the page, it said:

    Code:
    var captchaError = true; var quotaExceeded = false;
    When the captcha has been entered however, it displays a page full of quite unorganized scrapable data.

    Are you guys talking about this request? How do you go about the captcha part?
    Code:
    https://adwords.google.com/select/VariationsTool?
    adgroupid=0
    &campaignid=0
    &adgroupIntegrated=false
    &skipLogin=true
    &currencyCode=CAD
    &maxCpcOverride=
    &targetLanguages=en
    &targetCountries=US
    &synonyms=true
    &captchaAnswer=<how to get around this???>
    &suggest=true
    &excludedWords=
    &allowExisting=true
    &showAdult=false
    &showTrademark=null
    &keywords=exampleKeyword
    
    If this isn't the correct ajax call, could someone fill me in? Thank you
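
To make the structure of the request above easier to read, here is a small Python sketch that assembles the same query string from a dictionary and parses it back. It only builds and inspects the URL and sends nothing. The `captchaAnswer` value is left empty because the post does not supply one, and the variable names are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://adwords.google.com/select/VariationsTool"

# Parameters copied from the request quoted in the post.
params = {
    "adgroupid": "0",
    "campaignid": "0",
    "adgroupIntegrated": "false",
    "skipLogin": "true",
    "currencyCode": "CAD",
    "maxCpcOverride": "",
    "targetLanguages": "en",
    "targetCountries": "US",
    "synonyms": "true",
    "captchaAnswer": "",  # unknown in the post, deliberately left blank
    "suggest": "true",
    "excludedWords": "",
    "allowExisting": "true",
    "showAdult": "false",
    "showTrademark": "null",
    "keywords": "exampleKeyword",
}

# urlencode percent-escapes values and joins the pairs with "&".
url = BASE_URL + "?" + urlencode(params)

# Parse the query back; keep_blank_values keeps the empty parameters visible.
parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)

print(url)
print(parsed["keywords"], parsed["targetCountries"])
```

Keeping the parameters in a dictionary makes it obvious which fields are empty and which one carries the keyword, which is hard to see in the raw URL.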
     
    Last edited: Oct 25, 2009
  14. risefromdeath

    risefromdeath Power Member

    Joined:
    Jul 1, 2009
    Messages:
    650
    Likes Received:
    107
    Sorry to bump this thread up, but has anyone found a good PHP script that does this?
    I can enter the captcha manually, but I need a script to do this (not desktop apps).
     
  15. Gyro83

    Gyro83 Newbie

    Joined:
    Apr 13, 2009
    Messages:
    35
    Likes Received:
    1
    So I never did read how people are scraping the Google Keyword Tool. Could someone give some tips on how to do it, or send me a PM? I know a bit of curl but I haven't had any luck so far.
     
  16. BlackHatSoda

    BlackHatSoda Junior Member

    Joined:
    Oct 6, 2009
    Messages:
    178
    Likes Received:
    100
    Basically Google changed the way the Keyword tool works. It no longer just sends a simple HTTP request with data in the query string and post header.

    It now uses a mess of compressed JavaScript code which does XML-RPC calls. The parameters of the request are encoded somehow and unless you unravel the whole mess of JS code to figure out how the request parameters are encoded and what order they go in, you can't do this with a simple cURL request.

    And no, you can't use Fiddler or some other tool to intercept the calls and simply change the keywords in the XML-RPC data being sent in the HTTP request. There appear to be checksum values in the call data. I was able to take the call data for a single keyword and replace it with any other single keyword and make it work, but trying to pass in more than one keyword at a time didn't work.

    So I created a tool that scrapes the keyword tool using the IE browser control under the hood to get around the whole mess. It works just fine under multiple threads. The only issue is that you can't run multiple browser controls, each with its own proxy settings, because setting the proxy for the IE control sets it for all IE browsers.

    But my tool works just fine in scraping the tool under 3-4 threads until the current proxy IP gets a temporary ban. Then I just rotate to the next proxy in the list. I'm able to suck in an average of about 1200 keywords per minute or about 1.7 million per day per computer that I run it on :)
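
A quick sanity check of the throughput figures quoted above, as a short Python sketch. The only input is the 1200 keywords per minute stated in the post; it assumes that rate is sustained around the clock.

```python
KEYWORDS_PER_MINUTE = 1200  # average rate quoted in the post

# Scale the per-minute rate up to an hour and then to a full 24-hour day.
keywords_per_hour = KEYWORDS_PER_MINUTE * 60
keywords_per_day = keywords_per_hour * 24

print(f"{keywords_per_hour:,} keywords per hour")
print(f"{keywords_per_day:,} keywords per day")
```

This gives 1,728,000 per day, which matches the "about 1.7 million" in the post.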
     
    Last edited: Jan 15, 2011
  17. dreamworker

    dreamworker Newbie

    Joined:
    Jan 13, 2010
    Messages:
    14
    Likes Received:
    0
    BlackHatSoda : Do you sell this tool ?
     
  18. dannyhw

    dannyhw Senior Member

    Joined:
    Jul 16, 2008
    Messages:
    980
    Likes Received:
    462
    Occupation:
    Software Engineer
    Location:
    New York City Burbs
    Glad someone else was smart enough to mine using the old keyword tool. I kept that secret for so long but it was so easy. I figured a webbrowser control would be the easiest way to go about it now but I have no clue when it comes to captcha breaking and I have no reason to pay for decaptcha to run this.
     
  19. Gyro83

    Gyro83 Newbie

    Joined:
    Apr 13, 2009
    Messages:
    35
    Likes Received:
    1
    If you create an account and log in you don't have any captcha. Captcha is if you are not logged in. So if you can have the tool log you in as well then you would have something worth using. Anyone actually have anything to share?
     
  20. ninjacrx

    ninjacrx Regular Member

    Joined:
    Jul 22, 2008
    Messages:
    272
    Likes Received:
    203
    I was working on a big project like this. Everything worked fantastically until the new AdWords system came in.

    You can take a look at my script live at: http://syndicatemarketing.net/007/

    If anyone is interested in purchasing the full project (I can make it work again), PM me.

    It has great options, plus automatic captcha decoding.