
For everyone using Google Keyword scraper tools

Discussion in 'Black Hat SEO Tools' started by Subsonic, Nov 29, 2011.

  1. Subsonic

    Subsonic Regular Member

    Joined:
    Mar 17, 2011
    Messages:
    367
    Likes Received:
    333
    Location:
    DNS root zone database
    Hey guys,

    Today I suddenly noticed that the Google keyword scraping feature in my software didn't work anymore. A quick check with Wireshark showed that Google has changed the address it serves search keyword suggestions from. I'm not sure if this is permanent, but I think it is.

    Just wanted to tell you that if your favorite keyword scraper doesn't work anymore, you just have to wait for the developer to find the new address and update the tool :) In case it helps anyone, here are the old and new addresses for keyword scraping:

    [OLD] http://clients1.google.com/complete/search?hl=en&gl=&q=[KEYWORD]
    [NEW] http://suggestqueries.google.com/complete/search?output=firefox&client=firefox&hl=en-US&q=[KEYWORD]

    I hope this helps some developers quickly update their software! Oh, and one more thing: at least for me the output format is a little different, so I also had to update my parser to extract the keywords from the response message.
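
    For anyone updating a parser: with output=firefox the response body is a plain JSON array, so parsing it is straightforward. A minimal sketch (the example suggestions here are made up, the real ones will differ):

    Code:
    // body is the raw response string, e.g. for q=seo something like:
    // ["seo",["seo tools","seo services","seo software"]]
    var data = JSON.parse(body);
    var query = data[0];        // echoes the keyword you sent
    var suggestions = data[1];  // array of suggestion strings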

    PM me if you need help with updating or something.
     
  2. Greeneye

    Greeneye Junior Member

    Joined:
    Sep 23, 2011
    Messages:
    117
    Likes Received:
    59
    Since the sub-URL structure is still the same, you can probably just update your hosts file to work around the issue.

    Try adding this to your hosts file; it may do the trick.

    Code:
    173.194.64.113 clients1.google.com
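
    To check that the entry is being picked up, you could do a quick lookup from node (a sketch; the IP above is just what clients1.google.com resolved to for me and may change):

    Code:
    // dns.lookup uses the OS resolver, so it respects the hosts file
    var dns = require('dns');
    dns.lookup('clients1.google.com', function (err, address) {
        if (err) return console.error('lookup failed: ' + err.message);
        console.log('clients1.google.com resolves to ' + address);
    });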
     
  3. Subsonic

    Subsonic Regular Member

    Joined:
    Mar 17, 2011
    Messages:
    367
    Likes Received:
    333
    Location:
    DNS root zone database
    That might work for some tools, but as far as I know the structure of the response message has also changed, so if the tool was built to parse the old format there's nothing you can do except wait for an update.. :) I might be wrong, though, because I only tried "firefox" for the output and client parameters. Results might differ depending on the client (chrome, ie, etc.)
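
    If anyone wants to check, here's a quick way to eyeball what different client values return (a sketch; I've only verified firefox myself, the other client names are guesses):

    Code:
    // print the raw response for a few client values so the formats can be compared
    var http = require('http');
    ['firefox', 'chrome', 'ie'].forEach(function (client) {
        http.get({
            host: 'suggestqueries.google.com',
            port: 80,
            path: '/complete/search?client=' + client + '&hl=en-US&q=seo'
        }, function (res) {
            var body = '';
            res.on('data', function (chunk) { body += chunk; });
            res.on('end', function () {
                console.log('[' + client + '] ' + body);
            });
        });
    });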
     
    Last edited: Nov 29, 2011
  4. jvmartija

    jvmartija Regular Member

    Joined:
    Oct 30, 2009
    Messages:
    234
    Likes Received:
    37
    Hope there will be another tool soon.
     
  5. banel

    banel Regular Member

    Joined:
    Mar 30, 2010
    Messages:
    287
    Likes Received:
    16
    Can anyone share a tool that works?
     
  6. Subsonic

    Subsonic Regular Member

    Joined:
    Mar 17, 2011
    Messages:
    367
    Likes Received:
    333
    Location:
    DNS root zone database
    I actually have a fully working Google keyword scraper developed for my upcoming domaining software, so I could take some time and make a standalone version of it, at least with basic scraping functionality at first :)
     
  7. htmlinstant

    htmlinstant Newbie

    Joined:
    Nov 30, 2011
    Messages:
    2
    Likes Received:
    0
    Can someone PM me? I'm new and can't post any links. I have a simple example (on jsbin) that uses the old clients1 URL, which I'd like to share here so it can be edited, but it no longer works.
     
  8. sockpuppet

    sockpuppet Junior Member

    Joined:
    Nov 7, 2011
    Messages:
    155
    Likes Received:
    145
    Here is a nodejs script that gets the job done. You need nodejs to run it:
    Code:
    nodejs.org/#download
    
    Copy the code into a file named scraper.js, then open a command line and type:
    Code:
    node.exe scraper.js keyword > keywords.txt
    
    Some random notes:
    - You can add as many keywords as you like, but see the next point.
    - No proxy support, so you'll get banned if you use too many keywords or run it too frequently (see the throttling sketch after these notes).
    - If your IP is banned you'll see something like "could not parse result <html..." on the command line.
    - If your keyword contains spaces you have to wrap it in quotes ("").
    - You can edit the line "var CYCLES = 4;" to set the number of cycles the tool should do; by a cycle I mean taking the results and feeding them back in as new search queries.
    - I never tested it on Windows. Actually I never really tested it for long at all, I just wrote it today.
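
    If bans are a problem, one workaround is to space the requests out. A minimal throttling sketch (my own addition, not part of the script below; DELAY_MS is a made-up name):

    Code:
    // queue each scrape call on an increasing timeout instead of firing immediately
    var DELAY_MS = 500;   // gap between requests, tune to taste
    var queued = 0;
    function scrapeThrottled(keyword, deep) {
        setTimeout(function () { scrape(keyword, deep); }, queued++ * DELAY_MS);
    }
    // then call scrapeThrottled(...) wherever the script calls scrape(...)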



    Code:
    var http = require('http');
    
    // how many times to feed results back in as new queries
    var CYCLES = 4;
    
    if (process.argv.length < 3) {
        process.stdout.write('usage: node scraper.js [keyword1] [keyword2] ...\n');
        return;
    }
    var keywords = process.argv.slice(2);
    
    function scrape(keyword, deep) {
        if (!deep) deep = 0;
        if (deep == CYCLES) return;
        process.stdout.write(keyword + '\n');
        http.get({
            host: 'suggestqueries.google.com',
            port: 80,
            path: '/complete/search?output=firefox&client=firefox&hl=en-US&q=' + encodeURIComponent(keyword)
        }, function (res) {
            var body = "";
            res.on('data', function (chunk) {
                body = body + chunk;
            });
            res.on('end', function () {
                try {
                    // response is ["keyword",["suggestion1","suggestion2",...]]
                    var data = JSON.parse(body);
                    if (data.length < 2 || data[1].length == 1) {
                        // uncomment for a message per dead-end keyword (can be noisy)
                        //process.stderr.write("no data found for keyword " + keyword + "\n");
                        return;
                    }
                    // skip the first suggestion (it echoes the query) and recurse on the rest
                    var words = data[1].slice(1);
                    words.forEach(function (item) { scrape(item, deep + 1); });
                } catch (e) {
                    process.stderr.write("could not parse result " + body + ": " + e.message + "\n");
                }
            });
        }).on('error', function (e) {
            process.stderr.write("error while scraping " + keyword + ": " + e.message + "\n");
        });
    }
    // forEach passes the index as the second argument, which would be taken
    // as "deep", so wrap the call instead of passing scrape directly
    keywords.forEach(function (kw) { scrape(kw); });
    
     
  9. Subsonic

    Subsonic Regular Member

    Joined:
    Mar 17, 2011
    Messages:
    367
    Likes Received:
    333
    Location:
    DNS root zone database
    • Thanks x 1
  10. htmlinstant

    htmlinstant Newbie

    Joined:
    Nov 30, 2011
    Messages:
    2
    Likes Received:
    0
    Solved the problem:

    I had the "client" parameter set to "pdc", which stopped working. Changing it to "news" fixed it.
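
    In other words, a request like this (a sketch; I'm assuming the suggestqueries host from the first post, the key part is the client parameter):

    Code:
    http://suggestqueries.google.com/complete/search?client=news&hl=en-US&q=[KEYWORD]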