Just a quick guide that will show you how to scrape your own lists for use with GSA SER. Of course, GSA SER has its own built-in tool for scraping URLs to post to, but it's not that great, and it also uses resources that you really want to reserve for actually posting to URLs rather than finding them.
Things you will need:
- GSA SER (Of course).
- Scrapebox
- Proxies - 30 shared proxies will do fine
STEP 1: Getting the list of footprints together
First of all you need a list of footprints, preferably the ones that GSA actually uses - you can get these by opening GSA, going to options, then tools, and clicking on "search online for URLs".
Once in there you can go through the engines one by one and select "Add all footprints for...". This will gradually compile a list of all the footprints that you need.
Once you have them you can tick the "Save to file" box or just copy and paste them into a text file.
Once you have done this for all engines you should end up with around 2300 footprints.
STEP 2: Getting the list of keywords together
The next thing you need is a list of keywords - I normally do this per project so that I get a targeted set of keywords and end up with a relatively unique, if not perfectly targeted, list of URLs at the end of it all.
So open up Scrapebox and click on Scrape and then Keyword Scraper.
Put a couple of seed keywords in here for your niche and then click on Start. Once it completes, click on "Transfer results to left side" and then click Start again. Leave it running for a while until you get a nice list of keywords.
Once you have this list export it to a text file.
STEP 3: Merging your list of keywords with your footprints to create a list of Google search queries
Go back to the main Scrapebox screen, clear anything currently in the keyword section and import your list of keywords that you previously scraped.
Then click on "M" and import your list of footprints as well - this will merge the two together to create a nice list of search queries.
Personally, when I do this I use around 2,500 keywords along with the 2,300 footprints, which gives you around 5 million queries - if you have more keywords than that, you can cut the list down using DupRemove - instructions are included in the next step.
Once you have the list of keyword+footprint combinations save them to a new text file.
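If you prefer to do the merge outside of Scrapebox, the same idea is easy to script. Here is a minimal Python sketch, assuming the footprints and keywords were saved as footprints.txt and keywords.txt (hypothetical file names) and that each query is simply a footprint followed by a keyword, which is roughly what the merge produces:

# merge_queries.py - combine every footprint with every keyword
# Rough stand-in for Scrapebox's "M" (merge) feature, not its exact output.
with open("footprints.txt", encoding="utf-8") as f:
    footprints = [line.strip() for line in f if line.strip()]
with open("keywords.txt", encoding="utf-8") as f:
    keywords = [line.strip() for line in f if line.strip()]

with open("queries.txt", "w", encoding="utf-8") as out:
    for kw in keywords:
        for fp in footprints:
            out.write(f"{fp} {kw}\n")  # one search query per footprint/keyword pair

print(f"Wrote {len(footprints) * len(keywords)} queries")

With 2,300 footprints and 2,500 keywords this works out to several million lines, which is where the query count mentioned above comes from.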
STEP 4: Randomizing and splitting your list
This isn't imperative, but I normally randomize my list so that I get an even spread of URLs across all of the different platforms. I also normally split my list because I don't need to scrape using 5 million queries.
If you go to Addons in Scrapebox and install DupRemove, you can achieve this easily.
Once installed, open DupRemove and first randomize the list, then open that randomized list and split it. I personally split my lists into 10k files and then scrape using those one at a time.
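If you would rather randomize and split the file outside of Scrapebox, here is a minimal Python sketch of the same idea, assuming the merged queries live in queries.txt (a hypothetical name) and using the 10,000-line chunks described above:

# shuffle_split.py - randomize the query list and split it into 10,000-line files
# Rough equivalent of the DupRemove addon's randomize and split functions.
import random

with open("queries.txt", encoding="utf-8") as f:
    queries = [line.strip() for line in f if line.strip()]

random.shuffle(queries)  # mixes the platforms so each chunk covers a bit of everything

CHUNK = 10_000
for i in range(0, len(queries), CHUNK):
    with open(f"queries_{i // CHUNK + 1:03d}.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(queries[i:i + CHUNK]) + "\n")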
STEP 5: Begin harvesting
Now that you have a set of 10,000 randomized keyword+footprint combinations, you can import them back into Scrapebox and begin scraping - just make sure you clear anything already in the keyword section, import the 10k list, and then click on "Start harvesting".
With 30 proxies and a 4 core VPS I normally leave this running for 12-24 hours and then stop it. This should give you 1-2 million URLs.
Once that is complete, close the harvester and the URLs will be transferred into the main window, where you can remove duplicates and then save the list.
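Scrapebox's built-in duplicate removal is the easiest route, but if you prefer to dedupe the saved list yourself, a rough sketch (assuming the harvest was exported to harvested.txt, a hypothetical file name) could look like this:

# dedup_urls.py - drop exact duplicate URLs from a harvested list
# Stands in for the "remove duplicates" step done in Scrapebox itself.
seen = set()
kept = []
with open("harvested.txt", encoding="utf-8") as f:
    for line in f:
        url = line.strip()
        if url and url.lower() not in seen:
            seen.add(url.lower())
            kept.append(url)

with open("harvested_dedup.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(kept) + "\n")

print(f"Kept {len(kept)} unique URLs")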
STEP 6: Sorting the list into GSA
Once you have this list, it needs to be sorted into GSA. This can be done by clicking on options, tools and then Import URLs. In my experience importing from a file often fails, so I open the text file first, copy all of the URLs, and then use the procedure above but select "Import from clipboard". Once you have done that, GSA will start looking through the list and importing the URLs into its "Identified" list.
STEP 7: Start posting
Importing 2 million URLs will probably give you somewhere between 300k and 500k URLs that GSA can post to, and once you have that you are good to go.
LPM after using this method: