
ScrapeBox Harvests Millions of Links But I Only Get a Few?

Discussion in 'Black Hat SEO Tools' started by myfault, Jul 12, 2013.

  1. myfault

    myfault Power Member

    Joined:
    Sep 21, 2012
    Messages:
    636
    Likes Received:
    121
    I scrape a lot of keywords and merge them with footprints which I got from here. Harvesting runs for a day and scrapes around 1.2 million URLs, but when I choose "remove duplicate domains" I'm left with only 25,000 URLs. What am I doing wrong?
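
    For anyone wondering why the count collapses like that, here is a rough Python sketch of what "Remove Duplicate Domains" effectively does (not ScrapeBox's actual code; "urls.txt" is a placeholder for the harvested list, assuming full http(s) URLs):

    Code:
    from urllib.parse import urlparse

    # "urls.txt" is a placeholder: one harvested URL per line.
    with open("urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    seen = set()
    unique = []
    for url in urls:
        domain = urlparse(url).netloc.lower()   # e.g. "www.example.com"
        if domain not in seen:                  # keep only the first URL per domain
            seen.add(domain)
            unique.append(url)

    # 1.2M pages can easily collapse to ~25k once every
    # extra page from the same domain is thrown away.
    print(len(urls), "harvested ->", len(unique), "unique domains")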
     
  2. rob1977

    rob1977 Power Member

    Joined:
    Mar 25, 2013
    Messages:
    773
    Likes Received:
    666
    I'm not sure you're doing anything wrong; when I use it I drop right down until I'm left with far fewer than that. I tend to be left with somewhere between 100 and 200 URLs, and they tend to be strong enough to do their stuff. From those URLs I'll manually submit anything up to 20% of them, and the rest are just window dressing where the follow status is unimportant to me.

    This is just how I use it. It works for my niche; I hope that helps somewhat.
     
  3. myfault

    myfault Power Member

    Joined:
    Sep 21, 2012
    Messages:
    636
    Likes Received:
    121
    Can someone answer?
     
  4. gullsinn

    gullsinn Jr. VIP Premium Member

    Joined:
    Dec 24, 2009
    Messages:
    2,429
    Likes Received:
    2,210
    Gender:
    Male
    Occupation:
    Jobless :D
    Location:
    Graveyard
    You are not doing anything wrong, you are on the right track, go ahead!
     
  5. slim_dusty

    slim_dusty Jr. VIP Premium Member

    Joined:
    Jun 5, 2011
    Messages:
    392
    Likes Received:
    115
    Location:
    Middle earth
    If you are scraping lots of keywords together, and the keywords are similar, then you will get lots of duplicate results. You do things this way to get the maximum number of results, but you will inevitably get large numbers of duplicate results which need to be filtered out.
     
    • Thanks x 1
  6. faisalmaximus

    faisalmaximus Jr. VIP Premium Member

    Joined:
    Apr 6, 2013
    Messages:
    524
    Likes Received:
    54
    Occupation:
    Online Marketer
    I am also facing the same kind of problem and was searching for a solution. Besides scraping URLs, I keep a long list of around 340k auto-approve blogs which I use, because most scraped URLs do not approve links in comments.
     
  7. SEO_Alchemy

    SEO_Alchemy Senior Member

    Joined:
    Sep 8, 2012
    Messages:
    1,134
    Likes Received:
    1,213
    Location:
    USA
    That's just how the scraping process works. You merge lots of keywords and there will be lots and lots of overlap; you're just trying to find as many unique domains as possible. The joys of scraping :)
     
  8. Bostoncab

    Bostoncab Elite Member

    Joined:
    Dec 31, 2009
    Messages:
    2,255
    Likes Received:
    514
    Occupation:
    pain in the ass cabbie
    Location:
    Boston,Ma.
    It's been a long while since I used SB, but as I recall that is not entirely unusual. The only time I can recall getting lots of unique domains is when I scraped against the dictionary for my keywords. Another time I scraped against the Yellow Pages categories plus every city in America with a population of over 100,000, and that was good.

    So get the list of all the Yellow Pages categories (taxis, plumbers, etc.). Now append those with the names of all the cities and towns in the US with more than 100k in population. Now scrape.
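
    A rough sketch of that merge in Python (the category and city lists here are tiny samples standing in for the real files):

    Code:
    # Tiny illustrative samples; real lists would have thousands of entries.
    categories = ["taxis", "plumbers", "electricians"]
    cities = ["Boston", "Chicago", "Houston"]

    # Pair every category with every city, Yellow Pages style.
    keywords = [f"{category} {city}" for category in categories for city in cities]

    with open("keywords.txt", "w") as f:    # feed this file to the harvester
        f.write("\n".join(keywords))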
     
    • Thanks x 1
  9. myfault

    myfault Power Member

    Joined:
    Sep 21, 2012
    Messages:
    636
    Likes Received:
    121
    Then what is the correct method to scrape keywords, so I end up with far fewer duplicate domains?
     
  10. Bostoncab

    Bostoncab Elite Member

    Joined:
    Dec 31, 2009
    Messages:
    2,255
    Likes Received:
    514
    Occupation:
    pain in the ass cabbie
    Location:
    Boston,Ma.
    Create massive keyword lists. Hey, I was actually just thinking about this thread and I realized you are doing something wrong: don't eliminate duplicate domains after you scrape. Scrape the millions of URLs, right? Then try to post on all of them.

    See, if you are trying to create blog comment links on website.com, comments may be closed on website.com/p1 but website.com/p99 may be wide open with no outbound links. Get it?
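
    A minimal sketch of that idea: group every harvested page under its domain instead of deduping, so each page gets its own posting attempt ("harvest.txt" is a placeholder filename):

    Code:
    from collections import defaultdict
    from urllib.parse import urlparse

    # Keep ALL harvested pages, grouped per domain, instead of deduping.
    pages_by_domain = defaultdict(list)
    with open("harvest.txt") as f:          # placeholder filename
        for line in f:
            url = line.strip()
            if url:
                pages_by_domain[urlparse(url).netloc.lower()].append(url)

    # Comments may be closed on /p1 but wide open on /p99,
    # so every page of a domain is worth a posting attempt.
    for domain, pages in pages_by_domain.items():
        print(domain, "->", len(pages), "pages to try")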
     
  11. slim_dusty

    slim_dusty Jr. VIP Premium Member

    Joined:
    Jun 5, 2011
    Messages:
    392
    Likes Received:
    115
    Location:
    Middle earth
    Agree with this: you should eliminate duplicate URLs, but not duplicate domains.

    You have a keyword list (you can use the ScrapeBox keyword scraper) and merge it with footprints. I would normally also merge in a list of common words, but I really like Bostoncab's idea of a merge list with all the words of the dictionary, or all the Yellow Pages categories/major cities. This may improve the number of unique domains you scrape.
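
    A quick sketch of the footprint merge (the footprints and keywords below are just sample values):

    Code:
    # Sample footprints and keywords; real lists would be much longer.
    footprints = ['"powered by wordpress"', '"leave a comment"']
    keywords = ["fly fishing", "budget travel", "home brewing"]

    # Each footprint paired with each keyword becomes one harvester query.
    queries = [f"{footprint} {keyword}" for footprint in footprints for keyword in keywords]
    for query in queries:
        print(query)    # e.g. "powered by wordpress" fly fishing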
     
  12. donaldbeck

    donaldbeck Power Member

    Joined:
    Dec 28, 2006
    Messages:
    585
    Likes Received:
    212
    More keywords, more footprints.

    What kind of footprints are you using and how many keywords?
     
  13. cooltoad

    cooltoad Senior Member

    Joined:
    Sep 10, 2010
    Messages:
    934
    Likes Received:
    549
    Occupation:
    None of your business
    Location:
    On Vacation
    Well, even though your username says it's your fault, OP, trust me it is not. First up, if you are using the scraped links for comment posting, DO NOT remove duplicate domains. Secondly, when you do find an AA site, just scrape all the pages of that site and you will instantly have a bloated AA number.
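
    One rough way to do that second step is to turn each AA domain into a site: query and feed those back into the harvester (Python sketch; "aa_domains.txt" is a placeholder filename):

    Code:
    # "aa_domains.txt" is a placeholder: one auto-approve domain per line.
    with open("aa_domains.txt") as f:
        domains = [line.strip() for line in f if line.strip()]

    # site: queries make the harvester return every indexed page of a domain.
    queries = [f"site:{domain}" for domain in domains]

    with open("aa_site_queries.txt", "w") as f:
        f.write("\n".join(queries))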
     
  14. bendutchman

    bendutchman Junior Member

    Joined:
    Jun 1, 2012
    Messages:
    131
    Likes Received:
    41
    Occupation:
    genetic engineer
    Location:
    House, Road House
  15. *Hawke*

    *Hawke* Junior Member Premium Member

    Joined:
    Apr 9, 2013
    Messages:
    126
    Likes Received:
    81
    Maybe you should try GScraper; with its built-in proxies you can harvest a huge list of URLs.
     
  16. proxygo

    proxygo Jr. VIP Premium Member

    Joined:
    Nov 2, 2008
    Messages:
    10,201
    Likes Received:
    8,689
    Scraping the same list in GScraper won't reduce the amount of dupe URLs he's getting; it's about having keywords that aren't too similar.
     
  17. Sweetfunny

    Sweetfunny Jr. VIP Premium Member

    Joined:
    Jul 13, 2008
    Messages:
    1,747
    Likes Received:
    5,038
    Location:
    ScrapeBox v2.0
    Two tips. First, make a merge file with the following lines to merge with your keywords (%KW% is the ScrapeBox merge placeholder that gets swapped for each keyword):

    %KW% .com
    %KW% .net
    %KW% .org
    %KW% .info

    Then when you harvest, the queries will be like:

    Keyword1 .com
    Keyword1 .net
    Keyword1 .org
    Keyword1 .info

    And this will give you a more diverse range of domain extensions, which means fewer duplicate domains. Also, instead of using "Remove Duplicate Domains", use the very next option in the menu, "Split Duplicate Domains". This leaves only unique domains in the harvester grid, the same as using Remove, but it splits off the rest and saves them to a file, and you can probably use them for another project.

    If you spend the time and resources to harvest millions of URLs, it makes no sense to vaporize the dupe domains and waste them all; split them and save them instead.
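
    For anyone curious, here is a rough Python sketch of what Split does with the leftovers (not ScrapeBox's actual code; "harvest.txt" is a placeholder for the harvested list):

    Code:
    from urllib.parse import urlparse

    # Mimic "Split Duplicate Domains": the first URL seen for each domain
    # stays in the grid, the rest are saved to a file instead of discarded.
    seen = set()
    grid, split_off = [], []
    with open("harvest.txt") as f:            # placeholder filename
        for line in f:
            url = line.strip()
            if not url:
                continue
            domain = urlparse(url).netloc.lower()
            if domain in seen:
                split_off.append(url)
            else:
                seen.add(domain)
                grid.append(url)

    with open("split_duplicates.txt", "w") as f:
        f.write("\n".join(split_off))         # reusable for another project
    print(len(grid), "unique domains kept,", len(split_off), "split off")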
     
    • Thanks x 1