I cannot harvest

theseodude

Regular Member
Joined
Jun 25, 2012
Messages
304
Reaction score
89
Hi,
I am trying to harvest all the pages of a site. I am an experienced user and I have done this in the past. But now, every time I try to scrape, it scrapes and scrapes and finds around 6,000 links. But then it says "similar links have been removed," and after that I am left with 1 or 2 links.

I have tried

site:http://www.domain.com


and I have tried
site:http://www.domain.com John
site:http://www.domain.com card
site:http://www.domain.com real
site:http://www.domain.com brain
site:http://www.domain.com Susan
site:http://www.domain.com (random word)
site:http://www.domain.com (random word 2)
etc.
etc.

By the time the duplicate URLs are removed, I am left with like 1 or 2 links. I don't know what the hell is going on. I have done this in the past without any problem.

I am using private proxies by the way.
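
For anyone trying the same keyword trick: here is a minimal sketch (not ScrapeBox itself; the domain and word list are placeholders) of how such a query list can be generated and saved for a harvester:

```python
# Build a keyword-augmented "site:" query list, one query per line,
# ready to paste into a harvester. Domain and words are placeholders.
words = ["John", "card", "real", "brain", "Susan"]
domain = "http://www.domain.com"

queries = [f"site:{domain}"] + [f"site:{domain} {w}" for w in words]

with open("queries.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(queries))
```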
 

theseodude

Regular Member
Joined
Jun 25, 2012
Messages
304
Reaction score
89
Hi guys, so nobody knows why ScrapeBox is behaving this way?
 

EXtraHand

Junior Member
Joined
Jan 26, 2012
Messages
111
Reaction score
63
Do you mean duplicate domains are removed? If yes, go to the Options drop-down (beside Settings) and untick "Automatically Remove Duplicate Domains"; it's above "Auto Recovery".
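
That would explain the symptom exactly: with that option on, every URL from the same host collapses to a single entry. A minimal sketch of the difference (URLs are placeholders):

```python
# Deduplicating by full URL keeps every distinct page; deduplicating
# by domain keeps only one URL per host, which is why a single-site
# harvest of thousands of links can shrink to one.
from urllib.parse import urlparse

harvested = [
    "http://www.domain.com/index.html",
    "http://www.domain.com/post1.html",
    "http://www.domain.com/post2.html",
    "http://www.domain.com/about.html",
]

by_url = list(dict.fromkeys(harvested))
by_domain = list({urlparse(u).netloc: u for u in harvested}.values())

print(len(by_url), len(by_domain))  # -> 4 1
```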
 

futurestic06

Supreme Member
Joined
Apr 16, 2011
Messages
1,204
Reaction score
149
Dude, sorry, but I don't get your point. I think you should tell us some more about the problem so that I can help you. Thanks.
 

theseodude

Regular Member
Joined
Jun 25, 2012
Messages
304
Reaction score
89
Dude, sorry, but I don't get your point. I think you should tell us some more about the problem so that I can help you. Thanks.

Alright, let me make it simple. How do I harvest all the pages that a domain has?
 

t0.sh

Registered Member
Joined
Jun 6, 2012
Messages
55
Reaction score
24
It seems like you're mixing up what you're looking for. Do you mean you want to collect all the clickable links available on the site, so you're left with a list of links, or do you want to download all the individual pages of a site?
 

theseodude

Regular Member
Joined
Jun 25, 2012
Messages
304
Reaction score
89
It seems like you're mixing up what you're looking for. Do you mean you want to collect all the clickable links available on the site, so you're left with a list of links, or do you want to download all the individual pages of a site?

I want to harvest all the pages that a domain has, for example:
domain.com/index.html
domain.com/post1.html
domain.com/post2.html
domain.com/about.html

I know for a fact that this domain has hundreds, if not thousands, of pages, but when I do it in ScrapeBox, I get like 1 or 2.
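
If search-engine scraping keeps collapsing the results, another way to enumerate every page a domain exposes is to crawl the site directly. A minimal sketch (this is not ScrapeBox's internal method; it assumes the `requests` package is installed, and the start URL is a placeholder):

```python
# Breadth-first same-domain crawler: follows internal links and
# collects every distinct page URL it can reach.
import re
from collections import deque
from urllib.parse import urljoin, urlparse

import requests

start = "http://www.domain.com/"
host = urlparse(start).netloc

seen = {start}
queue = deque([start])

while queue:
    url = queue.popleft()
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        continue
    # Crude href extraction; a real crawler would use an HTML parser.
    for href in re.findall(r'href=["\'](.*?)["\']', html):
        link = urljoin(url, href).split("#")[0]
        if urlparse(link).netloc == host and link not in seen:
            seen.add(link)
            queue.append(link)

print(f"{len(seen)} pages found")
```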
 

jb2008

Senior Member
Joined
Jul 15, 2010
Messages
1,158
Reaction score
976
Just use site:http://domain.com

Don't put a keyword after it.

And use Hrefer, not ScrapeBox, with a large list of good-quality, tested public proxies. Search some of my posts about Hrefer.
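
For context on why a bare site: query can still return thousands of results: the harvester pages through the engine's result pages for that single query. A rough sketch of the kind of paginated query URLs a tool like ScrapeBox or Hrefer generates (Google's parameters and limits change often, so treat this purely as an illustration; the domain is a placeholder):

```python
# Build paginated search URLs for one "site:" query; fetching and
# parsing these result pages is what the harvester automates.
from urllib.parse import quote_plus

query = "site:http://domain.com"
urls = [
    f"https://www.google.com/search?q={quote_plus(query)}&start={offset}"
    for offset in range(0, 100, 10)  # first 10 result pages
]
print("\n".join(urls))
```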
 

williamk

BANNED
Joined
Oct 29, 2009
Messages
1,031
Reaction score
187
I would advise the following:


a) Make sure that the proxies are working well and not blacklisted (see the sketch after this list).
b) Use the site: operator (you are already using it).
c) Try with a low number of threads.
d) Contact support if all of these are OK and you still cannot scrape.
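
For point a), a minimal proxy-check sketch (assumes the `requests` package is installed; the proxy addresses and test URL are placeholders):

```python
# Issue a test request through each proxy and flag ones that are dead
# or that the search engine is rate-limiting or blocking.
import requests

proxies_to_test = ["1.2.3.4:8080", "5.6.7.8:3128"]  # placeholders
test_url = "https://www.google.com/search?q=test"

for p in proxies_to_test:
    cfg = {"http": f"http://{p}", "https": f"http://{p}"}
    try:
        r = requests.get(test_url, proxies=cfg, timeout=10)
        # 200 means the proxy works; 429/503 usually means it has
        # been rate-limited or blacklisted by the engine.
        status = "OK" if r.status_code == 200 else f"blocked ({r.status_code})"
    except requests.RequestException as exc:
        status = f"dead ({type(exc).__name__})"
    print(f"{p}: {status}")
```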
 