theseodude
Regular Member
- Joined: Jun 25, 2012
- Messages: 304
- Reaction score: 89
Hi,
I am trying to harvest all the pages of a site. I am an experienced user and I have done this in the past. But now, every time I try to scrape, it scrapes and scrapes and finds around 6000 links. Then it says "similar links have been removed", and after that I am left with 1 or 2 links.
I have tried
site:http://www.domain.com
and I have tried
site:http://www.domain.com John
site:http://www.domain.com card
site:http://www.domain.com real
site:http://www.domain.com brain
site:http://www.domain.com Susan
site:http://www.domain.com (random word)
site:http://www.domain.com (random word 2)
etc.
etc.
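For anyone who wants to reproduce the query list, here is a minimal sketch of how it is built. The function name build_site_queries is made up for illustration, and the keyword list is just the sample words from above:

```python
def build_site_queries(domain, keywords):
    """Return the bare site: query followed by one query per keyword."""
    base = f"site:{domain}"
    return [base] + [f"{base} {kw}" for kw in keywords]


# Sample words from the post; any random words would do.
queries = build_site_queries(
    "http://www.domain.com",
    ["John", "card", "real", "brain", "Susan"],
)
for q in queries:
    print(q)
```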
By the time the duplicate URLs are removed, I am left with only 1 or 2 links. I don't know what the hell is going on, since this has worked for me in the past.
I am using private proxies, by the way.
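To show the kind of collapse I mean, here is a rough sketch comparing two ways a duplicate filter could work. This is only an assumption about what might be happening, not the tool's actual logic: if the filter keeps one link per host instead of one link per exact URL, then every result from a single site collapses to one entry. The function names and the sample URLs are invented for the example.

```python
from urllib.parse import urlsplit


def dedupe_exact(urls):
    """Keep the first occurrence of each exact URL, preserving order."""
    seen = set()
    kept = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


def dedupe_by_host(urls):
    """Keep only the first URL seen for each host (assumed 'similar link' rule)."""
    seen = set()
    kept = []
    for url in urls:
        host = urlsplit(url).netloc.lower()
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


# Hypothetical harvest: every link comes from the same site.
harvested = [
    "http://www.domain.com/",
    "http://www.domain.com/about",
    "http://www.domain.com/about",
    "http://www.domain.com/contact",
]

print(len(dedupe_exact(harvested)))    # unique pages survive
print(len(dedupe_by_host(harvested)))  # one link per host survives
```

With a site: scrape every result shares the same host, so a per-host filter would leave a single link no matter how many pages were found, while an exact-URL filter would keep every distinct page.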