
I can't harvest with scrapebox

Discussion in 'Black Hat SEO' started by Drago05, Jan 7, 2011.

  1. Drago05

    Drago05 Junior Member

    Joined:
    Oct 31, 2010
    Messages:
    151
    Likes Received:
    10
    Location:
    Europe
    If I use operators like "inurl:" or "intitle:", it doesn't scrape anything. I have filtered about 200 public proxies with a 2000 ms timeout, but it doesn't work at all.

    Does anyone else have the same problem?
     
  2. kn1024

    kn1024 Newbie

    Joined:
    Dec 14, 2010
    Messages:
    34
    Likes Received:
    83
    Are you using a custom footprint?
     
  3. Drago05

    Drago05 Junior Member

    Joined:
    Oct 31, 2010
    Messages:
    151
    Likes Received:
    10
    Location:
    Europe
    Yes, I use costume footprints. The strange thing is I can scrape if I don't use advanced operators, but I can't when I do. And I need to use advanced operators.
     
  4. iglow

    iglow Elite Member

    Joined:
    Feb 20, 2009
    Messages:
    2,080
    Likes Received:
    856
    Yeah, advanced operators like " " are fucked. Dunno why.
     
  5. Drago05

    Drago05 Junior Member

    Joined:
    Oct 31, 2010
    Messages:
    151
    Likes Received:
    10
    Location:
    Europe
    Does anyone know another program I can use to scrape URLs that works with advanced operators?
     
  6. Kendro65

    Kendro65 Newbie

    Joined:
    Dec 31, 2010
    Messages:
    10
    Likes Received:
    0
    What's good about it?
     
  7. faster

    faster Jr. VIP Jr. VIP Premium Member

    Joined:
    Jan 3, 2011
    Messages:
    1,749
    Likes Received:
    186
    I faced a bit of a problem too. I use costume footprints. The strange thing is I can scrape if I don't use advanced operators, but I can't when I do, and I need to use advanced operators. Does anyone have an advanced operator that still works?
     
  8. sargerevenge

    sargerevenge Newbie

    Joined:
    Nov 17, 2010
    Messages:
    33
    Likes Received:
    8
    I think I see your problem. You are using "costume footprints" - I think you should use custom footprints instead.
     
  9. Drago05

    Drago05 Junior Member

    Joined:
    Oct 31, 2010
    Messages:
    151
    Likes Received:
    10
    Location:
    Europe
    I see I am not the only one with this problem. I hope someone who knows how to work with this program can give some advice.
     
  10. Drago05

    Drago05 Junior Member

    Joined:
    Oct 31, 2010
    Messages:
    151
    Likes Received:
    10
    Location:
    Europe
    Dude, "custom footprint" is a radio button. It got nothing to do with the spell LOL
     
  11. Abh_Empire

    Abh_Empire Regular Member

    Joined:
    Sep 21, 2010
    Messages:
    222
    Likes Received:
    47
    Location:
    Romania
    I read somewhere around here yesterday that inurl kills your proxies fast.
    I can vouch for that because I used them on around 100k blogs, and at one point they started scraping fewer and fewer blogs, then they just didn't work at all.
     
  12. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    Drago, you're not alone. Special operators harvest slow as hell and burn out your proxies fast. G is far more restrictive on special operators than on regular queries. I was once doing keyword research and needed to enter around 100 allintitle: queries over the space of an hour. I thought it wouldn't be a problem. I was wrong - I was IP banned after around 50 queries. Even if you have hundreds or thousands of proxies, unfortunately they will burn out too quickly. At all costs, I avoid scraping with special operators. With any substantial list, it's practically impossible. I harvest tens or even hundreds of millions of URLs at a time - this would be impossible with special operators. A few thousand URLs would be alright, but once you're pushing past a million - assuming you even get there, as it's so damn slow - you are screwed.

    My 'solution' to this is just a simple workaround. Instead of using the inurl: operator, use the string in quotes. For example, rather than inurl:index.php? you would use "index.php?". You will often find that you get just as targeted results as with the inurl: operator, i.e. exactly the URLs you are looking for. But just to check, go into your browser, google the string in quotes and take a look at the URLs that come up. Click on a few and browse them. Are they still what you are looking for? If so, then great.

    Often, if the inurl: string is quite unique, simply putting it in quotation marks suffices, because G searches not only the page content but also the URL content. In fact, URL content is one of the main things determining search results - ask any owner of an exact match domain. Putting "sadasdfasnh43???php_content=op&At3232" instead of inurl:sadasdfasnh43???php_content=op&At3232 is hardly going to make much difference, because the string is so unique G will have to find it in the URL, not the page text itself. Try it out.
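    If you keep your footprints in a text file, you can apply the workaround in bulk by rewriting the inurl: ones as quoted strings before loading them into the harvester. Rough Python sketch - the footprints here are just examples, swap in your own:

    Code:
    # Rewrite "inurl:" footprints as plain quoted phrases, per the
    # workaround described above. Footprints are illustrative only.
    def to_quoted_query(footprint):
        prefix = "inurl:"
        if footprint.startswith(prefix):
            return '"%s"' % footprint[len(prefix):]
        return footprint

    footprints = [
        "inurl:index.php?",
        "inurl:viewtopic.php",
        '"powered by wordpress"',  # already a quoted phrase, left as-is
    ]

    for fp in footprints:
        print(to_quoted_query(fp))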
     
    • Thanks Thanks x 2
  13. Drago05

    Drago05 Junior Member

    Joined:
    Oct 31, 2010
    Messages:
    151
    Likes Received:
    10
    Location:
    Europe
    Thanks for the info. I will try quotes instead of advanced operators. But you say that even with advanced operators you can still scrape, just slowly and with proxies dying fast - whereas I can't scrape anything at all with advanced operators, zero, nada. I don't know if using private proxies would make a difference.
     
  14. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    I doubt it. The search engines are not perfect. It's likely the special operator is not being read well by the search engine. Have you tried writing that footprint in a regular browser search and seeing what comes up?

    My advice is to accept that special operators are often not viable and find ways to work around them. I have not been hindered by quotes.
     
    • Thanks Thanks x 1
    Last edited: Jan 8, 2011
  15. Drago05

    Drago05 Junior Member

    Joined:
    Oct 31, 2010
    Messages:
    151
    Likes Received:
    10
    Location:
    Europe
    These are some tests I made with advanced operators for scraping. I used different types of proxies: elite, anonymous, transparent, SOCKS - no success. But if I uncheck "use multi-threaded harvester" I can see every proxy ScrapeBox is using at the moment, and for most of them I see "error (302) IP blocked". So I assume Google blocked them. But the strange thing is that if I don't use advanced operators and scrape with the SAME proxies, Google doesn't block them and ScrapeBox can harvest URLs.

    I don't understand why the proxies are instantly blocked by Google when I use advanced operators. I thought it would take some time before they got blocked, but if they are blocked instantly, maybe I am doing something wrong. I don't know.
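    For anyone who wants to reproduce the same check outside ScrapeBox, here is a quick Python sketch - the proxies and the query are placeholders, not my real ones:

    Code:
    # Replay an operator query through each proxy and see which ones
    # Google answers with the 302 "IP blocked" redirect.
    import requests

    PROXIES = ["1.2.3.4:8080", "5.6.7.8:3128"]  # placeholder proxies
    QUERY = 'inurl:index.php? "powered by wordpress"'

    for proxy in PROXIES:
        try:
            r = requests.get(
                "http://www.google.com/search",
                params={"q": QUERY},
                proxies={"http": "http://" + proxy},
                timeout=10,
                allow_redirects=False,  # keep the 302 visible instead of following it
            )
            status = "302 (blocked)" if r.status_code == 302 else str(r.status_code)
            print(proxy + " -> " + status)
        except requests.RequestException as e:
            print(proxy + " -> error: " + str(e))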
     
  16. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    That's simple, Drago.

    Special operators take only a very small number of searches before the IP gets blocked. That's why you can never harvest properly with them. Regular search, on the other hand, takes a considerably larger amount of usage before an IP block. It doesn't matter what type of proxy or IP - more than a few special operator queries => IP ban. Like I said, you've got to work around it.
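    To put rough numbers on it (the thresholds below are made up, just to illustrate the gap), think of each proxy as having two separate budgets - a big one for regular queries and a tiny one for special operators - and rotate the proxy out once its budget is spent:

    Code:
    # Toy sketch: per-proxy query budgets, far smaller for special
    # operators than for regular searches. Thresholds are invented.
    from collections import defaultdict

    REGULAR_BUDGET = 500   # guess: regular queries before a block
    OPERATOR_BUDGET = 20   # guess: operator queries before a block

    used = defaultdict(lambda: {"regular": 0, "operator": 0})

    def pick_proxy(proxies, is_operator_query):
        kind = "operator" if is_operator_query else "regular"
        budget = OPERATOR_BUDGET if is_operator_query else REGULAR_BUDGET
        for p in proxies:
            if used[p][kind] < budget:
                used[p][kind] += 1
                return p
        return None  # every proxy has spent its budget - time for fresh ones

    print(pick_proxy(["proxy-a", "proxy-b"], is_operator_query=True))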
     
    • Thanks Thanks x 1
  17. pen---

    pen--- Registered Member

    Joined:
    Dec 25, 2010
    Messages:
    91
    Likes Received:
    6
    For the 0 results, do you use the new proxy checker?
     
  18. taxx83

    taxx83 Registered Member

    Joined:
    Sep 24, 2010
    Messages:
    55
    Likes Received:
    10
    Occupation:
    .Net Application / Web / Bot Developer
    Location:
    Dirty Jersey
    I've noticed over the last couple of weeks that I've been getting 5k harvested links after removing dupes, as opposed to over 100k. I'm using the standard WordPress option and keywords I've used in the past. I'm not a complete noob either, but I am human and could be making a stupid mistake. More curious whether anyone else is having an issue, specifically in the last 2-3 weeks.