
Gscraper Question

Discussion in 'Black Hat SEO Tools' started by rodol, Mar 27, 2014.

  1. rodol

    rodol Regular Member

    Joined:
    Mar 10, 2010
    Messages:
    346
    Likes Received:
    67
    Location:
    Earth
    Can someone tell me why my GScraper is scraping a lot of URLs with webcache.googleusercontent.com in them, and how I can get rid of that? Thanks.
     
  2. Groen

    Groen Regular Member

    Joined:
    Nov 7, 2009
    Messages:
    397
    Likes Received:
    221
    I wouldn't know why you would get such URLs, but I guess you could use something like Notepad++ to find whatever text you wish to get rid of and replace it with nothing.
     
  3. rodol

    rodol Regular Member

    Joined:
    Mar 10, 2010
    Messages:
    346
    Likes Received:
    67
    Location:
    Earth
    Thanks, I was trying that, but the URL looks like this:
    The numbers after =cache are all different, and the keyword is also different, so it's complicated to clean out the actual scraped URL.
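    Since the hash and keyword vary per line, a plain find-and-replace won't do it, but a regex can. A minimal sketch, assuming the cache URLs follow Google's usual `webcache.googleusercontent.com/search?q=cache:<hash>:<original-url>+<keyword>` shape (the exact shape of your scraped lines is not shown in the thread):

    ```python
    import re
    from urllib.parse import unquote

    # Assumed shape of a cached URL (hash and keyword vary per line):
    #   http://webcache.googleusercontent.com/search?q=cache:AbC123:example.com/page+some+keyword
    CACHE_RE = re.compile(
        r"webcache\.googleusercontent\.com/search\?q=cache:"
        r"[^:]+:"       # the varying cache hash
        r"([^+\s]+)"    # the original URL, up to the '+keyword' part
    )

    def recover_original(url: str) -> str:
        """Return the original URL hidden inside a Google cache URL, else the URL unchanged."""
        m = CACHE_RE.search(url)
        if not m:
            return url
        original = unquote(m.group(1))
        if not original.startswith(("http://", "https://")):
            original = "http://" + original
        return original
    ```

    Run over the scraped list, this recovers the real target URL from each cache line instead of just deleting it.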
     
    Last edited: Mar 27, 2014
  4. BloodyDox

    BloodyDox Newbie

    Joined:
    Dec 5, 2015
    Messages:
    8
    Likes Received:
    1
    Has nobody got a solution for this?
     
  5. rere003

    rere003 Newbie

    Joined:
    Sep 22, 2012
    Messages:
    29
    Likes Received:
    16
    Location:
    New Java
    Use Notepad++ to get rid of the "webcache.googleusercontent.com" lines using a regex.
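    For anyone following along, a pattern like the one below (Find & Replace, Search Mode: Regular expression, Replace with nothing) deletes every line containing the domain. A quick check of the same pattern in Python, assuming one URL per line:

    ```python
    import re

    # Notepad++ equivalent:
    #   Find:    ^.*webcache\.googleusercontent\.com.*(\r?\n|$)
    #   Replace: (leave empty)
    LINE_RE = re.compile(r"^.*webcache\.googleusercontent\.com.*(?:\r?\n|$)", re.MULTILINE)

    def strip_cache_lines(text: str) -> str:
        """Delete every line that contains the Google web-cache domain."""
        return LINE_RE.sub("", text)
    ```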
     
  6. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    609
    Likes Received:
    452
    I could show you the basic method that GScraper uses to filter URLs, but it would not solve your problem on its own, so we will spoon-feed you.

    > Go to the filter tab

    [IMG]

    > Select URL include.
    > In the textbox, enter what you want to filter out (webcache); you do not need the googleusercontent.com part, but you can include it if it makes you feel good.
    > Go to the bottom of that group box and click Do.
    > Save list back to file.

    This method will filter out what you do not want.
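    For lists too big to open comfortably, the same "filter out matching URLs, save list back to file" step can be scripted. A rough equivalent of the steps above (file paths are placeholders, not GScraper's own files):

    ```python
    def filter_list(in_path: str, out_path: str, needle: str = "webcache") -> int:
        """Drop every URL containing `needle`, write the rest back, return how many were removed."""
        with open(in_path, encoding="utf-8", errors="ignore") as f:
            urls = f.read().splitlines()
        kept = [u for u in urls if needle not in u]
        with open(out_path, "w", encoding="utf-8") as f:
            f.write("\n".join(kept) + "\n")
        return len(urls) - len(kept)
    ```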
     
  7. Unknown Overlord

    Unknown Overlord Junior Member

    Joined:
    Nov 7, 2009
    Messages:
    104
    Likes Received:
    44
    What footprint are you using for the scrape? What JustUs posted will do the job, but it seems like this is coming from something in the footprint you're using.