
Bot for scraping indexed URLs

Discussion in 'Black Hat SEO' started by Johan bucco, May 16, 2016.

  1. Johan bucco

    Hi there!

    I was looking for a bot or software that scrapes all the URLs of a website that Google has indexed. I know that with Screaming Frog I can extract all the URLs of a website, but I need to know which of them are indexed. Does anyone know of a tool? Thanks
     
  2. sashablack

    Just google this: site:www.yourwebsite.com

    All the URLs you see are indexed by Google. If a URL is not indexed, you will not see it :)

    -Sasha
     
  3. ZennoBlaster

    Doesn't ScrapeBox do this?
     
  4. Johan bucco

    ScrapeBox? Just by writing site:domain.com on the left side? yourwebsite.com doesn't work, dude, have you checked it? Thanks guys!
     
  5. Johan bucco

    I have tried ScrapeBox with the command site:mydomain.com and ScrapeBox gives me 676 results, but if I write it in the Google search bar it gives me 878 results...
     
  6. loopline

    If you search on google.com, try clicking through the result pages; you will likely see the number change as you get closer to the last page. Google does this all the time: they want to show big numbers, so the first page displays a large estimate. But often, when you actually say "OK Google, give me the results", they change it, sometimes from millions down to hundreds.

    At any rate, Google sees ScrapeBox as a browser, just like Firefox or Chrome, so you get the same results in ScrapeBox.

    So you could search site:domain.com

    or crawl the whole site with ScrapeBox and then check which pages are indexed in Google. That way you would also know which ones are not indexed, so you can try to get them indexed. Just be aware that checking index status will require a LOT of proxies or a big delay.
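    Once you have both lists (say, the crawled URLs and the URLs that came back as indexed, each exported one per line), finding the pages that are not indexed is just a set difference. Below is a minimal Python sketch assuming that workflow; it is not how ScrapeBox works internally. The `normalize` rule (lowercase scheme and host, drop the #fragment, strip a trailing slash) is a hypothetical choice so that `/about` and `/about/` count as the same page, and example.com is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Hypothetical normalization so trivially different forms of a URL compare equal:
    lowercase scheme and host, drop the #fragment, strip a trailing slash from the path."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def split_by_index_status(crawled, indexed):
    """Return (indexed_pages, not_indexed_pages) for the crawled URLs, both sorted."""
    crawled_set = {normalize(u) for u in crawled}
    indexed_set = {normalize(u) for u in indexed}
    return sorted(crawled_set & indexed_set), sorted(crawled_set - indexed_set)


if __name__ == "__main__":
    crawled = ["https://example.com/", "https://example.com/about/", "https://example.com/blog"]
    indexed = ["https://EXAMPLE.com/about", "https://example.com/"]
    found, missing = split_by_index_status(crawled, indexed)
    print("indexed:", found)
    print("not indexed:", missing)  # not indexed: ['https://example.com/blog']
```

    The normalization step matters more than it looks: without it, the same page crawled as `/about/` and reported as `/about` would show up as "not indexed".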
     
  7. accelerator_dd

    You can also use the sitemap scraper in ScrapeBox and then check all those pages against Google's index.
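    ScrapeBox's sitemap scraper is a point-and-click addon; as a rough stand-in (not how ScrapeBox itself works), the page URLs can also be pulled out of a sitemap with a few lines of Python. This sketch assumes an uncompressed sitemap following the sitemaps.org 0.9 schema and follows `<sitemapindex>` files into their child sitemaps. The function names and the example.com URLs are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org 0.9 sitemaps; ElementTree expands tags to "{ns}name"
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def http_fetch(url: str) -> bytes:
    """Download a sitemap (assumes it is served uncompressed, not as .xml.gz)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def parse_sitemap(xml_data):
    """Return (is_index, locs): is_index is True for a <sitemapindex>, whose <loc>
    entries point to child sitemaps; otherwise the <loc> entries are page URLs."""
    root = ET.fromstring(xml_data)
    locs = [el.text.strip() for el in root.iter(NS + "loc") if el.text]
    return root.tag == NS + "sitemapindex", locs


def collect_page_urls(sitemap_url, fetcher=http_fetch, seen=None):
    """Recursively gather page URLs, visiting each sitemap file at most once."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:
        return []
    seen.add(sitemap_url)
    is_index, locs = parse_sitemap(fetcher(sitemap_url))
    if not is_index:
        return locs
    pages = []
    for child in locs:
        pages.extend(collect_page_urls(child, fetcher, seen))
    return pages
```

    Usage would be something like `collect_page_urls("https://example.com/sitemap.xml")` with your own sitemap URL; the resulting list can then be checked against the index like any other URL list. Passing a different `fetcher` (for example, a dict lookup over canned XML) lets you try it without touching the network.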