Scrapebox - hit a roadblock.... Javascript links don't get harvested right?

Discussion in 'Black Hat SEO Tools' started by listerdl, Jul 18, 2012.

  1. listerdl

    listerdl Junior Member

    Joined:
    Jun 11, 2012
    Messages:
    170
    Likes Received:
    8
    This is really annoying but one of the sites I am trying to rip seems to have a way to hide their links -

    Code:
    <a name="" class="contentlink" href="http://link?begeoid=33367#" onclick='s_objectID="http://website.com/WWChannels/LOCATR/partnerDetail.do?begeoid=33367#_1";return this.s_oc?this.s_oc(e):true'>
    
    [COLOR=#ff0000]THIS IS WHAT I AM TRYING TO HARVEST ------> www.then-a-website-here.com/[/COLOR]
    
    </a>
    
    The red-font link IS in the code, so I'm just wondering why the Scrapebox Link Extractor is not finding these links...

    Am I missing something here? Thanks
     
  2. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,147
    Because it's not a standard link format, but a simple home-brew "protection" from scraping.
     
  3. listerdl

    listerdl Junior Member

    Joined:
    Jun 11, 2012
    Messages:
    170
    Likes Received:
    8
    Any workaround here?
     
    Last edited: Jul 18, 2012
  4. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,374
    Likes Received:
    1,799
    Gender:
    Male
    Home Page:
    It's because it's JavaScript. The link extractor uses sockets, and sockets don't support JavaScript.

    When a browser loads that page, it parses the JavaScript. Scrapebox loads it and sees only the HTML. It can see the script, but the socket can't parse the script, so it can't actually "see" the resulting HTML. You could have a link extractor built that is single-threaded, uses the IE rendering engine, and leaves JavaScript and image loading on. It would load each page, parse it, click the link, and grab the result, then go back and keep going down the page, because it looks like the actual link is never displayed until the user clicks on it.

    As Jazzc said, it's homebrew scraping protection.

    There's no workaround in Scrapebox; it can't be done. If it's a small job, hit oDesk or Fiverr and pay someone to go through, click each link, and copy down the resulting URL. If it's a large enough job, find a programmer and have your own tool built. Those are the workarounds.
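
To illustrate the point about a non-rendering fetch: the raw HTML still carries the onclick attribute as plain text, so the value the script assigns can be read with pattern matching instead of being executed. This is only a minimal Python sketch, assuming the markup looks like the snippet in the first post. `extract_object_ids` is a hypothetical helper name, and whether `s_objectID` holds the URL you actually want depends on the site.

```python
import re

# Matches s_objectID="..." as it appears inside the onclick attribute text.
# Assumption: the value is wrapped in literal double quotes, as in the
# snippet from the first post (not HTML-escaped as &quot;).
S_OBJECT_ID = re.compile(r's_objectID\s*=\s*"([^"]+)"')


def extract_object_ids(html: str) -> list:
    """Return every s_objectID value found in the raw HTML text."""
    return S_OBJECT_ID.findall(html)


# Sample markup based on the snippet posted at the top of the thread.
sample = (
    '<a name="" class="contentlink" href="http://link?begeoid=33367#" '
    "onclick='s_objectID=\"http://website.com/WWChannels/LOCATR/"
    "partnerDetail.do?begeoid=33367#_1\";"
    "return this.s_oc?this.s_oc(e):true'>"
)

print(extract_object_ids(sample))
```

If the real target URL only appears after the click handler runs, this will not find it, and a rendering engine is still needed as described above.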
     
  5. staypositive

    staypositive Junior Member

    Joined:
    Jul 28, 2015
    Messages:
    136
    Likes Received:
    4
    Hi loopline,

    In 2016, does SB have a method to scrape it?
     
  6. Scritty

    Scritty Elite Member Premium Member

    Joined:
    May 1, 2010
    Messages:
    2,807
    Likes Received:
    4,496
    Occupation:
    Affiliate Marketer
    Location:
    UK
    Home Page:
    Getting around that, if you have a list of pages with the JavaScript link on them, is a couple of hours in Python. I'd go to Elancer or whoever and spend no more than $50 on getting a solution coded. If you know even a little coding yourself you could probably sort it in half a day. I suppose it depends on the scale of the problem. I've only seen it a few times over the years, but maybe it's increasing in popularity.

    Where there's a will there is a way!
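
As a rough idea of what such a Python script might look like, here is a sketch using only the standard library. It assumes the wanted URL is the visible text inside each `<a class="contentlink">` element, as the first post says it is in the code. The names `ContentLinkParser` and `harvest` are made up for illustration, and this is not a tested solution for the actual site.

```python
from html.parser import HTMLParser


class ContentLinkParser(HTMLParser):
    """Collects (href, visible text) for every <a class="contentlink">."""

    def __init__(self):
        super().__init__()
        self.links = []        # harvested (href, text) pairs
        self._in_link = False  # True while inside a matching <a>
        self._href = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "contentlink" in classes:
            self._in_link = True
            self._href = attrs.get("href")
            self._buf = []

    def handle_data(self, data):
        if self._in_link:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Collapse surrounding whitespace and newlines in the anchor text.
            text = " ".join("".join(self._buf).split())
            self.links.append((self._href, text))
            self._in_link = False


def harvest(html: str) -> list:
    """Parse one page of HTML and return its contentlink pairs."""
    parser = ContentLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Sample markup modelled on the snippet in the first post (onclick shortened).
sample = (
    '<a class="contentlink" href="http://link?begeoid=33367#" '
    "onclick='s_objectID=\"x\";return true'>\n"
    "www.then-a-website-here.com/\n"
    "</a>"
)

print(harvest(sample))
```

For a list of pages, each page's HTML could be fetched with `urllib.request.urlopen` and passed to `harvest` in a loop. If the text only shows up after a click, this approach won't see it.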
     
  7. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,374
    Likes Received:
    1,799
    Gender:
    Male
    Home Page:
    To answer the above question: Scrapebox does not support JavaScript, so no.