
Scrapebox tip needed

Discussion in 'Black Hat SEO Tools' started by BK1981, Nov 23, 2015.

  1. BK1981 (Newbie; joined Oct 12, 2014)

    Hi, guys,

    I am scraping a directory site. I want to pull every external link that goes to the listed company's own site. That link sits in a green button:

    <a href="TARGET URL HERE" target="_blank" rel="nofollow" class="button-link btn btn-green-sl btn-sm sl-ext">Visit Website<span class="sl-ext"><span class="element-invisible"> (link is external)</span></span></a>

    But each profile page has other links, and the link extractor pulls those as well. How do I get only the links in (in this case) the green button?
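Outside Scrapebox, the same filtering can be sketched in a few lines of Python: parse each profile page and keep only the `href` of `<a>` tags whose class list contains `btn-green-sl`, the class on the green button quoted in the post. This is a minimal sketch using the standard-library `html.parser`, not a Scrapebox feature; it assumes the button class is identical on every profile page, and the sample HTML and URLs are made up.

```python
from html.parser import HTMLParser


class GreenButtonLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class list includes a given class."""

    def __init__(self, wanted_class="btn-green-sl"):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attrs is a list of (name, value) pairs
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.wanted_class in classes and href:
            self.links.append(href)


def extract_button_links(html, wanted_class="btn-green-sl"):
    """Return only the hrefs of anchors carrying `wanted_class`."""
    parser = GreenButtonLinkParser(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical profile page; all URLs are placeholders.
sample = (
    '<a href="https://www.linkedin.com/in/someone">LinkedIn</a>\n'
    '<a href="https://client.example.org/">Client</a>\n'
    '<a href="https://company.example.com/" target="_blank" rel="nofollow" '
    'class="button-link btn btn-green-sl btn-sm sl-ext">Visit Website'
    '<span class="sl-ext"><span class="element-invisible"> (link is external)</span></span></a>\n'
)

print(extract_button_links(sample))
```

Matching on one class token rather than the whole `class` string keeps the filter working if the site reorders or adds classes to the button.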
     
  2. NamenloserHeld (Banned; joined Nov 23, 2015)

    So you have a list of all the external links from the user profiles (including the links behind the company-website button), and now you want to exclude the directory site's own links so that only the external links from the buttons remain? Is that right?

    Edit: You could try removing them from the list with "Manage List" > "Remove/Filter" > "Remove urls containing..."
     
    Last edited: Nov 23, 2015
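The idea behind a "remove URLs containing..." filter can be sketched in Python as a plain substring test. This is an illustration, not Scrapebox's own code; the directory domain and URLs below are hypothetical, and the case-insensitive match is a choice made for this sketch. A filter like this only helps when the unwanted links share a recognizable fragment, such as the directory's own domain.

```python
def remove_urls_containing(urls, unwanted_fragments):
    """Keep only URLs that contain none of the unwanted fragments (case-insensitive)."""
    lowered = [fragment.lower() for fragment in unwanted_fragments]
    return [url for url in urls if not any(f in url.lower() for f in lowered)]


# Hypothetical harvested list: directory pages, a LinkedIn profile, a company site.
harvested = [
    "https://directory.example.net/profile/acme",
    "https://www.linkedin.com/in/acme-ceo",
    "https://acme.example.com/",
]

print(remove_urls_containing(harvested, ["directory.example.net", "linkedin.com"]))
```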
  3. BK1981 (Newbie; joined Oct 12, 2014)

    No. I have a list of all the user profile URLs, and I want to get only the links that go to their external websites, without any of the other links on their profiles. Each profile also links to client sites, LinkedIn profiles and who knows what else. The links I want are in a button on each user profile.

     
  4. flux252 (Newbie; joined Nov 14, 2013)

    Why not try the Custom Data Grabber in Scrapebox? It's not 100% reliable, but it might get what you need. It only works well, though, if the code you're trying to extract follows a fixed pattern across the site. There's a guide on YouTube that shows how to set up the modules.
     
  5. BK1981 (Newbie; joined Oct 12, 2014)

    I was thinking about that. Does it analyze the page source code, or just what is displayed? Can you give an example of a custom query setup that only grabs data from page elements matching a certain pattern?

     
  6. flux252 (Newbie; joined Nov 14, 2013)

    There's a before/after field or a regex field you can fill in. It scans the page source code. An example could be:

    Code:
    before_after=<a href="|" target="_blank" rel="nofollow" class="button-link btn btn-green-sl btn-sm sl-ext">
    Take note: the above might not work well, because the "before" string is quite generic and the grabber might just pull data starting from the first "a href" it sees. I don't have the page source, so I can't say for sure. Try adding more text to the "before" part (e.g. <div class="example"><a href=") if you can find a fixed pattern that appears on every profile page.
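A rough Python model of how a before/after pair might behave, assuming it takes the first occurrence of the "before" string and then the next occurrence of the "after" string. That assumption comes from the description in the post, not from Scrapebox documentation. The hypothetical sample page shows why a generic "before" captures junk and how a more specific one fixes it; the last lines show a comparable Python `re` pattern, whose syntax may differ from what the grabber's regex field accepts.

```python
import re


def grab_between(source, before, after):
    """Return the text between the first `before` and the next `after`, or None.

    Assumption: this mimics a before/after pair by taking the FIRST match of
    `before`; it is not Scrapebox's actual implementation.
    """
    start = source.find(before)
    if start == -1:
        return None
    start += len(before)
    end = source.find(after, start)
    if end == -1:
        return None
    return source[start:end]


# Hypothetical profile page: a LinkedIn link appears before the green button.
page = (
    '<div class="profile-links"><a href="https://www.linkedin.com/in/acme-ceo">LinkedIn</a></div>\n'
    '<div class="example"><a href="https://acme.example.com/" target="_blank" rel="nofollow" '
    'class="button-link btn btn-green-sl btn-sm sl-ext">Visit Website</a></div>\n'
)
after = '" target="_blank" rel="nofollow" class="button-link btn btn-green-sl btn-sm sl-ext">'

# Generic "before": starts at the LinkedIn link and runs on until the button's "after".
generic = grab_between(page, '<a href="', after)
# More specific "before", using the hypothetical <div class="example"> wrapper.
specific = grab_between(page, '<div class="example"><a href="', after)

# Python regex alternative: capture the href only when the same tag carries btn-green-sl.
button_re = re.compile(r'<a href="([^"]+)"[^>]*\bbtn-green-sl\b')
regex_hits = button_re.findall(page)

print(repr(generic))
print(specific)
print(regex_hits)
```

Because `[^"]+` cannot cross a closing quote and `[^>]*` cannot leave the tag, the regex only matches when the href and the `btn-green-sl` class sit in the same `<a>` tag.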
     