Help With A Bot That Extracts Forum Profile URLs

Discussion in 'Black Hat SEO' started by Thesiege84, Feb 13, 2013.

  1. Thesiege84

    Thesiege84 Regular Member

    Joined:
    May 5, 2009
    Messages:
    360
    Likes Received:
    97
    Occupation:
    Money Churner
    Location:
    Churning Money
    Hey Guys,

    I've been trying to do something for ages now and I can't find the right software to achieve what I want.

    Basically, I want to make a list of all the public member profiles of a particular forum...

    I have tried ScrapeBox with Google, using site: and inurl:forumname/member.php, but Google only gives you 1,000 results and my first target has 80k profiles :(

    Next I remembered I had used an email spider, and I found one that does the same for URLs called "Web Data Extractor 8.1", but it seems to mess up: it extracts random data and ignores what I tell it, e.g. to only extract a URL if it contains "member.php?"

    I have a list of URLs (page 1, page 2, page 3, etc.) that a bot needs to extract from, but I just can't find a good bot to do this.

    I've even tried the ScrapeBox addon for link extraction, but it doesn't seem to see the URLs on the page. It only grabs 6 when there are at least 20 per page, and it doesn't even grab the URLs I want...

    What software would you suggest to achieve this?
     
  2. Narrator

    Narrator Power Member

    Joined:
    Oct 5, 2010
    Messages:
    507
    Likes Received:
    396
    Occupation:
    Internet Marketing
    Location:
    /dev/null
    If you already have the list of URLs to scrape from, you can do it pretty easily with PHP.
    If the pages have links to profiles in this format:
    Code:
    <a href="member.php?u=16923">SomeUsername</a>
    
    Something like
    PHP:
    $html = file_get_contents('url to be scraped'); // or you could use curl, which would be better
    preg_match_all("/member\.php\?u=(\d+)/", $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $val) {
        $userURL = $val[0]; // full match, e.g. member.php?u=16923 ($val[1] is just the ID)
    }
    with $html being the code grabbed from the page. It's kind of late for me to write a full example for a scraper but I hope this helps.
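
    For anyone who would rather not use PHP, here is a minimal sketch of the same idea in Python. It assumes the listing pages link to profiles with relative hrefs like the member.php?u=16923 example above, so each match can be resolved against the page's own URL. The names extract_profile_urls and scrape_pages and the forum.example.com address are made up for illustration, and this is not a full scraper (no delays, retries, or login handling).

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Same pattern as the PHP snippet: "member.php?u=" followed by digits.
PROFILE_RE = re.compile(r"member\.php\?u=\d+")


def extract_profile_urls(html, base_url):
    """Return absolute profile URLs found in html, first-seen order, no duplicates."""
    seen = set()
    urls = []
    for match in PROFILE_RE.findall(html):
        # Assumption: the link is relative to the directory of the listing page.
        absolute = urljoin(base_url, match)
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


def scrape_pages(page_urls):
    """Fetch every listing page and collect the profile URLs across all of them."""
    seen = set()
    collected = []
    for page_url in page_urls:
        with urlopen(page_url) as response:
            html = response.read().decode("utf-8", errors="replace")
        for url in extract_profile_urls(html, page_url):
            if url not in seen:
                seen.add(url)
                collected.append(url)
    return collected


if __name__ == "__main__":
    sample = '<a href="member.php?u=16923">SomeUsername</a>'
    # Hypothetical listing page address, used only to resolve the relative link.
    print(extract_profile_urls(sample, "http://forum.example.com/memberlist.php?page=1"))
```

    In practice you would pass your list of page 1, page 2, page 3 URLs to scrape_pages and write the result to a text file, one URL per line.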

    Cheers!
     
  3. ShabbySquire

    ShabbySquire Power Member

    Joined:
    Nov 30, 2011
    Messages:
    574
    Likes Received:
    122
    Location:
    UK
    This could be done easily with Zennoposter (I create bots for scraping). Just use regex to grab the usernames and you're away.
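
    The regex approach is the same whatever tool runs it. As a rough sketch in Python (not a Zennoposter project), assuming profile links look like the <a href="member.php?u=16923">SomeUsername</a> example earlier in the thread, something like this pulls out the user ID and username together. The function name extract_usernames is made up for illustration.

```python
import re
from html import unescape

# Matches an <a> tag whose href contains member.php?u=<digits> and captures
# the numeric ID plus whatever sits between the opening and closing tag.
ANCHOR_RE = re.compile(
    r'<a\s[^>]*href="[^"]*member\.php\?u=(\d+)[^"]*"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_usernames(html):
    """Return a list of (user_id, username) pairs found in html."""
    pairs = []
    for user_id, label in ANCHOR_RE.findall(html):
        # Drop nested tags such as <b>...</b>, then decode entities like &amp;
        name = unescape(re.sub(r"<[^>]+>", "", label)).strip()
        if name:
            pairs.append((int(user_id), name))
    return pairs


if __name__ == "__main__":
    print(extract_usernames('<a href="member.php?u=16923">SomeUsername</a>'))
```

    A regex like this is fine for one consistent page template, but it will miss links that use single quotes or a different attribute layout, so check it against a real page first.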
     
  4. elviswong

    elviswong BANNED Premium Member

    Joined:
    Nov 8, 2011
    Messages:
    918
    Likes Received:
    240
    Mate, use Ubot for this! Now where's my reflink... Hmmm

    Seth, if he buys, can I get credited for the sale please?



    lol
     
  5. Thesiege84

    Thesiege84 Regular Member

    Hey everyone, thanks for the tips.

    I didn't see the replies until it was too late.

    I'm ashamed to say my PHP knowledge is terrible and I rely on coders to help me. God knows how I got this far without it.

    For the scraping I ended up using a nice, simple bit of software that cost me $20. I would share the files and license etc., but it's licensed for 1 computer, so I'm afraid I cannot.

    Here is the link to the software if you're interested, or for any leechers on the prowl!

    http://www.sobolsoft.com/extractlink/

    Thanks again everyone for the tips. I'll test Zennoposter out when I have something a bit more complex to achieve!
     