
[GET] Article Scraper Created By Myself

Discussion in 'Black Hat SEO Tools' started by 0_00_0, Oct 10, 2012.

  1. 0_00_0

    0_00_0 Senior Member

    Joined:
    Oct 7, 2010
    Messages:
    1,024
    Likes Received:
    486
    Location:
    Canada
    Hello everyone!

    This is a basic article scraper. I am sick of using SENuke's basic EZineArticles downloader; I would prefer the content to come from more than one directory. This is nothing fancy - it's not multi-threaded and there is no proxy support yet (although you can set a proxy in IE's settings to use one).

    The tool currently supports:
    Code:
    EZineArticles
    Article Base
    Go Articles
    Helium Articles
    Yahoo Voices - added in new update
    Instructions:
    1. Select whatever platforms you want to scrape from
    2. Enter the search query
    3. Tweak the delay, in milliseconds (1000 ms = 1 s)
    4. Select the maximum number of articles from each directory.
    5. Press start
    6. Articles will be saved in the same directory as the .exe under "query.txt".
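
The delay in step 3 is given in milliseconds, so it has to be divided by 1000 wherever a pause in seconds is expected. Below is a minimal Python sketch of that pacing idea. It is not the tool's AutoIT code, and the name `paced` and its `sleep` parameter are hypothetical, chosen only for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def paced(items: Iterable[T], delay_ms: int,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items one at a time, pausing delay_ms milliseconds between them.

    Hypothetical helper: `sleep` is injectable so the pacing can be
    checked without actually waiting.
    """
    first = True
    for item in items:
        if not first:
            sleep(delay_ms / 1000)  # 1000 ms = 1 s
        first = False
        yield item
```

Wrapping a list of article links in `paced(links, 2000)` would wait two seconds between consecutive items. A touchier directory would simply get a larger `delay_ms`.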


    Article Base is touchy and seems quick to block with a captcha. Increase the delay if you want to scrape more than a couple of articles from that directory.
    This only works on Windows and requires Internet Explorer.

    I am looking for feature suggestions - I was thinking of having it scrape images and/or videos and insert them into the articles. I may add this feature soon.
    I am also looking for more quality article platforms I should add.

    It has been a while since I shared one of my useful scripts - in the past I have seen many people re-sharing them without giving credit, and even selling them. If someone is going to share this, please at least give me credit.

    Screenshot:
    ArticleScraper.jpg

    Download (Updated Version):
    Code:
    http://www.multiupload.nl/ZXUX3QMC8N
    Virus Total:
    Code:
    https://www.virustotal.com/file/800f7314d17a67b69dc3fc534e194f3a380fa9b97632729e9fb464ec0eaf1c1d/analysis/1349898338/
    Hmm... It is showing a 1/44 detection ratio. Very weird, since it is just a basic AutoIT script - you can decompile it if you feel so inclined.
     
    • Thanks x 13
    Last edited: Oct 11, 2012
  2. SEORasta

    SEORasta Senior Member

    Joined:
    Sep 22, 2010
    Messages:
    1,001
    Likes Received:
    230
    Occupation:
    What Ever Makes Money..LEGALLY
    Location:
    Right Here!
    So it will download the full article(s) and then store them in a .txt file? Will it separate the articles so they do not run into each other? Also, I take it that it does not do any spinning (meaning taking one paragraph from one article and then the second paragraph from another)?
     
  3. Djalminha

    Djalminha Regular Member

    Joined:
    Dec 10, 2011
    Messages:
    452
    Likes Received:
    69
    What is the output? Just a .txt file with 8 articles?
     
  4. 0_00_0

    0_00_0 Senior Member

    Joined:
    Oct 7, 2010
    Messages:
    1,024
    Likes Received:
    486
    Location:
    Canada
    It will download the full articles and put them in a text file.

    The formatted text file looks like this:

    EZINE ARTICLES:
    blah blah blah...
    blah blah...
    blah blah...

    HELIUM ARTICLES:
    more blah...
    even more blahs....
    etc...
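
The layout above (an upper-case directory header, the article text, then a blank line before the next directory) could be produced along these lines. This is only a sketch of the format as shown in this post. The function name `format_articles` and the dictionary input are assumptions, and the real tool may separate individual articles differently.

```python
def format_articles(by_directory: dict[str, list[str]]) -> str:
    """Group article texts under an upper-case header per directory.

    Hypothetical helper: mirrors the sample layout in the post,
    with one blank line between directory sections.
    """
    sections = []
    for directory, articles in by_directory.items():
        header = f"{directory.upper()}:"
        sections.append("\n".join([header, *articles]))
    return "\n\n".join(sections) + "\n"
```

For example, `format_articles({"Ezine Articles": ["blah blah..."], "Helium Articles": ["more blah..."]})` yields two sections headed `EZINE ARTICLES:` and `HELIUM ARTICLES:`.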

    I didn't bother doing paragraph spinning because SENuke already takes care of that for me. This was a tool I created for myself that I figured would be useful for everyone else too :). I may be able to add that kind of functionality later on if there is enough demand!
     
  5. 0_00_0

    0_00_0 Senior Member

    Joined:
    Oct 7, 2010
    Messages:
    1,024
    Likes Received:
    486
    Location:
    Canada
    If you choose a maximum of 2 articles and select all four directories, you will get 8 articles. You can crank up the maximum number of articles and have it scrape a lot more though!
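
The arithmetic is simply the per-directory maximum multiplied by the number of selected directories. A tiny Python sketch follows. The function name is hypothetical, and treating the result as an upper bound (a directory might return fewer results for a query) is an assumption.

```python
def max_total_articles(max_per_directory: int, directories: list[str]) -> int:
    """Upper bound on articles saved: per-directory cap times directories selected."""
    return max_per_directory * len(directories)


# The four directories available in the first release.
selected = ["EZineArticles", "Article Base", "Go Articles", "Helium Articles"]
total = max_total_articles(2, selected)  # 2 * 4 = 8
```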
     
  6. moonlighsunligh

    moonlighsunligh Jr. VIP Premium Member

    Joined:
    May 1, 2010
    Messages:
    1,623
    Likes Received:
    218
    Thanks very much.

    Which language did you write it in?

    Edit: Got it, AutoIT. It is not as fast as VB.NET.
     
    Last edited: Oct 10, 2012
  7. od265

    od265 Regular Member

    Joined:
    Feb 17, 2011
    Messages:
    222
    Likes Received:
    37
    Very nice share, OP. This will help with creating wikis and web 2.0s, so thanks.
     
  8. 0_00_0

    0_00_0 Senior Member

    Joined:
    Oct 7, 2010
    Messages:
    1,024
    Likes Received:
    486
    Location:
    Canada
    Yeah, it's just a quick script I wrote up and compiled in AutoIT. Not the fastest, but I just wanted to get it done quick and dirty. I was considering writing it in C# but time is money :).

    No problem! That's the plan - I wanted to speed up building my SENuke campaigns. Let me know if you have any feature suggestions - I'm all ears!
     
    • Thanks x 1
  9. SEORasta

    SEORasta Senior Member

    Joined:
    Sep 22, 2010
    Messages:
    1,001
    Likes Received:
    230
    Occupation:
    What Ever Makes Money..LEGALLY
    Location:
    Right Here!
    OK, I get it now... Proxy support would be GREAT so we can scrape 10+ articles or so per directory. Right now, though, if I keep it at 5 per directory I should be OK under my main IP... I will try it later, but thank you!
     
  10. Bestbuyfoam

    Bestbuyfoam Jr. VIP Premium Member

    Joined:
    Nov 14, 2009
    Messages:
    1,637
    Likes Received:
    536
    This seems awesome, I'm definitely going to try it out...
     
  11. 0_00_0

    0_00_0 Senior Member

    Joined:
    Oct 7, 2010
    Messages:
    1,024
    Likes Received:
    486
    Location:
    Canada
    I can probably add this in the next couple of days. I have my hands full at the moment. I have all of the proxy rotation code written up in other projects; it's just a matter of integration.

    Let me know if you have any feature suggestions!
     
  12. SundayForever

    SundayForever Junior Member

    Joined:
    Mar 17, 2011
    Messages:
    148
    Likes Received:
    11
    This is awesome. I also use other software to collect articles, but this is the best I have ever seen. Many thanks.
     
  13. 0_00_0

    0_00_0 Senior Member

    Joined:
    Oct 7, 2010
    Messages:
    1,024
    Likes Received:
    486
    Location:
    Canada
    I've updated the software. I've added a fifth platform it can scrape - Yahoo Voices.
    I've also cleaned up the output a little bit.

    I will probably release a little update soon that incorporates proxies and whatever other requests I receive.

    Here is the new download link:
    Code:
    http://www.multiupload.nl/ZXUX3QMC8N
     
  14. SEORasta

    SEORasta Senior Member

    Joined:
    Sep 22, 2010
    Messages:
    1,001
    Likes Received:
    230
    Occupation:
    What Ever Makes Money..LEGALLY
    Location:
    Right Here!
    Damn, none of the places I go to have the actual link... got a better link to download it?
     
  15. 0_00_0

    0_00_0 Senior Member

    Joined:
    Oct 7, 2010
    Messages:
    1,024
    Likes Received:
    486
    Location:
    Canada
    Here is a mirror on Dropbox. Hopefully this helps. All the links on multiupload work fine for me.
    Code:
    https://www.dropbox.com/sh/dvg6flg4k4hptf4/kgIt6mcG9i
     
    • Thanks x 4
  16. chris4004

    chris4004 Newbie

    Joined:
    Aug 12, 2012
    Messages:
    22
    Likes Received:
    0
    No need to try it then
     
  17. jeromespitfire

    jeromespitfire Jr. VIP Premium Member

    Joined:
    Dec 8, 2008
    Messages:
    600
    Likes Received:
    452
    Location:
    403 Access Forbidden
    Thanks very much, seems to work well
     
  18. 0_00_0

    0_00_0 Senior Member

    Joined:
    Oct 7, 2010
    Messages:
    1,024
    Likes Received:
    486
    Location:
    Canada
    Your post is very confusing - why is there no need to try it? Elaborate and maybe I can improve whatever you're looking for.

    Glad you like it! Please give me whatever suggestions you think would make it better!
     
  19. djhickory

    djhickory Jr. VIP Premium Member

    Joined:
    Apr 3, 2011
    Messages:
    136
    Likes Received:
    13
    Thanks for the good tip, you just saved me lots of time.
     
  20. 0_00_0

    0_00_0 Senior Member

    Joined:
    Oct 7, 2010
    Messages:
    1,024
    Likes Received:
    486
    Location:
    Canada
    Hey Everyone,

    I just added proxy support to the script (for proxies that don't require authentication). This lets you scrape articles much faster by using a lower delay while avoiding blocks from the directories' spam controls.
    I also added a checkbox for the option to show IE or not.

    To use proxies, add your list of proxies to "tProxies.txt" in the application. Each proxy should be on a new line in the format "host":"port" - I've included a few proxies as a sample.
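
For anyone who wants to sanity-check a proxy list before loading it, here is a minimal Python sketch that reads one proxy per line. It assumes the format means a plain `host:port` pair, tolerates the quotes in case they are typed literally, and skips blank or malformed lines. The function name `parse_proxies` is hypothetical and this is not the tool's own parsing code.

```python
def parse_proxies(text: str) -> list[tuple[str, int]]:
    """Parse one proxy per line as (host, port), skipping unusable lines.

    Assumption: each line is host:port, optionally written with quotes
    around each part. Proxies that need authentication are not handled.
    """
    proxies = []
    for raw in text.splitlines():
        line = raw.strip().replace('"', "")
        if not line:
            continue  # ignore blank lines
        # Split on the last colon so the port is always the final part.
        host, sep, port = line.rpartition(":")
        if not sep or not host or not port.isdigit():
            continue  # not a host:port pair
        proxies.append((host, int(port)))
    return proxies
```

Calling `parse_proxies(open("tProxies.txt").read())` would return a list such as `[("127.0.0.1", 8080)]` for a file with that single made-up entry.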

    Screenshot of the updated GUI:
    article Scraper ss.png

    Download it here:
    Code:
    https://www.dropbox.com/s/9umv8xd7nxv6ydj/Article%20Scraper%20By%200_00_0%20Updated.zip
    Let me know if you have any problems or suggestions!
     
    • Thanks x 4
    Last edited: Oct 16, 2012