
Starting a new ScrapeBox experiment.

Discussion in 'Black Hat SEO' started by Bostoncab, Jan 8, 2011.

  1. Bostoncab

    Bostoncab Elite Member

    Joined:
    Dec 31, 2009
    Messages:
    2,255
    Likes Received:
    514
    Occupation:
    pain in the ass cabbie
    Location:
    Boston,Ma.
    Home Page:
    OK.

    So I had an idea and I am starting a thread to document it. If it helps someone, great. If it has no value, I apologize, but it is better for me to put it in writing as I go.

    I have been playing with SB for a few weeks now and, of course, reading as much of BHW as I can. One thread I read by Bobby Love was really interesting. To sum it up, Mr. Love theorizes that it's not exactly how many links you have, but rather how many links you have from unique IP addresses, that really matters. I have seen this in my own niche: one competitor has more links and an older domain, but fewer unique-IP links, than the top spot. I have to say I agree with BLove's theory. If you want to see how many unique-IP links your competitors have, the easiest and cheapest way is to look up BHW member LEWI and buy a Majestic SEO report from him. Getting one or two is far cheaper than buying a subscription to Majestic.
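    The unique-IP metric above boils down to resolving each linking domain and counting distinct addresses. Here is a minimal sketch of that idea; the domains, URLs, and toy resolver are all made up for illustration, and in practice you would feed in the referring URLs from a backlink report and pass `socket.gethostbyname` as the resolver.

```python
# Sketch: count how many distinct IPs a set of linking URLs resolves to.
# Two blogs on the same shared host count as one IP.
from urllib.parse import urlparse

def unique_ip_count(linking_urls, resolve):
    """Count distinct IPs among the hosts in linking_urls.

    `resolve` is any callable mapping a hostname to an IP string;
    pass socket.gethostbyname for real DNS lookups.
    """
    ips = set()
    for url in linking_urls:
        host = urlparse(url).netloc.lower()
        try:
            ips.add(resolve(host))
        except OSError:
            pass  # dead domain: skip it
    return len(ips)

# Toy resolver standing in for DNS (hypothetical data).
fake_dns = {
    "blog-a.example.com": "203.0.113.10",
    "blog-b.example.com": "203.0.113.10",  # same server as blog-a
    "blog-c.example.net": "198.51.100.7",
}
links = ["http://blog-a.example.com/post1",
         "http://blog-b.example.com/post2",
         "http://blog-c.example.net/post3"]
print(unique_ip_count(links, fake_dns.__getitem__))  # 3 links, 2 unique IPs
```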


    To date I have had better luck getting links approved, and having them stick, using BlogEngine blogs. It seems that people use this software to throw up a blog that they quickly abandon and leave on auto-approve. I even often get emails while doing a blast that say something like "I appreciate the comment you left on my blog [insert name of blog here]." LOL, they don't even bother to set up their blogs all the way. Plus everyone else is firing away at WordPress, getting banned right and left by Akismet and having to buy .info domains to use HTTP redirects so they can keep firing away at WP blogs. It's like they feel warm water and don't realize it's warm because everyone pissed in the lake.

    So the problem I was coming up against was: how do I find as many unique domains as possible powered by BlogEngine? If you use 1-100 keywords and then eliminate duplicate domains, you wind up with only a handful of domains from a huge list of harvested blogs.

    Earlier today I harvested around 12 million URLs and wound up with 17,000 unique domains. Not good, and a waste of harvesting bandwidth. Plus we are looking for bang for the buck, and typical BlogEngine approval on the fast poster is what, 20% anyway?
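    The "remove duplicate domains" step ScrapeBox does here can be sketched as follows: keep only the first harvested URL seen for each domain. The sample URLs are hypothetical.

```python
# Sketch of deduplicating a harvested URL list down to one URL per domain.
from urllib.parse import urlparse

def dedupe_by_domain(urls):
    """Return urls filtered to the first occurrence of each domain."""
    seen = set()
    kept = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if domain and domain not in seen:
            seen.add(domain)
            kept.append(url)
    return kept

harvested = [
    "http://someblog.example.com/post/1.aspx",
    "http://someblog.example.com/post/2.aspx",  # same domain, dropped
    "http://otherblog.example.net/hello-world",
]
print(len(dedupe_by_domain(harvested)))  # 3 URLs in, 2 unique domains out
```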

    Again, for the purposes of this experiment I need as many unique domains running BlogEngine as possible. I felt I needed more keywords and fewer results per keyword.

    The BHW member Maruk was giving away keyword lists to anyone who asked, and if you added together all the keywords he has given out, it came to something like 800,000 keywords. One thing I thought, though, was that too many people had asked him for similar types of niches. After all, this is an IM forum of sorts; it's only normal that we would have similar interests. Plus the keywords he offered are from the ScrapeBox Wonder Wheel scraper. When I did my own experiment with that tool, the keyword "Boston" gave me around 27,000 keywords, which returned around 14,000 unique domains, if I recall correctly. Maruk is a great guy and his keyword lists are great, but they are not exactly right for our experiment.

    So I was thinking: where in hell do I get a huge list of quality, non-repetitive keywords?

    I googled at length trying to find a text file that had every word in the dictionary, line after line, with no definitions. The closest I came was a text file that, after removing duplicates, possessive 's endings, and other extraneous characters, netted me 528,030 English words. Pretty much everything from A to zoo.
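    That cleanup pass (dropping possessives, punctuation, and duplicates from a raw dictionary file) could look something like this sketch; the sample input lines are made up.

```python
# Sketch of cleaning a raw dictionary file into a keyword list:
# strip possessive 's, drop non-letter characters, lowercase, dedupe.
import re

def clean_wordlist(lines):
    """Normalize raw dictionary lines into unique lowercase keywords."""
    seen = set()
    words = []
    for line in lines:
        word = line.strip().lower()
        word = re.sub(r"'s$", "", word)      # drop trailing possessive 's
        word = re.sub(r"[^a-z-]", "", word)  # keep only letters and hyphens
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words

raw = ["Aardvark", "aardvark's", "zoo", "Zoo", "it's"]
print(clean_wordlist(raw))  # ['aardvark', 'zoo', 'it']
```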

    So currently I am harvesting using 95 connections, around 110 good public proxies, and the custom footprint " powered by blogengine.net". I am only looking to get 11 results per keyword. Usually you want as many results as possible, but I am out for unique domains, and hopefully unique-IP links, not 400 links from one site to mine.

    Theoretically I should get 11 × 528,030 ≈ 5.8 million results. I wish I had a larger list of proxies to start with, and a faster internet connection is always better. Ideally I should be on a Windows VPS for this experiment, but when I went to purchase one from xsserver tonight they were out of stock.. fuckers.

    I am harvesting 17 URLs a second, and my calculations show that at that rate I won't get my 5.8 million results until Monday evening at the earliest.
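    As a quick sanity check of that arithmetic, using the figures from the post (528,030 keywords, 11 results each, 17 URLs/second):

```python
# Back-of-envelope check of the harvest estimate in the post.
keywords = 528_030
results_per_keyword = 11
rate_per_sec = 17

total = keywords * results_per_keyword       # expected results
seconds = total / rate_per_sec               # harvest time at 17 URLs/s
days = seconds / 86_400                      # 86,400 seconds per day

print(total)            # 5,808,330 results, i.e. ~5.8 million
print(round(days, 1))   # roughly 4 days of continuous harvesting
```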

    Hopefully at least half of the results are unique domains. Let's make the math easy and hope 3 million are unique domains (hoping as many people as possible are running that bullshit ASP crap, BlogEngine).

    If 3 million are unique domains, and out of that 3 million I only get a 1% stick rate, I'll wind up with 30,000 backlinks, with anchor text.

    Now, praying that all 30,000 of those sites aren't hosted between FatCow, HostGator, and Bluehost, I would HAVE to wind up with 1,000 links on unique IPs, minimum. That many would put me about three times ahead of my number-one competitor in unique-IP links. You notice I am not setting my goals too high.

    I am also thinking of incorporating another user's idea of putting my Google Maps citations into the comments. I forget this user's name, but he claims that when G finds your exact local-business info anywhere on the net, it is likely to include that info as a review in your Google Places listing.

    I am about to hit 500,000 results, so I am still a long way from starting my comment run. I am willing to hear out any input.
     
  2. schwagoo

    schwagoo Jr. Executive VIP Jr. VIP Premium Member

    Joined:
    Nov 8, 2010
    Messages:
    807
    Likes Received:
    737
    Location:
    Midwest, USA
    Good call on running the entire dictionary. I recently searched for "most searched keywords" and pulled a txt file of the 500 most searched keywords from the last year. I was planning on harvesting that list soon.
    Certainly there will be a LOT of blogs for the top searched words, but you will find far more obscure blogs with the whole dictionary.
     
  3. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    17 URLs/s? That sucks, man. Why so slow? Try to use more proxies for larger harvests, and decrease the number of connections. Only run around 30 connections if your internet is that slow; you will find it is actually faster.
     
  4. Ardit

    Ardit Junior Member

    Joined:
    Oct 14, 2009
    Messages:
    119
    Likes Received:
    18
    Occupation:
    Marketing Expert
    Location:
    Facebook, Twitter, Google+ & Tumblr
    Good luck buddy!
     
  5. kandor

    kandor Regular Member

    Joined:
    May 26, 2008
    Messages:
    274
    Likes Received:
    97
    I decided to try this.

    So, three minutes running with the footprint supplied, and only unique hosts (domains or subdomains), and I already have over 5k domains.

    Would anyone like to try the list?
    I would want a report on how it went if anyone likes to try.

    Best,
    Kandor
     
  6. kandor

    kandor Regular Member

    Joined:
    May 26, 2008
    Messages:
    274
    Likes Received:
    97
    7k URLs now, from unique domains.

    Let me know if anyone can try them and report back what the success rate is and what percentage is auto-approve.

    I won't run the scraping any longer if I do not see value in them.

    Best,
    Kandor
     
  7. kandor

    kandor Regular Member

    Joined:
    May 26, 2008
    Messages:
    274
    Likes Received:
    97
    Here is the list for everyone.

    Please report back how it goes.

    Best,
    Kandor
     

    Attached Files:

  8. xthoms

    xthoms Regular Member

    Joined:
    Sep 14, 2010
    Messages:
    280
    Likes Received:
    99
    Wow, thanks for the share :)

    I personally have 4,400 unique BlogEngine domains; if you PM me your email, I'll send them to you, but I'm not too keen on sharing them on BHW for free. I blasted some of them, waited some time, and then blasted the rest. I think I waited about a week from posting the first ones until I checked them, and got about an 8% approval rate, though more have been accepted in the meantime.

    My site did jump from page 16 to page 7 from that blast :) and afterwards I pinged all the found results once.

    Edit: I don't think there even are 3 million BlogEngine blogs. There are about 25 million WP blogs, and WP is way more common, unfortunately.

    For proxies, I would recommend going here and seeing if he is still giving out test accounts; otherwise it's just $5:
    Code:
    http://www.blackhatworld.com/blackhat-seo/proxies/268232-need-testers-constantly-updated-proxies-site.html
     
    Last edited: Jan 8, 2011
  9. kandor

    kandor Regular Member

    Joined:
    May 26, 2008
    Messages:
    274
    Likes Received:
    97
    Proxies are no issue for me.

    I do not have ScrapeBox installed on my link-building server.

    I have ScrapeBox on my work desktop, which has a slow internet connection.
    I have never used ScrapeBox's comment function, so that's why I wanted someone else to try the list; if links stick with huge success, I will scrape more and buy another license of ScrapeBox to install on my link server.

    And the 7,000 domains were done in less than 4 minutes of scraping, so the possibility of growing the list is pretty high :)

    And you're welcome for the share.

    Best,
    Kandor
     
  10. charlie3

    charlie3 Senior Member

    Joined:
    Oct 4, 2009
    Messages:
    1,046
    Likes Received:
    468
    Location:
    U of A
    This seems like a pretty good experiment that's a little different from most on here. I also read Bobby's post, really liked the theory about unique IPs, and am actually trying it out myself. I just harvested 7.6 GB of URLs (over 140 million, so it's gonna be a while to narrow that list down to no dupes).

    But like you mentioned, do you only get a unique IP from different hosting services? So if 5 sites are on HostGator, are you only getting 1 unique IP, or what? Thanks :D
     
  11. xthoms

    xthoms Regular Member

    Joined:
    Sep 14, 2010
    Messages:
    280
    Likes Received:
    99
    I'll run it tonight, see what kind of results I get from it, and post them here tomorrow. Please post more BlogEngine blogs and I will run them too. You must scrape pretty well, then. I don't understand why I scrape fairly slowly (100-150 results/sec) and then it stops scraping after around 80 keywords. I also only get a 20-25% success rate with the fast poster when I should get around 80-90% (I bought the list), and I blast with private proxies. But when I run BE blogs through the slow poster, I get around 80% posted.
     
  12. ezines

    ezines Power Member

    Joined:
    Jan 3, 2011
    Messages:
    712
    Likes Received:
    216
    Occupation:
    Online/Offline
    Location:
    Somewhere On Earth
    Good luck on this experiment and of course I'll be following this thread and might implement as well.
     
  13. kandor

    kandor Regular Member

    Joined:
    May 26, 2008
    Messages:
    274
    Likes Received:
    97
    Hrefer with 1,000 threads.

    It's another beast compared to ScrapeBox when it comes to scraping.

    But at the moment I am scraping a large list of another kind, so I won't use resources on this if the success is not high.

    So let me know how it goes and we'll go from there.

    Best,
    Kandor
     
  14. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    Running 95 threads on a slow connection is crazy; set it lower.
     
  15. Bostoncab

    Bostoncab Elite Member

    Joined:
    Dec 31, 2009
    Messages:
    2,255
    Likes Received:
    514
    Occupation:
    pain in the ass cabbie
    Location:
    Boston,Ma.
    Home Page:
    So I just woke up and found that SB was done harvesting. It completed roughly 200,000 keywords, which resulted in almost 800,000 URLs.

    After clicking "remove duplicate domains" I am left with 11,953 unique domains. I guess I'll have to run the remaining 325,214 keywords with fewer connections. I turned the connections down to 20 and the timeout up to 30 seconds.

    I thought for sure this would get me as many BlogEngine blogs as possible.

    I know some people commented on my internet connection being slow, but you should know it is one of the fastest available in my area for a reasonable price. My ISP is Comcast and I pay $40 a month for it. I think I can get triple the speed from the same ISP, but I am pretty sure they reserve that for business customers, and that's triple the price.
     
  16. Bostoncab

    Bostoncab Elite Member

    Joined:
    Dec 31, 2009
    Messages:
    2,255
    Likes Received:
    514
    Occupation:
    pain in the ass cabbie
    Location:
    Boston,Ma.
    Home Page:
    This is not going as well as I had hoped. Every time the harvest stops I eliminate duplicate domains, and thus far I am up to 13,000 or so unique domains powered by B.E.

    My last harvest started with 109k keywords, so having gone through 400,000 keywords and only gotten 13,000 unique domains, I am disappointed, almost to the point of being tempted to abandon the project. Of course I did not, and I am charging ahead, hoping these last 100k keywords will somehow result in 5 million unique domains.
     
  17. kandor

    kandor Regular Member

    Joined:
    May 26, 2008
    Messages:
    274
    Likes Received:
    97
    Be realistic.

    5 million BlogEngine domains?

    I think you have got all of them already; there might be another 1,000 to find if you continue for weeks with new keywords.

    But that's it.

    Best,
    Kandor
     
  18. albaniax

    albaniax Elite Member

    Joined:
    Aug 5, 2008
    Messages:
    1,586
    Likes Received:
    823
    Location:
    GER - ALB
    Yeah, I think that too. I don't see a lot of BlogEngine sites.

    Otherwise, there are still a lot of footprints for other CMSes / blogs / sites where you can put your link, if you look for them. Then just harvest them with SB, sort by PR, and comment manually.

    Don't just stick with the things everyone else is doing.
     
  19. wkrappen91

    wkrappen91 Power Member

    Joined:
    Sep 9, 2010
    Messages:
    588
    Likes Received:
    720
    Location:
    127.0.0.1
    I think I have an idea how to get EVERY last BlogEngine blog there is out there.
    It would require someone to invest around $5. I'm serious. It's a very good idea. Surprised no one had it before.
    If you are interested and are not trying to steal my idea, PM me...