hacking 1000 articles.

feralAddict

Hi. So I'm lazy as fuck. The idea of writing 1000 articles turns my stomach. So I thought I would see how far I can get using a few basic tools and a bit of ML.
In case someone else is interested, I'm going to document my process.
So here goes...
POST #1: hacking 1000 articles - my approach
GOAL: produce 1000 articles of original content for my startup blog.
AIM: growth hack traffic with SEO and social.
TOOLS: Screaming Frog premium, a free crawler I found called Sureoak, plus 80legs. Might also use ScrapeStorm if I get stuck. And if I'm really stuck I might even have to write my own scraper script - but I'm lazy so I will only do this as a last resort.
APPROACH: Get heaps of existing articles on my topic, then summarise and spin them using AI. Then deploy in clusters.
STEP 1: Define my targets:
My startup focuses on marketing content on social media. The main opportunity I see in the content space is in funnels, and in actual practice shit that isn't just the basics.
But creating content takes time, so I'm going to scrape a heap of published content. First I need to crawl a few websites. To do so I'm using Screaming Frog and Sureoak - no way I'm paying 100 bucks to access premium tools like Semrush or Ahrefs.
So I systematically crawl the sites, filter out the crap links, combine what's left into a final list, and then scrape the whole articles & meta from that list. If SF doesn't let me do this then I will use 80legs, which should pick up most of the content.
I will post the results of how many links I get, etc when I finish.
Stay tuned.
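
For anyone following along, the "filter out the crap links and combine" step from the post above could be sketched roughly like this. It is only a sketch: the junk URL patterns, the example.com URLs, and the helper name `filter_links` are assumptions for illustration, not anything Screaming Frog or 80legs actually exports.

```python
from urllib.parse import urlsplit

# Hypothetical path fragments that usually mark non-article pages.
JUNK_PATTERNS = ("/tag/", "/category/", "/author/", "/page/", "/wp-content/")


def filter_links(urls):
    """Drop junk URLs and de-duplicate, keeping first-seen order."""
    seen = set()
    kept = []
    for url in urls:
        parts = urlsplit(url.strip())
        if parts.scheme not in ("http", "https"):
            continue  # mailto:, javascript:, etc.
        path = parts.path.lower()
        if any(pattern in path for pattern in JUNK_PATTERNS):
            continue
        # Normalise: ignore query string, fragment and trailing slash.
        key = (parts.netloc.lower(), path.rstrip("/"))
        if key in seen:
            continue
        seen.add(key)
        kept.append(f"{parts.scheme}://{key[0]}{key[1]}")
    return kept


# Two made-up crawl exports combined into one final list.
crawl_a = ["https://example.com/blog/funnel-basics/", "https://example.com/tag/seo/"]
crawl_b = ["https://example.com/blog/funnel-basics?utm_source=x", "mailto:hi@example.com"]
final_links = filter_links(crawl_a + crawl_b)
print(final_links)
```

The normalisation is deliberately crude; sites that use query strings to address different articles would need a different key.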
 
 
STEP 2: Filtering crawled links. This took a lot of effort. I hate time-consuming shit but I'm too lazy to write a script to do it. Finished filtering out the crap links, and now I combine the data to get: links, titles, h1, h2, word count. Next step is to do some basic clustering to build a hub and spoke content model. I could do this later but a bit of preplanning is kind to my future self.
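
A minimal sketch of pulling those fields (title, h1, h2, word count) out of one page with only the Python standard library. The class name `ArticleFields`, the helper `extract`, and the sample HTML are hypothetical; the word count is a rough whitespace split over all visible text, which is an assumption about what "word count" means here.

```python
from html.parser import HTMLParser


class ArticleFields(HTMLParser):
    """Collect title, h1, h2 text and a rough word count from one page."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "h1": [], "h2": []}
        self.words = 0
        self._stack = []  # open tags we are collecting text for
        self._skip = 0    # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.fields:
            self._stack.append(tag)
            self.fields[tag].append("")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        if self._stack:
            self.fields[self._stack[-1]][-1] += data
        if "title" not in self._stack:
            self.words += len(data.split())  # headings and body text


def extract(url, html):
    """Return one row: url, title, h1 list, h2 list, word_count."""
    parser = ArticleFields()
    parser.feed(html)
    parser.close()
    titles = parser.fields["title"]
    return {
        "url": url,
        "title": titles[0].strip() if titles else "",
        "h1": [h.strip() for h in parser.fields["h1"]],
        "h2": [h.strip() for h in parser.fields["h2"]],
        "word_count": parser.words,
    }


sample = (
    "<html><head><title>Funnel Guide</title><style>p{}</style></head>"
    "<body><h1>Funnel Guide</h1><h2>Top of funnel</h2>"
    "<p>Some body text here.</p></body></html>"
)
row = extract("https://example.com/funnel-guide", sample)
print(row)
```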
 
1000 ai articles ain't much. Try 100k maybe.
True. But I'm also aiming for article quality, and the topic clusters need to be managed. If I can prototype 1000 then I can extend this to 100k easily enough.
 
Wow. So after scraping some quality marketing blogs (shoutout https://www.linkedin.com/feed/hashtag/?keywords=neilpatel) I've got 13.5k articles and 32m words to process into original content within proper topic clusters... I may have bitten off more than I can chew with this...
 
hey, don't back off now! Stick with the plan :D
 
1K ARTICLE HACK UPDATE:

I managed to cull my list of crawled articles down from 10k to about 5k by removing a few that target a specific niche (away from startup and ecommerce.)

Also culled by article publication date - where the date was available, anything before 2019 was culled. Also by article size: I favoured longer articles. Finally, I killed a lot of articles dealing with SEM, display and algo updates, seasonality, keywords and technical SEO, instead favouring articles aligned with funnel design rather than optimisation.
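
The culling rules above could be written down as a small filter. This is a sketch under assumptions: the 1500-word minimum, the exact topic phrases, and the sample rows are made up for illustration (the post only says longer articles were favoured), and topics are matched against the title only.

```python
import re
from datetime import date

CUTOFF = date(2019, 1, 1)  # anything published before 2019 is culled
MIN_WORDS = 1500           # assumed threshold for "longer articles"
# Assumed phrases for the topics that were killed.
EXCLUDED_TOPICS = ("sem", "display", "algo update", "algorithm update",
                   "seasonality", "keyword", "keywords", "technical seo")
TOPIC_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in EXCLUDED_TOPICS) + r")\b"
)


def keep(article):
    """Apply the date, size and topic rules to one article row."""
    published = article.get("published")
    # The date rule only applies where a date is available.
    if published is not None and published < CUTOFF:
        return False
    if article["word_count"] < MIN_WORDS:
        return False
    return TOPIC_RE.search(article["title"].lower()) is None


articles = [
    {"title": "Building a Sales Funnel That Converts", "word_count": 3200, "published": date(2020, 5, 1)},
    {"title": "Technical SEO Checklist", "word_count": 4000, "published": date(2021, 1, 1)},
    {"title": "Funnel Design 101", "word_count": 2500, "published": date(2017, 3, 1)},
    {"title": "Email Funnel Teardown", "word_count": 2100, "published": None},
    {"title": "Quick Funnel Tip", "word_count": 300, "published": date(2021, 2, 2)},
]
kept_titles = [a["title"] for a in articles if keep(a)]
print(kept_titles)
```

Word-boundary matching is used so that a short term like "sem" does not accidentally match inside longer words.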

With the remaining 5k articles, I scraped the body html including content into flat files.

Next step is to convert the html into docs and cluster similar topics. Following this, I will process each article using #nlp into original content. I plan to do a pilot comparison of simplemarketing.ai with conversion.ai and an open-source model I will develop based on a huggingface framework. Neither of the commercial tools is going to be economical for my project, but they will give me a solid benchmark from which to fine tune my huggingface model. It will also be interesting to compare the two commercial solutions against each other.
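
As a rough illustration of "cluster similar topics", here is a deliberately simple stand-in: greedy clustering of article titles by word overlap (Jaccard similarity), using only the standard library. This is not a huggingface model and not necessarily what the post ends up using; the stopword list, the 0.25 threshold, and the sample titles are all assumptions.

```python
import re

# Small assumed stopword list; a real one would be longer.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "in", "and", "your", "how", "with"}


def tokens(title):
    """Lowercased set of words in a title, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Overlap of two word sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_titles(titles, threshold=0.25):
    """Put each title in the best-matching cluster, or start a new one."""
    clusters = []  # each entry: {"vocab": set of words, "titles": list}
    for title in titles:
        toks = tokens(title)
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(toks, cluster["vocab"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"vocab": set(toks), "titles": [title]})
        else:
            best["vocab"] |= toks
            best["titles"].append(title)
    return [c["titles"] for c in clusters]


titles = [
    "How to Build a Sales Funnel",
    "Sales Funnel Stages Explained",
    "Email Onboarding Sequence Ideas",
    "Onboarding Email Templates",
]
groups = cluster_titles(titles)
print(groups)
```

The result depends on input order and on the threshold, so treat it as a first pass to review by hand rather than a finished topic model.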

I estimate the project will largely be finished by the end of next week.

That equates to about 5m words of quality content across 1k articles produced in 14 days. Estimated cost if I were to outsource this to freelancers: $50k - $100k. Based on average ROAI metrics, this would be worth anywhere between $1m - $5m.

If I were to charge a reasonable consultancy fee for this work of between $15k and $30k, the client would be getting roughly $33 to $333 of value per dollar spent. A phenomenal ROI for any business!
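
A quick sanity check on the arithmetic, using only the figures quoted in this post. Reading "per dollar spent" as value divided by the consultancy fee is an assumption, and the variable names are just for illustration.

```python
# Figures quoted in the post.
value_low, value_high = 1_000_000, 5_000_000  # estimated worth, $
fee_low, fee_high = 15_000, 30_000            # consultancy fee, $
total_words, total_articles = 5_000_000, 1_000

# Average article length implied by 5m words across 1k articles.
words_per_article = total_words // total_articles

# Value per dollar of fee: worst case is low value over high fee.
worst_case = value_low / fee_high
best_case = value_high / fee_low

print(f"{words_per_article} words per article")
print(f"${worst_case:.0f} to ${best_case:.0f} of value per $1 of fee")
```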

#AI FTW!
 
@feralAddict my .02: 1000 random articles may or may not do the trick. Here's the thing: you can do better with fewer articles as long as you focus on keyword research. First find low-comp keywords (you will need to do this manually; check out the forum for guides, there are many on here). Once you have viable keywords, then auto-generate content. This is going to yield much better results than going after random articles without proper keyword research.
 
love this
 
Nah I disagree, for a few reasons:
1- 80/20 rule. I'm too lazy to optimise as long as I get 80% of the result for 20% of the effort.
2- Focus on customers instead of keywords.
3- Google is pretty clever
4- I'm hacking articles which are already optimised.

Besides, the desired outcome is to produce an algo that makes SEO redundant.
 
I love it. Good luck. Following with interest as I was thinking about doing this myself.
 
so you're basically just taking existing articles and using AI spinning then re-posting? What else are you doing on top of that? You'll be trying to drive traffic from social media first?

Not totally sure I understand, but it sounds like essentially you're just taking existing articles, not doing any keyword research and just AI spinning and pumping them out.
 
Essentially. However, the challenge will be to spin articles which Google will value, with no duplicated content, and at sufficient scale to make a decent dent. The second problem is to internally link these articles into a decent hub and spoke content model. As I mentioned, I'm lazy. I don't want to do this manually. The third challenge is to create a decent funnel out of all this - I think KW research is dumb. Focus should be on understanding the customer journey and aligning content accordingly, not jerking around with random keywords trying to rank for obscure phrases.
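
The internal linking part of a hub and spoke model can be generated rather than done by hand. A minimal sketch, assuming each cluster already has one article picked as the hub (how to pick it is left open) and that slugs like `sales-funnel-guide` are made-up examples: the hub links to every spoke, and each spoke links back to the hub plus the next spoke in the cluster.

```python
def hub_and_spoke(clusters):
    """Map each article slug to the slugs it should link to.

    `clusters` maps a hub slug to the list of its spoke slugs.
    """
    links = {}
    for hub, spokes in clusters.items():
        links[hub] = list(spokes)  # hub links out to every spoke
        for i, spoke in enumerate(spokes):
            targets = [hub]        # every spoke links back to its hub
            neighbour = spokes[(i + 1) % len(spokes)]
            if neighbour != spoke:  # skip self-link in one-spoke clusters
                targets.append(neighbour)
            links[spoke] = targets
    return links


# Hypothetical cluster: one hub article and three spokes.
clusters = {
    "sales-funnel-guide": ["top-of-funnel", "middle-of-funnel", "bottom-of-funnel"],
}
link_map = hub_and_spoke(clusters)
for source, targets in link_map.items():
    print(source, "->", targets)
```

Linking each spoke to one neighbour is just one possible choice; linking every spoke to every other spoke is an easy variation for small clusters.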
 
UPDATE on 1000 article hack.
OK so I jumped on huggingface last night and found some strong models to paraphrase articles. This means I can pump out decent content.
Screaming Frog was awesome for scraping 10k articles quickly. I think I cocked up a bit though cos I pulled some articles from some pretty crap blogs. But I think I have a decent core set of content to spin.
I converted the html I pulled from SF into Word docs and uploaded them to GDrive. Some got corrupted, but after culling weak/shit/short articles I'm still sitting at about 4.5k, so even a 20% corruption rate isn't worth losing sleep over.
Now, I could pull another model from huggingface and cluster the topics. This would be the smartest way and I may have to do it when I come to productise this algo. But for now I'm clustering manually, as there are still a lot of topics I don't want, dealing with technical SEO, KW research, PPC - personally I think this shit is a waste of time and money. Besides, I can move pretty quick since it's simply assessing the title. Still, this isn't in the spirit of the project so I will definitely revisit this part of the pipeline.
For now my plan is simply to focus on the articles from the quality blogs - backlinko, NP, AHREFs, etc - and leave the weaker blogs until later, subject to how much time I have available. Then I will skim through the remaining ones.
To expedite the process I roped in a cheap VA to help me out.
As long as I can get the total project done in under 3 weeks and produce over 1k articles with less than $1k cash investment, I count it as a successful outcome - the expected ROI will be huge.
 