[Journey] 1 million UVs/month in 12 months using AI generated content. Let's do it!

Status
Not open for further replies.
Great read @Sartre.
I wanted to pick your brain on a couple of details.

1. How did you acquire this database and where could I get one?

2. Can you elaborate on your process for analyzing topical relevancy to the keyword / snippet? Using spaCy or something similar, I take it?
3. If so, would you be able to share a code snippet of just this? :)
4. I hope this makes sense, but how would the score from your app be represented? For a keyword with multiple UGC (user-generated content) results, would you decrease the difficulty score for each UGC result on the front page?

5. I see, so you don't use AI generated content on them?
Thanks.
1. I've been scraping Google 24/7 for a very long time using a f*** ton of 64-core servers.
2. I covered this in detail somewhere in this thread.
3. My app is over 10k lines of code. Unfortunately can't just share a snippet, as it depends on my custom classes and functions.
4. Yeah, exactly. I decrease difficulty for each UGC result, and the higher a particular link's position, the more relative power it has. I took the calculations from CTR in Google. It's mostly a function of inverse Φ (1/Φ ≈ 0.618), so roughly a 38% reduction for every position, if I remember correctly.
5. I use only AI generated content from now on. I also revamped almost everything when it comes to article generation after the last update. I will write an update on this soon.
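A minimal sketch of the position weighting described in point 4, for anyone who wants to experiment with the idea. This is not Sartre's code (none was shared). The function names, the base difficulty score, and the normalization by total front-page weight are all assumptions made for illustration; only the inverse-Φ decay per position comes from the post.

```python
from typing import Iterable

PHI = (1 + 5 ** 0.5) / 2   # golden ratio, about 1.618
DECAY = 1 / PHI            # about 0.618, i.e. roughly a 38% drop per position


def position_weight(position: int) -> float:
    """Relative weight of a 1-based SERP position; position 1 has weight 1.0."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return DECAY ** (position - 1)


def adjusted_difficulty(base_difficulty: float,
                        ugc_positions: Iterable[int],
                        serp_size: int = 10) -> float:
    """Lower a (hypothetical) base difficulty by the share of front-page
    weight held by UGC results. Positions outside 1..serp_size are ignored."""
    total = sum(position_weight(p) for p in range(1, serp_size + 1))
    ugc = sum(position_weight(p)
              for p in set(ugc_positions)
              if 1 <= p <= serp_size)
    return base_difficulty * (1 - ugc / total)
```

With this weighting, a UGC result at position 1 lowers the score more than one at position 5, and a front page with no UGC results leaves the base score unchanged.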
No, to find relevant sentences/paragraphs in Google's top articles.

By the way, if it's not too much trouble, could you tell me what coefficient you use to remove duplicates?
Well, it's much more complex than just that at this point, so I really can't help you, especially after the changes since the last Google update.
 
Love the journey, thanks for sharing so much. Is the PAA site still working after the update, or is it dead now?
Also, another question: is swiftgrowth your site?
 
Could you send recent Search Console analytics?
 
Love the journey, thanks for sharing so much. Is the PAA site still working after the update, or is it dead now?
Also, another question: is swiftgrowth your site?
Yes, it's my site.
Could you send recent Search Console analytics?
It's not dead, but I have changed the model entirely. No more keyword stuffing, no more stock images, no more random YouTube videos. Sites that had authority before haven't been penalized, but I'm afraid they might be in the future, so I'm regenerating the articles to be much more natural. I also recommend everyone do this:
  • Generate 1000 excellent articles using AI. Target 1 keyword per article. Use natural H2s; no more stuffing H2s with PAAs.
  • Don't ping GSC.
  • After 1-2 months, see which articles are getting the most impressions (top 5%), improve them using human editors, build backlinks, and put them on social media.
  • AI-assisted sites are the name of the game now. Less risk; still super ROI.
My sites created for the purpose of this journey: https://datastudio.google.com/u/0/reporting/7db9eb94-334e-4ba1-9847-d0155f291a83/page/LuRfC
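The "top 5% by impressions" step in the list above can be sketched roughly like this. The function name and the sample data are hypothetical; in practice the impression counts would come from a Search Console export, which is not shown here.

```python
import math


def top_by_impressions(impressions: dict, fraction: float = 0.05) -> list:
    """Return the URLs in the top `fraction` of pages by impressions,
    highest first. Always returns at least one URL for non-empty input."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not impressions:
        return []
    ranked = sorted(impressions, key=impressions.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical sample: 100 posts whose impressions equal their post number.
sample = {f"/post-{i}": i for i in range(1, 101)}
candidates = top_by_impressions(sample)
```

The URLs in `candidates` are the ones that would go to human editors and get backlinks under the approach described above.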
 
Regarding this:

4. Yeah, exactly. I decrease difficulty for each UGC result, and the higher a particular link's position, the more relative power it has. I took the calculations from CTR in Google. It's mostly a function of inverse Φ (1/Φ ≈ 0.618), so roughly a 38% reduction for every position, if I remember correctly.

How do you know whether a site is UGC or not? I mean, is there a Python module for it, or do you have a huge list of all sites considered UGC (for example, all social networks, all stores with customer reviews, blogs, forums, etc.)? Or do you even look for a certain class when inspecting the page?
A hint would be appreciated! Thanks!
 
1. I'm scraping Google using a f*** ton of 64-core servers for a very long time 24/7
3. My app is over 10k lines of code. Unfortunately can't just share a snippet, as it depends on my custom classes and functions.
That's what skills are for.
If you have some skill, that's what putting it into practice means.
10k lines of code, run non-stop on 64-core machines. And it never dies... Sometimes even the owner would have problems stopping it. :D
 
How do you know whether a site is UGC or not? I mean, is there a Python module for it, or do you have a huge list of all sites considered UGC (for example, all social networks, all stores with customer reviews, blogs, forums, etc.)? Or do you even look for a certain class when inspecting the page?
A hint would be appreciated! Thanks!
I've gathered this list, and I also count any domain that contains 'board', 'forum', or 'wiki':
Code:
            "aliexpress.com",
            "amazon.com",
            "answers.com",
            "archive.org",
            "blogger.com",
            "blogspot.com",
            "ebay.com",
            "etsy.com",
            "facebook.com",
            "fiverr.com",
            "github.com",
            "imgur.com",
            "instagram.com",
            "linkedin.com",
            "livejournal.com",
            "medium.com",
            "microsoft.com",
            "nih.gov",
            "wikipedia.org",
            "pinterest.com",
            "quora.com",
            "reddit.com",
            "scribd.com",
            "slack.com",
            "soundcloud.com",
            "spotify.com",
            "stackexchange.com",
            "stackoverflow.com",
            "t.co",
            "tumblr.com",
            "twitter.com",
            "udemy.com",
            "vimeo.com",
            "wordpress.com",
            "wordpress.org",
            "twitch.tv",
            "discord.com",
            "adobe.com",
            "mvorganizing.org",
            "blackhatworld.com",
            "warriorforum.com",
            "buildersociety.com",
            "steamcommunity.com",
            "discordapp.com",
            "archiveofourown.org",
            "imdb.com",
            "ign.com",
            "gamespot.com",
            "metafilter.com",
            "*****.org",
            "slickdeals.net",
            "nexusmods.com",
            "xda-developers.com",
            "kaskus.co.id",
            "tripadvisor.com",
            "arstechnica.com",
            "spiceworks.com",
            "ycombinator.com",
            "saidit.net",
            "notabug.io",
            "snapzu.com",
            "lobste.rs",
            "phuks.co",
            "hubski.com",
            "tildes.net",
            "duolingo.com",
            "habbo.com",
            "vk.com",
            "last.fm",
            "goodreads.com",
            "bebo.com",
            "tagged.com",
 
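A minimal sketch of how the list plus the 'board' / 'forum' / 'wiki' rule could be turned into a check. This is an assumption about how such a check might look, not the code from Sartre's app. `UGC_DOMAINS` holds only a short excerpt of the list above, and the helper name `is_ugc` is made up for illustration.

```python
from urllib.parse import urlparse

# Short excerpt of the domain list from the post above.
UGC_DOMAINS = {"reddit.com", "quora.com", "stackoverflow.com", "medium.com"}
# Substrings that mark a domain as UGC, as described in the post.
UGC_KEYWORDS = ("board", "forum", "wiki")


def is_ugc(url: str) -> bool:
    """Return True if the URL's host is a listed UGC domain (or a subdomain
    of one), or if the host contains 'board', 'forum' or 'wiki'."""
    if "//" not in url:
        url = "//" + url  # let urlparse treat a bare "example.com/x" as a host
    host = (urlparse(url).hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in UGC_DOMAINS):
        return True
    return any(keyword in host for keyword in UGC_KEYWORDS)
```

Matching on the exact domain or a dot-prefixed suffix avoids false positives such as `notreddit.com`, and the keyword rule looks only at the host, so `example.com/forum/` is not flagged.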
I initially made a setup but was not happy with the results of the scraping (how to get relevant paragraphs) so went back to the start trying semi manually to try and figure out how to automate. I am still kind of stuck here.

I tried the paa question and answer way and paraphrased just the answers, to avoid the problem, but then it looked like a 'paa site'. I would like to have articles which look like 'normal' articles but unsure how to scrape content there.

So currently for me I am making articles by manually going to the serps for main kw phrase, then opening up the top ten. I then take some decent paragraphs from the results by looking through each article manually for relevant stuff and place them into the article template. I repeat the process for a few more paa question searches until it has over 1k words.

Then I run it through Pegasus and manually edit the result so it makes sense. This seems to be similar to the semi-automated process other SEOs use with commercial AI tools.

This produces good articles I am happy to post, rather than the semi coherent stuff the bots came up with doing it totally automated, however it still takes a good chunk of my time and of course the goal is automation.

What are some tips for getting a bot to choose, with a higher degree of accuracy, paragraphs that will make sense in the article as a whole? This is the main sticking point currently in terms of automation. There is so much variance in articles and content that I am unsure what logic/NLP to use in order to get reliable results from what the bot pulls.

I know I could pull the h2s and 1-3 paragraphs beneath from articles, but still not sure how to parse for coherent content.
This is really interesting, and I was thinking of doing something similar.
Have you made any progress so far?
Does anyone else have any advice?
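One possible starting point for the paragraph-selection problem described above is to score each scraped paragraph against the keyword and keep only the best matches. The sketch below uses plain TF-IDF cosine similarity from the standard library. It is not what Sartre uses (that was not shared), the function names are made up, and a spaCy or embedding-based similarity could be swapped in for better results.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_paragraphs(query: str, paragraphs: list, top_n: int = 3) -> list:
    """Return up to top_n (score, paragraph) pairs, best match first,
    scored by TF-IDF cosine similarity to the query."""
    docs = [tokenize(p) for p in paragraphs]
    n = len(docs)
    # Document frequency: in how many paragraphs each token appears.
    df = Counter(token for doc in docs for token in set(doc))

    def vectorize(tokens):
        # Smoothed IDF so tokens unseen in the paragraphs still get a weight.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                for t, c in Counter(tokens).items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vectorize(tokenize(query))
    scored = [(cosine(q, vectorize(doc)), p) for doc, p in zip(docs, paragraphs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

Paragraphs that share no words with the keyword score 0.0 and can be dropped outright. A score threshold for the rest would need tuning by hand on real SERP data.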
 
What’s the niche of the articles you’re doing?
 
Anything really, but I'm aiming for non-YMYL niches that are AdSense-friendly, with moderate RPM and competition.
 
Sartre, how do you index the posts?
How long on average do posts from sites created with AI usually take to be indexed?
 
I used to post and ping 200 posts/day using the GSC API. Now, after the update, I'm waiting to see what happens. So far I'm publishing a lot of posts and letting Google discover them on its own.
 
This is really interesting, and I was thinking of doing something similar.
Have you made any progress so far?
Does anyone else have any advice?
Same here. Any suggestions or threads on automation logic so the site doesn't look like a PAA site?
 
Don't stuff the article with PAA keywords? :) The question is how you are generating your content.
 
I finished regenerating my 8 sites for the purpose of this journey with the new model (less keyword stuffing, fewer random images, etc.) to be safer. Let's see how it works over the next few weeks:
https://datastudio.google.com/u/0/reporting/7db9eb94-334e-4ba1-9847-d0155f291a83/page/LuRfC
 