[Journey] 1 million UVs/month in 12 months using AI generated content. Let's do it!

Status
Not open for further replies.
Your work and investments are yielding good results. Keep it up!
 
Your work and investments are yielding good results. Keep it up!
thanks so much. sorry for the delay with the new journey, but I'm working on it, I promise :) Almost there.
 
How do you create datasets, for fine-tuning models, from places like reddit or similar sources?
 
How do you create datasets, for fine-tuning models, from places like reddit or similar sources?
exactly. google open public datasets. check Kaggle. You can scrape huge websites like Wikihow. There is a Wikipedia dataset out there. There is Reddit/Quora dataset out there. Lots of possibilities.
 
Thanks so much for your answers through the thread @Sartre. Could you tell more about the database design, like how many table do I need and which columns :)
 
Hey folks,

I have a background in computer science. I already own several profitable content websites (but nothing crazy), and I'm tired of creating/outsourcing content.

I've created a simple app in Python that goes through the top results on Google for a given keyword, takes a paragraph from each for semantically relevant keywords, constructs a new article out of it, and paraphrases it using an AI tool. It also generates related images, adds nice formatting, and schema (I'm using FAQ schema a lot for PAA keywords).

I'm using WordPress on Linode with Centminmod. Posting using the REST API.

In the nearest future I'm launching 3 sites:
  1. My passion hobby project - I will generate articles using AI, but edit them manually - So far this one is up with 3 articles, started yesterday.
  2. A big site where I will drip tens of thousands of posts without editing and try to monetize with display ads.
  3. Another fully-automated site that will target local keywords for lead generation.
Attaching a sneak peek of my app. I will show you an example article in my next update.

Wish me luck!

Good luck with your journey!

There's another guy here who went down the same path and did not have any success. You might want to check him up. Not linking so you can do your own browsing, there's plenty to learn in your situation with AI.

You might want to check what he did and figure out what he did wrong, maybe that will help. The concept is the same though.

Happy birthday to your thread, one year has passed since you started this and it seems you achieved the initial goal from the datastudio charts.
Boy, was I wrong :D
 
Happy birthday to your thread, one year has passed since you started this and it seems you achieved the initial goal from the datastudio charts.
Boy, was I wrong :D
Thank so much, I haven't realized! It's good to be wrong, I'm wrong all the time. I start 10 projects, 8 work out. It's all good.
time for a new journey
YESSSS!
 
  • Like
Reactions: #IM
exactly. google open public datasets. check Kaggle. You can scrape huge websites like Wikihow. There is a Wikipedia dataset out there. There is Reddit/Quora dataset out there. Lots of possibilities.
I was asking not can you or where do you, with those examples, but rather how do you get the data to make into something that the models can accept. As in are the datasets already there to download or do you make some sort of scraper for every single page and then format into something the fine tuners will accept?

Is it done for each niche for example or you just scrape a whole dataset for a whole website? I was thinking that if you did it per niche, if making your own from the sites, then you could keep it smaller saving on computation time. While still running a small scale operation I thought that could be beneficial rather than scraping whole site content.
 
How many years must a domain be for you to consider it aged?
It is not only the age that matters. A 2 years domain with a good link profile and a healthy history and often updated will perform better than a 20 year old domain never updated and with a poor backlink profile. But to answer to your questions I guess 2-3 years at least.
 
My plan is to do paraphrasing on large scale. Doing it on smaller scale, I did no get any DMCA complaints. Now I want to scale out.

@Sartre what kind of DMCA complaint did you get? I remember you stated this happens for brands specifically. What about doing NER and filtering out popular brands? Or translating and paraphrasing then?
Did you get complaints for the pure paraphrase as well?
Did you have to pay in the end, or just take down your site?
 
The guide you posted on CWV getting to 100 is insane.

I want to do something similar but in other languages.

Just so I'm correct python is the key skill required to do this at a lower level correct ?
Also can one hire different developers to do this in parts ?
What sort of monetization is being used here ?

Thanks,
Russel
 
@Sartre Have you written about hedonic adaptation & AdSense optimization before?
Reddit? Yup, that's me.
I was asking not can you or where do you, with those examples, but rather how do you get the data to make into something that the models can accept. As in are the datasets already there to download or do you make some sort of scraper for every single page and then format into something the fine tuners will accept?

Is it done for each niche for example or you just scrape a whole dataset for a whole website? I was thinking that if you did it per niche, if making your own from the sites, then you could keep it smaller saving on computation time. While still running a small scale operation I thought that could be beneficial rather than scraping whole site content.
Fine-tuning 1 model for all websites. It already has the knowledge it needs. I'm fine-tuning it to fit the "blogging format" that I'm looking for. Fine-tuning for 1 niche would be very overkill/expensive. I guess it's a good idea if you got a very high budget
It is not only the age that matters. A 2 years domain with a good link profile and a healthy history and often updated will perform better than a 20 year old domain never updated and with a poor backlink profile. But to answer to your questions I guess 2-3 years at least.
This.

Do not look at DA/DR. You can get DR to 30 for $100. There are people who offer these services all around.

Look at archive.org. The niche has to match. Look at the backlinks.

Is the name brandable and makes sense in that niche?

The name itself is pretty important IMO when selling the site later. Many people want to develop these sites into genuine brands.

My plan is to do paraphrasing on large scale. Doing it on smaller scale, I did no get any DMCA complaints. Now I want to scale out.

@Sartre what kind of DMCA complaint did you get? I remember you stated this happens for brands specifically. What about doing NER and filtering out popular brands? Or translating and paraphrasing then?
Did you get complaints for the pure paraphrase as well?
Did you have to pay in the end, or just take down your site?
pay? :D Nah, I just took down 1 article and blacklisted that company in the app
The guide you posted on CWV getting to 100 is insane.

I want to do something similar but in other languages.

Just so I'm correct python is the key skill required to do this at a lower level correct ?
Also can one hire different developers to do this in parts ?
What sort of monetization is being used here ?

Thanks,
Russel
thanks.

I'm using display ads, and I'm coding everything myself basically
 
@Sartre
How you structure the article using rest-api. I am finding it difficult to structure the article the way I want.

Sometimes the output of OpenAi is in list format. Do you check for every sentence...if the sentence starts with 1. 2. etc then wrap that sentence in <li> ?

Also..do you use fixed article template with fixed number of headings, images etc...and then your program replaces it with actual headings and content. Or your program is intelligent enough to decide the number of headings and images depending upon the keywords.
 
pay? :D Nah, I just took down 1 article and blacklisted that company in the app
So this means you did not go from paraphrasing to generation because of legal/$$-issues, but just because by generation you can create better and more micro-niches?
 
How the spam update was for you?
can't say yet. I will do a writeup on that once the dust settles. So far mostly fine, but I want to wait a few days. I'm expecting a rollback though. Too many WH sites got hit.
So this means you did not go from paraphrasing to generation because of legal/$$-issues, but just because by generation you can create better and more micro-niches?
both. It's cheaper, faster, easier, less risk.
 
Status
Not open for further replies.
Back
Top