Thanks so much. Sorry for the delay with the new journey, but I'm working on it, I promise.
Your work and investments are yielding good results. Keep it up!
Exactly. Google open public datasets. Check Kaggle. You can scrape huge websites like Wikihow. There is a Wikipedia dataset out there. There are Reddit and Quora datasets out there. Lots of possibilities.
How do you create datasets, for fine-tuning models, from places like Reddit or similar sources?
Hey folks,
I have a background in computer science. I already own several profitable content websites (but nothing crazy), and I'm tired of creating/outsourcing content.
I've created a simple app in Python that goes through the top results on Google for a given keyword, takes a paragraph from each for semantically relevant keywords, constructs a new article out of it, and paraphrases it using an AI tool. It also generates related images, adds nice formatting, and schema (I'm using FAQ schema a lot for PAA keywords).
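For anyone wondering what the FAQ schema part might look like, here is a minimal sketch, not the actual app code. It builds schema.org `FAQPage` JSON-LD from question/answer pairs. The function names `build_faq_jsonld` and `wrap_in_script_tag` are made up for illustration, and the sample question is a placeholder.

```python
import json


def build_faq_jsonld(qa_pairs):
    """Return a schema.org FAQPage JSON-LD string for (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    return json.dumps(data, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap JSON-LD in the script tag that goes into the page HTML."""
    # Escape "</" so answer text can never close the script tag early.
    safe = jsonld.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + safe + "\n</script>"


if __name__ == "__main__":
    # Placeholder PAA-style question, purely illustrative.
    pairs = [("How long does paint take to dry?", "Usually a few hours.")]
    print(wrap_in_script_tag(build_faq_jsonld(pairs)))
```

The resulting script tag can be appended to the article HTML before publishing.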
I'm using WordPress on Linode with Centminmod. Posting using the REST API.
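A rough sketch of the REST API posting step, assuming the standard WordPress `/wp-json/wp/v2/posts` endpoint and an Application Password for Basic auth. The site URL, username, and password below are placeholders, and `build_post_request` is a hypothetical helper name, not something from the app.

```python
import base64
import json
import urllib.request


def build_post_request(site_url, username, app_password, title, html_content,
                       status="draft"):
    """Build (but do not send) a request that creates a WordPress post."""
    endpoint = site_url.rstrip("/") + "/wp-json/wp/v2/posts"
    payload = json.dumps(
        {"title": title, "content": html_content, "status": status}
    ).encode("utf-8")
    # Basic auth header built from the user name and an Application Password.
    token = base64.b64encode(
        f"{username}:{app_password}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_post_request(
        "https://example.com", "editor", "xxxx xxxx xxxx xxxx",
        "Test article", "<p>Hello</p>",
    )
    # Sending is left commented out so the sketch has no side effects:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["id"])
    print(req.get_method(), req.full_url)
```

Defaulting to `status="draft"` is a deliberate choice here so nothing goes live by accident. Switch it to `"publish"` once the output looks right.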
In the near future I'm launching 3 sites:
- My passion hobby project - I will generate articles using AI, but edit them manually - So far this one is up with 3 articles, started yesterday.
- A big site where I will drip tens of thousands of posts without editing and try to monetize with display ads.
- Another fully-automated site that will target local keywords for lead generation.
Attaching a sneak peek of my app. I will show you an example article in my next update.
Wish me luck!
Good luck with your journey!
There's another guy here who went down the same path and did not have any success. You might want to look him up. Not linking so you can do your own browsing, there's plenty to learn in your situation with AI.
You might want to check what he did and figure out what he did wrong, maybe that will help. The concept is the same though.
Thanks so much, I hadn't realized! It's good to be wrong, I'm wrong all the time. I start 10 projects, 8 work out. It's all good.
Happy birthday to your thread, one year has passed since you started this and it seems you achieved the initial goal from the datastudio charts.
Boy, was I wrong!
YESSSS!
time for a new journey
With those examples, I wasn't asking whether you can or where you get the data, but rather how you turn it into something the models can accept. Are the datasets already there to download, or do you build some sort of scraper for every single page and then format the output into something the fine-tuners will accept?
How many years old must a domain be for you to consider it aged?
expired better than fresh, aged better than expired
It is not only the age that matters. A 2-year-old domain with a good link profile, a healthy history, and frequent updates will perform better than a 20-year-old domain that was never updated and has a poor backlink profile. But to answer your question, I guess 2-3 years at least.
Reddit? Yup, that's me.
@Sartre Have you written about hedonic adaptation & AdSense optimization before?
Fine-tuning 1 model for all websites. It already has the knowledge it needs. I'm fine-tuning it to fit the "blogging format" that I'm looking for. Fine-tuning for 1 niche would be overkill and very expensive. I guess it's a good idea if you've got a very high budget.
Is it done for each niche, for example, or do you just scrape a whole dataset for a whole website? I was thinking that if you built your own dataset per niche from the sites, you could keep it smaller and save on computation time. While still running a small-scale operation, I thought that could be more beneficial than scraping whole-site content.
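On the "format into something the fine-tuners will accept" part, one common target is a JSONL file with one prompt/completion pair per line. Below is a minimal sketch under that assumption. The `title` and `body` field names and the `to_finetune_jsonl` helper are hypothetical, and the exact schema depends on whichever fine-tuning tool you use.

```python
import json


def to_finetune_jsonl(records):
    """Turn scraped records into JSONL text with prompt/completion pairs.

    Each record is assumed to be a dict with "title" and "body" keys
    (hypothetical field names for whatever your scraper produces).
    """
    lines = []
    for rec in records:
        # Collapse whitespace left over from HTML extraction.
        title = " ".join(rec.get("title", "").split())
        body = " ".join(rec.get("body", "").split())
        if not title or not body:
            continue  # skip incomplete records instead of training on them
        lines.append(
            json.dumps({"prompt": title, "completion": body},
                       ensure_ascii=False)
        )
    return "\n".join(lines)


if __name__ == "__main__":
    scraped = [
        {"title": "How to  fix a flat tire", "body": "Remove the wheel.\nPatch it."},
        {"title": "", "body": "orphan text without a title"},
    ]
    print(to_finetune_jsonl(scraped))
```

The same function works whether the records come from a downloaded dataset or from your own scraper, so the per-niche vs. whole-site question only changes what you feed in.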
This.
It is not only the age that matters. A 2-year-old domain with a good link profile, a healthy history, and frequent updates will perform better than a 20-year-old domain that was never updated and has a poor backlink profile. But to answer your question, I guess 2-3 years at least.
pay?
My plan is to do paraphrasing on a large scale. Doing it on a smaller scale, I did not get any DMCA complaints. Now I want to scale out.
@Sartre what kind of DMCA complaint did you get? I remember you stated this happens for brands specifically. What about doing NER and filtering out popular brands? Or translating and paraphrasing then?
Did you get complaints for the pure paraphrase as well?
Did you have to pay in the end, or just take down your site?
Thanks.
The guide you posted on CWV getting to 100 is insane.
I want to do something similar but in other languages.
Just so I'm clear, Python is the key skill required to do this at a lower level, correct?
Also, can one hire different developers to do this in parts?
What sort of monetization is being used here?
Thanks,
Russel
So this means you did not go from paraphrasing to generation because of legal/$$ issues, but just because with generation you can create better and more micro-niches?
pay? Nah, I just took down 1 article and blacklisted that company in the app.
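As an illustration of blacklisting a company in the app, here is a minimal sketch that drops any paragraph mentioning a blacklisted name. It is a plain case-insensitive name match, not NER, and not the actual app code. The helper names and the sample brand names are made up.

```python
import re


def build_blacklist_pattern(names):
    """Compile a case-insensitive pattern matching any name as a whole word."""
    # Longest names first so multi-word names win over their prefixes.
    escaped = sorted((re.escape(n) for n in names), key=len, reverse=True)
    return re.compile(
        r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE
    )


def drop_blacklisted(paragraphs, names):
    """Return only the paragraphs that mention none of the blacklisted names."""
    if not names:
        return list(paragraphs)
    pattern = build_blacklist_pattern(names)
    return [p for p in paragraphs if not pattern.search(p)]


if __name__ == "__main__":
    blacklist = ["Acme", "Foo Corp"]  # placeholder company names
    paragraphs = [
        "Acme makes anvils.",
        "Generic anvils are heavy.",
        "Ask foo corp about it.",
    ]
    print(drop_blacklisted(paragraphs, blacklist))
```

A real NER pass would catch brands that are not on the list yet, while a blacklist like this only blocks names you already know about.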
Can't say yet. I will do a writeup on that once the dust settles. So far mostly fine, but I want to wait a few days. I'm expecting a rollback though. Too many WH sites got hit.
How was the spam update for you?
Both. It's cheaper, faster, easier, less risk.
So this means you did not go from paraphrasing to generation because of legal/$$ issues, but just because with generation you can create better and more micro-niches?