[Journey] 1 million UVs/month in 12 months using AI-generated content. Let's do it!

ComputerJunkie

Regular Member
Joined
Oct 9, 2012
Messages
331
Reaction score
105
It's not possible to DM you and selling outside of BST is forbidden. Can you just share your knowledge with others? :D
Well I wasn’t selling anything but I’m also not going to share this knowledge publicly.

I have a legit site that has well over 10 million pages, and my API quota is consistently 140,000 per day.

I'm indexing around 200,000-400,000 new posts a month. I just thought it would be beneficial to your system after I read your journey.
 
Last edited:

tikaku

Newbie
Joined
Jun 7, 2013
Messages
14
Reaction score
1
Hi @Sartre , do you use WordPress Multisite or a standalone install for each website? Could you recommend a good paraphrasing service with a supported API? Where do you buy aged domains, and what are the primary factors in choosing a good aged domain? Thanks in advance.
 
Last edited:

Sartre

Jr. VIP
Jr. VIP
Joined
Apr 1, 2010
Messages
530
Reaction score
560
Website
NoSandbox.com
still leaving meta descriptions to google?
yup
Is this hard to set up? I am using WPX like a noob
nah, it's really not that hard. 1 hour of work
My bot was banned from Quillbot this week, which is a real setback, as their model performs far better than Pegasus ever has for me.
Do you have any pointers for building my own summary model? Cheers
Yup, unfortunately. Training your own model is pretty difficult. Look up guides on training models on Hugging Face.

have you considered non google targets for content
I haven't. Right now, on a 24-core CPU, I have more threads than I think I will ever be able to use.
a legit site that has well over 10 million pages and API quota is consistently 140,000 per day
It's not even possible to PM you from my side for some reason.
Hi @Sartre , do you use WordPress Multisite or a standalone install for each website? Could you recommend a good paraphrasing service with a supported API? Where do you buy aged domains, and what are the primary factors in choosing a good aged domain? Thanks in advance.
I use standalone sites + the REST API. Unfortunately, I don't know of any good paraphrasing API that isn't banning this kind of use.

I'm buying aged domains from auctions. My priority is that there was previously a site in the same niche on the domain, and that it had niche-related backlinks pointing to it. I developed a workflow, and every day I filter, check, and bid on/buy a few domains. Domain Hunter Gatherer, which is sold here on BHW in a BST thread, is a good tool for finding domains. I have developed my own Python script now, of course :D
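The filtering idea above can be sketched in a few lines. This is purely hypothetical (not Sartre's actual script, and the helper names are made up): it scores each auction domain by what fraction of its historical backlink anchor texts mention a niche keyword, then shortlists the ones above a threshold.

```python
# Hypothetical sketch of an aged-domain filter: score auction candidates by how
# niche-related their historical backlink anchor texts look.

def niche_score(anchor_texts, niche_keywords):
    """Fraction of backlink anchors mentioning at least one niche keyword."""
    if not anchor_texts:
        return 0.0
    hits = sum(
        any(kw in anchor.lower() for kw in niche_keywords)
        for anchor in anchor_texts
    )
    return hits / len(anchor_texts)

def shortlist(candidates, niche_keywords, threshold=0.5):
    """Keep domains whose backlink profile looks niche-related enough to bid on."""
    scored = [
        (domain, niche_score(anchors, niche_keywords))
        for domain, anchors in candidates.items()
    ]
    return sorted(
        [(d, s) for d, s in scored if s >= threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )

# Toy data standing in for an auction export plus backlink-checker output.
candidates = {
    "old-fishing-blog.com": ["best bass lures", "fishing tips", "click here"],
    "random-expired.net": ["casino bonus", "click here", "cheap pills"],
}
print(shortlist(candidates, ["fishing", "lures", "bass"]))  # keeps only the niche-related domain
```

In practice the anchor texts would come from whatever backlink tool you export from; the point is just that a niche-relevance score lets the daily filter run unattended.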
 

b00bz

Registered Member
Joined
Sep 26, 2018
Messages
90
Reaction score
37
1. AdSense
2. 3z0ic
3. M0num3tric
4. M3di4vin3

In the order of your site's growth.
Did you have experience with your sites where AdSense didn't accept them, then you reapplied and they did? I'm running some PAA websites and have about 30 sites pending approval. One got rejected for low-value content. Do you think it's worth just "re-applying" and hoping it slips through, or is there no way for it to get accepted without enriching that content somehow?
 

Sartre

Jr. VIP
Jr. VIP
Joined
Apr 1, 2010
Messages
530
Reaction score
560
Website
NoSandbox.com
Did you have experience with your sites where AdSense didn't accept them, then you reapplied and they did? I'm running some PAA websites and have about 30 sites pending approval. One got rejected for low-value content. Do you think it's worth just "re-applying" and hoping it slips through, or is there no way for it to get accepted without enriching that content somehow?
dude, why would you apply with 30 sites at once? That's a red flag.
 

BlogPro

Jr. VIP
Jr. VIP
Joined
Apr 23, 2012
Messages
1,656
Reaction score
2,530
Website
wordsigma.com
Noob question, what does PAA stand for?

People Also Ask

The questions you see when you run a Google Search.

 

PinguSpy

Jr. VIP
Jr. VIP
Joined
Dec 7, 2007
Messages
2,661
Reaction score
2,725
I wonder what would happen if everybody did the same thing?

Scraping the PAA is easy.
Re-paraphrasing the text is the hardest part. Even after fine-tuning it, the output is still unreadable. Is it really possible to generate output close to Quillbot quality just by using Pegasus? Not to mention Quetext still shows 80% plagiarism.
 
Last edited:

BlogPro

Jr. VIP
Jr. VIP
Joined
Apr 23, 2012
Messages
1,656
Reaction score
2,530
Website
wordsigma.com
I wonder what would happen if everybody did the same thing?

The internet is pretty vast and automation is here to stay. You don't wait for things to get fux0red by Google - you find something else that works. Isn't that what we're all about? Staying one step ahead, all the time.

Scraping the PAA is easy.
Re-paraphrasing the text is the hardest part. Even after fine-tuning it, the output is still unreadable. Is it really possible to generate output close to Quillbot quality just by using Pegasus? Not to mention Quetext still shows 80% plagiarism.

The trick here is to augment your data, rather than going for basic paraphrasing.

Quillbot has an enormous model behind it - and it is continuously trained, hence the higher quality of output.

//

To achieve something like this with Pegasus or Parrot or T5 - you'll need to train the model. How do you train a paraphraser model? You use human data, since humans are the best paraphrasers.

--

- Here's a dataset from Tatoeba.org - a crowdsourced portal for language learners. The TaPaCo dataset has exactly that: 1.9 million sentences in 70+ languages. Even if you extract only the English sentences, you still have a decent-sized dataset to train your model.

Link to TaPaCo - https://zenodo.org/record/3707949

--

- Here's an official dataset from Quora - https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs

The research team at Quora wanted a unique page for each logically distinct question, so they released a dataset of 400K question pairs that are possibly semantic duplicates of each other.

You can use these as well.

//

The more semantic knowledge your model is trained on, the more likely you are to get well-augmented output. Augmentation is also possible when you have several data sources embedded together.
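To make the two dataset suggestions concrete, here's a minimal sketch (my own illustration, not BlogPro's pipeline) of turning both sources into (input, target) training pairs. The structural facts it relies on: TaPaCo groups sentences into paraphrase sets, so pairs come from combinations within a set; the Quora release is question pairs with an is_duplicate label, so you keep only the labeled duplicates.

```python
from itertools import combinations

def tapaco_pairs(paraphrase_sets):
    """TaPaCo groups sentences into paraphrase sets; every ordered pair
    within a set is a usable (input, target) training example."""
    pairs = []
    for sentences in paraphrase_sets:
        for a, b in combinations(sentences, 2):
            pairs.append((a, b))
            pairs.append((b, a))  # paraphrasing is symmetric
    return pairs

def quora_pairs(rows):
    """The Quora release labels question pairs; keep only true duplicates."""
    return [(q1, q2) for q1, q2, is_dup in rows if is_dup == 1]

# Toy rows standing in for the real downloads.
tapaco = [["I am hungry.", "I'm starving.", "I could eat."]]
quora = [
    ("How do I learn Python?", "What is the best way to learn Python?", 1),
    ("How do I learn Python?", "How do I learn French?", 0),
]

train = tapaco_pairs(tapaco) + quora_pairs(quora)
print(len(train))  # -> 7: six ordered TaPaCo pairs plus one Quora duplicate
```

The resulting list of pairs is exactly the shape a seq2seq fine-tuning loop (Pegasus/T5 via Hugging Face) expects: source sentence in, target paraphrase out.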
 

NulledCode

Jr. VIP
Jr. VIP
Joined
Jun 10, 2010
Messages
1,573
Reaction score
1,252
Website
www.gplcellar.com
You have surely noticed by now that Node is absolutely single-threaded, and thus not that useful ;).

And this is a false statement: Node.js is not single-threaded; in fact, it uses multiple threads under the hood. The applications you write are single-threaded in nature because of the event loop, but even that doesn't mean you can't write applications that use multiple threads (the worker_threads module exists for exactly this).
 

desperado596

Regular Member
Joined
Nov 14, 2015
Messages
346
Reaction score
63
Well I wasn’t selling anything but I’m also not going to share this knowledge publicly.

I have a legit site that has well over 10 million pages, and my API quota is consistently 140,000 per day.

I'm indexing around 200,000-400,000 new posts a month. I just thought it would be beneficial to your system after I read your journey.

how are you indexing so many posts? I have more than 30k posts on a Google News website, but my indexed count is only 3-4k.
 

PinguSpy

Jr. VIP
Jr. VIP
Joined
Dec 7, 2007
Messages
2,661
Reaction score
2,725
The internet is pretty vast and automation is here to stay. You don't wait for things to get fux0red by Google - you find something else that works. [...]

The trick here is to augment your data, rather than going for basic paraphrasing. [...]
Nice.. I will forward this to my coder and hope the quality will be much better.

Thanks.
 

ProductionsDream

Regular Member
Joined
Nov 28, 2013
Messages
344
Reaction score
163
The internet is pretty vast and automation is here to stay. You don't wait for things to get fux0red by Google - you find something else that works. [...]

The trick here is to augment your data, rather than going for basic paraphrasing. [...]
Thanks,

Following your advice, I have trained the T5 model on the TaPaCo dataset.

I will proceed with the Quora dataset next.

Regards
 

Sartre

Jr. VIP
Jr. VIP
Joined
Apr 1, 2010
Messages
530
Reaction score
560
Website
NoSandbox.com
The internet is pretty vast and automation is here to stay. You don't wait for things to get fux0red by Google - you find something else that works. [...]

The trick here is to augment your data, rather than going for basic paraphrasing. [...]
I wish I could articulate my thoughts like you do!
 