[BlogPro AMA] 1500 Posts - Ask about PAA Sites / Data Enriched Sites / Job Boards / Scraping / AI / ML - Go Ahead!

Would appreciate it if you could shed some light on your ML infrastructure.

Do you use any GPUs locally or is it all cloud? Are you mostly on AWS SageMaker for training or do you do stuff locally too?

Are you self hosting any models?

If you'd like to give a rough estimate: how much does it cost you to generate 1k tokens across your infra? And roughly how many thousand tokens are you generating?
 
Thanks for doing this AMA!

Do you purchase links or do you create them yourself? How do you handle your link velocity?
 
Would appreciate it if you could shed some light on your ML infrastructure.

Do you use any GPUs locally or is it all cloud? Are you mostly on AWS SageMaker for training or do you do stuff locally too?

Are you self hosting any models?

If you'd like to give a rough estimate: how much does it cost you to generate 1k tokens across your infra? And roughly how many thousand tokens are you generating?

Hey, I just answered this question in the Journey thread.

I use SageMaker for training. The trained model is stored in S3, then downloaded, and inference runs locally.
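
A minimal sketch of that train-in-the-cloud, infer-locally flow, assuming the training job wrote a `model.tar.gz` artifact to S3 and that the model is a Hugging Face Transformers causal language model. The S3 URI, paths, and function names are hypothetical, `boto3` and `transformers` are assumed to be installed, and this is not necessarily how the setup above is wired.

```python
import tarfile
from pathlib import Path
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_artifact(s3_uri: str, dest: Path) -> Path:
    """Download the training artifact from S3 to a local file."""
    import boto3  # imported lazily so the other helpers work without boto3

    bucket, key = parse_s3_uri(s3_uri)
    dest.parent.mkdir(parents=True, exist_ok=True)
    boto3.client("s3").download_file(bucket, key, str(dest))
    return dest


def extract_artifact(archive: Path, model_dir: Path) -> Path:
    """Unpack model.tar.gz into a directory that from_pretrained() can read."""
    model_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(model_dir)
    return model_dir


def generate_locally(model_dir: Path, prompt: str, max_new_tokens: int = 60) -> str:
    """Run inference on the local machine, with no per-token API cost."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The download and extraction happen once per trained model; after that, only `generate_locally` runs, which is why the per-token cost drops to roughly the cost of keeping the machine on.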

Yes, I am self-hosting models on local machines. The cost per 1k tokens is high for GPT-3 fine-tuned content. Beyond that, it's almost negligible since the models run locally. The only cost is the one-time cost of training and of dataset scraping/sanitization.

As I explained in the other thread, this is an output-oriented process (where I push a lot of content out at once), as opposed to a process-oriented one (where a huge model is built to output limited data inside a closed environment), so the economy of scale is in my favor. Anything beyond my current setup would be overkill.
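
To make the economy-of-scale point concrete, here is a small worked sketch of the amortized cost per 1k tokens for a locally hosted model. Every dollar figure and token volume is a made-up placeholder, not a number from this thread.

```python
def cost_per_1k_tokens(one_time_cost: float, running_cost: float, tokens_generated: int) -> float:
    """Amortized cost per 1,000 generated tokens."""
    if tokens_generated <= 0:
        raise ValueError("tokens_generated must be positive")
    return (one_time_cost + running_cost) / tokens_generated * 1000


# Hypothetical: $500 for training plus dataset scraping/sanitization, $50 of electricity.
for tokens in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{tokens:>13,} tokens -> ${cost_per_1k_tokens(500, 50, tokens):.5f} per 1k")
```

Because the fixed cost is spread over every token generated, pushing out more content at once directly lowers the per-1k figure.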
 
Data science student here
Have been playing around with transfer learning lately (mostly on computer vision models).

1- If I understood right, you’re not using GPT-3 but GPT-J?
2- Can you share the configuration of the MLP you train on your data (the second half of the NN: how many layers, nodes, activation functions)?
3- Also, I suppose you’re using LSTM cells in your layers?
4- Does the locally trainable part of your network have to be an RNN?

I know it’s a lot of questions, and I would respect it if you don’t want to share everything (I know how much time, effort and Stack Overflow it takes to hyperparameter-tune these models).

You’re doing a great job
 
I am not a data scientist - so I am probably not the best person to answer your technical questions - but let me try.

1. I use both GPT-3 and GPT-J (and others). I am currently training T0pp and figuring out someone who can give me monies to train the new Yandex Model (released yesterday).

2. The perceptron is designed on a case-by-case basis - I don't have a definite configuration for you.

3/4. So far I have exclusively worked with Transformer models (I want to try LSTM for some paraphrasing, but am still learning)

Cheers man, keep asking, keep learning. (If my answers appear stupid, please note that I am not a DS).
 
Do you think it's a good idea to go all-in on AI-generated content?
Using tools like Jasper to be specific
 
I've had great results with AI content. However, I can't vouch for the quality of Jarvis (now Jasper) and similar tools.

I use a fine-tuned GPT-3 model plus my own custom fine-tuned models.

If you check out my journey thread, you'll see that Site #5 is pure AI content.
 
"figuring out someone who can give me monies to train the new Yandex Model (released yesterday)"
You don't need to train the model. Like GPT-3, it's pre-trained and only needs few-shot learning. Like GPT-3, it can also be fine-tuned, and I guess fine-tuning is what you meant.

I guess it will perform worse than GPT-3. GPT-3 is trained on a predominantly English dataset comprising several types of text.

Yandex's model is trained on a mixed English and Russian dataset, and they haven't published performance benchmarks for it.
 
Yeah I am aware. I meant run the instance of the Yandex model. Sorry was multi-tasking.

I am about to run a test on this next weekend on an instance for a couple of days. Let's see if I am able to get it working.

I am also part of a group of data nerds who might be running a test on this sooner than that to see and run a comparison on the test output between GPT-3 and this.

Let's see.
 
Beyond PAA

A lot of you Skype'd me to ask about Data Enriched sites in my title.

So these are one more type of site I build.

They answer a different type of question with a lot of supporting data. Either through a dataset or in general.

Most of the questions are again auto generated, but each question follows a pattern.

Check this response in my journey, for example, about generating pages for Systolic and Diastolic blood pressure and another one about brands.

https://www.blackhatworld.com/seo/j...on-cumulative-uvs-month.1416719/post-15379659
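
The pattern-based question generation mentioned in this post can be sketched roughly as follows. The template, the value ranges, and the slug format are illustrative assumptions, not the actual patterns used on these sites. The sketch only produces page titles and URL slugs; the supporting data for each page would still have to come from a dataset.

```python
import re
from itertools import product


def slugify(text: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', trim dashes."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def generate_pages(template: str, **value_lists):
    """Yield (question, slug) for every combination of the template's variables."""
    names = list(value_lists)
    for combo in product(*(value_lists[name] for name in names)):
        question = template.format(**dict(zip(names, combo)))
        yield question, slugify(question)


# Hypothetical pattern: one page per systolic/diastolic combination.
pages = list(generate_pages(
    "Is a blood pressure of {systolic}/{diastolic} normal?",
    systolic=range(110, 131, 10),   # 110, 120, 130
    diastolic=range(70, 91, 10),    # 70, 80, 90
))
```

One template and two short value lists already yield nine pages, which is why a single pattern can cover a long tail of queries.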
 
That's highly dependent on a lot of factors.

Something as simple as fine-tuning GPT3 would cost you half of what you see on their pricing page. Fine Tuning tokens are charged at a 50% discount.
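
As a rough worked example of that arithmetic: the per-1k-token price below is a placeholder (check the current pricing page), the 50% training discount is taken from the statement above rather than verified, and the sketch assumes billing counts every dataset token once per epoch.

```python
def fine_tune_training_cost(dataset_tokens: int, epochs: int,
                            usage_price_per_1k: float,
                            training_discount: float = 0.5) -> float:
    """Training cost = tokens seen during training x discounted per-1k price."""
    trained_tokens = dataset_tokens * epochs
    training_price_per_1k = usage_price_per_1k * (1 - training_discount)
    return trained_tokens / 1000 * training_price_per_1k


# Hypothetical: 2M-token dataset, 4 epochs, $0.06 per 1k usage tokens.
print(f"${fine_tune_training_cost(2_000_000, 4, 0.06):.2f}")
```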

If you'd like to work with other models, you can set up the infrastructure yourself: download the available models, then train, test, and deploy them.

You can also use a third-party API like the Hugging Face Inference API - https://huggingface.co/inference-api - which works really well in conjunction with HF's AutoTrain - https://huggingface.co/autotrain
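
A minimal sketch of calling a hosted model over HTTP. It assumes the endpoint shape `https://api-inference.huggingface.co/models/<model_id>` with a bearer token and an `inputs` JSON field, which is how the hosted Inference API has been documented; check the current docs before relying on it. The model ID and token are placeholders.

```python
import json
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models"  # assumed endpoint; verify in the docs


def build_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a text-generation request."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 50}}
    return urllib.request.Request(
        url=f"{API_BASE}/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def query(model_id: str, prompt: str, token: str):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(model_id, prompt, token), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `query("gpt2", "What is systolic blood pressure?", "hf_your_token")`; a model fine-tuned through AutoTrain would be addressed by its own model ID in the same way.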

For topical clustering and text analysis, there are many free and premium third-party tools that run custom models in the background and let you integrate with a simple request, such as Linguakit, spaCy or Twinword.

For working with models, you can use Eleuther's Mystic.ai - it costs $12.99 a month + $2 per GPU hour - it lets you access the following models:

GPT-Neo 1.3B
GPT-Neo 2.7B
GPT-Neo 125M
BART Large
GPT-J
GPT-2
GPT-2 Large
GPT-2 Medium
GPT-2 XL

Or check out goose.ai - which gives you GPT-NeoX at $0.002650/request.

It's very subjective and depends on what your end goal is. Most other models are not able to generate entire texts unsupervised on their own, at least so far.

Is the goose.ai GPT-NeoX API better than the OpenAI GPT-3 API?

I'm still deciding whether to use the OpenAI GPT-3 API or the goose.ai NeoX API, since the latter is much cheaper.

My plan has changed; this thing is impossible to self-host. You need data-centre-grade hardware and a bunch of high-end AMD cards.

Unless you are running GPT-2 yourself, which is outdated.
 
@BlogPro "I build social signals (automated) and Web 2.0 automatically", can you give some insights, please?
 
What do you think about Jasper as a tool for AI-generated content?
 
As I said above, I now build websites using AI and scraping. I train my own AI NLP models to generate text on the fly and to generate context relevant content. I scrape all day, everyday.
Thank you for creating the topic!

Can you share some courses on how to create a text generator AI that you think will work? I want to start building my own AI text generation and scraping models, but I don't know what to learn beyond a few lines of Python. There are so many instructions/courses out there and it is overwhelming.
 
Is the goose.ai GPT-NeoX API better than the OpenAI GPT-3 API?

I'm still deciding whether to use the OpenAI GPT-3 API or the goose.ai NeoX API, since the latter is much cheaper.

My plan has changed; this thing is impossible to self-host. You need data-centre-grade hardware and a bunch of high-end AMD cards.

Unless you are running GPT-2 yourself, which is outdated.

NeoX will require an immense amount of fine-tuning before it begins outputting coherent sentences. Yes, the training data is large - but it does not have the sanitization that GPT3 has and so it returns a lot of gibberish.

I personally believe that even GPT3 requires fine-tuning before being used.

@BlogPro "I build social signals (automated) and Web 2.0 automatically", can you give some insights, please?

Hey, so I answered this in my journey thread here - https://www.blackhatworld.com/seo/j...on-cumulative-uvs-month.1416719/post-15383117

The process is almost entirely automated. All I have to do is add WordPress credentials (so it can fetch new posts and their publish status every once in a while).
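
Fetching posts and their publish status from WordPress can be sketched with the core REST API (`/wp-json/wp/v2/posts`). This assumes authentication with an application password over HTTP Basic auth; the site URL and credentials are placeholders, and this is not the actual tool described in the journey thread.

```python
import base64
import json
import urllib.parse
import urllib.request


def posts_request(site: str, user: str, app_password: str,
                  status: str = "publish", per_page: int = 20) -> urllib.request.Request:
    """Build a GET request for the most recent posts with the given status."""
    query = urllib.parse.urlencode({"status": status, "per_page": per_page,
                                    "orderby": "date", "order": "desc"})
    credentials = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return urllib.request.Request(
        f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}",
        headers={"Authorization": f"Basic {credentials}"},
    )


def summarize(posts: list[dict]) -> list[tuple[int, str, str]]:
    """Reduce the API response to (id, status, link) tuples."""
    return [(p["id"], p["status"], p["link"]) for p in posts]


def fetch_posts(site: str, user: str, app_password: str) -> list[tuple[int, str, str]]:
    """Send the request and return a summary of the returned posts."""
    with urllib.request.urlopen(posts_request(site, user, app_password), timeout=30) as resp:
        return summarize(json.loads(resp.read().decode("utf-8")))
```

Run on a schedule, the links of newly published posts returned by `fetch_posts` could then be handed to whatever builds the signals.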

What do you think about Jasper as a tool for AI-generated content?

I can't say, because I haven't used it. I would recommend you try a few tools and make a judgement for yourself.

Which one works best for beginners?

It's a matter of personal preference entirely. I like PPC/Leadgen/CPA because I love monetization. Optimizing a page to extract the maximum possible revenue. Of course the payment is higher, but not all my visitors convert.

This issue is resolved when using a Display network - as you get paid per 1000 visitors, and don't have to worry about conversion etc.

You can also use both of them together.

For beginners, I'd recommend starting with AdSense to wet your beak. Then, as you begin scaling your site and getting more and more traffic, you can try signing up for a Display network or going the CPA/Leadgen way.

Thank you for creating the topic!

Can you share some courses on how to create a text generator AI that you think will work? I want to start building my own AI text generation and scraping models, but I don't know what to learn beyond a few lines of Python. There are so many instructions/courses out there and it is overwhelming.

"Creating a text generator AI" is extremely difficult - all the big companies are doing it.

Your approach is wrong. First learn python well. Then start working with models.

Once you learn the language, you'll start understanding sample code and what it does. You'll understand how to use a model to run inference for completely unique text, or how to scrape and paraphrase existing content.

Thank you for your advice. Now I know what I need to do. Big help.
 
Hey Fam,

So I completed 1500 posts and about 10 years on the forum as a member.

A lot of you know me from the Amazon Affiliate Site AMA I did around 4 years ago.

A lot has changed since then and like everyone else - I have adapted. For the past 2 years, I have been building Google PAA scraped sites, snippet sites, content enriched sites and largely concentrating on automation where possible and implementing more and more AI in the workflow.

About me

I own and operate a decent sized web services company in an Asian country.

I have been building sites for a decade now for clients and for myself. That is how I got started.

I was building sites back when Adsense was all the rage. I made a decent sized income then - enough to build a comfortable spot for me to start my company.

I was building sites when Buzzfeed was merely listicles and tracking viral content.

I was building sites when Scott launched Viralnova.com and over the course of a few months changed how the Internet consumed its information. Clickbait being the term here.

I was one of the few people to present a proof-of-concept to the OpenAI team and gain access to the AI Beta when it launched.

//

Right now...

As I said above, I now build websites using AI and scraping. I train my own AI NLP models to generate text on the fly and to generate context relevant content. I scrape all day, everyday.

I have tackled the hardest of niches for the longest of tails.

My sites have been shared on this forum, as well as Reddit and even a couple Russian forums (really proud of that last one).

You see a site you like, and think how you can build a similar enterprise. I see a site I like, and think how I can automate it.

I have been going around answering random questions on operations and optimization related to scraping, AI etc. Decided to go with an AMA.

So this is my second giving-back-to-the-community thread.

//

What this will not be?

I won't hold your hand. I won't be sharing my scripts or code (except maybe a little). I won't promote anything. I won't write your code for you.

//

What this will be?

I'll answer questions on how to do things: the best way to optimize information flow, and how to rapidly prototype a site and deploy it ASAP.

I'll help you debug your logic. I'll help you understand a topic that you think is alien to your thought process.

//


Please note

Like with everything else, all of the information is discretionary and tested by me and me alone. A lot of you may find success with my methods/answers or may not find success at all.

I am not building a get rich quick guide - I am a believer in the concept of your website being your business and how to treat it like one.

//

So if you have any questions on the above topics (or on topics related to them) - I'll be happy to answer them.

Let's keep it civil.

Bring on the questions, my body is ready. I have my trusty JW by my side and I'll check this thread in gaps of a few minutes/hours and try and answer questions as they come along.

What's your content strategy? Do you build out content silos and then focus on internal linking, since internal linking is known to build authority? Or do you just churn out articles day after day?

I do build out content Silos. The sites are automated, but there's a great amount of planning before a scraper is deployed.

For internal linking I use the Inline Related Posts plugin for now. As I said before, I am working on a solution to enable in-content linking, but that will take some time.
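
One possible shape for in-content linking, as a sketch only: link the first plain-text mention of each target phrase to its page. The phrase-to-URL map is a hypothetical input (for example, built from the pages in one silo), and the regex approach assumes simple HTML. It is not the solution being worked on here.

```python
import re


def add_incontent_links(html: str, targets: dict[str, str], max_links: int = 3) -> str:
    """Wrap the first eligible occurrence of each phrase in an <a> tag."""
    added = 0
    # Longest phrases first so "blood pressure chart" wins over "blood pressure".
    for phrase in sorted(targets, key=len, reverse=True):
        if added >= max_links:
            break
        url = targets[phrase]
        # Skip matches glued to a word character or directly touching a tag,
        # which also avoids re-linking text that is already inside an <a>.
        pattern = re.compile(rf"(?<![\w>])({re.escape(phrase)})(?![\w<])", re.IGNORECASE)
        html, count = pattern.subn(
            lambda m, url=url: f'<a href="{url}">{m.group(1)}</a>', html, count=1
        )
        added += count
    return html
```

A known limitation of this sketch is that a phrase sitting immediately after an opening tag (such as the first words of a paragraph) is skipped; a production version would more likely walk the parsed HTML than use a regex.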


Don't be sad bud. I apologise, I miss questions sometimes.

I have answered your query before already. I don't purchase links for my AI sites. I use automated social signals and build tech stack links (Amazon/Google etc.) and run them against an indexer.

As your site continues to grow, more and more new links automatically begin appearing.
 