[BlogPro AMA] 1500 Posts - Ask about PAA Sites / Data Enriched Sites / Job Boards / Scraping / AI / ML - Go Ahead!

BlogPro (joined Apr 23, 2012)
Hey Fam,

So I completed 1500 posts and about 10 years on the forum as a member.

A lot of you know me from the Amazon Affiliate Site AMA I did around 4 years ago.

A lot has changed since then and, like everyone else, I have adapted. For the past 2 years, I have been building Google PAA (People Also Ask) scraped sites, snippet sites and content-enriched sites, concentrating on automation wherever possible and bringing more and more AI into the workflow.

About me

I own and operate a decent sized web services company in an Asian country.

I have been building sites for a decade now for clients and for myself. That is how I got started.

I was building sites back when Adsense was all the rage. I made a decent sized income then - enough to build a comfortable spot for me to start my company.

I was building sites when Buzzfeed was merely listicles and tracking viral content.

I was building sites when Scott launched Viralnova.com and over the course of a few months changed how the Internet consumed its information. Clickbait being the term here.

I was one of the few people to present a proof-of-concept to the OpenAI team and gain access to the AI Beta when it launched.

//

Right now...

As I said above, I now build websites using AI and scraping. I train my own AI NLP models to generate text on the fly and to generate context-relevant content. I scrape all day, every day.

I have tackled the hardest of niches for the longest of tails.

My sites have been shared on this forum, as well as Reddit and even a couple Russian forums (really proud of that last one).

You see a site you like, and think how you can build a similar enterprise. I see a site I like, and think how I can automate it.

I have been going around answering random questions on operations and optimization related to scraping, AI etc. Decided to go with an AMA.

So this is my second giving-back-to-the-community thread.

//

What this will not be?

I won't hold your hand. I won't be sharing my scripts or code (except maybe a little). I won't promote anything. I won't write your code for you.

//

What this will be?

I'll answer questions on how to do things: the best way to optimize information flow, and how to rapidly prototype a site and deploy it ASAP.

I'll help you debug your logic. I'll help you understand a topic that you think is alien to your thought process.

//


Please note

Like with everything else, all of the information is discretionary and tested by me and me alone. A lot of you may find success with my methods/answers or may not find success at all.

I am not building a get-rich-quick guide - I believe your website is your business and should be treated like one.

//

So if you have any questions on the above topics (or on topics related to them) - I'll be happy to answer them.

Let's keep it civil.

Bring on the questions, my body is ready. I have my trusty JW by my side and I'll check this thread in gaps of a few minutes/hours and try and answer questions as they come along.
 
What types of automation do you mean by "I see a site I like, and think how I can automate it", bossman? I'm not in a position to start thinking about automating yet, as I'm only just getting to a good level to start developing.

Checking out my future with your thread ;)
 
Anything really - with minimal human intervention and a lot of output.

If you cannot automate, then at least draw out a replication plan.

For instance, you find a site in the Casino niche that's doing well.

Take a pen and paper and isolate everything you see on the site.

- Note "I need this" features on the site. Features that you've never seen before and would like to replicate. This could be an interesting Opt-in form, a calculation tool, a form, or a way they serve content.

- Note their content strategy.

- Plop the site into Ahrefs and see where/how they're ranking. Get all the keywords.

- Use a tool like https://sitechecker.pro/ or Semrush Site Audit to crawl the full site and understand how they're doing what they're doing.

Once armed with the above, begin your replication strategy.

How will you structure your pages? What will be the silo structure (if any)? What are the key data variants included in the content? How can you use the sidebar to enrich the site with more relevant data? How will the homepage look? Which keywords will you target? Which content technique will you implement?

Once done - you have a proper checklist for building the site.
 
My first question is about "Google Indexing"

How do you get your bulk content indexed in Google and Bing?

I see that Google does not index bulk article pages properly.

Bing even removes the full site after a few days.

Why? Is there any proven solution, or logic behind this problem?
 
Thanks for opening up an AMA.

What's the best way to start with PAA automated sites for an intermediate SEO?

I am assuming you know how to set one up.

After that, it mostly comes down to how content-enriched your site is beyond the PAA, plus your backlink profile, domain age, etc.

Also this is entirely niche dependent.

I primarily build sites on fresh domains. My reasoning is that domains are expendable. In the event of de-indexation or penalty, it takes me a few hours to relaunch the site in the same niche under a different domain.

As of today, I have 11 sites on expired domains, auction domains, or old abandoned sites that were just kicked on.

Did any of your sites get hit by the recent Google update because of the use of AI content?

I didn't lose any sites where I used AI content.

I did lose a few where it was primarily "PAA Only" or "Data only" from other sources.
 
My first question is about "Google Indexing"

How do you get your bulk content indexed in Google and Bing?

I see that Google does not index bulk article pages properly.

Bing even removes the full site after a few days.

Why? Is there any proven solution, or logic behind this problem?

Ask yourself - would a legit site make 200 new content posts every day?

I go for the slow and steady approach - no more than 8-10 posts a day, spread across various categories and enriched with high-quality AI content + whatever else I can find. I also incorporate Schema markup and use all on-page SEO best practices. I make heavy use of the Skyscraper technique as well.

Then, I send a huge variety of social signals (automated) and build traditional indexable links (profiles etc.) - again automated.
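
As a rough illustration of the drip-feed idea above (this is not the poster's actual script), here is a minimal Python sketch that assigns publish dates so each day gets 8-10 posts, rotating through categories. The function name `drip_schedule`, its parameters and the sample category names are all hypothetical.

```python
from collections import deque
from datetime import date, timedelta
import random


def drip_schedule(posts_by_category, start, min_per_day=8, max_per_day=10, seed=0):
    """Return (publish_date, category, title) tuples, 8-10 posts per day."""
    rng = random.Random(seed)  # seeded so the plan is reproducible

    # Interleave categories round-robin so each day's batch is mixed.
    queues = {cat: deque(titles) for cat, titles in posts_by_category.items()}
    ordered = []
    while any(queues.values()):
        for cat, queue in queues.items():
            if queue:
                ordered.append((cat, queue.popleft()))

    # Walk through the interleaved list, taking a random daily quota.
    schedule = []
    day = start
    index = 0
    while index < len(ordered):
        quota = rng.randint(min_per_day, max_per_day)
        for cat, title in ordered[index:index + quota]:
            schedule.append((day, cat, title))
        index += quota
        day += timedelta(days=1)
    return schedule


# Hypothetical usage: 30 posts across three made-up categories.
sample = {c: [f"{c}-{n}" for n in range(10)] for c in ("guides", "reviews", "faq")}
plan = drip_schedule(sample, date(2022, 6, 2))
```

Passing the start date explicitly as a `date` object, rather than as loose month numbers, is one way to make a wrong start month easier to spot before anything is scheduled.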

[Image: eyOxWTG]

This is one of my most recent experiments. The site is over a month old, but I set the start month to 6 instead of 5. So instead of scheduling the content from May, the schedule began in June, and the first posts went out on 2nd June.

This is one of my older sites - about 7 months old.

[Image: Gckz7Wy]

I have had mixed success with the Google Indexing API.

I know for sure @Sartre uses it with great success. You should definitely check out his journey here -

[Journey] 1 million UVs/month in 12 months using AI generated content. Let's do it!
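
For readers who have not used the Google Indexing API mentioned above, the sketch below only builds the JSON bodies for its `urlNotifications:publish` endpoint and does not send anything. Actually sending them needs an OAuth 2.0 token for a service account with the `https://www.googleapis.com/auth/indexing` scope, which is left out here. The helper name `build_notifications` and the `daily_cap` default are assumptions for illustration, so check your own project's quota.

```python
import json

# Publish endpoint of the Google Indexing API (v3).
INDEXING_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"


def build_notifications(urls, daily_cap=200, notification_type="URL_UPDATED"):
    """Return JSON request bodies for up to `daily_cap` unique URLs."""
    if notification_type not in ("URL_UPDATED", "URL_DELETED"):
        raise ValueError("notification_type must be URL_UPDATED or URL_DELETED")

    seen = set()
    bodies = []
    for url in urls:
        if url in seen:
            continue  # skip duplicates so no request is wasted
        seen.add(url)
        bodies.append(json.dumps({"url": url, "type": notification_type}))
        if len(bodies) == daily_cap:
            break  # stay under the (assumed) daily cap
    return bodies


# Hypothetical usage with placeholder URLs.
bodies = build_notifications(
    ["https://example.com/a", "https://example.com/a", "https://example.com/b"]
)
```

Each body would then be POSTed to `INDEXING_ENDPOINT` with a `Content-Type: application/json` header and the bearer token.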

 
Do you use your own AI or a public one?

If the latter, which one do you recommend?
 
I use a combination of a few custom fine-tuned GPT-3 models + GPT-NeoX (again fine-tuned).

Not all models are for text generation - if you have access to the resources you can train a model to do one very specific task on the site for every post - and do it really well, at that.
 
1. What programming language are you using to train your models? Does Python play a big part in your tech stack?

2. Are you using WordPress as your CMS? Or just plain HTML sites?

3. What other tools are you using to speed up your processes?

Very interested in what you have to say. Best regards
 
1. What programming language are you using to train your models? Does Python play a big part in your tech stack?

Python is the basis for my tech stack. From training to deploying, it is there all the way.

I am currently learning R, because the people I interact with for my ML/AI needs use it as their daily driver.

I come from a PHP background, so that helps with quick PoCs.

Given my background, I have a decent understanding of APIs and basic manipulation.

For front-end, it's mostly PHP and JavaScript.

2. Are you using WordPress as your CMS? Or just plain HTML sites?

Yes, I absolutely love WordPress. It helps with quick deployment, and it can be made fast if you know what you're doing.

I am also experimenting with static site generation as I write this.

3. What other tools are you using to speed up your processes?

I have servers at Vultr, DO and Linode.

I use Cloudflare, Namecheap PremiumDNS and ClouDNS for DNS management.

I use cPanel/WHM for my bigger sites.

For quick deployment, I use EasyEngine on an Ubuntu server with Redis Cache enabled.
 
How do you choose keywords, how do you acquire them automatically?
 
Ask yourself - would a legit site make 200 new content posts every day?

I go for the slow and steady approach - no more than 8-10 posts a day.

I know for sure @Sartre uses it with great success. You should definitely check out his journey here -

https://www.blackhatworld.com/seo/journey-1-million-uvs-month-in-12-months-using-ai-generated-content-lets-do-it.1360940/

I wish Google was so white hat and didn't benefit sites like midogguide(dot)com, which has 1,870,000 results indexed right now from my location. They didn't even paraphrase their stuff. These folks definitely didn't drip feed 10 posts/day :D

Imo, with indexing the big difference is the authority of your domain/backlinks.
 
Haha! I know, right... I am just trying to give the site an air of legitimacy - not just for Google but for my conversions - since I am not doing display ads.

That being said, I do have the Indexing API loaded for a site that is quite possibly my best AI creation to date.

22k published pages + 191K scheduled. Let's effin' go!
 
How much does it cost to run the custom AI models?

That's highly dependent on a lot of factors.

Something as simple as fine-tuning GPT-3 would cost you half of what you see on their pricing page. Fine-tuning tokens are charged at a 50% discount.
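
To make the arithmetic concrete, here is a small sketch of the "50% discount on fine-tuning tokens" calculation. The per-1K-token base price and token counts below are placeholders, not real quotes, and billing the training file once per epoch is an assumption to verify against the current pricing page. The function name `fine_tune_training_cost` is hypothetical.

```python
from decimal import Decimal


def fine_tune_training_cost(training_tokens, base_price_per_1k, epochs=1,
                            discount=Decimal("0.5")):
    """Estimate training cost: tokens x epochs, priced per 1K, minus discount."""
    billed_tokens = Decimal(training_tokens) * epochs  # assumed: billed per epoch
    full_price = (billed_tokens / 1000) * Decimal(base_price_per_1k)
    return full_price * (1 - discount)


# Placeholder numbers: 2M training tokens, 4 epochs, $0.06 per 1K base price.
estimate = fine_tune_training_cost(2_000_000, "0.06", epochs=4)
print(f"Estimated training cost: ${estimate:.2f}")
```

`Decimal` is used instead of floats so that small per-token prices do not pick up rounding noise.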

If you'd like to work with other models, you can set up the infrastructure yourself: download the available models, then train, test and deploy them.

You can also use a third party API like the Huggingface Inference API - https://huggingface.co/inference-api - that works really well in conjunction with HF's Autotrain - https://huggingface.co/autotrain

For topical clustering and text analysis there are many free and premium third-party APIs that deploy custom models in the back end and let you integrate with a simple request - such as Linguakit, spaCy or Twinword.

For working with models, you can use Eleuther's Mystic.ai - it costs $12.99 a month + $2 per GPU hour - it lets you access the following models -

GPT-Neo 1.3B
GPT-Neo 2.7B
GPT-Neo 125M
BART Large
GPT-J
GPT-2
GPT-2 Large
GPT-2 Medium
GPT-2 XL

Or check out goose.ai - which gives you GPT-NeoX at $0.002650/request.

It's very subjective, and depends on what your end goal is. Most other models are not able to generate entire texts unsupervised on their own - at least so far.
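
As a back-of-the-envelope example, the sketch below takes the per-request price quoted above at face value and multiplies it out for a large site. One request per page is an assumption (real pages may need several, and longer outputs may cost more), and the page count simply reuses the 22k published + 191K scheduled figures mentioned earlier in the thread. The function name `generation_cost` is hypothetical.

```python
from decimal import Decimal

# Per-request price as quoted in the thread; verify before budgeting.
PRICE_PER_REQUEST = Decimal("0.002650")


def generation_cost(pages, requests_per_page=1, price=PRICE_PER_REQUEST):
    """Estimate total generation cost for a number of pages."""
    return Decimal(pages) * requests_per_page * price


# 22k published + 191K scheduled pages, assuming one request each.
total = generation_cost(22_000 + 191_000)
print(f"Estimated cost: ${total:.2f}")
```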
 