How to get a website with 6,000,000 (!!!) pages indexed

Kakucis

So one of my colleagues has built a site with ~6,000,000 pages. It is basically a huge search engine for chemicals, with everything stored in a database along with descriptions, related articles, and suppliers.

So the question is: how do we get them all indexed? The domain was registered 3 years ago and Google has indexed about 30,000 pages.

I found that they have duplicate meta descriptions. The titles are unique.

There is no directory or category structure. It is just a huge search engine.

If I fix the duplicate meta description problem and the missing directory problem, and then start some massive link building to the main domain and some inner pages, how long will it take to get at least 10-20% of all pages indexed?

I think it will take forever, but the results could be impressive because of the quality content and the sheer number of pages. I think the PageRank of the inner pages alone could reach something like 4, or am I wrong?

I did some calculations: to crawl all the pages within half a year, Google would need to crawl about 23 pages per minute.
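For reference, here is the rough arithmetic behind that figure as a quick Python check (assuming a 180-day half year):

```python
# Pages Google would need to crawl per minute to cover the whole
# site in half a year (assumed here to be 180 days).
total_pages = 6_000_000
minutes_in_half_year = 180 * 24 * 60   # 259,200 minutes
pages_per_minute = total_pages / minutes_in_half_year
print(round(pages_per_minute, 1))      # ~23.1
```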

So where do I start, and why did Googlebot stop indexing? I think it was because of the duplicate descriptions and the bad interlinking.

What do you think of all this?

Sorry for my bad language. Not everyone is a native English speaker :)
 
2b) Build backlinks to different pages, not just to the main www domain.
 
They are already using Google Webmaster Tools, so that is not the most important thing.

Link building will increase the PageRank of the main page and only a few others, but how will the bot get through all the pages if the interlinking is bad?

Anything else?
 
Export the links table from the DB to Excel or Access and load the URLs in chunks into a pinging tool?
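A minimal sketch of that export step, assuming the URL list has been dumped into a SQLite file with a single table pages(url) (the file, table, and column names are placeholders); it writes 50,000-row CSV chunks that Excel, Access, or a pinging tool can handle:

```python
import csv
import sqlite3

# Assumed setup: the site's URLs dumped into pages.db with a table pages(url).
CHUNK_SIZE = 50_000

conn = sqlite3.connect("pages.db")
cursor = conn.execute("SELECT url FROM pages ORDER BY url")

chunk_no = 0
while True:
    rows = cursor.fetchmany(CHUNK_SIZE)
    if not rows:
        break
    chunk_no += 1
    # One CSV per chunk, small enough to load into a spreadsheet or pinging tool.
    with open(f"urls_chunk_{chunk_no:03d}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url"])
        writer.writerows(rows)

conn.close()
print(f"Wrote {chunk_no} chunk files")
```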
 
Submit RSS feeds and some social bookmarks for a portion of the pages.
Do that once a week, and in one month I think over 70% will be indexed.
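If you go the RSS route, the feed itself is easy to generate. Here is a sketch using only Python's standard library that turns a weekly batch of URLs into a bare-bones RSS 2.0 file you could submit to aggregators (the example.com URLs and feed metadata are placeholders):

```python
import xml.etree.ElementTree as ET

def build_rss(urls, feed_title="New chemical pages", site_url="http://example.com"):
    """Build a bare-bones RSS 2.0 feed listing the given page URLs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Recently added pages"
    for url in urls:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = url
        ET.SubElement(item, "link").text = url
    return ET.ElementTree(rss)

# Example: this week's batch of pages (placeholder URLs).
batch = [f"http://example.com/chemical/{i}" for i in range(1, 101)]
build_rss(batch).write("weekly_feed.xml", encoding="utf-8", xml_declaration=True)
```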
 
1) Submit *several* XML sitemaps to GWT (according to Google, "your sitemap can contain up to 50,000 URLs or reach a file size of 10MB, uncompressed") - see the sitemap-splitting sketch after this list.

2) Find a way to export the URLs pointing to the most important sections of your site (homepage, first levels) and divide them into several chunks. I would upload them to Blogger blogs, Squidoo pages, or other Web 2.0 properties frequently spidered by search engines. Ping those resources and send them some fresh links from forums, comment spam, etc. (and ping those too).

3) [optional] when your pages are indexed, you can remove those links.

If the internal linking of your site is well structured, this should ease the process.
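For point 1, here is a sketch of how the 6,000,000 URLs could be split into 50,000-URL sitemap files plus a sitemap index to submit to GWT (the file names and the input URL list are placeholders; in practice the URLs would come from the database export):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
URLS_PER_FILE = 50_000  # per-sitemap URL limit quoted above

def write_sitemaps(urls, base_url="http://example.com"):
    """Split URLs into <=50,000-entry sitemap files and write a sitemap index."""
    sitemap_files = []
    for i in range(0, len(urls), URLS_PER_FILE):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[i:i + URLS_PER_FILE]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        name = f"sitemap_{i // URLS_PER_FILE + 1:04d}.xml"
        ET.ElementTree(urlset).write(name, encoding="utf-8", xml_declaration=True)
        sitemap_files.append(name)

    # Sitemap index pointing at every chunk; this is the one file to submit.
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in sitemap_files:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base_url}/{name}"
    ET.ElementTree(index).write("sitemap_index.xml", encoding="utf-8", xml_declaration=True)

# Placeholder run with 120,000 fake URLs -> 3 sitemap files plus the index.
write_sitemaps([f"http://example.com/chemical/{i}" for i in range(1, 120_001)])
```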
 
Thanks for all the advice. I will try to put some of it into practice.

If you are interested, I can post progress reports from time to time.

If you have any other suggestions, let me know.

@elmbrent - 12k is 500 times less than 6,000,000 :p and I already have experience dealing with 7,000+ pages, so that is not enough.
 
I have a 100,000-page site, and the true index count (the number of pages shown when you click through to the last page of Google results, not the number on the first page) has gone up quite a bit since submitting my sitemap to Webmaster Tools. I haven't really built any links in a long time, and even then I only built about 10-20 to the main site and 1-2 for each of around 130 subcategories.

You either have to build links to individual pages or just wait until Google eventually covers them all.
 
How long did it take for Google to index all of your pages?
 