First, make sure that the pages are high quality, with much more original content than ads, decorative graphics, and so on. Second, make sure that the site's navigation links are properly formed and that there is a plain HTML <a> (anchor) tag somewhere on your site pointing to each page. It's always best to give a website's navigation a hierarchical structure: the home page links to the major category pages, which link to the sub-category pages, and so on. That is especially true for a site with 5 million pages. Keep in mind that the depth and frequency with which Google crawls a site depend largely on PageRank, so you need to ensure that sufficient PageRank ("link juice") flows throughout your site to get those millions of pages indexed. It's a good idea to build links to some of the internal pages of such a large site, instead of focusing only on the site's main page.
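To see why a shallow hierarchy matters at this scale, here is a quick back-of-the-envelope calculation. The 50-links-per-page fanout is just an assumption for illustration, not a recommendation:

```python
# Rough estimate: how many pages are reachable within `depth` clicks of the
# home page if every page links to `fanout` child pages (assumed fanout: 50).
def pages_within_depth(fanout, depth):
    # 1 home page + fanout + fanout^2 + ... + fanout^depth
    return sum(fanout ** d for d in range(depth + 1))

for depth in range(1, 5):
    print(depth, pages_within_depth(50, depth))
# With a fanout of 50, four clicks already cover over 6.3 million pages.
```

In other words, even a 5.6-million-page site can keep every page within four or five clicks of the home page, which is exactly the kind of structure that lets PageRank flow down to the deep pages.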
You may never get every single page of your site indexed. Whatever else you do, make sure the content is original and of generally good quality. Good luck!
Here are some steps that will help get deep internal pages indexed:
1) Make sure the page has significant unique content.
Search engines may visit low-quality pages but tend not to index them.
2) Increase Google toolbar usage
Google likes to discover pages based on user activity. If your page is great then simply publicize it and get people with Google toolbar to spend time on that page.
3) Boost unique internal linking
Add links from your other pages to the internal page. I'm not talking about adding them to the footer; I'm talking about unique links embedded in relevant content.
4) Submit sitemap
It doesn't hurt to submit a sitemap with all of your pages, though it doesn't guarantee they will all be indexed. It's easy to do, so why not?
5) Boost external links
Develop links from relevant external websites to your deep content. It is ideal to gain links from external websites that are likely to generate traffic.
P.S. Please remember that almost every website has some pages that just never get indexed. That's because almost every website has some very low-quality pages that don't need to be indexed; search engines aren't looking to index pages that add no value to their SERPs.
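As a concrete illustration of the sitemap step above, here is a minimal sketch of generating a sitemap file with the Python standard library. The URLs are made-up placeholders, and a real generator should also populate <lastmod> from actual page data:

```python
# Minimal sitemap generator (sketch). The example.com URLs are placeholders.
from xml.sax.saxutils import escape

def build_sitemap(urls):
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

xml = build_sitemap([
    "https://example.com/",
    "https://example.com/category/widgets",
    "https://example.com/category/widgets/item-1",
])
print(xml)
```

Once the file is generated, you submit it through the search engine's webmaster tools or reference it from robots.txt.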
I need help in understanding how to get deep pages on my site to index.
My site has over 5.6 million pages, and I've hit a wall trying to get them indexed.
Can someone help explain methods that can be used to get those pages to index?
If you have millions of pages on a website, you need a properly formed sitemap so that search engines can understand the hierarchy of your deep internal pages. Keep in mind that a single sitemap can contain at most 50,000 URLs, so a site this large needs a sitemap index pointing at multiple sitemap files.
This is the best way to tell search engines to crawl and index your deep pages.
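Following on from the 50,000-URL limit, a sitemap index is what ties the individual sitemap files together. A minimal sketch (the file names and the example.com domain are assumptions for illustration):

```python
# Sketch: work out how many sitemap files a large site needs (the sitemap
# protocol allows at most 50,000 URLs per file) and build the sitemap index
# that lists them. File names and domain are made-up examples.
import math

MAX_URLS_PER_SITEMAP = 50_000

def num_sitemaps_needed(total_urls):
    return math.ceil(total_urls / MAX_URLS_PER_SITEMAP)

def build_sitemap_index(num_sitemaps, base="https://example.com"):
    locs = "\n".join(
        f"  <sitemap><loc>{base}/sitemap-{i}.xml</loc></sitemap>"
        for i in range(1, num_sitemaps + 1)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{locs}\n"
        "</sitemapindex>"
    )

n = num_sitemaps_needed(5_600_000)
print(n)  # 112 sitemap files for a 5.6-million-page site
index_xml = build_sitemap_index(n)
```

You then submit just the index file, and the search engine discovers the individual sitemaps from it.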
When Google schedules its deep crawlers to index new URLs (including 301- and 302-redirected URLs), at first only the URLs, not the descriptions, appear in search engine results pages when you run a search.
The best methods for getting your deep pages indexed are the following, preferably in this order:
1. Create / Update your sitemap
2. Create quality content for your visitors
3. Encourage them to share, comment and recommend your content on those pages
4. Google will determine the usefulness of your pages by the number of shares, comments, and recommendations they have from other users, which will entice it to visit and crawl them regularly
Also check that your pages don't carry a robots meta tag with noindex or nofollow: there is no "do_follow" value, because indexing pages and following their links is already the default behaviour, so the meta robots tag only matters when it is blocking crawlers.
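A quick way to verify that a page's robots meta tag isn't blocking indexing is to parse the page and look for a "noindex" directive. A minimal sketch using only the Python standard library, with made-up sample HTML:

```python
# Sketch: detect a robots meta tag that blocks indexing. Sample HTML is
# made up; on a real site you would feed in the fetched page source.
from html.parser import HTMLParser

class RobotsMetaChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            if "noindex" in a.get("content", "").lower():
                self.noindex = True

def page_blocks_indexing(html):
    checker = RobotsMetaChecker()
    checker.feed(html)
    return checker.noindex

print(page_blocks_indexing('<meta name="robots" content="noindex,nofollow">'))  # True
print(page_blocks_indexing('<meta name="robots" content="index,follow">'))      # False
```

On a 5.6-million-page site, running a check like this over a sample of deep pages can quickly rule out an accidental sitewide noindex as the cause of the problem.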
By the way, are you using WordPress for your website, or is it a custom PHP job? If it's custom, I would take an in-depth look at how WordPress handles SEO and implement the same practices in your PHP code. It will help a lot!