I've been getting more into SEO for my site lately, and my main concern is Google, of course. I'm a little confused about exactly how their crawlers work. For instance, I'm using Google Webmaster Tools and seeing a vastly different number of pages crawled from day to day, none of which comes close to covering every page on the site, so I assume they limit the crawl in some fashion. I don't expect them to crawl every single "page" on the site every day, as there are probably a couple hundred thousand, possibly even a million at this point, so I'd like a way to point them towards the most important stuff.
So I guess a sitemap is the answer?
My entire site is custom code, so I don't have any sitemap-building software or anything like that. I know there are some free online generators out there, but they seem to severely limit the number of pages they will crawl. Besides, the point is that I want more control over things. For example, my "default" (home) page links to all of our articles, but so does my forum, so it's kind of pointless to crawl both. And although there may be some benefit to letting a crawler go through all of the reply pages of the forum, I'm more concerned about making sure that "page one" of each article, etc. is getting crawled.
I'm just a bit confused about where to start. And one main question I have is... can a sitemap LIMIT what Google crawls? This (from Google) seems to suggest otherwise:
"Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover URLs."
For instance, what I really want to do is just create a small map of the important links (I could probably modify my RSS feed, which currently has all of the newest news, reviews, etc.) and make sure Google gives them priority. But would Google stop there, or would it still crawl all of the stuff it used to? I don't want to lose other stuff (one of our biggest hits, oddly enough, was from an image posted by a user that ended up as the top image for "pokemon hoodie" searches, lol). The above piece of info from Google would seem to suggest I'd be OK doing my small sitemap because it would only "supplement" their current process, but I dunno.
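To make the idea concrete, here is roughly what I'm picturing: a minimal sketch that turns a short list of important URLs into a sitemap file in the sitemaps.org XML format. The function name `build_sitemap`, the example.com URLs, the dates, and the priority values are all placeholders I made up for illustration; in practice the list would come from whatever query already feeds my RSS. As far as I understand it, `<priority>` is only a hint relative to other URLs on the same site, not a guarantee of anything.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string.

    entries: iterable of (url, lastmod, priority) tuples, where lastmod is
    a YYYY-MM-DD string and priority is a float between 0.0 and 1.0.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical "page one" URLs; the real list would come from the
    # same source as the RSS feed.
    important = [
        ("https://example.com/articles/1", "2024-01-15", 0.9),
        ("https://example.com/reviews/7", "2024-01-14", 0.8),
    ]
    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(build_sitemap(important))
```

The output would be saved as something like sitemap.xml at the site root and submitted through the webmaster console. Anything not listed is simply left out of the file; nothing in it tells the crawler to skip those pages.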
What do you guys/girls think? I guess one way to find out is to just do it, and watch the crawler results over the next week or so very closely...