I'm trying to develop a tool that will let me assign a unique ID to a web page, even if the page doesn't have a unique ID built into it.
I can't just look at the URL, because I need to tell apart the different layouts that are possible within the same URL. For example, if the content of the page changes due to user input, I need to be able to distinguish the page with the new layout from the original page.
I tried stripping the dynamic content from the page and generating a hash from the remaining source code, but the IDs I obtained still changed when the dynamic content changed. I suppose this is because I was stripping all content from container objects (divs etc.), but container objects can themselves be added as dynamic content, which changes the hash!
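To illustrate the problem, here is a minimal Python sketch of that kind of approach. It is not my actual code: the names `SkeletonParser` and `skeleton_hash` are made up for this example, and it assumes that "stripping the content" means keeping only the tag structure and dropping all text and attributes.

```python
import hashlib
from html.parser import HTMLParser


class SkeletonParser(HTMLParser):
    """Hypothetical helper: records only tag names, dropping text and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes are deliberately ignored so that only structure is kept.
        self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")


def skeleton_hash(html):
    """Return a SHA-256 hex digest of the page's tag skeleton."""
    parser = SkeletonParser()
    parser.feed(html)
    parser.close()
    skeleton = "".join(parser.parts)
    return hashlib.sha256(skeleton.encode("utf-8")).hexdigest()


original = "<html><body><div id='main'><p>Hello</p></div></body></html>"
new_text = "<html><body><div id='main'><p>Goodbye</p></div></body></html>"
new_container = (
    "<html><body><div id='main'><p>Hello</p><div>Added</div></div></body></html>"
)

# Changing only the text inside a container leaves the hash unchanged.
print(skeleton_hash(original) == skeleton_hash(new_text))
# A dynamically added container changes the skeleton, and so the hash.
print(skeleton_hash(original) == skeleton_hash(new_container))
```

With this sketch, a change in text alone gives the same ID, but as soon as the dynamic content brings in a new div the ID is different, which is the behaviour I'm seeing.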
Any ideas on how to go about this would be greatly appreciated! Thanks.