mich_dev
04-27-2010, 09:22 AM
Apologies, I'm not sure what to call this tool so bear with me. :confused:
We have an unusual task that requires us to identify the active files on an aging web server (well, many 'mostly' mirrored web and app servers) and then add that content to a source control solution and archive / clean up the rest of the garbage.
We attempted to parse the log files but found that they didn't have the depth we needed: the servers are in an automotive environment and the files could date back over a decade, while the log files we have access to are a couple of months old at best, and getting access to older ones would be difficult, if possible at all. We then wrote a tool that simply checked the last-accessed date/time of every file, but that proved unreliable: files we know are used daily showed last-access dates of months or years ago.
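For reference, the last-access check described above can be sketched roughly as follows. This is a minimal illustration, not the tool we actually wrote; the function name `stale_files` and the `days` threshold are made up for the example.

```python
import os
import time
from pathlib import Path


def stale_files(root, days):
    """Yield (path, last_access_epoch) for files under root not accessed in `days` days.

    Hypothetical helper for illustration only.
    """
    cutoff = time.time() - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                atime = path.stat().st_atime  # last-access time reported by the OS
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                yield path, atime
```

Something like `for path, atime in stale_files("/path/to/webroot", 90): print(path, time.ctime(atime))` would list the candidates. One possible reason for the stale dates is that last-access updates can be switched off at the filesystem level (for example a `noatime` mount on Linux, or the NTFS last-access setting on Windows), in which case the timestamp says nothing about real usage.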
The next idea that came to mind was a client-side tool like Web Devil that would download every file reachable on the site (trace each link, download all of the HTML content and related binaries, and keep the same folder structure). The problem is that it would miss anything assembled on the server side (includes, common headers / footers, scripts, etc.).
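To show why the client-side approach falls short, here is a rough sketch of the link-tracing step using only the Python standard library. It is not how Web Devil works internally (I don't know that); `LinkCollector` and `extract_links` are names invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from href/src attributes (illustrative only)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every href/src target found in the delivered HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A crawler built this way only ever sees the markup the server sends back. A shared header pulled in by a server-side include, or a script that builds the page, has already been merged into that output, so the file behind it never shows up as a link to follow.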
So the question is: does anyone know of a server-side tool that would let me log all of the active files used by a web site?
Any help would be greatly appreciated.
Thanks