help converting Word to html CLEANLY...


glennnphp
04-28-2007, 12:06 PM
In developing a website, I've been given about 50 Word files that my client wants turned into web pages. I've NEVER had success doing a simple conversion. Is there any good way to convert these .doc files to HTML without total destruction of the layout?

anyone?

many thanks,
Glenn

WebJoel
04-28-2007, 01:18 PM
Google is your friend. :)
I Googled "convert msword to html" and found some software that does this, but none of them are free. :( Cost ranges from $49.00 to around $100.00, and higher.
(I feel your pain... I've hand-converted pages of msword to html. It's not too fun).

glennnphp
04-28-2007, 01:22 PM
Google has been my friend for years. Those apps don't do any better than Word's "Save As" HTML...

thanks, tho

WebJoel
04-28-2007, 01:54 PM
-That's what I figured... :cool:

Fang
04-28-2007, 01:56 PM
Give OpenOffice (http://www.openoffice.org/) a try; it strips out all the MS tags and moves most of the styling into an embedded stylesheet.
The quality of the cleaned document is dependent on the writer of the original.
I find that few people know how to 'write' a semantically correct document in Word.

Charles
04-28-2007, 06:06 PM
OpenOffice is wonderful in its own right but I'm not impressed with the HTML it produces. Myself, I'd just use Word to save each document as "HTML" and then use HTML Tidy (http://tidy.sourceforge.net/) to convert that to actual HTML. It has a special "from MS Word" mode. And if that doesn't get you where you need to be, I would use HTML Tidy (http://tidy.sourceforge.net/) to convert the "HTML" to XHTML and then use Xalan (http://xml.apache.org/xalan-j/) and XSLT to generate the HTML. I'm a bit weird, but I would then use FOP (http://xmlgraphics.apache.org/fop/) with XSL-FO and generate printer-friendly PDF versions as well.
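
For around 50 files, the Tidy step is worth scripting. Below is a rough Python sketch, not a tested recipe: it assumes the `tidy` command-line program is installed and on the PATH, and that the Word exports sit in one folder. The helper names (`tidy_command`, `tidy_folder`) and the `clean` output folder are made up for illustration. `--word-2000 yes` is Tidy's Word cleanup option, and `-asxhtml` asks for XHTML output in case an XSLT pass follows.

```python
import subprocess
from pathlib import Path


def tidy_command(src, dest, xhtml=False):
    """Build an HTML Tidy command line for one Word-exported HTML file.

    Hypothetical helper: it only assembles the argument list, it does not run Tidy.
    """
    cmd = [
        "tidy",
        "-q",                  # suppress nonessential output
        "--word-2000", "yes",  # Tidy's "from MS Word" cleanup mode
        "--clean", "yes",      # replace presentational markup with CSS rules
        "-o", str(dest),       # write the result to dest instead of stdout
    ]
    if xhtml:
        cmd.append("-asxhtml")  # emit XHTML, e.g. as input for an XSLT step
    cmd.append(str(src))
    return cmd


def tidy_folder(folder, xhtml=False):
    """Run Tidy over every .htm/.html file in folder, writing into folder/clean."""
    folder = Path(folder)
    out_dir = folder / "clean"  # assumed output location
    out_dir.mkdir(exist_ok=True)
    for src in sorted(folder.glob("*.htm*")):
        dest = out_dir / src.name
        # Tidy exits with 1 for warnings and 2 for errors, so check=True is avoided.
        result = subprocess.run(tidy_command(src, dest, xhtml=xhtml))
        if result.returncode >= 2:
            print(f"tidy reported errors for {src.name}")
```

Calling `tidy_folder("exports")` would then process every export in a folder named `exports` (also an assumed name). The output will still need a look by hand; how clean it gets depends on how the Word documents were written.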

Major Payne
04-28-2007, 11:59 PM
Might try this online tool and see if it works well enough:
Textism: Word HTML Cleaner (http://textism.com/wordcleaner/)

Ron

felgall
04-29-2007, 12:40 AM
The simplest and only way of getting clean HTML is

1. Save the Word document as plain text.
2. Load that plain text into a web editor.
3. Use the web editor to insert the appropriate HTML around the text.
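
If the documents are mostly running text, step 3 can be partly automated. Here is a minimal Python sketch that assumes each non-empty line of the exported text file is one paragraph (an assumption about the export; check your own files). The function name `text_to_html` is invented for this example, and headings, lists and tables would still have to be marked up by hand.

```python
import html


def text_to_html(text):
    """Wrap every non-empty line of plain text in <p> tags.

    Assumes one paragraph per line; special characters are escaped
    so that &, < and > in the text do not break the markup.
    """
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines between paragraphs
            parts.append(f"<p>{html.escape(line)}</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "Fish & chips\n\nA < B\n"
    print(text_to_html(sample))
```

The result is only a body fragment, so it would still be pasted into a page template in the web editor.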

NogDog
04-30-2007, 01:43 AM
From within Word itself, you can save a file as HTML while avoiding all the MSWord-specific markup by doing a "Save As" and specifying "Web Page, Filtered" as the file type. It's still probably going to be pretty ugly mark-up, but it's a lot better than just saving it as a "Web Page". At the very least, it would provide a better baseline from which to try Charles's idea of using HTMLTidy.