www.webdeveloper.com

Thread: MS-SQL function for removing Word-generated HTML

  1. #1
    Join Date
    Dec 2002
    Location
    St. Louis, MO, USA
    Posts
    1,582

    MS-SQL function for removing Word-generated HTML

    Has anyone seen, or yet created, an MS-SQL UDF that will strip out Microsoft Word-generated HTML from a string?

    I need something that will remove that garbage and leave proper HTML code alone.

    Thanks,

  2. #2
    Join Date
    Jul 2010
    Location
    /ramdisk/
    Posts
    865
    "I need something that will remove that garbage and leave proper HTML code alone."
    Who doesn't?

    You can TEST the HTML as xhtml by doing this:

    -- CAST raises an error if the string is not well-formed XML
    SELECT TOP (1) CAST(html AS xml) FROM [table]

    If it fails, it's not valid XHTML. If it passes, the markup is at least well-formed. Unfortunately that's all that comes to mind as far as an easy solution goes. You can try parsing the HTML yourself, using the self-closing or end tags to signal that an element has closed, and more specifically that it is a valid element or one from a list of valid elements. If you go this route I would strongly consider doing a depth-first search; a broken parent DOM element will likely mean the child DOM element is also going to be parsed as broken.
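
    The hand-rolled parsing idea could look roughly like the sketch below. It is written in Python with only the standard library, not T-SQL, so treat it as a reference for the logic rather than a ready-made UDF. The names `WordHtmlCleaner` and `clean_word_html` and the contents of the allow-lists are assumptions made up for illustration. Tags outside the allow-list (such as Word's `o:p` and `span`) are unwrapped, `style`/`xml`/`head` subtrees are dropped along with their children, and a stack of open tags is used to close elements in the right order.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allow-lists for illustration; adjust them to your own content.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
VOID_TAGS = {"br"}
ALLOWED_ATTRS = {"href"}
# Elements whose whole subtree is discarded (Word styles and metadata).
DROP_SUBTREE = {"style", "script", "xml", "head"}


class WordHtmlCleaner(HTMLParser):
    """Hypothetical cleaner: keeps allow-listed tags, drops the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of cleaned markup
        self.open_tags = []  # stack of tags we emitted and still must close
        self.skip_depth = 0  # > 0 while inside a dropped subtree

    def handle_starttag(self, tag, attrs):
        if tag in DROP_SUBTREE:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unwrap: the tag is dropped, its text is kept
        kept = ""
        for name, value in attrs:
            if name in ALLOWED_ATTRS:
                kept += ' {}="{}"'.format(name, escape(value or ""))
        if tag in VOID_TAGS:
            self.out.append("<{}{} />".format(tag, kept))
        else:
            self.out.append("<{}{}>".format(tag, kept))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_SUBTREE:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray end tag, or one we never emitted
        # Close everything opened after `tag`, then `tag` itself.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append("</{}>".format(open_tag))
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_word_html(markup):
    cleaner = WordHtmlCleaner()
    cleaner.feed(markup)
    cleaner.close()
    # Close anything the input left open.
    while cleaner.open_tags:
        cleaner.out.append("</{}>".format(cleaner.open_tags.pop()))
    return "".join(cleaner.out)


if __name__ == "__main__":
    sample = '<p class="MsoNormal">Hello <o:p></o:p><b>world</b></p>'
    print(clean_word_html(sample))  # <p>Hello <b>world</b></p>
```

    Comments, including Word's `<!--[if gte mso 9]>` blocks, are dropped simply because the sketch never emits them. An allow-list is used rather than a list of known Word tags on the assumption that it is easier to enumerate the markup you want than the garbage you don't.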

    Look around for XPath forums or threads. This is an issue that people using XPath will have, so you might be able to find something that has already been created. Beautiful Soup for Python already builds a parse tree, and from what I hear it's amazing. If you know Python you might try translating it (and sharing it!)

  3. #3
    Join Date
    Dec 2002
    Location
    St. Louis, MO, USA
    Posts
    1,582
    Thanks for the suggestions, eval. The XPath and Beautiful Soup ideas are appealing. I do NOT know Python, but I can learn. And if there is a way to translate it into SQL, I'll do my best.

    Thanks,
