
Regular Expression to remove HTML tables

September 12th, 2006

Just figured out this regular expression to remove all table tags from an HTML document (the cell contents stay in place):

</?table[^>]*>|</?tr[^>]*>|</?td[^>]*>|</?thead[^>]*>|</?tbody[^>]*>

Extremely useful for cleaning up prehistoric mark-up with a text editor that supports regular expression find-and-replace searches.

And to go all the way, this one removes font tags too:

</?table[^>]*>|</?tr[^>]*>|</?td[^>]*>|</?thead[^>]*>|</?tbody[^>]*>|</?font[^>]*>
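
For anyone who would rather script the clean-up than run it in an editor, here is a minimal Python sketch of the same find-and-replace. The pattern is copied unchanged from the line above. The function name `strip_tags`, the sample string, and the case-insensitive flag are assumptions added for illustration, not part of the original tip.

```python
import re

# Pattern copied from the post: opening or closing table-related and font tags.
TABLE_AND_FONT_TAGS = re.compile(
    r"</?table[^>]*>|</?tr[^>]*>|</?td[^>]*>|</?thead[^>]*>|</?tbody[^>]*>|</?font[^>]*>",
    re.IGNORECASE,  # assumption: old mark-up often uses upper-case tag names
)


def strip_tags(html: str) -> str:
    """Delete every matched tag and keep the text between the tags."""
    return TABLE_AND_FONT_TAGS.sub("", html)


# Hypothetical sample input, made up for this example.
sample = '<TABLE border="1"><tr><td><font size="2">Hello</font></td></tr></table>'
print(strip_tags(sample))  # prints: Hello
```

Replacing each match with an empty string is the scripted equivalent of leaving the "replace with" field blank in an editor. Tags the pattern does not list, such as `<p>`, pass through untouched.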

  1. E_Jim
    January 8th, 2008 at 20:30 | #1

    Thanks a lot, you have no idea how much time you’ve saved me!

  2. kjdash
    March 11th, 2008 at 15:04 | #2

    Pretty good, but missing a few things:


    ]*>|]*>|]*>|]*>|]*>|]*>|]*>|]*>

    would be more complete.
    Thanks for the foundation

  3. kjdash
    March 11th, 2008 at 15:06 | #3

    ugh, it stripped everything else out.

    I added th and tfoot

  4. llll
    April 10th, 2008 at 07:04 | #4

    Great code

  5. Mark
    October 19th, 2008 at 05:32 | #5

    you’re a legend, thanks a lot for this, saved me a fair bit of time!

    cheers.
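
Comment #3 mentions adding th and tfoot, but the pattern posted in comment #2 lost its tag names, so its exact text cannot be recovered. The Python sketch below is one way such an extension could look. It is not the commenter's pattern. The tag list, the names `TAGS`, `PATTERN` and `strip_table_markup`, and the `\b` word boundary are all assumptions made for this example.

```python
import re

# Tag names from the post, plus the th and tfoot mentioned in comment #3.
TAGS = ["table", "thead", "tbody", "tfoot", "tr", "th", "td", "font"]

# One alternation instead of eight repeated fragments. The \b (an addition,
# not in the original pattern) stops "tr" from also matching longer tag
# names such as <track>.
PATTERN = re.compile(r"</?(?:" + "|".join(TAGS) + r")\b[^>]*>", re.IGNORECASE)


def strip_table_markup(html: str) -> str:
    """Remove the listed tags and leave their contents in place."""
    return PATTERN.sub("", html)


# Hypothetical sample input, made up for this example.
sample = (
    "<table><thead><tr><th>Name</th></tr></thead>"
    "<tfoot><tr><td>Total</td></tr></tfoot></table>"
    '<track src="a.vtt">'
)
print(strip_table_markup(sample))  # prints: NameTotal<track src="a.vtt">
```

Building the pattern from a list keeps it short and makes adding another tag a one-word change. The original pattern has no word boundary, so `</?tr[^>]*>` would also delete a `<track>` tag, which rarely matters in old table-based mark-up.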
