HTML Tidy is a computer program and library that fixes invalid HTML and gives the source code a consistent layout (indent style).
It was developed by Dave Raggett of the W3C and later became a SourceForge project. Its source code is written in ANSI C for maximum portability, and precompiled binaries are available for a variety of platforms. It is available under the W3C license (a permissive, BSD-style license).
Examples of what it is able to do:
- Fixing missing or mismatched end tags and mixed-up tags
- Adding missing items (some tags, quotes, ...)
- Reporting proprietary HTML extensions
- Changing the layout according to a predefined style
- Transforming characters from some encodings into HTML entities
- Cleaning up presentational markup
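A few of these repairs can be illustrated with a small, self-contained sketch. This is not HTML Tidy's algorithm or API; it is a toy built on Python's standard `html.parser` module, and the names `RepairSketch`, `tidy_sketch`, and `VOID_TAGS` are hypothetical. It assumes a very simple repair policy: quote every attribute value, close any element left open, drop stray end tags, and write non-ASCII characters as numeric entities. Comments and doctype declarations are simply discarded.

```python
import html
from html.parser import HTMLParser

# Elements that never take an end tag (a partial, illustrative list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class RepairSketch(HTMLParser):
    """Toy repairer: quotes attributes, closes open tags, escapes text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments, joined at the end
        self.open_tags = []  # stack of elements still awaiting an end tag

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # Always emit double-quoted values; bare attributes stay bare.
            parts.append(name if value is None
                         else f'{name}="{html.escape(value)}"')
        self.out.append("<" + " ".join(parts) + ">")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: drop it
        # Close everything opened after `tag`, then `tag` itself.
        while True:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        # Escape markup characters, then turn non-ASCII into numeric entities.
        escaped = html.escape(data, quote=False)
        self.out.append(
            escaped.encode("ascii", "xmlcharrefreplace").decode("ascii"))

    def finish(self):
        # Supply end tags for anything still open at end of input.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def tidy_sketch(markup):
    parser = RepairSketch()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.finish()


print(tidy_sketch("<p>Hello <b>world<i>!</p>"))
# prints <p>Hello <b>world<i>!</i></b></p>
```

A real cleaner such as HTML Tidy applies far more knowledge of HTML than this stack-based policy, for example which elements may legally nest inside which others.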
JTidy
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, which lets you use JTidy as a DOM parser for real-world HTML.
JTidy was written by Andy Quick, who later stepped down as maintainer. JTidy is now maintained by a group of volunteers.
More information on JTidy can be found on the JTidy SourceForge project page.