- Fix various missing end tags and mismatched tags
- Also add closing slash to various empty tags (<br> => <br/>, <input> => <input/>...)
- Replace some html entities with the actual UTF-8 characters. This looks nicer and makes it easier to run xmllint on the output.
Details
- Reviewers: cmatthews, kristof.beyls
- Commits: rL306360: Fix missing/mismatched html tags
Diff Detail
- Repository: rL LLVM
Event Timeline
It would be wonderful if we could programmatically check these during testing. Thanks Matthias!
I had to make the tags XML-compliant in the daily report page to make testing its content in tests/server/ui/V4Pages.py possible using the Python built-in xml parser (see functions check_nr_machines_reported and get_xml_tree).
So, in other words, I think checking this programmatically would probably be easy and boil down to parsing every HTML page using the Python built-in xml parser, see function get_xml_tree pointed to in the sentence above.
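As a rough illustration of the idea in the comment above, here is a minimal sketch of a well-formedness check built on Python's built-in `xml.etree.ElementTree`. The helper name `check_well_formed` and the sample page bodies are hypothetical and not taken from LNT; a real test would fetch the pages from the test server the way `get_xml_tree` does.

```python
import xml.etree.ElementTree as ET


def check_well_formed(pages):
    """Return {page name: parser error} for every page that is not well-formed XML.

    `pages` maps a page name to its markup as a string (hypothetical input shape).
    """
    problems = {}
    for name, text in pages.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as exc:
            # Keep the parser message; it includes the line and column of the failure.
            problems[name] = str(exc)
    return problems


# Hypothetical page bodies, for illustration only.
pages = {
    "ok": "<html><body><p>fine<br/></p></body></html>",
    "unclosed_br": "<html><body><p>broken<br></p></body></html>",
    "entity": "<html><body><p>a&nbsp;b</p></body></html>",
}
problems = check_well_formed(pages)
for name, message in sorted(problems.items()):
    print(name, "->", message)
```

The `entity` case fails because a plain XML parser does not know HTML named entities such as `&nbsp;`, which is one reason replacing them with literal UTF-8 characters helps.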
Adding the tests would be easy (you could easily tweak the V4Pages script to check not just the presence of pages but also their well-formedness). Unfortunately, I had to give up the approach of validating XML for now:
The flask WTForms stuff only outputs HTML5 and has no option to output XHTML (at least none I could find in my web searches that doesn't require a lot of hackery), so it produces <input> elements that aren't properly closed according to XML rules.
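To make the problem concrete, the sketch below feeds two hand-written form fragments to Python's built-in XML parser. The fragments are hypothetical stand-ins for form markup, not actual WTForms output, and `parses_as_xml` is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment):
    """Return True if the fragment is well-formed XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# HTML5 style: <input> is a void element and carries no closing slash.
html5_form = '<form><input type="text" name="q"></form>'
# XHTML style: the same element, explicitly self-closed.
xhtml_form = '<form><input type="text" name="q"/></form>'

html5_ok = parses_as_xml(html5_form)
xhtml_ok = parses_as_xml(xhtml_form)
print("HTML5 form parses as XML:", html5_ok)
print("XHTML form parses as XML:", xhtml_ok)
```

Both fragments are valid HTML5, but only the self-closed one is well-formed XML, so any page containing an HTML5-style form would fail a strict XML check.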
For the record: I just added an optional integration to pytidylib/tidy-html5 that checks lnt pages for html problems (r312061). It can be used with lnt -Dtidylib=1.
Very nice! Thanks for all the cleanups and improvements you've been making to LNT lately!
I guess there's a chance tidylib might be better than just aiming to parse XHTML as proper XML, as it may go beyond what an XML validator is capable of, by being written specifically for HTML?
I would have preferred simpler xml validation too; in fact, that is what I tried first. In the end I failed with that approach because the WTForms library that we use only outputs HTML5 and has no way to produce XHTML, thus resulting in unclosed <input> tags without a good way to fix it. So various pages with forms on them will fail xml validation right now.
So the best thing I could find was the tidylib/tidy-html5 combination.