html5lib is a standards-compliant, pure-Python library for parsing and serializing HTML documents and fragments. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers. License: MIT. Source: https://github.com/html5lib/html5lib-python
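As a rough illustration of the parsing and serializing described above, the sketch below parses a malformed document and a fragment, then serializes the document back to HTML. It assumes html5lib is installed and uses its default `etree` tree builder. The sample markup and variable names are made up for this example.

```python
import html5lib

# Namespace that html5lib assigns to HTML elements by default.
XHTML = {"h": "http://www.w3.org/1999/xhtml"}

# Parse a full document. The missing <html>, <head> and <body> elements
# and the unclosed <b> are recovered as the HTML specification prescribes.
doc = html5lib.parse("<p>Hello <b>world")

# With the default "etree" tree builder the result is an
# xml.etree.ElementTree element, so the usual ElementTree API applies.
paragraph = doc.find("h:body/h:p", XHTML)
paragraph_text = "".join(paragraph.itertext())

# Parse a fragment (markup that is not a complete document).
fragment = html5lib.parseFragment("<li>one<li>two")
item_count = len(list(fragment))

# Serialize the parsed document back to an HTML string.
html = html5lib.serialize(doc, tree="etree")
print(paragraph_text)
print(item_count)
print(html)
```

The `tree="etree"` argument to `serialize` should match the tree builder used for parsing. Other tree builders, such as `dom` or `lxml`, can be selected with the `treebuilder` argument to `parse`.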