commit | 243cbc9c40e7022f35a0a108f488ff30bb735061 | |
---|---|---|
author | John Messerly <jmesserly@google.com> | Thu Mar 05 15:05:19 2015 -0800 |
committer | John Messerly <jmesserly@google.com> | Thu Mar 05 15:05:19 2015 -0800 |
tree | 883600ed9b869bb9ce28f2b8fdfe884f6e8018a6 | |
parent | a58a0929c729ba5288d20ec6a5797f50658a68f4 | |
remove most string concat, fixes #7

This fixes most of the egregious cases of string concat in the HTML tokenizer:

* attribute names/values
* data/comments
* script/rawtext/rcdata

It also fixes adjacent text nodes in TreeBuilder, which happens whenever space characters are adjacent to characters. There's still an issue of preserving char codes for longer, but this should get out of the O(N^2) at least.

R=sigmund@google.com

Review URL: https://codereview.chromium.org//987433005

This is a pure Dart HTML5 parser. It's a port of html5lib from Python. Since it's 100% Dart, you can use it safely from a script or server-side app.
Eventually the parse tree API will be compatible with dart:html, so the same code will work on the client and the server.
Add this to your pubspec.yaml (or create it):

    dependencies:
      html: any

Then run the Pub Package Manager (comes with the Dart SDK):
pub install
Parsing HTML is easy!

    import 'package:html/parser.dart' show parse;
    import 'package:html/dom.dart';

    main() {
      var document = parse(
          '<body>Hello world! <a href="www.html5rocks.com">HTML5 rocks!');
      print(document.outerHtml);
    }

You can pass a String or a list of bytes to parse. There's also parseFragment for parsing a document fragment, and HtmlParser if you want more low-level control.
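
The three entry points mentioned above can be sketched side by side. This is a minimal illustration, not part of the original README: the helper names linkTargets, listItems and countParseErrors are hypothetical, and it assumes the package exposes querySelectorAll on documents and fragments and an errors list on HtmlParser. The byte input is kept ASCII-only so no encoding needs to be specified.

```dart
import 'package:html/parser.dart' show parse, parseFragment, HtmlParser;

/// Hypothetical helper: collects the href of every <a> element in [html].
List<String> linkTargets(String html) {
  var document = parse(html);
  var targets = <String>[];
  for (var a in document.querySelectorAll('a')) {
    var href = a.attributes['href'];
    if (href != null) targets.add(href);
  }
  return targets;
}

/// Hypothetical helper: parses [html] as a fragment (no <html>/<body>
/// wrapper is required) and returns the text of each <li>.
List<String> listItems(String html) {
  var fragment = parseFragment(html);
  return fragment.querySelectorAll('li').map((li) => li.text).toList();
}

/// Hypothetical helper: uses HtmlParser directly so the parser object,
/// and the errors it recorded, stay available after parsing.
int countParseErrors(String html) {
  // `new` is optional in Dart 2 and later.
  var parser = new HtmlParser(html);
  parser.parse();
  return parser.errors.length;
}

void main() {
  print(linkTargets(
      '<body>Hello world! <a href="www.html5rocks.com">HTML5 rocks!'));
  print(listItems('<li>one</li><li>two</li>'));

  // A list of bytes works too. For ASCII-only text, codeUnits has the
  // same values as the encoded bytes.
  print(parse('<p>bytes</p>'.codeUnits).outerHtml);

  print(countParseErrors('<p>Unclosed <b>bold'));
}
```

Reaching for HtmlParser only when you need the parser object itself (for example, to inspect errors) keeps the common case down to a single parse or parseFragment call.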
./test/run.sh