commit | f940f823fabad08b85c2d4fa9a76051d636eab99 | |
---|---|---|
author | Sam Rawlins <srawlins@google.com> | Thu Dec 10 14:06:35 2020 -0800 |
committer | GitHub <noreply@github.com> | Thu Dec 10 14:06:35 2020 -0800 |
tree | 32abcadcb5c7d26c24e8dd60b96504f8ed3f647b | |
parent | 6f89681d59541ddb1cf3a58efbdaa2304ffc3f51 | |
Overhaul link and emphasis resolution (#345)

* Overhaul link and emphasis resolution

  Resolution of complex link and emphasis text follows very specific rules which were incompatible with the current TagState stack. The new algorithms follow the process outlined in the [CommonMark spec](https://spec.commonmark.org/0.29/#an-algorithm-for-parsing-nested-emphasis-and-links).

  The crux of the issue which required such an overhaul is that the current TagState stack did not include any ability to wait to parse a tag's inner text until it was known that a tag could be closed at the current position, then parse that inner text, then close the tag.

  This unfortunately requires a breaking change for downstream packages which subclass TagSyntax.

  * BREAKING: TagSyntax constructor no longer takes an `end` parameter. TagSyntax no longer implements `onMatchEnd`. Instead, TagSyntax implements a method called `close` which creates and returns a Node, if a Node can be created and closed at the current position. If the TagSyntax instance cannot create a Node at the current position, the method should return `null`. Some TagSyntax subclasses will unconditionally create a tag in `close`, while others may be unable to, such as LinkSyntax, if an inline or reference link could not be resolved.
  * Loosely, the stack of TagStates is replaced with a stack of Delimiters and a tree of parsed HTML nodes.
  * Emphasis and strong emphasis, link and image open delimiters are handled with the "look for link or image" and "process emphasis" algorithms.
  * We combine adjacent text in a more intentional, and likely more efficient, manner.
  * The _DelimiterRun class is replaced with three classes: abstract Delimiter and its subclasses SimpleDelimiter and DelimiterRun.

  These changes result in no new spec failures. Emphasis compliance rises from 96% to 99%. Link compliance rises from 90% to 93%. Total CommonMark compliance rises from 93% to 94%. Total GFM compliance rises from 92% to 93%.

* documentation and simplification
* Fix test
* revert gitignore
* bump to 4.0.0-dev
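
The idea of waiting to close a tag until a closer is known to exist, as described in the commit message, can be illustrated with a toy sketch. This is not the package's implementation: `renderEmphasis` is a hypothetical helper that handles only single `*` delimiters, ignores the CommonMark flanking rules, and works on plain strings rather than a node tree. It only shows an opener sitting on a stack as literal text until a matching closer arrives, at which point everything after the opener is wrapped.

```dart
/// Toy illustration only (hypothetical helper, not part of package:markdown).
/// Treats every '*' as a potential emphasis delimiter.
String renderEmphasis(String source) {
  final pieces = <String>[]; // rendered output so far, in order
  final openers = <int>[]; // indexes in `pieces` of unmatched '*' openers
  final buffer = StringBuffer();

  // Moves any pending plain text from the buffer into `pieces`.
  void flush() {
    if (buffer.isNotEmpty) {
      pieces.add(buffer.toString());
      buffer.clear();
    }
  }

  for (var i = 0; i < source.length; i++) {
    final char = source[i];
    if (char != '*') {
      buffer.write(char);
      continue;
    }
    flush();
    if (openers.isEmpty) {
      // No opener yet: remember where this delimiter sits and keep it as
      // literal text in case it is never closed.
      openers.add(pieces.length);
      pieces.add('*');
    } else {
      // A closer was found: only now is the inner text wrapped in a tag.
      final start = openers.removeLast();
      final inner = pieces.sublist(start + 1).join();
      pieces.removeRange(start, pieces.length);
      pieces.add('<em>$inner</em>');
    }
  }
  flush();
  return pieces.join();
}

void main() {
  print(renderEmphasis('a *b* c')); // a <em>b</em> c
  print(renderEmphasis('a * b')); // a * b (unmatched opener stays literal)
}
```

The real algorithm additionally tracks delimiter runs, flanking information, and link and image brackets, none of which this sketch attempts.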
A portable Markdown library written in Dart. It can parse Markdown into HTML on both the client and server.
Play with it at dart-lang.github.io/markdown.

    import 'package:markdown/markdown.dart';

    void main() {
      print(markdownToHtml('Hello *Markdown*'));
      //=> <p>Hello <em>Markdown</em></p>
    }

A few Markdown extensions, beyond what was specified in the original Perl Markdown implementation, are supported. By default, the ones supported in CommonMark are enabled. Any individual extension can be enabled by specifying a list of extension syntaxes in the `blockSyntaxes` or `inlineSyntaxes` argument of `markdownToHtml`.
The currently supported inline extension syntaxes are:

- `InlineHtmlSyntax()` - approximately CommonMark's definition of “Raw HTML”.

The currently supported block extension syntaxes are:

- `const FencedCodeBlockSyntax()` - Code blocks familiar to Pandoc and PHP Markdown Extra users.
- `const HeaderWithIdSyntax()` - ATX-style headers have generated IDs, for link anchors (akin to Pandoc's `auto_identifiers`).
- `const SetextHeaderWithIdSyntax()` - Setext-style headers have generated IDs, for link anchors (akin to Pandoc's `auto_identifiers`).
- `const TableSyntax()` - Table syntax familiar to GitHub, PHP Markdown Extra, and Pandoc users.

For example:

    import 'package:markdown/markdown.dart';

    void main() {
      print(markdownToHtml('Hello <span class="green">Markdown</span>',
          inlineSyntaxes: [InlineHtmlSyntax()]));
      //=> <p>Hello <span class="green">Markdown</span></p>
    }

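
Block extension syntaxes are passed the same way, through `blockSyntaxes`. The following is a minimal sketch using `HeaderWithIdSyntax` from the list above; the sample heading text is made up for illustration, and the exact form of the generated ID is not shown here because it depends on the package's ID generation.

```dart
import 'package:markdown/markdown.dart';

void main() {
  // Enable a single block extension syntax by passing it explicitly.
  final html = markdownToHtml(
    '# Getting started',
    blockSyntaxes: [const HeaderWithIdSyntax()],
  );

  // The rendered <h1> is expected to carry a generated id attribute.
  print(html);
}
```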
To make extension management easy, you can also just specify an extension set. Both `markdownToHtml()` and `Document()` accept an `extensionSet` named parameter. Currently, there are four pre-defined extension sets:

- `ExtensionSet.none` includes no extensions. With no extensions, Markdown documents will be parsed with a default set of block and inline syntax parsers that closely match how the document might be parsed by the original Perl Markdown implementation.
- `ExtensionSet.commonMark` includes two extensions in addition to the default parsers to bring the parsed output closer to the CommonMark specification:
  - Block Syntax Parser
    - `const FencedCodeBlockSyntax()`
  - Inline Syntax Parser
    - `InlineHtmlSyntax()`
- `ExtensionSet.gitHubFlavored` includes five extensions in addition to the default parsers to bring the parsed output close to the GitHub Flavored Markdown specification:
  - Block Syntax Parser
    - `const FencedCodeBlockSyntax()`
    - `const TableSyntax()`
  - Inline Syntax Parser
    - `InlineHtmlSyntax()`
    - `StrikethroughSyntax()`
    - `AutolinkExtensionSyntax()`
- `ExtensionSet.gitHubWeb` includes eight extensions: the same set of parsers used in the `gitHubFlavored` extension set, with the addition of the block syntax parsers `HeaderWithIdSyntax` and `SetextHeaderWithIdSyntax`, which add `id` attributes to headers, and the inline syntax parser `EmojiSyntax`, for parsing GitHub-style emoji characters:
  - Block Syntax Parser
    - `const FencedCodeBlockSyntax()`
    - `const HeaderWithIdSyntax()`, which adds `id` attributes to ATX-style headers, for easy intra-document linking.
    - `const SetextHeaderWithIdSyntax()`, which adds `id` attributes to Setext-style headers, for easy intra-document linking.
    - `const TableSyntax()`
  - Inline Syntax Parser
    - `InlineHtmlSyntax()`
    - `StrikethroughSyntax()`
    - `EmojiSyntax()`
    - `AutolinkExtensionSyntax()`

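
As a short sketch of selecting an extension set, the snippet below renders the same input with the default settings and with `ExtensionSet.gitHubFlavored`. The sample sentence is made up for illustration; the difference to look for is the strikethrough handling contributed by `StrikethroughSyntax`.

```dart
import 'package:markdown/markdown.dart';

void main() {
  const source = 'This is ~~wrong~~ right.';

  // Default settings: StrikethroughSyntax is not part of the CommonMark set,
  // so the tildes are not treated as strikethrough markers.
  print(markdownToHtml(source));

  // gitHubFlavored adds StrikethroughSyntax, among others, so the text
  // between the tildes is expected to be wrapped in a <del> element.
  print(markdownToHtml(source, extensionSet: ExtensionSet.gitHubFlavored));
}
```

Passing an extension set is usually simpler than listing individual syntaxes, and `blockSyntaxes` or `inlineSyntaxes` can still be supplied alongside it for one-off additions.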
You can create and use your own syntaxes.

    import 'package:markdown/markdown.dart';

    void main() {
      var syntaxes = [TextSyntax('nyan', sub: '~=[,,_,,]:3')];
      print(markdownToHtml('nyan', inlineSyntaxes: syntaxes));
      //=> <p>~=[,,_,,]:3</p>
    }

This package offers no features in the way of HTML sanitization. Read Estevão Soares dos Santos's great article, “Markdown's XSS Vulnerability (and how to mitigate it)”, to learn more.
The authors recommend that you perform any necessary sanitization on the resulting HTML, for example via `dart:html`'s NodeValidator.
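
One possible shape for this, as a browser-only sketch: `renderSanitized` is a hypothetical helper (not part of this package) that converts Markdown and then inserts the result through `dart:html`'s `setInnerHtml` with a `NodeValidatorBuilder.common()` validator. Whether that policy is strict enough for a given application is an assumption that should be checked against your own requirements.

```dart
import 'dart:html';

// Prefixed import avoids name clashes (Element, Node, Text) with dart:html.
import 'package:markdown/markdown.dart' as md;

/// Hypothetical helper: renders Markdown into a <div>, letting the validator
/// strip elements and attributes it does not allow.
DivElement renderSanitized(String markdownSource) {
  final unsafeHtml = md.markdownToHtml(markdownSource);
  final container = DivElement();
  container.setInnerHtml(
    unsafeHtml,
    validator: NodeValidatorBuilder.common(),
  );
  return container;
}

void main() {
  final container =
      renderSanitized('Hello *Markdown* <script>alert(1)</script>');
  document.body?.append(container);
}
```

Because this relies on `dart:html`, it only runs in a browser; server-side code would need a different sanitization step.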
This package contains a number of files in the `tool` directory for tracking compliance with CommonMark.

Run `dart tool/stats.dart --update-files` to update the per-test results `tool/common_mark_stats.json` and the test summary `tool/common_mark_stats.txt`.

1. Check out the CommonMark source. Make sure you check out a major release.
2. Dump the test output, overwriting the existing tests file.

       > cd /path/to/common_mark_dir
       > python3 test/spec_tests.py --dump-tests > \
         /path/to/markdown.dart/tool/common_mark_tests.json

3. Update the stats files as described above. Note any changes in the results.
4. Update any references to the existing spec by searching for `https://spec.commonmark.org/0.28` in the repository. (Including this one.) Verify the updated links are still valid.
5. Commit changes, including a corresponding note in `CHANGELOG.md`.