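The largest change in this diff is the 0.21 rewrite of the HTML block rules as seven start/end-condition pairs. As a reading aid, the following Python sketch encodes those conditions as they are printed in the 0.21 column. It is not taken from cmark or any other reference implementation. The names `html_block_start`, `html_block_end` and `can_interrupt_paragraph` are hypothetical. The sketch assumes lines without tabs or line endings, and it does not model the end of an enclosing block quote or list item, which also closes an HTML block.

```python
import re

# Type 6 tag names, copied from the 0.21 column as printed (it names only `h1`).
# Using a set absorbs the `title` entry that the printed list repeats.
BLOCK_TAG_NAMES = {
    "address", "article", "aside", "base", "basefont", "blockquote", "body",
    "caption", "center", "col", "colgroup", "dd", "details", "dialog",
    "dir", "div", "dl", "dt", "fieldset", "figcaption", "figure",
    "footer", "form", "frame", "frameset", "h1", "head", "header", "hr",
    "html", "legend", "li", "link", "main", "menu", "menuitem", "meta",
    "nav", "noframes", "ol", "optgroup", "option", "p", "param", "pre",
    "section", "source", "title", "summary", "table", "tbody", "td",
    "tfoot", "th", "thead", "tr", "track", "ul",
}

# Start conditions 1-5, matched at the start of the line after indentation.
START_CONDITIONS = [
    (1, re.compile(r"<(?:script|pre|style)(?:\s|>|$)", re.IGNORECASE)),
    (2, re.compile(r"<!--")),
    (3, re.compile(r"<\?")),
    (4, re.compile(r"<![A-Z]")),
    (5, re.compile(r"<!\[CDATA\[")),
]

# Type 6: `<` or `</`, a tag name, then whitespace, end of line, `>` or `/>`.
TYPE6 = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)(?:\s|/?>|$)")

# Type 7: a complete open tag with any tag name, followed only by whitespace.
TAG_NAME = r"[A-Za-z][A-Za-z0-9-]*"
ATTR_NAME = r"[A-Za-z_:][A-Za-z0-9_.:-]*"
ATTR_VALUE = r"""(?:[^"'=<>`\x00-\x20]+|'[^']*'|"[^"]*")"""
ATTRIBUTE = r"(?:\s+" + ATTR_NAME + r"(?:\s*=\s*" + ATTR_VALUE + r")?)"
OPEN_TAG_ALONE = re.compile("<" + TAG_NAME + ATTRIBUTE + r"*\s*/?>\s*$")

# End conditions for types 1-5; types 6 and 7 end before a blank line.
END_CONDITIONS = {
    1: re.compile(r"</(?:script|pre|style)>", re.IGNORECASE),
    2: re.compile(r"-->"),
    3: re.compile(r"\?>"),
    4: re.compile(r">"),
    5: re.compile(r"\]\]>"),
}


def html_block_start(line):
    """Return the HTML block type (1-7) whose start condition `line` meets, else None."""
    indent = len(line) - len(line.lstrip(" "))
    if indent > 3:
        return None  # four or more spaces: not an HTML block start
    rest = line[indent:]
    for kind, pattern in START_CONDITIONS:
        if pattern.match(rest):
            return kind
    m = TYPE6.match(rest)
    if m and m.group(1).lower() in BLOCK_TAG_NAMES:
        return 6
    if OPEN_TAG_ALONE.match(rest):
        return 7
    return None


def is_blank(line):
    return line.strip(" \t") == ""


def html_block_end(lines, start, kind):
    """Index of the last line of a type-`kind` HTML block starting at lines[start]."""
    if kind in END_CONDITIONS:
        # The start line itself may also meet the end condition.
        for i in range(start, len(lines)):
            if END_CONDITIONS[kind].search(lines[i]):
                return i
        return len(lines) - 1  # no end condition met: block runs to end of input
    i = start
    while i + 1 < len(lines) and not is_blank(lines[i + 1]):
        i += 1
    return i


def can_interrupt_paragraph(kind):
    """Only type 7 blocks may not interrupt a paragraph."""
    return kind != 7
```

Checking the conditions in order 1 to 7 matters: `<pre` followed by a space is claimed by type 1 before the type 6 list (which also contains `pre`) is consulted, so only forms such as `</pre>` fall through to type 6.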
| spec.txt (0.20) | spec.txt (0.21) | |||
|---|---|---|---|---|
| --- | --- | |||
| title: CommonMark Spec | title: CommonMark Spec | |||
| author: John MacFarlane | author: John MacFarlane | |||
| version: 0.20 | version: 0.21 | |||
| date: 2015-06-08 | date: 2015-07-14 | |||
| license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | |||
| ... | ... | |||
| # Introduction | # Introduction | |||
| ## What is Markdown? | ## What is Markdown? | |||
| Markdown is a plain text format for writing structured documents, | Markdown is a plain text format for writing structured documents, | |||
| based on conventions used for indicating formatting in email and | based on conventions used for indicating formatting in email and | |||
| usenet posts. It was developed in 2004 by John Gruber, who wrote | usenet posts. It was developed in 2004 by John Gruber, who wrote | |||
| skipping to change at line 240 | skipping to change at line 240 | |||
| A [unicode whitespace character](@unicode-whitespace-character) is | A [unicode whitespace character](@unicode-whitespace-character) is | |||
| any code point in the unicode `Zs` class, or a tab (`U+0009`), | any code point in the unicode `Zs` class, or a tab (`U+0009`), | |||
| carriage return (`U+000D`), newline (`U+000A`), or form feed | carriage return (`U+000D`), newline (`U+000A`), or form feed | |||
| (`U+000C`). | (`U+000C`). | |||
| [Unicode whitespace](@unicode-whitespace) is a sequence of one | [Unicode whitespace](@unicode-whitespace) is a sequence of one | |||
| or more [unicode whitespace character]s. | or more [unicode whitespace character]s. | |||
| A [space](@space) is `U+0020`. | A [space](@space) is `U+0020`. | |||
| A [non-space character](@non-space-character) is any character | A [non-whitespace character](@non-space-character) is any character | |||
| that is not a [whitespace character]. | that is not a [whitespace character]. | |||
| An [ASCII punctuation character](@ascii-punctuation-character) | An [ASCII punctuation character](@ascii-punctuation-character) | |||
| is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | |||
| `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | |||
| `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `\|`, `}`, or `~`. | `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `\|`, `}`, or `~`. | |||
| A [punctuation character](@punctuation-character) is an [ASCII | A [punctuation character](@punctuation-character) is an [ASCII | |||
| punctuation character] or anything in | punctuation character] or anything in | |||
| the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | |||
| ## Preprocessing | ## Tabs | |||
| Tabs in lines are immediately expanded to [spaces][space], with a tab | Tabs in lines are not expanded to [spaces][space]. However, | |||
| stop of 4 characters: | in contexts where indentation is significant for the | |||
| document's structure, tabs behave as if they were replaced | ||||
| by spaces with a tab stop of 4 characters. | ||||
| . | . | |||
| →foo→baz→→bim | →foo→baz→→bim | |||
| . | . | |||
| <pre><code>foo baz bim | <pre><code>foo→baz→→bim | |||
| </code></pre> | ||||
| . | ||||
| . | ||||
| →foo→baz→→bim | ||||
| . | ||||
| <pre><code>foo→baz→→bim | ||||
| </code></pre> | </code></pre> | |||
| . | . | |||
| . | . | |||
| a→a | a→a | |||
| ὐ→a | ὐ→a | |||
| . | . | |||
| <pre><code>a a | <pre><code>a→a | |||
| ὐ a | ὐ→a | |||
| </code></pre> | </code></pre> | |||
| . | . | |||
| . | ||||
| - foo | ||||
| →bar | ||||
| . | ||||
| <ul> | ||||
| <li> | ||||
| <p>foo</p> | ||||
| <p>bar</p> | ||||
| </li> | ||||
| </ul> | ||||
| . | ||||
| . | ||||
| >→foo→bar | ||||
| . | ||||
| <blockquote> | ||||
| <p>foo→bar</p> | ||||
| </blockquote> | ||||
| . | ||||
| ## Insecure characters | ## Insecure characters | |||
| For security reasons, the Unicode character `U+0000` must be replaced | For security reasons, the Unicode character `U+0000` must be replaced | |||
| with the replacement character (`U+FFFD`). | with the replacement character (`U+FFFD`). | |||
| # Blocks and inlines | # Blocks and inlines | |||
| We can think of a document as a sequence of | We can think of a document as a sequence of | |||
| [blocks](@block)---structural elements like paragraphs, block | [blocks](@block)---structural elements like paragraphs, block | |||
| quotations, lists, headers, rules, and code blocks. Some blocks (like | quotations, lists, headers, rules, and code blocks. Some blocks (like | |||
| skipping to change at line 446 | skipping to change at line 476 | |||
| a------ | a------ | |||
| ---a--- | ---a--- | |||
| . | . | |||
| <p>_ _ _ _ a</p> | <p>_ _ _ _ a</p> | |||
| <p>a------</p> | <p>a------</p> | |||
| <p>---a---</p> | <p>---a---</p> | |||
| . | . | |||
| It is required that all of the [non-space character]s be the same. | It is required that all of the [non-whitespace character]s be the same. | |||
| So, this is not a horizontal rule: | So, this is not a horizontal rule: | |||
| . | . | |||
| *-* | *-* | |||
| . | . | |||
| <p><em>-</em></p> | <p><em>-</em></p> | |||
| . | . | |||
| Horizontal rules do not need blank lines before or after: | Horizontal rules do not need blank lines before or after: | |||
| skipping to change at line 536 | skipping to change at line 566 | |||
| </ul> | </ul> | |||
| . | . | |||
| ## ATX headers | ## ATX headers | |||
| An [ATX header](@atx-header) | An [ATX header](@atx-header) | |||
| consists of a string of characters, parsed as inline content, between an | consists of a string of characters, parsed as inline content, between an | |||
| opening sequence of 1--6 unescaped `#` characters and an optional | opening sequence of 1--6 unescaped `#` characters and an optional | |||
| closing sequence of any number of `#` characters. The opening sequence | closing sequence of any number of `#` characters. The opening sequence | |||
| of `#` characters cannot be followed directly by a | of `#` characters cannot be followed directly by a | |||
| [non-space character]. The optional closing sequence of `#`s must be | [non-whitespace character]. The optional closing sequence of `#`s must be | |||
| preceded by a [space] and may be followed by spaces only. The opening | preceded by a [space] and may be followed by spaces only. The opening | |||
| `#` character may be indented 0-3 spaces. The raw contents of the | `#` character may be indented 0-3 spaces. The raw contents of the | |||
| header are stripped of leading and trailing spaces before being parsed | header are stripped of leading and trailing spaces before being parsed | |||
| as inline content. The header level is equal to the number of `#` | as inline content. The header level is equal to the number of `#` | |||
| characters in the opening sequence. | characters in the opening sequence. | |||
| Simple headers: | Simple headers: | |||
| . | . | |||
| # foo | # foo | |||
| skipping to change at line 668 | skipping to change at line 698 | |||
| Spaces are allowed after the closing sequence: | Spaces are allowed after the closing sequence: | |||
| . | . | |||
| ### foo ### | ### foo ### | |||
| . | . | |||
| <h3>foo</h3> | <h3>foo</h3> | |||
| . | . | |||
| A sequence of `#` characters with a | A sequence of `#` characters with a | |||
| [non-space character] following it | [non-whitespace character] following it | |||
| is not a closing sequence, but counts as part of the contents of the | is not a closing sequence, but counts as part of the contents of the | |||
| header: | header: | |||
| . | . | |||
| ### foo ### b | ### foo ### b | |||
| . | . | |||
| <h3>foo ### b</h3> | <h3>foo ### b</h3> | |||
| . | . | |||
| The closing sequence must be preceded by a space: | The closing sequence must be preceded by a space: | |||
| skipping to change at line 737 | skipping to change at line 767 | |||
| ### ### | ### ### | |||
| . | . | |||
| <h2></h2> | <h2></h2> | |||
| <h1></h1> | <h1></h1> | |||
| <h3></h3> | <h3></h3> | |||
| . | . | |||
| ## Setext headers | ## Setext headers | |||
| A [setext header](@setext-header) | A [setext header](@setext-header) | |||
| consists of a line of text, containing at least one [non-space character], | consists of a line of text, containing at least one [non-whitespace character], | |||
| with no more than 3 spaces indentation, followed by a [setext header | with no more than 3 spaces indentation, followed by a [setext header | |||
| underline]. The line of text must be | underline]. The line of text must be | |||
| one that, were it not followed by the setext header underline, | one that, were it not followed by the setext header underline, | |||
| would be interpreted as part of a paragraph: it cannot be | would be interpreted as part of a paragraph: it cannot be | |||
| interpretable as a [code fence], [ATX header][ATX headers], | interpretable as a [code fence], [ATX header][ATX headers], | |||
| [block quote][block quotes], [horizontal rule][horizontal rules], | [block quote][block quotes], [horizontal rule][horizontal rules], | |||
| [list item][list items], or [HTML block][HTML blocks]. | [list item][list items], or [HTML block][HTML blocks]. | |||
| A [setext header underline](@setext-header-underline) is a sequence of | A [setext header underline](@setext-header-underline) is a sequence of | |||
| `=` characters or a sequence of `-` characters, with no more than 3 | `=` characters or a sequence of `-` characters, with no more than 3 | |||
| skipping to change at line 1312 | skipping to change at line 1342 | |||
| ~~~~ | ~~~~ | |||
| aaa | aaa | |||
| ~~~ | ~~~ | |||
| ~~~~ | ~~~~ | |||
| . | . | |||
| <pre><code>aaa | <pre><code>aaa | |||
| ~~~ | ~~~ | |||
| </code></pre> | </code></pre> | |||
| . | . | |||
| Unclosed code blocks are closed by the end of the document: | Unclosed code blocks are closed by the end of the document | |||
| (or the enclosing [block quote] or [list item]): | ||||
| . | . | |||
| ``` | ``` | |||
| . | . | |||
| <pre><code></code></pre> | <pre><code></code></pre> | |||
| . | . | |||
| . | . | |||
| ````` | ````` | |||
| ``` | ``` | |||
| aaa | aaa | |||
| . | . | |||
| <pre><code> | <pre><code> | |||
| ``` | ``` | |||
| aaa | aaa | |||
| </code></pre> | </code></pre> | |||
| . | . | |||
| . | ||||
| > ``` | ||||
| > aaa | ||||
| bbb | ||||
| . | ||||
| <blockquote> | ||||
| <pre><code>aaa | ||||
| </code></pre> | ||||
| </blockquote> | ||||
| <p>bbb</p> | ||||
| . | ||||
| A code block can have all empty lines as its content: | A code block can have all empty lines as its content: | |||
| . | . | |||
| ``` | ``` | |||
| ``` | ``` | |||
| . | . | |||
| <pre><code> | <pre><code> | |||
| </code></pre> | </code></pre> | |||
| skipping to change at line 1554 | skipping to change at line 1598 | |||
| ``` | ``` | |||
| ``` aaa | ``` aaa | |||
| ``` | ``` | |||
| . | . | |||
| <pre><code>``` aaa | <pre><code>``` aaa | |||
| </code></pre> | </code></pre> | |||
| . | . | |||
| ## HTML blocks | ## HTML blocks | |||
| An [HTML block tag](@html-block-tag) is | An [HTML block](@html-block) is a group of lines that is treated | |||
| an [open tag] or [closing tag] whose tag | as raw HTML (and will not be escaped in HTML output). | |||
| name is one of the following (case-insensitive): | ||||
| `article`, `header`, `aside`, `hgroup`, `blockquote`, `hr`, `iframe`, | ||||
| `body`, `li`, `map`, `button`, `object`, `canvas`, `ol`, `caption`, | ||||
| `output`, `col`, `p`, `colgroup`, `pre`, `dd`, `progress`, `div`, | ||||
| `section`, `dl`, `table`, `td`, `dt`, `tbody`, `embed`, `textarea`, | ||||
| `fieldset`, `tfoot`, `figcaption`, `th`, `figure`, `thead`, `footer`, | ||||
| `tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `video`, | ||||
| `script`, `style`. | ||||
| An [HTML block](@html-block) begins with an | There are seven kinds of [HTML block], which can be defined | |||
| [HTML block tag], [HTML comment], [processing instruction], | by their start and end conditions. The block begins with a line that | |||
| [declaration], or [CDATA section]. | meets a [start condition](@start-condition) (after up to three spaces | |||
| It ends when a [blank line] or the end of the | of optional indentation). It ends with the first subsequent line that | |||
| input is encountered. The initial line may be indented up to three | meets a matching [end condition](@end-condition), or the last line of | |||
| spaces, and subsequent lines may have any indentation. The contents | the document, if no line is encountered that meets the | |||
| of the HTML block are interpreted as raw HTML, and will not be escaped | [end condition]. If the first line meets both the [start condition] | |||
| in HTML output. | and the [end condition], the block will contain just that line. | |||
| Some simple examples: | 1. **Start condition:** line begins with the string `<script`, | |||
| `<pre`, or `<style` (case-insensitive), followed by whitespace, | ||||
| the string `>`, or the end of the line.\ | ||||
| **End condition:** line contains an end tag | ||||
| `</script>`, `</pre>`, or `</style>` (case-insensitive; it | ||||
| need not match the start tag). | ||||
| 2. **Start condition:** line begins with the string `<!--`.\ | ||||
| **End condition:** line contains the string `-->`. | ||||
| 3. **Start condition:** line begins with the string `<?`.\ | ||||
| **End condition:** line contains the string `?>`. | ||||
| 4. **Start condition:** line begins with the string `<!` | ||||
| followed by an uppercase ASCII letter.\ | ||||
| **End condition:** line contains the character `>`. | ||||
| 5. **Start condition:** line begins with the string | ||||
| `<![CDATA[`.\ | ||||
| **End condition:** line contains the string `]]>`. | ||||
| | 6. **Start condition:** line begins with the string `<` or `</` | |||
| followed by one of the strings (case-insensitive) `address`, | ||||
| `article`, `aside`, `base`, `basefont`, `blockquote`, `body`, | ||||
| `caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`, | ||||
| `dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`, | ||||
| `footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`, | ||||
| `html`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, `meta`, | ||||
| `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `pre`, | ||||
| | `section`, `source`, `summary`, `table`, `tbody`, `td`, | |||
| `tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed | ||||
| by [whitespace], the end of the line, the string `>`, or | ||||
| the string `/>`.\ | ||||
| **End condition:** line is followed by a [blank line]. | ||||
| 7. **Start condition:** line begins with an [open tag] | ||||
| (with any [tag name]) followed only by [whitespace] or the end | ||||
| of the line.\ | ||||
| **End condition:** line is followed by a [blank line]. | ||||
| All types of [HTML blocks] except type 7 may interrupt | ||||
| a paragraph. Blocks of type 7 may not interrupt a paragraph. | ||||
| | (This restriction is intended to prevent unwanted interpretation | |||
| of long tags inside a wrapped paragraph as starting HTML blocks.) | ||||
| Some simple examples follow. Here are some basic HTML blocks | ||||
| of type 6: | ||||
| . | . | |||
| <table> | <table> | |||
| <tr> | <tr> | |||
| <td> | <td> | |||
| hi | hi | |||
| </td> | </td> | |||
| </tr> | </tr> | |||
| </table> | </table> | |||
| skipping to change at line 1607 | skipping to change at line 1689 | |||
| . | . | |||
| <div> | <div> | |||
| *hello* | *hello* | |||
| <foo><a> | <foo><a> | |||
| . | . | |||
| <div> | <div> | |||
| *hello* | *hello* | |||
| <foo><a> | <foo><a> | |||
| . | . | |||
| A block can also start with a closing tag: | ||||
| . | ||||
| </div> | ||||
| *foo* | ||||
| . | ||||
| </div> | ||||
| *foo* | ||||
| . | ||||
| Here we have two HTML blocks with a Markdown paragraph between them: | Here we have two HTML blocks with a Markdown paragraph between them: | |||
| . | . | |||
| <DIV CLASS="foo"> | <DIV CLASS="foo"> | |||
| *Markdown* | *Markdown* | |||
| </DIV> | </DIV> | |||
| . | . | |||
| <DIV CLASS="foo"> | <DIV CLASS="foo"> | |||
| <p><em>Markdown</em></p> | <p><em>Markdown</em></p> | |||
| </DIV> | </DIV> | |||
| . | . | |||
| In the following example, what looks like a Markdown code block | The tag on the first line can be partial, as long | |||
| as it is split where there would be whitespace: | ||||
| . | ||||
| <div id="foo" | ||||
| class="bar"> | ||||
| </div> | ||||
| . | ||||
| <div id="foo" | ||||
| class="bar"> | ||||
| </div> | ||||
| . | ||||
| . | ||||
| <div id="foo" class="bar | ||||
| baz"> | ||||
| </div> | ||||
| . | ||||
| <div id="foo" class="bar | ||||
| baz"> | ||||
| </div> | ||||
| . | ||||
| An open tag need not be closed: | ||||
| . | ||||
| <div> | ||||
| *foo* | ||||
| *bar* | ||||
| . | ||||
| <div> | ||||
| *foo* | ||||
| <p><em>bar</em></p> | ||||
| . | ||||
| A partial tag need not even be completed (garbage | ||||
| in, garbage out): | ||||
| . | ||||
| <div id="foo" | ||||
| *hi* | ||||
| . | ||||
| <div id="foo" | ||||
| *hi* | ||||
| . | ||||
| . | ||||
| <div class | ||||
| foo | ||||
| . | ||||
| <div class | ||||
| foo | ||||
| . | ||||
| The initial tag doesn't even need to be a valid | ||||
| tag, as long as it starts like one: | ||||
| . | ||||
| <div *???-&&&-<--- | ||||
| *foo* | ||||
| . | ||||
| <div *???-&&&-<--- | ||||
| *foo* | ||||
| . | ||||
| In type 6 blocks, the initial tag need not be on a line by | ||||
| itself: | ||||
| . | ||||
| <div><a href="bar">*foo*</a></div> | ||||
| . | ||||
| <div><a href="bar">*foo*</a></div> | ||||
| . | ||||
| . | ||||
| <table><tr><td> | ||||
| foo | ||||
| </td></tr></table> | ||||
| . | ||||
| <table><tr><td> | ||||
| foo | ||||
| </td></tr></table> | ||||
| . | ||||
| Everything until the next blank line or end of document | ||||
| gets included in the HTML block. So, in the following | ||||
| example, what looks like a Markdown code block | ||||
| is actually part of the HTML block, which continues until a blank | is actually part of the HTML block, which continues until a blank | |||
| line or the end of the document is reached: | line or the end of the document is reached: | |||
| . | . | |||
| <div></div> | <div></div> | |||
| ``` c | ``` c | |||
| int x = 33; | int x = 33; | |||
| ``` | ``` | |||
| . | . | |||
| <div></div> | <div></div> | |||
| ``` c | ``` c | |||
| int x = 33; | int x = 33; | |||
| ``` | ``` | |||
| . | . | |||
| A comment: | To start an [HTML block] with a tag that is *not* in the | |||
| list of block-level tags in (6), you must put the tag by | ||||
| itself on the first line (and it must be complete): | ||||
| . | ||||
| <a href="foo"> | ||||
| *bar* | ||||
| </a> | ||||
| . | ||||
| <a href="foo"> | ||||
| *bar* | ||||
| </a> | ||||
| . | ||||
| In type 7 blocks, the [tag name] can be anything: | ||||
| . | ||||
| <Warning> | ||||
| *bar* | ||||
| </Warning> | ||||
| . | ||||
| <Warning> | ||||
| *bar* | ||||
| </Warning> | ||||
| . | ||||
| . | ||||
| <i class="foo"> | ||||
| *bar* | ||||
| </i> | ||||
| . | ||||
| <i class="foo"> | ||||
| *bar* | ||||
| </i> | ||||
| . | ||||
| These rules are designed to allow us to work with tags that | ||||
| can function as either block-level or inline-level tags. | ||||
| The `<del>` tag is a nice example. We can surround content with | ||||
| `<del>` tags in three different ways. In this case, we get a raw | ||||
| HTML block, because the `<del>` tag is on a line by itself: | ||||
| . | ||||
| <del> | ||||
| *foo* | ||||
| </del> | ||||
| . | ||||
| <del> | ||||
| *foo* | ||||
| </del> | ||||
| . | ||||
| In this case, we get a raw HTML block that just includes | ||||
| the `<del>` tag (because it ends with the following blank | ||||
| line). So the contents get interpreted as CommonMark: | ||||
| . | ||||
| <del> | ||||
| *foo* | ||||
| </del> | ||||
| . | ||||
| <del> | ||||
| <p><em>foo</em></p> | ||||
| </del> | ||||
| . | ||||
| Finally, in this case, the `<del>` tags are interpreted | ||||
| as [raw HTML] *inside* the CommonMark paragraph. (Because | ||||
| the tag is not on a line by itself, we get inline HTML | ||||
| rather than an [HTML block].) | ||||
| . | ||||
| <del>*foo*</del> | ||||
| . | ||||
| <p><del><em>foo</em></del></p> | ||||
| . | ||||
| HTML tags designed to contain literal content | ||||
| (`script`, `style`, `pre`), comments, processing instructions, | ||||
| and declarations are treated somewhat differently. | ||||
| Instead of ending at the first blank line, these blocks | ||||
| end at the first line containing a corresponding end tag. | ||||
| | As a result, these blocks can contain blank lines. | |||
| A pre tag (type 1): | ||||
| . | ||||
| <pre language="haskell"><code> | ||||
| import Text.HTML.TagSoup | ||||
| main :: IO () | ||||
| main = print $ parseTags tags | ||||
| </code></pre> | ||||
| . | ||||
| <pre language="haskell"><code> | ||||
| import Text.HTML.TagSoup | ||||
| main :: IO () | ||||
| main = print $ parseTags tags | ||||
| </code></pre> | ||||
| . | ||||
| A script tag (type 1): | ||||
| . | ||||
| <script type="text/javascript"> | ||||
| // JavaScript example | ||||
| document.getElementById("demo").innerHTML = "Hello JavaScript!"; | ||||
| </script> | ||||
| . | ||||
| <script type="text/javascript"> | ||||
| // JavaScript example | ||||
| document.getElementById("demo").innerHTML = "Hello JavaScript!"; | ||||
| </script> | ||||
| . | ||||
| A style tag (type 1): | ||||
| . | ||||
| <style | ||||
| type="text/css"> | ||||
| h1 {color:red;} | ||||
| p {color:blue;} | ||||
| </style> | ||||
| . | ||||
| <style | ||||
| type="text/css"> | ||||
| h1 {color:red;} | ||||
| p {color:blue;} | ||||
| </style> | ||||
| . | ||||
| If there is no matching end tag, the block will end at the | ||||
| end of the document (or the enclosing [block quote] or | ||||
| [list item]): | ||||
| . | ||||
| <style | ||||
| type="text/css"> | ||||
| foo | ||||
| . | ||||
| <style | ||||
| type="text/css"> | ||||
| foo | ||||
| . | ||||
| . | ||||
| > <div> | ||||
| > foo | ||||
| bar | ||||
| . | ||||
| <blockquote> | ||||
| <div> | ||||
| foo | ||||
| </blockquote> | ||||
| <p>bar</p> | ||||
| . | ||||
| . | ||||
| - <div> | ||||
| - foo | ||||
| . | ||||
| <ul> | ||||
| <li> | ||||
| <div> | ||||
| </li> | ||||
| <li>foo</li> | ||||
| </ul> | ||||
| . | ||||
| The end tag can occur on the same line as the start tag: | ||||
| . | ||||
| <style>p{color:red;}</style> | ||||
| *foo* | ||||
| . | ||||
| <style>p{color:red;}</style> | ||||
| <p><em>foo</em></p> | ||||
| . | ||||
| . | ||||
| <!-- foo -->*bar* | ||||
| *baz* | ||||
| . | ||||
| <!-- foo -->*bar* | ||||
| <p><em>baz</em></p> | ||||
| . | ||||
| Note that anything on the last line after the | ||||
| end tag will be included in the [HTML block]: | ||||
| . | ||||
| <script> | ||||
| foo | ||||
| </script>1. *bar* | ||||
| . | ||||
| <script> | ||||
| foo | ||||
| </script>1. *bar* | ||||
| . | ||||
| A comment (type 2): | ||||
| . | . | |||
| <!-- Foo | <!-- Foo | |||
| bar | bar | |||
| baz --> | baz --> | |||
| . | . | |||
| <!-- Foo | <!-- Foo | |||
| bar | bar | |||
| baz --> | baz --> | |||
| . | . | |||
| A processing instruction: | A processing instruction (type 3): | |||
| . | . | |||
| <?php | <?php | |||
| echo '>'; | echo '>'; | |||
| ?> | ?> | |||
| . | . | |||
| <?php | <?php | |||
| echo '>'; | echo '>'; | |||
| ?> | ?> | |||
| . | . | |||
| CDATA: | A declaration (type 4): | |||
| . | ||||
| <!DOCTYPE html> | ||||
| . | ||||
| <!DOCTYPE html> | ||||
| . | ||||
| CDATA (type 5): | ||||
| . | . | |||
| <![CDATA[ | <![CDATA[ | |||
| function matchwo(a,b) | function matchwo(a,b) | |||
| { | { | |||
| if (a < b && a < 0) then | if (a < b && a < 0) then { | |||
| { | return 1; | |||
| return 1; | ||||
| } | } else { | |||
| else | ||||
| { | return 0; | |||
| return 0; | ||||
| } | } | |||
| } | } | |||
| ]]> | ]]> | |||
| . | . | |||
| <![CDATA[ | <![CDATA[ | |||
| function matchwo(a,b) | function matchwo(a,b) | |||
| { | { | |||
| if (a < b && a < 0) then | if (a < b && a < 0) then { | |||
| { | return 1; | |||
| return 1; | ||||
| } | } else { | |||
| else | ||||
| { | return 0; | |||
| return 0; | ||||
| } | } | |||
| } | } | |||
| ]]> | ]]> | |||
| . | . | |||
| The opening tag can be indented 1-3 spaces, but not 4: | The opening tag can be indented 1-3 spaces, but not 4: | |||
| . | . | |||
| <!-- foo --> | <!-- foo --> | |||
| <!-- foo --> | <!-- foo --> | |||
| . | . | |||
| <!-- foo --> | <!-- foo --> | |||
| <pre><code>&lt;!-- foo --&gt; | <pre><code>&lt;!-- foo --&gt; | |||
| </code></pre> | </code></pre> | |||
| . | . | |||
| An HTML block can interrupt a paragraph, and need not be preceded | . | |||
| by a blank line. | <div> | |||
| <div> | ||||
| . | ||||
| <div> | ||||
| | <pre><code>&lt;div&gt; | |||
| </code></pre> | ||||
| . | ||||
| An HTML block of types 1--6 can interrupt a paragraph, and need not be | ||||
| preceded by a blank line. | ||||
| . | . | |||
| Foo | Foo | |||
| <div> | <div> | |||
| bar | bar | |||
| </div> | </div> | |||
| . | . | |||
| <p>Foo</p> | <p>Foo</p> | |||
| <div> | <div> | |||
| bar | bar | |||
| </div> | </div> | |||
| . | . | |||
| However, a following blank line is always needed, except at the end of | However, a following blank line is needed, except at the end of | |||
| a document: | a document, and except for blocks of types 1--5, above: | |||
| . | . | |||
| <div> | <div> | |||
| bar | bar | |||
| </div> | </div> | |||
| *foo* | *foo* | |||
| . | . | |||
| <div> | <div> | |||
| bar | bar | |||
| </div> | </div> | |||
| *foo* | *foo* | |||
| . | . | |||
| An incomplete HTML block tag may also start an HTML block: | HTML blocks of type 7 cannot interrupt a paragraph: | |||
| . | . | |||
| <div class | Foo | |||
| foo | <a href="bar"> | |||
| baz | ||||
| . | . | |||
| <div class | <p>Foo | |||
| foo | <a href="bar"> | |||
| baz</p> | ||||
| . | . | |||
| This rule differs from John Gruber's original Markdown syntax | This rule differs from John Gruber's original Markdown syntax | |||
| specification, which says: | specification, which says: | |||
| > The only restrictions are that block-level HTML elements — | > The only restrictions are that block-level HTML elements — | |||
| > e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from | > e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from | |||
| > surrounding content by blank lines, and the start and end tags of the | > surrounding content by blank lines, and the start and end tags of the | |||
| > block should not be indented with tabs or spaces. | > block should not be indented with tabs or spaces. | |||
| In some ways Gruber's rule is more restrictive than the one given | In some ways Gruber's rule is more restrictive than the one given | |||
| here: | here: | |||
| - It requires that an HTML block be preceded by a blank line. | - It requires that an HTML block be preceded by a blank line. | |||
| - It does not allow the start tag to be indented. | - It does not allow the start tag to be indented. | |||
| - It requires a matching end tag, which it also does not allow to | - It requires a matching end tag, which it also does not allow to | |||
| be indented. | be indented. | |||
| Indeed, most Markdown implementations, including some of Gruber's | Most Markdown implementations (including some of Gruber's own) do not | |||
| own perl implementations, do not impose these restrictions. | respect all of these restrictions. | |||
| There is one respect, however, in which Gruber's rule is more liberal | There is one respect, however, in which Gruber's rule is more liberal | |||
| than the one given here, since it allows blank lines to occur inside | than the one given here, since it allows blank lines to occur inside | |||
| an HTML block. There are two reasons for disallowing them here. | an HTML block. There are two reasons for disallowing them here. | |||
| First, it removes the need to parse balanced tags, which is | First, it removes the need to parse balanced tags, which is | |||
| expensive and can require backtracking from the end of the document | expensive and can require backtracking from the end of the document | |||
| if no matching end tag is found. Second, it provides a very simple | if no matching end tag is found. Second, it provides a very simple | |||
| and flexible way of including Markdown content inside HTML tags: | and flexible way of including Markdown content inside HTML tags: | |||
| simply separate the Markdown from the HTML using blank lines: | simply separate the Markdown from the HTML using blank lines: | |||
| Compare: | ||||
| . | . | |||
| <div> | <div> | |||
| *Emphasized* text. | *Emphasized* text. | |||
| </div> | </div> | |||
| . | . | |||
| <div> | <div> | |||
| <p><em>Emphasized</em> text.</p> | <p><em>Emphasized</em> text.</p> | |||
| </div> | </div> | |||
| . | . | |||
| Compare: | ||||
| . | . | |||
| <div> | <div> | |||
| *Emphasized* text. | *Emphasized* text. | |||
| </div> | </div> | |||
| . | . | |||
| <div> | <div> | |||
| *Emphasized* text. | *Emphasized* text. | |||
| </div> | </div> | |||
| . | . | |||
| skipping to change at line 1830 | skipping to change at line 2242 | |||
| . | . | |||
| <table> | <table> | |||
| <tr> | <tr> | |||
| <td> | <td> | |||
| Hi | Hi | |||
| </td> | </td> | |||
| </tr> | </tr> | |||
| </table> | </table> | |||
| . | . | |||
| Moreover, blank lines are usually not necessary and can be | There are problems, however, if the inner tags are indented | |||
| deleted. The exception is inside `<pre>` tags; here, one can | *and* separated by blank lines, as then they will be interpreted as | |||
| replace the blank lines with `&#10;` entities. | an indented code block: | |||
| So there is no important loss of expressive power with the new rule. | . | |||
| <table> | ||||
| <tr> | ||||
| <td> | ||||
| Hi | ||||
| </td> | ||||
| </tr> | ||||
| </table> | ||||
| . | ||||
| <table> | ||||
| <tr> | ||||
| | <pre><code>&lt;td&gt; | |||
| | Hi | |||
| | &lt;/td&gt; | |||
| </code></pre> | ||||
| </tr> | ||||
| </table> | ||||
| . | ||||
| Fortunately, blank lines are usually not necessary and can be | ||||
| deleted. The exception is inside `<pre>` tags, but as described | ||||
| above, raw HTML blocks starting with `<pre>` *can* contain blank | ||||
| lines. | ||||
| ## Link reference definitions | ## Link reference definitions | |||
| A [link reference definition](@link-reference-definition) | A [link reference definition](@link-reference-definition) | |||
| consists of a [link label], indented up to three spaces, followed | consists of a [link label], indented up to three spaces, followed | |||
| by a colon (`:`), optional [whitespace] (including up to one | by a colon (`:`), optional [whitespace] (including up to one | |||
| [line ending]), a [link destination], | [line ending]), a [link destination], | |||
| optional [whitespace] (including up to one | optional [whitespace] (including up to one | |||
| [line ending]), and an optional [link | [line ending]), and an optional [link | |||
| title], which if it is present must be separated | title], which if it is present must be separated | |||
| from the [link destination] by [whitespace]. | from the [link destination] by [whitespace]. | |||
| No further [non-space character]s may occur on the line. | No further [non-whitespace character]s may occur on the line. | |||
| A [link reference definition] | A [link reference definition] | |||
| does not correspond to a structural element of a document. Instead, it | does not correspond to a structural element of a document. Instead, it | |||
| defines a label which can be used in [reference link]s | defines a label which can be used in [reference link]s | |||
| and reference-style [images] elsewhere in the document. [Link | and reference-style [images] elsewhere in the document. [Link | |||
| reference definitions] can come either before or after the links that use | reference definitions] can come either before or after the links that use | |||
| them. | them. | |||
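A minimal Python sketch of the definition above, restricted to single-line definitions. The destination is taken to be a run of non-whitespace characters (the `<...>` form is not handled), the optional title must be double-quoted with no escapes, and the label may not contain brackets. The helper name `parse_link_ref_def` is hypothetical. Titles on a following line, other title delimiters, and backslash escapes are deliberately left out.

```python
import re

# Simplified single-line link reference definition:
#   up to 3 spaces, [label], ':', optional spaces/tabs, destination,
#   optional double-quoted title (separated by at least one space or tab),
#   and nothing but spaces/tabs after it.
_LINK_REF_DEF = re.compile(
    r" {0,3}\[(?P<label>[^\[\]]*[^\[\]\s][^\[\]]*)\]:"
    r"[ \t]*(?P<dest>\S+)"
    r'(?:[ \t]+"(?P<title>[^"]*)")?'
    r"[ \t]*"
)


def parse_link_ref_def(line):
    """Return (label, destination, title), or None if `line` is not a definition."""
    m = _LINK_REF_DEF.fullmatch(line)
    if m is None:
        return None
    return m.group("label"), m.group("dest"), m.group("title")


print(parse_link_ref_def('[foo]: /url "title"'))     # ('foo', '/url', 'title')
print(parse_link_ref_def('[foo]: /url "title" ok'))  # None
```

Using `fullmatch` is what rejects trailing text after the title, mirroring the rule that no further non-whitespace characters may occur on the line.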
| . | . | |||
| [foo]: /url "title" | [foo]: /url "title" | |||
| skipping to change at line 1945 | skipping to change at line 2383 | |||
| . | . | |||
| [foo]: | [foo]: | |||
| [foo] | [foo] | |||
| . | . | |||
| <p>[foo]:</p> | <p>[foo]:</p> | |||
| <p>[foo]</p> | <p>[foo]</p> | |||
| . | . | |||
| Both title and destination can contain backslash escapes | ||||
| and literal backslashes: | ||||
| . | ||||
| [foo]: /url\bar\*baz "foo\"bar\baz" | ||||
| [foo] | ||||
| . | ||||
| <p><a href="/url%5Cbar*baz" title="foo"bar\baz">foo</a></p> | ||||
| . | ||||
| A link can come before its corresponding definition: | A link can come before its corresponding definition: | |||
| . | . | |||
| [foo] | [foo] | |||
| [foo]: url | [foo]: url | |||
| . | . | |||
| <p><a href="url">foo</a></p> | <p><a href="url">foo</a></p> | |||
| . | . | |||
| skipping to change at line 2006 | skipping to change at line 2455 | |||
| . | . | |||
| [ | [ | |||
| foo | foo | |||
| ]: /url | ]: /url | |||
| bar | bar | |||
| . | . | |||
| <p>bar</p> | <p>bar</p> | |||
| . | . | |||
| This is not a link reference definition, because there are | This is not a link reference definition, because there are | |||
| [non-space character]s after the title: | [non-whitespace character]s after the title: | |||
| . | . | |||
| [foo]: /url "title" ok | [foo]: /url "title" ok | |||
| . | . | |||
| <p>[foo]: /url "title" ok</p> | <p>[foo]: /url "title" ok</p> | |||
| . | . | |||
| This is a link reference definition, but it has no title: | ||||
| . | ||||
| [foo]: /url | ||||
| "title" ok | ||||
| . | ||||
| <p>"title" ok</p> | ||||
| . | ||||
| This is not a link reference definition, because it is indented | This is not a link reference definition, because it is indented | |||
| four spaces: | four spaces: | |||
| . | . | |||
| [foo]: /url "title" | [foo]: /url "title" | |||
| [foo] | [foo] | |||
| . | . | |||
| <pre><code>[foo]: /url "title" | <pre><code>[foo]: /url "title" | |||
| </code></pre> | </code></pre> | |||
| skipping to change at line 2240 | skipping to change at line 2698 | |||
| form of the definition is: | form of the definition is: | |||
| > If X is a sequence of blocks, then the result of | > If X is a sequence of blocks, then the result of | |||
| > transforming X in such-and-such a way is a container of type Y | > transforming X in such-and-such a way is a container of type Y | |||
| > with these blocks as its content. | > with these blocks as its content. | |||
| So, we explain what counts as a block quote or list item by explaining | So, we explain what counts as a block quote or list item by explaining | |||
| how these can be *generated* from their contents. This should suffice | how these can be *generated* from their contents. This should suffice | |||
| to define the syntax, although it does not give a recipe for *parsing* | to define the syntax, although it does not give a recipe for *parsing* | |||
| these constructions. (A recipe is provided below in the section entitled | these constructions. (A recipe is provided below in the section entitled | |||
| [A parsing strategy](#appendix-a-a-parsing-strategy).) | [A parsing strategy](#appendix-a-parsing-strategy).) | |||
| ## Block quotes | ## Block quotes | |||
| A [block quote marker](@block-quote-marker) | A [block quote marker](@block-quote-marker) | |||
| consists of 0-3 spaces of initial indent, plus (a) the character `>` together | consists of 0-3 spaces of initial indent, plus (a) the character `>` together | |||
| with a following space, or (b) a single character `>` not followed by a space. | with a following space, or (b) a single character `>` not followed by a space. | |||
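Cases (a) and (b) together amount to "a `>` plus at most one following space", so the marker can be sketched as a single regular expression. This assumes lines contain no tabs, and `strip_block_quote_marker` is a hypothetical helper name.

```python
import re

# 0-3 spaces of indent, '>', then at most one space: case (a) consumes the
# space, case (b) consumes nothing when '>' is not followed by a space.
_BLOCK_QUOTE_MARKER = re.compile(r" {0,3}> ?")


def strip_block_quote_marker(line):
    """Return the rest of `line` after its block quote marker, or None."""
    m = _BLOCK_QUOTE_MARKER.match(line)
    return None if m is None else line[m.end():]
```

Applying the helper repeatedly peels off nested markers one level at a time, e.g. `"> > foo"` becomes `"> foo"` and then `"foo"`.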
| The following rules define [block quotes]: | The following rules define [block quotes]: | |||
| 1. **Basic case.** If a string of lines *Ls* constitute a sequence | 1. **Basic case.** If a string of lines *Ls* constitute a sequence | |||
| of blocks *Bs*, then the result of prepending a [block quote | of blocks *Bs*, then the result of prepending a [block quote | |||
| marker] to the beginning of each line in *Ls* | marker] to the beginning of each line in *Ls* | |||
| is a [block quote](#block-quotes) containing *Bs*. | is a [block quote](#block-quotes) containing *Bs*. | |||
| 2. **Laziness.** If a string of lines *Ls* constitute a [block | 2. **Laziness.** If a string of lines *Ls* constitute a [block | |||
| quote](#block-quotes) with contents *Bs*, then the result of deleting | quote](#block-quotes) with contents *Bs*, then the result of deleting | |||
| the initial [block quote marker] from one or | the initial [block quote marker] from one or | |||
| more lines in which the next [non-space character] after the [block | more lines in which the next [non-whitespace character] after the [block | |||
| quote marker] is [paragraph continuation | quote marker] is [paragraph continuation | |||
| text] is a block quote with *Bs* as its content. | text] is a block quote with *Bs* as its content. | |||
| [Paragraph continuation text](@paragraph-continuation-text) is text | [Paragraph continuation text](@paragraph-continuation-text) is text | |||
| that will be parsed as part of the content of a paragraph, but does | that will be parsed as part of the content of a paragraph, but does | |||
| not occur at the beginning of the paragraph. | not occur at the beginning of the paragraph. | |||
| 3. **Consecutiveness.** A document cannot contain two [block | 3. **Consecutiveness.** A document cannot contain two [block | |||
| quotes] in a row unless there is a [blank line] between them. | quotes] in a row unless there is a [blank line] between them. | |||
| Nothing else counts as a [block quote](#block-quotes). | Nothing else counts as a [block quote](#block-quotes). | |||
| skipping to change at line 2628 | skipping to change at line 3086 | |||
| ## List items | ## List items | |||
| A [list marker](@list-marker) is a | A [list marker](@list-marker) is a | |||
| [bullet list marker] or an [ordered list marker]. | [bullet list marker] or an [ordered list marker]. | |||
| A [bullet list marker](@bullet-list-marker) | A [bullet list marker](@bullet-list-marker) | |||
| is a `-`, `+`, or `*` character. | is a `-`, `+`, or `*` character. | |||
| An [ordered list marker](@ordered-list-marker) | An [ordered list marker](@ordered-list-marker) | |||
| is a sequence of one of more digits (`0-9`), followed by either a | is a sequence of 1--9 arabic digits (`0-9`), followed by either a | |||
| `.` character or a `)` character. | `.` character or a `)` character. (The reason for the length | |||
| limit is that with 10 digits we start seeing integer overflows | ||||
| in some browsers.) | ||||
| The following rules define [list items]: | The following rules define [list items]: | |||
| 1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of | 1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of | |||
| blocks *Bs* starting with a [non-space character] and not separated | blocks *Bs* starting with a [non-whitespace character] and not separated | |||
| from each other by more than one blank line, and *M* is a list | from each other by more than one blank line, and *M* is a list | |||
| marker of width *W* followed by 0 < *N* < 5 spaces, then the result | marker of width *W* followed by 0 < *N* < 5 spaces, then the result | |||
| of prepending *M* and the following spaces to the first line of | of prepending *M* and the following spaces to the first line of | |||
| *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a | *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a | |||
| list item with *Bs* as its contents. The type of the list item | list item with *Bs* as its contents. The type of the list item | |||
| (bullet or ordered) is determined by the type of its list marker. | (bullet or ordered) is determined by the type of its list marker. | |||
| If the list item is ordered, then it is also assigned a start | If the list item is ordered, then it is also assigned a start | |||
| number, based on the ordered list marker. | number, based on the ordered list marker. | |||
| For example, let *Ls* be the lines | For example, let *Ls* be the lines | |||
| skipping to change at line 2692 | skipping to change at line 3152 | |||
| <p>A block quote.</p> | <p>A block quote.</p> | |||
| </blockquote> | </blockquote> | |||
| </li> | </li> | |||
| </ol> | </ol> | |||
| . | . | |||
| The most important thing to notice is that the position of | The most important thing to notice is that the position of | |||
| the text after the list marker determines how much indentation | the text after the list marker determines how much indentation | |||
| is needed in subsequent blocks in the list item. If the list | is needed in subsequent blocks in the list item. If the list | |||
| marker takes up two spaces, and there are three spaces between | marker takes up two spaces, and there are three spaces between | |||
| the list marker and the next [non-space character], then blocks | the list marker and the next [non-whitespace character], then blocks | |||
| must be indented five spaces in order to fall under the list | must be indented five spaces in order to fall under the list | |||
| item. | item. | |||
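The arithmetic in the paragraph above (initial indent, plus marker width *W*, plus the *N* spaces after the marker) can be sketched as follows. The helper name is hypothetical; it counts spaces only (no tabs) and returns `None` for lines with nothing after the marker, since that case falls under a separate rule. When *N* is five or more it falls back to *W + 1*, following rule #2 for items that start with indented code.

```python
import re

# Optional indent (0-3 spaces), a bullet or ordered marker, then the spaces after it.
_LIST_ITEM_START = re.compile(r"( {0,3})([-+*]|[0-9]{1,9}[.)])( *)")


def content_indent(line):
    """Column at which continuation blocks of this list item must start."""
    m = _LIST_ITEM_START.match(line)
    if m is None:
        return None
    indent, marker, spaces = m.groups()
    if spaces == "" or m.end() == len(line):
        return None  # marker not followed by a space, or nothing after it
    n = len(spaces)
    if n >= 5:
        n = 1  # rule #2: one space, then an indented code block
    return len(indent) + len(marker) + n


print(content_indent("1.   A paragraph"))  # 5
```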
| Here are some examples showing how far content must be indented to be | Here are some examples showing how far content must be indented to be | |||
| put under the list item: | put under the list item: | |||
| . | . | |||
| - one | - one | |||
| two | two | |||
| skipping to change at line 2750 | skipping to change at line 3210 | |||
| <ul> | <ul> | |||
| <li> | <li> | |||
| <p>one</p> | <p>one</p> | |||
| <p>two</p> | <p>two</p> | |||
| </li> | </li> | |||
| </ul> | </ul> | |||
| . | . | |||
| It is tempting to think of this in terms of columns: the continuation | It is tempting to think of this in terms of columns: the continuation | |||
| blocks must be indented at least to the column of the first | blocks must be indented at least to the column of the first | |||
| [non-space character] after the list marker. However, that is not quite right. | [non-whitespace character] after the list marker. However, that is not quite right. | |||
| The spaces after the list marker determine how much relative indentation | The spaces after the list marker determine how much relative indentation | |||
| is needed. Which column this indentation reaches will depend on | is needed. Which column this indentation reaches will depend on | |||
| how the list item is embedded in other constructions, as shown by | how the list item is embedded in other constructions, as shown by | |||
| this example: | this example: | |||
| . | . | |||
| > > 1. one | > > 1. one | |||
| >> | >> | |||
| >> two | >> two | |||
| . | . | |||
| skipping to change at line 2893 | skipping to change at line 3353 | |||
| <pre><code>bar | <pre><code>bar | |||
| </code></pre> | </code></pre> | |||
| <p>baz</p> | <p>baz</p> | |||
| <blockquote> | <blockquote> | |||
| <p>bam</p> | <p>bam</p> | |||
| </blockquote> | </blockquote> | |||
| </li> | </li> | |||
| </ol> | </ol> | |||
| . | . | |||
| Note that ordered list start numbers must be nine digits or less: | ||||
| . | ||||
| 123456789. ok | ||||
| . | ||||
| <ol start="123456789"> | ||||
| <li>ok</li> | ||||
| </ol> | ||||
| . | ||||
| . | ||||
| 1234567890. not ok | ||||
| . | ||||
| <p>1234567890. not ok</p> | ||||
| . | ||||
| A start number may begin with 0s: | ||||
| . | ||||
| 0. ok | ||||
| . | ||||
| <ol start="0"> | ||||
| <li>ok</li> | ||||
| </ol> | ||||
| . | ||||
| . | ||||
| 003. ok | ||||
| . | ||||
| <ol start="3"> | ||||
| <li>ok</li> | ||||
| </ol> | ||||
| . | ||||
| A start number may not be negative: | ||||
| . | ||||
| -1. not ok | ||||
| . | ||||
| <p>-1. not ok</p> | ||||
| . | ||||
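The start-number rules shown in these examples (one to nine digits, leading zeros allowed, no minus sign) can be sketched as below. `start_number` is a hypothetical helper that ignores tabs and only inspects the first line of an item.

```python
import re

# 0-3 spaces, 1-9 digits, '.' or ')', then a space or the end of the line.
_ORDERED_MARKER = re.compile(r" {0,3}([0-9]{1,9})[.)](?: |$)")


def start_number(line):
    """Start number of an ordered list item, or None if there is no valid marker."""
    m = _ORDERED_MARKER.match(line)
    return None if m is None else int(m.group(1))  # int() drops leading zeros
```

Converting with `int` is what turns `003` into the `start="3"` seen in the expected output above.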
| 2. **Item starting with indented code.** If a sequence of lines *Ls* | 2. **Item starting with indented code.** If a sequence of lines *Ls* | |||
| constitute a sequence of blocks *Bs* starting with an indented code | constitute a sequence of blocks *Bs* starting with an indented code | |||
| block and not separated from each other by more than one blank line, | block and not separated from each other by more than one blank line, | |||
| and *M* is a list marker of width *W* followed by | and *M* is a list marker of width *W* followed by | |||
| one space, then the result of prepending *M* and the following | one space, then the result of prepending *M* and the following | |||
| space to the first line of *Ls*, and indenting subsequent lines of | space to the first line of *Ls*, and indenting subsequent lines of | |||
| *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. | *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. | |||
| If a line is empty, then it need not be indented. The type of the | If a line is empty, then it need not be indented. The type of the | |||
| list item (bullet or ordered) is determined by the type of its list | list item (bullet or ordered) is determined by the type of its list | |||
| marker. If the list item is ordered, then it is also assigned a | marker. If the list item is ordered, then it is also assigned a | |||
| skipping to change at line 2998 | skipping to change at line 3500 | |||
| </code></pre> | </code></pre> | |||
| <p>paragraph</p> | <p>paragraph</p> | |||
| <pre><code>more code | <pre><code>more code | |||
| </code></pre> | </code></pre> | |||
| </li> | </li> | |||
| </ol> | </ol> | |||
| . | . | |||
| Note that rules #1 and #2 only apply to two cases: (a) cases | Note that rules #1 and #2 only apply to two cases: (a) cases | |||
| in which the lines to be included in a list item begin with a | in which the lines to be included in a list item begin with a | |||
| [non-space character], and (b) cases in which | [non-whitespace character], and (b) cases in which | |||
| they begin with an indented code | they begin with an indented code | |||
| block. In a case like the following, where the first block begins with | block. In a case like the following, where the first block begins with | |||
| a three-space indent, the rules do not allow us to form a list item by | a three-space indent, the rules do not allow us to form a list item by | |||
| indenting the whole thing and prepending a list marker: | indenting the whole thing and prepending a list marker: | |||
| . | . | |||
| foo | foo | |||
| bar | bar | |||
| . | . | |||
| skipping to change at line 3228 | skipping to change at line 3730 | |||
| indented code | indented code | |||
| > A block quote. | > A block quote. | |||
| </code></pre> | </code></pre> | |||
| . | . | |||
| 5. **Laziness.** If a string of lines *Ls* constitute a [list | 5. **Laziness.** If a string of lines *Ls* constitute a [list | |||
| item](#list-items) with contents *Bs*, then the result of deleting | item](#list-items) with contents *Bs*, then the result of deleting | |||
| some or all of the indentation from one or more lines in which the | some or all of the indentation from one or more lines in which the | |||
| next [non-space character] after the indentation is | next [non-whitespace character] after the indentation is | |||
| [paragraph continuation text] is a | [paragraph continuation text] is a | |||
| list item with the same contents and attributes. The unindented | list item with the same contents and attributes. The unindented | |||
| lines are called | lines are called | |||
| [lazy continuation line](@lazy-continuation-line)s. | [lazy continuation line](@lazy-continuation-line)s. | |||
| Here is an example with [lazy continuation line]s: | Here is an example with [lazy continuation line]s: | |||
| . | . | |||
| 1. A paragraph | 1. A paragraph | |||
| with two lines. | with two lines. | |||
| skipping to change at line 4201 | skipping to change at line 4703 | |||
| . | . | |||
| <p>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~</p> | <p>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~</p> | |||
| . | . | |||
| Backslashes before other characters are treated as literal | Backslashes before other characters are treated as literal | |||
| backslashes: | backslashes: | |||
| . | . | |||
| \→\A\a\ \3\φ\« | \→\A\a\ \3\φ\« | |||
| . | . | |||
| <p>\ \A\a\ \3\φ\«</p> | <p>\→\A\a\ \3\φ\«</p> | |||
| . | . | |||
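A minimal sketch of the rule illustrated here: a backslash is consumed only when it precedes an ASCII punctuation character (Python's `string.punctuation` is exactly those 32 characters), and any other backslash stays literal. The helper name is hypothetical, and it assumes the caller has already excluded code spans, code blocks, autolinks, and raw HTML, where escapes do not apply.

```python
import string

_ESCAPABLE = frozenset(string.punctuation)  # the 32 ASCII punctuation characters


def unescape(text):
    """Apply backslash escapes to plain inline text."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in _ESCAPABLE:
            out.append(text[i + 1])  # drop the backslash, keep the character
            i += 2
        else:
            out.append(text[i])  # includes backslashes before non-punctuation
            i += 1
    return "".join(out)
```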
| Escaped characters are treated as regular characters and do | Escaped characters are treated as regular characters and do | |||
| not have their usual Markdown meanings: | not have their usual Markdown meanings: | |||
| . | . | |||
| \*not emphasized* | \*not emphasized* | |||
| \<br/> not a tag | \<br/> not a tag | |||
| \[not a link](/foo) | \[not a link](/foo) | |||
| \`not code` | \`not code` | |||
| skipping to change at line 4279 | skipping to change at line 4781 | |||
| . | . | |||
| <http://example.com?find=\*> | <http://example.com?find=\*> | |||
| . | . | |||
| <p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p> | <p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p> | |||
| . | . | |||
| . | . | |||
| <a href="/bar\/)"> | <a href="/bar\/)"> | |||
| . | . | |||
| <p><a href="/bar\/)"></p> | <a href="/bar\/)"> | |||
| . | . | |||
| But they work in all other contexts, including URLs and link titles, | But they work in all other contexts, including URLs and link titles, | |||
| link references, and [info string]s in [fenced code block]s: | link references, and [info string]s in [fenced code block]s: | |||
| . | . | |||
| [foo](/bar\* "ti\*tle") | [foo](/bar\* "ti\*tle") | |||
| . | . | |||
| <p><a href="/bar*" title="ti*tle">foo</a></p> | <p><a href="/bar*" title="ti*tle">foo</a></p> | |||
| . | . | |||
| skipping to change at line 4325 | skipping to change at line 4827 | |||
| unicode characters as entities or leave them as they are. (However, | unicode characters as entities or leave them as they are. (However, | |||
| `"`, `&`, `<`, and `>` must always be rendered as entities.) | `"`, `&`, `<`, and `>` must always be rendered as entities.) | |||
| [Named entities](@name-entities) consist of `&` | [Named entities](@name-entities) consist of `&` | |||
| + any of the valid HTML5 entity names + `;`. The | + any of the valid HTML5 entity names + `;`. The | |||
| [following document](https://html.spec.whatwg.org/multipage/entities.json) | [following document](https://html.spec.whatwg.org/multipage/entities.json) | |||
| is used as an authoritative source of the valid entity names and their | is used as an authoritative source of the valid entity names and their | |||
| corresponding codepoints. | corresponding codepoints. | |||
| . | . | |||
| & © Æ Ď ¾ ℋ ⅆ &ClockwiseContourIntegral; | & © Æ Ď | |||
| ¾ ℋ ⅆ | ||||
| ∲ ≧̸ | ||||
| . | . | |||
| <p> & © Æ Ď ¾ ℋ ⅆ ∲</p> | <p> & © Æ Ď | |||
| ¾ ℋ ⅆ | ||||
| ∲ ≧̸</p> | ||||
| . | . | |||
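Python's standard library includes the HTML5 named character references as `html.entities.html5`, keyed by name plus the trailing `;`, so named-entity decoding can be sketched as a dictionary lookup. The helper name is hypothetical, and the assumption is that this table matches the WHATWG `entities.json` closely enough for illustration. Unknown names are left untouched, as in the `&MadeUpEntity;` example later in this section.

```python
import re
from html.entities import html5  # HTML5 named character references; keys end in ';'

_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_named_entities(text):
    """Replace valid HTML5 named entities; leave unknown names as literal text."""
    return _NAMED_ENTITY.sub(
        lambda m: html5.get(m.group(1) + ";", m.group(0)), text
    )
```

Some entities, such as `&ngE;`, map to two codepoints, which is why the table values are strings rather than single characters.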
| [Decimal entities](@decimal-entities) | [Decimal entities](@decimal-entities) | |||
| consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | |||
| entities need to be recognised and transformed into their corresponding | entities need to be recognised and transformed into their corresponding | |||
| unicode codepoints. Invalid unicode codepoints will be replaced by | unicode codepoints. Invalid unicode codepoints will be replaced by | |||
| the "unknown codepoint" character (`U+FFFD`). For security reasons, | the "unknown codepoint" character (`U+FFFD`). For security reasons, | |||
| the codepoint `U+0000` will also be replaced by `U+FFFD`. | the codepoint `U+0000` will also be replaced by `U+FFFD`. | |||
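A sketch of the decimal rule, with hypothetical helper names. "Invalid" is taken here to mean above `U+10FFFF` or in the surrogate range `U+D800` to `U+DFFF`; the surrogate check is an assumption, since the text does not list which codepoints count as invalid.

```python
import re

_DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")


def _codepoint_to_char(cp):
    # U+0000, out-of-range values, and (by assumption) surrogates become U+FFFD.
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return "\ufffd"
    return chr(cp)


def decode_decimal_entities(text):
    """Replace &#NNN; (1-8 digits) with the corresponding character."""
    return _DECIMAL_ENTITY.sub(lambda m: _codepoint_to_char(int(m.group(1))), text)
```

Because the pattern allows at most eight digits, a nine-digit sequence such as `&#123456789;` is not an entity at all and is left as literal text.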
| . | . | |||
| skipping to change at line 4388 | skipping to change at line 4894 | |||
| <p>&MadeUpEntity;</p> | <p>&MadeUpEntity;</p> | |||
| . | . | |||
| Entities are recognized in any context besides code spans or | Entities are recognized in any context besides code spans or | |||
| code blocks, including raw HTML, URLs, [link title]s, and | code blocks, including raw HTML, URLs, [link title]s, and | |||
| [fenced code block] [info string]s: | [fenced code block] [info string]s: | |||
| . | . | |||
| <a href="öö.html"> | <a href="öö.html"> | |||
| . | . | |||
| <p><a href="öö.html"></p> | <a href="öö.html"> | |||
| . | . | |||
| . | . | |||
| [foo](/föö "föö") | [foo](/föö "föö") | |||
| . | . | |||
| <p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> | <p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> | |||
| . | . | |||
| . | . | |||
| [foo] | [foo] | |||
| skipping to change at line 5690 | skipping to change at line 6196 | |||
| . | . | |||
| <p><em>foo _bar</em> baz_</p> | <p><em>foo _bar</em> baz_</p> | |||
| . | . | |||
| . | . | |||
| **foo*bar** | **foo*bar** | |||
| . | . | |||
| <p><em><em>foo</em>bar</em>*</p> | <p><em><em>foo</em>bar</em>*</p> | |||
| . | . | |||
| . | ||||
| *foo __bar *baz bim__ bam* | ||||
| . | ||||
| <p><em>foo <strong>bar *baz bim</strong> bam</em></p> | ||||
| . | ||||
| Rule 16: | Rule 16: | |||
| . | . | |||
| **foo **bar baz** | **foo **bar baz** | |||
| . | . | |||
| <p>**foo <strong>bar baz</strong></p> | <p>**foo <strong>bar baz</strong></p> | |||
| . | . | |||
| . | . | |||
| *foo *bar baz* | *foo *bar baz* | |||
| skipping to change at line 5773 | skipping to change at line 6285 | |||
| (the URI that is the link destination), and optionally a [link title]. | (the URI that is the link destination), and optionally a [link title]. | |||
| There are two basic kinds of links in Markdown. In [inline link]s the | There are two basic kinds of links in Markdown. In [inline link]s the | |||
| destination and title are given immediately after the link text. In | destination and title are given immediately after the link text. In | |||
| [reference link]s the destination and title are defined elsewhere in | [reference link]s the destination and title are defined elsewhere in | |||
| the document. | the document. | |||
| A [link text](@link-text) consists of a sequence of zero or more | A [link text](@link-text) consists of a sequence of zero or more | |||
| inline elements enclosed by square brackets (`[` and `]`). The | inline elements enclosed by square brackets (`[` and `]`). The | |||
| following rules apply: | following rules apply: | |||
| - Links may not contain other links, at any level of nesting. | - Links may not contain other links, at any level of nesting. If | |||
| multiple otherwise valid link definitions appear nested inside each | ||||
| other, the inner-most definition is used. | ||||
| - Brackets are allowed in the [link text] only if (a) they | - Brackets are allowed in the [link text] only if (a) they | |||
| are backslash-escaped or (b) they appear as a matched pair of brackets, | are backslash-escaped or (b) they appear as a matched pair of brackets, | |||
| with an open bracket `[`, a sequence of zero or more inlines, and | with an open bracket `[`, a sequence of zero or more inlines, and | |||
| a close bracket `]`. | a close bracket `]`. | |||
| - Backtick [code span]s, [autolink]s, and raw [HTML tag]s bind more tightly | - Backtick [code span]s, [autolink]s, and raw [HTML tag]s bind more tightly | |||
| than the brackets in link text. Thus, for example, | than the brackets in link text. Thus, for example, | |||
| `` [foo`]` `` could not be a link text, since the second `]` | `` [foo`]` `` could not be a link text, since the second `]` | |||
| is part of a code span. | is part of a code span. | |||
| skipping to change at line 5929 | skipping to change at line 6443 | |||
| Parentheses and other symbols can also be escaped, as usual | Parentheses and other symbols can also be escaped, as usual | |||
| in Markdown: | in Markdown: | |||
| . | . | |||
| [link](foo\)\:) | [link](foo\)\:) | |||
| . | . | |||
| <p><a href="foo):">link</a></p> | <p><a href="foo):">link</a></p> | |||
| . | . | |||
| A link can contain fragment identifiers and queries: | ||||
| . | ||||
| [link](#fragment) | ||||
| [link](http://example.com#fragment) | ||||
| [link](http://example.com?foo=bar&baz#fragment) | ||||
| . | ||||
| <p><a href="#fragment">link</a></p> | ||||
| <p><a href="http://example.com#fragment">link</a></p> | ||||
| <p><a href="http://example.com?foo=bar&baz#fragment">link</a></p> | ||||
| . | ||||
| Note that a backslash before a non-escapable character is | ||||
| just a backslash: | ||||
| . | ||||
| [link](foo\bar) | ||||
| . | ||||
| <p><a href="foo%5Cbar">link</a></p> | ||||
| . | ||||
| URL-escaping should be left alone inside the destination, as all | URL-escaping should be left alone inside the destination, as all | |||
| URL-escaped characters are also valid URL characters. HTML entities in | URL-escaped characters are also valid URL characters. HTML entities in | |||
| the destination will be parsed into the corresponding unicode | the destination will be parsed into the corresponding unicode | |||
| codepoints, as usual, and optionally URL-escaped when written as HTML. | codepoints, as usual, and optionally URL-escaped when written as HTML. | |||
| . | . | |||
| [link](foo%20bä) | [link](foo%20bä) | |||
| . | . | |||
| <p><a href="foo%20b%C3%A4">link</a></p> | <p><a href="foo%20b%C3%A4">link</a></p> | |||
| . | . | |||
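One way to produce the `href` values in these examples is to percent-encode the destination after entities and backslash escapes have been processed, treating `%` and common URL punctuation as safe so that existing escapes survive. This sketch uses `urllib.parse.quote`. The safe set is an assumption chosen to reproduce the examples in this section, and a stray `%` that does not begin a valid escape is left as-is rather than encoded. Escaping `&` and `"` for the HTML attribute is a separate step not shown.

```python
from urllib.parse import quote

# Characters left as-is. Letters, digits and "_.-~" are always kept by quote();
# '%' is kept so that existing escapes such as %20 are not double-encoded.
_HREF_SAFE = "/?:@&=+$,;!*'()#%"


def escape_href(destination):
    """Percent-encode a link destination, UTF-8 encoding non-ASCII characters."""
    return quote(destination, safe=_HREF_SAFE)


print(escape_href("foo%20b\u00e4"))  # foo%20b%C3%A4
```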
| skipping to change at line 6134 | skipping to change at line 6671 | |||
| There are three kinds of [reference link](@reference-link)s: | There are three kinds of [reference link](@reference-link)s: | |||
| [full](#full-reference-link), [collapsed](#collapsed-reference-link), | [full](#full-reference-link), [collapsed](#collapsed-reference-link), | |||
| and [shortcut](#shortcut-reference-link). | and [shortcut](#shortcut-reference-link). | |||
| A [full reference link](@full-reference-link) | A [full reference link](@full-reference-link) | |||
| consists of a [link text], optional [whitespace], and a [link label] | consists of a [link text], optional [whitespace], and a [link label] | |||
| that [matches] a [link reference definition] elsewhere in the document. | that [matches] a [link reference definition] elsewhere in the document. | |||
| A [link label](@link-label) begins with a left bracket (`[`) and ends | A [link label](@link-label) begins with a left bracket (`[`) and ends | |||
| with the first right bracket (`]`) that is not backslash-escaped. | with the first right bracket (`]`) that is not backslash-escaped. | |||
| Between these brackets there must be at least one non-[whitespace character]. | Between these brackets there must be at least one [non-whitespace character]. | |||
| Unescaped square bracket characters are not allowed in | Unescaped square bracket characters are not allowed in | |||
| [link label]s. A link label can have at most 999 | [link label]s. A link label can have at most 999 | |||
| characters inside the square brackets. | characters inside the square brackets. | |||
| One label [matches](@matches) | One label [matches](@matches) | |||
| another just in case their normalized forms are equal. To normalize a | another just in case their normalized forms are equal. To normalize a | |||
| label, perform the *unicode case fold* and collapse consecutive internal | label, perform the *unicode case fold* and collapse consecutive internal | |||
| [whitespace] to a single space. If there are multiple | [whitespace] to a single space. If there are multiple | |||
| matching reference link definitions, the one that comes first in the | matching reference link definitions, the one that comes first in the | |||
| document is used. (It is desirable in such cases to emit a warning.) | document is used. (It is desirable in such cases to emit a warning.) | |||
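A sketch of label matching under these rules. Python's `str.casefold()` performs full Unicode case folding, so `ẞ` folds to `ss` and matches `SS`. Whitespace is taken to be the ASCII whitespace characters, and leading and trailing whitespace is also trimmed, which is an assumption beyond the text above. Both helper names are hypothetical.

```python
import re

_WHITESPACE = " \t\n\x0b\x0c\r"
_WHITESPACE_RUN = re.compile("[" + _WHITESPACE + "]+")


def normalize_label(label):
    """Case-fold a label and collapse internal whitespace runs to one space."""
    return _WHITESPACE_RUN.sub(" ", label.strip(_WHITESPACE)).casefold()


def build_reference_map(definitions):
    """Map normalized labels to destinations; the first definition wins."""
    refs = {}
    for label, destination in definitions:
        refs.setdefault(normalize_label(label), destination)
    return refs
```

`dict.setdefault` keeps the first value stored for a key, which implements the "first matching definition is used" rule; a real implementation could emit the suggested warning at that point.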
| skipping to change at line 6381 | skipping to change at line 6918 | |||
| . | . | |||
| . | . | |||
| [foo][ref\[] | [foo][ref\[] | |||
| [ref\[]: /uri | [ref\[]: /uri | |||
| . | . | |||
| <p><a href="/uri">foo</a></p> | <p><a href="/uri">foo</a></p> | |||
| . | . | |||
| A [link label] must contain at least one non-[whitespace character]: | A [link label] must contain at least one [non-whitespace character]: | |||
| . | . | |||
| [] | [] | |||
| []: /uri | []: /uri | |||
| . | . | |||
| <p>[]</p> | <p>[]</p> | |||
| <p>[]: /uri</p> | <p>[]: /uri</p> | |||
| . | . | |||
| skipping to change at line 6961 | skipping to change at line 7498 | |||
| ## Raw HTML | ## Raw HTML | |||
| Text between `<` and `>` that looks like an HTML tag is parsed as a | Text between `<` and `>` that looks like an HTML tag is parsed as a | |||
| raw HTML tag and will be rendered in HTML without escaping. | raw HTML tag and will be rendered in HTML without escaping. | |||
| Tag and attribute names are not limited to current HTML tags, | Tag and attribute names are not limited to current HTML tags, | |||
| so custom tags (and even, say, DocBook tags) may be used. | so custom tags (and even, say, DocBook tags) may be used. | |||
| Here is the grammar for tags: | Here is the grammar for tags: | |||
| A [tag name](@tag-name) consists of an ASCII letter | A [tag name](@tag-name) consists of an ASCII letter | |||
| followed by zero or more ASCII letters or digits. | followed by zero or more ASCII letters, digits, or | |||
| hyphens (`-`). | ||||
| An [attribute](@attribute) consists of [whitespace], | An [attribute](@attribute) consists of [whitespace], | |||
| an [attribute name], and an optional | an [attribute name], and an optional | |||
| [attribute value specification]. | [attribute value specification]. | |||
| An [attribute name](@attribute-name) | An [attribute name](@attribute-name) | |||
| consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII | consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII | |||
| letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML | letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML | |||
| specification restricted to ASCII. HTML5 is laxer.) | specification restricted to ASCII. HTML5 is laxer.) | |||
| skipping to change at line 6994 | skipping to change at line 7532 | |||
| A [single-quoted attribute value](@single-quoted-attribute-value) | A [single-quoted attribute value](@single-quoted-attribute-value) | |||
| consists of `'`, zero or more | consists of `'`, zero or more | |||
| characters not including `'`, and a final `'`. | characters not including `'`, and a final `'`. | |||
| A [double-quoted attribute value](@double-quoted-attribute-value) | A [double-quoted attribute value](@double-quoted-attribute-value) | |||
| consists of `"`, zero or more | consists of `"`, zero or more | |||
| characters not including `"`, and a final `"`. | characters not including `"`, and a final `"`. | |||
| An [open tag](@open-tag) consists of a `<` character, a [tag name], | An [open tag](@open-tag) consists of a `<` character, a [tag name], | |||
| zero or more [attributes], optional [whitespace], an optional `/` | zero or more [attributes](@attribute], optional [whitespace], an optional `/` | |||
| character, and a `>` character. | character, and a `>` character. | |||
| A [closing tag](@closing-tag) consists of the string `</`, a | A [closing tag](@closing-tag) consists of the string `</`, a | |||
| [tag name], optional [whitespace], and the character `>`. | [tag name], optional [whitespace], and the character `>`. | |||
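The tag grammar above translates fairly directly into regular expressions. The unquoted attribute value (a nonempty run of characters other than whitespace, `"`, `'`, `=`, `<`, `>`, and `` ` ``) and the attribute value specification (`=` with optional surrounding whitespace) come from definitions outside this excerpt. The pattern names are hypothetical, and the patterns are meant to be used with `fullmatch` on a candidate tag.

```python
import re

WS = "[ \t\n\x0b\x0c\r]"  # whitespace characters
TAG_NAME = "[A-Za-z][A-Za-z0-9-]*"
ATTRIBUTE_NAME = "[A-Za-z_:][A-Za-z0-9_.:-]*"
UNQUOTED_VALUE = "[^ \t\n\x0b\x0c\r\"'=<>`]+"
SINGLE_QUOTED_VALUE = "'[^']*'"
DOUBLE_QUOTED_VALUE = '"[^"]*"'
ATTRIBUTE_VALUE_SPEC = (
    WS + "*=" + WS + "*(?:" + UNQUOTED_VALUE + "|"
    + SINGLE_QUOTED_VALUE + "|" + DOUBLE_QUOTED_VALUE + ")"
)
# Each attribute must be preceded by at least one whitespace character.
ATTRIBUTE = WS + "+" + ATTRIBUTE_NAME + "(?:" + ATTRIBUTE_VALUE_SPEC + ")?"

OPEN_TAG = re.compile("<" + TAG_NAME + "(?:" + ATTRIBUTE + ")*" + WS + "*/?>")
CLOSING_TAG = re.compile("</" + TAG_NAME + WS + "*>")
```

Requiring whitespace before every attribute is what rejects `<a href='bar'title=title>`, and allowing `-` in `TAG_NAME` admits custom names such as `responsive-image`.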
| An [HTML comment](@html-comment) consists of `<!--` + *text* + `-->`, | An [HTML comment](@html-comment) consists of `<!--` + *text* + `-->`, | |||
| where *text* does not start with `>` or `->`, does not end with `-`, | where *text* does not start with `>` or `->`, does not end with `-`, | |||
| and does not contain `--`. (See the | and does not contain `--`. (See the | |||
| [HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).) | [HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).) | |||
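The HTML comment rule is easier to check procedurally than with a single regular expression. This is a small Python sketch; the function name is hypothetical.

```python
def is_html_comment(s: str) -> bool:
    """Check `<!--` + text + `-->`, with the restrictions on text given above."""
    # Shorter strings would make `<!--` and `-->` overlap (e.g. `<!-->`).
    if len(s) < 7 or not (s.startswith("<!--") and s.endswith("-->")):
        return False
    text = s[4:-3]
    return not (text.startswith(">") or text.startswith("->")
                or text.endswith("-") or "--" in text)
```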
| skipping to change at line 7059 | skipping to change at line 7597 | |||
| With attributes: | With attributes: | |||
| . | . | |||
| <a foo="bar" bam = 'baz <em>"</em>' | <a foo="bar" bam = 'baz <em>"</em>' | |||
| _boolean zoop:33=zoop:33 /> | _boolean zoop:33=zoop:33 /> | |||
| . | . | |||
| <p><a foo="bar" bam = 'baz <em>"</em>' | <p><a foo="bar" bam = 'baz <em>"</em>' | |||
| _boolean zoop:33=zoop:33 /></p> | _boolean zoop:33=zoop:33 /></p> | |||
| . | . | |||
| Custom tag names can be used: | ||||
| . | ||||
| <responsive-image src="foo.jpg" /> | ||||
| <My-Tag> | ||||
| foo | ||||
| </My-Tag> | ||||
| . | ||||
| <responsive-image src="foo.jpg" /> | ||||
| <My-Tag> | ||||
| foo | ||||
| </My-Tag> | ||||
| . | ||||
| Illegal tag names, not parsed as HTML: | Illegal tag names, not parsed as HTML: | |||
| . | . | |||
| <33> <__> | <33> <__> | |||
| . | . | |||
| <p>&lt;33&gt; &lt;__&gt;</p> | <p>&lt;33&gt; &lt;__&gt;</p> | |||
| . | . | |||
| Illegal attribute names: | Illegal attribute names: | |||
| skipping to change at line 7107 | skipping to change at line 7660 | |||
| . | . | |||
| <p>&lt;a href='bar'title=title&gt;</p> | <p>&lt;a href='bar'title=title&gt;</p> | |||
| . | . | |||
| Closing tags: | Closing tags: | |||
| . | . | |||
| </a> | </a> | |||
| </foo > | </foo > | |||
| . | . | |||
| <p></a> | </a> | |||
| </foo ></p> | </foo > | |||
| . | . | |||
| Illegal attributes in closing tag: | Illegal attributes in closing tag: | |||
| . | . | |||
| </a href="foo"> | </a href="foo"> | |||
| . | . | |||
| <p>&lt;/a href=&quot;foo&quot;&gt;</p> | <p>&lt;/a href=&quot;foo&quot;&gt;</p> | |||
| . | . | |||
| skipping to change at line 7175 | skipping to change at line 7728 | |||
| foo <![CDATA[>&<]]> | foo <![CDATA[>&<]]> | |||
| . | . | |||
| <p>foo <![CDATA[>&<]]></p> | <p>foo <![CDATA[>&<]]></p> | |||
| . | . | |||
| Entities are preserved in HTML attributes: | Entities are preserved in HTML attributes: | |||
| . | . | |||
| <a href="&ouml;"> | <a href="&ouml;"> | |||
| . | . | |||
| <p><a href="&ouml;"></p> | <a href="&ouml;"> | |||
| . | . | |||
| Backslash escapes do not work in HTML attributes: | Backslash escapes do not work in HTML attributes: | |||
| . | . | |||
| <a href="\*"> | <a href="\*"> | |||
| . | . | |||
| <p><a href="\*"></p> | <a href="\*"> | |||
| . | . | |||
| . | . | |||
| <a href="\""> | <a href="\""> | |||
| . | . | |||
| <p>&lt;a href=&quot;&quot;&quot;&gt;</p> | <p>&lt;a href=&quot;&quot;&quot;&gt;</p> | |||
| . | . | |||
| ## Hard line breaks | ## Hard line breaks | |||
| skipping to change at line 7387 | skipping to change at line 7940 | |||
| Internal spaces are preserved verbatim: | Internal spaces are preserved verbatim: | |||
| . | . | |||
| Multiple     spaces | Multiple     spaces | |||
| . | . | |||
| <p>Multiple     spaces</p> | <p>Multiple     spaces</p> | |||
| . | . | |||
| <!-- END TESTS --> | <!-- END TESTS --> | |||
| # Appendix A: A parsing strategy {-} | # Appendix: A parsing strategy {-} | |||
| In this appendix we describe some features of the parsing strategy | ||||
| used in the CommonMark reference implementations. | ||||
| ## Overview {-} | ## Overview {-} | |||
| Parsing has two phases: | Parsing has two phases: | |||
| 1. In the first phase, lines of input are consumed and the block | 1. In the first phase, lines of input are consumed and the block | |||
| structure of the document---its division into paragraphs, block quotes, | structure of the document---its division into paragraphs, block quotes, | |||
| list items, and so on---is constructed. Text is assigned to these | list items, and so on---is constructed. Text is assigned to these | |||
| blocks but not parsed. Link reference definitions are parsed and a | blocks but not parsed. Link reference definitions are parsed and a | |||
| map of links is constructed. | map of links is constructed. | |||
| 2. In the second phase, the raw text contents of paragraphs and headers | 2. In the second phase, the raw text contents of paragraphs and headers | |||
| are parsed into sequences of Markdown inline elements (strings, | are parsed into sequences of Markdown inline elements (strings, | |||
| code spans, links, emphasis, and so on), using the map of link | code spans, links, emphasis, and so on), using the map of link | |||
| references constructed in phase 1. | references constructed in phase 1. | |||
| ## The document tree {-} | ||||
| At each point in processing, the document is represented as a tree of | At each point in processing, the document is represented as a tree of | |||
| **blocks**. The root of the tree is a `document` block. The `document` | **blocks**. The root of the tree is a `document` block. The `document` | |||
| may have any number of other blocks as **children**. These children | may have any number of other blocks as **children**. These children | |||
| may, in turn, have other blocks as children. The last child of a block | may, in turn, have other blocks as children. The last child of a block | |||
| is normally considered **open**, meaning that subsequent lines of input | is normally considered **open**, meaning that subsequent lines of input | |||
| can alter its contents. (Blocks that are not open are **closed**.) | can alter its contents. (Blocks that are not open are **closed**.) | |||
| Here, for example, is a possible document tree, with the open blocks | Here, for example, is a possible document tree, with the open blocks | |||
| marked by arrows: | marked by arrows: | |||
| ``` tree | ``` tree | |||
| skipping to change at line 7429 | skipping to change at line 7983 | |||
| "Lorem ipsum dolor\nsit amet." | "Lorem ipsum dolor\nsit amet." | |||
| -> list (type=bullet tight=true bullet_char=-) | -> list (type=bullet tight=true bullet_char=-) | |||
| list_item | list_item | |||
| paragraph | paragraph | |||
| "Qui *quodsi iracundia*" | "Qui *quodsi iracundia*" | |||
| -> list_item | -> list_item | |||
| -> paragraph | -> paragraph | |||
| "aliquando id" | "aliquando id" | |||
| ``` | ``` | |||
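The following sketch shows how open and closed blocks relate in the example tree above. It uses a hypothetical Python representation with open flags taken from the arrows, and omits the text contents. Only the last child of a block can be open, so the open blocks are found by descending through last children while they remain open.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    kind: str
    children: List["Block"] = field(default_factory=list)
    open: bool = True


def open_path(root: Block) -> List[Block]:
    """Descend from the root through open last children (the arrowed blocks)."""
    path = [root]
    while path[-1].children and path[-1].children[-1].open:
        path.append(path[-1].children[-1])
    return path


# The example tree above; closed blocks are the ones without arrows.
tree = Block("document", [
    Block("block_quote", [
        Block("paragraph", open=False),
        Block("list", [
            Block("list_item", [Block("paragraph", open=False)], open=False),
            Block("list_item", [Block("paragraph")]),
        ]),
    ]),
])
```

The deepest element of `open_path(tree)` is the paragraph containing "aliquando id", which is where new text would be added.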
| ## How source lines alter the document tree {-} | ## Phase 1: block structure {-} | |||
| Each line that is processed has an effect on this tree. The line is | Each line that is processed has an effect on this tree. The line is | |||
| analyzed and, depending on its contents, the document may be altered | analyzed and, depending on its contents, the document may be altered | |||
| in one or more of the following ways: | in one or more of the following ways: | |||
| 1. One or more open blocks may be closed. | 1. One or more open blocks may be closed. | |||
| 2. One or more new blocks may be created as children of the | 2. One or more new blocks may be created as children of the | |||
| last open block. | last open block. | |||
| 3. Text may be added to the last (deepest) open block remaining | 3. Text may be added to the last (deepest) open block remaining | |||
| on the tree. | on the tree. | |||
| Once a line has been incorporated into the tree in this way, | Once a line has been incorporated into the tree in this way, | |||
| it can be discarded, so input can be read in a stream. | it can be discarded, so input can be read in a stream. | |||
| For each line, we follow this procedure: | ||||
| 1. First we iterate through the open blocks, starting with the | ||||
| root document, and descending through last children down to the last | ||||
| open block. Each block imposes a condition that the line must satisfy | ||||
| if the block is to remain open. For example, a block quote requires a | ||||
| `>` character. A paragraph requires a non-blank line. | ||||
| In this phase we may match all or just some of the open | ||||
| blocks. But we cannot close unmatched blocks yet, because we may have a | ||||
| [lazy continuation line]. | ||||
| 2. Next, after consuming the continuation markers for existing | ||||
| blocks, we look for new block starts (e.g. `>` for a block quote). | ||||
| If we encounter a new block start, we close any blocks unmatched | ||||
| in step 1 before creating the new block as a child of the last | ||||
| matched block. | ||||
| 3. Finally, we look at the remainder of the line (after block | ||||
| markers like `>`, list markers, and indentation have been consumed). | ||||
| This is text that can be incorporated into the last open | ||||
| block (a paragraph, code block, header, or raw HTML). | ||||
| Setext headers are formed when we detect that the second line of | ||||
| a paragraph is a setext header line. | ||||
| Reference link definitions are detected when a paragraph is closed; | ||||
| the accumulated text lines are parsed to see if they begin with | ||||
| one or more reference link definitions. Any remainder becomes a | ||||
| normal paragraph. | ||||
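The per-line procedure can be sketched for a toy subset of Markdown that contains only block quotes and paragraphs. This Python code illustrates the procedure under that assumption and is not the reference implementation. It ignores lists, code blocks, headers, tabs, and link reference definitions, and all names are hypothetical.

Two behaviours in the sketch correspond directly to the steps above:

- A non-blank line that fails to match an open block quote is still added to the open paragraph, as a lazy continuation line.
- A new `>` start first closes the blocks left unmatched in step 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Block:
    kind: str                                   # "document", "block_quote", or "paragraph"
    children: List["Block"] = field(default_factory=list)
    lines: List[str] = field(default_factory=list)
    open: bool = True


def open_chain(doc: Block) -> List[Block]:
    """Open blocks from the root down through last children."""
    chain = [doc]
    while chain[-1].children and chain[-1].children[-1].open:
        chain.append(chain[-1].children[-1])
    return chain


def strip_quote_marker(line: str) -> Optional[str]:
    """If `line` starts with `>` (indented up to 3 spaces), return the rest, else None."""
    s = line.lstrip(" ")
    if len(line) - len(s) > 3 or not s.startswith(">"):
        return None
    s = s[1:]
    return s[1:] if s.startswith(" ") else s


def parse_blocks(text: str) -> Block:
    doc = Block("document")
    for line in text.split("\n"):
        chain = open_chain(doc)
        tip = chain[-1]
        # Containers are a prefix of the chain; an open paragraph can only be last.
        containers = chain[:-1] if tip.kind == "paragraph" else chain
        # Step 1: each open block quote needs a `>` marker to stay matched.
        rest, matched = line, 1                  # the document always matches
        for _ in containers[1:]:
            after = strip_quote_marker(rest)
            if after is None:
                break
            rest, matched = after, matched + 1
        parent = containers[matched - 1]
        # Step 2: look for new block starts (only `>` in this sketch).
        new_start = False
        while (after := strip_quote_marker(rest)) is not None:
            if not new_start:
                for b in chain[matched:]:        # close blocks unmatched in step 1
                    b.open = False
                new_start = True
            quote = Block("block_quote")
            parent.children.append(quote)
            parent, rest = quote, after
        # Step 3: incorporate the remaining text.
        blank = rest.strip() == ""
        if not new_start and tip.kind == "paragraph" and not blank:
            tip.lines.append(rest)               # ordinary or lazy continuation line
            continue
        if not new_start:
            for b in chain[matched:]:            # e.g. a blank line ends them
                b.open = False
        if not blank:
            parent.children.append(Block("paragraph", lines=[rest]))
    return doc
```

With the first two lines of the example above, `sit amet.` fails to match the block quote in step 1 but is still appended to its paragraph in step 3.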
| We can see how this works by considering how the tree above is | We can see how this works by considering how the tree above is | |||
| generated by four lines of Markdown: | generated by four lines of Markdown: | |||
| ``` markdown | ``` markdown | |||
| > Lorem ipsum dolor | > Lorem ipsum dolor | |||
| sit amet. | sit amet. | |||
| > - Qui *quodsi iracundia* | > - Qui *quodsi iracundia* | |||
| > - aliquando id | > - aliquando id | |||
| ``` | ``` | |||
| skipping to change at line 7541 | skipping to change at line 8125 | |||
| "Lorem ipsum dolor\nsit amet." | "Lorem ipsum dolor\nsit amet." | |||
| -> list (type=bullet tight=true bullet_char=-) | -> list (type=bullet tight=true bullet_char=-) | |||
| list_item | list_item | |||
| paragraph | paragraph | |||
| "Qui *quodsi iracundia*" | "Qui *quodsi iracundia*" | |||
| -> list_item | -> list_item | |||
| -> paragraph | -> paragraph | |||
| "aliquando id" | "aliquando id" | |||
| ``` | ``` | |||
| ## From block structure to the final document {-} | ## Phase 2: inline structure {-} | |||
| Once all of the input has been parsed, all open blocks are closed. | Once all of the input has been parsed, all open blocks are closed. | |||
| We then "walk the tree," visiting every node, and parse raw | We then "walk the tree," visiting every node, and parse raw | |||
| string contents of paragraphs and headers as inlines. At this | string contents of paragraphs and headers as inlines. At this | |||
| point we have seen all the link reference definitions, so we can | point we have seen all the link reference definitions, so we can | |||
| resolve reference links as we go. | resolve reference links as we go. | |||
| ``` tree | ``` tree | |||
| document | document | |||
| skipping to change at line 7572 | skipping to change at line 8156 | |||
| str "quodsi iracundia" | str "quodsi iracundia" | |||
| list_item | list_item | |||
| paragraph | paragraph | |||
| str "aliquando id" | str "aliquando id" | |||
| ``` | ``` | |||
| Notice how the [line ending] in the first paragraph has | Notice how the [line ending] in the first paragraph has | |||
| been parsed as a `softbreak`, and the asterisks in the first list item | been parsed as a `softbreak`, and the asterisks in the first list item | |||
| have become an `emph`. | have become an `emph`. | |||
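The phase 2 tree walk can be sketched in Python for the simplest inline construct, the line ending. This hypothetical illustration handles only two cases:

- A line ending preceded by two or more spaces becomes a hard line break.
- Any other line ending becomes a soft break.

In both cases, spaces at the end of a line and at the start of the next line are removed. Emphasis, links, code spans, and the link reference map are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Inline = Tuple[str, ...]   # ("str", text), ("softbreak",), or ("linebreak",)


@dataclass
class Block:
    kind: str
    children: List["Block"] = field(default_factory=list)
    raw: str = ""                                  # raw text collected in phase 1
    inlines: List[Inline] = field(default_factory=list)


def parse_line_endings(raw: str) -> List[Inline]:
    """Turn line endings into softbreak/linebreak nodes; other text becomes str nodes."""
    lines = raw.split("\n")
    out: List[Inline] = []
    for i, line in enumerate(lines):
        if i > 0:
            line = line.lstrip(" ")                # leading spaces of a line are removed
        text = line.rstrip(" ")                    # trailing spaces are removed
        if text:
            out.append(("str", text))
        if i < len(lines) - 1:
            hard = len(line) - len(text) >= 2      # two or more trailing spaces
            out.append(("linebreak",) if hard else ("softbreak",))
    return out


def walk(block: Block) -> None:
    """Visit every node, parsing raw contents of paragraphs and headers as inlines."""
    if block.kind in ("paragraph", "header"):
        block.inlines = parse_line_endings(block.raw)
    for child in block.children:
        walk(child)
```

Applied to the first paragraph of the example, this yields `str "Lorem ipsum dolor"`, `softbreak`, `str "sit amet."`, as in the tree above.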
| The document can be rendered as HTML, or in any other format, given | ### An algorithm for parsing nested emphasis and links {-} | |||
| an appropriate renderer. | ||||
| By far the trickiest part of inline parsing is handling emphasis, | ||||
| strong emphasis, links, and images. This is done using the following | ||||
| algorithm. | ||||
| When we're parsing inlines and we hit either | ||||
| - a run of `*` or `_` characters, or | ||||
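| - a `[` or `![` | ||||
| we insert a text node with these symbols as its literal content, and we | ||||
| add a pointer to this text node to the [delimiter stack](@delimiter-stack). | ||||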
| The [delimiter stack] is a doubly linked list. Each | ||||
| element contains a pointer to a text node, plus information about | ||||
| - the type of delimiter (`[`, `![`, `*`, `_`) | ||||
| - the number of delimiters, | ||||
| - whether the delimiter is "active" (all are active to start), and | ||||
| - whether the delimiter is a potential opener, a potential closer, | ||||
| or both (which depends on what sort of characters precede | ||||
| and follow the delimiters). | ||||
| When we hit a `]` character, we call the *look for link or image* | ||||
| procedure (see below). | ||||
| When we hit the end of the input, we call the *process emphasis* | ||||
| procedure (see below), with `stack_bottom` = NULL. | ||||
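Here is a Python sketch of the scanning step above, assuming a plain string as input. It works as follows:

- Each delimiter run or bracket gets its own text node.
- A pointer to that node is kept in a doubly linked delimiter stack.
- Everything else, including `]`, is treated as literal text.

Whether a run can open or close emphasis depends on flanking rules defined elsewhere in the spec, so those fields are omitted. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Text:
    s: str


@dataclass(eq=False)
class Delimiter:
    kind: str                          # "[", "![", "*", or "_"
    node: Text                         # the text node holding the delimiter(s)
    count: int                         # number of delimiter characters
    active: bool = True                # all delimiters start out active
    prev: Optional["Delimiter"] = None
    next: Optional["Delimiter"] = None


class DelimiterStack:
    """Doubly linked list; `top` is the most recently pushed element."""

    def __init__(self) -> None:
        self.top: Optional[Delimiter] = None

    def push(self, d: Delimiter) -> None:
        d.prev = self.top
        if self.top is not None:
            self.top.next = d
        self.top = d

    def remove(self, d: Delimiter) -> None:
        if d.prev is not None:
            d.prev.next = d.next
        if d.next is not None:
            d.next.prev = d.prev
        if d is self.top:
            self.top = d.prev

    def kinds_bottom_up(self) -> List[str]:
        out, d = [], self.top
        while d is not None:
            out.append(d.kind)
            d = d.prev
        return out[::-1]


def scan(text: str) -> Tuple[List[Text], DelimiterStack]:
    """Split `text` into text nodes, pushing `*`/`_` runs and `[`/`![` onto the stack."""
    nodes: List[Text] = []
    stack = DelimiterStack()
    buf, i = "", 0
    while i < len(text):
        c = text[i]
        if c in "*_":
            j = i
            while j < len(text) and text[j] == c:
                j += 1
            run, kind, count = text[i:j], c, j - i
        elif c == "[" or text.startswith("![", i):
            run = "[" if c == "[" else "!["
            kind, count, j = run, 1, i + len(run)
        else:
            buf += c                   # ordinary character (including `]` here)
            i += 1
            continue
        if buf:
            nodes.append(Text(buf))
            buf = ""
        node = Text(run)               # literal content: the delimiter symbols
        nodes.append(node)
        stack.push(Delimiter(kind, node, count))
        i = j
    if buf:
        nodes.append(Text(buf))
    return nodes, stack
```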
| #### *look for link or image* {-} | ||||
| Starting at the top of the delimiter stack, we look backwards | ||||
| through the stack for an opening `[` or `![` delimiter. | ||||
| - If we don't find one, we return a literal text node `]`. | ||||
| - If we do find one, but it's not *active*, we remove the inactive | ||||
| delimiter from the stack, and return a literal text node `]`. | ||||
| - If we find one and it's active, then we parse ahead to see if | ||||
| we have an inline link/image, reference link/image, compact reference | ||||
| link/image, or shortcut reference link/image. | ||||
| + If we don't, then we remove the opening delimiter from the | ||||
| delimiter stack and return a literal text node `]`. | ||||
| + If we do, then | ||||
| * We return a link or image node whose children are the inlines | ||||
| after the text node pointed to by the opening delimiter. | ||||
| * We run *process emphasis* on these inlines, with the `[` opener | ||||
| as `stack_bottom`. | ||||
| * We remove the opening delimiter. | ||||
| * If we have a link (and not an image), we also set all | ||||
| `[` delimiters before the opening delimiter to *inactive*. (This | ||||
| will prevent us from getting links within links.) | ||||
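The *look for link or image* procedure can be sketched in Python. This sketch simplifies the procedure in four ways:

- The stack is a plain Python list rather than a doubly linked list.
- Each delimiter's text node is identified by its index in the `nodes` list.
- The "parse ahead" step is replaced by a hypothetical `link_follows` callback.
- The nested call to *process emphasis* is omitted; a comment marks where it would run.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(eq=False)
class Delim:
    kind: str            # "[", "![", "*", or "_"
    index: int           # position of its text node in `nodes`
    active: bool = True


def look_for_link_or_image(nodes: list, stack: List[Delim],
                           link_follows: Callable[[], bool]) -> None:
    """Handle a `]`. `link_follows` stands in for parsing ahead for link syntax."""
    for i in range(len(stack) - 1, -1, -1):
        if stack[i].kind in ("[", "!["):
            break
    else:                                   # no opener found: `]` is literal text
        nodes.append("]")
        return
    opener = stack[i]
    if not opener.active or not link_follows():
        del stack[i]                        # remove the opener; `]` is literal
        nodes.append("]")
        return
    kind = "image" if opener.kind == "![" else "link"
    children = nodes[opener.index + 1:]     # inlines after the opener's text node
    del nodes[opener.index:]                # the opener's text node is replaced too
    # *process emphasis* would run on `children` here with the opener as
    # stack_bottom, leaving no delimiters above it. Omitted in this sketch.
    del stack[i:]                           # delimiters above the opener, then the opener
    nodes.append((kind, children))
    if kind == "link":                      # prevents links within links
        for d in stack:
            if d.kind == "[":
                d.active = False
```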
| #### *process emphasis* {-} | ||||
| Parameter `stack_bottom` sets a lower bound to how far we | ||||
| descend in the [delimiter stack]. If it is NULL, we can | ||||
| go all the way to the bottom. Otherwise, we stop before | ||||
| visiting `stack_bottom`. | ||||
| Let `current_position` point to the element on the [delimiter stack] | ||||
| just above `stack_bottom` (or the first element if `stack_bottom` | ||||
| is NULL). | ||||
| We keep track of the `openers_bottom` for each delimiter | ||||
| type (`*`, `_`). Initialize this to `stack_bottom`. | ||||
| Then we repeat the following until we run out of potential | ||||
| closers: | ||||
| - Move `current_position` forward in the delimiter stack (if needed) | ||||
| until we find the first potential closer with delimiter `*` or `_`. | ||||
| (This will be the potential closer closest | ||||
| to the beginning of the input -- the first one in parse order.) | ||||
| - Now, look back in the stack (staying above `stack_bottom` and | ||||
| the `openers_bottom` for this delimiter type) for the | ||||
| first matching potential opener ("matching" means same delimiter). | ||||
| - If one is found: | ||||
| + Figure out whether we have emphasis or strong emphasis: | ||||
| if both closer and opener spans have length >= 2, we have | ||||
| strong, otherwise regular. | ||||
| + Insert an emph or strong emph node accordingly, after | ||||
| the text node corresponding to the opener. | ||||
| + Remove any delimiters between the opener and closer from | ||||
| the delimiter stack. | ||||
| + Remove 1 (for regular emph) or 2 (for strong emph) delimiters | ||||
| from the opening and closing text nodes. If they become empty | ||||
| as a result, remove them and remove the corresponding element | ||||
| of the delimiter stack. If the closing node is removed, reset | ||||
| `current_position` to the next element in the stack. | ||||
| - If none is found: | ||||
| + Set `openers_bottom` to the element before `current_position`. | ||||
| (We know that there are no openers for this kind of closer up to and | ||||
| including this point, so this puts a lower bound on future searches.) | ||||
| + If the closer at `current_position` is not a potential opener, | ||||
| remove it from the delimiter stack (since we know it can't | ||||
| be a closer either). | ||||
| + Advance `current_position` to the next element in the stack. | ||||
| After we're done, we remove all delimiters above `stack_bottom` from the | ||||
| delimiter stack. | ||||
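Finally, here is a Python sketch of *process emphasis* for the case `stack_bottom` = NULL. It makes these assumptions:

- The delimiter stack is a doubly linked list, as described above.
- The caller has already decided whether each run can open or close; the flanking rules are defined elsewhere in the spec.
- Inlines are modeled as a flat Python list of text and emphasis nodes.

Class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class Text:
    s: str


@dataclass(eq=False)
class Emph:
    tag: str                          # "em" or "strong"
    children: list


@dataclass(eq=False)
class Delim:
    char: str                         # "*" or "_"
    node: Text                        # text node holding the delimiter run
    can_open: bool
    can_close: bool
    prev: Optional["Delim"] = None
    next: Optional["Delim"] = None


def link(delims: List[Delim]) -> Optional[Delim]:
    """Chain delimiters (in parse order) into a doubly linked list; return the first."""
    for a, b in zip(delims, delims[1:]):
        a.next, b.prev = b, a
    return delims[0] if delims else None


def unlink(d: Delim) -> None:
    if d.prev is not None:
        d.prev.next = d.next
    if d.next is not None:
        d.next.prev = d.prev


def process_emphasis(inlines: list, first: Optional[Delim]) -> None:
    """*process emphasis* with stack_bottom = NULL; `inlines` is edited in place."""
    openers_bottom: Dict[str, Optional[Delim]] = {"*": None, "_": None}
    current = first
    while current is not None:
        if not current.can_close:             # move forward to a potential closer
            current = current.next
            continue
        closer, bound = current, openers_bottom[current.char]
        opener = closer.prev                  # look back, staying above openers_bottom
        while opener is not None and opener is not bound and not (
                opener.char == closer.char and opener.can_open):
            opener = opener.prev
        if opener is not None and opener is not bound:
            n = 2 if len(opener.node.s) >= 2 and len(closer.node.s) >= 2 else 1
            start = inlines.index(opener.node) + 1
            end = inlines.index(closer.node)
            inlines[start:end] = [Emph("strong" if n == 2 else "em", inlines[start:end])]
            opener.next, closer.prev = closer, opener  # drop delimiters in between
            opener.node.s = opener.node.s[n:]
            closer.node.s = closer.node.s[n:]
            if not opener.node.s:
                inlines.remove(opener.node)
                unlink(opener)
            if not closer.node.s:
                inlines.remove(closer.node)
                unlink(closer)
                current = closer.next         # reset to the next element
        else:
            openers_bottom[closer.char] = closer.prev
            if not closer.can_open:           # it can be neither opener nor closer
                unlink(closer)
            current = closer.next
    # With stack_bottom = NULL the caller can now simply drop the remaining delimiters.


def to_html(inlines: list) -> str:
    return "".join(n.s if isinstance(n, Text) else
                   f"<{n.tag}>{to_html(n.children)}</{n.tag}>" for n in inlines)
```

In this sketch, `openers_bottom` only records a point below which no matching opener exists. Delimiters are only ever removed, so if that element is later unlinked, the search simply continues toward the bottom and still finds nothing below the old bound.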
| End of changes. 74 change blocks. | ||||
| 98 lines changed or deleted | 682 lines changed or added | |||
This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||