| spec.txt | spec.txt | |||
|---|---|---|---|---|
| --- | --- | |||
| title: CommonMark Spec | title: CommonMark Spec | |||
| author: John MacFarlane | author: John MacFarlane | |||
| version: 0.19 | version: 0.20 | |||
| date: 2015-04-27 | date: 2015-06-08 | |||
| license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | |||
| ... | ... | |||
| # Introduction | # Introduction | |||
| ## What is Markdown? | ## What is Markdown? | |||
| Markdown is a plain text format for writing structured documents, | Markdown is a plain text format for writing structured documents, | |||
| based on conventions used for indicating formatting in email and | based on conventions used for indicating formatting in email and | |||
| usenet posts. It was developed in 2004 by John Gruber, who wrote | usenet posts. It was developed in 2004 by John Gruber, who wrote | |||
| skipping to change at line 215 | skipping to change at line 215 | |||
| document. | document. | |||
| A [character](@character) is a unicode code point. | A [character](@character) is a unicode code point. | |||
| This spec does not specify an encoding; it thinks of lines as composed | This spec does not specify an encoding; it thinks of lines as composed | |||
| of characters rather than bytes. A conforming parser may be limited | of characters rather than bytes. A conforming parser may be limited | |||
| to a certain encoding. | to a certain encoding. | |||
| A [line](@line) is a sequence of zero or more [character]s | A [line](@line) is a sequence of zero or more [character]s | |||
| followed by a [line ending] or by the end of file. | followed by a [line ending] or by the end of file. | |||
| A [line ending](@line-ending) is, depending on the platform, a | A [line ending](@line-ending) is a newline (`U+000A`), carriage return | |||
| newline (`U+000A`), carriage return (`U+000D`), or | (`U+000D`), or carriage return + newline. | |||
| carriage return + newline. | ||||
| For security reasons, a conforming parser must strip or replace the | ||||
| Unicode character `U+0000`. | ||||
| A line containing no characters, or a line containing only spaces | A line containing no characters, or a line containing only spaces | |||
| (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). | (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). | |||
| The following definitions of character classes will be used in this spec: | The following definitions of character classes will be used in this spec: | |||
| A [whitespace character](@whitespace-character) is a space | A [whitespace character](@whitespace-character) is a space | |||
| (`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`), | (`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`), | |||
| form feed (`U+000C`), or carriage return (`U+000D`). | form feed (`U+000C`), or carriage return (`U+000D`). | |||
| skipping to change at line 242 | skipping to change at line 238 | |||
| character]s. | character]s. | |||
| A [unicode whitespace character](@unicode-whitespace-character) is | A [unicode whitespace character](@unicode-whitespace-character) is | |||
| any code point in the unicode `Zs` class, or a tab (`U+0009`), | any code point in the unicode `Zs` class, or a tab (`U+0009`), | |||
| carriage return (`U+000D`), newline (`U+000A`), or form feed | carriage return (`U+000D`), newline (`U+000A`), or form feed | |||
| (`U+000C`). | (`U+000C`). | |||
| [Unicode whitespace](@unicode-whitespace) is a sequence of one | [Unicode whitespace](@unicode-whitespace) is a sequence of one | |||
| or more [unicode whitespace character]s. | or more [unicode whitespace character]s. | |||
| A [non-space character](@non-space-character) is anything but `U+0020`. | A [space](@space) is `U+0020`. | |||
| A [non-space character](@non-space-character) is any character | ||||
| that is not a [whitespace character]. | ||||
| An [ASCII punctuation character](@ascii-punctuation-character) | An [ASCII punctuation character](@ascii-punctuation-character) | |||
| is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | |||
| `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | |||
| `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. | `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. | |||
| A [punctuation character](@punctuation-character) is an [ASCII | A [punctuation character](@punctuation-character) is an [ASCII | |||
| punctuation character] or anything in | punctuation character] or anything in | |||
| the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | |||
| ## Tab expansion | ## Preprocessing | |||
| Tabs in lines are expanded to spaces, with a tab stop of 4 characters: | Tabs in lines are immediately expanded to [spaces][space], with a tab | |||
| stop of 4 characters: | ||||
| . | . | |||
| →foo→baz→→bim | →foo→baz→→bim | |||
| . | . | |||
| <pre><code>foo baz bim | <pre><code>foo baz bim | |||
| </code></pre> | </code></pre> | |||
| . | . | |||
| . | . | |||
| a→a | a→a | |||
| ὐ→a | ὐ→a | |||
| . | . | |||
| <pre><code>a a | <pre><code>a a | |||
| ὐ a | ὐ a | |||
| </code></pre> | </code></pre> | |||
| . | . | |||
| ## Insecure characters | ||||
| For security reasons, the Unicode character `U+0000` must be replaced | ||||
| with the replacement character (`U+FFFD`). | ||||
| # Blocks and inlines | # Blocks and inlines | |||
| We can think of a document as a sequence of | We can think of a document as a sequence of | |||
| [blocks](@block)---structural | [blocks](@block)---structural elements like paragraphs, block | |||
| elements like paragraphs, block quotations, | quotations, lists, headers, rules, and code blocks. Some blocks (like | |||
| lists, headers, rules, and code blocks. Blocks can contain other | block quotes and list items) contain other blocks; others (like | |||
| blocks, or they can contain [inline](@inline) content: | headers and paragraphs) contain [inline](@inline) content---text, | |||
| words, spaces, links, emphasized text, images, and inline code. | links, emphasized text, images, code, and so on. | |||
| ## Precedence | ## Precedence | |||
| Indicators of block structure always take precedence over indicators | Indicators of block structure always take precedence over indicators | |||
| of inline structure. So, for example, the following is a list with | of inline structure. So, for example, the following is a list with | |||
| two items, not a list with one item containing a code span: | two items, not a list with one item containing a code span: | |||
| . | . | |||
| - `one | - `one | |||
| - two` | - two` | |||
| skipping to change at line 531 | skipping to change at line 536 | |||
| </ul> | </ul> | |||
| . | . | |||
| ## ATX headers | ## ATX headers | |||
| An [ATX header](@atx-header) | An [ATX header](@atx-header) | |||
| consists of a string of characters, parsed as inline content, between an | consists of a string of characters, parsed as inline content, between an | |||
| opening sequence of 1--6 unescaped `#` characters and an optional | opening sequence of 1--6 unescaped `#` characters and an optional | |||
| closing sequence of any number of `#` characters. The opening sequence | closing sequence of any number of `#` characters. The opening sequence | |||
| of `#` characters cannot be followed directly by a | of `#` characters cannot be followed directly by a | |||
| [non-space character]. | [non-space character]. The optional closing sequence of `#`s must be | |||
| The optional closing sequence of `#`s must be preceded by a space and may be | preceded by a [space] and may be followed by spaces only. The opening | |||
| followed by spaces only. The opening `#` character may be indented 0-3 | `#` character may be indented 0-3 spaces. The raw contents of the | |||
| spaces. The raw contents of the header are stripped of leading and | header are stripped of leading and trailing spaces before being parsed | |||
| trailing spaces before being parsed as inline content. The header level | as inline content. The header level is equal to the number of `#` | |||
| is equal to the number of `#` characters in the opening sequence. | characters in the opening sequence. | |||
| Simple headers: | Simple headers: | |||
| . | . | |||
| # foo | # foo | |||
| ## foo | ## foo | |||
| ### foo | ### foo | |||
| #### foo | #### foo | |||
| ##### foo | ##### foo | |||
| ###### foo | ###### foo | |||
| skipping to change at line 564 | skipping to change at line 569 | |||
| . | . | |||
| More than six `#` characters is not a header: | More than six `#` characters is not a header: | |||
| . | . | |||
| ####### foo | ####### foo | |||
| . | . | |||
| <p>####### foo</p> | <p>####### foo</p> | |||
| . | . | |||
| A space is required between the `#` characters and the header's | At least one space is required between the `#` characters and the | |||
| contents. Note that many implementations currently do not require | header's contents, unless the header is empty. Note that many | |||
| the space. However, the space was required by the [original ATX | implementations currently do not require the space. However, the | |||
| implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps | space was required by the | |||
| prevent things like the following from being parsed as headers: | [original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py), | |||
| and it helps prevent things like the following from being parsed as | ||||
| headers: | ||||
| . | . | |||
| #5 bolt | #5 bolt | |||
| #foobar | ||||
| . | . | |||
| <p>#5 bolt</p> | <p>#5 bolt</p> | |||
| <p>#foobar</p> | ||||
| . | . | |||
| This is not a header, because the first `#` is escaped: | This is not a header, because the first `#` is escaped: | |||
| . | . | |||
| \## foo | \## foo | |||
| . | . | |||
| <p>## foo</p> | <p>## foo</p> | |||
| . | . | |||
| skipping to change at line 1027 | skipping to change at line 1037 | |||
| . | . | |||
| a simple | a simple | |||
| indented code block | indented code block | |||
| . | . | |||
| <pre><code>a simple | <pre><code>a simple | |||
| indented code block | indented code block | |||
| </code></pre> | </code></pre> | |||
| . | . | |||
| The contents are literal text, and do not get parsed as Markdown: | If there is any ambiguity between an interpretation of indentation | |||
| as a code block and as indicating that material belongs to a [list | ||||
| item][list items], the list item interpretation takes precedence: | ||||
| . | ||||
| - foo | ||||
| bar | ||||
| . | ||||
| <ul> | ||||
| <li> | ||||
| <p>foo</p> | ||||
| <p>bar</p> | ||||
| </li> | ||||
| </ul> | ||||
| . | ||||
| . | ||||
| 1. foo | ||||
| - bar | ||||
| . | ||||
| <ol> | ||||
| <li> | ||||
| <p>foo</p> | ||||
| <ul> | ||||
| <li>bar</li> | ||||
| </ul> | ||||
| </li> | ||||
| </ol> | ||||
| . | ||||
| The contents of a code block are literal text, and do not get parsed | ||||
| as Markdown: | ||||
| . | . | |||
| <a/> | <a/> | |||
| *hi* | *hi* | |||
| - one | - one | |||
| . | . | |||
| <pre><code>&lt;a/&gt; | <pre><code>&lt;a/&gt; | |||
| *hi* | *hi* | |||
| skipping to change at line 2312 | skipping to change at line 2355 | |||
| baz | baz | |||
| > foo | > foo | |||
| . | . | |||
| <blockquote> | <blockquote> | |||
| <p>bar | <p>bar | |||
| baz | baz | |||
| foo</p> | foo</p> | |||
| </blockquote> | </blockquote> | |||
| . | . | |||
| Laziness only applies to lines that are continuations of | Laziness only applies to lines that would have been continuations of | |||
| paragraphs. Lines containing characters or indentation that indicate | paragraphs had they been prepended with `>`. For example, the | |||
| block structure cannot be lazy. | `>` cannot be omitted in the second line of | |||
| ``` markdown | ||||
| > foo | ||||
| > --- | ||||
| ``` | ||||
| without changing the meaning: | ||||
| . | . | |||
| > foo | > foo | |||
| --- | --- | |||
| . | . | |||
| <blockquote> | <blockquote> | |||
| <p>foo</p> | <p>foo</p> | |||
| </blockquote> | </blockquote> | |||
| <hr /> | <hr /> | |||
| . | . | |||
| Similarly, if we omit the `>` in the second line of | ||||
| ``` markdown | ||||
| > - foo | ||||
| > - bar | ||||
| ``` | ||||
| then the block quote ends after the first line: | ||||
| . | . | |||
| > - foo | > - foo | |||
| - bar | - bar | |||
| . | . | |||
| <blockquote> | <blockquote> | |||
| <ul> | <ul> | |||
| <li>foo</li> | <li>foo</li> | |||
| </ul> | </ul> | |||
| </blockquote> | </blockquote> | |||
| <ul> | <ul> | |||
| <li>bar</li> | <li>bar</li> | |||
| </ul> | </ul> | |||
| . | . | |||
| For the same reason, we can't omit the `>` in front of | ||||
| subsequent lines of an indented or fenced code block: | ||||
| . | . | |||
| > foo | > foo | |||
| bar | bar | |||
| . | . | |||
| <blockquote> | <blockquote> | |||
| <pre><code>foo | <pre><code>foo | |||
| </code></pre> | </code></pre> | |||
| </blockquote> | </blockquote> | |||
| <pre><code>bar | <pre><code>bar | |||
| </code></pre> | </code></pre> | |||
| skipping to change at line 3808 | skipping to change at line 3870 | |||
| List items need not be indented to the same level. The following | List items need not be indented to the same level. The following | |||
| list items will be treated as items at the same list level, | list items will be treated as items at the same list level, | |||
| since none is indented enough to belong to the previous list | since none is indented enough to belong to the previous list | |||
| item: | item: | |||
| . | . | |||
| - a | - a | |||
| - b | - b | |||
| - c | - c | |||
| - d | - d | |||
| - e | - e | |||
| - f | - f | |||
| - g | - g | |||
| - h | ||||
| - i | ||||
| . | . | |||
| <ul> | <ul> | |||
| <li>a</li> | <li>a</li> | |||
| <li>b</li> | <li>b</li> | |||
| <li>c</li> | <li>c</li> | |||
| <li>d</li> | <li>d</li> | |||
| <li>e</li> | <li>e</li> | |||
| <li>f</li> | <li>f</li> | |||
| <li>g</li> | <li>g</li> | |||
| <li>h</li> | ||||
| <li>i</li> | ||||
| </ul> | </ul> | |||
| . | . | |||
| . | ||||
| 1. a | ||||
| 2. b | ||||
| 3. c | ||||
| . | ||||
| <ol> | ||||
| <li> | ||||
| <p>a</p> | ||||
| </li> | ||||
| <li> | ||||
| <p>b</p> | ||||
| </li> | ||||
| <li> | ||||
| <p>c</p> | ||||
| </li> | ||||
| </ol> | ||||
| . | ||||
| This is a loose list, because there is a blank line between | This is a loose list, because there is a blank line between | |||
| two of the list items: | two of the list items: | |||
| . | . | |||
| - a | - a | |||
| - b | - b | |||
| - c | - c | |||
| . | . | |||
| <ul> | <ul> | |||
| skipping to change at line 4247 | skipping to change at line 4333 | |||
| . | . | |||
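| &nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral; | &nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral; | |||
| . | . | |||
| <p> &amp; © Æ Ď ¾ ℋ ⅆ ∲</p> | <p> &amp; © Æ Ď ¾ ℋ ⅆ ∲</p> | |||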
| . | . | |||
| [Decimal entities](@decimal-entities) | [Decimal entities](@decimal-entities) | |||
| consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | |||
| entities need to be recognised and transformed into their corresponding | entities need to be recognised and transformed into their corresponding | |||
| unicode codepoints. Invalid unicode codepoints will be written as the | unicode codepoints. Invalid unicode codepoints will be replaced by | |||
| "unknown codepoint" character (`0xFFFD`) | the "unknown codepoint" character (`U+FFFD`). For security reasons, | |||
| the codepoint `U+0000` will also be replaced by `U+FFFD`. | ||||
| . | . | |||
| &#35; &#1234; &#992; &#98765432; | &#35; &#1234; &#992; &#98765432; &#0; | |||
| . | . | |||
| <p># Ӓ Ϡ �</p> | <p># Ӓ Ϡ � �</p> | |||
| . | . | |||
| [Hexadecimal entities](@hexadecimal-entities) | [Hexadecimal entities](@hexadecimal-entities) | |||
| consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits | consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits | |||
| + `;`. They will also be parsed and turned into the corresponding | + `;`. They will also be parsed and turned into the corresponding | |||
| unicode codepoints in the AST. | unicode codepoints in the AST. | |||
| . | . | |||
| &#X22; &#XD06; &#xcab; | &#X22; &#XD06; &#xcab; | |||
| . | . | |||
| skipping to change at line 5032 | skipping to change at line 5119 | |||
| __foo, __bar__, baz__ | __foo, __bar__, baz__ | |||
| . | . | |||
| <p><strong>foo, <strong>bar</strong>, baz</strong></p> | <p><strong>foo, <strong>bar</strong>, baz</strong></p> | |||
| . | . | |||
| This is strong emphasis, even though the opening delimiter is | This is strong emphasis, even though the opening delimiter is | |||
| both left- and right-flanking, because it is preceded by | both left- and right-flanking, because it is preceded by | |||
| punctuation: | punctuation: | |||
| . | . | |||
| foo-_(bar)_ | foo-__(bar)__ | |||
| . | . | |||
| <p>foo-<em>(bar)</em></p> | <p>foo-<strong>(bar)</strong></p> | |||
| . | . | |||
| Rule 7: | Rule 7: | |||
| This is not strong emphasis, because the closing delimiter is preceded | This is not strong emphasis, because the closing delimiter is preceded | |||
| by whitespace: | by whitespace: | |||
| . | . | |||
| **foo bar ** | **foo bar ** | |||
| . | . | |||
| skipping to change at line 5145 | skipping to change at line 5232 | |||
| __foo__bar__baz__ | __foo__bar__baz__ | |||
| . | . | |||
| <p><strong>foo__bar__baz</strong></p> | <p><strong>foo__bar__baz</strong></p> | |||
| . | . | |||
| This is strong emphasis, even though the closing delimiter is | This is strong emphasis, even though the closing delimiter is | |||
| both left- and right-flanking, because it is followed by | both left- and right-flanking, because it is followed by | |||
| punctuation: | punctuation: | |||
| . | . | |||
| _(bar)_. | __(bar)__. | |||
| . | . | |||
| <p><em>(bar)</em>.</p> | <p><strong>(bar)</strong>.</p> | |||
| . | . | |||
| Rule 9: | Rule 9: | |||
| Any nonempty sequence of inline elements can be the contents of an | Any nonempty sequence of inline elements can be the contents of an | |||
| emphasized span. | emphasized span. | |||
| . | . | |||
| *foo [bar](/url)* | *foo [bar](/url)* | |||
| . | . | |||
| skipping to change at line 6047 | skipping to change at line 6134 | |||
| There are three kinds of [reference link](@reference-link)s: | There are three kinds of [reference link](@reference-link)s: | |||
| [full](#full-reference-link), [collapsed](#collapsed-reference-link), | [full](#full-reference-link), [collapsed](#collapsed-reference-link), | |||
| and [shortcut](#shortcut-reference-link). | and [shortcut](#shortcut-reference-link). | |||
| A [full reference link](@full-reference-link) | A [full reference link](@full-reference-link) | |||
| consists of a [link text], optional [whitespace], and a [link label] | consists of a [link text], optional [whitespace], and a [link label] | |||
| that [matches] a [link reference definition] elsewhere in the document. | that [matches] a [link reference definition] elsewhere in the document. | |||
| A [link label](@link-label) begins with a left bracket (`[`) and ends | A [link label](@link-label) begins with a left bracket (`[`) and ends | |||
| with the first right bracket (`]`) that is not backslash-escaped. | with the first right bracket (`]`) that is not backslash-escaped. | |||
| Between these brackets there must be at least one non-[whitespace character]. | ||||
| Unescaped square bracket characters are not allowed in | Unescaped square bracket characters are not allowed in | |||
| [link label]s. A link label can have at most 999 | [link label]s. A link label can have at most 999 | |||
| characters inside the square brackets. | characters inside the square brackets. | |||
| One label [matches](@matches) | One label [matches](@matches) | |||
| another just in case their normalized forms are equal. To normalize a | another just in case their normalized forms are equal. To normalize a | |||
| label, perform the *unicode case fold* and collapse consecutive internal | label, perform the *unicode case fold* and collapse consecutive internal | |||
| [whitespace] to a single space. If there are multiple | [whitespace] to a single space. If there are multiple | |||
| matching reference link definitions, the one that comes first in the | matching reference link definitions, the one that comes first in the | |||
| document is used. (It is desirable in such cases to emit a warning.) | document is used. (It is desirable in such cases to emit a warning.) | |||
| skipping to change at line 6293 | skipping to change at line 6381 | |||
| . | . | |||
| . | . | |||
| [foo][ref\[] | [foo][ref\[] | |||
| [ref\[]: /uri | [ref\[]: /uri | |||
| . | . | |||
| <p><a href="/uri">foo</a></p> | <p><a href="/uri">foo</a></p> | |||
| . | . | |||
| A [link label] must contain at least one non-[whitespace character]: | ||||
| . | ||||
| [] | ||||
| []: /uri | ||||
| . | ||||
| <p>[]</p> | ||||
| <p>[]: /uri</p> | ||||
| . | ||||
| . | ||||
| [ | ||||
| ] | ||||
| [ | ||||
| ]: /uri | ||||
| . | ||||
| <p>[ | ||||
| ]</p> | ||||
| <p>[ | ||||
| ]: /uri</p> | ||||
| . | ||||
| A [collapsed reference link](@collapsed-reference-link) | A [collapsed reference link](@collapsed-reference-link) | |||
| consists of a [link label] that [matches] a | consists of a [link label] that [matches] a | |||
| [link reference definition] elsewhere in the | [link reference definition] elsewhere in the | |||
| document, optional [whitespace], and the string `[]`. | document, optional [whitespace], and the string `[]`. | |||
| The contents of the first link label are parsed as inlines, | The contents of the first link label are parsed as inlines, | |||
| which are used as the link's text. The link's URI and title are | which are used as the link's text. The link's URI and title are | |||
| provided by the matching reference link definition. Thus, | provided by the matching reference link definition. Thus, | |||
| `[foo][]` is equivalent to `[foo][foo]`. | `[foo][]` is equivalent to `[foo][foo]`. | |||
| . | . | |||
| End of changes. 27 change blocks. | ||||
| 42 lines changed or deleted | 154 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/
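
The "Preprocessing" and "Insecure characters" rules on the 0.20 side of the diff (line endings, tab expansion with a tab stop of 4, and replacement of `U+0000` by `U+FFFD`) can be illustrated with a short Python sketch. This is not taken from the spec or from any reference implementation: the function names `split_lines`, `expand_tabs`, and `preprocess` are hypothetical, and the order in which the steps are applied is an assumption. Columns are counted in characters (code points), which is what the `ὐ→a` example in the diff appears to exercise.

```python
import re

TAB_STOP = 4  # tab stop stated in the spec text


def split_lines(text):
    """Split on newline, carriage return, or carriage return + newline.

    CRLF is listed first so that it is treated as one line ending.
    A trailing line ending yields a final empty string.
    """
    return re.split(r"\r\n|\n|\r", text)


def expand_tabs(line, tab_stop=TAB_STOP):
    """Replace each tab with spaces up to the next multiple of tab_stop."""
    out = []
    col = 0  # column measured in characters, not bytes
    for ch in line:
        if ch == "\t":
            width = tab_stop - (col % tab_stop)
            out.append(" " * width)
            col += width
        else:
            out.append(ch)
            col += 1
    return "".join(out)


def preprocess(text):
    """Assumed pipeline: replace U+0000, split into lines, expand tabs."""
    text = text.replace("\x00", "\ufffd")
    return [expand_tabs(line) for line in split_lines(text)]
```

For example, `preprocess("\tfoo\tbaz\t\tbim")` returns a single line in which `foo` is indented four spaces, `baz` starts at column 8, and `bim` starts at column 16, which matches the first tab example once the four-space code-block indentation is removed.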