spec.txt | spec.txt | |||
---|---|---|---|---|
--- | --- | |||
title: CommonMark Spec | title: CommonMark Spec | |||
author: John MacFarlane | author: John MacFarlane | |||
version: 0.19 | version: 0.20 | |||
date: 2015-04-27 | date: 2015-06-08 | |||
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | |||
... | ... | |||
# Introduction | # Introduction | |||
## What is Markdown? | ## What is Markdown? | |||
Markdown is a plain text format for writing structured documents, | Markdown is a plain text format for writing structured documents, | |||
based on conventions used for indicating formatting in email and | based on conventions used for indicating formatting in email and | |||
usenet posts. It was developed in 2004 by John Gruber, who wrote | usenet posts. It was developed in 2004 by John Gruber, who wrote | |||
skipping to change at line 215 | skipping to change at line 215 | |||
document. | document. | |||
A [character](@character) is a unicode code point. | A [character](@character) is a unicode code point. | |||
This spec does not specify an encoding; it thinks of lines as composed | This spec does not specify an encoding; it thinks of lines as composed | |||
of characters rather than bytes. A conforming parser may be limited | of characters rather than bytes. A conforming parser may be limited | |||
to a certain encoding. | to a certain encoding. | |||
A [line](@line) is a sequence of zero or more [character]s | A [line](@line) is a sequence of zero or more [character]s | |||
followed by a [line ending] or by the end of file. | followed by a [line ending] or by the end of file. | |||
A [line ending](@line-ending) is, depending on the platform, a | A [line ending](@line-ending) is a newline (`U+000A`), carriage return | |||
newline (`U+000A`), carriage return (`U+000D`), or | (`U+000D`), or carriage return + newline. | |||
carriage return + newline. | ||||
For security reasons, a conforming parser must strip or replace the | ||||
Unicode character `U+0000`. | ||||
A line containing no characters, or a line containing only spaces | A line containing no characters, or a line containing only spaces | |||
(`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). | (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). | |||
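As a rough illustration of the line, line ending, and blank line definitions above, the following Python sketch splits text into lines and tests for blank lines. The function names `split_lines` and `is_blank_line` are hypothetical and not part of the spec. Treating a final line ending as terminating the last line, rather than starting an empty one, is an assumption of this sketch.

```python
import re

# Line endings as listed in the definition: CR+LF, LF, or CR.
# CR+LF is tried first so that it counts as a single line ending.
_LINE_ENDING = re.compile(r"\r\n|\n|\r")


def split_lines(text):
    """Split text into lines, dropping the line endings themselves."""
    lines = _LINE_ENDING.split(text)
    # Assumption: a final line ending terminates the last line instead of
    # opening a new, empty one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def is_blank_line(line):
    """True for a line with no characters, or only spaces and tabs."""
    return all(ch in (" ", "\t") for ch in line)
```

Python's built-in `str.splitlines` is not used here because it also splits on characters such as form feed and line tabulation, which the definition above does not treat as line endings.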
The following definitions of character classes will be used in this spec: | The following definitions of character classes will be used in this spec: | |||
A [whitespace character](@whitespace-character) is a space | A [whitespace character](@whitespace-character) is a space | |||
(`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`), | (`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`), | |||
form feed (`U+000C`), or carriage return (`U+000D`). | form feed (`U+000C`), or carriage return (`U+000D`). | |||
skipping to change at line 242 | skipping to change at line 238 | |||
character]s. | character]s. | |||
A [unicode whitespace character](@unicode-whitespace-character) is | A [unicode whitespace character](@unicode-whitespace-character) is | |||
any code point in the unicode `Zs` class, or a tab (`U+0009`), | any code point in the unicode `Zs` class, or a tab (`U+0009`), | |||
carriage return (`U+000D`), newline (`U+000A`), or form feed | carriage return (`U+000D`), newline (`U+000A`), or form feed | |||
(`U+000C`). | (`U+000C`). | |||
[Unicode whitespace](@unicode-whitespace) is a sequence of one | [Unicode whitespace](@unicode-whitespace) is a sequence of one | |||
or more [unicode whitespace character]s. | or more [unicode whitespace character]s. | |||
A [non-space character](@non-space-character) is anything but `U+0020`. | A [space](@space) is `U+0020`. | |||
A [non-space character](@non-space-character) is any character | ||||
that is not a [whitespace character]. | ||||
An [ASCII punctuation character](@ascii-punctuation-character) | An [ASCII punctuation character](@ascii-punctuation-character) | |||
is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | |||
`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | |||
`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. | `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. | |||
A [punctuation character](@punctuation-character) is an [ASCII | A [punctuation character](@punctuation-character) is an [ASCII | |||
punctuation character] or anything in | punctuation character] or anything in | |||
the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | |||
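The character classes defined above could be checked along the following lines. This is only a sketch: the function names are hypothetical, the non-space test follows the 0.20 wording (any character that is not a whitespace character), and the Unicode general categories come from Python's `unicodedata` module, whose Unicode version depends on the interpreter.

```python
import string
import unicodedata

# Space, tab, newline, line tabulation, form feed, carriage return.
_WHITESPACE = frozenset(" \t\n\x0b\x0c\r")
# Note that line tabulation (U+000B) is not in the unicode whitespace list.
_UNICODE_WS_EXTRA = frozenset("\t\r\n\x0c")
# string.punctuation holds the same 32 ASCII characters listed above.
_ASCII_PUNCTUATION = frozenset(string.punctuation)
_PUNCTUATION_CLASSES = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})


def is_whitespace_character(ch):
    return ch in _WHITESPACE


def is_unicode_whitespace_character(ch):
    return ch in _UNICODE_WS_EXTRA or unicodedata.category(ch) == "Zs"


def is_non_space_character(ch):
    # 0.20 definition: any character that is not a whitespace character.
    return not is_whitespace_character(ch)


def is_punctuation_character(ch):
    return (ch in _ASCII_PUNCTUATION
            or unicodedata.category(ch) in _PUNCTUATION_CLASSES)
```

Under these definitions a no-break space (`U+00A0`) is a unicode whitespace character but not a whitespace character, so it counts as a non-space character.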
## Tab expansion | ## Preprocessing | |||
Tabs in lines are expanded to spaces, with a tab stop of 4 characters: | Tabs in lines are immediately expanded to [spaces][space], with a tab | |||
stop of 4 characters: | ||||
. | . | |||
→foo→baz→→bim | →foo→baz→→bim | |||
. | . | |||
<pre><code>foo baz     bim | <pre><code>foo baz     bim | |||
</code></pre> | </code></pre> | |||
. | . | |||
. | . | |||
a→a | a→a | |||
ὐ→a | ὐ→a | |||
. | . | |||
<pre><code>a a | <pre><code>a a | |||
ὐ a | ὐ a | |||
</code></pre> | </code></pre> | |||
. | . | |||
## Insecure characters | ||||
For security reasons, the Unicode character `U+0000` must be replaced | ||||
with the replacement character (`U+FFFD`). | ||||
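The two preprocessing steps in the 0.20 column (tab expansion with a tab stop of 4, and replacement of `U+0000`) might be sketched in Python as follows. The name `preprocess_line` is hypothetical, and counting one column per code point is an assumption suggested by the `ὐ` example above.

```python
TAB_STOP = 4


def preprocess_line(line):
    """Expand tabs to spaces and replace U+0000 with U+FFFD."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Advance to the next multiple of TAB_STOP.
            width = TAB_STOP - (column % TAB_STOP)
            out.append(" " * width)
            column += width
        else:
            out.append("\ufffd" if ch == "\x00" else ch)
            # Assumption: every code point occupies exactly one column.
            column += 1
    return "".join(out)
```

For example, `preprocess_line("\tfoo\tbaz\t\tbim")` yields four spaces, `foo`, one space, `baz`, five spaces, and `bim`, which, once the four-space code block indent is removed, corresponds to the first example above.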
# Blocks and inlines | # Blocks and inlines | |||
We can think of a document as a sequence of | We can think of a document as a sequence of | |||
[blocks](@block)---structural | [blocks](@block)---structural elements like paragraphs, block | |||
elements like paragraphs, block quotations, | quotations, lists, headers, rules, and code blocks. Some blocks (like | |||
lists, headers, rules, and code blocks. Blocks can contain other | block quotes and list items) contain other blocks; others (like | |||
blocks, or they can contain [inline](@inline) content: | headers and paragraphs) contain [inline](@inline) content---text, | |||
words, spaces, links, emphasized text, images, and inline code. | links, emphasized text, images, code, and so on. | |||
## Precedence | ## Precedence | |||
Indicators of block structure always take precedence over indicators | Indicators of block structure always take precedence over indicators | |||
of inline structure. So, for example, the following is a list with | of inline structure. So, for example, the following is a list with | |||
two items, not a list with one item containing a code span: | two items, not a list with one item containing a code span: | |||
. | . | |||
- `one | - `one | |||
- two` | - two` | |||
skipping to change at line 531 | skipping to change at line 536 | |||
</ul> | </ul> | |||
. | . | |||
## ATX headers | ## ATX headers | |||
An [ATX header](@atx-header) | An [ATX header](@atx-header) | |||
consists of a string of characters, parsed as inline content, between an | consists of a string of characters, parsed as inline content, between an | |||
opening sequence of 1--6 unescaped `#` characters and an optional | opening sequence of 1--6 unescaped `#` characters and an optional | |||
closing sequence of any number of `#` characters. The opening sequence | closing sequence of any number of `#` characters. The opening sequence | |||
of `#` characters cannot be followed directly by a | of `#` characters cannot be followed directly by a | |||
[non-space character]. | [non-space character]. The optional closing sequence of `#`s must be | |||
The optional closing sequence of `#`s must be preceded by a space and may be | preceded by a [space] and may be followed by spaces only. The opening | |||
followed by spaces only. The opening `#` character may be indented 0-3 | `#` character may be indented 0-3 spaces. The raw contents of the | |||
spaces. The raw contents of the header are stripped of leading and | header are stripped of leading and trailing spaces before being parsed | |||
trailing spaces before being parsed as inline content. The header level | as inline content. The header level is equal to the number of `#` | |||
is equal to the number of `#` characters in the opening sequence. | characters in the opening sequence. | |||
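A minimal Python sketch of the ATX header rule described above, under some stated assumptions: the line has already had tabs expanded and its line ending removed, the function name `parse_atx_header` is hypothetical, and the returned contents are the raw string that would then be parsed as inline content. It is not the reference implementation.

```python
import re

# 0-3 spaces of indentation, then 1-6 '#' characters that are followed
# by a space or by the end of the line.
_ATX_OPEN = re.compile(r" {0,3}(#{1,6})(?= |$)")
# Optional closing sequence: a run of '#' at the end of the contents,
# preceded by a space or making up the entire contents.
_ATX_CLOSE = re.compile(r"(?:^| )#+$")


def parse_atx_header(line):
    """Return (level, raw_contents), or None if line is not an ATX header."""
    opening = _ATX_OPEN.match(line)
    if opening is None:
        return None
    level = len(opening.group(1))
    contents = line[opening.end():].strip(" ")
    closing = _ATX_CLOSE.search(contents)
    if closing is not None:
        contents = contents[:closing.start()].strip(" ")
    return level, contents
```

With this sketch, `parse_atx_header("## foo ##")` returns `(2, "foo")`, while `parse_atx_header("#5 bolt")` and `parse_atx_header("####### foo")` return `None`.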
Simple headers: | Simple headers: | |||
. | . | |||
# foo | # foo | |||
## foo | ## foo | |||
### foo | ### foo | |||
#### foo | #### foo | |||
##### foo | ##### foo | |||
###### foo | ###### foo | |||
skipping to change at line 564 | skipping to change at line 569 | |||
. | . | |||
More than six `#` characters is not a header: | More than six `#` characters is not a header: | |||
. | . | |||
####### foo | ####### foo | |||
. | . | |||
<p>####### foo</p> | <p>####### foo</p> | |||
. | . | |||
A space is required between the `#` characters and the header's | At least one space is required between the `#` characters and the | |||
contents. Note that many implementations currently do not require | header's contents, unless the header is empty. Note that many | |||
the space. However, the space was required by the [original ATX | implementations currently do not require the space. However, the | |||
implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps | space was required by the | |||
prevent things like the following from being parsed as headers: | [original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py), | |||
and it helps prevent things like the following from being parsed as | ||||
headers: | ||||
. | . | |||
#5 bolt | #5 bolt | |||
#foobar | ||||
. | . | |||
<p>#5 bolt</p> | <p>#5 bolt</p> | |||
<p>#foobar</p> | ||||
. | . | |||
This is not a header, because the first `#` is escaped: | This is not a header, because the first `#` is escaped: | |||
. | . | |||
\## foo | \## foo | |||
. | . | |||
<p>## foo</p> | <p>## foo</p> | |||
. | . | |||
skipping to change at line 1027 | skipping to change at line 1037 | |||
. | . | |||
a simple | a simple | |||
indented code block | indented code block | |||
. | . | |||
<pre><code>a simple | <pre><code>a simple | |||
indented code block | indented code block | |||
</code></pre> | </code></pre> | |||
. | . | |||
The contents are literal text, and do not get parsed as Markdown: | If there is any ambiguity between an interpretation of indentation | |||
as a code block and as indicating that material belongs to a [list | ||||
item][list items], the list item interpretation takes precedence: | ||||
. | ||||
- foo | ||||
bar | ||||
. | ||||
<ul> | ||||
<li> | ||||
<p>foo</p> | ||||
<p>bar</p> | ||||
</li> | ||||
</ul> | ||||
. | ||||
. | ||||
1. foo | ||||
- bar | ||||
. | ||||
<ol> | ||||
<li> | ||||
<p>foo</p> | ||||
<ul> | ||||
<li>bar</li> | ||||
</ul> | ||||
</li> | ||||
</ol> | ||||
. | ||||
The contents of a code block are literal text, and do not get parsed | ||||
as Markdown: | ||||
. | . | |||
<a/> | <a/> | |||
*hi* | *hi* | |||
- one | - one | |||
. | . | |||
<pre><code><a/> | <pre><code><a/> | |||
*hi* | *hi* | |||
skipping to change at line 2312 | skipping to change at line 2355 | |||
baz | baz | |||
> foo | > foo | |||
. | . | |||
<blockquote> | <blockquote> | |||
<p>bar | <p>bar | |||
baz | baz | |||
foo</p> | foo</p> | |||
</blockquote> | </blockquote> | |||
. | . | |||
Laziness only applies to lines that are continuations of | Laziness only applies to lines that would have been continuations of | |||
paragraphs. Lines containing characters or indentation that indicate | paragraphs had they been prepended with `>`. For example, the | |||
block structure cannot be lazy. | `>` cannot be omitted in the second line of | |||
``` markdown | ||||
> foo | ||||
> --- | ||||
``` | ||||
without changing the meaning: | ||||
. | . | |||
> foo | > foo | |||
--- | --- | |||
. | . | |||
<blockquote> | <blockquote> | |||
<p>foo</p> | <p>foo</p> | |||
</blockquote> | </blockquote> | |||
<hr /> | <hr /> | |||
. | . | |||
Similarly, if we omit the `>` in the second line of | ||||
``` markdown | ||||
> - foo | ||||
> - bar | ||||
``` | ||||
then the block quote ends after the first line: | ||||
. | . | |||
> - foo | > - foo | |||
- bar | - bar | |||
. | . | |||
<blockquote> | <blockquote> | |||
<ul> | <ul> | |||
<li>foo</li> | <li>foo</li> | |||
</ul> | </ul> | |||
</blockquote> | </blockquote> | |||
<ul> | <ul> | |||
<li>bar</li> | <li>bar</li> | |||
</ul> | </ul> | |||
. | . | |||
For the same reason, we can't omit the `>` in front of | ||||
subsequent lines of an indented or fenced code block: | ||||
. | . | |||
> foo | > foo | |||
bar | bar | |||
. | . | |||
<blockquote> | <blockquote> | |||
<pre><code>foo | <pre><code>foo | |||
</code></pre> | </code></pre> | |||
</blockquote> | </blockquote> | |||
<pre><code>bar | <pre><code>bar | |||
</code></pre> | </code></pre> | |||
skipping to change at line 3808 | skipping to change at line 3870 | |||
List items need not be indented to the same level. The following | List items need not be indented to the same level. The following | |||
list items will be treated as items at the same list level, | list items will be treated as items at the same list level, | |||
since none is indented enough to belong to the previous list | since none is indented enough to belong to the previous list | |||
item: | item: | |||
. | . | |||
- a | - a | |||
- b | - b | |||
- c | - c | |||
- d | - d | |||
- e | - e | |||
- f | - f | |||
- g | - g | |||
- h | ||||
- i | ||||
. | . | |||
<ul> | <ul> | |||
<li>a</li> | <li>a</li> | |||
<li>b</li> | <li>b</li> | |||
<li>c</li> | <li>c</li> | |||
<li>d</li> | <li>d</li> | |||
<li>e</li> | <li>e</li> | |||
<li>f</li> | <li>f</li> | |||
<li>g</li> | <li>g</li> | |||
<li>h</li> | ||||
<li>i</li> | ||||
</ul> | </ul> | |||
. | . | |||
. | ||||
1. a | ||||
2. b | ||||
3. c | ||||
. | ||||
<ol> | ||||
<li> | ||||
<p>a</p> | ||||
</li> | ||||
<li> | ||||
<p>b</p> | ||||
</li> | ||||
<li> | ||||
<p>c</p> | ||||
</li> | ||||
</ol> | ||||
. | ||||
This is a loose list, because there is a blank line between | This is a loose list, because there is a blank line between | |||
two of the list items: | two of the list items: | |||
. | . | |||
- a | - a | |||
- b | - b | |||
- c | - c | |||
. | . | |||
<ul> | <ul> | |||
skipping to change at line 4247 | skipping to change at line 4333 | |||
. | . | |||
& © Æ Ď ¾ ℋ ⅆ &ClockwiseContourIntegral; | & © Æ Ď ¾ ℋ ⅆ &ClockwiseContourIntegral; | |||
. | . | |||
<p> & © Æ Ď ¾ ℋ ⅆ ∲</p> | <p> & © Æ Ď ¾ ℋ ⅆ ∲</p> | |||
. | . | |||
[Decimal entities](@decimal-entities) | [Decimal entities](@decimal-entities) | |||
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | |||
entities need to be recognised and transformed into their corresponding | entities need to be recognised and transformed into their corresponding | |||
unicode codepoints. Invalid unicode codepoints will be written as the | unicode codepoints. Invalid unicode codepoints will be replaced by | |||
"unknown codepoint" character (`0xFFFD`) | the "unknown codepoint" character (`U+FFFD`). For security reasons, | |||
the codepoint `U+0000` will also be replaced by `U+FFFD`. | ||||
. | . | |||
# Ӓ Ϡ � | # Ӓ Ϡ � � | |||
. | . | |||
<p># Ӓ Ϡ �</p> | <p># Ӓ Ϡ � �</p> | |||
. | . | |||
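The decimal entity rule in the 0.20 column could be sketched in Python as below. The function name `decode_decimal_entity` is hypothetical. Treating surrogate code points and values above `U+10FFFF` as the "invalid" cases is an assumption of this sketch, since the text above does not enumerate them.

```python
import re

# "&#" + 1-8 arabic digits + ";"
_DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")

REPLACEMENT = "\ufffd"


def decode_decimal_entity(text):
    """Return the character for a decimal entity, or None if text is not one."""
    match = _DECIMAL_ENTITY.fullmatch(text)
    if match is None:
        return None
    codepoint = int(match.group(1))
    # U+0000 is replaced for security reasons. Out-of-range values and
    # surrogates are treated as invalid (an assumption of this sketch).
    if (codepoint == 0 or codepoint > 0x10FFFF
            or 0xD800 <= codepoint <= 0xDFFF):
        return REPLACEMENT
    return chr(codepoint)
```

A string with more than eight digits does not match the pattern at all, so the sketch returns `None` for it and leaves the decision about literal text to the caller.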
[Hexadecimal entities](@hexadecimal-entities) | [Hexadecimal entities](@hexadecimal-entities) | |||
consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits | consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits | |||
+ `;`. They will also be parsed and turned into the corresponding | + `;`. They will also be parsed and turned into the corresponding | |||
unicode codepoints in the AST. | unicode codepoints in the AST. | |||
. | . | |||
" ആ ಫ | " ആ ಫ | |||
. | . | |||
skipping to change at line 5032 | skipping to change at line 5119 | |||
__foo, __bar__, baz__ | __foo, __bar__, baz__ | |||
. | . | |||
<p><strong>foo, <strong>bar</strong>, baz</strong></p> | <p><strong>foo, <strong>bar</strong>, baz</strong></p> | |||
. | . | |||
This is strong emphasis, even though the opening delimiter is | This is strong emphasis, even though the opening delimiter is | |||
both left- and right-flanking, because it is preceded by | both left- and right-flanking, because it is preceded by | |||
punctuation: | punctuation: | |||
. | . | |||
foo-_(bar)_ | foo-__(bar)__ | |||
. | . | |||
<p>foo-<em>(bar)</em></p> | <p>foo-<strong>(bar)</strong></p> | |||
. | . | |||
Rule 7: | Rule 7: | |||
This is not strong emphasis, because the closing delimiter is preceded | This is not strong emphasis, because the closing delimiter is preceded | |||
by whitespace: | by whitespace: | |||
. | . | |||
**foo bar ** | **foo bar ** | |||
. | . | |||
skipping to change at line 5145 | skipping to change at line 5232 | |||
__foo__bar__baz__ | __foo__bar__baz__ | |||
. | . | |||
<p><strong>foo__bar__baz</strong></p> | <p><strong>foo__bar__baz</strong></p> | |||
. | . | |||
This is strong emphasis, even though the closing delimiter is | This is strong emphasis, even though the closing delimiter is | |||
both left- and right-flanking, because it is followed by | both left- and right-flanking, because it is followed by | |||
punctuation: | punctuation: | |||
. | . | |||
_(bar)_. | __(bar)__. | |||
. | . | |||
<p><em>(bar)</em>.</p> | <p><strong>(bar)</strong>.</p> | |||
. | . | |||
Rule 9: | Rule 9: | |||
Any nonempty sequence of inline elements can be the contents of an | Any nonempty sequence of inline elements can be the contents of an | |||
emphasized span. | emphasized span. | |||
. | . | |||
*foo [bar](/url)* | *foo [bar](/url)* | |||
. | . | |||
skipping to change at line 6047 | skipping to change at line 6134 | |||
There are three kinds of [reference link](@reference-link)s: | There are three kinds of [reference link](@reference-link)s: | |||
[full](#full-reference-link), [collapsed](#collapsed-reference-link), | [full](#full-reference-link), [collapsed](#collapsed-reference-link), | |||
and [shortcut](#shortcut-reference-link). | and [shortcut](#shortcut-reference-link). | |||
A [full reference link](@full-reference-link) | A [full reference link](@full-reference-link) | |||
consists of a [link text], optional [whitespace], and a [link label] | consists of a [link text], optional [whitespace], and a [link label] | |||
that [matches] a [link reference definition] elsewhere in the document. | that [matches] a [link reference definition] elsewhere in the document. | |||
A [link label](@link-label) begins with a left bracket (`[`) and ends | A [link label](@link-label) begins with a left bracket (`[`) and ends | |||
with the first right bracket (`]`) that is not backslash-escaped. | with the first right bracket (`]`) that is not backslash-escaped. | |||
Between these brackets there must be at least one non-[whitespace character]. | ||||
Unescaped square bracket characters are not allowed in | Unescaped square bracket characters are not allowed in | |||
[link label]s. A link label can have at most 999 | [link label]s. A link label can have at most 999 | |||
characters inside the square brackets. | characters inside the square brackets. | |||
One label [matches](@matches) | One label [matches](@matches) | |||
another just in case their normalized forms are equal. To normalize a | another just in case their normalized forms are equal. To normalize a | |||
label, perform the *unicode case fold* and collapse consecutive internal | label, perform the *unicode case fold* and collapse consecutive internal | |||
[whitespace] to a single space. If there are multiple | [whitespace] to a single space. If there are multiple | |||
matching reference link definitions, the one that comes first in the | matching reference link definitions, the one that comes first in the | |||
document is used. (It is desirable in such cases to emit a warning.) | document is used. (It is desirable in such cases to emit a warning.) | |||
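Label matching as described above might be sketched in Python like this. The names `normalize_label` and `labels_match` are hypothetical, the input is assumed to be the text between the square brackets, and stripping leading and trailing whitespace is an assumption, since the paragraph above only mentions collapsing internal whitespace.

```python
import re

_WHITESPACE_CHARS = " \t\n\x0b\x0c\r"
_WHITESPACE_RUN = re.compile("[" + re.escape(_WHITESPACE_CHARS) + "]+")


def normalize_label(label):
    """Case-fold a label and collapse whitespace runs to a single space."""
    # Assumption: leading and trailing whitespace is not significant.
    stripped = label.strip(_WHITESPACE_CHARS)
    collapsed = _WHITESPACE_RUN.sub(" ", stripped)
    # str.casefold applies Unicode case folding.
    return collapsed.casefold()


def labels_match(first, second):
    return normalize_label(first) == normalize_label(second)
```

To honor the rule that the first matching definition wins, a caller could store definitions in a dictionary keyed by `normalize_label(label)` and skip any key that is already present.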
skipping to change at line 6293 | skipping to change at line 6381 | |||
. | . | |||
. | . | |||
[foo][ref\[] | [foo][ref\[] | |||
[ref\[]: /uri | [ref\[]: /uri | |||
. | . | |||
<p><a href="/uri">foo</a></p> | <p><a href="/uri">foo</a></p> | |||
. | . | |||
A [link label] must contain at least one non-[whitespace character]: | ||||
. | ||||
[] | ||||
[]: /uri | ||||
. | ||||
<p>[]</p> | ||||
<p>[]: /uri</p> | ||||
. | ||||
. | ||||
[ | ||||
] | ||||
[ | ||||
]: /uri | ||||
. | ||||
<p>[ | ||||
]</p> | ||||
<p>[ | ||||
]: /uri</p> | ||||
. | ||||
A [collapsed reference link](@collapsed-reference-link) | A [collapsed reference link](@collapsed-reference-link) | |||
consists of a [link label] that [matches] a | consists of a [link label] that [matches] a | |||
[link reference definition] elsewhere in the | [link reference definition] elsewhere in the | |||
document, optional [whitespace], and the string `[]`. | document, optional [whitespace], and the string `[]`. | |||
The contents of the first link label are parsed as inlines, | The contents of the first link label are parsed as inlines, | |||
which are used as the link's text. The link's URI and title are | which are used as the link's text. The link's URI and title are | |||
provided by the matching reference link definition. Thus, | provided by the matching reference link definition. Thus, | |||
`[foo][]` is equivalent to `[foo][foo]`. | `[foo][]` is equivalent to `[foo][foo]`. | |||
. | . | |||
End of changes. 27 change blocks. | ||||
42 lines changed or deleted | 154 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |