spec.txt   spec.txt 
--- ---
title: CommonMark Spec title: CommonMark Spec
author: John MacFarlane author: John MacFarlane
version: 0.19 version: 0.20
date: 2015-04-27 date: 2015-06-08
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
... ...
# Introduction # Introduction
## What is Markdown? ## What is Markdown?
Markdown is a plain text format for writing structured documents, Markdown is a plain text format for writing structured documents,
based on conventions used for indicating formatting in email and based on conventions used for indicating formatting in email and
usenet posts. It was developed in 2004 by John Gruber, who wrote usenet posts. It was developed in 2004 by John Gruber, who wrote
skipping to change at line 215 skipping to change at line 215
document. document.
A [character](@character) is a unicode code point. A [character](@character) is a unicode code point.
This spec does not specify an encoding; it thinks of lines as composed This spec does not specify an encoding; it thinks of lines as composed
of characters rather than bytes. A conforming parser may be limited of characters rather than bytes. A conforming parser may be limited
to a certain encoding. to a certain encoding.
A [line](@line) is a sequence of zero or more [character]s A [line](@line) is a sequence of zero or more [character]s
followed by a [line ending] or by the end of file. followed by a [line ending] or by the end of file.
A [line ending](@line-ending) is, depending on the platform, a A [line ending](@line-ending) is a newline (`U+000A`), carriage return
newline (`U+000A`), carriage return (`U+000D`), or (`U+000D`), or carriage return + newline.
carriage return + newline.
For security reasons, a conforming parser must strip or replace the
Unicode character `U+0000`.
A line containing no characters, or a line containing only spaces A line containing no characters, or a line containing only spaces
(`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line).
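The line and blank-line definitions above can be sketched in a few lines of Python. This is only an illustration, not part of the spec: the names `split_lines` and `is_blank_line` are hypothetical, and the sketch assumes the whole input is already decoded into a string of characters.

```python
import re

# A line ending is a newline, a carriage return, or carriage return + newline.
# The two-character form is listed first so "\r\n" counts as ONE line ending.
LINE_ENDING = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into lines, dropping the line endings (hypothetical helper)."""
    lines = LINE_ENDING.split(text)
    # A trailing line ending terminates the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def is_blank_line(line):
    """A blank line has no characters, or only spaces (U+0020) and tabs (U+0009)."""
    return all(ch in " \t" for ch in line)
```

For example, `split_lines("a\r\nb\rc\nd")` yields four lines regardless of which line ending separates them.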
The following definitions of character classes will be used in this spec: The following definitions of character classes will be used in this spec:
A [whitespace character](@whitespace-character) is a space A [whitespace character](@whitespace-character) is a space
(`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`), (`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`),
form feed (`U+000C`), or carriage return (`U+000D`). form feed (`U+000C`), or carriage return (`U+000D`).
skipping to change at line 242 skipping to change at line 238
character]s. character]s.
A [unicode whitespace character](@unicode-whitespace-character) is A [unicode whitespace character](@unicode-whitespace-character) is
any code point in the unicode `Zs` class, or a tab (`U+0009`), any code point in the unicode `Zs` class, or a tab (`U+0009`),
carriage return (`U+000D`), newline (`U+000A`), or form feed carriage return (`U+000D`), newline (`U+000A`), or form feed
(`U+000C`). (`U+000C`).
[Unicode whitespace](@unicode-whitespace) is a sequence of one [Unicode whitespace](@unicode-whitespace) is a sequence of one
or more [unicode whitespace character]s. or more [unicode whitespace character]s.
A [non-space character](@non-space-character) is anything but `U+0020`. A [space](@space) is `U+0020`.
A [non-space character](@non-space-character) is any character
that is not a [whitespace character].
An [ASCII punctuation character](@ascii-punctuation-character) An [ASCII punctuation character](@ascii-punctuation-character)
is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`,
`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`.
A [punctuation character](@punctuation-character) is an [ASCII A [punctuation character](@punctuation-character) is an [ASCII
punctuation character] or anything in punctuation character] or anything in
the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
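As a rough illustration of the character classes defined above, the following Python sketch uses the standard `unicodedata` module to look up Unicode general categories. The function names are hypothetical, and each function assumes it is given a single character.

```python
import unicodedata

# Whitespace characters: space, tab, newline, line tabulation, form feed, CR.
WHITESPACE_CHARS = {" ", "\t", "\n", "\x0b", "\x0c", "\r"}

# The 32 ASCII punctuation characters listed in the definition above.
ASCII_PUNCTUATION = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")

# Unicode general categories that count as punctuation.
PUNCTUATION_CLASSES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_whitespace_char(ch):
    return ch in WHITESPACE_CHARS


def is_unicode_whitespace_char(ch):
    # Any code point in class Zs, or tab, carriage return, newline, form feed.
    return unicodedata.category(ch) == "Zs" or ch in {"\t", "\r", "\n", "\x0c"}


def is_punctuation_char(ch):
    return ch in ASCII_PUNCTUATION or unicodedata.category(ch) in PUNCTUATION_CLASSES
```

Note that the two whitespace classes differ: a no-break space (`U+00A0`) is a unicode whitespace character but not a whitespace character, while line tabulation (`U+000B`) is the reverse.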
## Tab expansion ## Preprocessing
Tabs in lines are expanded to spaces, with a tab stop of 4 characters: Tabs in lines are immediately expanded to [spaces][space], with a tab
stop of 4 characters:
. .
→foo→baz→→bim →foo→baz→→bim
. .
<pre><code>foo baz bim <pre><code>foo baz bim
</code></pre> </code></pre>
. .
. .
a→a a→a
ὐ→a ὐ→a
. .
<pre><code>a a <pre><code>a a
ὐ a ὐ a
</code></pre> </code></pre>
. .
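The tab expansion shown in the two examples above can be sketched as follows. This is a minimal illustration that assumes each character occupies one column; the name `expand_tabs` is hypothetical. Python's built-in `str.expandtabs(4)` gives the same result for a single line.

```python
def expand_tabs(line, tab_stop=4):
    """Replace each tab with spaces up to the next multiple of tab_stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Pad to the next tab stop: always between 1 and tab_stop spaces.
            width = tab_stop - (column % tab_stop)
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

Because columns are counted in characters rather than bytes, a non-ASCII character such as `ὐ` advances the column by one, just like `a`.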
## Insecure characters
For security reasons, the Unicode character `U+0000` must be replaced
with the replacement character (`U+FFFD`).
# Blocks and inlines # Blocks and inlines
We can think of a document as a sequence of We can think of a document as a sequence of
[blocks](@block)---structural [blocks](@block)---structural elements like paragraphs, block
elements like paragraphs, block quotations, quotations, lists, headers, rules, and code blocks. Some blocks (like
lists, headers, rules, and code blocks. Blocks can contain other block quotes and list items) contain other blocks; others (like
blocks, or they can contain [inline](@inline) content: headers and paragraphs) contain [inline](@inline) content---text,
words, spaces, links, emphasized text, images, and inline code. links, emphasized text, images, code, and so on.
## Precedence ## Precedence
Indicators of block structure always take precedence over indicators Indicators of block structure always take precedence over indicators
of inline structure. So, for example, the following is a list with of inline structure. So, for example, the following is a list with
two items, not a list with one item containing a code span: two items, not a list with one item containing a code span:
. .
- `one - `one
- two` - two`
skipping to change at line 531 skipping to change at line 536
</ul> </ul>
. .
## ATX headers ## ATX headers
An [ATX header](@atx-header) An [ATX header](@atx-header)
consists of a string of characters, parsed as inline content, between an consists of a string of characters, parsed as inline content, between an
opening sequence of 1--6 unescaped `#` characters and an optional opening sequence of 1--6 unescaped `#` characters and an optional
closing sequence of any number of `#` characters. The opening sequence closing sequence of any number of `#` characters. The opening sequence
of `#` characters cannot be followed directly by a of `#` characters cannot be followed directly by a
[non-space character]. [non-space character]. The optional closing sequence of `#`s must be
The optional closing sequence of `#`s must be preceded by a space and may be preceded by a [space] and may be followed by spaces only. The opening
followed by spaces only. The opening `#` character may be indented 0-3 `#` character may be indented 0-3 spaces. The raw contents of the
spaces. The raw contents of the header are stripped of leading and header are stripped of leading and trailing spaces before being parsed
trailing spaces before being parsed as inline content. The header level as inline content. The header level is equal to the number of `#`
is equal to the number of `#` characters in the opening sequence. characters in the opening sequence.
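The ATX header rule above might be approximated with two regular expressions, as in the Python sketch below. It is an illustration under stated assumptions, not a reference implementation: the name `parse_atx_header` is hypothetical, tabs are assumed to be expanded already, the line is assumed to carry no line ending, and the contents are returned raw rather than parsed as inlines.

```python
import re

# 0-3 spaces of indentation, 1-6 '#' characters, then either the end of the
# line (empty header) or at least one space followed by the raw contents.
ATX_OPENING = re.compile(r"^ {0,3}(#{1,6})(?: +(.*))?$")

# Optional closing sequence: '#'s preceded by a space (or making up the whole
# contents) and followed by spaces only.
ATX_CLOSING = re.compile(r"(?:^| +)#+ *$")


def parse_atx_header(line):
    """Return (level, raw_contents), or None if the line is not an ATX header."""
    match = ATX_OPENING.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    contents = ATX_CLOSING.sub("", match.group(2) or "")
    return level, contents.strip(" ")
```

With this sketch, `parse_atx_header("## foo ##")` gives `(2, "foo")`, while a line with seven `#` characters, or with no space after the opening sequence, gives `None`.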
Simple headers: Simple headers:
. .
# foo # foo
## foo ## foo
### foo ### foo
#### foo #### foo
##### foo ##### foo
###### foo ###### foo
skipping to change at line 564 skipping to change at line 569
. .
More than six `#` characters is not a header: More than six `#` characters is not a header:
. .
####### foo ####### foo
. .
<p>####### foo</p> <p>####### foo</p>
. .
A space is required between the `#` characters and the header's At least one space is required between the `#` characters and the
contents. Note that many implementations currently do not require header's contents, unless the header is empty. Note that many
the space. However, the space was required by the [original ATX implementations currently do not require the space. However, the
implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps space was required by the
prevent things like the following from being parsed as headers: [original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py),
and it helps prevent things like the following from being parsed as
headers:
. .
#5 bolt #5 bolt
#foobar
. .
<p>#5 bolt</p> <p>#5 bolt</p>
<p>#foobar</p>
. .
This is not a header, because the first `#` is escaped: This is not a header, because the first `#` is escaped:
. .
\## foo \## foo
. .
<p>## foo</p> <p>## foo</p>
. .
skipping to change at line 1027 skipping to change at line 1037
. .
a simple a simple
indented code block indented code block
. .
<pre><code>a simple <pre><code>a simple
indented code block indented code block
</code></pre> </code></pre>
. .
The contents are literal text, and do not get parsed as Markdown: If there is any ambiguity between an interpretation of indentation
as a code block and as indicating that material belongs to a [list
item][list items], the list item interpretation takes precedence:
.
- foo
bar
.
<ul>
<li>
<p>foo</p>
<p>bar</p>
</li>
</ul>
.
.
1. foo
- bar
.
<ol>
<li>
<p>foo</p>
<ul>
<li>bar</li>
</ul>
</li>
</ol>
.
The contents of a code block are literal text, and do not get parsed
as Markdown:
. .
<a/> <a/>
*hi* *hi*
- one - one
. .
<pre><code>&lt;a/&gt; <pre><code>&lt;a/&gt;
*hi* *hi*
skipping to change at line 2312 skipping to change at line 2355
baz baz
> foo > foo
. .
<blockquote> <blockquote>
<p>bar <p>bar
baz baz
foo</p> foo</p>
</blockquote> </blockquote>
. .
Laziness only applies to lines that are continuations of Laziness only applies to lines that would have been continuations of
paragraphs. Lines containing characters or indentation that indicate paragraphs had they been prepended with `>`. For example, the
block structure cannot be lazy. `>` cannot be omitted in the second line of
``` markdown
> foo
> ---
```
without changing the meaning:
. .
> foo > foo
--- ---
. .
<blockquote> <blockquote>
<p>foo</p> <p>foo</p>
</blockquote> </blockquote>
<hr /> <hr />
. .
Similarly, if we omit the `>` in the second line of
``` markdown
> - foo
> - bar
```
then the block quote ends after the first line:
. .
> - foo > - foo
- bar - bar
. .
<blockquote> <blockquote>
<ul> <ul>
<li>foo</li> <li>foo</li>
</ul> </ul>
</blockquote> </blockquote>
<ul> <ul>
<li>bar</li> <li>bar</li>
</ul> </ul>
. .
For the same reason, we can't omit the `>` in front of
subsequent lines of an indented or fenced code block:
. .
> foo > foo
bar bar
. .
<blockquote> <blockquote>
<pre><code>foo <pre><code>foo
</code></pre> </code></pre>
</blockquote> </blockquote>
<pre><code>bar <pre><code>bar
</code></pre> </code></pre>
skipping to change at line 3808 skipping to change at line 3870
List items need not be indented to the same level. The following List items need not be indented to the same level. The following
list items will be treated as items at the same list level, list items will be treated as items at the same list level,
since none is indented enough to belong to the previous list since none is indented enough to belong to the previous list
item: item:
. .
- a - a
- b - b
- c - c
- d - d
- e - e
- f - f
- g - g
- h
- i
. .
<ul> <ul>
<li>a</li> <li>a</li>
<li>b</li> <li>b</li>
<li>c</li> <li>c</li>
<li>d</li> <li>d</li>
<li>e</li> <li>e</li>
<li>f</li> <li>f</li>
<li>g</li> <li>g</li>
<li>h</li>
<li>i</li>
</ul> </ul>
. .
.
1. a
2. b
3. c
.
<ol>
<li>
<p>a</p>
</li>
<li>
<p>b</p>
</li>
<li>
<p>c</p>
</li>
</ol>
.
This is a loose list, because there is a blank line between This is a loose list, because there is a blank line between
two of the list items: two of the list items:
. .
- a - a
- b - b
- c - c
. .
<ul> <ul>
skipping to change at line 4247 skipping to change at line 4333
. .
&nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral; &nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral;
. .
<p>  &amp; © Æ Ď ¾ ℋ ⅆ ∲</p> <p>  &amp; © Æ Ď ¾ ℋ ⅆ ∲</p>
. .
[Decimal entities](@decimal-entities) [Decimal entities](@decimal-entities)
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
entities need to be recognised and transformed into their corresponding entities need to be recognised and transformed into their corresponding
unicode codepoints. Invalid unicode codepoints will be written as the unicode codepoints. Invalid unicode codepoints will be replaced by
"unknown codepoint" character (`0xFFFD`) the "unknown codepoint" character (`U+FFFD`). For security reasons,
the codepoint `U+0000` will also be replaced by `U+FFFD`.
. .
&#35; &#1234; &#992; &#98765432; &#35; &#1234; &#992; &#98765432; &#0;
. .
<p># Ӓ Ϡ �</p> <p># Ӓ Ϡ � �</p>
. .
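The decimal entity rule, including the `U+0000` replacement added in 0.20, can be sketched in Python as follows. The name `decode_decimal_entities` is hypothetical, and treating surrogate code points and values above `U+10FFFF` as invalid is an assumption of this sketch; the text above does not spell out which code points count as invalid.

```python
import re

REPLACEMENT_CHARACTER = "\ufffd"

# "&#" + a string of 1-8 arabic digits + ";"
DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")


def decode_decimal_entities(text):
    """Replace decimal entities with their code points (hypothetical helper)."""

    def replace(match):
        codepoint = int(match.group(1))
        invalid = (
            codepoint == 0                    # U+0000, replaced for security
            or codepoint > 0x10FFFF           # beyond the Unicode range
            or 0xD800 <= codepoint <= 0xDFFF  # surrogates (assumed invalid)
        )
        return REPLACEMENT_CHARACTER if invalid else chr(codepoint)

    return DECIMAL_ENTITY.sub(replace, text)
```

An entity with more than eight digits does not match the pattern and is left as literal text.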
[Hexadecimal entities](@hexadecimal-entities) [Hexadecimal entities](@hexadecimal-entities)
consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
+ `;`. They will also be parsed and turned into the corresponding + `;`. They will also be parsed and turned into the corresponding
unicode codepoints in the AST. unicode codepoints in the AST.
. .
&#X22; &#XD06; &#xcab; &#X22; &#XD06; &#xcab;
. .
skipping to change at line 5032 skipping to change at line 5119
__foo, __bar__, baz__ __foo, __bar__, baz__
. .
<p><strong>foo, <strong>bar</strong>, baz</strong></p> <p><strong>foo, <strong>bar</strong>, baz</strong></p>
. .
This is strong emphasis, even though the opening delimiter is This is strong emphasis, even though the opening delimiter is
both left- and right-flanking, because it is preceded by both left- and right-flanking, because it is preceded by
punctuation: punctuation:
. .
foo-_(bar)_ foo-__(bar)__
. .
<p>foo-<em>(bar)</em></p> <p>foo-<strong>(bar)</strong></p>
. .
Rule 7: Rule 7:
This is not strong emphasis, because the closing delimiter is preceded This is not strong emphasis, because the closing delimiter is preceded
by whitespace: by whitespace:
. .
**foo bar ** **foo bar **
. .
skipping to change at line 5145 skipping to change at line 5232
__foo__bar__baz__ __foo__bar__baz__
. .
<p><strong>foo__bar__baz</strong></p> <p><strong>foo__bar__baz</strong></p>
. .
This is strong emphasis, even though the closing delimiter is This is strong emphasis, even though the closing delimiter is
both left- and right-flanking, because it is followed by both left- and right-flanking, because it is followed by
punctuation: punctuation:
. .
_(bar)_. __(bar)__.
. .
<p><em>(bar)</em>.</p> <p><strong>(bar)</strong>.</p>
. .
Rule 9: Rule 9:
Any nonempty sequence of inline elements can be the contents of an Any nonempty sequence of inline elements can be the contents of an
emphasized span. emphasized span.
. .
*foo [bar](/url)* *foo [bar](/url)*
. .
skipping to change at line 6047 skipping to change at line 6134
There are three kinds of [reference link](@reference-link)s: There are three kinds of [reference link](@reference-link)s:
[full](#full-reference-link), [collapsed](#collapsed-reference-link), [full](#full-reference-link), [collapsed](#collapsed-reference-link),
and [shortcut](#shortcut-reference-link). and [shortcut](#shortcut-reference-link).
A [full reference link](@full-reference-link) A [full reference link](@full-reference-link)
consists of a [link text], optional [whitespace], and a [link label] consists of a [link text], optional [whitespace], and a [link label]
that [matches] a [link reference definition] elsewhere in the document. that [matches] a [link reference definition] elsewhere in the document.
A [link label](@link-label) begins with a left bracket (`[`) and ends A [link label](@link-label) begins with a left bracket (`[`) and ends
with the first right bracket (`]`) that is not backslash-escaped. with the first right bracket (`]`) that is not backslash-escaped.
Between these brackets there must be at least one non-[whitespace character].
Unescaped square bracket characters are not allowed in Unescaped square bracket characters are not allowed in
[link label]s. A link label can have at most 999 [link label]s. A link label can have at most 999
characters inside the square brackets. characters inside the square brackets.
One label [matches](@matches) One label [matches](@matches)
another just in case their normalized forms are equal. To normalize a another just in case their normalized forms are equal. To normalize a
label, perform the *unicode case fold* and collapse consecutive internal label, perform the *unicode case fold* and collapse consecutive internal
[whitespace] to a single space. If there are multiple [whitespace] to a single space. If there are multiple
matching reference link definitions, the one that comes first in the matching reference link definitions, the one that comes first in the
document is used. (It is desirable in such cases to emit a warning.) document is used. (It is desirable in such cases to emit a warning.)
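Label matching as described above can be illustrated with a short Python sketch. The names `normalize_label` and `ReferenceMap` are hypothetical. Stripping leading and trailing whitespace before collapsing is an assumption of this sketch, since the text only mentions internal whitespace. Python's `str.casefold` is used as the unicode case fold.

```python
import re

# Runs of whitespace characters: space, tab, newline, line tabulation,
# form feed, carriage return.
WHITESPACE_RUN = re.compile(r"[ \t\n\x0b\x0c\r]+")


def normalize_label(label):
    """Case-fold a label and collapse internal whitespace to a single space."""
    stripped = label.strip(" \t\n\x0b\x0c\r")  # assumption: edges are ignored
    return WHITESPACE_RUN.sub(" ", stripped).casefold()


class ReferenceMap:
    """Stores link reference definitions; the first matching definition wins."""

    def __init__(self):
        self._definitions = {}

    def define(self, label, destination):
        # setdefault keeps an earlier definition when a later one matches it.
        self._definitions.setdefault(normalize_label(label), destination)

    def lookup(self, label):
        return self._definitions.get(normalize_label(label))
```

A caller that wants to emit the suggested warning could check whether the normalized label is already present before calling `define`.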
skipping to change at line 6293 skipping to change at line 6381
. .
. .
[foo][ref\[] [foo][ref\[]
[ref\[]: /uri [ref\[]: /uri
. .
<p><a href="/uri">foo</a></p> <p><a href="/uri">foo</a></p>
. .
A [link label] must contain at least one non-[whitespace character]:
.
[]
[]: /uri
.
<p>[]</p>
<p>[]: /uri</p>
.
.
[
]
[
]: /uri
.
<p>[
]</p>
<p>[
]: /uri</p>
.
A [collapsed reference link](@collapsed-reference-link) A [collapsed reference link](@collapsed-reference-link)
consists of a [link label] that [matches] a consists of a [link label] that [matches] a
[link reference definition] elsewhere in the [link reference definition] elsewhere in the
document, optional [whitespace], and the string `[]`. document, optional [whitespace], and the string `[]`.
The contents of the first link label are parsed as inlines, The contents of the first link label are parsed as inlines,
which are used as the link's text. The link's URI and title are which are used as the link's text. The link's URI and title are
provided by the matching reference link definition. Thus, provided by the matching reference link definition. Thus,
`[foo][]` is equivalent to `[foo][foo]`. `[foo][]` is equivalent to `[foo][foo]`.
. .
 End of changes. 27 change blocks. 
42 lines changed or deleted 154 lines changed or added
