Diff: spec.txt - spec.txt

	spec.txt	spec.txt
	---	---
	title: CommonMark Spec	title: CommonMark Spec
	author: John MacFarlane	author: John MacFarlane

	version: 0.19	version: 0.20
	date: 2015-04-27	date: 2015-06-08
	license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'	license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
	...	...

	# Introduction	# Introduction

	## What is Markdown?	## What is Markdown?

	Markdown is a plain text format for writing structured documents,	Markdown is a plain text format for writing structured documents,
	based on conventions used for indicating formatting in email and	based on conventions used for indicating formatting in email and
	usenet posts. It was developed in 2004 by John Gruber, who wrote	usenet posts. It was developed in 2004 by John Gruber, who wrote

	skipping to change at line 215	skipping to change at line 215
	document.	document.

	A [character](@character) is a unicode code point.	A [character](@character) is a unicode code point.
	This spec does not specify an encoding; it thinks of lines as composed	This spec does not specify an encoding; it thinks of lines as composed
	of characters rather than bytes. A conforming parser may be limited	of characters rather than bytes. A conforming parser may be limited
	to a certain encoding.	to a certain encoding.

	A [line](@line) is a sequence of zero or more [character]s	A [line](@line) is a sequence of zero or more [character]s
	followed by a [line ending] or by the end of file.	followed by a [line ending] or by the end of file.


	A [line ending](@line-ending) is, depending on the platform, a	A [line ending](@line-ending) is a newline (`U+000A`), carriage return
	newline (`U+000A`), carriage return (`U+000D`), or	(`U+000D`), or carriage return + newline.
	carriage return + newline.

	For security reasons, a conforming parser must strip or replace the
	Unicode character `U+0000`.

	A line containing no characters, or a line containing only spaces	A line containing no characters, or a line containing only spaces
	(`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line).	(`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line).

	The following definitions of character classes will be used in this spec:	The following definitions of character classes will be used in this spec:

	A [whitespace character](@whitespace-character) is a space	A [whitespace character](@whitespace-character) is a space
	(`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`),	(`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`),
	form feed (`U+000C`), or carriage return (`U+000D`).	form feed (`U+000C`), or carriage return (`U+000D`).


	skipping to change at line 242	skipping to change at line 238
	character]s.	character]s.

	A [unicode whitespace character](@unicode-whitespace-character) is	A [unicode whitespace character](@unicode-whitespace-character) is
	any code point in the unicode `Zs` class, or a tab (`U+0009`),	any code point in the unicode `Zs` class, or a tab (`U+0009`),
	carriage return (`U+000D`), newline (`U+000A`), or form feed	carriage return (`U+000D`), newline (`U+000A`), or form feed
	(`U+000C`).	(`U+000C`).

	[Unicode whitespace](@unicode-whitespace) is a sequence of one	[Unicode whitespace](@unicode-whitespace) is a sequence of one
	or more [unicode whitespace character]s.	or more [unicode whitespace character]s.


	A [non-space character](@non-space-character) is anything but `U+0020`.	A [space](@space) is `U+0020`.

		A [non-space character](@non-space-character) is any character
		that is not a [whitespace character].
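
As an illustration of the character classes defined above, the following Python sketch encodes them as predicates. The function names are hypothetical and not part of the spec; `is_non_space_character` follows the right-hand (0.20) definition, and the `Zs` test relies on Python's standard `unicodedata` module.

```python
import unicodedata

# The six whitespace characters listed in the definition above.
WHITESPACE_CHARACTERS = {" ", "\t", "\n", "\x0b", "\x0c", "\r"}


def is_whitespace_character(ch: str) -> bool:
    """Space, tab, newline, line tabulation, form feed, or carriage return."""
    return ch in WHITESPACE_CHARACTERS


def is_unicode_whitespace_character(ch: str) -> bool:
    """Any code point in the Zs class, or tab, CR, newline, or form feed."""
    return unicodedata.category(ch) == "Zs" or ch in {"\t", "\r", "\n", "\x0c"}


def is_space(ch: str) -> bool:
    """A space is exactly U+0020."""
    return ch == " "


def is_non_space_character(ch: str) -> bool:
    """0.20 wording: any character that is not a whitespace character."""
    return not is_whitespace_character(ch)
```

Under the older left-hand wording ("anything but `U+0020`"), a tab would count as a non-space character; under the newer wording modelled here it does not.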

	An [ASCII punctuation character](@ascii-punctuation-character)	An [ASCII punctuation character](@ascii-punctuation-character)
	is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,	is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
	`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`,	`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`,
	`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `\|`, `}`, or `~`.	`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `\|`, `}`, or `~`.

	A [punctuation character](@punctuation-character) is an [ASCII	A [punctuation character](@punctuation-character) is an [ASCII
	punctuation character] or anything in	punctuation character] or anything in
	the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.	the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.


	## Tab expansion	## Preprocessing


	Tabs in lines are expanded to spaces, with a tab stop of 4 characters:	Tabs in lines are immediately expanded to [spaces][space], with a tab
		stop of 4 characters:

	.	.
	→foo→baz→→bim	→foo→baz→→bim
	.	.
	<pre><code>foo baz bim	<pre><code>foo baz bim
	</code></pre>	</code></pre>
	.	.

	.	.
	a→a	a→a
	ὐ→a	ὐ→a
	.	.
	<pre><code>a a	<pre><code>a a
	ὐ a	ὐ a
	</code></pre>	</code></pre>
	.	.
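
The two tab examples above can be reproduced with a short Python sketch. It assumes columns are counted in code points (which matches the second example, where `ὐ` occupies one column); the name `expand_tabs` is illustrative only.

```python
TAB_STOP = 4  # tab stop stated in the text above


def expand_tabs(line: str, tab_stop: int = TAB_STOP) -> str:
    """Replace each tab with spaces up to the next multiple of tab_stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Pad to the next tab stop; a tab always yields at least one space.
            width = tab_stop - (column % tab_stop)
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

For example, `expand_tabs("a\ta")` returns `"a   a"`: the tab starts at column 1 and pads to column 4.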


		## Insecure characters

		For security reasons, the Unicode character `U+0000` must be replaced
		with the replacement character (`U+FFFD`).

	# Blocks and inlines	# Blocks and inlines

	We can think of a document as a sequence of	We can think of a document as a sequence of

	[blocks](@block)---structural	[blocks](@block)---structural elements like paragraphs, block
	elements like paragraphs, block quotations,	quotations, lists, headers, rules, and code blocks. Some blocks (like
	lists, headers, rules, and code blocks. Blocks can contain other	block quotes and list items) contain other blocks; others (like
	blocks, or they can contain [inline](@inline) content:	headers and paragraphs) contain [inline](@inline) content---text,
	words, spaces, links, emphasized text, images, and inline code.	links, emphasized text, images, code, and so on.

	## Precedence	## Precedence

	Indicators of block structure always take precedence over indicators	Indicators of block structure always take precedence over indicators
	of inline structure. So, for example, the following is a list with	of inline structure. So, for example, the following is a list with
	two items, not a list with one item containing a code span:	two items, not a list with one item containing a code span:

	.	.
	- `one	- `one
	- two`	- two`

	skipping to change at line 531	skipping to change at line 536
	</ul>	</ul>
	.	.

	## ATX headers	## ATX headers

	An [ATX header](@atx-header)	An [ATX header](@atx-header)
	consists of a string of characters, parsed as inline content, between an	consists of a string of characters, parsed as inline content, between an
	opening sequence of 1--6 unescaped `#` characters and an optional	opening sequence of 1--6 unescaped `#` characters and an optional
	closing sequence of any number of `#` characters. The opening sequence	closing sequence of any number of `#` characters. The opening sequence
	of `#` characters cannot be followed directly by a	of `#` characters cannot be followed directly by a

	[non-space character].	[non-space character]. The optional closing sequence of `#`s must be
	The optional closing sequence of `#`s must be preceded by a space and may be	preceded by a [space] and may be followed by spaces only. The opening
	followed by spaces only. The opening `#` character may be indented 0-3	`#` character may be indented 0-3 spaces. The raw contents of the
	spaces. The raw contents of the header are stripped of leading and	header are stripped of leading and trailing spaces before being parsed
	trailing spaces before being parsed as inline content. The header level	as inline content. The header level is equal to the number of `#`
	is equal to the number of `#` characters in the opening sequence.	characters in the opening sequence.
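
The ATX header rules in this paragraph can be approximated with two regular expressions. This is a simplified Python sketch, not the reference implementation: it assumes tabs have already been expanded, that the line carries no trailing line ending, and that no container block (block quote, list item) is involved. The name `parse_atx_header` is hypothetical.

```python
import re

# 0-3 spaces of indentation, 1-6 '#', then at least one space or end of line.
_OPENING = re.compile(r"^ {0,3}(#{1,6})(?: +|$)(.*)$")
# Optional closing run of '#', preceded by a space (or forming the whole
# content) and followed by spaces only.
_CLOSING = re.compile(r"(?:^| +)#+ *$")


def parse_atx_header(line: str):
    """Return (level, raw_content) for an ATX header line, else None."""
    m = _OPENING.match(line)
    if m is None:
        return None
    level = len(m.group(1))
    raw = _CLOSING.sub("", m.group(2))
    # Leading and trailing spaces are stripped before inline parsing.
    return level, raw.strip(" ")
```

With this sketch, `parse_atx_header("## foo ##")` yields `(2, "foo")`, while `#5 bolt`, `#foobar`, `####### foo`, and `\## foo` all yield `None`. A closing `#` that is backslash-escaped is left in the content because it is not directly preceded by a space.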

	Simple headers:	Simple headers:

	.	.
	# foo	# foo
	## foo	## foo
	### foo	### foo
	#### foo	#### foo
	##### foo	##### foo
	###### foo	###### foo

	skipping to change at line 564	skipping to change at line 569
	.	.

	More than six `#` characters is not a header:	More than six `#` characters is not a header:

	.	.
	####### foo	####### foo
	.	.
	<p>####### foo</p>	<p>####### foo</p>
	.	.


	A space is required between the `#` characters and the header's	At least one space is required between the `#` characters and the
	contents. Note that many implementations currently do not require	header's contents, unless the header is empty. Note that many
	the space. However, the space was required by the [original ATX	implementations currently do not require the space. However, the
	implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps	space was required by the
	prevent things like the following from being parsed as headers:	[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py),
		and it helps prevent things like the following from being parsed as
		headers:

	.	.
	#5 bolt	#5 bolt


		#foobar
	.	.
	<p>#5 bolt</p>	<p>#5 bolt</p>

		<p>#foobar</p>
	.	.

	This is not a header, because the first `#` is escaped:	This is not a header, because the first `#` is escaped:

	.	.
	\## foo	\## foo
	.	.
	<p>## foo</p>	<p>## foo</p>
	.	.


	skipping to change at line 1027	skipping to change at line 1037

	.	.
	a simple	a simple
	indented code block	indented code block
	.	.
	<pre><code>a simple	<pre><code>a simple
	indented code block	indented code block
	</code></pre>	</code></pre>
	.	.


	The contents are literal text, and do not get parsed as Markdown:	If there is any ambiguity between an interpretation of indentation
		as a code block and as indicating that material belongs to a [list
		item][list items], the list item interpretation takes precedence:

		.
		- foo

		bar
		.
		<ul>
		<li>
		<p>foo</p>
		<p>bar</p>
		</li>
		</ul>
		.

		.
		1. foo

		- bar
		.
		<ol>
		<li>
		<p>foo</p>
		<ul>
		<li>bar</li>
		</ul>
		</li>
		</ol>
		.

		The contents of a code block are literal text, and do not get parsed
		as Markdown:

	.	.
	<a/>	<a/>
	hi	hi

	- one	- one
	.	.
	<pre><code><a/>	<pre><code><a/>
	hi	hi


	skipping to change at line 2312	skipping to change at line 2355
	baz	baz
	> foo	> foo
	.	.
	<blockquote>	<blockquote>
	<p>bar	<p>bar
	baz	baz
	foo</p>	foo</p>
	</blockquote>	</blockquote>
	.	.


	Laziness only applies to lines that are continuations of	Laziness only applies to lines that would have been continuations of
	paragraphs. Lines containing characters or indentation that indicate	paragraphs had they been prepended with `>`. For example, the
	block structure cannot be lazy.	`>` cannot be omitted in the second line of

		``` markdown
		> foo
		> ---
		```

		without changing the meaning:

	.	.
	> foo	> foo
	---	---
	.	.
	<blockquote>	<blockquote>
	<p>foo</p>	<p>foo</p>
	</blockquote>	</blockquote>
	<hr />	<hr />
	.	.


		Similarly, if we omit the `>` in the second line of

		``` markdown
		> - foo
		> - bar
		```

		then the block quote ends after the first line:

	.	.
	> - foo	> - foo
	- bar	- bar
	.	.
	<blockquote>	<blockquote>
	<ul>	<ul>
	<li>foo</li>	<li>foo</li>
	</ul>	</ul>
	</blockquote>	</blockquote>
	<ul>	<ul>
	<li>bar</li>	<li>bar</li>
	</ul>	</ul>
	.	.


		For the same reason, we can't omit the `>` in front of
		subsequent lines of an indented or fenced code block:

	.	.
	> foo	> foo
	bar	bar
	.	.
	<blockquote>	<blockquote>
	<pre><code>foo	<pre><code>foo
	</code></pre>	</code></pre>
	</blockquote>	</blockquote>
	<pre><code>bar	<pre><code>bar
	</code></pre>	</code></pre>

	skipping to change at line 3808	skipping to change at line 3870
	List items need not be indented to the same level. The following	List items need not be indented to the same level. The following
	list items will be treated as items at the same list level,	list items will be treated as items at the same list level,
	since none is indented enough to belong to the previous list	since none is indented enough to belong to the previous list
	item:	item:

	.	.
	- a	- a
	- b	- b
	- c	- c
	- d	- d

	- e	- e
	- f	- f
	- g	- g
		- h
		- i
	.	.
	<ul>	<ul>
	<li>a</li>	<li>a</li>
	<li>b</li>	<li>b</li>
	<li>c</li>	<li>c</li>
	<li>d</li>	<li>d</li>
	<li>e</li>	<li>e</li>
	<li>f</li>	<li>f</li>
	<li>g</li>	<li>g</li>

		<li>h</li>
		<li>i</li>
	</ul>	</ul>
	.	.


		.
		1. a

		2. b

		3. c
		.
		<ol>
		<li>
		<p>a</p>
		</li>
		<li>
		<p>b</p>
		</li>
		<li>
		<p>c</p>
		</li>
		</ol>
		.

	This is a loose list, because there is a blank line between	This is a loose list, because there is a blank line between
	two of the list items:	two of the list items:

	.	.
	- a	- a
	- b	- b

	- c	- c
	.	.
	<ul>	<ul>

	skipping to change at line 4247	skipping to change at line 4333

	.	.
	& © Æ &Dcaron; ¾ &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral;	& © Æ &Dcaron; ¾ &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral;
	.	.
	<p> & © Æ Ď ¾ ℋ ⅆ ∲</p>	<p> & © Æ Ď ¾ ℋ ⅆ ∲</p>
	.	.

	[Decimal entities](@decimal-entities)	[Decimal entities](@decimal-entities)
	consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these	consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
	entities need to be recognised and transformed into their corresponding	entities need to be recognised and transformed into their corresponding

	unicode codepoints. Invalid unicode codepoints will be written as the	unicode codepoints. Invalid unicode codepoints will be replaced by
	"unknown codepoint" character (`0xFFFD`)	the "unknown codepoint" character (`U+FFFD`). For security reasons,
		the codepoint `U+0000` will also be replaced by `U+FFFD`.

	.	.

	# Ӓ Ϡ &#98765432;	# Ӓ Ϡ &#98765432;
	.	.

	<p># Ӓ Ϡ �</p>	<p># Ӓ Ϡ � �</p>
	.	.
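
The decimal-entity rule in the right-hand column can be sketched in Python as follows. Treating surrogate code points (`U+D800` to `U+DFFF`) and values above `U+10FFFF` as "invalid" is an assumption of this sketch; the text above only says that invalid code points and `U+0000` become `U+FFFD`. The function name is hypothetical.

```python
import re

# "&#" + 1-8 arabic (ASCII) digits + ";"
_DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")
REPLACEMENT = "\ufffd"


def _decode(match: re.Match) -> str:
    code_point = int(match.group(1))
    # Assumption: "invalid" means zero, out of Unicode range, or a surrogate.
    if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return REPLACEMENT
    return chr(code_point)


def decode_decimal_entities(text: str) -> str:
    """Replace every decimal entity in text with its code point."""
    return _DECIMAL_ENTITY.sub(_decode, text)
```

A reference with more than eight digits does not match the pattern and is left as literal text.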

	[Hexadecimal entities](@hexadecimal-entities)	[Hexadecimal entities](@hexadecimal-entities)
	consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits	consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
	+ `;`. They will also be parsed and turned into the corresponding	+ `;`. They will also be parsed and turned into the corresponding
	unicode codepoints in the AST.	unicode codepoints in the AST.

	.	.
	&#X22; &#XD06; ಫ	&#X22; &#XD06; ಫ
	.	.

	skipping to change at line 5032	skipping to change at line 5119
	__foo, __bar__, baz__	__foo, __bar__, baz__
	.	.
	<p><strong>foo, <strong>bar</strong>, baz</strong></p>	<p><strong>foo, <strong>bar</strong>, baz</strong></p>
	.	.

	This is strong emphasis, even though the opening delimiter is	This is strong emphasis, even though the opening delimiter is
	both left- and right-flanking, because it is preceded by	both left- and right-flanking, because it is preceded by
	punctuation:	punctuation:

	.	.

	foo-_(bar)_	foo-__(bar)__
	.	.

	<p>foo-<em>(bar)</em></p>	<p>foo-<strong>(bar)</strong></p>
	.	.

	Rule 7:	Rule 7:

	This is not strong emphasis, because the closing delimiter is preceded	This is not strong emphasis, because the closing delimiter is preceded
	by whitespace:	by whitespace:

	.	.
	**foo bar **	**foo bar **
	.	.

	skipping to change at line 5145	skipping to change at line 5232
	__foo__bar__baz__	__foo__bar__baz__
	.	.
	<p><strong>foo__bar__baz</strong></p>	<p><strong>foo__bar__baz</strong></p>
	.	.

	This is strong emphasis, even though the closing delimiter is	This is strong emphasis, even though the closing delimiter is
	both left- and right-flanking, because it is followed by	both left- and right-flanking, because it is followed by
	punctuation:	punctuation:

	.	.

	_(bar)_.	__(bar)__.
	.	.

	<p><em>(bar)</em>.</p>	<p><strong>(bar)</strong>.</p>
	.	.

	Rule 9:	Rule 9:

	Any nonempty sequence of inline elements can be the contents of an	Any nonempty sequence of inline elements can be the contents of an
	emphasized span.	emphasized span.

	.	.
	foo [bar](/url)	foo [bar](/url)
	.	.

	skipping to change at line 6047	skipping to change at line 6134
	There are three kinds of [reference link](@reference-link)s:	There are three kinds of [reference link](@reference-link)s:
	[full](#full-reference-link), [collapsed](#collapsed-reference-link),	[full](#full-reference-link), [collapsed](#collapsed-reference-link),
	and [shortcut](#shortcut-reference-link).	and [shortcut](#shortcut-reference-link).

	A [full reference link](@full-reference-link)	A [full reference link](@full-reference-link)
	consists of a [link text], optional [whitespace], and a [link label]	consists of a [link text], optional [whitespace], and a [link label]
	that [matches] a [link reference definition] elsewhere in the document.	that [matches] a [link reference definition] elsewhere in the document.

	A [link label](@link-label) begins with a left bracket (`[`) and ends	A [link label](@link-label) begins with a left bracket (`[`) and ends
	with the first right bracket (`]`) that is not backslash-escaped.	with the first right bracket (`]`) that is not backslash-escaped.

		Between these brackets there must be at least one non-[whitespace character].
	Unescaped square bracket characters are not allowed in	Unescaped square bracket characters are not allowed in
	[link label]s. A link label can have at most 999	[link label]s. A link label can have at most 999
	characters inside the square brackets.	characters inside the square brackets.

	One label [matches](@matches)	One label [matches](@matches)
	another just in case their normalized forms are equal. To normalize a	another just in case their normalized forms are equal. To normalize a
	label, perform the unicode case fold and collapse consecutive internal	label, perform the unicode case fold and collapse consecutive internal
	[whitespace] to a single space. If there are multiple	[whitespace] to a single space. If there are multiple
	matching reference link definitions, the one that comes first in the	matching reference link definitions, the one that comes first in the
	document is used. (It is desirable in such cases to emit a warning.)	document is used. (It is desirable in such cases to emit a warning.)
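
Label matching as described here (case fold, collapse internal whitespace, first definition wins) might look like the Python sketch below. The helper names and the dictionary-based store are assumptions for illustration. The sketch only collapses whitespace runs; whether leading or trailing whitespace should also be trimmed is not addressed by this passage, so it is left alone.

```python
import re

# Runs of the spec's whitespace characters: space, tab, newline,
# line tabulation, form feed, carriage return.
_WHITESPACE_RUN = re.compile(r"[ \t\n\x0b\x0c\r]+")


def normalize_label(label: str) -> str:
    """Unicode case fold, then collapse whitespace runs to a single space."""
    return _WHITESPACE_RUN.sub(" ", label.casefold())


def labels_match(a: str, b: str) -> bool:
    return normalize_label(a) == normalize_label(b)


def add_definition(definitions: dict, label: str, destination: str) -> bool:
    """Store a definition unless an earlier one matches; return True if stored."""
    key = normalize_label(label)
    if key in definitions:
        return False  # first definition wins; a caller could emit a warning here
    definitions[key] = destination
    return True
```

For example, after `add_definition(defs, "Foo", "/first")`, a later `add_definition(defs, "FOO", "/second")` returns `False` and leaves `/first` in place.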

	skipping to change at line 6293	skipping to change at line 6381
	.	.

	.	.
	[foo][ref\[]	[foo][ref\[]

	[ref\[]: /uri	[ref\[]: /uri
	.	.
	<p><a href="/uri">foo</a></p>	<p><a href="/uri">foo</a></p>
	.	.


		A [link label] must contain at least one non-[whitespace character]:

		.
		[]

		[]: /uri
		.
		<p>[]</p>
		<p>[]: /uri</p>
		.

		.
		[
		]

		[
		]: /uri
		.
		<p>[
		]</p>
		<p>[
		]: /uri</p>
		.

	A [collapsed reference link](@collapsed-reference-link)	A [collapsed reference link](@collapsed-reference-link)
	consists of a [link label] that [matches] a	consists of a [link label] that [matches] a
	[link reference definition] elsewhere in the	[link reference definition] elsewhere in the
	document, optional [whitespace], and the string `[]`.	document, optional [whitespace], and the string `[]`.
	The contents of the first link label are parsed as inlines,	The contents of the first link label are parsed as inlines,
	which are used as the link's text. The link's URI and title are	which are used as the link's text. The link's URI and title are
	provided by the matching reference link definition. Thus,	provided by the matching reference link definition. Thus,
	`[foo][]` is equivalent to `[foo][foo]`.	`[foo][]` is equivalent to `[foo][foo]`.

	.	.

End of changes. 27 change blocks.
	42 lines changed or deleted	154 lines changed or added
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/