Diff: spec.txt - spec.txt
spec.txt   spec.txt
--- ---
title: CommonMark Spec title: CommonMark Spec
author: John MacFarlane author: John MacFarlane
version: 0.2 version: 0.21
date: 2015-0 date: 2015-07-14
... ...
# Introduction # Introduction
## What is Markdown? ## What is Markdown?
Markdown is a plain text format for writing structured documents, Markdown is a plain text format for writing structured documents,
based on conventions used for indicating formatting in email and based on conventions used for indicating formatting in email and
usenet posts. It was developed in 2004 by John Gruber, who wrote usenet posts. It was developed in 2004 by John Gruber, who wrote
skipping to change at line 240 skipping to change at line 240
A [unicode whitespace character](@unicode-whitespace-character) is A [unicode whitespace character](@unicode-whitespace-character) is
any code point in the unicode Zs class, or a tab (U+0009), any code point in the unicode Zs class, or a tab (U+0009),
carriage return (U+000D), newline (U+000A), or form feed carriage return (U+000D), newline (U+000A), or form feed
(U+000C). (U+000C).
[Unicode whitespace](@unicode-whitespace) is a sequence of one [Unicode whitespace](@unicode-whitespace) is a sequence of one
or more [unicode whitespace character]s. or more [unicode whitespace character]s.
A [space](@space) is U+0020. A [space](@space) is U+0020.
A [non-space character](@non-space-character) is any character A [non-whitespace character](@non-space-character) is any character
that is not a [whitespace character]. that is not a [whitespace character].
An [ASCII punctuation character](@ascii-punctuation-character) An [ASCII punctuation character](@ascii-punctuation-character)
is !, ", #, $, %, &, ', (, ), is !, ", #, $, %, &, ', (, ),
*, +, ,, -, ., /, :, ;, <, =, >, ?, @, *, +, ,, -, ., /, :, ;, <, =, >, ?, @,
[, \, ], ^, _, `, {, |, }, or ~. [, \, ], ^, _, `, {, |, }, or ~.
A [punctuation character](@punctuation-character) is an [ASCII A [punctuation character](@punctuation-character) is an [ASCII
punctuation character] or anything in punctuation character] or anything in
the unicode classes Pc, Pd, Pe, Pf, Pi, Po, or Ps. the unicode classes Pc, Pd, Pe, Pf, Pi, Po, or Ps.
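The character classes defined above can be sketched as simple predicates. This is only an illustration, not part of the spec: the function names are hypothetical, and the sketch assumes that Python's `unicodedata` categories and `string.punctuation` (the same 32 ASCII characters listed above) are an acceptable stand-in for the definitions.

```python
import string
import unicodedata

# Unicode general categories named in the definition of "punctuation character".
PUNCT_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_unicode_whitespace_char(ch: str) -> bool:
    """True for any Zs code point, or tab, carriage return, newline, form feed."""
    return unicodedata.category(ch) == "Zs" or ch in "\t\r\n\f"


def is_ascii_punctuation(ch: str) -> bool:
    """True for one of the 32 ASCII punctuation characters (single character assumed)."""
    return len(ch) == 1 and ch in string.punctuation


def is_punctuation(ch: str) -> bool:
    """True for ASCII punctuation or any character in a listed Unicode P* class."""
    return is_ascii_punctuation(ch) or unicodedata.category(ch) in PUNCT_CATEGORIES
```

The ASCII check is needed separately because characters such as `$`, `+`, and `~` are symbols rather than punctuation in Unicode terms, yet the definition above counts them as punctuation.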
## ## Tabs
Tabs in lines are Tabs in lines are not expanded to [spaces][space]. However,
in contexts where indentation is significant for the
document's structure, tabs behave as if they were replaced
by spaces with a tab stop of 4 characters.
. .
→foo→baz→→bim →foo→baz→→bim
. .
<pre><code>foobim <pre><code>foo→baz→→bim
</code></pre>
.
.
→foo→baz→→bim
.
<pre><code>foo→baz→→bim
</code></pre> </code></pre>
. .
. .
a→a a→a
ὐ→a ὐ→a
. .
<pre><code>aa <pre><code>aa
a a
</code></pre> </code></pre>
. .
.
- foo
→bar
.
<ul>
<li>
<p>foo</p>
<p>bar</p>
</li>
</ul>
.
.
>→foo→bar
.
<blockquote>
<p>foo→bar</p>
</blockquote>
.
## Insecure characters ## Insecure characters
For security reasons, the Unicode character U+0000 must be replaced For security reasons, the Unicode character U+0000 must be replaced
with the replacement character (U+FFFD). with the replacement character (U+FFFD).
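As an illustration, the replacement can be done in one pass over the input before any parsing. The function name is hypothetical.

```python
def replace_insecure_characters(text: str) -> str:
    """Replace every U+0000 with the replacement character U+FFFD."""
    return text.replace("\u0000", "\ufffd")
```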
# Blocks and inlines # Blocks and inlines
We can think of a document as a sequence of We can think of a document as a sequence of
[blocks](@block)---structural elements like paragraphs, block [blocks](@block)---structural elements like paragraphs, block
quotations, lists, headers, rules, and code blocks. Some blocks (like quotations, lists, headers, rules, and code blocks. Some blocks (like
skipping to change at line 446 skipping to change at line 476
a------ a------
---a--- ---a---
. .
<p>_ _ _ _ a</p> <p>_ _ _ _ a</p>
<p>a------</p> <p>a------</p>
<p>---a---</p> <p>---a---</p>
. .
It is required that all of the [non-space character]s be the same. It is required that all of the [non-whitespace character]s be the same.
So, this is not a horizontal rule: So, this is not a horizontal rule:
. .
*-* *-*
. .
<p><em>-</em></p> <p><em>-</em></p>
. .
Horizontal rules do not need blank lines before or after: Horizontal rules do not need blank lines before or after:
skipping to change at line 536 skipping to change at line 566
</ul> </ul>
. .
consists of a string of characters, parsed as inline content, between an consists of a string of characters, parsed as inline content, between an
opening sequence of 1--6 unescaped # characters and an optional opening sequence of 1--6 unescaped # characters and an optional
closing sequence of any number of # characters. The opening sequence closing sequence of any number of # characters. The opening sequence
of # characters cannot be followed directly by a of # characters cannot be followed directly by a
[non-space character]. The optional closing sequence of #s must be [non-whitespace character]. The optional closing sequence of #s must be
preceded by a [space] and may be followed by spaces only. The opening preceded by a [space] and may be followed by spaces only. The opening
# character may be indented 0-3 spaces. The raw contents of the # character may be indented 0-3 spaces. The raw contents of the
as inline content. The header level is equal to the number of # as inline content. The header level is equal to the number of #
characters in the opening sequence. characters in the opening sequence.
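The rule above can be approximated with two regular expressions, one for the opening sequence and one for the optional closing sequence. This is a rough sketch under stated assumptions, not a conforming parser: `parse_atx_header` is a hypothetical name, the raw contents are returned without inline parsing, and interaction with other block types is ignored.

```python
import re
from typing import Optional, Tuple

# 0-3 spaces of indentation, 1-6 '#' characters, then whitespace or end of line.
ATX_OPEN = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+|$)(.*)$")
# Optional closing sequence: '#' characters preceded by a space (or making up
# the whole content) and followed by spaces only.
ATX_CLOSE = re.compile(r"(?:^|[ ]+)#+[ ]*$")


def parse_atx_header(line: str) -> Optional[Tuple[int, str]]:
    """Return (level, raw contents) for an ATX header line, else None."""
    match = ATX_OPEN.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    contents = ATX_CLOSE.sub("", match.group(2)).strip()
    return level, contents
```

For example, `parse_atx_header("### foo ### b")` keeps `### b` in the contents, because the `#` run is followed by a non-whitespace character and so is not a closing sequence.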
. .
# foo # foo
skipping to change at line 668 skipping to change at line 698
Spaces are allowed after the closing sequence: Spaces are allowed after the closing sequence:
. .
### foo ### ### foo ###
. .
<h3>foo</h3> <h3>foo</h3>
. .
A sequence of # characters with a A sequence of # characters with a
[non-space character] following it [non-whitespace character] following it
is not a closing sequence, but counts as part of the contents of the is not a closing sequence, but counts as part of the contents of the
. .
### foo ### b ### foo ### b
. .
<h3>foo ### b</h3> <h3>foo ### b</h3>
. .
The closing sequence must be preceded by a space: The closing sequence must be preceded by a space:
skipping to change at line 737 skipping to change at line 767
### ### ### ###
. .
<h2></h2> <h2></h2>
<h1></h1> <h1></h1>
<h3></h3> <h3></h3>
. .
consists of a line of text, containing at least one [non-space character], consists of a line of text, containing at least one [non-whitespace character],
with no more than 3 spaces indentation, followed by a [setext header with no more than 3 spaces indentation, followed by a [setext header
underline]. The line of text must be underline]. The line of text must be
one that, were it not followed by the setext header underline, one that, were it not followed by the setext header underline,
would be interpreted as part of a paragraph: it cannot be would be interpreted as part of a paragraph: it cannot be
[block quote][block quotes], [horizontal rule][horizontal rules], [block quote][block quotes], [horizontal rule][horizontal rules],
[list item][list items], or [HTML block][HTML blocks]. [list item][list items], or [HTML block][HTML blocks].
= characters or a sequence of - characters, with no more than 3 = characters or a sequence of - characters, with no more than 3
skipping to change at line 1312 skipping to change at line 1342
~~~~ ~~~~
aaa aaa
~~~ ~~~
~~~~ ~~~~
. .
<pre><code>aaa <pre><code>aaa
~~~ ~~~
</code></pre> </code></pre>
. .
Unclosed code blocks are closed by the end of the document Unclosed code blocks are closed by the end of the document
(or the enclosing [block quote] or [list item]):
. .
 
. .
<pre><code></code></pre> <pre><code></code></pre>
. .
. .
 
 
aaa aaa
. .
<pre><code> <pre><code>
 
aaa aaa
</code></pre> </code></pre>
. .
.
> 
> aaa
bbb
.
<blockquote>
<pre><code>aaa
</code></pre>
</blockquote>
<p>bbb</p>
.
A code block can have all empty lines as its content: A code block can have all empty lines as its content:
. .
 
 
. .
<pre><code> <pre><code>
</code></pre> </code></pre>
skipping to change at line 1554 skipping to change at line 1598
 
 aaa  aaa
 
. .
<pre><code> aaa <pre><code> aaa
</code></pre> </code></pre>
. .
## HTML blocks ## HTML blocks
An [HTML block An [HTML block](@html-block) is a group of lines that is treated
a as raw HTML (and will not be escaped in HTML output).
There are seven kinds of [HTML block], which can be defined
by their start and end conditions. The block begins with a line that
meets a [start condition](@start-condition) (after up to three spaces
optional indentation). It ends with the first subsequent line that
meets a matching [end condition](@end-condition), or the last line of
the document, if no line is encountered that meets the
[end condition]. If the first line meets both the [start condition]
. and the [end condition], the block will contain just that line.
1. **Start condition:** line begins with the string <script,
<pre, or <style (case-insensitive), followed by whitespace,
the string >, or the end of the line.\
**End condition:** line contains an end tag
</script>, </pre>, or </style> (case-insensitive; it
need not match the start tag).
2. **Start condition:** line begins with the string <!--.\
**End condition:** line contains the string -->.
3. **Start condition:** line begins with the string <?.\
**End condition:** line contains the string ?>.
4. **Start condition:** line begins with the string <!
followed by an uppercase ASCII letter.\
**End condition:** line contains the character >.
5. **Start condition:** line begins with the string
<![CDATA[.\
**End condition:** line contains the string ]]>.
6. **Start condition:** line begins with the string < or </
followed by one of the strings (case-insensitive) address,
article, aside, base, basefont, blockquote, body,
caption, center, col, colgroup, dd, details, dialog,
dir, div, dl, dt, fieldset, figcaption, figure,
footer, form, frame, frameset, h1, head, header, hr,
html, legend, li, link, main, menu, menuitem, meta,
nav, noframes, ol, optgroup, option, p, param, pre,
section, source, summary, table, tbody, td,
tfoot, th, thead, title, tr, track, ul, followed
by [whitespace], the end of the line, the string >, or
the string />.\
**End condition:** line is followed by a [blank line].
7. **Start condition:** line begins with an [open tag]
(with any [tag name]) followed only by [whitespace] or the end
of the line.\
**End condition:** line is followed by a [blank line].
All types of [HTML blocks] except type 7 may interrupt
a paragraph. Blocks of type 7 may not interrupt a paragraph.
(This restriction is intended to prevent unwanted interpretation
of long tags inside a wrapped paragraph as starting HTML blocks.)
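The start conditions for types 1 to 6 can be sketched as an ordered list of patterns tried against the line after removing up to three spaces of indentation. This is an illustrative sketch only: the names are hypothetical, type 7 is left out because it depends on the [open tag] grammar defined elsewhere, and end conditions and paragraph interruption are not modelled.

```python
import re
from typing import Optional

# Tag names listed for start condition 6 (duplicates removed).
BLOCK_TAGS = (
    "address article aside base basefont blockquote body caption center col "
    "colgroup dd details dialog dir div dl dt fieldset figcaption figure "
    "footer form frame frameset h1 head header hr html legend li link main "
    "menu menuitem meta nav noframes ol optgroup option p param pre section "
    "source title summary table tbody td tfoot th thead tr track ul"
).split()

# (type, pattern) pairs, tried in order against the de-indented line.
START_CONDITIONS = [
    (1, re.compile(r"<(?:script|pre|style)(?:\s|>|$)", re.IGNORECASE)),
    (2, re.compile(r"<!--")),
    (3, re.compile(r"<\?")),
    (4, re.compile(r"<![A-Z]")),
    (5, re.compile(r"<!\[CDATA\[")),
    (6, re.compile(r"</?(?:%s)(?:\s|/?>|$)" % "|".join(BLOCK_TAGS), re.IGNORECASE)),
]


def html_block_start_type(line: str) -> Optional[int]:
    """Return the HTML block type (1-6) whose start condition the line meets."""
    stripped = line.rstrip("\r\n")
    indent = len(stripped) - len(stripped.lstrip(" "))
    if indent > 3:
        return None  # four or more spaces: an indented code block instead
    rest = stripped[indent:]
    for kind, pattern in START_CONDITIONS:
        if pattern.match(rest):
            return kind
    return None
```

Trying the patterns in order matters for `<pre`, which appears both in condition 1 and in the tag list of condition 6: an opening `<pre` is classified as type 1, while `</pre` can only match type 6.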
Some simple examples follow. Here are some basic HTML blocks
of type 6:
. .
<table> <table>
<tr> <tr>
<td> <td>
hi hi
</td> </td>
</tr> </tr>
</table> </table>
skipping to change at line 1607 skipping to change at line 1689
. .
<div> <div>
*hello* *hello*
<foo><a> <foo><a>
. .
<div> <div>
*hello* *hello*
<foo><a> <foo><a>
. .
.
</div>
*foo*
.
</div>
*foo*
.
Here we have two HTML blocks with a Markdown paragraph between them: Here we have two HTML blocks with a Markdown paragraph between them:
. .
<DIV CLASS="foo"> <DIV CLASS="foo">
*Markdown* *Markdown*
</DIV> </DIV>
. .
<DIV CLASS="foo"> <DIV CLASS="foo">
<p><em>Markdown</em></p> <p><em>Markdown</em></p>
</DIV> </DIV>
. .
The tag on the first line can be partial, as long
as it is split where there would be whitespace:
.
<div id="foo"
class="bar">
</div>
.
<div id="foo"
class="bar">
</div>
.
.
<div id="foo" class="bar
baz">
</div>
.
<div id="foo" class="bar
baz">
</div>
.
An open tag need not be closed:
.
<div>
*foo*
*bar*
.
<div>
*foo*
<p><em>bar</em></p>
.
A partial tag need not even be completed (garbage
in, garbage out):
.
<div id="foo"
*hi*
.
<div id="foo"
*hi*
.
.
<div class
foo
.
<div class
foo
.
The initial tag doesn't even need to be a valid
tag, as long as it starts like one:
.
<div *???-&&&-<---
*foo*
.
<div *???-&&&-<---
*foo*
.
In type 6 blocks, the initial tag need not be on a line by
itself:
.
<div><a href="bar">*foo*</a></div>
.
<div><a href="bar">*foo*</a></div>
.
.
<table><tr><td>
foo
</td></tr></table>
.
<table><tr><td>
foo
</td></tr></table>
.
Everything until the next blank line or end of document
gets included in the HTML block. So, in the following
example, what looks like a Markdown code block
is actually part of the HTML block, which continues until a blank is actually part of the HTML block, which continues until a blank
line or the end of the document is reached: line or the end of the document is reached:
. .
<div></div> <div></div>
 c  c
int x = 33; int x = 33;
 
. .
<div></div> <div></div>
 c  c
int x = 33; int x = 33;
 
. .
To start an [HTML block] with a tag that is *not* in the
list of block-level tags in (6), you must put the tag by
itself on the first line (and it must be complete):
.
<a href="foo">
*bar*
</a>
.
<a href="foo">
*bar*
</a>
.
In type 7 blocks, the [tag name] can be anything:
.
<Warning>
*bar*
</Warning>
.
<Warning>
*bar*
</Warning>
.
.
<i class="foo">
*bar*
</i>
.
<i class="foo">
*bar*
</i>
.
These rules are designed to allow us to work with tags that
can function as either block-level or inline-level tags.
The <del> tag is a nice example. We can surround content with
<del> tags in three different ways. In this case, we get a raw
HTML block, because the <del> tag is on a line by itself:
.
<del>
*foo*
</del>
.
<del>
*foo*
</del>
.
In this case, we get a raw HTML block that just includes
the <del> tag (because it ends with the following blank
line). So the contents get interpreted as CommonMark:
.
<del>
*foo*
</del>
.
<del>
<p><em>foo</em></p>
</del>
.
Finally, in this case, the <del> tags are interpreted
as [raw HTML] *inside* the CommonMark paragraph. (Because
the tag is not on a line by itself, we get inline HTML
rather than an [HTML block].)
.
<del>*foo*</del>
.
<p><del><em>foo</em></del></p>
.
HTML tags designed to contain literal content
(script, style, pre), comments, processing instructions,
and declarations are treated somewhat differently.
Instead of ending at the first blank line, these blocks
end at the first line containing a corresponding end tag.
As a result, these blocks can contain blank lines:
A pre tag (type 1):
.
<pre language="haskell"><code>
import Text.HTML.TagSoup
main :: IO ()
main = print $ parseTags tags
</code></pre>
.
<pre language="haskell"><code>
import Text.HTML.TagSoup
main :: IO ()
main = print $ parseTags tags
</code></pre>
.
A script tag (type 1):
.
<script type="text/javascript">
// JavaScript example
document.getElementById("demo").innerHTML = "Hello JavaScript!";
</script>
.
<script type="text/javascript">
// JavaScript example
document.getElementById("demo").innerHTML = "Hello JavaScript!";
</script>
.
A style tag (type 1):
.
<style
type="text/css">
h1 {color:red;}
p {color:blue;}
</style>
.
<style
type="text/css">
h1 {color:red;}
p {color:blue;}
</style>
.
If there is no matching end tag, the block will end at the
end of the document (or the enclosing [block quote] or
[list item]):
.
<style
type="text/css">
foo
.
<style
type="text/css">
foo
.
.
> <div>
> foo
bar
.
<blockquote>
<div>
foo
</blockquote>
<p>bar</p>
.
.
- <div>
- foo
.
<ul>
<li>
<div>
</li>
<li>foo</li>
</ul>
.
The end tag can occur on the same line as the start tag:
.
<style>p{color:red;}</style>
*foo*
.
<style>p{color:red;}</style>
<p><em>foo</em></p>
.
.
<!-- foo -->*bar*
*baz*
.
<!-- foo -->*bar*
<p><em>baz</em></p>
.
Note that anything on the last line after the
end tag will be included in the [HTML block]:
.
<script>
foo
</script>1. *bar*
.
<script>
foo
</script>1. *bar*
.
A comment (type 2):
. .
<!-- Foo <!-- Foo
bar bar
baz --> baz -->
. .
<!-- Foo <!-- Foo
bar bar
baz --> baz -->
. .
A processing instruction: A processing instruction (type 3):
. .
<?php <?php
echo '>'; echo '>';
?> ?>
. .
<?php <?php
echo '>'; echo '>';
?> ?>
. .
: A declaration (type 4):
.
<!DOCTYPE html>
.
<!DOCTYPE html>
.
CDATA (type 5):
. .
<![CDATA[ <![CDATA[
function matchwo(a,b) function matchwo(a,b)
{ {
if (a < b && a < 0) then {
return 1;
} } else {
return 0;
} }
} }
]]> ]]>
. .
<![CDATA[ <![CDATA[
function matchwo(a,b) function matchwo(a,b)
{ {
if (a < b && a < 0) then {
return 1;
} } else {
return 0;
} }
} }
]]> ]]>
. .
The opening tag can be indented 1-3 spaces, but not 4: The opening tag can be indented 1-3 spaces, but not 4:
. .
<!-- foo --> <!-- foo -->
<!-- foo --> <!-- foo -->
. .
<!-- foo --> <!-- foo -->
<pre><code>&lt;!-- foo --&gt; <pre><code>&lt;!-- foo --&gt;
</code></pre> </code></pre>
. .
.
<div>
<div>
.
<div>
<pre><code>&lt;div&gt;
</code></pre>
.
An HTML block of types 1--6 can interrupt a paragraph, and need not be
preceded by a blank line.
. .
Foo Foo
<div> <div>
bar bar
</div> </div>
. .
<p>Foo</p> <p>Foo</p>
<div> <div>
bar bar
</div> </div>
. .
However, a following blank line is needed, except at the end of However, a following blank line is needed, except at the end of
a document: a document, and except for blocks of types 1--5, above:
. .
<div> <div>
bar bar
</div> </div>
*foo* *foo*
. .
<div> <div>
bar bar
</div> </div>
*foo* *foo*
. .
: HTML blocks of type 7 cannot interrupt a paragraph:
. .
Foo
<a href="bar">
baz
. .
< <p>Foo
<a href="bar">
baz</p>
. .
This rule differs from John Gruber's original Markdown syntax This rule differs from John Gruber's original Markdown syntax
specification, which says: specification, which says:
> The only restrictions are that block-level HTML elements — > The only restrictions are that block-level HTML elements —
> e.g. <div>, <table>, <pre>, <p>, etc. — must be separated from > e.g. <div>, <table>, <pre>, <p>, etc. — must be separated from
> surrounding content by blank lines, and the start and end tags of the > surrounding content by blank lines, and the start and end tags of the
> block should not be indented with tabs or spaces. > block should not be indented with tabs or spaces.
In some ways Gruber's rule is more restrictive than the one given In some ways Gruber's rule is more restrictive than the one given
here: here:
- It requires that an HTML block be preceded by a blank line. - It requires that an HTML block be preceded by a blank line.
- It does not allow the start tag to be indented. - It does not allow the start tag to be indented.
- It requires a matching end tag, which it also does not allow to - It requires a matching end tag, which it also does not allow to
be indented. be indented.
Most Markdown implementations (including some of Gruber's own) do not
these restrictions. respect all of these restrictions.
There is one respect, however, in which Gruber's rule is more liberal There is one respect, however, in which Gruber's rule is more liberal
than the one given here, since it allows blank lines to occur inside than the one given here, since it allows blank lines to occur inside
an HTML block. There are two reasons for disallowing them here. an HTML block. There are two reasons for disallowing them here.
First, it removes the need to parse balanced tags, which is First, it removes the need to parse balanced tags, which is
expensive and can require backtracking from the end of the document expensive and can require backtracking from the end of the document
if no matching end tag is found. Second, it provides a very simple if no matching end tag is found. Second, it provides a very simple
and flexible way of including Markdown content inside HTML tags: and flexible way of including Markdown content inside HTML tags:
simply separate the Markdown from the HTML using blank lines: simply separate the Markdown from the HTML using blank lines:
Compare:
. .
<div> <div>
*Emphasized* text. *Emphasized* text.
</div> </div>
. .
<div> <div>
<p><em>Emphasized</em> text.</p> <p><em>Emphasized</em> text.</p>
</div> </div>
. .
. .
<div> <div>
*Emphasized* text. *Emphasized* text.
</div> </div>
. .
<div> <div>
*Emphasized* text. *Emphasized* text.
</div> </div>
. .
skipping to change at line 1830 skipping to change at line 2242
. .
<table> <table>
<tr> <tr>
<td> <td>
Hi Hi
</td> </td>
</tr> </tr>
</table> </table>
. .
There are problems, however, if the inner tags are indented
*and* separated by spaces, as then they will be interpreted as
an indented code block:
. .
<table>
<tr>
<td>
Hi
</td>
</tr>
</table>
.
<table>
<tr>
<pre><code>&lt;td&gt;
Hi
&lt;/td&gt;
</code></pre>
</tr>
</table>
.
Fortunately, blank lines are usually not necessary and can be
deleted. The exception is inside <pre> tags, but as described
above, raw HTML blocks starting with <pre> *can* contain blank
lines.
consists of a [link label], indented up to three spaces, followed consists of a [link label], indented up to three spaces, followed
by a colon (:), optional [whitespace] (including up to one by a colon (:), optional [whitespace] (including up to one
optional [whitespace] (including up to one optional [whitespace] (including up to one
[line ending]), and an optional [link [line ending]), and an optional [link
title], which if it is present must be separated title], which if it is present must be separated
from the [link destination] by [whitespace]. from the [link destination] by [whitespace].
No further [non-space character]s may occur on the line. No further [non-whitespace character]s may occur on the line.
does not correspond to a structural element of a document. Instead, it does not correspond to a structural element of a document. Instead, it
defines a label which can be used in [reference link]s defines a label which can be used in [reference link]s
and reference-style [images] elsewhere in the document. [Link and reference-style [images] elsewhere in the document. [Link
reference definitions] can come either before or after the links that use reference definitions] can come either before or after the links that use
them. them.
. .
[foo]: /url "title" [foo]: /url "title"
skipping to change at line 1945 skipping to change at line 2383
. .
[foo]: [foo]:
[foo] [foo]
. .
<p>[foo]:</p> <p>[foo]:</p>
<p>[foo]</p> <p>[foo]</p>
. .
Both title and destination can contain backslash escapes
and literal backslashes:
.
[foo]: /url\bar\*baz "foo\"bar\baz"
[foo]
.
<p><a href="/url%5Cbar*baz" title="foo&quot;bar\baz">foo</a></p>
.
A link can come before its corresponding definition: A link can come before its corresponding definition:
. .
[foo] [foo]
[foo]: url [foo]: url
. .
<p><a href="url">foo</a></p> <p><a href="url">foo</a></p>
. .
skipping to change at line 2006 skipping to change at line 2455
. .
[ [
foo foo
]: /url ]: /url
bar bar
. .
<p>bar</p> <p>bar</p>
. .
This is not a link reference definition, because there are This is not a link reference definition, because there are
[non-space character]s after the title: [non-whitespace character]s after the title:
. .
[foo]: /url "title" ok [foo]: /url "title" ok
. .
<p>[foo]: /url &quot;title&quot; ok</p> <p>[foo]: /url &quot;title&quot; ok</p>
. .
This is a link reference definition, but it has no title:
.
[foo]: /url
"title" ok
.
<p>&quot;title&quot; ok</p>
.
This is not a link reference definition, because it is indented This is not a link reference definition, because it is indented
four spaces: four spaces:
. .
[foo]: /url "title" [foo]: /url "title"
[foo] [foo]
. .
<pre><code>[foo]: /url &quot;title&quot; <pre><code>[foo]: /url &quot;title&quot;
</code></pre> </code></pre>
skipping to change at line 2240 skipping to change at line 2698
form of the definition is: form of the definition is:
> If X is a sequence of blocks, then the result of > If X is a sequence of blocks, then the result of
> transforming X in such-and-such a way is a container of type Y > transforming X in such-and-such a way is a container of type Y
> with these blocks as its content. > with these blocks as its content.
So, we explain what counts as a block quote or list item by explaining So, we explain what counts as a block quote or list item by explaining
how these can be *generated* from their contents. This should suffice how these can be *generated* from their contents. This should suffice
to define the syntax, although it does not give a recipe for *parsing* to define the syntax, although it does not give a recipe for *parsing*
these constructions. (A recipe is provided below in the section entitled these constructions. (A recipe is provided below in the section entitled
[A parsing strategy](#appendix-a-parsing-strategy).) [A parsing strategy](#appendix-a-parsing-strategy).)
## Block quotes ## Block quotes
A [block quote marker](@block-quote-marker) A [block quote marker](@block-quote-marker)
consists of 0-3 spaces of initial indent, plus (a) the character > together consists of 0-3 spaces of initial indent, plus (a) the character > together
with a following space, or (b) a single character > not followed by a space. with a following space, or (b) a single character > not followed by a space.
The following rules define [block quotes]: The following rules define [block quotes]:
1. **Basic case.** If a string of lines *Ls* constitute a sequence 1. **Basic case.** If a string of lines *Ls* constitute a sequence
of blocks *Bs*, then the result of prepending a [block quote of blocks *Bs*, then the result of prepending a [block quote
marker] to the beginning of each line in *Ls* marker] to the beginning of each line in *Ls*
is a [block quote](#block-quotes) containing *Bs*. is a [block quote](#block-quotes) containing *Bs*.
2. **Laziness.** If a string of lines *Ls* constitute a [block 2. **Laziness.** If a string of lines *Ls* constitute a [block
quote](#block-quotes) with contents *Bs*, then the result of deleting quote](#block-quotes) with contents *Bs*, then the result of deleting
the initial [block quote marker] from one or the initial [block quote marker] from one or
more lines in which the next [non-space character] after the [block more lines in which the next [non-whitespace character] after the [block
quote marker] is [paragraph continuation quote marker] is [paragraph continuation
text] is a block quote with *Bs* as its content. text] is a block quote with *Bs* as its content.
[Paragraph continuation text](@paragraph-continuation-text) is text [Paragraph continuation text](@paragraph-continuation-text) is text
that will be parsed as part of the content of a paragraph, but does that will be parsed as part of the content of a paragraph, but does
not occur at the beginning of the paragraph. not occur at the beginning of the paragraph.
3. **Consecutiveness.** A document cannot contain two [block 3. **Consecutiveness.** A document cannot contain two [block
quotes] in a row unless there is a [blank line] between them. quotes] in a row unless there is a [blank line] between them.
Nothing else counts as a [block quote](#block-quotes). Nothing else counts as a [block quote](#block-quotes).
skipping to change at line 2628 skipping to change at line 3086
## List items ## List items
A [list marker](@list-marker) is a A [list marker](@list-marker) is a
[bullet list marker] or an [ordered list marker]. [bullet list marker] or an [ordered list marker].
A [bullet list marker](@bullet-list-marker) A [bullet list marker](@bullet-list-marker)
is a -, +, or * character. is a -, +, or * character.
An [ordered list marker](@ordered-list-marker) An [ordered list marker](@ordered-list-marker)
is a sequence of digits (0-9), followed by either a is a sequence of 1--9 arabic digits (0-9), followed by either a
. character or a ) character. . character or a ) character. (The reason for the length
limit is that with 10 digits we start seeing integer overflows
in some browsers.)
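The revised definition of an ordered list marker can be sketched as a single regular expression. This is an illustration with a hypothetical function name. It only recognizes the marker at the very start of a string and does not check indentation or the spaces that must follow the marker.

```python
import re
from typing import Optional, Tuple

# 1-9 ASCII digits followed by '.' or ')'.
ORDERED_LIST_MARKER = re.compile(r"([0-9]{1,9})([.)])")


def parse_ordered_list_marker(text: str) -> Optional[Tuple[int, str]]:
    """Return (start number, delimiter) if text begins with an ordered list marker."""
    match = ORDERED_LIST_MARKER.match(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

With this pattern a ten-digit prefix is rejected outright, because the digit run must be followed directly by the delimiter, and leading zeros are accepted but do not change the start number.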
The following rules define [list items]: The following rules define [list items]:
1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of 1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of
blocks *Bs* starting with a [non-space character] and not separated blocks *Bs* starting with a [non-whitespace character] and not separated
from each other by more than one blank line, and *M* is a list from each other by more than one blank line, and *M* is a list
marker of width *W* followed by 0 < *N* < 5 spaces, then the result marker of width *W* followed by 0 < *N* < 5 spaces, then the result
of prepending *M* and the following spaces to the first line of of prepending *M* and the following spaces to the first line of
*Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
list item with *Bs* as its contents. The type of the list item list item with *Bs* as its contents. The type of the list item
(bullet or ordered) is determined by the type of its list marker. (bullet or ordered) is determined by the type of its list marker.
If the list item is ordered, then it is also assigned a start If the list item is ordered, then it is also assigned a start
number, based on the ordered list marker. number, based on the ordered list marker.
For example, let *Ls* be the lines For example, let *Ls* be the lines
skipping to change at line 2692 skipping to change at line 3152
<p>A block quote.</p> <p>A block quote.</p>
</blockquote> </blockquote>
</li> </li>
</ol> </ol>
. .
The most important thing to notice is that the position of The most important thing to notice is that the position of
the text after the list marker determines how much indentation the text after the list marker determines how much indentation
is needed in subsequent blocks in the list item. If the list is needed in subsequent blocks in the list item. If the list
marker takes up two spaces, and there are three spaces between marker takes up two spaces, and there are three spaces between
the list marker and the next [non-space character], then blocks the list marker and the next [non-whitespace character], then blocks
must be indented five spaces in order to fall under the list must be indented five spaces in order to fall under the list
item. item.
Here are some examples showing how far content must be indented to be Here are some examples showing how far content must be indented to be
put under the list item: put under the list item:
. .
- one - one
two two
skipping to change at line 2750 skipping to change at line 3210
<ul> <ul>
<li> <li>
<p>one</p> <p>one</p>
<p>two</p> <p>two</p>
</li> </li>
</ul> </ul>
. .
It is tempting to think of this in terms of columns: the continuation It is tempting to think of this in terms of columns: the continuation
blocks must be indented at least to the column of the first blocks must be indented at least to the column of the first
[non-space character] after the list marker. However, that is not quite right. [non-whitespace character] after the list marker. However, that is not quite right.
The spaces after the list marker determine how much relative indentation The spaces after the list marker determine how much relative indentation
is needed. Which column this indentation reaches will depend on is needed. Which column this indentation reaches will depend on
how the list item is embedded in other constructions, as shown by how the list item is embedded in other constructions, as shown by
this example: this example:
. .
> > 1. one > > 1. one
>> >>
>> two >> two
. .
skipping to change at line 2893 skipping to change at line 3353
<pre><code>bar <pre><code>bar
</code></pre> </code></pre>
<p>baz</p> <p>baz</p>
<blockquote> <blockquote>
<p>bam</p> <p>bam</p>
</blockquote> </blockquote>
</li> </li>
</ol> </ol>
. .
Note that ordered list start numbers must be nine digits or less:
.
123456789. ok
.
<ol start="123456789">
<li>ok</li>
</ol>
.
.
1234567890. not ok
.
<p>1234567890. not ok</p>
.
A start number may begin with 0s:
.
0. ok
.
<ol start="0">
<li>ok</li>
</ol>
.
.
003. ok
.
<ol start="3">
<li>ok</li>
</ol>
.
A start number may not be negative:
.
-1. not ok
.
<p>-1. not ok</p>
.
2. **Item starting with indented code.** If a sequence of lines *Ls* 2. **Item starting with indented code.** If a sequence of lines *Ls*
constitute a sequence of blocks *Bs* starting with an indented code constitute a sequence of blocks *Bs* starting with an indented code
block and not separated from each other by more than one blank line, block and not separated from each other by more than one blank line,
and *M* is a list marker of width *W* followed by and *M* is a list marker of width *W* followed by
one space, then the result of prepending *M* and the following one space, then the result of prepending *M* and the following
space to the first line of *Ls*, and indenting subsequent lines of space to the first line of *Ls*, and indenting subsequent lines of
*Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
If a line is empty, then it need not be indented. The type of the If a line is empty, then it need not be indented. The type of the
list item (bullet or ordered) is determined by the type of its list list item (bullet or ordered) is determined by the type of its list
marker. If the list item is ordered, then it is also assigned a marker. If the list item is ordered, then it is also assigned a
skipping to change at line 2998 skipping to change at line 3500
</code></pre> </code></pre>
<p>paragraph</p> <p>paragraph</p>
<pre><code>more code <pre><code>more code
</code></pre> </code></pre>
</li> </li>
</ol> </ol>
. .
Note that rules #1 and #2 only apply to two cases: (a) cases Note that rules #1 and #2 only apply to two cases: (a) cases
in which the lines to be included in a list item begin with a in which the lines to be included in a list item begin with a
[non-space character], and (b) cases in which [non-whitespace character], and (b) cases in which
they begin with an indented code they begin with an indented code
block. In a case like the following, where the first block begins with block. In a case like the following, where the first block begins with
a three-space indent, the rules do not allow us to form a list item by a three-space indent, the rules do not allow us to form a list item by
indenting the whole thing and prepending a list marker: indenting the whole thing and prepending a list marker:
. .
foo foo
bar bar
. .
skipping to change at line 3228 skipping to change at line 3730
indented code indented code
&gt; A block quote. &gt; A block quote.
</code></pre> </code></pre>
. .
5. **Laziness.** If a string of lines *Ls* constitute a [list 5. **Laziness.** If a string of lines *Ls* constitute a [list
item](#list-items) with contents *Bs*, then the result of deleting item](#list-items) with contents *Bs*, then the result of deleting
some or all of the indentation from one or more lines in which the some or all of the indentation from one or more lines in which the
next [non-space character] after the indentation is next [non-whitespace character] after the indentation is
[paragraph continuation text] is a [paragraph continuation text] is a
list item with the same contents and attributes. The unindented list item with the same contents and attributes. The unindented
lines are called lines are called
[lazy continuation line](@lazy-continuation-line)s. [lazy continuation line](@lazy-continuation-line)s.
Here is an example with [lazy continuation line]s: Here is an example with [lazy continuation line]s:
. .
1. A paragraph 1. A paragraph
with two lines. with two lines.
skipping to change at line 4201 skipping to change at line 4703
. .
<p>!&quot;#$%&amp;'()*+,-./:;&lt;=&gt;?@[\]^_{|}~</p> <p>!&quot;#$%&amp;'()*+,-./:;&lt;=&gt;?@[\]^_{|}~</p>
. .
Backslashes before other characters are treated as literal Backslashes before other characters are treated as literal
backslashes: backslashes:
. .
\→\A\a\ \3\φ\« \→\A\a\ \3\φ\«
. .
<p>\\A\a\ \3\φ\«</p> <p>\\A\a\ \3\φ\«</p>
. .
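The escaping rule can be illustrated with a short Python sketch. It assumes that the set of escapable characters is exactly the ASCII punctuation characters, which happens to coincide with Python's `string.punctuation`; the function name `unescape_backslashes` is hypothetical, and the sketch ignores the contexts (code spans, autolinks, raw HTML) where escapes do not apply.

```python
import re
import string

# A backslash followed by one ASCII punctuation character.
_ESCAPE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def unescape_backslashes(text):
    """Drop the backslash before ASCII punctuation; keep all others."""
    return _ESCAPE.sub(r"\1", text)
```

Backslashes before letters, digits, spaces, or non-ASCII characters are left untouched, which matches the "literal backslash" behaviour described above.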
Escaped characters are treated as regular characters and do Escaped characters are treated as regular characters and do
not have their usual Markdown meanings: not have their usual Markdown meanings:
. .
\*not emphasized* \*not emphasized*
\<br/> not a tag \<br/> not a tag
\not code \not code
skipping to change at line 4279 skipping to change at line 4781
. .
<http://example.com?find=\*> <http://example.com?find=\*>
. .
<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p> <p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p>
. .
. .
<a href="/bar\/)"> <a href="/bar\/)">
. .
<> <a href="/bar\/)">
. .
But they work in all other contexts, including URLs and link titles, But they work in all other contexts, including URLs and link titles,
link references, and [info string]s in [fenced code block]s: link references, and [info string]s in [fenced code block]s:
. .
[foo](/bar\* "ti\*tle") [foo](/bar\* "ti\*tle")
. .
<p><a href="/bar*" title="ti*tle">foo</a></p> <p><a href="/bar*" title="ti*tle">foo</a></p>
. .
skipping to change at line 4325 skipping to change at line 4827
unicode characters as entities or leave them as they are. (However, unicode characters as entities or leave them as they are. (However,
", &, <, and > must always be rendered as entities.) ", &, <, and > must always be rendered as entities.)
[Named entities](@name-entities) consist of & [Named entities](@name-entities) consist of &
+ any of the valid HTML5 entity names + ;. The + any of the valid HTML5 entity names + ;. The
[following document](https://html.spec.whatwg.org/multipage/entities.json) [following document](https://html.spec.whatwg.org/multipage/entities.json)
is used as an authoritative source of the valid entity names and their is used as an authoritative source of the valid entity names and their
corresponding codepoints. corresponding codepoints.
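As an illustration of the lookup, here is a small Python sketch. It uses the `html5` table from Python's standard `html.entities` module as a stand-in for the entities.json document; that the two tables agree is an assumption, and the helper name `named_entity` is hypothetical.

```python
from html.entities import html5


def named_entity(ref):
    """Map '&name;' to its character(s), or None if it is not valid."""
    # Require both the leading "&" and the trailing ";".
    if not (ref.startswith("&") and ref.endswith(";")):
        return None
    # Keys in the html5 table include the trailing ";", e.g. "amp;".
    return html5.get(ref[1:])
```

Note that a single entity may expand to more than one code point, as `&ngE;` does.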
. .
&nbsp; &amp; &copy; &AElig; &Dcaron; &nbsp; &amp; &copy; &AElig; &Dcaron;
&frac34; &HilbertSpace; &DifferentialD;
&ClockwiseContourIntegral; &ngE;
. .
¾ ℋ ⅆ
∲ ≧̸</p>
. .
[Decimal entities](@decimal-entities) [Decimal entities](@decimal-entities)
consist of &# + a string of 1--8 arabic digits + ;. Again, these consist of &# + a string of 1--8 arabic digits + ;. Again, these
entities need to be recognised and transformed into their corresponding entities need to be recognised and transformed into their corresponding
unicode codepoints. Invalid unicode codepoints will be replaced by unicode codepoints. Invalid unicode codepoints will be replaced by
the "unknown codepoint" character (U+FFFD). For security reasons, the "unknown codepoint" character (U+FFFD). For security reasons,
the codepoint U+0000 will also be replaced by U+FFFD. the codepoint U+0000 will also be replaced by U+FFFD.
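A minimal Python sketch of this rule follows. The function names are hypothetical, and treating surrogates (U+D800 to U+DFFF) and values above U+10FFFF as the "invalid" code points is an assumption; the text above only names U+0000 explicitly.

```python
import re

# "&#" + 1-8 ASCII digits + ";"
_DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")

REPLACEMENT = "\ufffd"


def _decode(match):
    cp = int(match.group(1))
    # U+0000 is replaced for security reasons; out-of-range values and
    # surrogates are treated here as invalid code points (assumption).
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return REPLACEMENT
    return chr(cp)


def replace_decimal_entities(text):
    """Replace every decimal entity in text with its code point."""
    return _DECIMAL_ENTITY.sub(_decode, text)
```

A reference with nine or more digits does not match the pattern and is therefore left as literal text.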
. .
skipping to change at line 4388 skipping to change at line 4894
. .
Entities are recognized in any context besides code spans or Entities are recognized in any context besides code spans or
code blocks, including raw HTML, URLs, [link title]s, and code blocks, including raw HTML, URLs, [link title]s, and
[fenced code block] [info string]s: [fenced code block] [info string]s:
. .
<a href="&ouml;&ouml;.html"> <a href="&ouml;&ouml;.html">
. .
<> <a href="&ouml;&ouml;.html">
. .
. .
[foo](/f&ouml;&ouml; "f&ouml;&ouml;") [foo](/f&ouml;&ouml; "f&ouml;&ouml;")
. .
<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> <p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
. .
. .
[foo] [foo]
skipping to change at line 5690 skipping to change at line 6196
. .
<p><em>foo _bar</em> baz_</p> <p><em>foo _bar</em> baz_</p>
. .
. .
**foo*bar** **foo*bar**
. .
<p><em><em>foo</em>bar</em>*</p> <p><em><em>foo</em>bar</em>*</p>
. .
.
*foo __bar *baz bim__ bam*
.
<p><em>foo <strong>bar *baz bim</strong> bam</em></p>
.
Rule 16: Rule 16:
. .
**foo **bar baz** **foo **bar baz**
. .
<p>**foo <strong>bar baz</strong></p> <p>**foo <strong>bar baz</strong></p>
. .
. .
*foo *bar baz* *foo *bar baz*
skipping to change at line 5773 skipping to change at line 6285
(the URI that is the link destination), and optionally a [link title]. (the URI that is the link destination), and optionally a [link title].
There are two basic kinds of links in Markdown. In [inline link]s the There are two basic kinds of links in Markdown. In [inline link]s the
destination and title are given immediately after the link text. In destination and title are given immediately after the link text. In
[reference link]s the destination and title are defined elsewhere in [reference link]s the destination and title are defined elsewhere in
the document. the document.
inline elements enclosed by square brackets ([ and ]). The inline elements enclosed by square brackets ([ and ]). The
following rules apply: following rules apply:
- Links may not contain other links, at any level of nesting. - Links may not contain other links, at any level of nesting. If
multiple otherwise valid link definitions appear nested inside each
other, the inner-most definition is used.
- Brackets are allowed in the [link text] only if (a) they - Brackets are allowed in the [link text] only if (a) they
are backslash-escaped or (b) they appear as a matched pair of brackets, are backslash-escaped or (b) they appear as a matched pair of brackets,
with an open bracket [, a sequence of zero or more inlines, and with an open bracket [, a sequence of zero or more inlines, and
a close bracket ]. a close bracket ].
- Backtick [code span]s, [autolink]s, and raw [HTML tag]s bind more tightly - Backtick [code span]s, [autolink]s, and raw [HTML tag]s bind more tightly
than the brackets in link text. Thus, for example, than the brackets in link text. Thus, for example,
[foo`]` could not be a link text, since the second ] [foo`]` could not be a link text, since the second ]
is part of a code span. is part of a code span.
skipping to change at line 5929 skipping to change at line 6443
Parentheses and other symbols can also be escaped, as usual Parentheses and other symbols can also be escaped, as usual
in Markdown: in Markdown:
. .
. .
. .
A link can contain fragment identifiers and queries:
.
.
.
Note that a backslash before a non-escapable character is
just a backslash:
.
.
.
URL-escaping should be left alone inside the destination, as all URL-escaping should be left alone inside the destination, as all
URL-escaped characters are also valid URL characters. HTML entities in URL-escaped characters are also valid URL characters. HTML entities in
the destination will be parsed into the corresponding unicode the destination will be parsed into the corresponding unicode
codepoints, as usual, and optionally URL-escaped when written as HTML. codepoints, as usual, and optionally URL-escaped when written as HTML.
. .
. .
. .
skipping to change at line 6134 skipping to change at line 6671
that [matches] a [link reference definition] elsewhere in the document. that [matches] a [link reference definition] elsewhere in the document.
A [link label](@link-label) begins with a left bracket ([) and ends A [link label](@link-label) begins with a left bracket ([) and ends
with the first right bracket (]) that is not backslash-escaped. with the first right bracket (]) that is not backslash-escaped.
Between these brackets there must be at least one [non-space character]. Between these brackets there must be at least one [non-whitespace character].
Unescaped square bracket characters are not allowed in Unescaped square bracket characters are not allowed in
characters inside the square brackets. characters inside the square brackets.
One label [matches](@matches) One label [matches](@matches)
another just in case their normalized forms are equal. To normalize a another just in case their normalized forms are equal. To normalize a
label, perform the *unicode case fold* and collapse consecutive internal label, perform the *unicode case fold* and collapse consecutive internal
[whitespace] to a single space. If there are multiple [whitespace] to a single space. If there are multiple
matching reference link definitions, the one that comes first in the matching reference link definitions, the one that comes first in the
document is used. (It is desirable in such cases to emit a warning.) document is used. (It is desirable in such cases to emit a warning.)
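Label matching can be sketched in a few lines of Python. The names `normalize_label` and `build_reference_map` are hypothetical; the sketch uses `str.casefold` for the unicode case fold and also strips leading and trailing whitespace, which is an assumption beyond the "internal whitespace" wording above.

```python
def normalize_label(label):
    """Case-fold the label and collapse whitespace runs to one space."""
    # str.split() with no argument splits on any run of whitespace and
    # discards leading and trailing whitespace.
    return " ".join(label.casefold().split())


def build_reference_map(definitions):
    """Build a label -> destination map; the first definition wins."""
    refs = {}
    for label, destination in definitions:
        # setdefault keeps an existing entry, so later duplicates
        # are ignored.
        refs.setdefault(normalize_label(label), destination)
    return refs
```

An implementation that wants to emit the suggested warning could check whether the normalized label is already present before calling `setdefault`.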
skipping to change at line 6381 skipping to change at line 6918
. .
. .
[foo][ref\[] [foo][ref\[]
[ref\[]: /uri [ref\[]: /uri
. .
<p><a href="/uri">foo</a></p> <p><a href="/uri">foo</a></p>
. .
A [link label] must contain at least one [non-space character]: A [link label] must contain at least one [non-whitespace character]:
. .
[] []
[]: /uri []: /uri
. .
<p>[]</p> <p>[]</p>
<p>[]: /uri</p> <p>[]: /uri</p>
. .
skipping to change at line 6961 skipping to change at line 7498
## Raw HTML ## Raw HTML
Text between < and > that looks like an HTML tag is parsed as a Text between < and > that looks like an HTML tag is parsed as a
raw HTML tag and will be rendered in HTML without escaping. raw HTML tag and will be rendered in HTML without escaping.
Tag and attribute names are not limited to current HTML tags, Tag and attribute names are not limited to current HTML tags,
so custom tags (and even, say, DocBook tags) may be used. so custom tags (and even, say, DocBook tags) may be used.
Here is the grammar for tags: Here is the grammar for tags:
A [tag name](@tag-name) consists of an ASCII letter A [tag name](@tag-name) consists of an ASCII letter
followed by zero or more ASCII letters followed by zero or more ASCII letters, digits, or
hyphens (-).
An [attribute](@attribute) consists of [whitespace], An [attribute](@attribute) consists of [whitespace],
an [attribute name], and an optional an [attribute name], and an optional
[attribute value specification]. [attribute value specification].
An [attribute name](@attribute-name) An [attribute name](@attribute-name)
consists of an ASCII letter, _, or :, followed by zero or more ASCII consists of an ASCII letter, _, or :, followed by zero or more ASCII
letters, digits, _, ., :, or -. (Note: This is the XML letters, digits, _, ., :, or -. (Note: This is the XML
specification restricted to ASCII. HTML5 is laxer.) specification restricted to ASCII. HTML5 is laxer.)
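The tag-name and attribute-name definitions translate directly into regular expressions. The following Python sketch follows the newer wording (tag names may contain digits and hyphens); the constant names are assumptions made for illustration.

```python
import re

# An ASCII letter followed by zero or more ASCII letters, digits,
# or hyphens.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# An ASCII letter, "_" or ":", followed by zero or more ASCII letters,
# digits, "_", ".", ":" or "-".
ATTRIBUTE_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")


def is_tag_name(s):
    return TAG_NAME.fullmatch(s) is not None


def is_attribute_name(s):
    return ATTRIBUTE_NAME.fullmatch(s) is not None
```

With these patterns `responsive-image` and `My-Tag` are accepted as tag names, while `33` and `__` are rejected.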
skipping to change at line 6994 skipping to change at line 7532
A [single-quoted attribute value](@single-quoted-attribute-value) A [single-quoted attribute value](@single-quoted-attribute-value)
consists of ', zero or more consists of ', zero or more
characters not including ', and a final '. characters not including ', and a final '.
A [double-quoted attribute value](@double-quoted-attribute-value) A [double-quoted attribute value](@double-quoted-attribute-value)
consists of ", zero or more consists of ", zero or more
characters not including ", and a final ". characters not including ", and a final ".
An [open tag](@open-tag) consists of a < character, a [tag name], An [open tag](@open-tag) consists of a < character, a [tag name],
zero or more [attributes], optional [whitespace], an optional / zero or more [attributes](@attribute], optional [whitespace], an optional /
character, and a > character. character, and a > character.
A [closing tag](@closing-tag) consists of the string </, a A [closing tag](@closing-tag) consists of the string </, a
[tag name], optional [whitespace], and the character >. [tag name], optional [whitespace], and the character >.
An [HTML comment](@html-comment) consists of <!-- + *text* + -->, An [HTML comment](@html-comment) consists of <!-- + *text* + -->,
where *text* does not start with > or ->, does not end with -, where *text* does not start with > or ->, does not end with -,
and does not contain --. (See the and does not contain --. (See the
skipping to change at line 7059 skipping to change at line 7597
With attributes: With attributes:
. .
<a foo="bar" bam = 'baz <em>"</em>' <a foo="bar" bam = 'baz <em>"</em>'
_boolean zoop:33=zoop:33 /> _boolean zoop:33=zoop:33 />
. .
<p><a foo="bar" bam = 'baz <em>"</em>' <p><a foo="bar" bam = 'baz <em>"</em>'
_boolean zoop:33=zoop:33 /></p> _boolean zoop:33=zoop:33 /></p>
. .
Custom tag names can be used:
.
<responsive-image src="foo.jpg" />
<My-Tag>
foo
</My-Tag>
.
<responsive-image src="foo.jpg" />
<My-Tag>
foo
</My-Tag>
.
Illegal tag names, not parsed as HTML: Illegal tag names, not parsed as HTML:
. .
<33> <__> <33> <__>
. .
<p>&lt;33&gt; &lt;__&gt;</p> <p>&lt;33&gt; &lt;__&gt;</p>
. .
Illegal attribute names: Illegal attribute names:
skipping to change at line 7107 skipping to change at line 7660
. .
<p>&lt;a href='bar'title=title&gt;</p> <p>&lt;a href='bar'title=title&gt;</p>
. .
Closing tags: Closing tags:
. .
</a> </a>
</foo > </foo >
. .
</a> </a>
</foo > </foo >
. .
Illegal attributes in closing tag: Illegal attributes in closing tag:
. .
</a href="foo"> </a href="foo">
. .
<p>&lt;/a href=&quot;foo&quot;&gt;</p> <p>&lt;/a href=&quot;foo&quot;&gt;</p>
. .
skipping to change at line 7175 skipping to change at line 7728
foo <![CDATA[>&<]]> foo <![CDATA[>&<]]>
. .
<p>foo <![CDATA[>&<]]></p> <p>foo <![CDATA[>&<]]></p>
. .
Entities are preserved in HTML attributes: Entities are preserved in HTML attributes:
. .
<a href="&ouml;"> <a href="&ouml;">
. .
<> <a href="&ouml;">
. .
Backslash escapes do not work in HTML attributes: Backslash escapes do not work in HTML attributes:
. .
<a href="\*"> <a href="\*">
. .
<> <a href="\*">
. .
. .
<a href="\""> <a href="\"">
. .
<p>&lt;a href=&quot;&quot;&quot;&gt;</p> <p>&lt;a href=&quot;&quot;&quot;&gt;</p>
. .
## Hard line breaks ## Hard line breaks
skipping to change at line 7387 skipping to change at line 7940
Internal spaces are preserved verbatim: Internal spaces are preserved verbatim:
. .
Multiple spaces Multiple spaces
. .
<p>Multiple spaces</p> <p>Multiple spaces</p>
. .
<!-- END TESTS --> <!-- END TESTS -->
# Appendix: A parsing strategy {-} # Appendix: A parsing strategy {-}
In this appendix we describe some features of the parsing strategy
used in the CommonMark reference implementations.
## Overview {-} ## Overview {-}
Parsing has two phases: Parsing has two phases:
1. In the first phase, lines of input are consumed and the block 1. In the first phase, lines of input are consumed and the block
structure of the document---its division into paragraphs, block quotes, structure of the document---its division into paragraphs, block quotes,
list items, and so on---is constructed. Text is assigned to these list items, and so on---is constructed. Text is assigned to these
blocks but not parsed. Link reference definitions are parsed and a blocks but not parsed. Link reference definitions are parsed and a
map of links is constructed. map of links is constructed.
2. In the second phase, the raw text contents of paragraphs and headers 2. In the second phase, the raw text contents of paragraphs and headers
are parsed into sequences of Markdown inline elements (strings, are parsed into sequences of Markdown inline elements (strings,
code spans, links, emphasis, and so on), using the map of link code spans, links, emphasis, and so on), using the map of link
references constructed in phase 1. references constructed in phase 1.
At each point in processing, the document is represented as a tree of At each point in processing, the document is represented as a tree of
**blocks**. The root of the tree is a document block. The document **blocks**. The root of the tree is a document block. The document
may have any number of other blocks as **children**. These children may have any number of other blocks as **children**. These children
may, in turn, have other blocks as children. The last child of a block may, in turn, have other blocks as children. The last child of a block
is normally considered **open**, meaning that subsequent lines of input is normally considered **open**, meaning that subsequent lines of input
can alter its contents. (Blocks that are not open are **closed**.) can alter its contents. (Blocks that are not open are **closed**.)
Here, for example, is a possible document tree, with the open blocks Here, for example, is a possible document tree, with the open blocks
marked by arrows: marked by arrows:
 tree  tree
skipping to change at line 7429 skipping to change at line 7983
"Lorem ipsum dolor\nsit amet." "Lorem ipsum dolor\nsit amet."
-> list (type=bullet tight=true bullet_char=-) -> list (type=bullet tight=true bullet_char=-)
list_item list_item
paragraph paragraph
"Qui *quodsi iracundia*" "Qui *quodsi iracundia*"
-> list_item -> list_item
-> paragraph -> paragraph
"aliquando id" "aliquando id"
 
## e {-} ## Phase 1: block structure {-}
Each line that is processed has an effect on this tree. The line is Each line that is processed has an effect on this tree. The line is
analyzed and, depending on its contents, the document may be altered analyzed and, depending on its contents, the document may be altered
in one or more of the following ways: in one or more of the following ways:
1. One or more open blocks may be closed. 1. One or more open blocks may be closed.
2. One or more new blocks may be created as children of the 2. One or more new blocks may be created as children of the
last open block. last open block.
3. Text may be added to the last (deepest) open block remaining 3. Text may be added to the last (deepest) open block remaining
on the tree. on the tree.
Once a line has been incorporated into the tree in this way, Once a line has been incorporated into the tree in this way,
it can be discarded, so input can be read in a stream. it can be discarded, so input can be read in a stream.
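The tree of open and closed blocks can be modelled with a very small data structure. The Python sketch below is illustrative only: the class `Block`, its fields, and the helper `deepest_open` are hypothetical names, and the tree is built by hand rather than by parsing lines.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str
    is_open: bool = True
    children: list = field(default_factory=list)
    text: list = field(default_factory=list)

    def close(self):
        """Close this block and any descendants that are still open."""
        self.is_open = False
        for child in self.children:
            if child.is_open:
                child.close()

    def add_child(self, kind):
        """Append a new open child, closing the previous last child."""
        if self.children and self.children[-1].is_open:
            self.children[-1].close()
        child = Block(kind)
        self.children.append(child)
        return child


def deepest_open(block):
    """Descend through open last children to the deepest open block."""
    while block.children and block.children[-1].is_open:
        block = block.children[-1]
    return block


# Build the example tree by hand.
doc = Block("document")
quote = doc.add_child("block_quote")
para = quote.add_child("paragraph")
para.text += ["Lorem ipsum dolor", "sit amet."]
lst = quote.add_child("list")            # closes the paragraph
item1 = lst.add_child("list_item")
item1.add_child("paragraph").text.append("Qui *quodsi iracundia*")
item2 = lst.add_child("list_item")       # closes the first item
item2.add_child("paragraph").text.append("aliquando id")
```

Only the chain document, block_quote, list, second list_item, and its paragraph remains open, so new text would be added to the paragraph containing "aliquando id".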
For each line, we follow this procedure:
1. First we iterate through the open blocks, starting with the
root document, and descending through last children down to the last
open block. Each block imposes a condition that the line must satisfy
if the block is to remain open. For example, a block quote requires a
> character. A paragraph requires a non-blank line.
In this phase we may match all or just some of the open
blocks. But we cannot close unmatched blocks yet, because we may have a
[lazy continuation line].
2. Next, after consuming the continuation markers for existing
blocks, we look for new block starts (e.g. > for a block quote).
If we encounter a new block start, we close any blocks unmatched
in step 1 before creating the new block as a child of the last
matched block.
3. Finally, we look at the remainder of the line (after block
markers like >, list markers, and indentation have been consumed).
This is text that can be incorporated into the last open
block (a paragraph, code block, header, or raw HTML).
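Step 1 of this procedure can be sketched for the two conditions named above. This Python sketch is a simplification under stated assumptions: it models only document, block quote, and paragraph blocks, the function name `match_open_blocks` is hypothetical, and the block quote marker is assumed to be up to three spaces, a `>`, and one optional space.

```python
import re

# Assumed shape of a block quote continuation marker.
_QUOTE_MARKER = re.compile(r" {0,3}> ?")


def match_open_blocks(line, open_kinds):
    """Return (number of open blocks matched, rest of the line).

    open_kinds lists the open blocks from the root document down to
    the last open block.
    """
    rest = line
    matched = 0
    for kind in open_kinds:
        if kind == "document":
            pass  # the root imposes no condition
        elif kind == "block_quote":
            m = _QUOTE_MARKER.match(rest)
            if m is None:
                break
            rest = rest[m.end():]  # consume the continuation marker
        elif kind == "paragraph":
            if not rest.strip():
                break  # a paragraph requires a non-blank line
        else:
            break  # other block kinds are not modelled in this sketch
        matched += 1
    return matched, rest
```

When fewer blocks match than are open, the unmatched blocks are not closed at this point, because the remaining text may still turn out to be a lazy continuation line.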
Setext headers are formed when we detect that the second line of
a paragraph is a setext header line.
Reference link definitions are detected when a paragraph is closed;
the accumulated text lines are parsed to see if they begin with
one or more reference link definitions. Any remainder becomes a
normal paragraph.
We can see how this works by considering how the tree above is We can see how this works by considering how the tree above is
generated by four lines of Markdown: generated by four lines of Markdown:
 markdown  markdown
> Lorem ipsum dolor > Lorem ipsum dolor
sit amet. sit amet.
> - Qui *quodsi iracundia* > - Qui *quodsi iracundia*
> - aliquando id > - aliquando id
 
skipping to change at line 7541 skipping to change at line 8125
"Lorem ipsum dolor\nsit amet." "Lorem ipsum dolor\nsit amet."
-> list (type=bullet tight=true bullet_char=-) -> list (type=bullet tight=true bullet_char=-)
list_item list_item
paragraph paragraph
"Qui *quodsi iracundia*" "Qui *quodsi iracundia*"
-> list_item -> list_item
-> paragraph -> paragraph
"aliquando id" "aliquando id"
 
## {-} ## Phase 2: inline structure {-}
Once all of the input has been parsed, all open blocks are closed. Once all of the input has been parsed, all open blocks are closed.
We then "walk the tree," visiting every node, and parse raw We then "walk the tree," visiting every node, and parse raw
string contents of paragraphs and headers as inlines. At this string contents of paragraphs and headers as inlines. At this
point we have seen all the link reference definitions, so we can point we have seen all the link reference definitions, so we can
resolve reference links as we go. resolve reference links as we go.
 tree  tree
document document
skipping to change at line 7572 skipping to change at line 8156
str "quodsi iracundia" str "quodsi iracundia"
list_item list_item
paragraph paragraph
str "aliquando id" str "aliquando id"
 
Notice how the [line ending] in the first paragraph has Notice how the [line ending] in the first paragraph has
been parsed as a softbreak, and the asterisks in the first list item been parsed as a softbreak, and the asterisks in the first list item
have become an emph. have become an emph.
### An algorithm for parsing nested emphasis and links {-}
By far the trickiest part of inline parsing is handling emphasis,
strong emphasis, links, and images. This is done using the following
algorithm.
When we're parsing inlines and we hit either
- a run of * or _ characters, or
- a [ or ![
we insert a text node with these symbols as its literal content, and we
add a pointer to this text node to the [delimiter stack](@delimiter-stack).
The [delimiter stack] is a doubly linked list. Each
element contains a pointer to a text node, plus information about
- the type of delimiter ([, ![, *, _)
- the number of delimiters,
- whether the delimiter is "active" (all are active to start), and
- whether the delimiter is a potential opener, a potential closer,
or both (which depends on what sort of characters precede
and follow the delimiters).
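One possible shape for the delimiter stack is sketched below in Python. All names (`Delimiter`, `DelimiterStack`, `push`, `remove`) are hypothetical, and the text node is represented by a plain dictionary purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare elements by identity, not by field values
class Delimiter:
    node: dict                 # the text node holding the delimiter run
    kind: str                  # "[", "![", "*" or "_"
    count: int                 # number of delimiters in the run
    active: bool = True        # all delimiters start out active
    can_open: bool = False     # potential opener
    can_close: bool = False    # potential closer
    prev: Optional["Delimiter"] = None
    next: Optional["Delimiter"] = None


class DelimiterStack:
    """A doubly linked list whose newest element is the top."""

    def __init__(self):
        self.top = None

    def push(self, delim):
        delim.prev = self.top
        if self.top is not None:
            self.top.next = delim
        self.top = delim
        return delim

    def remove(self, delim):
        # Unlink delim from its neighbours, wherever it sits.
        if delim.prev is not None:
            delim.prev.next = delim.next
        if delim.next is not None:
            delim.next.prev = delim.prev
        if self.top is delim:
            self.top = delim.prev
        delim.prev = delim.next = None
```

A linked list rather than an array makes it cheap to remove an element from the middle, which the procedures described here do frequently.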
When we hit a ] character, we call the *look for link or image*
procedure (see below).
When we hit the end of the input, we call the *process emphasis*
procedure (see below), with stack_bottom = NULL.
#### *look for link or image* {-}
Starting at the top of the delimiter stack, we look backwards
through the stack for an opening [ or ![ delimiter.
- If we don't find one, we return a literal text node ].
- If we do find one, but it's not *active*, we remove the inactive
delimiter from the stack, and return a literal text node ].
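- If we find one and it's active, then we parse ahead to see if
we have an inline link/image, reference link/image, compact reference
link/image, or shortcut reference link/image.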
+ If we don't, then we remove the opening delimiter from the
delimiter stack and return a literal text node ].
+ If we do, then
* We return a link or image node whose children are the inlines
after the text node pointed to by the opening delimiter.
* We run *process emphasis* on these inlines, with the [ opener
as stack_bottom.
* We remove the opening delimiter.
* If we have a link (and not an image), we also set all
[ delimiters before the opening delimiter to *inactive*. (This
will prevent us from getting links within links.)
#### *process emphasis* {-}
Parameter stack_bottom sets a lower bound to how far we
descend in the [delimiter stack]. If it is NULL, we can
go all the way to the bottom. Otherwise, we stop before
visiting stack_bottom.
Let current_position point to the element on the [delimiter stack]
just above stack_bottom (or the first element if stack_bottom
is NULL).
We keep track of the openers_bottom for each delimiter
type (*, _). Initialize this to stack_bottom.
Then we repeat the following until we run out of potential
closers:
- Move current_position forward in the delimiter stack (if needed)
until we find the first potential closer with delimiter * or _.
(This will be the potential closer closest
to the beginning of the input -- the first one in parse order.)
- Now, look back in the stack (staying above stack_bottom and
the openers_bottom for this delimiter type) for the
first matching potential opener ("matching" means same delimiter).
- If one is found:
+ Figure out whether we have emphasis or strong emphasis:
if both closer and opener spans have length >= 2, we have
strong, otherwise regular.
+ Insert an emph or strong emph node accordingly, after
the text node corresponding to the opener.
+ Remove any delimiters between the opener and closer from
the delimiter stack.
+ Remove 1 (for regular emph) or 2 (for strong emph) delimiters
from the opening and closing text nodes. If they become empty
as a result, remove them and remove the corresponding element
of the delimiter stack. If the closing node is removed, reset
current_position to the next element in the stack.
- If none is found:
+ Set openers_bottom to the element before current_position.
(We know that there are no openers for this kind of closer up to and
including this point, so this puts a lower bound on future searches.)
+ If the closer at current_position is not a potential opener,
remove it from the delimiter stack (since we know it can't
be a closer either).
+ Advance current_position to the next element in the stack.
After we're done, we remove all delimiters above stack_bottom from the
delimiter stack.
End of changes. 74 change blocks.
98 lines changed or deleted 682 lines changed or added
