spec.txt | spec.txt | |||
---|---|---|---|---|
--- | --- | |||
title: CommonMark Spec | title: CommonMark Spec | |||
author: John MacFarlane | author: John MacFarlane | |||
version: 0.22 | version: 0.23 | |||
date: 2015-08-23 | date: 2015-12-29 | |||
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | |||
... | ... | |||
# Introduction | # Introduction | |||
## What is Markdown? | ## What is Markdown? | |||
Markdown is a plain text format for writing structured documents, | Markdown is a plain text format for writing structured documents, | |||
based on conventions used for indicating formatting in email and | based on conventions used for indicating formatting in email and | |||
usenet posts. It was developed in 2004 by John Gruber, who wrote | usenet posts. It was developed in 2004 by John Gruber, who wrote | |||
skipping to change at line 39 | skipping to change at line 39 | |||
1. How much indentation is needed for a sublist? The spec says that | 1. How much indentation is needed for a sublist? The spec says that | |||
continuation paragraphs need to be indented four spaces, but is | continuation paragraphs need to be indented four spaces, but is | |||
not fully explicit about sublists. It is natural to think that | not fully explicit about sublists. It is natural to think that | |||
they, too, must be indented four spaces, but `Markdown.pl` does | they, too, must be indented four spaces, but `Markdown.pl` does | |||
not require that. This is hardly a "corner case," and divergences | not require that. This is hardly a "corner case," and divergences | |||
between implementations on this issue often lead to surprises for | between implementations on this issue often lead to surprises for | |||
users in real documents. (See [this comment by John | users in real documents. (See [this comment by John | |||
Gruber](http://article.gmane.org/gmane.text.markdown.general/1997).) | Gruber](http://article.gmane.org/gmane.text.markdown.general/1997).) | |||
2. Is a blank line needed before a block quote or header? | 2. Is a blank line needed before a block quote or heading? | |||
Most implementations do not require the blank line. However, | Most implementations do not require the blank line. However, | |||
this can lead to unexpected results in hard-wrapped text, and | this can lead to unexpected results in hard-wrapped text, and | |||
also to ambiguities in parsing (note that some implementations | also to ambiguities in parsing (note that some implementations | |||
put the header inside the blockquote, while others do not). | put the heading inside the blockquote, while others do not). | |||
(John Gruber has also spoken [in favor of requiring the blank | (John Gruber has also spoken [in favor of requiring the blank | |||
lines](http://article.gmane.org/gmane.text.markdown.general/2146).) | lines](http://article.gmane.org/gmane.text.markdown.general/2146).) | |||
3. Is a blank line needed before an indented code block? | 3. Is a blank line needed before an indented code block? | |||
(`Markdown.pl` requires it, but this is not mentioned in the | (`Markdown.pl` requires it, but this is not mentioned in the | |||
documentation, and some implementations do not require it.) | documentation, and some implementations do not require it.) | |||
``` markdown | ``` markdown | |||
paragraph | paragraph | |||
code? | code? | |||
skipping to change at line 88 | skipping to change at line 88 | |||
[here](http://article.gmane.org/gmane.text.markdown.general/2554).) | [here](http://article.gmane.org/gmane.text.markdown.general/2554).) | |||
5. Can list markers be indented? Can ordered list markers be right-aligned? | 5. Can list markers be indented? Can ordered list markers be right-aligned? | |||
``` markdown | ``` markdown | |||
8. item 1 | 8. item 1 | |||
9. item 2 | 9. item 2 | |||
10. item 2a | 10. item 2a | |||
``` | ``` | |||
6. Is this one list with a horizontal rule in its second item, | 6. Is this one list with a thematic break in its second item, | |||
or two lists separated by a horizontal rule? | or two lists separated by a thematic break? | |||
``` markdown | ``` markdown | |||
* a | * a | |||
* * * * * | * * * * * | |||
* b | * b | |||
``` | ``` | |||
7. When list markers change from numbers to bullets, do we have | 7. When list markers change from numbers to bullets, do we have | |||
two lists or one? (The Markdown syntax description suggests two, | two lists or one? (The Markdown syntax description suggests two, | |||
but the perl scripts and many other implementations produce one.) | but the perl scripts and many other implementations produce one.) | |||
skipping to change at line 131 | skipping to change at line 131 | |||
``` | ``` | |||
10. What are the precedence rules between block-level and inline-level | 10. What are the precedence rules between block-level and inline-level | |||
structure? For example, how should the following be parsed? | structure? For example, how should the following be parsed? | |||
``` markdown | ``` markdown | |||
- `a long code span can contain a hyphen like this | - `a long code span can contain a hyphen like this | |||
- and it can screw things up` | - and it can screw things up` | |||
``` | ``` | |||
11. Can list items include section headers? (`Markdown.pl` does not | 11. Can list items include section headings? (`Markdown.pl` does not | |||
allow this, but does allow blockquotes to include headers.) | allow this, but does allow blockquotes to include headings.) | |||
``` markdown | ``` markdown | |||
- # Heading | - # Heading | |||
``` | ``` | |||
12. Can list items be empty? | 12. Can list items be empty? | |||
``` markdown | ``` markdown | |||
* a | * a | |||
* | * | |||
skipping to change at line 327 | skipping to change at line 327 | |||
## Insecure characters | ## Insecure characters | |||
For security reasons, the Unicode character `U+0000` must be replaced | For security reasons, the Unicode character `U+0000` must be replaced | |||
with the replacement character (`U+FFFD`). | with the replacement character (`U+FFFD`). | |||
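The replacement described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the whole input is already decoded into a string; the function name `replace_insecure_characters` is hypothetical and does not come from the spec.

```python
def replace_insecure_characters(text: str) -> str:
    """Replace every U+0000 with the replacement character U+FFFD.

    Hypothetical helper name; assumes `text` is an already-decoded string.
    """
    return text.replace("\u0000", "\ufffd")
```

A parser would typically apply this once to the input before any block parsing starts.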
# Blocks and inlines | # Blocks and inlines | |||
We can think of a document as a sequence of | We can think of a document as a sequence of | |||
[blocks](@block)---structural elements like paragraphs, block | [blocks](@block)---structural elements like paragraphs, block | |||
quotations, lists, headers, rules, and code blocks. Some blocks (like | quotations, lists, headings, rules, and code blocks. Some blocks (like | |||
block quotes and list items) contain other blocks; others (like | block quotes and list items) contain other blocks; others (like | |||
headers and paragraphs) contain [inline](@inline) content---text, | headings and paragraphs) contain [inline](@inline) content---text, | |||
links, emphasized text, images, code, and so on. | links, emphasized text, images, code, and so on. | |||
## Precedence | ## Precedence | |||
Indicators of block structure always take precedence over indicators | Indicators of block structure always take precedence over indicators | |||
of inline structure. So, for example, the following is a list with | of inline structure. So, for example, the following is a list with | |||
two items, not a list with one item containing a code span: | two items, not a list with one item containing a code span: | |||
. | . | |||
- `one | - `one | |||
- two` | - two` | |||
. | . | |||
<ul> | <ul> | |||
<li>`one</li> | <li>`one</li> | |||
<li>two`</li> | <li>two`</li> | |||
</ul> | </ul> | |||
. | . | |||
This means that parsing can proceed in two steps: first, the block | This means that parsing can proceed in two steps: first, the block | |||
structure of the document can be discerned; second, text lines inside | structure of the document can be discerned; second, text lines inside | |||
paragraphs, headers, and other block constructs can be parsed for inline | paragraphs, headings, and other block constructs can be parsed for inline | |||
structure. The second step requires information about link reference | structure. The second step requires information about link reference | |||
definitions that will be available only at the end of the first | definitions that will be available only at the end of the first | |||
step. Note that the first step requires processing lines in sequence, | step. Note that the first step requires processing lines in sequence, | |||
but the second can be parallelized, since the inline parsing of | but the second can be parallelized, since the inline parsing of | |||
one block element does not affect the inline parsing of any other. | one block element does not affect the inline parsing of any other. | |||
## Container blocks and leaf blocks | ## Container blocks and leaf blocks | |||
We can divide blocks into two types: | We can divide blocks into two types: | |||
[container block](@container-block)s, | [container block](@container-block)s, | |||
which can contain other blocks, and [leaf block](@leaf-block)s, | which can contain other blocks, and [leaf block](@leaf-block)s, | |||
which cannot. | which cannot. | |||
# Leaf blocks | # Leaf blocks | |||
This section describes the different kinds of leaf block that make up a | This section describes the different kinds of leaf block that make up a | |||
Markdown document. | Markdown document. | |||
## Horizontal rules | ## Thematic breaks | |||
A line consisting of 0-3 spaces of indentation, followed by a sequence | A line consisting of 0-3 spaces of indentation, followed by a sequence | |||
of three or more matching `-`, `_`, or `*` characters, each followed | of three or more matching `-`, `_`, or `*` characters, each followed | |||
optionally by any number of spaces, forms a | optionally by any number of spaces, forms a | |||
[horizontal rule](@horizontal-rule). | [thematic break](@thematic-break). | |||
. | . | |||
*** | *** | |||
--- | --- | |||
___ | ___ | |||
. | . | |||
<hr /> | <hr /> | |||
<hr /> | <hr /> | |||
<hr /> | <hr /> | |||
. | . | |||
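The line-level rule stated above for a thematic break (called a horizontal rule in version 0.22) can be approximated with a regular expression. The sketch below is a rough illustration, not the reference implementation: the name `is_thematic_break` is hypothetical, the line is assumed to have its line ending already removed, and the precedence rules relative to setext headings and list items are not handled here.

```python
import re

# 0-3 spaces of indentation, then three or more of the same marker
# character (-, _ or *), each optionally followed by spaces.
_THEMATIC_BREAK = re.compile(r"^ {0,3}([-_*]) *(?:\1 *){2,}$")


def is_thematic_break(line: str) -> bool:
    """Return True if `line` (without its line ending) matches the rule."""
    return _THEMATIC_BREAK.match(line) is not None
```

The backreference `\1` is what enforces that all marker characters match, so a mixed line such as `*-*` is rejected.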
skipping to change at line 492 | skipping to change at line 492 | |||
a------ | a------ | |||
---a--- | ---a--- | |||
. | . | |||
<p>_ _ _ _ a</p> | <p>_ _ _ _ a</p> | |||
<p>a------</p> | <p>a------</p> | |||
<p>---a---</p> | <p>---a---</p> | |||
. | . | |||
It is required that all of the [non-whitespace character]s be the same. | It is required that all of the [non-whitespace character]s be the same. | |||
So, this is not a horizontal rule: | So, this is not a thematic break: | |||
. | . | |||
*-* | *-* | |||
. | . | |||
<p><em>-</em></p> | <p><em>-</em></p> | |||
. | . | |||
Horizontal rules do not need blank lines before or after: | Thematic breaks do not need blank lines before or after: | |||
. | . | |||
- foo | - foo | |||
*** | *** | |||
- bar | - bar | |||
. | . | |||
<ul> | <ul> | |||
<li>foo</li> | <li>foo</li> | |||
</ul> | </ul> | |||
<hr /> | <hr /> | |||
<ul> | <ul> | |||
<li>bar</li> | <li>bar</li> | |||
</ul> | </ul> | |||
. | . | |||
Horizontal rules can interrupt a paragraph: | Thematic breaks can interrupt a paragraph: | |||
. | . | |||
Foo | Foo | |||
*** | *** | |||
bar | bar | |||
. | . | |||
<p>Foo</p> | <p>Foo</p> | |||
<hr /> | <hr /> | |||
<p>bar</p> | <p>bar</p> | |||
. | . | |||
If a line of dashes that meets the above conditions for being a | If a line of dashes that meets the above conditions for being a | |||
horizontal rule could also be interpreted as the underline of a [setext | thematic break could also be interpreted as the underline of a [setext | |||
header], the interpretation as a | heading], the interpretation as a | |||
[setext header] takes precedence. Thus, for example, | [setext heading] takes precedence. Thus, for example, | |||
this is a setext header, not a paragraph followed by a horizontal rule: | this is a setext heading, not a paragraph followed by a thematic break: | |||
. | . | |||
Foo | Foo | |||
--- | --- | |||
bar | bar | |||
. | . | |||
<h2>Foo</h2> | <h2>Foo</h2> | |||
<p>bar</p> | <p>bar</p> | |||
. | . | |||
When both a horizontal rule and a list item are possible | When both a thematic break and a list item are possible | |||
interpretations of a line, the horizontal rule takes precedence: | interpretations of a line, the thematic break takes precedence: | |||
. | . | |||
* Foo | * Foo | |||
* * * | * * * | |||
* Bar | * Bar | |||
. | . | |||
<ul> | <ul> | |||
<li>Foo</li> | <li>Foo</li> | |||
</ul> | </ul> | |||
<hr /> | <hr /> | |||
<ul> | <ul> | |||
<li>Bar</li> | <li>Bar</li> | |||
</ul> | </ul> | |||
. | . | |||
If you want a horizontal rule in a list item, use a different bullet: | If you want a thematic break in a list item, use a different bullet: | |||
. | . | |||
- Foo | - Foo | |||
- * * * | - * * * | |||
. | . | |||
<ul> | <ul> | |||
<li>Foo</li> | <li>Foo</li> | |||
<li> | <li> | |||
<hr /> | <hr /> | |||
</li> | </li> | |||
</ul> | </ul> | |||
. | . | |||
## ATX headers | ## ATX headings | |||
An [ATX header](@atx-header) | An [ATX heading](@atx-heading) | |||
consists of a string of characters, parsed as inline content, between an | consists of a string of characters, parsed as inline content, between an | |||
opening sequence of 1--6 unescaped `#` characters and an optional | opening sequence of 1--6 unescaped `#` characters and an optional | |||
closing sequence of any number of unescaped `#` characters. | closing sequence of any number of unescaped `#` characters. | |||
The opening sequence of `#` characters cannot be followed directly by a | The opening sequence of `#` characters must be followed by a | |||
[non-whitespace character]. The optional closing sequence of `#`s must be | [space] or by the end of line. The optional closing sequence of `#`s must be | |||
preceded by a [space] and may be followed by spaces only. The opening | preceded by a [space] and may be followed by spaces only. The opening | |||
`#` character may be indented 0-3 spaces. The raw contents of the | `#` character may be indented 0-3 spaces. The raw contents of the | |||
header are stripped of leading and trailing spaces before being parsed | heading are stripped of leading and trailing spaces before being parsed | |||
as inline content. The header level is equal to the number of `#` | as inline content. The heading level is equal to the number of `#` | |||
characters in the opening sequence. | characters in the opening sequence. | |||
Simple headers: | Simple headings: | |||
. | . | |||
# foo | # foo | |||
## foo | ## foo | |||
### foo | ### foo | |||
#### foo | #### foo | |||
##### foo | ##### foo | |||
###### foo | ###### foo | |||
. | . | |||
<h1>foo</h1> | <h1>foo</h1> | |||
<h2>foo</h2> | <h2>foo</h2> | |||
<h3>foo</h3> | <h3>foo</h3> | |||
<h4>foo</h4> | <h4>foo</h4> | |||
<h5>foo</h5> | <h5>foo</h5> | |||
<h6>foo</h6> | <h6>foo</h6> | |||
. | . | |||
More than six `#` characters is not a header: | More than six `#` characters is not a heading: | |||
. | . | |||
####### foo | ####### foo | |||
. | . | |||
<p>####### foo</p> | <p>####### foo</p> | |||
. | . | |||
At least one space is required between the `#` characters and the | At least one space is required between the `#` characters and the | |||
header's contents, unless the header is empty. Note that many | heading's contents, unless the heading is empty. Note that many | |||
implementations currently do not require the space. However, the | implementations currently do not require the space. However, the | |||
space was required by the | space was required by the | |||
[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py), | [original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py), | |||
and it helps prevent things like the following from being parsed as | and it helps prevent things like the following from being parsed as | |||
headers: | headings: | |||
. | . | |||
#5 bolt | #5 bolt | |||
#foobar | #hashtag | |||
. | . | |||
<p>#5 bolt</p> | <p>#5 bolt</p> | |||
<p>#foobar</p> | <p>#hashtag</p> | |||
. | . | |||
This is not a header, because the first `#` is escaped: | A tab will not work: | |||
. | ||||
#→foo | ||||
. | ||||
<p>#→foo</p> | ||||
. | ||||
This is not a heading, because the first `#` is escaped: | ||||
. | . | |||
\## foo | \## foo | |||
. | . | |||
<p>## foo</p> | <p>## foo</p> | |||
. | . | |||
Contents are parsed as inlines: | Contents are parsed as inlines: | |||
. | . | |||
skipping to change at line 714 | skipping to change at line 722 | |||
Spaces are allowed after the closing sequence: | Spaces are allowed after the closing sequence: | |||
. | . | |||
### foo ### | ### foo ### | |||
. | . | |||
<h3>foo</h3> | <h3>foo</h3> | |||
. | . | |||
A sequence of `#` characters with anything but [space]s following it | A sequence of `#` characters with anything but [space]s following it | |||
is not a closing sequence, but counts as part of the contents of the | is not a closing sequence, but counts as part of the contents of the | |||
header: | heading: | |||
. | . | |||
### foo ### b | ### foo ### b | |||
. | . | |||
<h3>foo ### b</h3> | <h3>foo ### b</h3> | |||
. | . | |||
The closing sequence must be preceded by a space: | The closing sequence must be preceded by a space: | |||
. | . | |||
skipping to change at line 743 | skipping to change at line 751 | |||
. | . | |||
### foo \### | ### foo \### | |||
## foo #\## | ## foo #\## | |||
# foo \# | # foo \# | |||
. | . | |||
<h3>foo ###</h3> | <h3>foo ###</h3> | |||
<h2>foo ###</h2> | <h2>foo ###</h2> | |||
<h1>foo #</h1> | <h1>foo #</h1> | |||
. | . | |||
ATX headers need not be separated from surrounding content by blank | ATX headings need not be separated from surrounding content by blank | |||
lines, and they can interrupt paragraphs: | lines, and they can interrupt paragraphs: | |||
. | . | |||
**** | **** | |||
## foo | ## foo | |||
**** | **** | |||
. | . | |||
<hr /> | <hr /> | |||
<h2>foo</h2> | <h2>foo</h2> | |||
<hr /> | <hr /> | |||
skipping to change at line 766 | skipping to change at line 774 | |||
. | . | |||
Foo bar | Foo bar | |||
# baz | # baz | |||
Bar foo | Bar foo | |||
. | . | |||
<p>Foo bar</p> | <p>Foo bar</p> | |||
<h1>baz</h1> | <h1>baz</h1> | |||
<p>Bar foo</p> | <p>Bar foo</p> | |||
. | . | |||
ATX headers can be empty: | ATX headings can be empty: | |||
. | . | |||
## | ## | |||
# | # | |||
### ### | ### ### | |||
. | . | |||
<h2></h2> | <h2></h2> | |||
<h1></h1> | <h1></h1> | |||
<h3></h3> | <h3></h3> | |||
. | . | |||
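The ATX heading rules of version 0.23 described in this section (opening sequence of 1-6 `#` characters followed by a space or the end of line, optional closing sequence preceded by a space, contents stripped of leading and trailing spaces) can be sketched as follows. This is a minimal illustration under stated assumptions: `parse_atx_heading` is a hypothetical name, the line is assumed to have no line ending, and the returned contents are the raw text before inline parsing (so backslash escapes are left in place).

```python
import re

# 0-3 spaces, 1-6 '#' characters, then either end of line or a space
# followed by the rest of the line. A tab after the '#' does not match.
_ATX_OPEN = re.compile(r"^ {0,3}(#{1,6})( .*)?$")
# Optional closing sequence: a space, one or more '#', then spaces only.
_ATX_CLOSE = re.compile(r" #+ *$")


def parse_atx_heading(line: str):
    """Return (level, raw_contents) for an ATX heading, else None."""
    m = _ATX_OPEN.match(line)
    if m is None:
        return None
    rest = m.group(2) or ""
    rest = _ATX_CLOSE.sub("", rest, count=1)
    return len(m.group(1)), rest.strip(" ")
```

Keeping the leading space in `rest` lets the same closing-sequence pattern handle an empty heading such as `### ###`, where the closing run directly follows the space after the opening run.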
## Setext headers | ## Setext headings | |||
A [setext header](@setext-header) | A [setext heading](@setext-heading) | |||
consists of a line of text, containing at least one [non-whitespace character], | consists of a line of text, containing at least one [non-whitespace character], | |||
with no more than 3 spaces indentation, followed by a [setext header | with no more than 3 spaces indentation, followed by a [setext heading | |||
underline]. The line of text must be | underline]. The line of text must be | |||
one that, were it not followed by the setext header underline, | one that, were it not followed by the setext heading underline, | |||
would be interpreted as part of a paragraph: it cannot be | would be interpreted as part of a paragraph: it cannot be | |||
interpretable as a [code fence], [ATX header][ATX headers], | interpretable as a [code fence], [ATX heading][ATX headings], | |||
[block quote][block quotes], [horizontal rule][horizontal rules], | [block quote][block quotes], [thematic break][thematic breaks], | |||
[list item][list items], or [HTML block][HTML blocks]. | [list item][list items], or [HTML block][HTML blocks]. | |||
A [setext header underline](@setext-header-underline) is a sequence of | A [setext heading underline](@setext-heading-underline) is a sequence of | |||
`=` characters or a sequence of `-` characters, with no more than 3 | `=` characters or a sequence of `-` characters, with no more than 3 | |||
spaces indentation and any number of trailing spaces. If a line | spaces indentation and any number of trailing spaces. If a line | |||
containing a single `-` can be interpreted as an | containing a single `-` can be interpreted as an | |||
empty [list items], it should be interpreted this way | empty [list items], it should be interpreted this way | |||
and not as a [setext header underline]. | and not as a [setext heading underline]. | |||
The header is a level 1 header if `=` characters are used in the | The heading is a level 1 heading if `=` characters are used in the | |||
[setext header underline], and a level 2 | [setext heading underline], and a level 2 | |||
header if `-` characters are used. The contents of the header are the | heading if `-` characters are used. The contents of the heading are the | |||
result of parsing the first line as Markdown inline content. | result of parsing the first line as Markdown inline content. | |||
In general, a setext header need not be preceded or followed by a | In general, a setext heading need not be preceded or followed by a | |||
blank line. However, it cannot interrupt a paragraph, so when a | blank line. However, it cannot interrupt a paragraph, so when a | |||
setext header comes after a paragraph, a blank line is needed between | setext heading comes after a paragraph, a blank line is needed between | |||
them. | them. | |||
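The underline rule above can be sketched as a small Python check. The name `setext_underline_level` is hypothetical, and the sketch only classifies a single line: it does not verify that the preceding line is paragraph text, and it does not apply the rule that a lone `-` is preferably read as an empty list item.

```python
import re

# No more than 3 spaces of indentation, a run of '=' or a run of '-',
# then any number of trailing spaces. Internal spaces are not allowed.
_SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+) *$")


def setext_underline_level(line: str):
    """Return 1 for an '=' underline, 2 for a '-' underline, else None."""
    m = _SETEXT_UNDERLINE.match(line)
    if m is None:
        return None
    return 1 if m.group(1)[0] == "=" else 2
```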
Simple examples: | Simple examples: | |||
. | . | |||
Foo *bar* | Foo *bar* | |||
========= | ========= | |||
Foo *bar* | Foo *bar* | |||
--------- | --------- | |||
skipping to change at line 833 | skipping to change at line 841 | |||
Foo | Foo | |||
------------------------- | ------------------------- | |||
Foo | Foo | |||
= | = | |||
. | . | |||
<h2>Foo</h2> | <h2>Foo</h2> | |||
<h1>Foo</h1> | <h1>Foo</h1> | |||
. | . | |||
The header content can be indented up to three spaces, and need | The heading content can be indented up to three spaces, and need | |||
not line up with the underlining: | not line up with the underlining: | |||
. | . | |||
Foo | Foo | |||
--- | --- | |||
Foo | Foo | |||
----- | ----- | |||
Foo | Foo | |||
skipping to change at line 868 | skipping to change at line 876 | |||
--- | --- | |||
. | . | |||
<pre><code>Foo | <pre><code>Foo | |||
--- | --- | |||
Foo | Foo | |||
</code></pre> | </code></pre> | |||
<hr /> | <hr /> | |||
. | . | |||
The setext header underline can be indented up to three spaces, and | The setext heading underline can be indented up to three spaces, and | |||
may have trailing spaces: | may have trailing spaces: | |||
. | . | |||
Foo | Foo | |||
---- | ---- | |||
. | . | |||
<h2>Foo</h2> | <h2>Foo</h2> | |||
. | . | |||
Four spaces is too much: | Four spaces is too much: | |||
. | . | |||
Foo | Foo | |||
--- | --- | |||
. | . | |||
<p>Foo | <p>Foo | |||
---</p> | ---</p> | |||
. | . | |||
The setext header underline cannot contain internal spaces: | The setext heading underline cannot contain internal spaces: | |||
. | . | |||
Foo | Foo | |||
= = | = = | |||
Foo | Foo | |||
--- - | --- - | |||
. | . | |||
<p>Foo | <p>Foo | |||
= =</p> | = =</p> | |||
skipping to change at line 922 | skipping to change at line 930 | |||
Nor does a backslash at the end: | Nor does a backslash at the end: | |||
. | . | |||
Foo\ | Foo\ | |||
---- | ---- | |||
. | . | |||
<h2>Foo\</h2> | <h2>Foo\</h2> | |||
. | . | |||
Since indicators of block structure take precedence over | Since indicators of block structure take precedence over | |||
indicators of inline structure, the following are setext headers: | indicators of inline structure, the following are setext headings: | |||
. | . | |||
`Foo | `Foo | |||
---- | ---- | |||
` | ` | |||
<a title="a lot | <a title="a lot | |||
--- | --- | |||
of dashes"/> | of dashes"/> | |||
. | . | |||
<h2>`Foo</h2> | <h2>`Foo</h2> | |||
<p>`</p> | <p>`</p> | |||
<h2><a title="a lot</h2> | <h2><a title="a lot</h2> | |||
<p>of dashes"/></p> | <p>of dashes"/></p> | |||
. | . | |||
The setext header underline cannot be a [lazy continuation | The setext heading underline cannot be a [lazy continuation | |||
line] in a list item or block quote: | line] in a list item or block quote: | |||
. | . | |||
> Foo | > Foo | |||
--- | --- | |||
. | . | |||
<blockquote> | <blockquote> | |||
<p>Foo</p> | <p>Foo</p> | |||
</blockquote> | </blockquote> | |||
<hr /> | <hr /> | |||
skipping to change at line 962 | skipping to change at line 970 | |||
. | . | |||
- Foo | - Foo | |||
--- | --- | |||
. | . | |||
<ul> | <ul> | |||
<li>Foo</li> | <li>Foo</li> | |||
</ul> | </ul> | |||
<hr /> | <hr /> | |||
. | . | |||
A setext header cannot interrupt a paragraph: | A setext heading cannot interrupt a paragraph: | |||
. | . | |||
Foo | Foo | |||
Bar | Bar | |||
--- | --- | |||
Foo | Foo | |||
Bar | Bar | |||
=== | === | |||
. | . | |||
skipping to change at line 997 | skipping to change at line 1005 | |||
Bar | Bar | |||
--- | --- | |||
Baz | Baz | |||
. | . | |||
<hr /> | <hr /> | |||
<h2>Foo</h2> | <h2>Foo</h2> | |||
<h2>Bar</h2> | <h2>Bar</h2> | |||
<p>Baz</p> | <p>Baz</p> | |||
. | . | |||
Setext headers cannot be empty: | Setext headings cannot be empty: | |||
. | . | |||
==== | ==== | |||
. | . | |||
<p>====</p> | <p>====</p> | |||
. | . | |||
Setext header text lines must not be interpretable as block | Setext heading text lines must not be interpretable as block | |||
constructs other than paragraphs. So, the line of dashes | constructs other than paragraphs. So, the line of dashes | |||
in these examples gets interpreted as a horizontal rule: | in these examples gets interpreted as a thematic break: | |||
. | . | |||
--- | --- | |||
--- | --- | |||
. | . | |||
<hr /> | <hr /> | |||
<hr /> | <hr /> | |||
. | . | |||
. | . | |||
skipping to change at line 1047 | skipping to change at line 1055 | |||
. | . | |||
> foo | > foo | |||
----- | ----- | |||
. | . | |||
<blockquote> | <blockquote> | |||
<p>foo</p> | <p>foo</p> | |||
</blockquote> | </blockquote> | |||
<hr /> | <hr /> | |||
. | . | |||
If you want a header with `> foo` as its literal text, you can | If you want a heading with `> foo` as its literal text, you can | |||
use backslash escapes: | use backslash escapes: | |||
. | . | |||
\> foo | \> foo | |||
------ | ------ | |||
. | . | |||
<h2>> foo</h2> | <h2>> foo</h2> | |||
. | . | |||
## Indented code blocks | ## Indented code blocks | |||
skipping to change at line 1189 | skipping to change at line 1197 | |||
. | . | |||
<pre><code>foo | <pre><code>foo | |||
</code></pre> | </code></pre> | |||
<p>bar</p> | <p>bar</p> | |||
. | . | |||
And indented code can occur immediately before and after other kinds of | And indented code can occur immediately before and after other kinds of | |||
blocks: | blocks: | |||
. | . | |||
# Header | # Heading | |||
foo | foo | |||
Header | Heading | |||
------ | ------ | |||
foo | foo | |||
---- | ---- | |||
. | . | |||
<h1>Header</h1> | <h1>Heading</h1> | |||
<pre><code>foo | <pre><code>foo | |||
</code></pre> | </code></pre> | |||
<h2>Header</h2> | <h2>Heading</h2> | |||
<pre><code>foo | <pre><code>foo | |||
</code></pre> | </code></pre> | |||
<hr /> | <hr /> | |||
. | . | |||
The first line can be indented more than four spaces: | The first line can be indented more than four spaces: | |||
. | . | |||
foo | foo | |||
bar | bar | |||
skipping to change at line 1357 | skipping to change at line 1365 | |||
aaa | aaa | |||
~~~ | ~~~ | |||
~~~~ | ~~~~ | |||
. | . | |||
<pre><code>aaa | <pre><code>aaa | |||
~~~ | ~~~ | |||
</code></pre> | </code></pre> | |||
. | . | |||
Unclosed code blocks are closed by the end of the document | Unclosed code blocks are closed by the end of the document | |||
(or the enclosing [block quote] or [list item]): | (or the enclosing [block quote][block quotes] or [list item][list items]): | |||
. | . | |||
``` | ``` | |||
. | . | |||
<pre><code></code></pre> | <pre><code></code></pre> | |||
. | . | |||
. | . | |||
````` | ````` | |||
skipping to change at line 1977 | skipping to change at line 1985 | |||
. | . | |||
<style | <style | |||
type="text/css"> | type="text/css"> | |||
h1 {color:red;} | h1 {color:red;} | |||
p {color:blue;} | p {color:blue;} | |||
</style> | </style> | |||
. | . | |||
If there is no matching end tag, the block will end at the | If there is no matching end tag, the block will end at the | |||
end of the document (or the enclosing [block quote] or | end of the document (or the enclosing [block quote][block quotes] | |||
[list item]): | or [list item][list items]): | |||
. | . | |||
<style | <style | |||
type="text/css"> | type="text/css"> | |||
foo | foo | |||
. | . | |||
<style | <style | |||
type="text/css"> | type="text/css"> | |||
skipping to change at line 2536 | skipping to change at line 2544 | |||
Foo | Foo | |||
[bar]: /baz | [bar]: /baz | |||
[bar] | [bar] | |||
. | . | |||
<p>Foo | <p>Foo | |||
[bar]: /baz</p> | [bar]: /baz</p> | |||
<p>[bar]</p> | <p>[bar]</p> | |||
. | . | |||
However, it can directly follow other block elements, such as headers | However, it can directly follow other block elements, such as headings | |||
and horizontal rules, and it need not be followed by a blank line. | and thematic breaks, and it need not be followed by a blank line. | |||
. | . | |||
# [Foo] | # [Foo] | |||
[foo]: /url | [foo]: /url | |||
> bar | > bar | |||
. | . | |||
<h1><a href="/url">Foo</a></h1> | <h1><a href="/url">Foo</a></h1> | |||
<blockquote> | <blockquote> | |||
<p>bar</p> | <p>bar</p> | |||
</blockquote> | </blockquote> | |||
skipping to change at line 3400 | skipping to change at line 3408 | |||
<pre><code>bar | <pre><code>bar | |||
</code></pre> | </code></pre> | |||
<p>baz</p> | <p>baz</p> | |||
<blockquote> | <blockquote> | |||
<p>bam</p> | <p>bam</p> | |||
</blockquote> | </blockquote> | |||
</li> | </li> | |||
</ol> | </ol> | |||
. | . | |||
A list item that contains an indented code block will preserve | ||||
empty lines within the code block verbatim, unless there are two | ||||
or more empty lines in a row (since as described above, two | ||||
blank lines end the list): | ||||
. | ||||
- Foo | ||||
bar | ||||
baz | ||||
. | ||||
<ul> | ||||
<li> | ||||
<p>Foo</p> | ||||
<pre><code>bar | ||||
baz | ||||
</code></pre> | ||||
</li> | ||||
</ul> | ||||
. | ||||
. | ||||
- Foo | ||||
bar | ||||
baz | ||||
. | ||||
<ul> | ||||
<li> | ||||
<p>Foo</p> | ||||
<pre><code>bar | ||||
</code></pre> | ||||
</li> | ||||
</ul> | ||||
<pre><code> baz | ||||
</code></pre> | ||||
. | ||||
Note that ordered list start numbers must be nine digits or less: | Note that ordered list start numbers must be nine digits or less: | |||
. | . | |||
123456789. ok | 123456789. ok | |||
. | . | |||
<ol start="123456789"> | <ol start="123456789"> | |||
<li>ok</li> | <li>ok</li> | |||
</ol> | </ol> | |||
. | . | |||
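The nine-digit limit can be expressed directly in a pattern for ordered list markers. The sketch below is an illustration only: `ordered_list_start` is a hypothetical name, and the marker shape it assumes (0-3 spaces, digits, a `.` or `)` delimiter, then a space or the end of line) is a simplification of the full list item rules, which also involve the content that follows the marker.

```python
import re

# 0-3 spaces, 1-9 digits, a '.' or ')' delimiter, then a space or end of line.
_ORDERED_MARKER = re.compile(r"^ {0,3}(\d{1,9})[.)](?: |$)")


def ordered_list_start(line: str):
    """Return the start number of an ordered list marker, else None."""
    m = _ORDERED_MARKER.match(line)
    return int(m.group(1)) if m else None
```

Because the digit run is capped at nine characters and must be followed immediately by the delimiter, a ten-digit number fails to match rather than being truncated.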
skipping to change at line 3967 | skipping to change at line 4016 | |||
<li> | <li> | |||
<ol start="2"> | <ol start="2"> | |||
<li>foo</li> | <li>foo</li> | |||
</ol> | </ol> | |||
</li> | </li> | |||
</ul> | </ul> | |||
</li> | </li> | |||
</ol> | </ol> | |||
. | . | |||
A list item can contain a header: | A list item can contain a heading: | |||
. | . | |||
- # Foo | - # Foo | |||
- Bar | - Bar | |||
--- | --- | |||
baz | baz | |||
. | . | |||
<ul> | <ul> | |||
<li> | <li> | |||
<h1>Foo</h1> | <h1>Foo</h1> | |||
skipping to change at line 4778 | skipping to change at line 4827 | |||
Escaped characters are treated as regular characters and do | Escaped characters are treated as regular characters and do | |||
not have their usual Markdown meanings: | not have their usual Markdown meanings: | |||
. | . | |||
\*not emphasized* | \*not emphasized* | |||
\<br/> not a tag | \<br/> not a tag | |||
\[not a link](/foo) | \[not a link](/foo) | |||
\`not code` | \`not code` | |||
1\. not a list | 1\. not a list | |||
\* not a list | \* not a list | |||
\# not a header | \# not a heading | |||
\[foo]: /url "not a reference" | \[foo]: /url "not a reference" | |||
. | . | |||
<p>*not emphasized* | <p>*not emphasized* | |||
<br/> not a tag | <br/> not a tag | |||
[not a link](/foo) | [not a link](/foo) | |||
`not code` | `not code` | |||
1. not a list | 1. not a list | |||
* not a list | * not a list | |||
# not a header | # not a heading | |||
[foo]: /url "not a reference"</p> | [foo]: /url "not a reference"</p> | |||
. | . | |||
If a backslash is itself escaped, the following character is not: | If a backslash is itself escaped, the following character is not: | |||
. | . | |||
\\*emphasis* | \\*emphasis* | |||
. | . | |||
<p>\<em>emphasis</em></p> | <p>\<em>emphasis</em></p> | |||
. | . | |||
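The escaping rule can be sketched as a single left-to-right substitution. This is an illustration only, assuming the escapable set is exactly the ASCII punctuation characters and ignoring contexts (code spans, code blocks, raw HTML) where escapes do not apply; `unescape_backslashes` is a hypothetical name.

```python
import re
import string

# Assumption: every ASCII punctuation character can be backslash-escaped.
_ESCAPE_RE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def unescape_backslashes(text):
    """Replace each backslash-escaped punctuation character with the bare character."""
    return _ESCAPE_RE.sub(r"\1", text)
```

Because matches are consumed left to right, `\\*` is read as an escaped backslash followed by an ordinary `*`, which is why the `*` in the example above is still free to open emphasis.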
skipping to change at line 4872 | skipping to change at line 4921 | |||
. | . | |||
``` foo\+bar | ``` foo\+bar | |||
foo | foo | |||
``` | ``` | |||
. | . | |||
<pre><code class="language-foo+bar">foo | <pre><code class="language-foo+bar">foo | |||
</code></pre> | </code></pre> | |||
. | . | |||
## Entities | ## Entity and numeric character references | |||
With the goal of making this standard as HTML-agnostic as possible, all | All valid HTML entity references and numeric character | |||
valid HTML entities (except in code blocks and code spans) | references, except those occurring in code blocks, code spans, | |||
are recognized as such and converted into Unicode characters before | and raw HTML, are recognized as such and treated as equivalent to the | |||
they are stored in the AST. This means that renderers to formats other | corresponding Unicode characters. Conforming CommonMark parsers | |||
than HTML need not be HTML-entity aware. HTML renderers may either escape | need not store information about whether a particular character | |||
Unicode characters as entities or leave them as they are. (However, | was represented in the source using a Unicode character or | |||
`"`, `&`, `<`, and `>` must always be rendered as entities.) | an entity reference. | |||
[Named entities](@name-entities) consist of `&` + any of the valid | [Entity references](@entity-references) consist of `&` + any of the valid | |||
HTML5 entity names + `;`. The | HTML5 entity names + `;`. The | |||
[following document](https://html.spec.whatwg.org/multipage/entities.json) | document <https://html.spec.whatwg.org/multipage/entities.json> | |||
is used as an authoritative source of the valid entity names and their | is used as an authoritative source for the valid entity | |||
corresponding code points. | references and their corresponding code points. | |||
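. | . | |||
&nbsp; &amp; &copy; &AElig; &Dcaron; | &nbsp; &amp; &copy; &AElig; &Dcaron; | |||
&frac34; &HilbertSpace; &DifferentialD; | &frac34; &HilbertSpace; &DifferentialD; | |||
&ClockwiseContourIntegral; &ngE; | &ClockwiseContourIntegral; &ngE; | |||
. | . | |||
<p>  &amp; © Æ Ď | <p>  &amp; © Æ Ď | |||
¾ ℋ ⅆ | ¾ ℋ ⅆ | |||
∲ ≧̸</p> | ∲ ≧̸</p> | |||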
. | . | |||
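A lookup of this kind might be sketched as follows. Python's standard `html.entities.html5` table is used here as a stand-in for the entity list referenced above; the function name is hypothetical, and context restrictions (code spans, code blocks, raw HTML) are ignored.

```python
import re
from html.entities import html5  # maps names such as "amp;" to their characters

# The semicolon is part of the captured name, so only semicolon-terminated
# entries of the table can ever be looked up.
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")


def replace_entity_references(text):
    """Replace valid entity references; leave anything unrecognized untouched."""
    def repl(match):
        return html5.get(match.group(1), match.group(0))
    return _ENTITY_RE.sub(repl, text)
```

The table also contains a few legacy names without a trailing semicolon; requiring the `;` in the pattern keeps those from being recognized.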
[Decimal entities](@decimal-entities) | [Decimal numeric character | |||
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | references](@decimal-numeric-character-references) | |||
entities need to be recognised and transformed into their corresponding | consist of `&#` + a string of 1--8 arabic digits + `;`. A | |||
Unicode code points. Invalid Unicode code points will be replaced by | numeric character reference is parsed as the corresponding | |||
Unicode character. Invalid Unicode code points will be replaced by | ||||
the "unknown code point" character (`U+FFFD`). For security reasons, | the "unknown code point" character (`U+FFFD`). For security reasons, | |||
the code point `U+0000` will also be replaced by `U+FFFD`. | the code point `U+0000` will also be replaced by `U+FFFD`. | |||
. | . | |||
&#35; &#1234; &#992; &#98765432; &#0; | &#35; &#1234; &#992; &#98765432; &#0; | |||
. | . | |||
<p># Ӓ Ϡ � �</p> | <p># Ӓ Ϡ � �</p> | |||
. | . | |||
[Hexadecimal entities](@hexadecimal-entities) consist of `&#` + either | [Hexadecimal numeric character | |||
`X` or `x` + a string of 1-8 hexadecimal digits + `;`. They will also | references](@hexadecimal-numeric-character-references) consist of `&#` + | |||
be parsed and turned into the corresponding Unicode code points in the | either `X` or `x` + a string of 1-8 hexadecimal digits + `;`. | |||
AST. | They too are parsed as the corresponding Unicode character (this | |||
time specified with a hexadecimal numeral instead of decimal). | ||||
. | . | |||
&#X22; &#XD06; &#xcab; | &#X22; &#XD06; &#xcab; | |||
. | . | |||
<p>&quot; ആ ಫ</p> | <p>&quot; ആ ಫ</p> | |||
. | . | |||
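Both numeric forms can be handled by one pattern. The sketch below uses hypothetical names; treating surrogate code points and values above `U+10FFFF` as "invalid" is an assumption about what the spec means by an invalid code point.

```python
import re

# Decimal: 1-8 digits. Hexadecimal: "X" or "x" followed by 1-8 hex digits.
_NUMERIC_RE = re.compile(r"&#(?:([0-9]{1,8})|[Xx]([0-9A-Fa-f]{1,8}));")
REPLACEMENT = "\ufffd"


def decode_code_point(cp):
    """Map a code point to a character, substituting U+FFFD where required."""
    # Assumption: surrogates and out-of-range values count as invalid.
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return REPLACEMENT
    return chr(cp)


def replace_numeric_references(text):
    """Replace decimal and hexadecimal numeric character references."""
    def repl(match):
        dec, hexa = match.group(1), match.group(2)
        cp = int(dec, 10) if dec is not None else int(hexa, 16)
        return decode_code_point(cp)
    return _NUMERIC_RE.sub(repl, text)
```

Strings such as `&#;` and `&#x;` contain no digits, so the pattern does not match and they pass through unchanged.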
Here are some nonentities: | Here are some nonentities: | |||
. | . | |||
&nbsp &x; &#; &#x; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?; | &nbsp &x; &#; &#x; | |||
&ThisIsWayTooLongToBeAnEntityIsntIt; &hi?; | ||||
. | . | |||
<p>&amp;nbsp &amp;x; &amp;#; &amp;#x; &amp;ThisIsWayTooLongToBeAnEntityIsntIt; &amp;hi?;</p> | <p>&amp;nbsp &amp;x; &amp;#; &amp;#x; | |||
&amp;ThisIsWayTooLongToBeAnEntityIsntIt; &amp;hi?;</p> | ||||
. | . | |||
Although HTML5 does accept some entities without a trailing semicolon | Although HTML5 does accept some entity references | |||
(such as `&copy`), these are not recognized as entities here, because it | without a trailing semicolon (such as `&copy`), these are not | |||
makes the grammar too ambiguous: | recognized here, because it makes the grammar too ambiguous: | |||
. | . | |||
&copy | &copy | |||
. | . | |||
<p>&amp;copy</p> | <p>&amp;copy</p> | |||
. | . | |||
Strings that are not on the list of HTML5 named entities are not | Strings that are not on the list of HTML5 named entities are not | |||
recognized as entities either: | recognized as entity references either: | |||
. | . | |||
&MadeUpEntity; | &MadeUpEntity; | |||
. | . | |||
<p>&amp;MadeUpEntity;</p> | <p>&amp;MadeUpEntity;</p> | |||
. | . | |||
Entities are recognized in any context besides code spans or | Entity and numeric character references are recognized in any | |||
code blocks, including raw HTML, URLs, [link title]s, and | context besides code spans or code blocks or raw HTML, including | |||
[fenced code block] [info string]s: | URLs, [link title]s, and [fenced code block][] [info string]s: | |||
. | . | |||
<a href="&ouml;&ouml;.html"> | <a href="&ouml;&ouml;.html"> | |||
. | . | |||
<a href="&ouml;&ouml;.html"> | <a href="&ouml;&ouml;.html"> | |||
. | . | |||
. | . | |||
[foo](/f&auml;&ouml; "f&auml;&ouml;") | [foo](/f&auml;&ouml; "f&auml;&ouml;") | |||
. | . | |||
skipping to change at line 4982 | skipping to change at line 5035 | |||
. | . | |||
``` f&ouml;&ouml; | ``` f&ouml;&ouml; | |||
foo | foo | |||
``` | ``` | |||
. | . | |||
<pre><code class="language-föö">foo | <pre><code class="language-föö">foo | |||
</code></pre> | </code></pre> | |||
. | . | |||
Entities are treated as literal text in code spans and code blocks: | Entity and numeric character references are treated as literal | |||
text in code spans and code blocks, and in raw HTML: | ||||
. | . | |||
`f&ouml;&ouml;` | `f&ouml;&ouml;` | |||
. | . | |||
<p><code>f&amp;ouml;&amp;ouml;</code></p> | <p><code>f&amp;ouml;&amp;ouml;</code></p> | |||
. | . | |||
. | . | |||
f&ouml;f&ouml; | f&ouml;f&ouml; | |||
. | . | |||
<pre><code>f&amp;ouml;f&amp;ouml; | <pre><code>f&amp;ouml;f&amp;ouml; | |||
</code></pre> | </code></pre> | |||
. | . | |||
. | ||||
<a href="f&ouml;f&ouml;"/> | ||||
. | ||||
<a href="f&ouml;f&ouml;"/> | ||||
. | ||||
## Code spans | ## Code spans | |||
A [backtick string](@backtick-string) | A [backtick string](@backtick-string) | |||
is a string of one or more backtick characters (`` ` ``) that is neither | is a string of one or more backtick characters (`` ` ``) that is neither | |||
preceded nor followed by a backtick. | preceded nor followed by a backtick. | |||
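Put differently, a backtick string is a maximal run of backticks. The sketch below finds such runs and pairs the first run that has a later run of equal length, which is how the code span definition that follows uses them. The function names are hypothetical, the returned content is raw (no stripping or whitespace normalization), and backslash-escaped backticks are not handled.

```python
import re


def backtick_strings(text):
    """Return (start, length) for every maximal run of backticks."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"`+", text)]


def first_code_span_content(text):
    """Return the raw text between the first pair of equal-length runs, or None."""
    runs = backtick_strings(text)
    for i, (start, length) in enumerate(runs):
        for close_start, close_length in runs[i + 1:]:
            if close_length == length:
                return text[start + length:close_start]
        # No closer of equal length: this run is literal text; try the next one.
    return None
```

A run with no closer of the same length is left as literal backticks, so scanning simply continues with the next run.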
A [code span](@code-span) begins with a backtick string and ends with | A [code span](@code-span) begins with a backtick string and ends with | |||
a backtick string of equal length. The contents of the code span are | a backtick string of equal length. The contents of the code span are | |||
the characters between the two backtick strings, with leading and | the characters between the two backtick strings, with leading and | |||
trailing spaces and [line ending]s removed, and | trailing spaces and [line ending]s removed, and | |||
skipping to change at line 5269 | skipping to change at line 5329 | |||
are a bit more complex than the ones given here.) | are a bit more complex than the ones given here.) | |||
The following rules define emphasis and strong emphasis: | The following rules define emphasis and strong emphasis: | |||
1. A single `*` character [can open emphasis](@can-open-emphasis) | 1. A single `*` character [can open emphasis](@can-open-emphasis) | |||
iff (if and only if) it is part of a [left-flanking delimiter run]. | iff (if and only if) it is part of a [left-flanking delimiter run]. | |||
2. A single `_` character [can open emphasis] iff | 2. A single `_` character [can open emphasis] iff | |||
it is part of a [left-flanking delimiter run] | it is part of a [left-flanking delimiter run] | |||
and either (a) not part of a [right-flanking delimiter run] | and either (a) not part of a [right-flanking delimiter run] | |||
or (b) part of a [right-flanking delimeter run] | or (b) part of a [right-flanking delimiter run] | |||
preceded by punctuation. | preceded by punctuation. | |||
3. A single `*` character [can close emphasis](@can-close-emphasis) | 3. A single `*` character [can close emphasis](@can-close-emphasis) | |||
iff it is part of a [right-flanking delimiter run]. | iff it is part of a [right-flanking delimiter run]. | |||
4. A single `_` character [can close emphasis] iff | 4. A single `_` character [can close emphasis] iff | |||
it is part of a [right-flanking delimiter run] | it is part of a [right-flanking delimiter run] | |||
and either (a) not part of a [left-flanking delimiter run] | and either (a) not part of a [left-flanking delimiter run] | |||
or (b) part of a [left-flanking delimeter run] | or (b) part of a [left-flanking delimiter run] | |||
followed by punctuation. | followed by punctuation. | |||
5. A double `**` [can open strong emphasis](@can-open-strong-emphasis) | 5. A double `**` [can open strong emphasis](@can-open-strong-emphasis) | |||
iff it is part of a [left-flanking delimiter run]. | iff it is part of a [left-flanking delimiter run]. | |||
6. A double `__` [can open strong emphasis] iff | 6. A double `__` [can open strong emphasis] iff | |||
it is part of a [left-flanking delimiter run] | it is part of a [left-flanking delimiter run] | |||
and either (a) not part of a [right-flanking delimiter run] | and either (a) not part of a [right-flanking delimiter run] | |||
or (b) part of a [right-flanking delimeter run] | or (b) part of a [right-flanking delimiter run] | |||
preceded by punctuation. | preceded by punctuation. | |||
7. A double `**` [can close strong emphasis](@can-close-strong-emphasis) | 7. A double `**` [can close strong emphasis](@can-close-strong-emphasis) | |||
iff it is part of a [right-flanking delimiter run]. | iff it is part of a [right-flanking delimiter run]. | |||
8. A double `__` [can close strong emphasis] iff | 8. A double `__` [can close strong emphasis] iff | |||
it is part of a [right-flanking delimiter run] | it is part of a [right-flanking delimiter run] | |||
and either (a) not part of a [left-flanking delimiter run] | and either (a) not part of a [left-flanking delimiter run] | |||
or (b) part of a [left-flanking delimeter run] | or (b) part of a [left-flanking delimiter run] | |||
followed by punctuation. | followed by punctuation. | |||
9. Emphasis begins with a delimiter that [can open emphasis] and ends | 9. Emphasis begins with a delimiter that [can open emphasis] and ends | |||
with a delimiter that [can close emphasis], and that uses the same | with a delimiter that [can close emphasis], and that uses the same | |||
character (`_` or `*`) as the opening delimiter. There must | character (`_` or `*`) as the opening delimiter. There must | |||
be a nonempty sequence of inlines between the open delimiter | be a nonempty sequence of inlines between the open delimiter | |||
and the closing delimiter; these form the contents of the emphasis | and the closing delimiter; these form the contents of the emphasis | |||
inline. | inline. | |||
10. Strong emphasis begins with a delimiter that | 10. Strong emphasis begins with a delimiter that | |||
skipping to change at line 6512 | skipping to change at line 6572 | |||
<p><a href="foo):">link</a></p> | <p><a href="foo):">link</a></p> | |||
. | . | |||
A link can contain fragment identifiers and queries: | A link can contain fragment identifiers and queries: | |||
. | . | |||
[link](#fragment) | [link](#fragment) | |||
[link](http://example.com#fragment) | [link](http://example.com#fragment) | |||
[link](http://example.com?foo=bar&baz#fragment) | [link](http://example.com?foo=3#frag) | |||
. | . | |||
<p><a href="#fragment">link</a></p> | <p><a href="#fragment">link</a></p> | |||
<p><a href="http://example.com#fragment">link</a></p> | <p><a href="http://example.com#fragment">link</a></p> | |||
<p><a href="http://example.com?foo=bar&baz#fragment">link</a></p> | <p><a href="http://example.com?foo=3#frag">link</a></p> | |||
. | . | |||
Note that a backslash before a non-escapable character is | Note that a backslash before a non-escapable character is | |||
just a backslash: | just a backslash: | |||
. | . | |||
[link](foo\bar) | [link](foo\bar) | |||
. | . | |||
<p><a href="foo%5Cbar">link</a></p> | <p><a href="foo%5Cbar">link</a></p> | |||
. | . | |||
URL-escaping should be left alone inside the destination, as all | URL-escaping should be left alone inside the destination, as all | |||
URL-escaped characters are also valid URL characters. HTML entities in | URL-escaped characters are also valid URL characters. Entity and | |||
the destination will be parsed into the corresponding Unicode | numerical character references in the destination will be parsed | |||
code points, as usual, and optionally URL-escaped when written as HTML. | into the corresponding Unicode code points, as usual. These may | |||
be optionally URL-escaped when written as HTML, but this spec | ||||
does not enforce any particular policy for rendering URLs in | ||||
HTML or other formats. Renderers may make different decisions | ||||
about how to escape or normalize URLs in the output. | ||||
. | . | |||
[link](foo%20b&auml;) | [link](foo%20b&auml;) | |||
. | . | |||
<p><a href="foo%20b%C3%A4">link</a></p> | <p><a href="foo%20b%C3%A4">link</a></p> | |||
. | . | |||
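One possible rendering policy, shown purely as an illustration since the spec leaves this to renderers, is to percent-encode everything except unreserved and reserved URL characters while leaving existing `%` escapes alone. The set of safe characters below is an assumption chosen to reproduce the outputs shown in these examples; `escape_destination` is a hypothetical name.

```python
from urllib.parse import quote

# Assumption: "%" stays as-is so existing escapes survive; square brackets and
# backslashes are not listed and therefore get percent-encoded.
_SAFE = "%/:?#@!$&'()*+,;=-._~"


def escape_destination(url):
    """Percent-encode a link destination for use in an HTML href attribute."""
    return quote(url, safe=_SAFE)
```

Non-ASCII characters are encoded from their UTF-8 bytes, so a destination ending in `ä` ends in `%C3%A4` after escaping.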
Note that, because titles can often be parsed as destinations, | Note that, because titles can often be parsed as destinations, | |||
if you try to omit the destination and keep the title, you'll | if you try to omit the destination and keep the title, you'll | |||
get unexpected results: | get unexpected results: | |||
skipping to change at line 6561 | skipping to change at line 6625 | |||
. | . | |||
[link](/url "title") | [link](/url "title") | |||
[link](/url 'title') | [link](/url 'title') | |||
[link](/url (title)) | [link](/url (title)) | |||
. | . | |||
<p><a href="/url" title="title">link</a> | <p><a href="/url" title="title">link</a> | |||
<a href="/url" title="title">link</a> | <a href="/url" title="title">link</a> | |||
<a href="/url" title="title">link</a></p> | <a href="/url" title="title">link</a></p> | |||
. | . | |||
Backslash escapes and entities may be used in titles: | Backslash escapes and entity and numeric character references | |||
may be used in titles: | ||||
. | . | |||
[link](/url "title \"&quot;") | [link](/url "title \"&quot;") | |||
. | . | |||
<p><a href="/url" title="title &quot;&quot;">link</a></p> | <p><a href="/url" title="title &quot;&quot;">link</a></p> | |||
. | . | |||
Nested balanced quotes are not allowed without escaping: | Nested balanced quotes are not allowed without escaping: | |||
. | . | |||
skipping to change at line 6589 | skipping to change at line 6654 | |||
. | . | |||
[link](/url 'title "and" title') | [link](/url 'title "and" title') | |||
. | . | |||
<p><a href="/url" title="title &quot;and&quot; title">link</a></p> | <p><a href="/url" title="title &quot;and&quot; title">link</a></p> | |||
. | . | |||
(Note: `Markdown.pl` did allow double quotes inside a double-quoted | (Note: `Markdown.pl` did allow double quotes inside a double-quoted | |||
title, and its test suite included a test demonstrating this. | title, and its test suite included a test demonstrating this. | |||
But it is hard to see a good rationale for the extra complexity this | But it is hard to see a good rationale for the extra complexity this | |||
brings, since there are already many ways---backslash escaping, | brings, since there are already many ways---backslash escaping, | |||
entities, or using a different quote type for the enclosing title---to | entity and numeric character references, or using a different | |||
write titles containing double quotes. `Markdown.pl`'s handling of | quote type for the enclosing title---to write titles containing | |||
titles has a number of other strange features. For example, it allows | double quotes. `Markdown.pl`'s handling of titles has a number | |||
single-quoted titles in inline links, but not reference links. And, in | of other strange features. For example, it allows single-quoted | |||
reference links but not inline links, it allows a title to begin with | titles in inline links, but not reference links. And, in | |||
`"` and end with `)`. `Markdown.pl` 1.0.1 even allows titles with no closing | reference links but not inline links, it allows a title to begin | |||
quotation mark, though 1.0.2b8 does not. It seems preferable to adopt | with `"` and end with `)`. `Markdown.pl` 1.0.1 even allows | |||
a simple, rational rule that works the same way in inline links and | titles with no closing quotation mark, though 1.0.2b8 does not. | |||
link reference definitions.) | It seems preferable to adopt a simple, rational rule that works | |||
the same way in inline links and link reference definitions.) | ||||
[Whitespace] is allowed around the destination and title: | [Whitespace] is allowed around the destination and title: | |||
. | . | |||
[link]( /uri | [link]( /uri | |||
"title" ) | "title" ) | |||
. | . | |||
<p><a href="/uri" title="title">link</a></p> | <p><a href="/uri" title="title">link</a></p> | |||
. | . | |||
skipping to change at line 6728 | skipping to change at line 6794 | |||
[foo<http://example.com/?search=](uri)> | [foo<http://example.com/?search=](uri)> | |||
. | . | |||
<p>[foo<a href="http://example.com/?search=%5D(uri)">http://example.com/?search=](uri)</a></p> | <p>[foo<a href="http://example.com/?search=%5D(uri)">http://example.com/?search=](uri)</a></p> | |||
. | . | |||
There are three kinds of [reference link](@reference-link)s: | There are three kinds of [reference link](@reference-link)s: | |||
[full](#full-reference-link), [collapsed](#collapsed-reference-link), | [full](#full-reference-link), [collapsed](#collapsed-reference-link), | |||
and [shortcut](#shortcut-reference-link). | and [shortcut](#shortcut-reference-link). | |||
A [full reference link](@full-reference-link) | A [full reference link](@full-reference-link) | |||
consists of a [link text], optional [whitespace], and a [link label] | consists of a [link text] immediately followed by a [link label] | |||
that [matches] a [link reference definition] elsewhere in the document. | that [matches] a [link reference definition] elsewhere in the document. | |||
A [link label](@link-label) begins with a left bracket (`[`) and ends | A [link label](@link-label) begins with a left bracket (`[`) and ends | |||
with the first right bracket (`]`) that is not backslash-escaped. | with the first right bracket (`]`) that is not backslash-escaped. | |||
Between these brackets there must be at least one [non-whitespace character]. | Between these brackets there must be at least one [non-whitespace character]. | |||
Unescaped square bracket characters are not allowed in | Unescaped square bracket characters are not allowed in | |||
[link label]s. A link label can have at most 999 | [link label]s. A link label can have at most 999 | |||
characters inside the square brackets. | characters inside the square brackets. | |||
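These constraints translate into a short scanner. The following is a sketch with hypothetical names; it uses Python's `str.strip` as an approximation of the spec's notion of whitespace.

```python
MAX_LABEL_CHARS = 999


def parse_link_label(text, pos=0):
    """Return the contents of a link label starting at text[pos], or None."""
    if pos >= len(text) or text[pos] != "[":
        return None
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2  # skip the backslash and the character it precedes
            continue
        if ch == "[":
            return None  # unescaped "[" is not allowed inside a label
        if ch == "]":
            content = text[pos + 1:i]
            # Assumption: str.strip() approximates the spec's whitespace set.
            if len(content) > MAX_LABEL_CHARS or not content.strip():
                return None
            return content
        i += 1
    return None  # no closing bracket found
```

Note that in `[bar\\]` the first backslash escapes the second, so the `]` is unescaped and closes the label.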
One label [matches](@matches) | One label [matches](@matches) | |||
skipping to change at line 6898 | skipping to change at line 6964 | |||
. | . | |||
[Foo | [Foo | |||
bar]: /url | bar]: /url | |||
[Baz][Foo bar] | [Baz][Foo bar] | |||
. | . | |||
<p><a href="/url">Baz</a></p> | <p><a href="/url">Baz</a></p> | |||
. | . | |||
There can be [whitespace] between the [link text] and the [link label]: | No [whitespace] is allowed between the [link text] and the | |||
[link label]: | ||||
. | . | |||
[foo] [bar] | [foo] [bar] | |||
[bar]: /url "title" | [bar]: /url "title" | |||
. | . | |||
<p><a href="/url" title="title">foo</a></p> | <p>[foo] <a href="/url" title="title">bar</a></p> | |||
. | . | |||
. | . | |||
[foo] | [foo] | |||
[bar] | [bar] | |||
[bar]: /url "title" | [bar]: /url "title" | |||
. | . | |||
<p><a href="/url" title="title">foo</a></p> | <p>[foo] | |||
<a href="/url" title="title">bar</a></p> | ||||
. | . | |||
This is a departure from John Gruber's original Markdown syntax | ||||
description, which explicitly allows whitespace between the link | ||||
text and the link label. It brings reference links in line with | ||||
[inline link]s, which (according to both original Markdown and | ||||
this spec) cannot have whitespace after the link text. More | ||||
importantly, it prevents inadvertent capture of consecutive | ||||
[shortcut reference link]s. If whitespace is allowed between the | ||||
link text and the link label, then in the following we will have | ||||
a single reference link, not two shortcut reference links, as | ||||
intended: | ||||
``` markdown | ||||
[foo] | ||||
[bar] | ||||
[foo]: /url1 | ||||
[bar]: /url2 | ||||
``` | ||||
(Note that [shortcut reference link]s were introduced by Gruber | ||||
himself in a beta version of `Markdown.pl`, but never included | ||||
in the official syntax description. Without shortcut reference | ||||
links, it is harmless to allow space between the link text and | ||||
link label; but once shortcut references are introduced, it is | ||||
too dangerous to allow this, as it frequently leads to | ||||
unintended results.) | ||||
When there are multiple matching [link reference definition]s, | When there are multiple matching [link reference definition]s, | |||
the first is used: | the first is used: | |||
. | . | |||
[foo]: /url1 | [foo]: /url1 | |||
[foo]: /url2 | [foo]: /url2 | |||
[bar][foo] | [bar][foo] | |||
. | . | |||
skipping to change at line 6980 | skipping to change at line 7075 | |||
. | . | |||
. | . | |||
[foo][ref\[] | [foo][ref\[] | |||
[ref\[]: /uri | [ref\[]: /uri | |||
. | . | |||
<p><a href="/uri">foo</a></p> | <p><a href="/uri">foo</a></p> | |||
. | . | |||
Note that in this example `]` is not backslash-escaped: | ||||
. | ||||
[bar\\]: /uri | ||||
[bar\\] | ||||
. | ||||
<p><a href="/uri">bar\</a></p> | ||||
. | ||||
A [link label] must contain at least one [non-whitespace character]: | A [link label] must contain at least one [non-whitespace character]: | |||
. | . | |||
[] | [] | |||
[]: /uri | []: /uri | |||
. | . | |||
<p>[]</p> | <p>[]</p> | |||
<p>[]: /uri</p> | <p>[]: /uri</p> | |||
. | . | |||
skipping to change at line 7007 | skipping to change at line 7112 | |||
. | . | |||
<p>[ | <p>[ | |||
]</p> | ]</p> | |||
<p>[ | <p>[ | |||
]: /uri</p> | ]: /uri</p> | |||
. | . | |||
A [collapsed reference link](@collapsed-reference-link) | A [collapsed reference link](@collapsed-reference-link) | |||
consists of a [link label] that [matches] a | consists of a [link label] that [matches] a | |||
[link reference definition] elsewhere in the | [link reference definition] elsewhere in the | |||
document, optional [whitespace], and the string `[]`. | document, followed by the string `[]`. | |||
The contents of the first link label are parsed as inlines, | The contents of the first link label are parsed as inlines, | |||
which are used as the link's text. The link's URI and title are | which are used as the link's text. The link's URI and title are | |||
provided by the matching reference link definition. Thus, | provided by the matching reference link definition. Thus, | |||
`[foo][]` is equivalent to `[foo][foo]`. | `[foo][]` is equivalent to `[foo][foo]`. | |||
. | . | |||
[foo][] | [foo][] | |||
[foo]: /url "title" | [foo]: /url "title" | |||
. | . | |||
skipping to change at line 7039 | skipping to change at line 7144 | |||
The link labels are case-insensitive: | The link labels are case-insensitive: | |||
. | . | |||
[Foo][] | [Foo][] | |||
[foo]: /url "title" | [foo]: /url "title" | |||
. | . | |||
<p><a href="/url" title="title">Foo</a></p> | <p><a href="/url" title="title">Foo</a></p> | |||
. | . | |||
As with full reference links, [whitespace] is allowed | As with full reference links, [whitespace] is not | |||
between the two sets of brackets: | allowed between the two sets of brackets: | |||
. | . | |||
[foo] | [foo] | |||
[] | [] | |||
[foo]: /url "title" | [foo]: /url "title" | |||
. | . | |||
<p><a href="/url" title="title">foo</a></p> | <p><a href="/url" title="title">foo</a> | |||
[]</p> | ||||
. | . | |||
A [shortcut reference link](@shortcut-reference-link) | A [shortcut reference link](@shortcut-reference-link) | |||
consists of a [link label] that [matches] a | consists of a [link label] that [matches] a | |||
[link reference definition] elsewhere in the | [link reference definition] elsewhere in the | |||
document and is not followed by `[]` or a link label. | document and is not followed by `[]` or a link label. | |||
The contents of the first link label are parsed as inlines, | The contents of the first link label are parsed as inlines, | |||
which are used as the link's text. The link's URI and title | which are used as the link's text. The link's URI and title | |||
are provided by the matching link reference definition. | are provided by the matching link reference definition. | |||
Thus, `[foo]` is equivalent to `[foo][]`. | Thus, `[foo]` is equivalent to `[foo][]`. | |||
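The three kinds of reference link differ only in which label is looked up. A minimal sketch, assuming the bracket contents have already been split out and that labels match after case folding and collapsing internal whitespace (all names here are hypothetical, and the link text is returned unparsed):

```python
def normalize_label(label):
    """Assumed matching rule: case-fold and collapse runs of whitespace."""
    return " ".join(label.split()).casefold()


def add_definition(definitions, label, destination):
    """Record a link reference definition; the first matching one wins."""
    definitions.setdefault(normalize_label(label), destination)


def resolve_reference_link(first, second, definitions):
    """Classify and resolve a reference link.

    `first` is the content of the first bracket pair; `second` is the content
    of an immediately following bracket pair, or None if there is none.
    Returns (kind, link_text, destination), or None if no definition matches.
    """
    if second is None:
        kind, label = "shortcut", first
    elif second == "":
        kind, label = "collapsed", first
    else:
        kind, label = "full", second
    destination = definitions.get(normalize_label(label))
    if destination is None:
        return None
    return kind, first, destination
```

In all three cases the link text comes from the first bracket pair; only the full form takes its label from the second pair.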
skipping to change at line 7268 | skipping to change at line 7374 | |||
. | . | |||
![](/url) | ![](/url) | |||
. | . | |||
<p><img src="/url" alt="" /></p> | <p><img src="/url" alt="" /></p> | |||
. | . | |||
Reference-style: | Reference-style: | |||
. | . | |||
![foo] [bar] | ![foo][bar] | |||
[bar]: /url | [bar]: /url | |||
. | . | |||
<p><img src="/url" alt="foo" /></p> | <p><img src="/url" alt="foo" /></p> | |||
. | . | |||
. | . | |||
![foo] [bar] | ![foo][bar] | |||
[BAR]: /url | [BAR]: /url | |||
. | . | |||
<p><img src="/url" alt="foo" /></p> | <p><img src="/url" alt="foo" /></p> | |||
. | . | |||
Collapsed: | Collapsed: | |||
. | . | |||
![foo][] | ![foo][] | |||
skipping to change at line 7311 | skipping to change at line 7417 | |||
The labels are case-insensitive: | The labels are case-insensitive: | |||
. | . | |||
![Foo][] | ![Foo][] | |||
[foo]: /url "title" | [foo]: /url "title" | |||
. | . | |||
<p><img src="/url" alt="Foo" title="title" /></p> | <p><img src="/url" alt="Foo" title="title" /></p> | |||
. | . | |||
As with full reference links, [whitespace] is allowed | As with reference links, [whitespace] is not allowed | |||
between the two sets of brackets: | between the two sets of brackets: | |||
. | . | |||
![foo] | ![foo] | |||
[] | [] | |||
[foo]: /url "title" | [foo]: /url "title" | |||
. | . | |||
<p><img src="/url" alt="foo" title="title" /></p> | <p><img src="/url" alt="foo" title="title" /> | |||
[]</p> | ||||
. | . | |||
Shortcut: | Shortcut: | |||
. | . | |||
![foo] | ![foo] | |||
[foo]: /url "title" | [foo]: /url "title" | |||
. | . | |||
<p><img src="/url" alt="foo" title="title" /></p> | <p><img src="/url" alt="foo" title="title" /></p> | |||
skipping to change at line 7594 | skipping to change at line 7701 | |||
A [single-quoted attribute value](@single-quoted-attribute-value) | A [single-quoted attribute value](@single-quoted-attribute-value) | |||
consists of `'`, zero or more | consists of `'`, zero or more | |||
characters not including `'`, and a final `'`. | characters not including `'`, and a final `'`. | |||
A [double-quoted attribute value](@double-quoted-attribute-value) | A [double-quoted attribute value](@double-quoted-attribute-value) | |||
consists of `"`, zero or more | consists of `"`, zero or more | |||
characters not including `"`, and a final `"`. | characters not including `"`, and a final `"`. | |||
An [open tag](@open-tag) consists of a `<` character, a [tag name], | An [open tag](@open-tag) consists of a `<` character, a [tag name], | |||
zero or more [attributes](@attribute], optional [whitespace], an optional `/` | zero or more [attribute]s, optional [whitespace], an optional `/` | |||
character, and a `>` character. | character, and a `>` character. | |||
A [closing tag](@closing-tag) consists of the string `</`, a | A [closing tag](@closing-tag) consists of the string `</`, a | |||
[tag name], optional [whitespace], and the character `>`. | [tag name], optional [whitespace], and the character `>`. | |||
An [HTML comment](@html-comment) consists of `<!--` + *text* + `-->`, | An [HTML comment](@html-comment) consists of `<!--` + *text* + `-->`, | |||
where *text* does not start with `>` or `->`, does not end with `-`, | where *text* does not start with `>` or `->`, does not end with `-`, | |||
and does not contain `--`. (See the | and does not contain `--`. (See the | |||
[HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).) | [HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).) | |||
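The comment rule can be checked directly from this definition. A minimal sketch (the function name is hypothetical) that tests a complete candidate string:

```python
def is_html_comment(candidate):
    """Check a whole string against the HTML comment definition above."""
    if len(candidate) < 7:  # "<!--" and "-->" must not overlap
        return False
    if not (candidate.startswith("<!--") and candidate.endswith("-->")):
        return False
    text = candidate[4:-3]
    if text.startswith(">") or text.startswith("->"):
        return False
    if text.endswith("-") or "--" in text:
        return False
    return True
```

The length check rejects strings such as `<!-->`, where the opening and closing delimiters would share characters.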
skipping to change at line 7662 | skipping to change at line 7769 | |||
<a foo="bar" bam = 'baz <em>"</em>' | <a foo="bar" bam = 'baz <em>"</em>' | |||
_boolean zoop:33=zoop:33 /> | _boolean zoop:33=zoop:33 /> | |||
. | . | |||
<p><a foo="bar" bam = 'baz <em>"</em>' | <p><a foo="bar" bam = 'baz <em>"</em>' | |||
_boolean zoop:33=zoop:33 /></p> | _boolean zoop:33=zoop:33 /></p> | |||
. | . | |||
Custom tag names can be used: | Custom tag names can be used: | |||
. | . | |||
<responsive-image src="foo.jpg" /> | Foo <responsive-image src="foo.jpg" /> | |||
<My-Tag> | ||||
foo | ||||
</My-Tag> | ||||
. | . | |||
<responsive-image src="foo.jpg" /> | <p>Foo <responsive-image src="foo.jpg" /></p> | |||
<My-Tag> | ||||
foo | ||||
</My-Tag> | ||||
. | . | |||
Illegal tag names, not parsed as HTML: | Illegal tag names, not parsed as HTML: | |||
. | . | |||
<33> <__> | <33> <__> | |||
. | . | |||
<p><33> <__></p> | <p><33> <__></p> | |||
. | . | |||
skipping to change at line 7719 | skipping to change at line 7819 | |||
. | . | |||
<a href='bar'title=title> | <a href='bar'title=title> | |||
. | . | |||
<p><a href='bar'title=title></p> | <p><a href='bar'title=title></p> | |||
. | . | |||
Closing tags: | Closing tags: | |||
. | . | |||
</a> | </a></foo > | |||
</foo > | ||||
. | . | |||
</a> | <p></a></foo ></p> | |||
</foo > | ||||
. | . | |||
Illegal attributes in closing tag: | Illegal attributes in closing tag: | |||
. | . | |||
</a href="foo"> | </a href="foo"> | |||
. | . | |||
<p></a href="foo"></p> | <p></a href="foo"></p> | |||
. | . | |||
skipping to change at line 7785 | skipping to change at line 7883 | |||
. | . | |||
CDATA sections: | CDATA sections: | |||
. | . | |||
foo <![CDATA[>&<]]> | foo <![CDATA[>&<]]> | |||
. | . | |||
<p>foo <![CDATA[>&<]]></p> | <p>foo <![CDATA[>&<]]></p> | |||
. | . | |||
Entities are preserved in HTML attributes: | Entity and numeric character references are preserved in HTML | |||
attributes: | ||||
. | . | |||
<a href="ö"> | foo <a href="ö"> | |||
. | . | |||
<a href="ö"> | <p>foo <a href="ö"></p> | |||
. | . | |||
Backslash escapes do not work in HTML attributes: | Backslash escapes do not work in HTML attributes: | |||
. | . | |||
<a href="\*"> | foo <a href="\*"> | |||
. | . | |||
<a href="\*"> | <p>foo <a href="\*"></p> | |||
. | . | |||
. | . | |||
<a href="\""> | <a href="\""> | |||
. | . | |||
<p><a href="""></p> | <p><a href="""></p> | |||
. | . | |||
## Hard line breaks | ## Hard line breaks | |||
skipping to change at line 8017 | skipping to change at line 8116 | |||
## Overview {-} | ## Overview {-} | |||
Parsing has two phases: | Parsing has two phases: | |||
1. In the first phase, lines of input are consumed and the block | 1. In the first phase, lines of input are consumed and the block | |||
structure of the document---its division into paragraphs, block quotes, | structure of the document---its division into paragraphs, block quotes, | |||
list items, and so on---is constructed. Text is assigned to these | list items, and so on---is constructed. Text is assigned to these | |||
blocks but not parsed. Link reference definitions are parsed and a | blocks but not parsed. Link reference definitions are parsed and a | |||
map of links is constructed. | map of links is constructed. | |||
2. In the second phase, the raw text contents of paragraphs and headers | 2. In the second phase, the raw text contents of paragraphs and headings | |||
are parsed into sequences of Markdown inline elements (strings, | are parsed into sequences of Markdown inline elements (strings, | |||
code spans, links, emphasis, and so on), using the map of link | code spans, links, emphasis, and so on), using the map of link | |||
references constructed in phase 1. | references constructed in phase 1. | |||
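
The two phases above can be illustrated with a deliberately small sketch. It is not the reference implementation: the names `phase1`, `phase2`, `REF_DEF`, and `REF_LINK` are hypothetical, and the sketch assumes that only paragraphs exist, that link reference definitions fit on one line with no title, and that the only inline construct is a full reference link `[text][label]`.

```python
import re

# Assumption: a one-line definition of the form "[label]: destination".
REF_DEF = re.compile(r'^ {0,3}\[([^\]]+)\]:\s*(\S+)\s*$')
# Assumption: only full reference links "[text][label]" are recognised.
REF_LINK = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')


def phase1(text):
    """Build block structure (paragraphs only) and the map of link references."""
    groups, current = [], []
    for line in text.splitlines() + ['']:
        if line.strip() == '':
            if current:
                groups.append(current)
                current = []
        else:
            current.append(line)

    paragraphs, refmap = [], {}
    for lines in groups:
        # Definitions are looked for at the start of each paragraph.
        while lines:
            m = REF_DEF.match(lines[0])
            if not m:
                break
            # The first definition of a label wins.
            refmap.setdefault(m.group(1).lower(), m.group(2))
            lines = lines[1:]
        if lines:
            # Text is assigned to the block but not parsed yet.
            paragraphs.append('\n'.join(lines))
    return paragraphs, refmap


def phase2(paragraphs, refmap):
    """Parse raw paragraph text as inlines, resolving links via the map."""
    def resolve(m):
        dest = refmap.get(m.group(2).lower())
        if dest is None:
            return m.group(0)  # unknown label: leave the text untouched
        return '<a href="%s">%s</a>' % (dest, m.group(1))

    return ['<p>%s</p>' % REF_LINK.sub(resolve, p) for p in paragraphs]


paragraphs, refmap = phase1("See [the spec][cm].\n\n[cm]: http://example.com\n")
html = phase2(paragraphs, refmap)
```

In this usage the definition appears after the link that uses it, which is why inline parsing is deferred until the whole map has been built.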
At each point in processing, the document is represented as a tree of | At each point in processing, the document is represented as a tree of | |||
**blocks**. The root of the tree is a `document` block. The `document` | **blocks**. The root of the tree is a `document` block. The `document` | |||
may have any number of other blocks as **children**. These children | may have any number of other blocks as **children**. These children | |||
may, in turn, have other blocks as children. The last child of a block | may, in turn, have other blocks as children. The last child of a block | |||
is normally considered **open**, meaning that subsequent lines of input | is normally considered **open**, meaning that subsequent lines of input | |||
can alter its contents. (Blocks that are not open are **closed**.) | can alter its contents. (Blocks that are not open are **closed**.) | |||
skipping to change at line 8080 | skipping to change at line 8179 | |||
2. Next, after consuming the continuation markers for existing | 2. Next, after consuming the continuation markers for existing | |||
blocks, we look for new block starts (e.g. `>` for a block quote). | blocks, we look for new block starts (e.g. `>` for a block quote). | |||
If we encounter a new block start, we close any blocks unmatched | If we encounter a new block start, we close any blocks unmatched | |||
in step 1 before creating the new block as a child of the last | in step 1 before creating the new block as a child of the last | |||
matched block. | matched block. | |||
3. Finally, we look at the remainder of the line (after block | 3. Finally, we look at the remainder of the line (after block | |||
markers like `>`, list markers, and indentation have been consumed). | markers like `>`, list markers, and indentation have been consumed). | |||
This is text that can be incorporated into the last open | This is text that can be incorporated into the last open | |||
block (a paragraph, code block, header, or raw HTML). | block (a paragraph, code block, heading, or raw HTML). | |||
Setext headers are formed when we detect that the second line of | Setext headings are formed when we detect that the second line of | |||
a paragraph is a setext header line. | a paragraph is a setext heading line. | |||
Reference link definitions are detected when a paragraph is closed; | Reference link definitions are detected when a paragraph is closed; | |||
the accumulated text lines are parsed to see if they begin with | the accumulated text lines are parsed to see if they begin with | |||
one or more reference link definitions. Any remainder becomes a | one or more reference link definitions. Any remainder becomes a | |||
normal paragraph. | normal paragraph. | |||
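
The per-line procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the spec's algorithm in full: the `Block` class and the helpers `add_line`, `strip_marker`, `last_open_child`, `close_block`, and `close_children` are hypothetical names, only block quotes and paragraphs are modelled, indentation is ignored, and lazy continuation lines are not supported.

```python
class Block:
    def __init__(self, kind):
        self.kind = kind
        self.children = []
        self.lines = []    # raw text, left unparsed during this phase
        self.open = True


def last_open_child(block):
    """Return the last child if it is still open, else None."""
    if block.children and block.children[-1].open:
        return block.children[-1]
    return None


def close_block(block):
    block.open = False
    for child in block.children:
        close_block(child)


def close_children(container):
    for child in container.children:
        close_block(child)


def strip_marker(text):
    """Drop a leading '>' and one optional following space."""
    text = text[1:]
    return text[1:] if text.startswith(' ') else text


def add_line(doc, line):
    container, rest = doc, line
    # Step 1: descend through open block quotes while the line
    # supplies their continuation marker.
    while True:
        child = last_open_child(container)
        if child is None or child.kind != 'block_quote' or not rest.startswith('>'):
            break
        container, rest = child, strip_marker(rest)
    # Step 2: look for new block starts; close unmatched blocks first.
    while rest.startswith('>'):
        close_children(container)
        quote = Block('block_quote')
        container.children.append(quote)
        container, rest = quote, strip_marker(rest)
    # Step 3: add the remaining text to the last open block.
    tip = last_open_child(container)
    if rest.strip() == '':
        close_children(container)      # a blank line ends the open blocks here
    elif tip is not None and tip.kind == 'paragraph':
        tip.lines.append(rest)
    else:
        close_children(container)
        para = Block('paragraph')
        para.lines.append(rest)
        container.children.append(para)


doc = Block('document')
for line in ['> Lorem ipsum dolor', '> sit amet.', '', 'after']:
    add_line(doc, line)
```

After these four lines the document holds a closed block quote containing one two-line paragraph, followed by an open paragraph holding `after`.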
We can see how this works by considering how the tree above is | We can see how this works by considering how the tree above is | |||
generated by four lines of Markdown: | generated by four lines of Markdown: | |||
``` markdown | ``` markdown | |||
skipping to change at line 8192 | skipping to change at line 8291 | |||
-> list_item | -> list_item | |||
-> paragraph | -> paragraph | |||
"aliquando id" | "aliquando id" | |||
``` | ``` | |||
## Phase 2: inline structure {-} | ## Phase 2: inline structure {-} | |||
Once all of the input has been parsed, all open blocks are closed. | Once all of the input has been parsed, all open blocks are closed. | |||
We then "walk the tree," visiting every node, and parse raw | We then "walk the tree," visiting every node, and parse raw | |||
string contents of paragraphs and headers as inlines. At this | string contents of paragraphs and headings as inlines. At this | |||
point we have seen all the link reference definitions, so we can | point we have seen all the link reference definitions, so we can | |||
resolve reference links as we go. | resolve reference links as we go. | |||
``` tree | ``` tree | |||
document | document | |||
block_quote | block_quote | |||
paragraph | paragraph | |||
str "Lorem ipsum dolor" | str "Lorem ipsum dolor" | |||
softbreak | softbreak | |||
str "sit amet." | str "sit amet." | |||
End of changes. 110 change blocks. | ||||
160 lines changed or deleted | 258 lines changed or added | |||