spec.txt | spec.txt | |||
---|---|---|---|---|
--- | --- | |||
title: CommonMark Spec | title: CommonMark Spec | |||
author: John MacFarlane | author: John MacFarlane | |||
version: 0.20 | version: 0.21 | |||
date: 2015-06-08 | date: 2015-07-14 | |||
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | |||
... | ... | |||
# Introduction | # Introduction | |||
## What is Markdown? | ## What is Markdown? | |||
Markdown is a plain text format for writing structured documents, | Markdown is a plain text format for writing structured documents, | |||
based on conventions used for indicating formatting in email and | based on conventions used for indicating formatting in email and | |||
usenet posts. It was developed in 2004 by John Gruber, who wrote | usenet posts. It was developed in 2004 by John Gruber, who wrote | |||
skipping to change at line 240 | skipping to change at line 240 | |||
A [unicode whitespace character](@unicode-whitespace-character) is | A [unicode whitespace character](@unicode-whitespace-character) is | |||
any code point in the unicode `Zs` class, or a tab (`U+0009`), | any code point in the unicode `Zs` class, or a tab (`U+0009`), | |||
carriage return (`U+000D`), newline (`U+000A`), or form feed | carriage return (`U+000D`), newline (`U+000A`), or form feed | |||
(`U+000C`). | (`U+000C`). | |||
[Unicode whitespace](@unicode-whitespace) is a sequence of one | [Unicode whitespace](@unicode-whitespace) is a sequence of one | |||
or more [unicode whitespace character]s. | or more [unicode whitespace character]s. | |||
A [space](@space) is `U+0020`. | A [space](@space) is `U+0020`. | |||
A [non-space character](@non-space-character) is any character | A [non-whitespace character](@non-space-character) is any character | |||
that is not a [whitespace character]. | that is not a [whitespace character]. | |||
An [ASCII punctuation character](@ascii-punctuation-character) | An [ASCII punctuation character](@ascii-punctuation-character) | |||
is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | |||
`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | |||
`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. | `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. | |||
A [punctuation character](@punctuation-character) is an [ASCII | A [punctuation character](@punctuation-character) is an [ASCII | |||
punctuation character] or anything in | punctuation character] or anything in | |||
the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | |||
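The character classes defined above can be approximated with Python's standard `unicodedata` module. This is only a sketch: the helper names `is_unicode_whitespace` and `is_punctuation` are illustrative and do not come from the spec, and each function assumes it receives a single character.

```python
import unicodedata

# The 32 ASCII punctuation characters listed in the definition above.
ASCII_PUNCTUATION = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")

# Unicode general categories that count as punctuation per the definition.
PUNCTUATION_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_unicode_whitespace(ch: str) -> bool:
    """True for any Zs code point, tab, carriage return, newline, or form feed."""
    return ch in {"\t", "\r", "\n", "\f"} or unicodedata.category(ch) == "Zs"


def is_punctuation(ch: str) -> bool:
    """True for ASCII punctuation or any code point in a P* class listed above."""
    return (
        ch in ASCII_PUNCTUATION
        or unicodedata.category(ch) in PUNCTUATION_CATEGORIES
    )
```

The explicit ASCII set is needed because characters such as `$` and `+` have Unicode categories `Sc` and `Sm`, not one of the `P*` classes.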
## Preprocessing | ## Tabs | |||
Tabs in lines are immediately expanded to [spaces][space], with a tab | Tabs in lines are not expanded to [spaces][space]. However, | |||
stop of 4 characters: | in contexts where indentation is significant for the | |||
document's structure, tabs behave as if they were replaced | ||||
by spaces with a tab stop of 4 characters. | ||||
. | . | |||
→foo→baz→→bim | →foo→baz→→bim | |||
. | . | |||
<pre><code>foo baz bim | <pre><code>foo→baz→→bim | |||
</code></pre> | ||||
. | ||||
. | ||||
→foo→baz→→bim | ||||
. | ||||
<pre><code>foo→baz→→bim | ||||
</code></pre> | </code></pre> | |||
. | . | |||
. | . | |||
a→a | a→a | |||
ὐ→a | ὐ→a | |||
. | . | |||
<pre><code>a a | <pre><code>a→a | |||
ὐ a | ὐ→a | |||
</code></pre> | </code></pre> | |||
. | . | |||
. | ||||
- foo | ||||
→bar | ||||
. | ||||
<ul> | ||||
<li> | ||||
<p>foo</p> | ||||
<p>bar</p> | ||||
</li> | ||||
</ul> | ||||
. | ||||
. | ||||
>→foo→bar | ||||
. | ||||
<blockquote> | ||||
<p>foo→bar</p> | ||||
</blockquote> | ||||
. | ||||
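One way to read the 0.21 rule is that a parser measures indentation in columns, advancing each tab to the next multiple of 4, while leaving the line's text untouched. The sketch below illustrates that reading. `TAB_STOP` and `indent_width` are hypothetical names, and this is not the reference implementation's method.

```python
TAB_STOP = 4  # tab stop width stated in the text above


def indent_width(line: str) -> int:
    """Return the column reached by the leading spaces and tabs of a line.

    Tabs are not replaced in the line itself; they only advance the
    column counter to the next multiple of TAB_STOP.
    """
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            col += TAB_STOP - (col % TAB_STOP)
        else:
            break
    return col
```

Under this reading, a line whose `indent_width` is at least 4 can open an indented code block, and any tabs after the indentation stay literal in the output, as in the first example above.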
## Insecure characters | ## Insecure characters | |||
For security reasons, the Unicode character `U+0000` must be replaced | For security reasons, the Unicode character `U+0000` must be replaced | |||
with the replacement character (`U+FFFD`). | with the replacement character (`U+FFFD`). | |||
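A minimal sketch of this replacement, assuming the input has already been decoded to a Python string. The function name `replace_insecure` is illustrative only.

```python
def replace_insecure(text: str) -> str:
    """Replace every U+0000 with the replacement character U+FFFD."""
    return text.replace("\u0000", "\ufffd")
```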
# Blocks and inlines | # Blocks and inlines | |||
We can think of a document as a sequence of | We can think of a document as a sequence of | |||
[blocks](@block)---structural elements like paragraphs, block | [blocks](@block)---structural elements like paragraphs, block | |||
quotations, lists, headers, rules, and code blocks. Some blocks (like | quotations, lists, headers, rules, and code blocks. Some blocks (like | |||
skipping to change at line 446 | skipping to change at line 476 | |||
a------ | a------ | |||
---a--- | ---a--- | |||
. | . | |||
<p>_ _ _ _ a</p> | <p>_ _ _ _ a</p> | |||
<p>a------</p> | <p>a------</p> | |||
<p>---a---</p> | <p>---a---</p> | |||
. | . | |||
It is required that all of the [non-space character]s be the same. | It is required that all of the [non-whitespace character]s be the same. | |||
So, this is not a horizontal rule: | So, this is not a horizontal rule: | |||
. | . | |||
*-* | *-* | |||
. | . | |||
<p><em>-</em></p> | <p><em>-</em></p> | |||
. | . | |||
Horizontal rules do not need blank lines before or after: | Horizontal rules do not need blank lines before or after: | |||
skipping to change at line 536 | skipping to change at line 566 | |||
</ul> | </ul> | |||
. | . | |||
## ATX headers | ## ATX headers | |||
An [ATX header](@atx-header) | An [ATX header](@atx-header) | |||
consists of a string of characters, parsed as inline content, between an | consists of a string of characters, parsed as inline content, between an | |||
opening sequence of 1--6 unescaped `#` characters and an optional | opening sequence of 1--6 unescaped `#` characters and an optional | |||
closing sequence of any number of `#` characters. The opening sequence | closing sequence of any number of `#` characters. The opening sequence | |||
of `#` characters cannot be followed directly by a | of `#` characters cannot be followed directly by a | |||
[non-space character]. The optional closing sequence of `#`s must be | [non-whitespace character]. The optional closing sequence of `#`s must be | |||
preceded by a [space] and may be followed by spaces only. The opening | preceded by a [space] and may be followed by spaces only. The opening | |||
`#` character may be indented 0-3 spaces. The raw contents of the | `#` character may be indented 0-3 spaces. The raw contents of the | |||
header are stripped of leading and trailing spaces before being parsed | header are stripped of leading and trailing spaces before being parsed | |||
as inline content. The header level is equal to the number of `#` | as inline content. The header level is equal to the number of `#` | |||
characters in the opening sequence. | characters in the opening sequence. | |||
Simple headers: | Simple headers: | |||
. | . | |||
# foo | # foo | |||
skipping to change at line 668 | skipping to change at line 698 | |||
Spaces are allowed after the closing sequence: | Spaces are allowed after the closing sequence: | |||
. | . | |||
### foo ### | ### foo ### | |||
. | . | |||
<h3>foo</h3> | <h3>foo</h3> | |||
. | . | |||
A sequence of `#` characters with a | A sequence of `#` characters with a | |||
[non-space character] following it | [non-whitespace character] following it | |||
is not a closing sequence, but counts as part of the contents of the | is not a closing sequence, but counts as part of the contents of the | |||
header: | header: | |||
. | . | |||
### foo ### b | ### foo ### b | |||
. | . | |||
<h3>foo ### b</h3> | <h3>foo ### b</h3> | |||
. | . | |||
The closing sequence must be preceded by a space: | The closing sequence must be preceded by a space: | |||
skipping to change at line 737 | skipping to change at line 767 | |||
### ### | ### ### | |||
. | . | |||
<h2></h2> | <h2></h2> | |||
<h1></h1> | <h1></h1> | |||
<h3></h3> | <h3></h3> | |||
. | . | |||
## Setext headers | ## Setext headers | |||
A [setext header](@setext-header) | A [setext header](@setext-header) | |||
consists of a line of text, containing at least one [non-space character], | consists of a line of text, containing at least one [non-whitespace character], | |||
with no more than 3 spaces indentation, followed by a [setext header | with no more than 3 spaces indentation, followed by a [setext header | |||
underline]. The line of text must be | underline]. The line of text must be | |||
one that, were it not followed by the setext header underline, | one that, were it not followed by the setext header underline, | |||
would be interpreted as part of a paragraph: it cannot be | would be interpreted as part of a paragraph: it cannot be | |||
interpretable as a [code fence], [ATX header][ATX headers], | interpretable as a [code fence], [ATX header][ATX headers], | |||
[block quote][block quotes], [horizontal rule][horizontal rules], | [block quote][block quotes], [horizontal rule][horizontal rules], | |||
[list item][list items], or [HTML block][HTML blocks]. | [list item][list items], or [HTML block][HTML blocks]. | |||
A [setext header underline](@setext-header-underline) is a sequence of | A [setext header underline](@setext-header-underline) is a sequence of | |||
`=` characters or a sequence of `-` characters, with no more than 3 | `=` characters or a sequence of `-` characters, with no more than 3 | |||
skipping to change at line 1312 | skipping to change at line 1342 | |||
~~~~ | ~~~~ | |||
aaa | aaa | |||
~~~ | ~~~ | |||
~~~~ | ~~~~ | |||
. | . | |||
<pre><code>aaa | <pre><code>aaa | |||
~~~ | ~~~ | |||
</code></pre> | </code></pre> | |||
. | . | |||
Unclosed code blocks are closed by the end of the document: | Unclosed code blocks are closed by the end of the document | |||
(or the enclosing [block quote] or [list item]): | ||||
. | . | |||
``` | ``` | |||
. | . | |||
<pre><code></code></pre> | <pre><code></code></pre> | |||
. | . | |||
. | . | |||
````` | ````` | |||
``` | ``` | |||
aaa | aaa | |||
. | . | |||
<pre><code> | <pre><code> | |||
``` | ``` | |||
aaa | aaa | |||
</code></pre> | </code></pre> | |||
. | . | |||
. | ||||
> ``` | ||||
> aaa | ||||
bbb | ||||
. | ||||
<blockquote> | ||||
<pre><code>aaa | ||||
</code></pre> | ||||
</blockquote> | ||||
<p>bbb</p> | ||||
. | ||||
A code block can have all empty lines as its content: | A code block can have all empty lines as its content: | |||
. | . | |||
``` | ``` | |||
``` | ``` | |||
. | . | |||
<pre><code> | <pre><code> | |||
</code></pre> | </code></pre> | |||
skipping to change at line 1554 | skipping to change at line 1598 | |||
``` | ``` | |||
``` aaa | ``` aaa | |||
``` | ``` | |||
. | . | |||
<pre><code>``` aaa | <pre><code>``` aaa | |||
</code></pre> | </code></pre> | |||
. | . | |||
## HTML blocks | ## HTML blocks | |||
An [HTML block tag](@html-block-tag) is | An [HTML block](@html-block) is a group of lines that is treated | |||
an [open tag] or [closing tag] whose tag | as raw HTML (and will not be escaped in HTML output). | |||
name is one of the following (case-insensitive): | ||||
`article`, `header`, `aside`, `hgroup`, `blockquote`, `hr`, `iframe`, | ||||
`body`, `li`, `map`, `button`, `object`, `canvas`, `ol`, `caption`, | ||||
`output`, `col`, `p`, `colgroup`, `pre`, `dd`, `progress`, `div`, | ||||
`section`, `dl`, `table`, `td`, `dt`, `tbody`, `embed`, `textarea`, | ||||
`fieldset`, `tfoot`, `figcaption`, `th`, `figure`, `thead`, `footer`, | ||||
`tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `video`, | ||||
`script`, `style`. | ||||
An [HTML block](@html-block) begins with an | There are seven kinds of [HTML block], which can be defined | |||
[HTML block tag], [HTML comment], [processing instruction], | by their start and end conditions. The block begins with a line that | |||
[declaration], or [CDATA section]. | meets a [start condition](@start-condition) (after up to three spaces | |||
It ends when a [blank line] or the end of the | optional indentation). It ends with the first subsequent line that | |||
input is encountered. The initial line may be indented up to three | meets a matching [end condition](@end-condition), or the last line of | |||
spaces, and subsequent lines may have any indentation. The contents | the document, if no line is encountered that meets the | |||
of the HTML block are interpreted as raw HTML, and will not be escaped | [end condition]. If the first line meets both the [start condition] | |||
in HTML output. | and the [end condition], the block will contain just that line. | |||
Some simple examples: | 1. **Start condition:** line begins with the string `<script`, | |||
`<pre`, or `<style` (case-insensitive), followed by whitespace, | ||||
the string `>`, or the end of the line.\ | ||||
**End condition:** line contains an end tag | ||||
`</script>`, `</pre>`, or `</style>` (case-insensitive; it | ||||
need not match the start tag). | ||||
2. **Start condition:** line begins with the string `<!--`.\ | ||||
**End condition:** line contains the string `-->`. | ||||
3. **Start condition:** line begins with the string `<?`.\ | ||||
**End condition:** line contains the string `?>`. | ||||
4. **Start condition:** line begins with the string `<!` | ||||
followed by an uppercase ASCII letter.\ | ||||
**End condition:** line contains the character `>`. | ||||
5. **Start condition:** line begins with the string | ||||
`<![CDATA[`.\ | ||||
**End condition:** line contains the string `]]>`. | ||||
6. **Start condition:** line begins with the string `<` or `</` | ||||
followed by one of the strings (case-insensitive) `address`, | ||||
`article`, `aside`, `base`, `basefont`, `blockquote`, `body`, | ||||
`caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`, | ||||
`dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`, | ||||
`footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`, | ||||
`html`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, `meta`, | ||||
`nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `pre`, | ||||
`section`, `source`, `summary`, `table`, `tbody`, `td`, | ||||
`tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed | ||||
by [whitespace], the end of the line, the string `>`, or | ||||
the string `/>`.\ | ||||
**End condition:** line is followed by a [blank line]. | ||||
7. **Start condition:** line begins with an [open tag] | ||||
(with any [tag name]) followed only by [whitespace] or the end | ||||
of the line.\ | ||||
**End condition:** line is followed by a [blank line]. | ||||
All types of [HTML blocks] except type 7 may interrupt | ||||
a paragraph. Blocks of type 7 may not interrupt a paragraph. | ||||
(This restriction is intended to prevent unwanted interpretation | ||||
of long tags inside a wrapped paragraph as starting HTML blocks.) | ||||
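The seven start conditions can be sketched as an ordered list of regular expressions that are tried in turn. This is an illustration under stated assumptions, not the reference parser. The names `html_block_start_type`, `START_CONDITIONS`, and `OPEN_TAG` are hypothetical. `OPEN_TAG` is a simplified approximation of the spec's [open tag] grammar, which is defined elsewhere. Only spaces are counted as indentation, and the rule that type 7 cannot interrupt a paragraph is not modeled.

```python
import re

# Tag names from start condition 6, as listed in the text above.
BLOCK_TAGS = (
    "address|article|aside|base|basefont|blockquote|body|caption|center|"
    "col|colgroup|dd|details|dialog|dir|div|dl|dt|fieldset|figcaption|"
    "figure|footer|form|frame|frameset|h1|head|header|hr|html|legend|li|"
    "link|main|menu|menuitem|meta|nav|noframes|ol|optgroup|option|p|param|"
    "pre|section|source|summary|table|tbody|td|tfoot|th|thead|title|tr|"
    "track|ul"
)

# Simplified open tag (an assumption): name, optional attributes, optional
# "/", then ">", followed only by whitespace up to the end of the line.
OPEN_TAG = re.compile(
    r"""<[A-Za-z][A-Za-z0-9]*                       # tag name
        (?:\s+[A-Za-z_:][A-Za-z0-9_.:-]*            # attribute name
           (?:\s*=\s*(?:[^\s"'=<>`]+|'[^']*'|"[^"]*"))?   # optional value
        )*
        \s*/?>\s*$""",
    re.VERBOSE,
)

# (type, pattern) pairs, checked in the order the conditions are numbered.
START_CONDITIONS = [
    (1, re.compile(r"<(?:script|pre|style)(?:\s|>|$)", re.IGNORECASE)),
    (2, re.compile(r"<!--")),
    (3, re.compile(r"<\?")),
    (4, re.compile(r"<![A-Z]")),
    (5, re.compile(r"<!\[CDATA\[")),
    (6, re.compile(r"</?(?:" + BLOCK_TAGS + r")(?:\s|$|>|/>)", re.IGNORECASE)),
    (7, OPEN_TAG),
]


def html_block_start_type(line: str):
    """Return the HTML block type (1-7) that a line would start, or None."""
    stripped = line.lstrip(" ")
    if len(line) - len(stripped) > 3:
        return None  # four or more spaces: not an HTML block start
    stripped = stripped.rstrip("\n")
    for kind, pattern in START_CONDITIONS:
        if pattern.match(stripped):
            return kind
    return None
```

For example, `html_block_start_type("<del>")` yields 7, while `html_block_start_type("<del>*foo*</del>")` yields `None`, because a type 7 tag must be alone on its line.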
Some simple examples follow. Here are some basic HTML blocks | ||||
of type 6: | ||||
. | . | |||
<table> | <table> | |||
<tr> | <tr> | |||
<td> | <td> | |||
hi | hi | |||
</td> | </td> | |||
</tr> | </tr> | |||
</table> | </table> | |||
skipping to change at line 1607 | skipping to change at line 1689 | |||
. | . | |||
<div> | <div> | |||
*hello* | *hello* | |||
<foo><a> | <foo><a> | |||
. | . | |||
<div> | <div> | |||
*hello* | *hello* | |||
<foo><a> | <foo><a> | |||
. | . | |||
A block can also start with a closing tag: | ||||
. | ||||
</div> | ||||
*foo* | ||||
. | ||||
</div> | ||||
*foo* | ||||
. | ||||
Here we have two HTML blocks with a Markdown paragraph between them: | Here we have two HTML blocks with a Markdown paragraph between them: | |||
. | . | |||
<DIV CLASS="foo"> | <DIV CLASS="foo"> | |||
*Markdown* | *Markdown* | |||
</DIV> | </DIV> | |||
. | . | |||
<DIV CLASS="foo"> | <DIV CLASS="foo"> | |||
<p><em>Markdown</em></p> | <p><em>Markdown</em></p> | |||
</DIV> | </DIV> | |||
. | . | |||
In the following example, what looks like a Markdown code block | The tag on the first line can be partial, as long | |||
as it is split where there would be whitespace: | ||||
. | ||||
<div id="foo" | ||||
class="bar"> | ||||
</div> | ||||
. | ||||
<div id="foo" | ||||
class="bar"> | ||||
</div> | ||||
. | ||||
. | ||||
<div id="foo" class="bar | ||||
baz"> | ||||
</div> | ||||
. | ||||
<div id="foo" class="bar | ||||
baz"> | ||||
</div> | ||||
. | ||||
An open tag need not be closed: | ||||
. | ||||
<div> | ||||
*foo* | ||||
*bar* | ||||
. | ||||
<div> | ||||
*foo* | ||||
<p><em>bar</em></p> | ||||
. | ||||
A partial tag need not even be completed (garbage | ||||
in, garbage out): | ||||
. | ||||
<div id="foo" | ||||
*hi* | ||||
. | ||||
<div id="foo" | ||||
*hi* | ||||
. | ||||
. | ||||
<div class | ||||
foo | ||||
. | ||||
<div class | ||||
foo | ||||
. | ||||
The initial tag doesn't even need to be a valid | ||||
tag, as long as it starts like one: | ||||
. | ||||
<div *???-&&&-<--- | ||||
*foo* | ||||
. | ||||
<div *???-&&&-<--- | ||||
*foo* | ||||
. | ||||
In type 6 blocks, the initial tag need not be on a line by | ||||
itself: | ||||
. | ||||
<div><a href="bar">*foo*</a></div> | ||||
. | ||||
<div><a href="bar">*foo*</a></div> | ||||
. | ||||
. | ||||
<table><tr><td> | ||||
foo | ||||
</td></tr></table> | ||||
. | ||||
<table><tr><td> | ||||
foo | ||||
</td></tr></table> | ||||
. | ||||
Everything until the next blank line or end of document | ||||
gets included in the HTML block. So, in the following | ||||
example, what looks like a Markdown code block | ||||
is actually part of the HTML block, which continues until a blank | is actually part of the HTML block, which continues until a blank | |||
line or the end of the document is reached: | line or the end of the document is reached: | |||
. | . | |||
<div></div> | <div></div> | |||
``` c | ``` c | |||
int x = 33; | int x = 33; | |||
``` | ``` | |||
. | . | |||
<div></div> | <div></div> | |||
``` c | ``` c | |||
int x = 33; | int x = 33; | |||
``` | ``` | |||
. | . | |||
A comment: | To start an [HTML block] with a tag that is *not* in the | |||
list of block-level tags in (6), you must put the tag by | ||||
itself on the first line (and it must be complete): | ||||
. | ||||
<a href="foo"> | ||||
*bar* | ||||
</a> | ||||
. | ||||
<a href="foo"> | ||||
*bar* | ||||
</a> | ||||
. | ||||
In type 7 blocks, the [tag name] can be anything: | ||||
. | ||||
<Warning> | ||||
*bar* | ||||
</Warning> | ||||
. | ||||
<Warning> | ||||
*bar* | ||||
</Warning> | ||||
. | ||||
. | ||||
<i class="foo"> | ||||
*bar* | ||||
</i> | ||||
. | ||||
<i class="foo"> | ||||
*bar* | ||||
</i> | ||||
. | ||||
These rules are designed to allow us to work with tags that | ||||
can function as either block-level or inline-level tags. | ||||
The `<del>` tag is a nice example. We can surround content with | ||||
`<del>` tags in three different ways. In this case, we get a raw | ||||
HTML block, because the `<del>` tag is on a line by itself: | ||||
. | ||||
<del> | ||||
*foo* | ||||
</del> | ||||
. | ||||
<del> | ||||
*foo* | ||||
</del> | ||||
. | ||||
In this case, we get a raw HTML block that just includes | ||||
the `<del>` tag (because it ends with the following blank | ||||
line). So the contents get interpreted as CommonMark: | ||||
. | ||||
<del> | ||||
*foo* | ||||
</del> | ||||
. | ||||
<del> | ||||
<p><em>foo</em></p> | ||||
</del> | ||||
. | ||||
Finally, in this case, the `<del>` tags are interpreted | ||||
as [raw HTML] *inside* the CommonMark paragraph. (Because | ||||
the tag is not on a line by itself, we get inline HTML | ||||
rather than an [HTML block].) | ||||
. | ||||
<del>*foo*</del> | ||||
. | ||||
<p><del><em>foo</em></del></p> | ||||
. | ||||
HTML tags designed to contain literal content | ||||
(`script`, `style`, `pre`), comments, processing instructions, | ||||
and declarations are treated somewhat differently. | ||||
Instead of ending at the first blank line, these blocks | ||||
end at the first line containing a corresponding end tag. | ||||
As a result, these blocks can contain blank lines: | ||||
A pre tag (type 1): | ||||
. | ||||
<pre language="haskell"><code> | ||||
import Text.HTML.TagSoup | ||||
main :: IO () | ||||
main = print $ parseTags tags | ||||
</code></pre> | ||||
. | ||||
<pre language="haskell"><code> | ||||
import Text.HTML.TagSoup | ||||
main :: IO () | ||||
main = print $ parseTags tags | ||||
</code></pre> | ||||
. | ||||
A script tag (type 1): | ||||
. | ||||
<script type="text/javascript"> | ||||
// JavaScript example | ||||
document.getElementById("demo").innerHTML = "Hello JavaScript!"; | ||||
</script> | ||||
. | ||||
<script type="text/javascript"> | ||||
// JavaScript example | ||||
document.getElementById("demo").innerHTML = "Hello JavaScript!"; | ||||
</script> | ||||
. | ||||
A style tag (type 1): | ||||
. | ||||
<style | ||||
type="text/css"> | ||||
h1 {color:red;} | ||||
p {color:blue;} | ||||
</style> | ||||
. | ||||
<style | ||||
type="text/css"> | ||||
h1 {color:red;} | ||||
p {color:blue;} | ||||
</style> | ||||
. | ||||
If there is no matching end tag, the block will end at the | ||||
end of the document (or the enclosing [block quote] or | ||||
[list item]): | ||||
. | ||||
<style | ||||
type="text/css"> | ||||
foo | ||||
. | ||||
<style | ||||
type="text/css"> | ||||
foo | ||||
. | ||||
. | ||||
> <div> | ||||
> foo | ||||
bar | ||||
. | ||||
<blockquote> | ||||
<div> | ||||
foo | ||||
</blockquote> | ||||
<p>bar</p> | ||||
. | ||||
. | ||||
- <div> | ||||
- foo | ||||
. | ||||
<ul> | ||||
<li> | ||||
<div> | ||||
</li> | ||||
<li>foo</li> | ||||
</ul> | ||||
. | ||||
The end tag can occur on the same line as the start tag: | ||||
. | ||||
<style>p{color:red;}</style> | ||||
*foo* | ||||
. | ||||
<style>p{color:red;}</style> | ||||
<p><em>foo</em></p> | ||||
. | ||||
. | ||||
<!-- foo -->*bar* | ||||
*baz* | ||||
. | ||||
<!-- foo -->*bar* | ||||
<p><em>baz</em></p> | ||||
. | ||||
Note that anything on the last line after the | ||||
end tag will be included in the [HTML block]: | ||||
. | ||||
<script> | ||||
foo | ||||
</script>1. *bar* | ||||
. | ||||
<script> | ||||
foo | ||||
</script>1. *bar* | ||||
. | ||||
A comment (type 2): | ||||
. | . | |||
<!-- Foo | <!-- Foo | |||
bar | bar | |||
baz --> | baz --> | |||
. | . | |||
<!-- Foo | <!-- Foo | |||
bar | bar | |||
baz --> | baz --> | |||
. | . | |||
A processing instruction: | A processing instruction (type 3): | |||
. | . | |||
<?php | <?php | |||
echo '>'; | echo '>'; | |||
?> | ?> | |||
. | . | |||
<?php | <?php | |||
echo '>'; | echo '>'; | |||
?> | ?> | |||
. | . | |||
CDATA: | A declaration (type 4): | |||
. | ||||
<!DOCTYPE html> | ||||
. | ||||
<!DOCTYPE html> | ||||
. | ||||
CDATA (type 5): | ||||
. | . | |||
<![CDATA[ | <![CDATA[ | |||
function matchwo(a,b) | function matchwo(a,b) | |||
{ | { | |||
if (a < b && a < 0) then | if (a < b && a < 0) then { | |||
{ | return 1; | |||
return 1; | ||||
} | } else { | |||
else | ||||
{ | return 0; | |||
return 0; | ||||
} | } | |||
} | } | |||
]]> | ]]> | |||
. | . | |||
<![CDATA[ | <![CDATA[ | |||
function matchwo(a,b) | function matchwo(a,b) | |||
{ | { | |||
if (a < b && a < 0) then | if (a < b && a < 0) then { | |||
{ | return 1; | |||
return 1; | ||||
} | } else { | |||
else | ||||
{ | return 0; | |||
return 0; | ||||
} | } | |||
} | } | |||
]]> | ]]> | |||
. | . | |||
The opening tag can be indented 1-3 spaces, but not 4: | The opening tag can be indented 1-3 spaces, but not 4: | |||
. | . | |||
<!-- foo --> | <!-- foo --> | |||
<!-- foo --> | <!-- foo --> | |||
. | . | |||
<!-- foo --> | <!-- foo --> | |||
<pre><code><!-- foo --> | <pre><code><!-- foo --> | |||
</code></pre> | </code></pre> | |||
. | . | |||
An HTML block can interrupt a paragraph, and need not be preceded | . | |||
by a blank line. | <div> | |||
<div> | ||||
. | ||||
<div> | ||||
<pre><code><div> | ||||
</code></pre> | ||||
. | ||||
An HTML block of types 1--6 can interrupt a paragraph, and need not be | ||||
preceded by a blank line. | ||||
. | . | |||
Foo | Foo | |||
<div> | <div> | |||
bar | bar | |||
</div> | </div> | |||
. | . | |||
<p>Foo</p> | <p>Foo</p> | |||
<div> | <div> | |||
bar | bar | |||
</div> | </div> | |||
. | . | |||
However, a following blank line is always needed, except at the end of | However, a following blank line is needed, except at the end of | |||
a document: | a document, and except for blocks of types 1--5, above: | |||
. | . | |||
<div> | <div> | |||
bar | bar | |||
</div> | </div> | |||
*foo* | *foo* | |||
. | . | |||
<div> | <div> | |||
bar | bar | |||
</div> | </div> | |||
*foo* | *foo* | |||
. | . | |||
An incomplete HTML block tag may also start an HTML block: | HTML blocks of type 7 cannot interrupt a paragraph: | |||
. | . | |||
<div class | Foo | |||
foo | <a href="bar"> | |||
baz | ||||
. | . | |||
<div class | <p>Foo | |||
foo | <a href="bar"> | |||
baz</p> | ||||
. | . | |||
This rule differs from John Gruber's original Markdown syntax | This rule differs from John Gruber's original Markdown syntax | |||
specification, which says: | specification, which says: | |||
> The only restrictions are that block-level HTML elements — | > The only restrictions are that block-level HTML elements — | |||
> e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from | > e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from | |||
> surrounding content by blank lines, and the start and end tags of the | > surrounding content by blank lines, and the start and end tags of the | |||
> block should not be indented with tabs or spaces. | > block should not be indented with tabs or spaces. | |||
In some ways Gruber's rule is more restrictive than the one given | In some ways Gruber's rule is more restrictive than the one given | |||
here: | here: | |||
- It requires that an HTML block be preceded by a blank line. | - It requires that an HTML block be preceded by a blank line. | |||
- It does not allow the start tag to be indented. | - It does not allow the start tag to be indented. | |||
- It requires a matching end tag, which it also does not allow to | - It requires a matching end tag, which it also does not allow to | |||
be indented. | be indented. | |||
Indeed, most Markdown implementations, including some of Gruber's | Most Markdown implementations (including some of Gruber's own) do not | |||
own perl implementations, do not impose these restrictions. | respect all of these restrictions. | |||
There is one respect, however, in which Gruber's rule is more liberal | There is one respect, however, in which Gruber's rule is more liberal | |||
than the one given here, since it allows blank lines to occur inside | than the one given here, since it allows blank lines to occur inside | |||
an HTML block. There are two reasons for disallowing them here. | an HTML block. There are two reasons for disallowing them here. | |||
First, it removes the need to parse balanced tags, which is | First, it removes the need to parse balanced tags, which is | |||
expensive and can require backtracking from the end of the document | expensive and can require backtracking from the end of the document | |||
if no matching end tag is found. Second, it provides a very simple | if no matching end tag is found. Second, it provides a very simple | |||
and flexible way of including Markdown content inside HTML tags: | and flexible way of including Markdown content inside HTML tags: | |||
simply separate the Markdown from the HTML using blank lines: | simply separate the Markdown from the HTML using blank lines: | |||
Compare: | ||||
. | . | |||
<div> | <div> | |||
*Emphasized* text. | *Emphasized* text. | |||
</div> | </div> | |||
. | . | |||
<div> | <div> | |||
<p><em>Emphasized</em> text.</p> | <p><em>Emphasized</em> text.</p> | |||
</div> | </div> | |||
. | . | |||
Compare: | ||||
. | . | |||
<div> | <div> | |||
*Emphasized* text. | *Emphasized* text. | |||
</div> | </div> | |||
. | . | |||
<div> | <div> | |||
*Emphasized* text. | *Emphasized* text. | |||
</div> | </div> | |||
. | . | |||
skipping to change at line 1830 | skipping to change at line 2242 | |||
. | . | |||
<table> | <table> | |||
<tr> | <tr> | |||
<td> | <td> | |||
Hi | Hi | |||
</td> | </td> | |||
</tr> | </tr> | |||
</table> | </table> | |||
. | . | |||
Moreover, blank lines are usually not necessary and can be | There are problems, however, if the inner tags are indented | |||
deleted. The exception is inside `<pre>` tags; here, one can | *and* separated by spaces, as then they will be interpreted as | |||
replace the blank lines with ` ` entities. | an indented code block: | |||
So there is no important loss of expressive power with the new rule. | . | |||
<table> | ||||
<tr> | ||||
<td> | ||||
Hi | ||||
</td> | ||||
</tr> | ||||
</table> | ||||
. | ||||
<table> | ||||
<tr> | ||||
<pre><code><td> | ||||
Hi | ||||
</td> | ||||
</code></pre> | ||||
</tr> | ||||
</table> | ||||
. | ||||
Fortunately, blank lines are usually not necessary and can be | ||||
deleted. The exception is inside `<pre>` tags, but as described | ||||
above, raw HTML blocks starting with `<pre>` *can* contain blank | ||||
lines. | ||||
## Link reference definitions | ## Link reference definitions | |||
A [link reference definition](@link-reference-definition) | A [link reference definition](@link-reference-definition) | |||
consists of a [link label], indented up to three spaces, followed | consists of a [link label], indented up to three spaces, followed | |||
by a colon (`:`), optional [whitespace] (including up to one | by a colon (`:`), optional [whitespace] (including up to one | |||
[line ending]), a [link destination], | [line ending]), a [link destination], | |||
optional [whitespace] (including up to one | optional [whitespace] (including up to one | |||
[line ending]), and an optional [link | [line ending]), and an optional [link | |||
title], which if it is present must be separated | title], which if it is present must be separated | |||
from the [link destination] by [whitespace]. | from the [link destination] by [whitespace]. | |||
No further [non-space character]s may occur on the line. | No further [non-whitespace character]s may occur on the line. | |||
A [link reference definition] | A [link reference definition] | |||
does not correspond to a structural element of a document. Instead, it | does not correspond to a structural element of a document. Instead, it | |||
defines a label which can be used in [reference link]s | defines a label which can be used in [reference link]s | |||
and reference-style [images] elsewhere in the document. [Link | and reference-style [images] elsewhere in the document. [Link | |||
reference definitions] can come either before or after the links that use | reference definitions] can come either before or after the links that use | |||
them. | them. | |||
. | . | |||
[foo]: /url "title" | [foo]: /url "title" | |||
skipping to change at line 1945 | skipping to change at line 2383 | |||
. | . | |||
[foo]: | [foo]: | |||
[foo] | [foo] | |||
. | . | |||
<p>[foo]:</p> | <p>[foo]:</p> | |||
<p>[foo]</p> | <p>[foo]</p> | |||
. | . | |||
Both title and destination can contain backslash escapes | ||||
and literal backslashes: | ||||
. | ||||
[foo]: /url\bar\*baz "foo\"bar\baz" | ||||
[foo] | ||||
. | ||||
<p><a href="/url%5Cbar*baz" title="foo"bar\baz">foo</a></p> | ||||
. | ||||
A link can come before its corresponding definition: | A link can come before its corresponding definition: | |||
. | . | |||
[foo] | [foo] | |||
[foo]: url | [foo]: url | |||
. | . | |||
<p><a href="url">foo</a></p> | <p><a href="url">foo</a></p> | |||
. | . | |||
skipping to change at line 2006 | skipping to change at line 2455 | |||
. | . | |||
[ | [ | |||
foo | foo | |||
]: /url | ]: /url | |||
bar | bar | |||
. | . | |||
<p>bar</p> | <p>bar</p> | |||
. | . | |||
This is not a link reference definition, because there are | This is not a link reference definition, because there are | |||
[non-space character]s after the title: | [non-whitespace character]s after the title: | |||
. | . | |||
[foo]: /url "title" ok | [foo]: /url "title" ok | |||
. | . | |||
<p>[foo]: /url "title" ok</p> | <p>[foo]: /url "title" ok</p> | |||
. | . | |||
This is a link reference definition, but it has no title: | ||||
. | ||||
[foo]: /url | ||||
"title" ok | ||||
. | ||||
<p>"title" ok</p> | ||||
. | ||||
This is not a link reference definition, because it is indented | This is not a link reference definition, because it is indented | |||
four spaces: | four spaces: | |||
. | . | |||
[foo]: /url "title" | [foo]: /url "title" | |||
[foo] | [foo] | |||
. | . | |||
<pre><code>[foo]: /url "title" | <pre><code>[foo]: /url "title" | |||
</code></pre> | </code></pre> | |||
skipping to change at line 2240 | skipping to change at line 2698 | |||
form of the definition is: | form of the definition is: | |||
> If X is a sequence of blocks, then the result of | > If X is a sequence of blocks, then the result of | |||
> transforming X in such-and-such a way is a container of type Y | > transforming X in such-and-such a way is a container of type Y | |||
> with these blocks as its content. | > with these blocks as its content. | |||
So, we explain what counts as a block quote or list item by explaining | So, we explain what counts as a block quote or list item by explaining | |||
how these can be *generated* from their contents. This should suffice | how these can be *generated* from their contents. This should suffice | |||
to define the syntax, although it does not give a recipe for *parsing* | to define the syntax, although it does not give a recipe for *parsing* | |||
these constructions. (A recipe is provided below in the section entitled | these constructions. (A recipe is provided below in the section entitled | |||
[A parsing strategy](#appendix-a-a-parsing-strategy).) | [A parsing strategy](#appendix-a-parsing-strategy).) | |||
## Block quotes | ## Block quotes | |||
A [block quote marker](@block-quote-marker) | A [block quote marker](@block-quote-marker) | |||
consists of 0-3 spaces of initial indent, plus (a) the character `>` together | consists of 0-3 spaces of initial indent, plus (a) the character `>` together | |||
with a following space, or (b) a single character `>` not followed by a space. | with a following space, or (b) a single character `>` not followed by a space. | |||
The following rules define [block quotes]: | The following rules define [block quotes]: | |||
1. **Basic case.** If a string of lines *Ls* constitute a sequence | 1. **Basic case.** If a string of lines *Ls* constitute a sequence | |||
of blocks *Bs*, then the result of prepending a [block quote | of blocks *Bs*, then the result of prepending a [block quote | |||
marker] to the beginning of each line in *Ls* | marker] to the beginning of each line in *Ls* | |||
is a [block quote](#block-quotes) containing *Bs*. | is a [block quote](#block-quotes) containing *Bs*. | |||
2. **Laziness.** If a string of lines *Ls* constitute a [block | 2. **Laziness.** If a string of lines *Ls* constitute a [block | |||
quote](#block-quotes) with contents *Bs*, then the result of deleting | quote](#block-quotes) with contents *Bs*, then the result of deleting | |||
the initial [block quote marker] from one or | the initial [block quote marker] from one or | |||
more lines in which the next [non-space character] after the [block | more lines in which the next [non-whitespace character] after the [block | |||
quote marker] is [paragraph continuation | quote marker] is [paragraph continuation | |||
text] is a block quote with *Bs* as its content. | text] is a block quote with *Bs* as its content. | |||
[Paragraph continuation text](@paragraph-continuation-text) is text | [Paragraph continuation text](@paragraph-continuation-text) is text | |||
that will be parsed as part of the content of a paragraph, but does | that will be parsed as part of the content of a paragraph, but does | |||
not occur at the beginning of the paragraph. | not occur at the beginning of the paragraph. | |||
3. **Consecutiveness.** A document cannot contain two [block | 3. **Consecutiveness.** A document cannot contain two [block | |||
quotes] in a row unless there is a [blank line] between them. | quotes] in a row unless there is a [blank line] between them. | |||
Nothing else counts as a [block quote](#block-quotes). | Nothing else counts as a [block quote](#block-quotes). | |||
skipping to change at line 2628 | skipping to change at line 3086 | |||
## List items | ## List items | |||
A [list marker](@list-marker) is a | A [list marker](@list-marker) is a | |||
[bullet list marker] or an [ordered list marker]. | [bullet list marker] or an [ordered list marker]. | |||
A [bullet list marker](@bullet-list-marker) | A [bullet list marker](@bullet-list-marker) | |||
is a `-`, `+`, or `*` character. | is a `-`, `+`, or `*` character. | |||
An [ordered list marker](@ordered-list-marker) | An [ordered list marker](@ordered-list-marker) | |||
is a sequence of one of more digits (`0-9`), followed by either a | is a sequence of 1--9 arabic digits (`0-9`), followed by either a | |||
`.` character or a `)` character. | `.` character or a `)` character. (The reason for the length | |||
limit is that with 10 digits we start seeing integer overflows | ||||
in some browsers.) | ||||
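
As a rough illustration of the 0.21 wording, the following Python sketch checks whether a string is an ordered list marker and extracts its start number. The names `ORDERED_MARKER` and `parse_ordered_marker` are hypothetical and are not part of the spec or of any reference implementation.

```python
import re

# Assumption: 1 to 9 ASCII digits followed by "." or ")", per the 0.21 text.
ORDERED_MARKER = re.compile(r"([0-9]{1,9})([.)])")


def parse_ordered_marker(marker):
    """Return (start_number, delimiter), or None if this is not a marker."""
    m = ORDERED_MARKER.fullmatch(marker)
    if m is None:
        return None
    # int() discards leading zeros, so "003." yields a start number of 3.
    return int(m.group(1)), m.group(2)


print(parse_ordered_marker("123456789."))
print(parse_ordered_marker("1234567890."))
```

Under these assumptions a ten-digit sequence is rejected outright, and a leading `-` cannot be part of the marker, so a start number is never negative.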
The following rules define [list items]: | The following rules define [list items]: | |||
1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of | 1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of | |||
blocks *Bs* starting with a [non-space character] and not separated | blocks *Bs* starting with a [non-whitespace character] and not separated | |||
from each other by more than one blank line, and *M* is a list | from each other by more than one blank line, and *M* is a list | |||
marker of width *W* followed by 0 < *N* < 5 spaces, then the result | marker of width *W* followed by 0 < *N* < 5 spaces, then the result | |||
of prepending *M* and the following spaces to the first line of | of prepending *M* and the following spaces to the first line of | |||
*Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a | *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a | |||
list item with *Bs* as its contents. The type of the list item | list item with *Bs* as its contents. The type of the list item | |||
(bullet or ordered) is determined by the type of its list marker. | (bullet or ordered) is determined by the type of its list marker. | |||
If the list item is ordered, then it is also assigned a start | If the list item is ordered, then it is also assigned a start | |||
number, based on the ordered list marker. | number, based on the ordered list marker. | |||
For example, let *Ls* be the lines | For example, let *Ls* be the lines | |||
skipping to change at line 2692 | skipping to change at line 3152 | |||
<p>A block quote.</p> | <p>A block quote.</p> | |||
</blockquote> | </blockquote> | |||
</li> | </li> | |||
</ol> | </ol> | |||
. | . | |||
The most important thing to notice is that the position of | The most important thing to notice is that the position of | |||
the text after the list marker determines how much indentation | the text after the list marker determines how much indentation | |||
is needed in subsequent blocks in the list item. If the list | is needed in subsequent blocks in the list item. If the list | |||
marker takes up two spaces, and there are three spaces between | marker takes up two spaces, and there are three spaces between | |||
the list marker and the next [non-space character], then blocks | the list marker and the next [non-whitespace character], then blocks | |||
must be indented five spaces in order to fall under the list | must be indented five spaces in order to fall under the list | |||
item. | item. | |||
Here are some examples showing how far content must be indented to be | Here are some examples showing how far content must be indented to be | |||
put under the list item: | put under the list item: | |||
. | . | |||
- one | - one | |||
two | two | |||
skipping to change at line 2750 | skipping to change at line 3210 | |||
<ul> | <ul> | |||
<li> | <li> | |||
<p>one</p> | <p>one</p> | |||
<p>two</p> | <p>two</p> | |||
</li> | </li> | |||
</ul> | </ul> | |||
. | . | |||
It is tempting to think of this in terms of columns: the continuation | It is tempting to think of this in terms of columns: the continuation | |||
blocks must be indented at least to the column of the first | blocks must be indented at least to the column of the first | |||
[non-space character] after the list marker. However, that is not quite right. | [non-whitespace character] after the list marker. However, that is not quite right. | |||
The spaces after the list marker determine how much relative indentation | The spaces after the list marker determine how much relative indentation | |||
is needed. Which column this indentation reaches will depend on | is needed. Which column this indentation reaches will depend on | |||
how the list item is embedded in other constructions, as shown by | how the list item is embedded in other constructions, as shown by | |||
this example: | this example: | |||
. | . | |||
> > 1. one | > > 1. one | |||
>> | >> | |||
>> two | >> two | |||
. | . | |||
skipping to change at line 2893 | skipping to change at line 3353 | |||
<pre><code>bar | <pre><code>bar | |||
</code></pre> | </code></pre> | |||
<p>baz</p> | <p>baz</p> | |||
<blockquote> | <blockquote> | |||
<p>bam</p> | <p>bam</p> | |||
</blockquote> | </blockquote> | |||
</li> | </li> | |||
</ol> | </ol> | |||
. | . | |||
Note that ordered list start numbers must be nine digits or less: | ||||
. | ||||
123456789. ok | ||||
. | ||||
<ol start="123456789"> | ||||
<li>ok</li> | ||||
</ol> | ||||
. | ||||
. | ||||
1234567890. not ok | ||||
. | ||||
<p>1234567890. not ok</p> | ||||
. | ||||
A start number may begin with 0s: | ||||
. | ||||
0. ok | ||||
. | ||||
<ol start="0"> | ||||
<li>ok</li> | ||||
</ol> | ||||
. | ||||
. | ||||
003. ok | ||||
. | ||||
<ol start="3"> | ||||
<li>ok</li> | ||||
</ol> | ||||
. | ||||
A start number may not be negative: | ||||
. | ||||
-1. not ok | ||||
. | ||||
<p>-1. not ok</p> | ||||
. | ||||
2. **Item starting with indented code.** If a sequence of lines *Ls* | 2. **Item starting with indented code.** If a sequence of lines *Ls* | |||
constitute a sequence of blocks *Bs* starting with an indented code | constitute a sequence of blocks *Bs* starting with an indented code | |||
block and not separated from each other by more than one blank line, | block and not separated from each other by more than one blank line, | |||
and *M* is a list marker of width *W* followed by | and *M* is a list marker of width *W* followed by | |||
one space, then the result of prepending *M* and the following | one space, then the result of prepending *M* and the following | |||
space to the first line of *Ls*, and indenting subsequent lines of | space to the first line of *Ls*, and indenting subsequent lines of | |||
*Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. | *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. | |||
If a line is empty, then it need not be indented. The type of the | If a line is empty, then it need not be indented. The type of the | |||
list item (bullet or ordered) is determined by the type of its list | list item (bullet or ordered) is determined by the type of its list | |||
marker. If the list item is ordered, then it is also assigned a | marker. If the list item is ordered, then it is also assigned a | |||
skipping to change at line 2998 | skipping to change at line 3500 | |||
</code></pre> | </code></pre> | |||
<p>paragraph</p> | <p>paragraph</p> | |||
<pre><code>more code | <pre><code>more code | |||
</code></pre> | </code></pre> | |||
</li> | </li> | |||
</ol> | </ol> | |||
. | . | |||
Note that rules #1 and #2 only apply to two cases: (a) cases | Note that rules #1 and #2 only apply to two cases: (a) cases | |||
in which the lines to be included in a list item begin with a | in which the lines to be included in a list item begin with a | |||
[non-space character], and (b) cases in which | [non-whitespace character], and (b) cases in which | |||
they begin with an indented code | they begin with an indented code | |||
block. In a case like the following, where the first block begins with | block. In a case like the following, where the first block begins with | |||
a three-space indent, the rules do not allow us to form a list item by | a three-space indent, the rules do not allow us to form a list item by | |||
indenting the whole thing and prepending a list marker: | indenting the whole thing and prepending a list marker: | |||
. | . | |||
foo | foo | |||
bar | bar | |||
. | . | |||
skipping to change at line 3228 | skipping to change at line 3730 | |||
indented code | indented code | |||
> A block quote. | > A block quote. | |||
</code></pre> | </code></pre> | |||
. | . | |||
5. **Laziness.** If a string of lines *Ls* constitute a [list | 5. **Laziness.** If a string of lines *Ls* constitute a [list | |||
item](#list-items) with contents *Bs*, then the result of deleting | item](#list-items) with contents *Bs*, then the result of deleting | |||
some or all of the indentation from one or more lines in which the | some or all of the indentation from one or more lines in which the | |||
next [non-space character] after the indentation is | next [non-whitespace character] after the indentation is | |||
[paragraph continuation text] is a | [paragraph continuation text] is a | |||
list item with the same contents and attributes. The unindented | list item with the same contents and attributes. The unindented | |||
lines are called | lines are called | |||
[lazy continuation line](@lazy-continuation-line)s. | [lazy continuation line](@lazy-continuation-line)s. | |||
Here is an example with [lazy continuation line]s: | Here is an example with [lazy continuation line]s: | |||
. | . | |||
1. A paragraph | 1. A paragraph | |||
with two lines. | with two lines. | |||
skipping to change at line 4201 | skipping to change at line 4703 | |||
. | . | |||
<p>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~</p> | <p>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~</p> | |||
. | . | |||
Backslashes before other characters are treated as literal | Backslashes before other characters are treated as literal | |||
backslashes: | backslashes: | |||
. | . | |||
\→\A\a\ \3\φ\« | \→\A\a\ \3\φ\« | |||
. | . | |||
<p>\ \A\a\ \3\φ\«</p> | <p>\→\A\a\ \3\φ\«</p> | |||
. | . | |||
Escaped characters are treated as regular characters and do | Escaped characters are treated as regular characters and do | |||
not have their usual Markdown meanings: | not have their usual Markdown meanings: | |||
. | . | |||
\*not emphasized* | \*not emphasized* | |||
\<br/> not a tag | \<br/> not a tag | |||
\[not a link](/foo) | \[not a link](/foo) | |||
\`not code` | \`not code` | |||
skipping to change at line 4279 | skipping to change at line 4781 | |||
. | . | |||
<http://example.com?find=\*> | <http://example.com?find=\*> | |||
. | . | |||
<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p> | <p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p> | |||
. | . | |||
. | . | |||
<a href="/bar\/)"> | <a href="/bar\/)"> | |||
. | . | |||
<p><a href="/bar\/)"></p> | <a href="/bar\/)"> | |||
. | . | |||
But they work in all other contexts, including URLs and link titles, | But they work in all other contexts, including URLs and link titles, | |||
link references, and [info string]s in [fenced code block]s: | link references, and [info string]s in [fenced code block]s: | |||
. | . | |||
[foo](/bar\* "ti\*tle") | [foo](/bar\* "ti\*tle") | |||
. | . | |||
<p><a href="/bar*" title="ti*tle">foo</a></p> | <p><a href="/bar*" title="ti*tle">foo</a></p> | |||
. | . | |||
skipping to change at line 4325 | skipping to change at line 4827 | |||
unicode characters as entities or leave them as they are. (However, | unicode characters as entities or leave them as they are. (However, | |||
`"`, `&`, `<`, and `>` must always be rendered as entities.) | `"`, `&`, `<`, and `>` must always be rendered as entities.) | |||
[Named entities](@name-entities) consist of `&` | [Named entities](@name-entities) consist of `&` | |||
+ any of the valid HTML5 entity names + `;`. The | + any of the valid HTML5 entity names + `;`. The | |||
[following document](https://html.spec.whatwg.org/multipage/entities.json) | [following document](https://html.spec.whatwg.org/multipage/entities.json) | |||
is used as an authoritative source of the valid entity names and their | is used as an authoritative source of the valid entity names and their | |||
corresponding codepoints. | corresponding codepoints. | |||
. | . | |||
& © Æ Ď ¾ ℋ ⅆ &ClockwiseContourIntegral; | & © Æ Ď | |||
¾ ℋ ⅆ | ||||
∲ ≧̸ | ||||
. | . | |||
<p> & © Æ Ď ¾ ℋ ⅆ ∲</p> | <p> & © Æ Ď | |||
¾ ℋ ⅆ | ||||
∲ ≧̸</p> | ||||
. | . | |||
[Decimal entities](@decimal-entities) | [Decimal entities](@decimal-entities) | |||
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | |||
entities need to be recognised and transformed into their corresponding | entities need to be recognised and transformed into their corresponding | |||
unicode codepoints. Invalid unicode codepoints will be replaced by | unicode codepoints. Invalid unicode codepoints will be replaced by | |||
the "unknown codepoint" character (`U+FFFD`). For security reasons, | the "unknown codepoint" character (`U+FFFD`). For security reasons, | |||
the codepoint `U+0000` will also be replaced by `U+FFFD`. | the codepoint `U+0000` will also be replaced by `U+FFFD`. | |||
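
The decoding rule can be sketched in Python as follows. This is only an illustration: the function name `decode_decimal_entity` is hypothetical, and treating surrogates and values above `U+10FFFF` as the "invalid" codepoints is an assumption, since the text above does not enumerate them.

```python
import re

# "&#" + 1 to 8 ASCII digits + ";"
DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")
REPLACEMENT = "\ufffd"


def decode_decimal_entity(entity):
    """Return the decoded character, or None if not a decimal entity."""
    m = DECIMAL_ENTITY.fullmatch(entity)
    if m is None:
        return None
    cp = int(m.group(1))
    # Assumption: "invalid" means out of range or a surrogate.
    # U+0000 is replaced as well, as the spec text requires.
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return REPLACEMENT
    return chr(cp)


print(decode_decimal_entity("&#35;"))
```

A string with nine or more digits does not match the pattern at all, so in this sketch it is left for the caller to treat as literal text.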
. | . | |||
skipping to change at line 4388 | skipping to change at line 4894 | |||
<p>&MadeUpEntity;</p> | <p>&MadeUpEntity;</p> | |||
. | . | |||
Entities are recognized in any context besides code spans or | Entities are recognized in any context besides code spans or | |||
code blocks, including raw HTML, URLs, [link title]s, and | code blocks, including raw HTML, URLs, [link title]s, and | |||
[fenced code block] [info string]s: | [fenced code block] [info string]s: | |||
. | . | |||
<a href="öö.html"> | <a href="öö.html"> | |||
. | . | |||
<p><a href="öö.html"></p> | <a href="öö.html"> | |||
. | . | |||
. | . | |||
[foo](/föö "föö") | [foo](/föö "föö") | |||
. | . | |||
<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> | <p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> | |||
. | . | |||
. | . | |||
[foo] | [foo] | |||
skipping to change at line 5690 | skipping to change at line 6196 | |||
. | . | |||
<p><em>foo _bar</em> baz_</p> | <p><em>foo _bar</em> baz_</p> | |||
. | . | |||
. | . | |||
**foo*bar** | **foo*bar** | |||
. | . | |||
<p><em><em>foo</em>bar</em>*</p> | <p><em><em>foo</em>bar</em>*</p> | |||
. | . | |||
. | ||||
*foo __bar *baz bim__ bam* | ||||
. | ||||
<p><em>foo <strong>bar *baz bim</strong> bam</em></p> | ||||
. | ||||
Rule 16: | Rule 16: | |||
. | . | |||
**foo **bar baz** | **foo **bar baz** | |||
. | . | |||
<p>**foo <strong>bar baz</strong></p> | <p>**foo <strong>bar baz</strong></p> | |||
. | . | |||
. | . | |||
*foo *bar baz* | *foo *bar baz* | |||
skipping to change at line 5773 | skipping to change at line 6285 | |||
(the URI that is the link destination), and optionally a [link title]. | (the URI that is the link destination), and optionally a [link title]. | |||
There are two basic kinds of links in Markdown. In [inline link]s the | There are two basic kinds of links in Markdown. In [inline link]s the | |||
destination and title are given immediately after the link text. In | destination and title are given immediately after the link text. In | |||
[reference link]s the destination and title are defined elsewhere in | [reference link]s the destination and title are defined elsewhere in | |||
the document. | the document. | |||
A [link text](@link-text) consists of a sequence of zero or more | A [link text](@link-text) consists of a sequence of zero or more | |||
inline elements enclosed by square brackets (`[` and `]`). The | inline elements enclosed by square brackets (`[` and `]`). The | |||
following rules apply: | following rules apply: | |||
- Links may not contain other links, at any level of nesting. | - Links may not contain other links, at any level of nesting. If | |||
multiple otherwise valid link definitions appear nested inside each | ||||
other, the inner-most definition is used. | ||||
- Brackets are allowed in the [link text] only if (a) they | - Brackets are allowed in the [link text] only if (a) they | |||
are backslash-escaped or (b) they appear as a matched pair of brackets, | are backslash-escaped or (b) they appear as a matched pair of brackets, | |||
with an open bracket `[`, a sequence of zero or more inlines, and | with an open bracket `[`, a sequence of zero or more inlines, and | |||
a close bracket `]`. | a close bracket `]`. | |||
- Backtick [code span]s, [autolink]s, and raw [HTML tag]s bind more tightly | - Backtick [code span]s, [autolink]s, and raw [HTML tag]s bind more tightly | |||
than the brackets in link text. Thus, for example, | than the brackets in link text. Thus, for example, | |||
`` [foo`]` `` could not be a link text, since the second `]` | `` [foo`]` `` could not be a link text, since the second `]` | |||
is part of a code span. | is part of a code span. | |||
skipping to change at line 5929 | skipping to change at line 6443 | |||
Parentheses and other symbols can also be escaped, as usual | Parentheses and other symbols can also be escaped, as usual | |||
in Markdown: | in Markdown: | |||
. | . | |||
[link](foo\)\:) | [link](foo\)\:) | |||
. | . | |||
<p><a href="foo):">link</a></p> | <p><a href="foo):">link</a></p> | |||
. | . | |||
A link can contain fragment identifiers and queries: | ||||
. | ||||
[link](#fragment) | ||||
[link](http://example.com#fragment) | ||||
[link](http://example.com?foo=bar&baz#fragment) | ||||
. | ||||
<p><a href="#fragment">link</a></p> | ||||
<p><a href="http://example.com#fragment">link</a></p> | ||||
<p><a href="http://example.com?foo=bar&baz#fragment">link</a></p> | ||||
. | ||||
Note that a backslash before a non-escapable character is | ||||
just a backslash: | ||||
. | ||||
[link](foo\bar) | ||||
. | ||||
<p><a href="foo%5Cbar">link</a></p> | ||||
. | ||||
URL-escaping should be left alone inside the destination, as all | URL-escaping should be left alone inside the destination, as all | |||
URL-escaped characters are also valid URL characters. HTML entities in | URL-escaped characters are also valid URL characters. HTML entities in | |||
the destination will be parsed into the corresponding unicode | the destination will be parsed into the corresponding unicode | |||
codepoints, as usual, and optionally URL-escaped when written as HTML. | codepoints, as usual, and optionally URL-escaped when written as HTML. | |||
. | . | |||
[link](foo%20bä) | [link](foo%20bä) | |||
. | . | |||
<p><a href="foo%20b%C3%A4">link</a></p> | <p><a href="foo%20b%C3%A4">link</a></p> | |||
. | . | |||
skipping to change at line 6134 | skipping to change at line 6671 | |||
There are three kinds of [reference link](@reference-link)s: | There are three kinds of [reference link](@reference-link)s: | |||
[full](#full-reference-link), [collapsed](#collapsed-reference-link), | [full](#full-reference-link), [collapsed](#collapsed-reference-link), | |||
and [shortcut](#shortcut-reference-link). | and [shortcut](#shortcut-reference-link). | |||
A [full reference link](@full-reference-link) | A [full reference link](@full-reference-link) | |||
consists of a [link text], optional [whitespace], and a [link label] | consists of a [link text], optional [whitespace], and a [link label] | |||
that [matches] a [link reference definition] elsewhere in the document. | that [matches] a [link reference definition] elsewhere in the document. | |||
A [link label](@link-label) begins with a left bracket (`[`) and ends | A [link label](@link-label) begins with a left bracket (`[`) and ends | |||
with the first right bracket (`]`) that is not backslash-escaped. | with the first right bracket (`]`) that is not backslash-escaped. | |||
Between these brackets there must be at least one non-[whitespace character]. | Between these brackets there must be at least one [non-whitespace character]. | |||
Unescaped square bracket characters are not allowed in | Unescaped square bracket characters are not allowed in | |||
[link label]s. A link label can have at most 999 | [link label]s. A link label can have at most 999 | |||
characters inside the square brackets. | characters inside the square brackets. | |||
One label [matches](@matches) | One label [matches](@matches) | |||
another just in case their normalized forms are equal. To normalize a | another just in case their normalized forms are equal. To normalize a | |||
label, perform the *unicode case fold* and collapse consecutive internal | label, perform the *unicode case fold* and collapse consecutive internal | |||
[whitespace] to a single space. If there are multiple | [whitespace] to a single space. If there are multiple | |||
matching reference link definitions, the one that comes first in the | matching reference link definitions, the one that comes first in the | |||
document is used. (It is desirable in such cases to emit a warning.) | document is used. (It is desirable in such cases to emit a warning.) | |||
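
A minimal Python sketch of label matching, under stated assumptions: `str.casefold` stands in for the unicode case fold, `str.split` treats a somewhat broader set of characters as whitespace than the spec's [whitespace] definition, and leading and trailing whitespace is dropped as well. The names `normalize_label` and `build_reference_map` are hypothetical.

```python
def normalize_label(label):
    """Case-fold the label and collapse whitespace runs to one space."""
    return " ".join(label.casefold().split())


def build_reference_map(definitions):
    """Map normalized labels to destinations; the first definition wins."""
    refs = {}
    for label, destination in definitions:
        # setdefault keeps the existing entry, so later duplicates are ignored.
        refs.setdefault(normalize_label(label), destination)
    return refs


refs = build_reference_map([("Foo  Bar", "/first"), ("foo bar", "/second")])
print(refs)
```

An implementation that wants to emit the warning mentioned above could check whether the normalized label is already present before inserting.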
skipping to change at line 6381 | skipping to change at line 6918 | |||
. | . | |||
. | . | |||
[foo][ref\[] | [foo][ref\[] | |||
[ref\[]: /uri | [ref\[]: /uri | |||
. | . | |||
<p><a href="/uri">foo</a></p> | <p><a href="/uri">foo</a></p> | |||
. | . | |||
A [link label] must contain at least one non-[whitespace character]: | A [link label] must contain at least one [non-whitespace character]: | |||
. | . | |||
[] | [] | |||
[]: /uri | []: /uri | |||
. | . | |||
<p>[]</p> | <p>[]</p> | |||
<p>[]: /uri</p> | <p>[]: /uri</p> | |||
. | . | |||
skipping to change at line 6961 | skipping to change at line 7498 | |||
## Raw HTML | ## Raw HTML | |||
Text between `<` and `>` that looks like an HTML tag is parsed as a | Text between `<` and `>` that looks like an HTML tag is parsed as a | |||
raw HTML tag and will be rendered in HTML without escaping. | raw HTML tag and will be rendered in HTML without escaping. | |||
Tag and attribute names are not limited to current HTML tags, | Tag and attribute names are not limited to current HTML tags, | |||
so custom tags (and even, say, DocBook tags) may be used. | so custom tags (and even, say, DocBook tags) may be used. | |||
Here is the grammar for tags: | Here is the grammar for tags: | |||
A [tag name](@tag-name) consists of an ASCII letter | A [tag name](@tag-name) consists of an ASCII letter | |||
followed by zero or more ASCII letters or digits. | followed by zero or more ASCII letters, digits, or | |||
hyphens (`-`). | ||||
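
The 0.21 tag name grammar can be expressed as a single regular expression. The Python sketch below is illustrative only; `TAG_NAME` and `is_tag_name` are hypothetical names.

```python
import re

# An ASCII letter followed by zero or more ASCII letters, digits, or hyphens.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_tag_name(name):
    """Return True if the whole string is a tag name under the 0.21 rule."""
    return TAG_NAME.fullmatch(name) is not None


print(is_tag_name("responsive-image"))
print(is_tag_name("33"))
```

Dropping the `-` from the character class gives the 0.20 rule, under which a name such as `responsive-image` is not a tag name.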
An [attribute](@attribute) consists of [whitespace], | An [attribute](@attribute) consists of [whitespace], | |||
an [attribute name], and an optional | an [attribute name], and an optional | |||
[attribute value specification]. | [attribute value specification]. | |||
An [attribute name](@attribute-name) | An [attribute name](@attribute-name) | |||
consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII | consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII | |||
letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML | letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML | |||
specification restricted to ASCII. HTML5 is laxer.) | specification restricted to ASCII. HTML5 is laxer.) | |||
skipping to change at line 6994 | skipping to change at line 7532 | |||
A [single-quoted attribute value](@single-quoted-attribute-value) | A [single-quoted attribute value](@single-quoted-attribute-value) | |||
consists of `'`, zero or more | consists of `'`, zero or more | |||
characters not including `'`, and a final `'`. | characters not including `'`, and a final `'`. | |||
A [double-quoted attribute value](@double-quoted-attribute-value) | A [double-quoted attribute value](@double-quoted-attribute-value) | |||
consists of `"`, zero or more | consists of `"`, zero or more | |||
characters not including `"`, and a final `"`. | characters not including `"`, and a final `"`. | |||
An [open tag](@open-tag) consists of a `<` character, a [tag name], | An [open tag](@open-tag) consists of a `<` character, a [tag name], | |||
zero or more [attributes], optional [whitespace], an optional `/` | zero or more [attributes](@attribute], optional [whitespace], an optional `/` | |||
character, and a `>` character. | character, and a `>` character. | |||
A [closing tag](@closing-tag) consists of the string `</`, a | A [closing tag](@closing-tag) consists of the string `</`, a | |||
[tag name], optional [whitespace], and the character `>`. | [tag name], optional [whitespace], and the character `>`. | |||
An [HTML comment](@html-comment) consists of `<!--` + *text* + `-->`, | An [HTML comment](@html-comment) consists of `<!--` + *text* + `-->`, | |||
where *text* does not start with `>` or `->`, does not end with `-`, | where *text* does not start with `>` or `->`, does not end with `-`, | |||
and does not contain `--`. (See the | and does not contain `--`. (See the | |||
[HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).) | [HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).) | |||
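
The three restrictions on comment text can be checked directly. The following Python sketch assumes the candidate string is already isolated from the surrounding text; the function name `is_html_comment` is hypothetical.

```python
def is_html_comment(s):
    """Return True if s is `<!--` + text + `-->` with text obeying the rules."""
    # The length check stops the opener and closer from overlapping ("<!--->").
    if len(s) < 7 or not s.startswith("<!--") or not s.endswith("-->"):
        return False
    text = s[4:-3]
    if text.startswith(">") or text.startswith("->"):
        return False
    if text.endswith("-"):
        return False
    return "--" not in text


print(is_html_comment("<!-- this is a comment -->"))
print(is_html_comment("<!-- not -- valid -->"))
```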
skipping to change at line 7059 | skipping to change at line 7597 | |||
With attributes: | With attributes: | |||
. | . | |||
<a foo="bar" bam = 'baz <em>"</em>' | <a foo="bar" bam = 'baz <em>"</em>' | |||
_boolean zoop:33=zoop:33 /> | _boolean zoop:33=zoop:33 /> | |||
. | . | |||
<p><a foo="bar" bam = 'baz <em>"</em>' | <p><a foo="bar" bam = 'baz <em>"</em>' | |||
_boolean zoop:33=zoop:33 /></p> | _boolean zoop:33=zoop:33 /></p> | |||
. | . | |||
Custom tag names can be used: | ||||
. | ||||
<responsive-image src="foo.jpg" /> | ||||
<My-Tag> | ||||
foo | ||||
</My-Tag> | ||||
. | ||||
<responsive-image src="foo.jpg" /> | ||||
<My-Tag> | ||||
foo | ||||
</My-Tag> | ||||
. | ||||
Illegal tag names, not parsed as HTML: | Illegal tag names, not parsed as HTML: | |||
. | . | |||
<33> <__> | <33> <__> | |||
. | . | |||
<p><33> <__></p> | <p><33> <__></p> | |||
. | . | |||
Illegal attribute names: | Illegal attribute names: | |||
skipping to change at line 7107 | skipping to change at line 7660 | |||
. | . | |||
<p><a href='bar'title=title></p> | <p><a href='bar'title=title></p> | |||
. | . | |||
Closing tags: | Closing tags: | |||
. | . | |||
</a> | </a> | |||
</foo > | </foo > | |||
. | . | |||
<p></a> | </a> | |||
</foo ></p> | </foo > | |||
. | . | |||
Illegal attributes in closing tag: | Illegal attributes in closing tag: | |||
. | . | |||
</a href="foo"> | </a href="foo"> | |||
. | . | |||
<p></a href="foo"></p> | <p></a href="foo"></p> | |||
. | . | |||
skipping to change at line 7175 | skipping to change at line 7728 | |||
foo <![CDATA[>&<]]> | foo <![CDATA[>&<]]> | |||
. | . | |||
<p>foo <![CDATA[>&<]]></p> | <p>foo <![CDATA[>&<]]></p> | |||
. | . | |||
Entities are preserved in HTML attributes: | Entities are preserved in HTML attributes: | |||
. | . | |||
<a href="ö"> | <a href="ö"> | |||
. | . | |||
<p><a href="ö"></p> | <a href="ö"> | |||
. | . | |||
Backslash escapes do not work in HTML attributes: | Backslash escapes do not work in HTML attributes: | |||
. | . | |||
<a href="\*"> | <a href="\*"> | |||
. | . | |||
<p><a href="\*"></p> | <a href="\*"> | |||
. | . | |||
. | . | |||
<a href="\""> | <a href="\""> | |||
. | . | |||
<p><a href="""></p> | <p><a href="""></p> | |||
. | . | |||
## Hard line breaks | ## Hard line breaks | |||
skipping to change at line 7387 | skipping to change at line 7940 | |||
Internal spaces are preserved verbatim: | Internal spaces are preserved verbatim: | |||
. | . | |||
Multiple spaces | Multiple spaces | |||
. | . | |||
<p>Multiple spaces</p> | <p>Multiple spaces</p> | |||
. | . | |||
<!-- END TESTS --> | <!-- END TESTS --> | |||
# Appendix A: A parsing strategy {-} | # Appendix: A parsing strategy {-} | |||
In this appendix we describe some features of the parsing strategy | ||||
used in the CommonMark reference implementations. | ||||
## Overview {-} | ## Overview {-} | |||
Parsing has two phases: | Parsing has two phases: | |||
1. In the first phase, lines of input are consumed and the block | 1. In the first phase, lines of input are consumed and the block | |||
structure of the document---its division into paragraphs, block quotes, | structure of the document---its division into paragraphs, block quotes, | |||
list items, and so on---is constructed. Text is assigned to these | list items, and so on---is constructed. Text is assigned to these | |||
blocks but not parsed. Link reference definitions are parsed and a | blocks but not parsed. Link reference definitions are parsed and a | |||
map of links is constructed. | map of links is constructed. | |||
2. In the second phase, the raw text contents of paragraphs and headers | 2. In the second phase, the raw text contents of paragraphs and headers | |||
are parsed into sequences of Markdown inline elements (strings, | are parsed into sequences of Markdown inline elements (strings, | |||
code spans, links, emphasis, and so on), using the map of link | code spans, links, emphasis, and so on), using the map of link | |||
references constructed in phase 1. | references constructed in phase 1. | |||
## The document tree {-} | ||||
At each point in processing, the document is represented as a tree of | At each point in processing, the document is represented as a tree of | |||
**blocks**. The root of the tree is a `document` block. The `document` | **blocks**. The root of the tree is a `document` block. The `document` | |||
may have any number of other blocks as **children**. These children | may have any number of other blocks as **children**. These children | |||
may, in turn, have other blocks as children. The last child of a block | may, in turn, have other blocks as children. The last child of a block | |||
is normally considered **open**, meaning that subsequent lines of input | is normally considered **open**, meaning that subsequent lines of input | |||
can alter its contents. (Blocks that are not open are **closed**.) | can alter its contents. (Blocks that are not open are **closed**.) | |||
Here, for example, is a possible document tree, with the open blocks | Here, for example, is a possible document tree, with the open blocks | |||
marked by arrows: | marked by arrows: | |||
``` tree | ``` tree | |||
skipping to change at line 7429 | skipping to change at line 7983 | |||
"Lorem ipsum dolor\nsit amet." | "Lorem ipsum dolor\nsit amet." | |||
-> list (type=bullet tight=true bullet_char=-) | -> list (type=bullet tight=true bullet_char=-) | |||
list_item | list_item | |||
paragraph | paragraph | |||
"Qui *quodsi iracundia*" | "Qui *quodsi iracundia*" | |||
-> list_item | -> list_item | |||
-> paragraph | -> paragraph | |||
"aliquando id" | "aliquando id" | |||
``` | ``` | |||
## How source lines alter the document tree {-} | ## Phase 1: block structure {-} | |||
Each line that is processed has an effect on this tree. The line is | Each line that is processed has an effect on this tree. The line is | |||
analyzed and, depending on its contents, the document may be altered | analyzed and, depending on its contents, the document may be altered | |||
in one or more of the following ways: | in one or more of the following ways: | |||
1. One or more open blocks may be closed. | 1. One or more open blocks may be closed. | |||
2. One or more new blocks may be created as children of the | 2. One or more new blocks may be created as children of the | |||
last open block. | last open block. | |||
3. Text may be added to the last (deepest) open block remaining | 3. Text may be added to the last (deepest) open block remaining | |||
on the tree. | on the tree. | |||
Once a line has been incorporated into the tree in this way, | Once a line has been incorporated into the tree in this way, | |||
it can be discarded, so input can be read in a stream. | it can be discarded, so input can be read in a stream. | |||
For each line, we follow this procedure: | ||||
1. First we iterate through the open blocks, starting with the | ||||
root document, and descending through last children down to the last | ||||
open block. Each block imposes a condition that the line must satisfy | ||||
if the block is to remain open. For example, a block quote requires a | ||||
`>` character. A paragraph requires a non-blank line. | ||||
In this phase we may match all or just some of the open | ||||
blocks. But we cannot close unmatched blocks yet, because we may have a | ||||
[lazy continuation line]. | ||||
2. Next, after consuming the continuation markers for existing | ||||
blocks, we look for new block starts (e.g. `>` for a block quote). | ||||
If we encounter a new block start, we close any blocks unmatched | ||||
in step 1 before creating the new block as a child of the last | ||||
matched block. | ||||
3. Finally, we look at the remainder of the line (after block | ||||
markers like `>`, list markers, and indentation have been consumed). | ||||
This is text that can be incorporated into the last open | ||||
block (a paragraph, code block, header, or raw HTML). | ||||
Setext headers are formed when we detect that the second line of | ||||
a paragraph is a setext header line. | ||||
Reference link definitions are detected when a paragraph is closed; | ||||
the accumulated text lines are parsed to see if they begin with | ||||
one or more reference link definitions. Any remainder becomes a | ||||
normal paragraph. | ||||
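The three-step procedure above can be sketched in Python. This is a minimal illustration, not the reference implementation: it handles only block quotes and paragraphs (no lists, code blocks, setext headers, or link reference definitions), and the names `Block`, `strip_quote_marker`, `open_chain`, and `add_line` are hypothetical, chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str                                   # "document", "block_quote", "paragraph"
    children: list = field(default_factory=list)
    lines: list = field(default_factory=list)   # raw text (paragraphs only)
    is_open: bool = True


def strip_quote_marker(text):
    """Return the text after a leading `>` marker, or None if there is none."""
    stripped = text.lstrip(" ")
    if len(text) - len(stripped) > 3 or not stripped.startswith(">"):
        return None
    rest = stripped[1:]
    return rest[1:] if rest.startswith(" ") else rest


def open_chain(doc):
    """Open blocks from the root, descending through open last children."""
    chain = [doc]
    while chain[-1].children and chain[-1].children[-1].is_open:
        chain.append(chain[-1].children[-1])
    return chain


def add_line(doc, line):
    chain = open_chain(doc)
    rest, matched = line, 1                     # the document always matches
    # Step 1: test each open block's continuation condition.
    for block in chain[1:]:
        if block.kind == "block_quote":
            inner = strip_quote_marker(rest)
            if inner is None:
                break
            rest = inner
        elif block.kind == "paragraph" and not rest.strip():
            break
        matched += 1
    tip = chain[-1]
    starts_block = strip_quote_marker(rest) is not None
    # Paragraph continuation, including lazy continuation lines.
    if tip.kind == "paragraph" and rest.strip() and not starts_block:
        tip.lines.append(rest.strip())
        return
    # Step 2: close unmatched blocks, then open any new block quotes.
    for block in chain[matched:]:
        block.is_open = False
    container = chain[matched - 1]
    if container.kind == "paragraph":           # interrupted by a new block start
        container.is_open = False
        container = chain[matched - 2]
    while (inner := strip_quote_marker(rest)) is not None:
        quote = Block("block_quote")
        container.children.append(quote)
        container, rest = quote, inner
    # Step 3: the remaining text starts a new paragraph.
    if rest.strip():
        container.children.append(Block("paragraph", lines=[rest.strip()]))
```

Feeding the lines `> Lorem ipsum dolor` and `sit amet.` to `add_line` leaves a single block quote whose paragraph holds both lines, because the second line is treated as a lazy continuation rather than closing the unmatched block quote.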
We can see how this works by considering how the tree above is | We can see how this works by considering how the tree above is | |||
generated by four lines of Markdown: | generated by four lines of Markdown: | |||
``` markdown | ``` markdown | |||
> Lorem ipsum dolor | > Lorem ipsum dolor | |||
sit amet. | sit amet. | |||
> - Qui *quodsi iracundia* | > - Qui *quodsi iracundia* | |||
> - aliquando id | > - aliquando id | |||
``` | ``` | |||
skipping to change at line 7541 | skipping to change at line 8125 | |||
"Lorem ipsum dolor\nsit amet." | "Lorem ipsum dolor\nsit amet." | |||
-> list (type=bullet tight=true bullet_char=-) | -> list (type=bullet tight=true bullet_char=-) | |||
list_item | list_item | |||
paragraph | paragraph | |||
"Qui *quodsi iracundia*" | "Qui *quodsi iracundia*" | |||
-> list_item | -> list_item | |||
-> paragraph | -> paragraph | |||
"aliquando id" | "aliquando id" | |||
``` | ``` | |||
## From block structure to the final document {-} | ## Phase 2: inline structure {-} | |||
Once all of the input has been parsed, all open blocks are closed. | Once all of the input has been parsed, all open blocks are closed. | |||
We then "walk the tree," visiting every node, and parse raw | We then "walk the tree," visiting every node, and parse raw | |||
string contents of paragraphs and headers as inlines. At this | string contents of paragraphs and headers as inlines. At this | |||
point we have seen all the link reference definitions, so we can | point we have seen all the link reference definitions, so we can | |||
resolve reference links as we go. | resolve reference links as we go. | |||
``` tree | ``` tree | |||
document | document | |||
skipping to change at line 7572 | skipping to change at line 8156 | |||
str "quodsi iracundia" | str "quodsi iracundia" | |||
list_item | list_item | |||
paragraph | paragraph | |||
str "aliquando id" | str "aliquando id" | |||
``` | ``` | |||
Notice how the [line ending] in the first paragraph has | Notice how the [line ending] in the first paragraph has | |||
been parsed as a `softbreak`, and the asterisks in the first list item | been parsed as a `softbreak`, and the asterisks in the first list item | |||
have become an `emph`. | have become an `emph`. | |||
The document can be rendered as HTML, or in any other format, given | ### An algorithm for parsing nested emphasis and links {-} | |||
an appropriate renderer. | ||||
By far the trickiest part of inline parsing is handling emphasis, | ||||
strong emphasis, links, and images. This is done using the following | ||||
algorithm. | ||||
When we're parsing inlines and we hit either | ||||
- a run of `*` or `_` characters, or | ||||
- a `[` or `![` | ||||
we insert a text node with these symbols as its literal content, and we | ||||
add a pointer to this text node to the [delimiter stack](@delimiter-stack). | ||||
The [delimiter stack] is a doubly linked list. Each | ||||
element contains a pointer to a text node, plus information about | ||||
- the type of delimiter (`[`, `![`, `*`, `_`) | ||||
- the number of delimiters, | ||||
- whether the delimiter is "active" (all are active to start), and | ||||
- whether the delimiter is a potential opener, a potential closer, | ||||
or both (which depends on what sort of characters precede | ||||
and follow the delimiters). | ||||
When we hit a `]` character, we call the *look for link or image* | ||||
procedure (see below). | ||||
When we hit the end of the input, we call the *process emphasis* | ||||
procedure (see below), with `stack_bottom` = NULL. | ||||
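One possible shape for the delimiter stack is sketched below in Python. The class and function names (`TextNode`, `Delimiter`, `DelimiterStack`, `scan`) are assumptions made for this example. The opener/closer test is a deliberate simplification that looks only at whether the neighbouring character is whitespace; the spec's actual rule is stricter. The `]` character is left as plain text here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TextNode:
    literal: str


@dataclass
class Delimiter:
    node: TextNode            # text node holding the delimiter characters
    kind: str                 # "[", "![", "*", or "_"
    count: int                # number of delimiters in the run
    can_open: bool
    can_close: bool
    active: bool = True       # all delimiters start out active
    prev: Optional["Delimiter"] = field(default=None, repr=False, compare=False)
    next: Optional["Delimiter"] = field(default=None, repr=False, compare=False)


class DelimiterStack:
    """Doubly linked list; `top` is the most recently pushed element."""

    def __init__(self):
        self.top = None

    def push(self, delim):
        delim.prev = self.top
        if self.top is not None:
            self.top.next = delim
        self.top = delim

    def remove(self, delim):
        if delim.prev is not None:
            delim.prev.next = delim.next
        if delim.next is not None:
            delim.next.prev = delim.prev
        if self.top is delim:
            self.top = delim.prev
        delim.prev = delim.next = None


def scan(text):
    """Split text into text nodes, pushing delimiter runs onto the stack."""
    nodes, stack, i = [], DelimiterStack(), 0
    while i < len(text):
        ch = text[i]
        if ch in ("*", "_"):
            j = i
            while j < len(text) and text[j] == ch:
                j += 1
            before = text[i - 1] if i > 0 else " "
            after = text[j] if j < len(text) else " "
            node = TextNode(text[i:j])
            # Simplified flanking test (assumption, see lead-in).
            stack.push(Delimiter(node, ch, j - i,
                                 can_open=not after.isspace(),
                                 can_close=not before.isspace()))
        elif ch == "[" or text.startswith("![", i):
            j = i + (2 if ch == "!" else 1)
            node = TextNode(text[i:j])
            stack.push(Delimiter(node, text[i:j], 1, can_open=True, can_close=False))
        else:
            j = i + 1
            while (j < len(text) and text[j] not in "*_["
                   and not text.startswith("![", j)):
                j += 1
            node = TextNode(text[i:j])
        nodes.append(node)
        i = j
    return nodes, stack
```

A linked list is used rather than an array because the later procedures remove elements from the middle of the stack while holding pointers to other elements.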
#### *look for link or image* {-} | ||||
Starting at the top of the delimiter stack, we look backwards | ||||
through the stack for an opening `[` or `![` delimiter. | ||||
- If we don't find one, we return a literal text node `]`. | ||||
- If we do find one, but it's not *active*, we remove the inactive | ||||
delimiter from the stack, and return a literal text node `]`. | ||||
- If we find one and it's active, then we parse ahead to see if | ||||
we have an inline link/image, reference link/image, compact reference | ||||
link/image, or shortcut reference link/image. | ||||
+ If we don't, then we remove the opening delimiter from the | ||||
delimiter stack and return a literal text node `]`. | ||||
+ If we do, then | ||||
* We return a link or image node whose children are the inlines | ||||
after the text node pointed to by the opening delimiter. | ||||
* We run *process emphasis* on these inlines, with the `[` opener | ||||
as `stack_bottom`. | ||||
* We remove the opening delimiter. | ||||
* If we have a link (and not an image), we also set all | ||||
`[` delimiters before the opening delimiter to *inactive*. (This | ||||
will prevent us from getting links within links.) | ||||
#### *process emphasis* {-} | ||||
Parameter `stack_bottom` sets a lower bound to how far we | ||||
descend in the [delimiter stack]. If it is NULL, we can | ||||
go all the way to the bottom. Otherwise, we stop before | ||||
visiting `stack_bottom`. | ||||
Let `current_position` point to the element on the [delimiter stack] | ||||
just above `stack_bottom` (or the first element if `stack_bottom` | ||||
is NULL). | ||||
We keep track of the `openers_bottom` for each delimiter | ||||
type (`*`, `_`). Initialize this to `stack_bottom`. | ||||
Then we repeat the following until we run out of potential | ||||
closers: | ||||
- Move `current_position` forward in the delimiter stack (if needed) | ||||
until we find the first potential closer with delimiter `*` or `_`. | ||||
(This will be the potential closer closest | ||||
to the beginning of the input -- the first one in parse order.) | ||||
- Now, look back in the stack (staying above `stack_bottom` and | ||||
the `openers_bottom` for this delimiter type) for the | ||||
first matching potential opener ("matching" means same delimiter). | ||||
- If one is found: | ||||
+ Figure out whether we have emphasis or strong emphasis: | ||||
if both closer and opener spans have length >= 2, we have | ||||
strong, otherwise regular. | ||||
+ Insert an emph or strong emph node accordingly, after | ||||
the text node corresponding to the opener. | ||||
+ Remove any delimiters between the opener and closer from | ||||
the delimiter stack. | ||||
+ Remove 1 (for regular emph) or 2 (for strong emph) delimiters | ||||
from the opening and closing text nodes. If they become empty | ||||
as a result, remove them and remove the corresponding element | ||||
of the delimiter stack. If the closing node is removed, reset | ||||
`current_position` to the next element in the stack. | ||||
- If none is found: | ||||
+ Set `openers_bottom` to the element before `current_position`. | ||||
(We know that there are no openers for this kind of closer up to and | ||||
including this point, so this puts a lower bound on future searches.) | ||||
+ If the closer at `current_position` is not a potential opener, | ||||
remove it from the delimiter stack (since we know it can't | ||||
be a closer either). | ||||
+ Advance `current_position` to the next element in the stack. | ||||
After we're done, we remove all delimiters above `stack_bottom` from the | ||||
delimiter stack. | ||||
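The *process emphasis* procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the reference implementation: the delimiter stack is a plain list instead of a doubly linked list, `stack_bottom` is an index (`-1` standing in for NULL), and `Node`, `Delim`, `tokenize`, and `render` are hypothetical helpers. `tokenize` uses a simplified whitespace-only test for potential openers and closers and ignores links.

```python
import re
from dataclasses import dataclass, field


@dataclass(eq=False)          # identity comparison, so list.index finds the exact node
class Node:
    kind: str                 # "text", "emph", or "strong"
    literal: str = ""
    children: list = field(default_factory=list)


@dataclass(eq=False)
class Delim:
    node: Node                # text node holding the delimiter run
    char: str                 # "*" or "_"
    can_open: bool
    can_close: bool


def tokenize(text):
    """Build a flat inline list and a delimiter stack (simplified flanking)."""
    inlines, stack = [], []
    for m in re.finditer(r"\*+|_+|[^*_]+", text):
        node = Node("text", m.group())
        inlines.append(node)
        if m.group()[0] in "*_":
            before = text[m.start() - 1:m.start()] or " "
            after = text[m.end():m.end() + 1] or " "
            stack.append(Delim(node, m.group()[0],
                               not after.isspace(), not before.isspace()))
    return inlines, stack


def process_emphasis(inlines, stack, stack_bottom=-1):
    openers_bottom = {"*": None, "_": None}
    pos = stack_bottom + 1
    while pos < len(stack):
        closer = stack[pos]
        if closer.char not in ("*", "_") or not closer.can_close:
            pos += 1
            continue
        # Look back for the first matching opener above both lower bounds.
        found, k = None, pos - 1
        while k > stack_bottom and stack[k] is not openers_bottom[closer.char]:
            if stack[k].can_open and stack[k].char == closer.char:
                found = k
                break
            k -= 1
        if found is None:
            openers_bottom[closer.char] = stack[pos - 1] if pos - 1 > stack_bottom else None
            if closer.can_open:
                pos += 1
            else:
                del stack[pos]            # pos now refers to the next element
            continue
        opener = stack[found]
        strong = len(opener.node.literal) >= 2 and len(closer.node.literal) >= 2
        n = 2 if strong else 1
        a, b = inlines.index(opener.node), inlines.index(closer.node)
        inlines[a + 1:b] = [Node("strong" if strong else "emph",
                                 children=inlines[a + 1:b])]
        del stack[found + 1:pos]          # drop delimiters between opener and closer
        pos = found + 1                   # the closer's new index
        opener.node.literal = opener.node.literal[n:]
        closer.node.literal = closer.node.literal[n:]
        if not opener.node.literal:
            inlines.remove(opener.node)
            del stack[found]
            pos -= 1
        if not closer.node.literal:
            inlines.remove(closer.node)
            del stack[pos]                # pos now refers to the next element
    del stack[stack_bottom + 1:]


def render(nodes):
    tags = {"emph": "em", "strong": "strong"}
    return "".join(n.literal if n.kind == "text"
                   else f"<{tags[n.kind]}>{render(n.children)}</{tags[n.kind]}>"
                   for n in nodes)
```

For example, calling `tokenize("**foo *bar* baz**")`, then `process_emphasis` on the result, and rendering the inline list yields `<strong>foo <em>bar</em> baz</strong>`. The `openers_bottom` bound is stored as an object reference rather than an index so that it stays valid when elements below it are removed from the list.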
End of changes. 74 change blocks. | ||||
98 lines changed or deleted | 682 lines changed or added | |||