
Commit 1811239

committed
add deep parse optional end tags
1 parent 519d22e commit 1811239

18 files changed

Lines changed: 248 additions & 34 deletions


.idea/debuggerHistory.xml

Lines changed: 22 additions & 0 deletions

.idea/markdown-navigator.xml

Lines changed: 2 additions & 1 deletion

.idea/markdown-navigator/COPY_HTML_MIME.xml

Lines changed: 1 addition & 0 deletions

.idea/markdown-navigator/OVERVIEW.xml

Lines changed: 1 addition & 0 deletions

README.md

Lines changed: 34 additions & 8 deletions
@@ -92,8 +92,8 @@ More information can be found in the documentation:
 `PegdownOptionsAdapter` class converts pegdown `Extensions.*` flags to flexmark options and
 extensions list. Pegdown `Extensions.java` is included for convenience and new options not found
 in pegdown 1.6.0. These are located in `flexmark-profile-pegdown` module but you can grab the
-source from this repo: [PegdownOptionsAdapter.java], [Extensions.java] and
-make your own version, modified to your project's needs.
+source from this repo: [PegdownOptionsAdapter.java], [Extensions.java] and make your own
+version, modified to your project's needs.
 
 You can pass your extension flags to static `PegdownOptionsAdapter.flexmarkOptions(int)` or you
 can instantiate `PegdownOptionsAdapter` and use convenience methods to set, add and remove
@@ -119,6 +119,32 @@ public class PegdownOptions {
 }
 ```
 
+Default flexmark-java pegdown emulation uses less strict HTML block parsing which interrupts an
+HTML block on a blank line. Pegdown only interrupts an HTML block on a blank line if all tags in
+the HTML block are closed.
+
+To get closer to original pegdown HTML block parsing behavior use the method which takes a
+`boolean strictHtml` argument:
+
+```java
+import com.vladsch.flexmark.html.HtmlRenderer;
+import com.vladsch.flexmark.parser.Parser;
+import com.vladsch.flexmark.profiles.pegdown.Extensions;
+import com.vladsch.flexmark.profiles.pegdown.PegdownOptionsAdapter;
+import com.vladsch.flexmark.util.options.DataHolder;
+
+public class PegdownOptions {
+static final DataHolder OPTIONS = PegdownOptionsAdapter.flexmarkOptions(true,
+Extensions.ALL
+);
+
+static final Parser PARSER = Parser.builder(OPTIONS).build();
+static final HtmlRenderer RENDERER = HtmlRenderer.builder(OPTIONS).build();
+
+// use the PARSER to parse and RENDERER to render with pegdown compatibility
+}
+```
+
 A sample with a
 [custom link resolver](https://github.com/vsch/flexmark-java/blob/master/flexmark-java-samples/src/com/vladsch/flexmark/samples/PegdownCustomLinkResolverOptions.java)
 is also available.
@@ -129,8 +155,8 @@ is also available.
 
 ### Latest Additions
 
-* Deep HTML block parsing for better handling of raw text tags that come after other tags and
-for pegdown HTML block parsing compatibility.
+* Deep HTML block parsing option for better handling of raw text tags that come after other tags
+and for [pegdown] HTML block parsing compatibility.
 * `flexmark-all` module that includes: core, all extensions, formatter, JIRA and YouTrack
 converters, pegdown profile module and HTML to Markdown conversion.
 * [PDF converter module](https://github.com/vsch/flexmark-java/wiki/Extensions#pdf-output-module)
@@ -232,8 +258,8 @@ commonmark. If you want to use flexmark to fully emulate another markdown proces
 you have to adjust the parser and configure the flexmark extensions that provide the additional
 features available in the parser that you want to emulate.
 
-Latest addition was a rewrite of the list parser to better control emulation of other markdown
-processors as per [Markdown Processors Emulation](MarkdownProcessorsEmulation.md). Addition of
+A rewrite of the list parser to better control emulation of other markdown processors as per
+[Markdown Processors Emulation](MarkdownProcessorsEmulation.md) is complete. Addition of
 processor presets to emulate specific markdown processing behaviour of these parsers is on a
 short to do list.
 
@@ -253,11 +279,11 @@ Major processor families are implemented and some family members also:
 * [CommonMark] (spec 0.27)
 * [ ]  [League/CommonMark]
 * [GitHub] Comments
-* [Kramdown]
-* [ ]  [Jekyll]
+* [ ] [Jekyll]
 * [Markdown.pl][Markdown]
 * [ ]  [Php Markdown Extra]
 * [GitHub] Docs (old GitHub markdown parser)
+* [Kramdown]
 * FixedIndent
 * [MultiMarkdown]
 * [Pegdown]

VERSION.md

Lines changed: 16 additions & 3 deletions
@@ -160,13 +160,26 @@ flexmark-java
 0.22.4
 ------
 
-* Add: `Parser.HTML_BLOCK_DEEP_PARSE_FIRST_OPEN_TAG_ON_ONE_LINE` to not parse open tags unless they
-are contained on one line. Parsers like MultiMarkdown 6.0 more compatible with this mode on.
+* [ ] Add: parser family specific HTML block test cases
+
+* Add: `ParserEmulationProfile.PEGDOWN_STRICT` profile to emulate HTML block parsing according
+to pegdown rules. `ParserEmulationProfile.PEGDOWN` uses less strict HTML block parsing which
+will end an HTML block on a blank line.
+
+* Add: `Parser.HTML_BLOCK_DEEP_PARSE_FIRST_OPEN_TAG_ON_ONE_LINE` to not parse open tags unless
+they are contained on one line. Parsers like MultiMarkdown 6.0 more compatible with this mode
+on.
 
 * Add: html deep block parsing for non-commonmark parsers. Need to add HTML block parsing tests
 to parser emulation family tests.
 
-* [ ] Add: parser family specific HTML block test cases
+* API Change: `BlockParser.isRawText()` used for interruptible blocks, when this method returns
+`true` then indenting spaces are passed to the block. Used by `HtmlBlockParser` to keep
+indents on continuation lines that could be interrupted by another markdown element.
+
+* Fix: add optional tag logic to `HtmlDeepParser` so that optional end tags when omitted do not
+cause nesting of tags as per
+[8.1.2.4. Optional tags](https://www.w3.org/TR/html51/syntax.html#optional-tags)
 
 0.22.2
 ------

flexmark-profile-pegdown/src/main/java/com/vladsch/flexmark/profiles/pegdown/PegdownOptionsAdapter.java

Lines changed: 10 additions & 2 deletions
@@ -52,8 +52,12 @@ public PegdownOptionsAdapter(int pegdownExtensions) {
 }
 
 public static DataHolder flexmarkOptions(int pegdownExtensions, Extension... extensions) {
+return flexmarkOptions(false, pegdownExtensions, extensions);
+}
+
+public static DataHolder flexmarkOptions(boolean strictHtml, int pegdownExtensions, Extension... extensions) {
 PegdownOptionsAdapter optionsAdapter = new PegdownOptionsAdapter(pegdownExtensions);
-return optionsAdapter.getFlexmarkOptions(extensions);
+return optionsAdapter.getFlexmarkOptions(strictHtml, extensions);
 }
 
 public boolean haveExtensions(int mask) {
@@ -65,6 +69,10 @@ public boolean allExtensions(int mask) {
 }
 
 public DataHolder getFlexmarkOptions(Extension... additionalExtensions) {
+return getFlexmarkOptions(false,additionalExtensions);
+}
+
+public DataHolder getFlexmarkOptions(boolean strictHtml, Extension... additionalExtensions) {
 if (myIsUpdateNeeded) {
 myIsUpdateNeeded = false;
 MutableDataSet options = myOptions;
@@ -75,7 +83,7 @@ public DataHolder getFlexmarkOptions(Extension... additionalExtensions) {
 extensions.addAll(Arrays.asList(additionalExtensions));
 
 // Setup List Options for Fixed List Indent profile
-options.setFrom(ParserEmulationProfile.PEGDOWN);
+options.setFrom(strictHtml ? ParserEmulationProfile.PEGDOWN_STRICT : ParserEmulationProfile.PEGDOWN);
 
 options.set(HtmlRenderer.SUPPRESS_HTML_BLOCKS, haveExtensions(SUPPRESS_HTML_BLOCKS));
 options.set(HtmlRenderer.SUPPRESS_INLINE_HTML, haveExtensions(SUPPRESS_INLINE_HTML));

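The adapter change above uses a backward-compatible overload pattern: each existing signature now delegates with `strictHtml = false`, so current callers keep the `PEGDOWN` profile, while the new overloads take the flag first because a varargs parameter must be the last one. The sketch below models only that pattern; `Profile`, `Options` and the use of `String` in place of flexmark's `Extension` are hypothetical stand-ins, not flexmark API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins: Profile for ParserEmulationProfile, Options for the
// DataHolder the adapter returns, and String for flexmark's Extension type.
class StrictHtmlOverloadDemo {
    enum Profile { PEGDOWN, PEGDOWN_STRICT }

    static final class Options {
        final Profile profile;
        final List<String> extensions;

        Options(Profile profile, List<String> extensions) {
            this.profile = profile;
            this.extensions = extensions;
        }
    }

    // Original signature: keeps its old behaviour by delegating with strictHtml = false.
    static Options flexmarkOptions(int pegdownExtensions, String... extensions) {
        return flexmarkOptions(false, pegdownExtensions, extensions);
    }

    // New overload: the flag comes first because the varargs parameter must be last.
    // pegdownExtensions is accepted but not modelled in this sketch.
    static Options flexmarkOptions(boolean strictHtml, int pegdownExtensions, String... extensions) {
        Profile profile = strictHtml ? Profile.PEGDOWN_STRICT : Profile.PEGDOWN;
        return new Options(profile, Arrays.asList(extensions));
    }

    public static void main(String[] args) {
        System.out.println(flexmarkOptions(0).profile);                 // PEGDOWN
        System.out.println(flexmarkOptions(true, 0, "tables").profile); // PEGDOWN_STRICT
    }
}
```
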
flexmark/src/main/java/com/vladsch/flexmark/internal/DocumentParser.java

Lines changed: 1 addition & 1 deletion
@@ -649,7 +649,7 @@ private void incorporateLine(BasedSequence ln) {
 
 BlockStartImpl blockStart = findBlockStart(blockParser);
 if (blockStart == null) {
-setNewIndex(nextNonSpace);
+if (!blockParser.isRawText()) setNewIndex(nextNonSpace);
 break;
 }

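With this change `DocumentParser` only skips a continuation line's leading spaces when the open block is not raw text, which is how `HtmlBlockParser` (through the `isRawText()` override added in this commit) keeps indentation on its continuation lines. A minimal sketch of that index decision, assuming the line's current index is 0 and ignoring the tab, column and container-prefix handling the real `incorporateLine()` performs; `RawTextIndentDemo` and `continuationContent()` are hypothetical names.

```java
// Hypothetical, much-simplified model of the continuation step in
// DocumentParser.incorporateLine(): only the index handling is shown.
class RawTextIndentDemo {
    interface BlockParser { boolean isRawText(); }

    // Returns the part of the line handed to the open block when no new block starts on it.
    static String continuationContent(String line, BlockParser blockParser) {
        int index = 0; // assumption: no container prefix has been consumed
        int nextNonSpace = 0;
        while (nextNonSpace < line.length() && line.charAt(nextNonSpace) == ' ') nextNonSpace++;
        // Before this commit the index always advanced to nextNonSpace;
        // raw-text parsers now receive the leading spaces as well.
        if (!blockParser.isRawText()) index = nextNonSpace;
        return line.substring(index);
    }

    public static void main(String[] args) {
        String line = "    <li>indented</li>";
        System.out.println("[" + continuationContent(line, () -> true) + "]");  // [    <li>indented</li>]
        System.out.println("[" + continuationContent(line, () -> false) + "]"); // [<li>indented</li>]
    }
}
```
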
flexmark/src/main/java/com/vladsch/flexmark/internal/HtmlBlockParser.java

Lines changed: 6 additions & 1 deletion
@@ -148,7 +148,7 @@ public void addLine(ParserState state, BasedSequence line) {
 
 @Override
 public boolean canInterruptBy(final BlockParserFactory blockParserFactory) {
-return myHtmlBlockDeepParseMarkdownInterruptsClosed && (!(blockParserFactory instanceof HtmlBlockParser.Factory) && deepParser == null || deepParser.isHtmlClosed());
+return myHtmlBlockDeepParseMarkdownInterruptsClosed && !(blockParserFactory instanceof HtmlBlockParser.Factory || blockParserFactory instanceof IndentedCodeBlockParser.BlockFactory) && deepParser != null && deepParser.isHtmlClosed();
 }
 
 @Override
@@ -161,6 +161,11 @@ public boolean isInterruptible() {
 return myHtmlBlockDeepParseMarkdownInterruptsClosed && deepParser != null && deepParser.isHtmlClosed();
 }
 
+@Override
+public boolean isRawText() {
+return true;
+}
+
 @Override
 public void closeBlock(ParserState state) {
 block.setContent(content);

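The rewritten `canInterruptBy()` also removes a possible null dereference. Because `&&` binds tighter than `||`, the old expression evaluated `deepParser.isHtmlClosed()` whenever the left operand of `||` was false, including when `deepParser` was `null` and the factory was an HTML block factory; it also returned `true` for any other factory when no deep parser existed. The sketch compares both conditions using hypothetical stand-ins (`Factory`, `DeepParser`) for the flexmark types.

```java
// Hypothetical stand-ins: Factory replaces the real BlockParserFactory classes,
// DeepParser replaces HtmlDeepParser.
class CanInterruptDemo {
    interface DeepParser { boolean isHtmlClosed(); }

    enum Factory { HTML_BLOCK, INDENTED_CODE, OTHER }

    // Condition before this commit, same operator structure as the removed line.
    static boolean oldCanInterruptBy(boolean interruptsClosed, Factory f, DeepParser deepParser) {
        return interruptsClosed && (!(f == Factory.HTML_BLOCK) && deepParser == null || deepParser.isHtmlClosed());
    }

    // Condition after this commit: HTML and indented code blocks never interrupt,
    // and a missing deep parser means the block is not interruptible.
    static boolean newCanInterruptBy(boolean interruptsClosed, Factory f, DeepParser deepParser) {
        return interruptsClosed && !(f == Factory.HTML_BLOCK || f == Factory.INDENTED_CODE)
                && deepParser != null && deepParser.isHtmlClosed();
    }

    public static void main(String[] args) {
        System.out.println(newCanInterruptBy(true, Factory.HTML_BLOCK, null));  // false
        System.out.println(oldCanInterruptBy(true, Factory.OTHER, null));       // true
        System.out.println(newCanInterruptBy(true, Factory.OTHER, null));       // false
        System.out.println(newCanInterruptBy(true, Factory.OTHER, () -> true)); // true
        try {
            oldCanInterruptBy(true, Factory.HTML_BLOCK, null);
        } catch (NullPointerException e) {
            System.out.println("old condition: NullPointerException");
        }
    }
}
```
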
flexmark/src/main/java/com/vladsch/flexmark/internal/HtmlDeepParser.java

Lines changed: 37 additions & 5 deletions
@@ -16,8 +16,7 @@ public enum HtmlMatch {
 NON_TAG("<(![A-Z])", ">", false),
 TEMPLATE("<([?])", "\\?>", false),
 COMMENT("<(!--)", "-->", false),
-CDATA("<!\\[(CDATA)\\[", "\\]\\]>", false),
-;
+CDATA("<!\\[(CDATA)\\[", "\\]\\]>", false),;
 
 public final Pattern open;
 public final Pattern close;
@@ -32,6 +31,7 @@ public enum HtmlMatch {
 
 public static final Set<String> BLOCK_TAGS;
 public static final Set<String> VOID_TAGS;
+public static final Map<String, Set<String>> OPTIONAL_TAGS;
 public static final Pattern START_PATTERN;
 private static final HtmlMatch[] PATTERN_MAP;
 static {
@@ -58,6 +58,23 @@ public enum HtmlMatch {
 String[] voidTags = ("area|base|br|col|embed|hr|img|input|keygen|link|menuitem|meta|param|source|track|wbr").split("\\|");
 VOID_TAGS.addAll(Arrays.asList(voidTags));
 
+OPTIONAL_TAGS = new HashMap<String, Set<String>>();
+OPTIONAL_TAGS.put("li", new HashSet<String>(Arrays.asList(new String[] { "li" })));
+OPTIONAL_TAGS.put("dt", new HashSet<String>(Arrays.asList(new String[] { "dt", "dd" })));
+OPTIONAL_TAGS.put("dd", new HashSet<String>(Arrays.asList(new String[] { "dd", "dt" })));
+OPTIONAL_TAGS.put("p", new HashSet<String>(Arrays.asList(new String[] { "address", "article", "aside", "blockquote", "details", "div", "dl", "fieldset", "figcaption", "figure", "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6", "header", "hr", "main", "menu", "nav", "ol", "p", "pre", "section", "table", "ul" })));
+OPTIONAL_TAGS.put("rt", new HashSet<String>(Arrays.asList(new String[] { "rt", "rp" })));
+OPTIONAL_TAGS.put("rp", new HashSet<String>(Arrays.asList(new String[] { "rt", "rp" })));
+OPTIONAL_TAGS.put("optgroup", new HashSet<String>(Arrays.asList(new String[] { "optgroup" })));
+OPTIONAL_TAGS.put("option", new HashSet<String>(Arrays.asList(new String[] { "option", "optgroup" })));
+OPTIONAL_TAGS.put("colgroup", new HashSet<String>(Arrays.asList(new String[] { "colgroup" })));
+OPTIONAL_TAGS.put("thead", new HashSet<String>(Arrays.asList(new String[] { "tbody", "tfoot" })));
+OPTIONAL_TAGS.put("tbody", new HashSet<String>(Arrays.asList(new String[] { "tbody", "tfoot" })));
+OPTIONAL_TAGS.put("tfoot", new HashSet<String>(Arrays.asList(new String[] { "tbody" })));
+OPTIONAL_TAGS.put("tr", new HashSet<String>(Arrays.asList(new String[] { "tr" })));
+OPTIONAL_TAGS.put("td", new HashSet<String>(Arrays.asList(new String[] { "td", "th" })));
+OPTIONAL_TAGS.put("th", new HashSet<String>(Arrays.asList(new String[] { "td", "th" })));
+
 // combine all patterns and create map by pattern number
 PATTERN_MAP = new HtmlMatch[HtmlMatch.values().length];
 StringBuilder startPattern = new StringBuilder();
@@ -124,6 +141,21 @@ public boolean hadHtml() {
 return myHtmlCount > 0 || !isHtmlClosed();
 }
 
+// handle optional closing tags
+private void openTag(final String tagName) {
+if (!myOpenTags.isEmpty()) {
+String lastTag = myOpenTags.get(myOpenTags.size() - 1);
+
+if (OPTIONAL_TAGS.containsKey(lastTag)) {
+if (OPTIONAL_TAGS.get(lastTag).contains(tagName)) {
+myOpenTags.set(myOpenTags.size() - 1, tagName);
+return;
+}
+}
+}
+myOpenTags.add(tagName);
+}
+
 public void parseHtmlChunk(CharSequence html, boolean blockTagsOnly, final boolean parseNonBlock, final boolean firstOpenTagOnOneLine) {
 if (myHtmlCount == 0 && myHtmlMatch != null) {
 myHtmlCount++;
@@ -163,7 +195,7 @@ public void parseHtmlChunk(CharSequence html, boolean blockTagsOnly, final boole
 if (pendingOpen != null) {
 // now we have it
 if (!VOID_TAGS.contains(pendingOpen)) {
-myOpenTags.add(pendingOpen);
+openTag(pendingOpen);
 }
 myHtmlCount++;
 }
@@ -234,9 +266,9 @@ public void parseHtmlChunk(CharSequence html, boolean blockTagsOnly, final boole
 myHtmlMatch = htmlMatch;
 myClosingPattern = htmlMatch.close;
 if (useFirstOpenTagOnOneLine) {
-pendingOpen = group;
+pendingOpen = group;
 } else {
-myOpenTags.add(group);
+openTag(group);
 if (myHtmlCount != 0) myHtmlCount++;
 }
 } else {

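The `OPTIONAL_TAGS` map and `openTag()` added above encode part of the HTML optional end tag rules: if the innermost open tag's end tag may be omitted before the tag being opened, the new tag replaces it on the stack instead of nesting inside it. The stand-alone sketch below reproduces the `openTag()` rule with only a subset of the map; `closeTag()` and `isHtmlClosed()` are simplified, hypothetical versions, since the parser's real close handling is not part of this diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone model of the open-tag stack in HtmlDeepParser.
// Only a subset of the commit's OPTIONAL_TAGS entries is reproduced.
class OptionalTagStack {
    // key: innermost open tag; value: tags whose start implicitly ends the key's element
    static final Map<String, Set<String>> OPTIONAL_TAGS = new HashMap<>();

    static {
        OPTIONAL_TAGS.put("li", new HashSet<>(Arrays.asList("li")));
        OPTIONAL_TAGS.put("dt", new HashSet<>(Arrays.asList("dt", "dd")));
        OPTIONAL_TAGS.put("dd", new HashSet<>(Arrays.asList("dd", "dt")));
        OPTIONAL_TAGS.put("td", new HashSet<>(Arrays.asList("td", "th")));
        OPTIONAL_TAGS.put("th", new HashSet<>(Arrays.asList("td", "th")));
    }

    final List<String> openTags = new ArrayList<>();

    // Same rule as openTag() in the diff: replace the innermost tag when its end
    // tag may be omitted before tagName, otherwise push tagName.
    void openTag(String tagName) {
        if (!openTags.isEmpty()) {
            String lastTag = openTags.get(openTags.size() - 1);
            Set<String> impliedCloseBy = OPTIONAL_TAGS.get(lastTag);
            if (impliedCloseBy != null && impliedCloseBy.contains(tagName)) {
                openTags.set(openTags.size() - 1, tagName);
                return;
            }
        }
        openTags.add(tagName);
    }

    // Simplified (hypothetical) close: pop only if the innermost tag matches.
    void closeTag(String tagName) {
        int last = openTags.size() - 1;
        if (last >= 0 && openTags.get(last).equals(tagName)) {
            openTags.remove(last);
        }
    }

    // Simplified (hypothetical): closed when no tags remain open.
    boolean isHtmlClosed() {
        return openTags.isEmpty();
    }

    public static void main(String[] args) {
        OptionalTagStack stack = new OptionalTagStack();
        stack.openTag("ul");
        stack.openTag("li");
        stack.openTag("li"); // <li> with the previous </li> omitted: replaced, not nested
        System.out.println(stack.openTags); // [ul, li]
        stack.closeTag("li");
        stack.closeTag("ul");
        System.out.println(stack.isHtmlClosed()); // true
    }
}
```

In this simplified model, pushing the second `li` instead of replacing the first would leave `[ul, li]` on the stack after `</li></ul>`, so the HTML would never be reported as closed.
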