Markdown is a simple and convenient format for writing documentation as plain text. It is commonly used by platforms such as GitHub.
In this post, we will describe how to parse your markdown content and use it to produce other formats. For this purpose, we will use the Pegdown tool, available at https://github.com/sirthias/pegdown.
Installing Pegdown
We use version 1.2.1 of Pegdown, which is based on Parboiled, available at https://github.com/sirthias/parboiled. The jar files of these tools are available at https://github.com/sirthias/pegdown/downloads and https://github.com/sirthias/parboiled/downloads respectively. Note that the asm tool is also required.
After installing these tools, we have the following jar files in our classpath:
- asm-all-4.1.jar: the asm tool
- parboiled-core-1.1.4.jar: the parboiled core jar
- parboiled-java-1.1.4.jar: the parboiled jar specific for Java
- pegdown-1.2.1.jar: the pegdown jar
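Before going further, it can help to confirm that the four jars are really visible at runtime. The following is a minimal sketch, not part of Pegdown itself: the ClasspathCheck class and its isOnClasspath helper are hypothetical names, and the class names it probes are assumptions based on each library's usual packages, so adjust them if your versions differ.

```java
// Sketch: report which of the required libraries are visible on the classpath.
// The probed class names are assumptions about each jar's contents.
class ClasspathCheck {

    // Returns true if the named class can be located (without initializing it).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.pegdown.PegDownProcessor",   // pegdown
            "org.parboiled.Parboiled",        // parboiled-java
            "org.parboiled.Rule",             // parboiled-core
            "org.objectweb.asm.ClassWriter"   // asm
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Running it with the same classpath as your application should print "found" for every entry; any "MISSING" line points at a jar that still has to be added.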
Let's now dive into how to handle markdown content.
Parsing markdown with Pegdown
Pegdown provides a processor that parses the markdown content given as input. The following code shows how to parse markdown:
String fileName = (...)
PegDownProcessor processor = new PegDownProcessor(Extensions.ALL);
char[] markdown = FileUtils.readAllChars(fileName);
Preconditions.checkNotNull(markdown, "The specified file isn't found - " + fileName);
RootNode rootNode = processor.parseMarkdown(markdown);
The parseMarkdown method parses the content and returns a RootNode, the root of an object representation of the document. You can now use it to generate other content (XML and so on).
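The snippet above reads the file with FileUtils.readAllChars and then checks the result for null. If you prefer to rely only on the JDK, the characters can also be loaded with java.nio. The sketch below is one possible variant, assuming the file is encoded in UTF-8; MarkdownFileReader and readChars are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: load a markdown file into a char[] using only the JDK.
class MarkdownFileReader {

    // Reads the whole file as UTF-8 (an assumption about the encoding).
    // A missing file surfaces as an IOException instead of a null result.
    static char[] readChars(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8).toCharArray();
    }

    public static void main(String[] args) throws IOException {
        // Demonstration with a temporary file so the sketch runs on its own.
        Path tmp = Files.createTempFile("sample", ".md");
        Files.write(tmp, "# Title\n\nSome content.\n".getBytes(StandardCharsets.UTF_8));
        char[] markdown = readChars(tmp);
        System.out.println(markdown.length + " characters read");
        Files.delete(tmp);
    }
}
```

The resulting array can be handed to processor.parseMarkdown in the same way as in the snippet above.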
Using markdown content
We want to parse the following markdown-formatted text. We will use it as the basis for the rest of the post.
An introduction sentence. Another introduction sentence.

An introduction sentence.

# First header title

Some content. Some content.

* List item 1: some description
* List item 2: some description

Some content. Some content.

    SomeClass clazz = new SomeClass();
    clazz.test();

Some content.
Given this markdown content as input, the PegDown processor produces the following node structure:
ParaNode
  SuperNode
    TextNode
    SpecialTextNode
    TextNode
    SpecialTextNode
ParaNode
  SuperNode
    TextNode
    SpecialTextNode
HeaderNode
  TextNode
ParaNode
  SuperNode
    TextNode
    SpecialTextNode
    TextNode
    SpecialTextNode
BulletListNode
  ListItemNode
    RootNode
      SuperNode
        TextNode
        SpecialTextNode
        TextNode
  ListItemNode
    RootNode
      SuperNode
        TextNode
        SpecialTextNode
        TextNode
ParaNode
  SuperNode
    TextNode
    SpecialTextNode
    TextNode
    SpecialTextNode
VerbatimNode
ParaNode
  SuperNode
    TextNode
    SpecialTextNode
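A listing like this one can be produced by a small recursive walk over the tree. The sketch below shows the idea on a hypothetical stand-in type (SketchNode) rather than on Pegdown's own classes, so that it runs without any jar. With Pegdown, the same recursion would presumably use node.getChildren() and node.getClass().getSimpleName().

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: print a node tree, one type name per line, indented by depth.
class TreePrinter {

    // Hypothetical stand-in for a parsed node: a type name plus child nodes.
    static class SketchNode {
        final String type;
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String type, SketchNode... kids) {
            this.type = type;
            for (SketchNode kid : kids) {
                children.add(kid);
            }
        }
    }

    // Appends the node's type, then recurses into its children one level deeper.
    static void print(SketchNode node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.type).append('\n');
        for (SketchNode child : node.children) {
            print(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        SketchNode header = new SketchNode("HeaderNode", new SketchNode("TextNode"));
        StringBuilder out = new StringBuilder();
        print(header, 0, out);
        System.out.print(out);
    }
}
```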
Starting from the root node returned when parsing the markdown file, we can iterate over its children using the getChildren method of the Node class.
Node rootNode = (...);
List<Node> nodes = rootNode.getChildren();
StringBuilder content = new StringBuilder();
for (Node node : nodes) {
    if (node instanceof HeaderNode) {
        HeaderNode headerNode = (HeaderNode) node;
        String text = getTextContent(node);
        (...)
    } else if (node instanceof ParaNode) {
        ParaNode paraNode = (ParaNode) node;
        String text = getTextContent(node);
        (...)
    } else if (node instanceof VerbatimNode) {
        VerbatimNode verbatimNode = (VerbatimNode) node;
        String text = getTextContent(node);
        (...)
    } else if (node instanceof BulletListNode) {
        BulletListNode bulletListNode = (BulletListNode) node;
        // displayNodeChildren is a helper that is not shown in this post
        displayNodeChildren(bulletListNode);
        content.append("<ul>");
        List<Node> listItemNodes = bulletListNode.getChildren();
        for (Node childNode : listItemNodes) {
            if (childNode instanceof ListItemNode) {
                ListItemNode listItemNode = (ListItemNode) childNode;
                String text = getTextContent(childNode);
                (...)
            }
        }
        content.append("</ul>");
    }
}
The getTextContent methods show how to extract text from the different blocks such as headers, paragraphs and code listings:
// Dispatches on the node type and returns its text, or null if unsupported.
private String getTextContent(Node node) {
    if (node instanceof TextNode) {
        return getTextContent((TextNode) node);
    } else if (node instanceof HeaderNode) {
        // Assumes the header's first child is a plain TextNode
        HeaderNode headerNode = (HeaderNode) node;
        return getTextContent((TextNode) headerNode.getChildren().get(0));
    } else if (node instanceof ParaNode) {
        ParaNode paraNode = (ParaNode) node;
        Node firstChildNode = paraNode.getChildren().get(0);
        if (firstChildNode instanceof SuperNode) {
            return getTextContent((SuperNode) firstChildNode);
        } else if (firstChildNode instanceof TextNode) {
            return getTextContent((TextNode) firstChildNode);
        }
    } else if (node instanceof ListItemNode) {
        // A list item wraps its content in a nested RootNode
        ListItemNode listItemNode = (ListItemNode) node;
        RootNode rootNode = (RootNode) listItemNode.getChildren().get(0);
        Node firstChildNode = rootNode.getChildren().get(0);
        if (firstChildNode instanceof SuperNode) {
            return getTextContent((SuperNode) firstChildNode);
        } else if (firstChildNode instanceof TextNode) {
            return getTextContent((TextNode) firstChildNode);
        }
    }
    return null;
}

// Concatenates the text of all direct text children of a container node.
private String getTextContent(SuperNode node) {
    List<Node> nodes = node.getChildren();
    StringBuilder content = new StringBuilder();
    for (Node child : nodes) {
        if (child instanceof TextNode) {
            content.append(getTextContent((TextNode) child));
        } else if (child instanceof SpecialTextNode) {
            content.append(getTextContent((SpecialTextNode) child));
        }
    }
    return content.toString();
}

private String getTextContent(TextNode node) {
    return node.getText();
}
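The methods above only look at the first child of a paragraph or list item, which fits the sample content used in this post. For more deeply nested content, a fully recursive collection may be easier to maintain. The following sketch illustrates that idea on a hypothetical stand-in type (SketchNode); it is not Pegdown's API, and the leaf and container factory methods are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: gather the text of a whole subtree by recursing through every child.
class TextCollector {

    // Hypothetical stand-in: leaves carry text, containers carry children.
    static class SketchNode {
        final String text; // null for container nodes
        final List<SketchNode> children = new ArrayList<>();

        private SketchNode(String text) {
            this.text = text;
        }

        static SketchNode leaf(String text) {
            return new SketchNode(text);
        }

        static SketchNode container(SketchNode... kids) {
            SketchNode node = new SketchNode(null);
            for (SketchNode kid : kids) {
                node.children.add(kid);
            }
            return node;
        }
    }

    // Returns the concatenated text of the node and all of its descendants.
    static String collectText(SketchNode node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString();
    }

    private static void collect(SketchNode node, StringBuilder out) {
        if (node.text != null) {
            out.append(node.text);
        }
        for (SketchNode child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // Mirrors the ParaNode > SuperNode > text nodes shape from the tree above.
        SketchNode para = SketchNode.container(SketchNode.container(
                SketchNode.leaf("Some content"), SketchNode.leaf("."),
                SketchNode.leaf(" Some content"), SketchNode.leaf(".")));
        System.out.println(collectText(para));
    }
}
```

The trade-off is that a generic walk no longer distinguishes node types, so it suits plain text extraction better than per-element formatting.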
Generating new content
Now we can implement the complete transformation of our markdown content into a pseudo-HTML format. We wrap headers in an h2 tag and program listings in a code tag. We leave paragraphs as they are, without any wrapping. The following code shows this approach:
Node rootNode = (...);
List<Node> nodes = rootNode.getChildren();
StringBuilder content = new StringBuilder();
for (Node node : nodes) {
    if (node instanceof HeaderNode) {
        // Headers are wrapped in an h2 tag
        content.append("<h2>");
        String text = getTextContent(node);
        if (text != null) {
            content.append(text);
        }
        content.append("</h2>");
        content.append("\n\n");
    } else if (node instanceof ParaNode) {
        // Paragraphs are copied as they are
        String text = getTextContent(node);
        if (text != null) {
            content.append(text);
        }
        content.append("\n\n");
    } else if (node instanceof VerbatimNode) {
        // Program listings are wrapped in a code tag
        content.append("<code>");
        String text = getTextContent(node);
        if (text != null) {
            content.append(text);
        }
        content.append("</code>");
        content.append("\n\n");
    } else if (node instanceof BulletListNode) {
        BulletListNode bulletListNode = (BulletListNode) node;
        content.append("<ul>");
        List<Node> listItemNodes = bulletListNode.getChildren();
        for (Node childNode : listItemNodes) {
            if (childNode instanceof ListItemNode) {
                content.append("<li>");
                String text = getTextContent(childNode);
                if (text != null) {
                    content.append(text);
                }
                content.append("</li>");
            }
        }
        content.append("</ul>");
    }
}
Here is the final output:
An introduction sentence. Another introduction sentence.
An introduction sentence.
<h2>First header title</h2>
Some content. Some content.
<ul>
<li>List item 1: some description</li>
<li>List item 2: some description</li>
</ul>
Some content. Some content.
<code>
SomeClass clazz = new SomeClass();
clazz.test();
</code>
Some content.
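One point to keep in mind with this kind of output: the text is appended between the tags as is, so a listing that contains characters such as < or & would yield markup that is no longer well formed. A possible precaution, sketched below under the assumption that the target is HTML or XML, is to escape the text before appending it. HtmlEscaper and its escape method are hypothetical names, not part of Pegdown.

```java
// Sketch: escape the characters that have a special meaning in HTML/XML.
class HtmlEscaper {

    // Replaces &, <, > and " with their entity equivalents.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String listing = "List<Node> nodes = rootNode.getChildren();";
        // prints <code>List&lt;Node&gt; nodes = rootNode.getChildren();</code>
        System.out.println("<code>" + escape(listing) + "</code>");
    }
}
```

In the transformation loop, this would amount to appending escape(text) instead of text in each branch.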