June 25, 2008

Three Markdown Gotchas

Jeff Atwood, Co-Founder (Former)

I’ve enjoyed working with the excellent WMD “what you see is what you mean” Markdown control while building Stack Overflow. I’ve been very pleasantly surprised how easy it is to type a smattering of concise Markdown and generate rather nice-looking content.

One of Markdown’s biggest advantages is its simplicity. Here’s a little Markdown test post I’ve been using that exercises the basic formatting options:

##Header##

----------

Some **bold** Some *italic* and [a link][1] 

A little code sample

    <head>
    <title>Web Page Title</title>
    </head>

A picture

![alt text][2]

A list

- apples
- oranges
- eggs

A numbered list

1. a
2. b
3. c

A little quote

> It is now time for all good men to come to the aid of their country. 

A final paragraph.

  [1]: http://www.google.com
  [2]: http://www.google.com/intl/en_ALL/images/logo.gif

However, I’ve also noticed there are a few edge cases where Markdown syntax can get weird and produce unexpected results.

I started to wonder if there were other edge conditions in advanced Markdown syntax I should know about. I figured John Fraser of AttackLab, the author of the WMD control, would be the best person to ask. He was kind enough to respond in some detail and granted me permission to repost his thoughts, in which he outlines three gotchas to worry about when using Markdown:

1) Markdown’s single biggest flaw is its intra-word emphasis.

I don’t think anybody writes:

un*fricking*believable

often enough to justify making it nearly impossible to talk about tokens with underscores in them:

some_file_name

is interpreted as:

some<em>file</em>name

It even works across word boundaries:

file_one and file_two

becomes:

file<em>one and file</em>two

Whenever you’re writing tokens with underscores you have to make absolutely sure you’re in a code span. The same problem will also nail you on equations like a*b*c, but that seems to pop up less frequently.
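To see why `some_file_name` breaks, here is a minimal Python sketch of an emphasis rule written in the style of the reference implementation: a marker, some non-space text, then the same marker, with no check for word boundaries. The exact regular expression and the `emphasize` name are assumptions for illustration, not code copied from Markdown.pl or Showdown.

```python
import re

# Assumed emphasis rule: an opening * or _ followed by a non-space character,
# the shortest run of text that ends in a non-space character, then the SAME
# marker. Nothing checks whether the marker sits inside a word.
EMPHASIS = re.compile(r"(\*|_)(?=\S)(.+?)(?<=\S)\1")

def emphasize(text: str) -> str:
    """Hypothetical helper: wrap each matched marker pair in <em> tags."""
    return EMPHASIS.sub(r"<em>\2</em>", text)

for sample in ["un*fricking*believable", "some_file_name",
               "file_one and file_two", "a*b*c"]:
    print(sample, "->", emphasize(sample))
```

The non-greedy `(.+?)` is what lets the third sample span two words: it runs from the underscore in `file_one` to the one in `file_two`, because that is the nearest matching marker.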

Showdown follows the reference implementation on all this, but in WMD I do a little preprocessing to hack the idiocy away: basically I just backslash-escape any underscores or asterisks that might trigger it. It’s a flagrant violation of the standard, but since it’s a pre-pass that should produce identical output with any Markdown processor, I feel justified. Unfortunately my hack did screw up one edge case (which I don’t have in front of me) and there isn’t any way to disable it. Both those things will change in the next release.
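A pre-pass like the one Fraser describes can be approximated in a few lines. This is a hypothetical sketch, not WMD's actual code: it backslash-escapes any `_` or `*` that has a word character directly on both sides, and it skips single-backtick code spans so the escapes don't show up as literal backslashes inside code. The `escape_intra_word` name is made up for the example.

```python
import re

# Hypothetical pre-pass (not WMD's real code): escape * or _ when it has a word
# character on both sides, so it can never open or close emphasis.
INTRA_WORD = re.compile(r"(?<=\w)([*_])(?=\w)")

# Single-backtick code spans; the capturing group makes re.split keep them.
CODE_SPAN = re.compile(r"(`[^`]*`)")

def escape_intra_word(markdown: str) -> str:
    parts = CODE_SPAN.split(markdown)
    # After splitting, even indices are ordinary text and odd indices are code
    # spans, which are passed through untouched.
    return "".join(
        part if i % 2 else INTRA_WORD.sub(r"\\\1", part)
        for i, part in enumerate(parts)
    )

print(escape_intra_word("file_one and file_two, but not `some_file_name`"))
```

Because a backslash escape is standard Markdown, the result should render the same way in any processor, which is the justification Fraser gives. The cost is that deliberate intra-word emphasis such as `un*fricking*believable` gets escaped too, and this sketch does not handle indented code blocks or multi-backtick spans.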

2) List items only nest if they cross a magical four-character boundary.

So:

- level 1
  - level 2
    - level 3
      - level 4
        - level 5
          - level 6

is interpreted as:

- level 1
    - level 2
    - level 2
        - level 3
        - level 3
            - level 4

Which can be pretty surprising to humans. I’ve suggested an alternative algorithm a couple of times but it looks like neither of the big implementors is interested. (The mailing list’s HTML archive strips the whitespace from that first link; do “View Source” to make it make sense.)
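The "interpreted as" listing can be reproduced with a small model of how the reference implementation appears to nest lists. Two assumptions drive it: a list item runs until the next marker with exactly the same leading whitespace as the list's first item, and an item's body is outdented by up to four spaces before being parsed again as a sublist. This is a simplified Python sketch with hypothetical names (`parse_items`, `nest`), not code from Markdown.pl or Showdown.

```python
def parse_items(markdown: str) -> list[tuple[int, str]]:
    """Turn '- text' lines into (indent, text) pairs."""
    items = []
    for line in markdown.splitlines():
        stripped = line.lstrip(" ")
        if stripped.startswith("- "):
            items.append((len(line) - len(stripped), stripped[2:]))
    return items

def nest(items: list[tuple[int, str]], depth: int = 1) -> list[tuple[int, str]]:
    """Return (depth, text) pairs under the two assumed rules."""
    result = []
    if not items:
        return result
    base = items[0][0]  # leading whitespace of this list's first item
    i = 0
    while i < len(items):
        result.append((depth, items[i][1]))
        # Everything up to the next item with the same indent is this item's
        # body; outdent it by up to four spaces, then parse it as a sublist.
        body = []
        i += 1
        while i < len(items) and items[i][0] != base:
            indent, text = items[i]
            body.append((max(indent - 4, 0), text))
            i += 1
        result.extend(nest(body, depth + 1))
    return result

source = """\
- level 1
  - level 2
    - level 3
      - level 4
        - level 5
          - level 6"""

for depth, text in nest(parse_items(source)):
    print("    " * (depth - 1) + "- " + text)
```

The resulting depths are 1, 2, 2, 3, 3, 4, matching the listing above. Items written at two and four spaces land on the same level because stripping up to four spaces leaves both with zero indent.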

3) Mixing HTML and Markdown has a couple of serious limitations.

You can put Markdown within inline elements:

<span>This *will* work.</span>

but not within block elements:

<div>
  This *won't* work.
</div>

I think this is a symptom of Markdown’s being designed for blog posts. You can paste in big chunks of foreign HTML verbatim without having to double-check them, but it’s pretty much impossible to write whole pages in Markdown. Again Gruber’s not interested; dunno about Fortin.
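A rough sketch of why the two cases behave differently: a block that opens with a block-level tag is set aside as raw HTML and passed through untouched, while anything else is treated as a paragraph and gets span-level Markdown. The tag set and the `render_block` name are assumptions for illustration, and emphasis handling is cut down to `*...*` only.

```python
import re

# Assumed subset of block-level tags; real implementations use longer lists.
BLOCK_TAGS = {"div", "p", "pre", "table", "blockquote", "ul", "ol", "form"}

OPENING_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")
EMPHASIS = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*")

def render_block(block: str) -> str:
    """Hypothetical renderer for one blank-line-separated block."""
    tag = OPENING_TAG.match(block)
    if tag and tag.group(1).lower() in BLOCK_TAGS:
        # Raw HTML block: returned verbatim, so Markdown inside is ignored.
        return block
    # Ordinary paragraph (inline HTML included): apply span-level Markdown.
    return "<p>" + EMPHASIS.sub(r"<em>\1</em>", block) + "</p>"

print(render_block("<span>This *will* work.</span>"))
print(render_block("<div>\n  This *won't* work.\n</div>"))
```

Supporting Markdown inside the `<div>` would mean running the contents of an HTML block through the Markdown pass again instead of returning them verbatim, which is roughly the change Fraser is asking for.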

In my mind, this last one is huge. If we allowed Markdown within block-level HTML, we could write a non-lossy version of html2text and make my dream of Markdown as a transient editing format a reality.

Oh, also? The HTML parser is pretty broken, so what gets recognized as a complete block of HTML can sometimes be surprising. But Showdown uses an older, even-more-broken algorithm than the latest Markdown.pl beta, so I probably shouldn’t point fingers.
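One way a block matcher can misjudge where an HTML block ends: if it scans from an opening tag to the nearest closing tag of the same name, nested elements cut the block short. The sketch below is a hypothetical, deliberately simplified matcher written to show that failure mode; it is not the algorithm used by Showdown or Markdown.pl, and `split_html_block` is a made-up name.

```python
import re

# Hypothetical matcher: a <div> at the start of a line, then the SHORTEST run
# of text up to a closing </div>. Nesting depth is not tracked at all.
NAIVE_DIV_BLOCK = re.compile(r"^<(div)\b.*?</\1>", re.DOTALL | re.MULTILINE)

def split_html_block(text: str) -> tuple[str, str]:
    """Return (text treated as raw HTML, text left over for Markdown)."""
    match = NAIVE_DIV_BLOCK.search(text)
    if not match:
        return "", text
    return match.group(0), text[:match.start()] + text[match.end():]

html, rest = split_html_block("<div>\n<div>inner</div>\n*tail*\n</div>")
print(repr(html))
print(repr(rest))
```

The outer block stops at the inner `</div>`, so `*tail*` and the stray closing tag fall back to the Markdown pass. Counting nesting depth would fix this particular case.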

Remember, if you don’t like Markdown, you can always fall back to HTML — at least the whitelisted HTML. And if you’re curious about how any of this works I strongly encourage you to head over to the WMD advanced demo sandbox and try it out for yourself.
