I'm learning about LR parsers that use ENBF grammars where non-terminals are defined by regular expressions. To start out easy, I've been rendering the grammar rules as railroad diagrams in text using the Unicode table drawing characters. It's pretty easy to get a pleasing layout by following the tree structure of the regexps.

I also generate a minimal automaton to recognize the regexp, and now I wonder if it is possible to create a railroad diagram for the automaton. I don't want to do arbitrary graph layout à la Graphviz, I want to infer some structure that helps with the railroad layout. It's a lot like creating structured control flow from an arbitrary CFG.


Wirth actually has some irreducible loops in the Pascal manual. I couldn’t render those — that’s not a regular expression.

Sticking separators like “,” and “;” in the loop back-edge is a neat trick. Regular expressions can’t do that, but I can render those diagrams by modeling a loop with a forwards and backwards part.

(I think ANTLR has regexp syntax for “foos separated by bars”)

Sign in to participate in the conversation
OldBytes Space - Mastodon

The social network of the future: No ads, no corporate surveillance, ethical design, and decentralization! Own your data with Mastodon!