Tips for learning regular expressions

Here are a few realizations that helped me the most when I was learning regular expressions.

1. Regular expressions aren’t trivial. If you think they’re trivial, but you can’t get them to work, then you feel stupid. They’re not trivial, but they’re not that hard either. They just take some study.

2. Regular expressions are not command line wildcards. They share some symbols, such as * and ?, but those symbols don’t mean the same thing. They’re just similar enough to cause confusion.

3. Regular expressions are a little programming language. They are usually embedded inside another programming language, like JavaScript or PowerShell. Think of the expressions as little bits of a foreign language, like a French quotation inside English prose. Don’t expect rules from the outside language to have any relation to the rules inside, any more than you’d expect English grammar to apply inside that French quote.

4. Character classes are a little sub-language within regular expressions. Character classes are their own little world. Once you realize that and don’t expect the usual rules for regular expressions outside character classes to apply, you can see that they’re not very complicated, just different. Failure to realize that they are different is a major source of bugs.
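
A few lines of code may make tips 2 through 4 more concrete. What follows is only a sketch, and it assumes Python as the host language, with its standard re and fnmatch modules standing in for "regular expressions" and "command line wildcards". The variable names are made up for illustration. Other languages and regex flavors differ in the details.

```python
import fnmatch
import re

# Tip 2: the same symbol means different things as a wildcard and as a regex.
# As a wildcard, "*" means "any run of characters".
# As a regex, "*" means "zero or more of the preceding item".
wildcard_prefix = fnmatch.fnmatchcase("abc", "a*")       # wildcard: starts with "a"
regex_prefix = re.fullmatch(r"a*", "abc") is not None    # regex: only a run of "a"s
regex_repeat = re.fullmatch(r"a*", "aaa") is not None

# The regex counterpart of the wildcard "*.txt" is ".*\.txt".
wildcard_txt = fnmatch.fnmatchcase("report.txt", "*.txt")
regex_txt = re.fullmatch(r".*\.txt", "report.txt") is not None

# Tip 3: the host language has its own rules, separate from the regex rules.
# In an ordinary Python string "\b" is a backspace character, so the regex
# engine never sees a word boundary. A raw string passes "\b" through intact.
host_escape = re.search("\bcat\b", "a cat sat") is not None
raw_escape = re.search(r"\bcat\b", "a cat sat") is not None

# Tip 4: inside a character class the usual rules change.
# Outside a class "." matches any character; inside it is a literal dot.
dot_outside = re.fullmatch(r"a.c", "abc") is not None
dot_inside = re.fullmatch(r"a[.]c", "abc") is not None

# Outside a class "^" anchors to the start; leading a class it means "not".
caret_outside = re.findall(r"^a", "banana")
caret_inside = re.findall(r"[^a]", "banana")

print(wildcard_prefix, regex_prefix, regex_repeat)
print(wildcard_txt, regex_txt)
print(host_escape, raw_escape)
print(dot_outside, dot_inside)
print(caret_outside, caret_inside)
```

Each pair uses the same symbol in two contexts and gets a different answer, which is the kind of surprise the tips above warn about.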

Once you’re ready to dive into regular expressions, read Jeffrey Friedl’s book, Mastering Regular Expressions. It’s by far the best book on the subject. Read the first few chapters carefully, but then flip the pages quickly when he goes off into NFA engines and all that.
