8/29/2014 - 9:14 PM

Parsing CSS file with monadic parser in Clojure

Parsing CSS file with monadic parser in Clojure

Parsing CSS file with monadic parser in Clojure

Inspired by "Parsing CSS with Parsec".

Just quick notes and code that you can play with in REPL.

By @kachayev

0. Intro

What do we have?

.container h1 {
  color: rgba(255, 0, 0, 0.9);
  font-size: 24px;
  font-family: Monaco;

What do we want to get?

{:selector ".container h1",
 ({:key "color", :value "rgba(255, 0, 0, 0.9)"}
  {:key "font-size", :value "24px"}
  {:key "font-family", :value "Monaco"})}

Let's do this with Monadic parsing approach. Step-by-step explanation of how it works one can find in Monadic Parsing in Python.

What about Clojure?

1. Parser abstraction

From theory:

type Parser a = String -> [(a, String)]

So, we're going to represent parser as simple function that takes input and returns seq of possible results. Simplest parsers:

;; takes any input and "consume" first char from it
(defn any [input]
  (if-let [c (first input)]
    (list [c (.substring input 1)])

;; this one doesn't accept any input
(defn failure [_] '())

(any "clojure-1.9") ;; => ([\c lojure-1.9])

We also will use few helpers:

(defn parse [parser input]
  (parser input))

(defn parse-all [parser input]
  (->> input
       (parse parser)
       (filter #(= "" (second %)))

2. Monads

I'm not going to pay much attention to what monads are. Everything that you need to know is that we will use it to compose small parsers into bigger one. Also we're not going to implement entire "monad infrastructure".

;; builds parser that always returns given element without consuming (changing) input
(defn return [v]
  (fn [input] (list [v input])))

;; takes parser and function that builds new parsers from (each) result of applying first one
(defn >>= [m f]
  (fn [input]
    (->> input
         (parse m)
         (mapcat (fn [[v tail]] (parse (f v) tail)))))

Next part is a simple macro that provides haskell-like do notation. If you don't know how do block works in Haskell, just skip this code and return to it when necessary.

(defn merge-bind [body bind]
  (if (and (not= clojure.lang.Symbol (type bind))
           (= 3 (count bind))
           (= '<- (second bind)))
    `(>>= ~(last bind) (fn [~(first bind)] ~body))
    `(>>= ~bind (fn [~'_] ~body))))

(defmacro do* [& forms]
  (reduce merge-bind (last forms) (reverse (butlast forms))))

3. Basic parsers

Let's build basic parsers that can recognize chars, strings, letters, spaces, etc.

(defn- -sat [pred]
  (>>= any (fn [v] (if (pred v) (return v) failure))))

;; just a helper
(defn- -char-cmp [f]
  (fn [c] (-sat (partial f (first c)))))

;; recognizes given char
(def match (-char-cmp =))

;; rejects given char
(def none-of (-char-cmp not=))

;; just a helper
(defn- -from-re [re]
  (-sat (fn [v] (not (nil? (re-find re (str v)))))))

;; recognizes any digit
(def digit (-from-re #"[0-9]"))

;; recognizes any letter
(def letter (-from-re #"[a-zA-Z]"))

Lets try:

(parse (match "c") "clojure1.9")
;; => ([\c "lojure1.9"])

(parse letter "clojure1.9")
;; => ([\c "lojure1.9"])

(parse digit "1.9clojure")
;; => ([\1 ".9clojure"])

Ok, so far so good. Moving forward...

4. Combinators

Useful combinators that will help us:

;; (ab)
(defn and-then [p1 p2]
   (r1 <- p1)
   (r2 <- p2)
   ;; xxx: note, that it's dirty hack to use STR to concat outputs,
   ;; full functional implementation should use MonadPlus protocol
   (return (str r1 r2))))

;; (a|b)
(defn or-else [p1 p2]
  (fn [input]
    (lazy-cat (parse p1 input) (parse p2 input))))

(declare plus)
(declare optional)

;; (a*)
(defn many [parser]
  (optional (plus parser)))

;; (a+) equals to (aa*)
(defn plus [parser]
   (a <- parser)
   (as <- (many parser))
   (return (cons a as))))

;; (a?)
(defn optional [parser]
  (or-else parser (return "")))

Pay special attention to plus combinator implementation using do* mentioned above. do* is only syntax sugar for bind operations, in reality plus looks like:

(defn plus [parser]
  (>>= parser
       (fn [a] (>>= (many parser)
                    (fn [as] (return (cons a as)))))))

Example of combinators usage:

;; recognizes space (or newline)
(def space (or-else (match " ") (match "\n")))

;; recognizes empty string or arbitrary number of spaces 
(def spaces (many space))

;; recognizes given string, i.e. "clojure"
(defn string [s]
  (reduce and-then (map #(match (str %)) s)))

Lets try to build something more complicated and interesting:

(def clojure-version (do*
                      (string "clojure")
                      (match " ")
                      (major <- digit)
                      (match ".")
                      (minor <- digit)
                      (return (str "major: " major "; minor: " minor))))

(parse-all clojure-version "clojure 1.7")
;; => "major: 1; minor: 7"

5. CSS

From CSS 2.1 grammar definition:

type Selector = String
data Rule = Rule String String deriving Show
data Ruleset = Ruleset Selector [Rule] deriving Show

In Clojure we can use records to represent data types:

(defrecord Rule [key value])
(defrecord Ruleset [selector rules])

Now lets define rule and ruleset parsers:

(def letter+ (or-else letter (match "-")))

(def rule (do*
           (p <- (many (none-of ":")))
           (match ":")
           (v <- (many (none-of ";")))
           (match ";")
           (return (Rule. (apply str p) (apply str v)))))

(def ruleset (do*
              (s <- (plus (none-of "{")))
              (match "{")
              (r <- (plus rule))
              (match "}")
              (return (Ruleset. (trim (apply str s)) r))))

Play a bit:

(parse-all rule "background: #fafafa; ")
;; {:key "background", :value "#fafafa"}

(parse-all ruleset "p { background: #fafafa; color: red; }")
;; {:selector "p",
;;  :rules
;;  ({:key "background", :value "#fafafa"} {:key "color", :value "red"})}

(def css ".container h1 {
  color: rgba(255, 0, 0, 0.9);
  font-size: 24px;
  font-family: Monaco;

(parse-all ruleset css)
;; {:selector ".container h1",
;;  :rules
;;  ({:key "color", :value "rgba(255, 0, 0, 0.9)"}
;;   {:key "font-size", :value "24px"}
;;   {:key "font-family", :value "Monaco"})}

Looks nice.

6. Notes

  • it's easier to work both with monads and parsers in dynamically typed language

  • it's not that convenient to work with strings and list of chars (you can see few "irrational" apply str _ and first _)

  • there is still room for improvement, i.e. more general return, bind and do*

  • you can try to rewrite parsers using applicative style