Metaprogramming experiments in PicoLisp

To follow along,

   $ git clone https://github.com/erdg/pl-defmacro.git
   $ cd pl-defmacro/
   $ pil m.l +
   : (mload "mac.m.l")
   : (load  "mac.tests.l")
   mac.m.l -- all tests passed
   -> T

I was playing around with Common Lisp macros in PicoLisp. It was mostly a joke, but it got me thinking. I find quasiquote to be such a wonderful DSL for list interpolation and macro writing. What would it take to implement "native" quasiquote-style macros in Picoisp? (i.e. without using transient symbols, as in the link above)

The problem is that both backquote ` and comma , are aleady read-macros in PicoLisp. So technically I don't want to be writing PicoLisp code anymore - I want to be writing code in a language that is mostly PicoLisp, except that ` and , (and consequently ,@) behave as if I was writing Common Lisp code.

The Picolisp macro language

Let's call this new language the PicoLisp macro language. Macro language code lives in .m.l files. By convention all code in an "mfile" is contained in one top-level call to m.
mfiles

   # 'square.m.l'
   (m
      (mac square (X) `(,X ,X))
      ...
   )

mfiles are loaded into PicoLisp with the function mload.

   : (mload "square.m.l")  # no need for this step
   -> square               # if "following along"
   : (square 3)
   -> 9

Here's the definition of mload:

   (de mload (File)
      (run
         (cdr
            (transform
               (mread File) ) ) ) )

   (def 'MREADSTRING "_-=!?<>$*")

   # read a macro file (".m.l" by convention)
   (de mread (File)
      (in File
         (make
            (while (read MREADSTRING)
               (link @) ) ) ) )

mload reads a file, transforms it from PL macro language to normal PicoLisp, and evals all the forms. mread uses the "low-level" mode of the built-in read to read an mfile into PicoLisp as a list of symbols. Parens and ` and , are read in as transient-symbols, while everything else is read in as internal symbols. This list of symbols is then transformed into a list of actual PicoLisp code.

   (de transform (Lst)
      (any
         (glue " "
            (_transform Lst) ) ) )

   (de _transform (X)
      (recur (X Acc)
         (ifn X
            (flip Acc)
            (case (car X)
               ("`" (recurse (cdr  X) (cons (sym "`") Acc)) )
               ("," (if (= (cadr X) "@")
                        (recurse (cddr X) (cons (sym ",@") Acc))
                        (recurse (cdr X) (cons (sym ",") Acc)) ) )
               (T   (recurse (cdr  X) (cons (car X) Acc))) ) ) ) )

_transform simply recurses over the list to create a copy. Anytime ` or , or ,@ is encountered, sym is called so it remains a transient symbol. The resulting list is glueed together with spaces to create a string. The string is converted to a list with any and all forms in the cdr (discard the dummy function m) are evaled.

To break it down:

   # code in square.m.l
   (m
      (mac square (X) `(* ,X ,X))
   )

   # normal picolisp repl now
   : (mload "square.m.l")

         (mread "square.m.l")
         -> ("(" m "(" mac square "(" X ")" "`"(* "," X "," X ")" ")" ")" )

         (glue " " (_transform @))
         -> "(m ( mac square ( X ) "\"`\"" ( * "\",\"" X "\",\"" X ) ) )"

         (any @)
         -> (m (mac square (X) "`"(* ","X ","X)))

         (cdr @)
         -> ((mac square (X) "`"(* ","X ","X)))

         (mapc eval @)

   -> square

   : (square 9)
   -> 81

m1 - the (one-shot) macro repl
If mfiles are a layer over normal PicoLisp files to allow "native" use of quasiquote forms, a "macro repl" would be a layer over the normal PicoLisp repl.

   : (m1)
   m1 -- the macro repl
   : (let X 2 (quasiquote `(* ,X ,X)))^  # hat '^' to end
   -> (* 2 2)
   -> NIL   # no idea why there are two "returns"

The macro repl is easy to implement in PicoLisp.

   (de m1 ()
      (prinl "m1 -- the macro repl")
      (prin  ": ")
      (let M (till '^)  # hat '^' to end
         (prog
            (out (tmp "mrepl") (prin M))
            (eval (transform (mread (tmp "mrepl"))) ) ) ) )

All input till the hat ^ character is written to a temporary file. That file (one lisp form) is then read, transformed and evaluated.

Note that evaling a quasiquote form is functionally the same as the PL built-in macro,

   : (m1)
   m1 -- the macro repl
   : (let X 2 (eval (quasiquote `(* ,X ,X))))^
   -> 4
   -> NIL

   : (let @X 2 (macro (* @X @X)))
   -> 4

and using macro on a quoted list is the same as quasiquote.

   : (let @X 2 (macro '(* @X @X)))
   -> (* 2 2)

   : (m1)
   m1 -- the macro repl
   : (let X 2 (quasiquote `(* ,X ,X)))^
   -> (* 2 2)
   -> NIL

With mfiles and the macro repl in place, it's time to bootstrap a useful macro-writing environment!

A macro-writing macro

mac is the PicoLisp (macro language) equivalent of Common Lisp's defmacro.

   (de mac Lst                         # unevaluated args
      (let [(Nm Args . Body) Lst]      # are destructured
         (eval
            (quasiquote                # and filled in
               `(de ,Nm Lst
                  (let [,Args Lst]
                     (eval
                        (quasiquote ,@Body) ) ) ) ) ) ) )

This is fine, but there's a lot of noise in the definition. Let's clean it up a bit.
Read-macros to the rescue
The evq read-macro was created because (eval (quasiquote ...)) is such a common pattern. With evq, the definition of mac becomes:

   (de mac Lst
      (let [(Nm Args . Body) Lst]
         ~(evq
            `(de ,Nm Lst
               (let [,Args Lst]
                  ~(evq ,@Body) ) ) ) ) )

Much more concise.

Here's the definition of evq:

   # normal picolisp code

   (de eval-quasiquote Lst
      (macro
         '((eval (quasiquote ^ Lst))) ) )

   (def 'evq eval-quasiquote)

It transforms (evq ...) into (eval (quasiquote ...)) as the code is read. As such, the previous two definitions of mac are identical as far as the PicoLisp interpreter is concerned.

Note that evq returns a list wrapped in an extra layer of parens. This is because the tilde ~ read-macro is used to splice it in. Remember that the backquote ` read-macro has been shadowed in mfiles, so the ~ read-macro must be used.

Another common pattern is (let [...] (eval (quasiquote ...))). leq was created as an abbreviation. leq allows to concisely express quasiquote macros with variable bindings using the same syntax as let.

Here's the definition of leq:

   # normal picolisp code

   (de let-eval-quasiquote Lst
      (let [(Args . Body) (leqargs Lst)]
         (macro
            '((let ^ Args (eval (quasiquote ^ Body )))) ) ) )

   (de leqargs (Lst)
      (let [L (_leqargs Lst)
            I (index (find pair L) L)
            Args (head I L)
            Body (tail (- I) L)]
         (macro '((^ Args) ^ Body)) ) )

   (de _leqargs (Lst)
      (if (atom (car Lst))
         (cons (car Lst) (_leqargs (cdr Lst)))
         Lst ) )

It works the same as the evq read-macro, but requires some extra processing of arguments to correctly parse the (possibly unquoted) 'let' args and body

Using leq, the definition of macro becomes:

   # macro language code

   (de mac Lst
      ~(leq [(Nm Args . Body) Lst]
         `(de ,Nm Lst
            ~(leq [,Args Lst]
               ,@Body) ) ) )

Now that is a sharp piece of code! And once again, because leq is a read-macro that transforms the raw list structure, this definition is the same as the previous few.

Writing new macros

mac can be used to define macros (in mfiles).

   # macro language code

   (mac aif (Test Then Else)
      `(let it ,Test
            (if it ,Then ,Else) ) )

   # NOTE - 'aif' is merely an example macro and would never be used in
   # picolisp because the same functionality exists in the built-in
   # 'if'

aif is an "anaphoric" macro that captures the symbol it and binds it to the result of the Test expression so it can be refered to without having to explicitly create new variables.

   # normal picolisp code

   (de add2-safe (N)
      (aif (num? N)
         (+ 2 it)
         (prinl "ERROR - " it " is NOT a number")) )

As expected, the definition of aif has the (eval (quasiquote ...)) pattern:

   # normal picolisp code, transformed from macro language

   : (pretty aif)
   (Lst
      (let ((Test Then Else) Lst)
         (eval
            (quasiquote
               "`"(let it "," Test
                     (if it "," Then "," Else) ) ) ) ) )

mc - the macro compiler

If aif is going to be used heavily, a serious run-time performance hit will be taken. eval and quasiquote are pretty expensive operations. But there's nothing in aif's run-time code that will require quasiquote. Remember that "eval quasiquote" forms are just a different way of writing PicoLisp macros. It turns out that a "macro compiler" can be written which turns "eval quasiquote" macros into normal PicoLisp macros by "precomputing" the translation.

Well this is exactly what quasiquote does! Looking at the definition,

   (de quasiquote Lst
      (macro
         (macro
            (^(macro (_quasiquote ^ Lst))) ) ) )

the translation happens in two parts. First quasiquote passes the list to the internal function _quasiquote

   (_quasiquote
      "`"(let it "," Test
            (if it "," Then "," Else) ) )

_quasiquote does the actual transformation on the raw list structure. (see source or explanation) At this point nothing has been evaluated and _quasiquote returns the translation

   '(let it ^ (list Test)
      (if it ^ (list Then) ^ (list Else)) )

quasiquote then passes this list to macro to fill in the values. The translation (_quasiquote) is distinct from the execution (quasiquote), and thus can be "precomputed" and the macro can be redefined.

Which is what mc (the macro compiler) does. (see source) The idea is to pipe the macro definition

   (eval
      (quasiquote
         "`"(let it "," Test
               (if it "," Then "," Else) ) ) )

to low-level read to transform (eval (quasiquote ...)) forms into (macro `(_quasiquote ...))" forms (with a backquote ` read-macro), like so

   (macro
      `(_quasiquote
         "`"(let it "," Test
               (if it "," Then "," Else) ) ) )

bring this back to PicoLisp so the backquote read-macro fires

   (macro
      '(let it ^ (list Test)
         (if it ^ (list Then) ^ (list Else)) )

pipe that back to low-level read to remove the quotes from the macro lists and finally bring it back to PicoLisp again where it can run like normal

   (macro
      (let it ^ (list Test)
         (if it ^ (list Then) ^ (list Else)) )

Compiled macros
mc can be used to define a compiled version of a macro

   : (def 'aifc (mc aif))
   -> aifc

   : (pretty aif)
   (Lst
      (let ((Test Then Else) Lst)
         (eval
            (quasiquote
               "`"(let it ","Test
                     (if it "," Then "," Else) ) ) ) ) )

   : (pretty aifc)
   (Lst
      (let ((Test Then Else) Lst)
         (macro
            (let it ^ (list Test)
               (if it ^ (list Then) ^ (list Else)) ) ) ) )


Compiled macros are an order of magnitude faster.

   : (bench (do 1000000 (aif (+ 1 2) (+ 3 it))))
   2.646 sec
   -> 6

   : (bench (do 1000000 (aifc (+ 1 2) (+ 3 it))))
   0.308 sec
   -> 6

The compiler could be improved by further "compiling" to more efficient forms that don't use macro, but that is beyond the scope of this article.

A moment of reflection

So what's the point of all this? Ostensibly, it allows the use of a more convenient syntax for code transformations.

   # write this
   (mac aif (Test Then Else)
      `(let it ,Test
            (if it ,Then ,Else) ) )


   # get this (compiled)
   (de aif Lst
      (let [(Test Then Else) Lst]
         (macro
            (let it ^ (list Test)
               (if it ^ (list Then) ^ (list Else)) ) ) ) )

While it's not much of an improvement for such a simple macro, it does make it easy to write more complex macros.

   (mac defunits (Quantity Base . Units)
      `(mac! ,(symb 'unit-of- Quantity) (!Val !Unit)
         `(* ,,$Val
            (case ,,$Unit
               (,Base 1)
               ,@(mapcar
                     ~(qfn (X)      # read-macro for quasiquote fns
                        `(,(car X)
                           ,(defunits-chaining
                              (car X)
                                 (cons
                                    (quasiquote `(,Base 1))
                                    (groups 2 Units) ) ) ) )
                     (groups 2 Units) ) ) ) ) )

The best part is that the whole system is just a bunch of PicoLisp functions (FEXPRS really). Everything can be debugged and stepped, traced, etc - a deep understanding of the system can be gained.

But it could be argued whether such "macros" have any place in PicoLisp...

So maybe more realistically, the point of all this is to show just how flexible and powerful PicoLisp is. A pretty impressive (toy) Common Lisp Macro System can be added to PicoLisp with roughly 300 lines of code.

(NOTE - the following sections offer PicoLisp translations of defmacro/g!, defmacro! and defunits from Doug Hoyte's book, Let Over Lambda)

Macros and multiple evaluation

Let's return to our first macro, square (actually square-buggy if "following along")

   (mac square (X) `(* ,X ,X))

It works as expected for simple numbers,

   : (square 3)
   -> 9

but unexpected things start happening when side-effects are involved.

   : (let Y 2 (square (inc 'Y)))    # expect 9
   -> 12                            # -_-

Looking at the compiled version of square,

   : (pretty (mc square))
   -> (Lst (let ((X) Lst) (macro (* ^ (list X) ^ (list X)))))

it can be seen that X is listed (and thus evaluated) twice. So the above call was actually,

   : (let Y 2 (* (inc 'Y) (inc 'Y)))   # (* 3 4)
   -> 12

instead of the expected

   : (square 3)

One possible solution is to introduce a new anonymous symbol (with box) and then bind the argument X to it.

   (mac square (X)
      (let $X (box)
         `(let ,$X ,X
            (* ,$X ,$X) ) ) )

This works as expected.

   : (let Y 2 (square (inc 'Y)))
   -> 9

mac$
Multiple evaluation is a pretty common problem and it would be annoying to have to manually create so many boxes. Better to write another macro. mac$ is a variation of mac that scans its Body for "$ symbols" and automatically creates boxes.

   # mac$ - 'defmacro/g!' from Let Over Lambda

   (de mac$ Lst
      ~(leq [(Name Args . Body) Lst]
         `(de ,Name Lst
            (let [,Args Lst]
               ~(leq ,[boxes ($syms-in Body)]
                  ,@Body ) ) ) ) )

   (de boxes (Lst)
      (mapcan ~(qfn `(,Q1 (box))) Lst) )

   (de $syms-in (L)
      (uniq (filter $sym (flat L))) )

   (de $sym (S)
      (and (sym? S)
         (> (length S 1))
         (pre? '$ S) ) )

As Doug Hoyte writes in Let Over Lambda, mac$ allows us to say:

I want a [box] to be bound around this expression, and I've already given the symbol. Make it happen.

(see Section 3.5 in Let Over Lambda)

(NOTE - this macro was named mac$ because anonymous symbols (boxes) are visually represented as e.g. $177389156348275 in PicoLisp)

Nested backquotes

This idea of "automatic boxes" can be taken a step further. (see Section 3.6 of Let Over Lambda)
mac!

   # mac! - 'defmacro!' from Let Over Lambda

   (de mac! Lst
      (let [(Name Args . Body) Lst]
         ~(leq [!Syms (!syms-in Args) $Syms (!syms-to-$syms !Syms)]
            `(de ,Name Lst
               (let [,Args Lst]
                  ~(leq ,[boxes ($syms-in Body)]
                     `(let ,,[letargs (list ,@$Syms) (list ,@!Syms)]
                        ~(evq ,@Body) ) ) ) ) ) ) )

   (de !syms-in (L)
      (uniq (filter !sym (flat L))) )

   (de !sym (S)
      (and (sym? S)
         (> (length S 1))
         (pre? '! S) ) )

   (de !sym-to-$sym (S)
      (any (pack '$ (cdr (chop S)))) )

   (de !syms-to-$syms (Syms)
      (mapcar !sym-to-$sym Syms) )

   (de letargs (Xs Ys)
      (mapcan list Xs Ys) )

mac! is used like this:

  (mac! square (!X)    # args prefixed with '!' are evaluated once only
     `(* ,$X ,$X) )    # and referred to as prefixed with '$'

mac! accomplishes this by gathering the !Args and creating the corresponding $Syms. It then searches the body of the macro being defined for any additional $Syms. All the $Syms are then bound to boxes in the expansion. Finally - when the defined macro is called (hence the nested backquotes) - the result of evaluting the provided !Args are bound to the boxes.

In other words, mac! creates a bunch of variable bindings in the macro it defines (some of which are are finally filled in during that macro's run-time) to provide a convenient way of handling variable capture and execution when writing macros.
defunits
defunits was shown earlier, but here it is again. (see Section 5.2 of Let Over Lambda)

   (mac defunits (Quantity Base . Units)
      `(mac! ,(symb 'unit-of- Quantity) (!Val !Unit)
         `(* ,,$Val
            (case ,,$Unit
               (,Base 1)
               ,@(mapcar
                     ~(qfn (X)
                        `(,(car X)
                           ,(defunits-chaining
                              (car X)
                                 (cons
                                    (quasiquote `(,Base 1))
                                    (groups 2 Units) ) ) ) )
                     (groups 2 Units) ) ) ) ) )

   (de defunits-chaining (U Units)
      (let Spec (assoc U Units)
         (ifn Spec
            (prinl (text "Unknown unit @1" U))
            (let Chain (cadr Spec)
               (if (pair Chain)
                  (* (car Chain)
                     (defunits-chaining
                        (cadr Chain)
                        Units ) )
                  Chain ) ) ) ) )

defunits is a macro used to define relationships between similar types of quantities (e.g. quantities of time - seconds, minutes, days, milliseconds, etc.)

   : (defunits time s
         m 60        # minutes
         h (60 m)    # hours
         d (24 h)    # days
         y (365 d)   # years
         C (100 y)   # centuries
         M (10 C) )  # millenia
   -> unit-of-time

   # 10,000 millenia  is a lot of seconds
   : (unit-of-time 10000 'M)
   -> 315360000000000

   : (defunits distance-lol in
         ft 12
         yd (3 ft)
         fathom (2 yd)
         mile (5280 ft)
         rod (15 ft)
         furlong (40 rod)
         chain (4 rod)
         league (3 mile)
         cubit 18 )
   -> unit-of-distance-lol

   # compile it
   (def 'unit-of-distance-lol (mc unit-of-distance-lol))

   : (+ (unit-of-distance-lol 156 'rod)
        (unit-of-distance-lol 17  'league)
        (unit-of-distance-lol 3   'cubit) )
   -> 3259494 # inches

Nested backquotes allow for general patterns like this:

You're writing a macro (the parent) that writes a macro (the child) and you need to refer to the (future) run-time arguments of the child macro right now (in the definition of the parent).

More specifically, defunits is a macro that writes another macro (unit-of-[quantity]) that captures user-defined relationships as basic multiplication (using a case statement). That's a fancy way of saying there are 60 seconds in a minute and in order to find out how many seconds are in 17 minutes we need to multiply (* 17 60).

Right now, however, we're writing a macro that defines another macro which defines relationships between (as-of-now) unknown quantities. But we know that whatever they are, we want to multiply them to get to some number of some base unit. So we use nested backquotes (and "stacked" unquotes) to name these abstract somethings and write code with them now.

As a catchphrase:

"when a macro-writing (parent) macro has run-time arguments (from the child) to use, nested backquotes will do!"
Differences to Common Lisp
Regarding (nested) backquotes - each quasiquote call "removes" only one backquote (typically the car of Lst), while processing "one level" of 'unquote' and 'unqoute-splice's (nested arbitrarily deep) in Lst.

   : (m1)
   m1 -- the macro repl
   : (let X 2
      (quasiquote
         `(* ,X `(+ ,,X ,@(3 4 5))) ) )

   -> (* 2 "`"(+ ","X 3 4 5))

It is what it is, I guess. Seems easy enough to reason about and still allows for nested backquote macros that work as expected (though the exact syntax differs from common lisp).

https://picolisp.com/wiki/?metaprogrammingexperiments

13dec20    erik
Revision History