Pretty-Print and Source Code Indentation

Yesterday I rewrote the PicoLisp 'pretty' printing function from scratch, so this is a good time to explain a few things about how PicoLisp code is formatted.

In Lisp, white space in S-expressions is ignored, except as a delimiter between atoms. Source code formatting is therefore not mission-critical, and programmers are free to format their code to their own taste. On the other hand, published code is easier for others to read if it looks somewhat familiar.

Basic Algorithm

For historical reasons, the code in PicoLisp source files (and the output of the pretty-printer) follows a different style than that of most other Lisps. I won't elaborate on the differences here, but the basic rule is simple:

1. If an expression is atomic or has a size less than or equal to 12, 'print' it directly.

2. Otherwise, print a left parenthesis, recurse on the CAR, then recurse on the elements of the CDR, placing each on a new line indented by 3 additional spaces, and finally print a right parenthesis preceded by a space.
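The two rules are small enough to model in a few lines. The following is a Python sketch, not PicoLisp's implementation: it assumes S-expressions are nested Python lists of strings, and the helper names `size`, `to_str` and `pp` are hypothetical. `size` counts list cells the way PicoLisp's 'size' does for lists. Each call returns a list of output lines, whose first line has no indentation because it continues the line the caller is already on.

```python
# Sketch of the basic rule only (no special cases). Assumption: S-expressions
# are modeled as nested Python lists whose atoms are strings.

def size(x):
    """Cells in a list plus those of all its sublists (atoms count 0 here)."""
    if not isinstance(x, list):
        return 0
    return len(x) + sum(size(e) for e in x)


def to_str(x):
    """Print an expression flat, on one line."""
    if isinstance(x, list):
        return "(" + " ".join(to_str(e) for e in x) + ")"
    return x


def pp(x, indent=0):
    """Return the output lines for x. The first line carries no indentation
    because it continues the caller's current line; the CDR elements of x
    are placed at column indent + 3."""
    # Rule 1: atoms and small lists are printed directly.
    if not isinstance(x, list) or size(x) <= 12:
        return [to_str(x)]
    # Rule 2: "(" and the CAR on the current line. If the CAR is itself a
    # large list, its own CDR elements end up at column indent + 6.
    head = pp(x[0], indent + 3)
    lines = ["(" + head[0]] + head[1:]
    # Each CDR element on a new line, 3 spaces further in.
    for elem in x[1:]:
        sub = pp(elem, indent + 3)
        lines.append(" " * (indent + 3) + sub[0])
        lines.extend(sub[1:])
    # Closing parenthesis after the last element, preceded by a space.
    lines[-1] += " )"
    return lines


expr = ["cond",
        [["and", ["foo", "a", "b", "c", "d"], ["bar", "e", "f", "g", "h"]], ["abc"]],
        [["or", ["foo", "i", "j", "k", "l"], ["bar", "m", "n", "o", "p"]], ["def"]]]
print("\n".join(pp(expr)))
```

Applied to this 'cond' expression, the sketch reproduces the layout of the 'cond' example further down, including the six-space indentation below the two opening parentheses.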

The original pretty-printer from the mid-80s ("8kLisp" on CP/M Z80 - not yet PicoLisp, so please ignore the details) looked like this:
   (de pretty (x l)
      (reptn l
         (reptn 4 (sp)) )
      (if (lessp (depth x) 3)
         (prin1 x)
         (putc 40)
         (while
            (and
               (member
                  (prin1 (pop x))
                  '(setq put get if when unless while until reptn))
               (lessp (depth (car x)) 3) )
            (sp) )
         (while x
            (cr)
            (pretty (pop x) (1+ l)) )
         (sp)
         (putc 41) ) )

   (de pp $x
      (pretty (getd (car $x)) 0)
      (cr) )
That's all! As you can see, it already extends the above algorithm (still using 'depth' instead of 'size') to handle certain symbols like 'if' and 'when'. After all, we don't want
   (if
      (condition)
      (trueExpr)
      (falseExpr1)
      (falseExpr2) )
but rather
   (if (condition)
      (trueExpr)
      (falseExpr1)
      (falseExpr2) )
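This extension can be sketched on top of the basic rule, again in Python with S-expressions as nested lists. The symbol set is copied from the 8kLisp code above; treating "small" as size <= 12 (instead of 'depth'), and keeping only a single argument on the head line, are assumptions of this sketch. Since the schematic example above would fit on one line, the arguments below are padded so that the whole expression exceeds the size limit.

```python
# Hypothetical sketch: basic rule plus the 'if'/'when' special case.
# Symbols whose first argument stays on the head line (from the 8kLisp code).
KEEP_FIRST_ARG = {"setq", "put", "get", "if", "when", "unless", "while", "until", "reptn"}


def size(x):
    if not isinstance(x, list):
        return 0
    return len(x) + sum(size(e) for e in x)


def to_str(x):
    if isinstance(x, list):
        return "(" + " ".join(to_str(e) for e in x) + ")"
    return x


def pp(x, indent=0):
    if not isinstance(x, list) or size(x) <= 12:
        return [to_str(x)]
    head = pp(x[0], indent + 3)
    lines = ["(" + head[0]] + head[1:]
    rest = x[1:]
    # Special case: keep a small first argument on the same line as the CAR.
    if isinstance(x[0], str) and x[0] in KEEP_FIRST_ARG and rest and size(rest[0]) <= 12:
        lines[-1] += " " + to_str(rest[0])
        rest = rest[1:]
    for elem in rest:
        sub = pp(elem, indent + 3)
        lines.append(" " * (indent + 3) + sub[0])
        lines.extend(sub[1:])
    lines[-1] += " )"
    return lines


expr = ["if", ["condition"],
        ["trueExpr", "a", "b", "c"],
        ["falseExpr1", "d", "e", "f"],
        ["falseExpr2", "g", "h", "i"]]
print("\n".join(pp(expr)))
```

The `isinstance` check matters: when the CAR is itself a list (as in a 'cond' clause), it is not hashable and cannot be looked up in the set.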
Over the years, this function grew heuristically, accumulating more and more rules and exceptions. It became hard to maintain, and sometimes gave unexpected results.

Use Cases

The new implementation follows the same basic principles, but is more modular and flexible in handling the individual cases.

Most of the confusion seems to arise with large 'let' expressions. For example, the binding list in
   (let (A (foo a b c d)  B (bar e f g h)  C (mumble i j k l))
      ... )
is larger than 12 in size
   : (size '(A (foo a b c d)  B (bar e f g h)  C (mumble i j k l)))
   -> 21
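For lists, 'size' returns the total number of cells in the list and all of its sublists. The binding list has 6 top-level cells, and each of the three calls contributes 5 more, so 6 + 3 × 5 = 21. A minimal Python model of that count follows; it handles lists only (PicoLisp's 'size' also accepts atoms, which this sketch simply counts as 0).

```python
def size(x):
    """Cells in a list plus, recursively, those in all of its sublists."""
    if not isinstance(x, list):
        return 0  # simplification: atoms contribute no cells
    return len(x) + sum(size(e) for e in x)


bindings = ["A", ["foo", "a", "b", "c", "d"],
            "B", ["bar", "e", "f", "g", "h"],
            "C", ["mumble", "i", "j", "k", "l"]]
print(size(bindings))  # 21
```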
and thus should be indented. Until now, it appeared as
   : (pretty '(let (A (foo a b c d)  B (bar e f g h)  C (mumble i j k l))
      ... ) )
   (let
      (A
         (foo a b c d)
         B
         (bar e f g h)
         C
         (mumble i j k l) )
      ... )
which is rather hard to read. In the new version, it is displayed in the "right" way:
   (let
      (A (foo a b c d)
         B (bar e f g h)
         C (mumble i j k l) )
      ... )
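One way to express this rule is a separate case for the binding list: print it two elements per line, with the first pair right after the opening parenthesis and every further pair indented by three spaces. The Python sketch below is an assumption about the rule, not the actual PicoLisp code (`pp_bindings` is a hypothetical name); in particular, it always prints each symbol-value pair flat on one line, even when the value itself is large.

```python
# Hypothetical sketch: basic rule plus a special case for 'let' binding lists.

def size(x):
    if not isinstance(x, list):
        return 0
    return len(x) + sum(size(e) for e in x)


def to_str(x):
    if isinstance(x, list):
        return "(" + " ".join(to_str(e) for e in x) + ")"
    return x


def pp_bindings(b, indent):
    """Print a large binding list two elements (symbol and value) per line."""
    if size(b) <= 12:
        return [to_str(b)]
    # Simplification: each pair is printed flat, whatever the value's size.
    pairs = [" ".join(to_str(e) for e in b[i:i + 2]) for i in range(0, len(b), 2)]
    lines = ["(" + pairs[0]] + [" " * (indent + 3) + p for p in pairs[1:]]
    lines[-1] += " )"
    return lines


def pp(x, indent=0):
    if not isinstance(x, list) or size(x) <= 12:
        return [to_str(x)]
    head = pp(x[0], indent + 3)
    lines = ["(" + head[0]] + head[1:]
    for pos, elem in enumerate(x[1:], start=1):
        if pos == 1 and x[0] == "let" and isinstance(elem, list):
            sub = pp_bindings(elem, indent + 3)  # special case
        else:
            sub = pp(elem, indent + 3)           # basic rule
        lines.append(" " * (indent + 3) + sub[0])
        lines.extend(sub[1:])
    lines[-1] += " )"
    return lines


expr = ["let",
        ["A", ["foo", "a", "b", "c", "d"],
         "B", ["bar", "e", "f", "g", "h"],
         "C", ["mumble", "i", "j", "k", "l"]],
        "..."]
print("\n".join(pp(expr)))
```

Keeping the special case in its own function, rather than adding another condition inside the generic loop, is one way to get the modularity described above.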
Note that the original convention still holds: the CDR elements are indented by three additional spaces.

"Double" Indentation

It follows from this rule that if two left parentheses are opened on one line, the following lines must be indented by six additional spaces. This is often the case in 'cond' expressions:
   : (pretty
      '(cond
         ((and (foo a b c d) (bar e f g h)) (abc))
         ((or (foo i j k l) (bar m n o p)) (def)) ) )
   (cond
      ((and
            (foo a b c d)
            (bar e f g h) )
         (abc) )
      ((or
            (foo i j k l)
            (bar m n o p) )
         (def) ) )


Source Code

For indenting source code in the editor, I use the following script:
   #!/usr/bin/picolisp /usr/lib/picolisp/lib.l
   # 30nov13abu

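   (let Lst
      (mapcar                       # Turn each line into (N . chars),
         '((L)                      # N being its number of leading blanks
            (let N 0
               (while (and L (sp? (car L)))
                  (inc 'N)
                  (pop 'L) )
               (cons N L) ) )
         (trim (split (in NIL (till)) "^J")) )  # All input lines
      (let (N (caar Lst)  Sup N  Str)  # Start at the first line's indentation
         (for L Lst
            (set L N)               # Indentation for this line
            (while (setq L (cdr L))  # Scan the line's characters
               (case (car L)
                  ("\\" (pop 'L))    # Skip an escaped character
                  ("\"" (onOff Str))  # Enter or leave a string
                  ("#" (or Str (off L)))  # Comment: ignore rest of line
                  ("(" (or Str (inc 'N 3)))
                  (")" (or Str (dec 'N 3)))
                  ("["              # Super parenthesis: save current level
                     (unless Str
                        (push 'Sup N)
                        (inc 'N 3) ) )
                  ("]" (or Str (setq N (pop 'Sup)))) ) ) ) )  # Restore it
      (for L Lst                    # Print the re-indented lines
         (space (pop 'L))
         (prinl L) ) )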

   (bye)

   # vi:et:ts=3:sw=3
I have it in my execution path under the name 'pilIndent'. It indents correctly according to the above rules, but that is all it does: it does not change line breaks or white space within a line.
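The script's rule can be restated as: each line is indented 3 spaces further for every '(' or '[' still open at the end of the preceding lines, and a ']' jumps back to the level saved at its '['. The following Python port is a sketch for illustration only (the name `pil_indent` is hypothetical, and it works on a string instead of standard input). Like the script, it ignores parentheses inside strings and after a '#' comment, skips backslash-escaped characters, and keeps the first line's indentation.

```python
def pil_indent(text):
    """Re-indent PicoLisp code following the logic of the pilIndent script."""
    lines = text.split("\n")
    while lines and lines[-1] == "":
        lines.pop()  # like 'trim': drop trailing empty lines
    if not lines:
        return ""
    bodies = [line.lstrip() for line in lines]
    level = len(lines[0]) - len(bodies[0])  # keep the first line's indentation
    saved = [level]                         # levels saved by '[' (like Sup)
    in_string = False
    result = []
    for body in bodies:
        result.append(" " * level + body)   # level computed from preceding lines
        i = 0
        while i < len(body):
            c = body[i]
            if c == "\\":
                i += 1                      # skip the escaped character
            elif c == '"':
                in_string = not in_string
            elif not in_string:
                if c == "#":
                    break                   # rest of the line is a comment
                elif c == "(":
                    level += 3
                elif c == ")":
                    level -= 3
                elif c == "[":
                    saved.append(level)
                    level += 3
                elif c == "]" and saved:
                    level = saved.pop()     # close everything back to the '['
            i += 1
    return "\n".join(result)


messy = "(cond\n((and\n(foo a b c d)\n(bar e f g h) )\n(abc) )"
print(pil_indent(messy))
```

Because two '(' on the "((and" line raise the level by 6, the port produces the same double indentation as 'pretty' for this fragment.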

Since I use vim, I remapped the comma key in my .vimrc:
   map , !}pilIndent<CR>
because in vi the comma just repeats the last single-character search (f, F, t or T) in the opposite direction, and it is far too convenient a key to waste on such seldom-used functionality.

Now, hitting the comma key on any line causes the code block (the "paragraph", i.e. up to the next empty line) to be properly indented, taking the relevant PicoLisp syntax into account.

Calling pretty from vim

It is easy to call pretty on the source code directly, without a separate script. Just position the cursor at the beginning of an S-expression and type
   !%pil -'pretty (read)' -bye
Note, however, that this is often not what you want. Because of the '%', vim pipes the parenthesized expression to pil, which reads it from standard input with 'read'; 'pretty' then writes the result to standard output, which vim uses to replace the original text. The drawbacks are that 'read' expands any read macros (so the macros themselves are lost) and removes comments.

Another problem with this simple approach is that it works correctly only for expressions starting on their own line.

For pretty-printing arbitrary sub-expressions - i.e. expressions determined by the editor's cursor position - I installed the following executable script named "pilPretty":
   #!/usr/bin/picolisp /usr/lib/picolisp/lib.l
   # 15jun14abu

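   (let N 0
      (in NIL
         (do (dec (format (opt)))     # Copy the chars before the cursor column,
            (inc                      # counting 3 for ( or [ and 1 otherwise
               'N
               (if (sub? (prin (char)) "([") 3 1) ) )
         (pretty (read) (- N))        # Pretty-print the expression at the cursor
         (echo) ) )                   # Copy any trailing text unchanged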
   (bye)

   # vi:et:ts=3:sw=3
It expects the current cursor column as an argument and copies the characters before that column to the output unchanged, naively counting parentheses and brackets along the way to calculate the proper indentation. Then it reads the expression at that position, pretty-prints it, and echoes any trailing text.

In my .vimrc I defined a mapping for F11:
   map <F11> %mz%:execute ".,'z!pilPretty" col(".")<CR>
The sequence %mz% is a kludge: the first % jumps to the end of the expression, mz sets the mark z there, and the second % jumps back to its start. The mapping then builds a filter command for the lines from the current one up to mark z, calling pilPretty with the current cursor column as argument, and passes it to execute. There is perhaps a better way; I asked in #vim, but didn't get a satisfactory alternative.

http://picolisp.com/wiki/?prettyprint

09apr17    rowanthorpe