MiniPicoLisp Code in ROM

MiniPicoLisp is a good candidate for embedded systems. It stores s-expressions in a more compact form than standard pil32 (let alone pil64), omits some features, doesn't require POSIX, and runs faster because of its shortnum-only arithmetics.

PicoLisp is an interpreter. It reads and executes source files passed as command line arguments, which consist of s-expressions in textual representation, and which in turn cause code- and data-structures to be defined (with de, setq, class etc.) in the Lisp heap for later execution.

RAM and ROM

The Lisp heap is normally allocated in big chunks of system memory. It holds the Lisp cells (cons-pairs and symbols) for both code and data, and is managed by the garbage collector. This memory must be both readable and writable to function correctly (random access memory, RAM).

Unfortunately, RAM is a scarce resource on embedded systems. It is usually much smaller than the available ROM (read-only memory). Binary programs and libraries, as produced by a C-compiler, are stored in ROM and not modified thereafter. This includes the binary of the PicoLisp interpreter itself.

But most of the Lisp code in an embedded system won't be modified after it is debugged, and it would be nice to have a way to move it into ROM too. This would spare the precious RAM for runtime data and the stack.

A C-Preprocessor

However, this is not as straightforward as it may seem. The interpreter's C-code makes certain assumptions about how the memory is organized, and the garbage collector must know how to deal with regions of read-only memory. The s-expressions - which are graphs of cells and pointers - must eventually be understood and compiled by the C-compiler, and integrated into the binary of the PicoLisp interpreter.

I solved this by writing a special preprocessor program in C. It contains parts of the PicoLisp read functionality to be able to read Lisp source code, and produces C code fragments which are then #included in the C sources by the build procedure in the Makefile.

Overview of the Build Process

The preprocessor is called gen3m, because it generates three "M" data files ("sym.d", "rom.d" and "ram.d").

Makefile first builds the gen3m executable. This is necessary because gen3m must be aware of the architecture (cell sizes differ on 32 or 64 bit machines). Therefore, the first argument to gen3m specifies the size (32 or 64, or 0 to use the size of the host environment). Then gen3m reads its source files (currently "init.s", "lib.s" and "pilog.s") and builds the three "M" data files. The other PicoLisp source files are compiled in the normal way.

The file "init.s" replaces the file "tab.c", which used to initialize the PicoLisp kernel symbols. "lib.s" contains initial symbol definitions, and is taken from the standard PicoLisp "lib.l" file.

You can pass more *.s files to gen3m. The idea is that you debug your embedded library functions one by one, and then move them from their normal *.l source file to a *.s file, to be included in the binary (and thus in ROM).

The system symbols NIL, T and quote are created in ROM, and are thus immutable.

All other symbols are always generated in RAM structures. This is necessary so that their values can still be modified (e.g. to trace a function), or that the can receive properties via put (which is for example needed for the Pilog (PicoLisp Prolog) functionality). They won't take too much RAM space, as each symbol needs only a single cell (8 bytes on a 32-bit system), no matter how long its name or value is.

In addition to the single cell, each symbol (except transient symbols) takes up another one or two cells for the symbol table. The symbol table (a tree, to be precise) resides also in RAM, allowing symbols to be zapped at runtime.

gen3m Source File Syntax

These *.s source files contain only symbols and their definitions. The are written in normal PicoLisp syntax, though not with de etc, but as alternating sequences of symbols and their values.

Read-macros are not supported. A read-macro requires the Lisp runtime environment, which is not available when this preprocessor runs. For the same reason, this is not a read-eval-loop, so no evaluation takes place (not even de or setq).

Separated by white space and comments, the source files should contain tuples of the form: Here is how the system global *Scl is defined:
   *Scl [Scl] 0
That means: The symbol has the name "*Scl" on the Lisp level, it can be referred to by C-code as Scl, and the initial value is zero.

The function recur is defined as:
   recur (recurse
      (run (cdr recurse)) )
The symbol is "recur" on the Lisp level. It doesn't need a C identifier in brackets, and its value (function definition in this case) is the expression
   (recurse (run (cdr recurse)))


As a special case, which is not standard PicoLisp syntax, is when the value is an identifier surrounded by braces:
   append {doAppend}
Then this identifier should be a function written somewhere in the C-code. This example defines the Lisp append function.

Other Advantages

Writing MiniPicoLisp functions in C is a little bit easier now, because it is no longer necessary to specify both the prototype in "pico.h" and the initializer in "tab.c". These are generated by the preprocessor, so just a single entry in the *.s file is sufficient.

Less Redundancy

Another (theoretical) advantage is that symbols specified with a C-label (most notably the frequently-used symbols T and NIL) are now no longer variables on the C-level, but constants. This results in smaller and faster code.

Cell Sharing

Sub-structures of s-expressions which are equal to each other are shared. They are created just once in ROM, and my be pointed to from other places. This simple optimization (structure-detection) can be done by the preprocessor, because it is not time-critical.

Shortcomings

In addition to the fact mentioned above, that read-macros cannot be used in the sources, care must be taken with circular structures.

You can still specify circular lists with the PicoLisp dot-notation in ROM code. However, certain processing of circular lists like printing, or the length and size calculations, require write access - for marking - to the cells of these lists. So the above functionalities are not able to detect circular structures, and may go into infinite loops.

As a matter of principle, functions in ROM cannot be single-stepped with debug any more, as they cannot be modified. In general, any attempt to modify a function definition in ROM (or, as a matter of fact, any data structure in ROM) will result in an immediate Segmentation fault.
   : (debug 'msg)
   Segmentation fault
(This is system-dependent. The above is just Unix' way of telling us that we tried a access const data. On an embedded system in fact nothing may happen at all)

But any function - despite its definition being in ROM - can be redefined, and thus also traced.
   : (redef msg (X)
      (msg '<1>)
      (msg X)
      (msg '<2>) )
   -> "msg"

   : (msg 'abc)
   <1>
   abc
   <2>
   -> <2>

   : (trace '+)
   -> +
   : (+ 1 2 3)
    + : 1 2 3
    + = 6
   -> 6


Another disadvantage might be a slightly slower execution speed, depending on the ROM hardware.

Example

Say you defined a function in one of your Lisp sources as
   # Fibonacci function
   (de fibo (N)
      (if (>= 2 N)
         1
         (+ (fibo (dec N)) (fibo (- N 2))) ) )
How much space does this function take up?
   : (size fibo)
   -> 22
It occupies 22 cells. On a 32-bit machine, each cell has 8 bytes, so it is 176 bytes in total. The symbol fibo has a rather short name, so it fits into a single cell. Together with 1 and a half cell (statistically) for the index tree we have 196 bytes (we assume here that the symbol N is also used in other structures, and was thus not created exclusively for fibo).

Now remove the above definition from your Lisp source, and add to your *.s file
   # Fibonacci function
   fibo ((N)
      (if (>= 2 N)
         1
         (+ (fibo (dec N)) (fibo (- N 2))) ) )
Note that the only changes necessary are:
  1. delete the de symbol
  2. move the open parenthesis to the right
After you rebuilt the executable, fibo works just as before
   : (pp 'fibo)
   (de fibo (N)
      (if (>= 2 N)
         1
         (+ (fibo (dec N)) (fibo (- N 2))) ) )
   -> fibo

   : (fibo 22)
   -> 17711

   : (trace 'fibo)
   -> fibo

   : (fibo 3)
    fibo : 3
     fibo : 2
     fibo = 1
     fibo : 1
     fibo = 1
    fibo = 2
   -> 2
Only 20 of the 196 bytes stay in RAM (for the fibo symbol), the rest is moved off to ROM.

Download Link

MiniPicoLisp can, as before, be downloaded from miniPicoLisp.tgz.

It is meant just as an example. For concrete applications you should modify (reduce and extend) it, as needed for the target platform.

https://picolisp.com/wiki/?minicoderom

30nov17    abu