Array Abstinence

Some people are unhappy with the situation that PicoLisp has no built-in array (or vector) data type. As described in the reference manual, the PicoLisp Machine supports exactly three data types: Numbers, Symbols and Lists.

But what about arrays? After all, any other language supports arrays! We are so very much used to them.

If we look at a typical Fortran, C or Java program, we find almost every second statement to be something like (e.g. in Java):
   for (i = 0; i < Arr.length; ++i)
It is nearly impossible to survive in such languages without arrays.

What is an Array?

An array is a wonderful data structure.

It implements a mapping of numeric keys (integers) to arbitrary data. Given a number, the associated data can be found in O(1) time. And the keys themselves do not take up any memory space, they exist implicit in the data offsets form the array's start address.


PicoLisp has lists.

A list can simulate an array, albeit less efficiently. It takes up twice the space, and indexed access to its data takes O(N/2) time on the average.

But lists have a lot of advantages, and are also more efficient than arrays in many other cases. We do not need to discuss these advantages here in detail, any Lisp programmer will know them.

As PicoLisp strives for simplicity, there is no room for an additional data type - also for hard physical reasons, as the number of precious tag bits is limited. To fully support an array data type, all those powerful built-in list manipulation functions would need array counterparts, and the programmer would permanently need to make design decisions whether to use lists or arrays for the problem at hand.

If there is a choice between arrays or lists, lists will clearly win.

Are Arrays Really Needed?

My short answer would be "No". The advantages in efficiency do not outweigh the disadvantages of increased complexity.

I would even say that the proponents of arrays in PicoLisp misunderstand some aspects of PicoLisp programming.

The predominance of arrays in other languages, as shown with the above example
   for (i = 0; i < Arr.length; ++i)
does not carry over to PicoLisp. Such constructs are simply not used. Instead, you have a large number of mapping functions at your disposal:
   (mapc doSomething Arr)
In fact, it is the lack of functionality for the direct manipulation of compound data, which requires the excess indexing into arrays in other languages.

The mistake is mixing up two separate concepts:
  1. Sequential access to certain pieces of data
  2. Mapping integer keys to data items
What arrays are really needed for is point (2): You have an integer, and want to get the corresponding piece of data.

Point (1), however, can and should be handled directly and elegantly with mapping and other access functions.

So back to point (2). When in our programming life do we really need to map integers to data?

These cases are surprisingly rare. We have to keep in mind that for arrays only the mapping of continuous integer ranges makes sense. Most practical tasks of mapping numbers to other data involve sparse or non-linear input data, or non-integers, or no numeric keys at all.

Sure, there are typical cases like image rasterization. But when did you the last time implement Bresenham's line algorithm in application programming? Instead, you resort to a library, or write it in C or even assembly if a library is not available.

In other cases - like two-dimensional maps - there are better ways. Look for example how the board in the PicoLisp chess program (and other games and many rosetta code solutions) is implemented with direct connection attributes (north, west etc.) between the fields instead of integer arithmetics for array indexes.

If mapping integer keys to data items is really needed, you can also use the 'enum' function which maintains a binary tree with implicit (possibly sparse) integer keys. The tree occupies on the average one and a half cells per node, just a little more than a linear list.

For more flexible (not only integer, or even numeric) key mapping, built-in mechanisms like association lists, symbol properties or binary 'idx' trees can be used. Agreed, these mechanisms are less efficient than simple integer-indexed arrays, but the difference is not very dramatic.

Relative Performance Consideration

What would be the performance penalty for using lists instead of arrays, if we would really need a mapping of a continuous integer range to other data?

Let's assume an integer array of length 100, initialized to increasing values:
   int Arr[100];

   for (i = 0; i < 100; ++i)
      Arr[i] = i;
In PicoLisp, this is equivalent to
   (setq List (range 1 100))

Now let's increment each element:
   for (i = 0; i < 100; ++i)
The PicoLisp version would be
   (map inc List)
but let's assume for a moment we have no mapping function, and insist on indexed access instead:
   (for N 100
      (inc (nth List N)) )
'nth' is PicoLisp's idea of an indexed array access.

Let's compare that to an analog O(1) access, by repeatedly accessing the first list element:
   (for N 100
      (inc (nth List 1)) )
The time difference will be the relative overhead for the O(N/2) access to indexed list elements.

To get measurable timing results, we do each test ten thousand times. First the accesses to all list elements:
   : (bench
      (do 10000
         (for N 100
            (inc (nth List N)) ) ) )
   0.282 sec

Then the "simulation of an array data type", by accessing only the first element:
   : (bench
      (do 10000
         (for N 100
            (inc (nth List 1)) ) ) )
   0.121 sec

We can see, the results are in the same order of magnitude. The difference of 161 milliseconds was the time actually spent in list traversals.

In fact, the greatest part of execution time is taken by the interpreter overhead anyway. Compare this to a full-list increment using 'map':
   : (bench
      (do 10000
         (map inc List) ) )
   0.076 sec
Of course the difference gets bigger if the list gets longer, but typically there will be also much more processing of the data than simple increments.

Regarding all that, it should become clear that wasting tag bits and efforts for such a data type and its associated functions is not a good idea.

01dec22    abu
Revision History