PicoLisp Wiki: tutdb

Database Programming

There is more to a database than just persistence. PicoLisp includes an entity/relation class framework (see also Database) which allows a close mapping of the application data structure to the database.

We provide a simple yet complete database and GUI demo application in "doc/family.l". We recommend starting it for test purposes in the following way:

   $ ./dbg doc/family.l -main
   :

This loads the source file, initializes the database by calling the 'main' function, and prompts for user input.

The data model is small and simple. We define a class '+Person' and two subclasses '+Man' and '+Woman'.

   (class +Person +Entity)

'+Person' is a subclass of the '+Entity' system class. Usually all objects in a database are of a direct or indirect subclass of '+Entity'. We can then define the relations to other data with the 'rel' function.

   (rel nm (+Need +Sn +Idx +String))      # Name

This defines the name property ('nm') of a person. The first argument to 'rel' is the name of the relation. It is followed by a list of relation classes (subclasses of '+relation') and, optionally, further arguments. As a result, a relation daemon object is created and stored in the class definition. These daemon objects control the entity's behavior later at runtime.

Relation daemons are a kind of metadata, controlling the interactions between entities, and maintaining database integrity. Like other classes, relation classes can be extended and refined, and in combination with proper prefix classes a fine-grained description of the application's structure can be produced.

Besides primitive relation classes, like '+Number', '+String' or '+Date', there are

- relations between entities, like '+Link' (unidirectional link), '+Joint' (bidirectional link) or '+Hook' (object-local index trees)
- relations that bundle other relations into a single unit ('+Bag')
- a '+List' prefix class
- a '+Blob' class for "binary large objects"
- prefix classes that maintain index trees, like '+Key' (unique index), '+Ref' (non-unique index) or '+Idx' (full text index)
- prefix classes which in turn modify index class behavior, like '+Sn' (modified soundex algorithm for tolerant searches, see Donald E. Knuth: "The Art of Computer Programming", Vol.3, Addison-Wesley, 1973, p. 392)
- a '+Need' prefix class, for existence checks
- a '+Dep' prefix class controlling dependencies between other relations

In the case of the person's name ('nm') above, the relation object is of type '(+Need +Sn +Idx +String)'. Thus, the name of each person in this demo database is a mandatory attribute ('+Need'), searchable with the soundex algorithm ('+Sn') and a full index ('+Idx') of type '+String'.
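As a sketch of how these classes combine, consider a hypothetical '+Item' class. It is not part of "doc/family.l"; the names '+Item', '+Supplier', 'nr', 'nm' and 'sup' are made up for illustration only.

   (class +Item +Entity)
   (rel nr (+Need +Key +Number))          # Mandatory, unique item number
   (rel nm (+Ref +String))                # Name, with a non-unique index
   (rel sup (+Link) (+Supplier))          # Unidirectional link to a supplier

Each 'rel' line reads the same way: the relation name, then the list of prefix classes ending with the base relation class, then any arguments that class needs (here the target type '(+Supplier)' for the '+Link').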

   (rel pa (+Joint) kids (+Man))          # Father
   (rel ma (+Joint) kids (+Woman))        # Mother
   (rel mate (+Joint) mate (+Person))     # Partner

The attributes for father ('pa'), mother ('ma') and partner ('mate') are all defined as '+Joint's. A '+Joint' is probably the most powerful relation mechanism in PicoLisp; it establishes a bidirectional link between two objects.

The above declarations say that the father ('pa') attribute points to an object of type '+Man', and is joined with that object's 'kids' attribute (which is a list of joints back to all his children).

The consistency of '+Joint's is maintained automatically by the relation daemons. These become active whenever a value is stored to a person's 'pa', 'ma', 'mate' or 'kids' property.

For example, interesting things happen when a person's 'mate' is changed to a new value. First, the 'mate' property of the old mate's object is cleared (she has no mate after that). If the person pointed to by the new value already has a mate, then that mate's 'mate' property is cleared as well. Finally, the two happy new mates get their joints set to point to each other.

The programmer doesn't have to care about all that. He just declares these relations as '+Joint's.
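The behavior described above could be tried out roughly as follows. This is only a sketch: it assumes the demo database is open, and the three persons "Adam", "Eve" and "Lilith" are hypothetical objects created just for the experiment (they are added to the database). Return values that depend on the database state are omitted.

   : (setq A (new! '(+Man) 'nm "Adam"))
   : (setq B (new! '(+Woman) 'nm "Eve"))
   : (setq C (new! '(+Woman) 'nm "Lilith"))
   : (put!> A 'mate B)                    # Only one side is set explicitly
   : (== A (get B 'mate))                 # The joint has set the other side
   -> T
   : (put!> A 'mate C)                    # Change to a new mate
   : (get B 'mate)                        # The old mate's joint was cleared
   -> NIL

'new!' and 'put!>' are the transaction-wrapped variants of 'new' and 'put>', so each step is committed on its own.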

The last four attributes of person objects are just static data:

   (rel job (+Ref +String))               # Occupation
   (rel dat (+Ref +Date))                 # Date of birth
   (rel fin (+Ref +Date))                 # Date of death
   (rel txt (+String))                    # Info

The first three are searchable via a non-unique index ('+Ref'); the info text ('txt') is a plain '+String' without an index. Date values in PicoLisp are just numbers, representing the number of days since the first of March in the year zero.
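The numeric representation of dates can be explored with 'date' and 'datStr'. The following sketch uses 21 June 1982, chosen because it corresponds to the value 724023 that appears in the 'show' output of this tutorial. Because dates are plain numbers, a difference in days is simply a subtraction.

   : (date 1982 6 21)                     # Year, month, day to day number
   -> 724023
   : (date 724023)                        # Day number back to a list
   -> (1982 6 21)
   : (datStr 724023)                      # Day number to a string
   -> "1982-06-21"
   : (- (date 1984 9 15) (date 1982 6 21))   # Days between two dates
   -> 817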

A method 'url>' is defined:

   (dm url> ()
      (list "@person" '*ID This) )

It is needed later in the GUI, to cause a click on a link to switch to that object.

The classes '+Man' and '+Woman' are subclasses of '+Person':

   (class +Man +Person)
   (rel kids (+List +Joint) pa (+Person)) # Children

   (class +Woman +Person)
   (rel kids (+List +Joint) ma (+Person)) # Children

They inherit everything from '+Person', and each adds its own 'kids' attribute. This attribute joins with the 'pa' or 'ma' attribute of the child, depending on the parent's gender.

That's the whole data model for our demo database application.

It is followed by a call to 'dbs' ("database sizes"). This call is optional. If it is not present, the whole database will reside in a single file, with a block size of 256 bytes. If it is given, it should specify a list of items, each having a number in its CAR, and a list in its CDR. The CARs taken together will be passed later to 'pool', causing an individual database file with that size to be created for each item. The CDRs tell what entity classes (if an item is a symbol) or index trees (if an item is a list with a class in its CAR and a list of relations in its CDR) should be placed into that file.
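For illustration, a 'dbs' call for this data model might look like the following. This is a hypothetical layout, not necessarily the one used in "doc/family.l", and the numbers are assumed to be block size scale factors as described in the PicoLisp reference for 'dbs' and 'pool'.

   (dbs
      (2 +Person)                         # First file: the person objects
      (3 (+Person nm job dat fin)) )      # Second file: the index trees of '+Person'

The first item is a number followed by a class symbol, so objects of that class go into the first file. The second item is a number followed by a list with a class and its relations, so the index trees of those relations go into the second file.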

A handful of access functions are provided that know about database relationships and thus allow higher-level access modes to the external symbols in a database.

For one thing, the B-Trees created and maintained by the index daemons can be used directly. Though this is rarely done in a typical application, they form the base mechanisms of other access modes and should be understood first.

The function tree returns the tree structure for a given relation. To iterate over the whole tree, the functions iter and scan can be used:

   : (iter (tree 'dat '+Person) '((P) (println (datStr (get P 'dat)) (get P 'nm))))
   "1770-08-03" "Friedrich Wilhelm III"
   "1776-03-10" "Luise Augusta of Mecklenburg-Strelitz"
   "1797-03-22" "Wilhelm I"
   ...

They take the tree as the first argument and a function as the second. The function is applied to every entry found in the tree (to show only a part of the tree, optional begin and end values can be supplied), producing a simple kind of report.
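The optional begin and end values might be used as in the following sketch. It assumes that the keys of a '+Ref' tree are pairs of the indexed value and the object, so that the limits have to be given as such pairs. Treat that detail as an assumption and check it against the reference for your version.

   : (iter (tree 'dat '+Person)
      '((P) (println (datStr (get P 'dat)) (get P 'nm)))
      (cons (date 1982 1 1))              # Begin of the range
      (cons (date 1988 12 31) T) )        # End of the range

'scan' is called in the same way, but passes both the key and the value of each tree node to the function:

   : (scan (tree 'dat '+Person)
      '((K V) (println K V)) )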

More useful is collect; it returns a list of all objects that fall into a range of index values:

   : (collect 'dat '+Person (date 1982 1 1) (date 1988 12 31))
   -> ({2-M} {2-L} {2-E})

This returns all persons born between 1982 and 1988. Let's look at them with show:

   : (more (collect 'dat '+Person (date 1982 1 1) (date 1988 12 31)) show)
   {2-M} (+Man)
      nm "William"
      dat 724023
      ma {2-K}
      pa {2-J}
      job "Heir to the throne"

   {2-L} (+Man)
      nm "Henry"
      dat 724840
      ma {2-K}
      pa {2-J}
      job "Prince"

   {2-E} (+Woman)
      nm "Beatrice"
      dat 726263
      ma {2-D}
      job "Princess"
      pa {2-B}

If you are only interested in a certain attribute, e.g. the name, you can return it directly:

   : (collect 'dat '+Person (date 1982 1 1) (date 1988 12 31) 'nm)
   -> ("William" "Henry" "Beatrice")
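Since 'collect' returns an ordinary list, it can be combined with the usual list functions. As a small sketch, assuming the demo data shown above, the names can be paired with formatted birth dates:

   : (mapcar
      '((P) (cons (get P 'nm) (datStr (get P 'dat))))
      (collect 'dat '+Person (date 1982 1 1) (date 1988 12 31)) )
   -> (("William" . "1982-06-21") ("Henry" . "1984-09-15") ("Beatrice" . "1988-08-08"))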

To find a single object in the database, the function db is used:

   : (db 'nm '+Person "Edward")
   -> {2-;}

If the key is not unique, additional arguments may be supplied:

   : (db 'nm '+Person "Edward"  'job "Prince"  'dat (date 1964 3 10))
   -> {2-;}

The programmer must know which combination of keys will suffice to specify the object uniquely. The tree search is performed using the first value ("Edward"), while all other attributes are used for filtering. Later, in the Pilog section, we will show how more general (and possibly more efficient) searches can be performed.
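Because 'db' returns an ordinary external symbol, the result can be passed on to other functions. A small sketch, assuming the demo data shown earlier (return values are omitted, as they are not shown in this text):

   : (show (db 'nm '+Person "William"))            # Display all properties
   : (get (db 'nm '+Person "William") 'pa 'nm)     # Name of William's father

With several keys, 'get' follows the chain of properties: it first fetches the 'pa' object and then that object's 'nm'.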

https://picolisp.com/wiki/?tutdb

02nov10

abu