*==============================================================================*
|                                                                              |
|                        * Constructing the Metatable *                        |
|                                                                              |
|                Bringing Associative Arrays and Objects Into C                |
|                                                                              |
|                           By James "Quasar" Haley                            |
|                                                                              |
*==============================================================================*

=== * License * ================================================================

This document may be freely copied and distributed, but may not be altered.

Code segments taken from this article may only be placed into programs which
are under the following licenses:

* BSD
* GNU GPL

If the program is BSD-licensed, it must either stipulate in addition that all 
use of those modules containing code taken from this file must remain open-
source in any publicly distributed program, or the program must include this
file in whole, in human-readable form, with any public distribution.

=== * Table of Contents (Use "Find" to jump to a section by number) * ==========

[000] Introduction
[001] Principles: Inheritance and Polymorphism
[002] Foundation: The List
[003] First Floor: The Hash Table
[004] Interlude: Hash Specialization
[005] Elevator: The Metaobject
[006] Penthouse: The Metatable
[007] The View: Finding Objects
[008] The Rooftop: Metatypes 
[009] The Sky is the Limit: Metaobject Specializations



=== * Introduction [000] * =====================================================

C, being neither intrinsically object-oriented nor dynamically typed, does not 
offer much when it comes to creating structures whose desired layout can only be
completely known at runtime.

The creation of the metatable seeks to overturn the static, fixed nature of the 
C structure and augment it with an elegant and simple-to-use interface which 
provides the power of objects in languages such as JavaScript, where properties
can be fluidly added or removed from objects at runtime.

This article will step through the underlying structures that are needed to 
build the metatable, and will then follow through with the meat of the system
itself, piece by piece.



=== * Principles: Inheritance and Polymorphism [001] * =========================

Before going into implementation details, it is important to outline a 
fundamental principle - the simulation of inheritance and polymorphic
behavior in C.

Even though C is not an object-oriented language, the structures that it
provides serve as the basis of objects in C++, and are therefore similar 
already.

However, an important property of C structures allows us to add inheritance,
such that more derived types contain the fields of their parents, and not only
that, but those types are capable of being treated as their parents in code
that expects the parent type.

The property which allows this is defined as part of the ANSI C standard, and
is the fact that the first item in a structure type is always aligned to the
beginning of the structure in memory.

-- Example --

   typedef struct parent_s
   {
      int x;
      int y;
   } parent_t;
   
   typedef struct child_s
   {
      parent_t parent;
      int z;
   } child_t;
   
   void printparent(parent_t *parent)
   {
      printf("x = %d, y = %d\n", parent->x, parent->y);
   }
   
   void examplefunc(void)
   {
      child_t object = { { 1, 2 }, 3 };
      
      printparent((parent_t *)&object);
   }

-- End Example --

As you can see above, it is possible to directly cast a pointer to an object of
the child type to its parent type. 

Unfortunately there is a small caveat to this fact: in the C99 standard, it was
decided that compiler optimization was a more important goal than backward 
compatibility or flexibility, and the casting of pointers to unrelated types was
severely restricted. In most compilers it is possible to disable this behavior, 
with options such as GCC's -fno-strict-aliasing.

When disabling strict aliasing optimizations is undesirable, it is also possible
to skirt the issue by using an expression such as:
  
   &object->parent
   
Being of the appropriate type, there is no aliasing issue. This is more verbose,
however, and can be problematic in some cases. It also does not help with the
issue of downcasting from the parent type to the child type. For that it is
necessary to have a "void *object" member in the parent type which always points
to the actual start of the derived object, in which case downcasting is 
accomplished like this:

   child_t *child = (child_t *)(parent->object);
   
With the addition of function pointers to the structures in question, it also
becomes possible to simulate polymorphism:

-- Example --

   typedef struct parent_s
   {
      int x;
      int y;
      void (*print)(struct parent_s *);
   } parent_t;
   
   typedef struct child_s
   {
      parent_t parent;
      int z;
   } child_t;

   void printparent(struct parent_s *obj)
   {
      printf("x = %d, y = %d\n", parent->x, parent->y);
   }

   void printchild(struct parent_s *obj)
   {
      child_t *cobj = (child_t *)obj; // downcasting to child type
      printf("x = %d, y = %d, z = %d\n", obj->x, obj->y, cobj->z);
   }
   
   void examplefunc(void)
   {
      int i;
      parent_t justaparent = { 1, 2, printparent };
      child_t  achild      = { { 1, 2, printchild }, 3 };
      
      // put pointers to the objects defined above into an array,
      // just for the sake of treating them uniformly
      parent_t *objects[2];
      objects[0] = &justaparent;
      objects[1] = &achild;
      
      // invoke the print function pointer in each object
      for(i = 0; i < 2; ++i)
         objects[i]->print(objects[i]);
   }
   
-- End Example --

The output from the example above would look like this:

   1, 2
   1, 2, 3
   
Because these function-pointer-based methods can be assigned unique values on
a per-type or even per-object basis, you should immediately realize that they
are always "virtual," or in other words, the function they call is determined
at run-time rather than at compile-time. This matches the behavior of methods
in dynamic languages like JavaScript.

These two important fundamental principles are what drives the metatable.



=== * Foundation: The List [002] * =============================================

At the very utmost foundation of the metatable, we need a simple and efficient 
list structure which supports the insertion and deletion of objects in constant
time, and without requiring that the objects all be of the same type. You will
see eventually that these lists serve as the chains, or buckets, of another
critical underlying component, a generic hash table implementation, but let's
not get ahead of ourselves.

We start by defining the list item structure:

-- Code --

   typedef struct listitem_s
   {
      struct listitem_s  *next;   // next object in the list
      struct listitem_s **prev;   // pointer to previous object's next pointer
      void               *object; // a pointer to the parent object
   } listitem_t;
   
-- End Code --

The use of the pointer-to-pointer prev link may at first be a suprise, but it
will be explained immediately upon definition of the insertion and deletion
functions used by this list structure.

* Inserting an Object *

To insert a link item into a list, we define a ListInsert function which takes
a pointer to the listitem, a pointer to the base object (for satisfying C99,
mostly), and the address of the list's head pointer, which starts out NULL when
the list is empty.

-- Code --

   void ListInsert(listitem_t *item, void *object, listitem_t **head)
   {
      // Get the item at the head of the list, if one exists.
      // It will become the next item in the list after the one being added.
      listitem_t *next = *head; 
      
      // Assign the old head value to the new item's next pointer. If the
      // pointer is valid, set the next object's "prev" pointer to point back
      // to our own next link.
      if((item->next = next))
         next->prev = &item->next;
         
      // Our own item's prev pointer should point to the list head, because it
      // is the first item in the list.
      item->prev = head;
      
      // Reset the list head pointer to point to our new item.
      *head = item;

      // Set the object pointer (this could be made self-referential)
      item->object = object;
   }
   
-- End Code --

Notice how this code is free of special cases for the beginning, middle, or
end - all nodes are treated uniformly, which is a basic property of this list
type which makes it blazingly fast.

Insertion of items is always at the head of the list, which is an important
property for our hash table that will be defined later.

* Removing an Object *

All that is needed to complete this abstract data type is a deletion method 
which can remove arbitrary items from the list:

-- Code --

   void ListRemove(listitem_t *item)
   {
      listitem_t **prev = item->prev; // get previous item's next pointer
      listitem_t *next  = item->next; // get this item's next pointer
   
      // Redirect the previous object's next pointer to this item's next object.
      // If next object is valid, set its prev pointer to the previous object.
      if((*prev = next))
         next->prev = prev;
   }

-- End Code --

That's it for deletion - a single if statement and two assignments. This
function is quite likely to be inlined if declared properly in a header file,
and is simply as fast as a linked list deletion can be made. The pointer-to-
pointer prev member not only enables this uniformity, but it also allows the 
deletion to be contextless: notice that we don't even have to know of which 
list the object is a part in order to remove it from its list. This is another
important quality for our higher-level data structures.



=== * First Floor: The Hash Table [003] * ======================================

The foundation of objects in dynamic, weakly-typed languages such as JavaScript
is invariably the associative array, or hash table, which when given a string
name, returns the object (or objects) which has that name.

Hash tables are an extension of the concept of a simple array, where you give
an index number and are returned a particular value, but hash tables are more
general in the following ways:

* The contained items do not have to be consecutive in memory.
* The index that serves as input does not strictly have to be numeric.
* The hash does not have to have a valid object at every index in its range.
* The hash may have more than one object at a given index.

Defining a simple hash table isn't that complicated, but for the purposes of
implementing our higher-level metatable, we want a hash which is general. That
means that it can apply to any type of object, so long as a specialization is
provided for the hashing of that type of object. We accomplish this by requiring
that any object to be hashed descends from listitem_t, our linked list item
type. This allows us to employ the pseudo-inheritance outlined in the first
section of this article.

In order to do this, we will need to create a basic hash structure with a 
dynamically allocated set of hash chains, sometimes also called "buckets," that
will each hold objects in the same equivalence class - that is, whose keys are
equal modulus the size of the hash table.

We will also need to make our structure contain a number of methods, so that
the functions of the hash operation can be tailored to specific specializations.
Let's define these needed operations first:

-- Code --

   // You will find it necessary to forward-declare the typedef of your hash
   // structure type here, so that it is visible to the function pointer type
   // below.
   typedef struct hash_s hash_t;

   // Hash key computation function prototype. The hash key is computed from the
   // data held in an object, and determines the chain to which it belongs.
   typedef unsigned int (*HashFunc_t)(const void *);

   // Object comparison function prototype. When looking for an object in the 
   // hash table, we need to do object comparisons.
   typedef boolean      (*CompFunc_t)(hash_t *, void *, const void *);

   // Key retrieval function prototype. Because our hash table doesn't want to 
   // have to know about the memory layout of the objects it contains, we have a 
   // method which returns a pointer to the "key" member of the item.
   typedef const void * (*KeyFunc_t) (void *);

   // Link retrieval function prototype. Again, we don't know the memory layout
   // of the objects that will be in our hash, so we have this method to return 
   // the listitem_t -- if the listitem_t is at the beginning of the function, 
   // and we are not subject to C99's aliasing restrictions, then we may just 
   // actually return the same pointer we are passed as an argument, in which
   // case this method is not technically needed.
   typedef void *       (*LinkFunc_t)(void *);

-- End Code --

With the above method types defined, let us define our basic hash structure:

-- Code --

   struct hash_s
   {
      listitem_t **chains;    // hash chains - an array of list head pointers.
      unsigned int numchains; // number of chains
      unsigned int numitems;  // number of items currently in table
      float loadfactor;       // load = numitems / numchains
      HashFunc_t hashfunc;    // hash key computation function
      CompFunc_t compfunc;    // object comparison function
      KeyFunc_t  keyfunc;     // key retrieval function
      LinkFunc_t linkfunc;    // link-for-object function
      boolean isinit;         // true if hash is initialized
   };

-- End Code --

Because the key and link functions are typically simple one-line "getter"
methods, it would be advantageous to define a pair of macros which can auto-
generate these functions for any type that we want to be able to add to this
hash table. We can do this using some preprocessor magic as such:

-- Code --

// Key Function macro - autogenerates a key retrieval function

#define KEYFUNC(type, field) \
   static const void *HashKeyFunc_ ## type ## field (void *object) \
   { \
      return &(((type *)object)-> field ); \
   }

#define KEYFUNCNAME(type, field) HashKeyFunc_ ## type ## field

// Link Function macro - autogenerates a link retrieval function.

#define LINKFUNC(type, field) \
   static void *HashLinkFunc_ ## type ## field (void *object) \
   { \
      return &(((type *)object)-> field ); \
   }

#define LINKFUNCNAME(type, field) HashLinkFunc_ ## type ## field

-- End Code --

If you are not familiar with the C preprocessor's ## operator, it is simply
a string concatenation operation. It takes the tokens on both sides of it
and turns them into a single token.

Our KEYFUNC and LINKFUNC macros can now be used to autogenerate the needed
methods for every type we put into a generic hash table. Simply pass the
macros the name of the type and the name of the field. The use of these
macros will be demonstrated explicitly in the definition of the metatable
later, so don't feel too lost. The corresponding NAME macros let us refer
to those autogenerated names without having to know about the implementation
of the key and link function macros.

When we are free of C99 restrictions on aliasing, we can additionally define
a default link function, since as mentioned above, the link function becomes
a no-op for structures which directly inherit from listitem_t. For such 
structures, a default link method would appear like this:

-- Code --

void *LinkForObject(void *object)
{
   return object;
}

-- End Code --

As long as the listitem_t is at the head of the item, we can simply return the
same pointer passed in, and via our inheritance simulation, it will be treated
as a listitem. We will be assigning this method to any item that does not define
its link method explicitly in just a moment.

The first operation that our hash table itself needs to support, as with any
abstract data type, is being initialized. Let's define a HashInit function which
will set up a hash table, specializing its generic structure to function on a
specific specialization.

-- Code --

   void HashInit(hash_t *table, unsigned int numchains,
                 HashFunc_t hfunc, CompFunc_t cfunc, KeyFunc_t kfunc, 
                 LinkFunc_t lfunc)
   {
      // Create the array of chains and initialize them all to zero.
      // IRL, you would want to check the validity of calloc's return value.
      table->chains = calloc(numchains, sizeof(listitem_t *));
      
      // Initialize the number of chains, number of items contained in the
      // table, and the table's load factor (a metric of how many items are in
      // the table versus the number of chains in the table) all to zero.
      table->numchains  = numchains;
      table->numitems   = 0;
      table->loadfactor = 0.0f;
      
      // Assign the methods passed into this function to the hash object, so
      // that it will behave polymorphically. This allows the generic hash
      // algorithm we are about to write to function over any sort of object.
      //
      // If a link function is not passed in, we use the default one we defined
      // above (caveat C99 users).
      table->hashfunc = hfunc;
      table->compfunc = cfunc;
      table->keyfunc  = kfunc;
      table->linkfunc = lfunc ? lfunc : LinkForObject;
      
      // Mark the table as initialized.
      // In C99 you might want to use the standard _Bool type. For ANSI C, just
      // define an enumeration to handle this, as such: 
      // typedef enum { false, true } boolean;
      table->isinit = true;
   }

-- End Code --

This function sets up our hash table. Let's take some time to break down some of
the individual components further:

* chains

   A hash table always has a certain number of chains, or buckets, which the
   objects placed into it are categorized based on the value of their key.
   Think of the chains as being the extended "array," and the key as being
   the extended "index."
   
* numchains, numitems, loadfactor

   The number of allocated chains has to be tracked for purposes of taking the
   object key values modulus the number of chains, so that we do not index into
   the chains array out of bounds.
   
   We additionally track the number of items in the table, and use that
   information to compute a load factor as such:
   load factor = numitems / numchains
   This is important for supporting growable hash tables - that is, hash tables
   where the number of chains can be changed at runtime. This is important 
   because, ideally, the search behavior of a hash table is O(1), or constant
   time.
   
   The more items added to a hash table, the more likely you are to have more
   than one item on a chain, and for efficiency, this is undesirable. Thus,
   when the load factor exceeds a certain preordained level, it may be good
   to "rehash." We will define a rehashing function later.
   
   
Now, how do we go about adding an object to our hash table? Quite simple really.
We want to invoke the table's key method on the object being inserted, and
then compute the hash code, or index into the chains table, by taking the key
number modulus the number of chains. After that, we simply insert the object
into that chain using our earlier-defined list insertion operation.

-- Code --

   void HashAddObject(hash_t *table, void *object)
   {
      if(table->isinit) // be sure the table is initialized
      {
         // Get the key for the object passed in.
         const void *key = table->keyfunc(object);
      
         // Compute the hash code.
         unsigned int hashcode = table->hashfunc(key) % table->numchains;
      
         // Insert the object into the list.
         // linkfunc returns the pointer to the object's listitem_t parent.
         // We also pass in the object pointer, and the address of the chain 
         // into which we're inserting the object.
         ListInsert(table->linkfunc(object), 
                    object, 
                    &(table->chains[hashcode]));

         // We added an object, so increment the number of items and recompute 
         // the table's load factor.
         table->numitems++;

         table->loadfactor = (float)table->numitems / table->numchains;
      }
   }

-- End Code --

We can support removing objects in a similar manner, but even more simply. Here
is where the contextless efficiency of our linked list's remove method comes in
handy, as we are not required to determine what chain the object is in before
removing it.

-- Code --

   void HashRemoveObject(hash_t *table, void *object)
   {
      if(table->isinit)
      {
         // Just remove the item's parent listitem from its list.
         ListRemove(table->linkfunc(object));
      
         // Decrement number of items and recompute the load factor.
         table->numitems--;
         table->loadfactor = (float)table->numitems / table->numchains;
      }
   }

-- End Code --

Note that you will also want to define a version of the HashRemoveObject
function that does not decrement the load factor. This will be useful in the
rehashing process that will be covered below. The purpose will be evident when
we get to it.

Now, the next and arguably most important operation that the hash table must
support is the finding of an object inside the table given the object's key.
Because we do not want to restrict our table to having a single object with a
given key, we implement this operation as an iterator which will walk down the
hash chain and return each item in it that matches the given key.

A search is started by passing a NULL pointer in object. A search is continued
by passing the previous return value in object successively. By doing this until
the function returns NULL, we can iterate over every object in the table with
the same key.

-- Code --
   
   // Given a key, this routine returns the first object found in the table
   // with that key.
   void *HashObjectForKey(hash_t *table, const void *key)
   {
      if(table->isinit)
      {
         // Compute the object's hash code as during insertion
         unsigned int hashcode = table->hashfunc(key) % table->numchains;
         
         // Get a pointer to the object's chain.
         listitem_t *chain  = table->chains[hashcode];
      
         // Walk down the chain until we find an object that matches the
         // requested key. We do this by calling the table's compfunc method.
         while(chain && !table->compfunc(table, chain->object, key))
            chain = chain->next;
      
         // If the chain is exhausted, there is no object with that key in the
         // table, so we return NULL in that case. Otherwise, we return the
         // listitem's object pointer.
         return chain ? chain->object : NULL;
      }
      else
         return NULL; // this is in case the table isn't initialized...
   }

   void *HashObjectIterator(hash_t *table, void *object, const void *key)
   {
      void *ret;

      // Be sure the table is initialized.
      if(!table->isinit)
         return NULL;

      // Starting a new search? Get the first object that matches the key.
      if(!object) 
         ret = HashObjectForKey(table, key);
      else
      {
         // "object" is valid, so we are continuing a search. Get the listitem
         //  for the last item that was returned.
         listitem_t *item = table->linkfunc(object);

         // Start on the next object in hash chain, since we already processed
         // the one that was passed back to us.
         item = item->next; 

         // Walk down the chain and use the comparison method to look for the
         // next match.
         while(item && !table->compfunc(table, item->object, key))
            item = item->next;

         // Again, if a match is found, we return it. Otherwise there are no
         // more objects in the table with that key.
         ret = item ? item->object : NULL;
      }

      return ret;
   }

-- End Code --

When you know there is only ever one object in a table for each key, you can
simply call the HashObjectForKey function. Otherwise, you use the 
HashObjectIterator function as in this example:

-- Example --

   void example(void)
   {
      void *object = NULL;
      
      while((object = HashObjectIterator(table, object, key)))
      {
         // cast it to the descendent type and do something with it...
      }
   }

-- End Example --

The important thing to grasp is that you both pass the "object" pointer into
the iterator function as the second parameter, and put the function's return
value into it. This both serves to prime the search initially, and to power
its step to the next item until the list is exhausted.

It is also possible to define a similar iteration which passes over ALL items in
the hash table regardless of their keys, but that will be reserved for 
discussion later in the metatable implementation, where we will find a need to 
use it.

Finally, let's define our rehash operation, which will be used when we see that
our table is too full. It isn't always necessary or even desirable to rehash
tables, so we leave the calling of this function up to the code that is using
the hash table, rather than dictating a rehashing policy in this data structure
itself.

What rehashing implies is removing all objects from the table, resizing the 
array of hash chains, and then readding all of the objects to the table. This is
necessary because changing the number of chains changes the objects' hash codes.

Without rehashing, the search time for an object in the table will approach
the worst-case linear time, which is undesirable. By rehashing, we hope to
minimize "collisions," which are the occurance of objects whose keys are not
equal, but whose hash codes are, such that they all end up inside the same
chain.

Note that rehashing is a very expensive operation, so the way in which the table
growth is managed is extremely important. An example of smart growing will be
demonstrated in the definition of the metatable later.

-- Code --

   void HashRebuild(hash_t *table, unsigned int newnumchains)
   {
      listitem_t **oldchains = table->chains;       // save current chains
      unsigned int oldnumchains = table->numchains;
      unsigned int i;

      if(!table->isinit)
         return;

      // Allocate a new chains table. We rely on the caller to tell us how many
      // chains should be allocated, so this could in fact shrink the table, if
      // you needed to decrease the number of chains for some reason.
      table->chains    = calloc(newnumchains, sizeof(listitem_t *));
      table->numchains = newnumchains;

      // Mark the number of items currently in the table as zero.
      table->numitems = 0;

      // Run down the old hash chains.
      for(i = 0; i < oldnumchains; ++i)
      {
         listitem_t *chain;

         // Restart from the beginning of the old chain each time, since the
         // linked list has been modified.
         while((chain = oldchains[i]))
         {
            // Remove from the old hash. This is calling the discussed but not
            // illustrated method that removes an object from the table without
            // decrementing table->numitems. We don't want to do that here, 
            // since we already set numitems to 0 above.
            HashRemoveObjectNoCount(table, chain->object);

            // Clear out fields for safety.
            chain->next = NULL;
            chain->prev = NULL;

            // Add this object to the new hash.
            HashAddObject(table, chain->object);
         }
      }

      // recalculate load factor
      table->loadfactor = (float)table->numitems / table->numchains;

      // delete old chains
      free(oldchains);
   }

-- End Code --

This concludes implementation of the generic hash table data structure. We now
have an associative array that, when given an identifying key object, returns to
us the object or objects that it represents. The next section is an interlude
that demonstrates how to provide specializations of this generic data structure.



=== * Interlude: Hash Specialization [004] * ===================================

In order to make our generic hash useful, we need to provide one or more
"specializations," similar to the function of templates in C++. Though the
generic hash interface itself is not particularly typesafe via its use of void
pointers, our specializations restrict that genericity to a single type. As 
long as you do not deliberately do something stupid, it is impossible to fall
afoul of typing problems when using a specialization.

It is possible to define specializations that use any sort of key, but the two
most common key types for objects in a hash table are plain integers and 
strings. We will examine both for completeness, but it is the string hash that
our metatable will eventually employ, in two different flavors - case sensitive
and case insensitive. Only the case sensitive hash will be detailed here, since
the only difference between the two is the treatment of characters in the 
comparison and hash key computation functions.

Using an integer as a hash key is the simplest possible specialization. Let's 
start out by defining the hash function itself. All we have to do is return the
integer key. The key will be taken modulus the table size to compute the hash
code automatically by the generic routines defined earlier.

-- Code --

   unsigned int IntHashFunc(const void *key)
   {
      return (unsigned int)(*(const int *)key);
   }

-- End Code --

Remember that the hash function is passed a pointer to the key by the generic
routines, so we need to dereference the key pointer in order to get the value
of the integer. We then cast it to unsigned int for purposes of matching the
return value, so that the modulus in the hash code computation never renders a
negative value that would crash our program.

Next we need a comparison function that, given an object in the hash and a key,
checks to see if the object has the same key.

-- Code --

   boolean IntCompareFunc(hash_t *table, void *object, const void *key)
   {
      // Get the object's key using the table's keyfunc method as usual.
      int objkey = *(int *)(table->keyfunc(object));

      // Compare the object's key to the key in question and return the result.
      return objkey == *(const int *)key;
   }

-- End Code --

That's pretty much all that is required to provide a specialization for a
particular type of key. In order to actually use an integer hash, we still need
one additional function however, and that is a function that sets up a hash_t
to act like an integer hash for a particular type of object. We do that here:

-- Code --

   void IntHashInit(hash_t *table, unsigned int numchains, 
                    KeyFunc_t kfunc, LinkFunc_t lfunc)
   {
      // Our hash and compare functions are set, but the key and link functions,
      // which can be defined with the KEYFUNC and LINKFUNC macros, are specific
      // to the object type using this hash.
      HashInit(table, numchains, IntHashFunc, IntCompareFunc, kfunc, lfunc);
   }

-- End Code --

Now I will show a similar specialization for strings, but all at once without a
lot of commenting. It is the same structure, but based around a different key
type.

-- Code --
   
   // We have to turn a string into an unsigned integer. There are a billion
   // ways to do that, all with their own ups and downs. This method is based
   // on the one used by the SGI implementation of the C++ STL.
   
   unsigned int StrHashFunc(const void *key)
   {
      // It's important to remember we are given a POINTER to the key, and the
      // key in this case is a const char *, so "key" is really a pointer to a
      // pointer. We cast and then dereference one level.
      const char *c = *(const char **)key;
      unsigned int h = 0;

      // Go down the string and do some math magic on the characters. The chief
      // idea is to use multiplication to spread out the values, because the
      // more bits that are set in the key, the less likely are hash collisions.
      while(*c)
      {
         h = 5 * h + *c;
         ++c;
      }

      return h;
   }

   boolean StrCompareFunc(hash_t *table, void *object, const void *key)
   {
      const char *objkey = *(const char **)(table->keyfunc(object));

      // Simple string comparison between the two keys.
      return !strcmp(objkey, *(const char **)key);
   }

   void StrHashInit(hash_t *table, unsigned int numchains, 
                    KeyFunc_t kfunc, LinkFunc_t lfunc)
   {
      HashInit(table, numchains, StrHashFunc, StrCompareFunc, kfunc, lfunc);
   }

-- End Code --

It should be easy to see the development pattern for hash specializations
between these two examples. Try defining specializations for unsigned integers,
case-insensitive strings, and even complex objects on your own.



=== * Elevator: The Metaobject [005] * =========================================

We have now defined a hash table that can be specialized to deal with any sort
of object, so long as that object derives from our equally generic linked list.
So what happens if we define a sort of object which itself is capable of 
pointing to anything that derives from itself, and keeps track of important
type information?

Then we would have what I call the metaobject, which serves as the basis of our
metatable.

-- Code --

   #define METATYPE(type) #type
   
   typedef struct metaobject_s
   {
      listitem_t links;     // principle set of links
      listitem_t typelinks; // links for a secondary list (explained later)
      char *key;            // the metaobject's hash key
      const char *type;     // the metaobject's type name
      void *object;         // pointer back to the containing (child) object
   } metaobject_t;

-- End Code --

The METATYPE macro makes use of another C preprocessor trick, the "stringize"
operator. It takes the argument to the macro and transforms it into a quoted
string literal. This means that we can pass type names into functions within
the program and then act based on their value.

The object pointer here is just like the one previously defined by listitem_t,
and can either be self-referential, or point to another location in memory. 
Either way, it is meant to refer to the metaobject descendent to which this
metaobject belongs.

We only need to define one function that is specific to general metaobjects.
It is our run-time type identification function (RTTI). It tests the type
stored in the metaobject against a type we pass to it using the METATYPE
macro. Note that we treat RTTI strings as case sensitive, so that they match
the semantics of the native C types on which they are built.

-- Code --
   
   boolean IsMetaKindOf(metaobject_t *object, const char *type)
   {
      return !strcmp(object->type, type);
   }
   
-- End Code --

With this type defined, we are ready to begin work on the metatable itself.



=== * Penthouse: The Metatable [006] ===========================================

Let's start right off the bat by defining our metatable type. It is shockingly
simple:

-- Code --
   
   typedef struct metatable_s
   {
      hash_t keyhash;  // hash of metaobjects by key
      hash_t typehash; // hash of metaobjects by type
   } metatable_t;
   
-- End Code --

That's right, it's just two hash tables. Metaobjects contain both a key and a 
type, and it is useful to be able to efficiently search the metatable by either
criterion.

Because the number of keys in the table is arbitrary and is defined solely by
the data set on which the program will operate, it is best to make the key
hash dynamically growable. The type hash on the other hand will be set 
largely in stone, depending only on the metaobject specializations that are
provided in your program, so it can be a fixed-size table.

I additionally choose to implement keys as a case-insensitive hash, and types
as a case-sensitive one (as already mentioned), although neither decision is
mandatory, and you can define your own semantics as needed.

So as with the underlying hash table, the first thing we do is define a 
function that will initialize our metatable and its underlying hashes:

-- Code --

   // The initial number of chains in the table is a tunable factor for your
   // program. You can determine the best value only by doing empirical testing.
   // However, prime numbers are always the best! They reduce artificial
   // clustering in the hash table due to common factors between the hash codes
   // and the number of chains.
   
   #define METANUMCHAINS     53
   #define METANUMTYPECHAINS 31

   // Remember our KEYFUNC and LINKFUNC macros?
   // Here is how and when we use them:

   // key retrieval function for metatable hashing by key
   KEYFUNC(metaobject_t, key)

   // key and link retrieval function for metatable hashing by type
   KEYFUNC(metaobject_t, type)
   LINKFUNC(metaobject_t, typelinks)

   void MetaInit(metatable_t *metatable)
   {
      // Initialize the key hash as a case-insensitive string hash on
      // metaobjects. This one uses the default link retrieval method
      // because the listitem for keys is the first item in the structure
      // (again, caveat C99 users!)
      
      NCStrHashInit(&metatable->keyhash, METANUMCHAINS, 
                    KEYFUNCNAME(metaobject_t, key), NULL);

      // Initialize the typehash as a case-sensitive string hash on
      // metaobjects. This one requires an explicitly defined link retrieval
      // method, since typelinks is NOT the first item in metaobject_t.
      
      StrHashInit(&metatable->typehash, METANUMTYPECHAINS,
                  KEYFUNCNAME(metaobject_t, type), 
                  LINKFUNCNAME(metaobject_t, typelinks));
   }

-- End Code --

The next basic operation to define is, of course, adding a metaobject to the
table. This would be pretty basic, except that we want to allow the metatable
to grow in case it becomes too loaded. This is where we must implement table
growth logic.

-- Code --
   
   // As with the initial number of chains, the maximum load factor to allow is
   // a tunable which you must determine to suit your needs.
   #define METALOADFACTOR    0.667f

   // These primes roughly double in size.
   static const unsigned int metaPrimes[] =
   {
        53,    97,   193,   389,   769,   1543,   3079,
      6151, 12289, 24593, 49157, 98317, 393241, 786433
   };

   // Number of primes in the above array.
   #define METANUMPRIMES (sizeof(metaPrimes) / sizeof(unsigned int))

   void MetaAddObject(metatable_t *metatable, const char *key, 
                      metaobject_t *object, void *data, const char *type)
   {
      hash_t *keyhash = &metatable->keyhash;

      // Note that we take ownership responsibility for key strings, but
      // not for type strings. This distinction relies on usage patterns,
      // but I have found it works best for my purposes, where types are
      // static but field names are generated from user-input data.
      object->key    = strdup(key);
      object->type   = type;
      
      // Generally, a pointer back to the owning structure.
      object->object = data; 

      // Check for key table overload.
      if(keyhash->loadfactor > METALOADFACTOR)
      {
         int newnumchains = 0;

         if(keyhash->numchains < metaPrimes[METANUMPRIMES - 1])
         {
            int i;

            // find a prime larger than the current number of chains
            for(i = 0; keyhash->numchains < metaPrimes[i]; ++i);
         
            newnumchains = metaPrimes[i];
         }
         else
            newnumchains = keyhash->numchains * 2; // too big, just double it.

         HashRebuild(keyhash, newnumchains);
      }

      // Add the object to the key table
      HashAddObject(keyhash, object);

      // Add the object to the type table, which is static in size
      HashAddObject(&metatable->typehash, object);
   }

-- End Code --

Now we can grow our metatables to any size and they should continue to function
efficiently. It may be smart to impose some kind of limit on the growth, which
the code above does not do, lest you find yourself out of memory.

Removing objects from the table is much simpler:

-- Code --

   void MetaRemoveObject(metatable_t *metatable, metaobject_t *object)
   {
      // Remove the object from both hashes.
      HashRemoveObject(&metatable->keyhash,  object);
      HashRemoveObject(&metatable->typehash, object);

      // Delete our copy of the key string.
      if(object->key)
      {
         free(object->key);
         object->key = NULL;
      }
   }

-- End Code --

In the next section we will create a bevy of methods for searching and examining
objects inside the metatable.



=== * The View: Finding Objects [007] * ========================================

We have defined the basic methods of the metatable for adding and removing
objects, but the metatable's true power comes from letting us obtain fields we
add to it by providing a string which serves as the field's name or type.

There are several ways to do this. We can provide methods that are attuned to
singleton fields, of which only one is expected to exist in the metatable at a
time, or we can create iterator methods, built on top of our hash table's 
iteration function, that return all objects with a given key or type in 
succession.

Both types of functions are relatively straight-forward, since the bulk of the
lookup logic is implemented inside our generic hash table. All we have to do
is call the appropriate function.

Here are basic lookup functions that are suitable for testing simply whether or
not a given field or type of object exists, or getting the last such object
added to the table (remember that our linked list always inserts at the head of
the list, so the newest objects are at the front of the hash chains).

-- Code --

   metaobject_t *MetaGetObject(metatable_t *metatable, const char *key)
   {
      return HashObjectForKey(&metatable->keyhash, &key);
   }

   metaobject_t *MetaGetObjectType(metatable_t *metatable, const char *type)
   {
      return HashObjectForKey(&metatable->typehash, &type);
   }

   metaobject_t *MetaGetObjectKeyAndType(metatable_t *metatable, const char *key,
                                         const char *type)
   {
      metaobject_t *obj = NULL;

      // Here we search down the key hash for objects that also match the
      // specified type.
      while((obj = HashObjectIterator(&metatable->keyhash, obj, &key)))
      {
         if(IsMetaKindOf(obj, type))
            break;
      }

      return obj;
   }
   
-- End Code --

Nothing too bad here, except that you must remember to pass the *address* of the
key to the HashObjectForKey and HashObjectIterator functions.

Here are equivalent functions which iterate over all the objects in the table
that match the requested criterion.

-- Code --

   metaobject_t *MetaGetNextObject(metatable_t *metatable, metaobject_t *object,
                                   const char *key)
   {
      return HashObjectIterator(&metatable->keyhash, object, &key);
   }

   metaobject_t *MetaGetNextType(metatable_t *metatable, metaobject_t *object,
                                 const char *type)
   {
      return HashObjectIterator(&metatable->typehash, object, &type);
   }

   metaobject_t *MetaGetNextKeyAndType(metatable_t *metatable, metaobject_t *object,
                                       const char *key, const char *type)
   {
      metaobject_t *obj = object;

      while((obj = HashObjectIterator(&metatable->keyhash, obj, &key)))
      {
         if(IsMetaKindOf(obj, type))
            break;
      }

      return obj;
   }
   
-- End Code --

Not much more complicated, are they? The way in which HashObjectIterator works
means that these can also be used as singleton retrieval methods, making the 
ones we defined above somewhat redundant, and thus for convenience only. When
starting a new search, you pass NULL in the object argument, and then you
iterate passing in object each time until the return value is NULL.

You can use similar implementation methods to create methods that simply return
a boolean to tell you whether or not a given key or type exists in the table, or
to actually count how many objects of a given key or type exist in the table. I
leave those as exercises for the reader, since they are painfully simple
additions to the methods shown above.

There is, however, one other type of iteration that we might need, and it is the
one that I promised I would cover later during the discussion of implementing
the hash table. That is, an iteration which visits every object in the table
regardless of key or type. We need this in order to apply any sort of action to
every object in the table.

Before we can implement a metatable method to do this, we need to implement the
functionality in the parent hash table. This is a relatively simple variation on
our existing iterator, but in addition to keeping track of the object we are
currently on, we will also need to keep track of the chain we are currently
going down, so that we will know when we have visited all of the hash chains.

-- Code --

   // To prime a new search with this function, you should pass NULL in object
   // AND pass -1 in index.

   void *HashTableIterator(hash_t *table, void *object, unsigned int *index)
   {
      void *ret = NULL;
      unsigned int i = *index;

      if(!table->isinit)
         return NULL;

      // Already searching?
      if(object)
      {
         // Is there another object on the same chain?
         listitem_t *item = table->linkfunc(object);

         // Return that item.
         if(item->next)
            ret = item->next->object;
      }

      // Else, search for the next chain with an object in it
      while(!ret && ++i < table->numchains)
      {
         listitem_t *chain = table->chains[i];
      
         // Return the first object on that chain.
         if(chain)
            ret = chain->object;
      }

      *index = i;

      return ret;
   }

-- End Code --

Now with this routine in place, we can implement the metatable's wrapper. Note
that we need to pass on responsibility for defining and initializing the object
and index to the caller, since this routine needs to preserve the state of those
arguments across calls.

-- Code --
   
   metaobject_t *MetaTableIterator(metatable_t *metatable, metaobject_t *object,
                                   unsigned int *index)
   {
      return HashTableIterator(&metatable->keyhash, object, index);
   }

-- End Code --

This wraps up discussion of finding objects in the metatable. The next thing we
need to do is create a way to maintain enhanced type information about our
metaobjects that allows us to do incredible things with them that most people
would assume are only possible in C++.



=== * The Rooftop: Metatypes [008] * ===========================================

In order to store extended information about the types of metaobjects that we
add to our table, in order to support operations such as copying of items from 
one metatable to another, we will establish a global cache of this information
called the metatype table.

The metatype table, for convenience of implementation, is itself a metatable,
which functions using our first specialization of metaobject_t:

-- Code --

   // Like our generic hash, metatypes have virtual methods. Only these methods
   // are effectively the "class" methods of the metaobjects that this metatype
   // represents.
   //
   // Yes that is right, a metatype is our equivalent of a C++ class, or a
   // JavaScript prototype. It defines a copy constructor as well as other
   // possible methods.
   
   // metatype method prototypes
   typedef void         *(*MetaAllocFn_t) (size_t);
   typedef void          (*MetaCopyFn_t)  (void *, const void *, size_t);
   typedef metaobject_t *(*MetaObjPtrFn_t)(void *);
   typedef const char   *(*MetaToStrFn_t) (void *);

   typedef struct metatype_s
   {
      metaobject_t   parent;   // metatypes are also metaobjects
      const char    *name;     // name of metatype (derived from C type)
      size_t         size;     // size of type for allocation purposes
      boolean        isinit;   // if true, this type has been registered
      MetaAllocFn_t  alloc;    // allocation method
      MetaCopyFn_t   copy;     // copy method
      MetaObjPtrFn_t objptr;   // object pointer method (returns metaobject)
      MetaToStrFn_t  toString; // string conversion method
   } metatype_t;

-- End Code --

Now with the type defined, we need to create a global metatype registry. The
first time we add a metaobject of a derived type to the metatable, we will
register its metatype object too. Note that metaobjects are never required to
have registered metatypes - it simply limits what you can do with them.

-- Code --
   // The metatypes registry. This is itself a metatable.
   static metatable_t metaTypeRegistry;

   // Default method for allocation of an object given its metatype.
   static void *MetaAlloc(size_t size)
   {
      return calloc(1, size);
   }

   // Default method for copying of an object given its metatype. 
   // Performs a shallow copy only.
   static void MetaCopy(void *dest, const void *src, size_t size)
   {
      memcpy(dest, src, size);
   }

   // Default method to get the metaobject_t field for an object.
   // Returns the same pointer. Works for objects which have the
   // metaobject_t as their first item only.
   static metaobject_t *MetaObjectPtr(void *object)
   {
      return object;
   }
   
   // The default "toString" method won't be discussed here, but is left to
   // the reader as an exercise. My own default toString method produces a
   // hex dump of the metaobject, but it requires use of a reallocating string
   // "class" which is outside the scope of this document.
   ...

   void MetaRegisterType(metatype_t *type)
   {
      // Early check for an already-initialized metatype.
      // This allows us to send the same object in more than once, in case
      // we don't feel like checking every time.
      if(type->isinit)
         return;

      // init table the first time
      if(!metaTypeRegistry.keyhash.isinit)
         MetaInit(&metaTypeRegistry);

      // Set default methods if any are NULL

      if(!type->alloc)
         type->alloc = MetaAlloc;

      if(!type->copy)
         type->copy = MetaCopy;

      if(!type->objptr)
         type->objptr = MetaObjectPtr;

      if(!type->toString)
         type->toString = MetaDefToString;

      // sanity check: we do not allow multiple metatypes of the same name
      if(MetaGetObject(&metaTypeRegistry, type->name))
      {
         // call an error method here if you have one...
         error("MetaRegisterType: type %s is non-singleton", type->name);
      }

      // Add this metatype object to the metatable.
      MetaAddObject(&metaTypeRegistry, type->name, &type->parent, type, 
                    METATYPE(metatype_t));

      type->isinit = true;
   }

   // For convenience's sake we also define the following function, which
   // constructs the metatype C++-style by taking all of its fields as
   // function parameters:
   
   void MetaRegisterTypeEx(metatype_t *type, const char *typeName, size_t typeSize,
                           MetaAllocFn_t alloc, MetaCopyFn_t copy, 
                           MetaObjPtrFn_t objptr, MetaToStrFn_t tostr)
   {
      // early check for already-initialized metatype
      if(type->isinit)
         return;

      type->name     = typeName;
      type->size     = typeSize;
      type->alloc    = alloc;
      type->copy     = copy;
      type->objptr   = objptr;
      type->toString = tostr;

      MetaRegisterType(type);
   }

-- End Code --

That's it for defining and maintaining the metatype table. But now, how do we
use it to do those crazy C++-like things that I promised earlier? Here are two
excellent examples.

The first implements a metatable copying method, whereby you can automatically
create duplicates of all of the metaobjects in a metatable which have registered
metatypes and put them into a second metatable, all without having to know or
care about how many or of what type the metaobjects are! This is functional
magic of the highest degree, at least for a C program.

-- Code --
   
   void MetaCopyTable(metatable_t *desttable, metatable_t *srctable)
   {
      metaobject_t *srcobj = NULL;
      unsigned int i       = -1;

      // Iterate on the source table. Here's why we needed the fancy
      // HashTableIterator method:
      while((srcobj = MetaTableIterator(srctable, srcobj, &i)))
      {
         metatype_t *type;

         // See if the object has a registered metatype.
         if((type = (metatype_t *)MetaGetObject(&metaTypeRegistry, srcobj->type)))
         {
            // Create the new object.
            // We use the metatype alloc, objptr, and copy methods.
            
            void *destobj         = type->alloc(type->size);
            metaobject_t *newmeta = type->objptr(destobj);

            // Copy from the old object to the new object.
            type->copy(destobj, srcobj->object, type->size);

            // Clear the new metaobject for safety.
            memset(newmeta, 0, sizeof(metaobject_t));

            // Add the new object to the destination table.
            // Note that MetaAddObject duplicates the source object's key inside
            // the destination object, so there are no cross-references. This is
            // important when it comes to deleting objects from tables...
            
            MetaAddObject(desttable, srcobj->key, newmeta, destobj, type->name);
         }
      }
   }

-- End Code --

Send two metatables into that function, and you will end up with one that is an
exact copy of the other, save for any objects that don't have registered 
metatypes!

In case that's not enough for you, here is the similar definition of the 
toString method:

-- Code --

   const char *MetaToString(metaobject_t *obj)
   {
      const char *ret;
      metatype_t *type;

      // See if the object has a registered metatype.
      if((type = (metatype_t *)MetaGetObject(&metaTypeRegistry, obj->type)))
         ret = type->toString(obj->object);
      else
         ret = "(unregistered object type)";

      return ret;
   }

-- End Code --

And there you have it. Using the development pattern seen in MetaToString, it is
possible to add any conceivable type of functionality to metaobjects. Simply add
the method prototype into the metatype object, and define a function that 
executes the "virtual method."



=== * The Sky is the Limit: Metaobject Specializations [009] * =================

Although in defining the metatype registry, I demonstrated a basic 
specialization of the metaobject, I do not feel that particular example
demonstrates the typical use of metaobjects.

To do that, I will define here two specializations of the metaobject on
basic C types, so that we can store these common objects into metatables.
Then, by including the metatable type inside an ordinary C structure, we
can fully realize our ability to declare runtime-defined types like in
JavaScript.

The first specialization I will cover is the metaint. It simply adds 
storage to the metaobject for an integer variable. Pretty simple, huh?

-- Code --

   typedef struct metaint_s
   {
      metaobject_t parent;
      int value;
   } metaint_t;

   // toString method for metaints. Uses non-standard itoa. Feel free to
   // change this implementation however you see fit to avoid that.
   static const char *metaIntToString(void *obj)
   {
      metaint_t *mi = (metaint_t *)obj;
      static char str[33];

      memset(str, 0, sizeof(str));

      itoa(mi->value, str, 10);

      return str;
   }

   // Adds a metaint to a metatable.
   void MetaAddInt(metatable_t *metatable, const char *key, int value)
   {
      static metatype_t metaIntType;
      metaint_t *newInt = calloc(1, sizeof(metaint_t));

      // On the first call, we register a metatype, so that metaints can
      // be copied and printed.
      if(!metaIntType.isinit)
      {
         // register metaint type
         MetaRegisterTypeEx(&metaIntType, 
                            METATYPE(metaint_t), sizeof(metaint_t),
                            NULL, NULL, NULL, metaIntToString);
      }

      // Add it.
      newInt->value = value;
      MetaAddObject(metatable, key, &newInt->parent, newInt, 
                    METATYPE(metaint_t));
   }


   // Retrieval method. If the field doesn't exist, we offer to return a default
   // value that was passed to us. In my own implementation I also set a global
   // "metaerrno" value to indicate when this occurs, in case I need to treat
   // it as an error in a particular code segment.
   int MetaGetInt(metatable_t *metatable, const char *key, int defvalue)
   {
      int retval;
      metaobject_t *obj;

      metaerrno = META_ERR_NOERR;

      if(!(obj = MetaGetObjectKeyAndType(metatable, key, METATYPE(metaint_t))))
      {
         metaerrno = META_ERR_NOSUCHOBJECT;
         retval = defvalue;
      }
      else
         retval = ((metaint_t *)obj->object)->value;

      return retval;
   }

   // Remove method. Very straight forward. The value is returned in case you
   // need it at the same time you want to remove it.
   int MetaRemoveInt(metatable_t *metatable, const char *key)
   {
      metaobject_t *obj;
      int value;

      metaerrno = META_ERR_NOERR;

      if(!(obj = MetaGetObjectKeyAndType(metatable, key, METATYPE(metaint_t))))
      {
         metaerrno = META_ERR_NOSUCHOBJECT;
         return 0;
      }

      MetaRemoveObject(metatable, obj);

      value = ((metaint_t *)obj->object)->value;

      free(obj->object);

      return value;
   }
   
-- End Code --

Now, to wrap up this document, I will also show the other simple specialization
for strings. This is more or less exactly the same as above, except for the
type that is being returned and handled inside the functions.

-- Code --

   static const char *metaStringToString(void *obj)
   {
      return ((metastring_t *)obj)->value;
   }

   void MetaAddString(metatable_t *metatable, const char *key, 
                      const char *value)
   {
      static metatype_t metaStrType;
      metastring_t *newString = calloc(1, sizeof(metastring_t));

      if(!metaStrType.isinit)
      {
         // register metastring type
         MetaRegisterTypeEx(&metaStrType, 
                            METATYPE(metastring_t), sizeof(metastring_t), 
                            NULL, NULL, NULL, metaStringToString);
      }

      newString->value = value;

      MetaAddObject(metatable, key, &newString->parent, newString, 
                    METATYPE(metastring_t));
   }

   const char *MetaGetString(metatable_t *metatable, const char *key, 
                             const char *defvalue)
   {
      const char *retval;
      metaobject_t *obj;

      metaerrno = META_ERR_NOERR;

      if(!(obj = MetaGetObjectKeyAndType(metatable, key, 
                                         METATYPE(metastring_t))))
      {
         metaerrno = META_ERR_NOSUCHOBJECT;
         retval = defvalue;
      }
      else
         retval = ((metastring_t *)obj->object)->value;

      return retval;
   }

   const char *MetaRemoveString(metatable_t *metatable, const char *key)
   {
      metaobject_t *obj;
      const char *value;

      metaerrno = META_ERR_NOERR;

      if(!(obj = MetaGetObjectKeyAndType(metatable, key, 
                                         METATYPE(metastring_t))))
      {
         metaerrno = META_ERR_NOSUCHOBJECT;
         return NULL;
      }

      MetaRemoveObject(metatable, obj);

      value = ((metastring_t *)obj->object)->value;

      free(obj->object);

      return value;
   }

-- End Code --

Using this development pattern, it is simple to define metaobject wrappers for
any and every type of data conceivable. As mentioned previously, you can augment
any plain C structure in your program with a metatable, and suddenly you are
able to query and add and remove fields at runtime, rather than requiring the
program to be recompiled.

The applications of the metatable for use along with application scripting are
nearly endless, especially if being employed to reflect data from a similarly
dynamic language such as JavaScript or Python.

With a simple call like this:

   int value = MetaGetInt(meta, "foobar", 0);
   
you are playing with a power far beyond that offered to you by C alone. The
sky truly is the limit.

=== * FIN * ====================================================================