*==============================================================================* | | | * Constructing the Metatable * | | | | Bringing Associative Arrays and Objects Into C | | | | By James "Quasar" Haley | | | *==============================================================================* === * License * ================================================================ This document may be freely copied and distributed, but may not be altered. Code segments taken from this article may only be placed into programs which are under the following licenses: * BSD * GNU GPL If the program is BSD-licensed, it must either stipulate in addition that all use of those modules containing code taken from this file must remain open- source in any publicly distributed program, or the program must include this file in whole, in human-readable form, with any public distribution. === * Table of Contents (Use "Find" to jump to a section by number) * ========== [000] Introduction [001] Principles: Inheritance and Polymorphism [002] Foundation: The List [003] First Floor: The Hash Table [004] Interlude: Hash Specialization [005] Elevator: The Metaobject [006] Penthouse: The Metatable [007] The View: Finding Objects [008] The Rooftop: Metatypes [009] The Sky is the Limit: Metaobject Specializations === * Introduction [000] * ===================================================== C, being neither intrinsically object-oriented nor dynamically typed, does not offer much when it comes to creating structures whose desired layout can only be completely known at runtime. The creation of the metatable seeks to overturn the static, fixed nature of the C structure and augment it with an elegant and simple-to-use interface which provides the power of objects in languages such as JavaScript, where properties can be fluidly added or removed from objects at runtime. This article will step through the underlying structures that are needed to build the metatable, and will then follow through with the meat of the system itself, piece by piece. === * Principles: Inheritance and Polymorphism [001] * ========================= Before going into implementation details, it is important to outline a fundamental principle - the simulation of inheritance and polymorphic behavior in C. Even though C is not an object-oriented language, the structures that it provides serve as the basis of objects in C++, and are therefore similar already. However, an important property of C structures allows us to add inheritance, such that more derived types contain the fields of their parents, and not only that, but those types are capable of being treated as their parents in code that expects the parent type. The property which allows this is defined as part of the ANSI C standard, and is the fact that the first item in a structure type is always aligned to the beginning of the structure in memory. -- Example -- typedef struct parent_s { int x; int y; } parent_t; typedef struct child_s { parent_t parent; int z; } child_t; void printparent(parent_t *parent) { printf("x = %d, y = %d\n", parent->x, parent->y); } void examplefunc(void) { child_t object = { { 1, 2 }, 3 }; printparent((parent_t *)&object); } -- End Example -- As you can see above, it is possible to directly cast a pointer to an object of the child type to its parent type. Unfortunately there is a small caveat to this fact: in the C99 standard, it was decided that compiler optimization was a more important goal than backward compatibility or flexibility, and the casting of pointers to unrelated types was severely restricted. In most compilers it is possible to disable this behavior, with options such as GCC's -fno-strict-aliasing. When disabling strict aliasing optimizations is undesirable, it is also possible to skirt the issue by using an expression such as: &object->parent Being of the appropriate type, there is no aliasing issue. This is more verbose, however, and can be problematic in some cases. It also does not help with the issue of downcasting from the parent type to the child type. For that it is necessary to have a "void *object" member in the parent type which always points to the actual start of the derived object, in which case downcasting is accomplished like this: child_t *child = (child_t *)(parent->object); With the addition of function pointers to the structures in question, it also becomes possible to simulate polymorphism: -- Example -- typedef struct parent_s { int x; int y; void (*print)(struct parent_s *); } parent_t; typedef struct child_s { parent_t parent; int z; } child_t; void printparent(struct parent_s *obj) { printf("x = %d, y = %d\n", parent->x, parent->y); } void printchild(struct parent_s *obj) { child_t *cobj = (child_t *)obj; // downcasting to child type printf("x = %d, y = %d, z = %d\n", obj->x, obj->y, cobj->z); } void examplefunc(void) { int i; parent_t justaparent = { 1, 2, printparent }; child_t achild = { { 1, 2, printchild }, 3 }; // put pointers to the objects defined above into an array, // just for the sake of treating them uniformly parent_t *objects[2]; objects[0] = &justaparent; objects[1] = &achild; // invoke the print function pointer in each object for(i = 0; i < 2; ++i) objects[i]->print(objects[i]); } -- End Example -- The output from the example above would look like this: 1, 2 1, 2, 3 Because these function-pointer-based methods can be assigned unique values on a per-type or even per-object basis, you should immediately realize that they are always "virtual," or in other words, the function they call is determined at run-time rather than at compile-time. This matches the behavior of methods in dynamic languages like JavaScript. These two important fundamental principles are what drives the metatable. === * Foundation: The List [002] * ============================================= At the very utmost foundation of the metatable, we need a simple and efficient list structure which supports the insertion and deletion of objects in constant time, and without requiring that the objects all be of the same type. You will see eventually that these lists serve as the chains, or buckets, of another critical underlying component, a generic hash table implementation, but let's not get ahead of ourselves. We start by defining the list item structure: -- Code -- typedef struct listitem_s { struct listitem_s *next; // next object in the list struct listitem_s **prev; // pointer to previous object's next pointer void *object; // a pointer to the parent object } listitem_t; -- End Code -- The use of the pointer-to-pointer prev link may at first be a suprise, but it will be explained immediately upon definition of the insertion and deletion functions used by this list structure. * Inserting an Object * To insert a link item into a list, we define a ListInsert function which takes a pointer to the listitem, a pointer to the base object (for satisfying C99, mostly), and the address of the list's head pointer, which starts out NULL when the list is empty. -- Code -- void ListInsert(listitem_t *item, void *object, listitem_t **head) { // Get the item at the head of the list, if one exists. // It will become the next item in the list after the one being added. listitem_t *next = *head; // Assign the old head value to the new item's next pointer. If the // pointer is valid, set the next object's "prev" pointer to point back // to our own next link. if((item->next = next)) next->prev = &item->next; // Our own item's prev pointer should point to the list head, because it // is the first item in the list. item->prev = head; // Reset the list head pointer to point to our new item. *head = item; // Set the object pointer (this could be made self-referential) item->object = object; } -- End Code -- Notice how this code is free of special cases for the beginning, middle, or end - all nodes are treated uniformly, which is a basic property of this list type which makes it blazingly fast. Insertion of items is always at the head of the list, which is an important property for our hash table that will be defined later. * Removing an Object * All that is needed to complete this abstract data type is a deletion method which can remove arbitrary items from the list: -- Code -- void ListRemove(listitem_t *item) { listitem_t **prev = item->prev; // get previous item's next pointer listitem_t *next = item->next; // get this item's next pointer // Redirect the previous object's next pointer to this item's next object. // If next object is valid, set its prev pointer to the previous object. if((*prev = next)) next->prev = prev; } -- End Code -- That's it for deletion - a single if statement and two assignments. This function is quite likely to be inlined if declared properly in a header file, and is simply as fast as a linked list deletion can be made. The pointer-to- pointer prev member not only enables this uniformity, but it also allows the deletion to be contextless: notice that we don't even have to know of which list the object is a part in order to remove it from its list. This is another important quality for our higher-level data structures. === * First Floor: The Hash Table [003] * ====================================== The foundation of objects in dynamic, weakly-typed languages such as JavaScript is invariably the associative array, or hash table, which when given a string name, returns the object (or objects) which has that name. Hash tables are an extension of the concept of a simple array, where you give an index number and are returned a particular value, but hash tables are more general in the following ways: * The contained items do not have to be consecutive in memory. * The index that serves as input does not strictly have to be numeric. * The hash does not have to have a valid object at every index in its range. * The hash may have more than one object at a given index. Defining a simple hash table isn't that complicated, but for the purposes of implementing our higher-level metatable, we want a hash which is general. That means that it can apply to any type of object, so long as a specialization is provided for the hashing of that type of object. We accomplish this by requiring that any object to be hashed descends from listitem_t, our linked list item type. This allows us to employ the pseudo-inheritance outlined in the first section of this article. In order to do this, we will need to create a basic hash structure with a dynamically allocated set of hash chains, sometimes also called "buckets," that will each hold objects in the same equivalence class - that is, whose keys are equal modulus the size of the hash table. We will also need to make our structure contain a number of methods, so that the functions of the hash operation can be tailored to specific specializations. Let's define these needed operations first: -- Code -- // You will find it necessary to forward-declare the typedef of your hash // structure type here, so that it is visible to the function pointer type // below. typedef struct hash_s hash_t; // Hash key computation function prototype. The hash key is computed from the // data held in an object, and determines the chain to which it belongs. typedef unsigned int (*HashFunc_t)(const void *); // Object comparison function prototype. When looking for an object in the // hash table, we need to do object comparisons. typedef boolean (*CompFunc_t)(hash_t *, void *, const void *); // Key retrieval function prototype. Because our hash table doesn't want to // have to know about the memory layout of the objects it contains, we have a // method which returns a pointer to the "key" member of the item. typedef const void * (*KeyFunc_t) (void *); // Link retrieval function prototype. Again, we don't know the memory layout // of the objects that will be in our hash, so we have this method to return // the listitem_t -- if the listitem_t is at the beginning of the function, // and we are not subject to C99's aliasing restrictions, then we may just // actually return the same pointer we are passed as an argument, in which // case this method is not technically needed. typedef void * (*LinkFunc_t)(void *); -- End Code -- With the above method types defined, let us define our basic hash structure: -- Code -- struct hash_s { listitem_t **chains; // hash chains - an array of list head pointers. unsigned int numchains; // number of chains unsigned int numitems; // number of items currently in table float loadfactor; // load = numitems / numchains HashFunc_t hashfunc; // hash key computation function CompFunc_t compfunc; // object comparison function KeyFunc_t keyfunc; // key retrieval function LinkFunc_t linkfunc; // link-for-object function boolean isinit; // true if hash is initialized }; -- End Code -- Because the key and link functions are typically simple one-line "getter" methods, it would be advantageous to define a pair of macros which can auto- generate these functions for any type that we want to be able to add to this hash table. We can do this using some preprocessor magic as such: -- Code -- // Key Function macro - autogenerates a key retrieval function #define KEYFUNC(type, field) \ static const void *HashKeyFunc_ ## type ## field (void *object) \ { \ return &(((type *)object)-> field ); \ } #define KEYFUNCNAME(type, field) HashKeyFunc_ ## type ## field // Link Function macro - autogenerates a link retrieval function. #define LINKFUNC(type, field) \ static void *HashLinkFunc_ ## type ## field (void *object) \ { \ return &(((type *)object)-> field ); \ } #define LINKFUNCNAME(type, field) HashLinkFunc_ ## type ## field -- End Code -- If you are not familiar with the C preprocessor's ## operator, it is simply a string concatenation operation. It takes the tokens on both sides of it and turns them into a single token. Our KEYFUNC and LINKFUNC macros can now be used to autogenerate the needed methods for every type we put into a generic hash table. Simply pass the macros the name of the type and the name of the field. The use of these macros will be demonstrated explicitly in the definition of the metatable later, so don't feel too lost. The corresponding NAME macros let us refer to those autogenerated names without having to know about the implementation of the key and link function macros. When we are free of C99 restrictions on aliasing, we can additionally define a default link function, since as mentioned above, the link function becomes a no-op for structures which directly inherit from listitem_t. For such structures, a default link method would appear like this: -- Code -- void *LinkForObject(void *object) { return object; } -- End Code -- As long as the listitem_t is at the head of the item, we can simply return the same pointer passed in, and via our inheritance simulation, it will be treated as a listitem. We will be assigning this method to any item that does not define its link method explicitly in just a moment. The first operation that our hash table itself needs to support, as with any abstract data type, is being initialized. Let's define a HashInit function which will set up a hash table, specializing its generic structure to function on a specific specialization. -- Code -- void HashInit(hash_t *table, unsigned int numchains, HashFunc_t hfunc, CompFunc_t cfunc, KeyFunc_t kfunc, LinkFunc_t lfunc) { // Create the array of chains and initialize them all to zero. // IRL, you would want to check the validity of calloc's return value. table->chains = calloc(numchains, sizeof(listitem_t *)); // Initialize the number of chains, number of items contained in the // table, and the table's load factor (a metric of how many items are in // the table versus the number of chains in the table) all to zero. table->numchains = numchains; table->numitems = 0; table->loadfactor = 0.0f; // Assign the methods passed into this function to the hash object, so // that it will behave polymorphically. This allows the generic hash // algorithm we are about to write to function over any sort of object. // // If a link function is not passed in, we use the default one we defined // above (caveat C99 users). table->hashfunc = hfunc; table->compfunc = cfunc; table->keyfunc = kfunc; table->linkfunc = lfunc ? lfunc : LinkForObject; // Mark the table as initialized. // In C99 you might want to use the standard _Bool type. For ANSI C, just // define an enumeration to handle this, as such: // typedef enum { false, true } boolean; table->isinit = true; } -- End Code -- This function sets up our hash table. Let's take some time to break down some of the individual components further: * chains A hash table always has a certain number of chains, or buckets, which the objects placed into it are categorized based on the value of their key. Think of the chains as being the extended "array," and the key as being the extended "index." * numchains, numitems, loadfactor The number of allocated chains has to be tracked for purposes of taking the object key values modulus the number of chains, so that we do not index into the chains array out of bounds. We additionally track the number of items in the table, and use that information to compute a load factor as such: load factor = numitems / numchains This is important for supporting growable hash tables - that is, hash tables where the number of chains can be changed at runtime. This is important because, ideally, the search behavior of a hash table is O(1), or constant time. The more items added to a hash table, the more likely you are to have more than one item on a chain, and for efficiency, this is undesirable. Thus, when the load factor exceeds a certain preordained level, it may be good to "rehash." We will define a rehashing function later. Now, how do we go about adding an object to our hash table? Quite simple really. We want to invoke the table's key method on the object being inserted, and then compute the hash code, or index into the chains table, by taking the key number modulus the number of chains. After that, we simply insert the object into that chain using our earlier-defined list insertion operation. -- Code -- void HashAddObject(hash_t *table, void *object) { if(table->isinit) // be sure the table is initialized { // Get the key for the object passed in. const void *key = table->keyfunc(object); // Compute the hash code. unsigned int hashcode = table->hashfunc(key) % table->numchains; // Insert the object into the list. // linkfunc returns the pointer to the object's listitem_t parent. // We also pass in the object pointer, and the address of the chain // into which we're inserting the object. ListInsert(table->linkfunc(object), object, &(table->chains[hashcode])); // We added an object, so increment the number of items and recompute // the table's load factor. table->numitems++; table->loadfactor = (float)table->numitems / table->numchains; } } -- End Code -- We can support removing objects in a similar manner, but even more simply. Here is where the contextless efficiency of our linked list's remove method comes in handy, as we are not required to determine what chain the object is in before removing it. -- Code -- void HashRemoveObject(hash_t *table, void *object) { if(table->isinit) { // Just remove the item's parent listitem from its list. ListRemove(table->linkfunc(object)); // Decrement number of items and recompute the load factor. table->numitems--; table->loadfactor = (float)table->numitems / table->numchains; } } -- End Code -- Note that you will also want to define a version of the HashRemoveObject function that does not decrement the load factor. This will be useful in the rehashing process that will be covered below. The purpose will be evident when we get to it. Now, the next and arguably most important operation that the hash table must support is the finding of an object inside the table given the object's key. Because we do not want to restrict our table to having a single object with a given key, we implement this operation as an iterator which will walk down the hash chain and return each item in it that matches the given key. A search is started by passing a NULL pointer in object. A search is continued by passing the previous return value in object successively. By doing this until the function returns NULL, we can iterate over every object in the table with the same key. -- Code -- // Given a key, this routine returns the first object found in the table // with that key. void *HashObjectForKey(hash_t *table, const void *key) { if(table->isinit) { // Compute the object's hash code as during insertion unsigned int hashcode = table->hashfunc(key) % table->numchains; // Get a pointer to the object's chain. listitem_t *chain = table->chains[hashcode]; // Walk down the chain until we find an object that matches the // requested key. We do this by calling the table's compfunc method. while(chain && !table->compfunc(table, chain->object, key)) chain = chain->next; // If the chain is exhausted, there is no object with that key in the // table, so we return NULL in that case. Otherwise, we return the // listitem's object pointer. return chain ? chain->object : NULL; } else return NULL; // this is in case the table isn't initialized... } void *HashObjectIterator(hash_t *table, void *object, const void *key) { void *ret; // Be sure the table is initialized. if(!table->isinit) return NULL; // Starting a new search? Get the first object that matches the key. if(!object) ret = HashObjectForKey(table, key); else { // "object" is valid, so we are continuing a search. Get the listitem // for the last item that was returned. listitem_t *item = table->linkfunc(object); // Start on the next object in hash chain, since we already processed // the one that was passed back to us. item = item->next; // Walk down the chain and use the comparison method to look for the // next match. while(item && !table->compfunc(table, item->object, key)) item = item->next; // Again, if a match is found, we return it. Otherwise there are no // more objects in the table with that key. ret = item ? item->object : NULL; } return ret; } -- End Code -- When you know there is only ever one object in a table for each key, you can simply call the HashObjectForKey function. Otherwise, you use the HashObjectIterator function as in this example: -- Example -- void example(void) { void *object = NULL; while((object = HashObjectIterator(table, object, key))) { // cast it to the descendent type and do something with it... } } -- End Example -- The important thing to grasp is that you both pass the "object" pointer into the iterator function as the second parameter, and put the function's return value into it. This both serves to prime the search initially, and to power its step to the next item until the list is exhausted. It is also possible to define a similar iteration which passes over ALL items in the hash table regardless of their keys, but that will be reserved for discussion later in the metatable implementation, where we will find a need to use it. Finally, let's define our rehash operation, which will be used when we see that our table is too full. It isn't always necessary or even desirable to rehash tables, so we leave the calling of this function up to the code that is using the hash table, rather than dictating a rehashing policy in this data structure itself. What rehashing implies is removing all objects from the table, resizing the array of hash chains, and then readding all of the objects to the table. This is necessary because changing the number of chains changes the objects' hash codes. Without rehashing, the search time for an object in the table will approach the worst-case linear time, which is undesirable. By rehashing, we hope to minimize "collisions," which are the occurance of objects whose keys are not equal, but whose hash codes are, such that they all end up inside the same chain. Note that rehashing is a very expensive operation, so the way in which the table growth is managed is extremely important. An example of smart growing will be demonstrated in the definition of the metatable later. -- Code -- void HashRebuild(hash_t *table, unsigned int newnumchains) { listitem_t **oldchains = table->chains; // save current chains unsigned int oldnumchains = table->numchains; unsigned int i; if(!table->isinit) return; // Allocate a new chains table. We rely on the caller to tell us how many // chains should be allocated, so this could in fact shrink the table, if // you needed to decrease the number of chains for some reason. table->chains = calloc(newnumchains, sizeof(listitem_t *)); table->numchains = newnumchains; // Mark the number of items currently in the table as zero. table->numitems = 0; // Run down the old hash chains. for(i = 0; i < oldnumchains; ++i) { listitem_t *chain; // Restart from the beginning of the old chain each time, since the // linked list has been modified. while((chain = oldchains[i])) { // Remove from the old hash. This is calling the discussed but not // illustrated method that removes an object from the table without // decrementing table->numitems. We don't want to do that here, // since we already set numitems to 0 above. HashRemoveObjectNoCount(table, chain->object); // Clear out fields for safety. chain->next = NULL; chain->prev = NULL; // Add this object to the new hash. HashAddObject(table, chain->object); } } // recalculate load factor table->loadfactor = (float)table->numitems / table->numchains; // delete old chains free(oldchains); } -- End Code -- This concludes implementation of the generic hash table data structure. We now have an associative array that, when given an identifying key object, returns to us the object or objects that it represents. The next section is an interlude that demonstrates how to provide specializations of this generic data structure. === * Interlude: Hash Specialization [004] * =================================== In order to make our generic hash useful, we need to provide one or more "specializations," similar to the function of templates in C++. Though the generic hash interface itself is not particularly typesafe via its use of void pointers, our specializations restrict that genericity to a single type. As long as you do not deliberately do something stupid, it is impossible to fall afoul of typing problems when using a specialization. It is possible to define specializations that use any sort of key, but the two most common key types for objects in a hash table are plain integers and strings. We will examine both for completeness, but it is the string hash that our metatable will eventually employ, in two different flavors - case sensitive and case insensitive. Only the case sensitive hash will be detailed here, since the only difference between the two is the treatment of characters in the comparison and hash key computation functions. Using an integer as a hash key is the simplest possible specialization. Let's start out by defining the hash function itself. All we have to do is return the integer key. The key will be taken modulus the table size to compute the hash code automatically by the generic routines defined earlier. -- Code -- unsigned int IntHashFunc(const void *key) { return (unsigned int)(*(const int *)key); } -- End Code -- Remember that the hash function is passed a pointer to the key by the generic routines, so we need to dereference the key pointer in order to get the value of the integer. We then cast it to unsigned int for purposes of matching the return value, so that the modulus in the hash code computation never renders a negative value that would crash our program. Next we need a comparison function that, given an object in the hash and a key, checks to see if the object has the same key. -- Code -- boolean IntCompareFunc(hash_t *table, void *object, const void *key) { // Get the object's key using the table's keyfunc method as usual. int objkey = *(int *)(table->keyfunc(object)); // Compare the object's key to the key in question and return the result. return objkey == *(const int *)key; } -- End Code -- That's pretty much all that is required to provide a specialization for a particular type of key. In order to actually use an integer hash, we still need one additional function however, and that is a function that sets up a hash_t to act like an integer hash for a particular type of object. We do that here: -- Code -- void IntHashInit(hash_t *table, unsigned int numchains, KeyFunc_t kfunc, LinkFunc_t lfunc) { // Our hash and compare functions are set, but the key and link functions, // which can be defined with the KEYFUNC and LINKFUNC macros, are specific // to the object type using this hash. HashInit(table, numchains, IntHashFunc, IntCompareFunc, kfunc, lfunc); } -- End Code -- Now I will show a similar specialization for strings, but all at once without a lot of commenting. It is the same structure, but based around a different key type. -- Code -- // We have to turn a string into an unsigned integer. There are a billion // ways to do that, all with their own ups and downs. This method is based // on the one used by the SGI implementation of the C++ STL. unsigned int StrHashFunc(const void *key) { // It's important to remember we are given a POINTER to the key, and the // key in this case is a const char *, so "key" is really a pointer to a // pointer. We cast and then dereference one level. const char *c = *(const char **)key; unsigned int h = 0; // Go down the string and do some math magic on the characters. The chief // idea is to use multiplication to spread out the values, because the // more bits that are set in the key, the less likely are hash collisions. while(*c) { h = 5 * h + *c; ++c; } return h; } boolean StrCompareFunc(hash_t *table, void *object, const void *key) { const char *objkey = *(const char **)(table->keyfunc(object)); // Simple string comparison between the two keys. return !strcmp(objkey, *(const char **)key); } void StrHashInit(hash_t *table, unsigned int numchains, KeyFunc_t kfunc, LinkFunc_t lfunc) { HashInit(table, numchains, StrHashFunc, StrCompareFunc, kfunc, lfunc); } -- End Code -- It should be easy to see the development pattern for hash specializations between these two examples. Try defining specializations for unsigned integers, case-insensitive strings, and even complex objects on your own. === * Elevator: The Metaobject [005] * ========================================= We have now defined a hash table that can be specialized to deal with any sort of object, so long as that object derives from our equally generic linked list. So what happens if we define a sort of object which itself is capable of pointing to anything that derives from itself, and keeps track of important type information? Then we would have what I call the metaobject, which serves as the basis of our metatable. -- Code -- #define METATYPE(type) #type typedef struct metaobject_s { listitem_t links; // principle set of links listitem_t typelinks; // links for a secondary list (explained later) char *key; // the metaobject's hash key const char *type; // the metaobject's type name void *object; // pointer back to the containing (child) object } metaobject_t; -- End Code -- The METATYPE macro makes use of another C preprocessor trick, the "stringize" operator. It takes the argument to the macro and transforms it into a quoted string literal. This means that we can pass type names into functions within the program and then act based on their value. The object pointer here is just like the one previously defined by listitem_t, and can either be self-referential, or point to another location in memory. Either way, it is meant to refer to the metaobject descendent to which this metaobject belongs. We only need to define one function that is specific to general metaobjects. It is our run-time type identification function (RTTI). It tests the type stored in the metaobject against a type we pass to it using the METATYPE macro. Note that we treat RTTI strings as case sensitive, so that they match the semantics of the native C types on which they are built. -- Code -- boolean IsMetaKindOf(metaobject_t *object, const char *type) { return !strcmp(object->type, type); } -- End Code -- With this type defined, we are ready to begin work on the metatable itself. === * Penthouse: The Metatable [006] =========================================== Let's start right off the bat by defining our metatable type. It is shockingly simple: -- Code -- typedef struct metatable_s { hash_t keyhash; // hash of metaobjects by key hash_t typehash; // hash of metaobjects by type } metatable_t; -- End Code -- That's right, it's just two hash tables. Metaobjects contain both a key and a type, and it is useful to be able to efficiently search the metatable by either criterion. Because the number of keys in the table is arbitrary and is defined solely by the data set on which the program will operate, it is best to make the key hash dynamically growable. The type hash on the other hand will be set largely in stone, depending only on the metaobject specializations that are provided in your program, so it can be a fixed-size table. I additionally choose to implement keys as a case-insensitive hash, and types as a case-sensitive one (as already mentioned), although neither decision is mandatory, and you can define your own semantics as needed. So as with the underlying hash table, the first thing we do is define a function that will initialize our metatable and its underlying hashes: -- Code -- // The initial number of chains in the table is a tunable factor for your // program. You can determine the best value only by doing empirical testing. // However, prime numbers are always the best! They reduce artificial // clustering in the hash table due to common factors between the hash codes // and the number of chains. #define METANUMCHAINS 53 #define METANUMTYPECHAINS 31 // Remember our KEYFUNC and LINKFUNC macros? // Here is how and when we use them: // key retrieval function for metatable hashing by key KEYFUNC(metaobject_t, key) // key and link retrieval function for metatable hashing by type KEYFUNC(metaobject_t, type) LINKFUNC(metaobject_t, typelinks) void MetaInit(metatable_t *metatable) { // Initialize the key hash as a case-insensitive string hash on // metaobjects. This one uses the default link retrieval method // because the listitem for keys is the first item in the structure // (again, caveat C99 users!) NCStrHashInit(&metatable->keyhash, METANUMCHAINS, KEYFUNCNAME(metaobject_t, key), NULL); // Initialize the typehash as a case-sensitive string hash on // metaobjects. This one requires an explicitly defined link retrieval // method, since typelinks is NOT the first item in metaobject_t. StrHashInit(&metatable->typehash, METANUMTYPECHAINS, KEYFUNCNAME(metaobject_t, type), LINKFUNCNAME(metaobject_t, typelinks)); } -- End Code -- The next basic operation to define is, of course, adding a metaobject to the table. This would be pretty basic, except that we want to allow the metatable to grow in case it becomes too loaded. This is where we must implement table growth logic. -- Code -- // As with the initial number of chains, the maximum load factor to allow is // a tunable which you must determine to suit your needs. #define METALOADFACTOR 0.667f // These primes roughly double in size. static const unsigned int metaPrimes[] = { 53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 393241, 786433 }; // Number of primes in the above array. #define METANUMPRIMES (sizeof(metaPrimes) / sizeof(unsigned int)) void MetaAddObject(metatable_t *metatable, const char *key, metaobject_t *object, void *data, const char *type) { hash_t *keyhash = &metatable->keyhash; // Note that we take ownership responsibility for key strings, but // not for type strings. This distinction relies on usage patterns, // but I have found it works best for my purposes, where types are // static but field names are generated from user-input data. object->key = strdup(key); object->type = type; // Generally, a pointer back to the owning structure. object->object = data; // Check for key table overload. if(keyhash->loadfactor > METALOADFACTOR) { int newnumchains = 0; if(keyhash->numchains < metaPrimes[METANUMPRIMES - 1]) { int i; // find a prime larger than the current number of chains for(i = 0; keyhash->numchains < metaPrimes[i]; ++i); newnumchains = metaPrimes[i]; } else newnumchains = keyhash->numchains * 2; // too big, just double it. HashRebuild(keyhash, newnumchains); } // Add the object to the key table HashAddObject(keyhash, object); // Add the object to the type table, which is static in size HashAddObject(&metatable->typehash, object); } -- End Code -- Now we can grow our metatables to any size and they should continue to function efficiently. It may be smart to impose some kind of limit on the growth, which the code above does not do, lest you find yourself out of memory. Removing objects from the table is much simpler: -- Code -- void MetaRemoveObject(metatable_t *metatable, metaobject_t *object) { // Remove the object from both hashes. HashRemoveObject(&metatable->keyhash, object); HashRemoveObject(&metatable->typehash, object); // Delete our copy of the key string. if(object->key) { free(object->key); object->key = NULL; } } -- End Code -- In the next section we will create a bevy of methods for searching and examining objects inside the metatable. === * The View: Finding Objects [007] * ======================================== We have defined the basic methods of the metatable for adding and removing objects, but the metatable's true power comes from letting us obtain fields we add to it by providing a string which serves as the field's name or type. There are several ways to do this. We can provide methods that are attuned to singleton fields, of which only one is expected to exist in the metatable at a time, or we can create iterator methods, built on top of our hash table's iteration function, that return all objects with a given key or type in succession. Both types of functions are relatively straight-forward, since the bulk of the lookup logic is implemented inside our generic hash table. All we have to do is call the appropriate function. Here are basic lookup functions that are suitable for testing simply whether or not a given field or type of object exists, or getting the last such object added to the table (remember that our linked list always inserts at the head of the list, so the newest objects are at the front of the hash chains). -- Code -- metaobject_t *MetaGetObject(metatable_t *metatable, const char *key) { return HashObjectForKey(&metatable->keyhash, &key); } metaobject_t *MetaGetObjectType(metatable_t *metatable, const char *type) { return HashObjectForKey(&metatable->typehash, &type); } metaobject_t *MetaGetObjectKeyAndType(metatable_t *metatable, const char *key, const char *type) { metaobject_t *obj = NULL; // Here we search down the key hash for objects that also match the // specified type. while((obj = HashObjectIterator(&metatable->keyhash, obj, &key))) { if(IsMetaKindOf(obj, type)) break; } return obj; } -- End Code -- Nothing too bad here, except that you must remember to pass the *address* of the key to the HashObjectForKey and HashObjectIterator functions. Here are equivalent functions which iterate over all the objects in the table that match the requested criterion. -- Code -- metaobject_t *MetaGetNextObject(metatable_t *metatable, metaobject_t *object, const char *key) { return HashObjectIterator(&metatable->keyhash, object, &key); } metaobject_t *MetaGetNextType(metatable_t *metatable, metaobject_t *object, const char *type) { return HashObjectIterator(&metatable->typehash, object, &type); } metaobject_t *MetaGetNextKeyAndType(metatable_t *metatable, metaobject_t *object, const char *key, const char *type) { metaobject_t *obj = object; while((obj = HashObjectIterator(&metatable->keyhash, obj, &key))) { if(IsMetaKindOf(obj, type)) break; } return obj; } -- End Code -- Not much more complicated, are they? The way in which HashObjectIterator works means that these can also be used as singleton retrieval methods, making the ones we defined above somewhat redundant, and thus for convenience only. When starting a new search, you pass NULL in the object argument, and then you iterate passing in object each time until the return value is NULL. You can use similar implementation methods to create methods that simply return a boolean to tell you whether or not a given key or type exists in the table, or to actually count how many objects of a given key or type exist in the table. I leave those as exercises for the reader, since they are painfully simple additions to the methods shown above. There is, however, one other type of iteration that we might need, and it is the one that I promised I would cover later during the discussion of implementing the hash table. That is, an iteration which visits every object in the table regardless of key or type. We need this in order to apply any sort of action to every object in the table. Before we can implement a metatable method to do this, we need to implement the functionality in the parent hash table. This is a relatively simple variation on our existing iterator, but in addition to keeping track of the object we are currently on, we will also need to keep track of the chain we are currently going down, so that we will know when we have visited all of the hash chains. -- Code -- // To prime a new search with this function, you should pass NULL in object // AND pass -1 in index. void *HashTableIterator(hash_t *table, void *object, unsigned int *index) { void *ret = NULL; unsigned int i = *index; if(!table->isinit) return NULL; // Already searching? if(object) { // Is there another object on the same chain? listitem_t *item = table->linkfunc(object); // Return that item. if(item->next) ret = item->next->object; } // Else, search for the next chain with an object in it while(!ret && ++i < table->numchains) { listitem_t *chain = table->chains[i]; // Return the first object on that chain. if(chain) ret = chain->object; } *index = i; return ret; } -- End Code -- Now with this routine in place, we can implement the metatable's wrapper. Note that we need to pass on responsibility for defining and initializing the object and index to the caller, since this routine needs to preserve the state of those arguments across calls. -- Code -- metaobject_t *MetaTableIterator(metatable_t *metatable, metaobject_t *object, unsigned int *index) { return HashTableIterator(&metatable->keyhash, object, index); } -- End Code -- This wraps up discussion of finding objects in the metatable. The next thing we need to do is create a way to maintain enhanced type information about our metaobjects that allows us to do incredible things with them that most people would assume are only possible in C++. === * The Rooftop: Metatypes [008] * =========================================== In order to store extended information about the types of metaobjects that we add to our table, in order to support operations such as copying of items from one metatable to another, we will establish a global cache of this information called the metatype table. The metatype table, for convenience of implementation, is itself a metatable, which functions using our first specialization of metaobject_t: -- Code -- // Like our generic hash, metatypes have virtual methods. Only these methods // are effectively the "class" methods of the metaobjects that this metatype // represents. // // Yes that is right, a metatype is our equivalent of a C++ class, or a // JavaScript prototype. It defines a copy constructor as well as other // possible methods. // metatype method prototypes typedef void *(*MetaAllocFn_t) (size_t); typedef void (*MetaCopyFn_t) (void *, const void *, size_t); typedef metaobject_t *(*MetaObjPtrFn_t)(void *); typedef const char *(*MetaToStrFn_t) (void *); typedef struct metatype_s { metaobject_t parent; // metatypes are also metaobjects const char *name; // name of metatype (derived from C type) size_t size; // size of type for allocation purposes boolean isinit; // if true, this type has been registered MetaAllocFn_t alloc; // allocation method MetaCopyFn_t copy; // copy method MetaObjPtrFn_t objptr; // object pointer method (returns metaobject) MetaToStrFn_t toString; // string conversion method } metatype_t; -- End Code -- Now with the type defined, we need to create a global metatype registry. The first time we add a metaobject of a derived type to the metatable, we will register its metatype object too. Note that metaobjects are never required to have registered metatypes - it simply limits what you can do with them. -- Code -- // The metatypes registry. This is itself a metatable. static metatable_t metaTypeRegistry; // Default method for allocation of an object given its metatype. static void *MetaAlloc(size_t size) { return calloc(1, size); } // Default method for copying of an object given its metatype. // Performs a shallow copy only. static void MetaCopy(void *dest, const void *src, size_t size) { memcpy(dest, src, size); } // Default method to get the metaobject_t field for an object. // Returns the same pointer. Works for objects which have the // metaobject_t as their first item only. static metaobject_t *MetaObjectPtr(void *object) { return object; } // The default "toString" method won't be discussed here, but is left to // the reader as an exercise. My own default toString method produces a // hex dump of the metaobject, but it requires use of a reallocating string // "class" which is outside the scope of this document. ... void MetaRegisterType(metatype_t *type) { // Early check for an already-initialized metatype. // This allows us to send the same object in more than once, in case // we don't feel like checking every time. if(type->isinit) return; // init table the first time if(!metaTypeRegistry.keyhash.isinit) MetaInit(&metaTypeRegistry); // Set default methods if any are NULL if(!type->alloc) type->alloc = MetaAlloc; if(!type->copy) type->copy = MetaCopy; if(!type->objptr) type->objptr = MetaObjectPtr; if(!type->toString) type->toString = MetaDefToString; // sanity check: we do not allow multiple metatypes of the same name if(MetaGetObject(&metaTypeRegistry, type->name)) { // call an error method here if you have one... error("MetaRegisterType: type %s is non-singleton", type->name); } // Add this metatype object to the metatable. MetaAddObject(&metaTypeRegistry, type->name, &type->parent, type, METATYPE(metatype_t)); type->isinit = true; } // For convenience's sake we also define the following function, which // constructs the metatype C++-style by taking all of its fields as // function parameters: void MetaRegisterTypeEx(metatype_t *type, const char *typeName, size_t typeSize, MetaAllocFn_t alloc, MetaCopyFn_t copy, MetaObjPtrFn_t objptr, MetaToStrFn_t tostr) { // early check for already-initialized metatype if(type->isinit) return; type->name = typeName; type->size = typeSize; type->alloc = alloc; type->copy = copy; type->objptr = objptr; type->toString = tostr; MetaRegisterType(type); } -- End Code -- That's it for defining and maintaining the metatype table. But now, how do we use it to do those crazy C++-like things that I promised earlier? Here are two excellent examples. The first implements a metatable copying method, whereby you can automatically create duplicates of all of the metaobjects in a metatable which have registered metatypes and put them into a second metatable, all without having to know or care about how many or of what type the metaobjects are! This is functional magic of the highest degree, at least for a C program. -- Code -- void MetaCopyTable(metatable_t *desttable, metatable_t *srctable) { metaobject_t *srcobj = NULL; unsigned int i = -1; // Iterate on the source table. Here's why we needed the fancy // HashTableIterator method: while((srcobj = MetaTableIterator(srctable, srcobj, &i))) { metatype_t *type; // See if the object has a registered metatype. if((type = (metatype_t *)MetaGetObject(&metaTypeRegistry, srcobj->type))) { // Create the new object. // We use the metatype alloc, objptr, and copy methods. void *destobj = type->alloc(type->size); metaobject_t *newmeta = type->objptr(destobj); // Copy from the old object to the new object. type->copy(destobj, srcobj->object, type->size); // Clear the new metaobject for safety. memset(newmeta, 0, sizeof(metaobject_t)); // Add the new object to the destination table. // Note that MetaAddObject duplicates the source object's key inside // the destination object, so there are no cross-references. This is // important when it comes to deleting objects from tables... MetaAddObject(desttable, srcobj->key, newmeta, destobj, type->name); } } } -- End Code -- Send two metatables into that function, and you will end up with one that is an exact copy of the other, save for any objects that don't have registered metatypes! In case that's not enough for you, here is the similar definition of the toString method: -- Code -- const char *MetaToString(metaobject_t *obj) { const char *ret; metatype_t *type; // See if the object has a registered metatype. if((type = (metatype_t *)MetaGetObject(&metaTypeRegistry, obj->type))) ret = type->toString(obj->object); else ret = "(unregistered object type)"; return ret; } -- End Code -- And there you have it. Using the development pattern seen in MetaToString, it is possible to add any conceivable type of functionality to metaobjects. Simply add the method prototype into the metatype object, and define a function that executes the "virtual method." === * The Sky is the Limit: Metaobject Specializations [009] * ================= Although in defining the metatype registry, I demonstrated a basic specialization of the metaobject, I do not feel that particular example demonstrates the typical use of metaobjects. To do that, I will define here two specializations of the metaobject on basic C types, so that we can store these common objects into metatables. Then, by including the metatable type inside an ordinary C structure, we can fully realize our ability to declare runtime-defined types like in JavaScript. The first specialization I will cover is the metaint. It simply adds storage to the metaobject for an integer variable. Pretty simple, huh? -- Code -- typedef struct metaint_s { metaobject_t parent; int value; } metaint_t; // toString method for metaints. Uses non-standard itoa. Feel free to // change this implementation however you see fit to avoid that. static const char *metaIntToString(void *obj) { metaint_t *mi = (metaint_t *)obj; static char str[33]; memset(str, 0, sizeof(str)); itoa(mi->value, str, 10); return str; } // Adds a metaint to a metatable. void MetaAddInt(metatable_t *metatable, const char *key, int value) { static metatype_t metaIntType; metaint_t *newInt = calloc(1, sizeof(metaint_t)); // On the first call, we register a metatype, so that metaints can // be copied and printed. if(!metaIntType.isinit) { // register metaint type MetaRegisterTypeEx(&metaIntType, METATYPE(metaint_t), sizeof(metaint_t), NULL, NULL, NULL, metaIntToString); } // Add it. newInt->value = value; MetaAddObject(metatable, key, &newInt->parent, newInt, METATYPE(metaint_t)); } // Retrieval method. If the field doesn't exist, we offer to return a default // value that was passed to us. In my own implementation I also set a global // "metaerrno" value to indicate when this occurs, in case I need to treat // it as an error in a particular code segment. int MetaGetInt(metatable_t *metatable, const char *key, int defvalue) { int retval; metaobject_t *obj; metaerrno = META_ERR_NOERR; if(!(obj = MetaGetObjectKeyAndType(metatable, key, METATYPE(metaint_t)))) { metaerrno = META_ERR_NOSUCHOBJECT; retval = defvalue; } else retval = ((metaint_t *)obj->object)->value; return retval; } // Remove method. Very straight forward. The value is returned in case you // need it at the same time you want to remove it. int MetaRemoveInt(metatable_t *metatable, const char *key) { metaobject_t *obj; int value; metaerrno = META_ERR_NOERR; if(!(obj = MetaGetObjectKeyAndType(metatable, key, METATYPE(metaint_t)))) { metaerrno = META_ERR_NOSUCHOBJECT; return 0; } MetaRemoveObject(metatable, obj); value = ((metaint_t *)obj->object)->value; free(obj->object); return value; } -- End Code -- Now, to wrap up this document, I will also show the other simple specialization for strings. This is more or less exactly the same as above, except for the type that is being returned and handled inside the functions. -- Code -- static const char *metaStringToString(void *obj) { return ((metastring_t *)obj)->value; } void MetaAddString(metatable_t *metatable, const char *key, const char *value) { static metatype_t metaStrType; metastring_t *newString = calloc(1, sizeof(metastring_t)); if(!metaStrType.isinit) { // register metastring type MetaRegisterTypeEx(&metaStrType, METATYPE(metastring_t), sizeof(metastring_t), NULL, NULL, NULL, metaStringToString); } newString->value = value; MetaAddObject(metatable, key, &newString->parent, newString, METATYPE(metastring_t)); } const char *MetaGetString(metatable_t *metatable, const char *key, const char *defvalue) { const char *retval; metaobject_t *obj; metaerrno = META_ERR_NOERR; if(!(obj = MetaGetObjectKeyAndType(metatable, key, METATYPE(metastring_t)))) { metaerrno = META_ERR_NOSUCHOBJECT; retval = defvalue; } else retval = ((metastring_t *)obj->object)->value; return retval; } const char *MetaRemoveString(metatable_t *metatable, const char *key) { metaobject_t *obj; const char *value; metaerrno = META_ERR_NOERR; if(!(obj = MetaGetObjectKeyAndType(metatable, key, METATYPE(metastring_t)))) { metaerrno = META_ERR_NOSUCHOBJECT; return NULL; } MetaRemoveObject(metatable, obj); value = ((metastring_t *)obj->object)->value; free(obj->object); return value; } -- End Code -- Using this development pattern, it is simple to define metaobject wrappers for any and every type of data conceivable. As mentioned previously, you can augment any plain C structure in your program with a metatable, and suddenly you are able to query and add and remove fields at runtime, rather than requiring the program to be recompiled. The applications of the metatable for use along with application scripting are nearly endless, especially if being employed to reflect data from a similarly dynamic language such as JavaScript or Python. With a simple call like this: int value = MetaGetInt(meta, "foobar", 0); you are playing with a power far beyond that offered to you by C alone. The sky truly is the limit. === * FIN * ====================================================================