*********************************** Changes to the NIST Express toolkit *********************************** Don Libes, libes@cme.nist.gov Last revised: 19-Aug-1992 POOP adj. (Acronym for Post-OOP) A paradigm (q.v.) long awaited by many. Also, reminiscent of the sound made by the collapse of an overinflated balloon. OVERVIEW OF CHANGES The bad news is: Much has changed. You will not be able to recompile applications without changing them. The good news is: The system is faster. Much faster. And the library is based on the Express DIS, and implements everything needed to do full resolution of all features of Express. Until formal documentation is written, you will have to look at the code. The good news is that the code is much much shorter and cleaner. The bad news is that I left in some of the original code as comments, so you may be distracted by this. I have converted over two pieces of programs that depend on the library. exp2cxx (in ~pdevel/src/fexp2cxx) and the step parser (in ~pdevel/src/fstep). Since I didn't write either one originally, I don't take credit for the overall readability, but they at least provide proof that the library functions. Here is an overview of what's changed. - The overall structure has been changed to allow easier interfacing and more customization. Even sophisticated applications can use the default main now. To use the default main, define EXPRESSinit_init as: void EXPRESSinit_init() { EXPRESSbackend = your-backend-function-goes-here; } Other hooks can be found by looking at the true definition of main. - The OO system is gone. Everything is pointers to real structures rather than "objects". This is what accounts for much of the speed improvement. Debugging is easier, too, since you no longer have to rely on functions to print out structures. The downside is that some of structures have embedded unions. This can be confusing at first, but at least the compiler and debuggers can now understand what you are doing and help you out. - Almost all of the functions in the old library are unnecessary in the new one since you can access structure elements yourself now. Nonetheless, for compatibility, I have defined replacements for the most likely used functions. If you have a function with no definition, either there is no counterpart, I didn't think anyone actually used that function, or I just haven't gotten around to writing it. - The functions most likely to counterpart-less are some of the: schema functions - the definition of a schema changed quite a bit due to USE/REF and nested schemas changing) type functions - types don't resemble those in the old library. See more info below. expression functions - expression don't resemble those in the old library. See more info below. - Error processing has been speeded up. The error messages are greatly improved (no more overloading of a single error message for different situations), more descriptive and much (much, much) more error checking is done. And files are tracked now along with line numbers for all objects. Some specific notes can be found below. GETTING A COPY OF FEDEX AND THE LIBRARY ************************** Getting a precompiled copy ************************** The fedex executable and library can be found in ~pdevel/bin and ~pdevel/arch/lib respectively. They will be regularly updated by me as bugs are fixed. So make a copy if you want a static version. ************************** Getting the source ************************** To retrieve the source, link to the RCS directory, check out the CheckOut file, and then run CheckOut itself. "make" by itself will build an executable while "make libexpress.a" will build the library. Here are real commands to do this: mkdir -p ~/pdevel/src/fexpress2 ln -s ~pdevel/src/fexpress2/RCS ~/pdevel/src/fexpress2 co CheckOut CheckOut make Incidentally, the name 'fexpress2' is temporary while this release is being tested. Eventually, we will give it a better, more permanent name. ************************** 'Libmisc' is dead, but ... ************************** Note that the 'libmisc' library is no longer necessary. (It has been integrated directly into the express library.) However, you still need the the 'usual' tools in pdevel/bin and the 'usual' other libraries in ~pdevel/arch/lib. You can change the targets in either Makefile or make_rules as appropriate. The express directory has its own make_rules for simplicity. ************************** Documentation ************************** There is none. Ok, just kidding. What there is, is a file called Changes which you'll get from CheckOut, describing the changes from the old version to the new version. It is very rough. There is little consistency, although I tried for completeness. (It's 22K.) Nonetheless, it is still an overview and skimps on precise details of many calls. Really, it's just there to jog my memory when I write the real documentation, or for experts (like you) who don't want to wait for the documentation. MISCELLANEOUS NOTES The following are miscellaneous notes that you may find helpful - especially because there is no other documentation. (Sorry.) Numerous elements in the language are now resolved including: ALIAS, RULE, QUERY It is interesting to note that there was formerly no way to even represent them because the libmisc package had no means to do multiple inheritance. Steve and I talked about implementing multiple inheritance but were convinced that it would drastically slow down every other part of the system. This seemed a poor tradeoff considering that we only needed inheritance from at most two orthogonal classes. Enumerations are now separated into different scopes. For the same reason as above, this was formerly impossible. ====================================================================== Class x; -> Class_of_what x; i.e., Class_of_Type x; Similarly, OBJget_class is now specific to whatever class you are using. I.e., OBJget_class(type) -> TYPEget_type(type) if (class == Class_Aggregate_Type) -> if (TYPEis_aggregate(class)) Rationale: underlying type system changed completely. Class/object system gone, but efficiently faked. Can no longer call object type 'Class'. ====================================================================== Some people assumed many functions returned const values. Many functions did in fact return such values. Now they do not. Rational: Most functions are now macros, returning pointers right out of the data structures. Since these are the real objects, they are writable. ====================================================================== Most objects returned from functions do not have to be OBJfree'd. You will have to look at the documentation to see which ones. Thus, OBJfree has been turned into a no-op. Rational: Most functions now return pointers right out of the data structures. Freeing them would corrupts the system. If you are getting a list, call the appropriate data structure function to free it. I.e., SCOPEget_entities_use returns a list, you should call LISTfree to free it. ====================================================================== SCOPEget_entities_supertype_order now no longer returns USEd entities. Use SCHEMAget_entities_use and SCHEMAget_entities_ref to get either of these. Rationale: At KC's request. This decision might be revisited. Perhaps another function could be added. ====================================================================== ENUM_TYPEget_items now returns a dictionary instead of a list. Each element is an expression of type 'enumeration' instead of a symbol. Rationale: Efficiency. ====================================================================== DYNA_init is dead and gone. Remove all such calls. Rationale: Hopeless nonportable and ultimately of little value. ====================================================================== The original pass1/pass2 idea has been revamped. "pass1" is now referred to as "parse" (since that's what it is). "pass2" is referred to as "resolve" (since that's what it is). The resolve pass actually consists of several (currently 5) passes. The current pass number is stored in EXPRESSpass. This number is really only useful for debugging purposes. EXPRESSparse prefers to open the file itself. Either call it as EXPRESSparse(model,(FILE *)0,"filename"); or EXPRESSparse(model,filepointer,(char *)0); EXPRESSparse takes a "model" argument that can be a new or old express abstraction. This allows you to call EXPRESSparse repeatedly to read additional schemas in to an old set. To create a new express model, call EXPRESScreate(). To resolve an express model, call EXPRESSresolve(model). ====================================================================== The STRING abstraction has been removed. You should use the Standard C library calls to deal with strings. I've left a couple macros in place to aid in conversion, but these may go away in the future. Rationale: The STRING abstraction allowed different underlying representations for strings, but was incomplete to the point that users had to assume that the standard C representation was used. It was pointless to complete it, since the Standard C library is now very rich in string support. The result would have just been confusing. ====================================================================== A number of facilities are provided for referencing objects outside the current file. 1) It is possible to logically insert other files during analysis by use of an INCLUDE statement. INCLUDE statements were, at one time, valid Express. However, they are not currently. It is best to think of them as a preprocessing phase of the implementation that has nothing to do with the language proper. (With that in mind...) INCLUDE statements can appear outside a schema or at the top-level of a schema. Included files are not restricted to including schemas, but may include, for example, a set of entities, a rule, etc. For example: INCLUDE 'schema-file.exp'; 2) Referencing a schema that is not defined in the file (or included from another file) causes fedex to search for a file with the same name as the schema with a ".exp" extension in the directories named by the environment variable EXPRESS_PATH. For example, in the C-shell, you could say: setenv EXPRESS_PATH "~pdes/data/part42 ~pdes/data/part202" In order to facilitate this, I recommend that all schema files have symbolic links created to them by the names of any schemas within that are likely to be externally referenced from them. Stable schemas may have symbolic links placed in a directory of stable part files, while unstable schemas should be referenced from a specific part directory. For example, imagine that the directory for stable schemas is ~pdes/data/schemas/standard while, part 202 is still undergoing evolution. In this case, the appropriate command might be: setenv EXPRESS_PATH "~pdes/schemas/part42 \ ~pdes/schemas/standard" If not set, the default path of "." (the current directory) is used. ====================================================================== The old "warning" kludgery is gone. It has been replaced by several routines in the ERROR package including ERRORcreate_option ERRORset_option ERRORset_all_options To associate an option string with a particular error, call ERRORcreate_option. ERRORcreate_option("subtypes",ERROR_missing_subtype); To actually set or unset an option, it suffices to say: ERRORset_option(sc_optarg,set); where set is a true/false value. This is especially convenient with getopt, since you can use the same code to set or unset an option just by testing the option letter inside of the 'set' argument. I.e. ERRORset_option(sc_optarg,c == 'w'); To print all the options out, say: LISTdo(ERRORoptions, opt, Error_Option *) fprintf(stderr,"%s\n",opt->name); LISTod ====================================================================== Fedex has been changed to print errors immediately rather than buffering them up and sorting them by line number. The underlying function to toggle this is defined as follows: ERRORbuffer_messages(boolean); While the buffering code has been speeded up (it used to call two extra processes, now it doesn't call any), I see little point to sorting by line numbers. The order in which diagnostics are presented to the user are the order in which problems should be resolved. I.e., a missing schema will be detected immediately, and will cause many spurious errors. ====================================================================== The error routines have been beefed up in other ways as well, especially for robustness. For example, if an internal or operating system error occurs, a strong attempt is made to produce all previous diagnostics, rather then just dumping core. The main entry for reporting errors was changed from ERRORreport_with_line to ERRORreport_with_symbol. ERRORreport_with_line still exists for programs that don't know anything about symbols (in which case, we guess at the information). Rational: This was a necessary change in order to provide diagnostics with filenames. The symbol abstraction itself also had to be augmented with filenames. ====================================================================== The error messages are formatted a little differently so that the default Emacs compile bindings can automatically read in and position the appropriate Express file and display the error at the same time. As an aside, Jim Wachholz has built an Express mode for Emacs. Contact him for more info. ====================================================================== I have backed off on the original code's attempt at significant information hiding. In particular, while some of the hiding worked, some didn't. For example, users had to know whether information was returned as a list or a dictionary. In fact, it is possible to hide this as well - I don't know why Steve didn't bother, except that he was tired. For example, instead of a single LISTadd_last routine, there would have to different LISTadd_last routines for every class. This would have improved typechecking. The new code is more efficient for a variety of reasons. The original code paid a heavy price in efficiency for dynamic typechecking, and using individuals function to access each data element in a structure. The new code allows direct access. There is necessarily some dynamic typechecking left in the system, but it quite small. The number of switch statements is surprisingly small (less than two dozen). The new code simulates the class hierarchy used by the old code in spirit. In reality, the class hierarchy has been compressed from 5 levels to 2. The resulting code is much, much faster. The key notions in the new system are: a handful of base classes dictionaries understand classes Instead of objects being self-descriptive, context is used. The dictionary is one such example. When you store an object, you describe it to the dictionary. Upon later retrieval, you get the object and the description back. When the object is not in the dictionary, there is no descriptor. Your code takes over the job of remembering what something is. Invariably, this very straightforward. I.e., you might keep a list of entities, in which case you are guaranteed all the elements on the list are entities. A small number of operations can be performed on all classes. For example, it is possible to get the printable description of a class by saying: OBJget_type(type) All OBJ functions are implemented by single-table lookups. Mnemonically-suggestive characters are used as indices into the OBJ table. ====================================================================== Notes on fedex arguments: b flag (buffering) - Now "off" by default. fedex reports the most important error messages first. The idea of messages appearing in the order of line numbers has little value, especially in the context of multiple input files. r flag (no resolve) - Skip resolve pass. p flag (print pass info) - This takes a string argument object types to print out while being processed. Valid object types are: p procedure r rule f function e entity t type s schema or file # pass # E everything (all of the above) For example, the following prints out entity and rule names as they are being processed: fedex -p er ====================================================================== While some ALGxxx functions (macros, really) still exist, some have been replaced by ones specific to the type of algorithm. For example, ALGget_parameters should be changed to FUNCget_parameters, RULEget_parameters, or PROCget_parameters. ====================================================================== The whole idea of passes has been revamped. The old pass2 (now called resolve) is no longer monolithic but is broken into several more passes. The old pass2 did a depth-first resolution over the object tree. Besides requiring a very deep stack, it forced on-demand resolution which was extremely painful - everything had to constantly check whether things had been resolved or whether there was infinite recursion (due to USE/REF). It was possible to restructure this into several breadth-first passes over the object tree. It does not appear as though a heavy penalty is paid for the additional passes. Here is an outline of passes. RENAME-SCHEMAS For each schema For each rename clause Connect the schema symbol to the real schema. At this point, some renames and schemas are marked 'failed'. Interestingly, rather than reading the dictionary to get schema names, we use a FIFO, since schemas names can be dynamically introduced while resolved USE/REFs when reading other files. RENAME-OBJECTS For each schema For each rename clause Connect the final object to the rename At this point, renames are marked 'rename_resolved' and some are marked failed. SUBSUPERS foreach schema foreach entity, type (including within functions, etc) resolve sub/supertypes in types resolve local types RESOLVE-TYPES resolve type defs and entity attribute defs foreach schema resolve type definitions foreach entity, alg resolve attribute types (including LOCALs) resolve proc/func parameter/return types At this point, the only types not resolved are the control variables in query types and repeats. In order to resolve them, you have to do expression resolution. Fortunately, both can be done in an order so that no forward references are required. RESOLVE-INHERITANCE-COUNT (can be combined with RESOLVE-TYPES above) requires: superclasses to be resolved to entities foreach scope foreach entity (e) X: foreach superclass (sc) if entity-inheritance(sc) is not calculated X(sc) e->inheritance += sc->inheritance foreach scope, recurse EXPRESSIONS-&-STATEMENTS foreach schema foreach scope (entity, alg) resolve expression in query, repeat and therefore resolve type of control in query, repeat resolve derived attributes resolve attribute initializers do only entity attributes have initializers??? resolve statements (recurse) foreach type resolve where clause ====================================================================== Original code did not check for redefining keywords. Fixed. ====================================================================== USE and REFERENCE are handled by having separate lists and dictionaries to remember schemas and individual objects that are USEd and REFd. 'rename' structures are used to point to the remote object. (This avoids the need for copying dictionaries, which enabled large time/space savings.) Once the rename has been processed, the rename points directly to the final object, even if several schemas have USEd one another. (The old USE/REF implementation did not detect recursive refs and failed ungracefully in the presence of certain schema errors. Dictionaries entries could not be removed while another part of the code was traversing the dictionary.) ====================================================================== Enumerations are expressions which are entered into two scopes. One scope is that of their own type definition. To adhere to the special visibility rule placed on enumerations, they are also entered into the immediately enclosing scope. In order to allow multiple enumeration tags with the same name (but from different enumeration scopes), the dictionary recognizes such overloads and marks such definitions as "ambiguous" so that later retrievals fail with an appropriate message, while other retrievals succeed. Since the dictionary already knows object types, and this code is only executed during conflicts, it is not expensive to have the dictionary do this. However, it did require another dictionary routine specifically for the purpose of adding enumerations to the enum-scope to handle enumerations with the same name in the same type scope as a real error. ====================================================================== Formal parameter tags are recorded but not analyzed, since it is possible to do all type resolution without it. Oddly, tags are not necessary, I suppose they could be useful for a run-time evaluator. ====================================================================== Implicit loop controls and ALIAS are handled by associating with them a "tiny" scope of one element. The function SCOPEget_nearest_enclosing_entity had to be invented to extract the true referent of a SELF when you're inside of a tiny scope. ====================================================================== Local variables are handled the same way at the schema level that they are at the entity level or any other scope. I only mention this because the the previous implementation did not support locals. ====================================================================== Classes of object types can be represented as bit strings (see express_basic.h). This enables efficient handling of things like the -p flag. More importantly, it can be helpful to give search functions hints, such as when searching for a type (which normally includes entities as well). For example, this provides a way of figuring out the type when given the (legal) attribute declaration of: A1: A1; It is not sufficient to merely start searching at a superscope since types can be defined within the current scopes. The important thing is to ignore attributes. This and the business of allowing duplicate enumerations are exceptions to the rule of only allowing one definition with the same name in a single scope. ====================================================================== CONSTANTs are represented by attributes but with the flag.constant bit on. Unlike normal attributes, these can be found in non-entity scopes. ====================================================================== Always code as if the person who will maintain your code is a sadistic, psychopathic maniac who knows where you live. - David Olsen Writing documentation actually improves code. The reason is that it is usually easier to clean up a crock than have to explain it. - G. Steele.