***********************************
Changes to the NIST Express toolkit
***********************************
Don Libes, libes@cme.nist.gov
Last revised: 19-Aug-1992

	POOP adj. (Acronym for Post-OOP) A paradigm (q.v.) long
	awaited by many.  Also, reminiscent of the sound made by
	the collapse of an overinflated balloon.

OVERVIEW OF CHANGES

The bad news is: Much has changed.  You will not be able to recompile
applications without changing them.

The good news is: The system is faster.  Much faster.  Also, the
library is based on the Express DIS and implements everything needed
to fully resolve all features of Express.

Until formal documentation is written, you will have to look at the
code.  The good news is that the code is much, much shorter and
cleaner.  The bad news is that I left in some of the original code as
comments, so you may be distracted by this.

I have converted two programs that depend on the library: exp2cxx
(in ~pdevel/src/fexp2cxx) and the step parser (in ~pdevel/src/fstep).
Since I didn't write either one originally, I don't take credit for
their overall readability, but they at least provide proof that the
library works.


Here is an overview of what's changed.

- The overall structure has been changed to allow easier interfacing
and more customization.  Even sophisticated applications can use the
default main now.  To use the default main, define EXPRESSinit_init
as:

	void EXPRESSinit_init() {
		/* your_backend is a placeholder for your own function */
		EXPRESSbackend = your_backend;
	}

Other hooks can be found by looking at the true definition of main.
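To make the hook concrete, here is a self-contained model of the idea.
It is a sketch, not the library's code: the Express stand-in type, the
assumed backend signature (a function taking the model), my_backend
and default_main_sketch are all hypothetical; only the names
EXPRESSinit_init and EXPRESSbackend come from the text above.

```c
#include <stdio.h>

/* Hypothetical stand-in for the real model type. */
typedef struct Express_ {
    const char *name;
    int schema_count;
} *Express;

/* Assumed hook signature: the backend receives the resolved model. */
void (*EXPRESSbackend)(Express) = NULL;

/* Application side: what you would supply. */
static int backend_calls = 0;

static void my_backend(Express model) {
    backend_calls++;
    printf("backend: %s, %d schema(s)\n", model->name, model->schema_count);
}

void EXPRESSinit_init(void) {
    EXPRESSbackend = my_backend;
}

/* Library side: roughly what a default main might do with the hook. */
int default_main_sketch(const char *filename) {
    struct Express_ model = { filename, 1 };   /* parse and resolve elided */

    EXPRESSinit_init();                        /* let the application install hooks */
    if (EXPRESSbackend)
        (*EXPRESSbackend)(&model);
    return 0;
}
```

Calling default_main_sketch("example.exp") installs the hook and then
runs the backend exactly once.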

- The OO system is gone.  Everything is pointers to real structures
rather than "objects".  This is what accounts for much of the speed
improvement.  Debugging is easier, too, since you no longer have to
rely on functions to print out structures.

The downside is that some of the structures have embedded unions.
This can be confusing at first, but at least the compiler and
debuggers can now understand what you are doing and help you out.

- Almost all of the functions in the old library are unnecessary in
the new one since you can access structure elements yourself now.
Nonetheless, for compatibility, I have defined replacements for the
most commonly used functions.  If a function you use has no
definition, either there is no counterpart, I didn't think anyone
actually used that function, or I just haven't gotten around to
writing it.

- The functions most likely to lack counterparts are some of the:

	schema functions - the definition of a schema changed quite a
	bit (due to changes in USE/REF and nested schemas)

	type functions - types don't resemble those in the old
	library.  See more info below.

	expression functions - expressions don't resemble those in the
	old library.  See more info below.

- Error processing has been sped up.  The error messages are greatly
improved (no more overloading of a single error message for different
situations) and more descriptive, and much (much, much) more error
checking is done.  Files are now tracked, along with line numbers, for
all objects.

Some specific notes can be found below.


GETTING A COPY OF FEDEX AND THE LIBRARY

**************************
Getting a precompiled copy
**************************

The fedex executable and library can be found in ~pdevel/bin and
~pdevel/arch/lib respectively.  I will update them regularly as bugs
are fixed, so make a copy if you want a static version.

**************************
Getting the source
**************************

To retrieve the source, create a directory, link to the RCS directory
from it, check out the CheckOut file, and then run CheckOut itself.
"make" by itself will build an executable while "make libexpress.a"
will build the library.  Here are real commands to do this:

	mkdir -p ~/pdevel/src/fexpress2
	cd ~/pdevel/src/fexpress2
	ln -s ~pdevel/src/fexpress2/RCS .
	co CheckOut
	./CheckOut
	make			# builds the fedex executable
	make libexpress.a	# builds the library

Incidentally, the name 'fexpress2' is temporary while this release is
being tested.  Eventually, we will give it a better, more permanent
name.

**************************
'Libmisc' is dead, but ...
**************************

Note that the 'libmisc' library is no longer necessary.  (It has been
integrated directly into the express library.)  However, you still
need the 'usual' tools in ~pdevel/bin and the 'usual' other
libraries in ~pdevel/arch/lib.  You can change the targets in either
Makefile or make_rules as appropriate.  The express directory has its
own make_rules for simplicity.

**************************
Documentation
**************************

There is none.  Ok, just kidding.  What there is, is a file called
Changes which you'll get from CheckOut, describing the changes from
the old version to the new version.

It is very rough.  There is little consistency, although I tried for
completeness.  (It's 22K.)  Nonetheless, it is still an overview and
skimps on precise details of many calls.  Really, it's just there to
jog my memory when I write the real documentation, or for experts
(like you) who don't want to wait for the documentation.


MISCELLANEOUS NOTES

The following are miscellaneous notes that you may find helpful -
especially because there is no other documentation.  (Sorry.)

Numerous elements in the language are now resolved including:

ALIAS, RULE, QUERY

It is interesting to note that there was formerly no way to even
represent them because the libmisc package had no means to do multiple
inheritance.  Steve and I talked about implementing multiple
inheritance but were convinced that it would drastically slow down
every other part of the system.  This seemed a poor tradeoff
considering that we only needed inheritance from at most two
orthogonal classes.

Enumerations are now separated into different scopes.  For the same
reason as above, this was formerly impossible.
======================================================================
Class x;		->		Class_of_what x;
					i.e., 
					Class_of_Type x;

Similarly, OBJget_class is now specific to whatever class you are using.

For example:

OBJget_class(type)	->		TYPEget_type(type)

if (class == Class_Aggregate_Type) ->	if (TYPEis_aggregate(type))

Rationale: The underlying type system changed completely.  The
class/object system is gone, but efficiently faked, so the object type
can no longer be called 'Class'.
======================================================================
Some people assumed many functions returned const values.  Many
functions did in fact return such values.  Now they do not.

Rationale: Most functions are now macros, returning pointers right out
of the data structures.  Since these are the real objects, they are
writable.
======================================================================
Most objects returned from functions do not have to be OBJfree'd.

You will have to look at the documentation to see which ones do.
OBJfree itself has been turned into a no-op.

Rationale: Most functions now return pointers right out of the data
structures.  Freeing them would corrupt the system.

If you are getting a list, call the appropriate data structure
function to free it.  For example, SCOPEget_entities_use returns a
list, so you should call LISTfree to free it.
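The ownership rule can be illustrated with a tiny hypothetical list
(List, list_free and get_used_entities are made up for this sketch and
are not the library's LIST package).  The getter allocates a fresh
list whose cells belong to you, but whose elements point straight into
the data structures, so freeing the list must never free the elements.

```c
#include <stdlib.h>

typedef struct Entity { const char *name; } Entity;

typedef struct Cell { void *data; struct Cell *next; } Cell;
typedef struct List { Cell *head; int count; } List;

List *list_new(void) {
    return calloc(1, sizeof(List));
}

/* Prepends an element; returns -1 if out of memory. */
int list_add(List *l, void *data) {
    Cell *c = malloc(sizeof *c);
    if (!c)
        return -1;
    c->data = data;
    c->next = l->head;
    l->head = c;
    l->count++;
    return 0;
}

/* Frees the cells and the header but never the elements: those
   belong to the data structures, not to the caller. */
void list_free(List *l) {
    Cell *c = l ? l->head : NULL;
    while (c) {
        Cell *next = c->next;
        free(c);
        c = next;
    }
    free(l);
}

/* Hypothetical getter in the spirit of SCOPEget_entities_use: a fresh
   list (yours to free) of borrowed pointers (not yours to free). */
List *get_used_entities(Entity *table, int n) {
    List *l = list_new();
    for (int i = 0; l && i < n; i++)
        list_add(l, &table[i]);
    return l;
}
```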
======================================================================
SCOPEget_entities_supertype_order no longer returns USEd entities.

Use SCHEMAget_entities_use and SCHEMAget_entities_ref to get the USEd
and REFd entities, respectively.

Rationale: At KC's request.  This decision might be revisited.
Perhaps another function could be added.
======================================================================
ENUM_TYPEget_items now returns a dictionary instead of a list.  Each
element is an expression of type 'enumeration' instead of a symbol.

Rationale: Efficiency.
======================================================================
DYNA_init is dead and gone.  Remove all such calls.

Rationale: Hopelessly nonportable and ultimately of little value.
======================================================================
The original pass1/pass2 idea has been revamped.  "pass1" is now
referred to as "parse" (since that's what it is).  "pass2" is referred
to as "resolve" (since that's what it is).  The resolve pass actually
consists of several (currently 5) passes.  The current pass number is
stored in EXPRESSpass.  This number is really only useful for
debugging purposes.

EXPRESSparse prefers to open the file itself.  Either call it as

	EXPRESSparse(model,(FILE *)0,"filename");
or	EXPRESSparse(model,filepointer,(char *)0);

EXPRESSparse takes a "model" argument that can be a new or existing
Express model.  This allows you to call EXPRESSparse repeatedly to
read additional schemas into an existing set.

To create a new express model, call EXPRESScreate().
To resolve an express model, call EXPRESSresolve(model).
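The either-or calling convention for EXPRESSparse can be sketched as
follows.  parse_source is a hypothetical stand-in that merely counts
lines; the point is that it opens (and later closes) the file only
when it is handed a null FILE pointer and a name.

```c
#include <stdio.h>

/* Hypothetical stand-in for the EXPRESSparse convention: pass an open
   FILE * and a null name, or a null FILE * and a filename.  "Parsing"
   is reduced to counting lines.  Returns the line count, or -1 on error. */
long parse_source(FILE *fp, const char *filename) {
    int opened_here = 0;
    long lines = 0;
    int c;

    if (!fp) {
        if (!filename)
            return -1;              /* neither argument supplied */
        fp = fopen(filename, "r");
        if (!fp)
            return -1;
        opened_here = 1;
    }
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;
    if (opened_here)
        fclose(fp);                 /* close only what we opened */
    return lines;
}
```

The caller keeps responsibility for closing a FILE it opened itself.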
======================================================================
The STRING abstraction has been removed.  You should use the Standard
C library calls to deal with strings.  I've left a couple of macros in
place to aid in conversion, but these may go away in the future.

Rationale: The STRING abstraction allowed different underlying
representations for strings, but was incomplete to the point that
users had to assume that the standard C representation was used.

It was pointless to complete it, since the Standard C library is now
very rich in string support.  The result would have just been
confusing.
======================================================================
A number of facilities are provided for referencing objects outside
the current file.

1) It is possible to logically insert other files during analysis by
use of an INCLUDE statement.  INCLUDE statements were, at one time,
valid Express.  However, they are not currently.  It is best to think
of them as a preprocessing phase of the implementation that has
nothing to do with the language proper.

(With that in mind...) INCLUDE statements can appear outside a schema
or at the top-level of a schema.  Included files are not restricted to
including schemas, but may include, for example, a set of entities, a
rule, etc.  For example:

	INCLUDE 'schema-file.exp';

2) Referencing a schema that is not defined in the file (or included
from another file) causes fedex to search for a file with the same
name as the schema with a ".exp" extension in the directories named by
the environment variable EXPRESS_PATH.  For example, in the C-shell,
you could say:

	setenv EXPRESS_PATH "~pdes/data/part42 ~pdes/data/part202"

In order to facilitate this, I recommend creating, for each schema
file, a symbolic link named <schema>.exp for every schema within it
that is likely to be referenced externally.  Stable schemas may have
symbolic links placed in a directory of stable part files, while
unstable schemas should be referenced from a specific part directory.

For example, imagine that the directory for stable schemas is
~pdes/data/schemas/standard, while part 202 is still undergoing
evolution.  In this case, the appropriate command might be:

	setenv EXPRESS_PATH "~pdes/data/part202 ~pdes/data/schemas/standard"

If not set, the default path of "." (the current directory) is used.
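A rough model of the search, assuming EXPRESS_PATH is a
whitespace-separated list of directories tried in order
(find_schema_file is a hypothetical helper, not fedex's code, and it
does no tilde expansion):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the EXPRESS_PATH search.  path is the
   variable's value (NULL if unset, meaning ".").  Each directory is
   tried in order; the first "<dir>/<schema>.exp" that can be opened
   wins and is left in buf.  Returns 0 if found, -1 otherwise. */
int find_schema_file(const char *schema, const char *path,
                     char *buf, size_t bufsize)
{
    const char *p = path ? path : ".";

    for (;;) {
        size_t len;
        int n;
        FILE *fp;

        p += strspn(p, " \t\n");        /* skip separators */
        len = strcspn(p, " \t\n");      /* length of the next directory */
        if (len == 0)
            return -1;                  /* end of path: not found */

        n = snprintf(buf, bufsize, "%.*s/%s.exp", (int)len, p, schema);
        if (n >= 0 && (size_t)n < bufsize && (fp = fopen(buf, "r")) != NULL) {
            fclose(fp);
            return 0;
        }
        p += len;
    }
}
```

A caller would pass getenv("EXPRESS_PATH") as the path argument.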
======================================================================
The old "warning" kludgery is gone.  It has been replaced by several
routines in the ERROR package including

	ERRORcreate_option
	ERRORset_option
	ERRORset_all_options

To associate an option string with a particular error, call
ERRORcreate_option.

	ERRORcreate_option("subtypes",ERROR_missing_subtype);

To actually set or unset an option, it suffices to say:

	ERRORset_option(sc_optarg,set);

where set is a true/false value.  This is especially convenient with
getopt, since you can use the same code to set or unset an option just
by testing the option letter inside of the 'set' argument.  For example:

	ERRORset_option(sc_optarg,c == 'w');

To print all the options out, say:

	LISTdo(ERRORoptions, opt, Error_Option *)
		fprintf(stderr,"%s\n",opt->name);
	LISTod
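The option mechanism amounts to a table mapping names to error codes.
The following sketch is hypothetical: option_create, option_set,
option_set_all and handle_flag stand in for ERRORcreate_option,
ERRORset_option and ERRORset_all_options; 'set' is assumed to mean the
error is reported; and, for illustration, -w NAME is assumed to enable
an option while another letter such as -W NAME disables it.

```c
#include <string.h>

/* Hypothetical error codes; the real ones (e.g. ERROR_missing_subtype)
   live in the ERROR package. */
enum { ERR_MISSING_SUBTYPE, ERR_UNUSED_VARIABLE, ERR_COUNT };

typedef struct {
    const char *name;   /* option string given on the command line */
    int error;          /* error code it controls */
} Error_Option;

#define MAX_OPTIONS 16
static Error_Option options[MAX_OPTIONS];
static int option_count = 0;
int error_enabled[ERR_COUNT] = { 1, 1 };   /* everything reported by default */

void option_create(const char *name, int error) {
    if (option_count < MAX_OPTIONS) {
        options[option_count].name = name;
        options[option_count].error = error;
        option_count++;
    }
}

/* Returns 0 if the option exists, -1 otherwise. */
int option_set(const char *name, int set) {
    for (int i = 0; i < option_count; i++)
        if (strcmp(options[i].name, name) == 0) {
            error_enabled[options[i].error] = set;
            return 0;
        }
    return -1;
}

void option_set_all(int set) {
    for (int i = 0; i < option_count; i++)
        error_enabled[options[i].error] = set;
}

/* The getopt idiom: one call both enables (-w) and disables (-W). */
int handle_flag(int c, const char *optarg_value) {
    return option_set(optarg_value, c == 'w');
}
```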
======================================================================
Fedex has been changed to print errors immediately rather than
buffering them up and sorting them by line number.  The underlying
function to toggle this is defined as follows:

	ERRORbuffer_messages(boolean);

While the buffering code has been sped up (it used to call two extra
processes, now it doesn't call any), I see little point to sorting by
line numbers.  The order in which diagnostics are presented to the
user is the order in which problems should be resolved.  For example,
a missing schema is detected immediately and then causes many spurious
errors, so it is best reported (and fixed) first.
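For comparison, the two reporting modes can be modeled as below.  This
is a hypothetical sketch (report, buffer_messages and flush_messages
are not the ERROR package's routines, and the buffer has a fixed
size): immediate mode prints at once, while buffered mode holds
messages and sorts them by line number when flushed.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct { int line; const char *text; } Message;

#define MAX_BUFFERED 64
static Message buffer[MAX_BUFFERED];
static int buffered = 0;     /* messages currently held */
static int buffering = 0;    /* immediate printing is the default */

void buffer_messages(int on) {
    buffering = on;
}

/* Prints at once unless buffering is on (or the buffer is full). */
void report(FILE *out, int line, const char *text) {
    if (buffering && buffered < MAX_BUFFERED) {
        buffer[buffered].line = line;
        buffer[buffered].text = text;
        buffered++;
    } else {
        fprintf(out, "line %d: %s\n", line, text);
    }
}

static int by_line(const void *a, const void *b) {
    const Message *x = a, *y = b;
    return (x->line > y->line) - (x->line < y->line);
}

/* Sorts the held messages by line number and prints them. */
void flush_messages(FILE *out) {
    qsort(buffer, (size_t)buffered, sizeof buffer[0], by_line);
    for (int i = 0; i < buffered; i++)
        fprintf(out, "line %d: %s\n", buffer[i].line, buffer[i].text);
    buffered = 0;
}
```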
======================================================================
The error routines have been beefed up in other ways as well,
especially for robustness.  For example, if an internal or operating
system error occurs, a strong attempt is made to produce all previous
diagnostics, rather than just dumping core.

The main entry for reporting errors was changed from
ERRORreport_with_line to ERRORreport_with_symbol.
ERRORreport_with_line still exists for programs that don't know
anything about symbols (in which case, we guess at the information).

Rationale: This was a necessary change in order to provide diagnostics
with filenames.  The symbol abstraction itself also had to be
augmented with filenames.
======================================================================
The error messages are formatted a little differently so that the
default Emacs compile bindings can automatically read in and position
the appropriate Express file and display the error at the same time.

As an aside, Jim Wachholz has built an Express mode for Emacs.
Contact him for more info.
======================================================================
I have backed off on the original code's attempt at significant
information hiding.  In particular, while some of the hiding worked,
some didn't.  For example, users had to know whether information was
returned as a list or a dictionary.  In fact, it is possible to hide
this as well - I don't know why Steve didn't bother, except that he
was tired.

For example, instead of a single LISTadd_last routine, there would have
to be different LISTadd_last routines for every class.  This would have
improved typechecking.

The new code is more efficient for a variety of reasons.  The original
code paid a heavy price in efficiency for dynamic typechecking and for
using individual functions to access each data element in a structure.

The new code allows direct access.  There is necessarily some dynamic
typechecking left in the system, but it is quite small.  The number of
switch statements is surprisingly small (less than two dozen).

The new code simulates the class hierarchy used by the old code in
spirit.  In reality, the class hierarchy has been compressed from 5
levels to 2.  The resulting code is much, much faster.

The key notions in the new system are:

	a handful of base classes
	dictionaries understand classes

Instead of objects being self-descriptive, context is used.  The
dictionary is one such example.  When you store an object, you
describe it to the dictionary.  Upon later retrieval, you get the
object and the description back.  When the object is not in the
dictionary, there is no descriptor, so your code takes over the job of
remembering what something is.  Invariably, this is very
straightforward.  For example, you might keep a list of entities, in
which case you are guaranteed that all the elements on the list are
entities.

A small number of operations can be performed on all classes.  For
example, it is possible to get the printable description of a class by
saying:

	OBJget_type(type)

All OBJ functions are implemented by single-table lookups.
Mnemonically-suggestive characters are used as indices into the OBJ
table.
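Here is a minimal model of "dictionaries understand classes": each
entry carries a one-character class code, and a per-class table is
indexed by that character, much as the OBJ table is.  The names, the
three codes shown and the linear search are illustrative assumptions,
not the library's actual layout.

```c
#include <string.h>

typedef struct {
    const char *name;
    void *object;
    char type;          /* class code: 'e' entity, 't' type, 'r' rule, ... */
} Dict_Entry;

typedef struct {
    const char *description;   /* printable name of the class */
} Obj_Class;

static Obj_Class obj_table[128];   /* indexed by the class character */

void obj_init(void) {
    obj_table['e'].description = "entity";
    obj_table['t'].description = "type";
    obj_table['r'].description = "rule";
}

/* Single table lookup, in the spirit of OBJget_type. */
const char *obj_describe(char type) {
    unsigned char i = (unsigned char)type;
    const char *d = (i < 128) ? obj_table[i].description : NULL;
    return d ? d : "unknown";
}

/* Returns the object and, through type_out, its class code.  A linear
   search keeps the sketch short; a real dictionary would normally hash. */
void *dict_lookup(const Dict_Entry *dict, int n, const char *name, char *type_out) {
    for (int i = 0; i < n; i++)
        if (strcmp(dict[i].name, name) == 0) {
            if (type_out)
                *type_out = dict[i].type;
            return dict[i].object;
        }
    return NULL;
}
```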
======================================================================
Notes on fedex arguments:

	b flag (buffering) - Now "off" by default.  fedex reports the
		most important error messages first.  The idea of
		messages appearing in the order of line numbers has
		little value, especially in the context of multiple
		input files.

	r flag (no resolve) - Skip resolve pass.

	p flag (print pass info) - This takes a string argument listing
		the object types to print out as they are processed.

		Valid object types are:

		p	procedure
		r	rule
		f	function
		e	entity
		t	type
		s	schema or file
		#	pass #
		E	everything (all of the above)

		For example, the following prints out entity and rule
		names as they are being processed:

		fedex -p er
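One plausible way to turn a -p argument into a bit set of object
classes is sketched below.  The bit values are made up for
illustration (the real ones are in express_basic.h), as is
parse_print_flags.

```c
/* Hypothetical bit assignments, one per letter accepted by -p. */
enum {
    OBJ_PROCEDURE = 1 << 0,
    OBJ_RULE      = 1 << 1,
    OBJ_FUNCTION  = 1 << 2,
    OBJ_ENTITY    = 1 << 3,
    OBJ_TYPE      = 1 << 4,
    OBJ_SCHEMA    = 1 << 5,
    OBJ_PASS      = 1 << 6,
    OBJ_ANYTHING  = (1 << 7) - 1      /* all of the above */
};

/* Returns the bit set, or -1 if the string has an unknown letter. */
int parse_print_flags(const char *s) {
    int bits = 0;
    for (; *s; s++) {
        switch (*s) {
        case 'p': bits |= OBJ_PROCEDURE; break;
        case 'r': bits |= OBJ_RULE;      break;
        case 'f': bits |= OBJ_FUNCTION;  break;
        case 'e': bits |= OBJ_ENTITY;    break;
        case 't': bits |= OBJ_TYPE;      break;
        case 's': bits |= OBJ_SCHEMA;    break;
        case '#': bits |= OBJ_PASS;      break;
        case 'E': bits |= OBJ_ANYTHING;  break;
        default:  return -1;
        }
    }
    return bits;
}
```

A pass would then print an entity name only if (bits & OBJ_ENTITY) is
nonzero, which costs a single AND per object.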

======================================================================
While some ALGxxx functions (macros, really) still exist, some have
been replaced by ones specific to the type of algorithm.  For example,
ALGget_parameters should be changed to FUNCget_parameters,
RULEget_parameters, or PROCget_parameters.
======================================================================
The whole idea of passes has been revamped.  The old pass2 (now called
resolve) is no longer monolithic but is broken into several more
passes.  The old pass2 did a depth-first resolution over the object
tree.  Besides requiring a very deep stack, it forced on-demand
resolution which was extremely painful - everything had to constantly
check whether things had been resolved or whether there was infinite
recursion (due to USE/REF).

It was possible to restructure this into several breadth-first passes
over the object tree.  The additional passes do not appear to carry a
heavy penalty.  Here is an outline of the passes.

RENAME-SCHEMAS
For each schema
	For each rename clause
		Connect the schema symbol to the real schema.

	At this point, some renames and schemas are marked 'failed'.
	Interestingly, rather than reading the dictionary to get
	schema names, we use a FIFO, since schema names can be
	introduced dynamically while USE/REFs are resolved by reading
	other files.

RENAME-OBJECTS
For each schema
	For each rename clause
		Connect the final object to the rename

	At this point, renames are marked 'rename_resolved'
	and some are marked failed.

SUBSUPERS
foreach schema
	foreach entity, type (including within functions, etc)
		resolve sub/supertypes in types
		resolve local types

RESOLVE-TYPES		resolve type defs and entity attribute defs
foreach schema
	resolve type definitions
	foreach entity, alg
		resolve attribute types (including LOCALs)
		resolve proc/func parameter/return types

	At this point, the only types not resolved are the control variables
	in query types and repeats.  In order to resolve them, you have to
	do expression resolution.  Fortunately, both can be done in an order
	so that no forward references are required.

RESOLVE-INHERITANCE-COUNT (can be combined with RESOLVE-TYPES above)
requires: superclasses to be resolved to entities
foreach scope
	foreach entity (e)
	     X:	foreach superclass (sc)
		if entity-inheritance(sc) is not calculated
			X(sc)
		e->inheritance += sc->inheritance
	foreach scope, recurse


EXPRESSIONS-&-STATEMENTS
foreach schema
	foreach scope (entity, alg)
		resolve expressions in query, repeat and therefore resolve
			type of control in query, repeat
		resolve derived attributes
		resolve attribute initializers
			do only entity attributes have initializers???
		resolve statements (recurse)
	foreach type
		resolve where clause
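The RESOLVE-INHERITANCE-COUNT step is a memoized depth-first walk over
supertypes.  The sketch below assumes that an entity's count starts at
its own attribute count and adds each supertype's total (no attempt is
made to deduplicate diamond-shaped inheritance), and it adds an
in-progress state so that a supertype cycle is reported instead of
recursing forever.  Entity and its fields are hypothetical.

```c
#define MAX_SUPERS 4

enum { NOT_STARTED, IN_PROGRESS, DONE };

typedef struct Entity {
    const char *name;
    int own;                          /* attributes declared locally */
    int inheritance;                  /* total, valid once state == DONE */
    int state;
    int nsupers;
    struct Entity *supers[MAX_SUPERS];
} Entity;

/* Returns 0 on success, -1 if a supertype cycle is found. */
int compute_inheritance(Entity *e) {
    if (e->state == DONE)
        return 0;                     /* already calculated */
    if (e->state == IN_PROGRESS)
        return -1;                    /* cycle */
    e->state = IN_PROGRESS;
    e->inheritance = e->own;
    for (int i = 0; i < e->nsupers; i++) {
        Entity *sc = e->supers[i];
        if (compute_inheritance(sc) != 0)   /* the "X(sc)" step */
            return -1;
        e->inheritance += sc->inheritance;
    }
    e->state = DONE;
    return 0;
}

int resolve_inheritance_counts(Entity **entities, int n) {
    int status = 0;
    for (int i = 0; i < n; i++)
        if (compute_inheritance(entities[i]) != 0)
            status = -1;
    return status;
}
```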
	
======================================================================
Original code did not check for redefining keywords.  Fixed.
======================================================================
USE and REFERENCE are handled by having separate lists and
dictionaries to remember schemas and individual objects that are USEd
and REFd.  'rename' structures are used to point to the remote object.
(This avoids the need for copying dictionaries, which yields large
time/space savings.)

Once the rename has been processed, the rename points directly to the
final object, even if several schemas have USEd one another.

(The old USE/REF implementation did not detect recursive refs and
failed ungracefully in the presence of certain schema errors.
Dictionary entries could not be removed while another part of the
code was traversing the dictionary.)
======================================================================
Enumerations are expressions which are entered into two scopes.  One
scope is that of their own type definition.  To adhere to the special
visibility rule placed on enumerations, they are also entered into the
immediately enclosing scope.  In order to allow multiple enumeration
tags with the same name (but from different enumeration scopes), the
dictionary recognizes such overloads and marks such definitions as
"ambiguous" so that later retrievals fail with an appropriate message,
while other retrievals succeed.

Since the dictionary already knows object types, and this code is only
executed during conflicts, it is not expensive to have the dictionary
do this.  However, it did require another dictionary routine, used
specifically to add enumerations to their own type's scope, so that
two enumerations with the same name in the same type are still
treated as a real error.
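A toy version of the overload rule follows.  Entry, Scope,
scope_add_enum and scope_lookup are hypothetical, and the linear
search stands in for a real dictionary; the sketch marks a name
ambiguous when two different enumeration types contribute the same
item to the enclosing scope, and reports any other duplicate as an
error.

```c
#include <string.h>

#define MAX_ENTRIES 32

typedef struct {
    const char *name;
    const char *owner;     /* enumeration type name, or NULL for other objects */
    int ambiguous;
} Entry;

typedef struct { Entry e[MAX_ENTRIES]; int n; } Scope;

enum { FOUND = 0, NOT_FOUND = 1, AMBIGUOUS = 2 };

static Entry *find(Scope *s, const char *name) {
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->e[i].name, name) == 0)
            return &s->e[i];
    return NULL;
}

/* Returns 0 on success, -1 on a real redefinition error. */
int scope_add_enum(Scope *s, const char *name, const char *owner) {
    Entry *old = find(s, name);
    if (old) {
        if (old->owner && strcmp(old->owner, owner) != 0) {
            old->ambiguous = 1;   /* same tag, different enumeration types */
            return 0;
        }
        return -1;                /* same type twice, or clash with a non-enum */
    }
    if (s->n == MAX_ENTRIES)
        return -1;
    s->e[s->n].name = name;
    s->e[s->n].owner = owner;
    s->e[s->n].ambiguous = 0;
    s->n++;
    return 0;
}

/* Retrieval fails only for names marked ambiguous. */
int scope_lookup(Scope *s, const char *name, const Entry **out) {
    Entry *hit = find(s, name);
    if (!hit)
        return NOT_FOUND;
    if (hit->ambiguous)
        return AMBIGUOUS;
    *out = hit;
    return FOUND;
}
```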
======================================================================
Formal parameter tags are recorded but not analyzed, since it is
possible to do all type resolution without them.  Oddly, tags are not
necessary; I suppose they could be useful for a run-time evaluator.
======================================================================
Implicit loop controls and ALIAS are handled by associating with them
a "tiny" scope of one element.

The function SCOPEget_nearest_enclosing_entity had to be invented to
extract the true referent of a SELF when you're inside of a tiny
scope.
======================================================================
Local variables are handled the same way at the schema level that they
are at the entity level or any other scope.  I only mention this
because the previous implementation did not support locals.
======================================================================
Classes of object types can be represented as bit strings (see
express_basic.h).  This enables efficient handling of things like the
-p flag.  More importantly, it can be helpful to give search functions
hints, such as when searching for a type (which normally includes
entities as well).  For example, this provides a way of figuring out
the type when given the (legal) attribute declaration of:

	A1: A1;

It is not sufficient to merely start searching at a superscope since
types can be defined within the current scopes.  The important thing
is to ignore attributes.  This and the business of allowing duplicate
enumerations are exceptions to the rule of only allowing one
definition with the same name in a single scope.
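A class mask lets a search skip attributes while still starting in the
current scope.  In this hypothetical sketch (the CLASS_ bits, Symbol,
Scope and scope_find are assumptions; express_basic.h has the real
definitions), looking up A1 with a mask of types and entities passes
over the attribute A1 and finds the type A1 further out.

```c
#include <string.h>

enum {
    CLASS_ATTRIBUTE = 1 << 0,
    CLASS_ENTITY    = 1 << 1,
    CLASS_TYPE      = 1 << 2
};

typedef struct { const char *name; int cls; } Symbol;

typedef struct Scope {
    const Symbol *symbols;
    int count;
    const struct Scope *parent;   /* enclosing scope, NULL at the top */
} Scope;

/* Searches outward from s, considering only symbols whose class bit
   is in mask. */
const Symbol *scope_find(const Scope *s, const char *name, int mask) {
    for (; s; s = s->parent)
        for (int i = 0; i < s->count; i++)
            if ((s->symbols[i].cls & mask) && strcmp(s->symbols[i].name, name) == 0)
                return &s->symbols[i];
    return NULL;
}
```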
======================================================================
CONSTANTs are represented by attributes but with the flag.constant bit
on.  Unlike normal attributes, these can be found in non-entity scopes.
======================================================================
	Always code as if the person who will maintain your code is a
	sadistic, psychopathic maniac who knows where you live.
	- David Olsen

	Writing documentation actually improves code. The reason is
	that it is usually easier to clean up a crock than have to
	explain it.  - G.  Steele.