Dataflows in CHAIMS

Draft 1, 11/2/98

What is user-data and what is meta-data/control-data in CLAM?
Can user-data become control-data?
Type checking?
Comparison and arithmetic over user-data in CLAM?
Format of user-data and control-data in CPAM?

Decisions (Dec 1):

We keep the current type system.
User data can become control data, and vice versa.
CLAM supports boolean expressions.
CLAM allows comparison between all simple CHAIMS types. For the CHAIMS compiler this means that it has to figure out if a value is in gentype format or not, and if it is in gentype format, the compiler first has to generate the code for extracting the real value before it can generate the comparison statement.

1. User-data and meta-data in CLAM

User-data and meta-data

CLAM is a composition language. Arithmetic and IO-functions are pushed down into megamodules. A megaprogram written in CLAM merely composes the methods offered by megamodules. Composition in CLAM consists of two aspects:

invoking methods and routing data from one megamodule to another one (setting up connection, setting parameters, extracting results, ...)
determining the flow of control (while and if statements)

The data that flows between megamodules and the megaprogram falls into three categories:

Meta-data returned by EXAMINE and ESTIMATE: this data is only used for determining the flow of control (we could also call it control-data).

Data returned by EXTRACT not used for determining the flow of control. This data is routed on to other megamodules (user-data).
Data returned by EXTRACT that is used for control flow and that, like data from ESTIMATE, may be routed on to helper modules that obtain necessary control decisions from the user, or to helper modules for further arithmetic and decisions (user-data becoming meta-data). The point at which such user-data becomes meta-data/control-data varies: it depends on when a megaprogrammer perceives it as user-data and when as meta-data.

The separation into meta-data and user-data is fuzzy and also depends on the point of view: data considered meta-data in the megaprogram (e.g. results from ESTIMATE) is user-data from the point of view of a megamodule doing IO or computation on it (e.g. calculating a cost function)!

Tricky: From helper modules too (even the ones making decisions and comparisons) we extract the result (e.g. a boolean) with EXTRACT, in exactly the same way as we extract results from any other megamodule. In order to avoid infinite recursion, we have to allow some way of testing user-data coming from an EXTRACT within the CHAIMS language.

Example of a piece of code that we cannot yet execute because CLAM does not offer any operations for investigating blobs, not even for comp_res (an encoded gentype containing a boolean):

.... (myfee1 = fee) = best1_mmh.ESTIMATE("Best_Route")
(myfee2 = fee) = best2_mmh.ESTIMATE("Best_Route")
smaller_ih = math_mmh.INVOKE("Smaller", Arg1 = myfee1, Arg2 = myfee2)
WHILE (smaller_ih.EXAMINE() != DONE) {}
(comp_res = ResultB) = smaller_ih.EXTRACT()
IF (comp_res = TRUE) THEN ....

The repository entries for this example are the following:
METHOD CMathS.Smaller(IN Arg1, IN Arg2, RES ResultB)
PARAM CMathS.ResultB BOOLEAN /*boolean result of a comparison
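
To make the problem concrete, here is a minimal Python sketch of the same INVOKE / EXAMINE / EXTRACT sequence against a mock math helper module. Everything in it is an assumption made for illustration: the class names Gentype and SmallerInvocation, the byte encoding of the payload, and the number of polls before DONE are not taken from CLAM or CPAM. It only shows that the extracted result is an encoded value that the megaprogram cannot test directly.

```python
from dataclasses import dataclass

DONE, NOT_DONE = "DONE", "NOT_DONE"


@dataclass(frozen=True)
class Gentype:
    """Stand-in for an encoded gentype: the megaprogram can route it, not read it."""
    type_name: str
    payload: bytes  # hypothetical encoding, for illustration only


def encode_int(value: int) -> Gentype:
    return Gentype("integer", str(value).encode("ascii"))


def encode_bool(value: bool) -> Gentype:
    return Gentype("boolean", b"1" if value else b"0")


class SmallerInvocation:
    """Mock invocation handle for the Smaller method of the math helper module."""

    def __init__(self, arg1: Gentype, arg2: Gentype, polls_until_done: int = 2):
        self._arg1, self._arg2 = arg1, arg2
        self._polls_left = polls_until_done

    def examine(self) -> str:
        # Report NOT_DONE for a few polls, then DONE.
        if self._polls_left > 0:
            self._polls_left -= 1
            return NOT_DONE
        return DONE

    def extract(self) -> dict:
        # Decoding happens inside the megamodule, never in the megaprogram.
        a = int(self._arg1.payload.decode("ascii"))
        b = int(self._arg2.payload.decode("ascii"))
        return {"ResultB": encode_bool(a < b)}


# Megaprogram side: mirrors the CLAM fragment above.
myfee1, myfee2 = encode_int(95), encode_int(120)
smaller_ih = SmallerInvocation(myfee1, myfee2)
while smaller_ih.examine() != DONE:
    pass
comp_res = smaller_ih.extract()["ResultB"]

# comp_res is an encoded value, so a direct test against True never succeeds,
# even though the encoded boolean is true.
direct_test = (comp_res == True)
```

In this sketch direct_test is false although 95 is smaller than 120, which is the gap the rest of this note tries to close.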

The return values of INVOKE and SETUP are just handles. A handle is meta-data that can neither be used as input to a megamodule method nor can it be used for determining the control flow, so we do not consider handles any further in our discussion here.
Question: Can we imagine any scenarios where this statement is not true, where we have operations over handles?

Where are the problems we have?

Flags returned by EXAMINE are always meta-data and are used directly in boolean expressions in WHILE and IF statements. In contrast, booleans returned by EXTRACT are normally user-data for other megamodules, but sometimes become meta-data that should be used directly in WHILE and IF statements like the flags of EXAMINE (not yet possible).
The testing of flags from EXAMINE and of booleans from EXTRACT in the WHILE and IF statements cannot be delegated to helper megamodules (otherwise we end up with infinite recursion). We can however delegate all operations over all other types of meta-data (e.g. calculating cost functions over data of type datetime or integer received by ESTIMATE) to megamodules.
All data received by EXAMINE or EXTRACT is encoded in gentypes (even if it is of a simple CHAIMS type). This is necessary so that data can be used as input for other methods. Yet there is currently no way for a megaprogram to inspect a gentype, so even checking the value of a gentype containing a boolean is not possible.
We could enlarge CLAM by operations over all simple CHAIMS types (which would automatically include user-data as well as control-data, as neither CLAM nor the CHAIMS architecture allows us to distinguish them in all cases). Yet this contradicts the goal of having a purely compositional language by pushing down arithmetic and IO to megamodules. CLAM would become a special C-language.
Having the compiler convert gentypes appearing in comparisons back to normal booleans also has negative consequences, such as:

We do not have any separation between meta-data and user-data. The goals of CHAIMS so far have been to push down all arithmetic and IO into megamodules, to have ESTIMATE, to allow results of EXTRACT to be used for determining the control flow as well, and to use the same megamodules for all data, meta-data as well as user-data (see section 4 for more about this).

2. First attempt of a solution:

The original objectives (and current implementation) were (among others):

to push down all processing (arithmetic and logic operations, IO) into megamodules
to have exactly one way of communicating with megamodules (the current CPAM protocol), independent of what kind of data is processed (user-data or control-data).
to have opaque user data and some simple CHAIMS types, so that data received from a megamodule can somehow be used for control
clearly separate the user-data view from the composition (and control-data) view

Due to infinite recursion objective 1 cannot be completely fulfilled. Objective 4 and objective 3 are contradictory. Objective 2 requires that user- and control-data is treated in the same way, and that at some point user-data becomes control-data.

Solution one tries to build on above objectives and on the current CHAIMS system and tries to avoid a complete redefinition of CLAM and CPAM and a redesign of the CHAIMS system.

Approach:

CLAM distinguishes between the simple CHAIMS-type boolean (user-data or meta-data implemented as a gentype) and decisionflag (implemented as a normal boolean). A gentype-boolean is what can be returned by an EXTRACT. A decisionflag is the result of applying the operators == and != to a gentype-boolean and a keyword, or to an invocationstatus and a keyword. A decisionflag is also the result of a boolean expression and can be used in IF and WHILE statements.
Invocationstatus as well as decisionflag cannot be used as input parameters for megamodule methods.
TRUE and FALSE are literals for both decisionflag as well as boolean.
The transition between user-data and control-data can only take place by transforming booleans (i.e. gentype-booleans that are returned by EXTRACT) into decisionflags. This transformation automatically takes place whenever we have in CLAM one of the following statements where res is always a gentype-boolean:

(res = result) = h1.EXTRACT()

IF (res == TRUE) ....

... res == FALSE....

....res != TRUE...

... res != FALSE ....

(Compiler: the result of each comparison is a decisionflag.)
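
The transformation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: GentypeBoolean, DecisionFlag and compare_with_literal are hypothetical names, and the one-byte payload encoding is invented here, since the real gentype encoding is not given in this note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GentypeBoolean:
    """Encoded boolean as returned by EXTRACT (encoding assumed: b"1" / b"0")."""
    payload: bytes


@dataclass(frozen=True)
class DecisionFlag:
    """Plain boolean usable only in IF / WHILE, never as a method parameter."""
    value: bool


def compare_with_literal(res: GentypeBoolean, op: str, literal: bool) -> DecisionFlag:
    """Evaluate `res == literal` or `res != literal` and return a decisionflag."""
    if not isinstance(res, GentypeBoolean):
        raise TypeError("left operand must be a gentype-boolean")
    if op not in ("==", "!="):
        raise ValueError(f"unsupported operator: {op}")
    if res.payload not in (b"0", b"1"):
        raise ValueError("payload does not encode a boolean")
    decoded = res.payload == b"1"   # extract the real value first
    equal = decoded == literal      # then perform the comparison
    return DecisionFlag(equal if op == "==" else not equal)


res = GentypeBoolean(b"1")
flag_eq_true = compare_with_literal(res, "==", True)
flag_ne_true = compare_with_literal(res, "!=", True)
```

The point of the separate DecisionFlag type is that the decoded value never flows back into a megamodule call; it exists only to steer IF and WHILE.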

Comparing an invocationstatus with an invocationstatus literal also results in a decisionflag, which can thus be used in WHILE and IF statements, e.g.:

status = h1.EXAMINE()

IF (status == DONE) ....

IF (status == ERROR) .....

WHILE (status != DONE) ....

CLAM allows boolean expressions only over decisionflags and knows the logical operators |, &, -, ==, != (or, and, not, equal, not-equal), e.g.:

WHILE ((stat1 == DONE) | (stat2 == stat5) & (-(stat3 == ERROR) | (stat4 == ERROR))) .....

expr = (res1 == TRUE) & (res2 == FALSE) | expr2

IF (expr) ....
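
As a rough illustration of boolean expressions over decisionflags, the Python sketch below models a decisionflag with the operators | (or), & (and) and unary - (not). The class and the helper status_eq are hypothetical. Operator precedence in CLAM is not stated in this note, so the example is fully parenthesized rather than relying on any precedence rule.

```python
DONE, NOT_DONE, ERROR = "DONE", "NOT_DONE", "ERROR"


class DecisionFlag:
    """Plain boolean that supports only the logical operators |, & and unary -."""

    def __init__(self, value: bool):
        self.value = bool(value)

    def __or__(self, other: "DecisionFlag") -> "DecisionFlag":
        return DecisionFlag(self.value or other.value)

    def __and__(self, other: "DecisionFlag") -> "DecisionFlag":
        return DecisionFlag(self.value and other.value)

    def __neg__(self) -> "DecisionFlag":
        return DecisionFlag(not self.value)

    def __bool__(self) -> bool:
        # Lets the flag be used directly in if / while.
        return self.value


def status_eq(status: str, keyword: str) -> DecisionFlag:
    """Comparison of an invocationstatus with a status keyword."""
    return DecisionFlag(status == keyword)


stat1, stat3, stat4 = NOT_DONE, DONE, ERROR
expr = status_eq(stat1, DONE) | ((-status_eq(stat3, ERROR)) & status_eq(stat4, ERROR))

if expr:
    branch_taken = "then"
else:
    branch_taken = "else"
```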

Consequences of this approach:

All operations are pushed down into megamodules, with the exception of comparisons of invocationstatus and booleans (resulting in decisionflags) and boolean expressions over decisionflags. This also includes operations like comparisons of data of CHAIMS types other than boolean, even if it is meta-data. In order to avoid overly awkward coding, the shortcut INVEX can be used.
The math helper module offers operations (arithmetic as well as logical) over all simple CHAIMS types, i.e. also over gentype-booleans. Because it is a megamodule, the operands as well as the results are all encoded gentypes and can thus be used as input for other methods. The only results that can be used for determining the control flow are booleans, as they can be transformed into decisionflags in the way described above.
A decisionflag cannot be used as an input parameter for a megamodule. I.e., the comparison operators and boolean expressions offered by CLAM are at the end of any data processing chain and their result can only be used in IF and WHILE statements in order to determine the control flow. Boolean operators cannot be misused for operations on user-data; CLAM remains a nearly purely compositional language, even concerning operations over meta-data that is later on used for determining the control flow.

Results of ESTIMATE cannot be directly compared or tested in CLAM; a math helper module must be used for that. It returns gentype-booleans which can be transformed into decisionflags. We thus preserve the composition-only character and the small set of primitives in CLAM.
Data format in CPAM: all meta- and user-data is encoded into gentypes, apart from the invocationstatus returned by EXAMINE.
Compiler: |, &, - have always decisionflags as operands, so the generated C-statements are the same as the CLAM-statements. If == and != appear between two variable names, then the variables are decisionflags. If == and != appear between a variable name and one of the keywords ERROR, DONE, NOT_DONE, then the first operand is an invocationstatus (e.g. an integer). If == and != appear between a variable name and one of the keywords TRUE, FALSE, then the first operand is either a decisionflag or a gentype-boolean. In case of a gentype-boolean the operand has to be converted into a normal boolean in the generated C-code.
Data-view, composition-view: Apart from the exception of boolean results (which can become decisionflags), there remains a strict separation between data-view and composition-view.
Type checking: at best a graceful abort at execution time.
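
The compiler rule for == and != in the list above can be sketched as a small dispatch on the right-hand operand. This is a hedged illustration, not the CHAIMS compiler: emit_comparison, the var_types symbol table and the C helper name gentype_to_bool are all hypothetical, and raising an error at compile time is an assumption of the sketch (the note itself only expects an abort at execution time).

```python
STATUS_KEYWORDS = {"ERROR", "DONE", "NOT_DONE"}
BOOLEAN_KEYWORDS = {"TRUE", "FALSE"}


def emit_comparison(lhs: str, op: str, rhs: str, var_types: dict) -> str:
    """Return the C expression generated for the CLAM comparison `lhs op rhs`."""
    if op not in ("==", "!="):
        raise ValueError(f"unsupported operator: {op}")
    lhs_type = var_types.get(lhs)

    if rhs in STATUS_KEYWORDS:
        # Keyword ERROR / DONE / NOT_DONE: first operand is an invocationstatus.
        if lhs_type != "invocationstatus":
            raise TypeError(f"{lhs} is not an invocationstatus")
        return f"({lhs} {op} {rhs})"

    if rhs in BOOLEAN_KEYWORDS:
        # Keyword TRUE / FALSE: first operand is a decisionflag or a gentype-boolean.
        if lhs_type == "decisionflag":
            return f"({lhs} {op} {rhs})"
        if lhs_type == "gentype-boolean":
            # Extract the real value before comparing (helper name is assumed).
            return f"(gentype_to_bool({lhs}) {op} {rhs})"
        raise TypeError(f"{lhs} is neither a decisionflag nor a gentype-boolean")

    # Two variable names: both must be decisionflags.
    if lhs_type == "decisionflag" and var_types.get(rhs) == "decisionflag":
        return f"({lhs} {op} {rhs})"
    raise TypeError(f"cannot compare {lhs} with {rhs}")


var_types = {
    "status": "invocationstatus",
    "res": "gentype-boolean",
    "expr": "decisionflag",
    "expr2": "decisionflag",
}
c_status = emit_comparison("status", "!=", "DONE", var_types)
c_gentype = emit_comparison("res", "==", "TRUE", var_types)
c_flags = emit_comparison("expr", "==", "expr2", var_types)
```

Only the gentype-boolean case changes the emitted text; the other cases pass the CLAM expression through unchanged, as the rule describes.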

CHAIMS focuses on composition, with CLAM as a composition-only language. By pushing down all operations we get a language and a system that executes composition of processes over user-data as well as composition of processes over meta-data/control-data. Various megamodules, e.g. math helper module and IO-megamodule can be used for any kind of data. Example IO-megamodule:
- A user gets asked for the cities between which he wants to travel (input and output is user-data).
- A user gets presented three possible ways to go from A to B and decides which one he wants to make a reservation for (input user-data, output meta-data or user-data depending on megamodules).
- A user has to decide if he is satisfied with the routes found so far, or if the program should continue (input user-data, output meta-data).
- A user has to decide - based on meta-data received by ESTIMATE - which megamodule he prefers (input meta-data, output meta-data).

Variant to simplify compiler

To spare the compiler from having to determine whether res in a statement like (res==TRUE) is a decisionflag or a gentype-boolean, we could require that we write (res==TRUE) for gentype-booleans, and just res for decisionflags.

3. Other solutions?

Does anybody have other solutions?

4. What is "purely compositional"?

or traditional versus no arithmetic/IO versus separation of user- and meta-data

Having a procedural program that composes software always means having a flow of control. In this flow of control decisions must be made, based on data. In a traditional language no difference between control- and user-data is made - all data returned by components can be used for determining the flow of control, and the same operations apply for any data. In fact, what in a function A is user-data may be control-data in a function B called by A, and vice versa.
Question: What does the situation look like for functional or logic programming?

In a purely compositional language we essentially have two possibilities:

No functions for arithmetic and IO at all: approach taken by CHAIMS, with the exception of boolean expressions and comparison for invocationstatus and booleans via decisionflags (see approach suggested in section 2). This is necessary, because we use the same megamodules for IO and processing for control- as well as user-data. We allow the transformation of booleans into decisionflags. Decisionflags are only used for boolean expressions in IF and WHILE statements, so they never need to be transformed back into booleans that could be used as input parameters for megamodule methods.
Clear separation of user-data and control-data: IO and arithmetic for control/meta-data is provided for within the language:

either directly by extending the language with operations over all types of control-data
or with a separate set of modules that never ever process user data and may be invoked totally differently.

Question: How is the separation between user-data and meta/control-data solved in other coordination languages?

User-data becomes control-data:

Even in a purely compositional language, at some point user-data has to influence the control flow. In CHAIMS we solved that by transforming booleans into decisionflags. Another possibility would be to introduce special megamodules with methods that take as input parameters user-data and return as output parameters control-data.
Question: Would we also need the transformation from control-data into user-data?

5. CHAIMS-types and type checking

In CHAIMS we have defined CHAIMS types: integer, string, boolean, datetime, opaque. There are two reasons for that:

The purpose of these types is that a megaprogrammer can use the helper math module for arithmetic over user-data of simple type (integer, string, boolean, datetime). Without knowing from the repository (where we specify the CHAIMS type of each parameter of a megamodule method) that certain user-data is of a specific simple type, it would not be possible to use a general math megamodule provided with the CHAIMS system. Instead, each megamodule provider would have to provide math megamodules for all the parameter names that a megaprogrammer would want to do general arithmetic with, or would have to provide SYNONYM statements in the repository for all parameters that could be used in the general helper math megamodule.
Having simple CHAIMS types allows that user data of simple CHAIMS-types becomes control-data and that computation on that control data can take place (e.g. calculating cost functions based on data received from megamodules with EXTRACT).
Having simple CHAIMS-types also allows that we compute on meta-data from ESTIMATE in the same way as on other user- or control-data received by EXTRACT. Having no simple CHAIMS-types would require a split between user-data and meta-data in CLAM and the whole CHAIMS system as outlined above.

Question: Do we need all the basic CHAIMS-types we have defined so far? (The set of basic CHAIMS-types can be, and already is, a subset of the set of simple types found in gentypes.)

Nasty: We have here really a clash between two paradigms (naming with implicit typing versus explicit typing). On the level of CLAM, most data handled by CHAIMS, i.e. all user-data of CHAIMS-type opaque, is opaque and its type is given implicitly by the name of the data. Correct programming and type checking is only possible indirectly over the name of the parameter: if two methods of the same megamodule have parameters with the same name, we know that these parameters are of the same type, though we do not know what the type is (parameter names are specified in the repository and used in the megaprogram). The same is true for parameters that are specified in the repository to be synonyms: these parameters are of the same type (Examples: SYNONYM BestRouteMM.Routes = RouteInfoMM.Routes, SYNONYM BestRouteMM.TravelCost1 = AirMM.TravelCost). So far so good. With the introduction of a general math-megamodule and of meta-data we break this principle: the general math-megamodule offers operations not just for a few parameters of specific name, but for all parameters of specific types. In order to avoid a long list of SYNONYM statements in the repository, we switch over to specifying the types explicitly for all parameters that could be used as input to the math-megamodule. For ESTIMATE the situation is similar: instead of listing SYNONYM statements for all methods of all megamodules (especially all helper math megamodules) that could take fee, time or datavolume as input parameter, we specify explicitly their types.

Question: This is really nasty. Should we not just get rid of basic CHAIMS-types, and use the concept of implicit types via names everywhere? The cost would be:

More SYNONYMS in the CHAIMS repository (not too bad for just one helper math module, worse if more generally usable modules are added either by us or providers).
We require (what we decided some weeks ago anyway) that the order of parameters with the same name matters.

CMathS.Sub(IN int, IN int, RES int)

(myres=int)=math.INVEX ("Sub", int=oper1, int=4)

myres = oper1 - 4

SYNONYM statements also for the data received by ESTIMATE
We would have to either introduce a second invocation primitive into the protocol (a primitive DECISION that works similarly to INVOKE but returns a decisionflag that can be used in boolean expressions in IF and WHILE), or introduce a special SYNONYM to decisionflag.
- user-data could only become control-data via either SYNONYM statements in the repository or special megamodules that return a decisionflag

Type checking

So far, we do not do any type checking. We could do some type checking even for opaque types by taking into account the SYNONYM statements in the repository. Type checking is not easily possible for megamodules like the general IO-megamodule: input and output parameters of the method Ask can be anything, the type of the output parameter is only determined by the type of the input parameter, not by any SYNONYM statements (which would encompass all parameter names anyway).
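
One way the SYNONYM-based checking suggested above might work is to treat each SYNONYM statement as merging two parameter names into one equivalence class, and to call two parameters compatible when they fall into the same class. The Python sketch below uses a small union-find structure for that. The class SynonymTypes and its methods are hypothetical; the parameter names are the SYNONYM examples quoted earlier in this section. Keying parameters as Megamodule.Parameter follows the rule that equally named parameters within one megamodule share a type.

```python
class SynonymTypes:
    """Equivalence classes of parameter names, built from SYNONYM statements."""

    def __init__(self):
        self._parent = {}

    def _find(self, name: str) -> str:
        self._parent.setdefault(name, name)
        root = name
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited name directly at the root.
        while self._parent[name] != root:
            self._parent[name], name = root, self._parent[name]
        return root

    def add_synonym(self, left: str, right: str) -> None:
        """Record `SYNONYM left = right`."""
        self._parent[self._find(left)] = self._find(right)

    def compatible(self, a: str, b: str) -> bool:
        """True if a and b are known to be of the same (implicit) type."""
        return self._find(a) == self._find(b)


types = SynonymTypes()
types.add_synonym("BestRouteMM.Routes", "RouteInfoMM.Routes")
types.add_synonym("BestRouteMM.TravelCost1", "AirMM.TravelCost")

routes_ok = types.compatible("RouteInfoMM.Routes", "BestRouteMM.Routes")
mixed_ok = types.compatible("BestRouteMM.Routes", "AirMM.TravelCost")
```

As noted above, a check like this does not help for a method such as Ask, whose output type depends on its input rather than on any SYNONYM statement.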