Dataflows in CHAIMS

Draft 1, 11/2/98

What is user-data and what is meta-data/control-data in CLAM?
Can user-data become control-data?
Type checking?
Comparison and arithmetic over user-data in CLAM?
Format of user-data and control-data in CPAM?


 

Decisions (Dec 1):

1.     User-data and meta-data in CLAM

User-data and meta-data

CLAM is a composition language. Arithmetic and IO-functions are pushed down into megamodules. A megaprogram written in CLAM merely composes the methods offered by megamodules. Composition in CLAM consists of two aspects:
  1. invoking methods and routing data from one megamodule to another one (setting up connection, setting parameters, extracting results, ...)
  2. determining the flow of control (while and if statements)
The data that flows between megamodules and the megaprogram falls into three categories:
  1. Meta-data returned by EXAMINE and ESTIMATE: this data is used only for determining the flow of control (we could also call it control-data).
    Question: Is the meta-data from EXAMINE and from ESTIMATE really of the same quality?
    No: EXAMINE just gives back some flags; the most they can be used for is comparison and boolean expressions in the IF and WHILE statements (CPAM: no gentypes, just an enumeration type). ESTIMATE gives back "real" data of which we know the type, namely integer for fee, datetime for duration, integer for datavolume (CPAM: gentypes). As CHAIMS itself does not offer any arithmetic or IO, this data might be routed on to megamodules for display and requests, for complex comparisons and for arithmetic (helper modules provided by the CHAIMS-system as well as specific megamodules provided with a suite of megamodules).
  2. Data returned by EXTRACT that is not used for determining the flow of control. This data is routed on to other megamodules (user-data).
  3. Data returned by EXTRACT that is used for control flow and that, like data from ESTIMATE, may be routed on to helper modules that obtain the necessary control decisions from the user, or to helper modules for further arithmetic and decisions (user-data becoming meta-data). The point at which such user-data becomes meta-data/control-data varies; it depends on when a megaprogrammer perceives it as user-data and when as meta-data.
The separation into meta-data and user-data is fuzzy and also depends on the point of view: data considered meta-data in the megaprogram (e.g. results from ESTIMATE) is user-data from the point of view of a megamodule doing IO or computation on it (e.g. calculating a cost function)!

Tricky: Results from helper modules (even those that make decisions and comparisons) are also obtained with EXTRACT (e.g. a boolean), in exactly the same way as results from any other megamodule. To avoid infinite recursion we have to allow some way of testing user-data that comes from an EXTRACT within the CHAIMS language itself.

Example of a piece of code that we cannot yet execute because CLAM does not offer any operations for investigating blobs, not even for comp_res (an encoded gentype containing a boolean):

....  (myfee1 = fee) = best1_mmh.ESTIMATE("Best_Route")
(myfee2 = fee) = best2_mmh.ESTIMATE("Best_Route")
smaller_ih = math_mmh.INVOKE("Smaller", Arg1 = myfee1, Arg2 = myfee2)
WHILE (smaller_ih.EXAMINE() != DONE) {}
(comp_res = ResultB) = smaller_ih.EXTRACT()
IF (comp_res = TRUE) THEN ....

The repository entries for this example are the following:
            METHOD CMathS.Smaller(IN Arg1, IN Arg2, RES ResultB)
            PARAM CMathS.ResultB BOOLEAN  /*boolean result of a comparison
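
To make the control problem concrete, the fragment above can be mimicked outside CLAM. The following Python sketch is not CHAIMS code: the classes Invocation and MathHelper, the Status values and the three-poll delay are all hypothetical stand-ins for the primitives INVOKE, EXAMINE and EXTRACT, and comp_res is a plain Python boolean rather than an encoded gentype. It is meant only to show that the final test on comp_res is an operation the megaprogram itself has to perform.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical invocation states reported by EXAMINE."""
    NOT_DONE = 0
    DONE = 1


class Invocation:
    """Stand-in for an invocation handle such as smaller_ih."""

    def __init__(self, result, polls_needed=3):
        self._result = result
        self._polls_left = polls_needed

    def examine(self):
        # Each poll brings the simulated computation one step closer to DONE.
        if self._polls_left > 0:
            self._polls_left -= 1
            return Status.NOT_DONE
        return Status.DONE

    def extract(self):
        # Results are returned by name, as in (comp_res = ResultB) = ...EXTRACT()
        return {"ResultB": self._result}


class MathHelper:
    """Stand-in for the helper megamodule behind math_mmh."""

    def invoke(self, method, **args):
        if method != "Smaller":
            raise ValueError(f"unknown method {method!r}")
        return Invocation(args["Arg1"] < args["Arg2"])


def choose_cheaper(myfee1, myfee2):
    smaller_ih = MathHelper().invoke("Smaller", Arg1=myfee1, Arg2=myfee2)
    while smaller_ih.examine() != Status.DONE:
        pass  # busy-wait, mirroring WHILE (...) {}
    comp_res = smaller_ih.extract()["ResultB"]
    # This test is the step CLAM cannot yet express for an extracted value.
    return "best1" if comp_res is True else "best2"
```

Pushing the last comparison down into yet another helper module would only return another extracted value that has to be tested in turn, which is the recursion described above.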

The return values of INVOKE and SETUP are just handles. A handle is meta-data that can be used neither as input to a megamodule method nor for determining the control flow, so we do not consider handles any further in this discussion.
Question: Can we imagine any scenarios where this statement is not true, where we have operations over handles?
 

Where are the problems we have?

2.    First attempt at a solution:

The original objectives (and current implementation) were (among others):
  1. to push down all processing (arithmetic and logic operations, IO) into megamodules
  2. to have exactly one way of communicating with megamodules (the current CPAM protocol), independent of the kind of data processed (user-data or control-data)
  3. to have opaque user-data and some simple CHAIMS types so that data received from a megamodule can be used for control
  4. to clearly separate the user-data view from the composition (and control-data) view
Because of the infinite recursion, objective 1 cannot be completely fulfilled. Objectives 3 and 4 contradict each other. Objective 2 requires that user-data and control-data are treated in the same way, and that at some point user-data becomes control-data.

Solution one builds on the above objectives and on the current CHAIMS system, and tries to avoid a complete redefinition of CLAM and CPAM and a redesign of the CHAIMS system.

Approach:

Consequences of this approach:

CHAIMS focuses on composition, with CLAM as a composition-only language. By pushing down all operations we get a language and a system that execute composition of processes over user-data as well as composition of processes over meta-data/control-data. Various megamodules, e.g. the math helper module and the IO-megamodule, can be used for any kind of data. Example IO-megamodule:
- A user gets asked for the cities between which he wants to travel (input and output is user-data).
- A user gets presented three possible ways to go from A to B and decides which one he wants to make a reservation for (input user-data, output meta-data or user-data depending on megamodules).
- A user has to decide if he is satisfied with the routes found so far, or if the program should continue (input user-data, output meta-data).
- A user has to decide - based on meta-data received by ESTIMATE - which megamodule he prefers (input meta-data, output meta-data).

Variant to simplify compiler

To spare the compiler from having to determine whether, in a statement like (res==TRUE), res is a decisionflag or a gentype-boolean, we could require that one writes (res==TRUE) for gentype-booleans and just res for decisionflags.
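
Under this variant the distinction becomes purely syntactic, so a compiler could decide it without consulting any type information. The Python sketch below illustrates the idea only; the function name classify_condition and the two regular expressions are hypothetical and cover just the two condition forms mentioned in this section, not the CLAM grammar.

```python
import re

# Form "(res==TRUE)" or "(res==FALSE)": an explicit comparison.
_COMPARISON = re.compile(r"^\(?\s*(\w+)\s*==\s*(TRUE|FALSE)\s*\)?$")
# Form "res" or "(res)": a bare name.
_BARE_NAME = re.compile(r"^\(?\s*(\w+)\s*\)?$")


def classify_condition(cond):
    """Return (kind, variable) for a condition, judged by its shape alone."""
    text = cond.strip()
    match = _COMPARISON.match(text)
    if match:
        return ("gentype-boolean", match.group(1))
    match = _BARE_NAME.match(text)
    if match:
        return ("decisionflag", match.group(1))
    raise ValueError(f"unsupported condition: {cond!r}")
```

For example, classify_condition("(res==TRUE)") yields ("gentype-boolean", "res"), while classify_condition("res") yields ("decisionflag", "res").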

3.    Other solutions?

Does anybody have other solutions?
 
 
 
 

4.    What is "purely compositional"?

or traditional  versus  no arithmetic/IO versus separation of user- and meta-data
Having a procedural program that composes software always means having a flow of control. In this flow of control decisions must be made, based on data. In a traditional language no difference between control- and user-data is made - all data returned by components can be used for determining the flow of control, and the same operations apply for any data. In fact, what in a function A is user-data may be control-data in a function B called by A, and vice versa.
Question: What does the situation look like for functional or logic programming?

In a purely compositional language we essentially have two possibilities:

  1. No functions for arithmetic and IO at all: this is the approach taken by CHAIMS, with the exception of boolean expressions and comparisons for invocationstatus and for booleans via decisionflags (see approach suggested in section 2). This is necessary because we use the same megamodules for IO and processing of control-data as well as user-data. We allow the transformation of booleans into decisionflags. Decisionflags are used only for boolean expressions in IF and WHILE statements, so they never need to be transformed back into booleans that could be used as input parameters for megamodule methods.
  2. Clear separation of user-data and control-data: IO and arithmetic for control/meta-data is provided for within the language:
    1. either directly by extending the language with operations over all types of control-data
    2. or with a separate set of modules that never ever process user-data and may be invoked totally differently.
    User-megamodule methods may have a special set of input parameters that contain only control-data (i.e. this data comes from the composition system and never from another user megamodule). When user-data has to influence the flow of control, a separate set of megamodules is required that takes user-data as input and returns control-data. This approach would require three IO-megamodules (or tripling the methods of our IO-megamodule), and either additional helper math modules or not providing general math for user-data, etc. Workflow management systems typically choose such an approach, though they of course combine the various interfaces in a problem-specific manner. In CHAIMS we do not strictly separate user-data and meta-data.


Question: How is the separation between user-data and meta/control-data solved in other coordination languages?

User-data becomes control-data:

Even in a purely compositional language, at some point user-data has to influence the control flow. In CHAIMS we solved that by transforming booleans into decisionflags. Another possibility would be to introduce special megamodules with methods that take as input parameters user-data and return as output parameters control-data.
Question: Would we also need the transformation from control-data into user-data?
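
The one-way nature of the boolean-to-decisionflag transformation can be sketched as follows. This is an illustration under assumptions, not the CHAIMS implementation: the class DecisionFlag, the functions to_decisionflag and invoke, and the dictionary layout used for an encoded gentype are all hypothetical.

```python
class DecisionFlag:
    """Hypothetical control-only value; usable in conditions, nowhere else."""

    def __init__(self, value):
        self._value = bool(value)

    def __bool__(self):
        return self._value


def to_decisionflag(gentype):
    """One-way transformation from an extracted boolean to a decisionflag.

    The dict layout {"type": ..., "value": ...} is an assumed stand-in for
    an encoded gentype; the real encoding is not specified here.
    """
    if gentype.get("type") != "boolean":
        raise TypeError("only gentype-booleans can become decisionflags")
    return DecisionFlag(gentype["value"])


def invoke(method, **params):
    """Stand-in for INVOKE that refuses control-data as a parameter."""
    for name, value in params.items():
        if isinstance(value, DecisionFlag):
            raise TypeError(f"{name}: a decisionflag cannot be routed to a megamodule")
    return (method, params)  # a real system would return an invocation handle


comp_res = {"type": "boolean", "value": True}  # as if obtained via EXTRACT
flag = to_decisionflag(comp_res)
branch = "then" if flag else "else"
```

In this sketch there is deliberately no function that turns a DecisionFlag back into a gentype, which corresponds to the case where the answer to the question above is "no".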
 

5.    CHAIMS-types and type checking

In CHAIMS we have defined CHAIMS types: integer, string, boolean, datetime, opaque. There are three reasons for that:
  1. These types let a megaprogrammer use the helper math module for arithmetic over user-data of simple type (integer, string, boolean, datetime). The repository specifies the CHAIMS type of each parameter of a megamodule method. Without knowing from the repository that certain user-data is of a specific simple type, it would not be possible to use a general math megamodule provided with the CHAIMS system: either each megamodule provider would have to provide math megamodules for all the parameter names that a megaprogrammer might want to do general arithmetic with, or he would have to provide SYNONYM statements in the repository for all parameters that could be used in the general helper math megamodule.
  2. Having simple CHAIMS types allows that user data of simple CHAIMS-types becomes control-data and that computation on that control data can take place (e.g. calculating cost functions based on data received from megamodules with EXTRACT).
  3. Having simple CHAIMS-types also allows that we compute on meta-data from ESTIMATE in the same way as on other user- or control-data received by EXTRACT. Having no simple CHAIMS-types would require a split between user-data and meta-data in CLAM and the whole CHAIMS system as outlined above.
Question: Do we need all the basic CHAIMS-types we have defined so far? (The set of basic CHAIMS-types can be, and already is, a subset of the set of simple types found in gentypes).

Nasty: We really have a clash here between two paradigms (naming with implicit typing versus explicit typing). On the level of CLAM, most data handled by CHAIMS, i.e. all user-data of CHAIMS-type opaque, is opaque and its type is given implicitly by the name of the data. Correct programming and type checking are possible only indirectly, via the name of the parameter: if two methods of the same megamodule have parameters with the same name, we know that these parameters are of the same type, though we do not know what the type is (parameter names are specified in the repository and used in the megaprogram). The same is true for parameters that are specified in the repository to be synonyms: these parameters are of the same type (Examples: SYNONYM BestRouteMM.Routes = RouteInfoMM.Routes, SYNONYM BestRouteMM.TravelCost1 = AirMM.TravelCost). So far so good. With the introduction of a general math-megamodule and of meta-data we break this principle: the general math-megamodule offers operations not just for a few parameters with specific names, but for all parameters of specific types. In order to avoid a long list of SYNONYM statements in the repository, we switch over to specifying the types explicitly for all parameters that could be used as input to the math-megamodule. For ESTIMATE the situation is similar: instead of listing SYNONYM statements for all methods of all megamodules (especially all helper math megamodules) that could take fee, time or datavolume as input parameter, we specify their types explicitly.

Question: This is really nasty. Should we not just get rid of basic CHAIMS-types and use the concept of implicit types via names everywhere? The cost would be:

Type checking

So far, we do not do any type checking. We could do some type checking even for opaque types by taking into account the SYNONYM statements in the repository. Type checking is not easily possible for megamodules like the general IO-megamodule: the input and output parameters of the method Ask can be anything; the type of the output parameter is determined only by the type of the input parameter, not by any SYNONYM statements (which would encompass all parameter names anyway).
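
One way such SYNONYM-based checking for opaque types could work is to treat the SYNONYM statements as an equivalence relation over qualified parameter names. The Python sketch below is a possible approach, not an existing CHAIMS component: the class SynonymTable and its methods are hypothetical, and it assumes that synonymy is symmetric and transitive, which this note does not state explicitly.

```python
class SynonymTable:
    """Groups qualified parameter names (e.g. "AirMM.TravelCost") into
    equivalence classes, using a small union-find structure."""

    def __init__(self):
        self._parent = {}

    def _find(self, name):
        # Unknown names start out as their own class.
        self._parent.setdefault(name, name)
        root = name
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited name directly at the root.
        while self._parent[name] != root:
            self._parent[name], name = root, self._parent[name]
        return root

    def add_synonym(self, a, b):
        """Record one repository statement: SYNONYM a = b."""
        self._parent[self._find(a)] = self._find(b)

    def same_type(self, a, b):
        """True if a and b are known to share their (unknown) opaque type."""
        return a == b or self._find(a) == self._find(b)


table = SynonymTable()
table.add_synonym("BestRouteMM.Routes", "RouteInfoMM.Routes")
table.add_synonym("BestRouteMM.TravelCost1", "AirMM.TravelCost")
```

With these two entries, table.same_type("RouteInfoMM.Routes", "BestRouteMM.Routes") holds, whereas a megaprogram that routes BestRouteMM.Routes into AirMM.TravelCost would be flagged. A method like Ask of the IO-megamodule falls outside this scheme, for the reason given above.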