Dataflows in CHAIMS
Draft 1, 11/2/98
What is user-data and what is meta-data/control-data in CLAM?
Can user-data become control-data?
Type checking?
Comparison and arithmetic over user-data in CLAM?
Format of user-data and control-data in CPAM?
Decisions (Dec 1):
-
We keep the current type system.
-
User data can become control data, and vice versa.
-
CLAM supports boolean expressions.
-
CLAM allows comparison between all simple CHAIMS types. For the CHAIMS
compiler this means that it has to figure out if a value is in gentype
format or not, and if it is in gentype format, the compiler first has to
generate the code for extracting the real value before it can generate
the comparison statement.
1. User-data and meta-data in CLAM
User-data and meta-data
CLAM is a composition language. Arithmetic and IO-functions are pushed
down into megamodules. A megaprogram written in CLAM merely composes the
methods offered by megamodules. Composition in CLAM consists of two aspects:
-
invoking methods and routing data from one megamodule to another
one (setting up connection, setting parameters, extracting results, ...)
-
determining the flow of control (while and if statements)
The data that flows between megamodules and the megaprogram falls into
three categories:
-
Meta-data returned by EXAMINE and ESTIMATE: this data is only
used for determining the flow of control (we could also call it control-data).
Question: Is the meta-data from EXAMINE and from ESTIMATE
really of the same quality?
No: EXAMINE just gives back some flags; at most they can be used for
comparison and boolean expressions in the IF and WHILE statements
(CPAM: no gentypes, just an enumeration type). ESTIMATE gives back "real"
data of which we know the type, namely integer for fee, datetime for duration,
integer for datavolume (CPAM: gentypes). As CHAIMS itself does not offer
any arithmetic or IO, this data might be routed on to megamodules for display
and requests, for complex comparisons and for arithmetic (helper modules
provided by the CHAIMS-system as well as specific megamodules provided
with a suite of megamodules).
-
Data returned by EXTRACT not used for determining the flow
of control. This data is routed on to other megamodules (user-data).
-
Data returned by EXTRACT that is used for control flow, and like
data from ESTIMATE may be routed on to helper modules
that get necessary control decisions from the user or to helper modules
for further arithmetic and decisions (user-data becoming meta-data).
The point at which such user-data becomes meta-data/control-data varies: it
depends on when a megaprogrammer perceives it as user-data and when as
meta-data.
The separation into meta-data and user-data is fuzzy, and also depends
on the point of view: data considered meta-data in the megaprogram (e.g.
results from ESTIMATE) is user-data from the point of view of a megamodule
doing IO or computation on it (e.g. calculating cost function)!
Tricky: Results from helper modules (even the ones making
decisions and comparisons) are also obtained with EXTRACT (e.g. a boolean),
in exactly the same way as results from any other megamodule. To avoid
infinite recursion, the CHAIMS language itself must somehow allow testing
of user-data that comes from an EXTRACT.
Example of a piece of code that we cannot yet execute because
CLAM does not offer any operations for inspecting blobs, not even for
comp_res (an encoded gentype containing a boolean):
.... (myfee1 = fee) = best1_mmh.ESTIMATE("Best_Route")
(myfee2 = fee) = best2_mmh.ESTIMATE("Best_Route")
smaller_ih = math_mmh.INVOKE("Smaller", Arg1 = myfee1,
Arg2 = myfee2)
WHILE (smaller_ih.EXAMINE() != DONE) {}
(comp_res = ResultB) = smaller_ih.EXTRACT()
IF (comp_res = TRUE) THEN ....
The repository entries for this example are the following:
METHOD CMathS.Smaller(IN Arg1, IN Arg2, RES ResultB)
PARAM CMathS.ResultB BOOLEAN /* boolean result of a comparison */
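To make the obstacle concrete, the following Python sketch models a gentype as an opaque encoded blob. The encoding used here (a one-byte type tag followed by a one-byte payload) is purely hypothetical and only stands in for the real gentype format, which this note does not specify; encode_gentype_boolean and decode_gentype_boolean are illustrative names, not part of CHAIMS. The sketch shows that comparing the blob itself with TRUE does not test the encoded value, whereas an explicit extraction step does.

```python
# Hypothetical stand-in for a gentype: a type tag byte followed by the payload.
# The real gentype encoding is not described in this note.
BOOLEAN_TAG = b"B"


def encode_gentype_boolean(value: bool) -> bytes:
    """Encode a boolean the way a megamodule might hand it back via EXTRACT."""
    return BOOLEAN_TAG + (b"\x01" if value else b"\x00")


def decode_gentype_boolean(blob: bytes) -> bool:
    """Extract the real value; reject blobs that do not hold a boolean."""
    if len(blob) != 2 or blob[:1] != BOOLEAN_TAG:
        raise TypeError("blob does not contain a gentype-boolean")
    return blob[1:] == b"\x01"


# comp_res as it would arrive from smaller_ih.EXTRACT()
comp_res = encode_gentype_boolean(True)

# Comparing the blob itself with TRUE says nothing about the encoded value.
naive_test = (comp_res == True)

# Only after extraction can the value steer an IF statement.
extracted_test = decode_gentype_boolean(comp_res)
```

Under these assumptions the naive comparison is always false, regardless of the encoded value, which is why the megaprogram cannot test comp_res without some extraction step.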
The return values of INVOKE and SETUP are just handles.
A handle is meta-data that can neither be used as input to a megamodule
method nor can it be used for determining the control flow, so we
do not consider handles any further in our discussion here.
Question: Can we imagine any scenarios where this statement
is not true, where we have operations over handles?
Where are the problems we have?
-
Flags returned by EXAMINE are always meta-data, and are used directly
in boolean expressions in WHILE and IF statements. In contrast to
that, booleans returned by EXTRACT are normally user data for other
megamodules, but sometimes become meta-data that should be used directly
in WHILE and IF statements like the flags of EXAMINE (not yet possible).
-
The testing of flags from EXAMINE and of booleans from EXTRACT in the WHILE
and IF statements cannot be delegated to helper megamodules (otherwise
we end up with infinite recursions). We can however delegate all
operations over all other types of meta-data (e.g. calculating cost
functions over data of type datetime or integer received from ESTIMATE)
to megamodules.
-
All data received by EXAMINE or EXTRACT is encoded in gentypes (even if
it is of a simple CHAIMS type). This is necessary so that data can be used
as input for other methods. Yet there is currently no way for a megaprogram
to inspect a gentype, so even checking the value of a gentype containing
a boolean is not possible.
-
We could extend CLAM with operations over all simple CHAIMS types (which
would automatically include user-data as well as control-data, as neither
CLAM nor the CHAIMS architecture allows us to distinguish them in all cases).
Yet this contradicts the goal of having a purely compositional
language that pushes arithmetic and IO down to megamodules. CLAM would
become a special C-language.
-
Having the compiler convert gentypes that appear in comparisons
back to normal booleans also has negative consequences, such as:
- How does the compiler find out when a variable
name contains a gentype-boolean and when a normal boolean?
- No type checking is possible. For example, the IO-module
declares the result of the ask method as opaque; the actual
type of the result is determined by the default value given as parameter
when invoking ask. Yet the ask method is needed for the control flow, to
ask the user what he wants to do.
-
We do not have any separation between meta-data and user-data. The goals
of CHAIMS so far have been to push down all arithmetic and IO into megamodules,
to have ESTIMATE, to allow results of EXTRACT to be used as well for determining
the control flow, and to use the same megamodules for all data, meta-data
as well as user-data (see section 4 for more about this).
2. First attempt of a solution:
The original objectives (and current implementation) were (among others):
-
to push down all processing (arithmetic and logic operations, IO) into
megamodules
-
to have exactly one way of communicating with megamodules (the current CPAM
protocol), independent of what kind of data is processed (user-data or
control-data).
-
to have opaque user data and some simple CHAIMS types, so that data
received from a megamodule can somehow be used for control
-
to clearly separate the user-data view from the composition (and control-data)
view
Because of the infinite recursion, objective 1 cannot be completely fulfilled.
Objective 4 and objective 3 are contradictory. Objective 2 requires that user-
and control-data are treated in the same way, and that at some point user-data
becomes control-data.
Solution one tries to build on the above objectives and on the current CHAIMS
system, and tries to avoid a complete redefinition of CLAM and CPAM and
a redesign of the CHAIMS system.
Approach:
-
CLAM distinguishes between the simple CHAIMS-type boolean
(user-data or meta-data implemented as a gentype) and decisionflag
(implemented as a normal boolean). A gentype-boolean is what can be returned
by an EXTRACT. A decisionflag is the result of applying the operators ==
and != either to a gentype-boolean and a keyword, or to an
invocationstatus and a keyword. A decisionflag is also the result of a
boolean expression and can be used in IF and WHILE statements.
-
Invocationstatus as well as decisionflag cannot
be used as input parameters for megamodule methods.
-
TRUE and FALSE are literals for both decisionflag as well as boolean.
-
The transition between user-data and control-data can only take place by
transforming booleans (i.e. gentype-booleans that are returned by EXTRACT)
into decisionflags.
This transformation automatically takes place whenever we have in CLAM
one of the following statements, where res is always a gentype-boolean:
(res = result) = h1.EXTRACT()
IF (res == TRUE) ....
... res == FALSE ....
.... res != TRUE ...
... res != FALSE ....
In all these cases where res is a gentype-boolean the compiler
transforms the gentype-boolean into a decisionflag (normal boolean in C)
and the result of the comparison is a decisionflag.
-
Also the result of comparing an invocationstatus with an invocationstatus
literal results in a decisionflag, and can thus be used in WHILE and IF
statements, e.g.:
status = h1.EXAMINE()
IF (status == DONE) ....
IF (status == ERROR) .....
WHILE (status != DONE) ....
-
CLAM allows boolean expressions only over decisionflags and knows
the logical operators |, &, -, ==, != (or, and, not, equal, not-equal),
e.g.:
WHILE ((stat1 == DONE) | (stat2 == stat5) & (-(stat3 == ERROR) | (stat4 == ERROR))) .....
expr = (res1 == TRUE) & (res2 == FALSE) | expr2
IF (expr) ....
The operands are decisionflags, and the result is again a decisionflag.
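The approach above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class names GentypeBoolean, DecisionFlag and InvocationStatus, the helper functions compare and check_megamodule_input, and the one-byte boolean encoding are all hypothetical and are not taken from the CHAIMS implementation. The sketch shows the implicit transformation of a gentype-boolean into a decisionflag during a comparison, boolean expressions over decisionflags, and the rule that decisionflags and invocationstatus values cannot be passed to megamodule methods.

```python
from dataclasses import dataclass
from enum import Enum


class InvocationStatus(Enum):
    """Flags returned by EXAMINE (not encoded in a gentype)."""
    DONE = "DONE"
    NOT_DONE = "NOT_DONE"
    ERROR = "ERROR"


@dataclass(frozen=True)
class GentypeBoolean:
    """Boolean as returned by EXTRACT; the payload stays encoded."""
    encoded: bytes  # hypothetical encoding: b"\x01" is TRUE, b"\x00" is FALSE


@dataclass(frozen=True)
class DecisionFlag:
    """Plain boolean, usable only in IF/WHILE and boolean expressions."""
    value: bool

    def __and__(self, other):  # CLAM operator &
        return DecisionFlag(self.value and other.value)

    def __or__(self, other):  # CLAM operator |
        return DecisionFlag(self.value or other.value)

    def __neg__(self):  # CLAM writes logical "not" as unary -
        return DecisionFlag(not self.value)


def compare(left, right, negate=False):
    """Model of CLAM's == (or != when negate is set); yields a decisionflag."""
    if isinstance(left, GentypeBoolean) and isinstance(right, bool):
        # Implicit transformation: extract the real value, then compare.
        result = (left.encoded == b"\x01") == right
    elif isinstance(left, InvocationStatus) and isinstance(right, InvocationStatus):
        result = left is right
    elif isinstance(left, DecisionFlag) and isinstance(right, DecisionFlag):
        result = left.value == right.value
    else:
        raise TypeError("comparison not supported in CLAM")
    return DecisionFlag(result != negate)


def check_megamodule_input(value):
    """Reject values that may not be used as megamodule input parameters."""
    if isinstance(value, (DecisionFlag, InvocationStatus)):
        raise TypeError("not allowed as input to a megamodule method")
    return value


res = GentypeBoolean(b"\x01")
status = InvocationStatus.DONE
# Corresponds to: (res == TRUE) & -(status == ERROR)
expr = compare(res, True) & -compare(status, InvocationStatus.ERROR)
```

In this model expr is a decisionflag and can steer an IF or WHILE, while passing expr to check_megamodule_input raises an error, mirroring the rule that decisionflags sit at the end of any data processing chain.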
Consequences of this approach:
-
All operations are pushed down into megamodules, with two exceptions:
comparisons of invocationstatus and booleans (which result in decisionflags)
and boolean expressions over decisionflags. The pushed-down operations include
comparisons of data of CHAIMS types other than boolean, even if it is meta-data.
In order to avoid overly awkward coding, the shortcut INVEX can be used.
-
The math helper module offers operations (arithmetic as well as logical)
over all simple CHAIMS types, i.e. also over gentype-booleans. Because
it is a megamodule, the operands as well as the results are all encoded
gentypes and can thus be used as input for other methods. The only results
that can be used for determining the control flow are booleans, as they
can be transformed into decisionflags in the way described above.
-
A decisionflag cannot be used as an input parameter for a megamodule. I.e.,
the comparison operators and boolean expressions offered by CLAM are at
the end of any data processing chain and their result can only be used
in IF and WHILE statements in order to determine the control flow. Boolean
operators cannot be misused for operations on user-data; CLAM remains
a nearly purely compositional language, even concerning operations over
meta-data that is later on used for determining the control flow.
Question: If we need certain booleans later on, we can simply
copy them into a boolean variable for later, or do logical arithmetic via
the math helper module instead of directly in a boolean expression of CLAM.
What about invocationstatus? Do we have cases where invocationstatus
needs to be used as input parameter of a megamodule method? So far we have
assumed no, and thus decided not to encode it in a gentype.
-
Results of ESTIMATE cannot be directly compared or tested in CLAM;
a math helper module must be used for that. It returns gentype-booleans
which can be transformed into decisionflags. We thus preserve the
composition-only character and the small set of primitives in CLAM.
-
Data format in CPAM: all meta- and user-data is encoded into gentypes,
apart from the invocationstatus returned by EXAMINE.
-
Compiler: |, &, - always have decisionflags as operands, so
the generated C-statements are the same as the CLAM-statements. If == and
!= appear between two variable names, then the variables are decisionflags.
If == and != appear between a variable name and one of the keywords ERROR,
DONE, NOT_DONE, then the first operand is an invocationstatus (e.g. an
integer). If == and != appear between a variable name and one of the keywords
TRUE, FALSE, then the first operand is either a decisionflag or a gentype-boolean.
In case of a gentype-boolean the operand has to be converted into a normal
boolean in the generated C-code.
-
Data-view, composition-view: Apart from the exception of boolean
results (which can become decisionflags), there remains a strict separation
between data-view and composition-view.
-
Type checking: at best, a graceful abort at execution time.
CHAIMS focuses on composition, with CLAM as a composition-only language.
By pushing down all operations we get a language and a system that executes
composition of processes over user-data as well as composition of processes
over meta-data/control-data. Various megamodules, e.g. math helper module
and IO-megamodule can be used for any kind of data. Example IO-megamodule:
- A user gets asked for the cities between which he wants
to travel (input and output is user-data).
- A user gets presented three possible ways to go from
A to B and decides which one he wants to make a reservation for (input
user-data, output meta-data or user-data depending on megamodules).
- A user has to decide if he is satisfied with the routes
found so far, or if the program should continue (input user-data, output
meta-data).
- A user has to decide - based on meta-data received
by ESTIMATE - which megamodule he prefers (input meta-data, output meta-data).
Variant to simplify compiler
In order to avoid that the compiler has to determine if in a statement
like (res==TRUE) res is a decisionflag or a gentype-boolean, we
could require that we write (res==TRUE) for gentype-booleans, and
just res for decisionflags.
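The operand classification that the compiler has to perform for == and != (as listed under the consequences above) could be sketched as follows. This is an illustration, not the CHAIMS compiler: it assumes the compiler keeps a set of variable names known to hold gentype-booleans (here gentype_vars, e.g. variables assigned by EXTRACT), and the C helper name gentype_to_bool in the generated code is hypothetical.

```python
STATUS_KEYWORDS = {"DONE", "NOT_DONE", "ERROR"}
BOOLEAN_KEYWORDS = {"TRUE", "FALSE"}


def classify_operand(lhs, rhs, gentype_vars):
    """Decide what kind of value the left operand of == or != must be."""
    if rhs in STATUS_KEYWORDS:
        return "invocationstatus"
    if rhs in BOOLEAN_KEYWORDS:
        # Ambiguous case: needs knowledge about the variable itself.
        return "gentype-boolean" if lhs in gentype_vars else "decisionflag"
    # Two variable names: both are decisionflags.
    return "decisionflag"


def emit_c_comparison(lhs, op, rhs, gentype_vars):
    """Return the C expression generated for a CLAM comparison."""
    if op not in ("==", "!="):
        raise ValueError("only == and != are comparison operators in CLAM")
    kind = classify_operand(lhs, rhs, gentype_vars)
    # Only gentype-booleans need an extraction step in the generated code.
    operand = f"gentype_to_bool({lhs})" if kind == "gentype-boolean" else lhs
    return f"({operand} {op} {rhs})"


generated = emit_c_comparison("res", "==", "TRUE", {"res"})
```

With the variant described above (writing (res==TRUE) only for gentype-booleans and plain res for decisionflags), the gentype_vars lookup would no longer be needed, because the TRUE/FALSE case would always denote a gentype-boolean.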
3. Other solutions?
Does anybody have other solutions?
4. What is "purely compositional"?
or traditional versus no arithmetic/IO versus separation of user- and meta-data
Having a procedural program that composes software always means having
a flow of control. In this flow of control decisions must be made, based
on data. In a traditional language no difference between control-
and user-data is made - all data returned by components can be used for
determining the flow of control, and the same operations apply for any
data. In fact, what in a function A is user-data may be control-data in
a function B called by A, and vice versa.
Question: What does the situation look like for functional or
logic programming?
In a purely compositional language we essentially have two possibilities:
-
No functions for arithmetic and IO at all: approach taken by CHAIMS,
with the exception of boolean expressions and comparison for invocationstatus
and booleans via decisionflags (see approach suggested in section 2). This
is necessary because we use the same megamodules for IO and
processing for control- as well as user-data. We allow the transformation
of booleans into decisionflags. Decisionflags are only used for boolean
expressions in IF and WHILE statements, so they never need to be transformed
back into booleans that could be used as input parameters for megamodule
methods.
-
Clear separation of user-data and control-data: IO and arithmetic
for control/meta-data is provided for within the language:
-
either directly by extending the language with operations over all
types of control-data
-
or with a separate set of modules that never process
user data and may be invoked totally differently.
User-megamodule methods may have a special set of input-parameters that
contain only control-data (i.e. this data comes from the composition system
and never from another user megamodule). When user-data has to influence
the flow of control, a separate set of megamodules is required that takes
user-data as input and returns control-data. This approach would require
three IO-megamodules (or tripling the methods of our IO-megamodule), additional
helper math modules or not providing general math for user-data,
etc. Workflow management systems typically choose such an approach, though
they of course combine the various interfaces in a problem specific manner.
In CHAIMS we do not strictly separate user-data and meta-data.
Question: How is the separation between user-data and meta/control-data
solved in other coordination languages?
User-data becomes control-data:
Even in a purely compositional language, at some point user-data has to
influence the control flow. In CHAIMS we solved that by transforming booleans
into decisionflags. Another possibility would be to introduce special megamodules
with methods that take as input parameters user-data and return as output
parameters control-data.
Question: Would we also need the transformation from control-data
into user-data?
5. CHAIMS-types and type checking
In CHAIMS we have defined the following CHAIMS types: integer, string, boolean,
datetime, opaque. There are three reasons for that:
-
The purpose of these types is that a megaprogrammer can use the helper
math module for arithmetic over user-data of simple type (integer,
string, boolean, datetime). The repository specifies the CHAIMS type of
each parameter of a megamodule method. Without knowing from the repository that
certain user-data is of a specific simple type, it would not be possible
to use a general math megamodule provided with the CHAIMS system. Instead, each
megamodule provider would have to provide math megamodules for all the
parameter names that a megaprogrammer would want to do general arithmetic
with, or he would have to provide SYNONYM statements in the repository
for all parameters that could be used in the general helper math megamodule.
-
Having simple CHAIMS types allows that user data of simple CHAIMS-types
becomes control-data and that computation on that control data can
take place (e.g. calculating cost functions based on data received from
megamodules with EXTRACT).
-
Having simple CHAIMS-types also allows that we compute on meta-data
from ESTIMATE in the same way as on other user- or control-data received
by EXTRACT. Having no simple CHAIMS-types would require a split between
user-data and meta-data in CLAM and the whole CHAIMS system as outlined
above.
Question: Do we need all the basic CHAIMS-types we have defined
so far? (The set of basic CHAIMS-types can be, and already is, a subset of the
set of simple types found in gentypes).
Nasty: We have here really a clash between two paradigms (naming
with implicit typing versus explicit typing). On the level of
CLAM, most data handled by CHAIMS, i.e. all user-data of CHAIMS-type opaque,
is opaque and its type is given implicitly by the name of the data. Correct
programming and type checking is only possible indirectly over the name
of the parameter: if two methods of the same megamodule have parameters
with the same name, we know that these parameters are of the same type,
though we do not know what the type is (parameter names are specified in
the repository and used in the megaprogram). The same is true for parameters
that are specified in the repository to be synonyms: these parameters are
of the same type (Examples: SYNONYM BestRouteMM.Routes =
RouteInfoMM.Routes, SYNONYM BestRouteMM.TravelCost1 = AirMM.TravelCost).
So far so good. With the introduction of a general math-megamodule and of
meta-data we break this principle: the general math-megamodule offers operations
not just for a few parameters of specific name, but for all parameters
of specific types. In order to avoid a long list of SYNONYM statements
in the repository, we switch over to specifying the types explicitly for
all parameters that could be used as input to the math-megamodule. For
ESTIMATE the situation is similar: instead of listing SYNONYM statements
for all methods of all megamodules (especially all helper math megamodules)
that could take fee, time or datavolume as input parameter, we specify
explicitly their types.
Question: This is really nasty. Should we not just get rid of
basic CHAIMS-types, and use the concept of implicit types via names everywhere?
The cost would be:
-
More SYNONYMS in the CHAIMS repository (not too bad for just one helper
math module, worse if more generally usable modules are added either by
us or providers).
-
We require (what we decided some weeks ago anyway) that the order of parameters
with the same name matters.
Example: CMathS.Sub(IN int, IN int, RES int) in the repository,
(myres=int)=math.INVEX("Sub", int=oper1, int=4) in the megaprogram
==> myres = oper1 - 4 in the megamodule.
-
SYNONYM statements also for the data received by ESTIMATE
-
We would either have to introduce a second invocation primitive
into the protocol (a primitive DECISION that works similarly to INVOKE but
returns a decisionflag that can be used in boolean expressions in IF and
WHILE), or have to introduce a special SYNONYM to decisionflag.
-
User-data could only become control-data via either SYNONYM statements
in the repository or special megamodules that return a decisionflag
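The rule that the order of parameters with the same name matters (the Sub example in the list above) can be sketched in Python as follows. The function bind_arguments and its data layout are hypothetical and only illustrate one possible reading of the rule: the k-th argument with a given name in the megaprogram is bound to the k-th parameter with that name in the repository entry.

```python
def bind_arguments(signature, call_args):
    """Bind named arguments to parameters, respecting order among equal names.

    signature: IN parameter names in repository order, e.g. ["int", "int"].
    call_args: (name, value) pairs in the order written in the megaprogram.
    Returns the values in signature order.
    """
    remaining = {}
    for name, value in call_args:
        remaining.setdefault(name, []).append(value)

    bound = []
    for name in signature:
        queue = remaining.get(name)
        if not queue:
            raise TypeError(f"missing argument for parameter {name!r}")
        bound.append(queue.pop(0))  # k-th use of a name binds to k-th parameter

    leftovers = [name for name, queue in remaining.items() if queue]
    if leftovers:
        raise TypeError(f"unexpected arguments: {leftovers}")
    return bound


def sub(a, b):
    """Stand-in for the megamodule method CMathS.Sub."""
    return a - b


oper1 = 10  # example value, chosen only for illustration
myres = sub(*bind_arguments(["int", "int"], [("int", oper1), ("int", 4)]))
```

Swapping the two arguments in the call would compute 4 - oper1 instead, which is exactly why the order has to be part of the contract once names no longer distinguish parameters.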
Type checking
So far, we do not do any type checking. We could do some type checking
even for opaque types by taking into account the SYNONYM statements in
the repository. Type checking is not easily possible for megamodules like
the general IO-megamodule: input and output parameters of the method
Ask can be anything, the type of the output parameter is only determined
by the type of the input parameter, not by any SYNONYM statements (which
would encompass all parameter names anyway).
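As a rough illustration of type checking for opaque types via SYNONYM statements, the following Python sketch groups parameter names into equivalence classes with a union-find structure. The class SynonymTable and its methods are hypothetical; the two SYNONYM statements are the examples given earlier in this section. Parameters are identified by their qualified name (megamodule.parameter), so parameters with the same name in the same megamodule are compatible by construction.

```python
class SynonymTable:
    """Equivalence classes of parameter names built from SYNONYM statements."""

    def __init__(self):
        self._parent = {}

    def _find(self, name):
        """Return the representative of the class that name belongs to."""
        self._parent.setdefault(name, name)
        root = name
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited name directly at the root.
        while self._parent[name] != root:
            self._parent[name], name = root, self._parent[name]
        return root

    def add_synonym(self, left, right):
        """Record a repository statement SYNONYM left = right."""
        self._parent[self._find(left)] = self._find(right)

    def compatible(self, produced, consumed):
        """True if data named produced may be passed where consumed is expected."""
        return self._find(produced) == self._find(consumed)


table = SynonymTable()
table.add_synonym("BestRouteMM.Routes", "RouteInfoMM.Routes")
table.add_synonym("BestRouteMM.TravelCost1", "AirMM.TravelCost")

routes_ok = table.compatible("RouteInfoMM.Routes", "BestRouteMM.Routes")
mismatch = table.compatible("BestRouteMM.Routes", "AirMM.TravelCost")
```

Such a check would flag routing Routes data into a TravelCost parameter, but it offers no help for methods like Ask, whose output type depends on the input value rather than on any parameter name.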