Model free scheduling of invocation, extract and other primitives in CHAIMS
Dorothea Beringer, Laurence Melloul, Neal Sample, Gio Wiederhold
Draft, Jan 7, 1998, by db
Abstract
1. Introduction
2. Optimization objectives
The purpose of the scheduling of primitives in CHAIMS is:
-
Minimal overall execution time of a megaprogram.
Factors to the execution time are:
-
the effective time needed by an invoked remote method to provide
the desired result
-
the time needed to transfer invocation parameters to and results back from
the megamodule
The execution time of the calls to the megamodule as well as of statements
executed in the client (e.g. assignments) are considered as neglectable.
Therefore we treat e.g., ESTIMATE statements as having no execution time.
(Does everybody agree with that?).
-
Minimal overall fee from the execution of a megaprogram.
Factors for fee are:
-
the fee charged for the providing of results by a remote method (per invocation,
or per result)
-
the fee charged for the recourses needed by an active invocation (per time)
-
the fee charged for the transfer of input parameters or results over the
network
To start with, we will only take into account the first two sources for
fees.
In cases where these two things are contradictory, a cost function is necessary
to determine the end-user specific optimum.
Techniques for reducing overall execution time:
-
exploiting the inherent parallelism of remote methods
-
invoking methods even if it is not yet known if their results are needed
-
direct dataflows between megamodules (deferred: we do not yet investigate
this issue in depth in the following, but we should keep it in mind)
Techniques for reducing overall fees:
-
invoke methods only if it is clear that their results are needed
-
terminate invocations as soon as they are no longer needed (saves resources
at the megamodule side which could also be charged to the client)
3. Preprocessing
A scheduling preprocessor reads a megaprogram and writes a new, hopefully
optimized, megaprogram. In the new megaprogram statements may have been
moved around as well as new statements introduced.
Simple optimizations:
-
Moving all EXTRACT statements down to where its results are used by another
statement (most often an IF or an INVOKE statement).
-
Moving all INVOKE statements up as far as possible.
Problem: this simple algorithm does not take into account the actual
time of invocations, so it does not produce good optimizations.
Problem: cannot easily be done across blocks of statements, e.g. moving
outside an IF or WHILE block.
-
Adding TERMINATE statements whenever no more results will be extracted
from an invocation.
Problem: Additional information must be taken into account in order
to determine if an invocation may be actually terminated or not, see also
minutes of various CHAIMS meetings.
Implemented in the existing preprocessor, see also readmes to that preprocessor.
Experiences: not yet integrated, so no experimental results exist.
Optimization with cost function:
-
Moving an INVOKE statement up and outside an IF-block, based on the evaluation
of a cost function. For the cost function ESTIMATE statements have to be
inserted for the method whose invocation statement is under consideration
as well as for all other methods that are invoked as well as extracted
between the IF-block and the new location of the INVOCATION statement.
Preprocessor with access to megamodules
Much more optimization possibilities arise when the preprocessor can execute
ESTIMATE calls to the megamodules.
-
Based on the results of ESTIMATE calls the order of EXTRACT and INVOKE
statements can reflect the estimated duration of the various methods, thus
overcoming the problem mentioned under simple optimization.
-
Insertion of rescheduling points for runtime environment (see NGS proposal)
Other possibilities
yet to be investigated...
Limits of a pure preprocessing approach to scheduling
yet to be listed....
4. Runtime with graph
Runtime scheduling in an execution machine based on a graph of the megamodule
is totally different from the idea of optimizing a sequential program by
rearranging and maybe adding statements. The megaprogram is a mere collection
of statements, some of these statements are related to each other by preconditions
on their possible order. These preconditions are given mainly by the dataflow
(only by the dataflow?). The runtime system takes a megaprogram as input,
converts it into a dependancy graph of statements, and then executes that
graph.
The setup of runtime scheduling
Figure: program - parser - interpreter and scheduler
...
The megaprogram graph
The megaprogram is represented as an (acylic?) directed graph.
Nodes: represent either a statement or a sub graph
Edges: represent dependencies, edges can have additional conditions.
Conditions on edges:
-
The default condition of edges between an invoke and an extract statement
of the same invocation is that the extract only takes place if the results
to be extracted are ready.
-
When creating the tree, all EXWDONE statements can be replaced by an EXTRACT
statement, connected by an edge to the corresponding INVOKE statement.
This edge has the default condition mentioned before.
....
How to represent IF and WHILE blocks? How to handle the cases where
a statement could be executed already earlier on, but it is not yet clear,
if its results are needed?
Should we rather take a variant of a Petri Net?
Complexity of this approach?
Execution of the graph
The runtime system typcically has several ongoing invocations, and waits
for one of them to fulfill the conditions of one or several edges so the
next node (most often an INVOKE statement) can be executed. As CPAM does
not know any callbacks from a megamodule to a client, these testing the
readiness of invocations is up to the runtime system. Easiest way is probably
polling.
Whenever the graph contains a serie of statements executed locally
(e.g. assignments), these are executed without further optimization considerations,
as we assume that their time consumption is neglectable.
5. Scheduling in other systems
Compilers
who is adding something here?
Workflow Management Systems
who is adding something here?
Disclaimer
This is a paper in progress, many more cases have to be considered!
Add ons and input is welcome!