STREAM Functionality

STREAM supports declarative continuous queries over two types of inputs: streams and relations. A continuous query is simply a long-running query, which produces output in a continuous fashion as the input arrives. The queries are expressed in a language called CQL, which is described in this technical report.


Streams and Relations

Streams and relations are defined using some ordered time domain, which may or may not be related to wall-clock time.

Stream: A stream is a sequence of timestamped tuples. There could be more than one tuple with the same timestamp. The tuples of an input stream are required to arrive at the system in the order of increasing timestamps. A stream has an associated schema consisting of a set of named attributes, and all tuples of the stream conform to the schema.

Relation: A relation is time-varying bag of tuples. Here ``time'' refers to an instant in the time domain. Input relations are presented to the system as a sequence of timestamped updates which capture how the relation changes over time. An update is either a tuple insertion or a tuple deletion. The updates are required to arrive at the system in the order of increasing timestamps. Like streams, relations have a fixed schema to which all tuples conform.

Note that the timestamp ordering requirement is specific to one stream or a relation. For example, tuples of different streams could be arbitrarily interleaved.


Output of a CQL Query

The output of a CQL query is a stream or relation depending on the query. The output is produced in a continuous fashion as described below:

  1. If the output is a stream, the tuples of the stream are produced in the order of increasing timestamps. The tuples with timestamp t are produced once all the input stream tuples and relation updates with timestamps <=t have arrived.

  2. If the output is a relation, the relation is represented as a sequence of timestamped updates (just like the input relations). The updates are produced in the order of increasing timestamps, and updates with timestamp t are produced once all input stream tuples and relation updates with timestamps <=t have arrived.

Note that the representation of a relation as a sequence of timestamped updates is not unique, since tuple insertions and tuple deletions with the same timestamp and the same tuple value cancel each other. If the output of a query is a relation, the STREAM system does not specify which of the several possible representations of a relation is produced.


Example CQL Queries

It is beyond the scope of this document to provide a detailed introduction to CQL. We refer the reader to the CQL technical report for all the gory details. Instead, we present a few examples here to illustrate common query forms. CQL derives its syntax and semantics from SQL, so the following discussion should be self-contained with a knowledge of SQL.

Filters over streams: The following query selects out all tuples of a stream S whose A attribute is larger than 10.

    Select *
    From S
    Where A > 10;

Stream projections: If the schema of the stream S is S(A, B, C, D), the following query projects out the first, third, and the square of the fourth attribute of S.

    Select A, C, D*D
    From S;

Sliding-window aggregations: The following query computes the average value of stocks over the previous 15 minutes. Note that since this is a continuous query, so the average value is computed (at least conceptually) at every time instant over the previous 15 minutes. The output of this query is a relation; at every instant the relation contains a single tuple with a single attribute containing the average value over the last 15 minutes.

    Select Avg(value)
    From Stocks [Range 15 Minutes];

Streaming an aggregation: The following query is identical to the previous query, except that the average value is streamed as output everytime it changes. The Istream operator is a relation-to-stream operator that streams whenever new tuples appear in its input relation, in this case the average over the last 15 minutes.

    Istream(
      Select Avg(value)
      From Stocks [Range 15 Minutes]
    );

Sliding-window joins: The following query maintains the join of the last 10 tuples of stream S1 and the last 10 tuples of stream S2.

    Select *
    From S1 [Rows 10], S2 [Rows 10]
    Where S1.A = S2.A;

CQL Restrictions

STREAM currently does not support all the features of CQL specified in the technical report. In this section, we mention the important features omitted in the current implementation of STREAM. In the next section, we describe how queries which require these features can be specified in an alternate fashion using named intermediate queries (or views). The important omissions are:
  1. Subqueries are not allowed in the Where clause. For example the following query is not supported:
       Select * 
       From S
       Where S.A in (Select R.A
                     From R)
    

  2. The Having clause is not supported, but Group By clause is supported. For example, the following query is not supported:
       Select A, SUM(B)
       From S
       Group By A
       Having MAX(B) > 50
    

  3. Expressions in the Project clause involving aggregations are not supported. For example, the query:
       Select A, (MAX(B) + MIN(B))/2
       From S
       Group By A
    
    is not supported. However, non-aggregated attributes can participate in arbitrary arithmetic expressions in the project clause and the where clause. For example, the following query is supported:
       Select (A + B)/2
       From   S
       Where  (A - B) * (A - B) > 25
    

  4. Attributes can have one of four types: Integer, Float, Char(n), and Byte. Variable length strings (Varchar(n)) are not supported. We do not currently support any casting from one type to another.

  5. Windows with the slide parameter are not supported.

  6. The binary operations Union and Except is supported, but Intersect is not.

A semi-formal representation of the CQL syntax supported by the STREAM server can be found here.

Named Queries

A CQL query can be assigned a name and a schema to allow its result to be referenced by other queries. This feature allows us to express some of the esoteric, omitted features of CQL mentioned in the previous section. The following query is an example of a CQL view (it produces a relation):

  AggS (A, Sum_B, Max_B):

    Select A, SUM(B), MAX(B)
    From S
    Group By A
It can be used in a different query just like an input relation:
    Select A, Sum_B
    From AggS
    Where Max_B > 50
Note that the combination of these two queries produces the same output as the query with a Having clause that we mentioned in the previous section.