Problem for First Part

Suppose you have 2 relations, R(A, B, C) and S(B, C, D, E).

Relation R is stored contiguously on disk, sorted
on attribute A. We have a B-tree index on attribute A.
Assume this index is kept entirely in memory (i.e., you do not need
to read it from disk).

Relation S is also stored contiguously on disk, sorted
on attribute C. We have two indexes:
  * a B-tree index on attribute B
  * a B-tree index on attribute C
Both indexes are kept entirely in memory.

Other relevant data:

1000 tuples of R are stored per block on disk.
T(R) = 1,000,000 (number of tuples of R)
100 tuples of S are stored per block on disk.
T(S) = 200,000 (number of tuples of S)
V(R, A) = 4,000 (distinct values in R for A)
V(S, B) = 40,000
V(S, C) = 400

You want to execute the following query:
    SELECT *
    FROM R, S
    WHERE (R.B=S.B) AND (R.C=S.C)

We present you with two query plans:

Plan 1:
For every block Bl of R, retrieved using the index on A for R
  For every tuple r of Bl
    Use index on B for S to retrieve all tuples s of S such that s.B=r.B
    For each of these tuples s, if s.C=r.C,
       output r.A, r.B, r.C, s.B, s.C, s.D, s.E

Plan 2:
For every block Bl of R, retrieved using the index on A for R
  For every tuple r of B
    Use index on C for S to retrieve all tuples s of S such that s.C=r.C
    For each of these tuples s, if s.B=r.B,
       output r.A, r.B, r.C, s.B, s.C, s.D, s.E

(a) What is the expected number of IOs for Plan 1?
Do NOT include IOs for writing the resulting join tuples.
You may assume that R join S has few or no dangling tuples.
Explain briefly how you get your answer.


(b) What is the expected number of IOs for Plan 2?
Do NOT include IOs for writing the resulting join tuples.
You may assume that R join S has few or no dangling tuples.
Explain briefly how you get your answer.

(c) For what range of T(S) values is Plan 1
better than Plan 2?

---------------------------------------------------------------------------
Problem for Second Part

(a) Can all schedules produced with two-phase locking (2PL)
be also generated with a validation scheme?
That is, can all sequences of read and write actions
that may occur under 2PL also ocur under validation?

If the answer is NO, show a counter-example schedule
(containing reads and writes) that could have been generated
with 2PL (and explain when locks are requested and released).
Then show that this schedule could not have been obtained
by a validation mechanism. (In your schedule, use r_i(A) to represent a read action
by transaction i on some object A, and w_i(A) for a write action.)

If the answer is YES, give an informal proof.

(b) Can all schedules produced by a validation mechnisms
also be generated by a 2PL mechnisms?

If the answer is NO, show a counter-example schedule
(containing reads and writes) that could have been generated
with validation (and explain when transactions validated and finished).
Then show that this schedule could not have been obtained
by a 2PL mechanism.

If the answer is YES, give an informal proof.

(c) Consider now a database with a single object A. Transactions
on this "very small database" have either a single action
(read or write A), or two actions (read A followed by write A).

In this restricted scenario, do your
answers to parts (a) and (b) change?

If the answers do NOT change, explain briefly why your
counter-examples or proofs of parts (a) and (b) still hold.

If the answers do change, construct new counter-examples of proofs
for this restricted scenario.

(d) Continuing with the one-object database, are all 
(2PL and validation based) schedules now recoverable?
If not, give a counter-example; if yes, give an informal proof.