Date | Topics | Notes | Readings |
---|---|---|---|
4/4/2001 | Introduction, Inverted indexes, Issues in building such indexes, Course administrivia | [powerpoint] [pdf (large)] [pdf (small)] | MG Ch. 3, MIR Ch. 7.2; Porter's stemmer; Shakespeare plays |
4/9/2001 | Inverted index storage, Boolean queries, Wild-card queries, Positional/phrase queries, Evaluating IR systems | [powerpoint] [pdf (large)] [pdf (small)] | MG Ch. 4, MIR Ch. 3; Princeton WordNet |
4/11/2001 | Section: IR project, VDK software | [powerpoint] [html] | Installing VDK; course page |
4/16/2001 | Index construction, Dynamic indices (updating), Term weighting, Vector space indices | [powerpoint] [pdf (large)] [pdf (small)] | MG Ch. 5, MIR Ch. 2.5.3 |
4/18/2001 | Computing cosine-based ranking, Speeding up cosine ranking (Sampling and pre-grouping, Latent semantic indexing, Random projection) | [powerpoint] [pdf (large)] [pdf (small)] | MG Ch. 4.6, MIR Ch. 2.7.2; Random projection theorem; Faster random projection; Latent semantic indexing |
4/18/2001 | Section: IR project 2, VDK software | [powerpoint] [html] | none |
4/23/2001 | Generalized query operators, Bayesian nets for text retrieval, Structured + unstructured queries | [powerpoint] [pdf (large)] [pdf (small)] | MIR Ch. 2.6, 2.8; Bayesian resources |
4/25/2001 | Link-based ranking in web search engines | [powerpoint] [pdf (large)] [pdf (small)] | MIR Ch. 13; Anatomy of a large-scale hypertextual web search engine; Authoritative sources in a hyperlinked environment; Hypersearching the Web; Dubhashi resource collection covering recent topics |
4/30/2001 | Rest of web ranking, Peer-to-peer search, Search deployment models, Review of search topics | [powerpoint] [pdf (large)] [pdf (small)] | MIR Ch. 9 |
5/7/2001 | Document clustering | [powerpoint] [pdf (large)] [pdf (small)] | Yale results clustering demo |
5/9/2001 | Automatic document classification | [powerpoint] [pdf (large)] [pdf (small)] | Resources for the lecture |
5/14/2001 | Centroid/nearest-neighbor classification, Bayesian classification, Link-based classification, Document summarization | [powerpoint] [pdf (large)] [pdf (small)] | Enhanced hypertext categorization using hyperlinks; Using lexical chains for text summarization |
5/16/2001 | Link-based clustering, Enumerative clustering/trawling, Recommendation systems | [powerpoint] [pdf (large)] [pdf (small)] | Hypertext clustering: Clustering hypertext with applications to Web search; Duplicate detection: Syntactic clustering of the Web; Apriori algorithm: Fast algorithms for mining association rules; Trawling: Trawling emerging cyber-communities automatically |
5/21/2001 | Web characterization; Research problems | [powerpoint] [pdf (large)] [pdf (small)] | Broder et al., Graph structure in the Web; Jeong and Barabasi, Diameter of the World Wide Web; Faloutsos et al., On power-law relationships of the Internet topology |
5/23/2001 | Distributed databases: introductory topics; Fragmentation; Allocation | [powerpoint] [pdf (large)] [pdf (small)] | PDDS Ch. 5 |
5/30/2001 | Query processing in distributed databases: localization, distributed query operators, optimization | [powerpoint] [pdf (large)] [pdf (small)] | PDDS Ch. 7, 8, and 9 |
6/4/2001 | Concurrency control (Schedules, Serializability, Locking, Timestamp control); Reliability (Failure models, 2-phase commit) | [powerpoint] [pdf (large)] [pdf (small)] | Concurrency Control and Recovery in Database Systems; Ch. 9 of CS245 textbook (Database System Implementation) |
6/6/2001 | Reliability (3-phase commit, Majority 3PC); Network partitions | [powerpoint] [pdf (large)] [pdf (small)] | Concurrency Control and Recovery in Database Systems |
6/6/2001 | Review session | IR Part II [ppt] [pdf]; Dist. DB [ppt] [pdf] | You are also responsible for the material covered before the midterm. The midterm review slides (at the end of lecture 7): IR Part I [ppt] [pdf] |