Group Project

Students taking CS347 for credit are required to complete an information retrieval project, which will count for 30% of the student's grade. Students are encouraged to work in teams of up to 5 members. All projects must be approved in advance by the course instructor (see details below). The goal of the project is not to reinvent basic text retrieval components, but rather to build interesting higher-level applications on top of these foundations. Projects are therefore strongly encouraged to use the standard indexing and related capabilities available in Verity's Developer Kernel (VDK), a description of which follows.

PROVIDED INFORMATION RETRIEVAL SOFTWARE

The VDK engine facilitates the construction of applications that support advanced search, categorization, and "profiling", as well as other functions over a set of documents. Typically, application developers (1) configure Verity Collections for optimal performance of the features the application will use, (2) use the VDK API to insert documents into Verity Collections ("indexing" documents), and (3) subsequently perform searches and other functions that the VDK API makes available over the created Collections.

During indexing, the VDK engine reads documents using a gateway driver, extracts metadata and other useful information from documents using filter drivers, and stores document information, as well as tables of word location information, in the collection. Collections are pseudo-databases designed to be robust; to allow concurrent indexing, searching, and other operations; to minimize latency during real-time indexing; and to provide fast search-time performance. During search and other operations, collections are accessed and, depending on configuration, the original documents may also be accessed (particularly when viewing a document returned by a search).
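To make the idea of storing "document information" and "tables of word location information" concrete, here is a minimal Python sketch of a positional index. It does not use the VDK API and makes no claim about how Verity Collections are implemented; the class ToyCollection and its methods index, search, and phrase are hypothetical names chosen only for illustration.

```python
import re
from collections import defaultdict


class ToyCollection:
    """Hypothetical stand-in for a collection; NOT the VDK API."""

    def __init__(self):
        self.docs = {}                     # doc_id -> metadata dict
        self.postings = defaultdict(dict)  # word -> {doc_id: [positions]}

    def index(self, doc_id, text, **metadata):
        """Store metadata and record the position of every word."""
        words = re.findall(r"[a-z0-9]+", text.lower())
        self.docs[doc_id] = dict(metadata, length=len(words))
        for pos, word in enumerate(words):
            self.postings[word].setdefault(doc_id, []).append(pos)

    def search(self, word):
        """Return the sorted ids of documents containing the word."""
        return sorted(self.postings.get(word.lower(), {}))

    def phrase(self, first, second):
        """Return the sorted ids of documents where second directly follows first."""
        a = self.postings.get(first.lower(), {})
        b = self.postings.get(second.lower(), {})
        hits = []
        for doc_id in a.keys() & b.keys():
            following = set(b[doc_id])
            if any(p + 1 in following for p in a[doc_id]):
                hits.append(doc_id)
        return sorted(hits)


col = ToyCollection()
col.index("a.txt", "Text retrieval systems index documents", title="A")
col.index("b.txt", "Retrieval of text from documents", title="B")
print(col.search("documents"))          # both documents contain the word
print(col.phrase("text", "retrieval"))  # only a.txt has the two words adjacent
```

Keeping word positions, rather than only a list of documents per word, is what makes phrase and proximity queries possible without rereading the original documents.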

The VDK engine is highly customizable, both through configuration options and through custom drivers written against its C API. Although projects may have any scope, digging deeply into any one feature of the engine can quickly add complexity. Project ideas must therefore be approved, to ensure appropriate scope, before teams proceed too far with design or implementation. Students should form teams of ~4 to tackle projects; each team should come up with an idea and schedule a design review with Brent Miller. All team members must be present at the project design review. Students who have difficulty forming teams should contact bdmiller@stanford.edu.

Example ideas for projects could be:

PROJECT SCHEDULE

Project sections are broadcast live on Channel E3 and are also available on videotape from the library.

4/11 3:15pm @ Skilling 193 - Section - introducing VDK

4/18 3:15pm @ Skilling 193 - Section - more introducing VDK, and questions

4/25 3:15pm @ Skilling 193 - Tentative Section - Project designs must be approved

5/21 11:59pm - Projects due

5/27 11:59pm - Latest any projects will be accepted (with penalty)

COMPLETING PROJECTS

Instructions for delivering completed projects are available here.

OFFICE HOURS

4/23 5:30pm @ Gates B26B 736-1817

4/25 5:30pm @ Gates B24B 736-1816

5/2 5:30pm @ Gates B24B 736-1816

5/9 5:30pm @ Gates B24B 736-1816

5/16 5:30pm @ Gates B24B 736-1816

5/23 5:30pm @ Gates B24B 736-1816 - Last office hours