
Welcome!

Hi, I'm a postdoctoral fellow in the Computer Science Department at Stanford University, working with Jure Leskovec. I got my PhD from Carnegie Mellon University, where I was advised by Manuel Blum and Carlos Guestrin.

My research focuses on the information overload problem. This problem spans entire sectors, from web users to scientists and intelligence analysts, all of whom constantly struggle to keep up with the ever-growing amount of content published every day. With this much data, it is easy to miss the big picture.

My research aims at providing more expressive options for specifying complex information needs (going beyond keyword queries), while at the same time presenting the retrieved results in a structured and annotated manner that better enables the user to connect the dots. Moreover, as the information needs vary from person to person, an overarching component of this approach is the personalization of results based on rich user interaction. You can see some examples below.

Selected Publications

Information Overload

Information overload, and ways to fight it, is my new (and exciting!) research area. Below are two examples of our approach - one for the news domain, and one for blogs. More examples coming soon.

  • Dafna Shahaf, Jaewon Yang, Caroline Suen, Jeff Jacobs, Heidi Wang and Jure Leskovec, Information Cartography: Creating Zoomable, Large-Scale Maps of Information.
    ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2013.

    We propose a methodology for creating structured summaries of information, which we call zoomable metro maps. Just as cartographic maps have been relied upon for centuries to help us understand our surroundings, metro maps can help us understand the information landscape. [...] As different users might be interested in different levels of granularity, the maps are zoomable, with each level of zoom showing finer details and interactions.

  • Dafna Shahaf, Carlos Guestrin and Eric Horvitz, Metro Maps of Information.
    ACM SIGWEB Newsletter, 2013.
  • Dafna Shahaf, Carlos Guestrin and Eric Horvitz, Metro Maps of Science.
    ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2012.

    Information overload is a major challenge for scientists today, and is especially daunting for new investigators attempting to master a discipline and scientists who seek to cross disciplinary borders. [...] We create structured summaries of information, which we call metro maps. Pilot user studies demonstrate that our method can help researchers acquire new knowledge efficiently.

  • Dafna Shahaf, Carlos Guestrin and Eric Horvitz, Trains of Thought: Generating Information Maps.
    International World Wide Web Conference (WWW), 2012.

    Complex stories spaghetti into branches, side stories, and intertwining narratives. In order to explore these stories, one needs a map to navigate unfamiliar territory. We propose a methodology for creating structured summaries of information, which we call metro maps. [...] Most importantly, metro maps explicitly show the relations among retrieved pieces in a way that captures story development.

  • Dafna Shahaf and Carlos Guestrin, Connecting Two (or Less) Dots: Discovering Structure in News Articles.
    [Journal] ACM Transactions on Knowledge Discovery from Data, 2011.
  • Dafna Shahaf and Carlos Guestrin, Connecting the Dots Between News Articles.
    ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010. Also presented at IJCAI'11.

    Best Research Paper, KDD'10

    The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today's society. [...] In this paper, we investigate methods for automatically connecting the dots - providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the ongoing health-care debate.

  • Khalid El-Arini, Gaurav Veda, Dafna Shahaf, and Carlos Guestrin, Turning Down the Noise in the Blogosphere.
    ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2009.

    In recent years, the blogosphere has experienced a substantial increase in the number of posts published daily, forcing users to cope with information overload. [...] we present a principled approach for picking a set of posts that best covers the important stories in the blogosphere. [...] In addition, since people have varied interests, our coverage algorithm incorporates user preferences in order to tailor the selected posts to individual tastes.

Other Publications

Publications from my MSc, internships, etc.

Adventures in Human-Computation

  • Dafna Shahaf and Eric Horvitz, Generalized Task Markets for Human and Machine Computation.
    The Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2010.

    We discuss challenges and opportunities for developing generalized task markets where human and machine intelligence are enlisted to solve problems, based on a consideration of the competencies, availabilities, and pricing of different problem-solving resources. The approach couples human computation with machine learning and planning, and is aimed at optimizing the flow of subtasks to people and to computational problem solvers. We illustrate key ideas in the context of Lingua Mechanica, a project focused on harnessing human and machine translation skills to perform translation among languages.

  • Dafna Shahaf and Eyal Amir, Towards a Theory of AI-Completeness.
    CommonSense, 2007.

    (A thought experiment: complexity models for computational problems that include a human in the process. Take it with a grain of salt.)

Logic (My MSc)

Graphical Models