
Second ACM International Conference
on Web Search and Data Mining
WSDM 2009
Barcelona, Spain - February 9-12, 2009

Sponsored by: ACM


Abstract

A major trend in Web search is advancing the functionality of search engines to a more expressive, semantic level. This is enabled by large-scale extraction of entities and relationships from semistructured and natural-language Web sources. In addition, harnessing Semantic-Web-style ontologies and reaching into Deep-Web sources can contribute to the grand vision of turning the Web into a comprehensive knowledge base that can be searched efficiently and with high precision.

This talk presents ongoing research towards this objective, with emphasis on our work on the YAGO knowledge base and the NAGA search engine, while also covering related projects. YAGO is a large collection of entities and relational facts that are harvested from Wikipedia and WordNet with high accuracy and reconciled into a consistent RDF-style "semantic" graph. To grow YAGO further from Web sources while retaining its high quality, pattern-based extraction is combined with logic-based consistency checking in a unified framework. NAGA provides graph-template-based search over this data, with powerful ranking capabilities based on a statistical language model for graphs. Advanced queries and the need to rank approximate matches pose efficiency and scalability challenges, which are addressed by algorithmic and indexing techniques.
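To make the idea of graph-template-based search over an RDF-style graph concrete, here is a minimal sketch in Python. It is not NAGA's query language or implementation: the toy facts, the relation names (`bornIn`, `hasWonPrize`, `locatedIn`), the `?`-prefixed variable syntax, and the functions `unify` and `match` are all assumptions made for illustration. The sketch only finds exact matches; the ranking of approximate matches mentioned above is omitted.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical toy facts in (subject, predicate, object) form.
FACTS: List[Triple] = [
    ("Max_Planck", "bornIn", "Kiel"),
    ("Max_Planck", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("Kiel", "locatedIn", "Germany"),
    ("Ulm", "locatedIn", "Germany"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables (an assumed convention)."""
    return term.startswith("?")


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def match(template: List[Triple], facts: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding that satisfies all template edges."""
    binding = {} if binding is None else binding
    if not template:
        yield binding
        return
    first, rest = template[0], template[1:]
    for fact in facts:
        extended = unify(first, fact, binding)
        if extended is not None:
            yield from match(rest, facts, extended)


# Template: physics Nobel laureates born in a German city.
QUERY: List[Triple] = [
    ("?person", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("?person", "bornIn", "?city"),
    ("?city", "locatedIn", "Germany"),
]

if __name__ == "__main__":
    for answer in match(QUERY, FACTS):
        print(answer["?person"], "was born in", answer["?city"])
```

The nested-loop matching shown here scans all facts for every template edge, which is only workable for toy data; the efficiency and scalability challenges mentioned above arise precisely because real knowledge bases need indexing rather than scans.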

YAGO is publicly available and has been imported into various other knowledge-management projects, including DBpedia. YAGO shares many of its goals and methodologies with parallel projects along related lines. These include Avatar, Cimple/DBlife, DBpedia, KnowItAll/TextRunner, Kylin/KOG, and the Libra technology, among others. Together they form an exciting trend towards providing comprehensive knowledge bases with semantic search capabilities.

About the speaker

Gerhard Weikum

Gerhard Weikum is a Scientific Director at the Max Planck Institute for Informatics, where he leads the research group on databases and information systems. Earlier he held positions at Saarland University in Germany, ETH Zurich in Switzerland, and MCC in Austin, and he was a visiting senior researcher at Microsoft Research in Redmond. His recent working areas include peer-to-peer information systems, the integration of database-systems and information-retrieval methods, and information extraction for building and maintaining large-scale knowledge bases. Weikum has co-authored more than 150 publications, including a comprehensive textbook on transactional concurrency control and recovery. He has received several best-paper awards, including the VLDB 2002 ten-year award, and he is an ACM Fellow. He has served on the editorial boards of various journals and book series, including ACM TODS, the Springer LNCS series, and the new CACM, and as program committee chair for international conferences such as ICDE 2000, ACM SIGMOD 2004, and CIDR 2007. He is currently the president of the VLDB Endowment.