Yahoo's Knowledge Graph
Nicolas Torzec
Principal Research Engineer


Wednesday, August 20, 2014
03:15 PM - 04:00 PM

Level:  Technical - Intermediate

We present Yahoo Knowledge, a platform designed to build, maintain, and serve a unified knowledge graph of all the entities and concepts we care about at Yahoo. It supports knowledge-based applications across the company, such as Web Search, Media Verticals, Content Understanding, Personalization, and Advertising.

The resulting knowledge graph provides key information about each entity (its attributes, relationships, features, and links to content) and interlinks entities across data sources.
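To make this entity model concrete, here is a minimal Python sketch of how one entity and its links might be held in memory. The class names (`Entity`, `KnowledgeGraph`), field names, and `ent:` identifiers are hypothetical illustrations, not the platform's actual schema, and the in-memory dictionary stands in for the distributed storage the platform actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """One node of a toy knowledge graph (hypothetical schema)."""
    entity_id: str
    entity_type: str  # ontology class, e.g. "Person"
    attributes: dict[str, str] = field(default_factory=dict)
    # Outgoing edges as (predicate, target entity_id) pairs
    relationships: list[tuple[str, str]] = field(default_factory=list)
    content_links: list[str] = field(default_factory=list)  # URLs of related content
    # External source name -> this entity's identifier in that source
    source_ids: dict[str, str] = field(default_factory=dict)


class KnowledgeGraph:
    """In-memory container; a real deployment would use distributed storage."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def neighbors(self, entity_id: str, predicate: str | None = None) -> list[Entity]:
        """Entities one outgoing edge away, optionally filtered by predicate."""
        node = self.entities[entity_id]
        return [
            self.entities[target]
            for pred, target in node.relationships
            if (predicate is None or pred == predicate) and target in self.entities
        ]

    def by_source_id(self, source: str, source_id: str) -> Entity | None:
        """Resolve an external identifier (linear scan; a real system would index it)."""
        for entity in self.entities.values():
            if entity.source_ids.get(source) == source_id:
                return entity
        return None


kg = KnowledgeGraph()
kg.add(Entity("ent:1", "Person", {"name": "Ada Lovelace"},
              [("field", "ent:2")], source_ids={"wikipedia": "Ada_Lovelace"}))
kg.add(Entity("ent:2", "Discipline", {"name": "Mathematics"}))
print([e.attributes["name"] for e in kg.neighbors("ent:1")])  # ['Mathematics']
```

Keeping per-source identifiers on each entity is what lets the graph interlink data sources: a record arriving from any source can be resolved back to the same node.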

Typical uses include:

  • Searching and displaying information about entities
  • Recognizing entities in context
  • Connecting entities to content and data sources
  • Discovering and recommending related information

  1. We continuously acquire and extract information about entities from multiple complementary sources using simple information extraction techniques, drawing on both open data sources such as Wikipedia and closed data sources from paid providers.
  2. We store this information uniformly in a central knowledge repository, where entities and their attributes and relationships are categorized, normalized, and validated against a common ontology (250 classes, 800 properties) using a generalized, scalable framework.
  3. We use machine learning techniques to disambiguate and blend together entities that refer to the same real-world object, turning siloed, incomplete, inconsistent, and possibly inaccurate information into a rich, unified, disambiguated knowledge graph.
  4. We use a plugin system to enrich the graph with inferred information that is useful to the applications we support, and we rely on editorial curation for hot fixes.
  5. We provide access to the knowledge graph through APIs and continuously generate data exports for large-scale offline data processing.
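Steps 2 and 3 can be illustrated with a small Python sketch. Everything in it is hypothetical: the two-class `ONTOLOGY`, the `Record` shape, and especially `same_entity`, a hand-written matching rule standing in for the machine-learned co-reference model described above. Records that pass ontology validation are compared pairwise, and union-find takes the transitive closure of the matches so that each cluster becomes one blended entity.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical two-class ontology: class name -> allowed property names.
ONTOLOGY: dict[str, set[str]] = {
    "Person": {"name", "birth_date", "occupation"},
    "Place": {"name", "country", "population"},
}


@dataclass
class Record:
    """An entity record extracted from one source (hypothetical shape)."""
    source: str
    source_id: str
    entity_type: str
    attributes: dict[str, str] = field(default_factory=dict)


def validate(record: Record) -> Record | None:
    """Step 2 sketch: reject unknown classes, drop properties the class does not allow."""
    allowed = ONTOLOGY.get(record.entity_type)
    if allowed is None:
        return None
    kept = {k: v for k, v in record.attributes.items() if k in allowed}
    return Record(record.source, record.source_id, record.entity_type, kept)


def same_entity(a: Record, b: Record) -> bool:
    """Stand-in for a trained co-reference model: same type, same normalized name,
    and no conflicting birth_date."""
    if a.entity_type != b.entity_type:
        return False
    name_a = a.attributes.get("name", "").strip().lower()
    name_b = b.attributes.get("name", "").strip().lower()
    if not name_a or name_a != name_b:
        return False
    da, db = a.attributes.get("birth_date"), b.attributes.get("birth_date")
    return da is None or db is None or da == db


def blend(records: list[Record]) -> list[dict[str, str]]:
    """Step 3 sketch: cluster co-referring records with union-find, then merge clusters."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Quadratic pairwise comparison; a large system would block candidates first.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if same_entity(records[i], records[j]):
                parent[find(i)] = find(j)

    clusters: dict[int, list[Record]] = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)

    merged = []
    for members in clusters.values():
        entity: dict[str, str] = {}
        for rec in members:  # naive conflict policy: the first value seen wins
            for k, v in rec.attributes.items():
                entity.setdefault(k, v)
        entity["type"] = members[0].entity_type
        entity["sources"] = ",".join(sorted(f"{r.source}:{r.source_id}" for r in members))
        merged.append(entity)
    return merged


raw = [
    Record("wikipedia", "Ada_Lovelace", "Person",
           {"name": "Ada Lovelace", "birth_date": "1815-12-10"}),
    Record("provider_x", "p-42", "Person",
           {"name": "ada lovelace", "occupation": "mathematician", "shoe_size": "7"}),
    Record("provider_x", "p-99", "Spaceship", {"name": "Enterprise"}),
]
clean = [r for r in (validate(x) for x in raw) if r is not None]
for blended in blend(clean):
    print(blended)
```

With these sample records, the `Spaceship` record is rejected because it is not an ontology class, the unknown `shoe_size` property is dropped, and the two Ada Lovelace records blend into one entity that lists both sources.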

The Yahoo Knowledge platform manages millions of interconnected entities and relationships and runs on distributed storage and data processing systems.

Nicolas Torzec is the science lead for the "Yahoo Knowledge" project, which focuses on building and maintaining a unified knowledge graph of all the entities and concepts relevant to Yahoo, to support knowledge-based applications across the company. Nicolas has worked on knowledge projects at Yahoo Labs for the past five years. He holds a degree in computer science. He is interested in Information and Knowledge Management and in Machine Learning. His specialties include Natural Language Processing, Text Mining, Information Extraction, Information Integration, Knowledge Management, Knowledge Representation, Data Mining, and Graph Mining. Previously, Nicolas was a Web Search engineer at Yahoo, a Shopping Comparison tech lead at Kelkoo, a teaching and research assistant in Computer Science at the University of Rennes, and a research engineer in Natural Language Processing at Orange Labs.
