Thursday, August 21, 2014
08:30 AM - 09:15 AM
Level: All - General Audience
In this case study, we (Modus Operandi, Inc.) were asked by our customer to scale our existing text ingest and semantic wiki system (Blade) from 2,000 documents per day to 20,000 documents per day. Along the way, we achieved an ingest rate of 45,000 documents per hour (not per day), and a responsive semantic wiki operating over 6 million documents and 2.2 billion triples!
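To put these figures side by side, here is a small back-of-the-envelope calculation using only the numbers quoted above. It assumes, purely for illustration, that the hourly ingest rate is sustained around the clock; the abstract does not state this, so the daily figure should be read as an upper bound rather than a measured result.

```python
# Figures quoted in the abstract.
ORIGINAL_DOCS_PER_DAY = 2_000
TARGET_DOCS_PER_DAY = 20_000
ACHIEVED_DOCS_PER_HOUR = 45_000
TOTAL_DOCS = 6_000_000
TOTAL_TRIPLES = 2_200_000_000

# Assumption (not stated in the abstract): the hourly rate holds for 24 hours.
achieved_docs_per_day = ACHIEVED_DOCS_PER_HOUR * 24

# How far the achieved rate exceeds the customer's target and the old baseline.
speedup_vs_target = achieved_docs_per_day / TARGET_DOCS_PER_DAY
speedup_vs_original = achieved_docs_per_day / ORIGINAL_DOCS_PER_DAY

# Average number of triples per document in the resulting wiki.
triples_per_doc = TOTAL_TRIPLES / TOTAL_DOCS

print(f"Docs per day at the achieved rate: {achieved_docs_per_day:,}")
print(f"Multiple of the 20,000/day target: {speedup_vs_target:.0f}x")
print(f"Multiple of the 2,000/day baseline: {speedup_vs_original:.0f}x")
print(f"Average triples per document: {triples_per_doc:.0f}")
```

Under that assumption, the achieved rate corresponds to roughly a million documents per day, and the store averages a few hundred triples per document.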
In this session we'll dive into the architectural changes we made to greatly increase the scale of our semantic system, including:
- Hadoop-ifying the text-to-RDF ingest process
- Moving to pluggable scale-up/scale-out triple stores
- Completely re-architecting our semantic wiki to run natively over a triple store with no additional database
We will also quantify the query and edit performance of the resulting system.
Our goal is to give you insight into the options available if high-scale text processing, triple stores, or semantic wikis are among your goals.
Mr. Mark Wallace is the Principal Engineer for Semantic Applications at Modus Operandi, Inc. Mr. Wallace has over 25 years of experience in software development, and 15 years of experience as lead architect on software projects for DoD customers and private industry. He has worked both in research and in commercial product development. He has been working with Semantic Technologies since 2004.
Prior to rejoining Modus Operandi in 2009, Mr. Wallace served as Chief Architect and Ontologist at 3 Sigma Research. He previously served at Modus Operandi as Director of Product Development for the Wave(tm) Product, an enterprise data federation and transformation product utilizing OWL, XML, and XQuery, which extended the AquaLogic Data Services Platform developed by BEA Systems, Inc.
Mr. Wallace's recent work has focused on leading the development of high-scale semantic wiki systems using RDF/OWL, SPARQL, GATE, Hadoop, and Accumulo for DoD customers in the armed services.