Kamran Khan Keynote: A Big Data Architecture for Search #KMWorld

KMWorld 2013. Kamran Khan is CEO of Search Technologies.

[These are my notes from the KMWorld 2013 Conference. Since I’m publishing them as soon as possible after the end of a session, they may contain the occasional typographical or grammatical error. Please excuse those. To the extent I’ve made any editorial comments, I’ve shown those in brackets.]

Session Description:  Search engines, distributed processing and content processing pipelines are not new. However, the enabling technologies of mature search engines, powerful content processing pipelines and cheap distributed processing are coming together to empower a next generation of information access, analysis and presentation much closer to the holy grails of knowledge management. Hear from the founder of Search Technologies how modern search engines are currently being combined with powerful independent content processing pipelines and the distributed processing technologies of big data to form new and exciting enterprise search architectures, delivering results that in the past were available only to the biggest companies with the deepest pockets.


  •  What do his customers do with Big Data? (1) Organize content for internal efficiency. (2) Organize content to create new products and services for external customers.
  • What was the origin of the modern concept of Big Data? Organizing large amounts of data is not new. What gave birth to the modern concept of Big Data is the desire to do something useful with the enormous log files that are produced by modern computing. 19% of companies that analyze Big Data are focused on log files.
  • Traditional Search Architecture: Identify the various repositories and create connectors from those repositories to the search engine. Typically, when you make a change in the search application, you have to go back and re-index all of the underlying content.
  • New Enterprise Search Architecture: Using Hadoop, his company extracts all the content from the repositories and then does very sophisticated analysis on the entire collection. Because Hadoop holds a staged copy of the content, they never have to re-index the entire underlying content source when they change the search application. They simply re-index the relevant content.
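The difference between the two architectures can be sketched in a few lines of Python. This is a minimal illustration, not Search Technologies' actual implementation; the repository names, the `staging_store` variable, and the `max_tokens` configuration knob are all hypothetical, standing in for a real crawl, content-processing pipeline, and search configuration.

```python
def crawl(repositories):
    """Simulate pulling raw documents from source repositories (the expensive step)."""
    return [{"id": i, "text": f"document {i} from {repo}"}
            for repo in repositories for i in range(2)]

def process(doc, config):
    """Content-processing step whose behavior depends on the search application config."""
    return {"id": doc["id"], "tokens": doc["text"].split()[: config["max_tokens"]]}

# Traditional architecture: a change to the search application
# forces a full re-crawl of every source repository.
def traditional_reindex(repositories, config):
    return [process(d, config) for d in crawl(repositories)]

# Hadoop-style architecture: extract once into a staging store,
# then re-index from the staged copy whenever the application changes.
staging_store = crawl(["repoA", "repoB"])   # one-time (or incremental) extraction

def staged_reindex(config):
    return [process(d, config) for d in staging_store]

index_v1 = staged_reindex({"max_tokens": 3})
index_v2 = staged_reindex({"max_tokens": 2})  # config change: no re-crawl needed
```

The key design point is that `staging_store` decouples extraction from indexing: the slow connector work happens once, and subsequent changes to the search application only touch the cheap processing step.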
