
Elasticsearch 1.0 Takes Realtime Search To The Next Level

By Jennifer Zaino  /  February 12, 2014

Elasticsearch 1.0 launches today, combining Elasticsearch realtime search and analytics with Logstash (which collects logs and other event data from your systems and stores them in a central place) and Kibana (for graphing and analyzing logs) in an end-to-end stack designed to be a complete platform for data interaction. It is the first major update of the solution, which delivers actionable insights in real time from almost any type of structured or unstructured data source. The release follows on the heels of Elasticsearch Marvel, a commercial monitoring solution that gives users insight into the health of Elasticsearch clusters.

Organizations from Wikimedia to Netflix to Facebook take advantage of Elasticsearch today. VP of engineering Kevin Kluge says it has been distinguished, since its open-source start four years ago, by its focus on distributed realtime search. The native JSON and RESTful search tool “has intelligence where when it gets a new field that it hasn’t seen before, it discerns from the content of the field what type of data it is,” he explains. Users can define schemas if they want, or work more freeform, quickly adding new styles of data while still benefiting from easier management and administration, he says.
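The field-type detection Kluge describes can be pictured with a small sketch. This is a toy heuristic written for illustration only, not Elasticsearch's actual detection rules; the function names, the type labels, and the sample field names (`host`, `status`, `latency_ms`, and so on) are all assumptions.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a field type from a JSON value (toy heuristic, not Elasticsearch's real rules)."""
    # bool is checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        # Assumption: only plain YYYY-MM-DD strings are treated as dates here
        try:
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "string"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def update_mapping(mapping, document):
    """Record a type for every field not seen before; known fields keep their type."""
    for field, value in document.items():
        if field not in mapping:
            mapping[field] = guess_field_type(value)
    return mapping


mapping = {}
update_mapping(mapping, {"host": "web-01", "status": 500, "timestamp": "2014-02-12"})
# The second document introduces two new fields without any schema change
update_mapping(mapping, {"host": "web-02", "latency_ms": 12.5, "retry": True})
print(mapping)
```

The point of the sketch is that the second document adds fields no one declared up front, and the mapping simply grows to cover them.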

Models also exist for using JSON-LD to represent RDF in a manner that can be indexed by Elasticsearch. The BBC World Service Archive prototype, in fact, uses an index based on Elasticsearch and constructed from the RDF data held in a central triple store to make sure its search engine and aggregation pages are quick enough.

Elasticsearch 1.0 has been enhanced in a number of respects, including better overall scalability and a new federated analytics capability. That feature, Tribe Node, “can make a single query that spans across all the different Elasticsearch clusters in an organization that you have permission to access,” Kluge says. Before this feature, users had to go to each Elasticsearch deployment, issue the query, and combine the results themselves. The 1.0 GA release also brings to the fore an aggregations feature that combines specific queries for more complex analysis. As an example, Kluge says, users can set up a query that looks at all server log events in Elasticsearch that reference a particular error string, broken down by fairly complex nesting hierarchies and data groupings.

“So as fast as I can type I can very quickly know how severe the problem is and where is it occurring, and then quickly pinpoint the server or country where I as an administrator need to investigate and figure out what is going on,” he says.
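To make Kluge's log example more concrete, here is a minimal sketch of what such a request body might look like, built as a Python dictionary and printed as JSON. It follows the general shape of the Elasticsearch search request (a `query` plus nested `aggs`) as I understand it; the field names `message`, `country`, and `host`, the error string, and the helper name are hypothetical, and nothing is sent to a cluster.

```python
import json


def build_error_breakdown(error_text):
    """Build a search body: match an error string, then group hits by country and server."""
    return {
        # Only log events whose (hypothetical) "message" field contains the phrase
        "query": {"match_phrase": {"message": error_text}},
        # Return no individual hits, only the aggregation buckets
        "size": 0,
        "aggs": {
            "by_country": {
                "terms": {"field": "country"},
                # Nested aggregation: within each country, break down by server
                "aggs": {
                    "by_server": {"terms": {"field": "host"}}
                },
            }
        },
    }


body = build_error_breakdown("connection refused")
print(json.dumps(body, indent=2))
```

Nesting `by_server` inside `by_country` is what gives the drill-down described in the quote: first where the error occurs, then which machine to investigate.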

Also massively improved in terms of scalability is the search-in-reverse feature, dubbed Distributed Percolation, which lets users know when data they are interested in is added to their system. “The feature lets you register queries with Elasticsearch so that when a new document is added that matches the query, you are effectively notified,” he says. That comes in handy particularly for media sites that want to keep their users connected with new content. In fact, says CEO Steven Schuurman, “The Guardian has been very explicit about the fact that Elasticsearch has transformed their business with features like Percolation.”
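The "search in reverse" idea can be illustrated with a toy, in-memory sketch: queries are stored first, and each incoming document is checked against them. This is a conceptual illustration only, not the Elasticsearch percolator or its API; the class name, the query IDs, and the rule that a query matches when all of its terms appear in the document are assumptions.

```python
class ToyPercolator:
    """Stores queries up front, then reports which ones a new document matches."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        # A registered query here is just a set of lowercase terms that must all appear
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document_text):
        # Split the document into lowercase words and test every stored query
        words = set(document_text.lower().split())
        return sorted(
            query_id
            for query_id, terms in self.queries.items()
            if terms <= words  # subset test: every required term is present
        )


percolator = ToyPercolator()
percolator.register("football-fans", ["football"])
percolator.register("uk-politics", ["parliament", "vote"])

# A newly published article arrives; find out whose saved query it satisfies
matches = percolator.percolate("Parliament schedules a vote on the budget")
print(matches)
```

In a media setting, each returned ID would identify a reader or alert to notify about the new article.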

Kluge notes that application developers do have the capability to build in advance semantic matches to ensure that appropriate content is always directed to users – for instance, that documents that reference “pigskin game” surface for users interested in football. Elasticsearch’s own potential journey towards becoming a more semantic platform is still in the future, though. “We certainly are interested in the semantic web and how it applies to search but any work we’d do this year in that area would be more of a research effort,” he says.
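One simple way an application developer might prepare such matches in advance is a synonym table applied when content is indexed. The sketch below is a toy illustration under that assumption; the article does not say which mechanism Kluge has in mind, and the synonym table, function names, and sample sentences are hypothetical.

```python
# Hypothetical synonym table prepared by the application developer
SYNONYMS = {"pigskin game": "football"}


def expand_terms(text, synonyms=SYNONYMS):
    """Return the document's words plus the canonical term for any known phrase it contains."""
    lowered = text.lower()
    terms = set(lowered.split())
    for phrase, canonical in synonyms.items():
        # Simple substring check; a real analyzer would work on token positions
        if phrase in lowered:
            terms.add(canonical)
    return terms


def matches_interest(text, interest):
    """True when the expanded document terms include the user's interest."""
    return interest.lower() in expand_terms(text)


doc = "Fans braved the cold for Sunday's pigskin game"
print(matches_interest(doc, "football"))
```

The document never uses the word "football", yet it still surfaces for that interest because the phrase was mapped ahead of time.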

“There are other great analytics tools and very good search solutions and data stores to store petabytes of data, but the fact Elasticsearch stitches all the needs together in one stack and still makes the whole thing very digestible to non-data scientists” makes it stand out, Schuurman says. “Before they know it they are doing amazing things with big data.”

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
