Splunk is developing a platform for the world of Big Data that can index and fuse data from different sources. The company, which has been operating in Israel for several years through integrators such as EMET Computing, is currently preparing to establish a local sales office. Splunk was founded in 2003 and went public in 2012, and its annual turnover currently amounts to billions of dollars.
"Our uniqueness is in the ability to manage data coming in from computer devices and convert this data into operational knowledge for the client," explains Dimitris Vergos, Partner SE Manager Emerging Markets (Middle East, Africa & Eastern Europe). "Computing systems, production lines, SCADA systems, network devices or any device capable of transmitting data, our solution can store, index, normalize data and enable running of data fusion algorithms."
Schema on the Fly
One of the primary advantages of Splunk's platform is the ability to build a schema "on the fly." A database schema is the framework that represents the logical structure of the entire database: it determines how the data is organized and how its elements relate to one another. Splunk's platform employs two primary technologies: MapReduce (a programming model that runs many processes in parallel) and a patent-protected indexing capability. MapReduce may operate in conjunction with Hadoop, which splits the data into multiple chunks. Processing these chunks in parallel enables fast searches and scaling according to demand. For this reason, Splunk's solution can handle data volumes ranging from megabytes to petabytes.
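The map-and-reduce pattern described above can be illustrated with a toy sketch in Python. The log lines, hosts, and field layout are invented for the example; this shows only the two-phase pattern, not Splunk's implementation:

```python
from collections import defaultdict
from functools import reduce

# Toy log lines; a real deployment would read these from distributed storage,
# with the map phase running in parallel across many machines.
logs = [
    "host-a ERROR disk full",
    "host-b INFO heartbeat",
    "host-a ERROR disk full",
    "host-b ERROR timeout",
]

def map_phase(line):
    """Map step: emit a (key, 1) pair per log line, keyed by host and level."""
    host, level, *_ = line.split()
    return ((host, level), 1)

def reduce_phase(acc, pair):
    """Reduce step: sum the counts emitted for each key."""
    key, count = pair
    acc[key] += count
    return acc

counts = reduce(reduce_phase, map(map_phase, logs), defaultdict(int))
print(dict(counts))
# {('host-a', 'ERROR'): 2, ('host-b', 'INFO'): 1, ('host-b', 'ERROR'): 1}
```

Because each map call is independent, the map phase can be spread across as many workers as the data volume demands, which is what allows the same pattern to scale from megabytes to petabytes.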
"We collect data from a variety of sources using Web Services, API or any other method. The data is stored in various databases such as MongoDB or Elastic Databases, and the magic starts when a search request is made. Until that moment, the data collected is kept in the databases unchanged. When a search request is made, according to the question being asked, a dedicated schema will be built for that question. In this way, the answers are provided almost in real time," says Vergos.
The option of generating a new schema for each search is valuable in scenarios where rapid changes of direction are required while searching through the data. Such capabilities are needed, among other places, in military settings. The ability to store substantial volumes of data from diverse sources, index the data and perform high-speed searches can convert raw data into knowledge very quickly. This applies to tactical scenarios where response intervals are extremely short. Indeed, the Israeli military (IDF) is one of Splunk's major clients in Israel. IDF sources did not volunteer information as to where the platform is used, but said that "the IDF maintains an extensive range of agreements with suppliers in the field of data extraction, including the Splunk Company, for the benefit of improving the operational effectiveness of the IDF."
One can only imagine the nature of the scenarios in the military environment: from collecting structured and unstructured data using various collection devices, through storing the data in databases of different types across the military's systems, to pooling the data into a single, central repository so that algorithms can run on it. A central platform capable of storing image, video, text and audio files, indexing them and letting pattern-finding algorithms work on a substantial volume of data would provide the military with a significant data fusion capability. If this is accomplished in near-real time, so is the fusion. Such capabilities are required, for example, in combined surveillance-and-fire systems, or in combined arms systems such as TZAYAD (Digital Land Army) or Fire Weaver, which significantly improve the operational effectiveness of the military.
Looking for Irregular Behavior
Vergos explains that Splunk's platform originally served, and continues to serve, the worlds of cybersecurity and IT. In cybersecurity, the system collects data from the web layer, the end devices, the servers and any other element that produces data, including log files. Combined with built-in algorithms, the platform presents a status picture of the organization, and the ability to cross-reference data and establish patterns helps identify irregular behavior. In the IT world, the same capabilities are used for predictive maintenance of computer systems: identifying degraded hard-disk performance, overheating devices, conflicting operation of devices, and so on. The spread of computer systems into industry makes Splunk's platform relevant for predictive maintenance in industrial plants and critical infrastructure as well.
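A minimal sketch of how irregular behavior might be flagged from event counts follows. The data, the baseline-versus-current framing, and the three-sigma threshold are illustrative assumptions, not Splunk's algorithm:

```python
import statistics

# Hypothetical hourly login-failure counts for one user; the current hour is
# judged against that user's own recent history.
history = [2, 3, 1, 2, 4, 2, 3, 2]
current = 40

mean = statistics.mean(history)
stdev = statistics.stdev(history)

# z-score: how many standard deviations the current hour sits above baseline.
z = (current - mean) / stdev

# Flag the hour as irregular when it deviates strongly from the baseline.
THRESHOLD = 3.0
is_irregular = z > THRESHOLD
print(f"z-score={z:.1f}, irregular={is_irregular}")
```

Real deployments layer many such baselines (per user, per host, per service) and more sophisticated models, but the core idea is the same: learn what normal looks like, then surface deviations.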
In Israel, Splunk works with some 200 clients. Vergos explains that the pricing model is based on the actual volume of data used, making it suitable for small enterprises as well as major clients. "We seek clients who generate a lot of data, massive volumes," Vergos explains. "They do not necessarily have to be major clients, but clients capable of utilizing the system's capabilities and using it to gain added business value."
As Vergos is responsible for the Middle East region, he can elaborate on the relevant trends in neighboring countries. According to him, industry throughout the Middle East is becoming more computerized and businesses are generating more data. One of the fields where Splunk plays a major role is oil production, which requires precise maintenance because parts must be ordered months in advance, and Splunk's platform supports a predictive maintenance application. "If an oil drilling business can order parts in time, before a failure has occurred, it will save itself a lot of money."
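A predictive maintenance check of this kind can be sketched as a simple trend extrapolation. The readings, failure threshold, and lead time below are invented for illustration; real systems use richer models and many more signals:

```python
# Hypothetical daily vibration readings from a pump; values creep upward as a
# bearing wears. Fit a straight-line trend and project when the reading will
# cross the failure level, so the part can be ordered before the failure.
readings = [1.0, 1.1, 1.25, 1.3, 1.45, 1.5]   # one reading per day
FAILURE_LEVEL = 3.0
LEAD_TIME_DAYS = 60    # parts must be ordered this long in advance

n = len(readings)
xs = range(n)
x_mean = sum(xs) / n
y_mean = sum(readings) / n

# Least-squares slope of the reading-vs-day trend line.
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, readings)) / \
        sum((x - x_mean) ** 2 for x in xs)

# Days from the latest reading until the trend reaches the failure level.
days_to_failure = (FAILURE_LEVEL - readings[-1]) / slope
order_now = days_to_failure <= LEAD_TIME_DAYS
print(f"days to projected failure: {days_to_failure:.0f}, order now: {order_now}")
```

The economics Vergos describes follow directly: if the projected failure falls within the parts lead time, ordering now avoids unplanned downtime.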
Aiming for the IoT World
Another domain Splunk is targeting is IoT. The scenario envisaged for this world includes a large number of sensors generating a massive volume of data, with a data fusion platform as a primary element. Here, too, a platform capable of fusing data within a short time and providing insights to the network operator or the user will have an advantage. Predictive maintenance, identifying and responding to irregular events, and fusing data to analyze trends over time are all expected to offer economic or operational added value to the client, and that is what Splunk is aiming for.
One of the questions asked during the interview with Vergos was what clients had done before Splunk. How did they fuse data from different sources, in different formats? Well, clients created, and still create, "in-house" solutions. "If you have multiple databases, for example, you have to access each one, retrieve the data, normalize it and perform analyses on it at some central location," explains Vergos. "The problem starts when the person or people who worked on the project are no longer with the company. How do you maintain the operational continuity of the data management setup you built? In some cases, there is no documentation of the source code. In other cases, there are no professionals within the organization qualified to continue the project. Splunk is an off-the-shelf product. Its implementation in the organization is fully documented, and personnel can be trained to operate it."
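The in-house normalization Vergos describes might look roughly like this. The two sources, their field names, and their timestamp formats are hypothetical, chosen only to show the retrieve-normalize-analyze pattern:

```python
from datetime import datetime

# Hypothetical records from two in-house sources with different field names
# and timestamp formats; the fusion step maps both into one common shape.
crm_rows = [{"cust": "alice", "ts": "2023-05-01 10:00:00", "amount": "120"}]
billing_rows = [{"customer_id": "alice", "time": "01/05/2023 10:05", "total": 80}]

def normalize_crm(row):
    """Map a CRM record into the common shape."""
    return {
        "customer": row["cust"],
        "when": datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S"),
        "amount": float(row["amount"]),
    }

def normalize_billing(row):
    """Map a billing record (different names, day-first dates) into the same shape."""
    return {
        "customer": row["customer_id"],
        "when": datetime.strptime(row["time"], "%d/%m/%Y %H:%M"),
        "amount": float(row["total"]),
    }

unified = [normalize_crm(r) for r in crm_rows] + \
          [normalize_billing(r) for r in billing_rows]

# Once normalized, a single analysis can run across both sources.
total_per_customer = {}
for rec in unified:
    total_per_customer[rec["customer"]] = \
        total_per_customer.get(rec["customer"], 0) + rec["amount"]
print(total_per_customer)  # {'alice': 200.0}
```

Every new source means another hand-written mapper, and every departing developer means undocumented mappers; that maintenance burden is exactly the gap Vergos says an off-the-shelf product fills.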
Without a doubt, the world of Big Data is challenging. The amounts of data organizations generate, in all sectors, are on the rise. Data has become the basis for economic effectiveness, but it presents a major difficulty: gleaning insights from it in time. Splunk enters this niche with an off-the-shelf product intended to replace local solutions developed in-house for this purpose. "For more and more companies, data is the basis for profit," explains Vergos. "In such a reality, the organization's business advantage rests on the ability to glean insights from the data, fast. This mechanism provides a competitive advantage for organizations in the data era."