Splunk

A few weeks ago, I attended some training on Splunk, a product that is new to my company. After the training was over, I spent some time reflecting on the product's strengths and how it can be leveraged in our business.

Splunk is great at working with unstructured data. The most common source of this data is log files, which are forwarded to the Splunk system. Splunk indexes the log files' contents to facilitate searching that data. Over the course of my career, I've written a number of parsers to obtain information from a log file. I've used Perl, and of course awk/sed/grep, all in an effort to glean information from the log. Each time, I had to determine column positions in a line, or delimiters in the file, that let me separate the meaningful data from the chaff. Splunk does all of this work for you, nice and easy.
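To illustrate the kind of hand-rolled parsing Splunk saves you from, here is a minimal Python sketch. The log format and field names are invented for the example; the point is that you must know the line layout up front, which is exactly the work Splunk's indexing does for you.

```python
import re

# Hypothetical log format, e.g.:
#   2024-05-01 12:30:45 ERROR payment: card declined
# Every field position has to be spelled out by hand in the pattern.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) "
    r"(?P<module>\w+): "
    r"(?P<message>.*)"
)

def parse_line(line):
    """Split one log line into named fields; return None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

sample = "2024-05-01 12:30:45 ERROR payment: card declined"
fields = parse_line(sample)
print(fields["level"], fields["module"])  # prints: ERROR payment
```

Change the log format even slightly (a new delimiter, a shifted column) and the pattern breaks, which is why each new log source used to mean a new parser.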

As data is ingested into Splunk, the Splunk engine indexes portions of the data. Splunk works with many of the common logs you may find in your IT organization: Apache web server logs, Windows Event Viewer logs, Unix/Linux syslogs. We all have these types of logs in our environments. Splunk indexes these logs around common tags that facilitate searching. The first thing indexed is the timestamp of the log entry, and not surprisingly, time-range searching is very common in Splunk. If you spot items that could serve as potential tags in the data stream, or if Splunk is not familiar with your specific log file, you can create your own tags and Splunk will index on them for you.

Splunk calls it "search," but database professionals are used to the term "query"; the two are essentially the same. Splunk's query language is not the SQL most DBAs are familiar with, but a DBA will have little trouble adapting to it. In our Splunk training class, the first part of the instruction on searching was spent learning what a DBA would call the WHERE clause. If you are familiar with SQL, you know that the WHERE clause limits the rows returned in the result set. Splunk is very similar: you construct search criteria to limit the events returned. The next thing the DBA will most likely focus on is limiting the columns of the result set. Splunk lets you do something similar, and even has an option to convert the data into a table format. And just as a DBA would leverage views or stored procedures to provide a level of abstraction over the data, Splunk allows you to save your searches so they can be reused in reports or dashboards.
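The WHERE-then-SELECT mapping can be sketched in plain Python. This is an analogy, not actual Splunk search syntax: the events, field names, and values below are all invented for the example.

```python
# Toy "events" as dictionaries, standing in for indexed log entries.
events = [
    {"_time": "2024-05-01T12:00", "status": 200, "host": "web01"},
    {"_time": "2024-05-01T12:01", "status": 500, "host": "web02"},
    {"_time": "2024-05-01T12:02", "status": 500, "host": "web01"},
]

# WHERE-like step: search criteria limit which events come back.
errors = [e for e in events if e["status"] == 500]

# SELECT-like step: keep only the fields of interest, roughly what
# Splunk's table option does with a saved search's results.
table = [{"_time": e["_time"], "host": e["host"]} for e in errors]

for row in table:
    print(row)
```

Saving the search then plays the role a view or stored procedure plays in a database: the filtering and projection logic lives in one named place that reports and dashboards can reuse.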

As I stated earlier, Splunk is very good at working with unstructured data. Where I work, we have a table in our production database that captures application errors. When the application hits an error, it writes information into this table, such as the date/time of the error and the module of the application being executed. But the most important piece of information in this table is a CLOB column called the "developer message". The developer message is the full Oracle error stack, which we do not expose to the end user but which is very useful to IT staff in diagnosing errors. Because this message sits in a CLOB column, it is difficult to search with SQL statements. This is where Splunk excels. In the near future, I will be setting up a routine in Splunk to ingest this error log table so that we can perform quick searches of the unstructured data in the developer message column.
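As a sketch of why free-text search beats SQL here, consider pulling every Oracle error code out of one developer message. The ORA-NNNNN pattern is Oracle's real error-code format, but the sample message and package name below are invented.

```python
import re

# A made-up developer message of the kind that might sit in the CLOB
# column: the full Oracle error stack for one application error.
developer_message = """ORA-01555: snapshot too old: rollback segment number 9
ORA-06512: at "APP.PKG_ORDERS", line 214"""

# The kind of search Splunk makes trivial over unstructured text:
# find every Oracle error code anywhere in the message.
ora_codes = re.findall(r"ORA-\d{5}", developer_message)
print(ora_codes)  # prints: ['ORA-01555', 'ORA-06512']
```

Doing the equivalent across thousands of CLOB rows in SQL means LIKE scans or adding a text index; once the table is ingested into Splunk, this is just an ordinary search.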

Splunk can query databases directly. During the training, I started thinking of other things we could extract from our production database and search in Splunk. I had to stop myself, because much of what I was thinking of is no different from issuing SQL against the database. If the data is structured in rows and columns in a database table, use the database engine to query it; that is what it is very good at. But the paragraph above describes one column of a table that contains unstructured data. If we move that into Splunk, we can use that tool to do what it is very good at doing.

That being said, there are still very good reasons to have Splunk query a database system directly for structured data. One of our uses is to mine logs for security-related issues. We have already set up a security dashboard that our auditors can use to gain information, and we also need to add information already contained in our databases to this dashboard. In the past, we've written many reports and used many different reporting tools to present data in a meaningful way; we never needed Splunk for that. What is different in this case is that we need a lot of information from the unstructured data source, hence leveraging Splunk, and we also want data stored in a structured layout on the same report. So we will pull this information from the database so that all the data, structured and unstructured, is contained in one report, in Splunk.
Towards the end of my training, I kept thinking more and more that Splunk is really just a database engine, albeit a highly specialized one. It won't replace Oracle or SQL Server, because Splunk has no concept of transaction control. Nonetheless, at its essence, Splunk is a database engine. Just like Oracle and SQL Server, data is added to the system, and indexes are created to facilitate fast searching of that data. Splunk's "report acceleration" is akin to table partitioning: reports run faster if the report's time period falls within the date range of a partition (Splunk uses the term "bucket"). Splunk's reporting engine is not that different from SQL Server Reporting Services, part of a traditional SQL Server installation. I keep finding parallels between Splunk and traditional database engines. Whereas Oracle and SQL Server are general-purpose databases capable of handling many different data-serving needs, Splunk is specialized, with the ability to easily handle unstructured data.
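The bucket/partition analogy can be sketched as pruning: a time-range search only needs to open the buckets whose date range overlaps the query, just as a partitioned table skips partitions outside the predicate. The bucket structure below is a simplification invented for the example, not Splunk's actual on-disk format.

```python
from datetime import date

# Each "bucket" covers a date range, like a table partition.
buckets = [
    {"start": date(2024, 1, 1), "end": date(2024, 1, 31), "events": 10_000},
    {"start": date(2024, 2, 1), "end": date(2024, 2, 29), "events": 12_000},
    {"start": date(2024, 3, 1), "end": date(2024, 3, 31), "events": 9_000},
]

def buckets_to_scan(query_start, query_end):
    """Keep only buckets whose date range overlaps the query's range."""
    return [b for b in buckets
            if b["start"] <= query_end and b["end"] >= query_start]

# A February-only report touches one bucket instead of all three.
hits = buckets_to_scan(date(2024, 2, 10), date(2024, 2, 20))
print(len(hits))  # prints: 1
```

This is why a report whose time period sits inside one bucket's date range runs faster: most of the data never has to be read at all.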

I do like this product very much, and I am excited to learn more about it. I think Splunk has a good future in our organization.