24 May: How Data Ages in Splunk
In the last blog, we covered the basics of getting started with Splunk and briefly touched on how Splunk stores and works with data. If you need a refresher on the basics, read here. In this blog, we'll try to simplify how data ages in Splunk and how you can carefully plan your data aging policy and retention schedule. It's also essential to know where to find your data, especially if you haven't worked with it for a while.
To start with the basics…
When you add data to Splunk, it enters the indexer, which processes it through a pipeline. Event processing occurs here, and the processed data is written to disk. There are two main stages to event processing: parsing and indexing. During parsing, the indexer breaks the incoming data into events, applies configuration, and identifies timestamps and event boundaries. Indexing then processes the data further by breaking all events into segments, building the index data structures, and compressing the data after indexing.
How does Splunk store data?
Data is ultimately stored in an index, and within an index the data is organized into directories called buckets. Another term used to describe buckets is database or "db". Each index occupies its own directory under $SPLUNK_HOME/var/lib/splunk, and the directory name is the same as the index name. The index directory contains a series of subdirectories that categorize buckets by their state (described below); the buckets themselves are subdirectories within these directories.
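As a rough sketch, the layout of the default index looks like the tree below. The bucket names shown are illustrative; actual subdirectory names contain IDs and epoch timestamps that vary per deployment.

```
$SPLUNK_HOME/var/lib/splunk/defaultdb/
├── db/          # hot and warm buckets (states described below)
│   ├── hot_v1_0/                    # a hot bucket, actively written to
│   └── db_1684886400_1684800000_1/  # a warm bucket
├── colddb/      # cold buckets
└── thaweddb/    # thawed (restored from archive) buckets
```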
A bucket moves through the following stages as it ages:
When data is indexed, it goes into a hot bucket, where it is both searchable and actively written to. The default location is $SPLUNK_HOME/var/lib/splunk/defaultdb/db/*. An index can have multiple hot buckets open at once, each in its own subdirectory.
When a hot bucket reaches a size limit or the indexer is restarted, it rolls to become a warm bucket. The default location for warm buckets is also $SPLUNK_HOME/var/lib/splunk/defaultdb/db/*. Several other conditions can also cause a hot bucket to roll to warm. Warm buckets are still searchable but can no longer be written to. Each warm bucket has its own subdirectory.
Similarly, when further conditions are met, a warm bucket rolls to become a cold bucket, which remains searchable. The default location for cold buckets is $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/*. Splunk selects the oldest warm bucket to roll to cold and does not rename it. Each cold bucket has its own subdirectory.
Cold buckets eventually roll to frozen, typically after a set period of time, and are then either archived or deleted. Frozen buckets have no default location; if you archive them, you specify the directory yourself. Deletion is the default behavior, and we will cover how to archive indexed data in future blogs. Frozen buckets are not searchable.
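The conditions that drive these transitions are controlled per index in indexes.conf. The sketch below uses real indexes.conf attribute names, but the stanza name "my_index", the paths, and the values are illustrative assumptions, not recommendations:

```ini
# indexes.conf -- illustrative values for a hypothetical index "my_index"
[my_index]
homePath   = $SPLUNK_DB/my_index/db         # hot and warm buckets
coldPath   = $SPLUNK_DB/my_index/colddb     # cold buckets
thawedPath = $SPLUNK_DB/my_index/thaweddb   # thawed buckets

maxDataSize = auto                # size at which a hot bucket rolls to warm
maxWarmDBCount = 300              # beyond this, the oldest warm bucket rolls to cold
frozenTimePeriodInSecs = 7776000  # buckets freeze after 90 days (90 * 86400)
# coldToFrozenDir = /archive/my_index  # archive frozen buckets here instead of deleting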
Thawing archived data
If the frozen data has been archived, it can be returned to the index by thawing it. The default location for data that has been archived and later thawed is $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/*. We will cover how to restore archived data to a thawed state in future blogs. Thawed data is available for searches.
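As a preview of the process (covered properly in a future blog), one common approach is to copy an archived bucket into the thaweddb directory and rebuild it. The bucket name and archive path below are placeholders, not real defaults:

```
# Illustrative only: bucket name and archive path are placeholders
cp -r /archive/defaultdb/db_1549226400_1549201200_1 \
      $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/

# Rebuild the bucket's index files so it becomes searchable again
splunk rebuild $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/db_1549226400_1549201200_1
```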
The default directory structure of the index (defaultdb) for the hot/warm, cold, and thawed directories can be changed in indexes.conf, and all index locations must be writable.
By default, the indexer in Splunk Enterprise moves indexed data through these states and removes old data from your system after a long period of time. This default behavior may work for you. Alternatively, you can specify a customized bucket aging policy by editing attributes in indexes.conf. If you are a Splunk administrator, it's beneficial to understand how the indexer stores indexes across buckets and how Splunk Enterprise ages data. This blog was an attempt to give you an overview of just that.
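One small gotcha when editing retention attributes: frozenTimePeriodInSecs is expressed in seconds, not days. A tiny helper like the one below (hypothetical, not part of Splunk) can make drafting the value less error-prone:

```python
def retention_secs(days: int) -> int:
    """Convert a retention period in days to the seconds value
    expected by indexes.conf's frozenTimePeriodInSecs attribute."""
    return days * 86400  # 86400 seconds per day

# For example, a 90-day retention policy:
print(f"frozenTimePeriodInSecs = {retention_secs(90)}")  # frozenTimePeriodInSecs = 7776000
```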
Cyber Chasse has an intensive training program for Splunk Enterprise. For more information and to find out about our course details, visit Cyber Chasse.