Backing up your data is common sense. But before you decide how to back up your indexed data, you should understand how the indexer stores data in Splunk and how data ages once it has been indexed. We covered this topic in a previous blog. In this blog, we’ll cover how you can back up indexed data in Splunk Enterprise.
As a starting point, here is a general rule for backing up data at each bucket stage:
- Hot buckets are currently being written to -> Do not back up
- Warm buckets are rolled from hot -> Can be safely backed up
- Cold buckets are rolled from warm -> Move to another location
- Frozen buckets are deleted by the indexer -> You can archive the contents prior to deletion.
We discussed the directory structure of the default index (defaultdb) for each bucket type in our previous blog; refer to it if you need a refresher. Note that all index locations must be writable. Because the paths for the hot/warm and cold directories are configured separately, you can also store cold buckets in a different location from hot/warm buckets.
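To make the rule of thumb above concrete, here is a minimal Python sketch that maps a bucket directory to its stage and the matching backup action. It assumes the default layout, where hot and warm buckets live under db and cold buckets under colddb, and the bucket naming convention from Splunk's documentation (hot_v1_ prefix for hot buckets, db_ or rb_ for warm and cold). The function name and the example paths are ours, and the directory names will differ if you have configured custom paths.

```python
from pathlib import PurePosixPath

# Backup action per bucket stage, taken from the general rule above.
ACTION_BY_STAGE = {
    "hot": "do not back up",
    "warm": "safe to back up",
    "cold": "move to another location",
}


def bucket_stage(path):
    """Guess a bucket's stage from its directory name and parent directory.

    Assumption: default layout (<index>/db for hot and warm buckets,
    <index>/colddb for cold buckets). Returns None for anything that
    does not look like a bucket directory.
    """
    p = PurePosixPath(path)
    name, parent = p.name, p.parent.name
    if name.startswith("hot_"):
        return "hot"
    if name.startswith(("db_", "rb_")):
        if parent == "db":
            return "warm"
        if parent == "colddb":
            return "cold"
    return None


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    example = "/opt/splunk/var/lib/splunk/defaultdb/db/hot_v1_7"
    stage = bucket_stage(example)
    print(stage, "->", ACTION_BY_STAGE[stage])
```

Frozen buckets are left out on purpose: by the time a bucket is frozen, the indexer is about to delete it, so archiving has to be handled before that point.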
Selecting your backup strategy
When choosing your backup strategy, there are two basic approaches to keep in mind: ongoing, incremental backups of warm data, and a backup of all data. How you back up indexed data in Splunk Enterprise will depend on the tools and procedures available in your organization, but the two basic strategies below can serve as a guideline.
Ongoing, incremental backups of warm data
As a general rule, schedule backups of warm buckets on a regular basis. If your buckets roll in quick succession, it’s a good idea to include the cold directories in your backup as well. This ensures you don’t miss buckets that roll from warm to cold before they have been backed up. A bucket’s directory name does not change when it rolls from warm to cold, so you can filter by name to skip the buckets you have already backed up.
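The incremental approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production backup tool: it assumes the default db and colddb directory names, assumes warm and cold bucket directories start with db_, and uses a function name and a .partial suffix of our own choosing. It copies only the buckets whose names are not already present in the backup location, and it never touches hot buckets.

```python
import shutil
from pathlib import Path


def backup_new_buckets(index_dir, backup_dir, subdirs=("db", "colddb")):
    """Copy warm/cold bucket directories that are not yet in backup_dir.

    Assumption: warm and cold buckets are directories named db_*, and the
    name stays the same when a bucket rolls from warm to cold.
    Returns the names of the buckets copied during this run.
    """
    index_dir, backup_dir = Path(index_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for sub in subdirs:
        src_root = index_dir / sub
        if not src_root.is_dir():
            continue
        for bucket in sorted(src_root.iterdir()):
            # Skip hot buckets and anything that is not a bucket directory.
            if not bucket.is_dir() or not bucket.name.startswith("db_"):
                continue
            target = backup_dir / bucket.name
            if target.exists():
                continue  # already backed up under the same name
            # Copy to a temporary name first so that an interrupted run
            # never leaves a half-copied bucket that looks complete.
            partial = backup_dir / (bucket.name + ".partial")
            if partial.exists():
                shutil.rmtree(partial)
            shutil.copytree(bucket, partial)
            partial.rename(target)
            copied.append(bucket.name)
    return copied
```

Run on a schedule, a script like this picks up each bucket once, whether it is still warm or has already rolled to cold by the time the job runs.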
Back up all data
It’s good practice to back up all your data in hot, warm, and cold buckets before a major change such as an upgrade, and there are several ways you can do this. Depending on how much time you have and the size of your data, use the guidelines below:
- For small amounts of data, you can shut down the indexer and make a copy of your index directories before carrying out the upgrade.
- For large amounts of data, you will want to snapshot your hot buckets prior to the upgrade. More about snapshots below.
Either way, if you have been performing incremental backups of your warm buckets as they roll from hot, you should only need to back up your hot buckets at this time.
Back up hot buckets
To back up hot buckets, you will need to take a snapshot of the files with a tool like VSS (on Windows/NTFS), ZFS snapshots (on ZFS), or any other snapshot tool available to you. If you have none, you can manually roll a hot bucket to warm and then back it up. To do this, use the following CLI command, replacing <index_name> with the name of the index you want to roll:
splunk _internal call /data/indexes/<index_name>/roll-hot-buckets -auth <admin_username>:<admin_password>
Note: Larger buckets are more efficient to search. However, when you manually roll hot buckets, you are prematurely rolling buckets and producing smaller and less efficient buckets. Hence, rolling hot buckets manually is not suggested and a snapshot backup is recommended instead.
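If you do need to roll hot buckets as part of a scripted backup, the command above can be wrapped in a small Python helper. The sketch below only assembles and runs the same CLI call shown in this section; the function names and the splunk_bin parameter are our own, and it assumes the splunk binary is on your PATH (otherwise pass its full path).

```python
import subprocess


def roll_hot_buckets_cmd(index_name, username, password, splunk_bin="splunk"):
    """Build the argument list for the roll-hot-buckets CLI call."""
    return [
        splunk_bin,
        "_internal",
        "call",
        f"/data/indexes/{index_name}/roll-hot-buckets",
        "-auth",
        f"{username}:{password}",
    ]


def roll_hot_buckets(index_name, username, password, splunk_bin="splunk"):
    """Run the command and raise CalledProcessError if it fails."""
    cmd = roll_hot_buckets_cmd(index_name, username, password, splunk_bin)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual index names or passwords. Keep in mind that credentials passed on the command line can be visible to other users on the host while the command runs.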
Guidelines for Recovery
If you experience a disk failure where you still have some of your data but the indexer won’t run, move the index directory aside and restore from a backup, rather than restoring on top of a partially corrupted datastore. On startup, the indexer will automatically create hot directories as needed and resume indexing.
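The "move aside, then restore" step can be sketched as follows. This is a minimal illustration that assumes the indexer is stopped, that the backup is a plain directory copy of the index, and that there is enough disk space to keep the damaged copy. The function name and the .damaged- suffix are hypothetical.

```python
import shutil
import time
from pathlib import Path


def restore_index(index_dir, backup_dir, suffix=None):
    """Move a damaged index directory aside and restore it from a backup.

    Assumption: the indexer is stopped while this runs.
    Returns the path the damaged directory was moved to, or None if
    there was nothing to move.
    """
    index_dir = Path(index_dir)
    aside = None
    if index_dir.exists():
        suffix = suffix or time.strftime("%Y%m%d%H%M%S")
        aside = index_dir.with_name(f"{index_dir.name}.damaged-{suffix}")
        # Keep the damaged data next to the restored copy instead of
        # overwriting it, in case something in it is still recoverable.
        index_dir.rename(aside)
    # copytree requires that the destination does not exist, which also
    # guarantees we never restore on top of the old datastore.
    shutil.copytree(backup_dir, index_dir)
    return aside
```

Once the indexer starts cleanly and you have confirmed the restored data is searchable, the directory that was moved aside can be inspected or removed.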
Back up clustered data
Inevitably, you will run into duplication when backing up clustered data, because a cluster keeps more than one copy of each bucket. The solution is to identify exactly one copy of each bucket on the cluster to back up, although doing so is tedious and time consuming. You also need to decide whether to back up just each bucket’s rawdata or both the rawdata and the index files. If you back up the index files as well, your script must also identify a searchable copy of each bucket.
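To give a feel for what such a script has to do, here is a hedged Python sketch of the selection step only. It assumes you have already collected an inventory of bucket copies from every peer, and that clustered bucket directories follow the documented naming pattern db_<newest>_<oldest>_<localid>_<guid> for originals and rb_... for replicated copies. The BucketCopy type, the searchable flag, and the function name are hypothetical; how you build the inventory and determine searchability depends on your environment.

```python
import re
from typing import NamedTuple

# Assumed clustered bucket directory pattern:
#   db_<newest>_<oldest>_<localid>_<guid>  (original copy)
#   rb_<newest>_<oldest>_<localid>_<guid>  (replicated copy)
BUCKET_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)_([0-9A-Fa-f-]+)$")


class BucketCopy(NamedTuple):
    peer: str         # peer node holding this copy
    index: str        # index the bucket belongs to
    dirname: str      # bucket directory name on that peer
    searchable: bool  # True if this copy has index files, not just rawdata


def pick_one_copy_per_bucket(copies, need_index_files=False):
    """Choose exactly one copy of each bucket, preferring searchable ones.

    With need_index_files=True, buckets that have no searchable copy in
    the inventory are left out of the result.
    """
    chosen = {}
    for copy in copies:
        match = BUCKET_RE.match(copy.dirname)
        if not match:
            continue  # not a clustered bucket directory
        # The same bucket has the same index, local ID, and origin GUID
        # on every peer, whether the copy is named db_ or rb_.
        key = (copy.index, match.group(3), match.group(4))
        if need_index_files and not copy.searchable:
            continue
        current = chosen.get(key)
        if current is None or (copy.searchable and not current.searchable):
            chosen[key] = copy
    return chosen
```

The result maps each bucket to the single peer and directory to back up, which is the part that avoids duplicate copies in your backup.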
Because of these complications, it is highly recommended that you seek out a Splunk professional for guidance on backing up single copies of clustered data. Cyber Chasse can design a customized solution to back up indexed data in Splunk Enterprise. Get in touch with us today.