Version: Atlas v3.14

Data Governance Journey

Data Governance gives Splunk owners a way to understand the availability, usability, and integrity of the data in their Splunk environment. The Data Governance Journey helps Splunk administrators manage that data from several perspectives. By populating the Data Inventory, administrators can track the organizational ownership of data, monitor how data is used, and gain visibility into the health of data feeds while understanding the impact of the data on the business.

Atlas Elements Utilized

Outcomes

Assigning Ownership to Data Feeds and Populating the Data Inventory

Assigning logical ownership and context to the data coming into a Splunk environment is a critical first step toward a comprehensive data governance plan. In Atlas, this record is called the Data Inventory. The Data Inventory is critical for understanding and maintaining the business use cases that the data supports and for knowing who to contact if there are issues with the data. When a data feed has no business owner, it is harder to decide whether to reduce its ingest or to justify better use of the data that is available. The Data Inventory can be implemented at the Splunk Index or Index:Source Type level. Each organization should decide which approach best fits its needs.

  1. Open the Data Management element within Atlas.
  2. Navigate to the Data Inventory Page.
  3. In the Split By field choose Index or Index:Source Type, depending on how you want to manage your data inventory.
  4. Expand the row for the data element whose definition you want to modify, then click the Edit Definition button.
  5. A modal will appear with several fields, many of which are pre-populated from Splunk. The most important fields for ownership data are Owner, Business Unit, and Owner Contact Info. These fields are used to populate ownership reports.
  6. If there are many data elements to populate and you prefer not to use the user interface, you can achieve the same outcome by importing a .csv file into the KV store on the back end of Splunk. The Expertise on Demand Team can assist with the initial population of the KV store.
  7. Track progress of data inventory population by monitoring the KPIs at the top of the dashboard.
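
As a rough illustration of the bulk approach in step 6, the following Python sketch builds the text of a .csv file containing ownership records. The column names (`index`, `sourcetype`, `owner`, `business_unit`, `owner_contact`) and the sample rows are assumptions made for this example only; the field names that the Data Inventory KV store actually expects are not listed on this page and should be confirmed before importing anything.

```python
import csv
import io

# Hypothetical column names; confirm the real KV store field names before importing.
FIELDS = ["index", "sourcetype", "owner", "business_unit", "owner_contact"]


def build_inventory_csv(records):
    """Return CSV text with one row per data element; missing fields are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Illustrative sample data only.
sample_records = [
    {"index": "web", "sourcetype": "access_combined", "owner": "Jane Doe",
     "business_unit": "E-Commerce", "owner_contact": "jane.doe@example.com"},
    # Index-level entry: no source type or contact info supplied yet.
    {"index": "firewall", "owner": "Sam Lee", "business_unit": "Security"},
]

inventory_csv = build_inventory_csv(sample_records)
print(inventory_csv)
```

Leaving missing values blank rather than failing makes it easier to load a partial inventory first and fill in the remaining ownership fields later through the user interface.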

Reporting on Splunk Data Ingest by Data Owner or Business Unit

Once the Data Inventory has been populated with ownership data, the Data Ownership report within Data Management will be populated. This report tracks Splunk data ingest by data owner, so Splunk owners can compare ownership against ingest volumes. This can aid decision-making about the cost of Splunk by owner or business unit.

  1. Open the Data Management element within Atlas.
  2. Navigate to the Reports > Data Ownership page.
  3. Choose how you want to see the ownership report by changing the Ownership Split By field.
  4. If License Pools are being used, you can include or exclude license pools from the report by selecting which pools should be reflected in the report.
  5. Set the Data Split By field to Index or Index:Source Type to match how you have configured your data inventory.
  6. Set the time range that you want the report to cover.
  7. The Data Ownership Report will display the distribution of ingest by Owner or Business Unit within the selected time range.
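
To make the idea of a distribution of ingest by owner concrete, here is a minimal Python sketch of the kind of aggregation such a report implies. It is not how Atlas computes the report. The record layout (`owner`, `business_unit`, `ingest_gb`), the sample volumes, and the choice to group data elements without ownership under "Unassigned" are all assumptions for illustration.

```python
from collections import defaultdict


def ingest_share(rows, split_by="owner"):
    """Sum ingested GB per group and return {group: (gb, percent_of_total)}."""
    totals = defaultdict(float)
    for row in rows:
        # Assumption: elements with no ownership data are grouped as "Unassigned".
        totals[row.get(split_by) or "Unassigned"] += row["ingest_gb"]
    grand_total = sum(totals.values())
    return {
        group: (gb, round(100 * gb / grand_total, 1) if grand_total else 0.0)
        for group, gb in totals.items()
    }


# Illustrative sample data only.
sample_rows = [
    {"index": "web", "owner": "Jane Doe", "business_unit": "E-Commerce", "ingest_gb": 60.0},
    {"index": "app", "owner": "Jane Doe", "business_unit": "E-Commerce", "ingest_gb": 20.0},
    {"index": "firewall", "owner": "Sam Lee", "business_unit": "Security", "ingest_gb": 15.0},
    {"index": "misc", "ingest_gb": 5.0},
]

by_owner = ingest_share(sample_rows, split_by="owner")
by_unit = ingest_share(sample_rows, split_by="business_unit")
print(by_owner)
print(by_unit)
```

Switching `split_by` between `owner` and `business_unit` mirrors the choice offered by the Ownership Split By field.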

Investigate Data Sources Consuming Excessive License

Understanding the volume of each data source being ingested provides context for further action to improve the state of the environment. This information gives Splunk administrators and owners the data they need to make educated decisions about how data is ingested into Splunk. The purpose of this outcome is to determine whether any datasets are candidates for disabling or volume reduction based on their historical usage. Reducing ingest volume has the most impact on the largest datasets being utilized.

  1. Open the Data Utilization element within Atlas.
  2. Navigate to the Configuration page to ensure that the element is configured appropriately. If the requirements indicators show that configuration is needed, follow the on-screen instructions to complete setup.
  3. A Backfill Data operation may be needed if the element has not been configured yet and you want to see data utilization immediately. This operation will populate utilization data for the selected time range.
  4. Navigate back to the Data Utilization screen and select how you want to view the utilization report:
    1. Select Index or Index:Source Type for how the report will be displayed.
    2. Isolate the datasets that you want to analyze if desired.
    3. Select Utilization Time Range for the time range to analyze data usage.
    4. Select License Usage Time Range to set the time range for license usage analysis.
  5. Under Key Metrics you must select how you want to define the Utilization Threshold for the report. This means that you can view the output in several different ways:
    1. Total Queries (most common)
    2. Queries/Day
    3. Queries/10 GB of data
  6. Once the Utilization Threshold method is chosen, select the Threshold number that you would like to define as underutilized. The lower the number, the more data will appear as underutilized in the report. This may be appropriate for some use cases, but you must determine the appropriate threshold for your organization’s needs.
  7. The Utilization Overview report will list all the datasets that fall within the Utilization Threshold and will detail the type of activity that has occurred along with the associated license usage that correlates to that utilization.
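
The three Utilization Threshold methods in step 5 can be compared with a small worked example. The Python sketch below shows one plausible way to derive each value for a single dataset; the function names, the formula used for Queries/10 GB, and the strict "below the threshold" comparison are assumptions for illustration, not the calculation Atlas performs.

```python
def utilization_metric(total_queries, days, ingest_gb, method):
    """Return the utilization value for one dataset under the chosen method."""
    if method == "total_queries":
        return total_queries
    if method == "queries_per_day":
        return total_queries / days
    if method == "queries_per_10gb":
        # Assumed interpretation: number of queries per 10 GB of ingested data.
        if ingest_gb == 0:
            return None  # no ingest, so the ratio is undefined
        return total_queries / (ingest_gb / 10)
    raise ValueError(f"unknown method: {method}")


def is_underutilized(metric, threshold):
    """Assumption: a dataset is flagged when its value is strictly below the threshold."""
    return metric is not None and metric < threshold


# Illustrative dataset: 45 queries over 30 days against 150 GB of ingest.
methods = ("total_queries", "queries_per_day", "queries_per_10gb")
metrics = {m: utilization_metric(45, 30, 150, m) for m in methods}
print(metrics)
print(is_underutilized(metrics["queries_per_day"], threshold=2))
```

The same dataset can look healthy under one method and underutilized under another, which is why the method and the threshold number need to be chosen together.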

Correlating Data Utilization to License Utilization

Insight into how data sources are used through searches, reports, alerts, and dashboards helps identify data sources that are underutilized. This highlights areas where more alerting would be valuable in a customer’s environment.

  1. Open the Data Utilization element within Atlas.
  2. Navigate to the Configuration page to ensure that the element is configured appropriately. If the requirements indicators show that configuration is needed, follow the on-screen instructions to complete setup.
  3. A Backfill Data operation may be needed if the element has not been configured yet and you want to see data utilization immediately. This operation will populate utilization data for the selected time range.
  4. Navigate back to the Data Utilization screen and select how you want to view the utilization report:
    1. Select Index or Index:Source Type for how the report will be displayed.
    2. Isolate the datasets that you want to analyze if desired.
    3. Select Utilization Time Range for the time range to analyze data usage.
    4. Select License Usage Time Range to set the time range for license usage analysis.
  5. Under Key Metrics you must select how you want to define the Utilization Threshold for the report. This means that you can view the output in several different ways:
    1. Total Queries (most common)
    2. Queries/Day
    3. Queries/10 GB of data
  6. Once the Utilization Threshold method is chosen, select the Threshold number that you would like to define as underutilized. The lower the number, the more data will appear as underutilized in the report. This may be appropriate for some use cases, but you must determine the appropriate threshold for your organization’s needs.
  7. The Queries/GB metric illuminates how much activity has occurred on that dataset in the time range.
    1. Red indicates that it is below the utilization threshold.
    2. Blue indicates that it cannot be calculated due to low data volume.
    3. Green indicates that it has utilization above the designated threshold.
  8. License Utilization is summarized in the KPIs at the top of the page:
    1. License Usage Underutilized (GB): Displays how much data in the environment is considered underutilized based on the settings provided by the user.
    2. License Usage Underutilized (%): Displays what percentage of the data ingested in the time range is considered underutilized based on the settings provided by the user.
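
The color indicators and the two KPIs described above can be pictured with a short Python sketch. It assumes that each dataset already has a utilization value (or `None` when it cannot be calculated), that a value equal to the threshold counts as green, and that only red datasets contribute to the underutilized totals. These are illustrative assumptions and the names are hypothetical; the page does not specify how Atlas handles these cases.

```python
def status_color(metric, threshold):
    """Map a dataset's utilization value to the indicator color described above."""
    if metric is None:
        return "blue"  # cannot be calculated, for example due to low data volume
    return "red" if metric < threshold else "green"


def underutilized_kpis(datasets, threshold):
    """Return (underutilized GB, underutilized % of total license usage)."""
    total_gb = sum(d["license_gb"] for d in datasets)
    under_gb = sum(
        d["license_gb"] for d in datasets
        if status_color(d["metric"], threshold) == "red"
    )
    pct = round(100 * under_gb / total_gb, 1) if total_gb else 0.0
    return under_gb, pct


# Illustrative sample data only.
sample_datasets = [
    {"name": "web", "metric": 120, "license_gb": 60.0},
    {"name": "firewall", "metric": 3, "license_gb": 30.0},
    {"name": "misc", "metric": None, "license_gb": 10.0},
]

colors = {d["name"]: status_color(d["metric"], threshold=10) for d in sample_datasets}
kpis = underutilized_kpis(sample_datasets, threshold=10)
print(colors)
print(kpis)
```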

Identifying Underutilized Splunk Data Sources

Insight into how data sources are used through searches, reports, alerts, and dashboards helps identify data sources that are underutilized. This can highlight areas where more utilization of data would be valuable, or where it would be best to consider optimizing how the data is stored. Underutilized Splunk data sources are those being ingested but not used in searches, reports, alerts, or dashboards.

  1. Open the Data Utilization element within Atlas.
  2. Navigate to the Configuration page to ensure that the element is configured appropriately. If the requirements indicators show that configuration is needed, follow the on-screen instructions to complete setup.
  3. A Backfill Data operation may be needed if the element has not been configured yet and you want to see data utilization immediately. This operation will populate utilization data for the selected time range.
  4. Navigate back to the Data Utilization screen and select how you want to view the utilization report:
    1. Select Index or Index:Source Type for how the report will be displayed.
    2. Isolate the datasets that you want to analyze if desired.
    3. Select Utilization Time Range for the time range to analyze data usage.
    4. Select License Usage Time Range to set the time range for license usage analysis.
  5. Under Key Metrics you must select how you want to define the Utilization Threshold for the report. This means that you can view the output in several different ways:
    1. Total Queries (most common)
    2. Queries/Day
    3. Queries/10 GB of data
  6. Once the Utilization Threshold method is chosen, select the Threshold number that you would like to define as underutilized. The lower the number, the more data will appear as underutilized in the report. This may be appropriate for some use cases, but you must determine the appropriate threshold for your organization’s needs.
  7. The Queries/GB metric illuminates how much activity has occurred on that dataset in the time range.
    1. Red indicates that it is below the utilization threshold.
    2. Blue indicates that it cannot be calculated due to low data volume.
    3. Green indicates that it has utilization above the designated threshold.
  8. License Utilization is summarized in the KPIs at the top of the page:
    1. License Usage Underutilized (GB): Displays how much data in the environment is considered underutilized based on the settings provided by the user.
    2. License Usage Underutilized (%): Displays what percentage of the data ingested in the time range is considered underutilized based on the settings provided by the user.

Identifying the Methods of Data Utilization for Each Dataset

Identifying how datasets are being used shows which methods of searching (scheduled searches, ad-hoc queries, or dashboard queries) are dominant for each dataset, which can reveal potential areas of improvement or changes that may need to be made in the environment. Investigating dataset usage by type helps ensure that a dataset is being used effectively and is not being under- or overused in a way that needlessly consumes Splunk resources.

  1. Open the Data Utilization element within Atlas.
  2. Navigate to the Configuration page to ensure that the element is configured appropriately. If the requirements indicators show that configuration is needed, follow the on-screen instructions to complete setup.
  3. A Backfill Data operation may be needed if the element has not been configured yet and you want to see data utilization immediately. This operation will populate utilization data for the selected time range.
  4. Navigate back to the Data Utilization screen and select how you want to view the utilization report:
    1. Select Index or Index:Source Type for how the report will be displayed.
    2. Isolate the datasets that you want to analyze if desired.
    3. Select Utilization Time Range for the time range to analyze data usage.
    4. Select License Usage Time Range to set the time range for license usage analysis.
  5. Under Key Metrics you must select how you want to define the Utilization Threshold for the report. This means that you can view the output in several different ways:
    1. Total Queries (most common)
    2. Queries/Day
    3. Queries/10 GB of data
  6. The Utilization Overview table displays all active utilization of a dataset within the specified time range. Here you can investigate the types of activity that occur on a dataset (ad-hoc queries, scheduled searches, dashboard queries).
  7. To perform a deeper investigation of the data, you can click on a dataset and scroll down to the Investigation Panel. Here you can see detailed information about the utilization activity that is being run. This is useful when you need to understand the details of how a dataset is being utilized in the Splunk environment.
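
As a simple illustration of what a dominant search method means for a dataset, the Python sketch below tallies search activity by type and reports each type's share. The event layout and the `search_type` labels are hypothetical and chosen for this example; they are not taken from the Utilization Overview table.

```python
from collections import Counter


def method_breakdown(events):
    """Return (percent share per search type, dominant search type or None)."""
    counts = Counter(event["search_type"] for event in events)
    total = sum(counts.values())
    shares = {kind: round(100 * n / total, 1) for kind, n in counts.items()}
    dominant = counts.most_common(1)[0][0] if counts else None
    return shares, dominant


# Illustrative activity for one dataset: 6 scheduled, 3 ad-hoc, 1 dashboard query.
sample_events = (
    [{"search_type": "scheduled"}] * 6
    + [{"search_type": "ad-hoc"}] * 3
    + [{"search_type": "dashboard"}] * 1
)

shares, dominant = method_breakdown(sample_events)
print(shares)
print(dominant)
```

A dataset dominated by scheduled searches behaves differently from one driven mostly by ad-hoc queries, so a breakdown like this can help decide where to look first in the Investigation Panel.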