Version: Atlas v3.13

Data Source Integrity Journey

The Data Source Integrity Journey helps Splunk owners understand the health of the data pipelines feeding Splunk. To do this, both the source and the destination must be validated to ensure that data is being sent at the expected volumes and that the external sources are healthy. The journey focuses primarily on Splunk forwarders and the data arriving in Splunk indexes. Data Source Integrity plays a crucial role in the overall data management strategy because the integrity of these streams directly influences the effectiveness of Splunk's analytical capabilities. When these data streams encounter failures or disruptions, the downstream processes that rely on them are impacted. Ensuring the robustness of these data inputs is therefore essential to prevent adverse effects on dependent operations and to maintain the continuity and accuracy of data-driven insights within Splunk.

Atlas Elements Utilized

This journey uses the Forwarder Awareness and Monitor elements.

Outcomes

Applying Logical Groupings to Splunk Forwarders for Health Monitoring

A healthy ecosystem of Splunk Forwarders is important for protecting the integrity of the data coming into most Splunk environments, yet understanding the behavior of Splunk Forwarders and how their performance affects your specific environment can be challenging. Providing additional metadata, such as system ownership, improves the effectiveness of communication and the Splunk administrator's understanding of how an outage could impact the business and how to troubleshoot it effectively. Creating Forwarder Groups in Atlas Forwarder Awareness lets you group forwarders together and track an uptime metric, so you can see whether your forwarders are healthy and checking in at the expected intervals. You can also set up alerting so that you are notified if uptime falls below a defined level.

  1. Open the Forwarder Awareness element in Atlas.
  2. Navigate to the Forwarder Group Overview page.
  3. Click on the New Group button in the top right-hand corner.
  4. You will be presented with a modal that allows you to complete the needed fields:
    1. Group Name: The desired name of the Forwarder Group.
    2. Priority: Apply a priority label to the group.
    3. Hosts: Choose hosts from the available hosts list, or, if server classes are available, use those for grouping.
    4. Description: Provide a description for the Forwarder Group.
    5. Owner: Provide a value for the group owner.
    6. Contact Info: Provide contact information for the group owner.
  5. Click the Save button to save the group.
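
Atlas calculates the group uptime metric internally, but the underlying idea can be sketched in plain SPL. The search below is an illustrative approximation rather than the Atlas implementation: it assumes a hypothetical lookup named forwarder_groups.csv (with host, group, and owner columns) standing in for the Forwarder Groups created above, and a 15-minute check-in window chosen purely as an example. The segments wrapped in triple backticks are inline SPL comments (supported in Splunk 8.0 and later).

    | tstats latest(_time) as last_seen where index=_internal sourcetype=splunkd by host ``` last check-in per host, based on the internal logs forwarders send by default ```
    | lookup forwarder_groups.csv host OUTPUT group, owner ``` hypothetical lookup standing in for the Forwarder Groups created above ```
    | eval minutes_since_checkin = round((now() - last_seen) / 60)
    | eval checked_in = if(minutes_since_checkin <= 15, 1, 0) ``` 15 minutes is an assumed check-in window, not an Atlas default ```
    | stats sum(checked_in) as healthy_forwarders, count as total_forwarders, values(owner) as owner by group
    | eval uptime_pct = round(100 * healthy_forwarders / total_forwarders, 1)
    | table group, owner, healthy_forwarders, total_forwarders, uptime_pct

Saving a search of this shape as an alert that triggers when uptime_pct falls below a chosen level (for example, 90) mirrors the group-level alerting described above; the threshold is again only an example.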

Identifying Missing Forwarders and Investigating Impact

With many unique hosts reporting data to Splunk, maintaining awareness of forwarder downtime is a constant challenge. Notifying system owners and administrators, together with reporting on downtime and impact, helps overcome the difficulty of monitoring every forwarding system. Using Atlas Forwarder Awareness, Splunk administrators can investigate the outage duration, the last-known forwarder configuration, and the data sources provided by the forwarding system.
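
Forwarder Awareness surfaces this context on its dashboards, but a rough equivalent of the last-known-configuration view can be sketched against Splunk's internal metrics log, which receiving instances record for every inbound forwarder connection. The search below is an illustration, not the Atlas data model; the field names hostname, fwdType, version, and os come from the standard group=tcpin_connections metrics events, and the triple-backtick segments are inline SPL comments (Splunk 8.0 and later).

    index=_internal sourcetype=splunkd group=tcpin_connections fwdType=* ``` inbound forwarder connection metrics recorded by the receiving instance ```
    | stats latest(_time) as last_connect, latest(version) as splunk_version, latest(fwdType) as forwarder_type, latest(os) as os by hostname
    | eval minutes_since_connect = round((now() - last_connect) / 60) ``` how long since the forwarder last connected ```
    | convert ctime(last_connect)
    | sort - minutes_since_connect

The dashboard workflow below provides a curated view of this information without hand-written searches.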

  1. Open the Forwarder Awareness element in Atlas.
  2. Navigate to the Forwarder Group Overview page.
  3. Select a Forwarder Group or All Forwarders to navigate to the Forwarder Awareness dashboard.
  4. Here you can see the last known status of a forwarder. A green status indicator shows that it has checked in within the selected time range. A red status indicator shows that it has not checked in as expected and is considered missing.
  5. To view the associated telemetry of forwarder activity, expand the row selection and review the trend lines showing activity in the time range. This is a good way to see when the forwarder went offline and which data sources were impacted.
  6. Alerting can be paused by clicking the Alert symbol so that alerts are not sent about that forwarder being down. This is useful when investigating an outage or performing system maintenance.
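
The status indicators in the steps above are driven by the Forwarder Awareness dashboards, but the basic missing-forwarder check can be approximated with a standard tstats search. The sketch below flags hosts that have sent no events within an assumed 15-minute window; Atlas's own logic and defaults may differ, and the window should match how often your forwarders actually send data.

    | tstats latest(_time) as last_seen where index=* by host ``` latest event seen from each host across searchable indexes ```
    | eval minutes_since_checkin = round((now() - last_seen) / 60)
    | where minutes_since_checkin > 15 ``` 15 minutes is an example window, not an Atlas default ```
    | convert ctime(last_seen)
    | sort - minutes_since_checkin

Running a similar search split by host and sourcetype for a flagged host shows which data sources stopped arriving, comparable to the row expansion described in step 5.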

Data Ingest Monitoring

Partial data outages for any unique data feed can be difficult to identify, and unmonitored data sources can go offline without Splunk administrator awareness. Applying configurable thresholds to priority data sources makes it clear when data is not being ingested as expected: monitors can be applied to data sources with customized thresholds so that outages trigger alerts and ingest patterns can be analyzed. Atlas provides a capability called a Data Watch, which monitors the ingest activity of a data source and can be configured to alert when volumes fall below an expected threshold for a specified period.
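
As a rough illustration of what a Data Watch evaluates (not the Atlas implementation), the search below buckets event counts for a single index by hour and compares each hour against an example floor. The index name web_proxy, the 1-hour span, and the 1,000-event threshold are all assumptions to be replaced with values appropriate to the data source; the steps that follow show how the equivalent monitoring is configured in the Monitor element itself. Triple-backtick segments are inline SPL comments (Splunk 8.0 and later).

    | tstats count where index=web_proxy by _time span=1h ``` hourly event count for one monitored index; web_proxy is an example name ```
    | makecontinuous _time span=1h ``` add empty buckets for hours with no events at all ```
    | fillnull value=0 count
    | eval below_threshold = if(count < 1000, 1, 0) ``` 1,000 events per hour is an example floor, not an Atlas default ```
    | table _time, count, below_threshold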

  1. Open the Monitor element in Atlas.
  2. Navigate to the Configuration page.
  3. Ensure that the requirements for the app are met. If a requirement is not met, on-screen instructions will guide you through the configuration.
  4. Create a Monitor Group on the Group Overview dashboard. These groups should represent logical separations of data by ownership or purpose.
  5. On a Monitor Group, create a Data Watch to track data ingest by index or by more detailed criteria. Identify a starting threshold and time range that reflect nominal data flow.
  6. The Monitor Report dashboard will now track data flow into the system and email the group owner if a Data Watch experiences an outage. Using the Monitor Report dashboard, a user can inspect data flow against the threshold in the row expansion and get a list of all outages in the time range.
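
In the same spirit, the outage listing produced by the Monitor Report dashboard can be approximated by summarizing how many hourly buckets in the search window fell below the example floor. This is only a sketch of the logic; the actual Data Watch alerting and owner notification are configured through the Monitor element rather than a hand-written search, and the index name, span, and threshold remain assumptions.

    | tstats count where index=web_proxy by _time span=1h
    | makecontinuous _time span=1h
    | fillnull value=0 count
    | eval below_threshold = if(count < 1000, 1, 0)
    | stats sum(below_threshold) as hours_below_threshold, min(count) as lowest_hourly_count, count as hours_evaluated ``` summarize the whole search window ```
    | eval outage_detected = if(hours_below_threshold > 0, "yes", "no")

Saved as an alert that triggers when outage_detected is "yes", a search like this would notify an owner in roughly the way the Data Watch email described in step 6 does.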