Version: Atlas v3.11

System Performance Journey

The System Performance Journey gives admins greater visibility into the performance of their Splunk ecosystem, equipping them to act proactively and quickly assess pain points in the environment. Splunk performance is the foundation for achieving outcomes in Splunk, because a poorly performing system can be a roadblock to results. Poor performance can manifest as underperforming system hardware, misconfigured Splunk apps that create an inconsistent user experience, or an overloaded Splunk scheduler that degrades both the user experience and hardware utilization. With Atlas, Splunk admins gain the visibility to resolve these issues before they become problems.

Given that most Splunk deployments have different infrastructure components that serve specific purposes, it can be difficult for Splunk Administrators to understand where in their stack the problems reside. The System Performance Journey in Atlas is designed to help Splunk Administrators pinpoint issues easily and understand what system components are experiencing problems.

Outcomes

Identify Performance Issues in a Splunk Environment

Splunk's infrastructure is designed around distinct server roles: the search head tier, indexer tier, and support tier. Each tier requires proper hardware resources to handle the daily data flow and usage effectively. Although adhering to Splunk's guidelines provides a solid foundation, administrators can enhance their oversight with Atlas's Splunk Performance and Capacity Analytics element, which offers a unified dashboard for monitoring environment performance. This element operates best on a dedicated search head connected to both the search head cluster and the indexer cluster.

  1. Open the Splunk Performance and Capacity Analytics element in Atlas.
  2. Ensure the element is properly configured by visiting the Configuration dashboard. Assign a role to each identified Splunk server and click Save.
  3. Navigate to the System Resources dashboard.
  4. Review the Splunk Summary Status section for any high-level issues. The top row verifies that basic configurations are set; the bottom row flags any resources that are being overtaxed.
  5. Review the detailed telemetry for each tier below the Summary section. These panels highlight abnormal activity: look for high CPU, memory, and disk utilization to see whether infrastructure usage is trending above its provisioning.
  6. This view is intended to provide a single point of entry for all critical components of the Splunk infrastructure.
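The utilization review in the steps above can be sketched as a simple threshold check. This is a hypothetical illustration, not Atlas logic: the host names, field names, sample percentages, and threshold values are all assumptions chosen for the example.

```python
# Hypothetical sketch of the resource review: flag hosts whose reported
# utilization trends above provisioning. All data and thresholds below
# are illustrative assumptions, not Atlas defaults.

# Sample telemetry: host -> percent utilization of CPU, memory, and disk.
telemetry = {
    "sh-01":  {"cpu": 42.0, "mem": 61.5, "disk": 55.0},
    "idx-01": {"cpu": 93.2, "mem": 88.0, "disk": 71.0},
    "idx-02": {"cpu": 48.7, "mem": 64.3, "disk": 96.5},
}

# Assumed review limits (percent).
THRESHOLDS = {"cpu": 85.0, "mem": 85.0, "disk": 90.0}

def flag_overtaxed(telemetry, thresholds):
    """Return {host: [resources over threshold]} for hosts needing review."""
    flagged = {}
    for host, usage in telemetry.items():
        over = [res for res, pct in usage.items() if pct > thresholds[res]]
        if over:
            flagged[host] = over
    return flagged

print(flag_overtaxed(telemetry, THRESHOLDS))
# → {'idx-01': ['cpu', 'mem'], 'idx-02': ['disk']}
```

In practice the dashboard surfaces this per tier; the sketch only shows the shape of the decision (current usage versus provisioned capacity).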

Find Version Drift of Splunk Apps in your Clustered Environment

Splunk applications enable Splunk users to normalize data and view dashboards for investigating it; they are paramount to driving outcomes from the platform. Because most Splunk deployments are clustered, applications can fall out of sync with one another, leaving dashboards and data inconsistent across the environment. Atlas elements such as App Awareness can assist with identifying these issues and driving them to resolution.

  1. Open the App Awareness Atlas element.
  2. Navigate to the App Tracking dashboard.
  3. Review the table of Splunk applications. Sorting on the App Versions column identifies any applications with multiple installed versions.
  4. Expanding a row shows where in the Splunk ecosystem these applications are out of date. Expertise on Demand can assist with creating a plan to resolve these version differences and ensure your users and alerts see consistent data and performance.
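The drift check described above can be sketched as grouping reported app versions by server. This is an illustrative assumption about the shape of the data, not App Awareness's implementation; the app names, hosts, and versions are invented for the example.

```python
# Hypothetical sketch of a version-drift check: given app versions reported
# per server, surface any app installed at more than one version. Data and
# names are illustrative assumptions, not App Awareness internals.

installed = {
    "Splunk_TA_nix":   {"sh-01": "8.6.0", "idx-01": "8.6.0", "idx-02": "8.2.0"},
    "search_activity": {"sh-01": "2.1.1", "idx-01": "2.1.1"},
}

def find_version_drift(installed):
    """Return {app: {version: [hosts]}} for apps with more than one version."""
    drift = {}
    for app, hosts in installed.items():
        by_version = {}
        for host, version in hosts.items():
            by_version.setdefault(version, []).append(host)
        if len(by_version) > 1:  # more than one version => drift
            drift[app] = by_version
    return drift

print(find_version_drift(installed))
# → {'Splunk_TA_nix': {'8.6.0': ['sh-01', 'idx-01'], '8.2.0': ['idx-02']}}
```

The grouped output mirrors what the row expansion shows: which servers still run the stale version and therefore need the app pushed to them.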

Identify and Remediate High Impact Searches

A High Impact label indicates that a search is consuming a search slot, and therefore CPU cores, for longer than desired. Such a search should either be rescheduled to a time slot where it competes less with other searches, or have its SPL optimized so that it runs more efficiently.

  1. Open the Scheduling Assistant element in Atlas.
  2. Select a time range that you want to use to analyze the scheduled searches in the Splunk environment.
  3. Utilize the KPIs displayed at the top of the page to locate the scheduled searches that are labeled High Impact.
    1. High Impact = utilizes a search slot for over 3% of the time range.
    2. Moderate Impact = utilizes a search slot for 1.5-3% of the time range.
  4. Click on the KPI to isolate the searches identified by Atlas.
  5. Click on a search to perform a detailed analysis of the search.
  6. Use the Cron Schedule field and click the Submit Preview button to test a new schedule and assess the impact of the schedule change.
    1. Find a new schedule that reduces Concurrent Scheduling and Average Concurrency and minimizes changes to the limit breach ratio.
    2. Impacts are indicated by colors in the Change field.
  7. If the modeled change output is desirable, click Save Changes to implement the schedule change.
  8. If no time range can be found that reduces the impact of the search, consider optimizing the SPL for better performance.
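The impact labels used in the steps above can be expressed as simple arithmetic: the share of the analysis time range during which a search occupies a slot. The thresholds follow the definitions above (over 3% is High, 1.5-3% is Moderate); the search names and runtimes are invented for the sketch.

```python
# Hypothetical sketch of the impact classification: label a scheduled
# search by what fraction of the analysis time range it holds a search
# slot. Thresholds match the doc; sample runtimes are assumptions.

def classify_impact(slot_time_s, time_range_s):
    """Label a search by the share of the time range it occupies a slot."""
    pct = 100.0 * slot_time_s / time_range_s
    if pct > 3.0:
        return "High Impact"
    if pct >= 1.5:
        return "Moderate Impact"
    return "Low Impact"

DAY = 24 * 60 * 60  # analyze one day of scheduled runs

# Cumulative slot time per search over the day, in seconds (assumed).
searches = {"threat_enrichment": 4200, "daily_rollup": 1500, "health_ping": 120}

for name, slot_time in searches.items():
    print(name, classify_impact(slot_time, DAY))
# threat_enrichment → High Impact (4.9% of the day)
# daily_rollup      → Moderate Impact (1.7%)
# health_ping       → Low Impact (0.1%)
```

A search flagged High here is exactly the kind of candidate for the Cron Schedule preview in step 6: moving it, or tightening its SPL, shrinks its slot share.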