Version: Atlas v3.14

Search Performance Journey

The Search Performance Journey is designed to ensure that Splunk environments don’t become overrun with performance issues due to poorly written searches and high search concurrency. Poor search scheduler health in Splunk can lead to failed searches and unpredictable Splunk performance that can take a long time to resolve using manual methods. Splunk administrators are not always aware that they have search scheduler problems. They notice inconsistent performance but usually attempt to address it by giving the system more resources rather than addressing the root cause. Identifying and resolving these issues ranges from difficult to impossible depending on the size of the environment. Any Splunk user can benefit from Atlas because poor scheduler health can directly impact system resource utilization in both Splunk Cloud and Splunk Enterprise.

The Search Performance Journey is executed by identifying scheduled searches that may be causing issues in a Splunk environment. This journey can also be used to perform routine scheduler maintenance to ensure that search schedules stay evenly distributed in the environment to avoid scheduler issues later.

Atlas Elements Utilized

Outcomes

Identify and Remediate Searches with High Skip Ratios

A high skip ratio indicates that a search is frequently skipped in its current time slot. A skipped search is a scheduled run that did not execute.

  1. Open the Scheduling Assistant element in Atlas.
  2. Select a time range that you want to use to analyze the scheduled searches in the Splunk environment.
  3. Utilize the KPIs displayed at the top of the page to locate the scheduled searches that have high skipped ratios.
    1. High skipped ratio = skipped runs/scheduled runs > 5%
    2. Moderate skipped ratio = skipped runs/scheduled runs 0-5%
  4. Click on the KPI to isolate the searches identified by Atlas.
  5. Click on a search to perform a detailed analysis of the search.
  6. Use the Cron Schedule field and click the Submit Preview button to test a new schedule and assess the impact of the schedule change.
    1. Find a new schedule that reduces Concurrent Schedulings and Average Concurrency and minimizes changes to the limit breach ratio.
    2. Impacts are indicated by colors in the Change field.
  7. If the modeled change output is desirable, click the Save Changes button to implement the schedule change.
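
The skip ratio thresholds from step 3 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Atlas code: the function name `classify_skip_ratio` is hypothetical, and treating a search with zero skipped runs as "none" is an assumption.

```python
def classify_skip_ratio(skipped_runs: int, scheduled_runs: int) -> str:
    """Classify a scheduled search by skipped runs / scheduled runs.

    Hypothetical helper; thresholds follow the KPI definitions above.
    """
    if scheduled_runs <= 0:
        raise ValueError("scheduled_runs must be positive")
    ratio = skipped_runs / scheduled_runs
    if ratio > 0.05:       # more than 5% of scheduled runs skipped
        return "high"
    if ratio > 0:          # above 0% and up to 5%
        return "moderate"
    return "none"          # assumption: no skipped runs is not flagged


# Example: a search scheduled 96 times in the time range, skipped 12 times.
print(classify_skip_ratio(12, 96))
```

With 12 skipped runs out of 96, the ratio is 12.5%, which falls in the high category.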

Identify and Remediate High Impact Searches

High Impact searches consume CPU cores longer than desired. Such a search should either be rescheduled to a time slot where it competes less with other searches, or its SPL should be optimized so that it runs more efficiently.

  1. Open the Scheduling Assistant element in Atlas.
  2. Select a time range that you want to use to analyze the scheduled searches in the Splunk environment.
  3. Utilize the KPIs displayed at the top of the page to locate the scheduled searches that are labeled High Impact.
    1. High Impact = Utilizes a Search Slot Over 3% of time range.
    2. Moderate Impact = Utilizes a Search Slot 1.5-3% of time range.
  4. Click on the KPI to isolate the searches identified by Atlas.
  5. Click on a search to perform a detailed analysis of the search.
  6. Use the Cron Schedule field and click the Submit Preview button to test a new schedule and assess the impact of the schedule change.
    1. Find a new schedule that reduces Concurrent Schedulings and Average Concurrency and minimizes changes to the limit breach ratio.
    2. Impacts are indicated by colors in the Change field.
  7. If the modeled change output is desirable, click the Save Changes button to implement the schedule change.
  8. If no time slot can be identified that reduces the impact of the search, consider optimizing the SPL for better performance.
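
As a rough illustration of the impact thresholds in step 3, the following Python sketch computes the share of the selected time range during which a search occupies a search slot. The function name `classify_impact`, the use of total runtime in seconds as the input, and the "low" label are assumptions made for the example, not details of how Atlas measures impact.

```python
def classify_impact(total_runtime_s: float, time_range_s: float) -> str:
    """Classify a search by the share of the time range it holds a search slot.

    Hypothetical helper; thresholds follow the KPI definitions above.
    """
    if time_range_s <= 0:
        raise ValueError("time_range_s must be positive")
    share = total_runtime_s / time_range_s
    if share > 0.03:        # over 3% of the time range
        return "high"
    if share >= 0.015:      # between 1.5% and 3%
        return "moderate"
    return "low"            # assumption: below 1.5% is not flagged


# Example: one hour of total runtime over a 24-hour time range (about 4.2%).
print(classify_impact(3600, 24 * 3600))
```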

Identify and Remediate High Frequency Searches

High frequency searches are searches that are scheduled to run frequently in an environment. As a best practice, searches should be executed only as frequently as needed and should rarely run every minute. Typically, a high frequency search is an indication of a user who has scheduled a search without knowledge of its impact on the environment. Searches that run more frequently than once every five minutes are flagged.

  1. Open the Scheduling Assistant element in Atlas.
  2. Select a time range that you want to use to analyze the scheduled searches in the Splunk environment.
  3. Utilize the KPIs displayed at the top of the page to locate the scheduled searches that are labeled High Frequency.
    1. High Frequency = Scheduled to run every minute.
    2. Moderate Frequency = Scheduled to run every 2-5 minutes.
  4. Click on the KPI to isolate the searches identified by Atlas.
  5. Click on a search to perform a detailed analysis of the search.
  6. Use the Cron Schedule field and click the Submit Preview button to test a new schedule and assess the impact of the schedule change.
    1. Find a new schedule that reduces Concurrent Schedulings and Average Concurrency and minimizes changes to the limit breach ratio.
    2. Impacts are indicated by colors in the Change field.
  7. If the modeled change output is desirable, click the Save Changes button to implement the schedule change.
  8. Correcting this issue will typically result in lowering the frequency of scheduled runs to no more than once every five minutes.
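
To make the frequency categories concrete, here is a minimal Python sketch that reads the minute field of a five-field cron schedule and applies the thresholds from step 3. It is not Atlas code: the function names are hypothetical, and the parser deliberately handles only the simple forms `* * * * *` and `*/N * * * *`, returning "unknown" for anything else.

```python
from typing import Optional


def minute_interval(cron: str) -> Optional[int]:
    """Return the run interval in minutes for simple cron schedules.

    Only '* * * * *' and '*/N * * * *' are understood (an assumption made
    to keep the sketch short); any other schedule returns None.
    """
    fields = cron.split()
    if len(fields) != 5 or fields[1:] != ["*"] * 4:
        return None
    minute = fields[0]
    if minute == "*":
        return 1
    step = minute[2:]
    if minute.startswith("*/") and step.isdigit() and int(step) > 0:
        return int(step)
    return None


def classify_frequency(cron: str) -> str:
    """Apply the High/Moderate Frequency thresholds to a cron schedule."""
    interval = minute_interval(cron)
    if interval is None:
        return "unknown"
    if interval == 1:       # scheduled to run every minute
        return "high"
    if interval <= 5:       # scheduled to run every 2-5 minutes
        return "moderate"
    return "ok"             # assumption: longer intervals are not flagged


print(classify_frequency("*/2 * * * *"))
```

Moving a search from `* * * * *` to a schedule such as `*/10 * * * *` takes it out of both flagged categories in this sketch.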

Identify and Remediate Searches with High Latency

Searches with High Latency are searches that, on average, do not start at their scheduled time. This is typically because other searches scheduled nearby are consuming the available resources at the time the search is trying to start. If a search cannot start, it will be skipped.

  1. Open the Scheduling Assistant element in Atlas.
  2. Select a time range that you want to use to analyze the scheduled searches in the Splunk environment.
  3. Utilize the KPIs displayed at the top of the page to locate the scheduled searches that are labeled with Average Latency.
    1. High Latency = Search run delayed over 30s on average.
    2. Moderate Latency = Search run delayed 15-30s on average.
  4. Click on the KPI to isolate the searches identified by Atlas.
  5. Click on a search to perform a detailed analysis of the search.
  6. Use the Cron Schedule field and click the Submit Preview button to test a new schedule and assess the impact of the schedule change.
    1. Find a new schedule that reduces Concurrent Schedulings and Average Concurrency and minimizes changes to the limit breach ratio.
    2. Impacts are indicated by colors in the Change field.
  7. If the modeled change output is desirable, click the Save Changes button to implement the schedule change.
  8. Correcting this issue will typically result in finding a time window that has fewer searches scheduled nearby.
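
The latency thresholds in step 3 are based on an average delay, which the following Python sketch illustrates. The function name `classify_latency`, the list of per-run delays as input, and the "low" label are assumptions for the example rather than a description of how Atlas computes the KPI.

```python
from statistics import mean
from typing import Sequence


def classify_latency(delays_s: Sequence[float]) -> str:
    """Classify a search by its average start delay in seconds.

    Hypothetical helper; each entry is how long after its scheduled
    time one run actually started.
    """
    avg = mean(delays_s)    # raises StatisticsError on an empty sequence
    if avg > 30:            # delayed over 30s on average
        return "high"
    if avg >= 15:           # delayed 15-30s on average
        return "moderate"
    return "low"            # assumption: under 15s is not flagged


# Example: three runs that started 40s, 35s, and 50s late.
print(classify_latency([40, 35, 50]))
```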

Identify and Remediate Scheduled Searches with Time Range Mismatches

A scheduled search with a coverage gap has more time between its scheduled runs than the time window of the search covers. Each scheduled run can then miss the data that falls outside its time window, producing potentially incomplete search results. Resolving these searches in a Splunk environment can provide improved accuracy of search results and improved utilization of Splunk resources.

A scheduled search with an excessive time window searches the same data multiple times over multiple runs. This wastes CPU and SVC utilization and can further lead to skipped searches. Resolving these searches can improve your environment performance and health.

  1. Open the Scheduling Assistant element in Atlas.
  2. Identify the KPI 'Time Range Mismatch' and select it.
  3. Review the results to find the worst offenders of Time Range Mismatch. Negative numbers indicate a coverage gap, and positive numbers indicate excessive time windows.
  4. Click on the magnifying glass in the Actions column of the search that you want to investigate. This will open and execute the search in the Splunk Search and Reporting App.
  5. Click on the wrench icon to review the recommended fix by Atlas for the search time range.
  6. If the recommendation is desirable, click the Apply button.
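
The sign convention from step 3 can be illustrated with a short Python sketch. It assumes the mismatch is the search time window minus the interval between scheduled runs, both in minutes. That formula matches the description above (negative for a coverage gap, positive for an excessive window), but it is an assumption and may not be exactly how Atlas calculates the value. The function names are hypothetical.

```python
def time_range_mismatch(search_window_min: int, schedule_interval_min: int) -> int:
    """Return the search window minus the schedule interval, in minutes.

    Assumed formula: negative means a coverage gap, positive means
    an excessive time window.
    """
    return search_window_min - schedule_interval_min


def describe_mismatch(mismatch_min: int) -> str:
    """Translate a mismatch value into the terms used above."""
    if mismatch_min < 0:
        return "coverage gap"
    if mismatch_min > 0:
        return "excessive time window"
    return "aligned"


# Example: a search that looks back 15 minutes but runs only once an hour
# leaves 45 minutes of each hour unsearched.
gap = time_range_mismatch(15, 60)
print(gap, describe_mismatch(gap))
```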