So you have received a high CPU utilization alert from your monitoring system and wondering what to do next. If yes, then this blog post may be helpful for you.
You can follow the below steps to isolate the problem and mitigate it.
- Open task manager and identify if SQL Server process is causing the CPU spike or some other process. If any process other than SQL server is consuming high CPU, get in touch with the respective team. Now a day, with advanced monitoring tools /custom built script, we can easily detect which process is consuming high amount of CPU and direct the alert to the respective teams.( I will discuss this in another article)
- If SQL Server process is taking high CPU bandwidth, we can further segregate them using the below mentioned perfmon counters.
% Processor Time
% Privileged Time
% User Time
% Processor Time
% Privileged Time
% User Time
SQL Server: SQL Statistics:
Note: CPU is consumed in two different modes:
1) User Mode
2) Kernel Mode
- If you see % Privileged Time under the processor node is greater than 25 -30 %, it basically indicates that the processor is spending more time on executing kernel mode operations. There’s possibility that a faulty system drivers or any antivirus software that is running currently is causing this. You should get in touch with the server admin/ windows server support folks at this point to investigate this.
- Now, let’s check if SQL Server processes are consuming most of the CPU by taking a look at the % Privileged Time counter value at the Process (sqlservr) node. It measures the percentage of elapsed time that the SQL process threads spent executing code in privileged Kernel mode. Verify if (Process (sqlservr)% Privileged time/No of logical Processor) is >30%. If yes, chances are there that some large I/O operations are being performed at that time (or other kernel mode operations executing). Again check with the server admin/ windows server support folks if underlying server infra(storage performance, IOPS etc..) is performing fine.
- Finally roll up your sleeves if you see high % User Time under Process (sqlservr) node. It measures the percentage of elapsed time that the process threads spent executing code in user mode. You have to perform a number of checks as mentioned below to isolate the problem.
a) Identify the top CPU consuming processes using the below query. If you are experiencing severe performance issue which is paralyzing the system, you should contact the person/ team running the query ASAP and terminate it after taking their consent.
SELECT s.session_id, r.status, r.blocking_session_id, r.wait_type, wait_resource, r.wait_time / (1000 * 60) 'wait_time(Min)', r.cpu_time, r.logical_reads, r.reads, r.writes, r.total_elapsed_time / (1000 * 60) 'total_elapsed_time(Min)', Substring(st.TEXT,(r.statement_start_offset / 2) + 1, ((CASE r.statement_end_offset WHEN -1 THEN Datalength(st.TEXT) ELSE r.statement_end_offset END - r.statement_start_offset) / 2) + 1) AS statement_text, Coalesce(Quotename(Db_name(st.dbid)) + N'.' + Quotename(Object_schema_name(st.objectid, st.dbid)) + N'.' + Quotename(Object_name(st.objectid, st.dbid)), '') AS command_text, r.command, qp.query_plan, s.login_name, s.host_name, s.program_name, s.last_request_end_time, s.login_time, r.open_transaction_count FROM sys.dm_exec_sessions AS s JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id CROSS APPLY sys.Dm_exec_sql_text(r.sql_handle) AS st CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp WHERE r.session_id != @@SPID ORDER BY r.cpu_time desc
Now that the CPU intensive queries are terminated and CPU is stable now, you need to take a deep dive into the query plan (look at the query_plan column from output of the above query) of the problematic query and make necessary changes to fine tune the queries in order to restrict the queries from creating a mess again.
I will discuss about what to look for in a query plan and how to tune the problematic query in a separate blog post.
b) Along with it, check if there are any SQL background threads like Ghost Cleanup or Lazy Writer threads are running which generally consumes a significant amount of CPU resource.
select * from sys.sysprocesses where cmd like 'LAZY WRITER' or cmd like '%Ghost%'
c) You should also take a look at the Compilations & Re-Compilations in SQL Server through perfmon counter values.
If the SQL Compilation/sec perfmon counter value is greater than 15 -20% of Batch Requests/sec or SQL Recompilation/sec value is greater than 10% of SQL Compilation/sec value, then consider checking the below points:
- If the statistics are up to date? If no, then go ahead and update the stats.
- Check with the dev/application team about any recent Schema changes in the databases.
- Try to see if there is any stored procedure created with the OPTION (RECOMPILE) query hint.
- Parameterize your stored procedures and queries which will help in query plan reuse that is essential for optimal performance is high OLTP systems.
- Enable Optimize for ad-hoc workloads Server Configuration Option which is used to improve the efficiency of the plan cache for workloads that contain many single use ad hoc batches.
d) Finally, go the extra mile like a Rock Star DBA, and take a look at the SQL Server configurations. Though, I don’t think in the age of Desired State Configuration (DSC), there would be any anomaly in SQL Server configuration settings.
Look at the below settings to check if they are set as per the Microsoft recommendations:
- Max degree of parallelism (MAXDOP)
- Cost Threshold for Parallelism
- Max Worker Thread
These are the first few things that I generally look for while troubleshooting CPU issues. There are tons of additional checks that you may have to perform based on the complicacy and intensity of the problems. I will update this space with the additional details as and when I encounter anything new.
P.S: If your system is already in ICU and not accepting any local connections as well, try connecting through DAC and run the diagnostic queries.