SQL Server健康檢查

曾淼Mark 2018-09-22

展開全文

對(duì)一臺(tái)新的或不熟悉的服務(wù)器，你首先要做的是收集盡可能多有關(guān)Server的信息，從硬件和存儲(chǔ)子系統(tǒng)的細(xì)節(jié)，到OS，再到SQL Server實(shí)例本身。你需要知道要處理什么事情----配置是否差，存儲(chǔ)子系統(tǒng)完全不充分的過(guò)時(shí)Server，或者，希望一些東西更佳。對(duì)于把精力放在正確管理和優(yōu)化數(shù)據(jù)庫(kù)Server，這份信息是一個(gè)關(guān)鍵的開始點(diǎn)。作為一個(gè)數(shù)據(jù)庫(kù)專業(yè)人士，實(shí)在沒有理由不清楚每臺(tái)數(shù)據(jù)庫(kù)Server的硬件及配置信息。

這個(gè)章節(jié)會(huì)通過(guò)一套查詢帶你做SQL Server健康檢查，解釋其含義及如何理解結(jié)果。這些查詢從硬件和實(shí)例層次上開始，然后讓你能夠鉆取到一個(gè)特定數(shù)據(jù)庫(kù)，去采集更多特定信息。

很多DBA收集這類信息的一個(gè)攔路虎是官僚主義。系統(tǒng)管理員或系統(tǒng)工程師提供并管理實(shí)際的數(shù)據(jù)庫(kù)Server硬件和OS，SAN管理員常常負(fù)責(zé)存儲(chǔ)子系統(tǒng)。這些人員中，常有職稱高于你，而對(duì)SQL Server基本不懂的人，成為你收集有關(guān)數(shù)據(jù)庫(kù)Server重要信息的一大障礙。他們或許會(huì)把信息采集看成是入侵他們的領(lǐng)域，因此不愿意配合你。

不管任何官僚主義還是組織障礙，你仍然可以通過(guò)技術(shù)在SSMS中收集大部分你需要的信息來(lái)做比較全面的SQL Server健康檢查。其中最有用、易用的技術(shù)是使用DM和DMF來(lái)收集服務(wù)器、實(shí)例及數(shù)據(jù)庫(kù)有關(guān)健康檢查的信息。

運(yùn)行DMV和DMF查詢

要運(yùn)行大多數(shù)的DMV和DMF查詢，你需要對(duì)SQL Server實(shí)例有VIEW SERVER STATE權(quán)限。

首先你要想準(zhǔn)確地找出SQL Server的版本、版次及建立，還想知道是x64還是x86、OS是什么。很簡(jiǎn)單，直接可以通過(guò)如下非DMV查詢得到：

SELECT @@VERSION AS [SQL Server and OS Version Info];

知道這些信息后，有助于了解SQL Server的可用功能。例如，企業(yè)版有數(shù)據(jù)壓縮，而標(biāo)準(zhǔn)版或BI版的SQL Server 2012就不能使用數(shù)據(jù)壓縮。

SQL Server 創(chuàng)建
微軟定期發(fā)布SQL Server的累積更新（CU-Cumulative Update），每個(gè)更新有10~40個(gè)熱補(bǔ)丁，以集成的安裝包發(fā)布。

下面這個(gè)查詢，能夠獲取數(shù)據(jù)庫(kù)Server上有關(guān)OS、語(yǔ)言等更多信息：

SELECT windows_release, windows_service_pack_level,windows_sku, os_language_version FROM sys.dm_os_windows_info WITH (NOLOCK) OPTION (RECOMPILE);

下面的查詢會(huì)告訴你有多少邏輯處理器、處理器的超線程比率、有多少物力CPU及多大物理內(nèi)存。

-- Hardware information from SQL Server 2012 (new virtual_machine_type_desc)(Cannot distinguish between HT and multi-core)
SELECT cpu_count AS [Logical CPU Count], hyperthread_ratio AS [HyperthreadRatio],cpu_count/hyperthread_ratio AS [Physical CPU Count],
physical_memory_kb/1024 AS [Physical Memory (MB)],affinity_type_desc, virtual_machine_type_desc, sqlserver_start_time
FROM sys.dm_os_sys_info WITH (NOLOCK) OPTION (RECOMPILE);
-- Gives you some good basic hardware information about your database server

下面的查詢會(huì)讀取SQL Server錯(cuò)誤日志，以獲取廠商及數(shù)據(jù)庫(kù)Server的型號(hào)（model number）：

-- Get System Manufacturer and model number from SQL Server Error log. This query might take a few seconds
-- if you have not recycled your error log recently
EXEC xp_readerrorlog 0,1,"Manufacturer";
-- This can help you determine the capabilities and capacities of your database server

知道廠商和型號(hào)，你就能找到重要的信息，如有多少處理器插槽、有多少內(nèi)存插槽，多少及何種類型的PCI-E擴(kuò)展槽，還會(huì)告訴你處理器類型（Intel還是AMD）和處理器是那一代。

下面的查詢會(huì)返回處理器的描述及來(lái)自Windows Registry的額定時(shí)鐘頻率：

-- Get processor description from Windows Registry
EXEC xp_instance_regread 'HKEY_LOCAL_MACHINE','HARDWARE\DESCRIPTION\System\CentralProcessor\0','ProcessorNameString';
-- Gives you the model number and rated clock speed of your processor(s)

知道額定時(shí)鐘頻率非常重要，因?yàn)槟愕奶幚砥骺赡芤驗(yàn)殡娫垂芾?，不是每次都全速運(yùn)行。Windows Server 2008(R2)使用Balanced Windows Power Plan。這意味著當(dāng)處理器不在高負(fù)荷的情況下，會(huì)降低時(shí)鐘速率來(lái)減少節(jié)省電力。當(dāng)處理器突遇高負(fù)載時(shí)，它會(huì)增加時(shí)鐘速率以達(dá)到全速，但這不會(huì)立即發(fā)生從而對(duì)查詢性能有負(fù)面影響。當(dāng)使用默認(rèn)的Balanced Windows Power Plan，而不是High Performance電源計(jì)劃，通常OLTP負(fù)載下只有20~25%的速率。要避免這樣的問(wèn)題，你首先要確認(rèn)你的數(shù)據(jù)庫(kù)Server使用High Performance電源計(jì)劃，而不是Balanced電源計(jì)劃。這個(gè)設(shè)置可以動(dòng)態(tài)改變，不需要重啟Windows。其次，使用cpuid.com提供的CPU-Z免費(fèi)工具確定實(shí)際的時(shí)鐘速率。如果你使用高性能電源計(jì)劃且處理器仍然美歐全速運(yùn)行，那么你需要進(jìn)入系統(tǒng)BIOS去改變電源管理設(shè)置，要么OS控制，要么完全禁用。

下面的查詢會(huì)返回安裝哪些SQL Server服務(wù)及其如何配置的信息：

-- SQL Server Services information from SQL Server 2012
SELECT servicename, startup_type_desc, status_desc,last_startup_time, service_account, is_clustered, cluster_nodename
FROM sys.dm_server_services WITH (NOLOCK) OPTION (RECOMPILE);
-- Gives you information about your installed SQL Server Services, whether they are clustered, and which node owns the cluster resources

下面的查詢獲取SQL Server錯(cuò)誤日志相關(guān)的信息，事先知道錯(cuò)誤日志的路徑等有助于在必要時(shí)去查閱：

-- Shows you where the SQL Server error log is located and how it is configured
SELECT is_enabled, [path], max_size, max_files
FROM sys.dm_os_server_diagnostics_log_configurations WITH (NOLOCK) OPTION (RECOMPILE);
-- Knowing this information is important for troubleshooting purposes

下面的查詢會(huì)返回是否你的數(shù)據(jù)庫(kù)Server使用了Windows Clustering：

-- Get information about your OS cluster
--(if your database server is in a cluster)
SELECT VerboseLogging, SqlDumperDumpFlags, SqlDumperDumpPath,SqlDumperDumpTimeOut, FailureConditionLevel, HealthCheckTimeout
FROM sys.dm_os_cluster_properties WITH (NOLOCK) OPTION (RECOMPILE);
-- You will see no results if your instance is not clustered

如果你使用群集，你可以使用下面的查詢來(lái)獲取有關(guān)群集節(jié)點(diǎn)的有用信息：

-- Get information about your cluster nodes and their status (if your database server is in a cluster)
SELECT NodeName, status_description, is_current_owner
FROM sys.dm_os_cluster_nodes WITH (NOLOCK) OPTION (RECOMPILE);
-- Knowing which node owns the cluster resources is critical Especially when you are installing Windows or SQL Server updates

下面的查詢收集有關(guān)SQL Server實(shí)例配置的額外信息，你可以使用sp_configure來(lái)改變屬性：

-- Get configuration values for instance
SELECT name, value, value_in_use, [description] FROM sys.configurations WITH (NOLOCK) ORDER BY name OPTION (RECOMPILE);
-- Focus on backup compression default
-- clr enabled (only enable if it is needed)
-- lightweight pooling (should be zero)
-- max degree of parallelism
-- max server memory (MB) (set to an appropriate value)
-- optimize for ad hoc workloads (should be 1)
-- priority boost (should be zero)

下面的查詢返回網(wǎng)絡(luò)配置相關(guān)的一點(diǎn)信息，對(duì)于處理網(wǎng)絡(luò)、防火墻相關(guān)的問(wèn)題有幫助：

-- Get information about TCP Listener for SQL Server
SELECT listener_id, ip_address, is_ipv4, port, type_desc, state_desc, start_time
FROM sys.dm_tcp_listener_states WITH (NOLOCK) OPTION (RECOMPILE);
-- Helpful for network and connectivity troubleshooting

下面的查詢返回來(lái)自Windows注冊(cè)表SQL Server相關(guān)的信息：

-- SQL Server Registry information
SELECT registry_key, value_name, value_data
FROM sys.dm_server_registry WITH (NOLOCK) OPTION (RECOMPILE);
-- This lets you safely read some SQL Server related information from the Windows Registry

下面的查詢返回是否SQL Server實(shí)例產(chǎn)生內(nèi)存轉(zhuǎn)儲(chǔ)：

-- Get information on location, time and size of any memory dumps from SQL Server
SELECT [filename], creation_time, size_in_bytes
FROM sys.dm_server_memory_dumps WITH (NOLOCK) OPTION (RECOMPILE);
-- This will not return any rows if you have not had any memory dumps (which is a good thing)

下面的查詢返回SQL Server實(shí)例有多少在運(yùn)行的數(shù)據(jù)庫(kù)，它們位于哪里：

-- File Names and Paths for Tempdb and all user databases in instance
SELECT DB_NAME([database_id])AS [Database Name],
[file_id], name, physical_name, type_desc, state_desc,
CONVERT( bigint, size/128.0) AS [Total Size in MB]
FROM sys.master_files WITH (NOLOCK)
WHERE [database_id] > 4
AND [database_id] <> 32767
OR [database_id] = 2
ORDER BY DB_NAME([database_id]) OPTION (RECOMPILE);
-- Things to look at:
-- Are data files and log files on different drives?
-- Is everything on the C: drive?
-- Is TempDB on dedicated drives?
-- Are there multiple data files?

下面的查詢返回實(shí)例上所有數(shù)據(jù)庫(kù)的一些關(guān)鍵屬性：

-- Recovery model, log reuse wait description, log file size, log usage size
-- and compatibility level for all databases on instance
SELECT db.[name] AS [Database Name], db.recovery_model_desc AS [Recovery Model],
db.log_reuse_wait_desc AS [Log Reuse Wait Description],
ls.cntr_value AS [Log Size (KB)], lu.cntr_value AS [Log Used (KB)],
CAST(CAST(lu.cntr_value AS FLOAT) / CAST(ls.cntr_value AS FLOAT)AS DECIMAL(18,2)) *
100 AS
[Log Used %], db.[compatibility_level] AS [DB Compatibility Level],
db.page_verify_option_desc AS [Page Verify Option], db.is_auto_create_stats_on,
db.is_auto_update_stats_on, db.is_auto_update_stats_async_on,
db.is_parameterization_forced,
db.snapshot_isolation_state_desc, db.is_read_committed_snapshot_on,
is_auto_shrink_on, is_auto_close_on
FROM sys.databases AS db WITH (NOLOCK)
INNER JOIN sys.dm_os_performance_counters AS lu WITH (NOLOCK)
ON db.name = lu.instance_name
INNER JOIN sys.dm_os_performance_counters AS ls WITH (NOLOCK)
ON db.name = ls.instance_name
WHERE lu.counter_name LIKE N'Log File(s) Used Size (KB)%'
AND ls.counter_name LIKE N'Log File(s) Size (KB)%'
AND ls.cntr_value > 0 OPTION (RECOMPILE);
-- Things to look at:
-- How many databases are on the instance?
-- What recovery models are they using?
-- What is the log reuse wait description?
-- How full are the transaction logs ?
-- What compatibility level are they on?

下面的查詢返回哪個(gè)數(shù)據(jù)庫(kù)文件有最大的I/O延遲：

-- Calculates average stalls per read, per write,
-- and per total input/output for each database file.
SELECT DB_NAME(fs.database_id) AS [Database Name], mf.physical_name,
io_stall_read_ms, num_of_reads,
CAST(io_stall_read_ms/(1.0 + num_of_reads) AS NUMERIC(10,1)) AS
[avg_read_stall_ms],io_stall_write_ms,
num_of_writes,CAST(io_stall_write_ms/(1.0+num_of_writes) AS NUMERIC(10,1)) AS
[avg_write_stall_ms],
io_stall_read_ms + io_stall_write_ms AS [io_stalls], num_of_reads + num_of_writes
AS [total_io],
CAST((io_stall_read_ms + io_stall_write_ms)/(1.0 + num_of_reads + num_of_writes) AS
NUMERIC(10,1))
AS [avg_io_stall_ms]
FROM sys.dm_io_virtual_file_stats(null,null) AS fs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);
-- Helps determine which database files on
-- the entire instance have the most I/O bottlenecks

如果遇到I/O瓶頸，你可以使用Windows Performance Monitor查看邏輯磁盤的Avg Disk Sec/Write和 Avg Disk Sec/Read等計(jì)數(shù)器。根據(jù)存儲(chǔ)的類型，通過(guò)增加更多的spindle、改變RAID控制器高速緩存策略或改變RAID級(jí)別或許可以改善I/O性能。如果可能的話，你也可以考慮移除一些數(shù)據(jù)文件到其他磁盤。

下面的查詢返回占用最多內(nèi)存的用戶數(shù)據(jù)庫(kù)：

-- Get total buffer usage by database for current instance
SELECT DB_NAME(database_id) AS [Database Name],
COUNT(*) * 8/1024.0 AS [Cached Size (MB)]
FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)
WHERE database_id > 4 -- system databases
AND database_id <> 32767 -- ResourceDB
GROUP BY DB_NAME(database_id)
ORDER BY [Cached Size (MB)] DESC OPTION (RECOMPILE);
-- Tells you how much memory (in the buffer pool)
-- is being used by each database on the instance

下面的查詢返回使用最多處理器時(shí)間的用戶數(shù)據(jù)庫(kù)：

-- Get CPU utilization by database
WITH DB_CPU_Stats
AS
(SELECT DatabaseID, DB_Name(DatabaseID) AS [DatabaseName],
SUM(total_worker_time) AS [CPU_Time_Ms]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]
FROM sys.dm_exec_plan_attributes(qs.plan_handle)
WHERE attribute = N'dbid') AS F_DB
GROUP BY DatabaseID)
SELECT ROW_NUMBER() OVER(ORDER BY [CPU_Time_Ms] DESC) AS [row_num],
DatabaseName, [CPU_Time_Ms],
CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms])
OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPUPercent]
FROM DB_CPU_Stats
WHERE DatabaseID > 4 -- system databases
AND DatabaseID <> 32767 -- ResourceDB
ORDER BY row_num OPTION (RECOMPILE);
-- Helps determine which database is using the most CPU resources on the instance

下面的查詢返回SQL Server上次重啟或使用DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR)以來(lái)，最高的累計(jì)等待統(tǒng)計(jì)：

-- Isolate top waits for server instance since last restart or statistics clear
WITH Waits AS
(SELECT wait_type, wait_time_ms / 1000. AS wait_time_s,
100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,
ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
FROM sys.dm_os_wait_stats WITH (NOLOCK)
WHERE wait_type NOT IN (N'CLR_SEMAPHORE',N'LAZYWRITER_SLEEP',N'RESOURCE_QUEUE',
N'SLEEP_TASK',N'SLEEP_SYSTEMTASK',N'SQLTRACE_BUFFER_FLUSH',N'WAITFOR',
N'LOGMGR_QUEUE',N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH',
N'XE_TIMER_EVENT',N'BROKER_TO_FLUSH',N'BROKER_TASK_STOP',N'CLR_MANUAL_EVENT',
N'CLR_AUTO_EVENT',N'DISPATCHER_QUEUE_SEMAPHORE', N'FT_IFTS_SCHEDULER_IDLE_WAIT',
N'XE_DISPATCHER_WAIT', N'XE_DISPATCHER_JOIN', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'ONDEMAND_TASK_QUEUE', N'BROKER_EVENTHANDLER', N'SLEEP_BPOOL_FLUSH',
N'DIRTY_PAGE_POLL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'SP_SERVER_DIAGNOSTICS_SLEEP'))
SELECT W1.wait_type,
CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
CAST(W1.pct AS DECIMAL(12, 2)) AS pct,
CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.rn <= W1.rn
GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct
HAVING SUM(W2.pct) - W1.pct < 99 OPTION (RECOMPILE); -- percentage threshold
-- Clear Wait Stats
-- DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

博客http://blogs./b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx里介紹了很多等待類型。

下面的查詢返回實(shí)例上累積的信號(hào)（CPU）等待：

-- Signal Waits for instance
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
AS [%signal (cpu) waits],
CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM (wait_time_ms) AS
NUMERIC(20,2)) AS [%resource waits]
FROM sys.dm_os_wait_stats WITH (NOLOCK) OPTION (RECOMPILE);
-- Signal Waits above 15-20% is usually a sign of CPU pressure

信號(hào)等待是CPU相關(guān)的等待。通常信號(hào)等待在15~20%就表示CPU壓力。

下面的查詢返回最常連接數(shù)據(jù)庫(kù)的登錄信息：

-- Get logins that are connected and how many sessions they have
SELECT login_name, COUNT(session_id) AS [session_count]
FROM sys.dm_exec_sessions WITH (NOLOCK)
GROUP BY login_name
ORDER BY COUNT(session_id) DESC OPTION (RECOMPILE);
-- This can help characterize your workload and
-- determine whether you are seeing a normal level of activity

下面的查詢返回當(dāng)前的任務(wù)急pending的I/O計(jì)數(shù)信息，返回的3個(gè)值越低越好：

-- Get Average Task Counts (run multiple times)
SELECT AVG(current_tasks_count) AS [Avg Task Count],
AVG(runnable_tasks_count) AS [Avg Runnable Task Count],
AVG(pending_disk_io_count) AS [Avg Pending DiskIO Count]
FROM sys.dm_os_schedulers WITH (NOLOCK)
WHERE scheduler_id < 255 OPTION (RECOMPILE);
-- Sustained values above 10 suggest further investigation in that area
-- High Avg Task Counts are often caused by blocking or other resource contention
-- High Avg Runnable Task Counts are a good sign of CPU pressure
-- High Avg Pending DiskIO Counts are a sign of disk pressure

下面的查詢返回過(guò)去256分鐘CPU使用的歷史狀況，1分鐘一個(gè)間隔：

-- Get CPU Utilization History for last 256 minutes (in one minute intervals)
-- This version works with SQL Server 2008 and above
DECLARE @ts_now bigint = (SELECT cpu_ticks/(cpu_ticks/ms_ticks)
FROM sys.dm_os_sys_info WITH (NOLOCK));
SELECT TOP(256) SQLProcessUtilization AS [SQL Server Process CPU Utilization],
SystemIdle AS [System Idle Process],
100 - SystemIdle - SQLProcessUtilization
AS [Other Process CPU Utilization],
DATEADD(ms, -1 * (@ts_now - [timestamp]),
GETDATE()) AS [Event Time]
FROM (SELECT record.value('(./Record/@id)[1]', 'int') AS record_id,
record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
AS[SystemIdle],record.value('(./Record/SchedulerMonitorEvent/SystemHealth/
ProcessUtilization)[1]','int')
AS [SQLProcessUtilization], [timestamp]
FROM (SELECT [timestamp], CONVERT(xml, record) AS [record]
FROM sys.dm_os_ring_buffers WITH (NOLOCK)
WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
AND record LIKE N'%<SystemHealth>%') AS x
) AS y
ORDER BY record_id DESC OPTION (RECOMPILE);
-- Look at the trend over the entire period.
-- Also look at high sustained Other Process CPU Utilization values

如果Other Process CPU Utilization持續(xù)超過(guò)5%，你就應(yīng)該查看什么在使用CPU。

下面的查詢返回OS級(jí)別物理內(nèi)存的狀況：

-- Good basic information about OS memory amounts and state
SELECT total_physical_memory_kb, available_physical_memory_kb,
total_page_file_kb, available_page_file_kb,
system_memory_state_desc
FROM sys.dm_os_sys_memory WITH (NOLOCK) OPTION (RECOMPILE);
-- You want to see "Available physical memory is high"
-- This indicates that you are not under external memory pressure

下面的查詢返回SQL Server的內(nèi)存使用情況：

-- SQL Server Process Address space info
--(shows whether locked pages is enabled, among other things)
SELECT physical_memory_in_use_kb,locked_page_allocations_kb,
page_fault_count, memory_utilization_percentage,
available_commit_limit_kb, process_physical_memory_low,
process_virtual_memory_low
FROM sys.dm_os_process_memory WITH (NOLOCK) OPTION (RECOMPILE);
-- You want to see 0 for process_physical_memory_low
-- You want to see 0 for process_virtual_memory_low
-- This indicates that you are not under internal memory pressure

查看SQL Server是否處于內(nèi)存壓力的一個(gè)經(jīng)典做法是查看Page Life Expectancy (PLE)，PLE越高越好（MS推薦300為可接受）：

-- Page Life Expectancy (PLE) value for default instance
SELECT cntr_value AS [Page Life Expectancy]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE [object_name] LIKE N'%Buffer Manager%' -- Handles named instances
AND counter_name = N'Page life expectancy' OPTION (RECOMPILE);
-- PLE is one way to measure memory pressure.
-- Higher PLE is better. Watch the trend, not the absolute value.

下面的查詢返回Memory Grants Outstanding：

-- Memory Grants Outstanding value for default instance
SELECT cntr_value AS [Memory Grants Outstanding]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE [object_name] LIKE N'%Memory Manager%' -- Handles named instances
AND counter_name = N'Memory Grants Outstanding' OPTION (RECOMPILE);
-- Memory Grants Outstanding above zero
-- for a sustained period is a secondary indicator of memory pressure

下面的查詢返回Memory Grants Pending：

-- Memory Grants Pending value for default instance
SELECT cntr_value AS [Memory Grants Pending]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE [object_name] LIKE N'%Memory Manager%' -- Handles named instances
AND counter_name = N'Memory Grants Pending' OPTION (RECOMPILE);
-- Memory Grants Pending above zero
-- for a sustained period is an extremely strong indicator of memory pressure

當(dāng)你從上面3個(gè)查詢中看到任何內(nèi)存壓力，要進(jìn)一步查看整體的內(nèi)存使用狀況：

-- Memory Clerk Usage for instance
-- Look for high value for CACHESTORE_SQLCP (Ad-hoc query plans)
SELECT TOP(10) [type] AS [Memory Clerk Type],
SUM(pages_kb) AS [SPA Mem, Kb]
FROM sys.dm_os_memory_clerks WITH (NOLOCK)
GROUP BY [type]
ORDER BY SUM(pages_kb) DESC OPTION (RECOMPILE);
-- CACHESTORE_SQLCP SQL Plans
-- These are cached SQL statements or batches that
-- aren't in stored procedures, functions and triggers
--
-- CACHESTORE_OBJCP Object Plans
-- These are compiled plans for
-- stored procedures, functions and triggers
--
-- CACHESTORE_PHDR Algebrizer Trees
-- An algebrizer tree is the parsed SQL text
-- that resolves the table and column names

如果你看到CACHESTORE_SQLCP memory clerk使用很多內(nèi)存，那么你可以確定在Procedure緩存里是否有很多只用一次且占用了大量?jī)?nèi)存的ad hoc查詢計(jì)劃：

-- Find single-use, ad-hoc queries that are bloating the plan cache
SELECT TOP(20) [text] AS [QueryText], cp.size_in_bytes
FROM sys.dm_exec_cached_plans AS cp WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE cp.cacheobjtype = N'Compiled Plan'
AND cp.objtype = N'Adhoc'
AND cp.usecounts = 1
ORDER BY cp.size_in_bytes DESC OPTION (RECOMPILE);
-- Gives you the text and size of single-use ad-hoc queries that
-- waste space in the plan cache
-- Enabling 'optimize for ad hoc workloads' for the instance
-- can help (SQL Server 2008 and above only)
-- Enabling forced parameterization for the database can help, but test first!

數(shù)據(jù)庫(kù)級(jí)別的查詢

切換到特定用戶數(shù)據(jù)庫(kù)：

-- Database specific queries ******************************************************
-- **** Switch to a user database *****
USE YourDatabaseName;
GO

查詢數(shù)據(jù)庫(kù)大?。?/p>

-- Individual File Sizes and space available for current database
SELECT name AS [File Name], physical_name AS [Physical Name], size/128.0 AS [Total
Size in MB],
size/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0 AS [Available Space
In MB], [file_id]
FROM sys.database_files WITH (NOLOCK) OPTION (RECOMPILE);
-- Look at how large and how full the files are and where they are located
-- Make sure the transaction log is not full!!

查看事務(wù)日志大小及空間使用：

-- Get transaction log size and space information for the current database
SELECT DB_NAME(database_id) AS [Database Name], database_id,
CAST((total_log_size_in_bytes/1048576.0) AS DECIMAL(10,1))
AS [Total_log_size(MB)],
CAST((used_log_space_in_bytes/1048576.0) AS DECIMAL(10,1))
AS [Used_log_space(MB)],
CAST(used_log_space_in_percent AS DECIMAL(10,1)) AS [Used_log_space(%)]
FROM sys.dm_db_log_space_usage WITH (NOLOCK) OPTION (RECOMPILE);
-- Another way to look at transaction log file size and space

按文件搜集I/O統(tǒng)計(jì)：

-- I/O Statistics by file for the current database
SELECT DB_NAME(DB_ID()) AS [Database Name],[file_id], num_of_reads, num_of_writes,
io_stall_read_ms, io_stall_write_ms,
CAST(100. * io_stall_read_ms/(io_stall_read_ms + io_stall_write_ms)
AS DECIMAL(10,1)) AS [IO Stall Reads Pct],
CAST(100. * io_stall_write_ms/(io_stall_write_ms + io_stall_read_ms)
AS DECIMAL(10,1)) AS [IO Stall Writes Pct],
(num_of_reads + num_of_writes) AS [Writes + Reads], num_of_bytes_read,
num_of_bytes_written,
CAST(100. * num_of_reads/(num_of_reads + num_of_writes) AS DECIMAL(10,1))
AS [# Reads Pct],
CAST(100. * num_of_writes/(num_of_reads + num_of_writes) AS DECIMAL(10,1))
AS [# Write Pct],
CAST(100. * num_of_bytes_read/(num_of_bytes_read + num_of_bytes_written)
AS DECIMAL(10,1)) AS [Read Bytes Pct],
CAST(100. * num_of_bytes_written/(num_of_bytes_read + num_of_bytes_written)
AS DECIMAL(10,1)) AS [Written Bytes Pct]
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) OPTION (RECOMPILE);
-- This helps you characterize your workload better from an I/O perspective

查看事務(wù)日志Virtual Log File (VLF)計(jì)數(shù)：

-- Get VLF count for transaction log for the current database,
-- number of rows equals the VLF count. Lower is better!
DBCC LOGINFO;
-- High VLF counts can affect write performance
-- and they can make database restore and recovery take much longer

事務(wù)日志中如果有大量VLF，就會(huì)影響寫入事務(wù)日志的性能，更重要的是影響恢復(fù)數(shù)據(jù)庫(kù)的時(shí)間。
查看特定數(shù)據(jù)庫(kù)上的查詢活動(dòng)：

-- Top cached queries by Execution Count (SQL Server 2012)
SELECT qs.execution_count, qs.total_rows, qs.last_rows, qs.min_rows, qs.max_rows,
qs.last_elapsed_time, qs.min_elapsed_time, qs.max_elapsed_time,
SUBSTRING(qt.TEXT,qs.statement_start_offset/2 +1,
(CASE WHEN qs.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.TEXT)) * 2
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)
AS query_text
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY qs.execution_count DESC OPTION (RECOMPILE);
-- Uses several new rows returned columns
-- to help troubleshoot performance problems

查看緩存的存儲(chǔ)過(guò)程：

-- Top Cached SPs By Execution Count (SQL Server 2012)
SELECT TOP(250) p.name AS [SP Name], qs.execution_count,
ISNULL(qs.execution_count/DATEDIFF(Second, qs.cached_time, GETDATE()), 0)
AS [Calls/Second],
qs.total_worker_time/qs.execution_count AS [AvgWorkerTime],
qs.total_worker_time AS [TotalWorkerTime],qs.total_elapsed_time,
qs.total_elapsed_time/qs.execution_count AS [avg_elapsed_time],
qs.cached_time
FROM sys.procedures AS p WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
ORDER BY qs.execution_count DESC OPTION (RECOMPILE);
-- Tells you which cached stored procedures are called the most often
-- This helps you characterize and baseline your workload

查看存儲(chǔ)過(guò)程在緩存里待多長(zhǎng)時(shí)間（Cached_time）：

-- Top Cached SPs By Avg Elapsed Time (SQL Server 2012)
SELECT TOP(25) p.name AS [SP Name], qs.total_elapsed_time/qs.execution_count
AS [avg_elapsed_time], qs.total_elapsed_time, qs.execution_count,
ISNULL(qs.execution_count/DATEDIFF(Second, qs.cached_time,
GETDATE()), 0) AS [Calls/Second],
qs.total_worker_time/qs.execution_count AS [AvgWorkerTime],
qs.total_worker_time AS [TotalWorkerTime], qs.cached_time
FROM sys.procedures AS p WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
ORDER BY avg_elapsed_time DESC OPTION (RECOMPILE);
-- This helps you find long-running cached stored procedures that
-- may be easy to optimize with standard query tuning techniques

從整體CPU角度查看最耗時(shí)的存儲(chǔ)過(guò)程：

-- Top Cached SPs By Total Worker time (SQL Server 2012).
-- Worker time relates to CPU cost
SELECT TOP(25) p.name AS [SP Name], qs.total_worker_time AS [TotalWorkerTime],
qs.total_worker_time/qs.execution_count AS [AvgWorkerTime], qs.execution_count,
ISNULL(qs.execution_count/DATEDIFF(Second, qs.cached_time, GETDATE()), 0)
AS [Calls/Second],qs.total_elapsed_time, qs.total_elapsed_time/qs.execution_count
AS [avg_elapsed_time], qs.cached_time
FROM sys.procedures AS p WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
ORDER BY qs.total_worker_time DESC OPTION (RECOMPILE);
-- This helps you find the most expensive cached
-- stored procedures from a CPU perspective
-- You should look at this if you see signs of CPU pressure

從邏輯度的角度查看緩存的存儲(chǔ)過(guò)程相關(guān)的信息：

-- Top Cached SPs By Total Logical Reads (SQL Server 2012).
-- Logical reads relate to memory pressure
SELECT TOP(25) p.name AS [SP Name], qs.total_logical_reads
AS [TotalLogicalReads], qs.total_logical_reads/qs.execution_count
AS [AvgLogicalReads],qs.execution_count,
ISNULL(qs.execution_count/DATEDIFF(Second, qs.cached_time, GETDATE()), 0)
AS [Calls/Second], qs.total_elapsed_time,qs.total_elapsed_time/qs.execution_count
AS [avg_elapsed_time], qs.cached_time
FROM sys.procedures AS p WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
ORDER BY qs.total_logical_reads DESC OPTION (RECOMPILE);
-- This helps you find the most expensive cached
-- stored procedures from a memory perspective
-- You should look at this if you see signs of memory pressure

從物理度的角度查看最耗時(shí)的存儲(chǔ)過(guò)程：

-- Top Cached SPs By Total Physical Reads (SQL Server 2012).
-- Physical reads relate to disk I/O pressure
SELECT TOP(25) p.name AS [SP Name],qs.total_physical_reads
AS [TotalPhysicalReads],qs.total_physical_reads/qs.execution_count
AS [AvgPhysicalReads], qs.execution_count, qs.total_logical_reads,
qs.total_elapsed_time, qs.total_elapsed_time/qs.execution_count
AS [avg_elapsed_time], qs.cached_time
FROM sys.procedures AS p WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
AND qs.total_physical_reads > 0
ORDER BY qs.total_physical_reads DESC,
qs.total_logical_reads DESC OPTION (RECOMPILE);
-- This helps you find the most expensive cached
-- stored procedures from a read I/O perspective
-- You should look at this if you see signs of I/O pressure or of memory pressure

從邏輯寫來(lái)看最耗時(shí)的緩存存儲(chǔ)過(guò)程：

-- Top Cached SPs By Total Logical Writes (SQL Server 2012).
-- Logical writes relate to both memory and disk I/O pressure
SELECT TOP(25) p.name AS [SP Name], qs.total_logical_writes
AS [TotalLogicalWrites], qs.total_logical_writes/qs.execution_count
AS [AvgLogicalWrites], qs.execution_count,
ISNULL(qs.execution_count/DATEDIFF(Second, qs.cached_time, GETDATE()), 0)
AS [Calls/Second],qs.total_elapsed_time, qs.total_elapsed_time/qs.execution_count
AS [avg_elapsed_time], qs.cached_time
FROM sys.procedures AS p WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
ORDER BY qs.total_logical_writes DESC OPTION (RECOMPILE);
-- This helps you find the most expensive cached
-- stored procedures from a write I/O perspective
-- You should look at this if you see signs of I/O pressure or of memory pressure

從平均I/O來(lái)看緩存的存儲(chǔ)過(guò)程中最耗資源的語(yǔ)句：

-- Lists the top statements by average input/output
-- usage for the current database
SELECT TOP(50) OBJECT_NAME(qt.objectid) AS [SP Name],
(qs.total_logical_reads + qs.total_logical_writes) /qs.execution_count
AS [Avg IO],SUBSTRING(qt.[text],qs.statement_start_offset/2,
(CASE
WHEN qs.statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(max), qt.[text])) * 2
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) AS [Query Text]
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE qt.[dbid] = DB_ID()
ORDER BY [Avg IO] DESC OPTION (RECOMPILE);
-- Helps you find the most expensive statements for I/O by SP

查看寫比讀更多的非聚集索引：

-- Possible Bad NC Indexes (writes > reads)
SELECT OBJECT_NAME(s.[object_id]) AS [Table Name], i.name AS [Index Name],
i.index_id,user_updates AS [Total Writes],
user_seeks + user_scans + user_lookups AS [Total Reads],
user_updates - (user_seeks + user_scans + user_lookups) AS [Difference]
FROM sys.dm_db_index_usage_stats AS s WITH (NOLOCK)
INNER JOIN sys.indexes AS i WITH (NOLOCK)
ON s.[object_id] = i.[object_id]
AND i.index_id = s.index_id
WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
AND s.database_id = DB_ID()
AND user_updates > (user_seeks + user_scans + user_lookups)
AND i.index_id > 1
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC OPTION
(RECOMPILE);
-- Look for indexes with high numbers of writes
-- and zero or very low numbers of reads
-- Consider your complete workload
-- Investigate further before dropping an index!

查看缺失的索引信息：

-- Missing Indexes current database by Index Advantage
SELECT user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)
AS [index_advantage],
migs.last_user_seek, mid.[statement] AS [Database.Schema.Table],
mid.equality_columns, mid.inequality_columns, mid.included_columns,
migs.unique_compiles, migs.user_seeks, migs.avg_total_user_cost,
migs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS migs WITH (NOLOCK)
INNER JOIN sys.dm_db_missing_index_groups AS mig WITH (NOLOCK)
ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid WITH (NOLOCK)
ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID() -- Remove this to see for entire instance
ORDER BY index_advantage DESC OPTION (RECOMPILE);
-- Look at last user seek time, number of user seeks
-- to help determine source and importance
-- SQL Server is overly eager to add included columns, so beware
-- Do not just blindly add indexes that show up from this query!!!

查看緩存的執(zhí)行計(jì)劃中缺失索引警告：

-- Find missing index warnings for cached plans in the current database
-- Note: This query could take some time on a busy instance
SELECT TOP(25) OBJECT_NAME(objectid) AS [ObjectName],query_plan,
cp.objtype, cp.usecounts
FROM sys.dm_exec_cached_plans AS cp WITH (NOLOCK)
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE CAST(query_plan AS NVARCHAR(MAX)) LIKE N'%MissingIndex%'
AND dbid = DB_ID()
ORDER BY cp.usecounts DESC OPTION (RECOMPILE);
-- Helps you connect missing indexes to specific stored procedures or queries
-- This can help you decide whether to add them or not

查看SQL Server緩沖池中占用最多空間的表和索引：

-- Breaks down buffers used by current database
-- by object (table, index) in the buffer cache
SELECT OBJECT_NAME(p.[object_id]) AS [ObjectName],
p.index_id, COUNT(*)/128 AS [Buffer size(MB)], COUNT(*) AS [BufferCount],
p.data_compression_desc AS [CompressionType]
FROM sys.allocation_units AS a WITH (NOLOCK)
INNER JOIN sys.dm_os_buffer_descriptors AS b WITH (NOLOCK)
ON a.allocation_unit_id = b.allocation_unit_id
INNER JOIN sys.partitions AS p WITH (NOLOCK)
ON a.container_id = p.hobt_id
WHERE b.database_id = CONVERT(int,DB_ID())
AND p.[object_id] > 100
GROUP BY p.[object_id], p.index_id, p.data_compression_desc
ORDER BY [BufferCount] DESC OPTION (RECOMPILE);
-- Tells you what tables and indexes are
-- using the most memory in the buffer cache

查看數(shù)據(jù)庫(kù)中所有表的大小及數(shù)據(jù)壓縮狀態(tài)：

-- Get Table names, row counts, and compression status
-- for the clustered index or heap
SELECT OBJECT_NAME(object_id) AS [ObjectName],
SUM(Rows) AS [RowCount], data_compression_desc AS [CompressionType]
FROM sys.partitions WITH (NOLOCK)
WHERE index_id < 2 --ignore the partitions from the non-clustered index if any
AND OBJECT_NAME(object_id) NOT LIKE N'sys%'
AND OBJECT_NAME(object_id) NOT LIKE N'queue_%'
AND OBJECT_NAME(object_id) NOT LIKE N'filestream_tombstone%'
AND OBJECT_NAME(object_id) NOT LIKE N'fulltext%'
AND OBJECT_NAME(object_id) NOT LIKE N'ifts_comp_fragment%'
AND OBJECT_NAME(object_id) NOT LIKE N'filetable_updates%'
GROUP BY object_id, data_compression_desc
ORDER BY SUM(Rows) DESC OPTION (RECOMPILE);
-- Gives you an idea of table sizes, and possible data compression opportunities

查看數(shù)據(jù)庫(kù)中所有索引最后一次統(tǒng)計(jì)更新的時(shí)間：

-- When were Statistics last updated on all indexes?
SELECT o.name, i.name AS [Index Name],STATS_DATE(i.[object_id],
i.index_id) AS [Statistics Date], s.auto_created,
s.no_recompute, s.user_created, st.row_count
FROM sys.objects AS o WITH (NOLOCK)
INNER JOIN sys.indexes AS i WITH (NOLOCK)
ON o.[object_id] = i.[object_id]
INNER JOIN sys.stats AS s WITH (NOLOCK)
ON i.[object_id] = s.[object_id]
AND i.index_id = s.stats_id
INNER JOIN sys.dm_db_partition_stats AS st WITH (NOLOCK)
ON o.[object_id] = st.[object_id]
AND i.[index_id] = st.[index_id]
WHERE o.[type] = 'U'
ORDER BY STATS_DATE(i.[object_id], i.index_id) ASC OPTION (RECOMPILE);
-- Helps discover possible problems with out-of-date statistics
-- Also gives you an idea which indexes are most active

查看當(dāng)前數(shù)據(jù)庫(kù)中碎片最多的索引：

-- Get fragmentation info for all indexes
-- above a certain size in the current database
-- Note: This could take some time on a very large database
SELECT DB_NAME(database_id) AS [Database Name],
OBJECT_NAME(ps.OBJECT_ID) AS [Object Name],
i.name AS [Index Name], ps.index_id, index_type_desc,
avg_fragmentation_in_percent, fragment_count, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(),NULL, NULL, NULL ,'LIMITED') AS ps
INNER JOIN sys.indexes AS i WITH (NOLOCK)
ON ps.[object_id] = i.[object_id]
AND ps.index_id = i.index_id
WHERE database_id = DB_ID()
AND page_count > 500
ORDER BY avg_fragmentation_in_percent DESC OPTION (RECOMPILE);
-- Helps determine whether you have fragmentation in your relational indexes
-- and how effective your index maintenance strategy is

如果你發(fā)現(xiàn)超過(guò)10%碎片的索引，你就需要決定是重組還是重建它們。重組通常是Online操作，能在任何時(shí)間停止。重建可以是Online或Offline操作。在SQL Server 2012中，你可以O(shè)nline重建聚集索引，而不必考慮表包含什么類型的數(shù)據(jù)。收縮數(shù)據(jù)文件非常消耗資源。不要犯常見的錯(cuò)誤定期去重建所有索引，這非常浪費(fèi)資源。你可以在Internet上找一些好的索引維護(hù)腳本。Ola Hallengren開發(fā)了一個(gè)非常好的腳本，你可以從 http://ola.獲取。

本站是提供個(gè)人知識(shí)管理的網(wǎng)絡(luò)存儲(chǔ)空間，所有內(nèi)容均由用戶發(fā)布，不代表本站觀點(diǎn)。請(qǐng)注意甄別內(nèi)容中的聯(lián)系方式、誘導(dǎo)購(gòu)買等信息，謹(jǐn)防詐騙。如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容，請(qǐng)點(diǎn)擊一鍵舉報(bào)。

轉(zhuǎn)藏 分享

QQ空間 QQ好友新浪微博微信

獻(xiàn)花（0） +1

來(lái)自：曾淼Mark > 《MSSQL》

舉報(bào)/認(rèn)領(lǐng)