How to Troubleshoot Slow Disk Read/Write in SQL Server

In this article, we will discuss how to resolve I/O problems, which is a very important part of SQL Server troubleshooting. The storage subsystem is one of the most significant performance factors for databases. Detecting and identifying I/O problems in SQL Server can be a tough task for database administrators (DBAs). More often than not, the underlying reasons for I/O problems are:

Misconfigured or malfunctioning disk subsystems
Insufficient disk performance
Applications that generate redundant I/O activity
Poorly designed or unoptimized queries

Analyzing the symptoms should be the first principle in clarifying the underlying reason for I/O problems on SQL Server. Otherwise, we can waste time dealing with irrelevant problems or discussing them with system or storage administrators unnecessarily. Wait types give very useful information for SQL Server troubleshooting. The following wait types can point to I/O issues, but on their own they are not sufficient to conclude that there is a problem on the disks.

PAGEIOLATCH_*
WRITELOG
ASYNC_IO_COMPLETION
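To see how much these wait types have accumulated on an instance, we can query the sys.dm_os_wait_stats dynamic management view. The query below is a minimal sketch rather than a complete wait analysis: the column aliases are our own, and the percentage is calculated against all wait types, including harmless background waits, so it should be read only as a rough indication.

```sql
-- Totals are cumulative since the instance last started
-- (or since the wait statistics were last cleared).
WITH waits AS (
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           -- Total wait time across ALL wait types, computed before filtering.
           SUM(wait_time_ms) OVER () AS total_wait_time_ms
    FROM sys.dm_os_wait_stats
)
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       -- Average wait per waiting task; NULLIF avoids division by zero.
       CAST(wait_time_ms * 1.0 / NULLIF(waiting_tasks_count, 0)
            AS NUMERIC(18, 1)) AS avg_wait_ms,
       -- Share of this wait type in the total wait time of the instance.
       CAST(100.0 * wait_time_ms / NULLIF(total_wait_time_ms, 0)
            AS NUMERIC(5, 1)) AS pct_of_all_waits
FROM waits
WHERE wait_type LIKE 'PAGEIOLATCH[_]%'   -- [_] matches a literal underscore
   OR wait_type IN ('WRITELOG', 'ASYNC_IO_COMPLETION')
ORDER BY wait_time_ms DESC;
```

Because the values are cumulative, a single run mostly reflects the history of the instance; comparing two runs taken some time apart gives a better picture of the current workload.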

First, we will briefly describe these wait types and how they relate to I/O problems.

PAGEIOLATCH_*

SQL Server reserves an area of memory for itself and uses this area to cache data and index pages in order to reduce disk activity. This reserved memory area is called the buffer pool. The working mechanism of the buffer pool is very simple: data is loaded from disk into memory when a request to read or change it is received, and it is processed in the buffer pool. Modified data is later written back to disk. In light of this, PAGEIOLATCH_* waits occur while a data page is being transferred from disk to the buffer pool. It is normal to see some PAGEIOLATCH_* waits; however, it indicates a problem when we see this wait type often and more than the other wait types. PAGEIOLATCH_* does not indicate a disk problem by itself, because this wait type can occur for a variety of reasons. For example:

Outdated statistics or poorly designed indexes can cause PAGEIOLATCH_* waits because these problems cause redundant disk activity
Enabling the CDC (Change Data Capture) option can cause extra I/O workload
Insufficient memory can cause PAGEIOLATCH_* waits because SQL Server cannot keep data pages in the buffer cache long enough. Another sign of this problem is a low Page Life Expectancy metric
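The Page Life Expectancy metric mentioned above can be read from the sys.dm_os_performance_counters view. The following is a small sketch; the column alias is our own choice, and the LIKE filter on object_name is an assumption made so that the query works for both default and named instances.

```sql
-- Page life expectancy: the number of seconds a page is expected to stay
-- in the buffer pool without being referenced.
SELECT [object_name],
       counter_name,
       cntr_value AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';  -- instance-wide value
```

A value that stays low or keeps dropping sharply while PAGEIOLATCH_* waits are high suggests memory pressure rather than a problem with the disks themselves.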

WRITELOG

When any modification is performed in the database, SQL Server writes this modification to the log buffer and then writes the buffer contents to disk. Therefore, this wait type is related to the physical disk that contains the log file (ldf). Placing log files on disks that are as fast and as dedicated as possible is the correct approach to overcome these problems. At the same time, the performance statistics of the physical disks that store the ldf files should be considered when this problem occurs. Log data is written to disk sequentially, and it is also read sequentially. Because of this working principle, the disks selected for the log files must deliver good sequential read and write throughput along with minimal latency.
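To check how the disks behind the log files are performing, we can restrict the sys.dm_io_virtual_file_stats function (described in the I/O Stalls section of this article) to log files only. This is a sketch with our own column aliases, and the averages cover the whole period since the counters were last reset.

```sql
-- Average write latency of the transaction log files only.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_writes,
       vfs.io_stall_write_ms,
       -- NULLIF avoids division by zero for files with no writes yet.
       CAST(vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0)
            AS NUMERIC(18, 1)) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id
   AND vfs.file_id = mf.file_id
WHERE mf.type_desc = 'LOG'      -- keep only transaction log files
ORDER BY avg_write_latency_ms DESC;
```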

ASYNC_IO_COMPLETION

This wait type occurs when SQL Server processes backup and restore operations; however, when these operations take more time than usual, it might be a warning sign of I/O problems. If BACKUPIO waits are seen together with ASYNC_IO_COMPLETION, we can suspect a disk problem.
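While a backup or restore is running, we can look at its current wait type and progress in sys.dm_exec_requests. The query below is only a sketch: the column aliases are ours, and filtering on the command text is an assumption that covers the usual BACKUP and RESTORE commands.

```sql
-- Currently running backup and restore requests and what they wait on.
SELECT r.session_id,
       r.command,
       r.wait_type,
       r.wait_time AS current_wait_ms,
       r.percent_complete,
       -- estimated_completion_time is reported in milliseconds.
       r.estimated_completion_time / 1000 / 60 AS estimated_minutes_remaining
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE 'BACKUP%'
   OR r.command LIKE 'RESTORE%';
```

If percent_complete advances much more slowly than in previous runs of the same backup, that supports looking further at the disks involved.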

I/O Stalls

I/O stall time is an indicator that can be used to detect I/O problems. sys.dm_io_virtual_file_stats is a dynamic management function that gives detailed information about the stall times of the data and log files, so it simplifies the SQL Server troubleshooting process. This dynamic management function takes two parameters: the first is the database id, and the second is the database file id.

SELECT * FROM
sys.dm_io_virtual_file_stats (
    { database_id | NULL },
    { file_id | NULL }
)

We can execute this dynamic management function for all databases as shown below.

SELECT db.name, vfs.*
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.databases AS db
    ON vfs.database_id = db.database_id

Using dm_io_virtual_file_stats function for SQL Server troubleshooting.

database_id: This column represents the id number of the database, and we can use the sys.databases view to obtain all database id numbers.

file_id: This column represents the id number of the file, and we can use the sys.master_files view to obtain the file id numbers.

sample_ms: This column shows the number of milliseconds since the server was started.

num_of_reads: This column shows the number of physical reads that occurred since the server restarted.

num_of_bytes_read: This column shows the total amount of physical reads, in bytes, that occurred since the server restarted.

io_stall_read_ms: This column shows the total latency of the read operations, in milliseconds.

num_of_writes: This column shows the number of writes that occurred since the server restarted.

num_of_bytes_written: This column shows the total amount of writes, in bytes, that occurred since the server restarted.

io_stall_write_ms: This column shows the total latency of the write operations, in milliseconds.

io_stall: This column shows the total latency of all I/O operations, in milliseconds.
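Because all of these counters are cumulative, a single reading averages over the whole uptime and can hide a problem that is happening right now. One way around this is to take two snapshots and compare them. The script below is a sketch of that idea; the temporary table name #vfs_before, the 30-second interval, and the column aliases are our own choices.

```sql
-- Drop a leftover snapshot from a previous run in this session, if any.
IF OBJECT_ID('tempdb..#vfs_before') IS NOT NULL
    DROP TABLE #vfs_before;

-- First snapshot of the cumulative counters.
SELECT database_id, file_id,
       num_of_reads, num_of_writes,
       io_stall_read_ms, io_stall_write_ms
INTO #vfs_before
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- Sampling interval (an arbitrary choice; adjust as needed).
WAITFOR DELAY '00:00:30';

-- Second snapshot minus the first gives the activity in the interval.
SELECT DB_NAME(a.database_id) AS database_name,
       a.file_id,
       a.num_of_reads  - b.num_of_reads  AS reads_in_interval,
       a.num_of_writes - b.num_of_writes AS writes_in_interval,
       CAST((a.io_stall_read_ms - b.io_stall_read_ms) * 1.0
            / NULLIF(a.num_of_reads - b.num_of_reads, 0)
            AS NUMERIC(18, 1)) AS avg_read_latency_ms,
       CAST((a.io_stall_write_ms - b.io_stall_write_ms) * 1.0
            / NULLIF(a.num_of_writes - b.num_of_writes, 0)
            AS NUMERIC(18, 1)) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS a
JOIN #vfs_before AS b
    ON a.database_id = b.database_id
   AND a.file_id = b.file_id
ORDER BY avg_read_latency_ms DESC;

DROP TABLE #vfs_before;
```

Files with no reads or writes during the interval show NULL latency, which simply means there was nothing to measure.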

High stall times point to I/O problems and busy disk activity. With the help of the following query, we can find the read, write, and total latency of the database files so that we can diagnose any storage problems.

SELECT DB_NAME(vfs.database_id) AS database_name, physical_name AS [Physical Name],
       size_on_disk_bytes / 1024 / 1024. AS [Size of Disk],
       CAST(io_stall_read_ms / (1.0 + num_of_reads) AS NUMERIC(10, 1)) AS [Average Read latency],
       CAST(io_stall_write_ms / (1.0 + num_of_writes) AS NUMERIC(10, 1)) AS [Average Write latency],
       CAST((io_stall_read_ms + io_stall_write_ms)
            / (1.0 + num_of_reads + num_of_writes)
            AS NUMERIC(10, 1)) AS [Average Total Latency],
       num_of_bytes_read / NULLIF(num_of_reads, 0) AS [Average Bytes Per Read],
       num_of_bytes_written / NULLIF(num_of_writes, 0) AS [Average Bytes Per Write]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
ORDER BY [Average Total Latency] DESC

Analyzing I/O latency for SQL Server troubleshooting

The Average Total Latency column represents the total latency of the database files, and we can use the following table as a reference to evaluate disk performance against latency.

Excellent	< 1 ms
Very good	1 – 5 ms
Good	5 – 10 ms
Poor	10 – 20 ms
Bad	20 – 100 ms
Very Bad	100 – 500 ms
Atrocious	> 500 ms
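The reference table above can be turned into a CASE expression so that each latency value is labeled automatically. The sketch below runs against a few illustrative sample values instead of real measurements, and the handling of the exact boundary values (for example, whether 5 ms counts as "Very good" or "Good") is our own assumption, since the table does not define it.

```sql
-- Label latency values according to the reference table.
-- The sample values in the VALUES list are illustrative only.
SELECT s.latency_ms,
       CASE
           WHEN s.latency_ms < 1    THEN 'Excellent'
           WHEN s.latency_ms < 5    THEN 'Very good'
           WHEN s.latency_ms < 10   THEN 'Good'
           WHEN s.latency_ms < 20   THEN 'Poor'
           WHEN s.latency_ms < 100  THEN 'Bad'
           WHEN s.latency_ms <= 500 THEN 'Very Bad'
           ELSE 'Atrocious'
       END AS rating
FROM (VALUES (0.4), (7.5), (229.0), (650.0)) AS s (latency_ms);
```

The same CASE expression can be applied to the Average Total Latency column of the previous query to add a rating next to each database file.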

Using Performance Monitor to Analyze I/O Problems

Performance Monitor, also known as Perfmon, is a tool that helps to track metrics about computer resources and installed applications. In particular, Perfmon assists in analyzing and troubleshooting SQL Server performance problems because it includes specific counters for SQL Server in addition to the general resource counters, which gives it a key role in SQL Server troubleshooting. When we focus on the I/O counters of Perfmon, some of them come to the forefront. First of all, we should keep an eye on the latency metrics, because these values tell us a great deal about disk performance.

Latency is a performance metric that measures the time gap between a request to the disk and its response. We can use the following counters to measure disk latency.

Avg. Disk sec/Transfer counter shows the total latency, and its values should stay under 10 milliseconds
Avg. Disk sec/Read counter shows the read latency
Avg. Disk sec/Write counter shows the write latency

Measuring disk latency with Perfmon for SQL Server troubleshooting

When we analyze the image above, we see that this system performs very poorly. The average latency is 0.229 seconds, which equals 0.229 * 1000 = 229 milliseconds.

IOPS (input/output operations per second) is a disk performance metric that measures the total number of input and output operations performed by the disk in one second.

Disk Reads/sec counter indicates read IOPS
Disk Writes/sec counter indicates write IOPS
Disk Transfers/sec counter indicates the total IOPS; this value equals the sum of the Disk Reads/sec and Disk Writes/sec counters

The throughput metric indicates how many megabytes can be read or written by the disk subsystem per second. The throughput value varies according to the disk infrastructure, disk types, and vendors. For this reason, no exact target value can be given for this counter.

Disk Bytes/sec counter shows the total throughput of the disk per second
Disk Read Bytes/sec counter shows the read throughput
Disk Write Bytes/sec counter shows the write throughput
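When Perfmon is not at hand, a rough approximation of IOPS and throughput can also be derived inside SQL Server from two snapshots of sys.dm_io_virtual_file_stats. The script below is a sketch under some assumptions: the temporary table name #io_before, the 10-second interval, and the column aliases are ours, and the figures describe I/O per database file as SQL Server sees it, so they will not match the per-disk Perfmon counters exactly.

```sql
-- Drop a leftover snapshot from a previous run in this session, if any.
IF OBJECT_ID('tempdb..#io_before') IS NOT NULL
    DROP TABLE #io_before;

-- First snapshot of the cumulative counters.
SELECT database_id, file_id, sample_ms,
       num_of_reads, num_of_writes,
       num_of_bytes_read, num_of_bytes_written
INTO #io_before
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- Sampling interval (an arbitrary choice; adjust as needed).
WAITFOR DELAY '00:00:10';

-- Differences divided by the elapsed time give per-second rates.
SELECT DB_NAME(a.database_id) AS database_name,
       a.file_id,
       -- Operations per second: delta of operations / elapsed seconds.
       CAST((a.num_of_reads - b.num_of_reads) * 1000.0
            / NULLIF(a.sample_ms - b.sample_ms, 0)
            AS NUMERIC(18, 1)) AS read_iops,
       CAST((a.num_of_writes - b.num_of_writes) * 1000.0
            / NULLIF(a.sample_ms - b.sample_ms, 0)
            AS NUMERIC(18, 1)) AS write_iops,
       -- Throughput in MB per second (1 MB = 1048576 bytes).
       CAST((a.num_of_bytes_read - b.num_of_bytes_read) / 1048576.0 * 1000.0
            / NULLIF(a.sample_ms - b.sample_ms, 0)
            AS NUMERIC(18, 2)) AS read_mb_per_sec,
       CAST((a.num_of_bytes_written - b.num_of_bytes_written) / 1048576.0 * 1000.0
            / NULLIF(a.sample_ms - b.sample_ms, 0)
            AS NUMERIC(18, 2)) AS write_mb_per_sec
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS a
JOIN #io_before AS b
    ON a.database_id = b.database_id
   AND a.file_id = b.file_id
ORDER BY read_iops DESC;

DROP TABLE #io_before;
```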

Measuring disk throughput with Perfmon for SQL Server troubleshooting

Conclusion

In this article, we learned the basic methods that help to diagnose and troubleshoot SQL Server I/O problems. To overcome this type of problem, we need to observe all the metrics that can help to figure out the main problem. Understanding the main problem is a very significant point in SQL Server troubleshooting.

Author

Esat Erkec is a SQL Server professional who began his career 8+ years ago as a Software Developer. He is a SQL Server Microsoft Certified Solutions Expert.

Most of his career has been focused on SQL Server Database Administration and Development. His current interests are in database administration and Business Intelligence. You can find him on LinkedIn.

Source: https://www.sqlshack.com/sql-server-troubleshooting-disk-i-o-problems/