I. Introduction
II. Overview
III. Measuring Performance
- Perfmon
- Perfmon Counter Set
- Using perfmon
- Evaluating Perfmon Data
- Profiler
- Evaluating Profiler Data
- Third Party Tools
- Server Performance
- Database Performance
- Indexing
- Files and FileGroups
- Normalization
- Data Types
- Others
- TSQL Performance
- Client Access
I. Introduction
Have you heard this statement or others like it? This is usually delivered early in a development cycle when someone, probably not the person spewing these words of "wisdom," suggests that performance should be taken into account when designing and building the database, or laying out the new server configuration, or writing a trigger. Don't listen to them. Performance is as important to a small system as it is to a large one. A tiny database with 10 users is as important to those 10 users as Amazon.com is to the 10 million users it supports. It's true that worrying about the difference between a 20ms and a 5ms query on the small system may be a waste of time, but there is plenty of work to do before you get down to worrying about that sort of minutiae. The goal of this crib sheet is to provide, in broad strokes, a place to get basic performance information and tuning that applies equally well to SQL Server 2000 or 2005.
2. Overview
How much performance is enough? Where do you start tuning? When do you stop tuning? Each application being developed will answer these questions in a different way. The important thing is not to establish a single mechanism for answering them. Your goal is to establish best practices and guidelines that will lead to the answers in the right way for the application under consideration.
First, and most important, the SQL Server system itself needs to be configured correctly. It also needs to be running on a correctly configured Windows server. This is the foundation on which the databases will be built. After the server is configured, you need to design and build the database to perform optimally. That's assuming you're building a new database. If you're trying to tune an inherited database, then you'll want to know what a good database looks like. Once the server and database are out of the way, you need to be concerned with the code running against it. This means the views, triggers, functions and, in SQL Server 2005, the CLR code. It doesn't stop there because you need to be sure that the development staff is accessing the database correctly either in their general use of the database and it's code, or in their own use of ADO or whatever other client they might be using to access the system.
In order to address all these concerns you need to understand how to measure and test performance. Once you've measured the performance and found it wanting, you'll need to know what to do about it. After addressing structural changes to the database or changes to the server or refactoring the TSQL code, you'll need to have a mechanism in place to test the results in order to be sure your work is accurate.
After all this, you should have a correctly functioning system that performs and scales well.
III. Measuring Performance
While setting up the server and designing a database are the foundations on which performance is built, understanding how to measure performance allows you to verify the decisions you've made are the right ones and provides you with direction on future decisions. The two main areas that you'll measure are the performance of the server itself, including both Windows and SQL Server, and the performance of databases and their associated code within SQL Server. To well and truly understand what's happening with your system you'll combine these two sets of measures. There are some tools provided with SQL Server, and whatever flavor of Windows server you're running, that perform these measurements. An entire industry has grown around monitoring and measuring performance of SQL Server and Windows.
Perfmon
Microsoft provides Performance Monitor Counters as a means for keeping an eye on the server. These are accessed through a product called Performance Monitor, commonly referred to as perfmon, from the name of the executable. The counters themselves are grouped into Performance Objects. These vary from the mundane of Physical Disk, Server, Memory, Processor and Cache to the more obscure like Telephony and WMI Objects, all included as part of a standard server installation. After you install SQL Server an additional list of counters specific to SQL Server Objects to measure how the server is behaving such as SQL Statistics, Buffer Manager, Cache Manager, SQL Server Memory and more are available. It can be quite overwhelming when you consider that each object then lists the available counters. So for example the Buffer Manager object contains 21 counters from "AWE Lookup Maps/¬Sec" to "Total Pages." Defining a standard set of counters that capture the core information to monitor the system is a must.
Perfmon Counter Set
As an initial list, collecting all the counters for these objects can act as a good baseline for data to make performance tuning decisions:
Memory
Network Segment
Physical Disk
Processor
Server
System
SQL Server: Access Methods
SQL Server: Buffer Manager
SQL Server: General Statistics
SQL Server: Locks
SQL Server: SQL Statistics
Using perfmon
when you first open performance it will have three basic counters your current machine up and running. assuming that not logged onto the server in order to add counters need to type in the machine name. this will load the counters for that so panic if you see only the standard set of counters and not sql server specific counters. Perfmon can be used one of two ways.
1. You can add a series of counters and observe their behavior through the GUI in real time.
2. You can use perfmon's ability to create Counter Logs with Window's Scheduler to record counters for later review and manipulation.
The latter approach is certainly more systematic. Counter Logs are collections of data written to a file. You can run them over a period of time, say during year-end processing if that's when you're expecting high load, or over the course of a two hour test (more on that below in Performance Testing). Instead of simply watching the counters scroll by and losing the data forever, you'll have a record that allows you to isolate when and where bottle necks occurred.
A key point to keep in mind is that Perfmon has a data collection cycle interval. This means that if you're experiencing high loads, but short transactions, a 30 second sample rate may entirely miss the events you'd most like to record. With this in mind, when not performing a long running system test, a low interval like 10 seconds would not be unreasonable. Remember that the lower the interval the more data will be collected. Keep this in mind when planning for disk space. The log collection can contain start and stop times or define a length of time for which to run. All of these settings are available through the command line, so you can start data collection using third party tools or schedulers.
Supported file formats include binary, comma delimited and tab delimited. You can also store the data directly to SQL Server, but for most performance monitoring situations, storing the data to file rather than to SQL Server works well. It means you can run Perfmon on a machine other than that which is being monitored, which means that any I/O costs are not incurred by the monitored machine. If you were to use SQL Server, probably to set up some sort of enterprise level of monitoring, you would want to be careful to not use the machine you're monitoring to also store the data being collected as this will mask the normal performance of the machine behind the transactions necessary to support perfmon.
One other source of counter information in SQL Server 2005 is the new dynamic management view sys.dm_os_performance_counters. These are only a sub-set of the counters available in perfmon, but this subset is immediately available inside of queries for whatever monitoring solution you might be trying to put in place.
Evaluating perfmon data
Having collected the data on your server during high production use, or as part of a load test, you need to know how to evaluate these counters. Each set of counters, and each individual counter, tells a different part of the story. A massive amount of drill-down is required to define all of them. Instead, I'll focus on a very few of the most important counters - ones that can directly indicate the health of your overall system, the Server itself and SQL Server.
Server health
Starting with the server itself, you need to worry about:
- Memory
- Disk I/O
- The Processors
Memory
The most basic memory check is "Pages/¬Sec". According to the MS documentation, this counter shows the:
"Rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays."
A high number here means there is a lot of activity in the memory of your system. What constitutes "a high number" depends on the amount of memory on the machine. My laptop shows spikes of 156 pages/¬Sec under a light load, whereas one of my production servers can show a spike of over 1,000 under a normal load. Measuring your system over time will allow you to develop the knowledge of what constitutes an abnormal load.
Disk I/O
The core check for disk drives is the length of time that writes (including updates, inserts and deletes) or reads wait in the queue, and this is measured by the counter "Avg. Disk Queue Length." Again, a high number is bad. This counter can be set to average all the disks on a system or you can look at each individual disk. Averaging the disks may be an interesting exercise, but to understand what is happening on your system, you really need to set a counter for each individual disk.
Processors
Finally you can look at the activity of the processors using the "% Processor Time" counter. This counter, like the disk one, can either be an average or a precise measure of each processor and once more, the higher the number, the worse the performance. Assuming a dedicated SQL Server machine, the average CPU usage can be a useful number. However, if you start looking at machines running more than one instance, with each instance assigned to a particular processor (or set or processors), then you'll need to collect data on the individual processors.
SQL Server health
The above counters point to the general health of the server, but what about the SQL Server itself? There are a few counters here that can indicate how well the server is behaving, but keep in mind that these are only broad indicators. Detailed troubleshooting will require more detailed information to support accurate analysis.
Buffer Cache Hit Ratio
First off, the Buffer Cache Hit Ratio , located in the Buffer Manager will show what percentages of pages were found in memory, thereby avoiding a disk read. In this instance, higher is better. Under most circumstances, with a well configured machine, you should see a buffer cache hit ratio above 90% and probably average over 95%. Lower numbers indicate a lot of disk reads which will slow things down.
Full Scans¬/¬Sec
Next, as a general measure of health, it's good to look at the Full Scans/¬Sec counter in Access Methods. This is basically the number of table or index scans that the system is experiencing. A high number here shows either poorly written stored procedures or bad indexing. Either way, you need to get to work identifying the source of the problem.
Lock Requests/¬Sec
Under Locks the Lock Requests/¬Sec counter can show the number of new locks and lock conversions that are taking place on the system. This counter can be general, showing Total Locks, or it can get extremely specific, counting row ID locks (RID) in 2005 or key, file & page locks in both SQL Server 2000 & 2005. While higher numbers may be bad, depending on the circumstances, they may also be an indicator of just a lot of use on the system. If this number is spiking or growing over time, you will need to pursue more details to ascertain whether or not you have a problem.
Deadlock/¬Sec
Also under Locks, you may want to put the Deadlock/¬Sec counter on if you're experiencing deadlocks. Deadlocks, by their very nature are indicative of performance problems that require, at the least, procedure tuning and review, and a check on the indexes of a machine. Other steps may be required.
User
User Connections
Not actually an indicator, but a very useful measure in combination with all the other counters. is the User Connections counter under General Statistics. It's not that this number will necessarily indicate a problem on your server, but it helps in conjunction with the other counters to identify where real problems exist. For example, say you have a larger than normal number of locks. Check the number of user connections. If it's higher than normal, then you're probably just experiencing a spike in usage, but if it's average or below average, then you may have a problem and it's time for more detailed investigation.
Batch Requests/Sec
The last general measure is Batch Requests/Sec under SQL Statistics. This measure quite simply is the number of requests coming in to the server. This is a very general number and may be indicative of nothing more than high use, but it's a good value to track over time because it can show you how your system use is scaling. It can also be used to indicate when you have peaks and valleys in the number of user requests on the system.
All of the counters outlined above are only the beginning of the metrics you can use to get a general measure of system performance. All the other available counters will enable you to drill down into specifics within SQL Server or the server itself. After determining what the OS and the Server are up to, you will need to look inside at what the queries are doing. This is where Profiler comes into play.
Profiler
Profiler can run, similar to Performance Monitor, either in a GUI mode, or in an automated manner with outputs to files or databases. Sitting and watching the GUI window is usually referred to as SQL-TV. That may be a good way to spot check issues on a database server, or do some ad hoc troubleshooting, but for real performance monitoring you need to set up an automated process and capture the data for processing later. Profiler collects information on events within SQL Server. The broad categories of events are as follows:
• Cursors
• Database
• Errors and Warnings
• Locks
• Objects
• Performance
• Scans
• Security Audit
• Server
• Sessions
• Stored Procedures
• TSQL
• Transactions
Each of these categories has a large number of events within it. Rather than detail all the various options, the following is a minimum set of events for capturing basic TSQL performance.
Stored Procedures - RPC:Completed
This records the end point of a remote procedure call (RPC). These are the more common events you'll see when an application is running against your database.
Stored Procedures - PC:Completed
These would be calls against procedures from the system itself, meaning you've logged into SQL Server and you're calling procedures through query analyzer, or the server is calling itself from a SQL Agent process.
TSQL: SQL Batch:Completed
These events are registered by TSQL statements running locally against the server which is not the same as a stored procedure call, for example: SELECT * FROM tablename.
Each of these events can then collect a large number of columns of information, each one may or may not be collected from a given event, depending on the event and column in question and each one may collect different data from the event, again depending on the event and column in question. These columns include but are not limited to:
TextData
In the events listed above this column represents the text of the stored procedure call, including the parameters used for the individual call, or the SQL batch statement executed.
ApplicationName
This may or may not be filled in, depending on the connections string used by the calling application. In order to facilitate trouble shooting and performance tuning, it's worth making it a standard within your organization to require this as part of the connection from any application.
LoginName
The NT domain and user name that executed the procedure or SQL batch
CPU
This is an actual measure of CPU time, expressed in milliseconds, used by the event in question.
Reads
These are the count of read operations against the logical disk made by the event being captured.
Writes
Unlike the Reads, this is the physical writes performed by the procedure or SQL batch.
Duration
This is the length of time that the event captured took to complete. In SQL Server 2000 this piece of data is in milliseconds. As of SQL Server 2005, this has been changed and is now recorded in microseconds. Keep this in mind if you're comparing the performance of a query between the two servers using Trace events.
SPID
The server process ID of the event captured. This can sometimes be used to trace a chain of events.
StartTime
This records the start time of the event.
In short, a great deal of information can be gleaned from Profiler. You may or may not be aware, but in previous versions of SQL Server, running Trace, as Profiler was called previously, against a system could bring the system to its knees before you gathered enough information to get a meaningful set of data. This is no longer true. It is possible to turn on enough events and columns to impact the performance of a system but with a reasonable configuration Profiler will use much less than 1% of system resources.
That does not mean that you should load up counters on the GUI and sit back to watch your server. This will add some load and can be easily avoided. Instead, take advantage of the extended stored procedures that are available for creating and managing SQL Traces. These will allow you to gather data and write it out to disk (either a local one or a remote one). This means that you'll have the data in a transportable format that you can import into databases or spreadsheets to explore, search and clean to your heart's content. You can also write the results directly to a database, but I've generally found this to be slower, therefore having more impact on the server, than writing to a file. This is supported by recommendations in the BOL as well. Here is a basic script to create a trace for output to a file:
In order to further limit the data collected, you may want to add filters to restrict by application or login in order to eliminate noise:
EXEC sp_trace_setfilter
@trace_id,
@columnid,
@logicaloperator,
@comparisonoperator,
@value
So, for example to keep any trace events from intruding on the data collection above, we could add:
EXEC sp_trace_setfiter
@trace_id = @TraceId,
@columnid = 10, --app name column
@logicaloperator = 1, -- logical "or"
@comparisonoperator = 0, -- equals
@value = N'SQL Profiler'
The output can be loaded into the SQL Server Profiler GUI for browsing or you can run this procedure to import the trace data into a table:
SELECT * INTO temp_trc
FROM fn_trace_gettable('c:\temp\my_trace.trc', DEFAULT);
Evaluating Profiler data
Now that you've collected all this data about your system, what do you do with it? There are many ways in which you can use Profiler data. The simplest approach is to use the data from the Duration column as a measure of how slowly a procedure is performing. After collecting the information and moving it into a table, you can begin writing queries against it. Grouping by stored procedure name, stripping off the parameters from the string, will allow you to use aggregates, average, maximum, and minimum, to outline the poorest performing procedures on the system. From there you can look at their CPU or I/O and query plans in order to identify the tuning opportunities available to you.
With the release of SQL Server 2005, one additional piece of functionality was added to Profiler which will radically increase your ability to identify where you have performance issues. You now have the ability to load a performance monitor log file and a profiler trace file into the Profiler. This allows you to identify points of time when the server was under load and see what was occurring within the database at the same time.
Third Party Tools
There is an entire industry built around monitoring servers and databases. Most of these use the same counters that you can access through Performance monitor or Profiler. A few of these tools find ways to monitor packets or logs or memory and provide a slightly different set of measures. However, most of them do things that you could do for yourself, eventually. What they offer is the ability to set up monitoring and alerts and take advantage of all the work that they've put into recording the data over time, setting up reports, building enterprise level alerts, etc. If you have more than a handful of servers to monitor, doing the research and finding a third party tool that works in a manner that you find acceptable is worth the time and effort they will save you.
IV. Tuning Performance
In short, this is setting or changing the server configuration, the database indexing or storage or the T-SQL code in order to achieve enhanced performance. After gathering information about what is happening on your machine, you can begin the process of identifying where you are experiencing slow performance. Once you've identified the source you can do something about it.
Server Performance
The quickest possible solution here is to get more and faster CPUs running on more and faster memory against bigger and faster disks. There, we're done. What's that, you've got a big honking machine and it isn't performing properly, or you've got a limited budget so you need to squeeze more from the machine that you have? OK. Listed below are some areas where you can change settings in order to make a difference to how your machine is configured. Most of these are suggestions as each and every set of circumstance is unique.
Rather than let SQL Server manage memory allocation, and grow and shrink it as necessary, simply determine the maximum memory that you can allocate to the server and fix the memory allocation at this point. Variable memory is only helpful if you're running SQL Server in a mixed environment with other applications (this, by the way, will lead to poor performance as the server has to contend with other applications for precious memory, cpu, and disk resources).
You should reserve somewhere between 1GB and 2GB of memory for the OS, depending on the amount of memory on the system.
You should enable AWE (Address Windowing Extensions) on systems with more than 4GB of memory.
Make sure that the maximum number of processors (MAX DOP) is set to the actual number of physical processors on your machine, not the number of hyper-threads.
One of the most consistent bottlenecks in SQL Server is the tempdb. This is used when applications create objects such as temporary tables, but it is also used when rebuilding indexes and sorting data. Create one tempdb data file for each physical processor on the machine. Where possible, isolate the tempdb files to their own drives.
Database Performance
Database performance almost always comes down to I/O. How much data can you get in/out of the system and how quickly? First, and most important, will be your actual database design. It can't be over-emphasized that a poorly constructed database, no matter how sophisticated the server, will perform badly.
Indexing
To start with, you need to plan an indexing strategy at the same time as you plan the database. First, and most important, is the clustered index. As a rule, every table in an online transactional processing (OLTP) system should get a clustered index. There are exceptions, but the exceptions should be exceptional. You can only put one clustered index on a given table, so the proper selection of exactly what column or columns to place it on is extremely important. By default, most tools simply place the clustered index on the primary key. This may well be the appropriate place to put a clustered index, but you need to evaluate how the data is most likely to be accessed.
It makes sense to leave the cluster on the primary key if that primary key provides the natural access path, the most common field used to either search the table or relate the table to another.
Changing the parameters slightly, if the access path is mostly through another column, say Company Name, this may make a more appropriate clustered index. Another situation is when the table is no longer at the top of the chain, but somewhere in the middle and the most likely avenue of selection for the data is going to come through the foreign key to a parent table. Then the foreign key column becomes a good candidate for the clustered index. A further wrinkle could be added by needing to get the related data in this child table, but based on a date. This would result in a clustered index composed of two columns: the foreign key and the date. As you can see, in a few sentences a number of options were laid out. You need to think this through as you design the system.
You may also identify, either during design, testing or monitoring the production system, that other indexes are needed. While multiple indexes on any given table may be needed, you need to keep in mind that each index adds overhead to the system because these values have to be maintained as the data gets modified, which includes inserts, updates and deletes. Further, since indexes are stored by pointing to the clustered index (one of the many reasons you need a clustered index on the table) changes to the clustered index can result in a cascade through all the other indexes on a table. Because of all this, while indexes are good and necessary, restraint must be exercised in their application & use.
Files and FileGroups
Other factors that can affect the performance of your system include the way the files and file groups are laid out. You should be creating multiple files for your databases to optimize performance. A baseline for this would be to create one file for the logs, another for the data (defined by the clustered index), and another for non-clustered indexes. Additional files may be necessary to separate out BLOB data or XML data or unusually active tables, each onto its own file, and where possible, onto its own disk. This has been found to be true even on SAN systems because distributing the load takes further advantage of the architecture of the SAN.
Normalization
The data stored and the data retrieved should be defined appropriately. This means normalizing the storage. If you simply want a flat file, don't put it into a relational database system. You're paying for overhead you don't need and sacrificing benefits that you could achieve. That said, targeted denormalization, picking some fields to duplicate rather than maintaining a perfect third normal form system, will provide some performance benefits.
Data Types
Define the data types that you need, not what you think you might need someday. A phone number is a string, not a number. Define the length of field that you need and enforce that length.
Other Issues
Simple things can make a difference too.
Turn auto-shrink off.
Make sure auto-update of statistics is turned on.
If a database is read only, set it to read only.
Use triggers very judiciously. They mostly operate in the background making them difficult to monitor and troubleshoot.
Be very careful of autogrowth settings on the database. 10% autogrowth will be fine when the database is 500mb. It makes a huge difference when the system is 50gb. For larger databases, change the setting to grow by a fixed amount rather than a percentage of total database size.
TSQL Performance
After you've configured your server and built a functional database, you'll need to move data in and out of it. Most of this will be through TSQL queries. These queries should be defined within stored procedures and will make use of views and user-defined functions to manipulate and access sets of data. The most important part of this concept is the set. Most people think in terms of pieces of data instead of sets of data. This conceptual shift, to manipulating the data in batch instead of row by row delivers the biggest performance benefits when working with TSQL. Learning TSQL syntax and set-based querying methodologies up front will provide more performance benefits by having well written procedures up front. This is much easier than attempting to tune or fix hundreds or even thousands of poorly written procedures after the fact.
Writing TSQL queries that perform well isn't always as easy as it sounds. The target needs to be to work with the TSQL compiler and optimizer, processes internal to the SQL Server itself, to provide them with queries that they can tune to deliver your data. You must then start with the basics and get the simple stuff right at the start.
- Make sure that your queries are written to manipulate only the data you need.
- Ensure that simple things like qualifying database objects by their owning user or schema are a regular part of your coding practices
- Learn and use the latest ANSI style of syntax employed by SQL Server (ANSI 99 for 2005, ANSI 92 for 2000).
- Avoid cursors as much as possible. While there are some good uses for them, they usually become a crutch used instead of learning how to manipulate the data in sets.
- Remember that transactions and transaction processing within the procedure should be kept as small as practicable. For example, if you need to do an insert and read some data, separate the read from the insert.
- Minimize the dependence in your code on constructs such as table variables or temporary tables. Again, these very useful tools frequently substitute a piece-meal approach for one that emphasizes sets of data.
- When writing stored procedures, things as simple as making the data type of parameters match the data type of the columns being referenced can make a big difference.
Once you've identified a stored procedure or query as being problematic, you'll want to tune it. This is where a query plan comes into play. SQL Server 2000 can display either a graphical plan (estimated and actual) or a text plan. SQL Server 2005 adds on the XML plan. There are advantages to each type of plan. The graphical plan can be a quick and easy way to peruse the actions a query has put the system through. Especially useful is the ability to display the estimated query plan, which could identify problems without having to actually execute a query. Text and XML query plans present more data immediately to the user and can be searched or parsed through code, allowing for some automation of tuning if you so desired it. Some of the things to look for in your query plans are scans, work tables, or hash joins. Each of these can usually be fixed by adding an index to the system or adjusting the query so that the join or where clause takes advantage of an existing index.
Client Access
All the good work done within SQL Server can be undone by a poorly written piece of client code. More often than not, the DBA can't write that code for the company. However, you can monitor, mostly through Profiler, what that code is doing to your system and how it is doing it. This is where you can make a difference.
If transactions are managed by code instead of by the database, observe these transactions to ensure that they are as short as possible and that any data access is isolated from other client side processing.
A classic error is to open a database transaction and then wait for user input. Be sure that the application is using the procedures you provided in the manner in which they were meant. You may have a large query that returns a lot of data being called frequently. Talking to the developers you may find that they only need one or two columns from this result set. Providing them with a modified or new procedure can save lots of processing time.
Be sure that the latest ADO or ADO.NET is in use. Be sure that the connections being made are using the settings you expect since things such as the connection timeout can be different than the database timeout resulting in odd behavior within your application. Again, moving only the data you need and only when you need it should be a working maxim.
V. Testing Performance
Instead of "discovering" problems only when the application hits production, try loading up the database with some data and run tests against it. While it is time consuming, database testing is as easy as that. First, you need a good understanding of the data. Just how many rows of what type of data are you likely to see? Then you need a good understanding of your data model so that you know that "x" parents have so many children on various tables of type "y". This isn't a technical issue. It's primarily a series of questions put to your business.
Once this is defined, you either have to define your transactions, or put Profiler to work capturing a set of transactions from the application. Obviously the second approach is preferable since the data will be much more accurate.
For this you can use Profiler to "replay" the transactions you captured against a system with a lot more data in it. This will allow you to identify where you'll be experiencing performance problems due to the amount of data in the system, your indexes, table design and your TSQL code. What this won't show you is what happens when you have multiple users hitting the system at the same time.
This is a much more difficult type of testing. I'd recommend looking into obtaining a third party tool such as Quest Benchmark Factory or Idera Diagnostic Manager. These allow you to take the series of transactions you've recorded and run them through, with multiple users simultaneously hitting the system. Testing of this type allows you to identify performance problems due to contention, deadlocks, and again your TSQL code, table design and indexes.
If nothing else, open up each query and look at the query plan to identify areas that might cause you trouble down the road. While that table scan is hitting 75 rows, it doesn't really affect performance, but when it's hitting 75,000 or 75 million rows, you'll probably notice. While running through all these tests, use Performance Monitor and Profiler to capture the data. This can then be used over time to track how well your performance tuning is working out.
No comments:
Post a Comment