Monday, August 25, 2008

MySQL Query Analyzer: DBA Task #1: Finding Bad Queries

One of the biggest problems I faced as a field DBA was defending my production databases against poorly performing SQL code. Talking to folks at the MySQL UC, during customer visits and trolling the MySQL forums/blogs confirms that this is a common problem and even more challenging when a performance problem pops up during times when no code promotions have taken place.

Traditional/popular way of finding bad SQL code
For MySQL DBAs the challenge to identify and isolate resource intensive SQL code is really two-fold; job #1 is to find the offending code (after that the chore becomes fixing it so it performs, but that is a topic for another post). Most MySQL DBAs have used the MySQL Slow Query Log ("SQ Log") at some point to help with job #1. For those not familiar, the SQ Log is enabled by starting MySQL with --log-slow-queries[=file_name] option. Once enabled the SQ Log captures queries that took longer than X seconds (user definable value between 1 and 10) and presents them for parsing and analysis. Beginning in 5.0 it can be configured to also log any queries that do not use indexes. The SQ Log is formatted into entries that look like:

#Time: 08073101 16:25:24
#User@Host: root[root] @ localhost []
#Query_time: 8 Lock_time: 0 Rows_sent: 20 Rows_examined: 243661
SELECT part_num FROM `inventory`.`parts` WHERE
(`ven` = "foo") ORDER BY `delivery_datetime` DESC LIMIT 100;

While the SQ Log provides good details on bad code, using it comes at a cost:
- Prior to 5.1, MySQL has to be restarted to enable logging
- The log grows very large, very quickly. Running it in production is not mainstream, but can be managed via scripts and FLUSH LOGS
- Entries must be parsed/sorted for relevance
- Must use long_query_time=0 option to capture all queries, which comes with overhead of its own
- Statements are logged in raw format, there is no aggregation, or roll ups of executions times, occurrences, etc. Basically, the SQ Log is not easy to use, and takes time to navigate to find queries that are suspect and in need of tuning or elimination.

At this point, I have to admit that I *strongly* agree with those that say that MySQL performance related metrics should be part of the MySQL server and readily available for all to consume. In short, things like the SQ Log are good, but such instrumentation belongs in a true performance/wait interface that is designed into the MySQL server product. My buddy Robin Schumacher has been promoting such an Engineering spec for many moons and it appears we will see the first fruits of that work appear in 6.0. So while we are waiting we have a choice; we can continue to pine for a server side solution, or we can be proactive and create an easy to use solution that works now and will be valid for our older MySQL servers. MySQL Proxy to the rescue...

Using MySQL Proxy for Monitoring Queries/Performance
The MySQL Proxy is available under the GPL and provides a chassis for those who want to build advanced MySQL query and performance monitoring solutions. By implementing the Proxy between your apps and MySQL servers you can intercept their communication stream and build your own query and performance related data collections using Lua and then use your own custom apps to aggregate, monitor and view the data as it runs on your systems. Info (and user forums) on using the Proxy is here.

Using MySQL's Query Analyzer to find bad SQL code
MySQL Enterprise provides an out-of-the-box solution for DBAs/Devs who want to use the Proxy to monitor for poorly performing SQL code, but who don't have the time to build a solution on their own. The Enterprise Monitor Service Agent has been extended to include the Proxy chassis and required Lua extensions so that it listens on a user defined port for application queries, collects the SQL code and performance stats and then reports the results back so they can be stored in the Monitor repository for monitoring and analysis. Integrated into the Enterprise Monitor, the Query Analyzer leverages the Proxy reported metrics so DBAs and Devs can visually pinpoint bad SQL code based on aggregated exec times, number of execs, rows/data returned, etc. The cool thing is the Query Analyzer saves development time building a custom monitoring solution. It also eliminates all of the human and system overhead attributed with enabling and parsing the SQ Log. The Query Analyzer:

- minimizes/eliminates the need to enable the SQ Log to find expensive SQL code
- aggregates queries by total exec time, number of executions
- organizes and sorts SQL by relevance across all servers
- filters by query type, content
- shows trending by date range
- tracks queries by database, application
- provides drill downs into normalized and specific execution statistics
- provides full explain of worst offenders (user definable)
- provides summary of all queries in a consolidated view:

After you find a suspect statement, you can then drill into specific occurs across servers:

While enabling the Query Analyzer comes with some Agent overhead expense, it can be quickly enabled/disabled without a server restart and the SQL code and performance data that is collected and returned is stored in the Monitor repository so it can used for later analysis. For the SQ Log purists, we do plan to incorporate a means to pull over the MySQL log files into the Monitor too, but we wanted to implement a solution that helps our pre-5.1 install base (meaning, they can monitor for bad queries without enabling the SQ Log via a server re-start) as part of the first rev. Hey, if we have to wait for a true performance/wait interface, may as well make the best use of our time...

1 comment:

Anonymous said...

The Slow Query log realy saved my live over the last few moons at a exploding company. With a script (remember me again that I should opensourcing it) I collect the logs over serveral server, extract a time period and analyze it with mysqldumpslow. I generate the average time log and the overall time log. While the overall time log offers often some low hanging fruits that will make a better overall performance of the cluster, the average time log show me where users had to wait long, long times for there query.

Slow Query log (together with mysqldumpslow) is for me realy good enough(TM).