Benchmarking
When a database becomes big, query performance can suffer. It becomes important to measure the performance of the system and this is called benchmarking. We can measure many things like number of transactions per unit time, response time, the scalability or support for concurrency. When benchmarking, it is important not to use only a subset of real data, incorrectly distributed data or test using a single connection even though the database is supposed to be used concurrently.
ab
Various tools exist which help to benchmark a web application. One of them is the Apache Benchmarking tool (ab) which benchmarks the web application as a whole i.e. the full stack (operating system, database, web server, scripting language).
The work to do is to use ab to evaluate the performance of the web application written before.
mysqlslap
Another important benchmarking tool is mysqlslap which is distributed with the MySQL database. mysqlslap measures the performance of the database (and not the web application) and has various capabilities:
- Testing using automatically generated tables and SQL queries
- Measuring performance when the database is accessed concurrently
- Running tests many times to get more significant results
- Using existing schemas and custom SQL queries when measuring performance
- Producing performance reports
The work to do is to use mysqlslap to evaluate the performance of the standard MySQL employees sample database.
Benchmarking sometimes shows that query performance is not good enough. The question then is to know which query is too slow and to find ways to make it run more quickly. Those two steps are called profiling and optimisation.
Profiling
MySQL has a slow query log (which is inactive by default) which logs all SQL queries which are taking too long to execute. Whenever an application is slow, the slow query log should be enabled and monitored to discover the culprits. This can be done with the SET GLOBAL slow_query_log = 1; command. It is also possible to change the threshold above which queries are considered to be slow. Do SET long_query_time = 2; for 2 seconds for example.
MySQL provides many ways for a programmer (or a database administrator) to discover why an SQL query is performing badly.
MySQL server status variables provide information about the execution of queries within MySQL. There are a lot of variables but the most relevant for performance have names starting with:
- Bytes_ (bytes received and sent)
- Com_ (commands the server executed)
- Created_ (temporary tables created)
- Handler_ (storage engine operations e.g. InnoDB)
- Select_ (various types of join execution plans)
- Sort_ (various types of sort operations)
The various types of join execution plans. Some of them are:
- Select_scan: refers to a table that is completely read in sequence from the hard drive.
- Select_range: refers to a table that was read from the hard drive only in the necessary places to satisfy a limited range of conditions.
- Select_full_join: is the same as Select_scan with the difference that Select_full_join applies to the second and subsequent tables in the join plan for a multiple table query.
- etc.
The work to do is to use MySQL server status variables as well as the EXPLAIN command to fully understand how queries are executed.
(MySQL also provides the following commands:
SET profiling = 1;
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;
which give the durations of the various phases of a query)