A database is a collection of data that is arranged in such a way that it can be easily retrieved, managed and updated. Data is stored in rows, columns and in tables. It is indexed for quicker retrieval of required information. First we discuss the basics in brief then we will move on to this common question – How to optimize database response time?
Types of databases
There are mainly four structural database management systems:
In recent times, NoSQL and object-oriented databases have evolved for storing data. NoSQL databases do not follow the row/column/table approach. NoSQL databases are document based and include a collection of a key-value pair, graph databases which do not have the traditional schema as in RDBMS. NoSQL databases are well-suited for working with huge sets of distributed data. Only limited outside experts are available for assisting in the setup and deployment of large-scale NoSQL deployments.
Relational databases are the most commonly used database systems. SQL Server, Oracle Database, Sybase and MySql are some examples of Relational databases. RDBMS permits multiple users to access the data at the same time without compromising security factors. A table in RDBMS works similar to a spreadsheet. A Set of tables is referred to as a schema. A number of schemas combined together make up a database. A single server can contain multiple numbers of databases.
Let us now discuss how to optimize database response time
The below- mentioned tips must be considered prior to the design of a database and before writing SQL Scripts to optimize database response.
1. Database Statistics: An SQL Optimizer must be well aware of the statistics included in the different tables within each catalog. Details about the indexes and their distribution are referred to as Statistics. Using these details, an optimizer decides the optimum path for satisfying a query. Outdated or missing statistics details will cause the optimizer to take a path that is not optimum and this leads to an increase in response time. Consider the below example query:
where city = “Pune”
and phone =213-345-346;
Here Customer is the table name.
In the above query, Where clause has two fields. There are two indexes defined, each containing one field. The optimizer can use only ONE INDEX per table. In this case, the query will run much faster if the optimizer uses the phone field as the index which returns the least number of rows. So, if the database is not properly updated and if the right field is not selected as the index, there will be an enormous increase in the response time.
2. Construct Optimized Indexes: Indexes are very crucial for the construction of any query. But, many indexes slow down the speed of the (Insert, Update or Delete) queries. It is necessary to maintain a proper balance of index on tables. Fields that are included in the table and their order also plays a vital role.
(a) Composite Indexes: When an index contains more than one field, it is known as the composite index. Composite indexes are constructed when queries which include multiple fields in the where clause are executed. In this case, all fields combined together will significantly generate a minimum number of rows than the first field alone.
(b) Clustered Index: The physical order of arrangement of data in a table is referred to as the clustered index. This means that the actual data is sorted based on the index fields. Example: A telephone directory in which data is arranged by the last name of a person. Each table can have only one clustered index. A clustered index is often used in cases where columns are often searched for a range of values.
3. Avoid functions on RHS of the Operator: Functions and methods are often used in the SQL queries. Consider the below example:
where YEAR(Sample CreatedOn) == 2009
AND Month(SampleCreatedON) = 6;
In the above query, SampleCreatedOn has an index. But, the where clause is changed in such a way, that this index cannot be used. Modifying the query in the below manner increases the response time tremendously.
select * from Customer where SampleCreatedOn between ‘6/1/2009’and ‘6/30/2009’;
4. Specify expected growth for the index: One method to minimize the negative impact of indexes is to specify the appropriate value for expected growth while creating indexes.
Data for the index columns is normally stored on a disk. In the event of inclusion of new rows in the table or modifying the existing values in the indexed columns, the database has to reorganize the storage of data in order to accumulate the new rows. This reorganization affects the response time of the queries. In such cases, if new rows are incorporated on a regular basis, expected growth for an index can be specified beforehand. In MS SQL Server, the term used for expected growth is FILL FACTOR and for Oracle and DB2 it is PCTFREE, meaning Percent Free.
5. Include Optimizer hints in SELECT: It is always a good practice to mention the index name in the select query. Consider the below example.
select * from Customer
where city = “Pune” AND phone = ‘213-345-346’;
The additional WITH clause after FROM indicating the index name will help to increase the response time of the query. The above example is specific to MS SQL SERVER.
6. Usage of EXPLAIN: An execution plan for a SELECT statement created by the optimizer is returned by the databases in most cases. This execution helps a lot in fine tuning SQL queries.
SQL syntax for the execution plan is Set SHOWPLAN_ALL ON>Query<. Tools such as WINSQL Professional can also be used to run EXPLAIN commands.
7. Avoid Foreign key Constraints: Usage of foreign key constraints assure data integrity but at the cost of performance. If optimizing the response time is the primary goal, foreign key constraints can be avoided. For example: The System table in RDBMS contains Metadata information about user databases. The tables included here contain relationships but have no foreign key. The client here enforces these rules to optimize database response time.
8. Usage of multiple hard disks: As the size of a database increases, the I/O operation on the hard disks becomes evidently slow. Databases can be split across multiple physical hard drives to speed up I/O. It is also possible to split the contents of a table into multiple disks. Usage of multiple disks increases the speed of I/O operations as more heads retrieve data in parallel
9. Select minimum data: The minimum data retrieved, the quicker the query will run. Filtering should be made as much as possible on the server-end rather than filtering on the client. This will result in minimal data being processed and provide faster results. Eliminate any computed or unnecessary columns. Consider the below example:
select Firstname, Lastname, City
where City =”Pune”;
In the above example, the city column can be avoided as it will always be Pune. Even though it does not make a large effect, it can improve response time for large datasets.
10. Remove indexes before loading data.
The indexes on a table can be dropped before loading a large batch of data. This allows the insert statement to work faster. The Index can be recreated once again, once the inserts are completed.
In the case of an online system, that requires thousands of rows to be inserted, a temporary table can be used to load data. This temporary table should not have any index. Transferring data from one table to another is always much faster than loading from an external source. Drop indexes on the primary table, transfer data from temporary to the final table and finally create the indexes again.
Some more tips for optimizing the response time
Avoid using Group By, Order By and Distinct: Usage of Group By, Order By and Distinct must be avoided as much as possible. When the above clauses are used, the SQL server creates a temporary table and loads the data into this table. Data is processed in this temporary table as requested by the query and the final result is retrieved. So, it is always advisable to use Group By, Order By and Distinct in the queries when it is an absolute necessity.
Use Set NOCOUNT ON: Set NOCOUNT ON must be used for DML operations (Insert, Update, Delete and Select). SQL server by default counts and returns the number of rows affected. Small queries are not affected much, but in the case of large, complex queries with lots of joins, this causes a big performance issue. Set NOCOUNT ON will definitely increase the response time as it will not count the number of rows affected.
Include Owner/Schema Name: Object names for tables, stored procedures must be prefixed along with the Owner/ Schema Name. If Owner/Schema name is omitted, the server will search for the object in all schemas until the object is found. Specifying owner/schema directs the server to search for the table only in that particular schema.
Nullable Columns: Avoid using NOT IN and use NOT EXISTS when comparing with nullable columns. When NOT IN is used in a query, SQL Server will check every result (even when there are no rows with null values) to find out if it is null or not. NOT EXISTS does not perform the comparison check with nulls.
Adjusting Internal Variables: Fine-tuning some of the default settings helps a lot to speed up the response time and improve performance.
Changing Index Buffer Size (key_buffer)
The buffer size can be controlled with the help of this variable when managing table indices (both read and write operations). The value of this variable can be increased to about 25% of the total system memory. To improve response time, try using different values for this variable.
Changing Table Buffer Size (read_buffer_size)
In cases where a query needs a table to be checked sequentially, MySQL provides a memory buffer to this query. The buffer size is controlled by the read_buffer_size variable. If the sequential scan takes more time, performance can be improved by increasing the size of this value and hence the size of the memory buffer.
Setting the number of Maximum Open Tables (table_cache)
The maximum number of tables that can be open at any particular time is managed by table_cache_variable. This variable is similar to max_connections variables. Increasing the value of this variable permits a large number of tables to remain open. The table_cache value can be altered if the server receives queries on various databases and tables.
Specifying a time limit for long queries (long_query_time)
Usage of the long_query_time variable allows tracking inefficient or misbehaving queries. This variable helps a lot to improve response time and to optimize performance.