Introduction
Even if you have just started to work with RDBMS, you may have heard that using indexes is good. While there is a plethora of information on the Internet, my goal is to choose and consolidate some basic knowledge about indexes into a series of articles. In doing so I’m hoping to provide a good starting point for beginner DBAs as well as valuable information for advanced DBAs too. In these articles I will cover some of the most frequently asked questions about indexes.
I’ve always want to know more about the system I’m working with, to perform tests and find facts that are not covered by regular documentation, but doing so is time consuming – as I’m sure you know. In this article I will help you understand how indexes can provide an order of magnitude improvement in performance by demonstrating how to test and analyze the impact of indexes. All this will be seasoned with a small amount of humor to make the education pass by a bit easier.
Some of the topics that we will cover include:
Beginner level:
- How index works?
- How index helps?
- How many indexes do you need?
- Can indexes kill performance?
- Which columns should be indexed?
- Does the order of the columns in WHERE clause matter?
- Does the order of the columns in index definition matter?
- How can I measure performance improvement after index creation?
- How do I know if index is NOT used for query processing?
- I created an index, but it is not used by DB2. What is wrong?
Advanced level:
- Compound index: Should I create one index for multiple columns or multiple indexes for each column?
- What does “INCLUDE <columns>” clause for an index means and how it can help me?
- What types of indexes are appropriate for you?
- …
So, what an index is?
Before we start to dig in, let’s talk briefly why indexes are so helpful and how indexes work. The best approach to learn something is to understand why it was done and why it was done in that particular way.
For example, you have a table with employee records stored (in no particular order) in files on hard drive. The goal is to get the list of employees records with DEPARTMENT=”Support”. How will you do that?
Let’s think like guys who invented the index. DB2 (or you) knows information about location of each table record on the hard drive. You can retrieve all records from the hard drive, go through them individually, get the DEPARTMENT value, compare it to “Support”, and get list of the records you interested in. Problem solved. However, you are getting claims from the users that it takes too much time to get the results. How can you improve it?
What if you can store somewhere just a DEPARTMENT value and location of the record on the hard drive? Since this is a very small subset of the data, it will take less resources to scan the hard drive and locate the records you need quickly. You have just invented an index. Additionally, indexes can be presorted and compressed for even more performance improvements.

In the TABLE, rows are stored without any particular order.
In the INDEX you can see that DEPRATMENT column values are stored in alphanumeric order. It is very easy to get all the records that belong to “Support” department. You know that you need TABLE records with row indexes (RID) 1 and 5 only.
So, what is an index? Index is kind of a small system table, with two columns: column(s) value, which is called KEY, and pointer to physical location of the corresponding record. An important note here is that the real column value is not stored in index as plain text. It is compressed and hashed. A wisely created index will always improve query execution in hundreds if not thousand times over non-indexed records.
How index helps?
- Index provides fast and efficient method to locate specific records in a table. It reduces execution time of a query.
- Index stores data in logical order instead of physical. Key values are stored in ascending or descending order. It eliminates need to SORT values during query processing.
- Index improves concurrency of the transactions. Records in the table can be located faster, as a result acquired locks will be released faster.
How many indexes do you need?
It is absolutely up to you. Index definitely brings performance improvements, but there are two things to be aware of:
First of all, when you create an index, you “duplicate” a column value in another place. This means, when you create an index for each column you use the same amount of space as the original table, plus some space to store data for record pointer for each index. So, space used by all indexes can be more than the size of the original table.
Secondly, problems appear when you create or update records in a table. Each update means that the index should be updated as well. If you have 10 indexes – DB2 will update them first and only then will proceed with update of the modified record. It forces your hard disk to find 10 small index tables, read them, and write changes back before it updates one original table record. A busy hard disk is a performance enemy.
Can index kill your performance?
The most common answer is yes. But, is this always true? My answer is – not really. Let’s run some simple performance tests to verify my position
Performance test:
For this test I chose a table with 300,000 records.

I created 5 indexes for all columns except LOGTEXT (VARCHAR(1024)), plus one PRIMARY KEY index. So, I have 6 indexes in total.
Then I did 10 INSERT statements and measured average execution time. In my case it was 62 ms.
After that I dropped all indexes and repeated the INSERTs. Now that there were no indexes to update, in theory, I was supposed to see some improvement here, but in fact I got the same 62 ms. Improvement was less than 1 millisecond, I did not even notice it.
To update one record, DB2 should perform 6 index updates (in real life I believe DB2 smart enough to see if column value was changed or not before updating it). In worst case scenario, execution time for INSERT or UPDATE statement will be increased in 6 times. Still, improvement from using the index is hundreds and thousands times. We are talking about spending of $6 and getting reimbursement for $6,000.
You can say that 300,000 records are nothing, and I should do my test on 300,000,000 records. Ok, perhaps in this case I will see a difference. Can you imagine a query processing time over table with such amount of records without an index?
Negative aspects of the indexes:
- Indexes require additional space.
If you have a 100 Gb database and you will create indexes for all tables, for all columns, the size of the database will be increases at least twice in size.
- A full index must be retrieved from the hard drive before it can be used.
- Insert / Update / Delete operations require updating the index data.
- Each index adds an alternative access plan that the DB2 Optimizer must consider during calculations of the optimum access plan.
Even with all these negative factors, large amounts of indexes will not kill your performance. They may negatively impact storage space, however.
So, index can not kill your performance. But, it depends.
Primarily, it depends on what type of database you have. Is it a warehouse (almost read only) or OLAP (often updates) system? If UPDATE execution time is critical, and you count each microsecond – a lot of indexes can be bad. If you do not care about that – then you mostly like have a regular, non-OLAP system, and your primary goal is to quickly send response on a user query.
Secondary, it depends on your goal. If you want to kill application performance via indexes – I am sure you will find a way to do this.
While indexes can bring performance improvement to your system, some times it’s more wise not to add indexes
Be wise…
Which columns should be indexed?
As we discussed, index contains value for some column, which is going to be used in some kind of comparison. Any column used in WHERE, ORDER BY or GROUP BY clause is a perfect candidate for an index. Columns that have a Foreign Key definition are recommended for indexing too.
What if I have few columns in WHERE clause? Should I create a separate index for each column or create one index for all columns? Sorry, I will answer on that question in my next article.
Does the order of the columns in WHERE clause matter?
Does the order of the columns in multicolumn index definition matter?
Yes, it does. I will discuss this topic in detail in my next DB2 Tips Blog.
How can I measure performance improvement after index creation?
DB2 has an excellent product called the EXPLAIN facility, which can be found in mostly all DB2 related products.
Look for
icon in IBM Optim (InfoSphere, Data Studio) products, or “Access plan” icon in DB2 Command Editor (DB2 Control Center component).

This facility visually describes how DB2 engine proceeding your query. All you need is a SQL query which you would like to analyze. Here is an example of output.

Red TBSCAN octagon means place for potential improvement. TBSCAN stands for “Table Scan”, which means DB2 is not using any indexes for this part of query, it retrieves whole table and scans each record to see if it matches your request. In the properties of octagon you will be able to find which predicates (columns used in WHERE clause) were used. Create index for them and the problem will be solved.
Yellow IXSCAN, RIDSCN octagons mean that index was used for this part of the query.
Each figure in the following diagram has a numeris value representing the system resources spent processing; they are measured in timerons. It is not a time unit of measure. It is hardware independent unit of measure, which describes how many steps DB2 will perform to complete your query. In our example RETURN value s 2091. But, 2 servers with different hardware will spend different amount of time to complete this same query, which has the same RETURN value.
To measure improvement you should run EXPLAIN facility before you create an index. Then create an index, update Statistics for table AND INDEX (which is not by default), and run EXPLAIN again. If you did all right you will see that TBSCAN octagon is gone, and you have a yellow index SCAN instead of it. RETURN value will be changed as well.
How do I know if index is NOT used for query processing?
Run this query (DB2 9.1 for LUW and higher):
SELECT * FROM SYSCAT.INDEXES WHERE LASTUSED = ’0001-01-01′ AND TABSCHEMA NOT LIKE ‘SYS%’
ORDER BY TABSCHEMA, TABNAME;
All indexes in the result set were never used. You may find a lot of interesting information in that view. LASTUSED column represents when last time index was accessed by any SELECT query. If index was never used, it has value ’0001-01-01′.
In the same way you can check statistics for some particular index to see when it was used last time.
I created an index, but it is not used by DB2. What is wrong?
There are 3 main reasons.
1. You did not update table and index statistics.
RUNSTATS ON TABLE YOUR_TABLE ON ALL COLUMNS AND DETAILED INDEXES ALL ALLOW WRITE ACCESS ;
COMMIT WORK;
2. Based on the table and index statistics that DB2 have, it can decide that it will be faster to scan the whole table instead of checking some index. For example, you perform query like SELECT * FROM … WHERE DEPARTMENT_ID=200;
If 99.99% of all records has DEPARTMENT_ID=200, it may be easier to get whole table from HDD and get rid of these 0.01% of records.
3. You forgot to include WHERE clause in your query. No conditions for search, no indexes involved.
Next time.
Today I have covered enough basics to get you going. Next time I will cover these topics:
- Compound index: Should I create one index for multiple columns or multiple indexes for each column?
- Does the order of the columns in WHERE clause matter?
- Does the order of the columns in multicolumn index definition matter?
And as always will do performance investigation.
 |
About the Author Maxim Moroz (Maxim.Moroz@pointsourcellc.com) is a Database Architect with PointSource with overall 15 years in software development and administration. He pays a lot of attention to the aspects that effect application performance. He is IBM certified in both IBM DB2 and IBM Lotus Domino products. |


