Percentiles, lies, and database performance monitoring

Track B 12:00 - 12:45
Difficulty: Intermediate
Area: DBA
Language: English

Putting response time percentiles into SLAs and OLAs is increasingly popular, but unfortunately most people do it wrong. In this presentation, we will explain the most common failure modes of performance metrics and how to handle the situation when someone comes to you complaining with a 99th percentile graph.

Response times are often not normally distributed, so averages and measures of dispersion such as the standard deviation are essentially meaningless. The simplest way to make sense of such data is to use percentiles, and the popular choices are the 50th (the median) and the 99th. By definition, up to 1% of the values can be larger than the 99th percentile, so an SLA stated only in terms of the 99th percentile hides exactly the outliers it should catch.
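To make the point concrete, here is a minimal sketch in Python. It computes the mean, the median and the 99th percentile of a hypothetical, right-skewed set of response times. The sample data and the helper name `percentile_nearest_rank` are illustrative assumptions, not taken from the talk. The percentile uses the nearest-rank definition, which is one of several common definitions.

```python
import statistics


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile for an integer p in 1..100: the smallest
    sample value such that at least p% of the samples are <= it."""
    if not values or not 1 <= p <= 100:
        raise ValueError("need a non-empty sample and 1 <= p <= 100")
    ordered = sorted(values)
    # ceil(p * n / 100) using integer arithmetic, avoiding float rounding
    rank = -(-p * len(ordered) // 100)
    return ordered[rank - 1]


# Hypothetical, right-skewed response times in milliseconds:
# 985 fast calls, 10 slower ones and 5 pathological outliers.
samples = [10] * 985 + [50] * 10 + [5000] * 5

mean_ms = statistics.mean(samples)
p50_ms = percentile_nearest_rank(samples, 50)
p99_ms = percentile_nearest_rank(samples, 99)
worst_ms = max(samples)
# How many requests the 99th percentile says nothing about
above_p99 = sum(1 for s in samples if s > p99_ms)

print(f"mean={mean_ms} p50={p50_ms} p99={p99_ms} "
      f"max={worst_ms} above_p99={above_p99}")
```

With this sample, the mean (35.35 ms) describes no actual request. The 99th percentile is a comfortable 50 ms, while five requests took 5 seconds each. Only looking past the percentile, for example at the maximum or at the count above a threshold, reveals them.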

Oracle provides histograms for wait events, but for query response times it still offers only averages. The only way to get hold of the outliers is through ASH (Active Session History) or event 10046 traces, and, as always, these options come with their own challenges.
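A histogram does not give exact percentiles, but it does bound them. The following minimal Python sketch finds the bucket that contains a given percentile. It assumes power-of-two millisecond buckets in which each bucket counts waits shorter than its upper bound. This is modelled on the `WAIT_TIME_MILLI` and `WAIT_COUNT` columns of Oracle's `V$EVENT_HISTOGRAM`, but the semantics should be checked against the reference for your version. The bucket counts and the helper name `percentile_upper_bound` are hypothetical.

```python
def percentile_upper_bound(buckets, p):
    """Return the upper bound (ms) of the histogram bucket containing the
    p-th percentile (integer p in 1..100).

    buckets: iterable of (upper_bound_ms, count) pairs, where a bucket is
    ASSUMED to hold waits shorter than upper_bound_ms.
    """
    ordered = sorted(buckets)
    total = sum(count for _, count in ordered)
    if total == 0 or not 1 <= p <= 100:
        raise ValueError("need a non-empty histogram and 1 <= p <= 100")
    # Rank of the p-th percentile sample: ceil(p * total / 100)
    target = -(-p * total // 100)
    running = 0
    for upper_ms, count in ordered:
        running += count
        if running >= target:
            return upper_ms
    raise AssertionError("unreachable: target never exceeds total")


# Hypothetical wait histogram: (bucket upper bound in ms, number of waits)
histogram = [
    (1, 9000), (2, 600), (4, 250), (8, 100), (16, 40),
    (32, 8), (64, 1), (128, 0), (4096, 1),
]

p50 = percentile_upper_bound(histogram, 50)
p99 = percentile_upper_bound(histogram, 99)
# The highest non-empty bucket still records the outlier that p99 hides
worst_bucket = max(upper for upper, count in histogram if count > 0)

print(f"p50 < {p50} ms, p99 < {p99} ms, worst wait < {worst_bucket} ms")
```

Here the 99th percentile falls in the "< 8 ms" bucket, yet the histogram still shows one wait of up to 4 seconds in its top bucket. Per-query averages cannot provide this kind of tail information, which is why ASH or 10046 traces are needed for query response times.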


Speaker:
Priit Piipuu
Company:
Kindred Group