[RackTop Quick Presentations] #tags: presentations, slides, markdown

7/21/2017 - 9:27 PM

<!-- $theme: gaia -->
<!-- $size: 16:9 -->
<!-- page_number: true -->

# Important Metrics for All Disks
* Number of IOs per interval; IOPS (IOs per second) is the most common form
* Throughput
* IO size, which directly limits throughput
* Latency
* Random or Sequential IO?
* IO direction bias (reads vs. writes)
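
The link between IO size and throughput can be made concrete with a little arithmetic. The sketch below assumes a fixed IO size and a steady IOPS rate; the function name and the sample numbers are illustrative, not measurements.

```python
def throughput_bytes_per_sec(iops: float, io_size_bytes: int) -> float:
    """Throughput is the IO rate multiplied by the size of each IO."""
    return iops * io_size_bytes


# Same IO rate, different IO sizes (illustrative values only).
small = throughput_bytes_per_sec(1000, 4 * 1024)    # 4 KiB IOs
large = throughput_bytes_per_sec(1000, 128 * 1024)  # 128 KiB IOs

print(f"4 KiB IOs:   {small / 1024**2:.1f} MiB/s")
print(f"128 KiB IOs: {large / 1024**2:.1f} MiB/s")
```

At the same IOPS, the larger IO size moves 32 times more data per second, which is why IOPS alone says little about throughput.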

---

# How to think about these numbers
* We want to explain the reasoning behind these numbers and what they can tell an average consumer
* There are two obvious areas: health and performance

---

## Health
* Is the number of IOs substantially different across like drives?
* Is latency vastly different between two like drives?
* Is the number of bytes transferred similar across like drives?
* Do any drives show much more extreme observations? Define Extreme...
* How much active time? Define Active...
* What about IO errors?
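
One possible way to compare like drives is to measure each against the median of its group. This is only a sketch: the `factor` threshold is a hypothetical stand-in for whatever definition of "extreme" is eventually chosen, and the drive names and latencies are made up.

```python
from statistics import median


def flag_outliers(values_by_drive: dict, factor: float = 2.0) -> list:
    """Return drives whose value is more than `factor` times above or
    below the median of the group (hypothetical definition of extreme)."""
    mid = median(values_by_drive.values())
    return sorted(
        drive
        for drive, value in values_by_drive.items()
        if value > mid * factor or value < mid / factor
    )


# Illustrative mean latency per drive, in milliseconds.
latency_ms = {"disk0": 4.1, "disk1": 3.9, "disk2": 4.3, "disk3": 19.7}
suspects = flag_outliers(latency_ms)
print(suspects)
```

The median is used as the reference point so that a single misbehaving drive does not drag the baseline toward itself.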

---

## Performance
* Are my IOs large or small, and why does it matter?
* Can I satisfy throughput requirement of X?
* Do I experience high latency? What counts as high anyway?
* Pending IOs indicate how busy devices are
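
Two of these questions reduce to simple arithmetic. The sketch below assumes a uniform IO size, and it uses Little's law (mean items in a system = arrival rate × mean time in system) as one way to relate pending IOs to IOPS and latency; that framing and all the numbers are assumptions for illustration.

```python
def required_iops(target_bytes_per_sec: float, io_size_bytes: int) -> float:
    """IO rate needed to reach a throughput target at a given IO size."""
    return target_bytes_per_sec / io_size_bytes


def mean_pending_ios(iops: float, mean_latency_sec: float) -> float:
    """Little's law: average outstanding IOs = IO rate * mean latency."""
    return iops * mean_latency_sec


target = 100 * 1024**2  # hypothetical requirement: 100 MiB/s
iops_4k = required_iops(target, 4 * 1024)
iops_128k = required_iops(target, 128 * 1024)
pending = mean_pending_ios(iops_4k, 0.0005)  # assuming 0.5 ms mean latency

print(iops_4k, iops_128k, round(pending, 1))
```

The same throughput target needs far more IOs per second when the IOs are small, and at a fixed latency that translates into more IOs in flight.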

---

# Summarizing Data
* Expected Values, Medians, Sums, Mins, Maxs, Buckets
* Limit loss of insights
* Averages tend to obscure structure

---

## Expected Values, Medians, Sums, Mins, Maxs, Buckets
* Most measurements benefit from reporting a mean along with the extreme low and high observations, or the range as MAX - MIN
* The median is not biased by high values such as latency spikes or IO stalls, but is expensive to compute
* Percentiles are useful for presenting ranges and tendency of data
* Percentile calculation in DTrace is expensive and lacks real number support, but is easy in Influx
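
A minimal sketch of such a summary, computed offline in Python rather than in DTrace or Influx. The nearest-rank percentile method and the sample latencies are assumptions chosen for simplicity.

```python
import math
from statistics import mean, median


def percentile(ordered: list, p: float):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: list) -> dict:
    """Report central tendency together with the extremes."""
    ordered = sorted(samples)
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "p95": percentile(ordered, 95),
    }


# Illustrative latencies in milliseconds, with one stall.
latencies_ms = [2, 3, 3, 4, 4, 5, 5, 6, 7, 250]
summary = summarize(latencies_ms)
print(summary)
```

The single 250 ms stall pulls the mean far above the median, while the max, range, and high percentile make the stall itself visible.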

---

## Limit loss of insights
* Avoid reporting ONLY averages without a range
* Histograms distill data into categorical or numerical ranges
* Categorical ranges like *high*, *normal*, *low* can be useful for latency grouping
* Numerical ranges allow for a more precise summary of a specific metric than categorical ones; meaningful for IOPS, latency, IO size, etc.
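
Both kinds of range can be produced by the same bucketing routine. In this sketch the bucket edges, the labels, and the latency samples are all hypothetical; real thresholds would depend on the devices and workload.

```python
from bisect import bisect_right


def bucketize(samples: list, edges: list, labels: list) -> dict:
    """Count samples per bucket; a value equal to an edge falls in the
    bucket that starts at that edge. Needs len(labels) == len(edges) + 1."""
    counts = {label: 0 for label in labels}
    for value in samples:
        counts[labels[bisect_right(edges, value)]] += 1
    return counts


latencies_ms = [0.4, 2, 3, 3, 4, 6, 7, 250]

# Numerical ranges (hypothetical edges, in milliseconds).
numeric = bucketize(latencies_ms, [1, 5, 20],
                    ["<1ms", "1-5ms", "5-20ms", ">=20ms"])

# Categorical ranges (hypothetical thresholds for low/normal/high).
categorical = bucketize(latencies_ms, [5, 20], ["low", "normal", "high"])

print(numeric)
print(categorical)
```

The categorical view is easier to read at a glance, while the numerical view keeps more of the shape of the distribution.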

---

## Averages tend to obscure structure
* Average = sum of observations / N is influenced by extreme values, but hides their true significance
* Two drives may do vastly different amounts of IO, but have the same average latency
* Average may suggest a much lower or higher expected value than reality when N is small and outliers exist, e.g. **avg([1, 1000, 30, 30000]) = 7757.75**
* An average alone gives no sense of how common or exceptional large or small values are
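
The example above can be checked directly, and the same effect shows up when comparing drives. The two per-drive latency lists below are made up to illustrate the point.

```python
from statistics import mean, median

# The small sample from the slide: one large outlier dominates the mean.
samples = [1, 1000, 30, 30000]
avg = mean(samples)
mid = median(samples)
print(avg, mid)

# Two hypothetical drives with identical mean latency (ms)
# but very different behavior.
steady = [10] * 10
spiky = [1] * 9 + [91]
print(mean(steady), mean(spiky))  # same average
print(max(steady), max(spiky))    # very different worst case
```

Three of the four samples sit far below the mean, and the two drives are indistinguishable by average alone even though one of them stalls.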
