Because kandy 0.8.0 is still in development, this documentation may not be entirely accurate and is subject to change.

Boxplot

Last modified: 15 July 2024

A boxplot, also called a box-and-whisker plot, is a statistical visualization that summarizes the distribution of a dataset through its quartiles. It consists of several components:

  1. Median (Q2): the line inside the box represents the median of the dataset, which is the middle value when the data is sorted in ascending order. It divides the data into two equal halves, with 50% of the data falling below and 50% above the median.

  2. Interquartile Range (IQR): the box itself spans the interquartile range, which is the range between the first quartile (Q1) and the third quartile (Q3). The first quartile (Q1) is the 25th percentile, meaning that 25% of the data falls below it, while the third quartile (Q3) is the 75th percentile, indicating that 75% of the data falls below it. The IQR captures the middle 50% of the data.

  3. Whiskers: the whiskers extend from the edges of the box to the smallest and largest data points that are not outliers. The cutoff is typically a multiple of the IQR (often 1.5): points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are treated as potential outliers, and each whisker ends at the most extreme data point inside those limits.

  4. Outliers (optional): individual data points that fall outside the whiskers are considered potential outliers. These are data points that are significantly different from the rest of the data and may warrant special attention in further analysis. The auxiliary statistic "boxplotOutliers" is used to count outliers. This statistic is not weighted.
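
The components listed above can be computed by hand, which may help when reading a boxplot. The following is a minimal sketch in plain Kotlin, not kandy's implementation: `BoxplotSummary`, `quantile`, and `boxplotSummary` are hypothetical names introduced for illustration, the quartiles are assumed to use linear interpolation between neighbouring ranks, and the whisker multiplier is assumed to be 1.5. A plotting library may use a different quantile definition, so its quartiles can differ slightly on small samples.

```kotlin
// Hypothetical helper types and functions for illustration; not part of kandy's API.
data class BoxplotSummary(
    val q1: Double,
    val median: Double,
    val q3: Double,
    val lowerWhisker: Double,
    val upperWhisker: Double,
    val outliers: List<Double>,
)

// Quantile of an already sorted sample, by linear interpolation between the two
// closest ranks (an assumption; other definitions give slightly different values).
fun quantile(sorted: List<Double>, p: Double): Double {
    require(sorted.isNotEmpty()) { "sample must not be empty" }
    val position = p * (sorted.size - 1)
    val lower = position.toInt()
    val upper = minOf(lower + 1, sorted.size - 1)
    val fraction = position - lower
    return sorted[lower] + fraction * (sorted[upper] - sorted[lower])
}

// coef is the IQR multiplier that sets the outlier limits (1.5 is the common choice).
fun boxplotSummary(values: List<Double>, coef: Double = 1.5): BoxplotSummary {
    val sorted = values.sorted()
    val q1 = quantile(sorted, 0.25)
    val median = quantile(sorted, 0.5)
    val q3 = quantile(sorted, 0.75)
    val iqr = q3 - q1

    // Points outside [lowerFence, upperFence] are potential outliers.
    val lowerFence = q1 - coef * iqr
    val upperFence = q3 + coef * iqr
    val inside = sorted.filter { it in lowerFence..upperFence }
    val outliers = sorted.filter { it < lowerFence || it > upperFence }

    // Each whisker ends at the most extreme data point inside the fences.
    return BoxplotSummary(q1, median, q3, inside.first(), inside.last(), outliers)
}
```

For the sample `listOf(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0)`, this sketch gives Q1 = 3.0, a median of 5.0, and Q3 = 7.0, so the IQR is 4.0 and the outlier limits are -3.0 and 13.0. The whiskers end at 1.0 and 8.0, and 30.0 is reported as the single outlier.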

This notebook uses definitions from DataFrame.