Heatmap

The "count2d" statistic is calculated on a sample of two categorical variables (usually provided as two samples of a single variable each, x and y). It counts the number of observations in each pair of x-category and y-category. It also supports weights: a weighted count is calculated for each pair as the sum of the weights of the observations in that pair.

Arguments

Generalized signature

The specific signature depends on the function, but all functions related to the "count2d" statistic (discussed further below: the variations of statCount2D() and heatmap()) have approximately the same signature, with the arguments below:

statCount2DArgs :=
   x,
   y,
   weights = null

The possible types of x, y and weights depend on where a certain function is used. They can be a plain Iterable (List, Set, etc.), a reference to a column in a DataFrame (String, ColumnAccessor), or the DataColumn itself. The elements of x are of type X and the elements of y are of type Y; both are generic type parameters.

Output statistics

name	type	description
Stat.x	X	`x`-category
Stat.y	Y	`y`-category
Stat.count	Int	Number of observations in this pair of categories
Stat.countWeighted	Double	Weighted count (sum of the observation weights in this pair of categories)
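
To make the table concrete, here is a minimal plain-Kotlin sketch of how such statistics could be computed by hand. It is not the library's implementation: `Count2DRow` and `count2d` are hypothetical names introduced only for illustration, and treating a missing weight as 1.0 is an assumption of the sketch.

```kotlin
// Hypothetical row type mirroring the Stat.x, Stat.y, Stat.count and Stat.countWeighted columns.
data class Count2DRow<X, Y>(val x: X, val y: Y, val count: Int, val countWeighted: Double)

// Illustrative helper (not a library function): counts observations per (x, y) pair.
fun <X, Y> count2d(xs: List<X>, ys: List<Y>, weights: List<Double>? = null): List<Count2DRow<X, Y>> {
    require(xs.size == ys.size) { "x and y must have the same length" }
    require(weights == null || weights.size == xs.size) { "weights must match x and y in length" }
    // LinkedHashMap keeps the pairs in order of first appearance.
    val counts = LinkedHashMap<Pair<X, Y>, Int>()
    val sums = LinkedHashMap<Pair<X, Y>, Double>()
    for (i in xs.indices) {
        val key = xs[i] to ys[i]
        counts[key] = (counts[key] ?: 0) + 1
        // Assumption of this sketch: without weights, every observation weighs 1.0.
        sums[key] = (sums[key] ?: 0.0) + (weights?.get(i) ?: 1.0)
    }
    return counts.map { (key, n) -> Count2DRow(key.first, key.second, n, sums.getValue(key)) }
}

fun main() {
    val x = listOf("A", "A", "A", "B", "B", "C", "B", "B")
    val y = listOf(1, 1, 1, 2, 1, 2, 1, 2)
    // One row per distinct (x, y) pair.
    count2d(x, y).forEach { println(it) }
}
```

For this input the sketch yields four rows: the pair (A, 1) has count 3, (B, 2) has count 2, (B, 1) has count 2 and (C, 2) has count 1.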

StatCount plots

// Use "mpg" dataset
val mpgDF =
    DataFrame.readCSV("https://raw.githubusercontent.com/JetBrains/lets-plot-kotlin/master/docs/examples/data/mpg.csv")
mpgDF.head(5)

untitled	manufacturer	model	displ	year	cyl	trans	drv	cty	hwy	fl	class
1	audi	a4	18,0	1999	4	auto(l5)	f	18	29	p	compact
2	audi	a4	18,0	1999	4	manual(m5)	f	21	29	p	compact
3	audi	a4	2,0	2008	4	manual(m6)	f	20	31	p	compact
4	audi	a4	2,0	2008	4	auto(av)	f	21	30	p	compact
5	audi	a4	28,0	1999	6	auto(l5)	f	16	26	p	compact

// We need only three columns
val df = mpgDF["class", "drv", "hwy"]
df.head(5)

class	drv	hwy
compact	f	29
compact	f	29
compact	f	31
compact	f	30
compact	f	26

Let's take a look at StatCount2D output DataFrame:

df.statCount2D("class", "drv", "hwy")

Stat
x	y	count	countWeighted

As you can see, we got a DataFrame with one ColumnGroup called Stat, which contains several columns with statistics. For statCount2D, each row corresponds to one pair of categories. Stat.x is the column with the x-category and Stat.y is the column with the y-category. Stat.count contains the number of observations in the pair, and Stat.countWeighted is the weighted version of the count. A DataFrame with "count2D" statistics is called StatCount2DFrame.

`statCount2D` plot transform

statCount2D(statCount2DArgs) { /*new plotting context*/ } modifies the plotting context: inside the new context, a new statCount2D dataset is used instead of the original data (whether or not the original was empty). The dataset is calculated on the given arguments; inputs and weights can be provided as an Iterable or as a dataset column reference (by name as a String, as a ColumnReference, or as a DataColumn). The original dataset and the primary context are not affected, so you can still add layers that use the initial dataset outside the statCount2D context. Since the old dataset is irrelevant inside the new context, we cannot use references to its columns there, but we can refer to the new ones. They are all contained in the Stat group and can be called inside the new context:

df.plot {
    statCount2D(`class`, drv) {
        // New `StatCount2D` dataset here
        points {
            // Use `Stat.*` columns for mappings
            x(Stat.x) {
                axis.expand(0.0, 0.5)
            }
            y(Stat.y)
            size(Stat.count) {
                scale = continuous(10.0..30.0)
            }
            color = Color.RED
        }
    }
}

Heatmap layer

A heatmap is a statistical plot used for visualizing the distribution of a sample of two categorical variables. It's a tile plot where each tile represents one pair of categories: its x coordinate corresponds to the x-category, its y coordinate to the y-category, and its color to the count of this pair. So basically, we can build a heatmap with statCount2D as follows:

val statCount2DAndTilePlot = df.plot {
    statCount2D("class", "drv") {
        tiles {
            x(Stat.x)
            y(Stat.y)
            fillColor(Stat.count)
        }
    }
    layout.title = "`statCount2D()` + `tile()` layer"
}
statCount2DAndTilePlot

But we can do it even faster with the heatmap(statCount2DArgs) method:

val heatmapLayerPlot = df.plot {
    heatmap(`class`, drv)
    layout.title = "`heatmap()` layer"
}
heatmapLayerPlot

plotGrid(listOf(statCount2DAndTilePlot, heatmapLayerPlot))

These two plots are identical. Indeed, heatmap just uses statCount2D and tiles and performs the coordinate and fillColor mappings under the hood. We can also customize the heatmap layer: heatmap() optionally opens a new context where we can configure the tiles (as in the usual context opened by tiles { ... }) and even change the default mappings. The StatCount2D dataset of the heatmap can also be accessed here.

df.plot {
    heatmap(`class`, drv) {
        // Swap coordinate mappings:
        x(Stat.y)
        y(Stat.x)
        // Default mapping but with custom scale
        fillColor(Stat.count) {
            scale = continuousColorBrewer(BrewerPalette.Sequential.Reds)
        }
    }
}

If we specify weights, Stat.countWeighted is mapped to fillColor by default:

df.plot {
    heatmap(`class`, drv, hwy)
}
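
As a rough illustration of what the weighted count means when hwy is used as the weight, the following plain-Kotlin sketch sums hwy per (class, drv) pair for the five rows shown by df.head(5) earlier. `Car`, `pairCounts` and `pairWeightedCounts` are hypothetical names used only here; the library's own computation may differ in detail.

```kotlin
// Hypothetical record holding the three columns used in this section.
data class Car(val cls: String, val drv: String, val hwy: Int)

// Plain count per (class, drv) pair.
fun pairCounts(cars: List<Car>): Map<Pair<String, String>, Int> =
    cars.groupingBy { it.cls to it.drv }.eachCount()

// Weighted count per (class, drv) pair, using hwy as the weight.
fun pairWeightedCounts(cars: List<Car>): Map<Pair<String, String>, Double> =
    cars.groupingBy { it.cls to it.drv }.fold(0.0) { acc, car -> acc + car.hwy }

fun main() {
    // The five rows shown by df.head(5): all are (compact, f) with these hwy values.
    val cars = listOf(29, 29, 31, 30, 26).map { Car("compact", "f", it) }
    println(pairCounts(cars))          // {(compact, f)=5}
    println(pairWeightedCounts(cars))  // {(compact, f)=145.0}
}
```

The plain count for the pair is 5, while the weighted count is 29 + 29 + 31 + 30 + 26 = 145, so a weighted heatmap colors tiles by the total of the weights, not by the number of rows.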

`heatmap` plot

heatmap(statCount2DArgs) and DataFrame.heatmap(statCount2DArgs) are a family of functions for quickly plotting a heatmap.

heatmap(
    listOf("A", "A", "A", "B", "B", "C", "B", "B"),
    listOf(1, 1, 1, 2, 1, 2, 1, 2),
)

df.heatmap("class", "drv")

If you want to provide inputs and weights using the column selection DSL, it's a bit different from the usual one: you should assign the x and y inputs and (optionally) the weight by invoking the eponymous functions:

df.heatmap {
    x(`class`)
    y(drv)
    weight(hwy)
}

A heatmap plot can be configured with the .configure {} extension. It opens a context that combines the tile, StatCount2D and plot contexts. That means you can configure tile settings, mappings using the StatCount2D dataset, and any plot adjustments:

df.heatmap {
    x(`class`)
    y(drv)
    weight(hwy)
}.configure {
    // Tile + StatCount2D + PlotBuilder
    // Can't add new layer
    // Can add tile mapping, including for `Stat.*` columns
    fillColor(Stat.count) {
        scale = continuous(Color.GREEN..Color.RED)
    }
    alpha = 0.6
    // Can configure general plot adjustments
    layout {
        title = "Configured `heatmap` plot"
        size = 600 to 350
    }
}
