Quickstart Guide

Read DataFrame

Kotlin DataFrame supports popular data formats, including CSV, JSON, and Excel, and can also read from various databases. Read a CSV file with the "JetBrains Repositories" dataset into the df variable:

val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
)

Display And Explore

df

Kotlin Notebook has special interactive outputs for DataFrame. Learn more about them here.

Use the .describe() method to get a summary of the dataset: column types, number of nulls, and simple statistics.

df.describe()

Select Columns

Kotlin DataFrame features a typesafe Columns Selection DSL that enables flexible and safe selection of any combination of columns. Column selectors are used across many operations. One of the simplest examples is .select { }, which returns a new DataFrame containing only the columns chosen in the Columns Selection expression.

After executing the cell where a DataFrame variable is declared, extension properties for its columns are automatically generated. These properties can then be used in the Columns Selection DSL expression for typesafe and convenient column access.

// Select "full_name", "stargazers_count" and "topics" columns
val dfSelected = df.select { full_name and stargazers_count and topics }
dfSelected
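Outside a notebook, the generated extension properties may not be available. A minimal sketch of the same selection using column names as strings is shown here; the `sampleRepos` helper and its row values are hypothetical stand-ins for the real dataset, and the `dataFrameOf(...)(...)` builder form is assumed to be available in the library version in use.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical stand-in rows; the values are made up for illustration only.
fun sampleRepos() = dataFrameOf("full_name", "stargazers_count", "topics")(
    "JetBrains/kotlin", 45000, "[kotlin, compiler]",
    "JetBrains/intellij-community", 16000, "[intellij, ide]",
    "JetBrains/sample-repo", 12, "[]",
)

fun main() {
    // Select columns by name; no generated extension properties are required.
    val selected = sampleRepos().select("full_name", "stargazers_count")
    println(selected.columnNames()) // [full_name, stargazers_count]
}
```

The string-based form trades compile-time checking for portability: a misspelled column name is only detected at runtime.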

Row Filtering

Some operations use the DataRow API, with expressions and conditions that are applied to every row of the DataFrame. For example, .filter { } returns a new DataFrame containing only the rows that satisfy the condition given by a row expression.

// Keep only rows where "stargazers_count" value is at least 1000
val dfFiltered = dfSelected.filter { stargazers_count >= 1000 }
dfFiltered
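The row condition is an ordinary Boolean expression, so the choice of comparison operator decides whether the boundary value is kept. The sketch below illustrates this on a hypothetical three-row frame (`boundaryRepos` and its values are made up), reading the column through the string-based row accessor rather than a generated property.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical rows; the star counts are made up to include the boundary value 1000.
fun boundaryRepos() = dataFrameOf("full_name", "stargazers_count")(
    "example/above", 1001,
    "example/exact", 1000,
    "example/below", 999,
)

fun main() {
    val df = boundaryRepos()
    // ">=" keeps the row with exactly 1000 stars ...
    val atLeast = df.filter { "stargazers_count"<Int>() >= 1000 }
    // ... while ">" drops it.
    val moreThan = df.filter { "stargazers_count"<Int>() > 1000 }
    println(atLeast.rowsCount())  // 2
    println(moreThan.rowsCount()) // 1
}
```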

Columns Rename

Columns can be renamed using the .rename { } operation, which also uses the Columns Selection DSL to select a column to rename. The rename operation does not perform the renaming immediately; instead, it creates an intermediate object that must be finalized into a new DataFrame by calling the .into() function with the new column name.

// Rename "full_name" column into "name"
val dfRenamed = dfFiltered.rename { full_name }.into("name")
    // And "stargazers_count" into "starsCount"
    .rename { stargazers_count }.into("starsCount")
dfRenamed
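Several columns can also be renamed in a single rename/into pair by listing the new names in the same order as the selected columns. The following is a sketch under the assumption that the string-based overloads `rename(vararg String)` and `into(vararg String)` are available; the one-row frame is a hypothetical stand-in.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical one-row frame using the original column names.
fun renamedRepos() = dataFrameOf("full_name", "stargazers_count")("JetBrains/kotlin", 45000)
    // Select both columns, then supply the new names in matching order.
    .rename("full_name", "stargazers_count")
    .into("name", "starsCount")

fun main() {
    println(renamedRepos().columnNames()) // [name, starsCount]
}
```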

Modify Columns

Columns can be modified using the update { } and convert { } operations. Both operations select columns to modify via the Columns Selection DSL and, similar to rename, create an intermediate object that must be finalized to produce a new DataFrame.

The update operation preserves the original column types, while convert allows changing the type. In both cases, column names and their positions remain unchanged.

val dfUpdated = dfRenamed
    // Update "name" values with only its second part (after '/')
    .update { name }.with { it.split("/")[1] }
    // Convert "topics" `String` values into `List<String>` by splitting:
    .convert { topics }.with { it.removePrefix("[").removeSuffix("]").split(", ") }
dfUpdated

dfUpdated.topics.type()

kotlin.collections.List<kotlin.String>
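The splitting expression used in convert can be checked in isolation with plain Kotlin. One detail worth knowing: if a cell holds an empty list literal such as "[]" (an assumption about the data, not something the dataset is known to contain), splitting the remaining empty string produces a list with one empty element. The `parseTopics` helper below is a hypothetical variant that filters such elements out.

```kotlin
// The expression from the convert step, extracted as a function.
fun splitTopics(raw: String): List<String> =
    raw.removePrefix("[").removeSuffix("]").split(", ")

// Hypothetical variant that drops blank entries, so "[]" becomes an empty list.
fun parseTopics(raw: String): List<String> =
    splitTopics(raw).filter { it.isNotBlank() }

fun main() {
    println(splitTopics("[kotlin, compiler]"))  // [kotlin, compiler]
    println(splitTopics("[]").size)             // 1, a single empty string
    println(parseTopics("[]").size)             // 0
}
```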

Adding New Columns

The .add { } function allows creating a DataFrame with a new column, where the value for each row is computed based on the existing values in that row. These values can be accessed within the row expressions.

Add a new Boolean column "isIntellij":

// Add a `Boolean` column indicating whether the `name` contains the "intellij" substring
// or the topics include "intellij".
val dfWithIsIntellij = dfUpdated.add("isIntellij") {
    name.contains("intellij") || "intellij" in topics
}
dfWithIsIntellij
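As a sketch of how the row expression evaluates, the same column can be added to a small hypothetical frame using string-based accessors. The rows are made up, and the third one is included to show that `contains("intellij")` is case-sensitive: it matches only through its topics, not through its name.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical rows: "name" is a String column, "topics" a List<String> column.
fun flaggedRepos() = dataFrameOf("name", "topics")(
    "intellij-community", listOf("ide"),
    "kotlin", listOf("kotlin", "compiler"),
    "IdeaVim", listOf("intellij", "vim"),
).add("isIntellij") {
    // True when either the name or the topic list mentions "intellij".
    "name"<String>().contains("intellij") || "intellij" in "topics"<List<String>>()
}

fun main() {
    println(flaggedRepos()["isIntellij"].toList()) // [true, false, true]
}
```

If names with different capitalization should match as well, `contains("intellij", ignoreCase = true)` is the standard-library option.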

Grouping And Aggregating

A DataFrame can be grouped by column keys, meaning its rows are split into groups based on the values in the key columns. The .groupBy { } operation selects columns and groups the DataFrame by their values, using them as grouping keys.

The result is a GroupBy — a DataFrame-like structure that associates each key with the corresponding subset of the original DataFrame.

Group dfWithIsIntellij by "isIntellij":

val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij

A GroupBy can be aggregated, that is, you can compute one or several summary statistics for each group. The result of the aggregation is a DataFrame containing the key columns along with new columns holding the statistics computed for the corresponding group.

For example, count() computes the size of each group:

groupedByIsIntellij.count()

To compute several statistics at once, use .aggregate { }, which takes an expression describing the aggregations:

groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}
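To make the shape of the aggregated result concrete, here is a sketch on a hypothetical three-row frame with made-up star counts, again using string-based accessors. Each group yields one output row with the key column followed by the named statistics.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical rows; star counts are made up so the sums are easy to verify by hand.
fun groupStats() = dataFrameOf("isIntellij", "starsCount")(
    true, 100,
    true, 50,
    false, 7,
).groupBy("isIntellij").aggregate {
    count() into "repos"                                  // rows per group
    sumOf { "starsCount"<Int>() } into "sumStars"         // 150 for true, 7 for false
    maxOf { "starsCount"<Int>() } into "maxStars"         // 100 for true, 7 for false
}

fun main() {
    val stats = groupStats()
    println(stats.columnNames()) // [isIntellij, repos, sumStars, maxStars]
    println(stats.rowsCount())   // 2
}
```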

Sorting Rows

.sortBy { } and .sortByDesc { } sort rows by the values in the selected columns and return a new DataFrame with the sorted rows. take(n) returns a new DataFrame with the first n rows.

val dfTop10 = dfWithIsIntellij
    // Sort by "starsCount" value descending
    .sortByDesc { starsCount }.take(10)
dfTop10
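The sort-then-take pattern can be sketched on a small hypothetical frame (made-up names and counts), using the string-based overload of sortByDesc. Sorting comes first so that take keeps the highest values rather than the first rows in file order.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical rows in arbitrary order.
fun topTwoNames() = dataFrameOf("name", "starsCount")(
    "small", 12,
    "large", 45000,
    "medium", 16000,
)
    .sortByDesc("starsCount") // highest star count first
    .take(2)                  // keep the first two rows of the sorted frame
    .get("name")
    .toList()

fun main() {
    println(topTwoNames()) // [large, medium]
}
```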

Plotting With Kandy

Kandy can be loaded into a notebook using %use kandy:

%use kandy

Build a simple bar chart with the .plot { } extension for DataFrame, which makes extension properties available inside the Kandy plotting DSL. The plot is rendered as the cell output after execution:

dfTop10.plot {
    bars {
        x(name)
        y(starsCount)
    }

    layout.title = "Top 10 JetBrains repositories by stars count"
}

Write DataFrame

A DataFrame supports writing to all formats that it is capable of reading.

dfWithIsIntellij.writeExcel("jb_repos.xlsx")
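Other formats follow the same pattern. The sketch below writes a hypothetical two-row frame to a temporary CSV file and reads it back; it assumes the `writeCsv` and `readCsv` functions from the library's CSV support are on the classpath (older versions may spell them `writeCSV` and `readCSV`).

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*
import java.io.File

// Writes a small made-up frame to a temporary CSV file, reads it back,
// and returns the number of rows that survived the round trip.
fun roundTripRowCount(): Int {
    val df = dataFrameOf("name", "starsCount")(
        "kotlin", 45000,
        "sample-repo", 12,
    )
    val tmp = File.createTempFile("jb_repos", ".csv")
    try {
        df.writeCsv(tmp.absolutePath)
        return DataFrame.readCsv(tmp.absolutePath).rowsCount()
    } finally {
        tmp.delete() // clean up the temporary file
    }
}

fun main() {
    println(roundTripRowCount()) // 2
}
```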
