Quickstart Guide
This guide shows how to quickly get started with Kotlin DataFrame:
you'll learn how to load data, perform basic transformations, and build a simple plot using Kandy.
We recommend starting with Kotlin Notebook for the best beginner experience: everything works out of the box, including interactivity and rich rendering of DataFrames and plots.
You can instantly see the results of each operation: view the contents of your DataFrames after every transformation, inspect individual rows and columns, and explore data step-by-step in a live and interactive way.
You can view this guide as a notebook on GitHub or download quickstart.ipynb.
To start working with Kotlin DataFrame in a notebook, run a cell with the following code:
%useLatestDescriptors
%use dataframe
This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame rendering. Learn more here.
Kotlin DataFrame supports all popular data formats, including CSV, JSON, and Excel, as well as reading from various databases. Read a CSV file with the "JetBrains Repositories" dataset into the df variable:
val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
)
To display your dataframe as a cell output, place it in the last line of the cell:
df
Kotlin Notebook has special interactive outputs for DataFrame. Learn more about them here.
Use the .describe() method to get dataset summaries: column types, number of nulls, and simple statistics.
df.describe()
Kotlin DataFrame features a typesafe Columns Selection DSL, enabling flexible and safe selection of any combination of columns. Column selectors are widely used across operations. One of the simplest examples is .select { }, which returns a new DataFrame with only the columns chosen in a Columns Selection expression.
After executing the cell where a DataFrame variable is declared, extension properties for its columns are automatically generated. These properties can then be used in a Columns Selection DSL expression for typesafe and convenient column access.
Select some columns:
// Select "full_name", "stargazers_count" and "topics" columns
val dfSelected = df.select { full_name and stargazers_count and topics }
dfSelected
Tip: With the Kotlin DataFrame Compiler Plugin enabled, you can use auto-generated properties in your IntelliJ IDEA projects.
Some operations use the DataRow API, with expressions and conditions that apply to every DataFrame row. For example, .filter { } returns a new DataFrame with the rows that satisfy a condition given by a row expression.
Inside a row expression, you can access the values of the current row by column name through auto-generated properties. This is similar to the Columns Selection DSL, but here the properties represent actual values, not column references.
Filter rows by "stargazers_count" value:
// Keep only rows where "stargazers_count" value is at least 1000
val dfFiltered = dfSelected.filter { stargazers_count >= 1000 }
dfFiltered
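The select and filter steps above rely on extension properties that the notebook generates. As a rough sketch of the same two steps written with plain column names (for example, in a project without generated properties), the following uses the String API with explicitly typed access. The sample rows and the helper names sampleRepos and popularRepos are hypothetical, introduced only for illustration, and the sketch assumes the Kotlin DataFrame library is on the classpath.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample rows standing in for the real dataset.
fun sampleRepos(): DataFrame<*> =
    dataFrameOf("full_name", "stargazers_count", "topics")(
        "JetBrains/kotlin", 45000, "[kotlin, compiler]",
        "JetBrains/tiny-demo", 12, "[]",
    )

// Select columns by name, then keep rows with at least 1000 stars.
// "stargazers_count"<Int>() reads the current row's value as an Int;
// the type is stated by hand, so a wrong type fails at runtime, not at compile time.
fun popularRepos(df: DataFrame<*>): DataFrame<*> =
    df.select("full_name", "stargazers_count", "topics")
        .filter { "stargazers_count"<Int>() >= 1000 }

fun main() {
    println(popularRepos(sampleRepos()))
}
```

The trade-off is that column names and types are no longer checked by the compiler, which is what the generated properties provide.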
Columns can be renamed using the .rename { } operation, which also uses the Columns Selection DSL to select a column to rename. The rename operation does not perform the renaming immediately; instead, it creates an intermediate object that must be finalized into a new DataFrame by calling the .into() function with the new column name.
Rename "full_name" and "stargazers_count" columns:
// Rename "full_name" column into "name"
val dfRenamed = dfFiltered.rename { full_name }.into("name")
    // And "stargazers_count" into "starsCount"
    .rename { stargazers_count }.into("starsCount")
dfRenamed
Columns can be modified using the update { } and convert { } operations. Both operations select columns to modify via the Columns Selection DSL and, similar to rename, create an intermediate object that must be finalized to produce a new DataFrame.
The update operation preserves the original column types, while convert allows changing the type. In both cases, column names and their positions remain unchanged.
Update "name" and convert "topics":
val dfUpdated = dfRenamed
    // Update "name" values with only its second part (after '/')
    .update { name }.with { it.split("/")[1] }
    // Convert "topics" `String` values into `List<String>` by splitting:
    .convert { topics }.with { it.removePrefix("[").removeSuffix("]").split(", ") }
dfUpdated
Check the new type of the "topics" column:
dfUpdated.topics.type()
Output:
kotlin.collections.List<kotlin.String>
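To see what the two lambdas passed to update and convert do to a single cell value, they can be tried on plain strings without any DataFrame involved. This is only a sketch: the helper names repoName and parseTopics are hypothetical, and the sample inputs assume values shaped like "owner/repo" and "[a, b]", as the code above implies.

```kotlin
// Mirrors the lambda passed to update: keep the part after '/'.
// Throws IndexOutOfBoundsException if the value contains no '/'.
fun repoName(fullName: String): String = fullName.split("/")[1]

// Mirrors the lambda passed to convert: strip the brackets, split on ", ".
fun parseTopics(raw: String): List<String> =
    raw.removePrefix("[").removeSuffix("]").split(", ")

fun main() {
    println(repoName("JetBrains/kotlin"))            // kotlin
    println(parseTopics("[kotlin, compiler, jvm]"))  // [kotlin, compiler, jvm]
    // An empty "[]" becomes a list holding one empty string, not an empty list.
    println(parseTopics("[]").size)                  // 1
}
```

If an empty topic list should map to an empty List, one option is to filter out blank entries after splitting.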
The .add { } function creates a DataFrame with a new column, where the value for each row is computed from the existing values in that row. These values can be accessed within the row expression.
Add a new Boolean column "isIntellij":
// Add a `Boolean` column indicating whether the `name` contains the "intellij" substring
// or the topics include "intellij".
val dfWithIsIntellij = dfUpdated.add("isIntellij") {
    name.contains("intellij") || "intellij" in topics
}
dfWithIsIntellij
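The row expression above is ordinary Kotlin, so its matching rules can be checked in isolation. The sketch below extracts the same condition into a hypothetical standalone function (the name isIntellij and the sample values are illustrative only) to show two details: String.contains is case-sensitive by default, and the in check on a list matches whole elements, not substrings.

```kotlin
// Same condition as the row expression, applied to plain values.
fun isIntellij(name: String, topics: List<String>): Boolean =
    name.contains("intellij") || "intellij" in topics

fun main() {
    println(isIntellij("intellij-community", emptyList()))    // true
    println(isIntellij("IntelliJ-plugin", listOf("ide")))     // false: contains is case-sensitive
    println(isIntellij("kotlin", listOf("intellij", "ide")))  // true
    println(isIntellij("kotlin", listOf("intellij-plugin")))  // false: `in` needs an exact element
}
```

For a case-insensitive match, name.contains("intellij", ignoreCase = true) is one option.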
A DataFrame can be grouped by column keys, meaning its rows are split into groups based on the values in the key columns. The .groupBy { } operation selects columns and groups the DataFrame by their values, using them as grouping keys.
The result is a GroupBy: a DataFrame-like structure that associates each key with the corresponding subset of the original DataFrame.
Group dfWithIsIntellij by "isIntellij":
val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij
A GroupBy can be aggregated, that is, you can compute one or several summary statistics for each group. The result of the aggregation is a DataFrame containing the key columns along with new columns holding the computed statistics for the corresponding group.
For example, count() computes the size of each group:
groupedByIsIntellij.count()
Compute several statistics with .aggregate { }, which takes an expression describing the aggregation:
groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}
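For readers who know Kotlin collections, grouping followed by aggregation is conceptually close to groupBy plus a per-group summary. The sketch below is a plain-Kotlin analogy, not the DataFrame API: the Repo and GroupStats classes, the statsByIsIntellij function, and the sample values are all hypothetical.

```kotlin
// Hypothetical row and summary types for the analogy.
data class Repo(val name: String, val starsCount: Int, val isIntellij: Boolean)
data class GroupStats(val count: Int, val sumStars: Int, val maxStars: Int)

// Split rows by key, then reduce each group to its summary statistics.
fun statsByIsIntellij(repos: List<Repo>): Map<Boolean, GroupStats> =
    repos.groupBy { it.isIntellij }
        .mapValues { (_, group) ->
            GroupStats(
                count = group.size,
                sumStars = group.sumOf { it.starsCount },
                maxStars = group.maxOf { it.starsCount }, // safe: groups are never empty
            )
        }

fun main() {
    val repos = listOf(
        Repo("intellij-community", 16000, true),
        Repo("intellij-plugins", 2000, true),
        Repo("kotlin", 45000, false),
    )
    println(statsByIsIntellij(repos))
}
```

In the DataFrame version, the key and the statistics end up as columns of one resulting DataFrame instead of map keys and values.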
.sortBy { } / .sortByDesc { } sorts rows by the values in the selected columns, returning a DataFrame with sorted rows. take(n) returns a new DataFrame with the first n rows.
Combine them to get the top 10 repositories by number of stars:
val dfTop10 = dfWithIsIntellij
    // Sort by "starsCount" value descending
    .sortByDesc { starsCount }.take(10)
dfTop10
Kandy is a Kotlin plotting library designed to bring Kotlin DataFrame features into chart creation, providing a convenient and typesafe way to build data visualizations.
Kandy can be loaded into a notebook using %use kandy:
%use kandy
Build a simple bar chart with the .plot { } extension for DataFrame, which lets you use extension properties inside the Kandy plotting DSL (the plot is rendered as the cell output after execution):
dfTop10.plot {
    bars {
        x(name)
        y(starsCount)
    }
    layout.title = "Top 10 JetBrains repositories by stars count"
}
A DataFrame supports writing to all formats that it is capable of reading.
Write into Excel:
dfWithIsIntellij.writeExcel("jb_repos.xlsx")
In this quickstart, we covered the basics — reading data, transforming it, and building a simple visualization.
Ready to go deeper? Check out what’s next:
📘 Explore in-depth guides and various examples with different datasets, API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
🛠️ Browse the operations overview to learn what Kotlin DataFrame can do.
🧠 Understand the design and core concepts in the library overview.
🔤 Learn more about Extension Properties and make working with your data both convenient and type-safe.
💡 Use the Kotlin DataFrame Compiler Plugin for auto-generated column access in your IntelliJ IDEA projects.
📊 Master Kandy for stunning and expressive DataFrame visualizations with the Kandy Documentation.