DataFrame 1.0 Help

Kotlin DataFrame for SQL & Backend Developers

This guide helps Kotlin backend developers with SQL experience quickly adapt to Kotlin DataFrame, mapping familiar SQL and ORM operations to DataFrame concepts.

If you plan to work on a Gradle project without a Kotlin Notebook, we recommend installing the library together with our experimental Kotlin compiler plugin (available since version 2.2.*). This plugin generates type-safe schemas at compile time, tracking schema changes throughout your data pipeline.

Add Kotlin DataFrame Gradle dependency

You could read more about the setup of the Gradle build in the Gradle Setup Guide.

In your Gradle build file (build.gradle or build.gradle.kts), add the Kotlin DataFrame library as a dependency:

dependencies { implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta2") }
dependencies { implementation 'org.jetbrains.kotlinx:dataframe:1.0.0-Beta2' }

1. What is a dataframe?

If you’re used to SQL, a dataframe is conceptually like a table:

  • Rows: ordered records of data

  • Columns: named, typed fields

  • Schema: a mapping of column names to types

Kotlin DataFrame also supports hierarchical, JSON-like data — columns can contain nested dataframes or column groups, allowing you to represent and transform tree-like structures without flattening.

Unlike a relational DB table:

  • A DataFrame object lives in memory — there’s no storage engine or transaction log

  • It’s immutable — each operation produces a new DataFrame

  • There is no concept of foreign keys or relations between DataFrames

  • It can be created from any source: CSV, JSON, SQL tables, Apache Arrow, in-memory objects

2. Reading Data From SQL

Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis.

Approach

Example

From a table

val df = DataFrame.readSqlTable(dbConfig, "customers")

From a SQL query

val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")

From a JDBC Connection

val df = connection.readDataFrame("SELECT * FROM orders")

From a ResultSet (extension)

val df = resultSet.readDataFrame(connection)

import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig val dbConfig = DbConnectionConfig( url = "jdbc:postgresql://localhost:5432/mydb", user = "postgres", password = "secret" ) // Table val customers = DataFrame.readSqlTable(dbConfig, "customers") // Query val salesByRegion = DataFrame.readSqlQuery( dbConfig, """ SELECT region, SUM(amount) AS total FROM sales GROUP BY region """ ) // From JDBC connection connection.readDataFrame("SELECT * FROM orders") // From ResultSet val rs = connection.createStatement().executeQuery("SELECT * FROM orders") rs.readDataFrame(connection)

More information can be found here.

3. Why It’s Not an ORM

Frameworks like Hibernate or Exposed:

  • Map DB tables to Kotlin objects (entities)

  • Track object changes and sync them back to the database

  • Focus on persistence and transactions

Kotlin DataFrame:

  • Has no persistence layer

  • Doesn’t try to map rows to mutable entities

  • Focuses on in-memory analytics, transformations, and type-safe pipelines

  • The main idea is that the schema changes together with your transformations — and the **Compiler Plugin ** updates the type-safe API automatically under the hood.

    • You don’t have to manually define or recreate schemas every time — the plugin infers them dynamically from the data or transformations.

  • In ORMs, the mapping layer is frozen — schema changes require manual model edits and migrations.

Think of Kotlin DataFrame as a data analysis/ETL tool, not an ORM.

4. Key Differences from SQL & ORMs

Feature / Concept

SQL Databases (PostgreSQL, MySQL…)

ORM (Hibernate, Exposed…)

Kotlin DataFrame

Storage

Persistent

Persistent

In-memory only

Schema definition

CREATE TABLE DDL

Defined in entity classes

Derived from data or transformations or defined manually

Schema change

ALTER TABLE

Manual migration of entity classes

Automatic via transformations + Compiler Plugin or defined manually

Relations

Foreign keys

Mapped via annotations

Not applicable

Transactions

Yes

Yes

Not applicable

DB Indexes

Yes

Yes (via DB)

Not applicable

Data manipulation

SQL DML (INSERT, UPDATE)

CRUD mapped to DB

Transformations only (immutable)

Joins

JOIN keyword

Eager/lazy loading

.join()/.leftJoin() DSL

Grouping & aggregation

GROUP BY

DB query with groupBy

.groupBy().aggregate()

Filtering

WHERE

Criteria API / query DSL

.filter { ... }

Permissions

GRANT/REVOKE

DB-level permissions

Not applicable

Execution

On DB engine

On DB engine

In JVM process

5. SQL → Kotlin DataFrame Cheatsheet

DDL Analogues

SQL DDL Command / Example

Kotlin DataFrame Equivalent

Create table:
CREATE TABLE person (name text, age int);

@DataSchema
interface Person {
val name: String
val age: Int
}

Add column:
ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;

.add("profit") { revenue - cost }

Rename column:
ALTER TABLE sales RENAME COLUMN old_name TO new_name;

.rename { old_name }.into("new_name")

Drop column:
ALTER TABLE sales DROP COLUMN old_col;

.remove { old_col }

Modify column type:
ALTER TABLE sales ALTER COLUMN amount TYPE numeric;

.convert { amount }.to<Double>()

DML Analogues

SQL DML Command / Example

Kotlin DataFrame Equivalent

SELECT col1, col2

df.select { col1 and col2 }

WHERE amount > 100

df.filter { amount > 100 }

ORDER BY amount DESC

df.sortByDesc { amount }

GROUP BY region

df.groupBy { region }

SUM(amount)

.aggregate { sum { amount } }

JOIN

.join(otherDf) { id match right.id }

LIMIT 5

.take(5)

Pivot:
SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);

.pivot(region, year) { sum { amount } }

Explode array column:
SELECT id, unnest(tags) AS tag FROM products;

.explode { tags }

Update column:
UPDATE sales SET amount = amount * 1.2;

.update { amount }.with { it * 1.2 }

6. Example: SQL vs. DataFrame Side-by-Side

SQL (PostgreSQL):

SELECT region, SUM(amount) AS total FROM sales WHERE amount > 0 GROUP BY region ORDER BY total DESC LIMIT 5;
sales.filter { amount > 0 } .groupBy { region } .aggregate { sum(amount).into("total") } .sortByDesc { total } .take(5)

In Conclusion

  • Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe ** and fully integrated into Kotlin.

  • The main focus is readability and schema change safety via the Compiler Plugin.

  • It is neither a database nor an ORM — a Kotlin DataFrame library does not store data or manage transactions but works as an in-memory layer for analytics and transformations.

  • It does not provide some SQL features (permissions, transactions, indexes) — but offers convenient tools for working with JSON-like structures and combining multiple data sources.

  • Use Kotlin DataFrame as a type-safe DSL for post-processing, merging data sources, and analytics directly on the JVM, while keeping your code easily refactorable and IDE-assisted.

  • Use Kotlin DataFrame for small- and average-sized datasets, but for large datasets, consider using a more performant database engine.

What's Next?

If you're ready to go through a complete example, we recommend our Quickstart Guide — you'll learn the basics of reading data, transforming it, and creating visualization step-by-step.

Ready to go deeper? Check out what’s next:

22 August 2025