Kotlin DataFrame for SQL & Backend Developers
This guide helps Kotlin backend developers with SQL experience quickly adapt to Kotlin DataFrame, mapping familiar SQL and ORM operations to DataFrame concepts.
If you plan to work on a Gradle project without a Kotlin Notebook, we recommend installing the library together with our experimental Kotlin compiler plugin (available since Kotlin 2.2.*). This plugin generates type-safe schemas at compile time and tracks schema changes throughout your data pipeline.
Add Kotlin DataFrame Gradle dependency
You can read more about setting up the Gradle build in the Gradle Setup Guide.
In your Gradle build file (build.gradle or build.gradle.kts), add the Kotlin DataFrame library as a dependency:
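A minimal `build.gradle.kts` sketch follows. It assumes Kotlin 2.2.20 (a 2.2.* release that bundles the DataFrame compiler plugin) and the plugin id `kotlin("plugin.dataframe")`. The library version is deliberately left as a placeholder; take the current one from the Gradle Setup Guide.

```kotlin
// build.gradle.kts (sketch; versions are assumptions, see the Gradle Setup Guide)
plugins {
    // Any Kotlin 2.2.* release that bundles the DataFrame compiler plugin; 2.2.20 assumed here
    kotlin("jvm") version "2.2.20"
    // Experimental Kotlin DataFrame compiler plugin, versioned together with Kotlin
    kotlin("plugin.dataframe") version "2.2.20"
}

repositories {
    mavenCentral()
}

dependencies {
    // Placeholder: replace <dataframe-version> with the current Kotlin DataFrame release
    implementation("org.jetbrains.kotlinx:dataframe:<dataframe-version>")
}
```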
1. What is a dataframe?
If you’re used to SQL, a dataframe is conceptually like a table:
Rows: ordered records of data
Columns: named, typed fields
Schema: a mapping of column names to types
Kotlin DataFrame also supports hierarchical, JSON-like data — columns can contain nested dataframes or column groups, allowing you to represent and transform tree-like structures without flattening.
Unlike a relational DB table:
A DataFrame object lives in memory — there’s no storage engine or transaction log
It’s immutable — each operation produces a new DataFrame
There is no concept of foreign keys or relations between DataFrames
It can be created from any source: CSV, JSON, SQL tables, Apache Arrow, in-memory objects
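The sketch below makes the table analogy concrete. It uses the String column API, so it compiles without the compiler plugin, and the column names and values are made up for illustration. It builds a three-column frame and prints its schema. It then shows that `add` returns a new frame while the original keeps three columns. Finally, it nests two columns into a column group with `group(...).into(...)` to illustrate hierarchical data.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Illustrative data: three named, typed columns and three ordered rows
val people = dataFrameOf("firstName", "lastName", "age")(
    "Alice", "Smith", 30,
    "Bob", "Jones", 25,
    "Carol", "Brown", 41,
)

// add() does not modify `people`; it returns a new DataFrame with one more column
val withFlag = people.add("isSenior") { "age"<Int>() >= 40 }

// Hierarchical data: move firstName and lastName into a column group called "name"
val nested = people.group("firstName", "lastName").into("name")

fun main() {
    // The schema maps each column name to its inferred type
    println(people.schema())
    println(people.columnsCount())   // 3: the original frame is unchanged
    println(withFlag.columnsCount()) // 4
    println(nested.columnNames())    // a "name" column group plus "age"
}
```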
2. Reading Data From SQL
Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis.
You can load SQL data into a DataFrame in several ways:
From a table
From a SQL query
From a JDBC Connection
From a ResultSet (extension)
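Here is a sketch of the first three approaches, assuming the `dataframe-jdbc` module and the H2 driver (`com.h2database:h2`) are on the classpath. The `customers` table is hypothetical and is created in an in-memory H2 database only so the example is self-contained. `readSqlTable` loads a whole table. `readSqlQuery` loads the result of a `SELECT`. Both are called here with an open JDBC `Connection`.

```kotlin
import java.sql.Connection
import java.sql.DriverManager
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

// Hypothetical demo table, created only for this example
fun seed(connection: Connection) {
    connection.createStatement().use { st ->
        st.execute("CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(50), country VARCHAR(2))")
        st.execute("INSERT INTO customers VALUES (1, 'Alice', 'DE'), (2, 'Bob', 'FR'), (3, 'Carol', 'DE')")
    }
}

// Returns (whole table, query result). The in-memory H2 database disappears when the
// connection closes, but the DataFrames are already fully loaded into JVM memory.
fun loadCustomers(): Pair<AnyFrame, AnyFrame> =
    DriverManager.getConnection("jdbc:h2:mem:demo").use { connection ->
        seed(connection)
        // From a table over a JDBC Connection
        val all = DataFrame.readSqlTable(connection, "customers")
        // From an arbitrary SELECT query
        val german = DataFrame.readSqlQuery(connection, "SELECT name FROM customers WHERE country = 'DE'")
        all to german
    }

fun main() {
    val (all, german) = loadCustomers()
    println(all.rowsCount())    // 3
    println(german.rowsCount()) // 2
}
```

Note that H2 reports unquoted identifiers in upper case, so the column names in these frames will differ from the lower-case spelling used in the SQL.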
More information can be found here.
3. Why It’s Not an ORM
Frameworks like Hibernate or Exposed:
Map DB tables to Kotlin objects (entities)
Track object changes and sync them back to the database
Focus on persistence and transactions
Kotlin DataFrame:
Has no persistence layer
Doesn’t try to map rows to mutable entities
Focuses on in-memory analytics, transformations, and type-safe pipelines
The main idea is that the schema changes together with your transformations, and the **Compiler Plugin** updates the type-safe API automatically under the hood.
You don’t have to define or recreate schemas manually every time; the plugin infers them from the data or transformations.
In ORMs, the mapping layer is frozen — schema changes require manual model edits and migrations.
Think of Kotlin DataFrame as a data analysis/ETL tool, not an ORM.
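The following sketch illustrates the difference, using a hypothetical `Order` data class. A list of plain objects is turned into a DataFrame with `toDataFrame()`, and each transformation produces a new frame with a new schema. No session, `save()` call, or dirty-checking is involved. With the compiler plugin enabled, you could reference generated properties such as `amount` instead of the String accessors used here; that variant is not shown.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical class; in an ORM this would be a mapped, change-tracked entity
data class Order(val id: Int, val customer: String, val amount: Double)

// Plain objects become an immutable, in-memory DataFrame (column names come from the properties)
val orders = listOf(
    Order(1, "Alice", 120.0),
    Order(2, "Bob", 80.0),
    Order(3, "Alice", 45.5),
).toDataFrame()

// Each step yields a new frame with a new schema: "vat" is added, "id" is dropped.
// Nothing is flushed or written back to a database.
val enriched = orders
    .add("vat") { "amount"<Double>() * 0.2 } // 20% rate is an arbitrary example value
    .remove("id")

fun main() {
    println(orders.columnNames())   // still contains "id"
    println(enriched.columnNames()) // customer, amount and vat
}
```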
4. Key Differences from SQL & ORMs
Feature / Concept | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…) | Kotlin DataFrame |
---|---|---|---|
Storage | Persistent | Persistent | In-memory only |
Schema definition | CREATE TABLE (DDL) | Defined in entity classes | Derived from data or transformations, or defined manually |
Schema change | ALTER TABLE (DDL) | Manual migration of entity classes | Automatic via transformations + Compiler Plugin, or defined manually |
Relations | Foreign keys | Mapped via annotations | Not applicable |
Transactions | Yes | Yes | Not applicable |
DB Indexes | Yes | Yes (via DB) | Not applicable |
Data manipulation | SQL DML (INSERT, UPDATE, DELETE) | CRUD mapped to DB | Transformations only (immutable) |
Joins | JOIN | Eager/lazy loading | join(), leftJoin(), innerJoin() and other join variants |
Grouping & aggregation | GROUP BY | DB query with groupBy | groupBy().aggregate { } |
Filtering | WHERE | Criteria API / query DSL | filter { } |
Permissions | GRANT / REVOKE | DB-level permissions | Not applicable |
Execution | On DB engine | On DB engine | In JVM process |
5. SQL → Kotlin DataFrame Cheatsheet
DDL Analogues
SQL DDL Command / Example | Kotlin DataFrame Equivalent |
---|---|
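Create table: CREATE TABLE | dataFrameOf(...) |
Add column: ALTER TABLE ... ADD COLUMN | add() |
Rename column: ALTER TABLE ... RENAME COLUMN | rename().into() |
Drop column: ALTER TABLE ... DROP COLUMN | remove() |
Modify column type: ALTER TABLE ... ALTER COLUMN ... TYPE | convert().to<Type>() |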
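The DDL analogues can be sketched as follows, assuming a hypothetical `employees` table. Each SQL statement appears as a comment above its DataFrame counterpart. Nothing is altered in place: every step returns a new DataFrame.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// CREATE TABLE employees (name VARCHAR, age INT, salary INT) plus two inserted rows
val employees = dataFrameOf("name", "age", "salary")(
    "Alice", 30, 5000,
    "Bob", 25, 4200,
)

// ALTER TABLE employees ADD COLUMN bonus (here computed from an existing column)
val withBonus = employees.add("bonus") { "salary"<Int>() / 10 }

// ALTER TABLE employees RENAME COLUMN name TO fullName
val renamed = withBonus.rename("name").into("fullName")

// ALTER TABLE employees DROP COLUMN age
val dropped = renamed.remove("age")

// ALTER TABLE employees ALTER COLUMN salary TYPE DOUBLE PRECISION
val converted = dropped.convert("salary").to<Double>()

fun main() {
    println(converted)
}
```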
DML Analogues
SQL DML Command / Example | Kotlin DataFrame Equivalent |
---|---|
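Pivot: PIVOT or crosstab (vendor-specific) | pivot() |
Explode array column: unnest() (PostgreSQL) | explode() |
Update column: UPDATE ... SET | update().with { } |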
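Here is a sketch of common DML analogues, using hypothetical `employees`, `departments`, and `profiles` frames and the String column API. Each SQL statement is shown as a comment. As with the DDL examples, every DataFrame call returns a new frame and leaves `employees` unchanged. Comparing PostgreSQL `unnest` with `explode` is an approximation.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical tables, built inline
val employees = dataFrameOf("name", "department", "salary")(
    "Alice", "IT", 5000,
    "Bob", "Sales", 4200,
    "Carol", "IT", 6100,
    "Dave", "HR", 3900,
)
val departments = dataFrameOf("department", "floor")(
    "IT", 3,
    "Sales", 1,
)

// SELECT name, salary FROM employees
val projected = employees.select("name", "salary")

// SELECT * FROM employees WHERE salary > 4000
val filtered = employees.filter { "salary"<Int>() > 4000 }

// SELECT * FROM employees ORDER BY salary DESC LIMIT 2
val top2 = employees.sortByDesc("salary").take(2)

// SELECT department, COUNT(*) AS employees, MAX(salary) AS maxSalary
// FROM employees GROUP BY department
val perDept = employees.groupBy("department").aggregate {
    count() into "employees"
    max("salary") into "maxSalary"
}

// SELECT * FROM employees JOIN departments USING (department); inner join, so HR is dropped
val joined = employees.join(departments, "department")

// INSERT INTO employees VALUES ('Eve', 'HR', 4000)
val inserted = employees.concat(dataFrameOf("name", "department", "salary")("Eve", "HR", 4000))

// DELETE FROM employees WHERE department = 'HR'
val deleted = employees.drop { "department"<String>() == "HR" }

// UPDATE employees SET salary = salary + 100 WHERE department = 'IT'
val raised = employees.update("salary")
    .where { "department"<String>() == "IT" }
    .with { (it as Int) + 100 }

// PostgreSQL: SELECT name, unnest(tags) AS tags FROM profiles (one row per array element)
val profiles = dataFrameOf("name", "tags")(
    "Alice", listOf("kotlin", "sql"),
    "Bob", listOf("java"),
)
val exploded = profiles.explode("tags")

fun main() {
    println(perDept)
    println(exploded)
}
```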
6. Example: SQL vs. DataFrame Side-by-Side
SQL (PostgreSQL):
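Here is a minimal side-by-side sketch, assuming hypothetical `employees` and `departments` tables that share a `deptId` column. The PostgreSQL query appears as a comment, followed by a DataFrame pipeline that computes the same result in memory. The frames are built inline so the example is self-contained; in practice they could be loaded with `readSqlTable`.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// SQL (PostgreSQL), shown for comparison:
//   SELECT d.deptName, AVG(e.salary) AS avgSalary
//   FROM employees e
//   JOIN departments d USING (deptId)
//   WHERE e.active
//   GROUP BY d.deptName
//   ORDER BY avgSalary DESC;

// Hypothetical tables, built inline instead of being read from a database
val employees = dataFrameOf("name", "deptId", "salary", "active")(
    "Alice", 1, 5000, true,
    "Bob", 2, 4200, true,
    "Carol", 1, 6100, false,
    "Dave", 1, 4000, true,
)
val departments = dataFrameOf("deptId", "deptName")(
    1, "IT",
    2, "Sales",
)

// Same pipeline in Kotlin DataFrame: filter, join, group, aggregate, sort
val result = employees
    .filter { "active"<Boolean>() }                // WHERE e.active
    .join(departments, "deptId")                   // JOIN ... USING (deptId)
    .groupBy("deptName")                           // GROUP BY d.deptName
    .aggregate { mean("salary") into "avgSalary" } // AVG(e.salary) AS avgSalary
    .sortByDesc("avgSalary")                       // ORDER BY avgSalary DESC

fun main() {
    println(result)
}
```

Each SQL clause maps to one call, which keeps the pipeline easy to read and refactor. With the compiler plugin, the String accessors could be replaced by generated, type-checked properties.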
In Conclusion
Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe** and fully integrated into Kotlin.
The main focus is readability and schema change safety via the Compiler Plugin.
It is neither a database nor an ORM: Kotlin DataFrame does not store data or manage transactions; it works as an in-memory layer for analytics and transformations.
It does not provide some SQL features (permissions, transactions, indexes), but it offers convenient tools for working with JSON-like structures and combining multiple data sources.
Use Kotlin DataFrame as a type-safe DSL for post-processing, merging data sources, and analytics directly on the JVM, while keeping your code easily refactorable and IDE-assisted.
Use Kotlin DataFrame for small and medium-sized datasets that fit in JVM memory; for larger datasets, consider using a more performant database engine.
What's Next?
If you're ready to go through a complete example, we recommend our Quickstart Guide: you'll learn the basics of reading data, transforming it, and creating visualizations step by step.
Ready to go deeper? Check out what’s next:
📘 Explore in-depth guides and various examples with different datasets, API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
🛠️ Browse the operations overview to learn what Kotlin DataFrame can do.
🧠 Understand the design and core concepts in the library overview.
🔤 Learn more about Extension Properties and make working with your data both convenient and type-safe.
💡 Use the Kotlin DataFrame Compiler Plugin for auto-generated column access in your IntelliJ IDEA projects.
📊 Master Kandy for stunning and expressive DataFrame visualizations (see the Kandy Documentation).