How to Extend DataFrame Library for Custom SQL Database Support: Example with HSQLDB
Edit page Last modified: 14 January 2025This guide demonstrates how advanced users can extend the Kotlin DataFrame library to support a custom SQL database, using HSQLDB as an example. By following these steps, you will be able to integrate your custom database into the DataFrame library, allowing for seamless DataFrame creation, manipulation, and querying.
This guide is intended for Gradle projects, but the experience will be similar in Kotlin Notebooks, as demonstrated in this Kotlin DataFrame SQL Example.
Prerequisites
Create a Gradle Project:
Add the following dependencies and dataframe plugin to your build.gradle.kts
:
plugins {
id("org.jetbrains.kotlinx.dataframe") version "$dataframe_version"
}
dependencies {
implementation("org.jetbrains.kotlinx:dataframe:$dataframe_version")
implementation("org.hsqldb:hsqldb:$version")
}
Install HSQLDB:
Follow the HSQLDB Quick Guide to set up HSQLDB locally.
Start the HSQLDB Server:
Launch a terminal or command prompt and execute the following command:
java -classpath lib/hsqldb.jar org.hsqldb.server.Server --database.0 file:hsqldb/demodb --dbname.0 testdb
Implementing Custom Database Type Support
To enable HSQLDB integration, implement a custom DbType
by overriding required methods.
Create the HSQLDB Type
/**
* Represents the HSQLDB database type.
*
* This class provides methods to convert data from a ResultSet to the appropriate type for HSQLDB,
* and to generate the corresponding column schema.
*/
public object HSQLDB : DbType("hsqldb") {
override val driverClassName: String
get() = "org.hsqldb.jdbcDriver"
override fun convertSqlTypeToColumnSchemaValue(tableColumnMetadata: TableColumnMetadata): ColumnSchema? {
return null
}
override fun isSystemTable(tableMetadata: TableMetadata): Boolean {
val locale = Locale.getDefault()
fun String?.containsWithLowercase(substr: String) = this?.lowercase(locale)?.contains(substr) == true
val schemaName = tableMetadata.schemaName
val name = tableMetadata.name
return schemaName.containsWithLowercase("information_schema") ||
schemaName.containsWithLowercase("system") ||
name.containsWithLowercase("system_")
}
override fun buildTableMetadata(tables: ResultSet): TableMetadata =
TableMetadata(
tables.getString("TABLE_NAME"),
tables.getString("TABLE_SCHEM"),
tables.getString("TABLE_CAT"),
)
override fun convertSqlTypeToKType(tableColumnMetadata: TableColumnMetadata): KType? {
return null
}
}
Defining Helper Functions
Define utility functions to manage database connections and tables. For example purposes, we create a small function that can populate the table with a schema and some sample data.
const val URL = "jdbc:hsqldb:hsql://localhost/testdb"
const val USER_NAME = "SA"
const val PASSWORD = ""
fun removeTable(con: Connection): Int {
val stmt = con.createStatement()
return stmt.executeUpdate("""DROP TABLE orders""")
}
fun createAndPopulateTable(con: Connection) {
val stmt = con.createStatement()
stmt.executeUpdate(
"""CREATE TABLE IF NOT EXISTS orders (
id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
item VARCHAR(50) NOT NULL,
price DOUBLE NOT NULL,
order_date DATE
);
""".trimIndent()
)
stmt.executeUpdate(
"""INSERT INTO orders (item, price, order_date)
VALUES ('Laptop', 1500.00, NOW())""".trimIndent()
)
stmt.executeUpdate(
"""INSERT INTO orders (item, price, order_date)
VALUES ('Smartphone', 700.00, NOW())""".trimIndent()
)
}
Define the Table Schema
Use the @DataSchema
annotation to define a custom data schema for the orders
table.
@DataSchema
interface Orders {
val id: Int
val item: String
val price: Double
val orderDate: java.util.Date
}
End-to-End Example
Finally, use the following code to create, populate, read, and delete the table in HSQLDB.
fun main() {
DriverManager.getConnection(URL, USER_NAME, PASSWORD).use { con ->
createAndPopulateTable(con)
val df = con
.readDataFrame("SELECT * FROM orders", dbType = HSQLDB)
.rename { all() }.into { it.name.lowercase(Locale.getDefault()).toCamelCaseByDelimiters(DELIMITERS_REGEX) }
.cast<Orders>(verify = true)
df.filter { it.price > 800 }.print()
removeTable(con)
}
}
Running the main
function above will output filtered rows from the orders
table where price > 800
.
It will also demonstrate how to define and use custom SQL database extensions in the DataFrame library.
Find a full example project here.