Read from SQL databases
These functions allow you to interact with an SQL database using a Kotlin DataFrame library.
There are two main blocks of available functionality:
reading data from the database
function
readSqlTable
reads specific database tablefunction
readSqlQuery
executes SQL queryfunction
readResultSet
reads from created earlier ResultSetfunction
readAllSqlTables
reads all tables (all non-system tables)
schema retrieval
getSchemaForSqlTable
for specific tablesgetSchemaForSqlQuery
for result of executing SQL queriesgetSchemaForResultSet
for rows reading through the given ResultSetgetSchemaForAllSqlTables
for all non-system tables
NOTE: This is an experimental module and for now, we only support four databases: MariaDB, MySQL, PostgreSQL, and SQLite.
Additionally, support for JSON and date-time types is limited. Please take this into consideration when using these functions.
Getting started with reading from SQL database in Gradle Project
In the first, you need to add a dependency
after that, you need to add a dependency for a JDBC driver for the used database, for example
For MariaDB:
For PostgreSQL:
For MySQL:
For SQLite:
In the second, be sure that you can establish a connection to the database.
For this, usually, you need to have three things: a URL to a database, a username and a password.
Call one of the following functions to obtain data from a database and transform it to the dataframe.
For example, if you have a local PostgreSQL database named as testDatabase
with table Customer
, you could read first 100 rows and print the data just copying the code below:
Find a full example project here.
Getting Started with Notebooks
To use the latest version of the Kotlin DataFrame library and a specific version of the JDBC driver for your database (MariaDB is used as an example below) in your Notebook, run the following two cells.
First of all, specify the version of the JDBC driver
Next, import Kotlin DataFrame
library in the cell below.
NOTE: The order of cell execution is important, the dataframe library is waiting for a JDBC driver to force classloading.
Find full example Notebook here.
Reading Specific Tables
These functions read all data from a specific table in the database. Variants with a limit parameter restrict how many rows will be read from the table.
readSqlTable(dbConfig: DatabaseConfiguration, tableName: String): AnyFrame
Read all data from a specific table in the SQL database and transform it into an AnyFrame object.
The dbConfig: DatabaseConfiguration
parameter represents the configuration for a database connection, created under the hood and managed by the library. Typically, it requires a URL, username and password.
readSqlTable(dbConfig: DatabaseConfiguration, tableName: String, limit: Int): AnyFrame
A variant of the previous function, but with an added limit: Int
parameter that allows setting the maximum number of records to be read.
readSqlTable(connection: Connection, tableName: String): AnyFrame
Another variant, where instead of dbConfig: DatabaseConfiguration
we use a JDBC connection: Connection
object.
readSqlTable(connection: Connection, tableName: String, limit: Int): AnyFrame
A variant of the previous function, but with an added limit: Int
parameter that allows setting the maximum number of records to be read.
Executing SQL Queries
These functions execute an SQL query on the database and convert the result into a DataFrame. If a limit is provided, only that many rows will be returned from the result.
readSqlQuery(dbConfig: DatabaseConfiguration, sqlQuery: String): AnyFrame
Execute a specific SQL query on the SQL database and retrieve the resulting data as an AnyFrame.
The dbConfig: DatabaseConfiguration
parameter represents the configuration for a database connection, created under the hood and managed by the library. Typically, it requires a URL, username and password.
readSqlQuery(dbConfig: DatabaseConfiguration, sqlQuery: String, limit: Int): AnyFrame
A variant of the previous function, but with an added limit: Int
parameter that allows setting the maximum number of records to be read.
readSqlQuery(connection: Connection, sqlQuery: String): AnyFrame
Another variant, where instead of dbConfig: DatabaseConfiguration
we use a JDBC connection: Connection
object.
readSqlQuery(connection: Connection, sqlQuery: String, limit: Int): AnyFrame
A variant of the previous function, but with an added limit: Int
parameter that allows setting the maximum number of records to be read.
Reading from ResultSet
These functions read data from a ResultSet object and convert it into a DataFrame. The versions with a limit parameter will only read up to the specified number of rows.
readResultSet(resultSet: ResultSet, dbType: DbType): AnyFrame
This function allows reading a ResultSet object from your SQL database and transforms it into an AnyFrame object.
The dbType: DbType
parameter specifies the type of our database (e.g., PostgreSQL, MySQL, etc.), supported by a library. Currently, the following classes are available: H2, MariaDb, MySql, PostgreSql, Sqlite
.
readResultSet(resultSet: ResultSet, dbType: DbType, limit: Int): AnyFrame
A variant of the previous function, but with an added limit: Int
parameter that allows setting the maximum number of records to be read.
readResultSet(resultSet: ResultSet, connection: Connection): AnyFrame
Another variant, where instead of dbType: DbType
we use a JDBC connection: Connection
object.
readResultSet(resultSet: ResultSet, connection: Connection, limit: Int): AnyFrame
A variant of the previous function, but with an added limit: Int
parameter that allows setting the maximum number of records to be read.
Reading Entire Tables
These functions read all data from all tables in the connected database. Variants with a limit parameter restrict how many rows will be read from each table.
readAllSqlTables(connection: Connection): List<AnyFrame>
Retrieves data from all the non-system tables in the SQL database and returns them as a list of AnyFrame objects.
The dbConfig: DatabaseConfiguration
parameter represents the configuration for a database connection, created under the hood and managed by the library. Typically, it requires a URL, username and password.
readAllSqlTables(connection: Connection, limit: Int): List<AnyFrame>
A variant of the previous function, but with an added limit: Int
parameter that allows setting the maximum number of records to be read from each table.
NOTE: the setting the different limits for different tables is not supported.
readAllSqlTables(connection: Connection): List<AnyFrame>
Another variant, where instead of dbConfig: DatabaseConfiguration
we use a JDBC connection: Connection
object.
readAllSqlTables(connection: Connection, limit: Int): List<AnyFrame>
A variant of the previous function, but with an added limit: Int
parameter that allows setting the maximum number of records to be read from each table.
NOTE: the setting the different limits for different tables is not supported.
Schema retrieval for specific SQL table
The purpose of these functions is to facilitate the retrieval of table schema. By providing a table name and either a database configuration or connection, these functions return the DataFrameSchema of the specified table.
getSchemaForSqlTable(dbConfig: DatabaseConfiguration, tableName: String): DataFrameSchema
This function captures the schema of a specific table from an SQL database.
The dbConfig: DatabaseConfiguration
parameter represents the configuration for a database connection, created under the hood and managed by the library. Typically, it requires a URL, username and password.
getSchemaForSqlTable(connection: Connection, tableName: String): DataFrameSchema
Another variant, where instead of dbConfig: DatabaseConfiguration
we use a JDBC connection: Connection
object.
Schema retrieval from SQL query
These functions return the schema of an SQL query result.
Once you provide a database configuration or connection and an SQL query, they return the DataFrameSchema of the query result.
getSchemaForSqlQuery(dbConfig: DatabaseConfiguration, sqlQuery: String): DataFrameSchema
This function executes an SQL query on the database and then retrieves the resulting schema.
The dbConfig: DatabaseConfiguration
parameter represents the configuration for a database connection, created under the hood and managed by the library. Typically, it requires a URL, username and password.
getSchemaForSqlQuery(connection: Connection, sqlQuery: String): DataFrameSchema
Another variant, where instead of dbConfig: DatabaseConfiguration
we use a JDBC connection: Connection
object.
Schema retrieval from ResultSet
These functions return the schema from a ResultSet provided by the user.
This can help developers infer the structure of the result set, which is quite essential for data transformation and mapping purposes.
getSchemaForResultSet(resultSet: ResultSet, dbType: DbType): DataFrameSchema
This function reads the schema from a ResultSet object provided by the user.
The dbType: DbType
parameter specifies the type of our database (e.g., PostgreSQL, MySQL, etc.), supported by a library. Currently, the following classes are available: H2, MariaDb, MySql, PostgreSql, Sqlite
.
getSchemaForSqlQuery(connection: Connection, sqlQuery: String): DataFrameSchema
Another variant, where instead of dbConfig: DatabaseConfiguration
we use a JDBC connection: Connection
object.
Schema retrieval for all non-system tables
These functions return a list of all DataFrameSchema
from all the non-system tables in the SQL database. They can be called with either a database configuration or a connection.
getSchemaForAllSqlTables(dbConfig: DatabaseConfiguration): List<DataFrameSchema>
This function retrieves the schema of all tables from an SQL database and returns them as a list of DataFrameSchema
.
The dbConfig: DatabaseConfiguration
parameter represents the configuration for a database connection, created under the hood and managed by the library. Typically, it requires a URL, username and password.
getSchemaForAllSqlTables(connection: Connection): List<DataFrameSchema>
This function retrieves the schema of all tables using a JDBC connection: Connection
object and returns them as a list of DataFrameSchema
.