Read from SQL databases
These functions let you read data from an SQL database into the Kotlin DataFrame library.
The available functionality falls into two main groups:
Methods for reading data from a database:
readSqlTable: reads a specific database table
readSqlQuery: executes an SQL query
readResultSet: reads from a previously created ResultSet
readAllSqlTables: reads all non-system tables
Methods for reading table schemas:
getSchemaForSqlTable: for a specific table
getSchemaForSqlQuery: for the result of an SQL query
getSchemaForResultSet: for a previously created ResultSet
getSchemaForAllSqlTables: for all non-system tables
All methods above are called on the DataFrame companion object, for example DataFrame.readSqlTable(...) or DataFrame.getSchemaForSqlTable(...).
There are also several extension functions available on Connection, ResultSet, and DbConnectionConfig objects.
Methods for reading data from a database:
readDataFrame on Connection or DbConnectionConfig: converts the result of an SQL query or an SQL table to a DataFrame object
readDataFrame on ResultSet: reads from a previously created ResultSet
Methods for reading table schemas from a database:
getDataFrameSchema on Connection or DbConnectionConfig: for an SQL query result or an SQL table
getDataFrameSchema on ResultSet: for a previously created ResultSet
NOTE: This is an experimental module. For now, it supports only the following databases: MS SQL, MariaDB, MySQL, PostgreSQL, and SQLite.
Additionally, support for JSON and date-time types is limited. Please take this into account when using these functions.
Getting started with reading from an SQL database in a Gradle project
First, add the Kotlin DataFrame dependency.
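Here is a minimal sketch of that dependency in a Gradle Kotlin DSL build script. It assumes you only need the JDBC module, published as org.jetbrains.kotlinx:dataframe-jdbc. The <version> string is a placeholder, not a real version.

```kotlin
// build.gradle.kts (sketch): replace <version> with the latest Kotlin DataFrame release
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe-jdbc:<version>")
}
```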
Then add a dependency on the JDBC driver for your database. Each supported database (MariaDB, PostgreSQL, MySQL, SQLite, and MS SQL) has its own driver, and you can find the latest version of each driver on Maven Central.
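The driver dependencies could look like the following sketch. The Maven coordinates are the drivers' usual published artifacts, and <version> is a placeholder. Keep only the line for the database you use.

```kotlin
// build.gradle.kts (sketch): keep only the driver you need and
// replace <version> with the latest version from Maven Central
dependencies {
    implementation("org.mariadb.jdbc:mariadb-java-client:<version>") // MariaDB
    implementation("org.postgresql:postgresql:<version>")            // PostgreSQL
    implementation("com.mysql:mysql-connector-j:<version>")          // MySQL
    implementation("org.xerial:sqlite-jdbc:<version>")               // SQLite
    implementation("com.microsoft.sqlserver:mssql-jdbc:<version>")   // MS SQL
}
```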
Second, make sure that you can establish a connection to the database.
Usually, this requires three things: a database URL, a username, and a password.
Then call one of the functions described below to read data from the database and convert it into a DataFrame.
For example, suppose you have a local PostgreSQL database named testDatabase with a table named Customer. You can read and print its first 100 rows by passing a DbConnectionConfig and limit = 100 to readSqlTable.
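Here is a sketch of that example. It assumes the PostgreSQL driver from the previous step is on the classpath. The port (5432, PostgreSQL's default), the postgres/password credentials, and the helper names testDatabaseConfig and readFirstCustomers are hypothetical placeholders. Replace them with your own.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

// Hypothetical connection details for a local PostgreSQL server: URL, username, password.
fun testDatabaseConfig(): DbConnectionConfig =
    DbConnectionConfig("jdbc:postgresql://localhost:5432/testDatabase", "postgres", "password")

// Reads at most 100 rows of the Customer table; the library manages the connection.
fun readFirstCustomers(config: DbConnectionConfig): AnyFrame =
    DataFrame.readSqlTable(config, "Customer", limit = 100)

fun main() {
    readFirstCustomers(testDatabaseConfig()).print()
}
```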
Find a full example project here.
Getting Started with Notebooks
To use the latest version of the Kotlin DataFrame library together with a specific version of the JDBC driver for your database (MariaDB in this example) in a notebook, run two cells.
First, load the JDBC driver, specifying its version.
Next, load the Kotlin DataFrame library in a second cell.
NOTE: The order of cell execution is important. Load the JDBC driver first, because the DataFrame library expects the driver to be present so that it can force class loading.
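One way to write the two cells is shown below. It assumes the Kotlin notebook kernel's @file:DependsOn annotation is used to add the driver and %use dataframe is used to load the library. The <version> string is a placeholder for a concrete MariaDB driver version.

```kotlin
// Cell 1 (run first): add the MariaDB JDBC driver with a specific version
@file:DependsOn("org.mariadb.jdbc:mariadb-java-client:<version>")

// Cell 2 (run second, in a separate cell): load the latest Kotlin DataFrame
%use dataframe
```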
Find a full example Notebook here.
Nullability Inference
Each method that reads data has an important parameter called inferNullability.
By default, this parameter is set to true, meaning that the method inherits the NOT NULL constraints from the SQL table definition.
If you prefer to ignore the SQL constraints and determine nullability solely from the presence of null values in the data, set this parameter to false.
In that case, a column is considered nullable if it contains at least one null value; otherwise, it is considered non-nullable in the newly created DataFrame object.
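The following minimal sketch shows the difference. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. A temporary SQLite file stands in for a real database, and createDemoDatabase, the Customer table, and the read helpers are hypothetical. The email column has no NOT NULL constraint but contains no nulls. With inferNullability = true it should follow the table definition, and with false it should be inferred as non-nullable from the data.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file. `email` is declared without
// NOT NULL, but none of the inserted rows contain a null email.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, email TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'a@example.com'), (2, 'b@example.com')")
        }
    }
    return url
}

// Default behavior: nullability follows the NOT NULL constraints of the table.
fun readUsingConstraints(url: String): AnyFrame =
    DataFrame.readSqlTable(DbConnectionConfig(url), "Customer", inferNullability = true)

// Nullability follows the data: a column is nullable only if it contains a null.
fun readUsingData(url: String): AnyFrame =
    DataFrame.readSqlTable(DbConnectionConfig(url), "Customer", inferNullability = false)

fun main() {
    val url = createDemoDatabase()
    println("inferNullability = true:  ${readUsingConstraints(url)["email"].type()}")
    println("inferNullability = false: ${readUsingData(url)["email"].type()}")
}
```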
Reading Specific Tables
These functions read all data from a specific table in the database. Variants with a limit parameter restrict how many rows will be read from the table.
readSqlTable(dbConfig: DbConnectionConfig, tableName: String, limit: Int, inferNullability: Boolean): AnyFrame
Reads all data from a specific table in the SQL database and transforms it into an AnyFrame object.
The dbConfig: DbConnectionConfig parameter holds the configuration for a database connection that the library creates and manages under the hood. It typically requires a URL, a username, and a password.
The limit: Int parameter sets the maximum number of rows to read.
readSqlTable(connection: Connection, tableName: String, limit: Int, inferNullability: Boolean): AnyFrame
Another variant takes a JDBC connection: Connection object instead of dbConfig: DbConnectionConfig.
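The sketch below shows both variants. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. A temporary SQLite file stands in for a real database, and createDemoDatabase, the Customer table, readViaConfig, and readViaConnection are hypothetical names.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// dbConfig variant: the library creates and manages the connection.
fun readViaConfig(url: String): AnyFrame =
    DataFrame.readSqlTable(DbConnectionConfig(url), "Customer", limit = 2)

// Connection variant: the caller owns the connection, and `use` closes it afterwards.
fun readViaConnection(url: String): AnyFrame =
    DriverManager.getConnection(url).use { connection ->
        DataFrame.readSqlTable(connection, "Customer", limit = 2)
    }

fun main() {
    val url = createDemoDatabase()
    readViaConfig(url).print()
    readViaConnection(url).print()
}
```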
Extension functions for reading an SQL table
The same operations are also available as extension functions.
Connection.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean): AnyFrame
Reads all data from a specific table in the SQL database and transforms it into an AnyFrame object.
The sqlQueryOrTableName: String parameter is either the SQL query to execute or the name of an SQL table.
NOTE: It must be either the name of an existing SQL table or a single SQL query that starts with SELECT and only reads data without modifying it. The query must not contain the ; symbol.
All other parameters are described above.
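The sketch below passes a table name and then an SQL query to Connection.readDataFrame. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary SQLite database, the Customer table, and the helper names are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.readDataFrame
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// A table name: reads the table, here capped at 2 rows.
fun readTable(url: String): AnyFrame =
    DriverManager.getConnection(url).use { it.readDataFrame("Customer", limit = 2) }

// A single SELECT query without a trailing ';': reads the query result instead.
fun readQuery(url: String): AnyFrame =
    DriverManager.getConnection(url).use { it.readDataFrame("SELECT name FROM Customer WHERE id > 1") }

fun main() {
    val url = createDemoDatabase()
    readTable(url).print()
    readQuery(url).print()
}
```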
DbConnectionConfig.readDataFrame(sqlQueryOrTableName: String, limit: Int, inferNullability: Boolean): AnyFrame
If you do not have a connection object, or you need to run a quick, isolated experiment reading data from an SQL database, you can delegate the creation of the connection to DbConnectionConfig.
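The following minimal sketch shows DbConnectionConfig.readDataFrame, where no Connection object appears in user code. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database, the Customer table, and quickLook are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readDataFrame
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// The connection is created and managed by the library from the config alone.
fun quickLook(url: String): AnyFrame =
    DbConnectionConfig(url).readDataFrame("Customer", limit = 1)

fun main() {
    quickLook(createDemoDatabase()).print()
}
```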
Executing SQL Queries
These functions execute an SQL query on the database and convert the result into a DataFrame object. If a limit is provided, at most that many rows are returned from the result.
readSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String, limit: Int, inferNullability: Boolean): AnyFrame
Executes a specific SQL query on the SQL database and returns the resulting data as an AnyFrame.
The dbConfig: DbConnectionConfig parameter holds the configuration for a database connection that the library creates and manages under the hood. It typically requires a URL, a username, and a password.
readSqlQuery(connection: Connection, sqlQuery: String, limit: Int, inferNullability: Boolean): AnyFrame
Another variant takes a JDBC connection: Connection object instead of dbConfig: DbConnectionConfig.
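The sketch below shows both readSqlQuery variants. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database, the Customer table, and the helper names are hypothetical. Each query is a single SELECT statement without a trailing ;.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// dbConfig variant: all rows of the query result.
fun queryViaConfig(url: String): AnyFrame =
    DataFrame.readSqlQuery(DbConnectionConfig(url), "SELECT id, name FROM Customer WHERE name IS NOT NULL")

// Connection variant: only the first row of the query result.
fun queryViaConnection(url: String): AnyFrame =
    DriverManager.getConnection(url).use { connection ->
        DataFrame.readSqlQuery(connection, "SELECT id, name FROM Customer ORDER BY id DESC", limit = 1)
    }

fun main() {
    val url = createDemoDatabase()
    queryViaConfig(url).print()
    queryViaConnection(url).print()
}
```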
Extension functions for reading the result of an SQL query
The Connection.readDataFrame and DbConnectionConfig.readDataFrame extension functions also accept an SQL query in place of a table name.
Reading from ResultSet
These functions read data from a ResultSet object and convert it into a DataFrame. The versions with a limit parameter only read up to the specified number of rows.
readResultSet(resultSet: ResultSet, dbType: DbType, limit: Int, inferNullability: Boolean): AnyFrame
This function reads a ResultSet object from your SQL database and transforms it into an AnyFrame object.
A ResultSet object maintains a cursor pointing to its current row of data. By default, a ResultSet object is not updatable and has a cursor that moves forward only. Therefore, you can iterate over it only once, from the first row to the last.
More details about ResultSet can be found in the official Java documentation.
Note that reading from the ResultSet could change its state.
The dbType: DbType parameter specifies the type of the database (e.g., PostgreSQL or MySQL) supported by the library. Currently, the following classes are available: H2, MsSql, MariaDb, MySql, PostgreSql, and Sqlite.
readResultSet(resultSet: ResultSet, connection: Connection, limit: Int, inferNullability: Boolean): AnyFrame
Another variant takes a JDBC connection: Connection object instead of dbType: DbType and extracts the database type from that connection.
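The sketch below shows both readResultSet variants. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database, the Customer table, and the helpers createDemoDatabase and withResultSet are hypothetical. Each read gets its own fresh ResultSet because a forward-only cursor can be iterated only once.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.db.Sqlite
import org.jetbrains.kotlinx.dataframe.io.readResultSet
import java.io.File
import java.sql.Connection
import java.sql.DriverManager
import java.sql.ResultSet

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// Runs `query`, passes the open ResultSet and its Connection to `block`, then closes everything.
fun <T> withResultSet(url: String, query: String, block: (ResultSet, Connection) -> T): T =
    DriverManager.getConnection(url).use { connection ->
        connection.createStatement().use { statement ->
            statement.executeQuery(query).use { rs -> block(rs, connection) }
        }
    }

// dbType variant: the database type is stated explicitly.
fun readWithDbType(url: String): AnyFrame =
    withResultSet(url, "SELECT * FROM Customer") { rs, _ -> DataFrame.readResultSet(rs, Sqlite) }

// Connection variant: the type is extracted from the connection; at most 2 rows are read.
fun readWithConnection(url: String): AnyFrame =
    withResultSet(url, "SELECT * FROM Customer") { rs, connection ->
        DataFrame.readResultSet(rs, connection, limit = 2)
    }

fun main() {
    val url = createDemoDatabase()
    readWithDbType(url).print()
    readWithConnection(url).print()
}
```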
Extension functions for reading a ResultSet
The same operation is also available as an extension function on ResultSet.
ResultSet.readDataFrame(connection: Connection, limit: Int, inferNullability: Boolean): AnyFrame
Reads the data from a ResultSet and converts it into a DataFrame.
The connection parameter is the connection to the database that the ResultSet belongs to; it is required to extract the database type.
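The following minimal sketch shows ResultSet.readDataFrame. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database, the Customer table, and readNames are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.readDataFrame
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

fun readNames(url: String): AnyFrame =
    DriverManager.getConnection(url).use { connection ->
        connection.createStatement().use { statement ->
            statement.executeQuery("SELECT name FROM Customer").use { rs ->
                // The connection is passed so the library can extract the database type.
                rs.readDataFrame(connection)
            }
        }
    }

fun main() {
    readNames(createDemoDatabase()).print()
}
```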
Reading All Tables
These functions read all data from all tables in the connected database. Variants with a limit parameter restrict how many rows will be read from each table.
readAllSqlTables(dbConfig: DbConnectionConfig, limit: Int, inferNullability: Boolean): Map<String, AnyFrame>
Retrieves data from all non-system tables in the SQL database and returns them as a map of table names to AnyFrame objects.
The dbConfig: DbConnectionConfig parameter holds the configuration for a database connection that the library creates and manages under the hood. It typically requires a URL, a username, and a password.
readAllSqlTables(connection: Connection, limit: Int, inferNullability: Boolean): Map<String, AnyFrame>
Another variant takes a JDBC connection: Connection object instead of dbConfig: DbConnectionConfig.
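The sketch below shows both readAllSqlTables variants. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database with its Customer and Orders tables and the helper names are hypothetical. The exact format of the map keys may depend on the database, so the example only prints them.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readAllSqlTables
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with two small tables.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
            st.executeUpdate("CREATE TABLE Orders (id INTEGER NOT NULL, customerId INTEGER)")
            st.executeUpdate("INSERT INTO Orders VALUES (10, 1), (11, 3)")
        }
    }
    return url
}

// dbConfig variant: at most one row per table.
fun readAllViaConfig(url: String): Map<String, AnyFrame> =
    DataFrame.readAllSqlTables(DbConnectionConfig(url), limit = 1)

// Connection variant: every row of every non-system table.
fun readAllViaConnection(url: String): Map<String, AnyFrame> =
    DriverManager.getConnection(url).use { DataFrame.readAllSqlTables(it) }

fun main() {
    readAllViaConnection(createDemoDatabase()).forEach { (table, df) ->
        println("$table: ${df.rowsCount()} rows")
    }
}
```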
Schema reading for a specific SQL table
These functions retrieve the schema of a table. Given a table name and either a database configuration or a connection, they return the DataFrameSchema of the specified table.
getSchemaForSqlTable(dbConfig: DbConnectionConfig, tableName: String): DataFrameSchema
This function captures the schema of a specific table from an SQL database.
The dbConfig: DbConnectionConfig parameter holds the configuration for a database connection that the library creates and manages under the hood. It typically requires a URL, a username, and a password.
getSchemaForSqlTable(connection: Connection, tableName: String): DataFrameSchema
Another variant takes a JDBC connection: Connection object instead of dbConfig: DbConnectionConfig.
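The sketch below shows both getSchemaForSqlTable variants. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database, the Customer table, and the helper names are hypothetical. It prints only the column names from DataFrameSchema.columns.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.getSchemaForSqlTable
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// dbConfig variant.
fun schemaViaConfig(url: String): DataFrameSchema =
    DataFrame.getSchemaForSqlTable(DbConnectionConfig(url), "Customer")

// Connection variant.
fun schemaViaConnection(url: String): DataFrameSchema =
    DriverManager.getConnection(url).use { DataFrame.getSchemaForSqlTable(it, "Customer") }

fun main() {
    println(schemaViaConfig(createDemoDatabase()).columns.keys)
}
```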
Schema reading from an SQL query
These functions return the DataFrameSchema of an SQL query result, given either a database configuration or a connection and the SQL query.
getSchemaForSqlQuery(dbConfig: DbConnectionConfig, sqlQuery: String): DataFrameSchema
This function executes an SQL query on the database and then retrieves the resulting schema.
The dbConfig: DbConnectionConfig parameter holds the configuration for a database connection that the library creates and manages under the hood. It typically requires a URL, a username, and a password.
getSchemaForSqlQuery(connection: Connection, sqlQuery: String): DataFrameSchema
Another variant takes a JDBC connection: Connection object instead of dbConfig: DbConnectionConfig.
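The sketch below shows both getSchemaForSqlQuery variants. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database, the Customer table, and the helper names are hypothetical. Each returned schema should list only the columns selected by its query.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.getSchemaForSqlQuery
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// dbConfig variant: schema of a query selecting only `name`.
fun querySchemaViaConfig(url: String): DataFrameSchema =
    DataFrame.getSchemaForSqlQuery(DbConnectionConfig(url), "SELECT name FROM Customer WHERE id > 1")

// Connection variant: schema of a query selecting only `id`.
fun querySchemaViaConnection(url: String): DataFrameSchema =
    DriverManager.getConnection(url).use { DataFrame.getSchemaForSqlQuery(it, "SELECT id FROM Customer") }

fun main() {
    val url = createDemoDatabase()
    println(querySchemaViaConfig(url).columns.keys)
    println(querySchemaViaConnection(url).columns.keys)
}
```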
Extension functions for schema reading from an SQL query or an SQL table
The same operations are also available as extension functions.
Connection.getDataFrameSchema(sqlQueryOrTableName: String): DataFrameSchema
Retrieves the schema of an SQL query result or an SQL table using the provided connection.
DbConnectionConfig.getDataFrameSchema(sqlQueryOrTableName: String): DataFrameSchema
Retrieves the schema of an SQL query result or an SQL table using the provided database configuration.
The DbConnectionConfig receiver holds the configuration for a database connection that the library creates and manages under the hood. It typically requires a URL, a username, and a password.
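The sketch below shows both getDataFrameSchema extensions: one on a Connection with a table name, and one on a DbConnectionConfig with a query. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database, the Customer table, and the helper names are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.getDataFrameSchema
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// Connection receiver with a table name.
fun tableSchema(url: String): DataFrameSchema =
    DriverManager.getConnection(url).use { it.getDataFrameSchema("Customer") }

// DbConnectionConfig receiver with a query; the library manages the connection.
fun querySchema(url: String): DataFrameSchema =
    DbConnectionConfig(url).getDataFrameSchema("SELECT name FROM Customer")

fun main() {
    val url = createDemoDatabase()
    println(tableSchema(url).columns.keys)
    println(querySchema(url).columns.keys)
}
```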
Schema reading from ResultSet
These functions return the schema of a ResultSet provided by the user.
This helps you understand the structure of the result set, which is essential for data transformation and mapping.
getSchemaForResultSet(resultSet: ResultSet, dbType: DbType): DataFrameSchema
This function reads the schema from a ResultSet object provided by the user.
The dbType: DbType parameter specifies the type of the database (e.g., PostgreSQL or MySQL) supported by the library. Currently, the following classes are available: H2, MsSql, MariaDb, MySql, PostgreSql, and Sqlite.
getSchemaForResultSet(resultSet: ResultSet, connection: Connection): DataFrameSchema
Another variant takes a JDBC connection: Connection object instead of dbType: DbType and extracts the database type from that connection.
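The sketch below shows both getSchemaForResultSet variants. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database, the Customer table, and the helpers withResultSet, schemaWithDbType, and schemaWithConnection are hypothetical. It also assumes the connection-based overload takes (resultSet, connection) as listed above.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.db.Sqlite
import org.jetbrains.kotlinx.dataframe.io.getSchemaForResultSet
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import java.io.File
import java.sql.Connection
import java.sql.DriverManager
import java.sql.ResultSet

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// Runs `query`, passes the open ResultSet and its Connection to `block`, then closes everything.
fun <T> withResultSet(url: String, query: String, block: (ResultSet, Connection) -> T): T =
    DriverManager.getConnection(url).use { connection ->
        connection.createStatement().use { statement ->
            statement.executeQuery(query).use { rs -> block(rs, connection) }
        }
    }

// dbType variant.
fun schemaWithDbType(url: String): DataFrameSchema =
    withResultSet(url, "SELECT id, name FROM Customer") { rs, _ -> DataFrame.getSchemaForResultSet(rs, Sqlite) }

// Connection variant.
fun schemaWithConnection(url: String): DataFrameSchema =
    withResultSet(url, "SELECT name FROM Customer") { rs, connection ->
        DataFrame.getSchemaForResultSet(rs, connection)
    }

fun main() {
    val url = createDemoDatabase()
    println(schemaWithDbType(url).columns.keys)
    println(schemaWithConnection(url).columns.keys)
}
```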
Extension functions for schema reading from the ResultSet
The same operations are also available as extension functions on ResultSet, taking either a connection or a database type:
ResultSet.getDataFrameSchema(connection: Connection): DataFrameSchema
ResultSet.getDataFrameSchema(dbType: DbType): DataFrameSchema
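The sketch below calls both ResultSet.getDataFrameSchema extensions, each on its own fresh ResultSet. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database, the Customer table, and the helper names are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.io.db.Sqlite
import org.jetbrains.kotlinx.dataframe.io.getDataFrameSchema
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import java.io.File
import java.sql.Connection
import java.sql.DriverManager
import java.sql.ResultSet

// Hypothetical demo database: a temporary SQLite file with a three-row Customer table.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
        }
    }
    return url
}

// Runs `query`, passes the open ResultSet and its Connection to `block`, then closes everything.
fun <T> withResultSet(url: String, query: String, block: (ResultSet, Connection) -> T): T =
    DriverManager.getConnection(url).use { connection ->
        connection.createStatement().use { statement ->
            statement.executeQuery(query).use { rs -> block(rs, connection) }
        }
    }

// Extension taking the connection (used to extract the database type).
fun schemaFromConnection(url: String): DataFrameSchema =
    withResultSet(url, "SELECT id, name FROM Customer") { rs, connection -> rs.getDataFrameSchema(connection) }

// Extension taking an explicit database type.
fun schemaFromDbType(url: String): DataFrameSchema =
    withResultSet(url, "SELECT id FROM Customer") { rs, _ -> rs.getDataFrameSchema(Sqlite) }

fun main() {
    val url = createDemoDatabase()
    println(schemaFromConnection(url).columns.keys)
    println(schemaFromDbType(url).columns.keys)
}
```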
Schema reading for all non-system tables
These functions return a map of table names to DataFrameSchema objects for all non-system tables in the SQL database. They can be called with either a database configuration or a connection.
getSchemaForAllSqlTables(dbConfig: DbConnectionConfig): Map<String, DataFrameSchema>
This function retrieves the schemas of all non-system tables from an SQL database and returns them as a map of table names to DataFrameSchema objects.
The dbConfig: DbConnectionConfig parameter holds the configuration for a database connection that the library creates and manages under the hood. It typically requires a URL, a username, and a password.
getSchemaForAllSqlTables(connection: Connection): Map<String, DataFrameSchema>
This function retrieves the schemas of all non-system tables using a JDBC connection: Connection object and returns them as a map of table names to DataFrameSchema objects.
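The sketch below shows both getSchemaForAllSqlTables variants. It assumes the dataframe-jdbc module and the SQLite driver (org.xerial:sqlite-jdbc) are on the classpath. The temporary database with its Customer and Orders tables and the helper names are hypothetical. The exact format of the map keys may depend on the database, so the example only prints them.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.getSchemaForAllSqlTables
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import java.io.File
import java.sql.DriverManager

// Hypothetical demo database: a temporary SQLite file with two small tables.
fun createDemoDatabase(): String {
    val url = "jdbc:sqlite:" + File.createTempFile("demo", ".db").apply { deleteOnExit() }.absolutePath
    DriverManager.getConnection(url).use { conn ->
        conn.createStatement().use { st ->
            st.executeUpdate("CREATE TABLE Customer (id INTEGER NOT NULL, name TEXT)")
            st.executeUpdate("INSERT INTO Customer VALUES (1, 'Alice'), (2, NULL), (3, 'Carol')")
            st.executeUpdate("CREATE TABLE Orders (id INTEGER NOT NULL, customerId INTEGER)")
            st.executeUpdate("INSERT INTO Orders VALUES (10, 1), (11, 3)")
        }
    }
    return url
}

// dbConfig variant.
fun allSchemasViaConfig(url: String): Map<String, DataFrameSchema> =
    DataFrame.getSchemaForAllSqlTables(DbConnectionConfig(url))

// Connection variant.
fun allSchemasViaConnection(url: String): Map<String, DataFrameSchema> =
    DriverManager.getConnection(url).use { DataFrame.getSchemaForAllSqlTables(it) }

fun main() {
    allSchemasViaConnection(createDemoDatabase()).forEach { (table, schema) ->
        println("$table: ${schema.columns.keys}")
    }
}
```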