You can write your DataFrame in XLS, XLSX format to a file, OutputStream or Workbook object.
df.writeExcel(file)
Values of ColumnGroup, FrameColumn, i.e. AnyRow, AnyFrame will be serialized as JSON objects.
If you work directly with Apache POI, you can use created Workbook and Sheets in your code:
/**
* Do something with generated sheets. Here we set bold style for headers and italic style for first data column
*/
fun setStyles(sheet: Sheet) {
val headerFont = sheet.workbook.createFont()
headerFont.bold = true
val headerStyle = sheet.workbook.createCellStyle()
headerStyle.setFont(headerFont)
val indexFont = sheet.workbook.createFont()
indexFont.italic = true
val indexStyle = sheet.workbook.createCellStyle()
indexStyle.setFont(indexFont)
sheet.forEachIndexed { index, row ->
if (index == 0) {
for (cell in row) {
cell.cellStyle = headerStyle
}
} else {
row.first().cellStyle = indexStyle
}
}
}
// Create a workbook (or use existing)
val wb = WorkbookFactory.create(true)
// Create different sheets from different data frames in the workbook
val allPersonsSheet = df.writeExcel(wb, sheetName = "allPersons")
val happyPersonsSheet =
df.filter { person -> person.isHappy }.remove("isHappy").writeExcel(wb, sheetName = "happyPersons")
val unhappyPersonsSheet =
df.filter { person -> !person.isHappy }.remove("isHappy").writeExcel(wb, sheetName = "unhappyPersons")
// Do anything you want by POI
listOf(happyPersonsSheet, unhappyPersonsSheet).forEach { setStyles(it) }
// Save the result
file.outputStream().use { wb.write(it) }
wb.close()
Add new sheets without using Apache POI directly by using a parameter to keep using the same file if it already exists
// Create a new Excel workbook with a single sheet called "allPersons", replacing the file if it already exists -> Current sheets: allPersons
df.writeExcel(file, sheetName = "allPersons")
// Add a new sheet to the previous file without replacing it, by setting keepFile = true -> Current sheets: allPersons, happyPersons
df.filter { person -> person.isHappy }.remove("isHappy")
.writeExcel(file, sheetName = "happyPersons", keepFile = true)
// Add a new sheet to the previous file without replacing it, by setting keepFile = true -> Current sheets: allPersons, happyPersons, unhappyPersons
df.filter { person -> !person.isHappy }.remove("isHappy")
.writeExcel(file, sheetName = "unhappyPersons", keepFile = true)
Data may be saved "as is" (like exporting to new Excel file) or converted to match some target Schema if you have it (like inserting into existing SQL table).
The first approach is quite easy:
df.writeArrowIPC(file)
// or
df.writeArrowFeather(file)
(writing to file, opened stream or channel),
val ipcByteArray: ByteArray = df.saveArrowIPCToByteArray()
// or
val featherByteArray: ByteArray = df.saveArrowFeatherToByteArray()
(creating byte array). Nested frames and columns with mixed or unsupported types will be saved as String.
The second approach is a bit more tricky. You have to specify schema itself and casting behavior mode as ArrowWriter parameters. Behavior Mode has four independent switchers: restrictWidening, restrictNarrowing, strictType, strictNullable. You can use Mode.STRICT (this is default), Mode.LOYAL or any combination you want. The ArrowWriter object should be closed after using because Arrow uses random access buffers not managed by Java GC. Finally, you can specify a callback to be invoked if some data is lost or can not be saved according to your schema.
Here is full example:
// Get schema from anywhere you want. It can be deserialized from JSON, generated from another dataset
// (including the DataFrame.columns().toArrowSchema() method), created manually, and so on.
val schema = Schema.fromJSON(schemaJson)
df.arrowWriter(
// Specify your schema
targetSchema = schema,
// Specify desired behavior mode
mode = ArrowWriter.Mode(
restrictWidening = true,
restrictNarrowing = true,
strictType = true,
strictNullable = false,
),
// Specify mismatch subscriber
mismatchSubscriber = writeMismatchMessage,
).use { writer: ArrowWriter ->
// Save to any format and sink, like in the previous example
writer.writeArrowFeather(file)
}