parse
Returns a DataFrame in which the given String columns are parsed into other types.
This is a special case of the convert operation.
This parsing operation is sometimes executed implicitly, for example, when reading from CSV or when converting the type of String columns. You can recognize this by the locale or parserOptions arguments in these functions.
df.parse()
To parse only particular columns use a column selector:
df.parse { age and weight }
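As a small end-to-end illustration of the two calls above, the following sketch builds a DataFrame whose columns are all String and parses only two of them. The sample data and the helper name parseSample are hypothetical, and the String-based selector "age" and "weight" is used in place of generated column accessors; it assumes the org.jetbrains.kotlinx:dataframe dependency is on the classpath.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: every column starts out as String.
fun parseSample(): DataFrame<*> {
    val df = dataFrameOf("name", "age", "weight")(
        "Alice", "25", "70.5",
        "Bob", "31", "82.0",
    )
    // Parse only the selected columns; "name" is left untouched.
    return df.parse { "age" and "weight" }
}

fun main() {
    val parsed = parseSample()
    // Print the resulting column types to see what the parser inferred.
    println(parsed.schema())
}
```

The type inferred for weight can depend on the locale in effect, so it is worth checking the printed schema rather than assuming a result.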
parse tries to parse every String column into one of the supported types, in the following order:
Int
Long
Instant (kotlinx.datetime and java.time)
LocalDateTime (kotlinx.datetime and java.time)
LocalDate (kotlinx.datetime and java.time)
Duration (kotlin.time and java.time)
LocalTime (java.time)
URL (java.net)
Double (with optional locale settings)
Boolean
BigDecimal
JSON (arrays and objects) (requires the org.jetbrains.kotlinx:dataframe-json dependency)
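The effect of trying types in a fixed order can be illustrated with a conceptual sketch that uses only the Kotlin standard library. This is not the DataFrame implementation: the names candidateParsers and inferColumn are hypothetical, only four of the types listed above are covered, and it assumes that a column takes the first type that fits all of its values.

```kotlin
// Conceptual sketch only, NOT the DataFrame implementation.
// Each candidate parser returns null when the text does not fit its type.
val candidateParsers: List<Pair<String, (String) -> Any?>> = listOf(
    "Int" to { s: String -> s.toIntOrNull() },
    "Long" to { s: String -> s.toLongOrNull() },
    "Double" to { s: String -> s.toDoubleOrNull() },
    "Boolean" to { s: String -> s.toBooleanStrictOrNull() },
)

// Returns the name of the first type that fits every value, plus the parsed
// values. Falls back to the original strings when no candidate fits.
fun inferColumn(values: List<String>): Pair<String, List<Any?>> {
    for ((name, parser) in candidateParsers) {
        val parsed = values.map(parser)
        if (parsed.none { it == null }) return name to parsed
    }
    return "String" to values
}

fun main() {
    println(inferColumn(listOf("1", "2", "3")).first)      // Int
    println(inferColumn(listOf("1", "3000000000")).first)  // Long
    println(inferColumn(listOf("1", "2.5")).first)         // Double
    println(inferColumn(listOf("1", "abc")).first)         // String
}
```

Because narrower types come first, a column of small whole numbers becomes Int rather than Long or Double, and a single value that does not fit pushes the whole column to a later type.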
DataFrame supports multiple parser options that can be used to customize the parsing behavior. These can be supplied as an argument to the parse function (or to any other function that can implicitly parse Strings).
For each option you don't supply (or supply as null), DataFrame will take the value from the Global Parser Options.
Available parser options:
locale: Locale is used to parse doubles. The global default locale is Locale.getDefault().
dateTimePattern: String is used to parse date and time. The global default supports ISO (local) date-time.
dateTimeFormatter: DateTimeFormatter is used to parse date and time. If null, it is derived from dateTimePattern and/or locale.
nullStrings: List<String> is used to treat particular strings as a null value. The global default null strings are "null" and "NULL". When reading from CSV, we include even more defaults, like "" and "NA". See the KDocs there for the exact details.
skipTypes: Set<KType> lists types that should be skipped during parsing. It is an empty set by global default, so parsing can result in any supported type.
useFastDoubleParser: Boolean is used to enable or disable the new fast double parser. It is enabled by global default.
df.parse(options = ParserOptions(locale = Locale.CHINA, dateTimeFormatter = DateTimeFormatter.ISO_WEEK_DATE))
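To show why the locale option matters for doubles, here is a minimal sketch that uses java.text.NumberFormat from the JDK. It is not how DataFrame parses doubles internally (the library has its own fast double parser), and the function name parseLocalizedDouble is hypothetical; the point is only that the same text can be valid in one locale and not in another.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Illustration only, NOT the DataFrame parser.
// Returns the parsed value, or null unless the whole string is a number
// in the given locale.
fun parseLocalizedDouble(text: String, locale: Locale): Double? {
    val position = ParsePosition(0)
    val number: Number? = NumberFormat.getInstance(locale).parse(text, position)
    // Reject partial matches such as a number followed by trailing text.
    return if (number != null && position.index == text.length) number.toDouble() else null
}

fun main() {
    println(parseLocalizedDouble("1,5", Locale.FRANCE)) // 1.5
    println(parseLocalizedDouble("1.5", Locale.US))     // 1.5
    println(parseLocalizedDouble("abc", Locale.US))     // null
}
```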
As mentioned before, you can change the default global parser options that will be used by read, convert, and other parse operations. Whenever you don't explicitly provide parser options to a function call, DataFrame will use these global options instead.
For example, to change the locale to French and add a custom date-time pattern for all following DataFrame calls, do:
DataFrame.parser.locale = Locale.FRANCE
DataFrame.parser.addDateTimePattern("dd.MM.uuuu HH:mm:ss")
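To see what kind of text the pattern "dd.MM.uuuu HH:mm:ss" accepts, the sketch below applies the same pattern directly with java.time.format.DateTimeFormatter. It assumes the pattern follows the usual java.time pattern syntax and does not call DataFrame at all; the helper name parseWithPattern and the sample value are hypothetical.

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Illustration only: parses text with a java.time pattern.
// Throws DateTimeParseException when the text does not match the pattern.
fun parseWithPattern(text: String, pattern: String): LocalDateTime =
    LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern))

fun main() {
    val parsed = parseWithPattern("24.12.2023 18:30:05", "dd.MM.uuuu HH:mm:ss")
    println(parsed) // 2023-12-24T18:30:05
}
```

A value such as "2023-12-24 18:30:05" would not match this pattern, which is why a custom pattern has to be registered before such columns can be recognized as date-time.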