parse

Parsing Order

parse tries to convert every String column to one of the supported types, trying them in the following order:

Int
Long
Instant (kotlinx.datetime and java.time)
LocalDateTime (kotlinx.datetime and java.time)
LocalDate (kotlinx.datetime and java.time)
Duration (kotlin.time and java.time)
LocalTime (java.time)
URL (java.net)
Double (with optional locale settings)
Boolean
Uuid (kotlin.uuid.Uuid) (requires parseExperimentalUuid = true)
BigDecimal
JSON (arrays and objects) (requires the org.jetbrains.kotlinx:dataframe-json dependency)
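
The ordering above matters because the first type that accepts every value in a column wins. The following is a minimal sketch of that first-match idea in plain Kotlin, covering only four of the listed types. The names candidateParsers and firstMatchingType are hypothetical and are not part of the DataFrame API; the library's real implementation is not shown here and may differ.

```kotlin
// Illustrative only: a first-match type chain in plain Kotlin.
// `candidateParsers` and `firstMatchingType` are hypothetical names, not DataFrame API.
val candidateParsers: List<Pair<String, (String) -> Any?>> = listOf(
    "Int" to { s: String -> s.toIntOrNull() },
    "Long" to { s: String -> s.toLongOrNull() },
    "Double" to { s: String -> s.toDoubleOrNull() },
    "Boolean" to { s: String -> s.toBooleanStrictOrNull() },
)

// Returns the name of the first type whose parser accepts every value,
// or "String" if no candidate accepts the whole column.
fun firstMatchingType(values: List<String>): String =
    candidateParsers
        .firstOrNull { (_, parse) -> values.all { parse(it) != null } }
        ?.first
        ?: "String"
```

In this sketch, a column holding "1" and "3000000000" is not an Int column because the second value overflows Int, so the chain moves on to Long.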

Parser Options

DataFrame supports multiple parser options that can be used to customize the parsing behavior. These can be supplied to the parse function (or any other function that can implicitly parse Strings) as an argument.

For each option you don't supply (or for which you supply null), DataFrame will take the value from the Global Parser Options.

df.parse(options = ParserOptions(locale = Locale.CHINA, dateTimeFormatter = DateTimeFormatter.ISO_WEEK_DATE))

Global Parser Options

As mentioned before, you can change the default global parser options that will be used by read, convert, and other parse operations. Whenever you don't explicitly provide parser options to a function call, DataFrame will use these global options instead.

DataFrame.parser.locale = Locale.FRANCE
DataFrame.parser.addDateTimePattern("dd.MM.uuuu HH:mm:ss")

For locale, this means that the parser uses the first of the following that applies:

↪ The locale given as function argument directly, or in parserOptions, if it is not null, else

↪ The locale set by DataFrame.parser.locale = ..., if it is not null, else

↪ Locale.getDefault(), which is the system's default locale that can be changed with Locale.setDefault().
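
The fallback chain above can be sketched with Kotlin's elvis operator. This is only an illustration of the resolution order. globalParserLocale is a hypothetical stand-in for the global setting and resolveLocale is a hypothetical helper; neither is part of the DataFrame API.

```kotlin
import java.util.Locale

// Hypothetical stand-in for the global parser locale described above.
var globalParserLocale: Locale? = null

// Resolves the effective locale: the explicit argument first,
// then the global setting, then the system default.
fun resolveLocale(explicit: Locale?): Locale =
    explicit ?: globalParserLocale ?: Locale.getDefault()
```

With globalParserLocale set to Locale.FRANCE, resolveLocale(null) yields Locale.FRANCE, while resolveLocale(Locale.CHINA) still yields Locale.CHINA because the explicit argument takes priority.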

Parsing Doubles

DataFrame has a new fast and powerful double parser enabled by default. It is based on the FastDoubleParser library for its high performance and configurability (in the future, we might expand this support to Float, BigDecimal, and BigInteger as well).

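For example, with a US locale you can safely parse "123'456 789,012.345×10^6" but not "1.234,5", because the latter uses the period as a grouping separator and the comma as a decimal separator, which conflicts with US conventions.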

Aside from this, DataFrame explicitly recognizes "∞", "inf", "infinity", and "infty" as Double.POSITIVE_INFINITY (and their negative counterparts as Double.NEGATIVE_INFINITY), and "nan", "na", and "n/a" as Double.NaN. All forms of whitespace are treated equally.
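
To make the special-value mapping concrete, here is a small sketch in plain Kotlin. parseSpecialDouble is a hypothetical helper, not DataFrame API. Treating the tokens case-insensitively and accepting a leading "+" are assumptions of this sketch, not documented behavior.

```kotlin
// Hypothetical helper: maps the special tokens listed above to Double values.
// Assumption: matching is case-insensitive and a leading "+" or "-" sign is allowed.
fun parseSpecialDouble(text: String): Double? {
    val token = text.trim().lowercase()
    val negative = token.startsWith("-")
    val body = token.removePrefix("-").removePrefix("+")
    return when (body) {
        "∞", "inf", "infinity", "infty" ->
            if (negative) Double.NEGATIVE_INFINITY else Double.POSITIVE_INFINITY
        "nan", "na", "n/a" -> Double.NaN
        else -> null // not a special token; a regular number parser would take over
    }
}
```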

If FastDoubleParser fails to parse a String as Double, DataFrame will try to parse it using the standard NumberFormat.parse() function as a last resort.
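
The "last resort" pattern can be sketched as follows. This is not DataFrame's implementation: Kotlin's built-in toDoubleOrNull stands in for the fast parser, and parseDoubleWithFallback is a hypothetical helper. Only NumberFormat and ParsePosition from the Java standard library are used.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Hypothetical helper, not DataFrame API: a strict first attempt
// followed by a locale-aware NumberFormat fallback.
fun parseDoubleWithFallback(text: String, locale: Locale): Double? {
    val trimmed = text.trim()
    // First attempt: Kotlin's built-in parser stands in for the fast parser here.
    trimmed.toDoubleOrNull()?.let { return it }
    // Last resort: NumberFormat, accepted only if it consumes the whole string.
    val position = ParsePosition(0)
    val number = NumberFormat.getInstance(locale).parse(trimmed, position)
    return if (number != null && position.index == trimmed.length) number.toDouble() else null
}
```

Checking position.index against the string length matters because NumberFormat.parse would otherwise accept a numeric prefix such as "12" in "12abc" and silently ignore the rest.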

If you experience any issues with the new parser, you can turn it off by setting useFastDoubleParser = false, which makes DataFrame use the old NumberFormat.parse() function instead.
