readCodePointValue

Decodes a single code point value from UTF-8 code units, reading between 1 and 4 bytes as necessary.

If this source is exhausted before a complete code point can be read, this throws an EOFException and consumes no input.
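The exhaustion case above can be sketched as follows. This is a minimal illustration, assuming an in-memory `Buffer` as the source and `kotlin.test` on the classpath (as in the samples further down); it relies only on the behavior described in this section.

```kotlin
import kotlinx.io.*
import kotlin.test.*

fun main() {
    val buffer = Buffer()
    // 0xce is the lead byte of a 2-byte sequence; the continuation byte is missing
    buffer.writeByte(0xce.toByte())

    // the code point is incomplete, so the read fails
    assertFailsWith<EOFException> { buffer.readCodePointValue() }
    // the incomplete byte was not consumed
    assertEquals(1L, buffer.size)

    // once the continuation byte arrives, decoding succeeds (U+0394)
    buffer.writeByte(0x94.toByte())
    assertEquals(0x394, buffer.readCodePointValue())
}
```

Because the failed read leaves the buffer untouched, a caller can retry after more data becomes available.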

If this source starts with an ill-formed UTF-8 code unit sequence, this method removes one or more non-UTF-8 bytes and returns the replacement character (U+fffd).

The replacement character (U+fffd) is also returned if the source starts with a well-formed code unit sequence whose decoded value fails further validation: the value is out of range (beyond the 0x10ffff limit of Unicode), maps to a UTF-16 surrogate (U+d800..U+dfff), or is an overlong encoding (such as 0xc080 for the NUL character in modified UTF-8).
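The replacement-character cases can be illustrated with a short sketch. It assumes an in-memory `Buffer` and `kotlin.test`, and the byte values are chosen purely for illustration: a stray 0xff byte, the overlong 0xc080 encoding mentioned above, and the 3-byte encoding of the surrogate U+d800.

```kotlin
import kotlinx.io.*
import kotlin.test.*

fun main() {
    val buffer = Buffer()

    // 0xff can never start a UTF-8 sequence; it is dropped and replaced
    buffer.writeByte(0xff.toByte())
    buffer.writeByte('a'.code.toByte())
    assertEquals(0xfffd, buffer.readCodePointValue())
    // decoding continues normally with the next byte
    assertEquals('a'.code, buffer.readCodePointValue())

    // overlong 2-byte encoding of NUL (0xc0 0x80)
    buffer.writeUShort(0xc080U)
    assertEquals(0xfffd, buffer.readCodePointValue())

    // 0xed 0xa0 0x80 decodes to U+d800, a UTF-16 surrogate
    buffer.writeByte(0xed.toByte())
    buffer.writeByte(0xa0.toByte())
    buffer.writeByte(0x80.toByte())
    assertEquals(0xfffd, buffer.readCodePointValue())

    // every rejected sequence was consumed
    assertEquals(0L, buffer.size)
}
```

Since U+fffd is also a legitimate code point, a caller cannot tell from the return value alone whether the input was malformed or actually contained the replacement character.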

Note that, in general, the returned value cannot be directly converted to a Char, as it may be outside Char's value range; such a value must be manually converted to a surrogate pair.
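One possible way to turn a decoded value into text is sketched below using only the Kotlin standard library. The helper name `codePointToString` is hypothetical and not part of kotlinx-io; it assumes the input is a valid code point that is not itself a surrogate.

```kotlin
// Hypothetical helper (not part of kotlinx-io): converts a code point to a String.
fun codePointToString(codePoint: Int): String {
    require(codePoint in 0..0x10FFFF) { "Not a Unicode code point: $codePoint" }
    // Values in the Basic Multilingual Plane fit in a single Char
    if (codePoint < 0x10000) return codePoint.toChar().toString()
    // Supplementary values are split into a high and a low surrogate
    val offset = codePoint - 0x10000
    val high = (0xD800 + (offset ushr 10)).toChar()
    val low = (0xDC00 + (offset and 0x3FF)).toChar()
    return charArrayOf(high, low).concatToString()
}

fun main() {
    check(codePointToString(0x394) == "\u0394")          // one Char
    check(codePointToString(0x1F31A) == "\uD83C\uDF1A")  // surrogate pair
    check(codePointToString(0x1F31A).length == 2)
}
```

On the JVM, `Character.toChars` or `StringBuilder.appendCodePoint` perform the same conversion; the manual arithmetic is shown because it also works in common (multiplatform) code.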

Throws

when the source is exhausted before a complete code point can be read.

when the source is closed.

when some I/O error occurs.

Samples

import kotlinx.io.*
import kotlin.test.*

fun main() {
    //sampleStart
    val buffer = Buffer()

    buffer.writeUShort(0xce94U)
    assertEquals(0x394, buffer.readCodePointValue()) // decodes a single UTF-8 encoded code point
    //sampleEnd
}

import kotlinx.io.*
import kotlin.test.*

fun main() {
    //sampleStart
    val buffer = Buffer()

    // that's a U+1F31A, a.k.a. "new moon with face"
    buffer.writeString("🌚")
    // it should be encoded with 4 code units
    assertEquals(4, buffer.size)

    // let's read it back as a single code point
    val moonCodePoint = buffer.readCodePointValue()
    // all code units were consumed
    assertEquals(0, buffer.size)

    // the moon is too wide to fit in a single UTF-16 character!
    assertNotEquals(moonCodePoint, moonCodePoint.toChar().code)
    // "too wide" means in the [U+010000, U+10FFFF] range
    assertTrue(moonCodePoint in 0x10000..0x10FFFF)

    // See https://en.wikipedia.org/wiki/UTF-16#Code_points_from_U+010000_to_U+10FFFF for details
    val highSurrogate = (0xD800 + (moonCodePoint - 0x10000).ushr(10)).toChar()
    val lowSurrogate = (0xDC00 + (moonCodePoint - 0x10000).and(0x3FF)).toChar()

    assertContentEquals(charArrayOf(highSurrogate, lowSurrogate), "🌚".toCharArray())
    //sampleEnd
}