Characters

Characters represent individual Unicode codepoints. Character literals are written as #\a, #\space, #\newline, or #\x1F600 (hex codepoint). These procedures are available from (scheme base) and (scheme char).


Conversion

char->integer

Syntax: (char->integer char)

Returns the Unicode codepoint of char as an exact integer. The result is always a non-negative fixnum in the range 0 to #x10FFFF.

kaappi> (char->integer #\a)
;=> 97
kaappi> (char->integer #\space)
;=> 32
kaappi> (char->integer #\x03BB)
;=> 955

See also: integer->char


integer->char

Syntax: (integer->char n)

Returns the character whose Unicode codepoint is the exact integer n. It is an error if n is not an exact integer in the range 0 to #x10FFFF.

kaappi> (integer->char 97)
;=> #\a
kaappi> (integer->char 955)
;=> #\x03BB
kaappi> (integer->char 32)
;=> #\space
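
Because the two conversions are inverses, codepoint arithmetic can be done by converting to an integer, adjusting, and converting back. A minimal sketch, where char-offset and round-trips? are hypothetical helper names and no range checking is attempted:

```scheme
(import (scheme base) (scheme write))

;; Hypothetical helper: the character n codepoints after c.
;; No range check is made, so a result outside 0..#x10FFFF is an error.
(define (char-offset c n)
  (integer->char (+ (char->integer c) n)))

;; Hypothetical helper: does c survive a trip through its codepoint?
(define (round-trips? c)
  (char=? c (integer->char (char->integer c))))

(display (char-offset #\a 1))   ; prints b
(newline)
(display (round-trips? #\x03BB))
(newline)
```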

See also: char->integer


Comparison

All character comparison procedures accept two or more arguments and return #t only if the relation holds between every pair of consecutive arguments. Characters are compared by Unicode codepoint value.
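
The pairwise behaviour can be pictured with a small user-level definition. This is only a sketch: my-char<? is a hypothetical name, and nothing here is claimed about how Kaappi implements the built-in procedures.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical helper: an n-ary strict ordering built from char->integer
;; and a walk over consecutive pairs of arguments.
(define (my-char<? c1 c2 . rest)
  (let loop ((a c1) (b c2) (more rest))
    (cond ((not (< (char->integer a) (char->integer b))) #f)
          ((null? more) #t)
          (else (loop b (car more) (cdr more))))))

(display (my-char<? #\a #\b #\c))
(newline)
;; The first and last arguments are in order here, but the middle pair is not.
(display (my-char<? #\a #\c #\b))
(newline)
```

The second call returns #f: one out-of-order consecutive pair is enough to fail the whole comparison.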

char<?

Syntax: (char<? char1 char2 ...)

Returns #t if the codepoints of the characters are monotonically increasing (each strictly less than the next).

kaappi> (char<? #\a #\b #\c)
;=> #t
kaappi> (char<? #\a #\a)
;=> #f

See also: char<=?, char-ci<?


char<=?

Syntax: (char<=? char1 char2 ...)

Returns #t if the codepoints of the characters are monotonically non-decreasing.

kaappi> (char<=? #\a #\a #\b)
;=> #t
kaappi> (char<=? #\b #\a)
;=> #f

See also: char<?, char-ci<=?


char=?

Syntax: (char=? char1 char2 ...)

Returns #t if all characters have the same codepoint.

kaappi> (char=? #\a #\a #\a)
;=> #t
kaappi> (char=? #\a #\A)
;=> #f

See also: char-ci=?


char>=?

Syntax: (char>=? char1 char2 ...)

Returns #t if the codepoints of the characters are monotonically non-increasing.

kaappi> (char>=? #\c #\b #\a)
;=> #t
kaappi> (char>=? #\a #\b)
;=> #f

See also: char>?, char-ci>=?


char>?

Syntax: (char>? char1 char2 ...)

Returns #t if the codepoints of the characters are monotonically decreasing (each strictly greater than the next).

kaappi> (char>? #\c #\b #\a)
;=> #t
kaappi> (char>? #\a #\a)
;=> #f

See also: char>=?, char-ci>?


Classification

Classification predicates test Unicode properties, not just ASCII. For example, char-alphabetic? recognizes letters from Latin, Greek, Cyrillic, CJK, Hangul, Devanagari, Thai, Arabic, Hebrew, and many other scripts.

char-alphabetic?

Syntax: (char-alphabetic? char)

Returns #t if char is a Unicode alphabetic character. This covers letters across all supported scripts -- Latin, Greek, Cyrillic, Armenian, Georgian, Cherokee, Hangul, Hiragana, Katakana, CJK ideographs, Devanagari, Thai, Arabic, Hebrew, and more.

kaappi> (char-alphabetic? #\a)
;=> #t
kaappi> (char-alphabetic? #\1)
;=> #f
kaappi> (char-alphabetic? #\x03BB)   ; Greek lambda
;=> #t

See also: char-numeric?, char-upper-case?, char-lower-case?


char-numeric?

Syntax: (char-numeric? char)

Returns #t if char is a Unicode numeric digit. This includes ASCII digits 0-9 as well as digits from other scripts (Arabic-Indic, Devanagari, Thai, Tibetan, fullwidth digits, and many more -- 36 digit ranges are recognized).

kaappi> (char-numeric? #\5)
;=> #t
kaappi> (char-numeric? #\a)
;=> #f
kaappi> (char-numeric? #\x0966)   ; Devanagari digit zero
;=> #t

See also: digit-value, char-alphabetic?


char-whitespace?

Syntax: (char-whitespace? char)

Returns #t if char is a Unicode whitespace character. This includes ASCII whitespace (tab, newline, vertical tab, form feed, carriage return, space) and Unicode whitespace such as no-break space (U+00A0), ogham space mark (U+1680), en/em spaces (U+2000--U+200A), line separator (U+2028), paragraph separator (U+2029), narrow no-break space (U+202F), medium mathematical space (U+205F), and ideographic space (U+3000).

kaappi> (char-whitespace? #\space)
;=> #t
kaappi> (char-whitespace? #\newline)
;=> #t
kaappi> (char-whitespace? #\a)
;=> #f

See also: char-alphabetic?


char-upper-case?

Syntax: (char-upper-case? char)

Returns #t if char is a Unicode uppercase letter. A character is considered uppercase if it has a lowercase mapping in the Unicode case tables or belongs to an additional set of uppercase letters (mathematical symbols, etc.).

kaappi> (char-upper-case? #\A)
;=> #t
kaappi> (char-upper-case? #\a)
;=> #f
kaappi> (char-upper-case? #\x0391)   ; Greek capital Alpha
;=> #t

See also: char-lower-case?, char-upcase


char-lower-case?

Syntax: (char-lower-case? char)

Returns #t if char is a Unicode lowercase letter. A character is considered lowercase if it has an uppercase mapping in the Unicode case tables or belongs to an additional set of lowercase letters (phonetic extensions, etc.).

kaappi> (char-lower-case? #\a)
;=> #t
kaappi> (char-lower-case? #\A)
;=> #f
kaappi> (char-lower-case? #\x03B1)   ; Greek small alpha
;=> #t
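
The classification predicates combine naturally into a coarse character classifier, for example as the first step of a tokenizer. In this sketch char-class and its result symbols are hypothetical, and the order of the tests is a design choice rather than anything required by Kaappi:

```scheme
(import (scheme base) (scheme char) (scheme write))

;; Hypothetical helper: map a character to a coarse class symbol.
(define (char-class c)
  (cond ((char-whitespace? c) 'space)
        ((char-numeric? c) 'digit)
        ((char-upper-case? c) 'upper)
        ((char-lower-case? c) 'lower)
        ((char-alphabetic? c) 'letter)   ; alphabetic but neither upper nor lower
        (else 'other)))

(display (map char-class (string->list "aB 7!")))
(newline)
;; prints (lower upper space digit other)
```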

See also: char-upper-case?, char-downcase


Case Conversion

char-upcase

Syntax: (char-upcase char)

Returns the uppercase equivalent of char using Unicode case mappings. If char has no uppercase mapping, it is returned unchanged. For ASCII characters, this uses the standard ASCII mapping; for non-ASCII characters, the Unicode uppercase mapping table is consulted.

kaappi> (char-upcase #\a)
;=> #\A
kaappi> (char-upcase #\A)
;=> #\A
kaappi> (char-upcase #\1)
;=> #\1
kaappi> (char-upcase #\x03B1)   ; Greek small alpha
;=> #\x0391                     ; Greek capital Alpha

See also: char-downcase, char-foldcase, string-upcase


char-downcase

Syntax: (char-downcase char)

Returns the lowercase equivalent of char using Unicode case mappings. If char has no lowercase mapping, it is returned unchanged.

kaappi> (char-downcase #\A)
;=> #\a
kaappi> (char-downcase #\a)
;=> #\a
kaappi> (char-downcase #\x0391)   ; Greek capital Alpha
;=> #\x03B1                       ; Greek small alpha

See also: char-upcase, char-foldcase, string-downcase


char-foldcase

Syntax: (char-foldcase char)

Returns the case-folded form of char. Case folding is used internally by the char-ci comparison procedures to normalize characters before comparison. For most characters this is equivalent to char-downcase, but certain characters have special fold mappings. For example, the long s (U+017F) folds to the ordinary lowercase s.

kaappi> (char-foldcase #\A)
;=> #\a
kaappi> (char-foldcase #\a)
;=> #\a
kaappi> (char-foldcase #\x017F)   ; Latin small long s
;=> #\s

Fold vs. downcase

char-foldcase consults a dedicated Unicode case-folding table and falls back to char-downcase only when no explicit fold mapping exists. This distinction matters for correct case-insensitive comparison of certain scripts.
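
The long s makes the difference visible. The sketch below compares two hypothetical equality helpers, one built on char-downcase and one on char-foldcase. It assumes char-downcase leaves U+017F unchanged, since the character is already lowercase and has no lowercase mapping:

```scheme
(import (scheme base) (scheme char) (scheme write))

(define long-s #\x017F)   ; Latin small long s

;; Hypothetical helper: equality after lowercasing both characters.
(define (downcase-equal? a b)
  (char=? (char-downcase a) (char-downcase b)))

;; Hypothetical helper: equality after case-folding both characters.
(define (foldcase-equal? a b)
  (char=? (char-foldcase a) (char-foldcase b)))

(display (downcase-equal? long-s #\s))   ; long s stays distinct from s
(newline)
(display (foldcase-equal? long-s #\s))   ; long s folds to s
(newline)
```

Under that assumption the first comparison is false and the second is true, which is why the case-insensitive procedures fold rather than downcase.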

See also: char-upcase, char-downcase, string-foldcase


digit-value

Syntax: (digit-value char)

If char is a Unicode decimal digit, returns its numeric value as an exact integer (0--9). Otherwise returns #f. This procedure recognizes digits from 36 different scripts, including ASCII, Arabic-Indic, Devanagari, Bengali, Gujarati, Gurmukhi, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Khmer, Mongolian, Limbu, New Tai Lue, Tai Tham, Balinese, Sundanese, Lepcha, Ol Chiki, Vai, Saurashtra, Kayah Li, Cham, Meetei Mayek, and fullwidth digits.

kaappi> (digit-value #\3)
;=> 3
kaappi> (digit-value #\a)
;=> #f
kaappi> (digit-value #\x0966)   ; Devanagari digit zero
;=> 0
kaappi> (digit-value #\x0E53)   ; Thai digit three
;=> 3
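
Since digit-value returns either a number or #f, it can drive a small script-independent integer parser. A minimal sketch, where digits->integer is a hypothetical helper that handles neither signs nor radix prefixes:

```scheme
(import (scheme base) (scheme char) (scheme write))

;; Hypothetical helper: read a string of decimal digits (from any recognized
;; script) as an exact integer. Returns #f for an empty string or when any
;; character is not a decimal digit.
(define (digits->integer str)
  (let loop ((chars (string->list str)) (acc 0) (seen? #f))
    (cond ((null? chars) (and seen? acc))
          ((digit-value (car chars))
           ;; 0 is a true value in Scheme, so digit zero takes this branch too.
           => (lambda (d) (loop (cdr chars) (+ (* acc 10) d) #t)))
          (else #f))))

(display (digits->integer "2024"))             ; prints 2024
(newline)
(display (digits->integer "\x0E52;\x0E53;"))   ; Thai digits two, three: prints 23
(newline)
```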

See also: char-numeric?


Case-Insensitive Character Comparisons

These procedures compare characters after applying Unicode case folding via char-foldcase. Like the case-sensitive variants, they accept two or more arguments and return #t only if the relation holds between every pair of consecutive arguments.

char-ci<?

Syntax: (char-ci<? char1 char2 ...)

Returns #t if the case-folded codepoints are monotonically increasing.

kaappi> (char-ci<? #\a #\B)
;=> #t
kaappi> (char-ci<? #\A #\a)
;=> #f

See also: char<?, char-ci<=?


char-ci<=?

Syntax: (char-ci<=? char1 char2 ...)

Returns #t if the case-folded codepoints are monotonically non-decreasing.

kaappi> (char-ci<=? #\A #\a)
;=> #t
kaappi> (char-ci<=? #\b #\A)
;=> #f

See also: char<=?, char-ci<?


char-ci=?

Syntax: (char-ci=? char1 char2 ...)

Returns #t if all characters are equal after case folding.

kaappi> (char-ci=? #\A #\a)
;=> #t
kaappi> (char-ci=? #\a #\b)
;=> #f
kaappi> (char-ci=? #\x017F #\s)   ; long s equals s
;=> #t

See also: char=?, string-ci=?


char-ci>=?

Syntax: (char-ci>=? char1 char2 ...)

Returns #t if the case-folded codepoints are monotonically non-increasing.

kaappi> (char-ci>=? #\B #\a)
;=> #t
kaappi> (char-ci>=? #\A #\a)
;=> #t

See also: char>=?, char-ci>?


char-ci>?

Syntax: (char-ci>? char1 char2 ...)

Returns #t if the case-folded codepoints are monotonically decreasing.

kaappi> (char-ci>? #\B #\a)
;=> #t
kaappi> (char-ci>? #\A #\a)
;=> #f

See also: char>?, char-ci>=?


Case-Insensitive String Comparisons

These procedures compare strings after case-folding each codepoint with char-downcase. Comparison proceeds codepoint by codepoint over the UTF-8 encoding. They accept two or more string arguments and return #t only if the relation holds between every pair of consecutive strings.

string-ci<?

Syntax: (string-ci<? string1 string2 ...)

Returns #t if the case-folded strings are monotonically increasing in lexicographic order.

kaappi> (string-ci<? "apple" "Banana")
;=> #t
kaappi> (string-ci<? "banana" "APPLE")
;=> #f

See also: string-ci<=?, char-ci<?


string-ci<=?

Syntax: (string-ci<=? string1 string2 ...)

Returns #t if the case-folded strings are monotonically non-decreasing in lexicographic order.

kaappi> (string-ci<=? "Hello" "hello")
;=> #t
kaappi> (string-ci<=? "hello" "HELLO" "world")
;=> #t

See also: string-ci<?, string-ci=?


string-ci=?

Syntax: (string-ci=? string1 string2 ...)

Returns #t if all strings are equal after case folding each codepoint.

kaappi> (string-ci=? "Hello" "HELLO")
;=> #t
kaappi> (string-ci=? "hello" "world")
;=> #f
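
A common use of string-ci=? is lookup in a table whose keys should match regardless of case, such as protocol header names. In this sketch assoc-ci and the sample headers list are hypothetical:

```scheme
(import (scheme base) (scheme char) (scheme write))

;; Hypothetical helper: like assoc, but string keys are compared with string-ci=?.
;; Returns the first matching pair, or #f if there is none.
(define (assoc-ci key alist)
  (cond ((null? alist) #f)
        ((string-ci=? key (caar alist)) (car alist))
        (else (assoc-ci key (cdr alist)))))

(define headers
  '(("Content-Type" . "text/plain")
    ("Host" . "example.org")))

(display (cdr (assoc-ci "content-type" headers)))   ; prints text/plain
(newline)
```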

See also: char-ci=?, string=?


string-ci>=?

Syntax: (string-ci>=? string1 string2 ...)

Returns #t if the case-folded strings are monotonically non-increasing in lexicographic order.

kaappi> (string-ci>=? "Banana" "apple")
;=> #t
kaappi> (string-ci>=? "HELLO" "hello")
;=> #t

See also: string-ci>?, string-ci=?


string-ci>?

Syntax: (string-ci>? string1 string2 ...)

Returns #t if the case-folded strings are monotonically decreasing in lexicographic order.

kaappi> (string-ci>? "banana" "APPLE")
;=> #t
kaappi> (string-ci>? "hello" "HELLO")
;=> #f

See also: string-ci>=?, char-ci>?


String Case Conversion

These procedures return newly allocated strings with case mappings applied codepoint-by-codepoint. They handle Unicode special cases where a single codepoint may expand into multiple codepoints (e.g., German sharp s).

string-upcase

Syntax: (string-upcase string)

Returns a newly allocated string with every codepoint replaced by its Unicode uppercase mapping. Handles special expansions: German sharp s (U+00DF) becomes "SS", Latin ligatures (U+FB00--U+FB04) expand to their component letters, and certain Greek characters expand to multiple codepoints.

kaappi> (string-upcase "hello")
;=> "HELLO"
kaappi> (string-upcase "Stra\x00DF;e")   ; sharp s
;=> "STRASSE"

Expanding case mappings

Some characters expand to multiple characters when uppercased. The German sharp s (U+00DF) becomes "SS", the Latin small ligature fi (U+FB01) becomes "FI", and similar expansions apply to other ligatures. The returned string may therefore be longer than the input.
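
One practical consequence is that character positions in the input do not necessarily line up with positions in the result. The sketch below checks the lengths, assuming string-length counts codepoints and that sharp s expands as described above:

```scheme
(import (scheme base) (scheme char) (scheme write))

(define word "Stra\x00DF;e")          ; six characters, including sharp s
(define upper (string-upcase word))   ; "STRASSE" if sharp s expands to "SS"

(display (string-length word))
(newline)
(display (string-length upper))
(newline)
;; Indexes computed on word should not be reused on upper:
;; everything after the sharp s has shifted by one position.
```

Under those assumptions the two lengths are 6 and 7.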

See also: string-downcase, string-foldcase, char-upcase


string-downcase

Syntax: (string-downcase string)

Returns a newly allocated string with every codepoint replaced by its Unicode lowercase mapping. Implements context-sensitive downcasing for Greek capital sigma (U+03A3): when sigma appears at the end of a word (preceded by a cased character and not followed by one), it becomes final sigma (U+03C2) instead of the regular small sigma (U+03C3). Also handles Turkish dotted capital I (U+0130), which expands to lowercase i followed by a combining dot above.

kaappi> (string-downcase "HELLO")
;=> "hello"
kaappi> (string-downcase "\x03A3;")   ; lone capital Sigma
;=> "\x03C3;"                          ; small sigma

Greek final sigma

The Greek capital sigma is context-sensitive: at the end of a word it becomes final sigma, and in other positions it becomes the standard small sigma. This follows the Unicode case mapping specification.
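
A word that both starts and ends with sigma exercises the two outcomes. The sketch below spells the strings with hex escapes and compares against the results expected from the rule as described here. The variable names are illustrative only, and the comparisons hold only if the implementation follows that rule:

```scheme
(import (scheme base) (scheme char) (scheme write))

;; Capital omicron, delta, omicron, sigma: sigma closes the word.
(define word "\x039F;\x0394;\x039F;\x03A3;")
(define word-expected "\x03BF;\x03B4;\x03BF;\x03C2;")    ; ends in final sigma

;; Capital sigma, omicron, sigma: one sigma opens the word, one closes it.
(define mixed "\x03A3;\x039F;\x03A3;")
(define mixed-expected "\x03C3;\x03BF;\x03C2;")          ; small sigma, then final sigma

(display (string=? (string-downcase word) word-expected))
(newline)
(display (string=? (string-downcase mixed) mixed-expected))
(newline)
```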

See also: string-upcase, string-foldcase, char-downcase


string-foldcase

Syntax: (string-foldcase string)

Returns a newly allocated string with Unicode case folding applied to every codepoint. Case folding is similar to lowercasing but uses the full Unicode CaseFolding.txt mappings, which differ from simple lowercasing for certain characters. This is the operation used internally by the string-ci comparison procedures.

Like string-upcase and string-downcase, this handles expanding mappings. For example, German sharp s (U+00DF) folds to "ss", and Latin ligatures fold to their lowercase component letters. The long s t ligature (U+FB05) and the st ligature (U+FB06) both fold to "st".

kaappi> (string-foldcase "HELLO")
;=> "hello"
kaappi> (string-foldcase "Stra\x00DF;e")
;=> "strasse"

Foldcase vs. downcase

string-foldcase and string-downcase produce different results for certain inputs. Case folding is specifically designed for case-insensitive comparison and may map characters differently than simple lowercasing. The sharp s is a classic example: string-downcase preserves it, while string-foldcase expands it to "ss".
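
The sharp s case can be checked directly. In this sketch fold-equal? is a hypothetical helper that compares two strings after full case folding. It is shown only to illustrate the difference and is not a claim about how the built-in comparison procedures behave:

```scheme
(import (scheme base) (scheme char) (scheme write))

;; Hypothetical helper: equality after applying string-foldcase to both sides.
(define (fold-equal? a b)
  (string=? (string-foldcase a) (string-foldcase b)))

(define lower "stra\x00DF;e")

;; Downcasing keeps the sharp s, so the result differs from "strasse".
(display (string=? (string-downcase lower) "strasse"))
(newline)

;; Folding expands the sharp s to "ss", so both sides become "strasse".
(display (fold-equal? lower "STRASSE"))
(newline)
```

Given the mappings described above, the first comparison is false and the second is true.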

See also: string-upcase, string-downcase, char-foldcase