
UTF8 functions

String Core4 Lua Commands

SYNOPSIS

  1. result = string.utf8at(s [, i [, j]])
  2. result = string.utf8char(...)
  3. result = string.utf8len(s)
  4. result = string.utf8lower(s [, i [, j]])
  5. result = string.utf8reverse(s [, i [, j]])
  6. result = string.utf8sub(s [, i [, j]])
  7. result = string.utf8upper(s [, i [, j]])

DESCRIPTION

This is a group of commands for manipulating UTF-8 encoded text. UTF-8 is a variable-width encoding: a single character occupies between one and four bytes.

All functions are modeled on the basic Lua functions that work on raw strings; the corresponding function is shown in parentheses below. Unlike those functions, however, all indexes count Unicode characters instead of raw bytes.
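The difference between byte indexes and character indexes can be illustrated in plain Lua. The following is a minimal sketch, assuming valid UTF-8 input; count_utf8_chars is a hypothetical helper written for this illustration and is not part of the documented API, nor necessarily how string.utf8len is implemented.

```lua
-- Hypothetical helper (not part of the documented API): counts UTF-8
-- characters by counting every byte that is NOT a continuation byte.
-- Continuation bytes have the bit pattern 10xxxxxx, i.e. 0x80..0xBF.
-- Assumes s is valid UTF-8.
local function count_utf8_chars(s)
  local n = 0
  for i = 1, #s do
    local b = string.byte(s, i)
    if b < 0x80 or b >= 0xC0 then
      n = n + 1
    end
  end
  return n
end

-- "Grüße" written with decimal byte escapes: ü and ß take two bytes each.
local s = "Gr\195\188\195\159e"
print(#s)                   -- 7 (bytes)
print(count_utf8_chars(s))  -- 5 (characters)
```

A raw function such as string.sub(s, 3, 3) would therefore return only the first byte of ü, whereas a character-based index 3 refers to the whole character.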

string.utf8at
(string.byte)
Returns the Unicode codepoints of the characters s[i], s[i+1], ···, s[j].
The default value for i is 1; the default value for j is i.
string.utf8char
(string.char)
Receives zero or more integers. Returns a string that is the concatenation of the UTF-8 encodings of all arguments, each interpreted as a Unicode codepoint.
string.utf8len
(#string)
Returns the number of Unicode characters in s, which may be less than #s (the length of s in bytes).
string.utf8lower
(string.lower)
Returns a string that has the characters s[i], s[i+1], ···, s[j] converted into lower case.
The default value for i is 1; the default value for j is the number of UTF8 characters in s.
Uses the capitalization database from unicode.org.
Case conversions that change the number of characters (e.g. ß vs SS) are not supported.
string.utf8reverse
(string.reverse)
Returns a string that contains the characters s[i], s[i+1], ···, s[j] in reverse order.
The default value for i is 1; the default value for j is the number of UTF8 characters in s.
string.utf8sub
(string.sub)
Returns a string that contains the characters s[i], s[i+1], ···, s[j].
The default value for i is 1; the default value for j is the number of UTF8 characters in s.
string.utf8upper
(string.upper)
Returns a string that has the characters s[i], s[i+1], ···, s[j] converted into upper case.
The default value for i is 1; the default value for j is the number of UTF8 characters in s.
Uses the capitalization database from unicode.org.
Case conversions that change the number of characters (e.g. ß vs SS) are not supported.
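
As background for string.utf8char and string.utf8at, the sketch below shows how a single Unicode codepoint maps to its UTF-8 byte sequence in plain Lua. encode_codepoint is a hypothetical name used only for this illustration; it assumes a valid codepoint in the range 0 to 0x10FFFF and says nothing about how the documented functions are implemented internally.

```lua
-- Hypothetical helper (not part of the documented API): encodes one
-- Unicode codepoint as a UTF-8 byte string.
-- Assumes 0 <= cp <= 0x10FFFF; no validation is performed.
local function encode_codepoint(cp)
  if cp < 0x80 then
    -- 1 byte: 0xxxxxxx
    return string.char(cp)
  elseif cp < 0x800 then
    -- 2 bytes: 110xxxxx 10xxxxxx
    return string.char(0xC0 + math.floor(cp / 0x40),
                       0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       0x80 + math.floor(cp / 0x1000) % 0x40,
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

print(#encode_codepoint(0x41))    -- 1 (A)
print(#encode_codepoint(0xFC))    -- 2 (ü)
print(#encode_codepoint(0x20AC))  -- 3 (euro sign)
```

This is why one character index can correspond to anywhere from one to four byte positions in the underlying string.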

ARGUMENTS

All functions that take a range support passing negative values for i or j. In that case, the value is taken as an offset from the end of the string.
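
The range handling can be sketched in plain Lua as follows. This is only an illustration under the assumption that negative indexes follow the same convention as string.sub (so -1 refers to the last character); sub_chars is a hypothetical stand-in and not the actual implementation of string.utf8sub. It also assumes valid UTF-8 input.

```lua
-- Hypothetical emulation of a character-indexed substring function.
-- Assumption: negative indexes behave as in string.sub, i.e. -1 is the
-- last character, -2 the one before it, and so on.
local function sub_chars(s, i, j)
  -- Split s into characters: one non-continuation byte followed by any
  -- number of continuation bytes (0x80..0xBF).
  local chars = {}
  for ch in string.gmatch(s, "[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  local len = #chars
  i = i or 1
  j = j or len
  -- Translate negative offsets into positions counted from the start.
  if i < 0 then i = len + i + 1 end
  if j < 0 then j = len + j + 1 end
  -- Clamp to the valid range.
  if i < 1 then i = 1 end
  if j > len then j = len end
  return table.concat(chars, "", i, j)
end

local s = "Gr\195\188\195\159e"  -- "Grüße", 5 characters in 7 bytes
print(sub_chars(s, 1, 2))        -- Gr
print(#sub_chars(s, -2))         -- 3 (ß is two bytes, e is one)
```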

EXAMPLE

This returns the first j UTF8 characters from s:

string.utf8sub(s, 1, j)

This returns the final j UTF8 characters from s:

string.utf8sub(s, -j)

SEE ALSO