Bidirectional Support

SYNOPSIS

result = string.utf8bidi(s [, i [, j]])

DESCRIPTION

The function string.utf8bidi() implements an algorithm that handles bidirectional text and Arabic letter joining.

The source string is scanned in logical order, from the first character to the last. Any run of characters that belongs to a script written right-to-left is rearranged. As a result, the returned string displays correctly for these languages when it is printed in the default left-to-right direction.
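The rearrangement described above can be pictured with a minimal sketch. It is not the code used by string.utf8bidi(): it works on a plain array of code points, the helper names is_rtl and reverse_rtl_runs are hypothetical, and treating a few fixed ranges as right-to-left is a simplifying assumption (neutral characters such as spaces, digits and nested directions are ignored).

```lua
-- Simplifying assumption: only these ranges count as right-to-left.
local function is_rtl(cp)
  return (cp >= 0x0590 and cp <= 0x08FF)   -- Hebrew, Arabic and neighbouring blocks
      or (cp >= 0xFB1D and cp <= 0xFDFF)   -- Hebrew/Arabic presentation forms A
      or (cp >= 0xFE70 and cp <= 0xFEFF)   -- Arabic presentation forms B
end

-- Copy the code points, reversing every maximal run of right-to-left ones.
local function reverse_rtl_runs(cps)
  local out, k = {}, 1
  while k <= #cps do
    if is_rtl(cps[k]) then
      -- Find the last code point of this right-to-left run.
      local last = k
      while last < #cps and is_rtl(cps[last + 1]) do
        last = last + 1
      end
      -- Emit the run back to front.
      for m = last, k, -1 do
        out[#out + 1] = cps[m]
      end
      k = last + 1
    else
      out[#out + 1] = cps[k]
      k = k + 1
    end
  end
  return out
end

-- "A", three Hebrew letters, "B": only the middle run is reversed.
local visual = reverse_rtl_runs({ 0x41, 0x05D0, 0x05D1, 0x05D2, 0x42 })
for _, cp in ipairs(visual) do
  io.write(string.format(" 0x%04X", cp))
end
io.write("\n")
```

Left-to-right characters keep their positions, so mixed text stays readable in both directions.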

In addition, the function knows about the peculiarities of the Arabic writing system. In Arabic, many letters are written differently depending on their position in a word. The algorithm looks for generic Arabic code points, roughly in the range 0x06xx, and replaces them with the appropriate Arabic presentation forms in the range 0xFExx.
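The following is a minimal sketch of position-dependent form selection, not the function's actual implementation. The names FORMS, MARKS and shape are hypothetical, the table covers only the five code points of the word used in the EXAMPLE section, and combining marks are mapped directly instead of being treated as transparent for joining.

```lua
-- Presentation forms for a few letters: isolated, final, initial, medial.
-- Letters without initial/medial forms never connect to the letter after them.
local FORMS = {
  [0x0627] = { iso = 0xFE8D, fin = 0xFE8E },                              -- alef
  [0x0631] = { iso = 0xFEAD, fin = 0xFEAE },                              -- reh
  [0x0634] = { iso = 0xFEB5, fin = 0xFEB6, ini = 0xFEB7, med = 0xFEB8 },  -- sheen
  [0x0643] = { iso = 0xFED9, fin = 0xFEDA, ini = 0xFEDB, med = 0xFEDC },  -- kaf
}

-- Assumption of this sketch: marks get their isolated presentation form.
local MARKS = { [0x064B] = 0xFE70 }  -- fathatan

-- Replace generic code points (logical order) by presentation forms.
local function shape(cps)
  local out = {}
  for k, cp in ipairs(cps) do
    local f = FORMS[cp]
    if not f then
      out[k] = MARKS[cp] or cp
    else
      local prev = FORMS[cps[k - 1]]
      local nxt = FORMS[cps[k + 1]]
      -- Connected to the previous letter only if that letter can join forward.
      local joins_prev = prev ~= nil and prev.ini ~= nil
      -- Connected to the next letter only if this letter can join forward.
      local joins_next = nxt ~= nil and f.ini ~= nil
      if joins_prev and joins_next then
        out[k] = f.med
      elseif joins_prev then
        out[k] = f.fin
      elseif joins_next then
        out[k] = f.ini
      else
        out[k] = f.iso
      end
    end
  end
  return out
end

local shaped = shape({ 0x0634, 0x0643, 0x0631, 0x0627, 0x064B })
for _, cp in ipairs(shaped) do
  io.write(string.format(" 0x%04X", cp))
end
io.write("\n")
```

Read back to front, the shaped sequence matches the output listed in the EXAMPLE section, which suggests that shaping and reordering together account for that result.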

The optional parameters i and j select a substring of s that starts at i and continues until j; i and j can be negative. If j is absent, it is assumed to be equal to -1 (which is the same as the string length).
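As an illustration of how such indices resolve, here is a small sketch that assumes the same conventions as the standard string.sub(). The helper resolve_range is hypothetical, the default of 1 for an absent i is an assumption, and the text above does not say whether positions count bytes or characters.

```lua
-- Turn optional, possibly negative indices into absolute positions,
-- following the conventions of string.sub().
local function resolve_range(len, i, j)
  i = i or 1    -- assumption: an absent i means the start of the string
  j = j or -1   -- an absent j means the end of the string
  if i < 0 then
    i = math.max(len + i + 1, 1)   -- count from the end, clamp to the start
  elseif i == 0 then
    i = 1
  end
  if j < 0 then
    j = len + j + 1                -- count from the end
  elseif j > len then
    j = len
  end
  return i, j
end

print(resolve_range(10))         -- whole string
print(resolve_range(10, -3))     -- last three positions
print(resolve_range(10, 2, -2))  -- second to second-to-last position
```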

RETURN VALUE

Returns a UTF-8 encoded string.

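Due to possible ligatures, the resulting string can contain fewer characters than the initial string. It never contains more. Length here presumably means the number of characters, not bytes: a presentation form in the range 0xFExx takes three bytes in UTF-8, whereas a generic 0x06xx code takes two.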
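To see how a ligature shortens the character sequence, consider the following sketch. It assumes that lam followed by alef is among the ligatures the function forms, which the text above does not confirm. The helper merge_lam_alef is hypothetical and always emits the isolated form 0xFEFB.

```lua
local LAM, ALEF = 0x0644, 0x0627
local LAM_ALEF_ISOLATED = 0xFEFB  -- single presentation form for lam + alef

-- Replace every lam directly followed by alef with one ligature code point.
local function merge_lam_alef(cps)
  local out, k = {}, 1
  while k <= #cps do
    if cps[k] == LAM and cps[k + 1] == ALEF then
      out[#out + 1] = LAM_ALEF_ISOLATED
      k = k + 2   -- two input code points were consumed
    else
      out[#out + 1] = cps[k]
      k = k + 1
    end
  end
  return out
end

local merged = merge_lam_alef({ LAM, ALEF })
print(#merged, string.format("0x%04X", merged[1]))
```

Two input code points become one, so the result is one character shorter than the input.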

NOTES

Incorporates code from the UCData package under a BSD-style license.

ISSUES

The output of this function is of limited use with gcx:text(). In word-wrap mode with right-to-left languages, the lines may appear in reverse order (bottom to top) and line breaks may fall in the wrong places.

It is recommended to use the built-in bidirectional support of gcx:text() instead, which does not have that issue.

EXAMPLE

This example uses the Arabic word for "Thank you". After running through the algorithm, the order of the characters has been reversed and the characters have been replaced by their presentation forms.


    function utf8dump(name, str)
        io.write(string.format("%8s:", name))
        for ch in string.utf8fwd(str) do
            io.write(string.format(" 0x%04X", ch))
        end
        io.write("\n")
    end

    local inp = "\u0634\u0643\u0631\u0627\u064B"
    local out = string.utf8bidi(inp)
    utf8dump("input", inp)
    utf8dump("output", out)

This prints:

       input: 0x0634 0x0643 0x0631 0x0627 0x064B
      output: 0xFE70 0xFE8D 0xFEAE 0xFEDC 0xFEB7