1.47.2. String

Felix provides two kinds of strings, 8 bit and 32 bit, denoted string and ustring respectively.

Both kinds of string are intended to provide a universal representation of human readable text using the ISO-10646/Unicode character set.

Both kinds of string encode 32 bit values representing ISO-10646 code points. A string stores them as UTF-8 byte sequences, whilst a ustring stores one UCS-4 value per code point.

Both kinds of string literal are replaced, where they occur, by named constants with internal linkage, of types basic_string<char> and basic_string<uint32_t> respectively.

String literals are output in the C files as C string literals with all UTF-8 encoding expressed with hex escapes, surrounded by a constructor for basic_string<char>.
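As a rough illustration of the hex-escaped output described above, the following C++ sketch renders UTF-8 bytes as a C string literal. The function name c_literal is hypothetical and this is not the Felix compiler's own code; in particular, splitting the literal after a hex escape is an assumption made here so that the result is valid C, since a C hex escape absorbs any hex digits that follow it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (an assumption, not taken from the Felix sources):
// render the bytes of a UTF-8 string as a C string literal in which every
// byte outside printable ASCII, plus the double quote and the backslash,
// is written as a \xHH escape.
std::string c_literal(const std::string& utf8) {
    std::string out = "\"";
    bool prev_hex = false;  // was the previous byte emitted as \xHH?
    for (unsigned char c : utf8) {
        bool plain = c >= 0x20 && c < 0x7f && c != '"' && c != '\\';
        if (plain && prev_hex) {
            // Close and reopen the literal so the plain character cannot be
            // read as part of the preceding hex escape; adjacent literals
            // are concatenated by the C compiler.
            out += "\" \"";
        }
        if (plain) {
            out += static_cast<char>(c);
        } else {
            char buf[5];  // backslash, 'x', two hex digits, terminator
            std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(c));
            out += buf;
        }
        prev_hex = !plain;
    }
    out += '"';
    return out;
}
```

Under these assumptions, the UTF-8 bytes of "héllo" come out as the two adjacent literals "h\xc3\xa9" "llo", which could then be wrapped in a basic_string<char> constructor call.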

Ustring literals are surrounded by a call to a Felix function which constructs a basic_string<uint32_t> from an 8 bit C string literal by decoding its UTF-8 sequences.
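The decoding step can be sketched in C++ as follows. This is a minimal illustration, not the Felix library function: the name decode_utf8 is hypothetical, std::u32string stands in for basic_string<uint32_t>, only sequences of one to four bytes are handled, and the input is assumed to be well-formed (no validation is attempted).

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: decode well-formed UTF-8 into one 32 bit value per
// code point. std::u32string is used here in place of
// basic_string<uint32_t>; malformed input is not detected.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i++]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        while (extra-- > 0 && i < in.size()) {
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (static_cast<unsigned char>(in[i++]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

For example, the three-byte sequence E2 82 AC decodes to the single value 0x20AC.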

Both kinds of string admit \uXXXX and \UXXXXXXXX escapes, as well as the usual C escapes \\, \', \", \r, \n, \t, \b, \v, \f for slosh (backslash), quote, double quote, return, newline, tab, backspace, vertical tab, and form feed, respectively.

Octal and hex escapes are NOT allowed: strings are for internationalisable human text, and do not represent arbitrary raw memory extents.

Although not part of the lexicology, note here two special forms for strings: a string may be applied to a string, or, a string may be applied to an integer. The first case is remodelled as a concatenation, and the second as the concatenation of the string and the ISO-10646 code point the integer represents. Note that if the string is 8 bit, UTF-8 encoding will be applied.

Start C++ section to tut/examples/tut159.flx[1 /1 ]
include "std";
var s = "Hello" 32; // add a space after "Hello"
s = s s; // says "Hello Hello "
print s; endl;
End C++ section to tut/examples/tut159.flx[1]
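
For an 8 bit string, applying the string to an integer implies UTF-8 encoding the code point before it is appended. The C++ sketch below shows one way that could work. The name append_code_point is hypothetical, this is not the code Felix generates, and only the one to four byte UTF-8 forms (code points up to 0x10FFFF) are covered.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: return s with the code point cp appended as UTF-8.
// Assumes cp <= 0x10FFFF; longer encodings are not handled in this sketch.
std::string append_code_point(std::string s, std::uint32_t cp) {
    if (cp < 0x80) {
        // One byte: plain ASCII.
        s += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx.
        s += static_cast<char>(0xC0 | (cp >> 6));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
        s += static_cast<char>(0xE0 | (cp >> 12));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
        s += static_cast<char>(0xF0 | (cp >> 18));
        s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return s;
}
```

With this helper, append_code_point("Hello", 32) yields "Hello " just as the Felix expression "Hello" 32 does, and concatenating that result with itself gives "Hello Hello ".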