You may have seen text that has strange diamond characters in it. If so, you likely have an encoding problem.
Fundamentally, Xojo strings are a series of bytes and a TextEncoding object that specifies how to interpret the bytes. It’s up to the programmer to make sure that the bytes are tagged with the correct TextEncoding object. You can do this using the DefineEncoding method or by passing in a TextEncoding parameter to various functions like BinaryStream.Read. Informally speaking, strings without an encoding are considered to be a simple “bag of bytes”, much like a MemoryBlock. Strings with a valid encoding are considered to be “textual” strings.
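The "bytes plus an encoding tag" idea can be modeled in a few lines. The sketch below is written in Python purely as an illustration; it is not Xojo code and not how the framework is implemented. The `TaggedString` class and its `define_encoding` and `is_textual` methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical model of a string: raw bytes plus an optional encoding label."""
    data: bytes
    encoding: Optional[str] = None  # None models a "bag of bytes"

    def define_encoding(self, encoding: str) -> "TaggedString":
        # Retag only: the bytes are left untouched, which is the idea behind
        # defining (as opposed to converting) an encoding.
        return TaggedString(self.data, encoding)

    def is_textual(self) -> bool:
        # Assumption for this model: "textual" means the string is tagged
        # and its bytes decode cleanly under that tag.
        if self.encoding is None:
            return False
        try:
            self.data.decode(self.encoding)
            return True
        except UnicodeDecodeError:
            return False


raw = TaggedString("café".encode("utf-8"))  # bytes from some source, no tag yet
text = raw.define_encoding("utf-8")         # same bytes, now tagged as UTF-8
```

In this model, `raw` and `text` hold identical bytes; only the tag differs, and only `text` counts as "textual".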
In the Xojo Cocoa framework, just about everything you see in the user interface, including drawing to a Graphics object or setting a TextArea’s Text property, requires a “textual” string. If the Xojo Cocoa framework needs a “textual” string but is passed a “bag of bytes” or a string whose TextEncoding isn’t correct for its bytes, it attempts to provide a fairly graceful fallback to avoid crashes. The fallback path tries its best to display something, which often means inserting the Unicode replacement character (U+FFFD), the diamond seen in the screenshot below:
The exact behavior of the fallback path might change over time, but it currently works like this:
- If the string is marked as UTF-8 but isn’t valid UTF-8, the framework attempts to parse it as UTF-8 and replace any invalid sequences it finds with the Unicode replacement character.
- If the string has no encoding or its encoding is otherwise invalid, the framework treats the string as ASCII and replaces anything non-printable with the Unicode replacement character.
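The two rules above can be approximated with a small simulation. This is a rough Python sketch of the described behavior, not the framework's actual code; the function name `fallback_display` is hypothetical, and treating bytes 0x20 through 0x7E as "printable" is an assumption made for this example.

```python
from typing import Optional

REPLACEMENT = "\ufffd"  # the Unicode replacement character (the "diamond")


def fallback_display(data: bytes, encoding: Optional[str]) -> str:
    """Approximate the fallback path for a string that is not properly textual."""
    if encoding == "utf-8":
        # Rule 1: parse as UTF-8, substituting U+FFFD for invalid sequences.
        return data.decode("utf-8", errors="replace")
    # Rule 2: no (or an invalid) encoding, so treat the bytes as ASCII and
    # substitute U+FFFD for anything outside the assumed printable range.
    return "".join(chr(b) if 0x20 <= b <= 0x7E else REPLACEMENT for b in data)


latin1_bytes = "café".encode("latin-1")  # b"caf\xe9"
utf8_bytes = "café".encode("utf-8")      # b"caf\xc3\xa9"

mislabeled = fallback_display(latin1_bytes, "utf-8")  # Latin-1 bytes tagged as UTF-8
untagged = fallback_display(utf8_bytes, None)         # valid UTF-8 bytes with no tag
```

Here `mislabeled` ends in one replacement character, while `untagged` ends in two, because the two-byte UTF-8 sequence for "é" is handled byte by byte under the ASCII rule.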
To avoid this fallback behavior, make sure that the encoding of every string is defined properly. Strings defined within Xojo are already set, because they use UTF-8 by default. Strings that come in from external files, Internet communication, or databases may need to have their encoding explicitly defined (or converted).
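The difference between defining and converting an encoding is easy to blur, so here is a minimal Python illustration of the distinction. It is an analogy, not Xojo code: decoding bytes with the encoding they already use plays the role of defining, and re-encoding plays the role of converting. The sample bytes are made up for this example.

```python
# Bytes as they might arrive from an external file saved as Latin-1.
incoming = b"caf\xe9"

# "Define": state which encoding the bytes already use. The bytes are not changed.
text = incoming.decode("latin-1")

# "Convert": produce new bytes that represent the same text in another encoding.
as_utf8 = text.encode("utf-8")

# The mistake to avoid: labeling the bytes with an encoding they do not use.
wrong = incoming.decode("utf-8", errors="replace")
```

Defining the wrong encoding is what produces the replacement character in `wrong`; converting is only safe once the original encoding has been defined correctly.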