Clickatell's Unicode SMS guide
Unicode sounds like it should be a cross between technology and mythology, but it’s really nothing like that. It’s a standard devised to represent the characters of every written language, and it’s used on websites and within SMS solutions.
What’s the history of Unicode?
Unicode has its origins in the late 1980s, when staffers from a number of tech companies, including Xerox and Apple, collaborated to create a universal character set covering the writing systems of all the world’s living languages. The name Unicode was chosen to “suggest a unique, unified, universal encoding”, according to one of its creators, Xerox’s Joe Becker.
Let’s get the basics out of the way. What is Unicode?
You know that an SMS can hold up to 160 characters. That’s pretty common knowledge. The reason is that each message part is limited to a maximum payload of 1120 bits (140 bytes). Phones set up for languages written in Latin-based alphabets, like English, French or Italian, use the GSM 7-bit character encoding. This uses seven bits per character, allowing a maximum of 160 characters (1120 ÷ 7 = 160).
In comparison, phones set up for languages that aren’t written in a Latin-based alphabet, like Arabic or Japanese, usually make use of Unicode. Over SMS, Unicode text is carried in the 16-bit UCS-2 encoding (a close relative of UTF-16) rather than in UTF-8, which uses a variable number of bytes per character. At 16 bits per character, that allows for 70 characters per message part (1120 ÷ 16 = 70).
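If you’d like to check that arithmetic yourself, here is a minimal Python sketch. It only assumes the 1120-bit payload described above; the names PAYLOAD_BITS and max_chars are made up for illustration and are not part of any Clickatell API.

```python
# Payload of a single SMS part in bits (140 bytes), as described in the text.
PAYLOAD_BITS = 1120


def max_chars(bits_per_char: int) -> int:
    """Return how many whole characters fit in one message part."""
    return PAYLOAD_BITS // bits_per_char


gsm7_limit = max_chars(7)      # GSM 7-bit encoding
unicode_limit = max_chars(16)  # 16-bit Unicode encoding

print(gsm7_limit, unicode_limit)  # prints: 160 70
```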
This difference in characters is why there’s some background magic that happens when sending a message using Clickatell’s SMS Platform. Our platform will automatically convert to Unicode when necessary, but the real magic is called concatenated SMS. More on that later.
The Unicode character set covers written characters and symbols for all major languages, including emojis. UTF-8, one of its most widely used encodings, was also designed to be backwards compatible with ASCII. A number of mobile networks support only the GSM character set. This set is far smaller than Unicode, so it’s possible that some characters won’t be supported and messages will not be delivered in full.
Luckily, when sending messages using Clickatell’s SMS gateway, that’s not something you need to worry about. Our SMS Platform gives you the option to choose the character set which you’re using to submit your messages. And, as we’ve mentioned, the good news is that if a user does not specify a character set, all messages will immediately be encoded using the UTF-8 standard.
Our platform was developed to recognize right away which encoding is needed for the best possible delivery of a text message. If characters in the message you submit are not supported by the mobile network, Clickatell will immediately convert the message to Unicode.
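To give a feel for the kind of check involved, here is a rough Python sketch of how a message could be tested against the GSM 7-bit character set. It is an illustration only and not Clickatell’s actual implementation: the function name describe and the constants are hypothetical, the character tables follow the standard GSM 7-bit default alphabet and its extension table, and the 160 and 70 limits are the single-part limits discussed above.

```python
# GSM 7-bit default alphabet (basic table); each character costs one 7-bit unit.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: these need an escape prefix, so each costs two 7-bit units.
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def describe(text: str) -> tuple:
    """Return (encoding, units_used, single_part_limit) for a message.

    Hypothetical helper for illustration, not a Clickatell API.
    """
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        return ("GSM-7", units, 160)
    # Anything else falls back to 16-bit Unicode. Counting UTF-16 code units
    # means characters outside the basic plane (most emojis) count as two.
    units = len(text.encode("utf-16-be")) // 2
    return ("Unicode", units, 70)


print(describe("Hello"))      # ('GSM-7', 5, 160)
print(describe("Price: 5€"))  # ('GSM-7', 10, 160)
print(describe("مرحبا"))      # ('Unicode', 5, 70)
```

Note how a single character outside the GSM set is enough to move the whole message to the 70-character limit.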
When would I need to use Unicode?
Quite simply, you’ll never have to think about when to use Unicode. It’s a feature which is supported by Clickatell’s SMS gateway. If you use special characters which are not GSM compatible, your message format will automatically be switched to Unicode to ensure message delivery. It’s important to remember, though, that this decreases the number of characters per message: in this format, each message part consists of just 70 characters.
There’s absolutely nothing that you need to do aside from submitting your messages in regular text format. We’ll do all the work of converting to Unicode for you.
Now that you know all there is to know about the Unicode SMS solution, it’s time to learn more about concatenated SMS. Simply put, this is the splitting up of messages into smaller parts to be sure the entire SMS is received as intended. That seems confusing, right? Splitting up a text message to ensure it’s received as one? That’s why we’ve created the brand new guide to concatenated SMS. You’re welcome.