Code Format

A code is a sequence of 1 to 256 bytes. Every byte must fall in one of the ranges 0x21-0x7E, 0xA1-0xAC, or 0xAE-0xFF. This excludes 0xA0 and 0xAD, which are the no-break space and soft hyphen in ISO-8859-15. Every bank and intermediary must properly handle any code that is in this format. Here is a grammar definition, in [http://en.wikipedia.org/wiki/Augmented_Backus-Naur_form ABNF]:

ValidChar = %x21-7E / %xA1-AC / %xAE-FF
ValidCode = 1*256ValidChar
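
As a minimal sketch of the grammar above, the following Python check tests whether a byte string matches ValidCode. The function names is_valid_char and is_valid_code are illustrative and are not part of the protocol.

```python
def is_valid_char(b: int) -> bool:
    """Return True if the byte value b matches ValidChar."""
    return 0x21 <= b <= 0x7E or 0xA1 <= b <= 0xAC or 0xAE <= b <= 0xFF


def is_valid_code(code: bytes) -> bool:
    """Return True if code matches ValidCode: 1 to 256 ValidChar bytes."""
    return 1 <= len(code) <= 256 and all(is_valid_char(b) for b in code)
```

A bank or intermediary could run a check like this on every code it receives, before any further processing.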

Code Display Recommendation

One goal of the code format is to allow people to use codes that have meaning in their own language. The reasons for this are:

Therefore the code format should support various languages. Rather than support various character encodings, the protocol uses ISO-8859-15 with Unicode entities. Codes should be displayed using the [http://en.wikipedia.org/wiki/ISO_8859-15 ISO/IEC 8859-15 character encoding]. Substrings matching the following [http://en.wikipedia.org/wiki/Augmented_Backus-Naur_form ABNF] grammar are handled differently:

HexDigit = %x30-39 / %x41-46 / %x61-66 ; [0-9a-fA-F]
HexEntity = "&" "#" "x" 1*HexDigit ";"

Substrings matching HexEntity are converted to a [http://www.unicode.org/ Unicode] code point, as in [http://www.w3.org/TR/REC-xml/#sec-references XML]. Entities that encode characters which do not need entity encoding, that is, characters available directly in ISO-8859-15, should be displayed as-is.

For example, in the string "abc&#x41;", the entity "&#x41;" encodes the ISO-8859-15 character 'A'. If the software displays the code as "abcA", the user might write this down and later enter the code as "abcA", an error. The software should display the entity as-is, "abc&#x41;". The software should avoid displaying ambiguous representations of the code.
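
The display rule could be sketched in Python as follows. This is one possible reading of the recommendation, not a reference implementation. The name display_form is hypothetical, and two choices are assumptions: "does not need entity encoding" is taken to mean "has an ISO-8859-15 byte that matches ValidChar", and entities that do not name a Unicode scalar value are left as-is.

```python
import re

# ABNF string literals are case-insensitive, so "&#X41;" is accepted as well.
HEX_ENTITY = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def _is_valid_char(b: int) -> bool:
    return 0x21 <= b <= 0x7E or 0xA1 <= b <= 0xAC or 0xAE <= b <= 0xFF


def display_form(code: bytes) -> str:
    """Decode a code as ISO-8859-15 and expand only the entities that need it."""
    text = code.decode("iso8859-15")

    def render(match: re.Match) -> str:
        cp = int(match.group(1), 16)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return match.group(0)  # assumption: not a scalar value, show as-is
        ch = chr(cp)
        try:
            raw = ch.encode("iso8859-15")
        except UnicodeEncodeError:
            return ch  # not available in ISO-8859-15: show the character
        if _is_valid_char(raw[0]):
            return match.group(0)  # could be typed directly: keep the entity
        return ch

    return HEX_ENTITY.sub(render, text)
```

With this sketch, display_form(b"abc&#x41;") leaves the entity untouched, while an entity for a character outside ISO-8859-15 is shown as the character it encodes.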

Software may choose to show an entity as-is, if doing so will eliminate ambiguity. Ambiguity may arise when a code contains a character that appears more than once in Unicode. Some similar characters appear in both the Chinese and Japanese portions of Unicode. When displaying a code containing a Chinese character on a Japanese computer, the entity should be displayed as-is.

To be safe, the software may display the code twice: once as-is and once with entities converted to their corresponding characters.

Code Entry

Software must allow the user to enter all codes that match ValidCode. The software may allow the user to enter characters not matching ValidChar. Such software must convert those characters to Unicode code points and then to strings matching HexEntity. Website software accepting input via a web browser may receive Unicode characters encoded in decimal notation:

Digit = %x30-39 ; 0-9
DecimalEntity = "&" "#" 1*Digit ";"

The software must convert substrings matching DecimalEntity into substrings matching HexEntity. Software displaying codes containing DecimalEntity must display them as-is.
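
A rough Python sketch of the entry rules above follows. The name normalize_entry is hypothetical, and the choice of uppercase hexadecimal digits in the generated entities is an assumption, since the text does not specify a case. The input is taken to be the text the user typed, already decoded to a Python string.

```python
import re

DECIMAL_ENTITY = re.compile(r"&#([0-9]+);")


def _is_valid_char(b: int) -> bool:
    return 0x21 <= b <= 0x7E or 0xA1 <= b <= 0xAC or 0xAE <= b <= 0xFF


def normalize_entry(typed: str) -> bytes:
    """Turn user-entered text into code bytes, using HexEntity where needed."""
    # Step 1: rewrite every DecimalEntity as the equivalent HexEntity.
    typed = DECIMAL_ENTITY.sub(lambda m: "&#x%X;" % int(m.group(1)), typed)

    out = bytearray()
    for ch in typed:
        try:
            raw = ch.encode("iso8859-15")
        except UnicodeEncodeError:
            raw = b""
        if raw and _is_valid_char(raw[0]):
            out += raw  # the character matches ValidChar: keep its byte
        else:
            # Step 2: anything else becomes a HexEntity for its code point.
            out += ("&#x%X;" % ord(ch)).encode("ascii")
    return bytes(out)
```

The result should still be checked against ValidCode, because expanding characters into entities can push a code past 256 bytes.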

Recommendation for Code Creation

When a client or bank generates a code, it should use characters from only one language. This minimizes the chances for the user to enter a character in the wrong language.

The code generator algorithm should use the output of a random number generator to select components of the code. The security of user accounts may be compromised if a malicious person is able to predict or guess codes. See PasswordEntropy.
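
As an illustration of random component selection, here is a small Python sketch. It assumes a cryptographically secure generator is wanted and uses the standard secrets module for that. The function name generate_code and the example alphabet are hypothetical placeholders, not values defined by this page.

```python
import secrets

# Hypothetical placeholder alphabet, used only for illustration.
EXAMPLE_ALPHABET = "ACDEFHJKMNPRTWXY3479"


def generate_code(alphabet: str, length: int) -> str:
    """Pick each character of the code independently and uniformly."""
    if length < 1 or len(alphabet) < 2:
        raise ValueError("need a positive length and at least two characters")
    if len(set(alphabet)) != len(alphabet):
        raise ValueError("alphabet must not contain duplicate characters")
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

A general-purpose generator such as the random module is designed for simulation rather than security, which is why it is avoided in this sketch.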

Some codes are created for pre-printed cards, like credit card numbers. Such codes are intended for use by people. They should be:

English Language Codes

You may generate English language codes using any letter, number, or symbol that appears in the English language. But let's apply some usability analysis and pare down that set:

This leaves us with twenty characters. A 64-bit random number may be encoded into a 15-digit base-20 number. This is quite manageable.
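
The digit count can be checked with a short Python sketch. Since 20^14 < 2^64 <= 20^15, fifteen base-20 digits are the minimum needed. The helper names and the twenty-character alphabet below are hypothetical. The alphabet is a placeholder, not the pared-down set this section refers to.

```python
# Hypothetical placeholder alphabet of twenty characters, for illustration only.
EXAMPLE_ALPHABET = "ACDEFHJKMNPRTWXY3479"


def digits_needed(bits: int, base: int) -> int:
    """Smallest n such that base**n >= 2**bits, using exact integer math."""
    n = 1
    while base ** n < 2 ** bits:
        n += 1
    return n


def encode_fixed(value: int, alphabet: str, width: int) -> str:
    """Write value as a fixed-width number in base len(alphabet)."""
    base = len(alphabet)
    if not 0 <= value < base ** width:
        raise ValueError("value does not fit in the requested width")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))
```

Here digits_needed(64, 20) evaluates to 15, and encode_fixed pads with the first alphabet character so every code has the same length.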

For comparison, a VISA credit card has a 16-digit account number, a 4-digit expiration date, a 3-digit CVV code, and the owner's name. Credit card information is often given over the telephone, with the caller spelling her name. Our proposed 15-digit codes contain letters that require spelling when relayed over the telephone. But I expect that relaying a debit code over the phone will require about the same amount of time as a credit card number. I also expect errors to be about as common. Experiments are needed to determine the best code composition and length.

Codes for Software

Some codes are generated and handled by software. Such codes are rarely displayed to a user, so they can be longer and more complicated. Such applications could use an alphabet of all 188 valid characters, for very compact representation. A 128-bit random number fits into a 17-digit base-188 number.
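
The figures above can be reproduced with a Python sketch that builds the alphabet from the ValidChar ranges and packs a 128-bit random number into 17 bytes. The names VALID_BYTES, encode_base188, and random_software_code are hypothetical, and the use of the secrets module is an assumption about the random source.

```python
import secrets

# All byte values matching ValidChar, in ascending order (188 values).
VALID_BYTES = bytes(
    b for b in range(0x100)
    if 0x21 <= b <= 0x7E or 0xA1 <= b <= 0xAC or 0xAE <= b <= 0xFF
)


def encode_base188(value: int, width: int = 17) -> bytes:
    """Write value as a fixed-width base-188 number over VALID_BYTES."""
    base = len(VALID_BYTES)
    if not 0 <= value < base ** width:
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for _ in range(width):
        value, remainder = divmod(value, base)
        out.append(VALID_BYTES[remainder])
    out.reverse()  # most significant digit first
    return bytes(out)


def random_software_code() -> bytes:
    """Encode a fresh 128-bit random number as a 17-byte code."""
    return encode_base188(secrets.randbits(128))
```

Because 188^16 < 2^128 <= 188^17, seventeen digits is the shortest width that holds every 128-bit value.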

TODO: decide whether or not to remove '&' from the ValidChar range: ValidChar = %x21-25 / %x27-7E / %xA1-AC / %xAE-FF


CodeFormat (last edited 2007-10-16 07:15:16 by MichaelLeonhard)