Kostis Netzwerkberatung
Copyright (c) 1993-2000 by Kostis Netzwerkberatung
Talstr. 25, D-63322 Rödermark, Tel. +49 6074 881056, FAX 881058
kosta@kostis.net (Kosta Kostis), http://www.kostis.net/

This information may be used free of charge at your own risk.

trans V1.30 2001-02-22


Character Encoding Description File Format (cedf)

A character encoding description file begins with four comment lines.

Comment lines must not be longer than 69 characters.


The first comment line is the Character Encoding Name.

Example:

ISO/IEC 8859-1:1998 Latin Alphabet No. 1


The 2nd, 3rd and 4th lines hold further information such as version, contributor and source.

Example:

V1.30 by Kosta Kostis <kosta@kostis.net>
Source: ECMA-94
ISO 2022 Sequence: <ESC>(B<ESC>-A
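
Reading the header could look roughly like the following minimal C sketch. It is not the code in "readtab.c"; the function name read_cedf_header and the constants CEDF_COMMENT_LINES / CEDF_COMMENT_MAX are hypothetical. It reads the four comment lines, strips the line ending and rejects any line longer than 69 characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names; the values 4 and 69 come from the format description. */
#define CEDF_COMMENT_LINES 4
#define CEDF_COMMENT_MAX   69

/*
 * Reads the four comment lines at the top of a cedf stream into hdr[].
 * Returns 0 on success, -1 on premature end of file or an over-long line.
 */
int read_cedf_header(FILE *fp, char hdr[CEDF_COMMENT_LINES][CEDF_COMMENT_MAX + 1])
{
    char buf[256];

    for (int i = 0; i < CEDF_COMMENT_LINES; i++) {
        if (fgets(buf, sizeof buf, fp) == NULL)
            return -1;                      /* fewer than four lines */
        buf[strcspn(buf, "\r\n")] = '\0';   /* drop LF or CR LF */
        if (strlen(buf) > CEDF_COMMENT_MAX)
            return -1;                      /* violates the 69-character limit */
        strcpy(hdr[i], buf);
    }
    return 0;
}
```

After a successful call, hdr[0] holds the Character Encoding Name and the stream is positioned at the first character line.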


The rest of the file consists of a list of characters, one per line, in the following form:

<CHAR_NUM><TAB><ISO_NAME>

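where <CHAR_NUM> is the character's code in the encoding described, <TAB> is a horizontal tab character (ASCII 9) and <ISO_NAME> is the character's ISO/IEC 10646-1 name. All names that may be used are defined in the file "cedf-name".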
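
A single character line can be split as in the following minimal C sketch. The function name parse_cedf_line is hypothetical, and the sketch assumes that <CHAR_NUM> is written in decimal and lies in the 8-bit range 0..255; neither assumption is stated by the format description above.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Splits one data line "<CHAR_NUM><TAB><ISO_NAME>" in place.
 * Assumptions of this sketch: CHAR_NUM is written in decimal and lies in
 * the 8-bit range 0..255. On success the code is stored in *code, *name
 * points to the name (without line ending) and 0 is returned; -1 means
 * the line is malformed.
 */
int parse_cedf_line(char *line, int *code, char **name)
{
    char *tab = strchr(line, '\t');
    char *end;
    long value;

    if (tab == NULL)
        return -1;                          /* no TAB separator */
    *tab = '\0';                            /* terminate the number part */

    errno = 0;
    value = strtol(line, &end, 10);
    if (end == line || *end != '\0' || errno != 0 || value < 0 || value > 255)
        return -1;                          /* missing, junk or out of range */

    *name = tab + 1;
    (*name)[strcspn(*name, "\r\n")] = '\0'; /* drop LF or CR LF */
    if (**name == '\0')
        return -1;                          /* empty name */

    *code = (int)value;
    return 0;
}
```

The name returned this way is still raw; it would then go through the normalization steps described in the note below before being compared.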


Lines are separated by "\n" (LF = LINE FEED) or "\r\n" (CR LF = CARRIAGE RETURN followed by LINE FEED), depending on how your OS, editor or C runtime library stores plain text files. This package is maintained on a box running Linux, so the lines in my original files end in LF.
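
If a file is read on a system (or in a mode) where the C library does not translate line endings, both conventions can be accepted by stripping them after reading each line. A minimal C sketch; the helper name chomp_line is hypothetical and not part of this package.

```c
#include <string.h>

/*
 * Removes a trailing LF, CR LF or lone CR from a line, so that files
 * written on Linux and on DOS/Windows are read alike.
 * Returns the new length of the string.
 */
size_t chomp_line(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';                 /* strip LF */
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';                 /* strip CR (from CR LF) */
    return len;
}
```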


Note:

To maximize fault tolerance, function ReadCodeTable () does a little more than simply compare two ISO/IEC 10646-1:2000 names:

  1. ReadCodeTable () reads an ISO/IEC 10646-1:2000 name into a string.
  2. ReadCodeTable () converts the string to upper case.
  3. If the string contains a LEFT PARENTHESIS "(", ReadCodeTable () deletes the parenthesis and everything after it (an optional comment).
  4. ReadCodeTable () deletes trailing white space.
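
Steps 2 to 4 can be sketched in C as follows. This is an illustration of the listed steps, not the code in "readtab.c"; the function name normalize_iso_name is hypothetical, and upper-casing relies on toupper () in the default "C" locale, which only affects ASCII letters.

```c
#include <ctype.h>
#include <string.h>

/*
 * Normalizes an ISO/IEC 10646-1 character name in place:
 *   step 2: convert to upper case,
 *   step 3: cut off an optional comment starting at "(",
 *   step 4: remove trailing white space.
 */
void normalize_iso_name(char *name)
{
    char *paren;
    size_t len;

    for (char *p = name; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);

    paren = strchr(name, '(');
    if (paren != NULL)
        *paren = '\0';              /* drop "(" and the comment after it */

    len = strlen(name);
    while (len > 0 && isspace((unsigned char)name[len - 1]))
        name[--len] = '\0';
}
```

For example, "Latin Small Letter a (comment)  " becomes "LATIN SMALL LETTER A". Cutting the comment before trimming matters: the blank in front of "(" only becomes trailing white space after step 3.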

The resulting string is stored in in-memory tables, which are then used to build the conversion tables.
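
The layout of these tables is not described here. The following C sketch assumes a hypothetical CodeTable type that maps each 8-bit code to its normalized name (NULL if undefined) and builds a conversion table between two encodings by matching names. The names CodeTable and build_conversion and the substitution character are assumptions of this sketch, not the package's actual code.

```c
#include <stddef.h>
#include <string.h>

#define CODES 256

/* Hypothetical in-memory table: name[c] is the normalized ISO name of
 * code c, or NULL if the code is not defined in the encoding. */
typedef struct {
    const char *name[CODES];
} CodeTable;

/*
 * Builds conv[] so that conv[c] is the code in `to` whose name equals the
 * name of code c in `from`. Codes without a counterpart map to `subst`.
 * Returns the number of defined source codes that could not be mapped.
 */
int build_conversion(const CodeTable *from, const CodeTable *to,
                     unsigned char conv[CODES], unsigned char subst)
{
    int unmapped = 0;

    for (int c = 0; c < CODES; c++) {
        conv[c] = subst;
        if (from->name[c] == NULL)
            continue;               /* undefined in source: not counted */

        int found = 0;
        for (int d = 0; d < CODES && !found; d++) {
            if (to->name[d] != NULL && strcmp(from->name[c], to->name[d]) == 0) {
                conv[c] = (unsigned char)d;
                found = 1;
            }
        }
        if (!found)
            unmapped++;
    }
    return unmapped;
}
```

The linear search costs at most 256 × 256 string comparisons per conversion table, which is negligible for 8-bit encodings; a sorted index or hash table would only pay off for much larger tables.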

The use of comment strings and mixed case in ISO/IEC 10646-1:2000 names is not recommended, though. See function ReadCodeTable () in the file "readtab.c" for more details. It should be fairly easy to use this function in other (e.g. your own) programs.