
eNON-txt

The eNON-txt format is a text-based representation of e-NON data.

Overview

A binary format is not useful in all situations:

The eNON-txt format provides an alternative means of representing e-NON data in these situations.

Because an e-NON stream processor can read and write either binary or text formats, the same processor can be used for all streams. Other text-based formats, such as XML or JSON, could be used, but the eNON-txt format supports all of the features of e-NON. This allows text and binary streams to be used interchangeably.

Nomenclature

Format Name

The name eNON-txt will be used in narrative content. The name e-NON-txt is also acceptable.

File Extension

The file extension .enont may be used when storing eNON-txt data in a file system.

Format Definition

Newlines are significant in eNON-txt files. The newline character, ‘\n’ (0x0A), is the terminating symbol for all data elements.

Prolog

An eNON-txt file must start with the string #enon-txt on a line by itself. Any number of blank lines may appear before the prolog.
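A reader has to skip any leading blank lines and then find the prolog before it can parse anything else. The following is a minimal Python sketch of that check. It assumes the input has already been split into lines with the terminators removed, and that a line containing only white space counts as blank. The name find_prolog is hypothetical.

```python
def find_prolog(lines: list[str]) -> int:
    """Return the index of the first line after the '#enon-txt' prolog.

    Hypothetical helper: `lines` are assumed to be already split on LF,
    with the line terminators removed.
    """
    for index, line in enumerate(lines):
        if line.strip() == "":
            continue  # blank lines may precede the prolog
        if line == "#enon-txt":
            return index + 1  # data elements start on the next line
        raise ValueError(f"expected '#enon-txt' prolog, found {line!r}")
    raise ValueError("input ended before the '#enon-txt' prolog")
```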

Data Elements

Following the prolog, any number of data elements can appear.

Data Element Rules

Element Table

element            prefix  prefix (hex)  description
null               null                  Null is specified by the string ‘null’.
false              false                 The boolean value false is specified by the string ‘false’.
true               true                  The boolean value true is specified by the string ‘true’.
∞, positive        +       0x2B          Positive infinity is specified by the plus character ‘+’.
∞, negative        -       0x2D          Negative infinity is specified by the minus character ‘-’.
NaN                nan                   The special value not-a-number is specified by the string ‘nan’.
number             n       0x6E          The generic number element has the prefix ‘n’ followed by a base-10 numeric string value.
byte               b       0x62          Similar to number, but the value must be a valid signed 8-bit byte value.
short              s       0x73          Similar to number, but the value must be a valid signed 16-bit short value.
int                i       0x69          Similar to number, but the value must be a valid signed 32-bit int value.
long               l       0x6C          Similar to number, but the value must be a valid signed 64-bit long value.
float              f       0x66          Similar to number, but the value must be a valid 32-bit float value.
double             d       0x64          Similar to number, but the value must be a valid 64-bit double value.
string             "       0x22          A string is prefixed by a double quote followed by a UTF-8 string.
unicode string     U"      0x55, 0x22    A string containing Unicode characters in hex notation, see below.
temporal           t       0x74          A temporal value is a string that identifies various time-based values.
BLOB (byte[])      B       0x42          Binary data is prefixed by a capital ‘B’ followed by a Base64 binary string. Alternatively, binary data can be referenced from an external file, see below.
hard continuation  &       0x26          A newline may be inserted into a string by using a hard continuation, see below.
soft continuation  \       0x5C          A long string or BLOB can be split onto multiple lines by using a soft continuation, see below.
list               [       0x5B          A list is started with an opening square bracket.
end list           ]       0x5D          A list is terminated by a closing square bracket.
map                {       0x7B          A map is started with an opening curly brace.
end map            }       0x7D          A map is terminated by a closing curly brace.

 

Null, False, True

These appear as just the strings null, false, true respectively. Nothing else appears on the line.

Negative/Positive Infinity

These appear as just the symbols - and + (minus 0x2D and plus 0x2B) respectively for negative and positive infinity. Nothing else appears on the line.

NaN

This appears as just the string nan. Nothing else appears on the line.
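Because each of these keyword elements occupies a whole line, recognizing one is a single lookup. The following is a minimal Python sketch. It assumes surrounding white space is tolerated, and it maps the keywords onto None, bool and IEEE float values as one possible in-memory representation. The names KEYWORDS and parse_keyword are hypothetical.

```python
import math

# Whole-line keyword elements and the Python values this sketch maps them to.
KEYWORDS = {
    "null": None,
    "false": False,
    "true": True,
    "+": math.inf,    # positive infinity
    "-": -math.inf,   # negative infinity
    "nan": math.nan,  # not a number
}

def parse_keyword(line: str):
    """Parse a null/false/true/+/-/nan line; nothing else may appear on it."""
    token = line.strip()  # assumption: surrounding white space is tolerated
    if token not in KEYWORDS:
        raise ValueError(f"not a keyword element: {line!r}")
    return KEYWORDS[token]
```

The lookup is case-sensitive, so ‘True’ or ‘NULL’ is rejected.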

Byte, Short, Int, Long, Float, Double, Number

Numeric values appear as base-10 number strings without quotes. Numbers can also be written in exponential notation; see the examples below.

If the generic number prefix, ‘n’, is used, then the number can be of any size and precision.

Alternatively, if a type-specific prefix (‘b’, ‘s’, ‘i’, ‘l’, ‘f’, or ‘d’) is used, the number must be within the range and precision of that type.

White space can appear between the prefix and the number value.

The e-NON nano-int type can’t be specified directly. An eNON-txt processor may produce a nano-int internally if the prefix is ‘b’ and the value is within the nano-int range (-63..64).
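The numeric rules above come down to three steps: read the prefix, read the value with any white space skipped, and check the value against the type’s range. The following is a minimal Python sketch. It makes several assumptions. Exponential notation is accepted for the integer prefixes only when the value is integral. ‘n’ maps to Decimal to keep arbitrary size and precision. ‘f’ values are rounded to the nearest 32-bit float instead of being rejected for lost precision. The nano-int optimization is not modeled. The name parse_number is hypothetical.

```python
import math
import struct
from decimal import Decimal, InvalidOperation

# Inclusive ranges of the signed integer types.
INT_RANGES = {
    "b": (-2**7, 2**7 - 1),    # byte
    "s": (-2**15, 2**15 - 1),  # short
    "i": (-2**31, 2**31 - 1),  # int
    "l": (-2**63, 2**63 - 1),  # long
}

def parse_number(line: str):
    """Parse an n/b/s/i/l/f/d element line into a Decimal, int or float."""
    text = line.strip()
    prefix, digits = text[:1], text[1:].strip()  # white space may follow the prefix
    try:
        value = Decimal(digits)  # accepts plain and exponential notation
    except InvalidOperation:
        raise ValueError(f"not a base-10 number: {digits!r}") from None
    if not value.is_finite():
        # Decimal also accepts 'nan' and 'inf'; eNON-txt has separate elements for those.
        raise ValueError("use the nan, + or - element instead")
    if prefix == "n":
        return value  # generic number: any size and precision
    if prefix in INT_RANGES:
        if value != value.to_integral_value():
            raise ValueError(f"{digits!r} is not an integer")
        low, high = INT_RANGES[prefix]
        if not low <= value <= high:
            raise ValueError(f"{digits!r} is out of range for prefix {prefix!r}")
        return int(value)
    if prefix in ("f", "d"):
        result = float(value)  # rounds to a 64-bit double
        if math.isinf(result):
            raise ValueError(f"{digits!r} is out of range for prefix {prefix!r}")
        if prefix == "f":
            try:
                # Round-trip through a 32-bit float; packing fails if the value is too large.
                result = struct.unpack("<f", struct.pack("<f", result))[0]
            except OverflowError:
                raise ValueError(f"{digits!r} is out of range for prefix 'f'") from None
        return result
    raise ValueError(f"unknown numeric prefix {prefix!r}")
```

Parsing everything through Decimal first keeps a single code path for plain and exponential notation, and it avoids losing precision before the range check.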

Examples:

These examples show a space after the prefix, but it is also legal to put the value immediately after the prefix, for example d123.45.

Examples:

String

The string is introduced by the double-quote character: " (0x22). Everything immediately after the prefix, up to but not including the newline, is part of the string; there is no closing quote.

Multi-line strings

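A string may be continued on the next line by using a continuation prefix. There are two continuation prefixes for strings: & (hard continuation), which inserts a newline before the text of the continuation line, and \ (soft continuation), which appends the text of the continuation line directly, without a newline.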

Any number of continuation lines may appear. Continuation lines have to be consecutive: continuation ends at the first line that is not a continuation. A blank line terminates continuation.

As with other prefixes, all white space preceding the ", & and \ is ignored. This allows text to be indented along with other elements, even when continuations are included.
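The continuation rules can be sketched as a small join over the lines of one string element. This minimal Python sketch makes three assumptions. The caller has already collected the element’s lines (the " line plus its consecutive continuation lines, without terminators). & inserts a newline and \ joins directly. White space after a continuation prefix belongs to the string, which is consistent with the best-practice note later in this section. The name read_string is hypothetical.

```python
def read_string(lines: list[str]) -> str:
    """Join one string element: the '"' line plus its consecutive continuation lines.

    Hypothetical helper; `lines` have no line terminators.
    """
    first = lines[0].lstrip()  # indentation before the prefix is ignored
    if not first.startswith('"'):
        raise ValueError("a string element starts with '\"'")
    text = first[1:]  # everything after the prefix, including white space
    for line in lines[1:]:
        stripped = line.lstrip()
        prefix, rest = stripped[:1], stripped[1:]
        if prefix == "&":
            text += "\n" + rest  # hard continuation: insert a newline
        elif prefix == "\\":
            text += rest  # soft continuation: join without a newline
        else:
            raise ValueError(f"not a continuation line: {line!r}")
    return text
```

Stripping only the white space *before* each prefix keeps the indentation freedom described above while preserving any white space that follows the prefix.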

Unicode String

If the Unicode prefix U" is used, the string data is pre-processed for Unicode escapes: each sequence matching the pattern \u(X)+; is replaced with the corresponding Unicode character. (X)+ represents a case-insensitive hexadecimal value. There must be at least one hex digit after the \u prefix, immediately followed by a semicolon. Leading zeros in the hex value are not necessary.

This feature is provided for hand-written eNON-txt files, created with editors that can’t insert Unicode characters directly. The writer code doesn’t scan strings for the \u(X)+; pattern, and it won’t use the U" prefix in its output.
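The escape pattern maps directly onto a regular expression. The following is a minimal Python sketch. It assumes the escapes are expanded after the continuation lines have been joined, and it does not address surrogate code points. Values above U+10FFFF make chr() raise ValueError. The name expand_unicode is hypothetical.

```python
import re

# \u, one or more hex digits (either case), then a terminating semicolon.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]+);")

def expand_unicode(text: str) -> str:
    """Replace each \\u(X)+; sequence in a U" string with its Unicode character."""
    return _UNICODE_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)
```

A sequence without a hex digit or without the semicolon does not match, so it is left in the string unchanged.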

Carriage Return

Some line readers (e.g. BufferedReader in Java) don’t distinguish between CR (\r) and newline (\n) when terminating lines. The “official” line terminator for eNON-txt is the newline character, 0x0A, so a reader of this kind may replace a CR character with a newline during processing.

If it is important to preserve a CR character (0x0D), then it can be escaped as \uD; in a unicode string.

To maintain line formatting, the escaped CR can be immediately followed by a soft continuation. This allows a line break in the eNON-txt stream but does not insert an extra newline upon reading.
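The CR problem is easy to reproduce in Python, whose str.splitlines() also treats a lone CR as a line boundary. This minimal sketch contrasts that behavior with splitting on LF only, which is the eNON-txt terminator. The name split_enon_lines is hypothetical.

```python
def split_enon_lines(data: str) -> list[str]:
    """Split decoded eNON-txt text on LF (0x0A) only, so a raw CR (0x0D) stays in its line."""
    lines = data.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final LF terminates the last line rather than starting a new one
    return lines


raw = '"left\rright\n'  # a string element containing a raw CR
print(raw.splitlines())       # CR is treated as a line break, like BufferedReader
print(split_enon_lines(raw))  # CR stays inside the string
```

When reading a file in Python, passing newline="\n" to open() makes line iteration terminate on LF only. The default universal-newlines mode translates a lone CR into LF.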

Examples:

Best practice: avoid ending string lines with white space, because it may not be easily visible in a text editor. With multi-line strings, put the white space on the next line, following the continuation character.

Character Type

The Character type is not explicitly implemented. Just use a single character string.

Temporal

The temporal element represents various date and time related values.

A temporal value appears on a single line. The temporal prefix, 't', is followed by a temporal map. The temporal map is a string of key-value pairs. Each pair is a field-key character followed by its value.

The subtypes and expected values are as follows:

key  field       data              range                description
i    instant     integer           -2^63 .. 2^63-1      milliseconds from the UNIX epoch: midnight on 1 January 1970
Y    year        integer           -2^31 .. 2^31-1      year, such as 2019
M    month       integer           -128 .. 127          month within a year, nominally 1..12
D    day         integer           -128 .. 127          day within a month, nominally 1..31
h    hour        integer           -128 .. 127          hour within a day, nominally 0..23
m    minute      integer           -128 .. 127          minute within an hour, nominally 0..59
s    second      number            -128 .. 127          second within a minute, nominally 0..59
n    nanosecond  number            -2^31 .. 2^31-1      nanosecond within a second, nominally 0..999,999,999
o    offset      special (number)  ±596523h 14m 7s      time zone offset from UTC (the range is ±2,147,483,647 seconds). See below for offset formatting rules. Known offsets are in the range ±14h.
z    zone id     string                                 zone ID from the IANA TZDB time zone identifiers, such as “America/Chicago”. A processor may be able to derive a UTC offset from the zone, based on knowledge of its DST rules.

 

Rules for temporal fields:

The eNON-txt format does not impose semantic meaning on the fields. It is up to the sending and receiving processors to agree upon the semantic meanings, regardless of whether the values fall within the nominal ranges.
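This section shows no worked layout of the temporal map. The following Python sketch therefore assumes that each value is a signed decimal number written directly after its key, with no separator (for example, a hypothetical tY2019M6D15). It handles only the numeric keys. The o and z fields are rejected because their formatting is not covered here. In keeping with the rule above, it applies no range or semantic checks. The name parse_temporal is hypothetical.

```python
import re
from decimal import Decimal

# One numeric field: a key character immediately followed by a signed decimal value.
_FIELD = re.compile(r"([iYMDhmsn])([+-]?\d+(?:\.\d+)?)")

def parse_temporal(line: str) -> dict:
    """Split a 't' element into {key: value} for the numeric fields only.

    Hypothetical helper: the 'o' (offset) and 'z' (zone) fields are rejected
    because their text layout is not covered by this sketch.
    """
    body = line.strip()
    if not body.startswith("t"):
        raise ValueError("a temporal element starts with 't'")
    body = body[1:]
    fields = {}
    pos = 0
    while pos < len(body):
        match = _FIELD.match(body, pos)
        if match is None:
            raise ValueError(f"unsupported temporal field at {body[pos:]!r}")
        key, text = match.groups()
        if key in fields:
            raise ValueError(f"duplicate field {key!r}")
        # Fractional values (e.g. seconds) become Decimal, whole values int.
        fields[key] = Decimal(text) if "." in text else int(text)
        pos = match.end()
    return fields
```

Because every key is a single non-digit character, the parser can find each field boundary without lookahead. This matches the point made below that the format needs no complex parsing.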

Examples:

Why not support ISO-8601?
ISO-8601 is an “everything for everyone, human-readable” format which is difficult to parse in all of its valid forms. IMHO, it’s not worth taking on that complexity for a data transfer protocol, where human readability and unlimited flexibility are probably less important than processing speed.

The format above unambiguously identifies what fields are present, and requires no complex parsing or analysis of the text. When you account for the punctuation in the ISO format, the eNON-txt format is just slightly more verbose.

BLOB (byte array)

The BLOB type is prefixed with B (capital B, 0x42). Data can be specified in two ways: Base64 or a file path.

The interpretation of the file path will be dependent on the application, host, and/or operating system. The file path option is provided primarily as a convenience for manually manipulating eNON data and may not be applicable for data transmission.

For BLOB lines, white space can appear after the prefix, before the data starts.

Multi-line Data

Base64 data can be continued on multiple lines by using the & or \ continuation prefix, as with strings. For BLOB, both continuation types behave the same way: newline characters are never inserted into the data. Continuation lines are used only to control line length in the eNON-txt file. The reader concatenates all lines into one continuous BLOB.

Unlike strings, white space after the line prefix is ignored for BLOB data. This is true for the first line and continuation lines. Data is considered to start at the first non-white-space character on each line.
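Decoding a multi-line Base64 BLOB means stripping each prefix and the white space around the data, joining the pieces and decoding the result. The following is a minimal Python sketch. It assumes the caller has already collected the element’s lines without terminators, and it also drops trailing white space (an assumption, because the text above only covers the white space after the prefix). The file-path form is not handled. The name read_blob is hypothetical.

```python
import base64

def read_blob(lines: list[str]) -> bytes:
    """Decode one Base64 BLOB element: the 'B' line plus its continuation lines.

    Hypothetical helper; the file-path form of BLOB is not handled.
    """
    first = lines[0].lstrip()
    if not first.startswith("B"):
        raise ValueError("a BLOB element starts with 'B'")
    chunks = [first[1:]]
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped[:1] not in ("&", "\\"):
            raise ValueError(f"not a continuation line: {line!r}")
        chunks.append(stripped[1:])  # both continuation types behave the same
    # White space after each prefix is ignored; this sketch also drops trailing
    # white space (an assumption) before joining the chunks without newlines.
    data = "".join(chunk.strip() for chunk in chunks)
    return base64.b64decode(data, validate=True)
```

With validate=True, any character outside the Base64 alphabet raises binascii.Error, which is a subclass of ValueError.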

Examples:

List

A list uses the prefix '[' (open square bracket). The list is terminated by a closing square bracket ']' appearing as the first character on a new line.

If the list is empty, the terminating ‘]’ may be placed on the same line as the opening ‘[’.

Examples:

Map

A map is introduced by the '{' (open curly brace) prefix.

If the map is empty, the terminating ‘}’ may be placed on the same line as the opening ‘{’.

The map is similar to the list except that elements must appear in key-value pairs. The eNON-txt parser will expect to find an even number of elements. The keys and values have no prefixes to identify them as such.

Blank lines can be placed between key-value pairs for readability.
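Lists and maps nest, so a reader naturally handles them recursively: an opening bracket starts a sub-parse that runs until the matching closing line. The following Python sketch covers a small subset (null/false/true, single-line strings, ‘i’ integers, lists and maps). It makes these assumptions: ‘[’ and ‘{’ stand alone on their lines, the closing bracket is the first non-white-space character, blank lines are skipped everywhere, not only between map pairs, and map keys are hashable scalars. The name parse_document is hypothetical.

```python
def parse_document(lines: list[str]) -> list:
    """Parse a small eNON-txt subset: null/false/true, single-line strings,
    'i' integers, lists and maps. `lines` follow the prolog, without terminators.
    """
    pos = 0

    def next_line() -> str:
        nonlocal pos
        while pos < len(lines) and lines[pos].strip() == "":
            pos += 1  # assumption: blank lines are skipped everywhere
        if pos == len(lines):
            raise ValueError("unexpected end of input inside a list or map")
        line = lines[pos].lstrip()  # indentation is ignored
        pos += 1
        return line

    def parse_items(close: str) -> list:
        items = []
        while True:
            line = next_line()
            if line.rstrip() == close:
                return items
            items.append(parse_element(line))

    def parse_element(line: str):
        token = line.rstrip()
        if token == "[]":
            return []  # empty list on one line
        if token == "{}":
            return {}  # empty map on one line
        if token == "[":
            return parse_items("]")
        if token == "{":
            items = parse_items("}")
            if len(items) % 2:
                raise ValueError("a map needs an even number of elements")
            # Keys and values alternate with no prefixes; this sketch needs hashable keys.
            return dict(zip(items[0::2], items[1::2]))
        if token in ("null", "false", "true"):
            return {"null": None, "false": False, "true": True}[token]
        if line.startswith('"'):
            return line[1:]  # single-line string, trailing white space kept
        if token.startswith("i"):
            return int(token[1:])
        raise ValueError(f"unsupported element: {line!r}")

    elements = []
    while True:
        while pos < len(lines) and lines[pos].strip() == "":
            pos += 1
        if pos == len(lines):
            return elements
        elements.append(parse_element(next_line()))
```

The map is parsed as a list first and then paired up. This reflects the rule that keys and values carry no prefixes of their own, and it makes the even-count check a one-liner.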

Examples:

Metadata Elements

coming soon

Comment

coming soon

Data Type

coming soon

Attributes

coming soon