eNON-txt

The eNON-txt format is a text-based representation of e-NON data.

Overview

A binary format is not useful in all situations:

Some transport protocols such as HTTP cannot accept binary data.
Some data is best when human-readable and editable, for example configuration files.

The eNON-txt format provides an alternative means of representing e-NON data in these situations.

Because an e-NON stream processor can read and write either binary or text formats, the same processor can be used for all streams. While other text-based formats could be used, e.g. XML or JSON, the eNON-txt format supports all of the features of e-NON. This allows text and binary streams to be used interchangeably.

Nomenclature

Format Name

The name eNON-txt will be used in narrative content. The name e-NON-txt is also acceptable.

File Extension

The file extension .enont may be used when storing eNON-txt data in a file system.

Format Definition

Newlines are significant in eNON-txt files. The newline character, ‘\n’, 0x0A, is the terminating symbol for all data elements.

Prolog

An eNON-txt file must start with the string #enon-txt on a line by itself. Any number of blank lines may appear before the prolog.

Data Elements

Following the prolog any number of data elements can appear.

Data Element Rules

An element consists of a prefix token possibly followed by a value.
An element appears on a single line.
- Multi-line data can be represented by using continuation elements on subsequent lines.
Only one element appears on a line.
Leading white space on a line is ignored (any indentation scheme is supported).
Blank lines (i.e. nothing but white space) are ignored.
Except for string and character types:
- white space may appear between the prefix token and the value.
- white space at the end of the line is ignored.
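
The line-level rules above can be sketched as a small scanner. This is only an illustration, not the reference implementation: the function name `scan_elements` is hypothetical, and the sketch does not handle continuation lines (where a blank line is meaningful because it ends the continuation).

```python
def scan_elements(text):
    """Yield the significant lines of an eNON-txt document, prolog removed.

    Leading white space is dropped and blank lines are skipped. Trailing
    white space is left alone because it is significant for strings.
    """
    seen_prolog = False
    for line in text.split("\n"):      # '\n' (0x0A) terminates every element
        stripped = line.lstrip(" \t")  # any indentation scheme is allowed
        if stripped.strip() == "":
            continue                   # blank lines are ignored
        if not seen_prolog:
            # The first non-blank line must be the prolog, alone on its line.
            if stripped.rstrip() != "#enon-txt":
                raise ValueError("document must start with #enon-txt")
            seen_prolog = True
            continue
        yield stripped
```

Each yielded line starts with its prefix token, so a caller can dispatch on the first characters of the line.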

Element Table

element	prefix	prefix (hex)	description
null	‘null’		Null is specified by the string ‘null’.
false	‘false’		The boolean value false is specified by the string ‘false’.
true	‘true’		The boolean value true is specified by the string ‘true’.
∞, positive	’+’	`0x2B`	Positive Infinity is specified by the plus character ‘+’.
∞, negative	’-‘	`0x2D`	Negative Infinity is specified by the minus character ‘-‘.
NaN	‘nan’		The special value not a number is specified by the string ‘nan’.
number	‘n’	`0x6E`	The generic number element has a prefix ‘n’ followed by a base-10 numeric string value.
byte	‘b’	`0x62`	Similar to number, but the value must be a valid signed 8-bit `byte` value.
short	’s’	`0x73`	Similar to number, but the value must be a valid signed 16-bit `short` value.
int	‘i’	`0x69`	Similar to number, but the value must be a valid signed 32-bit `int` value.
long	‘l’	`0x6C`	Similar to number, but the value must be a valid signed 64-bit `long` value.
float	‘f’	`0x66`	Similar to number, but the value must be a valid signed 32-bit `float` value.
double	‘d’	`0x64`	Similar to number, but the value must be a valid signed 64-bit `double` value.
string	"	`0x22`	A string is prefixed by a double-quote followed by a UTF-8 string.
unicode string	U"	`0x55,0x22`	A string containing unicode characters in hex notation, see below.
temporal	‘t’	`0x74`	A temporal value is a string that identifies various time-based values.
BLOB (byte[])	‘B’	`0x42`	Binary data is prefixed by a capital ‘B’ followed by a Base64 binary string. Alternatively, binary data can be referenced from an external file, see below.
hard continuation	‘&’	`0x26`	A new line may be inserted into a string by using a hard continuation, see below.
soft continuation	’\’	`0x5C`	A long string or BLOB can be split onto multiple lines by using a soft continuation, see below.
list	’[’	`0x5B`	A list is started with an opening square brace.
end list	’]’	`0x5D`	A list is terminated by the closing square brace.
map	’{‘	`0x7B`	A map is started with an opening curly brace.
end map	’}’	`0x7D`	A map is terminated by the closing curly brace.

Null, False, True

These appear as just the strings null, false, true respectively. Nothing else appears on the line.

Negative/Positive Infinity

These appear as just the symbols - and + (minus 0x2D and plus 0x2B) respectively for negative and positive infinity. Nothing else appears on the line.

NaN

This appears as just the string nan. Nothing else appears on the line.

Byte, Short, Int, Long, Float, Double, Number

Numeric values appear as base-10 number strings without quotes. Numbers can also be specified in exponential notation; see the examples below.

If the generic number prefix, ‘n’, is used, then the number can be of any size and precision.

Alternatively, if a specific numeric type is given, ‘b’, ‘s’, ‘i’, ‘l’, ‘f’, or ‘d’, then the number provided must be within the range and precision of that type.

White space can appear between the prefix and the number value.

The e-NON nano-int type can’t be specified directly. The eNON-txt processor may produce a nano-int internally if the prefix is ‘b’, and the value is within the nano-int range, (-63..64).

Examples:

n 12345 The number 12345
n 1.2345E240 The number 1.2345 * 10^240
n -1.2345E-45 The number -1.2345 * 10^-45
b -123 The byte value -123
b 15 The nano-int value 15
s 12345 The short value 12345
i 1234567 The int value 1234567
l -12345 The long value -12345
f 12.345 The float value 12.345
d 123.45 The double value 123.45

These examples show a space after the prefix, but it is also legal to put the value immediately after the prefix: e.g. d123.45
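
As a rough sketch of the range rules above, the following parser accepts one numeric element and rejects values that do not fit the requested type. The function name `parse_number` and the choice of `Decimal` for the generic ‘n’ prefix are assumptions made for illustration; the sketch also does not produce nano-int values and only checks range (not precision) for ‘f’ and ‘d’.

```python
import math
import struct
from decimal import Decimal

# Bit widths for the fixed-size integer prefixes.
INT_BITS = {"b": 8, "s": 16, "i": 32, "l": 64}


def parse_number(line):
    """Parse one numeric element such as 'n 12345' or 'd123.45'."""
    # White space between the prefix and the value is optional.
    prefix, text = line[0], line[1:].strip()
    if prefix == "n":
        return Decimal(text)           # any size and precision
    if prefix in INT_BITS:
        value = int(text)
        limit = 1 << (INT_BITS[prefix] - 1)
        if not -limit <= value < limit:
            raise ValueError(f"{text} does not fit prefix {prefix!r}")
        return value
    if prefix in ("f", "d"):
        value = float(text)
        if math.isinf(value) or math.isnan(value):
            raise ValueError(f"{text} is out of range for {prefix!r}")
        if prefix == "f":
            try:
                struct.pack("<f", value)   # fails if too large for 32 bits
            except OverflowError:
                raise ValueError(f"{text} is out of range for 'f'") from None
        return value
    raise ValueError(f"not a numeric prefix: {prefix!r}")
```

For instance, `parse_number("b 200")` raises `ValueError` because 200 is outside the signed 8-bit range.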

Examples:

'C The Latin capital C
U'0e01 The Thai character ก
'ก The Thai character ก (if the file is UTF encoded and the text editor is capable of inserting Unicode characters directly)

String

The string is introduced by the double-quote character: " (0x22). Everything immediately after the prefix is considered to be part of the string, up to, but not including the newline.

Multi-line strings

A string may be continued on the next line by using a continuation prefix. There are 2 continuation prefixes for strings:

hard continuation '&' (ampersand, 0x26) The hard continuation inserts a newline into the string. This prefix is used to preserve newlines in the original text.
soft continuation '\' (backslash, 0x5C) A soft continuation does not insert a newline into the text. The soft continuation is used to wrap long text lines without altering the content.

Any number of continuation lines may appear. Continuation lines have to be consecutive: continuation ends at the first line that is not a continuation. A blank line terminates continuation.

As with other prefixes, all white space preceding the ", & and \ is ignored. This allows text to be indented along with other elements, even when continuations are included.
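
A minimal sketch of how a reader might join a string with its continuation lines, assuming the caller hands over the raw lines starting at the string element. The name `read_string` is hypothetical and the sketch ignores the U" prefix.

```python
def read_string(lines):
    """Join a string element and its continuation lines into one value.

    `lines` holds consecutive raw lines; the first must start the string.
    Reading stops at the first line that is not a continuation.
    """
    first = lines[0].lstrip(" \t")
    if not first.startswith('"'):
        raise ValueError("expected a string element")
    parts = [first[1:]]                    # everything after the prefix
    for raw in lines[1:]:
        line = raw.lstrip(" \t")           # indentation before the prefix is ignored
        if line.startswith("&"):
            parts.append("\n" + line[1:])  # hard continuation: insert a newline
        elif line.startswith("\\"):
            parts.append(line[1:])         # soft continuation: plain concatenation
        else:
            break                          # blank or other line ends the string
    return "".join(parts)
```

Note that only white space before the prefix is removed; white space after `&` or `\` is kept as part of the string.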

Unicode String

If the Unicode prefix, U", is used, then the string data will be pre-processed for Unicode characters. Strings matching the pattern \u(X)+; will be replaced with a unicode character. The (X)+ represents a case-insensitive hexadecimal value. There must be at least 1 hex digit after the \u prefix, immediately followed by a semicolon. Leading zeros in the hex value are not necessary.

This feature is provided for hand-written eNON-txt files, using editors that can’t insert Unicode values directly. The writer code doesn’t scan strings for the \u(X)+; pattern, and it won’t use the U" prefix in the output.
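
The \u(X)+; substitution can be sketched with a regular expression. The helper name `expand_unicode` is an assumption for illustration; values above U+10FFFF would make `chr` raise `ValueError`, which this sketch does not catch.

```python
import re

# \u, one or more hex digits (either case), then a semicolon.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]+);")


def expand_unicode(text):
    """Replace every \\u(X)+; escape in `text` with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Text that only resembles an escape, such as `\u;` with no hex digits, is left unchanged.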

Carriage Return

Some line readers (e.g. BufferedReader in Java) don’t discriminate between CR (\r) and newline (\n) when terminating lines. The “official” line terminator for eNON-txt is the newline character, 0x0A. This may cause a CR character to be replaced by a newline during processing.

If it is important to preserve a CR character (0x0D), then it can be escaped as \uD; in a unicode string.

To maintain line formatting, the escaped CR can be immediately followed by a soft-continuation. This will allow a line break in the eNON-txt stream, but will not insert an extra newline upon reading.

Examples:

"This is a single-line string value
U"This line contains a unicode char: \uE01;
" This is a multi-line string value
&with text on a 2nd line
& Indentation is preserved
U"This is a multi-line string that uses carriage return \uD;
\(CR) line terminators with soft-continuations. \uD;
\ Notice that this is identified as a Unicode string to allow escaped chars.
"This is a multi-line string value
&with text on a 2nd line, and ending with a newline.
&
"This is a multi-line string value
\ using "soft" breaks and "hard" breaks together.
&
&Also note that 'quotes' can appear without escaping
\ because the text always runs to the end of the line.

Best practice: try to avoid ending string lines with white space because it may not be easily visible in a text editor. With multi-line strings, put the white space on the next line following the continuation character.

Character Type

The Character type is not explicitly implemented. Just use a single character string.

Temporal

The temporal element represents various date and time related values.

A temporal value appears on a single line. The temporal prefix, 't', is followed by a temporal map. The temporal map is a string of key-value pairs. Each pair is a field-key character followed by its value.

The subtypes and expected values are as follows:

key	field	data	range	description
i	instant	integer	-2⁶³ .. 2⁶³-1	milliseconds from the UNIX epoch: midnight on 1 January 1970
Y	year	integer	-2³¹ .. 2³¹-1	year, such as 2019
M	month	integer	-128 .. 127	month within a year, nominally 1..12
D	day	integer	-127 .. 128	day within a month, nominally 1..31
h	hour	integer	-127 .. 128	hour within a day, nominally 0..23
m	minute	integer	-127 .. 128	minute within an hour, nominally 0..59
s	second	number	-127 .. 128	second within a minute, nominally 0..59
n	nanosecond	number	-2³¹ .. 2³¹-1	nanosecond within a second, nominally 0..999,999,999
o	offset	special (number)	±596523h 14m 7s	time zone offset from UTC. See below for offset formatting rules. Known offsets are in the range ±14h.
z	zone id	string	zone ID from IANA TZDB	time zone identifier, such as “America/Chicago”. The zone may be able to calculate a UTC offset based on knowledge of DST rules.

Rules for temporal fields:

Key pairs may be separated by white space. No space separates the field-key from its value.
Numeric values are made negative by putting a ‘-‘ sign after the field-key.
Except for the second field, numbers may contain only decimal digits without punctuation (other than the negation sign).
The second field may contain a decimal point followed by up to 9 places of subsecond accuracy.
Trailing zeros after the decimal point in the second field may be truncated. For example, 13.850 may be transmitted and received as 13.85.
The instant (i) has priority over all other time-date fields. When instant is provided, other time-date fields (YMDhms) are ignored. The instant can be used with offset or zone ID.
The offset may be ignored if a zone ID is provided and the receiving processor knows how to compute the offset from the zone ID.
When a zone ID is provided it must be the last field. The value is the remaining non white space on the line following the ‘z’ key.
Offset formatting options:
- (-)h or (-)hh
  One or two hour digits, optionally negative.
- (-)h*mmss
  Zero or more hour digits, followed by two digits each for minutes & seconds, optionally negative.
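
The two offset forms can be told apart by digit count, as in this sketch. It assumes the offset is ultimately held as a number of seconds (which is what the ±596523h 14m 7s range suggests, being 2³¹-1 seconds) and treats a three-digit value as an error because it matches neither form; the name `parse_offset` is hypothetical.

```python
def parse_offset(text):
    """Return a UTC offset in seconds from an eNON-txt offset value."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"bad offset: {text!r}")
    if len(digits) <= 2:
        # (-)h or (-)hh: hours only
        seconds = int(digits) * 3600
    elif len(digits) >= 4:
        # (-)h*mmss: the last four digits are minutes and seconds
        hours = int(digits[:-4] or "0")
        minutes = int(digits[-4:-2])
        secs = int(digits[-2:])
        seconds = hours * 3600 + minutes * 60 + secs
    else:
        raise ValueError(f"three-digit offset matches neither form: {text!r}")
    return -seconds if negative else seconds
```

Under these assumptions, `o7` is 25200 seconds and `o53000` is five and a half hours (19800 seconds).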

The eNON-txt format does not impose semantic meaning on the fields. It is up to the sending and receiving processors to agree upon the semantic meanings, regardless of whether the values fall in the nominal ranges.

Examples:

instant representing the datetime 2019/03/26 20:44:58.841 in Bangkok:
ti1553607898841zAsia/Bangkok
the same date-time as the previous example, using individual fields:
tY2019M3D26h20m44s58.841zAsia/Bangkok
the same example using spaces to delimit the fields for readability:
t Y2019 M3 D26 h20 m44 s58.841 zAsia/Bangkok
the same example using an offset rather than the zone ID (Bangkok is UTC+7h):
t Y2019 M3 D26 h20 m44 s58.841 o7
example with a negative offset (Chicago is UTC-6h during standard time):
t Y2019 M3 D26 h20 m44 s58.841 o-6
a month-day (recurring date), such as a birthday, March 24:
t M3 D24
a time of day in a specific time zone, such as a meeting time, 2:15pm:
t h14 m15 zAmerica/New_York
an instant with nanosecond accuracy 2019/03/26 20:44:58.841004025:
ti1553607898841n4025
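
To illustrate how little parsing the temporal map needs, here is a sketch that splits a temporal line into its fields. The name `parse_temporal` and the returned dictionary shape are assumptions; the offset is kept as raw text, range checks are omitted, and no semantic interpretation is attempted.

```python
import re
from decimal import Decimal

# A field key followed directly by an optionally negative number.
_NUMERIC = re.compile(r"([iYMDhmsno])(-?[0-9]+(?:\.[0-9]{1,9})?)")


def parse_temporal(line):
    """Split a temporal element into a dict of field key -> value."""
    if not line.startswith("t"):
        raise ValueError("not a temporal element")
    rest = line[1:].strip()
    fields = {}
    while rest:
        if rest[0] == "z":
            fields["z"] = rest[1:]         # zone ID is last: take the remainder
            break
        match = _NUMERIC.match(rest)
        if match is None:
            raise ValueError(f"bad temporal field at {rest!r}")
        key, text = match.groups()
        if "." in text and key != "s":
            raise ValueError("only the second field may have a decimal point")
        if key == "s":
            fields[key] = Decimal(text)    # keeps sub-second digits exactly
        elif key == "o":
            fields[key] = text             # offset has its own formatting rules
        else:
            fields[key] = int(text)
        rest = rest[match.end():].lstrip() # white space between pairs is optional
    return fields
```

Both the compact form `tY2019M3D26h20m44s58.841zAsia/Bangkok` and the spaced form yield the same fields.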

Why not support ISO-8601?
ISO-8601 is an “everything for everyone, human-readable” format which is difficult to parse in all of its valid forms. IMHO, it’s not worth taking on that complexity for a data transfer protocol where human-readability and unlimited flexibility are probably less important than processing speed.

The format above unambiguously identifies what fields are present, and requires no complex parsing or analysis of the text. When you account for the punctuation in the ISO format, the eNON-txt format is just slightly more verbose.

BLOB (byte array)

The BLOB type is prefixed with B (capital B, 0x42). Data can be specified in two ways: Base64 or a file path.

Base64
If the B prefix is not followed by a double-quote, then the data that follows is interpreted as Base64.

Data should be encoded using the Basic mode as specified in Table 1 of RFC 4648 and RFC 2045 (not MIME or URL). There are no limits to line length, and no line breaks are included in the data.

File path
If the B prefix is followed by a double-quote, then the remainder of the line is interpreted as a file path. The contents of that file will be used as the BLOB data. The file path must appear on a single line: continuation is not supported for the file path. Note that the file contents should be raw binary, not Base64.

The interpretation of the file path will be dependent on the application, host, and/or operating system. The file path option is provided primarily as a convenience for manually manipulating eNON data and may not be applicable for data transmission.

For BLOB lines, white space can appear after the prefix, before the data starts.

Multi-line Data

Base64 data can be continued on multiple lines by using the & or \ continuation prefix, similar to strings. For BLOB, both continuation types behave the same way: newline chars are never inserted into the data. Continuation lines are used only to control line length in the eNON-txt file. All lines are concatenated by the reader into a continuous BLOB.

Unlike strings, white space after the line prefix is ignored for BLOB data. This is true for the first line and continuation lines. Data is considered to start at the first non-white-space character on each line.

Examples:

import the content from the file boat.jpg, relative to the location of the eNON-txt file:
B "boat.jpg
import data from absolute file path /home/me/fish.png
B "/home/me/fish.png
decode the Base64 value:
B TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG
decode a multi-line Base64 value:
B dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2Y
\ dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yg
\ ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hb=
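
A sketch of the Base64 path of a BLOB reader, assuming the caller passes the raw lines starting at the BLOB element. The name `read_blob` is hypothetical, and the file-path form is deliberately left out because its interpretation is application-specific.

```python
import base64


def read_blob(lines):
    """Decode a Base64 BLOB element together with its continuation lines."""
    first = lines[0].lstrip(" \t")
    if not first.startswith("B"):
        raise ValueError("expected a BLOB element")
    body = first[1:].strip()               # white space around the data is ignored
    if body.startswith('"'):
        raise ValueError("file-path BLOBs are not handled in this sketch")
    chunks = [body]
    for raw in lines[1:]:
        line = raw.lstrip(" \t")
        if line[:1] in ("&", "\\"):        # both continuation types behave the same
            chunks.append(line[1:].strip())
        else:
            break                          # first non-continuation line ends the BLOB
    # validate=True rejects characters outside the Basic alphabet.
    return base64.b64decode("".join(chunks), validate=True)
```

Because the chunks are concatenated before decoding, the line breaks may fall anywhere in the Base64 text.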

List

A list uses the prefix '[' (open square bracket). The list is terminated by a closing square bracket ']' appearing as the first character on a new line.

If the list is empty, the terminating ‘]’ may be placed on the same line as the opening ‘[’.

Examples:

An empty list:
[ ]
A list with a string, a number, and a BLOB. Note that the elements in the list do not need to be the same type:
   [
      "A string element
      n 12345
      B "image.jpg
   ]

Map

The map is introduced by the '{' (open curly brace) prefix. It is terminated by a closing curly brace '}'.

If the map is empty, the terminating ‘}’ may be placed on the same line as the opening ‘{‘.

The map is similar to the list except that elements must appear in key-value pairs. The eNON-txt parser will expect to find an even number of elements. The keys and values have no prefixes to identify them as such.

Blank lines can be placed between key-value pairs for readability.

Examples:

An empty map:
{ }
This produces a map with 3 keys: “key1”, “key2” and the character ‘k’. The respective values are the number 12345, null, and a list of floats:
   {
      "key1
      n 12345

      "key2
      null

      'k
      [
         f12.45
         f19.7
         f-23.874
      ]
   }
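
Since keys and values carry no markers of their own, a reader can only pair map elements by position. A minimal sketch, assuming the elements between '{' and '}' have already been parsed into Python values (the name `pair_map_elements` is hypothetical):

```python
def pair_map_elements(elements):
    """Pair an even-length sequence of parsed elements into a dict."""
    if len(elements) % 2 != 0:
        raise ValueError("a map needs an even number of elements")
    # Elements at even positions are keys, the ones after them are values.
    return dict(zip(elements[0::2], elements[1::2]))
```

A missing value therefore shows up as an odd element count rather than as a misplaced key.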

Metadata Elements

coming soon

Comment

coming soon

Data Type

coming soon

Attributes

coming soon