The eNON-txt format is a text-based representation of e-NON data.
A binary format is not useful in all situations.
The eNON-txt format provides an alternative means of representing e-NON data in these situations.
Because an e-NON stream processor can read and write either binary or text formats, the same processor can be used for all streams. While other text-based formats could be used, e.g. XML or JSON, the eNON-txt format supports all of the features of eNON. This allows text and binary streams to be used interchangeably.
The name eNON-txt will be used in narrative content. The name e-NON-txt is also acceptable.
The file extension .enont may be used when storing eNON-txt data in a file system.
Newlines are significant in eNON-txt files. The newline character, '\n' (0x0A), is the terminating symbol for all data elements.
An eNON-txt file must start with the prolog: the string #enon-txt on a line by itself. Any number of blank lines may appear before the prolog.
Following the prolog, any number of data elements can appear.
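The prolog rule can be sketched in a few lines of Python. This is only an illustration, not part of any eNON library: the function name `split_prolog` is hypothetical, and the sketch assumes that the prolog line must match `#enon-txt` exactly, with no surrounding white space.

```python
PROLOG = "#enon-txt"

def split_prolog(text: str) -> list[str]:
    """Return the lines that follow the prolog.

    Raises ValueError if the first non-blank line is not the prolog.
    Assumption: the prolog line must match exactly, with no extra white space.
    """
    lines = text.split("\n")
    for index, line in enumerate(lines):
        if line.strip() == "":
            continue  # blank lines may appear before the prolog
        if line == PROLOG:
            return lines[index + 1:]
        raise ValueError(f"expected {PROLOG!r}, found {line!r}")
    raise ValueError("no prolog found")
```

For example, `split_prolog("\n#enon-txt\nnull")` returns the single data line `["null"]`.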
element | prefix | prefix (hex) | description |
---|---|---|---|
null | 'null' | | Null is specified by the string 'null'. |
false | 'false' | | The boolean value false is specified by the string 'false'. |
true | 'true' | | The boolean value true is specified by the string 'true'. |
∞, positive | '+' | 0x2B | Positive Infinity is specified by the plus character '+'. |
∞, negative | '-' | 0x2D | Negative Infinity is specified by the minus character '-'. |
NaN | 'nan' | | The special value not a number is specified by the string 'nan'. |
number | 'n' | 0x6E | The generic number element has a prefix 'n' followed by a base-10 numeric string value. |
byte | 'b' | 0x62 | Similar to number, but the value must be a valid signed 8-bit byte value. |
short | 's' | 0x73 | Similar to number, but the value must be a valid signed 16-bit short value. |
int | 'i' | 0x69 | Similar to number, but the value must be a valid signed 32-bit int value. |
long | 'l' | 0x6C | Similar to number, but the value must be a valid signed 64-bit long value. |
float | 'f' | 0x66 | Similar to number, but the value must be a valid 32-bit float value. |
double | 'd' | 0x64 | Similar to number, but the value must be a valid 64-bit double value. |
string | " | 0x22 | A string is prefixed by a double-quote followed by a UTF-8 string. |
unicode string | U" | 0x55, 0x22 | A string containing unicode characters in hex notation, see below. |
temporal | 't' | 0x74 | A temporal value is a string that identifies various time-based values. |
BLOB (byte[]) | 'B' | 0x42 | Binary data is prefixed by a capital 'B' followed by a Base64 binary string. Alternatively, binary data can be referenced from an external file, see below. |
hard continuation | '&' | 0x26 | A new line may be inserted into a string by using a hard continuation, see below. |
soft continuation | '\' | 0x5C | A long string or BLOB can be split onto multiple lines by using a soft continuation, see below. |
list | '[' | 0x5B | A list is started with an opening square brace. |
end list | ']' | 0x5D | A list is terminated by the closing square brace. |
map | '{' | 0x7B | A map is started with an opening curly brace. |
end map | '}' | 0x7D | A map is terminated by the closing curly brace. |
The null, false, and true elements appear as just the strings null, false, and true, respectively. Nothing else appears on the line.
Negative and positive infinity appear as just the symbols - and + (minus 0x2D and plus 0x2B), respectively. Nothing else appears on the line.
NaN appears as just the string nan. Nothing else appears on the line.
Numeric values appear as base-10 number strings without quotes. Numbers can be specified in exponential notation, see the examples below.
If the generic number prefix, ‘n’, is used, then the number can be of any size and precision.
Alternatively, if a specific numeric scale is specified, ‘b’, ‘s’, ‘i’, ‘l’, ‘f’, or ‘d’, then the number provided must be within the range and precision of the specified type.
White space can appear between the prefix and the number value.
The e-NON nano-int type can’t be specified directly. The eNON-txt processor may produce a nano-int internally if the prefix is ‘b’, and the value is within the nano-int range, (-63..64).
Examples:
example | value |
---|---|
n 12345 | The number 12345 |
n 1.2345E240 | The number 1.2345 * 10^240 |
n -1.2345E-45 | The number -1.2345 * 10^-45 |
b -123 | The byte value -123 |
b 15 | The nano-int value 15 |
s 12345 | The short value 12345 |
i 1234567 | The int value 1234567 |
l -12345 | The long value -12345 |
f 12.345 | The float value 12.345 |
d 123.45 | The double value 123.45 |
The string is introduced by the double-quote character: " (0x22). Everything immediately after the prefix is considered to be part of the string, up to, but not including the newline.
A string may be continued on the next line by using a continuation prefix. There are 2 continuation prefixes for strings:
- '&' (ampersand, 0x26): The hard continuation inserts a newline into the string. This prefix is used to preserve newlines in the original text.
- '\' (backslash, 0x5C): A soft continuation does not insert a newline into the text. The soft continuation is used to wrap long text lines without altering the content.

Any number of continuation lines may appear. Continuation lines have to be consecutive: continuation ends at the first line that is not a continuation. A blank line terminates continuation.
As with other prefixes, all white space preceding the ", &, and \ is ignored. This allows text to be indented along with other elements, even when continuations are included.
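The continuation rules can be illustrated with a short reader loop. This is a sketch only: `read_string` is a hypothetical helper, it handles the plain `"` prefix but not `U"`, and it assumes the input has already been split into lines on the newline character.

```python
def read_string(lines: list[str], start: int) -> tuple[str, int]:
    """Read the string element that begins at lines[start].

    Returns the assembled string and the index of the first line
    that is not part of the element.
    """
    first = lines[start].lstrip()          # white space before the prefix is ignored
    if not first.startswith('"'):
        raise ValueError("not a string element")
    parts = [first[1:]]                    # everything after the prefix is content
    index = start + 1
    while index < len(lines):
        line = lines[index].lstrip()
        if line.startswith("&"):           # hard continuation: insert a newline
            parts.append("\n" + line[1:])
        elif line.startswith("\\"):        # soft continuation: no newline
            parts.append(line[1:])
        else:                              # blank or other line ends continuation
            break
        index += 1
    return "".join(parts), index
```

Note that only the white space before the prefix is dropped; white space after `&` or `\` is kept as part of the string.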
If the Unicode prefix, U", is used, then the string data will be pre-processed for Unicode characters. Strings matching the pattern \u(X)+; will be replaced with a unicode character. The (X)+ represents a case-insensitive hexadecimal value. There must be at least 1 hex digit after the \u prefix, immediately followed by a semicolon. Leading zeros in the hex value are not necessary.
This feature is provided for hand-written eNON-txt files, using editors that can’t insert Unicode values directly. The writer code doesn’t scan strings for the \u(X)+; pattern, and it won’t use the U" prefix in the output.
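The \u(X)+; substitution maps naturally onto a regular expression. The following sketch shows one possible reading of the rule; `expand_unicode` is a hypothetical name, and the behaviour for hex values beyond the Unicode range (here, Python's `chr` raises `ValueError`) is an assumption rather than something the format specifies.

```python
import re

# \u, then one or more hex digits (either case), then a semicolon.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]+);")

def expand_unicode(text: str) -> str:
    """Replace each \\u(X)+; escape with the character it names."""
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)
```

For instance, `expand_unicode(r"line end \uD;")` produces a string ending in a carriage return, and text without a complete escape (such as `\u;`) is left unchanged.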
Some line readers (e.g. BufferedReader in Java) don’t discriminate between CR (\r) and newline (\n) when terminating lines. The “official” line terminator for eNON-txt is the newline character, 0x0A. This may cause a CR character to be replaced by a newline during processing.
If it is important to preserve a CR character (0x0D), then it can be escaped as \uD; in a unicode string.
To maintain line formatting, the escaped CR can be immediately followed by a soft continuation. This will allow a line break in the eNON-txt stream, but will not insert an extra newline upon reading.
Examples:
"This is a single-line string value
U"This line contains a unicode char: \uE01;
" This is a multi-line string value
&with text on a 2nd line
& Indentation is preserved
U"This is a multi-line string that uses carriage return \uD;
\(CR) line terminators with soft-continuations. \uD;
\ Notice that this is identified as a Unicode string to allow escaped chars.
"This is a multi-line string value
&with text on a 2nd line, and ending with a newline.
&
"This is a multi-line string value
\ using "soft" breaks and "hard" breaks together.
&
&Also note that 'quotes' can appear without escaping
\ because the text always runs to the end of the line.
Best practice: try to avoid ending string lines with white space because it may not be easily visible in a text editor. With multi-line strings, put the white space on the next line following the continuation character.
The Character type is not explicitly implemented. Just use a single character string.
The temporal element represents various date and time related values.
A temporal value appears on a single line. The temporal prefix, 't', is followed by a temporal map. The temporal map is a string of key-value pairs. Each pair is a field-key character followed by its value.
The subtypes and expected values are as follows:
key | field | data | range | description |
---|---|---|---|---|
i | instant | integer | -2^63 .. 2^63-1 | milliseconds from the UNIX epoch: midnight on 1 January 1970 |
Y | year | integer | -2^31 .. 2^31-1 | year, such as 2019 |
M | month | integer | -128 .. 127 | month within a year, nominally 1..12 |
D | day | integer | -127 .. 128 | day within a month, nominally 1..31 |
h | hour | integer | -127 .. 128 | hour within a day, nominally 0..23 |
m | minute | integer | -127 .. 128 | minute within an hour, nominally 0..59 |
s | second | number | -127 .. 128 | second within a minute, nominally 0..59 |
n | nanosecond | number | -2^31 .. 2^31-1 | nanosecond within a second, nominally 0..999,999,999 |
o | offset | special (number) | ±596523h 14m 7s | time zone offset from UTC. See below for offset formatting rules. Known offsets are in the range ±14h. |
z | zone id | string | zone ID from IANA TZDB | time zone identifier, such as “America/Chicago”. The zone may be able to calculate a UTC offset based on knowledge of DST rules. |
Rules for temporal fields:
- 13.850 may be transmitted and received as 13.85.
- The offset is written as (-)h or (-)hh, or as (-)h*mmss.
The eNON-txt format does not impose semantic meaning on the fields. It is up to the sending and receiving processors to agree upon the semantic meanings, regardless of whether the values fall in the nominal ranges.
Examples:
instant representing the datetime 2019/03/26 20:44:58.841 in Bangkok:
ti1553607898841zAsia/Bangkok
the same date-time as the previous example, using individual fields:
tY2019M3D26h20m44s58.841zAsia/Bangkok
the same example using spaces to delimit the fields for readability:
t Y2019 M3 D26 h20 m44 s58.841 zAsia/Bangkok
the same example using an offset rather than the zone ID (Bangkok is UTC+7h):
t Y2019 M3 D26 h20 m44 s58.841 o7
example with a negative offset (Chicago is UTC-6h during standard time):
t Y2019 M3 D26 h20 m44 s58.841 o-6
a month-day (recurring date), such as a birthday, March 24:
t M3 D24
a time of day in a specific time zone, such as a meeting time, 2:15pm:
t h14 m15 zAmerica/New_York
an instant with nanosecond accuracy 2019/03/26 20:44:58.841004025:
ti1553607898841n4025
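The temporal examples above can be read with a small scanner. This is a sketch under several assumptions that the text does not confirm: the zone ID ('z') is taken to run to the end of the line, the offset ('o') is kept as raw text rather than interpreted, white space is allowed only between fields, and the name `parse_temporal` is hypothetical.

```python
import re
from decimal import Decimal

# A field key immediately followed by a numeric value, e.g. "Y2019" or "s58.841".
_FIELD = re.compile(r"\s*([iYMDhmsno])(-?\d+(?:\.\d+)?)")

def parse_temporal(line: str) -> dict[str, object]:
    """Parse one temporal line into a dict keyed by field character."""
    text = line.strip()
    if not text.startswith("t"):
        raise ValueError("not a temporal element")
    fields: dict[str, object] = {}
    pos = 1
    while pos < len(text):
        match = _FIELD.match(text, pos)
        if match:
            key, raw = match.groups()
            if key == "o":
                fields[key] = raw             # offset kept as raw text (assumption)
            elif key in "sn":
                fields[key] = Decimal(raw)    # 'number' fields may be fractional
            else:
                fields[key] = int(raw)        # 'integer' fields
            pos = match.end()
            continue
        rest = text[pos:].lstrip()
        if rest.startswith("z"):              # zone ID runs to end of line (assumption)
            fields["z"] = rest[1:].strip()
            break
        raise ValueError(f"unexpected temporal field: {rest!r}")
    return fields
```

Because each field is tagged by its key character, the scanner never has to guess which fields are present, which is the property the format is designed around.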
Why not support ISO-8601?
ISO-8601 is an “everything for everyone, human-readable” format which is difficult to parse in all of its valid forms. IMHO, it’s not worth taking on that complexity for a data transfer protocol where human-readability and unlimited flexibility are probably less important than processing speed. The format above unambiguously identifies what fields are present, and requires no complex parsing or analysis of the text. When you account for the punctuation in the ISO format, the eNON-txt format is just slightly more verbose.
The BLOB type is prefixed with B (capital B, 0x42). Data can be specified in two ways: Base64 or a file path.
Base64
If the B prefix is not followed by a double-quote, then the data that follows is interpreted as Base64.
Data should be encoded using the Basic mode as specified in Table 1 of RFC 4648 and RFC 2045 (not MIME or URL). There are no limits to line length, and no line breaks are included in the data.
File path
If the B prefix is followed by a double-quote, then the remainder of the line is interpreted as a file path. The contents of that file will be used as the BLOB data. The file path must appear on a single line: continuation is not supported for the file path. Note that the file contents should be raw binary, not Base64.
The interpretation of the file path will be dependent on the application, host, and/or operating system. The file path option is provided primarily as a convenience for manually manipulating eNON data and may not be applicable for data transmission.
For BLOB lines, white space can appear after the prefix, before the data starts.
Base64 data can be continued on multiple lines by using the & or \ continuation prefix, similar to strings. For BLOB, both continuation types behave the same way: newline chars are never inserted into the data. Continuation lines are used only to control line length in the eNON-txt file. All lines are concatenated by the reader into a continuous BLOB.
Unlike strings, white space after the line prefix is ignored for BLOB data. This is true for the first line and continuation lines. Data is considered to start at the first non-white-space character on each line.
Examples:
import the content from the file boat.jpg, relative to the location of the eNON-txt file:
B "boat.jpg
import data from the absolute file path /home/me/fish.png:
B "/home/me/fish.png
decode the Base64 value:
B TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG
decode a multi-line Base64 value:
B dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2Y
\ dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yg
\ ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hb=
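Reading a Base64 BLOB with continuation lines might look like the sketch below. It uses Python's standard `base64` module; `read_blob` is a hypothetical helper, and the file-path form is deliberately left out because its interpretation depends on the application.

```python
import base64

def read_blob(lines: list[str], start: int) -> tuple[bytes, int]:
    """Read the Base64 BLOB element that begins at lines[start].

    Returns the decoded bytes and the index of the first line
    that is not part of the element.
    """
    first = lines[start].lstrip()
    if not first.startswith("B"):
        raise ValueError("not a BLOB element")
    data = first[1:].strip()               # white space after the prefix is ignored
    if data.startswith('"'):
        raise ValueError("file-path BLOBs are not handled in this sketch")
    chunks = [data]
    index = start + 1
    while index < len(lines):
        line = lines[index].lstrip()
        if line[:1] in ("&", "\\"):        # both continuation types behave the same
            chunks.append(line[1:].strip())
            index += 1
        else:
            break
    # validate=True rejects characters outside the Base64 alphabet
    return base64.b64decode("".join(chunks), validate=True), index
```

Since newlines are never inserted for BLOB data, the chunks are simply concatenated before decoding.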
A list uses the prefix '[' (open square bracket). The list is terminated by a closing square bracket ']' appearing as the first character on a new line.
If the list is empty, the terminating ‘]’ may be placed on the same line as the opening ‘[’.
The map is introduced by the '{' curly brace prefix.
If the map is empty, the terminating '}' may be placed on the same line as the opening '{'.
The map is similar to the list except that elements must appear in key-value pairs. The eNON-txt parser will expect to find an even number of elements. The keys and values have no prefixes to identify them as such.
Blank lines can be placed between key-value pairs for readability.
An empty map:
{ }
This produces a map with 3 keys: “key1”, “key2” and the character ‘k’. The respective values are the number 12345, null, and a list of floats:
{
"key1
12345
"key2
null
‘k
[
f12.45
f19.7
f-23.874
]
}
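Because keys and values carry no distinguishing prefixes, a reader can only pair up the elements by position. The sketch below shows that pairing step in isolation; `pair_elements` is a hypothetical name, and it assumes the elements between '{' and '}' have already been parsed into Python values and that every key is hashable.

```python
def pair_elements(elements: list) -> dict:
    """Pair a flat sequence of parsed elements into a map: key, value, key, value, ..."""
    if len(elements) % 2 != 0:
        raise ValueError("a map requires an even number of elements")
    # Even positions are keys, odd positions are the corresponding values.
    return dict(zip(elements[0::2], elements[1::2]))
```

An odd element count is the only structural error this step can detect, which matches the parser's expectation of an even number of elements.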
coming soon