e-NON is electronic Network Object Notation: an object serialization format that supports binary data natively.
e-NON contains the same basic data types as JSON (map, array, primitives, etc.), but rendered in a more compact form that allows raw binary data. Binary encoded types are written in network byte order.
Extensions to the e-NON format have been defined to suit common (and uncommon) data serialization needs.
The e-NON version described in this document is 0 (zero).
The e-NON format is intended as a more efficient and flexible data transfer format compared to JSON and other text-based formats such as XML.
JSON has been a huge success in the software industry, but it has limitations that make it difficult to use for some applications. In particular, it is inefficient with binary data.
Protocols more “binary-friendly” than HTTP are gaining acceptance. WebSocket and RSocket are two examples. e-NON was conceived as a means of encoding object data in the RSocket payload.
e-NON is not a human-readable format. Its primary purpose is efficient data transmission between systems using software to read and write. Tools can be created to view and/or edit e-NON.
e-NON contains binary data. This makes it unsuitable for Web protocols that require text payloads, such as HTTP. e-NON could be Base64 encoded and stuffed into an HTTP message. However, e-NON is intended for other transports that support binary traffic.
e-NON uses declarative notation (prefixes) to identify elements. Each element starts with an ASCII code letter to indicate the type. After the code letter, a size indicator tells how much data follows. Primitive types have fixed sizes, so the size indicator is not necessary. There are no end-delimiters.
e-NON supports multiple top-level elements (with the S feature set). This is in contrast to JSON and XML, which allow only a single root element.
The format may be written as e-NON or eNON. The form e-NON is used in official documentation because it coincides with the official domain name e-non.org.
The official file extension for e-NON data is .enon.
e-NON-0 is the minimum feature set that all implementations have to support.
Basic data types are available:
null, boolean, nano-int, int, double, string, number, byte[], list, map
Only one top-level (root) entry is allowed. Any of the supported types can be the root entry.
e-NON-0 is analogous to JSON, plus binary support for int, double and byte[].
e-NON-G adds support for the Glossary and map references.
e-NON-M adds support for Metadata.
e-NON-S adds support for Streaming and chunked data blocks.
Unbounded map and list sizes are supported.
Multiple top-level (root) entries are allowed.
Chunked records are supported.
e-NON-X is eXtended e-NON, adding more data types:
byte, short, long, float, temporal, array
e-NON-X adds more data types in binary form, thereby requiring fewer bytes to transmit. For languages with native support for these types, there is less processing to encode and decode the values.
e-NON-Y adds support for encrYption.
e-NON-Z adds support for gZip compression.
Each field entry in an e-NON stream is prefixed with a 1-byte (ASCII) type indicator. If the type has a known size, then the data of the field follows the type indicator. If the size of the type is variable, then a size specifier precedes the data.
The prolog is the first data to appear in the e-NON stream. It informs the receiving processor about the stream to follow. It contains a format version, a bitset that indicates which features are required, and a timestamp. All elements in the prolog are required.
An e-NON stream begins with a single (unsigned) byte that indicates the version number of the e-NON format.
Immediately following the version byte is the feature-set byte, which is a bit set indicating the features which are needed to read the following e-NON stream.
Feature Set Bits:
feature set | bit mask (hex) |
---|---|
X | 0x01 |
G | 0x02 |
M | 0x04 |
S | 0x08 |
Z | 0x10 |
Y | 0x20 |
The timestamp is an 8-byte value which represents the number of milliseconds since the Unix epoch.
The e-NON processor does not make any demands on what value is used for the timestamp. However, dates in temporal elements may be relative to this timestamp.
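The prolog layout is small enough to sketch directly. The following Python uses the standard `struct` module to pack and unpack it. The function names are hypothetical, and treating the timestamp as a signed 64-bit integer is an assumption, since the text only specifies its width.

```python
import struct
import time

# Feature-set bit masks, copied from the Feature Set Bits table.
FEATURE_X = 0x01
FEATURE_G = 0x02
FEATURE_M = 0x04
FEATURE_S = 0x08
FEATURE_Z = 0x10
FEATURE_Y = 0x20

# '>' selects network byte order; B = unsigned byte, q = signed 64-bit.
# Assumption: the timestamp is a signed 64-bit value.
PROLOG_FORMAT = ">BBq"


def write_prolog(version, features, timestamp_ms):
    """Pack the 10-byte prolog: version, feature-set bits, timestamp."""
    return struct.pack(PROLOG_FORMAT, version, features, timestamp_ms)


def read_prolog(data):
    """Return (version, features, timestamp_ms) from the start of a stream."""
    return struct.unpack_from(PROLOG_FORMAT, data, 0)


def missing_features(features, supported):
    """Bits required by the stream that this reader does not support."""
    return features & ~supported


prolog = write_prolog(0, FEATURE_G | FEATURE_S, int(time.time() * 1000))
```

A reader would typically call `missing_features()` right after `read_prolog()` and reject the stream if any required bit is unsupported.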
An e-NON data element contains up to 3 segments: prefix|size*|data*
For the remaining discussion, the symbol <element> refers to any data element.
A <size> is a series of 1, 3, or 9 bytes that indicates the size of the data to follow. Refer to the section Sizes below for detail. A size can be 0.
The notation of a superscript size (for example <byte>^size) indicates that the preceding item occurs exactly size times.
This table summarizes the available data types. It includes types from all feature sets.
element | prefix | format † |
---|---|---|
null | 'N' | 'N' |
false | '0' | '0' |
true | '1' | '1' |
positive ∞ | '+' | '+' |
negative ∞ | '-' | '-' |
NaN | '?' | '?' |
nano int | 0x80..0xFF | (nano-int) |
byte | 'b' | 'b'<byte-value> |
short | 's' | 's'<short-value> |
int | 'i' | 'i'<int-value> |
long | 'l' | 'l'<long-value> |
float | 'f' | 'f'<float-value> |
double | 'd' | 'd'<ieee-754-64bit-value> |
string | '"' (double quote) | '"'(meta-size)<utf8-string-value> |
number | 'n' | 'n'(meta-size)<utf8-base10-number-string> |
temporal | 't' | 't'(temporal) |
ISO-8601 | '8' | '8'(format-code)(meta-size)<utf8-string-value> |
BLOB | 'B' | 'B'(meta-size)<byte>^size |
list | '[' | '['(meta-size)(element)^size |
map | '{' | '{'(meta-size)<map-id>{(element)(element)}^size |
array | '(' | '('(element-prefix)(meta-size)<value>^size |
map ref | '@' | '@'<map-id> |
glossary ref | 'G' | 'G'<glossary-key> |
compressed block | 'Z' | 'Z'(meta-size)<byte>^size |
encrypted block | 'Y' | 'Y'(meta-size)<byte>^size |
meta | 0x1B (esc) | (esc)(meta-code)<byte>^size |
Some of the prefixes specify constant values: null, true, false, positive/negative ∞, and NaN.
These values require no bytes after the prefix because the value is contained within the prefix.
The byte range 0x80..0xFF represents the nano-int values -63..64. The numeric value is computed by subtracting 191 from the unsigned value of the prefix:

value = (unsigned_byte)prefix - 191
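A minimal sketch of the nano-int mapping in Python (the helper names are hypothetical), rejecting values outside -63..64:

```python
def encode_nano_int(value):
    """Encode a value in -63..64 as a single prefix byte in 0x80..0xFF."""
    if not -63 <= value <= 64:
        raise ValueError("value out of nano-int range -63..64")
    return bytes([value + 191])


def decode_nano_int(prefix):
    """Decode an unsigned prefix byte in 0x80..0xFF to its nano-int value."""
    if not 0x80 <= prefix <= 0xFF:
        raise ValueError("not a nano-int prefix")
    return prefix - 191
```

Small integers such as list indexes or flags therefore cost exactly one byte, with no separate type prefix.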
The data size is always the same for certain element types. Therefore, sizes are not specified in the e-NON stream.
type | size |
---|---|
byte | 1 byte, 8 bits |
short | 2 bytes, 16 bits |
int | 4 bytes, 32 bits |
long | 8 bytes, 64 bits |
float | 4 bytes, 32 bits |
double | 8 bytes, 64 bits |

Types requiring more than 1 byte are serialized in network byte order (big endian).
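The fixed-size types map directly onto Python `struct` format codes with the `>` (big-endian) modifier. This is a sketch only: the lookup table and function names are illustrative, and treating byte as signed (-128..127) is an assumption.

```python
import struct

# Prefix byte and big-endian struct format for each fixed-size type.
# Assumption: 'byte' is signed, like Java's byte.
FIXED_TYPES = {
    "byte": (b"b", ">b"),
    "short": (b"s", ">h"),
    "int": (b"i", ">i"),
    "long": (b"l", ">q"),
    "float": (b"f", ">f"),
    "double": (b"d", ">d"),
}


def encode_fixed(type_name, value):
    """Prefix followed by the value in network byte order; no size code."""
    prefix, fmt = FIXED_TYPES[type_name]
    return prefix + struct.pack(fmt, value)


def decode_fixed(data, pos=0):
    """Return (type_name, value, next_pos) for a fixed-size element."""
    prefix = data[pos:pos + 1]
    for type_name, (type_prefix, fmt) in FIXED_TYPES.items():
        if type_prefix == prefix:
            (value,) = struct.unpack_from(fmt, data, pos + 1)
            return type_name, value, pos + 1 + struct.calcsize(fmt)
    raise ValueError(f"not a fixed-size prefix: {prefix!r}")
```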
The String element is used to encode strings, char, and char[] types.
The character encoding is UTF-8.
The meta-size of a string is specified in bytes, not characters.
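Because the size counts bytes, a writer must encode the string to UTF-8 before measuring it. The sketch below (hypothetical function names) only covers sizes up to 65535, using the single-byte and 0xFF size codes from the Size Codes table.

```python
import struct


def encode_size(size):
    """Size code for 0..65535: one byte up to 250, else 0xFF + unsigned short."""
    if 0 <= size <= 250:
        return bytes([size])
    if size <= 0xFFFF:
        return b"\xff" + struct.pack(">H", size)
    raise ValueError("this sketch only handles sizes up to 65535")


def encode_string(text):
    """'"' prefix, size in UTF-8 bytes, then the UTF-8 data."""
    data = text.encode("utf-8")
    return b'"' + encode_size(len(data)) + data
```

For example, "héllo" has 5 characters but occupies 6 bytes in UTF-8, so its size code is 6.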
This is a UTF-8 string representation of a number. It can be used for any type of numeric data, but it is provided primarily to support big-integer and big-decimal types. Strings need to comply with the format defined by the Java BigDecimal constructor.
The temporal element is a variable-sized element used to store temporal (time-based) values. A temporal element contains one or more fields. The existence of fields is determined by the temporal subtype.
field | code | description | size | range* |
---|---|---|---|---|
year | Y | The proleptic year | 4 bytes | -2^31 .. 2^31-1 |
month | M | The numeric month within a year, starting at 1 | 1 byte | -128 .. 127 |
day | D | The numeric day within a month, starting at 1 | 1 byte | -128 .. 127 |
hour | h | The hour within a day, nominally 0..23 | 1 byte | -128 .. 127 |
minute | m | The minute within an hour, nominally 0..59 | 1 byte | -128 .. 127 |
second | s | The second within a minute, nominally 0..59 | 1 byte | -128 .. 127 |
millisecond in a minute | S | The milliseconds within a minute, nominally 0..59,999 (to be used in place of seconds and nanoseconds where ms accuracy is needed) | 2 bytes unsigned | 0 .. 65535 |
ns | n | The nanosecond within a second, nominally 0..999,999,999 | 4 bytes | -2^31 .. 2^31-1 |
instant | i | A number of milliseconds since the Unix epoch, January 1, 1970 at midnight UTC. | 8 bytes | -2^63 .. 2^63-1 |
offset seconds | o | The offset, in seconds, from UTC, +/- 18 hours | 4 bytes | -2^31 .. 2^31-1 |
offset intervals | O | the offset, in 15-minute intervals, from UTC, -32:00..+31:45 | 1 byte | -128..127 |
zone ID | z | The string value of a timezone ID, per IANA TZDB | variable | n/a |
The temporal subtype is identified by a 1-byte prefix immediately following the temporal type prefix. The subtype determines what data follows.
Most of the subtypes indicate a compact, fixed combination of fields. However, there is one special prefix, '{', which introduces a temporal map.
prefix | temporal subtype | fields | size | compact |
---|---|---|---|---|
{ | temporal-map | any | variable | no |
0 | year | Y | 2 bytes | yes |
1 | year, month | YM | 3 bytes | yes |
2 | local date | YMD | 4 bytes | yes |
3 | local date-time, hour-minute | YMDhm | 6 bytes | yes |
4 | local date-time, s | YMDhms | 7 bytes | yes |
5 | local date-time, ms | YMDhmS | 8 bytes | yes |
6 | local date-time, ns | YMDhmsn | 11 bytes | yes |
7 | local time, seconds | hms | 3 bytes | yes |
8 | offset time, minutes | hmO | 3 bytes | yes |
9 | offset time, seconds | hmsO | 4 bytes | yes |
d | offset date-time, seconds | YMDhmsO | 8 bytes | yes |
i | instant UTC (date-time, ms) | i | 8 bytes | yes |
o | offset date-time (instant, ms) | iO | 9 bytes | yes |
z | zoned date-time (instant, ms) | iz | 8 + variable | yes |
I* | instant+ns, UTC (date-time, ns) | in | 12 bytes | yes |
O* | offset date-time+ns (instant, ns) | inO | 13 bytes | yes |
Z* | zoned date-time+ns (instant, ns) | inz | 12 + variable | yes |
*When adding nanoseconds to an instant, make sure not to double-count the milliseconds present in the instant. Either zero out the ms in the instant, or subtract them from the nanoseconds value.
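The two options in the note can be sketched for subtype 'I' (instant, then nanoseconds: 8 + 4 bytes). Both decode to the same epoch-nanosecond value because the reader computes instant × 1,000,000 + ns. The function names are hypothetical.

```python
import struct

NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def encode_instant_ns(epoch_ns, zero_ms=True):
    """Write temporal subtype 'I': 8-byte instant (ms) then 4-byte ns."""
    if zero_ms:
        # Option 1: zero out the ms in the instant; ns carries the full fraction.
        instant_ms = (epoch_ns // NS_PER_S) * 1000
        ns = epoch_ns % NS_PER_S
    else:
        # Option 2: keep the ms in the instant; ns carries only the sub-ms rest.
        instant_ms = epoch_ns // NS_PER_MS
        ns = epoch_ns % NS_PER_MS
    return b"tI" + struct.pack(">qi", instant_ms, ns)


def decode_instant_ns(data):
    """Return nanoseconds since the Unix epoch from a 'tI' element."""
    if data[:2] != b"tI":
        raise ValueError("not an instant+ns temporal")
    instant_ms, ns = struct.unpack_from(">qi", data, 2)
    return instant_ms * NS_PER_MS + ns
```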
Note that the temporal subtype prefix is not to be confused with the field code. Although they may appear similar, they are used in different contexts. The field code refers to just a single field, while the subtype prefix specifies a group of multiple fields.
The temporal map is a variable combination of temporal fields. Each field is prefixed by a 1-byte key. Any temporal fields can appear in any order. The map is terminated with a key of ‘}’.
Each field should appear only once. Behavior is undefined if a field appears more than once.
The key for each field is the same as the code from the temporal field table. The size of the field is identified in that same table.
Temporal fields with fixed size are written with the 1-byte key followed immediately by the field data.
The zone ID is a variable-size string value. It is written with the key ‘z’ followed by a 1-byte size, followed by the string data.
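A sketch of writing and reading a temporal map, assuming the field sizes from the temporal field table and signed encodings for every field except 'S' (2 bytes unsigned). The function and table names are hypothetical.

```python
import struct

# Big-endian struct formats for the fixed-size temporal field codes.
FIELD_FORMATS = {
    "Y": ">i", "M": ">b", "D": ">b", "h": ">b", "m": ">b", "s": ">b",
    "S": ">H", "n": ">i", "i": ">q", "o": ">i", "O": ">b",
}


def encode_temporal_map(fields):
    """Encode {field_code: value} as 't{' + keyed fields + '}'."""
    out = bytearray(b"t{")
    for code, value in fields.items():
        out += code.encode("ascii")                   # 1-byte key
        if code == "z":
            zone = value.encode("utf-8")
            if len(zone) > 255:
                raise ValueError("zone ID too long for a 1-byte size")
            out += bytes([len(zone)]) + zone          # 1-byte size, then text
        else:
            out += struct.pack(FIELD_FORMATS[code], value)
    out += b"}"
    return bytes(out)


def decode_temporal_map(data):
    """Inverse of encode_temporal_map for a buffer starting with 't{'."""
    if data[:2] != b"t{":
        raise ValueError("not a temporal map")
    fields, pos = {}, 2
    while True:
        code = chr(data[pos])
        pos += 1
        if code == "}":
            return fields
        if code == "z":
            length = data[pos]
            fields["z"] = data[pos + 1:pos + 1 + length].decode("utf-8")
            pos += 1 + length
        else:
            fmt = FIELD_FORMATS[code]
            (fields[code],) = struct.unpack_from(fmt, data, pos)
            pos += struct.calcsize(fmt)
```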
The compact values are an optimized set of temporal fields intended to support the most common usage patterns. Each compact value has a fixed set of fields, so unlike the temporal map, no field prefixes are necessary.
Fields are written strictly in the order they are listed in the fields column of the temporal subtypes table.
To qualify as compact, there are some additional restrictions on the field values. Values that don’t meet these requirements will be written as a temporal map.
The basic restrictions are:
field | size | compact value | description |
---|---|---|---|
year | 2 bytes | -32768 .. 32767 | A compact year has a more limited range due to smaller storage size. |
offset | 1 byte | round to 15 minute intervals* | The compact offset is stored as 1-byte. Each increment is ±15 minutes. |
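As an illustration of the compact restriction, the sketch below writes a local date with subtype '2' when the year fits in 2 bytes and falls back to a temporal map with a 4-byte year otherwise. The function name is hypothetical, and only the year restriction is checked.

```python
import struct


def encode_date(year, month, day):
    """Compact subtype '2' when the year fits in 2 bytes, else a temporal map."""
    if -32768 <= year <= 32767:
        # 't', subtype '2', then Y (2 bytes), M (1 byte), D (1 byte): no field keys.
        return b"t2" + struct.pack(">hbb", year, month, day)
    # Not compact: keyed fields with a 4-byte year, terminated by '}'.
    return (b"t{" + b"Y" + struct.pack(">i", year) + b"M" + struct.pack(">b", month)
            + b"D" + struct.pack(">b", day) + b"}")
```

The compact form of a typical date is 6 bytes including the two prefixes, versus 12 bytes for the equivalent temporal map.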
Why not support ISO-8601?
ISO-8601 is an “everything for everyone, human-readable” format which is difficult to parse in all of its valid forms. IMHO, it’s not worth taking on that complexity for a binary data transfer protocol where human-readability is irrelevant compared to processing speed and data compactness. The binary format above unambiguously identifies what fields are present, optimizes byte count, and requires no complex parsing or analysis of text.
TBD
Arbitrary binary data can be passed using this element. The number of bytes is specified by the meta-size. The prefix ‘B’ and its meta-size are followed immediately by size bytes of data.
The list is analogous to the array type in JSON. It contains any number of elements, and the elements may be of different types. The number of elements is determined by the size.
Each element in the list is specified completely, including the prefix, any meta-size and/or data.
This is analogous to the map type in JSON. It contains any number of key/value pairs. The number of entries is determined by the size. Each key and value is fully specified including the prefix, any meta-size and/or data.
The key of a map entry can be any element type, not limited to string values.
The map has a special map-id which appears just before the data. This is a numeric identifier that must be unique within each top-level container element (map or array). It is used to terminate cyclic references that may occur in the source data model. See Map Reference below.
The array is a tightly packed series of a single fixed-size type. The fixed-size type is specified only once, immediately after the array prefix '('. The number of elements in the array is determined by size.
The element-type can be any of the types enumerated in the Fixed Size Values section above.
The BLOB type, specified with the prefix ‘B’ is equivalent to a byte array specified as ‘(b’.
To create a boolean array, use the array prefix followed by the false prefix: '(0'.
A boolean array is packed into a bitset, each bit representing one of the boolean values. As a result, the transmission size of the array will be 1/8 the length of the original array, plus 1 more byte if the original array is not a multiple of 8 in length. Bits after the last value are ignored.
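A sketch of the boolean bitset, assuming that size counts booleans (not bytes), that the first value sits in the most significant bit of the first byte, and that sizes fit in one byte. None of these details is pinned down above, so treat them as assumptions; the function names are hypothetical.

```python
def encode_bool_array(values):
    """Encode a list of bools as '(0', a one-byte size, then packed bits."""
    if len(values) > 250:
        raise ValueError("sketch only uses single-byte sizes (<= 250 entries)")
    packed = bytearray((len(values) + 7) // 8)
    for i, flag in enumerate(values):
        if flag:
            # Assumption: the first value occupies the most significant bit.
            packed[i // 8] |= 0x80 >> (i % 8)
    return b"(0" + bytes([len(values)]) + bytes(packed)


def decode_bool_array(data):
    """Inverse of encode_bool_array; bits after the last value are ignored."""
    if data[:2] != b"(0":
        raise ValueError("not a boolean array")
    count = data[2]
    packed = data[3:]
    return [bool(packed[i // 8] & (0x80 >> (i % 8))) for i in range(count)]
```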
This is a pointer to a map which has appeared previously in the e-NON stream.
The map-id is a variable-length size code which is interpreted as a number rather than a size. Map ids cannot contain special codes.
The scope of a reference is limited to the top-level container (map, array) in which it appears. The map-id must be unique within that scope.
A map-id of 0 is a special case which indicates “none”. The map-id 0 may appear any number of times within any scope. Maps with a map-id of 0 can’t be referenced.
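One way a writer might assign map ids and emit map references is to track each map object by identity, as in this hypothetical sketch. It registers the id before writing the entries, so a map that contains itself is written as a reference instead of recursing forever. It only handles string keys and values with sizes and ids below 251, and a new `MapWriter` would be used per top-level container to respect the reference scope.

```python
class MapWriter:
    """Encodes str and dict values, turning repeated maps into '@' references."""

    def __init__(self):
        self.map_ids = {}  # id(dict object) -> map-id
        self.next_id = 1   # map-id 0 means "none", so start at 1

    def encode(self, value):
        if isinstance(value, str):
            data = value.encode("utf-8")
            if len(data) > 250:
                raise ValueError("sketch only uses single-byte size codes")
            return b'"' + bytes([len(data)]) + data
        if isinstance(value, dict):
            key = id(value)
            if key in self.map_ids:
                return b"@" + bytes([self.map_ids[key]])  # map ref
            map_id = self.next_id
            if map_id > 250 or len(value) > 250:
                raise ValueError("sketch only uses single-byte size codes")
            # Register before writing entries so self-references resolve.
            self.map_ids[key] = map_id
            self.next_id += 1
            out = bytearray(b"{" + bytes([len(value), map_id]))
            for k, v in value.items():
                out += self.encode(k) + self.encode(v)
            return bytes(out)
        raise TypeError(f"unsupported type: {type(value).__name__}")
```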
This is a key into the most recently defined glossary. Just like a map-id, a glossary-key is a variable-length size code, interpreted as a number. Glossary keys cannot contain special codes.
A compressed block is a series of e-NON data elements that have been compressed together as a group. This allows compression to be applied to elements that otherwise would not support compression.
When decompressed, the compressed block will be parsed as if the compressed elements had appeared inline, uncompressed, at the location of the compressed block.
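A sketch of wrapping already-encoded elements in a compressed block. The compression algorithm code is not specified in this section, so zlib (DEFLATE) is an assumption here, as are the function names. After expanding, the reader would continue parsing the returned bytes as if they had appeared inline.

```python
import struct
import zlib


def encode_size(size):
    """Shortest size code: 1 byte up to 250, 0xFF + 2 bytes, or 0xFE + 8 bytes."""
    if size <= 250:
        return bytes([size])
    if size <= 0xFFFF:
        return b"\xff" + struct.pack(">H", size)
    return b"\xfe" + struct.pack(">q", size)


def compress_block(encoded_elements):
    """'Z' prefix, compressed length, then the compressed bytes."""
    payload = zlib.compress(encoded_elements)
    return b"Z" + encode_size(len(payload)) + payload


def expand_block(block):
    """Return the original element bytes for inline parsing."""
    if block[:1] != b"Z":
        raise ValueError("not a compressed block")
    prefix = block[1]
    if prefix <= 0xFA:
        size, start = prefix, 2
    elif prefix == 0xFF:
        size, start = struct.unpack_from(">H", block, 2)[0], 4
    elif prefix == 0xFE:
        size, start = struct.unpack_from(">q", block, 2)[0], 10
    else:
        raise ValueError("unsupported size code in sketch")
    return zlib.decompress(block[start:start + size])
```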
An encrypted block is a series of e-NON data elements that have been encrypted together as a group. This allows encryption to be applied to elements that otherwise would not support encryption.
When decrypted, the encrypted block will be parsed as if the decrypted elements had appeared inline, unencrypted, at the location of the encrypted block.
Some element types require a size to be specified. The size tells the parser how much data to read before starting the next element.
A size is specified by a series of 1, 3, or 9 bytes, depending on how big the size is. Sizes less than 251 can be specified in a single byte. Larger sizes use more bytes to define. Refer to the Size Codes table for details.
The size sub-element has been overloaded to allow metadata to be embedded into elements. By using a special prefix (0xFB = 251) in place of the size code we can introduce metadata. Refer to the meta section to see how metadata is formatted.
Any number of metadata sub-elements can occur. Eventually, metadata elements must be followed by a size, which ends the embedded metadata.
Metadata embedded into an element applies only to that element.
The primitive element types have fixed sizes. To conserve space, these elements do not support a size sub-element, and therefore can’t have embedded metadata.
The meta-size sub-element has variable length. The length is determined by the first byte. Refer to the Size Codes and Special Codes tables below.
A size code translates to a non-negative number. The size code may be 1, 3, or 9 bytes in length. The first byte is a prefix value in the range 0-250 (0x00..0xFA) or 253-255 (0xFD..0xFF). The value of the prefix determines what follows.
prefix | size range | description |
---|---|---|
[0x00..0xFA] | [0 .. 250] | For sizes <= 250, the prefix itself is the size. The size is a single unsigned byte. |
0xFF (-1) | [0 .. 2^16 - 1] (~65K) | The next 2 bytes contain an unsigned short which specifies the size. |
0xFE (-2) | [0 .. 2^63 - 1] (~9×10^18) | The next 8 bytes contain a signed long which specifies the size. Although this is a signed value, negative sizes are not allowed. |
0xFD (-3) | unbounded | Unbounded length applies to the container elements list and map. The end of the container is marked by the ETB control element. |
The e-NON reader/writer is not required to support all possible sizes. Implementations may choose arbitrary maximum sizes for any element types.
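The size codes can be sketched as a pair of Python functions (hypothetical names). The writer picks the shortest code; the reader returns `None` for the unbounded code 0xFD and rejects the special codes 0xFB and 0xFC, which belong to the meta-size logic rather than to plain sizes.

```python
import struct


def encode_size(size):
    """Return the shortest size code for a non-negative size."""
    if size < 0:
        raise ValueError("negative sizes are not allowed")
    if size <= 250:
        return bytes([size])                          # prefix is the size
    if size <= 0xFFFF:
        return b"\xff" + struct.pack(">H", size)      # 0xFF + unsigned short
    if size <= 2**63 - 1:
        return b"\xfe" + struct.pack(">q", size)      # 0xFE + signed long
    raise ValueError("size exceeds 2**63 - 1")


def decode_size(data, pos=0):
    """Return (size, next_pos); size is None for the unbounded code 0xFD."""
    prefix = data[pos]
    if prefix <= 0xFA:
        return prefix, pos + 1
    if prefix == 0xFF:
        return struct.unpack_from(">H", data, pos + 1)[0], pos + 3
    if prefix == 0xFE:
        (size,) = struct.unpack_from(">q", data, pos + 1)
        if size < 0:
            raise ValueError("negative size")
        return size, pos + 9
    if prefix == 0xFD:
        return None, pos + 1
    # 0xFB and 0xFC are special meta-size codes, not sizes.
    raise ValueError(f"special code 0x{prefix:02X} where a size was expected")
```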
There are two special codes for meta-size: 0xFB and 0xFC.
prefix | meaning | description |
---|---|---|
0xFC (-4) | glossary id | The next segment will be a size structure indicating the glossary-id of the element. The reader should cache the value of the element with this glossary-id. The cached value will be used when a future glossary-ref calls for this glossary-id. |
0xFB (-5) | metadata | This code indicates that the next segment of data will be metadata. Metadata applied using the 0xFB code applies only to the data element in which it appears. Refer to the section named Meta for details of the meta structure. At the end of each metadata entry, the size is checked again for the containing element. Any size code is valid here, including another 0xFB. |
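A reader therefore has to loop over 0xFB and 0xFC entries until it reaches an ordinary size code. This hypothetical sketch handles only comment metadata ('/') and glossary-id tags; other meta-codes would need their own parsing.

```python
import struct


def read_size_code(data, pos):
    """Decode a plain size code; returns (size, next_pos), None if unbounded."""
    prefix = data[pos]
    if prefix <= 0xFA:
        return prefix, pos + 1
    if prefix == 0xFF:
        return struct.unpack_from(">H", data, pos + 1)[0], pos + 3
    if prefix == 0xFE:
        return struct.unpack_from(">q", data, pos + 1)[0], pos + 9
    if prefix == 0xFD:
        return None, pos + 1
    raise ValueError(f"special code 0x{prefix:02X} where a size was expected")


def read_meta_size(data, pos):
    """Consume 0xFB/0xFC entries, then the size that ends the meta-size."""
    comments, glossary_id = [], None
    while True:
        prefix = data[pos]
        if prefix == 0xFB:                        # embedded metadata
            meta_code = chr(data[pos + 1])
            if meta_code != "/":
                raise NotImplementedError("sketch only handles comments")
            length, pos = read_size_code(data, pos + 2)
            comments.append(data[pos:pos + length].decode("utf-8"))
            pos += length
        elif prefix == 0xFC:                      # glossary id tag
            glossary_id, pos = read_size_code(data, pos + 1)
        else:                                     # the actual size
            size, pos = read_size_code(data, pos)
            return size, comments, glossary_id, pos
```

For a string element carrying a comment "hi" and glossary id 7, the reader collects both and then reads the 5-byte string body that follows the final size code.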
How sizes are interpreted for each element type:
prefix | type | size |
---|---|---|
'N' | null | 0* |
'0' | false | 0* |
'1' | true | 0* |
'+' | positive ∞ | 0* |
'-' | negative ∞ | 0* |
'?' | NaN | 0* |
0x80..0xFF | nano-int | 0* |
'b' | byte | 1* |
's' | short | 2* |
'i' | int32 | 4* |
'l' | long (int64) | 8* |
'f' | float | 4* |
'd' | double float | 8* |
'"' | string (utf-8) | no. of bytes in the string |
'n' | number (utf-8) | no. of bytes in the string |
'c' | calendar | n/a† |
'B' | BLOB (bytes) | no. of bytes |
'[' | list | no. of entries |
'{' | map | no. of key/value pairs |
'(' | array | no. of entries |
'@' | map ref | id of reference target |
'G' | glossary ref | id of reference target |
In addition to the data elements, there are control elements. The control elements do not contribute to the data content, but can affect the processing.
A control prefix character must occur in a place where a data type prefix could legally appear.
prefix | meaning | structure |
---|---|---|
0x02 (^B) | start chunked data series (STX) | 0x02 |
0x03 (^C) | end chunked data series (ETX) | 0x03 |
0x04 (^D) | end of transmission (EOT) | 0x04 |
0x17 (^W) | end of transmission block (ETB) | 0x17 |
0x1B (^[) | escape (ESC) | 0x1B(meta-code)<meta-data> |
The following elements are part of a single entity but have been split into a series of chunks. These should be stitched back together on the receiving end. Refer to the section Chunked Data Series for more detail.
Each chunk must be a valid, complete e-NON structure. The receiving end will not accept a partial structure.
The previously started series has ended and the last chunk has been received.
The end of transmission causes the processor to stop, even if there are bytes remaining. The processor will stop regardless when it reaches the end of the input stream, with or without EOT.
The end of transmission block indicates the end of an unbounded list or map. This control only applies after one of those containers has specified unbounded size (0xFD). It is undefined anywhere else and should result in an error.
The escape code introduces an independent metadata entry. This can be used to place metadata in the stream outside other elements.
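A minimal reader for lists whose elements are nano-ints shows how the unbounded size code and ETB interact. The element support is deliberately limited and the function names are hypothetical.

```python
ETB = 0x17        # end of transmission block
UNBOUNDED = 0xFD  # size code for an unbounded container


def decode_nano_int(prefix):
    if not 0x80 <= prefix <= 0xFF:
        raise ValueError(f"unsupported element prefix 0x{prefix:02X}")
    return prefix - 191


def read_nano_int_list(data, pos=0):
    """Read a list of nano-ints; return (values, next_pos)."""
    if data[pos] != ord("["):
        raise ValueError("not a list")
    size_code = data[pos + 1]
    pos += 2
    values = []
    if size_code == UNBOUNDED:
        # Read elements until the ETB control element closes the list.
        while data[pos] != ETB:
            values.append(decode_nano_int(data[pos]))
            pos += 1
        return values, pos + 1  # consume the ETB
    if size_code > 0xFA:
        raise NotImplementedError("sketch handles only single-byte sizes")
    for _ in range(size_code):
        values.append(decode_nano_int(data[pos]))
        pos += 1
    return values, pos
```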
Chunks in a series must be of the same element type.
Supported types: string, byte[] (BLOB), array, list, map
Stitching behavior is defined by element type:
element type | stitching behavior |
---|---|
string | all of the strings will be concatenated to a single string. |
byte[] | the byte streams will be concatenated to form a single byte[] stream. |
array and list | entries will be added to the same destination array or list. |
map | key-value pairs from all chunks will be added to a single map. |
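Once each chunk has been decoded, stitching reduces to a per-type merge. This hypothetical sketch assumes decoded chunks are Python `str`, `bytes`, `list` (for both arrays and lists) or `dict` values, and lets later chunks win on duplicate map keys, which the table does not specify.

```python
def stitch_chunks(chunks):
    """Merge decoded chunks of one series into a single value."""
    if not chunks:
        raise ValueError("a series needs at least one chunk")
    kind = type(chunks[0])
    if any(type(chunk) is not kind for chunk in chunks):
        raise ValueError("chunks in a series must be of the same element type")
    if kind is str:
        return "".join(chunks)
    if kind is bytes:
        return b"".join(chunks)
    if kind is list:
        return [item for chunk in chunks for item in chunk]
    if kind is dict:
        merged = {}
        for chunk in chunks:
            merged.update(chunk)  # assumption: later duplicate keys win
        return merged
    raise TypeError(f"{kind.__name__} chunks cannot be stitched")
```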
A metadata entry provides additional information about the e-NON stream or about specific data elements. Some metadata entries may affect the way elements are interpreted.
Meta elements can appear in 2 ways: independent or embedded. In either case, the meta element starts with a 1-byte meta-code which indicates what type of metadata is coming. The structure and interpretation of the meta is determined by the meta-code.
Independent meta elements appear anywhere a data element can appear. They are prefixed by the ESC control character (0x1B). The prefix is followed by the meta-code and data.
The scope of independent metadata is limited to the container element where it appears. When metadata appears at the top level of the stream, then it applies to the remainder of the stream.
Embedded metadata is declared in the meta-size header of a data element. Meta elements can be applied to any element which supports meta-size.
Embedded meta is introduced by the special meta-size code 0xFB.
meta-code | meaning | structure |
---|---|---|
'/' | comment | '/'(size)<comment-utf8> |
'{' | meta map | '{'(size){(string-element)(element)}^size |
'K' | key ID | 'K'(size)<bytes>^size |
't' | app-specific type information | 't'(size)<type-info-utf8> |
'z' | compression algorithm | 'z'(z-code) |
A comment is arbitrary UTF-8 string data that can appear in the stream as independent or embedded metadata.
Comments are intended to provide contextual information for human readers and should not have any side-effects on data elements. The e-NON reader should not alter the content (i.e. properties) or interpretation of data elements based on the existence or content of a comment. Other meta types and control types are provided for that purpose.
A data model may include explicit properties for comments. That type of property should be represented using normal elements such as string.
Apart from keeping comments separate from data elements, there are no restrictions on how comments can be used. For example, comments may be retained and externally associated with elements. Depending on the application, comments may be logged or displayed in a UI.
An e-NON reader may choose to ignore comments entirely.
The meta prefix '{' begins a key-value map, similar to the map data element. It contains any number of key-value pairs determined by its size. However, the meta map can only have string-type keys and cannot have embedded metadata.
A reader may interpret the key-value pairs in any way. There are no reserved keys or values. Keys that are meaningful to the e-NON specification will be assigned distinct meta types, and will not appear in the meta map.
To avoid key collision for publicly shared data, app- or vendor-specific keys should use distinct prefix and/or suffix values.
#### Key ID
This is a reference to an encryption key. It is not the key itself, but a reference to a key that the recipient already knows.
The key ID informs the reader that the referenced key can be used to decrypt subsequent encrypted data.
This value indicates the compression algorithm used to compress data.
If this meta type is applied as embedded metadata, then the enclosing element will be decompressed using the supplied algorithm.
If compression algorithm is applied as an independent meta element, then all subsequent compressible elements will be decompressed using this algorithm.
Care should be taken when applying a compression algorithm as independent metadata or embedded into a container (list or map). It implies that all compressible elements within that scope have been compressed, and therefore will be decompressed during reading.
This is a hint to an e-NON reader indicating how the following data element should be interpreted. For example, it could be a class name.
The glossary is a collection of data elements, each of which is assigned an ID number. Once an element is entered into the glossary it can be referenced multiple times in the e-NON stream by its ID. The goal is to reduce the stream length by eliminating redundancy.
The glossary is accumulated by the e-NON reader/writer during the reading/writing process.
To enter an element into the glossary, the element is tagged by the special meta-size code 0xFC followed by a size value. The size value is the glossary ID, and must be unique within a glossary.
Only elements that allow meta-size data can be placed into the glossary.
It is up to the e-NON writer to determine what elements are placed into the glossary. For example, a writer could add all strings longer than 3 chars or only strings that are also map keys.
Generally, an element would be entered into the glossary the first time it appears in the stream.
Once an element is entered into the glossary it may be referenced any number of times.
A glossary entry is referenced using a glossary reference element. When a glossary reference is encountered in the stream, the e-NON reader should replace it with the glossary entry defined for that ID.
The reader should produce an error when a glossary reference ID does not exist in the glossary.
The glossary does not support forward references. A glossary ID is only valid after it has been tagged to an element in the stream.
The scope of a glossary, by default, is the entire length of the e-NON stream. When an element is placed into the glossary it remains there until the glossary goes out of scope.
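A writer/reader pair for the glossary might look like this hypothetical sketch. The writer applies the example policy above (strings longer than 3 characters) and assumes ids start at 1 and fit in a single-byte size code; the reader rejects duplicate ids and undefined (including forward) references.

```python
def encode_strings(strings, min_len=4):
    """Encode strings, placing repeated long strings in the glossary."""
    glossary = {}  # string -> glossary id
    next_id = 1    # assumption: ids start at 1
    out = bytearray()
    for text in strings:
        data = text.encode("utf-8")
        if len(data) > 250:
            raise ValueError("sketch only uses single-byte size codes")
        if text in glossary:
            out += b"G" + bytes([glossary[text]])          # glossary reference
        elif len(text) >= min_len:
            glossary[text] = next_id
            # 0xFC tags the element with a glossary id; the real size follows.
            out += b'"' + bytes([0xFC, next_id, len(data)]) + data
            next_id += 1
        else:
            out += b'"' + bytes([len(data)]) + data
    return bytes(out)


class GlossaryReader:
    """Reader-side glossary: ids are defined once and resolved later."""

    def __init__(self):
        self._entries = {}

    def define(self, glossary_id, value):
        if glossary_id in self._entries:
            raise ValueError(f"glossary id {glossary_id} is already defined")
        self._entries[glossary_id] = value

    def resolve(self, glossary_id):
        if glossary_id not in self._entries:
            # No forward references: the id must already be defined.
            raise KeyError(f"undefined glossary reference {glossary_id}")
        return self._entries[glossary_id]
```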
Data compression may be applied to the following data element types:
The compression algorithm is set by the ‘z’ meta type.
When compression is applied, the meta-size must be the compressed length of the data.
Data encryption can be applied to the following element types:
element | prefix | format † |
---|---|---|
byte, unsigned, positive | 'B' | 'B'<unsigned-byte-value> |
short, unsigned, positive | 'S' | 'S'<unsigned-short-value> |
int, unsigned, positive | 'I' | 'I'<unsigned-int-value> |
long, unsigned, positive | 'L' | 'L'<unsigned-long-value> |
Unsigned values allow smaller storage for a wider range of numbers. Since we need to have a prefix code anyway, we can embed the concept of the sign bit into the prefix code, leaving more room
©2018-2019 Giant Head Software, all rights reserved