e-NON

e-NON is electronic Network Object Notation: an object serialization format that supports binary data natively.

e-NON contains the same basic data types as JSON (map, array, primitives, etc.), but renders them in a more compact form and allows raw binary data. Binary-encoded types are written in network byte order.

Extensions to the e-NON format have been defined to suit common (and uncommon) data serialization needs.

Version

The e-NON version described in this document is 0 (zero).

Purpose

The e-NON format is intended as a more efficient and flexible data transfer format compared to JSON and other text-based formats such as XML.

JSON has been a huge success in the software industry, but it has limitations that make it difficult to use for some applications. In particular, it is inefficient with binary data.

Protocols more “binary-friendly” than HTTP are gaining acceptance. WebSocket and RSocket are two examples. e-NON was inspired as a means of encoding object data in the RSocket payload.

Overview

Nomenclature

Format Name

The format may be written as e-NON or eNON. The form e-NON is used in official documentation because it coincides with the official domain name e-non.org.

File Extension

The official file extension for e-NON data is .enon.

Feature Sets

e-NON-0 (zero)

e-NON-0 is the minimum feature set that all implementations have to support.

e-NON-0 is analogous to JSON, plus binary support for int, double and byte[].

e-NON-G

e-NON-G adds support for the Glossary and map references.

e-NON-M

e-NON-M adds support for Metadata.

e-NON-S

e-NON-S adds support for Streaming and chunked data blocks.

e-NON-X

e-NON-X is eXtended e-NON, adding more data types:
byte, short, long, float, temporal, array

e-NON-X adds more data types in binary form, thereby requiring fewer bytes to transmit. For languages with native support for these types, there is less processing to encode and decode the values.

e-NON-Y

e-NON-Y adds support for encrYption.

e-NON-Z

e-NON-Z adds support for gZip compression.

Structure

Each field entry in an e-NON stream is prefixed with a 1-byte (ASCII) type indicator. If the type has a known size, then the data of the field follows the type indicator. If the size of the type is variable, then a size specifier precedes the data.

Prolog

The prolog is the first data to appear in the e-NON stream. It informs the receiving processor about the stream to follow. It contains a format version, a bitset that indicates which features are required, and a timestamp. All elements in the prolog are required.

Format Version

An e-NON stream begins with a single unsigned byte that indicates the version number of the e-NON format.

Required Feature Sets

Immediately following the version byte is the feature-set byte, which is a bit set indicating the features which are needed to read the following e-NON stream.

Feature Set Bits:

feature set bit mask (hex)
X 0x01
G 0x02
M 0x04
S 0x08
Z 0x10
Y 0x20

When no bits are set, the e-NON-0 feature set is implied.

Timestamp

This is an 8-byte timestamp which represents the number of milliseconds since the Unix epoch.

The e-NON processor does not make any demands on what value is used for the timestamp. However, dates in temporal elements may be relative to this timestamp.
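
As an illustration of the prolog layout described above, the following Python sketch packs and unpacks the three prolog fields. The function names and the FEATURE_BITS dictionary are hypothetical, and treating the timestamp as a signed 64-bit integer is an assumption; the text above only fixes the field order, the bit masks, and the 8-byte width.

```python
import struct

# Feature-set bit masks, copied from the Feature Set Bits table.
FEATURE_BITS = {"X": 0x01, "G": 0x02, "M": 0x04, "S": 0x08, "Z": 0x10, "Y": 0x20}

def write_prolog(version, features, timestamp_ms):
    """Pack the version byte, feature-set byte, and 8-byte timestamp."""
    mask = 0
    for letter in features:
        mask |= FEATURE_BITS[letter]
    # ">BBq": big endian, two unsigned bytes, one signed 64-bit value
    # (the signedness of the timestamp is an assumption of this sketch).
    return struct.pack(">BBq", version, mask, timestamp_ms)

def read_prolog(data):
    """Unpack the first 10 bytes into (version, features, timestamp_ms)."""
    version, mask, timestamp_ms = struct.unpack(">BBq", data[:10])
    features = {letter for letter, bit in FEATURE_BITS.items() if mask & bit}
    return version, features, timestamp_ms

prolog = write_prolog(0, {"G", "Z"}, 1_500_000_000_000)
```

An empty feature set produces a feature byte of 0, which corresponds to plain e-NON-0.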

Elements and Sizes

An e-NON data element contains up to 3 segments: prefix|size*|data*

For the remaining discussion, the symbol <element> refers to any data element.

A <size> is a series of 1, 3, or 9 bytes that indicates the size of the data to follow. Refer to the section Sizes below for detail. A size can be 0.

The notation of (superscript) size indicates that the preceding item occurs exactly size times.

Data Element Types

Data Element Reference

This table summarizes the available data types. It includes types from all feature sets.

element prefix format†
null 'N' 'N'
false '0' '0'
true '1' '1'
positive ∞ '+' '+'
negative ∞ '-' '-'
NaN '?' '?'
nano int 0x80..0xFF (nano-int)
byte 'b' 'b'<byte-value>
short 's' 's'<short-value>
int 'i' 'i'<int-value>
long 'l' 'l'<long-value>
float 'f' 'f'<float-value>
double 'd' 'd'<ieee-754-64bit-value>
string '"' (double quote) '"'(meta-size)<utf8-string-value>
number 'n' 'n'(meta-size)<utf8-base10-number-string>
temporal 't' 't'(temporal)
ISO-8601 '8' '8'(format-code)(meta-size)<utf8-string-value>
BLOB 'B' 'B'(meta-size)<byte>size
list '[' '['(meta-size)(element)size
map '{' '{'(meta-size)<map-id>{(element)(element)}size
array '(' '('(element-prefix)(meta-size)<value>size
map ref '@' '@'<map-id>
glossary ref 'G' 'G'<glossary-key>
compressed block 'Z' 'Z'(meta-size)<byte>size
encrypted block 'Y' 'Y'(meta-size)<byte>size
meta 0x1B (esc) (esc)(meta-code)<byte>size

† There are no spaces in the formatting.

Constant Values

Some of the prefixes specify constant values:
null, true, false, positive/negative ∞, NaN.

These values require no bytes after the prefix because the value is contained within the prefix.

Nano-int

The byte range 0x80..0xFF represents the nano-int values -63..64. The numeric value is computed by subtracting 191 from the unsigned value of the prefix:
value = (unsigned_byte)prefix - 191.
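
The formula above can be sketched in Python as follows. The function names are hypothetical; only the offset of 191 and the range -63..64 come from the text.

```python
NANO_MIN, NANO_MAX = -63, 64

def encode_nano_int(value):
    """Return the single prefix byte that represents a nano-int."""
    if not NANO_MIN <= value <= NANO_MAX:
        raise ValueError("value does not fit in a nano-int")
    return bytes([value + 191])

def decode_nano_int(prefix):
    """Recover the nano-int value from an unsigned prefix byte."""
    if not 0x80 <= prefix <= 0xFF:
        raise ValueError("prefix is not in the nano-int range")
    return prefix - 191
```

With this mapping, the prefix 0x80 decodes to -63, 0xBF to 0, and 0xFF to 64.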

Fixed Size Values

The data size is always the same for certain element types. Therefore, sizes are not specified in the e-NON stream.

Types requiring more than 1 byte are serialized in network byte order (big endian).
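
A minimal sketch of writing and reading fixed-size values in Python is shown below. The helper names and the FIXED_FORMATS table are hypothetical, and it assumes that the integer types are signed two's-complement values, which the text above does not state explicitly.

```python
import struct

# Prefix -> struct format. ">" selects big endian (network byte order).
# Signed integer formats are an assumption of this sketch.
FIXED_FORMATS = {"b": ">b", "s": ">h", "i": ">i", "l": ">q", "f": ">f", "d": ">d"}

def encode_fixed(prefix, value):
    """Write the 1-byte prefix followed by the fixed-size value."""
    return prefix.encode("ascii") + struct.pack(FIXED_FORMATS[prefix], value)

def decode_fixed(data):
    """Read one fixed-size element; return (value, bytes_consumed)."""
    fmt = FIXED_FORMATS[chr(data[0])]
    size = struct.calcsize(fmt)
    (value,) = struct.unpack(fmt, data[1:1 + size])
    return value, 1 + size
```

No size specifier is written because the prefix alone tells the reader how many bytes follow.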

String

The String element is used to encode strings, char, and char[] types.

The character encoding is UTF-8.

The meta-size size of a string is specified in bytes, not characters.

Number

This is a UTF-8 string representation of a number. It can be used for any type of numeric data, but it is provided primarily to support big-integer and big-decimal types. Strings need to comply with the format defined by the Java BigDecimal constructor.

Temporal

The temporal element is a variable-sized element used to store temporal (time-based) values. A temporal element contains one or more fields. The existence of fields is determined by the temporal subtype.

Temporal Fields
field code description size range*
year Y The proleptic year 4 bytes -2^31 .. 2^31-1
month M The numeric month within a year, starting at 1 1 byte -128 .. 127
day D The numeric day within a month, starting at 1 1 byte -128 .. 127
hour h The hour within a day, nominally 0..23 1 byte -128 .. 127
minute m The minute within an hour, nominally 0..59 1 byte -128 .. 127
second s The second within a minute, nominally 0..59 1 byte -128 .. 127
millisecond in a minute S The milliseconds within a minute, nominally 0..59,999 (to be used in place of seconds and nanoseconds, where ms accuracy is needed) 2 bytes unsigned 0 .. 65535
ns n The nanosecond within a second, nominally 0..999,999,999 4 bytes -2^31 .. 2^31-1
instant i A number of milliseconds since the Unix epoch, January 1, 1970 at midnight UTC. 8 bytes -2^63 .. 2^63-1
offset seconds o The offset, in seconds, from UTC, +/- 18 hours 4 bytes -2^31 .. 2^31-1
offset intervals O The offset, in 15-minute intervals, from UTC, -32:00..+31:45 1 byte -128 .. 127
zone ID z The string value of a timezone ID, per IANA TZDB variable n/a

*The range is determined by the storage type of the value, rather than the semantic meaning of the field. Any value within range can be transmitted in the e-NON stream. It is left to the sender and receiver to agree on the interpretation.  

Temporal Subtypes

The temporal subtype is identified by a 1-byte prefix immediately following the temporal type prefix. The subtype determines what data follows.

Most of the subtypes indicate a compact, fixed combination of fields. However, there is one special prefix, ‘{’, which introduces a temporal map.

prefix temporal subtype fields size compact
{ temporal-map any variable no
0 year Y 2 bytes yes
1 year, month YM 3 bytes yes
2 local date YMD 4 bytes yes
3 local date-time, hour-minute YMDhm 6 bytes yes
4 local date-time, s YMDhms 7 bytes yes
5 local date-time, ms YMDhmS 8 bytes yes
6 local date-time, ns YMDhmsn 11 bytes yes
7 local time, seconds hms 3 bytes yes
8 offset time, minutes hmO 3 bytes yes
9 offset time, seconds hmsO 4 bytes yes
d offset date-time, seconds YMDhmsO 8 bytes yes
i instant UTC (date-time, ms) i 8 bytes yes
o offset date-time (instant, ms) iO 9 bytes yes
z zoned date-time (instant, ms) iz 8 + variable yes
I* instant+ns, UTC (date-time, ns) in 12 bytes yes
O* offset date-time+ns (instant, ns) inO 13 bytes yes
Z* zoned date-time+ns (instant, ns) inz 12 + variable yes

*When adding nanoseconds to an instant, make sure not to double-count the milliseconds present in the instant. Either zero out the ms in the instant, or subtract them from the nanoseconds value.

Note that the temporal subtype prefix is not to be confused with the field code. Although they may appear similar, they are used in different contexts. The field code refers to just a single field, while the subtype prefix specifies a group of multiple fields.

Temporal Map

The temporal map is a variable combination of temporal fields. Each field is prefixed by a 1-byte key. Any temporal fields can appear in any order. The map is terminated with a key of ‘}’.

Each field should appear only once. Behavior is undefined if a field appears more than once.

The key for each field is the same as the code from the temporal field table. The size of the field is identified in that same table.

Temporal fields with fixed size are written with the 1-byte key followed immediately by the field data.

The zone ID is a variable size string value. It is written with the key ‘z’ followed by a 1-byte size, followed by the string data.
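
To make the temporal map layout concrete, here is a small Python sketch of a writer. The function name and the FIELD_FORMATS table are hypothetical; the field sizes are taken from the temporal field table, and the sketch assumes that the map follows the temporal prefix 't' and the subtype prefix '{'.

```python
import struct

# Field code -> big-endian struct format, per the temporal field table.
FIELD_FORMATS = {"Y": ">i", "M": ">b", "D": ">b", "h": ">b", "m": ">b", "s": ">b",
                 "S": ">H", "n": ">i", "i": ">q", "o": ">i", "O": ">b"}

def encode_temporal_map(fields):
    """Encode a dict of {field code: value} as a temporal map element."""
    out = bytearray(b"t{")            # temporal prefix, then the map subtype
    for code, value in fields.items():
        out += code.encode("ascii")   # 1-byte key
        if code == "z":
            zone = value.encode("utf-8")
            if len(zone) > 255:
                raise ValueError("zone ID does not fit a 1-byte size")
            out.append(len(zone))     # 1-byte size, then the string data
            out += zone
        else:
            out += struct.pack(FIELD_FORMATS[code], value)
    out += b"}"                       # terminating key
    return bytes(out)
```

Because a dict holds each key at most once, this writer cannot emit the same field twice.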

Compact Values

The compact values are an optimized set of temporal fields intended to support the most common usage patterns. Each compact value has a fixed set of fields, so unlike the temporal map, no field prefixes are necessary.

Fields are written strictly in the order they are listed in the fields column of the temporal subtypes table.

To qualify as compact, there are some additional restrictions on the field values. Values that don’t meet these requirements will be written as a temporal map.

The basic restrictions are:

field size compact value description
year 2 bytes -32768 .. 32767 A compact year has a more limited range due to smaller storage size.
offset 1 byte round to 15 minute intervals* The compact offset is stored as 1-byte. Each increment is ±15 minutes.

*The e-NON processor will not round or otherwise truncate values that are not already whole values.
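
The compact restrictions can be illustrated with a short Python sketch. It writes a local date with subtype '2' when the year fits in 2 bytes and falls back to a temporal map otherwise. The function names are hypothetical, and the sketch assumes signed big-endian field values.

```python
import struct

def encode_local_date(year, month, day):
    """Write a local date as compact subtype '2' when possible."""
    if -32768 <= year <= 32767:
        # Compact form: 2-byte year, 1-byte month, 1-byte day, no field keys.
        return b"t2" + struct.pack(">hbb", year, month, day)
    # Fallback: temporal map with a full 4-byte year.
    return (b"t{" + b"Y" + struct.pack(">i", year)
            + b"M" + struct.pack(">b", month)
            + b"D" + struct.pack(">b", day) + b"}")

def compact_offset(offset_seconds):
    """Return the offset in 15-minute intervals, or None if it is not compact."""
    intervals, remainder = divmod(offset_seconds, 900)
    if remainder == 0 and -128 <= intervals <= 127:
        return intervals
    return None  # not a whole number of intervals; no rounding is applied
```

An offset of +05:30 (19800 seconds) maps to 22 intervals, while an offset of 100 seconds is rejected rather than rounded.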

Why not support ISO-8601?
ISO-8601 is an “everything for everyone, human-readable” format which is difficult to parse in all of its valid forms. IMHO, it’s not worth taking on that complexity for a binary data transfer protocol where human-readability is irrelevant compared to processing speed and data compactness.

The binary format above unambiguously identifies what fields are present, optimizes byte count, and requires no complex parsing or analysis of text.

ISO-8601

TBD

BLOB

Arbitrary binary data can be passed using this element. The number of bytes is specified by the meta-size size. The prefix ‘B’ is followed immediately by size bytes of data.

List

The list is analogous to the array type in JSON. It contains any number of elements, and the elements may be of different types. The number of elements is determined by the size.

Each element in the list is specified completely, including the prefix, any meta-size and/or data.

Map

This is analogous to the map type in JSON. It contains any number of key/value pairs. The number of entries is determined by the size. Each key and value is fully specified including the prefix, any meta-size and/or data.

The key of a map entry can be any element type, not limited to string values.

The map has a special map-id which appears just before the data. This is a numeric identifier that must be unique within each top-level container element (map or array). It is used to terminate cyclic references that may occur in the source data model. See Map Reference below.

Array

The array is a tightly packed series of a single fixed-size type. The fixed-size type is specified only once, immediately after the array prefix ‘(’. The number of elements in the array is determined by size.

The element-type can be any of the types enumerated in the Fixed Size Values section above.

The BLOB type, specified with the prefix ‘B’ is equivalent to a byte array specified as ‘(b’.

Boolean Array: Bitset

To create a boolean array, use the array prefix followed by the false prefix: '(0'.

A boolean array is packed into a bitset, each bit representing one of the boolean values. As a result, the transmission size of the array will be 1/8 the length of the original array, plus 1 more byte if the original array is not a multiple of 8 in length. Bits after the last value are ignored.
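
The packing rule can be sketched in Python as below. The function names are hypothetical, and the bit order (first value in the most significant bit of the first byte) is an assumption, since the text above does not specify it.

```python
def pack_bools(values):
    """Pack a sequence of booleans into ceil(len / 8) bytes."""
    out = bytearray((len(values) + 7) // 8)
    for index, flag in enumerate(values):
        if flag:
            # Assumed bit order: most significant bit first.
            out[index // 8] |= 0x80 >> (index % 8)
    return bytes(out)

def unpack_bools(data, count):
    """Read back `count` booleans; trailing padding bits are ignored."""
    return [bool(data[i // 8] & (0x80 >> (i % 8))) for i in range(count)]
```

The reader needs the element count from the array size, because the padding bits in the last byte carry no information.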

Map Reference

This is a pointer to a map which has appeared previously in the e-NON stream.

The map-id is a variable-length size code which is interpreted as a number rather than a size. Map ids cannot contain special codes.

The scope of a reference is limited to the top level container (map, array) in which it appears. The map-id must be unique within that scope.

A map-id of 0 is a special case which indicates “none”. The map-id of 0 may appear any number of times within any scope. Maps with a map-id of 0 can’t be referenced.

Glossary Reference

This is a key into the most recently defined glossary. Just like a map-id, a glossary-key is a variable-length size code, interpreted as a number. Glossary keys cannot contain special codes.

Compressed Block

A compressed block is a series of e-NON data elements that have been compressed together as a group. This allows compression to be applied to elements that otherwise would not support compression.

When decompressed, the compressed block will be parsed as if the compressed elements had appeared inline, uncompressed, at the location of the compressed block.

Encrypted Block

An encrypted block is a series of e-NON data elements that have been encrypted together as a group. This allows encryption to be applied to elements that otherwise would not support encryption.

When decrypted, the encrypted block will be parsed as if the decrypted elements had appeared inline, unencrypted, at the location of the encrypted block.

Meta-Size

Sizes

Some element types require a size to be specified. The size tells the parser how much data to read before starting the next element.

A size is specified by a series of 1, 3, or 9 bytes, depending on how big the size is. Sizes less than 251 can be specified in a single byte. Larger sizes use more bytes to define. Refer to the Size Codes table for details.

Embedded Metadata

The size sub-element has been overloaded to allow metadata to be embedded into elements. By using a special prefix (0xFB = 251) in place of the size code we can introduce metadata. Refer to the meta section to see how metadata is formatted.

Any number of metadata sub-elements can occur. Eventually, metadata elements must be followed by a size, which ends the embedded metadata.

Metadata embedded into an element applies only to that element.

Fixed-size Elements

The primitive element types have fixed sizes. To conserve space, these elements do not support a size sub-element, and therefore can’t have embedded metadata.

Meta-Size Format

The meta-size sub-element has variable length. The length is determined by the first byte. Refer to the Size Codes and Special Codes tables below.

Size Codes

A size code translates to a positive number. The size code may be 1, 3, or 9 bytes in length. The first byte is a prefix value in the range 0-250 (0x00..0xFA) or 253-255 (0xFD..0xFF). The value of the prefix determines what follows.

prefix size range description
[0x00..0xFA] [0 .. 250] For sizes <= 250, the prefix itself is the size. The size is a single unsigned byte.
0xFF (-1) [0 .. 2^16 - 1] (~65K) The next 2 bytes contain an unsigned short which specifies the size.
0xFE (-2) [0 .. 2^63 - 1] (~9*10^18) The next 8 bytes contain a signed long which specifies the size. Although this is a signed value, negative sizes are not allowed.
0xFD (-3) unbounded Unbounded length applies to the container elements list and map. The end of the container is marked by the ETB control element.

Note that there is no 4-byte size prefix.

The e-NON reader/writer is not required to support all possible sizes. Implementations may choose arbitrary maximum sizes for any element types.
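
The size codes can be sketched in Python as follows. The function names are hypothetical. The sketch always picks the shortest encoding, which is a choice of this example rather than a stated requirement, and it does not handle the unbounded code 0xFD or the special codes.

```python
import struct

def encode_size(size):
    """Encode a size as a 1-, 3-, or 9-byte size code."""
    if size < 0:
        raise ValueError("negative sizes are not allowed")
    if size <= 250:
        return bytes([size])                       # prefix is the size
    if size <= 0xFFFF:
        return b"\xff" + struct.pack(">H", size)   # unsigned short follows
    if size <= 2**63 - 1:
        return b"\xfe" + struct.pack(">q", size)   # signed long follows
    raise ValueError("size too large")

def decode_size(data, offset=0):
    """Decode a size code; return (size, offset of the next byte)."""
    prefix = data[offset]
    if prefix <= 250:
        return prefix, offset + 1
    if prefix == 0xFF:
        (size,) = struct.unpack_from(">H", data, offset + 1)
        return size, offset + 3
    if prefix == 0xFE:
        (size,) = struct.unpack_from(">q", data, offset + 1)
        if size < 0:
            raise ValueError("negative sizes are not allowed")
        return size, offset + 9
    raise ValueError("0xFB, 0xFC and 0xFD are not handled in this sketch")

def encode_string(text):
    """Encode a string element: prefix, size in bytes, UTF-8 data."""
    data = text.encode("utf-8")
    return b'"' + encode_size(len(data)) + data
```

Note that encode_string measures the size after UTF-8 encoding, so a string with non-ASCII characters has a size larger than its character count.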

Special Codes

There are two special codes for meta-size: 0xFB and 0xFC.

prefix meaning description
0xFC (-4) glossary id The next segment will be a size structure indicating the glossary-id of the element. The reader should cache the value of the element with this glossary-id. The cached value will be used when a future glossary-ref calls for this glossary-id.
0xFB (-5) metadata This code indicates that the next segment of data will be metadata. Metadata applied using the 0xFB code applies only to the data element in which it appears. Refer to the section named Meta for details of the meta structure. At the end of each metadata entry, the size is checked again for the containing element. Any size code is valid here, including another 0xFB.

 

How sizes are interpreted for each element type:

prefix type size
'N' null 0*
'0' false 0*
'1' true 0*
'+' positive ∞ 0*
'-' negative ∞ 0*
'?' NaN 0*
0x80..0xFF nano-int 0*
'b' byte 1*
's' short 2*
'i' int32 4*
'l' long (int64) 8*
'f' float 4*
'd' double float 8*
'"' string (utf-8) no. of bytes in the string
'n' number (utf-8) no. of bytes in the string
'c' calendar n/a†
'B' BLOB (bytes) no. of bytes
'[' list no. of entries
'{' map no. of key/value pairs
'(' array no. of entries
'@' map ref id of reference target
'G' glossary ref id of reference target

* size is fixed for this type and is not included in the e-NON data.
† Calendar size is determined by the header byte rather than a meta-size.

Controls

In addition to the data elements, there are control elements. The control elements do not contribute to the data content, but can affect the processing.

A control prefix character must occur in a place where a data type prefix could legally appear.

prefix meaning structure
0x02 (^B) start chunked data series (STX) 0x02
0x03 (^C) end chunked data series (ETX) 0x03
0x04 (^D) end of transmission (EOT) 0x04
0x17 (^W) end of transmission block (ETB) 0x17
0x1B (^[) escape (ESC) 0x1B(meta-code)<meta-data>

 

STX

The following elements are part of a single entity but have been split into a series of chunks. These should be stitched back together on the receiving end. Refer to the section Chunked Data Series for more detail.

Each chunk must be a valid, complete e-NON structure. The receiving end will not accept a partial structure.

ETX

The previously started series has ended and the last chunk has been received.

EOT

The end of transmission causes the processor to stop, even if there are bytes remaining. The processor will stop regardless when it reaches the end of the input stream, with or without EOT.

ETB

The end of transmission block indicates the end of an unbounded list or map. This control only applies after one of those containers has specified unbounded size (0xFD). It is undefined anywhere else and should result in an error.

ESC

The escape code introduces an independent metadata entry. This can be used to place metadata in the stream outside other elements.

Chunked Data Series

Chunks in a series must be of the same element type.

Supported types: string, byte[] (BLOB), array, list, map

Stitching behavior is defined by element type:

element type stitching behavior
string all of the strings will be concatenated to a single string.
byte[] the byte streams will be concatenated to form a single byte[] stream.
array and list: entries will be added to the same destination array or list.
map key-value pairs from all chunks will be added to a single map.
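
The stitching rules in the table can be sketched in Python, using str, bytes, list, and dict as stand-ins for decoded e-NON values. The function name is hypothetical. For maps, the sketch lets a later chunk overwrite an earlier value for the same key, which is an assumption because the text does not say how duplicate keys are handled.

```python
def stitch(chunks):
    """Combine already-decoded chunks of one series into a single value."""
    if not chunks:
        raise ValueError("a chunked series needs at least one chunk")
    first = chunks[0]
    if any(type(chunk) is not type(first) for chunk in chunks):
        raise TypeError("chunks in a series must be of the same element type")
    if isinstance(first, str):
        return "".join(chunks)        # strings are concatenated
    if isinstance(first, bytes):
        return b"".join(chunks)       # byte streams are concatenated
    if isinstance(first, list):
        merged = []
        for chunk in chunks:
            merged.extend(chunk)      # entries go to the same destination
        return merged
    if isinstance(first, dict):
        merged = {}
        for chunk in chunks:
            merged.update(chunk)      # assumed: later chunks win on duplicates
        return merged
    raise TypeError("element type does not support chunking")
```

A receiver would call this after the ETX control, once every chunk between STX and ETX has been decoded.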

Meta

A metadata entry provides additional information about the e-NON stream or about specific data elements. Some metadata entries may affect the way elements are interpreted.

Meta elements can appear in 2 ways: independent or embedded. In either case, the meta element starts with a 1-byte meta-code which indicates what type of metadata is coming. The structure and interpretation of the meta is determined by the meta-code.

Independent Meta

Independent meta elements appear anywhere a data element can appear. They are prefixed by the ESC control character (0x1B). The prefix is followed by the meta-code and data.

The scope of independent metadata is limited to the container element where it appears. When metadata appears at the top level of the stream, then it applies to the remainder of the stream.

Embedded Meta

Embedded metadata is declared in the meta-size header of a data element. Meta elements can be applied to any element which supports meta-size.

Embedded meta is introduced by the special meta-size code 0xFB.

Meta Types

meta-code meaning structure
'/' comment '/'(size)<comment-utf8>
'{' meta map '{'(size){(string-element)(element)}size
'K' key ID 'K'(size)<bytes>size
't' app-specific type information 't'(size)<type-info-utf8>
'z' compression algorithm 'z'(z-code)

Comment

A comment is arbitrary UTF-8 string data that can appear in the stream as independent or embedded metadata.

Comments are intended to provide contextual information for human readers and should not have any side-effects on data elements. The e-NON reader should not alter the content (i.e. properties) or interpretation of data elements based on the existence or content of a comment. Other meta types and control types are provided for that purpose.

A data model may include explicit properties for comments. That type of property should be represented using normal elements such as string.

Apart from keeping comments separate from data elements, there are no restrictions on how comments can be used. For example, comments may be retained and externally associated with elements. Depending on the application, comments may be logged or displayed in a UI.

An e-NON reader may choose to ignore comments entirely.

Meta Map

The meta prefix ‘{’ begins a key-value map, similar to the map data element. It contains any number of key-value pairs determined by its size. However, the meta map can only have string-type keys and cannot have embedded metadata.

A reader may interpret the key-value pairs in any way. There are no reserved keys or values. Keys that are meaningful to the e-NON specification will be assigned distinct meta types, and will not appear in the meta map.

To avoid key collision for publicly shared data, app- or vendor-specific keys should use distinct prefix and/or suffix values.

Key ID

This is a reference to an encryption key. It is not the key itself, but a reference to a key that the recipient already knows.

The key ID informs the reader that the referenced key can be used to decrypt subsequent encrypted data.

Compression Algorithm

This value indicates the compression algorithm used to compress data.

If this meta type is applied as embedded metadata, then the enclosing element will be decompressed using the supplied algorithm.

If compression algorithm is applied as an independent meta element, then all subsequent compressible elements will be decompressed using this algorithm.

Care should be taken when applying a compression algorithm as independent metadata or embedded into a container (list or map). It implies that all compressible elements within that scope have been compressed, and therefore will be decompressed during reading.

App-specific Type Information

This is a hint to an e-NON reader indicating how the following data element should be interpreted. For example, it could be a class name.

Glossary

The glossary is a collection of data elements, each of which is assigned an ID number. Once an element is entered into the glossary it can be referenced multiple times in the e-NON stream by its ID. The goal is to reduce the stream length by eliminating redundancy.

Glossary Entries

The glossary is accumulated by the e-NON reader/writer during the reading/writing process.

To enter an element into the glossary the element is tagged by the special meta-size code 0xFC followed by a size value. The size value is the glossary ID, and must be unique within a glossary.

Only elements that allow meta-size data can be placed into the glossary.

It is up to the e-NON writer to determine what elements are placed into the glossary. For example, a writer could add all strings longer than 3 chars or only strings that are also map keys.

Generally, an element would be entered into the glossary the first time it appears in the stream.

Glossary References

Once an element is entered into the glossary it may be referenced any number of times.

A glossary entry is referenced using a glossary reference element. When a glossary reference is encountered in the stream, the e-NON reader should replace it with the glossary entry defined for that ID.

The reader should produce an error when a glossary reference ID does not exist in the glossary.

The glossary does not support forward references. A glossary ID is only valid after it has been tagged to an element in the stream.
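
The reader-side behavior described above can be sketched with a small Python class. The class and method names are hypothetical. Rejecting a second definition of the same ID is this sketch's reading of the requirement that glossary IDs be unique.

```python
class Glossary:
    """Reader-side glossary: IDs are defined as tagged elements are read."""

    def __init__(self):
        self._entries = {}

    def define(self, glossary_id, value):
        """Cache an element that was tagged with the 0xFC meta-size code."""
        if glossary_id in self._entries:
            raise ValueError("glossary ID %d is already defined" % glossary_id)
        self._entries[glossary_id] = value

    def resolve(self, glossary_id):
        """Return the cached element for a glossary reference."""
        if glossary_id not in self._entries:
            # Covers unknown IDs and forward references alike.
            raise LookupError("glossary ID %d is not defined" % glossary_id)
        return self._entries[glossary_id]
```

Because entries are added in stream order, a reference that appears before its definition fails in the same way as a reference to an ID that never appears.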

Glossary Scope

The scope of a glossary, by default, is the entire length of the e-NON stream. When an element is placed into the glossary it remains there until the glossary goes out of scope.

Data Compression

Data compression may be applied to the following data element types:

The compression algorithm is set by the ‘z’ meta type.

When compression is applied, the meta-size size must be the compressed length of the data.

Encryption

Data encryption can be applied to the following element types:

Future Elements

Future Element Types

element prefix format
byte, unsigned, positive 'B' ‘B’<unsigned-byte-value>
short, unsigned, positive 'S' ‘S’<unsigned-short-value>
int, unsigned, positive 'I' ‘I’<unsigned-int-value>
long, unsigned, positive 'L' ‘L’<unsigned-long-value>

Unsigned primitive numbers

Unsigned values allow smaller storage for a wider range of numbers. Since we need to have a prefix code anyway, we can embed the concept of the sign bit into the prefix code, leaving more room

Unsigned Byte

Unsigned Short

Unsigned Int

Unsigned Long


©2018-2019 Giant Head Software, all rights reserved