Design Notes

Goals

KISS

The core features of e-NON are intended to provide a data transfer format that is somewhat enhanced compared to text-based formats such as JSON and XML. By remaining simple (compared to GIOP, for example), e-NON aims to bring the benefits of JSON to systems communicating over non-HTTP transports.

Whatever complexity has been introduced (hopefully manageable) is there for the sake of flexibility and performance.

Efficiency

Of course, JSON and XML can be serialized over non-HTTP channels, but if we’re not limited to ASCII or human readability, why bloat the data with Base64 encoding? Why expend the computing resources to marshal and unmarshal everything to strings?
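
As a rough illustration of the Base64 overhead, the following sketch compares the size of a binary payload with the size of a JSON document carrying the same payload as Base64 text. The payload contents and the field name `blob` are arbitrary assumptions made for this example; nothing here represents the e-NON wire format.

```python
import base64
import json


def json_base64_size(payload: bytes) -> int:
    """Return the size in bytes of a JSON document carrying payload as Base64."""
    encoded = base64.b64encode(payload).decode("ascii")
    # "blob" is a made-up field name used only for this illustration.
    return len(json.dumps({"blob": encoded}).encode("utf-8"))


payload = bytes(range(256)) * 12  # 3072 bytes of arbitrary binary data
raw_size = len(payload)
text_size = json_base64_size(payload)

print(f"raw: {raw_size} bytes, JSON + Base64: {text_size} bytes")
```

Base64 emits 4 characters for every 3 input bytes, so the text form is at least a third larger than the raw bytes, before counting the time spent encoding and decoding.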

Constrained freedom

There’s a degree of freedom allowed by the e-NON specification. It’s not assumed that an application’s data must be interpreted by all readers on the planet.

If you are an application developer then you probably have control over the sender or receiver of your data, maybe both. You probably have some control over the channels where your data exists, where it comes from and where it goes.

For that reason, most applications don’t have to conform their data to universal standards. Data just needs to be meaningful to a target audience. Accordingly, e-NON does not try to define semantics, but gives license to the data owner to do so.

Specifically:

e-NON allows comments, which JSON forbids.
There is support for arbitrary metadata fields which don’t need approval from an international standards body.

The idea is that developers should be able to adapt e-NON to their applications, not the other way around.

For applications that require broad, open support for a larger community, the e-NON specification aims to provide enough guidance for implementors to rely on for consistency. Such applications can avoid the customizable “playground” areas of the format.

If maximum compatibility is required, implementations should avoid these features:

char(8-bit): There are too many variables in the interpretation of these values. Use the UTF-8 char instead.
meta map: There is intentionally no official definition for these values. However, unofficial definitions can be adopted. To the extent that these definitions are supported, that may be good enough.

For a lowest-common-denominator approach, data can be constrained to the e-NON-0 level. This level is the easiest to implement when no implementation is otherwise available for a given language or environment.

If the current e-NON spec fails to meet some need, there is plenty of room for future expansion.

Declarative Notation

declarative notation: a data block is preceded by the size of the block. There is no delimiter at the end.

bracket delimiters: character symbols that enclose a block of data by marking the beginning and end of the data.

A form of declarative notation was chosen for performance and other practical purposes.

Using declarative notation eliminates a few problems long associated with bracket delimiters:

delimiter collision: Since e-NON is a binary-friendly format, the ability to put any byte value into the stream takes precedence over the need to visually confirm data boundaries. None of the delimiter collision workarounds are necessary.
delimiter scanning: With bracket delimiters, a reader has to examine every byte to find the end delimiter. With a declared size, the reader knows in advance where the block ends.
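
The two points above can be sketched with a minimal size-prefixed reader. The 4-byte big-endian size field and the function names `write_block` and `read_block` are assumptions made for this example; the actual e-NON size encoding is defined by the specification and may differ.

```python
import struct


def write_block(payload: bytes) -> bytes:
    """Prefix payload with a hypothetical 4-byte big-endian size."""
    return struct.pack(">I", len(payload)) + payload


def read_block(stream: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Read one size-prefixed block and return (payload, next_offset)."""
    (size,) = struct.unpack_from(">I", stream, offset)
    start = offset + 4
    end = start + size
    if end > len(stream):
        raise ValueError("truncated block")
    # The payload is sliced out directly; no byte is inspected for delimiters.
    return stream[start:end], end


# This payload contains bytes that would collide with bracket delimiters.
tricky = b'{"}]\x00'
stream = write_block(tricky) + write_block(b"next")

first, pos = read_block(stream)
second, pos = read_block(stream, pos)
```

Because the reader jumps straight to the end of each block, the payload may contain any byte value without escaping, and skipping an unwanted block costs one size read rather than a scan.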

Character Type

The character type, UTF-8, is provided to support languages that have character types. The dedicated type codes, ‘a’ and ‘c’, provide an efficient way to inform the e-NON processor how to cast these values. Another type could carry the data (byte, int, BLOB, string, etc.) but the processor would need to be informed that char is the intended type. This could be done using metadata, but that would be verbose.

Array Type

The array type is provided as an efficient way to pack values of the same type. Because the list type allows elements of mixed types, it’s necessary to specify the type for every item. This extra byte per item is expensive for primitive values, which are 8 bytes or smaller.

For an array, the item type is specified just once. The items are then packed with no gaps. This should make the array efficient for large homogeneous data sets.
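
The difference in overhead can be sketched as follows. The type code `b"i"`, the 32-bit big-endian integer layout, and the omission of any count or size prefix are simplifying assumptions for this example, not details taken from the e-NON specification.

```python
import struct

INT32 = b"i"  # hypothetical type code, used only for this illustration


def pack_list(values: list[int]) -> bytes:
    """List-style packing: every item carries its own type byte."""
    return b"".join(INT32 + struct.pack(">i", v) for v in values)


def pack_array(values: list[int]) -> bytes:
    """Array-style packing: one type byte, then the items with no gaps."""
    return INT32 + struct.pack(f">{len(values)}i", *values)


values = list(range(1000))
list_bytes = pack_list(values)
array_bytes = pack_array(values)

print(len(list_bytes), len(array_bytes))
```

For 1000 four-byte integers, the per-item type byte adds 1000 bytes in the list form, while the array form pays for the type only once. The relative saving grows as the item size shrinks.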

In a similar fashion, non-primitive types could be put into arrays, but the savings will be less pronounced. Non-primitive types have other overhead such as size and metadata. Also, non-primitive types can have null values which would have to be checked, and dealt with in some way.

No Character Array

The e-NON format does not have an explicit type for char[].

Glossary

The e-NON glossary entries are explicitly assigned IDs by using the special meta-size prefix 0xFC. Other glossary implementations use implicit entries. For example, the stringref extension to CBOR enters strings into a glossary based on their order of first appearance in the stream.

The disadvantage of the implicit method is that it requires a reader to accumulate strings in a glossary even if they never appear again in the stream. In a long stream this could lead to arbitrarily large amounts of data being retained.

The explicit entry method used by e-NON requires at least 2 extra bytes to make a glossary entry. However, it does have some valuable advantages:

The use of the glossary is determined by the stream itself. The reader doesn’t need to start caching strings just in case they might be referenced later in the stream. In fact, a stream may choose to have no glossary entries at all, in which case there is no glossary overhead, and the reader retains no glossary entries.
The glossary may contain values other than strings. The e-NON writer can add arbitrary items to the glossary by tagging them with glossary IDs.
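
The difference in reader memory can be sketched with two toy readers. The stream is modeled here as plain Python values rather than bytes, and the names `read_implicit` and `read_explicit` and the `(glossary_id, value)` tuple layout are assumptions made for this example; they do not describe the 0xFC encoding itself.

```python
def read_implicit(strings):
    """Implicit glossary: every distinct string is retained in case it recurs."""
    glossary = {}
    for s in strings:
        # The ID is the order of first appearance in the stream.
        glossary.setdefault(s, len(glossary))
    return glossary


def read_explicit(items):
    """Explicit glossary: retain only values the writer tagged with an ID.

    Each item is (glossary_id, value); glossary_id is None for untagged values.
    """
    glossary = {}
    for glossary_id, value in items:
        if glossary_id is not None:
            glossary[glossary_id] = value
    return glossary


# A long stream of unique strings plus one value the writer knows will repeat.
strings = [f"row-{n}" for n in range(10_000)] + ["status"]
implicit = read_implicit(strings)

# The writer tags only what it intends to reference, including a non-string.
items = [(None, s) for s in strings[:-1]] + [(0, "status"), (1, (1, 2, 3))]
explicit = read_explicit(items)

print(len(implicit), len(explicit))
```

The implicit reader ends up holding every distinct string in the stream, while the explicit reader holds only the two tagged values.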
