UJO Binary Data Object Notation is intended as a data exchange format for network connections and for storing data in files. The name UJO is Esperanto and means container.
This text describes the UJO format. It defines how data is stored in a sequence of octets. A description of an interface to use UJO is not within the scope of this document.
Developments in recent years have significantly changed the number of different devices connected to the internet. The number of so-called "constrained devices" with limited resources such as memory and CPU power is growing rapidly. This development is driven by the "Internet of Things" and the increasing use of single-board computers for personal projects.
Text-based formats like XML or JSON have to be parsed and converted to binary data, which costs CPU power and memory. Encoding binary data as text significantly increases the volume of data to be transported over the network.
UJO uses as few bytes as possible to encode data, depending on the type. No additional names are added to any data field unless an associative array uses text keys. It is easy to implement and allows flexible data modeling through hierarchical containers.
Status of this Document
The current status of this document is “Final”. Any new features or changes in the UJO document structure will be part of a new version of this specification.
All data is stored in a sequence of octets called a document.
The byte sequence is little-endian.
int8     1 octet   (8-bit signed integer)
int16    2 octets  (16-bit signed integer)
int32    4 octets  (32-bit signed integer)
int64    8 octets  (64-bit signed integer)
uint8    1 octet   (8-bit unsigned integer)
uint16   2 octets  (16-bit unsigned integer)
uint32   4 octets  (32-bit unsigned integer)
uint64   8 octets  (64-bit unsigned integer)
float16  2 octets  (16-bit IEEE 754 floating point)
float32  4 octets  (32-bit IEEE 754 floating point)
float64  8 octets  (64-bit IEEE 754 floating point)
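As an illustration of the little-endian layout, the following Python sketch packs values of the fixed-width types with the standard `struct` module. The names `FORMATS` and `pack_scalar` are hypothetical helpers for this example and are not defined by UJO.

```python
import struct

# Hypothetical lookup table: UJO type name -> struct format code.
# The "<" prefix selects little-endian byte order without padding.
FORMATS = {
    "int8": "<b", "int16": "<h", "int32": "<i", "int64": "<q",
    "uint8": "<B", "uint16": "<H", "uint32": "<I", "uint64": "<Q",
    "float16": "<e", "float32": "<f", "float64": "<d",
}

def pack_scalar(type_name, value):
    """Return the raw octets of a value, without a UJO type identifier."""
    return struct.pack(FORMATS[type_name], value)

# The least significant octet comes first.
raw = pack_scalar("uint32", 0x12345678)
```

Here `raw` holds the four octets of the value in reversed (little-endian) order compared to the way the hexadecimal literal is written.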
A document starts with a header containing a magic sequence of octets that identifies a UJO document. The version number allows implementations to identify and support different versions of the format.
The compression value in the header is always \x00. It is reserved to be used in later versions of UJO.
The content of the document is stored in a container. A container is a combination of atomic types and other containers. A document must contain at least one container. This container is called the top level container.
document    ::= "\x5F\x55\x4A\x4F" version compression container
version     ::= int16     this version = 1
compression ::= "\x00"    uncompressed
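The header layout can be sketched in Python as follows. This is a minimal illustration and not a reference implementation; the names `MAGIC`, `build_header` and `parse_header` are assumptions of this example.

```python
import struct

MAGIC = b"\x5F\x55\x4A\x4F"   # the ASCII characters "_UJO"
VERSION = 1                   # version described in this document
COMPRESSION_NONE = 0x00       # the only value currently defined
HEADER_SIZE = 7               # 4 magic octets + int16 version + 1 octet

def build_header():
    """Return the header octets: magic, little-endian int16 version, compression."""
    return MAGIC + struct.pack("<hB", VERSION, COMPRESSION_NONE)

def parse_header(data):
    """Validate the header and return (version, offset of the top level container)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("document too short for a UJO header")
    if data[:4] != MAGIC:
        raise ValueError("magic sequence not found")
    version, compression = struct.unpack_from("<hB", data, 4)
    if compression != COMPRESSION_NONE:
        raise ValueError("unsupported compression value")
    return version, HEADER_SIZE

# Smallest document this sketch can produce: header plus an empty list.
document = build_header() + b"\x30\x00"
version, offset = parse_header(document)
```

Returning the offset lets a caller continue parsing the top level container directly after the header.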
An element is an atomic type data field, a container, or a typed null value.
element ::= atomic | container | null
A container type contains 0 or more elements. Nesting containers allows flexible data modeling.
container ::= "\x30" list
            | "\x31" map
            | "\x32" table
A list is an ordered sequence of elements. The number of elements in the list can be 0.
list ::= (element*) "\x00"
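A nested list can be assembled as shown in this Python sketch. It assumes that elements are passed in already encoded; `encode_int32` and `encode_list` are hypothetical helper names.

```python
import struct

LIST_ID = b"\x30"      # container identifier for a list
TERMINATOR = b"\x00"   # closes the list

def encode_int32(value):
    # Atomic int32: type identifier "\x06" followed by 4 little-endian octets.
    return b"\x06" + struct.pack("<i", value)

def encode_list(encoded_elements):
    """Wrap already-encoded elements (atomic values or containers) in a list."""
    return LIST_ID + b"".join(encoded_elements) + TERMINATOR

# The list [1, [2, 3]]: the inner list is simply another element.
inner = encode_list([encode_int32(2), encode_int32(3)])
outer = encode_list([encode_int32(1), inner])
empty = encode_list([])
```

Because no type identifier in this document uses the value "\x00", a reader can tell the terminator apart from the start of another element.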
A map (also called an associative array or dictionary) is a collection of key/value pairs. The key can be of any atomic type. Equal values of different types are different keys, for example 42 (int32), '42' (string) and 42 (uint32). The number of key/value pairs can be 0.
UJO allows the same key to appear multiple times in a single map. Depending on the application, it may not make sense to use this feature.
map   ::= (pair*) "\x00"
pair  ::= key value
key   ::= atomic | atomic_null
value ::= element
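The following Python sketch builds a map in which the value 42 appears as two different keys, once as int32 and once as uint32. The helper names are hypothetical, and the pairs are passed as a list of tuples rather than a Python dict so that repeated keys remain possible.

```python
import struct

def encode_int32(value):
    return b"\x06" + struct.pack("<i", value)    # type identifier + 4 octets

def encode_uint32(value):
    return b"\x0A" + struct.pack("<I", value)    # type identifier + 4 octets

def encode_bool(value):
    return b"\x0D" + (b"\x01" if value else b"\x00")

def encode_map(pairs):
    """pairs: iterable of (encoded_key, encoded_value) tuples, order preserved."""
    body = b"".join(key + value for key, value in pairs)
    return b"\x31" + body + b"\x00"

# Same numeric value, different type identifiers, therefore different keys.
encoded = encode_map([
    (encode_int32(42), encode_bool(True)),
    (encode_uint32(42), encode_bool(False)),
])
```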
A table is a set of rows and named columns. Each row must contain the same number of elements as the list of column names. The last row is terminated by "\x00".
table   ::= columns (row*) "\x00"
columns ::= (string*) "\x00"
row     ::= (element*)
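A table writer mainly has to enforce the row length rule, since rows are not delimited individually. The Python sketch below reads the grammar literally and writes each column name as a `string` production without a leading type identifier; whether an implementation expects the "\x04" identifier there is not settled by this text, so treat that choice as an assumption. All function names are hypothetical.

```python
import struct

def encode_column_name(name):
    # Literal reading of the `string` production: uint32 unit count,
    # subtype "\x01" (UTF-8), then the UTF-8 octets.
    # Assumption of this sketch: no "\x04" type identifier is written.
    raw = name.encode("utf-8")
    return struct.pack("<IB", len(raw), 0x01) + raw

def encode_table(column_names, rows):
    """rows: list of rows, each a list of already-encoded elements."""
    for index, row in enumerate(rows):
        if len(row) != len(column_names):
            raise ValueError(
                f"row {index} has {len(row)} elements, "
                f"expected {len(column_names)}"
            )
    columns = b"".join(encode_column_name(n) for n in column_names) + b"\x00"
    body = b"".join(cell for row in rows for cell in row)
    return b"\x32" + columns + body + b"\x00"

table = encode_table(
    ["id", "ok"],
    [
        [b"\x0C\x01", b"\x0D\x01"],   # uint8 1, boolean True
        [b"\x0C\x02", b"\x0D\x00"],   # uint8 2, boolean False
    ],
)
```

A reader would recover the row boundaries by counting elements against the number of column names.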
Atomic types start with a type identifier and do not contain any other element.
atomic ::= "\x01" float64    double precision float (8+1 octets)
         | "\x02" float32    single precision float (4+1 octets)
         | "\x03" float16    half precision float (2+1 octets)
         | "\x04" string     UTF-8, UTF-16, UTF-32 or cstring
         | "\x05" int64      64 bit signed integer (8+1 octets)
         | "\x06" int32      32 bit signed integer (4+1 octets)
         | "\x07" int16      16 bit signed integer (2+1 octets)
         | "\x08" int8       8 bit signed integer (1+1 octets)
         | "\x09" uint64     64 bit unsigned integer (8+1 octets)
         | "\x0A" uint32     32 bit unsigned integer (4+1 octets)
         | "\x0B" uint16     16 bit unsigned integer (2+1 octets)
         | "\x0C" uint8      8 bit unsigned integer (1+1 octets)
         | "\x0D" boolean    True or False (2 octets)
         | "\x0E" binary     binary data
         | "\x0F"            None (Null, Void)
         | "\x10" int64      UNIX datetime (8+1 octets)
         | "\x11" date       date (4+1 octets)
         | "\x12" time       time (3+1 octets)
         | "\x13" timestamp  timestamp (9+1 octets)
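Reading a numeric atomic value amounts to looking up the type identifier and unpacking the octets that follow it. The Python sketch below covers only the fixed-width numeric types; `NUMERIC_TYPES` and `decode_numeric` are hypothetical names.

```python
import struct

# Type identifier -> struct format, taken from the atomic grammar.
# Only the fixed-width numeric types are covered in this sketch.
NUMERIC_TYPES = {
    0x01: "<d", 0x02: "<f", 0x03: "<e",
    0x05: "<q", 0x06: "<i", 0x07: "<h", 0x08: "<b",
    0x09: "<Q", 0x0A: "<I", 0x0B: "<H", 0x0C: "<B",
}

def decode_numeric(data, offset=0):
    """Decode one numeric atomic value and return (value, next_offset)."""
    fmt = NUMERIC_TYPES.get(data[offset])
    if fmt is None:
        raise ValueError("not a fixed-width numeric type identifier")
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, offset + 1 + struct.calcsize(fmt)

# Two values back to back: int16 -1 followed by uint8 255.
stream = b"\x07\xff\xff\x0c\xff"
first, position = decode_numeric(stream)
second, position = decode_numeric(stream, position)
```

Returning the next offset makes it straightforward to walk through consecutive elements of a container.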
Date and Time
A UNIX datetime is defined as the number of seconds since 00:00:00 UTC, 1 January 1970. Negative values express times before this date. Besides a UNIX datetime, date and time can be expressed as a sequence of integer values. The UJO timestamp type is intended for applications that need a resolution of 1 millisecond.
date      ::= year month day
time      ::= hour minute second
datetime  ::= date time
timestamp ::= date time millisec
year      ::= int16     negative values are BC
month     ::= uint8     values 1-12
day       ::= uint8     values 1-31
hour      ::= uint8     values 0-23
minute    ::= uint8     values 0-59
second    ::= uint8     values 0-61
millisec  ::= uint16    values 0-999
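The date and time layouts map directly onto `struct` format strings, as this Python sketch shows. The function names are hypothetical, and the sketch performs no range checking of the field values.

```python
import struct
from datetime import datetime

def encode_unix_datetime(seconds):
    # "\x10" followed by a little-endian int64 (8+1 octets).
    return b"\x10" + struct.pack("<q", seconds)

def encode_date(year, month, day):
    # "\x11" followed by int16 year, uint8 month, uint8 day (4+1 octets).
    return b"\x11" + struct.pack("<hBB", year, month, day)

def encode_time(hour, minute, second):
    # "\x12" followed by three uint8 fields (3+1 octets).
    return b"\x12" + struct.pack("<BBB", hour, minute, second)

def encode_timestamp(dt):
    """Encode a datetime.datetime with millisecond resolution (9+1 octets)."""
    return b"\x13" + struct.pack(
        "<hBBBBBH",
        dt.year, dt.month, dt.day,
        dt.hour, dt.minute, dt.second,
        dt.microsecond // 1000,   # microseconds truncated to milliseconds
    )

stamp = encode_timestamp(datetime(2024, 2, 29, 12, 0, 0, 500000))
```

Python's `datetime` cannot represent years before 1, so negative (BC) years would have to be passed to `encode_date` as plain integers.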
UJO supports \x00 terminated C strings with one byte per character, as well as Unicode strings. Unicode strings are not \x00 terminated. The length of a Unicode string is not the number of bytes. Depending on the format, each unit is 8, 16 or 32 bits wide. UJO stores the number of units as the length of the string.
string  ::= strsize strsub (unit*)
strsize ::= uint32    number of units
strsub  ::= "\x00"    8 bit "\x00" terminated string (cstring)
          | "\x01"    UTF-8
          | "\x02"    UTF-16
          | "\x03"    UTF-32
          | "\x80"    User defined (\x80 - \xFF)
unit    ::= uint8 | uint16 | uint32
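The difference between units and bytes can be seen in the following Python sketch for the three Unicode subtypes. It assumes that 16-bit and 32-bit units are written little-endian, in line with the byte order stated for the document as a whole; the cstring subtype is left out. `SUBTYPES` and `encode_string` are hypothetical names.

```python
import struct

# Encoding name -> (strsub value, Python codec without BOM, octets per unit).
# Little-endian codecs are an assumption based on the document byte order.
SUBTYPES = {
    "utf-8": (0x01, "utf-8", 1),
    "utf-16": (0x02, "utf-16-le", 2),
    "utf-32": (0x03, "utf-32-le", 4),
}

def encode_string(text, encoding="utf-8"):
    """Return "\\x04", the unit count, the subtype and the encoded units."""
    subtype, codec, unit_size = SUBTYPES[encoding]
    raw = text.encode(codec)
    units = len(raw) // unit_size          # strsize counts units, not octets
    return b"\x04" + struct.pack("<IB", units, subtype) + raw

# U+00E9 needs two UTF-8 units but only one UTF-16 or UTF-32 unit.
as_utf8 = encode_string("\u00e9", "utf-8")
as_utf16 = encode_string("\u00e9", "utf-16")
as_utf32 = encode_string("\u00e9", "utf-32")
```

A character outside the Basic Multilingual Plane, such as U+1F600, would count as two units in UTF-16 because it is stored as a surrogate pair.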
The Boolean values True and False are each encoded in a single octet.
boolean ::= "\x00"    False
          | "\x01"    True
The binary type is a sequence of octets. The type of data is either a generic binary or defined by a specific subtype.
binary  ::= binsize binsub (uint8*)
binsize ::= uint32    number of octets
binsub  ::= "\x00"    Generic binary subtype
          | "\x01"    UJO document
          | "\x80"    User defined (\x80 - \xFF)
The null value is defined for any type to express emptiness. It differs from None or void in that it carries type information. None is a type of its own and is not suitable to define, for example, an empty string.
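A binary field, including one that embeds another UJO document, could be written as in this Python sketch. The name `encode_binary` and the subtype check are assumptions of the example.

```python
import struct

GENERIC = 0x00        # generic binary subtype
UJO_DOCUMENT = 0x01   # payload is itself a UJO document

def encode_binary(payload, subtype=GENERIC):
    """Return "\\x0E", the uint32 octet count, the subtype and the payload."""
    user_defined = 0x80 <= subtype <= 0xFF
    if subtype not in (GENERIC, UJO_DOCUMENT) and not user_defined:
        raise ValueError("subtype is not defined by the grammar")
    return b"\x0E" + struct.pack("<IB", len(payload), subtype) + payload

# Embed a complete document (header plus empty list) as binary data.
inner_document = b"\x5F\x55\x4A\x4F\x01\x00\x00\x30\x00"
embedded = encode_binary(inner_document, UJO_DOCUMENT)
blob = encode_binary(b"\xde\xad")
```

Since the size is stored up front, a reader can skip a binary field without inspecting its payload.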
null ::= atomic_null
atomic_null ::= "\x81"    empty float64 (double)
              | "\x82"    empty float32 (single)
              | "\x83"    empty float16 (half)
              | "\x84"    empty string
              | "\x85"    empty signed integer64
              | "\x86"    empty signed integer32
              | "\x87"    empty signed integer16
              | "\x88"    empty signed integer8
              | "\x89"    empty unsigned integer64
              | "\x8A"    empty unsigned integer32
              | "\x8B"    empty unsigned integer16
              | "\x8C"    empty unsigned integer8
              | "\x8D"    empty boolean
              | "\x8E"    empty binary
              | "\x90"    empty UTC UNIX datetime
              | "\x91"    empty UTC date
              | "\x92"    empty UTC time
              | "\x93"    empty timestamp
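Comparing the listed values with the atomic type identifiers, each typed null appears to be the corresponding identifier with the highest bit set, and None ("\x0F") has no null counterpart. The Python sketch below relies on that observation; it is a pattern read from the listed values, not a rule stated in this document, and the function names are hypothetical.

```python
NONE_ID = 0x0F              # None has no typed null in the list
FIRST_ID, LAST_ID = 0x01, 0x13

def null_id_for(type_id):
    """Return the typed null identifier matching an atomic type identifier."""
    if not FIRST_ID <= type_id <= LAST_ID or type_id == NONE_ID:
        raise ValueError("no typed null is listed for this identifier")
    return type_id | 0x80   # observed pattern: highest bit set

def is_typed_null(octet):
    """True if the octet is one of the listed atomic_null values."""
    base = octet & 0x7F
    return bool(octet & 0x80) and FIRST_ID <= base <= LAST_ID and base != NONE_ID

empty_string = bytes([null_id_for(0x04)])   # a single octet, no payload
```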
The recommended file extension for UJO data is *.ujo.