UJO Version 1

UJO Binary Data Object Notation

Specification version 1.

UJO Binary Data Object Notation is intended to be used as a data exchange format over network connections and for storing data in files. The name UJO is Esperanto and means container.

This text describes the UJO format. It defines how data is stored in a sequence of octets. A description of an interface to use UJO is not within the scope of this document.

Motivation

New developments in recent years have significantly changed the number of different devices connected to the internet. The number of so-called “constrained devices” with limited resources such as memory and CPU power is growing rapidly. This development is driven by the “Internet of Things” and the increasing use of single-board computers for personal projects.

Text based formats like XML or JSON have to be parsed and converted to binary data using CPU power and memory. Encoding binary data to text increases the volume of the data to be transported on the network significantly.

UJO uses as few bytes as possible to encode data, depending on the type. No additional names are added to any data field unless an associative array uses text keys. It is easy to implement and allows flexible data modeling through hierarchical containers.

Status of this Document

The current status of this document is “Final”. Any new features or changes in the UJO document structure will be part of a new version of this specification.

Specification

All data is stored in a sequence of octets called a document.

Fundamental Types

Multi-octet values are stored in little-endian byte order.

int8    1  octet  (8-bit signed integer)
int16   2  octets (16-bit signed integer)
int32   4  octets (32-bit signed integer)
int64   8  octets (64-bit signed integer)
uint8   1  octet  (8-bit unsigned integer)
uint16  2  octets (16-bit unsigned integer)
uint32  4  octets (32-bit unsigned integer)
uint64  8  octets (64-bit unsigned integer)
float16 2  octets (16-bit IEEE 754 floating point)
float32 4  octets (32-bit IEEE 754 floating point)
float64 8  octets (64-bit IEEE 754 floating point)
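
As an illustration, the fundamental types map onto the format codes of Python's struct module when the little-endian prefix "<" is used. The names FORMATS, pack_value and unpack_value are hypothetical and exist only for this sketch; they are not part of UJO.

```python
import struct

# Hypothetical lookup table: fundamental type name -> struct format code.
# The "<" prefix selects little-endian byte order without padding.
FORMATS = {
    "int8": "<b", "int16": "<h", "int32": "<i", "int64": "<q",
    "uint8": "<B", "uint16": "<H", "uint32": "<I", "uint64": "<Q",
    "float16": "<e", "float32": "<f", "float64": "<d",
}

def pack_value(type_name, value):
    """Return the octets that represent value as the given fundamental type."""
    return struct.pack(FORMATS[type_name], value)

def unpack_value(type_name, data):
    """Read one value of the given fundamental type from data."""
    return struct.unpack(FORMATS[type_name], data)[0]
```

For example, pack_value("int16", 1) yields the two octets \x01 \x00, with the least significant octet first.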

Document

A document starts with a header containing a magic sequence of octets to identify an UJO document. The version number allows implementations to identify and support different versions of the format.

The compression value in the header is always \x00. It is reserved to be used in later versions of UJO.

The content of the document is stored in a container. A container is a combination of atomic types and other containers. A document must contain at least one container. This container is called the top level container.

document      ::= "\x5F\x55\x4A\x4F" version compression container
version       ::= int16   this version = 1
compression   ::= "\x00"  uncompressed
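
A minimal sketch of writing and checking the header could look as follows. The function names build_header and parse_header are assumptions made for this example. The sketch rejects any compression value other than \x00, since no other value is defined in this version.

```python
import struct

MAGIC = b"\x5F\x55\x4A\x4F"      # the four magic octets, ASCII "_UJO"
VERSION = 1                      # this version of the specification
COMPRESSION_NONE = b"\x00"       # the only value defined in version 1

def build_header():
    """Return the 7 header octets that precede the top level container."""
    return MAGIC + struct.pack("<h", VERSION) + COMPRESSION_NONE

def parse_header(data):
    """Validate the header; return (version, offset of the container)."""
    if len(data) < 7 or data[:4] != MAGIC:
        raise ValueError("not a UJO document")
    (version,) = struct.unpack_from("<h", data, 4)
    if data[6:7] != COMPRESSION_NONE:
        raise ValueError("unsupported compression value")
    return version, 7
```

With this header, the smallest document holding an empty list is build_header() followed by the octets \x30 \x00.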

Element

An element is either an atomic type data field or a container.

element ::= atomic
         | container
         | null

Container Types

A container type contains 0 or more elements. Nesting containers allows flexible data modeling.

container  ::= "\x30" list
             | "\x31" map
             | "\x32" table

List

A list is an ordered sequence of elements. The number of elements in the list can be 0.

list ::= (element*) "\x00"
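
The list grammar can be sketched as follows. The helper names are hypothetical. The sketch uses the int32 atomic type ("\x06", see Atomic Types) for the elements and takes already encoded elements as input, so lists can be nested.

```python
import struct

LIST_ID = b"\x30"       # container type identifier for a list
TERMINATOR = b"\x00"    # closes the element sequence
INT32_ID = b"\x06"      # atomic type identifier for int32

def encode_int32(value):
    """Encode one int32 element: type identifier plus 4 little-endian octets."""
    return INT32_ID + struct.pack("<i", value)

def encode_list(encoded_elements):
    """Wrap already encoded elements in a list container."""
    return LIST_ID + b"".join(encoded_elements) + TERMINATOR
```

An empty list is therefore just \x30 \x00, and a list containing an empty list is \x30 \x30 \x00 \x00.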

Map

A map (also called an associative array or dictionary) is a collection of key/value pairs. The key can be of any atomic type. Equal-looking values of different types are different keys; for example, 42 (int32), '42' (string) and 42 (uint32) are three distinct keys. The number of key/value pairs can be 0.

UJO allows the same key to appear multiple times in a single map. Depending on the application, using this feature may not make sense.

map   ::= (pair*) "\x00" 
pair  ::= key value
key   ::= atomic
        | atomic_null
value ::= element
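
The following sketch shows why keys of different types stay distinct: the type identifier is part of the encoded key, so 42 as int32 and 42 as uint32 produce different octets. All function names are assumptions for illustration. The sketch keeps pairs in the given order and does not reject duplicate keys, since the format allows them.

```python
import struct

MAP_ID = b"\x31"        # container type identifier for a map
TERMINATOR = b"\x00"    # closes the sequence of pairs

def encode_int32(value):
    return b"\x06" + struct.pack("<i", value)

def encode_uint32(value):
    return b"\x0A" + struct.pack("<I", value)

def encode_map(pairs):
    """pairs: iterable of (encoded_key, encoded_value) tuples, kept in order."""
    body = b"".join(key + value for key, value in pairs)
    return MAP_ID + body + TERMINATOR
```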

Table

A table is a set of rows and named columns. Each row must contain the same number of elements as the list of column names. The last row is terminated by "\x00".

table   ::= columns (row*) "\x00"
columns ::= (string*) "\x00"
row     ::= (element*)
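
A sketch of the table layout, under stated assumptions: the function encode_table is hypothetical, and it takes column names and row elements that are already encoded. The sketch leaves the exact octets of a column name to the caller and only enforces the rule that every row has as many elements as there are columns.

```python
TABLE_ID = b"\x32"      # container type identifier for a table
TERMINATOR = b"\x00"    # closes the column list and the row sequence

def encode_table(encoded_columns, rows):
    """encoded_columns: list of already encoded column names.
    rows: list of rows, each a list of already encoded elements."""
    for index, row in enumerate(rows):
        if len(row) != len(encoded_columns):
            raise ValueError(
                f"row {index} has {len(row)} elements, "
                f"expected {len(encoded_columns)}"
            )
    header = b"".join(encoded_columns) + TERMINATOR
    body = b"".join(b"".join(row) for row in rows) + TERMINATOR
    return TABLE_ID + header + body
```

Because rows carry no separator of their own, a reader has to count elements against the number of columns to find the row boundaries.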

Atomic Types

Atomic types start with a type identifier and do not contain any other element.

atomic ::= "\x01" float64       double precision float (8+1 octets)
         | "\x02" float32       single precision float (4+1 octets)
         | "\x03" float16       half precision float (2+1 octets)
         | "\x04" string        UTF-8, UTF-16, UTF-32 or cstring
         | "\x05" int64         64 bit signed integer (8+1 octets)
         | "\x06" int32         32 bit signed integer (4+1 octets)
         | "\x07" int16         16 bit signed integer (2+1 octets)
         | "\x08" int8          8 bit signed integer (1+1 octets)
         | "\x09" uint64        64 bit unsigned integer (8+1 octets)
         | "\x0A" uint32        32 bit unsigned integer (4+1 octets)
         | "\x0B" uint16        16 bit unsigned integer (2+1 octets)
         | "\x0C" uint8         8 bit unsigned integer (1+1 octets)
         | "\x0D" boolean       True or False (1+1 octets)
         | "\x0E" binary        binary data
         | "\x0F" None          (Null, Void)
         | "\x10" int64         UNIX datetime (8+1 octets)
         | "\x11" date          date (4+1 octets)
         | "\x12" time          time (3+1 octets)
         | "\x13" timestamp     timestamp (9+1 octets)
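
For the fixed-size numeric types, the "n+1 octets" in the table above are one type identifier followed by n payload octets. A minimal sketch, with the hypothetical names NUMERIC_ATOMICS, encode_numeric and decode_numeric:

```python
import struct

# Type identifier -> struct format code, taken from the atomic type table.
NUMERIC_ATOMICS = {
    0x01: "<d", 0x02: "<f", 0x03: "<e",
    0x05: "<q", 0x06: "<i", 0x07: "<h", 0x08: "<b",
    0x09: "<Q", 0x0A: "<I", 0x0B: "<H", 0x0C: "<B",
    0x10: "<q",  # UNIX datetime, stored as int64
}

def encode_numeric(type_id, value):
    """Return the type identifier followed by the little-endian payload."""
    return bytes([type_id]) + struct.pack(NUMERIC_ATOMICS[type_id], value)

def decode_numeric(data, offset=0):
    """Return (value, next offset) for the numeric atomic at offset."""
    fmt = NUMERIC_ATOMICS[data[offset]]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, offset + 1 + struct.calcsize(fmt)
```

Returning the next offset lets a reader walk through a container one element at a time.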

Date and Time

A UNIX datetime is defined as the number of seconds since 00:00:00 UTC, 1 January 1970. Negative values express times before that date. Besides a UNIX datetime, date and time can be expressed as a sequence of integer values. The UJO timestamp type is intended for applications that need a resolution of 1 millisecond.

date      ::= year month day  
time      ::= hour minute second
datetime  ::= date time 
timestamp ::= date time millisec
 
year      ::= int16 negative values are BC
month     ::= uint8 values 1-12
day       ::= uint8 values 1-31   
hour      ::= uint8 values 0-23
minute    ::= uint8 values 0-59
second    ::= uint8 values 0-61
millisec  ::= uint16 values 0-999
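
The field sequences above can be sketched with struct format strings. The function names are hypothetical, and the sketch does not validate value ranges. The "<" prefix disables padding, so the payload sizes come out as 4, 3 and 9 octets.

```python
import struct

def encode_date(year, month, day):
    """date: int16 year, uint8 month, uint8 day (4+1 octets)."""
    return b"\x11" + struct.pack("<hBB", year, month, day)

def encode_time(hour, minute, second):
    """time: three uint8 fields (3+1 octets)."""
    return b"\x12" + struct.pack("<BBB", hour, minute, second)

def encode_timestamp(year, month, day, hour, minute, second, millisec):
    """timestamp: date, time and a uint16 millisecond field (9+1 octets)."""
    return b"\x13" + struct.pack(
        "<hBBBBBH", year, month, day, hour, minute, second, millisec
    )
```

For example, the date 29 February 2024 is encoded as \x11 \xE8 \x07 \x02 \x1D, where \xE8 \x07 is the year 2024 in little-endian order.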

Strings

UJO supports \x00 terminated C strings with one byte per character, as well as Unicode strings. Unicode strings are not \x00 terminated. The length of a Unicode string is not the number of bytes: depending on the format, each unit is 8, 16 or 32 bits wide, and UJO stores the number of units as the length of the string.

string  ::= strsize strsub (unit*)
strsize ::= uint32    number of units
strsub  ::= "\x00"   8 bit "\x00" terminated string (cstring)
          | "\x01"   UTF-8
          | "\x02"   UTF-16
          | "\x03"   UTF-32
          | "\x80"   User defined (\x80 - \xFF)
unit    ::= uint8
          | uint16
          | uint32
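
A sketch of encoding Unicode strings, showing that strsize counts units rather than octets or characters. Two assumptions are made here: the names ENCODINGS and encode_string are hypothetical, and the 16 and 32 bit units are written in little-endian order, in line with the fundamental types. The cstring subtype is not covered.

```python
import struct

STRING_ID = b"\x04"     # atomic type identifier for a string

# strsub -> (Python codec, octets per unit). Little-endian codecs are an
# assumption based on the byte order of the fundamental types.
ENCODINGS = {
    0x01: ("utf-8", 1),
    0x02: ("utf-16-le", 2),
    0x03: ("utf-32-le", 4),
}

def encode_string(text, strsub=0x01):
    """Encode text as: type identifier, strsize, strsub, units."""
    codec, unit_size = ENCODINGS[strsub]
    payload = text.encode(codec)
    units = len(payload) // unit_size   # number of units, not octets
    return STRING_ID + struct.pack("<I", units) + bytes([strsub]) + payload
```

A single character outside the Basic Multilingual Plane, such as U+1F600, has a strsize of 4 in UTF-8, 2 in UTF-16 and 1 in UTF-32.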

Boolean

The Boolean values True and False are encoded in a single octet.

boolean  ::= "\x00"    False
           | "\x01"    True

Binary

The binary type is a sequence of octets. The type of data is either a generic binary or defined by a specific subtype.

binary   ::= binsize binsub (uint8*)
binsize  ::= uint32   number of octets
binsub   ::= "\x00"   Generic binary subtype
           | "\x01"   UJO document
           | "\x80"   User defined (\x80 - \xFF)
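
The binary layout can be sketched as a length-prefixed payload. The function names encode_binary and decode_binary are assumptions for this example.

```python
import struct

BINARY_ID = b"\x0E"     # atomic type identifier for binary data

def encode_binary(payload, binsub=0x00):
    """Encode payload as: type identifier, binsize, binsub, octets."""
    return BINARY_ID + struct.pack("<I", len(payload)) + bytes([binsub]) + payload

def decode_binary(data, offset=0):
    """Return (binsub, payload, next offset) for the binary at offset."""
    if data[offset:offset + 1] != BINARY_ID:
        raise ValueError("not a binary element")
    (size,) = struct.unpack_from("<I", data, offset + 1)
    binsub = data[offset + 5]
    start = offset + 6
    return binsub, data[start:start + size], start + size
```

With subtype \x01, the payload can itself be a complete UJO document, which allows one document to be embedded in another.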

Null Types

The null value is defined for any type to express emptiness. Unlike None or void, a null value carries type information. None is a type of its own and is not suitable for expressing, for example, an empty string.

null ::= atomic_null

atomic_null ::= "\x81"  empty float64 (double)
              | "\x82"  empty float32 (single)
              | "\x83"  empty float16 (half)
              | "\x84"  empty string
              | "\x85"  empty signed integer64
              | "\x86"  empty signed integer32
              | "\x87"  empty signed integer16
              | "\x88"  empty signed integer8
              | "\x89"  empty unsigned integer64
              | "\x8A"  empty unsigned integer32
              | "\x8B"  empty unsigned integer16
              | "\x8C"  empty unsigned integer8
              | "\x8D"  empty boolean
              | "\x8E"  empty binary
              | "\x90"  empty UTC UNIX datetime
              | "\x91"  empty UTC date
              | "\x92"  empty UTC time
              | "\x93"  empty timestamp
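
In the table above, every null identifier equals the corresponding atomic type identifier with the highest bit set, for example "\x04" (string) and "\x84" (empty string). The grammar lists the values without stating this as a rule, so the following sketch treats it as an observation; the function names are hypothetical.

```python
NULL_FLAG = 0x80    # observed offset between "\x04" and "\x84", and so on
NONE_ID = 0x0F      # None has no null counterpart ("\x8F" is not listed)

def null_id_for(type_id):
    """Return the atomic_null identifier for an atomic type identifier."""
    if type_id == NONE_ID or not 0x01 <= type_id <= 0x13:
        raise ValueError(f"no null identifier for type 0x{type_id:02X}")
    return type_id | NULL_FLAG

def is_null_id(type_id):
    """True if type_id is one of the atomic_null identifiers listed above."""
    return 0x81 <= type_id <= 0x93 and type_id != 0x8F
```

A null element consists of the identifier alone, so a null string is the single octet \x84, while None is the single octet \x0F.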

File Extension

The recommended file extension for UJO data is *.ujo.