@jtgrassie pointed out that the example I provided was colored incorrectly. He also made the good point that the image wasn't easy to review/correct. I reworked the example so that it's text-only. It's easier to review and edit, and reveals the structure better in my opinion. Also this is easier for people who can't distinguish colors as easily. Make sure to double-check this work because there's a decent chance I screwed up the comments. The actual byte data was generated and should be solid.
Portable Storage Format
Background
Monero makes use of a set of helper classes from a small library named epee. Part of this library implements a networking protocol called Levin, which internally uses a storage format called Portable Storage. This format, like the rest of the epee library, is undocumented, or rather relies on the code itself to serve as the documentation. Unfortunately, whilst the rest of the library is fairly straightforward to decipher, Portable Storage is less so. Hence this document.
String and Integer Encoding
Integers
With few exceptions, integers in epee portable storage format are serialized as little-endian.
Varints
Varints are used to pack integers in a portable and space-optimized way. Varints are stored as little-endian integers, with the lowest 2 bits holding a code for the number of bytes required (see the table below). In the 1-byte form this leaves 6 bits for the value, so the largest integer that can be packed into 1 byte is 63.
Byte Sizes
Lowest 2 bits | Encoded size | Value range |
---|---|---|
b00 | 1 byte | 0 to 63 |
b01 | 2 bytes | 64 to 16383 |
b10 | 4 bytes | 16384 to 1073741823 |
b11 | 8 bytes | 1073741824 to 4611686018427387903 |
Representations of Example Values
Value | Byte Representation (hex) |
---|---|
0 | 00 |
7 | 1c |
101 | 95 01 |
17,000 | A2 09 01 00 |
7,942,319,744 | 03 BA 98 65 07 00 00 00 |
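The varint rules above can be sketched in a few lines of Python. This is a minimal illustration rather than epee's actual implementation; the names `SIZES`, `encode_varint` and `decode_varint` are made up for this example.

```python
from typing import Tuple

# Index = 2-bit tag stored in the lowest bits, value = total encoded size in bytes.
SIZES = (1, 2, 4, 8)


def encode_varint(value: int) -> bytes:
    """Pack a non-negative integer using the smallest size that fits."""
    if value < 0:
        raise ValueError("varints are unsigned")
    for tag, size in enumerate(SIZES):
        # Two bits are taken by the tag, so size * 8 - 2 bits remain for the value.
        if value < 1 << (size * 8 - 2):
            return ((value << 2) | tag).to_bytes(size, "little")
    raise ValueError("value does not fit in 62 bits")


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the varint starting at offset."""
    size = SIZES[data[offset] & 0b11]
    if offset + size > len(data):
        raise ValueError("truncated varint")
    raw = int.from_bytes(data[offset:offset + size], "little")
    return raw >> 2, offset + size
```

For instance, `encode_varint(101).hex()` gives `9501`, matching the table above.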
Strings
These are simply char strings prefixed with their length (encoded as a varint), without a null terminator (though one can always add one if desired). There is no specific encoding enforced, and in fact, binary blobs are often stored as these strings. This type should not be confused with the keys in sections, as those are restricted to a maximum length of 255 and do not use varints to encode the length.
"Howdy" => 14 48 6F 77 64 79
Section Keys
These are similar to strings except that they are length limited to 255 bytes, and use a single byte at the front of the string to describe the length (as opposed to a varint).
"Howdy" => 05 48 6F 77 64 79
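The difference between the two length prefixes is easiest to see side by side. The following is a small sketch, not epee's API; `encode_varint`, `encode_string` and `encode_section_key` are names invented for this example.

```python
def encode_varint(value: int) -> bytes:
    """Pack a non-negative integer with a 2-bit size tag in the lowest bits."""
    for tag, size in enumerate((1, 2, 4, 8)):
        if 0 <= value < 1 << (size * 8 - 2):
            return ((value << 2) | tag).to_bytes(size, "little")
    raise ValueError("value out of range for a varint")


def encode_string(payload: bytes) -> bytes:
    """String: varint length prefix, then the raw bytes, no terminator."""
    return encode_varint(len(payload)) + payload


def encode_section_key(name: bytes) -> bytes:
    """Section key: a single length byte, then the raw bytes."""
    if len(name) > 255:
        raise ValueError("section keys are limited to 255 bytes")
    return bytes([len(name)]) + name
```

With `b"Howdy"` as input, the first function produces a leading `14` (5 shifted left by two bits) while the second produces a leading `05`.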
Binary Format Specification
Header
The format must always start with the following header:
Field | Type | Value |
---|---|---|
Signature Part A | UInt32 | 0x01011101 |
Signature Part B | UInt32 | 0x01020101 |
Version | UInt8 | 0x01 |
In total, the 9-byte header will look like this (in hex): 01 11 01 01 01 01 02 01 01
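As a quick check of the byte order, the header can be rebuilt with Python's `struct` module. The constant and function names below are illustrative only and are not taken from epee.

```python
import struct

SIGNATURE_A = 0x01011101   # Signature Part A from the table above
SIGNATURE_B = 0x01020101   # Signature Part B
FORMAT_VERSION = 0x01

# "<" selects little-endian with no padding; I = UInt32, B = UInt8.
HEADER = struct.pack("<IIB", SIGNATURE_A, SIGNATURE_B, FORMAT_VERSION)


def has_valid_header(data: bytes) -> bool:
    """Check that data starts with the 9-byte portable storage header."""
    return data[:len(HEADER)] == HEADER
```

Packing both signature parts as little-endian UInt32 values reproduces the hex sequence shown above.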
Section
Next we have a root object (or section as the library calls it). This is a map of name-value pairs called entries. It starts with a count:
Section | Type |
---|---|
Entry count | varint |
Which is followed by the section's name-value entries sequentially:
Entry
Entry | Type |
---|---|
Name | section key |
Type | byte |
Count [1] | varint |
Value(s) | (type-dependent data) |
[1] Note: this is only present if the entry type has the array flag (see below).
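The entry layout can be sketched as a small encoder. This is an illustration under stated assumptions, not epee's code: the helper names are invented here, values are passed in already encoded, and the type codes and the 0x80 array flag are the ones listed in the Entry types section.

```python
import struct

SERIALIZE_TYPE_INT32 = 2
SERIALIZE_TYPE_BOOL = 11
SERIALIZE_FLAG_ARRAY = 0x80


def encode_varint(value: int) -> bytes:
    """Pack a non-negative integer with a 2-bit size tag in the lowest bits."""
    for tag, size in enumerate((1, 2, 4, 8)):
        if 0 <= value < 1 << (size * 8 - 2):
            return ((value << 2) | tag).to_bytes(size, "little")
    raise ValueError("value out of range for a varint")


def encode_entry(name: bytes, type_code: int, values: list,
                 is_array: bool = False) -> bytes:
    """Name, type byte, optional count, then the already-encoded value(s)."""
    if len(name) > 255:
        raise ValueError("section keys are limited to 255 bytes")
    if not is_array and len(values) != 1:
        raise ValueError("a non-array entry holds exactly one value")
    out = bytes([len(name)]) + name
    if is_array:
        # The count is only written when the array flag is set.
        out += bytes([type_code | SERIALIZE_FLAG_ARRAY]) + encode_varint(len(values))
    else:
        out += bytes([type_code])
    return out + b"".join(values)


# Assumes little-endian integers, as described in the Integers section.
int_entry = encode_entry(b"signed_32bit_int", SERIALIZE_TYPE_INT32,
                         [struct.pack("<i", 20140418)])
bool_entry = encode_entry(b"array_of_bools", SERIALIZE_TYPE_BOOL,
                          [b"\x01", b"\x00", b"\x01", b"\x01"], is_array=True)
```

Only `bool_entry` carries a count, because only its type byte has the array flag set.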
Entry types
The types defined are:
#define SERIALIZE_TYPE_INT64 1
#define SERIALIZE_TYPE_INT32 2
#define SERIALIZE_TYPE_INT16 3
#define SERIALIZE_TYPE_INT8 4
#define SERIALIZE_TYPE_UINT64 5
#define SERIALIZE_TYPE_UINT32 6
#define SERIALIZE_TYPE_UINT16 7
#define SERIALIZE_TYPE_UINT8 8
#define SERIALIZE_TYPE_DOUBLE 9
#define SERIALIZE_TYPE_STRING 10
#define SERIALIZE_TYPE_BOOL 11
#define SERIALIZE_TYPE_OBJECT 12
#define SERIALIZE_TYPE_ARRAY 13
The entry type can be bitwise OR'ed with a flag:
#define SERIALIZE_FLAG_ARRAY 0x80
This signals there are multiple values for the entry. Since only one bit is reserved for specifying an array, we cannot directly represent nested arrays. However, you can place each of the inner arrays inside of a section, and make the outer array type SERIALIZE_TYPE_OBJECT | SERIALIZE_FLAG_ARRAY. Immediately following the type code byte is a varint specifying the length of the array. Finally, all the elements are serialized in sequence with no padding and without any type information. For example:
type, count, value1, value2,..., valuen
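Splitting a type byte into its base type and the array flag might look like the sketch below. `TYPE_NAMES` and `split_type_byte` are hypothetical names used only for illustration.

```python
SERIALIZE_FLAG_ARRAY = 0x80

# Base type codes as listed above.
TYPE_NAMES = {
    1: "INT64", 2: "INT32", 3: "INT16", 4: "INT8",
    5: "UINT64", 6: "UINT32", 7: "UINT16", 8: "UINT8",
    9: "DOUBLE", 10: "STRING", 11: "BOOL", 12: "OBJECT", 13: "ARRAY",
}


def split_type_byte(type_byte: int):
    """Return (type_name, is_array) for a raw entry type byte."""
    is_array = bool(type_byte & SERIALIZE_FLAG_ARRAY)
    base_type = type_byte & ~SERIALIZE_FLAG_ARRAY & 0xFF
    if base_type not in TYPE_NAMES:
        raise ValueError("unknown type code %d" % base_type)
    return TYPE_NAMES[base_type], is_array
```

For example, `split_type_byte(0x8b)` yields `("BOOL", True)`, while `split_type_byte(0x0a)` yields `("STRING", False)`.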
Entry values
It's important to understand that entry values can be encoded any way in which an implementation chooses. For example, the integers can be in either big or little endian byte order.
Entry values which are objects (i.e. SERIALIZE_TYPE_OBJECT) are stored as sections.
Note, I have not yet seen the type SERIALIZE_TYPE_ARRAY in use. My assumption is this would be used for untyped arrays and so subsequent entries could be of any type.
Overall example
Let's put it all together and see what an entire object would look like serialized. To represent our data, let's create a JSON object (since it's a format that most will be familiar with):
{
"short_quote": "Give me liberty or give me death",
"long_quote": "Monero is more than just a technology. It's also what the technology stands for.",
"signed_32bit_int": 20140418,
"array_of_bools": [true, false, true, true],
"nested_section": {
"double": -6.9,
"unsigned_64bit_int": 11111111111111111111
}
}
This object would translate into the following bytes when serialized into epee portable storage format. The bytes are represented in hex, with comments and whitespace added for readability.
01 11 01 01 01 01 02 01 // Signature
01 // Version
14 // Varint number of section entries (5)
0b // Length of next section key (11)
73 68 6f 72 74 5f 71 75 6f 74 65 // Section key ("short_quote")
0a // Type code (STRING)
80 // Varint length of string (32)
47 69 76 65 20 6d 65 20 6c 69 62 65 72 74 79 20 // STRING value ("Give me liberty ")
6f 72 20 67 69 76 65 20 6d 65 20 64 65 61 74 68 // STRING value cont. ("or give me death")
0a // Length of next section key (10)
6c 6f 6e 67 5f 71 75 6f 74 65 // Section key ("long_quote")
0a // Type code (STRING)
41 01 // Varint length of string (80). Note it's 2 bytes
4d 6f 6e 65 72 6f 20 69 73 20 6d 6f 72 65 20 74 // STRING value ("Monero is more t")
68 61 6e 20 6a 75 73 74 20 61 20 74 65 63 68 6e // STRING value cont. ("han just a techn")
6f 6c 6f 67 79 2e 20 49 74 27 73 20 61 6c 73 6f // STRING value cont. ("ology. It's also")
20 77 68 61 74 20 74 68 65 20 74 65 63 68 6e 6f // STRING value cont. (" what the techno")
6c 6f 67 79 20 73 74 61 6e 64 73 20 66 6f 72 2e // STRING value cont. ("logy stands for.")
10 // Length of next section key (16)
73 69 67 6e 65 64 5f 33 32 62 69 74 5f 69 6e 74 // Section key ("signed_32bit_int")
02 // Type code (INT32)
82 51 33 01 // INT32 value (20140418)
0e // Length of next section key (14)
61 72 72 61 79 5f 6f 66 5f 62 6f 6f 6c 73 // Section key ("array_of_bools")
8b // Type code (BOOL | FLAG_ARRAY)
10 // Varint number of array elements (4)
01 00 01 01 // Array BOOL values [true, false, true, true]
0e // Length of next section key (14)
6e 65 73 74 65 64 5f 73 65 63 74 69 6f 6e // Section key ("nested_section")
0c // Type code (OBJECT)
08 // Varint number of inner section entries (2)
06 // Length of first inner section key (6)
64 6f 75 62 6c 65 // Section key ("double")
09 // Type code (DOUBLE)
9a 99 99 99 99 99 1b c0 // DOUBLE value (-6.9)
12 // Length of second inner section key (18)
75 6e 73 69 67 6e 65 64 5f 36 34 62 69 74 5f 69 // Section key ("unsigned_64bit_i")
6e 74 // Section key cont ("nt")
05 // Type code (UINT64)
c7 71 ac b5 af 98 32 9a // UINT64 value (11111111111111111111)
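To cross-check the annotated bytes, here is a minimal reader sketch. It is not epee's parser: all function names are invented for this example, fixed-width values are assumed to be little-endian, strings are returned as raw bytes, and SERIALIZE_TYPE_ARRAY (13) is deliberately not handled.

```python
import struct

SERIALIZE_FLAG_ARRAY = 0x80
TYPE_STRING, TYPE_OBJECT = 10, 12
# Type code -> struct format for fixed-width values (little-endian assumed).
SCALAR_FORMATS = {1: "<q", 2: "<i", 3: "<h", 4: "<b", 5: "<Q",
                  6: "<I", 7: "<H", 8: "<B", 9: "<d", 11: "<?"}
HEADER = bytes.fromhex("011101010101020101")


def read_varint(buf, pos):
    size = 1 << (buf[pos] & 0b11)  # tag 0..3 -> 1, 2, 4 or 8 bytes
    return int.from_bytes(buf[pos:pos + size], "little") >> 2, pos + size


def read_value(buf, pos, type_code):
    if type_code == TYPE_OBJECT:
        return read_section(buf, pos)
    if type_code == TYPE_STRING:
        length, pos = read_varint(buf, pos)
        return buf[pos:pos + length], pos + length
    fmt = SCALAR_FORMATS[type_code]  # KeyError for unsupported types such as 13
    (value,) = struct.unpack_from(fmt, buf, pos)
    return value, pos + struct.calcsize(fmt)


def read_section(buf, pos):
    count, pos = read_varint(buf, pos)
    section = {}
    for _ in range(count):
        key_len = buf[pos]  # section keys use a single length byte
        key = buf[pos + 1:pos + 1 + key_len].decode("ascii")
        pos += 1 + key_len
        type_byte = buf[pos]
        pos += 1
        base_type = type_byte & 0x7F
        if type_byte & SERIALIZE_FLAG_ARRAY:
            length, pos = read_varint(buf, pos)
            values = []
            for _ in range(length):
                value, pos = read_value(buf, pos, base_type)
                values.append(value)
            section[key] = values
        else:
            section[key], pos = read_value(buf, pos, base_type)
    return section, pos


def parse(buf):
    """Parse a whole portable storage blob into nested dicts."""
    if buf[:len(HEADER)] != HEADER:
        raise ValueError("bad portable storage header")
    section, _ = read_section(buf, len(HEADER))
    return section
```

Fed the bytes of the example above, `parse` returns a dictionary that mirrors the JSON object, with the two quotes as `bytes` values.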
Monero specifics
Entry values
Hashes, Keys, Blobs
These are stored as strings, SERIALIZE_TYPE_STRING.
STL containers (vector, list)
These can be arrays of standard integer types, strings, or SERIALIZE_TYPE_OBJECTs for structs.