Records
A BinData record declaration is a class containing one or more fields.
class MyName < BinData::Record
  type :field_name, param1: "foo", param2: bar, ...
  ...
end
Each field has:
type: the name of a builtin type (e.g. uint32be, string, array) or a user defined type. For user defined types, the class name is converted from CamelCase to lowercased underscore_style.
field_name: the name to access the field. Must be a Symbol. If omitted, then this field is anonymous. An anonymous field is still read and written, but will not appear in #snapshot.
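The CamelCase to underscore_style conversion can be sketched in plain Ruby. The helper name underscore_type_name is hypothetical and this is only an approximation of the rule described above, not necessarily the code BinData itself runs.

```ruby
# Hypothetical helper (not part of BinData's public API): converts a class
# name such as "PascalString" into the underscore style used for field types.
def underscore_type_name(class_name)
  class_name
    .sub(/.*::/, "")                        # drop any module prefix
    .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')  # split "AUint" into "A_Uint"
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')      # split "32Le" into "32_Le"
    .downcase
end

puts underscore_type_name("PascalString") # pascal_string
puts underscore_type_name("AUint32Le")    # a_uint32_le
```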
Fields may have optional parameters. The parameters are passed as a Hash with Symbols for keys.
Parameters are designed to be lazily evaluated, possibly multiple times.
This means that any parameter value must not have side effects.
Examples of legal values for parameters are:
param: 5
param: -> { foo + 2 }
param: :bar
Most parameters will have literal values, such as 5.
If the value is not a literal, it is expected to be a lambda. The lambda will be evaluated in the context of the parent. In this case the parent is an instance of MyName.
A symbol is taken as syntactic sugar for a lambda containing the value of the symbol. For example, param: :bar is equivalent to param: -> { bar }
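The three kinds of parameter value can be modelled with a small plain-Ruby sketch. The helper eval_param and the FakeParent struct are hypothetical stand-ins, used only to show how a literal, a lambda and a symbol might each be resolved against a parent object. This is not BinData's internal implementation.

```ruby
# Hypothetical helper: resolves a parameter value against a parent object.
# It may be called many times, so values must not have side effects.
def eval_param(parent, value)
  case value
  when Proc   then parent.instance_exec(&value) # lambda runs with parent as self
  when Symbol then parent.public_send(value)    # :bar behaves like -> { bar }
  else value                                    # literals are returned unchanged
  end
end

# Stand-in for a record instance that has fields foo and bar.
FakeParent = Struct.new(:foo, :bar)
parent = FakeParent.new(3, 7)

puts eval_param(parent, 5)              # 5
puts eval_param(parent, -> { foo + 2 }) # 5
puts eval_param(parent, :bar)           # 7
```

Because the lambda is re-evaluated on each call, changing parent.foo changes the result of the second call the next time the parameter is needed.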
A common occurrence in binary file formats is one field depending upon the value of another, e.g. a string preceded by its length.
As an example, let's assume a Pascal style string where the byte preceding the string contains the string's length.
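# reading
io = File.open(...)
len = io.getbyte      # length byte as an Integer (getc would return a String)
str = io.read(len)
puts "string is " + str

# writing
io = File.open(...)
str = "this is a string"
io.putc(str.bytesize) # length in bytes; must fit in a single byte
io.write(str)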
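The hand-written approach can be tried end to end in memory. The sketch below swaps the file for a StringIO and wraps the two steps in hypothetical helper methods (write_pascal_string and read_pascal_string are names chosen for illustration only).

```ruby
require "stringio"

# Writes the one-byte length prefix followed by the string's bytes.
def write_pascal_string(io, str)
  raise ArgumentError, "string longer than 255 bytes" if str.bytesize > 255
  io.putc(str.bytesize)
  io.write(str)
end

# Reads the length byte, then exactly that many bytes of data.
def read_pascal_string(io)
  len = io.getbyte
  io.read(len)
end

io = StringIO.new
write_pascal_string(io, "this is a string")
io.rewind
puts "string is " + read_pascal_string(io)
```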
Here's how we'd implement the same example with BinData.
class PascalString < BinData::Record
  uint8  :len,  value: -> { data.length }
  string :data, read_length: :len
end
# reading
io = File.open(...)
ps = PascalString.new
ps.read(io)
puts "string is " + ps.data
# writing
io = File.open(...)
ps = PascalString.new
ps.data = "this is a string"
ps.write(io)
This syntax needs explaining. Let's simplify by examining reading and writing separately.
class PascalStringReader < BinData::Record
  uint8  :len
  string :data, read_length: :len
end
This states that when reading the string, the initial length of the string (and hence the number of bytes to read) is determined by the value of the len field.
Note that read_length: :len is syntactic sugar for read_length: -> { len }, as described previously.
class PascalStringWriter < BinData::Record
  uint8  :len, value: -> { data.length }
  string :data
end
This states that the value of len is always equal to the length of data. len may not be manually modified.
Combining these two definitions gives the definition for PascalString as previously defined.
With dependencies, it is important to note that a field can only depend on a field that comes before it. You can't have a string which has the characters first and the length afterwards.
The endianness of numeric types must be explicitly defined so that the code produced is independent of architecture. However, explicitly specifying the endianness of each numeric field can result in a bloated declaration that is difficult to read.
class A < BinData::Record
  int16be  :a
  int32be  :b
  int16le  :c  # <-- Note little endian!
  int32be  :d
  float_be :e
  array    :f, type: :uint32be
end
The endian keyword can be used to set the default endian. This makes the declaration easier to read. Any numeric field that doesn't use the default endian can explicitly override it.
class A < BinData::Record
  endian :big
  int16   :a
  int32   :b
  int16le :c  # <-- Note how this little endian now stands out
  int32   :d
  float   :e
  array   :f, type: :uint32
end
The increase in clarity can be seen with the above example. The endian keyword will cascade to nested types, as illustrated with the array in the above example.
The endian keyword can also be used to identify custom types that have endianness. To do this, the class name of the custom types must end with Le for little endian, and Be for big endian.
class CoordLe < BinData::Record
  endian :little
  int16 :x
  int16 :y
end

class CoordBe < BinData::Record
  endian :big
  int16 :x
  int16 :y
end

class Rectangle < BinData::Record
  endian :little
  coord :upper_left   # <-- Here CoordLe is automatically
  coord :lower_right  # <-- assumed
end
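One way to picture how coord ends up meaning CoordLe is a lookup that appends an endian suffix to the type name. The sketch below is a toy model only: TYPE_REGISTRY, resolve_type and the lookup order (exact name first, then the suffixed name) are assumptions made for illustration, not BinData's actual registry.

```ruby
# Hypothetical registry mapping underscore names to types (symbols stand in
# for the real classes).
TYPE_REGISTRY = {
  "coord_le" => :CoordLe,
  "coord_be" => :CoordBe,
  "uint8"    => :Uint8
}.freeze

# Resolves a field type name under the record's default endian.
def resolve_type(name, endian)
  suffix = { little: "_le", big: "_be" }.fetch(endian)
  TYPE_REGISTRY[name] || TYPE_REGISTRY.fetch(name + suffix)
end

p resolve_type("coord", :little) # :CoordLe
p resolve_type("coord", :big)    # :CoordBe
p resolve_type("uint8", :little) # :Uint8
```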
You may wish to declare :big and :little versions of a custom type.
class Coord < BinData::Record
  endian :big_and_little
  int16 :x
  int16 :y
end
is equivalent to
class CoordLe < BinData::Record
  endian :little
  int16 :x
  int16 :y
end

class CoordBe < BinData::Record
  endian :big
  int16 :x
  int16 :y
end
The :endian parameter can be specified when instantiating the type.
class Coord < BinData::Record
  endian :big_and_little
  int16 :x
  int16 :y
end

c = Coord.new(endian: :big, x: 1, y: 2)
c.to_binary_s #=> "\x00\x01\x00\x02"
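To see what the choice of endian does to the bytes, the same two 16-bit values can be packed with Ruby's built-in Array#pack. This is a plain-Ruby illustration of the byte order, with pack_coord as a hypothetical helper. It does not use BinData.

```ruby
# Packs x and y as signed 16-bit integers in the requested byte order.
# "s>" is big endian and "s<" is little endian in Array#pack notation.
def pack_coord(x, y, endian)
  directive = endian == :big ? "s>" : "s<"
  [x, y].pack(directive * 2)
end

p pack_coord(1, 2, :big).bytes    # [0, 1, 0, 2]
p pack_coord(1, 2, :little).bytes # [1, 0, 2, 0]
```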
BinData supports anonymous nested records. The struct keyword declares a nested structure that can be used to imply a grouping of related data.
class LabeledCoord < BinData::Record
  string :label, length: 20

  struct :coord do
    endian :little
    double :x
    double :z
    double :y
  end
end

pos = LabeledCoord.new(label: "red leader")
pos.coord.assign(x: 2.0, y: 0, z: -1.57)
This nested structure can be put in its own class and reused. The above example can also be declared as:
class Coord < BinData::Record
  endian :little
  double :x
  double :z
  double :y
end

class LabeledCoord < BinData::Record
  string :label, length: 20
  coord  :coord
end
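Either declaration describes the same flat layout: a 20 byte label followed by three little endian doubles in the order x, z, y. The plain-Ruby sketch below approximates that layout with Array#pack. It assumes the label is padded with zero bytes up to 20 bytes, and the helper names are hypothetical.

```ruby
# Packs the label (zero padded to 20 bytes) and then x, z, y as
# little endian doubles, mirroring the declared field order.
def pack_labeled_coord(label, x:, y:, z:)
  [label, x, z, y].pack("a20E3")
end

# Reverses the packing; "Z20" reads 20 bytes and drops the zero padding.
def unpack_labeled_coord(bytes)
  label, x, z, y = bytes.unpack("Z20E3")
  { label: label, x: x, y: y, z: z }
end

bytes = pack_labeled_coord("red leader", x: 2.0, y: 0.0, z: -1.57)
puts bytes.bytesize # 44
```

The nesting is purely a grouping for the reader and the accessor methods. In this model it adds no bytes of its own.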
A record may contain optional fields. The optional state of a field is decided by the :onlyif parameter. If the value of this parameter is false, then the field will be as if it didn't exist in the record.
class RecordWithOptionalField < BinData::Record
  ...
  uint8  :comment_flag
  string :comment, length: 20, onlyif: :has_comment?

  def has_comment?
    comment_flag.nonzero?
  end
end
In the above example, the comment field is only included in the record if the value of the comment_flag field is non zero.
You can determine if an :onlyif field is included with the #field? method.
obj = RecordWithOptionalField.read "..."
puts obj.comment if obj.comment?
A more advanced usage of :onlyif can be found in the file_name and comment fields of the gzip example.
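The effect of an optional field on reading can be sketched by hand. In the plain-Ruby model below, read_optional_comment is a hypothetical helper. It consumes the 20 byte comment only when the flag byte is non zero, and otherwise leaves the stream position just after the flag.

```ruby
require "stringio"

# Reads a flag byte, then a 20 byte comment only if the flag is non zero.
def read_optional_comment(io)
  comment_flag = io.getbyte
  comment = comment_flag.nonzero? ? io.read(20) : nil
  { comment_flag: comment_flag, comment: comment }
end

with_comment    = StringIO.new("\x01" + "x" * 20)
without_comment = StringIO.new("\x00following data")

p read_optional_comment(with_comment)[:comment].bytesize # 20
p read_optional_comment(without_comment)[:comment]       # nil
```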
Compiled languages often generate binary structures where the fields are aligned to set byte boundaries. These byte boundaries are typically the word size of the architecture. The generated structures employ padding between fields whose sizes aren't a multiple of this byte alignment.
The :byte_align parameter can be supplied to fields to ensure that they occur on the aligned byte boundary.
class RecordWithAlignedFields < BinData::Record
  endian :little
  uint32 :a, byte_align: 4
  uint16 :b, byte_align: 2
  uint32 :c, byte_align: 4
  uint8  :d
  uint16 :e, byte_align: 2
  uint32 :f, byte_align: 4
end

r = RecordWithAlignedFields.new
r.a.rel_offset #=> 0
r.b.rel_offset #=> 4
r.c.rel_offset #=> 8
r.d.rel_offset #=> 12
r.e.rel_offset #=> 14
r.f.rel_offset #=> 16
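The offsets above follow from simple arithmetic: before each field, skip forward to the next multiple of its alignment. The plain-Ruby sketch below reproduces them. The helper aligned_offsets and the ALIGNED_FIELDS table of [name, size, alignment] are illustrative assumptions, with an alignment of 1 standing in for a field that has no byte_align.

```ruby
# Each entry is [name, size in bytes, alignment in bytes].
ALIGNED_FIELDS = [
  [:a, 4, 4], [:b, 2, 2], [:c, 4, 4],
  [:d, 1, 1], [:e, 2, 2], [:f, 4, 4]
].freeze

# Returns a Hash of field name to starting offset.
def aligned_offsets(fields)
  offset = 0
  fields.map do |name, size, align|
    offset += (align - offset % align) % align # padding up to the boundary
    start = offset
    offset += size
    [name, start]
  end.to_h
end

p aligned_offsets(ALIGNED_FIELDS).values # [0, 4, 8, 12, 14, 16]
```

The only padding in this layout is the single byte after d, which pushes e from offset 13 to 14.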
You can DRY up the declaration by creating integer types that are automatically aligned.
class AUint32Le < BinData::Uint32le
  default_parameter :byte_align => 4
end

class AUint16Le < BinData::Uint16le
  default_parameter :byte_align => 2
end

class AUint8 < BinData::Uint8
  # aliased for consistency
end

class RecordWithAlignedFields2 < BinData::Record
  endian :little
  a_uint32 :a
  a_uint16 :b
  a_uint32 :c
  a_uint8  :d
  a_uint16 :e
  a_uint32 :f
end
Occasionally you need to perform some assert checks on multiple related fields. Virtual fields allow you to do this.
class UnitCoord < BinData::Record
  endian :little
  double :x
  double :y
  virtual assert: -> { (x**2 + y**2 - 1.0).abs < 0.000001 }
end
The above example describes a cartesian coordinate that is normalised to a magnitude of 1. The assert will be performed after reading in both x and y.
An #assert! method is provided that can be called manually.
class UnitCoord < BinData::Record
  endian :little
  double :x
  double :y
  virtual :valid, assert: -> { (x**2 + y**2 - 1.0).abs < 0.000001 }
end

coord = UnitCoord.new(x: 0.3, y: 0.2)
coord.valid.assert! #=> raises BinData::ValidityError: assertion failed for obj.valid
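The condition inside the assert is ordinary Ruby, so it can be checked on its own to see why the coordinate above fails. The sketch below lifts the same expression into a hypothetical unit_magnitude? method. The constant name UNIT_TOLERANCE is also an illustrative choice.

```ruby
UNIT_TOLERANCE = 0.000001

# True when (x, y) lies on the unit circle, within the tolerance.
def unit_magnitude?(x, y)
  (x**2 + y**2 - 1.0).abs < UNIT_TOLERANCE
end

p unit_magnitude?(0.6, 0.8) # true
p unit_magnitude?(0.3, 0.2) # false
```

For x: 0.3, y: 0.2 the squared magnitude is 0.13, far from 1, which is why the manual assert! call raises.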