This small set of libraries offers a homogeneous API across two types and the slice types derived from them, as well as a small DSL for decoding "packets" (such as ARP or DNS) without too much difficulty.
The aim is to give the two types, bytes and bigstring, the same API and to
derive a slice type from each, so that the user has all the tools needed to
manipulate byte sequences in either form. A slice is a view: decoding a packet
and extracting a sub-part requires no copy. Slices are also provided for
bigstrings, because `Bigarray.Array1.sub` is comparatively expensive.
This set of libraries is a synthesis of astring (which offers a range of useful functions as well as slice), cstruct (which offers a similar API for bigstrings), bigstringaf (which offers some other useful functions), the standard OCaml library and repr for decoding/encoding these values into OCaml records/variants.
Here is an overview of the functions offered by bstr compared to other
libraries:

| | bstr | cstruct | bigstringaf | slice.bstr |
|---|---|---|---|---|
| overlap | ✅ | ❌ | ❌ | ✅ |
| memcpy | ✅ | ❌ | ✅ | ✅ |
| memmove | ✅ | ✅ | ✅ | ✅ |
| fast sub | ❌ | ❌ | ❌ | ✅ |
| fast blit | ✅ | ❌ | ❌ | ✅ |
| release GC lock | ✅ | ❌ | ❌ | ✅ |
| fast contains | ✅ | ❌ | ✅ | ✅ |

`sub` is perhaps the most useful operation on a bigarray. Unlike `sub` on bytes
and strings, it returns a view (of equal or smaller size) of the bigarray
without making a copy. If, for example, you need to decode [1] a large sequence
of bytes (without the notion of a "stream"), `sub` lets you consume the input
byte by byte and avoid any copy throughout the decoding process.
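
To make the "view without copy" idea concrete, here is a minimal sketch that relies only on the standard library's `Bigarray.Array1.sub`, not on the `Bstr` API. The helpers `of_string`, `advance` and `sum_bytes` are hypothetical names introduced for illustration.

```ocaml
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical helper: copy a string into a fresh bigstring. *)
let of_string s : bigstring =
  let bs =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> bs.{i} <- c) s;
  bs

(* Drop the first [n] bytes by narrowing the view: no byte is copied. *)
let advance (bs : bigstring) n : bigstring =
  Bigarray.Array1.sub bs n (Bigarray.Array1.dim bs - n)

(* Toy decoder: sum every byte, consuming the input one byte at a time. *)
let rec sum_bytes (bs : bigstring) acc =
  if Bigarray.Array1.dim bs = 0 then acc
  else sum_bytes (advance bs 1) (acc + Char.code bs.{0})
```

Every call to `advance` returns a view that shares its memory with the original buffer, so a write through the view is visible in the parent. This is what distinguishes it from `Bytes.sub`, which allocates and copies.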
The implementation of `sub` provided by `Bstr` differs slightly from that of
the standard OCaml library. `Bigarray.Array1.sub` is generic over element kinds
and layouts, whereas `Bstr` takes the opportunity to specialize the function
for our type: a one-dimensional bigarray of bytes.
However, because of the representation used by `Cstruct`, `Cstruct.sub`
remains faster than the `sub` of `Bstr` and `Bigstringaf`. If you want the
same performance as `Cstruct`, the `Slice` module specialized for `Bstr.t`
values is equivalent.
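
The following sketch suggests why a slice-style representation makes `sub` cheap. It assumes a record holding a buffer, an offset and a length, which is how `Cstruct.t` is laid out; the type `slice` and the functions below are hypothetical and are not the actual `Slice` API.

```ocaml
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical slice: a shared buffer plus an offset and a length. *)
type slice = { buf : bigstring; off : int; len : int }

let of_bigstring buf = { buf; off = 0; len = Bigarray.Array1.dim buf }

(* [sub] only computes a new offset and length. It allocates a small
   record and never calls [Bigarray.Array1.sub]. *)
let sub t ~off ~len =
  if off < 0 || len < 0 || off > t.len - len then invalid_arg "sub";
  { t with off = t.off + off; len }

(* Bounds are checked against the slice, then the shared buffer is read. *)
let get t i =
  if i < 0 || i >= t.len then invalid_arg "get";
  Bigarray.Array1.unsafe_get t.buf (t.off + i)
```

With this layout, narrowing a view is plain integer arithmetic on the OCaml side, while `Bigarray.Array1.sub` has to build a new bigarray value that points into the original one.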
Here is a comparative table of the sub function between all implementations
(AMD Ryzen 9 7950X 16-Core Processor):

| | bigstringaf | bstr | cstruct | slice |
|---|---|---|---|---|
| sub | 20.0 ns | 17.8 ns | 2.8 ns | 2.4 ns |

`blit` from a string or bytes is a little faster than with `Bigstringaf` and
`Cstruct`. The difference essentially comes from the attributes that `Bstr`
uses to describe the FFI binding to the C `memcpy` function (specifically the
`[@untagged]` attribute).
Here is a comparative table of the blit_from_string function between all the
implementations:

| | bigstringaf | bstr | cstruct |
|---|---|---|---|
| blit_from_string | 5.1 ns | 4.3 ns | 4.7 ns |

There are 2 ways to copy bytes between two bigarrays:
- the "mmaped" version (`{memcpy,memmove}_mmaped`)
- the simple version (`{memcpy,memmove}`)

The first is quite specific because it releases the GC lock after a certain
number of bytes (4096) have been copied. This can be advantageous if you want
to make a large copy between two bigarrays in parallel in a Thread.
The name mmaped reflects the fact that a copy between two bigarrays, one of
which may come from `Unix.map_file`, can also take time (and we may want to
do it in parallel in a `Thread`) since it involves reading from or writing to
the disk.

```ocaml
let copy_to_file bstr filename () =
  let len = Bstr.length bstr in
  (* mmap needs a descriptor opened for reading as well as writing. *)
  let fd = Unix.openfile filename Unix.[ O_RDWR ] 0o644 in
  (* [shared = true] so that writes to [dst] reach the file. *)
  let dst = Unix.map_file fd Bigarray.char Bigarray.c_layout true [| len |] in
  let dst = Bigarray.array1_of_genarray dst in
  Bstr.memcpy_mmaped bstr ~src_off:0 dst ~dst_off:0 ~len

let () =
  (* [bstr] and [filename] are assumed to be defined earlier. *)
  let th = Thread.create (copy_to_file bstr filename) () in
  (* do something else truly in parallel with [copy_to_file]. *)
  (* the GC will not interrupt [th] during the copy. *)
  Thread.join th
```

The simple version does not release the GC lock and only applies the
desired function (`memmove` or `memcpy`).
`Bstr.blit` always uses the `memmove` function, which is safe even when the
source and the destination overlap. However, it can be advantageous to use
`memcpy` in one fairly specific case: when you know that the source refers to
a memory area that is not shared with the destination. To find out, you can
use the `Bstr.overlap` function, which checks whether or not the two given
bigarrays have a common memory area.
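
The difference between the two copy strategies can be illustrated with a small sketch. It works on offsets inside a single `Bytes` buffer rather than on bigarray addresses, and the functions `overlap`, `forward_copy` and `copy` are hypothetical: they are not the `Bstr` API. A naive front-to-back loop stands in for `memcpy`, whose behaviour on overlapping areas is undefined in C, and `Bytes.blit` stands in for `memmove`.

```ocaml
(* Common part of the ranges [a_off, a_off + a_len) and
   [b_off, b_off + b_len), as (offset, length), if there is one. *)
let overlap ~a_off ~a_len ~b_off ~b_len =
  let start = max a_off b_off in
  let stop = min (a_off + a_len) (b_off + b_len) in
  if start < stop then Some (start, stop - start) else None

(* Naive front-to-back copy: only correct when the ranges are disjoint. *)
let forward_copy buf ~src_off ~dst_off ~len =
  for i = 0 to len - 1 do
    Bytes.set buf (dst_off + i) (Bytes.get buf (src_off + i))
  done

(* Pick the strategy from the overlap check. *)
let copy buf ~src_off ~dst_off ~len =
  match overlap ~a_off:src_off ~a_len:len ~b_off:dst_off ~b_len:len with
  | None -> forward_copy buf ~src_off ~dst_off ~len
  | Some _ ->
      (* [Bytes.blit] handles overlapping ranges correctly. *)
      Bytes.blit buf src_off buf dst_off len
```

Copying 4 bytes of `"abcdef"` from offset 0 to offset 2 with `forward_copy` overwrites source bytes before they are read, whereas `copy` detects the overlap and falls back on the safe path.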

Footnotes

1. `Bin` is currently being designed with this in mind.