-
Notifications
You must be signed in to change notification settings - Fork 0
Design: JSONB Repository
This page discusses why it's better to represent commits as JSONB instead of normalized relational data.
The initial approach with pg_delta was to model the entire repository, track, stage and commit pipeline as relational data. This approach works functionally, but has a drawback: Performance. Each commit contains rows, which contains fields, which reference blob literal values. If the repository data structure is modeled relationally, a row has to be inserted into the database every time you track, stage and commit -- not just a row for every row in the commit, but a row for every field of every row in the commit. A table that has 10 columns and 100 rows would need to insert 100 rows for every row, plus 100 x 10 = 1000 rows for every field. Inserting this many rows is a hard bottleneck on performance and pg_delta will never be fast unless we change approaches.
So, I'm starting a new branch that handles things totally differently -- it stores commit contents in a jsonb column called manifest, so that creating a commit needs to only insert one row, with a complex jsonb data structure describing its contents.
-
name text- Repository name -
head_commit_id uuid- The id of the commit that is currently being developed against. Next commit will use this commit as its parent, and all track, stage and diff functions compare the live db against the contents of this commit. -
checkout_commit_id uuid- The commit that is currently checked out into the database, maintained by thedelta.checkout()function. -
tracked_rows_added jsonb- Array ofrow_id::textvalues -
stage_rows_to_add jsonb- Array ofrow_id::textvalues? See Stage-by-Reference vs. Value -
stage_rows_to_remove jsonb- Array ofrow_id::textvalues -
stage_fields_to_change jsonb- Array offield_id::textvalues
Rows that have been staged for inclusion in the next commit, along with their value at stage time.
JSONB Object with key row_id::text and value of jsonb object with key column name and value of field value at stage time.
{
"(shakespeare,character,{id},{Aaron})": {
"id": "Aaron",
"name": "Aaron",
"abbrev": "AARON",
"description": "a Moor, beloved by Tamora",
"speech_count": 57
},
"(shakespeare,character,{id},{Bagot})": {
"id": "Bagot",
"name": "Bagot",
"abbrev": "BAGOT",
"description": "servant to King Richard II",
"speech_count": 6
}
}JSONB Array of row_id::text values - rows that have been staged for deletion from the previous commit, to be excluded in the next commit
[
"(shakespeare,character,{id},{Aaron})",
"(shakespeare,character,{id},{Bagot})"
]
JSONB Object, with key field_id::text and value of field's value at stage time.
{
"(shakespeare,character,{id},{Aaron},name)": "Sally",
"(shakespeare,character,{id},{Bagot},name)": "Joe"
}commit_time textmessage textauthor_name textauthor_email text-
parent_id uuid- foreign key to the previous commit, or NULL for first commit. -
merge_parent_id uuid- If this commit is a merge, references the second commit that is being merged with theparent_idcommit. -
manifest jsonb- commit contents
JSONB data structure that holds the following information:
-
meta.row_ids of the rows in the commit - the order in which rows should be inserted on checkout, honoring dependencies
- for each row, the columns that were committed and their values / value hashes
- hash values
Designing this data structure is tricky. Open questions are:
a) How is row order preserved?
- In an array, which preserves order (because jsonb "does not preserve the order of object keys.")
- As explicit integer value
b) How are column values stored?
- As a jsonb object with keys for column names and hashed field values for the key val. Simple but inefficient, lots of redundancy.
- As an array of value hashes, whose position corresponds to some kind of table/column manifest.
c) Where are blobs stored?
- Manifest has a max size of 255MB (hard limit on max json size) so we want to keep this mostly metadata.
- Storing them in the blob table means a row-insert for every field of every row in the commit. Slow!
- JSONB object's key-val structure is a natural fit, but max size might require some chunking