ScarDB is a lightweight Relational Database Management System (RDBMS) implemented in Rust. It features a custom storage engine, a schema-driven binary serialization format, B-Tree indexing for optimized lookups, and a relational execution engine capable of performing Nested Loop Joins.
This project demonstrates the implementation of low-level database internals, moving from raw byte manipulation on disk to a high-level SQL interface.
- rdbms.oscardev.site (SQL Interface)
screenrecording-2026-01-14_19-58-37.mp4
- Custom Pager-Based Storage: Manages data in 4KB pages to optimize disk I/O.
-
Fixed-Length Binary Format: Uses a schema-driven binary format for storage, ensuring
$O(1)$ row access via offsets. - B-Tree Indexing: Implements primary key constraints and optimized point-lookups using a memory-resident B-Tree index.
- Query Optimizer: A built-in planner that automatically switches from a "Full Table Scan" to an "Index Lookup" when filtering by Primary Key.
-
Relational Joins: Supports
INNER JOINoperations using a Nested Loop Join algorithm. -
Full CRUD Support: Supports
CREATE,INSERT,SELECT,UPDATE,DELETE, andDROP TABLE. -
Advanced SQL Features: Includes
AUTOINCREMENTfor primary keys andWHEREclause filtering. - Interactive REPL: A full-featured CLI with command history.
- Web Integration: A REST API demo showing ScarDB used as a library to power a web application.
Data is stored in a dedicated .db file for each table. To manage space efficiently:
- Pages: The file is divided into 4096-byte blocks.
- Bitmask Management: Each page contains a 64-byte header with a bitmask. This allows ScarDB to track occupied vs. empty slots, ensuring that when a row is deleted, the space is immediately reclaimed for the next
INSERT.
The catalog.json file acts as the database's "brain." It persists table schemas (column names, types, primary key flags) and sequences for AUTOINCREMENT counters.
The engine transforms SQL AST (Abstract Syntax Tree) into logical commands:
- Index Optimization: If a query filters on a Primary Key (e.g.,
WHERE id = 5), the engine bypasses the file scan and probes the B-Tree for the exact page and slot. - Joins: Joins are handled by a Nested Loop Join. The engine iterates through the "Outer" table and matches records in the "Inner" table based on the join predicate.
- Why Rust?: I chose rust for its zero-cost abstractions and memory safety. It allowed for safe raw byte manipulation when serializing data for disk storage.
-
Why Fixed-Length Records?: By requiring a max length for strings (
VARCHAR), we ensure that every row in a table is the same size. This allows for extremely fast "In-Place Updates" and predictable offset math. -
B-Tree Indexing: I chose a memory-resident B-Tree (reconstructed on startup) to ensure
$O(\log N)$ lookup performance while maintaining code simplicity for this challenge.
-- Table Creation
CREATE TABLE users (id INT PRIMARY KEY AUTOINCREMENT, name VARCHAR(20), active BOOLEAN);
-- Data Manipulation
INSERT INTO users (name, active) VALUES ('Oscar', true);
UPDATE users SET active = false WHERE name = 'Oscar';
DELETE FROM users WHERE id = 1;
-- Querying & Joining
SELECT * FROM users WHERE active = true;
SELECT * FROM users JOIN orders ON users.id = orders.user_id;
-- Cleanup
DROP TABLE users;- Rust (latest stable)
cargo run --package clicargo run --package webThen, access the API at http://localhost:3000.
cargo test- sqlparser-rs: Used for parsing SQL strings into an AST.
- rustyline: Used for the interactive REPL interface.
- axum/tokio: Used for the web demonstration.
- serde: Used for catalog serialization.
This challenge was a deep dive into the "magic" of databases. Implementing the pager and the bitmask was particularly enlightening, as it forced me to think about how data actually lives on a physical platter rather than just in memory. Even if not every SQL edge case is handled, the core engine demonstrates a robust understanding of relational theory and systems programming.