p2c is an educational compiling query engine.
Given an operator tree (query plan), it generates C++ code (hence plan-to-code).
The generated code is nicely formatted and can be inspected in gen.cpp.
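To make the plan-to-code idea concrete, here is a minimal sketch of how an operator tree can emit C++ source as text. It is not p2c's actual interface: the `Operator`, `Scan`, `Selection`, and `generateCount` names, the callback-based `produce` signature, and the shape of the emitted loop are all assumptions made for illustration.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical operator interface (not taken from p2c.cpp).
// produce() returns C++ source text; `consume` yields the code that the
// parent operator wants to run for each tuple.
struct Operator {
    virtual ~Operator() = default;
    virtual std::string produce(const std::function<std::string()>& consume) const = 0;
};

// A scan emits a loop over all rows of a relation.
struct Scan : Operator {
    std::string relation;
    explicit Scan(std::string rel) : relation(std::move(rel)) {}
    std::string produce(const std::function<std::string()>& consume) const override {
        return "for (size_t i = 0; i != " + relation + ".size(); i++) {\n" +
               consume() + "}\n";
    }
};

// A selection wraps the parent's per-tuple code in an if-statement.
struct Selection : Operator {
    std::unique_ptr<Operator> input;
    std::string predicate;
    Selection(std::unique_ptr<Operator> in, std::string pred)
        : input(std::move(in)), predicate(std::move(pred)) {}
    std::string produce(const std::function<std::string()>& consume) const override {
        return input->produce([&] {
            return "if (" + predicate + ") {\n" + consume() + "}\n";
        });
    }
};

// Emits code that counts the tuples produced by the plan.
std::string generateCount(const Operator& root) {
    return "size_t count = 0;\n" +
           root.produce([] { return std::string("count++;\n"); });
}
```

Building `Selection(std::make_unique<Scan>("orders"), "orders.price[i] > 100")` and passing it to `generateCount` yields a loop with a nested `if` around `count++;`. The emitted text is unindented, which is the kind of output a formatter such as clang-format can tidy up.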
Components:
- p2c.cpp - Main query compiler that generates C++ code from operator trees
- types.hpp - Type system supporting integers, doubles, strings, dates
- tpch.hpp - TPC-H schema definitions and database autoloading
- io.hpp - Memory-mapped I/O with columnar data access
- queryFrame.cpp - Runtime framework that executes generated code
You will need:
- A C++23 compiler (gcc >= 14, clang >= 19)
- Alternatively: A C++20 compiler and libfmt
- Optionally: clang-format to format generated code
To run a query, follow these steps:
- Data Generation: Convert TPC-H CSV data to optimized binary columnar format
- Code Generation: Transform query operators into optimized C++ code
- Compilation: Build executable with generated code and runtime framework
- Execution: Load data using memory-mapped files and execute generated code
cd data-generator
./generate-data.sh
This creates scale factor 1 TPC-H data in data-generator/output/.
The script first uses the dbgen tool to generate CSV files, then reads them and converts them to binary columnar data.
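The conversion step can be pictured with the sketch below, which pulls one integer field out of delimited text and writes it as a raw binary column. It does not show what generate-data.sh actually does: the function names, the 64-bit integer representation, the headerless output file, and the `|` default delimiter (the separator dbgen commonly uses) are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: extracts field `index` (0-based) of every non-empty
// line as a 64-bit integer. The delimiter is an assumption about the input.
std::vector<std::int64_t> parseIntColumn(std::istream& in, std::size_t index,
                                         char delim = '|') {
    std::vector<std::int64_t> column;
    std::string line, field;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::istringstream fields(line);
        for (std::size_t i = 0; i <= index; i++) {
            if (!std::getline(fields, field, delim))
                throw std::runtime_error("missing field in line: " + line);
        }
        column.push_back(std::stoll(field));
    }
    return column;
}

// Hypothetical helper: writes the values back to back with no header,
// so the file size alone determines the row count.
void writeColumn(const std::string& path,
                 const std::vector<std::int64_t>& column) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(column.data()),
              static_cast<std::streamsize>(column.size() * sizeof(std::int64_t)));
    if (!out) throw std::runtime_error("cannot write " + path);
}
```

Storing each column in its own file of fixed-size values means a later run can map the file and index it directly, without parsing any text.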
make p2c # Build the query compiler and sample query in p2c.cpp#main
make query # Compile generated query code
make # Does all of the above
# Run with default data location
./query
# Specify data path and run count
./query data-generator/output 3
The current implementation includes a sample query equivalent to TPC-H query 5.