Iceberg virtual graph (SPARQL over Iceberg) + R2RML mapping (alpha) #1185
Open
bplatz wants to merge 104 commits into main from feature/iceberg
- Create virtual graph nameservice schema for storing BM25 and other VG configs
- Add Virtual Graph Manager component for monitoring and managing VGs
- Implement create-virtual-graph API function
- Add nameservice support for publishing/retrieving virtual graph records
- Create test suite for virtual graph functionality
- Update nameservice storage to handle both commit and VG record types

This enables virtual graphs to be stored as metadata in the nameservice rather than as data within user ledgers.
- Fix virtual graph storage to persist BM25 indexes to disk
- Fix circular dependency between BM25 index and storage namespaces
- Make memory store list-paths consistent with file store (non-recursive)
- Fix type serialization in virtual graph data handling
- Add proper type handling for both SID and string formats
- Remove unused virtual-graph? function that always returned false
- Remove unused virtual graph manager namespace
- Remove unused virtual-graphs atom from Ledger record
- Extract complex parsing logic into named helper functions
- Fix try*/catch* usage in .clj files
- Add validation to prevent @ symbols in graph and ledger names
- Move virtual graphs to ns@v1/ directory alongside ledgers
- Update nameservice scanning to only check immediate directory
- Fix volatile handling consistency across callers
- Maintain JSON-LD query format with string keys
- Add virtual graph loader from nameservice
- Update query API to handle virtual graph loading
- Fix reflection warnings with proper type hints
- Ensure consistent storage behavior across implementations
- Removed FlatRank virtual graph implementation (##-prefixed syntax)
- Moved BM25 creation logic from api.cljc to virtual-graph.create namespace
- Improved BM25 creation to validate ledgers exist before publishing to nameservice
- Fixed async handling in load-and-validate-ledgers function
- Consolidated test files and removed debugging tests
- Added comprehensive BM25 tests for memory and federated queries
- Fixed nameservice query test to be Clojure-only
- Cleaned up unused imports and improved code organization

This completes the migration to function-based vector search and simplifies the virtual graph creation API.
Major improvements to nameservice storage implementation:
- Convert JSON-LD generation to multimethod pattern for extensibility
- Consolidate filename generation logic to support both ledgers and VGs
- Make retract API consistent by passing resource names instead of filenames
- Simplify dependency registration to work with JSON-LD format
- Clean up VG dependency tracking (remove non-functional update logic)
- Fix migration code to use new multimethod API

The publish flow is now cleaner with clear separation of concerns:
1. Convert record to JSON-LD (multimethod based on record type)
2. Write to storage with appropriate filename
3. Register dependencies if needed (VGs only currently)
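As a rough illustration of the three-step publish flow in the commit above, here is a minimal Python sketch that uses `functools.singledispatch` as a stand-in for a Clojure multimethod. It is not the code from this PR: the record classes, their fields, the JSON-LD keys, the `.json` filename shape, and the in-memory `storage` and `dependencies` maps are all hypothetical. Only the `ns@v1/` directory and the step order come from the commit messages.

```python
import json
from dataclasses import dataclass, field
from functools import singledispatch


# Hypothetical record types; the real nameservice records are Clojure data.
@dataclass
class LedgerRecord:
    alias: str
    commit_address: str


@dataclass
class VirtualGraphRecord:
    name: str
    vg_type: str
    dependencies: list = field(default_factory=list)


@singledispatch
def to_jsonld(record):
    """Dispatch on record type, analogous to a multimethod."""
    raise TypeError(f"unsupported record type: {type(record).__name__}")


@to_jsonld.register
def _(record: LedgerRecord):
    # Assumed JSON-LD shape, for illustration only.
    return {"@id": record.alias, "@type": "Ledger", "commit": record.commit_address}


@to_jsonld.register
def _(record: VirtualGraphRecord):
    # Assumed JSON-LD shape, for illustration only.
    return {
        "@id": record.name,
        "@type": "VirtualGraph",
        "vgType": record.vg_type,
        "dependencies": list(record.dependencies),
    }


def publish(record, storage, dependencies):
    """Run the three steps: convert, write, register dependencies (VGs only)."""
    doc = to_jsonld(record)                      # 1. record -> JSON-LD
    path = f"ns@v1/{doc['@id']}.json"            # 2. one filename rule for both types
    storage[path] = json.dumps(doc)
    if isinstance(record, VirtualGraphRecord):   # 3. only VGs register dependencies
        for ledger in record.dependencies:
            dependencies.setdefault(ledger, set()).add(record.name)
    return path
```

Dispatching on the record type keeps the write and dependency steps identical for ledgers and virtual graphs, so a new record type would only need a new `to_jsonld` method.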
- Move drop-virtual-graph functions to dedicated virtual-graph.drop namespace
- Fix critical bug in unregister-vg-dependencies where empty map was wiping all dependencies
- Update drop-ledger to use go-try for proper exception propagation
- Add comprehensive drop tests with dependency checking
- Replace manual UTF-8 conversion with util.bytes functions
- Fix reflection warning in nameservice storage
- Preserved virtual graph-based BM25 implementation from feature branch
- Applied main's RecursiveListableStore changes for slash handling in ledger names
- Updated drop.cljc to use RecursiveListableStore protocol
- Fixed API changes: fluree/create now returns db directly
- Removed index_test.clj as vector tests moved to flatrank_test.clj
- All tests passing
- Update drop.cljc to use RecursiveListableStore instead of ListableStore
- Fix flatrank_test.clj to use new API (fluree/create returns db directly)
…ation tests for mapping relational data to RDF
…ring by literal values in WHERE clauses
… generation, enhancing query execution and result streaming.
…dling of triples maps
…adding filter expression handling in SQL generation; improve integration tests for accurate data retrieval and filtering.
- Introduced a new function `parse-r2rml-from-triples` to encapsulate the logic for parsing R2RML mappings from grouped triples.
- Updated `parse-min-r2rml` to support inline TTL and JSON-LD mappings, allowing for more flexible mapping sources.
- Modified integration tests to use a consistent query structure with string keys instead of keywords for better compatibility.
- Added new tests for inline TTL and JSON-LD mappings to ensure correct functionality and data retrieval.
- Improved logging for debugging purposes in various functions.
…atypes, and templates in object mappings; include integration tests for each feature.
…pdate integration tests to verify functionality.
- Resolved conflicts by keeping R2RML support additions
- Accepted removal of ledger objects from API (as per bm25-ns)
- Kept virtual graph enhancements from both branches
- Fixed variable naming (vars -> var-config) in FQL parse
…and SQL execution
…ports, and routes
zonotope (Contributor) approved these changes on Jan 16, 2026 and left a comment:
💿
Resolves merge conflicts:
- deps.edn: Combined graalvm logging deps with iceberg config
- api.cljc: Added trace/async-form wrapper to ledger-info while preserving Iceberg VG ledger-info support
- query/api.cljc: Added trace/async-form wrapper to load-alias while preserving R2RML and Iceberg VG handling logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This PR adds Iceberg Virtual Graphs (alpha): query Apache Iceberg tables via SPARQL, backed by a subset of R2RML mappings for column→RDF term projection.
While considered alpha, it is ready for use and includes multi-table joins (RefObjectMap edges), predicate/VALUES pushdown, OPTIONAL semantics, UNION, property paths, aggregation/HAVING, plus supporting scripts, tests, and GraalVM native-image configuration.
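To make the "column→RDF term projection" idea concrete, here is a minimal Python sketch of what an R2RML-style mapping does with one table row. It is an illustration under stated assumptions, not the implementation in this PR: the `customers` table, its columns, the `http://example.org/` IRIs, and the dictionary layout of `MAPPING` are all hypothetical. It follows two R2RML behaviours: a subject template fills `{column}` placeholders with percent-encoded values, and a NULL column value generates no triple.

```python
import re
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping for illustration; names and IRIs are assumptions.
MAPPING = {
    "table": "customers",
    "subject_template": "http://example.org/customer/{id}",
    "class": "http://example.org/Customer",
    "predicates": {
        "http://example.org/name": "name",
        "http://example.org/country": "country",
    },
}


def expand_template(template, row):
    """Fill {column} placeholders with percent-encoded row values.

    quote(..., safe="") approximates R2RML's IRI-safe encoding for ASCII input.
    """
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(str(row[m.group(1)]), safe=""),
        template,
    )


def project_row(mapping, row):
    """Yield (subject, predicate, object) triples for one table row."""
    subject = expand_template(mapping["subject_template"], row)
    yield (subject, RDF_TYPE, mapping["class"])
    for predicate, column in mapping["predicates"].items():
        value = row.get(column)
        if value is not None:  # NULL column values produce no triple
            yield (subject, predicate, value)
```

For a row such as `{"id": 42, "name": "Ada Lovelace", "country": None}`, this yields a type triple and a name triple for `http://example.org/customer/42`, and nothing for the NULL `country` column. Because each predicate in such a mapping is tied to a column, a filter on that predicate can in principle be translated into a filter on the column before rows are read, which is the general idea behind the pushdown mentioned above.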
Important dependency on `feature/bm25-ns`
This work depends on `feature/bm25-ns`, which removed “embedded” virtual graphs from traditional Fluree ledgers and made virtual graphs separate entities. Iceberg virtual graphs rely on that nameservice-based VG lifecycle (registration/loading) rather than being stored inside ledger data.

How to use
See `docs/iceberg-virtual-graph.md`: call `fluree.db.api/connect-iceberg` (JVM-only), then query using `FROM <your-vg-name>` / `from ["your-vg-name"]`.

Key changes
- `src/fluree/db/virtual_graph/iceberg*.clj` (planning, joins, pushdown, R2RML mapping, query execution)
- `src/fluree/db/tabular/iceberg/*` + related tabular protocols (scans, stats, Arrow batches, FileIO abstractions)
- `src/fluree/db/api.cljc` adds `connect-iceberg` / `disconnect-iceberg`
- `src/fluree/db/query/api.cljc` loads Iceberg VG config from nameservice and supports time-travel alias suffixes
- `test-iceberg/` + `test/fluree/db/virtual_graph/iceberg/*`
- `resources/META-INF/native-image/com.fluree/db/*` + `graalvm/*`

Testing
Iceberg-specific tests are in `test-iceberg/` and `test/fluree/db/virtual_graph/iceberg/*` (benchmarks described in `docs/iceberg-virtual-graph.md`).