@bplatz bplatz commented Jan 8, 2026

This PR adds Iceberg Virtual Graphs (alpha): query Apache Iceberg tables via SPARQL, backed by a subset of R2RML mappings for column→RDF term projection.

While considered alpha, it is ready for use and includes multi-table joins (RefObjectMap edges), predicate/VALUES pushdown, OPTIONAL semantics, UNION, property paths, aggregation/HAVING, plus supporting scripts, tests, and GraalVM native-image configuration.

Important dependency on feature/bm25-ns

This work depends on feature/bm25-ns, which removed “embedded” virtual graphs from traditional Fluree ledgers and made virtual graphs separate entities. Iceberg virtual graphs rely on that nameservice-based VG lifecycle (registration/loading) rather than being stored inside ledger data.

How to use

See docs/iceberg-virtual-graph.md:

  • Register/connect via fluree.db.api/connect-iceberg (JVM-only), then query with FROM <your-vg-name> in SPARQL or from ["your-vg-name"] in a JSON-LD query
  • The doc also covers R2RML mapping guidance, examples (OPTIONAL/UNION/property paths), pushdown verification, time-travel alias formats, performance notes, and troubleshooting
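
As a rough illustration of the two query shapes mentioned above, the sketch below builds each one as plain data. It is written in Python only to stay language-neutral (the PR itself is Clojure). The helper names, the variable names, and the predicate IRI `http://example.org/name` are hypothetical; real predicates come from your R2RML mapping, the JSON-LD key layout is an assumption to check against docs/iceberg-virtual-graph.md, and submitting the query to Fluree is out of scope here.

```python
import json

# Hypothetical predicate IRI; in practice it is defined by your R2RML mapping.
NAME_PREDICATE = "http://example.org/name"


def sparql_query(vg_name: str, limit: int = 10) -> str:
    """Build a SPARQL query string that targets a virtual graph by name."""
    return (
        "SELECT ?airline ?name\n"
        f"FROM <{vg_name}>\n"
        f"WHERE {{ ?airline <{NAME_PREDICATE}> ?name }}\n"
        f"LIMIT {limit}"
    )


def jsonld_query(vg_name: str, limit: int = 10) -> dict:
    """Build the JSON-LD form with string keys (assumed layout, not verified)."""
    return {
        "from": [vg_name],
        "select": ["?airline", "?name"],
        "where": {"@id": "?airline", NAME_PREDICATE: "?name"},
        "limit": limit,
    }


if __name__ == "__main__":
    print(sparql_query("your-vg-name"))
    print(json.dumps(jsonld_query("your-vg-name"), indent=2))
```

Both forms name the virtual graph the same way a ledger would be named, which is the point of registering it through the nameservice.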

Key changes

  • Iceberg VG runtime: src/fluree/db/virtual_graph/iceberg*.clj (planning, joins, pushdown, R2RML mapping, query execution)
  • Iceberg tabular source: src/fluree/db/tabular/iceberg/* + related tabular protocols (scans, stats, Arrow batches, FileIO abstractions)
  • Public API: src/fluree/db/api.cljc adds connect-iceberg / disconnect-iceberg
  • Alias loading: src/fluree/db/query/api.cljc loads Iceberg VG config from nameservice and supports time-travel alias suffixes
  • Docs + scripts + tests: OpenFlights scripts and test-iceberg/ + test/fluree/db/virtual_graph/iceberg/*
  • GraalVM: native-image configs under resources/META-INF/native-image/com.fluree/db/* + graalvm/*

Testing

Iceberg-specific tests are in test-iceberg/ and test/fluree/db/virtual_graph/iceberg/* (benchmarks described in docs/iceberg-virtual-graph.md).

bplatz added 30 commits July 31, 2025 13:23
- Create virtual graph nameservice schema for storing BM25 and other VG configs
- Add Virtual Graph Manager component for monitoring and managing VGs
- Implement create-virtual-graph API function
- Add nameservice support for publishing/retrieving virtual graph records
- Create test suite for virtual graph functionality
- Update nameservice storage to handle both commit and VG record types

This enables virtual graphs to be stored as metadata in the nameservice
rather than as data within user ledgers.
- Fix virtual graph storage to persist BM25 indexes to disk
- Fix circular dependency between BM25 index and storage namespaces
- Make memory store list-paths consistent with file store (non-recursive)
- Fix type serialization in virtual graph data handling
- Add proper type handling for both SID and string formats

- Remove unused virtual-graph? function that always returned false
- Remove unused virtual graph manager namespace
- Remove unused virtual-graphs atom from Ledger record
- Extract complex parsing logic into named helper functions
- Fix try*/catch* usage in .clj files

- Add validation to prevent @ symbols in graph and ledger names
- Move virtual graphs to ns@v1/ directory alongside ledgers
- Update nameservice scanning to only check immediate directory
- Fix volatile handling consistency across callers
- Maintain JSON-LD query format with string keys

- Add virtual graph loader from nameservice
- Update query API to handle virtual graph loading
- Fix reflection warnings with proper type hints
- Ensure consistent storage behavior across implementations
- Removed FlatRank virtual graph implementation (##-prefixed syntax)
- Moved BM25 creation logic from api.cljc to virtual-graph.create namespace
- Improved BM25 creation to validate ledgers exist before publishing to nameservice
- Fixed async handling in load-and-validate-ledgers function
- Consolidated test files and removed debugging tests
- Added comprehensive BM25 tests for memory and federated queries
- Fixed nameservice query test to be Clojure-only
- Cleaned up unused imports and improved code organization

This completes the migration to function-based vector search and simplifies
the virtual graph creation API.
Major improvements to nameservice storage implementation:

- Convert JSON-LD generation to multimethod pattern for extensibility
- Consolidate filename generation logic to support both ledgers and VGs
- Make retract API consistent by passing resource names instead of filenames
- Simplify dependency registration to work with JSON-LD format
- Clean up VG dependency tracking (remove non-functional update logic)
- Fix migration code to use new multimethod API

The publish flow is now cleaner with clear separation of concerns:
1. Convert record to JSON-LD (multimethod based on record type)
2. Write to storage with appropriate filename
3. Register dependencies if needed (VGs only currently)
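
The three-step publish flow above can be sketched roughly as follows. This is a Python analogue (using `functools.singledispatch` in place of a Clojure multimethod), not the code from this commit: the record classes, field names, JSON-LD keys, `.json` suffix, and the in-memory `storage` and `dependency_index` dicts are all hypothetical stand-ins. Only the `ns@v1/` directory and the "VGs only" dependency rule come from the commit messages.

```python
from dataclasses import dataclass, field
from functools import singledispatch


@dataclass
class LedgerRecord:  # hypothetical shape
    name: str
    commit_address: str


@dataclass
class VirtualGraphRecord:  # hypothetical shape
    name: str
    vg_type: str
    dependencies: list = field(default_factory=list)


@singledispatch
def to_jsonld(record) -> dict:
    """Step 1: convert a record to JSON-LD, dispatching on record type."""
    raise TypeError(f"unsupported record type: {type(record).__name__}")


@to_jsonld.register
def _ledger(record: LedgerRecord) -> dict:
    return {"@id": record.name, "@type": "Ledger", "commit": record.commit_address}


@to_jsonld.register
def _vg(record: VirtualGraphRecord) -> dict:
    return {
        "@id": record.name,
        "@type": "VirtualGraph",
        "vgType": record.vg_type,
        "dependencies": list(record.dependencies),
    }


def filename_for(name: str) -> str:
    """Shared filename logic for ledgers and VGs (suffix is an assumption)."""
    return f"ns@v1/{name}.json"


def publish(record, storage: dict, dependency_index: dict) -> str:
    doc = to_jsonld(record)              # 1. record -> JSON-LD
    path = filename_for(record.name)
    storage[path] = doc                  # 2. write under the derived filename
    if isinstance(record, VirtualGraphRecord):
        for dep in record.dependencies:  # 3. register dependencies (VGs only)
            dependency_index.setdefault(dep, set()).add(record.name)
    return path
```

Keeping the type dispatch inside step 1 means a new record type only needs a new `to_jsonld` implementation; the write and dependency steps stay unchanged.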
- Move drop-virtual-graph functions to dedicated virtual-graph.drop namespace
- Fix critical bug in unregister-vg-dependencies where empty map was wiping all dependencies
- Update drop-ledger to use go-try for proper exception propagation
- Add comprehensive drop tests with dependency checking
- Replace manual UTF-8 conversion with util.bytes functions
- Fix reflection warning in nameservice storage
- Preserved virtual graph-based BM25 implementation from feature branch
- Applied main's RecursiveListableStore changes for slash handling in ledger names
- Updated drop.cljc to use RecursiveListableStore protocol
- Fixed API changes: fluree/create now returns db directly
- Removed index_test.clj as vector tests moved to flatrank_test.clj
- All tests passing
- Update drop.cljc to use RecursiveListableStore instead of ListableStore
- Fix flatrank_test.clj to use new API (fluree/create returns db directly)
…ation tests for mapping relational data to RDF
… generation, enhancing query execution and result streaming.
…adding filter expression handling in SQL generation; improve integration tests for accurate data retrieval and filtering.
- Introduced a new function `parse-r2rml-from-triples` to encapsulate the logic for parsing R2RML mappings from grouped triples.
- Updated `parse-min-r2rml` to support inline TTL and JSON-LD mappings, allowing for more flexible mapping sources.
- Modified integration tests to use a consistent query structure with string keys instead of keywords for better compatibility.
- Added new tests for inline TTL and JSON-LD mappings to ensure correct functionality and data retrieval.
- Improved logging for debugging purposes in various functions.
…atypes, and templates in object mappings; include integration tests for each feature.
…pdate integration tests to verify functionality.
- Resolved conflicts by keeping R2RML support additions
- Accepted removal of ledger objects from API (as per bm25-ns)
- Kept virtual graph enhancements from both branches
- Fixed variable naming (vars -> var-config) in FQL parse
@bplatz bplatz requested a review from a team January 8, 2026 20:54
@bplatz bplatz changed the title from "Iceberg virtual graph (SPARQL over Iceberg) + R2RML mapping + GraalVM native-image support (alpha)" to "Iceberg virtual graph (SPARQL over Iceberg) + R2RML mapping (alpha)" Jan 8, 2026
Base automatically changed from feature/bm25-ns to main January 14, 2026 19:19
@zonotope zonotope left a comment

💿

bplatz and others added 8 commits January 16, 2026 05:18
Resolves merge conflicts:
- deps.edn: Combined graalvm logging deps with iceberg config
- api.cljc: Added trace/async-form wrapper to ledger-info while
  preserving Iceberg VG ledger-info support
- query/api.cljc: Added trace/async-form wrapper to load-alias while
  preserving R2RML and Iceberg VG handling logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

3 participants