Add R2RML virtual graph support with advanced mapping features #1098
Closed
- Create virtual graph nameservice schema for storing BM25 and other VG configs
- Add Virtual Graph Manager component for monitoring and managing VGs
- Implement create-virtual-graph API function
- Add nameservice support for publishing/retrieving virtual graph records
- Create test suite for virtual graph functionality
- Update nameservice storage to handle both commit and VG record types

This enables virtual graphs to be stored as metadata in the nameservice rather than as data within user ledgers.
- Fix virtual graph storage to persist BM25 indexes to disk
- Fix circular dependency between BM25 index and storage namespaces
- Make memory store list-paths consistent with file store (non-recursive)
- Fix type serialization in virtual graph data handling
- Add proper type handling for both SID and string formats
- Remove unused virtual-graph? function that always returned false
- Remove unused virtual graph manager namespace
- Remove unused virtual-graphs atom from Ledger record
- Extract complex parsing logic into named helper functions
- Fix try*/catch* usage in .clj files
- Add validation to prevent @ symbols in graph and ledger names
- Move virtual graphs to ns@v1/ directory alongside ledgers
- Update nameservice scanning to only check immediate directory
- Fix volatile handling consistency across callers
- Maintain JSON-LD query format with string keys
- Add virtual graph loader from nameservice
- Update query API to handle virtual graph loading
- Fix reflection warnings with proper type hints
- Ensure consistent storage behavior across implementations
- Removed FlatRank virtual graph implementation (##-prefixed syntax)
- Moved BM25 creation logic from api.cljc to virtual-graph.create namespace
- Improved BM25 creation to validate ledgers exist before publishing to nameservice
- Fixed async handling in load-and-validate-ledgers function
- Consolidated test files and removed debugging tests
- Added comprehensive BM25 tests for memory and federated queries
- Fixed nameservice query test to be Clojure-only
- Cleaned up unused imports and improved code organization

This completes the migration to function-based vector search and simplifies the virtual graph creation API.
Major improvements to nameservice storage implementation:
- Convert JSON-LD generation to multimethod pattern for extensibility
- Consolidate filename generation logic to support both ledgers and VGs
- Make retract API consistent by passing resource names instead of filenames
- Simplify dependency registration to work with JSON-LD format
- Clean up VG dependency tracking (remove non-functional update logic)
- Fix migration code to use new multimethod API

The publish flow is now cleaner with clear separation of concerns:
1. Convert record to JSON-LD (multimethod based on record type)
2. Write to storage with appropriate filename
3. Register dependencies if needed (VGs only currently)
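The three-step publish flow above could be sketched as follows. This is only an illustration in Python (the project itself is Clojure), using `functools.singledispatch` as a stand-in for a multimethod. Every name here (`LedgerRecord`, `VirtualGraphRecord`, `to_json_ld`, `filename_for`, `publish`), the JSON-LD shape, and the `ns@v1/<name>.json` filename layout are assumptions made for the example, not the actual Fluree code.

```python
import json
from dataclasses import dataclass, field
from functools import singledispatch


@dataclass
class LedgerRecord:  # hypothetical record type
    name: str
    commit_address: str


@dataclass
class VirtualGraphRecord:  # hypothetical record type
    name: str
    vg_type: str
    dependencies: list = field(default_factory=list)


@singledispatch
def to_json_ld(record):
    """Step 1: convert a record to JSON-LD, dispatching on record type."""
    raise TypeError(f"unsupported record type: {type(record).__name__}")


@to_json_ld.register
def _(record: LedgerRecord):
    return {"@id": record.name, "@type": "Ledger", "commit": record.commit_address}


@to_json_ld.register
def _(record: VirtualGraphRecord):
    return {
        "@id": record.name,
        "@type": "VirtualGraph",
        "vgType": record.vg_type,
        "dependencies": list(record.dependencies),
    }


def filename_for(resource_name):
    """One filename rule shared by ledgers and VGs (layout is assumed)."""
    return f"ns@v1/{resource_name}.json"


def publish(store, dependency_index, record):
    # Step 1: record -> JSON-LD via type dispatch.
    doc = to_json_ld(record)
    # Step 2: write to storage (a plain dict stands in for the store).
    store[filename_for(record.name)] = json.dumps(doc)
    # Step 3: register dependencies, for virtual graphs only.
    if isinstance(record, VirtualGraphRecord):
        for ledger in record.dependencies:
            dependency_index.setdefault(ledger, set()).add(record.name)
```

Supporting a new record type in this sketch means registering one more `to_json_ld` implementation; `publish` itself does not change, which is the extensibility benefit the commit message attributes to the multimethod pattern.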
- Move drop-virtual-graph functions to dedicated virtual-graph.drop namespace
- Fix critical bug in unregister-vg-dependencies where empty map was wiping all dependencies
- Update drop-ledger to use go-try for proper exception propagation
- Add comprehensive drop tests with dependency checking
- Replace manual UTF-8 conversion with util.bytes functions
- Fix reflection warning in nameservice storage
- Preserved virtual graph-based BM25 implementation from feature branch
- Applied main's RecursiveListableStore changes for slash handling in ledger names
- Updated drop.cljc to use RecursiveListableStore protocol
- Fixed API changes: fluree/create now returns db directly
- Removed index_test.clj as vector tests moved to flatrank_test.clj
- All tests passing
- Update drop.cljc to use RecursiveListableStore instead of ListableStore
- Fix flatrank_test.clj to use new API (fluree/create returns db directly)
…ation tests for mapping relational data to RDF
…ring by literal values in WHERE clauses
… generation, enhancing query execution and result streaming.
…dling of triples maps
…adding filter expression handling in SQL generation; improve integration tests for accurate data retrieval and filtering.
- Introduced a new function `parse-r2rml-from-triples` to encapsulate the logic for parsing R2RML mappings from grouped triples.
- Updated `parse-min-r2rml` to support inline TTL and JSON-LD mappings, allowing for more flexible mapping sources.
- Modified integration tests to use a consistent query structure with string keys instead of keywords for better compatibility.
- Added new tests for inline TTL and JSON-LD mappings to ensure correct functionality and data retrieval.
- Improved logging for debugging purposes in various functions.
…atypes, and templates in object mappings; include integration tests for each feature.
…pdate integration tests to verify functionality.
4a997a3 to
bd28fe2
- Resolved conflicts by keeping R2RML support additions
- Accepted removal of ledger objects from API (as per bm25-ns)
- Kept virtual graph enhancements from both branches
- Fixed variable naming (vars -> var-config) in FQL parse
…and SQL execution
- Resolved conflict in nameservice/storage.cljc to use new ledger:branch format
- Fixed R2RML linting errors:
  - Added missing -match-properties protocol method
  - Changed where/nil-channel to empty-channel (following codebase rename)
- Incorporated all changes from main including new ledger:branch naming convention
R2RML is now being used primarily by the iceberg feature in #1185, which includes most of these R2RML features, so closing this PR.
Summary
This PR adds R2RML (RDB to RDF Mapping Language) support for virtualizing relational databases as RDF graphs in Fluree DB. R2RML is a W3C standard that enables SPARQL queries over SQL databases without data migration.
📚 Documentation: See docs/r2rml-guide.md for usage guide and examples.
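To make the mapping idea in the summary concrete, here is a minimal Python sketch of how an `rr:template`-style pattern can turn a relational row into a subject IRI and a few triples. It is not the PR's implementation (which is Clojure); the function names, the `http://example.com/` IRIs, and the `ex:` terms are hypothetical. It follows the R2RML convention that a template yields no term when a referenced column is NULL, and it ignores backslash-escaped braces for brevity.

```python
import re
from urllib.parse import quote

# Matches {column} placeholders; escaped braces are not handled in this sketch.
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template, row):
    """Fill each {column} placeholder with the percent-encoded column value.

    Returns None when any referenced column is missing or NULL (None).
    """
    found_null = False

    def substitute(match):
        nonlocal found_null
        value = row.get(match.group(1))
        if value is None:
            found_null = True
            return ""
        # Percent-encode so the value is safe inside an IRI.
        return quote(str(value), safe="")

    result = _PLACEHOLDER.sub(substitute, template)
    return None if found_null else result


def row_to_triples(row):
    """Map one row of a hypothetical `person` table to (s, p, o) tuples."""
    subject = expand_template("http://example.com/person/{id}", row)
    if subject is None:
        return []  # no subject term, so no triples for this row
    triples = [(subject, "rdf:type", "ex:Person")]  # class on the subject
    if row.get("name") is not None:
        triples.append((subject, "ex:name", row["name"]))  # column-valued object
    return triples
```

For instance, `row_to_triples({"id": 1, "name": "Ann"})` produces a type triple and a name triple for the subject `http://example.com/person/1`, while a row whose `id` is NULL produces nothing.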
Features Implemented
Core R2RML Support
Logical Tables
- `rr:tableName` - Direct table mapping
- `rr:sqlQuery` - SQL query results mapping (with JOINs, aggregations, computed columns)
Subject Maps
- `rr:template` - Generate subject IRIs from column values
- `rr:class` - Specify RDF classes
Predicate-Object Maps
- `rr:column` - Map database columns to RDF objects
- `rr:template` - Create composite values from multiple columns
- `rr:constant` - Use fixed literal or IRI values
- `rr:datatype` - XSD datatype specification
- `rr:language` - Language tags for internationalization
Format Support
Technical Implementation
The implementation follows a pattern-collection approach similar to BM25:
Key Components
Query Support
Supports both FQL and SPARQL query patterns:
Testing
Test suite includes 19 tests covering:
Example Usage
Implementation Notes
Not Yet Implemented
R2RML features not included in this PR:
- `rr:parentTriplesMap` - Foreign key relationships
- `rr:joinCondition` - Join conditions between mappings
- `rr:graphMap` - Named graph support
- `rr:termType` - Explicit term type specification
Breaking Changes
None - this is a new feature addition.
Test Plan
Run the R2RML test suite:
All 19 tests should pass.