bplatz commented Aug 23, 2025

Summary

This PR adds R2RML (RDB to RDF Mapping Language) support for virtualizing relational databases as RDF graphs in Fluree DB. R2RML is a W3C standard that enables SPARQL queries over SQL databases without data migration.

📚 Documentation: See docs/r2rml-guide.md for usage guide and examples.

Features Implemented

Core R2RML Support

  • Logical Tables

    • rr:tableName - Direct table mapping
    • rr:sqlQuery - SQL query results mapping (with JOINs, aggregations, computed columns)
  • Subject Maps

    • rr:template - Generate subject IRIs from column values
    • rr:class - Specify RDF classes
  • Predicate-Object Maps

    • rr:column - Map database columns to RDF objects
    • rr:template - Create composite values from multiple columns
    • rr:constant - Use fixed literal or IRI values
    • rr:datatype - XSD datatype specification
    • rr:language - Language tags for internationalization
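
As an illustration of how these terms combine, here is a minimal sketch of a single mapping that uses rr:sqlQuery, rr:template, rr:column, rr:constant, rr:datatype, and rr:language together. The `persons`/`orders` schema, the column names, and the `ex:` terms are hypothetical, and whether this PR's parser accepts every construct exactly as written (for example the triple-quoted SQL literal) is an assumption; docs/r2rml-guide.md is the authoritative reference.

```clojure
;; Hypothetical mapping over an assumed persons/orders schema.
;; Standard R2RML vocabulary only; names under ex: are made up for illustration.
(def order-summary-mapping
  "@prefix rr:  <http://www.w3.org/ns/r2rml#> .
   @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
   @prefix ex:  <http://example.com/> .

   ex:CustomerMap a rr:TriplesMap ;
     # Logical table defined by a query (JOIN + aggregation)
     rr:logicalTable [ rr:sqlQuery \"\"\"
       SELECT p.id, p.first_name, p.last_name, COUNT(o.id) AS order_count
       FROM persons p LEFT JOIN orders o ON o.person_id = p.id
       GROUP BY p.id, p.first_name, p.last_name
     \"\"\" ] ;
     rr:subjectMap [
       rr:template \"http://example.com/customer/{id}\" ;
       rr:class ex:Customer
     ] ;
     # Composite literal built from two columns, tagged as English
     rr:predicateObjectMap [
       rr:predicate ex:fullName ;
       rr:objectMap [ rr:template \"{first_name} {last_name}\" ;
                      rr:language \"en\" ]
     ] ;
     # Column value with an explicit XSD datatype
     rr:predicateObjectMap [
       rr:predicate ex:orderCount ;
       rr:objectMap [ rr:column \"order_count\" ;
                      rr:datatype xsd:integer ]
     ] ;
     # Fixed IRI attached to every generated subject
     rr:predicateObjectMap [
       rr:predicate ex:source ;
       rr:objectMap [ rr:constant ex:CRM ]
     ] .")
```

The rr:language on the template object map matters under the R2RML specification: without it (or an rr:datatype), a template-valued object map produces an IRI rather than a literal.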

Format Support

  • Turtle (.ttl) format for R2RML mappings
  • JSON-LD format for R2RML mappings
  • Inline mapping definitions
  • File-based mapping definitions
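
For comparison with the Turtle mapping under Example Usage, the same triples map can be written as JSON-LD. The sketch below expresses it as a Clojure map with string keys; it is a plain JSON-LD rendering of the standard R2RML terms, and the exact document shape and config key this PR expects for JSON-LD mappings are assumptions not confirmed by the description.

```clojure
;; JSON-LD rendering of the ex:PersonMap triples map (assumed shape).
;; IRIs are written as {"@id" ...} nodes; templates and column names are literals.
(def person-mapping-jsonld
  {"@context" {"rr" "http://www.w3.org/ns/r2rml#"
               "ex" "http://example.com/"}
   "@id"      "ex:PersonMap"
   "@type"    "rr:TriplesMap"
   "rr:logicalTable" {"rr:tableName" "persons"}
   "rr:subjectMap"   {"rr:template" "http://example.com/person/{id}"
                      "rr:class"    {"@id" "ex:Person"}}
   "rr:predicateObjectMap" {"rr:predicate" {"@id" "ex:name"}
                            "rr:objectMap" {"rr:column" "name"}}})
```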

Technical Implementation

The implementation follows a pattern-collection approach similar to the one used by the BM25 virtual graph:

  1. SPARQL patterns are collected during the match phase
  2. Patterns are analyzed to determine the appropriate R2RML mapping
  3. SQL is generated based on the patterns and mapping
  4. Results are transformed to RDF according to the mapping rules
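
The four steps above can be sketched in a few small functions. This is a simplified illustration, not the PR's code: the in-memory mapping shape, the pattern shape, and every function name here are hypothetical, and IRI-safe encoding of template values (required by the R2RML specification) is omitted.

```clojure
(require '[clojure.string :as str])

;; Hypothetical parsed form of a triples map: predicate IRI -> column name.
(def person-map
  {:table      "persons"
   :template   "http://example.com/person/{id}"
   :class      "http://example.com/Person"
   :predicates {"http://example.com/name" "name"}})

;; Step 1 output (assumed shape): patterns collected for one subject variable.
(def pattern
  {:subject    "?person"
   :type       "http://example.com/Person"
   :predicates {"http://example.com/name" "?name"}})

(defn template-columns
  "Column names referenced as {col} placeholders in a template string."
  [template]
  (mapv second (re-seq #"\{([^}]+)\}" template)))

(defn select-mapping
  "Step 2: first mapping whose class matches the pattern's type and which
   maps every requested predicate; nil when none qualifies."
  [mappings {:keys [type predicates]}]
  (first (filter (fn [m]
                   (and (= type (:class m))
                        (every? (:predicates m) (keys predicates))))
                 mappings)))

(defn generate-sql
  "Step 3: select only the columns needed by the subject template and the
   requested predicates."
  [mapping pattern]
  (let [cols (distinct (concat (template-columns (:template mapping))
                               (map (:predicates mapping)
                                    (keys (:predicates pattern)))))]
    (str "SELECT " (str/join ", " cols) " FROM " (:table mapping))))

(defn expand-template
  "Substitute {col} placeholders with values from a row keyed by column name."
  [template row]
  (str/replace template #"\{([^}]+)\}" (fn [[_ col]] (str (get row col)))))

(defn row->bindings
  "Step 4: turn one SQL result row into query variable bindings."
  [mapping pattern row]
  (into {(:subject pattern) (expand-template (:template mapping) row)}
        (map (fn [[pred var]] [var (get row ((:predicates mapping) pred))]))
        (:predicates pattern)))

(def mapping (select-mapping [person-map] pattern))

(generate-sql mapping pattern)
;; => "SELECT id, name FROM persons"

(row->bindings mapping pattern {"id" 1 "name" "Alice"})
;; => {"?person" "http://example.com/person/1", "?name" "Alice"}
```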

Key Components

  • Pattern Analysis: Selects R2RML mappings based on query patterns
  • SQL Generation: Generates SQL with column aliasing and filtering
  • Result Transformation: Handles templates, datatypes, language tags, and constants
  • Database Support: Works with H2, PostgreSQL, MySQL, and SQL Server via JDBC

Query Support

Supports both FQL and SPARQL query patterns:

;; FQL Query
{"from" ["vg/persons"]
 "select" ["?person" "?name"]
 "where" [{"@id" "?person"
          "@type" "http://xmlns.com/foaf/0.1/Person"
          "http://xmlns.com/foaf/0.1/name" "?name"}]}

# SPARQL equivalent
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}

Testing

The test suite includes 19 tests covering:

  • Basic table mapping
  • SQL query mapping with aggregations
  • Template-based object values
  • Constant values
  • XSD datatypes
  • Language tags
  • JSON-LD format support
  • Filter expressions
  • Complex SQL queries

Example Usage

;; Define R2RML mapping
(def mapping
  "@prefix rr: <http://www.w3.org/ns/r2rml#> .
   @prefix ex: <http://example.com/> .
   
   ex:PersonMap a rr:TriplesMap ;
     rr:logicalTable [ rr:tableName \"persons\" ] ;
     rr:subjectMap [
       rr:template \"http://example.com/person/{id}\" ;
       rr:class ex:Person
     ] ;
     rr:predicateObjectMap [
       rr:predicate ex:name ;
       rr:objectMap [ rr:column \"name\" ]
     ] .")

;; Publish virtual graph
(nameservice/publish publisher
  {:vg-name "vg/persons"
   :vg-type "fidx:R2RML"
   :engine  :r2rml
   :config  {:mappingInline mapping
             :rdb {:jdbcUrl "jdbc:postgresql://localhost/db"
                   :driver  "org.postgresql.Driver"}}})

Implementation Notes

  • Query Pushdown: Filters are converted to SQL WHERE clauses
  • Column Selection: Only the required columns are selected from the database
  • Template Processing: Templates are expanded during result processing
  • Result Streaming: Results are streamed through core.async channels
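
To make the pushdown and column-selection notes concrete, here is a minimal sketch of turning simple comparison filters into a parameterized WHERE clause. It is an assumption-laden illustration rather than the PR's implementation: the filter shape, the operator allow-list, and the function names are hypothetical. Values are passed as parameters instead of being spliced into the SQL string.

```clojure
(require '[clojure.string :as str])

;; Assumed allow-list of comparison operators that map directly onto SQL.
(def sql-ops {">" ">", ">=" ">=", "<" "<", "<=" "<=", "=" "=", "!=" "<>"})

(defn filter->where
  "Translate one filter {:var :op :value} into [sql-fragment param].
   Returns nil when the variable has no mapped column or the operator is
   not in the allow-list; such filters would have to be applied after the
   rows are fetched."
  [var->column {:keys [var op value]}]
  (let [col    (var->column var)
        sql-op (sql-ops op)]
    (when (and col sql-op)
      [(str col " " sql-op " ?") value])))

(defn select-with-filters
  "Build a [sql & params] vector, the form accepted by common Clojure JDBC
   wrappers. Only the listed columns are selected."
  [table columns var->column filters]
  (let [clauses (keep #(filter->where var->column %) filters)
        sql     (cond-> (str "SELECT " (str/join ", " columns) " FROM " table)
                  (seq clauses)
                  (str " WHERE " (str/join " AND " (map first clauses))))]
    (into [sql] (map second clauses))))

(select-with-filters "persons"
                     ["id" "name" "age"]
                     {"?name" "name" "?age" "age"}
                     [{:var "?age" :op ">" :value 30}])
;; => ["SELECT id, name, age FROM persons WHERE age > ?" 30]
```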

Not Yet Implemented

R2RML features not included in this PR:

  • rr:parentTriplesMap - Foreign key relationships
  • rr:joinCondition - Join conditions between mappings
  • rr:graphMap - Named graph support
  • rr:termType - Explicit term type specification
  • Dynamic predicates from columns

Breaking Changes

None - this is a new feature addition.

Test Plan

Run the R2RML test suite:

clojure -M:cljtest -m kaocha.runner --focus fluree.db.virtual-graph.r2rml-test

All 19 tests should pass.

bplatz added 27 commits July 31, 2025 13:23
- Create virtual graph nameservice schema for storing BM25 and other VG configs
- Add Virtual Graph Manager component for monitoring and managing VGs
- Implement create-virtual-graph API function
- Add nameservice support for publishing/retrieving virtual graph records
- Create test suite for virtual graph functionality
- Update nameservice storage to handle both commit and VG record types

This enables virtual graphs to be stored as metadata in the nameservice
rather than as data within user ledgers.
- Fix virtual graph storage to persist BM25 indexes to disk
- Fix circular dependency between BM25 index and storage namespaces
- Make memory store list-paths consistent with file store (non-recursive)
- Fix type serialization in virtual graph data handling
- Add proper type handling for both SID and string formats

- Remove unused virtual-graph? function that always returned false
- Remove unused virtual graph manager namespace
- Remove unused virtual-graphs atom from Ledger record
- Extract complex parsing logic into named helper functions
- Fix try*/catch* usage in .clj files

- Add validation to prevent @ symbols in graph and ledger names
- Move virtual graphs to ns@v1/ directory alongside ledgers
- Update nameservice scanning to only check immediate directory
- Fix volatile handling consistency across callers
- Maintain JSON-LD query format with string keys

- Add virtual graph loader from nameservice
- Update query API to handle virtual graph loading
- Fix reflection warnings with proper type hints
- Ensure consistent storage behavior across implementations
- Removed FlatRank virtual graph implementation (##-prefixed syntax)
- Moved BM25 creation logic from api.cljc to virtual-graph.create namespace
- Improved BM25 creation to validate ledgers exist before publishing to nameservice
- Fixed async handling in load-and-validate-ledgers function
- Consolidated test files and removed debugging tests
- Added comprehensive BM25 tests for memory and federated queries
- Fixed nameservice query test to be Clojure-only
- Cleaned up unused imports and improved code organization

This completes the migration to function-based vector search and simplifies
the virtual graph creation API.
Major improvements to nameservice storage implementation:

- Convert JSON-LD generation to multimethod pattern for extensibility
- Consolidate filename generation logic to support both ledgers and VGs
- Make retract API consistent by passing resource names instead of filenames
- Simplify dependency registration to work with JSON-LD format
- Clean up VG dependency tracking (remove non-functional update logic)
- Fix migration code to use new multimethod API

The publish flow is now cleaner with clear separation of concerns:
1. Convert record to JSON-LD (multimethod based on record type)
2. Write to storage with appropriate filename
3. Register dependencies if needed (VGs only currently)
- Move drop-virtual-graph functions to dedicated virtual-graph.drop namespace
- Fix critical bug in unregister-vg-dependencies where empty map was wiping all dependencies
- Update drop-ledger to use go-try for proper exception propagation
- Add comprehensive drop tests with dependency checking
- Replace manual UTF-8 conversion with util.bytes functions
- Fix reflection warning in nameservice storage
- Preserved virtual graph-based BM25 implementation from feature branch
- Applied main's RecursiveListableStore changes for slash handling in ledger names
- Updated drop.cljc to use RecursiveListableStore protocol
- Fixed API changes: fluree/create now returns db directly
- Removed index_test.clj as vector tests moved to flatrank_test.clj
- All tests passing
- Update drop.cljc to use RecursiveListableStore instead of ListableStore
- Fix flatrank_test.clj to use new API (fluree/create returns db directly)
…ation tests for mapping relational data to RDF
… generation, enhancing query execution and result streaming.
…adding filter expression handling in SQL generation; improve integration tests for accurate data retrieval and filtering.
- Introduced a new function `parse-r2rml-from-triples` to encapsulate the logic for parsing R2RML mappings from grouped triples.
- Updated `parse-min-r2rml` to support inline TTL and JSON-LD mappings, allowing for more flexible mapping sources.
- Modified integration tests to use a consistent query structure with string keys instead of keywords for better compatibility.
- Added new tests for inline TTL and JSON-LD mappings to ensure correct functionality and data retrieval.
- Improved logging for debugging purposes in various functions.
…atypes, and templates in object mappings; include integration tests for each feature.
…pdate integration tests to verify functionality.
bplatz changed the base branch from main to feature/bm25-ns August 23, 2025 15:52
- Resolved conflicts by keeping R2RML support additions
- Accepted removal of ledger objects from API (as per bm25-ns)
- Kept virtual graph enhancements from both branches
- Fixed variable naming (vars -> var-config) in FQL parse
- Resolved conflict in nameservice/storage.cljc to use new ledger:branch format
- Fixed R2RML linting errors:
  - Added missing -match-properties protocol method
  - Changed where/nil-channel to empty-channel (following codebase rename)
- Incorporated all changes from main including new ledger:branch naming convention

bplatz commented Jan 9, 2026

R2RML is now used primarily by the Iceberg feature in #1185, which includes most of these R2RML features, so I am closing this PR.

bplatz closed this Jan 9, 2026