feat(cugraph): upgrade to RAPIDS 25.12 / CUDA 13.1 with comprehensive e2e tests #710
Open
TheReaperJay wants to merge 5 commits into memgraph:main from TheReaperJay:feature/cugraph-rapids-25x-cuda13
+2,052 −554
Conversation
…atibility

Upgrades the cuGraph module from RAPIDS 22.02/CUDA 11.5 to RAPIDS 25.12/CUDA 13.1, bringing 3 years of performance improvements and modern GPU support.

## Motivation

The current implementation uses:
- CUDA 11.5.2 (EOL, no RTX 40xx/50xx or H100 support)
- cuGraph 22.02 (deprecated APIs)
- Ubuntu 20.04 (EOL since April 2025)
- Python 3.8 (EOL since October 2024)

## Changes

**Modern API (8 algorithms):**
- pagerank, betweenness_centrality, hits, katz_centrality
- louvain, leiden, personalized_pagerank, graph_generator

Uses `cugraph::create_graph_from_edgelist` with edge property views. Returns allocated results via structured bindings.

**Legacy API (2 algorithms):**
- balanced_cut_clustering, spectral_clustering

These use the `cugraph::ext_raft::` namespace, which only supports the legacy `GraphCSRView`. No modern API equivalent exists in cuGraph 25.x. Added the required `raft::random::RngState` parameter for 25.x compatibility.

**Key implementation notes:**
- renumber=false: the GraphView provides 0-based contiguous indices
- Edge properties use a variant type (`arithmetic_device_uvector_t`)
- Build requires `-DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE`

## Validation

All 9 algorithms validated against NetworkX ground truth:
- PageRank, Betweenness, HITS, Katz: exact/near-exact match
- Louvain, Leiden: correct community detection
- Balanced Cut, Spectral: correct clustering

## Hardware Support Added

- NVIDIA RTX 40xx (Ada Lovelace)
- NVIDIA RTX 50xx (Blackwell)
- NVIDIA H100/H200 (Hopper)
…kX validation

This commit introduces comprehensive end-to-end tests for all cuGraph GPU-accelerated graph algorithms, integrated into MAGE's existing e2e testing framework.

## What Was Added

### E2E Tests (e2e/**/test_cugraph_networkx_validation/)

Each algorithm now has a dedicated test case following MAGE's e2e conventions:

- e2e/pagerank_test/test_cugraph_networkx_validation/
- e2e/betweenness_centrality_test/test_cugraph_networkx_validation/
- e2e/hits_test/test_cugraph_networkx_validation/
- e2e/katz_test/test_cugraph_networkx_validation/
- e2e/louvain_test/test_cugraph_networkx_validation/
- e2e/leiden_cugraph_test/test_cugraph_networkx_validation/
- e2e/personalized_pagerank_test/test_cugraph_networkx_validation/
- e2e/balanced_cut_clustering_test/test_cugraph_networkx_validation/ (new)
- e2e/spectral_clustering_test/test_cugraph_networkx_validation/ (new)

Each test directory contains:
- input.cyp: a 9-node test graph with two communities (A1-A4, B1-B4) connected via a HUB node, providing a consistent topology for validating algorithm behavior
- test.yml: expected results with pytest.approx tolerances (rel=0.05, abs=1e-6)

### Standalone Validation Script (scripts/validate_cugraph_algorithms.py)

A debugging and validation tool that:
1. Builds the identical 9-node graph in NetworkX (ground truth)
2. Computes expected values using NetworkX's reference implementations
3. Spins up a Memgraph container with cuGraph modules
4. Runs each cuGraph algorithm and compares against NetworkX
5. Reports pass/fail with detailed value comparisons

This script is NOT part of the CI pipeline - it exists for developers to:
- Validate cuGraph results against known-correct NetworkX implementations
- Debug algorithm discrepancies during development
- Verify GPU acceleration produces mathematically equivalent results

## Why This Approach

1. **E2E Framework Integration**: Tests use MAGE's existing pytest-based e2e infrastructure, ensuring they run alongside other module tests in CI.
2. **NetworkX as Ground Truth**: NetworkX is the de facto standard for graph algorithms in Python. Validating cuGraph against NetworkX proves mathematical correctness, not just "it runs without crashing."
3. **Tolerance-Based Comparison**: GPU floating-point operations may produce slightly different results than CPU. Using pytest.approx with 5% relative tolerance accounts for this while still catching algorithmic errors.
4. **Consistent Test Graph**: The 9-node two-community topology was chosen because it is:
   - Small enough for fast execution
   - Complex enough to exercise algorithm behavior (communities, hub node)
   - Deterministic, producing verifiable results

## Algorithms Tested

Centrality measures:
- cugraph.pagerank
- cugraph.betweenness_centrality
- cugraph.hits
- cugraph.katz_centrality
- cugraph.personalized_pagerank

Community detection:
- cugraph.louvain
- cugraph.leiden

Clustering (legacy ext_raft API):
- cugraph.balanced_cut_clustering
- cugraph.spectral_clustering

Note: balanced_cut and spectral use the legacy cugraph::ext_raft:: API, as these algorithms have not been migrated to the new pylibcugraph API in RAPIDS 25.x.
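The tolerance-based comparison described above can be sketched in a few lines. This is a hedged illustration of pytest.approx-style semantics with the tolerances quoted in the commit (rel=0.05, abs=1e-6); the function name `within_tolerance` is illustrative, not from the PR's code.

```python
def within_tolerance(actual: float, expected: float,
                     rel: float = 0.05, abs_tol: float = 1e-6) -> bool:
    """Pass if the values differ by at most max(rel * |expected|, abs_tol),
    mirroring pytest.approx semantics when both rel and abs are given."""
    return abs(actual - expected) <= max(rel * abs(expected), abs_tol)

# A GPU PageRank score 3% away from the NetworkX value passes,
# while a 10% deviation fails; tiny values fall back to the absolute floor.
print(within_tolerance(0.103, 0.100))  # True  (3% relative error)
print(within_tolerance(0.110, 0.100))  # False (10% relative error)
print(within_tolerance(0.0, 5e-7))     # True  (absolute tolerance floor)
```

The absolute floor matters for scores near zero, where a 5% relative band would otherwise be vanishingly small.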
- Upgrade PyTorch to cu130 (CUDA 13.0 support via pytorch.org/whl/cu130)
- Upgrade DGL to torch-2.9/cu130 wheels (removes the torchdata dependency)
- Add torch_geometric with PyG extensions built from source for CUDA 13
- Add unixodbc-dev for pyodbc module support
- Upgrade numpy and gensim for binary compatibility

These changes ensure all Python ML modules load without errors on CUDA 13.1, fixing issues with nvToolsExt, torchdata.datapipes, and torch_geometric imports.
The cuGraph C++ library supports sampling via the 'vertices' parameter,
which limits betweenness computation to k random source vertices instead
of all V vertices. This reduces complexity from O(V*E) to O(k*E).
The MAGE wrapper did not expose this parameter - it always passed
std::nullopt (use all vertices). This change adds the k parameter.
Note: Other cuGraph parameters (initial_pageranks, precomputed caches,
warm-start hints) are intentionally not exposed because MAGE procedures
are stateless - there is no way to persist or pass state between calls.
The k parameter is different: it is not about state, it is about avoiding
memory explosion on large graphs by sampling source vertices.
Backward compatible: default k=0 preserves existing behavior (all vertices).
Changes:
- Add optional 'k' parameter (default=0 means use all vertices)
- When k>0 and k<V: randomly sample k vertices as sources
- Pass sampled vertices to cuGraph via device_span
Usage: CALL cugraph.betweenness_centrality.get(true, true, 1000)
(normalized=true, directed=true, k=1000)
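The source-sampling idea behind the k parameter can be sketched on the CPU with a Brandes-style traversal. This is a minimal illustration of the concept, not the cuGraph implementation: when 0 < k < V, only k randomly chosen sources contribute, cutting the work from O(V*E) to roughly O(k*E). The function name and graph representation are assumptions for the sketch.

```python
import random
from collections import deque

def betweenness_sampled(adj, k=0, seed=42):
    """Unnormalized betweenness on an unweighted directed graph given as
    {node: [neighbors]}. When 0 < k < |V|, only k randomly sampled source
    vertices contribute (the approximation the cuGraph 'k' parameter enables)."""
    nodes = list(adj)
    rng = random.Random(seed)
    sources = rng.sample(nodes, k) if 0 < k < len(nodes) else nodes
    bc = {v: 0.0 for v in nodes}
    for s in sources:
        # BFS computing shortest-path counts (sigma) and predecessor lists.
        sigma = {v: 0 for v in nodes}; sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Dependency accumulation in reverse BFS order (Brandes).
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

With k=0 every vertex is a source (exact scores, matching the default); a small k trades accuracy for memory and time on large graphs.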
…y management

Problem: Betweenness centrality and other memory-intensive algorithms on large graphs were failing with CUDA out-of-memory errors even when sufficient VRAM was available.

Root cause: RMM (RAPIDS Memory Manager) was using the default device allocator, which allocates memory on demand without pooling. This caused memory fragmentation across PageRank, Louvain, and other algorithms. When subsequent algorithms attempted to allocate large contiguous blocks, CUDA could not find one despite having enough total free memory.

Solution: Initialize CUDA's built-in async memory resource (cudaMallocAsync) as the default RMM device resource at module load time. This provides:
1. Automatic memory pooling managed by the CUDA driver
2. Defragmentation handled transparently by the driver
3. Contiguous memory blocks available for large allocations
4. No manual pool size configuration required
5. Optimal memory reuse across algorithm invocations

The static initializer in mg_cugraph_utility.hpp runs once when each cuGraph module is loaded, before any algorithm execution. All existing code that calls rmm::mr::get_current_device_resource() automatically uses the pooled allocator, with zero code changes to individual algorithms.

This is part of the RAPIDS 25.x / CUDA 13 upgrade (PR memgraph#710).
Summary
This PR upgrades MAGE's cuGraph integration from the legacy RAPIDS 22.02 / CUDA 11.5 stack to modern RAPIDS 25.12 / CUDA 13.1, bringing GPU-accelerated graph algorithms up to date with current NVIDIA tooling.
Key Changes
Infrastructure Upgrade:
API Migration:
All 9 cuGraph algorithms updated to use the modern pylibcugraph API:
- cugraph::pagerank → cugraph::pagerank() with explicit graph view
- cugraph::betweenness_centrality → normalized output handling
- cugraph::hits → proper hub/authority vector management
- cugraph::katz_centrality → updated alpha/beta parameter handling
- cugraph::louvain / cugraph::leiden → new clustering return types
- cugraph::personalized_pagerank → vertex list handling

Legacy API Preserved:
Two algorithms remain on the cugraph::ext_raft:: API, as they haven't been migrated in RAPIDS 25.x:
- balanced_cut_clustering
- spectral_clustering

E2E Tests Added
Comprehensive end-to-end tests for all 9 algorithms following MAGE's existing test framework:
Each test uses a 9-node two-community graph topology with expected values validated against NetworkX ground truth (5% tolerance for GPU floating-point variance).
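The exact edge list of the 9-node test graph is not reproduced in this summary, but its key property (two communities bridged by a single HUB) can be sketched. Everything in the snippet below is an assumed reconstruction: each community is modeled as a 4-clique with one member linked to HUB, which may differ from the actual input.cyp.

```python
from collections import deque

# Hypothetical edge list: two 4-cliques (A1-A4, B1-B4) bridged by HUB.
edges = [
    ("A1", "A2"), ("A1", "A3"), ("A1", "A4"),
    ("A2", "A3"), ("A2", "A4"), ("A3", "A4"),
    ("B1", "B2"), ("B1", "B3"), ("B1", "B4"),
    ("B2", "B3"), ("B2", "B4"), ("B3", "B4"),
    ("A1", "HUB"), ("B1", "HUB"),
]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def reachable(start, blocked=frozenset()):
    """Nodes reachable from `start` without passing through `blocked`."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - set(blocked):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

print(len(adj))                                   # 9 nodes
print("B1" in reachable("A1"))                    # True: HUB bridges the communities
print("B1" in reachable("A1", blocked={"HUB"}))   # False: HUB is the cut vertex
```

A topology like this exercises both community detection (two dense clusters) and centrality (HUB lies on every inter-community shortest path), which is why a small bridged graph makes a good deterministic fixture.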
Validation Script
Added scripts/validate_cugraph_algorithms.py, a standalone tool that builds the same 9-node graph in NetworkX, runs each cuGraph algorithm against a Memgraph container, and reports pass/fail comparisons. This is for developer debugging, not CI.
Test Plan
Dockerfile.cugraph

Breaking Changes
None. All algorithm signatures and return types preserved.