feat(cugraph): upgrade to RAPIDS 25.12 / CUDA 13.1 with comprehensive e2e tests #710
Open
TheReaperJay wants to merge 5 commits into memgraph:main from TheReaperJay:feature/cugraph-rapids-25x-cuda13
+2,052 −554
Conversation
…atibility

Upgrades the cuGraph module from RAPIDS 22.02/CUDA 11.5 to RAPIDS 25.12/CUDA 13.1, bringing 3 years of performance improvements and modern GPU support.

## Motivation

The current implementation uses:
- CUDA 11.5.2 (EOL, no RTX 40xx/50xx or H100 support)
- cuGraph 22.02 (deprecated APIs)
- Ubuntu 20.04 (EOL since April 2025)
- Python 3.8 (EOL since October 2024)

## Changes

**Modern API (8 algorithms):**
- pagerank, betweenness_centrality, hits, katz_centrality
- louvain, leiden, personalized_pagerank, graph_generator

Uses `cugraph::create_graph_from_edgelist` with edge property views. Returns allocated results via structured bindings.

**Legacy API (2 algorithms):**
- balanced_cut_clustering, spectral_clustering

These use the `cugraph::ext_raft::` namespace, which only supports the legacy `GraphCSRView`. No modern API equivalent exists in cuGraph 25.x. Added the required `raft::random::RngState` parameter for 25.x compatibility.

**Key implementation notes:**
- renumber=false: the GraphView provides 0-based contiguous indices
- Edge properties use a variant type (`arithmetic_device_uvector_t`)
- Build requires `-DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE`

## Validation

All 9 algorithms validated against NetworkX ground truth:
- PageRank, Betweenness, HITS, Katz: exact/near-exact match
- Louvain, Leiden: correct community detection
- Balanced Cut, Spectral: correct clustering

## Hardware Support Added

- NVIDIA RTX 40xx (Ada Lovelace)
- NVIDIA RTX 50xx (Blackwell)
- NVIDIA H100/H200 (Hopper)
…kX validation

This commit introduces comprehensive end-to-end tests for all cuGraph GPU-accelerated graph algorithms, integrated into MAGE's existing e2e testing framework.

## What Was Added

### E2E Tests (e2e/**/test_cugraph_networkx_validation/)

Each algorithm now has a dedicated test case following MAGE's e2e conventions:

- e2e/pagerank_test/test_cugraph_networkx_validation/
- e2e/betweenness_centrality_test/test_cugraph_networkx_validation/
- e2e/hits_test/test_cugraph_networkx_validation/
- e2e/katz_test/test_cugraph_networkx_validation/
- e2e/louvain_test/test_cugraph_networkx_validation/
- e2e/leiden_cugraph_test/test_cugraph_networkx_validation/
- e2e/personalized_pagerank_test/test_cugraph_networkx_validation/
- e2e/balanced_cut_clustering_test/test_cugraph_networkx_validation/ (new)
- e2e/spectral_clustering_test/test_cugraph_networkx_validation/ (new)

Each test directory contains:
- input.cyp: a 9-node test graph with two communities (A1-A4, B1-B4) connected via a HUB node, providing a consistent topology for validating algorithm behavior
- test.yml: expected results with pytest.approx tolerances (rel=0.05, abs=1e-6)

### Standalone Validation Script (scripts/validate_cugraph_algorithms.py)

A debugging and validation tool that:
1. Builds the identical 9-node graph in NetworkX (ground truth)
2. Computes expected values using NetworkX's reference implementations
3. Spins up a Memgraph container with cuGraph modules
4. Runs each cuGraph algorithm and compares against NetworkX
5. Reports pass/fail with detailed value comparisons

This script is NOT part of the CI pipeline - it exists for developers to:
- Validate cuGraph results against known-correct NetworkX implementations
- Debug algorithm discrepancies during development
- Verify GPU acceleration produces mathematically equivalent results

## Why This Approach

1. **E2E Framework Integration**: Tests use MAGE's existing pytest-based e2e infrastructure, ensuring they run alongside other module tests in CI.
2. **NetworkX as Ground Truth**: NetworkX is the de facto standard for graph algorithms in Python. Validating cuGraph against NetworkX proves mathematical correctness, not just "it runs without crashing."
3. **Tolerance-Based Comparison**: GPU floating-point operations may produce slightly different results than CPU. Using pytest.approx with 5% relative tolerance accounts for this while still catching algorithmic errors.
4. **Consistent Test Graph**: The 9-node two-community topology was chosen because it is:
   - Small enough for fast execution
   - Complex enough to exercise algorithm behavior (communities, hub node)
   - Deterministic, producing verifiable results

## Algorithms Tested

Centrality measures:
- cugraph.pagerank
- cugraph.betweenness_centrality
- cugraph.hits
- cugraph.katz_centrality
- cugraph.personalized_pagerank

Community detection:
- cugraph.louvain
- cugraph.leiden

Clustering (legacy ext_raft API):
- cugraph.balanced_cut_clustering
- cugraph.spectral_clustering

Note: balanced_cut and spectral use the legacy cugraph::ext_raft:: API, as these algorithms have not been migrated to the new pylibcugraph API in RAPIDS 25.x.
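The tolerance-based comparison described above can be sketched in a few lines. This is a hedged illustration of pytest.approx-style semantics with the tolerances quoted in the commit (rel=0.05, abs=1e-6); the function name `within_tolerance` is illustrative, not from the PR's code.

```python
def within_tolerance(actual: float, expected: float,
                     rel: float = 0.05, abs_tol: float = 1e-6) -> bool:
    """Pass if the values differ by at most max(rel * |expected|, abs_tol),
    mirroring pytest.approx semantics when both rel and abs are given."""
    return abs(actual - expected) <= max(rel * abs(expected), abs_tol)

# A GPU PageRank score 3% away from the NetworkX value passes,
# while a 10% deviation fails; tiny values fall back to the absolute floor.
print(within_tolerance(0.103, 0.100))  # True  (3% relative error)
print(within_tolerance(0.110, 0.100))  # False (10% relative error)
print(within_tolerance(0.0, 5e-7))     # True  (absolute tolerance floor)
```

The absolute floor matters for scores near zero, where a 5% relative band would otherwise be vanishingly small.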
- Upgrade PyTorch to cu130 (CUDA 13.0 support via pytorch.org/whl/cu130)
- Upgrade DGL to torch-2.9/cu130 wheels (removes the torchdata dependency)
- Add torch_geometric with PyG extensions built from source for CUDA 13
- Add unixodbc-dev for pyodbc module support
- Upgrade numpy and gensim for binary compatibility

These changes ensure all Python ML modules load without errors on CUDA 13.1, fixing issues with nvToolsExt, torchdata.datapipes, and torch_geometric imports.
The cuGraph C++ library supports sampling via the 'vertices' parameter,
which limits betweenness computation to k random source vertices instead
of all V vertices. This reduces complexity from O(V*E) to O(k*E).
The MAGE wrapper did not expose this parameter - it always passed
std::nullopt (use all vertices). This change adds the k parameter.
Note: Other cuGraph parameters (initial_pageranks, precomputed caches,
warm-start hints) are intentionally not exposed because MAGE procedures
are stateless - there is no way to persist or pass state between calls.
The k parameter is different: it is not about state, it is about avoiding
memory explosion on large graphs by sampling source vertices.
Backward compatible: default k=0 preserves existing behavior (all vertices).
Changes:
- Add optional 'k' parameter (default=0 means use all vertices)
- When k>0 and k<V: randomly sample k vertices as sources
- Pass sampled vertices to cuGraph via device_span
Usage: CALL cugraph.betweenness_centrality.get(true, true, 1000)
(normalized=true, directed=true, k=1000)
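The source-sampling idea behind the k parameter can be sketched on the CPU with a Brandes-style traversal. This is a minimal illustration of the concept, not the cuGraph implementation: when 0 < k < V, only k randomly chosen sources contribute, cutting the work from O(V*E) to roughly O(k*E). The function name and graph representation are assumptions for the sketch.

```python
import random
from collections import deque

def betweenness_sampled(adj, k=0, seed=42):
    """Unnormalized betweenness on an unweighted directed graph given as
    {node: [neighbors]}. When 0 < k < |V|, only k randomly sampled source
    vertices contribute (the approximation the cuGraph 'k' parameter enables)."""
    nodes = list(adj)
    rng = random.Random(seed)
    sources = rng.sample(nodes, k) if 0 < k < len(nodes) else nodes
    bc = {v: 0.0 for v in nodes}
    for s in sources:
        # BFS computing shortest-path counts (sigma) and predecessor lists.
        sigma = {v: 0 for v in nodes}; sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Dependency accumulation in reverse BFS order (Brandes).
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

With k=0 every vertex is a source (exact scores, matching the default); a small k trades accuracy for memory and time on large graphs.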
…y management

Problem: Betweenness centrality and other memory-intensive algorithms on large graphs were failing with CUDA out-of-memory errors even when sufficient VRAM was available.

Root cause: RMM (RAPIDS Memory Manager) was using the default device allocator, which allocates memory on demand without pooling. This caused memory fragmentation across PageRank, Louvain, and other algorithms. When subsequent algorithms attempted to allocate large contiguous blocks, CUDA could not find one despite having enough total free memory.

Solution: Initialize CUDA's built-in async memory resource (cudaMallocAsync) as the default RMM device resource at module load time. This provides:
1. Automatic memory pooling managed by the CUDA driver
2. Defragmentation handled transparently by the driver
3. Contiguous memory blocks available for large allocations
4. No manual pool size configuration required
5. Optimal memory reuse across algorithm invocations

The static initializer in mg_cugraph_utility.hpp runs once when each cuGraph module is loaded, before any algorithm execution. All existing code that calls rmm::mr::get_current_device_resource() automatically uses the pooled allocator, with zero code changes to individual algorithms.

This is part of the RAPIDS 25.x / CUDA 13 upgrade (PR memgraph#710).
Summary
This PR upgrades MAGE's cuGraph integration from the legacy RAPIDS 22.02 / CUDA 11.5 stack to modern RAPIDS 25.12 / CUDA 13.1, bringing GPU-accelerated graph algorithms up to date with current NVIDIA tooling.
Key Changes
Infrastructure Upgrade:
API Migration:
All 9 cuGraph algorithms updated to use the modern pylibcugraph API:
- cugraph::pagerank → cugraph::pagerank() with explicit graph view
- cugraph::betweenness_centrality → normalized output handling
- cugraph::hits → proper hub/authority vector management
- cugraph::katz_centrality → updated alpha/beta parameter handling
- cugraph::louvain / cugraph::leiden → new clustering return types
- cugraph::personalized_pagerank → vertex list handling

Legacy API Preserved:
Two algorithms remain on the cugraph::ext_raft:: API, as they haven't been migrated in RAPIDS 25.x:
- balanced_cut_clustering
- spectral_clustering

E2E Tests Added
Comprehensive end-to-end tests for all 9 algorithms following MAGE's existing test framework:
Each test uses a 9-node two-community graph topology with expected values validated against NetworkX ground truth (5% tolerance for GPU floating-point variance).
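The exact edge list of the 9-node test graph is not reproduced in this summary, but its key property (two communities bridged by a single HUB) can be sketched. Everything in the snippet below is an assumed reconstruction: each community is modeled as a 4-clique with one member linked to HUB, which may differ from the actual input.cyp.

```python
from collections import deque

# Hypothetical edge list: two 4-cliques (A1-A4, B1-B4) bridged by HUB.
edges = [
    ("A1", "A2"), ("A1", "A3"), ("A1", "A4"),
    ("A2", "A3"), ("A2", "A4"), ("A3", "A4"),
    ("B1", "B2"), ("B1", "B3"), ("B1", "B4"),
    ("B2", "B3"), ("B2", "B4"), ("B3", "B4"),
    ("A1", "HUB"), ("B1", "HUB"),
]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def reachable(start, blocked=frozenset()):
    """Nodes reachable from `start` without passing through `blocked`."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - set(blocked):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

print(len(adj))                                   # 9 nodes
print("B1" in reachable("A1"))                    # True: HUB bridges the communities
print("B1" in reachable("A1", blocked={"HUB"}))   # False: HUB is the cut vertex
```

A topology like this exercises both community detection (two dense clusters) and centrality (HUB lies on every inter-community shortest path), which is why a small bridged graph makes a good deterministic fixture.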
Validation Script
Added scripts/validate_cugraph_algorithms.py, a standalone tool that builds the same 9-node graph in NetworkX, runs each cuGraph algorithm against a Memgraph container, and reports pass/fail comparisons. This is for developer debugging, not CI.
Test Plan
Dockerfile.cugraph

Breaking Changes
None. All algorithm signatures and return types preserved.