
Conversation

@gustavolira
Member

Summary

  • Add a new CI command /test rerun-failed-tests that re-executes only the tests that failed in the previous e2e-ocp-helm run
  • Optimizes CI time and resources by skipping tests that already passed
  • Conditionally deploys only namespaces that had failures (showcase and/or showcase-rbac)

New Files

  • .ibm/pipelines/retest-failed-utils.sh: Utility functions for:

    • Fetching JUnit artifacts from GCS
    • Parsing failed test names from JUnit XML
    • Building artifact URLs
    • Running specific test files with Playwright
  • .ibm/pipelines/jobs/ocp-rerun-failed-tests.sh: Main job handler that:

    • Fetches previous JUnit results via GCS/GitHub API
    • Parses which tests failed in each namespace
    • Deploys only namespaces with failures
    • Runs only the failed tests

How It Works

  1. Fetches JUnit XML from the previous e2e-ocp-helm run via GCS
  2. Parses failed test file paths from the JUnit XML
  3. Deploys only the namespace(s) that had failures
  4. Runs only the specific test files that failed using Playwright
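A simplified sketch of how the four steps above fit together in the job handler. The fetch/parse/build helpers are the ones defined in retest-failed-utils.sh; deploy_namespace, run_failed_tests, and the ORG/REPO/PR_NUMBER variables are illustrative placeholders, not the PR's actual names:

#!/bin/bash
# Simplified sketch only. deploy_namespace, run_failed_tests and the
# ORG/REPO/PR_NUMBER variables are illustrative; the remaining helpers
# come from retest-failed-utils.sh.
source "${DIR}/retest-failed-utils.sh"

build_id=$(get_previous_failed_build_id "${ORG}" "${REPO}" "${PR_NUMBER}") || {
  log::info "No previous e2e-ocp-helm run found; nothing to rerun."
  exit 0
}

for namespace in showcase showcase-rbac; do
  url=$(build_previous_run_artifact_url "${ORG}" "${REPO}" "${PR_NUMBER}" \
    "${RERUN_TARGET_JOB}" "${build_id}" "${namespace}")
  junit="/tmp/junit-${namespace}.xml"

  if fetch_previous_junit_results "${url}" "${junit}"; then
    mapfile -t failed_tests < <(parse_failed_tests_from_junit "${junit}")
    if [[ ${#failed_tests[@]} -gt 0 ]]; then
      deploy_namespace "${namespace}"                      # illustrative helper
      run_failed_tests "${namespace}" "${failed_tests[@]}" # illustrative helper
    fi
  fi
done

Because each namespace is handled independently, a failure confined to showcase-rbac results in only that namespace being deployed.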

Error Handling

Scenario                      Behavior
No previous execution         Exit success, nothing to rerun
No tests failed               Exit success, nothing to rerun
Test file no longer exists    Warning and skip
All retests pass              PR check green
Some retests still fail       PR check red
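For example, the first three rows resolve to early exits or a per-file skip, and the last two follow from the exit code of the Playwright rerun. A minimal sketch of that handling; variable and helper names are illustrative, not the PR's actual ones:

# Hedged sketch of the behaviors in the table above; names are illustrative.
if ! build_id=$(get_previous_failed_build_id "${ORG}" "${REPO}" "${PR_NUMBER}"); then
  log::info "No previous e2e-ocp-helm execution found; nothing to rerun."
  exit 0
fi

if [[ "$(get_failed_test_count "${junit_file}")" -eq 0 ]]; then
  log::info "No failed tests in the previous run; nothing to rerun."
  exit 0
fi

tests_to_run=()
for test_file in "${failed_tests[@]}"; do
  if [[ ! -f "${test_file}" ]]; then
    log::warn "Skipping ${test_file}: it no longer exists on this branch."
    continue
  fi
  tests_to_run+=("${test_file}")
done

# The PR check goes green or red with the exit code of the Playwright rerun.
npx playwright test "${tests_to_run[@]}"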

Test Plan

  • Create a PR with a test that fails intentionally
  • Run /test e2e-ocp-helm (will fail)
  • Run /test rerun-failed-tests
  • Verify only failed tests are executed
  • Verify only necessary namespaces are deployed

Note: This PR requires a corresponding PR in the release repository to add the CI configuration.

🤖 Generated with Claude Code

@openshift-ci

openshift-ci bot commented Jan 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zdrapela for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gustavolira added a commit to gustavolira/release that referenced this pull request Jan 21, 2026
Add CI configuration for the new `/test rerun-failed-tests` command
that re-executes only tests that failed in the previous e2e-ocp-helm run.

New step registry:
- ci-operator/step-registry/redhat-developer/rhdh/ocp/rerun-failed-tests/

New test configuration in redhat-developer-rhdh-main.yaml:
- as: rerun-failed-tests
- optional: true (manually triggered via /test rerun-failed-tests)
- Uses cluster_claim with OCP 4.18 on AWS

This PR depends on: redhat-developer/rhdh#4037

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Contributor

The image is available at:

/test e2e-ocp-helm

@gustavolira
Member Author

/review

@rhdh-qodo-merge

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🔒 No security concerns identified
⚡ Recommended focus areas for review

Reliability

The GitHub API requests are unauthenticated and do not handle non-200 responses, pagination, or rate limiting. This can lead to flaky behavior where the rerun job exits “success” without actually rerunning because head_sha or the matching check run cannot be retrieved. Consider supporting GITHUB_TOKEN/Authorization when available, checking HTTP status codes, and handling multiple pages or selecting the most recent matching check run deterministically.

get_previous_failed_build_id() {
  local org="${1}"
  local repo="${2}"
  local pr_number="${3}"
  local target_job="${4:-${RERUN_TARGET_JOB}}"

  log::info "Fetching check runs for PR #${pr_number} in ${org}/${repo}..."

  # Get the PR's head SHA first
  local head_sha
  head_sha=$(curl -sS "${GITHUB_API_URL}/repos/${org}/${repo}/pulls/${pr_number}" \
    -H "Accept: application/vnd.github.v3+json" \
    | jq -r '.head.sha // empty')

  if [[ -z "${head_sha}" ]]; then
    log::error "Could not get PR head SHA"
    return 1
  fi

  log::info "PR head SHA: ${head_sha}"

  # Get check runs for this commit and find the target job
  # We look for the most recent completed run of the target job
  local check_runs_response
  check_runs_response=$(curl -sS "${GITHUB_API_URL}/repos/${org}/${repo}/commits/${head_sha}/check-runs" \
    -H "Accept: application/vnd.github.v3+json")

  # Find the check run that matches our target job and extract details URL
  # The details_url contains the Prow build ID
  local details_url
  details_url=$(echo "${check_runs_response}" | jq -r \
    --arg job "${target_job}" \
    '.check_runs[] | select(.name == $job and .conclusion != null) | .details_url // empty' \
    | head -n 1)

  if [[ -z "${details_url}" ]]; then
    log::warn "No completed check run found for job: ${target_job}"
    return 1
  fi

  log::info "Found details URL: ${details_url}"

  # Extract build ID from the details URL
  # URL format: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/org_repo/pr/job/build_id
  local build_id
  build_id=$(echo "${details_url}" | grep -oE '[0-9]{10,}' | tail -n 1)

  if [[ -z "${build_id}" ]]; then
    log::error "Could not extract build ID from details URL"
    return 1
  fi

  log::success "Found previous build ID: ${build_id}"
  echo "${build_id}"
}
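One way to address this, sketched under the assumption that an optional GITHUB_TOKEN may be exported in the CI environment; this is a suggestion, not code from the PR:

# Hedged sketch: optional authentication plus explicit HTTP status handling.
github_api_get() {
  local url="${1}"
  local auth_args=()
  # GITHUB_TOKEN is an assumed, optional environment variable.
  [[ -n "${GITHUB_TOKEN:-}" ]] && auth_args=(-H "Authorization: Bearer ${GITHUB_TOKEN}")

  local response http_status
  response=$(curl -sS -w '\n%{http_code}' \
    -H "Accept: application/vnd.github.v3+json" \
    "${auth_args[@]}" "${url}")
  http_status="${response##*$'\n'}"

  if [[ "${http_status}" != "200" ]]; then
    log::error "GitHub API request failed (HTTP ${http_status}): ${url}"
    return 1
  fi
  # Strip the trailing status line and print only the JSON body.
  printf '%s\n' "${response%$'\n'*}"
}

# Example usage inside get_previous_failed_build_id:
# head_sha=$(github_api_get "${GITHUB_API_URL}/repos/${org}/${repo}/pulls/${pr_number}" \
#   | jq -r '.head.sha // empty')

Pagination is still not handled here; for the check-runs endpoint a per_page=100 query parameter or Link-header traversal would be needed if a commit accumulates more than one page of check runs.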
Parsing Robustness

Failure/test extraction relies on regex/heuristics that may miscount or misidentify failures (e.g., grep -oP 'failures="...' can pick up other suites, and the build id extraction pulls any long digit sequence from details_url). Also, xmllint --xpath '//testcase[failure]/@file' can output multiple attributes on one line and the subsequent sed assumes a simple format. Tightening these parsers (or extracting structured fields with jq/XML tooling more defensively) will reduce false positives/negatives and prevent rerunning the wrong tests.

  # Extract build ID from the details URL
  # URL format: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/org_repo/pr/job/build_id
  local build_id
  build_id=$(echo "${details_url}" | grep -oE '[0-9]{10,}' | tail -n 1)

  if [[ -z "${build_id}" ]]; then
    log::error "Could not extract build ID from details URL"
    return 1
  fi

  log::success "Found previous build ID: ${build_id}"
  echo "${build_id}"
}

#######################################
# Fetch JUnit results XML from GCS for a specific namespace
# Arguments:
#   artifact_url: Full URL to the junit-results.xml file
#   output_file: Local path to save the XML file
# Returns:
#   0 if successful, 1 if failed
#######################################
fetch_previous_junit_results() {
  local artifact_url="${1}"
  local output_file="${2}"

  log::info "Fetching JUnit results from: ${artifact_url}"

  # Attempt to download the JUnit XML file
  local http_status
  http_status=$(curl -sS -w "%{http_code}" -o "${output_file}" "${artifact_url}")

  if [[ "${http_status}" == "200" ]]; then
    log::success "Successfully downloaded JUnit results"
    return 0
  else
    log::warn "Failed to fetch JUnit results (HTTP ${http_status})"
    rm -f "${output_file}"
    return 1
  fi
}

#######################################
# Parse failed test names from JUnit XML file
# Arguments:
#   junit_file: Path to the JUnit XML file
# Outputs:
#   Writes failed test file paths to stdout, one per line
# Returns:
#   0 if parsing successful (even if no failures), 1 if file not found or parsing error
#######################################
parse_failed_tests_from_junit() {
  local junit_file="${1}"

  if [[ ! -f "${junit_file}" ]]; then
    log::error "JUnit file not found: ${junit_file}"
    return 1
  fi

  log::info "Parsing failed tests from: ${junit_file}"

  # Parse the JUnit XML to extract file paths from failed test cases
  # Structure:
  # <testsuites>
  #   <testsuite name="chromium" tests="50" failures="2">
  #     <testcase name="test name" file="playwright/e2e/some-test.spec.ts">
  #       <failure>Error message</failure>
  #     </testcase>
  #   </testsuite>
  # </testsuites>

  # Use xmllint to extract file attributes from testcases that have a failure child
  # Fall back to grep/sed if xmllint not available
  if command -v xmllint &>/dev/null; then
    # Extract file paths from testcases with failures
    xmllint --xpath '//testcase[failure]/@file' "${junit_file}" 2>/dev/null \
      | sed 's/file="\([^"]*\)"/\1\n/g' \
      | grep -v '^$' \
      | sort -u
  else
    # Fallback: use grep to find testcase elements with failures
    # This is less robust but works without xmllint
    grep -oP '<testcase[^>]*file="\K[^"]+(?="[^>]*>[\s\S]*?<failure)' "${junit_file}" 2>/dev/null \
      | sort -u
  fi

  return 0
}

#######################################
# Get the count of failed tests from JUnit XML
# Arguments:
#   junit_file: Path to the JUnit XML file
# Outputs:
#   Writes the number of failures to stdout
#######################################
get_failed_test_count() {
  local junit_file="${1}"

  if [[ ! -f "${junit_file}" ]]; then
    echo "0"
    return
  fi

  # Extract the failures count from the testsuite element
  local failures
  failures=$(grep -oP 'failures="\K[0-9]+' "${junit_file}" | head -n 1)

  echo "${failures:-0}"
}
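A tighter version of both extractions could anchor the build ID to the final URL path segment and sum failures across every testsuite element instead of reading only the first match. A sketch, assuming the URL and XML shapes documented in the comments above; function names are illustrative:

# Hedged sketch: take the build ID from the last path segment of the Prow URL
# rather than any long digit run anywhere in the string.
extract_build_id_from_details_url() {
  local details_url="${1}"
  local build_id="${details_url##*/}"
  if [[ "${build_id}" =~ ^[0-9]+$ ]]; then
    echo "${build_id}"
  else
    return 1
  fi
}

# Hedged sketch: count failures across all testsuite elements, not just the first.
get_failed_test_count() {
  local junit_file="${1}"
  [[ -f "${junit_file}" ]] || { echo "0"; return; }

  if command -v xmllint &>/dev/null; then
    # XPath sum() totals the failures attributes of every testsuite.
    xmllint --xpath 'string(sum(//testsuite/@failures))' "${junit_file}" 2>/dev/null \
      | cut -d'.' -f1
  else
    grep -oP 'failures="\K[0-9]+' "${junit_file}" 2>/dev/null \
      | awk '{ total += $1 } END { print total + 0 }'
  fi
}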
📚 Focus areas based on broader codebase context

Error Handling

The new bash utilities rely on external tools (curl, jq, optionally xmllint) but do not enable strict bash settings or validate required dependencies before use. This can lead to silent failures and hard-to-debug behavior (e.g., missing jq causing empty head_sha without a clear dependency error). Add strict mode and explicit dependency checks (at least for required commands) early in the script. (Ref 1, Ref 2)

#!/bin/bash
#
# Utility functions for re-running failed tests from previous CI executions.
#
# This script provides functions to:
# - Fetch JUnit XML results from previous GCS artifacts
# - Parse failed test names from JUnit XML
# - Build URLs to access previous run artifacts
# - Execute only the tests that failed in the previous run
#

# shellcheck source=.ibm/pipelines/lib/log.sh
source "${DIR}/lib/log.sh"

# GCS base URL for OpenShift CI test artifacts
readonly GCS_BASE_URL="https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results"

# GitHub API base URL
readonly GITHUB_API_URL="https://api.github.com"

# The job name we want to find previous failed runs for
readonly RERUN_TARGET_JOB="pull-ci-redhat-developer-rhdh-main-e2e-ocp-helm"

#######################################
# Build the GCS artifact URL for a specific job run
# Arguments:
#   org: GitHub organization (e.g., "redhat-developer")
#   repo: Repository name (e.g., "rhdh")
#   pr_number: Pull request number
#   job_name: CI job name (e.g., "pull-ci-redhat-developer-rhdh-main-e2e-ocp-helm")
#   build_id: Prow build ID
#   namespace: Test namespace (e.g., "showcase" or "showcase-rbac")
# Outputs:
#   Writes the constructed URL to stdout
#######################################
build_previous_run_artifact_url() {
  local org="${1}"
  local repo="${2}"
  local pr_number="${3}"
  local job_name="${4}"
  local build_id="${5}"
  local namespace="${6}"

  # URL structure:
  # https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/
  #   pr-logs/pull/{org}_{repo}/{pr_number}/{job_name}/{build_id}/
  #   artifacts/e2e-ocp-helm/redhat-developer-rhdh-ocp-helm/artifacts/
  #   {namespace}/junit-results.xml
  local url="${GCS_BASE_URL}/pr-logs/pull/${org}_${repo}/${pr_number}/${job_name}/${build_id}"
  url="${url}/artifacts/e2e-ocp-helm/redhat-developer-rhdh-ocp-helm/artifacts/${namespace}/junit-results.xml"

  echo "${url}"
}

#######################################
# Get the previous build ID for a specific job from GitHub API
# Arguments:
#   org: GitHub organization
#   repo: Repository name
#   pr_number: Pull request number
#   target_job: Job name to find (default: RERUN_TARGET_JOB)
# Outputs:
#   Writes the build ID to stdout, or empty string if not found
# Returns:
#   0 if build ID found, 1 otherwise
#######################################
get_previous_failed_build_id() {
  local org="${1}"
  local repo="${2}"
  local pr_number="${3}"
  local target_job="${4:-${RERUN_TARGET_JOB}}"

  log::info "Fetching check runs for PR #${pr_number} in ${org}/${repo}..."

  # Get the PR's head SHA first
  local head_sha
  head_sha=$(curl -sS "${GITHUB_API_URL}/repos/${org}/${repo}/pulls/${pr_number}" \
    -H "Accept: application/vnd.github.v3+json" \
    | jq -r '.head.sha // empty')

  if [[ -z "${head_sha}" ]]; then
    log::error "Could not get PR head SHA"
    return 1
  fi

  log::info "PR head SHA: ${head_sha}"

  # Get check runs for this commit and find the target job
  # We look for the most recent completed run of the target job
  local check_runs_response
  check_runs_response=$(curl -sS "${GITHUB_API_URL}/repos/${org}/${repo}/commits/${head_sha}/check-runs" \
    -H "Accept: application/vnd.github.v3+json")

  # Find the check run that matches our target job and extract details URL
  # The details_url contains the Prow build ID
  local details_url
  details_url=$(echo "${check_runs_response}" | jq -r \
    --arg job "${target_job}" \
    '.check_runs[] | select(.name == $job and .conclusion != null) | .details_url // empty' \
    | head -n 1)

  if [[ -z "${details_url}" ]]; then
    log::warn "No completed check run found for job: ${target_job}"
    return 1
  fi

  log::info "Found details URL: ${details_url}"

  # Extract build ID from the details URL
  # URL format: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/org_repo/pr/job/build_id
  local build_id
  build_id=$(echo "${details_url}" | grep -oE '[0-9]{10,}' | tail -n 1)

  if [[ -z "${build_id}" ]]; then
    log::error "Could not extract build ID from details URL"
    return 1
  fi

  log::success "Found previous build ID: ${build_id}"
  echo "${build_id}"
}

Reference reasoning: The referenced bash script establishes a consistent pattern of set -euo pipefail plus up-front command -v ... validation for required CLIs, which prevents downstream parsing logic from failing in non-obvious ways. Adopting the same pattern here would make failures deterministic and provide actionable error messages when tools are missing.
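A minimal version of that pattern for retest-failed-utils.sh might look like the following; treating xmllint as optional (because the parser has a grep fallback) is an assumption, not something the referenced scripts dictate:

#!/bin/bash
# Hedged sketch: fail fast and validate dependencies before any parsing runs.
set -euo pipefail

require_commands() {
  local missing=()
  local cmd
  for cmd in "$@"; do
    command -v "${cmd}" &>/dev/null || missing+=("${cmd}")
  done
  if [[ ${#missing[@]} -gt 0 ]]; then
    echo "ERROR: missing required commands: ${missing[*]}" >&2
    exit 1
  fi
}

# curl and jq are hard requirements; xmllint only improves parsing accuracy.
require_commands curl jq
command -v xmllint &>/dev/null \
  || echo "WARN: xmllint not found, falling back to grep-based JUnit parsing" >&2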

📄 References
  1. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [1-55]
  2. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [56-69]
  3. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [71-103]
  4. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [104-145]
  5. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [146-157]
  6. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [159-190]
  7. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [192-199]
  8. redhat-developer/rhdh-operator/config/profile/rhdh/plugin-infra/plugin-infra.sh [1-49]

@github-actions
Contributor

🚫 Image Push Skipped.

The container image push was skipped because the build was skipped (either due to [skip-build] tag or no relevant changes with existing image)

@gustavolira force-pushed the feat/rerun-failed-tests branch from 6a610e8 to e74c9f4 on January 21, 2026 at 17:39
@github-actions
Contributor

🚫 Image Push Skipped.

The container image push was skipped because the build was skipped (either due to [skip-build] tag or no relevant changes with existing image)

Add a new CI command that re-executes only the tests that failed in the
previous e2e-ocp-helm run, optimizing time and resources.

New files:
- retest-failed-utils.sh: Utility functions for fetching JUnit artifacts
  from GCS, parsing failed tests, and running specific test files
- jobs/ocp-rerun-failed-tests.sh: Main job handler that orchestrates
  fetching previous results, deploying only needed namespaces, and
  running failed tests

The command:
- Fetches JUnit results from the previous e2e-ocp-helm run via GCS
- Parses which tests failed for showcase and showcase-rbac namespaces
- Deploys only the namespaces that had failures
- Runs only the tests that previously failed using Playwright
- Returns success if no previous run exists or no tests failed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Contributor

🚫 Image Push Skipped.

The container image push was skipped because the build was skipped (either due to [skip-build] tag or no relevant changes with existing image)

@openshift-ci

openshift-ci bot commented Jan 21, 2026

@gustavolira: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name              Commit    Details  Required  Rerun command
ci/prow/e2e-ocp-helm   cda5403   link     true      /test e2e-ocp-helm

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
