
Conversation

@gustavolira
Member

Summary

  • Add a new CI command /test rerun-failed-tests that re-executes only the tests that failed in the previous e2e-ocp-helm run
  • Optimizes CI time and resources by skipping tests that already passed
  • Conditionally deploys only namespaces that had failures (showcase and/or showcase-rbac)

New Files

  • .ibm/pipelines/retest-failed-utils.sh: Utility functions for:

    • Fetching JUnit artifacts from GCS
    • Parsing failed test names from JUnit XML
    • Building artifact URLs
    • Running specific test files with Playwright
  • .ibm/pipelines/jobs/ocp-rerun-failed-tests.sh: Main job handler that:

    • Fetches previous JUnit results via GCS/GitHub API
    • Parses which tests failed in each namespace
    • Deploys only namespaces with failures
    • Runs only the failed tests

How It Works

  1. Fetches JUnit XML from the previous e2e-ocp-helm run via GCS
  2. Parses failed test file paths from the JUnit XML
  3. Deploys only the namespace(s) that had failures
  4. Runs only the specific test files that failed using Playwright
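A simplified sketch of how the four steps above fit together in the job handler. The fetch/parse/build helpers are the ones defined in retest-failed-utils.sh; deploy_namespace, run_failed_tests, and the ORG/REPO/PR_NUMBER variables are illustrative placeholders, not the PR's actual names:

#!/bin/bash
# Simplified sketch only. deploy_namespace, run_failed_tests and the
# ORG/REPO/PR_NUMBER variables are illustrative; the remaining helpers
# come from retest-failed-utils.sh.
source "${DIR}/retest-failed-utils.sh"

build_id=$(get_previous_failed_build_id "${ORG}" "${REPO}" "${PR_NUMBER}") || {
  log::info "No previous e2e-ocp-helm run found; nothing to rerun."
  exit 0
}

for namespace in showcase showcase-rbac; do
  url=$(build_previous_run_artifact_url "${ORG}" "${REPO}" "${PR_NUMBER}" \
    "${RERUN_TARGET_JOB}" "${build_id}" "${namespace}")
  junit="/tmp/junit-${namespace}.xml"

  if fetch_previous_junit_results "${url}" "${junit}"; then
    mapfile -t failed_tests < <(parse_failed_tests_from_junit "${junit}")
    if [[ ${#failed_tests[@]} -gt 0 ]]; then
      deploy_namespace "${namespace}"                      # illustrative helper
      run_failed_tests "${namespace}" "${failed_tests[@]}" # illustrative helper
    fi
  fi
done

Because each namespace is handled independently, a failure confined to showcase-rbac results in only that namespace being deployed.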

Error Handling

Scenario                      Behavior
No previous execution         Exit success, nothing to rerun
No tests failed               Exit success, nothing to rerun
Test file no longer exists    Warning and skip
All retests pass              PR check green
Some retests still fail       PR check red
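For example, the first three rows resolve to early exits or a per-file skip, and the last two follow from the exit code of the Playwright rerun. A minimal sketch of that handling; variable and helper names are illustrative, not the PR's actual ones:

# Hedged sketch of the behaviors in the table above; names are illustrative.
if ! build_id=$(get_previous_failed_build_id "${ORG}" "${REPO}" "${PR_NUMBER}"); then
  log::info "No previous e2e-ocp-helm execution found; nothing to rerun."
  exit 0
fi

if [[ "$(get_failed_test_count "${junit_file}")" -eq 0 ]]; then
  log::info "No failed tests in the previous run; nothing to rerun."
  exit 0
fi

tests_to_run=()
for test_file in "${failed_tests[@]}"; do
  if [[ ! -f "${test_file}" ]]; then
    log::warn "Skipping ${test_file}: it no longer exists on this branch."
    continue
  fi
  tests_to_run+=("${test_file}")
done

# The PR check goes green or red with the exit code of the Playwright rerun.
npx playwright test "${tests_to_run[@]}"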

Test Plan

  • Create a PR with a test that fails intentionally
  • Run /test e2e-ocp-helm (will fail)
  • Run /test rerun-failed-tests
  • Verify only failed tests are executed
  • Verify only necessary namespaces are deployed

Note: This PR requires a corresponding PR in the release repository to add the CI configuration.

🤖 Generated with Claude Code

@openshift-ci

openshift-ci bot commented Jan 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zdrapela for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gustavolira added a commit to gustavolira/release that referenced this pull request Jan 21, 2026
Add CI configuration for the new `/test rerun-failed-tests` command
that re-executes only tests that failed in the previous e2e-ocp-helm run.

New step registry:
- ci-operator/step-registry/redhat-developer/rhdh/ocp/rerun-failed-tests/

New test configuration in redhat-developer-rhdh-main.yaml:
- as: rerun-failed-tests
- optional: true (manually triggered via /test rerun-failed-tests)
- Uses cluster_claim with OCP 4.18 on AWS

This PR depends on: redhat-developer/rhdh#4037

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Contributor

The image is available at:

/test e2e-ocp-helm

@gustavolira
Member Author

/review

@rhdh-qodo-merge

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🔒 No security concerns identified
⚡ Recommended focus areas for review

Reliability

The GitHub API requests are unauthenticated and do not handle non-200 responses, pagination, or rate limiting. This can lead to flaky behavior where the rerun job exits “success” without actually rerunning because head_sha or the matching check run cannot be retrieved. Consider supporting GITHUB_TOKEN/Authorization when available, checking HTTP status codes, and handling multiple pages or selecting the most recent matching check run deterministically.

get_previous_failed_build_id() {
  local org="${1}"
  local repo="${2}"
  local pr_number="${3}"
  local target_job="${4:-${RERUN_TARGET_JOB}}"

  log::info "Fetching check runs for PR #${pr_number} in ${org}/${repo}..."

  # Get the PR's head SHA first
  local head_sha
  head_sha=$(curl -sS "${GITHUB_API_URL}/repos/${org}/${repo}/pulls/${pr_number}" \
    -H "Accept: application/vnd.github.v3+json" \
    | jq -r '.head.sha // empty')

  if [[ -z "${head_sha}" ]]; then
    log::error "Could not get PR head SHA"
    return 1
  fi

  log::info "PR head SHA: ${head_sha}"

  # Get check runs for this commit and find the target job
  # We look for the most recent completed run of the target job
  local check_runs_response
  check_runs_response=$(curl -sS "${GITHUB_API_URL}/repos/${org}/${repo}/commits/${head_sha}/check-runs" \
    -H "Accept: application/vnd.github.v3+json")

  # Find the check run that matches our target job and extract details URL
  # The details_url contains the Prow build ID
  local details_url
  details_url=$(echo "${check_runs_response}" | jq -r \
    --arg job "${target_job}" \
    '.check_runs[] | select(.name == $job and .conclusion != null) | .details_url // empty' \
    | head -n 1)

  if [[ -z "${details_url}" ]]; then
    log::warn "No completed check run found for job: ${target_job}"
    return 1
  fi

  log::info "Found details URL: ${details_url}"

  # Extract build ID from the details URL
  # URL format: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/org_repo/pr/job/build_id
  local build_id
  build_id=$(echo "${details_url}" | grep -oE '[0-9]{10,}' | tail -n 1)

  if [[ -z "${build_id}" ]]; then
    log::error "Could not extract build ID from details URL"
    return 1
  fi

  log::success "Found previous build ID: ${build_id}"
  echo "${build_id}"
}
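One way to address this, sketched under the assumption that an optional GITHUB_TOKEN may be exported in the CI environment; this is a suggestion, not code from the PR:

# Hedged sketch: optional authentication plus explicit HTTP status handling.
github_api_get() {
  local url="${1}"
  local auth_args=()
  # GITHUB_TOKEN is an assumed, optional environment variable.
  [[ -n "${GITHUB_TOKEN:-}" ]] && auth_args=(-H "Authorization: Bearer ${GITHUB_TOKEN}")

  local response http_status
  response=$(curl -sS -w '\n%{http_code}' \
    -H "Accept: application/vnd.github.v3+json" \
    "${auth_args[@]}" "${url}")
  http_status="${response##*$'\n'}"

  if [[ "${http_status}" != "200" ]]; then
    log::error "GitHub API request failed (HTTP ${http_status}): ${url}"
    return 1
  fi
  # Strip the trailing status line and print only the JSON body.
  printf '%s\n' "${response%$'\n'*}"
}

# Example usage inside get_previous_failed_build_id:
# head_sha=$(github_api_get "${GITHUB_API_URL}/repos/${org}/${repo}/pulls/${pr_number}" \
#   | jq -r '.head.sha // empty')

Pagination is still not handled here; for the check-runs endpoint a per_page=100 query parameter or Link-header traversal would be needed if a commit accumulates more than one page of check runs.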
Parsing Robustness

Failure/test extraction relies on regex/heuristics that may miscount or misidentify failures (e.g., grep -oP 'failures="...' can pick up other suites, and the build id extraction pulls any long digit sequence from details_url). Also, xmllint --xpath '//testcase[failure]/@file' can output multiple attributes on one line and the subsequent sed assumes a simple format. Tightening these parsers (or extracting structured fields with jq/XML tooling more defensively) will reduce false positives/negatives and prevent rerunning the wrong tests.

  # Extract build ID from the details URL
  # URL format: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/org_repo/pr/job/build_id
  local build_id
  build_id=$(echo "${details_url}" | grep -oE '[0-9]{10,}' | tail -n 1)

  if [[ -z "${build_id}" ]]; then
    log::error "Could not extract build ID from details URL"
    return 1
  fi

  log::success "Found previous build ID: ${build_id}"
  echo "${build_id}"
}

#######################################
# Fetch JUnit results XML from GCS for a specific namespace
# Arguments:
#   artifact_url: Full URL to the junit-results.xml file
#   output_file: Local path to save the XML file
# Returns:
#   0 if successful, 1 if failed
#######################################
fetch_previous_junit_results() {
  local artifact_url="${1}"
  local output_file="${2}"

  log::info "Fetching JUnit results from: ${artifact_url}"

  # Attempt to download the JUnit XML file
  local http_status
  http_status=$(curl -sS -w "%{http_code}" -o "${output_file}" "${artifact_url}")

  if [[ "${http_status}" == "200" ]]; then
    log::success "Successfully downloaded JUnit results"
    return 0
  else
    log::warn "Failed to fetch JUnit results (HTTP ${http_status})"
    rm -f "${output_file}"
    return 1
  fi
}

#######################################
# Parse failed test names from JUnit XML file
# Arguments:
#   junit_file: Path to the JUnit XML file
# Outputs:
#   Writes failed test file paths to stdout, one per line
# Returns:
#   0 if parsing successful (even if no failures), 1 if file not found or parsing error
#######################################
parse_failed_tests_from_junit() {
  local junit_file="${1}"

  if [[ ! -f "${junit_file}" ]]; then
    log::error "JUnit file not found: ${junit_file}"
    return 1
  fi

  log::info "Parsing failed tests from: ${junit_file}"

  # Parse the JUnit XML to extract file paths from failed test cases
  # Structure:
  # <testsuites>
  #   <testsuite name="chromium" tests="50" failures="2">
  #     <testcase name="test name" file="playwright/e2e/some-test.spec.ts">
  #       <failure>Error message</failure>
  #     </testcase>
  #   </testsuite>
  # </testsuites>

  # Use xmllint to extract file attributes from testcases that have a failure child
  # Fall back to grep/sed if xmllint not available
  if command -v xmllint &>/dev/null; then
    # Extract file paths from testcases with failures
    xmllint --xpath '//testcase[failure]/@file' "${junit_file}" 2>/dev/null \
      | sed 's/file="\([^"]*\)"/\1\n/g' \
      | grep -v '^$' \
      | sort -u
  else
    # Fallback: use grep to find testcase elements with failures
    # This is less robust but works without xmllint
    grep -oP '<testcase[^>]*file="\K[^"]+(?="[^>]*>[\s\S]*?<failure)' "${junit_file}" 2>/dev/null \
      | sort -u
  fi

  return 0
}

#######################################
# Get the count of failed tests from JUnit XML
# Arguments:
#   junit_file: Path to the JUnit XML file
# Outputs:
#   Writes the number of failures to stdout
#######################################
get_failed_test_count() {
  local junit_file="${1}"

  if [[ ! -f "${junit_file}" ]]; then
    echo "0"
    return
  fi

  # Extract the failures count from the testsuite element
  local failures
  failures=$(grep -oP 'failures="\K[0-9]+' "${junit_file}" | head -n 1)

  echo "${failures:-0}"
}
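A tighter version of both extractions could anchor the build ID to the final URL path segment and sum failures across every testsuite element instead of reading only the first match. A sketch, assuming the URL and XML shapes documented in the comments above; function names are illustrative:

# Hedged sketch: take the build ID from the last path segment of the Prow URL
# rather than any long digit run anywhere in the string.
extract_build_id_from_details_url() {
  local details_url="${1}"
  local build_id="${details_url##*/}"
  if [[ "${build_id}" =~ ^[0-9]+$ ]]; then
    echo "${build_id}"
  else
    return 1
  fi
}

# Hedged sketch: count failures across all testsuite elements, not just the first.
get_failed_test_count() {
  local junit_file="${1}"
  [[ -f "${junit_file}" ]] || { echo "0"; return; }

  if command -v xmllint &>/dev/null; then
    # XPath sum() totals the failures attributes of every testsuite.
    xmllint --xpath 'string(sum(//testsuite/@failures))' "${junit_file}" 2>/dev/null \
      | cut -d'.' -f1
  else
    grep -oP 'failures="\K[0-9]+' "${junit_file}" 2>/dev/null \
      | awk '{ total += $1 } END { print total + 0 }'
  fi
}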
📚 Focus areas based on broader codebase context

Error Handling

The new bash utilities rely on external tools (curl, jq, optionally xmllint) but do not enable strict bash settings or validate required dependencies before use. This can lead to silent failures and hard-to-debug behavior (e.g., missing jq causing empty head_sha without a clear dependency error). Add strict mode and explicit dependency checks (at least for required commands) early in the script. (Ref 1, Ref 2)

#!/bin/bash
#
# Utility functions for re-running failed tests from previous CI executions.
#
# This script provides functions to:
# - Fetch JUnit XML results from previous GCS artifacts
# - Parse failed test names from JUnit XML
# - Build URLs to access previous run artifacts
# - Execute only the tests that failed in the previous run
#

# shellcheck source=.ibm/pipelines/lib/log.sh
source "${DIR}/lib/log.sh"

# GCS base URL for OpenShift CI test artifacts
readonly GCS_BASE_URL="https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results"

# GitHub API base URL
readonly GITHUB_API_URL="https://api.github.com"

# The job name we want to find previous failed runs for
readonly RERUN_TARGET_JOB="pull-ci-redhat-developer-rhdh-main-e2e-ocp-helm"

#######################################
# Build the GCS artifact URL for a specific job run
# Arguments:
#   org: GitHub organization (e.g., "redhat-developer")
#   repo: Repository name (e.g., "rhdh")
#   pr_number: Pull request number
#   job_name: CI job name (e.g., "pull-ci-redhat-developer-rhdh-main-e2e-ocp-helm")
#   build_id: Prow build ID
#   namespace: Test namespace (e.g., "showcase" or "showcase-rbac")
# Outputs:
#   Writes the constructed URL to stdout
#######################################
build_previous_run_artifact_url() {
  local org="${1}"
  local repo="${2}"
  local pr_number="${3}"
  local job_name="${4}"
  local build_id="${5}"
  local namespace="${6}"

  # URL structure:
  # https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/
  #   pr-logs/pull/{org}_{repo}/{pr_number}/{job_name}/{build_id}/
  #   artifacts/e2e-ocp-helm/redhat-developer-rhdh-ocp-helm/artifacts/
  #   {namespace}/junit-results.xml
  local url="${GCS_BASE_URL}/pr-logs/pull/${org}_${repo}/${pr_number}/${job_name}/${build_id}"
  url="${url}/artifacts/e2e-ocp-helm/redhat-developer-rhdh-ocp-helm/artifacts/${namespace}/junit-results.xml"

  echo "${url}"
}

#######################################
# Get the previous build ID for a specific job from GitHub API
# Arguments:
#   org: GitHub organization
#   repo: Repository name
#   pr_number: Pull request number
#   target_job: Job name to find (default: RERUN_TARGET_JOB)
# Outputs:
#   Writes the build ID to stdout, or empty string if not found
# Returns:
#   0 if build ID found, 1 otherwise
#######################################
get_previous_failed_build_id() {
  local org="${1}"
  local repo="${2}"
  local pr_number="${3}"
  local target_job="${4:-${RERUN_TARGET_JOB}}"

  log::info "Fetching check runs for PR #${pr_number} in ${org}/${repo}..."

  # Get the PR's head SHA first
  local head_sha
  head_sha=$(curl -sS "${GITHUB_API_URL}/repos/${org}/${repo}/pulls/${pr_number}" \
    -H "Accept: application/vnd.github.v3+json" \
    | jq -r '.head.sha // empty')

  if [[ -z "${head_sha}" ]]; then
    log::error "Could not get PR head SHA"
    return 1
  fi

  log::info "PR head SHA: ${head_sha}"

  # Get check runs for this commit and find the target job
  # We look for the most recent completed run of the target job
  local check_runs_response
  check_runs_response=$(curl -sS "${GITHUB_API_URL}/repos/${org}/${repo}/commits/${head_sha}/check-runs" \
    -H "Accept: application/vnd.github.v3+json")

  # Find the check run that matches our target job and extract details URL
  # The details_url contains the Prow build ID
  local details_url
  details_url=$(echo "${check_runs_response}" | jq -r \
    --arg job "${target_job}" \
    '.check_runs[] | select(.name == $job and .conclusion != null) | .details_url // empty' \
    | head -n 1)

  if [[ -z "${details_url}" ]]; then
    log::warn "No completed check run found for job: ${target_job}"
    return 1
  fi

  log::info "Found details URL: ${details_url}"

  # Extract build ID from the details URL
  # URL format: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/org_repo/pr/job/build_id
  local build_id
  build_id=$(echo "${details_url}" | grep -oE '[0-9]{10,}' | tail -n 1)

  if [[ -z "${build_id}" ]]; then
    log::error "Could not extract build ID from details URL"
    return 1
  fi

  log::success "Found previous build ID: ${build_id}"
  echo "${build_id}"
}

Reference reasoning: The referenced bash script establishes a consistent pattern of set -euo pipefail plus up-front command -v ... validation for required CLIs, which prevents downstream parsing logic from failing in non-obvious ways. Adopting the same pattern here would make failures deterministic and provide actionable error messages when tools are missing.
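A minimal version of that pattern for retest-failed-utils.sh might look like the following; treating xmllint as optional (because the parser has a grep fallback) is an assumption, not something the referenced scripts dictate:

#!/bin/bash
# Hedged sketch: fail fast and validate dependencies before any parsing runs.
set -euo pipefail

require_commands() {
  local missing=()
  local cmd
  for cmd in "$@"; do
    command -v "${cmd}" &>/dev/null || missing+=("${cmd}")
  done
  if [[ ${#missing[@]} -gt 0 ]]; then
    echo "ERROR: missing required commands: ${missing[*]}" >&2
    exit 1
  fi
}

# curl and jq are hard requirements; xmllint only improves parsing accuracy.
require_commands curl jq
command -v xmllint &>/dev/null \
  || echo "WARN: xmllint not found, falling back to grep-based JUnit parsing" >&2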

📄 References
  1. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [1-55]
  2. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [56-69]
  3. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [71-103]
  4. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [104-145]
  5. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [146-157]
  6. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [159-190]
  7. redhat-developer/rhdh-operator/hack/validate-image-digests.sh [192-199]
  8. redhat-developer/rhdh-operator/config/profile/rhdh/plugin-infra/plugin-infra.sh [1-49]

@github-actions
Contributor

🚫 Image Push Skipped.

The container image push was skipped because the build was skipped (either due to [skip-build] tag or no relevant changes with existing image)

@gustavolira force-pushed the feat/rerun-failed-tests branch from 6a610e8 to e74c9f4 on January 21, 2026 at 17:39
@github-actions
Contributor

🚫 Image Push Skipped.

The container image push was skipped because the build was skipped (either due to [skip-build] tag or no relevant changes with existing image)

Add a new CI command that re-executes only the tests that failed in the
previous e2e-ocp-helm run, optimizing time and resources.

New files:
- retest-failed-utils.sh: Utility functions for fetching JUnit artifacts
  from GCS, parsing failed tests, and running specific test files
- jobs/ocp-rerun-failed-tests.sh: Main job handler that orchestrates
  fetching previous results, deploying only needed namespaces, and
  running failed tests

The command:
- Fetches JUnit results from the previous e2e-ocp-helm run via GCS
- Parses which tests failed for showcase and showcase-rbac namespaces
- Deploys only the namespaces that had failures
- Runs only the tests that previously failed using Playwright
- Returns success if no previous run exists or no tests failed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Contributor

🚫 Image Push Skipped.

The container image push was skipped because the build was skipped (either due to [skip-build] tag or no relevant changes with existing image)

@openshift-ci

openshift-ci bot commented Jan 21, 2026

@gustavolira: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name              Commit    Details  Required  Rerun command
ci/prow/e2e-ocp-helm   cda5403   link     true      /test e2e-ocp-helm

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
