@hiboyang
Contributor

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This is a follow-up to PR #8082, which adds support for RayJob with autoscaling. This PR improves the e2e test so that it waits for the RayJob to scale up and down.

Which issue(s) this PR fixes:

Improves the e2e test from PR #8082.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

No

@k8s-ci-robot added labels Dec 11, 2025: kind/cleanup (Categorizes issue or PR as related to cleaning up code, process, or technical debt); do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels)
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hiboyang
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Code Review Process.

@netlify

netlify bot commented Dec 11, 2025

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit 60244b9
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/693c38d64e084d0008917cbe
😎 Deploy Preview https://deploy-preview-8174--kubernetes-sigs-kueue.netlify.app

@k8s-ci-robot added label Dec 11, 2025: needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test)
@k8s-ci-robot
Contributor

Hi @hiboyang. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@k8s-ci-robot added labels Dec 11, 2025: size/L (Denotes a PR that changes 100-499 lines, ignoring generated files); cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA)
@hiboyang
Contributor Author

@mimowo @yaroslava-serdiuk this is the PR to improve the e2e test for RayJob with autoscaling.

@mimowo
Contributor

mimowo commented Dec 11, 2025

Thank you! I will leave the first review pass to Yaroslava

@mimowo
Contributor

mimowo commented Dec 11, 2025

/ok-to-test

@k8s-ci-robot added ok-to-test (Indicates a non-member PR verified by an org member that is safe to test) and removed needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test) Dec 11, 2025
g.Expect(k8sClient.List(ctx, podList, client.InNamespace(ns.Name))).To(gomega.Succeed())
// Count pods that have "workers" in their name
workerPodCount := countWorkerPods(podList)
g.Expect(workerPodCount).To(gomega.Equal(1), "Expected exactly 5 pods with 'workers' in the name")
Contributor

Suggested change
- g.Expect(workerPodCount).To(gomega.Equal(1), "Expected exactly 5 pods with 'workers' in the name")
+ g.Expect(workerPodCount).To(gomega.Equal(1), "Expected exactly 1 pods with 'workers' in the name")

Contributor Author

updated
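
For readers following the thread: the assertion above counts pods whose names contain "workers", and the PR description says the test waits for scaling up and down. The body of `countWorkerPods` is not shown in this conversation, so the following is only a sketch of that count-and-wait pattern. It is written in Python (the language of the job script) rather than the Go and Gomega used by the actual test. `count_worker_pods`, `wait_for_worker_count`, `fake_list`, and the pod names are all hypothetical.

```python
import time
from typing import Callable, Iterable


def count_worker_pods(pod_names: Iterable[str]) -> int:
    """Count pods whose name contains the substring 'workers'."""
    return sum(1 for name in pod_names if "workers" in name)


def wait_for_worker_count(
    list_pod_names: Callable[[], Iterable[str]],
    expected: int,
    timeout_s: float = 5.0,
    interval_s: float = 0.01,
) -> bool:
    """Poll until the worker pod count equals `expected` or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if count_worker_pods(list_pod_names()) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Hypothetical namespace snapshots: scaled up to 5 workers, then back down to 1.
snapshots = [
    ["rayjob-head-abc"] + [f"rayjob-workers-{i}" for i in range(5)],
    ["rayjob-head-abc", "rayjob-workers-0"],
]
calls = iter(snapshots)
last = snapshots[-1]


def fake_list() -> list:
    # Return the next snapshot, then keep returning the final one.
    return next(calls, last)


scaled_down = wait_for_worker_count(fake_list, expected=1)
print(scaled_down)
```

Polling with a deadline, rather than asserting once, is what lets a test tolerate the delay between the job changing its load and the autoscaler adding or removing pods.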

# run tasks in parallel to trigger autoscaling (scaling up)
print(ray.get([my_task.remote(i, 10) for i in range(10)]))
# run tasks in sequence to trigger scaling down
print([ray.get(my_task.remote(i, 1)) for i in range(40)])`,
Contributor

Is there a reason why you put such high numbers? I'd anticipate a maximum of around 5 being adequate here. If we want to run with 20 or 40 iterations, I suggest we reduce the sleep time accordingly to optimize the test's execution speed.

Contributor Author

The high numbers give the RayJob enough time to scale up and down. Let me tune them and set them a bit lower.
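
To make the trade-off in this thread concrete, here is a rough lower-bound estimate of how long each phase of the job script runs. It assumes the second argument of `my_task` is a sleep time in seconds (the task body is not shown in this conversation) and it ignores scheduling and autoscaler latency. `phase_seconds` and the concurrency of 5 are hypothetical, chosen only for illustration.

```python
import math


def phase_seconds(num_tasks: int, sleep_s: float, concurrency: int) -> float:
    """Lower bound on wall-clock time for tasks that each sleep `sleep_s` seconds.

    `concurrency` is how many tasks can run at once; 1 models the sequential loop.
    """
    if num_tasks < 0 or sleep_s < 0 or concurrency < 1:
        raise ValueError("invalid arguments")
    waves = math.ceil(num_tasks / concurrency)
    return waves * sleep_s


# Values from the snippet under review:
# 10 parallel tasks of 10 s each, then 40 sequential tasks of 1 s each.
scale_up_s = phase_seconds(num_tasks=10, sleep_s=10, concurrency=5)   # assumes 5 task slots
scale_down_s = phase_seconds(num_tasks=40, sleep_s=1, concurrency=1)  # one task at a time
print(scale_up_s, scale_down_s)
```

Under these assumptions the sequential phase is the longer one, so reducing its iteration count or sleep time shortens the test the most. Whether the autoscaler still has time to react at the lower values is something that can only be checked empirically.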

Contributor Author

@hiboyang Dec 11, 2025

@yaroslava-serdiuk I updated the tests to shorten the waiting time in the RayJob. I tried different values several times; if they are set too short, the test becomes unstable and may fail randomly. I have now settled on the final values. Would you check?

Contributor Author

Thanks @yaroslava-serdiuk for checking! I updated the PR again because of a git conflict. Could you approve it again?

@yaroslava-serdiuk
Contributor

/release-note-none

@k8s-ci-robot added release-note-none (Denotes a PR that doesn't merit a release note) and removed do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels) Dec 11, 2025
@hiboyang
Contributor Author

/retest

Contributor

@yaroslava-serdiuk left a comment

/lgtm

@k8s-ci-robot added label Dec 12, 2025: lgtm ("Looks good to me", indicates that a PR is ready to be merged)
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: f63009d2190179cca826c4fdb02319e3de6e04a8

@k8s-ci-robot added label Dec 12, 2025: needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD)
Co-authored-by: Yaroslava Serdiuk <yaroslava@google.com>
@k8s-ci-robot removed label Dec 12, 2025: lgtm ("Looks good to me", indicates that a PR is ready to be merged)
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot removed label Dec 12, 2025: needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD)

Labels

cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA)
kind/cleanup (Categorizes issue or PR as related to cleaning up code, process, or technical debt)
ok-to-test (Indicates a non-member PR verified by an org member that is safe to test)
release-note-none (Denotes a PR that doesn't merit a release note)
size/L (Denotes a PR that changes 100-499 lines, ignoring generated files)

4 participants