@tchap tchap commented Aug 19, 2025

The --last flag can be used to sync only the N most recently modified files.

The flag is mutually exclusive with:

  • --include
  • --exclude
  • --delete
  • --watch

The flag is also ignored when the source is a local directory and the tar strategy is used, because that implementation cannot select individual files. We would have to exclude every other file instead, and that exclusion list can easily grow very large. Use a different strategy in that case.

In general, if anything goes wrong while handling --last, the error is ignored and the sync proceeds as if the flag had not been specified.

Regarding implementation details, oc rsync performs an extra step when --last is specified: discovering the files to select. For a local source this is done by walking the directory manually; for a remote source the remote executor invokes a shell that runs find+sort+head.

The resulting filenames are then passed to rsync via --files-from; for tar they are simply passed to the command as arguments.

Tests were added for the discovery mechanism; the rest has been tested manually. oc rsync is rather poorly unit-tested in general.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 19, 2025

openshift-ci-robot commented Aug 19, 2025

@tchap: This pull request references WRKLDS-1191 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.


@openshift-ci openshift-ci bot requested review from atiratree and deads2k August 19, 2025 12:13

openshift-ci bot commented Aug 19, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tchap
Once this PR has been reviewed and has the lgtm label, please assign ardaguclu for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


tchap commented Aug 19, 2025

I guess this can also wait for 4.21 as nobody is going to use it in 4.20 anyway...

@tchap tchap force-pushed the rsync-last-n-include branch 2 times, most recently from 9f81cd5 to b3c84cc Compare August 20, 2025 16:35

tchap commented Aug 20, 2025

/test okd-scos-e2e-aws-ovn
/test e2e-metal-ipi-ovn-ipv6
/test e2e-aws-ovn


tchap commented Sep 18, 2025

/retest

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2025

coderabbitai bot commented Dec 18, 2025

Walkthrough

Introduces a --last flag to selectively copy only the N most recently modified files via rsync. Adds file discovery interfaces with local and remote implementations. Updates all three rsync copy strategies to utilize file discovery when enabled, with comprehensive validation preventing incompatible flag combinations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Rsync strategy implementations: pkg/cli/rsync/copy_rsync.go, pkg/cli/rsync/copy_rsyncd.go, pkg/cli/rsync/copy_tar.go | Adds Last and fileDiscovery fields to respective strategy types. When Last > 0, discovers matching files and invokes rsync with --files-from -, piping file list via stdin. Adjusts destination path handling for rsync semantics. For copy_tar.go, introduces NoRecursion flag to avoid recursive traversal when filtering is active. |
| File discovery interface and implementations: pkg/cli/rsync/discovery.go, pkg/cli/rsync/discovery_local.go, pkg/cli/rsync/discovery_remote.go | Defines fileDiscoverer interface for discovering N most recently modified files. Implements local discovery by reading directory, sorting by modification time, and truncating results. Implements remote discovery via shell command execution, parsing output, and extracting filenames. |
| Discovery tests: pkg/cli/rsync/discovery_local_test.go, pkg/cli/rsync/discovery_remote_test.go, pkg/cli/rsync/discovery_test.go | Validates local discovery with multiple result-count scenarios and subdirectory filtering. Validates remote discovery via mocked executor with success, partial, empty, and error cases. Introduces mockFileDiscoverer helper for testing discovery integration. |
| Strategy and executor tests: pkg/cli/rsync/copy_tar_test.go, pkg/cli/rsync/exec_test.go | Tests tar strategy file-discovery behavior with scenarios covering original includes replacement, discovery errors, and nil results. Introduces mockExecutor helper for validating command invocations and controlling execution outcomes. |
| CLI core: pkg/cli/rsync/rsync.go | Adds --last flag to RsyncOptions with initialization logic selecting local or remote discoverer based on source type. Extends validation to forbid incompatible combinations with --watch, --include, --exclude, and --delete. Updates help text and examples. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes



tchap commented Jan 6, 2026

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2026
The --last flag can be used to sync only the N most recently modified
files.

The flag is mutually exclusive with:
  * --include
  * --exclude
  * --delete
  * --watch

It is also ignored when the source is a local directory and the tar
strategy is used, because that implementation cannot select individual
files.

In general, if anything goes wrong while handling --last, the error is
ignored and the sync proceeds as if the flag had not been specified.

Regarding implementation details, oc rsync performs an extra step when
--last is specified: discovering the files to select. For a local
source this is done by walking the directory manually; for a remote
source the remote executor invokes a shell that runs find+sort+head.

The resulting filenames are then passed to rsync via --files-from;
for tar they are simply passed to the command as arguments.

Tests were added for the discovery mechanism; the rest has been
tested manually. oc rsync is poorly unit-tested in general.

Assisted-by: Claude Code
@tchap tchap force-pushed the rsync-last-n-include branch from b3c84cc to a587e18 Compare January 6, 2026 16:14

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI Agents
In @pkg/cli/rsync/copy_rsync.go:
- Around line 95-112: When r.Last > 0 make a nil-check for r.fileDiscovery
before calling r.fileDiscovery.DiscoverFiles to avoid a panic in cases where
rsyncStrategy was instantiated directly (e.g., in tests); if r.fileDiscovery is
nil, either initialize a sensible default file discovery implementation or log a
warning and skip the --last branch (leaving in and dst unchanged), otherwise
call r.fileDiscovery.DiscoverFiles as before and proceed; reference r.Last,
r.fileDiscovery.DiscoverFiles, and rsyncStrategy (and Complete() which normally
sets fileDiscovery) to locate the code to change.

In @pkg/cli/rsync/copy_rsyncd.go:
- Around line 238-255: The block that applies --last in copy_rsyncd.go calls
s.fileDiscovery.DiscoverFiles when s.Last > 0 but lacks a nil check and can
panic if s.fileDiscovery is nil; update the conditional to first verify
s.fileDiscovery != nil before calling DiscoverFiles, e.g., if s.Last > 0 &&
s.fileDiscovery != nil { ... } and if fileDiscovery is nil log a warning (using
klog.Infof or klog.Warningf) that --last was ignored; keep the existing behavior
of populating in, adjusting dst, and logging when DiscoverFiles returns an error
or succeeds.

In @pkg/cli/rsync/discovery_remote.go:
- Around line 44-47: The code extracts filename using filepath.Base on a remote
container path (variables fullPath and filename in discovery_remote.go) which
breaks on Windows; replace filepath.Base with path.Base (from the "path"
package) to operate on POSIX-style forward-slash remote paths and update imports
accordingly so the file uses the "path" package instead of "path/filepath".
🧹 Nitpick comments (3)
pkg/cli/rsync/discovery_remote.go (1)

51-52: Check scanner.Err() after the scan loop.

The scanner may encounter errors during iteration. Add a check after the loop to ensure no errors occurred.

🔎 Proposed fix
 	}
+	if err := scanner.Err(); err != nil {
+		return nil, fmt.Errorf("error reading remote command output: %w", err)
+	}
 	return filenames, nil
pkg/cli/rsync/copy_tar_test.go (1)

66-74: Type assertion could panic; consider a safer approach.

The type assertion NewTarStrategy(options).(*tarStrategy) will panic if NewTarStrategy returns a different concrete type. While this is unlikely in a test, using the two-value form provides clearer test failure output.

🔎 Suggested improvement
-			strategy := NewTarStrategy(options).(*tarStrategy)
+			strategy, ok := NewTarStrategy(options).(*tarStrategy)
+			if !ok {
+				t.Fatal("NewTarStrategy did not return *tarStrategy")
+			}
pkg/cli/rsync/copy_rsyncd.go (1)

227-274: Consider extracting duplicated file discovery logic.

The file discovery block (building stdin buffer, adjusting destination path, logging) is nearly identical to copy_rsync.go. Extracting this to a helper function would improve maintainability and ensure consistent behavior across strategies.

🔎 Example helper signature
// discoverFilesForRsync returns the stdin reader and adjusted destination path
// when --last filtering is enabled. Returns nil reader if discovery fails or is disabled.
func discoverFilesForRsync(fd fileDiscoverer, last uint, sourcePath, destPath string) (io.Reader, string) {
    // ... shared logic ...
}
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between dc6d2d4 and a587e18.

📒 Files selected for processing (12)
  • pkg/cli/rsync/copy_rsync.go
  • pkg/cli/rsync/copy_rsyncd.go
  • pkg/cli/rsync/copy_tar.go
  • pkg/cli/rsync/copy_tar_test.go
  • pkg/cli/rsync/discovery.go
  • pkg/cli/rsync/discovery_local.go
  • pkg/cli/rsync/discovery_local_test.go
  • pkg/cli/rsync/discovery_remote.go
  • pkg/cli/rsync/discovery_remote_test.go
  • pkg/cli/rsync/discovery_test.go
  • pkg/cli/rsync/exec_test.go
  • pkg/cli/rsync/rsync.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • pkg/cli/rsync/discovery_remote_test.go
  • pkg/cli/rsync/discovery_remote.go
  • pkg/cli/rsync/rsync.go
  • pkg/cli/rsync/discovery_test.go
  • pkg/cli/rsync/copy_rsyncd.go
  • pkg/cli/rsync/copy_tar.go
  • pkg/cli/rsync/discovery_local_test.go
  • pkg/cli/rsync/discovery.go
  • pkg/cli/rsync/discovery_local.go
  • pkg/cli/rsync/copy_tar_test.go
  • pkg/cli/rsync/exec_test.go
  • pkg/cli/rsync/copy_rsync.go
🧬 Code graph analysis (2)
pkg/cli/rsync/copy_tar.go (1)
pkg/helpers/source-to-image/tar/tar.go (2)
  • Tar (32-73)
  • Writer (82-86)
pkg/cli/rsync/copy_rsync.go (1)
pkg/cli/rsync/pathspec.go (1)
  • PathSpec (17-20)
🔇 Additional comments (14)
pkg/cli/rsync/exec_test.go (1)

1-27: LGTM!

Clean mock implementation for testing command execution. The structure properly validates expected commands and returns configurable output/errors.

pkg/cli/rsync/copy_tar.go (2)

48-65: LGTM on the discovery integration.

The error handling correctly logs warnings and proceeds without filtering when discovery fails, which aligns with the graceful degradation behavior described in the PR.


222-253: Verify --no-recursion behavior with trailing-slash sources.

The --no-recursion flag is only applied when the source path does not end with /. When the source ends with /, tar includes specific file patterns without --no-recursion. Please confirm this asymmetry is intentional and that file-only filtering works correctly for both path styles.

pkg/cli/rsync/discovery.go (1)

1-7: LGTM!

Clean interface definition with appropriate abstraction for file discovery.

pkg/cli/rsync/discovery_local_test.go (1)

12-97: LGTM!

Good test coverage with clear setup and multiple scenarios. The test properly validates ordering by modification time and ensures subdirectories are ignored.

pkg/cli/rsync/discovery_local.go (1)

19-65: LGTM!

Clean implementation with proper error handling and efficient pre-allocation. The sorting and limiting logic is correct.

pkg/cli/rsync/discovery_test.go (1)

1-14: LGTM!

Simple and effective mock for testing discovery behavior.

pkg/cli/rsync/copy_tar_test.go (1)

12-77: Well-structured test with good coverage of edge cases.

The test cases comprehensively cover the file discovery logic: successful discovery replacing originals, fallback when discovery returns empty or errors, and handling of nil/empty states. The use of table-driven tests and cmp.Diff for clear failure output follows Go testing best practices.

pkg/cli/rsync/rsync.go (2)

237-245: File discovery initialization is correctly ordered.

The file discovery setup happens before strategy initialization, ensuring strategies can use the discovered files during construction. The conditional logic properly distinguishes between local and remote sources.


276-287: Validation for mutually exclusive flags is comprehensive.

The switch-case structure cleanly validates that --last cannot be combined with --watch, --include, --exclude, or --delete. Error messages clearly identify which flags conflict.

pkg/cli/rsync/discovery_remote_test.go (2)

12-87: Good test coverage for remote file discovery.

The test suite covers the expected scenarios: successful discovery with exact and partial results, empty directories, and error propagation. The mock executor approach cleanly isolates the discovery logic from actual command execution.


64-72: Fix shell injection vulnerability in command construction.

Line 28 directly embeds basePath in single quotes without escaping: fmt.Sprintf("find '%s' ...", basePath). If basePath contains a single quote, it breaks the shell command and enables injection. Use proper shell escaping or an escaping library before embedding user-controlled paths into shell strings.

pkg/cli/rsync/copy_rsync.go (1)

87-133: Clean integration of --last filtering with rsync strategy.

The implementation properly builds the file list for --files-from, adjusts the destination path to maintain rsync semantics, and gracefully falls back on discovery failure. The warning log ensures users are informed when filtering cannot be applied.

pkg/cli/rsync/copy_rsyncd.go (1)

324-341: Strategy initialization correctly propagates --last options.

The NewRsyncDaemonStrategy function properly initializes Last and fileDiscovery from the options, maintaining consistency with other strategy constructors.

Comment on lines +28 to +31
cmd := []string{"sh", "-c", fmt.Sprintf("find '%s' -maxdepth 1 -type f -printf '%%T@ %%p\\n' | sort -rn | head -n %d", basePath, lastN)}
if err := discoverer.exec.Execute(cmd, nil, &output, &errOutput); err != nil {
return nil, fmt.Errorf("failed to execute remote find+sort+head command: %w, stderr: %s", err, errOutput.String())
}

⚠️ Potential issue | 🟠 Major

Potential command injection if basePath contains shell metacharacters.

The basePath is embedded in a shell command with single quotes. If the path contains a single quote ('), it could break out of the quoting and allow command injection. Consider sanitizing or escaping the path.

🔎 Proposed fix
-	cmd := []string{"sh", "-c", fmt.Sprintf("find '%s' -maxdepth 1 -type f -printf '%%T@ %%p\\n' | sort -rn | head -n %d", basePath, lastN)}
+	// Escape single quotes in basePath to prevent command injection.
+	escapedPath := strings.ReplaceAll(basePath, "'", "'\\''")
+	cmd := []string{"sh", "-c", fmt.Sprintf("find '%s' -maxdepth 1 -type f -printf '%%T@ %%p\\n' | sort -rn | head -n %d", escapedPath, lastN)}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	cmd := []string{"sh", "-c", fmt.Sprintf("find '%s' -maxdepth 1 -type f -printf '%%T@ %%p\\n' | sort -rn | head -n %d", basePath, lastN)}
-	if err := discoverer.exec.Execute(cmd, nil, &output, &errOutput); err != nil {
-		return nil, fmt.Errorf("failed to execute remote find+sort+head command: %w, stderr: %s", err, errOutput.String())
-	}
+	// Escape single quotes in basePath to prevent command injection.
+	escapedPath := strings.ReplaceAll(basePath, "'", "'\\''")
+	cmd := []string{"sh", "-c", fmt.Sprintf("find '%s' -maxdepth 1 -type f -printf '%%T@ %%p\\n' | sort -rn | head -n %d", escapedPath, lastN)}
+	if err := discoverer.exec.Execute(cmd, nil, &output, &errOutput); err != nil {
+		return nil, fmt.Errorf("failed to execute remote find+sort+head command: %w, stderr: %s", err, errOutput.String())
+	}


openshift-ci bot commented Jan 6, 2026

@tchap: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-serial-2of2 | a587e18 | link | true | /test e2e-aws-ovn-serial-2of2 |
| ci/prow/e2e-agnostic-ovn-cmd | a587e18 | link | true | /test e2e-agnostic-ovn-cmd |
| ci/prow/e2e-aws-ovn | a587e18 | link | true | /test e2e-aws-ovn |
| ci/prow/okd-scos-images | a587e18 | link | true | /test okd-scos-images |

