Skip to content

Conversation

@mattl-netflix
Copy link
Contributor

6369b6a 2024-05-02 | Cease stopping Cassandra in the restore process and reveal the ability to restore to a staging area and then import or move the data depending on whether Cassandra is running.
4ece8d3 2024-05-01 | Move getBackupDirectories to BackupRestoreUtil and change its signature to cease requiring an IConfiguration but rather just take the data directory path as a String.
6b3d45a 2024-04-10 | Cease wiping the data directory in advance of restore. It is counter-intuitive and will be empty in practice in most cases anyway.
f10289f 2024-04-10 | Consolidate restore status API.
4d6d07f 2024-03-09 | Remove backup racs as that configuration is not used in practice and is not relevant to whether a restore should be attempted.
aa3dffe 2024-03-09 | Remove redundant vertical whitespace.
8c23800 2024-03-09 | Remove redundant logging.
646973d 2024-03-09 | Remove all non-javadoc comments that don't provide warnings to future maintainers.
8d18f51 2024-03-09 | Remove waitForCompletion parameter to download method which is always false in practice. Remove final modifier on static method. Make static property an instance variable instead. Use parameterized constructor call in place of two lines.
ce6c7c1 2024-03-09 | Remove unused code from our restore tooling.

@mattl-netflix mattl-netflix force-pushed the feature/in_place_restore_4 branch from 6369b6a to 9557275 Compare May 5, 2024 20:11
String restoreFileName =
PATH_JOINER.join(dataDir, keyspace, columnFamily, indexDir, fileName);
return_ = new File(restoreFileName);
return_ =
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are in here, consider renaming this variable, its not idiomatic java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

String dataDirectory, String monitoringFolder) throws IOException {
ImmutableSet.Builder<Path> backupPaths = ImmutableSet.builder();
Path dataPath = Paths.get(dataDirectory);
if (Files.exists(dataPath) && Files.isDirectory(dataPath))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Files.newDirectoryStream(keyspaceDirPath, Files::isDirectory)) {
for (Path columnfamilyDirPath : keyspaceStream) {
Path backupDirPath =
Paths.get(columnfamilyDirPath.toString(), monitoringFolder);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can use columnFamilyDirPath.resolve(monitoringFolder) here instead of converting back and forth between string and path. Reference: https://docs.oracle.com/javase/8/docs/api/index.html?java/nio/file/Path.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for (Path columnfamilyDirPath : keyspaceStream) {
Path backupDirPath =
Paths.get(columnfamilyDirPath.toString(), monitoringFolder);
if (Files.exists(backupDirPath) && Files.isDirectory(backupDirPath)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here that the File.exists call is redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (instanceState.getRestoreStatus() != null
&& instanceState.getRestoreStatus().getStatus() != null
&& instanceState.getRestoreStatus().getStatus() == Status.STARTED) {
if (instanceState.isRestoring()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this cleanup 🙌

failedImports.addAll(importData(keyspace, table, tableDir.toString()));
}
} else {
recursiveMove(Paths.get(srcDir), Paths.get(configuration.getDataFileLocation()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the move is non-atomic I can see a weird edge case where we start Cassandra while the move is happening (and we have like 10,000+ files). Should we have some sort of "fence" that tells the subsystem that starts Cassandra it needs to wait while this happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should. I will need to address this in a follow-up, however. This is analogous to Cassandra getting started accidentally before the full download is complete the in the model we use today. Which hasn't happened yet, knock on wood. It might requires some rethinking about consolidating Cassandra's remediation policy in our environment. That is, the between Priam and systemd.

if (Files.isRegularFile(path)) {
Files.move(path, destination.resolve(path.getFileName()));
} else if (Files.isDirectory(path)) {
recursiveMove(path, destination.resolve(path.getFileName()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If am reading this right, if the source path we are currently iterating over is a directory we recurse by moving the files in that directory into [destination/directory name]. Cant we just move the entire directory instead of recursing then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal here was for the the operation to be strictly additive. That is, if there are already files present I want to ensure that for them to get overwritten that the operator does something explicit.

Note that to that end I have changed the policy regarding whether to replace files if they are already present in a follow-up commit.

} else if (Files.isDirectory(path)) {
recursiveMove(path, destination.resolve(path.getFileName()));
} else {
throw new IOException("Failed determining type of inode is " + path);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this error or just skip/log? Likely an edge case but do we really want to fail if thats the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new model using walkFileTree() we always continue.


// Cleanup local data
File dataDir = new File(config.getDataFileLocation());
if (dataDir.exists() && dataDir.isDirectory()) FileUtils.cleanDirectory(dataDir);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about this: I know we don't need to stop cass anymore because we use import but I was surprised to see we no longer clean the directory. Is this for the case where we want to restore a subset of the data but keep whats there? Do we still need the option to clean the directory for some restores?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common cases for restores are:

  1. Restores into empty clusters (e.g., red-black)
  2. Restored into live clusters (e.g., emergencies)

In neither case do we want any data to be deleted. I would prefer that in outlier cases where that is required that there is a separate explicit job or command that the operator needs to run to best manage expectations. It is scary today that we could configure Priam such that a restart deletes all of the data.

"config.doesCassandraStartManually() is set to True, hence Cassandra needs to be started manually ...");
List<String> failedImports = cassOps.importAll(config.getRestoreDataLocation());
if (!failedImports.isEmpty()) {
instanceState.endRestore(Status.FAILED, LocalDateTime.now());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to use the list of failed imports somewhere to log what failed and be able to resume? instead of starting over from the beginning?

Also if we only need the boolean why return the list? I would rather us use the list but if we are not going to I mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. In case we don't specify --copy-data and Cassandra's behavior is to remove the origin files regardless of successful import, we want a log of what failed so we can at least get a sense of the scope of what is missing. I will log the failures.

…hs when fetching incrementals. The iterators are always fully materialized. Also, remove the now-redundant method from BackupRestoreUtil that merely wrapped the MetaProxy call.
… false in practice.

Remove final modifier on static method.
Make static property an instance variable instead.
Use parameterized constructor call in place of two lines.
…is not relevant to whether a restore should be attempted.
…intuitive and will be empty in practice in most cases anyway.
…re to cease requiring an IConfiguration but rather just take the data directory path as a String.
…y to restore to a staging area and then import or move the data depending on whether Cassandra is running.
@mattl-netflix mattl-netflix force-pushed the feature/in_place_restore_4 branch from 9557275 to e1ebaa5 Compare March 17, 2025 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants