@arunagrawal84
Contributor

Build the instance from backups by using the restore process when an instance is replaced. We prefer this when the data size is very large: C* streaming is slow, and for instances with a lot of data it can take multiple days. Note that this is somewhat dangerous, as you "will" lose writes that were accepted by the old instance but not yet uploaded to the backup file system. Also, we do not plan to run a local repair on the replaced instance, so its data will be stale; we hope the repair service will take care of the inconsistency. Clusters that use LOCAL_QUORUM for both reads and writes may see little to no impact. If the restore fails, we fall back to streaming.
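The replacement flow described above can be sketched as a small decision function. This is a minimal, hypothetical model, not Priam's implementation: `ReplacementPlanner`, `DataSource`, and `chooseDataSource` are illustrative names, the `bypassEnabled` parameter stands in for a setting like `enableBypassCassandraStreaming()`, and the restore attempt is modeled as a `BooleanSupplier` that reports success or failure.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch (not Priam code): decide where a replacement
 * instance gets its data from.
 */
final class ReplacementPlanner {
    enum DataSource { BACKUP_RESTORE, CASSANDRA_STREAMING, NOT_A_REPLACEMENT }

    /**
     * @param isReplace     true if this instance replaces a dead one
     * @param bypassEnabled models a flag like enableBypassCassandraStreaming()
     * @param tryRestore    runs the restore from backups; true on success
     */
    static DataSource chooseDataSource(boolean isReplace,
                                       boolean bypassEnabled,
                                       BooleanSupplier tryRestore) {
        if (!isReplace) {
            return DataSource.NOT_A_REPLACEMENT; // this flow does not apply
        }
        // Only attempt the restore when the bypass is enabled;
        // && short-circuits, so tryRestore is not invoked otherwise.
        if (bypassEnabled && tryRestore.getAsBoolean()) {
            return DataSource.BACKUP_RESTORE; // data came from backups
        }
        // Bypass disabled or restore failed: fall back to streaming.
        return DataSource.CASSANDRA_STREAMING;
    }
}
```

Passing the restore as a supplier keeps the fallback decision separate from how the restore itself is performed.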

@codecov

codecov bot commented Nov 12, 2019

Codecov Report

❌ Patch coverage is 7.69231% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.62%. Comparing base (4323b20) to head (c000eaf).
⚠️ Report is 179 commits behind head on 3.x.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...m/src/main/java/com/netflix/priam/PriamServer.java | 0.00% | 23 Missing ⚠️ |
| ...java/com/netflix/priam/restore/RestoreContext.java | 0.00% | 20 Missing ⚠️ |
| ...in/java/com/netflix/priam/tuner/StandardTuner.java | 60.00% | 0 Missing and 2 partials ⚠️ |
| ...com/netflix/priam/config/IBackupRestoreConfig.java | 0.00% | 1 Missing ⚠️ |
| ...a/com/netflix/priam/identity/InstanceIdentity.java | 0.00% | 1 Missing ⚠️ |
| ...ain/java/com/netflix/priam/tuner/dse/DseTuner.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##                3.x     #837      +/-   ##
============================================
- Coverage     46.67%   46.62%   -0.06%     
+ Complexity     1054     1053       -1     
============================================
  Files           167      167              
  Lines          7315     7325      +10     
  Branches        746      748       +2     
============================================
+ Hits           3414     3415       +1     
- Misses         3650     3661      +11     
+ Partials        251      249       -2     

Contributor

@hashbrowncipher hashbrowncipher left a comment

A few questions, at least some of them due to my unfamiliarity with Priam's codebase.

logger.info("No restore needed, task not scheduled");
shouldStartCassandra = true;
}
}
Contributor

How can we exit this block with shouldStartCassandra being false?

Contributor Author

If the originally requested restore failed (that is, when Priam starts in restore mode during the weekly restore refresh):

```
// Start cassandra only if restore is successful.
shouldStartCassandra = true;
```

But yes, with the recent refactoring of restore we will throw an exception there, so we don't need that variable. Good catch.
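A minimal sketch of the pattern described in this reply, assuming restore failures surface as unchecked exceptions: because the exception propagates before Cassandra is started, no `shouldStartCassandra` flag is needed. `StartupSequence` and `restoreThenStart` are hypothetical names, not Priam classes.

```java
/**
 * Hypothetical sketch (not Priam code): sequencing restore and startup
 * without a separate "should start" flag.
 */
final class StartupSequence {
    /**
     * Runs the restore, then starts Cassandra. If restore throws an
     * unchecked exception, startCassandra is never reached.
     */
    static void restoreThenStart(Runnable restore, Runnable startCassandra) {
        restore.run();         // assumed to throw on failure
        startCassandra.run();  // reached only after a successful restore
    }
}
```

Letting the exception carry the failure means the caller cannot accidentally start Cassandra on top of a partial restore by forgetting to check a flag.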

shouldStartCassandra = true;
} else {
if (instanceIdentity.isReplace()
&& backupRestoreConfig.enableBypassCassandraStreaming()) {
Contributor

How does Priam determine whether Cassandra hasn't successfully bootstrapped? I looked for an existing check, but I didn't see one.

Contributor Author

TokenRetrieverUtils.inferTokenOwnerFromGossip is used to fetch the instance identity. That method should correctly tell whether Cassandra has already bootstrapped successfully.

* Note that we prefer this when data size is HUGE. C* streaming is super slow and for instances
* with big data size can lead to C* streaming for multiple days. Note that this is a little bit
* dangerous as you "will" lose some of the writes accepted by the old instance but not uploaded to
* the backup file system. Also we do not plan to run local repair on the replaced instance, so data
Contributor

I agree that not running repair is acceptable for a first iteration. Hypothetically though, how would we do it?

Contributor Author

Ideally, we should defer that task to the repair service. Where that repair service sits and how it gets executed is a different conversation, though.

if (!Restore.isRestoreEnabled(config, instanceInfo)) {
map.put("auto_bootstrap", config.getAutoBoostrap());
} else {
if (instanceState.getRestoreStatus() != null
Contributor

I don't see the purpose of this check. Why not just pass the auto_bootstrap setting through from the config 100% of the time?

Contributor Author

The idea is: if we are doing a restore, we need to set auto_bootstrap to false; otherwise we use the value from the configuration provider (for example, false when creating a new cluster, and true in most other cases). Since we can now choose to restore in order to bypass Cassandra streaming, we need to override the configured value. I wanted to keep that logic in the tuner instead of putting it in the configuration provider.
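The rule described in this reply can be sketched as a small tuner helper, assuming the tuner builds the Cassandra YAML settings as a `Map<String, Object>` (as the `map.put("auto_bootstrap", ...)` excerpt in this thread suggests). `AutoBootstrapTuner` and `applyAutoBootstrap` are hypothetical names; the sketch only models the author's stated intent, which the reviewer is questioning.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not Priam code): override auto_bootstrap when the
 * instance is being built from a restore.
 */
final class AutoBootstrapTuner {
    /**
     * @param yaml                    Cassandra settings being written out
     * @param restoring               true if data is being restored from backups
     * @param configuredAutoBootstrap value from the configuration provider
     */
    static void applyAutoBootstrap(Map<String, Object> yaml,
                                   boolean restoring,
                                   boolean configuredAutoBootstrap) {
        // During a restore the data comes from backups, so the configured
        // value is overridden with false; otherwise it is passed through.
        yaml.put("auto_bootstrap", restoring ? false : configuredAutoBootstrap);
    }
}
```

Keeping the override in the tuner means the configuration provider still reports the operator's intended value, and only the generated YAML reflects the restore-specific decision.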

Contributor

Can't we leave auto_bootstrap=true, even when we restore?

} else {
if (instanceIdentity.isReplace()
&& backupRestoreConfig.enableBypassCassandraStreaming()) {
logger.info("Trying to download data instead of streaming from Cassandra.");
Contributor

nit: Add "from backup", as in "Trying to download data from backup instead of streaming from Cassandra."
