feat(v2): Postgres storage adapter #529
Conversation
Force-pushed from 6fe9265 to dcf8130
This adds a Postgres storage adapter for the taskbroker, as well as a way to choose between the adapters in the configuration. The adapter also works with AlloyDB. In Postgres, the keyword `offset` is reserved, so that column is named `kafka_offset` in the PG tables and converted back to `offset` when read. The tests were updated to run with both the SQLite and Postgres adapters using the rstest crate. The `create_test_store` function was updated to be the standard for all tests and to allow choosing between a SQLite and a Postgres DB. A `remove_db` function was added to the trait and the existing adapters, since the tests create a unique PG database on every run that should be cleaned up.
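The `offset`/`kafka_offset` mapping described above can be handled at read time by aliasing the column. A minimal sketch, assuming a table shaped like the one in this PR (column selection is illustrative, not the project's exact query):

```sql
-- Store the Kafka offset under a non-reserved name, and alias it back
-- to "offset" when reading; the alias must be quoted because "offset"
-- is a reserved keyword in Postgres.
SELECT
    id,
    kafka_offset AS "offset",
    status
FROM inflight_taskactivations
WHERE status = 'pending';
```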
Force-pushed from dcf8130 to f70bfda
fpacifici left a comment:
Should we start a Postgres instance as part of devservices?
fpacifici left a comment:
I'd recommend reducing the duplication in the ActivationStore between the two stores. I don't know whether we will eventually remove SQLite from the implementation for good, but it will take some time before we get there, and having so much copy-paste is quite dangerous.
    processing_deadline TIMESTAMPTZ,
    status TEXT NOT NULL,
    at_most_once BOOLEAN NOT NULL DEFAULT FALSE,
    application TEXT NOT NULL DEFAULT '',
This may not need to have a default now that it has been merged and rolled out.
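If the default is indeed no longer needed, dropping it would be a one-line migration. A minimal sketch, assuming the table name from this diff (the migration itself is hypothetical, not part of this PR):

```sql
-- Once every row has an explicit application value, the column no
-- longer needs a server-side default.
ALTER TABLE inflight_taskactivations
    ALTER COLUMN application DROP DEFAULT;
```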
        }),
    };
    let response = service.set_task_status(Request::new(request)).await;
    println!("response: {:?}", response);
please remove
    /// The number of ms for timeouts when publishing messages to kafka.
    pub kafka_send_timeout_ms: u64,

    pub database_adapter: &'static str,
Why is this not an enum?
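An enum-backed setting would reject bad values at config-load time instead of deep in the adapter selection code. A minimal sketch of that direction; the field in this PR is a `&'static str`, and the type and variant names below are illustrative, not the project's API:

```rust
use std::str::FromStr;

/// Hypothetical enum replacement for the string-typed setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DatabaseAdapter {
    Sqlite,
    Postgres,
}

impl FromStr for DatabaseAdapter {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "sqlite" => Ok(DatabaseAdapter::Sqlite),
            "postgres" => Ok(DatabaseAdapter::Postgres),
            // Typos surface here, once, instead of as silent fallthrough.
            other => Err(format!("unknown database adapter: {other}")),
        }
    }
}

fn main() {
    assert_eq!(
        "postgres".parse::<DatabaseAdapter>(),
        Ok(DatabaseAdapter::Postgres)
    );
    assert!("mysql".parse::<DatabaseAdapter>().is_err());
}
```

With serde-based config loading, the same effect is usually achieved by deriving `Deserialize` on the enum rather than implementing `FromStr` by hand.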
    /// When an activation is moved from pending -> processing a result is expected
    /// in this many seconds.
    - pub processing_deadline_duration: u32,
    + pub processing_deadline_duration: i32,
Any chance that we have values in the SQLite DB that would not fit an i32?
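For context on the range question: assuming the field stores seconds, as the doc comment above it says, `i32` covers roughly 68 years, so only a corrupt stored value could overflow. A quick check of that arithmetic:

```rust
fn main() {
    // i32::MAX seconds expressed in years (365.25-day years).
    let years = i32::MAX as f64 / (60.0 * 60.0 * 24.0 * 365.25);
    // Roughly 68 years: far beyond any plausible processing deadline.
    assert!(years > 68.0 && years < 68.1);
}
```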
    /// Remove the database, used only in tests
    async fn remove_db(&self) -> Result<(), Error>;
We should create an MCP in front of the production taskbroker, and we absolutely need to expose this, which will certainly be used only in tests.
        .fetch_all(&mut *conn)
        .await?;

    Ok(rows.into_iter().map(|row| row.into()).collect())
What happens if we update the rows here, and then the broker crashes because a nuclear strike targets the datacenter or, less likely, it suffers an OOM? Are these tasks stuck?
    if let Ok(row) = result {
        let received_at: DateTime<Utc> = row.get("received_at");
        let delay_until: Option<DateTime<Utc>> = row.get("delay_until");
        let millis = now.signed_duration_since(received_at).num_milliseconds()
            - delay_until.map_or(0, |delay_time| {
                delay_time
                    .signed_duration_since(received_at)
                    .num_milliseconds()
            });
        millis as f64 / 1000.0
    } else {
        // If we couldn't find a row, there is no latency.
        0.0
    }
Same as above, we can reduce the duplications by having a lower level DB trait that is used by the activation store.
    async fn count_by_status(&self, status: InflightActivationStatus) -> Result<usize, Error> {
        let result =
            sqlx::query("SELECT COUNT(*) as count FROM inflight_taskactivations WHERE status = $1")
                .bind(status.to_string())
                .fetch_one(&self.read_pool)
                .await?;
        Ok(result.get::<i64, _>("count") as usize)
    }
Maybe this could be a default method in the trait? This is an alternative direction with respect to the low level DB trait.
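The "default method on the trait" direction can be sketched without the database layer. This is a minimal illustration only: the names are hypothetical, and the real store trait is async and backed by sqlx, so the shared default would issue a query rather than filter an in-memory list:

```rust
/// Hypothetical trait: each adapter implements one low-level primitive,
/// and the shared counting logic lives once as a default method.
trait ActivationCounts {
    /// Adapter-specific: produce the status of every inflight activation.
    fn statuses(&self) -> Vec<String>;

    /// Shared: count activations in a given status, written once for
    /// all adapters instead of copy-pasted into each one.
    fn count_by_status(&self, status: &str) -> usize {
        self.statuses()
            .iter()
            .filter(|s| s.as_str() == status)
            .count()
    }
}

/// Stand-in adapter for the sketch.
struct InMemoryStore {
    statuses: Vec<String>,
}

impl ActivationCounts for InMemoryStore {
    fn statuses(&self) -> Vec<String> {
        self.statuses.clone()
    }
}

fn main() {
    let store = InMemoryStore {
        statuses: vec!["pending".into(), "processing".into(), "pending".into()],
    };
    assert_eq!(store.count_by_status("pending"), 2);
}
```

The trade-off versus the lower-level DB trait suggested earlier is that default methods keep one trait but push adapters toward a common primitive, while a separate DB trait keeps the store logic entirely adapter-agnostic.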
        .bind(id)
        .fetch_optional(&mut *conn)
        .await?;
    println!("result: {:?}", result);
remove
    let now = Utc::now();
    let mut atomic = self.write_pool.begin().await?;

    // Idempotent tasks that fail their processing deadlines go directly to failure
at_most_once != idempotent. Idempotent means you can safely issue the task again. Is the comment wrong here?