Added gRPC interface definition for data cache #46
Conversation
beauremus
left a comment
Should we consider off-the-shelf solutions?
https://github.com/aklivity/zilla
https://github.com/mailgun/kafka-pixy
rpc CreateTopic(CreateTopicRequest) returns (CreateTopicResponse);

// Send a single message to a topic
rpc Produce(ProduceRequest) returns (ProduceResponse);
Our data logger process/service is likely a good example of why a streaming producer is a good idea.
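For illustration only, a client-streaming variant is one way to support that. This is just a sketch, not part of this PR; the ProduceStream name and the single summary-style response are assumptions:

// Hypothetical client-streaming variant: a long-running producer such as the
// data logger could push many messages over one open call instead of issuing
// a separate Produce call per message, and receive one summary response when
// the stream ends.
rpc ProduceStream(stream ProduceRequest) returns (ProduceResponse);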
rpc Produce(ProduceRequest) returns (ProduceResponse);

// Stream messages from a topic (server streaming)
rpc Consume(ConsumeRequest) returns (stream ConsumeResponse);
I'm not familiar with Kafka standard operations. Is there a way to request multiple topics? This seems to imply a socket connection per device, which I think is a single device at a single rate.
Most implementations of a Kafka consumer support one-consumer-many-topics setups.
Seems like the question is where we want the complexity to emerge. If each consumer is on one topic, we could have data from a single consumer be streamed to any requesting external client, and thereby limit the maximum number of active consumers. Allowing arbitrary combinations of topics means it gets harder to reuse consumers for different clients.
On the other hand, we have the concern you bring up, of a single client now needing many separate connections to listen on many topics, instead of a few connections.
But there's also the question of why we'd want data from many topics to be mangled into one stream. Usually a topic contains a specific kind of data. The data can come from many sources, but the idea is each topic is its own little pool of things that can be operated on or reasoned about in the same way. Allowing consumers to stream many topics back in one connection kinda breaks this pattern, making it harder to know what the data is that we're getting, and forcing external clients to implement a bunch of logic to disambiguate the data that comes in.
Normally, a consumer has the ability to consume messages from different topics. We can make an interface that can consume messages from different topics with a single gRPC call.
Is this the requirement?
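For illustration, a multi-topic request could be an extension of the existing ConsumeRequest (quoted just below). This is a sketch only; the repeated topics field and the response fields shown are assumptions, not what this PR defines:

message ConsumeRequest {
  // Hypothetical: one or more topics to subscribe to with a single call.
  repeated string topics = 1;
}

message ConsumeResponse {
  // Hypothetical: the source topic, so a client reading a combined stream
  // can tell which topic each message came from.
  string topic = 1;
  bytes payload = 2;
}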
message ConsumeRequest {
  string topic = 1;
  string group_id = 2;
What is group_id? It doesn't mirror the ProduceRequest.
This is a Kafka-ism. Each consumer of a topic can optionally be in a group. If it is in a group, Kafka will spread the group's consumers across the topic's partitions as evenly as possible. If there are fewer consumers in the group than partitions, some consumers will get multiple partitions. If there are more consumers than partitions, some consumers will not get any data at all. Probably better not to use groups unless we're sure we're OK with some clients only getting some of the messages from a topic.
EDIT: Wanted to clarify: if a consumer is not in a group, or if it is the only consumer in its group, it will get all the partitions of a topic (unless the consumer has been configured to listen on a specific subset of partitions, which is also a possibility). And you can have as many consumers listening to the topic as you want. They only get throttled when they're in the same group.
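If the field were to stay, one way to make that trade-off explicit in the .proto (a hypothetical sketch, not what the PR currently defines) would be to mark it optional and document the group semantics:

message ConsumeRequest {
  string topic = 1;
  // Hypothetical: Kafka consumer-group id. Leave it unset to receive every
  // message on the topic; set it to share the topic's partitions with other
  // consumers using the same group id (each message then goes to only one
  // member of the group).
  optional string group_id = 2;
}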
Regarding group_id, I should not have included it (thank you, Beau, nice catch).
This is more Kafka-specific, and it may not be present if we change from Kafka to something else.
Therefore, I should remove it.
rneswold
left a comment
I'm guilty of not putting enough comments in my .proto files. But a lot of the fields in these messages have very generic names: topic, key, name, group_id, etc. (maybe these are recognized in the Kafka world?). Some comments would help us understand what their purpose is.
For instance, in the response messages, you have a success field and a message field. Is the message field to describe an error if success is false? If so, what happens if success is true, but an error message is sent?
If those fields are paired like that, I'd get rid of them and make one optional field: errMessage. If the error message is missing, then it's success. If it's there, something went wrong. There's no way to have an invalid state.
If they're not tied together, then comments would be nice to understand why they're needed and what they indicate.
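For example, a response shaped along those lines might look like this. It is a sketch only; the ProduceResponse name comes from the excerpts earlier in the thread, and the err_message field is an assumption:

message ProduceResponse {
  // Hypothetical: set only when the produce failed; if unset, it succeeded.
  // This removes the invalid "success is true but an error message is
  // present" state that separate success/message fields would allow.
  optional string err_message = 1;
}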
I guess this is more for @beauremus, though maybe you also have an answer, Amol, but what's the gain from putting Kafka behind a gRPC service, again? I'd heard an argument that "it keeps all our internal services speaking gRPC", but is that all? For example, we have several services that know how to speak Postgres, and that seems to work ok. In fact, we had a whole meeting where it was explicitly determined that we would have many Postgres-fluent services instead of one Postgres-Service-To-Rule-Them-All. I see the move to put Kafka behind a service in a similar vein.
Out of the box, Kafka is designed to handle enterprise-scale messaging, at thousands or even millions of messages a second. It seems like a big waste to hide that capability behind another network hop, just so we can say we limited the number of things that need to know how to talk to Kafka. I'm not here to say there aren't good reasons out there, but I haven't heard anything concrete about what exactly is motivating this architectural decision, when we've had a similar discussion in the past about Postgres that went the other way.
As an alternative idea, what if we wrote up a library for talking to Postgres and another for talking to Kafka? Microservices that needed a connection to one or the other could just add the library as a dependency. It would keep each microservice in charge of its own data connections, rather than demanding that everyone go through a central service. But it still buys us the benefit of only writing the nitty-gritty implementation of Postgres/Kafka logic once. And we avoid accidentally creating a bottleneck in the control system that doesn't need to be there.
Again, I just wanted to throw this out there as a means of sparking discussion - not meant to be construed as a demand for going any particular direction. I appreciate anyone who takes the time to engage!
Very true. Our services should use gRPC. But we're not trying to gRPC-ize the APIs of products. All gRPC services that need Postgres can use it directly. All gRPC services that leverage Kafka should use it directly. The gRPC APIs are providing a control system service -- not an alternate, generic API for these products.
I think I agree with @rneswold and @jacob-curley-fnal. The benefit I see of our own gRPC layer wrapping Kafka is that it keeps us decoupled from Kafka as a technology choice. We can swap out Kafka for another message queue technology at a later date, should we choose, without needing to modify all of the downstream services. But TBH, I think the cost-benefit isn't there. This API would need to scale at the same level as Kafka - is that a reasonable expectation? How much effort will that take? Instead, given that Kafka is a mature and popular open-source product, we should just embrace its API.
And, as @jacob-curley-fnal suggests, I think a better way to mitigate the risk of abandoning Kafka in the future would be to use native adapters rather than a microservice.
But we're not decoupled if the gRPC API uses Kafka terms and data structures. Because if we move away from Kafka, then we're trying to make our Kafka-compatible gRPC API fit whatever new backend we choose.
Thank you all for expressing your opinion about it.
Thanks for your comments, Amol! I think there's a good amount to talk about, so we might benefit from getting folks in a room. I've been working on a graphQL -> Kafka endpoint as part of extapi-acsys, specifically for grabbing alarms, and I know you've been working on data caching for device readings. This raises the question of whether we're talking about one Kafka instance servicing all the control system's needs, or whether there will be several instances for different purposes. That might impact how we decide to talk to it, or it might not. Either way, I had a whole long thing written out, but it might just be more efficient if we have a meeting to be sure everyone's on the same page. Thoughts?
@jacob-curley-fnal I like the idea of having a meeting to discuss this too.
Definitely, I would love to join the application team meeting to discuss the interfaces.
Added gRPC interface for data cache (Kafka)