Skip to content

Conversation

@Kousay-Jebir
Copy link
Contributor

@Kousay-Jebir Kousay-Jebir commented Jun 16, 2025

The first commits do the following big changes :

  1. Renamed Lb to InnerLb. This has the consequence of aliasing imports of Lb in all the service/*.rs files
  2. Moved InnerLb definition and its init definition to its own module.
  3. lib.rs now only imports the enum as well as the two concrete structs InnerLb and ProxyLb
    to use crate::InnerLb as Lb.
  4. the enum itslef is now taking the name Lb. Added pub use enum_lb::Lb; so that lb-rs clients don't have to adjust their imports.
  5. rpc module contains two structs to properly define what a request and a response looks like. for now these reside in the module file itself but i think it needs to be moved to the model folder in future commits
  6. error.rs's LbErr now implements Ser and Deser, I did ask the AI to do these implementations just demo purposes for now since it's that relevant right now for what I've been working on will probably work on improving them in later commits
  7. For now i went with the approach where each call opens its own tcp connection. thus the definition of ProxyLb not containing a connection field used across the whole api.
  8. For drafting purposes, I hardcoded a TCP port that init will try to connect to, Thinking about moving this in Config

The goal of these first commits is to prepare the overall structure.
If this gets approved, future commits will focus on implementing the other apis, further organize the files, get rid of all the warnings and at some point implement robust measures such as retrial on error on connections etc...

@Kousay-Jebir

This comment was marked as resolved.

@Parth

This comment was marked as resolved.

@Parth

This comment was marked as resolved.

@Kousay-Jebir

This comment was marked as resolved.

@Parth

This comment was marked as resolved.

@Parth

This comment was marked as resolved.

@Kousay-Jebir

This comment was marked as resolved.

@Kousay-Jebir

This comment was marked as resolved.

@Parth

This comment was marked as resolved.

@Kousay-Jebir
Copy link
Contributor Author

pub struct LbClient {
    pub addr: SocketAddrV4,
    pub events: EventSubs
}

do you think this is valid ? , upon inspecting i see that there is no reason for subscribe to be called via rpc and let the leader handle it, i think followers can just call their own without interacting with the leader here ?

@Kousay-Jebir Kousay-Jebir requested a review from Parth July 5, 2025 14:01
Copy link
Member

@Parth Parth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good stuff, continuing to head in the right direction, you should fire off a cargo fmt soon

@@ -0,0 +1,419 @@
pub async fn dispatch(lb: Arc<LbServer>, req: RpcRequest) -> LbResult<Vec<u8>> {

let raw = req.args.unwrap_or_default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have some comments in the client side of this that should render the unwrap_or_default() un-needed.

Comment on lines 21 to 34
"import_account_private_key_v2" => {
let (pk_bytes, api_url): ([u8; 32], String) = deserialize_args(&raw)?;
let sk = SecretKey::parse(&pk_bytes).map_err(core_err_unexpected)?;
call_async(|| lb.import_account_private_key_v2(sk, &api_url)).await?
}

"import_account_phrase" => {
let (phrase_vec, api_url): (Vec<String>, String) = deserialize_args(&raw)?;
let slice: Vec<&str> = phrase_vec.iter().map(|s| s.as_str()).collect();
let phrase_arr: [&str; 24] = slice
.try_into()
.map_err(core_err_unexpected)?;
call_async(|| lb.import_account_phrase(phrase_arr, &api_url)).await?
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why these have been lifted out to here

Comment on lines +391 to +397
fn call_sync<R>(f: impl FnOnce() -> LbResult<R>) -> LbResult<Vec<u8>>
where
R: Serialize,
{
let res: R = f()?;
bincode::serialize(&res).map_err(|e| core_err_unexpected(e).into())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed? I thought all the lb-rs fns became async because of the network activity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of lb-enum (in turn lb-client) are async due to the network activity, but i haven't turned sync methods of lb-server to async cuz no need to

Comment on lines 400 to 419
use std::future::Future;
use std::path::PathBuf;
use std::sync::Arc;

use libsecp256k1::SecretKey;
use serde::de::DeserializeOwned;
use serde::Serialize;
use std::sync::Mutex;
use uuid::Uuid;
use crate::model::api::{AccountFilter, AccountIdentifier, AdminSetUserTierInfo, ServerIndex, StripeAccountTier};
use crate::model::errors::{LbErrKind};
use crate::model::errors::{core_err_unexpected};
use crate::model::file::ShareMode;
use crate::model::file_metadata::{DocumentHmac, FileType};
use crate::model::path_ops::Filter;
use crate::rpc::{CallbackMessage, RpcRequest};
use crate::service::activity::RankingWeights;
use crate::service::import_export::{ExportFileInfo, ImportStatus};
use crate::subscribers::search::SearchConfig;
use crate::{LbServer,LbResult}; No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'm sympathetic to all use statements being at the bottom of all files, I like to order things in some sort of significance, either alpha or importance and use statements usually are just noise to me.

But it's too un-idiomatic and foreign of a thing to do. If rustfmt supported it as a config and it happened automatically I could be down to be weird. But in the absence of that I think it's better off not happening on a project we want many people to contribute to.

stream.read_exact(&mut buf).await?;

let req: RpcRequest = bincode::deserialize(&buf).map_err(core_err_unexpected)?;
let payload = dispatch(lb, req).await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let payload = dispatch(lb, req).await?;
let resp = dispatch(lb, req).await?;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why serialization stuff was added here, this file is prob only for LbServer internals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to do this cuz now theres get_keychain

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure no one outside lb is using keychain tho. So there prob shouldn't be a get_keychain

}

#[derive(Copy, Clone, Debug)]
impl Serialize for SearchIndex {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why serialization stuff was added here, this file is prob only for LbServer internals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, for get_search

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still I may be mis-understanding but I'm fairly sure that no one will be serializing / deserializing the whole search index -- which is basically every document stored in mem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just call this file lb.rs

#[derive(Clone)]
pub struct LbClient {
pub addr: SocketAddrV4,
pub events: EventSubs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this makes sense:

i think followers can just call their own without interacting with the leader here ?

let me explain the point of the subscription api:
lots of people will ask questions from lb-rs (like: "what are all the files?"), and then they'll show that to the user.

The subscription api tells them when that data is stale and it's time to get fresh data.

Followers simply won't have access to the information unless it's pushed from the server to them.

Followers are interested in subscribing to a tokio broadcast channel. I would read generally about channels and then read about broadcast if you're unfamiliar with these synchronization primitives. So you're close regarding what you have, but I don't think this field belongs in LbClient, I think it's going to be something like this:

With your existing stateless protocol you'd have to do something like this:

when someone calls subscribe the client implementation do the following:

  • create a broadcast channel of it's own
  • create a new thread, this thread will call_rpc, and the dispatch version of subscribe will do the actual subscription and continuously respond with updates (the channel can't be sent over the network so this is the only sensible reply here). This helper thread on the client will recv those messages form them back into rust structs and broadcast them to whoever is listening.
  • the client subscribe impl will return the recv end of the broadcast channel as soon as the thread is created
  • the thread will populate the channel beyond that point.

When I was advising you to not really worry about callbacks it's to reserve your resources for mastery over this one method -- subscribe. I am interested in pretty soon deprecating all the callbacks we have already in favor of this new api.

Pushing data from within lb to clients is annoying, you're experiencing it's annoyance over the network, smail experiences in ffi, etc. So we just want to do it once and then just rely on our rich typesystem to carry us beyond that point once the pipe is setup.

There are three reasons I want your connection to be stateful, and you can work around all of them:

  • compatibility, if one client is updated before another one, then the connection may be incompatible, we'll need to explore conventions here, maybe one broadcasts it's min supported version (like our actual server does). Ideally this would just happen at connection time, but it's prob fast enough for it to happen over every connection. You can reserve room in the protocol like you did for the size of the transmission. Or it's part of the connection sequence (that's what I would do).
  • Performance overhead -- I assume it's small but worth addressing, but you can measure and if it's insignificant then you don't need to do it.
  • Security, right now there's no benefit to guarding access to localhost (to be clear it's not 0.0.0.0) but one day (optional passphrase #159) there likely will be and then the connection will probably take a passphrase or some other sort of API key to start. It's possible that you'd still need to log in to the client regardless, and that client would have a bb core with just an account or something that's used to authenticate with LbServer.

@Kousay-Jebir Kousay-Jebir requested a review from Parth August 14, 2025 14:08
@Kousay-Jebir
Copy link
Contributor Author

Should i work on fixing resulting errors in lb-c lb-java and test_utils

@Parth
Copy link
Member

Parth commented Aug 14, 2025

You can probably fix those errors last -- we can get very far with just CLI & egui as a proof of concept. Happy to see this moving again!

) -> LbResult<()> {
let method_id = method as u16;
let body = bincode::serialize(args).map_err(core_err_unexpected)?;
let len = body.len();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sending a len prefix is fine and pretty common

you should just know about another approach, one that fits this 1 connection per request a bit better:

after finishing your writes you can execute stream.Shutdown(write) and then on the reading side you can do read_all. This means you don't have to negotiate the types.

This lets you do a single, probably more optimal syscall, signalling to the OS, I want to put all available bytes into memory, I assume this is the fastest way to express the idea if the end goal is in-fact to get all the bytes into memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however if we find that we're trying to head in the direction of re-using the connection (which is likely) then you'll need to negotiate the length. So I think keeping it in this half way point is fine until we know what direction we want to head and then produce the optimal implementation for that strategy.

Comment on lines +117 to +121
stream
.read_exact(&mut len_buf)
.await
.map_err(core_err_unexpected)?;
let resp_len = usize::from_be_bytes(len_buf);
Copy link
Member

@Parth Parth Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a stream.read_u64 and a stream.write_u64 which could simplify these and related calls

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you use _be_ in some places and _le_ in others? this makes me upgrade the above to a "requested change" from a suggestion.

Copy link
Member

@Parth Parth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this is actually quite close to being ready to merge:

  • rebase needs to happen, so people can test with the most recent version
  • minor comments around the protocol need to be addressed
  • search index for sure does not need to be serialized, the search queries will be sent to LbServer which is the only one who will have the index.
  • we can make LbErr::backtrace an Option<String> instead of an Option<Backtrace> which will let you derive Serialize and Deserialize via the macro on the LbErr type.

@Kousay-Jebir
Copy link
Contributor Author

Regarding the SearchIndex serialization, in clients/cli/src/main.rs line 241 used to have
lb.search.tantivy_reader.reload().unwrap(); i had to add a get_search method so that this can retreive the index from the leader when lb is a follower. i think this was my only reason that made me impl ser and deser on it.

@Parth
Copy link
Member

Parth commented Sep 16, 2025

Regarding the SearchIndex serialization, in clients/cli/src/main.rs line 241 used to have lb.search.tantivy_reader.reload().unwrap(); i had to add a get_search method so that this can retreive the index from the leader when lb is a follower. i think this was my only reason that made me impl ser and deser on it.

OKay that makes sense, I think the preferred way to solve this would be to expose a new endpoint for the CLI to refresh the index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants