-
Notifications
You must be signed in to change notification settings - Fork 0
Fixes ENG-1430: Add eval run utility in TS #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9ce19d2 to
68e4a69
Compare
src/eval_utils/run.ts
Outdated
| { | ||
| ...request, | ||
| // @ts-ignore Log under the version expected by evaluation, not | ||
| // one determined by decorators. Otherwise the evaluation will stale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does "will stale" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, made comment more detailed
src/eval_utils/run.ts
Outdated
| } | ||
|
|
||
| if ("prompt" in request) { | ||
| if (!_.isEqual(state!.evaluatedVersion, request.prompt)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the ! assertion?
i think that !s should be avoided in general. if code changes such that this assumption can't be made, this wouldn't raise an error (and when it does crash, it's not that useful an error - an error message explaining why we expect state to be defined would be more helpful for a future developer coming to this and wondering why this assertion is here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, added explicit check
src/eval_utils/run.ts
Outdated
| dataset: Dataset, | ||
| name?: string, | ||
| evaluators: Evaluator[] = [], | ||
| workers: number = 8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"workers" here doesn't really make sense i think - does this actually spin off multiple workers? or is it just async concurrency?
does js have a standard name for this? (i do know this means inconsistency with python, but i think it's wrong enough that i'd push to change it here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p-map limits the number of promises running at any time to be workers.
I think it makes sense with an extra modification: we should cap the number of workers to not DOS our backend. Since the promises fire up simultaneously, we might have thousands of open HTTP connections and overwhelm the backend.
Added a 32 cap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my comment here was more about the specific terminology of "worker".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at p-map, it looks like they use "concurrency", which i'd prefer here as it's more accurate, even if it's inconsistent with python. (seems reasonable as it's a language-specific runtime option)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack - I'll change to concurrency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If "consistent with python" was a goal for this PR we should have closed it long ago 😄
src/eval_utils/run.ts
Outdated
| "The path of the evaluated `file` must match the path of your decorated `callable`. Expected path: " + | ||
| file.path + | ||
| ", got: " + | ||
| file.callable.path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why +s and not template string? (think template string would be much cleaner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack changed to template
| } | ||
|
|
||
| if (file.callable && "version" in file.callable) { | ||
| if (!_.isEqual(file.version, file.callable.version)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to just check version_id? is a deep equal necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
version can be a deep object, e.g. {flow: attributes: {...}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if both have version_id, i think we should use version_id.
as noted in other comment:
the deep equal here is slightly concerning - very often the response a user would get from the API won't be exactly equal to the request made. (e.g. the response has default values injected, along with some null attributes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been considering it, and I think we should prevent the user from specifying a value for file.version when the callable is wrapped in an SDK utility. There's a mini design-doc on this issue here:
https://linear.app/humanloop/issue/ENG-1456/run-utility-callable-version-issue
I'd rather leave this as it is and revisit when we tackle the ticket above. But I'm aware it's not ideal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in a sync with Harry. Was scoped out for later
| // Upsert the local Evaluators; other Evaluators are just referenced by `path` or `id` | ||
| let localEvaluators: [EvaluatorResponse, Function][] = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is our terminology here "local evaluators"?
in Humanloop we call them "External Evaluators".
is this something taken from the python sdk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - should I change to externalEvaluators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes to which question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not clear to me what this should be. but it should be consistent and intuitive. neither "local" nor "external" seem great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's taken from python SDK. I think it's good enough
src/humanloop.client.ts
Outdated
| "Using Prompt File utility without passing any provider in the " + | ||
| "HumanloopClient constructor. Did you forget to pass them?", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when is this shown to users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see
i think this is a very passive aggressive thing to tell users.
instead of "did you forget to pass them?", we should say something like "Prompt File utility can only be used when a provider is passed in the Humanloop constructor." or "Humanloop cannot find ... . To ..., pass a provider into the HumanloopClient constructor.".
Also I think the term "Prompt File utility" is very confusing and ambiguous. We should instead specify which function was used. (Unclear why "File" important here; unclear what counts as a "utility" here - and so hard to figure out what's wrong - are all sdk methods under .prompts "prompt utilities"?;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
${func.name}: You must pass at least one LLM client library in the Humanloop client constructor. Otherwise the prompt() utility will not work properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does passing any llm client library work? or does it have to be the one that is called within the function wrapped by prompt()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instrumentors require the module and modify the prototype of the client itself. So all client instances would be watched by the OTel instrumentor:
https://github.com/traceloop/openllmetry-js/blob/main/packages/instrumentation-openai/src/instrumentation.ts#L55
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
${func.name}: To use the `prompt()` utility, pass your LLM client library into the Humanloop client constructor; e.g. `HumanloopClient(..., { providerModules: {OpenAI} } )`.
clarified; should pass in all relevant llm client libraries.
|
are the 3 cookbooks meant to be released to users? |
Yes, will make them public after we release a beta version and can install humanloop properly from npm |
| } | ||
|
|
||
| if ("flow" in request) { | ||
| if (!_.isEqual(state.evaluatedVersion, request.flow)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the deep equal here is slightly concerning - very often the response a user would get from the API won't be exactly equal to the request made. (e.g. the response has default values injected, along with some null attributes)
is that relevant here?
Are both state.evaluatedVersion and request.flow user-provided?
Should we check explicitly on version_id instead (if that is available)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in a sync with Harry. Was scoped out for later
Based on #7
QA'd using the following cookbooks:
Remarks:
(inputs, messages) => ...signature. This can be bypassed by wrapping the function returned by the utility in an arrow function: