Conversation

@andreibratu andreibratu commented Dec 12, 2024

Based on #7

QA'd using the following cookbooks:

Remarks:

  • Decorators are forced to use an (inputs, messages) => ... signature. This can be bypassed by wrapping the function returned by the utility in an arrow function:
    // Wrap the function returned by the utility so callers only pass `messages`.
    const callLLM = async (messages: any[]) => {
        return await promptUtilityFactory(
            opentelemetryTracer,
            // The decorator expects an (inputs, messages) signature; inputs is unused here.
            async (_: any, messages: any[]) => {
                const client = new OpenAI({ apiKey: process.env.OPENAI_KEY });
                const response = await client.chat.completions.create({
                    model: "gpt-4o",
                    messages: messages,
                    temperature: 0.8,
                });
                return (
                    (response.choices[0].message.content || "") +
                    " " +
                    (await randomString())
                );
            },
            "Call LLM",
        )({}, messages);
    };

    const agentCall = async (messages: any[]) =>
        await humanloop.flow(
            opentelemetryTracer,
            async (_: any, messages: any[]) => {
                return await callLLM(messages);
            },
            "Small Flow",
        )({}, messages);
  • Eval run works

{
...request,
// @ts-ignore Log under the version expected by evaluation, not
// one determined by decorators. Otherwise the evaluation will stale
Contributor

what does "will stale" mean?

Author

Ack, made comment more detailed

}

if ("prompt" in request) {
if (!_.isEqual(state!.evaluatedVersion, request.prompt)) {
Contributor

why the ! assertion?

i think that !s should be avoided in general. if code changes such that this assumption can't be made, this wouldn't raise an error (and when it does crash, it's not that useful an error - an error message explaining why we expect state to be defined would be more helpful for a future developer coming to this and wondering why this assertion is here)

Author

ack, added explicit check
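For illustration, the explicit check could look roughly like this (the error message and exact placement are assumptions, not the merged diff):

    // Hypothetical sketch: an explicit guard instead of the `state!` non-null assertion.
    if (state === undefined) {
        throw new Error(
            "Evaluation state is not set; the evaluation run is expected to " +
                "initialize it before any logs are made.",
        );
    }
    if (!_.isEqual(state.evaluatedVersion, request.prompt)) {
        // ... proceed as before
    }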

dataset: Dataset,
name?: string,
evaluators: Evaluator[] = [],
workers: number = 8,
Contributor

"workers" here doesn't really make sense i think - does this actually spin off multiple workers? or is it just async concurrency?

does js have a standard name for this? (i do know this means inconsistency with python, but i think it's wrong enough that i'd push to change it here)

Author

p-map limits the number of promises running at any time to `workers`.

I think it makes sense with one extra modification: we should cap the number of workers so we don't DoS our own backend. Since the promises all fire up simultaneously, we could otherwise end up with thousands of open HTTP connections and overwhelm the backend.

Added a 32 cap.
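For reference, a rough sketch of how such a cap composes with p-map's concurrency option; `datapoints` and `processDatapoint` are illustrative names, not the actual code:

    import pMap from "p-map";

    // Cap in-flight promises so an evaluation run cannot open thousands of
    // HTTP connections against the backend at once.
    const MAX_CONCURRENCY = 32;
    const concurrency = Math.min(workers, MAX_CONCURRENCY);

    const results = await pMap(
        datapoints,
        async (datapoint) => processDatapoint(datapoint),
        { concurrency },
    );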

Contributor

my comment here was more about the specific terminology of "worker".

Contributor

looking at p-map, it looks like they use "concurrency", which i'd prefer here as it's more accurate, even if it's inconsistent with python. (seems reasonable as it's a language-specific runtime option)

Author

ack - I'll change to concurrency

Author

If "consistent with python" was a goal for this PR we should have closed it long ago 😄

Comment on lines 156 to 159
"The path of the evaluated `file` must match the path of your decorated `callable`. Expected path: " +
file.path +
", got: " +
file.callable.path,
Contributor

why +s and not template string? (think template string would be much cleaner)

Author

ack, changed to a template string
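For illustration, the template-string form of that message reads roughly like the following (only the string appears in the diff; the surrounding assignment is assumed):

    const message =
        `The path of the evaluated \`file\` must match the path of your ` +
        `decorated \`callable\`. Expected path: ${file.path}, got: ${file.callable.path}`;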

}

if (file.callable && "version" in file.callable) {
if (!_.isEqual(file.version, file.callable.version)) {
Contributor

do we want to just check version_id? is a deep equal necessary?

Author

version can be a deep object, e.g. `{ flow: { attributes: { ... } } }`

Contributor

if both have version_id, i think we should use version_id.

as noted in other comment:
the deep equal here is slightly concerning - very often the response a user would get from the API won't be exactly equal to the request made. (e.g. the response has default values injected, along with some null attributes)
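For illustration, the suggested check might look roughly like this (field names and the fallback are assumptions, not the merged code):

    // Hypothetical: compare stable version_ids when both sides carry one,
    // fall back to a deep comparison otherwise.
    const sameVersion =
        file.version?.version_id !== undefined &&
        file.callable.version?.version_id !== undefined
            ? file.version.version_id === file.callable.version.version_id
            : _.isEqual(file.version, file.callable.version);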

Author

@andreibratu andreibratu Dec 20, 2024

I've been considering it, and I think we should prevent the user from specifying a value for file.version when the callable is wrapped in an SDK utility. There's a mini design-doc on this issue here:

https://linear.app/humanloop/issue/ENG-1456/run-utility-callable-version-issue

I'd rather leave this as it is and revisit when we tackle the ticket above. But I'm aware it's not ideal

Author

Discussed in a sync with Harry. Was scoped out for later

Comment on lines +253 to +254
// Upsert the local Evaluators; other Evaluators are just referenced by `path` or `id`
let localEvaluators: [EvaluatorResponse, Function][] = [];
Contributor

is our terminology here "local evaluators"?
in Humanloop we call them "External Evaluators".
is this something taken from the python sdk?

Author

Yes - should I change to externalEvaluators?

Contributor

yes to which question?

Contributor

not clear to me what this should be. but it should be consistent and intuitive. neither "local" nor "external" seems great.

Author

Yes, it's taken from the Python SDK. I think it's good enough

Comment on lines 114 to 115
"Using Prompt File utility without passing any provider in the " +
"HumanloopClient constructor. Did you forget to pass them?",
Contributor

when is this shown to users?

Contributor

ah i see

[screenshot of the warning as shown to the user]

i think this is a very passive aggressive thing to tell users.
instead of "did you forget to pass them?", we should say something like "Prompt File utility can only be used when a provider is passed in the Humanloop constructor." or "Humanloop cannot find ... . To ..., pass a provider into the HumanloopClient constructor.".

Also I think the term "Prompt File utility" is very confusing and ambiguous. We should instead specify which function was used. (Unclear why "File" is important here; unclear what counts as a "utility" here - and so it's hard to figure out what's wrong - are all sdk methods under .prompts "prompt utilities"?)

Author

How about this:
${func.name}: You must pass at least one LLM client library in the Humanloop client constructor. Otherwise the prompt() utility will not work properly.

Contributor

does passing any llm client library work? or does it have to be the one that is called within the function wrapped by prompt()?

Author

The instrumentors require the module and patch the prototype of the client class itself, so all client instances end up watched by the OTel instrumentor:
https://github.com/traceloop/openllmetry-js/blob/main/packages/instrumentation-openai/src/instrumentation.ts#L55
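For intuition, a toy sketch of why patching the prototype covers every instance; this is illustrative only, not the openllmetry-js code, and `FakeLLMClient` is made up:

    // Patching a method on the prototype affects all instances, wherever they
    // are constructed, which is why any client created after instrumentation is traced.
    class FakeLLMClient {
        async complete(prompt: string): Promise<string> {
            return `echo: ${prompt}`;
        }
    }

    const original = FakeLLMClient.prototype.complete;
    FakeLLMClient.prototype.complete = async function (prompt: string) {
        // In the real instrumentor this would start and end an OTel span.
        console.log("span: complete() called");
        return await original.call(this, prompt);
    };

    await new FakeLLMClient().complete("hi"); // instrumented, even though constructed later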

Contributor

@harry-humanloop harry-humanloop Dec 20, 2024

${func.name}: To use the `prompt()` utility, pass your LLM client library into the Humanloop client constructor; e.g. `HumanloopClient(..., { providerModules: {OpenAI} } )`.

Clarified; you should pass in all relevant LLM client libraries.

@harry-humanloop
Contributor

are the 3 cookbooks meant to be released to users?
(if so, would expect a PR so I can leave comments on it.)

@andreibratu
Author

are the 3 cookbooks meant to be released to users? (if so, would expect a PR so I can leave comments on it.)

Yes, will make them public after we release a beta version and can install humanloop properly from npm

}

if ("flow" in request) {
if (!_.isEqual(state.evaluatedVersion, request.flow)) {
Contributor

the deep equal here is slightly concerning - very often the response a user would get from the API won't be exactly equal to the request made. (e.g. the response has default values injected, along with some null attributes)

is that relevant here?

Are both state.evaluatedVersion and request.flow user-provided?
Should we check explicitly on version_id instead (if that is available)?

Author

Discussed in a sync with Harry. Was scoped out for later

@andreibratu andreibratu merged commit 75fc08a into master Dec 20, 2024
3 checks passed