Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.NET Allow ISKFunction to return IAsyncEnumerable #1298

Closed
Tracked by #3132
wayne-o opened this issue Jun 1, 2023 · 7 comments · Fixed by #3496
Closed
Tracked by #3132

.NET Allow ISKFunction to return IAsyncEnumerable #1298

wayne-o opened this issue Jun 1, 2023 · 7 comments · Fixed by #3496
Assignees
Labels
kernel Issues or pull requests impacting the core kernel
Milestone

Comments

@wayne-o
Copy link

wayne-o commented Jun 1, 2023

Important

Labeled High because it will not require a breaking change, but it's very important to complete by v1.0.0

Is there any way currently to allow ISKFunction to return an IAsyncEnumerable? If not is there any plan to?

@wayne-o
Copy link
Author

wayne-o commented Jun 1, 2023

Or to put it another way, is there any way to get the gpt client to use the streaming endpoint and emit events on each message?

@evchaki evchaki added the kernel Issues or pull requests impacting the core kernel label Jun 1, 2023
@craigomatic
Copy link
Contributor

We added support for streaming chat messages semi-recently, although I'm not sure this is what you are asking for: https://github.com/microsoft/semantic-kernel/blob/main/samples/dotnet/kernel-syntax-examples/Example33_StreamingChat.cs

Can you be a little more specific about the types of events you are looking to consume?

@wayne-o
Copy link
Author

wayne-o commented Jun 2, 2023 via email

@awharrison-28
Copy link
Contributor

@RogerBarreto has some ideas in the works to make the requested behavior possible. TBD - when we get those changes in, let's make sure to tag this issue.

@wayne-o
Copy link
Author

wayne-o commented Jun 3, 2023 via email

@MithrilMan
Copy link

MithrilMan commented Jun 13, 2023

Since you don't have this yet in Semantic Kernel, let me give my view on this:

I'm working on my own implementation of generative LLM integration here: https://github.com/MithrilMan/AIdentities
since a couple of months (I didn't knew about semantic kernel and when I found it it was already late and anyway wasn't adhering 100% to my vision) and I implemented what @wayne-o is asking in my design: I've a concept of Skills and when a skill is executed, this is the signature of the call
IAsyncEnumerable<Thought> ExecuteAsync(Prompt prompt, SkillExecutionContext context, CancellationToken cancellationToken);
The core of my skill/brain thing is here (readme isn't updated) https://github.com/MithrilMan/AIdentities/tree/25e60f875979a0d595561fb7d0f1f5e82e5ab7a1/src/AIdentities.Shared/Features/CognitiveEngine

It works, but I'm sick of a problem that arises around the use of IAsyncEnumerable: when the API fails in the middle of generating a streamed content, it fails badly because it fails when you start to consume the iterator and .Net doesn't have a nice way to handle such cases, so if you want to implement a retry pattern for example by using polly (and we've seen how this is important with OpenAI that lately had big problems serving properly their API), you can't just wrap the call to the stream method because the exception of course arise when you materialize the enumerator.
Unluckily .Net doesn't allow to call a yield return within a try/catch block, so you have to create all sort of workaround to have a proper retry method.

Also when you stream chunks of text, what would you do if something goes wrong in the API ?
In a simple chat application you can just ask the user to send again the message, but if it's part of a skill (or skfunction as you call it) It's very likely that You'd need to inform the consumer that you have to start over and that it should forget the partial generated text (= higher complexity)

I've yet not figured out how to properly and affectively handle such scenarios, in my case when it fails, fails badly, but this makes me wonder if it's worth to use streaming calls in skills at all.

Of course it's cool to see that my Skill is producing a text as a stream and the user can start reading as soon as it's generating, also what I called AIdentity can generate different kind of "thoughts" that can be streemed too (think of it like a kind of log in some scanrio) but at the same time if it breaks in the middle you'll have big troubles to rollback what you eventually did with the partial generated text, without mentioning minor problems it can give if you try to parse that text on a markdown viewer for example (that's why recently in my chat section I'm building the message on a normal div and then swith to markdown once completed), and I'm starting to think that maybe all this complexity isn't worth the effort.

I'm a bit torn as to whether or not implementing stream at the function level is something I'd want to deal with. Maybe just being able to signal to consumer events like "StartingTextGeneration" / "EndingTextGeneration" could be enough

@RogerBarreto RogerBarreto moved this to Sprint: Planned in Semantic Kernel Jun 16, 2023
@RogerBarreto RogerBarreto moved this from Sprint: Planned to Sprint: In Progress in Semantic Kernel Jun 19, 2023
@RogerBarreto RogerBarreto moved this from Sprint: In Progress to Sprint: Planned in Semantic Kernel Jun 26, 2023
@nacharya1 nacharya1 removed the status in Semantic Kernel Jul 20, 2023
@nacharya1 nacharya1 moved this to Sprint: Planned in Semantic Kernel Jul 24, 2023
@nacharya1 nacharya1 added this to the R3 : Cycle 2 milestone Aug 4, 2023
@RogerBarreto RogerBarreto moved this from Sprint: Planned to Sprint: In Progress in Semantic Kernel Sep 6, 2023
@nacharya1 nacharya1 moved this from Sprint: In Progress to Backlog - New features in Semantic Kernel Sep 14, 2023
@RogerBarreto RogerBarreto modified the milestones: R3: Cycle 3, v1 Sep 15, 2023
@lemillermicrosoft lemillermicrosoft modified the milestones: v1, R3: Cycle 3 Sep 22, 2023
@RogerBarreto RogerBarreto linked a pull request Sep 26, 2023 that will close this issue
4 tasks
@markwallace-microsoft
Copy link
Member

Make child of #1649

@matthewbolanos matthewbolanos changed the title Allow ISKFunction to return IAsyncEnumerable .NET Allow ISKFunction to return IAsyncEnumerable Oct 11, 2023
@matthewbolanos matthewbolanos removed this from the R3: Cycle 3 milestone Oct 12, 2023
@RogerBarreto RogerBarreto moved this from Backlog - New features to Sprint: In Progress in Semantic Kernel Nov 14, 2023
@RogerBarreto RogerBarreto added this to the v1.0.0 milestone Nov 14, 2023
@RogerBarreto RogerBarreto moved this from Sprint: In Progress to Sprint: In Review in Semantic Kernel Nov 24, 2023
github-merge-queue bot pushed a commit that referenced this issue Nov 24, 2023
## Context and Problem Statement

Resolves #1649
Resolves #1298 

It is quite common in co-pilot implementations to have a streamlined
output of messages from the LLM (large language models) and currently
that is not possible while using ISKFunctions.InvokeAsync or
Kernel.RunAsync methods, which enforces users to work around the Kernel
and Functions to use `ITextCompletion` and `IChatCompletion` services
directly as the only interfaces that currently support streaming.

Currently streaming is a capability that not all providers do support
and this as part of our design we try to ensure the services will have
the proper abstractions to support streaming not only of text but be
open to other types of data like images, audio, video, etc.

Needs to be clear for the sk developer when he is attempting to get
streaming data.

## Decision Drivers

1. The sk developer should be able to get streaming data from the Kernel
and Functions using Kernel.RunAsync or ISKFunctions.InvokeAsync methods

2. The sk developer should be able to get the data in a generic way, so
the Kernel and Functions can be able to stream data of any type, not
limited to text.

3. The sk developer when using streaming from a model that does not
support streaming should still be able to use it with only one streaming
update representing the whole data.

## User Experience Goal

```csharp
//(providing the type at as generic parameter)

// Getting a Raw Streaming data from Kernel
await foreach(string update in kernel.StreamingRunAsync<byte[]>(function, variables))

// Getting a String as Streaming data from Kernel
await foreach(string update in kernel.StreamingRunAsync<string>(function, variables))

// Getting a StreamingResultUpdate as Streaming data from Kernel
await foreach(StreamingResultUpdate update in kernel.StreamingRunAsync<StreamingResultUpdate>(variables, function))
// OR
await foreach(StreamingResultUpdate update in kernel.StreamingRunAsync(function, variables)) // defaults to Generic above)
{
    Console.WriteLine(update);
}
```

## Out of Scope

- Streaming with plans will not be supported in this phase. Attempting
to do so will throw an exception.
- Kernel streaming will not support multiple functions (pipeline).

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
@github-project-automation github-project-automation bot moved this from Sprint: In Review to Sprint: Done in Semantic Kernel Nov 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment