Gradio is a great tool for quickly showcasing AI projects with a nice interface.
However, without token streaming, users have to wait for the entire response to be generated before they see any output.
With token streaming, users can watch the tokens appear in real time, just like in ChatGPT.
Here's an example of a streaming vs. a non-streaming response:
## Layout

First, let's create the layout for our demo.
```python
with gr.Blocks() as demo:
    gr.Markdown("""
    <br>
    # How to do real-time token streaming like ChatGPT in Gradio
    ### By @clusteredbytes
    <br>
    """)

    with gr.Row():
        with gr.Column():
            question = gr.Textbox(lines=5, label="Ask the AI anything you want:")
            gr.Examples(
                examples=[
                    "Write a poem in cockney accent telling why West Ham are massive.",
                    "Write a poem about love.",
                ],
                inputs=question,
            )
            btn = gr.Button(value="Get Response")

        with gr.Column():
            answer = gr.Textbox(lines=15, label="Response from AI:")
```
- We're using Gradio `Blocks` for this tutorial.
- We have two columns: input on the left (in the `question` textbox) and generated output on the right (in the `answer` textbox).
- We create a `Button` that'll trigger generating the response.
## Button Click Handler

Now let's create a click handler for that button:
```python
btn.click(fn=get_ai_response, inputs=question, outputs=answer)
```
When the button is clicked:

- the function `get_ai_response` is called
- the string in the `question` textbox is passed as input
- the output is sent to the `answer` textbox
## Function to get tokens

Let's see what the `get_ai_response` function does:
```python
def get_ai_response(input: str):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        stream=True,  # Enable token streaming
        temperature=0.2,
        messages=[
            {"role": "system", "content": "You're an AI assistant. Do what you're told to do by the user."},
            {"role": "user", "content": input},
        ],
    )

    partial_response = ""
    for stream_response in response:
        token = stream_response["choices"][0]["delta"].get("content", "")
        partial_response += token
        yield partial_response
```
- First, it creates an OpenAI chat completion with the necessary parameters.
- To enable streaming, we add the parameter `stream=True`.
- With `stream=True`, the chat completion returns a `generator` object instead of a finished AI response.
- Rather than getting the complete response all at once, we can use the `generator` object to get the response token by token, on the fly.
- Hence the user doesn't have to wait for the entire response to be received; they can see the response get populated incrementally, token by token.
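To make that concrete, here's a minimal sketch of the chunk structure the streaming generator yields, using simulated chunks instead of a real API call. `fake_stream` is a hypothetical helper for illustration, not part of the OpenAI library:

```python
# A minimal sketch with simulated chunks (no API call). Each chunk mirrors
# the shape of what the streaming ChatCompletion generator yields;
# `fake_stream` is a hypothetical stand-in for illustration only.
def fake_stream():
    for token in ["Hello", ",", " world", "!"]:
        yield {"choices": [{"delta": {"content": token}}]}

partial_response = ""
for chunk in fake_stream():
    token = chunk["choices"][0]["delta"].get("content", "")
    partial_response += token
    print(partial_response)  # prints: Hello / Hello, / Hello, world / Hello, world!
```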
Now, using that generator we got as the response:

- we get the tokens one by one
- for each token we get, we add it to the `partial_response` variable
- then we `yield` that `partial_response`
- Gradio then shows that `partial_response` as intermediate output
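Because the handler is a generator, Gradio treats each yielded value as a fresh version of the output. Here's a self-contained sketch you can run without an API key to see the streaming effect; `fake_ai_response` is a hypothetical stand-in for the real handler:

```python
import time
import gradio as gr

# Hypothetical stand-in for get_ai_response: yields growing partial strings,
# which Gradio renders as successive updates to the output textbox.
def fake_ai_response(input: str):
    partial_response = ""
    for token in ["Streaming ", "tokens ", "one ", "by ", "one..."]:
        partial_response += token
        time.sleep(0.3)  # simulate generation latency
        yield partial_response

with gr.Blocks() as demo:
    question = gr.Textbox(label="Input")
    answer = gr.Textbox(label="Output")
    gr.Button("Go").click(fn=fake_ai_response, inputs=question, outputs=answer)

demo.queue()  # required for streaming intermediate outputs
demo.launch()
```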
## Launch Gradio

Finally, we launch the Gradio `Blocks` app using `demo.launch()`:
```python
demo.queue()
demo.launch(share=True, debug=True)
```
- we set `share=True` to create a public link
- we set `debug=True` to see logs and errors
Notice that we use `demo.queue()`, which is required for streaming intermediate outputs in Gradio.
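If many users hit the demo at once, the queue can also be tuned. A hedged sketch, assuming Gradio 3.x's `queue()` parameters:

```python
# Optional tuning (assumes Gradio 3.x queue parameters):
# concurrency_count: how many events run in parallel
# max_size: how many requests may wait in the queue at once
demo.queue(concurrency_count=3, max_size=20)
```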
## Source Code

Here's the full source code used in this tutorial:
```python
import gradio as gr
import openai


def get_ai_response(input: str):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        stream=True,  # Enable token streaming
        temperature=0.2,
        messages=[
            {"role": "system", "content": "You're an AI assistant. Do what you're told to do by the user."},
            {"role": "user", "content": input},
        ],
    )

    partial_response = ""
    for stream_response in response:
        token = stream_response["choices"][0]["delta"].get("content", "")
        partial_response += token
        yield partial_response


with gr.Blocks() as demo:
    gr.Markdown("""
    <br>
    # How to do real-time token streaming like ChatGPT in Gradio
    ### By @clusteredbytes
    <br>
    """)

    with gr.Row():
        with gr.Column():
            question = gr.Textbox(lines=5, label="Ask the AI anything you want:")
            gr.Examples(
                examples=[
                    "Write a poem in cockney accent telling why West Ham are massive.",
                    "Write a poem about love.",
                ],
                inputs=question,
            )
            btn = gr.Button(value="Get Response")

        with gr.Column():
            answer = gr.Textbox(lines=15, label="Response from AI:")

    btn.click(fn=get_ai_response, inputs=question, outputs=answer)

demo.queue()
demo.launch(share=True, debug=True)
```
I regularly tweet about these topics and anything else I'm exploring. Follow me on Twitter.