Gradio
Gradio is a great tool to quickly showcase AI projects with a nice interface.
However, without token streaming, users have to wait for the entire response to be generated before they can get any output.
With token streaming, users can view the tokens being generated in real-time, just like in ChatGPT.
Here’s an example of a streaming vs. a non-streaming response:

*[Demo GIF: streaming vs. non-streaming response]*
Layout
First, let’s create the layout for our demo:

```python
with gr.Blocks() as demo:
    gr.Markdown("""
<br>
# How to do real-time token streaming like ChatGPT in Gradio
### By @clusteredbytes
<br>
""")
    with gr.Row():
        with gr.Column():
            # Left column: the user's question, some example prompts, and the button
            question = gr.Textbox(lines=5, label="Ask the AI anything you want:")
            gr.Examples(
                examples=[
                    "Write a poem in cockney accent telling why West Ham are massive.",
                    "Write a poem about love.",
                ],
                inputs=question,
            )
            btn = gr.Button(value="Get Response")
        with gr.Column():
            # Right column: the streamed response
            answer = gr.Textbox(lines=15, label="Response from AI:")
```
- We’re using Gradio `Blocks` for this tutorial.
- We have two columns.
- Input goes on the left (in the `question` textbox), and the generated output on the right (in the `answer` textbox).
- We create a `Button` that’ll trigger generating the response.
Button Click Handler
Now let’s create a click handler for that button:

```python
btn.click(fn=get_ai_response, inputs=question, outputs=answer)
```
When the button is clicked:

- the function `get_ai_response` is called
- the string in the `question` textbox is passed as input
- the output is sent to the `answer` textbox

Because `get_ai_response` will be a generator, Gradio streams each value it yields into the output, as the minimal sketch below shows.
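It’s worth seeing this mechanism in isolation before wiring in OpenAI: any click handler that is a generator streams, and each yielded value becomes the new contents of the output component. Here’s a minimal, self-contained sketch, with a hypothetical `fake_ai_response` generator standing in for the real AI call (no API key needed):

```python
import time

import gradio as gr

# Hypothetical stand-in for get_ai_response: because it's a generator,
# Gradio streams each yielded value into the output textbox.
def fake_ai_response(input: str):
    partial = ""
    for word in f"You asked: {input}".split():
        partial += word + " "
        time.sleep(0.2)  # simulate per-token latency
        yield partial

with gr.Blocks() as demo:
    question = gr.Textbox(label="Question")
    answer = gr.Textbox(label="Answer")
    btn = gr.Button("Go")
    btn.click(fn=fake_ai_response, inputs=question, outputs=answer)

demo.queue()  # generator handlers need the queue (more on this later)
demo.launch()
```

Run it and the answer box fills in word by word; swapping `fake_ai_response` for a real model call is the only remaining step.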
Function to get tokens
Let’s see what the `get_ai_response` function does:

```python
def get_ai_response(input: str):
    # Create a chat completion with streaming enabled
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        stream=True,  # Enable token streaming
        temperature=0.2,
        messages=[
            {"role": "system", "content": "You're an AI assistant. Do what you're told to do by the user."},
            {"role": "user", "content": input},
        ])

    # Accumulate tokens and yield the partial response as it grows
    partial_response = ""
    for stream_response in response:
        token = stream_response["choices"][0]["delta"].get("content", "")
        partial_response += token
        yield partial_response
```
- First it creates an OpenAI chat completion with the necessary parameters.
- To enable streaming, we add the parameter `stream=True`.
- As we’re using `stream=True`, the chat completion returns a `generator` object instead of a finished AI response.
- Rather than getting the complete response all at once, we can use the `generator` object to get the response in a token-by-token manner, on the fly.
- Hence the user doesn’t have to wait for the entire response to be received, but can see it get populated incrementally, token by token. (The console sketch below shows this generator on its own.)
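To see what the generator actually gives you, you can consume the same stream outside Gradio and print tokens to the console as they arrive. A small sketch, assuming the same pre-1.0 `openai` package used throughout this post and that `openai.api_key` is already set:

```python
import openai

# Assumes openai.api_key has been configured elsewhere
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    stream=True,
    messages=[{"role": "user", "content": "Say hello in five words."}],
)

# Each item is a small chunk; the new text (if any) sits under
# choices[0].delta.content. The final chunk has an empty delta.
for chunk in response:
    token = chunk["choices"][0]["delta"].get("content", "")
    print(token, end="", flush=True)
print()
```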
Now, using that generator we got as a response:

- we get the tokens one by one
- for each token we get, we append it to the `partial_response` variable
- then we `yield` that `partial_response`
- Gradio then shows that `partial_response` as intermediate output

Why yield the accumulated string instead of the single token? See the note below.
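Each `yield` *replaces* the textbox contents rather than appending to them, which is why we accumulate into `partial_response` first. Sketched against the loop above:

```python
for stream_response in response:
    token = stream_response["choices"][0]["delta"].get("content", "")
    # yield token          # wrong: the textbox would show only the latest token
    partial_response += token
    yield partial_response  # right: the textbox shows the whole response so far
```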
Launch Gradio
Finally, we launch the Gradio `Blocks` demo using `demo.launch()`:

```python
demo.queue()
demo.launch(share=True, debug=True)
```
- we set `share=True` to create a public link
- we set `debug=True` to see logs and errors
Notice that we call `demo.queue()`, which is required for streaming intermediate outputs in Gradio. The queue can also be tuned, as sketched below.
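If you expect several users at once, the queue accepts a couple of knobs. A sketch using Gradio 3.x parameter names (check the docs for your installed version):

```python
# concurrency_count: how many queued events run in parallel (Gradio 3.x)
# max_size: how many requests may wait in the queue before new ones are rejected
demo.queue(concurrency_count=3, max_size=20)
demo.launch(share=True, debug=True)
```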
Source Code
Here’s the full source code used in this tutorial:
```python
import gradio as gr
import openai


def get_ai_response(input: str):
    # Create a chat completion with streaming enabled
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        stream=True,  # Enable token streaming
        temperature=0.2,
        messages=[
            {"role": "system", "content": "You're an AI assistant. Do what you're told to do by the user."},
            {"role": "user", "content": input},
        ])

    # Accumulate tokens and yield the partial response as it grows
    partial_response = ""
    for stream_response in response:
        token = stream_response["choices"][0]["delta"].get("content", "")
        partial_response += token
        yield partial_response


with gr.Blocks() as demo:
    gr.Markdown("""
<br>
# How to do real-time token streaming like ChatGPT in Gradio
### By @clusteredbytes
<br>
""")
    with gr.Row():
        with gr.Column():
            question = gr.Textbox(lines=5, label="Ask the AI anything you want:")
            gr.Examples(
                examples=[
                    "Write a poem in cockney accent telling why West Ham are massive.",
                    "Write a poem about love.",
                ],
                inputs=question,
            )
            btn = gr.Button(value="Get Response")
        with gr.Column():
            answer = gr.Textbox(lines=15, label="Response from AI:")

    btn.click(fn=get_ai_response, inputs=question, outputs=answer)

demo.queue()
demo.launch(share=True, debug=True)
```
I regularly tweet about these topics and anything else I’m exploring. Follow me on Twitter.