15. Streamlit Chat with SLM
Overview
# Run Ollama server
$ ollama serve
# Pull Phi3 model
$ ollama pull phi3
# Run a simple chat with Ollama
$ poetry run python apps/15_streamlit_chat_slm/chat.py
# Run summarization with SLM
$ poetry run python apps/15_streamlit_chat_slm/summarize.py
# Run the Streamlit app
$ poetry run python -m streamlit run apps/15_streamlit_chat_slm/main.py
# List models
$ ollama list
NAME           ID              SIZE      MODIFIED
phi3:latest    4f2222927938    2.2 GB    3 minutes ago
phi4:latest    ac896e5b8b34    9.1 GB    55 minutes ago
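For orientation, here is a minimal sketch of how such a Streamlit chat app can be wired up on top of Ollama. This is illustrative only: the shipped apps/15_streamlit_chat_slm/main.py may be structured differently, and the use of ChatOllama (from the langchain-ollama package), the hard-coded phi3 default, and the session-state loop are all assumptions here.

# Illustrative sketch -- the actual main.py may differ
import streamlit as st
from langchain_ollama import ChatOllama

st.title("Chat with SLM")

llm = ChatOllama(model="phi3")  # assumes `ollama pull phi3` was run

if "messages" not in st.session_state:
    st.session_state.messages = []  # (role, content) tuples kept across reruns

# Replay the conversation so far
for role, content in st.session_state.messages:
    with st.chat_message(role):
        st.markdown(content)

if prompt := st.chat_input("Type a message"):
    st.session_state.messages.append(("user", prompt))
    with st.chat_message("user"):
        st.markdown(prompt)
    # Send the full history so the model keeps conversational context
    response = llm.invoke(st.session_state.messages)
    st.session_state.messages.append(("assistant", response.content))
    with st.chat_message("assistant"):
        st.markdown(response.content)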
Usage
Use Phi3 model
# Pull Phi3 model
$ ollama pull phi3
# Measure time to run the chat
$ time poetry run python apps/15_streamlit_chat_slm/chat.py \
--model phi3 \
--prompt "hello"
{
  "content": "Hello! How can I help you today?",
  "additional_kwargs": {},
  "response_metadata": {
    "model": "phi3",
    "created_at": "2025-01-09T03:30:05.706397262Z",
    "message": {
      "role": "assistant",
      "content": ""
    },
    "done_reason": "stop",
    "done": true,
    "total_duration": 540964618,
    "load_duration": 5078297,
    "prompt_eval_count": 22,
    "prompt_eval_duration": 229000000,
    "eval_count": 10,
    "eval_duration": 305000000
  },
  "type": "ai",
  "name": null,
  "id": "run-54f0d2a6-b3d4-4ef5-b009-0c72b23297ac-0",
  "example": false,
  "tool_calls": [],
  "invalid_tool_calls": [],
  "usage_metadata": {
    "input_tokens": 22,
    "output_tokens": 10,
    "total_tokens": 32
  }
}
poetry run python apps/15_streamlit_chat_slm/chat.py --model phi3 --prompt 1.57s user 0.16s system 68% cpu 2.515 total
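The printed object is a serialized LangChain AIMessage (hence the response_metadata and usage_metadata fields), so chat.py presumably just wraps ChatOllama and dumps the reply. A rough sketch along those lines, where only the --model/--prompt flags come from the command above and everything else is an assumption:

# Illustrative sketch -- the shipped chat.py may differ
import argparse
import json

from langchain_ollama import ChatOllama

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="phi3")
    parser.add_argument("--prompt", default="hello")
    args = parser.parse_args()

    llm = ChatOllama(model=args.model)
    response = llm.invoke(args.prompt)  # returns an AIMessage
    # Dump the whole message, matching the JSON shown above
    print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))

if __name__ == "__main__":
    main()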
Use Phi4 model
# Pull Phi4 model
$ ollama pull phi4
# Measure time to run the chat
$ time poetry run python apps/15_streamlit_chat_slm/chat.py \
--model phi4 \
--prompt "hello"
{
  "content": "Hello! How can I assist you today? If you have any questions or need information, feel free to let me know. 😊",
  "additional_kwargs": {},
  "response_metadata": {
    "model": "phi4",
    "created_at": "2025-01-09T03:16:19.661532868Z",
    "message": {
      "role": "assistant",
      "content": ""
    },
    "done_reason": "stop",
    "done": true,
    "total_duration": 10662476906,
    "load_duration": 10769327,
    "prompt_eval_count": 23,
    "prompt_eval_duration": 426000000,
    "eval_count": 28,
    "eval_duration": 10223000000
  },
  "type": "ai",
  "name": null,
  "id": "run-16375018-116e-422f-afd4-84692af42bd6-0",
  "example": false,
  "tool_calls": [],
  "invalid_tool_calls": [],
  "usage_metadata": {
    "input_tokens": 23,
    "output_tokens": 28,
    "total_tokens": 51
  }
}
poetry run python apps/15_streamlit_chat_slm/chat.py --model phi4 --prompt 1.48s user 0.12s system 12% cpu 12.455 total
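Side by side, the two runs show the size/latency trade-off: phi3 generated its 10 tokens in about 0.3 s of eval time (~2.5 s wall-clock including interpreter startup), while phi4 needed roughly 10.2 s of eval time for 28 tokens (~12.5 s wall-clock). To reproduce the comparison in a single process, a small script along these lines works (a sketch, not part of the repository; it assumes both models have been pulled):

# Hypothetical comparison helper, not in the repository
import time

from langchain_ollama import ChatOllama

for model in ("phi3", "phi4"):
    llm = ChatOllama(model=model)
    start = time.perf_counter()
    response = llm.invoke("hello")
    elapsed = time.perf_counter() - start
    usage = response.usage_metadata or {}
    print(f"{model}: {elapsed:.2f}s wall-clock, "
          f"{usage.get('output_tokens', '?')} output tokens")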
Note:
- Update Ollama to the latest version to run the phi4 model.
- To use Ollama on WSL2, you may need to enable systemd. For more information, see Use systemd to manage Linux services with WSL.