Tutorial: llmTools
DUE Wed, 11/12, 2 pm
This tutorial may be completed individually or in teams of at most 2. You can partner differently for each tutorial.
Treat your messages sent to chatterd and Ollama as public utterances
with no reasonable expectation of privacy and know that these
are recorded for the purposes of carrying out a contextual interaction
with Ollama.
Expected behavior
This tutorial app does only one thing, namely to answer the question, “What is the
weather at my location?”. It does this by using two tools: get_location(), which queries
your device for its GPS lat/lon coordinates, and get_weather(), which takes
lat/lon as arguments to query the open-source weather service Open Meteo, which
has a free-to-use API.
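For reference, here is a sketch of the kind of Open Meteo request get_weather might make, with an abridged response. The exact query parameters (e.g., temperature_unit) are up to your implementation; consult Open Meteo’s documentation for details:
GET https://api.open-meteo.com/v1/forecast?latitude=42.29272&longitude=-83.71627&current=temperature_2m&temperature_unit=fahrenheit
{
  "latitude": 42.29272,
  "longitude": -83.71627,
  "current_units": { "temperature_2m": "°F" },
  "current": { "temperature_2m": 56.5 }
}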
DISCLAIMER: the video demo shows you one aspect of the app’s behavior. It is not a substitute for the spec. If there are any discrepancies between the demo and the spec, please follow the spec. The spec is the single source of truth. If the spec is ambiguous, please consult the teaching staff for clarification.
Objectives
Front end:
- Learn how to provide a tool on device
- Set up the tool-calling infrastructure
- Recognize a tool-call event in the SSE stream and perform the tool call
Back end:
- Learn how to provide a tool in the backend
- Set up the tool-calling and forwarding infrastructure
- Recognize a tool-call event in Ollama’s NDJSON stream and either perform the tool call or forward it as an SSE event to the front end
LLM tool use or function calling
An LLM’s “tool use” (equivalently, “function calling”) is the main enabler of agentic AI, allowing the LLM to act as an agent that accomplishes tasks on the user’s behalf, for example, looking up information on the web, making a reservation, buying tickets, etc. The most widely known tool-use infrastructure is the open-standard Model Context Protocol (MCP). Unfortunately, MCP doesn’t directly match Ollama’s tool-use APIs. You’d need an MCP-Ollama Bridge to translate between Ollama’s tool-use API and MCP’s.
A full MCP implementation and deployment involves satisfying a good number of constraints and requirements and the (auto-)generation of a substantial amount of boilerplate code. The mechanism for tool provisioning and tool use itself is normally encapsulated in one of the many language-specific MCP SDKs. MCP Clients and Servers are then built around these SDKs, which are often treated as black boxes.
In this tutorial, we build the equivalent of a simplified MCP SDK, but one that is integrated into the app instead of packaged as a separate library/SDK, and that interacts directly with Ollama using the native Ollama API, bypassing MCP.
API and protocol handshakes
We add one new “production” API to Chatter:
llmtools: uses HTTP POST to post the user’s prompt, with tool definition(s), to Ollama as a JSON Object and to receive Ollama’s response, with tool call(s), as an HTTP SSE stream.
And one “testing” API for use with development tools like Postman (but not with your front end):
weather: uses HTTP POST to post the user’s prompt, with the get_weather() tool definition, as a JSON Object, to test tool-call fulfillment at the backend, short-circuiting the front end.
Using this syntax:
url-endpoint
-> request: data sent to Server
<- response: data sent to Client
The protocol handshakes consist of:
/llmtools
-> HTTP POST { appID, model, messages, stream, tools }
<- { SSE event-stream lines } 200 OK
where each element in the messages array can carry zero, one, or more tool calls.
The streaming part of MCP’s new “Streamable HTTP” transport also uses SSE.
/weather
-> HTTP GET { lat, lon }
<- { "Weather at lat: 42.29272, lon: -83.71627 is 56.5ºF" } 200 OK
Data formats
To post a prompt to Ollama with the llmtools API, the front-end client sends
a JSON Object consisting of: an appID field, to uniquely identify this
client device for PostgreSQL database sharing, where the user’s prompt context is stored;
a model field, naming the LLM model we want Ollama to use; a messages field, carrying the
prompt itself (more details below); and a stream field, indicating whether we want
Ollama to stream its response or to batch it and send it in one message.
Tool definition JSON
A request from the client to Ollama can carry a tools field, which is a JSON
Array of tool signatures as JSON Objects. For example, here’s a request with the
signature of a single tool, get_location:
{
  "appID": "edu.umich.reactive.postman",
  "model": "qwen3",
  "messages": [
    { "role": "user", "content": "What is the lat/lon at my location?" }
  ],
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "description": "Get current location",
        "name": "get_location",
        "parameters": null
      }
    }
  ]
}
The messages field is mostly the same as in the llmChat tutorial: a JSON Array
with one or more JSON Objects as its elements. In the example above, there is only one
JSON Object in the messages array. Each element of messages consists of a role
field, which can be “system”, to give instructions (“prompt engineering”) to the model,
“user”, to carry the user’s prompt, “assistant”, to indicate a reply (“prompt completion”)
from the model, etc. The content field holds the actual instruction, prompt, reply, etc.
from the entity listed in role.
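For a tool that takes arguments, such as get_weather, the function signature also carries a parameters object written in JSON-Schema style. Here is a sketch of what such a definition might look like; the parameter names lat and lon and their descriptions are assumptions, so match them to your own tool implementation:
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the current weather at the given lat/lon coordinates",
    "parameters": {
      "type": "object",
      "properties": {
        "lat": { "type": "number", "description": "latitude in decimal degrees" },
        "lon": { "type": "number", "description": "longitude in decimal degrees" }
      },
      "required": ["lat", "lon"]
    }
  }
}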
Tool call JSON
A response from the model in Ollama can carry a tool_calls field in the JSON Object
of messages. A tool_calls field is a JSON Array listing the tools the model wants to
call. While Qwen3 supports parallel tool calls, it appears that Ollama’s API surface doesn’t
yet provide parallel tool calls. In this tutorial, we explore only serial tool calls.
Each tool call is specified as a JSON Object, e.g., here’s an example Ollama
response with qwen3 calling the get_location tool:
{
  "model": "qwen3",
  "created_at": "2025-10-20T18:13:28.011173Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_location",
          "arguments": {}
        }
      }
    ]
  },
  "done": false
}
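When the called tool takes arguments, the model fills in the arguments object accordingly. Here is a sketch of what a get_weather call might look like inside tool_calls, assuming the lat and lon parameter names from the definition sketched earlier:
"tool_calls": [
  {
    "function": {
      "name": "get_weather",
      "arguments": { "lat": 42.29272, "lon": -83.71627 }
    }
  }
]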
There is no special format for sending tool call results back to Ollama. Tool call results
are carried in the content field of a new message in an HTTP POST request to Ollama’s /chat API, along
with all the messages exchanged between the client and Ollama up to the call results, including
the message with the tool calls. This gives the model sufficient context to interpret the new
message as a response carrying the tool call result.
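As a sketch, the messages array of that follow-up request might look as follows. Appending the result as a message with role "tool" is a common Ollama convention, and the exact content string is up to your implementation:
"messages": [
  { "role": "user", "content": "What is the lat/lon at my location?" },
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      { "function": { "name": "get_location", "arguments": {} } }
    ]
  },
  { "role": "tool", "content": "lat: 42.29272, lon: -83.71627" }
]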
Specifications
As in previous tutorials, you only need to build one front end, AND one of the alternative back-end stacks. To receive full credit, your front end MUST work with your own back end and with mada.eecs.umich.edu (see below).
You should start this tutorial with the backend before tackling the frontend.
End-to-end Testing
You will need a working front end to fully test your backend’s handling of tool calls.
Once you have your front end implemented, first test it against the provided
back end on mada.eecs.umich.edu: change the serverUrl property in your ChattStore
to mada.eecs.umich.edu.
With qwen3 specified as the model in your ChattViewModel, send a request to mada
with the prompt, “What’s the weather at my location?” After a long <think></think>
process, you should see Qwen3 reporting the temperature at your current lat/lon position.
Of all the models available on Ollama, only the qwen3:8b (= qwen3) model works well
for tool use. Even gpt-oss:20b, which is supposed to support tool use,
doesn’t work reliably.
Unfortunately, you cannot run qwen3:8b on a *-micro instance. You’d need more than
6 GB of memory to run qwen3:8b comfortably. To be graded, your frontend must work with
the Ollama accessible from mada.
Since mada is a shared resource and Ollama is single-tasking, you may have to wait
your turn if others are using mada. If your laptop has the necessary amount of memory,
you can use it to test the tutorial locally before testing it on mada. In any case,
don’t wait until the deadline to test your code and then get stuck behind a long line
of classmates trying to access mada.
Limited end-to-end backend testing
Due to the limited resources of *-micro instances, please pull model qwen3:0.6b to your
Ollama instance. This model works ok for tool calling, as long as you don’t ask
it to reason about chaining tool calls. For example, it won’t be able to reason that it must
first call get_location to obtain lat/lon, and then use these to call get_weather.
Instead, you have to do the dependency resolution for it. With qwen3:0.6b specified
as the model in ChattViewModel:
- To test the non-backend-resident tool, get_location, send the prompt, “Get my location using the get_location tool.”
- Once the location is shown on screen, to test the backend-resident tool, get_weather, send the next prompt, “What’s the weather at my location?” The model should recognize that there’s a tool reply with the lat/lon location in the second prompt’s context and be able to use them to make the second tool call.
To get full credit, your backend implementation running on a *-micro instance
must pass this test. When submitting your frontend, make sure your serverUrl
is set to YOUR_SERVER_IP so that we know what your server IP is. You will not
get full credit if your front end is not set up to work with your backend!
| Prepared by Sugih Jamin | Last updated: October 26th, 2025 |