Tutorial: llmTools
LLM tool use, or equivalently “function calling”, is the main enabler of agentic AI, allowing
LLMs to accomplish tasks such as looking up information on the web, making a reservation, buying
tickets, etc. In this llmTools tutorial and the subsequent llmAction project, we give LLMs
access to web and cloud services, UNIX command-line interface (CLI) tools on the back end, and
front-end tools such as reading GPS location and biometric check.
The most widely known tool-use infrastructure is the open-standard Model Context Protocol (MCP). However, a full MCP tool definition must satisfy a good number of constraints and requirements, requiring a substantial amount of boilerplate code. There is also increasing realization that tools defined using MCP consume a large number of tokens just to be loaded onto LLMs. This has led to the development of agent frameworks such as OpenClaw, which was designed to use mainly CLI tools instead of MCP.
Our tutorial and project were designed from the start to use only CLI tools.
About the tutorial
This tutorial may be completed individually or in teams of at most 2. You can partner differently for each tutorial.
Treat your messages sent to chatterd and Ollama as public utterances with no reasonable
expectation of privacy and know that these are recorded for the purposes of carrying out a
contextual interaction with Ollama.
Expected behavior
Our tutorial app does only one thing: answer the question, “What is the weather at my location?”
It does this by using two tools: get_location(), which queries your device for its GPS lat/lon
coordinates, and get_weather(), which takes the lat/lon of a location as arguments to query the
open-source weather service Open Meteo, which has a free-to-use API.
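As a rough illustration of what get_weather() might do on the back end, here is a minimal Python sketch. The Open-Meteo parameter and response-field names used here are assumptions to verify against the Open-Meteo docs; the helper names are made up:

```python
import json
import urllib.parse
import urllib.request

OPEN_METEO = "https://api.open-meteo.com/v1/forecast"

def build_weather_url(lat: float, lon: float) -> str:
    # Ask for current conditions in Fahrenheit; parameter names
    # assumed from the Open-Meteo forecast API.
    query = urllib.parse.urlencode({
        "latitude": lat,
        "longitude": lon,
        "current_weather": "true",
        "temperature_unit": "fahrenheit",
    })
    return f"{OPEN_METEO}?{query}"

def format_weather(lat: float, lon: float, reply: dict) -> str:
    # Pull the temperature out of the JSON reply and format the
    # string the way the /weather handshake example shows it.
    temp = reply["current_weather"]["temperature"]
    return f"Weather at lat: {lat}, lon: {lon} is {temp}ºF"

def get_weather(lat: float, lon: float) -> str:
    with urllib.request.urlopen(build_weather_url(lat, lon)) as resp:
        return format_weather(lat, lon, json.load(resp))
```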
DISCLAIMER: the video demo shows you one aspect of the app’s behavior. It is not a substitute for the spec. If there are any discrepancies between the demo and the spec, please follow the spec. The spec is the single source of truth. If the spec is ambiguous, please consult the teaching staff for clarification.
Objectives
Front end:
- Learn how to provide a tool on device
- Set up web/cloud-based and on-device tool-calling infrastructure
- Recognize tool-call event in an SSE stream
- Perform tool call on device and return results to the LLM through the back end
Back end:
- Learn how to provide a tool in the back end
- Set up web/cloud-based and command-line interface (CLI) tool calling and forwarding infrastructure
- Recognize tool-call events in Ollama’s NDJSON stream and determine whether the tool call is for a back-end or front-end tool
- Forward a front-end tool call as an SSE event to the front end and return the tool-call results to the LLM
- Perform tool call on the back end and return results to LLM
API and protocol handshakes
We add one new “production” API to Chatter:
llmtools: uses HTTP POST to post the user’s prompt, with tool definition(s), to Ollama as a JSON Object and to receive Ollama’s response, with tool call(s), as an HTTP SSE stream.
And one “testing” API for use with development tools like Postman (but not with your front end):
weather: uses HTTP GET with lat and lon parameters to test get_weather() tool-call fulfillment at the back end.
Using this syntax:
url-endpoint
-> request: data sent to Server
<- response: data sent to Client
The protocol handshakes consist of:
/llmtools
-> HTTP POST { appID, model, messages, stream, tools }
<- { SSE event-stream lines } 200 OK
where each element in the messages array can carry zero or more tool calls.
/weather
-> HTTP GET { lat, lon }
<- { "Weather at lat: 42.29272, lon: -83.71627 is 56.5ºF" } 200 OK
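To illustrate the SSE side of the /llmtools handshake, here is a minimal Python sketch of wrapping one chunk of Ollama’s stream as an SSE frame and spotting a tool-call chunk. The helper names are made up; the exact event payload is up to your back end:

```python
import json

def to_sse(chunk: dict) -> str:
    # Wrap one JSON chunk (e.g., one line of Ollama's NDJSON stream)
    # as a Server-Sent Events "data:" frame, terminated by a blank line.
    return f"data: {json.dumps(chunk)}\n\n"

def is_tool_call(chunk: dict) -> bool:
    # A chunk carries a tool call if its message has a
    # non-empty tool_calls array.
    return bool(chunk.get("message", {}).get("tool_calls"))
```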
Data formats
To post a prompt to Ollama with the llmtools API, the front-end client sends
a JSON Object consisting of: an appID field, to uniquely identify this
client device for PostgreSQL database sharing, where the user’s prompt context is stored;
a model field, naming the LLM model we want Ollama to use; a messages field, for the
prompt itself (more details below); a stream field, to indicate whether we want
Ollama to stream its response or to batch and send it in one message; and a tools
field, explained below.
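A minimal Python sketch of assembling this request body (the helper name is made up; the field names follow the handshake above):

```python
def make_llmtools_request(app_id: str, prompt: str, tools: list) -> dict:
    # Assemble the JSON body POSTed to /llmtools.
    return {
        "appID": app_id,
        "model": "qwen3",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "tools": tools,
    }
```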
Tool definition JSON
A request from the client to Ollama can carry a tools field, which is a JSON
Array of tool signatures encoded as JSON Objects. For example, here’s a request with the
signature of a single tool, get_location:
{
"appID": "edu.umich.reactive.DUMMY_UNIQNAME",
"model": "qwen3",
"messages": [
{ "role": "user", "content": "What is the lat/lon at my location?" }
],
"stream": true,
"tools": [
{
"type": "function",
"function": {
"name": "get_location",
"description": "Get current location",
"parameters": null
}
}
]
}
The messages field is mostly the same as in the llmChat tutorial: a JSON Array
with one or more JSON Objects as its elements. In the example above, there is only one
JSON Object in the messages array. Each element of messages consists of a role
field, which can be "system", to give instructions (“prompt engineering”) to the model,
"user", to carry user’s prompt, "assistant", to indicate reply (“prompt completion”)
from the model, etc. The content field holds the actual instruction, prompt, reply, etc.
from the entity listed in role.
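For illustration, a hypothetical context exercising these roles might look like this (the content strings are made up):

```python
# A hypothetical messages array with one entry per role described above.
messages = [
    {"role": "system", "content": "You are a concise weather assistant."},
    {"role": "user", "content": "What is the weather at my location?"},
    {"role": "assistant", "content": "Let me look that up."},
]
```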
Let’s save that definition for the get_location tool for later use:
- on your laptop, go to your tutorials folder (/YOUR:TUTORIALS/)
- create a new sub-folder and call it tools
- in your text editor, create a new file and put the JSON schema below in it
- save the file as plain text in your newly created /YOUR:TUTORIALS/tools folder and name it get_location.json (without .txt filename extension)

{
  "type": "function",
  "function": {
    "name": "get_location",
    "description": "Get current location",
    "parameters": null
  }
}
Git add your tools folder and the get_location.json file, commit your changes, and push to your
git repo.
Tool call JSON
A response from the model in Ollama can carry a tool_calls field in the JSON Object
of messages. A tool_calls field is a JSON Array listing the tools the model wants to
call. It looks like the Ollama API surface doesn’t yet support parallel tool calls, even
though some models, such as Qwen3, do. In this tutorial we will explore only
serial tool calls. Each tool call is specified as a JSON Object, e.g., here’s an example
prompt completion from qwen3 calling the get_location tool:
{
"model": "qwen3",
"created_at": "2025-10-20T18:13:28.011173Z",
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"function": {
"name": "get_location",
"arguments": {}
}
}
]
},
"done": false
}
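On the back end, recognizing such a tool call in Ollama’s NDJSON stream and deciding where it should run could be sketched as follows. The split of tool names between front end and back end reflects this tutorial’s two tools; the helper names and return strings are made up:

```python
import json

FRONTEND_TOOLS = {"get_location"}   # runs on the device
BACKEND_TOOLS = {"get_weather"}     # runs in the back end

def extract_tool_calls(ndjson_line: str) -> list:
    # Each line of Ollama's stream is one JSON object; tool calls,
    # if any, are under message.tool_calls.
    chunk = json.loads(ndjson_line)
    return chunk.get("message", {}).get("tool_calls", [])

def dispatch(call: dict) -> str:
    name = call["function"]["name"]
    if name in FRONTEND_TOOLS:
        return "forward-to-frontend"   # relay as an SSE event
    if name in BACKEND_TOOLS:
        return "run-on-backend"
    return "unknown-tool"
```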
There is no special format for sending tool-call results back to Ollama. Briefly, each tool call
arrives as the completion of an HTTP POST request sent to Ollama’s /chat API. The results of the
tool call are then sent in a new HTTP POST request to Ollama’s /chat API, with the results stored
in the content field of a message in the new request. Each HTTP POST request with a given appID
always prepends all the exchanges between the client and Ollama under that appID, up to and
including the last tool call. This gives the LLM the context it needs to interpret the new
message as the result of its last tool call. We will elaborate on this exchange in the
specifications for the back end.
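Ollama’s chat API accepts tool results as a message with role "tool"; a minimal sketch of appending a result to the replayed context (the helper name is made up):

```python
def append_tool_result(messages: list, assistant_msg: dict, result: str) -> list:
    # Replay the prior exchanges, then the assistant message that
    # requested the tool call, then the result in a "tool" message.
    # The returned list becomes the messages field of the next POST.
    return messages + [
        assistant_msg,
        {"role": "tool", "content": result},
    ]
```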
Specifications
As in previous tutorials, you only need to build one front end, AND one of the
alternative back-end stacks. To receive full credit, your front end MUST work with your own back
end and with mada.eecs.umich.edu (see below).
You should start this tutorial by working on the back end before tackling the front end.
End-to-end Testing
You will need a working front end to fully test your back end’s handling of tool calls.
Once you have your front end implemented, first test it against the provided
back end on mada.eecs.umich.edu: change the serverUrl property in your ChattStore
to mada.eecs.umich.edu.
With qwen3 specified as the model in your ChattViewModel, send a request to mada
with the prompt, “What’s the weather at my location?” After a long <think></think>
process, you should see Qwen3 reporting the temperature at your current lat/lon position.
Of all the models available on Ollama, only the qwen3:8b (= qwen3) model works well
for tool use. Even gpt-oss:20b, which is supposed to support tool use,
doesn’t work reliably.
Unfortunately, you cannot run qwen3:8b on a *-micro instance. You’d need more than
6 GB of memory to run qwen3:8b comfortably. To be graded, your front end must work with
the Ollama accessible from mada.
Since mada is a shared resource and Ollama is single tasking, you would have to wait
your turn if others are using mada. If your laptop has the necessary amount of memory,
you can use it to test the tutorial locally before testing it on mada. In any case,
don’t wait until the deadline to test your code and then get stuck behind a long line
of classmates trying to access mada.
Limited end-to-end back end testing
Due to the limited resources of *-micro instances, please pull model qwen3:0.6b to your
Ollama instance. This model works ok for tool calling, as long as you don’t ask
it to chain tool calls. For example, it won’t be able to reason that it must
first call get_location to obtain lat/lon, and then use these to call get_weather.
Instead, you have to do the dependency resolution for it. With qwen3:0.6b specified
as the model in ChattViewModel:
- to test the front-end tool, get_location, send the prompt, “Get my location using the get_location tool.”
- once the location is shown on screen, to test the back-end tool, get_weather, send the next prompt, “What’s the weather at my location?” The model should recognize that there’s a tool reply with lat/lon in the context sent with the second prompt and be able to use them to make the second tool call.
To get full credit, your back-end implementation running on a *-micro instance
must pass this test. When submitting your front end, make sure your serverUrl
is set to YOUR_SERVER_IP so that we know what your server IP is. You will not
get full credit if your front end is not set up to work with your back end!
| Prepared by Sugih Jamin | Last updated: March 7th, 2026 |