Tutorial: llmTools
DUE Wed, 11/12, 2 pm
This tutorial may be completed individually or in teams of at most 2. You can partner differently for each tutorial.
Treat your messages sent to chatterd and Ollama as public utterances
with no reasonable expectation of privacy and know that these
are recorded for the purposes of carrying out a contextual interaction
with Ollama.
Expected behavior
This tutorial app does only one thing, namely to answer the question, “What is the
weather at my location?”. It does this by using two tools: get_location(), which queries
your device for its GPS lat/lon coordinates, and get_weather(), which takes
lat/lon as arguments to query the open-source weather service Open Meteo, which
has a free-to-use API.
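For reference, here is a sketch of the kind of Open Meteo request get_weather might make, with an abridged response. The exact query parameters (e.g., temperature_unit) are up to your implementation; consult Open Meteo’s documentation for details:
GET https://api.open-meteo.com/v1/forecast?latitude=42.29272&longitude=-83.71627&current=temperature_2m&temperature_unit=fahrenheit
{
  "latitude": 42.29272,
  "longitude": -83.71627,
  "current_units": { "temperature_2m": "°F" },
  "current": { "temperature_2m": 56.5 }
}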
DISCLAIMER: the video demo shows you one aspect of the app’s behavior. It is not a substitute for the spec. If there are any discrepancies between the demo and the spec, please follow the spec. The spec is the single source of truth. If the spec is ambiguous, please consult the teaching staff for clarification.
Objectives
Front end:
- Learn how to provide a tool on device
- Set up the tool-calling infrastructure
- Recognize a tool-call event in the SSE stream and perform the tool call
Back end:
- Learn how to provide a tool in the backend
- Set up the tool-calling and forwarding infrastructure
- Recognize a tool-call event in Ollama’s NDJSON stream and either perform the tool call or forward it as an SSE event to the front end
LLM tool use or function calling
An LLM’s “tool use” (equivalently, “function calling”) is the main enabler of agentic AI, allowing the LLM to act as an agent that accomplishes tasks on the user’s behalf, for example, looking up information on the web, making a reservation, buying tickets, etc. The most widely known tool-use infrastructure is the open-standard Model Context Protocol (MCP). Unfortunately, MCP doesn’t directly match Ollama’s tool-use APIs. You’d need an MCP-Ollama Bridge to translate between Ollama’s tool-use API and MCP’s.
A full MCP implementation and deployment involves satisfying a good number of constraints and requirements and the (auto-)generation of a substantial amount of boilerplate code. The mechanism for tool provisioning and tool use itself is normally encapsulated in one of the many language-specific MCP SDKs. MCP Clients and Servers are then built around these SDKs, which are often treated as black boxes.
In this tutorial, we build the equivalent of a simplified MCP SDK, but one that is integrated into the app instead of packaged as a separate library/SDK, and that interacts directly with Ollama using the native Ollama API, bypassing MCP.
API and protocol handshakes
We add one new “production” API to Chatter:
llmtools: uses HTTP POST to post the user’s prompt, with tool definition(s), to Ollama as a JSON Object and to receive Ollama’s response, with tool call(s), as an HTTP SSE stream.
And one “testing” API for use with development tools like Postman (but not with your front end):
weather: uses HTTP POST to post the user’s prompt, with the get_weather() tool definition, as a JSON Object, to test tool-call fulfillment at the backend, short-circuiting the front end.
Using this syntax:
url-endpoint
-> request: data sent to Server
<- response: data sent to Client
The protocol handshakes consist of:
/llmtools
-> HTTP POST { appID, model, messages, stream, tools }
<- { SSE event-stream lines } 200 OK
where each element in the messages array can carry zero, one, or more tool calls.
The streaming part of MCP’s new “Streamable HTTP” transport also uses SSE.
/weather
-> HTTP GET { lat, lon }
<- { "Weather at lat: 42.29272, lon: -83.71627 is 56.5ºF" } 200 OK
Data formats
To post a prompt to Ollama with the llmtools API, the front-end client sends
a JSON Object consisting of: an appID field, to uniquely identify this
client device for PostgreSQL database sharing, where the user’s prompt context is stored;
a model field, naming the LLM model we want Ollama to use; a messages field, carrying the
prompt itself (more details below); and a stream field, indicating whether we want
Ollama to stream its response or to batch it and send it in one message.
Tool definition JSON
A request from the client to Ollama can carry a tools field, which is a JSON
Array of tool signatures as JSON Objects. For example, here’s a request with the
signature of a single tool, get_location:
{
  "appID": "edu.umich.reactive.postman",
  "model": "qwen3",
  "messages": [
    { "role": "user", "content": "What is the lat/lon at my location?" }
  ],
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "description": "Get current location",
        "name": "get_location",
        "parameters": null
      }
    }
  ]
}
The messages field is mostly the same as in the llmChat tutorial: a JSON Array
with one or more JSON Objects as its elements. In the example above, there is only one
JSON Object in the messages array. Each element of messages consists of a role
field, which can be “system”, to give instructions (“prompt engineering”) to the model,
“user”, to carry the user’s prompt, “assistant”, to indicate a reply (“prompt completion”)
from the model, etc. The content field holds the actual instruction, prompt, reply, etc.
from the entity listed in role.
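For a tool that takes arguments, such as get_weather, the function signature also carries a parameters object written in JSON-Schema style. Here is a sketch of what such a definition might look like; the parameter names lat and lon and their descriptions are assumptions, so match them to your own tool implementation:
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the current weather at the given lat/lon coordinates",
    "parameters": {
      "type": "object",
      "properties": {
        "lat": { "type": "number", "description": "latitude in decimal degrees" },
        "lon": { "type": "number", "description": "longitude in decimal degrees" }
      },
      "required": ["lat", "lon"]
    }
  }
}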
Tool call JSON
A response from the model in Ollama can carry a tool_calls field in the JSON Object
of messages. A tool_calls field is a JSON Array listing the tools the model wants to
call. While Qwen3 supports parallel tool calls, it appears that Ollama’s API surface doesn’t
yet provide parallel tool calls. In this tutorial, we explore only serial tool calls.
Each tool call is specified as a JSON Object, e.g., here’s an example Ollama
response with qwen3 calling the get_location tool:
{
  "model": "qwen3",
  "created_at": "2025-10-20T18:13:28.011173Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_location",
          "arguments": {}
        }
      }
    ]
  },
  "done": false
}
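When the called tool takes arguments, the model fills in the arguments object accordingly. Here is a sketch of what a get_weather call might look like inside tool_calls, assuming the lat and lon parameter names from the definition sketched earlier:
"tool_calls": [
  {
    "function": {
      "name": "get_weather",
      "arguments": { "lat": 42.29272, "lon": -83.71627 }
    }
  }
]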
There is no special format for sending tool call results back to Ollama. Tool call results
are carried in the content field of a new message in an HTTP POST request to Ollama’s /chat API, along
with all the messages exchanged between the client and Ollama up to the call results, including
the message with the tool calls. This gives the model sufficient context to interpret the new
message as a response carrying the tool call result.
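As a sketch, the messages array of that follow-up request might look as follows. Appending the result as a message with role "tool" is a common Ollama convention, and the exact content string is up to your implementation:
"messages": [
  { "role": "user", "content": "What is the lat/lon at my location?" },
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      { "function": { "name": "get_location", "arguments": {} } }
    ]
  },
  { "role": "tool", "content": "lat: 42.29272, lon: -83.71627" }
]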
Specifications
As in previous tutorials, you only need to build one front end, AND one of the alternative back-end stacks. To receive full credit, your front end MUST work with your own back end and with mada.eecs.umich.edu (see below).
You should start this tutorial with the backend before tackling the frontend.
End-to-end Testing
You will need a working front end to fully test your backend’s handling of tool calls.
Once you have your front end implemented, first test it against the provided
back end on mada.eecs.umich.edu: change the serverUrl property in your ChattStore
to mada.eecs.umich.edu.
With qwen3 specified as the model in your ChattViewModel, send a request to mada
with the prompt, “What’s the weather at my location?” After a long <think></think>
process, you should see Qwen3 reporting the temperature at your current lat/lon position.
Of all the models available on Ollama, only the qwen3:8b (= qwen3) model works well
for tool use. Even gpt-oss:20b, which is supposed to support tool use,
doesn’t work reliably.
Unfortunately, you cannot run qwen3:8b on a *-micro instance. You’d need more than
6 GB of memory to run qwen3:8b comfortably. To be graded, your frontend must work with
the Ollama accessible from mada.
Since mada is a shared resource and Ollama is single-tasking, you may have to wait
your turn if others are using mada. If your laptop has the necessary amount of memory,
you can use it to test the tutorial locally before testing it on mada. In any case,
don’t wait until the deadline to test your code and then get stuck behind a long line
of classmates trying to access mada.
Limited end-to-end backend testing
Due to the limited resources of *-micro instances, please pull model qwen3:0.6b to your
Ollama instance. This model works ok for tool calling, as long as you don’t ask
it to reason about chaining tool calls. For example, it won’t be able to reason that it must
first call get_location to obtain lat/lon, and then use these to call get_weather.
Instead, you have to do the dependency resolution for it. With qwen3:0.6b specified
as the model in ChattViewModel:
- To test the non-backend-resident tool, get_location, send the prompt, “Get my location using the get_location tool.”
- Once the location is shown on screen, to test the backend-resident tool, get_weather, send the next prompt, “What’s the weather at my location?” The model should recognize that there’s a tool reply with the lat/lon location in the second prompt’s context and be able to use them to make the second tool call.
To get full credit, your backend implementation running on a *-micro instance
must pass this test. When submitting your frontend, make sure your serverUrl
is set to YOUR_SERVER_IP so that we know what your server IP is. You will not
get full credit if your front end is not set up to work with your backend!
| Prepared by Sugih Jamin | Last updated: October 26th, 2025 |