Project 3: llmAction: Agentic AI in Action
DUE Mon, 12/08, 6 pm
This programming assignment (PA) may be completed individually or in teams of at most 2. You can partner differently for each PA.
The promise of agentic AI is that it can effect actions in the physical world, beyond merely looking up information. The main enablers of this promise are tools that are not just pure functions, but tools with real-world, irreversible side effects. We call such tools “operative tools.” We will build one such operative tool in this programming assignment, albeit a mild one. The tool is “mild” in the sense that while its actions indeed cannot be undone, and the resources consumed cannot be reclaimed, their ultimate outcomes can be rectified and nullified.
To guard against potential harm operative tools can inflict upon their users and the physical world, we limit the capacity of our operative tool: it can only perform a limited set of actions on a contained system (we hope). Beyond fixing a tool’s capability and scope, we impose a human-in-the-loop (HITL) requirement on such tools. In this assignment, we implement a very simple HITL system by requiring authentication prior to any operative tool use.
Due to the limitations of smaller LLMs available on Ollama, we found that
the qwen3:8b model is the smallest that can perform chained actions.
We have provided access to this model on mada.eecs.umich.edu for the
completion of this assignment. See the Testing section below.
Treat your messages sent to mada as public utterances with no reasonable
expectation of privacy. Know that they are recorded to provide context
to Ollama and are inspected by the teaching staff, both to observe the
workings of the tool and tool-use infrastructure and as part of the grading process.
Objectives
In addition to the objectives listed in the llmTools and
Signin tutorials, this PA has the following objectives:
- Practice and apply the objectives learned in the two tutorials
- On the back end, create an operative tool that can affect the physical world, with a HITL safeguard
- On the front end, create a tool that authenticates the user as part of the HITL system
Expected behavior
Sending a prompt to Ollama that requires operative tool use triggers the human-in-the-loop. The human “approval” consists of verifying the validity of an authorization token obtained after Google Signin. Storage of and access to the limited-lifetime authorization token on the user’s device require biometric authentication:
DISCLAIMER: the video demos show you one aspect of the app’s behavior. They are not a substitute for the spec. If there are any discrepancies between the demos and this spec, please follow the spec. The spec is the single source of truth. If the spec is ambiguous, please consult the teaching staff for clarification.
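The spec does not prescribe how the limited-lifetime token is represented, but the expiry check at the heart of the HITL approval can be sketched as follows. The class and field names here are illustrative assumptions, not part of the assignment's API:

```python
import time

# Hypothetical in-memory record of the authorization token obtained after
# Google Signin; the class and field names are illustrative, not from the spec.
class AuthToken:
    def __init__(self, value: str, lifetime_secs: float):
        self.value = value
        self.expires_at = time.time() + lifetime_secs

    def is_valid(self) -> bool:
        # The token authorizes operative tool use only until it expires.
        return time.time() < self.expires_at
```

An expired token would fail `is_valid()` and the operative tool call should then be refused until the user re-authenticates.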
Features and requirements
Your app must provide the following features and satisfy the following requirements, including those in any applicable “Implementation guidelines” documents, to receive full credit.
Front-end UI
As can be seen in the video demo, the app consists of a single screen with the following UI elements:
- a title bar showing the title LlmAction with HITL,
- a timeline of posted prompts on the right and LLM responses shown on the left,
- these UI elements at the bottom of the screen:
  - a text box spanning the left and middle part of the input area,
  - a “Send” button to the right of the text box showing a “paper plane” icon. This button is enabled only when the text box is not empty and no networking session is in progress. When the button is disabled, it is grayed out and tapping on it has no effect. While a networking session is in progress, that is, while waiting for Ollama’s response to a prompt, the “Send” button’s icon changes from a “paper plane” to an animated “loading” circle,
- the app allows the user to sign in with Google Signin,
- the app serves as a front end to obtain the user’s biometric authentication on device.
UI Design
One can easily spend a whole weekend (or more!) getting the UI “just right.”
Remember: we won’t be grading you on how beautiful your UI
looks nor how precisely it matches the one shown on the video demo. You’re
free to design your UI differently, so long as all indicated UI elements
are fully visible on the screen, non overlapping, and functioning as specified.
Front-end UX
Aside from being a simple chatbot to facilitate user interactions with Ollama, the app serves as an authentication front end of a HITL system. To invoke the provided operative tool, the LLM must first obtain authorization from the user.
The front end provides a get_auth tool for that purpose. When the LLM
invokes get_auth, the app lets the user sign in with Google Signin to
obtain a limited-lifetime authorization token. Storage of and access to this
authorization token on device are guarded by biometric authentication.
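For the LLM to be able to invoke get_auth, the tool must be advertised to it. One way this might look, using the OpenAI-style tool schema that Ollama's chat API accepts (the description wording and the no-argument signature are assumptions for illustration):

```python
# A sketch of how the front end might advertise get_auth to the LLM.
# The schema shape follows Ollama's tool-calling format; the description
# text and empty parameter list are illustrative assumptions.
get_auth_tool = {
    "type": "function",
    "function": {
        "name": "get_auth",
        "description": ("Obtain a limited-lifetime authorization token "
                        "from the user via Google Signin; access to the "
                        "stored token requires biometric authentication."),
        "parameters": {
            "type": "object",
            "properties": {},   # get_auth takes no arguments in this sketch
            "required": [],
        },
    },
}
```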
API
We use the /llmtools and /adduser APIs from the aforementioned tutorials
in this assignment.
Back-end infrastructures
The back end is expected to have the same requirements and provide all the
tool-calling and communication infrastructure as described in the llmTools
tutorial, including support for NDJSON stream from Ollama and SSE stream
to the front end.
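The NDJSON-to-SSE relay described above can be sketched as a small generator. The per-line JSON parsing follows Ollama's streaming response format (one JSON object per line); the exact SSE framing your back end uses is up to you, so the layout here is an assumption:

```python
import json

def ndjson_to_sse(ndjson_lines):
    """Convert an NDJSON stream from Ollama into SSE frames for the
    front end. One JSON object per input line; each output frame is a
    'data: ...' line terminated by a blank line, per the SSE format."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue                      # skip keep-alive blank lines
        chunk = json.loads(line)          # validate: one JSON object per line
        yield f"data: {json.dumps(chunk)}\n\n"

# Example with two chunks shaped like Ollama's /api/chat stream:
events = list(ndjson_to_sse([
    '{"message": {"role": "assistant", "content": "Hello"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]))
```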
The back end provides an ollama_cli tool that the LLM can invoke to list
models currently available on the Ollama instance running on the same host as
the back-end chatterd. The same tool can be used to run all other ollama
commands, such as pulling and removing models.
On mada, the Ollama instance the
tool can access is not the one processing user prompts from this app,
so you don’t have to worry about rendering the Ollama serving the app inoperative.
When testing your own back end with your own instance of Ollama that your
app relies on, be careful not to delete the model you need to run your app!
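A back-end handler for ollama_cli might look like the sketch below. The argument order (token first, ollama subcommand second) matches the test prompt later in this spec; the token check, the subprocess approach, and the return format are all assumptions, since the spec does not prescribe an implementation:

```python
import subprocess

def validate_token(token: str) -> bool:
    # Placeholder HITL check: a real implementation would verify the token
    # issued at Google Signin and its recorded expiry, not just non-emptiness.
    return bool(token)

def ollama_cli(token: str, command: str) -> str:
    """Hypothetical handler for the back end's ollama_cli tool. Rejects
    the call unless a valid authorization token is supplied, then runs
    the requested ollama subcommand (e.g. "ls" or "pull qwen3:0.6b")."""
    if not validate_token(token):   # HITL gate: refuse unauthorized operative calls
        return "error: invalid or expired authorization token"
    result = subprocess.run(["ollama", *command.split()],
                            capture_output=True, text=True, timeout=60)
    return result.stdout if result.returncode == 0 else result.stderr
```

Gating on the token before spawning the subprocess keeps the operative action behind the HITL check, so an unauthenticated call never touches Ollama.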
Implementation and submission guidelines
End-to-end Testing
Testing of llmAction is very similar to that of llmTools.
You will need a working front end to fully test your back end’s handling of
tool calls. Once you have your front end implemented, first test it against
the provided back end on mada.eecs.umich.edu. We found that LLM models smaller
than qwen3:8b (5.2 GB) are unable to conduct chained tool calls or to
assemble tool arguments from multiple sources. When using mada.eecs.umich.edu
to test your app, specify the qwen3:8b model.
With qwen3:8b specified as the model to use, send a request to mada
with the prompt, “List all models available on Ollama.” After a (sometimes
very long) <think></think> process, the model will call the get_auth
tool on your device.
Once the model is granted an authorization token, it will call the ollama_cli
tool on the back end (chatterd) to run the command ollama ls to complete
the prompt, to list all available Ollama models.
NOTE: we found that sometimes only the table header of the
ls command result is returned to ollama_cli(). The model then either informs the user that the table is empty (while it shouldn’t have been) or, worse, lists some well-known but not necessarily present models! Repeating the prompt again (and again) usually, eventually, shows the correct results. We have observed this on different back-end stacks, so it could be an Ollama issue.
Since mada is a shared resource and Ollama serves one HTTP request at a time,
you would have to wait your turn if others are using mada. If your laptop has
the necessary resources, you may want to pull model qwen3:8b (5.2 GB) to the
Ollama running on your laptop and use it to test your app locally before testing
on mada. In any case, don’t wait until the deadline to test your code and
then get stuck behind a long line of classmates trying to access mada.
Limited end-to-end testing
Due to the limited resources of *-micro instances, please pull model qwen3:0.6b
to your Ollama instance. This model works well enough for tool calling, as long as
you don’t ask it to reason about chained tool calls or to assemble tool arguments
from multiple sources. Instead, you have to do these for it. With qwen3:0.6b
specified on your front end:
- to test the front-end tool, get_auth, send the prompt, “Get token with get_auth.”
- once the chatterID (a hex string) is shown on screen, to test the back-end tool, ollama_cli, send the next prompt, “List Ollama models by calling ollama_cli with token as first argument and ls as second argument.”
The model should recognize that there’s a tool reply with the authorization token in the second prompt’s context and be able to use it to make the second tool call. Unfortunately it doesn’t seem capable of assembling arguments from multiple sources by itself.
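Concretely, by the time the second prompt is sent, the conversation context already contains the earlier tool reply. The message roles below follow Ollama's chat format; the chatterID value and reply wording are illustrative:

```python
# Sketch of the conversation context the model sees on the second prompt.
# Roles follow Ollama's chat message format; contents are illustrative.
messages = [
    {"role": "user", "content": "Get token with get_auth."},
    {"role": "assistant", "content": "", "tool_calls": [
        {"function": {"name": "get_auth", "arguments": {}}}]},
    {"role": "tool", "content": "chatterID: 0123abcd"},  # tool reply holding the token
    {"role": "user", "content": "List Ollama models by calling ollama_cli "
                                "with token as first argument and ls as "
                                "second argument."},
]
```

The second user prompt spells out the argument order precisely because qwen3:0.6b can copy the token out of the tool reply but cannot reliably decide on its own how to combine arguments from multiple sources.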
To get full credit, your back-end implementation running on a *-micro instance
must pass this test. When submitting your front end, make sure your serverUrl
is set to YOUR_SERVER_IP so that we know what your server IP is. You will not
get full credit if your front end is not set up to work with your back end!
| Prepared by Xin Jie ‘Joyce’ Liu, Chenglin Li, Sugih Jamin | Last updated: November 22nd, 2025 |