Project 3: llmAction: Agentic AI in Action
The exciting thing about agentic AIs is that they can act on the physical world, beyond just looking up information. They can do this when we give them tools that are not pure functions, but tools with real-world, irreversible side effects. We call such tools “operative tools.” We will build such an operative tool in this project, albeit one with very limited capabilities. While the actions of this tool indeed cannot be undone, and the resources consumed cannot be reclaimed, their ultimate outcomes can still be rectified and nullified.
To guard against the potential harm operative tools can inflict upon their users and the physical world, we put limits and guardrails on their capabilities. They can only perform a defined set of actions in a contained system (we hope). We also impose human-in-the-loop (HITL) checks. In this project, we require authorization by an authenticated user before our operative tool can perform any of its actions.
You MUST submit your LLM Prompts and Skills and Rules file(s) as part of this assignment.
Partial credits
To help you manage your time, break your approach down into smaller tasks, and structure your solution, you can earn partial credits by completing the following two tutorials by their respective deadlines, as listed in the Course Schedule:
Completing and submitting the tutorials by their respective deadlines is optional, though the features and functionality embodied in the tutorials are REQUIRED for this project.
This project may be completed individually or in teams of at most 2. You can partner differently for each project. Only ONE member of a team needs to submit the project and its tutorials.
Due to the limitations of smaller LLMs available on Ollama, we found that the qwen3.5:9b model is
the most efficient when performing chained actions. We have provided access to this model along with
gemma4:e2b (7 GB) and gemma4:e4b (10 GB) on mada.eecs.umich.edu for use in this project.
See the Testing section below.
Treat your messages sent to mada as public utterances with no reasonable expectation of
privacy and know that these are recorded to provide context to Ollama and are inspected by the
teaching staff both to observe the workings of the tool and tool-use infrastructure and as part of the
grading process.
Objectives
In addition to the objectives listed in the llmTools and
Signin tutorials, this project has the following objectives:
- Practice and apply what you learned in the two tutorials
- On the back end, create an operative tool that can affect the physical world, with HITL safeguards
- On the front end, create a tool that authenticates the user as part of the HITL system
Expected behavior
Sending a prompt to Ollama triggers operative tool use with a human-in-the-loop check. The human “approval” consists of verifying the validity of an authorization token obtained after authentication with Google Signin. Storage of and access to the limited-lifetime authorization token on the user’s device require biometric authentication:
DISCLAIMER: the video demos show you one aspect of the app’s behavior. They are not a substitute for the spec. If there are any discrepancies between the demo and this spec, please follow the spec. The spec is the single source of truth. If the spec is ambiguous, please consult the teaching staff for clarification.
Features and requirements
To receive full credit, your app must provide the following features and satisfy the following requirements, including those in any applicable “Implementation guidelines” documents.
Front-end UI
As can be seen in the video demo, the app consists of a single screen with the following UI elements:
- a title bar showing the title LlmAction with HITL,
- a timeline of posted prompts shown on the right of the screen and LLM responses on the left,
- the following UI elements placed at the bottom of the screen:
  - a text box spanning the left and middle part of the input area,
  - a “Send” button on the right of the text box showing a “paper plane” icon. This button is enabled only when the text box is not empty and no networking session is in progress. When the button is disabled, it is grayed out and tapping on it has no effect. While a networking session is in progress, that is, while waiting for Ollama’s response to a prompt, the “Send” button’s icon changes from a “paper plane” to an animated “loading” circle,
- the app allows the user to sign in with Google Signin,
- the app serves as a front end to obtain the user’s biometric authentication on device.
UI Design
One can easily spend a whole weekend (or more!) getting the UI “just right.”
Remember: we won’t be grading you on how beautiful your UI
looks nor how precisely it matches the one shown on the video demo. You’re
free to design your UI differently, so long as all indicated UI elements
are fully visible on the screen, non-overlapping, and functioning as specified.
Front-end UX
To invoke the provided operative tool, the LLM must first obtain authorization from the user by
invoking another provided tool, get_auth. When an LLM invokes get_auth, the app allows the user to
sign in with Google Signin and obtain a limited-lifetime authorization token. Storage of and access to
this authorization token on device are guarded by biometric authentication.
API
We use the /llmprep, /llmtools, and /adduser APIs from the previous tutorials; there are no
new APIs.
Back-end infrastructure
The back end must satisfy the same requirements and provide all of the tool-calling and
communication infrastructure described in the llmTools tutorial, including converting
NDJSON to SSE streams.
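The NDJSON-to-SSE conversion mentioned above can be sketched as follows. This is a minimal sketch: the function name and framing details are assumptions for illustration, not the tutorial's exact interface. Each NDJSON line becomes one SSE event consisting of a `data:` field followed by a blank line.

```python
def ndjson_to_sse(ndjson_lines):
    """Convert an iterable of NDJSON lines into SSE-framed event strings.

    Each non-empty NDJSON line becomes one SSE event: a "data: " prefix,
    the JSON payload, and a blank line terminating the event.
    """
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between NDJSON records
        yield f"data: {line}\n\n"

# Example: two NDJSON chunks become two SSE events.
events = list(ndjson_to_sse(['{"response":"Hel"}', '{"response":"lo"}']))
```

In the real back end you would stream these events to the client as they arrive rather than collecting them into a list.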
The back end provides an ollama_cli tool that LLMs can invoke to perform Ollama’s commands,
such as listing models currently available, pulling and removing models, etc.
On mada.eecs.umich.edu, the Ollama instance the tool can access is not the one processing user
prompts, so you don’t have to worry about crippling the Ollama serving the app. When testing your
own back end with your own instance of Ollama that your app relies on, be careful not to delete the
model you need to run your app!
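One common way to keep an operative tool within “a defined set of actions” is a command allowlist checked before execution. The sketch below illustrates the idea; the allowed subcommands, protected model name, and function name are all assumptions for illustration, not the provided tool's actual interface.

```python
# Hypothetical guardrail for an ollama_cli-style operative tool.
ALLOWED_SUBCOMMANDS = {"ls", "ps", "pull", "rm"}  # assumed allowlist
PROTECTED_MODELS = {"qwen3:0.6b"}  # assumed: the model your own app depends on

def guard_ollama_cli(subcommand, args=()):
    """Return True if the requested ollama command may run, False otherwise."""
    if subcommand not in ALLOWED_SUBCOMMANDS:
        return False  # reject anything outside the defined set of actions
    if subcommand == "rm" and any(a in PROTECTED_MODELS for a in args):
        return False  # never let the LLM delete the model the app itself needs
    return True
```

A guard like this complements, but does not replace, the HITL authorization check.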
Implementation and submission guidelines
Front end
To add get_auth as a front-end tool, first create a schema file, get_auth.json, in the tools
directory you created in /YOUR:TUTORIALS/ directory in the llmTools tutorial. Put the following
JSON schema in the file:
{
  "type": "function",
  "function": {
    "name": "get_auth",
    "description": "Get authorization token",
    "parameters": null
  }
}
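On the front end, schema files like this are typically parsed and passed along as the tools array of the chat request. A minimal sketch of that loading step (parsing from an inline string here for self-containment; in the app you would read the file from your tools directory):

```python
import json

# The get_auth schema, inlined here for illustration.
GET_AUTH_SCHEMA = """
{
  "type": "function",
  "function": {
    "name": "get_auth",
    "description": "Get authorization token",
    "parameters": null
  }
}
"""

def load_tools(schema_texts):
    """Parse each JSON schema string into a dict for the request's tools array."""
    return [json.loads(text) for text in schema_texts]

tools = load_tools([GET_AUTH_SCHEMA])
```

The exact request format your front end uses is defined by the llmTools tutorial; this only shows the JSON parsing.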
End-to-end Testing
Testing of llmAction is very similar to testing llmTools.
You will need a working front end to fully test your back end’s handling of tool calls. Once you
have your front end implemented, first test it against the provided back end on
mada.eecs.umich.edu. We found that LLM models smaller than qwen3.5:9b (7.7 GB RAM) cannot conduct
chained tool calls efficiently, nor assemble tool arguments from multiple sources.
With qwen3.5:9b, gemma4:e4b, or gemma4:e2b specified as the model to use, send a request to
mada.eecs.umich.edu with the prompt, “List all models available on Ollama.” After a (sometimes
very long) thinking process, the model will call the get_auth tool on your device.
Once the model is granted authorization, it will call the ollama_cli tool on the back end
(chatterd) to run the command ollama ls to list all available Ollama models and complete the
prompt.
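The chained flow above, get_auth on the device followed by ollama_cli on the back end, can be sketched as a HITL dispatch loop. All function and parameter names below are hypothetical stand-ins, not the provided infrastructure's API; the callbacks model Google Signin plus biometric auth, token validation, and back-end execution.

```python
def dispatch_tool_call(name, args, ask_user_for_token, run_ollama_cli, is_valid_token):
    """Route a model-issued tool call, enforcing the HITL check.

    ask_user_for_token: hypothetical callback performing sign-in and
        biometric auth, returning an authorization token.
    run_ollama_cli: hypothetical callback executing the back-end tool.
    is_valid_token: hypothetical validator for the limited-lifetime token.
    """
    if name == "get_auth":
        return ask_user_for_token()       # the human-in-the-loop step
    if name == "ollama_cli":
        token = args.get("token")
        if not is_valid_token(token):
            return "error: unauthorized"  # refuse without a valid token
        return run_ollama_cli(args.get("command"))
    return f"error: unknown tool {name}"
```

For example, a valid token lets `ollama ls` run, while a stale or missing token is refused before the operative tool ever executes.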
NOTE: we found that sometimes only the table header of the ls command result is returned to ollama_cli(). The model then either informs the user that the table is empty (when it shouldn’t be), or worse, lists some well-known but not necessarily present models! Just repeating the prompt (again and again) usually will eventually show the correct results. We have observed this on our various back-end stacks, so it could be an Ollama issue.
Since mada is a shared resource and Ollama serves one HTTP request at a time, you would have to
wait your turn if others are using mada. If your laptop has the necessary resources (8GB+ HD and
RAM), you may want to pull model qwen3.5:9b to the Ollama running on your laptop and use it to
test your app locally before testing on mada. In any case, don’t wait until the deadline to test
your code and then get stuck behind a long line of classmates trying to access mada.
Limited end-to-end testing
Due to the limited resources of *-micro instances, please pull model qwen3:0.6b
to your Ollama instance. This model works well enough for tool calling, as long as you don’t
ask it to reason about chained tool calls or to assemble tool arguments without explicit
instructions. Instead, you have to do these for it. With qwen3:0.6b specified on your
front end:
- to test the front-end tool, get_auth, send the prompt, “Get token with get_auth.”
- once the chatterID (a string of hex digits) is shown on screen, to test the back-end tool, ollama_cli, send the next prompt, “List Ollama models by calling ollama_cli with token as first argument and ls as second argument.”
The model should recognize that there’s a tool reply with the authorization token in the second prompt’s context and be able to use it to make the second tool call. Unfortunately it doesn’t seem capable of assembling arguments without explicit instruction.
To get full credit, your back-end implementation running on a *-micro instance must pass this
test. When submitting your front end, make sure your serverUrl is set to YOUR_SERVER_IP so that
we know what your server IP is. You will not get full credit if your front end is not set up to
work with your back end!
You MUST submit your LLM Prompts and Skills and Rules file(s) as part of this assignment.
| Prepared by Xin Jie ‘Joyce’ Liu, Chenglin Li, Sugih Jamin | Last updated: March 14th, 2026 |