Tutorial: llmPrompt
DUE Wed, 09/03, 2 pm

We will build an app called Chatter and endow it with various capabilities in
each tutorial. In this tutorial, Chatter allows users to post prompts to Ollama
running on a backend server. Because Ollama’s networking capabilities do not match
the networking requirements of mobile platforms, we need a backend server of our
own, which we call chatterd, to serve as a proxy bridging the two. The Chatter
front end displays Ollama’s streamed response as soon as each element arrives.
Treat messages you send to chatterd and Ollama as public utterances with no
reasonable expectation of privacy.

chatterd serving as HTTP/2 proxy for Ollama
About the tutorials
This tutorial may be completed individually or in teams of at most 2. You can partner differently for each tutorial.
We don’t assume any mobile or backend development experience on your part. We will start by introducing you to the front-end integrated development environment (IDE) and backend development tools.
Are the tutorials cumulative?
Tutorials 1 and 2 form the foundation for all subsequent tutorials. The backend you build in tutorials 1 and 2 will be used throughout the term. Some later tutorials build on the frontend of tutorial 1, others on the frontend of tutorial 2, so you will want to keep a clean zipped copy of each for later use and modification.
Since the backend requires manual setup of your computing infrastructure, customized to each person or team, we cannot provide a ready-made solution for you to download. Each person or team must have their own mobile scaffolding and infrastructure from tutorials 1 and 2 to complete the later tutorials and projects. If you run into any difficulties setting up yours, please seek help from the teaching staff as early as possible.
Each project builds on the two tutorials associated with it. Project 1’s tasks are covered in tutorials 1 and 2, Project 2’s tasks are covered in tutorials 3 and 4, etc. You should keep a clean copy of each tutorial that you can refer to when working on its associated project.
All the tutorials can be completed by copying and pasting code from the specs. However, you will be more productive on homework and projects if you understand the code. As a general rule of thumb, the less time you spend reading, the more time you will spend debugging.
Objectives
Frontend:
- Familiarize yourself with the development environment
- Start learning Kotlin/Swift syntax and language features
- Learn declarative UI with reactive state management (Compose for Android, SwiftUI for iOS)
- Observe the unidirectional data flow architecture
- Learn HTTP GET and POST asynchronous exchange
- Learn JSON serialization/deserialization
- Use reactive UI to display asynchronous events: display each newline-delimited JSON (NDJSON) streaming element sent by Ollama as it arrives
- Install self-signed certificate on your mobile device
Backend:
- Set up and run Ollama on a backend server
- Generate a private key and its self-signed public key certificate
- Set up an HTTPS server with the self-signed certificate
- Introduction to URL path routing
- Use JSON for HTTP request/response to communicate with both Ollama and your front end
- Introduction to Ollama api/generate
- Forward NDJSON streaming response from Ollama to the front end
API and protocol handshake
For this tutorial, Chatter has only one API:
- llmprompt: uses HTTP POST to send the user’s prompt for Ollama’s generate API as a JSON object, which the backend simply forwards to Ollama. Upon receiving Ollama’s response as an NDJSON stream, the backend simply forwards the stream to the client.
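The forwarding behavior described above can be sketched with Python’s standard library alone. This is not one of the four supported microframeworks, and it speaks plain HTTP/1.0 rather than HTTPS or HTTP/2, so treat it only as an illustration of the data flow. The class name `LLMPromptProxy` is hypothetical, and the sketch assumes Ollama is listening on its default address, `http://localhost:11434`, with the generate endpoint at `/api/generate`.

```python
import http.server
import urllib.error
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


class LLMPromptProxy(http.server.BaseHTTPRequestHandler):
    """Hypothetical stdlib stand-in for chatterd's /llmprompt route
    (plain HTTP/1.0, no TLS)."""

    upstream_url = OLLAMA_GENERATE_URL

    def do_POST(self) -> None:
        # Read the whole request body first so the connection stays clean.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Minimal path routing: only /llmprompt is served.
        if self.path != "/llmprompt":
            self.send_error(404, "unknown endpoint")
            return
        # Forward the { model, prompt, stream } object to Ollama unmodified.
        req = urllib.request.Request(
            self.upstream_url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            upstream = urllib.request.urlopen(req)
        except urllib.error.URLError as err:  # includes HTTPError
            self.send_error(502, f"Ollama request failed: {err.reason}")
            return
        with upstream:
            self.send_response(upstream.status)
            self.send_header("Content-Type", upstream.headers.get(
                "Content-Type", "application/x-ndjson"))
            self.end_headers()
            # Relay each NDJSON line as soon as it arrives; nothing is accumulated.
            for line in upstream:
                self.wfile.write(line)
```

To try it, you could run `http.server.HTTPServer(("", 8000), LLMPromptProxy).serve_forever()` (port 8000 is arbitrary; `ThreadingHTTPServer` would serve concurrent clients). Relaying line by line, instead of reading Ollama’s whole reply first, is what lets the front end render each element as it arrives.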
Using this syntax:
url-endpoint
-> request: data sent to Server
<- response: data sent to Client
The protocol handshake consists of:
/llmprompt
-> HTTP POST { model, prompt, stream }
<- { newline-delimited JSON Object(s) } 200 OK
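The handshake above can be exercised with a minimal client sketch using only Python’s standard library (the actual front end is written in Kotlin or Swift). The function name `post_llmprompt` and the base URL you pass are hypothetical; the function POSTs `{ model, prompt, stream }` to `/llmprompt` and yields the response field of each NDJSON element as soon as its line arrives. For an HTTPS backend with a self-signed certificate, as in this tutorial, you would pass `context=ssl.create_default_context(cafile=...)` pointing at your certificate file.

```python
import json
import ssl
import urllib.request
from typing import Iterator, Optional


def post_llmprompt(base_url: str, model: str, prompt: str,
                   context: Optional[ssl.SSLContext] = None) -> Iterator[str]:
    """Hypothetical helper: POST { model, prompt, stream } to /llmprompt
    and yield the `response` field of each NDJSON element as it arrives."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/llmprompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(req, context=context) as resp:
        # Iterating the response reads one newline-terminated line at a time,
        # so each element is handled before the rest of the stream arrives.
        for line in resp:
            line = line.strip()
            if line:  # tolerate blank lines between elements
                yield json.loads(line).get("response", "")
```

For example, assuming a backend at `http://localhost:8000`, `for piece in post_llmprompt("http://localhost:8000", "tinyllama", "howdy?"): print(piece, end="", flush=True)` prints the reply incrementally.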
Data format
To post a prompt to Ollama with the llmprompt API, the front-end client sends
a JSON object consisting of a model field, a prompt field, and
a stream field. The model field holds the name of the model, which in
this tutorial can be either tinyllama or gemma3. The prompt field
carries the prompt to Ollama. The stream field indicates whether Ollama
should stream its response or batch it and send it in one message. For example:
{
    "model": "tinyllama",
    "prompt": "howdy?",
    "stream": true
}
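A client might assemble this object as in the following minimal Python sketch. `make_llmprompt_body` is a hypothetical name, and rejecting models other than tinyllama and gemma3 mirrors this tutorial’s setup only; it is not a restriction imposed by Ollama.

```python
import json

# Models this tutorial mentions; Ollama itself accepts any model it has pulled.
TUTORIAL_MODELS = {"tinyllama", "gemma3"}


def make_llmprompt_body(model: str, prompt: str, stream: bool = True) -> bytes:
    """Hypothetical helper: serialize the { model, prompt, stream } object
    that the front end POSTs to /llmprompt."""
    if model not in TUTORIAL_MODELS:
        raise ValueError(f"model must be one of {sorted(TUTORIAL_MODELS)}")
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode("utf-8")


body = make_llmprompt_body("tinyllama", "howdy?")
print(body.decode("utf-8"))
# {"model": "tinyllama", "prompt": "howdy?", "stream": true}
```

Letting the JSON library serialize the object, rather than concatenating strings, escapes quotes and newlines in the user’s prompt correctly and turns Python’s `True` into JSON’s `true`.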
Upon receiving this JSON object, chatterd simply forwards it to Ollama. If streaming
is turned on, the reply from Ollama is a newline-delimited JSON (NDJSON)
stream; otherwise, Ollama responds with a single JSON object. The JSON objects carried
in Ollama’s response are specified in the Ollama documentation for api/generate.
Either way, we only look at the response field of Ollama’s response. We also do
not accumulate Ollama’s reply to be returned as a single response at completion.
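Because network reads return bytes in chunks that need not line up with newlines, a consumer of Ollama’s NDJSON stream, such as the front end, has to buffer a partial line until its newline arrives. The Python sketch below (`NDJSONResponseDecoder` is a hypothetical name) does that and, as described above, keeps only each element’s response field, returning it immediately instead of accumulating the whole reply. The sample chunks are made up for illustration; only the `response` and `done` field names come from Ollama’s api/generate output.

```python
import json


class NDJSONResponseDecoder:
    """Hypothetical helper: turns arbitrary byte chunks of an NDJSON stream
    into the `response` strings of each complete JSON object."""

    def __init__(self) -> None:
        self._buffer = b""  # trailing partial line carried between chunks

    def feed(self, chunk: bytes) -> list[str]:
        """Return the `response` field of every element completed by `chunk`."""
        self._buffer += chunk
        # Everything before the last b"\n" is complete; the remainder is kept.
        *lines, self._buffer = self._buffer.split(b"\n")
        # Splitting bytes (not text) is safe: b"\n" never occurs inside a
        # multi-byte UTF-8 sequence, so a character split across chunks survives.
        return [json.loads(line).get("response", "")
                for line in lines if line.strip()]

    def close(self) -> list[str]:
        """Flush a final element that lacked a trailing newline."""
        rest, self._buffer = self._buffer.strip(), b""
        return [json.loads(rest).get("response", "")] if rest else []


# Illustrative chunks (not captured Ollama output), split at awkward places.
decoder = NDJSONResponseDecoder()
for chunk in [b'{"response": "Hel', b'lo"}\n{"response": "!", "done": false}\n{"res',
              b'ponse": "", "done": true}\n']:
    for piece in decoder.feed(chunk):
        print(repr(piece))  # prints 'Hello', then '!', then ''
```

Each piece can be appended to the displayed text the moment `feed` returns it, which is what makes the reply appear incrementally.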
Your submission is required to work with at least the tinyllama model
running on Ollama. In addition to tinyllama, the sample backend on
mada.eecs.umich.edu can run the gemma3 model, for a more satisfying
interaction with an LLM; you can use it to verify that your implementation
of the backend is working as expected.
Specifications
There are TWO pieces to this tutorial: a front-end mobile client and a back-end server. For the front end, you can build either for Android with Jetpack Compose or for iOS with SwiftUI. For the back end, you can choose among these web microframeworks: Echo (Go), Express (TypeScript), Starlette (Python), or axum (Rust).
You only need to build one front end AND one of the alternative back ends. Until you have a working back end of your own, you can use mada.eecs.umich.edu to test your front end. To receive full credit, your front end MUST work with both mada.eecs.umich.edu and your own back end.
IMPORTANT: unless you plan to learn to program both Android and iOS, you should do the tutorials on the same platform as that of your projects.
| Prepared by Sugih Jamin | Last updated: August 27th, 2025 |