Tutorial: llmPrompt
DUE Wed, 09/03, 2 pm
We will build an app called Chatter and endow it with various capabilities in each tutorial. In this tutorial, Chatter allows the user to post prompts to Ollama running on a backend server. Due to discrepancies between Ollama’s networking capabilities and the networking requirements of mobile platforms, we need a backend server of our own, which we call chatterd, to serve as a proxy bridging the two. The Chatter front end displays Ollama’s streamed response as soon as each element arrives. Treat messages you send to chatterd and Ollama as public utterances with no reasonable expectation of privacy.
chatterd serving as HTTP/2 proxy for Ollama
About the tutorials
This tutorial may be completed individually or in teams of at most 2. You can partner differently for each tutorial.
We don’t assume any mobile or backend development experience on your part. We will start by introducing you to the front-end integrated development environment (IDE) and backend development tools.
Are the tutorials cumulative?
Tutorials 1 and 2 form the foundation for all subsequent tutorials. The backend you build in tutorials 1 and 2 will be used throughout the term. Some later tutorials will build on the frontend of tutorial 1, others will build on the frontend of tutorial 2. So you want to keep a clean zipped copy of each for subsequent use and modification.
Since the backend requires manual setup of your computing infrastructure, customized to each person or team, we cannot provide a ready-made solution for you to download. Each person or team must have their own mobile scaffolding and infrastructure from tutorials 1 and 2 to complete the later tutorials and projects. If you run into any difficulties setting up yours, please seek help from the teaching staff early.
Each project builds on the two tutorials associated with it: Project 1’s tasks are covered in tutorials 1 and 2, Project 2’s tasks in tutorials 3 and 4, etc. You should keep a clean copy of each tutorial that you can refer to when working on its associated project.
All the tutorials can be completed by cutting and pasting code from the specs. However, you will be more productive on homework and projects if you understand the code. As a general rule of thumb, the less time you spend reading, the more time you will spend debugging.
Objectives
Frontend:
- Familiarize yourself with the development environment
- Start learning Kotlin/Swift syntax and language features
- Learn declarative UI with reactive state management (Compose for Android, SwiftUI for iOS)
- Observe the unidirectional data flow architecture
- Learn HTTP GET and POST asynchronous exchange
- Learn JSON serialization/deserialization
- Use reactive UI to display asynchronous events: display each newline-delimited JSON (NDJSON) streaming element sent by Ollama as it arrives
- Install self-signed certificate on your mobile device
Backend:
- Set up and run Ollama on a backend server
- Generate a private key and its self-signed public-key certificate
- Set up an HTTPS server with self-signed certificate
- Introduction to URL path routing
- Use JSON for HTTP request/response to communicate with both Ollama and your front end
- Introduction to Ollama’s api/generate API
- Forward NDJSON streaming response from Ollama to the front end
API and protocol handshake
For this tutorial, Chatter has only one API:
llmprompt: uses HTTP POST to post the user’s prompt for Ollama’s generate API as a JSON object, which the backend simply forwards to Ollama. Upon receiving Ollama’s response as an NDJSON stream, the backend simply forwards the stream to the client.
Using this syntax:
url-endpoint
-> request: data sent to Server
<- response: data sent to Client
The protocol handshake consists of:
/llmprompt
-> HTTP POST { model, prompt, stream }
<- { newline-delimited JSON Object(s) } 200 OK
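To make the handshake concrete, here is a rough smoke-test client, written in Python purely for illustration (the real front end is the Chatter mobile app in Kotlin or Swift, and the real backend is the one you build). The host name, the use of the httpx library, and HTTP/2 support are assumptions of this sketch; verify=False is only there because chatterd presents a self-signed certificate in this tutorial.

import json
import httpx   # assumed for this sketch; http2=True needs the httpx[http2] extra installed

CHATTERD_URL = "https://localhost/llmprompt"   # hypothetical; substitute your chatterd host

payload = {"model": "tinyllama", "prompt": "howdy?", "stream": True}

# verify=False accepts chatterd's self-signed certificate; never do this against a production host
with httpx.Client(http2=True, verify=False, timeout=None) as client:
    with client.stream("POST", CHATTERD_URL, json=payload) as response:
        # each line of the NDJSON stream is one complete JSON object
        for line in response.iter_lines():
            if line:
                print(json.loads(line).get("response", ""), end="", flush=True)
print()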
Data format
To post a prompt to Ollama with the llmprompt API, the front-end client sends a JSON object consisting of a model field, a prompt field, and a stream field. The model field holds the name of the model, which in this tutorial can be either tinyllama or gemma3. The prompt field carries the prompt to Ollama. The stream field indicates whether Ollama should stream its response or batch and send it in one message. For example:
{
"model": "tinyllama",
"prompt": "howdy?",
"stream": true
}
Upon receiving this JSON object, chatterd simply forwards it to Ollama. If streaming is turned on, the reply from Ollama is a newline-delimited JSON (NDJSON) stream; otherwise it responds with a single JSON object. The JSON objects carried in Ollama’s response are specified in the Ollama documentation for api/generate. Either way, we only look at the response field of Ollama’s response. We also do not accumulate Ollama’s reply to be returned as a single response at completion.
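For reference, a streamed reply relayed by chatterd consists of one JSON object per line, roughly of the following shape (field values elided; the final object has done set to true and also carries timing statistics; see the Ollama api/generate documentation for the authoritative field list):

{"model": "tinyllama", "created_at": "…", "response": "Howdy", "done": false}
{"model": "tinyllama", "created_at": "…", "response": " there!", "done": false}
{"model": "tinyllama", "created_at": "…", "response": "", "done": true, … }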
Your submission is required to work at least with the model tinyllama running on Ollama. The sample backend on mada.eecs.umich.edu can run the Ollama model gemma3, for a more satisfying interaction with an LLM, in addition to tinyllama; you can compare against it to verify that your implementation of the backend is working as expected.
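If you choose the Starlette (Python) backend option listed under Specifications below, the forwarding behavior amounts to a streaming relay. The sketch below is only one possible shape of it, not a required implementation: the /llmprompt route matches the handshake above, while the Ollama URL (11434 is Ollama’s default port), the use of httpx, and the application/x-ndjson media type are assumptions you may adjust.

import httpx
from starlette.applications import Starlette
from starlette.responses import StreamingResponse
from starlette.routing import Route

OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama's default port; adjust if yours differs

async def llmprompt(request):
    payload = await request.json()   # { model, prompt, stream } from the front end

    async def relay():
        # open a streaming POST to Ollama and forward each chunk as soon as it arrives
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", OLLAMA_URL, json=payload) as upstream:
                async for chunk in upstream.aiter_bytes():
                    yield chunk

    return StreamingResponse(relay(), media_type="application/x-ndjson")

app = Starlette(routes=[Route("/llmprompt", llmprompt, methods=["POST"])])

One way to put such an app behind HTTPS with your self-signed certificate is to run it under uvicorn with its --ssl-keyfile and --ssl-certfile options.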
Specifications
There are TWO pieces to this tutorial: a front-end mobile client and a back-end server. For the front end, you can build either for Android with Jetpack Compose or for iOS in SwiftUI. For the back end, you can choose between these web microframeworks: Echo (Go), Express (TypeScript), Starlette (Python), or axum (Rust).
You only need to build one front end, AND one of the alternative back-ends. Until you have a working backend of your own, you can use mada.eecs.umich.edu to test your front end. To receive full credit, your front end MUST work with both mada.eecs.umich.edu and your own back end.
IMPORTANT: unless you plan to learn to program both Android and iOS, you should do the tutorials on the same platform as that of your projects.
Prepared by Sugih Jamin | Last updated: August 27th, 2025