Tutorial: llmPrompt


llmPrompt Chatbot

We will build an app called Chatter and endow it with various capabilities in each tutorial. In this tutorial, Chatter allows the user to post prompts to Ollama running on a back-end server. Due to discrepancies between Ollama’s networking capabilities and the networking requirements of mobile platforms, we need a back-end server of our own, which we call chatterd, to serve as a proxy bridging the two. The Chatter mobile front end displays Ollama’s streamed response as soon as each element arrives. Treat messages you send to chatterd and Ollama as public utterances with no reasonable expectation of privacy.

Image: chatterd serving as HTTP/2 proxy for Ollama

About the tutorials

This tutorial may be completed individually or in teams of at most 2. You can partner differently for each tutorial.

We don’t assume any mobile or back-end development experience on your part. We will start by introducing you to the front-end integrated development environment (IDE) and back-end development tools.

Are the tutorials cumulative?

Tutorials 1 (llmPrompt) and 2 (Chatter) form the foundation for all subsequent tutorials. The back end you build in tutorials 1 and 2 will be used throughout the term. Tutorial 5 (llmTools) further builds on tutorials 3 (llmChat) and 4 (Maps). You will therefore want to keep a clean zipped copy of each tutorial for subsequent use and modification. The image below summarizes the dependencies of later tutorials on earlier ones.

Image: Tutorial dependencies

Since the tutorials require manual setup of your computing infrastructure, customized to each person or team, including an individualized security certificate, we cannot provide a ready-made solution for you to download. Each person or team must have their own mobile scaffolding and infrastructure from tutorials 1 and 2 to complete the later tutorials and projects. If you run into any difficulties setting up yours, please seek help from the teaching staff early.

For the ULCS (non-MDE) version of the course, each project is supported by two tutorials: tutorials 1 and 2 show how to accomplish the main tasks of Project 1, and so on.

The tutorials can mostly be completed by cutting and pasting code from the specs. However, you will be more productive on the homework and projects if you understand the code. As a general rule of thumb, the less time you spend reading, the more time you will spend debugging.

Objectives

Front end:

Back end:

There is a TODO item at the end of this spec that you must complete as part of the tutorial.

API and protocol handshake

For this tutorial, Chatter has two API endpoints: / and /llmprompt.

We describe each endpoint using this syntax:

url-endpoint
-> request: data sent to Server
<- response: data sent to Client followed by HTTP Status

The protocol handshake consists of:

/
-> HTTP GET
<- "EECS Reactive chatterd" 200 OK
    
/llmprompt
-> HTTP POST { model, prompt, stream }
<- { newline-delimited JSON Object(s) } 200 OK

Note: ‘/’ is disabled on mada.eecs.umich.edu
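
Before writing any front-end code, you can exercise both endpoints from any REST client to see the handshake in action. The sketch below uses Python with the httpx library purely as an illustration (neither is required by this tutorial); the host is a placeholder that you should replace with your own chatterd’s address or mada.eecs.umich.edu.

# Sketch only: exercising both endpoints with Python + httpx.
# BASE is a placeholder; substitute your chatterd's host (and port).
import httpx

BASE = "https://localhost"  # placeholder

# Protocol handshake: GET / should return the greeting with 200 OK
# (remember: '/' is disabled on mada.eecs.umich.edu, and should be
# disabled on your own server before submission)
r = httpx.get(f"{BASE}/")
print(r.status_code, r.text)

# POST a prompt and print each newline-delimited JSON object as it streams back
payload = {"model": "gemma3:270m", "prompt": "howdy?", "stream": True}
with httpx.stream("POST", f"{BASE}/llmprompt", json=payload, timeout=None) as resp:
    for line in resp.iter_lines():
        if line:
            print(line)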

Data format

To post a prompt to Ollama with the llmprompt API, the front-end client sends a JSON object consisting of a model field, a prompt field, and a stream field. The model field holds the name of the model, which in this tutorial we set to gemma3:270m. The prompt field carries the prompt for Ollama. The stream field indicates whether Ollama should stream its response or batch it and send it in a single message. For example:

{
    "model": "gemma3:270m",
    "prompt": "howdy?",
    "stream": true
}

Upon receiving this JSON object, chatterd simply forwards it to Ollama. If streaming is turned on, Ollama’s reply takes the form of a newline-delimited JSON (NDJSON) stream; otherwise Ollama responds with a single JSON object. The JSON objects carried in Ollama’s response are specified in the Ollama documentation for api/generate. Either way, we forward only the response field of Ollama’s response, and we do not accumulate Ollama’s reply to return it as a single response at completion.
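
To make this forwarding concrete, here is a minimal sketch of what an /llmprompt handler could look like in Starlette, one of the back-end framework options listed under Specifications below. The route name matches the API above, but the helper names and the local Ollama URL (its default port, 11434) are assumptions; your own handler, in whichever framework you choose, may structure this differently.

# Sketch only (Starlette + httpx): forward the client's JSON to Ollama and
# relay each object's "response" field as NDJSON, without accumulating
# the full reply.
import json
import httpx
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import StreamingResponse
from starlette.routing import Route

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

async def llmprompt(request: Request) -> StreamingResponse:
    payload = await request.json()  # { model, prompt, stream } from the front end

    async def relay():
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", OLLAMA_URL, json=payload) as upstream:
                async for line in upstream.aiter_lines():
                    if not line:
                        continue
                    obj = json.loads(line)
                    # forward only the "response" field of each Ollama object
                    yield json.dumps({"response": obj.get("response", "")}) + "\n"

    return StreamingResponse(relay(), media_type="application/x-ndjson")

app = Starlette(routes=[Route("/llmprompt", llmprompt, methods=["POST"])])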

You can use the LLM model gemma3:270m with Ollama. This model runs in less than 600 MB of memory, so it should fit on a *-micro AWS/GCP instance. The sample back end on mada.eecs.umich.edu can, in addition, run the Ollama model gemma3 (a.k.a. gemma3:4b) for a more satisfying interaction.

Specifications

There are TWO pieces to this tutorial: a front-end mobile client and a back-end server. For the front end, you can build either for Android with Jetpack Compose or for iOS in SwiftUI. For the back end, you can choose among these web microframeworks: Echo (Go), Express (TypeScript), Starlette (Python), or axum (Rust).

You only need to build one front end AND one of the alternative back ends. Until you have a working back end of your own, you can use mada.eecs.umich.edu to test your front end. To receive full credit, your front end MUST work with both mada.eecs.umich.edu and your own back end, with model gemma3:270m.

IMPORTANT: unless you plan to learn to program both Android and iOS, you should do the tutorials on the same platform as that of your projects.

TODO:

WARNING: We found that an HTTP(S) server with the / API endpoint seems to attract more random probes from sundry hosts on the Internet.

Disable the / API endpoint in your back-end server after you’re done testing your setup, before you turn in your submission. With the / API endpoint disabled, trying to access it using a REST API client or the Chrome browser should result in HTTP error code 404: Not Found (Safari may simply show a blank page instead of the 404 error code).
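
How you disable / depends on your framework. Continuing the hypothetical Starlette sketch from the data-format section (adapt the idea to whichever framework you actually use), one way is simply not to register the / route, so requests to it fall through to the framework’s built-in 404 response:

# Sketch only, assuming the Starlette back end sketched earlier: leave the "/"
# route unregistered so GET / falls through to Starlette's default 404.
app = Starlette(routes=[
    # Route("/", index, methods=["GET"]),  # disabled before submission
    Route("/llmprompt", llmprompt, methods=["POST"]),
])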


Prepared by Sugih Jamin. Last updated: December 22nd, 2025