Tutorial: llmChat

Prompt context: when interacting with users, LLMs are usually memoryless. Each prompt is standalone. If you want to keep a continuing back-and-forth “conversation” with an LLM, you must send a transcript (“context”) of your ongoing “conversation” up to the current point. In this tutorial, we store each prompt from the user and the reply (“completion”) from the LLM (“assistant”) in our back-end PostgreSQL database. With each new prompt from the user, we send the full transcript of the user’s conversation with the LLM, in chronological order, to the LLM ahead of the new prompt.
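For example, by the user’s second prompt the transcript sent in the messages field (format detailed below) already contains the first prompt and the assistant’s completion, with the new prompt appended last. The values here are purely illustrative:

[
    { "role": "user", "content": "Where is Agra?" },
    { "role": "assistant", "content": "Agra is a city in the Indian state of Uttar Pradesh." },
    { "role": "user", "content": "How far is it from Delhi?" }
]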

Server-sent event streaming: while Ollama uses NDJSON to stream its reply, commercial LLMs, such as ChatGPT, and tool-use protocols, such as MCP’s “Streamable HTTP”, often use server-sent events (SSE) to stream replies. The advantage of SSE over NDJSON is that multiple streams can be interleaved, as tagged events, over one connection. In this tutorial, we see how error messages can be tagged as such and sent alongside normal messages, so that the front end can identify and handle them separately. We will put stream interleaving to further use in subsequent tutorials and projects.
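As a rough, abbreviated illustration (not a verbatim Ollama transcript), the same two reply chunks would arrive as bare JSON lines in an NDJSON stream, but as tagged data lines, each in its own event block, in an SSE stream:

NDJSON:

{"message":{"role":"assistant","content":"Ab"},"done":false}
{"message":{"role":"assistant","content":"sol"},"done":false}

SSE:

data: {"message":{"role":"assistant","content":"Ab"},"done":false}

data: {"message":{"role":"assistant","content":"sol"},"done":false}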

[Figure: chatterd serving as SSE proxy and providing context for Ollama]

About the tutorials

This tutorial may be completed individually or in teams of at most 2. You can partner differently for each tutorial.

Treat messages you send to chatterd and Ollama as public utterances with no reasonable expectation of privacy; they are recorded for grading purposes and provided to Ollama for contextual interactions.

Objectives

Front end:

Back end:

TODO

There is a TODO item at the end of both the front-end and back-end specs that you must complete as part of the tutorial.

API and protocol handshakes

In this tutorial, we add two new APIs to Chatter:

Using this syntax:

url-endpoint
-> request: data sent to Server
<- response: data sent to Client followed by HTTP Status

The protocol handshakes consist of:

/llmchat
-> HTTP POST { appID, model, messages, stream=true }
<- { SSE event-stream lines } 200 OK

/llmprep
-> HTTP POST { appID, model, messages, stream=false }
<- { system prompt or error message } 200 OK or HTTP error status code

Data formats

To post a prompt to Ollama with either the llmchat or llmprep API, the front-end client sends a JSON Object consisting of: an appID field, to uniquely identify this client app for PostgreSQL database sharing; a model field, naming the LLM model we want Ollama to use; a messages field, carrying the prompt itself (more details below); and a stream field, indicating whether we want Ollama to stream its response (true) or to batch it and send it in one message (false). For example:

{
    "appID": "edu.umich.reactive.postman",
    "model": "gemma3:270m",
    "messages": [
        { "role": "user", "content": "Where is Agra?" }
    ],
    "stream": true
}

The messages field is a JSON Array with one or more JSON Objects as its elements. In the example above there is only one JSON Object in the messages array. Each element of messages consists of a role field, which can be "system", to give instructions (“prompt engineering”) to the model, "user", to carry the user’s prompt, "assistant", to indicate a reply (“prompt completion”) from the model, etc. For the /llmprep API, the "role" of the message(s) should be "system"; whereas for the /llmchat API, the "role" of the message(s) should be "user". The content field holds the actual instruction, prompt, reply, etc. from the entity listed in role. Remember to separate the elements with commas if you have more than one element in the array. The "stream" field should be set to false for /llmprep and true for /llmchat. When "stream" is true, the response from chatterd is streamed using Server-Sent Event (SSE) streaming.
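As an illustration, a /llmprep request body carrying a "system" message could look like the following; the instruction text is made up for this example:

{
    "appID": "edu.umich.reactive.postman",
    "model": "gemma3:270m",
    "messages": [
        { "role": "system", "content": "You are a concise assistant. Answer in at most two sentences." }
    ],
    "stream": false
}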

SSE stream

A Server-Sent Event (SSE) stream consists of lines of text in a specific format:

event: EventName
data: a line of info associated with EventName event

event: ANewEvent
data: a line of ANewEvent info
data: another line of ANewEvent info

data: a line of info implicitly associated with the default Message event

data: another line also of the Message event

Each event is tagged with an event line carrying the event’s name. An event line is delimited with a newline ('\n' or, for streams from a Python server or on Windows, "\r\n"). Then follow one or more lines of data associated with that event. Each data line is also delimited with a newline. An empty line (equivalently, two consecutive newlines "\n\n") denotes the end of an event block.

When no event is explicitly specified prior to a data line, as in the last two data lines in the above example, the data line is assumed to be associated with the default Message event. To put it another way, a data line that follows an empty line, with no intervening event line, is assumed to be part of the default Message event.
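One way a client could parse this framing, shown in Python purely as a sketch (it is not the required front-end implementation), is to accumulate event and data lines until a blank line closes the block:

# Sketch of an SSE parser: groups "event:" and "data:" lines into
# (event_name, data_lines) pairs; a block with no explicit event line is
# reported under the default "message" event.
def parse_sse(lines):
    event, data = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":                      # blank line ends the current event block
            if data:
                yield (event or "message", data)
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:                                # flush a final block lacking a trailing blank line
        yield (event or "message", data)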

An SSE stream returning from an /llmchat POST would look like the following on curl:

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:52.586157Z","message":{"role":"assistant","content":"Ab"},"done":false}

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:52.603345Z","message":{"role":"assistant","content":"sol"},"done":false}

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:52.620774Z","message":{"role":"assistant","content":"utely"},"done":false}

. . . .

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:53.814197Z","message":{"role":"assistant","content":"!"},"done":false}

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:53.832383Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":1554222000,"load_duration":24007209,"prompt_eval_count":581,"prompt_eval_duration":272983000,"eval_count":71,"eval_duration":1247892000}

On Postman, the data tag will not be shown, only the data lines (here the model used is tinyllama):

[Figure: Postman view of the streamed data lines]
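To reconstruct the assistant’s full reply from such a stream, the front end can concatenate each chunk’s message.content until a chunk arrives with "done": true. A Python sketch, reusing the hypothetical parse_sse helper above (sse_lines stands for the raw lines received):

import json

reply = ""
for event, data_lines in parse_sse(sse_lines):
    if event != "message":                  # error events are handled separately (see below)
        continue
    chunk = json.loads(data_lines[0])
    reply += chunk.get("message", {}).get("content", "")
    if chunk.get("done"):                   # the final chunk also carries timing statistics
        break
print(reply)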

On error, an error event will be returned by an /llmchat POST:

event: error
data: {"error": "<error message>"}

Specifications

As in previous tutorials, you only need to build one front end, AND one of the alternative back-end stacks. Until you have a working back end of your own, you can use mada.eecs.umich.edu to test your front end. To receive full credit, your front end MUST work with your own back end.

The balance of work in this tutorial is more heavily weighted towards the back end:


Prepared by Sugih Jamin. Last updated: January 14th, 2026.