Tutorial: llmChat

Prompt context: when interacting with users, LLMs are usually memoryless. Each prompt is standalone. If you want to keep a continuing back-and-forth “conversation” with an LLM, you must send a transcript (“context”) of your ongoing “conversation” up to the current point. In this tutorial, we store each prompt from the user and the reply (“completion”) from the LLM (“assistant”) in our back-end PostgreSQL database. With each new prompt from the user, we send the full transcript of the user’s conversation with the LLM, in chronological order, to the LLM ahead of the new prompt.
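For example, by the user’s second prompt the transcript sent in the messages field (format detailed below) already contains the first prompt and the assistant’s completion, with the new prompt appended last. The values here are purely illustrative:

[
    { "role": "user", "content": "Where is Agra?" },
    { "role": "assistant", "content": "Agra is a city in the Indian state of Uttar Pradesh." },
    { "role": "user", "content": "How far is it from Delhi?" }
]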

Server-sent event streaming: while Ollama uses NDJSON to stream its reply, commercial LLMs, such as ChatGPT, and tool-use protocols, such as MCP’s “Streamable HTTP”, often use server-sent events (SSE) to stream replies. The advantage of SSE over NDJSON is that multiple streams can be interleaved, as tagged events, over one connection. In this tutorial, we see how error messages can be tagged as such and sent alongside normal messages, so that the front end can identify and handle them separately. We will put stream interleaving to further use in subsequent tutorials and projects.
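As a rough, abbreviated illustration (not a verbatim Ollama transcript), the same two reply chunks would arrive as bare JSON lines in an NDJSON stream, but as tagged data lines, each in its own event block, in an SSE stream:

NDJSON:

{"message":{"role":"assistant","content":"Ab"},"done":false}
{"message":{"role":"assistant","content":"sol"},"done":false}

SSE:

data: {"message":{"role":"assistant","content":"Ab"},"done":false}

data: {"message":{"role":"assistant","content":"sol"},"done":false}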

[Figure: chatterd serving as SSE proxy and providing context for Ollama]

About the tutorials

This tutorial may be completed individually or in teams of at most 2. You can partner differently for each tutorial.

Treat messages you send to chatterd and Ollama as public utterances with no reasonable expectation of privacy; they are recorded for grading purposes and provided to Ollama for contextual interactions.

Objectives

Front end:

Back end:

TODO

There is a TODO item at the end of both the front-end and back-end specs that you must complete as part of the tutorial.

API and protocol handshakes

In this tutorial, we add two new APIs to Chatter:

Using this syntax:

url-endpoint
-> request: data sent to Server
<- response: data sent to Client followed by HTTP Status

The protocol handshakes consist of:

/llmchat
-> HTTP POST { appID, model, messages, stream=true }
<- { SSE event-stream lines } 200 OK

/llmprep
-> HTTP POST { appID, model, messages, stream=false }
<- { system prompt or error message } 200 OK or HTTP error status code

Data formats

To post a prompt to Ollama with either the llmchat or llmprep API, the front-end client sends a JSON Object consisting of: an appID field, to uniquely identify this client app for PostgreSQL database sharing; a model field, naming the LLM model we want Ollama to use; a messages field, carrying the prompt itself (more details below); and a stream field, indicating whether we want Ollama to stream its response (true) or to batch it and send it in one message (false). For example:

{
    "appID": "edu.umich.reactive.postman",
    "model": "gemma3:270m",
    "messages": [
        { "role": "user", "content": "Where is Agra?" }
    ],
    "stream": true
}

The messages field is a JSON Array with one or more JSON Objects as its elements. In the example above there is only one JSON Object in the messages array. Each element of messages consists of a role field, which can be "system", to give instructions (“prompt engineering”) to the model, "user", to carry the user’s prompt, "assistant", to indicate a reply (“prompt completion”) from the model, etc. For the /llmprep API, the "role" of the message(s) should be "system"; whereas for the /llmchat API, the "role" of the message(s) should be "user". The content field holds the actual instruction, prompt, reply, etc. from the entity listed in role. Remember to separate the elements with commas if you have more than one element in the array. The "stream" field should be set to false for /llmprep and true for /llmchat. When "stream" is true, the response from chatterd is streamed using Server-Sent Event (SSE) streaming.
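As an illustration, a /llmprep request body carrying a "system" message could look like the following; the instruction text is made up for this example:

{
    "appID": "edu.umich.reactive.postman",
    "model": "gemma3:270m",
    "messages": [
        { "role": "system", "content": "You are a concise assistant. Answer in at most two sentences." }
    ],
    "stream": false
}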

SSE stream

A Server-Sent Event (SSE) stream consists of lines of text in a specific format:

event: EventName
data: a line of info associated with EventName event

event: ANewEvent
data: a line of ANewEvent info
data: another line of ANewEvent info

data: a line of info implicitly associated with the default Message event

data: another line also of the Message event

Each event is tagged with an event line carrying the event’s name. An event line is delimited with a newline ('\n' or, for streams from a Python server or on Windows, "\r\n"). Then follow one or more lines of data associated with that event. Each data line is also delimited with a newline. An empty line (equivalently, two consecutive newlines "\n\n") denotes the end of an event block.

When no event is explicitly specified prior to a data line, as in the last two data lines in the above example, the data line is assumed to be associated with the default Message event. To put it another way, a data line that follows an empty line, with no intervening event line, is assumed to be part of the default Message event.
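One way a client could parse this framing, shown in Python purely as a sketch (it is not the required front-end implementation), is to accumulate event and data lines until a blank line closes the block:

# Sketch of an SSE parser: groups "event:" and "data:" lines into
# (event_name, data_lines) pairs; a block with no explicit event line is
# reported under the default "message" event.
def parse_sse(lines):
    event, data = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":                      # blank line ends the current event block
            if data:
                yield (event or "message", data)
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:                                # flush a final block lacking a trailing blank line
        yield (event or "message", data)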

An SSE stream returning from an /llmchat POST would look like the following on curl:

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:52.586157Z","message":{"role":"assistant","content":"Ab"},"done":false}

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:52.603345Z","message":{"role":"assistant","content":"sol"},"done":false}

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:52.620774Z","message":{"role":"assistant","content":"utely"},"done":false}

. . . .

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:53.814197Z","message":{"role":"assistant","content":"!"},"done":false}

data: {"model":"gemma3:270m","created_at":"2025-08-08T20:35:53.832383Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":1554222000,"load_duration":24007209,"prompt_eval_count":581,"prompt_eval_duration":272983000,"eval_count":71,"eval_duration":1247892000}

On Postman, the data tag will not be shown, only the data lines (here the model used is tinyllama):

[Figure: Postman view of the streamed data lines]
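To reconstruct the assistant’s full reply from such a stream, the front end can concatenate each chunk’s message.content until a chunk arrives with "done": true. A Python sketch, reusing the hypothetical parse_sse helper above (sse_lines stands for the raw lines received):

import json

reply = ""
for event, data_lines in parse_sse(sse_lines):
    if event != "message":                  # error events are handled separately (see below)
        continue
    chunk = json.loads(data_lines[0])
    reply += chunk.get("message", {}).get("content", "")
    if chunk.get("done"):                   # the final chunk also carries timing statistics
        break
print(reply)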

On error, an error event will be returned by an /llmchat POST:

event: error
data: {"error": "<error message>"}

Specifications

As in previous tutorials, you only need to build one front end, AND one of the alternative back-end stacks. Until you have a working back end of your own, you can use mada.eecs.umich.edu to test your front end. To receive full credit, your front end MUST work with your own back end.

The balance of work in this tutorial is more heavily weighted towards the back end:


Prepared by Sugih Jamin. Last updated: January 14th, 2026.