Your own GPT4All based ChatGPT-like bot in Matrix

Your own (useful?) AI assistant

Posted by Jürgen on 14 April 2023

Integrate a GPT4All chatbot in your matrix instance

TL;DR advice: scroll down for pics and a link to a Git repo

It's no secret: massively trained LLMs (Large Language Models, GPT being a prominent one) are a big thing; some say 'the next internet'-big. These language models, whose parameters are gained through training on immense sets of conversations, can match up the most likely tokens to a given set of tokens, which seems to mimic artificial intelligence, or AI.

Of course there's no intelligence involved: after seeing billions of conversations, such a model can match the most likely response to a query. There's no thinking or reasoning involved; it can only arrive at a set of most likely outcomes.

But nevertheless, a piece of software that can answer a question with a natural-sounding answer, without that answer having to be hardcoded, seems like pure magic. And after training these models on instructions and their responses and introducing conversational memory, it looks like these techniques could pass the Turing test.

Even though these language models are not actual artificial intelligence, a lot of people (me included) do find them quite useful: you can just ask a natural question and get an answer that is actually useful. Personally, I don't view these models as a form of intelligence but rather as the next step in the evolution of search engines. I like to compare a bot like ChatGPT to the ship computer from Star Trek (TNG).


OpenAI's GPT-3 and GPT-4 are quite impressive, but also quite closed IP. After I toyed around a bit with GPT-2, Meta's LLaMA got leaked. This was an LLM that was the result of time- and resource-consuming training on billions of conversations, a task most people cannot do by themselves, because it takes a lot of expensive specialized hardware, a lot of (p)re-formatted training data and, more importantly, an insane amount of time. A few smart people took this model and extended it by training on instruction sets, in order to be able to dictate tasks and steer the outcome of a query towards a certain subset. This led to Alpaca. The next step was a model trained on a much bigger set of instructions: GPT4All. This is the model I will be using.

Note: this is how I understand the progression of events; I'm not a scholar or even remotely attached to the field of deep learning (other than toying around with Keras to try to predict crypto rate movements using models of my own design, before realizing this was a futile task; unpredictable is what it is, even to a trained model).

A big plus of GPT4All is that it comes in a quantized form, which, to my understanding, means that the weights of the model are reduced to a limited, discrete set of values instead of a continuous range of values. What I know for sure is that it allows me to run a model, which would normally require something just short of a supercomputer, on my mere laptop.
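As a toy illustration of that idea (real quantizers like the one behind GPT4All are far more sophisticated; the numbers and names below are purely mine), squeezing continuous weights into a small set of integers might look like this:

```python
# Toy sketch of quantization: squeeze float weights into 4-bit signed
# integers plus a single scale factor. This only illustrates the idea
# of discretization, not any production quantization scheme.
weights = [-0.8, -0.1, 0.3, 0.75]

scale = max(abs(w) for w in weights) / 7     # map the largest weight to 7
quantized = [round(w / scale) for w in weights]
dequantized = [q * scale for q in quantized]  # approximate reconstruction
```

Storing small integers plus one scale factor is what shrinks the model enough to fit in laptop RAM, at the cost of a little precision per weight.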


The whole chatbot way of using an LLM looks like fun. How cool would it be to have your own AI-like assistant running? Makes you feel a bit like Tony Stark, right? Well, a big part of chatting is that in a conversation, both parties typically remember what the conversation is all about (though I've witnessed real-life discussions that do not seem to meet this requirement). An LLM like GPT4All is typically stateless: you provide it with input, it completes that as the output, and then it all ends. Task completed, bye bye.

I started out trying to get GPT4All to work with LangChain and use it as a ConversationChain with a ConversationBufferMemory, but after a few hours it became clear to me that I could not get it to work. The memory was not working like I expected, or like presented in many examples online for other models. I gave up and decided it must have been a bug in the module, and not me failing to comprehend the abstraction.

In any case, roaming through the source, I at least got an idea of how such a memory would work. It's quite simple, actually: you just keep track of the conversation, and you give the stateless machine your query, preceded by the conversation so far, in order to provide it with a context. Since the model will try to find the tokens best fitting the provided tokens, this context will steer the output towards the tokens best matching it.
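In code, that context trick is little more than string concatenation. A minimal sketch (the function name and the exact layout here are my own, not the repo's):

```python
def build_prompt(history, query):
    """Prepend the conversation so far, so the stateless model sees context."""
    lines = []
    for user_msg, bot_msg in history:
        lines.append(f"User: {user_msg}")
        lines.append(f"Bot: {bot_msg}")
    lines.append(f"User: {query}")
    lines.append("Bot:")  # the model completes the text from here
    return "\n".join(lines)

history = [("Hello", "Hi! How can I help you?")]
prompt = build_prompt(history, "What is Matrix?")
# The full prompt, history included, is what gets fed to the model.
```

Every turn, the whole transcript is rebuilt and sent along; the model itself never stores anything between calls.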

The context consumes a lot of memory, and the buffer is limited to a certain number of tokens (up to 2048, I believe). So you can keep a conversation going up until the history consumes all those tokens, at which point the program will crash. A way to limit the tokens is needed in order to keep the conversation going.

So I cobbled together a memory buffer which starts to forget earlier parts of the conversation in order to keep it going. A big benefit of matching the response tokens as closely as possible to the input tokens is that the resulting tokens implicitly carry a bit of knowledge of their history. So even though you lose your exact history, the conversation as a whole will (at least for a period) keep its general tone. It will just be impossible to refer back to something that's no longer kept in memory.
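A sketch of such a forgetting buffer, with a crude word count standing in for a real tokenizer (the actual code in the repo differs; names here are mine):

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())

def trim_history(history, max_tokens):
    """Forget the oldest exchanges until the history fits the token budget."""
    history = list(history)

    def total(h):
        return sum(count_tokens(u) + count_tokens(b) for u, b in h)

    while history and total(history) > max_tokens:
        history.pop(0)  # drop the earliest question/answer pair first
    return history

history = [
    ("hi there", "hello friend"),            # 4 "tokens"
    ("how are you", "doing fine thanks"),    # 6 "tokens"
    ("tell me a joke", "no"),                # 5 "tokens"
]
trimmed = trim_history(history, 12)  # oldest exchange gets forgotten
```

The oldest exchanges are sacrificed first, so the bot's recent context survives while references to the start of the conversation quietly disappear.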


If we're going to chat with a bot, why not do it through an actual chat system? I've created a few bots for my running Synapse Matrix server, Matrix being a specification of how a chat system should work, and Synapse being an implementation of it. My first bot would just echo whatever I fed it.

The second one was a 1-on-1 passthrough to a PostgreSQL database, meaning I could enter an SQL query from my Element Matrix client on my phone, and get the result as a formatted table in a chat (security-wise, this was a very bad idea).

Finally, I created an NLP home automation bot to which I could send short commands like 'turn lights on' in order to see the lights turn on in the living room.

So it seemed like a logical step to make a GPT chatbot available as a contact among my WhatsApp, Telegram, Discord and Signal contacts (I love Matrix and its bridges).

The code

First of all, the code is available here.

The code is not too complicated.

  • For Matrix integration, the matrix-nio package is used. Most of the code actually deals with integrating into a running instance: lots of callback events.
  • For use of the GPT4All model, the langchain package is used.
  • In case some Markdown formatting is generated in the answer, the markdown package is used to convert markdown to HTML.

On startup, the code connects to the Matrix homeserver and logs in as a registered user. From this point on, I can add the chatbot user to my contacts.

Next, a prompt is formed. The concept of the prompt had me baffled at first; I was still thinking from a developer's perspective. It took a while before I realized it is nothing more than a template around the question and the answer, and a way to provide a hardcoded context. It's just text, but it defines your bot. (I've been toying around with different prompts; at one point, the bot presented itself as Darth Vader :D).
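To make that concrete, such a template could look something like the sketch below, in the Alpaca-style instruction format that instruction-tuned models of this family commonly expect. The persona text and template wording are purely illustrative, not taken from the repo:

```python
# Hypothetical prompt template; the persona line is the 'hardcoded
# context' that defines the bot's personality and behaviour.
TEMPLATE = (
    "{persona}\n"
    "\n"
    "### Instruction:\n"
    "{question}\n"
    "\n"
    "### Response:\n"
)

prompt = TEMPLATE.format(
    persona="You are a polite assistant living on a Matrix server.",
    question="Please turn on the lights.",
)
```

Swap the persona string for something else and you have a different bot; this is exactly how the Darth Vader experiment worked.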

The whole memory part is quite hacky; the number of back-and-forths kept is based on the maximum response token size and the available context size. The prompt_prefix is the hardcoded prompt bit, which is constantly prepended (in an ugly way) to the next input and its history.
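The budget calculation behind it boils down to simple arithmetic. A hypothetical version (the numbers, names, and the averaging shortcut are mine, not the repo's):

```python
def exchanges_to_keep(context_size, max_response_tokens, prefix_tokens,
                      avg_exchange_tokens):
    """Estimate how many question/answer pairs still fit in the context
    window once the prompt prefix and the response budget are reserved."""
    budget = context_size - max_response_tokens - prefix_tokens
    return max(budget // avg_exchange_tokens, 0)

# e.g. a 2048-token window, reserving 256 tokens for the answer and
# 128 for the hardcoded prompt prefix, at roughly 200 tokens per exchange:
kept = exchanges_to_keep(2048, 256, 128, 200)
```

Anything beyond that many exchanges gets dropped from the front of the history before the next query is sent.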

There's also a Dockerfile provided in the repo, so you can run the bot inside a Docker container. It might be needed to provide custom ulimit values when starting the container, like

      ulimits:
        memlock: -1

in docker-compose.

Using the bot

My laptop has an 8-core i7-11800H and 64GB of RAM. Response times varied from a few seconds for the first 'hello' to a few minutes for bigger queries further on in the conversation. Your mileage also seems to vary between runs: sometimes the bot is more coherent than at other times while answering the same question.


Well, this could qualify as having a 'conversation'


This looks more like a disagreement than a conversation


This is nowhere near the level of ChatGPT, and the upcoming GPT-4 will obliterate it, but for now, I enjoy this technology quite a bit, and I will continue improving this bot.

For the TL;DR people: Git repo