Translate as you speak

Spoken translation using language models

Posted by Jürgen on 23 April 2023


Lots of development is happening in the field of language models. After toying around with GPT4All for conversations, and having built a home assistant with voice input and output, I thought it would be a nice experiment to build a personal translator that translates as you speak. And since these translation models are not nearly as big as the LLMs that are all the hype now, it would be a nice chance to have the GPU do the heavy lifting.

Design

  • Speech recognition is done through [Vosk](). This is one of the few local open source speech recognition libraries I could find that supports the Dutch language (a prerequisite, since I'm Dutch myself).

  • Translation is done using a [Hugging Face]() [pipeline]() and the [Helsinki]() model specialized in translating Dutch to English.

  • Voice synthesis is done using [Larynx](), a relatively good-sounding local voice synthesizer.
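
As a rough sketch, the translation step could look like the code below. The model name `Helsinki-NLP/opus-mt-nl-en` is my assumption for the Dutch-to-English Helsinki model, and the injectable `translator` argument is just there so the function can be exercised without downloading the model:

```python
# Sketch of the translation step, assuming the transformers library.
# MODEL_NAME is an assumption: Helsinki-NLP publishes opus-mt-* models,
# and opus-mt-nl-en is their Dutch-to-English variant.
MODEL_NAME = "Helsinki-NLP/opus-mt-nl-en"


def extract_translation(outputs):
    """Pull the translated string out of a translation-pipeline result."""
    return outputs[0]["translation_text"]


def translate_nl_en(text, translator=None):
    """Translate Dutch text to English.

    A custom callable can be injected for testing; otherwise a Hugging Face
    pipeline is built (the heavy import is deferred on purpose).
    """
    if translator is None:
        from transformers import pipeline
        translator = pipeline("translation", model=MODEL_NAME)
    return extract_translation(translator(text))
```

With the model available locally, `translate_nl_en("Hallo, hoe gaat het?")` returns the English translation as a plain string.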

The layout is fairly simple:

  • The Vosk speech recognizer takes microphone input and returns text.
  • The Hugging Face translator exposes a RESTful API that accepts Dutch text input and returns English text output.
  • The Larynx speech synthesizer takes text and returns a sound file that can be played.
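
The glue between those three stages is essentially one loop: recognize an utterance, translate it, speak it. A minimal dependency-injected sketch (the callables here are stand-ins for illustration, not the actual Vosk, translator, or Larynx APIs):

```python
def run_pipeline(recognize, translate, speak):
    """Wire the three stages of the translator together.

    recognize: yields recognized utterances (e.g. from Vosk)
    translate: maps Dutch text to English text (e.g. the REST translator)
    speak:     synthesizes and plays audio for a text (e.g. via Larynx)
    """
    for utterance in recognize():
        utterance = utterance.strip()
        if utterance:  # skip empty recognition results
            speak(translate(utterance))
```

Keeping the stages behind plain callables like this makes it easy to swap one out, for example replacing the recognizer with keyboard input while debugging the translator.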

Using it

Clone the Git repository and check out the README.md for system dependencies.

After issuing a `docker-compose up` there are two containers running: Larynx and LLM (the translator). The Vosk speech recognizer is built into the client, so after setting up a Python 3.9+ virtual environment and installing the dependencies, running the client with `./client.py` will start listening for speech.
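
Inside the client, the call to the translator container is just an HTTP round trip. A sketch using only the standard library; note that the port, path, and JSON field names below are assumptions for illustration, so check the repository for the service's real API:

```python
import json
import urllib.request


def translate_remote(text, url="http://localhost:5000/translate",
                     opener=urllib.request.urlopen):
    """POST Dutch text to the translator service and return the English text.

    The URL and the JSON shape ({"text": ...} in, {"translation": ...} out)
    are assumptions, not the repository's documented API. The opener
    argument exists so the function can be exercised without a server.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))["translation"]
```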

Examples

The videos below illustrate how it works.

Hello

Ordering a beer