Brain-machine interfaces have come a long way in the last decade. Now a group of computer scientists and neuroscientists from the University of California, San Francisco has taken things a step further.
The researchers used two neural networks to decode the participants’ brain activity and determine what they were saying, achieving real-time translation of neural activity to text.
There were four participants, each of whom already had 250 electrodes implanted over a brain region known as the perisylvian cortex to monitor epileptic seizures. Each read aloud between 30 and 50 sentences, providing less than 40 minutes of training data.
The results were remarkably accurate. For one participant, the word error rate was as low as 3%; for comparison, professional speech transcription typically runs at around a 5% error rate.
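The error rates quoted here are word error rates: the word-level edit distance (substitutions, insertions, and deletions) between what the system produced and what was actually said, divided by the length of the true sentence. A minimal sketch of that metric, using one of the article’s own mistranslations as input:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference sentence."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One of the study's amusing errors: no word survives in its
# aligned position, so the score is a full 100% error rate.
print(word_error_rate("a roll of wire lay near the wall",
                      "will robin wear a yellow lily"))  # → 1.0
```

By this measure, a 3% rate means roughly one wrong word in every 33, which is why the comparison with professional transcribers is meaningful.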
Even better, once the networks have been trained on one person, they require less training for any subsequent participants, even when the electrodes are placed on the opposite side of the brain.
The errors, when they occurred, added some humour. In one instance, the phrase “those musicians harmonize marvelously” was mistaken for “the spinach was a famous singer.” In another, the phrase “a roll of wire lay near the wall” was interpreted as “will robin wear a yellow lily.”
They don’t sound anything alike, but then the algorithms weren’t monitoring sound; they were monitoring brain waves.
Words vs Phonemes
Previous algorithms have attempted to model speech based on phonemes, the small units of sound that make up words and sentences when strung together. The new model, by contrast, relied on whole words as its fundamental building blocks. There are advantages and disadvantages either way.
For starters, there are far fewer phonemes than words, so there should be less to learn while still covering a wider range of speech. However, because phonemes are short and strongly shaped by the sounds around them (coarticulation), they are harder to tell apart.
Words, on the other hand, are easier to distinguish from one another, but covering everyday conversation requires a very large vocabulary. In this experiment, the dataset included only about 250 words, whereas the average English speaker is estimated to know 42,000.
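The trade-off is easy to see in miniature. The sketch below contrasts word-level and phoneme-level targets for one phrase from the study; the ARPAbet-style pronunciations are hand-written for illustration, not taken from the researchers’ dictionary:

```python
# Hand-written, illustrative pronunciations (ARPAbet-style symbols);
# a real system would use a full pronouncing dictionary.
PRONUNCIATIONS = {
    "a":    ["AH"],
    "roll": ["R", "OW", "L"],
    "of":   ["AH", "V"],
    "wire": ["W", "AY", "ER"],
}

sentence = "a roll of wire"
words = sentence.split()
phonemes = [p for w in words for p in PRONUNCIATIONS[w]]

print(len(words), words)        # 4 word-level targets
print(len(phonemes), phonemes)  # 9 shorter phoneme-level targets
```

The phoneme alphabet is tiny, but the decoder must output more than twice as many units for the same phrase, each briefer and blurred into its neighbours; the word model outputs fewer, more distinctive units at the cost of a vocabulary it must learn outright.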
Nevertheless, the researchers believe words are the better choice: when they gave the phoneme model a try, the error rate shot up to 33%.
What’s more, the models were trained for less than 40 minutes; for someone with a permanent brain implant, there is far more room to expand the vocabulary. It’s also likely that anybody who has completely lost their voice would quite happily take a few hundred words.
An important aspect of this research is that the participants were speaking aloud as the algorithms worked their magic. As Dr Christian Herff told the Guardian, it isn’t so much translating thought as translating the brain activity involved in speech.
One of the authors of the study, Dr Joseph Makin, told Inverse, “In the long run, we think people who have lost speech—from ALS, a stroke, or some other traumatic brain injury—but remain cognitively intact would benefit from a speech prosthesis along the lines of the setup in this study. But that’s at least several years into the future.”
The technology certainly holds promise for those who lack the ability to communicate. However, I’m sure many of us think beyond the restorative applications and into the realm of mind reading, telepathy, and the dystopian tropes of many sci-fi stories.
For more on that, I recommend checking out these articles: