Researchers have demonstrated how to decode what the human brain is seeing by using artificial intelligence to interpret fMRI scans of people watching videos, a form of mind-reading technology.
The advance could aid efforts to improve artificial intelligence and lead to new insights into brain function. Critical to the research is a type of algorithm called a convolutional neural network, which has been instrumental in enabling computers and smartphones to recognize faces and objects.
“That type of network has made an enormous impact in the field of computer vision in recent years,” said Zhongming Liu, an assistant professor in Purdue University’s Weldon School of Biomedical Engineering and School of Electrical and Computer Engineering. “Our technique uses the neural network to understand what you are seeing.”
Convolutional neural networks, a form of “deep-learning” algorithm, have been used to study how the brain processes static images and other visual stimuli. However, the new findings represent the first time such an approach has been used to see how the brain processes movies of natural scenes, a step toward decoding the brain while people are trying to make sense of complex and dynamic visual surroundings, said doctoral student Haiguang Wen.
The researchers acquired 11.5 hours of fMRI data from each of three female subjects watching 972 video clips, including clips showing people or animals in action and nature scenes. First, the data were used to train the convolutional neural network model to predict the activity in the brain’s visual cortex while the subjects were watching the videos. The researchers then used the model to decode fMRI data from the subjects and reconstruct the videos, even ones the model had never seen before.
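The first step described above, predicting brain activity from what is on screen, can be illustrated with a small sketch. It is not the researchers' code. It assumes a ridge regression that maps per-frame features to voxel responses, and it uses random numbers in place of both the convolutional network's features and the fMRI recordings. The function names and array sizes are hypothetical.

```python
import numpy as np

def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression: one column of weights per voxel."""
    n_features = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)

def voxelwise_correlation(predicted, measured):
    """Pearson correlation between predicted and measured responses, per voxel."""
    p = predicted - predicted.mean(axis=0)
    m = measured - measured.mean(axis=0)
    return (p * m).sum(axis=0) / (np.linalg.norm(p, axis=0) * np.linalg.norm(m, axis=0))

rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 400, 100, 20, 50

# Synthetic stand-ins (assumption): real inputs would be network features
# of the video frames and the fMRI volumes recorded while they were shown.
true_weights = rng.normal(size=(n_features, n_voxels))
train_features = rng.normal(size=(n_train, n_features))
test_features = rng.normal(size=(n_test, n_features))
train_responses = train_features @ true_weights + rng.normal(size=(n_train, n_voxels))
test_responses = test_features @ true_weights + rng.normal(size=(n_test, n_voxels))

# Fit on the training clips, then score predictions on clips held out of training.
weights = fit_ridge(train_features, train_responses, alpha=1.0)
scores = voxelwise_correlation(test_features @ weights, test_responses)
print(f"median held-out correlation: {np.median(scores):.2f}")
```

Scoring on held-out clips mirrors the article's point that the model was tested on videos it had not been trained on.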
The model was able to accurately decode the fMRI data into specific image categories. Actual video images were then presented side-by-side with the computer’s interpretation of what the person’s brain saw based on fMRI data.
“For example, a water animal, the moon, a turtle, a person, a bird in flight,” Wen said. “I think what is a unique aspect of this work is that we are doing the decoding nearly in real time, as the subjects are watching the video. We scan the brain every two seconds, and the model rebuilds the visual experience as it occurs.”
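Decoding a category from each scan can also be sketched in a few lines. The article does not say which classifier the researchers used, so the example below assumes a simple nearest-template decoder. It averages the training volumes for each category and assigns every new volume to the template it correlates with best. The data are synthetic, and the category labels and function names are illustrative only.

```python
import numpy as np

CATEGORIES = ["person", "bird", "turtle", "moon"]  # illustrative labels only

def fit_centroids(volumes, labels, n_categories):
    """Average the training volumes of each category into one template pattern."""
    return np.stack([volumes[labels == c].mean(axis=0) for c in range(n_categories)])

def decode(volumes, centroids):
    """Assign each volume to the category whose template it correlates with best."""
    v = volumes - volumes.mean(axis=1, keepdims=True)
    c = centroids - centroids.mean(axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    c /= np.linalg.norm(c, axis=1, keepdims=True)
    return np.argmax(v @ c.T, axis=1)

rng = np.random.default_rng(1)
n_voxels, n_train, n_test = 200, 240, 80

# Synthetic stand-in for fMRI (assumption): each category evokes its own
# voxel pattern, and every scanned volume is that pattern plus noise.
patterns = rng.normal(size=(len(CATEGORIES), n_voxels))

def simulate(n_volumes):
    labels = rng.integers(0, len(CATEGORIES), size=n_volumes)
    volumes = patterns[labels] + 2.0 * rng.normal(size=(n_volumes, n_voxels))
    return volumes, labels

train_volumes, train_labels = simulate(n_train)
test_volumes, test_labels = simulate(n_test)

centroids = fit_centroids(train_volumes, train_labels, len(CATEGORIES))
predicted = decode(test_volumes, centroids)  # one label per scanned volume
accuracy = np.mean(predicted == test_labels)
print(f"decoding accuracy: {accuracy:.2f} (chance is {1 / len(CATEGORIES):.2f})")
```

Because each volume is decoded on its own, the same loop could in principle run as volumes arrive, one every two seconds in the setup Wen describes.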
The researchers were able to determine how certain locations in the brain were associated with specific information a person was seeing.
“Neuroscience is trying to map which parts of the brain are responsible for specific functionality,” Wen said. “This is a landmark goal of neuroscience. I think what we report in this paper moves us closer to achieving that goal. A scene with a car moving in front of a building is dissected into pieces of information by the brain: one location in the brain may represent the car; another location may represent the building. Using our technique, you may visualize the specific information represented by any brain location, and screen through all the locations in the brain’s visual cortex. By doing that, you can see how the brain divides a visual scene into pieces, and re-assembles the pieces into a full understanding of the visual scene.”
The researchers also were able to use models trained with data from one human subject to predict and decode the brain activity of a different human subject, a process called cross-subject encoding and decoding. This finding is important because it demonstrates the potential for broad applications of such models to study brain function, even for people with visual deficits.
“We think we are entering a new era of machine intelligence and neuroscience where research is focusing on the intersection of these two important fields,” Liu said. “Our mission in general is to advance artificial intelligence using brain-inspired concepts. In turn, we want to use artificial intelligence to help us understand the brain. So, we think this is a good strategy to help advance both fields in a way that otherwise would not be accomplished if we approached them separately.”