

Attention is all you need. So do Androids meditate?
The current AI breakthroughs borrow quite a lot of Psychology lore. I am so here for it.
It may seem like AI is everywhere now. It's only a matter of time until your fridge, having been equipped with SmartHome features and blockchain tracking for your yogurt, offers some kind of "revolutionary" AI capabilities.
As you can probably tell, I have fallen a bit into the AI rabbit hole myself. Tools like GPT-3, Stable Diffusion, Midjourney, and ChatGPT all sprang up with genuinely helpful capabilities. Why?
Attention is all you need
The current wave of AI tools was triggered by a spectacularly titled paper, "Attention Is All You Need" (pdf here), in which the researchers presented the transformer - a new neural network architecture behind today's Large Language Models. It relies on the Machine Learning "Attention" mechanism instead of a long toolchain of recurrent encoders and decoders. I have read that research paper and understood very little, except that this design allows for immense performance gains when training big models - something that was prohibitively expensive before.
Fortunately, smart people understood much more and created all the recently popular AI tools.
So what is "Attention" in Machine Learning?
I studied both Computer Science and Psychology, so understanding a concept spanning both of these disciplines was a challenge I could not refuse. I dug up another great, freely available paper, "Attention in Psychology, Neuroscience, and Machine Learning," which you can find here. The paper notes the findings of Psychology most useful for Machine Learning research:
(In Psychology) Attention is the flexible control of limited computational resources. Why those resources are limited and how they can best be controlled will vary across use cases, but the ability to dynamically alter and route the flow of information has clear benefits for the adaptiveness of any system
Machine Learning "Attention" allows individual parts of the neural network to make sense of the smaller pieces of data before sending it off to the "Main Task". In some sense, it relegates some of the work to "unconscious" processes:
This type of artificial attention is thus a form of iterative re-weighting. Specifically, it dynamically highlights different components of a pre-processed input as they are needed for output generation. This makes it flexible and context dependent, like biological attention.
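To make that concrete, here is a minimal sketch of attention as re-weighting - my own toy illustration in Python, not code from either paper. A query (what the main task currently cares about) scores each piece of the pre-processed input, and a softmax turns those scores into weights that decide how much each piece contributes:

```python
import numpy as np

def softmax(x):
    """Turn raw scores into weights that sum to 1."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Three pre-processed input pieces (e.g. word vectors), 4 features each.
inputs = np.array([
    [1.0, 0.0, 0.5, 0.2],
    [0.1, 0.9, 0.3, 0.7],
    [0.4, 0.4, 0.8, 0.1],
])

# What the "main task" currently cares about (a query vector).
query = np.array([0.9, 0.1, 0.6, 0.3])

scores = inputs @ query    # how relevant each piece is to the query
weights = softmax(scores)  # flexible, context-dependent weights
context = weights @ inputs # weighted blend of the input pieces

print(weights)  # which pieces got "attended to"
print(context)  # the dynamically highlighted summary passed onward
```

Change the query and the weights shift - the same input gets highlighted differently depending on context.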
Before attention mechanisms, Machine Learning models had to squeeze the entire input into a single fixed-size summary before making sense of it, so the input length was constrained. Do you remember driving for the first time? There was so much happening at once. Computers felt the same way:
The decoder can't reasonably be conditioned on the entirety of the input so at some point a bottleneck must be introduced. In the system without attention, the fixed-length encoding vector was a bottleneck. When an attention mechanism is added, the encoding can be larger because the bottleneck (in the form of the context vector) will be produced dynamically as the decoder determines which part of the input to attend to
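Here is a toy sketch of that difference, assuming the encoder states are already computed (again my own illustration, not the paper's code): without attention, every decoding step is conditioned on the same fixed vector; with attention, each step builds its own context vector by deciding which input words to attend to:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Encoder states for a 5-word input sentence (5 states x 4 features).
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))

# Without attention: the whole sentence is squeezed into ONE fixed vector,
# and every decoding step is conditioned on that same bottleneck.
fixed_context = encoder_states[-1]

# With attention: each decoding step asks "which input words matter now?"
def dynamic_context(decoder_state, encoder_states):
    scores = encoder_states @ decoder_state  # relevance of each input word
    weights = softmax(scores)                # attention distribution
    return weights @ encoder_states          # per-step context vector

decoder_state_step1 = rng.normal(size=4)
decoder_state_step2 = rng.normal(size=4)

print(fixed_context)
print(dynamic_context(decoder_state_step1, encoder_states))
print(dynamic_context(decoder_state_step2, encoder_states))  # same input, different context
```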
Do Androids Meditate?
The same paper offers a concise explanation of what was so revolutionary about "Attention Is All You Need":
In 2017, the influential “Attention is All You Need” paper utilized a very different style of architecture for machine translation. This model doesn't have any recurrence, making it simpler to train (...) in a process known as “self-attention.”
The author caveats this by saying:
Interestingly, self-attention has less in common with biological attention than the recurrent attention models originally used for machine translation (...) self-attention provides a form of horizontal interaction between words—which allows for words in the encoded sentence to be processed in the context of those around them—but this mechanism does not include an obvious top-down component driven by the needs of the decoder
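Here is roughly what that horizontal interaction looks like, in a stripped-down sketch that leaves out the learned projection matrices and scaling of a real transformer: every word in the sentence scores every other word, and each word's new representation is a blend of its neighbours - no decoder asking the questions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# One encoded sentence: 4 words x 3 features.
words = np.array([
    [1.0, 0.2, 0.0],   # "the"
    [0.3, 1.0, 0.4],   # "cat"
    [0.1, 0.8, 1.0],   # "sat"
    [0.9, 0.1, 0.3],   # "down"
])

# Every word attends to every word in the same sentence (itself included).
scores = words @ words.T           # 4x4 word-to-word relevance
weights = softmax(scores, axis=1)  # each row: how word i spreads its attention
contextualized = weights @ words   # each word re-expressed via its neighbours

print(weights)         # horizontal interactions between words
print(contextualized)  # words processed in the context of those around them
```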
Let me try to rephrase what these quotes are getting at:
The revolutionary mechanism that allowed Large Language Models to make a quantum leap was:
Making them pay more attention to the context of what they are processing
Allowing them to spend uninterrupted time following their attention without explicitly directing it
Gaining an awareness of the context and bringing this focus back to the main task
To my layman's ears, this sounds exactly like the biological attention mechanism and the practice of meditation.
It's something biological brains credit for quantum leaps as well.
Attention is all you need in life too.
What struck me about the paper's title is its sage advice for Machine Learning models and humans alike.
Attention in Learning
In Outliers, Malcolm Gladwell coined the "10,000 hours" rule, which states that to reach true mastery of a craft, you need 10,000 hours of Deliberate Practice.
That Deliberate Practice involves paying attention to your progress, identifying areas to improve, and challenging yourself with ever-increasing performance targets.
Attention in Parenting
Your child only really demands safety, food, and attention. Even if you are going through the motions but are checked out mentally, she will somehow notice and complain. If, on the other hand, you are truly present, your child will sense it and become instantly happier and more self-sufficient 15-30 minutes later.
Your child needs your attention precisely when you have the least to spare. It sucks, I know.
Attention is the ultimate currency.
The world is moving in the direction of post-scarcity: We have more food, stuff, and "information" than we know what to do with. Many services on the Internet are already free, and you pay by - you guessed it - paying attention.
Your attention is the ultimate non-inflationary resource. Spend it on what counts.
What I have been working on
I have been teasing that I am working on something related to AI, and I can finally share a bit more. If you have a WordPress.com site or a blog and are stuck writing, search for "magic" in the block editor. Let me know how it goes. Hopefully, it will be more useful than an AI-enabled fridge.
This line from the "Attention in Psychology, Neuroscience, and Machine Learning" paper summarizes why I jumped at the opportunity to work with this new technology - it's just exciting:
general attention appears to have limited reserves that won't be deployed in the case of a mundane or insufficiently rewarding task but can be called upon for more promising or interesting work