Intermediate · 8 min read · AI & NLP

BERT, MUM & Google AI Models for AEO

BERT understands bidirectional word context; MUM processes text and images across 75 languages — both model families underlie Google's AI Overview citation selection.

BERT and MUM: The NLP Models That Power Modern AI Search

BERT and MUM are the two most consequential NLP model releases in Google Search history. BERT (2018, integrated into Search in 2019) introduced bidirectional context understanding - the ability to interpret how words modify each other in both directions simultaneously. MUM (2021) extended this to multimodal, multilingual, cross-task reasoning at a scale 1,000× more computationally powerful than BERT. Together, these models transformed what 'matching content to a query' means: from keyword overlap to genuine semantic understanding of user intent, entity relationships, and multi-step informational needs.

For AEO practitioners, BERT and MUM have direct, structural implications for content strategy. BERT means that natural language, precise entity references, and intent-matched writing are no longer just best practices; they are algorithmic requirements. MUM means that comprehensive, multi-format, multilingual content addressing the complete scope of a topic cluster is now the bar that AEO content strategy must clear.

For broader context, see Transformer Architecture for AEO, NLP Content Optimization, and Entity Salience.

BERT vs MUM Architecture Comparison

The comparison below covers their capabilities, scale, and primary application in Google Search:

BERT (Bidirectional Encoder Representations from Transformers)
- Input: text only
- 110M parameters (base)
- Context window: ~512 tokens
- Encoder-only architecture
- Pre-trains on fill-in-the-blank (masked language modeling)
- Launched: October 2018
- Primary use: query understanding
- Languages: English, plus a multilingual variant

MUM (Multitask Unified Model)
- Input: text, images, and video
- 1,000× more powerful than BERT
- Context window: long-context
- Encoder-decoder, multimodal architecture
- Understands subtasks in one pass
- Launched: May 2021
- Primary use: complex answer generation
- Languages: 75+ natively

BERT in Action - Real Content Impact Examples

The example below shows how BERT's context understanding changes which content wins for a specific query, illustrating the direct content-writing implications:


Query: 'What is a river bank?'

BERT disambiguates 'bank' in this query as a geographic feature, not a financial institution.

BERT Prefers (post-BERT winner)

"A river bank is the land alongside a river, typically elevated above the water level during normal flow conditions. River banks form through sediment deposition over time."

Correct: this passage uses 'bank' in a geographic context. BERT scores this passage as highly relevant to the geographic-bank query.

BERT Deprioritizes (pre-BERT might have won)

"Bank information: we offer checking, savings, and investment accounts. Visit our local bank branch."

Incorrect: this passage about a financial bank would previously have matched 'bank' queries, but BERT's bidirectional context understanding eliminates this false positive.
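The disambiguation above can be sketched numerically. This is a toy illustration, not BERT itself: hand-built bag-of-words context vectors with invented counts stand in for the dense contextual embeddings a real model produces, but the mechanism is the same idea - the query's surrounding words pull it toward one sense of 'bank' and away from the other.

```python
from math import sqrt

# Tiny shared vocabulary; counts below are invented for illustration only.
VOCAB = ["river", "water", "land", "sediment",
         "account", "savings", "branch", "checking"]

query_vec       = [1, 1, 0, 0, 0, 0, 0, 0]  # context of "what is a river bank"
geo_passage_vec = [2, 1, 1, 1, 0, 0, 0, 0]  # river-bank passage
fin_passage_vec = [0, 0, 0, 0, 2, 1, 1, 1]  # financial-bank passage

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

geo_score = cosine(query_vec, geo_passage_vec)  # high: shared river context
fin_score = cosine(query_vec, fin_passage_vec)  # zero: no shared context
print(f"geographic passage: {geo_score:.2f}")
print(f"financial passage:  {fin_score:.2f}")
```

Because the financial passage shares no context words with the geographic query, its score collapses to zero - the 'false positive elimination' described above.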

NLP Model Evolution Timeline - 2013 to 2024

The progression from Word2Vec to modern reasoning models explains the cumulative capabilities AI systems now bring to query interpretation and answer generation:

2013

Word2Vec

First word embeddings - 'king - man + woman ≈ queen'. Established that meaning could be encoded as vector relationships.

2015

Attention Mechanism

Allowed models to focus on different parts of the input when generating each output. Foundation for Transformers.

2017

Transformer Architecture

'Attention is All You Need' paper. Full transformer architecture. Enabled parallel training at scale.

2018

BERT

Bidirectional context understanding. Changed how Google interprets queries. Still actively used in Search.

2019

GPT-2

Autoregressive text generation at scale. The first model whose open-ended text generation drew widespread public attention.

2020

GPT-3

175B parameters. Few-shot learning emerged. The model that made AI generative capability mainstream.

2021

MUM (Google)

Multimodal and multilingual understanding at 1,000× BERT's computational power. Powers complex query understanding and the context behind AI Overviews.

2022

ChatGPT

RLHF-trained GPT-3.5. Made conversational AI accessible, reaching 1M users in 5 days and 100M within two months. Triggered the AI search revolution.

2023

GPT-4 / Gemini

Multimodal LLMs at scale. Image + text joint understanding. Integrated into Bing Search and Google SGE.

2024

Reasoning Models

o1, Gemini 2.0 Flash Thinking. Chain-of-thought reasoning enables complex research and expert-level analysis.
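The 2015-2017 steps in the timeline (attention, then the full Transformer) reduce to one core operation. Below is a minimal pure-Python sketch of scaled dot-product attention for a single query vector over toy 2-dimensional token vectors; real models apply this across many heads and layers with learned projections.

```python
from math import exp, sqrt

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query:
    score_i = (query . key_i) / sqrt(d); output = sum_i softmax(score)_i * value_i
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / sqrt(d) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[j] for w, v in zip(weights, values))
           for j in range(len(values[0]))]
    return out, weights

# Three toy token representations; the query points the same direction as the
# second key, so attention concentrates there.
keys   = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query  = [0.0, 2.0]

out, weights = attention(query, keys, values)
```

The attention weights always sum to 1, and the output is a weighted blend of the value vectors - the mechanism that lets a model weigh every word against every other word in both directions.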

MUM Implications for AEO Strategy

MUM's capabilities beyond BERT have five specific strategic implications for how AEO content should be structured and published:

Multi-step complex queries are answered without intermediate searches

Content must address the complete multi-step journey in a single article. 'How to recover from a Google algorithm penalty' needs to cover: identifying the penalty type, analyzing affected pages, making corrections, and submitting for reconsideration - all in one place.

Multilingual content receives proportional citation regardless of language

MUM's 75+ language understanding means non-English content in your target markets can now be AEO-optimized. High-quality Spanish, French, or Japanese content is citation-eligible in AI responses to queries in those languages - even for US-hosted sites.

Image and video content is understood for query matching

MUM can understand comparison images, instructional video thumbnails, and chart images. Product comparison images with text labels become independently searchable query matches - Image schema and descriptive alt text are now substantive AEO signals, not just accessibility features.
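To make the image-markup point concrete, here is a sketch of schema.org ImageObject structured data built with Python's json module. The URL, name, and caption values are placeholders to adapt to your own page, not values prescribed by the source.

```python
import json

# Minimal schema.org ImageObject for a product-comparison image.
# All string values below are illustrative placeholders.
image_markup = {
    "@context": "https://schema.org",
    "@type": "ImageObject",
    "contentUrl": "https://example.com/images/crm-comparison-chart.png",
    "name": "CRM pricing comparison chart",
    "description": "Side-by-side pricing and feature comparison of three CRM tools",
    "caption": "Feature and pricing comparison, updated 2024",
}

json_ld = json.dumps(image_markup, indent=2)
# Embed in the page inside: <script type="application/ld+json"> ... </script>
```

Pairing this markup with descriptive alt text gives multimodal models two machine-readable routes to understanding what the image shows.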

Topic sufficiency - one comprehensive resource beats multiple partial ones

MUM evaluates whether content addresses the full scope of a topic cluster. A single 5,000-word pillar page that comprehensively covers a topic (with all subtopics, counterarguments, and use cases) is preferred to five separate 1,000-word pages that each partially address the topic.

Cross-language entity reference recognition

MUM identifies that 'Künstliche Intelligenz' (German) and 'artificial intelligence' are the same entity. Your English content can now receive citation credit when a German-language AI query is answered using your English-language article - if it is the most comprehensive available source.
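Cross-language entity recognition can be pictured as mapping many surface forms to one canonical identifier. The sketch below uses a hand-built alias table and a Wikidata-style ID (Q11660 is used illustratively for 'artificial intelligence'); real systems resolve mentions against a knowledge graph rather than a static dictionary.

```python
from typing import Optional

# Toy alias table: surface forms in several languages -> one canonical entity ID.
# The ID and the table itself are illustrative, not a real resolution service.
ENTITY_ALIASES = {
    "artificial intelligence": "Q11660",
    "künstliche intelligenz": "Q11660",    # German
    "intelligence artificielle": "Q11660", # French
    "人工知能": "Q11660",                   # Japanese
}

def resolve_entity(mention: str) -> Optional[str]:
    """Return the canonical entity ID for a mention, or None if unknown."""
    return ENTITY_ALIASES.get(mention.strip().lower())
```

Because every alias resolves to the same ID, a German query and an English article end up referencing the same entity node - which is how citation credit can cross the language boundary.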
