Speakable Implementation: Schema, Content Rules, and Audio Delivery Optimization
Speakable schema marks specific sections of a web page as suitable for text-to-speech delivery via Google Assistant. When implemented correctly, it lets Google read the designated sections aloud as direct voice answers, so the user hears the answer from your page instead of having to read it. For content creators and AEO practitioners, speakable is the explicit schema signal for voice answer candidacy: a statement to Google that these specific passages are ready for audio delivery without formatting artifacts.
The most common implementation failure for Speakable schema is marking sections that contain HTML artifacts, bullet-character formatting, or overly complex sentence structures that produce poor TTS output. Google's TTS engine reads exactly what the schema selector points to - including any formatting noise in the marked element. Pre-deployment TTS testing of marked sections is a critical quality step that most implementations skip.
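A pre-deployment check along these lines can be sketched in a few lines of Python. This is a minimal illustration, not Google's TTS pipeline: the names `TextFlattener`, `flatten`, and `find_noise` are hypothetical, and the set of "noise" characters is an assumption chosen for the example. It flattens a marked HTML fragment to the text a speech engine would roughly receive and reports symbols likely to be read aloud literally.

```python
from html.parser import HTMLParser

# Characters that tend to be voiced literally or cause odd pauses.
# Illustrative assumption only, not a list published by Google.
NOISE_CHARS = set("•|→*#")

# Tags treated as block-level so adjacent items do not run together.
BLOCK_TAGS = {"p", "div", "ul", "ol", "li", "br", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextFlattener(HTMLParser):
    """Collects the text nodes of an HTML fragment in document order."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def flatten(html_fragment: str) -> str:
    """Return the fragment's text with tags removed and whitespace collapsed."""
    parser = TextFlattener()
    parser.feed(html_fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())


def find_noise(text: str) -> list[str]:
    """Return the distinct noise characters found, in order of first appearance."""
    seen = []
    for ch in text:
        if ch in NOISE_CHARS and ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    fragment = "<p>Key facts: • Published: March 2026 | <a href='/x'>Read more →</a></p>"
    spoken = flatten(fragment)
    print(spoken)
    print(find_noise(spoken))
```

Reading the flattened string aloud, or passing it to any TTS tool, is usually enough to catch problems before the markup ships.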
For the foundational context, see Speakable for News Publishers and Google Assistant Optimization.
Speakable - Audio Delivery Pipeline
The audio delivery pipeline runs from markup detection to voice playback:
Step 1: Speakable markup detected
Google's crawlers parse the speakable property on the page: a SpeakableSpecification in JSON-LD that identifies sections by cssSelector or xpath. The sections marked speakable are extracted as the candidate corpus for audio delivery.
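As a rough illustration of the detection step, the sketch below reads the JSON-LD blocks from a page and collects whatever `cssSelector` and `xpath` values appear under `speakable`. It is not how Google's crawler works internally; `JsonLdCollector` and `speakable_selectors` are hypothetical names, the sample page in the usage block is invented, and nested structures such as `@graph` are deliberately ignored to keep the example short.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def as_list(value):
    """Normalize a JSON-LD value that may be absent, a single item, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def speakable_selectors(html: str) -> dict:
    """Return the cssSelector and xpath values declared under 'speakable'."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = {"cssSelector": [], "xpath": []}
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole page
        for node in as_list(data):
            if not isinstance(node, dict):
                continue
            for spec in as_list(node.get("speakable")):
                if isinstance(spec, dict):
                    for key in found:
                        found[key].extend(as_list(spec.get(key)))
    return found


if __name__ == "__main__":
    page = """<html><head><script type="application/ld+json">
    {"@context": "https://schema.org/", "@type": "WebPage",
     "speakable": {"@type": "SpeakableSpecification",
                   "cssSelector": [".headline", ".summary"]}}
    </script></head><body></body></html>"""
    print(speakable_selectors(page))
```

Running a check like this against a staging page confirms that the selectors you meant to publish are the ones actually present in the markup.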
Speakable JSON-LD - CSS Selector and XPath Methods
Speakable sections can be identified either by CSS selectors (cssSelector) or by XPath expressions (xpath). The example below uses xpath and is followed by the content quality rules for marked sections:
JSON-LD Method - Recommended
{
  "@context": "https://schema.org/",
  "@type": "WebPage",
  "name": "AEO Guide: Optimizing for Voice Search",
  "speakable": {
    "@type": "SpeakableSpecification",
    "xpath": [
      "/html/head/title",
      "/html/body/article/section[@id='speakable-intro']/p[1]",
      "/html/body/article/section[@id='speakable-summary']/p"
    ]
  },
  "url": "https://example.com/aeo/voice-search-guide"
}
Critical content rules for marked sections
Max 2 minutes of TTS audio: Mark sections that cumulatively produce no more than 2 minutes of spoken content (~200-300 words); overly long speakable sections risk being truncated. Google's own guidance is tighter per section: roughly 20-30 seconds of content, or two to three sentences.
Complete sentences only: Speakable content must consist of grammatically complete sentences ending with periods, question marks, or exclamation marks. Fragments and headline-style text produce poor TTS output.
No HTML artifacts in speakable sections: Speakable sections must not contain tables, lists with bullet characters that render as literal symbols, or markdown-style formatting that becomes audio artifacts when read aloud.
Self-contained meaning: Each speakable section should stand alone - comprehensible without the surrounding page context. If the text requires visual formatting or preceding content to make sense, it's not suitable for audio delivery.
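The length and sentence rules above lend themselves to an automated check. The following is a minimal sketch under stated assumptions: a speaking rate of 150 words per minute and a 30-word ceiling per sentence are illustrative choices, not published limits, and `check_section` and `check_page` are hypothetical helper names. Self-contained meaning cannot be checked mechanically and still needs a human read.

```python
import re

WORDS_PER_MINUTE = 150       # assumed speaking rate
MAX_SECONDS = 120            # the two-minute cumulative budget from the rules above
MAX_WORDS_PER_SENTENCE = 30  # assumed threshold for an overly complex sentence


def estimate_seconds(text: str) -> float:
    """Estimate spoken duration from the word count."""
    return len(text.split()) * 60 / WORDS_PER_MINUTE


def split_sentences(text: str) -> list[str]:
    """Naive split on terminal punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def check_section(text: str) -> list[str]:
    """Return a list of rule violations for one speakable section."""
    stripped = text.strip()
    if not stripped:
        return ["section is empty"]
    problems = []
    if stripped[-1] not in ".!?":
        problems.append("does not end with terminal punctuation")
    for sentence in split_sentences(stripped):
        count = len(sentence.split())
        if count > MAX_WORDS_PER_SENTENCE:
            problems.append(
                f"sentence of {count} words exceeds {MAX_WORDS_PER_SENTENCE}"
            )
    return problems


def check_page(sections: list[str]) -> list[str]:
    """Check the cumulative duration budget, then each section in turn."""
    problems = []
    total = sum(estimate_seconds(s) for s in sections)
    if total > MAX_SECONDS:
        problems.append(
            f"estimated {total:.0f}s of audio exceeds {MAX_SECONDS}s budget"
        )
    for index, section in enumerate(sections):
        problems.extend(f"section {index}: {p}" for p in check_section(section))
    return problems


if __name__ == "__main__":
    sections = [
        "Speakable schema marks sections of a page for text-to-speech delivery.",
        "Key facts and figures",
    ]
    for problem in check_page(sections):
        print(problem)
```

Thresholds are easy to tune; the point of the sketch is that these rules can run in a publishing workflow before the markup goes live.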
Speakable Content Quality - Before and After Examples
Real content examples showing the difference between speakable-appropriate and speakable-inappropriate writing:
❌ Too long + complex
“AEO - which stands for Answer Engine Optimization - is a comprehensive digital marketing practice that involves the systematic analysis, structuring, and optimization of web content across multiple dimensions including but not limited to schema markup implementation, featured snippet targeting, voice search query alignment, entity disambiguation, passage relevance scoring, and AI-mediated citation pathway optimization for the purpose of maximizing content retrieval probability in answer engine contexts.”
✓ Optimal - clear, complete, concise
“AEO (Answer Engine Optimization) is the practice of structuring web content for direct retrieval by AI-powered answer systems. Unlike traditional SEO, which targets organic click-throughs, AEO targets the moment when an AI system selects which source to cite in its answer.”
❌ Contains HTML artifacts
“Key facts: • Published: March 2026 • Author: Jane Smith • Category: AEO Basics → Read more: Voice Search 101 | Structured Data | Entity Recognition”
✓ Self-contained, voice-ready
“Speakable schema marks specific sections of a web page for text-to-speech delivery via Google Assistant. When a user asks a voice query that matches the page's topic, Google reads the speakable-marked section aloud rather than requiring the user to open the full article.”