Intermediate · 7 min read · Voice Search

Implementing Speakable Schema Step-by-Step

Implementing Speakable requires targeting news articles, marking up relevant sections with cssSelector or xpath, and submitting via Google News Publisher Center.

Speakable Implementation: Schema, Content Rules, and Audio Delivery Optimization

Speakable schema marks specific sections of a web page as suitable for text-to-speech delivery via Google Assistant. When implemented correctly, it lets Google read your designated sections aloud as direct voice answers, so the user hears the answer instead of having to open and read the page. For content creators and AEO practitioners, speakable is the explicit schema signal for voice answer candidacy: a statement to Google that these specific passages are ready for audio delivery without formatting artifacts.

The most common implementation failure for Speakable schema is marking sections that contain HTML artifacts, bullet-character formatting, or overly complex sentence structures that produce poor TTS output. Google's TTS engine reads exactly what the schema selector points to - including any formatting noise in the marked element. Pre-deployment TTS testing of marked sections is a critical quality step that most implementations skip.
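
One way to approximate a pre-deployment check is to extract exactly the text that sits inside a marked element and look for characters that tend to sound wrong when read aloud. The sketch below uses only the Python standard library. The function names, the element id, and the list of noise characters are illustrative assumptions, and the extraction is a rough stand-in for whatever Google's pipeline actually does.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Assumed list of characters that usually become audio noise; extend as needed.
NOISE_CHARS = "•→|*#"


class MarkedTextExtractor(HTMLParser):
    """Collects the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0       # greater than 0 while inside the target element
        self.done = False    # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def spoken_text(html, target_id):
    """Return the whitespace-normalized text a TTS engine would be handed."""
    parser = MarkedTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps block elements apart; it can also add a stray
    # space around inline tags, which is acceptable for a listening check.
    return " ".join(" ".join(parser.chunks).split())


def noise_characters(text):
    """List the noise characters that appear in the extracted text."""
    return [ch for ch in NOISE_CHARS if ch in text]


if __name__ == "__main__":
    page = ('<section id="speakable-intro"><p>Key facts: &bull; Published: '
            'March 2026</p></section>')
    text = spoken_text(page, "speakable-intro")
    print(text)
    print(noise_characters(text))
```

Feeding the extracted string to any local TTS tool and listening to it is the real test; the character scan only catches the obvious cases.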

For the foundational context, see Speakable for News Publishers and Google Assistant Optimization.

Speakable - Audio Delivery Pipeline

Step 1: Speakable markup detected

Google crawlers parse speakable schema on the page: a speakable property in JSON-LD whose SpeakableSpecification value points at page sections through cssSelector or xpath declarations. The sections marked speakable are extracted as a candidate audio delivery corpus.
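
To make the detection step concrete, here is a minimal sketch of how a script could pull speakable targets out of a page's JSON-LD. It is not Google's crawler logic. It only reads top-level objects (or a top-level array) inside application/ld+json script tags, ignores @graph nesting, and the class and function names are invented for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects parsed JSON-LD documents from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self.buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped, not repaired


def _as_list(value):
    """Normalize a missing, single, or list value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def speakable_targets(html):
    """Return the cssSelector and xpath values declared as speakable."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    targets = {"cssSelector": [], "xpath": []}
    for document in collector.documents:
        for node in _as_list(document):
            if not isinstance(node, dict):
                continue
            for spec in _as_list(node.get("speakable")):
                if isinstance(spec, dict):
                    for key in targets:
                        targets[key].extend(_as_list(spec.get(key)))
    return targets
```

Running speakable_targets over your own rendered HTML is a quick way to confirm that the markup you think you shipped is actually present and parseable.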

Speakable JSON-LD - CSS Selector and XPath Methods

Speakable sections can be identified with either CSS selectors or XPath expressions, both declared inside JSON-LD. The example below uses XPath and is followed by the critical content quality rules:

JSON-LD Method - Recommended

{
  "@context": "https://schema.org/",
  "@type": "WebPage",
  "name": "AEO Guide: Optimizing for Voice Search",
  "speakable": {
    "@type": "SpeakableSpecification",
    "xpath": [
      "/html/head/title",
      "/html/body/article/section[@id='speakable-intro']/p[1]",
      "/html/body/article/section[@id='speakable-summary']/p"
    ]
  },
  "url": "https://example.com/aeo/voice-search-guide"
}
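
For comparison, the same page could be marked up with cssSelector instead of xpath. The sketch below builds that variant in Python and prints it as JSON-LD. The helper name is hypothetical, and the selectors are an approximate translation of the XPath expressions above, assuming the same section ids exist on the page. The title entry is left out of this variant for simplicity.

```python
import json


def build_speakable_jsonld(name, url, css_selectors):
    """Build WebPage JSON-LD whose speakable sections are given as CSS selectors."""
    return {
        "@context": "https://schema.org/",
        "@type": "WebPage",
        "name": name,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": list(css_selectors),
        },
        "url": url,
    }


markup = build_speakable_jsonld(
    name="AEO Guide: Optimizing for Voice Search",
    url="https://example.com/aeo/voice-search-guide",
    css_selectors=[
        # Assumed equivalents of the XPath targets in the example above.
        "section#speakable-intro > p:first-of-type",
        "section#speakable-summary > p",
    ],
)

# Paste the printed object into a <script type="application/ld+json"> tag.
print(json.dumps(markup, indent=2))
```

CSS selectors tend to survive template changes better than absolute XPath expressions, because they do not depend on the full path from the html root. Either form should be validated after every layout change.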

Critical content rules for marked sections

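Max 2 minutes of TTS audio: Mark sections that cumulatively produce no more than 2 minutes of spoken content (~200-300 words, depending on speaking rate). Overly long speakable sections risk being truncated. Google's own guidance for individual sections is tighter: around 20-30 seconds each, or roughly two to three sentences.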

Complete sentences only: Speakable content must consist of grammatically complete sentences ending with periods, question marks, or exclamation marks. Fragments and headline-style text produce poor TTS output.

No HTML artifacts in speakable sections: Speakable sections must not contain tables, lists with bullet characters that render as literal symbols, or markdown-style formatting that becomes audio artifacts when read aloud.

Self-contained meaning: Each speakable section should stand alone - comprehensible without the surrounding page context. If the text requires visual formatting or preceding content to make sense, it's not suitable for audio delivery.
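
Three of the four rules above can be roughly automated. The sketch below is a heuristic linter and not an official validator. The speaking rate of 150 words per minute, the 40-word sentence ceiling, and the artifact character list are assumptions chosen for illustration. The fourth rule, self-contained meaning, still needs a human reader.

```python
import re

WORDS_PER_MINUTE = 150     # assumed TTS speaking rate
MAX_TOTAL_SECONDS = 120    # the 2-minute budget from the rules above
MAX_SENTENCE_WORDS = 40    # assumed ceiling for a comfortably spoken sentence
ARTIFACTS = re.compile(r"[•→|*#<>]")  # assumed set of formatting leftovers


def estimated_seconds(text):
    """Estimate spoken duration from the word count."""
    return len(text.split()) / WORDS_PER_MINUTE * 60


def check_speakable(text):
    """Return a list of rule violations; an empty list means no issues found."""
    problems = []
    stripped = text.strip()

    if estimated_seconds(stripped) > MAX_TOTAL_SECONDS:
        problems.append("longer than the 2-minute audio budget")

    if not stripped.endswith((".", "?", "!")):
        problems.append("does not end with a period, question mark, or exclamation mark")

    if ARTIFACTS.search(stripped):
        problems.append("contains formatting artifacts")

    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", stripped)
    if any(len(sentence.split()) > MAX_SENTENCE_WORDS for sentence in sentences):
        problems.append("contains a sentence that is too long to follow by ear")

    return problems


if __name__ == "__main__":
    candidate = "Key facts: • Published: March 2026 • Author: Jane Smith"
    for problem in check_speakable(candidate):
        print("-", problem)
```

A check like this fits well in a publishing pipeline as a warning, with the final decision left to an editor who has listened to the passage.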

Speakable Content Quality - Before and After Examples

Real content examples showing the difference between speakable-appropriate and speakable-inappropriate writing:

Too long + complex

AEO - which stands for Answer Engine Optimization - is a comprehensive digital marketing practice that involves the systematic analysis, structuring, and optimization of web content across multiple dimensions including but not limited to schema markup implementation, featured snippet targeting, voice search query alignment, entity disambiguation, passage relevance scoring, and AI-mediated citation pathway optimization for the purpose of maximizing content retrieval probability in answer engine contexts.

Optimal - clear, complete, concise

AEO (Answer Engine Optimization) is the practice of structuring web content for direct retrieval by AI-powered answer systems. Unlike traditional SEO, which targets organic click-throughs, AEO targets the moment when an AI system selects which source to cite in its answer.

Contains HTML artifacts

Key facts: • Published: March 2026 • Author: Jane Smith • Category: AEO Basics → Read more: Voice Search 101 | Structured Data | Entity Recognition

Self-contained, voice-ready

Speakable schema marks specific sections of a web page for text-to-speech delivery via Google Assistant. When a user asks a voice query that matches the page's topic, Google reads the speakable-marked section aloud rather than requiring the user to open the full article.
