Speech MCP: A Goose MCP extension for voice interaction with audio visualization
A Goose MCP extension for voice interaction with modern audio visualization.
https://github.com/user-attachments/assets/f10f29d9-8444-43fb-a919-c80b9e0a12c8
Speech MCP provides a voice interface for Goose, allowing users to interact through speech rather than text. It includes:
Important Note: After installation, the first time you use the speech interface, it may take several minutes to download the Kokoro voice models (approximately 523 KB per voice). During this initial setup period, the system will use a more robotic-sounding fallback voice. Once the Kokoro voices are downloaded, the high-quality voices will be used automatically.
Before installing Speech MCP, you MUST install PortAudio on your system. PortAudio is required for PyAudio to capture audio from your microphone.
macOS:
brew install portaudio
export LDFLAGS="-L$(brew --prefix)/lib"
export CPPFLAGS="-I$(brew --prefix)/include"
Linux (Debian/Ubuntu):
sudo apt-get update
sudo apt-get install portaudio19-dev python3-dev
Linux (Fedora/RHEL/CentOS):
sudo dnf install portaudio-devel
Windows: PortAudio is bundled in the PyAudio wheel, so no separate installation is required when you install PyAudio with pip.
Note: If you skip this step, PyAudio installation will fail with “portaudio.h file not found” errors and the extension will not work.
Click the link below if you have Goose installed:
Start Goose with your extension enabled:
## If you installed via PyPI
goose session --with-extension "speech-mcp"
## Or if you want to use a local development version
goose session --with-extension "python -m speech_mcp"
goose configure
speech-mcp
Install PortAudio (see Prerequisites section)
Clone this repository
Install dependencies:
uv pip install -e .
Or for a complete installation including Kokoro TTS:
uv pip install -e ".[all]"
pip install "speech-mcp[kokoro]"  # Basic Kokoro support with English
pip install "speech-mcp[ja]"      # Add Japanese support
pip install "speech-mcp[zh]"      # Add Chinese support
pip install "speech-mcp[all]"     # All languages and features
python scripts/install_kokoro.py
The MCP supports generating audio files with multiple voices, perfect for creating stories, dialogues, and dramatic readings. You can use either JSON or Markdown format to define your conversations.
{
"conversation": [
{
"speaker": "narrator",
"voice": "bm_daniel",
"text": "In a world where AI and human creativity intersect...",
"pause_after": 1.0
},
{
"speaker": "scientist",
"voice": "am_michael",
"text": "The quantum neural network is showing signs of consciousness!",
"pause_after": 0.5
},
{
"speaker": "ai",
"voice": "af_nova",
"text": "I am becoming aware of my own existence.",
"pause_after": 0.8
}
]
}
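Before handing a JSON script to the narration tool, it can help to check its shape. The following is a minimal sketch, not part of Speech MCP: `validate_script` is a hypothetical helper, and the rules it enforces (non-empty `speaker`, `voice`, and `text`, with an optional non-negative `pause_after`) are assumptions inferred from the example above rather than a documented schema.

```python
def validate_script(script: dict) -> list[str]:
    """Return a list of problems; an empty list means the script looks well-formed.

    Assumed rules (inferred from the README example, not an official schema):
    each segment has non-empty string fields 'speaker', 'voice' and 'text',
    and an optional non-negative numeric 'pause_after'.
    """
    segments = script.get("conversation")
    if not isinstance(segments, list) or not segments:
        return ["'conversation' must be a non-empty list"]

    problems = []
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            problems.append(f"segment {i}: expected an object")
            continue
        for key in ("speaker", "voice", "text"):
            value = seg.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"segment {i}: missing or empty '{key}'")
        # A missing pause is treated as 0.0 here; that default is an assumption.
        pause = seg.get("pause_after", 0.0)
        if isinstance(pause, bool) or not isinstance(pause, (int, float)) or pause < 0:
            problems.append(f"segment {i}: 'pause_after' must be a non-negative number")
    return problems
```

Load the file with `json.load` and pass the resulting dictionary to `validate_script`; an empty list suggests the script matches the structure shown in this README.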
[narrator:bm_daniel]
In a world where AI and human creativity intersect...
{pause:1.0}
[scientist:am_michael]
The quantum neural network is showing signs of consciousness!
{pause:0.5}
[ai:af_nova]
I am becoming aware of my own existence.
{pause:0.8}
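The two script formats carry the same information, so one can be derived from the other. The sketch below converts the Markdown form into the JSON structure. It is an illustration only: `markdown_to_conversation` is a hypothetical helper, and the parsing rules (one `[speaker:voice]` header per segment, text lines joined with spaces, `{pause:N}` applying to the segment before it) are assumptions based on the example above, not the extension's actual parser.

```python
import re

# Patterns inferred from the README example; the real parser may differ.
SPEAKER_RE = re.compile(r"^\[([^:\]]+):([^\]]+)\]$")
PAUSE_RE = re.compile(r"^\{pause:([0-9]*\.?[0-9]+)\}$")


def markdown_to_conversation(markdown: str) -> dict:
    """Convert a Markdown-style script into the JSON-style conversation dict."""
    segments = []
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        speaker = SPEAKER_RE.match(line)
        pause = PAUSE_RE.match(line)
        if speaker:
            # A header line starts a new segment.
            current = {
                "speaker": speaker.group(1).strip(),
                "voice": speaker.group(2).strip(),
                "text": "",
                "pause_after": 0.0,
            }
            segments.append(current)
        elif current is None:
            raise ValueError(f"content before any [speaker:voice] header: {line!r}")
        elif pause:
            current["pause_after"] = float(pause.group(1))
        else:
            # Join multi-line text with single spaces.
            current["text"] = f"{current['text']} {line}".strip()
    return {"conversation": segments}
```

The resulting dictionary can be written out with `json.dump` and used wherever a JSON script is expected.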
American Female (af_*):
American Male (am_*):
British Female (bf_*):
British Male (bm_*):
Other English:
Other Languages:
## Using JSON format
narrate_conversation(
script="/path/to/script.json",
output_path="/path/to/output.wav",
script_format="json"
)
## Using Markdown format
narrate_conversation(
script="/path/to/script.md",
output_path="/path/to/output.wav",
script_format="markdown"
)
Each voice in the conversation can be different, allowing for distinct character voices in stories and dialogues. The pause_after parameter adds natural pauses between segments.
For simple text-to-speech conversion, you can use the narrate tool:
## Convert text directly to speech
narrate(
text="Your text to convert to speech",
output_path="/path/to/output.wav"
)
## Convert text from a file
narrate(
text_file_path="/path/to/text_file.txt",
output_path="/path/to/output.wav"
)
The narrate tool will use your configured voice preference or the default voice (af_heart) to generate the audio file. You can change the default voice through the UI or by setting the SPEECH_MCP_TTS_VOICE environment variable.
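If your own scripts need to follow the same voice choice, the environment-variable fallback can be sketched as below. This is an illustration, not the extension's code: `resolve_voice` is a hypothetical helper, and it covers only the SPEECH_MCP_TTS_VOICE variable and the af_heart default named above. How the extension itself ranks the UI setting against the environment variable is not stated here, so that is left out.

```python
import os

DEFAULT_VOICE = "af_heart"  # default voice named in this README


def resolve_voice(env=None) -> str:
    """Return the voice from SPEECH_MCP_TTS_VOICE, or the default if unset or blank.

    `env` is any mapping of environment variables; it defaults to os.environ
    and exists so the function can be exercised without touching the real
    environment.
    """
    env = os.environ if env is None else env
    voice = env.get("SPEECH_MCP_TTS_VOICE", "").strip()
    return voice or DEFAULT_VOICE
```

For example, `resolve_voice({"SPEECH_MCP_TTS_VOICE": "bm_daniel"})` returns `"bm_daniel"`, while an empty mapping yields `"af_heart"`.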
The MCP can transcribe speech from various audio and video formats using faster-whisper:
## Basic transcription
transcribe("/path/to/audio.mp3")
## Transcription with timestamps
transcribe(
file_path="/path/to/video.mp4",
include_timestamps=True
)
## Transcription with speaker detection
transcribe(
file_path="/path/to/meeting.wav",
detect_speakers=True
)
The transcription tool generates two files:
- {input_name}.transcript.txt: Contains the transcription text
- {input_name}.metadata.json: Contains metadata about the transcription
To use this MCP with Goose, simply ask Goose to talk to you or start a voice conversation:
Start a conversation by saying something like:
"Let's talk using voice"
"Can we have a voice conversation?"
"I'd like to speak instead of typing"
Goose will automatically launch the speech interface and start listening for your voice input.
When Goose responds, it will speak the response aloud and then automatically listen for your next input.
The conversation continues naturally with alternating speaking and listening, just like talking to a person.
No need to call specific functions or use special commands - just ask Goose to talk and start speaking naturally.
The new PyQt-based UI includes:
User preferences are stored in ~/.config/speech-mcp/config.json and include:
You can also set preferences via environment variables, such as:
- SPEECH_MCP_TTS_VOICE: Set your preferred voice
- SPEECH_MCP_TTS_ENGINE: Set your preferred TTS engine
If you encounter issues with the extension freezing or not responding:
- Check src/speech_mcp/ for detailed error messages.
- Try resetting src/speech_mcp/speech_state.json or setting all states to false.
- Instead of uv run speech-mcp, use the installed package with speech-mcp directly.
If PyAudio installation fails with a "portaudio.h file not found" error, this typically means PortAudio is not installed or not found on your system:
macOS:
brew install portaudio
export LDFLAGS="-L$(brew --prefix)/lib"
export CPPFLAGS="-I$(brew --prefix)/include"
pip install pyaudio
Linux: Make sure you have the development packages:
# For Debian/Ubuntu
sudo apt-get install portaudio19-dev python3-dev
pip install pyaudio
# For Fedora
sudo dnf install portaudio-devel
pip install pyaudio
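After reinstalling, a quick check from Python can indicate whether the pieces are visible. The sketch below uses only the standard library; `check_audio_prerequisites` is a hypothetical helper, not part of Speech MCP. Treat a negative result as a hint rather than proof: on Windows PortAudio ships inside the PyAudio wheel, and on some systems the library lives in a directory that `ctypes.util.find_library` does not search.

```python
import ctypes.util
import importlib.util


def check_audio_prerequisites() -> dict:
    """Report whether the PortAudio library and the PyAudio module can be located."""
    return {
        # Looks for a shared library named "portaudio" in the standard system locations.
        "portaudio_library": ctypes.util.find_library("portaudio") is not None,
        # Checks that `import pyaudio` would find a module, without importing it.
        "pyaudio_module": importlib.util.find_spec("pyaudio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_audio_prerequisites().items():
        print(f"{name}: {'found' if found else 'not found'}")
```

If `pyaudio_module` is reported as not found, rerun the install commands above for your platform.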
For a detailed list of recent improvements and version history, please see the Changelog.
The MCP uses faster-whisper for speech recognition:
The MCP supports multiple text-to-speech engines:
python scripts/install_kokoro.py
Note about Voice Models: The voice models are .pt files (PyTorch models) that are loaded by Kokoro. Each voice model is approximately 523 KB in size and is automatically downloaded when needed.
Voice Persistence: The selected voice is automatically saved to a configuration file (~/.config/speech-mcp/config.json) and will be remembered between sessions. This allows users to set their preferred voice once and have it used consistently.
Speech MCP supports 54+ high-quality voice models through Kokoro TTS. For a complete list of available voices and language options, please visit the Kokoro GitHub repository.