Speech is an essential part of communication. If you need to incorporate speech into your applications, the Azure AI Foundry Speech service is a great fit. In this talk, we will get an overview of the capabilities of Azure AI Speech, seeing how to use the service for text-to-speech and speech-to-text operations. We will investigate multi-lingual speech translation, analyze the components of speech, and even dive into custom neural voices, giving your applications a unique voice. We'll also compare Azure AI Speech to OpenAI's Whisper model, learning which performs better in specific scenarios. Along the way, we will work with the .NET and Python libraries that make this service available to a wide audience of developers. Finally, because cost is a critical piece of any cloud technology conversation, we'll get an idea of how much it all costs.
ADDITIONAL MEDIA
No recordings or additional media are available for this talk.
Microsoft's Azure Speech documentation hub is the canonical reference for the service. It includes overviews, quickstarts, how-tos, and API reference for every language the SDK supports.
The Azure-Samples/cognitive-services-speech-sdk repo on GitHub has runnable samples in C#, C++, Java, JavaScript, Python, Objective-C, and Swift. It's the fastest way to go from "hello world" to something real.
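As a taste of what those samples cover, a one-shot speech-to-text call in Python is only a few lines. This is a sketch following the quickstart pattern, not a definitive implementation: `transcribe_once` is a name of my choosing, the import is deferred so the file loads even without the SDK installed, and you supply your own key and region.

```python
def transcribe_once(key: str, region: str) -> str:
    """Recognize a single utterance from the default microphone.

    Minimal sketch of the SDK quickstart pattern; the import is deferred
    so this module loads even where the Azure Speech SDK isn't installed.
    """
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    # recognize_once() listens until the speaker pauses, then returns one result.
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return ""
```

The same shape carries over almost verbatim to the C# samples in the repo, which is part of why the samples are such a fast on-ramp.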
The pronunciation assessment how-to covers the full scoring model (accuracy, fluency, completeness, prosody, miscue detection), along with the JSON response shape for word- and phoneme-level results.
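Since the how-to centers on that JSON response shape, here is a hedged sketch of pulling scores out of it with nothing but the standard library. The trimmed response below follows the documented field names (`NBest`, `PronunciationAssessment`, `Words`, `ErrorType`) but the scores themselves are made up for illustration.

```python
import json

# A trimmed example response following the documented shape for
# word-level pronunciation assessment results (scores are invented).
response = json.loads("""
{
  "NBest": [{
    "PronunciationAssessment": {
      "AccuracyScore": 96.0,
      "FluencyScore": 98.0,
      "CompletenessScore": 100.0,
      "PronScore": 97.2
    },
    "Words": [
      {"Word": "hello",
       "PronunciationAssessment": {"AccuracyScore": 99.0, "ErrorType": "None"}},
      {"Word": "world",
       "PronunciationAssessment": {"AccuracyScore": 74.0, "ErrorType": "Mispronunciation"}}
    ]
  }]
}
""")

best = response["NBest"][0]
overall = best["PronunciationAssessment"]["PronScore"]

# Flag any word the service marked with a miscue (ErrorType other than "None").
flagged = [w["Word"] for w in best["Words"]
           if w["PronunciationAssessment"]["ErrorType"] != "None"]

print(overall)   # 97.2
print(flagged)   # ['world']
```

Phoneme-level results nest one level deeper under each word, in the same pattern.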
The speech translation overview explains real-time multi-language translation including the newer multi-lingual mode that auto-detects the source language and handles mid-session language switching.
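For a feel for the translation API, here is a fixed-source sketch in Python (the auto-detecting multi-lingual mode described in the overview needs extra source-language configuration, which I've left out for brevity). As before, `translate_once` is a hypothetical helper name and the SDK import is deferred.

```python
def translate_once(key: str, region: str, targets=("fr", "de")) -> dict:
    """Translate one spoken utterance into several target languages.

    Sketch of the real-time translation API with a fixed source language;
    the import is deferred so the module loads without the SDK installed.
    """
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.translation.SpeechTranslationConfig(
        subscription=key, region=region)
    config.speech_recognition_language = "en-US"
    for lang in targets:
        config.add_target_language(lang)

    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=config)
    result = recognizer.recognize_once()

    # result.translations maps each target language code to its translation.
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return dict(result.translations)
    return {}
```

One recognizer call fans out to every target language you registered, which is what makes the multi-language demos in the talk so compact.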
Language and voice support is the definitive list of locales, neural voices, and which features (translation, pronunciation, prosody) are available where. It's essential when scoping a demo or a production rollout.
The pronunciation assessment tool and the broader Speech Studio overview let you try reading, speaking, and gaming scenarios with no code, making them a great way to get a feel for the scoring before writing any SDK integration.
The azure-cognitiveservices-speech package on PyPI is the Python SDK distribution. Keep it bookmarked for release notes and to check which version of the SDK is current before pinning dependencies.