All You Want to Know About Polly: Amazon's Text-to-Speech AI Tool

Eric Miller Originally published Jun 03, 24, updated Jul 20, 24

Speech synthesis, the artificial production of human speech, has many everyday uses. The technology has made life easier for a large section of society, but only recently has it become genuinely pleasant to use. Text-to-speech technology is entering a new era of evolution and usefulness, thanks in part to the boom in artificial intelligence. Let’s look at Amazon’s text-to-speech service, Amazon Polly, and see what it’s all about.

In this article
    1. Simple API
    2. Selection of Languages and Voices
    3. Support for Speech Synthesis Markup Language
    4. Custom Lexicons, Brand Voices, and More

Part 1: What is Text-to-Speech Technology?

Text-to-speech technology, or TTS, converts written text into artificially produced human speech. Earlier TTS output sounded robotic and flat, with no perceivable emotion, which made it feel disconnected from real human speech.

TTS adoption is on the rise, and the market is projected to cross USD 17 billion by 2029. This growth will be driven by rising demand for accessibility features in devices and for voice-enabled devices.

Artificial Intelligence and TTS

TTS is one of the areas that has benefited immensely from artificial intelligence. With AI, TTS systems can now understand context and handle a wide spectrum of languages with ease. The biggest gain, though, is emotion: AI-driven TTS systems can now sound very real, complete with inflections and modulations akin to natural human speech.

Part 2: What is Amazon Polly? How Does It Work?

Amazon Polly is Amazon’s online text-to-speech service and is part of Amazon Web Services (AWS). As with other AWS products, users can ‘deploy’ Amazon Polly for their own purposes and pay according to the pricing option they choose.

How Does Amazon Polly Work?

Amazon Polly uses deep learning technologies to synthesize natural-sounding human speech from text. With dozens of lifelike voices across dozens of languages, users can easily build speech-enabled applications.

To use Amazon Polly, users need to:

1. Choose a voice engine,

2. Call a speech synthesis method,

3. Input the text that needs to be synthesized,

4. Specify the audio output format.

Amazon Polly then converts the input into a high-quality speech audio stream.
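
The four steps above map fairly directly onto a single API call. Below is a minimal sketch using the AWS SDK for Python (boto3) and its synthesize_speech method. The voice "Joanna", the "neural" engine, the sample text, the output file name, and the helper names build_request, synthesize, and main are all assumptions chosen for illustration. Running main() requires boto3 to be installed and AWS credentials and a region to be configured.

```python
def build_request(text, voice_id="Joanna", engine="neural", output_format="mp3"):
    """Collect the choices from the steps above into keyword arguments."""
    return {
        "Engine": engine,               # step 1: voice engine
        "VoiceId": voice_id,            # which voice speaks (illustrative default)
        "Text": text,                   # step 3: text to synthesize
        "OutputFormat": output_format,  # step 4: audio output format
    }


def synthesize(client, text, **options):
    """Step 2: call the synthesis method and return the audio as bytes."""
    response = client.synthesize_speech(**build_request(text, **options))
    return response["AudioStream"].read()


def main():
    import boto3  # imported here so the helpers above work without the SDK

    polly = boto3.client("polly")
    audio = synthesize(polly, "Hello from Amazon Polly!")
    with open("hello.mp3", "wb") as audio_file:  # hypothetical file name
        audio_file.write(audio)
```

Passing the client in as an argument keeps the helpers easy to try out with a stand-in object before any real AWS call is made.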

Part 3: Features of Amazon Polly

AWS products have always been developer-centric and, as such, require some know-how of coding and ‘deploying’. As with all AWS products, Polly comes with a comprehensive feature set designed to fit all sorts of uses.

Simple API

Users simply send the text they want synthesized to the Amazon Polly API, and Amazon Polly immediately returns the audio stream to the application for use.

Selection of Languages and Voices

Amazon Polly already includes dozens of lifelike voices, but now, along with standard and neural TTS voices, Polly also includes long-form and generative voices. This means that Amazon Polly can now offer voices that are much closer to real, human voices than ever before.
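
To check which voices are available for a particular engine, the SDK offers a describe_voices call. The sketch below assumes boto3; the helper name voices_for_engine and the default language "en-US" are illustrative choices, not part of Polly itself.

```python
def voices_for_engine(client, engine, language_code="en-US"):
    """Return the sorted IDs of voices offered for one engine and language."""
    voice_ids = []
    params = {"Engine": engine, "LanguageCode": language_code}
    while True:
        page = client.describe_voices(**params)
        voice_ids.extend(voice["Id"] for voice in page["Voices"])
        token = page.get("NextToken")
        if not token:  # no further pages of results
            return sorted(voice_ids)
        params["NextToken"] = token
```

With a real client created through boto3.client("polly"), calling voices_for_engine(polly, "generative") would list the generative voices for US English, assuming the configured region offers that engine.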

Support for Speech Synthesis Markup Language

Amazon Polly supports SSML, which makes it possible to adjust speech style, loudness, pitch, and speech rate. Amazon-specific SSML tags open up further use cases, helping users create more lifelike speech for better user attention and retention.
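
As a rough illustration of what such markup looks like, the sketch below wraps plain text in the standard SSML speak and prosody tags. The helper name build_ssml is hypothetical, and the attribute values shown ("slow", "loud") are examples only.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, rate=None, volume=None, pitch=None):
    """Wrap text in <speak> and, if any setting is given, a <prosody> tag."""
    settings = {"rate": rate, "volume": volume, "pitch": pitch}
    attrs = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in settings.items()
        if value is not None
    )
    body = escape(text)  # escape &, < and > so the markup stays well-formed
    if attrs:
        body = f"<prosody{attrs}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(build_ssml("Tom & Jerry", rate="slow", volume="loud"))
```

The resulting string can be sent to Polly by passing TextType="ssml" to synthesize_speech. Not every engine supports every prosody attribute, so it is worth checking the Polly SSML documentation for the voice in use.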

Custom Lexicons, Brand Voices, and More

Custom lexicons allow users to modify the pronunciation of certain words for a more lifelike speech experience. Amazon Polly also offers users the ability to work with the Polly team to create a custom neural TTS voice for exclusive use by their organization. Polly is feature-rich, and most users are sure to find it a good fit for their work.
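
Polly lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) format and uploaded with the put_lexicon call. The sketch below builds a pared-down PLS document that maps a written form to a spoken alias. The helper names build_lexicon and upload_lexicon and the sample entries are assumptions for illustration, and AWS's own examples carry extra schema attributes that are omitted here, so treat it as a starting point to check against the Polly lexicon documentation.

```python
from xml.sax.saxutils import escape, quoteattr

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, language="en-US"):
    """Build a PLS document that expands each written form into its alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}" '
        f'alphabet="ipa" xml:lang={quoteattr(language)}>\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


def upload_lexicon(client, name, aliases):
    """Store the lexicon under a short alphanumeric name."""
    client.put_lexicon(Name=name, Content=build_lexicon(aliases))
```

Once uploaded, a lexicon is applied by passing its name in the LexiconNames list of a synthesize_speech call, for example LexiconNames=["acronyms"] for a lexicon stored under that hypothetical name.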

Part 4: Use Cases

Amazon Polly can be used in a variety of scenarios. This flexibility is by design and makes the Amazon Polly text-to-speech tool a versatile TTS application for everyone. Across industries, companies are using Amazon Polly as their preferred TTS solution to keep their customers satisfied.

4.1: The Washington Post

The Washington Post has deployed Amazon Polly to let users listen to the stories on its news site instead of reading them, giving users a new way to interact with the site and driving user engagement. Using the Amazon Polly text-to-speech tool not only lets readers who are short of time hear the stories but also helps WaPo make its legendary journalism accessible to all.

4.2: Trinity Audio WordPress Plugin

If you have ever designed a WordPress site and used Trinity Audio’s WordPress plugin to convert text to speech on the website, you have already used Amazon Polly! In today’s fast-paced world, many people prefer audio and video to reading. To make your web content accessible to everyone and grow your audience, you need all sorts of tools available, including text-to-speech tools that convert your web content into audio for people to hear on the move.

4.3: CommonLit

CommonLit uses Amazon Polly text-to-speech technology to give learners a rich learning experience. Users can click the Read Aloud button to hear Polly read the text out loud naturally, fluidly, and fluently. Instead of paying voice actors to read the texts, CommonLit chose Amazon Polly and gained far more than it would have by using voice actors.

Closing Words

There is a reason developers love Amazon Web Services: if you know how, using and integrating complex web technologies into your work is very easy. The price points are fair, the quality of service is excellent, and the documentation is extensive. What’s not to like? Just one thing: it might not be for everyone! Amazon Polly is an excellent, professional TTS service that is highly customizable and can be integrated into nearly anything. But if all you are looking for is the occasional text-to-speech conversion, you might be better off with an even easier-to-use online text-to-speech conversion tool such as Wondershare Virbo. Give it a try today!

