Research/new digital reading experiences
From Publication Station
=== Text to Speech ===
An important aspect of immersive reading is voiceover. Tone, inflection, rhythm, and emphasis all shape how the reader experiences the text. Ideally, a human would narrate the text, but in practice this is often too expensive and time-consuming. We therefore ran some experiments to determine the usability of various AI Text-to-Speech (TTS) technologies that could create voiceovers from text files.

<span id="ssml"></span>
==== SSML ====
After our initial exploration of various mainstream TTS products, we concluded that the voices sound quite convincing and pleasant, but that the intonation is unnatural and does not emphasize the right words. We looked for a solution to this problem in Speech Synthesis Markup Language (SSML). SSML is a way to include markings in the source text that indicate emphasis, pauses, and other cues that can improve the speech. Our idea was to use ChatGPT to add these markings to the source text, so that a TTS product could use them to improve its speech pattern.

<span id="comparing-tts"></span>
==== Comparing TTS ====
The next step was a more thorough audit of various TTS services to determine their quality and their support for SSML.

'''Free option: Amazon Polly'''

https://aws.amazon.com/polly/

Amazon Polly produced unnatural-sounding speech. A higher-quality version is available, but we could not access it from the Netherlands. Polly claims to support SSML, but its interpretation often sounds stilted.

'''Free option: Crikk'''

https://crikk.com/

Crikk works better out of the box, but it does not support SSML, so its output cannot be improved upon. It does support pauses, but nothing else.

'''Free option: Google TTS'''

https://cloud.google.com/text-to-speech

Google TTS works better than Amazon Polly and supports SSML, but the intonation still remains very unnatural.

'''Paid option: Elevenlabs'''

https://elevenlabs.io/

Elevenlabs offers a trial of 10,000 characters per month. It works much better than all the other options. It appears to do its own preprocessing to work out the right intonation for each sentence. It doesn’t support SSML, but frankly it doesn’t need it.

Overall, there is more variation in quality between different TTS services than we expected, and the large players don’t necessarily perform best. However, given the pace of innovation in this area, we expect the quality to improve drastically in the coming years.

<span id="elective-immersive-reading"></span>
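
To make the SSML markings described in the SSML subsection more concrete, here is a minimal sketch in Python of what annotated text could look like. It is an illustration only, not the workflow we used: the <code>build_ssml</code> helper and its <code>(kind, value)</code> segment format are hypothetical names introduced for this example. The <code>speak</code>, <code>emphasis</code>, and <code>break</code> elements come from the SSML standard, but how well a given TTS service honours them varies, as noted above.

```python
import xml.etree.ElementTree as ET


def build_ssml(segments):
    """Build an SSML string from (kind, value) segments.

    Hypothetical helper for illustration. kind is one of:
      "text"     -> plain text to be spoken
      "emphasis" -> text to be stressed (value is the text)
      "break"    -> a pause (value is a duration such as "500ms")
    """
    speak = ET.Element("speak")
    last = None  # most recent child element; later text becomes its tail
    for kind, value in segments:
        if kind == "text":
            if last is None:
                speak.text = (speak.text or "") + value
            else:
                last.tail = (last.tail or "") + value
        elif kind == "emphasis":
            last = ET.SubElement(speak, "emphasis", level="strong")
            last.text = value
        elif kind == "break":
            last = ET.SubElement(speak, "break", time=value)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml([
    ("text", "She opened the door. "),
    ("break", "500ms"),
    ("text", "It was "),
    ("emphasis", "empty"),
    ("text", "."),
])
print(ssml)
```

The resulting string wraps the sentence in a <code>speak</code> element, inserts a half-second pause after the first sentence, and marks the word "empty" for strong emphasis. Markup of this kind is what we hoped ChatGPT could add to a source text automatically before passing it to an SSML-capable TTS service.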