Saturday 20 June 2009

By-Voice: Free online text-to-speech synthesis

I stumbled across a cool little service at by-voice.com. Basically, this is a simple web service that converts text to computer-generated speech. You use it by making an HTTP request to by-voice, which makes it a little awkward for everyday users, but it's wonderful for programmers: if you signed up for the full service, you could write programs that download speech on the fly and play it.



The free service gives you 200 speech-generation credits a month. But unfortunately this doesn't mean 200 downloads, as we shall see later.

There are a bunch of different 'speakers':

carlos - Spanish (European) Male Voice
laura - Spanish (European) Female Voice
amaya - Spanish (European) Female Voice
jennifer - English (US) Female Voice
daniel - English (UK) Male Voice
jane - English (US) Female Voice
celia - Spanish (Mexican) Female Voice
oriol - Catalan Male Voice
meritxell - Catalan Female Voice
empar - Valencian Female Voice
amaia - Basque Female Voice
adriana - Portuguese (European) Female Voice
julia - Portuguese (Brazilian) Female Voice
brigitte - French Female Voice
freire - Galician Male Voice
javier - Spanish (Argentina) Male Voice
isabel - Spanish Female Voice

Unfortunately they can only read text in their own languages, which is a shame because I'd quite like to have my laptop speak to me in exotic female voices.

Anyway, to use the service you have to sign up and get sent a 'developer key' by email. Then you can download speech by crafting a URL like this:

http://www.by-voice.com/ttsonline/get_speak_3scale.php?account=your3scalekey&voice=carlos&format=mp3&text=hola%20mundo

This one will make 'Carlos' say 'Hola mundo', which I suspect is "Hello World" in Spanish. Notice that you have to change 'account=your3scalekey' to use the key you received in the email. Furthermore, note that all spaces in the speech text have to be replaced by '%20', which is the standard URL percent-encoding for a space (character code 0x20).
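Spaces are not the only characters that need escaping: an '&' in the text would start a new URL parameter and a '#' would cut the URL short. Below is a minimal sketch, in plain POSIX sh, of a fuller percent-encoder. The names is_unreserved and urlencode are hypothetical helpers, not part of by-voice, and encoding non-ASCII letters (such as 'ñ') as UTF-8 bytes is an assumption about what the service expects.

```shell
#!/bin/sh
# Hypothetical helpers (not part of by-voice): percent-encode text for the
# 'text=' parameter. Non-ASCII characters become their UTF-8 bytes, which
# assumes the service expects UTF-8.

# Succeed if byte value $1 is in the RFC 3986 "unreserved" set:
# 0-9, A-Z, a-z, '-', '.', '_' and '~'.
is_unreserved() {
  [ "$1" -ge 48 ] && [ "$1" -le 57 ] && return 0
  [ "$1" -ge 65 ] && [ "$1" -le 90 ] && return 0
  [ "$1" -ge 97 ] && [ "$1" -le 122 ] && return 0
  case "$1" in 45|46|95|126) return 0 ;; esac
  return 1
}

urlencode() {
  out=""
  # od prints each byte of the input as an unsigned decimal number
  for b in $(printf '%s' "$1" | od -An -v -tu1); do
    if is_unreserved "$b"; then
      # Turn the byte value back into its character via an octal escape
      out="$out$(printf "\\$(printf '%03o' "$b")")"
    else
      # Everything else becomes %XX with two uppercase hex digits
      out="$out$(printf '%%%02X' "$b")"
    fi
  done
  printf '%s\n' "$out"
}
```

For example, urlencode 'hola mundo' prints hola%20mundo. Pasted into a script, the line that sets TEXT could call urlencode "$*" instead of a space-only sed substitution.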

Obviously, you can change the speaker by changing the 'voice=' argument. So, 'voice=jane' gets you speech in a female US accent. If you send her "Hola mundo", though, she'll probably make a terrible mess of it; better to have her say "Hello World".

'format=' can be changed to 'format=wav' if you want Microsoft .wav files, but these cost more credits. Yes, I'm afraid you can't just keep downloading as many phrases as you like: the free sign-up only gets you 200 credits a month. 200 is a lot, right? Well, no. A credit is PER CHARACTER, so you can quickly use up all your free credits and then have to wait a month for another 200. For some strange reason, each character spoken in .mp3 format costs only 1 credit, while the same character in .wav costs 4 credits.
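Before spending credits, it can help to estimate the cost of a phrase. This is a rough sketch using the rates above (1 credit per character for mp3, 4 for wav). It assumes every character, spaces included, is charged, which the service may count differently; credits_needed is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical estimator: credits = number of characters * per-format rate.
# ASSUMPTION: spaces and punctuation are charged like any other character.
credits_needed() {
  text=$1
  case "$2" in
    mp3) rate=1 ;;
    wav) rate=4 ;;
    *) echo "unknown format: $2" >&2; return 1 ;;
  esac
  # ${#text} is the length of the text in characters
  echo $(( ${#text} * $rate ))
}

credits_needed "hola mundo" mp3   # prints 10
credits_needed "hola mundo" wav   # prints 40
```

By this count, the 39-character phrase "New hardkore history pod cast available" costs 39 credits as mp3, so the free 200 would cover it only five times.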

Still, you can have lots of fun with this, and I'm even thinking of signing up to the 45-euro paid plan for a month, which gives 45000 credits. To be honest, though, I think by-voice would do better to have a web front-end where you could buy just the credits you need, when you want them.

Okay, here's how to use the service under Linux. There are a bunch of command-line programs under Linux that will copy the contents of a URL to disk, such as wget and curl. I use 'links -source', which uses the links web browser as a downloader rather than a browser. So:

links -source "http://www.by-voice.com/ttsonline/get_speak_3scale.php?account=your3scalekey&voice=carlos&format=mp3&text=hola%20mundo" > hola_mundo.mp3

That fetches the file for me. Notice that I had to put the whole URL in quotes: '&' is a special character for the shell (it runs the preceding command in the background), and it will confuse things if you don't quote the string.
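Since wget and curl work just as well, here is a sketch of the same request made with curl. KEY is a placeholder for your own developer key, and byvoice_url and byvoice_fetch are hypothetical function names; the URL and its parameters are exactly the ones shown above.

```shell
#!/bin/sh
# Sketch: the same request as the links command, made with curl instead.
# KEY is a placeholder; substitute the developer key from your email.
KEY="your3scalekey"

# Print the request URL; $3 must already be URL-encoded (spaces as %20).
byvoice_url() {
  printf 'http://www.by-voice.com/ttsonline/get_speak_3scale.php?account=%s&voice=%s&format=%s&text=%s\n' \
    "$KEY" "$1" "$2" "$3"
}

# Download into file $4. -f fails on HTTP errors, -sS hides the progress
# meter but still reports errors, -o names the output file.
byvoice_fetch() {
  curl -fsS -o "$4" "$(byvoice_url "$1" "$2" "$3")"
}
```

With network access and a valid key, byvoice_fetch carlos mp3 hola%20mundo hola_mundo.mp3 should produce the same file; wget -q -O hola_mundo.mp3 "$(byvoice_url carlos mp3 hola%20mundo)" is the wget equivalent. As with links, the URL stays inside double quotes so the shell never sees the bare '&'.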

Obviously, typing in that url all the time is a pain, so I wrote a trivial little script, 'byvoice.sh':


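#!/bin/sh
# byvoice.sh: fetch speech for the words given as arguments; audio goes to stdout
VOICE=jennifer
KEY="your3scalekey"   # replace with the developer key from your email
FORMAT=mp3

# Join all arguments into one string and replace each space with %20
TEXT=$(printf '%s' "$*" | sed 's/ /%20/g')

links -source "http://www.by-voice.com/ttsonline/get_speak_3scale.php?account=$KEY&voice=$VOICE&format=$FORMAT&text=$TEXT"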


The script builds the URL for me, including replacing the spaces in the text, so now I can just type:
./byvoice.sh "Battery low" > BatteryLow.mp3


And I have my .mp3. Sometimes you have to play about a little with the text to get the voices to say it right. For instance, I wanted speech to tell me when Dan Carlin updates his 'Hardcore History' podcast. I started with:


./byvoice.sh "New hardcore history podcast available" > NewHardcoreHistory.mp3

Unfortunately 'Jane' isn't familiar with the word 'podcast', and pronounces it 'powwdcast'. I solved this by splitting it into two words, 'pod cast', which she then gets right.
She does even worse with 'Hardcore', managing to make it sound like 'hard porn', which is not something I want coming out of my laptop when others are around (hence, all the 'hardporn podcast' alerts are silent!). The solution here was to replace the 'c' in 'core' with a 'k'. So finally:


./byvoice.sh "New hardkore history pod cast available" > NewHardcoreHistory.mp3

And I have what I want. But then I'd used up all my credits.
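Respellings like these are easier to reuse if they live in one place rather than in each command. A minimal sketch, assuming a hypothetical respell helper built on sed; the two rules are just the ones found above for this voice, and other voices may need different ones.

```shell
#!/bin/sh
# Hypothetical helper: apply the respellings found by trial and error
# before the text is sent. Add one -e rule per troublesome word.
respell() {
  printf '%s\n' "$1" | sed \
    -e 's/\([Pp]\)odcast/\1od cast/g' \
    -e 's/\([Hh]\)ardcore/\1ardkore/g'
}
```

Usage: ./byvoice.sh "$(respell "New hardcore history podcast available")" > NewHardcoreHistory.mp3 sends the same text as the final command above. The captured first letter keeps the original capitalisation, so "Podcast" becomes "Pod cast".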

So, there it is. Go over to 'www.by-voice.com' to play with it, although start off with SHORT sentences if you want to get anywhere with your 200 credits.
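With credits this scarce, one way to stretch them is never to download the same phrase twice. This is a sketch of a hypothetical wrapper around byvoice.sh (cached_speech, BYVOICE and CACHE_DIR are made-up names), assuming the script writes the audio to stdout as in the examples above.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of the original script: reuse phrases that
# were already downloaded instead of spending credits on them again.
# Assumes byvoice.sh writes the audio to stdout.
BYVOICE=${BYVOICE:-./byvoice.sh}
CACHE_DIR=${CACHE_DIR:-$HOME/.byvoice}

# Print the path of the cached .mp3 for phrase $1, downloading it first if needed.
cached_speech() {
  mkdir -p "$CACHE_DIR" || return 1
  # Build a file name from the phrase, e.g. "Battery low" -> Battery_low
  name=$(printf '%s' "$1" | tr -c 'A-Za-z0-9' '_')
  file="$CACHE_DIR/$name.mp3"
  if [ ! -s "$file" ]; then
    # Write to a temporary file so a failed download leaves no cache entry
    if "$BYVOICE" "$1" > "$file.tmp"; then
      mv "$file.tmp" "$file"
    else
      rm -f "$file.tmp"
      return 1
    fi
  fi
  printf '%s\n' "$file"
}
```

Something like mpg123 "$(cached_speech 'Battery low')" would then only cost credits the first time. Note that phrases differing only in punctuation (say "Battery low" and "Battery-low") map to the same file name.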



1 comment:

  1. This seems very cool. Kind of 2001 Space Odyssey. Have you seen anything that does the reverse really well? Where a person could talk and have it transcribed into text. I am looking for something like that to speed up my composing stuff.