Let's talk : Respeaking

Do you think subtitles on TV have gotten worse? Say hello to respeaking.

What is respeaking?

A respeaker (or voice writer) uses a mask or speech silencer to repeat what they hear into their computer, which uses voice recognition software to translate speech into text.

Image: www.gccra.org

The people providing live subtitling for television are called captioners. If they use voice recognition, they are respeakers. In the UK, respeaking is used only in live captioning; it’s not usually used with deaf people.

Speech-to-Text Reporters (STTRs) are called CART Providers in the USA, or stenographers. They use verbatim shorthand machines, which have been around for over 20 years in the UK, so they have always been a profession in their own right. The spoken language is not modified in any way: what appears on the screen is exactly what is spoken, and this relies on the shorthand skills of the Verbatim Speech-to-Text Reporter.

Respeaking is relatively new in the UK and is seen as something completely different due to the very different skills involved.

The qualification exam to become an STTR is only open to people able to write verbatim shorthand at 180 wpm. In the US, certification is at 225 wpm.

Respeaking is a very different skill: you need to talk fast and very accurately, and know a lot about speech recognition software. Most training time is spent on three questions:

How fast can you talk before recognition errors occur?
How can you speak very precisely, even when talking fast?
How can you listen, respeak, reread what the computer writes and correct it by hand, all nearly simultaneously?

The BBC used to pay stenographers to write the live subtitles for TV programmes. To cut costs, they started using respeakers. What is the difference between a stenographer and a respeaker? What are the issues a consumer should be thinking about when deciding which to use? We spoke to Voice4Me*, a large provider of respeakers, to find out more.

SPEED

A stenographer can write up to 350 words a minute and speed can be an issue for respeakers.

HEALTH & SAFETY

Respeakers are advised to do only 15-minute stints for up to two hours of work, as doing any more can damage the vocal cords. How can their voice maintain quality? This has been researched extensively by Voice4Me, who have been training and providing respeakers for many years.

Only four of Voice4Me’s team of over 20 can reach 98.5% accuracy on regular political TV programmes, and they wouldn’t reach that on a more ad hoc programme without much prep. It is not possible to respeak constantly over a two-hour period or a whole day and maintain voice quality. Voice4Me found that the quality drops as the stints get longer. They may claim to be better and to do everything, but the evidence isn’t there.

ACCURACY

Not everyone can walk in off the street and be a good respeaker. Voice4Me have stretched the boundaries and still find the service wanting. They have had varying degrees of success with their staff, who are very familiar with the industry and the requirements of a deaf audience. Their respeakers are highly trained and are only required to reach 97% accuracy. On rolling news, they don’t give them a content/contextual percentage, as the accuracy level would drop again. Stenographers are required to hit 99% accuracy and 98% content. Only a handful of respeakers can hit 98%, and always on familiar content, and very few are verbatim.

What is the difference between 97% and 99% accuracy? It is the difference between stenographic captions and TV subtitles. It doesn’t sound like a lot, but it is a big difference. Respeakers are supposed to hit the target of 97%, but not all of them do. That isn’t contextual accuracy. That’s just what they get down – the content. STTRs have a contextual accuracy requirement, so they have to get everything down and hit the accuracy level of 99% as well. Voice4Me recognise the differences in skills and their respeakers are better trained than any other. And that is only 15 minutes’ worth.
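To put rough numbers on that gap, here is a minimal sketch in Python. It assumes that accuracy is a simple per-word percentage and that speech arrives at the 180 wpm entry-level rate; the article does not say how Voice4Me measures accuracy, and the function name expected_errors is made up for illustration.

```python
def expected_errors(words_per_minute, accuracy_percent, minutes=1):
    """Expected number of wrong or missing words over a period.

    Assumption: accuracy is a plain per-word percentage. Real subtitling
    accuracy measures may weight errors differently.
    """
    words_heard = words_per_minute * minutes
    return words_heard * (100 - accuracy_percent) / 100


# Compare the two targets over one minute and over a 15-minute stint.
for accuracy in (97, 99):
    per_minute = expected_errors(180, accuracy)
    per_stint = expected_errors(180, accuracy, minutes=15)
    print(f"{accuracy}%: {per_minute} errors a minute, {per_stint} per 15-minute stint")
```

Under these assumptions, 97% means about five errors a minute against fewer than two at 99%, so the respeaker’s output carries three times as many errors for the viewer to read around.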

In the UK and the USA, stenographers have to be able to write verbatim. This is not the case in Europe. A stenographer will include information such as laughter, applause, and colloquies. In the UK, speech-to-text reporters are required to have a minimum of 180 words per minute (wpm) with a syllabic density of 2.4. In reality, they have to write a lot faster than that to be verbatim, generally over 200 wpm. 180 wpm is the entry level.

Many stenographer clients would not accept the accuracy levels of respeakers. Clients often want the verbatim nature of speech-to-text. The danger is you are paying for a service that won’t meet your needs, so why pay for it at all?

PREPARATION

Preparation prior to a job is very important and affects accuracy. Respeakers need more preparation time and material than stenographers do, and it matters more to their results. Most respeaking jobs provide neither.

FLEXIBILITY AND ADAPTABILITY

There are some great respeakers doing subtitles but they have been specially trained and have been working in the same environment every day for years – not the varied situations stenographers find themselves in. There is a long way to go and respeakers are not the next new thing. They won’t be able to handle the long hours or full days that stenographers do, nor the varied content.

In the UK, respeakers cannot work onsite; they only work remotely. To work onsite, they would need an interpreter’s booth with the audio piped in, which is not generally available in the UK. Onsite, they can use a special mask to speak into, which softens their voice, but this can still be disruptive. Clients want a discreet service, which will move with them from meeting to meeting, with no fuss.

COST EFFECTIVENESS

A stenographer is more cost effective as they can work on their own, whereas two respeakers are required to do the same job, so it costs more for the consumer. Voice4Me’s respeakers do other subtitling work, which is where the value is found in the employee. Respeakers can’t do the volume of output that a stenographer can, and that is what separates them. You need two respeakers to do what a stenographer does, which makes a stenographer cheaper.
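The cost argument can be sketched as simple arithmetic. The headcounts come from the article (one stenographer against two respeakers), but the hourly rates below are purely hypothetical placeholders, not figures from Voice4Me or any real provider, and the function names are made up for illustration.

```python
def team_cost(hourly_rate, headcount, hours):
    """Total cost of a team billed per person per hour."""
    return hourly_rate * headcount * hours


def break_even_rate(steno_hourly_rate, respeakers_needed):
    """Hourly rate per respeaker at which the team costs the same as one stenographer."""
    return steno_hourly_rate / respeakers_needed


# Hypothetical rates for illustration only.
STENO_RATE = 60
RESPEAKER_RATE = 35
HOURS = 3

steno_total = team_cost(STENO_RATE, 1, HOURS)
respeaker_total = team_cost(RESPEAKER_RATE, 2, HOURS)
print("One stenographer:", steno_total)
print("Two respeakers:", respeaker_total)
print("Respeakers break even at an hourly rate of", break_even_rate(STENO_RATE, 2))
```

With these made-up rates, each respeaker is cheaper per hour but the pair still costs more than the single stenographer. The pair only becomes cheaper once each respeaker’s rate falls below half the stenographer’s.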

STANDARDS

There are a few really good respeakers out there but there are many that are not so good. They are also mostly experienced subtitlers so they know and understand the material they are doing really well – such as the news on television. The sound they receive is excellent quality and even the shows they don’t have prep for, they know them well, and will have worked on them before. Voice4Me invested heavily in respeaking, with varied success. They had high quality subtitlers at 99% and now have settled at 97% for respeakers. They have never had so many complaints, and are under huge pressure from deaf groups. People want quality.

Membership of a national register is important because service providers would sign up to a Code of Practice that includes a code of conduct such as:

You shall do no harm, and you will not bring your profession into disrepute.

All registered Communication Professionals in the UK are required to provide an enhanced Disclosure and Barring Service (DBS) check and evidence of valid personal indemnity insurance. This will cover them for a complaints procedure, which protects them, the person who books them, and the client. They are required to carry out Continuing Professional Development (CPD) on an annual basis as a condition of their registration.

In contrast, with respeaking, there is no real quality benchmark. Professor Romero Fresco’s EU standards are only reliable in the Spanish language. There is no guarantee of quality with the respeaker that you get, no guarantee of personal indemnity insurance, no CPD, no redress for complaints.

TRAINING

To train as an STTR or CART writer, a person would first train as a court reporter. A court reporter using a steno machine could train to be a respeaker in 6 months. Anyone training as a respeaker can learn to use the basic equipment (instead of the newest technologies) in 6 – 8 months. Full training can be completed in one year. Learning the process is not sufficient to reach a working standard; the respeaker must also practise and become proficient, which takes around 6 months.

CHALLENGES

The challenges of remote work are usually sound quality and lack of preparation material. Consumers tend to give you fewer breaks as they can’t see the writer. All of these things don’t work well for respeaking. The health and safety aspect was extensively researched by Voice4Me, which means anything more than 15 minutes of respeaking solid text without a break will result in damage to the voice, and also a drop in quality.

Voice4Me has invested a huge amount of money and effort into respeaking. They have in hindsight agreed they should have invested in stenography training but they have invested so much money already, and everything is built around it, so they continue. Their respeakers do lots of other things, not just respeaking, so they are multi skilled and therefore cost effective. Voice4Me has recently taken back on several stenographers as freelancers – who could work day and night for them if they wanted to.

Respeaking works well in captioning because of the nature of the work. The stenographer can work remotely, and the work is everything and anything: no prep, bad sound, people mumbling and so on. Respeakers will be talking and listening, with no breaks. Respeaking is not as skilled a job as speech-to-text or captioning. Like remote speech-to-text, respeaking doesn’t work for every situation and for everyone. Respeakers do not earn as much as stenographers because there is a recognised difference in skills and productivity. If you adhere to health and safety requirements, you need two to three respeakers to do the same job as one stenographer, so are you really saving money? The common goal of all systems is to transform speech to text in (near) realtime and with a high quality of service.

Respeaking might be useful for the educational market. Respeakers do have a place in the market in certain situations and with co-working, but they are certainly not at a point where they will be a like-for-like replacement for stenographers. Consumers will have to decide if a cheaper price is more important than quality issues – and that is their choice.

* Names changed to protect privacy

Jana Gunter says:

November 14, 2015 at 2:50 am

There is nothing better than a stenocaptioner…and there probably never will be. Please complain about poor respeaking as, eventually, it will finally be realized that stenos need to cover realtime events/shows and they need to be paid accordingly for their amazing, unique, extremely difficult skill.
By the way, stenos do prep for broadcasts…they need to add in new/current language/names etc into their lexicon for clear, easier translation.
And I do have the utmost respect for true respeakers (rare) who have put in years of training (as a steno does) to get to that point of quality!

C121 Administrator says:
November 16, 2015 at 11:23 pm

We have seen some very good respeakers – they are masters of their craft. Sadly, they are not in the UK. Unfortunately in the UK some agencies are training people to work as respeakers within six weeks, which demonstrates a lack of understanding of such a career and how important quality is to the deaf people viewing the text output.

When we watch a top steno writer or respeaker at work, the difference in quality is quite apparent and it’s lovely to watch. They make it look so easy. But of course it isn’t, and it takes years to reach a high standard.

Chris says:

September 18, 2018 at 2:43 pm

Nice article. I work with a team of highly-skilled respeakers who all hit around 99% accuracy, though we do have to paraphrase at times and can’t hit anywhere near 350WPM. But that said, such high speed makes the subtitles move very quickly and it’s not to everyone’s taste. Respeakers take our work very seriously and have to maintain our software every day by adding new vocab and adapting it to our voices. There are loads of techniques to master and we work on a broad range of live TV programmes, so it’s a forever evolving craft. Having recently repurposed some stenographic output, I have to say not all stenos are at that elite standard and some are on par with your average respeaker now. The industry is changing. Sadly, it’s driven by cost-cutting but I really feel we’re in the final few years of humans doing this kind of work anyway because machine learning and AI is going to take over once the next tech leap arrives. Remember, we were typing on keyboards back in the ’80s doing this job, so it’s a matter of time before us subtitlers are out of a job. There will be that transition period where the human and the robot have to compete for work but eventually there will be no room for sentiment when you can employ a machine to work 24/7 at low running cost, almost flawlessly, compared to a human with their sick days, lapses in concentration and annual leave requirements!
