Live Captioning

Someone reading text off an iPad. The speaker can be seen in the background.

As part of our drive this year to improve the accessibility of our learning materials, we have been captioning videos with Amara and sourcing videos from Box of Broadcasts, which are invariably already captioned (though, it has to be admitted, sometimes poorly). Occasionally we outsource captioning to an external company, and recently they got in touch to say they could provide a live captioning service. This blog post is about our first foray into the world of live captioning.

Firstly, I had better explain what live captioning is. It is the production, on the fly, of a transcript of what someone is saying, made available over the web while the speaker is still talking.

The process.

How it works (or at least my understanding of how it works from the end user’s point of view): we stream an audio feed out of the lecture hall to the company doing the live captioning. Their captioners repeat what they hear, including punctuation, into a speech recognition program (something like Dragon NaturallySpeaking, I guess), which converts the spoken words into text. This text is then streamed to the web, where anyone who connects to the company’s website can read it on a mobile device. We imagine this is of most use to people in the lecture hall whose first language is not English and to those who have hearing impairments.

Streaming out.

We decided to trial it at our Learning & Teaching Conference with our two keynote speakers. We were going to record the sessions anyway with Panopto (our lecture capture system), and since Panopto allows live streaming we thought we could give the captioning company access to that stream. However, in testing we found a delay of something like 30 seconds in getting the stream out of our campus. This delay, plus the delay inherent in the captioner repeating the speech and streaming the text back (maybe 5 seconds), meant it was not feasible to use the Panopto stream. Instead, we used an audio-only Skype call, and this worked very well with minimal delay.

Reading the captions.

To read the live captions, which might better be called a live transcript (?), you had to go to the captioning company’s website and join the session with a password. Reading the captions as they stream in takes a little getting used to: you do not scroll down the page; instead, each new line of text is added at the top, pushing all earlier lines down. However, you do get used to it, and I imagine that if you could not hear, or could not understand very well, it would be useful.

We also projected the live captions onto a screen at the front of the lecture room, next to the main projection screen. However, the machine displaying the stream was connected via wifi and, for some unidentified reason, it lost its connection after about ten minutes. The transcript froze and an error message appeared in the middle of the screen, alerting everyone in the room to the fact that the connection had been lost. This was bad news for our test, because those members of the audience who were interested but had chosen not to connect on their own devices could no longer do so: the password, which had been projected at the start of the session, was no longer displayed.

The PowerPoint slides are projected onto the main screen and the transcript is displayed on a screen next to them.


Despite the projection of the captions failing, the transcript kept coming through to people’s own devices and was by and large acceptably accurate. Occasionally you would see the captioner at the other end correct a spelling, and sometimes they could not hear what was said (e.g. when a question was asked from the back of the room), so they typed ‘inaudible’ or some such term. They also sometimes misheard or did not recognise a word. We could have improved accuracy and speed by providing the captioners with the PowerPoint slides in advance, and possibly with a list of anticipated specialist words and names. We did not think of that, but we will know for next time.

I said that the transcript was by and large acceptably accurate. I base this on the feeling that, had I been unable to hear, I would still have got the gist of what was being said. I also bore in mind that even when you can hear, you often miss certain words. You should also not judge a live transcript in the same way as a tidied-up transcript. Speakers often reformulate utterances and make false starts; when you read these in black and white they may not make sense, but that is what was said, and the live captions sometimes reflect this. Here is a screenshot showing the live captions, followed by a 100% accurate transcription, so you can judge for yourself.

Screenshot of the live captions.

Below is a 100% accurate transcript of the same part of the speech with differences in red.

I’ll admit certainly not to being very conversant with the wider literature from USA, Australia and New Zealand, where the idea of student engagement, you could argue, originated and took hold, or certainly the terminology. That said, in the UK we do have a strong student union tradition and student representation at the course level is fairly well embedded, so our definitions will of necessity be set in that context. Vicki Trowler made this observation in the literature of the student engagement for the HEA – in the bulk of literature, student engagement literature, it’s concerned with time and effort on the part of students in relation to their learning or wider student experience. So this situates student engagement as a kind of investment – of intellectual, emotional or time resource on the part of the student in their learning. This literature is also concerned with the way that students build and fashion an identity that is tied up in their student experience but as I mentioned, and as Trowler observes, the UK context is one of student representation. HEFCE defines student engagement in 2008 as “the process by which institutions and sector bodies make deliberate attempts to involve and empower students in the process of shaping the learning experience”. Immediately we can see a fundamental difference of definition here between a traditional definition of student engagement that looks at students’ level of investment in their learning versus a newer UK specific definition that looks, that focuses the power of students to have an influence in determining what the learning should look like.

Closed captions for the Panopto recording.

A bonus of paying for the live captions is that the captioning company provided us with the transcripts (in Word format) and also closed caption files in a format of our choosing (we asked for .srt). However, we made a mistake in how we supplied the recording. The company needed access to the Panopto recording to synchronise the transcript, but before giving them access we edited it. Because Panopto uses non-destructive editing, the captioners started their timing at the beginning of what they saw, which was not the real beginning of the recording, so when we uploaded the .srt files to Panopto they did not synchronise. Next time we will provide an unedited recording and tell the captioners to time from the very beginning, even if we do not need captions until, say, twelve minutes in. To solve the problem this time, we fed the transcript and an unedited version of the recording into CaptionMaker (captioning software), which processed them into captions that aligned properly.
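If the misalignment is purely a constant offset (the length of the trimmed-off start), it can in principle be corrected by shifting every timestamp in the .srt file. The following is a minimal Python sketch of that idea, offered only as an illustration: it is not what we did, and it is not a feature of Panopto or CaptionMaker. The function name `shift_srt` and the twelve-minute offset are hypothetical, and the sketch assumes a single constant offset applies to the whole file.

```python
import re

# An SRT timestamp looks like 00:12:03,450 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text, offset_ms):
    """Return srt_text with every cue timestamp moved by offset_ms milliseconds.

    Hypothetical helper: a positive offset delays the captions, and any
    time that would become negative is clamped to 00:00:00,000.
    """

    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        # Work in whole milliseconds to avoid floating-point rounding.
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(0, total)
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    out = []
    for line in srt_text.splitlines():
        # Only timing lines ("start --> end") are changed; caption text is left alone.
        if "-->" in line:
            line = TIMESTAMP.sub(shift, line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:04,500\nGood morning, everyone.\n\n"
        "2\n00:00:05,000 --> 00:00:08,250\nThank you for coming.\n"
    )
    # Suppose the edited view began twelve minutes into the raw recording.
    print(shift_srt(sample, 12 * 60 * 1000))
    # First timing line printed: 00:12:01,000 --> 00:12:04,500
```

A single offset would not help if sections had been cut from the middle of the recording; re-aligning the transcript against an unedited recording, as we did with CaptionMaker, avoids that problem.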

However, the live captions are only a bonus for the recording in the sense that they give you a basis for accurate closed captions without paying for normal captioning. You cannot really use them as closed captions as they stand, because they are not accurate enough: that might be acceptable in a live situation, but it is not acceptable on a recording.

Audience’s feedback.

We surveyed the audience about their reaction to the live captions. Only nine respondents said they had tried to access the live captions. All nine were native English speakers and none had a learning disability.

  • Were you able to successfully view the live captions on your device? 6 of the 9 answered yes.
  • Did you find the live captions useful? 5 of the 9 answered yes.
  • Would you find a transcript to read after the session useful? 5 of the 9 answered yes.
  • Would you find a subtitled video recording of the session useful? 3 of the 9 answered yes.
  • Anything else you want to say about the live captions?
    • I was interested in exploring it for students, rather than myself so have answered honestly above. I think it is a useful technology that is likely to help many of our students.
    • They weren’t useful for me as they were a little bit out of sync with the speaker (as expected) but to someone with hearing impairments it would be very useful.
    • The captions were great, although there were a few misheard terms that were incorrect (e.g. the speaker would say “I am” but the captions said “I and”), and for some parts it wouldn’t make sense to a user who was hearing impaired, but this only happened a few times so I thought it was pretty successful. Another issue was WiFi. I was sat next to someone also using the captions, and they had no problems at all, but my iPad kept losing connection, and I only realised after waiting a short while for the captions to load. This could be problematic if the user is hard of hearing, as they would have missed a great deal of what the speaker was saying and would have to go back and reread to catch up. Strangely enough, for the last keynote I moved one seat across to see if that helped my connection issues, and it did! I had no issues with WiFi that time.

An alternative for a BSL signer?

I had thought that by providing live transcription we could dispense with the need for a British Sign Language (BSL) signer, but our disability coordinator has informed me that this is not the case: BSL is these students’ first language, and they may not be able to read the English fast enough. I am therefore unsure whether providing live captions counts as a reasonable adjustment. The company that provided the service suggests that it is a reasonable adjustment for those who require a note taker, though I would have thought that normal captioning would be just as useful and more accurate.


It cost £65 per hour plus VAT, and for that we got live captions, a transcript (the same content as the live captions) and a caption file for our recording (again, the same content as the live captions).


Technically it worked. We could have improved accuracy by providing the slides or notes to the captioner in advance. We need to be aware of how captions synchronise with Panopto recordings and we need to improve our wifi coverage in that lecture room. Will we use it again? Not sure, but it is something we can offer as a service if there is demand and budget.


