11 Accessible Video and Audio

If the content you are sharing does not already have captions, consider the following options:

Video type and captioning options.
Type of Video	Options
Videos that are shared	Replace the videos with others that have good captions, or ask the owner of the video (e.g., on YouTube) to caption their content
Videos that you create / own	Upload your content to a platform that allows for auto-generated captions, and edit the captions where needed

When creating video content, where possible:

- Consider writing a script. A script improves auto-captioning accuracy and recording efficiency, and it can double as a transcript.
- Consider including in your script audio descriptions of what is visually taking place on the screen. This is called Interpreted Description and can reduce the need for described video accommodations.

When distributing .mp3 or other audio files, the most essential alternative sensory modality you can include is:

A transcript!
- (A plain-text version of the speech or audio in an audio recording)
Consistent feedback from Blind and Deaf learners, as well as from learners for whom English is not a first language, is that having access to a video or audio transcript is not only preferred: it offers access to materials in a way that video or audio alone cannot.

Feedback from the McMaster community indicates that the accuracy of automatic speech recognition (ASR) results (e.g., auto-generated closed captions and transcripts) is inconsistent and depends on the following factors:
- Speaking with a “non-native” English accent (even when the speaker’s first or second language is an English dialect, e.g., Chinese English or Indian English).
- Using complex or specialized vocabulary (e.g., in STEM and the Humanities).
- Speaking with a speech impairment.
We want to acknowledge here that these experiences are real and valid.
- ASR currently depends on linguistic and language datasets to “train” the AI technology to recognize variation in both accent and vocabulary.
- Due to histories of intersectional oppression and colonization, datasets containing rich and varied examples of accents, speech impairments, and even specific vocabulary usages are limited in comparison to “native” English-speaking datasets.
- As a result (to give an oversimplified explanation), people embodying the above experiences face different and additional barriers compared with those who do not (e.g., needing to depend more on heavily scripted content that can be uploaded to a video’s captioning interface).