
8 Tiny Language Models and Indigenous languages (part 2)

Lanz Angeles; Moeez Omair; and Rutwa Engineer

If you would like to try the optional experiments, download the required files and open them in your IDE: [Download Here]

Disclaimer

The content of this module relies heavily on the translations and alphabets provided by FirstVoices, a website that offers resources for many First Nations languages. As a result, we cannot guarantee the accuracy of all the linguistic information shown in this module. Furthermore, the local communities of the nations discussed in this module were not consulted. Nevertheless, we believe that using these resources to start a discussion about First Nations peoples can still give us some insight into their experiences, culture, and languages, both now and in the future.

Sub-Module 2: Lazy, First Nations Literature (LFNL)

Learning Objectives

In this second sub-module, you will

  • Understand how Lazy Literature works for languages other than English
  • Learn the differences and similarities between English and Babine
  • Learn how First Nations languages, like Babine, are being preserved today
  • Learn how some assumptions in programming can lead to incorrect outputs

You can also follow this sub-module by using these slides [PDF]. They cover the same content and are added for your convenience.

Lazy, First Nations Literature – An Introduction

In Lazy Literature, our data (the “corpus”) consists only of English stories.

This lets us generate random English sentences in the style of those stories, but we can use the same program for other languages as well.
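To see why the language matters less than you might expect, here is a minimal sketch of the counting step behind a 1-gram model. It is not the module’s Lazy Literature code: `tokenize` and `unigram_counts` are hypothetical names, and the sketch assumes words are separated by whitespace. It strips only sentence punctuation, because the Babine text in this module uses the marks ‘ and ’ inside words (as in ts’iyewh), so removing them would change the words themselves.

```python
from collections import Counter

# Sentence punctuation trimmed from the ends of each word. The curly marks
# ‘ and ’ are deliberately NOT listed: in the Babine text used here they
# appear inside words (e.g. ts’iyewh, ‘eet), not as quotation marks.
SENTENCE_PUNCTUATION = ".,!?;:\""


def tokenize(text: str) -> list[str]:
    """Split on whitespace and trim sentence punctuation from each word.

    Assumption: words are whitespace-separated, which holds for both the
    English and the Babine excerpts in this module.
    """
    words = []
    for raw in text.split():
        word = raw.strip(SENTENCE_PUNCTUATION)
        if word:
            words.append(word)
    return words


def unigram_counts(text: str) -> Counter:
    """Count how often each word occurs: the data a 1-gram model samples from."""
    return Counter(tokenize(text))


english = "Dineeze’ is here. He tells me not to worry."
babine = "Dineeze’ ‘eet sde. Dineeze’ ‘idnee ‘windzeewendeedeyh."

# The same code works on both languages without any changes.
print(unigram_counts(english))
print(unigram_counts(babine).most_common(1))  # [('Dineeze’', 2)]
```

Nothing in the counting depends on English grammar; the only language-specific decision is which characters count as punctuation.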

Let’s explore this idea using another language spoken by the First Nations peoples: Babine–Witsuwit’en!

What is Babine–Witsuwit’en?

Babine–Witsuwit’en is a language spoken primarily by First Nations peoples living in the Interior of British Columbia. The language has two dialects:

  1. Babine/Nedut’en, spoken by the Babine people
  2. Witsuwit’en, spoken by the Wet’suwet’en people

Notably, Babine–Witsuwit’en is an endangered language: the number of speakers of each dialect is only in the hundreds.

For this module, let’s work with the Babine dialect of Babine–Witsuwit’en. We’ll be using FirstVoices’ Babine word translations, found here, as well as excerpts from the following FirstVoices’ Babine stories:

Learn – An English-Babine Example

Let’s familiarize ourselves with Babine!

You’re given a data folder containing four stories, each in both English and Babine. Let’s take a closer look at one of them, stored in eg1_eng.txt and eg1_bab.txt.

Example 1

The first line of eg1_eng.txt is:

Dineeze’ is here.

Translated into Babine, the same line in eg1_bab.txt reads:

Dineeze’ ‘eet sde.

What differences or similarities do you notice between these lines?

Of course, a single line from a short story isn’t enough to compare two languages. So, let’s consider these files in their entirety.

Example 2

Here’s what eg1_eng.txt looks like:

Dineeze’ is here.
He tells me not to worry.
The people are not late.
We are early.

In comparison, here’s what eg1_bab.txt looks like:

Dineeze’ ‘eet sde.
Dineeze’ ‘idnee ‘windzeewendeedeyh.
‘Aweet ts’iyewh wihadadilh.
‘Inu ‘agh witsats ‘endeel.

What do you notice?

So far, we’ve based our observations on only two versions of a single short excerpt. For longer stories, comparing the two languages purely by eye can be difficult.
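Even a few lines of code can line the two versions up for us. The sketch below is an illustration rather than part of the module’s files: it hard-codes the four lines of Example 2 so it runs on its own, and `words_of`, `compare_lines`, and `shared_words` are hypothetical names. It assumes line i of the English file translates line i of the Babine file; to use the real files you could read them with `open("eg1_eng.txt", encoding="utf-8").read()` and likewise for eg1_bab.txt.

```python
# The four lines of Example 2, hard-coded so the sketch is self-contained.
ENGLISH = """Dineeze’ is here.
He tells me not to worry.
The people are not late.
We are early."""

BABINE = """Dineeze’ ‘eet sde.
Dineeze’ ‘idnee ‘windzeewendeedeyh.
‘Aweet ts’iyewh wihadadilh.
‘Inu ‘agh witsats ‘endeel."""


def words_of(text: str) -> list[str]:
    """Whitespace-split text, trimming sentence punctuation but keeping ‘ and ’."""
    return [w.strip(".,!?;:") for w in text.split() if w.strip(".,!?;:")]


def compare_lines(english: str, babine: str) -> list[tuple[int, int]]:
    """Pair the texts line by line and return (English words, Babine words) per pair.

    Assumes the files are aligned: line i of one translates line i of the other.
    """
    return [
        (len(words_of(eng)), len(words_of(bab)))
        for eng, bab in zip(english.splitlines(), babine.splitlines())
    ]


def shared_words(english: str, babine: str) -> set[str]:
    """Words spelled identically in both versions, such as names."""
    return set(words_of(english)) & set(words_of(babine))


for number, (eng, bab) in enumerate(compare_lines(ENGLISH, BABINE), start=1):
    print(f"Line {number}: {eng} English words, {bab} Babine words")
print("Shared words:", shared_words(ENGLISH, BABINE))
```

Running it reports 3/3, 6/3, 5/3, and 3/4 English/Babine words for lines 1 to 4, and `{'Dineeze’'}` as the only word the two versions share. Line 2, for example, uses six English words but only three Babine ones.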


To make things easier, we can lazily rely on Lazy Literature!

Optional Experiment

Click here to access all the text excerpts we’ll be working with! You’ll need them to complete the upcoming tasks and extra questions.

This task is open-ended, so there is no single answer we’re looking for.

Consider the short story “Salmon Fishing at Old Fort”, which is provided in both English and Babine (salmon_fishing_eng.txt and salmon_fishing_bab.txt, respectively).

Randomly generate a piece of Babine text (1-gram) using this short story. What do you think your piece of text is generally about? Show your understanding by translating some of its words using the language’s dictionary found here.
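If you’d like to script this step, the sketch below shows one way a 1-gram generator can work: each word is drawn independently, with probability proportional to how often it appears in the story. It is a hypothetical stand-in, not the module’s own Lazy Literature program; `generate_unigram_text` and its parameters are names chosen for illustration, and the file-reading suggestion afterwards assumes salmon_fishing_bab.txt is in your working directory and encoded as UTF-8.

```python
import random
from typing import Optional


def generate_unigram_text(text: str, length: int = 10, seed: Optional[int] = None) -> str:
    """Return `length` words drawn independently from `text` (a 1-gram model).

    Choosing uniformly from the full token list, repeats included, means a word
    that occurs k times is k times as likely to be picked, which is the
    frequency-weighted sampling a 1-gram model uses.
    """
    # Whitespace tokenization; ‘ and ’ are kept because they appear inside
    # Babine words in this module's texts.
    words = [w.strip(".,!?;:") for w in text.split()]
    words = [w for w in words if w]
    if not words:
        raise ValueError("text contains no words")
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    return " ".join(rng.choice(words) for _ in range(length))


if __name__ == "__main__":
    sample = "Dineeze’ ‘eet sde. Dineeze’ ‘idnee ‘windzeewendeedeyh."
    print(generate_unigram_text(sample, length=5, seed=1))
```

To try it on the story, you could replace `sample` with `open("salmon_fishing_bab.txt", encoding="utf-8").read()`. Because each word is chosen independently, the output keeps the story’s vocabulary but not its word order.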

Note: You do not have to translate your text word by word.

Example

Suppose our Babine sentence is “C’en t’ah wits’ik, Jilh Ts’e Yu tl’a C’idim T’en deed’ah Nu-co k’it lhok ‘alh’ah.”, which is the first line of the file “eg3_bab.txt”.

Although we may not be able to translate most of the sentence, the language’s dictionary shows that “lhok” translates to “Fish”. Therefore, our sentence seems to be somehow related to fish.
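The same dictionary check can be automated once the dictionary is in a form a program can read. In the sketch below, `DICTIONARY` is a hypothetical stand-in containing only the single entry confirmed in this module (lhok → fish); a real lookup would need the full FirstVoices word list. `look_up_words` is likewise a hypothetical name. The sketch also reports how many words were found, one rough way to gauge how much of a sentence you can translate.

```python
# Hypothetical mini-dictionary: only "lhok" -> "fish" comes from this module.
DICTIONARY = {"lhok": "fish"}


def look_up_words(sentence: str, dictionary: dict[str, str]) -> dict[str, str]:
    """Return {word: translation} for every word of `sentence` found in `dictionary`.

    Words are split on whitespace and trimmed of sentence punctuation; ‘ and ’
    are kept because they are part of the Babine spellings. Matching is
    case-insensitive, so keys in `dictionary` should be lowercase.
    """
    found = {}
    for raw in sentence.split():
        word = raw.strip(".,!?;:").lower()
        if word in dictionary:
            found[word] = dictionary[word]
    return found


sentence = "C’en t’ah wits’ik, Jilh Ts’e Yu tl’a C’idim T’en deed’ah Nu-co k’it lhok ‘alh’ah."
matches = look_up_words(sentence, DICTIONARY)
print(matches)
print(f"{len(matches)} of {len(sentence.split())} words found")
```

Running it prints `{'lhok': 'fish'}` and `1 of 14 words found`, mirroring the reasoning above: one recognized word is enough to guess that the sentence involves fish.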

Extra Question

This question is optional, but it can give you more insight into the preservation of First Nations languages and the intricacies of Babine!

Let’s revisit the experiment in which you translated a randomly generated Babine sentence. How difficult was it to translate your piece of Babine text? Why might this be the case? Use your randomly generated text to justify your reasoning.

License

Introducing Critical Algorithmic Literacies in Computer Programming Copyright © by Rutwa Engineer; Moeez Omair; Alisha Hasan; Adelina Patlatii; Lanz Angeles; Sana Sarin; Madhav Ajayamohan; and marianne is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.