Conducting Sentiment Analysis Using Dictionaries

Moeez Omair; Rutwa Engineer

If you would like to try out the optional experiment included, please download and open the required files in your IDE: [Download Here]

Learning Objectives

In this module, we will explore dictionaries, a crucial data structure that is often used in programming. This module will help you to:

Understand the basic concept behind dictionaries
Extend your knowledge by creating and accessing information from dictionaries
Witness dictionaries in action with an interactive experiment
Review concepts covered in previous modules

What are Dictionaries?

Dictionaries are a type of data structure with the following properties:

stores data in key : value pairs
is mutable, meaning it can be modified even after initialization
does not permit duplicate keys (the same value may appear under more than one key)

A dictionary may contain multiple keys, and each key maps to exactly one value. That value can be of any datatype, including a collection such as a list that holds several items.

An example of a dictionary may look like the following:

restaurant = {"name": "Baldi Bakery",
              "founded": 1990,
              "dishes": ["Banana Bread Breakfast", "Baguette Brunch", "Brownie Binner"]}

The dictionary above has 3 different keys: name, founded, and dishes:

name: contains a string value – "Baldi Bakery"
founded: contains an integer value – 1990
dishes: contains a value which is a list of strings – ["Banana Bread Breakfast", "Baguette Brunch", "Brownie Binner"]
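As a quick check of the description above, the following sketch looks up each key and inspects the type of the value stored under it. The variable names ending in _value are hypothetical and exist only for this illustration.

```python
restaurant = {"name": "Baldi Bakery",
              "founded": 1990,
              "dishes": ["Banana Bread Breakfast", "Baguette Brunch", "Brownie Binner"]}

# Each key maps to exactly one value; that value can be of any type
name_value = restaurant["name"]        # a string
founded_value = restaurant["founded"]  # an integer
dishes_value = restaurant["dishes"]    # a list of strings

print(type(name_value).__name__)     # str
print(type(founded_value).__name__)  # int
print(type(dishes_value).__name__)   # list
print(len(restaurant))               # 3 (the number of keys)
```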

Manipulating Dictionaries

Let’s go over the basics of dictionaries in Python. This section also contains code you may run in your IDE as you go.

Creating a Dictionary

A dictionary may be initialized as empty, using {}, or with data already in it, as in the restaurant example above. Both cases are demonstrated below:

#try this

dict_1 = {}

dict_2 = {"key1": "value1", "key2": "value2", "key3": "value3"}

print(dict_1)

print(dict_2)
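One detail worth trying when creating dictionaries is what happens if the same key is written twice. The sketch below uses the hypothetical names dict_3 and dict_4; it also shows the built-in dict() constructor as an alternative to curly braces.

```python
# Writing the same key twice does not create two entries;
# the later value overwrites the earlier one
dict_3 = {"key1": "first", "key1": "second"}

print(dict_3)       # {'key1': 'second'}
print(len(dict_3))  # 1

# The built-in dict() constructor is another way to create a dictionary
dict_4 = dict(key1="value1", key2="value2")
print(dict_4)       # {'key1': 'value1', 'key2': 'value2'}
```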

Accessing Items from a Dictionary

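Accessing keys: You may access the keys using the built-in keys() method, which returns a view of all keys in the dictionary.
Accessing values: To retrieve the value stored under a key, use the get("key") method or square brackets, as in dict["key"]. If the key is missing, get() returns None, whereas square brackets raise a KeyError.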

dict = {"letters": ["a", "b"], "year": 2024}

print(dict.keys())

print(dict.get("year"))
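The following sketch compares the ways of reading from a dictionary, including what happens when a key is missing. The variable name info is a hypothetical stand-in for the example dictionary.

```python
# "info" is a hypothetical name standing in for the example dictionary
info = {"letters": ["a", "b"], "year": 2024}

print(list(info.keys()))    # ['letters', 'year']
print(list(info.values()))  # [['a', 'b'], 2024]
print(info["year"])         # 2024

# get() returns None (or a default you supply) when the key is missing
print(info.get("month"))         # None
print(info.get("month", "n/a"))  # n/a

# Square brackets raise a KeyError for a missing key
try:
    info["month"]
except KeyError:
    print("no such key")
```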

Updating a Dictionary

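A dictionary can be updated in multiple ways; we will go through a couple of them below. Each example starts from the dictionary defined above, {"letters": ["a", "b"], "year": 2024}. Note that dict is also the name of Python's built-in dictionary type, so in your own code it is safer to choose a different variable name.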

Replacing existing data

We can replace existing data using the = assignment operator or the update() method:

dict["letters"] = "c" # method 1

dict.update({"letters": "d"}) # method 2
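A short sketch of both replacement methods, printing the value after each step. The variable name info is hypothetical; the last call shows that update() can change several keys at once.

```python
# "info" is a hypothetical name standing in for the example dictionary
info = {"letters": ["a", "b"], "year": 2024}

info["letters"] = "c"            # method 1: assignment
print(info["letters"])           # c

info.update({"letters": "d"})    # method 2: update()
print(info["letters"])           # d

# update() can change several keys in one call
info.update({"letters": ["a", "b"], "year": 2025})
print(info)                      # {'letters': ['a', 'b'], 'year': 2025}
```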

Adding new values to existing keys

Instead of replacing the values in letters, we may also add new values to the existing list using += or append(). With +=, wrap the new item in a list so that it is added as a single element:

dict["letters"] += ["c"] # method 1

dict["letters"].append("d") # method 2
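The sketch below applies both methods to a fresh copy of the example dictionary, again under the hypothetical name info. It also shows that append() is a list method, so it is only available when the value stored under the key is a list.

```python
# "info" is a hypothetical name standing in for the example dictionary
info = {"letters": ["a", "b"], "year": 2024}

info["letters"] += ["c"]      # method 1: extend the list with another list
info["letters"].append("d")   # method 2: append a single item
print(info["letters"])        # ['a', 'b', 'c', 'd']

# append() exists only on lists, so this pattern fails for other value types
try:
    info["year"].append(1)
except AttributeError:
    print("an int has no append()")
```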

Introducing a new key : value pair

Suppose we want to add a new category called symbols to this dictionary. This can be done using = assignment or update(). Note that += does not work here, because the key does not exist yet and Python would raise a KeyError:

dict["symbols"] = ["!", "@", "#", "$"] # method 1

dict.update({"symbols": ["!", "@", "#", "$"]}) # method 2
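To see why a new key needs assignment or update(), the following sketch first tries += on a key that is not yet present and then adds keys both ways. The names info and digits are hypothetical.

```python
# "info" is a hypothetical name standing in for the example dictionary
info = {"letters": ["a", "b"], "year": 2024}

# += on a key that does not exist yet raises KeyError,
# because Python must read the old value before adding to it
try:
    info["symbols"] += ["!"]
except KeyError:
    print("symbols is not a key yet")

info["symbols"] = ["!", "@"]      # assignment creates the key
info.update({"digits": [0, 1]})   # update() also creates the key
print(list(info.keys()))          # ['letters', 'year', 'symbols', 'digits']
```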

Removing items

We remove a key and its value from a dictionary using the pop() method, which also returns the removed value:

dict.pop("year")
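A brief sketch of removal, using the hypothetical name info. It shows the value returned by pop(), the optional default for a missing key, and the del statement as an alternative.

```python
# "info" is a hypothetical name standing in for the example dictionary
info = {"letters": ["a", "b"], "year": 2024}

removed = info.pop("year")   # pop() returns the value it removed
print(removed)               # 2024
print(info)                  # {'letters': ['a', 'b']}

# A second argument is returned instead of raising KeyError
# when the key is not present
missing = info.pop("year", None)
print(missing)               # None

del info["letters"]          # del removes a key without returning its value
print(info)                  # {}
```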

Maintaining Healthy Workplace Culture

(Optional Experiment)

The workplace is full of rich and unique experiences. However, elements of work life such as disagreements, stress, and frustration may unfortunately harm the workplace environment. In fact, the Government of Canada published a detailed 2022 annual report presenting various statistics on harassment and violence in workplaces. One of the findings notes a 26% increase in reported harassment and violence compared to 2021 (Government of Canada). In light of this, it is crucial for management systems to be in place and to actively ensure the safety of all employees.

Suppose you are responsible for managing a team and must deal with all comments, feedback, and complaints from your colleagues. Given the increasing volume of incoming emails, you decide to build a system that quickly analyzes each email and suggests the tone with which to approach it. Essentially, you will be building a simplified version of sentiment analysis (IBM) as part of this experiment.

If you would like to follow the experiment, please download the starter files above (or [click here] for convenience).

In the starter code, you are provided with two CSV files. Each file contains the following two columns:

The category the word belongs to
The word itself

The first CSV file, positive.csv, contains a selection of words suggesting professionalism:

positive.csv

Category,Attribute
inclusive,diverse
inclusive,multicultural
inclusive,equitable
inclusive,welcoming
inclusive,open-minded
inclusive,accepting
inclusive,integrative
inclusive,unbiased
professional,courteous
professional,respectful
professional,punctual
professional,competent
professional,reliable
professional,efficient
(...)

Similarly, we provide negative.csv:

negative.csv

Category,Attribute
discriminatory,biased
discriminatory,partial
discriminatory,prejudiced
discriminatory,closed-minded
discriminatory,unfair
discriminatory,exclusive
discriminatory,non-inclusive
discriminatory,selective
offensive,inappropriate
offensive,insensitive
offensive,impolite
offensive,unpleasant
offensive,unacceptable
offensive,provocative
(...)

Let’s get into it, shall we?

Task 1: Read From the CSV Files and Populate the Dictionaries

Using what we have learned in this module and previous modules, we will read our CSV files and convert them into dictionaries:

"""

### Task 1: Read From the CSV Files and Populate the Dictionaries ###

"""

import csv


def read_csv(file_path):
    # Maps each category to the list of words that belong to it
    dictionary = {}
    with open(file_path, mode='r') as file:
        csv_reader = csv.reader(file)
        # skip header
        next(csv_reader)
        for row in csv_reader:
            category, attribute = row
            if category in dictionary:
                # category already seen: add the word to its list
                dictionary[category].append(attribute)
            else:
                # first word for this category: start a new list
                dictionary[category] = [attribute]
    return dictionary
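To see what read_csv produces, the sketch below writes a tiny CSV to a temporary file and loads it. The function is repeated so that the example runs on its own. The sample rows are taken from positive.csv, while the temporary-file setup and the names sample, sample_path, and positive are assumptions made only for this demonstration.

```python
import csv
import os
import tempfile


def read_csv(file_path):
    # Same logic as the function above, repeated so this sketch is self-contained
    dictionary = {}
    with open(file_path, mode='r') as file:
        csv_reader = csv.reader(file)
        next(csv_reader)  # skip header
        for row in csv_reader:
            category, attribute = row
            if category in dictionary:
                dictionary[category].append(attribute)
            else:
                dictionary[category] = [attribute]
    return dictionary


# A few rows borrowed from positive.csv, written to a temporary file
sample = "Category,Attribute\ninclusive,diverse\ninclusive,welcoming\nprofessional,punctual\n"
with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

positive = read_csv(sample_path)
os.remove(sample_path)  # clean up the temporary file

print(positive)
# {'inclusive': ['diverse', 'welcoming'], 'professional': ['punctual']}
```

Each category becomes a key, and the words in that category are collected into a list stored under it.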

Task 2: Complete the Sentiment Analysis

Given a txt file, we want to output the overall sentiment of the email. In order to do so, the following steps are required:

Convert the text into a list of words: We read() the txt file, convert all input to lower(), and split() the input into individual words. To further ensure that we can match the words to the positive and negative dictionaries, we remove any special characters (e.g. punctuation) from the start and end of each word using the strip() method.
Match the words: If the word is in the positive dictionary, we increment the positive sentiment score by 1. Similarly, if the word is in the negative dictionary, we instead increment the negative sentiment score by 1.
Summarize the results: Once we have parsed through the email, an overall sentiment score is generated based on the difference between the positive and negative sentiment scores.
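The three steps above can be sketched as follows. This is a minimal illustration and not the starter code itself: the function names tokenize, count_matches, and sentiment_score are hypothetical, the text is passed in as a string instead of being read from a txt file, and the two small dictionaries are stand-ins for the ones built from the CSV files.

```python
import string


def tokenize(text):
    # Lowercase, split on whitespace, and strip punctuation from both ends of each word
    words = [word.strip(string.punctuation) for word in text.lower().split()]
    # Drop anything that became empty after stripping
    return [word for word in words if word]


def count_matches(words, dictionary):
    # Flatten the category -> words mapping into one set for fast lookups
    vocabulary = {word for attributes in dictionary.values() for word in attributes}
    return sum(1 for word in words if word in vocabulary)


def sentiment_score(text, positive, negative):
    # Overall score = positive matches minus negative matches
    words = tokenize(text)
    return count_matches(words, positive) - count_matches(words, negative)


# Hypothetical stand-ins for the dictionaries read from the CSV files
positive = {"professional": ["courteous", "respectful"]}
negative = {"offensive": ["impolite", "unpleasant"]}

email = "The team was courteous and respectful, but one reply was impolite."
print(tokenize(email))
print(sentiment_score(email, positive, negative))  # 1 (two positive words, one negative)
```

Because strip() only removes characters from the ends of a word, hyphenated entries such as open-minded remain intact and can still be matched.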

Based on the overall sentiment score, we can suggest the following findings:

Overall sentiment score > 0: The email reflects a positive experience!
Overall sentiment score == 0: The email reflects a neutral experience.
Overall sentiment score < 0: The email reflects a negative experience!
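The mapping from score to finding can be written as a small function. The name describe_sentiment is hypothetical; the messages are the ones listed above.

```python
def describe_sentiment(score):
    # Translate the overall sentiment score into one of the three findings
    if score > 0:
        return "The email reflects a positive experience!"
    if score == 0:
        return "The email reflects a neutral experience."
    return "The email reflects a negative experience!"


for score in (2, 0, -3):
    print(score, describe_sentiment(score))
```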

Task 3: Run the Code!

Now that we have completed our code, let us run it using the examples provided. In line 66, replace the first parameter in the function with the txt file you would like to analyze. The starter code comes with pos_email.txt and neg_email.txt, but feel free to create a custom txt file as well!

Limitations

It is important to note that a program like this is limited in nature for the sake of demonstration and readability:

The program is based on two CSV files with between 40 and 50 words each. Given the diverse selection of words that can be used to portray the same message, such short lists cannot capture every word with a positive or negative connotation in a provided text.
It is possible for the program to misinterpret the input. This program notes the presence of a word in the text, but does not take the context into account. For example, a phrase such as “She is not careless.” would result in a negative tally as “careless” is present.
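The second limitation is easy to reproduce. The sketch below assumes a one-word set, negative_words, as a hypothetical stand-in for the negative dictionary and counts matches in the example sentence.

```python
import string

negative_words = {"careless"}  # hypothetical stand-in for the negative dictionary

sentence = "She is not careless."
words = [w.strip(string.punctuation) for w in sentence.lower().split()]

# Word-by-word matching ignores the "not" that reverses the meaning
negative_score = sum(1 for w in words if w in negative_words)

print(words)           # ['she', 'is', 'not', 'careless']
print(negative_score)  # 1, even though the sentence is a compliment
```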

Despite these limitations, this program should provide you with a new concept to implement going forward. You may also look into implementing a more robust version of this program if you would like!

References

ChatGPT (Used to generate the CSV files provided)

What is sentiment analysis? IBM. (August 7, 2024) https://www.ibm.com/topics/sentiment-analysis

Python Dictionaries. W3Schools. https://www.w3schools.com/python/python_dictionaries.asp

2022 annual report. Government of Canada. (August 6, 2024) https://www.canada.ca/en/employment-social-development/services/health-safety/reports/2022-workplace-harassment-violence.html

License

Introducing Critical Algorithmic Literacies in Computer Programming Copyright © by Rutwa Engineer; Moeez Omair; Alisha Hasan; Adelina Patlatii; Lanz Angeles; Sana Sarin; Madhav Ajayamohan; and Izabelle Marianne is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.