7 Conducting Sentiment Analysis Using Dictionaries
Moeez Omair and Rutwa Engineer
If you would like to try out the optional experiment included, please download and open the required files in your IDE: [Download Here]
Learning Objectives
In this module, we will explore dictionaries, a crucial data structure that is often used in programming. This module will help you to:
- Understand the basic concept behind dictionaries
- Extend your knowledge by creating and accessing information from dictionaries
- Witness dictionaries in action with an interactive experiment
- Review concepts covered in previous modules
What are Dictionaries?
Dictionaries are a type of data structure with the following properties:
- stores data in key : value pairs
- is mutable, meaning it can be modified even after initialization
- does not permit duplicate keys (values, however, may repeat)
A dictionary may contain multiple keys, and each key maps to a single value. That value may be of any datatype, including a collection such as a list.
An example of a dictionary may look like the following:
restaurant = {"name": "Baldi Bakery",
              "founded": 1990,
              "dishes": ["Banana Bread Breakfast", "Baguette Brunch", "Brownie Binner"]}
The dictionary above has 3 different keys: name, founded, and dishes:
- name: contains a string value – Baldi Bakery
- founded: contains an integer value – 1990
- dishes: contains a value which is a list of strings – ["Banana Bread Breakfast", "Baguette Brunch", "Brownie Binner"]
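To make the properties above concrete, here is a minimal sketch that inspects the type of each value in the restaurant dictionary and shows what happens when a key is repeated. The name "Baldi Bistro" is a made-up value used only for illustration.

```python
restaurant = {"name": "Baldi Bakery",
              "founded": 1990,
              "dishes": ["Banana Bread Breakfast", "Baguette Brunch", "Brownie Binner"]}

# Each key maps to exactly one value; the value's type can differ per key
value_types = [type(value).__name__ for value in restaurant.values()]
print(value_types)  # ['str', 'int', 'list']

# Repeating a key does not create a second entry: the last value wins
# ("Baldi Bistro" is a hypothetical value for this demonstration)
duplicate = {"name": "Baldi Bakery", "name": "Baldi Bistro"}
print(duplicate)       # {'name': 'Baldi Bistro'}
print(len(duplicate))  # 1
```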
Manipulating Dictionaries
Let’s go over some basics for Dictionaries in Python. This section also contains code you may run in your IDE as you go.
Creating a Dictionary
A dictionary may be initialized as empty {} or already containing data such as the example above. We demonstrate examples below as well:
#try this
dict_1 = {}
dict_2 = {"key1": "value1", "key2": "value2", "key3": "value3"}
print(dict_1)
print(dict_2)
Accessing Items from a Dictionary
- Accessing keys: You may access the keys using the built-in keys() method, which returns a view of the dictionary's keys.
- Accessing values: If you would like to find the value stored under a key, you may use the get("key") method.
# Note: the name dict shadows Python's built-in type; prefer another name in real code
dict = {"letters": ["a", "b"], "year": 2024}
print(dict.keys())
print(dict.get("year"))
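The sketch below shows a few related ways to read from a dictionary, including what get() returns when a key is missing. The variable is named data rather than dict so that it does not shadow Python's built-in type, and the key "month" is a hypothetical key chosen because it is absent.

```python
data = {"letters": ["a", "b"], "year": 2024}

# keys() and values() return views; wrap them in list() to get plain lists
print(list(data.keys()))    # ['letters', 'year']
print(list(data.values()))  # [['a', 'b'], 2024]

# Square brackets also read a value, but raise KeyError if the key is missing
print(data["year"])         # 2024

# get() returns None (or a default you supply) instead of raising an error
print(data.get("month"))             # None
print(data.get("month", "unknown"))  # unknown

# The in operator checks whether a key exists
print("year" in data)       # True
```

Using get() is handy when a missing key is an expected situation, while square brackets are useful when a missing key should be treated as a mistake.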
Updating a Dictionary
A dictionary can be updated in multiple ways; we will go through a couple of them below. Assume that each example starts from the same dictionary defined above.
- Replacing existing data
We can replace existing data using update() or the simple = operator:
dict["letters"] = "c" # method 1
dict.update({"letters": "d"}) # method 2
- Adding new values to existing keys
Instead of replacing the values in letters, we may also add new values to the existing list using += or append():
dict["letters"] += ["c"] # method 1
dict["letters"].append("d") # method 2
- Introducing a new key : value pair
Suppose we want to add a new category called symbols to this dictionary. Because the key does not exist yet, += would raise a KeyError; instead, we use = assignment or update():
dict["symbols"] = ["!", "@", "#", "$"] # method 1
dict.update({"symbols": ["!", "@", "#", "$"]}) # method 2
- Removing items
We remove elements from dictionaries using the pop() method:
dict.pop("year")
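As a recap, the following sketch applies one operation of each kind, in order, to a single dictionary so the cumulative effect is visible. The variable name data, the value 2025, and the missing key "month" are illustrative choices, not part of the starter code.

```python
data = {"letters": ["a", "b"], "year": 2024}

data["letters"].append("c")        # add one item to the existing list
data["letters"] += ["d"]           # extend the existing list with another list
data["symbols"] = ["!", "@"]       # introduce a new key : value pair
data.update({"year": 2025})        # replace the value of an existing key

removed = data.pop("year")         # pop() removes the key and returns its value
missing = data.pop("month", None)  # a default avoids KeyError for absent keys

print(removed)  # 2025
print(missing)  # None
print(data)     # {'letters': ['a', 'b', 'c', 'd'], 'symbols': ['!', '@']}
```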
Maintaining Healthy Workplace Culture
(Optional Experiment)
The workplace is full of rich and unique experiences. However, elements of work life such as disagreements, stress, and frustration may unfortunately harm the workplace environment. In fact, the Government of Canada published a detailed report for 2022 presenting various statistics on harassment and violence in workplaces. One of the findings notes a 26% increase in reported harassment and violence compared to 2021 (Government of Canada). In light of this, it's crucial for management systems to be in place and to actively ensure the safety of all employees.
Suppose you are responsible for the management of a team and must deal with all comments, feedback, or complaints from your colleagues. Given the increasing frequency of incoming emails, you decide to come up with a system to provide a quick analysis of the tone you must use to approach the email. Essentially, you will be working with a simplified version of sentiment analysis (IBM). This is what you will be working on as part of this experiment.
If you would like to follow the experiment, please download the starter files above (or [click here] for convenience).
In the starter code, you are provided two CSV files. Each file contains the following two columns:
- The category the word belongs to
- The word itself
The first CSV file, called positive.csv, contains a selection of words suggesting professionalism:
positive.csv
Category,Attribute
inclusive,diverse
inclusive,multicultural
inclusive,equitable
inclusive,welcoming
inclusive,open-minded
inclusive,accepting
inclusive,integrative
inclusive,unbiased
professional,courteous
professional,respectful
professional,punctual
professional,competent
professional,reliable
professional,efficient
(...)
Similarly, we provide negative.csv:
negative.csv
Category,Attribute
discriminatory,biased
discriminatory,partial
discriminatory,prejudiced
discriminatory,closed-minded
discriminatory,unfair
discriminatory,exclusive
discriminatory,non-inclusive
discriminatory,selective
offensive,inappropriate
offensive,insensitive
offensive,impolite
offensive,unpleasant
offensive,unacceptable
offensive,provocative
(...)
Let’s get into it, shall we?
Task 1: Read From the CSV Files and Populate the Dictionaries
Using what we have learned in this module and previous modules, we will read our CSV files and convert them into dictionaries:
"""
### Task 1: Read From the CSV Files and Populate the Dictionaries ###
"""
import csv

def read_csv(file_path):
    # maps each category to the list of words that belong to it
    dictionary = {}
    with open(file_path, mode='r', newline='') as file:
        csv_reader = csv.reader(file)
        # skip header
        next(csv_reader)
        for row in csv_reader:
            category, attribute = row
            if category in dictionary:
                # known category: add the word to its list
                dictionary[category].append(attribute)
            else:
                # new category: start a list with this word
                dictionary[category] = [attribute]
    return dictionary
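To see the shape of the resulting dictionary without needing the starter files, here is an alternative sketch that groups rows the same way using csv.DictReader and setdefault(). It is not the starter code's method. The helper name group_by_category and the in-memory sample are assumptions made for illustration; the sample rows mirror a few lines of positive.csv.

```python
import csv
import io

# A small in-memory sample standing in for a few rows of positive.csv
SAMPLE_CSV = """Category,Attribute
inclusive,diverse
inclusive,welcoming
professional,courteous
"""

def group_by_category(file_obj):
    """Group the Attribute column under each Category (illustrative helper)."""
    grouped = {}
    # DictReader uses the header row as keys, so no manual skip is needed
    for row in csv.DictReader(file_obj):
        # setdefault() creates an empty list the first time a category appears
        grouped.setdefault(row["Category"], []).append(row["Attribute"])
    return grouped

sample = group_by_category(io.StringIO(SAMPLE_CSV))
print(sample)
# {'inclusive': ['diverse', 'welcoming'], 'professional': ['courteous']}
```

Each category becomes a key, and its value is the list of words in that category, which is the same structure read_csv returns.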
Task 2: Complete the Sentiment Analysis
- Convert the text into a list of words: We read() the txt file, convert all input to lowercase with lower(), and split() the input into individual words. To further ensure that we can match the words to the positive and negative dictionaries, we remove any special characters (e.g. punctuation) from the start and end of each word using the strip() method.
- Match the words: If the word is in the positive dictionary, we increment the positive sentiment score by 1. Similarly, if the word is in the negative dictionary, we instead increment the negative sentiment score by 1.
- Summarize the results: Once we have parsed through the email, an overall sentiment score is generated based on the difference between the positive and negative sentiment scores.
Based on the overall sentiment score, we can suggest the following findings:
- Overall sentiment score > 0: The email reflects a positive experience!
- Overall sentiment score == 0: The email reflects a neutral experience.
- Overall sentiment score < 0: The email reflects a negative experience!
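The steps and thresholds above can be sketched as follows. This is a minimal illustration rather than the starter code itself: the function names analyze_sentiment and describe, the tiny word dictionaries, and the sample sentence are all assumptions, and the text is passed in as a string instead of being read from a txt file.

```python
import string

def analyze_sentiment(text, positive, negative):
    """Return (positive_score, negative_score, overall_score) for the text."""
    # Flatten the category -> words dictionaries into sets for quick lookups
    positive_words = {word for words in positive.values() for word in words}
    negative_words = {word for words in negative.values() for word in words}

    positive_score = 0
    negative_score = 0
    for raw_word in text.lower().split():
        # strip() only removes punctuation at the ends, so "open-minded" survives
        word = raw_word.strip(string.punctuation)
        if word in positive_words:
            positive_score += 1
        if word in negative_words:
            negative_score += 1
    return positive_score, negative_score, positive_score - negative_score

def describe(overall_score):
    """Map the overall score to the three outcomes listed above."""
    if overall_score > 0:
        return "positive"
    if overall_score < 0:
        return "negative"
    return "neutral"

# Hypothetical miniature dictionaries in the same shape read_csv produces
positive = {"professional": ["courteous", "respectful"]}
negative = {"offensive": ["impolite"]}

scores = analyze_sentiment("The team was courteous, respectful, and never impolite!",
                           positive, negative)
print(scores)               # (2, 1, 1)
print(describe(scores[2]))  # positive
```

Flattening the word lists into sets is a design choice for this sketch: the dictionaries are keyed by category, so checking a word directly against the dictionary would only compare it with category names.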
Task 3: Run the Code!
Now that we have completed our code, let us run it using the examples provided. In line 66, replace the first parameter in the function with the txt file you would like to analyze. The starter code comes with pos_email.txt and neg_email.txt, but feel free to create a custom txt file as well!
Limitations
- The program is based on two CSV files with between 40 and 50 words each. Given the diverse selection of words that can be used to portray the same message, it isn’t possible to capture every word with a positive or negative connotation in a provided text.
- It is possible for the program to misinterpret the input. This program notes the presence of a word in the text, but does not take the context into account. For example, a phrase such as “She is not careless.” would result in a negative tally as “careless” is present.
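The second limitation can be reproduced in a few lines. This sketch assumes a one-word negative list containing "careless", chosen only to mirror the example sentence above; it counts matches the same word-by-word way the experiment describes.

```python
import string

# Hypothetical one-word negative list, used only to reproduce the example
negative_words = {"careless"}
sentence = "She is not careless."

# Lowercase, split into words, and strip punctuation from the ends
words = [word.strip(string.punctuation) for word in sentence.lower().split()]
negative_score = sum(1 for word in words if word in negative_words)

print(words)           # ['she', 'is', 'not', 'careless']
print(negative_score)  # 1
```

The word "not" is simply ignored, so the sentence is tallied as negative even though its meaning is the opposite.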
References
ChatGPT (used to generate the CSV files provided)
What is sentiment analysis? IBM. (August 7, 2024) https://www.ibm.com/topics/sentiment-analysis
Python Dictionaries. W3Schools. https://www.w3schools.com/python/python_dictionaries.asp
2022 annual report. Government of Canada. (August 6, 2024) https://www.canada.ca/en/employment-social-development/services/health-safety/reports/2022-workplace-harassment-violence.html