12.2. Big Data
As mentioned previously, the rate and quantity of data being created is growing rapidly. According to estimates, the amount of data created, stored, and used globally over the five-year period 2020-2025 is expected to reach 180 zettabytes, roughly enough to fill about 38 trillion single-layer DVDs (Holst, 2021, June 7).
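As a rough sanity check on that scale, the short sketch below converts 180 zettabytes into an equivalent number of discs, assuming a standard 4.7 GB single-layer DVD; the figures and the conversion are illustrative back-of-the-envelope arithmetic, not part of the cited estimate.

```python
# Back-of-the-envelope conversion of 180 zettabytes into single-layer DVDs.
# Assumes 1 ZB = 10**21 bytes and a 4.7 GB (4.7 * 10**9 byte) disc.
ZETTABYTE = 10**21           # bytes
DVD_CAPACITY = 4.7 * 10**9   # bytes per single-layer DVD

total_bytes = 180 * ZETTABYTE
dvds_needed = total_bytes / DVD_CAPACITY

print(f"{dvds_needed:.2e} DVDs")  # about 3.8e13, i.e. roughly 38 trillion discs
```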
Every time you perform an internet search, shop online, or post a video to social media, you are contributing to this growing digital repository of data. Data is becoming so voluminous and complex that existing systems struggle to handle and process it; this, in essence, is what defines big data. Big data is commonly characterized by the three V's: Volume (size), Variety (number of data types), and Velocity (processing speed).
- Volume. Refers to the size of the datasets. While data can be stored and accessed in a seemingly endless number of ways, there is no fixed rule for how large a dataset must be to qualify as big data; it depends on the organization's processing power and the statistical and analytical techniques required. While volume can refer to a single dataset, the plural is used deliberately because big data analytics typically draws on multiple sources of data.
- Variety. While related to volume, variety refers to the types of data within the dataset or datasets selected for analysis. Greater variety can reduce the volume needed before an analysis falls into the realm of big data. The added complexity that variety introduces also limits how far traditional analytical techniques can be applied, and at large volumes traditional statistical software may simply be unable to handle the complexity and size.
- Velocity. Big datasets can be so large that they cannot practically be moved from storage to analysis; instead, they need to be 'streamed' through the analysis (a small sketch follows this list). Velocity refers to the speed at which all sources can be brought together and analyzed. As volume and variety increase, the ability of traditional analysis to keep up with the required velocity decreases, and when that ability begins to falter, big data approaches emerge as the solution.
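To make the streaming idea concrete, the sketch below aggregates a file in chunks rather than loading it all into memory at once. The file name transactions.csv, the amount column, and the chunk size are hypothetical placeholders, and this is only one illustrative way to stream data through an analysis, not a method prescribed by the chapter.

```python
# Minimal sketch of 'streaming' a dataset through an analysis in chunks,
# so the whole file never has to fit in memory at once.
import pandas as pd

CHUNK_ROWS = 1_000_000  # rows processed per pass; tune to available memory

total = 0.0
count = 0

# read_csv with chunksize yields DataFrames one chunk at a time instead of
# materializing the entire file.
for chunk in pd.read_csv("transactions.csv", chunksize=CHUNK_ROWS):
    total += chunk["amount"].sum()   # hypothetical 'amount' column
    count += len(chunk)

print(f"mean amount over {count} rows: {total / count:.2f}")
```

The same pattern scales from a single oversized file to multiple sources: each chunk is processed and discarded, so velocity is limited by read and aggregation speed rather than by memory.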
For organizations, collecting data has become much more cost-effective thanks to advances in technology. However, more data does not equal better data. As the volume of stored data grows, identifying and transforming it with traditional techniques becomes more complex. It is important to understand this concept and its consequences for managerial decision-making. Organizations need to consider the characteristics of data as they design the processes, roles, and systems that help them make sense of the data they gather and make informed decisions.
“Chapter 3 Data Analytics” from Information Systems: No Boundaries! Copyright © 2021 by Shane M Schartz is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.