Data and Big Data

Many of these areas, but by no means all, require some form of “Big Data” analysis.  Big Data is a relatively new term for data that cannot be processed efficiently or robustly by standard statistical and computational techniques, and that instead require specialized approaches.  Big Data is primarily characterized by the “three V’s”: volume, velocity, and variety.  Volume refers to the sheer size of the data.  Although no fixed threshold separates Big Data from non-Big Data, the term usually indicates data whose size is on the order of terabytes (TB) or larger.  A terabyte is a trillion bytes (1,000,000,000,000, or 10¹², bytes) in decimal terms or, because digital storage is often counted in powers of two, 1,099,511,627,776 (2⁴⁰) bytes in binary terms.  Velocity indicates that data are acquired very quickly, so that the volume of data grows rapidly.  One example of Big Data velocity is the pace at which social media posts are generated.  Another is the rapid expansion of meteorological data as the number of weather stations increases.  Variety refers to the heterogeneity of the data.  For example, whereas most temperature data consist of floating-point numbers (values that can be written as decimal numbers), social media data consist of text, timestamps, location information, and other fields, some of which may not be present in every record.  Because of these characteristics, new approaches are required to acquire, store, access, process, visualize, and analyze Big Data, and new scientific sub-disciplines have recently emerged to address these challenges.
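The two byte counts given for a terabyte can be checked with a few lines of Python. This is only an illustrative sketch: the names TB_DECIMAL, TB_BINARY, and format_size are hypothetical helpers introduced here, not part of any library, and the unit labels are a simplifying assumption.

```python
# Two common interpretations of "one terabyte".
TB_DECIMAL = 10 ** 12   # 1,000,000,000,000 bytes (powers of ten)
TB_BINARY = 2 ** 40     # 1,099,511,627,776 bytes (powers of two)


def format_size(num_bytes, binary=True):
    """Return a human-readable size string.

    Hypothetical helper for illustration: divides by 1024 (binary)
    or 1000 (decimal) until the value fits the current unit.
    """
    base = 1024 if binary else 1000
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(num_bytes)
    for unit in units:
        # Stop when the value fits this unit, or when units run out.
        if size < base or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= base


if __name__ == "__main__":
    print(f"Decimal terabyte: {TB_DECIMAL:,} bytes")
    print(f"Binary terabyte:  {TB_BINARY:,} bytes")
    print(f"Difference:       {TB_BINARY - TB_DECIMAL:,} bytes")
    # The same byte count reads differently under each convention.
    print(format_size(TB_DECIMAL, binary=False))
    print(format_size(TB_DECIMAL, binary=True))
```

The binary figure is roughly 10% larger than the decimal one, which is why the same storage device can appear to have different capacities depending on which convention a tool uses.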

 

Many examples of Big Data can be seen in the context of the digital humanities.  For instance, large corpora of textual material may contain billions or even trillions of words.  Social network analysis may involve millions of relationships, each with its own characteristics.  Large image collections are also considered Big Data because of the large number of picture elements (pixels) in each individual image.  Consequently, Big Data processing is increasingly relevant to the digital humanities.

License

Contemporary Digital Humanities Copyright © 2022 by Mark P. Wachowiak is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.