Imaging, Image Processing, and Optical Character Recognition
Image processing and analysis are particularly important in digital humanities research that is primarily visual, such as art, cultural heritage, and cultural analytics. However, because text analysis often involves digitizing originally analog, written texts, image processing plays a crucial role in this area as well. Optical character recognition (OCR) recognizes text in digital images and relies heavily on image processing algorithms, particularly in the preprocessing stage. Preprocessing typically begins with binarization, in which the pixels of a text image are converted to two colours – black and white – indicating whether or not each pixel belongs to text. Binarization is usually performed by applying a threshold value to each pixel: assuming the text in the digitized image is darker than the background, a pixel whose value is above a certain grey-level threshold is set to white, and otherwise to black. In practice, however, most texts to be digitized have imperfections, shadows, and similar defects that make a single global threshold problematic, so OCR also incorporates advanced adaptive thresholding techniques from image processing. Skew correction is used to “straighten” text that has been skewed during the scanning process; several techniques, some involving complex mathematical transformations of the image pixels, can be applied. Another common image processing operation is noise removal, in which artifacts, imperfections, spots, and dirt marks, as well as electronic noise inherent in the scanning process, are mathematically removed or minimized. Finally, thinning and skeletonization apply image processing methods to make the stroke width uniform, transforming characters into thin, skeletal representations. This step is particularly important for handwritten text, but it is optional and often unnecessary for printed documents.
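The global-threshold binarization described above can be sketched in a few lines of Python with NumPy. The array values and the threshold of 128 below are purely illustrative; real OCR pipelines would use adaptive methods (such as Otsu's or Sauvola's) rather than a fixed global threshold.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Global-threshold binarization: pixels brighter than the
    threshold become white (255), darker ones black (0).
    Assumes dark text on a light background."""
    gray = np.asarray(gray)
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

# A tiny synthetic "scan": dark glyph pixels (30) on a light page (200).
page = np.array([[200, 200,  30, 200],
                 [200,  30,  30, 200],
                 [200, 200,  30, 200]])
print(binarize(page))
```

On this toy page the light background pixels (200) all map to white and the dark glyph pixels (30) to black; on a real scan with shadows or stains, some background pixels would fall below the threshold, which is exactly the problem adaptive techniques address.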
Colour image processing involves manipulating the red, green, and blue (and possibly opacity, quantified by the alpha value in computer graphics) channels of an image for analysis and is used extensively in applications involving cultural heritage and cultural analytics. Image processing is also used in the creation of image databases, providing the techniques for image-based searches, which are especially relevant for digital archives and for retrieving images based on their visual content rather than exclusively on metadata. Algorithms that recognize the content of images improved markedly in the 2010s, driven by advances in machine learning, particularly artificial neural networks (ANNs). Recognizing the content of digital images is now one of the key applications of machine learning: in some cases an algorithm can distinguish between different objects in an image, such as a cat and a dog. This discernment capability is, of course, the basis of OCR. Python offers many image processing libraries.
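As a minimal illustration of channel manipulation, the sketch below splits a tiny synthetic RGB image into its channels and converts it to greyscale with the standard ITU-R BT.601 luminance weights; the pixel values are invented for the example.

```python
import numpy as np

# A 2x2 RGB image: one red, one green, one blue, and one white pixel.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# Split the colour channels; each is a 2x2 array of intensities.
r, g, b = img[..., 0], img[..., 1], img[..., 2]

# Luminance weighting (ITU-R BT.601) converts colour to grey:
# green contributes most because the eye is most sensitive to it.
grey = (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)
print(grey)
```

The same channel arrays could instead be recombined, rescaled, or filtered individually, which is the basic mechanism behind colour correction and false-colour visualization of heritage imagery.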
Image processing techniques are also used to generate 3D images, or models, from 2D images. For example, the 3D geometry, and therefore a 3D model, of an artifact can be constructed from multiple 2D photographs of the artifact acquired from different orientations and viewpoints by geometry-based image processing algorithms. 3D surface models can also be generated using techniques such as laser scanning, in which a scanning device captures the x-, y-, and z-coordinates of a large number of discrete points on the surface of the scanned object. Image processing techniques are then used to mathematically form the 3D model from this set of surface coordinate points. The 3D model can then be displayed on a computer monitor, and users can interact with it through input devices, enabling translation, scaling, and rotation. Furthermore, the 3D model can be physically constructed or manufactured, for example through 3D printing technology.
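The interactions mentioned above – translation, scaling, and rotation – are linear transformations applied to the model's coordinate points. A minimal sketch with NumPy, using an invented four-point "point cloud" standing in for a scanned surface:

```python
import numpy as np

def rotate_z(points, degrees):
    """Rotate an (N, 3) point cloud about the z-axis."""
    t = np.radians(degrees)
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
    return points @ R.T  # apply R to every point

def scale(points, factor):
    """Uniformly scale a point cloud about the origin."""
    return points * factor

def translate(points, offset):
    """Shift every point by the same (x, y, z) offset."""
    return points + np.asarray(offset)

# Four corner points of a unit square "artifact surface" at z = 0.
cloud = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])

# Rotate 90 degrees, double the size, then shift along x.
moved = translate(scale(rotate_z(cloud, 90), 2.0), [5.0, 0.0, 0.0])
print(np.round(moved, 3))
```

Interactive 3D viewers perform essentially these operations, usually via 4x4 homogeneous matrices so that all three transformations compose into a single matrix multiplication on the graphics hardware.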
A recent innovation in image processing for the digital humanities is 4D reconstruction and interactive visualization of cultural heritage, in which the temporal (time) dimension is introduced into 3D modeling to produce precise time-varying 3D reconstructions. Such techniques capture temporal geometric variations and distortions, allowing scholars to analyze artifacts along both the spatial and temporal dimensions. However, 4D modeling is challenging because the data used for the reconstructions are often found in unstructured web resources that, because they were not originally created for 3D modeling, contain outliers and noise (Doulamis et al., 2018).