
3 The Data Life Cycle and Quality Assurance

Data Lifecycle: Understanding the Journey of Digitized Data

In Chapter Two, we worked through the steps of the digitization process; in this chapter, we highlight the data lifecycle. You may think these are the same thing; however, the digitization process is a subset of the data lifecycle. The digitization process is about creating digital assets from physical ones, while the data lifecycle is about how those digital assets are managed, used, and preserved over time. Some clients may require long-term archiving, while others may request data destruction within a short timeframe. Despite these variations, the data lifecycle generally follows a standard set of stages:

  • Creation: The birth of digital assets and metadata through scanning, image capture, or manual entry.
  • Storage: Storing “live data” in databases or Digital Asset Management systems for frequent access.
  • Use: Accessing and utilizing data for reporting, sharing, and other purposes.
  • Archiving: Moving data to long-term digital storage, where it may be accessed infrequently or not at all during its retention period.
  • Destruction: Permanently deleting physical and digital files once the retention period ends.
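
For readers comfortable with code, the five stages above can be sketched as a simple state machine. This is only an illustration: the transition rules below (for example, allowing data to move back and forth between storage and use, and requiring archiving before destruction) are assumptions for the sketch, not rules from any particular client contract.

```python
from enum import Enum

class Stage(Enum):
    CREATION = "creation"
    STORAGE = "storage"
    USE = "use"
    ARCHIVING = "archiving"
    DESTRUCTION = "destruction"

# Allowed forward transitions between lifecycle stages (illustrative).
TRANSITIONS = {
    Stage.CREATION: {Stage.STORAGE},
    Stage.STORAGE: {Stage.USE, Stage.ARCHIVING},
    Stage.USE: {Stage.STORAGE, Stage.ARCHIVING},
    Stage.ARCHIVING: {Stage.DESTRUCTION},
    Stage.DESTRUCTION: set(),
}

def advance(current: Stage, target: Stage) -> Stage:
    """Move a record to its next lifecycle stage, rejecting invalid
    jumps (e.g. destroying 'live' data that was never archived)."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

A record created by scanning would move CREATION → STORAGE, circulate between STORAGE and USE while it is "live," then pass through ARCHIVING before it is ever eligible for DESTRUCTION.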

In the following video “What is Data Lifecycle,” Pivotal Stats presents the five fundamental stages of data management: creation through acquisition and generation, storage selection, active usage, archival preservation, and secure deletion. The presentation emphasizes critical infrastructure considerations including security protocols and scalability requirements, while highlighting the role of structured retention policies in maintaining effective data governance.

Reference: Pivotal Stats. (2024, February 27). What is Data Lifecycle #dataanalytics [Video]. YouTube. https://youtube.com/shorts/wgtuu2kazaI (1 min)

 

Creation Stage: Distinguishing Between Digital Artifacts and Their Associated Data

In digital content management, there’s an important distinction between digital artifacts (documents, images, etc.) and their associated data. When you digitize a document, the document itself becomes essentially a digital image, while generating several types of associated information that we refer to as “data”:

  • Machine-Readable Content
    • Text extracted through OCR
    • Indexed content for searching
  • Metadata
    • Technical details (file format, size)
    • Administrative information (who created it, when)
  • Tracking Information
    • Who accessed or modified it
    • When changes were made
    • Version history

This separation is crucial because modern information systems rely on this associated data, not the artifact itself, to make documents searchable, trackable, and manageable. For example, when you search for a document, you’re actually searching through its associated data rather than the document itself.

Think of it like a library catalog system – the books are the artifacts, while all the information that helps you find and manage those books (author, title, location, checkout history) is the associated data.
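
The library-catalog analogy can be made concrete in code. The sketch below (with illustrative field names of our own invention) separates the artifact, here just a file path to the scanned image, from its associated data, and shows that searching touches only the associated data, never the image bytes:

```python
from dataclasses import dataclass, field

@dataclass
class DigitizedDocument:
    artifact_path: str                               # the scanned image itself
    ocr_text: str = ""                               # machine-readable content
    metadata: dict = field(default_factory=dict)     # technical/administrative details
    audit_trail: list = field(default_factory=list)  # tracking information

def search(documents, term):
    """Search the associated data (OCR text and metadata values);
    the artifact file is never opened or read."""
    term = term.lower()
    return [
        d for d in documents
        if term in d.ocr_text.lower()
        or any(term in str(v).lower() for v in d.metadata.values())
    ]
```

Finding a document by its creator's name works only because that name was captured as associated data at creation time; the pixels of the scan alone would tell the system nothing.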

Data Storage Stage: Best Practices for Storing and Preserving Digital Files

To ensure the longevity and integrity of digital assets, it’s essential to use file formats that prioritize long-term, stable storage and minimize the risk of file degradation or decay, such as PDF/A for documents and high-resolution archival image formats.

The Silent Threat of Data Degradation

Contrary to popular belief, digital files are not immune to decay and degradation.

This process, known as data degradation or “data rot,” is primarily caused by improper saving, storage, or hardware and software failures when accessing and writing files.

In the video “FAQs on Bit Rot,” the Council of State Archivists and State Electronic Records Initiative examine the critical issue of data degradation in digital preservation. The presentation explores the underlying causes of bit rot, including semiconductor errors and storage media deterioration across various platforms such as hard drives, optical media, and magnetic storage.

The content details preventative measures, addressing data integrity verification through checksums, systematic data refreshment protocols, and environmental control strategies. Understanding these preservation challenges and mitigation techniques is fundamental for digitization professionals responsible for maintaining long-term digital record integrity.
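
The checksum technique mentioned above can be sketched in a few lines of Python (the function names here are our own). The idea is simply to record a SHA-256 checksum when a file is first stored, then recompute it later: any mismatch signals silent corruption.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Compute a SHA-256 checksum by streaming the file in chunks,
    so even very large scans fit comfortably in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_fixity(path, expected_checksum):
    """Return True if the file still matches the checksum recorded
    when it was stored; False indicates possible bit rot."""
    return sha256_of(path) == expected_checksum
```

In archival practice this recorded value is often called a "fixity" value; checking it on a schedule is what the video calls data integrity verification.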

Reference: Council of State Archivists. (2019). FAQs on Bit Rot [Video]. YouTube. https://youtube.com/watch?v=a50B801U0RA (9:58 min)

[Image: photograph illustrating data degradation. Jim Salter, CC BY-SA 4.0, via Wikimedia Commons. https://commons.wikimedia.org/w/index.php?curid=64047012]

Fun Facts

Data decay can also occur on files hosted on the internet, known as “Link Rot.” Examples of link rot can be seen on pages prior to 2004 on The Internet Archive, where many 404 errors exist, especially on images hosted by outdated services like GeoCities or Angelfire. Even popular games like Pokémon Go are not immune to data degradation. Its soundtrack and audio files are slowly being degraded with each update to the game!

Data degradation can manifest in various ways, such as:

  • File corruption
  • Pixelation and errors in images
  • Noise and disruptions in audio files
  • Broken links leading to 404 errors

 

Causes of Data Degradation

Several factors can contribute to data degradation, including:

  • Writing & Access Failures: The more a file is accessed, altered, edited, or changed, the more likely it is to become corrupted.
  • Cosmic Rays: High-speed particles from the sun or outside the solar system can be harmful to electronics, specifically their memory.
  • Poorly Maintained Databases: Inadequate maintenance can lead to files going missing or becoming inaccessible, and outdated databases can hinder data retrieval.
  • Cloud Service Outages and Data Corruption: Disruptions in cloud storage services and corruption during data transfer can also contribute to degradation.

 

Preventing Data Degradation

To safeguard digital assets from the pitfalls of data degradation, consider implementing the following best practices:

  • Regular Audits and Checks: Perform routine checks to confirm data completeness and accuracy, detect errors early, and identify storage system issues.
  • Migration to Updated File Formats: Migrate outdated or old file formats to newer or more retention-friendly formats to ensure accessibility and readability by software and machines.
  • Cloud Storage with Backups: Utilize cloud storage providers that create multiple backups of data to ensure recovery in case of disasters.
  • Proper Data Storage and Organization: Store data correctly and organize databases to make data searchable, accessible, and easily retrievable. Keep databases up-to-date with current technology and standards to prevent redundancy.
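
A routine audit can combine the checksum idea with the practices above. The sketch below (illustrative, using only Python's standard library) checks every file listed in a stored manifest of checksums and reports anything missing or changed, which is exactly what a scheduled fixity audit produces:

```python
import hashlib
import os

def audit(manifest):
    """Compare a stored manifest {path: sha256_hex} against the files
    on disk. Returns (missing, corrupted) lists for the audit report."""
    missing, corrupted = [], []
    for path, recorded in manifest.items():
        if not os.path.exists(path):
            missing.append(path)
            continue
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        if h.hexdigest() != recorded:
            corrupted.append(path)
    return missing, corrupted
```

Running such an audit on a schedule catches degradation early, while a clean backup still exists to restore from.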

 

Real-World Examples: Triumphs and Pitfalls in Digital Preservation

The New York Philharmonic’s Digital Archives Project

The New York Philharmonic, one of the world’s most renowned orchestras, embarked on an ambitious digital preservation project to ensure the longevity of its vast collection of historical documents, music scores, and recordings. The orchestra partnered with a team of archivists, librarians, and technology experts to digitize and preserve over 3 million pages of archival material spanning 175 years. The project’s success relied on collaboration between various professionals, meticulous metadata cataloging, digitization in multiple formats (including high-resolution images and PDF/A), and secure storage on servers with multiple backups. As a result, the orchestra’s rich history is now accessible worldwide, ensuring their legacy will endure for generations.

The BBC Domesday Project

In 1986, the BBC launched the Domesday Project to create a multimedia snapshot of life in the United Kingdom. The project collected data, images, and video from over a million contributors and stored it on two custom-made LaserDiscs. However, just 15 years later, the data became virtually inaccessible due to technological obsolescence. It took a team of experts three years and significant resources to recover the data and migrate it to a more accessible format. This cautionary tale highlights the risks of relying on proprietary technologies and the importance of proactive data migration. It underscores the need for sustainable digital preservation practices to avoid costly and time-consuming data recovery efforts.

The Final Stage in the Data Lifecycle – Data Destruction

You might wonder why companies choose to destroy data after a certain period. The answer lies in legal requirements and practical considerations. Data is typically maintained for a specified retention period, after which it must be destroyed. Consider the following example:

Colleges capture and store personal information about students, such as names, course schedules, and student numbers. Once a student graduates, the retention period begins, lasting anywhere from 7 to 55 years, depending on the institution’s policies. During this time, students may need to access their records for various reasons, such as confirming their degree or obtaining tax documents like T4As.

While it’s crucial to retain data for an adequate period to fulfill these needs, storing information indefinitely would be excessive and costly for the college. Balancing the need for accessibility with the practicality of storage is a key factor in determining retention periods.

As you navigate your digitization career, remember that understanding the data lifecycle and its stages will help you better serve your clients’ needs while ensuring compliance with legal requirements and practical considerations surrounding data retention and destruction.

Data Lifecycle Summary

Understanding the various types of data, the importance of digital preservation, and the challenges posed by data degradation is paramount in the digitization industry. By adopting best practices for storing and preserving digital files, performing regular audits, and staying informed about the latest technologies and standards, you can help ensure that valuable digital assets remain accessible and intact for generations to come. As you embark on your career in digitization, keep these lessons in mind and strive to be a champion for sustainable digital preservation practices.


Understanding Your Role in Quality Assurance

 

In the digitization industry, every person serves as a guardian of document quality. Consider this scenario: A medical record arrives at your workstation. After digitization, the physical copy will be destroyed, making your digital version the only remaining record of someone’s medical history. This reality exemplifies why quality isn’t just a department’s responsibility – it’s your professional obligation at every moment.

Foundations of Quality in Digitization

Quality management in digitization encompasses two distinct but complementary approaches: Quality Assurance and Quality Control. While these terms are often used interchangeably in casual conversation, understanding their fundamental differences is crucial for effective implementation in professional settings.

Quality Assurance (QA) represents a proactive approach focused on preventing defects before they occur. This preventive strategy involves establishing standardized procedures, providing comprehensive staff training, implementing preventive measures, and maintaining consistent documentation throughout the digitization process. By focusing on prevention, QA helps organizations avoid costly corrections and rework later in the process.

In contrast, Quality Control (QC) functions as a reactive process, involving the inspection of completed work, identification of existing defects, implementation of corrective measures, and documentation of issues for process improvement. While QC catches errors that slip through preventive measures, it serves as a crucial complement to QA rather than a replacement for it.

The Digitization Quality Framework

Quality management in digitization operates across four key stages, each requiring specific attention to maintain high standards. Understanding these stages helps practitioners implement appropriate quality measures at each point in the process.

Receiving Stage

The quality journey begins when documents first arrive at the digitization facility. During receiving, staff implement document tracking systems and verify inventory against provided manifests. This initial stage includes assessing document condition and establishing chain of custody documentation. Quality measures at this stage prevent downstream issues by ensuring all materials are properly accounted for and their condition documented.

Key quality measures during receiving include confirming document counts against client manifests, establishing tracking mechanisms, and documenting any damaged or problematic materials. Staff must also verify that the number of registered documents in the system matches the physical count, creating a reliable baseline for the entire digitization process.
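
The count-verification step can be illustrated with a small helper (the function and field names here are hypothetical, not taken from any real tracking system). It compares the client's manifest against the physical count taken at receiving and flags every discrepancy, including boxes that arrive but appear on no manifest:

```python
def verify_intake(manifest_counts, physical_counts):
    """Compare client-manifest counts {box_id: expected} against the
    physical counts taken at receiving. Any discrepancy must be
    documented before documents move on to preparation."""
    discrepancies = {}
    for box, expected in manifest_counts.items():
        actual = physical_counts.get(box, 0)
        if actual != expected:
            discrepancies[box] = {"expected": expected, "counted": actual}
    # Boxes received that are not on the manifest at all:
    for box, counted in physical_counts.items():
        if box not in manifest_counts:
            discrepancies[box] = {"expected": 0, "counted": counted}
    return discrepancies
```

An empty result establishes the reliable baseline described above; anything else triggers the chain-of-custody documentation.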

Preparation Stage

Document preparation involves conditioning physical documents for scanning. This critical stage includes careful removal of fasteners like staples and paper clips, addressing creases or folds, and identifying documents that may require special handling. Quality measures focus on ensuring documents are properly prepared while preventing damage that could affect scan quality.

During preparation, quality assurance activities include thorough inspection for remaining fasteners, verification that no documents are lost during preparation, and identification of materials that may present scanning challenges. Staff must communicate any potential issues to scanning operators to ensure appropriate handling of problematic documents.

Scanning Stage

The scanning process represents a crucial transition from physical to digital format. Quality measures during scanning encompass equipment calibration, monitoring image capture quality, and ensuring appropriate resolution and format standards are met. Operators must maintain constant vigilance for issues like scan lines, double feeds, and other quality defects that can occur during this stage.

Quality measures during scanning include using document joggers to prevent misfeeds, monitoring image quality in real-time, addressing any issues missed in previous stages, and ensuring proper scanner maintenance. Operators must also watch for double feeds and maintain detailed logs of any problems encountered.

Data Entry Stage

During data entry, staff extract and record metadata from documents. Quality measures focus on ensuring accuracy, completeness, and consistency of entered data. This stage often includes verification steps and may involve both automated and manual quality checks to maintain data integrity.

Quality assurance during data entry includes identifying any issues missed in previous stages, verifying AI-extracted data when applicable, confirming entered data against source documents, and making logical decisions when interpreting handwriting or multilingual content. Digital confirmation tools may also verify the presence of required metadata fields during upload.
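
A digital confirmation tool of this kind can be as simple as the sketch below. The `REQUIRED_FIELDS` list is an illustrative assumption; in practice it would come from the client's indexing specification:

```python
# Hypothetical required metadata fields for an upload check.
REQUIRED_FIELDS = ("document_id", "document_date", "document_type")

def validate_record(record):
    """Return the required metadata fields that are missing or blank
    in a record (a dict); an empty list means the record passes."""
    return [
        f for f in REQUIRED_FIELDS
        if not str(record.get(f, "")).strip()
    ]
```

Records that fail the check would be routed back for correction rather than uploaded, catching data entry gaps before they reach the client.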

Common Quality Issues and Solutions

Physical Document Issues

Staples and Paper Clips

Remaining fasteners present significant challenges during scanning. These items appear as dark spots or irregular shapes in scanned images and can potentially damage scanning equipment and the physical document. Pages can be torn in the scanner if the fastener is not removed. 

Solution: Documents with staples or paper clips must be re-prepped by removing the fasteners, then rescanned to ensure a clean digital file is created. Consistent checks during the prep phase can prevent this issue.

Dog Ears and Sticky Notes

Folded corners and attached notes can obscure important information during scanning. Dog ears typically appear as white triangles in corners of scanned images, while sticky notes may create squares of missing or covered text. 

Solution: Dog-eared pages and sticky notes are usually handled during prep: folded corners are straightened, and sticky notes are removed and taped to a separate sheet of paper or to a blank back. When these issues are identified during QC or indexing, the affected pages should be straightened or re-prepped and a rescan requested.

Digital Quality Issues

Scan Lines

Scan lines appear as straight or irregular lines running across the digital image but are absent from the physical original copy. These artifacts typically result from debris on scanner glass, including paper dust, toner residue, adhesive residue, or other contaminants. 

Solution: Scan lines can be corrected by cleaning the scanner’s glass from debris and rescanning the document to ensure no lines appear.

Double Feeds

Double feeds occur when scanners pull multiple pages through simultaneously, potentially missing content or causing paper jams. These issues can be particularly problematic with high-speed industrial scanners, where double feeds might occur without causing obvious jams. High-speed scanners often include double-feed detection: the scanner measures the thickness of each page as it passes through, so it can recognize when two or more pages are travelling together and halt scanning.

Solution: Proper document preparation reduces the likelihood of double feeds, and any that do occur are corrected by rescanning the affected pages. If detected early, the operator can simply clear the jam and rescan.

Poor Scan Quality

Image quality issues can manifest as low resolution, unclear text, dark spots, or color distortion. These problems may stem from improper scanner settings, equipment issues, or damaged originals. 

Solution: Scan quality should be checked at the QC stage and rescans should be requested if the image is unclear or doesn’t meet the required standards. Maintaining clean scanners, using correct settings, and training operators to recognize poor quality are key to preventing this issue.

Business Impact and Quality Economics – The Cost Escalation Principle

The Cost Escalation Principle, also known as the 1-10-100 Rule, demonstrates how quality-related costs increase dramatically when issues remain unaddressed. This principle breaks down costs into three stages:

  • Point of Entry Correction ($1): Issues identified and corrected during initial processing incur minimal costs, typically involving simple adjustments to procedures or quick rescans.
  • Batch Processing Correction ($10): Problems discovered during quality control stages require more resources, potentially disrupting workflows and requiring significant rework.
  • Post-Deployment Correction ($100): Issues found after project completion incur the highest costs, potentially requiring complete reprocessing, damaging client relationships, and impacting company reputation.
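
The rule is easy to turn into arithmetic. A hypothetical batch with five entry-stage fixes, two batch-stage fixes, and one post-deployment fix costs 5 + 20 + 100 = 125 cost units; the function and stage labels below are illustrative:

```python
# Relative cost multipliers from the 1-10-100 Rule.
COST_MULTIPLIER = {"entry": 1, "batch": 10, "post_deployment": 100}

def correction_cost(errors_by_stage, unit_cost=1.0):
    """Estimate total correction cost under the 1-10-100 Rule.
    `unit_cost` is the cost of fixing one error at point of entry."""
    return sum(
        count * COST_MULTIPLIER[stage] * unit_cost
        for stage, count in errors_by_stage.items()
    )
```

The same eight errors would have cost only 8 units if all of them had been caught at the point of entry, which is the whole argument for early detection.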

Quality Control Methodologies

Sampling Methods

Quality control employs various sampling approaches based on project requirements and risk levels:

  • Random Sampling involves inspecting randomly selected documents from each batch. This method works well for large-volume projects with consistent document types but carries some risk of missing systematic errors.
  • Fixed Interval Sampling examines documents at predetermined intervals, providing more systematic coverage than random sampling. This approach helps identify patterns of errors that might occur at specific points in the process.
  • 100% Inspection involves reviewing every digitized document, typically reserved for high-stakes projects where error tolerance is minimal. While resource-intensive, this method provides the highest level of quality assurance.
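
The three approaches can be sketched in a few lines of Python; the default sampling rate and interval below are illustrative, not industry-mandated values:

```python
import random

def random_sample(batch, rate=0.1, seed=None):
    """Randomly select a fraction of a batch for inspection.
    A fixed seed makes the selection reproducible for audits."""
    rng = random.Random(seed)
    k = max(1, round(len(batch) * rate))
    return rng.sample(batch, k)

def fixed_interval_sample(batch, interval=10):
    """Inspect every Nth document, giving systematic coverage."""
    return batch[::interval]

def full_inspection(batch):
    """100% inspection: every document is reviewed."""
    return list(batch)
```

Note the trade-off the sketch makes visible: `fixed_interval_sample` will reliably hit errors that recur at regular points in the process, while `random_sample` avoids any bias toward particular positions in the batch.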

 

Quality Metrics and KPIs

Effective quality management requires measurable indicators of performance. Key metrics include:

  • Turnaround Time (TAT): Measures the total time from document receipt to delivery of digitized files. This metric helps identify process bottlenecks and efficiency improvements.
  • Exception Count: Tracks the number of documents requiring special handling or missing critical metadata. High exception counts may indicate systematic issues in data entry or document preparation.
  • Error Rates: Measures the frequency of quality issues across different stages of the digitization process, helping identify areas needing improvement.
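
These KPIs are straightforward to compute from job records. The sketch below assumes a hypothetical record layout (field names like `received` and `exceptions` are our own); real production systems will have their own schemas:

```python
def quality_kpis(jobs):
    """Compute the three KPIs from a non-empty list of job records,
    each a dict with 'received' and 'delivered' timestamps (in hours),
    plus 'pages', 'errors', and 'exceptions' counts."""
    total_pages = sum(j["pages"] for j in jobs)
    return {
        # Turnaround Time: mean hours from receipt to delivery.
        "avg_turnaround_hours": sum(j["delivered"] - j["received"] for j in jobs) / len(jobs),
        # Exception Count: documents needing special handling.
        "exception_count": sum(j["exceptions"] for j in jobs),
        # Error Rate: quality issues per page processed.
        "error_rate": sum(j["errors"] for j in jobs) / total_pages,
    }
```

Tracking these numbers over time, rather than in isolation, is what turns them into indicators: a rising error rate or exception count points at a stage that needs attention.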

Professional Development in Quality Assurance

Quality assurance professionals must maintain current knowledge of industry best practices and emerging technologies. Ongoing professional development should include:

  • Technical Training: Understanding new scanning technologies, document handling techniques, and quality control methodologies.
  • Process Improvement: Learning advanced quality management techniques and process optimization strategies.
  • Regulatory Compliance: Staying current with industry-specific regulations and compliance requirements.

Quality Assurance and Control Summary

Quality assurance and control in digitization require a comprehensive understanding of both preventive and corrective measures. Success depends on implementing appropriate quality measures throughout the digitization process, from receiving through final delivery. The financial impact of quality issues makes early detection and correction crucial, while evolving technology and regulatory requirements demand continuous professional development.


ISO Standards

Now that we understand how quality assurance works in digitization, let’s see how it connects with international standards. Think of QA as your personal checklist for doing great work, while ISO standards are the rulebook that everyone in the industry follows. When digitization companies combine QA processes (like checking image quality and accuracy) with ISO standards (such as ISO 9001 and ISO/IEC 27001), they create a powerful system. ISO standards are internationally agreed upon by experts. These standards are the distilled wisdom of people with expertise in their subject matter who know the needs of the organizations they represent, such as manufacturers, sellers, buyers, customers, trade associations, users, and regulators. (Source: https://www.iso.org/standards.html)

Benefits of Adhering to ISO Standards

ISO was founded with the idea of answering a fundamental question: “what’s the best way of doing this?”

It started with the obvious things like weights and measures, and over the last 50 years has developed into a family of standards that cover everything from the shoes we stand in, to the Wi-Fi networks that connect us invisibly to each other.

Addressing all these and more, International Standards mean that consumers can have confidence that their products are safe, reliable and of good quality. ISO’s standards on road safety, toy safety and secure medical packaging are just a few of those that help make the world a safer place.

Regulators and governments count on ISO standards to help develop better regulation, knowing they have a sound basis thanks to the involvement of globally-established experts.

Watch the following video to find out more about how ISO’s 25,558 standards touch almost all aspects of daily life and work for businesses large and small. With International Standards on air, water, and soil quality, on emissions of gases and radiation, and on the environmental aspects of products, they protect the health of the planet and its people, as well as bringing economic benefits.

Reference: ISO. (2009). What ISO standards do for you [Video]. YouTube. https://youtube.com/watch?v=AYBVTeqKahk (2:05min)

ISO 15489-1:2016

ISO 15489-1:2016 is crucial for digitization companies because it provides a comprehensive framework for managing records effectively in the digital era. As the world shifts from paper to digital formats, digitization companies are at the forefront of converting physical records into digital ones. This standard helps ensure that these digital records are created, stored, and maintained in a way that guarantees their authenticity, reliability, integrity, and usability over time.

Here’s why it’s important: the standard helps digitization companies ensure compliance, improve efficiency, support long-term preservation, enhance decision-making, and build trust with clients.

With this in mind, if you are working for a digitization company that is striving to meet ISO 15489-1:2016 standards, you should expect to see:

  • Records Management System: Organizations need a system to handle records, whether they’re on paper or digital.
  • Policies and Responsibilities: The business has clear rules about how to manage records and who is responsible for them.
  • Processes and Controls: The business has steps to create, keep, and eventually get rid of records safely.
  • Compliance: The business must follow laws and rules about keeping records.
  • Continuous Improvement: The business should keep getting better at managing their records as technology and rules change.

Authenticity, Reliability, Integrity, and Usability Characteristics

For digitization professionals, creating and maintaining digital records that stand the test of time is paramount, so let’s circle back to the authenticity, reliability, integrity, and usability characteristics of ISO 15489-1:2016. Here’s a breakdown of how the standard guarantees each key characteristic:

  • Authenticity: A record is authentic when it can be proven to be exactly what it claims to be, created by the stated creator, at the claimed time. Think of it as the digital equivalent of a signature or seal of approval.
    • How ISO 15489-1 Guarantees It:
      • Requires documented policies and procedures for record creation and handling
      • Mandates clear identification of authorized record creators
      • Establishes controls against unauthorized additions, deletions, or alterations
  • Reliability: A reliable record is one that accurately represents the business transaction or activity it documents. It’s like having a trustworthy witness to an event.
    • How ISO 15489-1 Guarantees It:
      • Requires records to be created at or near the time of the event they document
      • Specifies that records should be created by individuals with direct knowledge of the facts
      • Establishes standards for the tools and processes used in record creation
  • Integrity: A record has integrity when it is complete and unaltered throughout its lifecycle. Think of it as maintaining a chain of custody for digital evidence.
    • How ISO 15489-1 Guarantees It:
      • Implements protection against unauthorized alterations
      • Requires documentation of any authorized changes
      • Establishes controls to maintain completeness of records
  • Usability: A usable record is one that can be located, retrieved, presented, and interpreted. It’s about ensuring that records remain accessible and understandable over time.
    • How ISO 15489-1 Guarantees It:
      • Establishes requirements for metadata and classification systems
      • Specifies storage and retrieval systems that maintain accessibility
      • Ensures records remain readable and interpretable throughout their lifecycle

ISO Standards Summary

In summary, ISO 15489-1:2016 equips digitization companies with the tools they need to manage digital records effectively, ensuring compliance, improving efficiency, supporting long-term preservation, enhancing decision-making, and building trust with clients.


Chapter Conclusion

This chapter explored how data lifecycle management and quality assurance work together in the digitization field. We learned that quality isn’t just a final checkpoint but a continuous process throughout data’s journey from creation through destruction. Key concepts like the 1-10-100 Rule showed why early error detection is crucial, while ISO standards provided frameworks for ensuring document authenticity and reliability. Through real-world examples, we’ve seen how quality assurance professionals must balance technical expertise with evolving industry standards to maintain digital information integrity. This comprehensive approach to quality management and data lifecycle stewardship forms the foundation for successful digitization practices in today’s digital landscape.

Key Takeaways

Different data types require specific storage formats and preservation strategies to prevent degradation and ensure long-term accessibility.

Continuous professional development in quality assurance is vital for staying current with evolving industry standards, technologies, and compliance requirements.

Quality in digitization requires both proactive prevention (Quality Assurance) and reactive detection (Quality Control) at every stage of the process to ensure reliable and accurate results.

Each digitization stage—receiving, preparation, scanning, and data entry—includes specific quality checkpoints that must be consistently monitored and measured using established Key Performance Indicators (KPIs) such as error rates and turnaround times.

Common physical issues (e.g., staples, dog ears) and digital issues (e.g., scan lines, double feeds) require standardized detection and resolution procedures to maintain quality.

The Cost Escalation Principle (1-10-100 Rule) demonstrates why early error detection is critical: issues become exponentially more expensive to fix the later they are identified.

ISO standards, particularly ISO 15489-1:2016, provide essential frameworks for ensuring document authenticity, reliability, integrity, and usability throughout the digitization process.

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Foundations in Digitization Copyright © by marklamontagne is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.