Section 4: The Data Lifecycle
Dr. Sinéad McElhone; Sherri Hannell; and Noah James
Section Overview
This section aims to introduce the concept of the data lifecycle and its purpose, as well as provide an overview of each of the six stages.
Section Objectives
By the end of this section, you will be able to:
- Understand what is meant by the data lifecycle;
- List the six stages of the data lifecycle;
- Describe the goals of data lifecycle management; and
- Reflect on how the management of data through its lifecycle is a key component of data literacy.
Test Your Knowledge
Complete the following activity to assess how much you already know about the content that will be covered in this section.
The Data Lifecycle
The data lifecycle refers to the sequence of events that data goes through from its initial creation or capture to its eventual archiving or destruction at the end of its usefulness. The data lifecycle serves as a foundation on which data management practices are based. It provides a phased approach with a logical grouping of activities to develop and deliver data management operations.
Although the steps involved in the data lifecycle can vary depending on the source referenced or whether the data lifecycle includes analytics, there are usually six stages.
The Six Stages of the Data Lifecycle
As outlined in Section 1: Introduction to Data and Data Literacy, the six stages are create, store, use, share, archive, and destroy.
Having a documented data lifecycle process is key to ensuring that an organization practices effective data governance. Data volumes continue to grow as an increasing number of devices and applications generate data. Storage costs and compliance issues exert pressure on organizations to destroy data that is no longer needed. Proper oversight of data through its lifecycle is essential to maintain its usefulness, minimize errors, and ensure regulatory compliance.
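One way to document a lifecycle process is to write the stages and the permitted moves between them down explicitly. The following is a minimal Python sketch, not a prescribed implementation. The stage names are the six named in this section's summary (create, store, use, share, archive, destroy); the transition rules in `ALLOWED` and the helper `advance` are hypothetical and chosen only for illustration, since the section does not define which moves between stages are permitted.

```python
from enum import Enum


class Stage(Enum):
    """The six lifecycle stages named in this section."""
    CREATE = "create"
    STORE = "store"
    USE = "use"
    SHARE = "share"
    ARCHIVE = "archive"
    DESTROY = "destroy"


# Hypothetical transition rules (an assumption, not taken from the text).
# An organization would define its own rules in its policies and standards.
ALLOWED = {
    Stage.CREATE: {Stage.STORE},
    Stage.STORE: {Stage.USE, Stage.ARCHIVE, Stage.DESTROY},
    Stage.USE: {Stage.SHARE, Stage.STORE, Stage.ARCHIVE, Stage.DESTROY},
    Stage.SHARE: {Stage.USE, Stage.ARCHIVE, Stage.DESTROY},
    Stage.ARCHIVE: {Stage.USE, Stage.DESTROY},
    Stage.DESTROY: set(),  # destruction is final in this sketch
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage if the move is permitted, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Under these assumed rules, `advance(Stage.CREATE, Stage.STORE)` succeeds, while any attempt to move data out of `Stage.DESTROY` raises an error. Writing the rules as data keeps them easy to review and change.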
Goals of Data Lifecycle Management
Data lifecycle management provides the planning, control, and support needed to manage operations across the data lifecycle, including issue management and resolution. It encompasses requirements, change, and data services management and may span different applications, systems, databases, and storage. It ensures that the life of data aligns with other organizational lifecycles, such as the technology, decision-making, and project management lifecycles. Data lifecycle management begins when the need for data is identified and continues until the data is retired. A key benefit of a managed data lifecycle is that it enables a structured flow for data. The main goals of data lifecycle management include:
- Integrity: Maintaining the integrity of data is a key outcome of a structured data lifecycle. Data integrity means that the data available for use is assured to be the most up-to-date and of the best quality. Without assurances of integrity, stale data or the wrong version of the data could be used;
- Security: From creation to deletion, security of data is imperative. Protocols for managing the security of data keep the data safe from malicious or inappropriate access, address privacy requirements, and ensure data remains uncorrupted; and
- Access: The implementation of proper access controls ensures that the right user has access to the right data for the right amount of time. There may be issues pertaining to privacy regarding the use of data, and access controls ensure that data is appropriately made available.
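The access goal above (the right user, the right data, the right amount of time) can be illustrated with a time-limited access grant. This is a minimal Python sketch under stated assumptions: the `AccessGrant` record, its fields, and the `can_access` helper are hypothetical names introduced here for illustration, and a real organization would rely on its own access control systems and policies.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record: one user may read one dataset until expires_at."""
    user: str
    dataset: str
    expires_at: datetime


def can_access(grants, user, dataset, now):
    """Return True only if an unexpired grant matches both user and dataset."""
    return any(
        g.user == user and g.dataset == dataset and now < g.expires_at
        for g in grants
    )


# Example: a grant that expires at the end of January 2024 (illustrative values).
grants = [AccessGrant("analyst_a", "survey_2023", datetime(2024, 1, 31))]

during = can_access(grants, "analyst_a", "survey_2023", datetime(2024, 1, 15))
after = can_access(grants, "analyst_a", "survey_2023", datetime(2024, 2, 1))
other_user = can_access(grants, "analyst_b", "survey_2023", datetime(2024, 1, 15))
```

In this sketch, `during` is `True`, while `after` and `other_user` are `False`: access ends when the grant expires, and it never extends to users or datasets that were not named in the grant.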
Data Lifecycle Management Control Frameworks
Organizations use multiple layers of controls to ensure efficient and consistent management practices throughout the data lifecycle. Controls enforce and formalize roles, structures, rules, and actions related to data management. Controls can be both internal and external in nature. For example, external controls manage data shared with or collected from other organizations, and internal controls are created to prescribe how data should be managed within an organization.
Internal controls include the creation of policies and standards, guidelines, and procedural manuals:
- Policies: Policy statements are organization-wide, high-level statements that guide and control data asset-related management, actions, and decisions. Policies focus on desired outcomes, but do not specify implementation details. Compliance with policies is mandatory and may be further supported through guidelines or standards;
- Standards: Standards have enterprise-wide applicability and consist of mandatory actions, rules, or controls designed to support policy statements. Standards for data asset management may specify processes, roles, accountabilities, techniques, tools, etc.;
- Guidelines: Guidelines for data asset management consist of recommended, non-mandatory controls, accountabilities, and recommendations from an organizational perspective, not by area or function within an organization. Guidelines are often developed by an enterprise or cross-functional Data Governance Office or Committee; and
- Procedural Manuals: Procedural manuals are data asset management controls that are a collection of policy statements and standards that are specific to a business function or a particular data asset. Procedural manuals consist of a collection of procedures that provide step-by-step instructions to assist staff in implementing various policies, standards, and guidelines. Procedures assign responsibilities for implementing each step and help ensure that a consistent standardized process is followed each time.
External controls include agreements, some of which are legally binding, and reference manuals:
- Agreements: Agreements are negotiated and possibly legally binding arrangements that organizations make with external parties that detail the terms and conditions that apply to both the organization and external party entering into the agreement; and
- Reference Manuals: A reference manual is a handbook of detailed instructions for external parties to follow when submitting data to, or accessing data from, a partner organization.
Additional Operational Support Artifacts and Resources for Data Lifecycle Management
Beyond data lifecycle management control frameworks, operational support resources help an organization manage data effectively throughout its lifecycle. In addition to policies, standards, guidelines, procedural and reference manuals, and agreements, the following artifacts can be created to support an organization’s data management lifecycle:
Knowledge management supports data lifecycle management by ensuring that the artifacts that support the lifecycle are available and accessible. It is a discipline that promotes an integrated approach to identifying, capturing, evaluating, retrieving, and sharing all of an organization’s data and information-related assets. These assets may include databases, documents, policies, procedures, products, methodologies, artifacts, and previously uncaptured expertise and experience in individual workers. Knowledge management ensures that data and information-related assets are readily accessible, up to date, consistent, and actively managed for use by staff. It ensures that all staff of an organization have seamless access to the information they need to appropriately fulfill the responsibility of their roles.
Summary
The data lifecycle is the process that helps organizations manage the flow of data, from creation to destruction. It is imperative that organizations manage data appropriately to maintain its integrity at every stage of the lifecycle. There are six stages in the data lifecycle: create, store, use, share, archive, and destroy. Data management control frameworks, artifacts, and resources help organizations manage their data throughout its lifecycle.
Test Your Knowledge
Complete the following activity to assess how much you learned about the content that was covered in this section.
The sequence of events that data goes through from its initial creation or capture to its eventual archiving or destruction at the end of its usefulness.
The business function of planning for, controlling, and delivering data (Fircan, 2021b).
A discipline which provides the necessary policies, processes, standards, roles, and responsibilities needed to ensure that data is managed as an asset (Fircan, 2021b).
A named collection of related data elements that is formally managed as a single unit. They may be a collection of facts represented as text, numbers, graphics, images, sound, or video, and are the raw material from which information can be derived and decisions can be made.