
05. The Nature and Evaluation of IT General Controls

05.06. Computer Operations Management

Credit: Photo Of People Near Wooden Table by Fauxels, used under Pexels License.

Briefly reflect on the following before we begin:

  • Why is data backup and restoration crucial in computer operations management?
  • How can auditing techniques improve computer operations management?
  • What are the challenges faced in ensuring effective computer operations management?

Computer operations management, a critical IS component, involves overseeing the day-to-day operations of computer systems and ensuring their optimal performance. This section will discuss data backup and restoration, a vital part of computer operations management. We will also emphasize the importance of reliable data backup strategies and highlight how they are essential for data integrity and business continuity.

Next, we will focus on monitoring and controlling system performance by exploring the tools and techniques used for performance monitoring. We will also discuss how to identify and address performance issues. Compliance and operational reporting form another essential aspect of computer operations management. As such, we will examine the regulatory requirements and standards that organizations must adhere to and discuss how compliance is monitored and reported, including the role of internal controls and audit trails in ensuring compliance. Lastly, we will review the auditing techniques for computer operations management, which include assessing the effectiveness of backup procedures, performance monitoring systems, and compliance processes.

Computer Operations Management

Computer operations management encompasses the processes that ensure the efficient and secure functioning of an organization’s IT systems. This area includes multiple functions, from running servers and networks to managing data storage and processing. Computer operations management is both technical and strategic: the technical side covers activities such as hardware and software maintenance, while the strategic side ensures that these systems align with the organization’s goals and objectives. This dual nature ensures that IT resources are used effectively and efficiently, maximizing their contribution to the organization’s operations. The primary goal is to maintain the smooth and efficient operation of all computer-related activities, which is fundamental to the overall functionality of an organization’s IT infrastructure. Efficient management of these systems is crucial to avoid downtime, which leads to operational disruptions, loss of productivity, and financial losses. Moreover, effective computer operations management is essential to ensure data integrity and security, both critical in protecting an organization’s information assets. In this section, let’s explore some of the operational aspects of computer operations.

System performance monitoring is an ongoing process that involves tracking and assessing the performance of an organization’s IT systems. It encompasses various activities, including collecting performance metrics, analyzing data, and proactively identifying potential issues. Organizations typically use specialized monitoring tools and software to monitor system performance continuously. The process begins by establishing performance metrics tailored to the organization’s needs. These metrics might include response times, server load, network latency, and resource utilization. Automated monitoring tools are configured to collect data and trigger alerts when predefined thresholds are breached. IT teams are responsible for analyzing this data, identifying bottlenecks or anomalies, and taking corrective actions promptly. Regular reports on system performance are generated to provide insights into trends and improvements.

The next component of computer operations management is data backup and recovery processes, which involve the systematic and scheduled creation of copies of critical data and applications to safeguard against data loss due to hardware failures, human errors, or disasters. Organizations typically implement a tiered backup strategy, with data backed up on-site and off-site. Automated backup solutions are used to ensure data consistency and reliability. Regular backup tests are conducted to verify the integrity of backups and the ability to restore data in case of an incident. In addition to backup, organizations also establish comprehensive disaster recovery plans that outline the steps to be taken in case of data loss or system failure. These plans include procedures for data restoration, system recovery, and communication with stakeholders.

Hardware maintenance and management are essential to ensure the reliability and availability of IT infrastructure, including servers, storage devices, networking equipment, and other hardware components. Organizations typically have a well-defined schedule for hardware maintenance, including routine inspections, cleaning, and firmware updates. These activities are crucial in preventing hardware failures and ensuring optimal performance. Hardware inventories are maintained to track assets, configurations, and warranty information. Additionally, hardware redundancy and failover mechanisms are implemented to minimize the impact of hardware failures on operations.

Moreover, keeping software and applications updated is vital for security and performance. Organizations establish a systematic approach to software updates and patch management to address vulnerabilities and improve system stability. This process begins with the identification of software that requires updates or patches. Automated tools are often used to scan the environment for vulnerable software versions. Once identified, updates and patches are tested in a controlled environment before deployment to production systems. Regular patch management ensures that critical security vulnerabilities are addressed promptly, reducing the risk of security breaches. It also helps maintain compatibility and stability across the IT ecosystem of the organization.

Incident and problem management is a structured approach to addressing and resolving issues that may impact the organization’s IT operations. This control encompasses the entire incident lifecycle, from detection and reporting to resolution and analysis. When an incident occurs, it is logged and categorized based on its impact and urgency. An incident response team is responsible for promptly addressing and mitigating the issue. Root cause analysis is conducted to identify the underlying problem and prevent future occurrences. Additionally, problem management focuses on proactively identifying recurring issues and addressing their root causes to prevent future incidents. Incident and problem management tools and well-defined processes are critical components of this control.

Similarly, environmental control management ensures that the physical environment in which IT systems operate is conducive to proper functioning. This includes factors such as temperature, humidity, and physical security. Organizations implement environmental monitoring systems that continuously track conditions within data centers and server rooms. Automated alerts are triggered if conditions fall outside predefined thresholds, allowing swift corrective actions. Access controls and surveillance systems protect against unauthorized access to sensitive areas, while fire suppression and disaster recovery measures safeguard against environmental threats.

Lastly, capacity planning and scalability control involve assessing IT systems’ current and future resource requirements to ensure they can handle increasing workloads and growth. IT teams analyze usage patterns, performance metrics, and business projections to determine when and how resources should be scaled. This includes considerations for additional hardware, software licenses, and network capacity. Scalability control also encompasses load and capacity testing to assess systems’ performance under different stress levels. It helps organizations decide when and how to scale their infrastructure to meet business demands.

Relevant Risks

In IS computer operations management, organizations face several primary risks that can significantly impact their operations and strategic objectives. Understanding these risks is vital for effective risk management and ensuring the IS operates smoothly and efficiently. Let’s consider some of these risks.

Table: Risks Relevant to IS Computer Operations Management

| Risk | Description | Example |
| --- | --- | --- |
| Inadequate System Performance Monitoring | Failure to continuously monitor system performance metrics may result in undetected bottlenecks or issues affecting system efficiency. | A company neglects to monitor server CPU utilization, so a sudden spike in usage during peak hours goes undetected, causing system slowdowns and affecting customer experience. |
| Insufficient Data Backup and Recovery Procedures | Lack of robust data backup and recovery processes may lead to data loss, prolonged downtime, and compromised data integrity. | An organization experiences a ransomware attack and discovers that its data backups are outdated and unable to fully restore lost data, resulting in significant data loss and recovery costs. |
| Neglected Hardware Maintenance | Failure to perform regular hardware maintenance may result in hardware failures, system downtime, and increased operational costs. | A critical server experiences a hardware failure due to dust accumulation and overheating, causing an unexpected system outage and affecting customer-facing services. |
| Poorly Managed Software Updates and Patching | Ineffective software updates and patch management may leave systems vulnerable to security breaches and stability issues. | A company fails to apply critical security patches promptly, allowing cybercriminals to exploit a known vulnerability, leading to a data breach. |
| Inefficient Incident and Problem Management | Inadequate incident and problem management processes may result in prolonged disruptions, unresolved issues, and a lack of proactive problem resolution. | An IT team fails to identify the root cause of recurring server crashes, leading to repeated incidents and prolonged downtime for a critical application. |
| Weak Environmental Controls | Inadequate environmental monitoring and control systems may expose IT infrastructure to physical threats, such as temperature fluctuations or unauthorized access. | Lack of temperature monitoring in a server room results in overheating, causing hardware failures and costly replacements. |
| Inadequate Data Center Capacity Planning | Poor capacity planning may lead to resource constraints, performance bottlenecks, and an inability to accommodate increasing workloads. | A company’s data center reaches its resource limits, causing slow application response times and rendering the infrastructure incapable of handling additional users. |
| Inadequate Disaster Recovery Preparedness | Failure to establish and test disaster recovery plans may result in prolonged recovery times and significant data loss during a disaster. | A natural disaster strikes, and an organization realizes its disaster recovery plan is outdated and lacks essential components, leading to prolonged service disruption and data loss. |
| Ineffective Scalability Strategies | Inefficient scalability strategies may hinder an organization’s ability to adapt to changing workloads and accommodate business growth. | An e-commerce platform experiences a surge in traffic during a holiday sale, causing site crashes and lost sales due to inadequate scalability planning and resource allocation. |

Effectively managing these risks involves implementing robust system performance monitoring, data backup and recovery, hardware maintenance and management, patch management, and continuous training and awareness programs, as well as adapting to the evolving IT environment. Mitigating these risks is essential for maintaining the efficiency and effectiveness of an organization’s information systems operations and its regulatory compliance.

Relevant IT General Controls Objectives and Activities

In computer operations management, a subset of IT General Controls (ITGC), several crucial controls ensure effective and seamless management of information systems. These controls are vital in aligning existing IS with business objectives, managing risks, and ensuring successful outcomes. Let’s consider the primary ITGC objectives for this category.

System Performance Monitoring Control

The primary objective of this control is to ensure that system performance is consistently monitored and potential performance issues are proactively identified and addressed to maintain optimal system functionality. It focuses on the need to continuously assess and manage the performance of an organization’s IT systems. Organizations can detect deviations from expected norms by consistently monitoring performance metrics such as response times, server load, and resource utilization. Proactively identifying performance issues allows for timely corrective actions, preventing service disruptions and ensuring that IT systems operate at their peak efficiency.

Examples of ITGC activities that may facilitate the achievement of this objective include the following:

  • Implement a real-time performance monitoring system that continuously tracks critical performance metrics such as CPU utilization, memory usage, and network latency. This control lets IT teams detect performance issues and take immediate corrective actions.
  • Establish predefined performance thresholds and alerting mechanisms. When performance metrics exceed these thresholds, automated alerts notify IT staff of potential issues. This control helps in proactively addressing performance bottlenecks.
  • Utilize historical performance data analysis to identify trends and patterns in system behaviour. By analyzing historical data, organizations can predict potential performance issues and plan for capacity upgrades or optimizations.
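The threshold-and-alert activity above can be illustrated with a minimal Python sketch. The metric names, threshold values, and the `check_thresholds` helper are hypothetical and chosen only for illustration; a real deployment would rely on the organization's own monitoring tool and baselines.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values depend on the organization's baselines.
THRESHOLDS = {
    "cpu_percent": 85.0,
    "memory_percent": 90.0,
    "network_latency_ms": 200.0,
}


@dataclass
class Alert:
    metric: str
    value: float
    threshold: float


def check_thresholds(sample, thresholds=THRESHOLDS):
    """Return one Alert for every metric in the sample that exceeds its threshold."""
    alerts = []
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(Alert(metric, value, limit))
    return alerts


# One hypothetical reading collected by a monitoring agent.
sample = {"cpu_percent": 93.5, "memory_percent": 71.0, "network_latency_ms": 240.0}
alerts = check_thresholds(sample)
for alert in alerts:
    print(f"ALERT: {alert.metric}={alert.value} exceeds {alert.threshold}")
```

An auditor testing this control would typically look for evidence that such thresholds are documented, that alerts reach responsible staff, and that follow-up actions are recorded.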

Data Backup and Recovery Control

This control aims to establish and maintain a robust data backup and recovery process that ensures the availability and integrity of critical data, applications, and systems in the event of data loss or system failure. Data is valuable to any organization, and data loss can have severe consequences. This control objective emphasizes the importance of a well-defined and reliable data backup and recovery process. It ensures that data is regularly backed up, both on-site and off-site, and that recovery procedures are tested and documented. In the event of data loss or system failure, this control objective ensures that critical data can be restored swiftly, minimizing downtime and potential data loss.

Examples of ITGC activities that may facilitate the achievement of this objective include the following:

  • Implement a regular and automated data backup schedule, including full and incremental backups. Ensure backups are conducted at specified intervals, with data integrity checks to confirm successful backups.
  • Maintain off-site data storage for backups to safeguard against on-site disasters. Ensure backups are securely transferred and stored in geographically separate locations to protect data from physical threats.
  • Regularly test data backups by conducting recovery drills. Verify that critical data and systems can be successfully restored from backups. This control ensures that the backup and recovery process is reliable and effective.
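The "data integrity checks" mentioned above are often implemented by comparing cryptographic hashes of the source and the backup copy. The sketch below shows that idea with Python's standard `hashlib`; the file names and the `backup_matches_source` helper are hypothetical, and a matching hash only confirms that the copy is byte-identical, not that a full system restore would succeed.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches_source(source, backup):
    """Return True when the backup copy is byte-identical to the source."""
    return sha256_of(source) == sha256_of(backup)


# Demonstration with temporary files standing in for real data and backups.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "ledger.db"
    backup = Path(tmp) / "ledger.db.bak"
    source.write_bytes(b"critical records")
    backup.write_bytes(source.read_bytes())
    intact = backup_matches_source(source, backup)

    backup.write_bytes(b"critical recor")  # simulate a truncated backup
    corrupted = backup_matches_source(source, backup)

print(intact, corrupted)
```

A hash comparison complements, but does not replace, the recovery drills described in the last bullet.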

Hardware Maintenance and Management Control

The purpose of this control is to manage and maintain hardware components effectively, ensuring their reliability, availability, and performance through routine inspections, maintenance, and upgrades as needed. Hardware components, including servers, networking equipment, and storage devices, are the foundation of IT infrastructure. This control objective emphasizes the need for routine maintenance and management to ensure hardware reliability and optimal performance. By conducting regular inspections, cleaning, and firmware updates, organizations can prevent hardware failures and extend the lifespan of their equipment. This control objective also encourages the establishment of hardware inventories to track assets and warranties, ensuring timely replacements and upgrades when necessary.

Examples of ITGC activities that may facilitate the achievement of this objective include the following:

  • Establish a schedule for routine hardware inspections, including checks for dust buildup, loose connections, and physical damage. This control helps identify potential hardware issues before they lead to failures.
  • Implement a process for applying firmware and driver updates to hardware components. Ensure that these updates are tested in a controlled environment before deployment to production systems.
  • Maintain an accurate inventory of hardware assets, including make, model, serial number, and warranty information. This control facilitates timely replacements, warranty claims, and hardware upgrades.
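As a rough illustration of the inventory activity above, the following sketch records the attributes the text lists (make, model, serial number, warranty information) and flags assets whose warranty has lapsed or will lapse soon. The `HardwareAsset` structure, the 90-day window, and the sample records are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class HardwareAsset:
    make: str
    model: str
    serial_number: str
    warranty_end: date


def warranties_expiring(assets, as_of, within_days=90):
    """Return assets whose warranty has ended or ends within the given window."""
    cutoff = as_of + timedelta(days=within_days)
    return [asset for asset in assets if asset.warranty_end <= cutoff]


# Hypothetical inventory records.
inventory = [
    HardwareAsset("ExampleCo", "RackServer-1", "SN-0001", date(2025, 3, 1)),
    HardwareAsset("ExampleCo", "Switch-48", "SN-0002", date(2027, 6, 30)),
    HardwareAsset("ExampleCo", "StorageArray-2", "SN-0003", date(2024, 11, 15)),
]

due = warranties_expiring(inventory, as_of=date(2025, 1, 1))
for asset in due:
    print(asset.serial_number, asset.warranty_end.isoformat())
```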

Software Updates and Patch Management Control

The primary objective of this control is to systematically identify, evaluate, and apply software updates and patches to mitigate security vulnerabilities, enhance system stability, and ensure software compatibility across the IT environment. Software vulnerabilities are a prime target for cyberattacks, making timely updates and patch management crucial. This control objective focuses on identifying and assessing software vulnerabilities regularly. Organizations should establish processes to apply patches and updates promptly after thorough testing to prevent exploitation. It also emphasizes the importance of maintaining software compatibility across the IT ecosystem to avoid compatibility issues that could disrupt operations.

Examples of ITGC activities that may facilitate the achievement of this objective include the following:

  • Regularly conduct vulnerability assessments to identify software vulnerabilities within the organization’s IT environment. This control helps in prioritizing patch management efforts.
  • Before deploying software patches and updates to production systems, establish a testing environment where patches can be applied and tested for compatibility and stability. Ensure that patches do not introduce new issues.
  • Implement a change management process that includes approval, documentation, and tracking of software updates and patches. This control ensures that all changes are well-documented and can be audited.
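One simplified way to picture the identification step is to compare installed versions against the minimum patched version for each product. The sketch below assumes purely numeric, dot-separated version strings with the same number of components; the product names, versions, and the `find_unpatched` helper are hypothetical, and real environments would normally use a vulnerability scanner rather than a hand-maintained list.

```python
def parse_version(text):
    """Turn '2.4.10' into (2, 4, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_unpatched(installed, minimum_patched):
    """Return (name, installed, required) for software below its patched version."""
    findings = []
    for name, version in installed.items():
        required = minimum_patched.get(name)
        if required is not None and parse_version(version) < parse_version(required):
            findings.append((name, version, required))
    return sorted(findings)


# Hypothetical software inventory and patch baseline.
installed = {"webserver": "2.4.9", "database": "14.10", "runtime": "3.11.2"}
minimum_patched = {"webserver": "2.4.10", "database": "14.10", "runtime": "3.12.0"}

findings = find_unpatched(installed, minimum_patched)
for name, version, required in findings:
    print(f"{name}: installed {version}, requires at least {required}")
```

Parsing the components as integers matters here: compared as plain strings, "2.4.9" would sort after "2.4.10" and the outdated web server would be missed.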

Incident and Problem Management Control

The objective of this control is to establish a structured approach for the detection, reporting, investigation, resolution, and analysis of incidents and problems to minimize disruptions, identify root causes, and prevent recurrence. Incidents and problems can disrupt operations and impact service quality. This control objective emphasizes the need for a well-structured incident and problem management process. It includes clear procedures for detecting, reporting, and resolving incidents. Root cause analysis is conducted to identify underlying problems and prevent recurring incidents. This proactive approach helps organizations minimize disruptions, enhance service quality, and continuously improve their IT operations.

Examples of ITGC activities that may facilitate the achievement of this objective include the following:

  • Implement a system for logging and categorizing incidents based on their impact and urgency. This control ensures that incidents are appropriately documented and prioritized for resolution.
  • Conduct thorough root cause analysis for incidents and problems to identify underlying issues. Document the findings and implement corrective actions to prevent recurrence.
  • Develop and maintain an incident response plan that outlines roles and responsibilities, escalation procedures, and communication protocols during a significant incident. This control ensures a coordinated and effective response to incidents.
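Categorizing incidents by impact and urgency is commonly done with a priority matrix. The sketch below is one hypothetical mapping, not a prescribed standard: the three rating levels, the P1 to P4 labels, and the sample incidents are all assumptions for illustration.

```python
# Hypothetical numeric weights for the impact and urgency ratings.
LEVELS = {"high": 3, "medium": 2, "low": 1}


def priority(impact, urgency):
    """Map impact and urgency ratings to a priority label (P1 is most severe)."""
    score = LEVELS[impact] + LEVELS[urgency]
    if score >= 6:
        return "P1"
    if score == 5:
        return "P2"
    if score == 4:
        return "P3"
    return "P4"


# Hypothetical incident log entries.
incidents = [
    {"id": "INC-1", "summary": "Online banking unavailable", "impact": "high", "urgency": "high"},
    {"id": "INC-2", "summary": "Printer offline", "impact": "low", "urgency": "medium"},
    {"id": "INC-3", "summary": "Slow batch job", "impact": "medium", "urgency": "medium"},
]

# Order the work queue so the most severe incidents are handled first.
queue = sorted(incidents, key=lambda item: priority(item["impact"], item["urgency"]))
for item in queue:
    print(priority(item["impact"], item["urgency"]), item["id"], item["summary"])
```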

Environmental Controls

The goal of this control is to maintain a controlled physical environment that ensures the proper functioning of IT systems, including monitoring and maintaining temperature, humidity, physical security, and protection against environmental threats. The physical environment in which IT systems operate is crucial for their reliability and performance. This control objective involves monitoring and maintaining environmental factors such as temperature, humidity, and physical security within data centers and server rooms. It also includes measures to protect against environmental threats like fires and floods. By maintaining a controlled environment, organizations can minimize the risk of hardware failures and ensure the uninterrupted operation of IT systems.

Examples of ITGC activities that may facilitate the achievement of this objective include the following:

  • Deploy environmental monitoring systems that continuously track temperature, humidity, and smoke detection within data centers and server rooms. Set up alerts to notify personnel of any deviations from acceptable ranges.
  • Implement strict physical access controls, including biometric authentication, access logs, and security cameras, to prevent unauthorized access to critical infrastructure areas.
  • Ensure that fire suppression systems, sprinklers, and disaster recovery plans are in place and regularly tested. This control safeguards against environmental threats like fires and floods.
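Unlike a simple upper threshold, environmental readings usually have an acceptable band with both a lower and an upper limit. The following sketch checks readings against such bands; the ranges shown are placeholders rather than recommended values, and the `out_of_range` helper is hypothetical.

```python
# Placeholder ranges; actual limits come from equipment and facility guidance.
ACCEPTABLE_RANGES = {
    "temperature_c": (18.0, 27.0),
    "humidity_percent": (30.0, 60.0),
}


def out_of_range(reading, ranges=ACCEPTABLE_RANGES):
    """Return a description of every factor that is missing or outside its range."""
    deviations = {}
    for factor, (low, high) in ranges.items():
        value = reading.get(factor)
        if value is None:
            # A silent sensor is treated as a deviation that needs attention.
            deviations[factor] = "no reading"
        elif value < low:
            deviations[factor] = "below range"
        elif value > high:
            deviations[factor] = "above range"
    return deviations


print(out_of_range({"temperature_c": 31.5, "humidity_percent": 45.0}))
print(out_of_range({"temperature_c": 22.0}))
```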

Capacity Planning and Scalability Control

The primary purpose of this control is to assess current and future resource requirements of IT systems, ensuring they are adequately provisioned, scalable, and capable of accommodating increasing workloads and business growth. Capacity planning and scalability control are essential for organizations to meet their evolving IT needs. This control objective involves assessing current resource utilization, analyzing performance trends, and projecting future requirements. It ensures organizations have the necessary hardware and software resources to handle increasing workloads and business growth. By planning for scalability, organizations can avoid resource constraints that could hinder their ability to adapt to changing demands and maintain optimal system performance.

Examples of ITGC activities that may facilitate the achievement of this objective include the following:

  • Utilize performance modelling and predictive analytics to forecast future resource requirements. This control helps in proactively identifying when additional capacity is needed.
  • Conduct load testing to assess how systems perform under different stress levels. This control helps determine the optimal resource allocation and scalability requirements.
  • Implement automated resource scaling mechanisms that dynamically allocate additional resources (e.g., CPU, memory, storage) based on demand. This control ensures that systems can adapt to changing workloads efficiently.
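A very simple form of the forecasting described above is a straight-line trend fitted to past utilization. The sketch below uses ordinary least squares on monthly storage figures to estimate how many months remain before a chosen limit is reached. The data, the 80 percent limit, and the function names are hypothetical, and real workloads are rarely this linear, so the result is only a rough planning signal.

```python
def fit_line(values):
    """Least-squares slope and intercept for values observed at x = 0, 1, 2, ...

    Requires at least two observations.
    """
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_limit(values, limit):
    """Estimate months after the last observation until the trend reaches the limit.

    Returns None when utilization is flat or falling.
    """
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    x_at_limit = (limit - intercept) / slope
    return max(0.0, x_at_limit - (len(values) - 1))


# Hypothetical storage utilization (percent) for the last six months.
storage_used_percent = [50, 52, 54, 56, 58, 60]
remaining = months_until_limit(storage_used_percent, limit=80)
print(remaining)
```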

Summarized Audit Program

As discussed in Chapter 3, an audit program is a structured and comprehensive plan that outlines the procedures and activities to assess the effectiveness of an organization’s control environment. Based on the core concepts of computer operations management ITGCs discussed above, presented below is a summarized audit program highlighting select relevant risks, corresponding ITGCs, and potential ways (audit procedures) to assess the operating effectiveness of such ITGCs. Please note that this is not an exhaustive audit program covering all applicable risks and controls and is provided for your reference only.

Table: Summarized Audit Program

| Detailed Description of the Risk and Its Impact | Relevant IT General Control Activity | Detailed Test of Controls Audit Procedure |
| --- | --- | --- |
| System downtime can disrupt business operations and lead to loss of productivity and revenue. | Regularly monitor and maintain computer systems to ensure optimal performance, with daily system checks and monthly maintenance activities. Responsibilities include identifying and resolving potential issues before they lead to system downtime. | Review 40 daily system monitoring logs and two recent monthly maintenance reports. Use inspection and analysis techniques to assess the effectiveness of the monitoring and maintenance activities. Verify that system checks are thorough and consistent and that maintenance activities address key performance metrics. |
| Data loss due to inadequate backup processes can result in significant operational setbacks and data recovery costs. | Implement and regularly test data backup and recovery procedures, with backups conducted daily and recovery tests performed quarterly. Responsibilities include ensuring that backups are complete and recovery processes are effective. | Inspect 40 daily backup logs and two recent quarterly recovery test reports. Use inspection and reperformance techniques to confirm that backups are regularly conducted and that recovery tests validate the effectiveness of the backup procedures. Check for comprehensive backups and successful recovery in test scenarios. |
| Inadequate hardware maintenance can lead to equipment failures and system unreliability. | Regular hardware maintenance, including physical inspections and repairs, is performed monthly. Responsibilities involve monitoring hardware health and scheduling necessary repairs or replacements. | Review two recent monthly hardware maintenance reports. Use inspection techniques to assess the thoroughness of hardware maintenance and the resolution of identified issues. Determine that hardware is adequately maintained and that problems are promptly addressed. |
| Software malfunctions due to outdated or unpatched software can compromise system security and functionality. | Regular software updates and patch management, with updates applied and reviewed monthly. Responsibilities include monitoring software versions, applying necessary updates, and ensuring software security. | Examine two recent monthly software update reports. Use inspection and analysis techniques to verify that software is up-to-date and that patches are applied promptly. Assess the currency and security of the software in use. |
| Inadequate monitoring of system performance can lead to undetected inefficiencies and overloading. | Continuous monitoring of system performance metrics, conducted daily, focusing on promptly identifying and resolving performance issues. | Review 40 daily system performance monitoring records. Use analysis techniques to evaluate the efficiency and effectiveness of the performance monitoring process. Check for consistent monitoring and timely resolution of any performance issues. |
| Outdated or unpatched software creates security vulnerabilities. | Implement a rigorous software update and patch management process. Responsibilities include monitoring for software updates, testing patches, and ensuring timely deployment. Frequency: Software updates and patch deployments are reviewed monthly. | Review two monthly patch management reports. Use inspection and analysis techniques to confirm that software is regularly updated and patches are applied promptly. |
| Failure to monitor and review user access rights can lead to inappropriate or excessive access. | Review and update user access rights regularly to align with job roles and responsibilities. Responsibilities include conducting access reviews and adjusting rights as needed. Frequency: User access reviews are conducted quarterly. | Inspect two quarterly user access review reports. Use inspection and confirmation techniques to verify that access rights are appropriate and that reviews are conducted regularly. |
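Several procedures in the audit program call for reviewing 40 daily logs. The program does not say how those 40 items are chosen; one common option is random selection with a recorded seed so that the sample can be reproduced in the working papers. The sketch below assumes that approach, with a hypothetical one-year audit period and seed value.

```python
import random
from datetime import date, timedelta


def select_sample(population, sample_size, seed):
    """Draw a reproducible random sample without replacement, sorted for review."""
    rng = random.Random(seed)
    return sorted(rng.sample(population, sample_size))


# Hypothetical audit period: every day of 2024 (a leap year, so 366 daily logs).
period_start = date(2024, 1, 1)
population = [period_start + timedelta(days=offset) for offset in range(366)]

sample = select_sample(population, sample_size=40, seed=2024)
print(len(sample), sample[0].isoformat(), sample[-1].isoformat())
```

Recording the seed alongside the population definition lets a reviewer regenerate exactly the same 40 dates.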

In the Spotlight

For additional context on auditing backup and recovery components of IS operations, please read the article “IS Audit Basics: Backup and Recovery”.

Cooke, I. (2018). IS audit basics: Backup and recovery. ISACA Journal, 1. https://www.isaca.org/resources/isaca-journal/issues/2018/volume-1/is-audit-basics-backup-and-recovery

Review Questions

  1. What is the primary purpose of system performance monitoring in Computer Operations Management?
  2. Why is it essential for organizations to maintain both on-site and off-site data backups?
  3. What are the critical components of effective hardware maintenance and management in Computer Operations Management?
  4. How does patch testing contribute to software updates and patch management control?

Mini Case Study

You are an IT auditor assigned to assess a mid-sized financial institution’s computer operations management practices. The organization relies heavily on its IT infrastructure to process transactions, manage customer data, and provide online banking services. You identify a potential hardware maintenance and management risk during your preliminary assessment.

The organization lacks a systematic hardware maintenance and management program. Hardware components, including servers and networking equipment, are aging, and there is no documented routine maintenance or inspection schedule. This situation poses a potential risk of hardware failures that could disrupt critical banking operations.

Required:

  1. Identify the specific IT General Controls that should be in place to mitigate the risk associated with inadequate hardware maintenance.
  2. Develop an audit procedure that you would use to assess the organization’s current hardware maintenance practices.

License

Auditing Information Systems Copyright © 2024 by Amit M. Mehta is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.