12.7. Data Analysis Projects
Data analysis projects should start with a clear vision and business-focused objectives. When senior executives can see those objectives expressed in terms of potential payoff, they are able to champion the effort, and having an executive champion is a key success factor. A focus on business issues will also drive technology choice, helping the firm home in on the products that best fit its needs. Once business goals and hoped-for payoffs are clearly defined, the firm can address the broader issues needed to design, develop, deploy, and maintain its system. Data analysis projects should consider the following:
Data relevance: What data is needed to meet our current and future goals?

Data sourcing: Where can this data be obtained? Is it available via our internal systems? Via third-party data aggregators? Via suppliers or sales partners? Do we need to set up new systems, surveys, and other collection efforts to acquire the data we need?

Data quantity: How much data is needed?

Data quality: Can our data be trusted as accurate? Is it clean, complete, and reasonably free of errors? How can the data be made more accurate and valuable for analysis? Will we need to "scrub," calculate, and consolidate data so that it can be used?

Data hosting: Where will the systems be housed? What are the hardware and networking requirements for the effort?

Data governance: What rules and processes are needed to manage data from its creation through its retirement? Are there operational issues (backup, disaster recovery)? Legal issues? Privacy issues? How should the firm handle security and access?
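The data-quality questions in the list above, scrubbing, completeness checks, and deduplication, can be sketched as a short script. This is a minimal illustration, not a production pipeline; the record structure and field names (`name`, `email`) are hypothetical:

```python
# Sketch: basic data "scrubbing" -- normalize values, flag incomplete
# records, and drop duplicates. Field names are hypothetical examples.

def scrub(records):
    """Return (clean, rejected) lists built from raw record dicts."""
    seen = set()
    clean, rejected = [], []
    for rec in records:
        # Normalize: trim whitespace, lowercase the email.
        name = (rec.get("name") or "").strip()
        email = (rec.get("email") or "").strip().lower()
        # Completeness check: both fields must be present.
        if not name or not email:
            rejected.append(rec)
            continue
        # Deduplicate on the normalized email address.
        if email in seen:
            rejected.append(rec)
            continue
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean, rejected

raw = [
    {"name": "Ada Lovelace", "email": " ADA@example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},  # duplicate
    {"name": "", "email": "ghost@example.com"},            # incomplete
]
clean, rejected = scrub(raw)
print(len(clean), len(rejected))  # → 1 2
```

Even a toy example like this shows why scrubbing matters: without normalization, "ADA@example.com" and "ada@example.com" would be counted as two different customers.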
Data Governance
Data governance is essential to ensure that the data an organization uses is reliable. It is, at heart, an organization's commitment, expressed through strategies, processes, and standards, to ensuring that its data and information have the characteristics that make them valuable for meeting its objectives. Problems arise when information cannot be relied on: bad data can result from duplicate records or incomplete information, and further trouble appears when legacy systems try to communicate with one another.
Legacy systems are outdated information systems that were not designed to share data, are not compatible with newer technologies, and are not aligned with the organization's current business needs. The problem can be made worse by mergers and acquisitions, especially if a company depends on operational systems that are incompatible with those of its partner. Eliminating incompatible systems is not just a technical issue: a company may be under extended agreements with various vendors or outsourcers, and breaking a contract or invoking an escape clause can be costly.
Another problem in turning data into information is that most transactional databases are not set up to be accessed simultaneously for reporting and analysis. When a customer buys something at a cash register, that action may post a sales record and deduct an item from the firm's inventory. In most transaction processing systems (TPS), requests made to the database can be performed quickly: the system adds or modifies the few records involved and is done, in and out in a flash. But if a manager asks the database to analyze historical sales trends and identify the most and least profitable products over time, they are asking the computer to examine thousands of transaction records, compare results, and neatly order the findings. That is not a quick in-and-out task, and it may require significant processing to fulfill the request. Run such queries against the very databases you are using to record transactions, and you might grind your systems to a halt.
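The contrast described above can be sketched with an in-memory SQLite database. The transactional write touches a single row, while the analytical query must scan, group, and sort every sale. The table and column names here are illustrative, not from the chapter:

```python
import sqlite3

# Sketch: transactional vs. analytical workloads on one database.
# Schema and data are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, qty INTEGER, price REAL)")

# Transactional work: one cash-register sale posts a single record --
# a quick "in and out" touch of one row.
conn.execute("INSERT INTO sales VALUES (?, ?, ?)", ("widget", 1, 9.99))

# Simulate a history of many past transactions.
rows = [("widget", 1, 9.99) if i % 2 else ("gadget", 2, 4.50)
        for i in range(10_000)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Analytical work: ranking products by revenue forces the database to
# scan and aggregate *every* transaction record before answering.
report = conn.execute(
    "SELECT product, SUM(qty * price) AS revenue "
    "FROM sales GROUP BY product ORDER BY revenue DESC"
).fetchall()
print(report)
```

On a few thousand rows either query is instant, but the point scales: the insert always touches a handful of records, while the report's cost grows with the entire transaction history, which is why firms offload such analysis to separate data warehouses and data marts rather than run it against the live transactional database.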
“11.5 Data Warehouses and Data Marts” and “11.4 Data Rich, Information Poor” from Information Systems by Minnesota Libraries is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.