ABSTRACT
Large data volumes, widely distributed data sources, and multiple stakeholders characterize typical e-business settings. Mobile and wireless technologies have further increased data volumes and further distributed the data sources, while permitting access to data anywhere, anytime. Such environments both enable and require decision-makers to act and react more quickly on all decision-tasks, including mission-critical ones. Decision-support in such environments demands efficient data quality management. This paper presents a framework for managing data quality in such environments using the information product approach. It includes a modeling technique to explicitly represent the manufacture of an information product, quality dimensions and methods to compute data quality of the product at any stage in the manufacture, and a set of capabilities to comprehensively manage data quality and implement total data quality management. The paper also posits the notion of a virtual business environment to support dynamic decision-making and describes the role of the data quality framework in this environment.
Keywords: data quality; information quality; total data quality management; information product; virtual business environments
INTRODUCTION
Organizations are forced to manage larger volumes of data as a consequence of e-business and the technology advances that support it. The strong push to gain business intelligence and competitive advantage has increased the number of different ways data is analyzed, and the variety and frequency of decision-tasks performed with it. The advent and widespread use of wireless technologies and devices in the mobile-business arena promise to increase these further. Because they have access to data anywhere, anytime, decision-makers are forced to become more responsive and to make quicker, more dynamic decisions. A decision-maker uses the same data for different decision-tasks and also shares the data and decision-outcomes with several others. This creates dynamic decision environments characterized by data at different levels of granularity, a high frequency and large variety of decision-tasks, and multiple stakeholders (data providers, decision-makers, and data custodians). Supporting decision-making in such environments in the face of increasing data volumes demands efficient and proactive data quality management.
The business-webs and partnerships formed to support e-business activities create a widespread distribution of resources spanning multiple organizations. The decision-maker has no control over these data sources. The number and distribution of such data sources make it difficult to guarantee data quality. Efficient data quality management must include informing the decision-maker about the quality of the data being used and/or providing him/her with the ability to gauge it. The decision-maker can then decide whether the quality is acceptable for the decision-task at hand and evaluate whether alternate data and/or sources are more acceptable, along with the associated risks and benefits.
Although useful, conventional approaches to data quality management such as data cleansing (Hernandez & Stolfo, 1998), data tracking and statistical process control (Redman, 1996), data source calculus and algebra (Lee, Bressen, & Madnick, 1998; Parssian, Sarkar, & Jacob, 1999), data stewardship (English, 1999), and dimensional gap analysis (Kahn, Strong, & Wang, 2002; Lee, Strong, Kahn, & Wang, 2002) do not provide a systematic approach for managing data quality. In this paper, an alternative approach based on the notion of an information product (IP) is developed for managing data quality in dynamic decision environments. The IP approach has gained considerable acceptance in organizations for several reasons. First, manufacturing an IP is akin to manufacturing a physical product. Raw materials, storage, assembly, processing, inspection, rework, and packaging (formatting) are all applicable. Typical IPs (such as management reports, invoices, etc.) are "standard products" and hence can be "assembled" in a production line. Components and/or processes of an IP may be outsourced to an external agency (ASP), organization, or a different business-unit that uses a different set of computing resources. Second, IPs, like physical products, can be "grouped" based on similar characteristics and common data inputs, permitting the "group" to be managed as a whole. In other words, multiple IPs may share a subset of processes and data inputs, and may be created using a single "production line" with minor variations that distinguish each IP. Finally, proven methods for total quality management (TQM), such as quality at source and continuous improvement, that have been successfully applied in manufacturing can be adapted for total data quality management. To exploit these properties of IPs and manage data quality using the IP approach, mechanisms for systematically representing the manufacturing stages and evaluating data quality at each stage are essential.
To understand the implications of poor-quality data for total data quality management, it is necessary to evaluate the impact of delays in one or more manufacturing stages, trace a quality problem in an IP to the manufacturing stage(s) that may have caused it, and predict the IP(s) impacted by quality issues identified at some manufacturing step(s). The IP approach facilitates a comprehensive, intuitive, and visual representation of the manufacture of an IP.
In this paper we present an IP-based framework for data quality management. We first describe a set of modeling constructs to systematically represent the manufacture of an IP. The representation is called an information product map or IPMAP. The IPMAP allows the decision-maker to visualize not only the widespread distribution of data and other resources but also the flow of data elements and the sequence in which these data elements are processed to create the required IPs. Combined with the metadata and the capabilities for total data quality management that are part of the framework, the IPMAP permits decision-makers to understand the sources, processes, systems, business units, and organizations involved in the creation of the IP. We then describe the metadata, including quality dimensions, associated with the constructs and show how the IPMAP and its metadata can be used to evaluate data quality. We further develop a set of capabilities to compute time-to-deliver, trace quality problems to manufacturing stages, and recognize IPs affected by poor-quality data. These support total data quality management and are built on the IPMAP using graph-based operations that are shown to be correct. We finally propose a virtual business environment (VBE) for supporting dynamic decision-making and show how the IPMAP and data quality management fit into the VBE. This combination allows decision-makers not only to understand and evaluate the data (information products) used in the decision-task, but also to understand and evaluate its quality.
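To make the three capabilities concrete, the following is a minimal sketch and not the paper's actual IPMAP constructs or algorithms. It assumes the manufacture of an IP can be approximated as a directed acyclic graph in which each stage has a fixed processing time, and that time-to-deliver is the longest cumulative path into a stage. The stage names, durations, and the function names time_to_deliver, upstream, and affected_products are all hypothetical, chosen only for illustration.

```python
# Hypothetical IPMAP fragment: stage -> (processing time, input stages).
# Stage names and durations are invented for illustration only.
IPMAP = {
    "src_orders":    (1.0, []),
    "src_customers": (2.0, []),
    "clean_orders":  (3.0, ["src_orders"]),
    "join":          (2.0, ["clean_orders", "src_customers"]),
    "invoice":       (1.0, ["join"]),           # information product
    "sales_report":  (4.0, ["clean_orders"]),   # information product
}
PRODUCTS = {"invoice", "sales_report"}


def time_to_deliver(stage, ipmap=IPMAP):
    """Assumed definition: own processing time plus the slowest input path."""
    duration, inputs = ipmap[stage]
    slowest_input = max((time_to_deliver(i, ipmap) for i in inputs), default=0.0)
    return duration + slowest_input


def upstream(stage, ipmap=IPMAP):
    """Trace backward: every stage that feeds `stage`, directly or indirectly."""
    seen = set()
    stack = list(ipmap[stage][1])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ipmap[current][1])
    return seen


def affected_products(stage, ipmap=IPMAP, products=PRODUCTS):
    """Predict forward: every IP that depends on `stage` (or is that stage)."""
    return {p for p in products if p == stage or stage in upstream(p, ipmap)}


if __name__ == "__main__":
    print(time_to_deliver("invoice"))          # longest path into the invoice
    print(sorted(upstream("invoice")))         # candidate causes of a defect
    print(sorted(affected_products("clean_orders")))  # IPs hit by a bad stage
```

In this sketch a delay in any stage propagates to an IP only if that stage lies on the IP's slowest path, while a quality defect in a stage is assumed to reach every IP downstream of it. Both are simplifications made for the example.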
The next section summarizes the relevant literature on data quality using the IP approach. The IP Framework section introduces the modeling constructs of the IPMAP, including the capabilities defined on the IPMAP and the quality dimensions used to evaluate data quality. The VBE and the role of the IPMAP in a VBE are then described. Finally, conclusions and directions for further research are presented.
RELEVANT LITERATURE
The framework for data quality management described in this paper consists of a modeling scheme to represent the manufacture of an IP, quality dimensions for evaluating its quality, and capabilities for managing data quality using the IPMAP. In this section we present the key literature on information manufacture to differentiate our work. We compare the IPMAP with related modeling techniques--workflow models and dataflow diagrams. We then present the relevant literature on quality dimensions and differentiate our contribution.
Although there have been several attempts to develop models of an information manufacturing system, these do not offer a systematic representation of all operations involved. Further, the constructs offered are not specific enough and are often insufficient to capture the manufacturing details (Wand & Wang, 1996; Wang, Lee, Pipino, & Strong, …