Data Sourcing/Movement
Business Intelligence is fundamental to business success. Understanding why things happen, and identifying trends and patterns, helps organisations continuously improve what they do through effective measurement and corresponding adjustment. Earlier in this series we looked at what business intelligence is and why it matters, at implementation and warehouse concepts, and at running a BI project. Our attention now turns to data sourcing/movement, a process that is critical to business intelligence success. There are various processes that must be undertaken, according to Joerg Reinschmidt and Allison Francoise of IBM.
One such process is data replication, which helps management build and reuse functions. Data replication needs to embrace both control and flexibility: data integrity must be maintained, while flexibility allows sources to be mixed and matched to provide helpful reporting. Replication must also be easy to maintain, and designed so that performance stays high despite the large volumes of data being handled. Crucially, data may need to be replicated from a wide variety of sources, all of which feed in to provide an overall picture, and the replication must preserve the business context to ensure relevance. The data replication process involves identifying the source data, defining the target data, mapping source to target, defining the replication mode, scheduling the replication, capturing the data from the source, transferring the captured data between source and target, and then transforming it to meet the needs of the defined mapping.
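To make that sequence more concrete, the sketch below walks through a stripped-down version of the steps in Python, assuming a simple relational source and target held in SQLite. The table names, column mapping and helper code are illustrative only; they are not drawn from the IBM material or from any particular replication product.

```python
import sqlite3

# Illustrative source-to-target mapping: source column -> target column.
# All table and column names here are hypothetical.
COLUMN_MAP = {
    "cust_id": "customer_id",
    "cust_nm": "customer_name",
    "ord_total": "order_total",
}

def replicate(source_db: str, target_db: str) -> None:
    """Copy mapped columns from a source 'orders' table to a target 'sales' table."""
    src = sqlite3.connect(source_db)
    tgt = sqlite3.connect(target_db)

    # Define the target: create the warehouse table if it does not yet exist.
    tgt.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "customer_id INTEGER, customer_name TEXT, order_total REAL)"
    )

    # Capture: read the mapped columns from the source.
    src_cols = ", ".join(COLUMN_MAP.keys())
    rows = src.execute(f"SELECT {src_cols} FROM orders").fetchall()

    # Transfer and transform: apply the mapping and load into the target.
    tgt_cols = ", ".join(COLUMN_MAP.values())
    placeholders = ", ".join("?" for _ in COLUMN_MAP)
    tgt.executemany(
        f"INSERT INTO sales ({tgt_cols}) VALUES ({placeholders})", rows
    )
    tgt.commit()
    src.close()
    tgt.close()
```

In practice the mapping and schedule would be held in metadata rather than hard-coded, but the same identify, map, capture, transfer and transform stages are visible even in this reduced form.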
The capture process is particularly important because it ensures that the source data is copied and an accurate record is kept. It needs to maintain consistency and timestamp each change. In many cases not all of the data at source is needed, and some may be discarded accordingly. Understanding the history of changes made in the database is also important in case any problem arises where the integrity of the data is questioned. There are various levels of capture: static capture takes the data at one moment in time; incremental capture picks up changes in a data source set and handles the fact that data has a time dependency; delayed capture collects changes at set times.
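The difference between static and incremental capture can be shown with a short Python sketch. The `updated_at` field and the watermark timestamp are assumptions made for illustration; any field that records when a row last changed would serve the same purpose.

```python
from datetime import datetime, timezone

def static_capture(rows):
    """Static capture: take the full data set as it stands at this moment."""
    return list(rows)

def incremental_capture(rows, watermark):
    """Incremental capture: return only rows changed since the previous run."""
    return [r for r in rows if r["updated_at"] > watermark]

# Example: two rows carry an update timestamp; only the newer one is picked up
# by the incremental pass, while the static pass takes everything.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
]
watermark = datetime(2024, 3, 1, tzinfo=timezone.utc)

print(len(static_capture(rows)))               # 2
print(incremental_capture(rows, watermark))    # only the row with id 2
```

A delayed capture would simply run one of these passes on a fixed schedule rather than continuously.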
While data is critical to the process, it is not helpful if it has not been cleansed, and cleansing is another important stage in data sourcing. Data is extremely valuable to organisations, but as Joerg Reinschmidt and Allison Francoise of IBM explain:
“It has been demonstrated that non-quality data can cause business losses in excess of 20% of revenue and can cause business failure”.
This emphasises the need for clean data before it is stored in the data warehouse, and it requires an assessment of data quality to determine the level of cleansing needed and the cost of that work. Cleansing is one of the biggest and most important steps in creating helpful business intelligence. Data can have all kinds of problems relating to field names, file names and the data itself, and legacy data can add further complexities that need to be cleansed out. Determining these problems at the outset increases the potential for a successful BI implementation. The general process for data cleansing is to analyse the existing data, condition and standardise it, and then integrate it. During this latter stage, data may be purged if it is not of sufficient quality.
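A minimal Python sketch of that analyse, standardise and integrate sequence might look like the following. The record layout, quality rule and duplicate-matching key are all assumptions made for illustration; real cleansing rules depend entirely on the source systems involved.

```python
import re

# Hypothetical raw customer records drawn from two legacy sources.
raw = [
    {"name": " ACME corp ", "phone": "020-7946-0000"},
    {"name": "Acme Corp",   "phone": "02079460000"},
    {"name": "",            "phone": None},  # too poor in quality to keep
]

def standardise(record):
    """Condition and standardise one record: trim, fix case, strip punctuation."""
    name = record["name"].strip().title()
    phone = re.sub(r"\D", "", record["phone"] or "")
    return {"name": name, "phone": phone}

def cleanse(records):
    cleaned, purged = [], []
    for r in map(standardise, records):
        # Purge records that fail a minimum quality rule.
        if r["name"] and r["phone"]:
            cleaned.append(r)
        else:
            purged.append(r)
    # Integrate: collapse duplicates that standardisation has made identical.
    unique = {(r["name"], r["phone"]): r for r in cleaned}
    return list(unique.values()), purged

clean, rejected = cleanse(raw)
print(clean)     # one standardised Acme Corp record
print(rejected)  # the empty record, purged for insufficient quality
```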
Data is then transformed, with the goal of enabling it to meet the business requirements that have been defined for it. Format changes at this stage may be very straightforward or extremely complex, depending on the source data and what it is required to do. Data is selected and then either separated or concatenated; concatenation is the opposite of separation, drawing data together where required. Normalisation or denormalisation may be needed to ensure the data can support reporting. Aggregation then draws data from a low level of detail into a summary that is useful for reporting purposes. Finally, the data must be applied and loaded for the process to be completed.
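The sketch below illustrates two of those transformations, concatenation and aggregation, in Python. The order records, field names and regional grouping are invented for the example.

```python
from collections import defaultdict

# Hypothetical order lines at a low level of detail.
orders = [
    {"first": "Ada",   "last": "Lovelace", "region": "North", "amount": 120.0},
    {"first": "Alan",  "last": "Turing",   "region": "North", "amount": 80.0},
    {"first": "Grace", "last": "Hopper",   "region": "South", "amount": 200.0},
]

def concatenate_name(row):
    """Concatenation: draw separate fields together into one reporting field."""
    return {**row, "customer": f"{row['first']} {row['last']}"}

def aggregate_by_region(rows):
    """Aggregation: roll detailed rows up into a summary useful for reporting."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)

transformed = [concatenate_name(r) for r in orders]
print(aggregate_by_region(transformed))  # {'North': 200.0, 'South': 200.0}
```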
Guide to Business Intelligence (part 1): An Introduction
Guide to Business Intelligence (part 2): Implementations and Warehouse Concepts
Guide to Business Intelligence (part 3): Project
Guide to Business Intelligence (part 4): Data Sourcing/Movement
Guide to Business Intelligence (part 5): Solutions Architecture
Paula Newton is a business writer, editor and management consultant with extensive experience writing and consulting for both start-ups and long-established companies. She has ten years' management and leadership experience gained at BSkyB in London and Viva Travel Guides in Quito, Ecuador, giving her a depth of insight into innovation in international business. With an MBA from the University of Hull and many years of experience running her own business consultancy, Paula’s background allows her to connect with a diverse range of clients, from cutting-edge technology and web-based start-ups to multinationals in need of assistance. Paula has played a defining role in shaping organisational strategy for a wide range of organisations, including for-profits, NGOs and charities. She has also served on the Board of Directors of the South American Explorers Club in Quito, Ecuador.