Business intelligence (BI) is a process for analyzing data and delivering actionable information that helps all kinds of stakeholders in an organization to make informed business decisions.
What is BI Testing?
BI testing encompasses all testing activities within a BI landscape. In a BI landscape data is gathered from various sources and combined into a datawarehouse or datalake. The stored data is then used to gather information in some form of a report. Since a datalake is one big pool of unstructured data, testing in these environments is usually done by data engineers or data scientists who make use of the data directly from the data lake without any conversions. BI Testers focus mainly on structured environments such as datawarehouses.
Datawarehousing is the act of gathering data from various sources into a datawarehouse in order to use this data for business purposes, such as trends, insights in profits/costs or to make predictions about the future. A datawarehousing BI landscape consists of various stages, see figure 1.
Source: Data can come from all sorts of sources, most often it will be from various databases that are in use throughout the company, but can also include separate files, data coming from managed file transfers or from other databases through API's, etc.
Datawarehouse (Extract-Transform-Load ‘ETL’): Data gets extracted from the sources and stored in a datawarehouse. In general, this data is first placed in a staging environment. For the data to get loaded into the actual datawarehouse it will most likely need to be cleaned and transformed in order to be tied together to create a single point of truth. In some environments the data goes to a second staging area after the transformations before moving on to the actual datawarehouse in production. Though the more common approach is to do the cleanup and transformations while moving the data from the source staging to the final datawarehouse.
Data marts: Data marts are designed to show smaller subsections of the data stored in the data warehouse. This is usually done for performance and/or security reasons. The data can undergo further transformations or get enriched with more data to get more suitable information.
Dashboards/Reports: Interesting data analytics will be presented to people and departments in the form of dashboarding or reports. Tools designed for dashboarding/reporting are directly connected to the data in the datawarehouse. Authorization rules will most likely be applied here as the users generally access the data through these tools.
Data management/audit trail: For security and privacy purposes data management and auditing should be in place. Think of the GDPR rules stating that for instance a user has the right “to be forgotten”, this means that data needs to be traceable across the stages.
|Note that not all datawarehousing landscapes will look the same. Not all environments support a second ‘common staging’ area and not all clients make use of data marts. The datawarehouse can also be a in the form of a Data Vault. And ETL is sometimes reversed as ELT where data is first loaded into the datawarehouse and then transformed. This building block however, will focus on the landscape described in figure 1 but it’s important to understand that variations to this landscape are possible.
Testing in a BI landscape
Testing in a BI landscape is done mostly based on the various environments that are also mentioned in figure 1.
Staging area: Is the data complete? Are there no records doubled? Is the quality sufficient? are all fields from the sources accounted for in the STM (source-to-target mapping)? Do they have the right qualities for the transformations? See also building block: Data Profiling.
Datawarehouse (core): Are the transformations executed correctly? Is everything loaded correctly into the datawarehouse? No missing data or duplicates? Is the quality acceptable? See also building block: Data Quality.
Data marts: Do the data marts contain the required data (and nothing more)? Is the data correctly loaded? If applicable: are the transformations executed correctly?
Dashboarding/Reporting: does it show what the business asked for? In an understandable fashion? Are the calculations done on the correct fields? Is authorization implemented correctly?
Data management/auditing: Is auditing in place? Can records be fully removed (“forgotten”)?
Testroles and skills
Based on the testing activities three BI testing roles can be discerned with each its own focus and skills:
In practice however, a BI test team is often asked to perform all roles as the project advances:
=> The BI test team is first asked to test the migration from source to staging and to check the data quality.
=> Then to test the ETL transformations and the final data quality in the datawarehouse (and data marts)
=> And last, to check that the reports and dashboards are showing the correct information.