Data quality in payments – importance, challenges & solutions

Data Quality (DQ) is the first step towards achieving strong data governance in any payment system or financial data warehouse/lake. Ensuring DQ in the big data space is especially challenging compared with implementing DQ in a traditional data warehouse or RDBMS.

Traditional systems have more mechanisms to control data or apply corrective measures during capture and storage, whereas big data platforms face specific challenges in course correction to ensure DQ, largely because data arrives at high volume and variety and is often already persisted before checks can run.

A big data lake usually holds a large volume and variety of data from many business systems, so implementing DQ requires close collaboration from all business stakeholders. Ensuring DQ in the big data space is essential for accurate simulations, meaningful predictions, and precise reports.

There are some critical checks that any DQ activity involves.

  • Completeness of data
  • Consistency with MDM
  • Integrity checks
  • Finding missing information
  • Converting unknown values to known values
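The critical checks above can be sketched as a single record-level validation. A minimal sketch follows; the field names (`txn_id`, `amount`, `currency`, `merchant_id`) and the reference lists are illustrative assumptions, not a specific payment schema:

```python
# Sketch of the critical DQ checks applied to one payment record.
# Field names and reference sets are illustrative assumptions.

KNOWN_CURRENCIES = {"USD", "EUR", "GBP"}        # stand-in for an MDM reference list
UNKNOWN_MARKERS = {"", "N/A", "UNKNOWN", None}  # values treated as "unknown"

def check_record(record, required_fields=("txn_id", "amount", "currency", "merchant_id")):
    """Return a list of DQ issues found in a single record (empty list = clean)."""
    issues = []
    # Completeness: every required field must be present and non-empty
    for field in required_fields:
        if record.get(field) in UNKNOWN_MARKERS:
            issues.append(f"missing:{field}")
    # Consistency with MDM: currency must exist in the master reference data
    if record.get("currency") not in KNOWN_CURRENCIES and "missing:currency" not in issues:
        issues.append("mdm_mismatch:currency")
    # Integrity: amount must be a positive number
    try:
        if float(record.get("amount", 0)) <= 0:
            issues.append("integrity:amount_not_positive")
    except (TypeError, ValueError):
        issues.append("integrity:amount_not_numeric")
    return issues

# Example: unknown currency plus a missing merchant_id
print(check_record({"txn_id": "T1", "amount": "25.00", "currency": "XXX", "merchant_id": ""}))
# → ['missing:merchant_id', 'mdm_mismatch:currency']
```

In practice each check would look up real MDM reference data instead of hard-coded sets, but the shape (record in, list of issue codes out) is what a DQ framework typically standardizes on.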

The critical DQ checks above should be implemented on the following major categories of data:

  • Business critical
  • System critical
  • Information security

A few examples of DQ challenges:

  • Correcting ZIP codes from a system/application that are not consistent with the ZIP code repository from the postal service.
  • Finding and correcting missing, unknown, or incorrect geo codes while data ingestion into the data lake is in progress.
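The ZIP code challenge can be illustrated with a small sketch: validate incoming ZIP codes against a postal reference set and attempt one simple correction (keeping the 5-digit prefix of a ZIP+4 value). The reference set and function names are assumptions for illustration:

```python
# Sketch: validate a ZIP code against a postal-service reference set,
# attempting a simple correction before flagging it as unknown.

POSTAL_ZIP_REPOSITORY = {"10001", "30301", "94105"}  # stand-in for the postal reference data

def clean_zip(raw):
    """Return (zip_code, status); status is 'valid', 'corrected', or 'unknown'."""
    candidate = (raw or "").strip()
    if candidate in POSTAL_ZIP_REPOSITORY:
        return candidate, "valid"
    # Common fix: a ZIP+4 value captured where a 5-digit ZIP is expected
    prefix = candidate.split("-")[0]
    if prefix in POSTAL_ZIP_REPOSITORY:
        return prefix, "corrected"
    return candidate, "unknown"  # route to a DQ exception queue for manual review

print(clean_zip(" 94105-1234 "))  # → ('94105', 'corrected')
```

Records that come back `unknown` are the ones that need stakeholder collaboration, since neither the source system nor the lake can decide the correct value alone.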

There are many more complex challenges, and they require data engineers to design optimized solutions without hurting business SLAs or cost.

Poor data, or the absence of data quality checks, will undermine critical business decisions: analytics cannot provide accurate insights to a payments business that depends on keeping the analytics wheel turning to achieve its goals (sales, customer-base growth, regulatory compliance, etc.).

Giving DQ greater importance in a payments big data ecosystem is a promising path to its key benefits, but only when you have the right tools to extract the most accurate information.

DQ issues arise primarily from bad data entry by humans, junk data coming from systems, a lack of data standards across the board, and a lack of ownership. These are the main contributors to poor data, and they hurt downstream applications and data warehouses badly, because each downstream layer spends significant effort on course correction, which may lead to failure in achieving business goals.

The approach to addressing poor data is parallel: business and IT work together to fix root causes at the source system, while the big data warehouse keeps surfacing further data quality issues, repeating this cycle until all prioritized/critical data quality is ensured by every stakeholder.

The first approach is time consuming and may take years, depending on the list of prioritized issues, the number of applications, and the time taken per cycle to fix a DQ issue.

An alternate approach is less complex and less time consuming: the scope of DQ within the data warehouse, especially a big data Hadoop warehouse, has huge potential to address DQ issues with the help of the right framework or tool, enabling on-the-fly DQ checks without hurting business delivery SLAs.
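One way to sketch such on-the-fly checks is a streaming filter applied during ingestion: clean rows flow through to the load, while failing rows are diverted to a quarantine instead of failing the whole batch. The rule functions below are illustrative assumptions, not a specific framework's API:

```python
# Sketch of on-the-fly DQ during ingestion: pass clean rows through,
# divert failures to a quarantine so the load itself never blocks.

def dq_filter(rows, rules, quarantine):
    """Yield rows that pass every rule; divert failures (with reasons) to `quarantine`."""
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            quarantine.append((row, failed))  # held back for correction, not loaded
        else:
            yield row

# Illustrative rules keyed by name, so quarantined rows carry their failure reasons
rules = {
    "has_amount": lambda r: r.get("amount") is not None,
    "positive_amount": lambda r: (r.get("amount") or 0) > 0,
}

incoming = [{"amount": 12.5}, {"amount": None}, {"amount": -3}]
quarantine = []
loaded = list(dq_filter(incoming, rules, quarantine))
print(len(loaded), len(quarantine))  # → 1 2
```

Because the filter is a generator, it adds per-row overhead rather than a separate validation pass, which is what keeps the load within its SLA.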

While some use cases in the big data ecosystem may not emphasize data quality, many others require strict data quality management in place to utilize the full potential of analytics.

Some of the key steps that should be taken, or at least agreed upon, are listed below:

  • Build data quality rules for each category
  • Build data quality metadata
  • Monitor data quality at every stage where data is touched
  • Build data quality management dashboard to monitor DQ KPIs
  • Capture DQ dashboard snapshot for each system/application
  • Form the DQ committee to create awareness across the board
  • Measure DQ KPIs for each system & assign Ownership
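The "DQ metadata" and "DQ KPIs" steps above can be sketched together: rules stored as metadata (category, field, check, owner) and a per-system KPI computed as the share of rows passing all rules. The categories, check names, and owners below are illustrative assumptions:

```python
# Sketch: DQ rules as metadata, plus a single KPI number for the dashboard.
# Categories, check names, and owners are illustrative assumptions.

DQ_RULE_METADATA = [
    {"category": "business_critical", "field": "amount", "check": "non_null", "owner": "payments-team"},
    {"category": "system_critical",   "field": "txn_id", "check": "non_null", "owner": "platform-team"},
]

def passes(row, rule):
    """Evaluate one metadata-defined rule against one row."""
    if rule["check"] == "non_null":
        return row.get(rule["field"]) is not None
    return True  # unrecognized check types pass by default in this sketch

def dq_kpi(rows, rules=DQ_RULE_METADATA):
    """Percentage of rows passing every rule — one KPI per system/application."""
    if not rows:
        return 100.0
    clean = sum(1 for row in rows if all(passes(row, r) for r in rules))
    return round(100.0 * clean / len(rows), 1)

rows = [{"txn_id": "T1", "amount": 10}, {"txn_id": None, "amount": 5}]
print(dq_kpi(rows))  # → 50.0
```

Keeping rules as metadata rather than code is what makes the dashboard, snapshot, and ownership steps possible: the same rule table drives the checks, the KPI, and the accountability mapping.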

When it comes to DQ in the big data world, the factors below should be analyzed before starting to implement data quality:

  • Priority of data quality checks and frequency of updates to Data Lake
  • Use case based data quality checks with different components within big data ecosystem
  • Source system data quality and the number of DQ rules
  • Ability to handle multiple data quality checks and controls efficiently during the data load process
  • High-volume data processed within a stipulated time window alongside the required number of DQ checks
  • The variety of data that actually requires DQ

There are some products on the market that ship with DQ modules; evaluate the factors above against each tool to find the one that fits your project's use case.

Want to learn more about us?

Reach out.