Are Your Data Conflicts Slowing You Down?
In our last post, we unpacked the first element of our 10-step, comprehensive data quality program. In this post, we will tackle the second step. As a reminder, here are all ten:
- You know where your data comes from in terms of systems and sources
- You are aware of conflicts and inconsistencies between your sources
- You have an approach for resolving any conflicts between sources
- You capture data once, and use it in multiple places
- You have documented what data is critical for implementing your business rules, and you have approaches for filling in any missing data
- You have tools and processes for identifying and correcting flaws in your data
- Your data exists in a format that makes it easy to access using readily available tools
- You are not dependent on a software vendor for access to your data
- Everyone on your team is cognizant of the value of “good data” and the long-term costs of “sloppy data”
- You leverage your data to support operations AND to support long-term decisions
If you completed the data inventory described in our last post, you have likely already spotted some potential conflicts. The more you know about potential conflicts up front, the more you can do to maintain the integrity of your internal data. Completing step #1 is a prerequisite here: once you have an inventory of your data elements and sources, you can use that list to identify potential conflicts. For example, if you add new members based on a list from employers but get their Union IDs from a list supplied by a Union office, do you compare the two lists and reconcile any differences in a member’s name? Do you verify that the “new” Union ID doesn’t already exist in your system?

Some of the most nefarious problems come from the “duplicate person” syndrome, when 2 separate records refer to the same person. Even worse is the “duplicate ID” syndrome, when 1 record is tied to 2 different people. These are the data gremlins that can cost you dearly down the road, and the cost of fixing the downstream issues they cause grows the longer they linger.
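If your member records live somewhere you can script against, the two questions above (do the names match, and is the “new” Union ID already taken?) can be checked automatically. The Python sketch below is illustrative only: the `Member` record, its field names, and the `check_union_list` helper are hypothetical, records are assumed to be keyed by SSN, and names are compared after only collapsing whitespace and ignoring case, which is far cruder than real name matching.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    # Hypothetical record shape; adapt to however your system stores members.
    ssn: str
    name: str
    union_id: Optional[str] = None


def normalize_name(name: str) -> str:
    # Crude normalization: collapse whitespace and ignore case.
    return " ".join(name.split()).casefold()


def check_union_list(members: dict, union_rows: list) -> list:
    """Compare a Union-office list against member records keyed by SSN."""
    issues = []
    # Union IDs already on file, mapped to the SSN that owns each one.
    id_owner = {m.union_id: ssn for ssn, m in members.items() if m.union_id}
    for row in union_rows:
        ssn, name, uid = row["ssn"], row["name"], row["union_id"]
        member = members.get(ssn)
        if member is None:
            issues.append(f"{ssn}: not in employer-supplied member records")
            continue
        if normalize_name(member.name) != normalize_name(name):
            issues.append(f"{ssn}: name mismatch {member.name!r} vs {name!r}")
        if member.union_id and member.union_id != uid:
            issues.append(f"{ssn}: on file as {member.union_id}, union list says {uid}")
        owner = id_owner.get(uid)
        if owner is not None and owner != ssn:
            issues.append(f"{ssn}: Union ID {uid} already belongs to {owner}")
    return issues


# Illustrative data only (SSNs starting with 000 are never issued).
members = {
    "000-00-0001": Member("000-00-0001", "Ann Lee", "U100"),
    "000-00-0002": Member("000-00-0002", "Bob Diaz"),
}
union_rows = [{"ssn": "000-00-0002", "name": "Robert Diaz", "union_id": "U100"}]
for issue in check_union_list(members, union_rows):
    print(issue)
```

Note that the check only flags “Bob Diaz” vs. “Robert Diaz” and the reused Union ID; deciding which value is correct belongs to the conflict-resolution step.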
Let’s look at the process for identifying conflicts in action, using a specific example. Start by picking a data element such as “member name.” This is a piece of data that you could receive from multiple places, so it should already be on your list of potential conflicts. Next, list the data sources for the element you selected (e.g., Member Name), along with all of the related data elements that come from each source. Use a spreadsheet if possible; a document will also work, but a spreadsheet will be more helpful later. For example:
Data Source 1 – Employer
- Member Name
- Member SSN
- Member Union ID
- Hours Worked by Date

Data Source 2 – Union Membership system
- Member Name
- Member SSN
- Member Union ID
- Member Dues Status
- Member DoB

Data Source 3 – Member Enrollment Form
- Member Name
- Member SSN
- Member Union ID
- Member DoB
- Member marital status
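Once the inventory is in a structured form, finding multi-source elements is mechanical: flip the source-to-elements list into an element-to-sources list and keep anything supplied by more than one source. A minimal Python sketch, assuming the inventory has been exported from your spreadsheet into a dictionary (the `inventory` contents mirror the example above; `potential_conflicts` is a hypothetical helper):

```python
from collections import defaultdict

# Hypothetical inventory mirroring the example: source -> elements it supplies.
inventory = {
    "Employer": [
        "Member Name", "Member SSN", "Member Union ID", "Hours Worked by Date",
    ],
    "Union Membership system": [
        "Member Name", "Member SSN", "Member Union ID",
        "Member Dues Status", "Member DoB",
    ],
    "Member Enrollment Form": [
        "Member Name", "Member SSN", "Member Union ID",
        "Member DoB", "Member marital status",
    ],
}


def potential_conflicts(inventory: dict) -> dict:
    """Return each data element supplied by more than one source, with its sources."""
    sources_by_element = defaultdict(list)
    for source, elements in inventory.items():
        for element in elements:
            sources_by_element[element].append(source)
    return {e: s for e, s in sources_by_element.items() if len(s) > 1}


for element, sources in potential_conflicts(inventory).items():
    print(f"{element}: {len(sources)} sources ({', '.join(sources)})")
```

Run against the example inventory, this reports Member Name, Member SSN, and Member Union ID with 3 sources each and Member DoB with 2. Single-source elements such as Hours Worked by Date drop out, since they cannot conflict with another source.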
From this example, you may learn that:
- Member Name, Member SSN, and Member Union ID each have up to 3 sources, and there is no guarantee that all three match
- There are 2 “identifiers” (SSN and Union ID), each of which should be unique and consistent across sources
- Member DoB has 2 sources that may be inconsistent
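The identifier check can be sketched the same way. If every source supplies (SSN, Union ID) pairs, then an SSN linked to more than one Union ID hints at a “duplicate person,” and a Union ID linked to more than one SSN hints at a “duplicate ID.” This is a hypothetical sketch: the row layout, the `identifier_conflicts` helper, and the sample values are assumptions for illustration, and a flagged pair may turn out to be a simple typo rather than a true duplicate.

```python
from collections import defaultdict


def identifier_conflicts(rows):
    """rows: iterable of (source, ssn, union_id) tuples gathered from every source.

    Returns (identifier_type, value, {partner_value: [sources]}) entries for every
    SSN tied to more than one Union ID and every Union ID tied to more than one SSN.
    """
    ids_by_ssn = defaultdict(lambda: defaultdict(set))  # ssn -> union_id -> sources
    ssns_by_id = defaultdict(lambda: defaultdict(set))  # union_id -> ssn -> sources
    for source, ssn, union_id in rows:
        ids_by_ssn[ssn][union_id].add(source)
        ssns_by_id[union_id][ssn].add(source)

    conflicts = []
    for ssn, ids in ids_by_ssn.items():
        if len(ids) > 1:  # one SSN, several Union IDs: possible duplicate person
            conflicts.append(("SSN", ssn, {u: sorted(s) for u, s in ids.items()}))
    for union_id, ssns in ssns_by_id.items():
        if len(ssns) > 1:  # one Union ID, several SSNs: possible duplicate ID
            conflicts.append(("Union ID", union_id, {n: sorted(s) for n, s in ssns.items()}))
    return conflicts


# Illustrative rows only (SSNs starting with 000 are never issued).
rows = [
    ("Employer", "000-00-0001", "U100"),
    ("Union Membership system", "000-00-0001", "U100"),
    ("Member Enrollment Form", "000-00-0001", "U101"),
    ("Employer", "000-00-0002", "U200"),
    ("Union Membership system", "000-00-0003", "U200"),
]
for kind, value, partners in identifier_conflicts(rows):
    print(kind, value, partners)
```

Keeping the source names alongside each conflicting value is a deliberate choice: when you get to resolving conflicts, you will want to know which system said what.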
Repeat this for each data element and source until you have identified every potential conflict arising from a data element with more than one source. Record those conflicts on a separate worksheet in the Excel workbook that holds your data inventory. We'll discuss resolving conflicts in our next post, but capturing the sources of potential conflicts now will make the resolution step much simpler.