Master Data Management: Matching, Merging, Normalisation, Consolidation
An organisation's data generally includes:
- Unstructured data: found in emails, intranet portals, product files carrying specifications, collateral and more.
- Transactional data: data such as shipping documents, invoices, orders, credit card payments, insurance claims and others.
- Metadata: data about the organisation's other data, generally held in XML files, configuration files, logs and so on.
- Hierarchical data: data that stands in a parent-child relationship within the organisation's data tree.
- Reference data: a special type of data used to categorise and index other data.
- Master data: the data around which the core business is connected.
The four main domains of master data are customers, products, locations and warranty licenses.
Master data management encompasses the tools, technologies and procedures that enable coordination of master data across the entire enterprise. Its two most important functions are matching master data records and merging them.
Data can be matched by several methods. Matching can be automated using predefined rules, with uncertain matches passed through a review process.
- Exact Match: links two different sets of data through a common key or match identifier, such as a national identification number or a barcode encoding the item number.
- Phonetic Match: usually employed for data that cannot be matched exactly. It generally applies to data entered manually or through voice-based input, for example a customer name recorded at the point of sale. Phonetic algorithms such as Soundex are generally available in SQL databases such as PostgreSQL.
- Fuzzy Match: treats records as the same when they are similar enough rather than identical, that is, when they fall within a certain radius of each other on the matching criteria. It is not an exact way of matching: it gives greater coverage, but its accuracy is the lowest of all the matching types.
- Negative Match: adds extra conditions that must hold; if they are not met, the data should not be matched. It is used to exclude unlikely matches.
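The four methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`national_id`, `name`, `birth_year`), the function `classify_match`, the 0.85 threshold and the choice of birth year as the negative rule are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever fuzzy measure a real system would use. The name is treated as a single token.

```python
from difflib import SequenceMatcher


def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""
    out = letters_only[0]
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters sharing a code
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (out + "000")[:4]


def classify_match(a: dict, b: dict, fuzzy_threshold: float = 0.85) -> str:
    """Return 'exact', 'phonetic', 'fuzzy' or 'no_match' for two records."""
    # Exact match: both records carry the same key.
    if a.get("national_id") and a.get("national_id") == b.get("national_id"):
        return "exact"
    # Negative match (assumed rule): differing birth years veto looser matches.
    if a.get("birth_year") and b.get("birth_year") \
            and a["birth_year"] != b["birth_year"]:
        return "no_match"
    name_a, name_b = a["name"].strip().lower(), b["name"].strip().lower()
    # Phonetic match: names that sound alike share a Soundex code.
    if soundex(name_a) and soundex(name_a) == soundex(name_b):
        return "phonetic"
    # Fuzzy match: similarity within the chosen "radius".
    if SequenceMatcher(None, name_a, name_b).ratio() >= fuzzy_threshold:
        return "fuzzy"
    return "no_match"
```

In this sketch the exact key is checked first and the negative rule only guards the looser phonetic and fuzzy steps; a real rule set might order them differently and would send borderline fuzzy scores to manual review.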
Matching and Merging Data: Essentials to Consider
Name frequency: When matching and merging name data, take the frequency of the names into account. Frequent names need to be validated at a higher level before they are matched. There may be 50,000 people named Alex in the US but one or fewer named Karadas, so the burden of validation is greater in the former case.
Languages: One entity can have multiple representations in different languages.
Abbreviations: One entity can have different abbreviations or short forms, for example Sam for Samuel or Rose for Rosalyn.
There are many other considerations for numeric, string and other variables when matching and merging data.
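Two of the considerations above, name frequency and abbreviations, can be sketched as follows. The nickname table, the function names and the threshold values are illustrative assumptions, not a recommended configuration; the counts reuse the Alex and Karadas example from the text.

```python
from collections import Counter

# Hypothetical lookup table; a real one would be much larger and curated.
NICKNAMES = {"sam": "samuel", "rose": "rosalyn"}


def canonical_name(name: str) -> str:
    """Map an abbreviation to its full form before matching."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def required_score(name: str, name_counts: Counter, base: float = 0.80,
                   strict: float = 0.95, common_share: float = 0.01) -> float:
    """Demand a stricter match score for names that are common in the data."""
    total = sum(name_counts.values())
    share = name_counts[canonical_name(name)] / total if total else 0.0
    return strict if share >= common_share else base


# Illustrative counts only.
name_counts = Counter({"alex": 50_000, "karadas": 1})
print(required_score("Alex", name_counts))     # common name, strict threshold
print(required_score("Karadas", name_counts))  # rare name, base threshold
```

The design choice here is to raise the bar only for common names, so that rare names can still be merged automatically while frequent ones need stronger evidence or a manual check.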
After matching and merging, the third step is data cleansing and normalisation, followed by data consolidation and finally putting the data into production.
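As a rough illustration of the cleansing, normalisation and consolidation steps, the sketch below tidies string fields and then folds a group of already-matched records into a single record. The field names and the "first non-empty value wins" rule are assumptions made for the example; real consolidation rules are usually defined per field.

```python
def normalise(record: dict) -> dict:
    """Collapse whitespace, lowercase emails, turn empty strings into None."""
    out = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = " ".join(value.split())
            if key == "email":
                value = value.lower()
            value = value or None
        out[key] = value
    return out


def consolidate(records: list) -> dict:
    """Merge matched records; the first non-empty value per field wins."""
    merged = {}
    for record in map(normalise, records):
        for key, value in record.items():
            if merged.get(key) is None:
                merged[key] = value
    return merged


# Two hypothetical records already judged to be the same customer.
matched = [
    {"name": "  Samuel   Smith ", "email": "", "city": "Austin"},
    {"name": "Sam Smith", "email": "Sam@Example.com", "city": ""},
]
print(consolidate(matched))
```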