What Does Data Cleansing Mean?

Abhishek Ghosh

By Abhishek Ghosh February 21, 2021 6:44 pm Updated on February 21, 2021

What Does Data Cleansing Mean?

Data cleansing includes various methods for removing and correcting data errors in databases or other information systems. For example, the errors may consist of incorrect (originally incorrect or outdated), redundant, inconsistent, or incorrectly formatted data. Key steps for data cleansing are duplicate detection (detecting and merging the same data sets) and data fusion (merging and completing patchy data). Data cleansing is a contribution to improving the quality of information. However, information quality also affects many other characteristics of data sources (credibility, relevance, availability, costs, etc) that cannot be improved by means of data cleansing.

Data Cleansing Process

The process of cleaning up the data is divided into five successive steps):

Make a backup copy of the file/table
Data Quality – Setting Data Requirements
Analysis of the data
Standardization
Cleanup of the data

Data Quality Requirements

High-quality and reliable data must meet certain requirements :

valid data: same data type, certain maximum values, etc.
complete data
uniform data: same unit (i.e. currency, weight, length)
integral data: Data must be protected from intentional and/or unintentional manipulation.

Analysis of data

Once the requirements have been clarified, the data must be checked, i.e. with the help of the checklists, whether the data is of the required quality.

Standardization of data before cleanup

For a successful cleanup, the data must first be standardized. For this purpose, these are first structured and then standardized. The structuring brings the data into a uniform format, for example, a date is brought into a uniform data format (01.09.2009) or composite data is broken down into its components, i.e. a customer’s name into the name components Salutation, Title, First Name and Last Name. In most cases, such structuring is not trivial and is carried out with the help of complex parsers.

During standardization, the existing values are mapped to a standardized value list. This standardization may be carried out, for example, academic titles or company additions.

Cleaning up data

There are six methods to clean up the data that can be applied individually or in combination:

Derive from other data: The correct values are derived from other data (i.e. salutation from the gender).
Replace with other data: The corrupted data is replaced by other data (i.e. from other systems).
Use Default values: Default values are used instead of the incorrect data.
Remove incorrect data: The data is filtered out and not further processed.
Remove duplicates: Duplicates are identified through duplicate detection, the non-redundant data is
Consolidation from the duplicates, and a single data set is formed from them.
Split summary: In contrast to the removal of duplicates, incorrectly summarized data is separated again.

What Does Data Cleansing Mean
Storage of the faulty data

Before cleaning up the data, you should save the original, erroneous data as a copy, and not simply delete it after the cleanup. Otherwise, the adjustments would not be comprehensible, and such a process would not be audit-proof.

An alternative is to store the corrected value in an additional column. Because additional disk space is required, this approach is recommended for only a few columns in a record to correct. Another option is to store it in an additional line, which increases the memory requirement even more. Therefore, it is only possible to correct a small number of records. The last option for a large number of columns and rows to correct is to create a separate table.

What Does Data Cleansing Mean?

Data Cleansing Process

About Abhishek Ghosh

Here’s what we’ve got for you which might like :

Take The Conversation Further ...

Get new posts by email:

Data Cleansing Process

About Abhishek Ghosh

Here’s what we’ve got for you which might like :

Articles Related to What Does Data Cleansing Mean?

Take The Conversation Further ...

Get new posts by email: