Application
On this page you can find the user manual for the Duplicate-Check.

Settings
The Duplicate-Check has no default setting and can be configured individually for each check. Select all the rules for cleansing and duplicate checking that you want to perform in batch processing.
Cleanup
All selected cleanups are performed one after the other, and the cleaned values are exported into separate new fields. If cleanups are performed, the duplicate check is carried out afterwards.
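For illustration, a hypothetical input value could pass through two of the cleanups listed below as follows (assuming c001 collapses runs of spaces into a single space):

Input: "Max  Mustermann  " → after c001: "Max Mustermann " → after c002: "Max Mustermann"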
Duplicates
The duplicate check processes all transferred data records one after the other and forms groups. Using these groups, all duplicate records can then be viewed and processed. The rules that matched during the check of a duplicate are also output.
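For example (hypothetical records): if record A “Max Mustermann, Berlin” and record B “max mustermann, Berlin” differ only in case, rule d101 (match on lower case) would place them in the same group; a record C “Max Mustermann, Hamburg” would only join that group under a rule that ignores the city, such as d108.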
Batch processing
Select a file or drag and drop it onto the “Import file” field, then start the check by pressing the “Check” button. This starts the verification of the entire file.
The process of checking takes some time depending on the size of the file.
The name of the duplicate export file is assigned automatically and is generated in the same format as the file passed in. For example, if a file named “dubletten.json” is passed, the log file will be named “dubletten-log.json”.
Pre-validation
Pre-validating a duplicate import file lets you check its format and the number of records it contains without performing the actual check. The file is analyzed and the result is shown to you in a message.
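From the command line, this corresponds to the --validatefile parameter described in the cli section below; for example (file name illustrative):

ew_service_duplicate --validatefile=dubletten.json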
Cleanup Rules
The cleanup rules listed here are part of the Duplicate-Check. We will add further cleanup options over time.

Cleanup | Description |
---|---|
c001 | Remove multiple spaces |
c002 | Remove trailing spaces |
c003 | Remove non printable characters |
c004 | Remove German letters (umlauts, ß) |
c005 | Normalize quotes, special chars (Result: ‘, “, -) |
c006 | Remove duplicate letters, e.g. tt -> t |
If the rules listed here differ from what the Duplicate-Check actually performs, please let us know. Do you have other cleanups that we don’t yet provide? Let us know; we will be very happy to add them.
Duplicate Rules
The duplicate rules listed here are part of the Duplicate-Check. We will add further duplicate checks over time.

Rule | Description |
---|---|
d100 | Match data 100% |
d101 | Match on lower case |
d102 | Ignore country |
d103 | Ignore first name/last name |
d104 | Ignore department |
d105 | Ignore country/first name/last name/department |
d106 | Ignore house number |
d107 | Ignore postcode |
d108 | Ignore city |
d109 | Ignore first name |
d110 | Ignore last name |
d111 | Ignore street |
If the rules listed here differ from what the Duplicate-Check actually performs, please let us know. Do you have other duplicate checks that we don’t yet provide? Let us know; we will be very happy to add them.
Configuration
On this tab you can define a number of settings for the Duplicate-Check. These are taken into account during processing and are described in detail below.

Separator
The separator is only relevant for the import of CSV files into the Duplicate-Check (regardless of whether batch or background processing is used). The data is split into individual values at this separator and prepared for checking. When choosing the separator, make sure it does not occur in your master data: a comma or a semicolon can easily appear in a company name and lead to an error during processing. The pipe character “|” is defined as the default. For JSON and XLSX this separator is not necessary.
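For illustration (hypothetical values): with a comma as separator, the record

Miller, Smith & Co,Hauptstr.

would be split into three values instead of two, whereas with the default pipe separator

Miller, Smith & Co|Hauptstr.

the company name remains intact.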
Character encoding of the output
With this setting you can control the code page of the Duplicate-Check’s CSV output. UTF-8 is the default. If the CSV file is further processed with Microsoft Excel, Win1252 (which corresponds to the ANSI encoding) is recommended.
If the records in the output file are not displayed correctly in your text editor or in Microsoft Excel, for example if umlauts appear garbled, switch this parameter to a different encoding than the one currently set. This solves display problems in most cases.
Command Line Interface (cli)
You can also run the duplicate check without the graphical interface. To run the client tool from a command line, specify all necessary parameters.
Parameters
Run ew_service_duplicate --help
and you will get an overview of all Duplicate-Check parameters that you can pass to the cli.
Usage of: ew_service_duplicate.exe [options]
Main options:
--lang=ARG Language (de,en). Overwrites settings.
-c, --cleaner=ARG List of cleaning rules (default: all), comma-separated.
-d, --duplicates=ARG List of duplicate checks (default: all), comma-separated
--inputfile=ARG Filename to import (csv, json, xlsx)
--outputfile=ARG Filename to export the results (csv, json, xlsx)
--split Split export into different files.
--testmail Send a testmail.
--validatefile=ARG Check file, if structure is readable.
Information:
-h, --help Show help and exit.
-v, --version Return the version information.
Cleaning Rules:
c001 - Remove multiple spaces
c002 - Remove trailing spaces
c003 - Remove non printable characters
c004 - Remove German letters (umlauts, ß)
c005 - Normalize quotes, special chars (Result: ', ", -)
Duplicate Checks:
d100 - Entries matching with 100%
d101 - Entries matching ignoring case
d102 - Entries matching ignoring country
d103 - Entries matching ignoring firstname/lastname
d104 - Entries matching ignoring department
d105 - Entries matching ignoring country/firstname/lastname/department
d106 - Entries matching ignoring number
d107 - Entries matching ignoring postcode
d108 - Entries matching ignoring town
d109 - Entries matching ignoring firstname
d110 - Entries matching ignoring lastname
d111 - Entries matching ignoring street
-h --help
Shows an overview of all parameters that the cli supports.
-v --version
Outputs the currently installed version of the Duplicate-Check.
--lang
This parameter allows you to specify or override the language of the Duplicate-Check.
-c --cleaner
If you specify this parameter without listing any rules, all cleanup rules are executed one after the other. If you want to perform only certain cleanups, list them, e.g. --cleaner c001,c003.
If this parameter is omitted, no cleanups are performed.
-d --duplicates
If you specify this parameter without listing any rules, all duplicate checks are performed one after the other. If you want to run only certain checks, list them, e.g. --duplicates d100,d105.
If this parameter is omitted, no duplicate checks are performed.
-i --inputfile
Use this parameter to specify the file with data to be imported.
-o --outputfile
The export file is specified with this parameter. It is important that this is not identical to the import file.
--split
Splits the results into separate files containing unique or duplicate entries.
--testmail
Sends a test e-mail from the Duplicate-Check. After a file has been processed, an e-mail can be sent to you on completion.
--validatefile
Checks the import file for formal correctness before processing.
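Taken together, a complete batch run from the command line might look like this (file names are illustrative):

ew_service_duplicate --inputfile=dubletten.json --outputfile=dubletten-result.json --cleaner=c001,c002 --duplicates=d100,d101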
Outputs of the command line interface (cli)
During runtime the cli continuously prints messages to the command line so that you can track how far the checks have progressed.
CSV Interface
We recommend using the XLSX or JSON import interfaces.
By using a simple CSV file, the Duplicate-Check software provides you with a way to check your entire data set.
We take care to maintain compatibility when extending the CSV import interface of the Duplicate-Check software. This means that you can always use the latest version without generating additional effort when integrating it into your ERP system.
So that the individual duplicate records can be assigned back to your master data, you have the option of specifying up to two unique keys in the import file.
The default separator between the individual elements is the pipe character “|”; it can be changed via the settings. Bold field names are mandatory fields.
Please note that all fields must be present in the Duplicate-Check import file, even if you do not use key1 and key2.
Structure – CSV Import File
field | format | example |
---|---|---|
key1 | String | |
key2 | String | |
firstname | String | |
lastname | String | |
name1 | String | |
name2 | String | |
name3 | String | |
name4 | String | |
street | String | |
number | String | |
postcode | String | |
town | String | |
department | String | |
country | String | |
Example of a CSV import file (here with a semicolon configured as the separator):
key1;key2;firstname;lastname;name1;name2;name3;name4;street;number;postcode;town;department;country;
val_key1;val_key2;val_firstname;val_lastname;val_name1;val_name2;val_name3;val_name4;val_street;val_number;val_postcode;val_town;val_department;val_country;
… (further data records)
Note
Please make sure the CSV import file contains the correct number of columns (14). An incorrect column count is a common source of errors during CSV import. Alternatively, you can use the XLSX or JSON import format to eliminate this source of error.
Structure – CSV Export File
The CSV export file of the Duplicate-Check contains the transferred values as well as cleaned values and values marked as duplicates.
field | format | example |
---|---|---|
internalid | String | |
key1 | String | |
key2 | String | |
firstname | String | |
lastname | String | |
name1 | String | |
name2 | String | |
name3 | String | |
name4 | String | |
street | String | |
number | String | |
postcode | String | |
town | String | |
department | String | |
country | String | |
// cleaned data | | |
cleaned firstname | String | |
cleaned lastname | String | |
cleaned name1 | String | |
cleaned name2 | String | |
cleaned name3 | String | |
cleaned name4 | String | |
cleaned street | String | |
cleaned number | String | |
cleaned postcode | String | |
cleaned town | String | |
cleaned department | String | |
cleaned country | String | |
// applied cleaners | | |
applied cleaners | String | |
// applied duplicates | | |
duplicate ids | String | |
address group | String | |
The CSV export file of the duplicate check always includes an additional header row containing the column names. Please take this into account for any automatic re-import of the check results.
JSON Interface
With the import interface for JSON files, the Duplicate-Check offers you a way to check your entire master data set.
We make sure that compatibility is always maintained when extending the JSON duplicate interface. This means that you can always use the latest version without generating additional effort when integrating it into your ERP system.
In order to be able to uniquely assign a JSON data record from your ERP system in the duplicate check, you have the option of specifying up to two unique keys in the import file. These will be returned in the export file and can be used for re-import into your ERP system. However, you can also leave these two fields (key1 and key2) blank. They are not necessary for processing.
Please note that all bold fields must be specified in the import file.
Structure – JSON Import File
field | format | example |
---|---|---|
key1 | String | |
key2 | String | |
firstname | String | |
lastname | String | |
name1 | String | |
name2 | String | |
name3 | String | |
name4 | String | |
street | String | |
number | String | |
postcode | String | |
town | String | |
department | String | |
country | String | |
Example in the form of a JSON file:
[
  {
    "key1": "val_key1",
    "key2": "val_key2",
    "firstname": "val_firstname",
    "lastname": "val_lastname",
    "name1": "val_name1",
    "name2": "val_name2",
    "name3": "val_name3",
    "name4": "val_name4",
    "street": "val_street",
    "number": "val_number",
    "postcode": "val_postcode",
    "town": "val_town",
    "department": "val_department",
    "country": "val_country"
  },
  {...}
]
Structure – JSON Export File
The Duplicate-Check JSON export file contains the previously imported data in the same data format, unless otherwise specified. Please note that the JSON interface outputs all available fields and is therefore the most complete format.
field | format | example |
---|---|---|
internalid | String | |
key1 | String | |
key2 | String | |
firstname | String | |
lastname | String | |
name1 | String | |
name2 | String | |
name3 | String | |
name4 | String | |
street | String | |
number | String | |
postcode | String | |
town | String | |
department | String | |
country | String | |
// cleaned data | | |
cleaned firstname | String | |
cleaned lastname | String | |
cleaned name1 | String | |
cleaned name2 | String | |
cleaned name3 | String | |
cleaned name4 | String | |
cleaned street | String | |
cleaned number | String | |
cleaned postcode | String | |
cleaned town | String | |
cleaned department | String | |
cleaned country | String | |
// applied cleaners | | |
applied cleaners | String | |
// applied duplicates | | |
duplicate ids | String | |
address group | String | |
In contrast to XLSX or CSV, the Duplicate-Check keys within JSON always use plain English notation. This avoids runtime errors caused by incorrect conversion from the outset.
Please note that the fields in the Duplicate-Check export file are not always output in the same order as specified in the table above.
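As a rough sketch only, an export record could contain entries along the following lines, assuming the JSON keys mirror the field names in the table above (the actual key spelling may differ, and all values are placeholders):

{
  "internalid": "1",
  "key1": "val_key1",
  "firstname": "Max",
  "cleaned firstname": "Max",
  "applied cleaners": "c001,c002",
  "duplicate ids": "2,17",
  "address group": "1"
}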
XLSX Interface
With the import interface for Microsoft Excel (XLSX) files, the Duplicate-Check software offers you a way to check your entire master data set.
We take care to maintain compatibility (also with Microsoft Excel) when extending the XLSX duplicate interface. This means that you can always use the latest version without generating additional effort when integrating it into your ERP system.
In order to be able to uniquely assign an XLSX data record from your ERP system in the Duplicate-Check, you have the option of specifying up to two unique keys in the import file. These will be returned in the export file and can be used for re-import into your ERP system. However, you can also leave these two fields (key1 and key2) blank.
During import, the Duplicate-Check searches the XLSX file for its column header designations and assigns the columns automatically. Please use only one designation per column, e.g. “key1” and not “key1,key_1”.
The upper and lower case of the column headers is not relevant for the import.
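For illustration, the first row of the worksheet could list the field names from the table below as column headers, one field name per column (a hypothetical layout; the columns are matched by header name as described above):

key1 | key2 | firstname | lastname | name1 | name2 | name3 | name4 | street | number | postcode | town | department | country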
Structure – XLSX Import File
field | format | example |
---|---|---|
key1 | String | |
key2 | String | |
firstname | String | |
lastname | String | |
name1 | String | |
name2 | String | |
name3 | String | |
name4 | String | |
street | String | |
number | String | |
postcode | String | |
town | String | |
department | String | |
country | String | |
Structure – XLSX Export File
The XLSX export file of the Duplicate-Check contains the returned values of the individual checks in the same data format, unless otherwise specified.
field | format | example |
---|---|---|
internalid | String | |
key1 | String | |
key2 | String | |
firstname | String | |
lastname | String | |
name1 | String | |
name2 | String | |
name3 | String | |
name4 | String | |
street | String | |
number | String | |
postcode | String | |
town | String | |
department | String | |
country | String | |
// cleaned data | | |
cleaned firstname | String | |
cleaned lastname | String | |
cleaned name1 | String | |
cleaned name2 | String | |
cleaned name3 | String | |
cleaned name4 | String | |
cleaned street | String | |
cleaned number | String | |
cleaned postcode | String | |
cleaned town | String | |
cleaned department | String | |
cleaned country | String | |
// applied cleaners | | |
applied cleaners | String | |
// applied duplicates | | |
duplicate ids | String | |
address group | String | |
XLSX versions
The Duplicate-Check software supports all XLSX versions up to and including the latest version of Office 365. Please understand that we no longer support older versions (XLS).
Our test scope for the Duplicate-Check software covers a large number of variants of how a Microsoft Excel document can look. Nevertheless, we cannot test every function and possibility. If you encounter a problem when importing XLSX files, please contact us and send us one or two test records so that we can help you quickly.