Parameters
This parameter defines the type of result returned to the user. Currently, action can be one of the following:
| action | Default | Effect |
|---|---|---|
| report | | Returns a JSON document with duplicate information. Doesn't alter the file in any way |
| flag | * | Adds new flag fields to the data set indicating which records are duplicates, with a reference to the original record |
| remove | | Returns the data set with all duplicates removed |
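The three actions can be illustrated with a minimal Python sketch. It is not the service's implementation: the function name `apply_action`, the flag field names `isDuplicate` and `duplicateOf`, the shape of the JSON report, and the input mapping from each duplicate row's position to the position of its original are all assumptions made for the example.

```python
import json


def apply_action(rows, duplicate_of, action="flag"):
    """Illustrate the report, flag and remove actions.

    rows: list of dicts, one per record.
    duplicate_of: dict mapping the index of a duplicate row to the
        index of the row it duplicates (hypothetical input format).
    """
    if action == "report":
        # Describe the duplicates as JSON; the rows are left untouched.
        return json.dumps({
            "duplicates": [
                {"row": dup, "original": orig}
                for dup, orig in sorted(duplicate_of.items())
            ]
        })
    if action == "flag":
        # Add two hypothetical flag fields to a copy of every row.
        flagged = []
        for i, row in enumerate(rows):
            new_row = dict(row)
            new_row["isDuplicate"] = i in duplicate_of
            new_row["duplicateOf"] = duplicate_of.get(i)
            flagged.append(new_row)
        return flagged
    if action == "remove":
        # Keep only the rows that are not marked as duplicates.
        return [row for i, row in enumerate(rows) if i not in duplicate_of]
    raise ValueError(f"unknown action: {action!r}")
```

In this sketch `flag` is the default, matching the `*` in the table above, and none of the actions modifies the input rows.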
This parameter defines the types of duplicates to search for. Currently, duplicates can be one of the following:
| duplicates | Default | Effect |
|---|---|---|
| strict | | Only identical rows will be considered duplicates |
| partial | | Only rows with the same key information (locality, scientific name, date and collector) will be considered duplicates |
| all | * | All the previous types apply |
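As a rough illustration of how the three duplicate types differ, the following Python sketch compares rows either in full (strict) or on the four key fields only (partial). The function name `find_duplicates`, the column names in `KEY_FIELDS`, and the choice to skip the partial comparison when a key field is missing or empty are assumptions for the example, not the service's documented behavior. Note that in this sketch identical rows also match under `partial`, because they necessarily share the same key information.

```python
# Assumed column names for locality, scientific name, date and collector.
KEY_FIELDS = ("locality", "scientificname", "eventdate", "recordedby")


def find_duplicates(rows, duplicates="all"):
    """Return {duplicate_index: original_index} for a list of dict rows."""
    if duplicates not in ("strict", "partial", "all"):
        raise ValueError(f"unknown duplicates type: {duplicates!r}")

    seen_strict = {}   # full row content -> index of first occurrence
    seen_partial = {}  # key-field values -> index of first occurrence
    result = {}

    for i, row in enumerate(rows):
        strict_key = tuple(sorted(row.items()))
        partial_key = tuple(row.get(field) for field in KEY_FIELDS)
        # Assumption: only compare key fields when all of them have a value.
        has_full_key = all(partial_key)

        original = None
        if duplicates in ("strict", "all"):
            original = seen_strict.get(strict_key)
        if original is None and has_full_key and duplicates in ("partial", "all"):
            original = seen_partial.get(partial_key)

        if original is not None:
            result[i] = original

        # Remember the first row seen for each key as the "original".
        seen_strict.setdefault(strict_key, i)
        if has_full_key:
            seen_partial.setdefault(partial_key, i)

    return result
```

The first row seen with a given key is treated as the "original"; later matches point back to it.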
Since the deduplication tasks run in the background, the service needs a way to deliver the results to the user. The best way to do this is to send a notification to the user's email address with a link to the parsed file.
This parameter indicates which field in the original dataset acts as row (record) identifier.
This is a key component, because it identifies which record each duplicate is a duplicate of. For both the report and the flag actions, each duplicate will show the id of the "original" record.
In principle, DWCAs will have either an occurrenceid or an id field, or an "id" can be determined from the meta.xml file. However, this might not be the case, so the first option, which overrides everything else, is the field named through this parameter. If it is not supplied, the service looks for an id field, then an occurrenceid field.
If no "id" field is provided and neither id nor occurrenceid can be found in the dataset, certain information will be omitted from the result set.
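The lookup order described above can be sketched in Python as follows. This is only an illustration: the function name `resolve_id_field` and the case-insensitive matching of column names are assumptions, and the meta.xml lookup is left out.

```python
def resolve_id_field(fieldnames, id_param=None):
    """Pick the column that identifies each record.

    fieldnames: column names found in the dataset.
    id_param: value supplied through the id parameter, if any.
    Returns the chosen column name, or None when nothing suitable exists.
    """
    # 1. An explicitly supplied id parameter overrides everything else.
    if id_param:
        return id_param

    # Assumption: match column names case-insensitively.
    lowered = {name.lower(): name for name in fieldnames}

    # 2. Look for an id field, 3. then for an occurrenceid field.
    for candidate in ("id", "occurrenceid"):
        if candidate in lowered:
            return lowered[candidate]

    # 4. No identifier available; the caller has to omit the
    #    references to the "original" record from the result.
    return None
```

For example, `resolve_id_field(["occurrenceID", "locality"])` picks `occurrenceID`, while passing `id_param="catalognumber"` returns `catalognumber` regardless of the columns present.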
This repository is part of the VertNet project.
For more information, please check out the project's home page and GitHub organization page.