Controle da Qualidade de Dados em Informática da Biodiversidade: O Estudo de Caso dos Dados de Ocorrência de Espécies (Data Quality Control in Biodiversity Informatics: The Case of Species Occurrence Data)

Allan Koch Veiga, Antonio Mauro Saraiva, Etienne Américo Cartolano

Universidade de São Paulo
This paper appears in: Revista IEEE América Latina

Publication Date: June 2014
Volume: 12, Issue: 4
ISSN: 1548-0992

To address the current environmental sustainability crisis, several studies on biodiversity and the environment have been conducted. These studies rely on assessing and monitoring biodiversity through the collection, storage, analysis, simulation, modeling, visualization and sharing of a significant volume of biodiversity data across broad temporal and spatial scales. Species occurrence data are a particularly important type of biodiversity data because they are widely used in various studies. Nevertheless, for the analyses and models derived from these data to be reliable, the data must be of high quality. Thus, to improve the Data Quality (DQ) of species occurrences, the aim of this work was to study DQ as applied to species occurrence data in a way that allows DQ to be assessed and improved through error-prevention mechanisms. For the most important data domains identified (taxonomic, geospatial and location), a DQ assessment study was performed, in which important DQ dimensions (aspects) and the problems that affect these dimensions were identified, defined and interrelated. Based on this study, DQ mechanisms were identified that would improve DQ by reducing errors. Using the error-prevention approach, 13 mechanisms supporting the prevention of 8 DQ problems were identified, thus improving the accuracy, precision, completeness, consistency and credibility of source of taxonomic, geospatial and location data of species occurrences. This work showed that, with the development of certain computing mechanisms, preventing errors reduces DQ problems. As a result of reducing particular problems, DQ in specific data domains is improved for certain DQ dimensions.

Index Terms:
Data Quality, Biodiversity Informatics, Species Occurrence, Data Quality Control   

