Occurrence Data Working Group

Coordinators:
Edeline Gagnon, Technical University Munich, Germany
Jens Ringelberg, University of Zürich, Switzerland
Joe Miller, Global Biodiversity Information Facility (GBIF), Denmark

The Occurrence Data Working Group initially chose to wait for the accepted species checklist from the Legume Taxonomic Work Group and for the launch of the Legume Data Portal before deciding on the best approach to producing an expert-verified global occurrence dataset for the entire family. Now that the first version of the Legume Checklist has been published (see previous report), the Working Group has been discussing an agreed set of automated filters for “cleaning” data. These filters could be combined with an expert-curated occurrence dataset, and the result could then be made available for community use via the Legume Data Portal.

At the LPWG meeting in September, three short talks were presented: i) Charlotte Hagelstam-Renshaw (Université de Montréal, Canada) outlined the methods she is using to assemble occurrence data for subfamily Cercidoideae as part of her MSc; ii) Moabe Fernandes (Newton Postdoctoral Fellow, University of Exeter, UK) presented his ongoing work on assembling and cleaning occurrence data for all legumes in the Americas, with the goal of assessing phylogenetic and biodiversity hotspots for conservation, and stressed that finding ways to validate the taxonomic identities of records is key to improving data quality; iii) Domingos Cardoso (Universidade Federal da Bahia, Brazil) presented “cleanHerb”, a new R package he has developed to standardize herbarium record data from biodiversity databases.

Joe Miller stressed that a longer-term goal for GBIF is to store these filtered “clean” data and make them available for reuse with attribution. This is a difficult task, but GBIF is making progress. A simple step researchers can take is to keep track of the GBIF-IDs when they work with GBIF data. If they modify the GBIF occurrence data, he suggested that they also record why and how each decision was made, so that the data can eventually be reused.
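As an illustration of the record-keeping suggested above, the following is a minimal Python sketch, not a method endorsed by GBIF or the Working Group. It assumes occurrence records are held as dictionaries keyed by the column names used in GBIF downloads (gbifID, decimalLatitude); the Change class, the apply_change and log_to_csv helpers, the sample record and the cleaning rule are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, asdict


@dataclass
class Change:
    """One logged modification to a GBIF occurrence record (hypothetical layout)."""
    gbif_id: str    # identifier of the original GBIF record
    field: str      # column that was modified
    old_value: str  # value as downloaded
    new_value: str  # value after cleaning
    reason: str     # why the decision was made


def apply_change(record, field, new_value, reason, log):
    """Update one field of a record and append the change to the log."""
    old_value = record.get(field, "")
    if old_value != new_value:  # only log real modifications
        log.append(Change(record["gbifID"], field, old_value, new_value, reason))
        record[field] = new_value
    return record


def log_to_csv(log):
    """Serialize the change log as CSV text so it can be shared with the dataset."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["gbif_id", "field", "old_value", "new_value", "reason"],
    )
    writer.writeheader()
    for change in log:
        writer.writerow(asdict(change))
    return buffer.getvalue()


# Invented sample record, for illustration only.
record = {"gbifID": "0000000001", "scientificName": "Bauhinia sp.", "decimalLatitude": "0"}
log = []
apply_change(
    record,
    "decimalLatitude",
    "",
    "zero latitude treated as missing (hypothetical rule)",
    log,
)
```

Keeping the log separate from the cleaned records leaves the original download untouched, and every edit can be traced back to its GBIF-ID together with the stated reason.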