Legume Data Portal

Carole Sinou1, Joe Miller2 & Anne Bruneau1
1Université de Montréal, Canada
2Global Biodiversity Information Facility (GBIF), Denmark

In November 2020, the Global Biodiversity Information Facility (GBIF) launched a program to host tailored data portals for nodes in its community. The portals were designed primarily for GBIF country nodes and regions to showcase occurrence data for particular geographical areas. The Legume Phylogeny Working Group (LPWG) was chosen as one of only two taxon-focused projects, with the objective of presenting data about Leguminosae. GBIF provides the informatics infrastructure and expertise, while the legume community is responsible for the content. GBIF is interested in encouraging dialogue with expert communities in order to improve data quality and increase the visibility and use of GBIF-mediated data. In its initial phase, the portal offers three major resources for our community: the latest taxonomic information; distribution data from GBIF; and a platform for community communication. The Legume Data Portal was developed under the umbrella of Canadensys (https://www.canadensys.net/), an associate participant node of GBIF that publishes data from Canadian collections and initiatives.

The need for scientists to exchange, share and organise data has resulted in a proliferation of taxonomic and biodiversity research data portals in recent decades. These cyberinfrastructures have had a major impact and helped to revitalise taxonomy by allowing quick access to bibliographic information, biological and nomenclatural data, and specimen information. A pioneering example is the International Legume Database and Information Service (ILDIS), which was established in 1985 and led the way in developing methods and thinking with regard to taxonomic data management more generally. However, owing to a lack of resources, institutional support and community involvement, as well as evolving software protocols, increased data complexity, and the need for distributed data curation, ILDIS had not been curated by the community for over ten years. Following several years of discussions within the LPWG and a workshop held during the International Legume Conference in Japan in 2018, Bruneau et al., in a 2019 paper in Advances in Legume Systematics Part 13, proposed the establishment of a new Web portal to facilitate access to scientifically validated data about legumes. The collaboration with GBIF has provided the LPWG with a framework for helping to curate global data resources and for working together to report data quality issues to data providers, leading to higher-quality data on legumes.

Example of subfamily description for Caesalpinioideae

The new Legume Data Portal was officially launched on September 30th 2021 at the virtual LPWG meeting. The portal currently provides an overview of the legume family and each of the subfamilies and a summary of the work and objectives of the LPWG and its five working groups.

The new verified legume species checklist generated by the Taxonomy Working Group is available to browse, search and download from the Legume Data Portal (https://www.legumedata.org/taxonomy/species-list). For each genus there are links to the relevant pages on GBIF and the World Checklist of Vascular Plants (WCVP). The LPWG Taxonomy Working Group is currently exploring available tools for curating the species checklist, with the aim of encouraging community engagement in maintaining an accurate and up-to-date list. This new verified checklist is now the primary taxonomic backbone for the nearly 22 million legume occurrence records served through GBIF and the Legume Data Portal, so its accuracy directly affects any research that relies on those records. The checklist is also expected to become the primary source of scientific names for other global online platforms that serve biodiversity data (e.g., World Flora Online). In the future, we would like to provide further information about legumes, including links to trait, genomic and phylogenetic data, and curated images of legume species.

The Legume Data Portal also showcases legume occurrence data, including specimens and citizen science records, currently available on GBIF (https://www.legumedata.org/data?view=MAP). A series of filters has been applied to these data to automatically remove obvious distribution errors, and users can apply additional filters as required. Users can download data for downstream analyses and obtain a DOI for citing the dataset from GBIF. The Legume Occurrences Working Group aims to provide tools and scripts for cleaning occurrence data and to share cleaned occurrence datasets for the family.
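As an illustration of the kind of additional filter a user might apply to a downloaded dataset, here is a minimal Python sketch. It is not the set of filters used by the portal, which is not described here; it simply assumes that each record is a dictionary carrying the Darwin Core fields decimalLatitude and decimalLongitude, and it drops records with missing, out-of-range, or (0, 0) placeholder coordinates. The function names are hypothetical.

```python
from typing import Iterable, List


def has_plausible_coordinates(record: dict) -> bool:
    """Return True if the record has usable, in-range coordinates.

    Assumes Darwin Core field names; this is an illustrative check,
    not the filter applied by the Legume Data Portal.
    """
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        # Missing, empty, or non-numeric coordinates.
        return False
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False
    # (0, 0) is commonly a placeholder for unknown coordinates.
    if lat == 0 and lon == 0:
        return False
    return True


def clean_occurrences(records: Iterable[dict]) -> List[dict]:
    """Keep only the records that pass the coordinate check."""
    return [r for r in records if has_plausible_coordinates(r)]
```

A real cleaning workflow would usually add further checks (for example, country centroids or records falling in the sea), which are beyond this sketch.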

GBIF maintains all scientific names it receives linked to specimens from herbaria all over the world, including many names that are not available in, or aligned with, the new LPWG taxonomy. These occurrence data points, even though they are shared with GBIF, remain underutilized and do not represent our current knowledge of legume species distributions. GBIF will provide the LPWG with a list of all legume names currently used in herbaria that share data with GBIF, with the hope that in due course all data can be reconciled with the LPWG taxonomy. In turn, GBIF can quantify the value that taxonomy adds to the quality of GBIF occurrence data, which we hope will convince funding bodies of the value of taxonomy. Overall, the goal is to improve data quality for better science. GBIF is using the legume portal as an important test case.
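The reconciliation described above can be pictured with a small Python sketch. It assumes, purely for illustration, that the checklist is available as a set of accepted names plus a mapping from synonyms to accepted names; the function name and data layout are hypothetical and do not describe the tools used by GBIF or the LPWG.

```python
from typing import Dict, Iterable, List, Set, Tuple


def reconcile_names(
    names: Iterable[str],
    accepted: Set[str],
    synonyms: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Map each incoming name to an accepted name where possible.

    Returns (matched, unmatched): matched maps the original name to the
    accepted name; unmatched lists names needing expert review.
    """
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for name in names:
        # Normalise whitespace only; real matching would also need to
        # handle authorship strings and spelling variants.
        key = " ".join(name.split())
        if key in accepted:
            matched[name] = key
        elif key in synonyms:
            matched[name] = synonyms[key]
        else:
            unmatched.append(name)
    return matched, unmatched


# Illustrative input only, not an extract from the LPWG checklist.
accepted_names = {"Senegalia senegal"}
synonym_map = {"Acacia senegal": "Senegalia senegal"}
matched, unmatched = reconcile_names(
    ["Acacia senegal", "Senegalia senegal", "Acacia sp."],
    accepted_names,
    synonym_map,
)
```

The share of names that end up in the matched group before and after a checklist update is one simple way to express the value added by taxonomic curation.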

Finally, a section of the Legume Data Portal is dedicated to the Bean Bag, with access to recent issues and a link to all back issues, and on the Portal home page you can see legume news items and announcements from the community. Legume news items and feedback about the portal should be sent to: legumephylogenywg@gmail.com.

The Legume Data Portal aims to encourage international collaboration and exchange amongst scientists and students, and provide a platform to share data and expertise on the systematics and evolution of the Leguminosae with a broad community of users. Please take a look, and please contribute!

GBIF map view showing the distribution of occurrences for the mimosoid legume genus Senegalia on the Legume Data Portal. The user can click on an occurrence point and directly obtain the specimen data for the records that populate that locality.