Gathering Our Narrowing-The-Gap Insights from SPNHC2023

11 Aug 2023 - Deborah Paul, Matt Yoder

the view from our SPNHC 2023 Symposium

With 15 presentations bringing us wildly different perspectives, our Narrowing The Gaps Symposium offered something for everyone at SPNHC2023. Access to biodiversity-related data, to knowledge derived from that data, and to relevant human expertise varies widely. Without access, it can be difficult (impossible?) to measure impact or to provide recognition and insightful metrics. In our symposium, we asked our presenters to share how the distance between gathering (specimen) information and publishing accessible information might be changing. As organizers, we also asked them to address how the role of the collection manager may be changing with respect to getting to publication faster. Here we summarize a few highlights from these talks.

Linking and Sharing to Narrow Gaps

At Yale, Utrup and Davis showed us the power of LUX used with Yale Collections to realize “linked open data” (LOD, for short): that is, helping your data link to existing web-accessible data to improve discovery and use. From Andrea Thomer and the iSamples group, we learned we still need a package of “ids” for records, people, and organizations in a given dataset if we want to offer meaningful recognition and impact metrics for local collections / collectors / organizations and aggregators. We may be seeing a significant culture change in the recognition of the value of, and need for, shared knowledge management in digitization and data-sharing workflows. For example, Siobhan Leachman shared what is possible in terms of revealing otherwise hidden contributions to science with the (more) level playing field offered by tools like Wikidata, Google Sheets, Bionomia, and the Biodiversity Heritage Library (BHL). At Indiana University (IU), Gary Motz gave us a picture of what it’s like to offer a university the opportunity to gather all its collections into a harmonized shared system. Individuals and departments are now realizing that the joint solution makes otherwise inaccessible collections FAIR (that’s findable, accessible, interoperable, reusable), and it has resulted in collections being tied into the University Strategic Plan.

Who Benefits from Narrowing The Gaps?

Getting museum collections specimen data out the door requires a lot of hard work and expertise. Tommy McElrath, researcher and collection manager at the Illinois Natural History Survey (INHS), pointed out just how valuable the DOI strategy employed by GBIF, along with dwc:recordedByID and dwc:identifiedByID, turns out to be for individuals, a given collection, and an institution. Without these identifiers, Tommy would have much less knowledge about collector expertise, the use of the related data, and their scientific impact. What other benefits could aggregators be offering to those who supply so much of their data? How could others be contributing to the value proposition here? One example would be for all of us to work on ensuring all those collecting or identifying specimens (or doing other data or specimen curation work) have identifiers (i.e. an ORCID iD or a Wikidata Q number). Some current collection management software (e.g. TaxonWorks) can already store these identifiers and share them when publishing data to aggregators like GBIF; other CMS are working to update their data models to make this possible.
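To make the identifier idea a little more concrete, here is a minimal sketch of how a collection might sanity-check the values it puts in dwc:recordedByID and dwc:identifiedByID before publishing. It assumes identifiers are given as full ORCID or Wikidata URLs and that multiple values are separated by a pipe; the function names and the sample record are hypothetical (the ORCID shown is ORCID's own documentation example and the Q number is just a placeholder). It is not how TaxonWorks or GBIF do this.

```python
import re

# Assumed URL shapes for the two identifier types mentioned above.
ORCID_RE = re.compile(r"^https?://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")
WIKIDATA_RE = re.compile(r"^https?://www\.wikidata\.org/(?:entity|wiki)/(Q[1-9]\d*)$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the final character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "")
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


def classify_agent_id(value: str) -> str:
    """Return 'orcid', 'wikidata', 'invalid' (bad ORCID checksum) or 'unknown'."""
    value = value.strip()
    match = ORCID_RE.match(value)
    if match:
        return "orcid" if orcid_checksum_ok(match.group(1)) else "invalid"
    if WIKIDATA_RE.match(value):
        return "wikidata"
    return "unknown"


def agent_ids(record: dict, term: str) -> list:
    """Split a pipe-separated identifier field and classify each value."""
    raw = record.get(term, "")
    return [(v.strip(), classify_agent_id(v)) for v in raw.split("|") if v.strip()]


# Hypothetical record, for illustration only.
record = {
    "recordedBy": "Josiah Carberry",
    "recordedByID": "https://orcid.org/0000-0002-1825-0097",
    "identifiedByID": "https://orcid.org/0000-0002-1825-0097 | http://www.wikidata.org/entity/Q42",
}

for term in ("recordedByID", "identifiedByID"):
    print(term, agent_ids(record, term))
```

A check like this only confirms that an identifier is well formed; confirming that it points to the right person still takes human expertise.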

Once data gets shared with others (e.g. aggregators), data providers receive feedback about the shared information. (In other words, we often find issues in our datasets as soon as we share them.) This “benefit” remains tricky for many data providers to put to good use. It seems some people ignore these assertions (aka “annotations”), being unsure of what to do with them. Others find them quite useful but note that, for many reasons, acting on them is often not easy. Some folks, like TaxonWorks, have added tools anyone can use, like gbifference, to see GBIF’s specimen record annotations in their own database. Much work remains to be done to ensure that data providers can feed these fixes back into their own collection management systems (e.g. for flipped latitudes or longitudes, date issues, misspellings that make records hard to find, taxonomy-related issues). Note that a collection’s taxonomic information might even be more up-to-date than the aggregator’s!
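As one illustration of the kind of issue mentioned above, flipped or swapped coordinates can sometimes be spotted locally before an aggregator ever sees them. The sketch below assumes a collection knows roughly where a record should fall (here, a rough, hypothetical bounding box for Illinois) and tries a few simple transformations to see which one would put the point back in that area. The box values, names, and labels are assumptions for illustration, not GBIF's or gbifference's actual logic.

```python
# Hypothetical, approximate bounding box: (min_lat, max_lat, min_lon, max_lon).
ILLINOIS_BOX = (36.9, 42.6, -91.6, -87.0)


def in_box(lat: float, lon: float, box: tuple) -> bool:
    """True if the point lies inside the bounding box."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def diagnose(lat: float, lon: float, box: tuple) -> str:
    """Suggest which simple coordinate error, if any, would explain a
    point that falls outside the expected area."""
    if in_box(lat, lon, box):
        return "ok"
    candidates = {
        "swapped": (lon, lat),
        "latitude sign flipped": (-lat, lon),
        "longitude sign flipped": (lat, -lon),
        "both signs flipped": (-lat, -lon),
    }
    for label, (c_lat, c_lon) in candidates.items():
        if in_box(c_lat, c_lon, box):
            return label
    return "outside expected area"


# Made-up example points near Champaign, Illinois, with deliberate errors.
samples = [(40.1164, -88.2434), (-88.2434, 40.1164), (40.1164, 88.2434), (0.0, 0.0)]
for lat, lon in samples:
    print((lat, lon), diagnose(lat, lon, ILLINOIS_BOX))
```

A result other than "ok" is only a hint for a curator to review; the record itself should not be changed automatically.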

(Arturo’s point: is the “time to first citation after original description” something like 10 years?)

Knowledge Sharing - Knowledge Transfer - Knowledge Evolution (Expansion?)

a few key points or ideas from some of our speakers

Nicky Nicolson (KEW)
- suggests we consider tools like Mastodon in curation pipelines, bringing more transparency and inclusion
- notes the need to improve credit, that is, to professionalize technician roles, to have leadership value the skills technicians bring, and for leadership to know what to look for
- see the Technician Commitment for some ideas

Ely Wallis (CSIRO)
- comments that machine observations and eDNA will soon surpass human observations and specimen records
- if we want richer data, much will depend on the capacity of the collection staff (and tools) necessary to make this possible
- she asks, what do collections want? Better field collecting tools, for example

Shelley James (Perth, Western Australia Herbarium)
- where are the pinch points? (see graphic)
- 3 months to 1 year to get duplicates
- ? something fell down and no one noticed

Sharif Islam (DiSSCo, Naturalis)
- we have missing functionality in our collection management systems (see Wikidata Knowledge Graph example)
- need data curation in an online domain (instead of a physical domain)
- need curation and stewardship

Vince Smith (NHM, London)
- We want to realize meaningful data integration and access to these [richer] data. Yet governments, universities, etc. find it difficult to align on a single mission given the many science-related challenges we’re facing as a planet. So, Vince suggests a two-part strategy to approach alignment that would empower different visions and needs:
  - employing community curation (aka more shared knowledge management)
  - working to implement AI integration into our workflows
- noting that the genealogy world has done both of the above to great success
- With a distributed, connected, open system like this we can all contribute and benefit, and we can address local / global issues together.

Beckett Sterner (ASU)
- How can we explain / understand / contextualize differential success across different collection management software platforms and communities?
- How do vouchers incentivize (or not) data maintenance? What are factors that influence this? e.g.
  - no or little curation after publishing data
  - lack of incentives / benefits for data quality after-the-fact (also more expensive)
- For more information about how to further apply this governance framework of shared, pooled resources with polycentric user communities, check out a practical guide that we published on implementing this framework for data:

Matt and Deb (UIUC, SFG)
- How are collections, software, collecting, and publishing changing / adapting to the various ways vouchers are coming into being (from born digital, to some variation on the theme of some physical object with some digital information)?

Thanks to all our speakers.

Jessica Utrup, Kelly Davis, Andrea Thomer, Siobhan Leachman, Gary Motz, Tommy McElrath, Robert Cubey, Nicky Nicolson, Elycia Wallis, Shelley James, Sharif Islam, Vince Smith, Beckett Sterner, Majid Vafadar, Chern-Mei Jang