Persistent Identifiers

Set the foundation for your data strategy with a direct return on investment by providing long-term stable references

The Challenge

Managing identifiers is a challenge for large organizations. The underlying reason is that the IT landscape is still dominated by an application-centric paradigm, while the data life cycle is much longer than the application life cycle. As a result, identifier management is still dictated by application-specific and vendor-specific solutions. To move toward a digital enterprise, data must be managed and governed independently of applications. The crucial first step is to decouple the identification of data resources from specific applications or systems. Persistent identifiers are designed to do exactly this in a simple and elegant way. Accurids is the first digital registry based on persistent identifiers that meets important enterprise requirements.

 

What is a Persistent Identifier?

A persistent identifier (PID) is a long-lasting reference to a digital resource. Typically, persistent identifiers are accessible over standard internet protocols so that information about the resource can be looked up easily. The main role of PIDs is to decouple the identification of a resource from its current location. Using persistent identifiers is one of the core FAIR (Findable, Accessible, Interoperable, Reusable) guiding principles for scientific data management.

Minimal Functionalities of PID implementations

All PID implementations need to cover at least the following functions:

  • Minting: generating identifiers without collisions
  • Resolution: a simple way to look up registered resources, typically by redirecting to their target URLs
  • Administration: configuring the system and updating the target URLs of registered resources
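
The three functions above can be illustrated with a small in-memory sketch. It is not the Accurids implementation: the class name PidRegistry, the example.org URLs, and the zero-padded counter scheme are assumptions made purely for illustration.

```python
import itertools


class PidRegistry:
    """Toy in-memory registry covering minting, resolution, and administration."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        # A strictly increasing counter never repeats, so minted PIDs cannot collide.
        self._counter = itertools.count(1)
        self._targets: dict[str, str] = {}  # PID -> current target URL

    def mint(self, target_url: str) -> str:
        """Minting: create a new PID and register its target URL."""
        pid = f"{self.namespace}/{next(self._counter):06d}"
        self._targets[pid] = target_url
        return pid

    def resolve(self, pid: str) -> str:
        """Resolution: return the URL a resolver would redirect to."""
        return self._targets[pid]

    def update_target(self, pid: str, new_url: str) -> None:
        """Administration: move a resource without changing its PID."""
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_url


# Hypothetical namespace and target systems.
registry = PidRegistry("https://pid.example.org/substance")
pid = registry.mint("https://lims-old.example.org/items/42")
registry.update_target(pid, "https://lims-new.example.org/items/42")
```

After the update, the PID itself is unchanged while resolution points to the new system, which is the decoupling of identity from location described earlier.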

Examples

The concept of persistent identifiers is well established, and several implementations exist, for example ARK and DOI.

What is Special about the Accurids PID Solution?

Accurids provides a PID implementation tailored to reference and master data and to the enterprise requirements of regulated industries. The following sections describe the key features of our solution.

Creating Persistent Identifiers

  • Configuration: Accurids lets you freely configure PID patterns for different domains. Configurable elements include the namespace, the pattern of the local identifier, prefixes and suffixes, specific sequences, reuse of legacy codes, and more.
  • Generation: PIDs can be generated sequentially, in batches, and in parallel.
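
To make the idea of a configurable pattern concrete, here is a minimal sketch. The PidPattern class, its field names, and the example.org namespace are hypothetical and do not reflect the actual Accurids configuration format. The sketch covers sequential and batch generation only; parallel generation would additionally need a shared atomic counter or disjoint number ranges per worker.

```python
from dataclasses import dataclass
from itertools import count, islice
from typing import Iterator


@dataclass(frozen=True)
class PidPattern:
    """Hypothetical pattern configuration for one domain."""
    namespace: str
    prefix: str = ""
    suffix: str = ""
    digits: int = 6   # zero-padded width of the sequence number
    start: int = 1    # first sequence number

    def render(self, number: int) -> str:
        """Build the full PID for a given sequence number."""
        local_id = f"{self.prefix}{number:0{self.digits}d}{self.suffix}"
        return f"{self.namespace}/{local_id}"


def mint(pattern: PidPattern) -> Iterator[str]:
    """Yield PIDs for the pattern, one sequence number at a time."""
    for number in count(pattern.start):
        yield pattern.render(number)


pattern = PidPattern(namespace="https://pid.example.org/product", prefix="PRD-", digits=5)
minter = mint(pattern)
first = next(minter)              # sequential generation
batch = list(islice(minter, 3))   # batch generation of the next three PIDs
```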

PURL: Standards-based Identifier Resolution

  • HTTPS: Standards-based and secure resolution mechanism
  • Access rights: In contrast to most public resolution scenarios, you likely want to restrict who can see what information about your resources. Accurids implements a role-based resolution mechanism so that you can configure who sees what information. For instance, resolution behind your corporate firewall can differ from resolution outside of it.
  • Performance: Resolution needs to be as fast as possible and scale to a huge number of parallel requests.
  • Versions: Accurids implements a hypermedia model that lets you resolve to different versions of your digital resources.

Advanced Resolution Mechanisms

Accurids is a hub: you can extend the basic resolution mechanism to return information from multiple sources, either directly or on subsequent requests.

Example: You have created PIDs for chemical substances and make basic metadata accessible within your organization. For authorized users you additionally include a picture of the chemical structure.
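
The chemical substance example can be sketched as a role-dependent view on a record. Everything in this sketch is an assumption for illustration: the record layout, the "authorized" role name, the sample substance, and the example.org URLs are not taken from Accurids.

```python
from typing import Any

# Hypothetical registry content: basic metadata plus restricted extras.
RECORDS: dict[str, dict[str, Any]] = {
    "https://pid.example.org/substance/000001": {
        "basic": {"label": "Ethanol", "category": "substance"},
        "restricted": {"structure_image": "https://images.example.org/ethanol.svg"},
    }
}


def resolve(pid: str, roles: set[str]) -> dict[str, Any]:
    """Return the metadata view that the caller's roles permit."""
    record = RECORDS[pid]
    view = dict(record["basic"])           # everyone in the organization sees this
    if "authorized" in roles:              # hypothetical role name
        view.update(record["restricted"])  # extra information for authorized users
    return view


substance = "https://pid.example.org/substance/000001"
basic_view = resolve(substance, roles=set())
full_view = resolve(substance, roles={"authorized"})
```

In a real deployment the restricted part could come from a different source system than the basic metadata, which is where the hub role of the registry comes in.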

Versioning

Accurids keeps the history of the essential metadata for all registered resources. The version chain and individual versions are resolvable through identifier extensions.
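
The following sketch shows one way a version chain could be addressed through an extension appended to the identifier. The "?version=" syntax, the VersionedRegistry class, and the sample metadata are hypothetical; the actual Accurids extension syntax is not described here.

```python
class VersionedRegistry:
    """Toy registry that keeps every metadata version of a PID (illustrative only)."""

    EXTENSION = "?version="  # hypothetical extension syntax

    def __init__(self) -> None:
        self._history: dict[str, list[dict[str, str]]] = {}

    def save(self, pid: str, metadata: dict[str, str]) -> int:
        """Append a new version and return its 1-based version number."""
        versions = self._history.setdefault(pid, [])
        versions.append(dict(metadata))
        return len(versions)

    def resolve(self, reference: str):
        """Resolve a plain PID to its latest version, or an extended PID
        to one specific version or to the whole version chain."""
        pid, found, extension = reference.partition(self.EXTENSION)
        versions = self._history[pid]
        if not found:
            return versions[-1]        # no extension: latest version
        if extension == "all":
            return list(versions)      # the whole version chain
        number = int(extension)
        if not 1 <= number <= len(versions):
            raise LookupError(f"{pid} has no version {number}")
        return versions[number - 1]


registry = VersionedRegistry()
pid = "https://pid.example.org/device/000007"
registry.save(pid, {"label": "Pump", "status": "draft"})
registry.save(pid, {"label": "Pump", "status": "released"})
```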

Minimal Metadata Models

Accurids lets you configure minimal metadata models for persistent identifiers. This allows you to harmonize reference and master data from distributed sources on a small set of critical data elements, either per domain or system-wide. For instance, the domain or category (product, location, substance, device, etc.) of the registered resources can be set as a required attribute. This increases findability and interoperability without dictating overly rigorous standards.
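
A minimal metadata model can be thought of as a set of required attributes, some system-wide and some per domain. The sketch below checks a record against such a model; the attribute names (label, category, cas_number) and the "*" key for system-wide rules are assumptions chosen for illustration.

```python
# Hypothetical minimal metadata model: "*" holds system-wide required
# attributes, the other keys hold additional attributes per category.
REQUIRED_ATTRIBUTES: dict[str, set[str]] = {
    "*": {"label", "category"},
    "substance": {"cas_number"},
}


def missing_attributes(metadata: dict[str, str]) -> set[str]:
    """Return the required attributes that are absent or empty."""
    required = set(REQUIRED_ATTRIBUTES["*"])
    required |= REQUIRED_ATTRIBUTES.get(metadata.get("category", ""), set())
    return {name for name in required if not metadata.get(name)}


incomplete = missing_attributes({"label": "Ethanol", "category": "substance"})
complete = missing_attributes({"label": "Plant A", "category": "location"})
```

A registry would typically reject or flag a registration when the returned set is not empty, while leaving all other attributes to the source systems.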

High Performance & High Availability

The persistent identifier registry is a critical component of your digital infrastructure, with high-availability and high-performance requirements. Accurids is a cloud-native implementation built on a node concept, so you can scale to your performance requirements and set up redundancy as needed.

Security & Integrity

Persistent identifiers are the knowledge backbone for large parts of your digital infrastructure. The integrity of this backbone is essential, so Accurids uses state-of-the-art security (e.g., Azure Active Directory) and integrity checks to ensure reliability.

Interoperability

Accurids allows you to integrate third-party PID generators such as ARK or DOI.

Working with Public Ontologies

Most public ontologies (e.g., from the OBO Foundry) already use PIDs/PURLs. Together with customers, we have identified the following scenarios in which it makes sense to create additional PIDs:

  • Anonymity: You do not want to be tracked
  • Performance: Faster resolution than through public services
  • Versioning: You control the version to be used and keep the history
  • Extensions: You extend a public ontology with internal metadata

If you only want to avoid being tracked, it is sufficient to host the public ontologies internally with Accurids and redirect the corresponding URLs in your enterprise proxy.
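
The redirect rule for such a proxy can be sketched as a simple host rewrite. The internal host name ontologies.internal.example.org is hypothetical, and a real proxy would express this in its own configuration language rather than in Python; the sketch only shows the mapping logic.

```python
from urllib.parse import urlsplit, urlunsplit

# Public PURL host -> hypothetical internal mirror.
INTERNAL_HOSTS = {
    "purl.obolibrary.org": "ontologies.internal.example.org",
}


def rewrite(url: str) -> str:
    """Map a public ontology URL to the internal mirror; leave other URLs untouched."""
    parts = urlsplit(url)
    internal_host = INTERNAL_HOSTS.get(parts.hostname or "")
    if internal_host is None:
        return url
    # Keep path, query, and fragment so the same term is resolved internally.
    return urlunsplit(("https", internal_host, parts.path, parts.query, parts.fragment))


internal_url = rewrite("http://purl.obolibrary.org/obo/GO_0008150")
```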

Return on Investment

The main return on investment comes from reduced complexity in applications that consume reference and master data. Their development costs for reference and master data integration are less than half the normal effort. In addition, looking up the corresponding data takes significantly less effort and time because it is directly accessible through any web browser. Your employees no longer need to ask “Where can I get information about X? Do I have access to this? Where do I request access?”.

In the long term, PIDs give you the flexibility to change your reference and master data (RMD) systems without disrupting your applications and without expensive migration scenarios. This typically saves more than $2M per master data system that would otherwise need to be migrated.