Solving Reference Data Challenges in Pharma at Enterprise Scale

Why reference data management cannot be centralized, and how easy-to-use global lookup services increase findability, reuse, and interoperability.

Heiner Oberkampf

October 5, 2025

Reference data in the pharma and life science industries is created in many places and within various functions, both inside and outside enterprise boundaries. This distributed management often leads to redundant information, low findability, additional integration and mapping effort, and even data incompatibility. Mitigating these issues is critical for FAIR data management. In this article, we describe why reference data management often cannot be centralized and how easy-to-use global lookup services increase findability, reuse, and interoperability.

Reference data is managed in many distributed sources

Reference data in pharma is created in many places and by many functions. Information is created across institutions and along the translational value chain: early research, development, clinical studies, regulatory processes, manufacturing, supply chain, commercial, and finally the observation of real-world evidence. Reference data in early research is usually based on public literature and information created by scientific institutions. Partner organizations, such as Contract Research Organizations (CROs), are involved in early research and clinical studies. Clinical study and regulatory affairs data, in turn, are legally required to be based on reference data from authorities such as the EMA and FDA. Ultimately, the workflows that lead to a life science product involve many different systems collecting and reusing data.

The departments that fulfill specific tasks along this pharma value chain often use vocabularies within their business applications that are not aligned across systems.

Depending on the department and application, the same or similar reference data points (such as a new indication) might be created in a strictly regulated environment or in a setting where IDs and labels can be freely chosen. Each department must therefore recognize that it handles shared and distributed reference data. This means sharing its labels and codes with other systems, and ingesting and reusing codes from other functions and even from external organizations.

Although departments handle the same kinds of data, such as records on regions and countries, species, indications, and drugs, this information is used in different contexts and often requires different levels of detail or granularity. Data entities are thus viewed from different perspectives and play different roles depending on the function and workflow, which in turn requires different attributes and metadata. Data producers and users therefore need to align their workflows for creating data, and the conditions for using it, along the value chain.

It is vital to bridge data silos between the different functions and to minimize the cost and effort of transferring data between them through interfaces.

Satisfying the demands of regulatory authorities is another chief concern, which may involve multiple functions in the product’s value chain.

The Top 3 Reference Data Challenges in Pharma

1. Missing Awareness and Redundancies

Internal and external functions are often unaware of existing reference data and reference data standards. This lack of awareness leads to reference data being reinvented, which in turn unintentionally creates data silos. In the worst case, functions are aware of other systems but claim to provide the authoritative reference data while ignoring other data or standards. This happens between functions inside the enterprise, such as R&D, production, Regulatory Affairs, and Real World Evidence, and in the work with CROs. If multiple standards are required, alignment is necessary to enable integration and translation (mapping) between reference data sources. Some redundancies also cannot be avoided, for example when a proprietary source system in the lab comes with pre-configured, non-standard units of measure. In that case, the measurement attributes and their units need to be harmonized.
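To make the unit-of-measure case concrete, here is a minimal sketch of what such a harmonization step could look like. It is an illustration only: the source labels, the preferred symbols, and the function names are assumptions made up for this example, not part of any lab system or standard.

```python
from typing import List, Optional

# Hypothetical mapping from source-system unit labels to one preferred
# symbol. All labels here are invented for illustration.
UNIT_SYNONYMS = {
    "mg/ml": "mg/mL",
    "mg per ml": "mg/mL",
    "milligram/milliliter": "mg/mL",
    "mcg": "ug",
    "microgram": "ug",
}


def harmonize_unit(raw_label: str) -> Optional[str]:
    """Return the preferred unit symbol, or None if the label is unmapped."""
    # Normalize case and collapse whitespace before the lookup.
    key = " ".join(raw_label.strip().lower().split())
    return UNIT_SYNONYMS.get(key)


def unmapped_units(labels: List[str]) -> List[str]:
    """List the labels that still need review, in input order."""
    return [label for label in labels if harmonize_unit(label) is None]
```

Keeping unmapped labels visible, rather than silently passing them through, gives data stewards a worklist of units that still need a decision.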

2. External Authorities

Despite global data harmonization efforts, for example those driven by ISO IDMP, every health authority requires particular vocabularies to be used in submissions and in any structured data exchange. These vocabularies are mostly not aligned with each other, which poses a tremendous challenge to pharma regulatory teams when preparing submissions. Furthermore, internal data is often not aligned with what the regulatory teams need, because detailed information from departments is unavailable due to a lack of internal standardization. Such demands may therefore result in massive data wrangling and integration efforts along the value chain. The diversity of regulatory requirements across regions and product categories is only the tip of this complexity iceberg.
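One way to picture this mapping problem is a table that links a single internal term to the code each authority expects. The sketch below is hypothetical throughout: the internal identifiers, the table layout, and all codes are placeholders, not real EMA or FDA values.

```python
from typing import Dict, List, Tuple

# internal term ID -> {authority -> code required in submissions}
# All identifiers and codes below are invented placeholders.
AUTHORITY_CODES: Dict[str, Dict[str, str]] = {
    "internal:dosage-form/0001": {
        "EMA": "EMA-PLACEHOLDER-1",
        "FDA": "FDA-PLACEHOLDER-1",
    },
    "internal:dosage-form/0002": {
        "EMA": "EMA-PLACEHOLDER-2",
    },
}


def code_for(internal_id: str, authority: str) -> str:
    """Return the code an authority expects for an internal term."""
    try:
        return AUTHORITY_CODES[internal_id][authority]
    except KeyError:
        raise LookupError(f"no {authority} mapping for {internal_id}") from None


def mapping_gaps(authorities: List[str]) -> List[Tuple[str, str]]:
    """List (internal_id, authority) pairs that lack a mapping."""
    return [
        (internal_id, authority)
        for internal_id, codes in AUTHORITY_CODES.items()
        for authority in authorities
        if authority not in codes
    ]
```

Running a gap check like `mapping_gaps(["EMA", "FDA"])` before a submission shows which internal terms still lack a code for a given authority, instead of discovering the gap during data wrangling.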

3. Workflow Integration

Since many different departments need to adapt, extend, and enhance information, it is crucial to align and maintain reference data in a way that enables the orchestration of data distribution and sharing workflows. This must happen without impeding processes in other functions that require the same reference data.

Similarly, master data entities, such as “product” or “study”, are managed in a distributed manner, and this will not change, because people work in their specific business applications when capturing or processing data. Switching to another system, or requesting the creation of a new master entity through a managed service, is often not feasible for business users. Moreover, they often do not have access to such a system.

How does ACCURIDS support distributed reference data management?

Nowadays, many enterprise data strategies aim to centralize reference and master data management to regain control. Though there are legitimate reasons for centralization, ACCURIDS is built on the premise that centralized management alone is too slow to keep up with the fast-changing IT and data landscape. Purely centralized approaches also fail to support the differing perspectives of business units and the resulting incompatibilities in large organizations. Both aspects, speed and support for differing perspectives, are increasingly important in an era of digitalization and growing need for collaboration across enterprise boundaries.

ACCURIDS is a registry for distributed reference and master data. For data stewards and data consumers, it serves as a discovery solution and provides reliable access for anyone in your organization.

With the following ACCURIDS capabilities, your organization will regain control through a distributed reference data management approach:

FAIR Reference Data Registry: All existing reference data from current applications is registered with globally unique, persistent, and resolvable identifiers to (a) create an index of all reference data objects and (b) provide long-term stable and resolvable IDs that users (humans and machines) can rely on, even when the location where the reference data is managed changes over time.
Global Lookup Service: Built on the registry, the lookup service makes it possible to search for any term or code used within the enterprise across systems, together with the context in which it is used. This allows data stewards and business users to find the preferred terminologies for a given domain, encouraging reuse instead of re-creation.
Match & Merge: The current reference data landscape contains huge redundancies and also has to be integrated and aligned with vocabularies from many different external organizations. It is therefore essential to have highly automated match and merge capabilities for the terminologies that are already in use. ACCURIDS AI automatically generates matching proposals, so data stewards only need to review ambiguous cases and approve AI-generated mappings to provide your organization with authoritative mappings.
ACCURIDS Data Hub for Public Terminologies: All pharma-relevant public terminologies, for example from health authorities (such as EMA SPOR or FDA SPL) or from the OBO Foundry (such as the Gene Ontology or the NCBI Taxonomy), can be subscribed to with one click so that all internal consumers use the same up-to-date and validated version.
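
To illustrate how a registry, a lookup service, and match proposals relate to each other, here is a small in-memory sketch. It is not the ACCURIDS API: the class, the `pid:` identifier format, the source-system names, and the sample entries are all assumptions for this example, and plain string similarity from Python's standard library stands in for the AI-based matching described above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RegistryEntry:
    pid: str     # persistent identifier (hypothetical format)
    label: str   # label as used in the source system
    code: str    # local code in the source system
    source: str  # system that currently manages the term


class Registry:
    """In-memory index of reference data terms keyed by persistent ID."""

    def __init__(self) -> None:
        self._by_pid: Dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        # An identifier is never reused, so it stays stable over time.
        if entry.pid in self._by_pid:
            raise ValueError(f"identifier already registered: {entry.pid}")
        self._by_pid[entry.pid] = entry

    def resolve(self, pid: str) -> RegistryEntry:
        return self._by_pid[pid]

    def lookup(self, text: str) -> List[RegistryEntry]:
        """Find entries whose label contains the text or whose code equals it."""
        needle = text.strip().lower()
        return [
            e for e in self._by_pid.values()
            if needle in e.label.lower() or needle == e.code.lower()
        ]

    def match_proposals(self, threshold: float = 0.85) -> List[Tuple[str, str, float]]:
        """Propose pairs from different sources whose labels look alike."""
        entries = list(self._by_pid.values())
        proposals = []
        for i, a in enumerate(entries):
            for b in entries[i + 1:]:
                if a.source == b.source:
                    continue  # only match across systems
                score = SequenceMatcher(None, a.label.lower(), b.label.lower()).ratio()
                if score >= threshold:
                    proposals.append((a.pid, b.pid, round(score, 2)))
        return proposals


# Sample entries, invented for illustration.
registry = Registry()
registry.register(RegistryEntry("pid:0001", "Homo sapiens", "HS", "lab-system"))
registry.register(RegistryEntry("pid:0002", "homo sapiens", "9606", "clinical-system"))
registry.register(RegistryEntry("pid:0003", "Mus musculus", "MM", "lab-system"))

for hit in registry.lookup("sapiens"):
    print(hit.pid, hit.label, hit.source)
print(registry.match_proposals())
```

In this sketch, the lookup returns each hit together with its source system, which is the "context of where they are used" that lets a user pick the preferred term. The proposals are only candidates; a data steward would still approve or reject each pair.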

