By Maximilian Salcher

Maximilian Salcher is Research Officer at LSE Health and Social Care, London School of Economics and Political Science, London, United Kingdom.

Summary: Linking existing databases is seen as key to unlocking the potential of big data to revolutionise health care. Shared electronic health records and provider benchmarking can improve the quality of care, while linked databases are deemed enablers to support the transformation towards value-based health care. The wealth of collected data enables researchers to answer questions that are of high relevance for policy-makers, patients and providers. However, data privacy concerns pose a challenge to the integration of data sources. Effective use of big data to transform health care systems requires substantial commitment from all stakeholders and a strong governance framework. 

Key words

Big data, governance, data privacy, data linkage, value-based health care


Health care systems around the world routinely generate a wealth of data on every patient, providing a comprehensive picture of health care pathways and outcomes. Enhanced by data from non-health care system sources, such as geographic location, socio-economic status, lifestyle and social networks, a near-complete picture of the individual can be created. The increased supply of health-related data from multiple sources (“big data”) has the potential to change the face of health care and provide added value for all health care system stakeholders.1,2 While no single definition for big data is universally accepted, existing ones all describe a similar concept: large, diverse, and rapidly increasing datasets that contain information in various formats and which require novel methods to be processed.3

In theory, access to detailed data about individual patients supports patient-centred and outcomes-focused care through individualised treatment decisions, which take into account clinical, genetic, lifestyle and other information. However, most of the data are contained in silos, and even for health care-related data there is little or no linkage between existing databases. Missing information about a patient’s background, history and outcomes poses a problem for a range of health care system stakeholders, including care providers, who are interested in providing better quality care for their patients; policy-makers and regulators, who aim to ensure effective and efficient services for the population; as well as researchers, who need comprehensive data to investigate risk factors and remedies in order to inform clinical practice and the development of new therapies. Linking existing databases is therefore seen as key to unlocking the potential of data-driven health care system change. Below, opportunities from using big data to improve quality of care, increase health care system efficiency, and conduct high-quality research are presented, alongside considerations regarding the ethical and technical challenges associated with the use of large and linked databases.

Improving the quality of care

At the individual level, the integration of clinically relevant data can lead to significant improvements in clinical practice with tangible benefits for patients, including individualised treatment plans and fewer duplicate diagnostic tests. Increasingly, the lack of linkage between existing databases is recognised as a barrier to the coordinated provision of health care services, and shared electronic health records (EHR) are being introduced as a countermeasure in many health care systems. Projects such as the Catalonian “HC3” shared EHR and the Danish online portal act as information sharing platforms for all health care professionals involved in the care of an individual patient.

At the aggregate level, big data provides an opportunity to monitor provider performance and ensure high quality of care. In a survey of OECD countries, the measurement of various elements of the health care system was given as a key reason for enabling linkage between datasets, including measurement of health care quality and system performance; coordination and outcomes of care pathways; quality of care through compliance rates with national guidelines; resource use and costs; disease prevalence; and the analysis of relationships between socio-economic status, health and health care.4 However, many countries miss out on opportunities to improve clinical practice using linked data.5 An example of putting data linkage into action is the National Board of Health and Welfare in Sweden, which monitors compliance of providers with national clinical guidelines in various disease areas using data from its patient registries network (the National Quality Registries) that are linked to mortality and prescriptions databases. Performance across providers can be compared and reasons for shortcomings investigated, which in turn informs action plans for clinical practice improvement.

Improving the efficiency of health care systems

The existence of substantial waste in health care systems, stemming from underuse of effective treatments, overuse of ineffective treatments, failure to coordinate and execute care and other sources,6 has given rise to the promotion of value-based health care as a priority for policy-makers: improving patient outcomes in a cost-effective way. Meaningful use of big data could contribute to waste reduction by identifying the most cost-effective treatments, enabling care coordination (see shared EHR above), and accelerating the development of innovative and highly effective medicines. For example, linked data on long-term and real world outcomes can be used to assess the efficacy, (comparative) effectiveness and cost-effectiveness of new medicines, leading to more informed decisions about market access and availability of these drugs. However, the trade-off between rigorous evidence standards for market approval of new drugs and faster access to innovative medicines for patients needs to be carefully considered, and evidence on big data-induced efficiency gains in health care systems through value-based health care or other mechanisms is yet to emerge.

In the United States, the Centers for Medicare and Medicaid Services (CMS) aim to use big data to drive health care systems change and are planning to link two thirds of payments to value, including through initiatives such as Accountable Care Organizations and Coordinated Care Organizations. Data-driven health care system change is also on the agenda of the European Union (EU). The recently launched “Big Data for Better Outcomes” programme (Innovative Medicines Initiative) creates research platforms and big data networks for various disease areas (currently including Alzheimer’s disease, hematologic malignancies, and cardiovascular diseases) with the aim of accelerating the transition towards value-based health care systems in Europe. In line with recommendations from a recent European Commission report on big data in health care1, this research programme leverages expertise from the public and private sectors in a public-private partnership to combine and expand existing data sources, build analytic capacities, and establish common standards.

Data linkage can also create efficiency gains in the collection and use of data. At the heart of the Belgian initiative is the recognition that the analysis of health care data can be improved significantly by linking and making better use of existing, rather than collecting additional, data. Integrating data from various sources is a particular challenge in decentralised health care systems and can require substantial investments in technical solutions and political will to overcome long-standing fragmentation. In the Belgian example, a new centre for the integration of existing databases was established as part of a national eHealth action plan that was agreed by several hundred stakeholders.

Research opportunities

For researchers, linked databases provide opportunities to analyse disease patterns, detect associations between exposures (such as behaviour or health care services received) and outcomes (e.g., acute events such as heart attacks or onset of chronic diseases such as Alzheimer’s), and potentially identify causal relationships that can serve as starting points for the development of new therapies. As value-based health care gains traction, interest and investments in comparative effectiveness research have increased, with the wealth of collected data enabling researchers to answer questions that are of high relevance for policy-makers, patients and providers. Data linkage further widens the realm of possible research questions, adding outcome as well as prediction variables to the dataset at the researcher’s disposal.

For example, the CALIBER project in the United Kingdom integrates data from different sources to depict the journey of patients with myocardial infarction through the health care system. Relevant data of events leading up to the infarct and after it are collected in different electronic databases, including a database on primary care, hospitalisation with interventions and associated resource use, a disease registry, and the death certificate. Integration of the data contained in separate datasets provides researchers with a powerful tool to analyse the factors leading to heart attacks, as well as the relative effectiveness of different interventions to reduce the morbidity and mortality burden of these events.

Knowledge about the prevalence of diseases and their patterns is an important determinant of national and regional health care planning, yet gaps remain in our understanding of chronic and multiple chronic diseases. Linkage of data from separate sources, such as administrative data from different payers, providers (in- and out-patient care, social care), diagnostic tests, laboratory results and prescriptions, can help to understand disease patterns. While public health monitoring is the most common use of EHRs in OECD countries,5 fragmentation of the health care system with isolated points of care hinders records linkage. Research initiatives, such as the Austrian DEXHELPP project, which aims to combine data sources and develop methods to support decision-making at the population level, require substantial investments, technical expertise, and stakeholder buy-in to overcome these difficulties and play a role in informing health care planning.

Technical and ethical challenges: can data be linked?

Simultaneously with the formulation of the promises of big data, technical and ethical challenges for realising this potential have been identified.3,7 Different standards in databases might prevent data from being used together or require significant resources to be made compatible. Unique patient identifiers, which allow deterministic linkage of records, are not available in all countries. Projects addressing these challenges develop novel methods, such as statistical models and algorithms to match data from separate sources based on the probability of common features (probabilistic matching).
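To make the distinction between deterministic and probabilistic linkage concrete, the following is a minimal sketch in Python, not the method of any project named in this article. It assumes a Fellegi-Sunter style scoring scheme, in which each field contributes a log-likelihood ratio depending on whether it agrees between two records. The field names, the m and u probabilities, the threshold and the example records are all hypothetical values chosen for illustration; in practice they would be estimated from the data and tuned against a validated sample.

```python
import math

# Hypothetical per-field probabilities (illustrative, not estimated from data):
#   m = P(field agrees | both records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "postcode": (0.90, 0.05),
    "sex": (0.99, 0.50),
}


def match_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def link(records_a, records_b, threshold=10.0, id_field="patient_id"):
    """Return (index_a, index_b, method) for each record in records_a that links."""
    links = []
    for i, a in enumerate(records_a):
        best_j, best_w, method = None, float("-inf"), None
        for j, b in enumerate(records_b):
            # Deterministic linkage: a shared unique identifier settles the match.
            if a.get(id_field) is not None and a.get(id_field) == b.get(id_field):
                best_j, method = j, "deterministic"
                break
            # Otherwise keep the candidate with the highest match weight.
            w = match_weight(a, b)
            if w > best_w:
                best_j, best_w = j, w
        if method is None and best_w >= threshold:
            method = "probabilistic"
        if method is not None:
            links.append((i, best_j, method))
    return links


# Hypothetical records from two unlinked sources.
primary_care = [
    {"patient_id": "A1", "surname": "jensen", "birth_date": "1950-02-01", "postcode": "2100", "sex": "F"},
    {"patient_id": None, "surname": "garcia", "birth_date": "1947-11-23", "postcode": "08001", "sex": "M"},
    {"patient_id": None, "surname": "huber", "birth_date": "1962-05-30", "postcode": "1010", "sex": "M"},
]
hospital = [
    {"patient_id": None, "surname": "garcia", "birth_date": "1947-11-23", "postcode": "08015", "sex": "M"},
    {"patient_id": "A1", "surname": "jensen-holm", "birth_date": "1950-02-01", "postcode": "2100", "sex": "F"},
    {"patient_id": None, "surname": "novak", "birth_date": "1980-01-01", "postcode": "1010", "sex": "F"},
]

links = link(primary_care, hospital)
print(links)
```

In this sketch the first record links through its identifier even though the surname has changed, the second links probabilistically despite a different postcode, and the third finds no candidate above the threshold. A production system would additionally need blocking to avoid comparing every pair of records, approximate string comparison, a clerical review band for borderline scores, and a rule to enforce one-to-one matches, none of which are shown here.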

Data privacy concerns arguably pose an even greater challenge to the integration of data sources. Careful consideration needs to be given to the delicate trade-off between access to more personal information and associated potential individual and societal benefits in health care service provision, policy-making and research on one side, and the need for data privacy on the other side. The English Department of Health is currently evaluating models to inform the public about usage of data in health and social care, including different options for opting out of information sharing between different service providers.8 Some of the proposed opt-out models distinguish between consent to using data for service provision (coordination and continuation of care) and for research purposes. While research plays an important role in improving quality of care, the benefits of using integrated data for research purposes are less tangible for the individual patient, requiring additional information to be made available to obtain consent.

The legal framework for allowing researchers to use existing data varies by country and disease area. The EU Data Protection Regulation allows the use of personal data for research of significant public interest. While data collection may be mandatory without an opt-out option in some areas (e.g. registries of infectious diseases), others require explicit consent from the patient. Either way, researchers and those managing the data have a responsibility to ensure trust in their handling of the data. Good practice measures that make inappropriate use of sensitive data less likely include establishing steering committees with patient representatives; using trusted third parties for data linkage; setting clear rules for requesting access to data and tracking data use; and providing safe data environments in which to conduct research.

Big data governance

While some countries lead the way in allowing data to be shared among government and health systems entities, as well as data to be made available for research (including the United Kingdom, Sweden and New Zealand), others remain much more restrictive and do not allow data custodians and researchers access to datasets they do not own. The different speeds at which countries are developing governance frameworks that maximise benefits while minimising risks showcase the complexity of integrating legal, ethical, technical, and political considerations into a common framework. Enabling governance mechanisms that reconcile data use with data protection include, among others, accessible and well-designed health information systems and a legal framework for processing sensitive information.4

As mentioned above, the European Data Protection Regulation provides for the latter in the European context, although it leaves room for interpretation and requires countries to develop their own frameworks. The development of health information systems and investment in infrastructure are reflected in some national health strategies, including in France, where the recent health system reform foresees linkage of data from social health insurance with private insurance, social care and mortality records. Despite these developments, implementation of governance frameworks and associated investments can lag behind and delay the meaningful use of big data for quality of care improvement, efficiency gains, and research.


Effective use of big data to transform health care systems requires substantial commitment from all stakeholders and a solid governance framework. Linkage of datasets is more than a technical exercise and requires reflection on data privacy and security, incorporation of data use into health system planning, and how and by whom linked datasets will be used. Some of these questions cannot be answered globally, as fundamental differences exist in approaches to using data in health system planning and policy development, and in citizens’ attitudes towards the trade-off between privacy and promised benefits from granting access to sensitive personal data. Trust, as an integral element of the patient-provider relationship, extends to using personal data for research, health system planning and monitoring and has to be won through open communication and the implementation of measures that demonstrate attention to citizens’ concerns. 


This article partially builds on a mini-brief written by the author on behalf of the European Observatory on Health Systems and Policies. Independently of this, the author would also like to acknowledge that this work has received support from the EU/EFPIA Innovative Medicines Initiative 2 Joint Undertaking (DO-IT grant n° 116055).


  1. Habl C, Renner A-T, Bobek J, Laschkolnig A. Study on Big Data in Public Health, Telemedicine and Health care. Brussels, Belgium: European Commission, 2016:117. Available at:
  2. Roski J, Bo-Linn GW, Andrews TA. Creating value in health care through big data: opportunities and policy implications. Health Affairs (Project Hope) 2014;33(7):1115-22. doi:10.1377/hlthaff.2014.0147.
  3. Salas-Vega S, Haimann A, Mossialos E. Big Data and Health Care: Challenges and Opportunities for Coordinated Policy Development in the EU. Health Systems & Reform 2015;1(4):285-300. doi:10.1080/23288604.2015.1091538.
  4. OECD. Health Data Governance. Paris: Organisation for Economic Co-operation and Development; 2015. Available at:
  5. OECD. Strengthening Health Information Infrastructure for Health Care Quality Governance: Good Practices, New Opportunities and Data Privacy Protection Challenges. Paris: Organisation for Economic Co-operation and Development, 2013.
  6. Berwick DM, Hackbarth AD. Eliminating Waste in US Health Care. JAMA 2012;307(14):1513-16. Available at: doi:10.1001/jama.2012.362.
  7. Weber GM, Mandl KD, Kohane IS. Finding the Missing Link for Big Biomedical Data. JAMA 2014;311(24):2479-80. Available at: doi:10.1001/jama.2014.4228.
  8. National Data Guardian for Health and Care. Review of Data Security, Consent and Opt-Outs. National Data Guardian for Health and Care, 2016. Available at:

Source: Eurohealth