When you visit your doctor or attend hospital, information collected about you, including your symptoms, tests, investigations, diagnosis, and treatments, is entered on computers as electronic health records (EHRs). This information could help us learn how to tailor treatments more accurately for individual patients and to offer better and safer healthcare.

The challenge

The challenge we face is that most of the information held within these records is in written form, sometimes referred to as unstructured text, which is difficult to use in research: for example, ‘the patient feels very tired and breathless, is losing weight, and says her heart is beating very fast’. We need to develop special computerised tools to process these words so that we have a full picture of all patient symptoms, experiences and diagnoses to use in research for patient benefit.

The solution

We will establish a natural language processing (NLP) research community that will address the complexity of clinical text through the development of shared tools and standards with inbuilt patient confidence and engagement, supporting joint working across industry, academia and the NHS. The community will be open and inclusive, and will develop capability for UK-wide NLP research at scale whilst providing clear ‘quick wins’ through exemplar projects and shared material and datasets for training and implementation, with the ultimate aim of integrating with other health data analytics. The project will lay the foundations for a sustainable model for collaborative working, thus attracting funding for the next four years and beyond.

Impact and outcomes

There is much value in the unstructured portion of the EHR, often in the form of rich narrative, which is currently unused, yet important for understanding health interactions that are either not recorded in, or are less obvious from, the structured data (e.g. multimorbidity, cancer diagnosis).
The subtleties of the patient journey and characteristics are often recorded in the free-text component, which clinicians find a more user-friendly format. Building on existing successful partner-led programmes, and drawing the wider community together, this project will enable a major shift in the UK’s ability to make EHRs research-ready, actionable, real-time and usable at large scale. Shared tools will be made available across the NHS, creating richer, more useful clinical information to improve healthcare. The integration of NLP-derived phenotypes (digital descriptions of health characteristics) from EHRs with other rich records (e.g. educational and social information) and other modalities including imaging, mobile health and genomics will help generate a more complete picture of the patient and their health. Example projects will focus on the areas of stroke, lung cancer and serious mental illness. Better use of unstructured text will help streamline the matching of patients to clinical trials and the stratification of patients for disease classification, outcome prediction, patient trajectories across the life-course and adverse drug reactions, and will help identify drug-repurposing opportunities.

Resources

The HDR UK Text implementation project has a GitHub repository containing a curated list of applications and datasets developed and shared by the Health Data Research (HDR) UK Text community. The repository is located at https://github.com/hdruk-text