Named Entity Recognition (NER) in healthcare is a technique for detecting and classifying healthcare-specific entities, such as patient names and medical terms, in unstructured text. It improves the accuracy of data extraction from unstructured text, makes information retrieval easier, and supplies structured input to more advanced AI systems. Medical NER is therefore a core technology for natural language AI development in medical institutions.

TranSynk’s NER dataset is designed to help healthcare organizations extract critical information from unstructured data. It can reveal relationships across medical reports, insurance documents, patient reviews, clinical notes, and other sources, making medical data more visible. We apply our NLP expertise to handle complex custom annotation projects of any size.

1. Identification of Medical Named Entities

Medical records contain a vast amount of medical information, much of it unstructured text that is hard to identify, partly because of its specialized nature. Converting this unstructured content into a structured format requires named entity annotations dedicated to medical information.

2.1 Attributes of Pharmaceutical Products

Most medical records contain information about drugs and their attributes that is important to clinical practice. We annotate these drug attributes accurately, following established guidelines.

2.2 Laboratory Data Attributes

Laboratory data contained in medical records often carry their own specific attributes. We follow established guidelines to identify these attributes and provide accurately annotated data.

2.3 Attributes of Physical Measurements

Physical measurements include a variety of data, including vital signs, and are recorded in the medical record along with their respective attributes. We can identify these physical measurement attributes and annotate or tag them appropriately.

3. Oncology-Specific NER

In addition to general medical NER annotation, we also support NER in highly specialized areas such as oncology and radiology. For oncology, we can provide datasets with the following NER annotations: Cancer Problem, Histology, Cancer Stage, TNM Stage, Cancer Grade, Dimension, Clinical Status, Tumor Marker Test, Cancer Medicine, Cancer Surgery, Radiation, Gene Studied, Variation Code, Body Site.

4. Side Effect NER and Relations

In addition to identifying and annotating major medical entities and their relationships, we also support annotating the relationships between side effects and the drugs (Drug) or procedures (Procedure) that caused them, as in the following examples:

  • After chemotherapy [Procedure], the patient experienced nausea [Adverse Effect] and vomiting [Adverse Effect].
  • The patient also has hepatitis [Adverse Effect] caused by Xeloda [Drug].
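
The bracketed examples above can be converted into span annotations mechanically. The following Python sketch is illustrative only: the function name `parse_inline` and the output fields are hypothetical, and it assumes that each bracketed label tags the single word immediately before it. That holds for the two examples here but would not cover multi-word entities.

```python
import re

# Assumption: a label in square brackets tags the single word right before it.
TAG = re.compile(r"(\w+) \[([^\]]+)\]")

def parse_inline(annotated):
    """Return (plain_text, entities); offsets index into plain_text."""
    entities = []
    parts = []
    pos = 0       # read position in the annotated string
    out_len = 0   # length of the plain text built so far
    for m in TAG.finditer(annotated):
        before = annotated[pos:m.start()]
        parts.append(before)
        out_len += len(before)
        word, label = m.group(1), m.group(2)
        entities.append({
            "text": word,
            "label": label,
            "start": out_len,
            "end": out_len + len(word),
        })
        parts.append(word)
        out_len += len(word)
        pos = m.end()  # skip the " [Label]" part
    parts.append(annotated[pos:])
    return "".join(parts), entities

if __name__ == "__main__":
    sentence = ("After chemotherapy [Procedure], the patient experienced "
                "nausea [Adverse Effect] and vomiting [Adverse Effect].")
    plain, ents = parse_inline(sentence)
    print(plain)
    for e in ents:
        print(e)
```

Storing character offsets rather than only the entity text keeps each annotation tied to one specific mention, which matters when the same term appears more than once in a document.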

5. Assertion Status

Beyond annotating medical entities and their relationships, we also classify the Status, Negation, and Subject associated with each entity. For example, medical history and family history are assigned to Status.
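
One way to picture an assertion annotation is as a small record attached to each entity. The Python sketch below is an assumption for illustration, not the dataset's actual schema: the class name `Assertion`, its field names, and the example values such as "medical_history" are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Assertion:
    """Hypothetical record: one entity mention plus its assertion attributes."""
    text: str      # the entity mention
    label: str     # entity type, e.g. "Problem"
    status: str    # hypothetical values: "present", "medical_history", "family_history"
    negated: bool  # True when the text denies the finding
    subject: str   # hypothetical values: "patient", "family_member"

def to_json(assertion):
    """Serialize one assertion record as a JSON string."""
    return json.dumps(asdict(assertion), ensure_ascii=False)

if __name__ == "__main__":
    example = Assertion(
        text="hepatitis",
        label="Problem",
        status="medical_history",
        negated=False,
        subject="patient",
    )
    print(to_json(example))
```

Keeping Status, Negation, and Subject as separate fields lets a downstream model learn each attribute independently instead of from one combined label.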

NER & Entity Linking Data

A dataset for Named Entity Recognition (NER) and entity linking, focused on extracting medical entities such as symptoms, procedures, medications, and anatomical locations from medical documents.

  • Annotations applied to over 10,000 medical documents
  • Extraction of a wide range of medical entities, including Problems, Diagnoses, Procedures, and Medications
  • JSON Format
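
Span annotations delivered as JSON are commonly converted to token-level BIO tags before training an NER model. The sketch below assumes a hypothetical record layout with a "text" field and an "entities" list of character offsets; the actual JSON schema of the dataset is not specified here, and the tokenizer is a simple regular expression chosen for illustration.

```python
import json
import re

# Hypothetical record layout; the real dataset schema may differ.
RECORD_JSON = """
{"text": "The patient also has hepatitis caused by Xeloda.",
 "entities": [
   {"start": 21, "end": 30, "label": "Problem"},
   {"start": 41, "end": 47, "label": "Medication"}
 ]}
"""

def to_bio(record):
    """Return a list of (token, tag) pairs using the BIO scheme."""
    text = record["text"]
    # Words and punctuation marks become separate tokens.
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\w+|[^\w\s]", text)]
    tagged = []
    for tok, start, end in tokens:
        tag = "O"
        for ent in record["entities"]:
            # A token belongs to an entity when it lies fully inside the span.
            if start >= ent["start"] and end <= ent["end"]:
                prefix = "B" if start == ent["start"] else "I"
                tag = prefix + "-" + ent["label"]
                break
        tagged.append((tok, tag))
    return tagged

if __name__ == "__main__":
    for token, tag in to_bio(json.loads(RECORD_JSON)):
        print(token, tag)
```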

ICD-10-CM & CPT Data

A medical NLP dataset annotated with ICD-10-CM and CPT codes for medical documents, designed to support automated medical coding and clinical text analysis.

  • Annotations applied to over 10,000 medical documents
  • Designed for model development integrated with major medical terminologies, including ICD‑10‑CM, CPT, SNOMED, UMLS, and RxNorm
  • JSON Format
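
When loading code annotations, a quick shape check can catch codes that were mistyped or assigned to the wrong system. The following Python sketch is an illustration under stated assumptions: the function name `guess_code_system` is hypothetical, and the patterns test only the general surface form of ICD-10-CM and CPT codes, not whether a code actually exists in the official code sets.

```python
import re

# Surface shape only: letter, digit, alphanumeric, then an optional
# decimal point followed by up to four alphanumeric characters.
ICD_10_CM_SHAPE = re.compile(r"[A-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?")

# Surface shape only: five characters, four digits plus a digit or letter.
CPT_SHAPE = re.compile(r"\d{4}[0-9A-Z]")

def guess_code_system(code):
    """Return "ICD-10-CM", "CPT", or None based on the code's shape alone."""
    if ICD_10_CM_SHAPE.fullmatch(code):
        return "ICD-10-CM"
    if CPT_SHAPE.fullmatch(code):
        return "CPT"
    return None

if __name__ == "__main__":
    for code in ["I10", "K75.9", "99213", "hepatitis"]:
        print(code, guess_code_system(code))
```

A check like this is only a first filter; validating against the published ICD-10-CM and CPT code lists is still needed before the annotations are used for automated coding.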

POS Tagged Data

A dataset for NLP development featuring part‑of‑speech (POS) tagging applied to medical text, designed to support medical language processing and Document AI applications.

  • Annotations applied to more than 20,000 medical text samples
  • Includes detailed part‑of‑speech (POS) annotation data
  • JSON Format