Named Entity Recognition (NER) in healthcare is a technique for detecting and classifying healthcare-specific entities, such as patient names and medical terms, in unstructured text. It improves the accuracy of data extraction from unstructured text, facilitates information retrieval, and strengthens more advanced AI systems built on top of it. Medical NER is therefore an essential technology for developing natural language AI in medical institutions.
TranSynk’s NER dataset is designed to help healthcare organizations extract critical information from unstructured data. It can reveal relationships across medical reports, insurance documents, patient reviews, clinical notes, and other data, making medical data more visible. We draw on advanced NLP expertise to handle complex custom annotation projects of any size.
1. Identification of Medical Named Entities
Medical records contain a vast amount of medical information, much of it in unstructured text that is difficult to identify, partly because of its specialized nature. Converting this unstructured content into a structured format requires named entity annotations dedicated to medical information.
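As a rough illustration of what a structured annotation can look like, the sketch below stores each entity as a character-offset span with a label. The `EntitySpan` class, the helper functions, the label names, and the sample sentence are all assumptions made for illustration and do not describe TranSynk's actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EntitySpan:
    """One annotated entity: character offsets into the source text plus a label."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical label name


def annotate(text: str, phrase: str, label: str) -> EntitySpan:
    """Locate the first occurrence of `phrase` and wrap it as an EntitySpan."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return EntitySpan(start, start + len(phrase), label)


def to_records(text: str, spans: list[EntitySpan]) -> list[dict]:
    """Convert spans into plain dict records, adding the covered text."""
    return [{**asdict(s), "text": text[s.start:s.end]} for s in spans]


# Hypothetical sample note and labels.
note = "Patient reports chest pain and was given aspirin."
spans = [
    annotate(note, "chest pain", "Symptom"),
    annotate(note, "aspirin", "Drug"),
]
records = to_records(note, spans)
```

Storing offsets rather than only the matched text keeps each annotation tied to an exact position in the record, which matters when the same term appears more than once.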
2.1 Attributes of Pharmaceutical Products
Most medical records contain information about drugs and their attributes that is important to clinical practice. We annotate these drug attributes accurately, following established guidelines.
2.2 Laboratory Data Attributes
Laboratory data contained in medical records often describe unique attributes. We follow established guidelines to identify these attributes and provide accurately annotated data.
2.3 Attributes of Physical Measurements
Physical measurements cover a variety of data, such as vital signs, and are recorded in the medical record along with their respective attributes. We can identify these physical measurement attributes and annotate or tag them appropriately.
3. Oncology-Specific NER
In addition to general medical named entity recognition (NER) annotations, we also support NER in highly specialized areas such as oncology and radiology. For oncology, we can provide datasets with the following NER annotations: Cancer Problem, Histology, Cancer Stage, TNM Stage, Cancer Grade, Dimension, Clinical Status, Tumor Marker Test, Cancer Medicine, Cancer Surgery, Radiation, Gene Studied, Variation Code, Body Site.
4. Adverse Effect NER and Relations
In addition to identifying and annotating major medical entities and their relationships, we also support annotating the relationships between adverse effects and the drugs (Drug) or procedures (Procedure) that caused them, as in the following examples.
- After chemotherapy [Procedure], the patient experienced nausea [Adverse Effect] and vomiting [Adverse Effect].
- The patient also has hepatitis [Adverse Effect] caused by Xeloda [Drug].
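The two annotated sentences above can be turned into entity and relation records with a short script. This is a minimal sketch under two stated assumptions: each `[Label]` tag applies to the single word directly before it, and every Drug or Procedure in a sentence is linked to every Adverse Effect in the same sentence. Real relation annotation is done by annotators following guidelines, not by this rule; the function names are hypothetical.

```python
import re

# Assumption: each "[Label]" tag applies to the single word directly before it.
TAGGED = re.compile(r"(\w+) \[([^\]]+)\]")

# Labels treated as possible causes of an adverse effect.
CAUSE_LABELS = {"Drug", "Procedure"}


def extract_entities(sentence: str) -> list[tuple[str, str]]:
    """Return (entity text, label) pairs found in an inline-tagged sentence."""
    return TAGGED.findall(sentence)


def adverse_effect_relations(sentence: str) -> list[tuple[str, str]]:
    """Pair every Drug/Procedure with every Adverse Effect in the same sentence."""
    entities = extract_entities(sentence)
    causes = [text for text, label in entities if label in CAUSE_LABELS]
    effects = [text for text, label in entities if label == "Adverse Effect"]
    return [(cause, effect) for cause in causes for effect in effects]


examples = [
    "After chemotherapy [Procedure], the patient experienced nausea "
    "[Adverse Effect] and vomiting [Adverse Effect].",
    "The patient also has hepatitis [Adverse Effect] caused by Xeloda [Drug].",
]
relations = [adverse_effect_relations(s) for s in examples]
# relations[0] -> [("chemotherapy", "nausea"), ("chemotherapy", "vomiting")]
# relations[1] -> [("Xeloda", "hepatitis")]
```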
5. Assertion Status
In addition to annotating medical entities and their relationships, we also classify the Status, Negation, and Subject associated with each entity. For example, medical history and family history are assigned to Status.
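One way to picture these three attributes is as fields attached to each annotated entity. The sketch below is illustrative only: the `Assertion` class, the helper function, the attribute values (such as "current"), and the sample entities are assumptions, not the labels used in the actual dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Assertion attributes attached to one annotated medical entity."""
    entity: str
    status: str    # e.g. "history" or "family history" (hypothetical values)
    negated: bool  # True when the text denies the finding
    subject: str   # who the finding applies to, e.g. "patient"


def describes_patient_now(a: Assertion) -> bool:
    """True only for findings that are affirmed, current, and about the patient."""
    return a.subject == "patient" and not a.negated and a.status == "current"


# Hypothetical annotated entities.
assertions = [
    Assertion("fever", status="current", negated=True, subject="patient"),
    Assertion("diabetes", status="history", negated=False, subject="patient"),
    Assertion("breast cancer", status="family history", negated=False,
              subject="family member"),
    Assertion("cough", status="current", negated=False, subject="patient"),
]
active = [a.entity for a in assertions if describes_patient_now(a)]
# active -> ["cough"]
```

Keeping these attributes separate from the entity label lets a downstream system distinguish a denied symptom or a relative's diagnosis from a condition the patient currently has.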


