Precision Medicine Corpus

What is it?

Precision Medicine Corpus (PMCorpus) is a public resource providing a manually annotated corpus and related resources for information extraction in the biomedical domain. It is the first gold-standard corpus in the biomedical field in China. Built on the Precision Medicine Ontology, PMCorpus contains 6000 PubMed articles in six major fields, including cancer, cardiovascular and metabolic diseases. The corpus is an essential resource for biomedical text mining and provides basic support for semantic annotation, machine translation, knowledge correlation, data mining, intelligent retrieval and other functions.

How to use it?

PMCorpus contains two collections: raw documents in .txt format (UTF-8 encoding) and annotations in .ann format (text-based). Each document is named with its PubMed ID and publication year (e.g. 28581518_2017.txt).
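The naming scheme above makes it straightforward to match each raw document with its annotation file. The following is a minimal sketch in Python, not an official loader. It assumes that the two collections sit in two local directories and that an annotation file shares the stem of its document (e.g. 28581518_2017.ann), which the description above suggests but does not state. The names `parse_name` and `pair_documents` are hypothetical, and the sketch does not interpret the contents of the .ann files.

```python
import re
from pathlib import Path

# Assumed file-name pattern: <PubMed ID>_<year>, as in 28581518_2017.txt
NAME_RE = re.compile(r"^(\d+)_(\d{4})$")


def parse_name(path):
    """Return (pubmed_id, year) parsed from a file name, or None if it does not match."""
    match = NAME_RE.match(Path(path).stem)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def pair_documents(txt_dir, ann_dir):
    """Yield (pubmed_id, year, txt_path, ann_path) for documents that have both files.

    Assumption: the annotation file has the same stem as the raw document.
    """
    for txt_path in sorted(Path(txt_dir).glob("*.txt")):
        parsed = parse_name(txt_path)
        if parsed is None:
            continue  # skip files that do not follow the naming scheme
        ann_path = Path(ann_dir) / (txt_path.stem + ".ann")
        if ann_path.exists():
            yield parsed[0], parsed[1], txt_path, ann_path
```

For example, `parse_name("28581518_2017.txt")` returns the PubMed ID as a string and the year as an integer. The raw text of a paired document can then be read with `txt_path.read_text(encoding="utf-8")`, matching the UTF-8 encoding stated above.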


Institute of Medical Information & Library, Chinese Academy of Medical Sciences (中国医学科学院医学信息研究所)


Xinying An (安新颖), Shaoping Fan (范少萍)



Funding source
