What is it?
Precision Medicine Corpus (PMCorpus) is a public resource that provides a manually annotated corpus and related resources for information extraction in the biomedical domain. It is the first gold-standard corpus in the biomedical field in China. Based on the Precision Medicine Ontology, PMCorpus contains 6000 PubMed articles in six major fields, including cancer, cardiovascular and metabolic diseases. The corpus is an essential resource for biomedical text mining and provides basic support for semantic annotation, machine translation, knowledge correlation, data mining, intelligent retrieval and other functions.
How to use it?
PMCorpus contains two collections: raw documents in .txt format (UTF-8 encoding) and annotations in .ann format (text-based). Each document is named by its PubMed ID and publication year (e.g. 28581518_2017.txt).
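The naming convention above makes it straightforward to pair each raw document with its annotation file. The following is a minimal sketch, not part of the PMCorpus distribution. It assumes that the .txt and .ann files share the same base name and sit in two directories whose paths you supply. The helper names `parse_name` and `pair_documents` are hypothetical, and the internal layout of the .ann files is not described here, so the sketch does not parse them.

```python
import re
from pathlib import Path

# Assumed naming pattern: <PubMed ID>_<year>, e.g. 28581518_2017
NAME_PATTERN = re.compile(r"^(?P<pmid>\d+)_(?P<year>\d{4})$")


def parse_name(filename):
    """Split a file name such as '28581518_2017.txt' into (pmid, year)."""
    match = NAME_PATTERN.match(Path(filename).stem)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.group("pmid"), int(match.group("year"))


def pair_documents(txt_dir, ann_dir):
    """Yield (pmid, year, txt_path, ann_path) for documents that have both files.

    Assumption: an annotation file has the same base name as its document.
    """
    ann_dir = Path(ann_dir)
    for txt_path in sorted(Path(txt_dir).glob("*.txt")):
        ann_path = ann_dir / (txt_path.stem + ".ann")
        if ann_path.exists():
            pmid, year = parse_name(txt_path.name)
            yield pmid, year, txt_path, ann_path
```

Each yielded text path can then be read with `txt_path.read_text(encoding="utf-8")`, which matches the stated UTF-8 encoding of the raw documents.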
Institute
Institute of Medical Information & Library (中国医学科学院医学信息研究所)
Author
Xinying An (安新颖), Shaoping Fan (范少萍)
Support
Publication
Unpublished
Funding source
2016YFC0901902