Overview of the BioCreative V chemical disease relation (CDR) task

Title: Overview of the BioCreative V chemical disease relation (CDR) task
Publication Type: Conference Proceedings
Year of Conference: 2015
Authors: Wei C-H, Peng Y, Leaman R, Davis APeter, Mattingly CJ, Li J, Wiegers TC, Lu Z
Conference Name: Proceedings of the BioCreative V Workshop
Pagination: 154-166
Date Published: 2015
Abstract

Manually curating chemicals, diseases, and their relations is of significant importance to biomedical research but is hampered by its high cost and the rapid growth of the biomedical literature. In recent years, there has been growing interest in developing computational approaches for automatic chemical-disease relation (CDR) extraction, and a variety of techniques have been proposed. Despite these attempts, the lack of a comprehensive benchmarking dataset has limited the comparison of different techniques needed to assess and advance the current state of the art. To this end, we set up a challenge task through BioCreative V to automatically extract CDRs from the literature. More specifically, we designed two challenge tasks: disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. To assist system development and assessment, we created a large annotated text corpus that consists of human annotations of all chemicals, diseases, and their interactions in 1,500 PubMed articles. A total of 34 teams worldwide participated in the CDR task: 16 in the DNER task and 18 in the CID task. When the text-mined results were compared with the manually annotated ground truth, the best systems achieved an F-score of 86.46 for the DNER task, a result that approaches the human inter-annotator agreement (0.8875), and an F-score of 57.03 for the CID task, the highest results ever reported for such tasks. In addition to accuracy, another novel aspect of our evaluation is that we tested each participating system’s ability to return results in real time: the average response times for the teams’ DNER and CID systems were 5.6 and 9.3 seconds, respectively, via their web services. Given the level of participation and the team results, we find our task to be successful in engaging the text-mining research community, producing a large annotated corpus, and improving the results of automatic disease recognition and chemical-disease relation extraction.
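
The F-scores quoted in the abstract come from comparing system output against the manually annotated ground truth. As a rough illustration only, the sketch below computes precision, recall, and the balanced F-score over sets of relation tuples. It is not the official BioCreative V evaluation script: the function name `precision_recall_f1`, the tuple layout (document id, chemical id, disease id), and all identifiers in the toy data are assumptions made for this example.

```python
from typing import Set, Tuple

# Hypothetical representation of one relation:
# (document id, chemical id, disease id). The layout is an assumption.
Relation = Tuple[str, str, str]


def precision_recall_f1(
    predicted: Set[Relation], gold: Set[Relation]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted vs. gold relation sets."""
    true_positives = len(predicted & gold)  # relations found in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Toy data with placeholder identifiers (not taken from the corpus).
gold = {("doc1", "C1", "D1"), ("doc1", "C2", "D2"), ("doc2", "C3", "D3")}
predicted = {("doc1", "C1", "D1"), ("doc2", "C3", "D9")}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case one of two predictions is correct and one of three gold relations is recovered, so precision is 0.50 and recall is about 0.33. A score reported on a 0-100 scale, as in the abstract, would correspond to this value multiplied by 100.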

URL: https://biocreative.bioinformatics.udel.edu/media/store/files/2015/BC5CDRoverview.pdf