PubMed, an online bibliographic database of life sciences and biomedical articles, currently comprises over 26 million indexed articles. This rapidly growing body of biomedical literature presents both an opportunity and a headache to researchers looking to extract critical information from a largely unstructured archive. Journal articles are published primarily as free text, in which unstructured language and non-standard vocabulary (and jargon) are used to refer to important biomedical concepts, making relevant information difficult to search for and access.

Enter Dr. Andrew Su and his team at The Scripps Research Institute.

Dr. Su’s lab is interested in improving the process of extracting information from biomedical research literature, a task also known as biocuration. They have developed a web-based application called Mark2Cure that enables volunteer citizen scientists to perform the crucial task of identifying key concepts, or named entities, in biomedical literature. In biomedical text, named entities might include genes, drugs, diseases and proteins. This “named entity recognition” (NER) procedure allows researchers to structure the knowledge so that it can be queried, making information more easily accessible.
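To make the idea of "structuring knowledge so it can be queried" concrete, here is a minimal Python sketch. It is not Mark2Cure's implementation: in Mark2Cure the entities are marked by human volunteers, whereas this sketch stands in for that step with a simple dictionary lookup. The lexicon, the function names tag_entities and build_index, and the two sample sentences are all hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon (surface form -> entity type); an assumption
# for illustration, not a resource used by Mark2Cure.
LEXICON = {
    "brca1": "gene",
    "tamoxifen": "drug",
    "breast cancer": "disease",
}


def tag_entities(text):
    """Return sorted (start, end, mention, type) spans found in the text."""
    spans = []
    for term, etype in LEXICON.items():
        # Whole-word, case-insensitive match of each lexicon term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), match.group(0), etype))
    return sorted(spans)


def build_index(docs):
    """Map (entity type, normalized mention) to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for _start, _end, mention, etype in tag_entities(text):
            index[(etype, mention.lower())].add(doc_id)
    return index


# Made-up sentences standing in for article abstracts.
docs = {
    "doc1": "Tamoxifen is used to treat breast cancer.",
    "doc2": "Mutations in BRCA1 raise the risk of breast cancer.",
}

index = build_index(docs)
# Query the structured index: which documents mention the disease "breast cancer"?
print(sorted(index[("disease", "breast cancer")]))
```

Once mentions are stored as typed spans rather than raw text, a question such as "which articles mention this disease?" becomes a simple lookup instead of a free-text search.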

In an article published in Citizen Science: Theory and Practice, the Su lab demonstrates how non-expert citizen scientists can perform NER tasks with high accuracy and, by doing so, contribute to the development of computational methods for improved information extraction.

Read Citizen Science for Mining the Biomedical Literature, or visit Mark2Cure.org for more information.