Large-scale extraction of gene-level physiology from the bibliome
If you have performed an expression microarray experiment, or a genome-wide association study, or a high-throughput proteomic experiment, you will have had a librarian moment. In that moment, you wish that someone had organized everything ever written about the genes or segments of the genome that came up in your experiment as "significant," typically by some statistical measure. The alternative of having to read hundreds if not thousands of papers is unappealing. Because that librarian moment is so common in this genomic era, a slew of companies has emerged to provide systematic annotation linked to the literature. Their ambitions greatly surpass those of the ontologists, who are "merely" satisfied with a few labels for each gene regarding biological processes, functions and cellular locations; these companies seek to provide whole pathways of gene regulation and signaling.
For this reason, I was quite intrigued by a presentation I recently heard at the C-SHALS conference in Cambridge, MA by a bioinformatics group at Sanofi Aventis. They provided that all too rare and extremely valuable style of review in biomedical science: the consumer report format. That is, they compared several of the leading bibliome-based gene annotation packages and systematically reviewed the coverage and specificity of these competing wares. These products fell into two categories: those generated by human curation (Ingenuity and GeneGO) and those generated by automated means (Temis and Ariadne). From my perspective, the bottom line was that a) the coverage of all these packages, in both genes and processes, is spotty and remarkably non-overlapping and b) the human-driven packages were dramatically better in several dimensions. Another full-time Librarian Employment Act, if librarians take this challenge of annotation as their own.