Combining expertise data with “compound annotation”
As mentioned earlier, I’ll be presenting at the 2010 National Meeting of the American Chemical Society. You can find the abstract of my talk here. In brief, it involves applying expertise mining to what Christopher Lipinski calls “compound annotation”. This process, described by Dr. Lipinski at the latest Community Meeting held by Collaborative Drug Discovery (CDD), involves examining the results of compound screens in light of what the literature has to say about a hit or about highly related analogs of the compound. The goal is to determine whether a hit is spurious or otherwise chemically undesirable. According to him, these problems plague screens and yet could be substantially alleviated by considering the literature.
Specifically, Dr. Lipinski believes that one should be suspicious of hits involving compounds for which there are no published data beyond simple hits. This is because the vast majority of screens are done with compounds that have been available for a long time, so a compound with genuine activity has had ample opportunity to show up in the literature. Combined with his belief that compound space isn’t that large when it comes to biological activity, these notions mean that one should beware of a hit involving an uncharacterized compound that is also dissimilar to other, better characterized compounds. I hasten to add that these are Dr. Lipinski’s notions, but hey, remember who he is, namely, one of the most successful medicinal chemists ever. True, pharmaceutical companies do have lots of screening data that have never been published, but the recent growth of publicly accessible screening data should significantly increase the likelihood that biological activity data will be visible outside of corporate databases if true activity exists.
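To make the rule of thumb above concrete, here is a minimal sketch in Python. It is my own illustration, not Dr. Lipinski’s method or anything CDD provides. The `Compound` record, the integer feature sets standing in for structural fingerprints, the `n_reports` count, and both thresholds are assumptions chosen purely for the example. Similarity is computed with the standard Tanimoto (Jaccard) coefficient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    # Hypothetical record; field names are illustrative assumptions.
    name: str
    fingerprint: frozenset  # stand-in for structural feature IDs
    n_reports: int          # published reports beyond bare screening hits


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_suspicious(hit, library, min_reports=1, min_similarity=0.7):
    """Flag a hit that is uncharacterized AND unlike any characterized compound.

    Both thresholds are arbitrary placeholders, not recommended values.
    """
    if hit.n_reports >= min_reports:
        return False  # the hit itself has literature behind it
    characterized = [
        c for c in library if c.n_reports >= min_reports and c is not hit
    ]
    # Similarity to the closest well-characterized compound (0.0 if none exist).
    best = max(
        (tanimoto(hit.fingerprint, c.fingerprint) for c in characterized),
        default=0.0,
    )
    return best < min_similarity


library = [
    Compound("known_a", frozenset({1, 2, 3, 4}), 12),
    Compound("known_b", frozenset({10, 11, 12}), 5),
]
analog = Compound("analog", frozenset({1, 2, 3, 4, 5}), 0)
orphan = Compound("orphan", frozenset({20, 21, 22}), 0)

print(is_suspicious(analog, library))  # False: close analog of known_a
print(is_suspicious(orphan, library))  # True: no data, no similar neighbors
```

In practice the fingerprints would come from a cheminformatics toolkit and the report counts from literature annotations; the point here is only the shape of the test: no published data and no well-characterized neighbors together raise a flag.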
Now, it occurred to me that this compound annotation process could be improved by combining it with data from expertise mining. This would enable one to address obvious needs such as calibrating one’s understanding of published results according to the author’s experience with the field, the specific compounds in question, the techniques involved, etc. I intend to use CDD’s data mining environment as a backdrop for this effort, as it does a great job of bringing screening data and literature annotations together. Stay tuned.
