
Archive for the ‘Events’ Category

Combining expertise data with “compound annotation”

October 28th, 2009 Yannick No comments

As mentioned earlier, I’ll be presenting at the 2010 National Meeting of the American Chemical Society. You can find the abstract of my talk here. In brief, it involves applying expertise mining to what Christopher Lipinski calls “compound annotation”. This process, described by Dr. Lipinski at the latest Community Meeting held by Collaborative Drug Discovery (CDD), involves examining the results of compound screens in the light of what the literature has to say about a hit or highly related analogs of the compound. The goal is to determine whether a hit is spurious or otherwise chemically undesirable. According to him, these problems plague screens, yet could be substantially alleviated by considering the literature.

Specifically, Dr. Lipinski believes that one should be suspicious of hits involving compounds for which there are no published data beyond simple hits. This is because the vast majority of screens are done with compounds that have been available for a long time. Combined with his belief that compound space isn’t that large when it comes to biological activity, this means that one should beware of a hit involving an uncharacterized compound that is also dissimilar to other, better characterized compounds. I hasten to add that these are Dr. Lipinski’s notions, but hey, remember who he is, namely, one of the most successful medicinal chemists ever. True, pharmaceutical companies do have lots of screening data that have never been published, but the recent growth of publicly accessible screening data should significantly increase the likelihood that biological activity data will be visible outside of corporate databases if true activity exists.
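
To make the heuristic concrete, here is a minimal sketch of how such a check might look in code. It is my own illustration, not Dr. Lipinski’s method or anything from CDD: the `Compound` record, the `literature_reports` count, the use of Tanimoto similarity over plain feature sets, and the 0.5 threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    # Hypothetical structural-feature keys, standing in for a real fingerprint.
    features: frozenset
    # Assumed annotation: number of published reports beyond simple screening hits.
    literature_reports: int = 0


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_suspicious(hit, characterized, min_similarity=0.5):
    """Flag a hit that is uncharacterized AND dissimilar to every characterized compound."""
    if hit.literature_reports > 0:
        return False  # the literature already says something about this compound
    best = max(
        (tanimoto(hit.features, c.features) for c in characterized),
        default=0.0,
    )
    return best < min_similarity  # threshold is an illustrative assumption


# Toy data, invented purely for illustration.
known = [Compound("known_a", frozenset({"f1", "f2", "f3"}), literature_reports=12)]
analog = Compound("analog", frozenset({"f1", "f2", "f4"}))
orphan = Compound("orphan", frozenset({"f8", "f9"}))

for hit in (analog, orphan):
    print(hit.name, is_suspicious(hit, known))
```

In this toy data, the uncharacterized analog shares enough features with a well-studied compound to pass, while the orphan shares none and gets flagged. A real version would need actual chemical fingerprints and literature annotations, which is exactly where a tool that brings screening data and the literature together comes in.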

Now, it occurred to me that this compound annotation process could be improved by combining it with data from expertise mining. This would enable one to address obvious needs such as calibrating one’s understanding of published results according to the author’s experience in the field, the specific compounds in question, the techniques involved, etc. I intend to use CDD’s data mining environment as a backdrop for this effort, as it does a great job of bringing screening data and literature annotations together. Stay tuned.

Categories: Analysis, Events Tags:

Upshot from 3rd CDD Community Meeting

October 6th, 2009 Yannick 3 comments

Collaborative Drug Discovery (CDD) held its third Community Meeting at The J. David Gladstone Institutes at UCSF’s Mission Bay campus last week. This well-attended meeting brought together an unusual mix of biomedical researchers and assorted computational types to mingle with foundations, biotechs and pharmas, all to discuss how CDD’s unique technology has helped them tackle orphan or under-studied diseases such as malaria and tuberculosis.

Because facilitating collaborative research is one of ResearchScorecard’s core goals, I made a point of attending, as well as presenting a poster.  In brief, our message was simply that our tools can help identify and assess potential collaborators, something which is especially important in the realm of orphan diseases, where researchers may not be well-integrated with areas of research with greater representation.

Interestingly for me, another expertise location system was also presented in poster form by Dr. Kate Marusina from UC Davis’ Clinical and Translational Science Center. Specifically, she and her colleagues have been developing a CTSA Pharmaceutical Assets Portal using a data mining-intensive approach similar to ResearchScorecard’s.  Their goal is to help “… forge relationships with the pharmaceutical/biotech industry with the intent to facilitate the transfer of the investigational drugs and biologics for academic research”. This, of course, is a variant of ResearchScorecard’s goal, and I was thrilled to discover the interesting way they’ve been going at it. Going forward, you can expect to see their influence upon ResearchScorecard (emulation being the sincerest form of flattery, you know). More on their work in a subsequent blog.

Now, if you’re not familiar with CDD, you would do well to check them out. At its simplest, they have married Web 2.0 groupware functionality with traditional compound screening capabilities such as registration, data analysis and visualization, and protocol management. Some of the very impressive features of CDD’s tool are the exceptional flexibility, ease of use and speed they offer over tools from traditional vendors such as MDL and others. It has to be seen to be believed, and CDD’s CEO, Dr. Barry Bunin, gave a mind-blowing demo of the improvements they’ve made over the last year. OK, it’s mind-blowing only if you’re a software geek, but hey, I know for a fact that the kinds of operations performed in seconds by CDD used to take anywhere from minutes to days, if they were possible at all.

So how do they do it? Their software is written in the Ruby language following the agile methodology, complete with unit tests for all components and automatically generated “builds”, thus enabling rapid progress. Having worked in the scientific software field, I can tell you there are very few companies that follow such a rigorous process. The upshot is that not only is CDD’s software remarkably bug-free, but their productivity appears excellent. That’s a hard combination to achieve in any field, scientific or otherwise.

Consequently, I’ve now officially joined the ranks of believers in Ruby on Rails (RR), at least when it comes to scientific software. Why? Because building such software has traditionally lagged way behind experimentation, creating constant frustration for the biologists and chemists who depend on the software keeping up with them. As far as I can tell, RR is as big a step up in facilitating the development of bio- and chemoinformatics software as Perl was in the early ’90s, namely, a 10X improvement over writing software in the standard language and approach of the day: endless compilations of programs written in C, ouch. Within certain parameters, I believe CDD has proven that Ruby is well-suited for even compute-heavy scientific applications.

Categories: Events Tags:

Meet us at CDD Community Meeting

August 30th, 2009 Yannick No comments

I’ll be presenting a poster at the Third Annual CDD Community Meeting, October 1st at the J. David Gladstone Institutes in San Francisco, California.

Below are the title and abstract.

Using Expertise Finding To Make Better Scientific Partnering Decisions
ResearchScorecard.com is a scientific social networking site designed to greatly increase the quality of scientific partnering decisions. By applying a suite of data mining algorithms to a vast collection of descriptors of academic bioresearchers, we enable a much more robust assessment of potential collaborators, all with maximal ease of use. Unlike most social networking or people finder sites, ResearchScorecard’s data are gathered automatically from public sources on the web, followed by heavy cleansing and semantic integration. Only objects of recognized credibility, such as peer-reviewed papers, grants and patents, are considered. This approach enables one of the unique features of the site: ranking researchers based on their degree of domain expertise and overall prominence. Two use cases are described: searching for expertise, and monitoring for new grants awarded to one’s collaborators.

Categories: Events Tags:

ResearchScorecard presenting at ACS 2010

August 27th, 2009 Yannick 1 comment

Some news: I will be speaking at the American Chemical Society’s 239th National Meeting & Exposition, March 21-25 2010 in San Francisco, California.

Specifically, I will be speaking as part of the Future of Scholarly Communication symposium, which is hosted by the Chemical Information Division (CINF) of ACS.

As you can guess, the symposium is one of many such efforts at coming to grips with the potential of Web technologies to improve how researchers use the literature, whether in terms of access, comprehension, search specificity or machine usage. So far, I have to say we’re still very much in the “coming to grips” part of the problem, which I guess is to be expected, given how recent and how vast the possibilities offered by Web technologies are.

The CINF division consists of librarians, publishers, software and database vendors, and scientists in the fields of chemical information, chemical informatics, and drug discovery.

My preliminary title is “Putting the Researcher Forward: Expertise Databases For Better Navigation of the Literature”, and the talk will likely focus on use cases whereby a database of semantically robust metrics can facilitate the relationship of scientists with the literature. I’ll probably try to address cross-disciplinary issues as well, although that’s a tough one.

Categories: Events Tags: