Archive for the ‘Analysis’ Category
As mentioned earlier, I’ll be presenting at the 2010 National Meeting of the American Chemical Society. You can find the abstract of my talk here. In brief, it involves applying expertise mining to what Christopher Lipinski calls “compound annotation”. This process, described by Dr. Lipinski at the latest Community Meeting held by Collaborative Drug Discovery (CDD), involves examining the results of compound screens in the light of what the literature has to say about a hit or highly related analogs of the compound. The goal is to determine whether a hit is spurious or otherwise chemically undesirable. According to him, these problems plague screens and yet could be substantially alleviated by considering the literature.
Specifically, Dr. Lipinski believes that one should be suspicious of hits involving compounds for which there are no published data beyond simple hits. This is because the vast majority of screens are done with compounds that have been available for a long time. Combined with his belief that compound space isn’t that large when it comes to biological activity, these notions mean that one should beware of a hit involving an uncharacterized compound that is also dissimilar to other, better characterized compounds. I hasten to add that these are Dr. Lipinski’s notions, but hey, remember who he is, namely, one of the most successful medicinal chemists ever. True, pharmaceutical companies do have lots of screening data that have never been published, but the recent growth of publicly accessible screening data should significantly increase the likelihood that biological activity data will be visible outside of corporate databases if true activity exists.
Now, it occurred to me that this compound annotation process could be improved by combining it with data from expertise mining. This would enable one to address obvious needs such as calibrating one’s understanding of published results according to the author’s experience in the field, the specific compounds in question, the techniques involved, etc. I intend to use CDD’s data mining environment as a backdrop for this effort, as it does a great job of bringing screening data and literature annotations together. Stay tuned.
Continuing our analysis of ARRA grantees initiated back in July, below are some of the features of the latest batch of ARRA awards. As of early September, over $108M were allocated to 361 researchers located at one of the six institutions we currently cover.
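The two totals above imply an overall average award per researcher. Here is a minimal Python sketch of that arithmetic. The function name is made up for illustration, and since the total is quoted as "over $108M", the result should be read as a rough lower bound rather than an exact figure.

```python
def average_award_per_researcher(total_dollars: float, researchers: int) -> float:
    """Return the mean award in dollars per researcher."""
    if researchers <= 0:
        raise ValueError("researchers must be a positive count")
    return total_dollars / researchers


# Figures quoted in the post: "over $108M" to 361 researchers.
# The total is treated as exactly $108M here, which is an assumption.
total_dollars = 108_000_000
researchers = 361

avg = average_award_per_researcher(total_dollars, researchers)
print(f"${avg / 1000:.0f}K per researcher (lower bound)")
```

That works out to roughly $299K per researcher across all six institutions.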

Table 1
First, the distribution of funds to the six universities (Tables 1 and 2). As with the July awards, UCLA received the largest total amount of ARRA funds, and the Scripps Research Institute (La Jolla) still retains the highest per capita financing, with $410K per principal investigator (PI) vs. $343K for UCLA. In contrast, Stanford and Caltech switch places, with Caltech now in third place and Stanford dropping to fifth rank, again on a per-PI basis. UCSD comes in last, as before.

Figure 1
As with the July crop, UCLA still comes in first with the largest number of grants awarded per recipient PI (Fig 1). Scripps makes a good showing with a strong second place, whereas Stanford drops one notch to third place.

Table 2
Table 2 lists the specific primary research topics of the recipient PIs. As with the July data, topics associated with gene expression analysis, immunology, neurobiology and computational biology are strongly represented.

Here at ResearchScorecard we love numbers, especially when it concerns understanding funding decisions and projected impact on future research and product usage. The first batch of ARRA awards has now been released and we’ve been analyzing where the dollars are going.
Table 1
We first looked at the distribution of funds to the six universities we cover currently (Tables 1 and 2).
Although UCLA received the largest amount, on a dollars-per-principal-investigator (PI) basis it ranks second, with a far smaller average than the number one per-capita recipient, The Scripps Research Institute in La Jolla, CA ($270K vs. $355K per PI).
Table 2
To explain this sizeable difference, I postulate that grants to Scripps researchers tend to be more clinical in nature, since clinical research is notoriously more expensive than non-clinical research. I confess I haven’t dug deep enough to confirm this, though.
Figure 1
UCLA also comes in first with the largest number of grants awarded per recipient PI (Fig 1). This doesn’t mean a whole lot, though, as UCLA is substantially larger than the other institutions in our database, and so you would expect them to score highest in this respect simply because they have more researchers. However, Caltech and Stanford are noteworthy for tying for second place in the number of awards, since both institutions are much smaller than the others.
I believe this reflects the outstanding average quality of the research performed at these institutions, as measured by scientific impact, grant funding per PI and other factors.
Figure 2
When ARRA awards are analyzed with respect to the primary research area of the recipients, researchers involved in immunology, computational biology and genetics scored best in securing ARRA funding (Fig 2).
Table 3
Table 3 lists the specific primary research topics of the PIs that make up the summary areas in Fig. 2, along with the matching funding. Oh, and by the way, this classification accounts for 68% of the funding, as we don’t have data on some of the recipients at this time (won’t last long, not to worry).
Figure 3
Another interesting analysis asks the question “does the funding go primarily to outstanding researchers?” The short answer is no. Correlating the distribution of awards with researcher ranking based on our GOPR metric, we find that recipients are broadly distributed, though there is a small cluster of recipients of multiple ARRA grants in the high percentile range (Fig. 3). Interestingly, the sole individual who received three grants doesn’t score very high.
Table 4
Finally, Table 4 looks at trademarked product usage for a dozen products that recipient researchers have mentioned in their papers since 2006.
I’m also including a comparison of usage between researchers whose GOPR was in the top 10% in 2008 (i.e., the 90th percentile and above) and all recipients. Compared with the 90th GOPR percentile group, products that are significantly under-utilized by recipients are listed in green, whereas products that are significantly over-used are listed in orange.
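For readers who want to try a similar comparison on their own data, here is a rough Python sketch of one way to flag products. It is not the method behind Table 4: the post does not say how "significantly" was determined, so the simple ratio threshold, the function name, and the placeholder products and usage rates below are all assumptions made for illustration.

```python
def classify_usage(all_rate: float, top_rate: float, ratio: float = 1.5) -> str:
    """Compare a product's usage rate among all recipients with the top-decile group.

    Rates are fractions of researchers mentioning the product (0.0 to 1.0).
    The 1.5x ratio threshold is an arbitrary illustrative choice, not a
    statistical significance test.
    """
    if all_rate * ratio < top_rate:
        return "under-utilized"  # would be shown in green
    if all_rate > top_rate * ratio:
        return "over-used"       # would be shown in orange
    return "similar"


# Hypothetical placeholder data: (rate among all recipients, rate among top 10%).
examples = {
    "Product A": (0.10, 0.30),
    "Product B": (0.40, 0.20),
    "Product C": (0.25, 0.22),
}

labels = {name: classify_usage(a, t) for name, (a, t) in examples.items()}
for name, label in labels.items():
    print(name, label)
```

With real counts, a proper test of proportions would be a better basis for the word "significantly" than a fixed ratio.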