ScoreTracker: When NIH wants an impact assessment

January 8th, 2010 Yannick

Here’s an interesting offshoot of the ResearchScorecard database: NIH ScoreTracker, a data collection tool intended to help research institutions assess the impact of NIH-sponsored research.

In the specific case at hand, one of our clients needed to identify which grant proposals and publications benefited from services partly paid by funds from the NIH’s Centers for Clinical and Translational Science Awards (CTSA) program. For interesting social reasons, universities have historically been rather poor at tracking the impact of a funder’s support, even one as important as NIH.

For university service providers, this is an especially troublesome problem. They often serve many parts of an institution’s research effort but can’t easily point to specific contributions, such as statistical consulting that eventually helps land a grant proposal.

This gets especially interesting for CTSA grant recipients, because NIH requires them to demonstrate that the funds are indeed impacting the biomedical research process, which is the program’s goal. Problem is, how does one do that? NIH doesn’t provide specific metrics, so it is up to grantees to identify them, assuming they have the necessary data. In my experience, this is simply not the case, and yet the grantees must satisfy NIH’s very reasonable request in some way. Interesting, no? Well, OK, perhaps only if you’re a data geek.

Our solution has been to develop a tool that automatically e-mails a group of researchers a URL to a survey that is pre-populated from the ResearchScorecard database. NIH ScoreTracker makes it quick and easy for busy researchers to identify which publications and grants benefited from an institutional service, such as the services of a statistician paid by CTSA funds. It does so by boiling everything down to a Y/N format: the researcher never has to enter anything else (see screenshots).

Collecting impact for publications

Collecting impact for grants

It doesn’t get any simpler than that, and this is also true for the administrators seeking the data. All they need to do is identify the group of researchers they want to contact, and NIH ScoreTracker will automatically e-mail each researcher until they take the survey. In the absence of a response, NIH ScoreTracker will eventually conclude that an answer is not forthcoming, at which point the administrators are contacted so that they can use their charms to remedy the problem. This is all done automatically.
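The reminder-and-escalation loop described above can be sketched as a small per-researcher state machine. This is a minimal sketch under stated assumptions, not ScoreTracker’s actual implementation: the weekly reminder interval, the cap of three reminders, and the names `Invitee`, `next_action` and `run_day` are all hypothetical, and the actual e-mail sending is left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical policy values: the post does not say how often reminders
# go out or after how many reminders ScoreTracker gives up.
REMINDER_INTERVAL = timedelta(days=7)
MAX_REMINDERS = 3


@dataclass
class Invitee:
    email: str
    responded: bool = False
    reminders_sent: int = 0
    last_contacted: Optional[date] = None
    escalated: bool = False


def next_action(invitee: Invitee, today: date) -> str:
    """Decide what to do for one researcher on a given day."""
    if invitee.responded or invitee.escalated:
        return "none"
    if invitee.last_contacted is None:
        return "send_invitation"
    if today - invitee.last_contacted < REMINDER_INTERVAL:
        return "wait"
    if invitee.reminders_sent < MAX_REMINDERS:
        return "send_reminder"
    # Reminders exhausted: hand the case over to the administrators.
    return "escalate_to_admin"


def run_day(invitees: list[Invitee], today: date) -> dict[str, str]:
    """Apply next_action to every invitee and update their state."""
    actions = {}
    for inv in invitees:
        action = next_action(inv, today)
        if action in ("send_invitation", "send_reminder"):
            # A real system would e-mail the survey URL here.
            if action == "send_reminder":
                inv.reminders_sent += 1
            inv.last_contacted = today
        elif action == "escalate_to_admin":
            inv.escalated = True
        actions[inv.email] = action
    return actions
```

Calling `run_day` once a day (say, from a scheduled job) is enough to drive the whole workflow; keeping the decision in a pure `next_action` function makes the policy easy to test separately from the mailing.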

Beyond tracking successful grants and publications, which aren’t terribly informative measures of the impact of research upon medical practice, more innovative approaches can be envisioned, such as counting literature mentions of techniques and products derived from an institution’s research. For example, much of microarray technology was invented at Stanford University. ResearchScorecard’s database can answer questions such as “what is the growth of mentions of microarray-based approaches in the literature?”, which can serve as a proxy for estimating the impact of Stanford’s work on biomedical research, where impact means “changing the way research is done”. A more involved version of this analysis would estimate the dollars spent on the technology over the years to generate an ROI figure. Watch this space for future tools of this nature.
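Both measures mentioned above reduce to simple arithmetic once mention counts and spending figures are in hand. The sketch below is illustrative only: `yearly_growth` and `mentions_per_million` are hypothetical names, the numbers are invented, and using mentions per million dollars spent as a stand-in for ROI is our assumption, since a true ROI figure would need a dollar value on the return side.

```python
def yearly_growth(mentions_by_year: dict[int, int]) -> dict[int, float]:
    """Fractional change in mention counts versus the previous year on record."""
    years = sorted(mentions_by_year)
    growth = {}
    for prev, cur in zip(years, years[1:]):
        # Skip years whose predecessor had zero mentions (growth is undefined).
        if mentions_by_year[prev] > 0:
            growth[cur] = mentions_by_year[cur] / mentions_by_year[prev] - 1.0
    return growth


def mentions_per_million(mentions_by_year: dict[int, int],
                         spend_by_year: dict[int, float]) -> float:
    """Crude ROI stand-in: total literature mentions per $1M spent."""
    total_spend = sum(spend_by_year.values())
    if total_spend <= 0:
        raise ValueError("total spending must be positive")
    return sum(mentions_by_year.values()) / (total_spend / 1_000_000)


# Invented numbers, for illustration only.
mentions = {2006: 100, 2007: 150, 2008: 180}
spend = {2006: 2_000_000, 2007: 3_000_000}
print(yearly_growth(mentions))
print(mentions_per_million(mentions, spend))
```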

Categories: New functionality

New: Finding postdocs with specific profiles of expertise

December 27th, 2009 Yannick

Today we’re introducing our latest search tool, Find Postdoctoral Researcher. As you can guess, it’s about finding (biomedical) postdoctoral researchers.

ResearchScorecard's Find Postdoctoral Researcher tool

Why bother? Postdocs are perhaps the main engine of biomedical research, and contacting a postdoc is often essential to a project. Want to learn a technique? Discuss the suitability of an instrument? Compare notes on a set of results? Postdocs are ideal for all of these.

Problem is, there are very few ways of finding postdocs, especially if a refined list is desired, i.e., one that doesn’t return every postdoc at an institution.

This is where ResearchScorecard’s Find Postdoctoral Researcher comes in. With it, you can find postdocs based on:

  • their area of research expertise, based on publication record.
  • the research products they use. We’re particularly proud of that one.
  • a PubMed record (i.e., identifying which of the authors of a paper are postdocs).
  • their family name.

We’ll be adding an additional function shortly, namely, retrieving postdocs by institution. You’d think we would already have that one, but no.

The tool is in its beta phase, and you can let us know how useful you find it by answering the one-question survey at the bottom of the page that comes up once the search is finished. You can access the tool by going to Tools→Find Postdoctoral Researcher.

Currently, the main limitation is the difficulty of inferring who is a postdoc, so there are a lot of false negatives (folks who are postdocs but are not identified as such). I’m sure y’all will find plenty of other problems, so please let us know what they are!

Categories: New functionality

Expertise finding at DARPA

December 26th, 2009 Yannick

I like to follow DARPA’s work, which I view as tapping into the brains of really smart people and processes. This is why a current project of theirs caught my eye: “Tools for the Analysis of Social and Group Dynamics” (SB101-005).

The project involves “… providing advanced understanding to a non-expert”, which might be a nice, concise definition of knowledge management. It “… seeks (to develop) novel technologies that can be brought to bear upon the problem of analyzing the dynamics of a complex society and gaining insights into the potential effects of policies, plans, and courses of actions in such environments.” In other words, software that would enable one to understand the social dynamics in a group, and perhaps to predict the likely outcome of hypothetical events. Neat, huh?

Specifically, “DARPA is interested in developing tools that can assist military planners and decision makers at any echelon by providing them with the kind of insights that high-level leaders currently have access to via human experts.” Note the emphasis on eliminating the current bottleneck: human experts.

The announcement continues by stating that a “… wide variety of technical approaches and solutions are envisioned and would be in scope, from networking technologies that can effectively put planners in touch with available experts; to modeling and simulation technologies that allow “what if” analysis; to innovative information storage and retrieval techniques for acquiring and applying lessons learned. What is essential is that the proposed technology provides a means for a reasonably intelligent non-expert to easily and effectively gain relevant expert-level insight into the potential behavior of a complex, dynamic society.” Of course, given my focus, I was pleased to see the recognition given to expertise finding as one option for addressing this goal.

As has become typical in recent years, DARPA is seeking practicality assessments, such as determining the availability of data and the difficulty of acquisition, as well as evaluating ease-of-use issues and the explainability of models in the case of modeling projects.

Of course, DARPA is concerned with understanding “… the cultural and political dynamics of complex societies”, rather than expertise finding for biomedical researchers, but hey, I find it fascinating that they believe this area is worth tackling, even under high-risk/high-benefit expectations. This is yet another example of the kind of smarts the US government regularly demonstrates but that, I suspect, most of the public rarely appreciates.

Finally, check out the references they provide; I found the Epstein book particularly interesting.

  • R. M. Axelrod, The Evolution of Cooperation, Basic Books, New York, 1994.
  • J. Epstein, Generative Social Science: Studies in Agent-Based Computational Modeling, Princeton University Press, 2006.
  • I. O. Lesser, “Coalition Dynamics in the War against Terrorism”, The International Spectator, Feb. 2002.
  • R. E. Neustadt & E. R. May, Thinking in Time: The Uses of History for Decision Makers, Free Press, New York, 1986.
  • P. Schrodt, “Forecasting Conflict in the Balkans Using Hidden Markov Models”, in R. Trappl, ed., Programming for Peace: Computer-Aided Methods for International Conflict Resolution and Prevention, pp. 161-184, Kluwer Academic Publishers, Dordrecht, Netherlands, 2006.

ARRA funding for multi-site biomedical expertise finding system

November 8th, 2009 Yannick

Led by the University of Florida, a consortium of universities and one institute has recently received American Recovery and Reinvestment Act (ARRA) funding to create a compendium of scientists that will cover seven American institutions.

Yup folks, unless you are an NIH employee involved in managing extramural funding, there is no comprehensive source of information about tax-funded bioresearchers accessible to scientists (or taxpayers), other than what is provided by individual funding agencies.

Even then, these portals are focused strictly on serving a limited spectrum of queries, and they are not integrated, so there is no practical way to answer basic questions such as “list all university scientists involved in bioresearch last year in the US”. Worse still, until very recently the ability of NIH (the biggest US biomedical funder) to deliver data about whom and what it funds was remarkably poor, though it has now made up for it, thanks to its excellent RePORTER tool.

As Judith Russell, dean of the University Libraries at UF, put it in the press release: “We think this will have a huge multiplier effect and will allow researchers to find new partners and other ways to use their research. For years, librarians have helped researchers find the information they need. This is another type of critical information scientists need.” Research relies on scientists working together, so any software that facilitates the social aspects of that process is an obvious, low-hanging-fruit way of enhancing research, especially since it doesn’t require splitting the atom, developing molecular biology or otherwise coming up with groundbreaking developments.

Given its importance, you have to wonder why it took so long to get started on building a comprehensive, “broad public” system. By way of explanation, bear in mind that many (most?) universities have a terrible time just figuring out who is doing research within their walls, suffering as they do from the kind of stove-piping and other information ills that brought us 9/11, among other notable failures. Then again, one could argue that Web 2.0 technologies and methods have only recently matured enough to support a comprehensive researcher portal, so perhaps the timing isn’t so far off. Or not: it could have been done five or even ten years ago, methinks.

Although one can fault the federal government for taking so long to get started, we should all note the wisdom and innovativeness of this project. As far as I know, no other country has embarked on something like this, and certainly no country as large as the US, with the largest research ecosystem in the world. To my mind, this is the kind of far-ranging project that is naturally suited for ARRA funding, and I’m thrilled to see it recognized as such. So hats off to our civil service friends: well done! If I’m right, it may serve as a powerful force enabler for American science in the years to come.

Now, back to nitpicking: one might wonder why NIH didn’t simply fund the National Library of Medicine (NLM) to do this work. It would fit very logically within NLM’s mission, and indeed would help NLM significantly, as exemplified by Thomson’s ResearcherID project, aimed at making it easier to disambiguate researchers and their contributions. Well, ARRA funds weren’t available to NLM, so perhaps it’s as simple as that. Still, I couldn’t help being surprised to find the University of Florida as the “prime” on this grant. I don’t know about you, but I certainly don’t think of UF when pondering “Semantic Web” and “expertise finding”. If anyone knows better, please do light my lantern.

On the technical side, the proposed compendium will use the VIVO system, a nice piece of software developed by the Cornell Libraries (there’s that library connection again). Better still, the software is open source, though I haven’t found a download site yet. More once I get my hands on it.

Why ResearchScorecard now links to LinkedIn

November 4th, 2009 Yannick

examining a researcher's LinkedIn network

We’ve recently added functionality that links our Researcher Profiles to public LinkedIn profiles.

Why bother, you might ask? The reasons are eloquently described in an interesting study by a group of researchers from academia, software companies and one of my favorite defense contractors, MITRE Corporation.

Among the major findings of Schleyer et al.’s (2008) study of the requirements for expertise location systems for biomedical scientists is the need to exploit “… others’ social networks when searching for collaborators”. In plain language, this just means that when considering a collaboration, people find it helpful to understand who is associated with the prospective collaborator, perhaps to determine whether a common contact could perform introductions, but also to get a sense of the person (kind of like in high school, where one is often judged by one’s crowd). Yes, biomedical researchers are just like everyone else when it comes to socialization.

In short, after perusing the professional and scientific aspects of a potential collaborator, you’ll now be able to jump to LinkedIn to figure out whether there is a contact known to you both who can tell you more about him/her. Neat, huh?

Of course, such “social networking inter-connection” is one thing LinkedIn does admirably well in the professional realm, and so it didn’t take much to convince us to enable our Researcher Profiles to show a link to an individual’s profile when it’s available. Note that you will need your own LinkedIn account to be able to examine someone else’s network.

Going back to the study, Schleyer et al. present ten major conclusions derived from interviews and a comprehensive literature review. The interviewees were from Carnegie Mellon University and the University of Pittsburgh. As with all expertise finding studies I know of, the results are retrospective only, since no scientist was actually observed in the process of seeking expertise. Though understandable, this limitation is unfortunate, given the relative inability of human subjects to recall and accurately describe their motivations and thought processes post facto.

Requirement identified by study | Our plain-language translation | What we’re doing about it
“The effort required to create and update an online profile should be commensurate with the perceived benefit of the system” | Scientists just don’t have the time to create and maintain their profiles… | Our Researcher Profiles are not populated by the researcher.
“Online profiles should (…) reduce the effort involved in making collaboration decisions” | The study states that information about a scientist is “…very fragmented and inhomogeneous”. In short, creating a robust profile requires lots of manual Web searching, and it is hard to assemble a comprehensive data set against which to judge a given data point as part of a distribution (the only way to really understand data). | Resolving this problem is one of ResearchScorecard’s main value-added features: very different data sets are brought together and harmonized, and statistical distributions are created and used to contextualize individual data points.
“Online profiles should be up-to-date” | Selecting a collaborator involves predicting aspects of that person’s professional future; leading indicators are preferred over trailing indicators. | ResearchScorecard is one of very few biomedical expertise systems that cover granting data, one of the “freshest” data sources describing current researcher activity. And of course, we include funding amounts, not just title and grant number, and we do so for multiple funders, even private ones.
“Researchers should be able to exploit their own and others’ social networks when searching for collaborators” | Scientists want to assess their potential collaborator’s “clique”. | Now available!
“The system should model proximity, which influences the potential success in several respects” | “Proximity” means physical proximity, social proximity (clique), organizational proximity, and closeness of research area between the two parties. | RSC provides unit affiliation and research area proximity for this purpose through its Collaborator Network report, though we could do a better job of showing physical proximity. Here’s an example report (takes a few minutes to compute).
“The system should facilitate the assessment of personal compatibility, similarity of work styles and other “soft” traits influencing collaborations” | Is the potential collaborator a nice person? Does he/she know how to collaborate? | We provide metrics of the number of collaborators over the years as a rough way to address this question.
“Social networks based on co-authorship may only partially describe a researcher’s collaborative network” | What about data from memberships in research consortia, clinical trials, etc., that are not always visible? | There is a lot here that we don’t address … yet. We do track co-PIships and are considering mining the acknowledgment sections of publications (see this 2004 paper for an example application).
“The system should account for researchers’ preferences regarding privacy and public availability of information about them” | This topic has many facets, but one elephant in the room is the desire of some researchers not to attract attention, for any number of reasons… | We at ResearchScorecard believe that if a researcher works at a research institution that receives public funding, there are no strong reasons to exclude aspects of a professional persona from the profile if the underlying data are already publicly visible.
“The system should provide methods to search effectively across disciplines” | Biomedical research is vastly more cross-disciplinary than even ten years ago. Witness discoveries that rely on instruments heavily dependent upon physics, chemistry, computer science, engineering, etc. This dependency on other disciplines is likely to keep increasing. | This requirement is why we are investigating the merging of expertise data with data from compound analysis systems such as CDD (see our recent blog post).
“The system should help make “non-intuitive” connections between researchers” | Finding potential collaborators who look like you: easy. Finding potential collaborators you should consider yet who don’t look like you: hard. | This requirement is related to cross-disciplinary searching, though there are plenty of potential collaborators in proximal fields as well. A software system that made non-intuitive yet useful recommendations would be very valuable, as long as the recipients have confidence in the recommendations. Unfortunately, in our experience, the more non-intuitive the recommendation, the less confidence recipients have in it…
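Two of the responses in the table, placing an individual data point within a distribution and counting collaborators over the years, boil down to simple computations. The sketch below shows one plausible version of each; it is not ResearchScorecard’s code, the function names and toy author lists are hypothetical, and it assumes author names are already disambiguated, which in practice is the hard part.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def percentile_rank(value: float, population: list[float]) -> float:
    """Percentage of the population below `value`, counting ties as half."""
    if not population:
        raise ValueError("population must not be empty")
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def collaborators_per_year(papers: list[tuple[int, list[str]]],
                           researcher: str) -> dict[int, int]:
    """Count distinct co-authors of `researcher` in each publication year.

    Assumes author names are already disambiguated (a big assumption).
    """
    coauthors: dict[int, set[str]] = defaultdict(set)
    for year, authors in papers:
        if researcher in authors:
            coauthors[year].update(a for a in authors if a != researcher)
    return {year: len(names) for year, names in sorted(coauthors.items())}


# Toy data, for illustration only.
papers = [
    (2008, ["Smith", "Lee", "Chan"]),
    (2008, ["Lee", "Smith"]),
    (2009, ["Smith", "Patel"]),
    (2009, ["Kim", "Ng"]),
]
print(collaborators_per_year(papers, "Smith"))  # {2008: 2, 2009: 1}
print(percentile_rank(3, [1, 2, 3, 4]))  # 62.5
```

Counting ties as half keeps the rank symmetric, so a value sitting exactly at the median of the population lands at the 50th percentile.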

Combining expertise data with “compound annotation”

October 28th, 2009 Yannick

As mentioned earlier, I’ll be presenting at the 2010 National Meeting of the American Chemical Society. You can find the abstract of my talk here. In brief, it involves applying expertise mining to what Christopher Lipinski calls “compound annotation”. This process, described by Dr. Lipinski at the latest Community Meeting held by Collaborative Drug Discovery (CDD), involves examining the results of compound screens in the light of what the literature has to say about a hit or highly related analogs of the compound. The goal is to determine whether a hit is spurious or otherwise chemically undesirable, problems that plague screens and yet, according to him, could be substantially alleviated by considering the literature.

Specifically, Dr. Lipinski believes that one should be suspicious of hits involving compounds for which there are no published data beyond simple hits. This is because the vast majority of screens are done with compounds that have been available for a long time. Combined with his belief that compound space isn’t that large when it comes to biological activity, these notions mean that one should beware of a hit involving an uncharacterized compound that is also dissimilar to other, better-characterized compounds. I hasten to add that these are Dr. Lipinski’s notions, but hey, remember who he is, namely, one of the most successful medicinal chemists ever. True, pharmaceutical companies do have lots of screening data that have never been published, but the recent growth of publicly accessible screening data should significantly increase the likelihood that biological activity data will be visible outside of corporate databases if true activity exists.
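Read literally, the heuristic above combines two tests: is there any published characterization of the compound, and is it similar to anything that is better characterized? The sketch below is our own minimal rendering of that idea, not Dr. Lipinski’s or CDD’s method. It assumes fingerprints are available as sets of “on” bit positions (as produced by a cheminformatics toolkit), uses Tanimoto similarity, and picks a 0.7 cutoff purely for illustration; `is_suspicious_hit` is a hypothetical name.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_suspicious_hit(hit_fp: frozenset[int],
                      characterizing_papers: int,
                      characterized_fps: list[frozenset[int]],
                      threshold: float = 0.7) -> bool:
    """Flag a hit that has no published data beyond screening hits AND is
    dissimilar to every better-characterized compound.

    `characterizing_papers` counts publications with data beyond a simple hit;
    `threshold` is an illustrative cutoff, not a recommended value.
    """
    if characterizing_papers > 0:
        return False
    best = max((tanimoto(hit_fp, fp) for fp in characterized_fps), default=0.0)
    return best < threshold


# Toy fingerprints, for illustration only.
hit = frozenset({1, 2, 3, 4})
known = [frozenset({1, 2, 3, 5}), frozenset({10, 11})]
print(is_suspicious_hit(hit, characterizing_papers=0, characterized_fps=known))
```

Here the closest known compound shares 3 of 5 distinct bits (similarity 0.6), which falls below the illustrative cutoff, so the hit would be flagged for a closer look.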

Now, it occurred to me that this compound annotation process could be improved by combining it with data from expertise mining. This would enable one to address obvious needs such as calibrating one’s understanding of published results according to the author’s experience in the field, the specific compounds in question, the techniques involved, etc. I intend to use CDD’s data mining environment as a backdrop for this effort, as it does a great job of bringing screening data and literature annotations together. Stay tuned.

Categories: Analysis, Events

Upshot from 3rd CDD Community Meeting

October 6th, 2009 Yannick

Collaborative Drug Discovery (CDD) held its third Community Meeting at The J. David Gladstone Institutes at UCSF’s Mission Bay campus last week. This well-attended meeting brought together an unusual mix of biomedical researchers and assorted computational types to mingle with foundations, biotechs and pharmas, all to discuss how CDD’s unique technology has helped them tackle orphan or under-studied diseases such as malaria and tuberculosis.

Because facilitating collaborative research is one of ResearchScorecard’s core goals, I made a point of attending, as well as presenting a poster. In brief, our message was simply that our tools can help identify and assess potential collaborators, something that is especially important in the realm of orphan diseases, where researchers may not be well connected to better-represented areas of research.

Interestingly for me, another expertise location system was also presented in poster form by Dr. Kate Marusina from UC Davis’ Clinical and Translational Science Center. Specifically, she and her colleagues have been developing a CTSA Pharmaceutical Assets Portal using a data mining-intensive approach similar to ResearchScorecard’s. Their goal is to help “… forge relationships with the pharmaceutical/biotech industry with the intent to facilitate the transfer of the investigational drugs and biologics for academic research”. This, of course, is a variant of ResearchScorecard’s goal, and I was thrilled to discover the interesting way they’ve been going at it. Going forward, you can expect to see their influence upon ResearchScorecard (emulation being the sincerest form of flattery, you know). More on their work in a subsequent post.

Now, if you’re not familiar with CDD, you would do well to check them out. At its simplest, they have married Web 2.0 groupware functionality with traditional compound screening capabilities such as registration, data analysis and visualization, and protocol management. Among the most impressive features of CDD’s tool are the exceptional flexibility, ease of use and speed it offers over tools from traditional vendors such as MDL. It has to be seen to be believed, and CDD’s CEO, Dr. Barry Bunin, gave a mind-blowing demo of the improvements they’ve made over the last year. OK, it’s mind-blowing only if you’re a software geek, but hey, I know for a fact that the kinds of operations performed in seconds by CDD used to take anywhere from minutes to days, if they were possible at all.

So how do they do it? Their software is written in Ruby following an agile methodology, complete with unit tests for all components and automatically generated builds, thus enabling rapid progress. Having worked in the scientific software field, I can tell you there are very few companies that follow such a rigorous process. The upshot is that not only is CDD’s software remarkably bug-free, but their productivity appears excellent. That’s a hard combination to achieve in any field, scientific or otherwise.

Consequently, I’ve now officially joined the ranks of believers in Ruby on Rails (RR), at least when it comes to scientific software. Why? Because building such software has traditionally lagged way behind experimentation, creating constant frustration for the biologists and chemists who depend on the software keeping up with them. As far as I can tell, RR is as big a step up in facilitating the development of bio- and chemoinformatics software as Perl was in the early ’90s, namely, a 10X improvement over writing software in the standard language and approach of the day: endless compilations of programs written in C, ouch. Within certain parameters, I believe CDD has proven that Ruby is well suited for even compute-heavy scientific applications.

Categories: Events

The latest on ARRA grants

September 28th, 2009 Yannick

Continuing the analysis of ARRA grantees we began back in July, below are some of the features of the latest batch of ARRA awards. As of early September, over $108M had been allocated to 361 researchers at the six institutions we currently cover.

Table 1
ARRA funding properties for Sept 2009 for institutions covered by ResearchScorecard
First, the distribution of funds to the six institutions (Tables 1 and 2). As with the July awards, UCLA received the largest total amount of ARRA funds, and the Scripps Research Institute (La Jolla) still retains the highest per capita financing, with $410K per principal investigator (PI) vs. $343K for UCLA. In contrast, Stanford and Caltech switch places, with Caltech now in third place and Stanford dropping to fifth, again on a per-PI basis. UCSD comes in last, as before.
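The per-PI comparison is simply total award dollars divided by the number of recipient PIs. As a quick sanity check, here it is applied to the aggregate figures quoted at the top of this post (over $108M across 361 researchers); since $108M is a lower bound, the result is approximate, and the per-institution inputs from Table 1 are not reproduced here. The function name is hypothetical.

```python
def funding_per_pi(total_dollars: float, num_pis: int) -> float:
    """Average ARRA dollars per principal investigator."""
    if num_pis <= 0:
        raise ValueError("num_pis must be positive")
    return total_dollars / num_pis


overall = funding_per_pi(108_000_000, 361)
print(f"${overall / 1000:.0f}K per PI")  # prints "$299K per PI"
```

That six-institution average of roughly $299K sits below both the Scripps and UCLA figures.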
Figure 1 Institutional affiliation of multiple ARRA award recipients for institutions covered by ResearchScorecard

As with the July crop, UCLA still comes in first with the largest number of grants awarded per recipient PI (Fig 1). Scripps makes a good showing with a strong second place, whereas Stanford drops one notch to third place.

Table 2
Research topics of ARRA recipients for Sept 2009 for institutions covered by ResearchScorecard

Table 2 lists the specific primary research topics of the recipient PIs. As with the July data, topics associated with gene expression analysis, immunology, neurobiology and computational biology are strongly represented.

Categories: Analysis

Help ResearchScorecard: Buy these great books

September 2nd, 2009 Yannick

If you like ResearchScorecard, you may be interested in the books that have helped us design and code the site.

Our new “We recommend…” section lists the books that have done just that for us, all available from Amazon. These great titles pertain to everything from web spider development to data mining to business intelligence, along with lots of other techy topics.

So hey, if…

  • you like our site
  • you like these books
  • you intend to buy one of them

…well, please consider doing so using the links in the recommendation section to make your purchase. You’ll pay exactly the same as you would if you had found the book on Amazon, and we’ll get a (small) cut.

Your purchase will help toward maintaining the free services we provide, and you’ll receive our undying gratitude to boot!

Categories: We recommend

Meet us at the CDD Community Meeting

August 30th, 2009 Yannick

I’ll be presenting a poster at the Third Annual CDD Community Meeting, October 1st at the J. David Gladstone Institutes in San Francisco, California.

Below are the title and abstract.

Using Expertise Finding To Make Better Scientific Partnering Decisions

ResearchScorecard is a scientific social networking site designed to greatly increase the quality of scientific partnering decisions. By applying a suite of data mining algorithms to a vast collection of descriptors of academic bioresearchers, we enable a much more robust assessment of potential collaborators, all with maximal ease of use. Unlike most social networking or people-finder sites, ResearchScorecard gathers its data automatically from public sources on the web, followed by heavy cleansing and semantic integration. Only objects of recognized credibility, such as peer-reviewed papers, grants and patents, are considered. This approach enables one of the unique features of the site: ranking researchers by their degree of domain expertise and overall prominence. Two use cases are described: searching for expertise, and monitoring for new grants awarded to one’s collaborators.

Categories: Events