Blog From the world of expertise finding...
 

Archive for the ‘From the world of expertise finding...’ Category

Expertise finding at DARPA

December 26th, 2009 Yannick No comments

I like to follow DARPA’s work, which I view as tapping into the brains of really smart people and processes. This is why a current project of theirs caught my eye: “Tools for the Analysis of Social and Group Dynamics” (SB101-005).

The project involves “… providing advanced understanding to a non-expert”, which might be a nice concise definition of knowledge management. It “… seeks (to develop) novel technologies that can be brought to bear upon the problem of analyzing the dynamics of a complex society and gaining insights into the potential effects of policies, plans, and courses of actions in such environments.” In other words, software that would enable one to understand the social dynamics in a group, and perhaps help predict the likely outcome of hypothetical events. Neat, huh?

Specifically, “DARPA is interested in developing tools that can assist military planners and decision makers at any echelon by providing them with the kind of insights that high-level leaders currently have access to via human experts.” Note the emphasis on eliminating the current bottleneck: human experts.

The announcement continues by stating that a “… wide variety of technical approaches and solutions are envisioned and would be in scope, from networking technologies that can effectively put planners in touch with available experts; to modeling and simulation technologies that allow “what if” analysis; to innovative information storage and retrieval techniques for acquiring and applying lessons learned. What is essential is that the proposed technology provides a means for a reasonably intelligent non-expert to easily and effectively gain relevant expert-level insight into the potential behavior of a complex, dynamic society.” Of course, given my focus, I was pleased to see the recognition given to expertise finding as one option for addressing this goal.

As has become typical in recent years, DARPA is seeking practicality assessments, such as determining the availability of data and the difficulty of acquisition, as well as evaluating ease-of-use issues and the explainability of models in the case of modeling projects.

Of course, DARPA is concerned with understanding “… the cultural and political dynamics of complex societies”, rather than expertise finding for biomedical researchers, but hey, I find it fascinating that they believe this area is worth tackling, even under high-risk/high-benefit expectations. This is yet another example of the kinds of smarts the US government regularly demonstrates, yet that are rarely appreciated by most of the public, I suspect.

Last, check out the interesting references they provide. I found the Epstein book particularly interesting.

  • R. M. Axelrod, The Evolution of Cooperation, Basic Books, Inc.: New York, 1994.
  • J. Epstein, Generative Social Science: Studies in Agent-Based Computational Modeling, Princeton University Press, 2006.
  • I.O. Lesser, Coalition Dynamics in the War against Terrorism, The International Spectator, Feb. 2002.
  • R.E. Neustadt & E.R. May, Thinking in Time: The Uses of History for Decision Makers, Free Press, New York, 1986.
  • P. Schrodt, Forecasting Conflict in the Balkans using Hidden Markov Models. Pp. 161-184 in Robert Trappl, ed. Programming for Peace: Computer-Aided Methods for International Conflict Resolution and Prevention. Dordrecht, Netherlands: Kluwer Academic Publishers, 2006.

ARRA funding for multi-site biomedical expertise finding system

November 8th, 2009 Yannick No comments

Led by the University of Florida, a consortium of universities and one institute has recently received ARRA funding to create a compendium of scientists that will cover seven American institutions.

Yup folks, unless you are an NIH employee involved in managing extra-mural funding, there is no comprehensive source of information about tax-funded bioresearchers accessible to scientists (or taxpayers), other than what is provided by individual funding agencies.

Even then, these portals are focused strictly on serving a limited spectrum of queries, and they are not integrated, such that there is no practical way to answer basic questions such as “list all university scientists involved in bioresearch last year in the US”. Worse still, until very recently the ability of NIH (the biggest US biomedical funder) to deliver data about whom and what it funds was remarkably poor, though it has now made up for it, thanks to its excellent RePORTER tool.

As the press release states: “We think this will have a huge multiplier effect and will allow researchers to find new partners and other ways to use their research,” said Judith Russell, dean of the University Libraries at UF. “For years, librarians have helped researchers find the information they need. This is another type of critical information scientists need.” Research relies on scientists working together, so any software that facilitates the social aspects of that process constitutes an obvious, low-hanging-fruit way of enhancing research, especially since it doesn’t involve having to figure out how to split the atom, develop molecular biology or otherwise come up with groundbreaking developments.

Given its importance, you have to wonder why it took so long to get started on building a comprehensive, “broad public” system. By way of explanation, bear in mind that many (most?) universities have a terrible time just figuring out who is researching within their walls, suffering as they do from the kind of stove-piping and other information ills that brought us 9/11 among other notable failures. Then again, one could argue that Web 2.0 technologies and methods have only recently reached sufficient maturity to consider a comprehensive researcher portal, so perhaps the timing isn’t so off. Or not: It could have been done five or even ten years ago, methinks.

Although one can fault the federal government for taking so long to get started, we should all note the wisdom and innovativeness of this project. As far as I know, no other country has embarked on something like this, and certainly no country as large as the US, with the largest research ecosystem in the world. To my mind, this is the kind of far-ranging project that is naturally suited for ARRA funding, and I’m thrilled to see it recognized as such. So hats off to our civil service friends: well done! If I’m right, it may serve as a powerful force enabler for American science in the years to come.

Now, back to nitpicking: One might wonder why NIH didn’t simply fund the National Library of Medicine to do this work. It would fit very logically within their mission, and indeed would help NLM significantly, as exemplified by Thomson’s ResearcherID project, aimed at making it easier to disambiguate researchers and their contributions. Well, ARRA funds weren’t available to NLM, so perhaps it’s as simple as that. Still, I couldn’t avoid being surprised at finding the University of Florida being the “prime” on this grant. I don’t know about you, but I certainly don’t think of UF when pondering “Semantic Web” and “expertise finding”. If anyone knows better, please do light my lantern.

On the technical side, the proposed compendium will use the VIVO system, a nice piece of software developed by the Cornell Libraries (there’s that library connection again). Interestingly, the software is Open Source to boot, though I haven’t found a download site yet. More once I get my hands on it.

Why ResearchScorecard now links to LinkedIn

November 4th, 2009 Yannick No comments
examining a researcher's LinkedIn network

We’ve recently added functionality that links our Researcher Profiles to public LinkedIn profiles.

Why bother, you might think? The reasons are eloquently described in an interesting study by a group of researchers in academia, software companies and one of my favorite defense contractors, MITRE Corporation.

Having researched the requirements for expertise location systems for biomedical scientists, Schleyer et al. (2008) report as one of their major findings the need to exploit “… others’ social networks when searching for collaborators”. In plain language, this just means that when considering a collaboration, people find it helpful to understand who is associated with the prospective collaborator, perhaps to determine whether a common contact could perform introductions, but also to get a sense of the person (kind of like in high school, where one is often judged by their crowd). Yes, biomedical researchers are just like everyone else when it comes to socialization.

In short, after perusing the professional and scientific aspects of a potential collaborator, you’ll now be able to jump to LinkedIn to figure out whether there is a contact known to you both that can tell you more about him/her. Neat, huh?

Of course, such “social networking inter-connection” is one thing LinkedIn does admirably well in the professional realm, and so it didn’t take much to convince us to enable our Researcher Profiles to show a link to an individual’s profile when it’s available. Note that you will need your own LinkedIn account to be able to examine someone else’s network.

Going back to the study, Schleyer et al. present ten major conclusions derived from interviews and a comprehensive literature review. The interviewees were from Carnegie Mellon University and the University of Pittsburgh. As with all expertise finding studies I know of, the results are retrospective only, since no scientist was actually observed in the process of seeking expertise. Though understandable, this limitation is unfortunate, given the relative inability of human subjects to recall and accurately describe their motivations and thought processes post facto.

| Requirements identified by study | Our plain language translation | What we’re doing about it |
|---|---|---|
| “The effort required to create and update an online profile should be commensurate with the perceived benefit of the system” | Scientists just don’t have the time to create and maintain their profile… | Our Researcher Profiles are not populated by the researcher. |
| “Online profiles should (…) reduce the effort involved in making collaboration decisions” | The study states that information about a scientist is “…very fragmented and inhomogeneous”. In short, creating a robust profile requires lots of manual Web searching, and still leaves one unable to construct a comprehensive data set by which to judge a given data point against a distribution (the only way to really understand data). | Resolving this problem is one of ResearchScorecard’s main value-added features: very different data sets are brought together and harmonized; statistical distributions are created and used to contextualize individual data points. |
| “Online profiles should be up-to-date” | Selecting a collaborator involves predicting aspects of the professional future of that person; leading indicators are preferred over trailing indicators. | ResearchScorecard is one of very few biomedical expertise systems that cover granting data, one of the “freshest” data sources to describe current researcher activity. And of course, we include funding amounts, not just title and grant number, and we do so for multiple funders, even private ones. |
| “Researchers should be able to exploit their own and others’ social networks when searching for collaborators” | Scientists want to assess their potential collaborator’s “clique”. | Now available! |
| “The system should model proximity, which influences the potential success in several respects” | “Proximity” = physical proximity, social proximity (clique), organizational proximity, and closeness of research area between the two parties. | RSC provides unit affiliation and research area proximity for this purpose through its Collaborator Network report, though we could do a better job of showing physical proximity. Here’s an example report (takes a few minutes to compute). |
| “The system should facilitate the assessment of personal compatibility, similarity of work styles and other “soft” traits influencing collaborations” | Is the potential collaborator a nice person? Does he/she know how to collaborate? | We provide metrics of the number of collaborators over the years as a rough way to address this question. |
| “Social networks based on co-authorship may only partially describe a researcher’s collaborative network” | What about data from memberships in research consortia, clinical trials, etc., that are not always visible? | There is a lot here that we don’t address … yet. We do track co-PIships and are considering mining the acknowledgment section of publications (see this 2004 paper for an example application). |
| “The system should account for researchers’ preferences regarding privacy and public availability of information about them” | This topic is replete with a plethora of aspects, but one elephant in the room is the desire from some researchers to not attract attention for any number of reasons… | We at ResearchScorecard believe that if a researcher works in a research institution that receives public funding, there are no strong reasons to exclude aspects of a professional persona from the profile if the underlying data are already publicly visible. |
| “The system should provide methods to search effectively across disciplines” | Biomedical research is vastly more cross-disciplinary than even ten years ago. Witness discoveries that rely on instruments that are heavily dependent upon physics, chemistry, computer science, engineering, etc. This dependency on other disciplines is likely to continue increasing. | This requirement is why we are investigating the merging of expertise data with data from compound analysis systems such as CDD (see our recent blog post). |
| “The system should help make “non-intuitive” connections between researchers” | Finding potential collaborators that look like you: easy. Finding potential collaborators that you should consider yet don’t look like you: hard. | This requirement is related to cross-disciplinary searching, though there are plenty of potential collaborators in proximal fields as well. For a software system to make non-intuitive yet useful recommendations would be very valuable, as long as the recipients have confidence in the recommendations. Unfortunately, it’s our experience that the more non-intuitive the recommendation, the less likely the recipients’ confidence in the recommendation… |
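To make the idea of judging a data point against a distribution a little more concrete, here is a minimal Python sketch of a percentile rank. It is only an illustration, not how ResearchScorecard actually computes anything: the function name, the grant figures and the half-weight treatment of ties are all assumptions made up for the example.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value: float, population: list[float]) -> float:
    """Percent of the population strictly below `value`, plus half of any ties."""
    if not population:
        raise ValueError("population must not be empty")
    ordered = sorted(population)
    below = bisect_left(ordered, value)            # entries strictly below value
    ties = bisect_right(ordered, value) - below    # entries equal to value
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical grant totals (thousands of dollars) for researchers in one field.
grant_totals = [120, 250, 250, 400, 900, 1500, 3200, 5000]

# Where does a researcher with 900 in funding sit within this made-up field?
rank = percentile_rank(900, grant_totals)
print(f"Percentile rank: {rank:.2f}")
```

A raw figure such as "900" says little by itself; the same number placed within its field's distribution tells you whether the researcher is typical or an outlier.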

Expertise finding at UCSD

August 22nd, 2009 ypouliot 2 comments

Calit2 is a general public, visually pleasing expertise finding system that incorporates now-classic Web 2.0 techniques such as tag clouds and AJAX. It is focused on researchers associated with the California Institute for Telecommunications and Information Technology, basically scientists at the University of California San Diego (UCSD).

Launched in 2007, it covers more than 650 researchers and does a nice job of being informative without being overwhelming. However, like all such academic systems that I know of, it doesn’t provide comparative information of the sort that would enable differentiating scientists in terms of research domains, degree of expertise, or scientific prominence (aka “research impact”).

For example, as of this writing it lists 16 scientists involved in bioinformatics. Great stuff, but you can’t easily distinguish their specific research areas within bioinformatics (yes, a tough task indeed), nor generate a report that would give you a sense of how influential each researcher is in terms of numbers of papers, grants and grant dollars, patents, etc.

As I said earlier, the latter limitation is typical of academic systems, presumably because their researchers might not appreciate being ranked in such an explicit manner, even though this is precisely what happens when they are evaluated for tenure, and to a lesser extent, when applying for funding. And of course, we’re just applying the GPA principle, though one could argue that one’s GPA isn’t being displayed for all to see and that the number is calculated when one has to join a school, admittedly not the case here…

This is one reason we provide a ranking of researchers using our GOPR score. Since we’re not beholden to any particular research institution, and although we’re trying to be sensitive about it, ResearchScorecard can afford to take the risk of annoying some of our fellow scientists, or at least as long as our ranking system makes sense and is operating correctly. I hasten to say that ranking scientists (or any professional) in a rigorous and fair manner is difficult indeed. We’re definitely not done here, and so we’re always very interested in hearing suggestions and comments from our, ahem, “research subjects”.

And if you would like to raise your ranking, it’s actually very straightforward to do so. All one has to do is to get more grants, publish more papers and obtain more patents, among other things. Hey, I didn’t say it would be easy…

Connecting folks based on their searches: The State Department’s iHarvest

August 7th, 2009 ypouliot 1 comment

Many sectors of American society like to dump on the federal government. I often disagree as to the pertinence of these criticisms. Rather, I frequently observe amazingly smart initiatives and accomplishments, close to miraculous given how large an organization we are talking about.

Here’s an example: Applying the principle of search motivation to connect individuals who may have valuable information to share. Called iHarvest, it is being developed for the Department of State so that government employees who are researching similar individuals can discover that others are doing the same. That very observation might be highly meaningful if one party has bits of information the others don’t.
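To illustrate the general principle (and only the principle: I have no knowledge of how iHarvest itself is built), here is a small Python sketch that pairs up analysts whose search terms overlap. The Jaccard similarity measure, the 0.3 threshold, the analyst names and the search terms are all assumptions chosen for the example.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of the combined distinct search terms that both analysts used."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_overlaps(searches: dict[str, set[str]], threshold: float = 0.3):
    """Return (analyst, analyst, score, shared terms) for pairs at or above threshold."""
    matches = []
    for (u, terms_u), (v, terms_v) in combinations(sorted(searches.items()), 2):
        score = jaccard(terms_u, terms_v)
        if score >= threshold:
            matches.append((u, v, round(score, 2), sorted(terms_u & terms_v)))
    # Strongest overlaps first, so the most promising introductions surface on top.
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Hypothetical search logs: analyst -> set of normalized search terms.
searches = {
    "analyst_a": {"person x", "city y", "shipping company z"},
    "analyst_b": {"person x", "shipping company z", "port q"},
    "analyst_c": {"visa policy", "trade statistics"},
}

overlaps = find_overlaps(searches)
for u, v, score, shared in overlaps:
    print(f"{u} and {v} overlap ({score}): {shared}")
```

A real system would presumably build much richer user models than a bag of search terms, but even this toy version shows how two people researching the same subject can be pointed at one another without either of them asking.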

Yes, there are all sorts of knowledge management issues here. E.g., what if no one has any “proprietary” information? Even so, there is value in having the parties come together to realize that they don’t know any more as a group than they do individually. Remember, the beginning of wisdom involves understanding the limits of one’s knowledge.

Now, where might you have heard of this business of using the search motivation to connect X with Y? Hmm, perhaps…Google! Yup, that’s the core of the Big G’s business model right there, now being applied for matters of security.

And oh by the way, this was brought to my attention by a monitoring agent of the government’s impressive FedBizOpps.gov repository of business opportunities, all for free, though you will likely need an account to access the link to the description of iHarvest.

Below I’ve highlighted the significant bit from the project description, just to spare you reading the required turgid governmentese:

The Department of State (DOS), Bureau of Diplomatic Security (DS) has an unusual and compelling need for immediate support for a unique iHarvest capability that leverages new information technology to automatically build user models based on analyst or operators activity and interests. This capability will automatically alert DS personnel to the fact of other individuals within DS that are conducting similar research or analysis and connect both parties. Additionally, the capability will support connections outside of DS with other interagency partners as DOS embraces a Whole of Government approach. In order to transform the enterprise of DS into an interagency compatible organization, there is an immediate need for greater data discovery among our intelligence partners and within DOS writ large, and this capability is an immediate first step to address this need. Particularly, as US Department of Defense forces reduce their presence in Iraq the DS agents immediately require an automated mechanism for sharing information amongst themselves and interagency partners. The capability will plug-in to DS existing situational awareness systems that support intuitive spatial interaction (Google Earth). Without this capability, the Departments ability to conduct diplomacy and business in high threat areas and around the world may be at risk which could affect the Departments mission. Further, it would impair the Departments ability to support national security requirements. Vital pieces of information that one individual is working with could go undiscovered by an office (or agency) that is involved with the same problem set. DS personnel are in danger at these high threat areas if the protective services personnel are not provided with this capability there exists a grave danger for their personal injury as well as injury to the individuals they are assigned to protect. 
The objective of this activity is to provide iHarvest integration research, design, development, integration, fielding and technical review support to the DS office: establish an alternate services and operations center for integration, operational testing, and evaluation. This information center will be an intricate part of a network of agencies where personnel conduct multi security level intelligence, law enforcement and counterterrorism operations.

Very cool, and great idea. My hat is off to the nameless bureaucrat(s) responsible for getting this off the ground. Who says government is necessarily lacking in imagination? Not I…

Categories: Interesting KM papers Tags:

POPS: Expertise location at NASA

June 27th, 2009 ypouliot 1 comment

Interesting case study of POPS produced by Clark & Parsia, a semantic web firm.

POPS is a NASA expertise location system which aims to “integrate NASA’s information about its nearly 70,000 combined civil service and contractor workforce in one place, linking the relevant, related information to form a comprehensive data service for staffers, workforce planners, analysts, and related personnel.”

POPS makes use of semantic Web technologies such as RDF to integrate data, which are delivered via jSpace, a visual query builder and Linked Data browser for SPARQL and other RDF query languages.
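For readers unfamiliar with why RDF helps with integration, here is a rough Python sketch of the underlying idea: once every source is expressed as subject-predicate-object triples, the sources can simply be pooled and queried together. This is a plain-Python stand-in, not RDF or SPARQL proper, and certainly not the POPS implementation; the identifiers, predicates and records are all invented for the example.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Hypothetical records from three separate sources, all expressed as triples.
hr_system = [("emp:42", "name", "J. Doe"), ("emp:77", "name", "A. Roe")]
skills_db = [
    ("emp:42", "hasSkill", "thermal analysis"),
    ("emp:77", "hasSkill", "thermal analysis"),
    ("emp:99", "hasSkill", "thermal analysis"),
]
projects = [("emp:42", "workedOn", "proj:1"), ("emp:77", "workedOn", "proj:1")]

# Integration is just pooling the triples: shared identifiers do the joining.
graph = hr_system + skills_db + projects


def colleagues_with_skill(graph, person, skill):
    """People who share a project with `person` and also hold `skill`."""
    projs = {o for _, _, o in match(graph, s=person, p="workedOn")}
    coworkers = {
        s for proj in projs for s, _, _ in match(graph, p="workedOn", o=proj)
    } - {person}
    skilled = {s for s, _, _ in match(graph, p="hasSkill", o=skill)}
    return sorted(coworkers & skilled)


result = colleagues_with_skill(graph, "emp:42", "thermal analysis")
print(result)
```

The query crosses the project and skills sources without any source-specific join logic, which is the appeal of the triple model; a query language like SPARQL expresses the same kind of pattern matching declaratively.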

I particularly like their social network visualizer and its ability to overlay skills on top of the familiar “who-has-worked-with-whom” network (fig. 2 in the white paper), though it does look like an awful lot of navigation may be required. I also wonder about how much detail can be overlaid onto the network. Still, very nice work.

POPS' Social Network plugin

Categories: Interesting KM papers Tags: