BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Posts Tagged ‘Thomson Reuters’

Web Services Lite

Posted by Nick on November 26, 2009

When the Bibliosight project began back in June, Thomson Reuters’ new Web of Science Web Services had not yet been released, and we were very grateful to the company for giving us full access to their “general API”. After discussion with Thomson, we understood this to be an unrestricted version of WS Lite. However, having now subscribed to WS Lite, we find that it is in fact a different enough product to need another sizeable chunk of time to learn and implement, which is a little frustrating this close to the end of the project!

There is some consolation in that a number of components appear to be shared (the query format, for example), though Mike hasn’t yet had enough time with the documentation to fully digest all the similarities.

The resulting XML is also different but (we think) more useful, though for now that judgement rests on the documentation alone. The documentation is much more thorough, which should make life easier both for us and for others wanting to implement the service.

To register for WS Lite, users will need to review the Terms & Conditions at the following URL, which will take you to a registration form: http://science.thomsonreuters.com/info/terms-ws/

Posted in Bibliosight, Thomson Reuters Research Analytics

Visit from Thomson Reuters

Posted by Nick on October 2, 2009

On Wednesday afternoon Mike and I were finally able to sit down with Jon and…Gareth? (sorry, I’m terrible with names) from Thomson Reuters to discuss Bibliosight and the work we are doing with the WoS API. It probably goes without saying just how useful this was, especially so soon after our Tuesday meeting.

As we have come to appreciate, Thomson are still very much in the process of developing their suite of tools and commercial services around the extraction of data from WoS using their API. Overall, I was given the impression that the company are currently performing something of a balancing act, weighing their commercial interests against providing appropriate value-added services to their subscribers under existing licensing agreements – which is, of course, entirely reasonable. Jon suggested that the Bibliosight project is something of a pioneer in using this technology and a useful case study for the company, which certainly puts some of our early difficulties into context – though he did indicate that numerous other folk are also actively investigating the API; in particular he mentioned Queens College Belfast, an institution in Birmingham, and R4R at King’s College London in collaboration with EPrints’ Les Carr at Soton. R4R is the only one of these projects that I was previously aware of and have had any contact with; it would be really useful if we were able to communicate with others also using the API.

Thomson Reuters’ flagship commercial product is called InCites and “supplies all the data and tools you need to easily produce targeted, customized reports… all in one place. You can conduct in-depth analyses of your institution’s role in research, as well as produce focused snapshots that showcase particular aspects of research performance.” We discussed how, though such a service will be invaluable for the research-oriented Russell Group institutions, it is likely to be overkill for a Million+ institution like Leeds Met; nevertheless, we do require a certain level of functionality to help us analyse our research performance which, alongside our traditional strengths in teaching and learning, is increasingly important, especially in view of the REF. Hopefully this is where the developing ‘suite of tools’ comes in, and our guests were keen to get a handle on precisely what we are hoping to achieve with Bibliosight (aren’t we all!). I outlined our preliminary use cases for them as a foundation for our discussion and was also keen to ask some of the specific questions that had arisen during the previous day’s meeting. First of all, I asked about the wording of the documentation, which appears to suggest that it is only possible to return 100 records with a single query using the API. They weren’t aware of such an issue and agreed that the way it was expressed in the documentation was a little ambiguous; Jon will follow this up for us, though Mike may also be able to clarify the situation when he has investigated further. They did say, however, that another user had discovered that the API could be called twice every second, so they didn’t anticipate any problems with extracting all the data we need.
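To make the paging question concrete, here is a minimal Python sketch of harvesting a large result set in pages of at most 100 records while keeping to roughly two calls per second. It assumes both limits hold as described above (neither is confirmed yet), and `fetch_page(first_record, count)` is a hypothetical wrapper around a single API call rather than anything in Thomson’s documentation.

```python
import time
from typing import Callable, Iterator, List

MAX_RECORDS_PER_CALL = 100       # assumed per-query limit (documentation is ambiguous)
MIN_SECONDS_BETWEEN_CALLS = 0.5  # assumed rate: "twice every second"


def harvest_all(fetch_page: Callable[[int, int], List[dict]],
                page_size: int = MAX_RECORDS_PER_CALL,
                min_interval: float = MIN_SECONDS_BETWEEN_CALLS,
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.monotonic) -> Iterator[dict]:
    """Yield every record by requesting successive pages of at most page_size.

    fetch_page(first_record, count) is hypothetical: it should return fewer
    than `count` records once the result set is exhausted.
    """
    first = 1          # assume 1-based record offsets
    last_call = None
    while True:
        if last_call is not None:
            # Wait just long enough to respect the minimum interval between calls.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        page = fetch_page(first, page_size)
        yield from page
        if len(page) < page_size:  # a short (or empty) page means we are done
            break
        first += page_size
```

Passing `sleep` and `clock` in as parameters is only so the throttling can be tested without real delays; in practice the defaults would be used.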

The major issue that came up at the meeting on Tuesday was how best to return all of the articles for a given institution; the most appropriate field to query appears to be the address field. It is not clear, however, how consistent the institutional address actually is. Jon confirmed that it is derived from information harvested from individual journals/papers, which preliminary manual searching of WoS has already shown to be idiosyncratic – at least in the case of Leeds Metropolitan University, and almost certainly for other institutions as well (leeds metropolitan university; leeds met [uni]; lmu etc). Jon suggested that the safest and most effective method of returning all records would actually be to use ResearcherID, though this would require all institutional authors to be registered and an additional paid subscription to ResearcherID download (as opposed to upload, which is free). In lieu of this, however, he confirmed that the address field was the only way, and that it may be necessary to build a catch-all query to ensure that we don’t miss anything. Precisely how we achieve this is still an open question, though he did indicate that some work has been done on disambiguating institutional address formats within WoS and will follow up on this for us in due course.
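One possible shape for the catch-all query is sketched below in Python. It assumes the Web of Science `AD=` address field tag and the `SAME` operator (which, as we understand it, restricts terms to the same address); the variant list is illustrative only, and the ambiguous abbreviation “lmu” is only accepted alongside “leeds” so that other institutions with the same initials are not pulled in.

```python
from typing import Iterable


def address_query(variants: Iterable[str],
                  ambiguous: Iterable[str] = (),
                  qualifier: str = "leeds",
                  field_tag: str = "AD") -> str:
    """Build a catch-all address query from known spelling variants.

    Unambiguous variants are quoted as phrases and ORed together. Ambiguous
    abbreviations are only accepted in the SAME address as `qualifier`.
    """
    clauses = [f'"{v}"' for v in variants]
    clauses += [f"({a} SAME {qualifier})" for a in ambiguous]
    if not clauses:
        raise ValueError("no address variants supplied")
    return f"{field_tag}=({' OR '.join(clauses)})"
```

For example, `address_query(["leeds metropolitan university", "leeds met"], ambiguous=["lmu"])` returns `AD=("leeds metropolitan university" OR "leeds met" OR (lmu SAME leeds))`. New variants can simply be appended to the list as manual searching turns them up.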

Thanks to our discussion, Article Match Retrieval (AMR) is finally beginning to make sense to me, and Jon confirmed that this is the method that would be used in conjunction with the API to provide citation counts for an individual article. AMR can be queried by numerous fields, including DOI and UT identifier (a unique identifier for a journal article assigned by Thomson Reuters). In terms of the current project, I think it makes sense to focus on extracting bibliographic data first before worrying about citation metrics; via the API we can also extract the UT identifier and then use this to query AMR.
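The two-stage approach (bibliographic data first, citations later) could look something like this: collect the UT identifiers from harvested records, drop duplicates, and group them into batches for AMR lookups. The `ut` dictionary key and the batch size of 50 are assumptions for illustration; we have not yet confirmed how many items AMR accepts per request.

```python
from typing import Iterable, List, Mapping


def ut_batches(records: Iterable[Mapping[str, str]],
               batch_size: int = 50) -> List[List[str]]:
    """Collect unique UT identifiers (first-seen order) and split them into batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    seen = set()
    uts: List[str] = []
    for record in records:
        ut = record.get("ut")
        if ut and ut not in seen:  # skip records without a UT, and repeats
            seen.add(ut)
            uts.append(ut)
    return [uts[i:i + batch_size] for i in range(0, len(uts), batch_size)]
```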

We also touched on Terms & Conditions: Thomson, again reasonably, expect WoS to be clearly acknowledged as the data source on each individual record. Mike wasn’t initially certain how this could easily be achieved from a technical perspective, at least in the case of bibliographic citation information (which may have been added manually); we have a few ideas on how it could be done, but it is really just something to be aware of at this stage.
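As one idea for the acknowledgement requirement, each harvested record could be stamped with a Dublin Core `dc:source` element at import time, so that the acknowledgement survives even if other fields are later edited by hand. This is a minimal Python sketch; the acknowledgement wording and the use of `dc:source` are our assumptions, not Thomson’s stated requirement.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise the namespace with the "dc" prefix

# Assumed wording; the exact acknowledgement text would need agreeing with Thomson.
ACKNOWLEDGEMENT = "Data source: Thomson Reuters Web of Science"


def stamp_source(record: ET.Element, text: str = ACKNOWLEDGEMENT) -> ET.Element:
    """Append a dc:source child to a record unless an identical one is already there."""
    tag = f"{{{DC_NS}}}source"
    if not any(el.text == text for el in record.findall(tag)):
        ET.SubElement(record, tag).text = text
    return record
```

Making the stamp idempotent means records can be re-imported without accumulating duplicate acknowledgements.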

All in all, I now feel that the overall shape of the project is beginning to emerge. In addition to the technical work required to extract, store, parse and convert (XML) records and then pass them somewhere else (intraLibrary/EndNote), a large part of Bibliosight will necessarily focus on developing use cases for our institutional research administration, which is likely to continue well beyond the designated 6-month life-cycle of the #jiscri project!
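For the EndNote leg of that pipeline, one option is to write parsed records out as RIS, a plain-text format EndNote can import. The sketch below assumes records have already been parsed into Python dictionaries; the dictionary keys are hypothetical, and storing the UT in the RIS `AN` (accession number) tag is our choice rather than anything prescribed.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]

# Hypothetical record keys mapped to RIS tags; list-valued fields repeat the tag.
RIS_TAGS = [
    ("authors", "AU"),
    ("title", "TI"),
    ("source_title", "JO"),
    ("volume", "VL"),
    ("issue", "IS"),
    ("start_page", "SP"),
    ("end_page", "EP"),
    ("year", "PY"),
    ("keywords", "KW"),
    ("ut", "AN"),  # keep the WoS UT as the accession number
]


def to_ris(record: Dict[str, Value]) -> str:
    """Render one record as an RIS journal-article entry."""
    lines = ["TY  - JOUR"]
    for key, tag in RIS_TAGS:
        value = record.get(key)
        if not value:
            continue  # omit empty or missing fields
        values = value if isinstance(value, list) else [value]
        lines.extend(f"{tag}  - {v}" for v in values)
    lines.append("ER  - ")
    return "\n".join(lines) + "\n"
```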

Posted in Progress post, Research Excellence Framework, Thomson Reuters Research Analytics

More on ResearcherID

Posted by Nick on September 29, 2009

A quick search of my blog feeds turned up surprisingly little on ResearcherID – just 8 posts from my fairly populous, repository-oriented RSS aggregation, all from 2008. They include a post from June 2008 on the Wrap Repository blog, which emphasises that “it would be easier for the author if there were a universal unique identifier that could help us all to share information about the author in a more automated way”, and a post on the principles of citation-based evaluation from Overdue Ideas, in which @ostephens summarises a session by James Pringle from Thomson Reuters and voices concern about how the work being done by Thomson Reuters “joins up with activity in the sector, and by other organisations. How does ‘ResearcherID.com’ link to OCLC Identities work? It would be great to see some joined up thinking across the library/information sector on this, as otherwise we will end up with multiple methods of identification.”

So it seems that ResearcherID received a flurry of attention when it was released back in 2008 but is still just one potential solution to an ongoing issue. As I noted in a recent post, Open Research Online is using a unique University ID in their EPrints repository (though I need to do more reading, other solutions mooted in the blogosphere seem to be OpenID and OCLC Identities). I also searched http://www.researcherid.com/ for “Leeds Metropolitan University” and found just 4 of our researchers in the database…

Nevertheless, in terms of the Bibliosight project and the wider context of the Leeds Met repository, ResearcherID could well be an appropriate solution, and is certainly worth exploring further with the project team and with the URO. Just a very quick note on practicalities: batch upload to ResearcherID would require us to prepare a detailed XML document which, to my mark-up-phobic eye, looks decidedly non-trivial – it would need to comprise records for all Leeds Met researchers, of course. This is an example (view in IE or FF; Chrome will interpret the XML rather than show the mark-up).
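Generating the batch upload file programmatically from a staff list might make it less daunting than hand-writing it. The element names below (`researchers`, `researcher`, `first_name` and so on) are placeholders, not the actual ResearcherID upload schema, which would need to be taken from Thomson’s documentation; the point is only that the records can be built in a loop.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Placeholder element names: NOT the real ResearcherID upload schema.
FIELDS = ("first_name", "last_name", "email", "staff_id")


def build_upload(researchers: List[Dict[str, str]]) -> str:
    """Return an XML document with one <researcher> element per staff member."""
    root = ET.Element("researchers")
    for person in researchers:
        entry = ET.SubElement(root, "researcher")
        for name in FIELDS:
            # Missing values become empty elements rather than raising KeyError.
            ET.SubElement(entry, name).text = person.get(name, "")
    return ET.tostring(root, encoding="unicode")
```

In practice the list of dictionaries could come from a staff spreadsheet exported as CSV and read with `csv.DictReader`, provided the column headings match the field names.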

Posted in ResearcherID

Project meeting number 3: Draft agenda

Posted by Nick on September 24, 2009

Date of meeting:  Tuesday 29th September 2009

1.  Apologies

2.  Team membership

3.  Progress since last meeting

  • API
  • Use cases
  • Project reporting – blog; tags specified by JISC

4.  Visit by Thomson Reuters reps on Wednesday 30th September

5.  Review of JournalTOCs API – http://www.journaltocs.hw.ac.uk/index.php?action=api

  • Potential synergies with Bibliosight

6.  Article Match Retrieval & ResearcherID

7.  A.O.B.

8.  Date of next meeting

Posted in Agenda

Just round the next corner…

Posted by Nick on September 24, 2009

Thomson Reuters’ new Research Analytics website – http://researchanalytics.thomsonreuters.com/ – has finally provided a thread through the labyrinthine Research Analytics infrastructure that I’m able to follow. Forgive the hyperbolic metaphor – it probably isn’t that difficult to navigate, and I’m hardly an ancient Greek hero, just easily lost! Nevertheless, the site links the various pieces of information together intuitively, and I’m reviewing in particular the information on Web Services, Article Match Retrieval and ResearcherID in advance of next week’s meeting.

I’ve certainly asked most of the questions in the Web Services FAQ myself over the past few weeks. The most relevant from our perspective are:

    What data fields can be queried through the service?

    The requesting system can query the Web of Science using the following fields:

  • Address (including Street, City, Province, Zip Code, or Country)
  • Author
  • Conference (including title, location, date, and sponsor)
  • Group Author
  • Organization or Sub-organization
  • Source Publication (journal, book or conference)
  • Title
  • Topic
  • Year Published
    The service will support the AND, OR, NOT, and SAME Boolean operators.
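As a sketch of how those searchable fields and operators might combine, the Python below maps the FAQ’s field names to Web of Science advanced-search field tags (our assumption about which tags the service accepts) and joins clauses with AND, OR or NOT. SAME is left to appear inside a field value, since it relates terms within a single address rather than joining separate fields.

```python
# The FAQ's searchable fields mapped to Web of Science advanced-search tags.
# Assumption: the web service accepts the same tags as the WoS advanced search.
FIELD_TAGS = {
    "address": "AD",
    "author": "AU",
    "conference": "CF",
    "group_author": "GP",
    "organization": "OO",
    "sub_organization": "SG",
    "source": "SO",
    "title": "TI",
    "topic": "TS",
    "year": "PY",
}

# Operators for joining separate field clauses; SAME works inside a field value.
JOINING_OPERATORS = {"AND", "OR", "NOT"}


def clause(field: str, value: str) -> str:
    """Build a single field-tagged clause such as 'AU=(smith j)'."""
    return f"{FIELD_TAGS[field]}=({value})"


def combine(operator: str, *clauses: str) -> str:
    """Join two or more clauses with one Boolean operator, in parentheses."""
    if operator not in JOINING_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if len(clauses) < 2:
        raise ValueError("need at least two clauses to combine")
    return "(" + f" {operator} ".join(clauses) + ")"
```

For example, `combine("AND", clause("address", "leeds SAME metropolitan"), clause("year", "2008-2009"))` gives `(AD=(leeds SAME metropolitan) AND PY=(2008-2009))`.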

    What data elements are returned by the service?

    The Web of Science Web Service returns five fields to the requesting system:

  • Article Title
  • Authors – All authors, book authors, and corporate authors
  • Source – Includes the source title, subtitle, book series and subtitle, volume, issue, special issue, pages, article number, supplement number, and publication date
  • Keywords – All author-supplied keywords
  • UT – A unique article identifier provided by Thomson Reuters
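A minimal Python sketch of holding those five returned fields in a typed record. The XML element names used here (`title`, `authors/author`, `source`, `keywords/keyword`, `ut`) are hypothetical stand-ins; the real response format should be taken from the Web Services documentation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class WosRecord:
    """The five fields the FAQ says the service returns."""
    title: str = ""
    authors: List[str] = field(default_factory=list)
    source: str = ""
    keywords: List[str] = field(default_factory=list)
    ut: str = ""


def parse_record(xml_text: str) -> WosRecord:
    """Parse one record using hypothetical element names; missing parts stay empty."""
    root = ET.fromstring(xml_text)
    return WosRecord(
        title=root.findtext("title", default=""),
        authors=[a.text for a in root.findall("authors/author") if a.text],
        source=root.findtext("source", default=""),
        keywords=[k.text for k in root.findall("keywords/keyword") if k.text],
        ut=root.findtext("ut", default=""),
    )
```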

One of the issues we are likely to run into when retrieving data from WoS is differentiating between similar names and disambiguating the same name entered in different formats, and this is where ResearcherID can come in.
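To illustrate why name matching alone is not enough, the Python sketch below reduces names to a simple ‘surname initial’ key (an approach chosen purely for illustration; it assumes names arrive either as ‘Surname, Forenames’ or as ‘Forenames Surname’). It successfully merges different formats of the same name, but it also merges two different people who share a surname and initial, which is exactly the gap a unique identifier such as ResearcherID would close.

```python
import re
import unicodedata


def name_key(name: str) -> str:
    """Reduce an author name to a lowercase 'surname initial' key, e.g. 'smith j'."""
    # Strip accents so that e.g. 'Müller' and 'Muller' compare equal.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    ascii_name = ascii_name.strip().lower()
    if "," in ascii_name:
        # 'Surname, Forenames' form
        surname, forenames = (p.strip() for p in ascii_name.split(",", 1))
    else:
        # 'Forenames Surname' form
        parts = ascii_name.split()
        if not parts:
            return ""
        surname, forenames = parts[-1], " ".join(parts[:-1])
    initial = re.sub(r"[^a-z]", "", forenames)[:1]
    return f"{surname} {initial}".strip()
```

Here `name_key("Smith, John")` and `name_key("J. Smith")` both give `smith j`, as intended, but so does `name_key("Smith, Jane")`.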

N.B. This is actually an issue with implications beyond Bibliosight, even purely internally; I’ve been aware for a while of the need for a unique identifier for researchers in intraLibrary, prompted by a blog post from Open Research Online describing how they have developed a feed of faculty group members’ publications from ORO (EPrints) to the faculty’s website, which “made use of the fact that everyone’s publications in ORO are linked to their unique university ID.” This prompted me to wonder aloud on Twitter whether such use of unique identifiers was standard practice for EPrints – @smithcolin and @ostephens tell me that it isn’t.

ResearcherID is a global service that assigns a unique identifier to each researcher and eliminates author misidentification, with the obvious benefit over an institutional ID that it is universal rather than just local.

As far as I understand, the ResearcherID Web Service from Thomson Reuters comprises two elements:

  • ResearcherID upload “that enables administrators to mass create ResearcherID profiles and upload publication data for some or all of the accounts you create for faculty, researchers, etc. at your institution.”
  • ResearcherID download is “a web-based service that enables you to query ResearcherID for researchers at your institution and return publication data for them, including times cited counts where applicable, as well as return institution affiliation for researchers at the requesting institution.”

Upload is freely available to everyone, but download is a subscription-based service.

I have now registered with the ResearcherID batch upload service and will report on it more fully at the meeting next week.

So what about Article Match Retrieval? To be honest, this is where my thread runs out, and I’m still not entirely sure how it fits in. It’s free, I think (to WoS subscribers), and the blurb says:

“Article Match Retrieval allows for a real-time lookup of bibliographic metadata such as DOI, author, source title, etc., against the Web of Science database (using the institution’s subscription entitlements). If a match is found, the service will return Times Cited information as well as links to view the full record, related records page, or citing articles page in Web of Science. An institution can use these links as a way to link into Web of Science from their library web page or institutional repository. Subscribers to Journal Citation Reports can use this service to retrieve links to the JCR record for a given journal.”

There is then a form to fill out “to find out how to create direct links to Web of Science articles or Journal Citation Reports”. I’m pretty sure I already filled it out back when we were submitting the bid, but this is something to check with Jon when he comes on Wednesday…

So though things are not crystal clear quite yet, Tuesday and Wednesday next week should put us firmly on the right track.

Posted in Progress post, Thomson Reuters Research Analytics

 