BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Thinking out loud…

Posted by Nick on November 11, 2009

As the deadline for #jiscri draws close, I have just returned to work after a month away from Bibliosight and I’m now desperately trying to catch up with the project and determine exactly what we can aim to achieve by the end of November… The candid truth is that we have only very recently got to the point where Mike can actually do some coding and begin to put together a prototype that fulfills the requirements of our (still formative) use-case[s].

Yesterday morning I had a stab at completing a more detailed template for a primary use-case (this comprises a narrative and the use case itself). In the afternoon I sat down with Mike to catch up with his progress from a technical perspective and to brainstorm precisely what functions we require from our prototype and how this might be achieved. There are also some outstanding issues of clarity pertaining to Thomson Reuters’ API documentation, specifically “WoS Search Retrieve Codes and Descriptions”: we currently have unrestricted access to the API, but it is my understanding that the free* service will actually be restricted.  We are not certain:

a)  Precisely which of the fields are associated with the restricted subset that we will be able to query and/or return under the current terms of our WoS subscription*

b)  What some of the fields actually are, as they lack a description in the documentation

*Free to us under existing subscription

Disclaimer:  I’m very much thinking out loud here and attempting to translate what I understand are ongoing conceptual issues for Mike as he works through the documentation.

Note:  I’ve continued to refer to ResearcherID – see https://bibliosightnews.wordpress.com/2009/10/02/visit-from-thomson-reuters/ – even though it is not a service we plan on implementing as part of Bibliosight, and not necessarily even in the longer term. I’m pretty sure we are likely to require some sort of unique identifier for authors – a subject that is currently receiving a lot of attention from the repository community.

Anyway…looking back over the blog it seems that:

The requesting system can query the Web of Science using the following fields:

  • Address (including Street, City, Province, Zip Code, or Country)
  • Author
  • Conference (including title, location, date, and sponsor)
  • Group Author
  • Organization or Sub-organization
  • Source Publication (journal, book or conference)
  • Title
  • Topic
  • Year Published

The service will support the AND, OR, NOT, and SAME Boolean operators.
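
To make the query side concrete, here is a small sketch of how a requesting system might compose a query from those fields. The field tags (AU, PY, TS) and the exact grammar are my assumptions, loosely based on Web of Science advanced-search syntax – the web service may well expect something different, so treat this as illustrative only.

```python
# Hypothetical sketch: composing a WoS-style query string from the
# queryable fields listed above. Field tags and grammar are assumptions.

def wos_query(**fields):
    """Join TAG=(value) clauses with the AND operator."""
    clauses = [f"{tag.upper()}=({value})" for tag, value in fields.items()]
    return " AND ".join(clauses)

query = wos_query(au="Smith J", py="2009", ts="repository")
print(query)  # AU=(Smith J) AND PY=(2009) AND TS=(repository)
```

The same helper could be extended to take OR, NOT, and SAME, but the simple AND case is probably all our primary use-case needs.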

The Web of Science Web Service returns five fields to the requesting system:

  • Article Title
  • Authors — All authors, book authors, and corporate authors
  • Source — Includes the source title, subtitle, book series and subtitle, volume, issue, special issue, pages, article number, supplement number, and publication date
  • Keywords — all author supplied keywords
  • UT — A unique article identifier provided by Thomson Reuters
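
On the receiving side, pulling those five fields out of the returned XML should be routine. The element names in this sketch are invented placeholders – the real WoS response schema will differ (and, as noted below, currently returns rather more than the documented five fields):

```python
# Illustrative only: element names below are placeholders, not the
# real WoS response schema.
import xml.etree.ElementTree as ET

sample = """<record>
  <uid>WOS:000123456700001</uid>
  <title>An Example Article</title>
  <authors><author>Smith, J.</author><author>Jones, A.</author></authors>
  <source>Journal of Examples</source>
  <keywords><keyword>repositories</keyword></keywords>
</record>"""

root = ET.fromstring(sample)
record = {
    "ut": root.findtext("uid"),
    "title": root.findtext("title"),
    "authors": [a.text for a in root.findall("authors/author")],
    "source": root.findtext("source"),
    "keywords": [k.text for k in root.findall("keywords/keyword")],
}
print(record["title"])  # An Example Article
```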

The test queries that Mike has submitted to the API have returned XML that appears to be both more granular than indicated and that includes fields beyond these five (e.g. abstract). So the first thing to do, perhaps, is to contact Thomson Reuters and see if they can apply the restrictions that we will ultimately need to work with, if only to remove some of the noise and make it easier to see the wood for the trees.

The API documentation actually lists over 100 “fields”; only a handful of these are actually described in the documentation, however, and while many are reasonably transparent, others are a little less so, and some look like they may duplicate information – or are they perhaps used as alternatives? (e.g. bib_id = Volume, issue, special, pages and year data / bib_issue = Volume and year data).  There is also some lack of consistency in this bibliographic info on a record-by-record basis; we need to ensure that we have consistent XML being returned for all records. Hopefully we can then develop a template in intraLibrary itself that reflects that consistent XML as closely as possible, such that we can devise an XSLT stylesheet to perform the appropriate transformation.
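
One quick way to get a handle on that record-by-record inconsistency would be to compare the set of field names present in each returned record. The records here are stand-ins, but the idea is simply: which fields appear everywhere, and which only sometimes?

```python
# Sketch: flag fields that only some records carry. The example records
# are stand-ins, not real WoS output.
def field_report(records):
    """Return (fields common to all records, fields present in only some)."""
    sets = [set(r) for r in records]
    common = set.intersection(*sets)
    variable = set.union(*sets) - common
    return common, variable

recs = [
    {"title": "...", "bib_id": "...", "authors": "..."},
    {"title": "...", "bib_issue": "...", "authors": "..."},
]
common, variable = field_report(recs)
print(sorted(common))    # ['authors', 'title']
print(sorted(variable))  # ['bib_id', 'bib_issue']
```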

Mike already has a desktop client that will take XML and perform an XSLT transformation so, once we have clarified the LOM format we require (an action for me from the last meeting), it *should* be relatively straightforward to plug into the WoS API to retrieve XML from the Web of Science which can then be transformed into appropriate LOM.
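
For what it’s worth, the shape of such a stylesheet might look something like the fragment below. This is very much a sketch: the source element names (record, title, uid) are placeholders, and the real mapping will depend on both the actual WoS response schema and whatever LOM profile intraLibrary requires.

```xml
<!-- Hypothetical sketch only: source element names and the LOM layout
     are placeholders for the real WoS-to-LOM mapping. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lom="http://ltsc.ieee.org/xsd/LOM">
  <xsl:template match="/record">
    <lom:lom>
      <lom:general>
        <lom:title>
          <lom:string><xsl:value-of select="title"/></lom:string>
        </lom:title>
        <lom:identifier>
          <lom:catalog>WOS</lom:catalog>
          <lom:entry><xsl:value-of select="uid"/></lom:entry>
        </lom:identifier>
      </lom:general>
    </lom:lom>
  </xsl:template>
</xsl:stylesheet>
```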

Then we need to ingest that LOM into intraLibrary, preferably using SWORD…which I shall think about another time!
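
Just as a placeholder for that future thinking: a SWORD (v1) deposit is essentially an HTTP POST of packaged content to a collection URL. Everything in this sketch is assumed – the endpoint, credentials, and packaging value are placeholders, not intraLibrary’s real ones – but it shows the rough shape of the request we would eventually construct.

```python
# Sketch only: endpoint, credentials, and X-Packaging value are
# placeholders, not intraLibrary's real deposit details.
import base64
import urllib.request

def build_sword_deposit(collection_url, zip_bytes, username, password):
    """Construct (but do not send) a SWORD-style deposit request."""
    req = urllib.request.Request(collection_url, data=zip_bytes, method="POST")
    req.add_header("Content-Type", "application/zip")
    # X-Packaging tells the server how the payload is wrapped (illustrative value)
    req.add_header("X-Packaging", "http://www.imsglobal.org/xsd/imscp_v1p1")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

req = build_sword_deposit("https://repository.example/sword/deposit/lom",
                          b"...zip payload...", "user", "secret")
print(req.get_method())  # POST
```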
