BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Posts Tagged ‘R4R’

Visit from Thomson Reuters

Posted by Nick on October 2, 2009

On Wednesday afternoon Mike and I were finally able to sit down with Jon and…Gareth? (sorry, I’m terrible with names) from Thomson Reuters to discuss Bibliosight and the work we are doing with the WoS API. It probably goes without saying just how useful this was, especially so soon after our Tuesday meeting.

As we have come to appreciate, Thomson are still very much in the process of developing their suite of tools and commercial services around the extraction of data from WoS using their API. Overall, I was given the impression that the company are currently performing something of a balancing act, weighing their commercial interests against providing appropriate value-added services to their subscribers under existing licensing agreements – which is, of course, entirely reasonable.  Jon suggested that the Bibliosight project is something of a pioneer in using this technology and a useful case-study for the company, which certainly puts some of our early difficulties into context – though he did indicate that numerous other folk are also actively investigating the API; in particular he mentioned Queens College Belfast, an institution in Birmingham and R4R at King’s College London in collaboration with EPrints’ Les Carr at Soton.  R4R is the only one of these projects that I was previously aware of and have had any contact with; it would be really useful if we were able to communicate with the others also using the API.

Thomson Reuters’ flagship commercial product is called InCites and “supplies all the data and tools you need to easily produce targeted, customized reports… all in one place. You can conduct in-depth analyses of your institution’s role in research, as well as produce focused snapshots that showcase particular aspects of research performance.” We discussed how, though such a service will be invaluable for the research-oriented Russell Group institutions, it is likely to be overkill for a million plus institution like Leeds Met; nevertheless we do require a certain level of functionality to help us analyse our research performance which, alongside our traditional strengths in teaching and learning, is increasingly important, especially in view of the REF.

Hopefully this is where the developing ‘suite of tools’ comes in, and our guests were keen to get a handle on precisely what we are hoping to achieve with Bibliosight (aren’t we all!).  I outlined our preliminary use-cases for them as a foundation for our discussion and was also keen to ask some of the specific questions that had arisen during the previous day’s meeting.

First of all I asked about the wording of the documentation, which appears to suggest that it is only possible to return 100 records with a single query using the API. They weren’t aware of such an issue and agreed that the way it was expressed in the documentation was a little ambiguous; Jon will follow this up for us, though Mike may also be able to clarify the situation when he has investigated further.  They were able to say, however, that another user had discovered that the API could be called twice every second, so they didn’t anticipate any problems with extracting all the data we need.
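If the 100-record figure does turn out to be a per-query cap, the obvious workaround is to page through the results while staying inside the twice-a-second call rate. The following is only a sketch under those assumptions: `fetch_page` is a hypothetical stand-in for whatever search call the API actually exposes, and the page size and pause are taken from the figures mentioned at the meeting rather than from any confirmed documentation.

```python
import time


def fetch_all(fetch_page, page_size=100, min_interval=0.5, sleep=time.sleep):
    """Yield every record for a query by requesting successive pages.

    fetch_page(first_record, count) is a HYPOTHETICAL callable standing in
    for the real search call; it should return a list of records, with a
    short (or empty) list signalling the final page.
    """
    first_record = 1  # assumption: record offsets start at 1
    while True:
        page = fetch_page(first_record, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        first_record += page_size
        sleep(min_interval)  # 0.5 s pause keeps us at two calls per second
```

Passing `sleep` in as a parameter is just a convenience so the loop can be tried out against a fake `fetch_page` without actually waiting.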

The major issue that came up at the meeting on Tuesday was how best to return all of the articles for a given institution, with the most appropriate field to query apparently being the address field.  It is not clear, however, how consistent the institutional address actually is. Jon confirmed that it is derived from information harvested from individual journals/papers, which preliminary manual searching of WoS has already shown to be idiosyncratic – at least in the case of Leeds Metropolitan University and almost certainly other institutions as well (leeds metropolitan university; leeds met [uni]; lmu etc).  Jon suggested that the safest and most effective method of returning all records would actually be to use ResearcherID, though this would require all institutional authors to be registered and an additional paid subscription to ResearcherID download (as opposed to upload, which is free). In lieu of this, however, he did confirm that the address field was the only way and that it may be necessary to build a catch-all query to ensure that we don’t miss anything. Precisely how we achieve this is still a little bit of a moot point, though he did indicate that some work has been done on disambiguating institutional address formats within WoS and will follow up on this for us in due course.
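One way to picture the catch-all query is as a single OR-ed search over every known spelling of the institution. The sketch below assumes the API accepts the same advanced-search style syntax as the WoS web interface, with `AD=` as the address field tag; that assumption needs checking against the API documentation, and the function name and variant list are purely illustrative.

```python
def build_address_query(variants, field="AD"):
    """Combine known address variants into one OR-ed query string.

    ASSUMPTION: the API accepts advanced-search style queries of the
    form  AD=("a" OR "b"); this has not been confirmed.
    """
    cleaned = []
    for variant in variants:
        # Normalise whitespace and case so duplicates are easy to spot.
        variant = " ".join(variant.split()).lower()
        if variant and variant not in cleaned:
            cleaned.append(variant)
    if not cleaned:
        raise ValueError("at least one address variant is required")
    return f"{field}=(" + " OR ".join(f'"{v}"' for v in cleaned) + ")"


query = build_address_query(
    ["Leeds Metropolitan University", "leeds met uni", "LMU", "lmu"]
)
```

Very short variants such as "lmu" are likely to match other institutions too, so a query like this would probably still need manual checking of the results.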

Through our discussion, Article Match Retrieval is finally beginning to make more sense to me, and Jon confirmed that this is the method that would be used in conjunction with the API to provide numbers of citations to an individual article. AMR can be queried by numerous fields including DOI and UT identifier (a unique identifier for a journal article assigned by Thomson Reuters).  In terms of the current project, I think it makes sense to focus initially on extracting bibliographic data before worrying about citation metrics; via the API, we can also extract the UT identifier and then use this to query AMR.
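The two-step idea (bibliographic data first, citation counts later) might look something like the sketch below. Everything here is an assumption for illustration: `amr_lookup` is a hypothetical callable wrapping whatever the real AMR request turns out to be, and the record dictionaries with `ut` and `doi` keys are simply one way the extracted data could be held.

```python
def citation_counts(records, amr_lookup):
    """Look up a citation count for each already-extracted record.

    records    -- dicts from the bibliographic extraction step, with
                  optional "ut" and "doi" keys (an assumed layout)
    amr_lookup -- HYPOTHETICAL callable (field, value) -> int standing in
                  for the real Article Match Retrieval request
    """
    counts = {}
    for record in records:
        # Prefer the UT identifier; fall back to the DOI if it is missing.
        if record.get("ut"):
            field, value = "ut", record["ut"]
        elif record.get("doi"):
            field, value = "doi", record["doi"]
        else:
            continue  # nothing to match on, so skip this record
        counts[value] = amr_lookup(field, value)
    return counts
```

Keeping the lookup separate from the extraction means the citation step can be added later without touching the code that harvests the bibliographic records.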

We also touched on Terms & Conditions and Thomson, again reasonably, expect WoS as the data source to be clearly acknowledged on each individual record – Mike wasn’t initially certain how this could easily be achieved from a technical perspective, at least in the case of bibliographic citation information (which may have been added manually); we have a few ideas on how this could actually be achieved but this is really just something to be aware of at this stage.

All in all I now feel that the overall shape of the project is beginning to be resolved and, in addition to the technical work required to extract, store, parse, convert (XML) records and then pass them somewhere else (intraLibrary/EndNote), a large part of Bibliosight will necessarily focus on developing use-cases for our institutional research administration, which is likely to continue well beyond the designated 6 month life-cycle of the #jiscri project!
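To make the parse-and-convert step a little more concrete, here is a rough sketch that reads an XML record and writes it out in RIS, a tagged format that EndNote can import. The XML element names and the sample record are invented for illustration and are not the real WoS response schema; the `N1` note line is just one possible way of carrying the WoS acknowledgement on each record.

```python
import xml.etree.ElementTree as ET

# Made-up sample: element names are ASSUMED, not the real WoS schema.
SAMPLE = """<records>
  <record>
    <ut>000000000000001</ut>
    <title>An example article</title>
    <author>Smith, A.</author>
    <author>Jones, B.</author>
    <year>2009</year>
  </record>
</records>"""


def parse_records(xml_text):
    """Turn the (hypothetical) XML response into plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "ut": rec.findtext("ut", default=""),
            "title": rec.findtext("title", default=""),
            "authors": [a.text or "" for a in rec.findall("author")],
            "year": rec.findtext("year", default=""),
        }
        for rec in root.iter("record")
    ]


def to_ris(record):
    """Render one parsed record as an RIS journal-article entry."""
    lines = ["TY  - JOUR", f"TI  - {record['title']}"]
    lines += [f"AU  - {author}" for author in record["authors"]]
    lines.append(f"PY  - {record['year']}")
    lines.append("N1  - Source: Web of Science")  # data source acknowledgement
    lines.append("ER  - ")
    return "\n".join(lines)


ris_text = "\n".join(to_ris(r) for r in parse_records(SAMPLE))
```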

Posted in Progress post, Research Excellence Framework, Thomson Reuters Research Analytics | 2 Comments

Project meeting number 2: Draft agenda

Posted by Nick on August 27, 2009

Date of meeting:  1st September 2009

1.  Apologies

2.  Progress since last meeting

  • API
  • SWOT analysis
  • Project reporting

3.  Liaison with other projects

4.  Use case development

5.  A.O.B.

6.  Date of next meeting

Posted in Agenda, Bibliosight | 1 Comment

euroCRIS

Posted by Nick on July 15, 2009

Preliminary “research” (OK Google) has led me to euroCRIS (Current Research Information Systems) – http://www.eurocris.org/public/home/; I learned from the JISC site that one of the aims of the Readiness for REF project is to study the CERIF metadata model to see how it could deal with REF data requirements.

Initially the big G took me to an archived page at http://cordis.europa.eu/cerif/ which provided a little background and pointed me at euroCRIS.  I’m still busy reviewing the information on CERIF 2008 there though some of it appears to be restricted to euroCRIS members only.  Which I suspect we are not.  Institutional membership costs just 250 Euros – is it worth becoming a member in the context of the Bibliosight project?

http://www.eurocris.org/public/join-eurocris/types-of-membership/

Posted in Bibliosight | 2 Comments

Quickstep into rapid innovation project management

Posted by Nick on June 17, 2009

As I’m on annual leave for 10 days from tomorrow, I’ve been trying to set up the first couple of our monthly project meetings before I go – past experience has taught me that getting your project team all together in one room is easier said than done. The whole point of #jiscri, of course, is that the approach is agile and light, and I’ll definitely be making as much use of Web 2.0 as possible to communicate with the project team, but there really is no substitute for a good old-fashioned face-to-face meeting with lukewarm tea and biscuits.

I haven’t yet seen the project documentation but Wendy has had a preliminary discussion with the programme manager, Andy McGregor, who has indicated that the blog should be the primary mechanism for reporting on our project – in lieu of a formal final project report; there will be specific areas we need to address in our blog posts and I’m looking forward to learning more about this aspect of the programme.  Andy also referred to a couple of other #jiscri projects that it would be useful for us to liaise with: R4R (Readiness for REF) at King’s College – http://www.kcl.ac.uk/iss/cerch/projects/portfolio/r4r.html – and one building an API for TicTocs (don’t quote me on that – need to learn more).

So.  Our first “scrum” is scheduled for Monday 13th July; in the first instance, we should all aim to gather information independently ahead of the scrum which will necessarily be focussed on planning to scope the likely project trajectory – we can think about technical developments in more detail at that meeting.

For the record and by initial only (until they introduce themselves here I hope!) our scrum is:

NS, WL, AS, MT, BB, KB, PD, PJ

Posted in Bibliosight | 1 Comment

 