BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Posts Tagged ‘JournalTOCsAPI’

JournalTOCsAPI workshop

Posted by Nick on November 26, 2009

On Friday I was invited to participate in a workshop for the JournalTOCsAPI project at Heriot-Watt University in Edinburgh.  I didn’t think I was going to make it at all because of the awful flooding in Cumbria: we were told at one point that trains were travelling no further than Carlisle and that Scotland was effectively out of bounds. The tracks must have been dry enough, however, and I arrived just in time for Lisa Rogers’ introductory presentation, “JournalTOCs Workshop – Introduction & Feedback”:

Then came Jenny Delasalle, Repository manager at Warwick University and chair of UKCORR, talking about “Repositories and Alerting Services”:

The third presentation was given by Santy Chumbe, the JournalTOCs Project manager, on behalf of Anne Dixon from the British Geological Survey who helped to test the first use case for the JournalTOCs project:

I was next up presenting on Bibliosight – though it remains to be seen just how relevant this will continue to be as we learn more about WS Lite:

Finally Phil Barker presented on “The Other Side of the Interface” which I found a most engaging re-evaluation of our developing repository/research infrastructure as a complex and dynamic “ecosystem” full of interacting (and evolving) entities and processes:

Thanks to the JournalTOCs team for an enjoyable and informative event, to Jenny and Phil for their presentations and to Helen Muir and Colin Smith (Repository Manager at the Open University) for their insights throughout the day. It was particularly interesting for me to listen to Jenny and Colin discuss their respective practices at WRAP and the ORO, both examples of successful and well established Open Access repositories at major research institutions with much greater numbers of research outputs than Leeds Met. I certainly learnt a great deal about how I might use alerting services, including the JournalTOCsAPI, to alert me to new publications that I can pursue for the OA research repository at Leeds Met. Along with Bibliosight and WS Lite, I shall aim to integrate some of what I learned into my workflows over the coming months.

Posted in Event, JournalTocs

Quick sketch #2

Posted by Nick on November 13, 2009

The diagram below is Arthur’s update of my earlier quick sketch to illustrate what Bibliosight will aim to achieve by the formal #jiscri deadline.

It is numbered and colour coded: stages 1 – 3 (shades of blue) are within the #jiscri timeframe; stages 4 (green) & 5 (buff) will require ongoing work beyond the deadline.

[Diagram: Bibliosight]

Posted in Bibliosight

Project meeting – minutes

Posted by Nick on October 1, 2009

(Date of meeting 29th September 2009)

Present:  Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Sue Rooke, Nick Sheppard

1.  Apologies

No apologies

2.  Team membership

Thank you to Sue Rooke who has agreed to join the Bibliosight project team; Sue is a research administrator in the Faculty of Health and has already been involved in repository development, contributing to developing workflows and providing feedback on the Open Search interface.  We hope that Sue will contribute, in particular, to use case development.

The team is still lacking a representative from the academic community and we are currently waiting for a reply to recent correspondence. WL is attending the research sub-committee on Monday 5th October and may raise the issue there if necessary.

Action:  WL/NS to pursue academic contact(s) for a representative to sit on the project team

3.  Progress since last meeting

• API

We have now received the updated documentation from Thomson Reuters and Mike has submitted a query to the API and received an appropriate response in XML. Thomson Reuters’ FAQ gives a full summary of the data fields that can be queried by the service and the data elements that can be returned, and this summary appears to be in line with the XML response.

We are therefore able to formally reduce the associated risk back to low:

Risk: API unsuitable for project deliverables
Probability: Low (elevated to Medium on 1st September 2009; reduced back to Low on 29th September 2009)
Impact: High
Action to Prevent/Manage Risk: Feedback from Thomson Reuters indicates proposal technically feasible. Problems with API/documentation have been mitigated by release of new documentation from Thomson Reuters (29th September 2009).

N.B.  The wording of the documentation appears to suggest that it is only possible to return 100 records with a single query using the API – NS to clarify with Thomson Reuters.  If this is the case, the practical implications  are limited in the case of Leeds Metropolitan University which publishes a relatively small amount of research but would be considerable for an institution with a greater research output.
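If the 100-record ceiling is confirmed, a larger result set would have to be requested in pages. The following is only a rough sketch in Python of how that might look. The function `fetch_page(first_record, count)` is a hypothetical stand-in for whatever call the WoS API actually provides (its real method names and parameters are not confirmed here), and `fake_fetch_page` simply simulates a service so that the loop can be run.

```python
from typing import Callable, Iterator, List

MAX_PER_QUERY = 100  # limit suggested by the documentation; still to be confirmed with TR


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              total: int,
              page_size: int = MAX_PER_QUERY) -> Iterator[dict]:
    """Yield every record by requesting successive pages of at most page_size.

    fetch_page(first_record, count) is a hypothetical stand-in for the real
    API call; first_record is 1-based.
    """
    first = 1
    while first <= total:
        count = min(page_size, total - first + 1)
        page = fetch_page(first, count)
        if not page:  # stop early if the service returns nothing
            break
        yield from page
        first += len(page)


def fake_fetch_page(first_record: int, count: int) -> List[dict]:
    """Simulated service holding 1503 records, for illustration only."""
    last = min(first_record + count - 1, 1503)
    return [{"id": n} for n in range(first_record, last + 1)]


records = list(fetch_all(fake_fetch_page, total=1503))
```

With a result set of 1503 records, the loop above would need 16 separate queries, which is manageable for Leeds Met but would add up quickly for a larger institution.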

Action:  NS to clarify 100 record limit with TR

Action:  MT to continue appropriate* implementation of API

* Hopefully what is “appropriate” will evolve over the coming weeks!

• Use cases

Technical difficulties have contributed to a lack of conceptual clarity amongst the project team and there was considerable discussion around precisely what data Bibliosight will now seek to retrieve from WoS using the API and what we will aim to achieve with that data.

The original use case narratives outlined in the bid were several and focussed on an alert service for researchers and/or repository administrators to encourage the deposit of an appropriate full text in the repository and perhaps neglected the obvious administrative use case whereby metadata from WoS is pulled directly into intraLibrary.

N.B.  An important use case was also the extraction of citation metrics that would potentially inform the REF – we are not yet clear how this would be achieved but we understand it will rely on the Article Match Retrieval service.

Of course we also want to produce outputs that are of use to the wider community rather than just to users of our specific repository software. This reflects the considerations of the Readiness for REF project, which also hopes to enable UK repositories to make effective and efficient use of the WoS API (as part of a much broader project) and is focussing on EPrints, DSpace and Fedora as the most well established OA research repository platforms.  R4R raises several pertinent questions, many of which also arose independently and in a similar form during our own discussion:

  • What are the different workflows relevant to (i) backfilling a repository with a one-off download and (ii) ongoing use of WoSAPI to populate a repository?
  • What uses might records downloaded from WoSAPI be put to?
  • How might the workflows be designed to enable other datastreams also to help populate the repository (eg from UK PubMedCentral, arXiv, or sources that better serve the arts, humanities and social sciences)?
  • What workflows might be able to handle facts such as that the WoS record will become available some time after the paper is published, whereas deposit into the repository may happen earlier than that?
  • What methods might be helpful in addressing the inevitable questions of duplicate records, or ambiguous relations with existing records?
  • Are there implications for a repository’s mission and reputation if the balance of content it holds is rapidly changed by a large number of WoS-derived records?
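On the question of duplicate records, one possible starting point (offered purely as an illustration, not as anything R4R or Bibliosight has settled on) is to compare records on a normalised title plus publication year. The record layout below (`title` and `year` keys) is an assumption made for the sketch.

```python
import re
import unicodedata


def match_key(title: str, year: int) -> tuple:
    """Build a crude comparison key: title lowercased, with accents and
    punctuation removed, paired with the publication year."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return (text, year)


def find_duplicates(existing: list, incoming: list) -> list:
    """Return the incoming records whose key matches an existing record."""
    seen = {match_key(r["title"], r["year"]) for r in existing}
    return [r for r in incoming if match_key(r["title"], r["year"]) in seen]


# Invented records, for illustration only.
repository = [{"title": "Open Access: A Review", "year": 2009}]
harvested = [
    {"title": "open access – a review", "year": 2009},
    {"title": "Something else entirely", "year": 2009},
]
possible_duplicates = find_duplicates(repository, harvested)
```

A key this crude will miss retitled papers and may wrongly match short generic titles, so in practice any matches would probably need to be flagged for a human to check rather than merged automatically.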

Use cases may also be informed by the JournalTOCsAPI project (see item 5 below) who also explored similar issues in a recent post.

One practical consideration from a technical perspective, which will have a bearing on developing use cases, is the best method of extracting comprehensive records from institution “X”. The most appropriate field to query seems to be the address field, but it is not clear how consistent the institutional address in this field will be. For example, early experimentation has found that “leeds metropolitan university” only returns 201 records; using a wildcard in the form “leeds met*”, however, returns 1503 records (test conducted 29th September 2009).  This was an issue flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).
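To illustrate why the exact phrase might undercount, here is a small Python sketch of how a trailing wildcard catches abbreviated variants of an address. The sample address strings are invented for the purpose, and the matching is done locally with the standard `fnmatch` module; the rules WoS itself applies to the address field may well differ.

```python
from fnmatch import fnmatchcase


def matches(address: str, pattern: str) -> bool:
    """Case-insensitive match of pattern (which may contain *) anywhere in address."""
    return fnmatchcase(address.lower(), "*" + pattern.lower() + "*")


# Invented examples of how one institution might appear in an address field.
SAMPLE_ADDRESSES = [
    "Leeds Metropolitan University, Leeds, England",
    "Leeds Met Univ, Fac Hlth, Leeds, W Yorkshire, England",
    "Leeds Metropolitan Univ, Sch Sport, Leeds, England",
    "Univ Leeds, Leeds, England",
]

exact = [a for a in SAMPLE_ADDRESSES if matches(a, "leeds metropolitan university")]
wild = [a for a in SAMPLE_ADDRESSES if matches(a, "leeds met*")]
```

In this toy data the exact phrase finds one address and the wildcard finds three, without picking up the University of Leeds entry. The trade-off is that a loose wildcard could also match unrelated addresses that happen to begin “leeds met”, so results would still need checking.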

In terms of the practicalities of actually getting records from WoS into intraLibrary once they have been harvested, Peter did indicate that it should be possible to upload suitable XML records into intraLibrary though this will need to be in LOM format, meaning that we may need to perform an XSLT transformation to convert data retrieved from WoS into a suitable format.  Also, Peter is uncertain whether XML that can be imported in this way will also include the LOM extensions we are using to accommodate bibliographic information and will need to speak to his technical colleagues at Intrallect to clarify.
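As a rough idea of what such a conversion involves, the sketch below maps a record onto a minimal LOM document. It uses plain Python (`xml.etree.ElementTree`) rather than XSLT, simply because the standard library has no XSLT processor. The input element names (`record`, `ut`, `title`) are placeholders and not the real WoS schema, the `WoS-UT` catalog label is made up, and the output uses the namespace of the IEEE LOM XML binding; whether intraLibrary would accept it, and how our LOM extensions fit in, is exactly what still needs clarifying.

```python
import xml.etree.ElementTree as ET

LOM_NS = "http://ltsc.ieee.org/xsd/LOM"  # namespace of the IEEE LOM XML binding

# Invented input: these element names are placeholders, not the WoS schema.
SAMPLE_WOS_RECORD = """
<record>
  <ut>000000000000001</ut>
  <title>An example article title</title>
</record>
"""


def q(tag: str) -> str:
    """Qualify a tag name with the LOM namespace."""
    return f"{{{LOM_NS}}}{tag}"


def wos_to_lom(wos_xml: str) -> ET.Element:
    """Map a (hypothetical) WoS record onto a minimal LOM 'general' section."""
    src = ET.fromstring(wos_xml.strip())
    lom = ET.Element(q("lom"))
    general = ET.SubElement(lom, q("general"))

    # Carry the WoS unique identifier across as a LOM identifier.
    identifier = ET.SubElement(general, q("identifier"))
    ET.SubElement(identifier, q("catalog")).text = "WoS-UT"  # made-up label
    ET.SubElement(identifier, q("entry")).text = src.findtext("ut", default="")

    # LOM titles are language strings.
    title = ET.SubElement(general, q("title"))
    string = ET.SubElement(title, q("string"), {"language": "en"})
    string.text = src.findtext("title", default="")
    return lom


ET.register_namespace("", LOM_NS)  # serialise LOM as the default namespace
lom_xml = ET.tostring(wos_to_lom(SAMPLE_WOS_RECORD), encoding="unicode")
```

Authors, source details and keywords are left out here; in LOM they would need to go into further sections or into the bibliographic extensions mentioned above.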

Note:  There was also discussion around appropriate integration with SFX, our OpenURL resolver, as a possible means of identifying a published URL for WoS records – this is an area that has scope implications both for Bibliosight and the remit of the Leeds Metropolitan University repository itself; beyond an Open Access repository of research (i.e. to also comprise citation only records).  This is an area that may need to be explored in more detail later in the project.

Action:  PD to clarify re upload of XML to intraLibrary including LOM extensions

Action:  NS/BB/SR to meet with another member of the URO to clarify potential use cases (meeting on Thursday 1st October)

Action:  All team members to contribute to ongoing discussion on the blog.

• Project reporting – blog; tags specified by JISC

It was agreed that the specific subject for blog posts this month will be ‘Technical standards’ – Peter agreed to contribute a post before the next meeting.

Action: PD to contribute a blog post on ‘technical standards’.

Action: All team members to contribute to ongoing discussion on the blog.

4.  Visit by Thomson Reuters reps on Wednesday 30th September

Mike and I met with Jon and Gareth from TR on Wednesday 30th (yesterday), and they were able to clarify several issues for us. A separate post will follow.

5. Review of JournalTOCsAPI – http://www.journaltocs.hw.ac.uk/index.php?action=api

During the meeting, I gave a quick overview of the recently released JournalTOCsAPI at http://www.journaltocs.hw.ac.uk/index.php?action=api with a view to demystifying the concept of an API for the less technical amongst us and also potentially giving the more technical a developmental steer.  Currently, queries need to be submitted to the API by URL and are returned as an RSS feed which includes as much metadata as in the original TOC feed, depending on the quality of the original record. Though comparable to Bibliosight in many respects, this project perhaps has greater flexibility regarding the metadata it is able to query and return; it is, after all, building an API from the ground up that will query an openly accessible data source. However, it is likely that the quality of the data may not be as consistent as WoS; there may be fields missing, for example.

It has also been informative to engage with another, similar project as a ‘user’ and we discussed how Bibliosight might also engage with the JournalTOCsAPI community of users and agreed that it is a valuable opportunity to solicit the opinion of repository managers from other institutions using different software platforms.

Action:  NS to continue engaging with JournalTOCsAPI as a ‘user’

Action:  NS to send an email that can be forwarded to JournalTOCsAPI community of users as suggested in recent correspondence from Lisa Rogers

6.  Article Match Retrieval & ResearcherID

These were only touched upon briefly in the meeting and flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).

7.  A.O.B.

None

8.  Date of next meeting

20th October 2009 – 11:30 am

Posted in Bibliosight, Progress post, SCRUM minutes

User participation

Posted by Nick on September 29, 2009

Aside from the minor detail that we haven’t yet got anything that requires user participation, we need to consider how we will facilitate such participation when we do actually have a tool to test!  One facet of this will be users at Leeds Met but it would obviously be desirable to engage with users from other institutions, using different repository platforms; after I met Santy at the jiscri event in Manchester, Lisa Rogers from JournalTOCsAPI contacted me last week about engaging with their community of users and suggested I send an email that she can forward to their users to see if they are also interested in our project.

The JournalTOCsAPI project are themselves facilitating user participation in a number of ways and have recently released an alpha of their API at http://www.journaltocs.hw.ac.uk/index.php?action=api which, though still with limited functionality, gives a good sense of what queries are/will be supported by the API.  At the moment, queries need to be submitted by URL and are returned as an RSS feed.  Of course I have been participating as a user and submitted the query http://www.journaltocs.hw.ac.uk/api/articles/leeds%20metropolitan%20university which returns 5 articles with “leeds metropolitan university” in the returned metadata (only 5 results!) and includes as much metadata as in the original TOC (table of contents) feed – depending on the quality of this original record this will be:

Abstract, Content type, DOI, Author(s), Journal, ISSN, Article URL, Citation, Publication Date

Question:  Is this the full complement of returned fields (these are from my 5 results)?
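For anyone wanting to try the same thing programmatically, a query of this kind could be put together and the feed read along the following lines. This is a sketch only: the URL pattern is taken from the query above, but the sample feed is invented and laid out in a simple RSS 2.0 style, so the tag names (and any namespaces) in the real JournalTOCs feed may need different handling.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET

API_BASE = "http://www.journaltocs.hw.ac.uk/api/articles/"


def build_query_url(terms: str) -> str:
    """Percent-encode the search terms and append them to the API base URL."""
    return API_BASE + quote(terms)


def parse_items(rss_text: str) -> list:
    """Pull the title and link out of each <item> in an RSS document."""
    root = ET.fromstring(rss_text.strip())
    return [
        {"title": item.findtext("title", default=""),
         "link": item.findtext("link", default="")}
        for item in root.iter("item")
    ]


# Invented feed, for illustration only; not real JournalTOCs output.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example results</title>
    <item><title>First example article</title><link>http://example.org/1</link></item>
    <item><title>Second example article</title><link>http://example.org/2</link></item>
  </channel>
</rss>
"""

url = build_query_url("leeds metropolitan university")
items = parse_items(SAMPLE_FEED)
```

Fetching the live feed (for instance with `urllib.request.urlopen(url)`) is deliberately left out so that the example runs without a network connection.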

This compares with the metadata we hope to be able to extract from WoS:

  • Authors: all authors, book authors, and corporate authors
  • Source: includes the source title, subtitle, book series and subtitle, volume, issue, special issue, pages, article number, supplement number, and publication date
  • Keywords: all author supplied keywords
  • UT: a unique article identifier provided by Thomson Reuters

(Most obviously we are lacking Abstract and DOI)

Like Bibliosight and all the jiscri projects, the JournalTOCsAPI project also uses its blog as an important mechanism for facilitating user participation, and in a recent post the team asked How do you want to be alerted? Though our primary use cases are perhaps slightly different, this is also a pertinent question for Bibliosight, as our original conception had several objectives: to automate the process as much as possible and pull metadata from WoS directly into our repository, but also to alert researchers and/or repository administrators to encourage the deposit of an appropriate full text.  Of course we also want to produce outputs that are of use to the wider community rather than just to users of intraLibrary.

The third way that JournalTOCsAPI are facilitating user participation is by using an online bug-tracking system called Mantis – http://www.mantisbt.org/.  Do we need to think about providing a similar facility for users of Bibliosight deliverables?

Question to JournalTOCsAPI:  How successful has this approach been amongst your users?  Though I have been set up with an account I must admit that I’ve only logged in once…but then I haven’t yet found a bug to report!

Much to think about as we go into our 3rd project meeting this afternoon and with the focus very much on developing a usable prototype before meeting number 4.

Posted in User participation

Project meeting number 3: Draft agenda

Posted by Nick on September 24, 2009

Date of meeting:  Tuesday 29th September 2009

1.  Apologies

2.  Team membership

3.  Progress since last meeting

  • API
  • Use cases
  • Project reporting – blog; tags specified by JISC

4.  Visit by Thomson Reuters reps on Wednesday 30th September

5.  Review of JournalTOCsAPI – http://www.journaltocs.hw.ac.uk/index.php?action=api

  • Potential synergies with Bibliosight

6.  Article Match Retrieval & ResearcherID

7.  A.O.B.

8.  Date of next meeting

Posted in Agenda

JISC Rapid Innovation event at City of Manchester stadium

Posted by Nick on September 23, 2009

At the beginning of September (Thursday 3rd/Friday 4th September) Mike and I attended the JISC Rapid Innovation in Development event at the City of Manchester stadium and I’m finally getting round to a blog post…one way or another, things have moved on somewhat in the intervening weeks now that we have XML – the lack of which was hanging over me like a dark cloud in Manchester (or was that, in fact, a dark cloud?).

Although I am not a City supporter, it was a fantastic venue with great views over the rain-swept turf and had the added benefit for me of being nice and local; much more so, in fact, than a normal work day.

I had only learned the day before that all the projects represented at the event would be participating in a “45 second scramble” to deliver a lightning pitch of their project and found myself very much outside my comfort zone as I took my place in the queue before being handed the mic – itself discomforting when you’re not used to having your voice amplified! Fortunately, the organisers did a great job of putting us at our ease and the atmosphere was one of fun and mutual support; although that didn’t make the next stage of the exercise any less uncomfortable when we attended a Dragons’ Den style interview with a panel of three who made you watch your recorded pitch back and helped you analyse just what was wrong with it…oh, and tomorrow morning you’ll have to do it again. In 20 seconds.

I was given some good advice and in actual fact found the exercise very useful, though I still lacked the confidence to deliver my 20 seconds from the hip and read it out instead…one thing at a time!


Of course, the main issue for Bibliosight has been the ongoing difficulties with the WoS API and unlike other projects at the event, we didn’t yet* have anything like a prototype…which didn’t make the business of pitching any easier – all delegates were also interviewed by a live blogger at the event and I needed to think on my feet in order to present a cogent synopsis of our still nascent project… it will be interesting to see just how close the final deliverables are to this still-theoretical overview.  I was able to speak with our programme manager who was sympathetic about the difficulties we have been having and acknowledged that unforeseen problems like this are likely to arise due to the very nature of Rapid Innovation.

*As we do now have the updated documentation from Thomson Reuters and Mike has been able to return some XML from WoS using the API, a working prototype should be available soon.

I was also able to speak in some detail with @santychumbe from JournalTOCsAPI and in light of our problems, he suggested that we might want to experiment with their API in the meantime – at the very least this would contribute to software testing on another jiscri project and potentially also inform our own API development. However, as we now have XML ourselves from the WoS API and in view of the short timescales we are all working to for these projects, it’s probably unrealistic for us to look at JournalTOCsAPI in any great detail in the context of Bibliosight, though we are, of course, keen to liaise with Santy and his colleagues in the longer term and to input to JournalTOCsAPI in any way we can.

Note: Santy has just announced a release of the JournalTOCsAPI - http://www.journaltocs.hw.ac.uk/index.php?action=api – which I intend to have a closer look at immediately after this post as part of my preparation for next week’s meeting (29th Sept).

As I was so close to home I didn’t stay in the hotel with the other delegates so wasn’t able to partake of after-hours networking – and only belatedly learned that the poker chip in my delegate pack was actually for a free drink; had I realised I could have passed it on to a fellow delegate to the undoubted benefit of their project.  It did mean, however, that I was probably fresher on day two than some of my colleagues who had cashed in their chips, though a hangover might have been a welcome distraction from my impending lightning pitch!

In amongst our own specific and project-centric discussions there were lightning talks from various experts, a panel discussion on agile software development, lots of planned and impromptu software demonstrations (I saw a great SWORD app using the Adobe Air platform from @juliancheal that I want to get my hands on) as well as innumerable spontaneous fora coalescing throughout both packed days…@paulwalk wound things up with an introduction to DevCSI – Developer Community Supporting Innovation and has also posted about the event at http://blog.paulwalk.net/2009/09/04/jisc-rapid-innovation-event/.  All in all, one of the most enjoyable JISC events I have attended.

Posted in Event

Project meeting number 2: Draft agenda

Posted by Nick on August 27, 2009

Date of meeting:  1st September 2009

1.  Apologies

2.  Progress since last meeting

  • API
  • SWOT analysis
  • Project reporting

3.  Liaison with other projects

4.  Use case development

5.  A.O.B.

6.  Date of next meeting

Posted in Agenda, Bibliosight

 