BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Posts Tagged ‘minutes’

Project meeting – minutes

Posted by Nick on November 18, 2009

Present: Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Nick Sheppard

1. Apologies

Sue Rooke

2. Minutes from last meeting and actions

As emphasised at the last meeting, it has not been possible, within our timescale, to engage a suitable academic replacement since Phil Jones left the institution earlier in the project; it is now anticipated that academic staff / researchers will be involved in evaluating the outcomes of the project beyond the formal end of the jiscri programme. WL/NS now have a meeting scheduled (30th November 2009) with Professor Richard Light, the recently appointed Chair of the Carnegie Research Institute, to discuss Bibliosight and the wider repository infrastructure.

NS/PD have done some work on clarifying use cases – see item 4.

Transformation of XML from WoS to LOM format for ingest into intraLibrary. See – http://bibliosightnews.wordpress.com/2009/11/16/mapping-fields-from-wos-api-lom/ – more work still needs to be done in this area. (Action – NS/MT)

AS has updated the schematic diagram to clarify what will be achieved by the end of November. See – http://bibliosightnews.wordpress.com/2009/11/13/332/

NS to contribute project management post to blog on day to day work – ongoing – NS to action ASAP.

PD has contributed a blog post on technical standards used in Bibliosight – http://bibliosightnews.wordpress.com/2009/11/17/the-role-of-standards-in-bibliosight/

3. Update on development of desk-top application

As emphasised at the last meeting, three discrete functional requirements of the desktop application (from now on referred to as Bib App) have been clearly identified:

• Retrieve records from WoS as XML
• Perform an appropriate XSLT transformation to LOM format suitable for ingest to intraLibrary
• Deposit LOM records into intraLibrary using SWORD

MT has been working primarily on stages 1 and 2 and has adopted a pragmatic approach, treating them as two discrete tasks before attempting to integrate the functionality in a single user interface. He has a desktop client that will take XML and perform an XSLT transformation, so once we have clarified the LOM format we require – see http://bibliosightnews.wordpress.com/2009/11/16/mapping-fields-from-wos-api-lom/ – it should be relatively straightforward to plug into the WoS API to retrieve XML from the Web of Science, which can then be transformed into appropriate LOM.
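To illustrate the transformation step (stage 2), here is a minimal Python sketch. Bib App performs this step with XSLT; because the Python standard library has no XSLT processor, the sketch substitutes an equivalent field-by-field mapping using `xml.etree.ElementTree`. The input element names (`record`, `title`, `author`) are hypothetical placeholders rather than the actual WoS API schema, and the output covers only two LOM elements (title and contributors) without the bibliographic extensions we use in intraLibrary.

```python
import xml.etree.ElementTree as ET

LOM_NS = "http://ltsc.ieee.org/xsd/LOM"  # IEEE LOM XML binding namespace
ET.register_namespace("lom", LOM_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the LOM namespace."""
    return f"{{{LOM_NS}}}{tag}"


def wos_to_lom(wos_xml: str) -> ET.Element:
    """Map one (hypothetical) WoS record to a minimal LOM element tree."""
    rec = ET.fromstring(wos_xml)
    lom = ET.Element(q("lom"))

    # general/title/string holds the article title
    general = ET.SubElement(lom, q("general"))
    title = ET.SubElement(general, q("title"))
    ET.SubElement(title, q("string"), {"language": "en"}).text = (
        rec.findtext("title", "").strip())

    # one lifeCycle/contribute/entity per author
    # (simplified: a full LOM binding would wrap each name in a vCard)
    life = ET.SubElement(lom, q("lifeCycle"))
    for author in rec.findall("author"):
        contribute = ET.SubElement(life, q("contribute"))
        ET.SubElement(contribute, q("entity")).text = (author.text or "").strip()
    return lom


# Hypothetical input record, for illustration only.
sample = """<record>
  <title>An example article</title>
  <author>Smith, J</author>
  <author>Jones, P</author>
</record>"""

print(ET.tostring(wos_to_lom(sample), encoding="unicode"))
```

An XSLT stylesheet would express the same mapping declaratively, which makes it easier to adjust as the WoS-to-LOM field mapping is refined.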

Deposit of the LOM into intraLibrary via SWORD should also be fairly straightforward – see – http://bibliosightnews.wordpress.com/2009/11/17/the-role-of-standards-in-bibliosight/ – however, in order to generate clean, consistent LOM, there are still a number of issues to be resolved.
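For stage 3, a SWORD (v1.x) deposit is essentially an authenticated HTTP POST of a package to a collection's deposit URL, with headers such as `X-Packaging` (the package format) and `X-No-Op` (a dry run) defined by the SWORD specification. The sketch below only prepares such a request with `urllib`; the collection URL, packaging URI and credentials are hypothetical placeholders, and intraLibrary's actual deposit endpoint and accepted packaging formats would need to be confirmed with Intrallect.

```python
import base64
import urllib.request


def build_sword_deposit(collection_url: str, payload: bytes, filename: str,
                        packaging: str, content_type: str,
                        user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a SWORD 1.x-style deposit POST.

    collection_url, packaging and the credentials are placeholders: the real
    intraLibrary deposit URL and accepted packaging must come from Intrallect.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": content_type,
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,   # identifies the package format to the server
        "X-No-Op": "true",          # dry run: ask the server not to archive it
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(collection_url, data=payload,
                                  headers=headers, method="POST")


# Hypothetical values, for illustration only.
req = build_sword_deposit(
    "https://repository.example.ac.uk/sword/deposit/research",
    b"<lom/>", "record.xml", "http://example.org/packaging/lom",
    "application/xml", "depositor", "secret")
# Sending it would be: urllib.request.urlopen(req)
```

Setting `X-No-Op` to `true` lets the deposit be tested against the server without creating a record; removing it (or setting it to `false`) would perform the real deposit.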

From a technical perspective, Mike is not a Java programmer* and is working very hard to master the language in order to implement an integrated UI that can unify these three discrete functional areas – the precise functionality of the Bib App will also be informed by developing use cases – see item 4 below.

*The WoS API is Java based which perhaps makes it less accessible than it could be – it may be that JISC wish to make recommendations to Thomson Reuters and others regarding the development of open web services APIs. See – http://blogs.ukoln.ac.uk/good-apis-jisc/

Action: NS/MT to continue to investigate issues around three functional areas

Action: MT to continue developing Bib App – development will necessarily take us beyond the formal end of jiscri projects at the end of November

4. Update on use cases

PD/NS have summarised our three use cases in some detail; these now need writing up in full ASAP (Nick to action).

Particular issues that were identified include:

• In light of progress through the project, UC narratives need to be updated from the now outdated drafts proposed in the original bid
• UCs need to be fully itemised with an ‘actor’ clearly identified for each success scenario
• More thought needs to be given to extensions to each UC

There was particular discussion around UC_2, which centres on targeted communications to researchers to encourage deposit of an appropriate author-produced version of a recently published/cited article. Such a use case will clearly need to identify each publisher’s copyright policy on deposit in an IR: if they do permit deposit, what restrictions / conditions do they impose? For example, a very common restriction is a 12 or 18 month embargo, which would need to be incorporated into the workflow.
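Incorporating an embargo into the workflow mostly comes down to date arithmetic. The sketch below assumes, for illustration, that an embargo runs for a whole number of calendar months from the publication date; real publisher policies may count from a different date (e.g. online versus print publication), so the starting date would have to be taken from each policy.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift d forward by whole calendar months, clamping to the month's end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_lifts(published: date, embargo_months: int) -> date:
    """First date on which the author version could be made openly available."""
    return add_months(published, embargo_months)


def can_release(published: date, embargo_months: int, today: date) -> bool:
    """True once the embargo period has elapsed."""
    return today >= embargo_lifts(published, embargo_months)


print(embargo_lifts(date(2009, 8, 31), 18))  # 2011-02-28 (clamped to month end)
```

An 18 month embargo on an article published on 31st August 2009 would therefore lift on 28th February 2011, since February has no 31st.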

Action: NS to explore use cases in more detail and write up in full.

5. JournalTOCsAPI workshop – 20th November 2009 – Nick attending

NS is attending a workshop being run by the JournalTOCsAPI project on Friday 20th November and has been invited to give a 15 minute presentation on Bibliosight.

The workshop has two main objectives:

1. To learn the techniques/methodologies that professionals managing repositories use to identify new content for their repositories and the potential benefits as well as the shortcomings that they have identified in the JournalTOCsAPI

2. To give an opportunity to repository managers and API developers to learn the thoughts of experts in institutional repositories for efficiently integrating and reusing up-to-date journal TOC RSS feeds within repository systems and forward looking research information systems.

Action: NS to attend and participate as required

6. Project management tasks – project evaluation

The project management task to be addressed on the blog will be project evaluation.

Action: NS/WL to liaise and post on project evaluation

7. Formal end of project

The formal end of the project, in line with the jiscri programme, is the end of November 2009, by which time we are confident we will have a detailed proof of concept for Bibliosight that is well documented on the blog. However, there is still a considerable amount to be done to implement a fully functional Bib App, which would be a valuable outcome for the institution and the sector; work will therefore continue beyond the end of the jiscri project, internal resources allowing.

8. A.O.B.

None

Posted in Bibliosight

Project meeting – minutes

Posted by Nick on October 1, 2009

(Date of meeting 29th September 2009)

Present:  Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Sue Rooke, Nick Sheppard

1.  Apologies

No apologies

2.  Team membership

Thank you to Sue Rooke who has agreed to join the Bibliosight project team; Sue is a research administrator in the Faculty of Health and has already been involved in repository development, contributing to developing workflows and providing feedback on the Open Search interface.  We hope that Sue will contribute, in particular, to use case development.

The team is still lacking a representative from the academic community and we are currently waiting for a reply to recent correspondence. WL is attending the research sub-committee on Monday 5th October and may raise the issue there if necessary.

Action:  WL/NS to pursue academic contact(s) for a representative to sit on the project team

3.  Progress since last meeting

• API

We have now received the updated documentation from Thomson Reuters, and Mike has submitted a query to the API and received an appropriate response in XML. Thomson Reuters’ FAQ gives a full summary of the data fields that can be queried by the service and the data elements that can be returned, which appears to be consistent with this XML response.

We are therefore able to formally reduce the associated risk back to low:

Risk: API unsuitable for project deliverables
Probability: Low (elevated to Medium on 1st September 2009; reduced back to Low on 29th September 2009)
Impact: High
Action to prevent/manage risk: Feedback from Thomson Reuters indicates proposal technically feasible. Problems with API/documentation have been mitigated by release of new documentation from Thomson Reuters (29th September 2009).

N.B.  The wording of the documentation appears to suggest that it is only possible to return 100 records with a single query using the API – NS to clarify with Thomson Reuters. If this is the case, the practical implications are limited for Leeds Metropolitan University, which publishes a relatively small amount of research, but would be considerable for an institution with a greater research output.

Action:  NS to clarify 100 record limit with TR
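If the 100 record limit is confirmed, larger result sets could still be retrieved by issuing several queries, each starting at a later record. The sketch below assumes, without confirmation from the documentation, that the API lets a query start at a given (1-based) record position; it simply works out the batches needed, e.g. for the 1503 records returned by the “leeds met*” address test described below.

```python
def batch_offsets(total: int, page_size: int = 100) -> list[tuple[int, int]]:
    """Return (first_record, count) pairs, 1-based, covering `total` records.

    Assumes (hypothetically) that each query can specify a starting record
    and may return at most `page_size` records.
    """
    batches = []
    for first in range(1, total + 1, page_size):
        batches.append((first, min(page_size, total - first + 1)))
    return batches


batches = batch_offsets(1503)
print(len(batches), batches[-1])  # 16 (1501, 3)
```

At 100 records per query, 1503 records need 16 queries, the last returning only 3 records; for an institution with a much larger output the number of queries grows accordingly.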

Action:  MT to continue appropriate* implementation of API

* Hopefully what is “appropriate” will evolve over the coming weeks!

• Use cases

Technical difficulties have contributed to a lack of conceptual clarity amongst the project team and there was considerable discussion around precisely what data Bibliosight will now seek to retrieve from WoS using the API and what we will aim to achieve with that data.

The bid outlined several original use case narratives; these focussed on an alert service for researchers and/or repository administrators to encourage the deposit of an appropriate full text in the repository, and perhaps neglected the obvious administrative use case whereby metadata from WoS is pulled directly into intraLibrary.

N.B.  An important use case was also the extraction of citation metrics that would potentially inform the REF – we are not yet clear how this would be achieved but we understand it will rely on the Article Match Retrieval service.

Of course, we also want to produce outputs that are of use to the wider community rather than just to users of our specific repository software. This reflects the considerations of the Readiness for REF (R4R) project, which also hopes to enable UK repositories to make effective and efficient use of the WoS API (as part of a much broader project) and is focussing on EPrints, DSpace and Fedora as the most well established OA research repository platforms. R4R raises several pertinent questions, many of which also arose independently and in a similar form during our own discussion:

  • What are the different workflows relevant to (i) backfilling a repository with a one-off download and (ii) ongoing use of WoSAPI to populate a repository?
  • What uses might records downloaded from WoSAPI be put to?
  • How might the workflows be designed to enable other datastreams also to help populate the repository (eg from UK PubMedCentral, arXiv, or sources that better serve the arts, humanities and social sciences)?
  • What workflows might be able to handle facts such as that the WoS record will become available some time after the paper is published, whereas deposit into the repository may happen earlier than that?
  • What methods might be helpful in addressing the inevitable questions of duplicate records, or ambiguous relations with existing records?
  • Are there implications for a repository’s mission and reputation if the balance of content it holds is rapidly changed by a large number of WoS-derived records?
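On the question of duplicate records, particularly where an author deposits before the WoS record appears, one simple approach is to compare incoming WoS records with existing repository records on a DOI (compared case-insensitively) or, failing that, on a normalised title plus publication year. The sketch below is a heuristic of our own choosing, not an R4R or Thomson Reuters method; the record field names are hypothetical, and matches are only flagged as possible duplicates for a person to review.

```python
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_keys(record: dict) -> set[str]:
    """Title+year key always; DOI key too when the record has one."""
    keys = {f"ty:{normalise_title(record['title'])}|{record['year']}"}
    if record.get("doi"):
        keys.add("doi:" + record["doi"].strip().lower())
    return keys


def possible_duplicates(incoming: list[dict], existing: list[dict]) -> list[dict]:
    """Incoming records sharing any match key with an existing record."""
    existing_keys: set[str] = set()
    for rec in existing:
        existing_keys |= match_keys(rec)
    return [rec for rec in incoming if match_keys(rec) & existing_keys]


# Hypothetical records, for illustration only.
existing = [{"title": "Open Access and the REF", "year": 2009, "doi": "10.1000/XYZ123"}]
incoming = [
    {"title": "Open access and the REF.", "year": 2009},               # title match
    {"title": "Different title", "year": 2009, "doi": "10.1000/xyz123"},  # DOI match
    {"title": "Unrelated paper", "year": 2008},
]
print([r["title"] for r in possible_duplicates(incoming, existing)])
```

Matching on title plus year alone can produce false positives (and misses where years differ), which is why the output is a review list rather than an automatic merge.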

Use cases may also be informed by the JournalTOCsAPI project (see item 5 below), which explored similar issues in a recent post.

One practical technical consideration that will have a bearing on developing use cases is the best method of extracting comprehensive records for institution “X”. The most appropriate field to query seems to be the address field, but it is not clear how consistent the institutional address in this field will be: for example, early experimentation found that “leeds metropolitan university” returns only 201 records, whereas a wildcard in the form “leeds met*” returns 1503 records (test conducted 29th September 2009). This was an issue flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).
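One plausible (unconfirmed) explanation for the gap is that WoS addresses are often recorded in abbreviated forms such as “Univ”, so a query for the full word “university” misses them, while “met*” also catches variants like “Leeds Met”. The sketch below models this with a deliberately simplified term matcher; it is not how the WoS search engine actually works (it ignores word order and adjacency), and the sample addresses are hypothetical.

```python
import re


def address_words(address: str) -> list[str]:
    """Split an address into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", address.lower())


def term_matches(term: str, word: str) -> bool:
    """A trailing '*' matches any word starting with the given stem."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def address_matches(query: str, address: str) -> bool:
    """True if every query term matches at least one word of the address."""
    words = address_words(address)
    return all(any(term_matches(term, w) for w in words)
               for term in query.lower().split())


# Hypothetical sample addresses in an abbreviated style.
addresses = [
    "Leeds Metropolitan Univ, Sch Hlth, Leeds LS1 3HE, W Yorkshire, England",
    "Leeds Met Univ, Carnegie Fac Sport & Educ, Leeds, W Yorkshire, England",
    "Leeds Metropolitan University, Leeds, England",
    "Univ Leeds, Sch Med, Leeds, W Yorkshire, England",
]
exact = [a for a in addresses if address_matches("leeds metropolitan university", a)]
wildcard = [a for a in addresses if address_matches("leeds met*", a)]
print(len(exact), len(wildcard))  # 1 3
```

The wildcard is broader but also riskier: “met*” would equally match unrelated words beginning with “met”, so wildcard results may need filtering before records are pulled into the repository.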

In terms of the practicalities of actually getting records from WoS into intraLibrary once they have been harvested, Peter did indicate that it should be possible to upload suitable XML records into intraLibrary though this will need to be in LOM format, meaning that we may need to perform an XSLT transformation to convert data retrieved from WoS into a suitable format.  Also, Peter is uncertain whether XML that can be imported in this way will also include the LOM extensions we are using to accommodate bibliographic information and will need to speak to his technical colleagues at Intrallect to clarify.

Note:  There was also discussion around appropriate integration with SFX, our OpenURL resolver, as a possible means of identifying a published URL for WoS records. This area has scope implications both for Bibliosight and for the remit of the Leeds Metropolitan University repository itself, which could extend beyond an Open Access repository of research to also comprise citation-only records. It may need to be explored in more detail later in the project.

Action:  PD to clarify re upload of XML to intraLibrary including LOM extensions

Action:  NS/BB/SR to meet with another member of the URO to clarify potential use cases (meeting on Thursday 1st October)

Action:  All team members to contribute to ongoing discussion on the blog.

• Project reporting – blog; tags specified by JISC

It was agreed that the specific subject for blog posts this month will be ‘Technical standards’ – Peter agreed to contribute a post before the next meeting.

Action: PD to contribute a blog post on ‘technical standards’.

Action: All team members to contribute to ongoing discussion on the blog.

4.  Visit by Thomson Reuters reps on Wednesday 30th September

Mike and I met with Jon and Gareth from TR on Wednesday 30th (yesterday); they were able to clarify several issues for us – separate post to follow.

5. Review of JournalTOCsAPI – http://www.journaltocs.hw.ac.uk/index.php?action=api

During the meeting, I gave a quick overview of the recently released JournalTOCsAPI at http://www.journaltocs.hw.ac.uk/index.php?action=api, with a view to demystifying the concept of an API for the less technical amongst us and potentially also giving the more technical a developmental steer. Currently, queries need to be submitted to the API by URL and are returned as an RSS feed that includes as much metadata as the original TOC feed, depending on the quality of the original record. The project is comparable to Bibliosight in many respects, but perhaps has greater flexibility regarding the metadata it is able to query and return; it is, after all, building an API from the ground up that will query an openly accessible data source. However, the quality of the data is likely to be less consistent than WoS; there may be fields missing, for example.
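Because results come back as RSS, consuming them takes only a few lines of code. The sketch below is not tied to any particular JournalTOCsAPI query URL (we do not reproduce its syntax here); it parses a feed string and pulls out each item's title, link and `dc:creator`, matching on local element names so that both RSS 1.0 (RDF) and RSS 2.0 layouts work. The sample feed is hypothetical; missing fields come back as empty strings, which is exactly the inconsistency noted above.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def feed_items(rss_xml: str) -> list[dict]:
    """Extract title/link/creator from every <item>, for RSS 1.0 or 2.0."""
    root = ET.fromstring(rss_xml)
    items = []
    for el in root.iter():
        if local(el.tag) != "item":
            continue
        fields = {local(child.tag): (child.text or "").strip() for child in el}
        items.append({
            "title": fields.get("title", ""),
            "link": fields.get("link", ""),
            "creator": fields.get("creator", ""),  # dc:creator, if present
        })
    return items


# Hypothetical RSS 1.0 feed, for illustration only.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/toc"><title>Example Journal</title></channel>
  <item rdf:about="http://example.org/a1">
    <title>First article</title>
    <link>http://example.org/a1</link>
    <dc:creator>Smith, J</dc:creator>
  </item>
</rdf:RDF>"""

print(feed_items(sample))
```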

It has also been informative to engage with another, similar project as a ‘user’. We discussed how Bibliosight might also engage with the JournalTOCsAPI community of users and agreed that it is a valuable opportunity to solicit the opinion of repository managers from other institutions using different software platforms.

Action:  NS to continue engaging with JournalTOCsAPI as a ‘user’

Action:  NS to send an email that can be forwarded to JournalTOCsAPI community of users as suggested in recent correspondence from Lisa Rogers

6.  Article Match Retrieval & Researcher ID

These were only touched upon briefly in the meeting and flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).

7.  A.O.B.

None

8.  Date of next meeting

20th October 2009 – 11:30 am

Posted in Bibliosight, Progress post, SCRUM minutes

Project meeting – minutes

Posted by Nick on July 14, 2009

Present:  Charles Duncan, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Nick Sheppard

1.  Apologies

Phil Jones sent his apologies.

Peter Douglas sent his apologies – Charles Duncan attending from Intrallect in his stead.

2.  Project overview

WL chaired the meeting and began by presenting an overview of the proposed project: to exploit the Web of Science web-services API in order to promote full text deposit of author versions of published peer reviewed research papers in the Leeds Met repository; to develop an alerting service to notify the repository team/URO when a research paper associated with Leeds Met is picked up by WoS; to send automated communications to researchers alerting them to the presence of their citation on Web of Science and requesting an author version for the repository; and potentially also to import metadata from WoS to automatically populate the repository.

3. Project management and meetings

The project is funded under the JISC Rapid Innovation programme (tag: JISCRI; programme code repository and wiki at http://code.google.com/p/jiscri/) and is due to complete at the end of November 2009.  A rapid development cycle is therefore essential and will be based on the SCRUM methodology recommended by JISC.

  • Team and roles

The team of 6 people comprises:

a) Members responsible for project deliverables

Wendy Luker – Project Manager (or SCRUM master); Arthur Sargeant – Project consultant; Mike Taylor – Web-developer responsible for technical development; Nick Sheppard – Repository Development Officer responsible for project research; Peter Douglas – representative of Intrallect

b) Representative stakeholders who will inform development and potentially benefit from project deliverables.

Babita Bhogal – represents the University Research Office; a potential customer/user of project deliverables; Phil Jones – represents the Carnegie Research Institute; a potential customer/user of project deliverables.

There will be 5 “sprint” cycles; at the end of each cycle there will be a full team meeting to review progress and technical development.  In addition NS/MT will liaise more closely throughout the sprint cycle including face to face on a weekly basis – these meetings may also include WL, AS as necessary.

N.B.  The JISC programme manager has indicated that Bibliosight could benefit from work being done at King's College with the R4R (Readiness for REF) project and should also liaise with another JISCRI project, based at Heriot-Watt University, that is building an API for ticTocs.

Action:  NS – investigate / establish contact with these projects and provide a detailed overview before the next meeting.

To reflect the scale of projects under the programme, JISC are advocating a light-weight reporting framework utilising the blog as the primary mechanism.  It is anticipated that all team members will contribute to the blog and that the subject for posts will be specified at each meeting in line with 6 subject areas specified by JISC.  These are:  Project SWOT analyses; User participation; Day to day work; Technical standards; Value add; Small wins and fails; Progress report.

Aggregation tag for the project is #bibliosight (blog posts and Twitter updates).  Other relevant tags are #JISCRI, #SWOT, #rapidInnovation, #progressPosts, #UseCase

Action:  NS/WL – blog initial SWOT analysis in advance of next meeting.

Action:  NS – ensure all team members have administrative access to the blog.

  • Technical

The first workpackage is a “full technical review of Web of Science Web Services API / technical developments required to appropriately integrate API into repository” with the time scale June-July 2009.

NS/MT recently attended a webinar run by Thomson Reuters, an Introduction to Thomson Reuters Research Evaluation Tools, which reviewed the API. MT has also reviewed the API documentation, has gained the appropriate administrative permissions to run a Java programming environment on his local machine and is now in a position to explore the API in more detail. MT may require technical input from Java programmers at Intrallect, and CD confirmed that this would be acceptable under the terms of the bid.

A code repository has also been set up in line with JISC guidelines at http://code.google.com/p/bibliosight/. This is where any code produced by the project will be stored, subject to appropriate Open Source licensing (see below), and it will also hold all documentation and bug tracking. The version control system implemented is Subversion; as the only developer currently associated with the project, MT is the only user who requires full access.

There was also some preliminary discussion around how the API will most appropriately be integrated into the Leeds Met repository; whether WoS data will be pulled directly into intraLibrary or into an external environment for example and what the implications of this might be eg. prototype proof of concept build of intraLibrary.  However, it was decided that initial focus should be on manually mapping the process and on the API itself before these issues can usefully be explored further.

CD raised a technical question regarding the API: whether the interface supports only SOAP or whether it also supports REST, which would potentially provide a lower technical threshold.
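The difference in technical threshold can be seen by expressing the same request both ways. In the sketch below, a SOAP call needs an XML envelope that is POSTed to the service (in practice usually generated from a WSDL by tooling), whereas a REST-style call can be a single GET URL that can even be tried in a browser. The operation and parameter names (`search`, `query`, `firstRecord`, `count`), the namespace and the URLs are all hypothetical placeholders, not the WoS API; only the SOAP 1.1 envelope namespace is standard.

```python
import urllib.parse
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/hypothetical-search"   # placeholder, not a real WoS namespace
ET.register_namespace("soap", SOAP_ENV)


def soap_request_body(query: str, first: int, count: int) -> bytes:
    """Wrap a hypothetical 'search' operation in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    search = ET.SubElement(body, f"{{{SERVICE_NS}}}search")
    ET.SubElement(search, f"{{{SERVICE_NS}}}query").text = query
    ET.SubElement(search, f"{{{SERVICE_NS}}}firstRecord").text = str(first)
    ET.SubElement(search, f"{{{SERVICE_NS}}}count").text = str(count)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def rest_request_url(base: str, query: str, first: int, count: int) -> str:
    """The same hypothetical request expressed as a REST-style GET URL."""
    params = urllib.parse.urlencode({"query": query, "firstRecord": first, "count": count})
    return f"{base}?{params}"


print(soap_request_body("leeds met*", 1, 100).decode("utf-8"))
print(rest_request_url("https://api.example.org/search", "leeds met*", 1, 100))
```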

Action:  NS – full review of Thomson Reuters services; article match and retrieve; Web-services lite; Researcher ID upload; Researcher ID download; Web-services premium.  Disambiguation of free vs. paid services.  SOAP vs. REST

Action:  MT – explore / implement API and document process.  Establish precisely what information can be extracted from WoS using the API.

Action:  NS/MT/AS – manually review WoS to elucidate desired process i.e. What information do we want and what information can we get a) manually b) programmatically (free vs. paid)

  • User testing and engagement

This will be facilitated through appropriate liaison with BB (URO) and PJ (CRI) and will initially focus on communication – NS is attending the CRI Readers’ and Professors’ meeting on Thursday 16th July – and generating use-cases and scenarios, possibly in collaboration with Intrallect who have experience and expertise in this area.

Action:  NS to attend CRI Readers’ and professors’ meeting on Thursday 16th July for initial communication and feedback.

Action:  NS/BB/PD to liaise to generate preliminary use-cases/scenarios.

4.  Licensing

Software/code/project deliverables are to be made available under appropriate licence agreements in line with JISC guidelines.  The licence provisionally applied at http://code.google.com/p/bibliosight/ is GNU GENERAL PUBLIC LICENSE Version 3 – http://www.gnu.org/copyleft/gpl.html.  This may or may not be suitable for our software requirements; other project deliverables may require different licensing models; requires further research.

Action:  NS to research; liaise with OSS watch to clarify licensing issues

5.  AOB

Administrative housekeeping (unminuted)

6.  Date(s) of next meeting(s)

Given the short project lifecycle, it was decided that provisional/approximate dates should be outlined for all remaining meetings:

  • w/c 31st August 2009 (bank holiday Monday)
  • Late September
  • Mid-late October
  • Early November
  • Last week in November

Action:  NS to call next meeting w/c 31st August 2009

Posted in Bibliosight, SCRUM minutes

 