BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Archive for the ‘SCRUM minutes’ Category

Project meeting – minutes

Posted by wendyluker on November 11, 2009

Minutes of the Bibliosight Meeting

Tuesday 20th October 2009

1.  Apologies

Nick, Sue, Babita

2.  Minutes of the last meeting, and actions

Actions :

WL/NS to pursue academic contacts for a representative: this has been ongoing, but at this stage of the project it seems unlikely that we will now get a representative. Academic staff and researchers will, however, be involved in evaluating the outcomes of the project.

PD to clarify upload of XML to intraLibrary including LOM extensions – Peter confirmed that this could be done.

NS/BB/SR to meet with another member of the URO to clarify potential use cases: Wendy reported that Nick had met with Sue Rooke and Sam Armitage, and work had been done on use cases.  Nick would be able to clarify this on his return to work.

PD to contribute a blog post on technical standards: ongoing.
New action: Wendy to send Peter the required tags for the post.

All team members to contribute to on-going discussion on the blog – reiterated!

3. Update on meeting with Thomson Reuters

Mike updated the group on the meeting with Thomson Reuters. We have access to the unrestricted API, but we are not entitled to use it to a greater extent than would be provided by the Web Services Lite version. It appeared that the 100-record limit may not be an issue after all; in any case, if we download the initial set of records year by year, it should not present a problem. Wendy and Arthur reported on some testing of the Web of Science search interface that they had been doing to check whether the ‘Leeds Metropolitan Univ’ search would be sufficiently robust, and it appeared to be so.
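A minimal sketch of the year-by-year approach follows. It assumes a cap of 100 records per query, which is the limit discussed at the previous meeting. It also assumes that the number of hits for each publication year is already known, for example from an initial search. The function names and hit counts are hypothetical and are not part of the Thomson Reuters API.

```python
# Hypothetical helper: plan a year-by-year download so that each query stays
# within an assumed per-query cap of 100 records.
MAX_RECORDS_PER_QUERY = 100


def plan_yearly_download(
    hits_per_year: dict[int, int], limit: int = MAX_RECORDS_PER_QUERY
) -> tuple[list[int], list[int]]:
    """Split years into those that fit in a single query and those that do not."""
    fits: list[int] = []
    too_big: list[int] = []
    for year in sorted(hits_per_year):
        if hits_per_year[year] <= limit:
            fits.append(year)
        else:
            too_big.append(year)
    return fits, too_big


def page_windows(total_hits: int, limit: int = MAX_RECORDS_PER_QUERY) -> list[tuple[int, int]]:
    """Fallback for a year over the cap: 1-based (first_record, count) windows."""
    windows: list[tuple[int, int]] = []
    first = 1
    while first <= total_hits:
        count = min(limit, total_hits - first + 1)
        windows.append((first, count))
        first += count
    return windows


if __name__ == "__main__":
    # Hypothetical per-year hit counts for an address search.
    counts = {2007: 84, 2008: 97, 2009: 130}
    fits, too_big = plan_yearly_download(counts)
    print("single query:", fits)        # [2007, 2008]
    print("needs paging:", too_big)     # [2009]
    print(page_windows(counts[2009]))   # [(1, 100), (101, 30)]
```

If a single year ever exceeded the cap, `page_windows` shows how that year could be split further. This assumes the service accepts a starting record and a count, which would need checking against the Thomson Reuters documentation.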

We will need to display WofS / Thomson Reuters terms and conditions alongside any material retrieved from WofS.  There is a place in LOM for this.

4. Update on Use Cases

The use cases will be a useful output of the project, and need further work at this stage, e.g. we need to ensure we capture the information around the intended alerting service: at what point will individuals be alerted? Where will the alert come from?
More work is also needed on cataloguing workflows and on how we will deal with the initial 1485 items that will be downloaded.

5. API – next steps in the development

Mike updated the group on progress with the API.

At this stage we can:

  • Get records out of WofS
  • Transform them into XML
    Action: Nick to clarify what the LOM XML should look like
  • Load them into intraLibrary

Mike needed several decisions to be made before he could progress further:

Would the process for downloading be manual or automated? MANUAL

Would the client be desktop or web-based? DESKTOP

It was also decided that the XSLT should be easy to swap out, so that records can be output in different formats for other interfaces, for example EndNote or another repository. This would benefit the rest of the community.
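The sketch below illustrates the "swappable output" idea. Plain Python functions stand in for the XSLT stylesheets the project planned to use, and a registry picks one by name. The record fields are hypothetical and loosely follow the Web Services Lite field list. RIS is used as an example of a tagged format that EndNote can import; the XML output is a generic placeholder, not any particular repository's import schema.

```python
# Sketch of a pluggable output stage: functions stand in for swappable XSLT
# stylesheets. Record fields are hypothetical.
from dataclasses import dataclass, field
from typing import Callable
import xml.etree.ElementTree as ET


@dataclass
class Record:
    ut: str  # unique WoS record identifier
    title: str
    authors: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def to_ris(rec: Record) -> str:
    """RIS text, a tagged format that EndNote and similar tools can import."""
    lines = ["TY  - JOUR"]
    lines += [f"AU  - {name}" for name in rec.authors]
    lines.append(f"TI  - {rec.title}")
    lines += [f"KW  - {kw}" for kw in rec.keywords]
    lines.append("ER  - ")
    return "\n".join(lines)


def to_simple_xml(rec: Record) -> str:
    """A generic XML rendering, e.g. as input to another repository's importer."""
    root = ET.Element("record", {"ut": rec.ut})
    ET.SubElement(root, "title").text = rec.title
    for name in rec.authors:
        ET.SubElement(root, "author").text = name
    for kw in rec.keywords:
        ET.SubElement(root, "keyword").text = kw
    return ET.tostring(root, encoding="unicode")


# Registry of output formats: supporting a new target means adding one entry.
FORMATTERS: dict[str, Callable[[Record], str]] = {
    "ris": to_ris,
    "xml": to_simple_xml,
}


def export(records: list[Record], fmt: str) -> list[str]:
    """Render every record with the chosen formatter."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        raise ValueError(f"unknown output format: {fmt!r}") from None
    return [formatter(rec) for rec in records]


if __name__ == "__main__":
    demo = Record(ut="000000000000001", title="An example article",
                  authors=["Smith, J"], keywords=["repositories"])
    print(export([demo], "ris")[0])
```

Keeping the retrieval step independent of the formatter means a new target only needs a new transform. The rest of the desktop client would not change.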

The group discussed the diagram that Nick had put up on the Blog recently, with regard to the intended scope of the current project, and which tasks might be part of further developments.

Action: Arthur to update the diagram to make it clear what would be achieved by the end of November (encompassing the intended outputs of the original project) and what the future developments might be.

6. Project management tasks: technical standards and value add

The next project management topic to be addressed on the blog will be ‘Day to day work’.

Action: Nick, on his return.

Peter will supply a blog post on technical standards.

Action: Wendy to send Peter the appropriate tags.

7. Other business

There was no other business

8. Date and time of next meeting

The next meeting will be held on Tuesday 17th November, starting at 1pm.
Peter will arrive at approx. 11am for a pre-meeting with Nick (and others) about use cases.

Posted in Bibliosight, SCRUM minutes

Project meeting – minutes

Posted by Nick on October 1, 2009

(Date of meeting 29th September 2009)

Present:  Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Sue Rooke, Nick Sheppard

1.  Apologies

No apologies

2.  Team membership

Thank you to Sue Rooke who has agreed to join the Bibliosight project team; Sue is a research administrator in the Faculty of Health and has already been involved in repository development, contributing to developing workflows and providing feedback on the Open Search interface.  We hope that Sue will contribute, in particular, to use case development.

The team is still lacking a representative from the academic community and we are currently waiting for a reply to recent correspondence. WL is attending the research sub-committee on Monday 5th October and may raise the issue there if necessary.

Action:  WL/NS to pursue academic contact(s) for a representative to sit on the project team

3.  Progress since last meeting

• API

We have now received the updated documentation from Thomson Reuters, and Mike has submitted a query to the API and received an appropriate response in XML. Thomson Reuters’ FAQ gives a full summary of the data fields that can be queried by the service and the data elements that can be returned, which appears consistent with this XML response.

We are therefore able to formally reduce the associated risk back to low:

Risk: API unsuitable for project deliverables
Probability: Low (elevated to Medium, 1st September 2009; reduced back to Low, 29th September 2009)
Impact: High
Action to prevent/manage risk: Feedback from Thomson Reuters indicates proposal technically feasible. Problems with API/documentation have been mitigated by release of new documentation from Thomson Reuters (29th September 2009).

N.B. The wording of the documentation appears to suggest that it is only possible to return 100 records with a single query using the API; NS to clarify with Thomson Reuters. If this is the case, the practical implications are limited for Leeds Metropolitan University, which publishes a relatively small amount of research, but would be considerable for an institution with a greater research output.

Action:  NS to clarify 100 record limit with TR

Action:  MT to continue appropriate* implementation of API

* Hopefully what is “appropriate” will evolve over the coming weeks!

• Use cases

Technical difficulties have contributed to a lack of conceptual clarity amongst the project team and there was considerable discussion around precisely what data Bibliosight will now seek to retrieve from WoS using the API and what we will aim to achieve with that data.

The bid outlined several use case narratives. These focussed on an alert service for researchers and/or repository administrators to encourage the deposit of an appropriate full text in the repository, and perhaps neglected the obvious administrative use case in which metadata from WoS is pulled directly into intraLibrary.

N.B.  An important use case was also the extraction of citation metrics that would potentially inform the REF – we are not yet clear how this would be achieved but we understand it will rely on the Article Match Retrieval service.

Of course, we also want to produce outputs that are of use to the wider community rather than just to users of our specific repository software. This reflects the considerations of the Readiness for REF (R4R) project, which also hopes to enable UK repositories to make effective and efficient use of the WoS API (as part of a much broader project) and is focussing on EPrints, DSpace and Fedora as the most well-established OA research repository platforms. R4R raises several pertinent questions, many of which also arose independently, in a similar form, during our own discussion:

  • What are the different workflows relevant to (i) backfilling a repository with a one-off download and (ii) ongoing use of WoSAPI to populate a repository?
  • What uses might records downloaded from WoSAPI be put to?
  • How might the workflows be designed to enable other datastreams also to help populate the repository (eg from UK PubMedCentral, arXiv, or sources that better serve the arts, humanities and social sciences)?
  • What workflows might be able to handle facts such as that the WoS record will become available some time after the paper is published, whereas deposit into the repository may happen earlier than that?
  • What methods might be helpful in addressing the inevitable questions of duplicate records, or ambiguous relations with existing records?
  • Are there implications for a repository’s mission and reputation if the balance of content it holds is rapidly changed by a large number of WoS-derived records?
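On the question of duplicate or ambiguous records, one simple starting point is to compare normalised titles. The sketch below does this; matching on titles is an assumption chosen for illustration, and a real workflow might also compare DOIs, publication years or author lists. Identifiers and titles in the example are hypothetical.

```python
# Sketch: flag incoming WoS records whose normalised title matches an existing
# repository record. Title-only matching is an illustrative assumption.
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Fold accents, lowercase, drop punctuation and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))


def find_duplicates(incoming: dict[str, str], existing: dict[str, str]) -> dict[str, list[str]]:
    """Map each incoming record ID to existing record IDs with the same normalised title.

    More than one match signals an ambiguous relation needing a human decision.
    """
    index: dict[str, list[str]] = {}
    for repo_id, title in existing.items():
        index.setdefault(normalise_title(title), []).append(repo_id)
    matches: dict[str, list[str]] = {}
    for wos_id, title in incoming.items():
        hits = index.get(normalise_title(title))
        if hits:
            matches[wos_id] = hits
    return matches


if __name__ == "__main__":
    print(find_duplicates({"UT1": "Open Access, Revisited"},
                          {"repo-7": "open access revisited"}))
```

Returning a list of candidates for each incoming record keeps ambiguous matches visible rather than silently choosing one.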

Use cases may also be informed by the JournalTOCsAPI project (see item 5 below), which also explored similar issues in a recent post.

One practical consideration from a technical perspective, which will have a bearing on developing use cases, is the best method of extracting comprehensive records for institution “X”. The most appropriate field to query seems to be the address field, but it is not clear how consistent the institutional address in this field will be. For example, early experimentation found that “leeds metropolitan university” returns only 201 records, whereas the wildcard query “leeds met*” returns 1503 records (test conducted 29th September 2009). This issue was flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).
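To illustrate why the wildcard query picks up so many more records, the sketch below matches an exact phrase and a trailing-wildcard query against variant forms of an address. It imitates, rather than reproduces, Web of Science search behaviour. The sample addresses are invented and are not taken from the 201/1503-record test.

```python
# Sketch: approximate how an exact phrase and a trailing-wildcard query match
# variant forms of an institutional address. Addresses are invented.
import re


def query_to_regex(query: str) -> re.Pattern[str]:
    """Turn e.g. 'leeds met*' into a case-insensitive, word-bounded regex."""
    parts = []
    for word in query.lower().split():
        if word.endswith("*"):
            parts.append(re.escape(word[:-1]) + r"\w*")  # trailing * = any word ending
        else:
            parts.append(re.escape(word))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def count_matches(query: str, addresses: list[str]) -> int:
    """Count addresses containing a match for the query."""
    pattern = query_to_regex(query)
    return sum(1 for address in addresses if pattern.search(address))


SAMPLE_ADDRESSES = [
    "Leeds Metropolitan University, Leeds, England",
    "Leeds Metropolitan Univ, Sch Hlth, Leeds, England",
    "Leeds Met Univ, Leeds, England",
    "Univ Leeds, Leeds, England",
]

if __name__ == "__main__":
    print(count_matches("leeds metropolitan university", SAMPLE_ADDRESSES))  # 1
    print(count_matches("leeds met*", SAMPLE_ADDRESSES))                     # 3
```

The wildcard trades precision for recall. It could also match other names beginning "Met", so wildcard results may still need checking.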

In terms of the practicalities of getting records from WoS into intraLibrary once they have been harvested, Peter indicated that it should be possible to upload suitable XML records into intraLibrary. These will need to be in LOM format, however, so we may need to perform an XSLT transformation to convert data retrieved from WoS into a suitable format. Peter is also uncertain whether XML imported in this way can include the LOM extensions we are using to accommodate bibliographic information, and will need to speak to his technical colleagues at Intrallect to clarify.
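As an illustration of the kind of mapping an XSLT would perform, the sketch below builds a minimal LOM document from a Web Services Lite style record using Python's ElementTree. Several things here are assumptions:

* the namespace URI, which is assumed to be the IEEE LOM XML binding namespace;
* the "WOS" catalogue label, which is hypothetical;
* the choice of elements.

intraLibrary's actual import profile and the project's bibliographic LOM extensions are not represented. This is why the Source field (volume, issue, pages) is left out.

```python
# Sketch: map a Web Services Lite style record onto a minimal LOM document.
# Assumed: the LOM namespace URI below, the hypothetical "WOS" catalogue label,
# and the element choices. Bibliographic LOM extensions are not modelled.
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

LOM_NS = "http://ltsc.ieee.org/xsd/LOM"
ET.register_namespace("", LOM_NS)  # serialise LOM as the default namespace


def _q(tag: str) -> str:
    """Qualify a tag name with the LOM namespace."""
    return f"{{{LOM_NS}}}{tag}"


@dataclass
class LiteRecord:
    ut: str
    title: str
    authors: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def _langstring(parent: ET.Element, tag: str, text: str) -> None:
    """Add <tag><string language="en">text</string></tag> under parent."""
    wrapper = ET.SubElement(parent, _q(tag))
    ET.SubElement(wrapper, _q("string"), {"language": "en"}).text = text


def _vcard_escape(value: str) -> str:
    """Escape backslash, comma and semicolon in a vCard text value."""
    return value.replace("\\", "\\\\").replace(",", "\\,").replace(";", "\\;")


def to_lom(rec: LiteRecord) -> ET.Element:
    lom = ET.Element(_q("lom"))
    general = ET.SubElement(lom, _q("general"))
    identifier = ET.SubElement(general, _q("identifier"))
    ET.SubElement(identifier, _q("catalog")).text = "WOS"  # hypothetical label
    ET.SubElement(identifier, _q("entry")).text = rec.ut
    _langstring(general, "title", rec.title)
    for kw in rec.keywords:
        _langstring(general, "keyword", kw)
    life_cycle = ET.SubElement(lom, _q("lifeCycle"))
    for name in rec.authors:
        contribute = ET.SubElement(life_cycle, _q("contribute"))
        role = ET.SubElement(contribute, _q("role"))
        ET.SubElement(role, _q("source")).text = "LOMv1.0"
        ET.SubElement(role, _q("value")).text = "author"
        # LOM holds contributors as vCard text; this simplified card sets FN only.
        ET.SubElement(contribute, _q("entity")).text = (
            f"BEGIN:VCARD\nVERSION:3.0\nFN:{_vcard_escape(name)}\nEND:VCARD"
        )
    return lom


if __name__ == "__main__":
    doc = to_lom(LiteRecord("000000000000001", "An example article",
                            ["Smith, J"], ["repositories"]))
    print(ET.tostring(doc, encoding="unicode"))
```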

Note: There was also discussion around appropriate integration with SFX, our OpenURL resolver, as a possible means of identifying a published URL for WoS records. This has scope implications both for Bibliosight and for the remit of the Leeds Metropolitan University repository itself, which could extend beyond an Open Access repository of research to also comprise citation-only records. This area may need to be explored in more detail later in the project.
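For context on the SFX discussion, the sketch below shows the kind of link an OpenURL resolver receives. It builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article. The resolver base URL is a placeholder and not Leeds Met's real SFX address. Which keys SFX actually needs to resolve a WoS record would have to be tested.

```python
# Sketch: build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article.
# RESOLVER_BASE is a placeholder, not a real SFX instance.
from urllib.parse import urlencode

RESOLVER_BASE = "https://sfx.example.ac.uk/sfx_local"  # hypothetical


def openurl_for_article(atitle: str, jtitle: str, date: str,
                        volume: str = "", spage: str = "", issn: str = "") -> str:
    """Return a resolver link carrying the article metadata as KEV pairs."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.date": date,
        "rft.volume": volume,
        "rft.spage": spage,
        "rft.issn": issn,
    }
    # Omit empty keys rather than sending blank values to the resolver.
    query = urlencode({k: v for k, v in params.items() if v})
    return f"{RESOLVER_BASE}?{query}"


if __name__ == "__main__":
    print(openurl_for_article("An example article", "Journal of Examples",
                              "2009", volume="12", spage="34"))
```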

Action:  PD to clarify re upload of XML to intraLibrary including LOM extensions

Action:  NS/BB/SR to meet with another member of the URO to clarify potential use cases (meeting on Thursday 1st October)

Action:  All team members to contribute to ongoing discussion on the blog.

• Project reporting – blog; tags specified by JISC

It was agreed that the specific subject for blog posts this month will be ‘Technical standards’ – Peter agreed to contribute a post before the next meeting.

Action: PD to contribute a blog post on ‘technical standards’.

Action: All team members to contribute to ongoing discussion on the blog.

4.  Visit by Thomson Reuters reps on Wednesday 30th September

Mike and I met Jon and Gareth from TR on Wednesday 30th (yesterday); they were able to clarify several issues for us (separate post to follow).

5. Review of JournalTOCsAPI – http://www.journaltocs.hw.ac.uk/index.php?action=api

During the meeting, I gave a quick overview of the recently released JournalTOCsAPI at http://www.journaltocs.hw.ac.uk/index.php?action=api. The aim was to demystify the concept of an API for the less technical amongst us and, potentially, to give the more technical a developmental steer.

Currently, queries are submitted to the API by URL and results are returned as an RSS feed. The feed includes as much metadata as the original TOC feed, depending on the quality of the original record.

The project is comparable to Bibliosight in many respects, and perhaps has greater flexibility regarding the metadata it is able to query and return: it is, after all, building an API from the ground up to query an openly accessible data source. However, the quality of its data is likely to be less consistent than WoS; there may be fields missing, for example.
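To make the "query by URL, get RSS back" model concrete, the sketch below reads the items from a feed and tolerates missing fields. It assumes plain RSS 2.0; an RSS 1.0/RDF feed would need namespace-aware lookups. The sample feed is invented, and no JournalTOCs request parameters are assumed.

```python
# Sketch: read items from an RSS 2.0 feed such as a table-of-contents feed.
# Assumes plain RSS 2.0 (<rss><channel><item>); missing fields become "".
import xml.etree.ElementTree as ET


def parse_rss_items(rss_text: str) -> list[dict[str, str]]:
    """Return title, link and description for every <item> in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            tag: (item.findtext(tag) or "").strip()
            for tag in ("title", "link", "description")
        })
    return items


SAMPLE_FEED = """<rss version="2.0"><channel><title>Example TOC</title>
<item><title>First article</title><link>http://example.org/a1</link>
<description>Abstract text</description></item>
<item><title>Second article</title><link>http://example.org/a2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], "|", entry["description"] or "(no description)")
```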

It has also been informative to engage with another, similar project as a ‘user’. We discussed how Bibliosight might also engage with the JournalTOCsAPI community of users, and agreed that it is a valuable opportunity to solicit the opinions of repository managers from other institutions using different software platforms.

Action:  NS to continue engaging with JournalTOCsAPI as a ‘user’

Action: NS to send an email that can be forwarded to the JournalTOCsAPI community of users, as suggested in recent correspondence from Lisa Rogers

6.  Article Match Retrieval & Researcher ID

These were only touched upon briefly in the meeting and flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).

7.  A.O.B.

None

8.  Date of next meeting

20th October 2009 – 11:30 am

Posted in Bibliosight, Progress post, SCRUM minutes

Project meeting – minutes

Posted by Nick on September 9, 2009

Present: Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Nick Sheppard

1. Apologies

Phil Jones sent his apologies. Phil is leaving Leeds Met in October, and it was agreed that the project team should formally approach another academic member of staff as a replacement. In addition, it was thought that the project team would benefit from a research administration perspective; potential candidates were suggested and will be approached in due course. This will complete the membership of the project team as outlined in the original bid and plan.

Action: NS to formally approach named individuals

2. Progress since last meeting

• API

Having followed the documentation supplied by Thomson Reuters for the API, we are getting an error that we have been unable to interpret due to a lack of specific expertise in Java. MT suspected that there may in fact be a problem with the API itself and/or the documentation.

On enquiry, Thomson Reuters indicated that the documentation is a little out of date. They said they are on the verge of releasing new versions of all their Web Services documentation, hopefully by the end of August. In response to a follow-up enquiry in advance of this meeting (28th August), Thomson Reuters were not yet able to say when the new documentation will be released.

Thomson Reuters are not able to offer formal technical support with respect to implementing the API; they did indicate that they would pass on the error message to their developers but suggested that we would also be advised to identify our own resources to resolve the problem. (NB. We have now identified a colleague with the appropriate Java skills who has agreed to look at the current API documentation and help us interpret the error.)

The project team needs to come to a fuller understanding of the differences between the various services offered by Thomson Reuters, who have recently defined Web Services Lite and Web Services Premium for us as follows:

“Web Services Lite: This service responds to queries to return a limited range of data elements from the Web of Science. The fields are Author, Source (volume, number, issue, date, page span), Article Title, Keywords, and UT (a unique record identifier). The primary use case for the Lite service is to populate institutional repositories and is scheduled to be made available within the next two-to-three weeks. This service is free.”

“Web Services Premium: This service is a much more robust version of WS Lite and is very similar to the API we sent you earlier. The primary differences are that the service needs to be entitled and has much, much better documentation. WS Premium is scheduled to be available within the next month to six weeks. I’m not sure what the price is (if any) for the service, but we hope to have that sorted out in the very near future.”

We hope that the updated documentation will help to clarify the situation for us. The fact that it is not yet available is obviously of some concern in such a short project. It was agreed that, should the new WS documentation from Thomson Reuters not become available before the next scheduled meeting (Tuesday 29th September 2009), major deliverables of the project may be seriously compromised. To reflect this, the associated risk has been officially elevated and will be fed back to JISC via the programme manager:

Risk: API unsuitable for project deliverables
Probability: Low (elevated to Medium, 1st September 2009)
Impact: High
Action to prevent/manage risk: Feedback from Thomson Reuters indicates proposal technically feasible. Problems with API/documentation should be mitigated by forthcoming release of new documentation from Thomson Reuters (1st September 2009).

Action:  NS to liaise with JISC via programme manager to emphasise elevated risk

Action:  NS to continue to liaise with Thomson Reuters regarding new API documentation

• SWOT analysis
SWOT analysis was emphasised as an ongoing process throughout the lifetime of the project, in line with JISC guidelines for Rapid Innovation projects. The project team were reminded that they can continue to contribute via the blog or via the PollDaddy questionnaire at http://surveys.polldaddy.com/s/5768FF905C3EB6E7/. The most recent SWOT post on the blog is dated August 13th 2009.

The most serious current threat is the ongoing problem implementing the API. This is an external technological threat that is difficult to mitigate because it is outside our direct control.

Action: Continue to undertake and document SWOT analyses throughout the lifetime of the project

• Project reporting

WL emphasised again the lightweight approach to project reporting adopted by JISC for Rapid Innovation projects, using the blog as the primary mechanism.
Feedback from the JISC programme manager has been positive, acknowledging that the blog is updated regularly and is of a high quality; it was acknowledged that most members of the project team have contributed to the blog and they are asked to continue to do so.

Moreover, the benefits of open project reporting have been illustrated by appropriate liaison with other projects; even the difficulties we have experienced in implementing the API are useful to others who will be able to learn from the problems we have encountered.

Action: Project team to continue to contribute regular progress posts to the blog
3. Liaison with other projects

• Readiness for REF – http://www.kcl.ac.uk/iss/cerch/projects/portfolio/r4r.html

The project manager for R4R, Stephen Grace, very kindly answered recent correspondence acknowledging that Andy McGregor had spoken to him about Bibliosight and giving an overview of relevant aspects of their project; as far as I understand, the full remit of R4R is somewhat broader than Bibliosight. An element of the project, however, is work that will enable UK repositories to make effective and efficient use of the Web of Science API which is directly comparable to Bibliosight.

N.B. R4R explicitly state that their work in this area will benefit at least EPrints, DSpace and Fedora, which are the most widely used repository platforms across the sector. Our own platform, of course, is intraLibrary. Depending on how the API is implemented by the respective projects, there is likely to be considerable crossover; like R4R, we also aim to deliver outputs that are of wider use across the sector.

Stephen emphasised that work on the API and associated workflows has not yet begun; R4R is a two-year project running from April 2009 to March 2011. There is therefore unlikely to be much scope for direct liaison between the projects, though there is still an opportunity for valuable communication throughout and beyond the end of our project.

• JournalTOCsAPI – http://www.journaltocs.hw.ac.uk/API/blog/

The JournalTOCsAPI project team has been working to recruit volunteers from across the sector to test their API and, in recent correspondence, recognised the potential synergies between our projects and acknowledged that, when their community is established, it will be useful to explore how Bibliosight can also engage with them. JournalTOCsAPI anticipate having a prototype of their API ready for testing some time in September; for Bibliosight, however, our technical problems mean that a prototype is unlikely to be ready for testing before mid to late October at the earliest.

Action: NS to continue liaising with R4R and JournalTOCsAPI as appropriate

4. Use case development

Given technical problems implementing the API, it seems sensible to focus on use case development; NS and BB/URO have begun to liaise in this regard.
In theory the lack of a working prototype is no barrier to developing detailed use cases; indeed it is desirable to define functional requirements entirely independently of software development.

Action: NS/BB to liaise to develop detailed use cases

5. A.O.B.

None

6. Date of next meeting

Tuesday 29th September 2009

Posted in SCRUM minutes

Project meeting – minutes

Posted by Nick on July 14, 2009

Present:  Charles Duncan, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Nick Sheppard

1.  Apologies

Phil Jones sent his apologies.

Peter Douglas sent his apologies – Charles Duncan attending from Intrallect in his stead.

2.  Project overview

WL chaired the meeting and began by presenting an overview of the proposed project. Its aims are:
  • to exploit the Web of Science web-services API in order to promote full-text deposit of author versions of published peer-reviewed research papers in the Leeds Met repository;
  • to develop an alerting service to notify the repository team/URO when a research paper associated with Leeds Met is picked up by WoS;
  • to send automated communication to a researcher, alerting them to the presence of their citation in Web of Science and requesting an author version for the repository;
  • potentially, to import metadata from WoS to populate the repository automatically.
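The alerting idea can be sketched as a comparison between a new harvest and the UT identifiers already seen, grouping newly indexed records by author. Several things here are assumptions for illustration:

* the data shapes;
* the use of author name strings as keys;
* the wording of the message.

The sketch does not cover how authors would be matched to staff e-mail addresses.

```python
# Sketch of the alerting idea: find UTs not seen before and group them by
# author. Data shapes and message wording are assumptions.
def new_records_by_author(seen_uts: set[str], harvest: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map author -> UTs that did not appear in any previous harvest."""
    alerts: dict[str, list[str]] = {}
    for ut, authors in sorted(harvest.items()):
        if ut in seen_uts:
            continue
        for author in authors:
            alerts.setdefault(author, []).append(ut)
    return alerts


def alert_message(author: str, uts: list[str]) -> str:
    """Compose a plain-text request for an author version (hypothetical wording)."""
    listing = "\n".join(f"  - {ut}" for ut in uts)
    return (
        f"Dear {author},\n\n"
        "Web of Science has indexed the following record(s) associated with you:\n"
        f"{listing}\n\n"
        "If you have an author version of the paper, please deposit it in the repository."
    )


if __name__ == "__main__":
    seen = {"UT1"}
    harvest = {"UT1": ["Smith J"], "UT2": ["Smith J", "Jones P"], "UT3": ["Jones P"]}
    for author, uts in new_records_by_author(seen, harvest).items():
        print(alert_message(author, uts), end="\n\n")
```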

3. Project management and meetings

The project is funded under the JISC Rapid Innovation programme (tag: JISCRI; programme code repository and wiki at http://code.google.com/p/jiscri/) and is due to complete at the end of November 2009.  A rapid development cycle is therefore essential and will be based on the SCRUM methodology recommended by JISC.

  • Team and roles

The team of 6 people comprises:

a) Members responsible for project deliverables

Wendy Luker – Project Manager (or SCRUM master); Arthur Sargeant – Project consultant; Mike Taylor – Web-developer responsible for technical development; Nick Sheppard – Repository Development Officer responsible for project research; Peter Douglas – representative of Intrallect

b) Representative stakeholders who will inform development and potentially benefit from project deliverables.

Babita Bhogal – represents the University Research Office; a potential customer/user of project deliverables; Phil Jones – represents the Carnegie Research Institute; a potential customer/user of project deliverables.

There will be 5 “sprint” cycles; at the end of each cycle there will be a full team meeting to review progress and technical development.  In addition NS/MT will liaise more closely throughout the sprint cycle including face to face on a weekly basis – these meetings may also include WL, AS as necessary.

N.B. The JISC programme manager has indicated that Bibliosight could benefit from work being done at King's College London with the R4R (Readiness for REF) project, and should also liaise with another JISCRI project based at Heriot-Watt University that is building an API for ticTocs.

Action:  NS – investigate / establish contact with these projects and provide a detailed overview before the next meeting.

To reflect the scale of projects under the programme, JISC are advocating a light-weight reporting framework utilising the blog as the primary mechanism.  It is anticipated that all team members will contribute to the blog and that the subject for posts will be specified at each meeting in line with 6 subject areas specified by JISC.  These are:  Project SWOT analyses; User participation; Day to day work; Technical standards; Value add; Small wins and fails; Progress report.

Aggregation tag for the project is #bibliosight (blog posts and Twitter updates).  Other relevant tags are #JISCRI, #SWOT, #rapidInnovation, #progressPosts, #UseCase

Action:  NS/WL –  blog initial SWOT analysis in advance of next meeting.

Action:  NS – ensure all team members have administrative access to the blog.

  • Technical

The first workpackage is a “full technical review of Web of Science Web Services API / technical developments required to appropriately integrate API into repository” with the time scale June-July 2009.

NS/MT recently attended a Thomson Reuters webinar, ‘Introduction to Thomson Reuters Research Evaluation Tools’, which reviewed the API. MT has also reviewed the API documentation, has gained the administrative permissions needed to run a Java programming environment on his local machine, and is now in a position to explore the API in more detail. MT may require technical input from Java programmers at Intrallect, and CD confirmed that this would be acceptable under the terms of the bid.

A code repository has also been set up, in line with JISC guidelines, at http://code.google.com/p/bibliosight/. This is where any code produced by the project will be stored, subject to appropriate Open Source licensing (see below). It is also the location for all documentation and bug tracking. The version control system is Subversion; as the only developer currently associated with the project, MT is the only user who requires full access.

There was also some preliminary discussion around how the API will most appropriately be integrated into the Leeds Met repository: whether WoS data will be pulled directly into intraLibrary or into an external environment, for example. The group also considered what the implications of this might be (e.g. a prototype proof-of-concept build of intraLibrary). However, it was decided that the initial focus should be on manually mapping the process and on the API itself before these issues can usefully be explored further.

CD raised a technical question regarding the API: whether the interface supports only SOAP or can also support REST, which would potentially provide a lower technical threshold.

Action:  NS – full review of Thomson Reuters services; article match and retrieve; Web-services lite; Researcher ID upload; Researcher ID download; Web-services premium.  Disambiguation of free vs. paid services.  SOAP vs. REST

Action:  MT – explore / implement API and document process.  Establish precisely what information can be extracted from WoS using the API.

Action:  NS/MT/AS – manually review WoS to elucidate desired process i.e. What information do we want and what information can we get a) manually b) programmatically (free vs. paid)

  • User testing and engagement

This will be facilitated through appropriate liaison with BB (URO) and PJ (CRI) and will initially focus on communication – NS is attending the CRI Readers’ and Professors’ meeting on Thursday 16th July – and generating use-cases and scenarios, possibly in collaboration with Intrallect who have experience and expertise in this area.

Action:  NS to attend CRI Readers’ and professors’ meeting on Thursday 16th July for initial communication and feedback.

Action:  NS/BB/PD to liaise to generate preliminary use-cases/scenarios.

4.  Licensing

Software/code/project deliverables are to be made available under appropriate licence agreements in line with JISC guidelines.  The licence provisionally applied at http://code.google.com/p/bibliosight/ is GNU GENERAL PUBLIC LICENSE Version 3 – http://www.gnu.org/copyleft/gpl.html.  This may or may not be suitable for our software requirements; other project deliverables may require different licensing models; requires further research.

Action:  NS to research; liaise with OSS watch to clarify licensing issues

5.  AOB

Administrative housekeeping (unminuted)

6.  Date(s) of next meeting(s)

Given the short project lifecycle, it was decided that provisional/approximate dates should be outlined for all remaining meetings:

  • w/c 31st August 2009 (bank holiday Monday)
  • Late September
  • Mid-late October
  • Early November
  • Last week in November

Action:  NS to call next meeting w/c 31st August 2009

Posted in Bibliosight, SCRUM minutes