(Date of meeting 29th September 2009)
Present: Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Sue Rooke, Nick Sheppard
1. Apologies
No apologies
2. Team membership
Thank you to Sue Rooke who has agreed to join the Bibliosight project team; Sue is a research administrator in the Faculty of Health and has already been involved in repository development, contributing to developing workflows and providing feedback on the Open Search interface. We hope that Sue will contribute, in particular, to use case development.
The team is still lacking a representative from the academic community and we are currently waiting for a reply to recent correspondence. WL is attending the research sub-committee on Monday 5th October and may raise the issue there if necessary.
Action: WL/NS to pursue academic contact(s) for a representative to sit on the project team
3. Progress since last meeting
• API
We have now received the updated documentation from Thomson Reuters, and Mike has submitted a query to the API and received an appropriate response in XML. Thomson Reuters’ FAQ gives a full summary of the data fields that can be queried by the service and the data elements that can be returned, and this summary appears to be in line with the XML response.
We are therefore able to formally reduce the associated risk back to low:
| Risk | Probability | Impact | Action to Prevent/Manage Risk |
| API unsuitable for project deliverables | Low (elevated to Medium, 1st September 2009; reduced back to Low, 29th September 2009) | High | Feedback from Thomson Reuters indicates proposal technically feasible. Problems with API/documentation have been mitigated by release of new documentation from Thomson Reuters (29th September 2009). |
N.B. The wording of the documentation appears to suggest that it is only possible to return 100 records with a single query using the API – NS to clarify with Thomson Reuters. If this is the case, the practical implications are limited in the case of Leeds Metropolitan University which publishes a relatively small amount of research but would be considerable for an institution with a greater research output.
Action: NS to clarify 100 record limit with TR
Action: MT to continue appropriate* implementation of API
* Hopefully what is “appropriate” will evolve over the coming weeks!
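If the 100-record limit is confirmed, larger result sets would have to be retrieved in batches. The following is a minimal sketch in Python of that loop. It assumes a hypothetical `fetch_page(first_record, count)` function wrapping whatever call the WoS API actually provides; the function name, its parameters and the 1-based record numbering are illustrative assumptions, not taken from the Thomson Reuters documentation.

```python
from typing import Callable, List

MAX_PER_QUERY = 100  # assumed limit, still to be confirmed with Thomson Reuters


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              total_records: int,
              page_size: int = MAX_PER_QUERY) -> List[dict]:
    """Collect every record by requesting successive batches.

    fetch_page is a hypothetical wrapper around the real API call: it takes
    the 1-based position of the first record wanted and the number of
    records to return.
    """
    records: List[dict] = []
    first = 1
    while first <= total_records:
        count = min(page_size, total_records - first + 1)
        batch = fetch_page(first, count)
        if not batch:  # stop early if the service returns nothing
            break
        records.extend(batch)
        first += len(batch)
    return records


def fake_fetch_page(first: int, count: int) -> List[dict]:
    """Stand-in for the real service, used only to exercise the loop."""
    return [{"id": n} for n in range(first, first + count)]
```

Under these assumptions, a result set of 1503 records would take 16 queries, which is manageable for us but would add up quickly for an institution with a much larger research output.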
• Use cases
Technical difficulties have contributed to a lack of conceptual clarity amongst the project team and there was considerable discussion around precisely what data Bibliosight will now seek to retrieve from WoS using the API and what we will aim to achieve with that data.
The bid outlined several use case narratives. These focussed on an alert service for researchers and/or repository administrators, intended to encourage the deposit of an appropriate full text in the repository, and perhaps neglected the obvious administrative use case whereby metadata from WoS is pulled directly into intraLibrary.
N.B. An important use case was also the extraction of citation metrics that would potentially inform the REF – we are not yet clear how this would be achieved but we understand it will rely on the Article Match Retrieval service.
Of course, we also want to produce outputs that are of use to the wider community rather than just to users of our specific repository software. This reflects the considerations of the Readiness for REF (R4R) project, which also hopes to enable UK repositories to make effective and efficient use of the WoS API (as part of a much broader project) and is focussing on EPrints, DSpace and Fedora as the most well-established OA research repository platforms. R4R raises several pertinent questions, many of which also arose independently and in a similar form during our own discussion:
- What are the different workflows relevant to (i) backfilling a repository with a one-off download and (ii) ongoing use of WoSAPI to populate a repository?
- What uses might records downloaded from WoSAPI be put to?
- How might the workflows be designed to enable other datastreams also to help populate the repository (eg from UK PubMedCentral, arXiv, or sources that better serve the arts, humanities and social sciences)?
- What workflows might be able to handle facts such as that the WoS record will become available some time after the paper is published, whereas deposit into the repository may happen earlier than that?
- What methods might be helpful in addressing the inevitable questions of duplicate records, or ambiguous relations with existing records?
- Are there implications for a repository’s mission and reputation if the balance of content it holds is rapidly changed by a large number of WoS-derived records?
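On the question of duplicate records, one simple starting point would be to compare normalised titles and publication years between incoming WoS records and records already in the repository. The sketch below is only an illustration of that idea: it assumes each record is a Python dictionary with `title` and `year` keys, which is a hypothetical structure and not the format returned by WoS or held in intraLibrary.

```python
import re
import unicodedata
from typing import Dict, List, Tuple


def normalise_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def match_key(record: Dict[str, str]) -> Tuple[str, str]:
    """Build the comparison key: normalised title plus year."""
    return normalise_title(record["title"]), str(record.get("year", ""))


def find_duplicates(existing: List[Dict[str, str]],
                    incoming: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the incoming records whose key matches an existing record."""
    seen = {match_key(record) for record in existing}
    return [record for record in incoming if match_key(record) in seen]
```

Exact matching on a normalised key will miss retitled or mistyped records, so anything it flags (or fails to flag) would still need a human check, which is one reason the workflow question matters.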
Use cases may also be informed by the JournalTOCsAPI project (see item 5 below), which explored similar issues in a recent post.
One practical consideration from a technical perspective, and one that will have a bearing on developing use cases, is the best method of extracting comprehensive records from institution “X”. The most appropriate field to query seems to be the address field, but it is not clear how consistent the institutional address in this field will be. For example, early experimentation has found that “leeds metropolitan university” returns only 201 records, whereas a wildcard in the form “leeds met*” returns 1503 records (test conducted 29th September 2009). This issue was flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).
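One way to gauge how consistent the address field is would be to tally the institution names that actually appear in the records returned by the wildcard search. The Python sketch below assumes, purely for illustration, that each address is a comma-separated string with the institution name first; the real layout of the WoS address field may differ.

```python
from collections import Counter
from typing import Iterable


def institution_part(address: str) -> str:
    """Return the first comma-separated segment, lower-cased and tidied."""
    first_segment = address.split(",")[0]
    return " ".join(first_segment.lower().split())


def tally_institutions(addresses: Iterable[str]) -> Counter:
    """Count how often each institution-name variant occurs."""
    return Counter(institution_part(address) for address in addresses)
```

Calling `tally_institutions(addresses).most_common()` on the harvested addresses would list the variants by frequency, which should show both the spellings we need to search for and any false matches picked up by the wildcard.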
In terms of the practicalities of actually getting records from WoS into intraLibrary once they have been harvested, Peter indicated that it should be possible to upload suitable XML records into intraLibrary, though these will need to be in LOM format, meaning that we may need to perform an XSLT transformation to convert data retrieved from WoS. Peter is also uncertain whether XML imported in this way can include the LOM extensions we are using to accommodate bibliographic information, and will need to speak to his technical colleagues at Intrallect to clarify.
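To give a feel for the kind of mapping involved, here is a minimal sketch that copies a title from a source record into a LOM `general/title` element. It is written in Python with ElementTree rather than XSLT, simply to illustrate the mapping. The input element names (`record`, `title`) are hypothetical and do not reflect the actual WoS response; the namespace is the one used by the IEEE LOM XML binding, and whether intraLibrary expects exactly this form is part of what needs clarifying. Our bibliographic LOM extensions are not covered.

```python
import xml.etree.ElementTree as ET

LOM_NS = "http://ltsc.ieee.org/xsd/LOM"  # IEEE LOM XML binding namespace
ET.register_namespace("", LOM_NS)  # serialise LOM as the default namespace


def lom_tag(name: str) -> str:
    """Qualify an element name with the LOM namespace."""
    return f"{{{LOM_NS}}}{name}"


def wos_to_lom(wos_xml: str, language: str = "en") -> str:
    """Map a (hypothetical) source record to a skeletal LOM record."""
    source = ET.fromstring(wos_xml)
    lom = ET.Element(lom_tag("lom"))
    general = ET.SubElement(lom, lom_tag("general"))
    title = ET.SubElement(general, lom_tag("title"))
    string = ET.SubElement(title, lom_tag("string"), {"language": language})
    # 'title' is an assumed element name in the source record
    string.text = source.findtext("title", default="")
    return ET.tostring(lom, encoding="unicode")
```

A real transformation would need to cover authors, journal, dates and identifiers as well, and an XSLT stylesheet may still be the better fit if the import is to be run inside other tools.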
Note: There was also discussion around appropriate integration with SFX, our OpenURL resolver, as a possible means of identifying a published URL for WoS records. This has scope implications both for Bibliosight and for the remit of the Leeds Metropolitan University repository itself, which would extend beyond an Open Access repository of research (i.e. to also comprise citation-only records). This area may need to be explored in more detail later in the project.
Action: PD to clarify re upload of XML to intraLibrary including LOM extensions
Action: NS/BB/SR to meet with another member of the URO to clarify potential use cases (meeting on Thursday 1st October)
Action: All team members to contribute to ongoing discussion on the blog.
• Project reporting – blog; tags specified by JISC
It was agreed that the specific subject for blog posts this month will be ‘Technical standards’ – Peter agreed to contribute a post before the next meeting.
Action: PD to contribute a blog post on ‘technical standards’.
Action: All team members to contribute to ongoing discussion on the blog.
4. Visit by Thomson Reuters reps on Wednesday 30th September
Mike and I met with Jon and Gareth from TR on Wednesday 30th (yesterday); they were able to clarify several issues for us. A separate post will follow.
5. Review of JournalTOCsAPI – http://www.journaltocs.hw.ac.uk/index.php?action=api
During the meeting, I gave a quick overview of the recently released JournalTOCsAPI, with a view to demystifying the concept of an API for the less technical amongst us and also potentially giving the more technical a developmental steer. Currently, queries need to be submitted to the API by URL and are returned as an RSS feed which includes as much metadata as in the original TOC feed, depending on the quality of the original record. Comparable to Bibliosight in many respects, this project perhaps has greater flexibility regarding the metadata it is able to query and return; it is, after all, building an API from the ground up that will query an openly accessible data source. However, it is likely that the quality of the data may not be as consistent as WoS; there may be fields missing, for example.
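For the more technical, the sketch below shows one way a feed returned by such an API could be read in Python and checked for missing fields. It works on feed XML that has already been retrieved and makes no assumption about the JournalTOCsAPI query syntax. It matches `item` elements by local name so that it does not depend on which RSS flavour or namespaces the feed uses; that is a design choice for illustration, not a description of the actual feed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_feed_items(feed_xml: str) -> List[Dict[str, str]]:
    """Return one dictionary of child-element text per feed item."""
    root = ET.fromstring(feed_xml)
    items = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        fields = {local_name(child.tag): (child.text or "").strip()
                  for child in element}
        items.append(fields)
    return items


def missing_fields(item: Dict[str, str], required: List[str]) -> List[str]:
    """List the required fields that are absent or empty in an item."""
    return [name for name in required if not item.get(name)]
```

Running `missing_fields(item, ["title", "link"])` over each parsed item would give a quick measure of how complete the records in a given feed are.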
It has also been informative to engage with another, similar project as a ‘user’. We discussed how Bibliosight might also engage with the JournalTOCsAPI community of users and agreed that it is a valuable opportunity to solicit the opinion of repository managers from other institutions using different software platforms.
Action: NS to continue engaging with JournalTOCsAPI as a ‘user’
Action: NS to send an email that can be forwarded to the JournalTOCsAPI community of users, as suggested in recent correspondence from Lisa Rogers
6. Article Match Retrieval & Researcher ID
These were only touched upon briefly in the meeting and flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).
7. A.O.B.
None
8. Date of next meeting
20th October 2009 – 11:30 am