BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Archive for the ‘Bibliosight’ Category

Final Progress Post

Posted by Nick on December 23, 2009

***Updated February 10th 2010***

Title of Primary Project Output:

The Bibliosight desktop application will allow users to specify an appropriate query, retrieve bibliographic data as XML from the Web of Science using the recently released (free) WoS API (WS Lite), and convert it into a suitable format for repository ingest via SWORD*

*Due to current limitations of WS Lite, the functionality to convert XML output has not been implemented – see this post on Repository News for more details.

Screenshots or diagram of prototype:


Diagram of how returned XML will be mapped onto LOM XML for ingest to intraLibrary:


The full bibliosight process:

Description of Prototype:

The prototype is a desktop application written in Java that is linked to Thomson Reuters’ WS Lite, an API that allows the Web of Science to be queried by the following fields:

Field                                                         Searchable code
Address (including the 5 fields below)                        AD
  1. Street                                                   SA
  2. City                                                     CI
  3. Province/State                                           PS
  4. Zip/postal code                                          ZP
  5. Country                                                  CU
Author                                                        AU
Conference (including title, location, date, and sponsor)     CF
Group Author                                                  GP
Organization                                                  OG
Sub-organization                                              SG
Source Publication (journal, book or conference)              SO
Title                                                         TI
Topic                                                         TS

Queries may also be specified by date* and the service will support the AND, OR, NOT, and SAME Boolean operators.

*The date on which a record was added to WoS rather than the date of publication. In most cases the year will be the same but there will certainly be some cases where an article published in one year will not have been added to WoS until the following year.

An overview of the application is as follows:

Query options: Query – Allows the user to specify the fields to query in the form Code=(query parameter). The service supports wildcards, e.g. AD=(leeds met* univ*)
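As a minimal sketch of the Code=(query parameter) form, the hypothetical Java helper below assembles a single field clause and joins two clauses with a Boolean operator. The class and method names are illustrative and are not taken from the Bibliosight code or from WS Lite; the TS clause in the example is invented for illustration. It only builds the query string and makes no call to the service. Joining clauses with AND, OR or NOT is an assumption modelled on Web of Science advanced search; SAME is left for the caller to write inside the terms of a single field.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of the Bibliosight code or the WS Lite API.
class WosQuery {
    // Searchable field codes from the table in this post.
    private static final List<String> CODES = Arrays.asList(
        "AD", "SA", "CI", "PS", "ZP", "CU", "AU", "CF", "GP", "OG", "SG", "SO", "TI", "TS");

    // Operators this sketch allows between clauses (an assumption, see lead-in).
    private static final List<String> OPERATORS = Arrays.asList("AND", "OR", "NOT");

    // Builds one clause, e.g. AD=(leeds met* univ*).
    static String field(String code, String terms) {
        if (!CODES.contains(code)) {
            throw new IllegalArgumentException("Unknown field code: " + code);
        }
        return code + "=(" + terms.trim() + ")";
    }

    // Joins two clauses with a Boolean operator.
    static String combine(String left, String operator, String right) {
        if (!OPERATORS.contains(operator)) {
            throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
        return left + " " + operator + " " + right;
    }

    public static void main(String[] args) {
        String address = field("AD", "leeds met* univ*");
        String topic = field("TS", "sport*"); // illustrative topic only
        System.out.println(combine(address, "AND", topic));
    }
}
```

Validating the field code up front means a mistyped code is caught before any request is sent.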

Query options: Date – Allows the user to specify either the date range (inclusive) or retrieve recent updates within the last week/two weeks/four weeks

Query options: Database: DatabaseID – Currently WOS only; this field is included to keep the client as flexible as possible, so that it can accommodate additional Database IDs should it become possible to plug in additional databases in the future.

Query options: Database: Editions – These checkboxes reflect the Citation Databases filter within WoS:

  • AHCI – Arts & Humanities Citation Index (1975-present)
  • ISTP – Conference Proceedings Citation Index- Science (1990-present)*
  • SCI – Science Citation Index (1970-present)
  • SSCI – Social Sciences Citation Index (1970-present)

*ISTP reflects the code currently used by the API – it is not clear why it doesn’t correspond with the term now used in WoS, which is CPCI-S – Conference Proceedings Citation Index- Science (1990-present)

Retrieve Options: Start Record – Allows user to specify start record to return from all results

Retrieve Options: Maximum records to retrieve – Allows user to specify maximum records to retrieve between 1 and 100 (N.B.  The API is currently restricted to a maximum of 100 records though it can be queried multiple times.)
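Because a result set larger than 100 records has to be fetched in several requests, the start record and maximum records options can be stepped through the set. The sketch below is a hypothetical planner (the class and method names are not from the Bibliosight code) that only does the arithmetic: given the number of records found, it lists the start record and count for each request. It assumes records are numbered from 1, as the start record option suggests, and it does not call WS Lite.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plans repeated requests under the 100 record limit.
class RetrievePlanner {
    static final int MAX_PER_REQUEST = 100; // limit stated in this post

    // Each int[] is {firstRecord, count}; records are assumed to be numbered from 1.
    static List<int[]> plan(int itemsFound) {
        if (itemsFound < 0) {
            throw new IllegalArgumentException("itemsFound must not be negative");
        }
        List<int[]> pages = new ArrayList<>();
        for (int first = 1; first <= itemsFound; first += MAX_PER_REQUEST) {
            // The last page may be shorter than the maximum.
            int count = Math.min(MAX_PER_REQUEST, itemsFound - first + 1);
            pages.add(new int[] {first, count});
        }
        return pages;
    }

    public static void main(String[] args) {
        // 250 is an illustrative total, not a figure from the project.
        for (int[] page : plan(250)) {
            System.out.println("firstRecord=" + page[0] + " count=" + page[1]);
        }
    }
}
```

For a total of 250 records this plans three requests: two of 100 records and a final one of 50 starting at record 201.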

Retrieve Options: Sort by (Date) (Ascending/Descending) – Allows user to sort records (currently by date only) ascending or descending in date order.

Proxy settings: This is purely for local network setup at Leeds Met and has nothing to do with WoS but will be necessary for users that are behind a proxy server.

View results: View results of current query (as XML)

Save results: Save results of current query

Perform search request: Perform the specified query

Link to working prototype:

Distributing a working prototype is problematic because it has a number of dependencies, some of which are specific to the WS Lite service. In our view it is less confusing to release the code only, which is available from http://code.google.com/p/bibliosight/

A screen-cast of the working prototype is available here.

Please note that you will require an appropriate subscription to ISI Web of Knowledge, and the service requires an authorised IP address. You will also need to register for the Thomson Reuters Web of Science® web services programming interface (WS Lite) by agreeing to the Terms & Conditions at http://science.thomsonreuters.com/info/terms-ws/ and completing a registration form. If you have any problems you should contact your Thomson Reuters account manager.

Link to end user documentation:

End user documentation:  http://bibliosightnews.wordpress.com/end-user-documentation/

About the project:  http://bibliosightnews.wordpress.com/about/

For use cases see: http://bibliosightnews.wordpress.com/use-cases/

Link to code repository or API:

The code is available from http://code.google.com/p/bibliosight/

Link to technical documentation:

Technical documentation for WS Lite is available from Thomson Reuters and you should address enquiries to your Thomson Reuters account manager.

The code available from http://code.google.com/p/bibliosight/ is fully commented.

Date prototype was launched:

February 9th 2010 (This is code only, not a  distribution of a working prototype – there is some very basic info in there on what you’d need to get it running.)

A screen-cast of the working prototype is available here.

Project Team Names, Emails and Organisations:

Wendy Luker (Leeds Metropolitan University)      w.luker@leedsmet.ac.uk

Arthur Sargeant (Leeds Metropolitan University)  a.sargeant@leedsmet.ac.uk

Peter Douglas (Intrallect Ltd) p.douglas@intrallect.com

Michael Taylor (Leeds Metropolitan University) m.taylor@leedsmet.ac.uk

Nick Sheppard (Leeds Metropolitan University) n.e.sheppard@leedsmet.ac.uk

Babita Bhogal (Leeds Metropolitan University) b.bhogal@leedsmet.ac.uk

Sue Rooke (Leeds Metropolitan University)  s.rooke@leedsmet.ac.uk

Project Website:

http://bibliosightnews.wordpress.com/

PIMS entry:

https://pims.jisc.ac.uk/projects/view/1389

Table of Contents for Project Posts:

  1. First Post
  2. Quickstep into rapid innovation project management
  3. Project meeting number 1:  Draft Agenda
  4. Project meeting – minutes
  5. eurocris
  6. JournalTOCs
  7. SWOT analysis – a digital experiment
  8. Generating use-cases
  9. No one said it would be easy
  10. SWOT update
  11. Project meeting number 2:  Draft Agenda
  12. Use case meeting
  13. 20 second pitch at #jiscri
  14. Project meeting – minutes
  15. Small but important win – we have XML!
  16. Research Excellence Framework:  Second consultation on the assessment and funding of research
  17. JISC Rapid Innovation event at City of Manchester stadium
  18. Quick reminder(s)
  19. Just round the next corner…
  20. Project meeting number 3:  Draft agenda
  21. More on ResearcherID
  22. User participation
  23. Project meeting – minutes
  24. Quick sketch
  25. Visit from Thomson Reuters
  26. Project meeting number 4:  Draft agenda
  27. Project meeting – minutes
  28. Thinking out loud…
  29. Quick sketch #2
  30. Mapping fields from WoS API => LOM
  31. Project meeting number 4:  Draft agenda
  32. The role of standards in Bibliosight
  33. Project meeting – minutes
  34. Web Services Lite
  35. JournalTOCsAPI workshop
  36. Steady as she goes – Bibliosight back on course

Posted in Bibliosight, Final Progress Post, Progress post

Steady as she goes – Bibliosight back on course!

Posted by Nick on December 18, 2009

The good ship Bibliosight was due into port at the end of November with the rest of the jiscri fleet. However, as I reported at the time, she found herself in a spot of heavy weather: after experimenting throughout the project with a more general, unrestricted API, we activated our subscription to Web Services Lite only to discover that it is a different enough product that it would need another reasonable chunk of time to learn and implement.  I’m pleased to report, however, that Mike has been at the helm night and day, battling manfully through the storm, and has managed to bring us back on course!

After some initial problems dealing with an authentication step and setting up a query in such a way that it actually returned an appropriate XML response, it appears that the XML returned from WS Lite is somewhat better organised than that from the general API, and more customisable. For our XML transformation step we can therefore simply create our own XML file in the format that we want and transform it without having to worry about the oddities that we were seeing with the general API. Mike initially thought that we could do without the XSLT altogether (i.e. have code to output in the formats we need) but that would reduce the flexibility of the process.

A sample record is reproduced below:

<?xml version="1.0" encoding="UTF-8"?>
<searchResponse>
  <!-- Number of records in the database/editions selected -->
  <numberOfItemsSearched>1000</numberOfItemsSearched>
  <!-- Number of records that match the query parameters -->
  <numberOfItemsFound>1</numberOfItemsFound>
  <!-- Number of records in the result set -->
  <numberOfItemsListed>1</numberOfItemsListed>
  <!-- Date this file was created (generally would be used to date the query execution time) -->
  <dateCreated>2009-12-09T15:30:00Z</dateCreated>
  <items>
    <item>
      <!-- Seems to be always present -->
      <title>Record title</title>
      <!-- Seems to be always present -->
      <authors count="3">
        <author>Bloggs, J</author>
        <author>Smith, J</author>
        <author>Sheppard, N</author>
      </authors>
      <source>
        <!-- Not always present -->
        <bookSeriesTitle>Book series title</bookSeriesTitle>
        <!-- Seems to be always present -->
        <title>Source title</title>
        <!-- Not always present -->
        <volume>10</volume>
        <!-- Not always present -->
        <issue>1</issue>
        <!-- Not always present -->
        <pages>116-126</pages>
        <!-- Not always present -->
        <published>
          <!-- Not always present -->
          <date>JAN</date>
          <!-- Seems to be always present -->
          <year>2008</year>
        </published>
      </source>
      <!-- Not always present -->
      <keywords count="2">
        <keyword>keyword 1</keyword>
        <keyword>keyword 2</keyword>
      </keywords>
      <!-- Seems to be always present -->
      <ut>000252821700009</ut>
    </item>
  </items>
  <!-- This section echoes the query parameters used to generate the results -->
  <searchRequest>
    <queryParameters>
      <databaseId>WOS</databaseId>
      <!-- These are the only editions we seem to be entitled to -->
      <editions count="4">
        <edition collection="WOS">SCI</edition>
        <edition collection="WOS">SSCI</edition>
        <edition collection="WOS">AHCI</edition>
        <edition collection="WOS">ISTP</edition>
      </editions>
      <!-- Symbolic time span can't be used in conjunction with time span -->
      <symbolicTimeSpan>1week</symbolicTimeSpan>
      <!-- This is a DATABASE time span, not a publication time span -->
      <timeSpan>
        <begin>2008-01-01</begin>
        <end>2008-12-31</end>
      </timeSpan>
      <!-- Language is always 'en' -->
      <userQuery language="en">AD=(leeds met* univ*)</userQuery>
    </queryParameters>
    <retrieveParameters>
      <!-- Currently this is the only available sort field -->
      <fields count="1">
        <field>
          <name>Date</name>
          <sort>A</sort>
        </field>
      </fields>
      <!-- Max returned records (1 – 100) -->
      <count>100</count>
      <!-- Record offset -->
      <firstRecord>1</firstRecord>
    </retrieveParameters>
  </searchRequest>
</searchResponse>
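To give a feel for how a record in this structure might be read, here is a minimal sketch using the standard Java DOM and XPath APIs. The class and method names are hypothetical and are not taken from the Bibliosight code; the element names come from the sample above. Note that title appears both directly under item and under source, so the XPath expression selects only the direct child of item.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for the searchResponse structure shown above.
class SearchResponseReader {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse response XML", e);
        }
    }

    // Number of records that match the query parameters.
    static int itemsFound(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            String value = xpath.evaluate("/searchResponse/numberOfItemsFound", doc);
            return Integer.parseInt(value.trim());
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Record titles only; the source (journal) title is deliberately excluded.
    static List<String> recordTitles(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "/searchResponse/items/item/title", doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent().trim());
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Cut-down version of the sample record, for illustration only.
        String xml = "<searchResponse><numberOfItemsFound>1</numberOfItemsFound>"
            + "<items><item><title>Record title</title>"
            + "<source><title>Source title</title></source>"
            + "<ut>000252821700009</ut></item></items></searchResponse>";
        Document doc = parse(xml);
        System.out.println(itemsFound(doc) + " found: " + recordTitles(doc));
    }
}
```

Since several elements are marked "Not always present", any real reader would also need to cope with missing nodes; this sketch only touches elements that seem to be always present.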

And here is a diagram of how we expect to map the XML onto LOM XML for ingest to intraLibrary:

So far so good, now all we need is a UI:

The UI is not yet coupled to the API but the basic components are now pretty much all in place. Mike has aimed to ensure that the client is as flexible as possible: it will allow users to limit a query by a specified date range, including recent updates, and can also accommodate additional Database IDs should it become possible to plug in additional databases in the future.

Hopefully we will get the boat floating early in the New Year when we will finally be able to do some user testing as well as disseminating the code under an appropriate licence (probably GNU GENERAL PUBLIC LICENSE Version 3 – http://www.gnu.org/copyleft/gpl.html)

Merry Christmas!

Posted in Bibliosight, Progress post

Web Services Lite

Posted by Nick on November 26, 2009

When the Bibliosight project began back in June, Thomson Reuters’ new Web of Science Web Services had not been released and we were very grateful to the company for giving us full access to their “general API”. After discussion with Thomson, we understood this to be an unrestricted version of WS Lite. However, we have now subscribed to the service which, in actual fact, appears to be a different enough product to need another reasonable chunk of time to learn and implement, which is a little frustrating this close to the end of the project!

There is some consolation that a number of components appear to be shared; query format for example, though Mike hasn’t had enough time with the documentation to fully digest all the similarities.

The resulting XML is also different but (we think) more useful, though right now this view is based only on the documentation, which is much more thorough and should make life easier for us and for others wanting to implement the service.

To register for WS Lite, users will need to review the Terms & Conditions at the following URL, which leads to a registration form: http://science.thomsonreuters.com/info/terms-ws/

Posted in Bibliosight, Thomson Reuters Research Analytics

Project meeting – minutes

Posted by Nick on November 18, 2009

Present: Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Nick Sheppard

1. Apologies

Sue Rooke

2. Minutes from last meeting and actions

As emphasised at the last meeting, it has not been possible, within our timescale, to engage a suitable academic replacement after Phil Jones left the institution earlier in the project and it is now anticipated that academic staff / researchers will be involved in evaluating the outcomes of the project beyond the formal end of jiscri. WL/NS do now have a meeting scheduled (30th November 2009) with Professor Richard Light, the recently appointed Chair of the Carnegie Research Institute, to discuss Bibliosight and the wider repository infrastructure.

NS/PD have done some work on clarifying use cases – see item 4.

Transformation of XML from WoS to LOM format for ingest into intraLibrary. See – http://bibliosightnews.wordpress.com/2009/11/16/mapping-fields-from-wos-api-lom/ – more work still needs to be done in this area. (Action – NS/MT)

AS has updated the schematic diagram to clarify what will be achieved by the end of November. See – http://bibliosightnews.wordpress.com/2009/11/13/332/

NS to contribute project management post to blog on day to day work – ongoing – NS to action ASAP.

PD has contributed a blog post on technical standards used in Bibliosight – http://bibliosightnews.wordpress.com/2009/11/17/the-role-of-standards-in-bibliosight/

3. Update on development of desk-top application

As emphasised at the last meeting, three discrete functional requirements of the desktop application (from now on referred to as Bib App) have been clearly identified:

• Retrieve records from WoS as XML
• Perform an appropriate XSLT transformation to LOM format suitable for ingest to intraLibrary
• Deposit LOM records into intraLibrary using SWORD

MT has been working primarily on stages 1 and 2 and has adopted a pragmatic approach, treating them as two discrete tasks before attempting to integrate the functionality in a single user interface. He has a desktop client that will take XML and perform an XSLT transformation so, once we have clarified the LOM format we require – see http://bibliosightnews.wordpress.com/2009/11/16/mapping-fields-from-wos-api-lom/ – it should be relatively straightforward to plug into the WoS API to retrieve XML from the Web of Science, which can then be transformed into appropriate LOM.

Deposit of the LOM into intraLibrary via SWORD should also be fairly straightforward – see – http://bibliosightnews.wordpress.com/2009/11/17/the-role-of-standards-in-bibliosight/ – however, in order to generate clean, consistent LOM, there are still a number of issues to be resolved.

From a technical perspective, Mike is not a Java programmer* and is working very hard to master the language in order to implement an integrated UI that can unify these three discrete functional areas – the precise functionality of the Bib App will also be informed by developing use cases – see item 4 below.

*The WoS API is Java based which perhaps makes it less accessible than it could be – it may be that JISC wish to make recommendations to Thomson Reuters and others regarding the development of open web services APIs. See – http://blogs.ukoln.ac.uk/good-apis-jisc/

Action: NS/MT to continue to investigate issues around three functional areas

Action: MT to continue developing Bib App – development will necessarily take us beyond the formal end of jiscri projects at the end of November

4. Update on use cases

PD/NS have summarised our three use cases in some detail which need writing up in full ASAP (Nick to action).

Particular issues that were identified include:

• In light of progress through the project, UC narratives need to be updated from the now outdated drafts proposed in the original bid
• UCs need to be fully itemised with an ‘actor’ clearly identified for each success scenario
• More thought needs to be given to extensions to each UC

There was particular discussion around UC_2 which centres on targeted communications to researchers to encourage deposit of an appropriate author produced version of a recently published/cited article. It is clear that such a use case will need to identify individual publishers’ copyright policies around deposit in an IR; if they do permit deposit, what restrictions / conditions do they impose? For example, a very common restriction is in the form of a 12/18 month embargo that would need to be incorporated into the workflow.

Action: NS to explore use cases in more detail and write up in full.

5. JournalTOCsAPI workshop – 20th November 2009 – Nick attending

NS is attending a workshop being run by the JournalTOCsAPI project on Friday 20th November and has been invited to give a 15 minute presentation on Bibliosight.

The workshop has two main objectives:

1. To learn the techniques/methodologies that professionals managing repositories use to identify new content for their repositories and the potential benefits as well as the shortcomings that they have identified in the JournalTOCsAPI

2. To give an opportunity to repository managers and API developers to learn the thoughts of experts in institutional repositories for efficiently integrating and reusing up-to-date journal TOC RSS feeds within repository systems and forward looking research information systems.

Action: NS to attend and participate as required

6. Project management tasks – project evaluation

The project management task to be addressed on the blog will be project evaluation.

Action: NS/WL to liaise and post on project evaluation

7. Formal end of project

The formal end of the project in line with the jiscri programme is the end of November 2009, by which time we are confident we will have a detailed proof of concept for Bibliosight that is well documented on the blog. However, there is still a considerable amount to be done to implement a fully functional Bib App which is a valuable outcome for the institution and the sector; work will therefore be ongoing beyond the end of the jiscri project, internal resources allowing.

8. A.O.B.

None

Posted in Bibliosight

Quick sketch #2

Posted by Nick on November 13, 2009

The diagram below is Arthur’s update of my earlier quick sketch to illustrate what Bibliosight will aim to achieve by the formal #jiscri deadline.

It is numbered and colour coded – stages 1 – 3 (shades of blue) are within the #jiscri timeframe; stages 4 (green) & 5 (buff) will require ongoing work beyond the deadline.


Posted in Bibliosight

Project meeting – minutes

Posted by wendyluker on November 11, 2009

Minutes of the Bibliosight Meeting

Tuesday 20th October 2009

1.  Apologies

Nick, Sue, Babita

2.  Minutes of the last meeting, and actions

Actions :

WL /NS to pursue academic contacts for a representative – this has been on-going, but at this stage of the project it seemed unlikely that we would now get a representative.  Academic staff / researchers to be involved in evaluating the outcomes of the project.

PD to clarify upload of XML to intraLibrary including LOM extensions – Peter confirmed that this could be done.

NS/BB/SR to meet with another member of the URO to clarify potential use cases: Wendy reported that Nick had met with Sue Rooke and Sam Armitage, and work had been done on use cases.  Nick would be able to clarify this on his return to work.

PD to contribute blog post on technical standards : on-going.
New action: Wendy to send Peter the required tags for the post.

All team members to contribute to on-going discussion on the blog – reiterated!

3. Update on meeting with Thomson Reuters

Mike updated the group on the meeting with Thomson Reuters.  We have access to the unrestricted API, but we are not entitled to use it to a greater extent than would be provided by the Web Services Lite version.  It appeared that the 100 record limit may not be an issue after all: if we download the initial set of records year by year then it should not present a problem.  Wendy and Arthur reported on some testing of the Web of Science search interface that they had been doing to check whether the ‘Leeds Metropolitan Univ’ search would be sufficiently robust, and it appeared to be so.
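The year by year approach can be sketched as follows. This hypothetical Java helper (its name and shape are assumptions, not project code) produces one inclusive begin/end pair per calendar year in the yyyy-mm-dd form that appears in the time span of the sample WS Lite response elsewhere on this blog. It only generates the date ranges; it does not call the service, and the years in the example are illustrative. A year holding more than 100 records would still need more than one request.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one time span per calendar year for the initial download.
class YearlyTimeSpans {
    // Each String[] is {begin, end}, both inclusive, formatted as yyyy-mm-dd.
    static List<String[]> forYears(int firstYear, int lastYear) {
        if (firstYear > lastYear) {
            throw new IllegalArgumentException("firstYear must not be after lastYear");
        }
        List<String[]> spans = new ArrayList<>();
        for (int year = firstYear; year <= lastYear; year++) {
            LocalDate begin = LocalDate.of(year, 1, 1);
            LocalDate end = LocalDate.of(year, 12, 31);
            // LocalDate.toString() yields the ISO form, e.g. 2008-01-01.
            spans.add(new String[] {begin.toString(), end.toString()});
        }
        return spans;
    }

    public static void main(String[] args) {
        // Illustrative range of years only.
        for (String[] span : forYears(2007, 2009)) {
            System.out.println(span[0] + " to " + span[1]);
        }
    }
}
```

Bear in mind that the time span relates to the date a record was added to WoS rather than its publication date, so a given year's span will not line up exactly with that year's publications.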

We will need to display WofS / Thomson Reuters terms and conditions alongside any material retrieved from WofS.  There is a place in LOM for this.

4. Update on Use Cases

The use cases will be a useful output of the project, and need further work at this stage, e.g. we need to ensure we capture the information around the intended alerting service: at what point will individuals be alerted? Where will the alert come from?
More work also needed on cataloguing workflows, and how we will deal with the initial 1485 items that will be downloaded.

5. API – next steps in the development

Mike updated the group on progress with the API.

At this stage we can:

  • Get records out of WofS
  • Transform them into XML
    Action Nick: what is the LOM XML?
  • Load them into intraLibrary

Mike needed several decisions to be made before he could progress further:

Would the process for downloading be manual or automated? MANUAL

Would the client be desktop or web based? DESKTOP

It was also decided that the XSLT should be easily swapped out so that it can be output in different formats, i.e. to other interfaces, whether they be Endnote, for example, or another repository.  This would be of benefit to the rest of the community.
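One way to keep the XSLT easy to swap is to pass the stylesheet in as data rather than building the mapping into the code. The sketch below uses the standard javax.xml.transform API; the class name and the two tiny stylesheets are hypothetical placeholders chosen only to show the swap, and neither is the project's LOM mapping. The input element names (items, item, title, ut) are assumed for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: the stylesheet is a parameter, so output formats can be swapped.
class SwappableTransform {
    private static final String HEADER =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:for-each select='searchResponse/items/item'>";
    private static final String FOOTER =
        "<xsl:text>&#10;</xsl:text></xsl:for-each></xsl:template></xsl:stylesheet>";

    // Placeholder stylesheets: one lists record titles, the other lists UT identifiers.
    static final String TITLES_AS_TEXT = HEADER + "<xsl:value-of select='title'/>" + FOOTER;
    static final String UTS_AS_TEXT = HEADER + "<xsl:value-of select='ut'/>" + FOOTER;

    static String transform(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<searchResponse><items><item><title>Record title</title>"
            + "<ut>000252821700009</ut></item></items></searchResponse>";
        // Same input, two different outputs, with no change to the Java code.
        System.out.print(transform(xml, TITLES_AS_TEXT));
        System.out.print(transform(xml, UTS_AS_TEXT));
    }
}
```

In a desktop client the stylesheet could equally be read from a file chosen by the user, which is what would let the same retrieval step feed intraLibrary, Endnote or another repository.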

The group discussed the diagram that Nick had put up on the Blog recently, with regard to the intended scope of the current project, and which tasks might be part of further developments.

Action: Arthur to update the diagram to make it clear what would be achieved by the end of November (encompassing the intended outputs of the original project) and what the future developments might be.

6. Project management tasks: technical standards and value add

The next of the project management tasks to be addressed on the blog would be day to day work.

Action: Nick on his return

Peter would supply a blog on technical standards

Action: Wendy to send Peter the appropriate tags.

7. Other business

There was no other business

8. Date and time of next meeting

The next meeting will be held on Tuesday 17th November, starting at 1pm.
Peter will arrive at approx. 11am for a pre-meeting with Nick (and others) about use cases.

Posted in Bibliosight, SCRUM minutes

Project meeting – minutes

Posted by Nick on October 1, 2009

(Date of meeting 29th September 2009)

Present:  Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Sue Rooke, Nick Sheppard

1.  Apologies

No apologies

2.  Team membership

Thank you to Sue Rooke who has agreed to join the Bibliosight project team; Sue is a research administrator in the Faculty of Health and has already been involved in repository development, contributing to developing workflows and providing feedback on the Open Search interface.  We hope that Sue will contribute, in particular, to use case development.

The team is still lacking a representative from the academic community and we are currently waiting for a reply to recent correspondence. WL is attending the research sub-committee on Monday 5th October and may raise the issue there if necessary.

Action:  WL/NS to pursue academic contact(s) for a representative to sit on the project team

3.  Progress since last meeting

• API

We have now received the updated documentation from Thomson Reuters and Mike has submitted a query to the API  and received an appropriate response in XML. Thomson Reuters’ FAQ gives a full summary of the data fields that can be queried by the service and the data elements that can be returned which appears to be in line with this XML response.

We are therefore able to formally reduce the associated risk back to low:

Risk: API unsuitable for project deliverables
Probability: Low (elevated to Medium, 1st September 2009; reduced back to Low, 29th September 2009)
Impact: High
Action to Prevent/Manage Risk: Feedback from Thomson Reuters indicates proposal technically feasible. Problems with API/documentation have been mitigated by release of new documentation from Thomson Reuters (29th September 2009).

N.B.  The wording of the documentation appears to suggest that it is only possible to return 100 records with a single query using the API – NS to clarify with Thomson Reuters.  If this is the case, the practical implications  are limited in the case of Leeds Metropolitan University which publishes a relatively small amount of research but would be considerable for an institution with a greater research output.

Action:  NS to clarify 100 record limit with TR

Action:  MT to continue appropriate* implementation of API

* Hopefully what is “appropriate” will evolve over the coming weeks!

• Use cases

Technical difficulties have contributed to a lack of conceptual clarity amongst the project team and there was considerable discussion around precisely what data Bibliosight will now seek to retrieve from WoS using the API and what we will aim to achieve with that data.

The original use case narratives outlined in the bid were several and focussed on an alert service for researchers and/or repository administrators to encourage the deposit of an appropriate full text in the repository and perhaps neglected the obvious administrative use case whereby metadata from WoS is pulled directly into intraLibrary.

N.B.  An important use case was also the extraction of citation metrics that would potentially inform the REF – we are not yet clear how this would be achieved but we understand it will rely on the Article Match Retrieval service.

Of course we also want to produce outputs that are of use to the wider community rather than just to users of our specific repository software and this reflects the considerations of the Readiness for REF project which also hopes to enable UK repositories to make effective and efficient use of the WoS API (as part of a much broader project) and is focussing on EPrints, DSpace and Fedora as the most well established OA research repository platforms.  R4R raises several pertinent questions, many of which also arose independently and in a similar form during our own discussion:

  • What are the different workflows relevant to (i) backfilling a repository with a one-off download and (ii) ongoing use of WoSAPI to populate a repository?
  • What uses might records downloaded from WoSAPI be put to?
  • How might the workflows be designed to enable other datastreams also to help populate the repository (eg from UK PubMedCentral, arXiv, or sources that better serve the arts, humanities and social sciences)?
  • What workflows might be able to handle facts such as that the WoS record will become available some time after the paper is published, whereas deposit into the repository may happen earlier than that?
  • What methods might be helpful in addressing the inevitable questions of duplicate records, or ambiguous relations with existing records?
  • Are there implications for a repository’s mission and reputation if the balance of content it holds is rapidly changed by a large number of WoS-derived records?

Use cases may also be informed by the JournalTOCsAPI project (see item 5 below) who also explored similar issues in a recent post.

One practical consideration from a technical perspective, which will have a bearing on developing use cases, is the best method of extracting comprehensive records from institution “X”. The most appropriate field to query seems to be the address field but it is not clear how consistent the institutional address in this field will be. For example, early experimentation has found that “leeds metropolitan university” only returns 201 records; using a wildcard in the form “leeds met*”, however, returns 1503 records (test conducted 29th September 2009).  This was an issue flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).

In terms of the practicalities of actually getting records from WoS into intraLibrary once they have been harvested, Peter did indicate that it should be possible to upload suitable XML records into intraLibrary though this will need to be in LOM format, meaning that we may need to perform an XSLT transformation to convert data retrieved from WoS into a suitable format.  Also, Peter is uncertain whether XML that can be imported in this way will also include the LOM extensions we are using to accommodate bibliographic information and will need to speak to his technical colleagues at Intrallect to clarify.

Note:  There was also discussion around appropriate integration with SFX, our OpenURL resolver, as a possible means of identifying a published URL for WoS records – this is an area that has scope implications both for Bibliosight and the remit of the Leeds Metropolitan University repository itself; beyond an Open Access repository of research (i.e. to also comprise citation only records).  This is an area that may need to be explored in more detail later in the project.

Action:  PD to clarify re upload of XML to intraLibrary including LOM extensions

Action:  NS/BB/SR to meet with another member of the URO to clarify potential use cases (meeting on Thursday 1st October)

Action:  All team members to contribute to ongoing discussion on the blog.

• Project reporting – blog; tags specified by JISC

It was agreed that the specific subject for blog posts this month will be ‘Technical standards’ – Peter agreed to contribute a post before the next meeting.

Action: PD to contribute a blog post on ‘technical standards’.

Action: All team members to contribute to ongoing discussion on the blog.

4.  Visit by Thomson Reuters reps on Wednesday 30th September

Mike and I met with Jon and Gareth from TR on Wednesday 30th (yesterday) who were able to clarify several issues for us – separate post to follow

5. Review of JournalTOCsAPI – http://www.journaltocs.hw.ac.uk/index.php?action=api

During the meeting, I gave a quick overview of the recently released JournalTOCsAPI at http://www.journaltocs.hw.ac.uk/index.php?action=api with a view to demystifying the concept of an API for the less technical amongst us and also potentially giving the more technical a developmental steer.  Currently, queries need to be submitted to the API by URL and are returned as an RSS feed which includes as much metadata as in the original TOC feed, depending on the quality of the original record. Comparable to Bibliosight in many respects, this project perhaps has greater flexibility regarding the metadata it is able to query and return; it is, after all, building an API from the ground up that will query an openly accessible data source. However, it is likely that the quality of the data may not be as consistent as WoS; there may be fields missing, for example.

It has also been informative to engage with another, similar project as a ‘user’ and we discussed how Bibliosight might also engage with the JournalTOCsAPI community of users and agreed that it is a valuable opportunity to solicit the opinion of repository managers from other institutions using different software platforms.

Action:  NS to continue engaging with JournalTOCsAPI as a ‘user’

Action:  NS to send an email that can be forwarded to JournalTOCsAPI community of users as suggested in recent correspondence from Lisa Rogers

6.  Article Match Retrieval & Researcher ID

These were only touched upon briefly in the meeting and flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).

7.  A.O.B.

None

8.  Date of next meeting

20th October 2009 – 11:30 am

Posted in Bibliosight, Progress post, SCRUM minutes

Small but important win – we have XML!

Posted by MikeT on September 17, 2009

We have now received the updated documentation for the API. This is good news, but doesn’t necessarily mean a step forward for the project; indeed there was nothing obvious in the documentation that I could use to make the example from the old documentation work correctly (there is no example at all in the new set of documents).

However, another file included in the archive, ESTI.wsdl, proved to be the key to the whole thing. For a quick bit of background information, the example from the old documentation instructed us to generate the Java source files using a WSDL file retrieved from a remote server. Regenerating the Java files using the newly supplied WSDL file was all it took to get the example working and spewing out copious amounts of (hopefully) useful XML.

I think we can breathe a sigh of relief now that we can move forward with the project again.

Posted in Bibliosight

Project meeting number 2: Draft agenda

Posted by Nick on August 27, 2009

Date of meeting:  1st September 2009

1.  Apologies

2.  Progress since last meeting

  • API
  • SWOT analysis
  • Project reporting

3.  Liaison with other projects

4.  Use case development

5.  A.O.B.

6.  Date of next meeting

Posted in Agenda, Bibliosight

 