BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Final Progress Post

Posted by Nick on December 23, 2009

***Updated February 10th 2010***

Title of Primary Project Output:

The Bibliosight desktop application will allow users to specify an appropriate query and retrieve bibliographic data as XML from the Web of Science using the recently released (free) WoS API (WS Lite), and convert it into a suitable format for repository ingest via SWORD*

*Due to current limitations of WS Lite, the functionality to convert XML output has not been implemented – see this post on Repository News for more details.

Screenshots or diagram of prototype:


Diagram of how returned XML will be mapped onto LOM XML for ingest to intraLibrary (click on the image for full size):


The full bibliosight process (click on the image for full size):

Description of Prototype:

The prototype is a desktop application written in Java that is linked to Thomson Reuters’ WS Lite, an API that allows the Web of Science to be queried by the following fields:

Field                                                          Searchable code
Address (including the 5 fields below)                         AD
  1. Street                                                    SA
  2. City                                                      CI
  3. Province/State                                            PS
  4. Zip/postal code                                           ZP
  5. Country                                                   CU
Author                                                         AU
Conference (including title, location, date, and sponsor)      CF
Group Author                                                   GP
Organization                                                   OG
Sub-organization                                               SG
Source Publication (journal, book or conference)               SO
Title                                                          TI
Topic                                                          TS

Queries may also be specified by date* and the service will support the AND, OR, NOT, and SAME Boolean operators.

*The date on which a record was added to WoS rather than the date of publication. In most cases the year will be the same but there will certainly be some cases where an article published in one year will not have been added to WoS until the following year.
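To make the query syntax concrete, here is a minimal Java sketch of how clauses in the Code=(query parameter) form might be composed from the field codes in the table above. The class and method names (WosQuery, field, join) are hypothetical and are not taken from the Bibliosight code, and the way clauses are joined with a Boolean operator is an assumption based on general Web of Science search syntax rather than on the WS Lite documentation.

```java
import java.util.Set;

/** Illustrative helper for composing WS Lite style query strings (names are hypothetical). */
final class WosQuery {

    // Field codes copied from the table of searchable fields in the post.
    private static final Set<String> FIELD_CODES = Set.of(
            "AD", "SA", "CI", "PS", "ZP", "CU", "AU", "CF",
            "GP", "OG", "SG", "SO", "TI", "TS");

    // Boolean operators the post says the service supports.
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT", "SAME");

    private WosQuery() { }

    /** Builds a single clause such as AD=(leeds met* univ*). */
    static String field(String code, String term) {
        if (!FIELD_CODES.contains(code)) {
            throw new IllegalArgumentException("Unknown field code: " + code);
        }
        return code + "=(" + term.trim() + ")";
    }

    /** Joins clauses with one Boolean operator (assumed syntax), e.g. "A AND B". */
    static String join(String operator, String... clauses) {
        if (!OPERATORS.contains(operator)) {
            throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
        return String.join(" " + operator + " ", clauses);
    }

    public static void main(String[] args) {
        String query = join("AND",
                field("AD", "leeds met* univ*"),
                field("TS", "repositor*"));
        System.out.println(query); // AD=(leeds met* univ*) AND TS=(repositor*)
    }
}
```

Validating the field code up front means a typo such as "AUT" fails in the client rather than producing a confusing response from the service.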

An overview of the application is as follows:

Query options: Query – Allows the user to specify the fields to query in the form Code=(query parameter) and the service does support wild-cards e.g. AD=(leeds met* univ*)

Query options: Date – Allows the user to specify either the date range (inclusive) or retrieve recent updates within the last week/two weeks/four weeks

Query options: Database: DatabaseID – Currently WOS only; to keep the client as flexible as possible this field is included so that additional Database IDs can be accommodated, should it become possible to plug in additional databases in the future.

Query options: Database: Editions – These checkboxes reflect the Citation Databases filter within WoS:

  • AHCI – Arts & Humanities Citation Index (1975-present)
  • ISTP – Conference Proceedings Citation Index- Science (1990-present)*
  • SCI – Science Citation Index (1970-present)
  • SSCI – Social Sciences Citation Index (1970-present)

*ISTP reflects the code currently used by the API – it is not clear why it doesn’t correspond with the term now used in WoS, which is CPCI-S – Conference Proceedings Citation Index- Science (1990-present)

Retrieve Options: Start Record – Allows the user to specify the first record to return from the full result set

Retrieve Options: Maximum records to retrieve – Allows user to specify maximum records to retrieve between 1 and 100 (N.B.  The API is currently restricted to a maximum of 100 records though it can be queried multiple times.)

Retrieve Options: Sort by (Date) (Ascending/Descending) – Allows user to sort records (currently by date only) ascending or descending in date order.

Proxy settings: This is purely for local network setup at Leeds Met and has nothing to do with WoS but will be necessary for users that are behind a proxy server.

View results: View results of current query (as XML)

Save results: Save results of current query

Perform search request: Perform the specified query
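Because each request is capped at 100 records but the API can be queried repeatedly, a larger result set has to be fetched in windows defined by the Start Record and Maximum records options. The Java sketch below only plans those windows; it makes no calls to WS Lite, and the names WosPaging, Page and plan are hypothetical rather than part of the Bibliosight code.

```java
import java.util.ArrayList;
import java.util.List;

/** Plans the (start record, count) windows needed to page through a result set. */
final class WosPaging {

    /** Per-request limit stated in the post. */
    static final int MAX_PER_REQUEST = 100;

    /** One request window: 1-based first record and number of records to ask for. */
    static final class Page {
        final int firstRecord;
        final int count;

        Page(int firstRecord, int count) {
            this.firstRecord = firstRecord;
            this.count = count;
        }

        @Override
        public String toString() {
            return "firstRecord=" + firstRecord + " count=" + count;
        }
    }

    private WosPaging() { }

    /** Splits a request for totalWanted records into windows of at most 100. */
    static List<Page> plan(int startRecord, int totalWanted) {
        if (startRecord < 1 || totalWanted < 0) {
            throw new IllegalArgumentException("startRecord must be >= 1 and totalWanted >= 0");
        }
        List<Page> pages = new ArrayList<>();
        int next = startRecord;
        int remaining = totalWanted;
        while (remaining > 0) {
            int count = Math.min(MAX_PER_REQUEST, remaining);
            pages.add(new Page(next, count));
            next += count;
            remaining -= count;
        }
        return pages;
    }

    public static void main(String[] args) {
        // 250 records starting at record 1 need three requests: 100, 100 and 50.
        for (Page page : plan(1, 250)) {
            System.out.println(page);
        }
    }
}
```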

Link to working prototype:

There are several issues with distributing a working prototype: it has a number of dependencies, some of which are specific to the WS Lite service. In our view it is less confusing to release the code only, which is available from http://code.google.com/p/bibliosight/

A screen-cast of the working prototype is available here.

Please note that you will require an appropriate subscription to ISI Web of Knowledge; the service requires an authorised IP address and you will also need to register for Thomson Reuters Web of Science® web services programming interface (WS Lite) by agreeing to the Terms & Conditions at http://science.thomsonreuters.com/info/terms-ws/ and completing a registration form – if you have any problems you should contact your Thomson Reuters account manager.

Link to end user documentation:

End user documentation:  https://bibliosightnews.wordpress.com/end-user-documentation/

About the project:  https://bibliosightnews.wordpress.com/about/

For use cases see: https://bibliosightnews.wordpress.com/use-cases/

Link to code repository or API:

The code is available from http://code.google.com/p/bibliosight/

Link to technical documentation:

Technical documentation for WS Lite is available from Thomson Reuters and you should address enquiries to your Thomson Reuters account manager.

The code available from http://code.google.com/p/bibliosight/ is fully commented.

Date prototype was launched:

February 9th 2010 (This is code only, not a  distribution of a working prototype – there is some very basic info in there on what you’d need to get it running.)

A screen-cast of the working prototype is available here.

Project Team Names, Emails and Organisations:

Wendy Luker (Leeds Metropolitan University)      w.luker@leedsmet.ac.uk

Arthur Sargeant (Leeds Metropolitan University)  a.sargeant@leedsmet.ac.uk

Peter Douglas (Intrallect Ltd) p.douglas@intrallect.com

Michael Taylor (Leeds Metropolitan University) m.taylor@leedsmet.ac.uk

Nick Sheppard (Leeds Metropolitan University) n.e.sheppard@leedsmet.ac.uk

Babita Bhogal (Leeds Metropolitan University) b.bhogal@leedsmet.ac.uk

Sue Rooke (Leeds Metropolitan University)  s.rooke@leedsmet.ac.uk

Project Website:

https://bibliosightnews.wordpress.com/

PIMS entry:

https://pims.jisc.ac.uk/projects/view/1389

Table of Contents for Project Posts:

  1. First Post
  2. Quickstep into rapid innovation project management
  3. Project meeting number 1:  Draft Agenda
  4. Project meeting – minutes
  5. eurocris
  6. JournalTOCs
  7. SWOT analysis – a digital experiment
  8. Generating use-cases
  9. No one said it would be easy
  10. SWOT update
  11. Project meeting number 2:  Draft Agenda
  12. Use case meeting
  13. 20 second pitch at #jiscri
  14. Project meeting – minutes
  15. Small but important win – we have XML!
  16. Research Excellence Framework:  Second consultation on the assessment and funding of research
  17. JISC Rapid Innovation event at City of Manchester stadium
  18. Quick reminder(s)
  19. Just round the next corner…
  20. Project meeting number 3:  Draft agenda
  21. More on ResearcherID
  22. User participation
  23. Project meeting – minutes
  24. Quick sketch
  25. Visit from Thomson Reuters
  26. Project meeting number 4:  Draft agenda
  27. Project meeting – minutes
  28. Thinking out loud…
  29. Quick sketch #2
  30. Mapping fields from WoS API => LOM
  31. Project meeting number 5:  Draft agenda
  32. The role of standards in Bibliosight
  33. Project meeting – minutes
  34. Web Services Lite
  35. JournalTOCsAPI workshop
  36. Steady as she goes – Bibliosight back on course

Posted in Bibliosight, Final Progress Post, Progress post | 1 Comment »

Steady as she goes – Bibliosight back on course!

Posted by Nick on December 18, 2009

The good ship Bibliosight was due into port at the end of November with the rest of the jiscri fleet. However, as I reported at the time, she found herself in a spot of heavy weather: after experimenting throughout the project with a more general, unrestricted API, we activated our subscription to Web Services Lite only to discover that it is a different enough product that it would need another reasonable chunk of time to learn and implement.  I’m pleased to report, however, that Mike has been at the helm night and day, battling manfully through the storm, and has managed to bring us back on course!

After some initial problems dealing with an authentication step and setting up a query in such a way that it actually returned an appropriate XML response, it appears that the XML returned from WS Lite is actually somewhat better organised than that from the general API, and more customisable. This means that for our XML transformation step we can simply create our own XML file in the format that we want and transform it without having to worry about the oddities that we were seeing with the general API. Mike initially thought that we could do without the XSLT altogether (i.e. have code to output directly in the formats we need) but that would reduce the flexibility of the process.

A sample record is reproduced below:

<?xml version="1.0" encoding="UTF-8"?>
<searchResponse>
  <!-- Number of records in the database/editions selected -->
  <numberOfItemsSearched>1000</numberOfItemsSearched>
  <!-- Number of records that match the query parameters -->
  <numberOfItemsFound>1</numberOfItemsFound>
  <!-- Number of records in the result set -->
  <numberOfItemsListed>1</numberOfItemsListed>
  <!-- Date this file was created (generally would be used to date the query execution time) -->
  <dateCreated>2009-12-09T15:30:00Z</dateCreated>
  <items>
    <item>
      <!-- Seems to be always present -->
      <title>Record title</title>
      <!-- Seems to be always present -->
      <authors count="3">
        <author>Bloggs, J</author>
        <author>Smith, J</author>
        <author>Sheppard, N</author>
      </authors>
      <source>
        <!-- Not always present -->
        <bookSeriesTitle>Book series title</bookSeriesTitle>
        <!-- Seems to be always present -->
        <title>Source title</title>
        <!-- Not always present -->
        <volume>10</volume>
        <!-- Not always present -->
        <issue>1</issue>
        <!-- Not always present -->
        <pages>116-126</pages>
        <!-- Not always present -->
        <published>
          <!-- Not always present -->
          <date>JAN</date>
          <!-- Seems to be always present -->
          <year>2008</year>
        </published>
      </source>
      <!-- Not always present -->
      <keywords count="2">
        <keyword>keyword 1</keyword>
        <keyword>keyword 2</keyword>
      </keywords>
      <!-- Seems to be always present -->
      <ut>000252821700009</ut>
    </item>
  </items>
  <!-- This section echoes the query parameters used to generate the results -->
  <searchRequest>
    <queryParameters>
      <databaseId>WOS</databaseId>
      <!-- These are the only editions we seem to be entitled to -->
      <editions count="4">
        <edition collection="WOS">SCI</edition>
        <edition collection="WOS">SSCI</edition>
        <edition collection="WOS">AHCI</edition>
        <edition collection="WOS">ISTP</edition>
      </editions>
      <!-- Symbolic time span can't be used in conjunction with time span -->
      <symbolicTimeSpan>1week</symbolicTimeSpan>
      <!-- This is a DATABASE time span, not a publication time span -->
      <timeSpan>
        <begin>2008-01-01</begin>
        <end>2008-12-31</end>
      </timeSpan>
      <!-- Language is always 'en' -->
      <userQuery language="en">AD=(leeds met* univ*)</userQuery>
    </queryParameters>
    <retrieveParameters>
      <!-- Currently this is the only available sort field -->
      <fields count="1">
        <field>
          <name>Date</name>
          <sort>A</sort>
        </field>
      </fields>
      <!-- Max returned records (1 - 100) -->
      <count>100</count>
      <!-- Record offset -->
      <firstRecord>1</firstRecord>
    </retrieveParameters>
  </searchRequest>
</searchResponse>
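
As a rough illustration of how a client might read a response shaped like the sample above, the following Java sketch parses a cut-down copy of it with the JDK's built-in DOM parser. The class and helper names (SearchResponseReader, child, text, authors) are hypothetical and not taken from the Bibliosight code. Since several elements are marked "Not always present", the helpers return null for a missing element instead of failing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for a searchResponse document like the sample record. */
final class SearchResponseReader {

    // Cut-down copy of the sample record (optional fields such as volume omitted).
    static final String SAMPLE =
            "<searchResponse><items><item>"
          + "<title>Record title</title>"
          + "<authors count=\"3\"><author>Bloggs, J</author>"
          + "<author>Smith, J</author><author>Sheppard, N</author></authors>"
          + "<source><title>Source title</title>"
          + "<published><year>2008</year></published></source>"
          + "<ut>000252821700009</ut>"
          + "</item></items></searchResponse>";

    private SearchResponseReader() { }

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse response", e);
        }
    }

    /** Returns the first direct child element with this name, or null if absent. */
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Text of a direct child element, or null when the element is absent. */
    static String text(Element parent, String name) {
        Element e = child(parent, name);
        return e == null ? null : e.getTextContent().trim();
    }

    /** All author names listed under an item, in document order. */
    static List<String> authors(Element item) {
        List<String> names = new ArrayList<>();
        NodeList nodes = item.getElementsByTagName("author");
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent().trim());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        Element item = (Element) doc.getElementsByTagName("item").item(0);
        Element source = child(item, "source");
        System.out.println("Title:   " + text(item, "title"));
        System.out.println("Authors: " + authors(item));
        System.out.println("Source:  " + text(source, "title"));
        System.out.println("Volume:  " + text(source, "volume")); // null, not in the cut-down sample
        System.out.println("UT:      " + text(item, "ut"));
    }
}
```

Looking up direct children only matters here because both the item and its source contain a title element.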

And here is a diagram of how we expect to map the XML onto LOM XML for ingest to intraLibrary (click on the image for full size):

So far so good, now all we need is a UI:

The UI is not yet coupled to the API but the basic components are now pretty much all in place. Mike has aimed to ensure that the client is as flexible as possible: it will allow users to limit a query by a specified date range, including recent updates, and can also accommodate additional Database IDs should it become possible to plug in additional databases in the future.

Hopefully we will get the boat floating early in the New Year when we will finally be able to do some user testing as well as disseminating the code under an appropriate licence (probably GNU GENERAL PUBLIC LICENSE Version 3 – http://www.gnu.org/copyleft/gpl.html)

Merry Christmas!

Posted in Bibliosight, Progress post | 3 Comments »

JournalTOCsAPI workshop

Posted by Nick on November 26, 2009

On Friday I was invited to participate in a workshop for the JournalTOCsAPI project at Heriot Watt University in Edinburgh.  I didn’t think I was going to make it at all due to the awful flooding in Cumbria and we were told at one point that trains were travelling no further than Carlisle due to the weather and that Scotland was effectively out of bounds – the tracks must have been dry enough, however, and I arrived just in time for Lisa Roger’s introductory presentation “JournalTOCs Workshop – Introduction & Feedback”:

Then came Jenny Delasalle, Repository manager at Warwick University and chair of UKCORR, talking about “Repositories and Alerting Services”:

The third presentation was given by Santy Chumbe, the JournalTOCs Project manager, on behalf of Anne Dixon from the British Geological Survey who helped to test the first use case for the JournalTOCs project:

I was next up presenting on Bibliosight – though it remains to be seen just how relevant this will continue to be as we learn more about WS Lite:

Finally Phil Barker presented on “The Other Side of the Interface” which I found a most engaging re-evaluation of our developing repository/research infrastructure as a complex and dynamic “ecosystem” full of interacting (and evolving) entities and processes:

Thanks to the JournalTOCs team for an enjoyable and informative event, to Jenny and Phil for their presentations and to Helen Muir and Colin Smith (Repository Manager at the Open University) for their insights throughout the day. It was particularly interesting for me to listen to Jenny and Colin discuss their respective practices at WRAP and the ORO – both examples of successful and well established Open Access repositories at major research institutions with much greater numbers of research outputs than Leeds Met – I certainly learnt a great deal about how I might use alerting services, including the JournalTOCsAPI, to alert me to new publications that I can pursue for the OA research repository at Leeds Met and, along with bibliosight and WS Lite I shall aim to integrate some of what I learned into my workflows over the coming months.

Posted in Event, JournalTocs | 1 Comment »

Web Services Lite

Posted by Nick on November 26, 2009

When the Bibliosight project began back in June, Thomson Reuters’ new Web of Science Web Services had not been released and we were very grateful to the company for giving us full access to their “general API”. After discussion with Thomson, we understood this to be an unrestricted version of WS Lite. However, we have now subscribed to the service which, in actual fact, appears to be a different enough product to need another reasonable chunk of time to learn and implement, which is a little frustrating this close to the end of the project!

There is some consolation that a number of components appear to be shared; query format for example, though Mike hasn’t had enough time with the documentation to fully digest all the similarities.

The resulting XML is also different but, we think, more useful, though right now this view is based on the documentation alone. That documentation is much more thorough, which should make life easier both for us and for others wanting to implement the service.

To register for WS Lite users will need to review the Terms & Conditions at the following URL which will take you to a registration form: http://science.thomsonreuters.com/info/terms-ws/

Posted in Bibliosight, Thomson Reuters Research Analytics | 6 Comments »

Project meeting – minutes

Posted by Nick on November 18, 2009

Present: Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Nick Sheppard

1. Apologies

Sue Rooke

2. Minutes from last meeting and actions

As emphasised at the last meeting, it has not been possible, within our timescale, to engage a suitable academic replacement after Phil Jones left the institution earlier in the project and it is now anticipated that academic staff / researchers will be involved in evaluating the outcomes of the project beyond the formal end of jiscri. WL/NS do now have a meeting scheduled (30th November 2009) with Professor Richard Light, the recently appointed Chair of the Carnegie Research Institute, to discuss Bibliosight and the wider repository infrastructure.

NS/PD have done some work on clarifying use cases – see item 4.

Transformation of XML from WoS to LOM format for ingest into intraLibrary. See – https://bibliosightnews.wordpress.com/2009/11/16/mapping-fields-from-wos-api-lom/ – more work still needs to be done in this area. (Action – NS/MT)

AS has updated the schematic diagram to clarify what will be achieved by the end of November. See – https://bibliosightnews.wordpress.com/2009/11/13/332/

NS to contribute project management post to blog on day to day work – ongoing – NS to action ASAP.

PD has contributed a blog post on technical standards used in Bibliosight – https://bibliosightnews.wordpress.com/2009/11/17/the-role-of-standards-in-bibliosight/

3. Update on development of desk-top application

As emphasised at the last meeting, three discrete functional requirements of the desktop application (from now on referred to as Bib App) have been clearly identified:

• Retrieve records from WoS as XML
• Perform an appropriate XSLT transformation to LOM format suitable for ingest to intraLibrary
• Deposit LOM records into intraLibrary using SWORD

MT has been working primarily on stages 1 and 2 and has adopted a pragmatic approach, treating them as two discrete tasks before attempting to integrate the functionality in a single user interface. He has a desktop client that will take XML and perform an XSLT transformation so, once we have clarified the LOM format we require – see https://bibliosightnews.wordpress.com/2009/11/16/mapping-fields-from-wos-api-lom/ – it should be relatively straightforward to plug into the WoS API to retrieve XML from the Web of Science which can then be transformed into appropriate LOM.
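
For readers unfamiliar with the transformation stage, here is a minimal Java sketch of applying an XSLT stylesheet to WoS-style XML using the JDK's javax.xml.transform API. The stylesheet is purely illustrative: it copies only the record title and UT into a simplified, namespace-free LOM-like skeleton, and it is not the project's actual field mapping, which was still being clarified at this point. The class name LomTransformSketch and the input element names are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch of stage 2: transform retrieved XML into a LOM-like record with XSLT. */
final class LomTransformSketch {

    // Hypothetical stylesheet, NOT the Bibliosight mapping: title and UT only.
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/\">"
          + "<records><xsl:apply-templates select=\"searchResponse/items/item\"/></records>"
          + "</xsl:template>"
          + "<xsl:template match=\"item\">"
          + "<lom><general>"
          + "<identifier><catalog>WOS</catalog>"
          + "<entry><xsl:value-of select=\"ut\"/></entry></identifier>"
          + "<title><string><xsl:value-of select=\"title\"/></string></title>"
          + "</general></lom>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Cut-down input in the shape of a WS Lite style response (assumed element names).
    static final String INPUT =
            "<searchResponse><items><item>"
          + "<title>Record title</title>"
          + "<source><title>Source title</title></source>"
          + "<ut>000252821700009</ut>"
          + "</item></items></searchResponse>";

    private LomTransformSketch() { }

    /** Applies the stylesheet to the XML and returns the result as a string. */
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT, XSLT));
    }
}
```

Keeping the mapping in a stylesheet rather than in Java code means the target format can be changed without rebuilding the client.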

Deposit of the LOM into intraLibrary via SWORD should also be fairly straightforward – see – https://bibliosightnews.wordpress.com/2009/11/17/the-role-of-standards-in-bibliosight/ – however, in order to generate clean, consistent LOM, there are still a number of issues to be resolved.

From a technical perspective, Mike is not a Java programmer* and is working very hard to master the language in order to implement an integrated UI that can unify these three discrete functional areas – the precise functionality of the Bib App will also be informed by developing use cases – see item 4 below.

*The WoS API is Java based which perhaps makes it less accessible than it could be – it may be that JISC wish to make recommendations to Thomson Reuters and others regarding the development of open web services APIs. See – http://blogs.ukoln.ac.uk/good-apis-jisc/

Action: NS/MT to continue to investigate issues around three functional areas

Action: MT to continue developing Bib App – development will necessarily take us beyond the formal end of jiscri projects at the end of November

4. Update on use cases

PD/NS have summarised our three use cases in some detail which need writing up in full ASAP (Nick to action).

Particular issues that were identified include:

• In light of progress through the project, UC narratives need to be updated from the now outdated drafts proposed in the original bid
• UCs need to be fully itemised with an ‘actor’ clearly identified for each success scenario
• More thought needs to be given to extensions to each UC

There was particular discussion around UC_2 which centres on targeted communications to researchers to encourage deposit of an appropriate author produced version of a recently published/cited article. It is clear that such a use case will need to identify individual publishers’ copyright policies around deposit in an IR; if they do permit deposit, what restrictions / conditions do they impose? For example, a very common restriction is in the form of a 12/18 month embargo that would need to be incorporated into the workflow.
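
As a small worked example of the embargo point, the Java sketch below computes when a 12 or 18 month embargo would end and whether an item could be released on a given day. It assumes the embargo runs from the publication date, which will vary by publisher, and the names EmbargoCheck, embargoEnd and canRelease are hypothetical.

```java
import java.time.LocalDate;

/** Illustrative embargo arithmetic; assumes the embargo starts on the publication date. */
final class EmbargoCheck {

    private EmbargoCheck() { }

    /** First day on which the embargo no longer applies. */
    static LocalDate embargoEnd(LocalDate published, int embargoMonths) {
        if (embargoMonths < 0) {
            throw new IllegalArgumentException("embargoMonths must not be negative");
        }
        return published.plusMonths(embargoMonths);
    }

    /** True when today is on or after the end of the embargo. */
    static boolean canRelease(LocalDate published, int embargoMonths, LocalDate today) {
        return !today.isBefore(embargoEnd(published, embargoMonths));
    }

    public static void main(String[] args) {
        LocalDate published = LocalDate.of(2009, 1, 15);
        LocalDate today = LocalDate.of(2010, 2, 10);
        System.out.println("12 months: ends " + embargoEnd(published, 12)
                + ", releasable " + canRelease(published, 12, today));
        System.out.println("18 months: ends " + embargoEnd(published, 18)
                + ", releasable " + canRelease(published, 18, today));
    }
}
```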

Action: NS to explore use cases in more detail and write up in full.

5. JournalTOCsAPI workshop – 20th November 2009 – Nick attending

NS is attending a workshop being run by the JournalTOCsAPI project on Friday 20th November and has been invited to give a 15 minute presentation on Bibliosight.

The workshop has two main objectives:

1. To learn the techniques/methodologies that professionals managing repositories use to identify new content for their repositories and the potential benefits as well as the shortcomings that they have identified in the JournalTOCsAPI

2. To give an opportunity to repository managers and API developers to learn the thoughts of experts in institutional repositories for efficiently integrating and reusing up-to-date journal TOC RSS feeds within repository systems and forward looking research information systems.

Action: NS to attend and participate as required

6. Project management tasks – project evaluation

The project management task to be addressed on the blog will be project evaluation.

Action: NS/WL to liaise and post on project evaluation

7. Formal end of project

The formal end of the project in line with the jiscri programme is the end of November 2009 by which time we are confident we will have a detailed proof of concept for Bibliosight that is well documented on the blog. However, there is still a considerable amount to be done to implement a fully functional Bib App which is a valuable outcome for the institution and the sector; work will therefore be ongoing beyond the end of the jiscri project, internal resources allowing.

8. A.O.B.

None

Posted in Bibliosight | 1 Comment »

Project meeting number 5: Draft agenda

Posted by Nick on November 16, 2009

Date of meeting:  Tuesday 17th November 2009

1. Apologies

2. Minutes from last meeting and actions

3. Update on development of desk-top application

4. Update on use cases

  • Identify new research in WoS on a regular basis (daily/weekly/monthly); retrieve available metadata associated with records – add to intraLibrary
  • Identify new research in WoS on a regular basis (daily/weekly/monthly); check copyright/SHERPA-RoMEO; generate targeted email

5. JournalTOCsAPI workshop – 20th November 2009 – Nick attending

6. Project management tasks – project evaluation

7. Formal end of project

8. A.O.B.

Posted in Agenda | 1 Comment »

Quick sketch #2

Posted by Nick on November 13, 2009

The diagram below is Arthur’s update of my earlier quick sketch to illustrate what Bibliosight will aim to achieve by the formal #jiscri deadline.

It is numbered and colour coded – stages 1 – 3 (shades of blue) are within the #jiscri timeframe; stages 4 (green) & 5 (buff) will require ongoing work beyond the deadline.

(N.B.  Click on the image for a full size view in a separate browser window.)

Bibliosight

Posted in Bibliosight | 2 Comments »

Thinking out loud…

Posted by Nick on November 11, 2009

As the deadline for #jiscri draws close I have just returned to work after a month away from Bibliosight and I’m now desperately trying to catch up with the project and determine exactly what we can aim to achieve by the end of November…The candid truth is that we have only very recently got to the point where Mike can actually do some coding and begin to put together a prototype that fulfills the requirements of our (still formative) use-case[s].

Yesterday morning I had a stab at completing a more detailed template for a primary use-case (this comprises a narrative and the use case itself); then in the afternoon I sat down with Mike to catch up with his progress from a technical perspective and to brain-storm around precisely what functions we require from our prototype and how this may be achieved. There are also some outstanding issues of clarity pertaining to Thomson Reuters’ API documentation, specifically “WoS Search Retrieve Codes and Descriptions”: we currently have unrestricted access to the API but it is my understanding that the free* service will actually be restricted.  We are not certain:

a)  Precisely which of the fields are associated with the restricted subset that we will be able to query and/or return under the current terms of our WoS subscription*

b)  What some of the fields actually are as they lack a description in the documentation

*Free to us under existing subscription

Disclaimer:  I’m very much thinking out loud here and attempting to translate what I understand are ongoing conceptual issues for Mike as he works through the documentation.

Note:  I’ve continued to refer to ResearcherID – see https://bibliosightnews.wordpress.com/2009/10/02/visit-from-thomson-reuters/ – though it is not a service we plan on implementing as part of Bibliosight, and not necessarily even in the longer term. I’m pretty sure, however, that we are likely to require some sort of unique identifier for authors – a subject that is currently receiving a lot of attention from the repository community.

Anyway…looking back over the blog it seems that:

The requesting system can query the Web of Science using the following fields:

  • Address (including Street, City, Province, Zip Code, or Country)
  • Author
  • Conference (including title, location, date, and sponsor)
  • Group Author
  • Organization or Sub-organization
  • Source Publication (journal, book or conference)
  • Title
  • Topic
  • Year Published

The service will support the AND, OR, NOT, and SAME Boolean operators.

The Web of Science Web Service returns five fields to the requesting system:

  • Article Title
  • Authors – All authors, book authors, and corporate authors
  • Source – Includes the source title, subtitle, book series and subtitle, volume, issue, special issue, pages, article number, supplement number, and publication date
  • Keywords – All author-supplied keywords
  • UT – A unique article identifier provided by Thomson Reuters

The test queries that Mike has submitted to the API have returned XML that appears to be both more granular than indicated and that includes fields other than those that constitute these five (e.g. abstract) so the first thing to do, perhaps, is to contact Thomson Reuters and see if they can apply the restrictions that we will ultimately need to work with, if only to remove some of the noise and make it easier to see the wood for the trees.

The API documentation actually lists over 100 “fields”; only a handful of these are actually described in the documentation, however, and while many are reasonably transparent, others are a little less so and some look like they may duplicate information – or are they perhaps used as alternatives? (e.g. bib_id = Volume, issue, special, pages and year data / bib_issue = Volume and year data).  There is also some lack of consistency in this bibliographic info on a record by record basis; we need to ensure that we have consistent XML being returned for all records – hopefully we can then develop a template in intraLibrary itself that reflects that consistent XML as closely as possible such that we can devise an XSLT style-sheet to perform the appropriate transformation.

Mike already has a desktop client that will take XML and perform an XSLT transformation so, once we have clarified the LOM format we require (an action for me from the last meeting), it *should* be relatively straightforward to plug into the WoS API to retrieve XML from the Web of Science which can then be transformed into appropriate LOM.

Then we need to ingest that LOM into intraLibrary, preferably using SWORD…which I shall think about another time!

Posted in Progress post | 1 Comment »