BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Archive for December, 2009

Final Progress Post

Posted by Nick on December 23, 2009

***Updated February 10th 2010***

Title of Primary Project Output:

The Bibliosight desktop application will allow users to specify an appropriate query, retrieve bibliographic data as XML from the Web of Science using the recently released (free) WoS API (WS Lite), and convert it into a suitable format for repository ingest via SWORD.*

*Due to current limitations of WS Lite, the functionality to convert XML output has not been implemented – see this post on Repository News for more details.

Screenshots or diagram of prototype:


Diagram of how returned XML will be mapped onto LOM XML for ingest to intraLibrary:


The full Bibliosight process:

Description of Prototype:

The prototype is a desktop application written in Java that is linked to Thomson Reuters’ WS Lite, an API that allows the Web of Science to be queried by the following fields:

Field                                                       Searchable code
Address (including the 5 fields below)                      AD
  1. Street                                                 SA
  2. City                                                   CI
  3. Province/State                                         PS
  4. Zip/postal code                                        ZP
  5. Country                                                CU
Author                                                      AU
Conference (including title, location, date, and sponsor)   CF
Group Author                                                GP
Organization                                                OG
Sub-organization                                            SG
Source Publication (journal, book or conference)            SO
Title                                                       TI
Topic                                                       TS

Queries may also be restricted by date*, and the service supports the AND, OR, NOT, and SAME Boolean operators.

*The date on which a record was added to WoS rather than the date of publication. In most cases the year will be the same but there will certainly be some cases where an article published in one year will not have been added to WoS until the following year.
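As a rough illustration of how field codes and Boolean operators might be combined into a query string, here is a minimal Java sketch. The class and method names (WosQuery, clause, join) are hypothetical and are not taken from the Bibliosight code; only the Code=(query parameter) form, the field codes, and the four operators come from this post. The AU value is an invented example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Bibliosight code base.
class WosQuery {
    // The Boolean operators this post says the service supports.
    static final List<String> OPERATORS = Arrays.asList("AND", "OR", "NOT", "SAME");

    // Builds a single field clause in the form Code=(query parameter).
    static String clause(String fieldCode, String value) {
        return fieldCode + "=(" + value + ")";
    }

    // Joins clauses with one of the supported operators.
    static String join(String operator, List<String> clauses) {
        if (!OPERATORS.contains(operator)) {
            throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
        return String.join(" " + operator + " ", clauses);
    }

    public static void main(String[] args) {
        String address = clause("AD", "leeds met* univ*");
        String author = clause("AU", "Sheppard N*"); // invented example value
        System.out.println(join("AND", Arrays.asList(address, author)));
    }
}
```

Whether a particular combination of clauses is accepted is decided by the service itself, so a real client would still need to handle query errors returned by WS Lite.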

An overview of the application is as follows:

Query options: Query – Allows the user to specify the fields to query in the form Code=(query parameter). The service supports wildcards, e.g. AD=(leeds met* univ*)

Query options: Date – Allows the user to specify either an inclusive date range or recent updates from the last week, two weeks, or four weeks

Query options: Database: DatabaseID – Currently WOS only. To keep the client as flexible as possible, this field is included so that additional Database IDs can be accommodated if it becomes possible to plug in other databases in the future.

Query options: Database: Editions – These checkboxes reflect the Citation Databases filter within WoS:

  • AHCI – Arts & Humanities Citation Index (1975-present)
  • ISTP – Conference Proceedings Citation Index- Science (1990-present)*
  • SCI – Science Citation Index (1970-present)
  • SSCI – Social Sciences Citation Index (1970-present)

*ISTP is the code currently used by the API; it is not clear why it does not match the term now used in WoS, which is CPCI-S – Conference Proceedings Citation Index- Science (1990-present)

Retrieve Options: Start Record – Allows the user to specify the first record to return from the full result set

Retrieve Options: Maximum records to retrieve – Allows the user to specify the maximum number of records to retrieve, between 1 and 100 (N.B. The API is currently restricted to a maximum of 100 records per request, though it can be queried multiple times.)

Retrieve Options: Sort by (Date) (Ascending/Descending) – Allows user to sort records (currently by date only) ascending or descending in date order.

Proxy settings: This is purely for local network setup at Leeds Met and has nothing to do with WoS but will be necessary for users that are behind a proxy server.

View results: View results of current query (as XML)

Save results: Save results of current query

Perform search request: Perform the specified query
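Because the retrieve options cap each request at 100 records, a larger result set has to be fetched in several requests. The following Java sketch shows one way the Start Record and Maximum records values could be computed for each request. The class and method names are hypothetical, and the assumption that records are numbered from 1 is based on the firstRecord value shown in the sample response elsewhere on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Bibliosight code base.
class RecordPager {
    // Per-request limit stated in this post.
    static final int MAX_PER_REQUEST = 100;

    // Returns {firstRecord, count} pairs that together cover records 1..totalFound.
    static List<int[]> pages(int totalFound, int pageSize) {
        if (pageSize < 1 || pageSize > MAX_PER_REQUEST) {
            throw new IllegalArgumentException("pageSize must be between 1 and 100");
        }
        List<int[]> result = new ArrayList<>();
        for (int first = 1; first <= totalFound; first += pageSize) {
            // The last page may be shorter than pageSize.
            int count = Math.min(pageSize, totalFound - first + 1);
            result.add(new int[] {first, count});
        }
        return result;
    }

    public static void main(String[] args) {
        // 250 matching records need three requests: 1-100, 101-200, 201-250.
        for (int[] page : pages(250, 100)) {
            System.out.println("firstRecord=" + page[0] + " count=" + page[1]);
        }
    }
}
```

The total number of matching records would come from the first response, so in practice the first request is sent before the remaining pages are planned.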

Link to working prototype:

There are several issues with distributing a working prototype: it has a number of dependencies, some of which are specific to the WS Lite service. In our view it is less confusing to release the code only, which is available from http://code.google.com/p/bibliosight/

A screen-cast of the working prototype is available here.

Please note that you will require an appropriate subscription to ISI Web of Knowledge, and the service requires an authorised IP address. You will also need to register for the Thomson Reuters Web of Science® web services programming interface (WS Lite) by agreeing to the Terms & Conditions at http://science.thomsonreuters.com/info/terms-ws/ and completing a registration form. If you have any problems you should contact your Thomson Reuters account manager.

Link to end user documentation:

End user documentation:  https://bibliosightnews.wordpress.com/end-user-documentation/

About the project:  https://bibliosightnews.wordpress.com/about/

For use cases see: https://bibliosightnews.wordpress.com/use-cases/

Link to code repository or API:

The code is available from http://code.google.com/p/bibliosight/

Link to technical documentation:

Technical documentation for WS Lite is available from Thomson Reuters and you should address enquiries to your Thomson Reuters account manager.

The code available from http://code.google.com/p/bibliosight/ is fully commented.

Date prototype was launched:

February 9th 2010 (This is code only, not a distribution of a working prototype; there is some very basic info in there on what you’d need to get it running.)


Project Team Names, Emails and Organisations:

Wendy Luker (Leeds Metropolitan University) w.luker@leedsmet.ac.uk

Arthur Sargeant (Leeds Metropolitan University) a.sargeant@leedsmet.ac.uk

Peter Douglas (Intrallect Ltd) p.douglas@intrallect.com

Michael Taylor (Leeds Metropolitan University) m.taylor@leedsmet.ac.uk

Nick Sheppard (Leeds Metropolitan University) n.e.sheppard@leedsmet.ac.uk

Babita Bhogal (Leeds Metropolitan University) b.bhogal@leedsmet.ac.uk

Sue Rooke (Leeds Metropolitan University) s.rooke@leedsmet.ac.uk

Project Website:

https://bibliosightnews.wordpress.com/

PIMS entry:

https://pims.jisc.ac.uk/projects/view/1389

Table of Contents for Project Posts:

  1. First Post
  2. Quickstep into rapid innovation project management
  3. Project meeting number 1:  Draft Agenda
  4. Project meeting – minutes
  5. eurocris
  6. JournalTOCs
  7. SWOT analysis – a digital experiment
  8. Generating use-cases
  9. No one said it would be easy
  10. SWOT update
  11. Project meeting number 2:  Draft Agenda
  12. Use case meeting
  13. 20 second pitch at #jiscri
  14. Project meeting – minutes
  15. Small but important win – we have XML!
  16. Research Excellence Framework:  Second consultation on the assessment and funding of research
  17. JISC Rapid Innovation event at City of Manchester stadium
  18. Quick reminder(s)
  19. Just round the next corner…
  20. Project meeting number 3:  Draft agenda
  21. More on ResearcherID
  22. User participation
  23. Project meeting – minutes
  24. Quick sketch
  25. Visit from Thomson Reuters
  26. Project meeting number 4:  Draft agenda
  27. Project meeting – minutes
  28. Thinking out loud…
  29. Quick sketch #2
  30. Mapping fields from WoS API => LOM
  31. Project meeting number 4:  Draft agenda
  32. The role of standards in Bibliosight
  33. Project meeting – minutes
  34. Web Services Lite
  35. JournalTOCsAPI workshop
  36. Steady as she goes – Bibliosight back on course

Posted in Bibliosight, Final Progress Post, Progress post

Steady as she goes – Bibliosight back on course!

Posted by Nick on December 18, 2009

The good ship Bibliosight was due into port at the end of November with the rest of the jiscri fleet. However, as I reported at the time, she found herself in a spot of heavy weather: after experimenting throughout the project with a more general, unrestricted API, we activated our subscription to Web Services Lite only to discover that it is a different enough product that it would need another reasonable chunk of time to learn and implement. I’m pleased to report, however, that Mike has been at the helm night and day, battling manfully through the storm, and has managed to bring us back on course!

After some initial problems dealing with an authentication step and setting up a query so that it actually returned an appropriate XML response, it appears that the XML returned from WS Lite is somewhat better organised than that from the general API, and more customisable. This means that for our XML transformation step we can simply create our own XML file in the format that we want and transform it without having to worry about the oddities that we were seeing with the general API. Mike initially thought that we could do without the XSLT altogether (i.e. have code output the formats we need directly) but that would reduce the flexibility of the process.
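To make the XSLT step a little more concrete, here is a minimal Java sketch that applies a stylesheet to a cut-down response using the standard javax.xml.transform API. The input uses the same element names as the sample record below. The stylesheet is purely illustrative: it copies each item title into a simplified, LOM-like title element and is an assumption, not the mapping the project actually uses. The class name XsltSketch and the records wrapper element are also hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch for illustration only; not part of the Bibliosight code base.
class XsltSketch {
    // Illustrative stylesheet: one simplified LOM-like element per item, holding only the title.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/searchResponse\">"
      + "<records><xsl:apply-templates select=\"items/item\"/></records>"
      + "</xsl:template>"
      + "<xsl:template match=\"item\">"
      + "<lom><general><title><string language=\"en\">"
      + "<xsl:value-of select=\"title\"/>"
      + "</string></title></general></lom>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given response XML and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<searchResponse><items><item><title>Record title</title></item></items></searchResponse>";
        System.out.println(transform(xml));
    }
}
```

Keeping the mapping in a stylesheet rather than in Java code is what preserves the flexibility mentioned above: a different target format only needs a different stylesheet.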

A sample record is reproduced below:

<?xml version="1.0" encoding="UTF-8"?>
<searchResponse>
  <!-- Number of records in the database/editions selected -->
  <numberOfItemsSearched>1000</numberOfItemsSearched>
  <!-- Number of records that match the query parameters -->
  <numberOfItemsFound>1</numberOfItemsFound>
  <!-- Number of records in the result set -->
  <numberOfItemsListed>1</numberOfItemsListed>
  <!-- Date this file was created (generally would be used to date the query execution time) -->
  <dateCreated>2009-12-09T15:30:00Z</dateCreated>
  <items>
    <item>
      <!-- Seems to be always present -->
      <title>Record title</title>
      <!-- Seems to be always present -->
      <authors count="3">
        <author>Bloggs, J</author>
        <author>Smith, J</author>
        <author>Sheppard, N</author>
      </authors>
      <source>
        <!-- Not always present -->
        <bookSeriesTitle>Book series title</bookSeriesTitle>
        <!-- Seems to be always present -->
        <title>Source title</title>
        <!-- Not always present -->
        <volume>10</volume>
        <!-- Not always present -->
        <issue>1</issue>
        <!-- Not always present -->
        <pages>116-126</pages>
        <!-- Not always present -->
        <published>
          <!-- Not always present -->
          <date>JAN</date>
          <!-- Seems to be always present -->
          <year>2008</year>
        </published>
      </source>
      <!-- Not always present -->
      <keywords count="2">
        <keyword>keyword 1</keyword>
        <keyword>keyword 2</keyword>
      </keywords>
      <!-- Seems to be always present -->
      <ut>000252821700009</ut>
    </item>
  </items>
  <!-- This section echoes the query parameters used to generate the results -->
  <searchRequest>
    <queryParameters>
      <databaseId>WOS</databaseId>
      <!-- These are the only editions we seem to be entitled to -->
      <editions count="4">
        <edition collection="WOS">SCI</edition>
        <edition collection="WOS">SSCI</edition>
        <edition collection="WOS">AHCI</edition>
        <edition collection="WOS">ISTP</edition>
      </editions>
      <!-- Symbolic time span can't be used in conjunction with time span -->
      <symbolicTimeSpan>1week</symbolicTimeSpan>
      <!-- This is a DATABASE time span, not a publication time span -->
      <timeSpan>
        <begin>2008-01-01</begin>
        <end>2008-12-31</end>
      </timeSpan>
      <!-- Language is always 'en' -->
      <userQuery language="en">AD=(leeds met* univ*)</userQuery>
    </queryParameters>
    <retrieveParameters>
      <!-- Currently this is the only available sort field -->
      <fields count="1">
        <field>
          <name>Date</name>
          <sort>A</sort>
        </field>
      </fields>
      <!-- Max returned records (1 to 100) -->
      <count>100</count>
      <!-- Record offset -->
      <firstRecord>1</firstRecord>
    </retrieveParameters>
  </searchRequest>
</searchResponse>
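Since several elements in the sample above are marked as not always present, any code reading the response has to tolerate missing values. The following Java sketch, using the standard DOM parser, shows one way to do that for a cut-down record. The class and method names are hypothetical and the output format is invented for illustration. Note that item and source each have a title child, so the sketch reads direct children only rather than searching all descendants.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch for illustration only; not part of the Bibliosight code base.
class ResponseReader {
    // Returns the first direct child element with the given name, or null when absent.
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    // Returns the text of a direct child element, or the fallback when it is absent.
    static String text(Element parent, String name, String fallback) {
        Element e = parent == null ? null : child(parent, name);
        return e == null ? fallback : e.getTextContent();
    }

    // Produces one summary line per item: "ut | title | volume".
    static List<String> summarise(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                Element source = child(item, "source");
                lines.add(text(item, "ut", "n/a") + " | "
                    + text(item, "title", "n/a") + " | volume "
                    + text(source, "volume", "n/a")); // volume is not always present
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<searchResponse><items><item><title>Record title</title>"
            + "<source><title>Source title</title></source>"
            + "<ut>000252821700009</ut></item></items></searchResponse>";
        for (String line : summarise(xml)) {
            System.out.println(line);
        }
    }
}
```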

And here is a diagram of how we expect to map the XML onto LOM XML for ingest to intraLibrary:

So far so good, now all we need is a UI:

The UI is not yet coupled to the API but the basic components are now pretty much all in place. Mike has aimed to ensure that the client is as flexible as possible: it will allow users to limit a query by a specified date range, including recent updates, and can also accommodate additional Database IDs should it become possible to plug in additional databases in the future.

Hopefully we will get the boat floating early in the New Year, when we will finally be able to do some user testing as well as disseminating the code under an appropriate licence (probably the GNU General Public License version 3, http://www.gnu.org/copyleft/gpl.html).

Merry Christmas!

Posted in Bibliosight, Progress post