Archive for the ‘Progress post’ Category

Final Progress Post

Posted by Nick on December 23, 2009

***Updated February 10th 2010****

Title of Primary Project Output:

The Bibliosight desktop application will allow users to specify an approriate query and retrieve bibliographic data as XML from the Web of Science using the recently released (free) WoS API (WS Lite) and convert into a suitable format for repository ingest via SWORD*

*Due to current limitations of WS Lite, the functionality to convert XML output has not been implemented – see this post on Repository News for more details.

Screenshots or diagram of prototype:

Diagram of how returned XML will be mapped onto LOM XML for ingest to intraLibrary (click on the image for full size):

The full bibliosight process (click on the image for full size):

Description of Prototype:

The prototype is a desktop application written in Java that is linked to Thomson Reuters’ WS Lite, an API that allows the Web of Science to be queried by the following fields:

Field	Searchable code
Address (including 5 field below)	AD
1. Street	SA
2. City	CI
3. Province/State	PS
4. Zip/postal code	ZP
5. Country	CU
Author	AU
Conference (including title, location, data, and sponsor)	CF
Group Author	GP
Organization	OG
Sub-organization	SG
Source Publication (journal, book or conference)	SO
Title	TI
Topic	TS

Queries may also be specified by date* and the service will support the AND, OR, NOT, and SAME Boolean operators.

*The date on which a record was added to WoS rather than the date of publication. In most cases the year will be the same but there will certainly be some cases where an article published in one year will not have been added to WoS until the following year.

An overview of the application is as follows:

Query options: Query – Allows the user to specify the fields to query in the form Code=(query parameter) and the service does support wild-cards e.g. AD=(leeds met* univ*)

Query options: Date – Allows the user to specify either the date range (inclusive) or retrieve recent updates within the last week/two weeks/four weeks

Query options: Database: DatabaseID – Currently WOS only; in order to ensure the client is as flexible as possible this field is included to accommodate additional Database IDs and it may be possible to plug-in additional databases in the future, for example.

Query options: Database: Editions – These checkboxes reflect the Citation Databases filter within WoS:

AHCI – Arts & Humanities Citation Index (1975-present)
ISTP – Conference Proceedings Citation Index- Science (1990-present)*
SCI – Science Citation Index (1970-present)
SSCI – Social Sciences Citation Index (1970-present)

*ISTP reflects code currently used by API – it is not clear why it doesn’t correspond with term now used in WoS which is CPCI-S – Conference Proceedings Citation Index- Science (1990-present)

Retrieve Options: Start Record – Allows user to specify start record to return from all results

Retrieve Options: Maximum records to retrieve – Allows user to specify maximum records to retrieve between 1 and 100 (N.B. The API is currently restricted to a maximum of 100 records though it can be queried multiple times.)

Retrieve Options: Sort by (Date) (Ascending/Descending) – Allows user to sort records (currently by date only) ascending or descending in date order.

Proxy settings: This is purely for local network setup at Leeds Met and has nothing to do with WoS but will be necessary for users that are behind a proxy server.

View results: View results of current query (as XML)

Save results: Save results of current query

Perform search request: Perform the specified query

Link to working prototype:

There are several issues with distributing a working prototype in that it has a number of dependencies, some of which are specific to the WS Lite service and it is our view that it is less confusing to release the code only, which is available from http://code.google.com/p/bibliosight/

A screen-cast of the working prototype is available here.

Please note that you will require an appropriate subscription to ISI Web of Knowledge; the service requires an authorised IP address and you will also need to register for Thomson Reuters Web of Science® web services programming interface (WS Lite) by agreeing to the Terms & Conditions at http://science.thomsonreuters.com/info/terms-ws/ and completing a registration form – if you have any problems you should contact your Thomson Reuters account manager.

Link to end user documentation:

End user documentation: https://bibliosightnews.wordpress.com/end-user-documentation/

About the project: https://bibliosightnews.wordpress.com/about/

For use cases see: https://bibliosightnews.wordpress.com/use-cases/

Link to code repository or API:

The code is available from http://code.google.com/p/bibliosight/

Link to technical documentation:

Technical documentation for WS Lite is available from Thomson Reuters and you should address enquiries to your Thomson Reuters account manager.

The code available from http://code.google.com/p/bibliosight/ is fully commented.

Date prototype was launched:

February 9th 2010 (This is code only, not a distribution of a working prototype – there is some very basic info in there on what you’d need to get it running.)

A screen-cast of the working prototype is available here.

Project Team Names, Emails and Organisations:

Wendy Luker (Leeds Metropolitan University) w.luker@leedsmet.ac.uk

Arthur Sargeant (Leeds Metropolitan University) a.sargeant@leedsmet.ac.uk

Peter Douglas (Intrallact Ltd) p.douglas@intrallect.com

Michael Taylor (Leeds Metropolitan University) m.taylor@leedsmet.ac.uk

Nick Sheppard (Leeds Metropolitan University) n.e.sheppard@leedsmet.ac.uk

Babita Bhogal (Leeds Metropolitan University) b.bhogal@leedsmet.ac.uk

Sue Rooke (Leeds Metropolitan University) s.rooke@leedsmet.ac.uk

Project Website:

https://bibliosightnews.wordpress.com/

PIMS entry:

https://pims.jisc.ac.uk/projects/view/1389

Table of Content for Project Posts:

First Post
Quickstep into rapid innovation project management
Project meeting number 1: Draft Agenda
Project meeting – minutes
eurocris
JournalTOCs
SWOT analysis – a digital experiment
Generating use-cases
No one said it would be easy
SWOT update
Project meeting number 2: Draft Agenda
Use case meeting
20 second pitch at #jiscri
Project meeting – minutes
Small but important win – we have XML!
Research Excellence Framework: Second consultation on the assessment and funding of research
JISC Rapid Innovation event at City of Manchester stadium
Quick reminder(s)
Just round the next corner…
Project meeting number 3: Draft agenda
More on ResearcherID
User participation
Project meeting – minutes
Quick sketch
Visit from Thomson Reuters
Project meeting number 4: Draft agenda
Project meeting – minutes
Thinking out loud…
Quick sketch #2
Mapping fields from WoS API => LOM
Project meeting number 4: Draft agenda
The role of standards in Bibliosight
Project meeting – minutes
Web Services Lite
JournalTOCsAPI workshop
Steady as she goes – Bibliosight back on course

Posted in Bibliosight, Final Progress Post, Progress post | Tagged: #jiscri, Bibliosight, demonstrator, finalProgressPost, JISC, output, product, progressPosts, prototype, rapidInnovation | 1 Comment »

Steady as she goes – Bibliosight back on course!

Posted by Nick on December 18, 2009

The good ship Bibliosight was due into port at the end of November with the rest of the jiscri fleet, however, as I reported at the time, she found herself in a spot of heavy weather and, after experimenting throughout the project with a more general, unrestricted API, we activated our subscription to Web Services Light only to discover that is a different enough product that it would need another reasonable chunk of time to learn and implement. I’m pleased to report, however, that Mike has been at the helm night and day, battling manfully through the storm, and has managed to bring us back on course!

After some initial problems dealing with an authentication step and setting up a query in such a way that it actually returned an appropriate XML response, it appears that the structure of the XML returned from WS Lite is actually somewhat better organised than from the general API, and more customisable meaning that for our XML transformation step we can simply create our own XML file in the format that we want such that we can transform without having to worry about the oddities that we were seeing with the general API. Mike initially thought that we could do without the XSLT altogether (i.e. have code to output in the formats we need) but that would reduce the flexibility of the process.

A sample record is reproduced below:

<?xml version=”1.0″ encoding=”UTF-8″?>
<searchResponse>
<!– Number of records in the database/editions selected –>
<numberOfItemsSearched>1000</numberOfItemsSearched>
<!– Number of records that match the query parameters –>
<numberOfItemsFound>1</numberOfItemsFound>
<!– Number of records in the result set –>
<numberOfItemsListed>1</numberOfItemsListed>
<!– Date this file was created (generally would be used to date the query execution time) –>
<dateCreated>2009-12-09T15:30:00Z</dateCreated>
<items>
<item>
<!– Seems to be always present –>
<title>Record title</title>
<!– Seems to be always present –>
<authors count=”3″>
<author>Bloggs, J</author>
<author>Smith, J</author>
<author>Sheppard, N</author>
</authors>
<source>
<!– Not always present –>
<bookSeriesTitle>Book series title</bookSeriesTitle>
<!– Seems to be always present –>
<title>Source title</title>
<!– Not always present –>
<volume>10</volume>
<!– Not always present –>
<issue>1</issue>
<!– Not always present –>
<pages>116-126</pages>
<!– Not always present –>
<published>
<!– Not always present –>
<date>JAN</date>
<!– Seems to be always present –>
<year>2008</year>
</published>
</source>
<!– Not always present –>
<keywords count=”2″>
<keyword>keyword 1</keyword>
<keyword>keyword 2</keyword>
</keywords>
<!– Seems to be always present –>
<ut>000252821700009</ut>
</item>
</items>
<!– This section echoes the query parameters used to generate the results –>
<searchRequest>
<queryParameters>
<databaseId>WOS</databaseId>
<!– These are the only editions we seem to be entitled to –>
<editions count=”4″>
<edition collection=”WOS”>SCI</edition>
<edition collection=”WOS”>SSCI</edition>
<edition collection=”WOS”>AHCI</edition>
<edition collection=”WOS”>ISTP</edition>
</editions>
<!– Symbolic time span can’t be used in conjunction with time span –>
<symbolicTimeSpan>1week</symbolicTimeSpan>
<!– This is a DATABASE time span, not a publication time span –>
<timeSpan>
<begin>2008-01-01</begin>
<end>2008-12-31</end>
</timeSpan>
<!– Language is always ‘en” –>
<userQuery language=”en”>AD=(leeds met* univ*)</userQuery>
</queryParameters>
<retrieveParameters>
<!– Currently this is the only available sort field –>
<fields count=”1″>
<field>
<name>Date</name>
<sort>A</sort>
</field>
</fields>
<!– Max returned records (1 – 100) –>
<count>100</count>
<!– Record offset –>
<firstRecord>1</firstRecord>
</retrieveParameters>
</searchRequest>
</searchResponse>

And here is a diagram of how we expect to map the XML onto LOM XML for ingest to intraLibrary (click on the image for full size):

So far so good, now all we need is a UI:

The UI is not yet coupled to the API but the basic components are now pretty much all in place; Mike has aimed to ensure that the client is as flexible as possible – it will allow users to limit a query by a specified date range including recent updates and can also accommodate additional Database IDs should it be possible to plug-in additional databases in the future, for example.

Hopefully we will get the boat floating early in the New Year when we will finally be able to do some user testing as well as disseminating the code under an appropriate licence (probably GNU GENERAL PUBLIC LICENSE Version 3 – http://www.gnu.org/copyleft/gpl.html)

Merry Christmas!

Posted in Bibliosight, Progress post | Tagged: #jiscri, Bibliosight, progressPosts, User Interface | 3 Comments »

The role of standards in Bibliosight

Posted by drpdouglas on November 17, 2009

Standards play a significant role in supporting the process detailed in an earlier post; specifically, allowing metadata records (retrieved from Web of Science) to be transformed and deposited into the Leeds Met institutional repository (intraLibrary).

The method of depositing into intraLibrary is SWORD (see http://www.swordapp.org). SWORD is a profile of the Atom Publishing Protocol which was developed as part of a JISC funded project. Simply put, SWORD allows for content packages or metadata records to be deposited into a repository from outside of that repository. Previous to SWORD there was not a standard way of doing this.

Leeds Metropolitan University uses intraLibrary 3.1 which uses version 1.2 of SWORD.

IntraLibrary 3.2, due for release sometime in the new year will support SWORD 1.3. A SWORD 1.3 reference implementation of intraLibrary can be found at http://www.swordapp.org/sword/demonstrators. Note that while intraLibrary is being used in Bibliosight, there are other repositories which use SWORD – see http://www.swordapp.org for more information.

For Bibliosight we are depositing metadata records and, again, this relies on standards. The format of the metadata record being deposited is LOM (Learning Object Metadata – see http://wiki.cetis.ac.uk/What_is_IEEE_LOM/IMS_LRM for a good overview of LOM) with bibliographic citation extensions. From version 3.0 onwards, intraLibrary provides extensions of the LOM Identifier fields for describing bibliographic citation data about a resource. The metadata fields used in this extension conform to DCMI’s Guidelines for Encoding Bibliographic Citation Information in Dublin Core Metadata – see http://www.dublincore.org/documents/dc-citation-guidelines/.

An example of a metadata record including bibliographic citation metadata can be seen at the end of this article. This shows the format of a metadata record which can be deposited into intraLibrary using SWORD. The Bibliosight project will be taking the XML record which WoS returns and converting this to the format shown below using XSLT. The Bibliosight team are currently working on a crosswalk between the WoS-formatted metadata and the LOM record we require for deposit (see Nick’s post on Nov 16th).

Note that the following example also includes XCRI (course information) extensions and CLA reporting extensions but those are not required in the Bibliosight project.

<?xml version=”1.0″ encoding=”UTF-8″?>

<lom:lom xmlns:lom=”http://ltsc.ieee.org/xsd/LOM” xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xmlns:ns1=”http://www.intrallect.com/metadata_model/xcri” xmlns:ns2=”http://www.intrallect.com/metadata_model/claReporting” xmlns:ns0=”http://www.intrallect.com/metadata_model/bibliographicCitation” xsi:schemaLocation=”http://ltsc.ieee.org/xsd/LOM http://ltsc.ieee.org/xsd/lomv1.0/lomLoose.xsd”>

<!–Generated by transforming IMS 1.2.1

metadata to IEEE LOM Metadata–>

<lom:general>

<!–General Section–>

<lom:identifier>

<lom:catalog>http://edinburgh.intralibrary.com:8080/IntraLibrary-OAI</lom:catalog>

<lom:entry>oai:com.intrallect.edinburgh:10</lom:entry>

<ns0:bibliographicCitation xmlns:imsmd=”http://www.imsglobal.org/xsd/imsmd_v1p2″>

<ns0:genre>

<ns0:source>Bibliographic Citation</ns0:source>

<ns0:value>book</ns0:value>

</ns0:genre>

<ns0:title>

<ns0:string language=”en”>In and Out of Morocco: Smuggling and Migration</ns0:string>

</ns0:title>

<ns0:spage>64</ns0:spage>

<ns0:epage>97</ns0:epage>

</ns0:bibliographicCitation>

<ns0:bibliographicCitation xmlns:imsmd=”http://www.imsglobal.org/xsd/imsmd_v1p2″>

<ns0:genre>

<ns0:source>Bibliographic Citation</ns0:source>

<ns0:value>book</ns0:value>

</ns0:genre>

<ns0:title>

<ns0:string language=”en”>In and Out of Morocco: Smuggling and Migration</ns0:string>

</ns0:title>

<ns0:spage>64</ns0:spage>

<ns0:epage>97</ns0:epage>

</ns0:bibliographicCitation>

</lom:identifier>

<lom:title>

<lom:string language=”en”>The Impact of Migration on Status Distinctions</lom:string>

</lom:title>

<lom:description>

<lom:string language=”en”>Chapter 4 pp64-97 + endnotes pp166-169</lom:string>

</lom:description>

<lom:language>en</lom:language>

</lom:general>

<lom:lifeCycle>

<!–Lifecycle Section–>

<lom:contribute>

<lom:role>

<lom:source>LOMv1.0</lom:source>

<lom:value>author</lom:value>

</lom:role>

<lom:entity>BEGIN:vcard

FN:David McMurray

END:vcard</lom:entity>

<lom:date>

<lom:dateTime>2008-12-18T10:11:07.608</lom:dateTime>

</lom:date>

</lom:contribute>

<lom:contribute>

<lom:role>

<lom:source>LOMv1.0</lom:source>

<lom:value>publisher</lom:value>

</lom:role>

<lom:entity>BEGIN:vcard

FN:University of Minnesota Press, Minneapolis

END:vcard</lom:entity>

<lom:date>

<lom:dateTime>2000-04-05T00:00:00Z</lom:dateTime>

</lom:date>

</lom:contribute>

<ns1:course xmlns:imsmd=”http://www.imsglobal.org/xsd/imsmd_v1p2″>

<ns1:courseIdentifier>CAWA</ns1:courseIdentifier>

<ns1:courseTitle>

<ns1:string language=”en”>Arab World Studies</ns1:string>

</ns1:courseTitle>

<ns1:coursePresentation>

<ns1:coursePresentationCohort>10</ns1:coursePresentationCohort>

</ns1:coursePresentation>

</ns1:course>

</lom:lifeCycle>

<lom:metaMetadata>

<!–Metametadata Section–>

<lom:identifier>

<lom:catalog>ISBN/ISSN</lom:catalog>

<lom:entry>9780816625079</lom:entry>

</lom:identifier>

<lom:contribute>

<lom:role>

<lom:source>LOMv1.0</lom:source>

<lom:value>creator</lom:value>

</lom:role>

<lom:entity>BEGIN:vcard

FN:Charles Duncan

ORG:Intrallect Ltd

EMAIL:C.Duncan@intrallect.com

END:vcard</lom:entity>

<lom:date>

<lom:dateTime>2008-12-18T10:11:07.610</lom:dateTime>

</lom:date>

</lom:contribute>

<lom:metadataSchema>IEEE LOM 1.0</lom:metadataSchema>

<lom:language>en</lom:language>

</lom:metaMetadata>

<lom:technical>

<!–Technical Section–>

<lom:format>application/pdf</lom:format>

<lom:size>2182217</lom:size>

<lom:location>test</lom:location>

</lom:technical>

<lom:rights>

<!–Rights Section–>

<lom:copyrightAndOtherRestrictions>

<lom:source>LOMv1.0</lom:source>

<lom:value>yes</lom:value>

</lom:copyrightAndOtherRestrictions>

<lom:description>

<lom:string language=”en”>University of Minnesota Press, Minneapolis, 2000</lom:string>

</lom:description>

</lom:rights>

<lom:relation>

<!–Relations Section–>

<ns2:claReporting xmlns:imsmd=”http://www.imsglobal.org/xsd/imsmd_v1p2″>

<ns2:claCodeA>

<ns2:claCodeASource>

<ns2:source>CLA</ns2:source>

<ns2:value>A</ns2:value>

</ns2:claCodeASource>

<ns2:claCodeASourceInstitution>University of Edinburgjh</ns2:claCodeASourceInstitution>

</ns2:claCodeA>

</ns2:claReporting>

<ns2:claReporting xmlns:imsmd=”http://www.imsglobal.org/xsd/imsmd_v1p2″>

<ns2:claCodeA>

<ns2:claCodeASource>

<ns2:source>CLA</ns2:source>

<ns2:value>A</ns2:value>

</ns2:claCodeASource>

<ns2:claCodeASourceInstitution>University of Edinburgh</ns2:claCodeASourceInstitution>

</ns2:claCodeA>

</ns2:claReporting>

</lom:relation>

<lom:classification>

<!–Classification Section–>

<lom:purpose>

<lom:source>LOMv1.0</lom:source>

<lom:value>discipline</lom:value>

</lom:purpose>

<lom:taxonPath>

<lom:source>

<lom:string language=”en”>intrallect</lom:string>

</lom:source>

<lom:taxon>

<lom:id>b00</lom:id>

<lom:entry>

<lom:string language=”en”>Books</lom:string>

</lom:entry>

</lom:taxon>

</lom:taxonPath>

</lom:classification>

</lom:lom>

Posted in Bibliosight, Progress post, XSLT transformation | Tagged: #jiscri, deposit, LOM, metadata, progressPosts, standards, SWORD, techStandards | 1 Comment »

Thinking out loud…

Posted by Nick on November 11, 2009

As the deadline for #jiscri draws close I have just returned to work after a month away from Bibliosight and I’m now desperately trying to catch up with the project and determine exactly what we can aim to achieve by the end of November…The candid truth is that we have only very recently got to the point where Mike can actually do some coding and begin to put together a prototype that fulfills the requirements of our (still formative) use-case[s].

Yesterday morning I had a stab at completing a more detailed template for a primary use-case (this comprises a narrative and the use case itself); then in the afternoon I sat down with Mike to catch up with his progress from a technical perspective and to brain-storm around precisely what functions we require from our prototype and how this may be achieved; there are also some outstanding issues of clarity pertaining to Thomson Reuter’s API documentation, specifically “WoS Search Retrieve Codes and Descriptions” in that we currently have unrestricted access to the API but it is my understanding that the free* service will actually be restricted. We are not certain:

a) Precisely which of the fields are associated with the restricted subset that we will be able to query and/or return under the current terrms of our WoS subscription*

b) What some of the fields actually are as they lack a description in the documentation

*Free to us under existing subscription

Disclaimer: I’m very much thinking out loud here and attempting to translate what I understand are ongoing conceptual issues for Mike as he works through the documentation.

Note: I’ve continued to refer to ResearcherID – see https://bibliosightnews.wordpress.com/2009/10/02/visit-from-thomson-reuters/ – though it is not a service we plan on implementing as part of Bibliosight, and not necessarily even in the longer term, I’m pretty sure we are likely to require some sort of unique identifier for authors – a subject that is currently receiving a lot of attention from the repository community.

Anyway…looking back over the blog it seems that:

The requesting system can query the Web of Science using the following fields:

Address (including Street, City, Province, Zip Code, or Country)
Author
Conference (including title, location, data, and sponsor)
Group Author
Organization or Sub-organization
Source Publication (journal, book or conference)
Title
Topic
Year Published

The service will support the AND, OR, NOT, and SAME Boolean operators.

The Web of Science Web Service returns five fields to the requesting system:

Article Title
Authors — All authors, book authors, and corporate authors
Source — Includes the source title, subtitle, book series and subtitle, volume, issue, special issue, pages, article number, supplement number, and publication date
Keywords — all author supplied keywords
UT — A unique article identified provided by Thomson Reuters

The test queries that Mike has submitted to the API have returned XML that appears to be both more granular than indicated and that includes fields other than those that constitute these five (e.g. abstract) so the first thing to do, perhaps, is to contact Thomson Reuters and see if they can apply the restrictions that we will ultimately need to work with, if only to remove some of the noise and make it easier to see the wood for the trees.

The API documentation actually lists over 100 “fields”; only a handful of these are actually described in the documentation, however, and while many are reasonably transparent, others are a little less so and some look like they may duplicate information – or are they perhaps used as alternatives? (e.g. bib_id = Volume, issue, special, pages and year data / bib_issue = Volume and year data). There is also some lack of consistency in this bibliographic info on a record by record basis; we need to ensure that we have consistent XML being returned for all records – hopefully we can then develop a template in intraLibrary itself that reflects that consistent XML as closely as possible such that we can devise an XSLT style-sheet to perform the approriate transformation.

Mike already has a desktop client that will take XML and perform an XSLT transformation so, once we have clarified the LOM format we require (an action for me from the last meeting), it *should* be relatively straightforward to plug into the WoS API to retrieve XML from the Web of Science which can then be transformed into appropriate LOM.

Then we need to ingest that LOM into intraLibrary, preferably using SWORD…which I shall think about another time!

Posted in Progress post | Tagged: #jiscri, #UseCase, ResearcherID, WoS API, XML | 1 Comment »

Visit from Thomson Reuters

Posted by Nick on October 2, 2009

On Wednesday afternoon Mike and I were finally able to sit down with Jon and…Gareth? (sorry, I’m terrible with names) from Thomson Reuters to discuss Bibliosight and the work we are doing with the WoS API, it probably goes without saying just how useful this was, especially so soon after our Tuesday meeting.

As we have come to appreciate, Thomson are still very much in an ongoing process of developing their suite of tools and commercial services around the extraction of data from WoS using their API and, overall, I was given the impression that the company are currently practising something of a balancing act to weigh their commercial interests against providing appropriate value added services to their subscribers under existing licensing agreement – which is, of course, entirely reasonable. Jon suggested that the Bibliosight project is something of a pioneer in using this technology and a useful case-study for the company, which certainly puts some of our early difficulties into context – though he did indicate that numerous other folk are also actively investigating the API; in particular he mentioned Queens College Belfast, an institution in Birmingham and R4R at Kings College London in collaboration with EPrints’ Les Carr at Soton. R4R is the only project that I was hitherto aware of and have had any contact with; it would be really useful if we were able to communicate with others also using the API.

Thomson Reuter’s flagship commercial product is called InCites and “supplies all the data and tools you need to easily produce targeted, customized reports… all in one place. You can conduct in-depth analyses of your institution’s role in research, as well as produce focused snapshots that showcase particular aspects of research performance.” We discussed how, though such a service will be invaluable for the research oriented Russell Group institutions, it is likely to be overkill for a million plus institution like Leeds Met; nevertheless we do require a certain level of functionality to help us analyse our research performance which, alongside our traditional strengths in teaching and learning, is increasingly important, especially in view of the REF. Hopefully this is where the developing ‘suite of tools’ comes in and our guests were keen to get a handle on precisely what we are hoping to achieve with Bibliosight (aren’t we all!). I outlined our preliminary use-cases for them as a foundation for our discussion and was also keen to ask some of the specific questions that had arisen during the previous day’s meeting. First of all I asked about the wording of the documentation that appears to suggest that it is only possible to return 100 records with a single query using the API – they weren’t aware of such an issue and agreed that the way it was expressed in the documentation was a little ambiguous; Jon will follow this up for us though Mike may also be able to elucidate the situation when he has investigated further. They were able to say that another user had discovered that the API could be called twice every second, however, so didn’t anticipate any problems with extracting all the data we need.

The major issue that came up at the meeting on Tuesday was how best to return all of the articles for a given institution with the most appropriate field to query apparently being the address field. It is not clear, however, how consistent the institutional address actually is and Jon confirmed that it is derived from information harvested from individual journals/papers which preliminary manual searching of WoS has already demonstrated to be idiosyncratic – at least in the case of Leeds Metropolitan University and almost certainly other institutions aswell (leeds metropolitan university; leeds met [uni]; lmu etc). Jon suggested that the safest and most effective method of returning all records would actually be by using ResearcherID though this would require all institutional authors to be registered and an additional paid subscription to ResearcherID download (as opposed to upload which is free) – in lieu of this, however, he did confirm that the address field was the only way and that it may be necessary to build a catch-all query to ensure that we don’t miss anything – precisely how we achieve this is still a little bit of a moot point, though he did indicate that some work has been done on disambiguating institutional address formats within WoS and will follow up on this for us in due course.

Through our discussion, Article Match Retrieval is finally beginning to make more sense to me now, and Jon confirmed that this is the method that would be used in conjunction with the API to provide numbers of citations to an individual article – AMR can be queried by numerous fields including DOI and UT Identifier (A unique identifier for a journal article assigned by Thomson Reuters.); in terms of the current project, I think it makes sense to focus initially on extracting bibliographic data first before worrying about citation metrics; via the API, we can also extract the UT identifier and then use this to query AMR.

We also touched on Terms & Conditions and Thomson, again reasonably, expect WoS as data source to be clearly acknowledged on each individual record – Mike wasn’t initially certain how this could easily be achieved from a technical perspective, at least in the case of bibliographic citation information (which may have been added manually); we have a few ideas on how this could actually be achieved but is really just something to be aware of at this stage.

All in all I now feel that the overall shape project is beginning to be resolved and, in addition to the technical work required to extract, store, parse, convert (XML) records and then pass them somewhere else (intraLibrary/EndNote), a large part of Bibliosight will necessarily focus on developing use-cases for our institutiona research administration which is likely to continue well beyond the designated 6 month life-cycle of the #jiscri project!

Posted in Progress post, Research Excellence Framework, Thomson Reuters Research Analytics | Tagged: Article Match Retrieval, Bibliosight, progressPosts, R4R, ResearcherID, Thomson Reuters, use case, WoS API, XML | 2 Comments »

Project meeting – minutes

Posted by Nick on October 1, 2009

(Date of meeting 29th September 2009)

Present: Peter Douglas, Wendy Luker, Arthur Sargeant, Mike Taylor, Babita Bhogal, Sue Rooke, Nick Sheppard

1. Apologies

No apologies

2. Team membership

Thank you to Sue Rooke who has agreed to join the Bibliosight project team; Sue is a research administrator in the Faculty of Health and has already been involved in repository development, contributing to developing workflows and providing feedback on the Open Search interface. We hope that Sue will contribute, in particular, to use case development.

The team is still lacking a representative from the academic community and we are currently waiting for a reply to recent correspondence. WL is attending the research sub-committee on Monday 5th October and may raise the issue there if necessary.

Action: WL/NS to pursue academic contact(s) for a representative to sit on the project team

3. Progress since last meeting

• API

We have now received the updated documentation from Thomson Reuters and Mike has submitted a query to the API and received an appropriate response in XML. Thomson Reuters’ FAQ gives a full summary of the data fields that can be queried by the service and the data elements that can be returned which appears to be in line with this XML response.

We are therefore able to formally reduce the associated risk back to low:

Risk

Probability

Impact

Action to Prevent/Manage Risk

API unsuitable for project deliverables

Low (elevated to Medium;1^stSeptember 2009 – reduced back to Low; 29th September 2009)

High

Feedback from Thomson Reuters indicates proposal technically feasible.

Problems with API/documentation have been mitigated by release of new documentation from Thomson Reuters; ^29th September 2009)

N.B. The wording of the documentation appears to suggest that it is only possible to return 100 records with a single query using the API – NS to clarify with Thomson Reuters. If this is the case, the practical implications are limited in the case of Leeds Metropolitan University which publishes a relatively small amount of research but would be considerable for an institution with a greater research output.

Action: NS to clarify 100 record limit with TR

Action: MT to continue appropriate* implementation of API

* Hopefully what is “appropriate” will evolve over the coming weeks!

• Use cases

Technical difficulties have contributed to a lack of conceptual clarity amongst the project team and there was considerable discussion around precisely what data Bibliosight will now seek to retrieve from WoS using the API and what we will aim to achieve with that data.

The original use case narratives outlined in the bid were several and focussed on an alert service for researchers and/or repository administrators to encourage the deposit of an appropriate full text in the repository and perhaps neglected the obvious administrative use case whereby metadata from WoS is pulled directly into intraLibrary.

N.B. An important use case was also the extraction of citation metrics that would potentially inform the REF – we are not yet clear how this would be achieved but we understand it will rely on the Article Match Retrieval service.

Of course we also want to produce outputs that are of use to the wider community rather than just to users of our specific repository software and this reflects the considerations of the Readiness for REF project which also hopes to enable UK repositories to make effective and efficient use of the WoS API (as part of a much broader project) and is focussing on EPrints, DSpace and Fedora as the most well established OA research repository platforms. R4R raises several pertinant questions, many of which also arose independently and in a similar form during our own discussion:

What are the different workflows relevant to (i) backfilling a repository with a one-off download and (ii) ongoing use of WoSAPI to populate a repository?
What uses might records downloaded from WoSAPI be put to?
How might the workflows be designed to enable other datastreams also to help populate the repository (eg from UK PubMedCentral, arXiv, or sources that better serve the arts, humanities and social sciences)?
What workflows might be able to handle facts such as that the WoS record will become available some time after the paper is published, whereas deposit into the repository may happen earlier than that?
What methods might be helpful in addressing the inevitable questions of duplicate records, or ambiguous relations with existing records?
Are there implications for a repository’s mission and reputation if the balance of content it holds is rapidly changed by a large number of WoS-derived records?

Use cases may also be informed by the JournalTOCsAPI project (see item 5 below) who also explored similar issues in a recent post.

One practical consideration from a technical perspective and that will have a bearing on developing use cases is the best method of extracting comprehensive records from institution “X” – the most appropriate field to query seems to be the address field but it is not clear how consistent the institutional address in this field will be – for example, early experimentation has found that “leeds metropolitan university” only returns 201 records; using a wildcard in the form “leeds met*”, however, returns 1503 records (test conducted 29th September 2009). This was an issue flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).

In terms of the practicalities of actually getting records from WoS into intraLibrary once they have been harvested, Peter did indicate that it should be possible to upload suitable XML records into intraLibrary though this will need to be in LOM format, meaning that we may need to perform an XSLT transformation to convert data retrieved from WoS into a suitable format. Also, Peter is uncertain whether XML that can be imported in this way will also include the LOM extensions we are using to accommodate bibliographic information and will need to speak to his technical colleagues at Intrallect to clarify.

Note: There was also discussion around appropriate integration with SFX, our OpenURL resolver, as a possible means of identifying a published URL for WoS records – this is an area that has scope implications both for Bibliosight and the remit of the Leeds Metropolitan University repository itself; beyond an Open Access repository of research (i.e. to also comprise citation only records). This is an area that may need to be explored in more detail later in the project.

Action: PD to clarify re upload of XML to intraLibrary including LOM extensions

Action: NS/BB/SR to meet with another member of the URO to clarify potential use cases (meeting on Thursday 1st October)

Action: All team members to contribute to ongoing discussion on the blog.

• Project reporting – blog; tags specified by JISC

It was agreed that the specific subject for blog posts this month will be ‘Technical standards’ – Peter agreed to contribute a post before the next meeting.

Action: PD to contribute a blog post on ‘technical standards’.

Action: All team members to contribute to ongoing discussion on the blog.

4. Visit by Thomson Reuters reps on Wednesday 30th September

Mike and I met with Jon and Gareth from TR on Wednesday 30th (yesterday) who were able to clarify several issues for us – separate post to follow

5. Review of JournalTOCsAPI – http://www.journaltocs.hw.ac.uk/index.php?action=api

During the meeting, I gave a quick overview of the recently released JournalTOCsAPI at http://www.journaltocs.hw.ac.uk/index.php?action=api with a view to de-mysifying the concept of an API for the less technical amongst us and also potentially giving the more technical a developmental steer. Currently, queries need to be submitted to the API by URL and are returned as an RSS feed which includes as much metadata as in the original TOC feed – depending on the quality of the original record – comparable to Bibliosight in many respects, this project perhaps has greater flexibility regarding the metadata it is able to query and return – it is, after all, building an API from the ground up that will query an openly accessible data source – however, it is likely that the quality of the data may not be as consistent as WoS; there may be fields missing, for example.

It has also been informative to engage with another, similar project as a ‘user’ and we discussed how Bibliosight might also engage with JournalTOCsAPI community of users and agreed that it is a valuable opportunity to solicit the opinion of repository managers from other institutions using different software platforms.

Action: NS to continue engaging with JournalTOCsAPI as a ‘user’

Action: NS to send an email that can be forwarded to JournalTOCsAPI community of users as suggested in recent correspondence from Lisa Rogers

6. Article Match Retrieval & Researcher ID

These were only touched upon briefly in the meeting and flagged to follow up with Thomson Reuters reps on Wednesday 30th September (see item 4; post to follow).

7. A.O.B.

None

8. Date of next meeting

20th October 2009 – 11:30 am

Posted in Bibliosight, Progress post, SCRUM minutes | Tagged: #jiscri, Bibliosight, JournalTOCsAPI, minutes, progressPosts, ResearcherID, Scrum, use case, userCase, WoS API | 2 Comments »

Just round the next corner…

Posted by Nick on September 24, 2009

Thomas Reuters new research analytics website – http://researchanalytics.thomsonreuters.com/ – has finally provided a thread through the labyrinthine Research Analytics infrastructure that I’m able to follow – forgive the hyperbolic metaphor – it’s probably isn’t that difficult to navigate and I’m hardly an ancient Greek hero – just easily lost! Nevertheless, it links intuitively amongst the various information and I’m reviewing, in particular, the information on Web Services, Article Match Retrieval and ResearcherID in advance of next week’s meeting.

I’ve certainly asked most of the Web Services FAQ myself over the past few weeks. The most relevant from our perspective are:

What data fields can be queried through the service?

The requesting system can query the Web of Science using the following fields:

Address (including Street, City, Province, Zip Code, or Country)
Author
Conference (including title, location, data, and sponsor)
Group Author
Organization or Sub-organization
Source Publication (journal, book or conference)
Title
Topic
Year Published

The service will support the AND, OR, NOT, and SAME Boolean operators.

What data elements are returned by the service?

The Web of Science Web Service returns five fields to the requesting system:

Article Title

Authors — All authors, book authors, and corporate authors
Source — Includes the source title, subtitle, book series and subtitle, volume, issue, special issue, pages, article number, supplement number, and publication date
Keywords — all author supplied keywords
UT — A unique article identified provided by Thomson Reuters

One of the issues we are likely to run into retrieving data from WoS is differentiating between similar names and disambiguating the same name that has been entered in different formats and this is where ResearcherID can come in.

N.B. This is actually an issue with implications beyond Bibliosight and purely internally; I’ve been aware of the need for a unique identifier for researchers in intraLibrary for a while, prompted by a blog post from Open Research Online describing how they have developed a feed of a faculty group members’ publications from ORO (EPrints) to the faculty’s website which “made use of the fact that everyone’s publications in ORO are linked to their unique university ID.” This prompted me to wonder aloud on Twitter if such use of unique identifiers was standard practice for Eprints – @smithcolin and @ostevens tell me that it isn’t.

ResearcherID is a global service to assign a unique identifier and eliminate author misidentification – with the obvious benefit over an institutional ID that it is universal rather than just local.

As far as I understand, the ResearcherID Web Service from Thomas Reuters comprises two element –

ResearcherID upload “that enables administrators to mass create ResearcherID profiles and upload publication data for some or all of the accounts you create for faculty, researchers, etc. at your institution.”
Researcher ID download is “a web-based service that enables you to query ResearcherID for researchers at your institution and return publication data for them, including times cited counts where applicable, as well as return institution affiliation for researchers at the requesting institution.”

Upload is freely available to everyone but download is a subscription based service.

I have now registered with the ResearcherID batch upload service and will report on it more fully at the meeting next week.

So what about Article Match Retrieval? To be honest, this is where my thread runs out, and I’m still not entirely sure how this fits in. It’s free I think (to WoS subscribers) and the blurb says:

“Article Match Retrieval allows for a real-time lookup of bibliographic metadata such as DOI, author, source title, etc., against the Web of Science database (using the institution’s subscription entitlements). If a match is found, the service will return Times Cited information as well as links to view the full record, related records page, or citing articles page in Web of Science. An institution can use these links as a way to link into Web of Science from their library web page or institutional repository. Subscribers to Journal Citation Reports can use this service to retrieve links to the JCR record for a given journal.”

There is then a form to fill out “to find out how to create direct links to Web of Science articles or Journal Citation Reports” – I’m pretty sure I’ve already filled it out back when we were submitting the bid but this is something to check out with Jon when he comes on Wednesday…

So though things are not crystal clear quite yet, Tuesday and Wednesday next week should put us firmly on the right track.

Posted in Progress post, Thomson Reuters Research Analytics | Tagged: #jiscri, Article Match Retrieval, FAQ, progressPosts, ResearcherID, Thomson Reuters, Web Services | 2 Comments »

Quick reminder(s)

Posted by Nick on September 24, 2009

Just a quick reminder to the project team that we should be regularly posting in the 6 subject areas specified by JISC: Project SWOT analyses; User participation; Day to day work; Technical standards; Value add; Small wins and fails; Progress report.

Thanks to Mike for the recent Small win post (actually a rather big win!).

I’m currently putting together the agenda for next Tuesday’s meeting – I’ll post here and email a copy later today.

A reminder also that we will be joined by a new team member at the meeting – Sue is a research administrator in the Faculty of Health and has already been involved in repository development – testing for us and providing feedback on the developing infrastructure; she will have some good perspectives on institutional and faculty research administration and should be able to contribute to our use cases.

Finally, Jon Stroll, our rep from Thomson Reuters, is visiting us on Wednesday 30th – he will have a technical colleague with him and should be able to give us a good steer on the project – this will be an item on the agenda.

Posted in Progress post | Tagged: #jiscri, Bibliosight, progressPosts, reminder, use case | 1 Comment »

BiblioSight News

Integrating the Web of Science web-services API into the Leeds Met Repository

Categories

Blogroll

Twitter updates

mrnick

Subscribe

Tags

Archive for the ‘Progress post’ Category

Final Progress Post

Steady as she goes – Bibliosight back on course!

The role of standards in Bibliosight

Thinking out loud…

Visit from Thomson Reuters

Project meeting – minutes

Just round the next corner…

What data fields can be queried through the service?

What data elements are returned by the service?

The Web of Science Web Service returns five fields to the requesting system:

Quick reminder(s)