Who Goes There? An investigation into XCRI and Identifiers

Two members of the CCLiP team, Alan Brown (University of Liverpool, XCRI feed producer and project partner) and Anne Qualter (head of the Educational Development division at the University of Liverpool, on behalf of the CCLiP project manager), attended the MUSKET Assembly on 2nd July 2010. At the assembly Alan Brown gave a presentation about the use of XCRI in the CCLiP portal. Part of his presentation addressed the issue of identifiers and the <identifier> element in XCRI, and it was this part of the presentation that seems to have sparked some discussion.

Alan’s presentation explained that an identifier is required for the catalog, provider, course and presentation elements, and that we suggest that XCRI providers use a URL as the identifier. We’ve had several people asking why we suggest that a URL is used for the identifier, and I hope to explain our reasoning, how the CCLiP project uses URLs and why the use of URIs makes sense. I’ll also address the difference between a URL and a URI a little later on.

Let’s start with what the CCLiP project asks XCRI providers for in terms of an identifier. CCLiP asks that ids (as I shall refer to them from now on) be provided in the form of a URI (Uniform Resource Identifier). Ideally we want a URL (Uniform Resource Locator), a resolvable address that provides human-readable content about the catalog, provider, course or presentation. However, any URI will do.

Ok, so what’s the difference between a URL and a URI?

A URI is in fact a superset that covers both URLs and URNs (Uniform Resource Names). A URL identifies something on the internet and can be resolved to an actual resource. It isn’t restricted to web pages or HTTP requests either; it can use other application layer protocols (e.g. FTP). A URN does not resolve to a tangible resource; it is merely a name for something. So, for our purposes, a URI is either a URL or a URN.

I’m not going to go into more detail than that, but if you’re still confused or want more information there is an excellent overview here: http://www.damnhandy.com/2009/08/26/url-vs-uri-vs-urn-in-more-concise-terms/
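
For a rough feel of the distinction, here is a minimal Python sketch that sorts a string into one of the two groups by looking only at its scheme. The function name describe and the rule of thumb it applies (the urn scheme means a name, any other scheme is treated as URL-style) are my own simplification for illustration, not something defined by XCRI.

```python
from urllib.parse import urlparse


def describe(uri: str) -> str:
    """Roughly classify a string by its URI scheme (illustrative only)."""
    scheme = urlparse(uri).scheme.lower()
    if not scheme:
        # No scheme at all, e.g. a bare database id such as "123".
        return "not an absolute URI (no scheme)"
    if scheme == "urn":
        return "URN (a name)"
    # Assumption: anything else is treated as a locator-style URI.
    return f"URL-style URI using the '{scheme}' scheme"


if __name__ == "__main__":
    for candidate in (
        "http://www.liverpool.ac.uk",
        "ftp://ftp.example.org/courses.xml",
        "urn:isbn:0451450523",
        "123",
    ):
        print(candidate, "->", describe(candidate))
```

A bare database id such as 123 has no scheme at all, which is a quick way of spotting identifiers that are not URIs in the first place.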

So why do we use URIs?

As an aggregator of XCRI information it’s important that all the ids that we receive be unique. An id isn’t much use to us if it’s the same as a dozen previously received ids from a multitude of different suppliers. If every organisation that supplies us with a feed starts its course ids at 1 and increments by 1 for each new course, we’re going to have a very confusing time. To get around this, the XCRI Wiki says:

“It is recommended that identifiers likely to be used by aggregators should be URLs that resolve to human-readable content.” – (http://www.xcri.org/wiki/index.php/Identifier)

This, along with several conversations with Alan Paull, persuaded us that it was the best approach to take. However, using a URL assumes a perfect world where each course has a webpage about it and every presentation has a webpage about the specifics of that presentation. As we all know, this isn’t a perfect world and that’s not always the case. We could demand that all of our XCRI-providing partners create pages where they don’t currently exist, but this may dissuade them from providing an XCRI feed or ultimately taking part in the project, and we don’t want that. Instead it’s much easier to allow a URI that doesn’t resolve to a resource, but does identify that catalog, provider, course or presentation.
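
To make the collision problem concrete, here is a small Python sketch of what an aggregator sees when it merges feeds. The provider names, the example domains and the find_collisions helper are all made up for illustration; this is not the CCLiP portal’s code.

```python
from collections import Counter

# Hypothetical feeds keyed by provider: each supplies a list of course ids.
local_id_feeds = {
    "provider-a": ["1", "2", "3"],
    "provider-b": ["1", "2"],
}

# The same courses, identified by URIs under each provider's own domain.
uri_feeds = {
    "provider-a": [
        "http://www.provider-a.example/course/1",
        "http://www.provider-a.example/course/2",
        "http://www.provider-a.example/course/3",
    ],
    "provider-b": [
        "http://www.provider-b.example/course/1",
        "http://www.provider-b.example/course/2",
    ],
}


def find_collisions(feeds):
    """Return the ids that appear more than once across all feeds."""
    counts = Counter(i for ids in feeds.values() for i in ids)
    return sorted(i for i, n in counts.items() if n > 1)


if __name__ == "__main__":
    print("local ids:", find_collisions(local_id_feeds))
    print("URI ids:  ", find_collisions(uri_feeds))
```

With plain incrementing ids the two providers clash on their first two courses; with domain-based URIs nothing clashes.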

Ok, but why are URLs such a good idea?

A domain (e.g. liverpool.ac.uk) is unique. Globally. There is only one domain with that name, and that makes it ideal for use as an identifier for the provider. The University of Liverpool has the domain liverpool.ac.uk, which we know is unique and can therefore be used for the provider identifier:

<provider>
    <identifier>http://www.liverpool.ac.uk</identifier>
    …
</provider>

Equally, any URL that belongs to that domain will also be unique. Therefore a page that describes a course or presentation will also be unique, and makes the ideal identifier for that course or presentation. As all of the providers in the CCLiP project that will be providing an XCRI feed have a domain name, this suits us fine. I’ll look at options for organisations without a domain name shortly.

Our only problem then is the case when there isn’t a URL for a catalog, provider, course or presentation. What do we do then? Well, as we already know that the domain is unique, we’re halfway to having a unique identifier. We can still use the domain, and then add our database ids, which we know are unique within our own database. Whilst our database ids may be duplicated in someone else’s database, once we add them to the unique domain we can guarantee global uniqueness. For example, a presentation of a course with database id 123 could take the form of an XCRI id like this:

<presentation>
    <identifier>http://www.liverpool.ac.uk/presentation/123</identifier>
</presentation>

Whilst this address may not resolve to a resource, it is unique and fulfils our requirement. Personally, I believe there is a better option for those who cannot provide a URL for each item. Create an HTML page called course.htm or presentation.htm etc., as required, with text that reads something along the lines of:

“We do not currently store information about individual presentations, but for more information about the courses we provide please visit: <INSERT URL FOR COURSES HERE>”

The message could equally give an email address or phone number. You can then use this page with the id added at the end to form a unique id in the following way:

<presentation>
    <identifier>http://www.liverpool.ac.uk/presentation.htm?id=123</identifier>
</presentation>

This means that although the page doesn’t hold information about the presentation it now gives information that can be used to find out more, and remains unique.
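
Both ways of turning a database id into an identifier can be sketched in a few lines of Python. The function names path_identifier and page_identifier are hypothetical, and the sketch assumes the feed producer already knows its base URL and the database id of each record.

```python
from urllib.parse import urlencode


def path_identifier(base_url: str, kind: str, db_id: int) -> str:
    """Domain plus database id as a path; unique, but may not resolve."""
    return f"{base_url.rstrip('/')}/{kind}/{db_id}"


def page_identifier(base_url: str, kind: str, db_id: int) -> str:
    """A generic information page with the database id as a query string."""
    return f"{base_url.rstrip('/')}/{kind}.htm?{urlencode({'id': db_id})}"


if __name__ == "__main__":
    base = "http://www.liverpool.ac.uk"
    print(path_identifier(base, "presentation", 123))
    print(page_identifier(base, "presentation", 123))
```

The first function reproduces the /presentation/123 form and the second the presentation.htm?id=123 form shown in the examples.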

Finally, what can we do with organisations with no domain name? Well, we can still produce a unique string if we have another globally unique identifier, perhaps a telephone number or some other business identifier. Taking the University of Liverpool as an example, we could use its telephone number including the international dialling code, +44(0)1517942000, and then add the database id in the same way:

<presentation>
    <identifier>tel:+44(0)1517942000/presentation/123</identifier>
</presentation>

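However, in this instance the number would need to conform to the tel URI conventions (RFC 3966). Among other things, a global number is normally written without the bracketed national trunk digit, as in +441517942000, and the tel syntax attaches extra information as parameters rather than as a path, so the example above should be read as an illustration of the idea rather than a finished format. More info here: http://www.rfc-archive.org/getrfc.php?rfc=3966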
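
As a rough sketch of the first step, the Python below tidies a number written the way we tend to print it in the UK into the global form that a tel URI expects. The function name to_tel_uri is hypothetical, and the sketch assumes the number already starts with the international dialling code. How best to attach a database id to the result is deliberately left out.

```python
import re


def to_tel_uri(number: str) -> str:
    """Turn e.g. '+44(0)1517942000' into 'tel:+441517942000' (illustrative)."""
    number = number.strip()
    if not number.startswith("+"):
        # Assumption: only numbers with an international dialling code are handled.
        raise ValueError("expected a number starting with '+' and a country code")
    # Drop the bracketed national trunk digit, which is not dialled from abroad.
    without_trunk = number.replace("(0)", "")
    # Keep digits only, removing spaces, brackets and hyphens.
    digits = re.sub(r"\D", "", without_trunk)
    return f"tel:+{digits}"


if __name__ == "__main__":
    print(to_tel_uri("+44(0)1517942000"))
    print(to_tel_uri("+44 (0)151 794 2000"))
```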

I hope that this post explains a bit more about identifiers and goes some way toward showing why URIs make sense for this purpose. If you have any comments or would like to discuss the issue further, please don’t hesitate to get in touch.
