
Architecture to support the representation and use of National Occupational Standards, skill and competence information: issues and discussion

Simon Grant, version of 2008-06-05

Background

The prior document, “An analysis of skill domain concepts underlying two sets of existing National Occupational Standards”, described the concepts and structures of existing National Occupational Standards (NOSs), focusing examples on those created and managed by two Sector Skills Councils: Skillset, and Skills for Health. This present document takes up the question of how practically to represent NOS and similar materials and their concepts and structures so that they can be made available electronically, through the Internet, and used and reused in many ways.

NOSs are generalised definitions of the characteristics of personal competence in operational settings. Though it is generally agreed that “competence” most properly refers to the ability of a person to perform to certain standards in a range of situations, some people use the word “competency” interchangeably with “competence”, while others use “competency” to mean an abstract description separate from context or level. In this report, the level of abstraction is not the focus of interest, and the word “competence” is used throughout.

One question is how to transmit information about competence. Transmitting information is the key task which informs interoperability specifications and standards. We will here look at two or three options, each of which has different merits. If files were held on web servers in those formats, that would be all that had to be answered. However, in practice there are several advantages to holding the information in some kind of web database, and serving it up on demand, in any of several formats. The deeper question is then how to hold the information so that web programs can serve up the competence definition information in any of the various ways suggested.

One identified use is in conjunction with XCRI – eXchanging Course Related Information. The only strict requirement with XCRI is for each competence definition to have a unique identifier, as the very fact of having a unique identifier would allow automatic matching of, say, the learning outcomes of one course with the entry requirements of a higher course.

Another potential use, beyond the scope of this report, is in electronic portfolio technology, where personal claims to competence can refer to the proper definitions of competence, and thus be matched automatically against requirements referring to the same definitions.

Identifying and communicating the NOS structures

Interoperability formats

The first issue to explore is the general approach to communicating NOS materials in an XML or similar format. Three approaches are identified here, of which two have enough merit to be explored further.

The most readily available, and perhaps the most widespread, technology for representing information for interoperability or exchange is XML. Secondarily, several collaborators have recently been recommending exploring the Semantic Web and RDF, which can be expressed as XML, as well as in other ways. Both technologies are mature enough to have plenty of literature, tools and documentation available. XML has been a preferred “binding” for interoperability specifications for several years now, while RDF is now increasingly used in fields where it is relationships that are being represented. A popular application of RDF is the Friend of a Friend project (FOAF).

As the previous document has described, there are some things in NOSs that seem like substantial entities or resources, and other things that look more like relationships between resources. This invites one plausible approach of representing resources or substantial entities using some kind of plain XML, and information about relationships between them using RDF.

Following this first approach, there is plenty of information associated directly with a NOS Unit, and this is the prime candidate to be represented as a substantial entity or resource, using a plain XML approach. Statements can also be represented as resources. Sets of units could also be seen as resources, particularly when the set is widely referred to, or has information associated with it that is not simply an aggregate of the information associated with the individual units.

On the other hand, the relationships of statements to units, the relationships of units to each other, and the relationships of units to sets of units could be represented in the more relationship-oriented RDF, rather than plain XML. This is the other part of the first approach explored here.

A second approach would be to use RDF for all of the information. It is likely that this would not take a great deal more effort, but RDF has in the past been a barrier to some people. There are two common ways of “serialising” RDF: RDF/XML uses XML syntax and constructs and is relatively hard to understand. The other way is to represent the triples more directly, not in XML, e.g. in N3 or Turtle. This is far easier to understand. The difficulty in interpreting RDF/XML comes from the mismatch between the natural XML tree structure, and the essentially relational information encoded naturally by RDF.

A third approach relies on a relatively new strategy for representing RDF within HTML, called RDFa, which will be described below. Though this is only at the W3C Draft, rather than Recommendation, stage, it looks very promising, because it offers to mix the flexibility of HTML with the rigour of the Semantic Web in a single representation. This is the second approach to be explored here. RDFa is similar in some respects to Microformats, but is potentially more rigorous, and more in keeping with RDF.

However, not too much weight should be placed on file formats. While the output of an HTTP call is inevitably a stream of characters, and a stream of characters can always be seen as a file, that which is output is often not held in exactly the same form on the web host. It would probably suit applications supplying competence definitions to hold the material in some form of database, and communicate it with the help of appropriate web software delivering simple web services. Ideally, we need to find a model for holding the information such that it can be served up in any likely format.

Identifiers for the structures

The envisaged uses of the information, though not yet fully clear, are in the context of web-based services, and in these situations anything that is to be referred to needs an identifier, whatever format the information is given in.

There are many potential uses for competence definitions, including those envisaged for XCRI. All these uses need to refer unambiguously to the selected skills or competences, so unique identifiers need to be provided. For example, in XCRI, both prerequisites for a course, and intended learning outcomes for a course, can potentially refer to competence definitions, rather than only formal qualifications. Ideally, XCRI files should be able to refer to any particular competence definition by a unique identifier, rather than with a text label, which might well be ambiguous. The same need for unambiguous reference occurs in electronic portfolio systems and other learning technology.

URIs, and more specifically http URLs, are probably the most commonly accepted and useful kind of unique identifier to be used in this way. It helps in the definition and understanding of what the URI refers to if it is a URL which “resolves to” a human-readable file. This could be an XML file together with stylesheet information (CSS or XSL), so that it can be read in a normal browser, or even more straightforwardly, the URL could resolve to an HTML file.

Identifiers for NOS units

How would URI identifiers for NOS units be established?

All NOS units are maintained by a Sector Skills Council (SSC) or other recognised standards setting body, nearly all of which have their own Internet domain. Some SSCs have collections of units which are not NOSs: these could also be handled in the same way as their NOSs. Therefore, it makes sense for a URI for a NOS unit to use the domain name of the sponsoring body. For all NOS units recognised by the Commission for Employment and Skills (in ukstandards.org), the ukstandards.org domain could alternatively be used. For example, Skillset's NOS Editing unit E1, “Identify and agree editing outcomes and process”, could be given a location such as http://www.skillset.org/standards/units/E1 or possibly http://www.ukstandards.org/units/O52NE1 (these URLs, and the other example URLs given below, are imaginary, and do not currently resolve to anything).
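As a sketch of how such identifiers might be minted mechanically, the function below builds a unit URI either under the sponsoring body's own domain or under ukstandards.org. The two URL patterns simply mirror the imaginary examples above; they are assumptions for illustration, not an agreed convention.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical URL patterns modelled on the imaginary examples in the text;
# neither is an agreed convention, and neither URL currently resolves.
SSC_PATTERN = "http://www.{domain}/standards/units/{code}"
UKSTANDARDS_PATTERN = "http://www.ukstandards.org/units/{code}"


def unit_uri(code: str, ssc_domain: Optional[str] = None) -> str:
    """Return a URI for a NOS unit: under the sponsoring body's own domain
    if one is given, otherwise under ukstandards.org."""
    segment = quote(code, safe="")  # percent-encode anything unsafe in a path segment
    if ssc_domain:
        return SSC_PATTERN.format(domain=ssc_domain, code=segment)
    return UKSTANDARDS_PATTERN.format(code=segment)


print(unit_uri("E1", "skillset.org"))  # http://www.skillset.org/standards/units/E1
print(unit_uri("O52NE1"))              # http://www.ukstandards.org/units/O52NE1
```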

Identifiers for NOS statements

Allocating URI identifiers for statements is somewhat less straightforward, as statement items are not given formal codes within the NOS documentation. One approach would simply be to use the text of the item as part of the identifier. However, some items are quite long, with punctuation and formatting which would not suit use as identifiers. A better approach would be to assign a short identifier, unique within the unit.

Many units have numbers or letters by the items, which could be used in such identifiers. Where numbers or letters are not given, a number could be allocated in the obvious way, indicating the position at which the statement appears in the documentation. Thus, we could assign the identifier URL (as before, the example URLs here are currently imaginary, and do not resolve to anything) http://www.skillset.org/standards/units/E1/KSc to the item c) in the knowledge statement list, namely “what paperwork is needed, and how to acquire it”. This statement has the heading and rubric “KNOWLEDGE STATEMENTS this is what you must know”, the KS standing mnemonically for Knowledge Statement.

Similarly, we could assign the identifier URL http://www.skillset.org/standards/units/E1/PS3 to the performance statement “identify any unusual or innovative aspects of the production and note them clearly and accurately”. This statement has the heading and rubric “PERFORMANCE STATEMENTS this is what you must be able to do”.

As the statements do not have fixed headings (some are under the heading of awareness rather than knowledge, etc.), the abbreviations used in the codes may vary from unit to unit. But in all cases it will be possible to assign a meaningful code of no more than, say, five characters, unique within the unit. It would be neither necessary nor intended to make these short codes unique across units.
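The allocation rule just described could be sketched as follows, assuming each item is supplied as a pair of (heading abbreviation, own letter or number, if any) in document order. The five-character limit follows the suggestion above; the item list in the example is invented apart from the two codes KSc and PS3 used in the text.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

MAX_CODE_LENGTH = 5  # the five-character limit suggested in the text


def statement_code(prefix: str, label: Optional[str], position: int) -> str:
    """Join a heading abbreviation ('KS', 'PS', ...) to the item's own letter
    or number, falling back to its position under that heading."""
    suffix = label.strip(" ).") if label else str(position)
    code = prefix + suffix
    if not re.fullmatch(r"[A-Za-z0-9]+", code) or len(code) > MAX_CODE_LENGTH:
        raise ValueError(f"unsuitable statement code: {code!r}")
    return code


def statement_uris(unit_uri: str, items: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Map each statement code to a URI under the unit. Codes need only be
    unique within this unit, not across units."""
    uris: Dict[str, str] = {}
    positions: Dict[str, int] = {}
    for prefix, label in items:
        positions[prefix] = positions.get(prefix, 0) + 1
        code = statement_code(prefix, label, positions[prefix])
        if code in uris:
            raise ValueError(f"duplicate code within unit: {code}")
        uris[code] = f"{unit_uri.rstrip('/')}/{code}"
    return uris


unit = "http://www.skillset.org/standards/units/E1"
items = [("KS", "a)"), ("KS", "b)"), ("KS", "c)"), ("PS", None), ("PS", None), ("PS", None)]
uris = statement_uris(unit, items)
print(uris["KSc"])  # http://www.skillset.org/standards/units/E1/KSc
print(uris["PS3"])  # http://www.skillset.org/standards/units/E1/PS3
```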

In this study, a temporary location is given to a few example pieces of NOS information. The actual SSC domains are not used.

Dated identifiers for version control

The above forms of URL do not have any explicit reference to date, and therefore are suitable for identifying most recent versions, but not different dated versions of the same structure. Practice in the W3C and elsewhere is to have dated versions as well as most recent versions. Take the RDF Primer as an example. The most recent version is at http://www.w3.org/TR/rdf-primer/ which will change when any future version is released. On that same page there is the URL of the same document which will not change when future revisions become available, http://www.w3.org/TR/2004/REC-rdf-primer-20040210/ and the URL of a previous version, http://www.w3.org/TR/2003/PR-rdf-primer-20031215/ which would have been displayed at http://www.w3.org/TR/rdf-primer/ previously. Clearly the exact conventions can be chosen at will, but for instance if the current version of a unit was http://www.skillset.org/standards/units/current/E1, dated versions could be something like http://www.skillset.org/standards/units/2003/E1 with different year numbers for versions from different years. By itself, http://www.skillset.org/standards/units/E1 could either be set to refer to the current version, or more helpfully give a page with a list of all available versions as links. The year alone would normally be sufficient to distinguish a version of a unit, as it would be both unhelpful and impractical to produce versions more often than every few years – this is consistent with current practice.
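A minimal sketch of how a web service might resolve these three kinds of path is given below. It assumes a hypothetical in-memory store keyed by unit code and year; the years 2003 and 2008 and the stored text are invented placeholders. The undated path takes the "more helpful" option of listing all dated versions.

```python
from typing import Dict, List, Union

BASE = "http://www.skillset.org/standards/units"  # imaginary, as in the text

# Hypothetical store: unit code -> {year: unit content}; values are placeholders.
VERSIONS: Dict[str, Dict[int, str]] = {
    "E1": {2003: "E1 as published in 2003", 2008: "E1 as revised in 2008"},
}


def resolve(path: str) -> Union[str, List[str]]:
    """Resolve a path relative to BASE: 'current/E1', '2003/E1' or just 'E1'."""
    parts = path.strip("/").split("/")
    if len(parts) == 1:
        # Undated: return links to every available dated version.
        code = parts[0]
        return [f"{BASE}/{year}/{code}" for year in sorted(VERSIONS[code])]
    selector, code = parts
    versions = VERSIONS[code]
    if selector == "current":
        return versions[max(versions)]  # most recent; changes on each release
    return versions[int(selector)]      # a dated version, which never changes
```

For example, resolve("current/E1") and resolve("2008/E1") return the same content today, but only the second would keep doing so after a later revision.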

Representation of structures – information model

Before discussing the actual representation of units and statements, the information that needs to be represented is clarified here.

NOS units

There are two obvious approaches to what to represent as a unit. Firstly, there could be a unit file containing a representation of everything that is part of the unit. Individual statement items could potentially then be referred to by fragment ID within that document. If a fragment ID is to have a clear referent, it would need to be an element that contains exactly the relevant material. The HTML “div” element would probably work, but better still might be the forthcoming “section” element, currently drafted for XHTML 2.

That would leave the question of what to do if statements are replicated across units. A cleaner approach, which could ensure that items were not duplicated unnecessarily, would be for the unit to be represented as just the things that belong exclusively to itself, and the statement items to be represented separately. This would also avoid any potential problems with using fragment identifiers. The relationships between the NOS unit and its statements could be represented separately. The representation of the unit would then contain

Statement items

The representation of a statement item by itself might then contain just

Relationships

The question of relationships is much more open than the question of types of resource. Starting with the relationships which are essential to any representation of NOS materials, we can look progressively more widely, encompassing relationships which are implicit in NOSs, and then other relationships that could usefully be represented.

Essential relationships

If we are to be able to represent statements separately from their units, it is essential that we can represent the relationship between a statement and the unit or units to which it belongs.
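To make the separation concrete, here is a small sketch in which a statement's text is held once, and its membership of units is held apart as triples. The predicate URI and the second unit E2 are hypothetical, used only to show one statement shared by two units without duplication.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

HAS_STATEMENT = "http://example.org/nos-terms#hasStatement"  # hypothetical predicate URI


@dataclass(frozen=True)
class Statement:
    uri: str
    text: str  # held once, however many units use the statement


@dataclass
class Relationships:
    """Unit-statement links, held apart from the unit and statement descriptions."""
    triples: Set[Tuple[str, str, str]] = field(default_factory=set)

    def link(self, unit_uri: str, statement_uri: str) -> None:
        self.triples.add((unit_uri, HAS_STATEMENT, statement_uri))

    def units_of(self, statement_uri: str) -> List[str]:
        """Every unit to which the statement belongs, sorted for stable output."""
        return sorted(s for s, p, o in self.triples if p == HAS_STATEMENT and o == statement_uri)


E1 = "http://www.skillset.org/standards/units/E1"
E2 = "http://www.skillset.org/standards/units/E2"  # hypothetical second unit
ksc = Statement(E1 + "/KSc", "what paperwork is needed, and how to acquire it")
statements: Dict[str, Statement] = {ksc.uri: ksc}

rels = Relationships()
rels.link(E1, ksc.uri)
rels.link(E2, ksc.uri)  # the same statement reused in another unit, not copied
print(rels.units_of(ksc.uri))
```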

Articulation of resource and relationship files

Equally, if the descriptive material is to be represented separately from the relationships themselves, there must be an explicit link between the descriptive material and the relationship material.

One option is as presupposed above: having one or more links within the resource file to the relevant relationship files. This requires some convention to be laid out and agreed, as existing specs do not have a slot explicitly for this purpose. Including one or more links in this way would aid clarity, ease of processing, and would avoid relationships being lost.

Implicit relationships

In the Skillset materials, there are only a few consistent implicit relationships. One obvious consistent one is that of the place of a unit in a NOS set, though it must be remembered that there are common units that apply, for example, across the Skillset sets, possibly extending to some NOS sets from other SSCs.

Implicit relationships without the same consistency include those involving elements of units, where a unit has subdivisions which in turn comprise some of the statements for that unit.

The Skills for Health materials are more richly structured, and thus have more implicit relationships. The following facts have corresponding entries in their NOS units.

All but the last of these need to refer to a location outwith the unit itself.

Other plausible relationships

Any glossary, key words, and scope statements could refer to a domain model or ontology of some kind. Within that model or ontology there could be many relationships of the kind commonly encountered in ontologies. Key words and concepts, as well as relating within the unit documentation, could also be referred out to larger dictionaries of concepts in the domain.

Units could play various roles with respect to externally-defined resources.

An important point here is that there is in principle no limit to the relationships in which a unit (or any other competence unit) can take part. The further into the future one looks, the harder it is to predict what might be useful for a unit to relate to, and which of these relationships would be useful to have as machine-processable. It would therefore make a great deal of sense to provide for an easily extensible framework for relationships. This would be in contrast to writing yet another complex XML schema like IMS LIP, where many relationships were built into the structure of the specification.

Metadata

For our purposes in this report, it is convenient to keep to a strict view of metadata as data about the records themselves. There are several reasons why one might wish to keep and display information, for example, relating to the authorship and editing of any materials to be communicated. Often this is done at the level of the electronic file or other self-contained object: operating systems tend to note date of last editing automatically in any case, and office software can typically record much more within the files stored.

As the definition of metadata tends to provoke disagreement, it may be wise to keep to a small model of the most commonly recorded strict metadata:

It may be useful to separate this metadata relating to statements from the metadata relating to units, as in this way one could indicate a revision of a statement without having to change a whole unit.

Metadata about relationships could be more problematic, as one can easily envisage adding and editing individual relationships independently of each other. It seems sensible to adopt a conservative position, and not to require any metadata for which there is no very clear need.

Using existing specifications for representation

Unless there are clear reasons against, it is generally a good idea to use existing standards and specifications rather than invent new ones. There are a few that are immediately relevant either to the competence domain or to generic relationships:

These will be briefly explored and explained here.

IEEE Reusable Competency Definition

RCD is closely modelled on the earlier IMS RDCEO, and the two specs are essentially the same. Discussion here will be in terms of RCD, but both specs are implied. For openness and ease of replication, RDCEO will be used in the examples.

RCD is a relatively simple spec, consisting of

Units

Units, as conceived above, would fit into this structure. Identifier, title and description fit straightforwardly.

It is tempting to use the Definition elements to try to replicate NOS structures within units. This idea is not supported here, as it would preclude effective reuse of NOS statements and other structures.

Instead, to be in accordance with the concepts promoted above, the metadata needs to include either the information about the relationships of that unit to other things, or the location of that relationship information. The option preferred and proposed here is to represent the relationship information separately, and to include in the metadata a reference to the location of the relationship information.

The definition element could still be used to represent other structures not discussed in this report. This might mean, for instance, that an RCD already doing duty in one context can be repurposed to function as a NOS unit, leaving the existing definition elements intact.

Statement items

Statement items could be treated similarly. The identifier is straightforward: the item identifier is represented by the RCD Identifier. But where does the text of the statement go? The RCD title is in fact long enough to hold any reasonable statement text, though there may be an issue with formatting, where the statement has significant formatting such as a bulleted list. If the title held the text, it might be seen as problematic to have the same text in the description, as that might undermine the definitive nature of the statement text. But if the statement text were held in the description, what would be in the (required) title? It would be unreasonable to expect SSCs and other bodies to produce new words to fit into the title slot. One solution would be to have a plain text version of the whole statement in the title, and the description to contain a marked-up version for display, using fragments of XHTML. However, XHTML in an RCD description would have to be included in a CDATA section, otherwise the markup would conflict with the RCD specification, which does not allow arbitrary markup at that point.
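The title-plus-CDATA solution might look like the sketch below. The element layout (rdceo, identifier, title, description) is a simplification loosely following RDCEO and has not been checked against the real XML binding, which may wrap text in further elements; the XHTML fragment is illustrative. Parsing the result back shows that the markup survives as character data rather than as child elements.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def statement_record(identifier: str, plain_text: str, xhtml_fragment: str) -> str:
    """Serialise a statement item as a simplified RCD-style record: plain
    text in the title, marked-up text in the description inside CDATA."""
    if "]]>" in xhtml_fragment:
        # ']]>' would end the CDATA section early; such content needs splitting.
        raise ValueError("fragment contains ']]>'")
    return (
        "<rdceo>\n"
        f"  <identifier>{escape(identifier)}</identifier>\n"
        f"  <title>{escape(plain_text)}</title>\n"
        f"  <description><![CDATA[{xhtml_fragment}]]></description>\n"
        "</rdceo>"
    )


record = statement_record(
    "http://www.skillset.org/standards/units/E1/KSc",
    "what paperwork is needed, and how to acquire it",
    "<p>what paperwork is needed, and how to acquire it</p>",
)
description = ET.fromstring(record).find("description")
print(len(description))  # 0: the markup is character data, not child elements
print(description.text)  # <p>what paperwork is needed, and how to acquire it</p>
```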

As in the case of units, one could have twin approaches to the use of metadata. The preferred approach would be to have the location of the relationship/ontology file in the metadata, along with any other essential information. The other approach would be to have all directly relevant relationship information in the metadata. This would include at least the identifier (which is also the location) of the parent unit, but could also include equivalences and other relationships.

The immediate context information is problematic, as it belongs inseparably with the statement itself. It is not really practical to represent all of the immediate context of a statement item inside the title itself, because the syntax differs across the various NOS statements.

Hence we have the following.

Links to relationship information

RCD's metadata element seems reasonably adapted to the use of holding a pointer to the relational information. If RCDs are used for other purposes, the rest of the metadata can be kept in place. A way of doing this needs to be defined.

One idea for this would be to have a self-contained tag that represented a triple as closely as possible, with a URI provided for the relation of the containing resource to the relationship information. This might be done in a way similar to RDFa, possibly using the attributes about, rel and content.

Another, possibly more straightforward approach, would be to use just a few 'lines' of normal RDF. As RDF is part of the whole approach proposed here, it would make sense to stay with using RDF rather than exploring alternatives such as LOM.
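As a sketch of the "few lines of normal RDF" option, the function below builds a metadata element containing a single RDF/XML statement that points from the resource to its relationship information. The nos namespace (an example.org placeholder), the nos:relationships predicate and the /relationships URL are all hypothetical; only the rdf namespace is real.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NOS = "http://example.org/nos-terms#"  # hypothetical vocabulary, not an agreed one
ET.register_namespace("rdf", RDF)
ET.register_namespace("nos", NOS)


def metadata_with_pointer(resource_uri: str, relationships_uri: str) -> ET.Element:
    """Build a metadata element whose only content is a few lines of RDF/XML
    stating where the resource's relationship information can be found."""
    metadata = ET.Element("metadata")
    rdf = ET.SubElement(metadata, f"{{{RDF}}}RDF")
    description = ET.SubElement(rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": resource_uri})
    ET.SubElement(description, f"{{{NOS}}}relationships", {f"{{{RDF}}}resource": relationships_uri})
    return metadata


unit = "http://www.skillset.org/standards/units/E1"
print(ET.tostring(metadata_with_pointer(unit, unit + "/relationships"), encoding="unicode"))
```

Any other metadata already in an RCD used for other purposes could sit alongside this element untouched.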

Further relationships

If it were required to represent other relationships, they would need to be put within the Metadata element, perhaps also in RDF form.

HR-XML Competency

The overall structure of the HR-XML competency structure has been described in previous work for XCRI. It is a much more complex specification than RCD. The main top-level components of an HR-XML competency are:

An HR-XML competency can also include other competencies as sub-competencies.

CompetencyEvidence and CompetencyWeight can be ruled out as not relevant for NOS purposes: Any evidence, including CompetencyEvidence, is about what can be presented as evidence for a particular person's competency. That is not relevant to defining the competency itself. Nor is the weight to do with the essence of the competency, but rather with relative importance of different competencies. Some kind of weight may be relevant to assessment, but that is normally done separately from the main NOS documentation. In any case, a more flexible way to specify weight would be separate from the main NOS definitions.

“Required” is also a field that essentially relates to the context, and thus not applicable to the pure competency definition per se.

TaxonomyId is intended to allow an HR-XML Competency to refer to one that has an established place in an existing taxonomy. In principle, this could be useful to link a definition with equivalent definitions in other places. However, because other mechanisms will be needed for other specs, and because HR-XML is not the only competency spec used, it may in practice be more appropriate to represent such links along with other relationships.

That leaves three directly useful components: “name” appears equivalent to Title in RCD; “description” is a concept shared with RCD; CompetencyId looks much the same as RCD Identifier. However, in HR-XML, “name” and “description” are bound as attributes rather than elements. This means that it is inconvenient, and against the intention of the spec, to make them long. Having the name as an attribute is tolerable, but since the spec offers no other place for descriptive text, the problem is what to do with a long description. The descriptive and explanatory material in Skills for Health NOS units can be quite long: 4000 characters, counting only text and spaces (i.e. with no markup), would not be at all unusual.

Units

The issue with units is mainly with the problem, already identified above, in representing all the associated text. Given the impossibility of putting the associated text into a description attribute, this would have to be in a separate file. Within an open standards-based approach, a natural choice would be to have it in HTML, and we already have some NOS materials as HTML, so perhaps the existing HTML could simply be reused for this purpose.

However, the idea of using HTML for a separate file calls into question the value of this approach as a whole. If one is going to use HTML at all, then why not stay with something close to the current documentary representation of the NOS materials, and use XHTML with RDFa to embed relational information? This will be described below.

Statement items

The issue with statements would again be with representing long fields. Whereas statements are not nearly as long as the ancillary material in some units, neither are they necessarily short. Following the approach for RCD would still risk having long and unwieldy strings inside a tag attribute, and any formatting – even plain markup – would be either lost or escaped in an unnatural way. In effect, there would be no possibility of a straightforward description attribute for statements within HR-XML.

The link to the relationship information and other relationships

There is no metadata element in HR-XML Competency, and the only place which seems even moderately adapted to holding either a pointer to the relational information, or relational information itself, is called the User Area. It may be that the User Area could be used in a way equivalent to RCD's metadata, but with any construct of this kind, it would feel strained to use something which is intended for implementation-specific information in order to represent something intended to be standard.

Overall evaluation of HR-XML

Though on the surface, the information model of HR-XML seems to be sufficiently similar to IEEE RCD and IMS RDCEO to allow the representation of sufficient information for the purposes in hand, a deeper look strongly suggests that the current version (2.5) of HR-XML is inappropriate for representing NOS material.

RDF and Turtle

Simply put, the “Resource Description Framework (RDF) is a framework for representing information in the Web.” (RDF Concepts) It could be seen as a general-purpose knowledge representation language.

There is a general consensus that RDF is most easily understood in terms of diagrammatic graphs, and the triples that can be directly derived from those graphs. For RDF purposes, graph diagrams are represented simply using nodes and directed arcs. Each arc, with the nodes at each end of it, corresponds to an RDF triple of subject, predicate and object. The subject is the source node on the graph, the predicate is the arc between the nodes, and the object is the destination node on the graph.
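The correspondence between arcs and triples can be shown in a few lines. In this sketch the prefixed names (unit:E1, nos:hasStatement and so on) are hypothetical abbreviations standing in for full URIs.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# A small graph as an adjacency list: each source node maps to its outgoing
# arcs, given as (arc label, destination node) pairs.
graph: Dict[str, List[Tuple[str, str]]] = {
    "unit:E1": [("nos:hasStatement", "unit:E1/KSc"), ("nos:hasStatement", "unit:E1/PS3")],
    "unit:E1/KSc": [("nos:isStatementOf", "unit:E1")],
}


def graph_to_triples(g: Dict[str, List[Tuple[str, str]]]) -> List[Triple]:
    """Each arc, with the nodes at its two ends, becomes one triple:
    (source node = subject, arc = predicate, destination node = object)."""
    return [(subject, predicate, obj) for subject, arcs in g.items() for predicate, obj in arcs]


for triple in graph_to_triples(graph):
    print(triple)
```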

RDF can be written in various ways. There are several syntaxes for writing triples, of which Turtle (Terse RDF Triple Language) is perhaps the most commonly used. Others are N-Triples and Notation 3. Alternatively, RDF can be expressed as XML, which is then known as RDF/XML. The two kinds of approach, triples and XML, do not mix. RDF/XML is notoriously impenetrable, and tends to obscure the essential simplicity of RDF.

Predicates in RDF are necessarily RDF URI references. One of the ideas behind RDF is to allow liberal reuse of URI references, so here various common sources will be reviewed, before proposing a few new ones which will be needed as predicates for the purposes of representing the information dealt with here.

RDF for relationships

There are a number of options for representation using RDF. The first option would be for an RDF file to hold information about relationships between a unit and its statements together, or perhaps an even broader range of relationships, for example between several units. There would need to be a link in each unit and statement to that RDF resource. Within the RDF file there would be in effect one or two triples for each relevant relationship. If OWL were used to define the nature of the relationships, it would be possible to use only one triple to relate a unit and a statement, but more generally two could be used. Intermediate structures might also be defined in the same file.

Exact equivalences of the unit or statement to ones defined elsewhere could be represented using the owl:sameAs relationship, within the same piece of RDF; non-equivalence with owl:differentFrom.
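A sketch of the first option, writing a relationships file in Turtle, is given below. It emits two triples for each unit-statement relationship and, optionally, owl:sameAs lines for exact equivalences. The nos vocabulary and its two predicates are hypothetical; the owl namespace is the real one. Treating the E1 Skillset URL as the same as the ukstandards.org URL reuses the imaginary examples given earlier.

```python
from typing import Dict, List, Optional

NOS = "http://example.org/nos-terms#"  # hypothetical vocabulary
OWL = "http://www.w3.org/2002/07/owl#"


def relationships_turtle(unit_uri: str, statement_uris: List[str],
                         same_as: Optional[Dict[str, str]] = None) -> str:
    """Write a unit-statement relationships file in Turtle, with two triples
    per relationship, plus optional owl:sameAs equivalences."""
    lines = [f"@prefix nos: <{NOS}> .", f"@prefix owl: <{OWL}> .", ""]
    for s in statement_uris:
        # Were nos:isStatementOf declared owl:inverseOf nos:hasStatement,
        # an OWL reasoner could infer the second triple from the first.
        lines.append(f"<{unit_uri}> nos:hasStatement <{s}> .")
        lines.append(f"<{s}> nos:isStatementOf <{unit_uri}> .")
    for local, external in (same_as or {}).items():
        lines.append(f"<{local}> owl:sameAs <{external}> .")
    return "\n".join(lines) + "\n"


unit = "http://www.skillset.org/standards/units/E1"
ttl = relationships_turtle(
    unit,
    [unit + "/KSc", unit + "/PS3"],
    same_as={unit: "http://www.ukstandards.org/units/O52NE1"},
)
print(ttl)
```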

A second option would be to include the relationships relevant to the particular thing within the definition of that thing. Thus, the definition of a unit could include at least enough RDF to specify the relationships with its constituent statements, and the definition of a statement could include an RDF reference to its parent unit.

A challenge with the second option is in the representation of intermediate structures: in particular, the elements of a unit, where they occur. Which solution seems easy or natural depends greatly on the structures used. With a structure as open as HTML it can be envisaged easily. But many XML structures are ill-adapted for anything but straightforward use. Ideally, intermediate and superordinate structures should be defined separately.

Another consideration is whether relationships could be represented in a separate existing domain model or ontology, which might be beneficial in many ways. Such a domain model could naturally include the relationships described above: a file for the model could act as the file of relationships for NOS purposes. If there were such a domain model, or ontology, it would seem odd to have some parts of that kind of information represented with the unit or statement definitions rather than in the domain model file. Perhaps the answer lies just in not having fixed files, but in providing web services with different interfaces. One interface could return a file of a unit or statement including all its relevant relationships; a separate interface could return a file with only a link to a broader relationship or ontology, also available by a further web service.
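The two interfaces suggested might return responses shaped as in the following sketch. It is not tied to any web framework; the dictionaries stand in for the web database, and the MODEL location and the nos:hasStatement predicate are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

UNIT = "http://www.skillset.org/standards/units/E1"
KSC = UNIT + "/KSc"
MODEL = "http://www.skillset.org/standards/model"  # hypothetical domain-model location

# Hypothetical in-memory stand-ins for the web database.
DESCRIPTIONS: Dict[str, str] = {
    UNIT: "Identify and agree editing outcomes and process",
    KSC: "what paperwork is needed, and how to acquire it",
}
TRIPLES: List[Triple] = [(UNIT, "nos:hasStatement", KSC)]


def with_relationships(uri: str) -> dict:
    """Interface 1: the description together with every triple that mentions it."""
    related = [t for t in TRIPLES if uri in (t[0], t[2])]
    return {"uri": uri, "description": DESCRIPTIONS[uri], "relationships": related}


def with_model_link(uri: str) -> dict:
    """Interface 2: the description with only a link to the broader model."""
    return {"uri": uri, "description": DESCRIPTIONS[uri], "model": MODEL}
```

Both interfaces draw on the same stored data, so neither "file" has to be kept in step with the other by hand.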

External relationships in any approach could include equivalence, non-equivalence, satisfaction, contribution or support, and overlap,1 but could also be restricted to a subset of these. These are not explored further at this stage, but are mentioned again at the end of this document.

By way of review, the information that is envisaged here as being represented as part of relationship RDF is primarily:

The RDF can also hold information about other relationships:

RDFa

RDFa (also previously written “RDF/A”) is well described by the RDFa Primer.

RDFa does not define any HTML or XML elements, as it is intended to be compatible with XHTML 2.0, but instead works through attributes of XHTML elements. Defining new terms for use with RDFa does not require a new DTD or XML schema, but instead the same kind of resources as would be used to define new relationships for RDF.

In the RDFa Primer, examples are worked up from information that might already be placed on a web page, and the resulting triples are listed. An alternative approach, which may be just as useful, if not more so, in this context, would be to start from the graph and the triples that are required, and then to consider how to represent them in RDFa.

RDFa is extremely flexible, in that there is an unlimited number of XHTML documents which could use RDFa to encode the same RDF triples. However, the principles are fixed, and once decisions have been taken on what to represent as RDF from an XHTML file, the approach is straightforward.

What matters for Semantic Web purposes is not the precise XHTML that is used, but the triples that are extractable. Given the appropriate predicate or property URIs, to be defined below, a possible way of adding RDFa to NOS HTML files becomes clear, in terms of the triples that should be extractable. (It is to be recalled that discussion of the “files” here may refer to what is returned as a result of a web service call, and not necessarily files stored on a web server.)
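To make the idea of "extractable triples" concrete, the following sketch pulls triples out of a very small subset of RDFa using Python's standard `html.parser`: `about` sets the subject, `property` takes the element's text as a literal object, and `rel` with `href` gives a URI object. It is a hypothetical illustration, not a conformant RDFa processor (it ignores `content`, `resource`, datatypes and prefix expansion, and assumes well-formed input); a full extractor such as the W3C's RDFa Distiller would be needed in practice. The unit URI and the `ex:hasStatement` predicate are invented for the example.

```python
from html.parser import HTMLParser


class TinyRDFaParser(HTMLParser):
    """Collects (subject, predicate, object) triples from a small RDFa subset.

    Simplifying assumptions: well-formed input, predicates kept exactly as
    written (no prefix expansion), no @content, @resource or datatypes.
    """

    def __init__(self):
        super().__init__()
        self.stack = []    # open elements: (tag, subject, property or None, text index)
        self.text = []     # all text chunks seen so far
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        inherited = self.stack[-1][1] if self.stack else None
        subject = a.get("about", inherited)   # @about sets the subject for this subtree
        if subject and "rel" in a and "href" in a:
            self.triples.append((subject, a["rel"], a["href"]))
        # Remember where this element's text starts, for a possible @property literal.
        self.stack.append((tag, subject, a.get("property"), len(self.text)))

    def handle_endtag(self, tag):
        while self.stack:
            open_tag, subject, prop, start = self.stack.pop()
            if subject and prop:
                literal = "".join(self.text[start:]).strip()
                self.triples.append((subject, prop, literal))
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.text.append(data)


XHTML = """<div about="http://example.org/nos/unit/U1">
  <h1 property="dcterms:title">Example unit</h1>
  <p>Includes <a rel="ex:hasStatement" href="http://example.org/nos/statement/S1">statement S1</a>.</p>
</div>"""

parser = TinyRDFaParser()
parser.feed(XHTML)
parser.close()
for triple in parser.triples:
    print(triple)
```

Working in the direction suggested above, one would write down the two triples wanted first, and then check that the chosen XHTML yields exactly those.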

Topic Maps for relationships

Topic Maps is another commonly used knowledge representation format, with an expressive power very similar to that of RDF. A topic map would represent all the units, statement items and any other structures such as elements as topics in the topic map, and the relationships between them as associations. The topic map itself would be addressable by URL, and the topic map URL would be given in all the relevant unit and statement files. Within the topic map, each unit and statement item would be associated with the “published subject identifier” URL which would be identical to the URL defined for the object, as in the scheme suggested above; an http call to that URL would resolve into the XML file for that object. External equivalences and other relationships could also be represented in a topic map.
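The core of this arrangement can be mimicked in a few lines, to show units and statements becoming topics identified by their URLs, and relationships becoming typed associations with roles. This is a hypothetical in-memory sketch, not an implementation of the ISO Topic Maps data model or of XTM syntax; the association type and role names (`unit-contains-statement`, `whole`, `part`) are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Association:
    assoc_type: str   # the kind of relationship
    roles: tuple      # ((role type, subject identifier of topic), ...)


@dataclass
class TopicMap:
    # Each topic is represented by its published subject identifier, which
    # here is identical to the URL of the unit or statement it stands for.
    topics: set = field(default_factory=set)
    associations: list = field(default_factory=list)

    def relate(self, assoc_type, first, second, roles=("whole", "part")):
        """Record an association between two topics, creating them if needed."""
        self.topics.update({first, second})
        self.associations.append(
            Association(assoc_type, ((roles[0], first), (roles[1], second))))

    def associated_with(self, psi):
        """All associations in which the given topic plays any role."""
        return [a for a in self.associations
                if any(player == psi for _, player in a.roles)]


tm = TopicMap()
unit = "http://example.org/nos/unit/U1"
for s in ("http://example.org/nos/statement/S1", "http://example.org/nos/statement/S2"):
    tm.relate("unit-contains-statement", unit, s)
print(len(tm.topics), len(tm.associations))
```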

Note on XHTML versions

In March 2007 the W3C HTML working group announced its intention to bring out a new version of HTML, with XML and non-XML variants, by 2010. Currently the latest recommended version of XHTML is XHTML 1.0, while XHTML 1.1 is still in draft. XHTML 1.1 is supported on the W3C Validator, whereas the more speculative work on XHTML 2, many of whose ideas are used by RDFa, does not yet validate on that validator. It would appear that the draft XHTML 2 work is to be merged in with the currently restarted process. The new work is to include consideration of RDFa, so we can look forward to a validation service that will validate XHTML complete with RDFa additions, which the W3C Validator does not do at present. For further background, there is a useful article on XHTML 2 from June 2007.

However, in January 2008 the HTML Working Group published a new Working Draft of HTML 5, which states that HTML 5 is intended to take over from HTML 4 and XHTML 1, but not to take over the XHTML 2 proposals completely. There is a useful commentary on this choice on the xhtml.com web site. It is fully expected that RDFa will work within HTML 5.

Vocabulary sources and choices

RDF and RDFa need relationship definitions, and just as it is a good idea to use existing specifications where possible, it is also good to look for existing relationship definitions to reuse, rather than reinvent.

Dublin Core – DC

The Dublin Core DCMI Metadata Terms include many metadata relationships relevant to resources such as published works. The majority of these terms could well be used within an IEEE RCD definition or elsewhere, to indicate various normal properties of the definition, including author, dates, etc. These are not the focus of this present work, but could easily be used in passing in examples. For the current purposes of defining relationships, the following two terms may be of relevance.

The namespace prefix “dcterms:” is normally used in XML files to stand for the namespace http://purl.org/dc/terms/. In XML files, it would normally be introduced with the namespace declaration xmlns:dcterms="http://purl.org/dc/terms/".

RDF & RDFS

Within the RDF documentation there is a substantial list of relationships that are potentially relevant to this work. As with Dublin Core, there are just one or two of immediate relevance.

The prefix “rdfs:” is used for the namespace http://www.w3.org/2000/01/rdf-schema# and the prefix “rdf:” is used for http://www.w3.org/1999/02/22-rdf-syntax-ns#.

OWL

OWL, the Web Ontology Language, is conceptually built on top of RDF.

The prefix “owl:” is used for http://www.w3.org/2002/07/owl#
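Since the same prefixes recur across Dublin Core, RDF, RDFS and OWL, a small lookup is enough to expand a prefixed name into its full URI, and to write the matching namespace declarations. A minimal sketch: only the prefix-to-namespace mappings come from the text above, and the function names are illustrative.

```python
# The four namespace prefixes named in the text.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'owl:sameAs' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no known prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


def namespace_declarations(prefixes=PREFIXES):
    """The xmlns attributes that introduce the prefixes in an XML or XHTML file."""
    return " ".join(f'xmlns:{p}="{uri}"' for p, uri in sorted(prefixes.items()))


print(expand("owl:sameAs"))
print(namespace_declarations())
```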

The OWL reference material contains two relationships which could be relevant to this work.

This could be particularly useful in a setting where information is distributed, as is envisaged by the whole Semantic Web project. A single assertion by a domain A that two resources in domain A and domain B were “owl:sameAs” would not by itself be reliable. The other domain, B, can resolve the issue, however, by publishing information amounting to a triple with relationship either “owl:sameAs” or “owl:differentFrom” between the same two resources.
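The rule just described can be expressed as a simple check. This sketch assumes that each assertion is recorded together with the domain that published it (for instance, the host from which the RDF was retrieved), which plain RDF triples do not themselves record; the domain-matching rule and the example URIs are hypothetical.

```python
from urllib.parse import urlparse

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DIFFERENT_FROM = "http://www.w3.org/2002/07/owl#differentFrom"


def domain_of(uri):
    return urlparse(uri).hostname


def corroborated_same(assertions, a, b):
    """Accept 'a owl:sameAs b' only if the domains owning both resources
    assert it, and neither asserts owl:differentFrom.

    assertions: list of (publishing_domain, subject, predicate, object);
    owl:sameAs and owl:differentFrom are symmetric, so direction is ignored.
    """
    def asserted_by(publisher, predicate):
        return any(pub == publisher and p == predicate and {s, o} == {a, b}
                   for pub, s, p, o in assertions)

    owners = {domain_of(a), domain_of(b)}
    if any(asserted_by(d, DIFFERENT_FROM) for d in owners):
        return False
    return all(asserted_by(d, SAME_AS) for d in owners)


A = "http://a.example.org/unit/1"
B = "http://b.example.org/unit/9"
claims = [("a.example.org", A, SAME_AS, B)]
print(corroborated_same(claims, A, B))   # False: only domain A has said so
claims.append(("b.example.org", B, SAME_AS, A))
print(corroborated_same(claims, A, B))   # True: domain B confirms it
```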

Other vocabulary sources

While the above sources are widely used and respected, there are many others that may be less so. What matters in each case is that a sound set of terms is used consistently, as nearly as possible exclusive and exhaustive with respect to the requirements identified. For each needed vocabulary item (or predicate) one would be looking for:

Other sources that come to mind include:

Structures to support web services

As mentioned above, it is not necessarily envisaged that files are held in the formats outlined above, designed for interoperability. More likely, the information will be held in some kind of database: an http or other call will invoke a web service, delivering the information in the required form to the user's browser. This approach fits in well with the strategy for giving URL identifiers, as above, without any filename extension such as .xml or .html.
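One common way for a single extension-less URL to serve different forms is HTTP content negotiation on the `Accept` header. The text does not specify the mechanism, so this is only an assumed approach, sketched with a simplified header parser; the format names and the choice of XHTML as default are illustrative.

```python
# Media types this hypothetical service can produce, mapped to internal format names.
SUPPORTED = {
    "application/xhtml+xml": "xhtml",
    "text/html": "xhtml",
    "application/rdf+xml": "rdf",
    "application/xml": "xml",
}


def choose_format(accept_header, default="xhtml"):
    """Pick the output format for an extension-less URL from an HTTP Accept
    header, honouring q-values. Wildcards such as */* fall back to the default."""
    best, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > best_q:
            best, best_q = SUPPORTED[media_type], q
    return best or default


print(choose_format("application/rdf+xml"))
print(choose_format("text/html;q=0.5, application/rdf+xml;q=0.9"))
print(choose_format("*/*"))
```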

XML-like structures

To output XML, one needs to have available the information which is represented by the XML schema or DTD. Following through the discussion in this document, the necessary structures to represent the units and statement items are described in the section, “Representation of Structures: information model”. It is not difficult to imagine storing such information in ordinary database tables.
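As a rough illustration of "ordinary database tables", the sketch below stores units and their statements in SQLite and generates a small XML view of one unit. The table layout and the XML element names are hypothetical and much simplified; they are not a transcription of that information model, nor of any published schema such as IEEE RCD.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unit (uri TEXT PRIMARY KEY, code TEXT, title TEXT);
CREATE TABLE statement (uri TEXT PRIMARY KEY,
                        unit_uri TEXT REFERENCES unit(uri),
                        seq INTEGER, text TEXT);
""")
conn.execute("INSERT INTO unit VALUES (?, ?, ?)",
             ("http://example.org/nos/unit/U1", "U1", "Example unit"))
conn.executemany("INSERT INTO statement VALUES (?, ?, ?, ?)", [
    ("http://example.org/nos/statement/S1", "http://example.org/nos/unit/U1", 1, "First statement"),
    ("http://example.org/nos/statement/S2", "http://example.org/nos/unit/U1", 2, "Second statement"),
])


def unit_to_xml(conn, uri):
    """Build an XML representation of one unit and its ordered statements."""
    row = conn.execute("SELECT code, title FROM unit WHERE uri = ?", (uri,)).fetchone()
    if row is None:
        raise KeyError(uri)
    code, title = row
    unit = ET.Element("unit", {"uri": uri, "code": code})
    ET.SubElement(unit, "title").text = title
    for s_uri, text in conn.execute(
            "SELECT uri, text FROM statement WHERE unit_uri = ? ORDER BY seq", (uri,)):
        ET.SubElement(unit, "statement", {"uri": s_uri}).text = text
    return ET.tostring(unit, encoding="unicode")


print(unit_to_xml(conn, "http://example.org/nos/unit/U1"))
```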

RDF-like structures

As explained above, RDF is in essence composed of triples. These are very simple structures indeed, and pose no particular problems in their storage. A “triple store” is a common feature in Semantic Web discussions. It may be useful to have other fields stored alongside subject, predicate and object in a triple table. RDF in any form can be generated straightforwardly from the information underlying triples.
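A triple table with an extra column, and the generation of RDF from it, might look like this. The sketch assumes SQLite for storage and N-Triples as the output form, with a single illustrative extra column (`source`); the example URIs, including `U1-copy`, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triple (
        subject TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object TEXT NOT NULL,
        object_is_literal INTEGER NOT NULL,  -- 1 for a literal, 0 for a URI
        source TEXT                          -- extra field: who asserted it
    )""")
conn.executemany("INSERT INTO triple VALUES (?, ?, ?, ?, ?)", [
    ("http://example.org/nos/unit/U1", "http://purl.org/dc/terms/title",
     "Example unit", 1, "example.org"),
    ("http://example.org/nos/unit/U1", "http://www.w3.org/2002/07/owl#sameAs",
     "http://example.org/nos/unit/U1-copy", 0, "example.org"),
])


def literal(text):
    """Quote a plain literal, escaping as N-Triples requires."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(conn):
    """Serialise every stored triple as one N-Triples line; the extra
    columns are simply left out of the RDF."""
    rows = conn.execute("SELECT subject, predicate, object, object_is_literal "
                        "FROM triple ORDER BY rowid")
    lines = []
    for s, p, o, is_lit in rows:
        obj = literal(o) if is_lit else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(conn))
```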

XHTML with RDFa

Using XHTML complete with RDFa markup poses more challenges. One approach would be just to store the XHTML with RDFa as it is. If RDF were wanted, it could be extracted using an RDFa extractor such as the W3C's RDFa Distiller. It could also be arranged to have XML extractable from such an XHTML file, perhaps through XSLT. But this approach of storing a file as a whole is not very compatible with web services that can also generate other forms of output. What is really needed is a method of generating the XHTML with RDFa from more stable and reusable components which can be stored in an ordinary database.

The following section contains speculative but plausible ideas on graphical RDFa construction, explained with the help of a conceptual design for a user interface tool that constructs XHTML and RDF from HTML by means of a graphical graph editor. In essence, what is needed in the database are the RDF triples, the XHTML, and a way of referring to XHTML fragments from within triples.

Work required or envisaged

Graphical RDFa construction scenario

What is needed, to make the XHTML and RDFa approach work, is to locate and record the places in the XHTML relevant to the RDF. For constructing static files, this could be done by hand, as illustrated in the RDFa documentation. But looking forward, it would be much more appealing to have a graphical tool that could facilitate not only the writing of XHTML with RDFa embedded in it, but also the storage of information (within a database) from which the XHTML with RDFa can be generated, as an alternative to XML and RDF separately.

The scenario given here, of adding RDF to an XHTML file as RDFa, firstly illustrates the process of creating documentation that will serve the purposes discussed; secondly illustrates a possible tool; and thirdly clarifies what would be needed in a database from which to output XHTML and RDF as part of some web service software.

At the start of the scenario, we assume that we have:

For the purposes of this scenario, we will assume that there is just one expert in whom the knowledge is vested, who is driving the tool.

The expert starts by using their knowledge of the desired relationships (which may also be given in the materials themselves) to construct a graph, connecting nodes, which are the relevant competences and related items, with arcs, which represent relationships from the vocabulary. There are several windows in this tool, with different representations of the information, which can be hidden at will.

As the pointer moves over each object in any window, the same thing presented in different ways in different windows is highlighted. Moving over a triple highlights subject, predicate and object at the same time, in the other relevant windows.

To support this display, the software is holding:

With triples for which the object is a literal, where that literal exists within the XHTML, the triple does not hold the literal itself, but an internal address of the appropriate fragment identifier in the XHTML.
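The fragment-address idea can be illustrated as follows: triples whose object is a literal store only the `id` of an XHTML fragment, and the literal text is looked up when RDF is generated, so an edit to the XHTML is automatically reflected in the triples. The example XHTML, the `example.org` predicate URIs and the tuple representation of objects are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

XHTML = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1 id="u1-title">Example unit</h1>
<p id="s1">Check that the <span id="s1-key">equipment is safe</span> before use.</p>
</body></html>"""

# Each object is either ("uri", value) or ("fragment", id of an XHTML element).
TRIPLES = [
    ("http://example.org/nos/unit/U1", "http://purl.org/dc/terms/title",
     ("fragment", "u1-title")),
    ("http://example.org/nos/statement/S1", "http://example.org/nos/terms#keyPoint",
     ("fragment", "s1-key")),
    ("http://example.org/nos/unit/U1", "http://example.org/nos/terms#hasStatement",
     ("uri", "http://example.org/nos/statement/S1")),
]


def fragment_text(root, fragment_id):
    """Return all the text inside the element with the given id."""
    for element in root.iter():
        if element.get("id") == fragment_id:
            return "".join(element.itertext())
    raise KeyError(fragment_id)


def resolve(triples, xhtml):
    """Replace fragment addresses by the literal text they currently point to."""
    root = ET.fromstring(xhtml)
    resolved = []
    for subject, predicate, (kind, value) in triples:
        obj = fragment_text(root, value) if kind == "fragment" else value
        resolved.append((subject, predicate, obj))
    return resolved


for triple in resolve(TRIPLES, XHTML):
    print(triple)
```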

Details of the node resources and predicates can be fetched on demand across the web for any display function: these are cached for fast display.

To generate a new triple, a subject, a predicate and an object could be selected with mouse operations. A subject would be selected by clicking on the appropriate object icon in the window with the list of relevant resources, or on an existing graphical node. A predicate would be selected from the window with the list of allowed predicates. If they are literals, objects are chosen by sweeping out a fragment from the XHTML display. In basic sweeping mode, only existing addressable units of XHTML are highlighted (in the same way that text selection often prioritises word boundaries). If the text required as the literal object is not in an existing self-contained fragment, a “new selection” mode allows any portion of the display to be selected, and when it is selected, new <span> tags are added to the XHTML around the selection. Objects that are URIs are selected from the window giving the list of possible resources, which contains not only resources held in the competency definition system, but also link URIs. If a link URI occurs more than once in the XHTML, information needs to be saved on which link is to be used for that triple: the context of the link text may mean that one is more appropriate than the others. One can fill in this outline with added functions, menus, etc., but the functionality described covers what is essential for this tool's core operations.
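The "new selection" step, wrapping an arbitrary stretch of text in a fresh `<span>` with a unique `id` so that it becomes addressable, could work roughly as below. This sketch operates on the serialised XHTML string and assumes the selection lies within a single run of text (no markup inside it, and no character reference split by the selection); the `sel-N` id scheme is hypothetical.

```python
def wrap_selection(xhtml, start, end, existing_ids):
    """Wrap xhtml[start:end] in a new <span> with a fresh id.

    Returns the new XHTML and the id, which a triple can then use as the
    address of its literal object. Assumes the range is plain text inside
    one text run.
    """
    selected = xhtml[start:end]
    if not selected or "<" in selected or ">" in selected:
        raise ValueError("selection must be non-empty text without markup")
    # Starting inside a tag would mean the last '<' before start is unclosed.
    if xhtml.rfind("<", 0, start) > xhtml.rfind(">", 0, start):
        raise ValueError("selection starts inside a tag")
    n = 1
    while f"sel-{n}" in existing_ids:
        n += 1
    new_id = f"sel-{n}"
    existing_ids.add(new_id)
    wrapped = f'<span id="{new_id}">{selected}</span>'
    return xhtml[:start] + wrapped + xhtml[end:], new_id


page = '<p id="s1">Check that the equipment is safe before use.</p>'
begin = page.index("equipment")
new_page, span_id = wrap_selection(page, begin, begin + len("equipment is safe"), {"s1"})
print(span_id)
print(new_page)
```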

When the software is called upon by a web service to output something, what happens depends on the output required.

Defining new predicates for required relationships

New URIs need to be coined in order to define new relationship predicates, in cases where existing ones are not adequate. As an illustration for this report, an HTML file of definitions of the terms is given, with fragment identifiers serving the dual role of locating definition information about the new predicates, and acting as RDF URI identifiers. These identifiers serve equally for the standard RDF and for the RDFa. The filename would act as the namespace identifier. Later on, it should be possible to have something that looks more like a proper namespace identifier, in particular without a .html suffix, which can be used in a similar way to the Dublin Core namespaces.
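The dual role of a fragment identifier can be sketched in a few lines: the same string both names a predicate and locates its definition. The definitions-file URL and the `hasStatement` term are hypothetical, and the `<dt>`/`<dd>` markup is just one plausible way of laying out such a file.

```python
import html

# Hypothetical location of the HTML file of definitions; the file name acts
# as the namespace, and each definition's id as the local part.
DEFINITIONS_FILE = "http://example.org/nos/terms.html"


def predicate_uri(namespace, fragment_id):
    """The URI that both locates a definition and names the predicate."""
    return f"{namespace}#{fragment_id}"


def split_predicate(uri):
    """Recover (namespace, fragment id) from a predicate URI."""
    namespace, sep, fragment_id = uri.rpartition("#")
    if not sep:
        raise ValueError(f"no fragment identifier in {uri!r}")
    return namespace, fragment_id


def definition_entry(fragment_id, label, definition):
    """An entry for the definitions file; its id makes it the target of
    predicate_uri(DEFINITIONS_FILE, fragment_id)."""
    return (f'<dt id="{html.escape(fragment_id)}">{html.escape(label)}</dt>\n'
            f"<dd>{html.escape(definition)}</dd>")


uri = predicate_uri(DEFINITIONS_FILE, "hasStatement")
print(uri)
print(split_predicate(uri))
print(definition_entry("hasStatement", "has statement",
                       "Links a unit to one of the statements it contains."))
```

Moving later to a namespace without the .html suffix would change only the namespace part of each URI; the local names would stay the same, though any triples already published would still carry the old URIs.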

As the RDFa approach does not use RCD, HR-XML or similar, new RDF predicates (represented by RDF URIs) will have to cover the functions which would be served by tags in those specifications. In particular, new predicate URIs will be needed for any extra relationships, and class URIs would be useful to define the types of entity that we are talking about, and to link in with any ontology work.

Ontologies

Beyond the very limited ontological proposals contained immediately here, there is more to do along the lines of an “upper” ontology of the space of skill, competency and assessment. Other work is gently progressing towards this end.

The other aspect of ontology would be domain ontologies. As noted in the previous document, domain models or ontologies could play a very useful part in helping in the consistent construction and maintenance of NOSs and other web-based documentation relevant to sector needs.

Example files

The following files, including examples illustrating these approaches, are to be found (for the time being) through: http://www.simongrant.org/inst/clients/jisc/ioNW2/

Discussion of other possibly desirable structures

Representation of a group of units

If it were wanted to use plain RDF, it would not be too difficult to represent just about any knowledge structure, as that is what RDF was intended to do. The abstract structures can easily enough be represented in RDF, but still there is a need to inform people by displaying the information in a comprehensible form. There seems no reason against, and many reasons for, extending the RDFa approach to representing sets or groups of units, and the assessment structures that naturally go alongside them.

Representing a simple set of units is relatively easy. A 'file' for the set of units can refer to individual units in much the same way as a unit refers to statements. However, the usefulness of representing a particular set of units would depend on why anyone would want to teach, or to assess, or just record, that particular combination of units. Whatever sets of units are chosen for explicit representation, it is still necessary to represent each unit separately. Structure above the level of the unit would aid the navigation and discovery of individual units.

Of course, many authorities have done this kind of exercise, including the NHS, through their Knowledge and Skills Framework. The same kind of exercise can be seen in things like the National Curriculum. But one person's, or one body's, way of categorising and grouping skills will depend on their own purposes, as well as their more general outlook and conceptual framework. It is reasonable to assume that there is no one best general-purpose hierarchy or categorisation scheme for units.

Representation of unit substructure including unit elements

The units represented in the examples associated with this work do not have an element structure – that is, distinct subdivisions of a unit above the level of the statement. Elements of a unit (as explained in the previous document) have a name, a code, and contain some of the statements belonging to a unit. There would be no point at all in having a unit with one element: when elements are present there are two or more of them. Between two and four elements seems common among units that have elements at all. Where these elements exist, the question of how best to represent the structure within the approach described here depends on a number of factors.

Whether it is worth representing them as machine-processable structures, rather than just layout, depends firstly on whether there are any practical uses for the structures other than for the grouping together and display of statements in a unit. Are they used commonly independently of other elements in the same unit? The usage of the elements as written down could be in training or assessment; the usage of the skills represented by elements would be in actual occupations. Are there qualifications which require competence in one element but not in the other elements in the same unit? Are there instances where an occupation requires the skills represented by one element, without needing the skills represented by the other elements in the same unit?

If not, one may conclude that elements are essentially presentational, and would be better left out of the Semantic Web effort. On the other hand, if they are frequently or commonly used separately, there would be a case to be made out for regarding the elements as actually the effective units. The previous units would then be better seen as groupings of the old elements.

Looking outward: the wider relating of definitions

In this study, we have looked mainly at representing what is currently present in the NOS materials. Even within existing materials, there are several cases of various kinds where the same unit or the same statement is duplicated for one reason or another, and the suggestion has been to use the “owl:sameAs” predicate for this.

But this is only the beginning of the task of relating such definitions together. The practical point is that, from a human point of view, many people and organisations want, and will continue to want, their own representation and structuring of the skills that are significant to them, irrespective of how close they might be to those of others. The Semantic Web, of itself, does nothing to halt this, beyond toothless exhortations to use predicates that others have previously defined.

But when it comes to skills, it is actually in many people's interests to be able to relate various definitions together. It is only when this is done that the goal of XCRI and others will start to be achieved, by allowing a “common currency” covering learning outcomes, course prerequisites, person specifications, and professional development objectives, to name the most obvious. How then can we join up existing definitions, as well as the inevitable future ones that will continue to be coined?

For practical purposes, a concept of strict or formal equivalence is of limited value. What would be much more useful would be a concept of practical or operational equivalence, for whatever purposes people have in mind when they set out the definitions in the first place. Thus, when one educational institution decides to allow students in from other institutions, what matters is whether, in their judgement, the learning outcomes attained by students in the other institution are operationally equivalent to the ones attained in their own institution. It does not matter how those outcomes are formulated: what matters is simply the actual level of skill acquired. The same is true in many other settings, mutatis mutandis. The point is that ability is a reality, not simply a matter of formal definition.

How could this be done? As discussed, the predicates “owl:sameAs” and “owl:differentFrom” seem better adapted for the noting of duplication, rather than operational equivalence. In any case, as well as equivalence, there are a few other relational concepts which would be of great assistance in weaving a Semantic Web of competence. Here are some outline ideas.

Equivalence

The rationale for this has already been covered: an operational equivalence predicate would be the bedrock of interoperability of skill and competence definitions. To be solidly reliable, operational equivalence should be noted by both parties, otherwise there is no guarantee against unsupportable claims of equivalence from bodies of dubious reputation who try to claim more than is there. Evidence supporting the likelihood of spurious claims of equivalence is already visible in the many spam e-mails suggesting that getting a higher degree is fundamentally a matter of paying money. Those degrees are not equivalent, and an explicit disclaimer would be very useful, in the form of a predicate indicating operational non-equivalence.

Satisfaction

What usually happens with entry requirements is that a certain level of attainment is specified, but any higher level is also acceptable. Whereas it is relatively easy to note that a higher grade of the same qualification indicates more knowledge at least, it is not so obvious whether a foreign qualification covers the same ground as a native one. Leaving aside the ongoing initiatives to harmonise qualifications into international frameworks (whatever their merits) it remains up to any admission authority to make the judgement that a particular competency defined and assessed elsewhere satisfies some entry requirement couched in terms of their home-defined competences. One would expect this kind of relationship to be transitive: if A satisfies B, and B satisfies C, then one can assume that A satisfies C.
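If "satisfies" is treated as transitive, every implied relationship can be derived from the directly asserted ones. A minimal sketch computing that transitive closure by depth-first search; the pair representation is an assumption, and the names A, B and C simply mirror the text.

```python
from collections import defaultdict


def satisfies_closure(pairs):
    """Given direct (a, b) pairs meaning 'a satisfies b', return every pair
    implied by transitivity (including the direct ones)."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
    closure = set()
    for start in list(graph):
        stack, seen = list(graph[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(graph.get(node, ()))
    return closure


direct = [("A", "B"), ("B", "C")]
print(sorted(satisfies_closure(direct)))
```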

Contribution or support

Perhaps only slightly less useful would be the ability to note which lesser competences, either in one's own domain or another, support or contribute materially towards some greater competence. This would be useful for people wanting to know if some course or experience, promising limited competence outcomes, is useful in the context of their wider goals.

At present, the main way of indicating this kind of support or contribution is internally in a course or progression scheme, by explicitly defining the parts of a competence, in the same way that statements are part of a NOS unit. In that case, the constituent parts largely define the whole. But where assessment is more fluid, it may be that there are few if any clearly defined constituent parts. Noting what would contribute to the competence would still be useful.

Overlap

The loosest relationship to be proposed here is this one of overlap between competences. Little direct use could be made of such relationships, because there is no generally applicable measure of the degree of overlap between two competences that overlap. Any measure would depend on the weighting of the various parts of both entities.
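The dependence on weighting can be made concrete with one possible measure. The sketch below uses weighted Jaccard similarity, a measure swapped in purely for illustration (the text proposes none), over competences described as hypothetical components with weights; the same two competences give different overlap figures under different weightings.

```python
def weighted_overlap(a, b):
    """Weighted Jaccard similarity of two competences, each given as
    {component: weight}: sum of the smaller weights over sum of the larger."""
    components = set(a) | set(b)
    shared = sum(min(a.get(c, 0.0), b.get(c, 0.0)) for c in components)
    total = sum(max(a.get(c, 0.0), b.get(c, 0.0)) for c in components)
    return shared / total if total else 0.0


# The same components under two different (hypothetical) weightings.
equal_weights = weighted_overlap({"x": 1, "y": 1}, {"y": 1, "z": 1})
heavy_shared = weighted_overlap({"x": 1, "y": 3}, {"y": 3, "z": 1})
print(round(equal_weights, 3), round(heavy_shared, 3))
```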

However, noting overlap plays another role, which is to invite further analysis. Where two competences overlap, both responsible bodies could at some point decide to analyse their competences into smaller parts, in the hope that some of those parts would be directly equivalent, or clearly separate.

Use in practice

Using these new predicates, which remain to be formally defined or given URIs, would be a simple extension of the approaches suggested earlier. With the pure RDF approach to representing relationships, the extension is trivial. One simply adds RDF amounting to new triples using the new predicates.

For the XHTML/RDFa approach, it is not difficult to imagine that on a web page devoted to any particular skill definition, there could be sections, perhaps at the end, with links to other related definitions. RDFa attributes could easily be added to such links to amount to exactly the same as in the pure RDF approach.
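To show that a link in a "related definitions" section can carry the same triple as plain RDF, this sketch writes one relationship both ways. The `operationallyEquivalentTo` predicate and its namespace are hypothetical, since these predicates are not yet defined; the link uses a prefixed name in `rel`, declared with an `xmlns:ex` attribute, and both resource URIs are invented.

```python
from html import escape

# Hypothetical namespace for the not-yet-defined relationship predicates.
PREFIX, NAMESPACE = "ex", "http://example.org/nos/terms#"


def as_ntriple(subject, local_name, obj):
    """The relationship as one line of plain RDF (N-Triples)."""
    return f"<{subject}> <{NAMESPACE}{local_name}> <{obj}> ."


def as_rdfa_link(subject, local_name, obj, link_text):
    """The same relationship as an XHTML link with RDFa attributes:
    @about is the subject, @rel the predicate, @href the object."""
    return (f'<a xmlns:{PREFIX}="{NAMESPACE}" about="{escape(subject)}" '
            f'rel="{PREFIX}:{local_name}" href="{escape(obj)}">'
            f"{escape(link_text)}</a>")


mine = "http://example.org/nos/unit/U1"
theirs = "http://other.example.net/outcomes/LO7"
print(as_ntriple(mine, "operationallyEquivalentTo", theirs))
print(as_rdfa_link(mine, "operationallyEquivalentTo", theirs, "Equivalent outcome LO7"))
```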

Definitions of operational equivalence and other relationships would allow people to link together competencies, knowledge, skill, or for that matter any definitions, in their own ways. This would convert what is currently the semantic “firmament” into a real Semantic Web. (The word “firmament” is used poetically to indicate the stars, where the many individual points are only joined together in the imagination, not by visible lines.)

Conclusion

The suggested ideal architecture for representing competency definitions in the future involves the representation of textual materials primarily for human reading, together with Semantic Web structures for machine processing, particularly of any relationships where there is a likelihood of web services involving those relationships.

It is not difficult to imagine defining web services for this functionality, though it has not been worked out for this report.

In a system to manage the construction and the serving of such definitions in several forms, it is envisaged that a database would hold marked up XHTML, triples, and information to relate the triples to the appropriate places in the XHTML.

Ideally, the information relating just to one unit, or to one statement, could be served as an XHTML file with added RDFa. This would enable a single URL to be used for human reading and machine processing.

If this is not yet practical, a fallback position, relatively easy to implement, would involve an interoperability specification comprising XML files with stylesheet information for units and statements, alongside RDF files for the relationships.

The great strength of both of these approaches is that the scope of the relationship information can be extended at will, simply by adding extra relationship predicates to the list of allowed predicates.

Footnote

1 http://www.simongrant.org/pubs/TENComp/Manchester2007.html

page maintained by and © Simon Grant, edition 2008-06-05