©1990, 1995

2.3 A more detailed review of exemplary literature

The review here is not intended to be exhaustive. Rather, the object is to review major exponents of the different approaches to mental modelling.

2.3.1 Decomposition formalisms

The research detailed here corresponds to the class of formalisable models (§2.1.3) and also to those models that have the purpose of being tools for prediction and control.

Clearly, these authors are attempting to construct a formalism that fits with the structure of human cognition. If these formalisms were to provide the basis for a cognitive task analysis, it would be on the grounds that the analysis is in similar terms to the analysis which we might presume is done in the human who is performing the task.

2.3.1.1 The GOMS family of models

The analysis given by Card, Moran & Newell [20] is based around a model of task performance, referred to as GOMS, which stands for Goals, Operators, Methods and Selection rules. The Goals are those that the user is presumed to have in mind when doing the task---the authors see no need for argument here, since the user's presumed goals seem obviously reasonable. The Operators are at the opposite end of the analysis from the Goals: they are the elementary acts which the user performs (not to be confused with the human `operator' who controls a process). The Methods are the established means of achieving a Goal in terms of subgoals or elementary Operators. When there is more than one possible Method for the achievement of a certain Goal, a Selection rule is brought into play to choose between them. The acts are assumed to be sequential, and the authors explicitly rule out consideration of the possibilities of parallelism in actions. They also see the control structure in their model as a deliberate approximation, being more restricted in scope than the production system of Newell & Simon [91].
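The relationship between the four components can be sketched in a few lines of code. This is an illustrative data structure only, not the authors' own notation: the goal, method and operator names, the `context` dictionary and the particular selection rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Operator:
    """An elementary act at the chosen grain of analysis."""
    name: str


@dataclass(frozen=True)
class Goal:
    """A state the user is presumed to want to achieve."""
    name: str


@dataclass
class Method:
    """An established sequence of subgoals and Operators that achieves a Goal."""
    name: str
    steps: List[Union[Operator, Goal]]


# Hypothetical text-editing fragment: "locate-line" has two competing Methods.
METHODS: Dict[str, List[Method]] = {
    "locate-line": [
        Method("scroll", [Operator("press-linefeed")]),
        Method("search", [Operator("type-search-command"),
                          Operator("type-target-string")]),
    ],
    "delete-word": [
        Method("default", [Goal("locate-line"),
                           Operator("type-delete-command")]),
    ],
}


def select(goal: Goal, methods: List[Method], context: dict) -> Method:
    """Selection rule (hypothetical): search when the target is far away."""
    if goal.name == "locate-line":
        wanted = "search" if context["lines_away"] > 3 else "scroll"
        return next(m for m in methods if m.name == wanted)
    return methods[0]


def expand(goal: Goal, context: dict) -> List[Operator]:
    """Expand a Goal into a strictly sequential list of Operators."""
    operators: List[Operator] = []
    for step in select(goal, METHODS[goal.name], context).steps:
        if isinstance(step, Goal):
            operators.extend(expand(step, context))  # subgoal: recurse
        else:
            operators.append(step)
    return operators
```

Calling `expand(Goal("delete-word"), {"lines_away": 10})` yields the search Operators followed by the delete Operator. The flat, ordered result reflects the assumption of sequential action noted above.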

What is considered elementary for the purposes of the analysis is to some extent arbitrary. The authors give examples of different analyses of the same text-editing system using different classes of elements: possible elementary units range from the coarse grain of taking each editing task as an Operator, to the fine grain of the keystroke as the Operator. At each level, the times to perform each elementary unit operation should lie generally within a certain band [83]. For the coarse grain, this would be at least several seconds; for the fine grain keystroke units it would be a fraction of a second.

By calculating times for each Operator, from experiment, and allowing time necessary for mental workings, we could, in principle, use the GOMS model to make a prediction about the time necessary for a user to perform any particular task, for instance a benchmark task to be compared between different systems. Thus we could have a prediction of the relative practical speeds of various systems. The success of the predictions would depend on the validity of the simplifying assumptions for the studied task, including the choice of level or grain of analysis.

The area of application chosen for developing GOMS was text editing (see the quotation given above, § 2.1.3). It could be that much of what they say, and the approximations they use, are appropriate to text editing but not to quite different kinds of task such as riding a bicycle on the one hand, and controlling a complex chemical works on the other. It is difficult to imagine a GOMS analysis of bicycle riding. We could imagine Goals without too much difficulty: at the highest level, to get from A to B, and at an intermediate level, for example, to stay upright while travelling in a straight line. At the lowest level, comparable to the grain of keystrokes, the Operators may be to turn the handlebars left or right; but what could the Methods be to connect those Goals and Operators? For the different example of a complex plant, any procedures that one could explicitly teach a human operator could probably be expressed in the form of GOMS. But it is a well-known fact [100] that humans do not reach full competence by formal instruction alone. After substantial experience, they develop what is known as `process feel', which is often considered as beyond the reach of usual methods of analysis. How could this be represented in the GOMS model? And how would GOMS represent the many exceptions, anomalous states, emergency procedures, and unenvisaged states of the system? To be fair to the authors, they do not suggest that GOMS would be a suitable model for these various control tasks. However, it is possible to cast doubt even on the GOMS analysis of text editing. This will be done below, §2.6.1.

The GOMS model in general is a scientist's model, but a particular GOMS model of a task would be the analyst's model. It is not so clear, however, exactly what GOMS is trying to model. It is not intended to be an accurate model of the user's mental processes. Rather, it is an idealised model, which falls somewhere between a putative model of the user's mental processes and a model of the analyst's, or designer's, understanding of the task. Thus we can see that GOMS does not fit into the Norman/Streitz analysis (owner and object, § 2.2.1) very easily. But analysis in terms of purpose is much clearer: the function of the GOMS approach is to enable designers or analysts to produce models, the purpose of which is to provide the designer with comparisons of the performance of systems which have, at least in outline, been designed. It is limited in its accuracy by the simplifying assumptions which have been made, which also limit its applicability.

The Keystroke-Level Model (KLM)
The KLM, although often referred to separately, is a simplified special case of the GOMS family, namely an analysis with keystrokes as Operators. Card, Moran & Newell introduce it as a practical design tool (as opposed to GOMS in general?), to predict the time that an expert would take on a certain task, if performed without error, given: a task; the command language; parameters for the user's motor skill and the system's response; and the method. This means that the KLM does not predict the method---it has no Selection rules; and it only predicts execution time, not time spent in task acquisition. As they point out, task acquisition time is highly variable, depending on the type of task. For text editing, they assume 2--3 seconds, but for creative writing, it could be many orders of magnitude longer. (In the case of Ph.D. students, perhaps the most convenient unit in which to measure task acquisition time would be months, so the KLM would not be much use in predicting the time to write a thesis.) The treatment of mental operations is even simpler in the KLM than it is in GOMS. An average of 1.35 seconds of thinking time is allowed in various places, decided on by `heuristic' rules-of-thumb.
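The arithmetic that the KLM involves can be sketched as follows. Only the 1.35 seconds for the mental operator comes from the text above. The other operator times are placeholder values of the order usually quoted for the KLM and should be read as assumptions, as should the single-letter operator codes and the example methods.

```python
# Operator times in seconds. "M" is the 1.35 s thinking time given in the text;
# every other value is an assumed placeholder, not a measured parameter.
OPERATOR_TIMES = {
    "K": 0.20,  # press a key (assumed)
    "P": 1.10,  # point at a target with a pointing device (assumed)
    "H": 0.40,  # move hands between devices (assumed)
    "M": 1.35,  # mental preparation (from the text)
}


def predict_execution_time(method: str, response_time: float = 0.0) -> float:
    """Sum the operator times for an already chosen, error-free method.

    `method` is a string of operator codes such as "MKKKK". The method is an
    input: the model has no Selection rules and does not choose it.
    `response_time` is the total system response time in seconds.
    """
    return sum(OPERATOR_TIMES[code] for code in method) + response_time


# Two hypothetical methods for the same task, compared on execution time only.
keyboard_method = "MKKKK"   # think, then type a four-key command
pointing_method = "HMPK"    # reach for the pointing device, think, point, click
```

Comparing `predict_execution_time(keyboard_method)` with `predict_execution_time(pointing_method)` gives the kind of relative prediction the model is intended for. Task acquisition time appears nowhere in the sum.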

As with GOMS, the KLM deals with error-free operation. The explanation of errors is an important goal in the modelling of the control of complex systems.

It should be clear from this discussion that the KLM is suited to relatively routine tasks involving interaction with a computer system via a keyboard, and it is not suited to the analysis and design of the HCI aspect of the supervisory control of complex dynamic systems.

2.3.1.2 Command Language Grammar (CLG)

This is a development by Moran based on ideas in the GOMS model [83]. Moran recognises that the model that a designer has when designing a system will determine the one that the user will have to learn, and therefore it would be a good idea if the designer had a clear and consistent model in mind when designing. The purpose of Moran's CLG formalism is to ensure that the designer has a framework round which to design. The design is done (generally) on four levels: the Task Level, the Semantic Level, the Syntactic Level, and the Interaction Level. Moran gives guidelines, and an example (a simple mail system), for how to do this.

Moran identifies three important views of CLG. The linguistic view is that CLG articulates the structure of command language systems, and generates possible languages. This explains the G of CLG. It may be that the linguistic view is of most interest to HCI researchers and theorists.

In the psychological view, CLG models the user's knowledge of a system. This assumes that the user's knowledge is layered in the same way as CLG. Moran suggests ways of testing whether a CLG is like a user's knowledge, but he does not give ways of testing the detailed structure of the knowledge, nor whether the representation is the same in both user and CLG. He has a clear idea that it is the designer's model that should be able to be assimilated by the user, hence the designer should be careful to make, and present, a clear and coherent model. But concentrating on this idea neglects discussion of the real possibility that users may develop their own independent models of a system, which may not be describable in the same formalism. We might fairly say that viewed psychologically, CLG makes another speculative attempt to introduce a theory explanatory of aspects of human cognition. It is hard to identify any success in this endeavour above that which is achieved by other psychological theories, and other models mentioned in the study in hand.

In the design view, CLG helps the designer to generate and evaluate alternative designs, but does not claim to constitute a complete design methodology. It could aid the generation of designs by giving an ordered structure to the detailed design tasks, and Moran suggests that CLG could provide measures for comparing designs, addressing efficiency, optimality, memory load, errors and learning.

Sharratt [122, 123] describes an experiment in which CLG was used by a number of postgraduates to design a transport timetabling system. The study shows the wide variation in designs produced, and although it was not the object of the study, this shows that CLG does not effectively guide a designer to any standard optimal design. Sharratt evaluated the designs with three metrics, for complexity, optimality and error, which metrics were developed from Moran's own suggestions. Sharratt also gives ideas on extending CLG to help with its use in an iterative design process. Sharratt notes difficulties with the use of CLG, and if such difficulties arise even in areas of design such as an interactive mail system or a transport scheduling system, we have all the less reason for supposing that CLG could substantially help with the design of decision aids for complex systems.

We may raise one further question, on top of those already asked by Moran himself and by Sharratt. That is, do we know whether Moran's level structure is correct? Is this the right way to go about dividing the specification? In the absence of any arguments, we must say that we do not know. It may well be that there are other possible ways of analysing a system into different levels, and a different system of levels would give rise to possibly quite different analyses or designs. The most important point to be made here, however, is that it is presumptuous to suppose that the levels given here actually correspond in all or most cases with similar levels in human representations. We can imagine, or perhaps some have experience of, design based on other characterisations of level. Would designers all feel that one particular level model is natural and all the others artificial? So far as the psychological view goes, we can say no more than that CLG gives a guess at a possible level framework for human knowledge about a system, and that this guess has no more empirical support than alternative ones.

It may be that there is some definite and constant basis for human cognition in the context of interactive systems, in which case we need to find it and make it the basis of the models of the user's knowledge that underlie HCI tools. If there is no such basis, we would need to find a more flexible approach to modelling and to design than is offered by CLG, or similar techniques.

2.3.1.3 Cognitive Complexity Theory

The stated aim of Kieras & Polson's Cognitive Complexity Theory [66, 12] is to provide a framework for modelling the complexity of a system from the point of view of the user, out of which should come measures able to predict the `usability' of that system. We may speculate that if this were effective and efficient, it would be potentially useful to the systems designer, by providing comparisons between different possible designs. But it is not intended to provide performance predictions directly, in the way that GOMS and the KLM can.

The achievement of this aim is via the provision of two formalisms, intended to interact with each other: one for representing the user's knowledge of how to operate the device (the ``job-task representation''), and another for representing the device itself, in the form of a generalized transition network. The first formalism allows a designer to create a model of a user's understanding of a task in context. The second is a representation of the system from a technical point of view.

The formalism for the user's job-task representation is based on the concept of the production system (as in [91]). Although the authors cite GOMS as the precursor to their work, they do not follow GOMS in deliberately simplifying the full production system formalism, which is a general purpose architecture of great power. It may well be that the production system architecture is sufficiently powerful to simulate human cognitive capability in many fields (the aim of some AI research), but to model cognitive complexity for many purposes (such as modelling errors) there needs to be a correspondence between the model execution and the ways in which humans make use of their knowledge (a process model). They do not give any clear argument supporting the inherent suitability of the production system for modelling the appropriate aspects of the user's knowledge, nor do they offer any restrictions which bring it more in line with the capabilities of the human. They consider it sufficient justification to refer to other research which has used, or reviewed the use of, production system models.

Because there are no inherent restrictions in using a production system, and because of the potential variability of human ways of performing a task, formalising task knowledge in this way seems to be closer to an exercise in ingenuity (with relatively arbitrary results) than a means of faithfully reproducing actual human thought processes. Hence the doubt about whether the computed values of cognitive complexity bear any necessary relationship to the actual difficulties experienced by users.
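The point about arbitrariness can be illustrated with a toy production system. The sketch below is a generic condition-action interpreter, not Kieras & Polson's notation. The rule names, the working memory contents and the use of a simple rule count as a stand-in for a complexity measure are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Production(NamedTuple):
    name: str
    condition: FrozenSet[str]  # fires when all of these are in working memory
    remove: FrozenSet[str]     # elements deleted from working memory
    add: FrozenSet[str]        # elements added to working memory
    action: str                # overt action, or "" for a purely mental step


def run(rules: List[Production], memory: Iterable[str], max_cycles: int = 20) -> List[str]:
    """Recognise-act cycle: fire the first matching rule until none matches."""
    working = set(memory)
    actions: List[str] = []
    for _ in range(max_cycles):
        fired = next((r for r in rules if r.condition <= working), None)
        if fired is None:
            break
        working = (working - fired.remove) | fired.add
        if fired.action:
            actions.append(fired.action)
    return actions


def f(*items: str) -> FrozenSet[str]:
    return frozenset(items)


# Two hypothetical analyses of the same "delete a word" task.
FINE = [
    Production("start", f("goal:delete-word"), f("goal:delete-word"), f("step:position"), ""),
    Production("position", f("step:position"), f("step:position"), f("step:delete"), "move-cursor"),
    Production("delete", f("step:delete"), f("step:delete"), f(), "press-delete"),
]
COARSE = [
    Production("start", f("goal:delete-word"), f("goal:delete-word"), f("step:delete"), "move-cursor"),
    Production("delete", f("step:delete"), f("step:delete"), f(), "press-delete"),
]
```

Both rule sets produce the same sequence of overt actions from the same starting goal, yet they contain different numbers of rules. Any measure computed from the rule set therefore depends on which formalisation the analyst happened to write.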

These authors also chose text processing as a field of study. Although the task of writing is very complex, text processing (i.e., conversion of words in the mind to words on the machine) does not take very long to learn, and does not afford a great deal of variability in the strategies that one can adopt. One can imagine a text processor with a very limited repertoire, where an extensive logical structure is a necessary consequence of the structure of the computer program running the text processor. In this case, it is also easy to imagine a full analysis of error-free text processing skill, that would not vary between different people. Hence it would be plausible to put forward a production system account of the skill. But even if a production system account is valid for such a well-defined task, we cannot extrapolate this validity to complex tasks such as industrial process control.

2.3.1.4 Task-Action Grammars (TAG)

The idea of a grammar model of a task is that it is possible to represent tasks, or complex commands, in terms of the basic underlying actions necessary to perform those tasks, and this can be done in a way that parallels the building up of sentences from words, or the building up of programs from elementary commands in a programming language. Grammars which describe the structure of programming languages have often been formalised in Backus-Naur Form (BNF), which is a relatively straightforward formalism.

When a grammatical model has been made of a `language', two measures are of interest: the total number of rules in the grammar is some measure of complexity of the language, and therefore potentially related to the difficulty of learning it; and the number of rules employed in the construction (or analysis) of a particular statement, command, or whatever, is a measure of the difficulty of executing that command, and therefore potentially related to the difficulty of comprehending it.
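Both measures are easy to compute once a grammar is written down. The sketch below uses a tiny invented command grammar (the symbols and commands are hypothetical) and a naive top-down recogniser that counts rule applications. It is meant only to show what is being counted, not to reproduce any published analysis.

```python
from typing import Dict, List, Optional, Tuple

# A BNF-like grammar: each non-terminal maps to a list of alternatives.
# Any symbol that is not a key of the dictionary is treated as a terminal.
GRAMMAR: Dict[str, List[List[str]]] = {
    "command": [["verb", "object"]],
    "verb": [["delete"], ["copy"]],
    "object": [["word"], ["line"]],
}


def total_rules(grammar: Dict[str, List[List[str]]]) -> int:
    """First measure: the number of rules in the whole grammar."""
    return sum(len(alternatives) for alternatives in grammar.values())


def derive(symbol: str, tokens: List[str],
           grammar: Dict[str, List[List[str]]]) -> Optional[Tuple[int, int]]:
    """Second measure: rules applied in deriving a prefix of `tokens`.

    Returns (rules_applied, tokens_consumed), or None if no alternative fits.
    The recogniser takes the first alternative that succeeds and does not
    backtrack into it later, which is sufficient for this small grammar.
    """
    if symbol not in grammar:  # terminal symbol
        return (0, 1) if tokens and tokens[0] == symbol else None
    for alternative in grammar[symbol]:
        applied, consumed = 1, 0  # one rule application for this alternative
        for part in alternative:
            result = derive(part, tokens[consumed:], grammar)
            if result is None:
                break
            applied += result[0]
            consumed += result[1]
        else:
            return applied, consumed
    return None
```

Here `total_rules(GRAMMAR)` counts five rules, and `derive("command", ["delete", "word"], GRAMMAR)` reports that three of them are used in constructing that one command.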

Green, Schiele & Payne [45] argue convincingly that representing task languages in terms of BNF (as in [108]) must miss something of human comprehension, because in many cases the measures of complexity in BNF do not tally with experimentally derived or intuitive human ideas of complexity. Payne & Green's Task-Action Grammars [96] set out to provide a formalism in terms of which simple measures correspond more closely with actual psychological complexity.

The formalism is a scientist's model of a possible way in which people might represent tasks. An instantiation of the model would presumably be a designer's, or analyst's, model of how a typical user would structure a task internally. The designer makes this model by considering important abstract `features' of the task (i.e., dimensions or attributes on which there are distinct differences between commands), which are assumed to be apparent to a user, and then formalising the task language to take account of those features, by enabling rules to be written in terms of the features, as well as in terms of instantiated feature values. The simplest example of this that Payne & Green give concerns cursor movement. Instead of having to have independent rules for moving a cursor forwards or backwards by different amounts, TAG allows the rule

Task[Direction, Unit] --> symbol[Direction] + letter[Unit]
provided that the available actions support such a generalisation. An important point made by these authors is that the consistency that allows such schematic rules makes command languages easier to understand and learn, compared to languages whose inconsistency does not allow the formation of such general rules.
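The economy offered by such a schematic rule can be shown by expanding it into the simple rules it stands for. In the sketch below the feature values and the mapping from feature values to keys are hypothetical, chosen only so that the generalisation holds.

```python
from itertools import product
from typing import Dict, List, Tuple

# Features of the task, with their possible values (hypothetical).
FEATURES: Dict[str, List[str]] = {
    "Direction": ["forward", "backward"],
    "Unit": ["char", "word"],
}

# How each feature value is realised in the action language (hypothetical).
SYMBOL = {"forward": "ctrl", "backward": "meta"}
LETTER = {"char": "C", "word": "W"}


def expand_schema(features: Dict[str, List[str]]) -> Dict[Tuple[str, str], str]:
    """Instantiate Task[Direction, Unit] --> symbol[Direction] + letter[Unit]
    for every combination of feature values."""
    return {
        (direction, unit): f"{SYMBOL[direction]}-{LETTER[unit]}"
        for direction, unit in product(features["Direction"], features["Unit"])
    }
```

With these values, one schematic rule stands for four simple rules. An inconsistent command language, in which some command did not follow the `symbol` plus `letter` pattern, could not be generated from a single rule in this way and would need separate rules for its exceptions.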

Reisner [109] has recently pointed out that Payne & Green's notion of consistency is not as straightforward as perhaps they would like. There is usually room for various views of consistency, and what matters in system design is whether or not the designer and the user share the same view of consistency. Only then will a task's consistency make learning easier.

This is a point to which empirical data would be relevant, in that if it could be shown that a large majority of users shared one particular view of consistency in a particular task, then designers should design to it and analysts analyse in terms of it. But Payne & Green themselves admit a lack of such data on perceived task structure (perceived by users or operators, that is), and therefore the way in which a task is formalised is up to the judgement of the analyst. Here again we have the problem that the formalism is of such a power as to permit varying solutions. How are we to know whether a particular formalisation is the one that most closely corresponds to a practical human view of the task? Only, it would seem, by experiment, and that would make it impossible to use TAG with any confidence for a system that had not at least had a prototype built.

An alternative view would be that there is some ideal view of consistency, in which the grammar would represent the task in the most compact form. (Compactness is also a guiding principle in many machine learning studies. See, for example, Muggleton [87].) Users could then be encouraged to adopt this view. The difficulty for this idealist notion is that there is no general method of proving that any particular formalisation is the most compact possible. Payne has accepted this as a valid critique of TAG [personal communication].

It is to be expected, as for other formalisms, that for inherently well-structured tasks, the representation is fairly self-evident, and therefore a guess at an appropriate formalisation may well be near enough to get reasonable results. But as previously, there is plenty of room to doubt whether using this method in modelling the control of complex, dynamic systems could be helpful to the systems designer.

2.3.1.5 General points about formalisms

No doubt it is already clear that formalisms such as those described above may work reasonably in the analysis of simpler systems, and for systems where the tasks have no latitude for variability. Currently there are no generally known examples of such analysis being performed on a complex system. We may perhaps take this as an indication that the formalisms reviewed are not well suited to the analysis of complex systems, but we cannot be certain about this until either someone does perform one of these analyses of a complex system, or a substantially different type of analysis is shown to be superior for complex systems.

2.3.2 Models of cognition

The next grouping of literature corresponds to the more general mental models of §2.1.5, and in terms of purpose, to those models intended to aid understanding. In this group, there are theories and models of human cognition, which specify the units and structure of supposed human cognitive methods and resources. This could be characterised as the analysis of (internal) cognitive tasks, which does not, of itself, specify how to analyse and map external tasks into these internal units, but obviously asks to be complemented by such a mapping.

Providing a model cognitive architecture does not in itself present a technique for cognitive task analysis, because there are other necessary aspects of a mental model. But if one wishes to perform such an analysis, a cognitive model will define the terms in which the task has to be analysed. A useful model here would be one which dealt with the aspects of cognition most relevant to interacting with complex systems.

2.3.2.1 The Model Human Processor (MHP)

The first `framework' architecture to consider is the MHP of Card, Moran & Newell [20]. This is not closely related to GOMS or the KLM, despite appearing in the same book. The authors attempt to bring together many results from cognitive psychology which they see as relevant to HCI design. The mind is seen as being made up of memories and processors, which have parameters for (memory) storage capacity, decay time and code type, and (processor) cycle time. They give estimates for general values of these parameters. Thrown in with these values are a number of other general principles (e.g., Fitts's Law, and the power law of practice). Taken together, what these parameters tell us is clearly more relevant to short-term simple tasks (and laboratory experiments) than to longer-term subtler cognitive abilities involving problem-solving or decision-making.
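Two of the general principles mentioned can be written as short functions. The forms used here are commonly quoted statements of Fitts's Law and the power law of practice. The default parameter values are assumptions made for illustration, not figures taken from the text above.

```python
import math


def fitts_time(distance: float, size: float, slope: float = 0.1) -> float:
    """Fitts's Law in the form T = I_M * log2(2D / S).

    `distance` (D) and `size` (S) are in the same units; `slope` (I_M) is in
    seconds per bit, and its default value is an assumed placeholder.
    """
    return slope * math.log2(2 * distance / size)


def practice_time(first_trial_time: float, trial: int, alpha: float = 0.4) -> float:
    """Power law of practice: T_n = T_1 * n ** (-alpha).

    The default `alpha` is an assumed placeholder.
    """
    return first_trial_time * trial ** (-alpha)
```

Both functions return times for simple, repeatable acts. Nothing in them refers to the content of a decision or a problem, which is consistent with the observation that the parameters bear mainly on short-term simple tasks.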

Card, Moran & Newell give several examples of the kind of question which could be answered with the help of the MHP. The questions mostly are about how quickly things can be done (reading, pushing buttons or keys, etc.) in different circumstances. There are no examples of applying the MHP to problem-solving or decision-making.

What the MHP does in terms of task analysis is essentially to set bounds on what is feasible for a human to do (cognitively). Thus a matching analysis would have to show what items were in what memories at what different times, and to take account of the times required for motor actions. What the MHP does not do is to set a limit on depth or complexity of information processing, nor on other values which may be of interest in analysing complex control tasks.

2.3.2.2 Programmable User Models (PUMs)

The PUMs idea described by Young, Green & Simon [151] potentially takes the modelling of cognitive processes much further, though its implementation is still thought to be several years in the future. That idea is to represent the user of a system by a program, interacting with another program that represents the system. The purpose of a PUM is to benefit a designer in two ways: firstly, by making the designer think rigorously about how the system requires the user to interact; secondly, if such a program were ever constructed, by enabling predictions of users' competence and performance on a given system in the same kind of way as other analytical methods, but with improved accuracy because of the closer matching of mental processes by PUMs than by simpler formalisms.

What language would the program be written in? How would knowledge be represented and manipulated in that language? They give no definitive answers. The cited paper and a paper on a related concept by Runciman & Hammond [117] suggest progress towards answers by considering fundamental facts about human cognition: e.g., that working memory is limited (so you can't just reference global variables from anywhere at any time), and that there is no default sequential flow of control in humans, as there is in many programming languages.
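The working memory constraint gives a flavour of what a programmable user model might enforce. The sketch below is purely illustrative and does not follow any architecture proposed in the cited papers: the capacity value and the policy of displacing the oldest item are both assumptions.

```python
from collections import deque
from typing import List


class WorkingMemory:
    """A bounded store: when it is full, the oldest item is displaced."""

    def __init__(self, capacity: int = 4):  # capacity is an assumed value
        self._items = deque(maxlen=capacity)

    def store(self, item: str) -> None:
        self._items.append(item)

    def recall(self, item: str) -> bool:
        return item in self._items


def items_lost(items_to_hold: List[str], capacity: int) -> List[str]:
    """Return the items a simulated user could no longer recall after being
    asked to hold `items_to_hold`, in order, during a dialogue."""
    memory = WorkingMemory(capacity)
    for item in items_to_hold:
        memory.store(item)
    return [item for item in items_to_hold if not memory.recall(item)]
```

A designer writing the user's side of a dialogue against such a store would be forced to notice when the design requires more to be held in mind than the store allows, which is the first of the two benefits claimed for the approach.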

Because there is not yet any explicitly decided architecture for PUMs, it is easier to imagine the approach being used in the course of design, rather than analysis. But the potential is there to provide a detailed language and knowledge structure which would constrain the analysis of a task more closely and helpfully than the MHP. In the meanwhile, using the PUMs concept in analysis could be a way of testing the plausibility of hypotheses about the mechanisms of task-related cognition: one could attempt to fit a task into the constraints selected, and if the performance was similar to that of a human, that could be said to corroborate those constraints.

The concern with implementing a system which represents the main features of human cognition is shared by a number of people not directly concerned with HCI, on the borderlines of AI and psychology. Young, Green & Simon cite SOAR [72] as the closest in flavour to their desired representation of the human processor: both they and Runciman & Hammond also mention Anderson (ACT*) [5] and others. Some difficulties with these for modelling in HCI will be discussed below.

2.3.2.3 ACT*

Anderson's ACT* [5] is a much more specific implemented architecture that aims to model human cognition generally. It deals with three kinds of cognitive units: temporal strings; spatial images; and abstract propositions. Anderson does not discuss in detail the cognitive processes that convert sensory experience into these units.

There are three distinct forms of memory dealing with cognitive units: working memory, which stores directly accessible knowledge temporarily; declarative memory, which is the long-term store of facts, in the form of a tangled hierarchy of cognitive units; and production memory, which represents procedural knowledge in the form of condition-action pairs.

Factual learning is said to be a process of copying cognitive units from working memory to declarative memory. This is seen as quite a different process from procedural learning, which happens much more slowly, because of the danger of very major changes in cognitive processes, which may be produced by the addition of just one production rule.

Procedural learning is the construction of new productions, which then compete for being used on the same terms as the established productions. ACT* allows procedural learning only as a result of performing a skill. First, general instructions are followed (using established general purpose productions), then those instructions are compiled into new productions, which are then tuned by further experience in use. Compilation happens through two mechanisms: first a string of productions is composed into a single production (a mechanism called ``composition''); then this composite production is proceduralised by building in the information which had previously been retrieved from declarative memory.
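The two compilation mechanisms can be sketched on a toy rule representation. This is a heavily simplified illustration, not Anderson's implementation: the string-based conditions, the `retrieve:` prefix marking a declarative memory lookup, the `?key` variable and the example rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Production:
    conditions: FrozenSet[str]  # tests on working memory
    adds: FrozenSet[str]        # elements deposited in working memory
    actions: Tuple[str, ...]    # overt actions, in order


def compose(first: Production, second: Production) -> Production:
    """Composition: collapse two productions that fired in sequence into one.
    Conditions of the second that the first itself supplied are dropped."""
    return Production(
        conditions=first.conditions | (second.conditions - first.adds),
        adds=first.adds | second.adds,
        actions=first.actions + second.actions,
    )


def proceduralise(rule: Production, retrieved: Dict[str, str]) -> Production:
    """Proceduralisation: build previously retrieved facts into the rule, so
    that it no longer needs to consult declarative memory."""
    def bind(text: str) -> str:
        for variable, value in retrieved.items():
            text = text.replace(variable, value)
        return text

    return Production(
        conditions=frozenset(bind(c) for c in rule.conditions
                             if not c.startswith("retrieve:")),
        adds=frozenset(bind(a) for a in rule.adds),
        actions=tuple(bind(a) for a in rule.actions),
    )


# Hypothetical example: recalling which key saves a file, then pressing it.
recall_key = Production(frozenset({"goal:save", "retrieve:save-key=?key"}),
                        frozenset({"ready:?key"}), ())
press_key = Production(frozenset({"ready:?key"}), frozenset(), ("press ?key",))

compiled = proceduralise(compose(recall_key, press_key), {"?key": "F2"})
```

The compiled rule responds directly to the goal with the specific action, having lost both the intermediate step and the memory retrieval. The sketch presupposes that the two general rules already exist.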

Anderson illustrates the operation of learning by ACT* simulating the acquisition of early language, specifically syntax. Many assumptions are made for this purpose, including the assumption that words and parts of words (morphemes) are known and recognised. More important, the meaning of complete utterances is assumed to be understood.

From this brief description, we may see that ACT* is designed to emulate human learning, among other things. However, it is very difficult to see how, for the kind of complex systems that we are considering, ACT*'s mechanism for procedural learning could work. Where would the initial `general' productions come from, which would be needed to guide the initial experience?

The question of whether ACT* serves to guide a task analysis in a useful cognitive way is a separate question. Since the knowledge which results in action in ACT* is implemented in a production system, it would make sense to analyse tasks in terms of production rules, and there is no reason to suppose this is a difficulty, since this is the formalism adopted by Kieras & Polson [66]. The problem is not that production rules are difficult to create, but rather that it is possible in general to analyse a task in terms of production rules in many widely differing ways---just as it is in general possible to find many algorithms to solve a particular problem. The ACT* model does not help to focus an approach to analysis, but rather leaves this aspect of the analysis open. ACT* does not seem to pose the right questions, or offer useful guidance, for the practical problem of analysing a task in terms that are specifically matched to actual human cognition.

SOAR [72] is a general-purpose architecture that is less directly concerned with modelling human cognition than ACT*. SOAR cannot create representations, or interact with the external task environment. While it may well be another promising model of the ideal performance of a single task, what it cannot do includes some of the crucial aspects of the operator's behaviour in either an active learning situation or (which is similar) an unfamiliar emergency, one not encountered before or not well remembered.

What both SOAR and ACT* need, to complement them in providing guidance for task analysis, is (at least) a way of finding those production rules which best represent a particular human approach to a particular task. Analysing a task in terms of goals and rule structures is not a strong enough constraint to specify a method of cognitive task analysis.

2.3.2.4 Interacting Cognitive Subsystems

Barnard [9] gives a theory of cognitive resources which he claims is applicable to human-computer interaction. He wishes to deal explicitly with the various different representations of information and the different models that are appropriate in different circumstances, and with the interaction between those models.

His theory, ``Interacting Cognitive Subsystems'' (ICS), postulates a model of cognition as a number of cognitive subsystems joined by a general data network. In his main diagram of the architecture, Barnard gives two sensory subsystems, acoustic and visual; four representational subsystems, morphonolexical, propositional, implicational and object; and two effector subsystems, articulatory and limb. Each subsystem has its own record structure, and methods for copying records both within the subsystem and across to other subsystems.

ICS is used to explain a number of features of cognition and experimental results, particularly concerning novice users of a computer system engaging in dialogue via the keyboard. Barnard's intention is that the ICS model should provide the basis for describing principles of the operation of the human information processing system that could be tested empirically for generality. If general principles were indeed found in this way, we would both have gained knowledge applicable to the field of HCI, and would have demonstrated the usefulness of the ICS model.

Barnard clearly states that his approach starts with the assumption that ``perception, cognition and action can usefully be analysed in terms of well-defined and discrete information processing modules''. What is not entirely clear is whether Barnard is committed to the particular subsystems that he mentions in this paper. One could certainly imagine a similar model based on different subsystems, or subsystems interacting in a different way. Moreover, it is highly plausible to suppose that individuals use different representational systems in different ways. Investigating the statistical features of experimental results on a number of subjects together, in the way that Barnard reports, is not designed to show up any differences between individuals in this respect.

The concept of interacting cognitive subsystems is not dependent on a particular theory about any of the relationships between the subsystems. Indeed, Barnard gives few indications about how information is changed from one representation to another. For example, how is iconic visual information converted to propositional form? Interestingly, it could be just this sort of conversion of information from iconic to logical form that plays a crucial role in many dynamic control skills, particularly controlling vehicles. The acquisition of such a conversion `routine' may be a key aspect of learning the skill.

Another unclear feature of the ICS model concerns the general data network. In computer systems, communication is made possible by agreement on common formats for data transfer and communication: networks can have elaborate standards to define these formats. According to Barnard, it is the individual subsystems that recode information in formats ready for other subsystems. Thus recoding is seen as dependent on the sender, rather than the receiver. This means that the supposed data network has to be able to convey information in any of the constituent representations. How it could do this is not discussed. This is disappointing, since the question would demand major discussion if the problem were seen in terms of the computer analogy that the model invokes.
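To make the point about sender-side recoding concrete, the following is a minimal Python sketch of the architecture as described above. Only the subsystem names come from Barnard's diagram; the class names, the `recoders` table and the tuple used as a record format are hypothetical, introduced purely for illustration, and nothing here is claimed to be Barnard's own formulation.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    name: str
    records: list = field(default_factory=list)   # the subsystem's own record store
    recoders: dict = field(default_factory=dict)  # target name -> recoding function (assumed)

    def copy_within(self, record):
        """Copy a record into this subsystem's own store."""
        self.records.append(record)

    def copy_across(self, record, target, network):
        """Recode a record for the target subsystem (sender-side), then send it."""
        recode = self.recoders[target]
        network.deliver(target, recode(record))


class DataNetwork:
    """General data network: it only routes records that are already recoded."""

    def __init__(self, subsystems):
        self.subsystems = {s.name: s for s in subsystems}

    def deliver(self, target, record):
        self.subsystems[target].copy_within(record)


# Hypothetical usage: the visual subsystem recodes a record for the object subsystem.
visual = Subsystem("visual", recoders={"object": lambda r: ("object-code", r)})
obj = Subsystem("object")
network = DataNetwork([visual, obj])

visual.copy_within("edge pattern")
visual.copy_across("edge pattern", "object", network)
```

In this sketch `DataNetwork.deliver` places no constraint on what a record looks like, which is exactly the difficulty raised above: a network that must carry every constituent representation has, in effect, no common format of its own.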

Barnard regards his theoretical framework as in some respects an enhanced model human information processor of the type proposed by Card, Moran & Newell (see above, §2.3.2.1). But whereas those authors largely report the separate parameters of cognitive abilities, Barnard is making assumptions about the structure of cognition that have not been explicitly verified. The present study does not yet see ICS as being in the same league as GOMS. A judgement on whether his theory is applicable to human-computer interaction in complex systems would have to await an attempted detailed analysis of a practical complex task using his framework.

2.3.2.5 General points about models of cognition

The models reviewed here are more relevant to well-understood, more straightforward tasks than to complex tasks involving complex systems. Their starting point has generally been within the scope of cognitive psychology, where there is a dominance of experiments designed to investigate particular, identified parts of human cognitive capabilities, rather than the way in which these abilities are coordinated to produce skilled control. The models which address the coordination, though very interesting, cannot claim a strong empirical base for the way in which they model coordination itself, however much they may rightly claim to base themselves on empirical research of the individual abilities. The core of the problem seems to be firmly tied to complexity. In §1.3.2 a case is made out for defining complexity in terms of the variety of practical strategies, hence in a complex task one would expect variation over time or between individuals. The methodology for empirically addressing the issues arising from complexity seems not to have been worked out yet in the cognitive science tradition.

2.3.3 Important features of cognition in complex systems

This part of the literature corresponds to the models of error-prone human operators, above, §2.1.9, and to the purpose of communication of the modelled concepts. In this literature, there are observations on salient features of human cognition in complex processes, that do not directly relate either to current models of cognition, or to current methods of logical analysis of tasks. Here we find the distinctions between novice and expert styles of reasoning, and Rasmussen's distinctions between skill-, rule- and knowledge-based behaviour [101]. We can see these as offering partial specifications of what a model of human performance of complex tasks should cover.

2.3.3.1 Individuality

Many authors have stressed that much human mental activity differs between individuals and between tasks (e.g. [1, 91, 110]). The variety of individual strategies and views of any particular task has been identified by Rasmussen [100], who states that ``Human data processes in real-life tasks are extremely situation and person dependent''. This may well have a bearing on the information requirements and priorities, and thus it should be reflected in any comprehensive cognitive task analysis.

We may here make a useful distinction between individual variation in Intelligent Tutoring Systems (ITS), and in complex system control. For ITS, it is not difficult to imagine the production of models of complete and incomplete knowledge of a domain, and expert and `buggy' performance strategies (examples in [128]). In contrast, in complex process control it is much more difficult to define what would comprise `complete' knowledge, and what (if anything) is an optimum strategy for a given task, since, although there is usually much that is defined in procedures manuals, this is never the whole story. Hence, for complex systems, it is implausible to model individual variation as an `overlay' on some supposed perfect model.

In current approaches to cognitive task analysis, variation between individuals is often ignored, and analysis performed only in terms of a normative structure, often justified merely by the observation that it is plausible to analyse the task in this way. But, considering (for example) the reality of complex process control, to ignore differences is highly implausible, since there are obvious differences in the way that, for example, novices and experts perform tasks. Better, surely, if it were possible, to construct a model of each individual's mental processes and information requirements. If this were done, a designer would have the option of designing either a system tailored to the information requirements of an individual, or a system which could adapt to a number of individuals, where the information presented could more closely match their particular strategies, methods, etc.

In any case, it could well be dangerous to specify rigid operating procedures, where there is any possibility of a system state arising that had not been envisaged by the designers, since an operator dependent on rigid procedures might be at a loss to know what to do in the situation where the rule-book was no help. If there are not rigid operating procedures, then operators will find room for individuality, and their information requirements will not be identical. Hence, in complex systems, it would be advantageous to be able to model individuals separately, and hence there is space for the development of models more powerful than current ones.

2.3.3.2 Skills, rules and knowledge

Rasmussen and various co-workers wished to have a basic model of human information-processing abilities involved in complex process control, including (particularly, nuclear) power plants and chemical process works, in order to provide the basis for the design of decision support systems using advanced information technology techniques. The analysis of many hours of protocols from such control tasks has led to a conceptual framework based around the distinction between skill-based, rule-based, and knowledge-based information processing, which has been introduced above, §1.3.1.

Although Rasmussen presents his stepladder model as a framework for cognitive task analysis, he suggests neither an analytical formalism, such as a grammar, nor any explanation of the framework based on cognitive science. In a later report [103], he does give examples of diagrammatic analysis of a task in terms of his own categories. However, this is neither formalised nor based on any explicit principles.

Clearly, what Rasmussen gives does not amount to a complete cognitive task analysis technique. Writing in the same field, Woods [144] identifies the impediment to systematic provision of decision support to be the lack of an adequate cognitive language of description. In other words, neither Rasmussen nor anyone else provides a method of describing tasks in terms that would enable the inference of cognitive processes, and hence information needs. What Rasmussen does provide is an incentive to produce formalisms and models that take account of the distinctions that he has highlighted. Any new purported model of cognition or cognitive task analysis technique must rise to the challenge of incorporating the skill, rule and knowledge distinction.

2.3.3.3 Mapping cognitive demands

Roth & Woods [114] base their suggestions for cognitive task analysis on experience with numerous successful and unsuccessful decision support systems designed for complex system control. They see the central requirements for cognitive task analysis as being firstly, analysing what makes the domain problem hard, i.e., what it is about the problem that demands cognitive ability; and secondly, analysing the ways people organise the task that lead to better or worse performance, or errors. Once the errors have been understood, it should become possible to design a support system to minimise their occurrence.

The study of the central requirements of cognitive task analysis is referred back to Woods & Hollnagel [147]. They recognise three elements basic to problem-solving situations: ``the world to be acted on, the agent who acts on the world, and the representation of the world utilized by the problem-solving agent''. Before considering the interrelationship of all three elements, their proposed first step in the analysis is to map the cognitive demands of the domain in question, independently of representation and cognitive agent. This implies that any variation between cognitive agents (people, computers, etc.) will not feature in this first stage of the analysis. We may expect this to capture any necessary logical structure of a task, but this is not the cognitive aspect in the sense related to actual human cognition. The only variation they allow at this stage is between different ``technically accurate decompositions'' of the domain. For these authors, ``determining the state of the system'' is an example of this kind of cognitive demand. The difficulty with this mode of analysis is, given the possibility of variant human strategies, and thence representations, that the human description of the `state of the system', and the method for determining it, may indeed vary both with the individual operator, and with the representation that they are currently using.

To try to retreat into technical descriptions at this point is only further to sidestep the cognitive issue, and evade the question of whether there is any cognitive analysis at all that can be done independently of the agent and the representation. A possible reply might be that it is necessary to abstract from the influence of the agent and the representation in order to make progress in task analysis, since these factors are difficult to capture: however this does no more than beg the question of the possibility or ease of finding out about the agent and representation---and that question has not been opened very far, let alone closed.

Inasmuch as some analysis is done without reference to the agent and the representation, we could see this as compromising the cognitive nature of the analysis, taking it further away from a full treatment of cognitive issues, back towards the a priori formalisms of §2.1.3. Woods & others' approach still has some advantage over such formalisms, by taking the operational setting into account, but this trades off against simplicity, and means that their approach is harder to formalise.

In essence, these authors' approach to cognitive task analysis falls short of a detailed methodology, because there is still uncertainty about how the domain problem is to be described in the first place. Of course, for systems which are not especially complex, there may be a certain amount of cognition-independent task analysis based on the logical structure of the task. For more complex tasks, it is difficult to see how cognitive aspects of the task could usefully be analysed without a concurrent analysis of the actual cognitive task structure that the operators work with.

2.3.3.4 Modelling the operator's view of the structure of a system

Holland et al. [56] present a detailed account of rule-based mental models, which they see as incorporating advantages from both production systems and connectionist approaches. The mental models of the world are based on what they call quasi-homomorphisms, or q-morphisms for short. In essence, any particular way of categorising the world gives rise to categories which are abstract to some degree, by leaving out some of the detailed properties of real-world objects. For instance, to give an extreme example from their book, we could categorise objects simply into fast-moving and slow-moving. These categories behave in certain more-or-less uniform ways, so that on the whole we can predict the future state of an object by classifying it, and applying a general rule for how that class of objects behaves. Thus, on the whole, fast-moving objects, after a while, become slow-moving. If we then look at the real world at a subsequent time, we might notice that some of the things that we had predicted were wrong: i.e., our categorisation was not sufficient to predict the behaviour of all the objects in which we were interested. For example, wasps, members of the class of fast-moving objects, mostly retain their fast-moving quality over time. This makes the mapping between real world and model a quasi-homomorphism, rather than a plain homomorphism, which a faithful many-to-one mapping of world to model would be. (This also contrasts with an isomorphism, which is a one-to-one mapping.)

If the person with the model is concerned to be able to predict more closely things that happen in the world, they can introduce another categorisation to deal with the exceptions from the first one, and this process can be repeated if necessary. Of course, the categorisations have to be based on detectable differences, but there are no hard-and-fast properties which always serve to classify particular objects. In their example, the further categorisation in terms of size (small/large) and stripiness (uniform/striped) might serve to distinguish wasps from a few other cases of fast objects that do indeed slow down over time. The way that things are classified depends on what the purpose of classification is, and can rely on feature clusters rather than specific predefined features alone.
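The layering of a default rule and an exception rule can be sketched in a few lines of Python. This is only an illustration of the idea of a q-morphism, not the formalism of Holland et al.: the objects, their feature values and the observed later states are invented for the example, and the function names are hypothetical.

```python
# Layer 1: a coarse categorisation by speed, with a default transition rule.
def speed_category(obj):
    return "fast" if obj["fast"] else "slow"


DEFAULT_RULE = {"fast": "slow", "slow": "slow"}  # fast things eventually slow down


# Layer 2: a further categorisation (size, stripiness) that picks out exceptions.
def is_exception(obj):
    return obj["fast"] and obj["small"] and obj["striped"]


def predict(obj, use_exceptions=True):
    """Predict the later speed category of an object from the model."""
    if use_exceptions and is_exception(obj):
        return "fast"  # exception rule overrides the default
    return DEFAULT_RULE[speed_category(obj)]


# Invented observations: "later" is what the world actually did.
world = [
    {"name": "thrown ball", "fast": True, "small": True, "striped": False, "later": "slow"},
    {"name": "wasp", "fast": True, "small": True, "striped": True, "later": "fast"},
    {"name": "boulder", "fast": False, "small": False, "striped": False, "later": "slow"},
]


def mispredicted(objects, use_exceptions):
    """Names of objects whose observed later state differs from the prediction."""
    return [o["name"] for o in objects if predict(o, use_exceptions) != o["later"]]


coarse_errors = mispredicted(world, use_exceptions=False)
refined_errors = mispredicted(world, use_exceptions=True)
```

With only the default rule the wasp is mispredicted, which is what makes the mapping a quasi-homomorphism; adding the second categorisation removes that error for this invented set of objects, though further exceptions could always arise.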

Their theories are particularly interesting because they extend to giving accounts of the fundamental processes of inference, learning and discovery. This could provide a very powerful model in the long run, because if we can describe the ways in which people learn about complex systems (with any variations in such ways) we would be in a stronger position to model the knowledge that they actually have at any time. Also, having a theory of how knowledge can arise lends credence to the model of knowledge itself, and provides a composite model which is stronger than a model which speculates only on the nature of the knowledge that already exists. We could describe these theories as mental meta-models, since they set out to describe the processes that underlie the creation of mental models in the human.

Moray [85] endorses the work of Holland et al., and gives a different angle on what a process operator knows. He suggests that an operator's model includes an appreciation of the plant as made up from subsystems which are more-or-less independent, and their interactions are known to the operator to an appropriate degree of accuracy. He goes on to suggest that decision aids could well be based on a view of the plant consistent with the operator's model: he says that there are ways of predicting what a likely decomposition of the plant into `quasi-independent subsystems' may look like. Of course, there may at times be a need to look at the state of the plant in more detail than usual, particularly when faults occur or errors have been made. A good decision aid would support presentation of information at the appropriate level of detail.

Moray furthermore claims that there are methods which could automatically discover the plausible subsystems of a complex system. This would mean that cognitive task analysis could proceed by identifying these subsystems, and using them as basic terms in the language describing the task from a cognitive point of view. The same subsystem structure could also be used as the basis for organising the information to be displayed to the operator.
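Moray's methods are not described here, so the following Python sketch should be read only as one assumed way in which such a decomposition might be computed: variables are grouped whenever their pairwise coupling exceeds a threshold, and the groups are the connected components of the resulting graph. The variable names, coupling strengths and threshold are all hypothetical.

```python
def quasi_independent_subsystems(variables, coupling, threshold):
    """Group variables linked by couplings of at least `threshold`."""
    neighbours = {v: set() for v in variables}
    for (a, b), strength in coupling.items():
        if strength >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for v in variables:
        if v in seen:
            continue
        # Depth-first search collects everything strongly coupled to v.
        stack, group = [v], set()
        while stack:
            x = stack.pop()
            if x in group:
                continue
            group.add(x)
            stack.extend(neighbours[x] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Invented plant variables and coupling strengths, for illustration only.
variables = ["boiler_temp", "boiler_pressure", "feed_flow",
             "turbine_speed", "generator_load"]
coupling = {
    ("boiler_temp", "boiler_pressure"): 0.9,
    ("boiler_pressure", "feed_flow"): 0.7,
    ("turbine_speed", "generator_load"): 0.8,
    ("boiler_pressure", "turbine_speed"): 0.2,  # weak link between subsystems
}

subsystems = quasi_independent_subsystems(variables, coupling, threshold=0.5)
whole_plant = quasi_independent_subsystems(variables, coupling, threshold=0.1)
```

Lowering the threshold merges the two groups into one, which corresponds loosely to the need, noted above, to look at the plant in more detail when weak interactions start to matter.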

We should note that Moray offers no evidence about the extent to which actual models of operators match up with the methodical analyses. Furthermore, it is not clear to what extent individual operators can or do develop their own models, differing from those of others. It would be surprising, though far from incredible, if a methodical analysis could show up a range of possible subsystem decompositions to match a range of individual models. If these questions are taken as challenges to further work, particularly towards discovering actual human mental models and representations, Moray's suggestions could be forerunners of a task analysis approach that was cognitive without being arbitrary or irremediably intuitive.

These approaches give some substance to hopes that there could be decision aids, and more general systems design methods, based on a much richer picture of mental models. These models could be scientists' models of users' or operators' models of a system, based on more general theoretical models of the users' or operators' cognitive processes.

2.3.3.5 Qualitative models and reasoning

It is patently obvious that people reason about the world, in a reasonably effective manner for everyday life, without recourse to detailed theory of, for example, physics. Much the same could be said of process operators: they manage to control complex systems by using knowledge which appears not to be equivalent to the engineering knowledge which went into aspects of the design. It seems reasonable to suppose that this knowledge is qualitative, rather than quantitative, and if we knew more about it, we might be able either to design decision aids which interact more effectively with the operator's knowledge, or to apply the principles learned towards the better design of human-machine systems.

The literature on qualitative models and reasoning (e.g. [37, 38, 69]) does not deal with HCI questions. Currently, it attempts to provide a model of the knowledge and reasoning processes which could support the kinds of commonsense reasoning that we know in everyday life.

In modelling commonsense reasoning, this approach implicitly tries to capture some of the underlying common knowledge that we have about the way the world works. This can be seen not as propositions, but as the framework which binds together the factual content in these models. While none of the authors here claim to model more than a fraction of commonsense reasoning, this approach at least offers a start where most others do not begin.

Qualitative modelling is not immediately very useful for the systems designer, since the methodology it provides is open-ended, but it has the advantage that for some of these systems (e.g. Kuipers' QSIM [70, 71]) the time development of the model has been implemented, so that if a researcher or analyst provides a qualitative description of a system, they may get qualitative predictions of the future possible behaviours of the system. Using this to model human reasoning would mean that the model predicts what the human should be able to predict, and thus we could say something about how the human should reason, based on their current knowledge.
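What is meant by qualitative predictions of possible behaviours can be illustrated with a toy example. The Python sketch below is not QSIM and uses none of its representation or algorithms; it assumes a single tank whose level takes one of three qualitative values and whose net flow has a fixed sign, and it simply enumerates the level sequences consistent with that description. All names are hypothetical.

```python
# Qualitative values of the level: two landmarks and the interval between them.
LEVELS = ["empty", "between", "full"]


def successors(level, net_flow):
    """Possible next qualitative levels, given the sign of the net flow."""
    if net_flow == "0":
        return [level]  # steady state: nothing changes
    step = 1 if net_flow == "+" else -1
    i = LEVELS.index(level)
    j = min(max(i + step, 0), len(LEVELS) - 1)  # clamp at the landmarks (assumed)
    if level == "between":
        # Within the interval the level may stay there or reach the landmark.
        return [level, LEVELS[j]]
    return [LEVELS[j]]


def behaviours(level, net_flow, steps):
    """All qualitative level sequences of the given length from a start level."""
    paths = [[level]]
    for _ in range(steps):
        paths = [p + [nxt] for p in paths for nxt in successors(p[-1], net_flow)]
    return paths


filling = behaviours("empty", "+", 2)
```

The branching at `between` is the characteristic feature: without quantitative information the model cannot say when the tank becomes full, only that it may, so the output is a set of possible behaviours rather than a single trajectory.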

But it can be difficult to express the functioning of a system in qualitative terms that lead to sensible predictions. The degree to which these models actually correspond to human thought processes is unknown. Hence this approach does not yet provide a practical means of modelling the operator controlling a complex system. It has, however, been used to model some systems as a preliminary to performing exhaustive analysis of what faults could occur in qualitative terms [97].

We could say that qualitative reasoning models are highly idealised models of people's knowledge and reasoning about systems.

Non-monotonic reasoning

Non-monotonic reasoning (see [42]) is an AI approach to describing common-sense reasoning in terms of formal logic. It attempts to capture something of the essence of default reasoning, and therefore could potentially be applied to mental modelling, possibly yielding insight into knowledge and inference structures that people use when controlling complex systems. However, this approach has not been applied yet to HCI issues, and its concentration on formalisms familiar to logicians tends to indicate a lack of concern with modelling human thought processes and structures, beyond a correspondence between the output of the human and of the model.
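The defining property of default reasoning, that adding a fact can remove a conclusion, can be shown with a very small Python sketch. It is not drawn from [42] and does not implement any particular non-monotonic logic; the fact names and the single default rule are hypothetical.

```python
def conclusions(facts):
    """Facts plus whatever the single default rule lets us conclude."""
    derived = set(facts)
    # Default: a running pump delivers flow, unless it is known to be abnormal.
    if "pump_running" in derived and "pump_abnormal" not in derived:
        derived.add("flow_present")
    return derived


before = conclusions({"pump_running"})
after = conclusions({"pump_running", "pump_abnormal"})
```

Here `before` contains `flow_present` while `after` does not, although `after` was derived from a larger set of facts. In a classical (monotonic) logic this could not happen, since every conclusion drawn from a set of premises survives the addition of further premises.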

2.3.3.6 General points about important features of cognition

Here we have a set of considerations that are complementary to those arising from formalisms and models of cognition. Whereas some of the formalisms are well-developed, but not closely relevant to complex tasks, here we have a collection of points that are relevant to complex tasks but not well-developed.
