©1990, 1995
Within the study of Human-Computer Interaction (HCI) there is a substantial body of literature which uses the phrases ‘mental models’, ‘user models’, ‘conceptual models’, etc. Confusion starts because there is no generally agreed single term which covers these phrases, and other similar ones. For brevity, we will exploit the ambiguity between models of the mind and models in the mind, and here use the term ‘mental models’ as a general term standing for all of the models referred to, and the general models implicit in modelling methods, techniques and theories, as well as having a slightly more circumscribed meaning than it tends to have in the literature.
In this literature there is no universal agreement about exactly which aspects of the users or their actions should be modelled, or how they should be modelled, and as a result writers have tended to define their area of interest implicitly by reference to others' work, or by concentrating on a particular point of view, or a particular practical problem, or particular aspects of modelling. The categories arising from this process of self-definition are not necessarily the best from the point of view of defining independent sub-literatures, or separating out different logical aspects of the subject. Nevertheless, these categories provide a starting point for a sketch of how the literature appears in its own terms: a kind of natural history of the literature. The objective in this first section about the literature is to give that natural history, before starting to discuss individual papers or theories: that will come after substantial discussion of the structure of the whole field.
The lack of definition in the subject also means that the boundaries between it and a number of neighbouring fields are ill-defined. Those fields which are left out here, or only mentioned in passing, include much of cognitive psychology, ergonomics, control theory, general systems theory, cybernetics, information theory, decision theory, management science, and planning. Intelligent tutoring systems and computer-aided instruction are only mentioned briefly. Similarly brief is the discussion of optimal control modelling, since it is less concerned with cognition. A guide to the literatures in control theory, communication theory, statistical decision theory, and information processing is given by Woods & Roth .
The language chosen by authors tends to indicate their background and informal affiliation. The term “mental model” emphasises that the object of study is a model which does, or could, belong to a person. It often goes along with the idea that a person has some kind of imagery in their head, which can be ‘run’, or otherwise directly used, to imagine and predict what will happen to something. Authors using this term favourably are concerned with finding out, or modelling, what the person actually ‘has in their mind’.
“User models”, on the other hand, often represent a view of the user and their attributes to be used equally well by a designer when designing a system, or by a computer system when interacting with a user, the latter model being termed an “embedded user model”. A “user” is generally seen as someone interacting with a computer-based device or system, which provides a service in the context of either work or leisure. Thus a “user” might use a word processor, a photocopier, or a library system.
If, in contrast, the person interacts all day long with machinery, the function of which is to serve some purpose external to the person, she or (usually) he tends to be called an “operator”. Someone may “operate” a ship, an aircraft, or a piece of industrial plant. “Operator models” tend to model this kind of operator from the perspective of more traditional ergonomics and cognitive psychology. The term is also often used by those writing from the engineering tradition of optimal control modelling.
Various uses of the term “conceptual models” have been noted. The most obvious usage follows the idea that a ‘concept’ is an explicitly expressible idea, thus “conceptual models” are models communicated to, or used by, people; usually relevant to the early stages of using or operating a system, where it is the general concepts that are the most important features of a model. A conceptual model would be expressible in words or figures, and probably communicated during the course of learning to use or operate a system.
There are other terms used in the literature, of which some (e.g. “student models”) have an obviously specialised use. More cautious authors say explicitly which aspect of what they are modelling, and do not use any of the general terms unqualified. All these terms are used in the literature, but each of the groupings in the literature tends to have its favourite.
Making a model of task performance can naturally involve task analysis: the breaking down of a task into component sub-tasks and elementary actions. In general, there are many possible ways of doing this, and, unlike the case in some branches of the longer-established sciences, there is no established canonical method for analysis of tasks, nor any widely agreed set of primitive elements which the analysis would give as end-points. The more complex the task, the more potential choice there is of different analyses or styles of analysis, just as there is more potential choice of methods of performing that task.
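The point that one task admits several analyses can be made concrete with a small sketch. The `Task` class and the two decompositions below are invented for illustration only; they do not come from any of the task analysis methods cited here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A node in a hypothetical task hierarchy (illustrative only)."""
    name: str
    subtasks: List["Task"] = field(default_factory=list)

    def elementary_actions(self) -> List[str]:
        """Return the leaf names, i.e. the end-points of this particular analysis."""
        if not self.subtasks:
            return [self.name]
        leaves: List[str] = []
        for sub in self.subtasks:
            leaves.extend(sub.elementary_actions())
        return leaves


# Two invented analyses of the same task, grouped in different ways.
by_goal = Task("send a letter", [
    Task("compose text", [Task("type text"), Task("correct errors")]),
    Task("dispatch", [Task("print"), Task("post")]),
])
by_device = Task("send a letter", [
    Task("use word processor",
         [Task("type text"), Task("correct errors"), Task("print")]),
    Task("post"),
])
```

Here both analyses happen to end in the same elementary actions while differing in intermediate structure. With no agreed set of primitives, a third analyst could equally stop at a coarser or finer level.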
Traditionally, task analysis has been seen as fulfilling many purposes, including personnel selection, training, and manpower planning, as well as workload assessment and system design, and various analytic procedures have been proposed to meet these purposes. Many of these purposes are given by Phillips et al. , who also note that “systems which are stimulus-intensive, non-sequential and involve many cognitive responses, such as computer-based systems” have been less amenable to traditional methods.
There has recently been a growing body of opinion that styles of task analysis which do not take into account relevant facts about human cognition are less than fully adequate for the analysis of complex tasks. This has led to attempts to describe those principles of cognition which are most relevant for task performance, or to devise task analysis methods that are in harmony with the human appreciation of the tasks. However, the lack of firmly established theoretical principles for such an analysis has meant that various authors have taken, or suggested, different approaches to this goal of a cognitive task analysis.
If we take a view exemplified by Sutcliffe , “...task analysis is a description of human activity systems and the knowledge necessary to complete a task”, then task analysis is closely related to mental modelling. Cognitive task analysis could be regarded as task analysis done from a mental models viewpoint. Much of the literature reviewed in §2.3 could be seen equally from the perspective of mental models or cognitive task analysis.
Barnard  (see §18.104.22.168) sees cognitive task analysis as analysis in terms of the cognitive subsystems needed to perform the task. This brings in different considerations from those of other authors, and thus the phrase needs using with caution. For this reason, the discussion here prefers the term ‘mental models’.
The first strand of the literature that we shall consider here focuses around the issues raised by Card, Moran & Newell's book, “The Psychology of Human-Computer Interaction” . This book is exemplary in defining an audience, and giving a clear scenario of the possible use of the end products of the analysis given. It sets the scene well to quote some extracts from their application scenario (pp. 9–10):
A system designer, the head of a small team writing the specifications for a desktop calendar-scheduling system, is choosing between having users type a key for each command and having them point to a menu with a lightpen. ... The key-command system takes less time ... he calculates that the menu system will be faster to learn ... A few more minutes of calculation and he realizes ... training costs for the key-command system will exceed unit manufacturing costs! ... Are there advantages to the key-command system in other areas, which need to be balanced? ...
This gives us a picture of the use of formal models, which is in calculating reasonable values for expected human performance on a system. The model is of human abilities and ways of doing tasks. Card, Moran & Newell suggest that systems designers will be able to use their models and principles of user performance in the design process, and also that psychologists, computer scientists, human factors specialists, and engineers will find their work of interest.
Card, Moran & Newell expect designers, before using their methods, to have considered the psychology of the user and the design of the interface, specified the performance requirements, the user population, and the tasks. Thus it can be seen that their methods assume a context of use where the problem has already been specified, rather than being of use in the initial stage of breaking down a real-world problem to create the initial task or interface design. This is a fairly limited objective, but by being limited it becomes achievable, and it is not difficult to see how it may work in various circumstances.
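The kind of calculation imagined in the scenario can be sketched as a sum of unit times over an already specified sequence of operations. The operator names and the times below are hypothetical values chosen for illustration; they are not taken from Card, Moran & Newell's book, and the sketch is not their method.

```python
# Hypothetical unit times in seconds (illustrative assumptions, not published values).
OPERATOR_TIME = {
    "recall_command": 1.2,  # remember which key invokes the command
    "keystroke": 0.3,       # press one key
    "read_menu": 1.5,       # scan the menu for the wanted item
    "point": 1.1,           # point at the item with the lightpen
}


def predicted_time(operators):
    """Predicted execution time: the sum of the unit times of the listed operators."""
    return sum(OPERATOR_TIME[op] for op in operators)


# The task must already be specified as a sequence of operators for each design.
key_command_design = ["recall_command", "keystroke"]
menu_design = ["read_menu", "point"]

faster = min(
    ("key-command", key_command_design),
    ("menu", menu_design),
    key=lambda pair: predicted_time(pair[1]),
)[0]
```

With these invented numbers the key-command design comes out faster to execute, matching the direction of the scenario. The sketch also shows the limitation noted above: nothing in it helps to decide what the task or the operator sequence should be in the first place.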
A number of authors have followed this general lead, by taking a simplified tractable model of the user, and analysing human-computer interaction in terms of a formalised approach to modelling. Some models have a worked-out formalism, and they could be called formal models, whereas other models have not yet reached that stage (though clearly intended to do so) and could therefore be called formalisable. In another sense, the modelling method only provides the means of producing a formal model, which is actually produced when the method is applied, and in this sense we could say that the methods given in the literature are able to formalise facts or suppositions into usable models. For these reasons, we call these modelling methods ‘formalisable models’, following Green et al. . Some reviewers here distinguish ‘performance’ models, which make predictions, generally quantitative, about the time or the effort involved in operations [12, 20, 66]; and ‘competence’ models, which tell what someone is, or is not, able to do (e.g. ). There are also papers which discuss this kind of approach in general [16, 21, 45, 127, 140], in particular [122, 123], or take their own specific related approach [9, 83].
This class of model answers the question, “what models can or could predict aspects of human performance?” As techniques for implementing models get more sophisticated, the boundary between working and just potential models is likely to shift, with more being included in the class of implemented, predictive models. The application scenarios are likely to widen and diversify from Card, Moran & Newell's vision given above.
Engineering and operator models could also be described as formal or formalisable, since they work with numerical or mathematical formalisms, and control theory, rather than informal ideas; and they offer predictions on issues such as: amount of deviation from some ideal performance; workload; errors; and task allocation. Some examples of literature discussing such issues are given in the present bibliography [46, 115, 132, 136, 139, 153].
Since this kind of model is based on engineering control theory, it is most suited to the simulation of tasks which would be amenable to that discipline, such as the manual control task of track-following, where there is a clear definition of ideal performance, and therefore measurements of deviation from the ideal are possible. The models are usually justified by comparing the characteristics of the performance of the models with measurements of similar characteristics of human performance. But there is little or no justification in terms of psychological mechanisms for the generation of the human behaviour. Thus, these models are models of output, rather than models of internal processes.
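A minimal sketch of a "model of output" for track-following, assuming a simple discrete-time proportional tracker, is given here. This is a deliberately crude stand-in, not the optimal control model used in the literature cited; the function names, the gain values and the sinusoidal track are all assumptions made for illustration.

```python
import math


def simulate_tracking(target, gain=0.5):
    """Proportional tracker: at each step, move a fraction `gain` of the way to the target."""
    position = 0.0
    trace = []
    for goal in target:
        position += gain * (goal - position)
        trace.append(position)
    return trace


def rms_deviation(target, trace):
    """Root-mean-square deviation of the trace from the ideal track."""
    squared = [(g - p) ** 2 for g, p in zip(target, trace)]
    return math.sqrt(sum(squared) / len(squared))


# An invented ideal track: a slow sinusoid sampled at 200 points.
target = [math.sin(0.1 * k) for k in range(200)]

sluggish = rms_deviation(target, simulate_tracking(target, gain=0.2))
responsive = rms_deviation(target, simulate_tracking(target, gain=0.9))
```

A single parameter (`gain`) is all that can be tuned to bring the model's deviation into line with a measured human deviation. A good fit obtained this way says nothing about how a person generates the behaviour, which is the limitation described above.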
The limitations of this kind of approach are acknowledged by some authors in the field (e.g., Rouse , Zimolong et al. ). Such models cannot easily and accurately deal with multiple tasks. Variation between individuals is clearly not easy to accommodate within an optimal model, and higher-level decisions taken at longer intervals are much less easy to model this way than continuous control tasks with short time constants. As limited-purpose models they are no doubt useful, but they are not well suited to rule-based tasks, and even less to knowledge-based ones, because in these tasks it is more difficult to posit ideals. And since a complex task has initially been defined here as one for which there are many practical strategies, these models, based on uniform strategies, even where parameters are allowed to vary, cannot be a good choice for modelling complex tasks.
There would seem to be little prospect for any contact between engineering models and the psychological and other models considered in this study: the references cited by authors in this field overlap little with those in the other fields. For all these reasons, this kind of model is not reviewed in detail here.
There is a set of models, less closely-defined than the formalisable models above, which can be seen as attempting to model what users “have in their heads”, at the same time trading off the clear-cut predictive ability of the previous classes of models. Gentner & Stevens' book, “Mental Models”  is a wide-ranging collection of papers which are fairly representative of this kind of approach to modelling, and many authors cite that collection, or at least some paper from it [18, 37, 56, 70, 84, 85, 133, 150]. Inasmuch as these models attempt to model human cognitive processes from a cognitive psychology standpoint, they have also been called “cognitive models”.
It is difficult to find a common subject in such a broad-ranging literature, which includes models of memory, of learning, of analogy, of commonsense reasoning, all from some kind of computational or algorithmic point of view. What these papers have in common is more like a general attitude: the authors seem to wish to propose models of various aspects of human action broadly in terms of computational processes, but not at such a simplified level as the previous class of models. This usually means that they do not expect practical implementation of their models now or in the near future. They do not in general provide performance predictions. However, the nature of these models is generally such that we could envisage them being formalised and implemented if there was enough time and interest.
Most of these models attempt to clarify what is going on in one area of human thinking, rather than trying to be all-embracing. This means that they can be taken as complementary to one another. Different methods which look irreconcilable may actually be appropriate for different areas of human action. These approaches could be seen as addressing the question, “what may we suppose to be the basis of a user's or operator's performance?”
The previous group of models focuses on the theoretical basis, but the application of models to training is to produce a practical result. Kieras & Bovair found (in a well-cited study ) that explicit conceptual models presented to learners made a difference to the efficiency of their learning. This bears on the topics of training and learning, where, as some authors have pointed out (e.g. ), explicit conceptual models, and analogies, have a large role to play. The model delivered to the learner or trainee is simpler than that of the designer or trainer, and although it is intended to form the basis of the learner's model of the system, it could hardly be the same as the learner's final model, for if it were, the learner could not be continuing to refine his or her model (as is clearly the case in practice). There is therefore a sense in which the study of this kind of model is not a study of the actual mental models that are in the subject's mind, but rather a study of how to stimulate a user to develop his or her own model more quickly. The corresponding question is, “what model can we give to a user to help him or her understand and control a new system, or device, or machine, or program?” There are other opinions on this question that differ from Kieras & Bovair: for instance, Halasz & Moran  consider analogy harmful in the training of new users.
Conceptual models, as in Kieras & Bovair, are unlikely to be identical with the model in the trainee's head, so there is scope for studying the latter model as part of an attempt to see how much of the presented conceptual model has been assimilated. The present study relates to a possible study of what the trainee has actually learnt, but not directly to studies that omit this.
In decision support, and in skilled operators' models of complex dynamic processes, effective HCI relies on knowing what is currently in the operator's mind, which may be complex and only partially conscious or able to be verbalised, rather than knowing about the conceptual level of the model, which may have been taught to the operator, or which the operator may teach to someone else. For this reason no review of the training and learning literature is given in this paper, other than in passing. Murray gives a useful wide-ranging review of the modelling literature with close attention to this part of the field.
Different again are models of human learning, which may or may not be used in the context of training. A model of learning would imply a model of what is learnt, and modelling what has been learnt is a reasonable precursor to modelling (rather than merely emulating) the learning process itself. Again, the present study does not relate centrally to the modelling of learning independently from what is learnt, since in complex tasks it is both difficult and important to model what has been learnt, and one cannot expect to achieve this thoroughly from an a priori approach to modelling learning.
Training focuses more on the performance of the user than on the models they actually develop. So what can we find out about what is in a user's model? There are a number of papers that consider how to derive mental models from the users or operators themselves [2, 6, 95, 124]. This is a tricky problem, since users can be quite idiosyncratic (as recognised long ago by Newell & Simon [91, p.788]). Knowledge elicitation techniques such as protocol analysis and personal construct theory have been used to this end.
Typically, if a user or operator knows a lot about something, they will be able to answer many direct questions about what they do in different situations, but they may not be able to answer questions about their knowledge, such as “how much do you know about such-and-such?” If they do not know the extent of their knowledge, and even more so if some of that knowledge is unconscious, they will not be able to give an exhaustive unprompted account of it, and therefore the knowledge that is actually elicited will be restricted by what questions the analyst asks, which will in turn be restricted by the concepts held by the analyst (or in the case of the repertory grid technique , restricted by the elements chosen for the elicitation of constructs). If we have an explicit model of the user's knowledge (and model) of the system, our model will limit what we will be able to find out about the user's model, which will not be unhelpful, if we wish to find out correspondingly limited things. But if we have no explicit model, what we discover about the user's model will be at the mercy of chance, or our intuition. In other words, the way we think of the user's model is intimately bound up with what we are able to discover about it. This means that advances in our model of the user's model will enable more knowledge to be recognised, and conversely, if we know something informally about the user's model that we want to capture, but cannot yet because of the restrictions of our model of the user, that informal knowledge may act as a stimulus to elaborating our model of the user.
Another branch of this literature, where knowledge is gathered from people, is the field of expert systems. The normal area of application of expert systems is, as their name implies, to encode the knowledge of an expert in a way that a non-expert can have access to the expert knowledge and, if possible, reasoning behind that knowledge. Typically, though not exclusively, this has been in such areas as diagnosis.
Diagnosis is generally something that can be discussed and reasoned about, and even if it is difficult to elicit all the knowledge from an expert in diagnosis (in whatever field), it is at least conceivable. In process operation, however, Bainbridge points out that much of a process operator's knowledge is typically not able to be expressed readily in verbal form. This means that it is difficult to construct a faithful model of an operator's decision-making from the basis of verbal reports or protocols, as would be the norm following the expert system methodology.
Woods & Roth did not consider the problems of elicitation, but instead considered the cognitive activities performed in nuclear power plants, as their criterion for selecting an expert system as a possible basis for a model of behaviour in the nuclear power plant domain. The system they chose, CADUCEUS, originally from medical diagnosis, fulfilled their criteria of: having a structured knowledge representation; having the ability to simulate problem-solving under multiple-fault, multiple-constraint conditions; and allowing data-driven revision of knowledge over time. As with engineering operator models, one question of importance here is, to what extent are such models simply modelling some overall features of human performance, rather than providing a causative model compatible with cognitive psychology? If the object of such a model is merely to provide expert system decision support, then one cannot object to it just on the grounds that it does not model human cognitive processes. However, to provide a sound basis in general for designing operator aids, we do indeed need to model the operators' cognitive processes, at least in terms of their information usage. But Zimolong et al. did not find any expert systems for process control whose output matched human output well at a detailed level. For these reasons, it was decided to omit detailed review of papers following this approach.
Whatever the difficulties in elicitation, any good model has to explain the major observed features of human cognition in complex tasks. What needs to be in a good model of a user or operator? A few authors approach the subject of modelling from the standpoint of knowing about the realities of controlling complex dynamic systems [2, 3, 7, 57, 58, 100, 101, 125, 138, 143, 146, 148]. These realities are of such intricacy that there are as yet no proposed fully-fledged models which attempt to account in detail for both normal and emergency operation of complex systems, including errors and recovery from them. The papers of this type therefore tend to tell us (and system designers) about the features which should be taken into account when forming models of process operators and their knowledge and skill. Jens Rasmussen is a central author here, and most of the other authors cite him.
This is the least formalised area in the field of mental models, and this can lead to a sense of vagueness when reading the papers. This is due to the complexity of the questions being approached, compared with the questions dealt with by the cut-down idealisations in the currently formalised models. In this trade-off, informal models gain breadth of applicability at the expense of precision. These authors are more concerned with realism than with ease of formalisation.
Since this division of the field of mental models has not yet reached the stage of common agreement, it is not surprising that there are papers which do not fall neatly into one of these classes, including several review papers covering various parts of the field [13, 51, 88, 106, 140, 142, 150].
Another reason why the literature is not well-defined is that it is possible to see ways in which the divisions will break down in future. It may be naïve to think that one grand mental model could perform all the functions of all the different classes outlined above, but the idea is certainly attractive. If a definitive, theoretically sound model were invented (such as we might expect in the established physical sciences), it would certainly have something to say about all the areas of mental modelling.
More realistically, there are developments which are easier to envisage, which would move or remove some of the boundaries implied above. Firstly, there is the tendency for implementation techniques to become more intricate and powerful, thus enabling more of the models which have been devised to be used in a practical way. This may mean that some of the less-restricted models pass into being formalisable. Also we may expect models to become more like the humans they are intended to model, which is a conceptual, as well as a technical, advance. The section of the literature on models taken from real life may find greater contact with other sections.
We cannot expect predictive models which accurately reflect all the aspects of human cognition relevant to systems designers until development and integration of the research areas has taken place.
Part of the disarray of the literature is caused by different authors meaning different things when they write about models, and by these meanings not being entirely clear and explicit. The literature shows that people have been aware of a variety of meanings for many years, and there are several papers which offer classifications of the meaning or usage of the terms. We shall start this section by looking at the distinctions between owners of models and between objects of models. Then, looking at the purpose of models, we find that there is a reasonable correspondence between purposes and the categories given in the previous section. This suggests that the purpose of a model is an important factor in classifying a mental modelling approach.
Distinguishing the owner and object of a mental model is important, particularly for readers who may gain a wrong impression by writings that do not make this clear. To start clarifying this, let us imagine a situation where we have a user using a system (often complex) which has been designed by someone else (a designer), and the interaction is being studied by another person, a scientist.
An author often quoted or cited for distinguishing the owner and object of a model is Norman . He distinguishes the target system t, the scientist's conceptual model of that system C(t), the user's mental model of that system M(t), and the scientist's conceptual model of the user's mental model C(M(t)). This brings out the importance of whose model it is, in that M(t) is not assumed to be identical with C(t). Equally well, C(t) is not the same as C(M(t)), even though it may be the same scientist who is doing the conceptualising. Great potential confusion may arise, because C(M(t)) is intended to be like M(t). Other authors sometimes talk loosely as if C(M(t)) actually was M(t).
Now if the user could reliably describe his or her mental model M(t) explicitly, the scientist could presumably accept this, and there would be no need for a separate C(M(t)). But most often, much of M(t) is implicit and tacit. Remembering this should help to reinforce the distinction.
Streitz , elaborating Norman, introduces the needs of a designer into his “mental model ‘zoo’ ”. The models of the target belonging to the system designer Cd(t) and to the psychologist Cp(t) may differ, as may their models of the user's mental model, Cd(M(t)) and Cp(M(t)). He also makes the distinction between the ‘content problem’ (the problem to be solved, itself) and the ‘interaction problem’ (how to solve the problem with the tools in hand). The user's mental model of the content domain is referred to as M(c). We may follow Streitz in suggesting a different ‘target’ for each of the different ways in which someone may be treating a system. Clearly there are many such distinctions which could be made in the spirit of the original ones put forward by Norman.
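The owner/object distinction in Norman's and Streitz's notation can be made explicit by treating a model as a structure that records whose it is and what it is of. The class and field names in this sketch are assumptions introduced for illustration; only the notation itself (t, M(t), C(t), C(M(t)), Cd, Cp) comes from the text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Target:
    """The thing ultimately being modelled, e.g. the target system t."""
    symbol: str = "t"

    def notation(self) -> str:
        return self.symbol


@dataclass(frozen=True)
class Model:
    """A model with an explicit owner and an explicit object."""
    kind: str                    # "M" for a mental model, "C" for a conceptual model
    owner: str                   # whose model it is
    of: Union[Target, "Model"]   # what it is a model of
    subscript: str = ""          # Streitz-style owner subscript, e.g. "d" or "p"

    def notation(self) -> str:
        return f"{self.kind}{self.subscript}({self.of.notation()})"


t = Target()
user_model = Model("M", "user", t)                                # M(t)
scientist_of_system = Model("C", "scientist", t)                  # C(t)
scientist_of_user = Model("C", "scientist", user_model)           # C(M(t))
designer_of_user = Model("C", "designer", user_model, "d")        # Cd(M(t))
psychologist_of_user = Model("C", "psychologist", user_model, "p")  # Cp(M(t))
```

In this representation C(M(t)) and M(t) are different objects with different owners, so the loose usage that treats one as the other would show up as a type confusion rather than passing unnoticed.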
Whitefield  gives a classification based on these same two dimensions of ‘whose’ and ‘what of’. He claims that this classification is of use to systems designers, so it is worth a closer look. He has
There is also the question of what a model is for. As we shall see, the categories revealed by this question map much more closely onto the apparent groups outlined in §2.1, suggesting that the purpose of a model is an important feature for its classification. Although the purpose of a model is often ill-defined in many authors' works, it seems that without the dimension of purpose, there is little order or sense to be made from the literature on models.
Norman  makes passing reference to the purpose of a model without considering purposes as centrally important. In his view, the purpose of a mental model is to allow the person who owns it to understand and to anticipate the behaviour of a system, whereas conceptual models are devised as tools for the understanding or teaching of systems. (Norman talks of physical systems here, but we can easily extend the idea to cover human or social systems.) In this way, Norman recognises the question of what the model is for, without considering it as contentious. In particular, he does not discuss the possibility that a user may have a number of separate models of a particular system, used for different purposes, nor the possibility that a scientist may have various models of the user's mental model of a particular system. Norman describes models as tools, and we may reflect that, for a tool, the purpose is at least as significant a determining factor in its nature, as the identities of the tool's user and the object on which the tool is used.
Other authors make more of the importance of purposes of models in general. Murray  posits that “a statement of a model's purpose is an additional necessary constituent of any taxonomy which is to be used to specify the boundaries of different classes of models”. Benyon  asks “what is the purpose of the User Model? Is it to assist designers? to assist the user? to provide an adaptive capability for the system? to assess the knowledge of the user? to develop and refine other models? to assist research into human cognition?” Wahlström  distinguishes the following possible purposes of models:
There is a general approach to the concept of mental models, which aims not to describe particular users' mental models of a system, but rather to describe general issues for the designer to consider when thinking about the needs of the user. These issues may include general principles of human cognition, and its limitations; general observations about the way people deal with complex systems; factors which affect human tendency to error; and so on. These considerations may aid systems designers simply by getting them to think along appropriate lines. These approaches fall largely into the class described in §2.1.9 above.
If the purpose of a model is communication of an idea (outside the training context), there is not much we can deduce in principle about the form of the model. The way in which a researcher may attempt to communicate important ideas to a systems designer could be expected to vary greatly depending on the individual researcher and his or her appreciation of, and rapport with, the intended audience, which is typically seen to be both academic and professional.
Three surveys of opinions or practice of professional designers [11, 51, 131] give no positive evidence that designers are influenced by this kind of model. Designers seem to have many other more pressing things in mind: consistency and structure in the software; commercial pressure and deadlines; compatibility, convention and current design practice are among these.
Of course, these ideas continue to circulate among HCI researchers. More discussion on this topic would say no more than could be said of ideas within scientific enquiry in general.
A possible purpose for a mental model is to aid understanding of human cognition. Researchers need to develop their understanding of the field to continue to generate further useful facts for designers, whereas one would expect designers to be more interested in the applicable facts. Models of users' mental structures and processes could help researchers' understanding by giving extra ways of looking at cognition, whether by metaphor, analogy or other means. It is the kind of knowledge that requires further digestion and synthesis by cognitive scientists and HCI researchers before it is directly useful.
Some authors see mental models primarily in this way. For example, Young  suggests that what he calls a “User's Conceptual Model” (even though this sometimes refers, as here, to a model possessed by a psychologist) should help to explain aspects of the user's performance, learning and reasoning about a system, as well as providing guidelines for good design. Similarly, Carroll  says, “Mental models are structures and processes imputed to a person's mind in order to account for that person's behavior and experience”, thereby characterising models as psychological theories. There is much overlap between the class of models described here and those described in §2.1.5 above.
Viewing mental models as aids to understanding does not imply much about the form of such models, except perhaps that they should be comprehensible by the people for whom they are intended, i.e., researchers in HCI. It would be shortsighted of HCI practitioners to underrate the importance of this kind of model just because it is not immediately useful. It is from here that future developments may arise.
A designer may wish to know something about the potential or actual performance of a human-machine system, while wishing to avoid the need to do experiments involving people, which could be time-consuming and expensive. This could be in the context of choice between possible designs.
The formalisable models mentioned in §2.1.3 explicitly aim to predict either human performance or competence in terms of their analysis of the task. For any model to be a useful predictive tool, it must be able to take values of things which are assumed to be known, and produce values for the quantities which are to be predicted. This, of course, depends on what quantities are assumed to be known, or taken as known. One view of the essence of a model could be to predict whatever the modeller was interested in, from the things the modeller knows.
There is an important distinction within this class of models between models to be communicated and ones for private use. If the model is made to be communicated it would of course have to be explicit; but the models that we spontaneously generate, for the prediction and control of the things that we deal with in everyday life, are not normally available for direct inspection, either by others or even (often) by the owner of the model, and therefore if we wish to know about these models, their contents have to be inferred.
The formality of explicit models enables much more detailed discussion of particular approaches to modelling, which follows in §2.3.1 below.
Operator, or engineering, models also provide predictions, particularly about human performance and mental workload; and hence possible errors, and suggested task allocation between human and computer. For reasons given above (§2.1.4), no discussion of these models is given here.
As has been explained above (§2.1.6), these models form a separate category, which is recognised here, but will not be discussed. From the user's point of view, the purpose of these models is to aid understanding and using the system, while at the same time the trainer is treating them as something to be communicated, to assist the training process by supporting “direct and simple inference of the exact steps required to operate the device” . The form of these models is an issue to be dealt with by theoretical and empirical study within the discipline of the psychology of learning, which is outside the present study.
The review here is not intended to be exhaustively comprehensive. Rather, the object is to review major exponents of the different approaches to mental modelling. In this section are covered:
Clearly, these authors are attempting to construct a formalism that fits with the structure of human cognition. If these formalisms were to provide the basis for a cognitive task analysis, it would be on the grounds that the analysis is in similar terms to the analysis which we might presume is done in the human who is performing the task.
The analysis given by Card, Moran & Newell  is based around a model of task performance, referred to as GOMS, which stands for Goals, Operators, Methods and Selection rules. The Goals are those that the user is presumed to have in mind when doing the task—the authors see no need for argument here, since the user's presumed goals seem obviously reasonable. The Operators are at the opposite end of the analysis from the Goals: they are the elementary acts which the user performs (not to be confused with the human ‘operator’ who controls a process). The Methods are the established means of achieving a Goal in terms of subgoals or elementary Operators. When there is more than one possible Method for the achievement of a certain Goal, a Selection rule is brought into play to choose between them. The acts are assumed to be sequential, and the authors explicitly rule out consideration of the possibilities of parallelism in actions. They also see the control structure in their model as a deliberate approximation, being more restricted in scope than the production system of Newell & Simon .
What is considered elementary for the purposes of the analysis is to some extent arbitrary. The authors give examples of different analyses of the same text-editing system using different classes of elements: possible elementary units range from the coarse grain of taking each editing task as an Operator, to the fine grain of the keystroke as the Operator. At each level, the times to perform each elementary unit operation should lie generally within a certain band . For the coarse grain, this would be at least several seconds; for the fine-grain keystroke units it would be a fraction of a second.
By calculating times for each Operator, from experiment, and allowing time necessary for mental workings, we could, in principle, use the GOMS model to make a prediction about the time necessary for a user to perform any particular task, for instance a benchmark task to be compared between different systems. Thus we could have a prediction of the relative practical speeds of various systems. The success of the predictions would depend on the validity of the simplifying assumptions for the studied task, including the choice of level or grain of analysis.
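As a rough illustration of the prediction described above, the following Python sketch sums Operator times over the sequence of Operators in a chosen Method. The Operator names, the times, the two Methods and the Selection rule are all invented for illustration and are not taken from Card, Moran & Newell; only the assumption of sequential, error-free execution comes from the text.

```python
# Hypothetical keystroke-grain Operator times in seconds (illustrative values only).
OPERATOR_TIMES = {
    "press_key": 0.2,
    "point_with_mouse": 1.1,
    "mental_prepare": 1.3,
}

# Hypothetical Methods for one Goal ("delete a word"); each is a fixed Operator sequence.
METHODS = {
    "delete_by_keys": ["mental_prepare"] + ["press_key"] * 6,
    "delete_by_mouse": ["mental_prepare", "point_with_mouse", "press_key", "press_key"],
}


def select_method(hand_on_mouse):
    """Hypothetical Selection rule: use the mouse Method if the hand is already on the mouse."""
    return "delete_by_mouse" if hand_on_mouse else "delete_by_keys"


def predict_time(method_name):
    """Sum the Operator times, assuming strictly sequential, error-free execution."""
    return sum(OPERATOR_TIMES[op] for op in METHODS[method_name])
```

With these made-up figures, `predict_time("delete_by_keys")` comes to about 2.5 seconds and `predict_time("delete_by_mouse")` to about 2.8 seconds. Changing the grain of analysis amounts to replacing the table of Operators and their times, which is where the simplifying assumptions enter.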
The area of application chosen for developing GOMS was text editing (see the quotation given above, §2.1.3). It could be that much of what they say, and the approximations they use, are appropriate to text editing but not to quite different kinds of task such as riding a bicycle on the one hand, and controlling a complex chemical works on the other. It is difficult to imagine a GOMS analysis of bicycle riding. We could imagine Goals without too much difficulty: at the highest level, to get from A to B, and at an intermediate level, for example, to stay upright while travelling in a straight line. At the lowest level, comparable to the grain of keystrokes, the Operators may be to turn the handlebars left or right; but what could the Methods be to connect those Goals and Operators? For the different example of a complex plant, any procedures that one could explicitly teach a human operator could probably be expressed in the form of GOMS. But it is a well-known fact  that humans do not reach full competence by formal instruction alone. After substantial experience, they develop what is known as ‘process feel’, which is often considered as beyond the reach of usual methods of analysis. How could this be represented in the GOMS model? And how would GOMS represent the many exceptions, anomalous states, emergency procedures, and unenvisaged states of the system? To be fair to the authors, they do not suggest that GOMS would be a suitable model for these various control tasks. However, it is possible to cast doubt even on the GOMS analysis of text editing. This will be done below (§2.6.1).
The GOMS model in general is a scientist's model, but a particular GOMS model of a task would be the analyst's model. It is not so clear, however, exactly what GOMS is trying to model. It is not intended to be an accurate model of the user's mental processes. Rather, it is an idealised model, which falls somewhere between a putative model of the user's mental processes and a model of the analyst's, or designer's, understanding of the task. Thus we can see that GOMS does not fit into the Norman/Streitz analysis (owner and object, §2.2.1) very easily. But analysis in terms of purpose is much clearer: the function of the GOMS approach is to enable designers or analysts to produce models, the purpose of which is to provide the designer with comparisons of the performance of systems which have, at least in outline, been designed. It is limited in its accuracy by the simplifying assumptions which have been made, which also limit its applicability.
As with GOMS, the KLM deals with error-free operation. The explanation of errors is an important goal in the modelling of the control of complex systems.
It should be clear from this discussion that the KLM is suited to relatively routine tasks involving interaction with a computer system via a keyboard, and it is not suited to the analysis and design of the HCI aspect of the supervisory control of complex dynamic systems.
This is a development by Moran based on ideas in the GOMS model . Moran recognises that the model that a designer has when designing a system will determine the one that the user will have to learn, and therefore it would be a good idea if the designer had a clear and consistent model in mind when designing. The purpose of Moran's CLG formalism is to ensure that the designer has a framework round which to design. The design is done (generally) on four levels: the Task Level, the Semantic Level, the Syntactic Level, and the Interaction Level. Moran gives guidelines, and an example (a simple mail system), for how to do this.
Moran identifies three important views of CLG. The linguistic view is that CLG articulates the structure of command language systems, and generates possible languages. This explains the G of CLG. It may be that the linguistic view is of most interest to HCI researchers and theorists.
In the psychological view, CLG models the user's knowledge of a system. This assumes that the user's knowledge is layered in the same way as CLG. Moran suggests ways of testing whether a CLG is like a user's knowledge, but he does not give ways of testing the detailed structure of the knowledge, nor whether the representation is the same in both user and CLG. He has a clear idea that it is the designer's model that should be able to be assimilated by the user, hence the designer should be careful to make, and present, a clear and coherent model. But concentrating on this idea neglects discussion of the real possibility that users may develop their own independent models of a system, which may not be describable in the same formalism. We might fairly say that viewed psychologically, CLG makes another speculative attempt to introduce a theory explanatory of aspects of human cognition. It is hard to identify any success in this endeavour above that which is achieved by other psychological theories, and other models mentioned in the study in hand.
In the design view, CLG helps the designer to generate and evaluate alternative designs, but does not claim to constitute a complete design methodology. It could aid the generation of designs by giving an ordered structure to the detailed design tasks, and Moran suggests that CLG could provide measures for comparing designs, addressing efficiency, optimality, memory load, errors and learning.
Sharratt [122, 123] describes an experiment in which CLG was used by a number of postgraduates to design a transport timetabling system. The study shows the wide variation in designs produced, and although it was not the object of the study, this shows that CLG does not effectively guide a designer to any standard optimal design. Sharratt evaluated the designs with three metrics, for complexity, optimality and error, which metrics were developed from Moran's own suggestions. Sharratt also gives ideas on extending CLG to help with its use in an iterative design process. Sharratt notes difficulties with the use of CLG, and if such difficulties arise even in areas of design such as an interactive mail system or a transport scheduling system, we have all the less reason for supposing that CLG could substantially help with the design of decision aids for complex systems.
We may raise one further question, on top of those already asked by Moran himself and by Sharratt. That is, do we know whether Moran's level structure is correct? Is this the right way to go about dividing the specification? In the absence of any arguments, we must say that we do not know. It may well be that there are other possible ways of analysing a system into different levels, and a different system of levels would give rise to possibly quite different analyses or designs. The most important point to be made here, however, is that it is presumptuous to suppose that the levels given here actually correspond in all or most cases with similar levels in human representations. We can imagine, or perhaps some have experience of, design based on other characterisations of level. Would designers all feel that one particular level model is natural and all the others artificial? So as far as the psychological view goes, we have to say no more than that CLG gives a guess at a possible level framework for human knowledge about a system, and that this guess has no more empirical support than alternative ones.
It may be that there is some definite and constant basis for human cognition in the context of interactive systems, in which case we need to find it and make it the basis of the models of the user's knowledge that underlie HCI tools. If there is no such basis, we would need to find a more flexible approach to modelling and to design than is offered by CLG, or similar techniques.
The stated aim of Kieras & Polson's Cognitive Complexity Theory [66, 12] is to provide a framework for modelling the complexity of a system from the point of view of the user, out of which should come measures able to predict the ‘usability’ of that system. We may speculate that if this were effective and efficient, it would be potentially useful to the systems designer, by providing comparisons between different possible designs. But it is not intended to provide performance predictions directly, in the way that GOMS and the KLM can.
The achievement of this aim is via the provision of two formalisms, intended to interact with each other: one for representing the user's knowledge of how to operate the device (the “job–task representation”), and another for representing the device itself, in the form of a generalized transition network. The first formalism allows a designer to create a model of a user's understanding of a task in context. The second is a representation of the system from a technical point of view.
The formalism for the user's job–task representation is based on the concept of the production system (as in ). Although the authors cite GOMS as the precursor to their work, they do not follow GOMS in deliberately simplifying the full production system formalism, which is a general purpose architecture of great power. It may well be that the production system architecture is sufficiently powerful to simulate human cognitive capability in many fields (the aim of some AI research), but to model cognitive complexity for many purposes (such as modelling errors) there needs to be a correspondence between the model execution and the ways in which humans make use of their knowledge (a process model). They do not give any clear argument supporting the inherent suitability of the production system for modelling the appropriate aspects of the user's knowledge, nor do they offer any restrictions which bring it more in line with the capabilities of the human. They consider it sufficient justification to refer to other research which has used, or reviewed the use of, production system models.
Because there are no inherent restrictions in using a production system, and because of the potential variability of human ways of performing a task, formalising task knowledge in this way seems to be closer to an exercise in ingenuity (with relatively arbitrary results) than a means of faithfully reproducing actual human thought processes. Hence the doubt about whether the computed values of cognitive complexity bear any necessary relationship to the actual difficulties experienced by users.
These authors also chose text processing as a field of study. Although the task of writing is very complex, text processing (i.e., conversion of words in the mind to words on the machine) does not take very long to learn, and does not afford a great deal of variability in the strategies that one can adopt. One can imagine a text processor with a very limited repertoire, where an extensive logical structure is a necessary consequence of the structure of the computer program running the text processor. In this case, it is also easy to imagine a full analysis of error-free text processing skill, that would not vary between different people. Hence it would be plausible to put forward a production system account of the skill. But even if a production system account is valid for such a well-defined task, we cannot extrapolate this validity to complex tasks such as industrial process control.
The idea of a grammar model of a task is that it is possible to represent tasks, or complex commands, in terms of the basic underlying actions necessary to perform those tasks, and this can be done in a way that parallels the building up of sentences from words, or the building up of programs from elementary commands in a programming language. Grammars which describe the structure of programming languages have often been formalised in Backus-Naur Form (BNF), which is a relatively straightforward formalism.
When a grammatical model has been made of a ‘language’, two measures are of interest: the total number of rules in the grammar is some measure of complexity of the language, and therefore potentially related to the difficulty of learning it; and the number of rules employed in the construction (or analysis) of a particular statement, command, or whatever, is a measure of the difficulty of executing that command, and therefore potentially related to the difficulty of comprehending it.
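The two measures can be sketched for a toy BNF-style grammar as follows. The grammar and command vocabulary are hypothetical, and the parser is a deliberately minimal one that is only adequate for small grammars without left recursion.

```python
# A toy BNF-style grammar for a hypothetical command language.
# Each nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "command": [["verb", "object"]],
    "verb": [["delete"], ["copy"]],
    "object": [["word"], ["line"]],
}


def total_rules(grammar):
    """First measure: total number of rules, counting one per alternative."""
    return sum(len(alternatives) for alternatives in grammar.values())


def rules_used(grammar, symbol, tokens):
    """Second measure: number of rules applied in deriving tokens from symbol.

    Returns (count, remaining_tokens), or None if no derivation is found.
    Alternatives are tried in order and the first one that matches is taken.
    """
    if symbol not in grammar:  # terminal symbol: must match the next token
        if tokens and tokens[0] == symbol:
            return 0, tokens[1:]
        return None
    for rhs in grammar[symbol]:
        count, rest = 1, tokens  # one rule application for this alternative
        for part in rhs:
            result = rules_used(grammar, part, rest)
            if result is None:
                break
            count += result[0]
            rest = result[1]
        else:
            return count, rest
    return None
```

Here `total_rules(GRAMMAR)` gives 5, and `rules_used(GRAMMAR, "command", ["delete", "word"])` gives a count of 3 with no tokens left over. Whether either number tracks the difficulty experienced by people is exactly the question raised in the discussion that follows.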
Green, Schiele & Payne  argue convincingly that representing task languages in terms of BNF (as in ) must miss something of human comprehension, because in many cases the measures of complexity in BNF do not tally with experimentally derived or intuitive human ideas of complexity. Payne & Green's Task-Action Grammars  set out to provide a formalism in terms of which simple measures correspond more closely with actual psychological complexity.
The formalism is a scientist's model of a possible way in which people might represent tasks. An instantiation of the model would presumably be a designer's, or analyst's, model of how a typical user would structure a task internally. The designer makes this model by considering important abstract ‘features’ of the task (i.e., dimensions or attributes on which there are distinct differences between commands), which are assumed to be apparent to a user, and then formalising the task language to take account of those features, by enabling rules to be written in terms of the features, as well as in terms of instantiated feature values. The simplest example of this that Payne & Green give concerns cursor movement. Instead of having to have independent rules for moving a cursor forwards or backwards by different amounts, TAG allows the rule
Task[Direction, Unit] → symbol[Direction] + letter[Unit]
provided that the available actions support such a generalisation. An important point made by these authors is that the consistency that allows such schematic rules makes command languages easier to understand and learn, compared to languages whose inconsistency does not allow the formation of such general rules.
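To make the economy of the schematic rule concrete, the following Python sketch expands it into one concrete rule per combination of feature values. The feature values and the key assignments are assumptions made for illustration; they are not Payne & Green's own example.

```python
from itertools import product

# Hypothetical feature values and action mappings (not Payne & Green's example).
FEATURES = {
    "Direction": ["forward", "backward"],
    "Unit": ["character", "word"],
}
SYMBOL = {"forward": "ctrl", "backward": "meta"}
LETTER = {"character": "C", "word": "W"}


def expand_schematic_rule():
    """Expand Task[Direction, Unit] -> symbol[Direction] + letter[Unit].

    Returns one concrete action string per combination of feature values.
    """
    rules = {}
    for direction, unit in product(FEATURES["Direction"], FEATURES["Unit"]):
        rules[(direction, unit)] = SYMBOL[direction] + "-" + LETTER[unit]
    return rules
```

One schematic rule and two small lookup tables stand in for four separate commands here. If the key assignments were inconsistent, so that no `SYMBOL` and `LETTER` tables could be written, each of the four commands would need its own rule.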
Reisner  has recently pointed out that Payne & Green's notion of consistency is not as straightforward as perhaps they would like. There is usually room for various views of consistency, and what matters in system design is whether or not the designer and the user share the same view of consistency. Only then will a task's consistency make learning easier.
This is a point to which empirical data would be relevant, in that if it could be shown that a large majority of users shared one particular view of consistency in a particular task, then designers should design to it and analysts analyse in terms of it. But Payne & Green themselves admit a lack of such data on perceived task structure (perceived by users or operators, that is), and therefore the way in which a task is formalised is up to the judgement of the analyst. Here again we have the problem that the formalism is of such a power as to permit varying solutions. How are we to know whether a particular formalisation is the one that most closely corresponds to a practical human view of the task? Only, it would seem, by experiment, and that would make it impossible to use TAG with any confidence for a system that had not at least had a prototype built.
An alternative view would be that there is some ideal view of consistency, in which the grammar would represent the task in the most compact form. (Compactness is also a guiding principle in many machine learning studies. See, for example, Muggleton .) Users could then be encouraged to adopt this view. The difficulty for this idealist notion is that there is no general method of proving that any particular formalisation is the most compact possible. Payne has accepted this as a valid critique of TAG [personal communication].
It is to be expected, as for other formalisms, that for inherently well-structured tasks, the representation is fairly self-evident, and therefore a guess at an appropriate formalisation may well be near enough to get reasonable results. But as previously, there is plenty of room to doubt whether using this method in modelling the control of complex, dynamic systems could be helpful to the systems designer.
No doubt it is already clear that formalisms such as those described above may work reasonably in the analysis of simpler systems, and for systems where the tasks have no latitude for variability. Currently there are no generally known examples of such analysis being performed on a complex system. We may perhaps take this as an indication that the formalisms reviewed are not well suited to the analysis of complex systems, but we cannot be certain about this until either someone does perform one of these analyses of a complex system, or a substantially different type of analysis is shown to be superior for complex systems.
The next grouping of literature corresponds to the more general mental models of §2.1.5, and in terms of purpose, to those models intended to aid understanding. In this group, there are theories and models of human cognition, which specify the units and structure of supposed human cognitive methods and resources. This could be characterised as the analysis of (internal) cognitive tasks, which does not, of itself, specify how to analyse and map external tasks into these internal units, but obviously asks to be complemented by such a mapping.
Providing a model cognitive architecture does not in itself present a technique for cognitive task analysis, because there are other necessary aspects of a mental model. But if one wishes to perform such an analysis, a cognitive model will define the terms in which the task has to be analysed. A useful model here would be one which dealt with the aspects of cognition most relevant to interacting with complex systems.
The first ‘framework’ architecture to consider is the MHP of Card, Moran & Newell . This is not closely related to GOMS or the KLM, despite appearing in the same book. The authors attempt to bring together many results from cognitive psychology which they see as relevant to HCI design. The mind is seen as being made up of memories and processors, which have parameters for (memory) storage capacity, decay time and code type, and (processor) cycle time. They give estimates for general values of these parameters. Thrown in with these values are a number of other general principles (e.g., Fitts's Law, and the power law of practice). Taken together, what these parameters tell us is clearly more relevant to short-term simple tasks (and laboratory experiments) than to longer-term subtler cognitive abilities involving problem-solving or decision-making.
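The two general principles named above can be written down in their commonly quoted forms, as in the Python sketch below. The Shannon-style form of Fitts's Law is used here as an assumption; the MHP has its own formulation and parameter estimates, which are not reproduced. The parameters `a`, `b` and `alpha` are placeholders to be fitted from data.

```python
import math


def fitts_time(distance, width, a, b):
    """Fitts's Law, Shannon-style form: a + b * log2(distance / width + 1).

    a and b are empirically fitted constants (placeholders here).
    """
    return a + b * math.log2(distance / width + 1)


def practice_time(first_trial_time, trial_number, alpha):
    """Power law of practice: time on trial n is T1 * n ** (-alpha)."""
    return first_trial_time * trial_number ** (-alpha)
```

Both functions return a time for a single, short, well-defined action, which reflects the point made above: parameters of this kind speak to simple short-term tasks rather than to problem-solving or decision-making.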
Card, Moran & Newell give several examples of the kind of question which could be answered with the help of the MHP. The questions mostly are about how quickly things can be done (reading, pushing buttons or keys, etc.) in different circumstances. There are no examples of applying the MHP to problem-solving or decision-making.
What the MHP does in terms of task analysis is essentially to set bounds on what is feasible for a human to do (cognitively). Thus a matching analysis would have to show what items were in what memories at what different times, and to take account of the times required for motor actions. What the MHP does not do is to set a limit on the depth or complexity of information processing, nor on other values which may be of interest in analysing complex control tasks.
The PUMs idea described by Young, Green & Simon  potentially takes the modelling of cognitive processes much further, though its implementation is still thought to be several years in the future. That idea is to represent the user of a system by a program, interacting with another program that represents the system. The purpose of a PUM is to benefit a designer in two ways: firstly, by making the designer think rigorously about how the system requires the user to interact; secondly, if such a program were ever constructed, by enabling predictions of users' competence and performance on a given system in the same kind of way as other analytical methods, but with improved accuracy because of the closer matching of mental processes by PUMs than by simpler formalisms.
What language would the program be written in? How would knowledge be represented and manipulated in that language? They give no definitive answers. The cited paper and a paper on a related concept by Runciman & Hammond  suggest progress towards answers by considering fundamental facts about human cognition: e.g., that working memory is limited (so you can't just reference global variables from anywhere at any time), and that there is no default sequential flow of control in humans, as there is in many programming languages.
Because there is not yet any explicitly decided architecture for PUMs, it is easier to imagine the approach being used in the course of design, rather than analysis. But the potential is there to provide a detailed language and knowledge structure which would constrain the analysis of a task more closely and helpfully than the MHP. In the meanwhile, using the PUMs concept in analysis could be a way of testing the plausibility of hypotheses about the mechanisms of task-related cognition: one could attempt to fit a task into the constraints selected, and if the performance was similar to that of a human, that could be said to corroborate those constraints.
The concern with implementing a system which represents the main features of human cognition is shared by a number of people not directly concerned with HCI, on the borderlines of AI and psychology. Young, Green & Simon cite SOAR  as the closest in flavour to their desired representation of the human processor: both they and Runciman & Hammond also mention Anderson (ACT*)  and others. Some difficulties with these for modelling in HCI will be discussed below.
Anderson's ACT*  is a much more specific implemented architecture that aims to model human cognition generally. It deals with three kinds of cognitive units: temporal strings; spatial images; and abstract propositions. Anderson does not discuss in detail the cognitive processes that convert sensory experience into these units.
There are three distinct forms of memory dealing with cognitive units: working memory, which stores directly accessible knowledge temporarily; declarative memory, which is the long-term store of facts, in the form of a tangled hierarchy of cognitive units; and production memory, which represents procedural knowledge in the form of condition–action pairs.
Factual learning is said to be a process of copying cognitive units from working memory to declarative memory. This is seen as quite a different process from procedural learning, which happens much more slowly, because of the danger of very major changes in cognitive processes, which may be produced by the addition of just one production rule.
Procedural learning is the construction of new productions, which then compete for being used on the same terms as the established productions. ACT* allows procedural learning only as a result of performing a skill. First, general instructions are followed (using established general purpose productions), then those instructions are compiled into new productions, which are then tuned by further experience in use. Compilation happens through two mechanisms: first a string of productions is composed into a single production (a mechanism called “composition”); then this composite production is proceduralised by building in the information which had previously been retrieved from declarative memory.
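The composition step can be sketched in a much simplified form, treating a production as a set of conditions and a set of actions. This is an illustrative assumption and not Anderson's implementation: ACT* productions contain variables and goal structures that are omitted here, and proceduralisation is not modelled. The example productions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    conditions: frozenset
    actions: frozenset


def compose(first, second):
    """Collapse two productions that fire in sequence into a single production.

    The composite requires every condition of `first`, plus those conditions
    of `second` that `first` did not itself bring about. It performs the
    actions of both.
    """
    conditions = first.conditions | (second.conditions - first.actions)
    actions = first.actions | second.actions
    return Production(conditions, actions)


# Hypothetical productions for saving a file through a menu.
open_menu = Production(frozenset({"goal: save file"}), frozenset({"menu open"}))
choose_save = Production(
    frozenset({"goal: save file", "menu open"}), frozenset({"save chosen"})
)
save_in_one_step = compose(open_menu, choose_save)
```

The composite `save_in_one_step` fires on the goal alone and performs both actions. Note that the sketch presupposes that `open_menu` and `choose_save` already exist, which is the difficulty raised below about where the initial productions come from.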
Anderson illustrates the operation of learning by ACT* simulating the acquisition of early language, specifically syntax. Many assumptions are made for this purpose, including the assumption that words and parts of words (morphemes) are known and recognised. More important, the meaning of complete utterances is assumed to be understood.
From this brief description, we may see that ACT* is designed to emulate human learning, among other things. However, it is very difficult to see how, for the kind of complex systems that we are considering, ACT*'s mechanism for procedural learning could work. Where would the initial ‘general’ productions come from, which would be needed to guide the initial experience?
The question of whether ACT* serves to guide a task analysis in a useful cognitive way is a separate question. Since the knowledge which results in action in ACT* is implemented in a production system, it would make sense to analyse tasks in terms of production rules, and there is no reason to suppose this is a difficulty, since this is the formalism adopted by Kieras & Polson . The problem is not that production rules are difficult to create, but rather that it is possible in general to analyse a task in terms of production rules in many widely differing ways—just as it is in general possible to find many algorithms to solve a particular problem. The ACT* model does not help to focus an approach to analysis, but rather leaves this aspect of the analysis open. ACT* does not seem to pose the right questions, or offer useful guidance, for the practical problem of analysing a task in terms that are specifically matched to actual human cognition.
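The point that one task admits widely differing production rule analyses can be illustrated with a minimal recognise-act interpreter and two hypothetical rule sets for the same task. Neither rule set is taken from ACT* or from Kieras & Polson; the task, the state representations and the rules are all invented for illustration.

```python
def run(rules, state, max_cycles=20):
    """Minimal recognise-act cycle: fire the first rule whose condition holds.

    Each rule is (condition, action, update). Stops when no rule matches.
    Returns the list of external actions produced (None actions are skipped).
    """
    actions = []
    for _ in range(max_cycles):
        for condition, action, update in rules:
            if condition(state):
                if action is not None:
                    actions.append(action)
                state = update(state)
                break
        else:
            break  # no rule matched on this cycle
    return actions


# Analysis 1: two flat rules keyed directly on the state of the text.
FLAT_RULES = [
    (lambda s: not s["selected"], "select word", lambda s: {**s, "selected": True}),
    (lambda s: s["selected"] and not s["deleted"], "press delete",
     lambda s: {**s, "deleted": True}),
]


def top(goals):
    """Return the goal on top of the stack, or None if the stack is empty."""
    return goals[-1] if goals else None


# Analysis 2: three rules driven by a goal stack, with an internal subgoaling step.
GOAL_RULES = [
    (lambda g: top(g) == "delete word", None, lambda g: g[:-1] + ("remove", "select")),
    (lambda g: top(g) == "select", "select word", lambda g: g[:-1]),
    (lambda g: top(g) == "remove", "press delete", lambda g: g[:-1]),
]

flat_actions = run(FLAT_RULES, {"selected": False, "deleted": False})
goal_actions = run(GOAL_RULES, ("delete word",))
```

Both rule sets produce the same observable behaviour, yet they differ in the number of rules, in the number of cycles, and in what they assume is held in memory. Observed behaviour alone would not tell an analyst which of the two is closer to what a person actually does.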
SOAR is a general-purpose architecture that is less directly concerned with modelling human cognition than ACT*. SOAR cannot create representations, or interact with the external task environment. While it may well be another promising model of the ideal performance of a single task, the things it cannot do include some of the crucial aspects of what an operator does in an active learning situation or (which is similar) in an unfamiliar emergency, one that is not well remembered or has not been encountered before.
What both SOAR and ACT* need, to complement them in providing guidance for task analysis, is (at least) a way of finding those production rules which best represent a particular human approach to a particular task. Analysing a task in terms of goals and rule structures is not a strong enough constraint to specify a method of cognitive task analysis.
Barnard  gives a theory of cognitive resources which he claims is applicable to human-computer interaction. He wishes to deal explicitly with the various different representations of information and the different models that are appropriate in different circumstances, and with the interaction between those models.
His theory, “Interacting Cognitive Subsystems” (ICS), postulates a model of cognition as a number of cognitive subsystems joined by a general data network. In his main diagram of the architecture, Barnard gives two sensory subsystems, acoustic and visual; four representational subsystems, morphonolexical, propositional, implicational and object; and two effector subsystems, articulatory and limb. Each subsystem has its own record structure, and methods for copying records both within the subsystem and across to other subsystems.
ICS is used to explain a number of features of cognition and experimental results, particularly concerning novice users of a computer system engaging in dialogue via the keyboard. Barnard's intention is that the ICS model should provide the basis for describing principles of the operation of the human information processing system that could be tested empirically for generality. If general principles were indeed found in this way, we would both have gained knowledge applicable to the field of HCI, and would have demonstrated the usefulness of the ICS model.
Barnard clearly states that his approach starts with the assumption that “perception, cognition and action can usefully be analysed in terms of well-defined and discrete information processing modules”. What is not entirely clear is whether Barnard is committed to the particular subsystems that he mentions in this paper. One could certainly imagine a similar model based on different subsystems, or subsystems interacting in a different way. Moreover, it is highly plausible to suppose that individuals use different representational systems in different ways. Investigating the statistical features of experimental results on a number of subjects together, in the way that Barnard reports, is not designed to show up any differences between individuals in this respect.
The concept of interacting cognitive subsystems is not dependent on a particular theory about any of the relationships between the subsystems. Indeed, Barnard gives few indications about how information is changed from one representation to another. For example, how is iconic visual information converted to propositional form? Interestingly, it could be just this sort of conversion of information from iconic to logical form that plays a crucial role in many dynamic control skills, particularly controlling vehicles. The acquisition of such a conversion ‘routine’ may be a key aspect of learning the skill.
Another unclear feature of the ICS model concerns the general data network. In computer systems, communication is made possible by agreement on common formats for data transfer and communication: networks can have elaborate standards to define these formats. According to Barnard, it is the individual subsystems that recode information in formats ready for other subsystems. Thus recoding is seen as dependent on the sender, rather than the receiver. This means that the supposed data network has to be able to convey information in any of the constituent representations. How it could possibly do this is not discussed, which is disappointing, considering that it would be a major discussion if the problem were seen in terms of the computer analogy that it invokes.
Barnard regards his theoretical framework as in some respects an enhanced model human information processor of the type proposed by Card, Moran & Newell (see above, §22.214.171.124). But whereas those authors largely report the separate parameters of cognitive abilities, Barnard is making assumptions about the structure of cognition that have not been explicitly verified. The present study does not see ICS as in the same league as GOMS, yet. A judgement on whether his theory is applicable to human-computer interaction in complex systems would have to await an attempted detailed analysis of a practical complex task using his framework.
The models reviewed here are more relevant to well-understood, more straightforward tasks than to complex tasks involving complex systems. Their starting point has generally been within the scope of cognitive psychology, where there is a dominance of experiments designed to discover facts about particular identified parts of human cognitive capabilities, rather than the way in which these abilities are coordinated to produce skilled control. The models which address the coordination, though very interesting, cannot claim a strong empirical base for the way in which they model coordination itself, however much they may rightly claim to base themselves on empirical research into the individual abilities. The core of the problem seems to be firmly tied to complexity. In §1.3.2 a case is made out for defining complexity in terms of the variety of practical strategies, hence in a complex task one would expect variation over time or between individuals. The methodology for empirically addressing the issues arising from complexity seems not to have been worked out yet in the cognitive science tradition.
This part of the literature corresponds to the models of error-prone human operators, above, §2.1.9, and to the purpose of communication of the modelled concepts. In this literature, there are observations on salient features of human cognition in complex processes, that do not directly relate either to current models of cognition, or to current methods of logical analysis of tasks. Here we find the distinctions between novice and expert styles of reasoning, and Rasmussen's distinctions between skill-, rule- and knowledge-based behaviour . We can see these as offering partial specifications of what a model of human performance of complex tasks should cover.
Many authors have stressed that much human mental activity differs between individuals and between tasks (e.g. [1, 91, 110]). The variety of individual strategies and views of any particular task has been identified by Rasmussen , who states that “Human data processes in real-life tasks are extremely situation and person dependent”. This may well have a bearing on the information requirements and priorities, and thus it should be reflected in any comprehensive cognitive task analysis.
We may here make a useful distinction between individual variation in Intelligent Tutoring Systems (ITS), and in complex system control. For ITS, it is not difficult to imagine the production of models of complete and incomplete knowledge of a domain, and expert and ‘buggy’ performance strategies (examples in ). In contrast, in complex process control it is much more difficult to define what would comprise ‘complete’ knowledge, and what (if anything) is an optimum strategy for a given task, since, although there is usually much that is defined in procedures manuals, this is never the whole story. Hence, for complex systems, it is implausible to model individual variation as an ‘overlay’ on some supposed perfect model.
In current approaches to cognitive task analysis, variation between individuals is often ignored, and analysis performed only in terms of a normative structure, often justified merely by the observation that it is plausible to analyse the task in this way. But, considering (for example) the reality of complex process control, to ignore differences is highly implausible, since there are obvious differences in the way that, for example, novices and experts perform tasks. It would surely be better, if it were possible, to construct a model of each individual's mental processes and information requirements. If this were done, a designer would have the option of designing either a system tailored to the information requirements of an individual, or a system which could adapt to a number of individuals, where the information presented could more closely match their particular strategies, methods, etc.
In any case, it could well be dangerous to specify rigid operating procedures, where there is any possibility of a system state arising that had not been envisaged by the designers, since an operator dependent on rigid procedures might be at a loss to know what to do in the situation where the rule-book was no help. If there are not rigid operating procedures, then operators will find room for individuality, and their information requirements will not be identical. Hence, in complex systems, it would be advantageous to be able to model individuals separately, and hence there is space for the development of models more powerful than current ones.
Rasmussen and various co-workers wished to have a basic model of human information-processing abilities involved in complex process control, including (particularly, nuclear) power plants and chemical process works, in order to provide the basis for the design of decision support systems using advanced information technology techniques. The analysis of many hours of protocols from such control tasks has led to a conceptual framework based around the distinction between skill-based, rule-based, and knowledge-based information processing, which has been introduced above, §1.3.1.
Although Rasmussen presents his stepladder model as a framework for cognitive task analysis, he suggests neither an analytical formalism, such as a grammar, nor any explanation of the framework based on cognitive science. In a later report , he does give examples of diagrammatic analysis of a task in terms of his own categories. However, this is neither formalised nor based on any explicit principles.
Clearly, what Rasmussen gives does not amount to a complete cognitive task analysis technique. Writing in the same field, Woods  identifies the impediment to systematic provision of decision support to be the lack of an adequate cognitive language of description. In other words, neither Rasmussen nor anyone else provides a method of describing tasks in terms that would enable the inference of cognitive processes, and hence information needs. What Rasmussen does provide is an incentive to produce formalisms and models that take account of the distinctions that he has highlighted. Any new purported model of cognition or cognitive task analysis technique must rise to the challenge of incorporating the skill, rule and knowledge distinction.
Roth & Woods  base their suggestions for cognitive task analysis on experience with numerous successful and unsuccessful decision support systems designed for complex system control. They see the central requirements for cognitive task analysis as being firstly, analysing what makes the domain problem hard, i.e., what it is about the problem that demands cognitive ability; and secondly, analysing the ways people organise the task that lead to better or worse performance, or errors. Once the errors have been understood, it should become possible to design a support system to minimise their occurrence.
The study of the central requirements of cognitive task analysis is referred back to Woods & Hollnagel . They recognise three elements basic to problem-solving situations: “the world to be acted on, the agent who acts on the world, and the representation of the world utilized by the problem-solving agent”. Before considering the interrelationship of all three elements, their proposed first step in the analysis is to map the cognitive demands of the domain in question, independently of representation and cognitive agent. This implies that any variation between cognitive agents (people, computers, etc.) will not feature in this first stage of the analysis. We may expect this to capture any necessary logical structure of a task, but this is not the cognitive aspect in the sense related to actual human cognition. The only variation they allow at this stage is between different “technically accurate decompositions” of the domain. For these authors, “determining the state of the system” is an example of this kind of cognitive demand. The difficulty with this mode of analysis is, given the possibility of variant human strategies, and thence representations, that the human description of the ‘state of the system’, and the method for determining it, may indeed vary both with the individual operator, and with the representation that they are currently using.
To try to retreat into technical descriptions at this point is only further to sidestep the cognitive issue, and evade the question of whether there is any cognitive analysis at all that can be done independently of the agent and the representation. A possible reply might be that it is necessary to abstract from the influence of the agent and the representation in order to make progress in task analysis, since these factors are difficult to capture: however this does no more than beg the question of the possibility or ease of finding out about the agent and representation—and that question has not been opened very far, let alone closed.
Inasmuch as some analysis is done without reference to the agent and the representation, we could see this as compromising the cognitive nature of the analysis, taking it further away from a full treatment of cognitive issues, back towards the a priori formalisms of §2.1.3. Woods & others' approach still has some advantage over such formalisms, by taking the operational setting into account, but this trades off against simplicity, and means that their approach is harder to formalise.
In essence, these authors' approach to cognitive task analysis falls short of a detailed methodology, because there is still uncertainty about how the domain problem is to be described in the first place. Of course, for systems which are not especially complex, there may be a certain amount of cognition-independent task analysis based on the logical structure of the task. For more complex tasks, it is difficult to see how cognitive aspects of the task could usefully be analysed without a concurrent analysis of the actual cognitive task structure that the operators work with.
Holland et al.  present a detailed account of rule-based mental models, which they see as incorporating advantages from both production systems and connectionist approaches. The mental models of the world are based on what they call quasi-homomorphisms, or q-morphisms for short. In essence, any particular way of categorising the world gives rise to categories which are abstract to some degree, by leaving out some of the detailed properties of real-world objects. For instance, to give an extreme example from their book, we could categorise objects simply into fast-moving and slow-moving. These categories behave in certain more-or-less uniform ways, so that on the whole we can predict the future state of an object by classifying it, and applying a general rule for how that class of objects behaves. Thus, on the whole, fast-moving objects, after a while, become slow-moving. If we then look at the real world at a subsequent time, we might notice that some of the things that we had predicted were wrong: i.e., our categorisation was not sufficient to predict the behaviour of all the objects in which we were interested. For example, wasps, members of the class of fast-moving objects, mostly retain their fast-moving quality over time. This makes the mapping between real world and model a quasi-homomorphism, rather than a plain homomorphism, which a faithful many-to-one mapping of world to model would be. (This also contrasts with an isomorphism, which is a one-to-one mapping.)
If the person with the model is concerned to be able to predict more closely things that happen in the world, they can introduce another categorisation to deal with the exceptions from the first one, and this process can be repeated if necessary. Of course, the categorisations have to be based on detectable differences, but there are no hard-and-fast properties which always serve to classify particular objects. In their example, the further categorisation in terms of size (small/large) and stripiness (uniform/striped) might serve to distinguish wasps from a few other cases of fast objects that do indeed slow down over time. The way that things are classified depends on what the purpose of classification is, and can rely on feature clusters rather than specific predefined features alone.
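The fast/slow example, with its refinement by size and stripiness, can be sketched as a default rule plus an exception rule. This is only an illustration of the idea and not Holland et al.'s formalism: the speed threshold, the object descriptions and all function names are hypothetical.

```python
def categorise(speed):
    """Coarse model category; the threshold of 5 is an arbitrary assumption."""
    return "fast" if speed > 5 else "slow"


def predict_default(obj):
    """Default rule: after a while, everything is slow-moving."""
    return "slow"


def predict_refined(obj):
    """Exception rule layered on the default: small striped fast things stay fast."""
    if categorise(obj["speed"]) == "fast" and obj["small"] and obj["striped"]:
        return "fast"
    return predict_default(obj)


def mispredicted(model, world):
    """Names of objects whose later category differs from the model's prediction."""
    return [obj["name"] for obj in world
            if model(obj) != categorise(obj["later_speed"])]


# Hypothetical observations of three objects at two times.
world = [
    {"name": "ball", "speed": 9, "later_speed": 0, "small": True, "striped": False},
    {"name": "wasp", "speed": 8, "later_speed": 8, "small": True, "striped": True},
    {"name": "stone", "speed": 0, "later_speed": 0, "small": True, "striped": False},
]

exceptions_before = mispredicted(predict_default, world)
exceptions_after = mispredicted(predict_refined, world)
```

With the default rule alone the wasp is an exception, which is what makes the mapping only a quasi-homomorphism; adding the second categorisation removes that exception for these three objects.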
Their theories are particularly interesting because they extend to giving accounts of the fundamental processes of inference, learning and discovery. This could provide a very powerful model in the long run, because if we can describe the ways in which people learn about complex systems (with any variations in such ways) we would be in a stronger position to model the knowledge that they actually have at any time. Also, having a theory of how knowledge can arise lends credence to the model of knowledge itself, and provides a composite model which is stronger than a model which speculates only on the nature of the knowledge that already exists. We could describe these theories as mental meta-models, since they set out to describe the processes that underlie the creation of mental models in the human.
Moray endorses the work of Holland et al., and gives a different angle on what a process operator knows. He suggests that an operator's model includes an appreciation of the plant as made up from subsystems which are more-or-less independent, and whose interactions are known to the operator to an appropriate degree of accuracy. He goes on to suggest that decision aids could well be based on a view of the plant consistent with the operator's model: he says that there are ways of predicting what a likely decomposition of the plant into ‘quasi-independent subsystems’ may look like. Of course, there may at times be a need to look at the state of the plant in more detail than usual, particularly when faults occur or errors have been made. A good decision aid would support presentation of information at the appropriate level of detail.
Moray furthermore claims that there are methods which could automatically discover the plausible subsystems of a complex system. This would mean that cognitive task analysis could proceed by identifying these subsystems, and using them as basic terms in the language describing the task from a cognitive point of view. The same subsystem structure could also be used as the basis for organising the information to be displayed to the operator.
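The methods themselves are not described here, so the following is only one way in which such a discovery procedure might look, and should not be read as Moray's. It assumes that the strength of interaction between pairs of plant variables is known, and groups together the variables linked by interactions at or above a threshold; the variable names and coupling strengths are hypothetical.

```python
def quasi_independent_subsystems(couplings, threshold):
    """Group variables that are linked by couplings at or above the threshold."""
    neighbours = {}
    for (a, b), strength in couplings.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if strength >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Collect everything reachable from this variable through strong links.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical coupling strengths between plant variables (0 = none, 1 = strong).
couplings = {
    ("feed_pump", "tank_level"): 0.9,
    ("tank_level", "outlet_valve"): 0.8,
    ("burner", "steam_temp"): 0.85,
    ("tank_level", "steam_temp"): 0.1,
}

coarse_view = quasi_independent_subsystems(couplings, threshold=0.5)
detailed_view = quasi_independent_subsystems(couplings, threshold=0.05)
```

Lowering the threshold merges the two subsystems into one, which corresponds to looking at the plant in more detail when the weak interactions begin to matter, as they may when faults occur.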
We should note that Moray offers no evidence about the extent to which actual models of operators match up with the methodical analyses. Furthermore, it is not clear to what extent individual operators can or do develop their own models, differing from those of others. It would be surprising, though far from incredible, if a methodical analysis could show up a range of possible subsystem decompositions to match a range of individual models. If these questions are taken as challenges to further work, particularly towards discovering actual human mental models and representations, Moray's suggestions could be forerunners of a task analysis approach that was cognitive without being arbitrary or irremediably intuitive.
These approaches give some substance to hopes that there could be decision aids, and more general systems design methods, based on a much richer picture of mental models. These models could be scientists' models of users' or operators' models of a system, based on more general theoretical models of the users' or operators' cognitive processes.
It is patently obvious that people reason about the world, in a reasonably effective manner for everyday life, without recourse to detailed theory of, for example, physics. Much the same could be said of process operators: they manage to control complex systems by using knowledge which appears not to be equivalent to the engineering knowledge which went into aspects of the design. It seems reasonable to suppose that this knowledge is qualitative, rather than quantitative, and if we knew more about it, we might be able either to design decision aids which interact more effectively with the operator's knowledge, or to apply the principles learned towards the better design of human-machine systems.
The literature on qualitative models and reasoning (e.g. [37, 38, 69]) does not deal with HCI questions. Currently, it attempts to provide a model of the knowledge and reasoning processes which could support the kinds of commonsense reasoning that we know in everyday life.
In modelling commonsense reasoning, this approach implicitly tries to capture some of the underlying common knowledge that we have about the way the world works. This can be seen not as propositions, but as the framework which binds together the factual content in these models. While none of the authors here claim to model more than a fraction of commonsense reasoning, this approach at least offers a start where most others do not begin.
Qualitative modelling is not immediately very useful for the systems designer, since the methodology it provides is open-ended, but it has the advantage that for some of these systems (e.g. Kuipers' QSIM [70, 71]) the time development of the model has been implemented, so that if a researcher or analyst provides a qualitative description of a system, they may get qualitative predictions of the future possible behaviours of the system. Using this to model human reasoning would mean that the model predicts what the human should be able to predict, and thus we could say something about how the human should reason, based on their current knowledge.
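What it means to get qualitative predictions of possible future behaviours can be shown with a toy example. This is not QSIM and does not use its algorithm or representation: it assumes a single tank whose level takes one of three qualitative values and whose trend is rising, steady or falling, and all names are hypothetical.

```python
LEVELS = ["empty", "partial", "full"]


def successors(level, trend):
    """Possible next qualitative states (level, trend) of the toy tank."""
    if trend == "steady":
        return {(level, "steady")}
    step = 1 if trend == "rising" else -1
    j = LEVELS.index(level) + step
    if j < 0 or j >= len(LEVELS):
        # At a limit the level cannot go further, so the trend becomes steady.
        return {(level, "steady")}
    result = {(LEVELS[j], trend)}
    if level == "partial":
        # Without magnitudes we cannot say when the interval will be left,
        # so remaining in it is also a possible next state.
        result.add((level, trend))
    return result


def behaviours(state, steps):
    """All qualitative state sequences of the given length starting from state."""
    if steps == 0:
        return [[state]]
    return [[state] + rest
            for nxt in sorted(successors(*state))
            for rest in behaviours(nxt, steps - 1)]


paths = behaviours(("empty", "rising"), 2)
```

The prediction branches because the qualitative description leaves magnitudes unspecified: after two steps the tank may still be partly full or may already be full.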
But it can be difficult to express the functioning of a system in qualitative terms that lead to sensible predictions. The degree to which these models actually correspond to human thought processes is unknown. Hence this approach does not yet provide a practical means of modelling the operator controlling a complex system. It has, however, been used to model some systems as a preliminary to performing exhaustive analysis of what faults could occur in qualitative terms .
We could say that qualitative reasoning models are highly idealised models of people's knowledge and reasoning about systems.
This (see ) is an AI approach to describing common-sense reasoning in terms of formal logic. It attempts to capture something of the essence of default reasoning, and therefore could potentially be applied to mental modelling, possibly yielding insight into knowledge and inference structures that people use when controlling complex structures. However, this approach has not been applied yet to HCI issues, and its concentration on formalisms familiar to logicians tends to indicate a lack of concern with modelling human thought processes and structures, beyond a correspondence between the output of the human and of the model.
Here we have a set of considerations that are complementary to those arising from formalisms and models of cognition. Whereas some of the formalisms are well-developed, but not closely relevant to complex tasks, here we have a collection of points that are relevant to complex tasks but not well-developed.
Surprisingly little seems to have been written in detailed direct criticism of current mental modelling and cognitive task analysis approaches. The first paper discussed below gives one way of comparing a number of mental model and cognitive task analysis approaches. The four papers discussed next give the most direct criticisms though they are critical of different things. This is then followed by general criticisms that have a principled nature, somewhat in complement to the direct criticisms that have been offered in situ in the review section above, §2.3.
When a researcher devises a cognitive modelling method, there are bound to be some features that are more important in his or her mind than others. We would expect this to lead to the observed situation in the literature, where different models tackle the problem of modelling human-computer interaction in different ways, and from different viewpoints. In that there is no single ‘best’ model, there are likely to be trade-offs between different features, and for this reason, it makes sense to criticise modelling techniques only from stated viewpoints or with particular purposes.
A paper by Simon  gives us an analysis of cognitive models in HCI based on the trade-offs as he sees them. He considers a representative selection of models:
Figure 2.1: The two continuous dimensions of Simon's trade-off space
The diagram that is central to his analysis gives two continuous dimensions (Figure 2.1) and two discrete dimensions, given here with the models that Simon assigns to each category.
This void at the bottom of the diagram would be filled in properly only by some model that managed successfully to formalise the principles of human cognition that are relevant to HCI. Models that aspire to fill this gap must, among other things, justify why they are relevant.
The second of the continuous dimensions is displayed from left to right on Simon's diagram. When this dimension is seen in terms of knowledge operationalisation, the left is a high representation of knowledge, i.e., the knowledge that users employ is laid out rather than being recast into abstract form (in terms of memory, for example); when this dimension is seen in terms of parameterisation, the right means that there are more explicit parameters governing mental performance, such as decay rates of memories. Perhaps this trade-off is a temporary phenomenon: symbolic representation may be incompatible with parameters only because no parameters have yet been devised for higher-level mental operations.
Indeed, the search for higher-level parameters could be seen as one of the higher-level goals in cognitive research for HCI, in that it would be very helpful for system designers. Perhaps a trade-off analysis would be more successful when the field of HCI is more mature, with more clearly defined issues.
Rips  takes issue with some of the models which fit into the category of “more general mental models” in §2.1.5, and which serve the purpose of aiding the understanding of (mostly) psychologists studying cognition. The criticisms are levelled from the point of view of a psychologist or a philosopher, not a systems designer.
Rips suggests that by taking the idea of mental models too literally, some authors end up claiming that models help with questions of semantics and reference; some claim that the idea of ‘working models in the head’ helps to explain how people reason about the everyday world; and some claim that people perform everyday logical inferences using mental models rather than internal logical rules.
The important point which Rips makes is that the concept of mental models does not suddenly solve philosophical or psychological problems. If we can be described as possessing mental models, then it is certain that the possession of them does not give us any privileged access to truth about the world, and mental models as explanations have no a priori advantage over other, previously established theories.
Rips acknowledges the usefulness of considering “perceptual representations” as well as symbolic representations, and if that is all that is meant by ‘mental models’ then he is quite happy with it. He also recognises that many writers use the idea of a mental model simply as an explanatory device, without any great ontological implications. The focus of theorising in the mental models fashion should be to explain the phenomena under consideration, rather than to gather support for the existence of theoretical entities; and just because a particular explanatory framework fits certain observations, this should not lead researchers into making assumptions about the status of the theoretical entities that are part of the model.
Lindgaard  delivers a brief attack of mixed quality on the idea that mental models can be used by system designers. It is claimed that mental models are of their nature subjective and individual, and that claim, although it contains a lot of truth, ignores the possibility that the common points between individuals' models may be of relevance and interest; and then it is stated baldly that “a good user interface can be designed entirely without reference to mental models”: hence the idea of mental models is “somewhat irrelevant” to system designers.
While it cannot be denied that it is possible to design an interface without explicitly thinking about mental models, one is led to wonder whether the author would also claim that it is possible to design a good interface without considering the information needs of the user, or the possible ways that the user may respond to information presented. These questions are questions about the designer's model of the user, whether explicit or implicit, whether consciously acknowledged or not. As we have seen in previous sections, many approaches to mental models have a bearing on these matters: whether it is by providing formal simplified approximations to the user's behaviour, or by discussing the digested fruits of experience working with operators controlling systems, or just by attempting to elucidate the mental workings behind the person's activities. Mitigating this over-general and sweeping criticism of the present state, Lindgaard implies that more research and development at a theoretical level could bring mental models to a point where they could be used in practice.
Booth  offers a much more detailed critique of the usefulness of a more closely defined class of model. In the context of the quest for the best understanding and modelling of the user, he casts doubt on the usefulness of predictive grammars within the design and development process. As an example of a predictive grammar, he focuses on TAG , which has been described above (§126.96.36.199). However, grammar models are not the only kind of predictive models that have been produced, and we will here discuss Booth's criticisms with respect to formal methods in general, realising that this may entail us arguing against something that, strictly, Booth does not claim, but in any case it is more relevant to us here than discussing grammar models alone. Since Booth lists eight specific criticisms, let us discuss them one by one.
As mentioned above for GOMS, in all the formal methods, what is taken as an elementary action is to some extent arbitrary. There is no established method for deciding what a human takes as being elementary, which itself may not be a stable quantity. Different grains of analysis would produce different analyses. This indeed means that in any model produced, the grain of analysis may not match that of the user, and so the model is in danger of failing to represent the user's model accurately. It is a commonly acknowledged potential failing of formal methods, that they give no guarantee of accurately representing the user's model . There still remains the question of whether formal methods, using a common formalism for the various systems, may provide a useful comparison of complexity even though this may not be identical with the user's true cognitive complexity. The point is that the ordering of complexity may be similar even if the actual measure of complexity differs (consistently) between human and formal analysis. Booth is content to conclude that the grains of the formal model and the human are unlikely to match.
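The argument about ordering can be made concrete with a small sketch. The figures are invented purely for illustration and are not measurements: `formal` holds hypothetical rule counts for three designs, and `human` is assumed to be a consistent (monotonic) distortion of those counts.

```python
def ranking(scores):
    """Design names ordered from least to most complex."""
    return sorted(scores, key=scores.get)


# Hypothetical formal complexity scores, e.g. numbers of rules.
formal = {"design_a": 12, "design_b": 20, "design_c": 35}

# Assumed human difficulty: a different scale, but a monotonic function of the score.
human = {name: 3 * count ** 0.5 + 1 for name, count in formal.items()}

same_order = ranking(formal) == ranking(human)
```

The two measures disagree on every individual value and yet rank the designs identically. If the distortion were not monotonic, the rankings could diverge, and it is then that the mismatch of grain would become a practical problem for comparing designs.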
A user or operator brings prior knowledge and experience to a task, along with expectations and patterns of meaning and significance. This could be expected to vary from person to person, and for one individual it could vary across different situations and even across different times for the same situation and person. Booth points out that this means that there is no single definition of consistency for all situations. In terms of TAG, the abstract “features” used to give structure to the task are not fixed. This supports Reisner's critique  given above (§188.8.131.52), and also discussed below (§2.6.2).
Whether this is counted as a failing of formal methods depends on one's point of view. It is true that the formal model may not reflect all the varying views on consistency, but there are potential ways of circumventing the difficulties that arise from this. A system designer could propose the incorporation of training attempting to ensure that people shared the same understanding of the features of the task. And even if this was not possible, this would not prevent any variant sets of features from being worked into the formalism, if the analyst could discover what they were.
The next round of questions could include “should people be forced to be consistent with themselves, and with others?” and “how good or bad are various formal models at allowing the description of natural human inconsistency?” Booth does not address these.
Booth argues that human actions are based on that human's specific knowledge, which may be sufficiently different from the background knowledge of the designer or the analyst to prevent reliable prediction of the user's behaviour. He suggests that the actual knowledge used in practice is not practically amenable to formalisation into cognitive grammars. In what can be seen as a related point, Suchman  has argued that the situation in which an action occurs is crucial to the interpretation of that action. It is the very independence of formal models from their contexts that persuades both Suchman and Booth of the models' inability to account for action in real life (as opposed to laboratory experiment).
While this may be an argument against current formal methods, advocates of formal methods may regard it as a challenge to extend their formalisms to be able to take into account more background knowledge and context-dependency. It would be very difficult to argue convincingly that formal methods were in principle incapable of this, though it might be seen as unlikely that something as specific as a grammar would have much chance. Moreover, the more routine a situation is, the more the relevant world knowledge is likely to be a known quantity, and hence formalisable. This argument of Booth's looks more like an argument of degree than an argument of principle, liable to progressive erosion as the tendency to incorporate context-dependency and background or real-world knowledge into formal models develops.
One of the points of the formal methods mentioned above was the possibility of avoiding having to use prototypes. There may be reasons why user involvement is impractical, undesirable, or even impossible, whether for reasons of time and money or otherwise. It would be these situations where we would expect formal techniques to come into their own.
The essence of this criticism is that design does not in practice proceed by means of construction of a number of fully-fledged and specified designs, to be tested against each other competitively. Formal methods appear to assume that this is the context for their use, since their main claim to usefulness is to provide the capability of comparative evaluation.
This is quite probably a valid criticism. But this does not prevent the possible usefulness of formal methods, particularly in cases where for some reason comparative evaluation is wanted, for example if there is no experience of a certain kind of design being preferred for a certain system. In this or other ways, formal method studies could still inform design decisions, even if they are not carried out in most instances of design.
Here Booth makes the point, that in the early stages of design, the assumptions being made about the system may not remain valid as the details are worked out, and as changes of approach are made to circumvent newly discovered difficulties. This calls into question the usefulness of formal studies based on unreliable assumptions.
However, it is also possible to envisage formal methods playing a part in that very process of revising one's assumptions. Assessment by a formal method could reveal an unenvisaged area of unacceptable complexity, which could prompt the designer to turn the design towards more reasonable options.
It would appear that designers find current formal methods daunting, and Booth doubts whether even the originators of formal methods could use them to describe real complex systems.
Possible solutions to this could involve either the education of designers, so that they were familiar with and competent in the methods, or better formal methods, which are easier to use. None of this would be to any avail, however, unless the concerns of designers are actually addressed by the formal methods. It is possible that influencing designers' concerns (or maybe those of their management) might help formal methods to gain acceptance and use.
This criticism compounds the previous one: current formal methods do not offer evaluation of all areas of concern, therefore a designer would have to learn more than one, with all the resultant effort to master a number of methods that are already seen to be too complex individually.
The validity of this criticism depends on whether or not one identifiable method by itself can provide useful results. It may be that for many design problems, one method (possibly different in different cases) may be able to do the needed evaluation, because there may be one issue in the design which is salient, and obviously more important than the others. None of the authors studied here would claim that formal methods provide a complete paradigm for system design, which has to be followed the whole way, though they differ in the degree to which they see formal methods as helping or guiding the process of design.
Taking in the spirit of all of Booth's criticisms, one could retort that it may well be that there are aspects of systems design which could be helped by formal methods in some circumstances. Fancifully, we could see Booth as providing us with an armoury for a joust against formal modellers, complete with suggestions as to how the weapons might be used; but Booth does not see it as his business to force the model knight off his white charger, out of his armour, and back into scholar's rags. In the course of this fray, I have suggested that there will be a second round, after both contestants have been unhorsed, thrashing out the possible lesser claims of formal models.
This is not to argue that current methods are actually substantially and practically helpful. It is just that the questions have not been argued in conclusive ways. There are further general arguments given below (§2.4.6–§2.4.9), intended to strengthen the case against the adequacy of current formal methods in the modelling of complex dynamic systems.
Bellotti  gives reasons, gathered from studying real design projects, why designers do not use what she calls “HCI design and evaluation techniques” (which include GOMS and TAG). This is directly a criticism of current design practice, and only indirectly a criticism of the modelling techniques themselves. The main reasons discovered are that there are unavoidable constraints (as well as some avoidable ones) inherent in the commercial systems design environment, which mean that current HCI techniques are not appropriate. Bellotti suggests that future design and evaluation techniques should bear in mind these constraints in their development. One might suggest, even, that HCI authors should take a little of their own medicine by considering the usability of their proposals. She gives the following as a summary of constraints.
As is obvious, and has been pointed out in several places above, people often do things in different ways. Similarly, it is not surprising that there are usually many ways, in principle, to construct a model which does the same overall task, i.e., produces the same overall output from the same overall input. Because many tasks which humans perform are difficult to emulate by computer, the focus of some AI research has been simply to get any one emulation running effectively, that being a major achievement.
But if we are content with a model only of the overall performance, we may fail to model many aspects of human performance, because we cannot know to what extent such an overall model works in detail like a human until we have specifically tested its similarity. It is particularly important in the study of control of complex systems to know about errors and their origins; and to be able to predict human errors, a model must faithfully simulate not only the overall input and output of the human operator, but also all the separate intermediate processes that are liable to error. Rasmussen  makes it clear that for his purposes, a model must not only be shown to work, but its points of failure must be investigated before we can be said to know how good that model is. In modelling the human operator, a model is more successful the closer its errors correspond with errors of the human.
Despite these observations, some authors seem to have fallen for the temptation of assuming that, just because their model works, producing the right output in a plausible way, there is no need for further discussion of its suitability for modelling human cognition. For example, Woods & Roth [148, p.40] are satisfied with a model that provides a “psychologically plausible effective procedure”. The trouble is that, if one has sufficient imagination, there may be many plausible effective models, which may differ radically from one another. They cannot all be equally good as models of the human.
As mentioned above (§184.108.40.206), the CLG model assumes that analysis in terms of certain ‘levels’ reflects the structure of human knowledge. Presumably Moran does this because it is plausible (to him) and it produces the right kind of results. There is no obvious justification why his levels should be the only right ones, nor does he give any reason for supposing that they are.
Authors using production systems in the execution of their models generally do not justify the appropriateness of production systems for modelling human mental processes. A notable exception is Anderson  who is concerned with the realism of his model in describing the mechanisms of human cognition and accounting for many different experimental observations from human cognitive psychology. All this shows, however, is that a production system is adequate or sufficient, not that it is necessary nor that it is the best way of modelling cognition. Anderson describes other production systems which are far less well matched to the realities of the human. And there is no reason to suppose that the same insights could not be captured through the use of some quite different formalism, suitably restricted, for example first-order logic, or the kind of language envisaged in the PUMs study .
When authors fail to discuss the appropriateness of their modelling formalism or program for representing actual human mental processes, one can only assume that either they do not think it is important (an odd view, considering the obvious nature of the remarks above), or that they have failed to notice and question a tacit assumption, that there is only one way in which an intelligent system can do the task that they are investigating. The same tacit assumption seems to be in play when failing to consider possible variation between individual humans.
It may be that some of the perceived difficulties with mental models, as well as their lack of generality, stem from their lack of theoretical rigour. It is easy to see how a lack of theoretical foundation could pose problems. As remarked above, there is often plenty of room for doubt about how well a model matches the relevant characteristics of what is being modelled. In the absence of theoretical argument, the problem of validating a system is merely moved, rather than solved, onto a problem of validating a mental modelling or task analysis technique.
One may at first suppose that a particular technique could be validated, after which it could be taken as reliable: the trouble is that there is no theory to suggest what range of other applications are sufficiently similar to the validated one to benefit from the validating exercise. For example, suppose that we had shown the validity of the TAG model for a range of word processors. Could we rely on its results for another new kind of word processor? for a speech-driven text system? for an intelligent photocopier? for a business decision aid? The problem is not just that the answers might be “no”. It is also that there is no sound basis for judging. It is not therefore surprising that available mental model techniques are not reckoned to be very good value.
Essentially, models not based on firm foundations of theory are less useful to systems design than they might be because they make little impression on the problem of validation.
It may be that formal methods work best on systems simple enough that the formalism can be applied without much ambiguity or room for choice. As much has been suggested above (§2.3.1), with reference to text editors. The tasks for which we use text editors are uncontroversial, in that when we have made up our mind exactly what to change, the editing process has a result which is clearly specified, even if (as is usual) there is more than one way of performing such a task.
We may imagine a system, perhaps many years in the future, which interacts by means of voice and touch screen, so that the author of a document could edit it with no more effort than it would take to tell someone else what to change. There would no longer need to be a task of text editing (as in the studies discussed above) distinct from the author's task of marking revisions.
We might speculate that the same end may befall many tasks that are easily formalisable, leaving humans in the long term with jobs that call for such things as judgement, insight, intuition, or whatever can only be formalised by making many arbitrary assumptions: for example, deciding what changes to make in a document. Formal methods have up to now not been applied to such tasks, so it is reasonable to suppose that those methods as we see them now would be unreliable on such tasks. Easily analysable tasks, on the other hand, should yield to automation or redefinition sooner or later. There need to be developments in the methodologies of formal analysis to deal somehow with this range of choice and individuality, before they could be applied to systems where there is room for human choice: both choice between methods of the performance of tasks, and choice between methods of application of the formalism to the analysis of the system.
Many authors remark on the dissimilarity between the performance of the learner and of the expert; and between their supposed respective mental models. Expertise comes with practice and experience. But in any complex system, the expert is only practised at the range of systems or states that are encountered frequently. When that range is exceeded, they are no longer experts by virtue of knowing what to do, from experience, in a particular situation. If anything, their expertise is then based on the relevant aspects of their mental model of the plant and process, and consists of some kind of problem-solving ability. In Rasmussen's terms, the mental processing moves to the knowledge-based level, and here operators may not have much advantage over others whose knowledge of the plant does not come from the experience of operation. Working at the knowledge-based level is usual for people learning to operate a system, so any information system should be prepared to treat the operator more like a learner at the appropriate times; and this is important because these situations are often the ones in which accidents occur. Differences between individuals have a bearing here, in that different operators may develop different areas of particular skill, and these differences need to be taken into account if we are to be able to model their expertise accurately, in the kind of way necessary if we are to model their proneness to errors, or know anything about the variation between their individual information requirements.
Rasmussen's ‘stepladder’ model of mental processes  (above, §1.3.1) is one attempt to characterise this issue. But there are no current attempts to make a formalisable or predictive model that captures the same insights.
A number of authors have expressed what they think would make a good mental model, though for obvious reasons they cannot say exactly how this would be achieved, or indeed whether it is in principle possible.
Green, Schiele & Payne  give six tentative criteria (here abbreviated) that they think a formalisable model should meet in order to be directly applicable by systems designers.
Models that would be currently appropriate, according to Hollnagel , would be those that describe intentions, goals, plans, strategies and the human ways of thinking about these.
Sheridan & Hennessy  suggest both qualitative and quantitative models of the various parts of the human-machine system, which would then give to the systems designer the goal of harmonising these various models.
In giving these desiderata, these authors either state or imply that no model currently comes up to all of these standards at once, and they also imply that they do not see any model coming up to those standards in the near future. This points towards three needs: to develop theories that are currently still in the realm of speculative psychology into formal implemented models; to broaden current formal theories to take into account more of the realities of human cognition; and to package them into a form that matches the needs of their users, the designers.
A model of a human operator which could be combined with a task description, and from that predict what information the operator would need, at what time, would make the task of the designer much easier. For whereas in the simplest systems it is possible to present all relevant information all the time, in a complex system this is not practical, and thus the designer needs to know about the operator's information needs. In order for the designer to know how to prioritise information for the operator, the model of the operator needs to tell the designer much about the presumed mental processes in progress in the operator. Such a model would at least have to cover the operator's representations both of the task, and of the tools by which he or she is going to perform that task; and the operator's ways of combining together and processing that information.
Needs are perhaps only the converse of desires, but the points that can be made about the general deficiencies in mental models differ somewhat from the desiderata.
Woods  states that the problem in designing decision support systems is “the lack of an adequate language to describe cognitive activities in particular domains”. The formalisms reviewed above are too idealised for this task, as they rely on unsubstantiated assumptions. In the opinion of Hollnagel & Woods , “practically every attempt to make a formal description of a part of human activity” fails to recognise that one cannot formalise human activity on the same basis as the logical models of the physical world.
The models of cognition are off target, dealing with cognition at the wrong level. Suchman [134, p.178] characterises the research strategy in cognitive science as first representing mental constructs, then stipulating the procedures by which those constructs might determine action: this is seen as relying on an explicit enumeration of the conditions under which certain actions are appropriate. Suchman wishes to “explore the relation of knowledge and action to the particular circumstances in which knowing and acting invariably occur.” Suchman goes on to say that the way each person interprets and uses particular circumstances depends on the particular situation. Her views tally with the recognition that skilled behaviour does not have the explicit nature of problem-solving tasks, such as the involvement of detailed stable plans. The AI view of planning is also regarded as inadequate for characterising human cognition in HCI by Young & Simon. They suggest an approach based on partial, rather than complete, plans.
The literature seems to be saying that neither formal nor more general mental models are able to deal with the particularities of real complex tasks. Further, if we are to model an operator in a situation where there are a number of concurrent tasks, this would obviously compound the modelling problem with the need to model the tasks separately, and it may also introduce the added need to model the way in which the operator allocates attention and effort between the various tasks. All this underlines what may be regarded as the central message from the literature, that there are currently no experimentally derived and tested models of human performance in complex control tasks.
The problem we are left with, having reviewed a selection of the current literature, is two-edged. On one side (§2.3.3), there is the literature concerned with communicating important insights into the mental processes that go on in complex control. The concepts here (such as the skill, rule, and knowledge distinction) make good sense intuitively, and it is easy to see their importance and relevance to the subject. However, it is not clear from the literature whether these concepts could be formalised, and if so, how. This requires moving towards more formalism, attempting to detail how these general concepts are realised in individual humans.
On the other side (§2.3.1), there are the formal task analysis and modelling techniques, from which can be built up structures that are plausible as models of human cognition, but which fail to connect with the concepts that are seen as central to the realities of process control. Here we need to move towards the right structure, by finding out how to formalise the concepts that we are really interested in. Thus both sides are seeking to answer the same questions. What concepts, or mental structures and processes, do humans actually use in these complex tasks? How do humans represent systems and tasks internally? The goal of both sides is to enable the building of predictive models which are also relevant.
But if this is a representation problem, it is not closely related to the way in which representation is treated in many studies, particularly some of those from Artificial Intelligence. There, the problem is more usually how to represent the structure and interrelationship of concepts that are given—they are already part of human knowledge explicitly. Amarel , for example, discusses different ways of representing the ‘missionaries and cannibals’ puzzle, while Korf  discusses the Tower of Hanoi and other problems. The present study, in contrast, is looking for concepts that are not already explicitly known or defined. The human ability to create new concepts to structure an area of knowledge or experience has not been well analysed or explained in the literature reviewed.
How is it that formalisms alone do not solve the representation problem? We have already noted above (§220.127.116.11) how in TAG the way the analysis is done relies on the intuition of the analyst to select features that have psychological validity. Payne & Green fear that there is nothing an analyst can do to ensure that ideal of psychological validity, rather than being bound by his or her own intuitive preconceptions about the task. The present study would agree that nothing a priori can be done to check the validity of a representation, but that experimental evidence may be able to bear on this, in a way not anticipated by Payne & Green.
The difficulties may be even easier to see with GOMS. Card, Moran & Newell [20, p.224–5] give a GOMS analysis of the use of a text-editing system, BRAVO. We shall consider this here in more detail than above, §18.104.22.168.
There is a single overall ‘Goal’, to edit the document. Whether this is the only goal might have an effect on the way in which the task is done: if the user was interested in the content of the document, or particularly hurried or at leisure, one could imagine differences in performance. But let us for the moment ignore these factors. At the other end, that of ‘Operators’, clearly, the system that is being used ultimately defines a set of actions that can be performed with it. This would be unequivocal at the level of the keystroke.
It is the intermediate levels where most room for doubt exists. This is the realm of the ‘Method’, and for each method given in their analysis, one is tempted to ask, are there any other ways of doing this? Because the domain is a relatively straightforward one, the answer will often be that there are no sensible alternatives. For example, for the goal “Acquire-unit-task”, “Read-task-in-ms-method”, which is simply to “Get-from-manuscript” seems reasonable enough. But then, how about unmarked spelling errors that are seen in passing? Maybe one is prohibited from dealing with these, if it is a restricted enough working environment.
But how about the goal “Select-target”? The “Zero-in-method” given for this seems rather contrived. The problem is not in the way in which it is written out, but in the content. The method implies a series of approximate targets, which are pointed to on the way to finding the actual target (the goal “Point-to-Target”). One is left with the feeling that perhaps what is given is plausible, but is it possible to specify the method to that extent, without alternatives? What are these “approximate” targets?
Even if one was happy with the way a particular method was specified, there is the question of alternatives. For the goal “Point-to-Target” there are five methods quoted. How would one know whether this list was complete? One possible approach, avoiding the need to specify, is to say that if another method is found, it would be included, and selection rules added. This may be a model of the way people refine their task performance, but it does not help us to model the behaviour of an operator who has already achieved a refined skill, much of which is beyond the level of conscious expression.
One should be careful to distinguish questions of method, which are questions about the representation of actions, from questions about selection rules (for which there are again five for the goal “Point-to-Target”). Simply changing a parameter in a selection rule does not alter the representation language needed for the complete analysis, but introducing selection on the basis of a new criterion would be changing the representation, i.e., it would be altering how the situation would have to be perceived by the user to use those selection rules.
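The distinction can be sketched in code. Everything named below is hypothetical: the method names, the features `distance_lines` and `target_visible`, and the threshold are invented for illustration and are not the methods or selection rules listed by Card, Moran & Newell. The first rule can be re-tuned by changing a parameter without touching the description of the situation; the second rule cannot even be stated until that description gains a new feature.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Situation:
    """What the user must perceive in order to apply the original rule."""
    distance_lines: int  # hypothetical feature: lines between cursor and target


def select_method(s: Situation, threshold: int = 10) -> str:
    """Selection rule with one tunable parameter (the threshold)."""
    return "scroll-method" if s.distance_lines > threshold else "direct-point-method"


@dataclass(frozen=True)
class RicherSituation:
    """The representation needed once a new selection criterion is introduced."""
    distance_lines: int
    target_visible: bool  # hypothetical new feature the user must now perceive


def select_method_new_criterion(s: RicherSituation, threshold: int = 10) -> str:
    """Selection on a new criterion; the old rule survives as a special case."""
    if not s.target_visible:
        return "search-method"
    return select_method(Situation(s.distance_lines), threshold)


def feature_names(cls) -> tuple:
    """List the features that a situation description makes available."""
    return tuple(f.name for f in fields(cls))
```

Calling `select_method(Situation(5), threshold=3)` changes the outcome while `feature_names(Situation)` stays the same, whereas `feature_names(RicherSituation)` shows that the representation language itself has grown.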
In essence, what is uncertain in the GOMS methodology is to what extent the structure of methods and selection rules matches any particular user's internal representation of both situations and actions. Neither Card, Moran & Newell, nor Payne & Green, nor any other authors of formalisable models, give ways of deriving this intermediate structure from analysis of the actions of the users themselves. But if a way to do this was found, much of the uncertainty of their analyses could be eliminated, and researchers could profitably debate what formalism was the most apt for codifying the representational language and structure of users.
The point about formal methods can be focused a different way by looking at how formalisation relates to consistency, amplifying the point made by Booth, discussed above (§2.4.4). Reisner  was searching for “a single, consistent, psychological” formalisation in which to describe human task performance, although she noted that it was not clear how to define consistency and inconsistency. Payne & Green  carried further the aim “to capture the notion of regularity or consistency. Consistency is difficult to define, and therefore difficult to measure, but is informally recognised to be a major determinant of learnability.” If there were one favoured way of formalising a task, which somehow maximised consistency, then that would define a good representation for that task, and formal modellers would be able to produce logically equivalent formalisms to capture this.
A recent paper by Grudin, “The case against user interface consistency” , strongly suggests that formal ideas of consistency, springing from a priori analysis rather than patterns of use in the work environment, are bad guides for systems design. He illustrates the point by considering the “household interface design problem” of where to keep knives. The ‘consistent’ approach, of keeping them all together, conflicts with the common sense approach based on usage, in which we keep most of our dinner knives together, perhaps, but not together with the putty knife (which is in the garage) or the Swiss army knife (with the camping equipment). The scope for analogy with computer systems is clear. Applying this to analyses, if we are guided by formal consistency, we may miss the common-sense human representations of tasks or problems, which are used only because they have been found to work in practice, rather than in theory.
What can we then say about consistency? Reisner  takes up all these points, focusing the question onto the issue of partitioning the universe into classes. “There is more than one way to partition the universe”, she states, because “semantic features are (probably) context dependent.” The important issue in systems design can then be seen clearly to be, not any a priori consistency, but whether the way the system designer partitions the world is sufficiently similar to the way the user partitions the world. If it is not, then inevitably the user will mis-classify things (according to the designer), which could lead to mistaken expectations, or actions whose effects are other than intended. It is perhaps a related point made by Halasz & Moran , in the paper, “Analogy considered harmful”. We could see analogy bringing in a way of partitioning part of the world, based on a successful way of partitioning a different part, to help with a new task: but this is unlikely to match exactly, and it is very likely to lead to problems if the learner extends the analogical representation beyond where it fits. They judge that an abstract, conceptual model is preferable to an analogical one, for training.
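A minimal sketch may help to show what a mismatch between two partitions amounts to. It borrows Grudin's knives as items; the labelling of the a priori ‘consistent’ scheme as the designer's partition and the usage-based scheme as the user's is our assumption for the sake of illustration, as are the class labels. The function lists every pair of items that one partition groups together and the other separates.

```python
from itertools import combinations


def disagreements(designer, user):
    """Pairs of items grouped together by one partition but not by the other."""
    items = sorted(designer)
    return [
        (a, b)
        for a, b in combinations(items, 2)
        if (designer[a] == designer[b]) != (user[a] == user[b])
    ]


# Each dictionary maps an item to the class it is put in (labels are illustrative).
designer_partition = {
    "dinner knife": "knives",
    "putty knife": "knives",
    "Swiss army knife": "knives",
}
user_partition = {
    "dinner knife": "kitchen",
    "putty knife": "garage",
    "Swiss army knife": "camping equipment",
}

mismatched = disagreements(designer_partition, user_partition)
```

Here every pair is mismatched, and each such pair marks a place where the user could be expected to look for, or classify, something other than as the designer intended. Neither partition is inconsistent in itself; the difficulty lies only in the difference between them.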
The chief point in this discussion is that, in general, since humans have no standard of consistent representation, formalisms cannot capture ‘it’. This is equally true of analysis, and of making models of human performance, as of design.
There are no generally known formal analyses of complex task performance: what literature there is discusses the question in a more general way. Woods  identifies the problems of interface design, and support system design, with the problem of representation design. He does not mean by this the details of exactly how information is displayed (e.g., its visual appearance), but rather the structure of the information, in the sense of the mapping between basic measurements and the entities that are to be displayed.
Woods makes a number of useful points about this aspect of representation in HCI. The context determines what is vital information, and this can be seen as a ‘context sensitivity problem’. Simply making information available, according to some supposedly logical scheme, can invite problems, in cases where an operator cannot simultaneously keep track of all possibly relevant information. The challenge, in designing interfaces or support systems, is to structure the information so that the operator can most easily use it, by ensuring that the structure of the information matches as well as possible the structure of the actual task as done by the operator (not some idealised laboratory or prototype version of the task performed by the designer). The HCI challenge amounts to fitting the presentation of information to the operator's representation (which we can think of as an aspect of their mental model). Woods goes on to give more ideas about general ways that information can be integrated, computationally or analogically, in order to suit supposed requirements of some kind of general user, but he gives no leads on the question of distinguishing the needs of different users or classes of user.
In process control tasks, with a very large number of measurements available, it is tempting to think of the problem of representation as selecting relevant facts from a single, supposedly complete, set; whereas the choice of representation is a wider issue, including the possibility of higher-level concepts that are not measured directly, and covering the choice of ways of describing the whole situation. Any calculation of the number of possible representations of a complex situation (the size of representation space) gives an extremely large number—so large that it is very difficult to imagine any established general method being able to produce worthwhile results.
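A rough calculation gives a feel for this size. The sketch below rests on two simplifying assumptions made here purely for illustration: that a situation is described by n binary measurements, with a higher-level concept being any true/false function of them, and that a representation of k distinguishable situations is a partition of them into classes. Under those assumptions the counts are 2^(2^n) concepts and the Bell number B(k) of partitions.

```python
def boolean_concepts(n_measurements: int) -> int:
    """Number of distinct true/false concepts definable over
    n binary measurements: one per subset of the 2**n states."""
    return 2 ** (2 ** n_measurements)


def bell_number(k: int) -> int:
    """Number of ways to partition k distinguishable items into
    classes, computed with the Bell triangle."""
    row = [1]
    for _ in range(k):
        next_row = [row[-1]]          # each row starts with the last entry above
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return row[0]


for n in (3, 5, 10):
    digits = len(str(boolean_concepts(n)))
    print(f"{n} binary measurements: a {digits}-digit number of concepts")
for k in (5, 10, 20):
    print(f"{k} situations: {bell_number(k)} possible partitions")
```

Even ten binary measurements already yield a count of concepts with hundreds of digits, which is the sense in which exhaustive search of representation space is out of the question.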
To illustrate this, let us suppose that we wished to control some complex process, and we were given some rules in the form of condition–action pairs. If we consider all the concepts or terms present in the rules, that defines the representation necessary for the use of those rules. For instance, we may be given a set of rules in terms of raw gauge readings. For these, we would have to know which gauge was which, and how to read them. Rules in this form might be long-winded and cumbersome. Alternatively, we may have some rules framed in terms of higher-level concepts. In this case, before we could operate the system effectively, we would have to learn how to tell what facts were true about these higher-level concepts. So a representation is like a mapping or a function (but in general, a program) which takes the raw, uninterpreted real world (about which we can say nothing without interpreting it), and delivers facts, in the first case (low-level) such as “valve A is open”, “the pressure is B”, “the speed is C”; and in the second case (higher-level) such as “system P is unstable”, “Q is dangerously hot”, “R and S are on a collision course”. To reiterate, the representation is the connection between the real world (before it has been described in any formalism) and the terms used in—the language of—the facts and rules relevant to the task.
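The idea of a representation as a function from the raw world to facts, with condition–action rules stated over those facts, can be sketched as follows. Everything specific here is a hypothetical assumption for illustration: the gauge names, the thresholds, the wording of the facts, and the rules themselves. It is a minimal sketch, not a model of any real process.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

RawWorld = Dict[str, float]                  # hypothetical gauge name -> reading
Facts = Set[str]
Representation = Callable[[RawWorld], Facts]
Rule = Tuple[FrozenSet[str], str]            # (facts required, action)


def low_level(raw: RawWorld) -> Facts:
    """Representation in terms of individual gauge readings."""
    facts: Facts = set()
    facts.add("valve A is open" if raw["valve_a_position"] > 0.5
              else "valve A is closed")
    if raw["tank_q_temp_c"] > 90.0:
        facts.add("Q temperature reading is above 90")
    return facts


def high_level(raw: RawWorld) -> Facts:
    """Representation in terms of a concept that is not measured directly."""
    facts: Facts = set()
    if raw["tank_q_temp_c"] > 90.0 and raw["valve_a_position"] <= 0.5:
        facts.add("Q is dangerously hot")
    return facts


LOW_LEVEL_RULES: List[Rule] = [
    (frozenset({"valve A is closed", "Q temperature reading is above 90"}),
     "open valve A"),
]
HIGH_LEVEL_RULES: List[Rule] = [
    (frozenset({"Q is dangerously hot"}), "open valve A"),
]


def applicable_actions(
    representation: Representation, rules: List[Rule], raw: RawWorld
) -> List[str]:
    """Interpret the raw world, then fire every rule whose
    required facts are all present."""
    facts = representation(raw)
    return [action for required, action in rules if required <= facts]
```

With a reading such as `{"valve_a_position": 0.1, "tank_q_temp_c": 95.0}`, each rule set recommends the same action when paired with its own representation, whereas the higher-level rules paired with the low-level representation recommend nothing: the rules are only usable together with the representation whose terms they are written in.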
From this discussion of representation, it should be clearer both why it would be useful to study the actual representations that humans have (e.g. of complex systems), and what the object of that study would be. In essence, deciding how operator support systems should present information to the human, without having firm knowledge about human representations, is shooting in the dark.
We have eschewed detailed consideration of training and learning, on the grounds that the models there are used in a different way for a different purpose (§2.1.6). But it may be helpful at this point to imagine the way in which representations that people have of tasks and systems relate both to the stages of learning a task, and to Rasmussen's distinction between skills, rules, and knowledge. This is meant as an imaginative aid to help focus on what these internal mental representations might be.
At the initial stages of learning a new task, we can imagine a trainee working with general-purpose problem-solving representations, or with representations based on analogy or metaphor, and working at an explicit, knowledge-based level. He or she would be selecting and starting to shape a representation suitable for the new task: in particular, the overall structure of the system and the task, and the meaning of the terms used by others to describe it.
At an intermediate stage of learning, the representation of the system and task would be undergoing refinement: lower-level concepts would be combined into higher-level ones, and compound actions built up out of elementary ones. We can imagine rules being learnt, rules that are made up from the representational primitives that have been identified. If the rules found were not able to support an adequate level of performance, this process could then iterate, with the representation being refined or altered. This would start off at the conscious level, and progressively become more automated.
At the final stages of learning a skill, we can imagine the faculties having become so attuned to the task that much of the learning would be at a subconscious level. There would be continuing refinement of representation, but at this level, the operator would not be able consciously to express how the representation was being refined. The grosser structure of the representation could have stabilised, giving more chance for experimental study, while the parts that were being further refined could be the lowest-level parameters of the representation.
It seems reasonable to suppose that a human, or other intelligent system, would do well to have many different representations of the world, suitable for different circumstances. Because facts and rules are based on the representational system, we would expect to see facts and rules being learnt and used in a context where there was also a well-defined (even if not verbally expressible) representation. Analogy may be an exception, with the representation from one domain being used to provide initial structure for the learning of informational knowledge in another domain [56, 65].