©1990, 1995

2.4 Critique
Surprisingly little seems to have been written in detailed, direct criticism of current mental modelling and cognitive task analysis approaches. The first paper discussed below gives one way of comparing a number of mental model and cognitive task analysis approaches. The four papers discussed next give the most direct criticisms, though they are critical of different things. This is followed by general criticisms of a principled nature, complementing the direct criticisms that have been offered in situ in the review section above, § 2.3.
When a researcher devises a cognitive modelling method, some features are bound to loom larger in his or her mind than others. We would expect this to lead to the situation observed in the literature, where different models tackle the problem of modelling human-computer interaction in different ways, and from different viewpoints. Since there is no single `best' model, there are likely to be trade-offs between different features, and for this reason it makes sense to criticise modelling techniques only from stated viewpoints or with particular purposes in mind.
A paper by Simon  gives us an analysis of cognitive models in HCI based on the trade-offs as he sees them. He considers a representative selection of models:
Figure 2.1: The two continuous dimensions of Simon's trade-off space
The diagram that is central to his analysis gives two continuous dimensions (Figure 2.1) and two discrete dimensions, given here with the models that Simon assigns to each category.
This void at the bottom of the diagram would be filled in properly only by some model that managed successfully to formalise the principles of human cognition that are relevant to HCI. Models that aspire to fill this gap must, among other things, justify why they are relevant.
The second of the continuous dimensions is displayed from left to right on Simon's diagram. Seen in terms of knowledge operationalisation, the left end represents knowledge explicitly: the knowledge that users employ is laid out, rather than being recast into abstract form (in terms of memory, for example). Seen in terms of parameterisation, the right end means that there are more explicit parameters governing mental performance, such as decay rates of memories. Perhaps this trade-off is a temporary phenomenon: symbolic representation may be incompatible with parameters only because no parameters have yet been devised for higher-level mental operations.
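To make the contrast concrete, here is a hypothetical Python sketch; the rule names, the decay function, and the 0.1 decay rate are all invented for illustration, and are not drawn from any model Simon discusses. The same piece of user knowledge is expressed once symbolically, laid out as an explicit rule, and once abstractly, its availability governed by an explicit performance parameter.

```python
import math

# Symbolic representation: the user's knowledge is laid out explicitly
# as a rule mapping a goal to an action sequence (names are invented).
SYMBOLIC_RULES = {
    "delete-word": ["position-cursor", "press-ESC", "press-d", "press-w"],
}

# Parameterised representation: the same knowledge recast abstractly,
# governed by an explicit parameter (a hypothetical memory decay rate).
def recall_probability(hours_since_use, decay_rate=0.1):
    """Probability that the rule is still retrievable after a delay."""
    return math.exp(-decay_rate * hours_since_use)

print(recall_probability(0))   # 1.0 immediately after practice
print(recall_probability(24))  # lower after a day without use
```

The trade-off shows up even in this toy case: nothing in the symbolic table says how performance degrades over time, while the decay parameter says nothing about which actions the rule contains.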
Indeed, the search for higher-level parameters could be seen as one of the higher-level goals in cognitive research for HCI, in that it would be very helpful for system designers. Perhaps a trade-off analysis would be more successful when the field of HCI is more mature, with more clearly defined issues.
Rips  takes issue with some of the models which fit into the category of ``more general mental models'' in § 2.1.5, and which serve the purpose of aiding the understanding of (mostly) psychologists studying cognition. The criticisms are levelled from the point of view of a psychologist or a philosopher, not a systems designer.
Rips suggests that by taking the idea of mental models too literally, some authors end up claiming that models help with questions of semantics and reference; some claim that the idea of `working models in the head' helps to explain how people reason about the everyday world; and some claim that people perform everyday logical inferences using mental models rather than internal logical rules.
The important point which Rips makes is that the concept of mental models does not suddenly solve philosophical or psychological problems. Even if we can be described as possessing mental models, possessing them gives us no privileged access to truth about the world, and mental models as explanations have no a priori advantage over other, previously established theories.
Rips acknowledges the usefulness of considering ``perceptual representations'' as well as symbolic representations, and if that is all that is meant by `mental models' then he is quite happy with it. He also recognises that many writers use the idea of a mental model simply as an explanatory device, without any great ontological implications. The focus of theorising in the mental models fashion should be to explain the phenomena under consideration, rather than to gather support for the existence of theoretical entities; and just because a particular explanatory framework fits certain observations, this should not lead researchers into making assumptions about the status of the theoretical entities that are part of the model.
Lindgaard  delivers a brief attack of mixed quality on the idea that mental models can be used by system designers. It is claimed that mental models are by their nature subjective and individual; that claim, although it contains a lot of truth, ignores the possibility that the common points between individuals' models may be of relevance and interest. It is then stated baldly that ``a good user interface can be designed entirely without reference to mental models'': hence the idea of mental models is ``somewhat irrelevant'' to system designers.
While it cannot be denied that it is possible to design an interface without explicitly thinking about mental models, one is led to wonder whether the author would also claim that it is possible to design a good interface without considering the information needs of the user, or the possible ways that the user may respond to information presented. These questions are questions about the designer's model of the user, whether explicit or implicit, whether consciously acknowledged or not. As we have seen in previous sections, many approaches to mental models have a bearing on these matters: whether it is by providing formal simplified approximations to the user's behaviour, or by discussing the digested fruits of experience working with operators controlling systems, or just by attempting to elucidate the mental workings behind the person's activities. Mitigating this over-general and sweeping criticism of the present state, Lindgaard implies that more research and development at a theoretical level could bring mental models to a point where they could be used in practice.
Booth  offers a much more detailed critique of the usefulness of a more closely defined class of model. In the context of the quest for the best understanding and modelling of the user, he casts doubt on the usefulness of predictive grammars within the design and development process. As an example of a predictive grammar, he focuses on TAG , which has been described above (§ 2.3). However, grammar models are not the only kind of predictive model that has been produced, and we will discuss Booth's criticisms here with respect to formal methods in general, recognising that this may entail arguing against something that, strictly, Booth does not claim; but it is more relevant to us here than discussing grammar models alone. Since Booth lists eight specific criticisms, let us discuss them one by one.
As mentioned above for GOMS, in all the formal methods what is taken as an elementary action is to some extent arbitrary. There is no established method for deciding what a human takes as elementary, which may not itself be a stable quantity. Different grains of analysis would produce different analyses. This means that in any model produced, the grain of analysis may not match that of the user, and so the model is in danger of failing to represent the user's model accurately. It is a commonly acknowledged potential failing of formal methods that they give no guarantee of accurately representing the user's model. There remains the question of whether formal methods, using a common formalism for the various systems, may provide a useful comparison of complexity even though this may not be identical with the user's true cognitive complexity. The point is that the ordering of complexity may be similar even if the actual measure of complexity differs (consistently) between human and formal analysis. Booth is content to conclude that the grains of the formal model and the human are unlikely to match.
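The grain problem, and the ordering rebuttal just offered, can be sketched with a hypothetical example; the operator names and counts below are invented and imply no real GOMS or TAG analysis.

```python
# Two editing tasks decomposed at two grains of analysis (invented names).
coarse = {
    "delete-word":     ["delete-word"],
    "delete-sentence": ["select-sentence", "delete-selection"],
}
fine = {
    "delete-word":     ["locate-word", "home-cursor", "ESC", "d", "w"],
    "delete-sentence": ["locate-sentence", "home-cursor", "ESC", "d", ")",
                        "confirm", "redisplay", "check", "resume"],
}

def complexity(decomposition, task):
    # A naive complexity measure: number of elementary operators.
    return len(decomposition[task])

# Absolute counts disagree across grains...
assert complexity(coarse, "delete-word") != complexity(fine, "delete-word")
# ...but either grain ranks the two tasks the same way.
assert complexity(coarse, "delete-word") < complexity(coarse, "delete-sentence")
assert complexity(fine, "delete-word") < complexity(fine, "delete-sentence")
```

On this sketch the measure of complexity depends on the grain chosen, yet the ordering of the two tasks does not, which is all a comparative evaluation would need.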
A user or operator brings prior knowledge and experience to a task, along with expectations and patterns of meaning and significance. This could be expected to vary from person to person, and for one individual it could vary across different situations and even across different times for the same situation and person. Booth points out that this means that there is no single definition of consistency for all situations. In terms of TAG, the abstract ``features'' used to give structure to the task are not fixed. This supports Reisner's critique  given above (§ 2.3), and also discussed below (§ 2.6.2).
Whether this is counted as a failing of formal methods depends on one's point of view. It is true that the formal model may not reflect all the varying views on consistency, but there are potential ways of circumventing the difficulties that arise from this. A system designer could propose the incorporation of training attempting to ensure that people shared the same understanding of the features of the task. And even if this was not possible, this would not prevent any variant sets of features from being worked into the formalism, if the analyst could discover what they were.
The next round of questions could include ``should people be forced to be consistent with themselves, and with others?'' and ``how good or bad are various formal models at allowing the description of natural human inconsistency?'' Booth does not address these.
Booth argues that human actions are based on that human's specific knowledge, which may be sufficiently different from the background knowledge of the designer or the analyst to prevent reliable prediction of the user's behaviour. He suggests that the actual knowledge used in practice is not practically amenable to formalisation into cognitive grammars. In what can be seen as a related point, Suchman  has argued that the situation in which an action occurs is crucial to the interpretation of that action. It is the very independence of formal models from their contexts that persuades both Suchman and Booth of the models' inability to account for action in real life (as opposed to laboratory experiment).
While this may be an argument against current formal methods, advocates of formal methods may regard it as a challenge to extend their formalisms to be able to take into account more background knowledge and context-dependency. It would be very difficult to argue convincingly that formal methods were in principle incapable of this, though it might be seen as unlikely that something as specific as a grammar would have much chance. Moreover, the more routine a situation is, the more the relevant world knowledge is likely to be a known quantity, and hence formalisable. This argument of Booth's looks more like an argument of degree than an argument of principle, liable to progressive erosion as the tendency to incorporate context-dependency and background or real-world knowledge into formal models develops.
One of the points of the formal methods mentioned above was the possibility of avoiding having to use prototypes. There may be reasons why user involvement is impractical, undesirable, or even impossible, whether for reasons of time and money or otherwise. It would be these situations where we would expect formal techniques to come into their own.
The essence of this criticism is that design does not in practice proceed by means of construction of a number of fully-fledged and specified designs, to be tested against each other competitively. Formal methods appear to assume that this is the context for their use, since their main claim to usefulness is to provide the capability of comparative evaluation.
This is quite probably a valid criticism. But this does not prevent the possible usefulness of formal methods, particularly in cases where for some reason comparative evaluation is wanted, for example if there is no experience of a certain kind of design being preferred for a certain system. In this or other ways, formal method studies could still inform design decisions, even if they are not carried out in most instances of design.
Here Booth makes the point that in the early stages of design, the assumptions being made about the system may not remain valid as the details are worked out, and as changes of approach are made to circumvent newly discovered difficulties. This calls into question the usefulness of formal studies based on unreliable assumptions.
However, it is also possible to envisage formal methods playing a part in that very process of revising one's assumptions. Assessment by a formal method could reveal an unenvisaged area of unacceptable complexity, which could prompt the designer to turn the design towards more reasonable options.
It would appear that designers find current formal methods daunting, and Booth doubts whether even the originators of formal methods could use them to describe real complex systems.
Possible solutions to this could involve either the education of designers, so that they were familiar with and competent in the methods, or better formal methods which are easier to use. None of this would be to any avail, however, unless the concerns of designers are actually addressed by the formal methods. It is possible that influencing designers' concerns (or maybe those of their management) might help formal methods to gain acceptance and use.
This criticism compounds the previous one: current formal methods do not offer evaluation of all areas of concern, therefore a designer would have to learn more than one, with all the resultant effort to master a number of methods that are already seen to be too complex individually.
The validity of this criticism depends on whether or not one identifiable method by itself can provide useful results. It may be that for many design problems, one method (possibly different in different cases) may be able to do the needed evaluation, because there may be one issue in the design which is salient, and obviously more important than the others. None of the authors studied here would claim that formal methods provide a complete paradigm for system design, which has to be followed the whole way, though they differ in the degree to which they see formal methods as helping or guiding the process of design.
Taking all of Booth's criticisms in their intended spirit, one could still retort that there may well be aspects of systems design which could be helped by formal methods in some circumstances. Fancifully, we could see Booth as providing us with an armoury for a joust against formal modellers, complete with suggestions as to how the weapons might be used; but Booth does not see it as his business to force the model knight off his white charger, out of his armour, and back into scholar's rags. In the course of this fray, I have suggested that there will be a second round, after both contestants have been unhorsed, thrashing out the possible lesser claims of formal models.
This is not to argue that current methods are actually substantially and practically helpful. It is just that the questions have not been argued in conclusive ways. There are further general arguments given below (§ 2.4.6--§ 2.4.9), intended to strengthen the case against the adequacy of current formal methods in the modelling of complex dynamic systems.
Bellotti  gives reasons, gathered from studying real design projects, why designers do not use what she calls ``HCI design and evaluation techniques'' (which include GOMS and TAG). This is directly a criticism of current design practice, and only indirectly a criticism of the modelling techniques themselves. The main reasons discovered are that there are unavoidable constraints (as well as some avoidable ones) inherent in the commercial systems design environment, which mean that current HCI techniques are not appropriate. Bellotti suggests that future design and evaluation techniques should bear in mind these constraints in their development. One might suggest, even, that HCI authors should take a little of their own medicine by considering the usability of their proposals. She gives the following as a summary of constraints.
As is obvious, and has been pointed out in several places above, people often do things in different ways. Similarly, it is not surprising that there are usually many ways, in principle, to construct a model which does the same overall task, i.e., produces the same overall output from the same overall input. Because many tasks which humans perform are difficult to emulate by computer, the focus of some AI research has been simply to get any one emulation running effectively, that being a major achievement.
But if we are content with a model only of the overall performance, we may fail to model many aspects of human performance, because we cannot know to what extent such an overall model works in detail like a human until we have specifically tested its similarity. It is particularly important in the study of control of complex systems to know about errors and their origins; and to be able to predict human errors, a model must faithfully simulate not only the overall input and output of the human operator, but also all the separate intermediate processes that are liable to error. Rasmussen  makes it clear that for his purposes, a model must not only be shown to work, but its points of failure must be investigated before we can be said to know how good that model is. In modelling the human operator, a model is more successful the closer its errors correspond with errors of the human.
Despite these observations, some authors seem to have fallen for the temptation of assuming that, just because their model works, producing the right output in a plausible way, there is no need for further discussion of its suitability for modelling human cognition. For example, Woods & Roth [148, p.40] are satisfied with a model that provides a ``psychologically plausible effective procedure''. The trouble is that, given sufficient imagination, there may be many plausible effective models, which may differ radically from one another. They cannot all be equally good as models of the human.
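The point can be illustrated with a deliberately trivial sketch: two hypothetical models of mental addition with identical overall input-output behaviour, but radically different intermediate processes, and hence different places where errors could arise.

```python
def model_a(digits):
    # Sums left to right, holding a running total: one intermediate store
    # that could plausibly be lost or corrupted mid-task.
    total = 0
    for d in digits:
        total += d
    return total

def model_b(digits):
    # Sums by recursive halving: a quite different set of intermediate
    # states, and so a quite different profile of possible failures.
    if len(digits) == 1:
        return digits[0]
    mid = len(digits) // 2
    return model_b(digits[:mid]) + model_b(digits[mid:])

# Both "work": their overall behaviour is indistinguishable.
assert model_a([3, 1, 4]) == model_b([3, 1, 4])
```

If human addition errors arise from losing a running total, only the first model even contains an intermediate process at which such errors could occur; overall input and output alone cannot decide between them.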
As mentioned above (§ 2.3), the CLG model  assumes that analysis in terms of certain `levels' reflects the structure of human knowledge. Presumably Moran does this because it is plausible (to him) and it produces the right kind of results. There is no obvious justification why his levels should be the only right ones, nor does he give any reason for supposing that they are.
Authors using production systems in the execution of their models generally do not justify the appropriateness of production systems for modelling human mental processes. A notable exception is Anderson  who is concerned with the realism of his model in describing the mechanisms of human cognition and accounting for many different experimental observations from human cognitive psychology. All this shows, however, is that a production system is adequate or sufficient, not that it is necessary nor that it is the best way of modelling cognition. Anderson describes other production systems which are far less well matched to the realities of the human. And there is no reason to suppose that the same insights could not be captured through the use of some quite different formalism, suitably restricted, for example first-order logic, or the kind of language envisaged in the PUMs study .
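For readers unfamiliar with the formalism, the following is a minimal production-system interpreter sketched in Python. The rules are an invented fragment for column addition; they do not reproduce Anderson's actual productions, and serve only to show the match-fire cycle that such models rely on.

```python
# Working memory: a set of declarative elements (invented notation).
working_memory = {"goal: add", "column: units", "digits: 7, 5"}

productions = [
    # (condition set, elements the production adds to working memory)
    ({"goal: add", "digits: 7, 5"}, {"sum: 12"}),
    ({"sum: 12"}, {"write: 2", "carry: 1"}),
]

changed = True
while changed:
    changed = False
    for condition, action in productions:
        # A production fires when its conditions all match working memory
        # and its action would contribute something new.
        if condition <= working_memory and not action <= working_memory:
            working_memory |= action
            changed = True

print(sorted(working_memory))
```

Nothing in this cycle is peculiar to human cognition; the same additions to working memory could be produced by quite different machinery, which is exactly the adequacy-versus-necessity point made above.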
When authors fail to discuss the appropriateness of their modelling formalism or program for representing actual human mental processes, one can only assume either that they do not think it is important (an odd view, considering the obvious nature of the remarks above), or that they have failed to notice and question a tacit assumption, that there is only one way in which an intelligent system can do the task that they are investigating. The same tacit assumption seems to be in play when authors fail to consider possible variation between individual humans.
It may be that some of the perceived difficulties with mental models, as well as their lack of generality, stem from their lack of theoretical rigour. It is easy to see how a lack of theoretical foundation could pose problems. As remarked above, there is often plenty of room for doubt about how well a model matches the relevant characteristics of what is being modelled. In the absence of theoretical argument, the problem of validating a system is merely moved, rather than solved, onto a problem of validating a mental modelling or task analysis technique.
One may at first suppose that a particular technique could be validated, after which it could be taken as reliable. The trouble is that there is no theory to suggest what range of other applications is sufficiently similar to the validated one to benefit from the validating exercise. For example, suppose that we had shown the validity of the TAG model for a range of word processors. Could we rely on its results for another new kind of word processor? For a speech-driven text system? For an intelligent photocopier? For a business decision aid? The problem is not just that the answers might be ``no''; it is that there is no sound basis for judging. It is therefore not surprising that available mental model techniques are not reckoned to be very good value.
Essentially, models not based on firm foundations of theory are less useful to systems design than they might be because they make little impression on the problem of validation.
It may be that formal methods work best on systems simple enough that the formalism can be applied without much ambiguity or room for choice. As much has been suggested above (§ 2.3.1), with reference to text editors. The tasks for which we use text editors are uncontroversial, in that when we have made up our mind exactly what to change, the editing process has a result which is clearly specified, even if (as is usual) there is more than one way of performing such a task.
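A toy fragment in the spirit of a task-action grammar illustrates why text editing is so tractable: the mapping from clearly specified tasks to action sequences can be written down with little ambiguity. The commands below are invented (loosely vi-like) and are not Payne and Green's actual notation.

```python
# Hypothetical task-action rules: (task, unit) -> key sequence.
rules = {
    ("delete", "word"): ["ESC", "d", "w"],
    ("delete", "line"): ["ESC", "d", "d"],
    ("move",   "word"): ["ESC", "w"],
    ("move",   "line"): ["ESC", "j"],
}

def expand(task, unit):
    """Rewrite a clearly specified task into the actions that perform it."""
    return rules[(task, unit)]

print(expand("delete", "word"))  # ['ESC', 'd', 'w']

# A consistency observation an analyst might make: every rule begins with
# the same mode-entering action, so one rule schema covers the family.
assert all(seq[0] == "ESC" for seq in rules.values())
```

Nothing comparable could be written down so cleanly for a task like deciding what changes a document needs, which is the contrast pursued in the rest of this subsection.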
We may imagine a system, perhaps many years in the future, which interacts by means of voice and touch screen, so that the author of a document could edit it with no more effort than it would take to tell someone else what to change. There would no longer need to be a task of text editing (as in the studies discussed above) distinct from the author's task of marking revisions.
We might speculate that the same end may befall many tasks that are easily formalisable, leaving humans in the long term with jobs that call for such things as judgement, insight, intuition, or whatever can only be formalised by making many arbitrary assumptions: for example, deciding what changes to make in a document. Formal methods have up to now not been applied to such tasks, so it is reasonable to suppose that those methods as we see them now would be unreliable on such tasks. Easily analysable tasks, on the other hand, should yield to automation or redefinition sooner or later. There need to be developments in the methodologies of formal analysis to deal somehow with this range of choice and individuality, before they could be applied to systems where there is room for human choice: both choice between methods of the performance of tasks, and choice between methods of application of the formalism to the analysis of the system.
Many authors remark on the dissimilarity between the performance of the learner and of the expert, and between their supposed respective mental models. Expertise comes with practice and experience. But in any complex system, the expert is only practised at the range of systems or states that are encountered frequently. When that range is exceeded, they are no longer experts by virtue of knowing what to do, from experience, in a particular situation. If anything, their expertise is then based on the relevant aspects of their mental model of the plant and process, and consists of some kind of problem-solving ability. In Rasmussen's terms, the mental processing moves to the knowledge-based level, and here operators may not have much advantage over others whose knowledge of the plant does not come from the experience of operation. Working at the knowledge-based level is usual for people learning to operate a system, so any information system should be prepared to treat the operator more like a learner at the appropriate times; and this is important because these situations are often the ones in which accidents occur. Differences between individuals have a bearing here, in that different operators may develop different areas of particular skill, and these differences need to be taken into account if we are to be able to model their expertise accurately, in the kind of way necessary if we are to model their proneness to errors, or know anything about the variation between their individual information requirements.
Rasmussen's `stepladder' model of mental processes  (above, § 1.3.1) is one attempt to characterise this issue. But there are no current attempts to make a formalisable or predictive model that captures the same insights.