©1990, 1995

Chapter 8: Overall interpretation of results, conclusions and directions

8.1 Conclusions on human representations of complex systems

8.1.1 Collected salient important findings

Looking over all the foregoing work, we may collect together a number of points that are relevant to our central theme.

  1. Issues to do with complexity have not been addressed within the cognitive science tradition in sufficient depth to encompass many important features of complex tasks (§, §2.5.2). This justified the first research goal of exploring cognition in complex tasks.
  2. Formal methods alone do not solve the problem of representation for complex tasks, as there is insufficient empirical evidence to avoid their relying on unsubstantiated assumptions (§2.6.1). This brings out the problem of finding representations which are adequate for a fuller description of complex task performance.
  3. Machine learning without human task performance input does not reveal human representations for that task, because the range of possible human representations is too wide (§3.2.4). This meant that human performance data had to be used in the analysis.
  4. The full study of tasks involving motor skills involves modelling psycho-motor abilities and limitations, and is therefore less likely to reveal cognitive structure than studies of tasks not centrally involving motor skills (§4.4.3).
  5. There were at the time of writing no readily available tasks that are well-suited to the study of cognitive aspects of complex control tasks (§5.2). This therefore necessitated the construction of a suitable task.
  6. It proved possible to construct a simulation game task that fulfilled the criteria (§5.1), with programs amounting to some 10000 lines of C source code, and with analysis programs amounting to around 5000 lines of C source code and shell scripts. From this was found (§6.4.5):
    1. the different representations of situations and actions that were used in rule induction from human task performance data led to corresponding rules that performed differently when tested in the standard way (on predicting actions from further data not used in the rule induction); this confirmed the ability of rule-induction to act as a test of the quality of a representation;
    2. in many cases, the performance of rules induced on data from a particular time interval was better when tested against data from the same interval, or a close one, and worse when tested against data from intervals that were more distant in time; this could be explained in terms of human rules that were changing over time through learning.
    Thus it was evident that rule-induction was a useful tool for the exploration of cognitive aspects of complex tasks.
  7. It was possible to implement a version of the task where sensors were priced and able to be turned on and off. The subjects' sensor usage fell into natural groups, and these groups formed the basis of a division of the subjects' performance data into 'contexts', which were peculiar to each subject, and had some correspondence with the stages of the task as reported verbally. Using a context-based representation for rule induction revealed strikingly different degrees of ruliness in some of the different contexts (§7.3.1). This showed that this kind of context structure is at least related to some important feature in the analysis of human performance of complex tasks.
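The division of performance data into contexts described in finding 7 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each time step of a trace records the set of sensors switched on, and that a step is assigned to whichever of a few prototype sensor groups it most resembles. The sensor names, the prototype groups and the use of Jaccard similarity are all hypothetical; the grouping procedure actually used is the one described in §7.

```python
from itertools import groupby

def jaccard(a, b):
    """Similarity of two sensor sets: size of overlap over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def label_contexts(trace, prototypes):
    """Assign each time step to the prototype sensor group it most resembles."""
    return [max(prototypes, key=lambda name: jaccard(step, prototypes[name]))
            for step in trace]

def segments(labels):
    """Collapse consecutive equal labels into (context, length) runs."""
    return [(name, len(list(run))) for name, run in groupby(labels)]

# Hypothetical prototypes and trace (sensor names invented for illustration).
prototypes = {
    "search": frozenset({"sonar", "heading"}),
    "approach": frozenset({"range", "depth", "heading"}),
}
trace = [
    {"sonar"}, {"sonar", "heading"}, {"sonar", "heading"},
    {"range", "heading"}, {"range", "depth"}, {"range", "depth", "heading"},
]
labels = label_contexts(trace, prototypes)
runs = segments(labels)
```

Here `runs` gives the sequence of contexts and how many time steps each lasted, so that rules could then be induced separately on the data falling within each context.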

The findings that emerge from the study as a whole are thus:

8.1.2 Variation between individuals and situations

This issue is independent of the other main points of this study, and will therefore be discussed separately here. We started out, in §1.3.2, defining a complex task as one for which there were a large number of potential practical strategies. With this large number of possible strategies, and without superimposed severe constraints to limit this number, it is not surprising that individuals settle on different strategies, as suggested, passim, in §2, and more explicitly in §. This study does not belabour the general recognition that strategies, contexts and rules differ: that is informally evident from many of the experimental results. The difficulty is in measuring that difference in a way relevant to the model being developed.

If there were a clear correspondence between the contexts of two subjects, but with different rules within those contexts, we would be able to compare the performance of rules induced from one subject's training data on test data both from that subject and another subject. This would show how far apart the rules were. But as it is, their contexts differ, and this is not possible. What was done in §7.2.7 was to analyse two subjects' data both in terms of their own context structure and in terms of the other's. This was inevitably a somewhat artificial procedure, but it was the nearest that could be devised to measuring the difference between the two context structures. This measurement was consistent with there being a difference between the rules used for selecting contexts, though it did not provide a method of determining which context structure matched up with an anonymous segment of performance data. Instead, for long sequences of trace data, we could distinguish whose data they were simply by looking at the frequencies of sensor usage. We can see something of the difference in sensor usage in the second experiment in §7.2.1. Turning back to rules, the results in both the sea-searching experiment chapters (§6 and §7) included evidence concurring with the intuitive notion of rules differing with respect both to the player and to time. Informally, even the short snatches of verbal report that have been given here (§7.2.9) reveal differences in rules between the subjects: there were of course many more examples in the verbal reports.
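The remark that long sequences of trace data can be attributed to a subject from the frequencies of sensor usage alone can be sketched as follows. This is only an illustration under stated assumptions: a trace is taken to be a list of sensor names, one per reading, and the nearest known profile is chosen by summed absolute difference. The subject labels, sensor names and distance measure are hypothetical and are not the procedure of §7.

```python
from collections import Counter

def usage_profile(trace):
    """Relative frequency of each sensor over a sequence of sensor readings."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {sensor: n / total for sensor, n in counts.items()}

def profile_distance(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in set(p) | set(q))

def closest_subject(segment, known_profiles):
    """Name of the subject whose profile is nearest to that of the segment."""
    seg = usage_profile(segment)
    return min(known_profiles,
               key=lambda name: profile_distance(seg, known_profiles[name]))

# Hypothetical traces: one sensor name per reading taken.
known = {
    "subject_A": usage_profile(["sonar"] * 6 + ["range"] * 2 + ["depth"] * 2),
    "subject_B": usage_profile(["sonar"] * 2 + ["range"] * 5 + ["depth"] * 3),
}
unknown_segment = ["range", "range", "depth", "sonar",
                   "range", "depth", "range", "range"]
guess = closest_subject(unknown_segment, known)
```

Such a comparison says whose data a segment probably is, but, as noted above, it says nothing about which context structure the segment matches.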

The fact of there being differences between individuals has had implications for the concept of modelling being developed. Differences between individuals mean that there is no universal human strategy to model, and that therefore the important advance is not to try to discover a normative model, but to establish methods of modelling individual human task performance, by setting up a framework and a methodology for that modelling. In the literature, the term 'model' is used sufficiently broadly to permit such a modelling approach to be termed a model.

8.1.3 What is modelled?

It is important to emphasise, despite the fact that many rules have been induced in the course of analysis, that no claims are being made about any of the rules that have been induced. There are very few grounds for putting much confidence in any particular induced rule, or attaching great significance to the content of one of them. In this study, throughout §6 and §7, rules are induced only as a guide to how ruly or unruly a certain set of examples are, with respect to a set of potentially determining attributes and a set of actions potentially determined by those attributes in a rule-like way. Comparing the ruliness of one set of data, represented in different ways, gives measures that can help progress towards finding better sets of attributes for that data; and similarly, looking at the ruliness of sets of data divided in different ways helps towards identifying better ways of dividing the data. Together, the ways of dividing the data, and the attributes that are relevant within each division, amount to a representation. The representations that help the analysis to be more tractable, concise and effective are very natural candidates for consideration as representations that humans use to structure a task, even if the action rules themselves are not good models of human rules.
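The use of ruliness as a yardstick for comparing representations can be made concrete with a small sketch. It assumes that ruliness is scored as the fraction of held-out actions correctly predicted, and it uses a simple lookup-table inducer as a stand-in for the rule-induction program used in §6 and §7. The attribute names, actions and data are invented for illustration.

```python
from collections import Counter, defaultdict

def induce_rules(examples, attributes):
    """Build a lookup table: attribute values -> most frequent action."""
    votes = defaultdict(Counter)
    for situation, action in examples:
        key = tuple(situation[a] for a in attributes)
        votes[key][action] += 1
    rules = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    # Fall back on the overall most frequent action for unseen situations.
    default = Counter(action for _, action in examples).most_common(1)[0][0]
    return rules, default

def ruliness(train, test, attributes):
    """Fraction of held-out actions predicted by rules induced on train."""
    rules, default = induce_rules(train, attributes)
    hits = sum(
        rules.get(tuple(s[a] for a in attributes), default) == action
        for s, action in test
    )
    return hits / len(test)

def ex(near, low, action):
    """Helper building one (situation, action) example."""
    return ({"target_near": near, "fuel_low": low}, action)

# Hypothetical data in which the action depends only on target_near.
train = [ex(True, False, "slow"), ex(True, True, "slow"),
         ex(False, False, "fast"), ex(False, True, "fast"),
         ex(False, False, "fast"), ex(True, True, "slow"),
         ex(False, False, "fast")]
test = [ex(True, True, "slow"), ex(False, False, "fast"),
        ex(True, False, "slow"), ex(False, True, "fast")]

good = ruliness(train, test, ["target_near"])
poor = ruliness(train, test, ["fuel_low"])
```

In this toy case the representation using `target_near` scores higher than the one using `fuel_low`, which is the kind of comparison that guides the choice of attributes; the induced table itself carries no claim to being a model of the human's rules.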

If we are to take the information-processing model of human cognition seriously, it is reasonable to work towards modelling human abilities by investigating structures which help to clarify human task performance data; and inasmuch as the derived structure does actually clarify, this could be taken as indirect evidence for the existence of structures of similar form in the human. Further indirect evidence would be gained if the structures were able to serve as the basis, in the longer term, for the discovery of rules that could more plausibly be attributed to human agents than the rules in this study.

The idea that context (as used here) is a main feature of such structures is supported by:

The information pricing technique described in Chapter 7 offers a start to analysis of task performance data in terms of contexts, in a way that is not just a priori; but to be more valuable, the methodology needs to be extended to deal with data from less artificially restricted domains.

8.1.4 Generalising the methodology

Removal of information hiding

Starting with the last feature to be introduced, the first generalisation would be to remove the necessity of the information-costing interface (§7.1.1). To achieve the same aims, this would mean that the analysis had to discover information usage in a different way. A possible explicit approach to monitoring information usage would be by using eye-tracking, which would naturally go together with a more detailed analysis of short-term information flow in terms of short-term memory. This would require more input from cognitive psychology, and it is not clear to what extent this approach would reveal more about the aspects of cognition addressed in the present study. Discovering information usage otherwise, that is, implicitly rather than explicitly, would be tantamount to an advance in machine learning techniques, and therefore the discussion of that, though important, is left to §8.3.1, below.

Removal of restriction on interaction timing

Another artificial constraint imposed in the course of constructing the experimental vehicle was the strict quantisation of the times at which interaction was possible. Removing this constraint has two implications, corresponding to the two reasons that strict control of action timing was introduced originally. Firstly, there is the technical problem of storage and regeneration of the runs. Due to the essential indeterminacy of the physical world, we cannot expect to recreate episodes from real life given only the actions taken, however accurately these are recorded. If we wish, nevertheless, to gather data from real tasks, this means that the only option is to record all the possibly relevant data to whatever level of accuracy is appropriate, limited perhaps by the ability of the human to make discriminations. For life-size complex tasks, this would mean in practice a lot of magnetic tape. For simulations, the method of recording data would be entirely dependent on the details of how the simulation was implemented. Suffice it to say here that the amount of data that needed recording would be somewhere between the minimal amount necessary in the experiments in this study, and the maximal amount for a real life task with a physical system.

Secondly, removing strict control on timing would mean taking into account the possibility that precise timing of actions was an important aspect of task performance. This is discussed here immediately below, and in §8.2.1.

Including analogue control inputs

In §4, investigation was started on a task which had a virtually analogue control input, and for which precise timing of actions was important. Others of the tasks rejected in §5.1.3 were seen to have a similar character.

The main problem with this kind of task is in relating appropriate situations and actions together in a way that is relevant to human cognition. As shown in §4.3.3, it is difficult to decide how to represent human actions when they are executed through an analogue channel; and if precise timing is involved, as discussed in §4.4.2, it is difficult to know which situations to relate to which actions, in terms of time.

Failing any more principled ways of overcoming these problems, the approach that is implied by this study is to work by trial and error, using rule induction as a means of testing the relative merit of ways of representing actions. Extension to these kinds of task will be briefly discussed further below, §8.2.1.

Finding new representational primitives

In the sea-searching experiments, the issue of developing new representational primitives for situations was not explored beyond the introduction of reasonable hand-crafted compound attributes, in the first experiment intuitively, and in the second experiment following the idea of information implications. In the representation of actions, the method in the first experiment was simply a kind of chunking, of no more sophistication than, for example, the reasonably well-known methods referred to by Schiele & Hoppe [120]. The methodology of the present study has minimised the need to find new representational primitives, but the proper representation of actions would be a more important issue in manual tasks with analogue controls, and the representation of situations would be more important in more complex tasks.

The problem of representing situations properly is focused by bringing in some kind of realistic limitations to the amount of information and the number of rules to be dealt with at any one time, in line with the supposed abilities of a human. We may find that to get the greatest possible ruliness from a set of human task performance data, where the situations are represented at a low level, requires the number of attributes, or the number of rules, to exceed such a limit. One possible reason for this would be that the human preprocesses some of the information into higher-level units (aggregation of data), in terms of which the rules may be considerably more compact. For this methodology to be generally effective, ideally it needs the addition of an automated method of finding such higher-level primitives.

Ideas on this have been introduced in the discussion of machine learning, above (§3.2.3), and a general solution would belong to that field where the issue of constructing new predicates has already been addressed (e.g., [86]). However, it is possible that other less general methods could be developed to tackle this problem specifically for the domain of human task performance. The process of looking for higher-level primitives could be triggered by finding a set of attributes that exceeded some reasonable bounds on human processing ability. For instance, a control room operator could be confronted with a large number of warning lights in a single panel. We could imagine the strategy to be markedly different depending on whether only one or two warnings were active, or many simultaneously. A very complex rule could describe the difference in terms of the status of each warning light; but the human would more likely be using a higher-level qualitative attribute of roughly how many warnings were active at the time. In this example, there would be very many individual sensors in the low-level representation, but some kind of constrained search over these sensors might turn up a qualitative measure of the number that were on simultaneously. Deriving such a method would require extensive further investigation.
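The warning-light example can be sketched in code. The sketch assumes a panel represented as a mapping from light names to on/off status, a cut-off of two active lights between "few" and "many" (following the "one or two" of the text), and three invented operator actions; none of these details comes from an actual control room or from the experiments of this study.

```python
def active_count_level(lights, few_limit=2):
    """Map the on/off status of every warning light to 'none', 'few' or 'many'."""
    n = sum(1 for on in lights.values() if on)
    if n == 0:
        return "none"
    return "few" if n <= few_limit else "many"

def compact_rule(lights):
    """A hypothetical strategy expressed over the single higher-level attribute."""
    level = active_count_level(lights)
    return {"none": "monitor", "few": "diagnose_each", "many": "shut_down"}[level]

# Hypothetical panel of 20 lights (names W0..W19 invented for illustration).
panel = {f"W{i}": False for i in range(20)}
one_fault = dict(panel, W3=True)
many_faults = {name: True for name in panel}

actions = [compact_rule(p) for p in (panel, one_fault, many_faults)]
```

A rule stated over the 20 individual lights would have to distinguish up to 2**20 on/off patterns, whereas the rule over the derived attribute has three cases. The open problem raised above is how such an attribute might be found automatically rather than written by hand as it is here.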

If no such methods were obtainable, one would again have to rely on trial and error, with different sets of primitives being evaluated with regard to the performance of the rules induced under each representation, as has been done in this study. Clearly, this process could be greatly aided by finding as much as possible about the terminology that people use when describing or discussing the task, and attempting to formalise those terms.

8.1.5 Conjectures about contexts

The nature of this study is exploratory, and there is no fully-fledged theory of human representations of complex systems which can be presented in conclusion. Nevertheless, it is important, in the conclusion to this study, to give a wider idea of the concept of context that has grown up alongside it. This is because these conjectures serve both to clarify the idea of context by giving it some background, and to point towards further areas of research.

Contexts can also be thought of as entities that serve the purpose of supporting other ideas. First in §2.4.9, and subsequently in §7, we have suggested that an operator's information processing can range from skill-based to knowledge-based in the course of one task. If we cannot describe a whole task performance as purely skill-based, rule-based, or knowledge-based, then to use these terms we must identify a smaller unit to which they could apply. Contexts as described here fit the bill.

The articulation of contexts

This study has not revealed any empirical evidence about how people articulate contexts: in particular, how they move from one to another, how many there can be active at one time, and whether there is any differentiation amongst contexts (different types, hierarchies, etc.). Reflection on everyday life, as well as other complex task performance, leads to the conjecture that context shifting may be done primarily by means of cues, which may be internal to the task (e.g., a particular goal state having been reached), or external to it (e.g., the phone rings). There are some context shifts that seem to be very widely applicable: for instance, in an office, when the fire alarm sounds. The rules of behaviour while a fire alarm is ringing are, to say the least, noticeably different from the rules obeyed in other situations. Of course, it is not the sound of the alarm itself that changes a person's rules of behaviour, but an internal change in response to that alarm. Another intuitively obvious phenomenon is that people sometimes get disoriented while they are performing a task, or engaged in an action. This leads to actions to identify what is happening, in other words, what the current context should be. So it seems reasonable that any context at any time will have a 'reorientation' context behind it, so to speak, from which the person may say 'what was I doing, now?' This reorientation context may vary according to circumstances, and it may have a more or less fixed method of determining the (lost) current context.

People also clearly engage in multitasking. This could be done via a mechanism involving attention: when there is a lull of attention-needing activity in the current context, one could move into another context that needed attention. Alternatively, it could be that a number of contexts can be simultaneous in a more immediate way, such that a relevant change of state in any of them brought that one into awareness. Analogies with multitasking computer operating systems may be of some use, including the ideas of dæmons and interrupts.

If one can easily imagine multitasking, then it would be even easier to imagine that the process of transition between two contexts involved a gradual, rather than a sudden, takeover. Being sure that a new context was appropriate and workable could be a precursor to relinquishing the previous context.

The development of contexts

We have seen in this study (§7.2.1) that it takes time from starting on a new task to settling on a particular pattern of information usage. At the early stages, there may well be some context structure, but it was not revealed by the methods used, and in any case, it appeared to be in a state of flux. In any unfamiliar situation, we have already said (§2.4.9) that information processing is more likely to be knowledge-based, and it seems likely that one of the chief processes by which a knowledge-based situation becomes rule-based and thence skill-based is by identifying the context structure within that situation. That is, a structure needs to be set up where it is known what the relevant variables are, what the appropriate actions might be, and when that context is no longer applicable. This would then be a precursor to refining the rules for application within each context.

An individual new context could originate from the human realising that some current context (perhaps an underdeveloped knowledge-based one) was needing better performance than was currently being achieved. Some distinguishing features would then be sought, on the basis of which to refine the context structure. As soon as some new context was defined, a process of adjustment would be started. Along any context boundary, one might find that one context was more appropriate than the other, in which case the boundary would shift so that the more appropriate one was used more.

The previous context structure out of which the new one came could be retained in the background, and might be referred to again in cases of disorientation. In cases of disuse, it could be that either the rules within contexts get forgotten, or the progression rules between contexts could be forgotten. In each case, we might see a reversion to a previous, more general, or simplified context structure for that domain or area.

A completely new area of experience would provide the possibility for borrowing the context structure from another domain, as an analogy. Just what parts of the context structure need to be borrowed from the analogous field, and what parts need to be changed, is unclear.

Types of context

This study has largely focused on a particular kind of context, where rules can be induced based on a few attributes. These are where the information processing is, in Rasmussen's terms (see §1.3.1), rule-based (if the rules are explicitly known) or skill-based (if the rules are not explicitly known). We have also discussed the possibility of contexts occurring at times where the less experienced operator is still, in Rasmussen's terms, using knowledge-based processing. Here, much varied information may be used, but we would expect it to be processed sequentially, and relatively slowly.

We can also imagine a third kind of information regime for a context: where there is much information, and that information is processed in parallel, relatively quickly. One paradigm for this would be pattern recognition. Now it may well be that, in practice, patterns are recognised as falling into one of a small range of classes, before being, as it were, passed on to the decision-making process, but it is often characteristic of patterns that we cannot build any simple rules which would enable inference of the class from the pattern elements.

If pattern-based information processing is going on, analysis and simulation in these contexts may require the integration of some kind of pattern-recognising front end to the information gathering process: something that would enable the classification of patterns into classes, modelling a particular human's way of doing this. However, just because a human uses pattern-matching does not necessarily mean that there is no other way of finding equivalent rules from the same information. If the situation arose, where a human was pattern-matching and other information could support concise rules, it would be possible to emulate human performance while losing some realism, by using the other information as the basis for inducing rules.

Extensive discussion of pattern-matching would be out of place here, since it has a large literature to itself, involving connectionist models. The existence of pattern-matching aspects to human information processing by no means invalidates the rule-based approach: but it does put limits on its comprehensiveness.

8.2 Further implications for systems design, decision aids, and training

When discussing cognitive psychology above, the point was made that having a theory does not necessarily mean either that it is relevant to real problems, or that it can be put to good use. Since this study was motivated by real problems in the world, it is appropriate to consider briefly how the work done could be built on to provide something of external use, which serves some of the purposes introduced in §1.3.4.

These could come from either of two levels of the analysis in this study: firstly (and more simply) from the representation, along with its implications for information use; and secondly (with more difficulty) from the rules, of a particular individual's task performance. The context structure and information usage could be useful to the incremental redesign of interfaces; but application to early design, training, and what is here called the 'Guardian Angel' approach to operator support, need a model of rules as well as context structure. Before considering these in detail, we shall first look at obstacles remaining in the way of applying the methodology to real tasks at all.

8.2.1 Preconditions for applying the methodology

The methods developed here aim not to rely on subjective judgement, but as far as possible to be objective and automatic. Hence it is important that the information automatically gathered and processed for the analysis covers as much as possible of the information actually used by the human operator in making decisions or in performing actions. In §8.1.4, we have discussed problems attending representing the information that is available, but this still leaves the problem of gathering relevant information in the first place. This research does not circumvent this problem, but rather highlights it by providing methods of using the information that has been gathered.

Applying to ship navigation

We have discussed above (§3.1) how ship navigation involves a number of aspects, and that there is limited usefulness in analysing one without the involvement of the others. But capturing data relevant to all the aspects presents major problems. Even for collision avoidance (a relatively easily represented aspect of the task) it would be difficult to capture the unmediated visual information that an officer of the watch might well use. And although it should be possible in principle automatically to capture the data presented on an advanced plan position indicator, there would still be problems in formalising that information, just as there were problems in the formalisation of graphical information in the experiments of the present study. But if we go beyond collision avoidance, the involvement of several people on the bridge clearly implies managerial aspects to the task, that would be difficult to capture and formalise automatically; and even on a one-man bridge, even in fog, the navigational decisions are greatly affected by communication with external agents such as harbour authorities and other ships. Before ship navigation can be fully analysed using the present methods, there is clearly more work to be done in gathering appropriate information.

Applying to other complex tasks

The problems in analysing ship navigation generalise easily to other complex tasks. We can imagine, in any complex system, the existence of sources of information that were either not monitored electronically, or difficult to formalise, or both. As well as the fairly obvious sights, sounds and smells, it is not uncommon in complex systems for the actions of an operator to be affected by factors relating to other people involved in the process. Keeping other people happy is not to be forgotten, but surely very difficult to capture and formalise.

The obvious way of attempting to perform a study in circumstances like navigation or other complex tasks would be to have a human observer recording significant events, and this could be fruitful; but there is the danger, at each of the stages of observation, recording, and interpretation, of the researcher's representation of the task interfering with discovering that of the subject being studied.

Fast dynamic tasks

The faster dynamic tasks include riding bicycles; many computer games, discussed in §5; and car driving, which will be used as an example below (§8.2.4). These pose particular problems in analysis, to the formalisation of both situations and actions.

The problem with the representation of situations is that there is an information input of major importance from vision, in a way that can appear to have the character of pattern-matching; and in the cases of driving and riding, there is much potential for information to be gained from such senses as proprioception and balance, which might be difficult to capture. Even to model human performance in the tasks of this kind that are easiest to analyse—the computer games—one might have to apply some sort of pattern-recognition preprocessing to get a tractable description of situations. At least, for computer games, the presented information is available for analysis: in driving or riding, automatic sensors cannot yet collect the same kind of information, to the same detail, as humans can. Simulation for these live tasks is certainly not easy, as was reported in §4 for bicycle riding, and therefore this would not be an easy alternative for analysing the content of the skill. A practical approach would therefore have to rely on gathering what data can be gathered by more straightforward means, and hoping that an analysis based on these would match a human performance even though it was not identical.

The problem with the representation of actions is that there are in the case of driving and riding only a few analogue inputs (steering, power) that are used to perform the wide range of actions that we are able to discuss for these tasks. Detecting what action had been performed, or indeed intended, from a record of physical movements of the controls would be difficult, and until this had been done, no satisfactory analysis of the skills could be expected (see the analysis and discussion in §4). Even with computer games of this type, where the input is technically digital rather than analogue, typically most, if not all, of the control is via a joystick or mouse, and there is still a great problem in adequately formalising and recognising the higher-level actions, given only data on the very limited range of low-level signals. Despite one reported attempt to extract machine pole-balancing skill from human performance on a pole-balancing simulation (see above, §3.2), simply in terms of the primitive left and right movements of a joystick [79], it remains doubtful how near this approach can come to modelling the human skill itself, without making higher-level characterisations of the actions.

For this kind of game, players are usually performing against their limits, such as reaction time and coordination. An adequate description of these skills would want to take into account these limits in some way, which may include modelling some psycho-motor aspects of task performance. Even for tasks in which the psycho-motor aspect is not a critical limiting factor, it is possible that psycho-motor factors affect the way the task is performed: this would again mean modelling these factors.

8.2.2 Interface redesign

If we imagine these preconditions to be satisfied, we could use the data gathered to reveal context structure to the task (whether by the methods used in this study, or by extensions and generalisations such as those discussed above, §8.1.4), and thus to progress towards interface redesign, addressing some of the problems of interfaces introduced in §1.2. The context structure could be made the basis for the interface, by being used as a fundamental unit of the organisation of the interface. It might help the immediate comprehensibility of this if the operator could agree meaningful names for each context. We could imagine, for example, one screen for each context, in an interface where one VDU is used to display many screenfuls of information. An important part of the interaction would be maintaining the appropriate context for the situation.

Within each context, the information that had been found relevant could be displayed. In general, we could expect the amount displayed at one time to be less than is generally presented simultaneously in complex system interfaces, but one should not lose sight of the value, and actual use, of redundant information in the same context, both to support a small range of alternative strategies, and to enable checking of the internal consistency of the data. The present study has not examined the use of redundant information, but one could extend the methods used here, to discover redundancy.

Some aspects of the style of the interface could also be based on knowledge gained from the analysis. If a context was highly ruly, the rules being derivable from only a small number of quantities, care could be taken on the appearance of the display to maximise the ease with which those quantities could be apprehended. This could well involve qualitative or symbolic displays, as well as the more usual continuous analogue or digital ones. But if the context did not have well-defined rules, and instead appeared either to be one where the information processing had a knowledge-based character, or one where the quantities needed were not sensed directly, one might want to design the display to enhance access to other sources of information, and the making of inferences, links, or analogies. Hypermedia-style interfaces come to mind here. For more subtle skills involving many small pieces of information, or graphic information, processed by pattern, one could design the display to enhance this ability, by ensuring that the important aspects of the pattern were salient.

If an interface were to be redesigned for a number of operators or users, the ease of using the currently suggested approach would depend on the extent to which the users all shared the same context structure and information use. If there were a lot in common, then the redesign could proceed in essentially the same way as for one individual. However, the present study would doubt whether individuals would be likely to share a lot in common in such a complex task, without at least extensive training designed to ensure conformity. If the context structure, only, were common to the different operators, and the information use different, then a redesigned interface could include all the information that any of the operators used in any one context. But if the operators' context structure were different, in terms of the divisions and boundaries between the contexts, and the higher-level rules governing transitions between them, it would be difficult to obtain a principled common redesign from the methods of the present study.

While discussing redesign, it may be pointed out that redesign of an interface is likely to change the way an operator performs the task, thus invalidating the analysis that led to the changed design. A change in the user's information use would likewise invalidate the design. Thus, in this paradigm, redesign should not be based on an analysis of an operator's performance while that operator's representation was changing, but it would be valid if the representation had achieved stability, even if the action rules were still changing.

It is relatively easy to see how this methodology could lead to incremental improvements in an interface, particularly to do with prioritising the information that was actually used over that which was not, thus easing the workload of operators, and reducing the chances of overlooking important information. For the more difficult objective of identifying major deficiencies in the provision of information, a context analysis could provide the raw material for someone with extensive knowledge of the task to use their creative insight to suggest, for example, where a new source of information could relieve an intricate context structure built around a badly instrumented system.

8.2.3 Safety

Having suggested that this methodology is primarily able to support the redesign of interfaces for individual operators, we would agree that it would be counterproductive to design an interface to support any maladaptive strategies that might have been adopted by a particular operator. Such maladaptive strategies could not easily be detected on the basis of context structure alone, except by means of another expert's judgement on what information should be consulted in a situation. If, however, a detailed model of the rules of a particular operator had been obtained, from a deeper analysis, this model could be tested on a simulation of the task, to see whether the rules used were likely to lead into trouble or inefficiency in any situations, most likely those that were not frequently encountered, but possibly also those where bad habits had consolidated. Finding such maladaptations could lead to the recommendation of remedial training on a simulator, focused on the kinds of situation where the model had shown potential problems.

This kind of detailed analysis of individual rules and their potential failures could bring a qualitative change in methods of human factors safety and risk assessment. Probabilistic methods of assessing the unreliability in the execution of a particular action cannot take into account the possible variation of reliability across different contexts, without having a context model. If errors could be analysed in a context-based way, there would be the potential of a very informative and precise attribution of causes to errors. The more accurately the rules of human task performance are known, the clearer will be the explanations of failures in that task.
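
As a minimal illustration of why a context model matters for such assessments, the following Python sketch compares a single pooled error rate for one action with the rates obtained when the same records are divided by context. The records, the context names and the helper `error_rates` are hypothetical, invented only for this illustration; they are not data or tools from the present study.

```python
from collections import defaultdict

def error_rates(records):
    """Return (pooled_rate, {context: rate}) for (context, is_error) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for context, is_error in records:
        totals[context] += 1
        errors[context] += int(is_error)
    pooled = sum(errors.values()) / sum(totals.values())
    by_context = {c: errors[c] / totals[c] for c in totals}
    return pooled, by_context

# Hypothetical log: the same action executed in two different contexts.
records = ([("routine", False)] * 95 + [("routine", True)] * 5
           + [("emergency", False)] * 6 + [("emergency", True)] * 4)

pooled, by_context = error_rates(records)
print(f"pooled: {pooled:.3f}")
for context, rate in by_context.items():
    print(f"{context}: {rate:.3f}")
```

In this invented log the pooled figure conceals the fact that the action is far less reliable in one context than in the other, which is the kind of variation a purely probabilistic assessment without a context model would miss.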

8.2.4 The Guardian Angel support paradigm

The prospect, even a remote prospect, of having a rule-based model of operator performance in a complex task, stimulated the concept of a kind of operator support that invites the name, "Guardian Angel". The concept is like that of a guardian angel, supposedly looking over the shoulder perhaps, remaining mostly in the background; understanding actions in terms of intentions; intervening when action and intention do not match, or where a harmful intention is formed, or where important information is overlooked. The intervention could be either giving advice, information, or asking a question to direct the attention to something. Like a guardian angel, such a system would be inherently personal. In order to relate some examples to common experience, we will here consider potential application to driving a car. This example is chosen not because it is typical of the kind of complex task considered in the present study (it is not), but in order that we may relate the concept to a task of which most adults have extensive experience.

Probably many of us would be very glad of a voice quietly telling us that there is a police car following behind. The potential value of this is recognisable irrespective of the practical difficulty of implementing this technologically. But when would a guardian angel system make such an announcement? Not every time a police car began to follow, for that might become annoying. If a guardian angel system had learnt that in every case when I knew a police car was following, I rigorously obeyed the current speed limit, then the observation that I was not being rigorous would enable the hypothesis that I had not seen the police car. That would be one truly helpful time to let me know. Equally well, if I was already cruising along at the speed limit, pressing my foot hard down on the throttle would be clearly inappropriate. A similar warning, delivered in a timely fashion, could keep me out of trouble.

This is not universally applicable, however. A fire-engine driver would not appreciate such advice, and indeed, there would be no such rule observable from a fire-engine driver's past behaviour. Here, a guardian angel system would not intervene, because the actions were within the normal range.

A guardian angel system would have to know about different classes of passenger, as well. We would not want it to give suggestions to the effect that I was driving slower than normal, when I had an aged relative as a passenger. On the other hand, if I forgetfully drove with my normal style for lone driving, I would appreciate a reminder (preferably visible, and only to myself) that I had a granny in the back.

In other examples, there might be no simple external explanation of the deviation from the normal range of behaviour. A guardian angel system might come to know what my normal performance was in keeping my place in a lane: how much I deviated each side of the mean, what the frequency of the deviations was, etc. Large oscillations are obviously undesirable, and if I started to exceed my normal limits of performance, it would be quite in order to suggest that I should reduce my speed. If information concerning my consumption of alcohol was also routinely available (or perhaps taking of some prescribed drugs) a guardian angel system might recognise the performance as belonging to a known category of substandard performances, for whatever the reason was. An ideal system might also know from experience what to do to reduce the risk of accident in these circumstances. This would be based on generalisation of many people's behaviour.
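
The idea of a personal normal range can be sketched very simply. The Python fragment below summarises one driver's usual lateral offsets by their mean and spread, and flags recent behaviour that falls well outside that range. Everything here is an assumption made for illustration: the readings are invented, the function names are hypothetical, and the cut-off of three standard deviations is an arbitrary choice. The sketch considers only the size of the deviations, not their frequency.

```python
import statistics

def lane_baseline(offsets):
    """Summarise a driver's usual lateral offsets (metres from lane centre)."""
    mean = statistics.fmean(offsets)
    spread = statistics.pstdev(offsets)
    return mean, spread

def exceeds_normal(recent, baseline, k=3.0):
    """True if any recent offset lies more than k spreads from the usual mean."""
    mean, spread = baseline
    return any(abs(x - mean) > k * spread for x in recent)

# Hypothetical readings gathered during the learning phase.
usual = [0.10, -0.05, 0.00, 0.08, -0.12, 0.04, -0.06, 0.02]
baseline = lane_baseline(usual)

steady = exceeds_normal([0.05, -0.08, 0.11], baseline)
weaving = exceeds_normal([0.05, -0.45, 0.50], baseline)
print(steady, weaving)
```

Because the baseline is computed from the individual's own record, the same offsets could be unremarkable for one driver and a cause for comment for another.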

Probably most of us are aware of a wide variety of driving styles, though we would be hard put to define exactly how to measure them. To design a general driver advice system would mean designing to the common factors of a large number of drivers. Such a system would not have the ability to discriminate between the same performance when done by different people, and the different implications of this. We would not want drivers with somewhat less motor control continually to be criticised for their lack of perfection, but it would be useful to be able to consider the possible reasons for an otherwise excellent driver to be driving in a way that might be reasonable for others, but distinctly bad for him or her.

Perhaps the most important distinction between a general advice system and a guardian angel system is in the likely response from users. With a general advice system, there would be external rules and standards, and it is easy to imagine a driver rejecting advice with the riposte that "It is perfectly safe!" The idea behind a guardian angel system is that the advice given would be like one's own advice, distilled from one's own performance, gathered and approved over long periods. If delivered appropriately, that advice should not suffer from the same disadvantage of being felt as fundamentally alien.

In order to work at all, such a system would need comprehensive access to information about the quantities or variables that affect task performance. Initially, there would be a learning phase, where the guardian angel system formed a detailed model of the user's task performance skill. The only advice that could be given initially would be of the same grade as from a general advice system, possibly tailored by the user's choices, or by fitting the user into a stereotype.

In time, enough data would accumulate to allow the derivation of a representation, and rules. Further data would have to be continually added and analysed, to ensure that the rules remained current. If no input was obtained from the user, the rules would reflect what he or she did, rather than what he or she thought was good performance. However, one can see much more potential coming from a system that includes value judgements from the user on his or her own performance. Most people seem to be aware when they make a mistake, or do something that they would rather not repeat, and if this value judgement could be incorporated by the guardian angel system, it could form models not only of what the user did, but of what the user thought was decent performance. This might even lead to the ability to suggest causes of poor performance, and ways of avoiding it.

In the European Community DRIVE Programme (Dedicated Road Infrastructure for Vehicle safety in Europe) there is a project, Generic Intelligent Driver Support (GIDS) [130], which aims to deal with similar issues, specifically for driving, including the idea of an adaptive interface; but it does not consider machine learning as a tool. The advantages of approaches using rule induction are firstly that the adaptation of the interface could in principle be closer than is possible using predefined stereotypes, and secondly that it provides a searching test of the adequacy of a representation for describing performance. If such a test is not carried out, it is easy to rely on a representation that only encompasses a small subset of the real task. This is related to the problem identified earlier (§), in which formalisms may fail to represent a task adequately.

8.2.5 Training and assessment

The clearest and most obvious application to training, of the analytic methods discussed in this study, would be to assess training's efficacy. Since training can potentially make a difference to the way in which a task is learnt, a detailed analysis of what has been learnt, in terms of contexts and rules, may reveal generic differences in the results of different training schemes, beyond the differences between the individual trainees. If the individual characteristics of the trainees were taken into account, there would be the potential for discovering whether different training regimes suited different types of people. In these cases, the context and rule analysis of task performance would be contributing more detailed feedback about what is learnt than is normally obtained through straightforward tests of speed or accuracy.

However, in the course of this study we have confirmed that it is more difficult to discover detail about task performance when a task is still in the early stages of learning, because in general the contexts and rules have still not stabilised. One outcome of the analyses performed is to show that different contexts differ in their degree of ruliness; and this suggests that even at the earlier stages of training one might be able to identify particular contexts where the rules were established relatively quickly.

Designing a training programme for a new task poses more problems. After such a programme had been running for a while, analysis of the results of training, as above, might be able to guide a redesign. But this does not help in the initial design of training: for that, one would need general principles of what was learnable, and what a human would be likely to learn about a task. This overlaps with the problem of early design, which will be discussed below.

Related to the concept of training, we could ask whether the methods of analysis described in this study could form the basis of an assessment tool, to discover the suitability of different people to the performance of different kinds of tasks. The answer to this is by no means clear, but if an answer were to be found, it would most likely relate to parameters governing the structuring of a task, which would here be chiefly about contexts. If one were to study the task performance of subjects, in terms of rules and contexts, across a wide range of tasks, there might be consistent differences between people, for example in terms of the number of rules that could be comfortably accommodated in one context; the number of different items of information that were taken into account in each context; the number of contexts into which a given task was split; and the amount of intermediate processing necessary in the execution of the rules in any context. From these differences, it might be possible to factor out one or more dimensions of ability in general task performance.

8.2.6 Early design

To be able to help with early design, or (relatedly) to help with the construction of a training programme for a new system, a model of an operator's capabilities must exist prior to the analysis of an actual operator's performance. This model would have to be able to address the question, "How difficult is the task of controlling this system, given this amount of information?" An even more general question would be, "What information has to be provided to make this task doable?", and we could imagine answering this latter question in terms of the former one, using the extra input of the cost of providing whatever information is needed. These questions are closely related to the idea of 'cognitive task analysis' which has been raised in the discussion of the literature above (§2.1.2), and in a more extensive and focused fashion elsewhere [43].

Here, as previously, to answer these questions we need to know something of the parameters governing human ability to structure a task. If we then came up with a model of one possible method of performing a task, those parameters could be applied to that model, resulting in a judgement whether that particular method was humanly possible or not. A positive result should be reassuring, that the task was indeed possible, but should not be taken to imply that any human would actually perform the task in that way. But the converse would not be true: just because one found that a particular method was implausible for a human would not mean that the task was impossible. To prove that a task was impossible would be at least much more exacting, if not itself strictly impossible.

To design a training programme, or to derive human models which could be expected to arise from a (possibly null) training programme, would need a model of human learning as well as parameters governing what is learnable. For this reason, the automatic design of a training programme is a yet more distant goal.

8.3 Still further work

8.3.1 Recreating context structure without explicit data on information usage

In the second sea-searching experiment (§7), a context structure was derived based on explicit use of sensors. In contrast, during a task where the information was freely available, it would be more difficult to demonstrate the existence of a context structure, albeit easy to imagine one. To set the idea of contexts on a firm footing, we should be able to derive such a structure without needing to monitor the information explicitly, or to rely on verbal reports of phases and information use. At the same time, room for improvement exists for finding more accurate and reasonable rules governing the actions taken. How could we envisage progress being made in these two areas?

The essence of the concept of context that has been introduced here is that it is useful for a number of purposes simultaneously. For regularly performed complex human tasks, it is economical to conjecture that a manageable number of rules for a limited range of actions should be closely associated with an information environment that: supports the application of those rules; is processable within the limits of human capability; and supports the rules necessary to switch to different contexts where appropriate.

One approach to this would be to start looking for a set of rules that fitted at all into a context structure: or, looking at the problem the other way round, to look for a context structure that divides rules up into suitable groups. The action rules that we have seen here each have conditions and an action: the conditions as a group, and the actions, can be true or false for any given example (here we are not counting the appropriateness of the context as a condition). What can we say about the truth of conditions and actions in the ideal model?

What this amounts to logically is this: if a rule is in its own context, and its conditions are true, then the action should be true. Conversely, if the action is other than a rule predicts, then it should follow that either the conditions are false, or the context is not proper to the rule.

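To consider this in more detail, one may recognise the way in which a rule can divide up a set of examples. Since the conditions as a group, and the action, can each be true or false for any given example, any rule divides the set of examples into four:

  1. conditions true, action true;
  2. conditions true, action false (i.e., an action other than the rule's);
  3. conditions false, action true;
  4. conditions false, action false.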

The function of contexts then emerges in this fashion. The context for a given rule must exclude as many as possible of the examples where the conditions are all true, but the action false (i.e., other than the rule's action), and should include as many as possible of the examples where the conditions and action are both true, though this latter is less crucial. The context structure as a whole should do this as economically as possible for all the rules together, and in such a way that rules for transition between contexts are possible.
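
The four-way division of examples by a rule, and the role a context plays in excluding counter-examples, can be sketched in a few lines of Python. This is an illustrative sketch only: the example records, the attribute names (`phase`, `depth`), the rule and the candidate context are all hypothetical, and the helper functions are not part of the analysis programs used in this study.

```python
def partition(examples, conditions, action):
    """Split examples into the four cells defined by one rule.

    Keys are (conditions_true, action_true).
    """
    cells = {(c, a): [] for c in (True, False) for a in (True, False)}
    for ex in examples:
        cells[(conditions(ex), ex["action"] == action)].append(ex)
    return cells

def context_fit(examples, conditions, action, in_context):
    """Within a candidate context, count rule hits and counter-examples."""
    inside = [ex for ex in examples if in_context(ex)]
    cells = partition(inside, conditions, action)
    return len(cells[(True, True)]), len(cells[(True, False)])

# Hypothetical performance records.
examples = [
    {"phase": "approach", "depth": 5, "action": "slow"},
    {"phase": "approach", "depth": 8, "action": "slow"},
    {"phase": "approach", "depth": 20, "action": "turn"},
    {"phase": "transit", "depth": 6, "action": "cruise"},
    {"phase": "transit", "depth": 30, "action": "cruise"},
]

def shallow(ex):
    """Hypothetical rule conditions: IF depth < 10 THEN action is 'slow'."""
    return ex["depth"] < 10

cells = partition(examples, shallow, "slow")
counts = {key: len(members) for key, members in cells.items()}

# Candidate context: the 'approach' phase only.
hits, counter_examples = context_fit(
    examples, shallow, "slow", lambda ex: ex["phase"] == "approach")
print(counts, hits, counter_examples)
```

Over the whole invented data set the rule has one counter-example (conditions true, action false); restricting it to the candidate context removes that counter-example while keeping both of the examples where conditions and action are true.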

It would be easy to derive an unsatisfactory context structure by concentrating on one aspect while neglecting the others. For a given set of rules, division of a data set into a large enough number of contexts would presumably be able to separate off the examples with true conditions and false actions: but this would be likely to lead to the inability to form rules governing context applicability. Alternatively, concentrating on plausible contexts with clear transition rules would be less likely to result in the ability of the contexts to distinguish accurately the applicability of rules. Again, if the contexts were chosen in advance, rules could be induced wholly inside those contexts, which would guarantee that the contexts served to limit the applicability of the rules, but to be sure of doing this, the rules would most likely be very numerous, and would be less likely to predict actions accurately.

Satisfying all these constraints for a context-based rule system poses a very challenging task. Could these constraints actually be sufficient to obviate the need for explicit knowledge of human information-processing limitations? After all, if the data is all taken from human performance, the constraints should in principle be able to be discovered from the data, not vice versa.

How could an answer to this question be approached? If we could define a goal, in terms of the desired characteristics of a representation, we could perhaps set up a heuristic search (in effect, through representation space), to find a representation which both conformed to expectations about context structure, and allowed action rules to be induced that accurately predicted human task performance. Unfortunately, it is not clear how to define such a goal; nor, for any given goal, is there any obvious way of determining whether it is attainable at all. A less explicitly goal-oriented approach would be to define a measure of success of a representation, and search for better ones for as long as desired. This is one way of looking at the process that has been followed in this study: the main criterion of success of a representation has been the performance, compared with the default rule, of the rules induced with that representation.
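
A measure of success of this kind can be sketched as follows, on the assumption that the default rule means always predicting the action that was most frequent in the training data. The action labels and predictions below are invented, and the function names are hypothetical; the sketch shows only the comparison itself, not the rule induction that would produce the predictions.

```python
from collections import Counter

def default_rule_accuracy(train_actions, test_actions):
    """Accuracy of always predicting the most frequent training action."""
    most_common = Counter(train_actions).most_common(1)[0][0]
    return sum(a == most_common for a in test_actions) / len(test_actions)

def gain_over_default(predicted, train_actions, test_actions):
    """Accuracy of the induced rules minus accuracy of the default rule."""
    rule_accuracy = (sum(p == a for p, a in zip(predicted, test_actions))
                     / len(test_actions))
    return rule_accuracy - default_rule_accuracy(train_actions, test_actions)

# Hypothetical actions; 'predicted' stands for the output of rules induced
# under one candidate representation.
train = ["wait"] * 6 + ["turn"] * 4
test = ["wait", "wait", "turn", "turn", "wait"]
predicted = ["wait", "wait", "turn", "wait", "wait"]

gain = gain_over_default(predicted, train, test)
print(f"gain over default rule: {gain:.2f}")
```

A search through representations could then simply keep whichever candidate gives the largest gain so far, continuing for as long as desired.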

But this study has been searching for something at a deeper level. This is that the success of a representation is the extent to which it divides up the data into different contexts which are recognisably different both internally, in terms of induced rules, or ruliness; and externally, in that there is some method, or there are some rules, for determining either which context should apply to any situation, or when a transition from one context to another should be made. More criteria which have not been discussed extensively are that the representation should minimise simultaneously the number of contexts, the number of rules in each context, and the information and amount of processing needed both to execute those rules, and to determine the context. The trade-offs inherent in attempting to satisfy these conflicting criteria have yet to be determined. In short, this thesis suggests that something like what is called here context constitutes a naturally occurring structural element in the analysis of human performance of complex tasks, and that therefore representations of the control of complex systems should incorporate context as a salient feature. Still better criteria for approximating human representations should be a goal for future work.

8.3.2 Further refinements of the context structure

Refining the quantities into qualitative ranges

The analyses in this study have used floating-point quantities (effectively continuous from a human point of view). This is because recent induction algorithms have been designed to process floating-point values, and to introduce their own divisions of these quantities into qualitative ranges. However, the literature referenced in § considers that humans often treat continuous quantities as if they were composed of a small number of discrete qualitative ranges. Also, in §3.2, we looked at the problem of dividing up continuous variables into qualitative ranges for the control of a dynamic system. This leads on to considering the potential of extending the context analysis to incorporate qualitative divisions, rather than leaving it to the induction programs.

This could be done by insisting that within any context, the qualitative ranges that are used by the different rules must be harmonised. It is clear that CN2, at least, does not consider this when constructing rules—see the rules in Appendix B, in which each quantity has many different splitting values, or thresholds, to use the terminology of §3.2. To implement a harmonisation of qualitative ranges would either need a major change to a rule-induction algorithm, or a possibly unwieldy arrangement whereby different thresholds were tested out for efficiency by reinducing all the rules for the context, using existing induction algorithms. Another implementation problem is that it is not clear how to set the trade-off between accuracy of induced rules and number of thresholds.
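
To make the idea of harmonisation concrete, the following Python sketch replaces each rule's own threshold on a quantity with the nearest member of a small shared set. It is a deliberately crude illustration, not a proposal for how CN2 or any other induction algorithm should be modified: the threshold values and the shared set are invented, the function name is hypothetical, and the step of reinducing the rules and checking their accuracy under the shared thresholds is not shown.

```python
def harmonise(thresholds, shared):
    """Map each rule threshold to the nearest value in a shared set."""
    return [min(shared, key=lambda s: abs(s - t)) for t in thresholds]

# Hypothetical thresholds on one quantity, taken from five rules in a context.
thresholds = [9.7, 10.2, 10.4, 24.8, 25.5]
shared = [10.0, 25.0]

harmonised = harmonise(thresholds, shared)
print(len(set(thresholds)), "distinct thresholds before,",
      len(set(harmonised)), "after")
```

The trade-off mentioned above appears here as the choice of how many values to allow in the shared set: fewer values give simpler qualitative ranges, but move each rule's threshold further from the value the induction algorithm originally chose.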

But if this were indeed possible, the information presented to the operator of a redesigned interface could be presented in a discrete, rather than an analogue, form, and this would enable a great reduction in the amount of information presented. Effectively all the unused information from the high-resolution sensors would be cut out, leaving only the essential bits. Whether or not this would be a good idea overall is difficult to determine; but it would certainly provide feedback about whether the analysis was accurate or not. If the operator's performance was impaired by having only qualitative information rather than quantitative, one would be led to ask what that extra information was being used for. On the other hand, it is conceivable that the operator would find the task easier, due to the simplification of the presented information, and the reduction of distracting extraneous information.

Re-examination of actions

If rules are specific to contexts, it makes sense to consider the actions specified by them as proper to contexts as well. This implies another potential constraint on, and another way of discovering about, contexts. The constraint is that each context should have a limited number of possible actions: in practice, if the number of rules is already restricted, this means that those rules must be predicting only a small range of actions. Independently of rules, one could use the co-occurrence of actions as another guide to the sections of data belonging to different contexts, because one would expect each context to have a peculiar pattern of actions.
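
One simple way of using the co-occurrence of actions as a guide to context boundaries is sketched below. The action log is cut into consecutive windows, each window is summarised by the set of actions it contains, and a boundary is proposed wherever adjacent windows share few actions. The log, the window length, the similarity threshold and the function names are all assumptions made for illustration; Jaccard similarity is used here only as one convenient measure of overlap.

```python
def action_profiles(actions, window):
    """Set of distinct actions in each consecutive window of the log."""
    return [frozenset(actions[i:i + window])
            for i in range(0, len(actions), window)]

def jaccard(a, b):
    """Overlap between two action sets: size of intersection over union."""
    return len(a & b) / len(a | b)

def boundaries(profiles, threshold=0.5):
    """Indices of windows whose action set differs sharply from the previous one."""
    return [i for i in range(1, len(profiles))
            if jaccard(profiles[i - 1], profiles[i]) < threshold]

# Hypothetical action log with a change of activity half-way through.
log = ["scan", "turn", "scan", "turn", "scan", "scan",
       "dive", "grab", "dive", "grab", "grab", "dive"]

profiles = action_profiles(log, window=3)
proposed = boundaries(profiles)
print(proposed)
```

Such proposed boundaries would only be candidates: they would still need to be checked against the other criteria for contexts, such as ruliness within each section and the possibility of rules for the transitions.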

However, it is also important to remember that actions from the point of view of cognitive analysis are not necessarily the same as individual button-presses, or whatever else the most basic interaction with the system is. As in §6, an analysis of sequences of actions may be necessary to establish more correctly what the cognitive actions are. Such an analysis may be called for if one were to find that a context did contain unreasonably many individual actions. Using better-represented actions could be expected to clarify context structure as well as to improve the apparent ruliness within contexts.

8.3.3 Directions for machine learning

The extension of machine learning techniques in the analysis of aspects of human task performance has been discussed above. A question which naturally arises from this is, can machine learning alone learn how to perform a task in a similar way to a human, without knowing anything about how humans have actually performed it? If this could be done, it would clearly relate to early design (§8.2.6), and possibly to training. The ultimate goal here would be the automatic analysis of a task that had not yet been mastered, and the generation of a training programme to teach it to humans.

The answer to this question depends on what we mean by 'similar' to a human. The strongest criterion would be a Turing test—whether other people could distinguish between the performance of humans and the performance of the machine-acquired skill. The possibility of such a skilled machine for complex tasks is limited by our knowledge of the information-processing structures of the human, and so progress towards that goal would come with the refinement of our knowledge about human skill and knowledge. But there are also weaker criteria. As Michie & Johnston pointed out [81], it is becoming increasingly important that such knowledge as is acquired automatically, is accessible to humans, for checking its validity and applicability. But in order to fit into this 'human window', it is not strictly necessary to perform a task in a human-like way, but only to have a humanly manageable structure to it. A context structure, irrespective of how closely it corresponds to actual human practice, is certainly a reasonable approach to providing just the kind of structure to a task that it would be relatively easy to understand. This is because a context structure is a way of minimising the amount of information, the number of rules, and the complexity of processing, that has to be dealt with at any time. It would be interesting and significant to know what features of a task structure are strictly implied by this quest for minimising cognitive difficulty. This would be a valuable extension along more formal lines.

After deciding on the form of what is to be learnt, the next problem for machine learning to tackle is how to go about learning the content. If we were to suppose that humans are good at learning new, unstructured problems, then machine learning might also profitably gain from copying a model of human learning. In this case, advance in machine learning and advance in the study of human learning would share a direction for progress. Hence, our last consideration is directed towards human learning.

8.3.4 Prospects for contributing to the study of human learning

Investigating parameters for the structure and content of human task performing skill has already been mentioned. The contribution of this study is in suggesting the importance of rules and ruliness, and the centrality of the concept of context; and in suggesting some of the central parameters which might govern contexts: amount of information available; number of rules; complexity of higher-level rules for determining context; processing requirements in the gathering of information into a usable form, and in the execution of rules.

We have seen above (§7.2.7) how some contexts appear to be well-defined, but not ruly, on the basis of rule-induction from obvious attributes. This has suggested a distinction paralleling Rasmussen's, between contexts where the information processing has a more knowledge-based character, and those where it has a more rule-based character. Here we draw attention to the possible need to model these kinds of context differently. Since it has been supposed that the knowledge-based approach comes before the rule-based or the skill-based, a model of learning could address the question of how general-purpose problem-solving contexts become gradually differentiated into specific, efficient, task-oriented rule- or skill-based contexts.

We have noted above (§6.4.6) how the methods of this study are not well-suited to the study of the early stages of learning, and indeed to the study of the learning process itself. This was partly because it is difficult to obtain sufficient data at a stable early level of skill. One plausible idea for circumventing this would be to arrange a study of subjects whose practice would be strictly regulated, interspersing periods of learning and improvement with periods when just the right amount of practice was done so that the performance remained at a stable level, neither improving because of too much practice, nor declining through too much time between the practices. Whether this could work, even in principle, would depend on whether what was learnt was the same as what was forgotten. If one could, by this means or any other, gather a much larger amount of data from the early stages of learning a skill, it would become possible to use both the kinds of method that have been used here, and those that have been discussed as improvements.

This study has furthered the aim of modelling cognitive aspects of complex control tasks, by analysis of human performance in terms of information and ruliness, using rule-induction tools and some concepts from cognitive psychology. The same inter-disciplinary approach could be greatly extended, potentially illuminating a broad range of topics concerning learnt human skills and how to support them.