©1990, 1995

7.3 Discussion

7.3.1 Main findings of this experiment

The discussion at the end of the previous chapter highlighted the need for an approach to finding out about human representations of situations. To this end, we have seen the introduction of a concept of context, together with a rudimentary means of deriving contexts within the framework of the information-costing experimental arrangement that was devised expressly for that purpose, followed by an analysis in terms of those contexts.

Despite the shortcomings of these methods, which will be discussed later in this section, the context structures derived

Some of the contexts appear to have a comparatively highly rule-based character, and it is easy to relate this to Rasmussen's categories of rule-based and skill-based behaviour. The behaviour would be rule-based, in Rasmussen's terms, if the rules were consciously known by the operator, and skill-based if they were not. On the other hand, other contexts do not reveal a highly rule-governed nature through this method of analysis. There are a number of possible explanations for this, but an obvious one is that they correspond to Rasmussen's category of knowledge-based behaviour. Here it is interesting and suggestive to note that in the ship searching contexts, for which good rules could not be derived, the information flow is relatively small, with the sensors kept mostly off, and the number of actions is comparatively low. These are just the conditions one would expect for knowledge-based processing.

It must be emphasised here that the results of these analyses are tentative. The analysis methods have apparently not been tried on this kind of data, and there are no established equivalents of the general statistical methodologies, current in psychology, to support this approach. The results have been presented and discussed largely in terms of the difference in performance between induced rules and default rules, expressed as a simple difference in percentage. However, there are undoubtedly other possible ways of arriving at a measure of `how much has been learnt', and the methods used here were chosen because they were plausible and gave interesting results. We await a more thoroughly worked out methodology. To the extent that these results can be considered at all valid, they serve also to support and justify the novel techniques that have been necessary to derive them. There is a great deal more that could be done in the line of analysis in terms of contexts, and this can be seen as a highly valuable outcome of the context principle.
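The measure described above, a simple difference in percentage between induced and default rules, can be sketched as follows. This is only an illustration: it assumes that the default rule always predicts the most frequent action in the training data, which is not confirmed here, and the function names and action labels are hypothetical.

```python
from collections import Counter


def default_accuracy(train_actions, test_actions):
    """Percentage correct when always predicting the commonest training action.

    Assumption: this is what 'default rule' means; the text does not define it.
    """
    default = Counter(train_actions).most_common(1)[0][0]
    hits = sum(1 for actual in test_actions if actual == default)
    return 100.0 * hits / len(test_actions)


def induced_accuracy(predicted_actions, test_actions):
    """Percentage of test examples on which the induced rules were correct."""
    hits = sum(1 for p, a in zip(predicted_actions, test_actions) if p == a)
    return 100.0 * hits / len(test_actions)


def gain(train_actions, predicted_actions, test_actions):
    """Induced-rule accuracy minus default-rule accuracy, in percentage points."""
    return (induced_accuracy(predicted_actions, test_actions)
            - default_accuracy(train_actions, test_actions))


# Hypothetical data for one context: actions observed in training and test runs,
# and the actions that a set of induced rules predicted for the test examples.
train = ["none", "none", "none", "turn", "none"]
test = ["none", "turn", "none", "none", "turn"]
predicted = ["none", "turn", "none", "turn", "turn"]

print(default_accuracy(train, test))   # default rule alone
print(induced_accuracy(predicted, test))
print(gain(train, predicted, test))
```

A gain near zero for a context would correspond to the cases above where good rules could not be derived; other measures of `how much has been learnt' could be substituted for `gain` without changing the rest of the sketch.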

7.3.2 Justification of results in terms of other work

A context structure is also a means of structuring a task so that it does not grossly exceed the known capabilities of the human information processing system. Card, Moran & Newell's Model Human Processor, which has been discussed above (§ 2.3.2.1), has a useful collection of relevant values of those capabilities. No explicit attention has been paid to making a context structure fit within these limits, but it is not difficult to see firstly, that a context structure is a plausible way of breaking down a task so that only a small number of independent quantities need to be monitored at any one time; and secondly, that explicit constraints of this type could be built into a context analysis process, to ensure that the limits were respected, and thus that a context analysis remained consistent with what is known about human information processing.

This would also be addressing similar issues to those addressed by the idea of Programmable User Models (PUMs), also discussed above (§ 2.3.2.2). A context-based structure could provide a model of the content of task skill, in a form which could be run on an explicitly constrained computational model of a human operator, as envisaged by the PUMs approach.

7.3.3 Problems and direct remedies

Here we will consider problems with each stage of the analysis, from the subjects onwards. These problems invite solutions, which are suggested as well.

Both subjects showed and described recent changes in their methods of performing the task. A longer practice time would be preferable. Based on the experience of these experiments, one could conjecture that perhaps 100 hours of practice would be more appropriate for the level of complexity of the task examined here.

The relatively short practice time meant that the data could be expected to have more anomalies in it than would be the case for later practice, but the data was not `cleaned up' in any way before use. This means that it could have included runs, or parts of runs, when the player was doing something other than the usual task. It would be possible, if laborious, to watch all the runs carefully, and to discard those runs which appeared not to be conforming to a minimum standard of attempting to perform the task as given. This would run the risk, of course, of selecting the data to fit the theory, but it might also produce an improvement in the clarity of the analysis. Another related open question is whether to filter out actions which preceded disasters (such as setting off a mine), on the grounds that such actions cannot be consistent with a successful overall strategy.

Having chosen the data, attention turns to the analysis, with the construction of contexts and the choice of attributes within each context. The method of finding contexts was not highly developed or principled, and there is no doubt that this could be improved, both by refining the information-hiding methods employed in the second experiment, and by exploring other methods, which will be discussed below (§ 8.3.1).

The question of selecting attributes within a context is highly problematic. Seen in one way, this is an endless problem, to be solved only in the ideal case that a full predictive model of behaviour is constructed in terms of the full set of attributes. However, the impossibility of this need not blind us to possibilities of improving the attribute set for any context. This is also linked to the question of whether we have a realistic context structure, since an inappropriate structure could mean that an inhomogeneous mixture of information was in use. But assuming that there was a good context structure, there are essentially two approaches to improving the set of attributes associated with it. The first is the way which has been taken here, to monitor usage, and to ask the operator what information is being used. More attention could be given to this. The second way is to ascertain which attributes lead to the best induced rules, and this will be taken up later, in § 8.3.1.

Having decided on the contexts and attributes to be used, the next important factor in the induction is to optimise the operation of the rule-induction program for the data presented. In the analysis reported here, plausible values were assigned to the parameters of the program, and not altered, so that the analysis would not be confused. There remains the possibility that other values would have given better or clearer results. A natural extension to the work would be to check this.

Another approach to obtaining good rules is not to rely entirely on the rule-induction process, but to attempt some kind of selection or editing of rules. This could be done by eliminating those rules that performed least well on test data; or that could be discounted on a priori grounds such as symmetry, or the use of attributes that should have nothing to do with the action. It is important to recognise that in this experiment, no attention has been paid to the rules themselves, but only to the performance of the rules together. In other words, the chief interest has been the ruliness of the data, rather than the details of the rules. The number of rules is rather larger than one would desire for a model of dynamic task performance, and the rules individually appear more to specify when a given action does not occur, than when it does occur. Hence it is unclear how successful editing rules would be.
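The first editing option mentioned above, eliminating the rules that perform least well on test data, might look something like the sketch below. It assumes that rules can be judged one at a time (as with an unordered rule set), that a rule is a condition on attribute values paired with an action, and that a fixed accuracy threshold decides which rules are kept. The `Rule` class, the threshold and the attribute names are all hypothetical, and nothing of this kind was done in the experiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Attributes = Dict[str, float]
Example = Tuple[Attributes, str]  # (attribute values, action actually taken)


@dataclass
class Rule:
    name: str
    condition: Callable[[Attributes], bool]  # does the rule fire on these values?
    action: str                              # action the rule predicts


def rule_accuracy(rule: Rule, examples: List[Example]) -> Optional[float]:
    """Fraction of the examples covered by the rule that match its action.

    Returns None when the rule covers no examples, so that 'untested' can be
    told apart from 'always wrong'.
    """
    covered = [action for attrs, action in examples if rule.condition(attrs)]
    if not covered:
        return None
    return sum(1 for action in covered if action == rule.action) / len(covered)


def edit_rules(rules: List[Rule], test_examples: List[Example],
               min_accuracy: float = 0.5) -> List[Rule]:
    """Keep only rules that fire on the test data and reach min_accuracy.

    Dropping rules that never fire is a design choice of this sketch.
    """
    kept = []
    for rule in rules:
        accuracy = rule_accuracy(rule, test_examples)
        if accuracy is not None and accuracy >= min_accuracy:
            kept.append(rule)
    return kept


# Hypothetical rules and test examples.
rules = [
    Rule("fast", lambda a: a["speed"] > 5, "slow_down"),
    Rule("shallow", lambda a: a["depth"] < 10, "surface"),
]
test_examples = [
    ({"speed": 6, "depth": 20}, "slow_down"),
    ({"speed": 7, "depth": 5}, "slow_down"),
    ({"speed": 1, "depth": 5}, "none"),
]
print([rule.name for rule in edit_rules(rules, test_examples)])
```

Because the induced rules appear mostly to specify when an action does not occur, a per-rule accuracy of this kind may be a poor guide to their usefulness, which is one reason the success of such editing is unclear.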

Another unexplored possibility is the integration of the analysis of situation representations followed in this experiment, with the analysis of action representations, which was carried further in the previous experiment. It is an open question whether this would improve the effectiveness of the analysis as a whole.

7.3.4 Other possible direct extensions to the study

Other extensions to the work, that do not arise specifically from recognised problems or deficiencies, involve methods to further check the validity or consistency of the results.

One extension that was originally envisaged, but not undertaken, was to take the representations derived from particular operators and implement interfaces in which the sub-displays corresponded to the contexts, and the information available in those sub-displays corresponded to the information that was found to be used within that context. It would then be possible to test experimentally how operators performed with interfaces that either corresponded, or not, to their own context structure. This might provide valuable feedback about how closely an individual's representation had been captured.

Related to this, it would be very interesting to train people on the information-costing version of this task, and then put them on a version as in the former experiment, where all the information is simultaneously available. An important question would be, do their rules for performing the task stay as they were, or does the presence of extra information help, or even possibly hinder, them? Having developed a strategy for using information, do they prefer an interface where information can be turned off?

There might be some value in changing the scoring system. For instance, any access to a piece of information could be priced at the appropriate value for a minimum time of a few seconds. Alternatively, a sensor could be set to be disabled a few seconds after a button-press on an enabled sensor. This might make the analysis of information usage easier, by making the system fit more closely with human short-term memory.
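The first suggested change, pricing any access for a minimum time, can be illustrated with a short sketch. It assumes that each sensor is charged at a fixed rate per second while enabled; the rates, the sensor names and the three-second minimum are hypothetical values chosen only for the example.

```python
def access_cost(on_time, off_time, rate_per_second, min_seconds=3.0):
    """Cost of one access, charged for at least min_seconds.

    Assumption: information is priced at a constant rate per second while a
    sensor is enabled; min_seconds stands in for 'a few seconds'.
    """
    duration = off_time - on_time
    if duration < 0:
        raise ValueError("off_time must not precede on_time")
    return rate_per_second * max(duration, min_seconds)


def total_cost(accesses, rates, min_seconds=3.0):
    """Sum the cost of (sensor, on_time, off_time) accesses."""
    return sum(access_cost(on, off, rates[sensor], min_seconds)
               for sensor, on, off in accesses)


# Hypothetical sensors, rates (cost units per second) and access log (seconds).
rates = {"sonar": 2.0, "compass": 0.5}
accesses = [
    ("sonar", 0.0, 1.0),     # a 1 s glance, charged as 3 s
    ("sonar", 10.0, 16.0),   # a 6 s access, charged as 6 s
    ("compass", 2.0, 2.5),   # a 0.5 s glance, charged as 3 s
]
print(total_cost(accesses, rates))
```

Under such a scheme a brief glance costs as much as a few seconds of viewing, so very short accesses no longer carry a price advantage.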

At some point it might become worthwhile to assess the difference, if any, between results obtained with CN2 (in its different modes), other rule-induction algorithms, and other techniques such as Bayes classifiers.

Another more ambitious way of testing the whole context and rule system is to use them to construct an executable model player, based on the data from one human player. To do this, one would have to first code context selection rules, then, for each context, code a set of rules for that context. In considering rules for contexts, some of the same considerations arise as in the discussion of types of action, above (§ 6.4.2). One could consider a context to be a function of the system state, with every state having a unique corresponding context. This may, however, be over-idealised for representing a human context structure. In order to implement a model where the context was a function of several variables of the system, those variables would have to be continuously monitored, to check for change of context. If the number of variables to be monitored was in excess of the plausible human monitoring capacity, it might become more realistic to consider context changes as the fundamental method of keeping track of context, with rules for change from one context to others existing alongside the rules for actions within that context. There could then arise considerations such as whether more than one context could coexist, where there was swapping between contexts based on available attention rather than triggering rules. The issues involved in constructing a full executable model of an individual's task performance are extensive, and some of them are taken further in the next chapter.
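The second arrangement described above, in which rules for changing context sit alongside the rules for actions within each context, can be sketched as a very small executable model. This is a toy illustration under stated assumptions, not a model derived from any subject's data: the context names, state variables, rules and the `ModelPlayer` class are all hypothetical, and rules are simply tried in order with the first match winning.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Rule = Tuple[Callable[[State], bool], str]  # (condition, result if it fires)


def first_match(rules: List[Rule], state: State, default: str) -> str:
    """Return the result of the first rule whose condition holds, else default."""
    for condition, result in rules:
        if condition(state):
            return result
    return default


class ModelPlayer:
    """Keeps track of context by transition rules, then acts within it."""

    def __init__(self, transition_rules: Dict[str, List[Rule]],
                 action_rules: Dict[str, List[Rule]], start_context: str):
        self.transition_rules = transition_rules  # context -> rules giving next context
        self.action_rules = action_rules          # context -> rules giving an action
        self.context = start_context

    def step(self, state: State) -> str:
        # Only the transition rules of the current context are consulted, so
        # only the variables they mention need to be monitored at this moment.
        self.context = first_match(
            self.transition_rules.get(self.context, []), state, self.context)
        return first_match(
            self.action_rules.get(self.context, []), state, "no_action")


# Hypothetical two-context structure.
transition_rules = {
    "searching": [(lambda s: s["target_visible"] == 1, "approaching")],
    "approaching": [(lambda s: s["target_visible"] == 0, "searching")],
}
action_rules = {
    "searching": [(lambda s: s["target_visible"] == 0, "turn")],
    "approaching": [(lambda s: s["range"] < 10, "stop"),
                    (lambda s: True, "advance")],
}

player = ModelPlayer(transition_rules, action_rules, "searching")
for state in [{"target_visible": 0, "range": 50},
              {"target_visible": 1, "range": 50},
              {"target_visible": 1, "range": 5}]:
    action = player.step(state)
    print(player.context, action)
```

The alternative view, with context as a pure function of the system state, would replace the per-context transition rules with a single function from `State` to a context name, evaluated on every step. The sketch does not address coexisting contexts or attention-based swapping.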
