©1990, 1995

7.2 Analytic methods and results

7.2.1 Analysis of sensor usage

The modifications to the program meant that, as well as the effector actions recorded in the previous experiment, the trace records included every key-press that turned a sensor on or off. Since all the sensors were initially off (except the scores, which were all on), the visibility of every sensor at any time could be deduced from the trace files alone, without needing to recreate the simulations for that run.

So that we can discuss collections of sensors, we shall here use the term ``chord'', by analogy with the musical term, to mean a collection of sensors that are simultaneously available (in the current interface, this means simultaneously visible). For instance, one chord commonly used by subject AJ in the later stages was the combination of the three sensors that indicate the ROV height, the range from the target, and the relative heading of the target.

Analysing these chords required a means of representing and manipulating them on the computer. Since there were fewer than 24 sensors in each of the three sub-displays, each chord could be held as a 32-bit integer, with one bit for each of the sensors that could be on or off, three bits showing which sub-display was current (each sub-display having different sensors), a further bit showing whether the graphic display for that sub-display was on or off, and another showing whether the general position indicator was on or off. Here these integers are written in octal. The first digit (after `ch') holds the bits for the graphic displays: it is 1 if the sub-display graphic is on, 2 if the general position indicator is on, 3 if both are on, and 0 if neither is. The second digit is 2 for the ship sub-display, 3 for the ROV, and 4 for the cable. The last 8 digits (24 bits) are for the individual sensors, each digit carrying information on up to 3 sensors (the individual bits of the octal digit). A 7 indicates three sensors on; a 3, 5 or 6, two sensors; and a 1, 2 or 4, one sensor. Thus, as in Figure 7.2, the chord ch0347372742 indicates the ROV sub-display, with graphics off, and 15 sensors on (which was all of them). The chord ch3200000000 indicates the ship sub-display, with both its own graphic display and the general position indicator on, and all other sensors off.
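The chord encoding lends itself to a short sketch. The Python below is an illustration, not the author's code; the names Chord, parse_chord and format_chord are hypothetical. It assumes exactly the layout described above: ten octal digits after `ch', with the graphic bits in the first digit, the sub-display number in the second, and 24 sensor bits in the last eight.

```python
from dataclasses import dataclass

SENSOR_MASK = (1 << 24) - 1    # last 8 octal digits: one bit per sensor
SUBDISPLAY_SHIFT = 24          # second octal digit: 2 ship, 3 ROV, 4 cable
GRAPHIC_SHIFT = 27             # first octal digit: graphic display bits
SUBDISPLAY_NAMES = {2: "ship", 3: "ROV", 4: "cable"}


@dataclass(frozen=True)
class Chord:
    graphic_on: bool   # graphic display of the current sub-display (bit value 1)
    gpi_on: bool       # general position indicator (bit value 2)
    subdisplay: int    # 2 = ship, 3 = ROV, 4 = cable
    sensors: int       # 24-bit mask of individual sensors

    def sensors_on(self) -> int:
        return bin(self.sensors).count("1")


def parse_chord(code: str) -> Chord:
    """Decode a code such as 'ch0347372742'."""
    if not code.startswith("ch") or len(code) != 12:
        raise ValueError(f"not a chord code: {code!r}")
    value = int(code[2:], 8)
    graphics = (value >> GRAPHIC_SHIFT) & 0o7
    return Chord(
        graphic_on=bool(graphics & 1),
        gpi_on=bool(graphics & 2),
        subdisplay=(value >> SUBDISPLAY_SHIFT) & 0o7,
        sensors=value & SENSOR_MASK,
    )


def format_chord(chord: Chord) -> str:
    """Encode a Chord back into the ten-digit octal form."""
    graphics = (1 if chord.graphic_on else 0) | (2 if chord.gpi_on else 0)
    value = (graphics << GRAPHIC_SHIFT) | (chord.subdisplay << SUBDISPLAY_SHIFT) | chord.sensors
    return "ch" + format(value, "010o")


rov_all = parse_chord("ch0347372742")
print(SUBDISPLAY_NAMES[rov_all.subdisplay], rov_all.sensors_on(), rov_all.graphic_on)  # ROV 15 False
```

Keeping the integer form, rather than a set of sensor names, makes later operations such as comparing or merging chords simple bitwise operations.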

A program called tracechord was written by the author to analyse the chords from the action traces. The action trace files, which were in the same format as in the previous experiment, were passed to tracechord, which kept track of which sensors were visible, identified the chords used, and counted the effector actions performed while each chord was showing. Figures 7.2 and 7.3 show output from tracechord for the two subjects, side by side. Each entry gives the chord code, the number of effector actions performed while that chord was showing, and that number as a proportion of the subject's total.
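A minimal sketch of the counting that tracechord performs may help. The trace format is not reproduced in this section, so the event stream below is a hypothetical simplification: ("switch", n) selects sub-display n, ("sensor", bit) toggles one sensor in the current sub-display, ("graphic", None) and ("gpi", None) toggle the sub-display graphic and the general position indicator, and ("effector", key) is an effector action. It assumes that each sub-display remembers its own sensor and graphic settings, that the general position indicator is a single global switch, and that the starting sub-display is the ship; only the all-off starting state comes from the text.

```python
from collections import Counter
from typing import Any, Iterable, List, Tuple

Event = Tuple[str, Any]   # hypothetical simplified trace event


def tally_chords(events: Iterable[Event], start: int = 2) -> List[Tuple[str, int, float]]:
    """Count effector actions per chord; return (code, count, proportion), most used first."""
    sensors = {2: 0, 3: 0, 4: 0}   # sensor bits per sub-display; all sensors start off
    graphic = {2: 0, 3: 0, 4: 0}   # each sub-display's own graphic display (assumed remembered)
    gpi = 0                        # general position indicator (assumed global)
    current = start
    counts: Counter = Counter()
    for kind, arg in events:
        if kind == "switch":
            current = arg
        elif kind == "sensor":
            sensors[current] ^= 1 << arg
        elif kind == "graphic":
            graphic[current] ^= 1
        elif kind == "gpi":
            gpi ^= 1
        elif kind == "effector":
            # Build the chord code in force at the moment of this action.
            value = ((graphic[current] | (gpi << 1)) << 27) | (current << 24) | sensors[current]
            counts["ch" + format(value, "010o")] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(code, n, n / total) for code, n in counts.most_common()]


trace = [("switch", 3), ("sensor", 0), ("effector", "up"), ("effector", "up"),
         ("sensor", 1), ("effector", "left")]
for code, n, share in tally_chords(trace):
    print(f"{code} {n:5d} {share:.4f}")
```

The printed lines follow the layout of Figures 7.2 and 7.3: chord code, effector count, and proportion.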


             AJ                              MT
 ch0347372742  1331 0.4087       ch0347372742  1374 0.4883
 ch1347372742   885 0.2717       ch3277633303   605 0.2150
 ch3277633303   600 0.1842       ch1347372742   538 0.1912
 ch0400766736   241 0.0740       ch0400766736   161 0.0572
 ch3200000000    44 0.0135       ch2277633303   127 0.0451
 ch2277633303    41 0.0126       ch0400000000     5 0.0018
 ch2200000000    32 0.0098       ch1300000000     2 0.0007
 ch0300000000    30 0.0092       ch2200000000     1 0.0004
 ch1300000000    17 0.0052       ch3200000000     1 0.0004
 ch0400766636     9 0.0028
 ch0200000000     8 0.0025
 ch0400000000     8 0.0025
 ch2347372742     5 0.0015
 ch0277633303     3 0.0009
 ch0343372742     2 0.0006
 ch3347372742     1 0.0003
Figure 7.2: Sensor chord usage at the outset

Figure 7.2 covers the first few hours of each subject's practice. As instructed, the subjects left most of the sensors on during the initial stages of learning the task, and relatively few chords were tried. The chords used extensively at this early stage were mainly those with most, if not all, of the sensors on. Figure 7.3 gives the corresponding figures for each subject's final few hours, except that the list for MT omits further, less frequent chords (with effector frequencies down to 1). For both subjects there were also many other chords in use during which no effector actions were performed, and these are omitted from the figures. In the late chords, we see that the sensor usage of both subjects has changed greatly from the early pattern, and that each subject's usage is markedly different from the other's. The frequently used chords have only a few sensors on, and many more different chords are used.


             AJ                              MT
 ch0301000140   901 0.2891       ch1303020700   518 0.1902
 ch0200200200   330 0.1059       ch1303020100   235 0.0863
 ch0300000000   324 0.1039       ch0300000600   219 0.0804
 ch0200000000   310 0.0995       ch0300000000   186 0.0683
 ch1301000140   255 0.0818       ch0303020700   149 0.0547
 ch1300000000   205 0.0658       ch0400000000    88 0.0323
 ch0201200200   166 0.0533       ch0205000000    85 0.0312
 ch0400000000   149 0.0478       ch0207200000    80 0.0294
 ch0204000000    92 0.0295       ch0400000002    76 0.0279
 ch0400000002    68 0.0218       ch0201200002    75 0.0275
 ch0207000000    64 0.0205       ch0207200002    64 0.0235
 ch0300000040    60 0.0192       ch0200000000    58 0.0213
 ch0201000000    47 0.0151       ch0201200000    55 0.0202
 ch0206000000    21 0.0067       ch0205200000    54 0.0198
 ch0200200000    19 0.0061       ch0300000700    43 0.0158
 ch0301000100    17 0.0055       ch0301020700    42 0.0154
 ch0205000000    15 0.0048       ch0205200002    39 0.0143
 ch0300000140    10 0.0032       ch0301000100    32 0.0117
 ch0202000000     9 0.0029       ch0300000100    29 0.0106
 ch0201200000     8 0.0026       ch0225200000    28 0.0103
 ch0207200000     7 0.0022       ch0235000000    28 0.0103
 ch0200000200     6 0.0019       ch0301000700    28 0.0103
 ch0300000100     6 0.0019       ch0204000000    27 0.0099
 ch0201000200     5 0.0016       ch0207000000    25 0.0092
 ch1200000000     5 0.0016       ch0215200000    22 0.0081
 ch0207200200     3 0.0010       ch0303020100    18 0.0066
 ch0301000000     3 0.0010       ch2400000000    18 0.0066
 ch0207000200     2 0.0006       ch2400000002    17 0.0062
 ch1206000000     2 0.0006       ch0300020700    15 0.0055
 ch2201200200     2 0.0006       ch1301000100    15 0.0055
 ch0203000000     1 0.0003       ch0225000000    14 0.0051
 ch0206200000     1 0.0003       ch0234000000    14 0.0051
 ch0301000040     1 0.0003       ch0237000000    13 0.0048
 ch1201200200     1 0.0003       ch2205000000    13 0.0048
 ch1204000000     1 0.0003       ch0224000000    12 0.0044
 ch2207200200     1 0.0003       ch0301020100    12 0.0044
                                 ch1303000100    12 0.0044
                                 ch0235400000    11 0.0040
                                 ch0303020600    10 0.0037
                                 ch1300000600    10 0.0037
                                 ch2237000000    10 0.0037
                                 ch0201200202     9 0.0033
                                 ch0303000100     9 0.0033
                                 ch1301020700     9 0.0033
                                 ch0201000000     8 0.0029
                                 ch0227200000     8 0.0029
                                 ch0235200000     8 0.0029
                                 ch0215000000     7 0.0026
                                 ch0221200000     7 0.0026
                                 ch0300010000     7 0.0026
                                 ch2225000000     7 0.0026
                                 ch0203200000     6 0.0022
                                 ch0221000000     6 0.0022
                                 ch0227200002     6 0.0022
                                 ch0300000400     6 0.0022
                                 ch2224000000     6 0.0022
                                 ch3234000000     6 0.0022
                                 ch0207600000     5 0.0018
                                 ch0217000000     5 0.0018
                                 ch0220000000     5 0.0018
Figure 7.3: Sensor chord usage at the end

The sensor usage results are enough to add another dimension to differences between individuals, but they are not of themselves enough to give a predictive model of the players' actions. For this, we must think again about what is necessary for a predictive model, before we are able to integrate these sensor usage results into a coherent model.

7.2.2 The idea of context applied to this analysis

In the previous chapter, we looked at attempts to derive rules for various groups of control actions, and recognised that better rules could be derived from some representations of situations and actions than from others. But we did not address the wider issue of making a predictive model of an operator's behaviour broad enough in scope to simulate performance of the whole task. Considering this issue in greater depth has important repercussions for this analysis.

The above analysis of sensor usage made it clear that players were able to perform the task adequately, and with improving scores, using only a small selection of sensors at any one time. A predictive model of an operator who uses only certain information should ideally use the same information. So what information should a predictive model be using, and what rules should it have `loaded', at any particular time?

Let us consider the full spectrum of possible answers to these questions. At one extreme, a model could be based on all rules being available at once. To execute such a model, all the information relevant to all of the rules would also have to be available. Besides not matching the results of this experiment, the model's reliance on all the information being monitored might plausibly reflect human information processing if actions were few and far between, but not if some of the actions demanded focused attention over time, as is the case in the experimental simulation here.

At the other extreme, a model could be based on only one rule being present at a time. The information needed to execute that rule would be well-defined and limited, but the difficulty would shift to the extensive higher-level rules needed to decide which rule was relevant in any particular situation.

Seeing this spectrum of possibilities, the approach taken in this analysis was to explore a range of the middle ground first, since that appeared by far the most plausible. The middle-ground assumption is that there are groups of rules applicable at the same stage of the task, and that those rules share an information environment: although they will not all require exactly the same information, there will be considerable overlap. The amount of this information should be such that it is plausible to imagine a human monitoring it, given the workload and constraints of that stage of the task. There should also be some prospect of reasonable higher-level rules governing either the transition to a new information environment or the deduction of which environment should apply at any time. This `package' of rules and information requirements will here be called a context.

As well as reflecting the natural use of the word, the name serves to distinguish the idea from potentially related earlier concepts such as schemas [10], frames [82], scripts [119], and their offspring. The motivation behind those concepts has more to do with long-term memory, general knowledge, and the understanding of story fragments, which differs from that of the present study. The term is, however, used in a similar sense by Fagan et al. [35] when discussing the VM system, although not in the sense used in the rest of the MYCIN project.

A context is a particular stage of the task, along with the rules and the information that are actually being used during this stage, for which the chords are evidence. This is in harmony with natural usage such as ``in different contexts, the same values imply different actions''. The word ``representation'', though it has been used in a variety of looser ways, will now refer to a whole pattern of context-based information use that can be thought of as latently present while a person is performing a task; but the meaning should not be taken to include the lower-level action rules themselves. The higher-level rules for switching between contexts could be seen either as a property of individual contexts, or as a property of the representation as a whole.
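As a way of fixing the terms, the sketch below renders a context and a representation as Python data structures. It is illustrative only: the class and field names, and the example quantities (height, range, heading, cable_length), are hypothetical. It encodes two points from the text: the rules of a context need not all use the same information, but none should need information outside the context; and a representation covers information use and context switching, not the action rules themselves.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

Situation = Dict[str, float]   # hypothetical: quantity name -> current value


@dataclass(frozen=True)
class Rule:
    action: str                  # effector action the rule recommends
    inputs: FrozenSet[str]       # quantities tested in the rule's conditions


@dataclass
class Context:
    stage: str                   # stage of the task this context covers
    information: FrozenSet[str]  # quantities monitored during this stage
    rules: List[Rule] = field(default_factory=list)

    def is_consistent(self) -> bool:
        # Rules may use different subsets of the information, but none may go beyond it.
        return all(rule.inputs <= self.information for rule in self.rules)


@dataclass
class Representation:
    # Context-based information use plus a switching rule; action rules are excluded.
    information: Dict[str, FrozenSet[str]]   # context name -> quantities used
    select: Callable[[Situation], str]       # higher-level rule: situation -> context name

    def information_for(self, situation: Situation) -> FrozenSet[str]:
        return self.information[self.select(situation)]
```

Here the higher-level switching rule is attached to the representation as a whole; as the text notes, it could equally be distributed among the individual contexts.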

Having introduced the idea of context, it should be noted that in principle it could span the whole range between the two extremes mentioned above. There could be only a few contexts, just below the level of the task as a whole: each would include a relatively large number of rules and need a relatively large amount of information, but the transition rules would be less likely to be intricate. Alternatively, there could be many contexts comprising only a few rules each, with a correspondingly small requirement for information, but the higher-level rules for determining the context would be more complex. Furthermore, in principle there is no reason why a context structure should not be built up in more than one layer: action rules could be grouped at the bottom level, and higher-level rules grouped at successive levels up to that of the whole task. This last possibility will not be explored here.

Since this idea of context is about a package of action rules, information and higher-level rules, ideally we would carry out context analysis in terms of rules as well as information use. However, this has not yet proved possible. An approach to it is discussed below (§ 8.3.1); for the time being, our analysis is based on information use alone.

7.2.3 Analytic approach

Analysis in terms of contexts ideally needs data about information usage, but what we have at this point is data on sensor usage. The two need not be the same, for a number of reasons.

  1. There could be sensors visible that were not being used. Up to a point, this could be minimised by the player having practised to the stage where most unused sensors were turned off. However, over short time intervals under time pressure, one could expect some to be left on unused.
  2. The values of sensors could be remembered after they were turned off, and so play a part in some decision while not visible. In discussion, one of the subjects confirmed that this was a conscious strategy in some specific situations. In other situations, however, it appeared not to be an issue, so an analysis purely in terms of memory would not reveal all that was desired.
  3. Information could be deduced from other visible sensors. For example, acceleration can be deduced from observation of speed over time, and time before arriving somewhere can be deduced from speed and distance. Also, the effector settings could be tested without needing the relevant sensor, by pressing on the effector reckoned to be currently set, which would result in the audible beep given in response to an ineffective action. Again, in principle these possibilities were confirmed in discussion.
Though the first of these reasons may not be a great problem, the second and third mean that it is unsatisfactory simply to use the chords themselves as the basis for the analysis. Furthermore, Figure 7.3 shows that many different chords were used, which seems too many to correspond to a human division of the task into stages.

The analysis needed to compensate as far as possible for these ways in which information usage was likely to differ from sensor usage. Also, in keeping with the original aims of the study, it was desirable to keep the analysis as objective and automatic as possible. A method of grouping the chords together, and a method of allowing for implied information, are described below.

7.2.4 Analysis structure


Figure 7.4: Simplified data flow during analysis of second experiment

An outline of the flow of data in this analysis is given in Figure 7.4, which should be compared with Figure 6.4 above. The first stage of the analysis, shown in outline on the left side of the figure, is to find some context structure and content, and the second stage, down the main axis of the figure, is to use this structure in the induction of rules for the actions.

7.2.4.1 Finding context structure

There are at least three potential ways of finding context structure. Firstly, we could ask subjects what their perceived context structure is, i.e., how they split up the task into subtasks or substages, what information they use to make decisions in each context, and what rules they use. (See below, § 7.2.9.4.) This may or may not correspond with what they actually do. Secondly, we can examine the information that they use, and look for patterns in that usage. This is the main approach taken here. Thirdly, we could derive a context structure from the pattern of applicability of rules. This will be explained and discussed as part of further work, below (§ 8.3.1).

To obtain a more satisfactory (and possibly more realistic) context structure than that given by the chords alone, the raw chords needed to be grouped in some way. What was needed was more than a simple clustering process, because when the chords are clustered together, we do not wish to take the central or most frequent one as wholly defining the information usage, since other chords may have had extra sensors turned on. From the point of view of rule induction, the important point was not necessarily to find the exact information used in a context, but to find a superset of this. Having a few variables present that were not actually used should not hinder the rule induction to any great extent.

In the following procedure, which the author devised to meet this need, the frequency associated with a chord was the number of effector key-presses in the sample performed while that chord was in use. Starting from the least frequently used chord, each chord in turn was compared with all the more frequently used chords, to find whether any of them lay within a specified `distance' of it and had at least as great a frequency; if so, the one of those with the greatest frequency was chosen. The less frequently used chord was then absorbed into the more frequently used one. If the frequency of the absorbed chord was greater than zero, the sensors that were on in it were added to those of the chord into which it was absorbed, to make a superset chord stored separately (with harmonics, to continue the analogy). A distance of one unit meant that the two chords differed by exactly one non-graphic sensor being on in one chord and off in the other, while a graphic display that was on in one chord and off in the other was (arbitrarily) assigned a distance of three units.
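The absorption procedure can be sketched in Python using the integer chord codes. This is a reconstruction from the description above, not the author's program, and several readings are assumptions: `more frequently used' is taken to mean later in the original least-to-most-frequent order (ties broken by chord code), `at least as great a frequency' compares the running totals of the groups built so far, and chords on different sub-displays are treated as never absorbable into one another. Under these readings, applying the sketch to the 36 later chords of AJ from Figure 7.3 with a threshold of two units gives the ten groups of Figure 7.5.

```python
import math
from typing import Dict, Tuple

SENSOR_MASK = (1 << 24) - 1
SUBDISPLAY_SHIFT = 24
GRAPHIC_SHIFT = 27


def chord_distance(a: int, b: int) -> float:
    """One unit per differing sensor, three per differing graphic display."""
    if ((a >> SUBDISPLAY_SHIFT) & 0o7) != ((b >> SUBDISPLAY_SHIFT) & 0o7):
        return math.inf   # assumption: different sub-displays are never merged
    diff = a ^ b
    sensor_diff = bin(diff & SENSOR_MASK).count("1")
    graphic_diff = bin((diff >> GRAPHIC_SHIFT) & 0o3).count("1")
    return sensor_diff + 3 * graphic_diff


def absorb_chords(freqs: Dict[str, int], threshold: int = 2) -> Dict[str, Tuple[int, str]]:
    """Map each surviving base chord to (group frequency, overchord code)."""
    order = sorted(freqs, key=lambda k: (freqs[k], k))   # least frequently used first
    rank = {c: i for i, c in enumerate(order)}
    code = {c: int(c[2:], 8) for c in freqs}
    freq = dict(freqs)    # running group frequencies
    over = dict(code)     # running overchords (union of sensors in the group)
    alive = set(freqs)
    for c in order:
        # Originally more frequent chords, still heads of their own group,
        # within the threshold and with at least as great a running frequency.
        candidates = [t for t in alive
                      if rank[t] > rank[c] and freq[t] >= freq[c]
                      and chord_distance(code[c], code[t]) <= threshold]
        if not candidates:
            continue
        target = max(candidates, key=lambda t: (freq[t], t))
        if freqs[c] > 0:   # chords with no effector actions add no sensors
            over[target] |= over[c]
        freq[target] += freq[c]
        alive.remove(c)
    return {c: (freq[c], "och" + format(over[c], "010o"))
            for c in sorted(alive, key=lambda k: (-freq[k], k))}
```

Whether a chord is compared against running or original frequencies matters: for AJ, ch0200000000 (final total 569) and ch0200200200 (537) are two units apart, and they stay separate only because each group's total is checked at the moment the smaller chord is processed.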


 ch0301000140   998 0.3202 och0301000140 
 ch0200000000   569 0.1825 och0207200200 
 ch0200200200   537 0.1723 och0207200200 
 ch0300000000   324 0.1039 och0300000000 
 ch1301000140   255 0.0818 och1301000140 
 ch0400000000   217 0.0696 och0400000002 
 ch1300000000   205 0.0658 och1300000000 
 ch1200000000     8 0.0026 och1206000000 
 ch2201200200     3 0.0010 och2207200200 
 ch1201200200     1 0.0003 och1201200200 
Figure 7.5: Result of chord absorption for the later chords of AJ

An example of the result of this process, applied to the later chords of subject AJ already given in Figure 7.3, is shown in Figure 7.5. With the absorption threshold set at two units, the 36 original chords were reduced to 10 groups of chords. In the figure, each line has four components. The first three are as before: the `base' chord code of highest frequency in this group (starting with `ch'); the frequency (of effector actions with this group of chords); and this frequency expressed as a proportion of the whole. The fourth and last entry on each line (starting with `och' for `overchord') represents the chord made up by including all the sensors that were on in any of the chords in the group. For instance, in the original list, the chord ch0400000000 (cable sub-display with all sensors off) had a frequency of 149. In the list of chord groups, this has absorbed the chord ch0400000002 (cable sub-display with one sensor on), which had a frequency of 68, so the resulting group has a base chord of ch0400000000, an overchord of och0400000002 and a frequency of 217 (149 + 68). The other groups are made up in the same way.

Having thus grouped related chords together, the next step was to allow for information that was implied. Each sensor was assessed for its likely implications, and a routine was written to add these implied quantities to the contexts. The implications were based on just a few basic principles.

  1. In every sub-display, the settings of the effectors were counted as known, since they would be known when set, and it was also possible to test or confirm the settings by further effector key-presses.
  2. Any sensor would imply its time derivative, where this was a meaningful and relevant quantity.
  3. The graphic displays were taken to imply the information that was most obvious from them. This involved introducing quantities that had no separate sensor of their own. These were the implications about which it was hardest to achieve any certainty.
It is difficult to be comprehensive about these implications, so it would be surprising if there were not some omissions, and indeed spurious inclusions. This is a list of the implications that were included in the analysis. The items followed by a star (*) did not have a sensor of their own.

These implications were added to the overchords, which then resulted in chord groups as in Figure 7.6. Comparing this with the previous figure, 7.5, we see how the overchords have been filled out. For instance, for the first chord, ch0301000140, before implications, the overchord has only the same three sensors as the base chord: after implications, the overchord has seven sensors (och0303102142). In addition to this, the implied quantities that had no actual sensor were recorded in another part of the data structure, along with the implications from the general position indicator that referred to information whose digital sensor was in another sub-display. When these chord groups are used, the base chords on the left are used to match a chord for closeness, and the overchords and extra quantities without sensors are used to give what is hoped to be a superset of the information used in any particular context.
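Adding implications to an overchord amounts to OR-ing in further sensor bits and collecting the quantities that have no sensor of their own. The list of implications actually used is not reproduced here, so the mapping in this sketch, and every quantity name in it, is hypothetical. The sketch makes a single pass over the bits that are on (implied bits do not trigger further implications, an assumption), and it drops the graphic bits from the result once their implications have been added, which is consistent with Figure 7.6, where the first digit of every overchord is 0.

```python
from typing import Dict, FrozenSet, Tuple

SENSOR_MASK = (1 << 24) - 1     # last 8 octal digits
SUBDISPLAY_MASK = 0o7 << 24     # second octal digit
GRAPHIC_BITS = (27, 28)         # sub-display graphic, general position indicator

# Hypothetical table: bit position -> (implied sensor bits, implied sensorless quantities)
Implications = Dict[int, Tuple[int, FrozenSet[str]]]


def apply_implications(overchord: int, implies: Implications,
                       always: FrozenSet[str] = frozenset()) -> Tuple[int, FrozenSet[str]]:
    """Return the filled-out overchord and the quantities held outside the chord code."""
    sensors = overchord & SENSOR_MASK
    extras = set(always)   # e.g. effector settings, counted as known in every sub-display
    for bit in [*range(24), *GRAPHIC_BITS]:
        if (overchord >> bit) & 1 and bit in implies:
            mask, quantities = implies[bit]
            sensors |= mask & SENSOR_MASK
            extras |= quantities
    # Graphic bits are not carried over; what they imply is now explicit.
    return (overchord & SUBDISPLAY_MASK) | sensors, frozenset(extras)


# Hypothetical example: sensor bit 5 implies its time derivative on bit 6,
# and the ROV graphic implies sensor bit 0 plus a quantity with no sensor.
table = {5: (1 << 6, frozenset()), 27: (1 << 0, frozenset({"bearing*"}))}
code, extras = apply_implications(0o1300000040, table, frozenset({"effector settings"}))
print("och" + format(code, "010o"), sorted(extras))
```

Keeping the sensorless quantities in a separate set mirrors the text's description of recording them in another part of the data structure.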


 ch0301000140   998 0.3202 och0303102142 
 ch0200000000   569 0.1825 och0207211301 
 ch0200200200   537 0.1723 och0207211301 
 ch0300000000   324 0.1039 och0300102002 
 ch1301000140   255 0.0818 och0303112142 
 ch0400000000   217 0.0696 och0400000116 
 ch1300000000   205 0.0658 och0303112142 
 ch1200000000     8 0.0026 och0206611101 
 ch2201200200     3 0.0010 och0237211301 
 ch1201200200     1 0.0003 och0205611301 
Figure 7.6: Result of implications after chord absorption

7.2.4.2 Using context structure in the remaining analysis

The second stage of the analysis follows the data down the central path of the diagram (Figure 7.4). Starting with the trace data, the first step is to expand it in the same way as in the previous experiment. In that experiment, actrep then dealt with the representation of actions, both null and compound. In this experiment, with higher-level ROV turn controls introduced, the focus moved away from the representation of actions; but null actions still needed attention even though compound actions were to be ignored.

A modified version of actrep removed ineffectual key-presses, and inserted a null action wherever there were at least 10 consecutive time steps (that is, 5 seconds) without any key-presses. This produced a reasonable number of null actions: at least of the same order as the number in any other individual action class, but not so many as to far outnumber all the other classes put together.
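The null-action step can be sketched as follows. The input format, a time-ordered list of (time step, key, effective) tuples, and the "null" marker are hypothetical. The text does not say how many null actions a long idle gap produces; this sketch assumes one null action per complete run of 10 idle steps (5 seconds at 0.5 seconds per step), placed at the last step of that run. Ineffectual presses are dropped first, so they count as idle time.

```python
from typing import Iterable, List, Tuple

NULL = "null"   # hypothetical marker for a null action


def _nulls(last: int, upto: int, gap: int) -> List[Tuple[int, str]]:
    """Null actions for the idle steps strictly between `last` and `upto`."""
    idle = upto - last - 1
    return [(last + k * gap, NULL) for k in range(1, idle // gap + 1)]


def insert_null_actions(presses: Iterable[Tuple[int, str, bool]],
                        start: int, end: int, gap: int = 10) -> List[Tuple[int, str]]:
    """presses: (time step, key, effective) in time order; returns (time step, action)."""
    out: List[Tuple[int, str]] = []
    last = start - 1
    for step, key, effective in presses:
        if not effective:   # ineffectual presses are removed and treated as idle time
            continue
        out.extend(_nulls(last, step, gap))
        out.append((step, key))
        last = step
    out.extend(_nulls(last, end + 1, gap))   # trailing idle time up to the end step
    return out


trace = [(0, "a", True), (3, "b", False), (25, "c", True)]
print(insert_null_actions(trace, start=0, end=30))
```

The gap parameter makes it easy to check how the number of null actions compares with the other action classes, which was the criterion used to judge the 10-step setting.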

The functions of the previous programs sitrep and indprep (see Figure 6.4) were combined into a new program, prepcont (for `prepare data according to context'), part of which incorporated a definition of a representation in terms of contexts, either output from tracechord or hand-written. By this stage prepcont amounted to some 600 lines of source code. A decision had to be made about which actions to include and which to leave out, as had been done previously by sitrep with the representation files. Including all of them would merely clutter up the programs, since several actions were either never or very rarely taken. Subject AJ never used the ship's propellers individually, so the relevant effectors were left out in his case, but they were left in for subject MT, who did use them. The camera angle controls were left out on the grounds that the information on which these actions would be based would be graphical in form, and difficult to formalise. The action of detonating the mines was left out because its button is in the top section of the screen, always available, so it would not obviously belong to any one of the ordinary contexts. There remained 44 keys included in the analysis for AJ and 53 for MT, and these accounted for the overwhelming majority of all key-presses.

In a significant change from the earlier method, prepcont output a number of files ready for the rule-induction programs: this is what the fanning out of arrows in Figure 7.4 is intended to show. Each defined context had separate files, and to make it possible to test rules against test data from the same time interval, the data for each interval and context were split into two parts by putting alternate examples in two files. Thus, for a representation of 10 contexts, 20 files would be generated from however much data was fed into prepcont at one time. The form of these files was as before (see Figure 6.8).
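The fanning out of prepcont's output can be sketched as a function that files each example under its context and alternates between two halves within each context. The example type, the context_of function (which in prepcont would depend on matching the current chord against the representation's contexts) and the "a"/"b" labels are hypothetical; the sketch returns lists rather than writing files, and assumes that the alternation runs separately within each context.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_context(examples: Sequence[T],
                     context_of: Callable[[T], Hashable]) -> Dict[Tuple[Hashable, str], List[T]]:
    """Group one interval's examples by context, alternating between halves 'a' and 'b'."""
    parts: Dict[Tuple[Hashable, str], List[T]] = defaultdict(list)
    seen: Dict[Hashable, int] = defaultdict(int)
    for example in examples:
        ctx = context_of(example)
        half = "a" if seen[ctx] % 2 == 0 else "b"
        seen[ctx] += 1
        parts[(ctx, half)].append(example)
    return dict(parts)


# Ten contexts, four examples each: 10 contexts x 2 halves = 20 output groups.
examples = [(ctx, i) for i in range(4) for ctx in range(10)]
parts = split_by_context(examples, lambda e: e[0])
print(len(parts))
```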

This separation of alternate examples was reasonable here because each action is associated with the situation prevailing at only one time interval, and the rule-induction process makes no distinction on the basis of the order of the examples; any significance the order might have had is lost in any case. In contrast, when inducing higher-level rules for the contexts themselves (see below, § 7.2.7), the data cannot be split in the same way: one context covers a sequence of examples, so assigning alternate examples to alternate data sets would effectively give training and test sets drawn from the same instances. In that case, therefore, the data needed to be divided sequentially.
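The point about context-level data can be made concrete by counting how many runs of one context contribute examples to both the training and the test set. The labels, the helper names and the 50/50 proportion of the sequential split are hypothetical; the text states only that the division was sequential.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Split = Tuple[List[int], List[int]]   # (training indices, test indices)


def run_ids(labels: Sequence[str]) -> List[int]:
    """Number each maximal run of consecutive examples that share a context."""
    ids: List[int] = []
    for run, (_, group) in enumerate(groupby(labels)):
        ids.extend([run] * len(list(group)))
    return ids


def alternate_split(n: int) -> Split:
    return list(range(0, n, 2)), list(range(1, n, 2))


def sequential_split(n: int, fraction: float = 0.5) -> Split:
    cut = int(n * fraction)
    return list(range(cut)), list(range(cut, n))


def shared_runs(labels: Sequence[str], split: Split) -> int:
    """How many context runs end up on both sides of the split."""
    ids = run_ids(labels)
    train, test = split
    return len({ids[i] for i in train} & {ids[i] for i in test})


labels = ["A"] * 6 + ["B"] * 6 + ["C"] * 6
print(shared_runs(labels, alternate_split(len(labels))),
      shared_runs(labels, sequential_split(len(labels))))
```

With alternate assignment every run straddles the split, so the test set contains the same context instances as the training set; a sequential cut leaves at most the run at the cut point shared.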

To be more comprehensive than in the previous experiment, it was decided to generate rules for every set of data, and to test those rules against every other set of data within the same context. A C-shell script was written to govern this process, using the same implementation of CN2 for the induction as previously. The unordered mode of CN2 was used for this analysis, with the modification that only rules whose decision class was also the class of maximum frequency in the rule's class distribution were recorded. CN2's parameters were set to values that had given reasonable results in the first experiment: `star' was set at 15 (a value also used in tests by the algorithm's authors [23]), and the significance threshold at 15.0.
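The bookkeeping around the CN2 runs can be sketched in Python, although the original used a C-shell script. The sketch only enumerates the train/test combinations and applies the rule filter; it does not call CN2, and the data-set naming, the function names and the treatment of ties in the filter (a tied maximum is kept) are assumptions.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Tuple

# Parameter values from the text; how they are passed to CN2 is not shown here.
CN2_PARAMETERS = {"star": 15, "significance_threshold": 15.0}


def train_test_pairs(datasets: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """From (context, data set) pairs, list every (context, train, test) within a context."""
    by_context: Dict[str, List[str]] = defaultdict(list)
    for context, name in datasets:
        by_context[context].append(name)
    return [(context, train, test)
            for context, names in sorted(by_context.items())
            for train, test in permutations(sorted(names), 2)]


def keep_rule(rule_class: str, distribution: Dict[str, int]) -> bool:
    """Record a rule only if its class is the most frequent among the examples it covers."""
    if not distribution:
        return False
    return distribution.get(rule_class, 0) == max(distribution.values())


# Two contexts, two intervals, two halves each: 4 data sets per context, 4 x 3 ordered pairs.
sets = [(ctx, f"{ctx}{i}{half}") for ctx in ("A", "B") for i in (1, 2) for half in "ab"]
print(len(train_test_pairs(sets)))
```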
