| ©1990, 1995 | 7: Experiment 2 |
The modifications to the program meant that, as well as the effectors recorded in the previous experiment, the trace records included all the key-presses that turned sensors on or off. All the sensors were initially off, except the scores, which were all on. From these records one could therefore deduce which sensors were visible at any time using only the trace files, without needing to recreate the simulations for that run.
So that we can discuss collections of sensors, we shall here use the term ``chord'', by analogy with the musical term, to mean a collection of sensors that are simultaneously available (in the current interface, this means simultaneously visible). For instance, one chord commonly used (later on) by subject AJ was the combination of the three sensors that indicate the ROV height, the range from the target, and the relative heading of the target.
Analysis of these chords needed a means of
representing and manipulating them on the computer.
Since there were fewer than 24 sensors in each of
the three sub-displays, it was possible to hold
the chords in the form of 32-bit integers,
where there was one bit for each of the sensors that
could be on or off, three bits showing what sub-display
was current (each sub-display having different sensors),
a further bit showing whether the graphic display for
that sub-display was on or off, and another showing
whether the general position indicator was on or off.
Here these are written in octal format.
The first digit (after `ch') contains the bits
indicating the status of the graphic sensors,
with the first digit being 0 if neither is on,
1 if the sub-display graphic is on,
2 if the general position indicator is on,
and 3 if both are on.
The second digit is 2 for the ship sub-display,
3 for the ROV, and 4 for the cable.
The last 8 digits (24 bits) are for the individual sensors,
one digit having information on up to 3 sensors
(the individual bits of the octal number).
A 7 indicates three sensors on;
a 3, 5 or 6, two sensors, and a 1, 2 or 4, one sensor on.
Thus, as in Figure 7.2,
the chord ch0347372742 indicates the ROV sub-display,
with graphics off, and 15 sensors on (which was all of them).
The chord ch3200000000 indicates the ship sub-display,
with both its own graphic display and the general position
indicator on, and all other sensors off.
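The packing just described can be illustrated with a minimal Python sketch. It assumes only the field layout given above: the graphics flags in the leading octal digit, the sub-display code in the next, and 24 sensor bits in the last eight digits. The function names `encode_chord`, `decode_chord` and `sensors_on` are hypothetical and are not taken from the original program.

```python
SUBDISPLAYS = {2: "ship", 3: "ROV", 4: "cable"}


def encode_chord(graphics: int, subdisplay: int, sensors: int) -> str:
    """Pack graphics flags, sub-display code and sensor bits as 'ch' + 10 octal digits."""
    if not 0 <= graphics <= 3:
        raise ValueError("graphics field must be 0-3 (1 = sub-display graphic, 2 = position indicator)")
    if subdisplay not in SUBDISPLAYS:
        raise ValueError("sub-display code must be 2 (ship), 3 (ROV) or 4 (cable)")
    if not 0 <= sensors < 1 << 24:
        raise ValueError("sensor field holds at most 24 bits")
    # First octal digit = bits 27-29, second = bits 24-26, last eight digits = bits 0-23.
    value = (graphics << 27) | (subdisplay << 24) | sensors
    return "ch" + format(value, "010o")


def decode_chord(code: str) -> tuple[int, int, int]:
    """Inverse of encode_chord; also accepts overchord codes starting with 'och'."""
    digits = code.removeprefix("o").removeprefix("ch")
    if len(digits) != 10:
        raise ValueError(f"not a chord code: {code!r}")
    value = int(digits, 8)
    return value >> 27, (value >> 24) & 0o7, value & 0xFFFFFF


def sensors_on(code: str) -> int:
    """Number of individual (non-graphic) sensors switched on in a chord."""
    return bin(decode_chord(code)[2]).count("1")


graphics, sub, sensors = decode_chord("ch0347372742")
print(SUBDISPLAYS[sub], graphics, sensors_on("ch0347372742"))  # ROV 0 15
```

Counting set bits in the sensor field gives the number of sensors that are on. For ch0347372742 this confirms the 15 sensors stated above.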
A program called tracechord was written by the
author to analyse the chords from the action traces.
The action trace files, which were in the same format
as in the previous experiment, were passed to
tracechord, which kept track of which sensors
were visible, identified the chords used,
and added up the number of actions
performed while each chord was showing.
Figures 7.2 and 7.3
show output from tracechord for the two subjects,
side by side.
Each entry shows the chord code,
the number of effector actions performed when this chord
was operating, and the proportion of the total represented
by this number.
AJ (chord, effector actions, proportion) MT (chord, effector actions, proportion)
ch0347372742 1331 0.4087 ch0347372742 1374 0.4883
ch1347372742 885 0.2717 ch3277633303 605 0.2150
ch3277633303 600 0.1842 ch1347372742 538 0.1912
ch0400766736 241 0.0740 ch0400766736 161 0.0572
ch3200000000 44 0.0135 ch2277633303 127 0.0451
ch2277633303 41 0.0126 ch0400000000 5 0.0018
ch2200000000 32 0.0098 ch1300000000 2 0.0007
ch0300000000 30 0.0092 ch2200000000 1 0.0004
ch1300000000 17 0.0052 ch3200000000 1 0.0004
ch0400766636 9 0.0028
ch0200000000 8 0.0025
ch0400000000 8 0.0025
ch2347372742 5 0.0015
ch0277633303 3 0.0009
ch0343372742 2 0.0006
ch3347372742 1 0.0003
Figure 7.2: Sensor chord usage at the outset
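The bookkeeping that tracechord performed can be illustrated with a short sketch. The sketch rests on several assumptions, none of which is taken from the original program:

- The input is a hypothetical, already-parsed event stream of (kind, argument) pairs, not the real trace-file format.
- A chord is represented as a (sub-display, set of visible sensors) pair rather than as the octal code.
- Each sub-display keeps its own sensor settings when the operator switches away from it.

```python
from collections import Counter


def tally_chords(events, initial_subdisplay="ship"):
    """Count effector actions per chord from a time-ordered list of events.

    Each event is a (kind, arg) pair (a hypothetical format):
      ("toggle", sensor)   key-press turning a sensor on or off
      ("display", name)    key-press selecting another sub-display
      ("effector", key)    an effector action
    All sensors start off, as in the experiment. Returns a list of
    (chord, count, proportion) triples, most frequent chord first.
    """
    visible = {}                 # sub-display -> set of sensors currently on
    current = initial_subdisplay
    counts = Counter()
    for kind, arg in events:
        if kind == "toggle":
            # Symmetric difference with {arg} flips that one sensor on or off.
            visible.setdefault(current, set()).symmetric_difference_update({arg})
        elif kind == "display":
            current = arg
        elif kind == "effector":
            chord = (current, frozenset(visible.get(current, ())))
            counts[chord] += 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    total = sum(counts.values())
    return [(chord, n, n / total) for chord, n in counts.most_common()]
```

The three columns of each entry in Figures 7.2 and 7.3 correspond to the chord, the count and the proportion returned here.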
Figure 7.2 shows results for the first
few hours of each subject's practice.
As instructed, the subjects
initially left most of the sensors on while they
were in the initial stages of learning the task,
and they tried only a narrow range of chords.
The chords that were extensively used at this early stage
were mainly those where most, if not all, the sensors are on.
Figure 7.3 gives the corresponding results for
each subject's final few hours, except that
the list shown for MT omits further, less frequent chords
with effector frequencies down to 1.
For both subjects, there were also many other
chords used where there were no effector actions,
and these are omitted from the figures.
In the late chords, we see that the sensor usage
of both subjects has changed greatly from the early pattern,
and each is markedly different from the other subject's.
The frequently used chords have only a few sensors on,
and there are many more different chords used.
AJ (chord, effector actions, proportion) MT (chord, effector actions, proportion)
ch0301000140 901 0.2891 ch1303020700 518 0.1902
ch0200200200 330 0.1059 ch1303020100 235 0.0863
ch0300000000 324 0.1039 ch0300000600 219 0.0804
ch0200000000 310 0.0995 ch0300000000 186 0.0683
ch1301000140 255 0.0818 ch0303020700 149 0.0547
ch1300000000 205 0.0658 ch0400000000 88 0.0323
ch0201200200 166 0.0533 ch0205000000 85 0.0312
ch0400000000 149 0.0478 ch0207200000 80 0.0294
ch0204000000 92 0.0295 ch0400000002 76 0.0279
ch0400000002 68 0.0218 ch0201200002 75 0.0275
ch0207000000 64 0.0205 ch0207200002 64 0.0235
ch0300000040 60 0.0192 ch0200000000 58 0.0213
ch0201000000 47 0.0151 ch0201200000 55 0.0202
ch0206000000 21 0.0067 ch0205200000 54 0.0198
ch0200200000 19 0.0061 ch0300000700 43 0.0158
ch0301000100 17 0.0055 ch0301020700 42 0.0154
ch0205000000 15 0.0048 ch0205200002 39 0.0143
ch0300000140 10 0.0032 ch0301000100 32 0.0117
ch0202000000 9 0.0029 ch0300000100 29 0.0106
ch0201200000 8 0.0026 ch0225200000 28 0.0103
ch0207200000 7 0.0022 ch0235000000 28 0.0103
ch0200000200 6 0.0019 ch0301000700 28 0.0103
ch0300000100 6 0.0019 ch0204000000 27 0.0099
ch0201000200 5 0.0016 ch0207000000 25 0.0092
ch1200000000 5 0.0016 ch0215200000 22 0.0081
ch0207200200 3 0.0010 ch0303020100 18 0.0066
ch0301000000 3 0.0010 ch2400000000 18 0.0066
ch0207000200 2 0.0006 ch2400000002 17 0.0062
ch1206000000 2 0.0006 ch0300020700 15 0.0055
ch2201200200 2 0.0006 ch1301000100 15 0.0055
ch0203000000 1 0.0003 ch0225000000 14 0.0051
ch0206200000 1 0.0003 ch0234000000 14 0.0051
ch0301000040 1 0.0003 ch0237000000 13 0.0048
ch1201200200 1 0.0003 ch2205000000 13 0.0048
ch1204000000 1 0.0003 ch0224000000 12 0.0044
ch2207200200 1 0.0003 ch0301020100 12 0.0044
ch1303000100 12 0.0044
ch0235400000 11 0.0040
ch0303020600 10 0.0037
ch1300000600 10 0.0037
ch2237000000 10 0.0037
ch0201200202 9 0.0033
ch0303000100 9 0.0033
ch1301020700 9 0.0033
ch0201000000 8 0.0029
ch0227200000 8 0.0029
ch0235200000 8 0.0029
ch0215000000 7 0.0026
ch0221200000 7 0.0026
ch0300010000 7 0.0026
ch2225000000 7 0.0026
ch0203200000 6 0.0022
ch0221000000 6 0.0022
ch0227200002 6 0.0022
ch0300000400 6 0.0022
ch2224000000 6 0.0022
ch3234000000 6 0.0022
ch0207600000 5 0.0018
ch0217000000 5 0.0018
ch0220000000 5 0.0018
Figure 7.3: Sensor chord usage at the end
The sensor usage results are enough to add another dimension to differences between individuals, but they are not of themselves enough to give a predictive model of the players' actions. For this, we must think again about what is necessary for a predictive model, before we are able to integrate these sensor usage results into a coherent model.
In the previous chapter, we looked at attempts to derive rules for various groups of control actions, and recognised that better rules could be derived from some representations of situations and actions than from others. But we did not address the wider issue of building a predictive model of an operator's behaviour broad enough in scope to simulate the performance of the whole task. Considering this issue in greater depth has important repercussions for this analysis.
From the above analysis of sensor usage, it was clear that players were able to perform the task adequately, and with improving scores, using only a small selection of sensors at any one time. A predictive model of an operator who uses only certain information should ideally use the same information. What information should a predictive model be using, and what rules should it have `loaded', at any particular time?
Let us consider the full spectrum of possible answers to these questions. At one extreme, a model could be based on all rules being available at once. To execute such a model, all the relevant information for all of the rules would also have to be available. Besides conflicting with the results of this experiment, this reliance on all the information being monitored might plausibly model human information processing if actions were few and far between. It would not do so if some of the actions demanded focused attention over time, as they do in the experimental simulation here.
At the other extreme, a model could be based on the principle of only one rule being present at once. The information needed to execute that rule would be well-defined and limited. The difficulty would instead lie in the extensive higher-level rules needed to decide which rule was relevant to any particular situation.
Seeing this spectrum of possibilities, the approach taken in this analysis was to explore a range of the middle ground first, since that appeared by far the most plausible. The middle-ground assumption is that there are groups of rules that are applicable at the same stage of the task, and that those rules share an information environment. Although they will not all require exactly the same information, there will be considerable overlap. The amount of this information should be such that it is plausible to imagine a human monitoring it, given the workload and constraints of that stage of the task. There should also be some prospect of reasonable higher-level rules, either governing the transition to a new information environment or allowing deduction of which environment should apply at any time. This `package' of rules and information requirements will here be called a context.
As well as relating to the natural use of the word, the name also serves to distinguish the idea from potentially related previous concepts such as schemas [10], frames [82], scripts [119], and their offspring. The motivation behind these concepts is more to do with long-term memory, general knowledge, and understanding story fragments, which differs from that of the present study. However, the term is used in a similar sense by Fagan et al. [35], when discussing the VM system, although this is not the same sense as in the rest of the MYCIN project.
A context is a particular stage of the task, along with the rules and the information that are actually being used during this stage, for which the chords are evidence. This is in harmony with natural usage such as ``in different contexts, the same values imply different actions''. The word ``representation'', though it has been used in a variety of looser ways, will now refer to a whole pattern of context-based information use that can be thought of as latently present while a person is performing a task; but the meaning should not be taken to include the lower-level action rules themselves. The higher-level rules for switching between contexts could be seen either as a property of individual contexts, or as a property of the representation as a whole.
Having introduced the idea of context, it should be noted that in principle it could occupy any position between the two extremes mentioned above. One could have just a few contexts immediately below the level of the task as a whole. Each context would then include a relatively large number of rules and need a relatively large amount of information, but the transition rules would be less likely to be intricate. Alternatively, there could be many contexts, each comprising only a few rules and having a relatively small information requirement. The higher-level rules for determining context would then be correspondingly more complex. Furthermore, in principle there is no reason why a context structure should not be built up in more than one layer. Action rules could be grouped at the bottom level, and higher-level rules grouped at successive levels up to the level of the whole task. This last possibility will not be explored here.
Since this idea of context concerns a package of rules, information and higher-level rules, we would ideally analyse contexts in terms of rules as well as information use. However, this has not yet proved possible. An approach to it is discussed below (§ 8.3.1). For the time being, the analysis is based on information use alone.
The analysis in terms of contexts ideally needs data about information usage, but what we have at this point is data on sensor usage. The two are not necessarily the same. This could be for a number of reasons.
The analysis needed to compensate as much as possible for these ways in which information usage was likely to differ from sensor usage. As well as this, in keeping with the original aims of the study, it was desirable for the analysis to be kept as much as possible objective and automatic. A method of grouping the chords together, and a method of allowing for implied information, are described below.
An outline of the flow of data in this analysis is given in Figure 7.4, which should be compared with Figure 6.4 above. The first stage of the analysis, shown in outline on the left side of the figure, is to find some context structure and content, and the second stage, down the main axis of the figure, is to use this structure in the induction of rules for the actions.
There are at least three potential ways of finding context structure. Firstly, we could ask subjects what their perceived context structure is, i.e., how they split up the task into subtasks or substages, what information they use to make decisions in each context, and what rules they use. (See below, § 7.2.9.4.) This may or may not correspond with what they actually do. Secondly, we can examine the information that they use, and look for patterns in that usage. This is the main approach taken here. Thirdly, we could derive a context structure from the pattern of applicability of rules. This will be explained and discussed as part of further work, below (§ 8.3.1).
To obtain a more satisfactory (and possibly more realistic) context structure than that given by the chords alone, the raw chords needed to be grouped in some way. What was needed was more than a simple clustering process, because when the chords are clustered together, we do not wish to take the central or most frequent one as wholly defining the information usage, since other chords may have had extra sensors turned on. From the point of view of rule induction, the important point was not necessarily to find the exact information used in a context, but to find a superset of this. Having a few variables present that were not actually used should not hinder the rule induction to any great extent.
In the following procedure,
which the author devised to meet this need,
the frequency associated with a chord was
the number of effector key-presses in the sample
performed while that chord was in use.
Starting from the least frequently used chord,
each chord in turn was matched with all the other
more frequently used chords,
to find whether there was another chord
within a specified `distance' of the first, and
with at least as great a frequency; and if there was,
to find the one of those with the greatest frequency.
The less frequently used chord would then be absorbed
into the more frequently used one.
If the frequency of the absorbed chord was greater than zero,
the sensors that were on in that chord would be added to those
of the chord into which it was absorbed, to make a superset
chord stored separately (with harmonics, to continue the analogy).
A distance of one unit meant that
the two chords differed by exactly one non-graphic sensor
being on in one chord and off in the other,
while if one had a graphic display on that the other had off,
this was (arbitrarily) assigned a distance of three units.
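The absorption procedure and its distance measure can be sketched as follows. This is a reconstruction from the description above, not the author's program. It makes the following assumptions:

- Chords on different sub-displays are never merged.
- Each graphic flag that differs costs three units.
- "More frequently used" refers to a chord's rank in the original frequency ordering.
- The "at least as great a frequency" test uses the frequency accumulated so far.
- Ties in frequency keep their input order.

Under these assumptions, the sketch reproduces the ten groups of Figure 7.5 from the 36 later chords of AJ in Figure 7.3.

```python
import math


def decode(code: str) -> tuple[int, int, int]:
    """Split 'ch'/'och' + 10 octal digits into (graphics, sub-display, sensor bits)."""
    value = int(code.removeprefix("o").removeprefix("ch"), 8)
    return value >> 27, (value >> 24) & 0o7, value & 0xFFFFFF


def distance(a: str, b: str) -> float:
    """One unit per differing sensor, three units per differing graphic display."""
    ga, da, sa = decode(a)
    gb, db, sb = decode(b)
    if da != db:
        return math.inf  # assumption: never merge chords from different sub-displays
    return bin(sa ^ sb).count("1") + 3 * bin(ga ^ gb).count("1")


def absorb(chords: list[tuple[str, int]], threshold: int = 2) -> list[tuple[str, int, str]]:
    """Group chords; returns (base chord, group frequency, overchord), most frequent first."""
    freq = dict(chords)
    over = {code: int(code.removeprefix("ch"), 8) for code, _ in chords}
    order = [code for code, _ in sorted(chords, key=lambda cf: cf[1])]  # least frequent first
    absorbed = set()
    for i, code in enumerate(order):
        best = None
        for other in order[i + 1:]:  # only chords ranked as more frequently used
            if distance(code, other) <= threshold and freq[other] >= freq[code]:
                if best is None or freq[other] > freq[best]:
                    best = other
        if best is not None:
            freq[best] += freq[code]
            if freq[code] > 0:  # zero-frequency chords contribute no sensors
                over[best] |= over[code]  # OR of the bit patterns = union of sensors
            absorbed.add(code)
    groups = [(c, freq[c], "och" + format(over[c], "010o")) for c in order if c not in absorbed]
    return sorted(groups, key=lambda g: g[1], reverse=True)
```

For instance, `absorb([("ch0400000000", 149), ("ch0400000002", 68)])` gives a single group with frequency 217 and overchord och0400000002. This matches the worked cable example in the text.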
ch0301000140 998 0.3202 och0301000140
ch0200000000 569 0.1825 och0207200200
ch0200200200 537 0.1723 och0207200200
ch0300000000 324 0.1039 och0300000000
ch1301000140 255 0.0818 och1301000140
ch0400000000 217 0.0696 och0400000002
ch1300000000 205 0.0658 och1300000000
ch1200000000 8 0.0026 och1206000000
ch2201200200 3 0.0010 och2207200200
ch1201200200 1 0.0003 och1201200200
Figure 7.5: Result of chord absorption for the later chords of AJ
An example of the result of this process, applied to the later
chords of subject AJ already given in Figure 7.3,
is shown in Figure 7.5.
With the threshold of absorption set at two units,
the 36 original chords were reduced to 10 groups of chords.
In the figure, each line has four components.
The first three are as before: the `base' chord code
of highest frequency in this group
(starting with `ch'); the frequency
(of effector actions with this group of chords);
and this frequency expressed as a proportion of the whole.
The fourth and last entry on each line
(starting with `och' for `overchord') represents the
chord made up by including all the sensors that
were on in any of the chords in the group.
For instance, in the original list, the chord ch0400000000
(cable sub-display with all sensors off)
had a frequency of 149.
In the list of chord groups, this has absorbed the chord
ch0400000002
(cable sub-display with one sensor on)
which had a frequency of 68,
so that the resulting group of chords has a base chord of
ch0400000000, an overchord of och0400000002
and a frequency of 217.
The other groups are made up in the same way.
Having thus made an attempt to integrate related chords together into groups, the next step was to make allowance for information that was implied. Each sensor was assessed for its likely implications, and a routine was written to add in these implied quantities to the contexts. The implications were based on just a few basic principles.
These implications were added to the overchords,
which then resulted in chord groups as in Figure 7.6.
Comparing this with the previous figure, 7.5,
we see how the overchords have been filled out.
For instance, for the first chord, ch0301000140,
before implications, the overchord has only the same
three sensors as the base chord:
after implications, the overchord has seven sensors
(och0303102142).
In addition to this, the implied quantities that had no
actual sensor were recorded in another part
of the data structure, along with the implications from
the general position indicator that referred to
information whose digital sensor was in another sub-display.
When these chord groups are used, the base chords on the
left are used to match a chord for closeness, and the
overchords and extra quantities without sensors
are used to give what is hoped to be a superset
of the information used in any particular context.
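How the chord groups might then be consulted can be sketched as follows. This is an assumption-laden illustration, not the original code:

- It takes the base chords and overchords of Figure 7.6 as data.
- It applies the distance measure described earlier in this section (one unit per sensor, three per graphic display, with different sub-displays never matched).
- It picks the nearest base chord. Because the list is in frequency order, ties go to the more frequent group.
- It ignores the extra quantities without sensors.

The name `context_for` is hypothetical.

```python
import math

# (base chord, overchord) pairs from Figure 7.6 (subject AJ, later chords).
GROUPS = [
    ("ch0301000140", "och0303102142"),
    ("ch0200000000", "och0207211301"),
    ("ch0200200200", "och0207211301"),
    ("ch0300000000", "och0300102002"),
    ("ch1301000140", "och0303112142"),
    ("ch0400000000", "och0400000116"),
    ("ch1300000000", "och0303112142"),
    ("ch1200000000", "och0206611101"),
    ("ch2201200200", "och0237211301"),
    ("ch1201200200", "och0205611301"),
]


def decode(code):
    """Split 'ch'/'och' + 10 octal digits into (graphics, sub-display, sensor bits)."""
    value = int(code.removeprefix("o").removeprefix("ch"), 8)
    return value >> 27, (value >> 24) & 0o7, value & 0xFFFFFF


def distance(a, b):
    ga, da, sa = decode(a)
    gb, db, sb = decode(b)
    if da != db:
        return math.inf  # assumption: different sub-displays never match
    return bin(sa ^ sb).count("1") + 3 * bin(ga ^ gb).count("1")


def context_for(chord, groups=GROUPS):
    """Return (base chord, overchord) of the nearest group, or None if none shares the sub-display."""
    base, over = min(groups, key=lambda g: distance(chord, g[0]))
    return (base, over) if distance(chord, base) < math.inf else None
```

A chord never seen during grouping, such as ch0301000100, is matched to ch0301000140 at distance one. The overchord och0303102142 is then the candidate superset of information for rule induction.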
ch0301000140 998 0.3202 och0303102142
ch0200000000 569 0.1825 och0207211301
ch0200200200 537 0.1723 och0207211301
ch0300000000 324 0.1039 och0300102002
ch1301000140 255 0.0818 och0303112142
ch0400000000 217 0.0696 och0400000116
ch1300000000 205 0.0658 och0303112142
ch1200000000 8 0.0026 och0206611101
ch2201200200 3 0.0010 och0237211301
ch1201200200 1 0.0003 och0205611301
Figure 7.6: Result of implications after chord absorption
The second stage of the analysis follows the data
down the central path in the diagram (Figure 7.4).
Starting with the trace data, the first step is to expand
it in the same way as in the previous experiment.
There, actrep then dealt with the representation
of actions, both null and compound.
In this experiment, with the introduction of higher-level
ROV turn controls, the focus moved
away from the representation of actions;
but null actions still needed attention
even if compound actions were going to be ignored.
A modified version of actrep removed key-presses
that were ineffectual, and put in a null action wherever
there were at least 10 consecutive time steps without
any key-presses (that is, 5 seconds).
This produced a reasonable number of null actions,
such that the number of null actions was at least
of the same order as the numbers of any other
individual class, but not so many as to far
outnumber all the other classes put together.
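The null-action rule in the modified actrep can be sketched as below. The sketch makes three assumptions:

- The input is a hypothetical list of time-ordered (step, key) pairs with a half-second time step, so that 10 steps are the 5 seconds mentioned above.
- Each qualifying gap receives one null action, placed at the tenth idle step.
- The test for ineffectual key-presses is left to a caller-supplied predicate, because the criterion is not spelled out here.

```python
STEP_SECONDS = 0.5   # 10 steps correspond to the 5 seconds stated in the text
NULL_GAP = 10        # idle steps needed before a null action is inserted


def add_null_actions(presses, is_effectual=lambda step, key: True, gap=NULL_GAP):
    """Drop ineffectual key-presses and insert 'null' actions into long idle gaps.

    presses: time-ordered (step, key) pairs. Returns a new list of (step, action)
    pairs. Assumption: one null action per gap, placed on the gap-th idle step.
    """
    kept = [(step, key) for step, key in presses if is_effectual(step, key)]
    out, prev = [], None
    for step, key in kept:
        # Steps prev+1 .. step-1 had no key-press; count them.
        if prev is not None and step - prev - 1 >= gap:
            out.append((prev + gap, "null"))
        out.append((step, key))
        prev = step
    return out
```

Inserting at most one null action per gap limits how far null actions can outnumber the other classes. This is one plausible reading of the balance the text aims for.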
The functions of the previous programs sitrep and
indprep (see Figure 6.4)
were combined
into a new program prepcont (for prepare
data according to context),
part of which incorporated a definition of a representation
in terms of contexts, either output from tracechord
or hand-written.
The program prepcont then amounted
to some 600 lines of source code.
Some decision had to be made about which actions to include
and which to leave out, as had been done previously by
sitrep with the representation files.
Including all of them would merely clutter up the programs,
since there are several actions that were either never or
very rarely taken.
The subject AJ never used the ship's propellers individually,
and therefore the relevant effectors were left out in his case.
But they were left in for subject MT, who did use them.
The camera angle controls were left out on the grounds
that the information on which these actions would be based
would be graphical in form, and difficult to formalise.
The action of detonating the mines was left out because
the button for it is in the top section of the screen,
always available, and hence it would not obviously
belong to any one of the ordinary contexts.
There were 44 remaining keys that were included in
the analysis for AJ, 53 for MT, which were responsible
for the overwhelming majority of the total key-presses.
In a significant change from the earlier method,
the program prepcont output a number of files ready for
the rule-induction programs: this is the intended meaning
of the fanning out of arrows in Figure 7.4.
Each of the defined contexts had separate files,
and to facilitate the testing of rules against test data from
the same time interval, the data for each
interval and context were split into two parts,
by putting alternate examples in two files.
Thus, for a representation of 10 contexts,
20 files would be generated from however much
data was fed in to prepcont at one time.
The form of these files was as before
(see Figure 6.8).
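The alternate-example split performed by prepcont can be sketched as follows. The input format (a time-ordered sequence of (context, example) pairs for one time interval) and the function name are hypothetical. Alternation is assumed to be counted separately within each context.

```python
from collections import defaultdict


def split_alternately(examples):
    """Send alternate examples of each context to two halves.

    examples: time-ordered (context, example) pairs for one time interval.
    Returns {(context, half): [example, ...]} with half 0 or 1.
    """
    seen = defaultdict(int)      # number of examples seen so far per context
    files = defaultdict(list)
    for context, example in examples:
        files[(context, seen[context] % 2)].append(example)
        seen[context] += 1
    return dict(files)
```

With ten contexts, each having at least two examples, this yields twenty example lists. That corresponds to the twenty files described above.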
This separation of alternate examples was reasonable in this case for two reasons. Any action is associated with the situation prevailing at only one time interval. In addition, the rule-induction process does not distinguish between examples on the basis of their order, so any significance the order might have had is lost in any case. In contrast, when inducing higher-level rules for contexts themselves (see below, § 7.2.7), one cannot use the same method of splitting data, because one context covers a sequence of examples. If the data were split by assigning alternate examples to alternate data sets, the training and test sets would effectively be drawn from the same instances. In that case, therefore, the data needed to be divided sequentially.
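The reason for dividing context-level data sequentially can be made concrete with a small, purely illustrative sketch; the labels and the function name are hypothetical. The sketch treats each run of identical context labels as one context episode. It then counts how many episodes end up with examples in both the training and the test set under each splitting scheme.

```python
from itertools import groupby


def split_episodes(labels, train_indices):
    """Count runs of identical context labels that have examples in both sets."""
    train = set(train_indices)
    count, start = 0, 0
    for _, run in groupby(labels):
        length = len(list(run))
        in_train = sum(i in train for i in range(start, start + length))
        if 0 < in_train < length:
            count += 1
        start += length
    return count


labels = ["A"] * 4 + ["B"] * 4 + ["A"] * 4 + ["C"] * 4
alternate = range(0, len(labels), 2)      # every other example
sequential = range(len(labels) // 2)      # first half of the sequence
print(split_episodes(labels, alternate), split_episodes(labels, sequential))  # 4 0
```

Under alternate splitting, every episode is shared between the two sets. A sequential cut shares at most the one episode it happens to fall inside.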
In order to be more comprehensive than in the previous experiment, it was decided to generate rules for every set of data, and to test those rules against every other set of data within the same context. It proved possible to write a C-shell script to govern this process, using the same implementation of CN2 as previously for the induction. The unordered mode of CN2 was used for this analysis, with the modification that only rules that had the decision class as the class of maximum frequency were to be recorded. The parameters of CN2's operation were given values that had given reasonable results in the first experiment. `Star' was set at 15 (a value also used in tests by the algorithm authors [23]), and the significance threshold at 15.0.