6: Experiment 1 | 6.3 Methods and results | ©1990, 1995
Unpaid volunteers were requested by word-of-mouth and local posters. The selling pitch was that the simulation was more interesting than, and different from, other computer games. A spirit of competition was fostered firstly by providing a scoreboard as part of the help, secondly by offering a small prize (unspecified at the start) for the highest overall score in a single game at the end of a given period. Four subjects at GH, and two at CXT (other than the author), achieved at least a basic competence allowing them to complete games within a reasonable time-span. We will abbreviate those at GH to R, G, S and M. Those at CXT we will abbreviate to DM and DJ.
As explained above (§ 6.2.3),
the interaction with the simulation
was entirely through the mouse.
The keyboard itself had no effect, except that
during the course of the experiment, a facility was introduced
so that players could skip short amounts of time (when they
knew in advance that they would not want to take any actions)
by pressing a number key.
The key 1 skipped about 10 seconds-worth of action,
2 about 20 seconds-worth, and so on.
In no case could this increase the score, and
these actions were not recorded in the first experiment.
The primary data consisted of
time-stamped records of every legal key-press.
These were recorded in a format of five
blank-separated numerical fields per line, one line
representing one key-press (see Figure 6.2).
These files will be referred to as `action trace
files' or simply `trace files'.
00018 2 3 2 4 Both_Props_Full_Ahead
00021 2 3 5 0 Rudder_Hard_Port
00049 2 3 5 2 Rudder_Centre
00053 1 0 8 1 Scale_over_2
00054 1 0 8 1 Scale_over_2
00056 1 0 2 0 Fix_Ship
00058 1 0 2 1 Centre_Ship
00160 2 3 5 0 Rudder_Hard_Port
00195 2 3 5 2 Rudder_Centre
00201 2 3 5 3 Rudder_Gentle_Stbd
00207 2 3 5 2 Rudder_Centre
00223 2 3 2 0 Both_Props_Full_Astn
00356 1 3 1 0 Stop_Return_to_Help
Figure 6.2: A commented example game trace file
There were two types of action trace file. Each game had its key-presses recorded in a separate file, and each session of zero or more games had the ancillary key-presses (starting and stopping the game, reading the help information provided, etc.) recorded in the same format, together, without the key-presses from the games themselves.
The first field of each line in a trace file gives the number of half-seconds since the beginning of the game, or session. This varies from zero up to several thousand (7200 being equivalent to one hour). The remaining four fields are single digits, representing in turn the sub-display, the column, the row, and the element within that row. These refer to the obvious divisions of the interface screen.
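As an illustration of this format, a trace line can be read with a few lines of Python. This is only a sketch: the names TraceRecord, parse_trace_line and elapsed_seconds are hypothetical and not part of the original software, and the trailing name is assumed to be present only in commented files such as the one in Figure 6.2.

```python
from typing import NamedTuple, Optional


class TraceRecord(NamedTuple):
    half_seconds: int        # time since the start of the game or session
    sub_display: int
    column: int
    row: int
    element: int
    comment: Optional[str]   # trailing name, assumed optional


def parse_trace_line(line: str) -> TraceRecord:
    """Split one trace line into its five numerical fields and optional comment."""
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(parts)}: {line!r}")
    t, sub, col, row, elem = (int(p) for p in parts[:5])
    comment = " ".join(parts[5:]) or None
    return TraceRecord(t, sub, col, row, elem, comment)


def elapsed_seconds(rec: TraceRecord) -> float:
    """Convert the half-second count of a record into seconds."""
    return rec.half_seconds * 0.5


rec = parse_trace_line("00018 2 3 2 4 Both_Props_Full_Ahead")
print(rec.half_seconds, elapsed_seconds(rec))
```

With this conversion, a time stamp of 7200 comes out as 3600 seconds, in agreement with the one-hour equivalence stated above.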
Including the files due to the author, covering the period Oct 8th to Jan 16th, there were over 300 files from GH and nearly 150 files from CXT. These occupied over 1 Mbyte and over 400 kbyte respectively.
All plausible actions were recorded, which included those actions that were impossible or ineffective at the time, and which caused an audible warning to the player at the time of the game, coinciding with the highlighting of the selected area. However, there were some mouse-button-clicks that occurred while the cursor was in an area that was generally inactive, and these were not recorded. Such button-clicks also caused an audible warning at the time of play, but did not cause any highlighting of the selected area of the screen.
In addition to the action trace files mentioned above, there was one file for each site, in which each game had a one-line entry. The fields in this file (called the `Runindex') were:
calm_weather s 17282750 Apr14-00:29 -546 1036 20262
calm_weather s 17283514 Apr14-00:42 982 998 10535
calm_weather s 17338726 Apr14-16:02 5799 2201 571
calm_weather s 17339722 Apr14-16:19 7044 2456 10535
calm_weather s 17340482 Apr14-16:32 6845 2155 20262
calm_weather s 17341776 Apr14-16:53 7781 3219 1940
calm_weather s 17447710 Apr15-22:19 6637 2363 8818
Figure 6.3: Part of a Runindex file
Table 6.1: Subjects' times, scores, and dates (1989--90)
An idea of the amount of data gathered can be obtained from Table 6.1. ``No. Runs'' is the total number of games played. Only a few of these were false starts, abandoned after a very short time. ``Total Time'' is the hours and minutes spent on the games themselves, not counting ancillary activities. ``Best Time'' is the total time spent at the end of the game that gave the best score, recorded in the next column. ``Start Date'' and ``Best Date'' allow the calculation of the calendar period from first trying the game to scoring the recorded high score. ``Finish Date'' represents the date of the last game played before the (arbitrary) cut-off date of 16th January 1990. There were in fact only a few games played after this, with no-one improving their score. The subject with the highest score, G, was also the one who had put in most hours of practice. For comparison, the author, with substantially more practice than any of the subjects, achieved a score of over 9000. A practical ceiling would appear to be around 10000, though this did depend on the random fluctuation of the arrangement of the mines. A score as high as this was only possible (even in theory) on a small proportion of games. Uncertainties in the scoring system are discussed below (6.4.1).
Since the analysis of the data involves many stages, it might be helpful to review the overall pattern before describing the details. This overall view, from which some details have been omitted for clarity, appears here as Figure 6.4. In this figure, the rectangles represent types of file, and the ovals represent programs for transforming one type of file into another.
Figure 6.4: Simplified data flow during analysis
It was clear from very early on in the analysis that a single action from a human point of view was not necessarily equivalent to a single key-press. In particular, after practice on the task, short sequences of key-presses were frequently apparent. This happened particularly in the case of manoeuvring the ROV. Because there was no simple effector to turn left or right, players had to create their own sequences of actions that performed the function of turning. Even for the same direction of turn, different sequences were performed in different contexts. When going full ahead with both thrusters, a left turn of a few degrees would most probably be executed by selecting half ahead on the left thruster, followed immediately (0.5 seconds later) by restoring the left thruster to full ahead, using the full-ahead key for either the left thruster or both thrusters together. In contrast, when near a mine, gliding along with both thrusters stopped, a left turn would most likely be executed by selecting the starboard thruster half ahead, and then stopped (or both stopped). Other variations were also observed.
With these turning manoeuvres, the time interval between the unbalancing and balancing actions was crucial to determining the magnitude of the effect of the action. Due to the delayed response of the thrusters, leaving an interval of 1 second produced an effect approximately four times the size of the effect produced by an interval of 0.5 seconds. Thus, in situations where the former would be an appropriate action, the latter would not, and vice versa---the two were, in effect, different actions. There were other sequences of actions where time interval and order were not important. For example, when recovering the ROV, the first step was to reel in the cable. This was effected by setting the cable tension to `grip' and the take-in speed to `fast'; but the ordering and interval of these two actions were immaterial.
Perhaps these considerations could in principle be derived from the data. However, in this study, they were deliberately introduced, as background knowledge, in the attempt to get the best classification of actions possible in the available time.
The problem then remained to find what compound actions actually occurred in different players' task performance. This could be attempted by observation and questioning; but, without any objective measure, it would be difficult to assess how many of the resulting discoveries were artificially produced by the biases and preconceptions of player and experimenter. Thus a prime objective was to devise a program to compile a list of these compound actions given only a quantity of raw data. Programs for learning macro-operators in games or puzzles have been devised, but the methods used and quoted (e.g. in [120]) do not explicitly deal with time or dynamic systems, and were thus unsuitable here.
The basic program to perform this was called
summ (for summary).
The program first completed a large table
of the frequency of occurrence of each key-interval-key sequence, for
intervals between 0.5 seconds (coded 0) and 2 seconds (coded 3).
Two seconds was judged to be a reasonable maximum between two
sub-actions that formed a higher-level action.
The program then wrote out a summary
of the more commonly occurring sequences.
Depending on flags given on the command line, this summary
was intended either as a general summary for human reading
(see Figure 6.5), or as a list of 4-tuples
specifying key-interval-key sequences and single
replacement keys (see Figure 6.6).
The summary in Figure 6.5 gives on the first line:
3 3 1 3 relative 0.050 freq :1342 Port_Ths_Half_Ahd
0 3 3 1 2 freq : 17 Port_Ths_Stop Slow_Turn_Stbd_0
1 3 3 1 2 freq : 55 Port_Ths_Stop Slow_Turn_Stbd_1
2 3 3 1 2 freq : 38 Port_Ths_Stop Slow_Turn_Stbd_2
3 3 3 1 2 freq : 31 Port_Ths_Stop Slow_Turn_Stbd_3
0 3 3 1 4 freq : 370 Port_Ths_Full_Ahd Diff_Turn_Port_0
1 3 3 1 4 freq : 66 Port_Ths_Full_Ahd Diff_Turn_Port_1
2 3 3 1 4 freq : 11 Port_Ths_Full_Ahd Diff_Turn_Port_2
0 3 3 2 2 freq : 24 Both_Ths_Stop Slow_Turn_Stbd_0
1 3 3 2 2 freq : 70 Both_Ths_Stop Slow_Turn_Stbd_1
2 3 3 2 2 freq : 65 Both_Ths_Stop Slow_Turn_Stbd_2
3 3 3 2 2 freq : 55 Both_Ths_Stop Slow_Turn_Stbd_3
0 3 3 3 1 freq : 228 Stbd_Ths_Half_Astn Pure_Turn_Stbd
1 3 3 3 1 freq : 75 Stbd_Ths_Half_Astn Pure_Turn_Stbd
3 3 3 3 2 freq : 9 Stbd_Ths_Stop
Figure 6.5: A fragment of a summary
3 3 1 3   0   3 3 1 4   0 3 3 0
3 3 1 3   0   3 3 3 1   0 3 0 1
3 3 1 3   1   3 3 1 4   0 3 3 1
3 3 1 3   1   3 3 3 1   0 3 0 1
3 3 1 3   2   3 3 1 4   0 3 3 2
3 3 1 3   2   3 3 3 1   0 3 0 1
3 3 1 3   3   3 3 1 4   0 3 3 3
3 3 1 3   3   3 3 3 1   0 3 0 1
Figure 6.6: A fragment of a replacement chart
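The counting step performed by summ can be sketched as follows. This is not the original program: the function names are hypothetical, and it is assumed here that only consecutive key-presses are paired and that a gap of 1 to 4 half-second steps is coded 0 to 3, as the interval coding in the text suggests. The key codes in the example are those shown in Figure 6.5.

```python
from collections import Counter

MAX_GAP_STEPS = 4  # 2 seconds, the longest interval considered in the text


def count_sequences(trace):
    """Count key-interval-key occurrences in one trace.

    trace: list of (half_seconds, key), where key is a 4-tuple of digits.
    Returns a Counter keyed by (first_key, interval_code, second_key).
    Assumption: only consecutive key-presses are paired, and a gap of
    1..4 half-second steps is coded 0..3.
    """
    table = Counter()
    for (t1, k1), (t2, k2) in zip(trace, trace[1:]):
        gap = t2 - t1
        if 1 <= gap <= MAX_GAP_STEPS:
            table[(k1, gap - 1, k2)] += 1
    return table


def common_sequences(table, min_freq):
    """Return (sequence, frequency) pairs at or above min_freq, most frequent first."""
    return [(seq, n) for seq, n in table.most_common() if n >= min_freq]


PORT_HALF_AHD = (3, 3, 1, 3)
PORT_FULL_AHD = (3, 3, 1, 4)
STBD_HALF_ASTN = (3, 3, 3, 1)

trace = [(10, PORT_HALF_AHD), (11, PORT_FULL_AHD), (40, PORT_HALF_AHD),
         (41, PORT_FULL_AHD), (60, PORT_HALF_AHD), (62, STBD_HALF_ASTN)]
table = count_sequences(trace)
print(common_sequences(table, min_freq=2))
```

Raising or lowering min_freq corresponds to including only the more common, or progressively less common, sequences in the chart.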
One deficiency with this basic summary program was that
it could not deal with sequences of more than two actions.
This was overcome by using the program iteratively.
First a basic replacement chart was made.
Next, actions were fed through a filter that
made the changes specified in the first chart.
This filter program was called actrace
(not shown in Figure 6.4)
from its effect of changing the
actions in the trace file.
The output of this filter was then fed into
summ again, giving a second chart,
some of whose input keys may have been new composite ones.
A C-shell script (named chart) was written
to govern this iterative process.
In the final version of chart,
the first application of summ output only the more common
sequences, and two subsequent iterations included
progressively less common sequences.
The typology of actions is further discussed below (§ 6.4.2).
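The effect of applying replacement charts in series can be illustrated with a small sketch. It is not the actrace program itself: apply_chart is a hypothetical name, the composite action is assumed to keep the time stamp of its first key-press, and the code (0, 9, 9, 9) for a longer composite is invented for the example. The replacement code (0, 3, 3, 0) is taken from the first row of Figure 6.6.

```python
def apply_chart(trace, chart):
    """Replace each adjacent pair found in chart by its single replacement key.

    trace: list of (half_seconds, key), with key a 4-tuple of digits.
    chart: dict mapping (first_key, interval_code, second_key) to a new key,
           where interval_code is the gap in half-second steps minus one.
    Assumption: the composite keeps the time stamp of the first key-press.
    """
    out = []
    i = 0
    while i < len(trace):
        if i + 1 < len(trace):
            (t1, k1), (t2, k2) = trace[i], trace[i + 1]
            new_key = chart.get((k1, t2 - t1 - 1, k2))
            if new_key is not None:
                out.append((t1, new_key))
                i += 2          # both sub-actions are consumed
                continue
        out.append(trace[i])
        i += 1
    return out


PORT_HALF_AHD = (3, 3, 1, 3)
PORT_FULL_AHD = (3, 3, 1, 4)
TURN_COMPOSITE = (0, 3, 3, 0)   # replacement code from Figure 6.6
DOUBLE_TURN = (0, 9, 9, 9)      # hypothetical code for a longer composite

first_chart = {(PORT_HALF_AHD, 0, PORT_FULL_AHD): TURN_COMPOSITE}
second_chart = {(TURN_COMPOSITE, 2, TURN_COMPOSITE): DOUBLE_TURN}

trace = [(10, PORT_HALF_AHD), (11, PORT_FULL_AHD),
         (13, PORT_HALF_AHD), (14, PORT_FULL_AHD)]
for chart in (first_chart, second_chart):
    trace = apply_chart(trace, chart)
print(trace)
```

The second chart refers to a key that exists only after the first pass, which is how sequences of more than two key-presses become reachable.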
The trace files only had information concerning
the actions taken, not about the situations.
The files had therefore to be expanded before
analysis to include full details of the situation,
and this was done by a program called exp,
which was a modified version of the simulation program,
without the graphics or interaction.
The input was an action trace file, and the output was
a binary file containing all the data on which
the displays were based for each half-second step.
This increased the
file size by a factor of 250--300.
(Thus it would be quite impractical to
store many of these expanded files on disk.)
An expanded file also permitted a more flexible form of replay. Replaying from the trace file was possible, but it was only one-way (forwards), and took considerable time to execute, since all the original mathematics had to be performed again. With an expanded file, on the other hand, one was able to jump to any place in the game, stop, or go backwards. This proved to be a help in getting an intuitive feel for what the various subjects were doing. It allowed an observer to study the circumstances of a certain action in a flexible, easy way.
The expanded file then had its actions modified in
accordance with the required replacement chart.
The program that effected this was called actrep
(for action representation).
This was done twice, in series,
so as to enable longer sequences to be converted
than would be possible with only one application.
As well as the explicit actions, there was also the question of a reasonable human representation of the null action. The original expansion gave null actions for every time step (0.5 seconds) where there was no explicit action. Since one of the priorities of the simulation was to get away from an over-dependence on critical timing, it seemed unreasonable to class the time steps immediately preceding an action as null---after all, the player may have just been a bit slower than intended or desired. A thinking-time parameter was therefore introduced, imagined to be around 1 to 3 seconds, and for this amount of time around an action no null actions were passed on. This was tried with thinking time extending only before, or before and after, any action. Another way of describing the purpose of this would be to say that we want null actions to be registered when everything is fine, not in the thick of hectic action. Compare this with Card et al.'s `M' operator (``mentally prepare'') in their Keystroke Level Model [20]. The value for the `M' operator is given as 1.35 seconds.
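The thinking-time filter described above might look like the following sketch. The function name and arguments are hypothetical; times are counted in half-second steps, so a thinking time of 2 seconds corresponds to before=4. Setting after to a non-zero value gives the variant in which thinking time extends on both sides of an action.

```python
def null_action_steps(n_steps, action_steps, before, after=0):
    """Return the time steps (0 .. n_steps-1) that are passed on as null actions.

    A step is withheld if it holds an explicit action, or if it lies within
    `before` steps ahead of an action or `after` steps following one.
    All quantities are in half-second steps.
    """
    blocked = set()
    for a in action_steps:
        blocked.update(range(max(0, a - before), min(n_steps, a + after + 1)))
    return [s for s in range(n_steps) if s not in blocked]


# Twenty steps (10 seconds) with explicit actions at steps 8 and 15.
print(null_action_steps(20, [8, 15], before=4))
print(null_action_steps(20, [8, 15], before=4, after=2))
```

In the second call only the opening and closing steps survive as null actions, which matches the intention that null actions be registered when everything is fine rather than in the thick of hectic action.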
Having thus reached the stage where the representation of actions was altered to what we may suppose is a more human-like form, it was left to decide what to do about the representation of the situations. Whereas the basic representation of actions was unequivocal (discrete key-presses), the representation of situations implied by the interface was not completely clear (see below, § 6.4.3). The approach taken was to remain agnostic about the exact information provided, because in any case the aim of the methods investigated was to be able to tell when a representation was closer to the human one.
Having decided on a representation to test,
the selection of the attribute values in that
representation was done by the program called
sitrep (for situation representation).
The representation was defined by a hand-crafted file, listing
the attributes to be selected (see Figure 6.7).
(RT01)                          (RT2)
2                               1
rov_degrees                     rov_off_head
rov_target_head                 4
4                               rov_height
rov_height                      rov_speed
rov_speed                       rov_r
rov_r                           rov_target_range
rov_target_range                4
4                               sub_display
sub_display                     rov_av_revs_demand
rov_port_revs_demand            rov_turn_demand
rov_stbd_revs_demand            rov_status
rov_status
                                3
3                               0 3 0 1 Pure_Turn_Stbd
3 3 1 3 Port_Ths_Half_Ahead     0 3 0 3 Pure_Turn_Port
3 3 3 3 Stbd_Ths_Half_Ahead     0 0 0 0 NO_KEY
0 0 0 0 NO_KEY
Figure 6.7: Example representation files
The three sections of situation attributes are respectively
integer variables, floating-point variables,
and qualitative variables.
The fourth group is a selection of actions (classes) in
the single `decision' attribute, along with their key codes.
Lower-level representations
(such as the one marked (RT01))
contain only attributes that are explicitly
present in the unmodified expanded data.
Higher-level ones (such as the one marked (RT2))
have some quantities that are not explicitly present in the
original data, and therefore have to be calculated on the spot.
In the example case, rov_off_head is the relative
bearing of the closest active target from the ROV.
It is calculated from rov_degrees,
the heading of the ROV, and rov_target_head,
the true bearing of the target from the ROV.
Rov_av_revs_demand and rov_turn_demand are
calculated from the two demands in the old representation RT01.
Pure_Turn_Stbd is a combination of
Port_Ths_Half_Ahead and Stbd_Ths_Half_Astn,
as shown in Figures 6.5
and 6.6 above.
Pure_Turn_Port is defined similarly.
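The derivation of rov_off_head can be sketched as a wrapped difference of two bearings. The exact formula used by sitrep is not given in the text, so the following rests on assumptions: both angles are in degrees from the same reference, a positive result means the target lies clockwise of the ROV's heading, and the result is wrapped into the range from just above -180 to 180, which is consistent with the values appearing in Figure 6.8.

```python
def off_heading(rov_degrees, rov_target_head):
    """Relative bearing of the target from the ROV's heading, in degrees.

    Assumed convention: positive means the target is clockwise of the heading;
    the result lies in the interval (-180, 180].
    """
    diff = (rov_target_head - rov_degrees) % 360
    return diff - 360 if diff > 180 else diff


print(off_heading(350, 10))
print(off_heading(10, 350))
```

The wrap-around matters near north: a heading of 350 and a target bearing of 10 differ by 20 degrees, not 340.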
In the expanded file,
the half-second intervals in which there was
no key-press far outnumbered those in which there was one.
Any individual key, therefore, is greatly outnumbered by
what might be regarded as null actions (NO_KEY).
It would be unhelpful to include all these as examples
for a rule-induction program, for two reasons:
After sitrep had output the selected data
(now in readable, `ascii' form), it was a straightforward
matter to format this for a rule-induction program.
This was done by a program called indprep
(for induction preparation), which had a
command line flag determining which rule-induction program the
data should be prepared for, since there is no standard format.
Other flags determined whether indprep should output a file
of examples (data) only, attributes (names), or both together.
As an example, the attribute file and
example file (containing 30 examples)
for subject G, representation RS2
(see Figure 6.9, below),
interval 4 (as in Table 6.2 below),
are given here as Figure 6.8.
In each example, there is one entry for every attribute,
in order.
**ATTRIBUTE FILE**
rov_off_head:(FLOAT)
rov_u:(FLOAT)
rov_v:(FLOAT)
rov_target_range:(FLOAT)
rov_height:(FLOAT)
rov_speed:(FLOAT)
sub_display:ship rov umb env;
rov_av_revs_demand:f_astn hf_astn h_astn q_astn stop q_ahd h_ahd hf_ahd f_ahd;
stage:initial searching placing far close final pull_in infringe;
class:Both_Ths_Full_Astn Both_Ths_Half_Astn Both_Ths_Stop Both_Ths_Half_Ahd
Both_Ths_Full_Ahd NO_KEY;
**EXAMPLE FILE**
48 7.98 0.00 1000.0 48.0 0.00 ship stop initial NO_KEY;
3 7.97 0.00 418.9 48.0 0.00 ship stop initial NO_KEY;
11 7.94 -0.22 320.3 48.0 0.00 ship stop placing NO_KEY;
30 7.60 -0.11 231.4 48.0 0.00 ship stop placing NO_KEY;
52 4.08 -0.01 150.4 34.5 4.14 rov stop far Both_Ths_Full_Ahd;
15 3.21 -1.08 134.9 34.5 3.39 rov f_ahd far NO_KEY;
15 3.19 -0.27 50.6 30.2 3.68 rov f_ahd far NO_KEY;
21 1.34 -0.72 16.1 12.1 1.53 rov f_ahd final Both_Ths_Stop;
15 -0.13 -0.11 14.1 8.5 0.88 rov stop final Both_Ths_Half_Ahd;
28 0.30 0.26 11.9 3.0 0.56 rov q_ahd final NO_KEY;
-56 0.24 0.56 15.7 2.0 0.61 rov q_ahd final Both_Ths_Half_Ahd;
-8 1.04 0.33 8.7 1.8 1.09 rov h_ahd final Both_Ths_Stop;
-49 0.13 0.25 4.5 2.1 0.29 rov stop final NO_KEY;
-20 -0.12 0.07 6.1 2.6 0.15 rov stop final Both_Ths_Stop;
27 0.06 -0.06 6.9 2.8 0.08 rov stop final Both_Ths_Stop;
-15 -0.15 -0.08 6.8 3.0 0.16 rov stop final Both_Ths_Stop;
56 0.09 -0.24 4.3 3.4 0.26 rov stop final NO_KEY;
-5 -6.69 -1.41 65.7 19.7 7.04 ship stop pull_in NO_KEY;
2 -0.67 1.12 1000.0 36.3 1.31 ship stop searching NO_KEY;
2 -1.44 -5.76 1000.0 37.9 6.86 ship stop searching NO_KEY;
2 0.59 -7.44 1000.0 39.2 7.98 ship stop searching NO_KEY;
-157 0.62 -8.28 487.4 39.8 7.57 ship stop searching NO_KEY;
-169 0.55 -7.37 465.4 40.3 8.36 ship stop searching NO_KEY;
178 0.61 -8.21 463.9 40.9 7.51 ship stop searching NO_KEY;
166 0.58 -7.49 483.1 41.5 8.18 ship stop searching NO_KEY;
-132 -5.27 -6.18 424.4 42.5 7.37 ship stop searching NO_KEY;
-129 -7.05 -4.36 229.7 45.2 7.55 ship stop placing NO_KEY;
-109 -3.64 -2.69 119.6 47.0 4.59 rov stop far Both_Ths_Full_Ahd;
5 1.94 1.50 80.1 47.4 2.46 ship f_ahd infringe NO_KEY;
-14 2.78 0.88 60.1 47.7 2.91 ship f_ahd infringe NO_KEY;
Figure 6.8: An instance of an example and attribute file for the
CN2 induction program.
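The layout of Figure 6.8 is simple enough to be generated by a short routine. The sketch below only mimics the layout visible in the figure; it has not been checked against the CN2 documentation, the function names are hypothetical, and the class list is written on a single line rather than wrapped.

```python
def format_attribute_file(float_attrs, qualitative_attrs, classes):
    """Build attribute-file text in the layout shown in Figure 6.8.

    float_attrs: list of attribute names.
    qualitative_attrs: list of (name, list_of_values) pairs.
    classes: list of values of the decision attribute.
    """
    lines = [f"{name}:(FLOAT)" for name in float_attrs]
    for name, values in qualitative_attrs:
        lines.append(f"{name}:{' '.join(values)};")
    lines.append(f"class:{' '.join(classes)};")
    return "\n".join(lines)


def format_example(values):
    """Build one example line: one entry per attribute, in order, then ';'."""
    return " ".join(str(v) for v in values) + ";"


print(format_attribute_file(
    ["rov_off_head", "rov_speed"],
    [("sub_display", ["ship", "rov", "umb", "env"])],
    ["Both_Ths_Stop", "NO_KEY"],
))
print(format_example(["48", "0.00", "ship", "NO_KEY"]))
```

Numerical values are passed as ready-formatted strings here, so that the number of decimal places is under the caller's control.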
Three rule-induction programs were readily available: C4, ID3 and CN2 [23] (for a description and comparison of some algorithms, see [39]). (The version of CN2 used was developed by Robin Boswell of the Turing Institute, as part of ESPRIT project 2154, the Machine Learning Toolkit.) C4 is based on the ID3 algorithm, and like ID3, produces output in the form of decision trees. When C4 was tried on larger data sets (a few thousand examples) it was found to be excessively slow, taking several hours to run, and on the largest ones it `crashed'. It was then decided against as a primary tool. Of the other two, CN2 was chosen as the more appropriate, because
The unordered mode produces if-then rules where the condition is made up of a conjunction of conditions on any of the attributes. Disjunction (`or') is produced by having a number of rules all for the same decision class.
A standard method for generating and testing rules was adopted. This is to take a training set of data, and use the program to generate rules, then to take the training set and unseen test sets, and to evaluate the prediction performance of the rules on these data. This process leads to figures for the effectiveness of the generated rules for each decision class considered, and an overall prediction performance figure, which must be carefully compared with the prediction performance of a default rule before being able to assess its value.
The first comparison of representations was between those given above in Figure 6.7 (see also § B.1). This used CN2 in its `unordered' mode, where the rules produced are independent of each other (their order is immaterial). This mode was, however, a new addition to CN2 at the time, and had yet to be fully tested.
The example was interesting, because although it looked
as if the second case performed better, in fact,
comparing the prediction performance of the rules
with the default rule reveals that these rules
did not score better at predicting human actions
than the rule ``do nothing all the time''.
The default rule is that all examples belong to the modal
(most frequent) class, which in these cases is NO_KEY.
So we obtain a figure for the default rule by
summing the actual frequencies of NO_KEY and
dividing by the total number of examples.
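The default-rule figure is just the relative frequency of the modal class, as the following sketch shows. The function name is hypothetical and the class counts are invented for illustration.

```python
from collections import Counter


def default_rule(classes):
    """Return the modal class and the fraction of examples belonging to it."""
    counts = Counter(classes)
    modal, freq = counts.most_common(1)[0]
    return modal, freq / len(classes)


examples = ["NO_KEY"] * 7 + ["Pure_Turn_Stbd"] * 2 + ["Pure_Turn_Port"]
print(default_rule(examples))
```

Any set of induced rules has to beat this figure before it can be said to predict anything beyond ``do nothing all the time''.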
In the unordered case, with the representation RT01,
the prediction performance of the rules
even on the training set was very close
to the prediction performance of the default rule.
On the test data, the prediction performance
was substantially worse than the default.
Looking at the individual rules (§ B.1),
the second rule makes sense in that it is saying that
when the first key-press of a `pure turn' has been
carried out, the corresponding part then needs to be done.
In contrast, the fourth rule is quite implausible.
Simply from considerations of symmetry, we could
call into question a rule where there were symmetrical
conditions but an asymmetrical action.
In this case, the rule must be presumed to have emerged
from a coincidence in the data.
The figures after the rule indicate that it is not very
well supported even in the training data: we might
expect it to be even more poorly supported in test data.
But many of the other rules can be criticised in a similar way.
With the new representation RT2, the overall accuracy
figures are not much different from the default rule figures.
But the rules look much better than
in the representation RT01.
Firstly, there are fewer of them, which is an advantage.
Secondly, most of them make good sense.
After discovering some unresolved issues
with the unordered mode in CN2,
the same data were reworked using the ordered mode
(see Appendix B.2).
This appendix illustrates the kind of rules
obtained using the ordered mode.
Briefly, the test data results on the representation
RT01 are still below the default values.
For the representation RT2, the results
just manage to be better than default.
The subject with the longest experience was G, at George House.
His trace files were grouped into four equal calendar time
intervals: 02, 03, 04 and 05, in order from earlier to later.
The calendar divisions are October 19th, October 30th,
November 10th, November 22nd, and December 4th.
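Assigning a game to one of these intervals amounts to a lookup against the division dates. In the sketch below the year 1989 is inferred from the period of the experiment, and the treatment of games falling exactly on a division date (assigned to the later interval, except for the final date, which is included in 05) is an assumption; the function name is hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Division dates from the text; the year is inferred, not stated there.
BOUNDARIES = [date(1989, 10, 19), date(1989, 10, 30), date(1989, 11, 10),
              date(1989, 11, 22), date(1989, 12, 4)]
LABELS = ["02", "03", "04", "05"]


def interval_label(d):
    """Return the interval label for a game date, or None if outside the range."""
    i = bisect_right(BOUNDARIES, d) - 1
    if d == BOUNDARIES[-1]:
        i = len(LABELS) - 1     # assumed: the closing date belongs to 05
    if not 0 <= i < len(LABELS):
        return None
    return LABELS[i]


print(interval_label(date(1989, 11, 21)))
```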
The data were fed through the actrep filter,
using action replacement charts generated for
the subject G from all his games together.
The 05 games (including the subject's best scoring game)
were used for generating rules.
Rules were generated on three representations of
ROV speed control, named RS0, RS1 and RS2.
(See Figure 6.9.)
(RS0)                         (RS1)                         (RS2)
2                             1                             1
rov_degrees                   rov_off_head                  rov_off_head
rov_target_head
                              3                             5
5                             rov_target_range              rov_u
rov_u                         rov_height                    rov_v
rov_v                         rov_speed                     rov_target_range
rov_target_range                                            rov_height
rov_height                    3                             rov_speed
rov_speed                     sub_display
                              rov_av_revs_demand            3
3                             stage                         sub_display
sub_display                                                 rov_av_revs_demand
rov_port_revs_demand          6                             stage
rov_stbd_revs_demand          3 3 2 0 Both_Ths_Full_Astn
                              3 3 2 1 Both_Ths_Half_Astn    6
6                             3 3 2 2 Both_Ths_Stop         3 3 2 0 Both_Ths_Full_Astn
3 3 2 0 Both_Ths_Full_Astn    3 3 2 3 Both_Ths_Half_Ahd     3 3 2 1 Both_Ths_Half_Astn
3 3 2 1 Both_Ths_Half_Astn    3 3 2 4 Both_Ths_Full_Ahd     3 3 2 2 Both_Ths_Stop
3 3 2 2 Both_Ths_Stop         0 0 0 0 NO_KEY                3 3 2 3 Both_Ths_Half_Ahd
3 3 2 3 Both_Ths_Half_Ahd                                   3 3 2 4 Both_Ths_Full_Ahd
3 3 2 4 Both_Ths_Full_Ahd                                   0 0 0 0 NO_KEY
0 0 0 0 NO_KEY
Figure 6.9:
Three slightly varying representations for ROV speed control
These were intended to be progressively more human-like.
The CN2 parameter `star' was set to 10,
and `threshold' was set to 10, and ordered mode was used.
The rules generated were then tested against the data
from each of the divisions, 02, 03, 04 and 05.
The results are summarised in Table 6.2.
The numbers in the body of the table are the percentage points
difference between the prediction performance of the rules
and the prediction performance of the default rule.
The high scores for the interval 05 are due to
the fact that the 05 interval provided the training data.
The default rule generally scored around 60% to 70%,
and the 05 interval absolute scores are over 95%.
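The figure reported in the table, the difference in percentage points between the rules and the default rule, can be computed as in the following sketch. The function name and the data are hypothetical, and it is assumed that the default rule is evaluated on the same set of examples as the induced rules.

```python
from collections import Counter


def relative_performance(actual, predicted):
    """Percentage-point difference between rule accuracy and default accuracy.

    actual, predicted: equal-length lists of class names.
    The default rule predicts the modal class of `actual` for every example.
    """
    correct = sum(a == p for a, p in zip(actual, predicted))
    modal_count = Counter(actual).most_common(1)[0][1]
    return 100.0 * (correct - modal_count) / len(actual)


actual = ["NO_KEY"] * 6 + ["Both_Ths_Stop"] * 4
predicted = ["NO_KEY"] * 6 + ["Both_Ths_Stop"] * 3 + ["NO_KEY"]
print(relative_performance(actual, predicted))
```

A value of zero means the rules do no better than always predicting the modal class, and a negative value means they do worse.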
Table 6.2: Testing rules for interval 05, subject G, against defaults
There are two trends immediately apparent in this table. One is that RS1 and RS2 perform substantially better than RS0, with RS2 being slightly the better of the two. The other is that whatever rules were induced for the interval 05 were not much in evidence during interval 02, and progressively became more so. This is reassuring in two ways: firstly it suggests that the rules found are not imaginary, or due to random effects; and secondly that these rules are being adopted increasingly as time goes on. This is consistent with a common-sense view of learning.
An alternative way of dividing up the examples
is into sets of similar size.
This was done with subject M, but otherwise the
same procedure was followed as with subject G.
Table 6.3 summarises the results for M.
Table 6.3:
Testing rules for interval 0499--0508, subject M, against defaults
The rules were again constructed on the data containing the highest score, which in this case was the 0499--0506 interval. The same general trend is apparent with respect to the representations as above.
The prediction performance of the rules across time again shows a build-up of prediction performance towards the training interval; but now also shows a subsequent decline. This could in principle be due either to a decline in task performance, with the acquired rules not being followed as strictly as before, or due to new rules supplanting the old ones. In this table, the overall accuracy figures have also been included, to show that there is in fact no rise in overall accuracy between the fourth and fifth intervals. The rise in the relative figure is due to a fall in the default rule accuracy, which, in this interval, implies that there were fewer null actions.
G and M were both in George House,
and took an interest in each other's games.
It is perhaps not surprising that similar patterns
emerge in their results, and that a representation that
produced rules performing substantially above
the default rule for one of them should also do so for the other.
This is not so, however, for DM,
one of the Charing Cross Tower subjects.
His results, derived by exactly the same process as
the above results, are summarised in Table 6.4.
Table 6.4:
Testing rules for interval 065--1, subject DM, against defaults
This table suggests that the induced rules do not reflect the actual rules being used by this subject. The first two columns suggest, rather more strongly, that the rules are substantially different from those used at the earlier stages of learning. The pattern for all the representations is similar, and this suggests that none of RS0, RS1 or RS2 covers the attributes actually used by subject DM. However there is, if anything, a slight favouring of RS1 over RS2, contrary to the other subjects. Another representation would have to be found, if we were to find results as satisfactory for DM as for G and M.