©1990, 1995

6.3 Methods and results

6.3.1 Game organisation

The stand-alone nature of the game, with all the necessary information incorporated on line, allowed a user-directed experimental setting with minimal supervision. Identical versions of the game were used on two sites: George House and Charing Cross Tower. In George House (GH), people were not prevented from watching each other, and indeed this was part of the method of stimulating interest. Hence the amount of observational experience obtained by the subjects before starting the game was unrecorded. At Charing Cross Tower (CXT), the more structured environment meant that the subjects had little or no prior exposure to the game.

6.3.2 Subjects

Unpaid volunteers were requested by word-of-mouth and local posters. The selling pitch was that the simulation was more interesting than, and different from, other computer games. A spirit of competition was fostered firstly by providing a scoreboard as part of the help, secondly by offering a small prize (unspecified at the start) for the highest overall score in a single game at the end of a given period. Four subjects at GH, and two at CXT (other than the author), achieved at least a basic competence allowing them to complete games within a reasonable time-span. We will abbreviate those at GH to R, G, S and M. Those at CXT we will abbreviate to DM and DJ.

6.3.3 Data collection

As explained above (§ 6.2.3), the interaction with the simulation was entirely through the mouse. The keyboard itself had no effect, except that during the course of the experiment, a facility was introduced so that players could skip short amounts of time (when they knew in advance that they would not want to take any actions) by pressing a number key. The key 1 skipped about 10 seconds-worth of action, 2 about 20 seconds-worth, and so on. In no case could this increase the score, and these actions were not recorded in the first experiment.

The primary data consisted of time-stamped records of every legal key-press. These were recorded in a format of five blank-separated numerical fields per line, one line representing one key-press (see Figure 6.2, where a descriptive label has been appended to each line as a comment). These files will be referred to as `action trace files' or simply `trace files'.


00018 2 3 2 4    Both_Props_Full_Ahead
00021 2 3 5 0    Rudder_Hard_Port
00049 2 3 5 2    Rudder_Centre
00053 1 0 8 1    Scale_over_2
00054 1 0 8 1    Scale_over_2
00056 1 0 2 0    Fix_Ship
00058 1 0 2 1    Centre_Ship
00160 2 3 5 0    Rudder_Hard_Port
00195 2 3 5 2    Rudder_Centre
00201 2 3 5 3    Rudder_Gentle_Stbd
00207 2 3 5 2    Rudder_Centre
00223 2 3 2 0    Both_Props_Full_Astn
00356 1 3 1 0    Stop_Return_to_Help
Figure 6.2: A commented example game trace file

There were two types of action trace file. Each game had its key-presses recorded in a separate file, and each session of zero or more games had the ancillary key-presses (starting and stopping the game, reading the help information provided, etc.) recorded in the same format, together, without the key-presses from the games themselves.

The first field of each line in a trace file gives the number of half-seconds since the beginning of the game, or session. This varies from zero up to several thousand (7200 being equivalent to one hour). The remaining four fields are single digits, representing in turn the sub-display, the column, the row, and the element within that row. These refer to the obvious divisions of the interface screen.
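The record layout just described can be read mechanically. The following is a minimal sketch in Python (not the language of the original programs). The names `TraceRecord` and `parse_trace_line` are illustrative assumptions, and the optional sixth field is taken to be the comment label that appears in the commented example of Figure 6.2.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TraceRecord:
    """One key-press from an action trace file (hypothetical structure)."""
    half_seconds: int            # time since start of game or session
    sub_display: int
    column: int
    row: int
    element: int
    label: Optional[str] = None  # comment label, as in Figure 6.2

    @property
    def seconds(self) -> float:
        """Elapsed time converted from half-seconds to seconds."""
        return self.half_seconds / 2.0

    @property
    def key(self) -> Tuple[int, int, int, int]:
        """The four single-digit fields that identify the key."""
        return (self.sub_display, self.column, self.row, self.element)


def parse_trace_line(line: str) -> TraceRecord:
    """Split one blank-separated line into its five numeric fields."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least five fields, got: {line!r}")
    time, sub, col, row, elem = (int(f) for f in fields[:5])
    label = fields[5] if len(fields) > 5 else None
    return TraceRecord(time, sub, col, row, elem, label)
```

Applied to the second line of Figure 6.2, this would give a record at 21 half-seconds (10.5 seconds) with key `(2, 3, 5, 0)`.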

Including the files due to the author, covering the period Oct 8th to Jan 16th, there were over 300 files from GH and nearly 150 files from CXT. These occupied over 1 Mbyte and over 400 kbyte respectively.

All plausible actions were recorded, which included those actions that were impossible or ineffective at the time, and which caused an audible warning to the player at the time of the game, coinciding with the highlighting of the selected area. However, there were some mouse-button-clicks that occurred while the cursor was in an area that was generally inactive, and these were not recorded. Such button-clicks also caused an audible warning at the time of play, but did not cause any highlighting of the selected area of the screen.

In addition to the action trace files mentioned above, there was one file for each site, in which each game had a one-line entry. The fields in this file (called the `Runindex') were:

  1. the name of the version of the game;
  2. the player's short name, as entered by the player;
  3. the name (number) of the game record file;
  4. the date and time of the end of the game;
  5. the total score for that game;
  6. the time taken for the game, in half-seconds;
  7. a random seed which determines the number and position of the mines, and which differs between most games.
A section of a Runindex is shown in Figure 6.3 (runs by the author later than the experimental period).


calm_weather s           17282750    Apr14-00:29    -546    1036   20262
calm_weather s           17283514    Apr14-00:42     982     998   10535
calm_weather s           17338726    Apr14-16:02    5799    2201     571
calm_weather s           17339722    Apr14-16:19    7044    2456   10535
calm_weather s           17340482    Apr14-16:32    6845    2155   20262
calm_weather s           17341776    Apr14-16:53    7781    3219    1940
calm_weather s           17447710    Apr15-22:19    6637    2363    8818
Figure 6.3: Part of a Runindex file
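As an illustration of the seven Runindex fields listed above, a line such as those in Figure 6.3 could be unpacked as follows. This is only a sketch: the names `RunindexEntry` and `parse_runindex_line` are assumptions, and it presumes that no field contains embedded blanks.

```python
from typing import NamedTuple


class RunindexEntry(NamedTuple):
    """One game's entry in a Runindex file (hypothetical structure)."""
    version: str       # name of the version of the game
    player: str        # short name as entered by the player
    record_file: str   # name (number) of the game record file
    ended: str         # date and time of the end of the game
    score: int         # total score for the game
    half_seconds: int  # time taken, in half-seconds
    seed: int          # random seed for the mine layout


def parse_runindex_line(line: str) -> RunindexEntry:
    """Split a Runindex line into its seven blank-separated fields."""
    version, player, record_file, ended, score, half_seconds, seed = line.split()
    return RunindexEntry(version, player, record_file, ended,
                         int(score), int(half_seconds), int(seed))
```

For the first line of Figure 6.3 this yields a score of -546 over 1036 half-seconds (518 seconds) with seed 20262.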


Table 6.1: Subjects' times, scores, and dates (1989--90)

An idea of the amount of data gathered can be obtained from Table 6.1. ``No. Runs'' is the total number of games played. Only a few of these were false starts, abandoned after a very short time. ``Total Time'' is the hours and minutes spent on the games themselves, not counting ancillary activities. ``Best Time'' is the cumulative playing time up to the end of the game that gave the best score, recorded in the next column. ``Start Date'' and ``Best Date'' allow the calculation of the calendar period from first trying the game to scoring the recorded high score. ``Finish Date'' represents the date of the last game played before the (arbitrary) cut-off date of 16th January 1990. There were in fact only a few games played after this, with no-one improving their score. The subject with the highest score, G, was also the one who had put in the most hours of practice. For comparison, the author, with substantially more practice than any of the subjects, achieved a score of over 9000. A practical ceiling would appear to be around 10000, though this did depend on the random fluctuation of the arrangement of the mines. A score as high as this was only possible (even in theory) on a small proportion of games. Uncertainties in the scoring system are discussed below (§ 6.4.1).

6.3.4 Analysis

Since the analysis of the data involves many stages, it might be helpful to review the overall pattern before describing the details. This overall view, from which some details have been omitted for clarity, appears here as Figure 6.4. In this figure, the rectangles represent types of file, and the ovals represent programs for transforming one type of file into another.


Figure 6.4: Simplified data flow during analysis

6.3.4.1 Analysis of the actions

It was clear from very early on in the analysis that a single action from a human point of view was not necessarily equivalent to a single key-press. In particular, after practice on the task, short sequences of key-presses were frequently apparent. This happened particularly in the case of manoeuvring the ROV. Because there was no simple effector to turn left or right, players had to create their own sequences of actions that performed the function of turning. Even for the same direction of turn, different sequences were performed in different contexts. When going full ahead with both thrusters, a left turn of a few degrees would most probably be executed by selecting half ahead on the left thruster, followed immediately (0.5 seconds later) by restoring the left thruster to full ahead, using the full-ahead key for either the left thruster or both thrusters together. In contrast, when near a mine, gliding along with both thrusters stopped, a left turn would most likely be executed by selecting the starboard thruster half ahead, and then stopped (or both stopped). Other variations were also observed.

With these turning manoeuvres, the time interval between the unbalancing and balancing actions was crucial to determining the magnitude of the effect of the action. Due to the delayed response of the thrusters, leaving an interval of 1 second produced an effect approximately four times the size of the effect produced by an interval of 0.5 seconds. Thus, in situations where the former would be an appropriate action, the latter would not, and vice versa: the two were, in effect, different actions. There were other sequences of actions where time interval and order were not important. For example, when recovering the ROV, the first step was to reel in the cable. This was effected by setting the cable tension to `grip' and the take-in speed to `fast'; but the ordering and interval of these two actions were immaterial.

Perhaps these considerations could in principle be derived from the data. However, in this study, they were deliberately introduced, as background knowledge, in the attempt to get the best classification of actions possible in the available time.

The problem then remained to find what compound actions actually occur in different players' task performance. This could be attempted by observation and questioning; but, without any objective measure, it would be difficult to assess how much of the resulting discoveries were artificially produced by the biases and preconceptions of player and experimenter. Thus a prime objective was to devise a program to compile a list of these compound actions given only a quantity of raw data. Programs for learning macro-operators in games or puzzles have been devised, but the methods used and quoted (e.g. in [120]) do not explicitly deal with time or dynamic systems, and were thus unsuitable here.

The basic program to perform this was called summ (for summary). The program first completed a large table of the frequency of occurrence of each key-interval-key combination, for intervals between 0.5 seconds (coded 0) and 2 seconds (coded 3). Two seconds was judged to be a reasonable maximum between two sub-actions that formed a higher-level action. The program then wrote out a summary of the more commonly occurring sequences. Depending on flags given on the command line, this summary was intended either as a general summary for human reading (see Figure 6.5) or as a list of 4-tuples specifying key-interval-key sequences and single replacement keys (see Figure 6.6). The summary in Figure 6.5 gives on the first line:

  1. the code for the former of the pair of keys;
  2. its frequency relative to all the actions in the sample;
  3. its absolute frequency;
  4. and a meaningful label;
and on each of the subsequent lines:
  1. the number of empty half-seconds separating the pair of actions;
  2. the code for the latter of the action pair;
  3. the absolute frequency of the key-interval-key combination;
  4. a label for the latter key;
  5. and finally, if that combination has been recognised and named (by the experimenter), a label for the combination.

3 3 1 3 relative 0.050 freq :1342 Port_Ths_Half_Ahd 
            0    3 3 1 2 freq :  17    Port_Ths_Stop        Slow_Turn_Stbd_0
            1    3 3 1 2 freq :  55    Port_Ths_Stop        Slow_Turn_Stbd_1
            2    3 3 1 2 freq :  38    Port_Ths_Stop        Slow_Turn_Stbd_2
            3    3 3 1 2 freq :  31    Port_Ths_Stop        Slow_Turn_Stbd_3
            0    3 3 1 4 freq : 370    Port_Ths_Full_Ahd    Diff_Turn_Port_0
            1    3 3 1 4 freq :  66    Port_Ths_Full_Ahd    Diff_Turn_Port_1
            2    3 3 1 4 freq :  11    Port_Ths_Full_Ahd    Diff_Turn_Port_2
            0    3 3 2 2 freq :  24    Both_Ths_Stop        Slow_Turn_Stbd_0
            1    3 3 2 2 freq :  70    Both_Ths_Stop        Slow_Turn_Stbd_1
            2    3 3 2 2 freq :  65    Both_Ths_Stop        Slow_Turn_Stbd_2
            3    3 3 2 2 freq :  55    Both_Ths_Stop        Slow_Turn_Stbd_3
            0    3 3 3 1 freq : 228    Stbd_Ths_Half_Astn   Pure_Turn_Stbd
            1    3 3 3 1 freq :  75    Stbd_Ths_Half_Astn   Pure_Turn_Stbd
            3    3 3 3 2 freq :   9    Stbd_Ths_Stop       
Figure 6.5: A fragment of a summary

 3 3 1 3      0      3 3 1 4             0 3 3 0
 3 3 1 3      0      3 3 3 1             0 3 0 1
 3 3 1 3      1      3 3 1 4             0 3 3 1
 3 3 1 3      1      3 3 3 1             0 3 0 1
 3 3 1 3      2      3 3 1 4             0 3 3 2
 3 3 1 3      2      3 3 3 1             0 3 0 1
 3 3 1 3      3      3 3 1 4             0 3 3 3
 3 3 1 3      3      3 3 3 1             0 3 0 1
Figure 6.6: A fragment of a replacement chart
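The key-interval-key table built by summ can be sketched as below. This is not the original program: it assumes that only consecutive key-presses are paired, that the interval code is the number of empty half-seconds between them (so a gap of one half-second is coded 0 and a gap of four is coded 3), and that presses sharing a time-stamp are ignored. The name `count_pairs` is hypothetical.

```python
from collections import Counter

MAX_GAP_CODE = 3  # 2 seconds between the two key-presses


def count_pairs(records):
    """Count key-interval-key combinations.

    records: list of (half_seconds, key) tuples in time order,
             where key is the four-digit tuple from the trace file.
    Returns a Counter keyed by (first_key, gap_code, second_key).
    """
    counts = Counter()
    for (t1, k1), (t2, k2) in zip(records, records[1:]):
        gap_code = (t2 - t1) - 1  # empty half-seconds between the presses
        if 0 <= gap_code <= MAX_GAP_CODE:
            counts[(k1, gap_code, k2)] += 1
    return counts
```

Relative frequencies such as those on the first line of each block in Figure 6.5 would then follow by dividing a key's count by the total number of actions in the sample.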

One deficiency with this basic summary program was that it could not deal with sequences of more than two actions. This was overcome by using the program iteratively. First a basic replacement chart was made. Next, actions were fed through a filter that made the changes specified in the first chart. This program was called actrace (not shown in Figure 6.4) from its effect of changing the actions in the trace file. The output of this filter was then fed into summ again, giving a second chart, some of whose input keys may have been new composite ones. A C-shell script (named chart) was written to govern this iterative process. In the final version of chart, the first application of summ output only the more common sequences, and two subsequent iterations included progressively less common sequences.
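The effect of passing a trace through a replacement chart more than once can be sketched as follows. This is an assumption-laden illustration rather than the actrace program itself: the function name `apply_chart` is hypothetical, the composite key is assumed to take the time-stamp of the first key-press of the pair, and the second chart in the usage note is invented purely to show how a composite key can enter a longer sequence.

```python
def apply_chart(records, chart):
    """Replace matching key-interval-key pairs by a single composite key.

    records: list of (half_seconds, key) tuples in time order.
    chart:   dict mapping (first_key, gap_code, second_key) to a
             replacement key, as in the 4-tuples of Figure 6.6.
    """
    out = []
    i = 0
    while i < len(records):
        if i + 1 < len(records):
            (t1, k1), (t2, k2) = records[i], records[i + 1]
            replacement = chart.get((k1, t2 - t1 - 1, k2))
            if replacement is not None:
                out.append((t1, replacement))  # assumed: keep the first time-stamp
                i += 2
                continue
        out.append(records[i])
        i += 1
    return out


def apply_charts(records, charts):
    """Apply several charts in series, so that composites can be combined."""
    for chart in charts:
        records = apply_chart(records, chart)
    return records
```

With a first chart containing the Figure 6.6 entry `((3, 3, 1, 3), 0, (3, 3, 3, 1)) -> (0, 3, 0, 1)`, and a second, purely hypothetical chart that combines `(0, 3, 0, 1)` with a following key, a three-key sequence would be reduced to one composite action in two passes.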

The typology of actions is further discussed below (§ 6.4.2).

6.3.4.2 Expanding the trace files

The trace files only had information concerning the actions taken, not about the situations. The files had therefore to be expanded before analysis to include full details of the situation, and this was done by a program called exp, which was a modified version of the simulation program, without the graphics or interaction. The input was an action trace file, and the output was a binary file containing all the data on which the displays were based for each half-second step. This increased the file size by a factor of 250--300. (Thus it would be quite impractical to store many of these expanded files on disk.)

An expanded file also permitted a more flexible form of replay. Replaying from the trace file was possible, but it was only one-way (forwards), and took considerable time to execute, since all the original mathematics had to be performed. With an expanded file, on the other hand, one was able to jump to any place in the game, stop, or go backwards. This proved to be a help in getting an intuitive feel for what the various subjects were doing. It allowed an observer to study the circumstances of a certain action in a flexible, easy way.

6.3.4.3 Effecting the action changes

The expanded file then had its actions modified in accordance with the required replacement chart. The program that effected this was called actrep (for action representation). This was done twice, in series, so as to enable longer sequences to be converted than would be possible with only one application.

As well as the explicit actions, there was also the question of a reasonable human representation of the null action. The original expansion gave null actions for every time step (0.5 seconds) where there was no explicit action. Since one of the priorities of the simulation was to get away from an over-dependence on critical timing, it seemed unreasonable to class the time steps immediately preceding an action as null---after all, the player may have just been a bit slower than intended or desired. A thinking-time parameter was therefore introduced, imagined to be around 1 to 3 seconds, and for this amount of time around an action no null actions were passed on. This was tried with thinking time extending only before, or before and after, any action. Another way of describing the purpose of this would be to say that we want null actions to be registered when everything is fine, not in the thick of hectic action. Compare this with Card et al.'s `M' operator (``mentally prepare'') in their Keystroke Level Model [20]. The value for the `M' operator is given as 1.35 seconds.
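The thinking-time idea can be made concrete with a short sketch. It assumes the expanded data are available as one entry per half-second step, with `None` standing for the null action; the function name and the representation are hypothetical, and the thinking time is given in half-second steps (so 2 to 6 steps for the 1 to 3 seconds suggested above).

```python
def null_steps_to_keep(actions, thinking_steps, also_after=False):
    """Return the indices of null actions that survive the thinking-time filter.

    actions:        list with one entry per half-second step; None means
                    no explicit action was taken at that step.
    thinking_steps: number of half-second steps before each explicit action
                    (and after it, if also_after is True) in which null
                    actions are not passed on.
    """
    n = len(actions)
    suppressed = [False] * n
    for i, action in enumerate(actions):
        if action is None:
            continue
        lo = max(0, i - thinking_steps)
        hi = min(n - 1, i + thinking_steps) if also_after else i
        for j in range(lo, hi + 1):
            suppressed[j] = True
    return [i for i, a in enumerate(actions) if a is None and not suppressed[i]]
```

With a single action at step 5 of a ten-step sequence and a thinking time of two steps, the null actions at steps 3 and 4 are dropped; with `also_after=True`, those at steps 6 and 7 are dropped as well.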

6.3.4.4 Selection of the desired attributes

Having thus reached the stage where the representation of actions was altered to what we may suppose is a more human-like form, it was left to decide what to do about the representation of the situations. Whereas the basic representation of actions was unequivocal (discrete key-presses), the representation of situations implied by the interface was not completely clear (see below, § 6.4.3). The approach taken was to remain agnostic about the exact information provided, because in any case the aim of the methods investigated was to be able to tell when a representation was closer to the human one.

Having decided on a representation to test, the selection of the attribute values in that representation was done by the program called sitrep (for situation representation). The representation was defined by a hand-crafted file, listing the attributes to be selected (see Figure 6.7).


(RT01)                               (RT2)

2                                    1
rov_degrees                          rov_off_head
rov_target_head                      4
4                                    rov_height
rov_height                           rov_speed
rov_speed                            rov_r
rov_r                                rov_target_range
rov_target_range                     4
4                                    sub_display
sub_display                          rov_av_revs_demand
rov_port_revs_demand                 rov_turn_demand
rov_stbd_revs_demand                 rov_status
rov_status
                                     3
3                                     0 3 0 1   Pure_Turn_Stbd
 3 3 1 3   Port_Ths_Half_Ahead        0 3 0 3   Pure_Turn_Port
 3 3 3 3   Stbd_Ths_Half_Ahead        0 0 0 0   NO_KEY
 0 0 0 0   NO_KEY
Figure 6.7: Example representation files

The three sections of situation attributes are respectively integer variables, floating-point variables, and qualitative variables. The fourth group is a selection of actions (classes) in the single `decision' attribute, along with their key codes. Lower-level representations (such as the one marked (RT01)) contain only attributes that are explicitly present in the unmodified expanded data. Higher-level ones (such as the one marked (RT2)) have some quantities that are not explicitly present in the original data, and therefore have to be calculated on the spot.

In the example case, rov_off_head is the relative bearing of the closest active target from the ROV. It is calculated from rov_degrees, the heading of the ROV, and rov_target_head, the true bearing of the target from the ROV. Rov_av_revs_demand and rov_turn_demand are calculated from the two demands in the old representation RT01. Pure_Turn_Stbd is a combination of Port_Ths_Half_Ahead and Stbd_Ths_Half_Astn, as shown in Figures 6.5 and 6.6 above. Pure_Turn_Port is defined similarly.
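A derived attribute of this kind might be computed as in the following sketch. The exact formula used by sitrep is not given here, so the details are assumptions: both inputs are taken to be in degrees, the result is wrapped into the range (-180, 180], and a positive value is taken to mean that the target lies clockwise from the ROV's heading. The range is at least consistent with the rov_off_head values appearing in the example data.

```python
def relative_bearing(rov_degrees, rov_target_head):
    """Bearing of the target relative to the ROV's heading, in degrees.

    rov_degrees:     heading of the ROV.
    rov_target_head: true bearing of the target from the ROV.
    The result lies in (-180, 180]; the sign convention is an assumption.
    """
    diff = (rov_target_head - rov_degrees) % 360
    return diff - 360 if diff > 180 else diff
```

The wrap-around matters near north: a heading of 350 with a target bearing of 10 gives 20, not -340.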

In the expanded file, the half-second intervals in which there was no key-press greatly outnumbered those in which there was one. Any individual key, therefore, was greatly outnumbered by what might be regarded as null actions (NO_KEY). It would be unhelpful to include all these as examples for a rule-induction program, for two reasons:

  1. the program would take a much longer time to execute, possibly making the difference between a practical length of time and an impractical one;
  2. including all of them would not help the program identify rules for the keys in which we are interested.
For these reasons, sitrep also performed the function of cutting out a large proportion of null actions. The precise proportion was controllable via a command-line parameter, or settable to a `good guess' value dependent on the number of non-null keys being investigated.
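The thinning of null actions can be illustrated as below. How sitrep chose which null actions to discard is not stated, so uniform random selection is an assumption, as are the function name and the convention that the class is the last entry of each example; the `good guess' default is not reproduced.

```python
import random


def thin_nulls(examples, keep_fraction, seed=0):
    """Keep every non-null example and a random fraction of the null ones.

    examples:      list of tuples whose last entry is the class label.
    keep_fraction: proportion of NO_KEY examples to retain (0.0 to 1.0).
    seed:          seed for the random selection, for repeatability.
    """
    rng = random.Random(seed)
    return [ex for ex in examples
            if ex[-1] != "NO_KEY" or rng.random() < keep_fraction]
```

Setting `keep_fraction` to 1.0 leaves the data unchanged, while 0.0 removes every null action and keeps only the explicit key-presses.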

6.3.4.5 Evaluating representation primitives using rule induction

After sitrep had output the selected data (now in readable, `ascii' form), it was a straightforward matter to format this for a rule-induction program. This was done by a program called indprep (for induction preparation), which had a command line flag determining which rule-induction program the data should be prepared for, since there is no standard format. Other flags determined whether indprep should output a file of examples (data) only, attributes (names), or both together. As an example, the attribute file and example file (containing 30 examples) for subject G, representation RS2 (see Figure 6.9, below), interval 4 (as in Table 6.2 below), is given here as Figure 6.8. In each example, there is one entry for every attribute, in order.


**ATTRIBUTE FILE**
rov_off_head:(FLOAT)
rov_u:(FLOAT)
rov_v:(FLOAT)
rov_target_range:(FLOAT)
rov_height:(FLOAT)
rov_speed:(FLOAT)
sub_display:ship rov umb env;
rov_av_revs_demand:f_astn hf_astn h_astn q_astn stop q_ahd h_ahd hf_ahd f_ahd;
stage:initial searching placing far close final pull_in infringe;
class:Both_Ths_Full_Astn Both_Ths_Half_Astn Both_Ths_Stop Both_Ths_Half_Ahd
      Both_Ths_Full_Ahd NO_KEY;

**EXAMPLE FILE**
48 7.98 0.00 1000.0 48.0 0.00 ship stop initial NO_KEY;
3 7.97 0.00 418.9 48.0 0.00 ship stop initial NO_KEY;
11 7.94 -0.22 320.3 48.0 0.00 ship stop placing NO_KEY;
30 7.60 -0.11 231.4 48.0 0.00 ship stop placing NO_KEY;
52 4.08 -0.01 150.4 34.5 4.14 rov stop far Both_Ths_Full_Ahd;
15 3.21 -1.08 134.9 34.5 3.39 rov f_ahd far NO_KEY;
15 3.19 -0.27 50.6 30.2 3.68 rov f_ahd far NO_KEY;
21 1.34 -0.72 16.1 12.1 1.53 rov f_ahd final Both_Ths_Stop;
15 -0.13 -0.11 14.1 8.5 0.88 rov stop final Both_Ths_Half_Ahd;
28 0.30 0.26 11.9 3.0 0.56 rov q_ahd final NO_KEY;
-56 0.24 0.56 15.7 2.0 0.61 rov q_ahd final Both_Ths_Half_Ahd;
-8 1.04 0.33 8.7 1.8 1.09 rov h_ahd final Both_Ths_Stop;
-49 0.13 0.25 4.5 2.1 0.29 rov stop final NO_KEY;
-20 -0.12 0.07 6.1 2.6 0.15 rov stop final Both_Ths_Stop;
27 0.06 -0.06 6.9 2.8 0.08 rov stop final Both_Ths_Stop;
-15 -0.15 -0.08 6.8 3.0 0.16 rov stop final Both_Ths_Stop;
56 0.09 -0.24 4.3 3.4 0.26 rov stop final NO_KEY;
-5 -6.69 -1.41 65.7 19.7 7.04 ship stop pull_in NO_KEY;
2 -0.67 1.12 1000.0 36.3 1.31 ship stop searching NO_KEY;
2 -1.44 -5.76 1000.0 37.9 6.86 ship stop searching NO_KEY;
2 0.59 -7.44 1000.0 39.2 7.98 ship stop searching NO_KEY;
-157 0.62 -8.28 487.4 39.8 7.57 ship stop searching NO_KEY;
-169 0.55 -7.37 465.4 40.3 8.36 ship stop searching NO_KEY;
178 0.61 -8.21 463.9 40.9 7.51 ship stop searching NO_KEY;
166 0.58 -7.49 483.1 41.5 8.18 ship stop searching NO_KEY;
-132 -5.27 -6.18 424.4 42.5 7.37 ship stop searching NO_KEY;
-129 -7.05 -4.36 229.7 45.2 7.55 ship stop placing NO_KEY;
-109 -3.64 -2.69 119.6 47.0 4.59 rov stop far Both_Ths_Full_Ahd;
5 1.94 1.50 80.1 47.4 2.46 ship f_ahd infringe NO_KEY;
-14 2.78 0.88 60.1 47.7 2.91 ship f_ahd infringe NO_KEY;
Figure 6.8: An instance of an example and attribute file for the CN2 induction program.
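The formatting step is simple enough to sketch. The code below reproduces only the layout visible in Figure 6.8 and should not be read as a specification of the CN2 input format; the function names are hypothetical, and values are assumed to have been converted to strings already.

```python
def format_attributes(attributes):
    """Lay out attribute declarations as in the upper part of Figure 6.8.

    attributes: list of (name, values) pairs, where values is None for a
                numeric attribute or a list of symbols for a qualitative one.
    """
    lines = []
    for name, values in attributes:
        if values is None:
            lines.append(f"{name}:(FLOAT)")
        else:
            lines.append(f"{name}:{' '.join(values)};")
    return "\n".join(lines)


def format_example(values):
    """Lay out one example: one entry per attribute, in order, ending in ';'."""
    return " ".join(values) + ";"
```

The class attribute is treated like any other qualitative attribute and simply comes last in both the declarations and each example.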

Three rule-induction programs were readily available: C4, ID3 and CN2 [23] (for a description and comparison of some algorithms, see [39]). (The version of CN2 used was developed by Robin Boswell of the Turing Institute, as part of ESPRIT project 2154, the Machine Learning Toolkit.) C4 is based on the ID3 algorithm, and like ID3, produces output in the form of decision trees. When C4 was tried on larger data sets (a few thousand examples) it was found to be excessively slow, taking several hours to run, and on the largest ones it `crashed'. It was therefore rejected as a primary tool. Of the other two, CN2 was chosen as the more appropriate, because

  1. it was designed specifically for `noisy' data, and human actions are rarely noise-free;
  2. it can produce output in the form of if-then rules rather than as a decision tree.
There are two major modes of CN2: ordered and unordered. The ordered mode produces if-then-else rules, where, when the rules are being executed, the search through the rules stops when a match is found. In effect, later rules have as part of their conditions the negation of the conditions of earlier rules. This means that the ordering of the rules is significant, and that the application of any rule cannot be understood out of context. Thus, from the point of view of human comprehensibility, there is little advantage of the ordered mode over a decision tree, as in ID3.

The unordered mode produces if-then rules where the condition is made up of a conjunction of conditions on any of the attributes. Disjunction (`or') is produced by having a number of rules all for the same decision class.
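The difference between the two modes, as described above, can be illustrated with a small sketch. It is not CN2's own evaluation code: the rule representation and the function names are assumptions, the rules in the usage note are invented, and no attempt is made to show how conflicts between matching unordered rules are resolved.

```python
def matches(conditions, example):
    """True if every (attribute, test) condition holds for the example."""
    return all(test(example[attr]) for attr, test in conditions)


def classify_ordered(rules, example, default):
    """Ordered mode: the first matching rule decides, later rules are ignored."""
    for conditions, decision in rules:
        if matches(conditions, example):
            return decision
    return default


def matching_classes_unordered(rules, example):
    """Unordered mode: each rule is judged on its own conditions alone."""
    return {decision for conditions, decision in rules
            if matches(conditions, example)}
```

For instance, with two hypothetical rules, "rov_target_range below 20 gives Both_Ths_Stop" followed by "rov_speed below 1.0 gives Both_Ths_Half_Ahd", an example satisfying both conditions is assigned Both_Ths_Stop in the ordered mode, whereas in the unordered mode both rules apply and each can be read in isolation.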

A standard method for generating and testing rules was adopted. This is to take a training set of data, and use the program to generate rules, then to take the training set and unseen test sets, and to evaluate the prediction performance of the rules on these data. This process leads to figures for the effectiveness of the generated rules for each decision class considered, and an overall prediction performance figure, which must be carefully compared with the prediction performance of a default rule before being able to assess its value.

The first comparison of representations was between those given above in Figure 6.7 (see also § B.1). This used CN2 in its `unordered' mode, where the rules produced are independent of each other (the order of them is immaterial). This was, however, a new facility to be added to CN2, and it was still to be fully tested.

The example was interesting, because although it looked as if the second case performed better, in fact, comparing the prediction performance of the rules with the default rule reveals that these rules did not score better at predicting human actions than the rule ``do nothing all the time''. The default rule is that all examples belong to the modal (most frequent) class, which in these cases is NO_KEY. So we obtain a figure for the default rule by summing the actual frequencies of NO_KEY and dividing by the total number of examples.
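The comparison with the default rule amounts to a few lines of arithmetic, sketched here with hypothetical function names. The default class is passed in explicitly (NO_KEY in the cases discussed), and the final figure is expressed in percentage points relative to the default.

```python
def default_accuracy(actual, default_class="NO_KEY"):
    """Proportion of examples that the rule 'always default_class' gets right."""
    return sum(a == default_class for a in actual) / len(actual)


def accuracy(predicted, actual):
    """Proportion of examples whose predicted class equals the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def points_above_default(predicted, actual, default_class="NO_KEY"):
    """Prediction performance minus default performance, in percentage points."""
    return 100.0 * (accuracy(predicted, actual)
                    - default_accuracy(actual, default_class))
```

If 70 of 100 examples are NO_KEY, the default rule scores 70%, so a rule set that is right on 72 of them is only two percentage points better than doing nothing all the time.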

In the unordered case, with the representation RT01, the prediction performance of the rules even on the training set was very close to the prediction performance of the default rule. On the test data, the prediction performance was substantially worse than the default. Looking at the individual rules (§ B.1), the second rule makes sense in that it is saying that when the first key-press of a `pure turn' has been carried out, the corresponding part then needs to be done. In contrast, the fourth rule is quite implausible. Simply from considerations of symmetry, we could call into question a rule where there were symmetrical conditions but an asymmetrical action. In this case, the rule must be presumed to have emerged from a coincidence in the data. The figures after the rule indicate that it is not very well supported even in the training data: we might expect it to be even more poorly supported in test data. But many of the other rules can be criticised in a similar way.

With the new representation RT2, the overall accuracy figures are not much different from the default rule figures. But the rules look much better than in the representation RT01. Firstly, there are fewer of them, which is an advantage. Secondly, most of them make good sense.

After discovering some unresolved issues with the unordered mode in CN2, the same data were reworked using the ordered mode (see Appendix B.2). This appendix illustrates the kind of rules obtained using the ordered mode. Briefly, the test data results on the representation RT01 are still below the default values. For the representation RT2, the results just manage to be better than default.

6.3.4.6 Evidence for development of rules, and representational effects

The subject with the longest experience was G, at George House. His trace files were grouped into four approximately equal calendar intervals: 02, 03, 04 and 05, in order from earlier to later. The calendar divisions were October 19th, October 30th, November 10th, November 22nd, and December 4th. The data were fed through the actrep filter, using action replacement charts generated for the subject G from all his games together. The 05 games (including the subject's best-scoring game) were used for generating rules. Rules were generated on three representations of ROV speed control, named RS0, RS1 and RS2. (See Figure 6.9.)



(RS0)                           (RS1)                           (RS2)

2                               1                               1
rov_degrees                     rov_off_head                    rov_off_head
rov_target_head
                                3                               5
5                               rov_target_range                rov_u
rov_u                           rov_height                      rov_v
rov_v                           rov_speed                       rov_target_range
rov_target_range                                                rov_height
rov_height                      3                               rov_speed
rov_speed                       sub_display
                                rov_av_revs_demand              3
3                               stage                           sub_display
sub_display                                                     rov_av_revs_demand
rov_port_revs_demand            6                               stage
rov_stbd_revs_demand             3 3 2 0   Both_Ths_Full_Astn
                                 3 3 2 1   Both_Ths_Half_Astn   6
6                                3 3 2 2   Both_Ths_Stop         3 3 2 0   Both_Ths_Full_Astn
 3 3 2 0   Both_Ths_Full_Astn    3 3 2 3   Both_Ths_Half_Ahd     3 3 2 1   Both_Ths_Half_Astn
 3 3 2 1   Both_Ths_Half_Astn    3 3 2 4   Both_Ths_Full_Ahd     3 3 2 2   Both_Ths_Stop
 3 3 2 2   Both_Ths_Stop         0 0 0 0   NO_KEY                3 3 2 3   Both_Ths_Half_Ahd
 3 3 2 3   Both_Ths_Half_Ahd                                     3 3 2 4   Both_Ths_Full_Ahd
 3 3 2 4   Both_Ths_Full_Ahd                                     0 0 0 0   NO_KEY
 0 0 0 0   NO_KEY

Figure 6.9: Three slightly varying representations for ROV speed control

These were intended to be progressively more human-like. The CN2 parameter `star' was set to 10, and `threshold' was set to 10, and ordered mode was used. The rules generated were then tested against the data from each of the divisions, 02, 03, 04 and 05. The results are summarised in Table 6.2. The numbers in the body of the table are the percentage points difference between the prediction performance of the rules and the prediction performance of the default rule. The high scores for the interval 05 arise because the 05 interval provided the training data. The default rule generally scored around 60% to 70%, and the 05 interval absolute scores are over 95%.

Table 6.2: Testing rules for interval 05, subject G, against defaults

There are two trends immediately apparent in this table. One is that RS1 and RS2 perform substantially better than RS0, with RS2 being slightly the better of the two. The other is that whatever rules were induced for the interval 05 were not much in evidence during interval 02, and progressively became more so. This is reassuring in two ways: firstly it suggests that the rules found are not imaginary, or due to random effects; and secondly that these rules are being adopted increasingly as time goes on. This is consistent with a common-sense view of learning.

An alternative way of dividing up the examples is into sets of similar size. This was done with subject M, but otherwise the same procedure was followed as with subject G. Table 6.3 summarises the results for M.

Table 6.3: Testing rules for interval 0499--0508, subject M, against defaults

The rules were again constructed on the data containing the highest score, which in this case was the 0499--0506 interval. The same general trend is apparent with respect to the representations as above.

The prediction performance of the rules across time again shows a build-up of prediction performance towards the training interval; but now also shows a subsequent decline. This could in principle be due either to a decline in task performance, with the acquired rules not being followed as strictly as before, or due to new rules supplanting the old ones. In this table, the overall accuracy figures have also been included, to show that there is in fact no rise in overall accuracy between the fourth and fifth intervals. The rise in the relative figure is due to a fall in the default rule accuracy, which, in this interval, implies that there were fewer null actions.

G and M were both in George House, and took an interest in each other's games. It is perhaps not surprising that similar patterns emerge in their results, and that a representation which yielded rules performing substantially above the default rule for one of them should also do so for the other. This is not so, however, for DM, one of the Charing Cross Tower subjects. His results, derived by exactly the same process as the above results, are summarised in Table 6.4.

Table 6.4: Testing rules for interval 065--1, subject DM, against defaults

This table suggests that the rules do not reflect the actual rules being used by this subject. The first two columns suggest, rather more strongly, that the rules are substantially different from those used at the earlier stages of learning. The pattern for all the representations is similar, and this suggests that none of RS0, RS1 or RS2 covers the attributes actually used by subject DM. However, there is, if anything, a slight favouring of RS1 over RS2, contrary to the other subjects. Other representations would have to be found if we were to obtain results as satisfactory for DM as for G and M.
