| ©1990, 1995 | section list | 7: Experiment 2 | overview | General Contents |
| Section 7.2 | 7.2 Analysis and results subsections | Section 7.2.6 | ||
The subject AJ interacted with the simulation for a nominal 30 hours 31 minutes, including 62 starts, between 18th June and 25th July. The first non-negative score was 9500 after 14h 38m. Progressive maxima were 11362 after 17h 23m, 12089 after 19h 53m, 14332 after 20h 35m, 14477 after 24h 0m, 16990 after 25h 17m, and 17441 after 30h 31m. Interspersed with these high scores were several where, due to infringements or damage penalties, the score was large and negative. The values of these low scores reveal very little other than the fact that a mine exploded, and so a complete table or graph is not given here. As a comparison, indicating the region where scores would stop improving, the author on a good day can score around 20000 on this task, but has never scored as much as 21000.
A potentially serious error was discovered after this subject had completed 8h 39m of practice. When the ship was travelling backwards, there were certain circumstances where it accelerated backwards without power, and well beyond maximum speed. This was rectified by attending to the simulation of the rudder, but this meant that the data from before this time was not easily analysable with the updated versions of the programs. So the analysis we have here does not include the first stages of learning. The remainder of the data was divided up into six intervals. These intervals were intended to be of approximately similar sizes, but preference was given, where possible, to placing the boundaries at the end of a day. The seven boundaries corresponded with practice times of 8h 39m, 12h 23m, 16h 37m, 19h 53m, 23h 10m, 26h 41m, and 30h 31m. The six intervals between them will here be referred to as intervals C to H respectively, as a reminder that the first part of the data is absent. The durations of the intervals were 3h 44m, 4h 14m, 3h 16m, 3h 17m, 3h 31m, and 3h 50m.
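The interval durations quoted above follow directly from the seven boundary times. As a check on the arithmetic, here is a minimal Python sketch; the function names are illustrative only and are not part of the original analysis programs.

```python
# Cumulative practice times at the seven interval boundaries, as quoted in the text.
BOUNDARIES = ["8h 39m", "12h 23m", "16h 37m", "19h 53m", "23h 10m", "26h 41m", "30h 31m"]
LABELS = "CDEFGH"  # interval names used in the text


def to_minutes(stamp):
    """Convert a string such as '8h 39m' to a whole number of minutes."""
    hours, minutes = stamp.split()
    return int(hours.rstrip("h")) * 60 + int(minutes.rstrip("m"))


def to_stamp(total):
    """Convert minutes back to the 'Hh Mm' form used in the text."""
    return f"{total // 60}h {total % 60}m"


def interval_durations(boundaries=BOUNDARIES, labels=LABELS):
    """Map each interval label to the time between its two boundaries."""
    marks = [to_minutes(b) for b in boundaries]
    return {label: to_stamp(end - start)
            for label, start, end in zip(labels, marks, marks[1:])}


if __name__ == "__main__":
    for label, duration in interval_durations().items():
        print(label, duration)
```

Each duration is the difference between two consecutive boundaries, so seven boundaries yield the six intervals C to H.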
In order to observe the value of using a context-style representation for the analysis, it was desirable to have at least two contrasting analyses. The first analysis follows the minimum context structure compatible with the interface, taking only the contexts defined by the three separate sub-displays of ROV, ship, and cable. This representation was derived by using the tracechord program with trace data from the early interval C, with a very large distance parameter governing chord absorption (20), ensuring that only one chord group for each sub-display would remain.
Table 7.1: General ROV context for AJ (of 3 basic)
The first table, 7.1, shows the results of inducing rules for ROV actions, in the general ROV context, with CN2 and testing them on fresh data. The number of examples in each set of data is given below the label for that data set: because the data were dealt out evenly between the two sets for each context, the numbers in set 0 and set 1 differ by at most 1. In the body of the table, the upper figure (e.g., 37.4% in the top left element) gives the overall performance of the rules (generated from the training set) at classifying examples from the test set. The rules generated were used as they were, without any attempt to `clean them up' (despite the fact that it was easy to see opportunities to clean up the rules), so as to reduce the possibility of unaccountable knowledge affecting the analysis. From the data that generate any set of rules, a default rule can also be generated, which is that the class for all examples is the class seen most frequently in the training data. The difference between the performance of the induced rules and that of the default rule is given as the lower number in each element of the body of the table, where a positive value indicates rules performing better than the default rule, and a negative value that the default rule performs better. As an example, for the top left element, 37.4% is an improvement of 17.4 percentage points over the default rule, so the performance of the default rule was 20.0%.
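The two figures in each table element can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the original programs are not reproduced here, all function names are hypothetical, and dealing the examples out alternately is just one way of obtaining two sets whose sizes differ by at most 1.

```python
from collections import Counter


def deal_out(examples):
    """Deal examples alternately into set 0 and set 1 (assumed dealing scheme)."""
    return examples[0::2], examples[1::2]


def default_class(train_labels):
    """The default rule: always predict the most frequent training class."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Percentage of test examples classified correctly."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


def table_element(rule_predictions, train_labels, test_labels):
    """Return (overall performance, improvement over default), both in percent.

    rule_predictions are the classes that the induced rules assign to the
    test examples; the induction itself (CN2 in the text) is not modelled.
    """
    overall = accuracy(rule_predictions, test_labels)
    default = accuracy([default_class(train_labels)] * len(test_labels), test_labels)
    return overall, overall - default
```

Note that the default class is taken from the training labels but scored on the test labels, which is why the lower figure in an element can be negative even when the rules classify a fair share of the test set correctly.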
The main trend to be observed in this table is that the improvement in performance of the rules (over the default rule) is generally near a maximum when the test set is from the same time interval as the training set, and falls off to either side. This suggests that the rules that are induced are ones that change over time, something that could be explained by the subject learning, and his score improving, during the experimental period. However, the overall performance of the rules is far from good. This implies, in terms of the discussion of action types above (§ 6.4.2), that many actions either fell into a category other than that of established rule-following actions, or were ones for which effective rules could not be induced given only the attributes included and the characteristics of the induction program. Some ways of attempting to get better performing rules will be addressed in the next section (§ 7.3).
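The layout of such a table, with rules induced from one interval and tested against every interval, can be sketched as follows. This is an illustration only: `induce_lookup` is a hypothetical stand-in for CN2 (it simply remembers the commonest action for each attribute tuple), the data structures are assumptions, and each test set is assumed to be non-empty.

```python
from collections import Counter


def majority(labels):
    """Most frequent label (ties go to the label met first)."""
    return Counter(labels).most_common(1)[0][0]


def cross_interval_table(sets, induce):
    """Build a training-interval by test-interval table.

    sets:   {interval: (train_examples, test_examples)}, where each example
            is a pair (attributes, action).
    induce: callable that takes training examples and returns a classifier.
    Returns {(train_interval, test_interval): (overall %, improvement over default %)}.
    """
    table = {}
    for train_name, (train, _) in sets.items():
        classify = induce(train)
        default = majority([action for _, action in train])
        for test_name, (_, test) in sets.items():
            n = len(test)  # assumed non-zero
            rule_hits = sum(classify(attrs) == action for attrs, action in test)
            default_hits = sum(action == default for _, action in test)
            overall = 100.0 * rule_hits / n
            table[(train_name, test_name)] = (overall, overall - 100.0 * default_hits / n)
    return table


def induce_lookup(train):
    """Hypothetical stand-in for CN2: commonest action per attribute tuple."""
    seen = {}
    for attrs, action in train:
        seen.setdefault(attrs, []).append(action)
    best = {attrs: majority(actions) for attrs, actions in seen.items()}
    fallback = majority([action for _, action in train])
    return lambda attrs: best.get(attrs, fallback)
```

If the regularities in the actions shift from one interval to the next, the improvement figures are largest where the training and test intervals coincide and fall away from there, which is the pattern described above.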
Another noticeable feature of this set of data is that the figures for test set E1 appear to be slightly depressed from what would be expected on the basis of the above trend. In fact, between sets D and E there was a two-week break, during which the subject suffered accidental injury. The combination of injury and falling out of practice would seem a very plausible explanation in this particular context, where the actions are faster-moving and more time-critical than in the other contexts.
Table 7.2: General ship context for AJ (of 3 basic)
The next table, 7.2, shows similar overall trends. The overall performance percentages are higher than for the ROV context, but this is accounted for by the higher performance of the default rule in each case (this is because there is a greater proportion of null actions in the ship context). The increases of performance over default fall within the same range as for the ROV context. There is somewhat less of a trend of higher performance for training and test data close in time, but instead, there appears to be an increase in the performance of all the rules as the test set is later in time. This could be explained as a general increase in the proportion of rule-governed actions.
Table 7.3: General cable context for AJ (of 3 basic)
The final table of the three in the first group, 7.3, has similar overall performance figures to the ship context, but this time the improvements over the default rule are far more marked. It thus seems likely that the actions in this context are more rule-governed in nature. This is to be expected, given the relative simplicity of the decisions that have to be made in the cable context.
The second analysis given here used contexts derived directly from the data at a finer granularity than the previous ones. The data used in the context derivation was the last set of data from subject AJ, i.e., that called H here. As described above, a chord distance of 2 units led to 10 contexts. Some of the contexts, however, had very small numbers of examples in them, and perhaps could be regarded as fictions created by the analysis process.
Table 7.4: ROV approach context for AJ (of 10)
The three ROV contexts that we shall consider here were termed ROV approach, ROV visual and ROV miscellaneous. The ROV approach context generally applied from after the ROV had been put out, up to where it was close enough to the target for the camera to reveal the nature of the target. It was based around three sensors: the relative heading of the target from the ROV; the range of the target from the ROV; and the height of the ROV above the sea-bed. In Table 7.4 we see much the same trends as for the general ROV context above (Table 7.1). Here the trend indicating shifting rules is somewhat more pronounced than in the general ROV context. In Table 7.5, in contrast, the trend towards shifting rules is distinctly less pronounced. This context, here described as `ROV visual', included the ROV graphic sensor on, so that the subject was relying less on the digital sensors. This context applied immediately after the ROV approach context, and covered the stage where the ROV was manoeuvring very close to a mine. However, looking at the number of examples over time suggests that this context is declining in use, with more actions being taken under the ROV approach context as time goes on. If indeed this context is `on the way out', it is not surprising that there is not much change or development of the rules over the period covered. Conversely, if the ROV approach context is taking on a larger share of the action, it may be that there is further sub-structure within it.
Table 7.5: ROV visual context for AJ (of 10)
The other ROV context is given in Table 7.6. This context is based around no sensors, and tended to occur both immediately as the ROV was put out, and immediately before being pulled in again. It could contain, for example, routine preparatory actions that were not dependent on the situation at all.
Table 7.6: Miscellaneous ROV context for AJ (of 10)
There were also three ship contexts of interest (two others had very few examples). In the ship search context, there were generally no sensors turned on, and from observing replays, it was apparent that brief glances at information were taken, often being turned off before any action was taken. This context was the one relevant to going between targets. Looking at Table 7.7, we see that virtually all the rules induced performed worse at classifying test examples than the default rule. This means that the rules induced could not be accurately showing consistent regularities in the data. The obvious explanation is that there are no good rules, in terms of the attributes associated with this context.
Table 7.7: Ship search context for AJ (of 10)
This is consistent with the view that the searching pattern is the aspect of the task that is most at the knowledge-based level, involving reasoning and planning, rather than simple condition-matching. Subjects were able to discuss at length reasons for or against taking a particular path, both in general, and in particular, when they could see several targets at once and had to decide where to stop the ship most advantageously. A complementary explanation would be that suitable attributes were not provided, in terms of which decisions could be taken. Providing those attributes would involve considerable machine processing, in lieu of the considerable knowledge-based processing that is presumably performed by people.
Table 7.8: Ship positioning context for AJ (of 10)
The ship search context contrasts greatly with the ship positioning context, for which results are given in Table 7.8. This context covered the stage from where a position to stop had been selected, to the time when the ship was stopped and attention moved to the ROV. The sensors centrally involved were the propeller revs and the ship surge speed, with the control demands, headings and target range also amongst the overtones. In this context, the rules perform very well by comparison both with the ship search context and the general ship context, though there are no very clear trends within this good performance.
What is very clear, however, is that this division of context between ship search and ship positioning divides two collections of data that have very different characteristics, and that thus this division is highly relevant to the analysis of this data. This could not have been due to the choice of attributes, since it happened that the same set of attributes had been selected for both contexts.
Table 7.9: Ship with General Position Indicator context for AJ (of 10)
The other ship context included here is the one including the general position indicator (GPI). The GPI was priced heavily to deter its use, and it was never envisaged that its information content would be easy to formalise. In Table 7.9, we see that even for period C, the rules induced do not perform very well. The context is then progressively abandoned as time goes on, and the decline of this context roughly matches the growth of the ship search context.
Table 7.10: General cable context for AJ (of 10)
The general cable context derived from period H of AJ's runs (Table 7.10) differed from the previous one (Table 7.3) only in that there were more attributes included in the earlier version's analysis. These extra attributes cannot have been centrally important however, because comparing the tables shows a better performance for the later version with fewer attributes, for the majority of the table elements, including all of the leading diagonal. One plausible hypothesis here is that some of the extra attributes allowed increased precision in the rules, whereas others allowed spurious precision leading to unfounded rules.