©1990, 1995
If human response to complex tasks was to be studied, experiments were needed to obtain relevant data. To help in the evaluation of experiments, let us distinguish a few constituent parts of a suitable experiment.
The following criteria are not specific to the author's particular position, so they are presented as general methodological points; the same criteria may well be relevant to other research in the same area. Specific options are considered in detail below (§5.2).
The idea that the study of games can be relevant to understanding complex dynamic systems is supported by Rivers. He suggests that study of games and simulations could address questions such as: how do people generally make decisions and cope with complexity; how does the surface representation of the underlying dynamics of a situation affect people's understanding of it; how do people generally learn to behave in relation to complex dynamic systems; what is the variability between individuals on these dimensions; to what extent is it possible to predict performance in the real situation from performance in a game? This further motivates the consideration of games as well as higher-fidelity simulators and live applications, as relevant to the general aims of the present study.
The nature of complexity is not unambiguously defined, but has been discussed above, §1.3.2. There are factors weighing both for and against greater complexity in an experimental task.
The argument in favour of greater complexity is that the relevant real-world systems and tasks are highly complex. The more closely an experimental arrangement resembles these tasks, the more relevant an experiment is likely to be to them. In particular, the more complex a task is (using a common-sense meaning of complex) the more likely it is to exhibit complexity as operationally defined above (§1.3.2), namely, that a variety of strategies is likely to be employed either across time, or across different subjects.
When we come to consider tasks and systems of equal or comparable complexity to these real-life systems, problems emerge. Subjects will either be fully trained or not fully trained beforehand on the task. If they are fully trained, they are unlikely to be readily available as experimental subjects, as their training tends to be expensive, leading to a high cost of their time. If fully trained personnel were to be used, the experimental system would have to closely match their normal working environment. This entails either using real-life equipment, or (usually expensive) high-fidelity simulators. If, on the other hand, subjects were not previously trained on the target system, they are likely to take a long time to develop a stable skill in performing the task. This, in turn, means either that the experiments would have to be extended over a long period (with consequent expense), or that the subjects would still be learning about the task and system as the experiments were performed. If subjects are still learning, instead of there being a stable set of rules underlying their behaviour, the rules would still be changing. Modelling a stable set of rules is the more fundamental aim: so it would seem unwise to try to model rules in a learning situation without, either at the same time or beforehand, being able to model stable rules.
If a realistically complex target system is desired, but no actual system can be used as it is, there is an unknown amount of work needed to realise an effective experimental system. In the case of building a computer simulation from scratch, the time necessary is likely to be prohibitive.
The conclusion of the arguments on complexity is that we would like the most complex target system and task that come within all the practical limitations. In practice, any system that conforms to these limits is likely to be no more than fairly complex.
A related aspect of the choice of task is the choice of level of control. This considers the task with respect to the operator interface, rather than the underlying target system.
In the design of any complex task interface, there is a choice of level for the sensors and effectors. At the lowest level the primitive components of the interface correspond to individual elements of the target system—the raw sensors and effectors that are implemented in hardware. At a higher level there would be some composite sensors or effectors that in some way combine more than one lower-level sensor or effector. Let us illustrate this with a few examples.
A raw sensor might give the revolution speed of a motor, or the temperature or pressure of a certain part of the target system. There are many possibilities for higher-level sensors. A sensor which gave the estimated time to a particular condition being satisfied would have to integrate information on current values and current rates of change. A sensor for the working state of a ship's rudder needs information about the rudder angle and the angle of the water flow past the rudder. Further examples can be imagined. Any operational concept that depends only on measurable quantities could in principle have a high-level sensor built to display it.
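A minimal sketch of one such higher-level sensor follows, written in C because that was the implementation language eventually chosen for this study. It assumes that the rate of change can be estimated from two successive raw readings and that this rate will stay constant (simple linear extrapolation). The function name `time_to_threshold` and its parameters are hypothetical, not part of any system described in this thesis.

```c
#include <assert.h>

/* Hypothetical composite sensor: estimated time (seconds) until a sensed
 * quantity reaches `threshold`, built from two raw readings taken `dt`
 * seconds apart. The rate of change is estimated from those two readings
 * and assumed to remain constant (linear extrapolation).
 * Returns -1.0 if, on that assumption, the threshold is never reached. */
double time_to_threshold(double previous, double current, double dt,
                         double threshold)
{
    assert(dt > 0.0);                        /* readings must be time-ordered */
    double rate = (current - previous) / dt; /* estimated rate of change */
    double gap = threshold - current;        /* distance still to go */

    if (gap == 0.0)
        return 0.0;                          /* condition already satisfied */
    if (rate == 0.0 || (gap > 0.0) != (rate > 0.0))
        return -1.0;                         /* static, or moving away */
    return gap / rate;                       /* same signs, so positive */
}
```

For example, readings of 18 and then 20 taken one second apart, against a threshold of 80, give an estimate of 30 seconds. A practical display would probably smooth the rate estimate over more than two samples, since a single noisy reading can change the answer markedly.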
In complex systems, the lowest level effectors sometimes have servo systems on them which cannot be bypassed, and for this reason among others the effectors do not necessarily directly alter the quantities sensed by the lowest level sensors. In ships, typically, the direct controlling actions are to set demands for the propeller speed or rudder angle, since it is not possible for these to respond immediately. Servo mechanisms then bring the actual value towards the demanded value over a period of time, perhaps several seconds. In more everyday examples, low-level effectors often take effect simultaneously with the physical control action—gear changing in a car, for example. Higher-level effectors are set up whenever programming is done. In mechanical systems, a higher-level effector might have the same effect as a number of lower-level ones. As with sensors, construction of higher-level effectors is not constrained in principle. In terms of a game or well-defined task, the highest level effector possible would be a single button that started automatic execution of the whole task.
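The distinction between a demanded value and an actual value can be sketched, again in C, as a rate-limited servo: the operator sets only the demand, and the actual value catches up over several simulation steps. A pure rate limit is an assumption chosen for simplicity; real steering gear and engine governors have their own dynamics, and the name `servo_step` and its parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical rate-limited servo. Each simulation step moves the actual
 * value (e.g. rudder angle in degrees) towards the demanded value by at
 * most max_rate * dt. The operator's control action changes only
 * `demand`; the sensed `actual` value follows over time. */
double servo_step(double actual, double demand, double max_rate, double dt)
{
    assert(max_rate > 0.0 && dt > 0.0);
    double max_change = max_rate * dt;   /* largest move allowed this step */
    double error = demand - actual;

    if (error > max_change)
        return actual + max_change;      /* still short of demand: rise */
    if (error < -max_change)
        return actual - max_change;      /* still beyond demand: fall */
    return demand;                       /* demand reachable this step */
}
```

With a limit of 2.5 degrees per second and one-second steps, a demand of 35 degrees from amidships is reached on the fourteenth step; throughout that interval the sensed rudder angle lags behind the operator's control action.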
The level of control has important consequences for what can be learnt by observing control actions. Observing control actions at the highest possible level would not reveal anything about the mental structures involved in task performance, because there would be no structure in the control actions. At low levels of control, the salient features of the control actions are likely to concern the lower levels. The extent to which higher-level structure is present and established in human control would depend on the extent to which the human had mastered the lower levels, and gone on to develop higher-level control strategies. This ordering, with lower levels mastered before higher ones, is reflected in many human activities, where one has to learn ‘the basics’ before being able to learn the more advanced points, and this depends largely on experience gathered over time. If one wishes to study higher-level strategies, the situation to avoid is one where a low-level interface is being used by a person who has not had time to master the lower levels completely. For complex systems, mastering the lower levels could take a long time.
The different levels of control are also reflected in Rasmussen's categorisation of skill-, rule-, and knowledge-based behaviour. The lowest level of sensors is most likely to correspond to the skill-based level, where Rasmussen characterises the information as signals. When humans act at the skill-based level, their actions can often be clearly seen as effectors at a similar level—consider steering a car or bicycle, or being a helmsman on a ship without the autopilot. At an intermediate level of control, corresponding with Rasmussen's rule-based level, the actions taken are more abstract, but still without knowledge-based processing. For information to be appropriate to this level of control, it must be presented in terms of the antecedents of the rules being used. Rasmussen calls this information signs. Higher levels of control are more likely to correspond with Rasmussen's category of knowledge-based behaviour. However, at the highest possible level of control, where the task is completely automated, human cognitive processes are no longer necessarily involved at the time the control is being carried out.
The knowledge-based level is where both conscious mental processing and explicit learning are most likely. If explicit learning is going on, this suggests that some salient aspect of the cognitive structure is changing, and this is more difficult to study than an unchanging cognitive structure.
Overall, considerations of the level of interface suggest a fairly low level of control as appropriate to an experimental arrangement, but not so low as to make the task too complex and difficult to learn thoroughly.
In contrast with these arguments for a low level of control, the experience of the Simple Unstable Vehicle experiment (above, Chapter 4) warns us against control that is too much motor-skill based. There it was noted that investigation of motor-skill tasks is likely to require investigating relatively low-level perceptual and psycho-motor skills.
In practice, complex tasks such as the ones we are holding as exemplars tend not to involve any motor skill. A ship's master would rarely take the helm: most actions are initiated by spoken commands. In most supervisory control tasks, there are no analogue controls present on which motor skill would be appropriate (beyond the everyday skills of pressing buttons, etc.). Therefore excluding motor skill from an experimental arrangement would benefit the relevance of the experiment.
There are various ways in which motor skills and psycho-motor limits could appear. One is hand-eye coordination: for example, a task in which the mouse must be used to guide the cursor along an intricate route, or in which the cursor coordinates on the screen are used as an analogue input to a simulation. The limitations here would be more obvious in cases where a human had impaired limb movement. Another aspect of motor skill is the precise timing of actions: either performing a planned action at an exact moment, or reacting as quickly as possible to an unexpected stimulus. Everyone knows about their own limit of reaction time.
Excluding motor skill from an interface means ruling out a whole level of interaction. This is in opposition to the idea of “direct manipulation” (e.g., ), where the advantages of physical, reversible, incremental interaction are stressed. But removing much of the vast range inherent in analogue interaction makes the job of precisely recording the interaction much simpler, and may discard a great deal of variation that has no significance for the present study. A further advantage is that a task with a limited range of interaction could provide a fairer comparison between unmediated human ability and the performance of pre-programmed rules.
Whatever the target system, and interface to it, there is still the question of how the task is specified. Without a specified task, users of a system might explore it, or experiment with it, in whatever way comes to mind at the time. They may set their own goals explicitly, or may rely on unspoken implicit goals to guide their behaviour. They may not appear to have any goals at all.
Being goalless is not what is wanted for this experiment, for two reasons. Firstly, real-life complex systems rarely permit much exploration or experimentation. Typically, some aspects of an operator's task are clearly defined by his or her employers, and this may well be sufficient to prevent exploration, particularly when there is risk or danger involved. Secondly, in order to study the human approaches to a complex task, we need to have as much data as possible relating to the same task. Thus, we do not want to allow users to make up their own tasks as they go along, with the twin risks that the task changes frequently and that it is difficult to know at any time what the effective current task is.
In real-life tasks, any operator may be motivated by a number of factors, some of which may be common to all operators, and some of which may be personal, or weighted differently by different individuals. In this sense, the tasks performed by different people in the same job are not necessarily identical. This is even more likely to be true in complex tasks, where there are a variety of possible strategies, than in straightforward tasks, where there is a highly constrained set of methods and acceptable outcomes. Ideally we would want to eliminate this variation in motivating factors, at least at this stage of experimentation.
Explicit predefined goals would avoid these problems, and provide a stable and well-defined task for operators to adapt to. This may be more motivating than trying to achieve one's own ad hoc goal, if only because it is difficult to give oneself finely-graded feedback on a self-defined task, and without fine feedback, the improvement with practice will be less noticeable, and therefore probably less motivating. An experimental subject is even less likely to set goals of the type usually encountered in complex systems: that is, multiple conflicting ones.
Another important factor for the potential subject is the inherent interest and challenge of the task. While a well-defined task is an important element in this, another important element is the nature of the task itself. It would seem likely that an operator could relate more easily to a task that has some realism in it, and where “things happen”. This realism need not be the strict engineering realism of high-fidelity simulators, especially not so if the subject has no detailed knowledge of the target system. But it should give the sense to the subject that he or she is engaged in a real task. One way of spoiling this sense of realism is to have a component of the simulation behaving counter-intuitively. This is less likely to matter if it is only a weakly-held intuition about something of which the subject has little experience, but even in unfamiliar situations there will be some strong expectations based on general knowledge of the world, and these should be respected.
We turn now from the requirements of the subject to those of the experimenter. What does the experimenter do, if the experimental arrangement turns out to be producing data more relevant to another study than to this one, as was the case with the SUV study (Chapter 4)? In principle, the target system, the definition of the task, or the interface could be modified to change the nature of the data produced. An experimental system would be better, on this criterion, if it could be modified. A simulated target system may need to be altered if the behaviour of some part proves counter-intuitive. The task might need alteration if it produces behaviour which is either too knowledge-based or too motor-skill-based. The interface might need modification if it is too much of an obstacle in the way of performing the task.
Modifying the target system itself would be difficult for a system not written by the experimenter. The task could be changed in any case, but if the interface could not be changed, the task definition would have to be on paper, which may not be so satisfactory (as argued above). Altering the interface has similar constraints to altering the target system, except that no knowledge of simulation mathematics would be required. The main point here is that modifiability is not an easy criterion to satisfy, and therefore needs close consideration.
The need to log data is a briefly statable but centrally important criterion for a good experimental system. Without the ability to log data and analyse it, the experimental method would be severely constrained, and would have to rely on verbal reporting (for a discussion of verbal reports, see Bainbridge).
In more detail, data need to be logged in a form that permits the complete regeneration of experimental trials: both the situations which occurred during the experimental runs (in terms of information presented), and the actions taken by the operator. This log must be machine-readable. The practical considerations of storing the data must also be taken into account.
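One way of meeting this requirement is to write every time-stamped change in the information presented, and every operator action, as one line of plain text. The C sketch below assumes that each displayed quantity and each control can be identified by a channel number; the record layout and the names `LogRecord`, `log_format` and `log_parse` are hypothetical. Printing with `%.17g` preserves each `double` exactly on a round trip through text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical log record: either the value of a displayed quantity
 * ('S' = situation) or an operator control action ('A'). Logging both,
 * with time stamps, is what allows a trial to be regenerated later. */
typedef struct {
    double time;    /* simulation time, seconds */
    char kind;      /* 'S' = situation, 'A' = action */
    int channel;    /* which sensor or effector (hypothetical numbering) */
    double value;   /* value displayed, or value demanded */
} LogRecord;

/* Write one record as a single line of text into buf. Returns what
 * snprintf returns: the length the full line needs, excluding '\0'. */
int log_format(char *buf, size_t size, const LogRecord *r)
{
    assert(r->kind == 'S' || r->kind == 'A');
    /* %.17g keeps every double exactly on a round trip through text */
    return snprintf(buf, size, "%.17g %c %d %.17g\n",
                    r->time, r->kind, r->channel, r->value);
}

/* Read back a line written by log_format. Returns 1 on success, else 0. */
int log_parse(const char *line, LogRecord *r)
{
    if (sscanf(line, "%lf %c %d %lf",
               &r->time, &r->kind, &r->channel, &r->value) != 4)
        return 0;
    return r->kind == 'S' || r->kind == 'A';
}
```

Replaying a trial then amounts to reading the file line by line and re-applying each record in time order. Plain text is bulkier than a binary format, which bears on the storage consideration just mentioned, but it can be inspected and checked by hand.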
A final practical criterion of choice is the obvious one, that whatever system is chosen must be realisable in some way or other. For systems tied to bulky hardware (such as training simulators), this means in practice that access is needed. For ready-built simulations, the code must be available in a form which can be run on an available machine. For unimplemented simulations, the mathematics must be available, and it must not be too difficult to code. If the simulation and the interface are separate, the same considerations apply to both.
Subjects must also be obtainable, which means taking into account any need for skill or experience, and the time the subjects are needed for.
Some choice of experimental system needed to be made to enable the further study of this thesis. The consideration of possibilities ranged widely. For completeness, we will here briefly discuss the options examined at the time of decision, along with others, already rejected, which have been discussed more fully above. The following options for the object system are discussed here, with reference to the criteria given in the previous section:
Table 5.1 gives one-word summaries of the suitability of the options on each criterion. A question mark indicates an uncertain evaluation, and “poss” indicates that the relevant criterion is at least to some extent under experimental control.
A choice of implementation platform also needed to be made. This was largely dependent on the choice of object system, and will therefore be mentioned at the end of this section.
High-fidelity simulators have already been discussed above (§3.1), along with the reasons why it is impractical to gather data from such simulators.
The STEAMER system is described in a paper by Hollan, Hutchins & Weitzman. Since there appeared to be no working versions in Britain at the time of enquiry, this review is limited to the contents of that paper. STEAMER is included here because it appeared at first sight to be a candidate worth considering as an experimental vehicle.
The STEAMER project was aimed at evaluating techniques for the construction of computer-based training systems. Its developers chose a naval steam propulsion system as the domain, and constructed an “interactive inspectable simulation” with the aid of high-level graphics and AI tools.
About 100 diagrams both illustrate the operation of the system and its many subsystems, and provide a means of controlling the simulation. The intention was to make the simulation at least conceptually realistic, if not high-fidelity in an engineering sense. The paper does not describe in detail how to use STEAMER as a training tool: however it is fairly easy to imagine a control task being defined using STEAMER, and training being given for this.
The problems with STEAMER in the context of this thesis stem from the different motivation behind its design. It is intended to provide the kind of interface that allows trainees to develop their own mental models of what is going on. Hence there is much emphasis on the graphics, with ‘direct manipulation’ where possible. It would appear that exploration plays a big role in the kind of training that these authors envisage. This may very well be an important part of training, but it works against a clear definition of the task. In some ways, STEAMER could be seen as aimed at the stage of training before (and perhaps also while) definite tasks are introduced, where trainees hone their skill against definite performance criteria. One can imagine trainees with STEAMER saying to themselves “Let's see what happens if I do that”, and “Ah! So that's how it works!” In this context, it makes sense to provide the maximum amount of information, and graphical interfaces can undoubtedly help greatly in this. However, the same design philosophy does not lend itself to providing detailed feedback against external objectives. It is difficult to judge how much task definition and feedback could be introduced without having seen STEAMER at work.
STEAMER is put together using a sophisticated graphical editor designed specifically for the job. This means that, in the ways the editor permits, it would be easy to adapt the system, simplify it, or even build a new system from scratch. But the very specificity of the graphics editor, and the complexity of the system in which it is embedded, mean that it would be very difficult for someone without extensive experience of that system to adapt it in any way that was not specifically envisaged by the system's designers.
The flight simulator game on the Silicon Graphics Iris 3130 takes us down in complexity and realism, but it still retains enough realism to be interesting, and it is naturally used as a recreation. However, in the absence of another similar networked machine, there is no clearly defined task other than landing the aircraft in a manner as close as possible to a preset ideal landing, which earns the maximum score.
The level of control is almost entirely near the lowest level that one would have in an aircraft. This means that manual skill plays a large part in successful landing, as is the case in (unaided) real life. Hand-eye coordination and speed of reaction are probably both limiting factors here.
The chief positive points of this system are, firstly, that it is obtainable, and secondly, that there is a logging mechanism from which one can replay previous flights. However, these positive points are not strong enough to overcome the big problems that this system would encounter if used as an experimental vehicle. There are two problems with the logging mechanism. Firstly, the log files take up a very large amount of space, and it would be impractical to store more than a few on hard disk. Secondly, the log only stores the situations that occurred, not the control actions. Even if the control actions were stored, it would be difficult to characterise them automatically at a higher level, starting from such a low analogue level.
The best aspects of existing computer games are their good task definition and feedback. Evidently, these are important features for any game to have intrinsic appeal.
Many computer games, including all those of the ‘space invader’ type, are heavily based on manual skill. These can immediately be written off as experimental vehicles. Another category is the adventure game with verbal interaction. These are not dynamic control tasks (see above, §1.3.1), but are of a different class which is of less interest to us here. At the time of choice, there were no obvious candidates which escaped these two complementary problems.
Another problem with commercially marketed games is that the source code is not available, and this means that the game is not adaptable, nor is any automatic logging of actions possible.
Nuclear power plants feature strongly in the literature related to HCI and complex systems (see above, §126.96.36.199). For this reason among others, the idea of a power plant simulation was interesting. A simulation system had been set up in Denmark several years ago, called the Generic Nuclear Plant (GNP), which is a simplified version of a pressurised water reactor, designed with experiments into operator control in mind.
Enquiries for the source code revealed that it was written in Pascal, but not generally available. It was not clear whether there was any particular interactive interface already built for the GNP simulation. If not, then the task of building an interface would be not much less in scale than the task of building a whole system complete with interface. Even if an interface was built, it would be very surprising if it did not need substantial modification to provide all the facilities required for the present work.
In other respects, the task was clearly a promising one. The dynamic nature and long time constants of the system ensure that there are no psycho-motor limits to contend with, and that the skill is more a cognitive than a motor one. But with no available complete system whose interface was adapted to the present experimental requirements, the idea of a nuclear power plant simulation had no particular advantage over a task constructed without the help of a previously implemented target system simulation.
Having rejected the idea of nautical collision avoidance, there remained other possibilities in the same field. Mathematical simulations of some of the relevant objects were available, which would make possible reasonable realism. A task could be chosen to lie in the acceptable range of complexity, with a reasonable level of control. The interface could be designed to eliminate psycho-motor limits. Task definition, feedback and logging could be built in, and having constructed the system, adapting it would be no more difficult than necessary.
The great disadvantage was the need to implement all the software without external help, which would inevitably detract from the time available for experimentation.
An important extrinsic factor in the final decision was the fact that the area of study originally envisaged had been nautical. Resources, expertise and interest were available in this field in a way that they would not have been in another. Despite the burden of needing to implement the software, a task built from nothing appeared to be the only way of securing a suitable experimental vehicle. On these grounds, a nautical task simulation was chosen. A platform to implement the experiment then needed to be chosen.
Since the chosen task had an important spatial content, good graphics were desirable, preferably in colour; and good graphics primitives on a system make graphics programming less difficult. The systems that were available were the ones already used in the Simple Unstable Vehicle experiment (Chapter 4): these were various Sun workstations, and the Silicon Graphics Iris 3130 computer. Of these the Iris was clearly preferable, by virtue of its superior graphics primitives and fast dedicated graphics hardware. An added bonus was that two almost identical machines of this type operated at the Scottish HCI Centre and at YARD Ltd., and time was available on both systems for development and experimentation. No other system available at the time had the same advantages.
The choice of implementation language was restricted. Although an object-oriented approach may have been tidier, or in other ways preferable, such languages were not at the time installed on either computer. The availability of compilers at the time of writing the programs, the ability to write low-level code where necessary, and the ease of the interface with the graphics commands, all dictated that C would be the language chosen. UNIX was the operating system for the chosen workstations.