
Patterns of Ruliness in HCI

To model users, or learn about them, we need good representations of situations and actions from a human, cognitive point of view, which involves the difficult but important question of what information people are using. Here, a method of using machine learning to evaluate and compare different representations is described, with examples of the results from applying this method to human control data from a simulation game. This method was used to show the significance of contextuality in the analysis of the human data, which opens up great possibilities for further work based around the idea of contexts. If the problems concerning uncaptured data and analogue variables can be overcome, one can look forward to potential applications in interface design and redesign, and in the study of human learning of cognitive skills.


A fundamental problem in complex HCI

HCI usually stands for Human-Computer Interaction. This usage does not tell us much about what the computer is being used for, nor how it is being used. Using a computer as a calculator raises very different issues to using one as the interface to controlling a complex process or system. But the phrase, ``human-machine interaction'', sometimes used when discussing interaction with large technological systems, can evoke a picture of interaction at the physical level (like humans pulling levers), and so miss out the emphasis on the computer as a tool which may facilitate interaction by channelling, or managing, the information flow. So, to orient the reader to the issues to be discussed here, I suggest that HCI is here read with the phrase ``Human-Complex-system Interaction'' in mind.

When a human interacts with a complex system, where there is a large amount of information potentially available, there is a basic HCI issue beyond ergonomics, screen design, or choice of the medium of interaction. That is: what information should be selected, given priority, or made easy to access? Quite probably, there will not be a static answer to this, but it will depend on the situation, or context. An important consideration here is to do with the human involved. What is their strategy for dealing with the system? What information is needed to implement this strategy? A designer of a complex system can, of course, ignore this question, and the frequent result is that information is presented in the same fashion as it is collected, from the many sensors or sources of information, to a multitude of displays arranged statically around the control area [Woods, 1991]. This ducking of the cognitive issues can be held at least partly responsible for errors and disasters.


One concept which can help clarify this area is representation, involving the choice of terms - the language - with which to describe particular systems and tasks. A representation can be low-level, i.e., using terms relating to the fundamental physical features of the system (e.g., temperatures, pressures, speeds, basic control settings), but describing a system in this way would probably be very lengthy; it would not match many of the rules to be found in an average rule-book or operation manual; and it would not necessarily correspond to the terms people use when describing a task involving a complex system. Searching for higher-level, more cognitive representations should be a central theme in HCI, because they could elucidate the way humans operate, and their information needs. Such representations should include both the terms that humans use in talking about a job, and also the terms in which we can make sense of the human's actions at a level below that of their conscious awareness. This is not necessarily the same as a system-level representation.

There are more reasons to be concerned with representation. Many of the chapters in this volume involve learning in some way, and it is well recognised in the machine learning community that the representation of the material to be learned crucially affects the effectiveness of learning algorithms, whether those be symbolic, connectionist, or whatever. One could therefore say that finding a good representation is a large part of learning.


One potentially important feature of a human's representation of a complex task is that task's division into different stages, or contexts. This would be particularly significant in a dynamic or ongoing task, rather than tasks involving independent and isolated decisions. For independent decisions, a human starts from the beginning each time, and so any ascertaining of context can be seen as part of the decision process. We can imagine this kind of decision-making going on in diagnostic tasks, or where experts are asked for opinions or judgements on specific questions. In contrast, when a human controls a system continuously over time, his or her strategy can take advantage of dividing the task into a number of distinct stages, or contexts, where the context does not have to be re-evaluated before each decision, but instead there is a continuing awareness of the current context, and a watchfulness for cues signalling a change from one context to another.

Clearly, if a human does operate in a contextual way, the contextual divisions and articulation are important aspects of their representation of the task or the system with which they are working. In effect, a context-based approach divides a task into a number of separate micro-representations. In each context, only a limited number of rules need to be considered, and only a limited number of items of information are relevant to these rules. If, for any task, a human does use this contextual approach, then the patterns we recognise in records of human actions are going to be clearer if we take into account the contextuality than if we assume that the whole strategy is monolithic.


The ideas of representation and contextuality are important in this discussion, but of particular interest is the related concept of ruliness. Loosely, how ruly (or unruly) a set of data is means how much sense we can make of it: whether the data are rationally explicable, or predictable, in whatever way. More formally, we could perhaps define ruliness in terms of the predictability of unseen items of data (from a substantial set), given some learning algorithm. However, it would probably be difficult to achieve a satisfactory formal definition.

The dependence of ruliness on representation can be illustrated from common experience. Many people must have played party games, such as the one where a pair of scissors is passed from person to person around a ring, accompanied by a declaration, at each passing, of whether the scissors are `crossed' or `uncrossed'. Most people, confronted by the task of classifying each passing of the scissors as crossed or not, seem to attempt at first to induce a rule, from examples of the reported state of the scissors, based on a representation that includes visible characteristics of the scissors. The amusing part of the game is in seeing how much experience is needed, by those initially naive to the game, to find a representation adequate to induce the (very simple) rule which determines crossedness, perhaps despite misleading actions and clues from those in the know. To the naive player, the examples are unruly, because all tentative rules, induced from a few examples using representations including only obvious features of the scissors, are contradicted by subsequent examples. To those in the know, the examples are perfectly ruly. Several other party games are based around similar learning tasks.

The perceived ruliness of data is also dependent on the method of learning. Perhaps it is because learning is innate to humans that it seems difficult to be aware of the learning processes that we use. In machine learning, in contrast, the algorithms or methods are well-defined and repeatable, although the advantages and disadvantages of different approaches are not yet fully clear, since the subject could be regarded as still at an early stage of development. But there are many examples of academic papers investigating differences between the performance of different algorithms (e.g., [Gams and Lavrac, 1987]), so at least we can safely say that the apparent ruliness discovered in data depends on the learning technique or algorithm.

Having recognised that ruliness is dependent on representation and learning method, there are at least two ways to use this result. Firstly, if we fix the representation, we can compare learning methods by comparing the apparent ruliness of the same data with different algorithms; but also, secondly, we can compare representations by fixing the learning method, and comparing the ruliness of the same data, represented differently. It is this second comparison that we will be discussing further in this chapter.
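This second comparison can be sketched in a few lines of code, using the scissors game described earlier as a stand-in task. The attribute names and the trivial lookup-table learner here are invented for illustration (they are not the induction setup used in the experiments reported below); the point is only that the same data, analysed with the same fixed learner, appear more or less ruly depending on the representation. In this toy version, the true rule is assumed to depend on whether the passer's legs are crossed, while the naive representation sees only features of the scissors.

```python
import random

random.seed(0)

def make_example():
    # Visible but irrelevant features, plus the real (hidden) determinant.
    scissors_open = random.choice([True, False])
    passed_left   = random.choice([True, False])
    legs_crossed  = random.choice([True, False])
    label = "crossed" if legs_crossed else "uncrossed"
    return {"scissors_open": scissors_open,
            "passed_left": passed_left,
            "legs_crossed": legs_crossed}, label

data = [make_example() for _ in range(400)]
train, test = data[:200], data[200:]

def learn_table(train, attrs):
    """Fixed, trivial learner: majority label for each attribute tuple."""
    counts = {}
    for situation, label in train:
        key = tuple(situation[a] for a in attrs)
        counts.setdefault(key, {}).setdefault(label, 0)
        counts[key][label] += 1
    return {k: max(v, key=v.get) for k, v in counts.items()}

def ruliness(train, test, attrs):
    """Held-out predictive accuracy under one representation."""
    table = learn_table(train, attrs)
    hits = sum(table.get(tuple(s[a] for a in attrs)) == lab
               for s, lab in test)
    return hits / len(test)

naive    = ruliness(train, test, ["scissors_open", "passed_left"])
informed = ruliness(train, test, ["legs_crossed"])
print(f"naive representation:    {naive:.2f}")    # near chance, ~0.5
print(f"informed representation: {informed:.2f}")  # 1.00
```

To the naive representation the examples are unruly; to the informed one they are perfectly ruly, just as for the players in the know.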

Experiments on ruliness and representation


The organisation of an experimental test of this method of evaluating representations by measuring ruliness is not straightforward. One has to have a variety of plausible ways of representing the same experimental data, which means, among other things, that one cannot choose simple tasks, where there is often only one plausible representation. And there are many problems with the kind of data that one might imagine as being available. Firstly, many real-life systems have vital inputs of information, affecting human decisions, that are not captured electronically. This means that the data are not going to be perfectly ruly, even under the best representation using the available variables. Secondly, computer systems that generate a great deal of human control data, such as computer games, tend not to afford the facility of recording that data. Thirdly, for many systems, such as those that require analogue inputs and outputs, it is difficult to find any convincing representation of the data that would allow even the first steps towards a thorough analysis of the data's ruliness. Car driving is a common example of this last kind of system. Fourthly (an opposite point), many computer tasks, such as word processing, have a uniformly defined representation, in virtue of the fact that the entities being worked with correspond directly with definable computational entities. This means that there may not be suitable alternative representations.

For these, and other practical reasons, the author chose to develop a purpose-built simulation game on a graphics workstation, designed to provide tractable data, while attempting to retain relevance to tasks in the real world. The task chosen was nautical mine-hunting, with a ship and a remotely-operated vehicle (ROV) connected by a cable. Subjects on this task needed several hours of learning and practice before developing a reasonable facility at the task. Their actions, mediated through discrete mouse-button clicks, were recorded for analysis. The game had a scoring system, to encourage a uniform appreciation of the task. A more detailed description of this work, and further discussion on most of the issues discussed in this chapter, is in the author's doctoral dissertation [Grant 1990].

Outline of analytical methods

In order to give a good chance that the representations chosen were suitably different, it made sense to take, on the one hand, representations in terms of the quantities explicitly present in the simulation system, and on the other hand, ones designed to be closer to human cognitive representations. Each of these was approached in two parts: the representation of actions; and of situations. As suggested above, these were to be compared by measuring the ruliness of the same examples, represented in these different terms, analysed using the same machine learning method.

The system-level actions were obvious enough, since the interface had been designed to make the characterisation of actions relatively easy, by avoiding analogue input. To describe situations at a system level, the relevant information which went into forming the displays was taken. Cognitive-level actions were devised by following a similar procedure to other approaches such as chunking [Laird et al. 1987] and plan recognition [Davenport and Weir 1986]. Here, it was assumed that if a short sequence of actions occurred frequently, it was probably being treated by the human as a single, compound action. However, the representation of situations at a more cognitive level posed more problems. There were a reasonably large number of different variables in the system, and it seemed clear from first-hand experience that only a small selection of these would be relevant to a particular action. We wanted to know which variables were relevant, so that examples could be prepared using those variables to define the situations, paired with the actions (which could be null actions).
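The chunking assumption can be sketched as a count over short contiguous subsequences of the recorded action stream: any sequence recurring often enough is a candidate compound action. The action names and thresholds below are invented for illustration, not taken from the recorded data.

```python
from collections import Counter

# A toy action stream; "down", "left", "fire" is a recurring pattern.
actions = ["down", "left", "fire", "down", "left", "fire",
           "up", "down", "left", "fire", "right", "up"]

def frequent_chunks(actions, length=3, min_count=2):
    """Count every contiguous subsequence of the given length and keep
    those occurring at least min_count times, as candidate compound
    actions."""
    ngrams = Counter(tuple(actions[i:i + length])
                     for i in range(len(actions) - length + 1))
    return {seq: n for seq, n in ngrams.items() if n >= min_count}

print(frequent_chunks(actions))  # {('down', 'left', 'fire'): 3}
```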

At this first stage of experimentation, suitable representations for particular actions were hand-crafted with up to about a dozen variables: on the one hand (the system level), using variables that were explicitly present, and on the other hand (the more cognitive level), aggregating them into higher-level variables that seemed plausible from experience.

When portions of the data had been selected, represented in different ways, and assembled together, the ruliness was assessed in terms of the performance of a rule-induction program, CN2 [Clark and Niblett 1989], with the data. 1 Parts of the data were taken as training sets, and rules induced, connecting the situations (as attributes) and the actions (as decision classes). Then other data, not part of the training sets, were tested on the induced rules, and the degree to which the predictions of the rules matched the new data was recorded.


The first interesting analysis was done with data relating to the turning of the ROV (remotely operated vehicle). The method of assembly of the data resulted in a preponderance of examples where there was no action, so the simplest attempt to predict the action in any situation would be to predict that no action was taken. This is called the default rule for this example. In fact, the first rules generated from the data had predictive powers scarcely different from the default rule, though different representations did produce sets of rules of differing length and complexity, with the rules from the more human representation being shorter. Following this analysis, the measure of ruliness adopted was the difference between the predictive power of the generated rules, and the predictive power of the default rule.
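The adopted measure can be expressed very simply: the accuracy of the induced rules minus the accuracy of always predicting the majority class. The class names and figures below are invented for illustration; they are not results from the experiment.

```python
from collections import Counter

def default_accuracy(test_labels, train_labels):
    """Accuracy of the default rule: always predict the class that was
    most common in the training data."""
    default = Counter(train_labels).most_common(1)[0][0]
    return sum(lab == default for lab in test_labels) / len(test_labels)

def relative_accuracy(rule_accuracy, test_labels, train_labels):
    """Ruliness measure: induced-rule accuracy minus default-rule
    accuracy."""
    return rule_accuracy - default_accuracy(test_labels, train_labels)

# With a preponderance of null actions (80% of training, 80% of test
# examples), rules scoring 0.85 overall beat the default rule by only 0.05.
train = ["null"] * 80 + ["turn"] * 20
test  = ["null"] * 40 + ["turn"] * 10
print(round(relative_accuracy(0.85, test, train), 2))  # 0.05
```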

The next interesting analysis used data relating to the ROV speed control. Table 1 shows some summary results from the analysis of one subject's data. From left to right, the different columns show data relating to a sequence of time periods of practice, from earlier to later. From top to bottom, the broad divisions (RS0, RS1, RS2) are between three parallel analyses using three slightly different representations. Within each of these divisions, the `relative' figure is the improvement in predictive power of the rules, over the default rule, as previously explained. The `examples' figure gives the number of examples in the data set, and the `overall' figure is the absolute accuracy of the induced rules at classifying examples in each set of data. The third column's data was used to induce rules, and hence the accuracy of the predictions on that data is artificially good, as the same data is being used both for training and test data. But neglecting this column, the main important feature of the table is the difference in relative accuracy of the rules induced under the three representations. We can see a clear, if small, increase in the relative accuracy of RS1 and RS2, intended to be more cognitive-level representations, compared with RS0, the lower-level representation, across the figures in the central block of the table. The low relative accuracy figures in the outermost columns suggest that any rules that were being followed in these areas were substantially different from the rules discovered in the training set. This is consistent with continued learning and development over time.


Results showed that differences in representation produced differences in ruliness, but that does not in itself get us nearer to human representations. We could at this stage, of course, simply generate many different representations, and test them in the way described. However, the testing process requires a large amount of computation, and to search exhaustively for good representations would most likely be futile, since there are an exceedingly large number of possible representations. Instead, we really want to find out what information is being used by people, as the basis on which they decide what actions to take at what time. A first step towards this is to find out what basic information is being used, and further progress would be through finding higher-level combinations of variables that more closely approximated to the cognitive-level concepts being used by the human.

The research reported in Grant [1990] took this first step. All the information available via the control interface was given a cost, and could be turned on or off at any time. When the subjects progressed to attempting to maximise their scores, they turned off all those parts of the display that they could do without, leaving the ones they needed. From the data collected, it was possible to tell, for every action that was taken, what information was visible at that time. One clear result of this experiment was to show that different people used the sensors differently. Further analysis was needed to discover more about the contextuality of human control.

Contextuality in human control

When the information usage results were examined, they proved to be not of immediate use in looking for naturally occurring human contexts. This appeared to be because there were too many slight variations - too much `noise' in the data. However, clustering the patterns together revealed a number of fundamental sensor usage patterns which, along with slight variations, had a rough correspondence with the stages of the task revealed in discussions. In some cases, the contexts were very clear: certain stages of a particular subject's strategy demanded certain specific sensors, and it appeared that the rules used in those contexts were becoming progressively clearer as the subject gained experience.
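The clustering step can be sketched as grouping binary sensor-visibility patterns by Hamming distance, so that slight variations fall in with a fundamental usage pattern. The sensor names, the greedy leader-clustering scheme, and the distance threshold are all invented for illustration; the analysis in the thesis may have differed in detail.

```python
def hamming(a, b):
    """Number of positions at which two visibility patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster(patterns, max_dist=1):
    """Greedy leader clustering: each pattern joins the first exemplar
    within max_dist, or else starts a new cluster."""
    clusters = []  # list of (exemplar, members)
    for p in patterns:
        for exemplar, members in clusters:
            if hamming(p, exemplar) <= max_dist:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters

# Each tuple: visibility of four hypothetical sensors
# (sonar, depth, cable tension, position).
observed = [(1, 1, 0, 0), (1, 1, 0, 1), (0, 0, 1, 1),
            (1, 1, 0, 0), (0, 0, 1, 0)]
for exemplar, members in cluster(observed):
    print(exemplar, len(members))
```

Despite the `noise', the five observed patterns resolve into two fundamental ones, which would then be matched against the task stages described by the subjects.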

The data analysis then proceeded as follows. The examples were divided into the different contexts, according to the sensors that were visible at that time. Each context had its own selection of variables, assembled from information about the sensors that had been visible, and it was with those attributes that a separate rule induction was performed for each context.
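These two steps, dividing the examples into contexts by the visible sensors, and then splitting each context's examples by allocating alternate examples to training and test sets, can be sketched as follows. The field names and the tiny example set are invented for illustration.

```python
from collections import defaultdict

def split_by_context(examples):
    """Group examples into contexts according to which sensors were
    visible when each example was recorded."""
    contexts = defaultdict(list)
    for ex in examples:
        contexts[ex["visible_sensors"]].append(ex)
    return contexts

def alternate_split(examples):
    """Allocate alternate examples to the training and test sets."""
    return examples[0::2], examples[1::2]

examples = [
    {"visible_sensors": ("sonar",), "action": "turn"},
    {"visible_sensors": ("sonar", "depth"), "action": "dive"},
    {"visible_sensors": ("sonar",), "action": "null"},
    {"visible_sensors": ("sonar",), "action": "turn"},
]

for ctx, exs in split_by_context(examples).items():
    train, test = alternate_split(exs)
    print(ctx, len(train), len(test))
```

A separate rule induction would then be run on each context's training set, using only the attributes derived from that context's visible sensors.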

Some of the results are reproduced in Tables 2 and 3. In order to avoid the problem of using the training set as a test set, the data from each time period (C--H) were divided into two, simply by allocating alternate examples to the two sets (e.g., C0 and C1). The subscript 0 sets were used for training, and the subscript 1 sets for testing. Each cell of these tables has, firstly, the overall accuracy of the rules induced from the training set, when tested with the examples of the test set (``overall'' in Table 1) and, secondly, a positive or negative figure indicating the difference between this overall accuracy and the accuracy of the default rule, as explained above (``relative'' in Table 1). The number of examples in each set is given with its name: thus, the set C0 had 60 examples. The results were quite striking, though they amounted to less than a good model of human skill. Some contexts were highly ruly (e.g., Table 3), and some were less so, down to some (e.g., Table 2) where the learned rules actually performed worse than the default rule. If the data in these tables are analysed all together, the performance figures fall between the ranges of performance typical of the tables separately.

Clearly there were great differences in the character of the subjects' behaviour between these different contexts, which means at least that the contexts have a relationship with some important feature of the human way of doing things. So, what we can see here is that the concept of context, as here used, is confirmed as significant by an analysis based on ruliness. This can be seen as related to Rasmussen's concepts of skill-, rule- and knowledge-based behaviour in process operators (e.g., [Rasmussen 1983]). If a set of data for one context appears ruly in terms of low-level variables, then we might imagine it as being a context in which the operator is at a skill- or a rule-based level. In contrast, where a context is unruly in terms of simple variables, there is a choice of conclusion. Either there are pattern-based variables present, which the learning algorithm is incapable of detecting; or there is higher-level (knowledge-based) processing going on which, again, is not uncovered by the learning algorithm.

The methods reported here are far from ideal, and, as we look at the problems, we can see opportunities for further progress along these lines of investigation. To start with, the information-costing interface is a special case, and makes the task different from the task with a more ordinary interface. How could we adapt the methods here for use with a wider range of interfaces? One possibility is to extend the ruliness analysis so that it serves also to pick out the different contexts in the first place, as well as confirming their existence, as is described above. This remains an area for future research.

Outlook for further developments

General problems

There are also more general problems to address, for the furtherance of these lines of investigation. Firstly, for real-life studies, we need to tackle the problem mentioned above, about any significant variables that are not available in electronic form. For example, a process controller might consider the colour of smoke, or particular sounds or vibrations from machinery. The least we need to do in this kind of situation is to capture some measurement relevant to the variables in question. What if there is some pattern-based element to the cognitive concept? This leads on to the next consideration. In order to get from, say, a television picture to a human concept, we would probably want to employ some kind of pattern recognition technique, not directly to discover rules or ruliness, but more basically to capture pattern-like concepts.

Another point mentioned above concerned analogue channels of information and control. In order to work towards a human representation, and to assess ruliness, we have to be able to find some symbolic, or at least discrete variables, on the basis of the analogue channels that are measured. When looking at car driving, for example, we can describe the actions at various levels of granularity, and it remains a taxing problem to relate these different levels together effectively. Perhaps here, we should not be looking at either symbolic or connectionist models alone, but rather together. We would expect different aspects of human skill to show up at different levels of granularity. When looking at detailed physical movements, we should consider how we can model, or allow for, physiological and psycho-motor aspects of human skill; at a larger granularity, there is the more purely cognitive side, exemplified by conscious reasoning. Clearly no one current methodology is optimally suited for modelling this whole range of aspects of human skill. 2

Possible applications

Finally, we should look at the possible applications of the knowledge that we might foresee coming from the methods discussed above. Perhaps the clearest application, and the one most directly relevant to HCI, would be in the redesign of interfaces to complex systems and tasks, to make them easier to use and less conducive to human error. If we were to find out, in any particular instance, the details of a human's representation of a complex task, then we could in principle redesign the interface to that task to match their representation more closely. This could use human contexts as the basis for the organisation of the information. I say redesign, because a human representation of a task will be dependent on the existing interface. At least in the field of complex tasks, there is no single optimal representation, and one cannot solve the interface design problem once and for all. 3 An iterative approach is the best that one can hope for in this area.

Looking ahead further, the initial design of complex tasks and interfaces could be helped by an appreciation of the ways in which humans habitually organise tasks. This would go forward from the `Model Human Processor' of Card, Moran and Newell [1983], to model other aspects of human ability. There would be a possibility of focusing much investigation on the characteristics of the contexts into which humans habitually divide tasks, or, to put it another way, the contextuality of people's representations. The questions one would be tackling would be: what structuring in the task makes it humanly possible to execute; and what implications does this have for the design of such tasks?

The modelling of human learning is perhaps more of a long-term interest in HCI, concerning the design and efficiency of training, and how difficult a new system is to learn, with or without externally structured training. But it is an open question, to what extent human learning can be modelled, without being corroborated by a clear knowledge of what humans have actually learned in particular situations. Traditional examinations or interviews scratch the surface; but methods building on discovering human representations could give a more thorough view, not reliant on verbal reports of uncertain veracity.


Machine learning methods generally, including pattern recognition and connectionist techniques as well as symbolic ones, potentially offer great things to HCI, despite difficulties remaining to be overcome. What we have seen here is, firstly, how the tool of machine learning provides a vital measurement of ruliness, giving us a method of discriminating between different possible human, cognitive representations of situations and actions. Secondly, the experimentation and discussion surrounding contextuality in human representations is a starting point for working out this methodology, and applying it to issues which have a bearing on real problems. We should be encouraged to continue and develop the use of such methods in HCI, both in research, and progressively in practice.


References

Card, S. K., Moran, T. P., and Newell, A. (1983). The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ.

Carroll, J. M., Kellogg, W. A., and Rosson, M. B. (1991). The task-artifact cycle. In: Carroll, J. M. (ed.), Designing Interaction: Psychology at the Human Computer Interface. Cambridge University Press, Cambridge.

Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4): 261--283.

Davenport, C. and Weir, G. (1986). Plan recognition for intelligent advice and monitoring. In: Harrison, M. D. and Monk, A. F. (eds), People and Computers: Designing for Usability, pp. 296--315. Cambridge University Press.

Gams, M. and Lavrac, N. (1987). Review of five empirical learning systems within a proposed schemata. In: Bratko, I. and Lavrac, N. (eds), Progress in Machine Learning: Proceedings of EWSL-87, Bled, Yugoslavia, pp. 46--66, Wilmslow. Sigma Press.

Grant, A. S. (1990). Modelling Cognitive Aspects of Complex Control Tasks. PhD thesis, Department of Computer Science, University of Strathclyde, Glasgow.

Laird, J. E., Newell, A., and Rosenbloom, P. S. (1987). SOAR: An architecture for general intelligence. Artificial Intelligence, 33: 1--64.

Rasmussen, J. (1983). Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Transactions on Systems, Man and Cybernetics, SMC-13: 257--266.

Woods, D. D. (1991). The cognitive engineering of problem representations. In: Weir, G. R. S. and Alty, J. L. (eds), Human-Computer Interaction and Complex Systems. Academic Press, London.