©1990, 1995
3: Early studies
3.2 Dynamic control & machine learning
In the attempt to study human control of complex systems, studying collision avoidance in a realistic situation was the direct approach. Since that proved to have too many difficulties, a more roundabout approach was worth considering: studying first the control of a simpler dynamic system. The idea would be to start by modelling how people perform a task that is relatively easy to specify, and then gradually to extend the model to cover the control of more and more complex systems, until one is able to model realistic complex tasks. The realisation that there are no successful models even of human skills acquired in childhood, such as bicycle-riding, suggests that the approach via simpler skills is at least challenging.
Figure 3.1: The pole and cart, or inverted pendulum
One of the simplest dynamic control problems to be studied is that of the inverted pendulum, or pole-and-cart system (see Figure 3.1). In this system, a rigid pole is connected by a hinge at its base to a cart, which is constrained to move along a linear horizontal track. A force is applied to this cart: if appropriate forces are applied in a timely fashion, the pole can be kept from falling over, and the cart kept from wandering too far away from its starting position. Typical values used in the simulation are:
In some ways, the task of pole-balancing is at the opposite end of the spectrum to collision avoidance. As we have discussed above, collision avoidance has much complexity, and consequently there are problems in research methodology. A real pole-and-cart system may have relatively few problems, depending on the details of the physical system: in an idealised version, simulated on a computer, there are no unknown influences on the system. If machine learning is being studied, and no humans are involved, the research methodology is relatively straightforward.
In amongst the early literature on controlling the pole-and-cart system, there are mentions of involving humans, and possibly learning a skill from a combination of human input and machine learning techniques. The objects of this section are to examine this literature, considering how it reflects on the issue of representation, and to consider the implications for the study of complex human control tasks.
Donaldson [31] uses the pole-and-cart apparatus essentially as described above, to demonstrate a technique of learning which he terms ``error decorrelation''. This is an early suggestion of a way in which one might learn from a human how to perform some skill. The task in this case is to give as output a suitable value for the force to be applied to the cart. This output is constructed by taking a number of measured system variables (which we can think of as defining the `situation') and multiplying these by a set of coefficients. If there is more than one output variable, the same arrangement would be replicated for each. The system learns from example: that is, an `expert' output is given from some other source, and the learning mechanism attempts to adjust the coefficients so that it matches the expert output as closely as possible. If the expert control signal correlates at all with any of the measured variables, the response of the learning system will become closer to the expert output.
We see here the dependency of learning on an adequate representation of the system. If the expert signal is not correlated to something that is measured, Donaldson's learning process will fail to learn.
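The weighted-sum arrangement described for Donaldson's scheme can be illustrated with a minimal sketch. The coefficient update used here is the ordinary least-mean-squares rule, chosen as a stand-in; it is not claimed to be Donaldson's own adjustment mechanism. The function names, the learning rate and the `hypothetical_expert` rule are all assumptions made for illustration.

```python
import random

def predict(coeffs, measured):
    """Output is the sum of measured variables times their coefficients."""
    return sum(c * m for c, m in zip(coeffs, measured))

def train_step(coeffs, measured, expert_output, rate=0.05):
    """Adjust each coefficient to reduce the error against the expert.

    Least-mean-squares update, used as a stand-in (an assumption),
    not necessarily the rule used in the original work.
    """
    error = expert_output - predict(coeffs, measured)
    return [c + rate * error * m for c, m in zip(coeffs, measured)]

def hypothetical_expert(measured):
    # Assumed expert: a fixed linear rule over four measured variables.
    weights = [0.5, 1.0, 3.0, 2.0]
    return sum(w * m for w, m in zip(weights, measured))

rng = random.Random(0)
coeffs = [0.0, 0.0, 0.0, 0.0]
for _ in range(5000):
    # Each 'situation' is a vector of four measured system variables.
    situation = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    coeffs = train_step(coeffs, situation, hypothetical_expert(situation))
```

Because the assumed expert depends only on the measured variables, the coefficients settle close to the expert's weights. If the expert signal depended on something outside `situation`, no setting of the coefficients could reproduce it, which is the dependency on representation noted above.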
Eastwood [34] makes the point that in order to construct a control engineering solution to a problem it is necessary to ``identify as many as possible of the contributory variables and to express their interrelationships in terms of mathematical models which can be simulated on the computer''. Using control theory, he derives a method of controlling the pole-and-cart system that we have described above. Eastwood gives results in the form of graphs plotting the behaviour of simulated, and real, pole-and-cart systems. For the idealised simulation, the control, and recovery from disturbances, is very quick, efficient and smooth. Applying the same control to a real pole and cart, the resultant motion is more erratic, though still well controlled. No mathematical model of a real system can ever be perfect, and the results from the real system illustrate the effect of the imperfections in modelling.
Human control of such a system differs in appearance from the control engineering solution. In pole-balancing, human physical control is not based on explicit mathematical analysis, and hence it does not suffer from the need to have detailed mechanical descriptions of things before being able to control them. For systems that are able to be thoroughly analysed, human control is liable to be less accurate, smooth, or efficient than theoretically-based control, but for systems that have not been thoroughly analysed, humans are still able to learn control where theoretical solutions are not yet possible.
Control theory is rooted in continuous algebra, and is quantitative rather than qualitative. Using qualitative control techniques [24] results in a response that at least superficially appears to be more like human control and less like that based on control engineering. So it seems reasonable to assume that the investigation of qualitative techniques would lead us closer to an understanding of human control.
An early qualitative approach to pole-balancing is given by Widrow & Smith [141]. Their approach has some similarities to that of Donaldson, but is based on a discrete, rather than continuous, representation of the problem. They also introduce `bang-bang' control, in keeping with their qualitative approach. However, they are much more concerned with demonstrating that their system can learn something than with the relationship between this and human skill. This paper is one of the prototypes of the research which is now termed `connectionist', or described as dealing with `neural nets'.
Michie & Chambers [80] take a much more explicit approach to learning to control the pole-and-cart system, and the learning is not from an expert, but purely from the experience of failure, which in human terms is a much harder learning problem. Their basic strategy is to divide up the state space of the problem into `boxes' (hence the algorithm name), which are defined by thresholds---particular values of each dimension of the state space. One can imagine the boxes as box-shaped regions of state space.
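The division of state space into boxes can be sketched as a lookup from a state to a tuple of interval indices, one per dimension. This is only an illustration: the threshold values and the names `THRESHOLDS`, `box_of` and `box_count` are hypothetical, and are not taken from Michie & Chambers.

```python
from bisect import bisect_right

# Hypothetical thresholds for each state dimension (not from the paper).
THRESHOLDS = {
    "x": [-0.8, 0.8],
    "x_dot": [-0.5, 0.5],
    "theta": [-0.05, 0.05],
    "theta_dot": [-0.9, 0.9],
}
DIMENSIONS = ("x", "x_dot", "theta", "theta_dot")

def box_of(state):
    """Return the tuple of interval indices that identifies a state's box."""
    return tuple(bisect_right(THRESHOLDS[d], state[d]) for d in DIMENSIONS)

def box_count():
    """Number of boxes: the product of (thresholds + 1) over all dimensions."""
    total = 1
    for d in DIMENSIONS:
        total *= len(THRESHOLDS[d]) + 1
    return total

centre = {"x": 0.0, "x_dot": 0.0, "theta": 0.0, "theta_dot": 0.0}
centre_box = box_of(centre)
```

With two thresholds per dimension, every dimension has three intervals, so all states that fall in the same four intervals are treated as the same situation by the learner.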
`In' each box, a separate learning process is going on. The data which is passed to each box includes what time, or how many moves, elapsed between that box's decision and ultimate failure. So, by a process whose details do not concern us here, each box learns what is the best decision to take. When each box has learned a good decision, the decisions of all the boxes put together constitute a strategy for the task as a whole.
Fundamentally important to the ability to learn well is the selection of the state space dimensions, and the choice of thresholds to divide the state space up into boxes. Each box is a region of state space that is treated as uniform for the purposes of the learning algorithm. If a box includes regions where a good strategy would recommend different actions, then this may compromise the ability of BOXES to learn any effective strategy at all. Attempting to avoid this problem by having very many very small boxes leads to long computation times, and strategies which are even less homogeneous and comprehensible.
Given the importance of the choice of dimensions and thresholds, one would expect the authors to discuss it in detail. In fact, they accept the problem dimensions as they would be defined by engineers, without comment. One could at least say that the dimensions given (x, x dot, theta and theta dot) are able to describe any possible state of the idealised system. Of the thresholds, they say very little. It seems as though the values were derived by a process of trial and error, guided by human intuition, and therefore difficult to document. The choice of dimensions and thresholds is clearly a problem area, and this corresponds to our problem of representation, as already discussed.
Chambers & Michie [22] discuss possible human-machine cooperation on the task of learning to balance the pole-and-cart. It must be pointed out that their objective was not to replicate a human skill by using machine learning, but rather to short-cut the process of learning, which in Michie & Chambers is entirely by experience of failure.
Chambers & Michie envisage three kinds of cooperative learning. The first is where the BOXES algorithm just accepts the decisions from the human, without effecting any decisions itself. The second is where there is provision for the human not to give a decision, and to leave it up to the algorithm, so that the decision-making would be shared. In the third case, some criterion would govern whether the algorithm had enough confidence in its decision to override any decision that the human might take.
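The three kinds of cooperation can be set side by side in a small sketch. The names `Mode` and `decide`, the numeric confidence value and its threshold are assumptions for illustration only; the paper, as summarised here, says only that some criterion would govern the override.

```python
from enum import Enum
from typing import Optional

class Mode(Enum):
    HUMAN_ONLY = 1   # the algorithm accepts every decision from the human
    SHARED = 2       # the human may abstain, leaving the decision to the algorithm
    CONFIDENCE = 3   # the algorithm may override the human when confident enough

def decide(mode: Mode, human: Optional[str], machine: str,
           confidence: float = 0.0, threshold: float = 0.9) -> str:
    """Pick the decision to apply; human=None means the human abstained."""
    if mode is Mode.HUMAN_ONLY:
        if human is None:
            raise ValueError("the human must supply every decision in this mode")
        return human
    if mode is Mode.SHARED:
        return machine if human is None else human
    # Mode.CONFIDENCE: a hypothetical numeric criterion stands in for
    # whatever test governs the override.
    if human is None or confidence >= threshold:
        return machine
    return human
```

For example, `decide(Mode.CONFIDENCE, "Left", "Right", confidence=0.95)` lets the algorithm's decision stand, while the same call with `confidence=0.5` defers to the human.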
The authors point out that BOXES can complement a human by providing consistency where a human might be inconsistent. However, whether this is an advantage depends on whether the representation is a good one. If the thresholds are badly placed, or the dimensions wrong, `inconsistency' within a bad box may be the optimal strategy, and in this case BOXES would be reducing appropriate `requisite variety' that the human had. On the other hand, if we knew what dimensions and thresholds were used by the human, then enforcing consistency might well improve performance, and BOXES would be truly cooperating with the human. There is, however, no discussion in this paper about what a human representation might be, or how to discover one.
More recent work on the pole-and-cart system adds little to the originals, from the point of view of the present study. Makarovic [75] derives qualitative control rules by consideration of the physical dynamics, together with many simplifying assumptions. Bratko [17] derives control rules from qualitative modelling. Sammut [118] extends the original BOXES work by performing rule-induction on the decisions generated by BOXES rules, to get a humanly comprehensible and concise set of rules not unlike Makarovic's. Between the time of consideration and the time of writing, researchers at the Turing Institute have done some work on the human control of a pole-and-cart system [79]. They do not derive any new representations for human control.
Little work has been done on the human side of pole-balancing. No empirical tests of representations have been made, to assess how closely they correspond to human representations. No-one has claimed to have discovered a specifically human representation of pole-balancing.
Makarovic's rules [75] are in fact for a double pole system, where a second pole is hinged to the top of the lower pole. For the sake of simplicity, we here give the form of the rule for balancing one pole, which comes from assuming that the top pole is perfectly balanced at all times. The notation is also simplified to accord with that already introduced.
IF theta dot = big positive THEN Push Left
IF theta dot = big negative THEN Push Right
IF theta dot = small
THEN IF theta = big positive THEN Push Left
     IF theta = big negative THEN Push Right
     IF theta = small
     THEN IF x dot = big positive THEN Push Right
          IF x dot = big negative THEN Push Left
          IF x dot = small
          THEN IF x = positive THEN Push Right
               IF x = negative THEN Push Left
The ``big positive'', ``big negative'' and ``small'' values are exclusive, exhaustive qualitative ranges.
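The nested rule can be written out as a short function, which makes the priority ordering of the four quantities explicit. This is a sketch only: the magnitudes in `BIG` that separate `small' from `big' are hypothetical (no numerical ranges are given here), and the treatment of x exactly equal to zero, which the rule does not cover, is an assumption.

```python
# Hypothetical magnitudes above which each quantity counts as 'big';
# no numerical ranges are given in the text.
BIG = {"theta_dot": 0.5, "theta": 0.05, "x_dot": 0.5}

def qualitative(value, big):
    """Map a number onto 'big positive', 'big negative' or 'small'."""
    if value > big:
        return "big positive"
    if value < -big:
        return "big negative"
    return "small"

def push(x, x_dot, theta, theta_dot):
    """Return 'Left' or 'Right' following the nested single-pole rule."""
    q = qualitative(theta_dot, BIG["theta_dot"])
    if q != "small":
        return "Left" if q == "big positive" else "Right"
    q = qualitative(theta, BIG["theta"])
    if q != "small":
        return "Left" if q == "big positive" else "Right"
    q = qualitative(x_dot, BIG["x_dot"])
    if q != "small":
        return "Right" if q == "big positive" else "Left"
    # The rule does not cover x exactly zero; treating it as positive
    # is an assumption made here.
    return "Right" if x >= 0 else "Left"
```

Each quantity is consulted only when all the quantities before it are `small', so the angular velocity of the pole always takes precedence and the cart position is considered last.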
Although this rule is given in terms of the four basic physical quantities, its derivation used the idea of desired reference values for the quantities, justified in terms of control concepts. The present author also used this kind of approach, but justified in terms of human understandability, in devising an alternative representation for the pole-and-cart system. The aim was both to enable a physical pole-and-cart apparatus (functioning at the time in the Turing Institute) to balance for longer than was being achieved by other means, and to try out a representation that had a more human flavour to it. The principle of this representation is to calculate desired values of the various quantities, and to represent explicitly the deviations of the actual values from the desired values, which in turn affect the desired values of other quantities.
In the pole-balancing task, we may fix a particular position on the track as the place where we wish the cart to be. The difference between the actual position and this desired position, the distance discrepancy, determines what we wish the velocity to be. The connection between distance discrepancy and desired velocity may be made in two ways: either quantitatively, for example by making the desired velocity a negative factor times the distance discrepancy; or qualitatively, by dividing up the range of distance discrepancies into a small number of sub-ranges, and for each of these sub-ranges, assigning a particular value to the desired velocity. We may continue in the same fashion, qualitative or quantitative. Comparing the desired velocity of the cart to the actual velocity, we obtain a velocity discrepancy, and a desired acceleration of the cart can be fixed as a simple function of the velocity discrepancy. The desired acceleration may then be converted directly into a desired pole angle, based on the fact that the pole would be in unstable equilibrium at a particular angle, depending on the acceleration of the cart. Comparing the desired angle with the actual angle can give us a desired angular velocity, analogously with position and speed. Finally, comparing the desired angular velocity with the actual, we can derive a control decision, whether to apply the force to the right or to the left.
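The quantitative version of this chain of desired values can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation that was actually run: the gain values and all names are hypothetical, each stage is assumed to be a negative factor times a discrepancy, and the sign convention (x and theta positive to the right, theta measured from the vertical) is chosen here and need not match that of the rule listing above.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

# Hypothetical gains: each desired value is a negative factor times
# the preceding discrepancy. The values are not from the text.
K_POSITION = 0.25   # distance discrepancy -> desired velocity
K_VELOCITY = 0.5    # velocity discrepancy -> desired acceleration
K_ANGLE = 2.0       # angle discrepancy    -> desired angular velocity

def control(x, x_dot, theta, theta_dot, x_target=0.0):
    """Return 'Left' or 'Right' from the chain of desired values.

    Assumed convention: x and theta are positive to the right, and
    theta is measured from the vertical.
    """
    distance_discrepancy = x - x_target
    desired_velocity = -K_POSITION * distance_discrepancy

    velocity_discrepancy = x_dot - desired_velocity
    desired_acceleration = -K_VELOCITY * velocity_discrepancy

    # The pole is in unstable equilibrium when tan(theta) = a / g,
    # leaning in the direction of the cart's acceleration.
    desired_theta = math.atan2(desired_acceleration, G)

    angle_discrepancy = theta - desired_theta
    desired_theta_dot = -K_ANGLE * angle_discrepancy

    angular_velocity_discrepancy = theta_dot - desired_theta_dot
    # Pushing the cart to the right rotates the pole to the left, so a
    # positive discrepancy calls for a push to the right.
    return "Right" if angular_velocity_discrepancy > 0 else "Left"
```

With these assumptions, a cart at rest to the right of the target with an upright pole gives `control(1.0, 0.0, 0.0, 0.0) == "Right"`: the cart is first pushed away from the target so that the pole leans back towards it.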
Implementing this strategy requires the setting of the functions which derive a desired value from a previously measured discrepancy. In practice, all except the last function were linear relationships, with constants that had the nature of time constants in exponential decay. It was discovered, by intuitively-led trial and error, that good results were obtained by setting what was effectively a short time constant for the last part of the decision (going from the angular velocity to the force), with progressively longer time constants, the longest governing the connection between discrepant position and desired velocity. This strategy was tried on a simulated pole-and-cart, producing control with apparently no time limit. The quantitative version led to an apparently static system on the graphic display, while the qualitative version led to small oscillations around the desired position. The quantitative version, suitably adapted to the physical apparatus, produced runs balancing for longer than had been achieved using Makarovic's rules implemented on the same apparatus (this was of the order of a minute or two).
It cannot be claimed that this was in any way a model of human control of the pole-and-cart, because no comparison was attempted. However, it does show that using a different representation of the problem can lead to solutions that are at least as good, and at least as comprehensible, as the representations already tried.
The problems in representing human control are more obvious than the problems in representing expert knowledge of the type used commonly in expert systems. In medical diagnosis, for example, the decision classes are the different possible diseases, and at least in many cases, these diseases fall into well-defined natural kinds. There is no disease `half-way' between mumps and measles. Also, as a consequence, much of the knowledge is able to be written down, and discussed, and the general kinds of symptoms that are relevant for diagnosis are reasonably well-known. It follows that representation of the problem is relatively easy, even though the rules for diagnosis may be intricate and uncertain, and probabilistic rather than definite. It is in this context that the classic study of soy-bean diagnosis [77] shows such success for machine learning. Because representation in this field is so clear-cut, Michalski & Chilausky did not report any difficulty, or even alternatives, in the choice of representational primitives for soy-bean diagnosis.
Many other papers in machine learning, up to the present, have considered methods of learning classifications based on some predefined set of attributes. Some recent algorithms extend the representation language, by introducing new predicates (e.g. [86]), and other recent work [154] considers the effect of differently aligning the axes of the problem space, to allow more effective rule-induction. Indeed, the idea of change of representation is now established as a topic within machine learning (see, e.g., [121, 137, 149]). Nevertheless these new techniques generally rely on the assumption that there is some underlying fundamental adequate description language known for the problem, which implies that the problem of change of representation could be seen as a search through a large but bounded space of possible representations.
This approach does not fit well with the task of discovering human representations. Our set of possible concepts is at least exceedingly large, if not actually unbounded, and there are no known laws restricting human ingenuity and imagination in representing problems or tasks in various ways. We certainly do not yet know what the principles governing human representations might be, and so we cannot predict in detail how a human might represent a problem, or what the possible range of human representations is.
Unassisted machine learning, purely from experience, is still a long way from being able to deal with complex tasks. To that extent, it as yet fails to give us a model of human learning about complex systems. It neither ties in with, nor validates, the study of mental models used in training, discussed above (§ 2.1.6). Nor is there much current concern with the structure of human representations. In a way, this is surprising, because the concept of the `human window' into intelligent systems has been discussed for some time [81]. It would seem obvious to the present author that, if intelligent systems are to have an effective human window, much more must be learnt about human representations, so that human and computer can share a language in which to communicate.
Certainly, the mere fact that a representation is qualitative rather than quantitative does not mean that it is human-like. Machine learning, of itself, does not reveal human representations. The machine learning community is aware of the centrality of representation, but offers no systematic approach to discovering representations, either of human performance, or of unstudied applications, where also there is no known underlying representation out of which to select and build a new one.
Learning about human control rules depends on having a satisfactory representation language in which to describe the rules, and because we do not have any techniques for discovering that representation language, machine learning cannot yet provide a good model of human performance at a complex control task.
Further work for machine learning will be dealt with in the appropriate place (§ 8.3), but this study now continues with experimental approaches to discovering more about human representations and complex task performance.