IV. THE HCLSA MODEL


Having completed a review of the field of LA and, specifically, a formal presentation of the recent work done on CLSA models, we turn to the subject of a hierarchy of automata.

IV.1. Hierarchical Learning Stochastic Automaton (HLSA)

In the field of LA, we encounter but one specific example of a hierarchical system prior to this work: the HLSA discussed by Viswanathan (66) in his doctoral dissertation. The structure presented is a two-level single LSA designed to interact with a periodic random environment. Figure IV.1 illustrates the model. For this case, the root of the HLSA stochastically selects a period (see Appendix B) to work on; then a lower-level node stochastically selects a value within the given period. There are as many lower-level LSA's as there are possible periods from which the root HLSA may select.
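To make the two-level selection concrete, the following minimal Python sketch shows a root that stochastically selects a period and a lower-level LSA that then selects a value within it. The names and probability values here are hypothetical; Viswanathan's model in (66) defines the actual structure and updating.

import random

# One probability vector at the root (one entry per candidate period) and
# one lower-level LSA (a probability vector over values) for each period.
root_probs = [0.25, 0.25, 0.25, 0.25]
value_probs = [[1 / 5] * 5 for _ in root_probs]

def select(probs):
    """Stochastic selection: sample an index from a probability vector."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

period = select(root_probs)          # root stochastically selects a period
value = select(value_probs[period])  # its lower-level LSA selects a value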

Communication between levels is an important aspect of any hierarchical model, because the performance of the automata depends on how common information is shared by all levels. It is important to note that in Viswanathan's model the compensation is made at both levels; also, since selections at both levels are made stochastically, the automaton must learn the correct period and the correct selection within that period at the same time. Viswanathan strongly suggests the need for a hierarchical structure in order to handle complex systems.


IV.2. Hierarchical Collective Learning

To deal with complex systems by means of a single LSA seems quite inappropriate and very impractical. Steps toward more complex systems are taken by both Bock's CLSA and Viswanathan's hierarchy.

In this thesis we have combined the benefits of both approaches through a formal description and simulation of a Hierarchical Collective Learning Stochastic Automaton (HCLSA). We take a set of CLSA's and arrange them in a hierarchical manner; the model is designed to interact with a deterministic environment. A graphical description of our HCLSA is presented in Figure IV.2. The fundamental concept in such a structure is that for each transition taken by CLSA 1 at the root of the tree, an entirely different CLSA is called at the next level down in order to select the set of lower-level transitions required to achieve the transition at the root. The process is repeated recursively until a primitive CLSA is reached. This primitive CLSA, i.e., the one at level n, generates the most elementary transitions of the entire HCLSA. The reader will note that the HCLSA is a generalization of the model proposed by Albus (1) in control systems.

Before expanding any further on the subject, we postulate some appropriate nomenclature (see Figure IV.2). We use a canonical notation to denote any CLSA. For instance, CLSA 1.2.1 is the CLSA selected for the first transition of CLSA 1.2; CLSA 1.2 is, in turn, the CLSA selected for the second transition of CLSA 1. A general parent CLSA is denoted as CLSA 1...k, while a lower CLSA at the next level down is denoted as CLSA 1...k.j. This notation is unambiguous and proves helpful in describing the model.


IV.3. Hierarchical Communication

Let CLSA 1...k be a parent node and CLSA 1...k.j be one of the next lower-level nodes. We then define the following communications between any two levels: the parent passes downward the starting state, SS, the exact number of transitions to be taken, nmax, and an assumed goal state, T(nmax); the lower level returns the vector of transitions it generated, Y, and receives in turn an error value, x, with which to compensate itself.

We note that in the case of the root CLSA 1, it is the environment that communicates the necessary information. At the other end of the hierarchy, the most primitive CLSA does not have any further lower-level CLSA's to which to communicate information. Occasionally, the lower-level CLSA 1...k.j may have a unique, unchanging starting state. More often, however, it is the parent CLSA 1...k that must indicate the state from which the lower-level CLSA 1...k.j should start its transitions.

It is not immediately obvious whether or not it should be the task of a parent CLSA to set the number of transitions for its lower-level CLSA. On the one hand, one might consider that the parent node is only concerned with the final state reached by the lower-level CLSA, i.e., how close it has come to the required goal. On the other hand, one might assume that the parent CLSA additionally determines the exact number of transitions to be taken by the lower-level CLSA. In this thesis, we will assume the second approach, keeping in mind that such an assumption precludes the lower-level CLSA from learning an optimal, shorter path. Another piece of information shared between two hierarchically connected CLSA's is the vector Y, the set of transitions generated by the lower-level CLSA and provided to the parent CLSA, so that the parent CLSA 1...k may compute a composite error value for use by the lower-level CLSA 1...k.j in compensating itself.
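The information exchanged between a parent and one of its lower-level CLSA's can be summarized in two message structures. The Python sketch below is an assumption about packaging only; the names SS, nmax, T, Y, and x are taken from the text, while the dataclass layout is ours.

from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[int, int]  # a location, as in the plane example of IV.6

@dataclass
class Downward:
    """What a parent CLSA 1...k passes to a lower-level CLSA 1...k.j."""
    SS: State    # starting state for the lower-level transitions
    nmax: int    # exact number of transitions the parent requires
    T: State     # assumed goal state T(nmax) for the lower level

@dataclass
class Upward:
    """What the lower-level CLSA returns to its parent."""
    Y: List[State]  # the set of transitions actually generated
    # From Y the parent computes a composite error value x and passes it
    # back down so the lower level can compensate itself.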

Finally, let us comment on the error information passed from the parent CLSA 1...k to the lower-level CLSA 1...k.j. In the discussion of the interaction {CLSA, QDE} we noted that such a function could be a mapping of the number of correct transitions taken. For the case of an HCLSA, however, it would not be reasonable for a parent CLSA to know in advance the correct path of the lower-level CLSA's. If that were known in advance, it would be trivial and useless to require another lower-level CLSA to follow the path. Hence, in the HCLSA the error function, e, is a mapping of the final state and the goal state, i.e., of Y(nmax) and T(nmax). Such a consideration does not hold true for the root CLSA 1, because it is assumed that the environment is deterministic and knows in advance both the goal and the vector of correct transitions. Therefore, its error is a function of all the elements of vectors Y and T.
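The two error mappings can be stated compactly. In this sketch the distinction of the text is preserved: a lower-level CLSA is judged only on its final state, while the root is judged element by element. The particular distance metric is our assumption, consistent with the example of Section IV.6.

def e_lower(Y, T):
    """Lower-level error: a mapping of the final state and the goal
    state only, i.e., of Y(nmax) and T(nmax)."""
    return distance(Y[-1], T[-1])

def e_root(Y, T):
    """Root error: the deterministic environment knows the whole correct
    path, so e is a function of all elements of Y and T."""
    return sum(1 for y, t in zip(Y, T) if y != t)

def distance(a, b):
    # City-block distance on the plane (an assumed metric, consistent
    # with the error values tabulated in Section IV.6).
    return abs(a[0] - b[0]) + abs(a[1] - b[1])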

To recapitulate briefly: a parent CLSA 1...k provides a lower-level CLSA 1...k.j with the initial state, SS, the number of transitions, nmax, and an evaluation, x, of how close to the goal the lower-level CLSA 1...k.j has come. The HCLSA must remain in a closed loop for every pair {CLSA 1...k, CLSA 1...k.j} until a satisfactory result is achieved, because it is only then that CLSA 1...k can proceed with its own transitions. Again, the loop works recursively upward until the root CLSA 1 submits all the transitions for a given interaction to its environment for the ultimate evaluation.

Closed loops of the previous type are designed to allow the HCLSA to learn lower-level goals concurrently with higher-level ones. It must be emphasized that the environment rewards only the transitions of the root CLSA 1. This fact has considerable significance for the way in which learning must take place, as described below.


IV.4. Hierarchical Coordination Modes

In order to acquire learning, we consider four different ways in which the application of the compensation can be accomplished. We shall define this application as the hierarchical compensation coordination mode, or simply, coordination. The four modes are: environmental compensation at every level; Bottom-Up; Top-Down; and a combined Top-Down/Bottom-Up. Each is discussed below.

Environmental compensation at every level would be quite efficient because it would allow the environment to deliver error messages for all the CLSA's involved. However, it goes against the idea of a collective error evaluation, which is central to the purpose of this study, as will be described later.

In order to exemplify the coordination modes, we offer the following analogy. To understand computers, a novice can take any of several approaches. One approach might be to enter a computer center with all the manuals in his hands and proceed immediately to tackle all the high-level operations and maintenance jobs of both the application and systems software of the computer center. This would be a Top-Down approach, since he would have to use his imagination to define lower-level goals, such as providing a spooling system for the printers, accessing a privileged account, understanding a language, and the like. However, he may very well invent wrong lower-level goals that will not help in solving the computer center's needs.

A second approach would be to learn all there is to know about computer languages, operating systems, teleprocessing, etc., and only then to operate his particular computer system. This is a Bottom-Up approach. A combined approach would be to learn the required skills just as in the Bottom-Up approach, but not in complete detail; at some point he would attempt to solve the overall task of operating and maintaining the computer center. Upon encountering any problem, he would review the topics that he did not learn well enough in the first place and reinforce those weaknesses.

Following the analogy just described, a Bottom-Up approach allows the automaton to learn primitive goals in the lowest levels of the hierarchy first. These can later be used as viable structures for learning upper-level goals. Such a process continues upward until it reaches the root CLSA. This would seem to be the most acceptable method for collective models since, as we discussed in Chapter II, many authors strongly suggest that lower-level goals should be learned first.

A Top-Down approach suggests that it may be possible to learn lower-level goals at the same time higher-level ones are being learned. In this case an HCLSA, when confronted with a higher-level goal, must assume the required lower-level subgoals in order to achieve the higher-level ones. These subgoals may turn out to be wrong, in which case new ones must be defined. In a Top-Down experiment, a CLSA 1...k must call its lower-level CLSA 1...k.j based on a supposed starting state, SS, and an assumed goal, T(nmax).

A combination of both approaches, Top-Down/Bottom-Up, is also possible (see the sketch below). First, the HCLSA is forced to learn lower-level goals by means of a Bottom-Up approach; once a reasonable degree of learning has taken place, the Top-Down approach is used to reinforce paths that have not been learned completely, i.e., to reinforce a given behavior. As we discuss below, just what constitutes a reasonable degree of learning is one of the primary areas of investigation in this work.
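The four coordination modes can be captured as a simple training schedule. The sketch below is illustrative only; the level-ordering function is our assumption, with level 1 taken as the root and level n as the most primitive.

from enum import Enum, auto

class Coordination(Enum):
    ENVIRONMENTAL = auto()       # environment compensates every level
    BOTTOM_UP = auto()           # primitive goals first, then upward
    TOP_DOWN = auto()            # root first, subgoals assumed as needed
    TOP_DOWN_BOTTOM_UP = auto()  # partial Bottom-Up, then Top-Down

def training_order(mode, n):
    """Order in which levels are exercised; level 1 is the root."""
    if mode is Coordination.BOTTOM_UP:
        return list(range(n, 0, -1))        # n, n-1, ..., 1
    if mode is Coordination.TOP_DOWN:
        return [1]                          # recursion reaches lower levels
    if mode is Coordination.TOP_DOWN_BOTTOM_UP:
        return list(range(n, 1, -1)) + [1]  # partial Bottom-Up, then Top-Down
    return list(range(1, n + 1))            # ENVIRONMENTAL: every level directly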

Before stating the main hypotheses of this dissertation, let us briefly present the ideas that led us to propose the HCLSA as a model for learning. Initially, the ideas were mere arguments that stimulated long and interesting talks between this author and the various members of his dissertation committee. As the ideas were discussed more formally, the model grew into a more concise form, which motivated the various steps in the research presented here. The type of learning presented by automata theorists is fundamentally based on single interactions. However, if machine learning is eventually to emulate that existing in human beings, it requires some sort of collective form of interaction, as discussed by Bock (8) and Michie (32). As previously described, in a collective experiment the automaton may fail to determine the right transition out of the set of transitions taken before the environment delivers a single summary response. As the length of the collection grows larger, discrimination becomes difficult, if not unattainable, as reported by Bock and his students (9, 22, 28, 42, 49, 55).

To solve this problem, a novel and challenging idea is suggested by Albus (1). The proposition can be stated simply as follows: in order for the automaton to be able to determine which transition should be algedonically compensated, it is necessary to design the experiment using a hierarchical organization. This can be used to ensure reasonably short collection lengths at each level in the hierarchy. In such an organization the automaton communicates with the environment only at the root level. In this way the single overall value delivered by the environment need only be distributed among the transitions of the root CLSA. It is the task of the automaton itself to serve as an internal environment for its lower-level CLSA's, based solely on the external environmental interaction at the root CLSA.


IV.5. Hypotheses

Considering a hierarchical organization and the coordination modes described before, the following hypotheses are stated regarding the learning behavior of an HCLSA:

H.1. IT IS HYPOTHESIZED THAT THERE IS A TOP-DOWN/BOTTOM-UP MODE FOR OPTIMAL LEARNING.
The hypothesis is based on the traditional belief that lower-level goals should be learned first. The hypothesis is to be validated in terms of the amount of time required to learn a given goal, the amount of storage required, and the average probability of selecting the right path. This last metric varies inversely with the traditional concept of ε-optimality, because the higher the probability, the smaller the amount of penalty the automaton will receive.
H.2. IT IS HYPOTHESIZED THAT FOR A TOP-DOWN/BOTTOM-UP COORDINATION MODE, THERE IS AN OPTIMAL NUMBER OF INTERACTIONS BETWEEN LOWER-LEVEL CLSA's AND THE ENVIRONMENT BEFORE MOVING UP TO HIGHER-LEVEL ONES.
Before stating the next hypothesis, let us briefly comment on the context in which it is made. Since the HCLSA model is recursive in nature, lower-level goals are reinforced every time a higher-level one is being learned. The reason for this lies in the fact that in order to achieve its goal, a parent CLSA 1...k must call its lower-level CLSA 1...k.j's to achieve the required subgoals, hence causing an algedonic loop to occur at the lower level. Using again the analogy of the novice programmer: whenever he writes an application program to solve a specific problem, his knowledge of the computer language is reinforced. This is what we mean by hierarchical learning.
H.3. IT IS HYPOTHESIZED THAT IF HIERARCHICAL LEARNING OCCURS, THEN LOWER-LEVEL GOALS WILL BE LEARNED BETTER THAN HIGHER-LEVEL ONES USING A BOTTOM-UP COORDINATION MODE.
This hypothesis is to be validated using the average probability of selecting a goal within the various levels of an HCLSA.


IV.6. Formalization of the Model

Given the hypotheses just stated, we now proceed to define the HCLSA model more formally. We have purposely defined the HCLSA as a single entity, although in actuality it is composed of a set of CLSA's. This is done in order to maintain the idea of a single system interacting with the environment. The interaction between the environment and a hierarchical collective learning stochastic automaton is given by the pair {HCLSA, QDE}, where QDE was previously defined in (38) and the HCLSA is a hierarchically organized set of the CLSA's defined in (49).

In order to clarify the ideas, a detailed example of an HCLSA seems appropriate. For the example we use a set of production rules to represent the transitions available to the automaton. The example demonstrates a general application of the HCLSA concept. Let us begin with a discrete and finite two-dimensional plane, with coordinates (c1, c2) and domains 0 < c1 < 100 and 0 < c2 < 100. The automaton may be located anywhere in this plane at a given time t. At time t+1, the location of the automaton is given by (c1 + Δc1, c2 + Δc2).

Let the set of production rules for these increments be those whose instances increment or decrement a single coordinate by one unit. Or simply,

(c1, c2) -> (c1 ± 1, c2)    or    (c1, c2) -> (c1, c2 ± 1)

By the sequential application of these rules, the automaton can reach any location in the defined region. In general, the automaton requires a very large probability matrix to store each possible transition. Yet, if we provide the automaton with some sort of abstraction, it is possible to work only with the increments, and thus the 100 x 100 space may be mapped onto a set of vectors, one for each transition in the process. We are interested in teaching the automaton how to reach a given location after a given number of transitions. The method to be used is a trial-and-error process. The learning is to take place by a continuous application of the production rules, with an algedonic compensation after a set of nmax transitions has taken place. Let us initially assume that no compensation is applied. In this case the automaton selects randomly from the instances of the production rules. Consider, for example, that the starting state is given by location (1,8), with nmax = 3. Let the goal of the automaton be to get to location (3,9). After four consecutive random interactions of three transitions each, the automaton could have generated the following:
Interaction    SS        Locations after each transition
______________________________________________________________________

1              (1,8)     (2,8)  (2,8)  (2,9)
2              (1,8)     (2,8)  (2,9)  (2,10)
3              (1,8)     (2,8)  (3,8)  (3,9)     <- Desired Goal
4              (1,8)     (2,8)  (2,9)  (2,10)
______________________________________________________________________

Although the automaton reached the goal state at interaction 3, it would not necessarily reach it from then on; i.e., reaching the goal once does not imply knowing how to reach it every time (see Figure IV.4).
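The uncompensated interactions above are easy to reproduce. In this sketch, each interaction draws nmax random transitions from the starting state; the four unit-step rule instances are the assumption stated earlier.

import random

RULES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # assumed unit-step instances

def interaction(ss, nmax):
    """One collective interaction: nmax randomly selected transitions."""
    path, loc = [], ss
    for _ in range(nmax):
        d1, d2 = random.choice(RULES)
        loc = (loc[0] + d1, loc[1] + d2)
        path.append(loc)
    return path

# e.g., interaction((1, 8), 3) may yield [(2, 8), (3, 8), (3, 9)],
# which reaches the desired goal (3, 9) of interaction 3 above.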

Let us define the error function e as the number of locations away from the goal state, i.e., the city-block distance between the final location and the goal. For the interactions just described, the environment delivers the following values:

Interaction    Error x(t)
_____________________________

1              1
2              2
3              0
4              2
_____________________________

Due to the nature of a stochastic selection process, if the automaton were to be algedonically compensated with a scheme like (10), it still would not always use the correct path to reach the goal. However, after a certain number of interactions, the matrix would steadily show a higher probability of selecting the correct path. Of course, pursuant to our discussion of preprogrammed intelligence versus adaptation, in this example we could have tutorialized the automaton by saturating the matrix (turning it into a deterministic transition matrix) right after the correct path to the goal was found. On first thought, this might seem desirable; however, we do not want an inflexible machine. If we had that in mind, we could go ahead and use an efficient deterministic search algorithm. In fact, we want just the opposite: an adaptable machine, where the environment can set a different goal state and the machine can still learn to reach it without requiring an external algorithmic modification to its internal structure. For this we are willing to pay the price of a slower learning method. The example just described is nothing more than a simple CLSA; it requires a large number of interactions with its environment to update its transition matrix in order to obtain adequate performance for a given goal.
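For completeness, here is one way the compensation could look. The thesis's actual scheme (10) is not reproduced here; the sketch substitutes a standard linear reward-penalty update, so it illustrates the idea rather than the author's scheme.

def compensate(P, trace, rewarded, a=0.1):
    """Update the transition probabilities along the path just taken.
    P maps a state to its probability vector over rule instances;
    trace lists (state, chosen_index) pairs; a is the learning rate."""
    for state, j in trace:
        old = list(P[state])   # update every entry from the old values
        r = len(old)
        for i in range(r):
            if rewarded:       # strengthen the chosen instance
                P[state][i] = (old[i] + a * (1 - old[i])
                               if i == j else (1 - a) * old[i])
            else:              # weaken it, redistributing its lost mass
                P[state][i] = ((1 - a) * old[i]
                               if i == j else old[i] + a * old[j] / (r - 1))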

We now set up a more complicated example by specifying a larger number of transitions. It is clear that the larger nmax becomes, i.e., the larger the collection length becomes, the more difficult it is for the automaton to determine the right path. The probability of achieving the goal on the first trial is [1/(nmax - 1)]^nmax. For the next trial, this value increases due to our compensation after each trial; yet, it remains dependent on nmax.

The foremost assumption in the present work is that it is possible to subdivide a large set of transitions required for a given interaction into a number of smaller sets of lower-level transitions. With this scheme we intend to provide a more efficient way of learning, where efficiency is measured in terms of the average probability of achieving the goal, as well as time and memory requirements. The efficiency of a hierarchical versus a non-hierarchical approach has already been studied by Viswanathan (66), as was previously discussed.

In order to implement the hierarchical approach, we must provide the automaton with a built-in hierarchical structure for learning. This should not be confused with preprogrammed deterministic intelligence. The automaton still has to modify its behavior only on the basis of the information shared between the environment and itself through its continuing collective interaction. Let us use the hierarchical structure in order to show the feasibility of the HCLSA approach. For this case, it is necessary to define a multilevel set of production rules: at the root we have powerful production rules, and at the lowest level we have the most concrete and primitive rules. The production rules may now be stated for the various levels as follows: at the lowest level, j, each instance changes a single coordinate by one unit, as before; at the next level up, k, each instance produces any displacement (Δc1, Δc2) within a radius of 1; at the root, a single instance produces a displacement of any range within the plane.

With the rules of level j, the automaton can move to any position in the plane; however, with the rules of level k, the automaton will move faster. In this way, the probability of getting to a given position is increased to (1/kmax)^nmaxk, where kmax is the number of instances of the production rule at level k and nmaxk is the number of transitions taken. The hidden assumption in this case is that in order to make a transition at level k it is necessary to call level j for the primitive transitions. For example, let the instance required at level k be the displacement (1,-1). This requires a call to level j with initial state (0,0) and goal state (1,-1), which can be done as either (0,0) -> (1,0) -> (1,-1) or (0,0) -> (0,-1) -> (1,-1). It is required to learn either of these paths before taking such a transition at level k. As we established earlier, there are various modes for obtaining such learning.

Now let us add the highest-level root production rule, which allows for a displacement of any range within the plane in just a single transition. This rule is powerful; obviously, however, the definition of such a rule requires the existence of the rules at levels k and j. The extension to a three-dimensional space would require a higher level in the hierarchy and could be done without much additional complexity. In summary, at the lowest level the displacement is a single vertical or horizontal movement; the next level up allows a displacement bounded by a radius of 1; finally, at the highest level, the displacement is limited only by a radius of n. A sketch of this rule hierarchy follows.
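As a sketch, the two lower rule sets can be written as instance lists, together with the decomposition of a level-k transition into level-j steps. The exact notation of the original rules is not reproduced; the level names j and k follow the text.

# Level j: single vertical or horizontal unit movements.
LEVEL_J = [(1, 0), (-1, 0), (0, 1), (0, -1)]

# Level k: any displacement within a radius of 1 (kmax = 8 instances).
LEVEL_K = [(d1, d2) for d1 in (-1, 0, 1) for d2 in (-1, 0, 1)
           if (d1, d2) != (0, 0)]

def decompose_k(delta):
    """Decompose one level-k displacement into level-j unit steps.
    decompose_k((1, -1)) -> [(1, 0), (0, -1)], i.e., the lower-level
    path (0,0) -> (1,0) -> (1,-1); the other ordering is equally valid."""
    d1, d2 = delta
    steps = [(1 if d1 > 0 else -1, 0)] * abs(d1)
    steps += [(0, 1 if d2 > 0 else -1)] * abs(d2)
    return steps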


IV.7. Step-by-Step Example

As a further example to clarify the mechanism of the HCLSA, before presenting the algorithms, let us describe the problem as the automaton would solve it.

The required transitions are not known initially, and the automaton must learn them by continuous interaction. From the example just described, it can be observed that the mechanism of the HCLSA is highly recursive, and it is therefore easiest to describe it recursively. We present the pseudo-code for a Top-Down implementation in Figure IV.5; the algorithm was implemented in PL/I in order to run the simulations of this thesis. As can be seen from the figure, we must pass a set of variables down the recursive hierarchy. These are the {CLSA}, which includes all the information described in (49); a level, which tells the automaton its depth in the hierarchy; SS and nmax, the starting state and the number of transitions required at the lower level; and finally a vector T, which stands for the truth vector.
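The following Python sketch paraphrases the recursive structure of Figure IV.5. The original is in PL/I; the helper procedures here are stubs, and their bodies are our assumptions.

import random

LOWEST_LEVEL = 3  # depth of the most primitive CLSA (assumed for the sketch)

def stochastic_selection(clsa_level, state):
    """Stub: sample the next state from this level's transition matrix."""
    return random.choice(clsa_level[state])

def assumed_truth_vector(src, dst):
    """Stub: the parent's assumed lower-level path for src -> dst."""
    return [dst]

def algedonic_compensation(clsa_level, Y, T):
    """Stub: evaluate Y against the truth vector T and update the matrix."""

def top_down(clsa, level, SS, nmax, T):
    """One Top-Down interaction at the given level (cf. Figure IV.5)."""
    Y = [SS]
    for _ in range(nmax):
        nxt = stochastic_selection(clsa[level], Y[-1])
        if level < LOWEST_LEVEL:
            # Each transition taken here becomes a goal for the next level
            # down, with an assumed truth vector for that subinteraction.
            sub_T = assumed_truth_vector(Y[-1], nxt)
            top_down(clsa, level + 1, Y[-1], len(sub_T), sub_T)
        Y.append(nxt)
    algedonic_compensation(clsa[level], Y, T)  # evaluate, then compensate
    return Y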

It must be understood that the existence of a truth vector is necessary in order to proceed with the algedonic compensation. We include in the algedonic algorithm both a mechanism for the evaluation and an algorithm to compensate and update the probability matrix. For the highest level in the hierarchy, it is the job of the environment to provide the ultimate performance evaluation. However, as the process proceeds down the hierarchy, higher levels are required to provide assumed truth vectors for the lower-level transitions. While the environment may know all the valid transitions, the lower-level CLSA's can only assume them, basically as a function of the desired goal at their own levels. These assumptions must be evaluated by higher-level CLSA's and eventually corrected by the environment. For this reason we include in the pseudo-code lines that generate an assumed truth vector for each lower-level call.

With this logic we create an artificial environmental evaluation function for the lower-level CLSA's. STOCHASTIC SELECTION is a procedure to select a transition using a random number against the cumulative distribution of all possible transitions for a given level. ALGEDONIC COMPENSATION is a procedure to update the transition matrix according to the values selected for a given level, as reported in the action vector Y. In short, the procedure TOP-DOWN calls itself recursively to generate lower-level transitions once a transition Y(i) to Y(i+1) has been taken at a given level. The process continues all the way down the hierarchy until the lowest level is reached. There, control is returned to the calling procedure with no further lower-level action.
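A cumulative-distribution selection of the kind STOCHASTIC SELECTION performs might look as follows. This is a sketch: the procedure name is the figure's, while the body is ours.

import bisect
import itertools
import random

def stochastic_selection(probs):
    """Select a transition index by matching a random number against the
    cumulative distribution of all possible transitions at this level."""
    cumulative = list(itertools.accumulate(probs))
    r = random.random() * cumulative[-1]  # tolerate unnormalized vectors
    return bisect.bisect_right(cumulative, r)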

A combined Top-Down/Bottom-Up approach is presented in Figure IV.6. As discussed earlier, this is a more controlled experiment. In this case, lower-level goals are learned to some extent before going on to higher-level goals. Once lower-level goals have been learned, the same mechanism as in a Top-Down experiment takes place. It is important to note that a complete Bottom-Up experiment would mean that lower-level goals are completely learned before going to higher levels; in addition, when previously learned goals are tried from a higher-level goal, there is complete assurance that the path learned is the right one.
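A sketch of the combined mode's outer control makes the contrast explicit: a tunable amount of Bottom-Up pretraining per level, followed by the ordinary Top-Down mechanism. The interface is hypothetical; Figure IV.6 holds the actual pseudo-code.

def train_level(hclsa, level):
    """Stub: one closed-loop interaction confined to the given level."""

def run_top_down(hclsa):
    """Stub: the recursive Top-Down experiment of Figure IV.5."""

def top_down_bottom_up(hclsa, n_levels, pretrain):
    """Bottom-Up phase first: each level below the root is exercised
    `pretrain` times (H.2 hypothesizes an optimal value for this count);
    afterwards the ordinary Top-Down mechanism takes over."""
    for level in range(n_levels, 1, -1):   # most primitive level first
        for _ in range(pretrain):
            train_level(hclsa, level)
    run_top_down(hclsa)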
 
 
