V

V. HCLSA SIMULATION

The purpose of this chapter is threefold. First, we discuss the details of the PL/I program used to simulate and test the model, then we discuss the simulation outputs and finally we present a discussion on the verification and limitations of simulation.

V.1. Implementation

The program was developed in PL/I to take advantage of the following programming language capabilities:

Rich Data Structure
Dynamic Memory
Recursion
Structured Programming

The program was to run on an IBM 370/148 (later 4341). It was later converted to the more interactive environment of a DEC-VAX/780, where PL/I was also supported.

V.1.1. Data Structures

One of the most important aspects in any complex programming implementation is the proper selection of data structures or data types. For our case we defined data structures in a linked list organization that represented the various sets of variable given by the formal definitions (48), (49) and (50). Since most of the information carried by the HCLSA is dynamic in nature, i.e., new CLSA's are to be created for new goals, we chose to make extensive use of POINTER type variables and BASED type memory. All the information resides in the main memory. Figure V.1 presents the linked-list organization for the main structures and Figure V.2 gives a brief description of each one of them.

V.1.2. Program Structures

The program, based on the algedonic algorithm described in Chapters III and Iv, is divided into 23 internal procedures arranged in the hierarchical organization given in Figure V.3. Because of the nature of the model, one of the procedures (HIERARCHY) is recursive, generating stacked copies of some required variables.

V.1.3. Information Coding

An interesting aspect to discuss in the implementation of the model is that of information coding. By this we mean the way in which some non-numerical information is coded to be handled properly. Although PL/I has some string-oriented facilities, it certainly lacks many of the nice features of pattern matching found in languages like SNOBOL. Therefore, we decided to code the production rules of the HCLSA in the following fashion:

1) IF for a given coordinate x, its increment delta x is given by: delta x ----> 0 .. +n
- THEN the code for such increment is the string 'n2'
2) IF for given coordinate x, its increment delta x is given by: delta x ----> -n .. +n
- THEN the code for such increment is the string 'nl'
3) IF for a pair of coordinates x, y, the increment delta ( x ,y) is given by: delta (x, y) -----> (-n..+n,0 )
- THEN the code for such increment is the string 'n001'
4) IF for a pair of coordinates x, y the increment delta ( x , y) is given by delta (x, y) -----> (0, -n..+n )
- THEN the code for such increment is the string'01n0'

V.1.4. Generation and Selection of Transition Matrices

An important aspect to comment on is that of dynamic generation of transitions matrices. In the simulation, entries in MATGEN and LSTATES are dynamically generated; LSTATES also has a pointer to the subsequent transitions. This is arranged through the use of TRANPO, which includes the coded value of a possible transition and probability of selecting it. The procedure required for this work are GENTRA, OBTAIN and COMPUTE. The selection process for a given transition is done stochastically in the simulation. The first step in the selection is to obtain a pseudo-random number. This is achieved by function RANINT that, by using a multiplicative method, delivers a number normalized in the interval (0,1). Then CHOOSE traverses the list TRANPO until a cumulative transition probability value is larger than the random number generated and selects the corresponding output state.

V.1.5. Algedonic Information

Before applying any compensation to the transition matrices, it is necessary to obtain a value for the compensation. In our HCLSA model, this was achieved in two different ways. For the case of the root CLSA, it is the environment which directly compares the vector of newly generated transitions with the truth vector. To simulate this, the program includes the function ENVIRON. The function delivers and integer value representing the number of erroneous transitions taken by the automaton. The function uses the lists ENV-INFOR and RITRAN (Right Transition) and allows for alternate paths to reach the desired goal. In the case of CLSA,s not located at the root, it is the responsibility of the calling CLSA to supply the algedonic information needed to determine the compensation. As we discussed in Chapter IV, the parent HCLSA knows only the supposed final location and none of the transitions. Based upon this it must provide a number that represents some nearness to the desired location. Therefore, the simulation includes the function TOPCLSA, that delivers an integer value representing how close to the desired location the automaton has come after having completed all the transitions. For our simulation we measure nearness as the direct distance between the final state and the goal state.

V.1.6. Algedonic Compensation

The compensation or modification of the transition matrices is achieved in the simulation by means of procedure ALGEDONIC, This procedure allows up to eight different schemes of compensation; each one of them represents the codification of the algorithms discussed in Chapter III. The compensation in the model is applied to individual CLSA's. As the model prescribes no regard is given to the level in the hierarchy where the CLSA is located. The only pieces of information required are thetransitions taken (LOGPTR) by the CLSA and the algedonic information provided either by the TOPCLSA or by the ENVIRON functions previously described.

V.1.7. Hierarchical Learning

After the presentation of the various procedures and functions of the simulation, we now discuss the most important concept, the simulated modes of hierarchical learning. This is achieved by two procedures: LEARNING and HIERARCHY. The code for both is presented in Figures V.4 and V.5. The procedures fully implement the ideas discussed in Chapter IV. In our simulation, learning is treated as a combined Top-Down/Bottom-Up coordination process, treating total Bottom-Up and Total Top-Down as special cases. This desired selection is done by selecting the parameter SELECT as a fractional value of the degree of coordination desired. I SELECT is set to 1.0 then the coordination is totally Bottom-Up; if SELECT is set to 0.0 then the coordination is totally Top-Down. Values for SELECT between 0.5 and 0.0 provide for weighted combinations of Top-Down/bottom-Up coordination.

V.2. Simulation Outputs

We include here a discussion on the various output listings delivered through a simulation run. One of the input parameters allows us to select the degree of detail output desired in the run, i.e., DEBUG MODE. In this way, the debugging task was rather straightforward. For normal operation., i.e., DEBUG MODE = 0 the program delivers the following results:

1) Echo of the Input Data:

- Parameters selected for the run
- Structure of the plane where the automaton is located
- Code for the production rules at each level and for each type
- Initial State
- Desired State

II) Computed Date:

- Total number of dynamic structures required in the run
- For each level, type and goal, the following data is reported in tabular form:
. Trial number
. Level
. Type
. Goal
. Number of trials required to achieve the goal at this level
. Number of trials required to achieve the goal including all levels
. CPU time (unreliable)
. Average probability of the selected path
. Average normalized entropy of selected path
. Joint probability of the selected path
. Joint normalized entropy of selected path
. Number of new dynamic structures needed

To explain each one of these values, let us look at the sample table presented in Figure V.7; the table was obtained with the input data of Figure V.6.The total number of LSTATES structures allocated in the run was 88; such a value indicates that 88 transitions were dynamically created for all levels and types while the automaton was learning the various goals. Looking at the first four rows of the table, the data corresponds to the goal represented by the displacements (x,y,t) = (1,-1,2) for level 4 and type 0. Row 1 states that the automaton required 2 trials to reach the goal for the first time; the probability of the selection path was 0.18250 and the average entropy was 0.98339. Eleven new structures had to be dynamically allocated.

Row 4 states that is only took one trial to reach the goal. Furthermore, the average probability increased significantly to 0.94188, meaning that the matrix is almost saturated (i.e., deterministic). Obviously, the entropy value had to decrease and reached a value of 0.18267 indicating that the information was fairly well organized. the automaton did not allocate any further structures.

V.3. Simulation Considerations

In order to verify the correct operation of the model, we ran a set of benchmark programs which delivered the desired outputs as well as some further insights of the program. The benchmarks included the test of

A random HCLSA
A combined Top-Down/bottom-Up HCLSA
Totally Top-Down HCLSA
Totally Bottom-Up HCLSA
Tutorialized HCLSA

Figure V.9. presents the comparative performance of these HCLSA's for the following parameters:

Theta constant = 1
Number of Levels in the Hierarchy = 2
Number of Goals to be Learned = 6

We now present some interesting examples of each one of these HCLSA's. Let us begin with a random HCLSA. As discussed in Chapter IV, a random HCLSA does not update its probabilities after each interaction with its environment. Such behavior can be observed in Figure V.10. In the figure we see how the number of trials required to reach a goal does not decrease as the interaction number increases. It can also be observed how the average probability and entropy values are not modified at all.

Another interesting automaton is the one using a tutorializing algedonic scheme; the moment the automaton finds the correct path, it saturates the transition matrices, turning them deterministic ones. Figure V.11 gives a sample output of such goal, the probability of selecting it becomes 1.0 and the entropy goes down to 0.0. Therefore, the next interaction requires only one trial to reach the goal. Note that for the case of level 3, the overall trial count has a value of 4. This because the CLSA used at level 3 required 3 lower-level CLSA's to get to the goal. Note that although the path is already learned, the hierarchical nature of the model requires a call to the lower level CLSA's at least one for each transition at the higher level.

We conclude this chapter with a discussion on the various considerations taken during the simulation, a well as a discussion of the limitations of the current implementation of the HCLSA model. To test any set of input parameters, we ran various cases in order to obtain a reasonable sample of the mean. Borrowing an idea from Gordon (21), each case desired was run using different random seeds, obtaining in this way independent samples. We took samples of 20 to 30 independent runs Lower values were considered meaningless and higher values did not provide significant variations to both the mean and the standard deviation. The number of cases per sample agrees also with the number used by Viswanathan and Narendra in their simulation studies (67).

The program allows for up to 4 levels in the hierarchy, where level 1 is the highest and level 4 the lowest. When testing the various hypotheses of this work, we used generally 3 and 4 levels. In order to have a significant measure of the amount of memory required in each run, we used the number of dynamic structures generated during each run. Although the program allows for various algedonic schemes, we used basically a linear scheme with expectation and delta error modifiers, i.e., Scheme = 5. As discussed in chapter III, this scheme was initially proposed by Templeman (55). The reason for this selection lies in the fact that such modifications avoid saturation of the transition matrices. The process of continuous reinforcement may be dangerous to adaptation; if a lower levels path is highly reinforced, the model turns deterministic. An example of this is presented in Figure V.11.

When measuring the performance of a given coordination value, the following methodology was used:

1) The mean value, MH, of the trial count for a benchmark case was obtained, as it is a measure of the time used to learn the goal.
2) The mean value, MH, of the trial count to learn lower level goals was computed, as it is a measure of previous Bottom-Up learning.
3) The value MH was multiplied by both the degree of coordination, i.e., LEARN; and by the number of subgoals, NG, to be used at the lower level. This value represents the amount of time spent in any previous Bottom-Up learning.
4) Hence, compound means were calculated as:
MC= MH + (ML * LEARN * NG)
5) The final values of MC were the data compared for the various coordination values tested.

The previous computation was used since we feel it takes into account the amount of time spent learning lower-level goals in any combined Top-Down/ Bottom-Up approach, as discussed in Chapter IV. Although the program does allow for testing coordination values in the range of 0.0 to 1.0 in steps of 0.1, it is important to state that the simulations ran for values < 0,5 were not able to converge, or converged very slowly taking an immense 3.00 minutes of CPU time for each run. These were not investigated further as they proved too inefficient to be considered for any optimality test.

Since we used sample sizes smaller than 30, in order to obtain true mean ranges for all the samples taken, we used the regular two tailed student t distribution. As it can be seen in Appendix E., in some cases the standard deviations were rather large; such behavior was investigated using larger samples without obtaining significant variations.