Information about http://cnd.memphis.edu/paper/kozma03.pdf

Learning Spatial Navigation Using Chaotic Neural Network…

Pages: 4
Language: english
Created: Thu Apr 3 21:16:02 2003
              Learning Spatial Navigation Using Chaotic Neural Network Model
                                          Robert Kozma & Prashant Ankaraju
                                        Division of Computer Science, 373 Dunn Hall
                                       The University of Memphis, Memphis, TN 38152
                                                Email: rkozma@memphis.edu


Abstract - In this work the KIV model is used to describe the interaction between the sensory and cortical systems, the hippocampus, the amygdala, and the septum. Neural activity patterns in KIV determine the emergence of global spatial encoding to implement the orientation function of a simulated animal. Our results embody the mechanisms that we believe support the generation of cognitive maps in the hippocampus, based on the sensory input-based destabilization of cortical spatio-temporal patterns. We illustrate learning results using the example of simulated navigation in a 2D environment.

I. INTRODUCTION

The KIII model is a working example of the implementation of chaotic principles in a computer software environment. KIII exhibits several of the experimentally observed behaviors of brains, such as robust pattern recognition and classification of input stimuli, and fast transitions between brain states [1], [2], [3].

KIII consists of various sub-units, i.e., the KO, KI, and KII sets. The KO set is a basic processing unit, and its dynamics are described by a second order ordinary differential equation feeding into an asymmetric sigmoid function. By coupling a number of excitatory and inhibitory KO sets, KIe (excitatory) and KIi (inhibitory) sets are formed. Interaction of interconnected KIe and KIi sets forms the KII unit. An example of a KI set is the dentate gyrus. Examples of KII sets are the olfactory bulb and the prepyriform cortex. In the hippocampus we have CA1, CA2, and CA3 as KII sets. By coupling KII sets with feed-forward and delayed feedback connections, one arrives at the KIII system. KIII shows excellent performance in learning new classes of training input data, and it generalizes efficiently to the classification of new test data.

The operation of the KIII model is described as follows. In the absence of stimuli the system is in a high dimensional state of spatially coherent basal activity, which is governed by an aperiodic, nonconvergent global attractor. In response to an external stimulus, the system activates a landscape of multiple attractors. It is kicked out of the basal state into a local basin of attraction, which is a memory wing. This wing is usually of much lower dimension than the basal state. It shows coherent and spatially patterned amplitude-modulated (AM) fluctuations. The system resides in the localized wing for the duration of the stimulus, then returns to the basal state. This is a temporal burst process that lasts for about a hundred milliseconds [4]. A memory pattern is therefore defined as a spatio-temporal process represented by the sequence of spatial AM patterns during a burst. KIII-based modeling of the olfactory system is used to classify linearly non-separable patterns. Its performance is compared with that of statistical classification methods and multi-layer feed-forward neural network-based classifications. KIII compares favorably with these methods regarding robustness and noise-tolerance of the pattern recognition, especially for classification of objects that are not linearly separable by any set of features [3].

The next highest level of the K sets is the KIV model. As in the case of all other K sets, the architecture and functionality of KIV are biologically motivated [5]. We extend multiple KIII sets into a KIV set that models the interactions in the cortical-hippocampal system. KIV is intended to have the functionality of planning and selection of action, in addition to the classification and pattern recognition represented by single KIII units. KIV consists of three KIII sets, which model the cortical and hippocampal areas. All three are involved with learning and memory. The hippocampus is strongly involved in the cognitive processes of spatial and temporal orientation, such as cognitive mapping and short-term memory [6], [7], [8].

In the KIII and KIV models several types of learning rules are used simultaneously, including habituation, Hebbian reinforcement learning, supervised learning, and global stability control through normalization. All these learning methods exist in a subtle balance, and their relative importance changes at various stages of the memory process. Information is encoded in the KIII and KIV sets in the form of dynamical oscillations of spatially distributed activity patterns. It is hypothesized that the sequence of such activity patterns during the theta cycle belongs to the encoding of spatial cues in the form of cognitive maps.

In this paper, we start with the description of the internal organization of the Hippocampal Formation (HF) model, its parts and interconnections. This is followed by the functional description of the HF and its interaction with the sensory cortex and limbic system. Next, details of the learning processes in the HF and cortex model are given. We demonstrate the operation of the system using the example of navigation conducted by the mobile agent EMMA in a simple 2-dimensional environment. Our results show that EMMA learns certain aspects of the environment and uses them for goal-oriented navigation.
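The KO unit described in the introduction (a second order ordinary differential equation feeding into an asymmetric sigmoid) can be sketched as follows. The paper gives neither the equation nor its constants, so the specific ODE form, the rate constants `A_RATE` and `B_RATE`, the sigmoid expression, and its parameter `Q_MAX` are all assumptions chosen for illustration, as are the function names. Integration uses a plain Euler-type step.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
A_RATE = 0.22   # 1/ms, assumed decay rate constant
B_RATE = 0.72   # 1/ms, assumed rise rate constant
Q_MAX = 5.0     # assumed upper asymptote of the sigmoid


def asymmetric_sigmoid(x, q=Q_MAX):
    """Asymmetric sigmoid: floor at -1, ceiling at q, zero output at x = 0."""
    x0 = math.log(1.0 - q * math.log(1.0 + 1.0 / q))  # point where the curve hits -1
    if x <= x0:
        return -1.0
    x = min(x, 50.0)  # keep exp() from overflowing for very large inputs
    return q * (1.0 - math.exp(-(math.exp(x) - 1.0) / q))


def simulate_ko(inputs, dt=0.01, a=A_RATE, b=B_RATE):
    """Integrate x'' + (a + b) x' + a*b*x = a*b*u(t) and apply the sigmoid.

    inputs: sequence of input values u, one per time step of length dt (ms).
    Returns a list of (state, output) pairs.
    """
    x, v = 0.0, 0.0
    trace = []
    for u in inputs:
        acc = a * b * (u - x) - (a + b) * v  # second-order linear dynamics
        v += dt * acc
        x += dt * v
        trace.append((x, asymmetric_sigmoid(x)))
    return trace
```

With a constant unit input the state relaxes toward 1 within a few tens of milliseconds, and the output is the sigmoid of that state. Coupling several such units with excitatory or inhibitory weights would be the next step toward KI and KII sets, which this sketch does not attempt.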


0-7803-7898-9/03/$17.00 ©2003 IEEE
II. MAP BUILDING AND NAVIGATION USING DYNAMICAL PRINCIPLES

A. Behavioral patterns

We use a simplified model of the internal motivational system with several internal senses. One is the battery level of the robot, modeling the hunger of animals. Two others are the states of the drive and the turning motors. The remaining variable is the exploration/curiosity drive, expressed as a state variable that promotes forward motion with a random turning component. We define two basic behavioral modes:
     · wall following and
     · object avoidance.

In the wall following mode, the robot tries to stay close to the wall once it has detected one. This is a more opportunistic strategy. Object avoidance is implemented by turning away from objects upon contact, to avoid collision in the near future. Object avoidance is an exploratory strategy. We combine the above two strategies with relative weights that change depending on the internal state. We use predominantly object avoidance when the robot has plenty of resources, and shift increasingly toward conservation as the internal resources become depleted.

In a very simple approach, we define a few basic behaviors and use those for the demonstration of the orientation/navigation capabilities. These can be considered reflexive behaviors that are hardwired in the motor system and actuators and that can solve certain relatively simple tasks. They include the basic behaviors mentioned above, i.e., wall following and object avoidance. These ensure that the robot is in constant movement without harming itself while exploring the environment. Here we define one additional behavior: backup. Backup behavior is invoked if the robot is stuck or cannot execute a chosen action. A wide range of problems can be solved with these simple behaviors, as demonstrated in this study.

B. The role of theta gating

Learning and adaptation are key components of our model and are discussed in the next section. Here we discuss the time periodicity determined by the theta rhythm.

The HF and cortex complete their functions by sampling the environment at a theta rate. To achieve this periodicity, KIV relies on the septum to generate the theta frame rate as a gating function. Temporal framing is done in all sensory systems. Examples of this sampling are the saccadic movement in the visual system, sniffing in olfaction, and perhaps something similar in the cochlea. The present model is simplified by having a single gate generator for all environmental samplings, which is located in the septum.

The theta rhythm is introduced in the numerical experiments by providing the various KIII units with sensory stimuli periodically, at rates corresponding to the theta frequency. We simulate the theta sampling in computer experiments by designing a learning cycle as follows. We show pattern A to the system for a duration of, say, 100 ms, which corresponds to the drive period in the animal experiments. This is followed by a period of 100 ms without an input pattern, corresponding to the resting part of the cycle. Afterward, a new pattern is shown, and so on. The resulting cycle of 200 ms corresponds to a frequency of 5 Hz, which approximates the theta cycle. This 5 Hz oscillatory behavior is maintained throughout the simulations. Various learning algorithms take place continuously or during the drive period, as described in the next section. The initial implementation of the model used here was carried out in [9].

III. RESULTS WITH REINFORCEMENT LEARNING

A. Description of the experiments

Incremental habituation and reinforcement learning are initiated by a reinforcement signal during the theta gating. As the testbed, we use a simple 2D environment with several obstacles. In this environment, movement takes place along a grid, such as the one shown in Fig. 1. Consequently, at any instant, the robot can choose the next move from one of the 8 direct neighbors of the given grid point.

The orientation signals are the distances and directions with respect to the landmarks, measured from the actual location of the robot. Sensory signals to the cortex can be the 6 short-range infrared signals as used in the case of the Khepera robot [10]. For the sensory signals, we consider the past several time steps as inputs, in addition to the present time frame.

Both habituation and Hebbian learning take place during the 100 ms window defined by the theta rhythm. Habituation is implemented as a continual degradation of the sensory channels as they process sensory information. The degradation level is proportional to the signal magnitude. The proportionality constant is chosen in such a way that habituation diminishes the weights within several theta periods. Habituation should not be overly dominant and should not prevent Hebbian learning through reinforced channels during the same time period.

Both habituation and reinforcement learning are calculated based on the root mean square (RMS) intensity of the gamma-filtered signals at each node, using a filter band of 20 Hz to 60 Hz. Both the habituation and the Hebbian learning constants are experimentally tuned to give optimal learning performance in the cortical and hippocampal KIII sets.
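As a rough illustration of the theta-gated learning cycle (100 ms drive, 100 ms rest) with habituation and Hebbian reinforcement driven by RMS signal intensity, a minimal sketch follows. The update formulas, the constants `k_hab` and `k_hebb`, the 40 Hz stand-in signal, and all function names are assumptions made for this example; the paper states only that the degradation is proportional to signal magnitude and that the constants were tuned experimentally. The signals are assumed to be already gamma-filtered, so no 20 Hz to 60 Hz filter is implemented here.

```python
import math

DRIVE_MS = 100  # stimulus shown (duration taken from the text)
REST_MS = 100   # no input pattern (duration taken from the text)


def theta_phase(t_ms):
    """Return 'drive' or 'rest' for a time in ms within the 200 ms (5 Hz) cycle."""
    return "drive" if (t_ms % (DRIVE_MS + REST_MS)) < DRIVE_MS else "rest"


def rms(samples):
    """Root mean square intensity of a window of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def update_weight(w, pre, post, reinforced, k_hab=0.05, k_hebb=0.2):
    """One drive-window update of a single connection weight (assumed rule).

    pre, post: windows of (assumed already gamma-filtered) node signals.
    """
    a_pre, a_post = rms(pre), rms(post)
    w -= k_hab * a_pre * w                # habituation: decay proportional to magnitude
    if reinforced:
        w += k_hebb * a_pre * a_post      # Hebbian gain on reinforced channels
    return max(w, 0.0)


def run_cycles(w, cycles, amplitude=1.0):
    """Apply one update per theta cycle; `cycles` holds one reinforcement flag each."""
    for reinforced in cycles:
        # 40 Hz sine sampled once per ms as a stand-in for gamma-band activity
        window = [amplitude * math.sin(2 * math.pi * 0.04 * t) for t in range(DRIVE_MS)]
        w = update_weight(w, window, window, reinforced)
        # rest half of the cycle: no input, so no update in this sketch
    return w
```

In this sketch an unreinforced channel decays a little on every theta cycle, while a reinforced channel gains more from the Hebbian term than it loses to habituation, which mirrors the balance between the two rules described above.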



Figure 1: Random exploration of the environment without reinforcement learning. The 'Start' and 'Goal' locations are indicated by arrows.

Figure 2: The path of the agent when reinforcement learning takes place. Note the significantly reduced path length after learning, cf. Fig. 1.
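The paths in Figures 1 and 2 come from beacon-driven stepping on a grid with 8 neighbors, as described in Section III. A minimal sketch of such a stepping rule is given here under stated assumptions: the field strength falls off as 1/(1 + distance), a robot counts as stuck when no free neighbor has a positive projection on the summed field, and the backup action returns to the previous position. These choices, the zero-field random step, and all names (`field`, `choose_step`, `explore`) are hypothetical; the paper's controller is a KIV model, which this sketch does not reproduce.

```python
import math
import random

# The 8 direct neighbors of a grid point
NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def field(pos, beacons):
    """Vector sum of beacon fields at pos; beacon = (x, y, sign), +1 attracts, -1 repels."""
    fx = fy = 0.0
    for bx, by, sign in beacons:
        dx, dy = bx - pos[0], by - pos[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # a beacon at the robot's own position gives no direction
        strength = sign / (1.0 + dist)  # assumed monotonic fall-off
        fx += strength * dx / dist
        fy += strength * dy / dist
    return fx, fy


def choose_step(pos, beacons, blocked, size, rng, noise=0.03):
    """Pick the next grid point, or None if the robot is stuck."""
    free = [(pos[0] + dx, pos[1] + dy) for dx, dy in NEIGHBORS
            if 0 <= pos[0] + dx < size and 0 <= pos[1] + dy < size
            and (pos[0] + dx, pos[1] + dy) not in blocked]
    if not free:
        return None
    fx, fy = field(pos, beacons)
    if rng.random() < noise or (fx == 0.0 and fy == 0.0):
        return rng.choice(free)  # noisy step, or no field information available

    def score(p):
        sx, sy = p[0] - pos[0], p[1] - pos[1]
        return (sx * fx + sy * fy) / math.hypot(sx, sy)  # projection on the field

    best = max(free, key=score)
    return best if score(best) > 0.0 else None


def explore(start, beacons, blocked, size, max_steps, rng, goal=None, noise=0.03):
    """Walk the grid; when stuck, add a repelling beacon there and back up one step."""
    pos, prev = start, start
    path, beacons = [pos], list(beacons)
    for _ in range(max_steps):
        if pos == goal:
            break
        nxt = choose_step(pos, beacons, blocked, size, rng, noise)
        if nxt is None:
            beacons.append((pos[0], pos[1], -1))  # new negative landmark
            nxt = prev                            # backup behavior
        prev, pos = pos, nxt
        path.append(pos)
    return path, beacons
```

A call such as `explore((0, 0), [(0, 0, -1)], wall, 10, 300, random.Random(1))`, with `wall` a set of blocked grid points, returns the visited positions and the beacon list including any repellers added at stuck locations. Switching the home beacon's sign to +1 gives the 'attract' mode.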

B. Dynamical map generation

At first, the only landmark the animal is given is the 'Start' beacon, which is set by the human controller. In explorative mode, the home acts as a repeller with a monotonic gradient field centered at the home, which drives the animal away from home. At each step, the animal's action is determined by its present position and the location of home. In other words, it goes straight away from home until it meets an obstacle. Constrained by the obstacle, it continues its path along the steepest possible gradient. Sooner or later it will not be able to move further; it is stuck. That is a conflict, which generates a reinforcement signal to learn. Reinforcement learning takes place both in the cortex OB/PC ("What?") and in the hippocampus CA3/CA1 ("Where?").

The above learning mechanisms are complemented with the following algorithm to form additional landmarks based on the experience during exploration. When the animal is stuck, the controller is notified about this event and its location. As a result, a new landmark is generated and its position is added to the existing ones. From then on, the animal gets orientation signals from all the beacons, including the new one. The sign of the new beacon is negative, which is taken into account in the vector sum of field intensities at each time step when the decision about the next action (step direction) is made. As this new beacon is a repeller, it is less likely that the animal gets close to it again.

Based on this approach, each learning experience establishes a positive or negative orientation beacon in the environment, or more concretely, at the location of the event in the map of the controller, and also in the cognitive map being constructed and maintained in the HF, as the roving device explores in search of positive goals and avoids negative sites.

At each episode of being stuck, we use the 'back up' motion as the behavioral response. Backup is used once the stuck state has been detected and reinforcement learning has been initiated. The backup action can simply be a step in the direction the animal came from. This information is available, as the sensory signal at each time instant includes the present and a few (8) recent time frames as well. An example of such exploration is shown in Fig. 1. It took the system about 250 steps to get from 'Start' [0, 0] to the 'Goal' [80, 60].

It should be noted that the direction of the next step is selected based on the above algorithm, complemented with a small random noise component. We used this additive noise to simulate real-life uncertainties, and also to keep the system from getting stuck deterministically in certain repeating situations. In the present experiments, additive noise has been selected at the level of 3%. This means that in 3% of the cases the system selects a direction that has not been determined as the optimal one based on the given learning level.

Once the exploration phase has been conducted extensively, we can switch the home beacon to 'attract' mode. In order to test the system's performance, we restart it from home and give it a goal location to go to. If the robot has properly learned the environment, it will navigate efficiently and find a reasonably optimal path to the goal based on the combined use of the internally formed cognitive map, using only the home beacon and its classification landscape learned in the cortical areas. This is illustrated in Fig. 2. After learning, the length of the trajectory from 'Home' to 'Goal' is reduced to about 40 steps.

IV. DISCUSSION OF THE RESULTS

Results of the previous section clearly demonstrate that our learning algorithm produces significant learning gains, which are converted into improved navigation through the
environment. Now we evaluate in detail the nature of the observed gains.

We have conducted 50 independent sessions of numerical experiments of navigation through the environment with and without learning. The experiments were conducted until the robot reached the target. However, we terminated an experiment after 300 steps, for practical reasons. If a session lasted very long, it meant that the robot had become stuck at a particular location, and it took a very long time to escape, or it could not get away at all from such a 'trap.' Getting trapped in a corner in spite of the learning advances could not be excluded. Even the probabilistic component of the decision making at each step could not completely remedy such potential problems. In order to avoid a bias caused by the deformation of the distribution function of the paths in the experiments, we have excluded paths with a length of 300 or more from the present analysis.

Figure 3: Comparison of the average traveled path without learning and with reinforcement learning as a function of the distance between the goal and the starting position.

In Figure 3, the average length of the path traveled by the robot is shown as a function of the distance between the start and goal locations. For simplicity, the start has always been at position [0, 0], while the goal has been moved across the diagonal of the map, up to position [100, 100]. For small distances the learning gain is relatively small. One can see that learning reduced the traveled path length by about half. Clearly, this result is far from perfect. Still, it indicates the potential of our method.

V. CONCLUSIONS

We have introduced a novel method of learning spatial maps using a hippocampal model, as part of the KIV set. We have demonstrated the feasibility of the methodology and showed that K models are promising dynamic chaos neural networks for addressing navigation tasks. Previously, the KIII models have shown robust performance as classification and pattern recognition devices. With this new advancement, we have expanded the potential application areas of the K sets from the classification task to the more complex domains of decision making and behavior generation.

ACKNOWLEDGMENT

This research is supported by NASA grant NCC-2-1244 and NSF grant EIA-0130352.

REFERENCES

[1] Chang H.J. & Freeman W.J. (1996) Parameter optimization in models of the olfactory system, Neural Networks, Vol. 9, pp. 1-14.
[2] Freeman, W.J. (2000) Neurodynamics. An exploration of mesoscopic brain dynamics. London UK. Springer Verlag.
[3] R. Kozma, W.J. Freeman, "Chaotic Resonance - Methods and Applications of Noisy and Variable Patterns," Int. J. Bifurcation & Chaos, 11(6), 2307-2322, 2001.
[4] Barrie J.M., Freeman W.J., Lenhart M.D. (1996) Spatiotemporal analysis of prepyriform, visual, auditory, and somesthetic surface EEGs in trained rabbits, J. of Neurophysiology, 76: 520-539.
[5] R. Kozma, W.J. Freeman, P. Erdi, "The KIV Model - Nonlinear spatio-temporal dynamics of the primordial vertebrate forebrain," Neurocomputing, 2003, in press.
[6] Arleo, A. and Gerstner, W. (2000) Spatial cognition and neuro-mimetic navigation: A model of hippocampal place cell activity. Biological Cybernetics, 83: 287-299.
[7] Bliss, T.V.P., and Lomo, T. (1973) Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. J. Physiol. 232: 331-356.
[8] Burgess, N., Recce, M., and O'Keefe, J. (1994) A model of hippocampal function. Neural Networks, 7 (6/7): 1065-1081.
[9] Ankaraju, P. "The Hierarchy of K sets - From Pattern Recognition to Navigation," Masters' Thesis, The University of Memphis, 2002.
[10] Harter, D., Kozma, R., Graesser, A. (2001) Models of Ontogenetic Development for Autonomous Adaptive Systems. Proc. 23rd Annual Conference of the Cognitive Science Society, CogSci'02, pp. 405-410.

