Learning Spatial Navigation Using Chaotic Neural Network Model
Robert Kozma & Prashant Ankaraju
Division of Computer Science, 373 Dunn Hall
The University of Memphis, Memphis, TN 38152
Email: rkozma@memphis.edu
Abstract: In this work the KIV model is used to describe the interaction between the sensory and cortical systems, the hippocampus, the amygdala, and the septum. Neural activity patterns in KIV determine the emergence of global spatial encoding to implement the orientation function of a simulated animal. Our results embody the mechanisms that we believe support the generation of cognitive maps in the hippocampus, based on the sensory input-based destabilization of cortical spatio-temporal patterns. We illustrate learning results using the example of simulated navigation in a 2D environment.

I. INTRODUCTION

The KIII model is a working example of the implementation of chaotic principles in a computer software environment. KIII exhibits several of the experimentally observed behaviors of brains, such as robust pattern recognition and classification of input stimuli, and fast transitions between brain states [1], [2], [3].

KIII consists of various sub-units, i.e., the KO, KI, and KII sets. The KO set is a basic processing unit, and its dynamics are described by a second-order ordinary differential equation feeding into an asymmetric sigmoid function. By coupling a number of excitatory and inhibitory KO sets, KIe (excitatory) and KIi (inhibitory) sets are formed. Interaction of interconnected KIe and KIi sets forms the KII unit. An example of a KI set is the dentate gyrus. Examples of KII sets are the olfactory bulb and the prepyriform cortex. In the hippocampus we have CA1, CA2, and CA3 as KII sets. By coupling KII sets with feed-forward and delayed feedback connections, one arrives at the KIII system. KIII shows excellent performance in learning new classes of training input data, and it can efficiently generalize the classification of new test data.

The operation of the KIII model is described as follows. In the absence of stimuli the system is in a high-dimensional state of spatially coherent basal activity, which is governed by an aperiodic, nonconvergent global attractor. In response to an external stimulus, the system activates a landscape of multiple attractors. It is kicked out of the basal state into a local basin of attraction, which is a memory wing. This wing is usually of much lower dimension than the basal state. It shows coherent and spatially patterned amplitude-modulated (AM) fluctuations. The system resides in the localized wing for the duration of the stimulus, then it returns to the basal state. This is a temporal burst process that lasts for about a hundred milliseconds [4]. A memory pattern is therefore defined as a spatio-temporal process represented by the sequence of spatial AM patterns during a burst. KIII-based modeling of the olfactory system is used to classify linearly non-separable patterns. Its performance is compared with those of statistical classification methods and multi-layer feed-forward neural network-based classifications. KIII compares favorably with these methods regarding robustness and noise tolerance of the pattern recognition, especially for classification of objects that are not linearly separable by any set of features [3].

The next highest level of the K sets is the KIV model. As in the case of all other K sets, the architecture and functionality of KIV are biologically motivated [5]. We extend multiple KIII sets into a KIV set that models the interactions in the cortical-hippocampal system. KIV is intended to have the functionality of planning and selection of action, in addition to the classification and pattern recognition represented by single KIII units. KIV consists of three KIII sets, which model the cortical and hippocampal areas. All three are involved with learning and memory. The hippocampus is strongly involved in the cognitive processes of spatial and temporal orientation, like cognitive mapping and short-term memory [6], [7], [8].

In the KIII and KIV models several types of learning rules are used simultaneously, including habituation, Hebbian reinforcement learning, supervised learning, and global stability control through normalization. All these learning methods exist in a subtle balance, and their relative importance changes at various stages of the memory process. Information is encoded in the KIII and KIV sets in the form of dynamical oscillations of spatially distributed activity patterns. It is hypothesized that the sequence of such activity patterns during the theta cycle belongs to the encoding of spatial clues in the form of cognitive maps.

In this paper, we start with the description of the internal organization of the Hippocampal Formation (HF) model, its parts and interconnections. This is followed by the functional description of the HF and its interaction with the sensory cortex and limbic system. Next, details of learning processes in the HF and cortex model are given. We demonstrate the operation of the system using the example of navigation conducted by the mobile agent EMMA in a simple 2-dimensional environment. Our results show that EMMA learns certain aspects of the environment and uses them for goal-oriented navigation.
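
The KO set introduced earlier in this section, a second-order ordinary differential equation feeding into an asymmetric sigmoid function, can be illustrated with a minimal sketch. The specific equation form, the rate constants a and b, the sigmoid ceiling q, and the function names below are assumptions made for illustration (they follow values commonly quoted in the K-set literature) and are not given in this paper.

```python
import math

# Assumed parameters, NOT taken from this paper: rate constants in 1/ms and
# the sigmoid ceiling, as commonly quoted for K sets in the literature.
A_RATE = 0.22
B_RATE = 0.72
Q_MAX = 5.0


def asymmetric_sigmoid(x, q=Q_MAX):
    """Asymmetric sigmoid: bounded below by -1 and above by q (assumed form)."""
    # Below x0 the output is clipped to -1; the two branches meet at x0.
    x0 = math.log(1.0 - q * math.log(1.0 + 1.0 / q))
    if x <= x0:
        return -1.0
    x = min(x, 50.0)  # guard against overflow in exp for very large inputs
    return q * (1.0 - math.exp(-(math.exp(x) - 1.0) / q))


def simulate_ko(inputs, dt=0.01, a=A_RATE, b=B_RATE):
    """Integrate x'' + (a + b) x' + a b x = a b u(t) and apply the sigmoid.

    inputs: sequence of input values u, one per time step of length dt (ms).
    Returns the list of sigmoid outputs, one per time step.
    """
    x, v = 0.0, 0.0  # state and its time derivative
    outputs = []
    for u in inputs:
        acc = a * b * (u - x) - (a + b) * v
        v += dt * acc  # semi-implicit Euler step
        x += dt * v
        outputs.append(asymmetric_sigmoid(x))
    return outputs


if __name__ == "__main__":
    # Constant unit input for 100 ms: the state settles at the input value.
    out = simulate_ko([1.0] * 10000)
    print(round(out[-1], 3))
```

Under these assumptions the linear part is overdamped, so a constant input drives the state to the input value, and the output saturates asymmetrically for large positive and negative states.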
0-7803-7898-9/03/$17.00 ©2003 IEEE 1476
II. MAP BUILDING AND NAVIGATION USING DYNAMICAL PRINCIPLES

A. Behavioral patterns

We use a simplified model of the internal motivational system with several internal senses. One is the battery level of the robot, modeling the hunger of animals. Another two are the state of the drive and the tuning motors. The other variable is the exploration/curiosity drive, expressed as a state variable that promotes forward motion with a random turning component. We define two basic behavioral modes:
· wall following and
· object avoidance.
In the wall following mode, the robot tries to stay close to the wall once it has detected one. This is a more opportunistic strategy. Object avoidance is implemented by turning away from objects upon contact, to avoid collision in the near future. Object avoidance is an exploratory strategy. We combine the above two strategies with relative weights that change depending on the internal state. We use predominantly object avoidance when the robot has plenty of resources, and shift more and more toward conservation as the internal resources become depleted.

In a very simple approach, we define a few basic behaviors and use those for the demonstration of the orientation/navigation capabilities. These can be considered reflexive behaviors that are hardwired in the motor system and actuators and are able to solve certain relatively simple tasks. They include the basic behaviors mentioned above, i.e., wall following and object avoidance. These ensure that the robot is in constant movement without harming itself while exploring the environment. Here we define one additional behavior: backup. Backup behavior is invoked if the robot is stuck or cannot execute a chosen action. A wide range of problems can be solved with these simple behaviors, as is demonstrated in this study.

B. The role of theta gating

Learning and adaptation are a key component of our model, and they will be discussed in the next section. Here we discuss the time periodicity determined by the theta rhythm.

The HF and cortex complete their functions by sampling the environment at a theta rate. To achieve this periodicity, KIV relies on the septum to generate the theta frame rate as a gating function. Temporal framing is done in all sensory systems. Examples of this sampling are the saccadic movement in the visual system, sniffing in olfaction, and perhaps something similar in the cochlea. The present model is simplified by having a single gate generator for all environmental samplings, which is located in the septum.

The theta rhythm will be introduced in the numerical experiments by providing the various KIII units with sensory stimuli periodically, at rates corresponding to the theta frequency. We can simulate the theta sampling in computer experiments by designing a learning cycle as follows. An initial implementation of the model used here was reported in [9]. We show pattern A to the system for a duration of, say, 100 ms, which corresponds to the drive period in the animal experiments. This is followed by a period of 100 ms without input pattern, corresponding to a resting part of the cycle. Afterward, a new pattern is shown, etc. This generates a 5 Hz cycle (200 ms period) that approximates the theta cycle. We keep this 5 Hz oscillatory behavior throughout the simulations. Various learning algorithms take place continuously or during the drive period, as described in the next section.

III. RESULTS WITH REINFORCEMENT LEARNING

A. Description of the experiments

Incremental habituation and reinforcement learning are initiated by a reinforcement signal during theta gating. As the testbed, we use a simple 2D environment with several obstacles. In this environment, the movement can take place along a grid, like the one shown in Fig. 1. Consequently, at any instance, the robot can choose the next move from one of the 8 direct neighbors of the given grid point.

The orientation signals are the distances and directions with respect to the landmarks, measured from the actual location of the robot. Sensory signals to the cortex can be the 6 short-range infrared signals, as used in the case of the Khepera robot [10]. For the sensory signals, we consider the past several time steps as inputs, in addition to the present time frame.

Both habituation and Hebbian learning take place during the 100 ms window defined by the theta rhythm. Habituation is implemented as a continual degradation of the sensory channels as they process sensory information. The degradation level is proportional to the signal magnitude. The proportionality constant is chosen in such a way that habituation diminishes the weights within several theta periods. Habituation should not be overly dominant and should not prevent Hebbian learning through reinforced channels during the same time period.

Both habituation and reinforcement learning are calculated based on the root mean square (RMS) intensity of the gamma-filtered signals at each node, using a filter band of 20 Hz to 60 Hz. Both habituation and Hebbian learning constants are experimentally tuned to give optimum learning performance in the cortical and hippocampal KIII sets.
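
The theta-gated presentation cycle described above (100 ms with a pattern shown, followed by 100 ms without input) can be sketched as a simple schedule. The function names, the 10 ms step, and in particular the habituation rule and its rate constant are hypothetical choices for illustration; the paper states only that the degradation is proportional to the signal magnitude and that the constants were tuned experimentally.

```python
DRIVE_MS = 100  # pattern shown (drive period, from the text)
REST_MS = 100   # no input pattern (resting part of the cycle, from the text)


def theta_frequency_hz():
    """Frequency of the drive/rest cycle in Hz."""
    return 1000.0 / (DRIVE_MS + REST_MS)


def presentation_schedule(patterns, step_ms=10):
    """Return (time_ms, pattern_or_None) pairs, one theta cycle per pattern."""
    schedule = []
    t = 0
    for pattern in patterns:
        for _ in range(DRIVE_MS // step_ms):
            schedule.append((t, pattern))  # drive period: input present
            t += step_ms
        for _ in range(REST_MS // step_ms):
            schedule.append((t, None))     # rest period: no input
            t += step_ms
    return schedule


def habituate(weight, signal_rms, rate=0.01):
    """Hypothetical habituation step: decay proportional to signal magnitude."""
    return max(0.0, weight - rate * signal_rms)


def habituated_weight(patterns, signal_rms=1.0, rate=0.01, weight=1.0):
    """Apply the hypothetical habituation rule during drive periods only."""
    for _, pattern in presentation_schedule(patterns):
        if pattern is not None:
            weight = habituate(weight, signal_rms, rate)
    return weight


if __name__ == "__main__":
    print(theta_frequency_hz())                      # cycle frequency in Hz
    print(presentation_schedule(["A", "B"])[8:12])   # end of first drive period
    print(habituated_weight(["A", "B"]))             # weight after two cycles
```

With the illustrative rate chosen here the weight decays to zero over about ten cycles; in practice the constant would have to be tuned so that habituation does not suppress Hebbian learning in the same window.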
Figure 1: Random exploration of the environment without reinforcement learning. The 'Start' and 'Goal' locations are indicated by arrows.

Figure 2: The path of the agent when reinforcement learning takes place. Note the significantly reduced path length after learning, cf. Fig. 1.

B. Dynamical map generation

At first, the only landmark the animal is given is the 'Start' beacon, which is set by the human controller. In explorative mode, the home acts as a repeller with a monotonic gradient field centered at the home, which drives the animal away from home. At each step, the animal's action is determined by its present position and the location of home. In other words, it goes straight away from home until it meets an obstacle. Constrained by the obstacle, it continues its path along the steepest possible gradient. Sooner or later it will not be able to move further; it is stuck. That is a conflict, which generates a reinforcement signal to learn. Reinforcement learning takes place both in the cortex OB/PC ("What?") and in the hippocampus CA3/CA1 ("Where?").

The above learning mechanisms are complemented with the following algorithm to form additional landmarks based on the experience during exploration. When the animal is stuck, the controller is notified about this event and its location. As a result, a new landmark is generated and its position is added to the existing ones. From now on, the animal gets orientation signals from all the beacons, including this new one. The sign of this one will be negative, which is taken into account in the vector sum of field intensities at each time step when the decision about the next action (step direction) is made. As this new beacon is a repeller, it is less likely that the animal gets close to it another time.

Based on this approach, each learning experience establishes a positive or negative orientation beacon in the environment, or more concretely, at the location of the event in the map of the controller, and also in the cognitive map being constructed and maintained in the HF, as the roving device explores in search of positive goals while avoiding negative sites.

At each episode of being stuck, we use the 'back up' motion as the behavioral response. We use back up when the stuck state is detected and the reinforcement learning has been initiated. The back up action can simply be a step in the direction the animal came from. This information is available, as the sensory signal at each time instant includes the present and a few (8) recent time frames as well. An example of such exploration is shown in Fig. 1. It took the system about 250 steps to get from 'Start' [0, 0] to the 'Goal' [80, 60].

It should be noted that the direction of the next step is selected based on the above algorithm, complemented with a small random noise component. We used this additive noise to simulate real-life uncertainties, and also to prevent the system from getting stuck deterministically in certain repeating situations. In the present experiments, additive noise has been selected at the level of 3%. This means that in 3% of the cases the system selects a direction that has not been determined as the optimal one based on the given learning level.

Once the exploration phase has been conducted extensively, we can switch the home beacon to 'attract' mode. In order to test the system's performance, we restart it from home and give it a goal location to go to. If the robot has properly learned the environment, it will navigate efficiently and find a reasonably optimal path to the goal based on the combined use of the internally formed cognitive map, using only the home beacon and its classification landscape learned in the cortical areas. This is illustrated in Fig. 2. After learning, the length of the trajectory from 'Home' to 'Goal' is reduced to about 40 steps.

IV. DISCUSSION OF THE RESULTS

Results of the previous section clearly demonstrate that our learning algorithm produces significant learning gains, which are converted into improved navigation through the
environment. Now we evaluate in detail the nature of the observed gains.

We have conducted 50 independent sessions of numerical experiments of navigation through the environment with and without learning. The experiments were conducted until the robot reached the target. However, we terminated an experiment after 300 steps, for practical reasons. If a session lasted very long, it meant that the robot had become stuck at a particular location, and it took a very long time to escape, or it could not get away at all from such a 'trap.' Getting trapped in a corner in spite of the learning advances could not be excluded. Even the probabilistic component of the decision making at each step could not completely remedy such potential problems. In order to avoid a bias caused by the deformation of the distribution function of the paths in the experiments, we have excluded paths with length of 300 or more from the present analysis.

Figure 3: Comparison of the average traveled path without learning and with reinforcement learning as a function of the distance between the goal and the starting position.

In Figure 3, the average length of the path traveled by the robot is shown as a function of the distance between the start and goal locations. For simplicity, the start has always been at position [0, 0], while the goal has been moved across the diagonal of the map, up to position [100, 100]. For small distances the learning gain is relatively small. One can see that learning reduced the traveled path by about half. Clearly, this result is far from perfect. Still, it indicates the potential of our method.

V. CONCLUSIONS

We have introduced a novel method of learning spatial maps using a hippocampal model, as part of the KIV set. We have demonstrated the feasibility of the methodology, and showed that K models are promising dynamic chaotic neural networks for addressing navigation tasks. Previously, the KIII models have shown robust performance as classification and pattern recognition devices. With this new advancement, we have expanded the potential application areas of the K sets from the classification task to more complex decision making and behavioral generation domains.

ACKNOWLEDGMENT

This research is supported by NASA grant NCC-2-1244 and NSF grant EIA-0130352.

REFERENCES

[1] Chang H.J. & Freeman W.J. (1996) Parameter optimization in models of the olfactory system, Neural Networks, Vol. 9, pp. 1-14.
[2] Freeman, W.J. (2000) Neurodynamics. An exploration of mesoscopic brain dynamics. London UK. Springer Verlag.
[3] R. Kozma, W.J. Freeman, "Chaotic Resonance Methods and Applications of Noisy and Variable Patterns," Int. J. Bifurcation & Chaos, 11(6), 2307-2322, 2001.
[4] Barrie J.M., Freeman W.J., Lenhart M.D. (1996) Spatiotemporal analysis of prepyriform, visual, auditory, and somesthetic surface EEGs in trained rabbits, J. of Neurophysiology, 76: 520-539.
[5] R. Kozma, W.J. Freeman, P. Erdi, "The KIV Model Nonlinear spatio-temporal dynamics of the primordial vertebrate forebrain," Neurocomputing, 2003, in press.
[6] Arleo, A. and Gerstner, W. (2000) Spatial cognition and neuro-mimetic navigation: A model of hippocampal place cell activity. Biological Cybernetics, 83: 287-299.
[7] Bliss, T.V.P., and Lomo, T. (1973) Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. J. Physiol. 232: 331-356.
[8] Burgess, N., Recce, M., and O'Keefe, J. (1994) A model of hippocampal function. Neural Networks, 7 (6/7): 1065-1081.
[9] Ankaraju, P. "The Hierarchy of K sets From Pattern Recognition to Navigation," Masters' Thesis, The University of Memphis, 2002.
[10] Harter, D., Kozma, R., Graesser, A. (2001) Models of Ontogenetic Development for Autonomous Adaptive Systems. Proc. 23rd Annual Conference of the Cognitive Science Society, CogSci'02, pp. 405-410.