Lappeenranta University of Technology
Department of Information Technology

Reinforcement learning for artificial creatures
Oleg Daviduyk, Danila Medvedev
daviduyk@lut.fi, medvedev@lut.fi
Course instructors: Jan Voracek, Saku Kukkonen
Lappeenranta, Finland
2002
Abstract
Reinforcement learning is one of the techniques used in building artificial intelligence. In reinforcement learning the AI agent receives rewards and punishments (negative rewards) from a supervising system and dynamically adapts its behaviour to maximise the total reward. In this paper we experiment with training an AI-controlled creature in the Black and White computer game. The creature was trained to perform several simple tasks: picking objects up from the ground, throwing objects at a target, and destroying buildings by throwing stones at them.
Keywords: reinforcement learning, character-based AI, clicker training, worlds with objects
In reinforcement learning an agent selects or is given a goal and then follows it to receive more rewards (also called reinforcements) [9]. When we put this agent into a complex environment with a large number of objects, other agents and complex relationships between them, the environment becomes similar to the real world. Increasing the complexity of the environment leads to an increase in the number of states and state/action pairs. This makes it very interesting to observe the agent’s interaction with the world and its learning processes, as some unexpectedly complex actions become possible in such an environment.
The feasibility of training an artificial creature with its own interests and goals, as opposed to programming it explicitly, is very interesting. It is important to determine to what extent such agents can be trained using a variety of methods, such as “clicker training”.
Some interesting work has already been done in this field. The best example of an application driven by reinforcement learning is Sydney K 9.0 (created in the Media Laboratory of MIT). Sydney is a synthetic dog that can interact with humans via a set of sensors. The human participant is represented by another creature, Fridge. Sydney can be trained to perform simple tasks by rewarding it and using a stick to guide its attention [3].
We are interested in the field of game AI, specifically in the learning of AI entities in worlds with objects. This project involves practical experiments with one of the successful examples of such AI. To avoid the difficulties caused by the great complexity of reality, we use a virtual creature in a simplified simulated world.
In this project we used the “Black and White” strategy game created by Lionhead Studios Ltd (www.lionhead.com). The game revolves around an AI-controlled creature that can be trained by the player. The player has very little direct control over the creature, but the range of possible tasks and actions is very wide. We designed and carried out a training experiment with an AI creature from this game, based on reinforcement learning. We took training strategies from one of the web sites dedicated to the game and followed the instructions given there to train the creature to perform some simple tasks. All steps of the training process were recorded.
In the first part of the paper we give an overview of two AI areas: character-based AI in games and the interaction of AI with object-based environments. We also provide general information about the implementation of AI in the Black and White game. In the next section we explain the methods used in our experiment. Then we give a detailed account of the experiment, divided into five training sessions. Finally we analyse the results and draw conclusions.
Character-based AI is the simulation of behaviour by artificial creatures (agents). These characters receive a complete brain and a body from the developers. In the case of a game this can be called virtual embodiment [1]. Such a creature becomes an independent being that behaves in a lifelike way.
The main problem in every character-based AI is the perception of the world by the creature. The character co-exists in the world with other objects and creatures, so perception is very important for a successful AI application. Perception provides the necessary input for the character-based AI and allows it to make decisions and act; it also influences the decision-making process itself.
How can perception influence the model of the world? We must keep in mind that character-based AI usually does not deal with a model of the whole world, but uses a limited model. One of the reasons for this is sensory honesty: the agent may only know what its sensors could plausibly perceive. The number of possible states of the environment is therefore reduced according to the limits of the agent’s sensors. In a simulated environment the sensors are also simulated, so these limits can be set arbitrarily by the programmer. In this way the amount of processing that has to be done by the AI algorithm is reduced.
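The effect of sensory limits on the size of the state space can be illustrated with a small counting sketch. The grid world, the sensor radius and both function names are assumptions made for illustration only; they are not taken from Black and White or from [1].

```python
def full_world_states(grid_size: int, num_objects: int) -> int:
    """States when the model tracks every object's exact cell on the whole grid."""
    cells = grid_size * grid_size
    return cells ** num_objects


def perceived_states(sensor_radius: int, num_objects: int) -> int:
    """States when each object is either out of sensor range or in one cell
    of the square window centred on the agent."""
    window = (2 * sensor_radius + 1) ** 2
    return (window + 1) ** num_objects


if __name__ == "__main__":
    print(full_world_states(20, 3))   # 400 ** 3
    print(perceived_states(2, 3))     # 26 ** 3
```

Under these assumptions, three objects on a 20 x 20 grid give 64,000,000 states in the full model, while a sensor window of radius 2 leaves 17,576.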
In this short overview we do not cover other topics of perception, such as pattern recognition; we focus only on behaviour.
Planning in character-based AIs is usually presented in a simple form in order to reduce the number of “uninteresting” states [1]. Usually uninteresting features are removed in order to allow the observer to enjoy only interesting realistic behaviour of an agent. The constraints are not only set on the perception of the agent, but on the decision-making process as well.
Emotions in character-based AIs fall into two classes: primary emotions (a model of happiness/sadness) and secondary emotions (such as curiosity). Emotions are implemented to make the behaviour of an agent more realistic and lifelike. They influence the action-selection process and express the states of a character. In Black and White the curiosity emotion is used to influence exploration decisions made by a character.
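One simple way to let an emotion influence action selection is to use the level of curiosity as the probability of trying a random action instead of the currently preferred one. The sketch below is a hypothetical illustration of that idea (an epsilon-greedy rule). The function name, the preference values and the mapping from curiosity to exploration probability are assumptions; the actual mechanism used in Black and White is not described here.

```python
import random


def select_action(preferences, curiosity, rng):
    """Pick an action name from `preferences` (action -> learned preference).

    `curiosity` in [0, 1] is used as the probability of exploring.
    """
    actions = sorted(preferences)             # fixed order for reproducibility
    if rng.random() < curiosity:
        return rng.choice(actions)            # explore: any action
    return max(actions, key=preferences.get)  # exploit: best-known action


if __name__ == "__main__":
    prefs = {"eat": 0.2, "kick ball": 0.5, "throw ball": 0.7}
    rng = random.Random(1)
    print(select_action(prefs, curiosity=0.0, rng=rng))
    print(select_action(prefs, curiosity=0.8, rng=rng))
```

With a curiosity of 0 the agent always repeats its preferred action; higher values make it try alternatives more often.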
In a world with objects the whole environment is represented as a set of states, and in each state there is a set of possible actions. All the states are connected with each other.
So the typical reinforcement learning task can be described as [2]:
Predefined:
A set of possible states S
A set of possible actions A
An unknown transition function $\delta: S \times A \to S$
An unknown real-valued reward function $r: S \times A \to \mathbb{R}$
Find a policy $\pi^*: S \to A$ that maximizes
$$V^{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$$
for all $s_t$, where $0 \le \gamma < 1$.
The agent can be in one of the states $s_t \in S$ and selects an action $a_t = \pi(s_t) \in A$ to execute according to its policy $\pi$. After selecting the action $a_t$ in state $s_t$, the agent moves to a new state $s_{t+1} = \delta(s_t, a_t)$ and receives the reward $r_t = r(s_t, a_t)$.
As mentioned above, the transition function $\delta$ is unknown to the agent. This means that the agent does not know the effects of its actions in advance; the reward function $r$ is unknown to it as well.
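Because the transition and reward functions are unknown to the agent, it has to estimate the value of its actions from experience. A minimal sketch of one standard method, tabular Q-learning, is given below. The four-state corridor, the parameter values and all names are assumptions chosen for illustration; this paper does not state which learning algorithm the game uses.

```python
import random

STATES = [0, 1, 2, 3]          # state 3 is the terminal goal state
ACTIONS = ["left", "right"]
GOAL = 3


def delta(state, action):
    """Transition function; hidden from the agent, used only by the environment."""
    if action == "right":
        return min(state + 1, GOAL)
    return max(state - 1, 0)


def reward(state, action):
    """Reward function: 1 for stepping into the goal, 0 otherwise."""
    return 1.0 if delta(state, action) == GOAL else 0.0


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=50, seed=0):
    """Estimate Q(s, a) from sampled transitions only."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if s == GOAL:
                break
            a = rng.choice(ACTIONS)                 # purely random exploration
            r, s_next = reward(s, a), delta(s, a)   # observed, not known in advance
            best_next = max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q


def greedy_policy(q):
    """Policy that picks the highest-valued action in every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES if s != GOAL}


q = q_learning()
policy = greedy_policy(q)

if __name__ == "__main__":
    print(policy)
```

The agent never calls `delta` or `reward` to plan ahead; it only observes their results after acting, which matches the setting described above.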
The Black & White game was released by Lionhead Studios in March 2001 after more than two years in development. In Black & White the player takes the role of a deity in the fictitious world of Eden. The world is populated by villagers that can become followers of the player and other gods. In addition to more traditional real-time strategy gameplay, Black & White includes large animal-like creatures, controlled by an advanced AI.
The AI of the creature is very advanced compared to other games on the market. At the beginning of the game the creature does not have much knowledge of the world, but over time and with proper training it can acquire an understanding of the world and useful skills. The things it can be taught range from eating specific foods to healing villagers to casting magic spells. The designers of the game claim that any action or combination of actions can be learned by the creature in appropriate circumstances.
There is little direct control over the creature, but this is compensated for by the wide possibilities for training it. By observing the actions of the player, the creature can learn to copy them. To help it learn, the player can reward or punish the creature by using the mouse to caress or slap it.
Some of the basic behavioural aspects of the creature, such as pathfinding, are done so well that the player does not even notice them.
The creature can wander around Eden on its own, without the player controlling it. The built-in intelligence allows the creature to behave realistically. As a result, one of the many interesting aspects of the game is simply observing the creature. Steve Jackson, one of the game developers, says: “…just following your Creature as he wanders about is compulsive viewing. You can never [be] sure what he is going to do next. It is intriguing to watch as he investigates what is food and what is not…” (4)
One of the AI programmers, Richard Evans, explained in an interview (5) what kinds of AI techniques they used. Some of them include:
The extraordinary AI of the Black & White game was widely recognised. The game has won several awards for its AI implementation, including US PC Gamer Award 2001-2002. Black & White was also nominated for Guinness Book of Records for best AI in a computer game (6).
In the beginning of the game the creature instinctively knows a small number of basic actions, such as “sleeping, eating, drinking and pooping” (7). Additional actions are learned through the process of the game, in one of several ways.
In addition to learning simple actions, the creature can learn whole sequences of actions, for example catching a stone thrown by the player, casting the Fire miracle on it and hitting buildings in an enemy village with this stone. Although impressive, training the creature to do this can be difficult and requires patience.
Rewarding or punishing the creature is done by clicking and holding the mouse button on the creature to focus on it and then stroking or slapping the creature. The amount of reward (or punishment) is displayed on the screen, ranging from -100% (Bad Boy) to +100% (Good Boy). The creature reacts to the player’s actions immediately, which can cause those players who became emotionally attached to their creature to stop punishment early or reward it excessively.
As the strategy guide notes, “…the creature’s AI is very short-sighted and not very reflective. It does not understand many of effects of its actions…” (7). Because of this, the player should punish the creature only when the action itself is unwanted, and not for an accidental side effect of an otherwise desired action. In the latter case the only option is to wait until the creature learns to perform the action correctly, so that the negative side effects no longer occur.
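The feedback rule described above, in which a reward or punishment between -100% and +100% is attached to the action itself, can be sketched as a simple update of a per-action desire value. This is a hypothetical model written for illustration: the `apply_feedback` function, the `learning_rate` parameter and the idea of a single number per action are assumptions, not the game's documented implementation.

```python
def apply_feedback(desires, action, feedback_percent, learning_rate=0.5):
    """Return a copy of `desires` with the entry for `action` moved towards
    the player's feedback, given in percent from -100 to +100."""
    if not -100 <= feedback_percent <= 100:
        raise ValueError("feedback must lie between -100 and +100 percent")
    target = feedback_percent / 100.0
    current = desires.get(action, 0.0)
    updated = dict(desires)
    # Only the named action changes; side effects are not credited or blamed.
    updated[action] = current + learning_rate * (target - current)
    return updated


if __name__ == "__main__":
    desires = {}
    desires = apply_feedback(desires, "throw stone at house", +90)
    desires = apply_feedback(desires, "eat stone", -30)
    print(desires)
```

Because the update touches only the action that was rewarded or punished, punishing a side effect in this model would lower the desire for the whole action, which is the pitfall the guide warns about.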
Our goal was to experiment with training the creature. To get the training strategies we analysed the online creature training guide at Planet Black & White (7).
We took a fresh and untrained creature. We selected a tiger, because our experiment was oriented towards violent and aggressive behaviour and we wanted a creature with a proper disposition.
We then started a skirmish game on the Kilroy’s Training Grounds map. This map was specially designed for training the creature and provides easy access to all necessary objects and features of the gameplay.
We decided to train the creature to perform an important task: catching and throwing stones to smash enemy buildings.
We used all training techniques outlined in the previous section. Our experiment was divided into five short sessions. During every session we attempted to train our creature using a specific training technique. We documented our actions, the environment, the response of the creature and the training results.
We used built-in game features for training. To reward our creature for correct action we clicked on it, zoomed in and stroked the creature with slow mouse movements. To punish the creature we zoomed in on it, but instead of stroking, slapped it with quick and energetic mouse movements. We selected the appropriate reward or punishment level according to the importance of the action learned or severity of the mistake made.
We had 5 training sessions with the creature. Each session directly followed the previous one. Average time per session was 15 minutes. Our goal during every session was to strengthen the knowledge acquired by the creature during the previous session and stimulate the learning of new actions based on the existing knowledge to achieve our final task for the creature.
One of the goals of our initial sessions was to understand the abilities of the creature and the potential of its AI.
The creature’s task for the first 5 sessions was to learn to throw stones at enemy buildings to destroy them.
In the description of sessions below “the creature learned” means that the creature learned to do something by itself. We were only passively observing the learning process.
“We trained” means that we actively tried to influence the behaviour of the creature by helping it, giving the necessary objects, focusing the attention of the creature, etc.
Session 1: The basics of throwing objects (10 minutes)
Starting conditions: Untrained creature in the creature pen

| Our action (creature’s action) | Comment | Reinforcement |
| --- | --- | --- |
| The creature went out of the pen | | |
| The creature scattered its toys (beach ball and teddy bear) around | | |
| The creature ate 2 white (psychedelic) mushrooms | Unknown | |
| The creature walked up to the beach ball and kicked it | First real kick. The creature pointed to the ball | |
| The creature picked up the teddy bear and then put it back on the ground | | |
| The creature went to the ball, picked it up and brought it back to the pen | | |
| The creature picked up the bear and stroked it | | |
| The creature picked up the beach ball and threw it at the bear | The first aimed throw | +70% |
| The creature kicked the ball twice | | |
| The creature placed the bear standing on the ground and then threw a ball at the bear | The bear was knocked down. Impressive. | +70% |
| The creature picked up the teddy bear and while holding it, kicked the ball for a while | | |
| The creature put the bear on the ground and threw the ball at it again | The creature learned to throw the ball at the bear | +30% |
| Result of the session: | | |

Session 2: Throwing stones (12 minutes)
Starting conditions: The creature knows how to throw the beach ball

| Our action (creature’s action) | Comment | Reinforcement |
| --- | --- | --- |
| We forced the creature to drop the bear | | |
| We put the learning leash on the creature and commanded it to go to the unconverted village | We stopped the creature near the large heap of stones | |
| We attached the leash to one of the stones | | |
| We drew the attention of the creature by taking one of the stones and showing it to the creature | The creature focused on the stones | +20% |
| The creature lay down to have a rest | The creature was not tired | -20% |
| We gave the creature a stone | The creature ate the stone | -30% |
| We tried giving a stone to the creature twice | The creature was putting the stones on the ground, because it was 34% hungry | |
| We gave the creature some food from the village store | | +20% |
| We gave a stone to the creature again | The creature dropped the stone on the ground and lay down to sleep | -20% |
| We gave a stone to the creature again | The creature walked for about 20 seconds carrying the stone, came to the ocean shore and threw the stone into the water | +100% |
| Result of the session: | | |

Session 3: Throwing stones at the target (10 minutes)
Starting conditions: The creature is familiar with taking stones from our hands and knows how to throw them

| Our action (creature’s action) | Comment | Reinforcement |
| --- | --- | --- |
| We gave some food (grain) to the creature | The creature did not eat the food and dropped it | |
| The creature lay down to sleep | It was tired | |
| We gave the creature some food again | The creature ate the food | +10% |
| We gave a stone to the creature | The creature dropped the stone | |
| We gave a stone to the creature again | The creature dropped the stone | |
| We put the creature on the leash of aggression and attached the leash to the house in the unconverted village | The creature approached the house and kicked it | |
| We tried giving stones to the creature three times | The creature dropped the stones to the ground | -20% |
| We gave a stone to the creature again | The creature dropped the stone and lay down to sleep | -20% |
| We gave a stone to the creature once again | The creature took the stone and threw it at the house, damaging it | +90% |
| Result of the session: | | |

Session 4: Destruction of buildings (14 minutes)
Starting conditions: The creature is in the unconverted village, attached by the leash of aggression to the house. The creature knows how to throw stones at the buildings

| Our action (creature’s action) | Comment | Reinforcement |
| --- | --- | --- |
| The creature approached the house and kicked it | | |
| The creature took the stone from the ground and threw it at the house | For the first time the creature took the stone from the ground by itself | +80% |
| The creature threw the stone at the house twice, kicking the house after each throw | The house is badly damaged. The stone that the creature was using got stuck in the building. | |
| The creature went aside to the heap of stones and took one from the heap | | |
| The creature threw the stone at the house twice, kicking the house after each throw | The house is completely smashed | |
| We unfastened the leash of aggression from the house | | |
| The creature lifted the stone from the ground and then put it back. It kicked a tree, pulling it from the ground | | |
| The creature went away from the village and headed back to the creature pen | | |
| We attached the leash to the creature and commanded it to return to the village | | |
| We tried giving the food (grain) to the creature four times | The creature was throwing the food away | -20% |
| The creature took the food from the ground and ate it | | |
| We attached the creature to another house with the leash of aggression | | |
| The creature threw stones at the house fifteen times, kicking it after each throw | The house is razed to the ground | +40% |
| The creature found a cow and lifted it from the ground | We rewarded it for finding the food | +10% |
| The creature ate the cow | | +30% |
| The creature ate one more cow | | +20% |
| The creature ate the third cow | | |
| The creature trampled down the fourth cow | | |
| The creature went away from the village | We tried to stop it | -10% |
| The creature lay down to sleep | | |
| Result of the session: | | |


Figures 1,2: the creature throws the stone at the small house on the left and smashes it
Session 5: Motivation to destruction (17 minutes)
Starting conditions: The creature is in the village. The creature is not on the leash. The creature knows how to pick up the stones and destroy houses with them (when it is attached to the buildings)

| Our action (creature’s action) | Comment | Reinforcement |
| --- | --- | --- |
| The creature tried to leave the village | | -10% |
| The creature tried to leave the village again | | -20% |
| The creature tried to leave the village once more | | -20% |
| The creature tried to leave the village one more time | | -20% |
| The creature turned back and returned to the village | | +10% |
| The creature picked up a stone and tried to go away from the village | | -10% |
| The creature threw the stone at the barrier | | |
| The creature picked up the cow and threw it | | |
| The creature picked up the stone again, went a short distance from the village and threw the stone | | -40% |
| The creature picked up the dead cow and ate it | The cow died from falling from a great height | |
| The creature picked up a stone and tried to go away from the village again | | -10% |
| The creature threw the stone | | |
| The creature returned to the village | | |
| We attached the creature to the tree in the middle of the village on the leash of aggression and restricted its movement | We observed the creature’s behaviour | |
| The creature was bored. It was yawning and wallowing | | |
| The creature picked up a villager and ate it | | +20% |
| The creature was tired and fell asleep | | |
| The creature woke up but tried to sleep again | The creature was no longer tired | -20% |
| The creature got up but tried to sleep again | The creature was no longer tired | -30% |
| The creature got up but tried to sleep again | The creature was no longer tired | -20% |
| We gave the stone to the creature, but it dropped it | The creature was not interested | |
| We attached the leash to the mill | | |
| The creature picked up a stone and threw it at the mill | | +50% |
| The creature threw the stone at the mill three times, kicking it after each time | | +20% |
| Result of the session: | | |

After the fifth session we checked the statistics on the creature in the Creature Cave. The stats are given below from the creature’s point of view (“Creature Mind”):
Figure 3: the “Creature Mind” scroll in the Creature Cave and selected statistics
During the first session we observed the behaviour of the creature to understand its abilities and motivation. When we saw that the creature was interested in throwing objects as part of the playing process, we decided to develop and shape this behaviour according to our goals.
To make the creature able to cause some damage, we needed to replace the beach ball with stones. Another reason was that we had only one beach ball, but we had access to a large number of stones. During the second training session we punished the creature for wrong actions and rewarded it for finally throwing the stone.
During our third session, the creature was taught to throw the stone at the target by attaching it to the target (a house) with the leash of aggression.
In the next session the creature, still attached to the house with the leash of aggression, learned to pick up new stones from the ground and finally was able to destroy the house with an accurate throw.
In the last session we did not achieve complete success, as we were unable to train the creature to voluntarily perform the specific destructive actions.
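The reinforcement values recorded in the session tables can be tallied to compare the sessions. The short script below is only an illustrative summary of those recorded values: the dictionary layout and the `summarise` helper are our own naming, the "Unknown" entry of the first session is left out, and a plain sum of percentages is a rough indicator rather than a measure used by the game.

```python
# Reinforcement values (in percent) as recorded in the session tables.
# The "Unknown" entry of the first session is left out.
SESSIONS = {
    "The basics of throwing objects": [70, 70, 30],
    "Throwing stones": [20, -20, -30, 20, -20, 100],
    "Throwing stones at the target": [10, -20, -20, 90],
    "Destruction of buildings": [80, -20, 40, 10, 30, 20, -10],
    "Motivation to destruction": [-10, -20, -20, -20, 10, -10, -40, -10,
                                  20, -20, -30, -20, 50, 20],
}


def summarise(values):
    """Return (net reinforcement, number of rewards, number of punishments)."""
    rewards = sum(1 for v in values if v > 0)
    punishments = sum(1 for v in values if v < 0)
    return sum(values), rewards, punishments


SUMMARY = {name: summarise(values) for name, values in SESSIONS.items()}

if __name__ == "__main__":
    for name, (net, rewards, punishments) in SUMMARY.items():
        print(f"{name}: net {net:+d}%, {rewards} rewards, {punishments} punishments")
```

The fifth session is the only one with a negative net reinforcement (-100%, with 10 punishments against 4 rewards), which is consistent with the incomplete success reported for it.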
To summarise the results of the experiment: during the first four sessions we were able to form specific behavioural patterns. The creature was trained to perform a specific sequence of actions: finding the nearest stone, picking it up and throwing it at the target. With this, we achieved the stated goal of the training.
During the course of our training we were able to shape the personality of the creature in the way we wanted. Specifically, it was trained to be violent (destroying the houses), playful (throwing the stones) and to rarely be tired (participate in our experiment all the time).
However, we were not able to motivate the creature to destroy the buildings on its own. The creature performs this action only when specifically ordered to do so (by means of putting it on the leash of aggression and clicking on the target).
Observing the creature, especially during the last session, suggested the reason for the incomplete success in training the creature. The game AI is programmed for complex and varied behaviour of the creature. There are additional ways to fulfil the needs that we developed in our creature. The needs for playing, throwing objects and violent behaviour can be fulfilled in different ways, for example, by throwing stones into the ocean, throwing cows, villagers, kicking houses, living things and making menacing gestures. We observed all this behaviour repeatedly during our session, when the creature was acting independently (was not on a leash).
Because the training time was limited, the creature was not able to try many “secondary” actions. As a result, we could not punish it for deviating from the main course of actions (throwing objects other than stones, acting violently to objects other than houses).
A promising way to continue the training experiment is to oversee the creature’s actions for a prolonged period of time, punishing it for any deviations. For example, when the creature heads to the ocean carrying a stone to throw it into the water, the best response would be to wait and punish it right after it throws the stone. As we were limited in time, we did not allow this to happen and forced the creature to return to the village immediately.
We thank Lionhead Studios and personally Peter Molyneux for developing Black and White, an innovative computer game with a revolutionary implementation of character AI. We also thank Jan Voracek for the opportunity to carry out this project and for giving us unlimited creative freedom. Finally we thank the villagers in the game for patiently enduring a wild creature that was destroying their houses and mill.