Artificial Intelligence (AI) — Part two

Spoiler alert: I won't be using Unity machine learning (also called mlagents) to implement artificial intelligence for my bots. If you want to know why, read on.

At first it was hard to use (see my previous post), but then Unity helped me by giving me access to their alpha mlagents-cloud. That fixed my previous problem, which was mostly a hardware one.

It went from hard to use to easy to iterate with, and that's exactly what I needed to find out whether it was a good approach for my idea of bots powered by "real" AI.


When you try to train a model, you have to give it three main data points:

1 – Observations: what the agent (as it's called) can see of its environment
2 – Actions: what the agent can do
3 – Rewards: feedback on how well it performs

You have to think quite hard about these, but since you know your environment, you can eventually come up with good inputs for each of them (or so you think).
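To make those three data points concrete, here is a minimal sketch of what one training step exchanges between the game and the model. Everything here (names, shapes, the distance-based reward) is illustrative on my part, not the actual mlagents API:

```python
# Sketch of the three data points a training step needs.
# Names and shapes are illustrative, not the actual Unity ML-Agents API.

def collect_observations(agent_pos, target_pos):
    """1 - Observations: what the agent can see of its environment."""
    return [agent_pos[0], agent_pos[1], target_pos[0], target_pos[1]]

# 2 - Actions: what the agent can do (a discrete action set)
ACTIONS = ["left", "right", "jump", "double_jump"]

def compute_reward(agent_pos, target_pos):
    """3 - Rewards: feedback on performance (here: closer is better)."""
    dist = abs(target_pos[0] - agent_pos[0]) + abs(target_pos[1] - agent_pos[1])
    return -dist  # grows toward 0 as the agent approaches the target

obs = collect_observations((0.0, 0.0), (3.0, 1.0))
reward = compute_reward((0.0, 0.0), (3.0, 1.0))
```

The model's whole job during training is to learn a mapping from the observations to one of the actions that maximizes the reward over time.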

In the very beginning I tried to train a single model that would both move and shoot at the target.

Let’s dive into some details

I had 12 observations, 10 actions, and plenty of reward points here and there. But I found that no matter what, my model could not figure out how to fire: it moved around reasonably well but never fired.

I decided to split the model in two: one for moving, and one for aiming and firing. I found out online that this is what most people do when the problem is too hard for a single agent. It's a first trade-off, but one I thought was acceptable.

Now I have two experiments: one to learn to move, and the other to aim and fire.


The move experiment

The agent has to reach the target, so the reward is based on how close it gets to it. It can go left/right, jump, and double jump. The maps can be pretty hard to navigate, sometimes even impossible (something machine learning really doesn't like).
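A distance-based reward like the one described above could be sketched like this (the shaping and the values are illustrative, not the ones I actually tuned over my iterations):

```python
def move_reward(agent_pos, target_pos, max_dist):
    """Distance-based reward sketch: 1.0 when on the target, fading
    linearly to 0.0 at max_dist and beyond. Illustrative values only."""
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    dist = (dx * dx + dy * dy) ** 0.5
    return max(0.0, 1.0 - dist / max_dist)
```

The agent gets a better score every step it spends closer to the target, which is what should push it toward walking (and jumping) in the right direction.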

After 7 iterations of changing the reward values, adding/removing observations, making the map easier to navigate, and so on, this is what I got:

The agent mostly succeeds, but sometimes it goes in the wrong direction, it is always jumping like crazy, it does not use the double jump when needed, and it does not look natural at all.

Eight more iterations, trying things like adding a negative reward for jumping so it would stop doing it so much, did not get me anything better.
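The anti-jump shaping I tried amounts to something like this (the penalty value here is illustrative, not the one I actually used):

```python
def step_reward(distance_reward, jumped, jump_penalty=0.05):
    """Anti-jump-spam shaping sketch: every step spent jumping costs a
    little, so jumping constantly stops paying off. Illustrative values."""
    return distance_reward - (jump_penalty if jumped else 0.0)
```

The idea is that the penalty only outweighs the distance reward when the jump wasn't actually needed; in practice I never found values that made the agent calm down.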

Note that even if Unity mlagents-cloud allows me to iterate quickly, it still takes a couple of hours between each model change.


The aim-and-fire experiment

The agent has to hit the target with the bazooka, so the reward is based on how much damage it deals, and also on how close the shot lands when it misses. It can aim up/down, load a shot, and release to fire. This time the map was made easy from the beginning.
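The reward described above could be sketched like this (all the weights are illustrative guesses on my part, not the values from my actual experiments):

```python
def fire_reward(damage_to_target, miss_distance, max_dist, self_damage):
    """Aim-and-fire reward sketch: damage dealt is the main signal, a
    near miss earns partial credit, and self-inflicted damage is
    punished. All weights are illustrative."""
    reward = float(damage_to_target)
    if damage_to_target == 0:
        # partial credit for landing the shot close to the target
        reward += 0.5 * max(0.0, 1.0 - miss_distance / max_dist)
    reward -= self_damage
    return reward
```

The near-miss bonus matters because a pure hit/no-hit signal is too sparse for the model to learn from: early shots almost never hit, so without it the agent gets no gradient to follow.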

But after 5 iterations I found that this was already too complex for the model. It never managed to hit the target, only itself. From what I understood, the load-and-release action needed to fire is too complex.


The problem is that machine learning is hard, and I’m not an expert in it

It took me a full week of intense work to conclude that I'm not expert enough to know where the limits of this approach are and how to get around them. Of course I could spend more time on it, but it seems that no matter what, the outcome will not be as good as I first imagined.

By working on machine learning in this scope, training an agent to be a bot in a game, I also realized that as a developer you lose all control over your bot. I'm quite sure that when the AI is well trained the result is nice for the player, but as a game designer you cannot dictate how your bot behaves (short of training a new model every time).

All of this leads to my final conclusion: machine learning is not what I need, so I'll have to write a manual AI for Artillery Battle, and that will be hard.

Funny note

When working with machine learning, you come across some funny (but logical) behaviors. For example, in my first "fire" experiment the agent learned that not firing at all was the best strategy, because whenever it fired and hit itself, it was punished. So I had to give it a small positive reward for firing and a lower negative reward for hitting itself (this is the kind of adjustment you make between experiments to get a better outcome).
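That fix amounts to reshaping the reward so that "never fire" is no longer the best policy. A sketch, with illustrative values rather than the ones I actually used:

```python
def shaped_fire_reward(fired, hit_self, damage_to_target):
    """Reward shaping fix sketch: a small bonus just for firing, and a
    reduced self-hit penalty, so refusing to fire stops being optimal.
    The exact values are illustrative."""
    reward = float(damage_to_target)
    if fired:
        reward += 0.1   # encourage pulling the trigger at all
    if hit_self:
        reward -= 0.5   # still punished, but less harshly than before
    return reward
```

With this shaping, firing and hitting yourself still scores worse than firing and missing, but doing nothing at all is now the worst option of the three, which is what pushes the agent to keep trying.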