AI architecture proposition (DEAT)

Update 2018: The architecture presented here isn't inaccurate, but it is incomplete. I have since created another model that is complete and works much better. No production code relies on the following system.

DEAT stands for Decision Event Action Tree. It is based on the decision tree model, with the added concepts of actions and events. First, let's consider its benefits over the most popular existing solutions:

1) Drawbacks of FSMs and HFSMs:

FSMs tend to make the logic really convoluted as the number of states and transitions grows. Both the diagram and the code become complex.
Most reasonably efficient implementations hold the transition logic inside the states themselves, which makes each state responsible for knowing about every state it may be connected to. Thus, a change to one state potentially requires changes to all the states connected to it. States are coupled in this way, and have the "code smell" of dependent objects.
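
To make the coupling concrete, here is a minimal sketch. It is written in Python rather than Unity C# to stay engine-independent, and the state names (Patrol, Chase, Flee) and the Agent fields are hypothetical, not taken from any real project. Each state holds its own transition logic, so each state has to name the other states directly.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical agent data, only for illustration.
    health: int = 100
    sees_enemy: bool = False


class Patrol:
    def next_state(self, agent):
        # Patrol must know about both Flee and Chase.
        if agent.health < 20:
            return Flee()
        if agent.sees_enemy:
            return Chase()
        return self


class Chase:
    def next_state(self, agent):
        # Chase must know about both Flee and Patrol.
        if agent.health < 20:
            return Flee()
        if not agent.sees_enemy:
            return Patrol()
        return self


class Flee:
    def next_state(self, agent):
        # Flee must know about Patrol.
        if agent.health >= 20 and not agent.sees_enemy:
            return Patrol()
        return self
```

Adding a fourth state in this sketch would mean editing every existing state that should be able to reach it, which is the dependency described above.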

2) Drawbacks of BTrees:

Require custom node logic, which is separate from the "programming logic". This involves things like sequencers, parallels, selectors and (worst of all) active selectors. They require thinking in a separate domain, for example thinking in "conditions with actions after them all under a sequencer" instead of "if-s".
May be difficult to debug. A graphical debugging tool is a must.
May be a little cumbersome to construct. They are created either with a "tree builder" pattern in code, or with a GUI tool.
Most of them are slow (I). The basic implementation traverses the tree every tick just to find the same running node. Implementations that avoid this still need active selectors to allow interruptions, and those re-run parts of the tree every tick. If an active selector is placed at the root, then basically the whole tree will be traversed anyway. They have to be carefully placed to avoid performance drops.
Most of them are slow (II). They create a bunch of scattered objects in memory, 1 for every node of every tree of every agent. 2nd gen BTrees are concerned with speed, but as a Unity developer I happily develop in C# and (in most cases) can't pack the memory the way C++ developers do. I have found most of the optimization suggestions difficult to apply.
Most of them are slow (III). They usually involve structuring the logic so that the most costly conditions come later in the tree, to avoid over-processing. This means thinking both in terms of the problem and in terms of the optimization, so "slow" in this case means "slow to develop".
Event-driven and optimized implementations are difficult to develop, maintain and debug unless they are part of a seasoned library.
Some implementations need to use blackboards for shared data. This makes code inside the BTs look clumsy.
With that being said, I don't negate the benefits of these approaches (Update 2018: actually, I do), but I wanted to set up the scene for a simpler, but still very capable solution.

Before starting, I want to say I have implemented this architecture in code and it runs fine; however, I am not yet using it on any commercial project.

So, let's introduce the basic building blocks.

ActionBase: the base class for actions. An action is something that performs an operation, either on the agent or on the world. Actions can be started (Activate()), Process()-ed (with separate methods for Update() and FixedUpdate()), and stopped (Terminate()). In addition to those, I have a special helper method called AnnounceCompletion() that fires the supplied completion event and removes the action from the list of running actions. The ActionBase type must be parameterized to know its "owner", which is the type of the Component attached to the game object (the holder of the AI). It can then access all the members of the owner.
SingletonActionBase: it extends ActionBase and makes sure only 1 instance exists at one time (for 1 set of parameter types). That is why actions can have no state of their own, and must take the supplied owner in all operations. It must also be parameterized with the exact concrete type of the action, in order to create the instance for that precise action.
OwnerBase: must derive from MonoBehaviour, and must be parameterized with the Owner's type. It's the base class for the Owner class mentioned above. It is the main workhorse of the framework and it abstracts away all the working parts, allowing the Owner to focus on implementation.
Event: this is an enum (and not an event in the usual sense) specific to our implementation. An Event can represent any change in the world. Events can be added either by actions (successful or failed) or by changes in the world state (from the senses). In either case, the event is something that lets the logic know something has changed and we need to make another decision.
Owner: the Component attached to game objects (GOs). It holds all the shared data and implements the MainDecision() method.
ConcreteActions: classes that implement ActionBase. They can be either separate classes or classes nested inside Owner. If nested, they can access the private fields of Owner.
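
The building blocks above can be sketched as follows. This is a minimal, engine-independent illustration in Python, not the author's C# code: Python has no generic type parameters, so the "parameterized by owner type" part is replaced by passing the owner into every method, and all names (Guard, MoveToTarget, the Event members, the fields) are hypothetical.

```python
from enum import Enum, auto


class Event(Enum):
    """Plain enum values, not callbacks (hypothetical members)."""
    START = auto()
    TARGET_REACHED = auto()


class ActionBase:
    """Stateless action: every method receives the owner it works on."""
    completion_event = None  # event fired by announce_completion()

    def activate(self, owner):
        """Start the action. Return False if it already finished here."""
        return True

    def process(self, owner, dt):
        """Per-frame work (the Update() counterpart)."""

    def terminate(self, owner):
        """Stop the action."""

    def announce_completion(self, owner):
        # Fire the completion event and leave the owner's running list.
        if self.completion_event is not None:
            owner.events.add(self.completion_event)
        if self in owner.running:
            owner.running.remove(self)


class SingletonAction(ActionBase):
    """One shared instance per concrete action class."""
    _instances = {}

    @classmethod
    def instance(cls):
        if cls not in SingletonAction._instances:
            SingletonAction._instances[cls] = cls()
        return SingletonAction._instances[cls]


class Guard:
    """The owner: holds all per-agent data that actions read and write."""

    def __init__(self, position, target):
        self.position = position
        self.target = target
        self.events = set()
        self.running = []


class MoveToTarget(SingletonAction):
    """Concrete action: moves the owner towards owner.target at 1 unit/s."""
    completion_event = Event.TARGET_REACHED

    def process(self, owner, dt):
        step = min(dt * 1.0, abs(owner.target - owner.position))
        owner.position += step if owner.target > owner.position else -step
        if owner.position == owner.target:
            self.announce_completion(owner)
```

Two guards can share the single MoveToTarget.instance() because the action keeps nothing on itself; the position and target live on each Guard.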

Back to OwnerBase (the actual classes have different names; these names are for ease of explanation). It has the following members:

1. A list of currently running actions. It's a list of references to all actions that are considered to be processing on this MonoBehaviour. That is, a list of references to stateless singletons, which get their data by taking a reference to the Owner in every method. All logic is processed in Update(). Actions run in Update() and FixedUpdate().
2. A list of unprocessed events. Those are enums that have accumulated during this frame and will be processed at the start of the next frame.
3. Actions that are waiting to be activated. The main logic works by adding actions. When that logic is complete, every added action is first activated (here) and only then added to the list of running actions, unless it opts out through the bool result it returns. The action may complete inside Activate(), in which case there is no need to add it to the list of running actions.
4. A method for setting the owner, which is called from the Owner class. The owner is then passed as a parameter when processing all the actions.
5. Methods to manipulate the running actions in the list. Actions may be added in normal or exclusive mode; in exclusive mode they cause all other actions to terminate. This brings up the same problem as with BTrees: how to handle cancellation. There is no silver bullet, and with this approach the actions are stopped immediately. If there are sounds or animations that must finish over time, the subsystems responsible for them accept the next order, but carry it out only after the current ones have completed gracefully.
6. Methods to manipulate events. While being processed by the main logic, events can either be used at some place or left to "fall through". Only 1 event of each type can be in the list at one time. This is a convention to avoid confusion with multiple identical events and events falling through.

7. MainDecision: this is the abstract method on OwnerBase, and it holds the logic for adding or cancelling running actions. It is important to note that this logic is invoked only on frames where there are unhandled events. This means the logic is refreshed only when there is something new to decide. Any internal system may also fire an event periodically to signal that it is time to re-run the logic, even if nothing else has changed.
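
The bookkeeping members listed above can be sketched like this. It is a simplified Python illustration under stated assumptions, not the author's C# OwnerBase: events are plain strings kept in a set (which gives "only 1 event of each type" for free), exclusive mode terminates only the running actions, and the Action class with its name and instant flag is a hypothetical stand-in.

```python
class Action:
    """Hypothetical stand-in for a stateless action."""

    def __init__(self, name, instant=False):
        self.name = name
        self.instant = instant  # configuration, not per-agent state

    def activate(self, owner):
        owner.log.append("activate " + self.name)
        return not self.instant  # False: finished inside activate()

    def terminate(self, owner):
        owner.log.append("terminate " + self.name)


class OwnerBase:
    def __init__(self):
        self.running = []    # actions currently being processed
        self.pending = []    # added by the decision logic, not yet activated
        self.events = set()  # a set: at most one event of each type
        self.log = []

    def fire(self, event):
        self.events.add(event)

    def use(self, event):
        """Consume an event; True if it was present."""
        if event in self.events:
            self.events.discard(event)
            return True
        return False

    def add(self, action, exclusive=False):
        if exclusive:
            # Assumption: exclusive mode stops every running action at once.
            for other in self.running:
                other.terminate(self)
            self.running.clear()
        self.pending.append(action)

    def activate_pending(self):
        for action in self.pending:
            if action.activate(self):
                self.running.append(action)
        self.pending.clear()
```

For example, adding a "patrol" action and an instant "shout" action leaves only "patrol" in the running list after activate_pending(), and a later exclusive add of "flee" terminates "patrol" before "flee" is activated.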

This logic is, of course, written in the same language as the rest of the system, which means there are no scripting languages, special node types or "compilations from GUI tools". There is also no Composite pattern anywhere. It's the same logic that we use for everything; the difference is that we check the state of the world and the agent and make a decision to start (or even cancel) certain actions.

Since the signature of this method is like System.Action (no parameters and no return value), it is free to call any other logic, in internal or external methods (in the external case the owner has to be supplied). This makes it possible to share logic even between AI agents with partially overlapping AI, something that I would call "AI Inheritance". An example would be having the same logic for a patrolling guard and a stationary guard, without repeating most of the code. Those methods would theoretically take an owner and return a set of actions to MainDecision. (Note: This is also possible for both FSMs and BTs.)
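
A minimal sketch of this sharing, in Python and with hypothetical names throughout (react_to_intruder, the guard classes, and the action names, which are plain strings here to keep the example short): an external function takes an owner and returns the actions to start, and two different agents call it from their own main decision.

```python
def react_to_intruder(owner):
    """Shared decision logic: returns the actions to start, if any."""
    if owner.sees_intruder:
        return ["RaiseAlarm", "Chase"]
    return []


class PatrollingGuard:
    def __init__(self, sees_intruder=False):
        self.sees_intruder = sees_intruder
        self.to_activate = []

    def main_decision(self):
        # Shared part first, then the behaviour specific to this agent.
        self.to_activate = react_to_intruder(self) or ["Patrol"]


class StationaryGuard:
    def __init__(self, sees_intruder=False):
        self.sees_intruder = sees_intruder
        self.to_activate = []

    def main_decision(self):
        self.to_activate = react_to_intruder(self) or ["StandWatch"]
```

Both guards react to an intruder in the same way, and differ only in what they do when nothing is happening.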

So, here's how it all works:

When the game starts, the Start event is fired. This lets the machinery know we are starting from scratch.
MainDecision is called, which sets up all the actions required for the agent to start operating.
MainDecision adds actions to the list of actions to be activated.
All unused events that have "fallen through" MainDecision are cleared. (In this case, it's the Start event.)
Actions that are awaiting activation are Activate()-ed.
The actions in the active list are processed for Update(). This means that an action that has just been added to the list will be Activate()-ed and Update()-ed once within the same frame.
In FixedUpdate(), all running actions run their FixedUpdate method.
Actions run and fire events. Optionally, other systems fire events too. On any frame where there is at least one event, the logic is called again, first thing in the frame.
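
The steps above can be sketched as a small frame loop. This is an illustrative Python sketch, not Unity code: update() and fixed_update() are called by hand instead of by the engine, events are plain strings, and the Wait action, the timer field and the decisions counter are hypothetical additions used only to show when the decision logic runs.

```python
class Wait:
    """Stateless action: counts down owner.timer, then fires 'wait_done'."""

    def activate(self, owner):
        owner.timer = 2
        return True

    def update(self, owner):
        owner.timer -= 1
        if owner.timer == 0:
            owner.fire("wait_done")
            owner.running.remove(self)

    def fixed_update(self, owner):
        pass


WAIT = Wait()  # single shared instance


class Agent:
    def __init__(self):
        self.running, self.pending, self.events = [], [], set()
        self.timer = 0
        self.decisions = 0  # how many times main_decision ran
        self.fire("start")  # we are starting from scratch

    def fire(self, event):
        self.events.add(event)

    def main_decision(self):
        self.decisions += 1
        if "start" in self.events or "wait_done" in self.events:
            self.pending.append(WAIT)

    def update(self):
        if self.events:  # decide only when something changed
            self.main_decision()
            self.events.clear()  # drop events that fell through
            for action in self.pending:
                if action.activate(self):
                    self.running.append(action)
            self.pending.clear()
        for action in list(self.running):  # copy: actions may remove themselves
            action.update(self)

    def fixed_update(self):
        for action in list(self.running):
            action.fixed_update(self)
```

Running four frames of this sketch calls main_decision only twice: once for the start event and once after the first Wait announces completion. On the other frames the agent just processes its running actions.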

In any approach based on decision trees there is a bit of an issue with sequential actions. These could introduce excessive checking and extra state variables. One solution is to allow only single actions and to merge every would-be sequence into one action, but that would greatly reduce the reusability of actions. I have opted for another solution: OwnerBase has an additional dictionary of following actions. When we add actions, we can also register following actions for any of them. When an action announces completion, we check the dictionary to see if there is a following action for it, and if there is, we immediately activate it and add it to the list of running actions. We have to be careful to empty this dictionary whenever we clear the running actions list.
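
A minimal sketch of the following-actions dictionary, in Python and with hypothetical names (Step, the then parameter, the log list). For brevity it activates actions directly instead of going through a pending list, and completion is announced on the owner rather than by the action itself.

```python
class Step:
    """Hypothetical action that just records its activation."""

    def __init__(self, name):
        self.name = name

    def activate(self, owner):
        owner.log.append(self.name)
        return True


class Owner:
    def __init__(self):
        self.running = []
        self.following = {}  # action -> action to start when it completes
        self.log = []

    def add(self, action, then=None):
        """Start `action`; optionally chain follow-up actions after it."""
        if action.activate(self):
            self.running.append(action)
        previous = action
        for follower in (then or []):
            self.following[previous] = follower
            previous = follower

    def announce_completion(self, action):
        self.running.remove(action)
        follower = self.following.pop(action, None)
        if follower is not None and follower.activate(self):
            self.running.append(follower)

    def clear(self):
        # Emptying the running list must also empty the dictionary,
        # otherwise stale followers could start later.
        self.running.clear()
        self.following.clear()
```

With this sketch, owner.add(walk, then=[open_door, enter]) runs the three steps one after another as each completion is announced. One limitation of keying the dictionary by action is that the same action cannot appear twice in one chain.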

Benefits of this approach:

The logic used is the same logic used everywhere. As mentioned, there are no specialized nodes, GUI tools or other additions. The logic is simple to understand and easy to debug.
Small memory footprint: There are no separate memory instances for the logic, only 1 instance for each action singleton, in addition to 1 object for each Owner.
It's faster than a BT. At best, a BT implementation can be as fast as this one, provided it implements action lists, is event-based, and applies every memory optimization the language allows. It will still be marginally slower because it makes many virtual function calls instead of running nested if-s.
