DeepSim: automated controller design
(A Deep Reinforcement Learning based platform for designing embedded or cloud based controller software for a variety of applications - from electromechanical systems to business processes)
What is a controller?
Traditionally, a controller is a module placed in a feedback loop with a system and designed to achieve optimal performance. As illustrated in the left panel of the figure below, the output of the system (y) is compared with a reference signal (r), and the difference is fed back to the controller as an error term (e). The controller tries to drive the error to zero by adjusting its output (u).
In a Reinforcement Learning (RL) system, the controller becomes the “agent”, as illustrated in the right panel of the figure below. The reference and error signals of a traditional controller are replaced by the “reward” function (Rt).
Figure 1. Left: a traditional controller. Right: a reinforcement learning system training the controller.
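As a minimal sketch of the two views in Figure 1, the Python below closes a feedback loop around a toy plant with a proportional controller and then expresses the same goal as a reward. The integrator plant, the gain kp, and the reward -|r - y| are illustrative assumptions and are not taken from DeepSim.

```python
def plant_step(y, u, dt=0.1):
    """Toy integrator plant (assumed): the output changes at a rate set by u."""
    return y + dt * u


def proportional_controller(e, kp=5.0):
    """Traditional controller: the output u is proportional to the error e."""
    return kp * e


def reward(r, y):
    """RL view: reference and error are folded into a single scalar reward."""
    return -abs(r - y)


def run_feedback_loop(r=1.0, steps=200):
    y = 0.0
    for _ in range(steps):
        e = r - y                        # error: reference minus system output
        u = proportional_controller(e)   # controller output
        y = plant_step(y, u)             # system responds to the control input
    return y


final_y = run_feedback_loop()
print(f"final output y = {final_y:.3f}, reward = {reward(1.0, final_y):.3f}")
```

In the traditional loop the designer chooses the control law directly. In the RL setting only the reward is specified, and the agent has to discover a control law that maximizes it.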
What is DeepSim?
DeepSim is a versatile platform which can be used to generate controller software for almost any optimization problem where a simulator is available. In the absence of a simulator the platform can also use historical data about the environment being controlled. Hardware in the Loop (HIL) systems that are robust enough to allow for some trial and error can also be used to train the controller.
DeepSim can be used to generate a SW controller for any system with sensors and actuators. The sensors are inputs to the controller and the controller decides how to trigger the actuators so that optimum performance can be achieved when the system is operational. Optimum performance is specified by the designer during the RL training process using the “reward function”.
DeepSim can also be used to generate intelligent agents for business process optimisation tasks. In this case the inputs to the agent could be the detailed information about the current state of the business process and the output could be a recommendation about what action to take next. The agent could be trained on a simulation of the process or historical data and logs to optimise for a certain outcome as specified in the “reward function”. Very sophisticated reward functions can be specified that try to optimise for complex tradeoffs.
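To make the idea of a reward function that balances tradeoffs concrete, here is a small hypothetical sketch. The objective names (energy, delay, safety violations) and the weights are assumptions chosen for illustration only; a real project would define its own terms.

```python
def tradeoff_reward(energy_used_kwh, delay_s, safety_violations,
                    w_energy=1.0, w_delay=0.1, w_safety=100.0):
    """Hypothetical reward: a weighted penalty over competing objectives.

    A higher (less negative) value is better. The weights express how much
    of one objective the designer is willing to give up for another.
    """
    return -(w_energy * energy_used_kwh
             + w_delay * delay_s
             + w_safety * safety_violations)


# A slower, safe outcome scores better than a faster one with a violation.
safe_but_slow = tradeoff_reward(energy_used_kwh=2.0, delay_s=10.0, safety_violations=0)
fast_but_unsafe = tradeoff_reward(energy_used_kwh=1.0, delay_s=5.0, safety_violations=1)
print(safe_but_slow > fast_but_unsafe)
```

Changing the weights changes which behaviour the trained agent prefers, which is how the designer steers the optimisation.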
Why do our customers need DeepSim?
Today we are witnessing the convergence of the following technology trends:
Autonomous driving: land, water and sky.
Increasing complexity: design and performance requirements and simulation capabilities.
Smart environments: intelligent edge devices enabling automation of homes to factories and cities.
Big data: availability of big data from a multitude of sensors and large networked systems.
These technology trends are driving the need for AI agents which can perform sophisticated decision making, acting autonomously and often in real time. Electrification and autonomy are driving the emergence of complex and highly variable designs, e.g. today there are more than 1,000 different drone designs in commercial production. The design of vehicles is also undergoing fundamental changes not seen since the invention of the internal combustion engine. And finally, these smart environments and vehicles are performing ever more complex and diverse jobs which are too dangerous or even impossible for human agents.
The availability of big data from a multitude of on-device sensors (traffic, enterprise business networks, etc.) enables the autonomous design of intelligent controllers using DeepSim. Such sensor data is used for training our models to make efficient decisions in a very high dimensional space. With traditional methods, many inputs and outputs remain unoptimized because the complexity is beyond human capacity.
As a result of the above, the manufacturers of these vehicles and factories are facing an enormous challenge to rapidly design, test and deploy smart controllers for optimal control under diverse and challenging circumstances. Such controllers are very complex:
Need to monitor a large number of input sensors
Need to drive actuators (make decisions) based on those inputs to optimise performance criteria such as range, battery life, and passenger or load safety.
Need to be updated regularly to deal with new designs and changing environments and use cases.
DeepSim was created to offer our customers a comprehensive platform for solving the above challenges, and to establish a fundamentally new methodology for the automatic generation of embedded control software.
In summary, these are the limitations of traditional methods for embedded software controllers:
Difficult to model when transfer functions are not defined
Models become unstable when the control space becomes large
Heuristics require a lot of manual labor
Benefits of Deep Reinforcement Learning based controllers:
A system modeled by layers of linear combinations followed by non-linear activation functions can represent:
Logical systems, alone or in combination with a linear system
Time series, via techniques such as LSTMs and other recurrent neural networks (RNNs)
Can handle large input and output space dimensionality
Do not require knowledge about transfer function or heuristic models
Can all be taught with the help of training examples
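The claim that layers of linear combinations followed by non-linear activations can represent logical systems can be illustrated with a very small network. The sketch below computes XOR with two layers; the weights are hand-picked for illustration (an assumption), whereas in practice they would be learned from training examples.

```python
def dense(inputs, weights, biases):
    """One layer: each output is a linear combination of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linear activation: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


# Hand-picked (not trained) weights that make the network compute XOR.
W1 = [[1.0, 1.0], [1.0, 1.0]]
B1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]
B2 = [0.0]


def xor_net(x1, x2):
    hidden = relu(dense([x1, x2], W1, B1))   # linear layer + non-linearity
    return dense(hidden, W2, B2)[0]          # linear output layer


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Without the non-linear activation the two layers would collapse into a single linear map, which cannot represent XOR.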
DeepSim: applicability and use cases
The following figure illustrates the wide applicability of the DeepSim platform.
In the above figure the x-axis represents the frequency with which the agent has to make decisions, from fast (milliseconds, on the left) to slow (hours or days, on the right). Fast agents are typically deployed in embedded real-time controllers, for example in vehicles and drones. Slow agents typically operate at human time scales and can be hosted as a SaaS service in the cloud that is queried to make decisions.
The y-axis represents the complexity involved in the agent's decision making process. Simple agents with low complexity typically monitor few (< 10) input signals and control a few (< 5) actuators; this is the lower region of the plot. On the high side of the y-axis we show highly complex systems with tens to hundreds of input signals and dozens of actuators or output signals.
DeepSim-like tools are urgently needed for applications in the upper two quadrants and the lower left quadrant. DeepSim has been designed to serve this market need and to be efficient in those three quadrants. An overview of the verticals where DeepSim is applied, or can be applied, is shown in the figure below.
DeepSim for Cloud based Compute Environments
Reinforcement Learning is a machine learning technique that takes information from an environment and teaches an agent how to control a device or system and optimize a desired outcome. Once the agent has learned the optimal control rules it can be deployed in the real world.
To teach an agent, two major components are required:
An interactive training environment that can approximate the real world condition.
A training component that trains the agent on how to behave in this environment.
We’ll discuss some additional details on these components below.
The environment, which is most often a simulator, is a virtual representation of the real environment in which the agent ultimately has to operate. This environment could be a chess board in a chess program if the goal is to teach the agent to play chess. A more complex example would be a simulation of an intersection in a busy city if the goal is to teach the agent (an autonomous car) how to navigate the intersection without running into any objects while obeying all traffic rules.
The type of environment, its complexity, and its required degree of accuracy dictate the type and amount of compute power needed for a suitable simulation. In the chess example the compute requirement is low, since the visuals are basic and the rules of the game are relatively simple. The intersection example, on the other hand, requires a lot of computational power. You have to calculate the physics of all the cars in the intersection (friction on the road, braking power, acceleration performance, etc.), model possible pedestrian behaviour (which requires crowd simulation methods), and render the scene as accurately as possible so the agent learns how to interact with the world as accurately as possible. Rendering includes expensive graphical effects such as shadows, reflections, obstructions, partially visible objects seen through windows, weather, and road signs, and is often accelerated by dedicated graphics processing units (GPUs).
The above example illustrates that the more complex the environment and the tasks taught to the agent, the greater the required compute power.
The agent has to be taught which actions to take in a dynamic environment, including in scenarios it has not seen before. To do so it needs enough intelligence to adapt and generalize beyond the situations it encountered during the original training process. This intelligence is provided by a neural network. Neural networks form the backbone of many of today's advances in machine learning and artificial intelligence, such as image recognition, speech recognition and synthesis, and natural language processing. Training these networks is computationally intensive, as they consist of millions of parameters arranged in multiple (sometimes over a hundred) elements called layers. Passing information from one layer to the next is commonly done using matrix multiplication operations. The high floating-point (FLOP) requirements mean that neural network engineers often use dedicated workstations with the latest generation Intel or AMD CPUs, or use GPUs for mathematical calculations instead of rendering. This compute requirement has sparked an explosion of dedicated chips optimized for these types of operations, and these chips are often installed in large scale clusters. It is not uncommon to see cutting edge research that requires hundreds or thousands of chips to train a neural network within a reasonable amount of time.
These demanding compute requirements are why reinforcement learning has only recently gained popularity. Not only do you have to train a neural network, you also have to use an environment that accurately resembles the real world in which the agent will operate.
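The two components described above, an interactive environment and a training component, can be sketched in a few lines. The example below is a deliberately tiny stand-in: the corridor environment, the tabular Q-learning trainer (used here in place of a neural network), and all parameter values are assumptions for illustration and do not reflect how DeepSim is implemented.

```python
import random


class CorridorEnv:
    """Toy environment: walk along cells 0..4; reaching cell 4 ends the episode."""

    GOAL = 4

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 = move left, action 1 = move right (clamped to the corridor)
        move = 1 if action == 1 else -1
        self.pos = min(self.GOAL, max(0, self.pos + move))
        done = self.pos == self.GOAL
        reward = 1.0 if done else -0.01   # small step cost favours short paths
        return self.pos, reward, done


def greedy_action(q, state):
    """Pick the action with the higher value (ties go to 'left')."""
    return 1 if q[state][1] > q[state][0] else 0


def train(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Training component: tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.GOAL + 1)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(50):               # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)          # explore
            else:
                action = greedy_action(q, state)   # exploit
            next_state, reward, done = env.step(action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_rollout(env, q, max_steps=20):
    """Run the agent without exploration; return steps to the goal or None."""
    state = env.reset()
    for steps in range(1, max_steps + 1):
        state, _, done = env.step(greedy_action(q, state))
        if done:
            return steps
    return None


env = CorridorEnv()
q_table = train(env)
print("steps to goal with the trained agent:", greedy_rollout(env, q_table))
```

Even in this toy case the agent needs hundreds of simulated episodes. With a realistic simulator and a neural network in place of the table, both the environment and the training side become far more expensive, which is the point made above.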
The need for DeepSim
Running a single training scenario of a reinforcement learning algorithm for an RL agent can be done on hardware ranging from a laptop to large scale clusters. However, it can take a long time before an agent is sufficiently trained on a laptop system because the compute envelope is much smaller than that of a large scale cluster. This is where DeepSim’s core technology adds significant value since, for production quality neural network agents, you do not train an agent on only a single scenario. You must train the agent on a range of scenarios and evaluate a wide range of training (hyper) parameters to get an optimal controller (trained agent). This requires many runs, and executing all those runs on a single laptop, or even a workstation with a powerful GPU, will take too long to produce meaningful results.
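A rough back-of-the-envelope sketch shows how quickly the number of runs grows. All values below (scenario names, hyperparameter values, hours per run, node count) are assumptions picked for illustration.

```python
import itertools
import math

# Hypothetical sweep: every value below is an assumption for illustration.
scenarios = ["city", "highway", "rain", "night", "hills"]
learning_rates = [1e-4, 3e-4, 1e-3]
batch_sizes = [64, 128, 256]
seeds = [0, 1]

# One training run per combination of scenario, hyperparameters and seed.
runs = list(itertools.product(scenarios, learning_rates, batch_sizes, seeds))

hours_per_run = 2.0   # assumed wall-clock time of one training run
nodes = 30            # assumed number of cloud nodes running in parallel

serial_hours = len(runs) * hours_per_run
parallel_hours = math.ceil(len(runs) / nodes) * hours_per_run

print(f"{len(runs)} runs: {serial_hours:.0f} h on one machine, "
      f"{parallel_hours:.0f} h on {nodes} nodes")
```

Under these assumptions a modest sweep of 90 runs takes 180 hours on a single machine but 6 hours when spread over 30 nodes.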
DeepSim manages the underlying cloud infrastructure and scales the number of compute nodes dynamically. It schedules the required training runs on the compute infrastructure, tracks their progress, and adds resources if required. By monitoring the training runs, the platform can stop runs that show little or no training progress, freeing up resources for additional runs. This decreases training time and cost. DeepSim also takes advantage of the large number of compute SKUs available and selects the most suitable SKU for a given simulator. For example, if a simulator cannot take advantage of GPU processing power there is no need to use a GPU SKU, and a simulator that benefits from GPU acceleration is placed on a GPU SKU.
Combined with the hyper-parameter optimizer (HPO) and neural network architecture search (NAS) extensions of the platform, the number of runs to manage and launch in parallel becomes even larger. These features reduce the workload on the neural network engineer and data scientist by shifting the selection of parameters and neural network architectures to the cloud based intelligence that comes with DeepSim. By replacing engineering hours with compute resources you can reach a better solution faster, which would not be possible with access to local workstations only.
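The behaviour described above can be sketched with two small helper functions. This is a hypothetical illustration, not DeepSim's actual scheduling logic: the stopping rule (compare the mean reward of two consecutive windows), the SKU names, and the prices are all assumptions.

```python
def should_stop(rewards, window=5, min_improvement=0.01):
    """Flag a run whose mean reward has stopped improving (assumed rule)."""
    if len(rewards) < 2 * window:
        return False                      # not enough history to judge yet
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    return recent - previous < min_improvement


def pick_sku(simulator_uses_gpu, skus):
    """Choose the cheapest SKU whose GPU capability matches the simulator."""
    candidates = [s for s in skus if s["gpu"] == simulator_uses_gpu]
    return min(candidates, key=lambda s: s["price_per_hour"])


# Hypothetical SKU catalogue; names and prices are made up for illustration.
SKUS = [
    {"name": "cpu-small", "gpu": False, "price_per_hour": 0.10},
    {"name": "cpu-large", "gpu": False, "price_per_hour": 0.40},
    {"name": "gpu-single", "gpu": True, "price_per_hour": 1.20},
]

stalled_run = [0.5] * 10
improving_run = [0.1 * i for i in range(10)]
print(should_stop(stalled_run), should_stop(improving_run))
print(pick_sku(False, SKUS)["name"], pick_sku(True, SKUS)["name"])
```

A production scheduler would weigh more signals, but the principle is the same: resources held by stalled runs are released, and no run pays for hardware its simulator cannot use.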
The DeepSim platform will scale down compute resources when no runs are being performed to eliminate idle time charges. This dynamic scaling is something you cannot do with a laptop, workstation or dedicated on-prem cluster which is always active and costs money when not in use.
Future DeepSim releases will integrate our patented intelligent cost optimization feature. This feature uses a neural network trained by DeepSim itself to allocate compute resources intelligently. By weighting the requirements provided by the user (e.g. time-to-solution, maximum number of simulator licenses, and compute budget), DeepSim will find the most cost-effective solution possible. This enables customers to balance neural network performance against budget or against restrictions on the number of licenses or parallel simulator runs that can be used. DeepSim allocates the optimal compute resources for the given budget. For example, with open source simulators the optimal plan could use many small and cheap compute nodes in parallel. For simulators with expensive licensing models, on the other hand, DeepSim could allocate high performance compute nodes to run the simulations in a shorter time. With this feature DeepSim will be able to design a controller for any budget.
In short, the DeepSim platform enables a seamless transition from toy examples, previously run on laptops or workstations, to large scale production training projects, all on the same platform. Because compute resources are scheduled dynamically, you do not accrue additional costs while you are still in the early stages of development.
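The tradeoff between many cheap nodes and a few fast ones can be illustrated with simple arithmetic. The sketch below is not the patented feature and uses no neural network; it is an exhaustive comparison of two hypothetical plans, assuming one simulator license per node and made-up prices and speeds.

```python
def plan_cost(option, work_hours, license_per_hour):
    """Return (wall_time_h, total_cost) for running the workload on one option."""
    throughput = option["nodes"] * option["speed"]
    wall_time = work_hours / throughput
    # Assumption: each node needs one simulator license while it is running.
    hourly = option["nodes"] * (option["price_per_hour"] + license_per_hour)
    return wall_time, wall_time * hourly


def cheapest_plan(options, work_hours, license_per_hour, deadline_h, max_licenses):
    """Pick the cheapest option that meets the deadline and the license cap."""
    feasible = []
    for opt in options:
        wall_time, cost = plan_cost(opt, work_hours, license_per_hour)
        if wall_time <= deadline_h and opt["nodes"] <= max_licenses:
            feasible.append((cost, opt["name"]))
    return min(feasible)[1] if feasible else None


# Hypothetical compute options; all numbers are made up for illustration.
OPTIONS = [
    {"name": "many-small", "nodes": 100, "speed": 1.0, "price_per_hour": 0.10},
    {"name": "few-large", "nodes": 10, "speed": 4.0, "price_per_hour": 1.00},
]

open_source = cheapest_plan(OPTIONS, 400, 0.0, deadline_h=12, max_licenses=100)
licensed = cheapest_plan(OPTIONS, 400, 1.0, deadline_h=12, max_licenses=100)
print(open_source, licensed)
```

With no license cost the many small nodes win; once every node carries an hourly license fee, fewer and faster nodes become cheaper overall, which mirrors the two cases described above.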
DeepSim: use case examples
DeepSim improves hybrid car range with a software update
For electric and hybrid vehicle manufacturers the top key performance indicator (KPI) is range. The minds.ai DeepSim platform was used to create controller software which managed the power source (IC engine vs battery). One mode maximized economy and the other maximized performance. Both modes resulted in less fuel consumption and higher battery charge vs. default values.
DeepSim RL-Builder uses deep reinforcement learning to create embedded controller software (DeepSim Solutions) to improve performance across many use cases and verticals while also reducing R&D time.
For more details see the project brief.