Microsoft Maluuba teaches management 101 to machines in its first paper since being acquired

In mid-January, the ongoing race for AI put Montreal-based Maluuba on our radar. Microsoft acquired the startup and its team of researchers to build better machine intelligence tools for analyzing unstructured text to enable more natural human computer interaction — think bots that can actually respond with reasonable intelligence to a text you send. The team dropped its first paper since being acquired and it sheds light on the group’s priorities.

The paper outlines a method for multi-advisor reinforcement learning that breaks problems down to be simpler and more easily computable. In oversimplified terms, Maluuba is effectively trying to teach leadership to groups of machines working to solve problems.


Existing conversational interfaces are rigid and easily broken. Siri, Alexa and Cortana are miles ahead of old-fashioned dialog trees, but they still are a far cry from generalized intelligence. From a computational standpoint, a complete model of the world would be infeasible to create so instead engineers create specialized machine intelligence tools that can perform well on a smaller number of tasks. This is why you can ask Siri to make a phone call but can’t ask it to organize a large dinner event.

A lot of attention is being given to reinforcement learning, a specialized branch of machine learning. As I have explained previously, reinforcement learning steals the idea of utility from economists in an effort to quantify and iteratively evaluate decision making. Instead of explicitly telling an autonomous car every rule of the road, it can be more effective to gamify the problem and assign figurative “points” that the intelligent system can optimize. The system could hypothetically lose points for driving over a double yellow line and gain them for maintaining the speed limit.

This allows for a much more adaptable system, but it unfortunately is still a rather complex problem requiring a lot of compute. This is where multi-advisor reinforcement learning comes in.

  1. IMG_2149

  2. IMG_5356

  3. IMG_2183


The Maluuba team is trying to solve these complexity problems that face reinforcement learning. Their approach is to use multiple “advisors” to break the problem down into smaller, more digestible, chunks. Traditionally, a single virtual agent is used for reinforcement learning but in recent years multi-agent approaches have become more common.

In a conversation, the group presented the example of an intelligent scheduling assistant. Rather than have a single agent learn to schedule every kind of optimal meeting, it could someday make sense to assign a different agent to different classes of meetings. The challenge is getting all these agents to work together in consonance.

Intuitively it’s easy to imagine these agents as humans splitting up a task. Getting people to work together efficiently is no small task even though a divide and conquer strategy can outperform the lone wolf mentality.

The solution is to have an aggregator sit on top of all the “advisors” to make a decision. Each advisor in Maluuba’s paper has a different focus with respect to the grand problem being solved. Each agent gets a different reward for the action it specializes in. If agents take different positions, the aggregator steps in and arbitrates.

Maluuba used a simplified version of Ms. Pac-Man, called Pac-Boy to test different methods for it multi-advisor reinforcement aggregating learning framework. The team wants to study the process of breaking down problems. Ideally there is some universality in how problems can be organized around a number of optimal aggregators. This is another place where it’s interesting to think about how humans decompose problems, often inefficiently — think leadership 101 for machines.

Why you should care

Multi-advisor reinforcement learning can save CPU and GPU power. Breaking down a problem also makes it more easily distributed to different servers for paralyzed processing. Reduced complexity is universally helpful for all reinforcement learning problems.

The research team explained to me that it’s still early days for working alongside Microsoft. They’re transitioning to Azure and building out communication channels between exiting machine learning teams. But when that process is complete, it strikes me that Maluuba will play a huge role in analyzing text and the language held within it.

While reinforcement learning itself isn’t novel, Maluuba is pouring a lot of resources into it. We have already seen the potential of reinforcement learning in DeepMind’s AlphaGo. Future joint research projects could bring more efficient and adaptable reinforcement learning into new consumer and enterprise facing dialog products for Microsoft.

Leave a Reply

Your email address will not be published. Required fields are marked *