We investigate whether policies learnt by agents trained with the Off-Belief Learning (OBL) algorithm in the multi-player cooperative game Hanabi, in the zero-shot coordination (ZSC) setting, are invariant across symmetries of the game, and whether any conventions formed during training are arbitrary or natural. We do this by performing a convention analysis of the agent's action matrix, introducing a novel technique we call Intervention Analysis to estimate whether the learnt policies take equivalent actions on isomorphic game states, and finally evaluating whether our observed results also hold in a simplified version of Hanabi we call Mini-Hanabi.
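The core check behind an intervention-style analysis can be sketched in a few lines: apply a symmetry of the game (e.g. a permutation of colors) to a state, and test whether the policy's action commutes with that permutation. Everything below is an illustrative toy, not the actual OBL/Hanabi codebase: the state representation, `COLORS`, and the example policies are all hypothetical stand-ins.

```python
from itertools import permutations

# Toy setup: a "state" is a tuple of card colors, and a policy maps a
# state to a color-indexed action. All names here are illustrative.
COLORS = ["red", "green", "blue"]

def permute_state(state, mapping):
    """Apply a color permutation (a game isomorphism) to a state."""
    return tuple(mapping[c] for c in state)

def permute_action(action, mapping, colors=COLORS):
    """Relabel a color-indexed action under the same permutation."""
    return colors.index(mapping[colors[action]])

def symmetry_invariant(policy, state, colors=COLORS):
    """Check the policy commutes with every color permutation sigma:
    policy(sigma(state)) == sigma(policy(state))."""
    base_action = policy(state)
    for perm in permutations(colors):
        mapping = dict(zip(colors, perm))
        permuted = policy(permute_state(state, mapping))
        if permuted != permute_action(base_action, mapping):
            return False
    return True

# A symmetric toy policy: hint the color of the first card in hand.
def hint_first_card(state):
    return COLORS.index(state[0])

print(symmetry_invariant(hint_first_card, ("green", "red")))  # True
```

A policy that always hints "red" regardless of the state would fail this check, since its action does not transform with the permutation; that kind of asymmetry is what distinguishes an arbitrary convention from a natural one.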

Joint work with William Zhang for the course project of 15-784: Foundations of Cooperative AI.


Link to our paper.