The Big Match with a clock and a bit of memory.
We show that in the classic game "the Big Match", Player 1 has, for every epsilon > 0, an epsilon-optimal strategy that depends only on the clock and just two states of memory. The result generalizes to absorbing games with either finite or compact action sets, and to repeated games with symmetric incomplete information.
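For reference, the Big Match (introduced by Gillette and solved by Blackwell and Ferguson) is the absorbing matrix game below, in one common normalization; starred entries are absorbing, and the value of the game is 1/2.

```latex
% The Big Match: Player 1 (maximizer) chooses a row, Player 2 a column.
% Starred entries are absorbing: once Player 1 plays the bottom row,
% the stage payoff is fixed forever.
\[
\begin{array}{c|cc}
     & L   & R   \\ \hline
  T  & 1   & 0   \\
  B  & 0^{*} & 1^{*}
\end{array}
\]
```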
Joint work with Rasmus Ibsen-Jensen and Abraham Neyman.
Solving Two-State Markov Games with Incomplete Information on One Side.
We study the optimal use of information in Markov games with incomplete information on one side and two states. We provide a finite-stage algorithm for calculating the limit value as the gap between stages goes to 0, and an optimal strategy for the informed player in the limiting game in continuous time.
This limiting strategy induces an epsilon-optimal strategy for the informed player, provided the gap between stages is small.
This is joint work with Eilon Solan and Catherine Rainer.
John Fearnley, University of Liverpool, United Kingdom
A Tour of the Complexity Classes Between P and NP.
Simple stochastic games, discounted games, mean-payoff games, and parity games are some of the simplest problems in NP intersect co-NP that are not known to lie in P. Over the past thirty years, a number of complexity classes between P and NP have been proposed, and many of them contain the game problems listed above. In this talk I will give a tour of some of these classes and the game problems that they capture, including PPAD (bimatrix games), PLS (congestion games), and CLS (network coordination games). I will finish by describing a new complexity class called UEOPL, which is the closest to P among all the classes discussed, and also the closest to classifying the game problems mentioned above.
János Flesch, School of Business and Economics, Maastricht University, The Netherlands
Simplifying optimal strategies in limsup and liminf stochastic games.
We consider two-player zero-sum stochastic games with the limsup payoff. In these games it is known that the players may have no optimal strategies at their disposal, due to discontinuity in the payoffs. We prove two simplification results: (1) If player 1 (the maximizing player) has an optimal strategy then he also has a stationary optimal strategy. Our construction does not require the knowledge of an optimal strategy, only its existence. (2) If player 2 (the minimizing player) has a subgame-optimal strategy (i.e. a strategy that is optimal in every subgame) then he also has an optimal strategy that only requires finite memory. The proofs rely on considering restrictions on the strategy spaces and techniques that originate in gambling theory. Our results apply directly to stochastic games with the liminf payoff.
This is joint work with Arkadi Predtetchinski and William Sudderth.
Tristan Garrec, Toulouse School of Economics, France
Communicating Zero-Sum Product Stochastic Games.
We study two classes of zero-sum stochastic games with compact action sets and a finite product state space. Both classes assume a communication property on the players' state spaces. For games that are strongly communicating on one side, we prove the existence of the uniform value. For games that are weakly communicating on both sides, we prove that the asymptotic value, and therefore the uniform value, may fail to exist.
Hugo Gimbert, LaBRI, Université de Bordeaux, France
Games with imperfect information and finite duration.
Two player poker (a.k.a. "heads-up poker") is an example of a zero-sum game of imperfect information with finite duration. It has been studied since the early steps of game theory, notably by Emile Borel and John von Neumann. Nowadays, algorithms rule this game: the AI "Libratus" of Carnegie Mellon University has recently racked up over $1.7 million worth of chips against four of the top professional poker players.
In this talk we will review classical results about games with imperfect information and finite duration, including the complexity of games without perfect recall and the PTIME algorithm of Koller, Megiddo and von Stengel for games with perfect recall. We will present new results as well.
Marion Hallet, UMONS - Université de Mons, Belgium
Games with imperfect information and finite duration.
In game theory, we can distinguish a static approach, where all players choose their strategies and the game is played once, from a dynamic approach, where the players play repeatedly, updating their respective strategies after each play, in order to improve the outcome at the next play. In this talk, we consider a dynamic approach on sequential games and games played on graphs. Our goal is to obtain conditions that guarantee the termination of the dynamics; we also study the nature of the obtained stable profiles. Recently, we have started to investigate the relations between our work and computer networking.
Sebastian Junges, RWTH Aachen University, Germany
Parameter Synthesis for Markov Chains.
Probabilistic Model Checking extends Model Checking to Markov chains, and allows the analysis of questions like “what is the probability to reach a target?” or “what is the expected time until a failure occurs in the system?”. Naturally, the analysis crucially depends on the probabilities in the model. However, precise probabilities are often unknown or unclear at design time. We motivate the problem with several application areas for parameter synthesis, give an overview of the research landscape, and highlight some recent advances.
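As a purely illustrative, hedged sketch of the kind of question involved (an invented three-state chain, not the tools surveyed in the talk): in a parametric Markov chain the reachability probability is a rational function of the unknown transition probability, and one synthesis question is for which parameter values it meets a threshold.

```python
def reach_probability(p):
    """Probability of reaching `goal` from `s0` in an invented toy chain:
    s0 -> s1 with prob p, s0 -> fail with prob 1-p,
    s1 -> goal with prob p, s1 -> s0 with prob 1-p.
    Solving x0 = p*x1 and x1 = p + (1-p)*x0 gives a rational function of p."""
    return p * p / (1 - p * (1 - p))

def synthesize_threshold(target=0.5, tol=1e-9):
    """Smallest p (found by bisection) with reach_probability(p) >= target.
    Bisection is valid here because the reachability probability is
    increasing in p for this particular chain."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if reach_probability(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

print(round(reach_probability(0.5), 6))   # 1/3: even a fair coin often fails
print(round(synthesize_threshold(0.5), 4))  # ~0.618 guarantees reaching 50%
```

Real tools compute such rational functions symbolically over whole models; the closed form above was worked out by hand for this two-equation example.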
Antonin Kucera, Masaryk University, Czech Republic
Game-Theoretic Models of Patrolling Problems.
Patrolling games are a special variant of security games where the Defender (e.g., a police patrol) moves among vulnerable targets and tries to discover possible intrusions. The Defender's moves are constrained by the environment (streets, passages, etc.) specified as a directed graph, where the targets form a subset of the vertices. The task is to construct a strategy determining the moves of the Defender so that the overall protection is maximized. In adversarial patrolling, it is further assumed that the Attacker knows the Defender's strategy and can observe his moves, which leads to Stackelberg equilibrium as a natural solution concept.
The talk covers fundamental properties of patrolling games as well as recent results obtained for specific classes of patrolling problems.
Jasmine Maes, School of Business and Economics, Maastricht University, The Netherlands
Subgame optimal strategies in zero-sum stochastic games with tolerance levels.
We study subgame phi-optimal strategies in two-player zero-sum stochastic games with finite action spaces and a countable state space. Here phi denotes the tolerance function, a function which assigns a non-negative tolerated error level to every subgame. A subgame phi-optimal strategy guarantees the value in every subgame within the subgame-dependent tolerance level as given by phi.
First, we provide necessary and sufficient conditions for a strategy to be subgame phi-optimal. As a special case, we obtain a characterization of subgame optimal strategies, i.e. strategies that exactly guarantee the value at every subgame. Second, we present sufficient conditions for the existence of a subgame phi-optimal strategy. Finally, we show the possibly surprising result that the existence of subgame phi-optimal strategies for every positive tolerance function phi is equivalent to the existence of a subgame optimal strategy.
Optimizing Expectation with Guarantees in POMDPs.
A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications.
Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the “expectation” and “threshold” approaches and consider a “guaranteed payoff optimization (GPO)” problem for POMDPs, where we are given a threshold t and the objective is to find a policy σ such that (a) each possible outcome of σ yields a discounted-sum payoff of at least t, and (b) the expected discounted-sum payoff of σ is optimal (or near-optimal) among all policies satisfying (a).
We present a practical approach to tackle the GPO problem and evaluate it on toy POMDP benchmarks.
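To make the two-step structure of (a) and (b) concrete, here is a hedged toy illustration (a one-shot, fully observable choice with invented numbers, far simpler than the POMDP setting of the talk) of how a worst-case threshold can rule out the expectation-maximizing policy:

```python
# Toy illustration of guaranteed payoff optimization (GPO):
# each "policy" here is a single action with a distribution over payoffs
# (all numbers invented for illustration).
policies = {
    "safe":  [(1.0, 1.0)],               # payoff 1 with probability 1
    "risky": [(0.9, 3.0), (0.1, 0.0)],   # expectation 2.7, worst case 0
}

def worst_case(outcomes):
    return min(payoff for _, payoff in outcomes)

def expectation(outcomes):
    return sum(prob * payoff for prob, payoff in outcomes)

def gpo(policies, t):
    """Among policies whose every outcome yields at least t (condition (a)),
    return one with maximal expected payoff (condition (b));
    None if no policy satisfies the guarantee."""
    feasible = {name: o for name, o in policies.items() if worst_case(o) >= t}
    if not feasible:
        return None
    return max(feasible, key=lambda name: expectation(feasible[name]))

print(gpo(policies, t=0.5))  # safe: risky's worst case 0 violates the bound
print(gpo(policies, t=0.0))  # risky: both feasible, higher expectation wins
```

In the actual POMDP setting the guarantee must hold along every play of a history-dependent policy, which is what makes the problem hard; this sketch only shows why the two objectives can pull in opposite directions.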
Arkadi Predtetchinski, School of Business and Economics, Maastricht University, The Netherlands
Subgame perfect epsilon-equilibrium in perfect information games.
Subgame perfect epsilon-equilibrium (epsilon-SPE) is a strategy profile such that, after any history of play, no player has a deviation yielding a payoff increase of more than epsilon. Equivalently, an epsilon-SPE can be defined as a strategy profile that induces an epsilon-Nash equilibrium in each subgame. It is a refinement of epsilon-Nash equilibrium that serves to eliminate non-credible threats.
In this talk, we provide an overview of recent literature on epsilon-SPE in perfect information games. We review both positive and negative results. Positive results include the existence of epsilon-SPE in games with lower semicontinuous payoffs, in games with upper semicontinuous payoffs, in games with qualitative objectives, and in games that are continuous outside a thin set. Negative results include some remarkable counterexamples to existence.
We also briefly review variations of the concept, such as one-deviation strategy profiles, finite-deviation strategy profiles, and weak epsilon-SPE. In essence, these concepts only require robustness to one-shot deviations, or to deviations at finitely many histories. These concepts have proved important for the analysis of epsilon-SPE.
We conclude with some open questions.
Mickael Randour, FNRS & UMONS - Université de Mons, Belgium
Rich behavioural models: illustration on journey planning.
I will illustrate rich behavioural models in games and Markov decision processes through applications to journey planning in an uncertain environment. The shortest path problem asks for a path of minimal length between a source vertex and a destination vertex in a weighted graph. It can be used to model the minimization of travel time between two places in a town where the environment (traffic, weather, etc.) is perfectly predictable. I will first discuss how it can be generalized to more realistic stochastic environments, in which case one usually wants to minimize the expected time to destination, or to maximize the chances of reaching the destination within a given time. To that end, I will use (discrete) Markov decision processes. Then, I will present recent extensions that allow reasoning about richer types of travel strategies (choice of route, means of transport, etc.): strategies that take into account both the length of a journey and its cost, strategies that minimize the expected travel time while providing strong guarantees on the worst-case travel time (i.e., when considering the environment as an opponent in a two-player game), and more. I will focus on the modeling power of the different models through natural application scenarios from everyday life.
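A minimal sketch of the first generalization mentioned above, with an invented commuting scenario: minimizing the expected time to destination in a small Markov decision process by Bellman (value) iteration.

```python
# Invented journey-planning MDP: minimize expected travel time to "work".
# transitions[state][action] = (minutes, [(probability, next_state), ...])
transitions = {
    "home": {
        "bike": (30, [(1.0, "work")]),
        "car":  (15, [(0.8, "work"), (0.2, "jam")]),  # risk of a traffic jam
    },
    "jam": {
        "wait": (30, [(1.0, "work")]),
    },
    "work": {},  # destination: absorbing, no further travel time
}

def value_iteration(transitions, goal="work", sweeps=1000):
    """Expected minimal time-to-goal from each state (Bellman iteration)."""
    V = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s, acts in transitions.items():
            if s == goal or not acts:
                continue
            V[s] = min(
                time + sum(p * V[t] for p, t in succ)
                for time, succ in acts.values()
            )
    return V

V = value_iteration(transitions)
print(V["home"])  # 21.0: driving (15 + 0.2 * 30) beats the 30-minute bike ride
```

The richer strategies of the talk (cost constraints, worst-case guarantees against an adversarial environment) need more than this single scalar value per state; the sketch only covers the plain expectation objective.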
Jean-François Raskin, ULB - Université libre de Bruxelles, Belgium
Assume Admissible Synthesis.
In this talk, I will show how the notion of admissibility can be used to provide an elegant solution to the synthesis problem for reactive systems where the environment is not completely adversarial, or where the system is composed of several components. In particular, I will introduce a novel rule for the synthesis of reactive systems, applicable to systems made of n components, each of which has its own objectives. Contrary to previous proposals in the literature, the rule "assume admissible" defines sets of solutions which are rectangular. This property leads to solutions which are robust and resilient, and allows one to synthesize strategies separately for each component.
Marie Van Den Bogaard, ULB - Université libre de Bruxelles, Belgium
Beyond admissibility: Dominance between chains of strategies.
In this talk, we focus on the concept of rational behaviour in multi-player games on finite graphs, taking the point of view of a player that has access to the structure of the game but cannot make assumptions on the preferences of the other players. In the qualitative setting, admissible strategies have been shown to fit the rationality requirements, as they coincide with winning strategies when these exist, and enjoy the fundamental property that every strategy is either admissible or dominated by an admissible strategy. However, as soon as there are three or more payoffs, this fundamental property does not necessarily hold anymore: one may observe chains of strategies that are ordered by dominance and such that no admissible strategy dominates any of them. Thus, to recover a satisfactory rationality notion (still based on dominance), we depart from the single-strategy analysis approach and consider instead chains of strategies as families of behaviours. We establish a sufficient criterion for games to enjoy a similar fundamental property, i.e., all chains are below some maximal chain, and, as an illustration, we present a class of games where admissibility fails to capture some intuitively rational behaviours, while our chain-based analysis does.
Based on joint work with N. Basset, I. Jecker, A. Pauly and J.-F. Raskin, presented at CSL'18.
Bruno Ziliotto, Université Paris Dauphine, France
Partially Observable Markov Decision Processes and stochastic games with finite memory.
A Partially Observable Markov Decision Process (POMDP) is a discrete-time repeated decision problem where, at each period, the stage payoff depends both on the stage action and on the current state of the world. The state evolves stochastically from one stage to the next. The decision-maker does not know the state, but receives a stream of signals about it.
One example is a cleaning robot that does not know the configuration of the room it is cleaning, but learns it while cleaning. We consider a long interaction, and prove that the decision-maker has approximately optimal strategies with finite memory. This implies that the problem of approximately computing the long-term value of a POMDP is semi-decidable, which is perhaps surprising, since the problem was known to be undecidable. In the second part of the talk, I will consider the same problem in the two-player case (stochastic games), and survey recent results on the subject. Joint work with K. Chatterjee and R. Saona.
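The standard bookkeeping behind "learning the state from a stream of signals" is Bayesian belief updating. A hedged sketch in the spirit of the cleaning-robot example, with invented transition and sensor probabilities:

```python
# Two-state POMDP belief update: the robot never observes the state
# ("dusty" or "clean") directly, only a noisy sensor reading.
# All numbers below are illustrative assumptions.

# P(next_state | state): the room re-dusts a little between stages
transition = {"dusty": {"dusty": 0.9, "clean": 0.1},
              "clean": {"dusty": 0.3, "clean": 0.7}}
# P(signal | state): the dust sensor is right 80% of the time
observation = {"dusty": {"beep": 0.8, "silent": 0.2},
               "clean": {"beep": 0.2, "silent": 0.8}}

def belief_update(belief, signal):
    """One step of Bayesian filtering: predict with the transition kernel,
    then condition on the observed signal and renormalize."""
    predicted = {
        s2: sum(belief[s1] * transition[s1][s2] for s1 in belief)
        for s2 in ("dusty", "clean")
    }
    unnormalized = {s: predicted[s] * observation[s][signal] for s in predicted}
    total = sum(unnormalized.values())
    return {s: w / total for s, w in unnormalized.items()}

belief = {"dusty": 0.5, "clean": 0.5}       # start fully uncertain
for signal in ["beep", "beep", "silent"]:
    belief = belief_update(belief, signal)
print(round(belief["dusty"], 3))  # 0.621: two beeps outweigh one silence
```

The finite-memory result of the talk is about replacing this continuous belief state by finitely many memory states; the sketch only shows the exact belief dynamics being approximated.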