# Quantum Stochastic Walks On Networks For Decision Making

We describe the cognitive state of the agent in a Hilbert space , and we denote a state by |ψ〉. Let the state be definite, then the Schrödinger equation d|ψ(t)〉/dt = −iH|ψ(t)〉 formalizes its time evolution if the system is isolated, where H is the Hamiltonian (a Hermitian operator acting on ) and i2 = −1. Nevertheless, in a general case we do not know the state, so the system has to be described by a mixed state or density matrix ρ. This ρ is a statistical mixture of pure states and formally it is a Hermitian, non-negative operator, whose trace is equal to one. The natural extension of the Schrödinger equation to density matrices is the von Neumann equation

with [H, ρ] the commutator Hρ − ρH.

Furthermore, we consider that the best description for the ‘mind’ of an agent involved in a decision-making process is not as an isolated system, but one subject to some interaction with the environment. Therefore, its evolution is not given by the simple von Neumann equation.

Let our system of interest be a composite of two constituents: M and E, mind and environment. Due to the whole system (mind and environment) being isolated, ρM+E evolves according to the dynamics given in Eq. (1) by definition. From this, we can focus specifically on the state of M if we take partial trace over the Hilbert space of E such that the subsystem of the mind is ρM = TrE(ρM+E)34. Henceforth we drop the subindex M when referring to the state of the mind.

In order to know the equation of motion of ρ, we should take partial trace in Eq. (1), which is generally impossible. However, under the assumption of Markovianity (the evolution can be factorized as given a sequence of instants t0, t1, t2) one can find the most general form of this time evolution based on a time local master equation , with a differential superoperator (it acts over operators) called Lindbladian, embedding the standard form in which any Markovian master equation can be considered, and given by the Lindblad-Kossakowski equation35 such that

Here, H is the Hamiltonian of the subsystem of interest (in our case, the ‘mind’ of the decision-maker), the matrix with elements γ(m, n) is positive semidefinite, L(m, n) is a set of linear operators, and denotes the anticommutator . The meaning of the subindices (m, n) shall become clear later.

The second part in the master equation Eq. (2) contains the dissipative term responsible for the irreversibility in the decision-making process, weighted by the coefficient α such that the parameter α [0, 1] interpolates between the von Neumann evolution (α = 0) and the completely dissipative dynamics (α = 1). The section Methods covers the basics required to reach this formulation. See also Fig. 1 for an axiomatic construction of the quantum stochastic walks.

### From the tree to the network

A usual feature of many standard models for the analysis of decision-making problems is their representation as a graph with characteristics of a directed tree. We can understand such models as a root node 0 connected to each possible state of the world Ω W that the decision-maker can face. For each of this possible states, there is a set of weighted edges linking each state of the world Ω to the actions iS that the agent can take. These are nested models implying a sequential structure in the cognitive process: the agent is supposed to first form her (possibly own) beliefs about the (distribution of) states of the world and then optimize her action choice as a response to this information.

Our work departs from this standard setting and proposes a model where a richer networked structure of the decision-making mechanism represents an incessant flow of the agent’s response-probabilities conditioned on the topology of the problem. We propose that the decision-making process is a combination of the comparison of utilities taking place simultaneously with the elicitacion of beliefs and therefore removing the nested structure. The process extends over an interval of time and due to the dissipative dynamics we compute the unique stationary distribution of random walkers defining the behavioral choice-probabilities.

An appropriate definition of the so-called dissipators–operators L(m, n) in Eq. (2)–allows the quantum formalism to also contain any classical random walk. The possible moves that the walker can make from each node can be described by a network, such that each node represents observable states of the system, and the edges account for the allowed transitions. This in turn relates the transition matrix defining the dynamics of the stochastic process to the structure of an underlying network. We prove in the Methods section how the dissipators lead to a unique stationary solution if they are defined as L(m, n) = |m〉〈n|, with γ(m, n) = cmn being cmn the entries of a cognitive matrix C(λ, φ) formalized as the linear combination of two matrices, Π(λ) and B, associated to the profitability comparison between alternatives and the formation of beliefs, respectively. The parameters λ and φ become meaningful in the next section, together with the definition of C(λ, φ) in Eq. (4).

### The cognitive matrix

As the starting point for defining the cognitive matrix in our model, we consider one of the most basic yet meaningful formulations of probabilistic choice theory: Luce’s choice axiom37,38,39. In this framework, given a choice-set S containing the available alternatives, the system of choice probabilities is defined by , for every iS, with wi being a scalar measure of some salient properties of the alternatives: a weight of each element within the set of available options.

A natural parametrization for the salience of each alternative iS is to define wi = u(i|Ω)λ, where u(i|Ω) relates to the payoff the decision-maker obtains from taking action i if the state of the world is Ω. Because the terms u(·|·) have to be non-negative, situations with negative payoffs can be included after a monotonic transformation, the standard procedure in discrete choice theory. The exponent λ [0, ∞) measures the agent’s ability to discriminate the profitability among the different options. When λ = 0, each element iS has the same probability of being chosen (1/NS with NS the cardinality of the set S), and when λ → ∞ only the dominant alternative is chosen. If there is more than one option with the same maximum valuation, then the probability of an option being chosen is uniform within the restricted subset of the most preferred ones.

We now build the aforementioned matrix Π(λ) relying on the response probabilities pS(i) already defined. Let the connected components of the graph be in a bijection with the set of states of the world W such that each Ω W is related to one and only one connected component. The number of nodes KΩ in the connected component associated to each possible state of the world Ω is the size of the corresponding action set. Let ni(Ω) be the node representing the event of the decision-maker taking action i when considering the state of the world is Ω. Then, every node ni(Ω) has KΩ incoming flows of walkers, one from each of the other nodes nj(Ω) (j ≠ i) and one self-edge. These links are edges weighted in the spirit of Luce’s choice axiom,

Note that every node ni(Ω) has KΩ outgoing edges eΩ(i, j), generally with KΩ different weights pΩ(j). See Fig. 2 for a graphical example deriving the matrix Π(λ) from the sequential tree.

We can define Π(λ) as a transition matrix where every entry πij(Ω) is the probability that a random walker switches from action i to j for a given state of the world Ω. The navigation of random walkers along the network described by Π(λ) accounts for the comparison between alternatives for each given state of the world.

The decision-maker faces simultaneously another cognitive activity: the formation of her beliefs about the state of the world (either a forecast on some external random event, or a prediction on the behavior of an interacting agent). We model this process through the definition of the matrix B such that its entries connect nodes of the form aik) to those of the form ail). Thus, B allows the walker to introduce a change of belief about the state of the world in the cognitive process by jumping from one connected component associated to a particular state of the world ΩkW to the connected component associated to another one ΩlW, while keeping the action i fixed.

We denote the cognitive matrix by C(λ, φ), which is defined as

where φ (0, 1) is a parameter assessing the relevance of the formation of beliefs during the decision-making process. The superscript T denotes the transpose matrix. We discuss the reason for obtaining C(λ, φ) after the transposition of the transition matrix in the Methods section. Combining Π(λ) with B is crucial for the dynamics of the process: B establishes connections between the NW (originally disjoint) connected components described by Π(λ). Therefore, C(λ, φ) describes a weighted (and oriented) graph with only one connected component which contains now NW strongly connected components, one per each possible state of the world.

Typically, we may consider risky or uncertain situations to be objective if a random move has to be realized (lotteries), subjective if the agent has to evaluate probabilities based on her own judgment, or strategic if there is a game-theoretic interaction with hidden or simultaneous move of the opponents. As a consequence, there is a certain degree of arbitrariness in the way we can define the entries {bkl} for the matrix of belief formation, as long as its linear combination with Π(λ) guarantees existence and uniqueness of the stationary distribution ρ*. This is satisfied when the cognitive matrix fulfills the Perron-Frobenius theorem, i.e., C(λ, φ) is irreducible and aperiodic40.

In the following part of the paper, we propose a definition for the matrix B in line with the standard models for strategic decision-making. Nevertheless, even if there is no particular information given by the problem, one can always define bkl(i) = 1/(NW − 1), with NW being the cardinality of the set W. This ‘homogenous’ law for the change of beliefs is reminiscent of the long-distance hopping matrix which has been fruitfully exploited in the study of ranking problems through quantum navigation of networks41,42.

### Analyzing the Prisoner’s Dilemma

The Prisoner’s Dilemma is widely considered to be the cognitive and game-theoretical task equivalent to the harmonic oscillator in physics: a well-defined problem with quite a simple formulation but still rich possibilities for both experimental and theoretical exploration, which qualifies this problem as the first system for which a new model should provide consistent explanation as a benchmark case study.

The symmetric Prisoner’s Dilemma is a game involving two players, A and B. They can choose among two actions: cooperate (C) or defect (D). Considering the game in its normal form, it is defined by the following payoff matrix,

where d > a > c > b implies mutual cooperation is the Pareto optimal situation (maximum social payoff). In standard game theory, defection is the dominant strategy for both players, so mutual defection is the Nash equilibrium of this game (strategy profile stable against unilateral deviations), which is not an efficient outcome when compared against mutual cooperation. The rational prediction for the play of this game is the choice of defection as action, together with the expectation (belief) of facing also defection from the opponent, even though a fraction of cooperation usually appears when humans play this game43.

On the one hand, deviations from the purely rational (Nash) equilibrium in games can actually be modelled with classical probability theory if we consider stochastic choice-making with a finite value of the ‘rationality exponent’ analogous to the parameter λ already defined in Eq. (3)–see, e.g., the concept of quantal response equilibrium44–but on the other hand, deeper empirical findings challenge the validity of the axioms of classical probability theory at their fundamental level: experiments with the Prisoner’s Dilemma game can also be used to show how the Sure Thing Principle (a direct consequence from the law of total probability in the classical framework) is violated45.

These two effects together lead Pothos et al.46 to formulate a quantum walk outperforming the predictions from the classical model, concluding that “human cognition can and should be modelled within a probabilistic framework, but classical probability theory is too restrictive to fully describe human cognition”. Their model incorporates a unitary evolution (QW) originated from a Hamiltonian operator implementing cognitive dissonances. Nevertheless, the unitary evolution lacks stationary solutions unless a stopping time is exogenously incorporated into the process, which raises also a fundamental concern about how to apply the class of models using QWs both from a conceptual and a practical standpoint47. We show later how the present model of QSWs accounts for these violations of the Sure Thing Principle in a natural way and that it has a stationary solution.

Thus, it is our intention to show how the more general formulation of quantum stochastic walks (QSWs) interpolating between both extreme cases (CRW with α = 1, and QW with α = 0) is able to incorporate the positive aspects of both models such as the relaxing dynamics towards a well-defined stationary solution from the classical perspective, together with the possibility for coupling and interference between populations and coherences through the unitary evolution. Besides, we respect the standard usage of the parameter λ as an upper-bound in the optimality of the solution, while our networked definition of the problem introduces new effects which are not reachable with the traditional representations of decision-making trees that do not allow for simultaneous exploration through both spaces of preferences and beliefs in parallel.

### Fine tuning the model

To illustrate the predictions of this class of models, we consider the payoff matrix used in the experiments conducted by Shafir and Tversky45, a = 75, b = 25, c = 30, and d = 85. From the point of view of any of the two players (this game is symmetric), the following identification for the action set and the possible states of the world is straightforward: u(C|C) = a, u(C|D) = b, u(D|C) = d, u(D|D) = c. We have a four-dimensional space of states because two possible actions are associated to two possible states of the world. We choose the basis of the system spanning the space of states to be . As an example, these pure states are defined such that indicates the cognitive state in which the player chooses to execute the action C and holds the expectation of the opponent choosing D. The same definition holds for the other combinations. Following Eq. (3) and the definition of Π(λ) discussed in the introduction of the model, we write

with , and . Then, for a given set of payoff values, the weights for the dynamics in the decision-making process are specific to the type of player given by the rationality parameter λa, where the index a just indicates that the parameter λ is representative of the player comparing the actions. See panel (a) in Fig. 3 for its graphical representation as a network.

We are dealing with a situation of strategic uncertainty, so we define the matrix of formation of beliefs to be dependent on the payoff entries corresponding to the opponent, analogous to the definition of Π. In this case, the frequency with which a stochastic walker will jump from the belief associated to the state of the world C to the state of the world D is directly the estimation of how the other player is weighting her action C versus her action D during her deliberation on profitabilities, and then we write

In a first step, B gets defined as a function of the rationality exponent associated to the second player whose action-set defines the set of possible states of the world faced by the first player. Because the game of the Prisoner’s Dilemma is symmetric, we can assume a common value λa = λb = λ which simplifies the model.

Combining these two connectivity patterns (do not forget the operation of matrix transposition) in the linear fashion defined in Eq. (4) finally determines the cognitive matrix C(λ, φ) given in Eq. (8),

and modelling human behavior in Prisoner’s Dilemma games together with the Hamiltonian

We use a simplified construction Hij = 1 if the nodes are connected in Π(λ) and zero otherwise, in agreement with other applications of quantum rankings successfully explored in the literature about complex networks42. See a more thorough discussion on possible definitions of H in the corresponding section in Methods.

### Behavioral aspects

The cognitive matrix C(λ, φ) in Eq. (8) and the Hamiltonian operator in Eq. (9) define a three-parametric family of Lindbladian operators for the play of Prisoner’s Dilemma games. Since there exists a unique stationary solution ρ* for each evolution defined by the values of (α, λ, φ) as we prove in the Methods section, then we have defined a whole family of behaviors in the game which we analyze now as a function of the parameters.

#### λ as a measure of bounded rationality

We should consider the level of rationality in the play in comparison to the Nash equilibrium (DD), the reference of a rational outcome in the game. When we introduce λ in the model in Eq. (3), this parameter is a monotonic measure of the ability to discriminate between the profitability of the different options, and as such it has also a strictly monotonic influence on the level of rationality in the equilibrium predictions: the higher the λ, cet. par., the higher is the probability of choosing the dominant action. See Fig. 3-Panel (b).

In our model, the parameter λ really plays the role of upper-binding the level of rationality in the process and not just a point-prediction. It determines the maximum probability of playing defection, while for a given λ different probability outcomes are achieved depending on the tradeoff between α and φ. See in Figs 3-Panel (c) and 4-Panel (a) how the weight on the belief formation in the dynamical process shapes the smoothness/steepness of the transition from pure randomization to the bounded level of rationality as a function of α, with the behavior getting closer to the allowed maximum when the process becomes more classical (α → 1).

We see in Fig. 4-Panel (b) how the finite limit in the level of rationality λ translates into a one-to-one correspondence with the expectation on the level of defection (black solid line), which remains basically constant and independent of the values of α and φ. Therefore, experimental results on belief elicitation can be used to adjust the numerical value of λ.

#### Believing the same to act different

This model based on the connected topology for the dynamical process combining simultaneously the formation of beliefs and the comparison of actions reveals an interesting effect: even for fixed values of λ (and then also fixed expectation on the rival’s move), it is possible to obtain different choice probabilities as a result of the different weights assigned to each of the two cognitive processes through φ.

We see in Fig. 4-Panel (b) how the probability of choosing defection as action (orange solid line) is decreasing on φ (for each possible value of α). This effect is very intuitive since higher φ implies less focusing on the discrimination between the profitability of own actions of the player. The dynamical process of decision-making incorporates this effect as a consequence of higher values of λ generating lower weights in the connections of the cognitive network C(λ, φ) inherited from the matrix Π(λ). This effect is hardly obtainable with standard models based on decision-making trees, since their sequential structure does not allow for the interaction between nodes belonging to different states of the world.

#### Relaxation time

By definition, α is the parameter interpolating between the unitary evolution (α = 0) which is a process of continuous oscillation without stationary solution, and the Markovian evolution (α = 1) which is dissipative and has a stationary solution. Thus, α is expected to play an important role in the determination of the relaxation time of our networked quantum stochastic model for decision-making. We denote it by τ, and we discuss its definition in the Methods section. As a cautious remark, let us say that the relaxation time of the dynamics may not always be the endogenously determined decision time of a subject on a given trial. The decision time will likely be a random variable with a distribution of stopping times. For further elaboration on this issue, see e.g., Busemeyer et al.30 and Fuss and Navarro48.

Regarding the two cognitive parameters λ and φ, we observe no influence of λ on the magnitude of the relaxation time. We see in Fig. 4-Panel (c) how τ depends on φ with a clear minimum τMin. The curve asymptotically diverges for φ → 0, while it remains finite for φ > 0 if α (0, 1], unless α = 1 and φ = 1, when τ diverges as well. As α approaches 1, τ vs. φ becomes very large for high values of φ, resulting in a U-shaped curve (inset of Fig. 4-Panel c). This comes from the tradeoff in the dynamics between the cognitive matrix C(λ, φ) and our choice of the Hamiltonian (Eq. 9). As the reader can see in the Methods sections, existence of the stationary solution requires the network to be connected such that no node is isolated, and the cognitive network represented by the matrix C(λ, φ) becomes disjoint if φ = 0, when there is no transition allowed between the components associated to the two states of the world. Thus, we can say that the presence of deliberation about the possible states of the world is crucial for the existence of a stationary solution, and therefore the process of construction of the belief is a key aspect in the convergence towards a stationary state.

In Fig. 4-Panel (d) we analyze τMin and φMin as a function of α. Considering τ as a function τ(α, φ), we implicitly define φMin(α) as the value of φ for which τ is minimum, for each possible α. This figure clearly shows how the relaxation time remains finite for non-zero values of α, and decreasing the higher is the influence of the Markovian aspect of the dynamics. The abrupt step we observe in the relationship between φMin and the values of the parameter α is due to the breakdown of degeneracies in the spectrum of the Lindbladian superoperator at that point. Note that for a fully classical case , which would imply a fully homogeneous combination of Π(λb) and B(λb), in agreement with the symmetry of the Prisoner’s Dilemma problem and the choice of parametrization λa = λb = λ.

#### Stationary solution

We prove the existence and uniqueness of the stationary solution for our class of quantum stochastic walks for decision-making in the Methods section. Moreover, the solution is analytically defined and can be computed by exact diagonalization of the Lindbladian superoperator without the need for numerical simulations. Nevertheless, and only for illustrative purposes, we show the explicit evolution of the component PDD for several initial conditions in Fig. 4-Panel (e), and the joint evolution of the four components starting from one extreme initial condition of full cooperation in Fig. 4-Panel (f).

Finally, let us emphasize the connection between the three parameters of the model and their observable counterparts. The bounded rationality parameter λ can be related to the result of the formation of beliefs about the opponent’s move. For a given λ, the choice probabilities in the space of actions and the relaxation time of the dynamical process are both governed by the pair (α, φ). Thus, we provide a model with three parameters to be estimated by the appropriate measurement of three observables. The presence of a distribution in stopping times can be proxied through a distribution of values for τ as a consequence of having a population of players with heterogeneous values of the parameters, such as different weights in the formation of expectations.

### Violation of the Sure Thing Principle

In the spirit of Pothos and Busemeyer46, we turn now to the application of this quantum model in explaining the so-called violations of the ‘Sure Thing’ Principle often observed in human behavior. This principle dates back to the work by Savage49 and can be understood as follows. Let a decision-maker decide between two options (A or B) when the actual state of the world (may it be the choice of an opponent, an objective lottery, or any other setting with uncertainty) is unknown, but the decision-maker knows that it can be either X or Y. Then, as a consequence of the (classical) law of total probability applied to modelling human behavior, if a decision-maker prefers A over B if the state of the world was known to be X and also prefers A over B if the state was known to be Y, she should also choose A when the state of the world is uknown because A is superior to B for every expectation on the realization of X/Y. Nevertheless, this principle was already refuted in an experiment by Tversky and Shafir50, an observation which has been regularly reproduced afterwards.

Busemeyer et al.51 and Pothos and Busemeyer46 provide a further review of empirical evidence on this issue and also show how quantum-inspired models can account for this effect, outperforming the classical ones. They explicitly compare models based on unitary evolution of the decision-making probabilities versus their Markovian counterparts. Despite of the (qualitative and quantitative) success of these quantum-like models, they are subject to the already mentioned criticism of lacking stationary solutions defined endogenously. We want to briefly show here how the stationary states of the quantum stochastic walks that we have defined in this paper can model the violations of the Savage’s Principle in a parsimonious manner. Furthermore, this effect is available only if the model is not restricted to its classical part (α = 1) but applied in its general way (0 < α < 1), emphasizing the synergies from combining both the quantum and the classical term in this dynamics.

In order to make our case, we reproduce the experimental results in Busemeyer et al.51. The entries of the payoff matrix are a = 20, b = 5, c = 10, and d = 25. Their results show a defection rate of 91% when the subjects know their opponent will defect, and of 84% when they know the rival’s action is to cooperate. The Sure Thing Principle is violated in this experiment because the defection rate when the choice of the opponents is unknown drops to 66%. See model fit to this data in Fig. 5-Panel (a).

First, we consider the two defection rates when the state of the world is known, and use them to obtain the best fit of the model under the constraint φ = 0, because in these two situations the decision-maker does not need to allocate any effort to build an expectation about the rival’s move since it is fixed by default. The dynamics are solved numerically, and we choose the density matrices with diagonal elements and as initial points for the two scenarios (the rival defects or cooperates) such that the system is confined to the subspace of each announced state of the world. We obtain the best fit for the parameter values λ = 10.495 and α = 0.812, yielding predictions of 0.911 and 0.839 for the two defection rates in the sure situations.

Second, we take these values for (α, λ) as fixed and study the impact of introducing uncertainty in the decision-making process. This is modelled by the parameter φ > 0, which means that the decision-maker has to assign some effort to the ‘guessing task’. We see in Panel (a) that the quantum stochastic walk naturally includes violations of the Sure Thing Principle in this setting when the weight of the matrix B in the dynamics becomes more relevant. We obtain as the critical value of φ for which the predicted outcome for this experiment lies below the defection rate of 84%, and the value φexp = 0.898 models the experimental result of only 66% of defection in the uncertain situation. Finally, Fig. 5-Panel (b) illustrates how this effect is not available when only the classical term is considered (by fixing α = 1). It is straightforward to see how in such a classical case, the prediction (for any value of λ) is independent of the parameter φ. One can understand this by noticing that several type of transitions are not present in the dynamics when only the CRW applies (see Fig. 1-Panel (b) once again).

Source : https://www.nature.com/articles/srep23812

Thank you for visit my website