# publications

Publications in reverse chronological order

## 2024

- Nerve injury disrupts temporal processing in the spinal cord dorsal horn through alterations in PV+ interneurons

  Genelle Rankin, Anda M. Chirila, Alan J. Emanuel, Zihe Zhang, Clifford J. Woolf, Jan Drugowitsch, and David D. Ginty

  *Cell Reports*, 2024

  How mechanical allodynia following nerve injury is encoded in patterns of neural activity in the spinal cord dorsal horn (DH) remains incompletely understood. We address this in mice using the spared nerve injury model of neuropathic pain and in vivo electrophysiological recordings. Surprisingly, despite dramatic behavioral over-reactivity to mechanical stimuli following nerve injury, an overall increase in sensitivity or reactivity of DH neurons is not observed. We do, however, observe a marked decrease in correlated neural firing patterns, including the synchrony of mechanical stimulus-evoked firing, across the DH. Alterations in DH temporal firing patterns are recapitulated by silencing DH parvalbumin+ (PV+) interneurons, previously implicated in mechanical allodynia, as are allodynic pain-like behaviors. These findings reveal decorrelated DH network activity, driven by alterations in PV+ interneurons, as a prominent feature of neuropathic pain and suggest restoration of proper temporal activity as a potential therapeutic strategy to treat chronic neuropathic pain.

- An opponent striatal circuit for distributional reinforcement learning

  Adam S. Lowet, Qiao Zheng, Melissa Meng, Sara Matias, Jan Drugowitsch, and Naoshige Uchida

  *bioRxiv*, 2024

  Machine learning research has achieved large performance gains on a wide range of tasks by expanding the learning target from mean rewards to entire probability distributions of rewards - an approach known as distributional reinforcement learning (RL). The mesolimbic dopamine system is thought to underlie RL in the mammalian brain by updating a representation of mean value in the striatum, but little is known about whether, where, and how neurons in this circuit encode information about higher-order moments of reward distributions. To fill this gap, we used high-density probes (Neuropixels) to acutely record striatal activity from well-trained, water-restricted mice performing a classical conditioning task in which reward mean, reward variance, and stimulus identity were independently manipulated. In contrast to traditional RL accounts, we found robust evidence for abstract encoding of variance in the striatum. Remarkably, chronic ablation of dopamine inputs disorganized these distributional representations in the striatum without interfering with mean value coding. Two-photon calcium imaging and optogenetics revealed that the two major classes of striatal medium spiny neurons - D1 and D2 MSNs - contributed to this code by preferentially encoding the right and left tails of the reward distribution, respectively. We synthesize these findings into a new model of the striatum and mesolimbic dopamine that harnesses the opponency between D1 and D2 MSNs to reap the computational benefits of distributional RL.

## 2023

- Competitive integration of time and reward explains value-sensitive foraging decisions and frontal cortex ramping dynamics

  Michael Bukwich, Malcolm G. Campbell, David Zoltowski, Lyle Kingsbury, Momchil S. Tomov, Joshua Stern, HyungGoo R. Kim, Jan Drugowitsch, and 2 more authors

  *bioRxiv*, 2023

  The ability to make advantageous decisions is critical for animals to ensure their survival. Patch foraging is a natural decision-making process in which animals decide when to leave a patch of depleting resources to search for a new one. To study the algorithmic and neural basis of patch foraging behavior in a controlled laboratory setting, we developed a virtual foraging task for head-fixed mice. Mouse behavior could be explained by ramp-to-threshold models integrating time and rewards antagonistically. Accurate behavioral modeling required inclusion of a slowly varying "patience" variable, which modulated sensitivity to time. To investigate the neural basis of this decision-making process, we performed dense electrophysiological recordings with Neuropixels probes broadly throughout frontal cortex and underlying subcortical areas. We found that decision variables from the reward integrator model were represented in neural activity, most robustly in frontal cortical areas. Regression modeling followed by unsupervised clustering identified a subset of neurons with ramping activity. These neurons' firing rates ramped up gradually in single trials over long time scales (up to tens of seconds), were inhibited by rewards, and were better described as being generated by a continuous ramp rather than a discrete stepping process. Together, these results identify reward integration via a continuous ramping process in frontal cortex as a likely candidate for the mechanism by which the mammalian brain solves patch foraging problems.

- Causal inference during closed-loop navigation: parsing of self- and object-motion

  Jean-Paul Noel, Johannes Bill, Haoran Ding, John Vastola, Gregory C DeAngelis, Dora Angelaki, and Jan Drugowitsch

  *Philosophical Transactions of the Royal Society B*, Aug 2023

  A key computation in building adaptive internal models of the external world is to ascribe sensory signals to their likely cause(s), a process of causal inference (CI). CI is well studied within the framework of two-alternative forced-choice tasks, but less well understood within the cadre of naturalistic action–perception loops. Here, we examine the process of disambiguating retinal motion caused by self- and/or object-motion during closed-loop navigation. First, we derive a normative account specifying how observers ought to intercept hidden and moving targets given their belief about (i) whether retinal motion was caused by the target moving, and (ii) if so, with what velocity. Next, in line with the modelling results, we show that humans report targets as stationary and steer towards their initial rather than final position more often when they are themselves moving, suggesting a putative misattribution of object-motion to the self. Further, we predict that observers should misattribute retinal motion more often: (i) during passive rather than active self-motion (given the lack of an efference copy informing self-motion estimates in the former), and (ii) when targets are presented eccentrically rather than centrally (given that lateral self-motion flow vectors are larger at eccentric locations during forward self-motion). Results support both of these predictions. Lastly, analysis of eye movements shows that, while initial saccades toward targets were largely accurate regardless of the self-motion condition, subsequent gaze pursuit was modulated by target velocity during object-only motion, but not during concurrent object- and self-motion. These results demonstrate CI within action–perception loops, and suggest a protracted temporal unfolding of the computations characterizing CI.

- Diverse effects of gaze direction on heading perception in humans

  Wei Gao, Yipeng Lin, Jiangrong Shen, Jianing Han, Xiaoxiao Song, Yukun Lu, Huijia Zhan, Qianbing Li, and 6 more authors

  *Cerebral Cortex*, Feb 2023

  Gaze change can misalign spatial reference frames encoding visual and vestibular signals in cortex, which may affect heading discrimination. Here, by systematically manipulating the eye-in-head and head-on-body positions to change the gaze direction of subjects, the performance of heading discrimination was tested with visual, vestibular, and combined stimuli in a reaction-time task in which the reaction time is under the control of subjects. We found that the gaze change induced substantial biases in perceived heading and increased the discrimination threshold and reaction time of subjects in all stimulus conditions. For the visual stimulus, the gaze effects were induced by changing the eye-in-world position, and the perceived heading was biased in the direction opposite to gaze. In contrast, the vestibular gaze effects were induced by changing the eye-in-head position, and the perceived heading was biased in the same direction as gaze. Although the bias was reduced when the visual and vestibular stimuli were combined, integration of the 2 signals substantially deviated from predictions of an extended diffusion model that accumulates evidence optimally over time and across sensory modalities. These findings reveal diverse gaze effects on heading discrimination and emphasize that the transformation of spatial reference frames may underlie the effects.

- Bayesian inference in ring attractor networks

  Anna Kutschireiter, Melanie A Basnak, Rachel I. Wilson, and Jan Drugowitsch

  *Proceedings of the National Academy of Sciences*, Feb 2023

  Working memories are thought to be held in attractor networks in the brain. These attractors should keep track of the uncertainty associated with each memory, so as to weigh it properly against conflicting new evidence. However, conventional attractors do not represent uncertainty. Here, we show how uncertainty could be incorporated into an attractor, specifically a ring attractor that encodes head direction. First, we introduce a rigorous normative framework (the circular Kalman filter) for benchmarking the performance of a ring attractor under conditions of uncertainty. Next, we show that the recurrent connections within a conventional ring attractor can be retuned to match this benchmark. This allows the amplitude of network activity to grow in response to confirmatory evidence, while shrinking in response to poor-quality or strongly conflicting evidence. This “Bayesian ring attractor” performs near-optimal angular path integration and evidence accumulation. Indeed, we show that a Bayesian ring attractor is consistently more accurate than a conventional ring attractor. Moreover, near-optimal performance can be achieved without exact tuning of the network connections. Finally, we use large-scale connectome data to show that the network can achieve near-optimal performance even after we incorporate biological constraints. Our work demonstrates how attractors can implement a dynamic Bayesian inference algorithm in a biologically plausible manner, and it makes testable predictions with direct relevance to the head direction system as well as any neural system that tracks direction, orientation, or periodic rhythms.

- Is the information geometry of probabilistic population codes learnable?

  John J. Vastola, Zach Cohen, and Jan Drugowitsch

  *In Proceedings of the 1st NeurIPS Workshop on Symmetry and Geometry in Neural Representations*, 03 Dec 2023

  One reason learning the geometry of latent neural manifolds from neural activity data is difficult is that the ground truth is generally not known, which can make manifold learning methods hard to evaluate. Probabilistic population codes (PPCs), a class of biologically plausible and self-consistent models of neural populations that encode parametric probability distributions, may offer a theoretical setting where it is possible to rigorously study manifold learning. It is natural to define the neural manifold of a PPC as the statistical manifold of the encoded distribution, and we derive a mathematical result that the information geometry of the statistical manifold is directly related to measurable covariance matrices. This suggests a simple but rigorously justified decoding strategy based on principal component analysis, which we illustrate using an analytically tractable PPC.

## 2022

- Efficient stabilization of imprecise statistical inference through conditional belief updating

  Julie Drevet, Jan Drugowitsch, and Valentin Wyart

  *Nature Human Behaviour*, Dec 2022

  Statistical inference is the optimal process for forming and maintaining accurate beliefs about uncertain environments. However, human inference comes with costs due to its associated biases and limited precision. Indeed, biased or imprecise inference can trigger variable beliefs and unwarranted changes in behaviour. Here, by studying decisions in a sequential categorization task based on noisy visual stimuli, we obtained converging evidence that humans reduce the variability of their beliefs by updating them only when the reliability of incoming sensory information is judged as sufficiently strong. Instead of integrating the evidence provided by all stimuli, participants actively discarded as much as a third of stimuli. This conditional belief updating strategy shows good test–retest reliability, correlates with perceptual confidence and explains human behaviour better than previously described strategies. This seemingly suboptimal strategy not only reduces the costs of imprecise computations but also, counterintuitively, increases the accuracy of resulting decisions.

- Visual motion perception as online hierarchical inference

  Johannes Bill, Samuel J. Gershman, and Jan Drugowitsch

  *Nature Communications*, Dec 2022

  Identifying the structure of motion relations in the environment is critical for navigation, tracking, prediction, and pursuit. Yet, little is known about the mental and neural computations that allow the visual system to infer this structure online from a volatile stream of visual information. We propose online hierarchical Bayesian inference as a principled solution for how the brain might solve this complex perceptual task. We derive an online Expectation-Maximization algorithm that explains human percepts qualitatively and quantitatively for a diverse set of stimuli, covering classical psychophysics experiments, ambiguous motion scenes, and illusory motion displays. We thereby identify normative explanations for the origin of human motion structure perception and make testable predictions for future psychophysics experiments. The proposed online hierarchical inference model furthermore affords a neural network implementation which shares properties with motion-sensitive cortical areas and motivates targeted experiments to reveal the neural representations of latent structure.

- Controllability boosts neural and cognitive signatures of changes-of-mind in uncertain environments

  Marion Rouault, Aurélien Weiss, Junseok K Lee, Jan Drugowitsch, Valerian Chambon, and Valentin Wyart

  *eLife*, Sep 2022

  In uncertain environments, seeking information about alternative choice options is essential for adaptive learning and decision-making. However, information seeking is usually confounded with changes-of-mind about the reliability of the preferred option. Here, we exploited the fact that information seeking requires control over which option to sample to isolate its behavioral and neurophysiological signatures. We found that changes-of-mind occurring with control require more evidence against the current option, are associated with reduced confidence, but are nevertheless more likely to be confirmed on the next decision. Multimodal neurophysiological recordings showed that these changes-of-mind are preceded by stronger activation of the dorsal attention network in magnetoencephalography, and followed by increased pupil-linked arousal during the presentation of decision outcomes. Together, these findings indicate that information seeking increases the saliency of evidence perceived as the direct consequence of one’s own actions.

- Hierarchical structure learning for perceptual decision making in visual motion perception

  Johannes Bill, Samuel J. Gershman, and Jan Drugowitsch

  *In The 5th Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM-22). Brown University, Providence, RI, USA.*, May 2022

  Successful behavior in the real world critically depends on discovering the latent structure behind the volatile inputs reaching our sensory system. Our brains face the online task of discovering structure at multiple timescales ranging from short-lived correlations, to the structure underlying a scene, to life-time learning of causal relations. Little is known about the mental and neural computations driving the brain’s ability to perform online, multi-timescale structure inference. We studied these computations using the example of visual motion perception, owing to the importance of structured motion for behavior. We propose online hierarchical Bayesian inference as a principled solution for how the brain might solve multi-timescale structure inference. We derive an online Expectation-Maximization algorithm that continually updates an estimate of a visual scene’s underlying structure while using this inferred structure to organize incoming noisy velocity observations into meaningful, stable percepts. We show that the algorithm explains human percepts qualitatively and quantitatively for a diverse set of stimuli, covering classical psychophysics experiments, ambiguous motion scenes, and illusory motion displays. It explains experimental results of human motion structure classification with higher fidelity than a previous ideal observer-based model, and provides normative explanations for the origin of biased perception in motion direction repulsion experiments. To identify a scene’s structure, the algorithm recruits motion components from a set of frequently occurring features, such as global translation or grouping of stimuli. We demonstrate in computer simulations how these features can be learned online from experience. Finally, the algorithm affords a neural network implementation which shares properties with motion-sensitive cortical areas MT and MSTd and motivates a novel class of neuroscientific experiments to reveal the neural representations of latent structure.

- A large majority of awake hippocampal sharp-wave ripples feature spatial trajectories with momentum

  Emma L. Krause and Jan Drugowitsch

  *Neuron*, May 2022

  During periods of rest, hippocampal place cells feature bursts of activity called sharp-wave ripples (SWRs). Heuristic approaches have revealed that a small fraction of SWRs appear to “simulate” trajectories through the environment, called awake hippocampal replay. However, the functional role of a majority of these SWRs remains unclear. We find, using Bayesian model comparison of state-space models to characterize the spatiotemporal dynamics embedded in SWRs, that almost all SWRs of foraging rodents simulate such trajectories. Furthermore, these trajectories feature momentum, or inertia in their velocities, that mirrors the animals’ natural movement, in contrast to replay events during sleep, which lack such momentum. Last, we show that past analyses of replayed trajectories for navigational planning were biased by the heuristic SWR sub-selection. Our findings thus identify the dominant function of awake SWRs as simulating trajectories with momentum and provide a principled foundation for future work on their computational function.

- Projection Filtering With Observed State Increments With Applications in Continuous-Time Circular Filtering

  Anna Kutschireiter, Luke Rast, and Jan Drugowitsch

  *IEEE Transactions on Signal Processing*, Jan 2022

  Angular path integration is the ability of a system to estimate its own heading direction from potentially noisy angular velocity (or increment) observations. Non-probabilistic algorithms for angular path integration, which rely on a summation of these noisy increments, do not appropriately take into account the reliability of such observations, which is essential for appropriately weighing one’s current heading direction estimate against incoming information. In a probabilistic setting, angular path integration can be formulated as a continuous-time nonlinear filtering problem (circular filtering) with observed state increments. The circular symmetry of heading direction makes this inference task inherently nonlinear, thereby precluding the use of popular inference algorithms such as Kalman filters, rendering the problem analytically inaccessible. Here, we derive an approximate solution to circular continuous-time filtering, which integrates state increment observations while maintaining a fixed representation through both state propagation and observational updates. Specifically, we extend the established projection-filtering method to account for observed state increments and apply this framework to the circular filtering problem. We further propose a generative model for continuous-time angular-valued direct observations of the hidden state, which we integrate seamlessly into the projection filter. Applying the resulting scheme to a model of probabilistic angular path integration, we derive an algorithm for circular filtering, which we term the circular Kalman filter. Importantly, this algorithm is analytically accessible, interpretable, and outperforms an alternative filter based on a Gaussian approximation.

## 2021

- Interacting with volatile environments stabilizes hidden-state inference and its brain signatures

  Aurélien Weiss, Valérian Chambon, Junseok K. Lee, Jan Drugowitsch, and Valentin Wyart

  *Nature Communications*, Jan 2021

  Making accurate decisions in uncertain environments requires identifying the generative cause of sensory cues, but also the expected outcomes of possible actions. Although both cognitive processes can be formalized as Bayesian inference, they are commonly studied using different experimental frameworks, making their formal comparison difficult. Here, by framing a reversal learning task either as cue-based or outcome-based inference, we found that humans perceive the same volatile environment as more stable when inferring its hidden state by interaction with uncertain outcomes than by observation of equally uncertain cues. Multivariate patterns of magnetoencephalographic (MEG) activity reflected this behavioral difference in the neural interaction between inferred beliefs and incoming evidence, an effect originating from associative regions in the temporal lobe. Together, these findings indicate that the degree of control over the sampling of volatile environments shapes human learning and decision-making under uncertainty.

- Optimal policy for attention-modulated decisions explains human fixation behavior

  Anthony I. Jang, Ravi Sharma, and Jan Drugowitsch

  *eLife*, Mar 2021

  Traditional accumulation-to-bound decision-making models assume that all choice options are processed with equal attention. In real-life decisions, however, humans alternate their visual fixation between individual items to efficiently gather relevant information (Yang et al., 2016). These fixations also causally affect one’s choices, biasing them toward the longer-fixated item (Krajbich et al., 2010). We derive a normative decision-making model in which attention enhances the reliability of information, consistent with neurophysiological findings (Cohen and Maunsell, 2009). Furthermore, our model actively controls fixation changes to optimize information gathering. We show that the optimal model reproduces fixation-related choice biases seen in humans and provides a Bayesian computational rationale for this phenomenon. This insight led to additional predictions that we could confirm in human data. Finally, by varying the relative cognitive advantage conferred by attention, we show that decision performance benefits from a balanced spread of resources between the attended and unattended items.

- Human visual motion perception shows hallmarks of Bayesian structural inference

  Sichao Yang, Johannes Bill, Jan Drugowitsch, and Samuel J. Gershman

  *Scientific Reports*, Feb 2021

  Motion relations in visual scenes carry an abundance of behaviorally relevant information, but little is known about how humans identify the structure underlying a scene’s motion in the first place. We studied the computations governing human motion structure identification in two psychophysics experiments and found that perception of motion relations showed hallmarks of Bayesian structural inference. At the heart of our research lies a tractable task design that enabled us to reveal the signatures of probabilistic reasoning about latent structure. We found that a choice model based on the task’s Bayesian ideal observer accurately matched many facets of human structural inference, including task performance, perceptual error patterns, single-trial responses, participant-specific differences, and subjective decision confidence, especially when motion scenes were ambiguous and when object motion was hierarchically nested within other moving reference frames. Our work can guide future neuroscience experiments to reveal the neural mechanisms underlying higher-level visual motion perception.

- Scaling of sensory information in large neural populations shows signatures of information-limiting correlations

  MohammadMehdi Kafashan, Anna W. Jaffe, Selmaan N. Chettih, Ramon Nogueira, Iñigo Arandia-Romero, Christopher D. Harvey, Rubén Moreno-Bote, and Jan Drugowitsch

  *Nature Communications*, Jan 2021

  How is information distributed across large neuronal populations within a given brain area? Information may be distributed roughly evenly across neuronal populations, so that total information scales linearly with the number of recorded neurons. Alternatively, the neural code might be highly redundant, meaning that total information saturates. Here we investigate how sensory information about the direction of a moving visual stimulus is distributed across hundreds of simultaneously recorded neurons in mouse primary visual cortex. We show that information scales sublinearly due to correlated noise in these populations. We compartmentalize noise correlations into information-limiting and nonlimiting components, then extrapolate to predict how information grows with even larger neural populations. We predict that tens of thousands of neurons encode 95% of the information about visual stimulus direction, much less than the number of neurons in primary visual cortex. These findings suggest that the brain uses a widely distributed, but nonetheless redundant code that supports recovering most sensory information from smaller subpopulations.

## 2020

- Distributional Reinforcement Learning in the Brain

  Adam S. Lowet, Qiao Zheng, Sara Matias, Jan Drugowitsch, and Naoshige Uchida

  *Trends in Neurosciences*, Oct 2020

  Learning about rewards and punishments is critical for survival. Classical studies have demonstrated an impressive correspondence between the firing of dopamine neurons in the mammalian midbrain and the reward prediction errors of reinforcement learning algorithms, which express the difference between actual reward and predicted mean reward. However, it may be advantageous to learn not only the mean but also the complete distribution of potential rewards. Recent advances in machine learning have revealed a biologically plausible set of algorithms for reconstructing this reward distribution from experience. Here, we review the mathematical foundations of these algorithms as well as initial evidence for their neurobiological implementation. We conclude by highlighting outstanding questions regarding the circuit computation and behavioral readout of these distributional codes.

- Hierarchical structure is employed by humans during visual motion perception

  Johannes Bill, Hrag Pailian, Samuel J. Gershman, and Jan Drugowitsch

  *Proceedings of the National Academy of Sciences*, Sep 2020

  In the real world, complex dynamic scenes often arise from the composition of simpler parts. The visual system exploits this structure by hierarchically decomposing dynamic scenes: When we see a person walking on a train or an animal running in a herd, we recognize the individual’s movement as nested within a reference frame that is, itself, moving. Despite its ubiquity, surprisingly little is understood about the computations underlying hierarchical motion perception. To address this gap, we developed a class of stimuli that grant tight control over statistical relations among object velocities in dynamic scenes. We first demonstrate that structured motion stimuli benefit human multiple object tracking performance. Computational analysis revealed that the performance gain is best explained by human participants making use of motion relations during tracking. A second experiment, using a motion prediction task, reinforced this conclusion and provided fine-grained information about how the visual system flexibly exploits motion structure.

- Heuristics and optimal solutions to the breadth–depth dilemma

  Rubén Moreno-Bote, Jorge Ramírez-Ruiz, Jan Drugowitsch, and Benjamin Y. Hayden

  *Proceedings of the National Academy of Sciences*, Aug 2020

  In multialternative risky choice, we are often faced with the opportunity to allocate our limited information-gathering capacity between several options before receiving feedback. In such cases, we face a natural trade-off between breadth (spreading our capacity across many options) and depth (gaining more information about a smaller number of options). Despite its broad relevance to daily life, including in many naturalistic foraging situations, the optimal strategy in the breadth–depth trade-off has not been delineated. Here, we formalize the breadth–depth dilemma through a finite-sample capacity model. We find that, if capacity is small (∼10 samples), it is optimal to draw one sample per alternative, favoring breadth. However, for larger capacities, a sharp transition is observed, and it becomes best to deeply sample a very small fraction of alternatives, which roughly decreases with the square root of capacity. Thus, ignoring most options, even when capacity is large enough to shallowly sample all of them, is a signature of optimal behavior. Our results also provide a rich casuistic for metareasoning in multialternative decisions with bounded capacity using close-to-optimal heuristics.

- Meissner corpuscles and their spatially intermingled afferents underlie gentle touch perception

  Nicole L. Neubarth, Alan J. Emanuel, Yin Liu, Mark W. Springel, Annie Handler, Qiyu Zhang, Brendan P. Lehnert, Chong Guo, and 11 more authors

  *Science*, Jun 2020

  The Meissner corpuscle, a mechanosensory end organ, was discovered more than 165 years ago and has since been found in the glabrous skin of all mammals, including that on human fingertips. Although prominently featured in textbooks, the function of the Meissner corpuscle is unknown. Neubarth et al. generated adult mice without Meissner corpuscles and used them to show that these corpuscles alone mediate behavioral responses to, and perception of, gentle forces (see the Perspective by Marshall and Patapoutian). Each Meissner corpuscle is innervated by two molecularly distinct, yet physiologically similar, mechanosensory neurons. These two neuronal subtypes are developmentally interdependent and their endings are intertwined within the corpuscle. Both Meissner mechanosensory neuron subtypes are homotypically tiled, ensuring uniform and complete coverage of the skin, yet their receptive fields are overlapping and offset with respect to each other.

  Light touch perception and fine sensorimotor control arise from spatially overlapping mechanoreceptors of the Meissner corpuscle. Meissner corpuscles are mechanosensory end organs that densely occupy mammalian glabrous skin. We generated mice that selectively lacked Meissner corpuscles and found them to be deficient in both perceiving the gentlest detectable forces acting on glabrous skin and fine sensorimotor control. We found that Meissner corpuscles are innervated by two mechanoreceptor subtypes that exhibit distinct responses to tactile stimuli. The anatomical receptive fields of these two mechanoreceptor subtypes homotypically tile glabrous skin in a manner that is offset with respect to one another. Electron microscopic analysis of the two Meissner afferents within the corpuscle supports a model in which the extent of lamellar cell wrappings of mechanoreceptor endings determines their force sensitivity thresholds and kinetic properties.

- The impact of learning on perceptual decisions and its implication for speed-accuracy tradeoffs. André G. Mendonça, Jan Drugowitsch, M. Inês Vicente, Eric E. J. DeWitt, Alexandre Pouget, and Zachary F. Mainen.
*Nature Communications*, Jun 2020

- Adaptation Properties Allow Identification of Optimized Neural Codes. Luke Rast and Jan Drugowitsch.
*In Advances in Neural Information Processing Systems*, Jun 2020. The adaptation of neural codes to the statistics of their environment is well captured by efficient coding approaches. Here we solve an inverse problem: characterizing the objective and constraint functions that efficient codes appear to be optimal for, on the basis of how they adapt to different stimulus distributions. We formulate a general efficient coding problem, with flexible objective and constraint functions and minimal parametric assumptions. Solving special cases of this model, we provide solutions to broad classes of Fisher information-based efficient coding problems, generalizing a wide range of previous results. We show that different objective function types impose qualitatively different adaptation behaviors, while constraints enforce characteristic deviations from classic efficient coding signatures. Despite interaction between these effects, clear signatures emerge for both unconstrained optimization problems and information-maximizing objective functions. Asking for a fixed-point of the neural code adaptation, we find an objective-independent characterization of constraints on the neural code. We use this result to propose an experimental paradigm that can characterize both the objective and constraint functions that an observed code appears to be optimized for.

## 2019

- Learning optimal decisions with confidence. Jan Drugowitsch, André G. Mendonça, Zachary F. Mainen, and Alexandre Pouget.
*Proceedings of the National Academy of Sciences*, Nov 2019. Diffusion decision models (DDMs) are immensely successful models for decision making under uncertainty and time pressure. In the context of perceptual decision making, these models typically start with two input units, organized in a neuron–antineuron pair. In contrast, in the brain, sensory inputs are encoded through the activity of large neuronal populations. Moreover, while DDMs are wired by hand, the nervous system must learn the weights of the network through trial and error. There is currently no normative theory of learning in DDMs and therefore no theory of how decision makers could learn to make optimal decisions in this context. Here, we derive such a rule for learning a near-optimal linear combination of DDM inputs based on trial-by-trial feedback. The rule is Bayesian in the sense that it learns not only the mean of the weights but also the uncertainty around this mean in the form of a covariance matrix. In this rule, the rate of learning is proportional (respectively, inversely proportional) to confidence for incorrect (respectively, correct) decisions. Furthermore, we show that, in volatile environments, the rule predicts a bias toward repeating the same choice after correct decisions, with a bias strength that is modulated by the previous choice’s difficulty. Finally, we extend our learning rule to cases for which one of the choices is more likely a priori, which provides insights into how such biases modulate the mechanisms leading to optimal decisions in diffusion models.

- Family of closed-form solutions for two-dimensional correlated diffusion processes. Haozhe Shan, Rubén Moreno-Bote, and Jan Drugowitsch.
*Physical Review E*, Sep 2019. Diffusion processes with boundaries are models of transport phenomena with wide applicability across many fields. These processes are described by their probability density functions (PDFs), which often obey Fokker-Planck equations (FPEs). While obtaining analytical solutions is often possible in the absence of boundaries, obtaining closed-form solutions to the FPE is more challenging once absorbing boundaries are present. As a result, analyses of these processes have largely relied on approximations or direct simulations. In this paper, we studied two-dimensional, time-homogeneous, spatially correlated diffusion with linear, axis-aligned, absorbing boundaries. Our main result is the explicit construction of a full family of closed-form solutions for their PDFs using the method of images. We found that such solutions can be built if and only if the correlation coefficient ρ between the two diffusing processes takes one of a numerable set of values. Using a geometric argument, we derived the complete set of ρ’s where such solutions can be found. Solvable ρ’s are given by ρ = −cos(π/k), where k ∈ ℤ⁺ ∪ {+∞}. Solutions were validated in simulations. Qualitative behaviors of the process appear to vary smoothly over ρ, allowing extrapolation from our solutions to cases with unsolvable ρ’s.

- Control of Synaptic Specificity by Establishing a Relative Preference for Synaptic Partners. Chundi Xu, Emma Theisen, Ryan Maloney, Jing Peng, Ivan Santiago, Clarence Yapp, Zachary Werkhoven, Elijah Rumbaut, and 11 more authors.
*Neuron*, Sep 2019. The ability of neurons to identify correct synaptic partners is fundamental to the proper assembly and function of neural circuits. Relative to other steps in circuit formation such as axon guidance, our knowledge of how synaptic partner selection is regulated is severely limited. Drosophila Dpr and DIP immunoglobulin superfamily (IgSF) cell-surface proteins bind heterophilically and are expressed in a complementary manner between synaptic partners in the visual system. Here, we show that in the lamina, DIP mis-expression is sufficient to promote synapse formation with Dpr-expressing neurons and that disrupting DIP function results in ectopic synapse formation. These findings indicate that DIP proteins promote synapses to form between specific cell types and that in their absence, neurons synapse with alternative partners. We propose that neurons have the capacity to synapse with a broad range of cell types and that synaptic specificity is achieved by establishing a preference for specific partners.

- Optimal policy for multi-alternative decisions. Satohiro Tajima, Jan Drugowitsch, Nisheet Patel, and Alexandre Pouget.
*Nature Neuroscience*, Sep 2019. Everyday decisions frequently require choosing among multiple alternatives. Yet the optimal policy for such decisions is unknown. Here we derive the normative policy for general multi-alternative decisions. This strategy requires evidence accumulation to nonlinear, time-dependent bounds that trigger choices. A geometric symmetry in those boundaries allows the optimal strategy to be implemented by a simple neural circuit involving normalization with fixed decision bounds and an urgency signal. The model captures several key features of the response of decision-making neurons as well as the increase in reaction time as a function of the number of alternatives, known as Hick’s law. In addition, we show that in the presence of divisive normalization and internal variability, our model can account for several so-called ‘irrational’ behaviors, such as the similarity effect as well as the violation of both the independence of irrelevant alternatives principle and the regularity principle.

- Prefrontal mechanisms combining rewards and beliefs in human decision-making. Marion Rouault, Jan Drugowitsch, and Etienne Koechlin.
*Nature Communications*, Jan 2019. In uncertain and changing environments, optimal decision-making requires integrating reward expectations with probabilistic beliefs about reward contingencies. Little is known, however, about how the prefrontal cortex (PFC), which subserves decision-making, combines these quantities. Here, using computational modelling and neuroimaging, we show that the ventromedial PFC encodes both reward expectations and proper beliefs about reward contingencies, while the dorsomedial PFC combines these quantities and guides choices that are at variance with those predicted by optimal decision theory: instead of integrating reward expectations with beliefs, the dorsomedial PFC built context-dependent reward expectations commensurable to beliefs and used these quantities as two concurrent appetitive components, driving choices. This neural mechanism accounts for well-known risk aversion effects in human decision-making. The results reveal that the irrationality of human choices, commonly theorized as deriving from optimal computations over false beliefs, actually stems from suboptimal neural heuristics over rational beliefs about reward contingencies.
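
As a small numerical illustration of the closed-form diffusion entry in this year's list (Shan, Moreno-Bote, and Drugowitsch), the solvable correlation coefficients ρ = −cos(π/k) quoted in that abstract can be enumerated directly. This is only a sketch of the quoted formula, not code from the paper; the function name `solvable_rhos` and the cutoff `k_max` are assumptions made for illustration.

```python
import math


def solvable_rhos(k_max):
    """Return [(k, rho)] with rho = -cos(pi / k) for k = 1..k_max.

    Illustrative helper (hypothetical name). The limit k -> +infinity,
    which gives rho = -1, is not included in the returned list.
    """
    return [(k, -math.cos(math.pi / k)) for k in range(1, k_max + 1)]


if __name__ == "__main__":
    for k, rho in solvable_rhos(6):
        print(f"k = {k}: rho = {rho:+.4f}")
```

The values run from ρ = 1 at k = 1 through ρ = 0 at k = 2 and approach ρ = −1 as k grows.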

## 2017

- Lateral orbitofrontal cortex anticipates choices and integrates prior with current information. Ramon Nogueira, Juan M. Abolafia, Jan Drugowitsch, Emili Balaguer-Ballester, Maria V. Sanchez-Vives, and Rubén Moreno-Bote.
*Nature Communications*, Mar 2017. Adaptive behavior requires integrating prior with current information to anticipate upcoming events. Brain structures related to this computation should bring relevant signals from the recent past into the present. Here we report that rats can integrate the most recent prior information with sensory information, thereby improving behavior on a perceptual decision-making task with outcome-dependent past trial history. We find that anticipatory signals in the orbitofrontal cortex about upcoming choice increase over time and are even present before stimulus onset. These neuronal signals also represent the stimulus and relevant second-order combinations of past state variables. The encoding of choice, stimulus and second-order past state variables resides, up to movement onset, in overlapping populations. The neuronal representation of choice before stimulus onset and its build-up once the stimulus is presented suggest that orbitofrontal cortex plays a role in transforming immediate prior and stimulus information into choices using a compact state-space representation.

## 2016

- Computational Precision of Mental Inference as Critical Source of Human Choice Suboptimality. Jan Drugowitsch, Valentin Wyart, Anne-Dominique Devauchelle, and Etienne Koechlin.
*Neuron*, Mar 2016. Making decisions in uncertain environments often requires combining multiple pieces of ambiguous information from external cues. In such conditions, human choices resemble optimal Bayesian inference, but typically show a large suboptimal variability whose origin remains poorly understood. In particular, this choice suboptimality might arise from imperfections in mental inference rather than in peripheral stages, such as sensory processing and response selection. Here, we dissociate these three sources of suboptimality in human choices based on combining multiple ambiguous cues. Using a novel quantitative approach for identifying the origin and structure of choice variability, we show that imperfections in inference alone cause a dominant fraction of suboptimal choices. Furthermore, two-thirds of this suboptimality appear to derive from the limited precision of neural computations implementing inference rather than from systematic deviations from Bayes-optimal inference. These findings set an upper bound on the accuracy and ultimate predictability of human choices in uncertain environments.

- Optimal policy for value-based decision-making. Satohiro Tajima, Jan Drugowitsch, and Alexandre Pouget.
*Nature Communications*, Aug 2016. For decades now, normative theories of perceptual decisions, and their implementation as drift diffusion models, have driven and significantly improved our understanding of human and animal behaviour and the underlying neural processes. While similar processes seem to govern value-based decisions, we still lack the theoretical understanding of why this ought to be the case. Here, we show that, similar to perceptual decisions, drift diffusion models implement the optimal strategy for value-based decisions. Such optimal decisions require the models’ decision boundaries to collapse over time, and to depend on the a priori knowledge about reward contingencies. Diffusion models only implement the optimal strategy under specific task assumptions, and cease to be optimal once we start relaxing these assumptions, by, for example, using non-linear utility functions. Our findings thus provide the much-needed theory for value-based decisions, explain the apparent similarity to perceptual decisions, and predict conditions under which this similarity should break down.

- Becoming Confident in the Statistical Nature of Human Confidence Judgments. Jan Drugowitsch.
*Neuron*, Aug 2016. In this issue of Neuron, Sanders et al. (2016) demonstrate that human confidence judgments seem to arise from computations compatible with statistical decision theory, shining a new light on the old questions of how such judgments are formed.

- Confidence and certainty: distinct probabilistic quantities for different goals. Alexandre Pouget, Jan Drugowitsch, and Adam Kepecs.
*Nature Neuroscience*, Aug 2016. The authors use recent probabilistic theories of neural computation to argue that confidence and certainty are not identical concepts. They propose precise mathematical definitions for both of these concepts and discuss putative neural representations.

- Fast and accurate Monte Carlo sampling of first-passage times from Wiener diffusion models. Jan Drugowitsch.
*Scientific Reports*, Feb 2016. We present a new, fast approach for drawing boundary crossing samples from Wiener diffusion models. Diffusion models are widely applied to model choices and reaction times in two-choice decisions. Samples from these models can be used to simulate the choices and reaction times they predict. These samples, in turn, can be utilized to adjust the models’ parameters to match observed behavior from humans and other animals. Usually, such samples are drawn by simulating a stochastic differential equation in discrete time steps, which is slow and leads to biases in the reaction time estimates. Our method, instead, uses known expressions for first-passage time densities, which results in unbiased, exact samples and a hundred to thousand-fold speed increase in typical situations. In its most basic form it is restricted to diffusion models with symmetric boundaries and non-leaky accumulation, but our approach can be extended to also handle asymmetric boundaries or to approximate leaky accumulation.

- Multiplicative and Additive Modulation of Neuronal Tuning with Population Activity Affects Encoded Information. Iñigo Arandia-Romero, Seiji Tanabe, Jan Drugowitsch, Adam Kohn, and Rubén Moreno-Bote.
*Neuron*, Mar 2016. Numerous studies have shown that neuronal responses are modulated by stimulus properties and also by the state of the local network. However, little is known about how activity fluctuations of neuronal populations modulate the sensory tuning of cells and affect their encoded information. We found that fluctuations in ongoing and stimulus-evoked population activity in primate visual cortex modulate the tuning of neurons in a multiplicative and additive manner. While distributed on a continuum, neurons with stronger multiplicative effects tended to have less additive modulation and vice versa. The information encoded by multiplicatively modulated neurons increased with greater population activity, while that of additively modulated neurons decreased. These effects offset each other so that population activity had little effect on total information. Our results thus suggest that intrinsic activity fluctuations may act as a “traffic light” that determines which subset of neurons is most informative.
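
For context on the first-passage-time sampling entry in this year's list (Drugowitsch, *Scientific Reports*), the conventional baseline that its abstract describes, simulating the diffusion in discrete time steps, can be sketched as follows. This is not the paper's exact sampling method; it is the slower, discretization-biased approach the abstract contrasts it with. The function name, its parameters, and the default step size are assumptions made for illustration.

```python
import math
import random


def simulate_first_passage(drift, bound, dt=1e-3, rng=random):
    """Euler-Maruyama simulation of dx = drift * dt + dW with bounds at +/-bound.

    Returns (choice, time): choice is +1 for the upper bound and -1 for
    the lower bound. The returned time is a multiple of dt, so it tends
    to overestimate the true crossing time (the bias noted in the abstract).
    """
    x, t = 0.0, 0.0
    sqrt_dt = math.sqrt(dt)
    while abs(x) < bound:
        # One discrete step: deterministic drift plus Gaussian diffusion noise.
        x += drift * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
    return (1 if x > 0 else -1), t


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same samples
    samples = [simulate_first_passage(1.0, 1.0, rng=rng) for _ in range(1000)]
    p_upper = sum(c == 1 for c, _ in samples) / len(samples)
    mean_rt = sum(t for _, t in samples) / len(samples)
    print(f"P(upper) ~ {p_upper:.3f}, mean first-passage time ~ {mean_rt:.3f}")
```

Shrinking `dt` reduces the discretization bias but makes each sample proportionally more expensive, which is the trade-off that motivates sampling from first-passage time densities directly.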

## 2015

- Causal Inference and Explaining Away in a Spiking Network. Rubén Moreno-Bote and Jan Drugowitsch.
*Scientific Reports*, Dec 2015. While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference and uses simple operations, such as linear synapses with realistic time constants and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification.

- Simultaneous Learning and Filtering without Delusions: A Bayes-Optimal Derivation of Combining Predictive Inference and Adaptive Filtering. Jan Kneissler, Jan Drugowitsch, Karl Friston, and Martin Butz.
*Frontiers in Computational Neuroscience*, Dec 2015. Predictive coding appears to be one of the fundamental working principles of brain processing. Amongst other aspects, brains often predict the sensory consequences of their own actions. Predictive coding resembles Kalman filtering, where incoming sensory information is filtered to produce prediction errors for subsequent adaptation and learning. However, to generate prediction errors given motor commands, a suitable temporal forward model is required to generate predictions. While in engineering applications, it is usually assumed that this forward model is known, the brain has to learn it. When filtering sensory input and learning from the residual signal in parallel, a fundamental problem arises: the system can enter a delusional loop when filtering the sensory information using an overly trusted forward model. In this case, learning stalls before accurate convergence because uncertainty about the forward model is not properly accommodated. We present a Bayes-optimal solution to this generic and pernicious problem for the case of linear forward models, which we call Predictive Inference and Adaptive Filtering (PIAF). PIAF filters incoming sensory information and learns the forward model simultaneously. We show that PIAF is formally related to Kalman filtering and to the Recursive Least Squares linear approximation method, but combines these procedures in a Bayes optimal fashion. Numerical evaluations confirm that the delusional loop is precluded and that the learning of the forward model is more than 10 times faster when compared to a naive combination of Kalman filtering and Recursive Least Squares.

- Tuning the speed-accuracy trade-off to maximize reward rate in multisensory decision-making. Jan Drugowitsch, Gregory C DeAngelis, Dora E Angelaki, and Alexandre Pouget.
*eLife*, Jun 2015. For decisions made under time pressure, effective decision making based on uncertain or ambiguous evidence requires efficient accumulation of evidence over time, as well as appropriately balancing speed and accuracy, known as the speed/accuracy trade-off. For simple unimodal stimuli, previous studies have shown that human subjects set their speed/accuracy trade-off to maximize reward rate. We extend this analysis to situations in which information is provided by multiple sensory modalities. Analyzing previously collected data (Drugowitsch et al., 2014), we show that human subjects adjust their speed/accuracy trade-off to produce near-optimal reward rates. This trade-off can change rapidly across trials according to the sensory modalities involved, suggesting that it is represented by neural population codes rather than implemented by slow neuronal mechanisms such as gradual changes in synaptic weights. Furthermore, we show that deviations from the optimal speed/accuracy trade-off can be explained by assuming an incomplete gradient-based learning of these trade-offs.

## 2014

- Optimal decision-making with time-varying evidence reliability. Jan Drugowitsch, Ruben Moreno-Bote, and Alexandre Pouget.
*In Advances in Neural Information Processing Systems*, Jun 2014. Previous theoretical and experimental work on optimal decision-making was restricted to the artificial setting of a reliability of the momentary sensory evidence that remained constant within single trials. The work presented here describes the computation and characterization of optimal decision-making in the more realistic case of an evidence reliability that varies across time even within a trial. It shows that, in this case, the optimal behavior is determined by a bound in the decision maker’s belief that depends only on the current, but not the past, reliability. We furthermore demonstrate that simpler heuristics fail to match the optimal performance for certain characteristics of the process that determines the time-course of this reliability, causing a drop in reward rate by more than 50%.

- Optimal multisensory decision-making in a reaction-time task. Jan Drugowitsch, Gregory C DeAngelis, Eliana M Klier, Dora E Angelaki, and Alexandre Pouget.
*eLife*, Jun 2014. Humans and animals can integrate sensory evidence from various sources to make decisions in a statistically near-optimal manner, provided that the stimulus presentation time is fixed across trials. Little is known about whether optimality is preserved when subjects can choose when to make a decision (reaction-time task), nor when sensory inputs have time-varying reliability. Using a reaction-time version of a visual/vestibular heading discrimination task, we show that behavior is clearly sub-optimal when quantified with traditional optimality metrics that ignore reaction times. We created a computational model that accumulates evidence optimally across both cues and time, and trades off accuracy with decision speed. This model quantitatively explains subjects’ choices and reaction times, supporting the hypothesis that subjects do, in fact, accumulate evidence optimally over time and across sensory modalities, even when the reaction time is under the subject’s control.

- Relation between Belief and Performance in Perceptual Decision Making. Jan Drugowitsch, Ruben Moreno-Bote, and Alexandre Pouget.
*PLoS ONE*, May 2014. In an uncertain and ambiguous world, effective decision making requires that subjects form and maintain a belief about the correctness of their choices, a process called meta-cognition. Prediction of future outcomes and self-monitoring are only effective if belief closely matches behavioral performance. Equality between belief and performance is also critical for experimentalists to gain insight into the subjects’ belief by simply measuring their performance. Assuming that the decision maker holds the correct model of the world, one might indeed expect that belief and performance should go hand in hand. Unfortunately, we show here that this is rarely the case when performance is defined as the percentage of correct responses for a fixed stimulus, a standard definition in psychophysics. In this case, belief equals performance only for a very narrow family of tasks, whereas in others they will only be very weakly correlated. As we will see, it is possible to restore this equality in specific circumstances, but this remedy is only effective for a decision-maker, not for an experimenter. We furthermore show that belief and performance do not match when conditioned on task difficulty – as is common practice when plotting the psychometric curve – highlighting common pitfalls in previous neuroscience work. Finally, we demonstrate that miscalibration and the hard-easy effect observed in humans’ and other animals’ certainty judgments could be explained by a mismatch between the experimenter’s and decision maker’s expected distribution of task difficulties. These results have important implications for experimental design and are of relevance for theories that aim to unravel the nature of meta-cognition.

- Filtering Sensory Information with XCSF: Improving Learning Robustness and Robot Arm Control Performance. Jan Kneissler, Patrick O. Stalph, Jan Drugowitsch, and Martin V. Butz.
*Evolutionary Computation*, Mar 2014. It has been shown previously that the control of a robot arm can be efficiently learned using the XCSF learning classifier system, which is a nonlinear regression system based on evolutionary computation. So far, however, the predictive knowledge about how actual motor activity changes the state of the arm system has not been exploited. In this paper, we utilize the forward velocity kinematics knowledge of XCSF to alleviate the negative effect of noisy sensors for successful learning and control. We incorporate Kalman filtering for estimating successive arm positions, iteratively combining sensory readings with XCSF-based predictions of hand position changes over time. The filtered arm position is used to improve both trajectory planning and further learning of the forward velocity kinematics. We test the approach on a simulated kinematic robot arm model. The results show that the combination can improve learning and control performance significantly. However, it also shows that variance estimates of XCSF prediction may be underestimated, in which case self-delusional spiraling effects can hinder effective learning. Thus, we introduce a heuristic parameter, which can be motivated by theory, and which limits the influence of XCSF’s predictions on its own further learning input. As a result, we obtain drastic improvements in noise tolerance, allowing the system to cope with more than 10 times higher noise levels.

## 2013

- Variational Bayesian inference for linear and logistic regression. Jan Drugowitsch.
*ArXiv e-prints*, Oct 2013. The article describes the model, derivation, and implementation of variational Bayesian inference for linear and logistic regression, both with and without automatic relevance determination. It has the dual function of acting as a tutorial for the derivation of variational Bayesian inference for simple models, as well as documenting, and providing brief examples for, the MATLAB/Octave functions that implement this inference. These functions are freely available online.

## 2012

- Filtering Sensory Information with XCSF: Improving Learning Robustness and Control Performance. Jan Kneissler, Patrick O. Stalph, Jan Drugowitsch, and Martin V. Butz.
*In Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation*, Jul 2012. It was previously shown that the control of a robot arm can be efficiently learned using the XCSF classifier system. So far, however, the predictive knowledge about how actual motor activity changes the state of the arm system has not been exploited. In this paper, we exploit the forward velocity kinematics knowledge of XCSF to alleviate the negative effect of noisy sensors for successful learning and control. We incorporate Kalman filtering for estimating successive arm positions, iteratively combining sensory readings with XCSF-based predictions of hand position changes over time. The filtered arm position is used to improve both trajectory planning and further learning of the forward velocity kinematics. We test the approach on a simulated, kinematic robot arm model. The results show that the combination can improve learning and control performance significantly. However, it also shows that variance estimates of XCSF predictions may be underestimated, in which case self-delusional spiraling effects hinder effective learning. Thus, we introduce a heuristic parameter, which limits the influence of XCSF’s predictions on its own further learning input. As a result, we obtain drastic improvements in noise tolerance, coping with more than ten times higher noise levels.

- The Cost of Accumulating Evidence in Perceptual Decision Making. Jan Drugowitsch, Rubén Moreno-Bote, Anne K. Churchland, Michael N. Shadlen, and Alexandre Pouget.
*Journal of Neuroscience*, Mar 2012. Decision making often involves the accumulation of information over time, but acquiring information typically comes at a cost. Little is known about the cost incurred by animals and humans for acquiring additional information from sensory variables due, for instance, to attentional efforts. Through a novel integration of diffusion models and dynamic programming, we were able to estimate the cost of making additional observations per unit of time from two monkeys and six humans in a reaction time (RT) random-dot motion discrimination task. Surprisingly, we find that the cost is neither zero nor constant over time, but for the animals and humans features a brief period in which it is constant but increases thereafter. In addition, we show that our theory accurately matches the observed reaction time distributions for each stimulus condition, the time-dependent choice accuracy both conditional on stimulus strength and independent of it, and choice accuracy and mean reaction times as a function of stimulus strength. The theory also correctly predicts that urgency signals in the brain should be independent of the difficulty, or stimulus strength, at each trial.

- Probabilistic vs. non-probabilistic approaches to the neurobiology of perceptual decision-making. Jan Drugowitsch and Alexandre Pouget.
*Current Opinion in Neurobiology*, Dec 2012. Optimal binary perceptual decision making requires accumulation of evidence in the form of a probability distribution that specifies the probability of the choices being correct given the evidence so far. Reward rates can then be maximized by stopping the accumulation when the confidence about either option reaches a threshold. Behavioral and neuronal evidence suggests that humans and animals follow such a probabilistic decision strategy, although its neural implementation has yet to be fully characterized. Here we show that diffusion decision models and attractor network models provide an approximation to the optimal strategy only under certain circumstances. In particular, neither model type is sufficiently flexible to encode the reliability of both the momentary and the accumulated evidence, which is a pre-requisite to accumulate evidence of time-varying reliability. Probabilistic population codes, by contrast, can encode these quantities and, as a consequence, have the potential to implement the optimal strategy accurately.

## 2010

- Quick thinking: perceiving in a tenth of a blink of an eye. Jan Drugowitsch and Alexandre Pouget
*Nature Neuroscience*, Mar 2010. What is the minimal sensory processing time before we can make a decision about a stimulus? A study now reports that, for simple perceptual decisions, this can take as little as 30 ms.

## 2008

- Analysis and Improvements of the Classifier Error Estimate in XCSF. Daniele Loiacono, Jan Drugowitsch, Alwyn Barry, and Pier Luca Lanzi
*In Learning Classifier Systems*, Mar 2008. The estimation of the classifier error plays a key role in accuracy-based learning classifier systems. In this paper we study the current definition of the classifier error in XCSF and discuss the limitations of the algorithm that is currently used to compute the classifier error estimate from online experience. Subsequently, we introduce a new definition for the classifier error and apply the Bayes Linear Analysis framework to find a more accurate and reliable error estimate. This results in two incremental error estimate update algorithms that we compare empirically to the performance of the currently applied approach. Our results suggest that the new estimation algorithms can improve the generalization capabilities of XCSF, especially when the action-set subsumption operator is used.

- A formal framework and extensions for function approximation in learning classifier systems. Jan Drugowitsch and Alwyn M. Barry
*Machine Learning*, Jan 2008. Learning Classifier Systems (LCS) consist of three components: function approximation, reinforcement learning, and classifier replacement. In this paper we formalize the function approximation part by providing a clear problem definition, a formalization of the LCS function approximation architecture, and a definition of the function approximation aim. Additionally, we provide definitions of optimality and of the conditions that need to be fulfilled for a classifier to be optimal. As a demonstration of the usefulness of the framework, we derive commonly used algorithmic approaches that aim at reaching optimality from first principles, and introduce a new Kalman filter-based method that outperforms all currently implemented methods, in addition to providing further insight into the probabilistic basis of the localized model that a classifier provides. A global function approximation in LCS is achieved by combining the classifiers’ localized models, for which we provide an approach that is simpler than that of current LCS and is based on the Maximum Likelihood of a combination of all classifiers. The formalizations in this paper act as the foundation of a formal framework, currently under active development, that includes all three LCS components, promising a better formal understanding of current LCS and the development of better LCS algorithms.

- A Principled Foundation for LCS. Jan Drugowitsch and Alwyn M. Barry
*In Learning Classifier Systems*, Jan 2008. In this paper we promote a new methodology for designing LCS that is based on first identifying their underlying model and then using standard machine learning methods to train this model. This leads to a clear identification of the LCS model and makes explicit the assumptions made about the data. It also promises advances in the theoretical understanding of LCS by transferring the understanding of the applied machine learning methods to LCS. Additionally, it allows us, for the first time, to give a formal and general, that is, representation-independent, definition of the optimal set of classifiers that LCS aim at finding. To demonstrate the feasibility of the proposed methodology we design a Bayesian LCS model by borrowing concepts from the related Mixtures-of-Experts model. The quality of a set of classifiers, and consequently also the optimal set of classifiers, is defined by the application of Bayesian model selection, which turns finding this set into a principled optimisation task. Using a simple Pittsburgh-style LCS, a set of preliminary experiments demonstrates the feasibility of this approach.

## 2007

- Generalised mixtures of experts, independent expert training, and learning classifier systems. Jan Drugowitsch and Alwyn M. Barry. Apr 2007
We present a generalisation to the Mixtures of Experts model that introduces prior localisation of the experts as part of the model structure, and as such relates them strongly to the evolutionary computation ML method known as Learning Classifier Systems. While the introduced generalisation allows specification of more complex localisation patterns, identifying good models becomes more difficult. We approach this tradeoff by introducing a new training schema that makes fitting a single model computationally less demanding and shifts the importance to searching the space of possible model structures, guided by approximate variational Bayesian inference to fit the model and find the model evidence. We demonstrate model search for simple non-linear curve fitting tasks by sampling from the model posterior, as a proof-of-concept alternative to the genetic algorithm used in Learning Classifier Systems for that purpose.

- Learning classifier systems from first principles: a probabilistic reformulation of learning classifier systems from the perspective of machine learning. Jan Drugowitsch
*University of Bath*, Aug 2007. Learning Classifier Systems (LCS) are a family of rule-based machine learning methods. They aim at the autonomous production of potentially human-readable results that are the most compact generalised representation whilst also maintaining high predictive accuracy, with a wide range of application areas, such as autonomous robotics, economics, and multi-agent systems. Their design is mainly approached heuristically and, even though their performance is competitive in regression and classification tasks, they do not meet their expected performance in sequential decision tasks despite being initially designed for such tasks. It is our contention that improvement is hindered by a lack of theoretical understanding of their underlying mechanisms and dynamics. To improve this understanding, our work proposes a new methodology for their design that centres on the model they use to represent the problem structure, and subsequently applies standard machine learning methods to train this model. The LCS structure is commonly a set of rules, resulting in a parametric model that combines a set of localised models, each representing one rule. This leads to a general definition of the optimal set of rules as being the one whose model represents the data best and at a minimum complexity, and hence an increased theoretical understanding of LCS. Consequently, LCS training reduces to searching and evaluating this set of rules, for which we introduce and apply several standard methods that are shown to be closely related to current LCS implementations. The benefit of taking this approach is not only a new view on LCS, and the transfer of the formal basis of the applied methods to the analysis of LCS, but also the first general definition of what it means for a set of rules to be optimal.
The work promises advances in several areas, such as developing new LCS implementations with performance guarantees, improving their performance, and, foremost, advancing their theoretical understanding.

- Mixing Independent Classifiers. Jan Drugowitsch and Alwyn M. Barry
*In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation*, Aug 2007. In this study we deal with the mixing problem, which concerns combining the predictions of independently trained local models to form a global prediction. We deal with it from the perspective of Learning Classifier Systems, where a set of classifiers provides the local models. Firstly, we formalise the mixing problem and provide both analytical and heuristic approaches to solving it. The analytical approaches are shown not to scale well with the number of local models, but are nevertheless compared to heuristic models in a set of function approximation tasks. These experiments show that we can design heuristics that exceed the performance of the current state-of-the-art Learning Classifier System XCS, and are competitive when compared to analytical solutions. Additionally, we provide an upper bound on the prediction errors for the heuristic mixing approaches.

- A Principled Foundation for LCS. Jan Drugowitsch and Alwyn M. Barry
*In Proceedings of the 9th Annual Conference Companion on Genetic and Evolutionary Computation*, Aug 2007. In this paper we explicitly identify the probabilistic model underlying LCS by linking it to a generalisation of the common Mixture-of-Experts model. Having an explicit representation of the model not only puts LCS on a strong statistical foundation and identifies the assumptions that the model makes about the data, but also allows us to use off-the-shelf training methods to train it. We show how to exploit this advantage by embedding the LCS model into a fully Bayesian framework that results in an objective function for a set of classifiers, effectively turning the LCS training into a principled optimisation task. A set of preliminary experiments demonstrates the feasibility of this approach.

## 2006

- A Formal Framework and Extensions for Function Approximation in Learning Classifier Systems. Jan Drugowitsch and Alwyn M. Barry. Jan 2006
In this paper we introduce part of a formal framework for Learning Classifier Systems (LCS) which, as a whole, aims at incorporating all components of LCS: function approximation, reinforcement learning and classifier replacement. The part introduced here concerns function approximation, and provides a formal problem definition, a formalisation of the LCS function approximation architecture, and a definition of the approximation aim. Additionally, we provide definitions of optimality and what conditions need to be fulfilled for a classifier to be optimal. Furthermore, as a demonstration of the usefulness of the framework, we derive commonly used algorithmic approaches that aim at reaching optimality from first principles, and introduce a new Kalman filter-based method that outperforms all currently implemented methods. How to mix classifiers to reach an overall approximation is simplified when compared to current LCS, and is justified by the Maximum Likelihood Estimate of a combination of all classifiers.

- A Formal Framework for Reinforcement Learning with Function Approximation in Learning Classifier Systems. Jan Drugowitsch and Alwyn M. Barry. Jan 2006
To fully understand the properties of Accuracy-based Learning Classifier Systems, we need a formal framework that captures all components of classifier systems, that is, function approximation, reinforcement learning, and classifier replacement, and permits the modelling of them separately and in their interaction. In this paper we extend our previous work on function approximation to reinforcement learning and to the interaction between reinforcement learning and function approximation. After giving an overview and derivations for common reinforcement learning methods from first principles, we show how they apply to Learning Classifier Systems. At the same time, we present a new algorithm that is expected to outperform all current methods, discuss the use of XCS with gradient descent and TD(λ), and give an in-depth discussion on how to study the convergence of Learning Classifier Systems with a time-invariant population.

- Improving classifier error estimate in XCSF. Jan Drugowitsch and Alwyn M. Barry
*In The Ninth International Workshop on Learning Classifier Systems, IWLCS-2006*, Jan 2006. In this paper we explicitly identify the probabilistic model underlying LCS by linking it to a generalisation of the common Mixture-of-Experts model. Having an explicit representation of the model not only puts LCS on a strong statistical foundation and identifies the assumptions that the model makes about the data, but also allows us to use off-the-shelf training methods to train it. We show how to exploit this advantage by embedding the LCS model into a fully Bayesian framework that results in an objective function for a set of classifiers, effectively turning the LCS training into a principled optimisation task. A set of preliminary experiments demonstrates the feasibility of this approach.

- Mixing Independent Classifiers. Jan Drugowitsch and Alwyn M. Barry. Nov 2006
In this study we deal with the mixing problem, which concerns combining the predictions of independently trained local models to form a global prediction. We deal with it from the perspective of Learning Classifier Systems, where a set of classifiers provides the local models. Firstly, we formalise the mixing problem and provide both analytical and heuristic approaches to solving it. The analytical approaches are shown not to scale well with the number of local models, but are nevertheless compared to heuristic models in a set of function approximation tasks. These experiments show that we can design heuristics that exceed the performance of the current state-of-the-art Learning Classifier System XCS, and are competitive when compared to analytical solutions. Additionally, we provide an upper bound on the prediction errors for the heuristic mixing approaches.

- Towards Convergence of Learning Classifier Systems Value Iteration. Jan Drugowitsch and Alwyn M. Barry
*In Evolutionary Computation Workshop at the European Conference of Artificial Intelligence (ECAI)*, Nov 2006. In this paper we extend previous work on analysing Learning Classifier Systems (LCS) in the reinforcement learning framework to deepen the theoretical analysis of Value Iteration with LCS function approximation. After introducing the formal framework and some mathematical preliminaries we demonstrate convergence of the algorithm for fixed classifier mixing weights, and show that if the weights are not fixed, the choice of the mixing function is significant. Furthermore, we discuss accuracy-based mixing and outline a proof that shows convergence of LCS Value Iteration with an accuracy-based classifier mixing. This work is a significant step towards convergence of accuracy-based LCS that use Q-Learning as the reinforcement learning component.

## 2005

- Integrating Life-Like Action Selection into Cycle-Based Agent Simulation Environments. Joanna J Bryson, Tristan J Caulfield, and Jan Drugowitsch
*In Proceedings of Agent 2005: Generative Social Processes, Models, and Mechanisms*, Nov 2005. Standardised simulation platforms such as RePast, Swarm, MASON and NetLogo are making Agent-Based Modelling accessible to ever-widening audiences. Some proportion of these modellers have good reason to want their agents to express relatively complex behaviour, or they may wish to describe their agents’ actions in terms of real time. Agents of increasing complexity may often be better (more simply) described using hierarchical constructs which express the priorities and goals of their actions, and the contexts in which sets of actions may be applicable (Bryson, 2003a). Describing an agent’s behaviour clearly and succinctly in this way might seem at odds with the iterative, cycle-based nature of most simulation platforms. Because each agent is known to act in lock-step synchrony with the others, describing the individual’s behaviour in terms of fluid, coherent long-term plans may seem difficult. In this paper we describe how an action-selection system designed for more conventionally humanoid AI such as robotics and virtual reality can be incorporated into a cycle-based ABM simulation platform. We integrate a Python-language version of the action selection for Bryson’s Behavior Oriented Design (BOD) into a fairly standard cycle-based simulation platform, MASON (Luke et al., 2003). The resulting system is currently being used as a research platform in our group, and has been used for laboratories in the European Agent Systems Summer School.

- XCS with Eligibility Traces. Jan Drugowitsch and Alwyn M. Barry
*In Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation*, Nov 2005. The development of the XCS Learning Classifier System has produced a robust and stable implementation that performs competitively in direct-reward environments. Although investigations in delayed-reward (i.e. multi-step) environments have shown promise, XCS still struggles to efficiently find optimal solutions in environments with long action-chains. This paper highlights the strong relation of XCS to reinforcement learning and identifies some of the major differences. This makes it possible to add Eligibility Traces to XCS, a method taken from reinforcement learning to update the prediction of the whole action-chain on each step, which should make prediction updates faster and more accurate. However, it is shown that the discrete nature of the condition representation of a classifier and the operation of the genetic algorithm cause traces to propagate back incorrect prediction values and, in some cases, result in a decrease of system performance. As a result, further investigation of the existing approach to generalisation is proposed.

- XCS with Eligibility Traces. Jan Drugowitsch and Alwyn M. Barry. Jan 2005
The development of the XCS Learning Classifier System has produced a robust and stable implementation that performs competitively in direct-reward environments. Although investigations in delayed-reward (i.e. multi-step) environments have shown promise, XCS still struggles to efficiently find optimal solutions in environments with long action-chains. This paper highlights the strong relation of XCS to reinforcement learning and identifies some of the major differences. This makes it possible to add Eligibility Traces to XCS, a method taken from reinforcement learning to update the prediction of the whole action-chain on each step, which should make prediction updates faster and more accurate. However, it is shown that the discrete nature of the condition representation of a classifier and the operation of the genetic algorithm cause traces to propagate back incorrect prediction values and, in some cases, result in a decrease of system performance. As a result, further investigation of the existing approach to generalisation is proposed.