• Computational Modeling of Behavior and Neural Mechanisms of Decision-Making Using Reinforcement Learning Theory

      Pietras, Bradley William; Schoenbaum, Geoffrey; Dayan, Peter, 1965- (2019)
      In the study of learning and decision-making in animals and humans, the field of Reinforcement Learning (RL) offers powerful ideas and tools for exploring the control mechanisms that underlie behavior. In this dissertation, we use RL to examine the questions of (i) how rats represent information about a complex, changing task; (ii) which variables underlie their decision-making processes; and (iii) whether those variables are encoded in the firing rates of neurons in the orbitofrontal cortex (OFC). We addressed these questions by making inquiries across three levels of understanding: computational theory, algorithmic representation, and physical implementation. Within this tri-level framework, we hypothesize that the subjects are engaged in a form of approximately optimal adaptive control. This involves their tracking critical, task-relevant features of their environment as these features change, and then making appropriate choices accordingly. Two classes of RL algorithms were constructed, embodying different structural assumptions. One class of so-called return-based algorithms is based on elaborations of a standard Q-learning algorithm. The other, novel class of income-based algorithms is based on a rather weaker notion of action-outcome contingency. Both classes of algorithm were parametrized, and additional factors such as perseveration were included. We fit the algorithms to behavioral data from subjects using complexity-controlled empirical Bayesian inference. Internal variables from our algorithms were then used to predict the firing rates of OFC neurons that were recorded as subjects performed the task. Linear regression, randomization testing, and false discovery rate analysis were used to determine statistically significant correlations between the predictors and neural activity. We found that income-class algorithms augmented with perseveration offered the best predictions of behavior.
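As a rough illustration of the return-based class mentioned above, the following is a minimal sketch of a stateless Q-learning rule with a softmax choice policy and a perseveration bonus. It is not the dissertation's model: the function names and the parameters `alpha` (learning rate), `beta` (inverse temperature), and `kappa` (perseveration weight) are assumptions chosen for illustration, and the income-based class is not sketched because the abstract does not specify it.

```python
import math


def softmax(drives):
    """Convert a list of real-valued drives into choice probabilities."""
    m = max(drives)  # subtract the max for numerical stability
    exps = [math.exp(d - m) for d in drives]
    total = sum(exps)
    return [e / total for e in exps]


def choice_probs(q, prev_action, beta, kappa):
    """Softmax over Q-values, with a bonus kappa for repeating prev_action.

    beta and kappa are hypothetical parameter names, not the author's.
    """
    drives = [
        beta * q[a] + (kappa if a == prev_action else 0.0)
        for a in range(len(q))
    ]
    return softmax(drives)


def q_update(q, action, reward, alpha):
    """Standard delta-rule update of the chosen action's value."""
    new_q = list(q)
    new_q[action] += alpha * (reward - q[action])
    return new_q


def neg_log_likelihood(actions, rewards, alpha, beta, kappa, n_actions=2):
    """Negative log-likelihood of an observed choice sequence under the model."""
    q = [0.0] * n_actions
    prev = None  # no previous action on the first trial
    nll = 0.0
    for a, r in zip(actions, rewards):
        p = choice_probs(q, prev, beta, kappa)
        nll -= math.log(p[a])
        q = q_update(q, a, r, alpha)
        prev = a
    return nll
```

Minimizing `neg_log_likelihood` over `alpha`, `beta`, and `kappa` would give a plain maximum-likelihood fit; the complexity-controlled empirical Bayesian procedure described in the abstract additionally places priors over such parameters, which this sketch omits.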
For the least restrictive statistical test (linear regression, p < 0.05), as many as 24% of the neurons were significantly correlated with variables associated with the best-fitting algorithm. By contrast, for our most restrictive test (randomized false discovery rate < 0.05), only 3% of the neurons passed as significant for one or more of our predictor variables. Other forms of neuronal dependence were apparent, including neurons that appeared to change their computational function dynamically depending on the state of the task.
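To make the gap between the two thresholds concrete, here is a minimal sketch of false discovery rate control using the standard Benjamini-Hochberg step-up procedure. This is an assumption for illustration only: the abstract does not say which FDR procedure was used, and its randomization-based variant may differ. The function name and the level parameter `q` are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    at false discovery rate q (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= q * k / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected
```

With one p-value per neuron, an uncorrected `p < 0.05` cut accepts every neuron below that threshold, whereas a procedure of this kind tightens the threshold according to the number of tests, which is consistent with far fewer neurons passing the stricter criterion.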