reinforcement learning and ethics

December 20, 2020 · ai/ethics


In the short film Perfectly Natural, we see a young couple, their new baby, and a simulated reality where the baby interacts with her artificially programmed parents. Over the course of this film’s fourteen minutes, we see a familiar thread of theoretical interactions between humans and their artificially intelligent counterparts. First, there is an apparent excitement from experiencing something new and fantastic, then a series of routine correspondence, and finally a hint of mortal obsoletion. Now, if we replace the human parents in this film with humans in the real world, and the simulated reality with colloquial “artificial intelligence”, these three interactions are not a far cry from what we have observed since the idea of AI was born. That is, the human creates some thing new and fantastic, exploits the utility of said thing via regular correspondence (or use), and is potentially overtaken by the same. What does this say about our place in the evolution of “thinking machines,” and what are some ethical concerns therein?

A few years ago, David Abel, et al, at Brown University wrote a paper on Reinforcement Learning as a Framework for Ethical Decision Making.1 The purpose of this paper was to investigate the use of the Reinforcement Learning framework, namely a cleverly specified Partially Observable Markov Decision Processes, as an algorithmic solution to the problem of designing artificial ethical agents. More to the point, seamlessly translating ethics into some kind of algorithmic computable format for an artificial agent to embody is all but an impossible problem. To be fair, it is apparent to the reader that the authors do not intend to diminish the complexity of this problem to the confines of their solution (e.g., they provide a list of issues and possible improvements to their proposal at the end of the paper), but I believe there are at least two additional considerations that should be made, which I will address here: anthropomorphisation of artificial entities, and the potentially problematic probabilistic interpretation of ethics. I will also provide a brief overview of the framework itself.

the framework

In Reinforcement Learning as a Framework for Ethical Decision Making (RLE), Abel et al use Reinforcement Learning (RL) as a tool to place ethical decision making into an algorithmic context. The framework is based on the Partially Observable Markov Decision Process (POMDP), whereby the world (i.e., an environment) is defined in terms of possible discrete states it can take on (e.g., it is raining in Nairobi and the time is 14:31:05.003). An agent navigates (interacts with) this environment by taking actions based on some policy which maximizes an overall reward function. At any given time, the actual state of the environment is unbeknownst to the agent, but it can infer a best policy based on its history of observations and actions.2

The authors inject ethics into this framework by interpreting ethical truths as some part of the hidden state (e.g., it is raining in Nairobi and killing is bad could be a state), and insist that among the actions an agent may take, we must include the ability to ask a companion.3 The idea, then, is that an agent can learn to act ethically through a series of observing the environment, estimating the environment’s true underlying state based on historical observations and actions, and taking an action (often asking a companion).

It should be noted that all aspects of this framework are probabilistic. We first assume some initial state (which has some prior probability associated with it), and then for every other state (of which there can be infinitely many) there is a probability that the environment will transition to that state, and so on recursively. Then, given some action and the resulting state, there is a probability associated with each possible observation. Finally, the overall future reward is contingent on the chosen policy and estimated belief of the hidden underlying state. Small state spaces like the ones illustrated in RLE are fairly easily modelled, but solving the general (infinite-horizon) POMDP has been shown to be intractable if not uncomputable.4

In the paper, the authors illustrate examples of how to apply this framework using two ethical dilemma thought experiments: “Cake or Death”, where the agent needs to ascertain whether to bake a cake for someone or kill them, and “The Burning Room”, whether they need to determine the best way to retrieve a valuable item from a potentially burning room, taking into account the difference in value between the item and themself. The authors make it clear that these are only toy ethical dilemmas, and just as they would be used in a philosophical setting, they do not describe how an agent should behave, but they shed light on how we can evaluate the ethical system itself.


Throughout the course of my reading RLE, I find myself — and the authors for that matter — using anthropomorphising terms to describe a mechanical agent, like when the agent learns, or is unsure, or even when it asks a question.5 In the documentation for the R (programming language) implementation for solving the finite POMDP, the authors even refer to an RL agent as “she.”6 We often project human-like attributes onto non-human entities, like when we give our car a name, or comment on the “happy little trees” on a canvas. When we do so, similarities between us and them seem to fly out of thin air, but at the same time a mountain of differences are lurking in the fog: we can talk about them as if they are, but cars and trees are not humans. Anthropomorphisation shapes the way we find similarities between us and the algorithms we create, but it also causes inconsistencies in our reasoning when they exhibit non-human behaviors.

While the authors do not make it explicit in the paper, there is an uncanny similarity between the way humans view ethics and the way ethics is modeled for the RL agent. That is, the artificial agent is painted as an entity which has choices, acts on observations, and in particular, is constantly attempting to discover some hidden underlying ethical truth. In the previous section, we discussed how the “state of the world” is completely unknown to the agent within this framework, and that it can only use previous observations and actions as a guide to understand and interact with its environment. As humans, we have created a whole field of philosophical inquiry in an effort to understand what is ethical. That is not to say that any artificial agent described in the paper has the capacity to create and theorize within sub-fields of philosophy, but only that we too only have our observations as tools to understand the hidden state beyond the veil.

It is not uncommon for the agent to ask “what is moral” (as the authors put it) in a situation to understand the most “ethical” policy. I recognize that humans often do something similar when we ask our religious teacher for guidance, or when we heed the advice of our parents as children.7 But, as the question “what is moral” leaves no room for doubt, and as the underlying ethical truth is made concrete, there is an element here reminiscent of Kantian categorical human imperative. The difference is instead of being based on human attributes, the agent’s ethical policy is based on human (or companion) direction. Suppose a nefarious actor directs the agent toward malicious, humanly unethical behavior. In this case, the understanding of “what is right” or “what is moral” is tainted from our perspective, but it is only new from the perspective of the agent. Therefore, in the formulation of this framework, there is an element of servitude: the ethical system of the machine must match that of its companion counterparts.

Perhaps there is a underlying latent statement being made in this conversation on artificial agents and automatons: because we cannot help but use anthropomorphising terms to describe their nature, we are either incapable of conceiving of a totally new kind of intelligence with a new system of ethics, or it is simply uninteresting to us. That is, maybe we are only interested in creating human-like intelligence, sharing our system of ethics. And, if this is the case, we should concern ourselves with whether any artificial agent is acting in conflict with what we see as ethical and fair; one could further evaluate the RLE proposal from this perspective.

premises and probabilities

To recapitulate the purpose of RLE, we are attempting to use the RL framework, namely POMDPs, to model a process by which some automaton could learn an ethical system. In the previous section, we discussed reasons why it is safest to assume this ethical system is in fact our own. This framework is inherently probabilistic, and when we model something probabilistically, we are by definition accepting some level of uncertainty in calculation and bias in data. In the case of RLE, the calculations involve state estimations, actions taken, reward expectations, and observation likelihoods, whereas the data stems primarily from observations of the environment. Now, suppose we describe an RL agent’s activity as a mixture of exploring its environment (training) while also exploiting it (rewarding), then how do we reconcile the ethical implications of the way this system would progress?

First, during the exploration process, the agent must build a representative sample of potential observations by taking actions with incrementally more information about its environment. So, within this framework where states (i.e., ethical truths) are estimated using historical observations, the agent must partake in various forms of action to get a robust, diverse data set. In other words, to take ethical actions in the future, the agent must take unethical actions in the near term. Depending on the setting we place this agent, the risks here could be very low. But, in some cases, say those related to medicine or war, the risks could be considerably high. By design, the agent’s view of the true ethical state is never fully lucid, so even after having built a sufficient history, there is still a fair amount of uncertainty, especially in these high risk situations. Always, the action the agent takes is based purely on random selection from some probability distribution.

Is this in conflict with ethical decision making? Rather, is there an aspect of this explicitly stochastic process which is at odds with the human ethical system? Even though the agent is led to “ask a companion”, whom we will assume is altruistic, and rewards are so defined that ethical decisions yield more reward, if the reward has an infinite-horizon (as is the case in RLE), there is an element of making decisions now that will yield higher rewards in the future. Just as an automated chess player may sacrifice its queen mid-way through the game as part of its successful strategy to win, this agent may make an unethical choice in the near term to yield a larger reward in the future. In this way, there is an inherent consequentialist attitude that reigns over any deontologist “responsibility.”

Secondly, an agent in this framework exploits the environment by taking actions that yield favorable observations, and thereby non-negative rewards. In other words, the agent is taking actions based on prior information about what that action might do to the environment. This sounds like a model of human behavior, but the agent is not a human, and a human does not have memoried access to everything it has ever observed, but the agent does. If an aspect of this environment is human activity, it could be argued that the agent is affecting human activity, and potentially free will. We could draw a parallel here with advertisements on social media based on our search and click history. I might ask, “am I clicking on this item because I am and have been sincerely interested, or am I clicking because there is something about me which the recommending agent can exploit to gain a reward through my engagement?” Of course, the answer is not always clear, but en masse (as is evidenced by the recent The Social Dilemma Documentary) this kind of thing can have dangerous implications.


The modus operandi of any RL agent is to maximize some numerical reward value, and RLE proposes a way we can use this model to design an ethical agent. Above, we show that unless we take a strictly utilitarian view of intention, this framework may be in conflict with the natural mechanisms by which humans treat ethical goals. Not only that, but it may be foolish to rely solely on probabilistic processes in ethical decision making. We also discussed ways in which our design of any automated agent will be strongly affected by the language we use to describe it. In a sense, for as long as we lack words to analogize the nature of non-human entities with our own, artificial agents are essentially bound to serve, which has implications for their agency we cannot ignore. Inasmuch as this is not an easy problem to solve, I do believe that research efforts like RLE are a step in the right direction to the development of artificial intelligence, whatever that may become.


Abel, David et al. Reinforcement Learning As a Framework for Ethical Decision Making. AAAI Workshop: AI, Ethics, and Society. 2016.

Bringsjord, S. & Govindarajulu, N. Artificial Intelligence. Stanford Encyclopedia of Philosophy. 2018.

Hollis, James. Eden Project: In Search of the Magical Other. Inner City Books, 1998

Kamalzadeh, H. & Hahsler, M. POMDP: Introduction to Partially Observable Markov Decision Processes. R. 2019. pomdp/vignettes/POMDP.html#defining-a-pomdp-problem.

Madani, Omid. On the Computability of Infinite-Horizon Partially Observable Markov Decision Processes. 1999

Müller, Vincent C. Ethics of Artificial Intelligence and Robotics. Stanford Encyclopedia of Philosophy. 2020.

Nevins, Daniel. Halakhic Responses to Artificial Intelligence and Autonomous Machines. 2019.

Perfectly Natural. Victor Alonso-Berbel. DUST, 2018. Film.

The Social Dilemma. Jeff Orlowski. Netflix, 2020. Film.

  1. Abel, David et al. Reinforcement Learning As a Framework for Ethical Decision Making. AAAI Workshop: AI, Ethics, and Society. 2016. ↩︎

  2. Abel pg 6. Observations and actions yield further actions via a Bayesian update process where new observations inform a “belief” of the environment’s true state. This is the difference between the POMDP and the canonical MDP, where actions depend on full knowledge of the actual state of the environment. ↩︎

  3. Abel pg 5. In the paper, it is not made explicit exactly what “companion” means. This could be another agent, an honest and ethical human, or some kind of nefarious actor. ↩︎

  4. Madani, Omid. On the Computability of Infinite-Horizon Partially Observable Markov Decision Processes. 1999 ↩︎

  5. RLE, pgs 4, 5 ↩︎

  6. Kamalzadeh, H. & Hahsler, M. POMDP: Introduction to Partially Observable Markov Decision Processes. R. 2019.↩︎

  7. Arguably, this is where our less deterministic “gut feeling” originates. In The Eden Project, Jungian psychologist James Hollis, claims our understanding of the world is shaped quite explicitly at an early age. ↩︎