
Journal articles on the topic 'Reward functions'



Consult the top 50 journal articles for your research on the topic 'Reward functions'.

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Meder, Björn, and Jonathan D. Nelson. "Information search with situation-specific reward functions." Judgment and Decision Making 7, no. 2 (2012): 119–48. http://dx.doi.org/10.1017/s1930297500002977.

Abstract:
The goal of obtaining information to improve classification accuracy can strongly conflict with the goal of obtaining information for improving payoffs. Two environments with such a conflict were identified through computer optimization. Three subsequent experiments investigated people’s search behavior in these environments. Experiments 1 and 2 used a multiple-cue probabilistic category-learning task to convey environmental probabilities. In a subsequent search task subjects could query only a single feature before making a classification decision. The crucial manipulation concerned t
2

Soltani, A. Reza. "Reward processes with nonlinear reward functions." Journal of Applied Probability 33, no. 4 (1996): 1011–17. http://dx.doi.org/10.2307/3214982.

Abstract:
Based on a semi-Markov process J(t), t ≧ 0, a reward process Z(t), t ≧ 0, is introduced where it is assumed that the reward function ρ(k, x) is nonlinear; if the reward function is linear, i.e. ρ(k, x) = kx, the reward process Z(t), t ≧ 0, becomes the classical one, which has been considered by many authors. An explicit formula for E(Z(t)) is given in terms of the moments of the sojourn time distribution at t, when the reward function is a polynomial.
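As background (a standard construction in this literature, not the paper's specific notation): if the semi-Markov process visits states J_0, J_1, … with sojourn times X_1, X_2, …, the reward accumulated by time t can be written as Z(t) = Σ_{i=1}^{N(t)} ρ(J_{i−1}, X_i) + ρ(J_{N(t)}, t − S_{N(t)}), where N(t) is the number of transitions completed by time t and S_{N(t)} is the epoch of the last such transition. With the linear choice ρ(k, x) = kx this reduces to the classical reward process the abstract refers to.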
3

Soltani, A. Reza. "Reward processes with nonlinear reward functions." Journal of Applied Probability 33, no. 04 (1996): 1011–17. http://dx.doi.org/10.1017/s0021900200100440.

Abstract:
Based on a semi-Markov process J(t), t ≧ 0, a reward process Z(t), t ≧ 0, is introduced where it is assumed that the reward function ρ(k, x) is nonlinear; if the reward function is linear, i.e. ρ(k, x) = kx, the reward process Z(t), t ≧ 0, becomes the classical one, which has been considered by many authors. An explicit formula for E(Z(t)) is given in terms of the moments of the sojourn time distribution at t, when the reward function is a polynomial.
4

Xu, Zhe, Ivan Gavran, Yousef Ahmad, et al. "Joint Inference of Reward Machines and Policies for Reinforcement Learning." Proceedings of the International Conference on Automated Planning and Scheduling 30 (June 1, 2020): 590–98. http://dx.doi.org/10.1609/icaps.v30i1.6756.

Abstract:
Incorporating high-level knowledge is an effective way to expedite reinforcement learning (RL), especially for complex tasks with sparse rewards. We investigate an RL problem where the high-level knowledge is in the form of reward machines, a type of Mealy machines that encode non-Markovian reward functions. We focus on a setting in which this knowledge is a priori not available to the learning agent. We develop an iterative algorithm that performs joint inference of reward machines and policies for RL (more specifically, q-learning). In each iteration, the algorithm maintains a hypothesis rew
5

Tumer, Kagan, and Adrian Agogino. "Multiagent Learning for Black Box System Reward Functions." Advances in Complex Systems 12, no. 04n05 (2009): 475–92. http://dx.doi.org/10.1142/s0219525909002295.

Abstract:
In large, distributed systems composed of adaptive and interactive components (agents), ensuring the coordination among the agents so that the system achieves certain performance objectives is a challenging proposition. The key difficulty to overcome in such systems is one of credit assignment: How to apportion credit (or blame) to a particular agent based on the performance of the entire system. In this paper, we show how this problem can be solved in general for a large class of reward functions whose analytical form may be unknown (hence "black box" reward). This method combines the salient
6

Corazza, Jan, Ivan Gavran, and Daniel Neider. "Reinforcement Learning with Stochastic Reward Machines." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (2022): 6429–36. http://dx.doi.org/10.1609/aaai.v36i6.20594.

Abstract:
Reward machines are an established tool for dealing with reinforcement learning problems in which rewards are sparse and depend on complex sequences of actions. However, existing algorithms for learning reward machines assume an overly idealized setting where rewards have to be free of noise. To overcome this practical limitation, we introduce a novel type of reward machines, called stochastic reward machines, and an algorithm for learning them. Our algorithm, based on constraint solving, learns minimal stochastic reward machines from the explorations of a reinforcement learning agent. This al
7

Pastor-Bernier, Alexandre, Arkadiusz Stasiak, and Wolfram Schultz. "Reward-specific satiety affects subjective value signals in orbitofrontal cortex during multicomponent economic choice." Proceedings of the National Academy of Sciences 118, no. 30 (2021): e2022650118. http://dx.doi.org/10.1073/pnas.2022650118.

Abstract:
Sensitivity to satiety constitutes a basic requirement for neuronal coding of subjective reward value. Satiety from natural ongoing consumption affects reward functions in learning and approach behavior. More specifically, satiety reduces the subjective economic value of individual rewards during choice between options that typically contain multiple reward components. The unconfounded assessment of economic reward value requires tests at choice indifference between two options, which is difficult to achieve with sated rewards. By conceptualizing choices between options with multiple reward co
8

Booth, Serena, W. Bradley Knox, Julie Shah, Scott Niekum, Peter Stone, and Alessandro Allievi. "The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 5 (2023): 5920–29. http://dx.doi.org/10.1609/aaai.v37i5.25733.

Abstract:
In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often necessarily sparse. For example, a true task metric might encode a reward of 1 upon success and 0 otherwise. The sparsity of these true task metrics can make them hard to learn from, so in practice they are often replaced with alternative dense reward functions. These dense reward functions are typically designed by experts through an ad hoc process of trial and error. In this process, experts manually search for a reward function that improves performance with respect to the ta
9

Mguni, David, Taher Jafferjee, Jianhong Wang, et al. "Learning to Shape Rewards Using a Game of Two Partners." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 10 (2023): 11604–12. http://dx.doi.org/10.1609/aaai.v37i10.26371.

Abstract:
Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switc
10

Toro Icarte, Rodrigo, Toryn Q. Klassen, Richard Valenzano, and Sheila A. McIlraith. "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning." Journal of Artificial Intelligence Research 73 (January 11, 2022): 173–208. http://dx.doi.org/10.1613/jair.1.12440.

Abstract:
Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propos
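As a general illustration of the idea (a minimal sketch with made-up names and a hypothetical key-then-door task, not the authors' implementation), a reward machine can be encoded as a small finite-state machine whose transitions emit rewards in response to high-level events:

class RewardMachine:
    def __init__(self, initial_state, transitions):
        self.initial_state = initial_state
        # transitions: dict mapping (machine state, event label) -> (next machine state, reward)
        self.transitions = transitions

    def step(self, u, label):
        # Unlisted (state, label) pairs keep the machine state and emit zero reward.
        return self.transitions.get((u, label), (u, 0.0))

# Hypothetical task: pick up the key, then open the door.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("u2", 1.0),  # reward only after the key has been seen
    },
)

u = rm.initial_state
for event in ["key", "door"]:
    u, reward = rm.step(u, event)
    print(u, reward)  # u1 0.0, then u2 1.0

Because the reward for 'door' depends on having previously observed 'key', the reward is non-Markovian in the environment state alone; exposing the machine state to the agent is what makes this structure exploitable.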
11

Huo, Liangyu, Zulin Wang, and Mai Xu. "Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (2023): 7953–61. http://dx.doi.org/10.1609/aaai.v37i7.25962.

Abstract:
Imitation learning (IL) has recently shown impressive performance in training a reinforcement learning agent with human demonstrations, eliminating the difficulty of designing elaborate reward functions in complex environments. However, most IL methods work under the assumption of the optimality of the demonstrations and thus cannot learn policies to surpass the demonstrators. Some methods have been investigated to obtain better-than-demonstration (BD) performance with inner human feedback or preference labels. In this paper, we propose a method to learn rewards from suboptimal demonstrations
12

Forbes, Grant C., and David L. Roberts. "Potential-Based Reward Shaping for Intrinsic Motivation (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (2024): 23488–89. http://dx.doi.org/10.1609/aaai.v38i21.30441.

Abstract:
Recently there has been a proliferation of intrinsic motivation (IM) reward shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves, and therefore dependent on a wider set of variables than the traditional reward functions that PBRS was develope
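For reference, the potential-based reward shaping (PBRS) that this work builds on (Ng, Harada, and Russell, 1999) adds a shaping term of the form F(s, a, s′) = γΦ(s′) − Φ(s) for some potential function Φ over states; rewards shaped this way are guaranteed to leave the set of optimal policies unchanged, the property that the intrinsic-motivation rewards discussed here can otherwise violate.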
13

Zhou, Zhiheng. "A Meta-Analysis of Reward Function and Childhood Obesity." Lecture Notes in Education Psychology and Public Media 2, no. 1 (2023): 1015–20. http://dx.doi.org/10.54254/2753-7048/2/2022640.

Abstract:
Obesity is a prevailing illness among children around the world. In the past decades, to unveil the truth about obesity, many researchers have conducted different studies on the influence of the reward function system on eating behaviors and obesity. Their reports all have similar statements that the dysfunctional neuronal reward function system has influenced many children worldwide and led to obesity. In addition, they generally state that the reward function brings more lousy eating behaviors and increases chances of obesity in children. In this case, the reward function system functions th
14

Mizuhiki, Takashi, Barry J. Richmond, and Munetaka Shidara. "Encoding of reward expectation by monkey anterior insular neurons." Journal of Neurophysiology 107, no. 11 (2012): 2996–3007. http://dx.doi.org/10.1152/jn.00282.2011.

Abstract:
The insula, a cortical brain region that is known to encode information about autonomic, visceral, and olfactory functions, has recently been shown to encode information during reward-seeking tasks in both single neuronal recording and functional magnetic resonance imaging studies. To examine the reward-related activation, we recorded from 170 single neurons in anterior insula of 2 monkeys during a multitrial reward schedule task, where the monkeys had to complete a schedule of 1, 2, 3, or 4 trials to earn a reward. In one block of trials a visual cue indicated whether a reward would or would
15

Nordhaug, Odd. "Reward Functions of Personnel Training." Human Relations 42, no. 5 (1989): 373–88. http://dx.doi.org/10.1177/001872678904200501.

16

Liu, Zhixiang, Rui Lin, and Minmin Luo. "Reward Contributions to Serotonergic Functions." Annual Review of Neuroscience 43, no. 1 (2020): 141–62. http://dx.doi.org/10.1146/annurev-neuro-093019-112252.

Abstract:
The brain serotonin systems participate in numerous aspects of reward processing, although it remains elusive how exactly serotonin signals regulate neural computation and reward-related behavior. The application of optogenetics and imaging techniques during the last decade has provided many insights. Here, we review recent progress on the organization and physiology of the dorsal raphe serotonin neurons and the relationships between their activity and behavioral functions in the context of reward processing. We also discuss several interesting theories on serotonin's function and how these th
17

Dharmavaram, Akshay, Matthew Riemer, and Shalabh Bhatnagar. "Hierarchical Average Reward Policy Gradient Algorithms (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (2020): 13777–78. http://dx.doi.org/10.1609/aaai.v34i10.7160.

Abstract:
Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address the issue of long term credit assignment by leveraging temporal abstractions. However, when dealing with extended timescales, discounting future rewards can lead to incorrect credit assignments. In this work, we address this issue by extending the hierarchical option-critic policy gradient theorem for the average reward criterion. Our proposed framework aims to maximize the long-term reward obtained in the steady-state of the Markov chain defined by the agent's policy. Furthermore, we use an
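For context, the average-reward criterion is typically defined as the long-run reward rate ρ(π) = lim_{T→∞} (1/T) E_π[r_1 + r_2 + … + r_T], evaluated in the steady state of the Markov chain induced by the policy, in contrast to the discounted return Σ_t γ^t r_t, whose discounting the abstract identifies as a source of incorrect credit assignment over long timescales.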
18

Ramakrishnan, Arjun, Yoon Woo Byun, Kyle Rand, Christian E. Pedersen, Mikhail A. Lebedev, and Miguel A. L. Nicolelis. "Cortical neurons multiplex reward-related signals along with sensory and motor information." Proceedings of the National Academy of Sciences 114, no. 24 (2017): E4841–E4850. http://dx.doi.org/10.1073/pnas.1703668114.

Abstract:
Rewards are known to influence neural activity associated with both motor preparation and execution. This influence can be exerted directly upon the primary motor (M1) and somatosensory (S1) cortical areas via the projections from reward-sensitive dopaminergic neurons of the midbrain ventral tegmental areas. However, the neurophysiological manifestation of reward-related signals in M1 and S1 are not well understood. Particularly, it is unclear how the neurons in these cortical areas multiplex their traditional functions related to the control of spatial and temporal characteristics of movement
19

Weng, Paul, and Olivier Spanjaard. "Functional Reward Markov Decision Processes: Theory and Applications." International Journal on Artificial Intelligence Tools 26, no. 03 (2017): 1760014. http://dx.doi.org/10.1142/s0218213017600144.

Abstract:
Markov decision processes (MDP) have become one of the standard models for decision-theoretic planning problems under uncertainty. In its standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model allowing rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. We also discuss
20

Proper, Scott, and Kagan Tumer. "Multiagent Learning with a Noisy Global Reward Signal." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (2013): 826–32. http://dx.doi.org/10.1609/aaai.v27i1.8580.

Abstract:
Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze. This makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. In this paper we present an approach to modeling the global reward using function appr
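As background on difference rewards (the general form used in this line of work; exact definitions vary by paper): agent i is given D_i(z) = G(z) − G(z_{−i}), i.e. the global reward G evaluated on the full joint state z minus its value with agent i's contribution removed or replaced by a default, so that the signal isolates agent i's own effect on the system objective while staying aligned with it.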
21

Ostaszewski, Pawel, and Katarzyna Karzel. "Discounting of Delayed and Probabilistic Losses of Different Amounts." European Psychologist 7, no. 4 (2002): 295–301. http://dx.doi.org/10.1027//1016-9040.7.4.295.

Abstract:
Previous research has shown that the value of a larger future reward is discounted less steeply than the value of a smaller future reward. The value of probabilistic reward has been shown to have either an opposite effect on discounting (when a smaller reward is not certain its value was discounted less steeply than the value of a larger reward) or no effect on the rate of discounting at all. The present article shows the results for delayed and probabilistic losses: The same hyperbola-like functions describe temporal and probabilistic discounting of both rewards and losses. In the case of los
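For reference, the hyperbola-like discounting functions referred to here are commonly written as V = A / (1 + kD) for an outcome of amount A delayed by D, and V = A / (1 + hθ) for a probabilistic outcome, where θ = (1 − p)/p is the odds against receiving it and k and h are fitted discounting-rate parameters.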
22

Linke, Cam, Nadia M. Ady, Martha White, Thomas Degris, and Adam White. "Adapting Behavior via Intrinsic Reward: A Survey and Empirical Study." Journal of Artificial Intelligence Research 69 (December 14, 2020): 1287–332. http://dx.doi.org/10.1613/jair.1.12087.

Abstract:
Learning about many things can provide numerous benefits to a reinforcement learning system. For example, learning many auxiliary value functions, in addition to optimizing the environmental reward, appears to improve both exploration and representation learning. The question we tackle in this paper is how to sculpt the stream of experience—how to adapt the learning system’s behavior—to optimize the learning of a collection of value functions. A simple answer is to compute an intrinsic reward based on the statistics of each auxiliary learner, and use reinforcement learning to maximize that int
23

Niekum, Scott. "Evolved Intrinsic Reward Functions for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (2010): 1955–56. http://dx.doi.org/10.1609/aaai.v24i1.7772.

Abstract:
Reward functions in reinforcement learning have largely been assumed given as part of the problem being solved by the agent. However, the psychological notion of intrinsic motivation has recently inspired inquiry into whether there exist alternate reward functions that enable an agent to learn a task more easily than the natural task-based reward function allows. This paper presents an efficient genetic programming algorithm to search for alternate reward functions that improve agent learning performance.
24

Stefanov, Valeri T. "Exact distributions for reward functions on semi-Markov and Markov additive processes." Journal of Applied Probability 43, no. 4 (2006): 1053–65. http://dx.doi.org/10.1239/jap/1165505207.

Abstract:
The distribution theory for reward functions on semi-Markov processes has been of interest since the early 1960s. The relevant asymptotic distribution theory has been satisfactorily developed. On the other hand, it has been noticed that it is difficult to find exact distribution results which lead to the effective computation of such distributions. Note that there is no satisfactory exact distribution result for rewards accumulated over deterministic time intervals [0, t], even in the special case of continuous-time Markov chains. The present paper provides neat general results which lead to e
25

Stefanov, Valeri T. "Exact distributions for reward functions on semi-Markov and Markov additive processes." Journal of Applied Probability 43, no. 04 (2006): 1053–65. http://dx.doi.org/10.1017/s0021900200002424.

Abstract:
The distribution theory for reward functions on semi-Markov processes has been of interest since the early 1960s. The relevant asymptotic distribution theory has been satisfactorily developed. On the other hand, it has been noticed that it is difficult to find exact distribution results which lead to the effective computation of such distributions. Note that there is no satisfactory exact distribution result for rewards accumulated over deterministic time intervals [0, t], even in the special case of continuous-time Markov chains. The present paper provides neat general results which lead to e
26

Wang, Min, Xin Li, Leiji Zhang, and Mingzhong Wang. "MetaCARD: Meta-Reinforcement Learning with Task Uncertainty Feedback via Decoupled Context-Aware Reward and Dynamics Components." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15555–62. http://dx.doi.org/10.1609/aaai.v38i14.29482.

Abstract:
Meta-Reinforcement Learning (Meta-RL) aims to reveal shared characteristics in dynamics and reward functions across diverse training tasks. This objective is achieved by meta-learning a policy that is conditioned on task representations with encoded trajectory data or context, thus allowing rapid adaptation to new tasks from a known task distribution. However, since the trajectory data generated by the policy may be biased, the task inference module tends to form spurious correlations between trajectory data and specific tasks, thereby leading to poor adaptation to new tasks. To address this i
27

Schultz, Wolfram. "Predictive Reward Signal of Dopamine Neurons." Journal of Neurophysiology 80, no. 1 (1998): 1–27. http://dx.doi.org/10.1152/jn.1998.80.1.1.

Abstract:
Schultz, Wolfram. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80: 1–27, 1998. The effects of lesions, receptor blocking, electrical self-stimulation, and drugs of abuse suggest that midbrain dopamine systems are involved in processing reward information and learning approach behavior. Most dopamine neurons show phasic activations after primary liquid and food rewards and conditioned, reward-predicting visual and auditory stimuli. They show biphasic, activation-depression responses after stimuli that resemble reward-predicting stimuli or are novel or particularly salient. How
28

Maulana, Mohammad Deni Irkhamil, and Langgeng Budianto. "The Student's Perception of Rewards to Increase Their Motivation in English Learning in Junior High School." English Edu: Journal of English Teaching and Learning 1, no. 1 (2022): 18–25. http://dx.doi.org/10.18860/jetl.v1i1.1623.

Abstract:
The reward is a form of appreciation from the student pursuer to provide an inner emotional connection. The reward is a strategy in learning that functions as a stimulus and response to students in learning English effectively. The type of reward given to students varies greatly depending on the needs of students in learning, ranging from praise, grades, and nonverbal. All the variations of rewards in learning have a common goal of increasing student motivation in learning. This study aims to determine students' perceptions of reward strategies to increase motivation to learn English in junior
29

Velasquez, Alvaro, Brett Bissey, Lior Barak, et al. "Dynamic Automaton-Guided Reward Shaping for Monte Carlo Tree Search." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (2021): 12015–23. http://dx.doi.org/10.1609/aaai.v35i13.17427.

Abstract:
Reinforcement learning and planning have been revolutionized in recent years, due in part to the mass adoption of deep convolutional neural networks and the resurgence of powerful methods to refine decision-making policies. However, the problem of sparse reward signals and their representation remains pervasive in many domains. While various reward-shaping mechanisms and imitation learning approaches have been proposed to mitigate this problem, the use of human-aided artificial rewards introduces human error, sub-optimal behavior, and a greater propensity for reward hacking. In this paper, we mi
30

Louie, Kenway. "Asymmetric and adaptive reward coding via normalized reinforcement learning." PLOS Computational Biology 18, no. 7 (2022): e1010350. http://dx.doi.org/10.1371/journal.pcbi.1010350.

Abstract:
Learning is widely modeled in psychology, neuroscience, and computer science by prediction error-guided reinforcement learning (RL) algorithms. While standard RL assumes linear reward functions, reward-related neural activity is a saturating, nonlinear function of reward; however, the computational and behavioral implications of nonlinear RL are unknown. Here, we show that nonlinear RL incorporating the canonical divisive normalization computation introduces an intrinsic and tunable asymmetry in prediction error coding. At the behavioral level, this asymmetry explains empirical variability in
31

Lamberton, Damien. "Optimal stopping with irregular reward functions." Stochastic Processes and their Applications 119, no. 10 (2009): 3253–84. http://dx.doi.org/10.1016/j.spa.2009.05.005.

32

Schultz, Wolfram. "Reward functions of the basal ganglia." Journal of Neural Transmission 123, no. 7 (2016): 679–93. http://dx.doi.org/10.1007/s00702-016-1510-0.

33

Knox, W. Bradley, Alessandro Allievi, Holger Banzhaf, Felix Schmitt, and Peter Stone. "Reward (Mis)design for Autonomous Driving (Abstract Reprint)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 20 (2024): 22702. http://dx.doi.org/10.1609/aaai.v38i20.30602.

Abstract:
This article considers the problem of diagnosing certain common errors in reward design. Its insights are also applicable to the design of cost functions and performance metrics more generally. To diagnose common errors, we develop 8 simple sanity checks for identifying flaws in reward functions. We survey research that is published in top-tier venues and focuses on reinforcement learning (RL) for autonomous driving (AD). Specifically, we closely examine the reported reward function in each publication and present these reward functions in a complete and standardized format in the appendix. Wh
34

Schultz, Wolfram. "Neuronal Reward and Decision Signals: From Theories to Data." Physiological Reviews 95, no. 3 (2015): 853–951. http://dx.doi.org/10.1152/physrev.00023.2014.

Abstract:
Rewards are crucial objects that induce learning, approach behavior, choices, and emotions. Whereas emotions are difficult to investigate in animals, the learning function is mediated by neuronal reward prediction error signals which implement basic constructs of reinforcement learning theory. These signals are found in dopamine neurons, which emit a global reward signal to striatum and frontal cortex, and in specific neurons in striatum, amygdala, and frontal cortex projecting to select neuronal populations. The approach and choice functions involve subjective value, which is objectively asse
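For context, such reward prediction error signals are commonly modeled by the temporal-difference error δ_t = r_{t+1} + γV(s_{t+1}) − V(s_t): phasic dopamine activity increases when an outcome is better than predicted (δ > 0), dips when it is worse (δ < 0), and stays near baseline for fully predicted rewards.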
35

Gehring, Clement, Masataro Asai, Rohan Chitnis, et al. "Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators." Proceedings of the International Conference on Automated Planning and Scheduling 32 (June 13, 2022): 588–96. http://dx.doi.org/10.1609/icaps.v32i1.19846.

Abstract:
Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains or applying classical planning methods to some complex RL domains. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue an
36

Rajendran, Janarthanan, Richard Lewis, Vivek Veeriah, Honglak Lee, and Satinder Singh. "How Should an Agent Practice?" Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (2020): 5454–61. http://dx.doi.org/10.1609/aaai.v34i04.5995.

Abstract:
We present a method for learning intrinsic reward functions to drive the learning of an agent during periods of practice in which extrinsic task rewards are not available. During practice, the environment may differ from the one available for training and evaluation with extrinsic rewards. We refer to this setup of alternating periods of practice and objective evaluation as practice-match, drawing an analogy to regimes of skill acquisition common for humans in sports and games. The agent must effectively use periods in the practice environment so that performance improves during matches. In th
37

Yang, Luting, Jianyi Yang, and Shaolei Ren. "Contextual Bandits with Delayed Feedback and Semi-supervised Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (2021): 15943–44. http://dx.doi.org/10.1609/aaai.v35i18.17968.

Abstract:
Contextual multi-armed bandit (MAB) is a classic online learning problem, where a learner/agent selects actions (i.e., arms) given contextual information and discovers optimal actions based on reward feedback. Applications of contextual bandit have been increasingly expanding, including advertisement, personalization, resource allocation in wireless networks, among others. Nonetheless, the reward feedback is delayed in many applications (e.g., a user may only provide service ratings after a period of time), creating challenges for contextual bandits. In this paper, we address delayed feedback
38

Wang, Xusheng, Jiexin Xie, Shijie Guo, Yue Li, Pengfei Sun, and Zhongxue Gan. "Deep reinforcement learning-based rehabilitation robot trajectory planning with optimized reward functions." Advances in Mechanical Engineering 13, no. 12 (2021): 168781402110670. http://dx.doi.org/10.1177/16878140211067011.

Abstract:
Deep reinforcement learning (DRL) provides a new solution for rehabilitation robot trajectory planning in the unstructured working environment, which can bring great convenience to patients. Previous researches mainly focused on optimization strategies but ignored the construction of reward functions, which leads to low efficiency. Different from traditional sparse reward function, this paper proposes two dense reward functions. First, azimuth reward function mainly provides a global guidance and reasonable constraints in the exploration. To further improve the efficiency, a process-oriented a
39

Vilaseca, Jordi, Antoni Meseguer, Joan Torrent, and Raquel Ferreras. "Reward Functions and Cooperative Games: Characterization and Economic Application." International Game Theory Review 10, no. 02 (2008): 165–76. http://dx.doi.org/10.1142/s0219198908001856.

Abstract:
In this paper we study network structures in which the possibilities for cooperation are restricted and the benefits of a group of players depend on how these players are internally connected. One way to represent this type of situations is the so-called reward function, which represents the profits obtainable by the total coalition if links can be used to coordinate agents' actions. For any cooperative game, a reward function is associated. Given a reward function, our aim is to analyze under which conditions it is possible to associate a cooperative game to it. We characterize the reward fun
40

Dragoni Divrak, Dora. "Reward actualities." Journal of Historical Archaeology & Anthropological Sciences 6, no. 2 (2021): 62–64. http://dx.doi.org/10.15406/jhaas.2021.06.00247.

Abstract:
Reward system is a key to understand how we can be in health and live in wellbeing or wellness. It is the series of dopaminergic and serotoninergic neurons that involve our body-mind unity. It starts in fact from VTA, ventral tegmental area in midbrain and then: • There is a lateral reward pathway related to stress conduction messages • Mainly there is a medial reward pathway related to life functions regulations, wellness, and also the more known decision making and learning and memory capacities.
41

Ma, Shuai, and Jia Yuan Yu. "State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4512–19. http://dx.doi.org/10.1609/aaai.v33i01.33014512.

Abstract:
In the framework of MDP, although the general reward function takes three arguments—current state, action, and successor state; it is often simplified to a function of two arguments—current state and action. The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves the expected total reward only, this simplification works perfectly. However, when the objective is risk-sensitive, this simplification leads to an incorrect value. We propose three successively more general state-augmentation transformations (SAT
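For context, the simplification discussed here replaces a transition-based reward R(s, a, s′) with its state-based expectation R(s, a) = Σ_{s′} P(s′ | s, a) R(s, a, s′). This preserves the expected total reward, but it discards the variability of the immediate reward across successor states, which is exactly what risk-sensitive objectives depend on and what the proposed state-augmentation transformations are designed to retain.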
42

Fard, Neshat Elhami, and Rastko Selmic. "Consensus of Multi-agent Reinforcement Learning Systems: The Effect of Immediate Rewards." Journal of Robotics and Control (JRC) 3, no. 2 (2022): 115–27. http://dx.doi.org/10.18196/jrc.v3i2.13082.

Abstract:
This paper studies the consensus problem of a leaderless, homogeneous, multi-agent reinforcement learning (MARL) system using actor-critic algorithms with and without malicious agents. The goal of each agent is to reach the consensus position with the maximum cumulative reward. Although the reward function converges in both scenarios, in the absence of the malicious agent, the cumulative reward is higher than with the malicious agent present. We consider here various immediate reward functions. First, we study the immediate reward function based on Manhattan distance. In addition to proposing
43

Zuo, Guoyu, Qishen Zhao, Jiahao Lu, and Jiangeng Li. "Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards." International Journal of Advanced Robotic Systems 17, no. 1 (2020): 172988141989834. http://dx.doi.org/10.1177/1729881419898342.

Abstract:
The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks naturally specify with sparse rewards, and manually shaping reward functions is a difficult project. In this article, we propose a general and model-free approach for reinforcement learning to learn robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, based on Tw
44

Uchibe, Eiji, and Kenji Doya. "Hierarchical Reinforcement Learning for Multiple Reward Functions." Journal of the Robotics Society of Japan 22, no. 1 (2004): 120–29. http://dx.doi.org/10.7210/jrsj.22.120.

45

Grundel, Soesja, Peter Borm, and Herbert Hamers. "Resource allocation problems with concave reward functions." TOP 27, no. 1 (2018): 37–54. http://dx.doi.org/10.1007/s11750-018-0482-7.

46

Luo, Xudong, Yufeng Yang, and Ho-fung Leung. "Reward and Penalty Functions in Automated Negotiation." International Journal of Intelligent Systems 31, no. 7 (2015): 637–72. http://dx.doi.org/10.1002/int.21797.

47

Kim, MyeongSeop, Jung-Su Kim, and Jae-Han Park. "Automated Hyperparameter Tuning in Reinforcement Learning for Quadrupedal Robot Locomotion." Electronics 13, no. 1 (2023): 116. http://dx.doi.org/10.3390/electronics13010116.

Abstract:
In reinforcement learning, the reward function has a significant impact on the performance of the agent. However, determining the appropriate value of this reward function requires many attempts and trials. Although many automated reinforcement learning methods have been proposed to find an appropriate reward function, their proof is lacking in complex environments such as quadrupedal locomotion. In this paper, we propose a method to automatically tune the scale of the dominant reward functions in reinforcement learning of a quadrupedal robot. Reinforcement learning of the quadruped robot is v
48

Hahn, Ernst Moritz, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. "Omega-Regular Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (2024): 21125–33. http://dx.doi.org/10.1609/aaai.v38i19.30105.

Abstract:
Regular decision processes (RDPs) are a subclass of non-Markovian decision processes where the transition and reward functions are guarded by some regular property of the past (a lookback). While RDPs enable intuitive and succinct representation of non-Markovian decision processes, their expressive power coincides with finite-state Markov decision processes (MDPs). We introduce omega-regular decision processes (ODPs) where the non-Markovian aspect of the transition and reward functions are extended to an omega-regular lookahead over the system evolution. Semantically, these lookaheads can be c
49

Rolls, Edmund T. "Précis of The brain and emotion." Behavioral and Brain Sciences 23, no. 2 (2000): 177–91. http://dx.doi.org/10.1017/s0140525x00002429.

Abstract:
The topics treated in The brain and emotion include the definition, nature, and functions of emotion (Ch. 3); the neural bases of emotion (Ch. 4); reward, punishment, and emotion in brain design (Ch. 10); a theory of consciousness and its application to understanding emotion and pleasure (Ch. 9); and neural networks and emotion-related learning (Appendix). The approach is that emotions can be considered as states elicited by reinforcers (rewards and punishers). This approach helps with understanding the functions of emotion, with classifying different emotions, and in understanding what inform
50

Chen, Yang, Xiao Lin, Bo Yan, et al. "Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 10 (2024): 11407–15. http://dx.doi.org/10.1609/aaai.v38i10.29021.

Abstract:
Designing suitable reward functions for numerous interacting intelligent agents is challenging in real-world applications. Inverse reinforcement learning (IRL) in mean field games (MFGs) offers a practical framework to infer reward functions from expert demonstrations. While promising, the assumption of agent homogeneity limits the capability of existing methods to handle demonstrations with heterogeneous and unknown objectives, which are common in practice. To this end, we propose a deep latent variable MFG model and an associated IRL method. Critically, our method can infer rewards from diff