Academic literature on the topic 'Reward functions'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Reward functions.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Reward functions"

1

Meder, Björn, and Jonathan D. Nelson. "Information search with situation-specific reward functions." Judgment and Decision Making 7, no. 2 (2012): 119–48. http://dx.doi.org/10.1017/s1930297500002977.

Full text
Abstract:
The goal of obtaining information to improve classification accuracy can strongly conflict with the goal of obtaining information for improving payoffs. Two environments with such a conflict were identified through computer optimization. Three subsequent experiments investigated people’s search behavior in these environments. Experiments 1 and 2 used a multiple-cue probabilistic category-learning task to convey environmental probabilities. In a subsequent search task subjects could query only a single feature before making a classification decision. The crucial manipulation concerned t
APA, Harvard, Vancouver, ISO, and other styles
2

Soltani, A. Reza. "Reward processes with nonlinear reward functions." Journal of Applied Probability 33, no. 4 (1996): 1011–17. http://dx.doi.org/10.2307/3214982.

Full text
Abstract:
Based on a semi-Markov process J(t), t ≥ 0, a reward process Z(t), t ≥ 0, is introduced where it is assumed that the reward function ρ(k, x) is nonlinear; if the reward function is linear, i.e. ρ(k, x) = kx, the reward process Z(t), t ≥ 0, becomes the classical one, which has been considered by many authors. An explicit formula for E(Z(t)) is given in terms of the moments of the sojourn time distribution at t, when the reward function is a polynomial.
APA, Harvard, Vancouver, ISO, and other styles
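
As a reading aid for this entry, the setup described in the abstract can be sketched as follows. This is a hedged reconstruction in standard semi-Markov notation (jump times S_i, sojourn times X_i), built from the abstract alone; the paper's exact definitions may differ.

```latex
% Hedged sketch in standard semi-Markov notation (not the paper's exact definitions).
% J(t): the semi-Markov process; S_i: jump times; X_i = S_{i+1} - S_i: sojourn times;
% N(t): number of jumps completed by time t.
Z(t) \;=\; \sum_{i \,:\, S_{i+1} \le t} \rho\bigl(J(S_i),\, X_i\bigr)
      \;+\; \rho\bigl(J(t),\, t - S_{N(t)}\bigr)
% Linear case \rho(k, x) = kx recovers the classical reward process; for a polynomial
% \rho(k, x) = \sum_j a_j(k)\, x^j, E[Z(t)] can be written via the moments of the
% sojourn-time distributions, as the abstract states.
```
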
3

Soltani, A. Reza. "Reward processes with nonlinear reward functions." Journal of Applied Probability 33, no. 04 (1996): 1011–17. http://dx.doi.org/10.1017/s0021900200100440.

Full text
Abstract:
Based on a semi-Markov process J(t), t ≥ 0, a reward process Z(t), t ≥ 0, is introduced where it is assumed that the reward function ρ(k, x) is nonlinear; if the reward function is linear, i.e. ρ(k, x) = kx, the reward process Z(t), t ≥ 0, becomes the classical one, which has been considered by many authors. An explicit formula for E(Z(t)) is given in terms of the moments of the sojourn time distribution at t, when the reward function is a polynomial.
APA, Harvard, Vancouver, ISO, and other styles
4

Xu, Zhe, Ivan Gavran, Yousef Ahmad, et al. "Joint Inference of Reward Machines and Policies for Reinforcement Learning." Proceedings of the International Conference on Automated Planning and Scheduling 30 (June 1, 2020): 590–98. http://dx.doi.org/10.1609/icaps.v30i1.6756.

Full text
Abstract:
Incorporating high-level knowledge is an effective way to expedite reinforcement learning (RL), especially for complex tasks with sparse rewards. We investigate an RL problem where the high-level knowledge is in the form of reward machines, a type of Mealy machines that encode non-Markovian reward functions. We focus on a setting in which this knowledge is a priori not available to the learning agent. We develop an iterative algorithm that performs joint inference of reward machines and policies for RL (more specifically, q-learning). In each iteration, the algorithm maintains a hypothesis rew
APA, Harvard, Vancouver, ISO, and other styles
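
Several entries in this selection concern reward machines: Mealy machines whose transitions are driven by high-level events and which emit rewards, so the reward can depend on the history of events rather than only the current environment state. The Python sketch below is purely illustrative; the class, labels, and toy task are assumptions made for this listing, not code from any cited paper.

```python
# Illustrative reward-machine sketch (assumed names; not from the cited papers).
# A reward machine is a Mealy machine: reading an event label moves it to a new
# machine state and emits a scalar reward, making the reward non-Markovian in
# the environment state alone.

class RewardMachine:
    def __init__(self, initial_state, transitions):
        # transitions: dict mapping (machine_state, label) -> (next_state, reward)
        self.initial_state = initial_state
        self.transitions = transitions
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, label):
        # Unlisted (state, label) pairs self-loop with zero reward.
        next_state, reward = self.transitions.get(
            (self.state, label), (self.state, 0.0))
        self.state = next_state
        return reward

# Hypothetical sparse task: reward 1 only after observing 'key' and then 'door'.
rm = RewardMachine("u0", {
    ("u0", "key"): ("u1", 0.0),
    ("u1", "door"): ("u_accept", 1.0),
})
print(rm.step("door"), rm.step("key"), rm.step("door"))  # 0.0 0.0 1.0
```
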
5

Tumer, Kagan, and Adrian Agogino. "Multiagent Learning for Black Box System Reward Functions." Advances in Complex Systems 12, no. 04n05 (2009): 475–92. http://dx.doi.org/10.1142/s0219525909002295.

Full text
Abstract:
In large, distributed systems composed of adaptive and interactive components (agents), ensuring the coordination among the agents so that the system achieves certain performance objectives is a challenging proposition. The key difficulty to overcome in such systems is one of credit assignment: How to apportion credit (or blame) to a particular agent based on the performance of the entire system. In this paper, we show how this problem can be solved in general for a large class of reward functions whose analytical form may be unknown (hence "black box" reward). This method combines the salient
APA, Harvard, Vancouver, ISO, and other styles
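
The credit-assignment question posed in this abstract is often addressed in the broader multiagent literature (including other work by these authors) with difference rewards: an agent is credited with the change in the global objective caused by replacing its own action with a default. The sketch below illustrates only that generic idea under assumed names (coverage is a made-up global objective); it is not the paper's black-box method.

```python
# Generic difference-reward sketch (assumed names; not the cited paper's method).
# D_i = G(z) - G(z with agent i's action replaced by a default), so an agent
# is rewarded only for its own marginal contribution to the system objective G.

def difference_reward(global_objective, joint_action, i, default_action=None):
    counterfactual = list(joint_action)
    counterfactual[i] = default_action  # replace agent i's action
    return global_objective(joint_action) - global_objective(counterfactual)

# Hypothetical global objective: number of distinct targets covered by the team.
def coverage(joint_action):
    return len({a for a in joint_action if a is not None})

print(difference_reward(coverage, ["t1", "t1", "t2"], i=1))  # 0: agent 1 was redundant
print(difference_reward(coverage, ["t1", "t3", "t2"], i=1))  # 1: agent 1 added coverage
```
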
6

Corazza, Jan, Ivan Gavran, and Daniel Neider. "Reinforcement Learning with Stochastic Reward Machines." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (2022): 6429–36. http://dx.doi.org/10.1609/aaai.v36i6.20594.

Full text
Abstract:
Reward machines are an established tool for dealing with reinforcement learning problems in which rewards are sparse and depend on complex sequences of actions. However, existing algorithms for learning reward machines assume an overly idealized setting where rewards have to be free of noise. To overcome this practical limitation, we introduce a novel type of reward machines, called stochastic reward machines, and an algorithm for learning them. Our algorithm, based on constraint solving, learns minimal stochastic reward machines from the explorations of a reinforcement learning agent. This al
APA, Harvard, Vancouver, ISO, and other styles
7

Pastor-Bernier, Alexandre, Arkadiusz Stasiak, and Wolfram Schultz. "Reward-specific satiety affects subjective value signals in orbitofrontal cortex during multicomponent economic choice." Proceedings of the National Academy of Sciences 118, no. 30 (2021): e2022650118. http://dx.doi.org/10.1073/pnas.2022650118.

Full text
Abstract:
Sensitivity to satiety constitutes a basic requirement for neuronal coding of subjective reward value. Satiety from natural ongoing consumption affects reward functions in learning and approach behavior. More specifically, satiety reduces the subjective economic value of individual rewards during choice between options that typically contain multiple reward components. The unconfounded assessment of economic reward value requires tests at choice indifference between two options, which is difficult to achieve with sated rewards. By conceptualizing choices between options with multiple reward co
APA, Harvard, Vancouver, ISO, and other styles
8

Booth, Serena, W. Bradley Knox, Julie Shah, Scott Niekum, Peter Stone, and Alessandro Allievi. "The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 5 (2023): 5920–29. http://dx.doi.org/10.1609/aaai.v37i5.25733.

Full text
Abstract:
In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often necessarily sparse. For example, a true task metric might encode a reward of 1 upon success and 0 otherwise. The sparsity of these true task metrics can make them hard to learn from, so in practice they are often replaced with alternative dense reward functions. These dense reward functions are typically designed by experts through an ad hoc process of trial and error. In this process, experts manually search for a reward function that improves performance with respect to the ta
APA, Harvard, Vancouver, ISO, and other styles
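
The distinction this abstract draws between a sparse true task metric and a hand-designed dense surrogate can be made concrete with a toy goal-reaching example. Both functions below are assumptions for illustration only, not reward functions from the paper.

```python
import math

# Sparse 'true' task metric: 1 on success, 0 otherwise (hard to learn from).
def true_task_metric(state, goal):
    return 1.0 if state == goal else 0.0

# Hand-designed dense surrogate: negative distance to the goal (easier to learn
# from, but a proxy that trial-and-error tuning can overfit or misalign).
def dense_surrogate_reward(state, goal):
    return -math.dist(state, goal)

print(true_task_metric((0, 0), (3, 4)))        # 0.0
print(dense_surrogate_reward((0, 0), (3, 4)))  # -5.0
```
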
9

Mguni, David, Taher Jafferjee, Jianhong Wang, et al. "Learning to Shape Rewards Using a Game of Two Partners." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 10 (2023): 11604–12. http://dx.doi.org/10.1609/aaai.v37i10.26371.

Full text
Abstract:
Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construc- tion is time-consuming and error-prone. It also requires domain knowledge which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switc
APA, Harvard, Vancouver, ISO, and other styles
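
ROSA constructs its shaping reward automatically through a two-player Markov game, which is beyond the scope of a listing like this. For background, the sketch below shows the simpler, classical potential-based shaping formulation (a different technique from ROSA) with a made-up potential function; it only illustrates what a shaping term added to the environment reward looks like.

```python
# Background only: potential-based reward shaping (not the ROSA algorithm).
# r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s); for any potential phi,
# this leaves the set of optimal policies unchanged (Ng, Harada & Russell, 1999).

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical potential: progress along a 1-D corridor toward position 10.
phi = lambda pos: -abs(10 - pos)

print(shaped_reward(0.0, s=3, s_next=4, phi=phi))  # > 0: the step moved toward the goal
```
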
10

Toro Icarte, Rodrigo, Toryn Q. Klassen, Richard Valenzano, and Sheila A. McIlraith. "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning." Journal of Artificial Intelligence Research 73 (January 11, 2022): 173–208. http://dx.doi.org/10.1613/jair.1.12440.

Full text
Abstract:
Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propos
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Reward functions"

1

Mendonça, Matheus Ribeiro Furtado de. "Evolution of reward functions for reinforcement learning applied to stealth games." Universidade Federal de Juiz de Fora (UFJF), 2016. https://repositorio.ufjf.br/jspui/handle/ufjf/4771.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Al-Adawi, Samir Hamed Nasser. "The neuropsychopharmacology of motivation : an examination of reward and frontal-subcortical mechanisms and functions." Thesis, King's College London (University of London), 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.286323.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Jönsson, Henrik. "Optimal Stopping Domains and Reward Functions for Discrete Time American Type Options." Doctoral thesis, Mälardalen University, Department of Mathematics and Physics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-58.

Full text
Abstract:
The thesis treats the problem of choosing when to exercise an American option. An American option gives its owner the right to buy or sell an underlying asset at a fixed price, called the strike price, up to and including a predetermined time, the so-called expiration date. The underlying asset can be a stock, an exchange rate, or some other financial asset. In the thesis we assume that the option can be exercised at certain given times which, for simplicity, we call 0, 1, 2, …, N, where N is the exercise date. The owner wants to find the optimal time to exercise the option so that the present value
APA, Harvard, Vancouver, ISO, and other styles
4

Gatica, Ricardo A. "A binary dynamic programming problem with affine transitions and reward functions : properties and algorithm." Diss., Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/32839.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Bradley, Jay. "Reinforcement learning for qualitative group behaviours applied to non-player computer game characters." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/4784.

Full text
Abstract:
This thesis investigates how to train the increasingly large cast of characters in modern commercial computer games. Modern computer games can contain hundreds or sometimes thousands of non-player characters that each should act coherently in complex dynamic worlds, and engage appropriately with other non-player characters and human players. Too often, it is obvious that computer controlled characters are brainless zombies portraying the same repetitive hand-coded behaviour. Commercial computer games would seem a natural domain for reinforcement learning and, as the trend for selling games bas
APA, Harvard, Vancouver, ISO, and other styles
6

Gadre, Aditya Shrikant. "Learning Strategies in Multi-Agent Systems - Applications to the Herding Problem." Thesis, Virginia Tech, 2001. http://hdl.handle.net/10919/36116.

Full text
Abstract:
<p> "Multi-Agent systems" is a topic for a lot of research, especially research involving strategy, evolution and cooperation among various agents. Various learning algorithm schemes have been proposed such as reinforcement learning and evolutionary computing. </p><p> In this thesis two solutions to a multi-agent herding problem are presented. One solution is based on Q-learning algorithm, while the other is based on modeling of artificial immune system. </p><p> Q-learning solution for the herding problem is developed, using region-based local learning for each individual agent. Individual an
APA, Harvard, Vancouver, ISO, and other styles
7

White, Monica Menne. "Marital satisfaction as a function of equity, reward level, expectations, and rewards /." The Ohio State University, 1990. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487683756125182.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Raffensperger, Peter Abraham. "Measuring and Influencing Sequential Joint Agent Behaviours." Thesis, University of Canterbury. Electrical and Computer Engineering, 2013. http://hdl.handle.net/10092/7472.

Full text
Abstract:
Algorithmically designed reward functions can influence groups of learning agents toward measurable desired sequential joint behaviours. Influencing learning agents toward desirable behaviours is non-trivial due to the difficulties of assigning credit for global success to the deserving agents and of inducing coordination. Quantifying joint behaviours lets us identify global success by ranking some behaviours as more desirable than others. We propose a real-valued metric for turn-taking, demonstrating how to measure one sequential joint behaviour. We describe how to identify the presence of tu
APA, Harvard, Vancouver, ISO, and other styles
9

Grunitzki, Ricardo. "Aprendizado por reforço multiagente : uma avaliação de diferentes mecanismos de recompensa para o problema de aprendizado de rotas." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/107123.

Full text
Abstract:
This master's thesis presents a study of the effects of different reward functions, applied in multi-agent reinforcement learning, on the route-choice problem in traffic networks. Two reward functions are addressed, which differ in how the numerical signal sent from the environment to the agent is aligned. The first function, called the individual function, is aligned with the individual utility of the agent (vehicle or driver) and seeks to minimize its travel time. The second function is the so-called difference rewards, which is aligned with the global utility of the system and
APA, Harvard, Vancouver, ISO, and other styles
10

Asri, Layla El. "Learning the Parameters of Reinforcement Learning from Data for Adaptive Spoken Dialogue Systems." Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0350/document.

Full text
Abstract:
This thesis is situated within research on dialogue systems. It proposes to learn a system's behavior from a set of annotated dialogues. The system learns an optimal behavior via reinforcement learning. We show that it is not necessary to define a state-space representation or a reward function: both of these parameters can be learned from the corpus of annotated dialogues. We show that it is possible for a dialogue-system developer to optimize dialogue management by defining
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "Reward functions"

1

Koole, G. M. Monotonicity in Markov reward and decision chains: Theory and applications. Now Publishers, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

M, Liebman Jeffrey, and Cooper S. J, eds. The Neuropharmacological basis of reward. Clarendon Press, 1989.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Sam, Byrd, and Idaho. State Division of Vocational Education., eds. Educating unprepared and underprepared adults in Idaho: Addressing the growing demand and reaping the rewards. Idaho Division of Vocational Education, 1996.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Koole, Ger. Monotonicity in Markov Reward and Decision Chains: Theory and Applications (Foundations and Trends in Stochastic Systems). Now Publishers Inc, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Rustomji, Nerina. The Beauty of the Houri. Oxford University Press, 2021. http://dx.doi.org/10.1093/oso/9780190249342.001.0001.

Full text
Abstract:
The fascination with the houri, the pure female of Islamic paradise, began long before September 11, 2001. The Beauty of the Houri demonstrates how the ambiguous reward of the houri, mentioned in the Qurʾan and developed in Islamic theological writings, has gained a distinctive place in English and French literature from the sixteenth to the twenty-first century and in digital material in the twenty-first century. The houri had multiple functions in Islamic texts that ranged from caretaker to pure companion to entertainment. French, English, and American writers used the houri to critique Isla
APA, Harvard, Vancouver, ISO, and other styles
6

Hogh-Olesen, Henrik. Art and the Brain’s Reward System. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780190927929.003.0008.

Full text
Abstract:
Chapter 7 takes the investigation of the aesthetic impulse into the human brain to understand, first, why only we—and not our closest relatives among the primates—express ourselves aesthetically; and second, how the brain reacts when presented with aesthetic material. Brain scans are less useful when you are interested in the Why of aesthetic behavior rather than the How. Nevertheless, some brain studies have been ground-breaking, and neuroaesthetics offers a pivotal argument for the key function of the aesthetic impulse in human lives; it shows us that the brain’s reward circuit is activated
APA, Harvard, Vancouver, ISO, and other styles
7

Young, Jared W., Alan Anticevic, and Deanna M. Barch. Cognitive and Motivational Neuroscience of Psychotic Disorders. Edited by Dennis S. Charney, Eric J. Nestler, Pamela Sklar, and Joseph D. Buxbaum. Oxford University Press, 2017. http://dx.doi.org/10.1093/med/9780190681425.003.0016.

Full text
Abstract:
Schizophrenia is a complex neuropsychiatric syndrome presenting with a constellation of symptoms. Clinicians have long recognized that abnormalities in cognitive function and motivated behavior are a key component of psychosis, and of schizophrenia in particular. Here we postulate that these deficits may reflect, at least in part, impairments in the ability to actively maintain and utilize internal representations of emotional experiences, previous rewards, and motivational goals in order to drive current and future behavior in a way that would normally allow individuals to obtain desired outc
APA, Harvard, Vancouver, ISO, and other styles
8

Wolpert, David, and NASA Technical Reports Server (NTRS). Theory of Collective Intelligence. BiblioGov, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Hogh-Olesen, Henrik. The Aesthetic Animal. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780190927929.001.0001.

Full text
Abstract:
The Aesthetic Animal answers the ultimate questions of why we adorn ourselves; embellish our things and surroundings; and produce art, music, song, dance, and fiction. Humans are aesthetic animals that spend vast amounts of time and resources on seemingly useless aesthetic activities. However, nature would not allow a species to waste precious time and effort on activities completely unrelated to the survival, reproduction, and well-being of that species. Consequently, the aesthetic impulse must have some important biological functions. An impulse is a natural, internal behavioral incentive th
APA, Harvard, Vancouver, ISO, and other styles
10

Han, Shihui. Neural processes of culturally familiar information. Oxford University Press, 2017. http://dx.doi.org/10.1093/acprof:oso/9780198743194.003.0002.

Full text
Abstract:
Chapter 2 introduces the concept of cultural learning and its function in the transmission of cultural knowledge over generations, and the construction of new cultural beliefs/values and behavioral scripts. It examines brain activity that is engaged in differential processing of culturally familiar and unfamiliar information by reviewing functional magnetic resonance imaging and event-related potential studies of neural activity involved in the processing of gesture, music, brand, and religious knowledge. Long-term cultural experiences give rise to specific neural mechanisms in the human brain
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "Reward functions"

1

Slikker, Marco, and Anne Van Den Nouweland. "Network Formation and Reward Functions." In Theory and Decision Library. Springer US, 2001. http://dx.doi.org/10.1007/978-1-4615-1569-2_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Fyfe, Colin, and Pei Ling Lai. "Reinforcement Learning Reward Functions for Unsupervised Learning." In Advances in Neural Networks – ISNN 2007. Springer Berlin Heidelberg, 2007. http://dx.doi.org/10.1007/978-3-540-72383-7_47.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Schrijvers, Okke, Joseph Bonneau, Dan Boneh, and Tim Roughgarden. "Incentive Compatibility of Bitcoin Mining Pool Reward Functions." In Financial Cryptography and Data Security. Springer Berlin Heidelberg, 2017. http://dx.doi.org/10.1007/978-3-662-54970-4_28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Tumer, Kagan, and Adrian Agogino. "Efficient Reward Functions for Adaptive Multi-rover Systems." In Learning and Adaption in Multi-Agent Systems. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11691839_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Belotti, Marianna, and Stefano Moretti. "Bankruptcy Solutions as Reward Functions in Mining Pools." In Principles of Blockchain Systems. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-031-01807-7_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Laredo, Sarah A., William R. Marrs, and Loren H. Parsons. "Endocannabinoid Signaling in Reward and Addiction: From Homeostasis to Pathology." In Endocannabinoids and Lipid Mediators in Brain Functions. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-57371-7_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Islam, S. M. R., and H. H. Ammar. "Numerical Solution of Markov Reward Models Using Laguerre Functions." In Numerical Solution of Markov Chains. CRC Press, 2021. http://dx.doi.org/10.1201/9781003210160-34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Petousis, Panayiotis, Simon X. Han, William Hsu, and Alex A. T. Bui. "Generating Reward Functions Using IRL Towards Individualized Cancer Screening." In Lecture Notes in Computer Science. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-12738-1_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Alur, Rajeev, Osbert Bastani, Kishor Jothimurugan, Mateo Perez, Fabio Somenzi, and Ashutosh Trivedi. "Policy Synthesis and Reinforcement Learning for Discounted LTL." In Computer Aided Verification. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37706-8_21.

Full text
Abstract:
The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.
APA, Harvard, Vancouver, ISO, and other styles
10

Fu, Tianyu, Fengming Li, Yukun Zheng, and Rui Song. "Process Learning of Robot Fabric Manipulation Based on Composite Reward Functions." In Intelligent Robotics and Applications. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-89098-8_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Reward functions"

1

Camacho, Alberto, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, and Sheila A. McIlraith. "LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/840.

Full text
Abstract:
In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatical
APA, Harvard, Vancouver, ISO, and other styles
2

Mahmud, Saaduddin, Sandhya Saisubramanian, and Shlomo Zilberstein. "Explanation-Guided Reward Alignment." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/53.

Full text
Abstract:
Agents often need to infer a reward function from observations to learn desired behaviors. However, agents may infer a reward function that does not align with the original intent because there can be multiple reward functions consistent with its observations. Operating based on such misaligned rewards can be risky. Furthermore, black-box representations make it difficult to verify the learned rewards and prevent harmful behavior. We present a framework for verifying and improving reward alignment using explanations and show how explanations can help detect misalignment and reveal failure case
APA, Harvard, Vancouver, ISO, and other styles
3

Balakrishnan, Anand, and Jyotirmoy V. Deshmukh. "Structured reward functions using STL." In HSCC '19: 22nd ACM International Conference on Hybrid Systems: Computation and Control. ACM, 2019. http://dx.doi.org/10.1145/3302504.3313355.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Huang, Weiran, Jungseul Ok, Liang Li, and Wei Chen. "Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/317.

Full text
Abstract:
We study the Combinatorial Pure Exploration problem with Continuous and Separable reward functions (CPE-CS) in the stochastic multi-armed bandit setting. In a CPE-CS instance, we are given several stochastic arms with unknown distributions, as well as a collection of possible decisions. Each decision has a reward according to the distributions of arms. The goal is to identify the decision with the maximum reward, using as few arm samples as possible. The problem generalizes the combinatorial pure exploration problem with linear rewards, which has attracted significant attention in recent years
APA, Harvard, Vancouver, ISO, and other styles
5

MacGlashan, James, Monica Babes-Vroman, Marie desJardins, et al. "Grounding English Commands to Reward Functions." In Robotics: Science and Systems 2015. Robotics: Science and Systems Foundation, 2015. http://dx.doi.org/10.15607/rss.2015.xi.018.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Baima, Renan Lima, and Esther Luna Colombini. "Modeling Object’s Affordances via Reward Functions." In 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2021. http://dx.doi.org/10.1109/smc52423.2021.9658915.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Goyal, Prasoon, Scott Niekum, and Raymond J. Mooney. "Using Natural Language for Reward Shaping in Reinforcement Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/331.

Full text
Abstract:
Recent reinforcement learning (RL) approaches have shown strong performance in complex domains, such as Atari games, but are highly sample inefficient. A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent intermediate rewards for progress towards the goal. Designing such rewards remains a challenge, though. In this work, we use natural language instructions to perform reward shaping. We propose a framework that maps free-form natural language instructions to intermediate rewards, t
APA, Harvard, Vancouver, ISO, and other styles
8

Sadigh, Dorsa, Anca Dragan, Shankar Sastry, and Sanjit Seshia. "Active Preference-Based Learning of Reward Functions." In Robotics: Science and Systems 2017. Robotics: Science and Systems Foundation, 2017. http://dx.doi.org/10.15607/rss.2017.xiii.053.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Li, Ziming, Julia Kiseleva, Maarten de Rijke, and Artem Grotov. "Towards Learning Reward Functions from User Interactions." In ICTIR '17: ACM SIGIR International Conference on the Theory of Information Retrieval. ACM, 2017. http://dx.doi.org/10.1145/3121050.3121098.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Marthi, Bhaskara. "Automatic shaping and decomposition of reward functions." In Proceedings of the 24th International Conference on Machine Learning (ICML). ACM Press, 2007. http://dx.doi.org/10.1145/1273496.1273572.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Reward functions"

1

migao, ZIma. Suggestion: adding the function of Question&Reward. ResearchHub Technologies, Inc., 2022. http://dx.doi.org/10.55277/researchhub.f642k10l.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Reward- and threat-related neural function associated with depression. ACAMH, 2021. http://dx.doi.org/10.13056/acamh.18400.

Full text
Abstract:
The focus of this podcast is on the recently published JCPP paper ‘Reward- and threat-related neural function associated with risk and presence of depression in adolescents: a study using a composite risk score in Brazil’, co-authored by Dr. Johnna Swartz.
APA, Harvard, Vancouver, ISO, and other styles
3

A Decision-Making Method for Connected Autonomous Driving Based on Reinforcement Learning. SAE International, 2020. http://dx.doi.org/10.4271/2020-01-5154.

Full text
Abstract:
At present, with the development of Intelligent Vehicle Infrastructure Cooperative Systems (IVICS), the decision-making for automated vehicle based on connected environment conditions has attracted more attentions. Reliability, efficiency and generalization performance are the basic requirements for the vehicle decision-making system. Therefore, this paper proposed a decision-making method for connected autonomous driving based on Wasserstein Generative Adversarial Nets-Deep Deterministic Policy Gradient (WGAIL-DDPG) algorithm. In which, the key components for reinforcement learning (RL) model
APA, Harvard, Vancouver, ISO, and other styles