Follow this link to see other types of publications on the topic: Sparse Reward.

Journal articles on the topic "Sparse Reward"

Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic "Sparse Reward".

Next to every source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read its abstract online, if it is available in the metadata.

Browse journal articles from a wide range of scientific fields and compile an accurate bibliography.

1

Kong, Yan, Junfeng Wei, and Chih-Hsien Hsia. "Solving Sparse Reward Tasks Using Self-Balancing Exploration and Exploitation." Journal of Internet Technology 26, no. 3 (2025): 293–301. https://doi.org/10.70003/160792642025052603002.

Abstract:
A core challenge in applying deep reinforcement learning (DRL) to real-world tasks is the sparse reward problem, and shaping reward has been one effective method to solve it. However, due to the enormous state space and sparse rewards in the real world, a large number of useless samples may be generated, leading to reduced sample efficiency and potential local optima. To address this issue, this study proposes a self-balancing method of exploration and development to solve the issue of sparse rewards. Firstly, we shape the reward function according to the evaluated progress, to guide the agent
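
As background for the sparse-reward setting this abstract describes, the following is a minimal sketch (not the paper's method) contrasting a success-only reward with a progress-shaped one; the distance-based goal task and the 0.1 scaling factor are illustrative assumptions.

    import numpy as np

    def sparse_reward(next_state, goal, tol=0.05):
        # pays 1 only when the goal is reached (states and goals are numpy arrays)
        return float(np.linalg.norm(next_state - goal) < tol)

    def progress_shaped_reward(state, next_state, goal, tol=0.05):
        # adds a dense bonus for any step that reduces the distance to the goal
        progress = np.linalg.norm(state - goal) - np.linalg.norm(next_state - goal)
        return sparse_reward(next_state, goal, tol) + 0.1 * progress
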
2

Park, Junseok, Yoonsung Kim, Hee bin Yoo, et al. "Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 1 (2024): 592–600. http://dx.doi.org/10.1609/aaai.v38i1.27815.

Abstract:
Toddlers evolve from free exploration with sparse feedback to exploiting prior experiences for goal-directed learning with denser rewards. Drawing inspiration from this Toddler-Inspired Reward Transition, we set out to explore the implications of varying reward transitions when incorporated into Reinforcement Learning (RL) tasks. Central to our inquiry is the transition from sparse to potential-based dense rewards, which share optimal strategies regardless of reward changes. Through various experiments, including those in egocentric navigation and robotic arm manipulation tasks, we found that
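
One simple way to realise such a reward transition is a hard switch from the sparse signal to the denser one at a fixed training step; this is only a sketch of the general idea, and the switching schedule and reward components here are hypothetical rather than the authors' design.

    def transitioned_reward(r_sparse, r_dense, step, switch_step=100_000):
        # early, free-exploration phase: sparse feedback only;
        # later, goal-directed phase: the denser potential-based signal takes over
        return r_sparse if step < switch_step else r_dense
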
3

Xu, Pei, Junge Zhang, Qiyue Yin, Chao Yu, Yaodong Yang, and Kaiqi Huang. "Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 10 (2023): 11717–25. http://dx.doi.org/10.1609/aaai.v37i10.26384.

Abstract:
Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. One possible solution to this issue is to exploit inherent task structures for an acceleration of exploration. In this paper, we present a novel exploration approach, which encodes a special structural prior on the reward function into exploration, for sparse-reward multi-agent tasks. Specifically, a novel entropic exploration objective which encodes the structural prior is proposed to accelerate the discovery of rewards. By maximizing the lower bound of this objective, we then propose an algor
4

Mguni, David, Taher Jafferjee, Jianhong Wang, et al. "Learning to Shape Rewards Using a Game of Two Partners." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 10 (2023): 11604–12. http://dx.doi.org/10.1609/aaai.v37i10.26371.

Abstract:
Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switc
5

Meng, Fanxiao. "Research on Multi-agent Sparse Reward Problem." Highlights in Science, Engineering and Technology 85 (March 13, 2024): 96–103. http://dx.doi.org/10.54097/er0mx710.

Abstract:
Sparse reward poses a significant challenge in deep reinforcement learning, leading to issues such as low sample utilization, slow agent convergence, and subpar performance of optimal policies. Overcoming these challenges requires tackling the complexity of sparse reward algorithms and addressing the lack of unified understanding. This paper aims to address these issues by introducing the concepts of reinforcement learning and sparse reward, as well as presenting three categories of sparse reward algorithms. Furthermore, the paper conducts an analysis and summary of three key aspects: manual l
6

Zuo, Guoyu, Qishen Zhao, Jiahao Lu, and Jiangeng Li. "Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards." International Journal of Advanced Robotic Systems 17, no. 1 (2020): 172988141989834. http://dx.doi.org/10.1177/1729881419898342.

Abstract:
The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks naturally specify with sparse rewards, and manually shaping reward functions is a difficult project. In this article, we propose a general and model-free approach for reinforcement learning to learn robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, based on Tw
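
For context, here is a minimal sketch of standard hindsight goal relabelling, the mechanism the proposed variant builds on (not the curious and aggressive variant itself); the transition-tuple layout and the reward_fn signature are assumptions.

    import random

    def her_relabel(episode, reward_fn, k=4):
        # episode: list of (state, action, next_state, achieved_goal, desired_goal),
        # where achieved_goal is the goal actually reached at next_state
        relabelled = []
        for t, (s, a, s2, ag, g) in enumerate(episode):
            for _ in range(k):
                # pretend a goal achieved later in the episode was the target all along,
                # so the otherwise sparse reward becomes informative
                _, _, _, future_ag, _ = random.choice(episode[t:])
                relabelled.append((s, a, s2, future_ag, reward_fn(ag, future_ag)))
        return relabelled
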
7

Velasquez, Alvaro, Brett Bissey, Lior Barak, et al. "Dynamic Automaton-Guided Reward Shaping for Monte Carlo Tree Search." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (2021): 12015–23. http://dx.doi.org/10.1609/aaai.v35i13.17427.

Abstract:
Reinforcement learning and planning have been revolutionized in recent years, due in part to the mass adoption of deep convolutional neural networks and the resurgence of powerful methods to refine decision-making policies. However, the problem of sparse reward signals and their representation remains pervasive in many domains. While various reward-shaping mechanisms and imitation learning approaches have been proposed to mitigate this problem, the use of human-aided artificial rewards introduces human error, sub-optimal behavior, and a greater propensity for reward hacking. In this paper, we mi
8

Corazza, Jan, Ivan Gavran, and Daniel Neider. "Reinforcement Learning with Stochastic Reward Machines." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (2022): 6429–36. http://dx.doi.org/10.1609/aaai.v36i6.20594.

Abstract:
Reward machines are an established tool for dealing with reinforcement learning problems in which rewards are sparse and depend on complex sequences of actions. However, existing algorithms for learning reward machines assume an overly idealized setting where rewards have to be free of noise. To overcome this practical limitation, we introduce a novel type of reward machines, called stochastic reward machines, and an algorithm for learning them. Our algorithm, based on constraint solving, learns minimal stochastic reward machines from the explorations of a reinforcement learning agent. This al
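
To make the reward-machine idea concrete, here is a toy deterministic (noise-free) machine written as a finite-state transducer over high-level events; the states, events, and reward values are purely illustrative.

    # "pick up the key, then open the door": only the final transition pays a reward
    TRANSITIONS = {
        ("u0", "got_key"): ("u1", 0.0),
        ("u1", "opened_door"): ("u2", 1.0),
    }

    def rm_step(u, event):
        # events with no outgoing edge leave the machine state unchanged, reward 0
        return TRANSITIONS.get((u, event), (u, 0.0))

    u, total = "u0", 0.0
    for event in ["bumped_wall", "got_key", "opened_door"]:
        u, r = rm_step(u, event)
        total += r
    print(u, total)  # u2 1.0
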
9

Gaina, Raluca D., Simon M. Lucas, and Diego Pérez-Liébana. "Tackling Sparse Rewards in Real-Time Games with Statistical Forward Planning Methods." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 1691–98. http://dx.doi.org/10.1609/aaai.v33i01.33011691.

Abstract:
One of the issues general AI game players are required to deal with is the different reward systems in the variety of games they are expected to be able to play at a high level. Some games may present plentiful rewards which the agents can use to guide their search for the best solution, whereas others feature sparse reward landscapes that provide little information to the agents. The work presented in this paper focuses on the latter case, which most agents struggle with. Thus, modifications are proposed for two algorithms, Monte Carlo Tree Search and Rolling Horizon Evolutionary Algorithms,
10

Zhou, Xiao, Song Zhou, Xingang Mou, and Yi He. "Multirobot Collaborative Pursuit Target Robot by Improved MADDPG." Computational Intelligence and Neuroscience 2022 (February 25, 2022): 1–10. http://dx.doi.org/10.1155/2022/4757394.

Abstract:
Policy formulation is one of the main problems in multirobot systems, especially in multirobot pursuit-evasion scenarios, where both sparse rewards and random environment changes bring great difficulties to find better strategy. Existing multirobot decision-making methods mostly use environmental rewards to promote robots to complete the target task that cannot achieve good results. This paper proposes a multirobot pursuit method based on improved multiagent deep deterministic policy gradient (MADDPG), which solves the problem of sparse rewards in multirobot pursuit-evasion scenarios by combin
11

Jiang, Jiechuan, and Zongqing Lu. "Generative Exploration and Exploitation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (2020): 4337–44. http://dx.doi.org/10.1609/aaai.v34i04.5858.

Abstract:
Sparse reward is one of the biggest challenges in reinforcement learning (RL). In this paper, we propose a novel method called Generative Exploration and Exploitation (GENE) to overcome sparse reward. GENE automatically generates start states to encourage the agent to explore the environment and to exploit received reward signals. GENE can adaptively tradeoff between exploration and exploitation according to the varying distributions of states experienced by the agent as the learning progresses. GENE relies on no prior knowledge about the environment and can be combined with any RL algorithm,
12

Kong, Yan, Yefeng Rui, and Chih-Hsien Hsia. "A Deep Reinforcement Learning-Based Approach in Porker Game." 電腦學刊 34, no. 2 (2023): 041–51. http://dx.doi.org/10.53106/199115992023043402004.

Abstract:
Recent years have witnessed the big success deep reinforcement learning achieved in the domain of card and board games, such as Go, chess and Texas Hold’em poker. However, Dou Di Zhu, a traditional Chinese card game, is still a challenging task for deep reinforcement learning methods due to the enormous action space and the sparse and delayed reward of each action from the environment. Basic reinforcement learning algorithms are more effective in the simple environments which have small action spaces and valuable and concrete reward functions, and unfortunately, are shown no
13

Dann, Michael, Fabio Zambetta, and John Thangarajah. "Deriving Subgoals Autonomously to Accelerate Learning in Sparse Reward Domains." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 881–89. http://dx.doi.org/10.1609/aaai.v33i01.3301881.

Abstract:
Sparse reward games, such as the infamous Montezuma’s Revenge, pose a significant challenge for Reinforcement Learning (RL) agents. Hierarchical RL, which promotes efficient exploration via subgoals, has shown promise in these games. However, existing agents rely either on human domain knowledge or slow autonomous methods to derive suitable subgoals. In this work, we describe a new, autonomous approach for deriving subgoals from raw pixels that is more efficient than competing methods. We propose a novel intrinsic reward scheme for exploiting the derived subgoals, applying it to three Atari ga
14

Bougie, Nicolas, and Ryutaro Ichise. "Skill-based curiosity for intrinsically motivated reinforcement learning." Machine Learning 109, no. 3 (2019): 493–512. http://dx.doi.org/10.1007/s10994-019-05845-8.

Abstract:
Reinforcement learning methods rely on rewards provided by the environment that are extrinsic to the agent. However, many real-world scenarios involve sparse or delayed rewards. In such cases, the agent can develop its own intrinsic reward function called curiosity to enable the agent to explore its environment in the quest of new skills. We propose a novel end-to-end curiosity mechanism for deep reinforcement learning methods, that allows an agent to gradually acquire new skills. Our method scales to high-dimensional problems, avoids the need of directly predicting the future, and, c
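
As a reference point, this is a generic prediction-error curiosity bonus, the common baseline that the skill-based mechanism above departs from (it is not the proposed method, which avoids direct future prediction); forward_model and the 0.01 scale are assumed placeholders.

    import numpy as np

    def curiosity_bonus(forward_model, state_feat, action, next_state_feat, scale=0.01):
        # intrinsic reward proportional to the forward model's prediction error:
        # transitions the model cannot yet predict are treated as novel
        predicted = forward_model(state_feat, action)
        return scale * float(np.sum((predicted - next_state_feat) ** 2))

    # the agent is then trained on r_total = r_extrinsic + curiosity_bonus(...)
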
15

Catacora Ocana, Jim Martin, Roberto Capobianco, and Daniele Nardi. "An Overview of Environmental Features that Impact Deep Reinforcement Learning in Sparse-Reward Domains." Journal of Artificial Intelligence Research 76 (April 26, 2023): 1181–218. http://dx.doi.org/10.1613/jair.1.14390.

Abstract:
Deep reinforcement learning has achieved impressive results in recent years; yet, it is still severely troubled by environments showcasing sparse rewards. On top of that, not all sparse-reward environments are created equal, i.e., they can differ in the presence or absence of various features, with many of them having a great impact on learning. In light of this, the present work puts together a literature compilation of such environmental features, covering particularly those that have been taken advantage of and those that continue to pose a challenge. We expect this effort to provide guidan
16

Zhu, Yiwen, Yuan Zheng, Wenya Wei, and Zhou Fang. "Enhancing Automated Maneuvering Decisions in UCAV Air Combat Games Using Homotopy-Based Reinforcement Learning." Drones 8, no. 12 (2024): 756. https://doi.org/10.3390/drones8120756.

Abstract:
In the field of real-time autonomous decision-making for Unmanned Combat Aerial Vehicles (UCAVs), reinforcement learning is widely used to enhance their decision-making capabilities in high-dimensional spaces. These enhanced capabilities allow UCAVs to better respond to the maneuvers of various opponents, with the win rate often serving as the primary optimization metric. However, relying solely on the terminal outcome of victory or defeat as the optimization target, but without incorporating additional rewards throughout the process, poses significant challenges for reinforcement learning due
17

Gehring, Clement, Masataro Asai, Rohan Chitnis, et al. "Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators." Proceedings of the International Conference on Automated Planning and Scheduling 32 (June 13, 2022): 588–96. http://dx.doi.org/10.1609/icaps.v32i1.19846.

Abstract:
Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains or applying classical planning methods to some complex RL domains. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue an
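
A hedged sketch of the general recipe for turning a domain-independent heuristic h (an estimate of cost-to-goal) into a dense shaping signal: reward decreases in h, which amounts to potential-based shaping with potential Phi(s) = -h(s). The exact formulation used in the paper may differ.

    def heuristic_shaped_reward(r, s, s_next, h, gamma=1.0):
        # reward any step that lowers the heuristic's estimate of remaining cost;
        # equivalent to potential-based shaping with Phi(s) = -h(s)
        return r + h(s) - gamma * h(s_next)
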
18

Xu, Zhe, Ivan Gavran, Yousef Ahmad, et al. "Joint Inference of Reward Machines and Policies for Reinforcement Learning." Proceedings of the International Conference on Automated Planning and Scheduling 30 (June 1, 2020): 590–98. http://dx.doi.org/10.1609/icaps.v30i1.6756.

Abstract:
Incorporating high-level knowledge is an effective way to expedite reinforcement learning (RL), especially for complex tasks with sparse rewards. We investigate an RL problem where the high-level knowledge is in the form of reward machines, a type of Mealy machines that encode non-Markovian reward functions. We focus on a setting in which this knowledge is a priori not available to the learning agent. We develop an iterative algorithm that performs joint inference of reward machines and policies for RL (more specifically, q-learning). In each iteration, the algorithm maintains a hypothesis rew
19

Ye, Chenhao, Wei Zhu, Shiluo Guo, and Jinyin Bai. "DQN-Based Shaped Reward Function Mold for UAV Emergency Communication." Applied Sciences 14, no. 22 (2024): 10496. http://dx.doi.org/10.3390/app142210496.

Abstract:
Unmanned aerial vehicles (UAVs) have emerged as pivotal tools in emergency communication scenarios. In the aftermath of disasters, UAVs can be communication nodes to provide communication services for users in the area. In this paper, we establish a meticulously crafted virtual simulation environment and leverage advanced deep reinforcement learning algorithms to train UAVs agents. Notwithstanding, the development of reinforcement learning algorithms is beset with challenges such as sparse rewards and protracted training durations. To mitigate these issues, we devise an enhanced reward functio
20

Qu, Yun, Yuhang Jiang, Boyuan Wang, et al. "Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 19 (2025): 20095–103. https://doi.org/10.1609/aaai.v39i19.34213.

Abstract:
Reinforcement learning (RL) often encounters delayed and sparse feedback in real-world applications, even with only episodic rewards. Previous approaches have made some progress in reward redistribution for credit assignment but still face challenges, including training difficulties due to redundancy and ambiguous attributions stemming from overlooking the multifaceted nature of mission performance evaluation. Hopefully, Large Language Model (LLM) encompasses fruitful decision-making knowledge and provides a plausible tool for reward redistribution. Even so, deploying LLM in this case is non-t
21

Dharmavaram, Akshay, Matthew Riemer, and Shalabh Bhatnagar. "Hierarchical Average Reward Policy Gradient Algorithms (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (2020): 13777–78. http://dx.doi.org/10.1609/aaai.v34i10.7160.

Abstract:
Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address the issue of long term credit assignment by leveraging temporal abstractions. However, when dealing with extended timescales, discounting future rewards can lead to incorrect credit assignments. In this work, we address this issue by extending the hierarchical option-critic policy gradient theorem for the average reward criterion. Our proposed framework aims to maximize the long-term reward obtained in the steady-state of the Markov chain defined by the agent's policy. Furthermore, we use an
22

Abu Bakar, Mohamad Hafiz, Abu Ubaidah Shamsudin, Zubair Adil Soomro, Satoshi Tadokoro, and C. J. Salaan. "FUSION SPARSE AND SHAPING REWARD FUNCTION IN SOFT ACTOR-CRITIC DEEP REINFORCEMENT LEARNING FOR MOBILE ROBOT NAVIGATION." Jurnal Teknologi 86, no. 2 (2024): 37–49. http://dx.doi.org/10.11113/jurnalteknologi.v86.20147.

Abstract:
Nowadays, the advancement in autonomous robots is the latest influenced by the development of a world surrounded by new technologies. Deep Reinforcement Learning (DRL) allows systems to operate automatically, so the robot will learn the next movement based on the interaction with the environment. Moreover, since robots require continuous action, Soft Actor Critic Deep Reinforcement Learning (SAC DRL) is considered the latest DRL approach solution. SAC is used because its ability to control continuous action to produce more accurate movements. SAC fundamental is robust against unpredictability,
23

Sharip, Zati, Mohd Hafiz Zulkifli, Mohd Nur Farhan Abd Wahab, Zubaidi Johar, and Mohd Zaki Mat Amin. "ASSESSING TROPHIC STATE AND WATER QUALITY OF SMALL LAKES AND PONDS IN PERAK." Jurnal Teknologi 86, no. 2 (2024): 51–59. http://dx.doi.org/10.11113/jurnalteknologi.v86.20566.

24

Parisi, Simone, Davide Tateo, Maximilian Hensel, Carlo D’Eramo, Jan Peters, and Joni Pajarinen. "Long-Term Visitation Value for Deep Exploration in Sparse-Reward Reinforcement Learning." Algorithms 15, no. 3 (2022): 81. http://dx.doi.org/10.3390/a15030081.

Abstract:
Reinforcement learning with sparse rewards is still an open challenge. Classic methods rely on getting feedback via extrinsic rewards to train the agent, and in situations where this occurs very rarely the agent learns slowly or cannot learn at all. Similarly, if the agent receives also rewards that create suboptimal modes of the objective function, it will likely prematurely stop exploring. More recent methods add auxiliary intrinsic rewards to encourage exploration. However, auxiliary rewards lead to a non-stationary target for the Q-function. In this paper, we present a novel approach that
25

Forbes, Grant C., and David L. Roberts. "Potential-Based Reward Shaping for Intrinsic Motivation (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (2024): 23488–89. http://dx.doi.org/10.1609/aaai.v38i21.30441.

Abstract:
Recently there has been a proliferation of intrinsic motivation (IM) reward shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves, and therefore dependent on a wider set of variables than the traditional reward functions that PBRS was develope
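
For reference, the standard potential-based reward shaping (PBRS) form, stated here from general knowledge of the classical result rather than from this abstract: adding a shaping term derived from any potential function Phi leaves the set of optimal policies unchanged.

    F(s, a, s') = \gamma \, \Phi(s') - \Phi(s), \qquad
    \tilde{r}(s, a, s') = r(s, a, s') + F(s, a, s')
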
26

Lin, Qi, Hengtong Lu, Caixia Yuan, Xiaojie Wang, Huixing Jiang, and Wei Chen. "Data with High and Consistent Preference Difference Are Better for Reward Model." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 26 (2025): 27482–90. https://doi.org/10.1609/aaai.v39i26.34960.

Abstract:
Reinforcement Learning from Human Feedback (RLHF) is a commonly used alignment method for Large Language Models (LLMs). This method relies on a reward model trained on a preference dataset to provide scalar rewards. However, the human-annotated preference data is often sparse, noisy, and costly to obtain, necessitating more efficient utilization. This paper proposes a new metric for better preference data utilization from both theoretical and empirical perspectives. Starting with the Bradley-Terry model, we compute the Mean Square Error (MSE) between the expected loss and empirical loss of the
27

Guo, Yijie, Qiucheng Wu, and Honglak Lee. "Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (2022): 6792–800. http://dx.doi.org/10.1609/aaai.v36i6.20635.

Abstract:
Meta reinforcement learning (meta-RL) aims to learn a policy solving a set of training tasks simultaneously and quickly adapting to new tasks. It requires massive amounts of data drawn from training tasks to infer the common structure shared among tasks. Without heavy reward engineering, the sparse rewards in long-horizon tasks exacerbate the problem of sample efficiency in meta-RL. Another challenge in meta-RL is the discrepancy of difficulty level among tasks, which might cause one easy task dominating learning of the shared policy and thus preclude policy adaptation to new tasks. This work
28

Phan, Bui Khoi, Truong Giang Nguyen, and Van Tan Hoang. "Control and Simulation of a 6-DOF Biped Robot based on Twin Delayed Deep Deterministic Policy Gradient Algorithm." Indian Journal of Science and Technology 14, no. 30 (2021): 2460–71. https://doi.org/10.17485/IJST/v14i30.1030.

Abstract:
Objectives: To study an algorithm to control a bipedal robot to walk so that it has a gait close to that of a human. It is known that the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is a highly efficient algorithm with a few changes compared to the popular algorithm, the commonly used Deep Deterministic Policy Gradient (DDPG), in the continuous action space problem in Reinforcement Learning. Methods: Different from the usual sparse reward function model used, in this study, a reward model combined with a sparse
29

Booth, Serena, W. Bradley Knox, Julie Shah, Scott Niekum, Peter Stone, and Alessandro Allievi. "The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 5 (2023): 5920–29. http://dx.doi.org/10.1609/aaai.v37i5.25733.

Abstract:
In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often necessarily sparse. For example, a true task metric might encode a reward of 1 upon success and 0 otherwise. The sparsity of these true task metrics can make them hard to learn from, so in practice they are often replaced with alternative dense reward functions. These dense reward functions are typically designed by experts through an ad hoc process of trial and error. In this process, experts manually search for a reward function that improves performance with respect to the ta
30

Linke, Cam, Nadia M. Ady, Martha White, Thomas Degris, and Adam White. "Adapting Behavior via Intrinsic Reward: A Survey and Empirical Study." Journal of Artificial Intelligence Research 69 (December 14, 2020): 1287–332. http://dx.doi.org/10.1613/jair.1.12087.

Abstract:
Learning about many things can provide numerous benefits to a reinforcement learning system. For example, learning many auxiliary value functions, in addition to optimizing the environmental reward, appears to improve both exploration and representation learning. The question we tackle in this paper is how to sculpt the stream of experience—how to adapt the learning system’s behavior—to optimize the learning of a collection of value functions. A simple answer is to compute an intrinsic reward based on the statistics of each auxiliary learner, and use reinforcement learning to maximize that int
31

Velasquez, Alvaro, Brett Bissey, Lior Barak, et al. "Multi-Agent Tree Search with Dynamic Reward Shaping." Proceedings of the International Conference on Automated Planning and Scheduling 32 (June 13, 2022): 652–61. http://dx.doi.org/10.1609/icaps.v32i1.19854.

Abstract:
Sparse rewards and their representation in multi-agent domains remains a challenge for the development of multi-agent planning systems. While techniques from formal methods can be adopted to represent the underlying planning objectives, their use in facilitating and accelerating learning has witnessed limited attention in multi-agent settings. Reward shaping methods that leverage such formal representations in single-agent settings are typically static in the sense that the artificial rewards remain the same throughout the entire learning process. In contrast, we investigate the use of such fo
32

Sorg, Jonathan, Satinder Singh, and Richard Lewis. "Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (2011): 465–70. http://dx.doi.org/10.1609/aaai.v25i1.7931.

Abstract:
Planning agents often lack the computational resources needed to build full planning trees for their environments. Agent designers commonly overcome this finite-horizon approximation by applying an evaluation function at the leaf-states of the planning tree. Recent work has proposed an alternative approach for overcoming computational constraints on agent design: modify the reward function. In this work, we compare this reward design approach to the common leaf-evaluation heuristic approach for improving planning agents. We show that in many agents, the reward design approach strictly subsumes
33

Yin, Haiyan, Jianda Chen, Sinno Jialin Pan, and Sebastian Tschiatschek. "Sequential Generative Exploration Model for Partially Observable Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (2021): 10700–10708. http://dx.doi.org/10.1609/aaai.v35i12.17279.

Abstract:
Many challenging partially observable reinforcement learning problems have sparse rewards and most existing model-free algorithms struggle with such reward sparsity. In this paper, we propose a novel reward shaping approach to infer the intrinsic rewards for the agent from a sequential generative model. Specifically, the sequential generative model processes a sequence of partial observations and actions from the agent's historical transitions to compile a belief state for performing forward dynamics prediction. Then we utilize the error of the dynamics prediction task to infer the intrinsic r
34

Hasanbeig, Mohammadhosein, Natasha Yogananda Jeppu, Alessandro Abate, Tom Melham, and Daniel Kroening. "DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (2021): 7647–56. http://dx.doi.org/10.1609/aaai.v35i9.16935.

Abstract:
This paper proposes DeepSynth, a method for effective training of deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton so that the generation of a
35

Hasanbeig, Hosein, Natasha Yogananda Jeppu, Alessandro Abate, Tom Melham, and Daniel Kroening. "Symbolic Task Inference in Deep Reinforcement Learning." Journal of Artificial Intelligence Research 80 (July 23, 2024): 1099–137. http://dx.doi.org/10.1613/jair.1.14063.

Abstract:
This paper proposes DeepSynth, a method for effective training of deep reinforcement learning agents when the reward is sparse or non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact finite state automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton, so that the generat
36

Jiang, Nan, Sheng Jin, and Changshui Zhang. "Hierarchical automatic curriculum learning: Converting a sparse reward navigation task into dense reward." Neurocomputing 360 (September 2019): 265–78. http://dx.doi.org/10.1016/j.neucom.2019.06.024.

37

Jin, Tianyuan, Hao-Lun Hsu, William Chang, and Pan Xu. "Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (2024): 12956–64. http://dx.doi.org/10.1609/aaai.v38i11.29193.

Abstract:
We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored into overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards. Previous work introduced the multi-agent Thompson sampling (MATS) algorithm and derived a Bayesian regret bound. H
38

Ma, Ang, Yanhua Yu, Chuan Shi, Shuai Zhen, Liang Pang, and Tat-Seng Chua. "PMHR: Path-Based Multi-Hop Reasoning Incorporating Rule-Enhanced Reinforcement Learning and KG Embeddings." Electronics 13, no. 23 (2024): 4847. https://doi.org/10.3390/electronics13234847.

Abstract:
Multi-hop reasoning provides a means for inferring indirect relationships and missing information from knowledge graphs (KGs). Reinforcement learning (RL) was recently employed for multi-hop reasoning. Although RL-based methods provide explainability, they face challenges such as sparse rewards, spurious paths, large action spaces, and long training and running times. In this study, we present a novel approach that combines KG embeddings and RL strategies for multi-hop reasoning called path-based multi-hop reasoning (PMHR). We address the issues of sparse rewards and spurious paths by incorpor
39

Wei, Tianqi, Qinghai Guo, and Barbara Webb. "Learning with sparse reward in a gap junction network inspired by the insect mushroom body." PLOS Computational Biology 20, no. 5 (2024): e1012086. http://dx.doi.org/10.1371/journal.pcbi.1012086.

Abstract:
Animals can learn in real-life scenarios where rewards are often only available when a goal is achieved. This ‘distal’ or ‘sparse’ reward problem remains a challenge for conventional reinforcement learning algorithms. Here we investigate an algorithm for learning in such scenarios, inspired by the possibility that axo-axonal gap junction connections, observed in neural circuits with parallel fibres such as the insect mushroom body, could form a resistive network. In such a network, an active node represents the task state, connections between nodes represent state transitions and their connect
40

Kang, Yongxin, Enmin Zhao, Kai Li, and Junliang Xing. "Exploration via State influence Modeling." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (2021): 8047–54. http://dx.doi.org/10.1609/aaai.v35i9.16981.

Abstract:
This paper studies the challenging problem of reinforcement learning (RL) in hard exploration tasks with sparse rewards. It focuses on the exploration stage before the agent gets the first positive reward, in which case, traditional RL algorithms with simple exploration strategies often work poorly. Unlike previous methods using some attribute of a single state as the intrinsic reward to encourage exploration, this work leverages the social influence between different states to permit more efficient exploration. It introduces a general intrinsic reward construction method to evaluate the socia
41

Adamczyk, Jacob, Volodymyr Makarenko, Stas Tiomkin, and Rahul V. Kulkarni. "Bootstrapped Reward Shaping." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 15 (2025): 15302–10. https://doi.org/10.1609/aaai.v39i15.33679.

Abstract:
In reinforcement learning, especially in sparse-reward domains, many environment steps are required to observe reward information. In order to increase the frequency of such observations, "potential-based reward shaping" (PBRS) has been proposed as a method of providing a more dense reward signal while leaving the optimal policy invariant. However, the required potential function must be carefully designed with task-dependent knowledge to not deter training performance. In this work, we propose a bootstrapped method of reward shaping, termed BS-RS, in which the agent's current estimate of the
42

Sakamoto, Yuma, and Kentarou Kurashige. "Self-Generating Evaluations for Robot’s Autonomy Based on Sensor Input." Machines 11, no. 9 (2023): 892. http://dx.doi.org/10.3390/machines11090892.

Abstract:
Reinforcement learning has been explored within the context of robot operation in different environments. Designing the reward function in reinforcement learning is challenging for designers because it requires specialized knowledge. To reduce the design burden, we propose a reward design method that is independent of both specific environments and tasks in which reinforcement learning robots evaluate and generate rewards autonomously based on sensor information received from the environment. This method allows the robot to operate autonomously based on sensors. However, the existing approach
43

Morrison, Sara E., Vincent B. McGinty, Johann du Hoffmann, and Saleem M. Nicola. "Limbic-motor integration by neural excitations and inhibitions in the nucleus accumbens." Journal of Neurophysiology 118, no. 5 (2017): 2549–67. http://dx.doi.org/10.1152/jn.00465.2017.

Abstract:
The nucleus accumbens (NAc) has often been described as a “limbic-motor interface,” implying that the NAc integrates the value of expected rewards with the motor planning required to obtain them. However, there is little direct evidence that the signaling of individual NAc neurons combines information about predicted reward and behavioral response. We report that cue-evoked neural responses in the NAc form a likely physiological substrate for its limbic-motor integration function. Across task contexts, individual NAc neurons in behaving rats robustly encode the reward-predictive qualities of a
44

Han, Ziyao, Fan Yi, and Kazuhiro Ohkura. "Collective Transport Behavior in a Robotic Swarm with Hierarchical Imitation Learning." Journal of Robotics and Mechatronics 36, no. 3 (2024): 538–45. http://dx.doi.org/10.20965/jrm.2024.p0538.

Abstract:
Swarm robotics is the study of how a large number of relatively simple physically embodied robots can be designed such that a desired collective behavior emerges from local interactions. Furthermore, reinforcement learning (RL) is a promising approach for training robotic swarm controllers. However, the conventional RL approach suffers from the sparse reward problem in some complex tasks, such as key-to-door tasks. In this study, we applied hierarchical imitation learning to train a robotic swarm to address a key-to-door transport task with sparse rewards. The results demonstrate that the prop
45

Song, Qingpeng, Yuansheng Liu, Ming Lu, et al. "Autonomous Driving Decision Control Based on Improved Proximal Policy Optimization Algorithm." Applied Sciences 13, no. 11 (2023): 6400. http://dx.doi.org/10.3390/app13116400.

Abstract:
The decision-making control of autonomous driving in complex urban road environments is a difficult problem in the research of autonomous driving. In order to solve the problem of high dimensional state space and sparse reward in autonomous driving decision control in this environment, this paper proposed a Coordinated Convolution Multi-Reward Proximal Policy Optimization (CCMR-PPO). This method reduces the dimension of the bird’s-eye view data through the coordinated convolution network and then fuses the processed data with the vehicle state data as the input of the algorithm to optimize the
46

Tang, Wanxing, Chuang Cheng, Haiping Ai, and Li Chen. "Dual-Arm Robot Trajectory Planning Based on Deep Reinforcement Learning under Complex Environment." Micromachines 13, no. 4 (2022): 564. http://dx.doi.org/10.3390/mi13040564.

Abstract:
In this article, the trajectory planning of the two manipulators of the dual-arm robot is studied to approach the patient in a complex environment with deep reinforcement learning algorithms. The shape of the human body and bed is complex which may lead to the collision between the human and the robot. Because the sparse reward the robot obtains from the environment may not support the robot to accomplish the task, a neural network is trained to control the manipulators of the robot to prepare to hold the patient up by using a proximal policy optimization algorithm with a continuous reward fun
47

Xu, Xibao, Yushen Chen, and Chengchao Bai. "Deep Reinforcement Learning-Based Accurate Control of Planetary Soft Landing." Sensors 21, no. 23 (2021): 8161. http://dx.doi.org/10.3390/s21238161.

Abstract:
Planetary soft landing has been studied extensively due to its promising application prospects. In this paper, a soft landing control algorithm based on deep reinforcement learning (DRL) with good convergence property is proposed. First, the soft landing problem of the powered descent phase is formulated and the theoretical basis of Reinforcement Learning (RL) used in this paper is introduced. Second, to make it easier to converge, a reward function is designed to include process rewards like velocity tracking reward, solving the problem of sparse reward. Then, by including the fuel consumptio
48

Potjans, Wiebke, Abigail Morrison, and Markus Diesmann. "A Spiking Neural Network Model of an Actor-Critic Learning Agent." Neural Computation 21, no. 2 (2009): 301–39. http://dx.doi.org/10.1162/neco.2008.08-07-593.

Abstract:
The ability to adapt behavior to maximize reward as a result of interactions with the environment is crucial for the survival of any higher organism. In the framework of reinforcement learning, temporal-difference learning algorithms provide an effective strategy for such goal-directed adaptation, but it is unclear to what extent these algorithms are compatible with neural computation. In this article, we present a spiking neural network model that implements actor-critic temporal-difference learning by combining local plasticity rules with a global reward signal. The network is capable of sol
49

Kim, MyeongSeop, and Jung-Su Kim. "Policy-based Deep Reinforcement Learning for Sparse Reward Environment." Transactions of The Korean Institute of Electrical Engineers 70, no. 3 (2021): 506–14. http://dx.doi.org/10.5370/kiee.2021.70.3.506.

50

Akgün, Onur, and N. Kemal Üre. "Bayesian curriculum generation in sparse reward reinforcement learning environments." Engineering Science and Technology, an International Journal 66 (June 2025): 102048. https://doi.org/10.1016/j.jestch.2025.102048.
