Academic literature on the topic 'RL ALGORITHMS'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'RL ALGORITHMS.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "RL ALGORITHMS"

1. Lahande, Prathamesh, Parag Kaveri, and Jatinderkumar Saini. "Reinforcement Learning for Reducing the Interruptions and Increasing Fault Tolerance in the Cloud Environment." Informatics 10, no. 3 (2023): 64. http://dx.doi.org/10.3390/informatics10030064.

Abstract: Cloud computing delivers robust computational services by processing tasks on its virtual machines (VMs) using resource-scheduling algorithms. The cloud’s existing algorithms provide limited results due to inappropriate resource scheduling. Additionally, these algorithms cannot process tasks that generate faults during computation. The primary reason for this is that these existing algorithms lack an intelligence mechanism to enhance their abilities. To provide an intelligence mechanism that improves the resource-scheduling process and provisions fault tolerance, an algorithm named …

2. Trella, Anna L., Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, and Susan A. Murphy. "Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines." Algorithms 15, no. 8 (2022): 255. http://dx.doi.org/10.3390/a15080255.

Abstract: Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (predictability, computability, stability) framework, a data science framework that …

3. Rodríguez Sánchez, Francisco, Ildeberto Santos-Ruiz, Joaquín Domínguez-Zenteno, and Francisco Ronay López-Estrada. "Control Applications Using Reinforcement Learning: An Overview." Memorias del Congreso Nacional de Control Automático 5, no. 1 (2022): 67–72. http://dx.doi.org/10.58571/cnca.amca.2022.019.

Abstract: This article presents the general formulation and terminology of reinforcement learning (RL) from the perspective of Bellman’s equations based on a reward function, together with its learning methods and algorithms. A key step in RL is the calculation of the state-value and action-value functions, which are used to find, compare, and improve policies for the learning agent through value-based and policy-based methods such as Q-learning. The deep deterministic policy gradient (DDPG) learning algorithm, based on an actor-critic structure, is also described as one way of training the RL agent. …

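As a pointer for readers new to the methods this overview covers, the tabular Q-learning update it describes can be sketched as follows. This is a minimal illustration only; the state/action sizes and hyperparameters are assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch of tabular Q-learning; all sizes and hyperparameters
# below are illustrative assumptions, not taken from the paper.
n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration
Q = np.zeros((n_states, n_actions))

def select_action(state: int) -> int:
    """Epsilon-greedy action selection over the current Q estimates."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """One Bellman backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```
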
4. Abbass, Mahmoud Abdelkader Bashery, and Hyun-Soo Kang. "Drone Elevation Control Based on Python-Unity Integrated Framework for Reinforcement Learning Applications." Drones 7, no. 4 (2023): 225. http://dx.doi.org/10.3390/drones7040225.

Abstract: Reinforcement learning (RL) applications require a huge effort to become established in real-world environments, due to the injury and breakdown risks during interactions between the RL agent and the environment in the online training process. In addition, the RL platform tools (e.g., Python OpenAI’s Gym, Unity ML-Agents, PyBullet, DART, MuJoCo, RaiSim, Isaac, and AirSim), which are intended to reduce these real-world challenges, suffer from drawbacks (e.g., a limited number of examples and applications, and difficulties in implementing the RL algorithms due to difficulties with the …

5. Mann, Timothy, and Yoonsuck Choe. "Scaling Up Reinforcement Learning through Targeted Exploration." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (2011): 435–40. http://dx.doi.org/10.1609/aaai.v25i1.7929.

Abstract: Recent reinforcement learning (RL) algorithms, such as R-MAX, make (with high probability) only a small number of poor decisions. In practice, these algorithms do not scale well as the number of states grows, because they spend too much effort exploring. We introduce an RL algorithm, State TArgeted R-MAX (STAR-MAX), that explores only a subset of the state space, called the exploration envelope ξ. When ξ equals the total state space, STAR-MAX behaves identically to R-MAX. When ξ is a subset of the state space, a recovery rule β is needed to keep exploration within ξ. We compared existing …

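From the abstract alone, the targeted-exploration idea can be pictured as R-MAX-style optimism restricted to the envelope ξ. The sketch below is an illustrative reconstruction, not the authors' implementation; the threshold and reward values are assumptions.

```python
# Rough sketch of R-MAX-style optimism restricted to an exploration envelope,
# reconstructed from the abstract alone; not the authors' actual algorithm.
from collections import defaultdict

M = 10          # visit threshold before a state-action pair counts as "known"
R_MAX = 1.0     # optimistic reward assigned to unknown pairs

counts = defaultdict(int)   # (state, action) -> visit count
envelope = set()            # the exploration envelope xi (states worth exploring)

def optimistic_reward(s, a, observed_r):
    """Unknown pairs inside the envelope receive the optimistic R_MAX value,
    drawing the agent toward them; everything outside xi gets no bonus."""
    if s in envelope and counts[(s, a)] < M:
        return R_MAX
    return observed_r
```
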
6. Cheng, Richard, Gábor Orosz, Richard M. Murray, and Joel W. Burdick. "End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3387–95. http://dx.doi.org/10.1609/aaai.v33i01.33013387.

Abstract: Reinforcement learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller, (2) model-based controllers utilizing control barrier functions (CBFs), and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages …

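The CBF idea of filtering a learned action so the system stays safe can be illustrated on a 1-D single integrator. This toy stand-in is an assumption for exposition; the paper's general framework instead solves a quadratic program over learned dynamics.

```python
# Minimal sketch of a CBF-style safety filter layered on an RL action, for a
# 1-D single integrator x_dot = u with a box constraint x in [X_MIN, X_MAX].
# Illustrative only; assumes x starts inside the safe set.
ALPHA = 1.0                 # class-K gain on the barrier
X_MIN, X_MAX = -1.0, 1.0    # safe set boundaries

def safe_action(x: float, u_rl: float) -> float:
    """Clamp the RL action into the set satisfying h_dot + ALPHA*h >= 0
    for the barriers h1 = X_MAX - x and h2 = x - X_MIN."""
    u_upper = ALPHA * (X_MAX - x)    # from h1: -u + ALPHA*(X_MAX - x) >= 0
    u_lower = -ALPHA * (x - X_MIN)   # from h2:  u + ALPHA*(x - X_MIN) >= 0
    return min(max(u_rl, u_lower), u_upper)
```
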
7. Kirsch, Louis, Sebastian Flennerhag, Hado van Hasselt, Abram Friesen, Junhyuk Oh, and Yutian Chen. "Introducing Symmetries to Black Box Meta Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (2022): 7202–10. http://dx.doi.org/10.1609/aaai.v36i7.20681.

Abstract: Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform compared to human-engineered RL algorithms in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta-RL approach that meta-learns an objective for backpropagation-based learning exhibits …

8. Kim, Hyun-Su, and Uksun Kim. "Development of a Control Algorithm for a Semi-Active Mid-Story Isolation System Using Reinforcement Learning." Applied Sciences 13, no. 4 (2023): 2053. http://dx.doi.org/10.3390/app13042053.

Abstract: The semi-active control system is widely used to reduce the seismic response of building structures, and its control performance mainly depends on the applied control algorithms. Various semi-active control algorithms have been developed to date. Recently, machine learning has been applied to various engineering fields with successful results. Because reinforcement learning (RL) has shown good performance for real-time decision-making problems, structural control engineers have become interested in RL. In this study, RL was applied to the development of a semi-active control algorithm. …

9. Prakash, Kritika, Fiza Husain, Praveen Paruchuri, and Sujit Gujar. "How Private Is Your RL Policy? An Inverse RL Based Analysis Framework." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (2022): 8009–16. http://dx.doi.org/10.1609/aaai.v36i7.20772.

Abstract: Reinforcement learning (RL) enables agents to learn how to perform various tasks from scratch. In domains like autonomous driving, recommendation systems, and more, optimal RL policies could cause a privacy breach if the policies memorize any part of the private reward. We study the set of existing differentially private RL policies derived from various RL algorithms such as Value Iteration, Deep Q-Networks, and Vanilla Proximal Policy Optimization. We propose a new Privacy-Aware Inverse RL analysis framework (PRIL) that involves performing reward reconstruction as an adversarial …

10. Niazi, Abdolkarim, Norizah Redzuan, Raja Ishak Raja Hamzah, and Sara Esfandiari. "Improvement on Supporting Machine Learning Algorithm for Solving Problem in Immediate Decision Making." Advanced Materials Research 566 (September 2012): 572–79. http://dx.doi.org/10.4028/www.scientific.net/amr.566.572.

Abstract: In this paper, a new algorithm based on case-based reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of RL algorithms. RL algorithms are very useful for solving a wide variety of decision problems when models are not available and decisions must be made correctly in every state of the system, as in multi-agent systems, artificial control systems, robotics, and tool condition monitoring. In the proposed method, we investigate how to improve action selection in an RL algorithm. …

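Reading the abstract, the combination can be pictured as a case base biasing ordinary ε-greedy action selection. Everything below (the cosine-similarity retrieval, the threshold, the data layout) is an assumed reconstruction for illustration, not the authors' method.

```python
import numpy as np

def cbr_action(state_vec, case_base, Q_row, epsilon=0.1, sim_threshold=0.9):
    """Reuse the action of the most similar stored case when similarity is
    high; otherwise fall back to ordinary epsilon-greedy over Q-values.
    `case_base` is a list of (state_vector, action) pairs. Illustrative only."""
    if case_base:
        sims = [float(state_vec @ c_state) /
                (np.linalg.norm(state_vec) * np.linalg.norm(c_state) + 1e-12)
                for c_state, _ in case_base]
        best = int(np.argmax(sims))
        if sims[best] >= sim_threshold:
            return case_base[best][1]          # reuse the remembered action
    if np.random.rand() < epsilon:
        return np.random.randint(len(Q_row))   # explore
    return int(np.argmax(Q_row))               # exploit
```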

Dissertations / Theses on the topic "RL ALGORITHMS"

1. Marcus, Elwin. "Simulating market maker behaviour using Deep Reinforcement Learning to understand market microstructure." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-240682.

Abstract: Market microstructure studies the process of exchanging assets under explicit trading rules. With algorithmic trading and high-frequency trading, modern financial markets have seen profound changes in market microstructure in the last 5 to 10 years. As a result, previously established methods in the field of market microstructure often become faulty or insufficient. Machine learning and, in particular, reinforcement learning have become more ubiquitous in both finance and other fields today, with applications in trading and optimal execution. This thesis uses reinforcement learning to understand market …

2. Ali, Faiz Mohammad. "Cart Pole System Analysis and Control Using Machine Learning Algorithms." Thesis, 2022. http://dspace.dtu.ac.in:8080/jspui/handle/repository/19298.

Abstract: Balancing the cart-and-pole system, also referred to as the inverted pendulum, is a classical benchmark problem in control theory. It is a prototypical laboratory model of an unstable mechanical system, mainly used to model the control problems of rockets and missiles in the initial stages of their launch. The system is unstable because an external force is required to keep the pendulum vertically upright while the cart moves on a horizontal track. Designing optimal controllers for the cart-and-pole system is a challenging and complex problem, as it is an …

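The cart-pole benchmark discussed here is available off the shelf in OpenAI Gym, so an RL experiment can start from a loop like the following. This is a sketch assuming the pre-0.26 `gym` package; the random policy is a placeholder for the controllers studied in the thesis.

```python
import gym  # pre-0.26 Gym API assumed (reset -> obs, step -> 4-tuple)

env = gym.make("CartPole-v1")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()        # random placeholder policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print("episode return:", total_reward)
```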

Book chapters on the topic "RL ALGORITHMS"

1. Ahlawat, Samit. "Recent RL Algorithms." In Reinforcement Learning for Finance. Apress, 2022. http://dx.doi.org/10.1007/978-1-4842-8835-1_6.

2. Nandy, Abhishek, and Manisha Biswas. "RL Theory and Algorithms." In Reinforcement Learning. Apress, 2017. http://dx.doi.org/10.1007/978-1-4842-3285-9_2.

3. Hahn, Ernst Moritz, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. "Mungojerrie: Linear-Time Objectives in Model-Free Reinforcement Learning." In Tools and Algorithms for the Construction and Analysis of Systems. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-30823-9_27.

Abstract: Mungojerrie is an extensible tool that provides a framework to translate linear-time objectives into rewards for reinforcement learning (RL). The tool provides convergent RL algorithms for stochastic games, reference implementations of existing reward translations for ω-regular objectives, and an internal probabilistic model checker for ω-regular objectives. This functionality is modular and operates on shared data structures, which enables fast development of new translation techniques. Mungojerrie supports finite models specified in PRISM and ω-automata specified in the HOA format, with an integrated command-line interface to external linear temporal logic translators. Mungojerrie is distributed with a set of benchmarks for ω-regular objectives in RL.

4. Ramponi, Giorgia. "Learning in the Presence of Multiple Agents." In Special Topics in Information Technology. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-15374-7_8.

Abstract: Reinforcement learning (RL) has emerged as a powerful tool to solve sequential decision-making problems, where a learning agent interacts with an unknown environment in order to maximize its rewards. Although most RL real-world applications involve multiple agents, the Multi-Agent Reinforcement Learning (MARL) framework is still poorly understood from a theoretical point of view. In this manuscript, we take a step toward solving this problem, providing theoretically sound algorithms for three RL sub-problems with multiple agents: Inverse Reinforcement Learning (IRL), online learning in MARL, and policy optimization in MARL. We start by considering the IRL problem, providing novel algorithms in two different settings: the first considers how to recover and cluster the intentions of a set of agents given demonstrations of near-optimal behavior; the second aims at inferring the reward function optimized by an agent while observing its actual learning process. Then, we consider online learning in MARL. We show how the presence of other agents can increase the hardness of the problem while proposing statistically efficient algorithms in two settings: Non-cooperative Configurable Markov Decision Processes and Turn-based Markov Games. As the third sub-problem, we study MARL from an optimization viewpoint, showing the difficulties that arise from multiple function optimization problems and providing a novel algorithm for this scenario.

5. Metelli, Alberto Maria. "Configurable Environments in Reinforcement Learning: An Overview." In Special Topics in Information Technology. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-85918-3_9.

Abstract: Reinforcement learning (RL) has emerged as an effective approach to address a variety of complex control tasks. In a typical RL problem, an agent interacts with the environment by perceiving observations and performing actions, with the ultimate goal of maximizing the cumulative reward. In the traditional formulation, the environment is assumed to be a fixed entity that cannot be externally controlled. However, there exist several real-world scenarios in which the environment offers the opportunity to configure some of its parameters, with diverse effects on the agent’s learning process. In this contribution, we provide an overview of the main aspects of environment configurability. We start by introducing the formalism of the Configurable Markov Decision Processes (Conf-MDPs) and we illustrate the solution concepts. Then, we revise the algorithms for solving the learning problem in Conf-MDPs. Finally, we present two applications of Conf-MDPs: policy space identification and control frequency adaptation.

6. Gros, Timo P., Holger Hermanns, Jörg Hoffmann, Michaela Klauck, Maximilian A. Köhl, and Verena Wolf. "MoGym: Using Formal Models for Training and Verifying Decision-making Agents." In Computer Aided Verification. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-13188-2_21.

Abstract: MoGym is an integrated toolbox enabling the training and verification of machine-learned decision-making agents based on formal models, for the purpose of sound use in the real world. Given a formal representation of a decision-making problem in the JANI format and a reach-avoid objective, MoGym (a) enables training a decision-making agent with respect to that objective directly on the model using reinforcement learning (RL) techniques, and (b) supports rigorous assessment of the quality of the induced decision-making agent by means of deep statistical model checking (DSMC). MoGym implements the standard interface for training environments established by OpenAI Gym, thereby connecting to the vast body of existing work in the RL community. In return, it makes the large set of existing JANI model checking benchmarks accessible to machine learning research. It thereby contributes an efficient feedback mechanism for improving in particular reinforcement learning algorithms. The connective part is implemented on top of Momba. For the DSMC quality assurance of the learned decision-making agents, a variant of the statistical model checker modes of the Modest Toolset is leveraged, which has been extended by two new resolution strategies for non-determinism when encountered during statistical evaluation.

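MoGym implements the standard training-environment interface established by OpenAI Gym. The sketch below shows what that interface looks like on a made-up two-state MDP; it is a placeholder for illustration, not MoGym code.

```python
# Minimal custom environment following the classic OpenAI Gym interface
# (pre-0.26 API). The two-state MDP here is a made-up placeholder.
import gym
from gym import spaces

class TwoStateEnv(gym.Env):
    def __init__(self):
        self.observation_space = spaces.Discrete(2)
        self.action_space = spaces.Discrete(2)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves toward the goal state, action 0 stays put
        self.state = min(1, self.state + action)
        reward = 1.0 if self.state == 1 else 0.0
        done = self.state == 1
        return self.state, reward, done, {}
```
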
7. Du, Huaiyu, and Rafał Jóźwiak. "Representation of Observations in Reinforcement Learning for Playing Arcade Fighting Game." In Digital Interaction and Machine Intelligence. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37649-8_5.

Abstract: Reinforcement learning (RL) is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning algorithms have become very popular in simple computer games and games like chess and Go. However, playing classical arcade fighting games would be challenging because of the complexity of the command system (the character makes moves according to the sequence of input) and combo system. In this paper, the creation of a game environment of The King of Fighters ’97 (KOF ’97), which implements the OpenAI Gym env interface, is described. Based on the characteristics of the game, an innovative approach to represent the observations from the last few steps has been proposed, which guarantees the preservation of the Markov property. The observations are coded using the "one-hot encoding" technique to form a binary vector, while the sequence of stacked vectors from successive steps creates a binary image. This image encodes the character’s input and behavioural pattern, which are then retrieved and recognized by the CNN network. A network structure based on the Advantage Actor-Critic network was proposed. In the experimental verification, the RL agent performing basic combos and complex moves (including the so-called "desperation moves") was able to defeat characters using the highest level of AI built into the game.

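The encoding described here is easy to picture: one-hot encode each discrete input and stack the last few steps into a binary image. A minimal sketch follows, with an assumed input vocabulary size and history length (the paper's actual dimensions are not given in this excerpt).

```python
import numpy as np

N_BUTTONS, HISTORY = 16, 8   # assumed input vocabulary and stack depth

def one_hot(button: int) -> np.ndarray:
    """One-hot encode a single discrete input."""
    v = np.zeros(N_BUTTONS, dtype=np.uint8)
    v[button] = 1
    return v

def encode_history(last_buttons: list[int]) -> np.ndarray:
    """Stack the one-hot vectors of the last HISTORY inputs into a
    HISTORY x N_BUTTONS binary image for a CNN, so the observation
    retains the recent input sequence (preserving the Markov property)."""
    rows = [one_hot(b) for b in last_buttons[-HISTORY:]]
    while len(rows) < HISTORY:               # pad short histories with zeros
        rows.insert(0, np.zeros(N_BUTTONS, dtype=np.uint8))
    return np.stack(rows)
```
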
8. Bugaenko, Andrey A. "Replacing the Reinforcement Learning (RL) to the Auto Reinforcement Learning (AutoRL) Algorithms to Find the Optimal Structure of Business Processes in the Bank." In Software Engineering Application in Informatics. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-90318-3_2.

9. Wang, Dasong, and Roland Snooks. "Artificial Intuitions of Generative Design: An Approach Based on Reinforcement Learning." In Proceedings of the 2020 DigitalFUTURES. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-33-4400-6_18.

Abstract: This paper proposes a Reinforcement Learning (RL) based design approach that augments existing algorithmic generative processes through the emergence of a form of artificial design intuition. The research presented in the paper is embedded within a highly speculative research project, Artificial Agency, exploring the operation of Machine Learning (ML) in generative design and digital fabrication. After describing the inherent limitations of contemporary generative design processes, the paper compares the three fundamental types of machine learning frameworks in terms of their characteristics and potential impact on generative design. A theoretical framework is defined to demonstrate the methodology of integrating RL with existing generative design procedures, which is further explained with a Random Walk based experimental design example. The paper includes detailed RL definitions as well as critical reflections on its impact and the effects of its implementation. The proposed artificial intuition within this generative approach is currently being further developed through a series of ongoing and proposed research trajectories noted in the conclusion. The ambition of this research is to deepen the integration of intention with machine learning in generative design.

10. Zhang, Sizhe, Haitao Wang, Jian Wen, and Hejun Wu. "A Deep RL Algorithm for Location Optimization of Regional Express Distribution Center Using IoT Data." In Lecture Notes in Electrical Engineering. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-0416-7_38.

Conference papers on the topic "RL ALGORITHMS"

1. Simão, Thiago D. "Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/919.

Abstract: Reinforcement learning (RL) deals with problems that can be modeled as a Markov decision process (MDP) where the transition function is unknown. In situations where an arbitrary policy π is already in execution and the experiences with the environment were recorded in a batch D, an RL algorithm can use D to compute a new policy π′. However, the policy computed by traditional RL algorithms might have worse performance compared to π. Our goal is to develop safe RL algorithms, where the agent has high confidence that the performance of π′ is better than that of π, given D. …

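The safe batch RL setting described here rests on off-policy evaluation from the batch D. Below is a crude sketch of that ingredient, using per-episode importance sampling and a normal-approximation lower bound; the paper's factored-MDP machinery and tighter concentration bounds are not reproduced, and all names here are illustrative.

```python
import numpy as np

def is_return(episode, pi_new, pi_behavior, gamma=0.99):
    """Per-episode importance-sampled return. `episode` is a list of
    (state, action, reward); policies map (state, action) -> probability.
    Assumes the behavior policy has full support over taken actions."""
    weight, ret = 1.0, 0.0
    for t, (s, a, r) in enumerate(episode):
        weight *= pi_new(s, a) / pi_behavior(s, a)
        ret += (gamma ** t) * r
    return weight * ret

def accept_new_policy(D, pi_new, pi_b, baseline):
    """Accept pi_new only if a (crude, normal-approximation) lower
    confidence bound on its estimated return beats the baseline."""
    est = np.array([is_return(ep, pi_new, pi_b) for ep in D])
    lcb = est.mean() - 1.96 * est.std(ddof=1) / np.sqrt(len(est))
    return lcb > baseline
```
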
2. Chrabąszcz, Patryk, Ilya Loshchilov, and Frank Hutter. "Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/197.

Abstract: Evolution strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep learning problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that work belonged to the specialized class of natural evolution strategies (which resemble approximate gradient RL algorithms, such as REINFORCE), we demonstrate that even a very basic canonical ES algorithm can achieve the same or even better performance. This success of a basic ES algorithm suggests that the state of the art can …

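A canonical ES of the kind benchmarked in this paper can be summarized in a few lines: perturb the mean parameter vector, rank offspring by fitness, and recombine the best μ with log-decaying weights. The sketch below is a rough reconstruction with assumed population sizes and a fixed step size σ.

```python
import numpy as np

def canonical_es(fitness, dim, iters=100, lam=16, mu=8, sigma=0.1, seed=0):
    """Rough (mu, lambda)-ES sketch with weighted recombination and a
    fixed step size; population sizes and sigma are illustrative."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                               # log-decaying recombination weights
    for _ in range(iters):
        eps = rng.standard_normal((lam, dim))  # perturbation directions
        scores = np.array([fitness(theta + sigma * e) for e in eps])
        elite = eps[np.argsort(-scores)[:mu]]  # keep the best mu offspring
        theta = theta + sigma * (w @ elite)    # weighted recombination step
    return theta

# Example: maximize a simple concave objective with optimum at all-ones
best = canonical_es(lambda x: -np.sum((x - 1.0) ** 2), dim=5)
```
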
3. Arusoaie, Andrei, David Nowak, Vlad Rusu, and Dorel Lucanu. "A Certified Procedure for RL Verification." In 2017 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). IEEE, 2017. http://dx.doi.org/10.1109/synasc.2017.00031.

4. Gajane, Pratik, Peter Auer, and Ronald Ortner. "Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/413.

Abstract: We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies to reach all the states that are reachable within a given number of steps (in expectation) from a starting state. We introduce a novel meta-algorithm which can use any online reinforcement learning algorithm (with appropriate regret guarantees) as a black-box. Our algorithm demonstrates a method for transforming the output of online algorithms to a batch setting. We prove an upper bound on the sample complexity of our …

5. Lin, Zichuan, Tianqi Zhao, Guangwen Yang, and Lintao Zhang. "Episodic Memory Deep Q-Networks." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/337.

Abstract: Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite the success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interactions with the environments to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch on good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to …

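Based only on this abstract, the core idea of leveraging episodic memory can be sketched as blending the one-step TD target with the best return remembered for a state-action pair. The mixing rule below is an assumption for illustration, not the authors' exact scheme.

```python
import numpy as np

memory = {}   # (state_key, action) -> best Monte-Carlo return seen so far

def remember(s_key, a, episodic_return):
    """Keep the best return ever observed for this state-action pair."""
    key = (s_key, a)
    memory[key] = max(memory.get(key, -np.inf), episodic_return)

def target(r, q_next_max, s_key, a, gamma=0.99, beta=0.5):
    """Blend the usual one-step TD target with the best remembered return,
    so the agent can latch on to good past experiences quickly."""
    td = r + gamma * q_next_max
    em = memory.get((s_key, a), td)
    return (1 - beta) * td + beta * max(td, em)
```
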
6. Martin, Jarryd, Suraj Narayanan S., Tom Everitt, and Marcus Hutter. "Count-Based Exploration in Feature Space for Reinforcement Learning." In Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/344.

Abstract: We introduce a new count-based optimistic exploration algorithm for reinforcement learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to …

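The count-based idea generalizes visit counts from raw states to features, so that similar states share exploration statistics. A minimal sketch follows; the rounding-based feature map and the coefficient are assumptions, and the paper's construction is more general.

```python
import numpy as np
from collections import defaultdict

feature_counts = defaultdict(int)
BETA = 0.1   # exploration coefficient (assumed)

def features(state) -> tuple:
    """Coarse feature mapping so that similar states share counts
    (simple rounding here; the paper generalises this idea)."""
    return tuple(np.round(np.asarray(state, dtype=float), 1))

def bonus(state) -> float:
    """Optimistic exploration bonus that decays with feature visits."""
    phi = features(state)
    feature_counts[phi] += 1
    return BETA / np.sqrt(feature_counts[phi])
```
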
7. Da Silva, Felipe Leno, and Anna Helena Reali Costa. "Methods and Algorithms for Knowledge Reuse in Multiagent Reinforcement Learning." In Concurso de Teses e Dissertações da SBC. Sociedade Brasileira de Computação - SBC, 2020. http://dx.doi.org/10.5753/ctd.2020.11360.

Abstract: Reinforcement learning (RL) is a powerful tool that has been used to solve increasingly complex tasks. RL operates through repeated interactions of the learning agent with the environment, via trial and error. However, this learning process is extremely slow, requiring many interactions. In this thesis, we leverage previous knowledge so as to accelerate learning in multiagent RL problems. We propose knowledge reuse both from previous tasks and from other agents. Several flexible methods are introduced so that each of these two types of knowledge reuse is possible. This thesis adds important …

8. Gao, Yang, Christian M. Meyer, Mohsen Mesgar, and Iryna Gurevych. "Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/326.

Abstract: Document summarisation can be formulated as a sequential decision-making problem, which can be solved by reinforcement learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) …

9. Zhao, Enmin, Shihong Deng, Yifan Zang, Yongxin Kang, Kai Li, and Junliang Xing. "Potential Driven Reinforcement Learning for Hard Exploration Tasks." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/290.

Abstract: Experience replay plays a crucial role in reinforcement learning (RL), enabling the agent to remember and reuse experience from the past. Most previous methods sample experience transitions using simple heuristics, like sampling uniformly or prioritizing the good ones. Since humans can learn from both good and bad experiences, more sophisticated experience replay algorithms need to be developed. Inspired by the potential energy in physics, this work introduces the artificial potential field into experience replay and develops Potentialized Experience Replay (PotER) as a new and effective …

10. Sarafian, Elad, Aviv Tamar, and Sarit Kraus. "Constrained Policy Improvement for Efficient Reinforcement Learning." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/396.

Abstract: We propose a policy improvement algorithm for reinforcement learning (RL) termed Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the Q-function. Such errors are common in RL when learning the Q-value from finite experience data. Greedy policies, or even constrained policy optimization algorithms that ignore these errors, may suffer from an improvement penalty (i.e., a policy impairment). To reduce the penalty, the idea of RBI is to attenuate rapid policy changes to actions that were rarely sampled. This approach is shown to avoid catastrophic …

Reports on the topic "RL ALGORITHMS"

1. A Decision-Making Method for Connected Autonomous Driving Based on Reinforcement Learning. SAE International, 2020. http://dx.doi.org/10.4271/2020-01-5154.

Abstract: At present, with the development of Intelligent Vehicle Infrastructure Cooperative Systems (IVICS), decision-making for automated vehicles based on connected environment conditions has attracted more attention. Reliability, efficiency, and generalization performance are the basic requirements for a vehicle decision-making system. Therefore, this paper proposes a decision-making method for connected autonomous driving based on the Wasserstein Generative Adversarial Nets-Deep Deterministic Policy Gradient (WGAIL-DDPG) algorithm, in which the key components of the reinforcement learning (RL) model …