
Journal articles on the topic 'Factored reinforcement learning'



Consult the top 50 journal articles for your research on the topic 'Factored reinforcement learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Wu, Bo, Yan Peng Feng, and Hong Yan Zheng. "A Model-Based Factored Bayesian Reinforcement Learning Approach." Applied Mechanics and Materials 513-517 (February 2014): 1092–95. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.1092.

Abstract:
Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the exponentially growing number of learning parameters is the main impediment to online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. Firstly, we exploit a factored representation to describe the states to reduce the size of learning parameters, and adopt a Bayesian inference method to learn the unknown structure …
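For readers new to the technique: in a factored MDP each next-state variable depends only on a small set of parent variables, so a Bayesian model learner needs one Dirichlet posterior per factor instead of a table over the whole joint state. The sketch below illustrates that general mechanism; the variables, parent sets, and Thompson-style sampling are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

N_VALUES = 3                       # each state variable takes values {0, 1, 2}
PARENTS = {0: (0,), 1: (0, 1)}     # hypothetical parent sets per state variable

# Dirichlet pseudo-counts per factor: (parent values, action) -> count vector
counts = {v: {} for v in PARENTS}

def update(state, action, next_state):
    """Record one observed transition, one factor at a time."""
    for v, parents in PARENTS.items():
        key = (tuple(state[p] for p in parents), action)
        vec = counts[v].setdefault(key, np.ones(N_VALUES))  # Dirichlet(1,...,1) prior
        vec[next_state[v]] += 1.0

def sample_factor(v, state, action, rng):
    """Posterior sample of P(next value of variable v | parents, action)."""
    key = (tuple(state[p] for p in PARENTS[v]), action)
    return rng.dirichlet(counts[v].get(key, np.ones(N_VALUES)))

rng = np.random.default_rng(0)
update(state=(0, 1), action=0, next_state=(1, 2))
print(sample_factor(0, (0, 1), 0, rng))
```

The parameter count now grows with the sizes of the parent sets rather than with the full joint state space, which is the reduction the abstract refers to.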
2

Li, Chao, Yupeng Zhang, Jianqi Wang, et al. "Optimistic Value Instructors for Cooperative Multi-Agent Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (2024): 17453–60. http://dx.doi.org/10.1609/aaai.v38i16.29694.

Abstract:
In cooperative multi-agent reinforcement learning, decentralized agents hold the promise of overcoming the combinatorial explosion of joint action space and enabling greater scalability. However, they are susceptible to a game-theoretic pathology called relative overgeneralization that shadows the optimal joint action. Although recent value-decomposition algorithms guide decentralized agents by learning a factored global action value function, the representational limitation and the inaccurate sampling of optimal joint actions during the learning process make this problem persist. To address this …
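The "factored global action value function" mentioned here is, in value-decomposition methods such as VDN, an additive combination of per-agent utilities, so the greedy joint action decomposes into independent per-agent argmaxes. A minimal sketch of that factorisation follows; the random utilities stand in for learned networks, and the paper's instructor mechanism is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_actions = 3, 4

# Per-agent utilities Q_i(a_i) for one observation snapshot; random values
# stand in for the outputs of each agent's learned network.
per_agent_q = rng.normal(size=(n_agents, n_actions))

# Additive factorisation: Q_tot(a_1, ..., a_n) = sum_i Q_i(a_i).
# Its joint argmax splits into per-agent argmaxes, avoiding a search
# over n_actions ** n_agents joint actions.
greedy_joint = per_agent_q.argmax(axis=1)
q_tot = per_agent_q[np.arange(n_agents), greedy_joint].sum()
print(greedy_joint, q_tot)
```

Relative overgeneralization persists precisely because such additive forms cannot represent every joint payoff structure, which is the representational limitation the abstract points to.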
3

Kveton, Branislav, and Georgios Theocharous. "Structured Kernel-Based Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (2013): 569–75. http://dx.doi.org/10.1609/aaai.v27i1.8669.

Abstract:
Kernel-based reinforcement learning (KBRL) is a popular approach to learning non-parametric value function approximations. In this paper, we present structured KBRL, a paradigm for kernel-based RL that allows for modeling independencies in the transition and reward models of problems. Real-world problems often exhibit this structure and can be solved more efficiently when it is modeled. We make three contributions. First, we motivate our work, define a structured backup operator, and prove that it is a contraction. Second, we show how to evaluate our operator efficiently. Our analysis reveals …
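As background, unstructured KBRL builds a finite model from sampled transitions and iterates a kernel-weighted Bellman backup, which is a contraction; the paper's structured operator factors this backup across independent state components. Below is a plain, unstructured sketch with a Gaussian kernel and synthetic one-dimensional transitions (all data and constants are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
gamma, bandwidth = 0.95, 0.3
n_samples, n_actions = 40, 2

# Hypothetical sampled transitions (s_i, r_i, s'_i) for each action.
S = rng.uniform(0, 1, size=(n_actions, n_samples))
R = -np.abs(S - 0.5)                                   # toy reward: stay near 0.5
S_next = np.clip(S + rng.normal(scale=0.1, size=S.shape), 0, 1)

def weights(x, centers):
    """Normalised Gaussian kernel weights of query point x over sample states."""
    w = np.exp(-((x - centers) ** 2) / (2 * bandwidth ** 2))
    return w / w.sum()

# KBRL's finite model: the value function only needs to be known at the
# sampled successor states. Iterate the kernel-weighted Bellman backup.
V = np.zeros((n_actions, n_samples))                   # V at each s'_i per action
for _ in range(200):
    V_new = np.empty_like(V)
    for a in range(n_actions):
        for i in range(n_samples):
            x = S_next[a, i]
            q = [weights(x, S[b]) @ (R[b] + gamma * V[b]) for b in range(n_actions)]
            V_new[a, i] = max(q)
    V = V_new
print(V.round(2))
```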
4

Simão, Thiago D., and Matthijs T. J. Spaan. "Safe Policy Improvement with Baseline Bootstrapping in Factored Environments." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4967–74. http://dx.doi.org/10.1609/aaai.v33i01.33014967.

Abstract:
We present a novel safe reinforcement learning algorithm that exploits the factored dynamics of the environment to become less conservative. We focus on problem settings in which a policy is already running and the interaction with the environment is limited. In order to safely deploy an updated policy, it is necessary to provide a confidence level regarding its expected performance. However, algorithms for safe policy improvement might require a large number of past experiences to become confident enough to change the agent’s behavior. Factored reinforcement learning, on the other hand, is known …
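The baseline-bootstrapping rule behind such algorithms fits in a few lines: on state-action pairs observed fewer than n_min times, the new policy copies the baseline; elsewhere it may act greedily. The tabular sketch below is in the spirit of SPIBB, with made-up counts and value estimates; the paper's contribution is that factored models reach the required confidence from far fewer samples.

```python
import numpy as np

def spibb_style_policy(q, counts, baseline, n_min):
    """Safe improvement: keep the baseline's probability mass on rare (s, a) pairs.

    q, counts, baseline have shape (n_states, n_actions); n_min is the threshold."""
    policy = np.zeros_like(baseline)
    for s in range(q.shape[0]):
        rare = counts[s] < n_min
        policy[s, rare] = baseline[s, rare]        # bootstrap the baseline here
        if (~rare).any():                          # improve where we are confident
            best = np.flatnonzero(~rare)[q[s, ~rare].argmax()]
            policy[s, best] += 1.0 - policy[s].sum()
        else:
            policy[s] = baseline[s]                # nothing well-estimated: stay put
    return policy

rng = np.random.default_rng(3)
n_s, n_a = 4, 3
print(spibb_style_policy(rng.normal(size=(n_s, n_a)),
                         rng.integers(0, 20, size=(n_s, n_a)),
                         rng.dirichlet(np.ones(n_a), size=n_s),
                         n_min=5))
```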
5

Truong, Van Binh, and Long Bao Le. "Electric vehicle charging design: The factored action based reinforcement learning approach." Applied Energy 359 (April 2024): 122737. http://dx.doi.org/10.1016/j.apenergy.2024.122737.

6

Simm, Jaak, Masashi Sugiyama, and Hirotaka Hachiya. "Multi-Task Approach to Reinforcement Learning for Factored-State Markov Decision Problems." IEICE Transactions on Information and Systems E95.D, no. 10 (2012): 2426–37. http://dx.doi.org/10.1587/transinf.e95.d.2426.

7

Du, Juan, Anshuang Yu, Hao Zhou, Qianli Jiang, and Xueying Bai. "Research on Integrated Control Strategy for Highway Merging Bottlenecks Based on Collaborative Multi-Agent Reinforcement Learning." Applied Sciences 15, no. 2 (2025): 836. https://doi.org/10.3390/app15020836.

Abstract:
The merging behavior of vehicles at entry ramps and the speed differences between ramps and mainline traffic cause merging traffic bottlenecks. Current research, primarily focusing on single traffic control strategies, fails to achieve the desired outcomes. To address this issue, this paper explores an integrated control strategy combining Variable Speed Limits (VSL) and Lane Change Control (LCC) to optimize traffic efficiency in ramp merging areas. For scenarios involving multiple ramp merges, a multi-agent reinforcement learning approach is introduced to optimize control strategies in these …
8

Wang, Zizhao, Caroline Wang, Xuesu Xiao, Yuke Zhu, and Peter Stone. "Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15778–86. http://dx.doi.org/10.1609/aaai.v38i14.29507.

Abstract:
Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is to learn state abstractions, which only keep the necessary variables for learning the tasks at hand. This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. CBM leverages and improves …
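Once causal parent sets for the dynamics and the reward are known, the minimal task-specific abstraction is simply the set of state variables from which the reward is reachable in the learned graph. The reachability sketch below uses a hypothetical parent structure; CBM learns these parents from data, whereas here they are assumed.

```python
# Hypothetical learned structure: parents of each variable in the one-step
# dynamics, plus the parents of the reward. The minimal abstraction keeps
# exactly the ancestors of the reward under these edges.
dyn_parents = {
    "arm_pos": {"arm_pos", "action"},
    "obj_pos": {"obj_pos", "arm_pos"},
    "light":   {"light"},          # in the state, but irrelevant to this task
}
reward_parents = {"obj_pos"}

def minimal_abstraction(dyn_parents, reward_parents):
    keep, frontier = set(), set(reward_parents)
    while frontier:
        v = frontier.pop()
        if v in keep or v == "action":
            continue
        keep.add(v)
        frontier |= dyn_parents.get(v, set())
    return keep

print(minimal_abstraction(dyn_parents, reward_parents))  # {'obj_pos', 'arm_pos'}
```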
9

Mohamad Hafiz Abu Bakar, Abu Ubaidah bin Shamsudin, Ruzairi Abdul Rahim, Zubair Adil Soomro, and Andi Adrianshah. "Comparison Method Q-Learning and SARSA for Simulation of Drone Controller using Reinforcement Learning." Journal of Advanced Research in Applied Sciences and Engineering Technology 30, no. 3 (2023): 69–78. http://dx.doi.org/10.37934/araset.30.3.6978.

Abstract:
Nowadays, the advancement of drones is also factored into the development of a world surrounded by technologies. One of the aspects emphasized here is the difficulty of controlling the drone, as the system developed is still under the users' full control. Reinforcement Learning is used to enable the system to operate automatically, so the drone will learn its next movement from the interaction between the agent and the environment. Q-Learning and State-Action-Reward-State-Action (SARSA) are used in this study, and the comparison of results involves both the performance …
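The two algorithms compared here differ only in their bootstrap term: Q-learning backs up the greedy maximum over next actions (off-policy), while SARSA backs up the action the policy actually takes next (on-policy). A minimal tabular sketch of both updates, with toy sizes in place of the drone simulator:

```python
import numpy as np

alpha, gamma = 0.1, 0.99

def q_learning_update(Q, s, a, r, s_next):
    # Off-policy: bootstrap on the best next action, whatever gets executed.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    # On-policy: bootstrap on the action the current policy actually chose.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

Q = np.zeros((5, 2))               # 5 states, 2 actions (toy sizes)
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
print(Q[0])
```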
10

Kong, Minseok, and Jungmin So. "Empirical Analysis of Automated Stock Trading Using Deep Reinforcement Learning." Applied Sciences 13, no. 1 (2023): 633. http://dx.doi.org/10.3390/app13010633.

Abstract:
There are several automated stock trading programs using reinforcement learning, one of which is an ensemble strategy. The main idea of the ensemble strategy is to train DRL agents and make an ensemble with three different actor–critic algorithms: Advantage Actor–Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO). This novel idea was the concept mainly used in this paper. However, we did not stop there; we refined the automated stock trading in two areas. First, we made another DRL-based ensemble and employed it as a new trading agent. We named …
11

Mutti, Mirco, Riccardo De Santi, Emanuele Rossi, Juan Felipe Calderon, Michael Bronstein, and Marcello Restelli. "Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (2023): 9251–59. http://dx.doi.org/10.1609/aaai.v37i8.26109.

Abstract:
In the sequential decision making setting, an agent aims to achieve systematic generalization over a large, possibly infinite, set of environments. Such environments are modeled as discrete Markov decision processes with both states and actions represented through a feature vector. The underlying structure of the environments allows the transition dynamics to be factored into two components: one that is environment-specific and another that is shared. Consider a set of environments that share the laws of motion as an example. In this setting, the agent can take a finite amount of reward-free interactions …
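The factored dynamics described here can be pictured as a transition function composed of a component shared by every environment in the family and a small environment-specific component; once the shared part is identified from reward-free interaction, only the small part remains to be learned per environment. The linear sketch below is an assumed illustration of that split (shapes, matrices, and the additive environment term are all hypothetical).

```python
import numpy as np

rng = np.random.default_rng(4)
d_state, d_action = 3, 2

# Shared law of motion, common to every environment in the family.
A_shared = np.eye(d_state) + 0.1 * rng.normal(size=(d_state, d_state))
B_shared = 0.1 * rng.normal(size=(d_state, d_action))

def make_env(seed):
    """Environments differ only in a small specific component (a drift here)."""
    b_env = np.random.default_rng(seed).normal(scale=0.05, size=d_state)
    def step(s, a):
        return A_shared @ s + B_shared @ a + b_env   # shared part + specific part
    return step

env1, env2 = make_env(10), make_env(11)
s, a = np.zeros(d_state), np.ones(d_action)
print(env1(s, a), env2(s, a))   # same laws of motion, different offsets
```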
12

Sui, Dong, Chenyu Ma, and Chunjie Wei. "Tactical Conflict Solver Assisting Air Traffic Controllers Using Deep Reinforcement Learning." Aerospace 10, no. 2 (2023): 182. http://dx.doi.org/10.3390/aerospace10020182.

Abstract:
To assist air traffic controllers (ATCOs) in resolving tactical conflicts, this paper proposes a conflict detection and resolution mechanism for handling continuous traffic flow by adopting finite discrete actions to resolve conflicts. The tactical conflict solver (TCS) was developed based on deep reinforcement learning (DRL) to train a TCS agent with the actor–critic using a Kronecker-factored trust region. The agent’s actions are determined by the ATCOs’ instructions, such as altitude, speed, and heading adjustments. The reward function is designed in accordance with air traffic control regulations …
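The "Kronecker-factored trust region" refers to ACKTR's use of K-FAC: a layer's Fisher matrix is approximated by the Kronecker product of the second moments of its inputs and of its output gradients, so the natural-gradient step reduces to two small matrix inverses. A numpy sketch of that step for one linear layer (batch statistics, damping, and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
d_in, d_out, batch, damping = 8, 4, 256, 1e-2

# Statistics gathered during backprop for one linear layer:
acts = rng.normal(size=(batch, d_in))         # layer inputs a
grads_out = rng.normal(size=(batch, d_out))   # gradients w.r.t. layer outputs g
grad_W = grads_out.T @ acts / batch           # ordinary gradient, (d_out, d_in)

# K-FAC: Fisher ~= kron(E[a aT], E[g gT]) up to vec convention, so the
# natural gradient is G^-1 @ grad_W @ A^-1 with two small inverses.
A = acts.T @ acts / batch + damping * np.eye(d_in)
G = grads_out.T @ grads_out / batch + damping * np.eye(d_out)
nat_grad = np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

# ACKTR additionally rescales this step to respect a KL trust region.
print(nat_grad.shape)   # (4, 8)
```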
13

Hao, Zheng, Haowei Zhang, and Yipu Zhang. "Stock Portfolio Management by Using Fuzzy Ensemble Deep Reinforcement Learning Algorithm." Journal of Risk and Financial Management 16, no. 3 (2023): 201. http://dx.doi.org/10.3390/jrfm16030201.

Abstract:
The research objective of this article is to train a computer (agent) with market information data so it can learn trading strategies and beat the market index in stock trading without having to make any prediction on market moves. The approach assumes no trading knowledge, so the agent will only learn from conducting trading with historical data. In this work, we address this task by considering Reinforcement Learning (RL) algorithms for stock portfolio management. We first generate a three-dimension fuzzy vector to describe the current trend for each stock. Then the fuzzy terms, along with other …
14

Chu, Yunfei, Zhinong Wei, Guoqiang Sun, Haixiang Zang, Sheng Chen, and Yizhou Zhou. "Optimal home energy management strategy: A reinforcement learning method with actor-critic using Kronecker-factored trust region." Electric Power Systems Research 212 (November 2022): 108617. http://dx.doi.org/10.1016/j.epsr.2022.108617.

15

Abdulhai, Marwa, Dong-Ki Kim, Matthew Riemer, Miao Liu, Gerald Tesauro, and Jonathan P. How. "Context-Specific Representation Abstraction for Deep Option Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (2022): 5959–67. http://dx.doi.org/10.1609/aaai.v36i6.20541.

Abstract:
Hierarchical reinforcement learning has focused on discovering temporally extended actions, such as options, that can provide benefits in problems requiring extensive exploration. One promising approach that learns these options end-to-end is the option-critic (OC) framework. We examine and show in this paper that OC does not decompose a problem into simpler sub-problems, but instead increases the size of the search over policy space with each option considering the entire state space during learning. This issue can result in practical limitations of this method, including sample-inefficient learning …
16

Li, Hengjie, Jianghao Zhu, Yun Zhou, Qi Feng, and Donghan Feng. "Charging Station Management Strategy for Returns Maximization via Improved TD3 Deep Reinforcement Learning." International Transactions on Electrical Energy Systems 2022 (December 15, 2022): 1–14. http://dx.doi.org/10.1155/2022/6854620.

Abstract:
Maximizing the return on electric vehicle charging station (EVCS) operation helps to expand the EVCS, thus expanding the EV (electric vehicle) stock and better addressing climate change. However, in the face of dynamic regulation scenarios with large data, multiple variables, and low time scales, the existing regulation strategies aiming at maximizing EVCS returns often fail to meet the demand. To handle increasingly complex regulation scenarios, a deep reinforcement learning (DRL) algorithm based on the improved twin delayed deep deterministic policy gradient (TD3) is used to construct …
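TD3's improvements over DDPG fit in a few lines: two critics combined by a minimum over their targets (to curb overestimation), smoothing noise on the target policy, and actor updates delayed relative to critic updates. Below is a sketch of the target computation with stand-in callables for the target networks; the paper's further refinements are not reproduced.

```python
import numpy as np

gamma, noise_std, noise_clip, act_limit = 0.99, 0.2, 0.5, 1.0

def td3_target(r, s_next, done, actor_targ, q1_targ, q2_targ, rng):
    """Clipped double-Q target with target-policy smoothing (TD3)."""
    eps = np.clip(rng.normal(scale=noise_std), -noise_clip, noise_clip)
    a_next = np.clip(actor_targ(s_next) + eps, -act_limit, act_limit)
    q_next = min(q1_targ(s_next, a_next), q2_targ(s_next, a_next))  # curb bias
    return r + gamma * (1.0 - done) * q_next

# Toy stand-ins for the three target networks:
rng = np.random.default_rng(6)
print(td3_target(r=1.0, s_next=0.3, done=0.0,
                 actor_targ=lambda s: 0.5 * s,
                 q1_targ=lambda s, a: s + a,
                 q2_targ=lambda s, a: s - a,
                 rng=rng))
```

The third trick, updating the actor less often than the critics, lives in the training loop rather than in the target itself.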
17

Gavane, Vaibhav. "A Measure of Real-Time Intelligence." Journal of Artificial General Intelligence 4, no. 1 (2013): 31–48. http://dx.doi.org/10.2478/jagi-2013-0003.

Abstract:
We propose a new measure of intelligence for general reinforcement learning agents, based on the notion that an agent’s environment can change at any step of execution of the agent. That is, an agent is considered to be interacting with its environment in real-time. In this sense, the resulting intelligence measure is more general than the universal intelligence measure (Legg and Hutter, 2007) and the anytime universal intelligence test (Hernández-Orallo and Dowe, 2010). A major advantage of the measure is that an agent’s computational complexity is factored into the measure in a natural way …
18

Street, Charlie, Masoumeh Mansouri, and Bruno Lacerda. "Formal Modelling for Multi-Robot Systems Under Uncertainty." Current Robotics Reports 4, no. 3 (2023): 55–64. https://doi.org/10.5281/zenodo.8420911.

Abstract:
Purpose of Review: To effectively synthesise and analyse multi-robot behaviour, we require formal task-level models which accurately capture multi-robot execution. In this paper, we review modelling formalisms for multi-robot systems under uncertainty and discuss how they can be used for planning, reinforcement learning, model checking, and simulation. Recent Findings: Recent work has investigated models which more accurately capture multi-robot execution by considering different forms of uncertainty, such as temporal uncertainty and partial observability, and …
19

Yedukondalu, Gangolu, Yasmeen Yasmeen, G. Vinoda Reddy, et al. "Framework for Virtualized Network Functions (VNFs) in Cloud of Things Based on Network Traffic Services." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 11s (2023): 38–48. http://dx.doi.org/10.17762/ijritcc.v11i11s.8068.

Abstract:
The cloud of things (CoT), which combines the Internet of Things (IoT) and cloud computing, may offer Virtualized Network Functions (VNFs) for IoT devices on a dynamic basis based on service-specific requirements. Although the provisioning of VNFs in CoT is described as an online decision-making problem, most widely used techniques primarily focus on defining the environment using simple models in order to discover the optimum solution. This leads to inefficient and coarse-grained provisioning since the Quality of Service (QoS) requirements for different types of CoT services are not considered …
20

Li, Guangliang, Randy Gomez, Keisuke Nakamura, and Bo He. "Human-Centered Reinforcement Learning: A Survey." IEEE Transactions on Human-Machine Systems 49, no. 4 (2019): 337–49. http://dx.doi.org/10.1109/thms.2019.2912447.

21

Li, Zhuoran, Chao Zeng, Zhen Deng, Qinling Xu, Bingwei He, and Jianwei Zhang. "Learning Variable Impedance Control for Robotic Massage With Deep Reinforcement Learning: A Novel Learning Framework." IEEE Systems, Man, and Cybernetics Magazine 10, no. 1 (2024): 17–27. http://dx.doi.org/10.1109/msmc.2022.3231416.

22

White, Jack, Tatiana Kameneva, and Chris McCarthy. "Vision Processing for Assistive Vision: A Deep Reinforcement Learning Approach." IEEE Transactions on Human-Machine Systems 52, no. 1 (2022): 123–33. http://dx.doi.org/10.1109/thms.2021.3121661.

23

Chihara, Takanori, and Jiro Sakamoto. "Generating deceleration behavior of automatic driving by reinforcement learning that reflects passenger discomfort." International Journal of Industrial Ergonomics 91 (September 2022): 103343. http://dx.doi.org/10.1016/j.ergon.2022.103343.

24

Wang, Zhe, Helai Huang, Jinjun Tang, Xianwei Meng, and Lipeng Hu. "Velocity control in car-following behavior with autonomous vehicles using reinforcement learning." Accident Analysis & Prevention 174 (September 2022): 106729. http://dx.doi.org/10.1016/j.aap.2022.106729.

25

Salehi, V., T. T. Tran, B. Veitch, and D. Smith. "A reinforcement learning development of the FRAM for functional reward-based assessments of complex systems performance." International Journal of Industrial Ergonomics 88 (March 2022): 103271. http://dx.doi.org/10.1016/j.ergon.2022.103271.

26

Matarese, Marco, Alessandra Sciutti, Francesco Rea, and Silvia Rossi. "Toward Robots’ Behavioral Transparency of Temporal Difference Reinforcement Learning With a Human Teacher." IEEE Transactions on Human-Machine Systems 51, no. 6 (2021): 578–89. http://dx.doi.org/10.1109/thms.2021.3116119.

27

Roy, Ananya, Moinul Hossain, and Yasunori Muromachi. "A deep reinforcement learning-based intelligent intervention framework for real-time proactive road safety management." Accident Analysis & Prevention 165 (February 2022): 106512. http://dx.doi.org/10.1016/j.aap.2021.106512.

28

Ashutosh, Yadav, and Archana. "Reinforcement Learning with Variable Fractional Order Approach for MPPT Control of PV Systems for the Real Operating Climatic Condition." International Journal of Recent Technology and Engineering (IJRTE) 10, no. 1 (2021): 44–53. https://doi.org/10.35940/ijrte.A5631.0510121.

Abstract:
The design of a maximum power point tracking (MPPT) controller is an integral part of the PV array system to ensure a continuous supply of energy in dynamic environmental conditions. The most challenging part here is to design a model that can track the maximum point irrespective of variations in environmental conditions and its parametric variations. The model designed in this article combats both challenges, as it is based on reinforcement learning with fractional order. The application of Deep Q-learning makes the model parameter-free, and once trained the model can be implanted in a …
29

Gong, Yaobang, Mohamed Abdel-Aty, Jinghui Yuan, and Qing Cai. "Multi-Objective reinforcement learning approach for improving safety at intersections with adaptive traffic signal control." Accident Analysis & Prevention 144 (September 2020): 105655. http://dx.doi.org/10.1016/j.aap.2020.105655.

30

Yang, Kui, Mohammed Quddus, and Constantinos Antoniou. "Developing a new real-time traffic safety management framework for urban expressways utilizing reinforcement learning tree." Accident Analysis & Prevention 178 (December 2022): 106848. http://dx.doi.org/10.1016/j.aap.2022.106848.

31

Qin, ShuJin, ZhiLiang Bi, Jiacun Wang, et al. "Value-Based Reinforcement Learning for Selective Disassembly Sequence Optimization Problems: Demonstrating and Comparing a Proposed Model." IEEE Systems, Man, and Cybernetics Magazine 10, no. 2 (2024): 24–31. http://dx.doi.org/10.1109/msmc.2023.3303615.

32

Yan, Longhao, Ping Wang, Fan Qi, Zhuohang Xu, Ronghui Zhang, and Yu Han. "A task-level emergency experience reuse method for freeway accidents onsite disposal with policy distilled reinforcement learning." Accident Analysis & Prevention 190 (September 2023): 107179. http://dx.doi.org/10.1016/j.aap.2023.107179.

33

Nasernejad, Payam, Tarek Sayed, and Rushdi Alsaleh. "Modeling pedestrian behavior in pedestrian-vehicle near misses: A continuous Gaussian Process Inverse Reinforcement Learning (GP-IRL) approach." Accident Analysis & Prevention 161 (October 2021): 106355. http://dx.doi.org/10.1016/j.aap.2021.106355.

34

Guo, Hongyu, Kun Xie, and Mehdi Keyvan-Ekbatani. "Modeling driver’s evasive behavior during safety–critical lane changes: Two-dimensional time-to-collision and deep reinforcement learning." Accident Analysis & Prevention 186 (June 2023): 107063. http://dx.doi.org/10.1016/j.aap.2023.107063.

35

Jin, Jieling, Ye Li, Helai Huang, Yuxuan Dong, and Pan Liu. "A variable speed limit control approach for freeway tunnels based on the model-based reinforcement learning framework with safety perception." Accident Analysis & Prevention 201 (June 2024): 107570. http://dx.doi.org/10.1016/j.aap.2024.107570.

36

Zhang, Gongquan, Fangrong Chang, Jieling Jin, Fan Yang, and Helai Huang. "Multi-objective deep reinforcement learning approach for adaptive traffic signal control system with concurrent optimization of safety, efficiency, and decarbonization at intersections." Accident Analysis & Prevention 199 (May 2024): 107451. http://dx.doi.org/10.1016/j.aap.2023.107451.

37

Vandaele, Mathilde, and Sanna Stålhammar. "“Hope dies, action begins?” The role of hope for proactive sustainability engagement among university students." International Journal of Sustainability in Higher Education 23, no. 8 (2022): 272–89. http://dx.doi.org/10.1108/ijshe-11-2021-0463.

Abstract:
Purpose: Education in sustainability science is largely ignorant of the implications of the environmental crisis on inner dimensions, including mindsets, beliefs, values and worldviews. Increased awareness of the acuteness and severity of the environmental and climate crisis has caused a contemporary spread of hopelessness among younger generations. This calls for a better understanding of potential generative forces of hope in the face of climate change. This paper aims to uncover strategies for fostering constructive hope among students. Design/methodology/approach: This study examines, through …
38

Hoffmann, Patrick, Kirill Gorelik, and Valentin Ivanov. "Comparison of Reinforcement Learning and Model Predictive Control for Automated Generation of Optimal Control for Dynamic Systems within a Design Space Exploration Framework." International Journal of Automotive Engineering 15, no. 1 (2024): 19–26. http://dx.doi.org/10.20485/jsaeijae.15.1_19.

39

Mekhtiev, A.A., Sh.M. Asadova, Sh.B. Guseinov, and G.R. Vagabova. "Engagement of mechanisms of cellular differentiation in formation of memory traces." Journal of Life Sciences and Biomedicine 74, no. 1 (2019): 53–62. https://doi.org/10.5281/zenodo.7327642.

Abstract:
The article concerns the study of the effect of antibody-mediated blockade of serotonin-modulating anticonsolidation protein (SMAP), which is in linear relation with serotonin, on the formation of memory in rats on the conditioned models of alternative running and 2-lever operant differentiation with food reinforcement, as well as on the level of nerve growth factor (NGF) in the brain structures. In the 1st series of studies, in rats that had achieved an 80% level of correct trials on the model of alternative running, the level of SMAP was evaluated through ELISA test in the brain occipital and temporal …
40

Wu, Bo, Yanpeng Feng, and Hongyan Zheng. "Model-based Bayesian Reinforcement Learning in Factored Markov Decision Process." Journal of Computers 9, no. 4 (2014). http://dx.doi.org/10.4304/jcp.9.4.845-850.

41

Xu, Jianyu, Bin Liu, Xiujie Zhao, and Xiao-Lin Wang. "Online reinforcement learning for condition-based group maintenance using factored Markov decision processes." European Journal of Operational Research, November 2023. http://dx.doi.org/10.1016/j.ejor.2023.11.039.

42

Amato, Christopher, and Frans Oliehoek. "Scalable Planning and Learning for Multiagent POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 29, no. 1 (2015). http://dx.doi.org/10.1609/aaai.v29i1.9439.

Abstract:
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian …
43

Street, Charlie, Masoumeh Mansouri, and Bruno Lacerda. "Formal Modelling for Multi-Robot Systems Under Uncertainty." Current Robotics Reports, August 15, 2023. http://dx.doi.org/10.1007/s43154-023-00104-0.

Abstract:
Purpose of Review: To effectively synthesise and analyse multi-robot behaviour, we require formal task-level models which accurately capture multi-robot execution. In this paper, we review modelling formalisms for multi-robot systems under uncertainty and discuss how they can be used for planning, reinforcement learning, model checking, and simulation. Recent Findings: Recent work has investigated models which more accurately capture multi-robot execution by considering different forms of uncertainty, such as temporal uncertainty and partial observability, and modelling the effects of …
44

Sui, D., Z. Zhou, and X. Cui. "Priority-based intelligent resolution method of multi-aircraft flight conflicts." Aeronautical Journal, October 16, 2024, 1–25. http://dx.doi.org/10.1017/aer.2024.75.

Abstract:
The rising demand for air traffic will inevitably result in a surge in both the number and complexity of flight conflicts, necessitating intelligent strategies for conflict resolution. This study addresses the critical challenges of scalability and real-time performance in multi-aircraft flight conflict resolution by proposing a comprehensive method that integrates a priority ranking mechanism with a conflict resolution model based on the Markov decision process (MDP). Within this framework, the proximity between aircraft in a multi-aircraft conflict set is dynamically assessed to establish …
45

Xie, Ziyang, Lu Lu, Hanwen Wang, Bingyi Su, Yunan Liu, and Xu Xu. "Improving Workers’ Musculoskeletal Health During Human-Robot Collaboration Through Reinforcement Learning." Human Factors: The Journal of the Human Factors and Ergonomics Society, May 22, 2023, 001872082311775. http://dx.doi.org/10.1177/00187208231177574.

Abstract:
Objective: This study aims to improve workers’ postures and thus reduce the risk of musculoskeletal disorders in human-robot collaboration by developing a novel model-free reinforcement learning method. Background: Human-robot collaboration has been a flourishing work configuration in recent years. Yet, it could lead to work-related musculoskeletal disorders if the collaborative tasks result in awkward postures for workers. Methods: The proposed approach follows two steps: first, a 3D human skeleton reconstruction method was adopted to calculate workers’ continuous awkward posture (CAP) score; second, …
46

Arents, Janis, and Modris Greitans. "Smart Industrial Robot Control Trends, Challenges and Opportunities Within Manufacturing." Applied Sciences 12, no. 2 (January 17, 2022): 937. https://doi.org/10.3390/app12020937.

Abstract:
Industrial robots and associated control methods are continuously developing. With the recent progress in the field of artificial intelligence, new perspectives in industrial robot control strategies have emerged, and prospects towards cognitive robots have arisen. AI-based robotic systems are strongly becoming one of the main areas of focus, as flexibility and deep understanding of complex manufacturing processes are becoming the key advantage to raise competitiveness. This review first expresses the significance of smart industrial robot control in manufacturing towards future factories by …
47

Rigoli, Lillian, Gaurav Patil, Patrick Nalepka, et al. "A Comparison of Dynamical Perceptual-Motor Primitives and Deep Reinforcement Learning for Human-Artificial Agent Training Systems." Journal of Cognitive Engineering and Decision Making, April 25, 2022, 155534342210929. http://dx.doi.org/10.1177/15553434221092930.

Abstract:
Effective team performance often requires that individuals engage in team training exercises. However, organizing team-training scenarios presents economic and logistical challenges and can be prone to trainer bias and fatigue. Accordingly, a growing body of research is investigating the effectiveness of employing artificial agents (AAs) as synthetic teammates in team training simulations, and, relatedly, how to best develop AAs capable of robust, human-like behavioral interaction. Motivated by these challenges, the current study examined whether task dynamical models of expert human herding behavior …
48

Fragkos, Georgios, Jay Johnson, and Eirini Eleni Tsiropoulou. "Dynamic Role-Based Access Control Policy for Smart Grid Applications: An Offline Deep Reinforcement Learning Approach." IEEE Transactions on Human-Machine Systems, 2022, 1–13. http://dx.doi.org/10.1109/thms.2022.3163185.

49

Sun, Yuxiang, Bo Yuan, Qi Xiang, et al. "Intelligent Decision-Making and Human Language Communication Based on Deep Reinforcement Learning in a Wargame Environment." IEEE Transactions on Human-Machine Systems, 2022, 1–14. http://dx.doi.org/10.1109/thms.2022.3225867.

50

Barrachina, Sergio, Alessandro Chiumento, and Boris Bellalta. "Stateless Reinforcement Learning for Multi-Agent Systems: The Case of Spectrum Allocation in Dynamic Channel Bonding WLANs." IFIP Wireless Days 2021-June (August 1, 2021). https://doi.org/10.5281/zenodo.7214479.

Abstract:
Spectrum allocation in the form of primary channel and bandwidth selection is a key factor for dynamic channel bonding (DCB) wireless local area networks (WLANs). To cope with varying environments, where networks change their configurations on their own, the wireless community is looking towards solutions aided by machine learning (ML), and especially reinforcement learning (RL) given its trial-and-error approach. However, strong assumptions are normally made to let complex RL models converge to near-optimal solutions. Our goal with this paper is two-fold: justify in a comprehensible way why RL …
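"Stateless" RL here means each WLAN treats spectrum selection as a multi-armed bandit: the arms are (primary channel, bandwidth) pairs and the only feedback is observed throughput. A minimal ε-greedy sketch of one such agent; the channel list, bandwidths, and reward function are placeholders, not the paper's setup.

```python
import random

channels = [1, 5, 9, 13]
bandwidths = [20, 40, 80]                        # MHz
arms = [(c, b) for c in channels for b in bandwidths]
epsilon = 0.1
counts = {a: 0 for a in arms}
means = {a: 0.0 for a in arms}

def select_arm():
    if random.random() < epsilon:
        return random.choice(arms)               # explore
    return max(arms, key=lambda a: means[a])     # exploit current estimate

def update(arm, throughput):
    counts[arm] += 1
    means[arm] += (throughput - means[arm]) / counts[arm]   # running average

random.seed(7)
for _ in range(200):
    arm = select_arm()
    update(arm, throughput=random.random() * arm[1])        # placeholder reward
print(max(arms, key=lambda a: means[a]))
```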