
Journal articles on the topic 'Multi-Agent Q-Learning'



Consult the top 50 journal articles for your research on the topic 'Multi-Agent Q-Learning.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Hwang, Kao-Shing, Wei-Cheng Jiang, Yu-Hong Lin, and Li-Hsin Lai. "Continuous Q-Learning for Multi-Agent Cooperation." Cybernetics and Systems 43, no. 3 (2012): 227–56. http://dx.doi.org/10.1080/01969722.2012.660032.

2

Galstyan, Aram. "Continuous strategy replicator dynamics for multi-agent Q-learning." Autonomous Agents and Multi-Agent Systems 26, no. 1 (2011): 37–53. http://dx.doi.org/10.1007/s10458-011-9181-6.

3

Ichikawa, Yoshihiro, and Keiki Takadama. "Conflict Avoidance for Multi-agent Q-learning Based on Learning Progress." Transactions of the Society of Instrument and Control Engineers 48, no. 11 (2012): 764–72. http://dx.doi.org/10.9746/sicetr.48.764.

4

Xiao, Yuchen, Joshua Hoffman, Tian Xia, and Christopher Amato. "Multi-Agent/Robot Deep Reinforcement Learning with Macro-Actions (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (2020): 13965–66. http://dx.doi.org/10.1609/aaai.v34i10.7255.

Abstract:
We consider the challenges of learning multi-agent/robot macro-action-based deep Q-nets including how to properly update each macro-action value and accurately maintain macro-action-observation trajectories. We address these challenges by first proposing two fundamental frameworks for learning macro-action-value function and joint macro-action-value function. Furthermore, we present two new approaches of learning decentralized macro-action-based policies, which involve a new double Q-update rule that facilitates the learning of decentralized Q-nets by using a centralized Q-net for action selec
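To make the centralized-selection / decentralized-evaluation idea in this abstract concrete, here is a heavily simplified tabular sketch. The paper itself works with deep Q-nets and macro-action-observation histories; the tabular setting, shapes, and names below are assumptions introduced purely for illustration.

```python
import numpy as np

n_states, n_macro_actions, n_agents = 10, 3, 2
q_central = np.zeros((n_states,) + (n_macro_actions,) * n_agents)         # Q(s, a1, a2)
q_dec = [np.zeros((n_states, n_macro_actions)) for _ in range(n_agents)]  # per-agent Q(s, ai)

def decentralized_update(s, joint_a, reward, s_next, alpha=0.1, gamma=0.95):
    """Double-Q-style step: the centralized table selects the greedy joint
    macro-action, and each decentralized table evaluates only its own
    component of that joint action."""
    flat = int(np.argmax(q_central[s_next]))
    greedy_joint = np.unravel_index(flat, q_central[s_next].shape)
    for i in range(n_agents):
        target = reward + gamma * q_dec[i][s_next, greedy_joint[i]]
        q_dec[i][s, joint_a[i]] += alpha * (target - q_dec[i][s, joint_a[i]])
    # The centralized table's own update and macro-action termination handling
    # are omitted in this sketch.
```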
5

Ge, Yangyang, Fei Zhu, Wei Huang, Peiyao Zhao, and Quan Liu. "Multi-agent cooperation Q-learning algorithm based on constrained Markov Game." Computer Science and Information Systems 17, no. 2 (2020): 647–64. http://dx.doi.org/10.2298/csis191220009g.

Abstract:
Multi-Agent system has broad application in real world, whose security performance, however, is barely considered. Reinforcement learning is one of the most important methods to resolve Multi-Agent problems. At present, certain progress has been made in applying Multi-Agent reinforcement learning to robot system, man-machine match, and automatic, etc. However, in the above area, an agent may fall into unsafe states where the agent may find it difficult to bypass obstacles, to receive information from other agents and so on. Ensuring the safety of Multi-Agent system is of great importance in th
6

Matta, M., G. C. Cardarilli, L. Di Nunzio, et al. "Q-RTS: a real-time swarm intelligence based on multi-agent Q-learning." Electronics Letters 55, no. 10 (2019): 589–91. http://dx.doi.org/10.1049/el.2019.0244.

7

Matignon, Laetitia, Guillaume J. Laurent, and Nadine Le Fort-Piat. "Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems." Knowledge Engineering Review 27, no. 1 (2012): 1–31. http://dx.doi.org/10.1017/s0269888912000057.

Abstract:
In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties to manage to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predators pursuit domains and a special multi-state game. Moreover, the performance of
8

Park, Kui-Hong, Yong-Jae Kim, and Jong-Hwan Kim. "Modular Q-learning based multi-agent cooperation for robot soccer." Robotics and Autonomous Systems 35, no. 2 (2001): 109–22. http://dx.doi.org/10.1016/s0921-8890(01)00114-2.

9

Yin, Xijie, and Dongxin Yang. "Q Value Reinforcement Learning Algorithm Based on Multi Agent System." Journal of Physics: Conference Series 1069 (August 2018): 012094. http://dx.doi.org/10.1088/1742-6596/1069/1/012094.

10

Hwang, Kao-Shing, Yu-Jen Chen, Wei-Cheng Jiang, and Tzung-Feng Lin. "Continuous Action Generation of Q-Learning in Multi-Agent Cooperation." Asian Journal of Control 15, no. 4 (2012): 1011–20. http://dx.doi.org/10.1002/asjc.614.

11

Pourpanah, Farhad, Choo Jun Tan, Chee Peng Lim, and Junita Mohamad-Saleh. "A Q-learning-based multi-agent system for data classification." Applied Soft Computing 52 (March 2017): 519–31. http://dx.doi.org/10.1016/j.asoc.2016.10.016.

12

Wang, Yuandou, Hang Liu, Wanbo Zheng, et al. "Multi-Objective Workflow Scheduling With Deep-Q-Network-Based Multi-Agent Reinforcement Learning." IEEE Access 7 (2019): 39974–82. http://dx.doi.org/10.1109/access.2019.2902846.

13

Ahmed, Azzouna, Guezmil Amel, Sakly Anis, and Mtibaa Abdellatif. "Resource Allocation for Multi-User Cognitive Radio Systems Using Multi-agent Q-Learning." Procedia Computer Science 10 (2012): 46–53. http://dx.doi.org/10.1016/j.procs.2012.06.010.

14

Zhu, Changxi, Ho-Fung Leung, Shuyue Hu, and Yi Cai. "A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint." ACM Transactions on Autonomous and Adaptive Systems 15, no. 2 (2021): 1–28. http://dx.doi.org/10.1145/3447268.

Abstract:
In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student could fail to cooperate effectively with others even by following a teacher’s suggested actions, as the policies of all agents can change before convergence. When the number of times that agents communicate with one another is limited (i.e., there are budget constraints), an advising strategy that uses actions a
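The budget-constrained advising setting described here can be illustrated with a small tabular sketch. This is a minimal illustration under my own assumptions, not the authors' algorithm: the uncertainty heuristic (spread of the student's Q-values), the `budget` counter, and all names are placeholders.

```python
import numpy as np

def advised_action(student_q, teacher_q, state, budget,
                   epsilon=0.1, gap_threshold=0.05, rng=None):
    """Student selects an action; it asks the teacher only while advice budget
    remains and its own Q-values for this state are still nearly indistinguishable."""
    rng = rng or np.random.default_rng()
    q = student_q[state]
    if budget > 0 and (q.max() - q.min()) < gap_threshold:
        return int(np.argmax(teacher_q[state])), budget - 1   # spend one piece of advice
    if rng.random() < epsilon:                                 # explore
        return int(rng.integers(len(q))), budget
    return int(np.argmax(q)), budget                           # exploit own estimates
```

Here `student_q` and `teacher_q` are per-agent Q-tables of shape (n_states, n_actions); the advising criteria studied in the paper are more involved than this threshold rule.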
15

Mao, Hangyu, Wulong Liu, Jianye Hao, et al. "Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 7219–26. http://dx.doi.org/10.1609/aaai.v34i05.6212.

Abstract:
Social psychology and real experiences show that cognitive consistency plays an important role to keep human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans only interact directly with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with exi
16

Kofinas, P., A. I. Dounis, and G. A. Vouros. "Fuzzy Q-Learning for multi-agent decentralized energy management in microgrids." Applied Energy 219 (June 2018): 53–67. http://dx.doi.org/10.1016/j.apenergy.2018.03.017.

17

Jiang, Yu Lian, Jian Chang Liu, and Shu Bin Tan. "Application of Q Learning-Based Self-Tuning PID with DRNN in the Strip Flatness and Gauge System." Applied Mechanics and Materials 494-495 (February 2014): 1377–80. http://dx.doi.org/10.4028/www.scientific.net/amm.494-495.1377.

Abstract:
In view of the process of automatic flatness control and automatic gauge control that is a nonlinear system with multi-dimensions, multi-variables, strong coupling and time variation, a novel control method called self-tuning PID with diagonal recurrent neural network (DRNN-PID) based on Q learning is proposed. It is able to coordinate the coupling of flatness control and gauge control agents to get the satisfactory control requirements without decoupling directly and amend output control laws by DRNN-PID adaptively. Decomposition-coordination is utilized to establish a novel multi-agent syste
18

Raju, Leo, R. S. Milton, and S. Sakthiyanandan. "Energy Optimization of Solar Micro-Grid Using Multi Agent Reinforcement Learning." Applied Mechanics and Materials 787 (August 2015): 843–47. http://dx.doi.org/10.4028/www.scientific.net/amm.787.843.

Abstract:
In this paper, two solar Photovoltaic (PV) systems are considered; one in the department with capacity of 100 kW and the other in the hostel with capacity of 200 kW. Each one has battery and load. The capital cost and energy savings by conventional methods are compared and it is proved that the energy dependency from grid is reduced in solar micro-grid element, operating in distributed environment. In the smart grid frame work, the grid energy consumption is further reduced by optimal scheduling of the battery, using Reinforcement Learning. Individual unit optimization is done by a model free
19

Liu, Chang An, Fei Liu, Chun Yang Liu, and Hua Wu. "Multi-Agent Reinforcement Learning Based on K-Means Clustering in Multi-Robot Cooperative Systems." Advanced Materials Research 216 (March 2011): 75–80. http://dx.doi.org/10.4028/www.scientific.net/amr.216.75.

Abstract:
To solve the curse of dimensionality problem in multi-agent reinforcement learning, a learning method based on k-means is presented in this paper. In this method, the environmental state is represented as key state factors. The state space explosion is avoided by classifying states into different clusters using k-means. The learning rate is improved by assigning different states to existent clusters, as well as corresponding strategy. Compared to traditional Q-learning, our experimental results of the multi-robot cooperation show that our scheme improves the team learning ability efficiently.
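As a rough illustration of the state-aggregation idea (not the authors' implementation), continuous observations can be mapped to k-means cluster indices that then serve as the discrete states of an ordinary Q-table. The observation dimensionality, cluster count, and all names below are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical continuous observations collected while exploring.
observations = np.random.rand(5000, 4)          # 4-dimensional sensor readings
n_clusters, n_actions = 32, 5

kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(observations)
q_table = np.zeros((n_clusters, n_actions))     # one row per cluster, not per raw state

def discretize(obs):
    """Map a raw observation to its k-means cluster index (the Q-table row)."""
    return int(kmeans.predict(obs.reshape(1, -1))[0])

def q_update(obs, action, reward, next_obs, alpha=0.1, gamma=0.95):
    s, s_next = discretize(obs), discretize(next_obs)
    td_target = reward + gamma * q_table[s_next].max()
    q_table[s, action] += alpha * (td_target - q_table[s, action])
```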
20

Zhao, Wenjie, Zhou Fang, and Zuqiang Yang. "Four-Dimensional Trajectory Generation for UAVs Based on Multi-Agent Q Learning." Journal of Navigation 73, no. 4 (2020): 874–91. http://dx.doi.org/10.1017/s0373463320000016.

Abstract:
A distributed four-dimensional (4D) trajectory generation method based on multi-agent Q learning is presented for multiple unmanned aerial vehicles (UAVs). Based on this method, each vehicle can intelligently generate collision-free 4D trajectories for time-constrained cooperative flight tasks. For a single UAV, the 4D trajectory is generated by the bionic improved tau gravity guidance strategy, which can synchronously guide the position and velocity to the desired values at the arrival time. Furthermore, to optimise trajectory parameters, the continuous state and action wire fitting neural ne
21

Yang, Min, Dounan Tang, Haoyang Ding, Wei Wang, Tianming Luo, and Sida Luo. "Evaluating Staggered Working Hours Using a Multi-Agent-Based Q-Learning Model." Transport 29, no. 3 (2014): 296–306. http://dx.doi.org/10.3846/16484142.2014.953997.

Abstract:
Staggered working hours has the potential to alleviate excessive demands on urban transport networks during the morning and afternoon peak hours and influence the travel behavior of individuals by affecting their activity schedules and reducing their commuting times. This study proposes a multi-agent-based Q-learning algorithm for evaluating the influence of staggered work hours by simulating travelers’ time and location choices in their activity patterns. Interactions among multiple travelers were also considered. Various types of agents were identified based on real activity–travel data for
22

Shimotakahara, Kevin, Medhat Elsayed, Karin Hinzer, and Melike Erol-Kantarci. "High-Reliability Multi-Agent Q-Learning-Based Scheduling for D2D Microgrid Communications." IEEE Access 7 (2019): 74412–21. http://dx.doi.org/10.1109/access.2019.2920662.

23

Jalalimanesh, Ammar, Hamidreza Shahabi Haghighi, Abbas Ahmadi, Hossein Hejazian, and Madjid Soltani. "Multi-objective optimization of radiotherapy: distributed Q-learning and agent-based simulation." Journal of Experimental & Theoretical Artificial Intelligence 29, no. 5 (2017): 1071–86. http://dx.doi.org/10.1080/0952813x.2017.1292319.

24

Pei, Zhaoyi, Songhao Piao, Meixiang Quan, Muhammad Zuhair Qadir, and Guo Li. "Active collaboration in relative observation for multi-agent visual simultaneous localization and mapping based on Deep Q Network." International Journal of Advanced Robotic Systems 17, no. 2 (2020): 172988142092021. http://dx.doi.org/10.1177/1729881420920216.

Abstract:
This article proposes a unique active relative localization mechanism for multi-agent simultaneous localization and mapping, in which an agent to be observed is considered as a task, and the others who want to assist that agent will perform that task by relative observation. A task allocation algorithm based on deep reinforcement learning is proposed for this mechanism. Each agent can choose whether to localize other agents or to continue independent simultaneous localization and mapping on its own initiative. By this way, the process of each agent simultaneous localization and mapping will be
25

Wardell, Dean C., and Gilbert L. Peterson. "Fuzzy State Aggregation and Policy Hill Climbing for Stochastic Environments." International Journal of Computational Intelligence and Applications 06, no. 03 (2006): 413–28. http://dx.doi.org/10.1142/s1469026806001903.

Abstract:
Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and ability to continually learn even as the operating environment changes. Additionally, by applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual agent to learn from its own experience, but also opens up the opportunity for the individual agents to learn from the other agents in the system, thus accelerating the rate of learning. This research presents the novel use of fuzzy state aggregation, as
26

Yun, Soh Chin, S. Parasuraman, Velappa Ganapathy, and Halim Kusuma Joe. "Neural Q-Learning Based Mobile Robot Navigation." Advanced Materials Research 433-440 (January 2012): 721–26. http://dx.doi.org/10.4028/www.scientific.net/amr.433-440.721.

Abstract:
This research is focused on the integration of multi-layer Artificial Neural Network (ANN) and Q-Learning to perform online learning control. In the first learning phase, the agent explores the unknown surroundings and gathers state-action information through the unsupervised Q-Learning algorithm. Second training process involves ANN which utilizes the state-action information gathered in the earlier phase of training samples. During final application of the controller, Q-Learning would be used as primary navigating tool whereas the trained Neural Network will be employed when approximation is
27

Sadeh, J., and M. Rahimiyan. "Q-Learning Based Cooperative Multi-Agent System Applied to Coordination of Overcurrent Relays." Journal of Applied Sciences 8, no. 21 (2008): 3924–30. http://dx.doi.org/10.3923/jas.2008.3924.3930.

28

Feng, Tao, Jilie Zhang, Yin Tong, and Huaguang Zhang. "Q-learning algorithm in solving consensusability problem of discrete-time multi-agent systems." Automatica 128 (June 2021): 109576. http://dx.doi.org/10.1016/j.automatica.2021.109576.

29

Pérez-Pons, María E., Ricardo S. Alonso, Oscar García, Goreti Marreiros, and Juan Manuel Corchado. "Deep Q-Learning and Preference Based Multi-Agent System for Sustainable Agricultural Market." Sensors 21, no. 16 (2021): 5276. http://dx.doi.org/10.3390/s21165276.

Abstract:
Yearly population growth will lead to a significant increase in agricultural production in the coming years. Twenty-first century agricultural producers will be facing the challenge of achieving food security and efficiency. This must be achieved while ensuring sustainable agricultural systems and overcoming the problems posed by climate change, depletion of water resources, and the potential for increased erosion and loss of productivity due to extreme weather conditions. Those environmental consequences will directly affect the price setting process. In view of the price oscillations and the
30

Yang, Qingpei, Zhuangzhi Han, Han Wang, Jian Dong, and Yang Zhao. "Radar Waveform Design Based on Multi-Agent Reinforcement Learning." International Journal of Pattern Recognition and Artificial Intelligence 35, no. 10 (2021): 2159035. http://dx.doi.org/10.1142/s0218001421590357.

Abstract:
Under the actual combat background, prior information on radar targets has great uncertainty. The waveform designed based on prior information does not meet the requirements for the estimation of parameter. Thus, an algorithm for designing a waveform based on reinforcement learning is proposed to solve the above-mentioned problem. The problem on radar target parameter estimation is modeled as a framework for multi-agent reinforcement learning. Each frequency band acts as an agent, collectively interacts with the environment, independently receives observation results, shares rewards, and const
31

Aref, Mohamed A., and Sudharman K. Jayaweera. "Jamming-Resilient Wideband Cognitive Radios with Multi-Agent Reinforcement Learning." International Journal of Software Science and Computational Intelligence 10, no. 3 (2018): 1–23. http://dx.doi.org/10.4018/ijssci.2018070101.

Abstract:
This article presents a design of a wideband autonomous cognitive radio (WACR) for anti-jamming and interference-avoidance. The proposed system model allows multiple WACRs to simultaneously operate over the same spectrum range producing a multi-agent environment. The objective of each radio is to predict and evade a dynamic jammer signal as well as avoiding transmissions of other WACRs. The proposed cognitive framework is made of two operations: sensing and transmission. Each operation is helped by its own learning algorithm based on Q-learning, but both will be experiencing the same RF enviro
32

Bouzahzah, Mounira, and Ramdane Maamri. "An Approach for Fault Tolerance in Multi-Agent Systems using Learning Agents." International Journal of Intelligent Information Technologies 11, no. 3 (2015): 30–44. http://dx.doi.org/10.4018/ijiit.2015070103.

Abstract:
Through this paper, the authors propose a new approach to get fault tolerant multi-agent systems using learning agents. Generally, the exceptions in the multi-agent system are divided into two main groups: private exceptions that are treated directly by the agents and global exceptions that combine all unexpected exceptions that need handlers to be solved. The proposed approach solves the problem of these global exceptions using learning agents. This work uses a formal model called hierarchical plans to model the activities of the system's agents in order to facilitate the exception detection
33

Luviano-Cruz, David, Francesco Garcia-Luna, Luis Pérez-Domínguez, and S. Gadi. "Multi-Agent Reinforcement Learning Using Linear Fuzzy Model Applied to Cooperative Mobile Robots." Symmetry 10, no. 10 (2018): 461. http://dx.doi.org/10.3390/sym10100461.

Abstract:
A multi-agent system (MAS) is suitable for addressing tasks in a variety of domains without any programmed behaviors, which makes it ideal for the problems associated with the mobile robots. Reinforcement learning (RL) is a successful approach used in the MASs to acquire new behaviors; most of these select exact Q-values in small discrete state space and action space. This article presents a joint Q-function linearly fuzzified for a MAS’ continuous state space, which overcomes the dimensionality problem. Also, this article gives a proof for the convergence and existence of the solution propose
34

Ichikawa, Yoshihiro, and Keiki Takadama. "Designing Internal Reward of Reinforcement Learning Agents in Multi-Step Dilemma Problem." Journal of Advanced Computational Intelligence and Intelligent Informatics 17, no. 6 (2013): 926–31. http://dx.doi.org/10.20965/jaciii.2013.p0926.

Abstract:
This paper proposes the reinforcement learning agent that estimates internal rewards using external rewards in order to avoid conflict in multi-step dilemma problem. Intensive simulation results have revealed that the agent succeeds in avoiding local convergence and obtains a behavior policy for reaching a higher reward by updating the Q-value using the value that is subtracted the average reward from an external reward.
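A minimal sketch of the update rule as I read this abstract: the Q-value is updated with an internal reward obtained by subtracting a running average of the external reward. The incremental-mean averaging scheme, table sizes, and variable names are assumptions, not the authors' exact formulation.

```python
import numpy as np

n_states, n_actions = 20, 4
q = np.zeros((n_states, n_actions))
avg_reward, reward_count = 0.0, 0

def update(state, action, ext_reward, next_state, alpha=0.1, gamma=0.9):
    """Q-learning step driven by an internal reward (external reward minus its running average)."""
    global avg_reward, reward_count
    reward_count += 1
    avg_reward += (ext_reward - avg_reward) / reward_count   # incremental mean of external rewards
    internal_reward = ext_reward - avg_reward                # internal reward used in the update
    td_target = internal_reward + gamma * q[next_state].max()
    q[state, action] += alpha * (td_target - q[state, action])
```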
35

Cao, Huazhen, Chong Gao, Xuan He, Yang Li, and Tao Yu. "Multi-Agent Cooperation Based Reduced-Dimension Q(λ) Learning for Optimal Carbon-Energy Combined-Flow." Energies 13, no. 18 (2020): 4778. http://dx.doi.org/10.3390/en13184778.

Abstract:
This paper builds an optimal carbon-energy combined-flow (OCECF) model to optimize the carbon emission and energy losses of power grids simultaneously. A novel multi-agent cooperative reduced-dimension Q(λ) (MCR-Q(λ)) is proposed for solving the model. Firstly, on the basis of the traditional single-objective Q(λ) algorithm, the solution space is reduced effectively to shrink the size of Q-value matrices. Then, based on the concept of ant cooperative cooperation, multi-agents are used to update the Q-value matrices iteratively, which can significantly improve the updating rate. The simulation
36

Hooshyar, Milad, S. Jamshid Mousavi, Masoud Mahootchi, and Kumaraswamy Ponnambalam. "Aggregation–Decomposition-Based Multi-Agent Reinforcement Learning for Multi-Reservoir Operations Optimization." Water 12, no. 10 (2020): 2688. http://dx.doi.org/10.3390/w12102688.

Abstract:
Stochastic dynamic programming (SDP) is a widely-used method for reservoir operations optimization under uncertainty but suffers from the dual curses of dimensionality and modeling. Reinforcement learning (RL), a simulation-based stochastic optimization approach, can nullify the curse of modeling that arises from the need for calculating a very large transition probability matrix. RL mitigates the curse of the dimensionality problem, but cannot solve it completely as it remains computationally intensive in complex multi-reservoir systems. This paper presents a multi-agent RL approach combined
37

Ray, Dip Narayan, and Somajyoti Majumder. "Proposed Methodology for Application of Human-like gradual Multi-Agent Q-Learning (HuMAQ) for Multi-robot Exploration." IOP Conference Series: Materials Science and Engineering 65 (July 22, 2014): 012016. http://dx.doi.org/10.1088/1757-899x/65/1/012016.

38

Zheng, Yanbin, Wenxin Fan, and Mengyun Han. "Research on multi-agent collaborative hunting algorithm based on game theory and Q-learning for a single escaper." Journal of Intelligent & Fuzzy Systems 40, no. 1 (2021): 205–19. http://dx.doi.org/10.3233/jifs-191222.

Abstract:
The multi-agent collaborative hunting problem is a typical problem in multi-agent coordination and collaboration research. Aiming at the multi-agent hunting problem with learning ability, a collaborative hunt method based on game theory and Q-learning is proposed. Firstly, a cooperative hunting team is established and a game model of cooperative hunting is built. Secondly, through the learning of the escaper’s strategy choice, the trajectory of the escaper’s limited T-step cumulative reward is established, and the trajectory is adjusted to the hunter’s strategy set. Finally, the Nash equilibri
39

Long, Mingkang, Housheng Su, Xiaoling Wang, Guo-Ping Jiang, and Xiaofan Wang. "An iterative Q-learning based global consensus of discrete-time saturated multi-agent systems." Chaos: An Interdisciplinary Journal of Nonlinear Science 29, no. 10 (2019): 103127. http://dx.doi.org/10.1063/1.5120106.

40

Abdi, Javad, Baher Abdulhai, and Behzad Moshiri. "Emotional temporal difference Q-learning signals in multi-agent system cooperation: real case studies." IET Intelligent Transport Systems 7, no. 3 (2013): 315–26. http://dx.doi.org/10.1049/iet-its.2011.0158.

41

Wang, Dan, Wei Zhang, Bin Song, Xiaojiang Du, and Mohsen Guizani. "Market-Based Model in CR-IoT: A Q-Probabilistic Multi-Agent Reinforcement Learning Approach." IEEE Transactions on Cognitive Communications and Networking 6, no. 1 (2020): 179–88. http://dx.doi.org/10.1109/tccn.2019.2950242.

42

Viehmann, Johannes, Stefan Lorenczik, and Raimund Malischek. "Multi-unit multiple bid auctions in balancing markets: An agent-based Q-learning approach." Energy Economics 93 (January 2021): 105035. http://dx.doi.org/10.1016/j.eneco.2020.105035.

43

de Hauwere, Yann-Michaël, Sam Devlin, Daniel Kudenko, and Ann Nowé. "Context-sensitive reward shaping for sparse interaction multi-agent systems." Knowledge Engineering Review 31, no. 1 (2016): 59–76. http://dx.doi.org/10.1017/s0269888915000193.

Abstract:
Potential-based reward shaping is a commonly used approach in reinforcement learning to direct exploration based on prior knowledge. Both in single and multi-agent settings this technique speeds up learning without losing any theoretical convergence guarantees. However, if speed ups through reward shaping are to be achieved in multi-agent environments, a different shaping signal should be used for each context in which agents have a different subgoal or when agents are involved in a different interaction situation. This paper describes the use of context-aware potential functions in a m
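The shaping signal itself is standard and easy to sketch: a potential-based shaping reward F(s, s') = γΦ(s') − Φ(s) is added to the environment reward, and the context-sensitive variant simply switches which potential function Φ is applied. The context keys and toy potentials below are made-up placeholders, not the paper's.

```python
def shaped_reward(env_reward, potential, state, next_state, gamma=0.99):
    """Potential-based shaping: add F(s, s') = gamma * phi(s') - phi(s) to the reward."""
    return env_reward + gamma * potential(next_state) - potential(state)

# Context-sensitive variant: pick the potential matching the agent's current
# interaction context (e.g. acting alone vs. coordinating with a neighbour).
potentials = {
    "alone":        lambda s: 0.0,            # no prior knowledge in this context
    "coordinating": lambda s: -abs(s - 10),   # toy potential: prefer states near 10
}

def context_shaped_reward(env_reward, context, state, next_state, gamma=0.99):
    return shaped_reward(env_reward, potentials[context], state, next_state, gamma)
```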
44

Paek, Min-Jae, Yu-Jin Na, Won-Seok Lee, Jae-Hyun Ro, and Hyoung-Kyu Song. "A Novel Relay Selection Scheme Based on Q-Learning in Multi-Hop Wireless Networks." Applied Sciences 10, no. 15 (2020): 5252. http://dx.doi.org/10.3390/app10155252.

Abstract:
In wireless communication systems, reliability, low latency and power are essential in large scale multi-hop environment. Multi-hop based cooperative communication is an efficient way to achieve goals of wireless networks. This paper proposes a relay selection scheme for reliable transmission by selecting an optimal relay. The proposed scheme uses a signal-to-noise ratio (SNR) based Q-learning relay selection scheme to select an optimal relay in multi-hop transmission. Q-learning consists of an agent, environment, state, action and reward. When the learning is converged, the agent learns the o
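A toy, hedged sketch of the kind of Q-learning loop this abstract describes: the state is a quantised SNR level, the action is the index of a candidate relay, and the reward is the SNR achieved through that relay. The quantisation bounds, reward choice, and names are assumptions, not the paper's system model.

```python
import numpy as np

n_snr_levels, n_relays = 10, 4                 # quantised SNR states x candidate relays
q = np.zeros((n_snr_levels, n_relays))

def snr_to_state(snr_db, low=-5.0, high=25.0):
    """Quantise a measured SNR (dB) into one of n_snr_levels discrete states."""
    level = int((snr_db - low) / (high - low) * n_snr_levels)
    return min(max(level, 0), n_snr_levels - 1)

def select_relay(state, epsilon=0.1):
    if np.random.random() < epsilon:           # occasional exploration
        return int(np.random.randint(n_relays))
    return int(np.argmax(q[state]))            # otherwise pick the best-known relay

def learn(state, relay, achieved_snr_db, next_snr_db, alpha=0.2, gamma=0.9):
    next_state = snr_to_state(next_snr_db)
    target = achieved_snr_db + gamma * q[next_state].max()   # reward = achieved SNR
    q[state, relay] += alpha * (target - q[state, relay])
```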
45

Hamid, Shahzaib, Ali Nasir, and Yasir Saleem. "Reinforcement Learning Based Hierarchical Multi-Agent Robotic Search Team in Uncertain Environment." July 2021 40, no. 3 (2021): 645–62. http://dx.doi.org/10.22581/muet1982.2103.17.

Abstract:
Field of robotics has been under the limelight because of recent advances in Artificial Intelligence (AI). Due to increased diversity in multi-agent systems, new models are being developed to handle complexity of such systems. However, most of these models do not address problems such as; uncertainty handling, efficient learning, agent coordination and fault detection. This paper presents a novel approach of implementing Reinforcement Learning (RL) on hierarchical robotic search teams. The proposed algorithm handles uncertainties in the system by implementing Q-learning and depicts enhanced ef
46

Avila, Cecilia, Jorge Bacca, Josep Lluis de la Rosa, Silvia Baldiris, and Ramon Fabregat. "Social Presence Approach Within the Question and Answering eLearning Model: An Experiment with a Multi-Agent System." Respuestas 17, no. 1 (2012): 27–34. http://dx.doi.org/10.22463/0122820x.415.

Abstract:
The model of Questions Answering (Q&A) for eLearning is based on collaborative learning through questions that are posed by students and their answers to that questions which are given by peers, in contrast with the classical model in which students ask questions to the teacher only. In this proposal we extend the Q&A model including the social presence concept and a quantitative measure of it is proposed; besides it is considered the evolution of the resulting Q&A social network after the inclusion of the social presence and taking into account the feedback on questions posed by s
47

Du, Yihang, Ying Xu, Lei Xue, Lijia Wang, and Fan Zhang. "An Energy-Efficient Cross-Layer Routing Protocol for Cognitive Radio Networks Using Apprenticeship Deep Reinforcement Learning." Energies 12, no. 14 (2019): 2829. http://dx.doi.org/10.3390/en12142829.

Abstract:
Deep reinforcement learning (DRL) has been successfully used for the joint routing and resource management in large-scale cognitive radio networks. However, it needs lots of interactions with the environment through trial and error, which results in large energy consumption and transmission delay. In this paper, an apprenticeship learning scheme is proposed for the energy-efficient cross-layer routing design. Firstly, to guarantee energy efficiency and compress huge action space, a novel concept called dynamic adjustment rating is introduced, which regulates transmit power efficiently with mul
48

Uwano, Fumito, and Keiki Takadama. "Comparison Between Reinforcement Learning Methods with Different Goal Selections in Multi-Agent Cooperation." Journal of Advanced Computational Intelligence and Intelligent Informatics 21, no. 5 (2017): 917–29. http://dx.doi.org/10.20965/jaciii.2017.p0917.

Abstract:
This study discusses important factors for zero communication, multi-agent cooperation by comparing different modified reinforcement learning methods. The two learning methods used for comparison were assigned different goal selections for multi-agent cooperation tasks. The first method is called Profit Minimizing Reinforcement Learning (PMRL); it forces agents to learn how to reach the farthest goal, and then the agent closest to the goal is directed to the goal. The second method is called Yielding Action Reinforcement Learning (YARL); it forces agents to learn through a Q-learning process,
49

Dou, Zheng, Guangzhen Si, Yun Lin, and Meiyu Wang. "A power allocation algorithm based on cooperative Q-learning for multi-agent D2D communication networks." Physical Communication 47 (August 2021): 101370. http://dx.doi.org/10.1016/j.phycom.2021.101370.

50

Wen, Chao, Xinghu Yao, Yuhui Wang, and Xiaoyang Tan. "SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 7301–8. http://dx.doi.org/10.1609/aaai.v34i05.6223.

Abstract:
This work presents a sample efficient and effective value-based method, named SMIX(λ), for reinforcement learning in multi-agent environments (MARL) within the paradigm of centralized training with decentralized execution (CTDE), in which learning a stable and generalizable centralized value function (CVF) is crucial. To achieve this, our method carefully combines different elements, including 1) removing the unrealistic centralized greedy assumption during the learning phase, 2) using the λ-return to balance the trade-off between bias and variance and to deal with the environment's non-Markov
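One concrete piece of this abstract, the λ-return used to trade off bias and variance, can be written down directly. This is the textbook backward recursion, not SMIX(λ) itself; the value estimates and argument layout are assumptions.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8, terminal_value=0.0):
    """Backward recursion G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    where next_values[t] estimates V(s_{t+1}) and terminal_value bootstraps the last step."""
    g = terminal_value
    out = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        out[t] = g
    return out

# e.g. lambda_returns([1.0, 0.0, 2.0], [0.5, 0.3, 0.0]) gives the per-step lambda-returns
```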