
Journal articles on the topic 'Q-learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'Q-learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8, no. 3-4 (May 1992): 279–92. http://dx.doi.org/10.1007/bf00992698.

2

Clausen, C., and H. Wechsler. "Quad-Q-learning." IEEE Transactions on Neural Networks 11, no. 2 (March 2000): 279–94. http://dx.doi.org/10.1109/72.839000.

3

ten Hagen, Stephan, and Ben Kröse. "Neural Q-learning." Neural Computing & Applications 12, no. 2 (November 1, 2003): 81–88. http://dx.doi.org/10.1007/s00521-003-0369-9.

4

Wang, Yin-Hao, Tzuu-Hseng S. Li, and Chih-Jui Lin. "Backward Q-learning: The combination of Sarsa algorithm and Q-learning." Engineering Applications of Artificial Intelligence 26, no. 9 (October 2013): 2184–93. http://dx.doi.org/10.1016/j.engappai.2013.06.016.

5

Evseenko, Alla, and Dmitrii Romannikov. "Application of Deep Q-learning and double Deep Q-learning algorithms to the task of control an inverted pendulum." Transaction of Scientific Papers of the Novosibirsk State Technical University, no. 1-2 (August 26, 2020): 7–25. http://dx.doi.org/10.17212/2307-6879-2020-1-2-7-25.

Abstract:
Today, artificial intelligence is booming around the world. Systems built on artificial intelligence methods are able to perform functions that are traditionally considered the prerogative of humans. Artificial intelligence spans a wide range of research areas; one such area is machine learning. This article discusses algorithms of one machine learning approach, reinforcement learning (RL), on which a great deal of research and development has been carried out over the past seven years, mainly to solve problems in Atari 2600 games and similar benchmarks. Here, reinforcement learning is applied to a dynamic object: an inverted pendulum. As a model of this object, we consider the inverted pendulum on a cart taken from the Gym library, which contains many models used to test and analyze reinforcement learning algorithms. The article describes the implementation and study of two algorithms of this approach, Deep Q-learning and Double Deep Q-learning. Training, testing, and training-time graphs for each algorithm are presented, from which it is concluded that the Double Deep Q-learning algorithm is preferable: its training time is approximately 2 minutes and it provides the best control of the inverted pendulum on a cart model.
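The distinction the abstract draws between Deep Q-learning and Double Deep Q-learning comes down to how the bootstrap target is computed. The sketch below illustrates that difference in PyTorch; it is a generic illustration under assumed tensor shapes, not the implementation or hyperparameters used in the article.

```python
# Illustrative target computation for DQN vs. Double DQN (generic sketch).
import torch

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    # Standard Deep Q-learning: the target network both selects and
    # evaluates the next action, which tends to overestimate values.
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q  # done is a 0/1 float tensor

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Double Deep Q-learning: the online network selects the action,
    # the target network evaluates it, reducing the overestimation.
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```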
6

Abedalguni, Bilal. "Bat Q-learning Algorithm." Jordanian Journal of Computers and Information Technology 3, no. 1 (2017): 51. http://dx.doi.org/10.5455/jjcit.71-1480540385.

7

Zhu, Rong, and Mattia Rigotti. "Self-correcting Q-learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 11185–92. http://dx.doi.org/10.1609/aaai.v35i12.17334.

Abstract:
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.
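The single-estimator versus double-estimator contrast discussed in this abstract is easiest to see in the tabular updates below. This is a generic sketch of conventional Q-learning and Double Q-learning with assumed array shapes and hyperparameters; the paper's self-correcting estimator, which balances the two, is defined in the article itself and is not reproduced here.

```python
# Tabular Q-learning (single estimator) vs. Double Q-learning (double estimator).
# Q, QA, QB are numpy arrays of shape (n_states, n_actions); values are assumptions.
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # The same table selects and evaluates the next action: maximization bias.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def double_q_learning_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # One table selects the action, the other evaluates it: no overestimation,
    # but a tendency to underestimate, as the abstract notes.
    if np.random.rand() < 0.5:
        best = np.argmax(QA[s_next])
        QA[s, a] += alpha * (r + gamma * QB[s_next, best] - QA[s, a])
    else:
        best = np.argmax(QB[s_next])
        QB[s, a] += alpha * (r + gamma * QA[s_next, best] - QB[s, a])
```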
8

Borkar, Vivek S., and Siddharth Chandak. "Prospect-theoretic Q-learning." Systems & Control Letters 156 (October 2021): 105009. http://dx.doi.org/10.1016/j.sysconle.2021.105009.

9

Ganger, Michael, and Wei Hu. "Quantum Multiple Q-Learning." International Journal of Intelligence Science 09, no. 01 (2019): 1–22. http://dx.doi.org/10.4236/ijis.2019.91001.

10

John, Indu, Chandramouli Kamanchi, and Shalabh Bhatnagar. "Generalized Speedy Q-Learning." IEEE Control Systems Letters 4, no. 3 (July 2020): 524–29. http://dx.doi.org/10.1109/lcsys.2020.2970555.

11

HORIUCHI, Tadashi, Akinori FUJINO, Osamu KATAI, and Tetsuo SAWARAGI. "Q-PSP Learning: An Exploitation-Oriented Q-Learning Algorithm and Its Applications." Transactions of the Society of Instrument and Control Engineers 35, no. 5 (1999): 645–53. http://dx.doi.org/10.9746/sicetr1965.35.645.

12

Ghazanfari, Behzad, and Nasser Mozayani. "Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks." Journal of Intelligent & Fuzzy Systems 26, no. 6 (2014): 2771–83. http://dx.doi.org/10.3233/ifs-130945.

13

Yang, Min-Gyu, Kuk-Hyun Ahn, and Jae-Bok Song. "Tidy-up Task Planner based on Q-learning." Journal of Korea Robotics Society 16, no. 1 (February 1, 2021): 56–63. http://dx.doi.org/10.7746/jkros.2021.16.1.056.

14

Kim, Min-Soeng, Sun-Gi Hong, and Ju-Jang Lee. "Self-Learning Fuzzy Logic Controller using Q-Learning." Journal of Advanced Computational Intelligence and Intelligent Informatics 4, no. 5 (September 20, 2000): 349–54. http://dx.doi.org/10.20965/jaciii.2000.p0349.

Abstract:
Fuzzy logic controllers consist of if-then fuzzy rules generally adopted from a priori expert knowledge. However, it is not always easy or cheap to obtain expert knowledge. Q-learning can be used to acquire knowledge from experiences even without the model of the environment. The conventional Q-learning algorithm cannot deal with continuous states and continuous actions. However, the fuzzy logic controller can inherently receive continuous input values and generate continuous output values. Thus, in this paper, the Q-learning algorithm is incorporated into the fuzzy logic controller to compensate for each method’s disadvantages. Modified fuzzy rules are proposed in order to incorporate the Q-learning algorithm into the fuzzy logic controller. This combination results in the fuzzy logic controller that can learn through experience. Since Q-values in Q-learning are functional values of the state and the action, we cannot directly apply the conventional Q-learning algorithm to the proposed fuzzy logic controller. Interpolation is used in each modified fuzzy rule so that the Q-value is updatable.
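One common way to realize the interpolation the abstract refers to is to treat normalized rule firing strengths as features, so that the Q-value of a continuous state is a weighted sum of per-rule q-values and the temporal-difference error is distributed back over the rules. The sketch below illustrates that idea; the function names and update form are assumptions for illustration, not the modified fuzzy rules proposed in the paper.

```python
# Interpolated Q-values over continuous states via fuzzy rule firing strengths
# (generic sketch; not the paper's exact rule form).
import numpy as np

def fuzzy_q(firing, q_rules, action):
    # firing: normalized firing strengths of the rules for the current state, shape (n_rules,)
    # q_rules: per-rule q-values, shape (n_rules, n_actions)
    return float(np.dot(firing, q_rules[:, action]))

def fuzzy_q_update(firing, q_rules, action, reward, next_firing, alpha=0.1, gamma=0.95):
    # The greedy value of the next state is also interpolated from the rules.
    next_value = np.max(next_firing @ q_rules)
    td_error = reward + gamma * next_value - fuzzy_q(firing, q_rules, action)
    # Distribute the update across rules in proportion to how strongly they fired.
    q_rules[:, action] += alpha * td_error * firing
```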
15

Moodie, Erica E. M., Nema Dean, and Yue Ru Sun. "Q-Learning: Flexible Learning About Useful Utilities." Statistics in Biosciences 6, no. 2 (September 12, 2013): 223–43. http://dx.doi.org/10.1007/s12561-013-9103-z.

16

Hatcho, Yasuyo, Kiyohiko Hattori, and Keiki Takadama. "Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents." Journal of Advanced Computational Intelligence and Intelligent Informatics 13, no. 6 (November 20, 2009): 667–74. http://dx.doi.org/10.20965/jaciii.2009.p0667.

Abstract:
This paper focuses on generalization in reinforcement learning from the time horizon viewpoint, exploring a method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. For this purpose, we propose time horizon generalization for reinforcement learning, which consists of (1) a Q-table selection method and (2) a Q-table merge timing method, enabling agents to (1) select which Q-tables can be generalized from among many Q-tables and (2) determine when the selected Q-tables should be generalized. Intensive simulations on the bargaining game as a sequential interaction game have revealed the following implications: (1) both the Q-table selection and merge timing methods help replicate the subject experimental results without ad-hoc parameter setting; and (2) such replication succeeds with agents that use the proposed methods with smaller numbers of Q-tables.
17

Clifton, Jesse, and Eric Laber. "Q-Learning: Theory and Applications." Annual Review of Statistics and Its Application 7, no. 1 (March 9, 2020): 279–301. http://dx.doi.org/10.1146/annurev-statistics-031219-041220.

Abstract:
Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in an infinite-horizon decision problem, now refers to a general class of reinforcement learning methods widely used in statistics and artificial intelligence. In the context of personalized medicine, finite-horizon Q-learning is the workhorse for estimating optimal treatment strategies, known as treatment regimes. Infinite-horizon Q-learning is also increasingly relevant in the growing field of mobile health. In computer science, Q-learning methods have achieved remarkable performance in domains such as game-playing and robotics. In this article, we (a) review the history of Q-learning in computer science and statistics, (b) formalize finite-horizon Q-learning within the potential outcomes framework and discuss the inferential difficulties for which it is infamous, and (c) review variants of infinite-horizon Q-learning and the exploration-exploitation problem, which arises in decision problems with a long time horizon. We close by discussing issues arising with the use of Q-learning in practice, including arguments for combining Q-learning with direct-search methods; sample size considerations for sequential, multiple assignment randomized trials; and possibilities for combining Q-learning with model-based methods.
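The finite-horizon Q-learning that this review describes as the workhorse for treatment regimes amounts to backward induction with regression models fitted at each stage. The sketch below shows a two-stage version on simulated data; the covariates, outcome model, and linear working models are invented for illustration and are not taken from the article.

```python
# Two-stage (finite-horizon) Q-learning for a treatment regime, via backward
# induction with linear working models on simulated data (all values assumed).
import numpy as np

rng = np.random.default_rng(0)
n = 500
h1 = rng.normal(size=n)                  # stage-1 covariate
a1 = rng.choice([-1, 1], size=n)         # stage-1 treatment
h2 = 0.5 * h1 + rng.normal(size=n)       # stage-2 covariate
a2 = rng.choice([-1, 1], size=n)         # stage-2 treatment
y = h1 + h2 + 0.8 * a2 * h2 + 0.3 * a1 * h1 + rng.normal(size=n)  # final outcome

def fit_ols(X, y):
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def predict(beta, X):
    return np.column_stack([np.ones(X.shape[0]), X]) @ beta

# Stage 2: regress the outcome on stage-2 history and treatment.
beta2 = fit_ols(np.column_stack([h1, h2, a2, h2 * a2]), y)
q2 = lambda a: predict(beta2, np.column_stack([h1, h2, a, h2 * a]))

# Pseudo-outcome: the predicted outcome under the best stage-2 treatment.
pseudo = np.maximum(q2(np.ones(n)), q2(-np.ones(n)))

# Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
beta1 = fit_ols(np.column_stack([h1, a1, h1 * a1]), pseudo)
q1 = lambda a: predict(beta1, np.column_stack([h1, a, h1 * a]))
estimated_rule_stage1 = np.where(q1(np.ones(n)) >= q1(-np.ones(n)), 1, -1)
```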
18

He, Ningxia. "Image Sampling Using Q-Learning." International Journal of Computer Science and Engineering 8, no. 1 (January 25, 2021): 5–12. http://dx.doi.org/10.14445/23488387/ijcse-v8i1p102.

19

Ganapathi Subramanian, Sriram, Matthew E. Taylor, Kate Larson, and Mark Crowley. "Multi-Agent Advisor Q-Learning." Journal of Artificial Intelligence Research 74 (May 5, 2022): 1–74. http://dx.doi.org/10.1613/jair.1.13445.

Abstract:
In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online suboptimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms: can be used in a variety of environments, have performances that compare favourably to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors.
20

Hu, Yuepeng, Lehan Yang, and Yizhu Lou. "Path Planning with Q-Learning." Journal of Physics: Conference Series 1948, no. 1 (June 1, 2021): 012038. http://dx.doi.org/10.1088/1742-6596/1948/1/012038.

21

Sarigül, Mehmet, and Mutlu Avci. "Q LEARNING REGRESSION NEURAL NETWORK." Neural Network World 28, no. 5 (2018): 415–31. http://dx.doi.org/10.14311/nnw.2018.28.023.

22

Kamanchi, Chandramouli, Raghuram Bharadwaj Diddigi, and Shalabh Bhatnagar. "Successive Over-Relaxation Q-Learning." IEEE Control Systems Letters 4, no. 1 (January 2020): 55–60. http://dx.doi.org/10.1109/lcsys.2019.2921158.

23

Patnaik, Srikanta, and N. P. Mahalik. "Multiagent coordination utilising Q-learning." International Journal of Automation and Control 1, no. 4 (2007): 377. http://dx.doi.org/10.1504/ijaac.2007.015863.

24

Lecué, Guillaume, and Philippe Rigollet. "Optimal learning with Q-aggregation." Annals of Statistics 42, no. 1 (February 2014): 211–24. http://dx.doi.org/10.1214/13-aos1190.

25

Ahmadabadi, M. N., and M. Asadpour. "Expertness based cooperative Q-learning." IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics) 32, no. 1 (2002): 66–76. http://dx.doi.org/10.1109/3477.979961.

26

Linn, Kristin A., Eric B. Laber, and Leonard A. Stefanski. "Interactive Q-Learning for Quantiles." Journal of the American Statistical Association 112, no. 518 (March 31, 2017): 638–49. http://dx.doi.org/10.1080/01621459.2016.1155993.

27

Goldberg, Yair, and Michael R. Kosorok. "Q-learning with censored data." Annals of Statistics 40, no. 1 (February 2012): 529–60. http://dx.doi.org/10.1214/12-aos968.

28

Peng, Jing, and Ronald J. Williams. "Incremental multi-step Q-learning." Machine Learning 22, no. 1-3 (1996): 283–90. http://dx.doi.org/10.1007/bf00114731.

29

HOSOYA, Yu, and Motohide UMANO. "Improvement of Updating Method of Q Values in Fuzzy Q-Learning." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 27, no. 6 (2015): 942–48. http://dx.doi.org/10.3156/jsoft.27.942.

30

Duryea, Ethan, Michael Ganger, and Wei Hu. "Exploring Deep Reinforcement Learning with Multi Q-Learning." Intelligent Control and Automation 07, no. 04 (2016): 129–44. http://dx.doi.org/10.4236/ica.2016.74012.

31

Hwang, Kao-Shing, Wei-Cheng Jiang, and Yu-Jen Chen. "ADAPTIVE MODEL LEARNING BASED ON DYNA-Q LEARNING." Cybernetics and Systems 44, no. 8 (November 17, 2013): 641–62. http://dx.doi.org/10.1080/01969722.2013.803387.

32

da Costa, Luis Antonio L. F., Rafael Kunst, and Edison Pignaton de Freitas. "Q-FANET: Improved Q-learning based routing protocol for FANETs." Computer Networks 198 (October 2021): 108379. http://dx.doi.org/10.1016/j.comnet.2021.108379.

33

Meng, Xiao-Li. "Discussion: The Q-q Dynamic for Deeper Learning and Research." International Statistical Review 84, no. 2 (December 16, 2015): 181–89. http://dx.doi.org/10.1111/insr.12151.

34

Guo, Yanqin. "Enhancing Flappy Bird Performance With Q-Learning and DQN Strategies." Highlights in Science, Engineering and Technology 85 (March 13, 2024): 396–402. http://dx.doi.org/10.54097/qrded191.

Abstract:
Flappy Bird, a classic single-player game, boasts a deceptively simple premise yet proves to be a formidable challenge in achieving high scores. Various algorithms have been employed to improve its performance, yet a comprehensive assessment of Q-Learning and Deep Q-Network (DQN) in the context of this game remains elusive. This study undertakes the task of training Flappy Bird using both Q-Learning and DQN methodologies, showcasing the potency of reinforcement learning within the realm of gaming. Through meticulous comparisons and analyses, the paper uncovers the inherent strengths and weaknesses embedded within these algorithms. This exploration not only fosters a nuanced grasp of Q-Learning and DQN but does so by leveraging a simplistic gaming environment as the proving ground. Strikingly, the experimental results unveil an initial disadvantage for DQN during training, followed by a rapid surge in performance surpassing Q-Learning in mid-training. Conversely, Q-Learning demonstrates an aptitude for swiftly reaching its performance zenith. Both algorithms tout distinct merits: Q-Learning's adeptness in simpler tasks and DQN's reliability in tackling complex states. In conclusion, this study not only discerns algorithmic prowess but lays a foundational framework for broader application across diverse gaming scenarios. By delving into the nuances of Q-Learning and DQN, the paper establishes a clearer path for harnessing the advantages in shaping the future landscape of game optimization.
35

D'Orazio, Tiziana, and Grazia Cicirelli. "Q-Learning: computation of optimal Q-values for evaluating the learning level in robotic tasks." Journal of Experimental & Theoretical Artificial Intelligence 13, no. 3 (July 2001): 241–70. http://dx.doi.org/10.1080/09528130110063100.

36

古, 彭. "Improvement and Implementation of Q-Learning Algorithm." Computer Science and Application 11, no. 07 (2021): 1994–2007. http://dx.doi.org/10.12677/csa.2021.117204.

37

Sun, Wei-Kai, Xiao-Mei Wang, Bin Wang, Jia-Sen Zhang, and Hai-Yang Du. "MR-SFAMA-Q: A MAC Protocol based on Q-Learning for Underwater Acoustic Sensor Networks." 電腦學刊 35, no. 1 (February 2024): 051–63. http://dx.doi.org/10.53106/199115992024023501004.

Abstract:
In recent years, with the rapid development of science and technology, many new technologies have deepened people's exploration of the ocean, and due to the requirements of national defense and marine development, the underwater acoustic sensor network (UASN) has received more and more attention. Nevertheless, the underwater acoustic channel has the properties of considerable propagation delay, limited bandwidth, and unstable network topology. In order to improve the performance of the medium access control (MAC) protocol in UASN, we propose a new MAC protocol based on the Slotted-FAMA of Multiple Reception (MR-SFAMA) protocol. The protocol uses the Q-Learning algorithm to optimize the multi-receiver handshake mechanism. The current state is judged according to the received node requests, and a Q-table is established. Through multi-round interaction between the node and the environment, the Q-table is continuously updated to obtain the optimal strategy and determine the optimal data transmission scheduling scheme. The reward function is set according to the total back-off time and frame error rate, which can reduce the packet loss rate during network data transmission while reducing the delay. In addition, matching asynchronous operation and a uniform random back-off algorithm are used to solve the problem of long channel idle time and low channel utilization. This new protocol can be well applied to unstable network topologies. The simulation results show that the protocol performs better than Slotted-FAMA and MR-SFAMA regarding delay and normalized throughput.
38

Liu, Peiyi. "Q-Learning: Applications and Convergence Rate Optimization." Highlights in Science, Engineering and Technology 63 (August 8, 2023): 210–15. http://dx.doi.org/10.54097/hset.v63i.10878.

Abstract:
As an important algorithm in artificial intelligence, Q-learning plays a significant part in a number of fields, such as driverless technology, industrial automation, health care, intelligent search, and games. It is a classical model-free reinforcement learning technique with which an agent, given experienced action sequences in a Markov environment, can learn to select the best course of action. This paper discusses the addition of received signal strength (RSS) to the Q-learning algorithm for navigating an unmanned aerial vehicle (UAV); summarizes the main content and results of a neural Q-learning algorithm that helps a UAV avoid obstacles, together with an adaptive and random exploration (ARE) method proposed for UAV route-planning tasks; summarizes route planning for a mobile robot that uses obstacle characteristics as Q-learning states and actions, where the Q-learning algorithm employs a novel exploration technique combining ε-greedy exploration with the Boltzmann distribution; and analyzes the convergence speed of a staged Q-learning path-planning algorithm against the path-planning algorithm of traditional Q-learning. When there are many states and actions, the efficiency of the Q-learning algorithm is greatly reduced, so it is necessary to study in depth how to reduce its running time and increase its convergence speed.
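The exploration scheme mentioned above, combining ε-greedy selection with a Boltzmann (softmax) distribution, can be sketched as follows. This is one plausible combination written for illustration; the function name, parameters, and the exact way the two rules are mixed are assumptions, not necessarily the scheme analyzed in the paper.

```python
# Epsilon-greedy exploration with a Boltzmann-weighted random draw (assumed mix).
import numpy as np

def select_action(q_values, epsilon=0.1, temperature=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:
        # Explore, but bias the random draw toward higher-valued actions.
        logits = np.asarray(q_values) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(rng.choice(len(q_values), p=probs))
    # Otherwise exploit the greedy action.
    return int(np.argmax(q_values))
```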
39

Chen, Bo-Wei, Shih-Hung Yang, Yu-Chun Lo, Ching-Fu Wang, Han-Lin Wang, Chen-Yang Hsu, Yun-Ting Kuo, et al. "Enhancement of Hippocampal Spatial Decoding Using a Dynamic Q-Learning Method With a Relative Reward Using Theta Phase Precession." International Journal of Neural Systems 30, no. 09 (August 12, 2020): 2050048. http://dx.doi.org/10.1142/s0129065720500483.

Abstract:
Hippocampal place cells and interneurons in mammals have stable place fields and theta phase precession profiles that encode spatial environmental information. Hippocampal CA1 neurons can represent the animal’s location and prospective information about the goal location. Reinforcement learning (RL) algorithms such as Q-learning have been used to build navigation models. However, traditional Q-learning limits the reward function once the animals arrive at the goal location, leading to unsatisfactory location accuracy and convergence rates. Therefore, we proposed a revised version of the Q-learning algorithm, dynamical Q-learning, which assigns the reward function adaptively to improve the decoding performance. Firing rate was the input of the neural network of dynamical Q-learning and was used to predict the movement direction. On the other hand, phase precession was the input of the reward function used to update the weights of dynamical Q-learning. Trajectory predictions using the traditional and dynamical Q-learning algorithms were compared by the root mean squared error (RMSE) between the actual and predicted rat trajectories. Using dynamical Q-learning, significantly higher prediction accuracy and a faster convergence rate were obtained compared with traditional Q-learning for all cell types. Moreover, combining place cells and interneurons with theta phase precession improved the convergence rate and prediction accuracy. The proposed dynamical Q-learning algorithm is a quick and more accurate method to perform trajectory reconstruction and prediction.
40

Zhang, Chunyuan, Qi Song, and Zeng Meng. "Minibatch Recursive Least Squares Q-Learning." Computational Intelligence and Neuroscience 2021 (October 8, 2021): 1–9. http://dx.doi.org/10.1155/2021/5370281.

Abstract:
The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has some drawbacks such as slow convergence and instability. In contrast, the traditional reinforcement learning algorithms with linear function approximation usually have faster convergence and better stability, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom make use of the advantage of traditional algorithms to improve DQN. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called the minibatch recursive least squares Q-learning (MRLS-Q). Different from the traditional Q-learning algorithm with linear function approximation, the learning mechanism and model structure of MRLS-Q are more similar to those of DQNs with only one input layer and one linear output layer. It uses the experience replay and the minibatch training mode and uses the agent’s states rather than the agent’s state-action pairs as the inputs. As a result, it can be used alone for low-dimensional problems and can be seamlessly integrated into DQN as the last layer for high-dimensional problems as well. In addition, MRLS-Q uses our proposed average RLS optimization technique, so that it can achieve better convergence performance whether it is used alone or integrated with DQN. At the end of this paper, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and investigate the influences of its hyperparameters experimentally.
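The model structure described in this abstract, a Q-function that is linear in the state with one output per action, trained on minibatches drawn from a replay buffer, can be sketched as below. Plain semi-gradient SGD stands in here for the paper's recursive least squares optimizer, and the class name, buffer size, and hyperparameters are illustrative assumptions.

```python
# Linear Q-learning with experience replay and minibatch updates (sketch only;
# SGD replaces the paper's recursive least squares optimizer).
import random
from collections import deque
import numpy as np

class LinearQ:
    def __init__(self, state_dim, n_actions, lr=0.01, gamma=0.99):
        self.W = np.zeros((n_actions, state_dim))   # one linear output per action
        self.b = np.zeros(n_actions)
        self.lr, self.gamma = lr, gamma
        self.replay = deque(maxlen=10000)

    def q(self, state):
        return self.W @ state + self.b

    def store(self, transition):
        self.replay.append(transition)              # (s, a, r, s_next, done)

    def train_minibatch(self, batch_size=32):
        if len(self.replay) < batch_size:
            return
        for s, a, r, s_next, done in random.sample(self.replay, batch_size):
            target = r if done else r + self.gamma * np.max(self.q(s_next))
            td_error = target - self.q(s)[a]
            # Semi-gradient update of the single linear layer.
            self.W[a] += self.lr * td_error * s
            self.b[a] += self.lr * td_error
```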
41

Shin, YongWoo. "Q-learning to improve learning speed using Minimax algorithm." Journal of Korea Game Society 18, no. 4 (August 31, 2018): 99–106. http://dx.doi.org/10.7583/jkgs.2018.18.4.99.

42

Xu, Haoran, Xianyuan Zhan, and Xiangyu Zhu. "Constraints Penalized Q-learning for Safe Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8753–60. http://dx.doi.org/10.1609/aaai.v36i8.20855.

Abstract:
We study the problem of safe offline reinforcement learning (RL), in which the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real-world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potentially large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.
43

Charypar, David, and Kai Nagel. "Q-Learning for Flexible Learning of Daily Activity Plans." Transportation Research Record: Journal of the Transportation Research Board 1935, no. 1 (January 2005): 163–69. http://dx.doi.org/10.1177/0361198105193500119.

Abstract:
Q-learning is a method from artificial intelligence to solve the reinforcement learning problem (RLP), defined as follows. An agent is faced with a set of states, S. For each state s there is a set of actions, A(s), that the agent can take and that takes the agent (deterministically or stochastically) to another state. For each state the agent receives a (possibly stochastic) reward. The task is to select actions such that the reward is maximized. Activity generation is used for demand generation in the context of transportation simulation. For each member of a synthetic population, a daily activity plan stating a sequence of activities (e.g., home-work-shop-home), including locations and times, needs to be found. Activities at different locations generate demand for transportation. Activity generation can be modeled as an RLP with the states given by the triple (type of activity, starting time of activity, time already spent at activity). The possible actions are either to stay at a given activity or to move to another activity. Rewards are given as “utility per time slice,” which corresponds to a coarse version of marginal utility. Q-learning has the property that, by repeating similar experiences over and over again, the agent looks forward in time; that is, the agent can also go on paths through state space in which high rewards are given only at the end. This paper presents computational results with such an algorithm for daily activity planning.
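Because the abstract spells out the state triple, the two kinds of actions, and the per-time-slice reward, a toy tabular version is easy to sketch. The activities, utility function, time discretization, and hyperparameters below are invented for illustration and are not those used in the paper.

```python
# Toy tabular Q-learning over (activity, time slot, time spent at activity),
# with actions "switch to activity k" or "stay" and a per-slice utility reward.
# All numbers and the utility function are assumptions for illustration.
import numpy as np
from collections import defaultdict

activities = ["home", "work", "shop"]
STAY = len(activities)                              # extra action index: stay put
rng = np.random.default_rng(0)
Q = defaultdict(lambda: np.zeros(len(activities) + 1))

def utility_per_slice(activity, duration):
    # Hypothetical diminishing marginal utility per 15-minute slice.
    base = {"home": 1.0, "work": 2.0, "shop": 0.5}[activity]
    return base / (1.0 + 0.1 * duration)

alpha, gamma, epsilon = 0.1, 0.95, 0.1
for episode in range(2000):
    activity, slot, duration = 0, 0, 0              # start the day at home
    while slot < 96:                                # 96 fifteen-minute slices per day
        state = (activity, slot, min(duration, 16))
        if rng.random() < epsilon:
            action = int(rng.integers(len(activities) + 1))
        else:
            action = int(np.argmax(Q[state]))
        if action != STAY and action != activity:
            activity, duration = action, 0          # move to another activity
        else:
            duration += 1                           # stay at the current activity
        reward = utility_per_slice(activities[activity], duration)
        slot += 1
        next_state = (activity, slot, min(duration, 16))
        Q[state][action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state][action])
```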
44

Tan, Chunxi, Ruijian Han, Rougang Ye, and Kani Chen. "Adaptive Learning Recommendation Strategy Based on Deep Q-learning." Applied Psychological Measurement 44, no. 4 (July 25, 2019): 251–66. http://dx.doi.org/10.1177/0146621619858674.

Abstract:
Personalized recommendation systems have been widely adopted in the E-learning field and are adaptive to each learner’s own learning pace. With full utilization of learning behavior data, psychometric assessment models keep track of the learner’s proficiency on knowledge points, and then a well-designed recommendation strategy selects a sequence of actions to meet the objective of maximizing the learner’s learning efficiency. This article proposes a novel adaptive recommendation strategy under the framework of reinforcement learning. The proposed strategy is realized by deep Q-learning algorithms, the techniques that contributed to the success of AlphaGo Zero in achieving super-human level in the game of Go. The proposed algorithm incorporates early stopping to account for the possibility that learners may choose to stop learning. It can properly deal with missing data and can handle more individual-specific features for better recommendations. The recommendation strategy guides individual learners along efficient learning paths that vary from person to person. The authors showcase concrete examples with numeric analysis of substantive learning scenarios to further demonstrate the power of the proposed method.
45

Gokul, Vignesh, Parinitha Kannan, Sharath Kumar, and Shomona Gracia. "Deep Q-Learning for Home Automation." International Journal of Computer Applications 152, no. 6 (October 17, 2016): 1–5. http://dx.doi.org/10.5120/ijca2016911873.

46

NOTSU, Akira, and Katsuhiro HONDA. "Discounted UCB1-tuned for Q-Learning." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 26, no. 6 (2014): 913–23. http://dx.doi.org/10.3156/jsoft.26.913.

47

Hu, Wei, and James Hu. "Q Learning with Quantum Neural Networks." Natural Science 11, no. 01 (2019): 31–39. http://dx.doi.org/10.4236/ns.2019.111005.

48

Zheng, Zhang, Ji-Hoon Seung, Tae-Yeong Kim, and Kil-To Chong. "Traffic Control using Q-Learning Algorithm." Journal of the Korea Academia-Industrial cooperation Society 12, no. 11 (November 30, 2011): 5135–42. http://dx.doi.org/10.5762/kais.2011.12.11.5135.

49

Liu, Jingchen, Gongjun Xu, and Zhiliang Ying. "Theory of self-learning Q-matrix." Bernoulli 19, no. 5A (November 2013): 1790–817. http://dx.doi.org/10.3150/12-bej430.

50

Ma, Yu chien (Calvin), Zoe Wang, and Alexander Fleiss. "Deep Q-Learning for Trading Cryptocurrency." Journal of Financial Data Science 3, no. 3 (June 8, 2021): 121–27. http://dx.doi.org/10.3905/jfds.2021.1.064.
