Selection of scientific literature on the topic "Q-learning"

Cite a source in APA, MLA, Chicago, Harvard, and other citation styles

Select a type of source:

Browse the lists of current articles, books, theses, reports, and other scholarly sources on the topic "Q-learning".

Next to every entry in the list of references you will find an "Add to bibliography" option. Use it, and the bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the publication as a PDF and read its online annotation, provided the relevant parameters are available in the metadata.

Journal articles on the topic "Q-learning"

1

Watkins, Christopher J. C. H., and Peter Dayan. „Q-learning“. Machine Learning 8, no. 3-4 (May 1992): 279–92. http://dx.doi.org/10.1007/bf00992698.

2

Clausen, C., and H. Wechsler. „Quad-Q-learning“. IEEE Transactions on Neural Networks 11, no. 2 (March 2000): 279–94. http://dx.doi.org/10.1109/72.839000.

3

ten Hagen, Stephan, and Ben Kröse. „Neural Q-learning“. Neural Computing & Applications 12, no. 2 (November 1, 2003): 81–88. http://dx.doi.org/10.1007/s00521-003-0369-9.

4

Wang, Yin-Hao, Tzuu-Hseng S. Li, and Chih-Jui Lin. „Backward Q-learning: The combination of Sarsa algorithm and Q-learning“. Engineering Applications of Artificial Intelligence 26, no. 9 (October 2013): 2184–93. http://dx.doi.org/10.1016/j.engappai.2013.06.016.

5

Evseenko, Alla, and Dmitrii Romannikov. „Application of Deep Q-learning and double Deep Q-learning algorithms to the task of control an inverted pendulum“. Transaction of Scientific Papers of the Novosibirsk State Technical University, no. 1-2 (August 26, 2020): 7–25. http://dx.doi.org/10.17212/2307-6879-2020-1-2-7-25.

Annotation:
Today, artificial intelligence is a booming branch of science worldwide. Systems built on artificial intelligence methods are able to perform functions that are traditionally considered the prerogative of humans. Artificial intelligence spans a wide range of research areas; one such area is machine learning. This article discusses algorithms from one approach to machine learning, reinforcement learning (RL), on which a great deal of research and development has been carried out over the past seven years. Work on this approach is mostly carried out on problems such as Atari 2600 games or similar benchmarks. In this article, reinforcement learning is applied to a dynamic object, an inverted pendulum. As the model of this object, we consider the inverted pendulum on a cart taken from the Gym library, which contains many models used to test and analyze reinforcement learning algorithms. The article describes the implementation and study of two algorithms from this approach, Deep Q-learning and Double Deep Q-learning. Training, testing, and training-time graphs for each algorithm are presented, on the basis of which it is concluded that the Double Deep Q-learning algorithm is preferable: its training time is approximately 2 minutes and it provides the best control of the inverted pendulum on a cart.
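
The core practical difference between the two algorithms studied above is how the bootstrap target is built. The sketch below is a hedged illustration only: it works on plain NumPy arrays of Q-values instead of the authors' actual networks, and all names and constants (such as `gamma`) are assumptions rather than details taken from the paper.

```python
import numpy as np

def dqn_targets(rewards, dones, q_target_next, gamma=0.99):
    """Standard Deep Q-learning target: bootstrap from the max of the target network."""
    # q_target_next: (batch, n_actions) Q-values of the next states from the target network
    return rewards + gamma * (1.0 - dones) * q_target_next.max(axis=1)

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double Deep Q-learning target: the online network selects the action,
    the target network evaluates it, which reduces overestimation."""
    best_actions = q_online_next.argmax(axis=1)                              # selection
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]    # evaluation
    return rewards + gamma * (1.0 - dones) * evaluated

# Tiny illustrative batch (2 transitions, 2 actions); all values are made up.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
q_online_next = np.array([[0.5, 0.9], [0.2, 0.1]])
q_target_next = np.array([[0.6, 0.4], [0.3, 0.0]])
print(dqn_targets(rewards, dones, q_target_next))
print(double_dqn_targets(rewards, dones, q_online_next, q_target_next))
```

Decoupling action selection from action evaluation is what lets Double Deep Q-learning temper the overestimation that plain Deep Q-learning inherits from the max operator.
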
6

Abedalguni, Bilal. „Bat Q-learning Algorithm“. Jordanian Journal of Computers and Information Technology 3, no. 1 (2017): 51. http://dx.doi.org/10.5455/jjcit.71-1480540385.

7

Zhu, Rong, and Mattia Rigotti. „Self-correcting Q-learning“. Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 11185–92. http://dx.doi.org/10.1609/aaai.v35i12.17334.

Annotation:
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.
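
The abstract above places Self-correcting Q-learning between the single estimator of conventional Q-learning and the double estimator of Double Q-learning. The tabular sketch below shows only those two baseline updates for reference; it does not reproduce the paper's self-correcting estimator, and the states, actions, and step sizes are illustrative assumptions.

```python
import random
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Single-estimator Q-learning: the max is taken over the same table that is
    being learned, which is the source of the maximization bias discussed above."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def double_q_update(QA, QB, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Double Q-learning: one table selects the greedy action, the other evaluates it."""
    if random.random() < 0.5:
        best = max(actions, key=lambda b: QA[(s_next, b)])
        QA[(s, a)] += alpha * (r + gamma * QB[(s_next, best)] - QA[(s, a)])
    else:
        best = max(actions, key=lambda b: QB[(s_next, b)])
        QB[(s, a)] += alpha * (r + gamma * QA[(s_next, best)] - QB[(s, a)])

# Example usage on a single made-up transition.
actions = [0, 1]
Q = defaultdict(float)
QA, QB = defaultdict(float), defaultdict(float)
q_update(Q, "s0", 0, 1.0, "s1", actions)
double_q_update(QA, QB, "s0", 0, 1.0, "s1", actions)
print(Q[("s0", 0)], QA[("s0", 0)], QB[("s0", 0)])
```
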
8

Borkar, Vivek S., and Siddharth Chandak. „Prospect-theoretic Q-learning“. Systems & Control Letters 156 (October 2021): 105009. http://dx.doi.org/10.1016/j.sysconle.2021.105009.

9

Ganger, Michael, and Wei Hu. „Quantum Multiple Q-Learning“. International Journal of Intelligence Science 09, no. 01 (2019): 1–22. http://dx.doi.org/10.4236/ijis.2019.91001.

10

John, Indu, Chandramouli Kamanchi, and Shalabh Bhatnagar. „Generalized Speedy Q-Learning“. IEEE Control Systems Letters 4, no. 3 (July 2020): 524–29. http://dx.doi.org/10.1109/lcsys.2020.2970555.

More sources

Dissertations and theses on the topic "Q-learning"

1

Gaskett, Chris. „Q-Learning for Robot Control“. The Australian National University, Research School of Information Sciences and Engineering, 2002. http://thesis.anu.edu.au./public/adt-ANU20041108.192425.

Annotation:
Q-Learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received rewards. Q-Learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing and actuation delays, and incorrect sensor data. This research describes an algorithm that deals with continuous state and action variables without discretising. The algorithm is evaluated with vision-based mobile robot and active head gaze control tasks. As well as learning the basic control tasks, the algorithm learns to compensate for delays in sensing and actuation by predicting the behaviour of its environment. Although the learned dynamic model is implicit in the controller, it is possible to extract some aspects of the model. The extracted models are compared to theoretically derived models of environment behaviour. The difficulty of working with robots motivates development of methods that reduce experimentation time. This research exploits Q-learning’s ability to learn by passively observing the robot’s actions, rather than necessarily controlling the robot. This is a valuable tool for shortening the duration of learning experiments.
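
The thesis abstract above argues that discretising continuous sensor values prevents smooth control. As a hedged illustration of Q-learning over a continuous state without discretisation (not the thesis's algorithm, which additionally handles continuous actions), here is a minimal semi-gradient update with a linear function approximator; the feature map and all constants are assumptions made for this sketch.

```python
import numpy as np

n_actions, n_features = 3, 4
w = np.zeros((n_actions, n_features))   # one weight vector per discrete action
alpha, gamma = 0.05, 0.95

def features(state):
    """Simple polynomial features of a 2-D continuous state; purely illustrative."""
    x, v = state
    return np.array([1.0, x, v, x * v])

def q_values(state):
    return w @ features(state)

def update(state, action, reward, next_state):
    """Semi-gradient Q-learning step: the continuous state is never discretised."""
    td_target = reward + gamma * np.max(q_values(next_state))
    td_error = td_target - q_values(state)[action]
    w[action] += alpha * td_error * features(state)

# One made-up transition of a cart-like system (position, velocity).
update(state=(0.1, -0.3), action=2, reward=1.0, next_state=(0.05, -0.2))
print(q_values((0.1, -0.3)))
```
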
2

Gaskett, Chris. „Q-Learning for robot control“. View thesis entry in Australian Digital Theses Program, 2002. http://eprints.jcu.edu.au/623/1/gaskettthesis.pdf.

Annotation:
Q-Learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received rewards. Q-Learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing and actuation delays, and incorrect sensor data. This research describes an algorithm that deals with continuous state and action variables without discretising. The algorithm is evaluated with vision-based mobile robot and active head gaze control tasks. As well as learning the basic control tasks, the algorithm learns to compensate for delays in sensing and actuation by predicting the behaviour of its environment. Although the learned dynamic model is implicit in the controller, it is possible to extract some aspects of the model. The extracted models are compared to theoretically derived models of environment behaviour. The difficulty of working with robots motivates development of methods that reduce experimentation time. This research exploits Q-learning’s ability to learn by passively observing the robot’s actions, rather than necessarily controlling the robot. This is a valuable tool for shortening the duration of learning experiments.
3

Laivamaa, J. (Juuso). „Reinforcement Q-Learning using OpenAI Gym“. Bachelor's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201903151329.

Annotation:
Abstract. Q-Learning is an off-policy algorithm for reinforcement learning that can be used to find optimal policies in Markovian domains. This thesis is about how Q-Learning can be applied to a test environment in the OpenAI Gym toolkit. The utility of testing the algorithm on a problem case is to find out how well it performs, as well as proving the practical utility of the algorithm. This thesis starts off with a general overview of reinforcement learning as well as the Markov decision process, both of which are crucial in understanding the theoretical groundwork that Q-Learning is based on. After that we move on to discussing the Q-Learning technique itself and dissect the algorithm in detail. We also go over the OpenAI Gym toolkit and how it can be used to test the algorithm’s functionality. Finally, we introduce the problem case, apply the algorithm to solve it, and analyse the results. The motivation for this thesis is the rise of reinforcement learning and its increasing relevance in the future as technological progress allows for more and more complex and sophisticated applications of machine learning and artificial intelligence.
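
As a hedged companion to the thesis above, the following is a minimal tabular Q-learning loop on an OpenAI Gym-style environment. It assumes the `gymnasium` package and its `FrozenLake-v1` environment (older `gym` releases use different reset/step signatures), and the hyperparameters are illustrative rather than taken from the thesis.

```python
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Off-policy Q-learning update.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print("Greedy policy:", np.argmax(Q, axis=1))
```
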
4

Del Ben, Enrico <1997>. „Reinforcement Learning: a Q-Learning Algorithm for High Frequency Trading“. Master's Degree Thesis, Università Ca' Foscari Venezia, 2021. http://hdl.handle.net/10579/20411.

Annotation:
The scope of this work is to test the implementation of an automated trading system based on Reinforcement Learning: a machine learning approach in which an intelligent agent acts to maximize its rewards given the environment around it. Given the environmental inputs and the environmental responses to the actions taken, the agent will learn how to behave in the best way possible. In particular, in this work a Q-Learning algorithm has been used to produce trading signals on the basis of high-frequency data from the Limit Order Book for some selected stocks.
5

Karlsson, Daniel. „Hyperparameter optimisation using Q-learning based algorithms“. Thesis, Karlstads universitet, Fakulteten för hälsa, natur- och teknikvetenskap (from 2013), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-78096.

Annotation:
Machine learning algorithms have many applications, both for academic and industrial purposes. Examples of applications are classification of diffraction patterns in materials science and classification of properties of chemical compounds within the pharmaceutical industry. For these algorithms to be successful they need to be optimised; part of this is achieved by training the algorithm, but there are components of the algorithms that cannot be trained. These hyperparameters have to be tuned separately. The focus of this work was optimisation of hyperparameters in classification algorithms based on convolutional neural networks. The purpose of this thesis was to investigate the possibility of using reinforcement learning algorithms, primarily Q-learning, as the optimising algorithm. Three different algorithms were investigated: Q-learning, double Q-learning, and a Q-learning-inspired algorithm, which was designed during this work. The algorithms were evaluated on different problems and compared to a random search algorithm, which is one of the most common optimisation tools for this type of problem. All three algorithms were capable of some learning; however, the Q-learning-inspired algorithm was the only one to outperform the random search algorithm on the test problems. Further, an iterative scheme of the Q-learning-inspired algorithm was implemented, where the algorithm was allowed to refine the search space available to it. This showed further improvements of the algorithm's performance, and the results indicate that performance similar to random search may be achieved in a shorter period of time, sometimes reducing the computational time by up to 40%.
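
To make the framing of the thesis above concrete, the sketch below casts hyperparameter selection as a tiny episodic MDP solved with tabular Q-learning: the state is which hyperparameter is currently being set, the actions are candidate values, and the reward arrives once the full configuration is scored. The search space, the stand-in `score` function, and all constants are assumptions for illustration; a real setup would train and validate a convolutional network instead.

```python
import random
from collections import defaultdict

SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3, 1e-4],   # illustrative candidate values
    "batch_size": [32, 64, 128],
}
NAMES = list(SEARCH_SPACE)

def score(config):
    """Stand-in for validation accuracy; a real run would train and evaluate a model."""
    return -abs(config["learning_rate"] - 1e-3) * 100 - abs(config["batch_size"] - 64) / 64.0

Q = defaultdict(float)
alpha, gamma, epsilon = 0.2, 1.0, 0.2

for episode in range(500):
    config, trajectory = {}, []
    for state, name in enumerate(NAMES):
        choices = SEARCH_SPACE[name]
        if random.random() < epsilon:                     # explore
            action = random.randrange(len(choices))
        else:                                             # exploit
            action = max(range(len(choices)), key=lambda i: Q[(state, i)])
        config[name] = choices[action]
        trajectory.append((state, action))
    reward = score(config)   # reward only once the whole configuration is fixed
    for step, (state, action) in enumerate(trajectory):
        if step + 1 < len(trajectory):
            next_name = NAMES[step + 1]
            target = gamma * max(Q[(step + 1, i)] for i in range(len(SEARCH_SPACE[next_name])))
        else:
            target = reward                               # terminal step
        Q[(state, action)] += alpha * (target - Q[(state, action)])

best = {name: SEARCH_SPACE[name][max(range(len(SEARCH_SPACE[name])), key=lambda i: Q[(s, i)])]
        for s, name in enumerate(NAMES)}
print("Best configuration found:", best)
```
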
6

Finnman, Peter, and Max Winberg. „Deep reinforcement learning compared with Q-table learning applied to backgammon“. Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186545.

Annotation:
Reinforcement learning attempts to mimic how humans react to their surrounding environment by giving feedback to software agents based on the actions they take. To test the capabilities of these agents, researchers have long regarded board games as a powerful tool. This thesis compares two approaches to reinforcement learning in the board game backgammon, a Q-table and a deep reinforcement network. It was determined which approach surpassed the other in terms of accuracy and convergence rate towards the perceived optimal strategy. The evaluation is performed by training the agents using the self-learning approach. After variable amounts of training sessions, the agents are benchmarked against each other and a third, random agent. The results derived from the study indicate that the convergence rate of the deep learning agent is far superior to that of the Q-table agent. However, the results also indicate that the accuracy of Q-tables is greater than that of deep learning once the former has mapped the environment.
7

Patel, Purvag. „Improving Computer Game Bots' behavior using Q-Learning“. Available to subscribers only, 2009. http://proquest.umi.com/pqdweb?did=1966544161&sid=3&Fmt=2&clientId=1509&RQT=309&VName=PQD.

8

Burkov, Andriy. „Adaptive Dynamics Learning and Q-initialization in the Context of Multiagent Learning“. Thesis, Université Laval, 2007. http://www.theses.ulaval.ca/2007/24476/24476.pdf.

Annotation:
Multiagent learning is a promising direction of modern and future research in the context of intelligent systems. While the single-agent case has been well studied over the last two decades, the multiagent case has not been broadly studied due to its complexity. When several autonomous agents learn and act simultaneously, the environment becomes strictly unpredictable, and all assumptions that are made in the single-agent case, such as stationarity and the Markovian property, often do not hold in the multiagent context. In this Master's work we study what has been done in this research field and propose an original approach to multiagent learning in the presence of adaptive agents. We explain why such an approach gives promising results by comparing it with other existing approaches. It is important to note that one of the most challenging problems of all multiagent learning algorithms is their high computational complexity. This is due to the fact that the state space size of a multiagent problem is exponential in the number of agents acting in the environment. In this work we propose a novel approach to reducing the complexity of multiagent reinforcement learning. Such an approach permits a significant reduction of the part of the state space that needs to be visited by the agents to learn an efficient solution. We then evaluate our algorithms on a set of empirical tests and give a preliminary theoretical result, which is a first step toward establishing the validity of our approaches to multiagent learning.
9

Cunningham, Bryan. „Non-Reciprocating Sharing Methods in Cooperative Q-Learning Environments“. Thesis, Virginia Tech, 2012. http://hdl.handle.net/10919/34610.

Annotation:
Past research on multi-agent simulation with cooperative reinforcement learning (RL) for homogeneous agents focuses on developing sharing strategies that are adopted and used by all agents in the environment. These sharing strategies are considered to be reciprocating because all participating agents have a predefined agreement regarding what type of information is shared, when it is shared, and how the participating agent's policies are subsequently updated. The sharing strategies are specifically designed around manipulating this shared information to improve learning performance. This thesis targets situations where the assumption of a single sharing strategy that is employed by all agents is not valid. This work seeks to address how agents with no predetermined sharing partners can exploit groups of cooperatively learning agents to improve learning performance when compared to Independent learning. Specifically, several intra-agent methods are proposed that do not assume a reciprocating sharing relationship and leverage the pre-existing agent interface associated with Q-Learning to expedite learning. The other agents' functions and their sharing strategies are unknown and inaccessible from the point of view of the agent(s) using the proposed methods. The proposed methods are evaluated on physically embodied agents in the multi-agent cooperative robotics field learning a navigation task via simulation. The experiments conducted focus on the effects of the following factors on the performance of the proposed non-reciprocating methods: scaling the number of agents in the environment, limiting the communication range of the agents, and scaling the size of the environment.
10

Andersson, Gabriel, and Martti Yap. „Att spela 'Breakout' med hjälp av 'Deep Q-Learning'“. Thesis, KTH, Skolan för teknikvetenskap (SCI), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-255799.

Annotation:
We cover in this report the implementation of a reinforcement learning (RL) algorithm capable of learning how to play the game 'Breakout' on the Atari Learning Environment (ALE). The non-human player (agent) is given no prior information of the game and must learn from the same sensory input that a human would typically receive when playing the game. The aim is to reproduce previous results by optimizing the agent driven control of 'Breakout' so as to surpass a typical human score. To this end, the problem is formalized by modeling it as a Markov Decision Process. We apply the celebrated Deep Q-Learning algorithm with action masking to achieve an optimal strategy. We find our agent's average score to be just below the human benchmarks: achieving an average score of 20, approximately 65% of the human counterpart. We discuss a number of implementations that boosted agent performance, as well as further techniques that could lead to improvements in the future.
More sources

Books on the topic "Q-learning"

1

Wiederhold, Chuck. The Q-matrix: Cooperative learning and critical thinking. San Juan Capistrano, CA: Kagan Cooperative Learning, 1995.

2

Pereira, Penny, and Health Foundation (Great Britain), eds. Building Q: Learning from designing a large scale improvement community. London: The Health Foundation, 2016.

3

National Advisory Council for Education and Training Targets. New national Learning targets for England for 2002: Q&A document on national, regional and local implementation. Sudbury: Department for Education and Employment, 2002.

4

Kimple, James A. Eye Q and the efficient learner. Santa Ana, Calif: Optometric Extension Program Foundation, Inc., 1997.

5

Lacey, Greg. Q&A modern world history. London: Letts Educational, 1999.

6

ʻAzzāwī, ʻAbd al-Raḥmān Ḥusayn. al- Manhajīyah al-tārīkhīyah fī al-ʻIrāq ilá q. 4 H. /10 M. Baghdād: Dār al-Shuʾūn al-Thaqāfīyah al-ʻĀmmah "Āfāq ʻArabīyah", 1988.

7

Morrison, Liz. Project Manager: Q Learning (Q Learning S.). Hodder & Stoughton, 2003.

8

Rimm, Sylvia B. Learning Leads Q-Cards. Apple Pub Co, 1990.

9

Psaris, Nick. Fun Q: A Functional Introduction to Machine Learning in Q. Vector SIGMA, 2020.

10

Habib, Nazia. Hands-On Q-Learning with Python: Practical Q-Learning with OpenAI Gym, Keras, and TensorFlow. Packt Publishing, Limited, 2019.

More sources

Book chapters on the topic "Q-learning"

1

Stone, Peter, Xin Jin, Jiawei Han, Sanjay Jain, and Frank Stephan. „Q-Learning“. In Encyclopedia of Machine Learning, 819. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_683.

2

Stone, Peter. „Q-Learning“. In Encyclopedia of Machine Learning and Data Mining, 1. Boston, MA: Springer US, 2014. http://dx.doi.org/10.1007/978-1-4899-7502-7_689-1.

3

Stone, Peter. „Q-Learning“. In Encyclopedia of Machine Learning and Data Mining, 1033. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_689.

4

Li, Jinna, Frank L. Lewis, and Jialu Fan. „Interleaved Q-Learning“. In Reinforcement Learning, 155–83. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-28394-9_6.

5

Sengupta, Nandita, and Jaya Sil. „Q-Learning Classifier“. In Intrusion Detection, 83–111. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-2716-6_4.

6

Hu, Zhihui, Yubin Jiang, Xinghong Ling, and Quan Liu. „Accurate Q-Learning“. In Neural Information Processing, 560–70. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-04182-3_49.

7

Goldberg, Yair, Rui Song, and Michael R. Kosorok. „Adaptive Q-learning“. In Institute of Mathematical Statistics Collections, 150–62. Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2013. http://dx.doi.org/10.1214/12-imscoll911.

8

Stanko, Silvestr, and Karel Macek. „CVaR Q-Learning“. In Studies in Computational Intelligence, 333–58. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-70594-7_14.

9

Sanghi, Nimish. „Deep Q-Learning“. In Deep Reinforcement Learning with Python, 155–206. Berkeley, CA: Apress, 2021. http://dx.doi.org/10.1007/978-1-4842-6809-4_6.

10

Liu, Mark. „Deep Q-Learning“. In Machine Learning, Animated, 322–38. Boca Raton: Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/b23383-17.


Conference papers on the topic "Q-learning"

1

Kantasewi, Nitchakun, Sanparith Marukatat, Somying Thainimit, and Okumura Manabu. „Multi Q-Table Q-Learning“. In 2019 10th International Conference of Information and Communication Technology for Embedded Systems (IC-ICTES). IEEE, 2019. http://dx.doi.org/10.1109/ictemsys.2019.8695963.

2

Lu, Fan, Prashant G. Mehta, Sean P. Meyn, and Gergely Neu. „Convex Q-Learning“. In 2021 American Control Conference (ACC). IEEE, 2021. http://dx.doi.org/10.23919/acc50511.2021.9483244.

3

Reid, Cameron, and Snehasis Mukhopadhyay. „Mutual Q-learning“. In 2020 3rd International Conference on Control and Robots (ICCR). IEEE, 2020. http://dx.doi.org/10.1109/iccr51572.2020.9344374.

4

Zhang, Zongzhang, Zhiyuan Pan, and Mykel J. Kochenderfer. „Weighted Double Q-learning“. In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/483.

Annotation:
Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments due to overestimating action values. Overestimation is due to the use of a single estimator that uses the maximum action value as an approximation for the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was recently proposed, which uses the double estimator method. It uses two estimators from independent sets of experiences, with one estimator determining the maximizing action and the other providing the estimate of its value. Double Q-learning sometimes underestimates the action values. This paper introduces a weighted double Q-learning algorithm, which is based on the construction of the weighted double estimator, with the goal of balancing between the overestimation in the single estimator and the underestimation in the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
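
The abstract above describes a weighted double estimator that interpolates between the single estimator of Q-learning and the double estimator of Double Q-learning. The sketch below illustrates that interpolation with a fixed weight `beta`; the paper derives the weight adaptively per state, so this is a simplified, assumption-laden illustration rather than the published algorithm.

```python
import random
from collections import defaultdict

def weighted_double_q_update(QA, QB, s, a, r, s_next, actions,
                             alpha=0.1, gamma=0.99, beta=0.5):
    """One update of a weighted-double-estimator variant of tabular Q-learning."""
    if random.random() < 0.5:
        select, evaluate = QA, QB
    else:
        select, evaluate = QB, QA
    a_star = max(actions, key=lambda b: select[(s_next, b)])
    single = select[(s_next, a_star)]      # single estimator, tends to overestimate
    double = evaluate[(s_next, a_star)]    # double estimator, tends to underestimate
    target = r + gamma * (beta * single + (1.0 - beta) * double)
    select[(s, a)] += alpha * (target - select[(s, a)])

# Example usage on a made-up transition.
QA, QB = defaultdict(float), defaultdict(float)
weighted_double_q_update(QA, QB, "s0", 1, 0.5, "s1", actions=[0, 1])
print(QA[("s0", 1)], QB[("s0", 1)])
```
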
5

Schilperoort, Jits, Ivar Mak, Madalina M. Drugan, and Marco A. Wiering. „Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants“. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2018. http://dx.doi.org/10.1109/ssci.2018.8628782.

6

Nguyen, Thanh, and Snehasis Mukhopadhyay. „Selectively decentralized Q-learning“. In 2017 IEEE International Conference on Systems, Man and Cybernetics (SMC). IEEE, 2017. http://dx.doi.org/10.1109/smc.2017.8122624.

7

Pandey, Punit, and Deepshikha Pandey. „Reduct based Q-learning“. In the 2011 International Conference. New York, New York, USA: ACM Press, 2011. http://dx.doi.org/10.1145/1947940.1948001.

8

Kok, Jelle R., and Nikos Vlassis. „Sparse cooperative Q-learning“. In Twenty-first international conference. New York, New York, USA: ACM Press, 2004. http://dx.doi.org/10.1145/1015330.1015410.

9

Szepesvári, Csaba, and William D. Smart. „Interpolation-based Q-learning“. In Twenty-first international conference. New York, New York, USA: ACM Press, 2004. http://dx.doi.org/10.1145/1015330.1015445.

10

Edwards, Ashley, and William M. Pottenger. „Higher order Q-Learning“. In 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning. IEEE, 2011. http://dx.doi.org/10.1109/adprl.2011.5967385.


Reports of organizations on the topic "Q-learning"

1

Martinson, Eric, Alexander Stoytchev, and Ronald Arkin. Robot Behavioral Selection Using Q-learning. Fort Belvoir, VA: Defense Technical Information Center, January 2002. http://dx.doi.org/10.21236/ada640010.

2

Goodrich, Michael A., and Morgan Quigley. Satisficing Q-Learning: Efficient Learning in Problems With Dichotomous Attributes. Fort Belvoir, VA: Defense Technical Information Center, January 2004. http://dx.doi.org/10.21236/ada451568.

3

Ceren, Roi, Prashant Doshi, Matthew Meisel, Adam Goodie, and Dan Hall. Behaviorally Modeling Games of Strategy Using Descriptive Q-learning. Fort Belvoir, VA: Defense Technical Information Center, January 2013. http://dx.doi.org/10.21236/ada575140.

4

Oakley, Louise. K4D International Nature Learning Journey Summary. Institute of Development Studies, September 2022. http://dx.doi.org/10.19088/k4d.2022.129.

Annotation:
The International Nature Learning Journey was developed to support FCDO and other government departments’ understanding, capacity and influence related to nature, particularly in the run-up to COP-26. A series of on-line seminars took place between May and August 2021 which involved an expert speaker on each topic, followed by a case study to provide practical illustrations, and a facilitated Q&A with participants. Each session was chaired by an expert facilitator. Participants included advisors from across several government departments, including FCDO, Defra, BEIS and Treasury, with approximately 150 participants joining each session.
5

Aydin, Orhun. Deep Q-Learning Framework for Quantitative Climate Change Adaptation Policy for Florida Road Network due to Extreme Precipitation. Purdue University, October 2023. http://dx.doi.org/10.5703/1288284317673.

6

García Ferro, Luz Ángela, Elba N. Luna, Lorena Rodríguez, Micha Van Waesberghe, and Darinka Vásquez Jordán. Peer Assist. Inter-American Development Bank, June 2012. http://dx.doi.org/10.18235/0009034.

Annotation:
This document describes the Peer Assist, which is a facilitated workshop, held face-to-face or virtually, in which a diverse group of participants from inside and/or outside the Bank share their experiences and insights with a team that wants to benefit from what others have learned, before it decides on a specific plan or course of action to deal with a significant upcoming challenge. Different from more informal peer to peer learning opportunities (e.g. networking, mentoring, Q&A, BBLs), the Peer Assist is a structured process designed to tackle challenges of greater complexity, uncertainty, or risk.
7

Rinuado, Christina, William Leonard, Christopher Morey, Theresa Coumbe, Jaylen Hopson, and Robert Hilborn. Artificial intelligence (AI)–enabled wargaming agent training. Engineer Research and Development Center (U.S.), April 2024. http://dx.doi.org/10.21079/11681/48419.

Annotation:
Fiscal Year 2021 (FY21) work from the Engineer Research and Development Center Institute for Systems Engineering Research leveraged deep reinforcement learning to develop intelligent systems (red team agents) capable of exhibiting credible behavior within a military course of action wargaming maritime framework infrastructure. Building from the FY21 research, this research effort sought to explore options to improve upon the wargaming framework infrastructure and to investigate opportunities to improve artificial intelligence (AI) agent behavior. Wargaming framework infrastructure enhancements included updates related to supporting agent training, leveraging high-performance computing resources, and developing infrastructure to support AI versus AI agent training and gameplay. After evaluating agent training across different algorithm options, Deep Q-Network–trained agents performed better compared to those trained with Advantage Actor Critic or Proximal Policy Optimization algorithms. Experimentation in varying scenarios revealed acceptable performance from agents trained in the original baseline scenario. By training a blue agent against a previously trained red agent, researchers successfully demonstrated the AI versus AI training and gameplay capability. Observing results from agent gameplay revealed the emergence of behavior indicative of two principles of war, which were economy of force and mass.
8

Soliciting opinions and solutions on the "Q Zhang's Problem". BDICE, March 2023. http://dx.doi.org/10.58911/bdic.2023.03.001.

Annotation:
"Q Zhang's problem" is a teaching problem proposed by Qian Zhang, a science teacher at Dongjiao Minxiang Primary School in Dongcheng District, Beijing. In 2022, she proposed that: (1) when explaining the knowledge points of frequency in the "Sound" unit, experiments on the vibration of objects such as rubber bands and steel rulers were used to assist students in learning, but the effect was not obvious, because it was difficult for the naked eye to distinguish the speed of vibration of objects such as rubber bands, and it was difficult to correspond to the high and low frequencies; (2) Students seem to be confused about the difference between frequency and amplitude. When guiding them to make the rubber band vibrate faster, they tend to tug harder at the rubber band, but this actually changes the amplitude rather than the frequency (changing the frequency should be to control its vibrating chord length, similar to the tuning method of a stringed instrument). Therefore, demonstration experiments using objects such as rubber bands as frequencies do not seem suitable and cannot effectively assist students in establishing the concept of frequency. We hope to solicit opinions and solutions (research ideas) on this problem, with a focus on two points: ① the mathematical/physical explanation of the problem. That is, does simply changing the amplitude really not affect the original vibration frequency of the object (except when the amplitude is 0) ② explanation from a cognitive perspective: Why do people confuse the two concepts? What is the cognitive mechanism behind it.