Dissertations on the topic "Q-learning"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the top 50 dissertations for research on the topic "Q-learning".
Next to every work in the bibliography there is an "Add to bibliography" option. Use it, and your bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scholarly publication as a PDF and read an online annotation of the work, if the relevant parameters are available in the metadata.
Browse dissertations from a wide range of disciplines and compile your bibliography correctly.
Gaskett, Chris. „Q-Learning for Robot Control“. The Australian National University. Research School of Information Sciences and Engineering, 2002. http://thesis.anu.edu.au./public/adt-ANU20041108.192425.
Gaskett, Chris. „Q-Learning for robot control“. View thesis entry in Australian Digital Theses Program, 2002. http://eprints.jcu.edu.au/623/1/gaskettthesis.pdf.
Laivamaa, J. (Juuso). „Reinforcement Q-Learning using OpenAI Gym“. Bachelor's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201903151329.
Del, Ben Enrico <1997>. „Reinforcement Learning: a Q-Learning Algorithm for High Frequency Trading“. Master's Degree Thesis, Università Ca' Foscari Venezia, 2021. http://hdl.handle.net/10579/20411.
Karlsson, Daniel. „Hyperparameter optimisation using Q-learning based algorithms“. Thesis, Karlstads universitet, Fakulteten för hälsa, natur- och teknikvetenskap (from 2013), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-78096.
Machine learning algorithms have many areas of application, both academic and industrial. Examples of applications are classification of diffraction patterns in materials science and classification of properties of chemical compounds in the pharmaceutical industry. For these algorithms to perform well, they need to be optimised. Part of the optimisation takes place when the algorithms are trained, but there are components that cannot be trained. These hyperparameters must be tuned separately. The focus of this work was the optimisation of hyperparameters for classification algorithms based on convolutional neural networks. The purpose of the thesis was to investigate the possibility of using reinforcement learning algorithms, primarily Q-learning, as the optimising algorithm. Three different algorithms were investigated: Q-learning, double Q-learning, and an algorithm inspired by Q-learning that was developed during the course of the work. The algorithms were evaluated on different test problems and compared against results achieved with a random search of the hyperparameter space, which is one of the more common methods for optimising this type of algorithm. All three algorithms showed some form of learning, but only the Q-learning-inspired algorithm performed better than the random search. An iterative implementation of the Q-learning-inspired algorithm was also developed. The iterative method allowed the available hyperparameter space to be refined between iterations. This led to further improvements in the results, indicating that in some cases the computation time could be reduced by up to 40% compared with the random search while maintaining or improving the results.
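The thesis itself is not reproduced here, but the basic idea of letting Q-learning drive a hyperparameter search can be sketched in a few lines: each action fixes one candidate configuration, and the validation score of the trained network is used as the reward. The candidate grid and the train_and_evaluate function below are hypothetical placeholders, not the author's code.

```python
import random
from collections import defaultdict

# Hypothetical search space: each "action" fixes one hyperparameter configuration.
ACTIONS = [
    {"learning_rate": lr, "batch_size": bs}
    for lr in (1e-2, 1e-3, 1e-4)
    for bs in (32, 64, 128)
]

def train_and_evaluate(config):
    """Placeholder: train a CNN with `config` and return a validation accuracy in [0, 1]."""
    return random.random()

def q_learning_search(episodes=50, alpha=0.5, epsilon=0.2):
    # Single-state, bandit-style Q-table: one Q-value per hyperparameter configuration.
    q = defaultdict(float)
    for _ in range(episodes):
        if random.random() < epsilon:                      # explore a random configuration
            a = random.randrange(len(ACTIONS))
        else:                                              # exploit the best estimate so far
            a = max(range(len(ACTIONS)), key=lambda i: q[i])
        reward = train_and_evaluate(ACTIONS[a])            # validation accuracy as reward
        q[a] += alpha * (reward - q[a])                    # one-state Q-learning update
    best = max(range(len(ACTIONS)), key=lambda i: q[i])
    return ACTIONS[best], q[best]

print(q_learning_search())
```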
Finnman, Peter, and Max Winberg. „Deep reinforcement learning compared with Q-table learning applied to backgammon“. Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186545.
Patel, Purvag. „Improving Computer Game Bots' behavior using Q-Learning“. Available to subscribers only, 2009. http://proquest.umi.com/pqdweb?did=1966544161&sid=3&Fmt=2&clientId=1509&RQT=309&VName=PQD.
Burkov, Andriy. „Adaptive Dynamics Learning and Q-initialization in the Context of Multiagent Learning“. Thesis, Université Laval, 2007. http://www.theses.ulaval.ca/2007/24476/24476.pdf.
Multiagent learning is a promising direction of modern and future research in the context of intelligent systems. While the single-agent case has been well studied in the last two decades, the multiagent case has not been broadly studied due to its complexity. When several autonomous agents learn and act simultaneously, the environment becomes strictly unpredictable, and the assumptions made in the single-agent case, such as stationarity and the Markov property, often do not hold in the multiagent context. In this Master's work we study what has been done in this research field and propose an original approach to multiagent learning in the presence of adaptive agents. We explain why such an approach gives promising results by comparing it with other existing approaches. It is important to note that one of the most challenging problems of all multiagent learning algorithms is their high computational complexity. This is due to the fact that the state space size of a multiagent problem is exponential in the number of agents acting in the environment. In this work we propose a novel approach to reducing the complexity of multiagent reinforcement learning. Such an approach makes it possible to significantly reduce the part of the state space that the agents need to visit in order to learn an efficient solution. We then evaluate our algorithms on a set of empirical tests and give a preliminary theoretical result, which is a first step towards establishing the validity of our approaches to multiagent learning.
Cunningham, Bryan. „Non-Reciprocating Sharing Methods in Cooperative Q-Learning Environments“. Thesis, Virginia Tech, 2012. http://hdl.handle.net/10919/34610.
Master of Science
Andersson, Gabriel, and Martti Yap. „Att spela 'Breakout' med hjälp av 'Deep Q-Learning'“. Thesis, KTH, Skolan för teknikvetenskap (SCI), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-255799.
We cover in this report the implementation of a reinforcement learning (RL) algorithm capable of learning how to play the game 'Breakout' on the Atari Learning Environment (ALE). The non-human player (agent) is given no prior information about the game and must learn from the same sensory input that a human would typically receive when playing. The aim is to reproduce previous results by optimizing the agent-driven control of 'Breakout' so as to surpass a typical human score. To this end, the problem is formalized by modeling it as a Markov decision process. We apply the celebrated Deep Q-Learning algorithm with action masking to achieve an optimal strategy. We find our agent's average score to be just below the human benchmark: an average score of 20, approximately 65% of the human counterpart. We discuss a number of implementation choices that boosted agent performance, as well as further techniques that could lead to improvements in the future.
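As a purely illustrative sketch (not the authors' implementation), the two ingredients named above, namely ε-greedy action selection restricted to a valid-action mask and the one-step target used to regress a deep Q-network, can be written as follows; the toy Q-values and mask are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_epsilon_greedy(q_values, valid_mask, epsilon=0.05):
    """Pick an action among the valid ones; q_values and valid_mask have one entry per action."""
    valid_actions = np.flatnonzero(valid_mask)
    if rng.random() < epsilon:
        return int(rng.choice(valid_actions))
    masked_q = np.where(valid_mask, q_values, -np.inf)   # invalid actions can never win the argmax
    return int(np.argmax(masked_q))

def q_learning_target(reward, done, next_q_values, next_valid_mask, gamma=0.99):
    """One-step target r + gamma * max_a' Q(s', a'), the regression target for Q(s, a)."""
    if done:
        return reward
    masked_next = np.where(next_valid_mask, next_q_values, -np.inf)
    return reward + gamma * float(np.max(masked_next))

# Toy usage with 4 actions, of which only 3 are currently valid.
q = np.array([0.1, 0.7, -0.2, 0.3])
mask = np.array([True, True, False, True])
a = masked_epsilon_greedy(q, mask)
y = q_learning_target(reward=1.0, done=False, next_q_values=q, next_valid_mask=mask)
print(a, y)
```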
Wang, Ying. „Cooperative and intelligent control of multi-robot systems using machine learning“. Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/905.
Renner, Michael Robert. „Machine Learning Simulation: Torso Dynamics of Robotic Biped“. Thesis, Virginia Tech, 2007. http://hdl.handle.net/10919/34602.
Master of Science
Chapman, Kevin L. „A Distributed Q-learning Classifier System for task decomposition in real robot learning problems“. Thesis, This resource online, 1996. http://scholar.lib.vt.edu/theses/available/etd-03042009-041449/.
Ekelund, Kalle. „Q-Learning: Ett sätt att lära agenter att spela fotboll“. Thesis, Högskolan i Skövde, Institutionen för kommunikation och information, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-8497.
Hagen, Stephanus Hendrikus Gerhardus ten. „Continuous state space Q-learning for control of nonlinear systems“. [S.l. : Amsterdam : s.n.] ; Universiteit van Amsterdam [Host], 2001. http://dare.uva.nl/document/58530.
Backstad, Sebastian. „Federated Averaging Deep Q-Network: A Distributed Deep Reinforcement Learning Algorithm“. Thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-149637.
Koppula, Sreedevi. „Automated GUI Tests Generation for Android Apps Using Q-learning“. Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984181/.
Sangalli, Andrea <1990>. „Q-Learning: an intelligent stochastic control approach for financial trading“. Master's Degree Thesis, Università Ca' Foscari Venezia, 2015. http://hdl.handle.net/10579/6428.
Popa, Veronica <1992>. „Q-Learning. An intelligent technique for financial trading systems implementation“. Master's Degree Thesis, Università Ca' Foscari Venezia, 2019. http://hdl.handle.net/10579/14103.
Ho, Junius K. (Junius Ku) 1979. „Solving the reader collision problem with a hierarchical Q-learning algorithm“. Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87404.
Includes bibliographical references (leaves 89-90).
by Junius K. Ho.
M.Eng.
Abounadi, Jinane 1966. „Stochastic approximation for non-expansive maps : application to Q-learning algorithms“. Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/10033.
Includes bibliographical references (leaves 129-133).
by Jinane Abounadi.
Ph.D.
Moritz, Johan, and Albin Winkelmann. „Stuck state avoidance through PID estimation training of Q-learning agent“. Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264562.
Reinforcement learning is based on an agent that learns by interacting with its environment. This way of learning can put the agent in situations where it gets stuck and cannot continue training. This thesis explores a method for reducing the risk that an autonomous robot falls over during learning. This is done by training a Q-learning agent to estimate a PID controller before it trains on the balancing problem. We show that our method is comparable in stability to a Q-learning agent without estimation training. During training, the robot falls fewer than half as many times when it is controlled by our method. Both agents manage to balance the robot for a full hour.
Monte, Calvo Alexander. „Learning, Evolution, and Bayesian Estimation in Games and Dynamic Choice Models“. Thesis, University of Oregon, 2014. http://hdl.handle.net/1794/18341.
Gustafsson, Robin, and Lucas Fröjdendahl. „Machine Learning for Traffic Control of Unmanned Mining Machines : Using the Q-learning and SARSA algorithms“. Thesis, KTH, Hälsoinformatik och logistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260285.
Manual configuration of traffic control for unmanned mining machines can be a time-consuming process. If this configuration could be automated, it would pay off in both time and cost. This report presents a machine learning solution using Q-learning and SARSA. The results show that the configuration time could potentially be reduced from 1-2 weeks to, in the worst case, 6 hours, which would lower the cost of deployment. Tests showed that the final solution could run continuously for 24 hours with at least 82% accuracy, compared with 100% when the manual configuration is used. The conclusion is that machine learning may be usable for automatic configuration of traffic control. Further work is required to raise the accuracy to 100% so that it can replace manual configuration. More studies should be carried out to see whether this also holds and is applicable for more complex scenarios with larger mine layouts and more machines.
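For readers comparing the two algorithms named in the title, the only difference lies in the bootstrap term of the update rule. The generic sketch below is not the authors' traffic-control code; the toy states and actions are invented for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy action in the next state."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action actually taken in the next state."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

# Toy Q-table: two states ("junction", "tunnel") and two actions ("go", "wait").
Q = {s: {a: 0.0 for a in ("go", "wait")} for s in ("junction", "tunnel")}
q_learning_update(Q, "junction", "go", r=1.0, s_next="tunnel")
sarsa_update(Q, "junction", "wait", r=0.0, s_next="tunnel", a_next="go")
print(Q)
```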
PEREIRA, ADRIANO BRITO. „PESSIMISTIC Q-LEARNING: AN ALGORITHM TO CREATE BOTS FOR TURN-BASED GAMES“. PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2012. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28809@1.
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
This document presents a new reinforcement learning algorithm, Pessimistic Q-Learning. Our motivation is to solve the problem of generating bots capable of playing turn-based games, and to contribute to better results through this extension of the Q-Learning algorithm. Pessimistic Q-Learning exploits the flexibility of the calculations generated by traditional Q-Learning without resorting to brute force. To measure the quality of the generated bot, we consider quality as the sum of the potential to win and to draw in a game. Our fundamental purpose is to generate bots of good quality for different games, so the algorithm can be applied to families of turn-based games. We developed a framework called Wisebots and carried out experiments with several scenarios applied to the following traditional games: TicTacToe, Connect-4 and CardPoints. Comparing the quality of Pessimistic Q-Learning with that of traditional Q-Learning, we observed gains of 0.8 per cent in TicTacToe, obtaining an algorithm that never loses. We also observed gains of 35 per cent in Connect-4 and 27 per cent in CardPoints, raising both from the 50-60 per cent range to 90-100 per cent quality. These results illustrate the potential for improvement with the use of Pessimistic Q-Learning, suggesting its application to various types of turn-based games.
This document presents a new reinforcement learning algorithm, Pessimistic Q-Learning. Our motivation is to solve the problem of generating bots able to play turn-based games and to contribute to achieving better results through this extension of the Q-Learning algorithm. Pessimistic Q-Learning explores the flexibility of the calculations generated by traditional Q-Learning without the use of brute force. To measure the quality of a generated bot, we consider quality as the sum of the potential to win and to tie in a game. Our fundamental purpose is to generate bots of good quality for different games. Thus, we can apply this algorithm to families of turn-based games. We developed a framework called Wisebots and conducted experiments with several scenarios applied to the traditional games TicTacToe, Connect-4 and CardPoints. Comparing the quality of Pessimistic Q-Learning with traditional Q-Learning, we observed quality gains reaching 100 per cent in TicTacToe, obtaining an algorithm that never loses. We also observed gains of 35 per cent in Connect-4 and 27 per cent in CardPoints, raising both from the 60-80 per cent range to 90-100 per cent quality. These results illustrate the potential for improvement with the use of Pessimistic Q-Learning, suggesting its application to various types of turn-based games.
Soto, Santibanez Miguel Angel. „BUILDING AN ARTIFICIAL CEREBELLUM USING A SYSTEM OF DISTRIBUTED Q-LEARNING AGENTS“. Diss., The University of Arizona, 2010. http://hdl.handle.net/10150/194811.
Cesaro, Enrico <1992>. „Q-Learning: un algoritmo ad apprendimento per rinforzo applicato al trading finanziario“. Master's Degree Thesis, Università Ca' Foscari Venezia, 2018. http://hdl.handle.net/10579/12287.
Singh, Isha. „Reinforcement Learning For Multiple Time Series“. University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1573223551346074.
Howe, Dustin. „Quantile Regression Deep Q-Networks for Multi-Agent System Control“. Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1505241/.
Dunn, Noah M. „A Modified Q-Learning Approach for Predicting Mortality in Patients Diagnosed with Sepsis“. Miami University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=miami1618439089761747.
Webb, Samuel J. „Interpretation and mining of statistical machine learning (Q)SAR models for toxicity prediction“. Thesis, University of Surrey, 2015. http://epubs.surrey.ac.uk/807269/.
Ogunniyi, Samuel. „Energy efficient path planning: the effectiveness of Q-learning algorithm in saving energy“. Master's thesis, University of Cape Town, 2014. http://hdl.handle.net/11427/13308.
In this thesis the author investigated the use of a Q-learning based path planning algorithm to determine how effective it is at saving energy. Pursuing any means of saving energy is important in this day and age, owing to the excessive exploitation of natural resources, and also in order to prevent drops in production in industrial environments where little downtime can be tolerated, or in other applications where a mobile robot running out of energy can be costly or even disastrous, such as search-and-rescue operations or navigation of dangerous environments. The study was undertaken by implementing a Q-learning based path planning algorithm in several unstructured and unknown environments. A cell decomposition method was used to generate the search-space representation of the environments within which the algorithm operated. The results show that the Q-learning planner's paths on average consumed 3.04% less energy than those of the A* path planning algorithm in a square environment with 20% obstacle density, and on average 5.79% more energy than the least-energy paths for the same environment. In the case of rectangular environments, the Q-learning path planning algorithm uses 1.68% less energy than the A* algorithm and 3.26% more energy than the least-energy paths. The implication of this study is to highlight the value of applying learning algorithms to problems whose existing solutions are not learning based, in order to obtain better solutions.
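A minimal sketch of the underlying idea follows, assuming an invented 4x4 grid and made-up per-move energy costs (the thesis's cell-decomposition environments and energy model are not reproduced): the reward is the negative energy spent, so the learned policy prefers low-energy routes to the goal.

```python
import random

# Illustrative 4x4 grid; the goal is the bottom-right cell. Per-move energy costs are assumed.
SIZE, GOAL = 4, (3, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ENERGY_COST = {"up": 1.2, "down": 0.8, "left": 1.0, "right": 1.0}  # assumed asymmetric costs

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    next_state = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    # Reward is the negative energy spent, plus a bonus for reaching the goal.
    reward = -ENERGY_COST[action] + (10.0 if next_state == GOAL else 0.0)
    return next_state, reward, next_state == GOAL

Q = {(r, c): {a: 0.0 for a in ACTIONS} for r in range(SIZE) for c in range(SIZE)}

def train(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            a = (random.choice(list(ACTIONS)) if random.random() < epsilon
                 else max(Q[state], key=Q[state].get))
            nxt, r, done = step(state, a)
            Q[state][a] += alpha * (r + gamma * max(Q[nxt].values()) - Q[state][a])
            state = nxt

train()
print(max(Q[(0, 0)], key=Q[(0, 0)].get))  # lowest-energy first move learned from the start cell
```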
Lima, Júnior Manoel Leandro de. „Uma contribuição à solução do problema dos k-servos usando aprendizagem por reforço“. Universidade Federal do Rio Grande do Norte, 2005. http://repositorio.ufrn.br:8080/jspui/handle/123456789/15405.
In this work a new online algorithm is proposed for solving the k-server problem (KSP). The performance of this solution is compared with that of other algorithms in the literature, namely the Harmonic and Work Function algorithms, which have been shown to be competitive, making them meaningful benchmarks for comparison. An algorithm that performs efficiently relative to them tends to be competitive as well, although this would obviously have to be proved; such a proof, however, is beyond the scope of the present work. The algorithm presented for solving the KSP is based on reinforcement learning techniques. To this end, the problem was modelled as a multi-stage decision process, to which the Q-Learning algorithm, one of the most popular solution methods for establishing optimal policies in this type of decision problem, was applied. It should be noted, however, that the size of the storage structure used by reinforcement learning to obtain the optimal policy grows with the number of states and actions, which in turn is proportional to the number n of nodes and k of servers. When this growth is analysed mathematically, it is seen to be exponential, limiting the application of the method to smaller problems, where the number of nodes and servers is small. This problem, called the curse of dimensionality, was introduced by Bellman and implies that an algorithm cannot be executed for certain instances of a problem because the computational resources needed to produce its output are exhausted. So that the proposed solution, based exclusively on reinforcement learning, is not restricted to small applications, an alternative solution is proposed for more realistic problems involving a larger number of nodes and servers. This alternative solution is hierarchical and uses two methods for solving the KSP: reinforcement learning, applied to a reduced number of nodes obtained through an aggregation process, and a greedy method, applied to the subsets of nodes resulting from the aggregation process, where the criterion for scheduling the servers is based on the shortest distance to the demand location.
Magnus, Sonia de Paula Faria. „Estrategias de aprendizagem em lingua estrangeira : um estudo "Q"“. [s.n.], 2005. http://repositorio.unicamp.br/jspui/handle/REPOSIP/269241.
Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem
Abstract: The main goal of this study is to investigate which foreign language learning strategies (among cognitive, metacognitive and social/affective strategies) are consciously used by learners from the 7ª and 8ª séries in Brazilian public schools. Demographic aspects such as origin, age and sex were also explored in order to check whether they exert any influence on learning strategy use. Four distinct methodologies were adopted in this research: focus groups, factor analysis, a 'Q' study and interviews. The first data collection was obtained through discussions in focus groups, from which strategies were collected, analyzed, selected and submitted to a questionnaire and factor analysis. The factors which emerged from this analysis were used to compose the items in the 'Q' study, from which it was possible to identify three distinct points of view. The first one (point of view/factor 'A') showed the typical 'good student's' profile (e.g. one who takes notes and reviews exercises); the second one (point of view/factor 'B') revealed the 'lazy student's' profile from a more traditional point of view (subjects sharing this point of view do not classify words nor copy dialogues, for example); and the third (point of view/factor 'C'), the most difficult one to interpret, could only be better understood after interviews with each of the two most typical subjects for this point of view. We found that subjects who share this point of view are autonomous and usually prefer practical tasks. It was also possible to verify that each point of view was more affected by one or another demographic aspect examined in this study.
Master's degree
Teaching and Learning of Second and Foreign Languages
Master in Applied Linguistics
Funkquist, Mikaela, and Minghua Lu. „Distributed Optimization Through Deep Reinforcement Learning“. Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-293878.
Reinforcement learning methods allow self-learning units to play video and board games autonomously. The project aims to study the efficiency of the reinforcement learning methods Q-learning and deep Q-learning in dynamic problems. The goal is to train robots so that they can move through a warehouse in the best possible way without colliding. A virtual environment was created, in which the algorithms were tested by simulating moving agents. The efficiency of the algorithms was evaluated by how quickly the agents learned to perform predetermined tasks. The results show that Q-learning works well for simple problems with few agents, where systems with two active agents were solved quickly. Deep Q-learning works better for more complex systems containing more agents, but cases of suboptimal movements occurred. Both algorithms showed good potential within their respective areas, but improvements must be made before they can be used in real applications.
Bachelor's thesis in electrical engineering 2020, KTH, Stockholm
Damio, Siti Maftuhah. „Seeking Malay trainee English teachers' perceptions of autonomy in language learning using Q methodology“. Thesis, University of Nottingham, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.594590.
Ondroušek, Vít. „Využití opakovaně posilovaného učení pro řízení čtyřnohého robotu“. Doctoral thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2011. http://www.nusl.cz/ntk/nusl-233985.
Kung, Teju, and 孔德如. „Gait Balancing by Q-learning“. Thesis, 2011. http://ndltd.ncl.edu.tw/handle/12698998321901616016.
National Chung Cheng University
Graduate Institute of Electrical Engineering
99
In the context of humanoid biped robots, building a robot model with 18 dimensions and applying this model to achieve balanced robot behavior requires a large amount of mathematical derivation. This thesis presents a study of biped walking and balance control using reinforcement learning. The algorithm lets a robot learn to walk without any prior knowledge of an explicit dynamics model. Meanwhile, to make applications of reinforcement learning more practical, it is important to equip the system with a continuous action policy, because this enhances the effectiveness of learning and also makes robots better adapted to the real environment. Based on Tabu Search and conventional Q-learning, which generate a mapping between a paradigm action and a discrete state space, the proposed reinforcement learning algorithm is developed to handle reinforcement learning in a continuous action domain by means of a self-organized state aggregation mechanism. The learning architecture is developed to solve more complex control problems: it spans the basis discrete actions to construct a continuous action policy. The architecture allows the dimensionality of the state space and the cardinality of the action set to be scaled as they come to represent new knowledge or new requirements for the desired task. Simulation analysis, using the simulation model of a physical biped robot, shows that a biped robot can perform its basic walking skill with a priori knowledge and then learn to improve its behavior in terms of walking speed and restricted positions of the center of mass by incorporating human intuitive balancing knowledge and walking evaluation knowledge.
Gaskett, Chris. „Q-Learning for Robot Control“. PhD thesis, 2002. http://hdl.handle.net/1885/47080.
Gaskett, Chris. „Q-Learning for robot control“. Thesis, 2002. http://hdl.handle.net/1885/47080.
„Asynchronous stochastic approximation and Q-learning“. Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, 1993. http://hdl.handle.net/1721.1/3312.
Liu, Hsiu-Chen, and 劉繡禎. „Combining Q-Learning with Hybrid Learning Approach in RoboCup“. Thesis, 2013. http://ndltd.ncl.edu.tw/handle/s2z5ry.
National Taipei University of Technology
Master's Program in Electrical Engineering and Computer Science
101
RoboCup is an international competition founded in 1997. Its mission is: "By mid-21st century, a team of fully autonomous humanoid robot soccer players shall win a soccer game, complying with the official rules of FIFA, against the winner of the most recent World Cup". Academically, RoboCup provides an excellent test bed for machine learning. As in a real soccer game, the environment states are constantly changing; therefore, how to make a soccer agent learn autonomously to act with the best responses has become an important issue. The paper "Applying Hybrid Learning Approach to RoboCup's Strategy" discusses the hybrid learning approach in this field. In this work, to carry the concept forward, we continue to apply the hybrid learning approach for the coach agent, while for the player agent we apply the Q-Learning method. Furthermore, in order to cope with the excessive number of environment states, which slows down the learning rate, we use fuzzy states and fuzzy rules to shrink the state space and to simplify the state-action table of Q-Learning. Finally, we build a soccer team in the RoboCup Soccer simulator in which both the coach agent and the player agents have learning ability. Through experiments, we analyze and compare the learning effects and the efficiency of execution.
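The fuzzy-state idea mentioned above can be illustrated with a toy sketch: a continuous observation (here, a distance) is mapped to a handful of fuzzy labels, and the Q-table is indexed by those labels instead of raw values. The membership functions and actions below are invented, not taken from the thesis.

```python
def memberships(distance):
    """Triangular fuzzy memberships for a distance in metres (made-up break points)."""
    near = max(0.0, 1.0 - distance / 10.0)
    mid = max(0.0, 1.0 - abs(distance - 10.0) / 5.0)
    far = max(0.0, min(1.0, (distance - 5.0) / 10.0))
    return {"near": near, "mid": mid, "far": far}

def fuzzy_state(distance):
    """Collapse the continuous distance to the label with the highest membership."""
    m = memberships(distance)
    return max(m, key=m.get)

# The Q-table is now indexed by a handful of fuzzy labels instead of raw continuous values.
ACTIONS = ("dash", "pass", "shoot")
Q = {s: {a: 0.0 for a in ACTIONS} for s in ("near", "mid", "far")}
print(fuzzy_state(3.0), fuzzy_state(9.0), fuzzy_state(25.0))
```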
Herrmann, Michael, and Ralf Der. „Efficient Q-Learning by Division of Labor“. 1995. https://ul.qucosa.de/id/qucosa%3A32942.
Tsai, Shih-Hung, and 蔡世宏. „Fuzzy Q-Learning based Hybrid ARQ for HSDPA“. Thesis, 2007. http://ndltd.ncl.edu.tw/handle/31870432212612583075.
National Chiao Tung University
College of Electrical and Computer Engineering, Industrial Master's Program in Communication and Network Technology
95
In order to provide higher-speed and more effective downlink packet data service in 3G, high speed downlink packet access (HSDPA) was proposed by the 3rd Generation Partnership Project (3GPP). HSDPA adopts a technique called adaptive modulation and coding (AMC) to use different modulation orders and coding rates for different channel conditions. It provides more multi-codes for transmission, and it also adopts the hybrid automatic repeat request (HARQ) scheme to make the transmission mechanism more effective. In order to adapt to changes in channel conditions, HSDPA adopts a shorter transmission time interval (TTI) of 2 ms to achieve more effective resource allocation. In this thesis, a fuzzy Q-learning based HARQ scheme for HSDPA is proposed. With the HARQ scheme modeled as a Markov decision process (MDP), we use a fuzzy Q-learning algorithm to learn the modulation and coding rates of initial transmissions. Our objective is to satisfy the quality of service (QoS) requirements while keeping a high data rate. The simulation results show that the proposed scheme can indeed reach this objective and is feasible under different channel conditions.
Wu, Li Hsuan, and 吳俐萱. „An Improved Dyna-Q algorithm via Indirect Learning“. Thesis, 2013. http://ndltd.ncl.edu.tw/handle/14795815166083051340.
National Chung Cheng University
Graduate Institute of Electrical Engineering
101
In this thesis, we apply additional algorithms, such as Ant Colony Optimization (ACO) and prioritized sweeping, to improve the learning speed of the Dyna-Q learning algorithm. The agent interacts with the environment, learns a policy by Q-learning, and builds the interaction information into a virtual backward prediction model. As the agent explores the unknown environment, it chooses actions according to an exploration factor. In the Dyna architecture, the planning step samples random state-action pairs that have been experienced; prioritized sweeping (breadth-first planning) has been proposed to improve the Dyna-Q planning method. In this thesis, we propose two planning methods: depth-first planning and hybrid planning. Depth-first planning applies concepts from the ACO algorithm together with the exploration factor; hybrid planning combines the advantages of depth-first planning with prioritized sweeping (breadth-first planning). To address shortcomings of the model, we propose model shaping, which predicts information missing from the model. To verify the proposed methods, we run simulations in maze and mountain-car environments. The simulation results show that the proposed methods can improve the speed and efficiency of learning.
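For orientation, a generic Dyna-Q loop is sketched below: the agent updates Q from real experience, stores each transition in a learned model, and replays sampled transitions from that model for a fixed number of planning steps. The depth-first and hybrid planning variants proposed in the thesis are not reproduced; the corridor environment is a made-up example.

```python
import random

def dyna_q(env_step, states, actions, start, episodes=100, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Generic Dyna-Q: learn from real experience, then replay a learned model n times per step."""
    Q = {s: {a: 0.0 for a in actions} for s in states}
    model = {}                                            # (s, a) -> (reward, next_state)
    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = (random.choice(actions) if random.random() < epsilon
                 else max(Q[s], key=Q[s].get))
            s2, r, done = env_step(s, a)                  # real experience
            Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
            model[(s, a)] = (r, s2)                       # update the deterministic model
            for _ in range(planning_steps):               # planning: replay remembered transitions
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2].values()) - Q[ps][pa])
            s = s2
    return Q

# Toy 1-D corridor: states 0..4, reach state 4 for a reward of +1.
def corridor_step(s, a):
    s2 = min(s + 1, 4) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

Q = dyna_q(corridor_step, states=range(5), actions=["right", "left"], start=0)
print(max(Q[0], key=Q[0].get))  # expect "right"
```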
Su, Wei-De, and 蘇偉德. „Combination of Q-learning and Fuzzy-State for Learning of RoboCup Agent“. Thesis, 2010. http://ndltd.ncl.edu.tw/handle/jg434k.
National Taipei University of Technology
Graduate Institute of Computer Science and Information Engineering
98
Artificial intelligence has always been an interesting topic in computer science, and in this area machine learning is the key to success. RoboCup (Robot World Cup Tournament) is a competition that has become a popular research domain in recent years; it includes real-robot as well as computer-simulation games and also provides comprehensive rules and mechanisms. Academically, it provides an excellent test bed for machine learning. In a soccer game, the environment states are constantly changing. Therefore, in this paper we apply the Q-Learning method, a kind of reinforcement learning, to the learning of a RoboCup agent. In addition, in order to solve the problem of an excessive number of environment states, which leads to a slow learning rate, we use fuzzy states and fuzzy rules to reduce the state space and the state-action table of Q-Learning.
Lin, Shin-Yi, and 林欣儀. „Hardware Implementation of a FAST-based Q-learning Algorithm“. Thesis, 2006. http://ndltd.ncl.edu.tw/handle/01215306387784303539.
National Chung Cheng University
Institute of Electrical Engineering
94
This thesis studies a modified Flexible Adaptable-Size Topology (FAST) neural architecture and its integration with reinforcement learning algorithms. The performance of the proposed architecture is verified on the task of balancing an inverted pendulum. First, an architecture in which the FAST algorithm is combined with the Adaptive Heuristic Critic (AHC) algorithm is studied. The performance of this architecture in the inverted-pendulum simulation is not as good as expected. Hence, we modify three portions of the FAST algorithm, combine it with AHC, and achieve better results in the same simulation. However, it is difficult to realize the FAST-with-AHC structure in hardware due to the complexity of the AHC structure. The structurally simpler Q-learning and Q(λ) reinforcement learning algorithms are therefore chosen for integration with the modified FAST, and their superior performance is likewise verified through the inverted-pendulum simulation. Finally, an implementation of the modified FAST on an FPGA is developed to produce a classification chip that can learn.
Tsai, Cheng-Yu, and 蔡承佑. „Online Learning System Using Q&A management mechanism“. Thesis, 2010. http://ndltd.ncl.edu.tw/handle/90277341659478981179.
National Central University
Institute of Computer Science and Information Engineering
98
Students usually turn to online information when they study. In this way they can make up for the insufficiency of the textbook content, resolve their questions and acquire more complete knowledge. However, sometimes students cannot find a satisfying solution on the web; then, besides asking somebody in real life, they can also post a question on a forum and discuss it with others to come up with a proper answer. However, because the forum is independent of the textbook, discussions aimed at particular concepts in the textbook are scattered all over the forum. Even if students search with keywords, they often find irrelevant discussions and have to spend much time picking out the useful discussion threads they are really interested in. Moreover, when students post new questions, they must frequently go back to the threads to check whether someone has answered them. And when they want to keep discussing with others in certain threads, they must first search those threads out in the forum each time. Tracing the progression of discussion threads wastes much time and effort, making students unwilling to ask questions and discuss in an online forum. Hence we propose a framework that combines a Q&A management mechanism with the textbook content. In this way, students can easily find related discussions in the forum while they read the textbook, and they can subscribe to threads of interest and monitor their status and progression. This information helps students decide which thread is worth reading in the current context.
Lu, Xiaoqi. „Regularized Greedy Gradient Q-Learning with Mobile Health Applications“. Thesis, 2021. https://doi.org/10.7916/d8-zv13-2p78.
Yin, Huang Chiao, and 黃巧瑩. „HARQ Process for HSDPA by Fuzzy Q-learning Technique“. Thesis, 2008. http://ndltd.ncl.edu.tw/handle/15151366792629085484.
National Chiao Tung University
Institute of Communication Engineering
96
In order to provide higher-speed and more effective downlink packet data service in 3G, high speed downlink packet access (HSDPA) was proposed by the 3rd Generation Partnership Project (3GPP). An important QoS requirement defined in the specification for the hybrid automatic repeat request (HARQ) process is to choose a suitable modulation and coding scheme (MCS) that keeps the initial block error rate (BLER) below 0.1 based on the channel quality information. In this thesis, we propose a fuzzy Q-learning based HARQ (FQL-HARQ) scheme for HSDPA to solve this problem. The HARQ scheme is modeled as a Markov decision process (MDP). On the one hand, a fuzzy rule is designed to maintain the BLER requirement by separating the decision into different cases based on short-term BLER performance. On the other hand, by considering both link adaptation and the HARQ version, the Q-learning algorithm is used to learn the performance of each MCS under different environments. After learning, the MCS with the highest throughput that does not violate the BLER requirement is chosen for the initial transmission. The simulation results show that the proposed scheme can indeed choose a suitable MCS for the initial transmission while taking channel-information delay into consideration. Compared with other traditional schemes, the FQL-HARQ scheme achieves higher system throughput while maintaining the BLER requirement.
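One plausible reading of the decision rule described above (not the thesis's actual scheme) is sketched below: per-MCS throughput and BLER are tracked with Q-style running averages, and the highest-throughput MCS whose estimated BLER meets the 0.1 target is selected. The MCS names and rates are illustrative assumptions.

```python
# Candidate MCS levels with assumed peak rates (kbit/s per TTI); values are illustrative only.
MCS_RATES = {"QPSK-1/3": 120.0, "QPSK-3/4": 270.0, "16QAM-1/2": 360.0, "16QAM-3/4": 540.0}

q_rate = {m: 0.0 for m in MCS_RATES}    # learned average throughput per MCS
bler = {m: 0.0 for m in MCS_RATES}      # running estimate of initial-transmission BLER
ALPHA = 0.1                             # learning rate for both running averages

def update(mcs, block_error):
    """After each initial transmission, update the estimates for the MCS that was used."""
    achieved = 0.0 if block_error else MCS_RATES[mcs]
    q_rate[mcs] += ALPHA * (achieved - q_rate[mcs])
    bler[mcs] += ALPHA * ((1.0 if block_error else 0.0) - bler[mcs])

def choose_mcs(bler_target=0.1):
    """Highest learned throughput among the MCSs whose estimated BLER meets the target."""
    feasible = [m for m in MCS_RATES if bler[m] <= bler_target]
    pool = feasible or list(MCS_RATES)          # fall back to all if nothing is feasible yet
    return max(pool, key=lambda m: q_rate[m])

update("16QAM-3/4", block_error=True)
update("QPSK-3/4", block_error=False)
print(choose_mcs())
```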