Dissertations / Theses on the topic 'Q-learning'

Consult the top 50 dissertations / theses for your research on the topic 'Q-learning.'

1

Gaskett, Chris. "Q-Learning for Robot Control." The Australian National University. Research School of Information Sciences and Engineering, 2002. http://thesis.anu.edu.au./public/adt-ANU20041108.192425.

Full text
Abstract:
Q-Learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received rewards. Q-Learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing
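
For orientation, the discretised, tabular form of Q-learning that the thesis argues against can be sketched in a few lines of Python (a generic illustration; the sizes and constants are assumptions, not taken from the thesis):

    import numpy as np

    n_states, n_actions = 100, 4          # assumed sizes for a discretised task
    Q = np.zeros((n_states, n_actions))   # tabular action-value estimates
    alpha, gamma = 0.1, 0.99              # learning rate and discount factor

    def q_update(s, a, reward, s_next):
        """One Q-learning backup: Q(s,a) += alpha * (TD target - Q(s,a))."""
        td_target = reward + gamma * np.max(Q[s_next])   # greedy bootstrap
        Q[s, a] += alpha * (td_target - Q[s, a])
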
APA, Harvard, Vancouver, ISO, and other styles
2

Gaskett, Chris. "Q-Learning for robot control." View thesis entry in Australian Digital Theses Program, 2002. http://eprints.jcu.edu.au/623/1/gaskettthesis.pdf.

Full text
Abstract:
Q-Learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received rewards. Q-Learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing
APA, Harvard, Vancouver, ISO, and other styles
3

Laivamaa, J. (Juuso). "Reinforcement Q-Learning using OpenAI Gym." Bachelor's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201903151329.

Full text
Abstract:
Abstract. Q-Learning is an off-policy algorithm for reinforcement learning that can be used to find optimal policies in Markovian domains. This thesis is about how Q-Learning can be applied to a test environment in the OpenAI Gym toolkit. The utility of testing the algorithm on a problem case is to find out how well it performs, as well as to prove the practical utility of the algorithm. This thesis starts off with a general overview of reinforcement learning as well as the Markov decision process, both of which are crucial in understanding the theoretical groundwork that Q-Learning is based on. A
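
The abstract is truncated before naming the specific Gym task, so purely as an illustration of the kind of experiment described, a minimal tabular Q-learning loop against a discrete Gym environment might look like this sketch (assuming the classic pre-0.26 gym step/reset API and the FrozenLake-v1 task, neither of which is confirmed by the abstract):

    import gym
    import numpy as np

    env = gym.make("FrozenLake-v1")       # assumed environment, for illustration
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1

    for episode in range(5000):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done, _ = env.step(a)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
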
APA, Harvard, Vancouver, ISO, and other styles
4

Del Ben, Enrico <1997>. "Reinforcement Learning: a Q-Learning Algorithm for High Frequency Trading." Master's Degree Thesis, Università Ca' Foscari Venezia, 2021. http://hdl.handle.net/10579/20411.

Full text
Abstract:
The scope of this work is to test the implementation of an automated trading system based on Reinforcement Learning: a machine learning algorithm in which an intelligent agent acts to maximize its rewards given the environment around it. Indeed, given the environmental inputs and the environmental responses to the actions taken, the agent will learn how to behave in the best way possible. In particular, in this work, a Q-Learning algorithm has been used to produce trading signals on the basis of high-frequency data of the Limit Order Book for some selected stocks.
APA, Harvard, Vancouver, ISO, and other styles
5

Karlsson, Daniel. "Hyperparameter optimisation using Q-learning based algorithms." Thesis, Karlstads universitet, Fakulteten för hälsa, natur- och teknikvetenskap (from 2013), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-78096.

Full text
Abstract:
Machine learning algorithms have many applications, both for academic and industrial purposes. Examples of applications are classification of diffraction patterns in materials science and classification of properties in chemical compounds within the pharmaceutical industry. For these algorithms to be successful they need to be optimised; part of this is achieved by training the algorithm, but there are components of the algorithms that cannot be trained. These hyperparameters have to be tuned separately. The focus of this work was optimisation of hyperparameters in classification algorithms b
APA, Harvard, Vancouver, ISO, and other styles
6

Finnman, Peter, and Max Winberg. "Deep reinforcement learning compared with Q-table learning applied to backgammon." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186545.

Full text
Abstract:
Reinforcement learning attempts to mimic how humans react to their surrounding environment by giving feedback to software agents based on the actions they take. To test the capabilities of these agents, researchers have long regarded board games as a powerful tool. This thesis compares two approaches to reinforcement learning in the board game backgammon, a Q-table and a deep reinforcement network. It was determined which approach surpassed the other in terms of accuracy and convergence rate towards the perceived optimal strategy. The evaluation is performed by training the agents using the sel
APA, Harvard, Vancouver, ISO, and other styles
7

Patel, Purvag. "Improving Computer Game Bots' behavior using Q-Learning." Available to subscribers only, 2009. http://proquest.umi.com/pqdweb?did=1966544161&sid=3&Fmt=2&clientId=1509&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Burkov, Andriy. "Adaptive Dynamics Learning and Q-initialization in the Context of Multiagent Learning." Thesis, Université Laval, 2007. http://www.theses.ulaval.ca/2007/24476/24476.pdf.

Full text
Abstract:
Multiagent learning is a promising direction for recent and future research in the context of intelligent systems. While the single-agent case has been studied extensively over the last two decades, the multiagent case has received little study owing to its complexity. When several autonomous agents learn and act simultaneously, the environment becomes strictly unpredictable, and all the assumptions made in the single-agent case, such as stationarity and the Markov property, often prove inapplicable in the multiagent context. In this master's
APA, Harvard, Vancouver, ISO, and other styles
9

Cunningham, Bryan. "Non-Reciprocating Sharing Methods in Cooperative Q-Learning Environments." Thesis, Virginia Tech, 2012. http://hdl.handle.net/10919/34610.

Full text
Abstract:
Past research on multi-agent simulation with cooperative reinforcement learning (RL) for homogeneous agents focuses on developing sharing strategies that are adopted and used by all agents in the environment. These sharing strategies are considered to be reciprocating because all participating agents have a predefined agreement regarding what type of information is shared, when it is shared, and how the participating agent's policies are subsequently updated. The sharing strategies are specifically designed around manipulating this shared information to improve learning performance. This thesi
APA, Harvard, Vancouver, ISO, and other styles
10

Andersson, Gabriel, and Martti Yap. "Att spela 'Breakout' med hjälp av 'Deep Q-Learning'." Thesis, KTH, Skolan för teknikvetenskap (SCI), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-255799.

Full text
Abstract:
In this report we implement a reinforcement learning (RL) algorithm that learns to play Breakout in the Atari Learning Environment. The computer-driven player (the agent) has access to the same information as a human player and knows nothing about the game and its rules beforehand. The goal is to reproduce earlier results by optimising the agent so that it surpasses the typical average human score. To do this we formalise the problem as a Markov decision process. We apply the deep Q-learning algorithm with action masking to reach an optimal strategy. We find
APA, Harvard, Vancouver, ISO, and other styles
11

Wang, Ying. "Cooperative and intelligent control of multi-robot systems using machine learning." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/905.

Full text
Abstract:
This thesis investigates cooperative and intelligent control of autonomous multi-robot systems in a dynamic, unstructured and unknown environment and makes significant original contributions with regard to self-deterministic learning for robot cooperation, evolutionary optimization of robotic actions, improvement of system robustness, vision-based object tracking, and real-time performance. A distributed multi-robot architecture is developed which will facilitate operation of a cooperative multi-robot system in a dynamic and unknown environment in a self-improving, robust, and real-time manner
APA, Harvard, Vancouver, ISO, and other styles
12

Renner, Michael Robert. "Machine Learning Simulation: Torso Dynamics of Robotic Biped." Thesis, Virginia Tech, 2007. http://hdl.handle.net/10919/34602.

Full text
Abstract:
Military, Medical, Exploratory, and Commercial robots have much to gain from exchanging wheels for legs. However, the equations of motion of dynamic bipedal walker models are highly coupled and non-linear, making the selection of an appropriate control scheme difficult. A temporal difference reinforcement learning method known as Q-learning develops complex control policies through environmental exploration and exploitation. As a proof of concept, Q-learning was applied through simulation to a benchmark single pendulum swing-up/balance task; the value function was first approximated with a loo
APA, Harvard, Vancouver, ISO, and other styles
13

Chapman, Kevin L. "A Distributed Q-learning Classifier System for task decomposition in real robot learning problems." Thesis, This resource online, 1996. http://scholar.lib.vt.edu/theses/available/etd-03042009-041449/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Ekelund, Kalle. "Q-Learning: Ett sätt att lära agenter att spela fotboll." Thesis, Högskolan i Skövde, Institutionen för kommunikation och information, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-8497.

Full text
Abstract:
The artificial intelligence in games often uses rule-based techniques for its behaviour. This has made artificial agents predictable, which is very evident in sports games. This work evaluated whether the learning technique Q-learning is better at playing football than a rule-based technique, the state machine. To evaluate this, a simplified football simulation was created in which the two teams each used one of the techniques. The two teams then played 100 matches against each other to see which team/technique was best. Statistics from the matches
APA, Harvard, Vancouver, ISO, and other styles
15

Hagen, Stephanus Hendrikus Gerhardus ten. "Continuous state space Q-learning for control of nonlinear systems." [S.l. : Amsterdam : s.n.] ; Universiteit van Amsterdam [Host], 2001. http://dare.uva.nl/document/58530.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Backstad, Sebastian. "Federated Averaging Deep Q-Network: A Distributed Deep Reinforcement Learning Algorithm." Thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-149637.

Full text
Abstract:
In the telecom sector, there is a huge amount of rich data generated every day. This trend will increase with the launch of 5G networks. Telco companies are interested in analyzing their data to shape and improve their core businesses. However, there can be a number of limiting factors that prevent them from logging data to central data centers for analysis. Some examples include data privacy, data transfer, network latency, etc. In this work, we present a distributed Deep Reinforcement Learning (DRL) method called Federated Averaging Deep Q-Network (FADQN), which employs a distributed hierarc
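
The abstract is cut off before detailing FADQN itself; as a rough sketch of only the federated-averaging step it is named after (not the thesis's actual algorithm or hierarchy), a server can replace the global model with the element-wise mean of the workers' weights:

    import numpy as np

    def federated_average(worker_weights):
        """worker_weights: one list of np.ndarray layers per worker."""
        n = len(worker_weights)
        return [sum(layers) / n for layers in zip(*worker_weights)]

    # Example: two workers, each holding a two-layer network of random weights.
    w1 = [np.random.randn(4, 8), np.random.randn(8, 2)]
    w2 = [np.random.randn(4, 8), np.random.randn(8, 2)]
    global_model = federated_average([w1, w2])
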
APA, Harvard, Vancouver, ISO, and other styles
17

Koppula, Sreedevi. "Automated GUI Tests Generation for Android Apps Using Q-learning." Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984181/.

Full text
Abstract:
Mobile applications are growing in popularity and pose new problems in the area of software testing. In particular, mobile applications heavily depend upon user interactions and a dynamically changing environment of system events. In this thesis, we focus on user-driven events and use Q-learning, a reinforcement machine learning algorithm, to generate tests for Android applications under test (AUT). We implement a framework that automates the generation of GUI test cases by using our Q-learning approach and compare it to a uniform random (UR) implementation. A novel feature of our approach is
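
As an illustration only (the thesis's state abstraction, event extraction, and reward design are its own), Q-learning over GUI events might reward transitions that reach previously unseen screens, as in this hypothetical sketch:

    import random
    from collections import defaultdict

    Q = defaultdict(float)               # Q[(screen, event)] -> value
    alpha, gamma, epsilon = 0.5, 0.9, 0.2
    visited = set()                      # screens seen so far

    def choose_event(screen, events):
        """Epsilon-greedy choice among the events available on a screen."""
        if random.random() < epsilon:
            return random.choice(events)
        return max(events, key=lambda e: Q[(screen, e)])

    def update(screen, event, next_screen, next_events):
        """Reward reaching a new screen, then do a standard Q-learning backup."""
        reward = 0.0 if next_screen in visited else 1.0
        visited.add(next_screen)
        best_next = max((Q[(next_screen, e)] for e in next_events), default=0.0)
        Q[(screen, event)] += alpha * (reward + gamma * best_next - Q[(screen, event)])
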
APA, Harvard, Vancouver, ISO, and other styles
18

Sangalli, Andrea <1990>. "Q-Learning: an intelligent stochastic control approach for financial trading." Master's Degree Thesis, Università Ca' Foscari Venezia, 2015. http://hdl.handle.net/10579/6428.

Full text
Abstract:
The objective is to implement a financial trading system, using MATLAB® software, to solve a stochastic control problem: the management of a capital. It is an automated, model-free machine learning system based on the Reinforcement Learning method, in particular Q-Learning. This approach is developed as an algorithm which optimizes its behavior in real time based on the reactions it gets from the environment in which it operates. This project is based on a newly emerging theory of market efficiency, called the Adaptive Market Hypothesis (AMH). I present an algorithm which might perfor
APA, Harvard, Vancouver, ISO, and other styles
19

Popa, Veronica <1992>. "Q-Learning. An intelligent technique for financial trading systems implementation." Master's Degree Thesis, Università Ca' Foscari Venezia, 2019. http://hdl.handle.net/10579/14103.

Full text
Abstract:
In this thesis I consider a Reinforcement Learning (RL) approach for policy evaluation, in particular the Q-Learning algorithm (QLa). The QLa is able to dynamically optimize, in real time, its behaviour on the basis of the feedback it receives from the surrounding environment. First, I introduce the theory of the Adaptive Market Hypothesis (AMH), on which an active portfolio management is based, as an evolution of the Efficient Market Hypothesis (EMH). Then, the essential aspects of the RL method are explained. Different parameters and values for Financial Trading Systems (FTSs) are presented in
APA, Harvard, Vancouver, ISO, and other styles
20

Ho, Junius K. (Junius Ku) 1979. "Solving the reader collision problem with a hierarchical Q-learning algorithm." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87404.

Full text
Abstract:
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (leaves 89-90). By Junius K. Ho. M.Eng.
APA, Harvard, Vancouver, ISO, and other styles
21

Abounadi, Jinane 1966. "Stochastic approximation for non-expansive maps : application to Q-learning algorithms." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/10033.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (leaves 129-133). By Jinane Abounadi. Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
22

Moritz, Johan, and Albin Winkelmann. "Stuck state avoidance through PID estimation training of Q-learning agent." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264562.

Full text
Abstract:
Reinforcement learning is conceptually based on an agent learning through interaction with its environment. This trial-and-error learning method makes the process prone to situations in which the agent is stuck in a dead-end, from which it cannot keep learning. This thesis studies a method to diminish the risk that a wheeled inverted pendulum, or WIP, falls over during training by having a Q-learning based agent estimate a PID controller before training it on the balance problem. We show that our approach is equally stable compared to a Q-learning agent without estimation training, while having
APA, Harvard, Vancouver, ISO, and other styles
23

Monte, Calvo Alexander. "Learning, Evolution, and Bayesian Estimation in Games and Dynamic Choice Models." Thesis, University of Oregon, 2014. http://hdl.handle.net/1794/18341.

Full text
Abstract:
This dissertation explores the modeling and estimation of learning in strategic and individual choice settings. While learning has been extensively used in economics, I introduce the concept into standard models in unorthodox ways. In each case, changing the perspective of what learning is drastically changes standard models. Estimation proceeds using advanced Bayesian techniques which perform very well in simulated data. The first chapter proposes a framework called Experienced-Based Ability (EBA) in which players increase the payoffs of a particular strategy in the future through using
APA, Harvard, Vancouver, ISO, and other styles
24

Gustafsson, Robin, and Lucas Fröjdendahl. "Machine Learning for Traffic Control of Unmanned Mining Machines : Using the Q-learning and SARSA algorithms." Thesis, KTH, Hälsoinformatik och logistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260285.

Full text
Abstract:
Manual configuration of rules for unmanned mining machine traffic control can be time-consuming and therefore expensive. This paper presents a Machine Learning approach for automatic configuration of rules for traffic control in mines with autonomous mining machines by using Q-learning and SARSA. The results show that automation might be able to cut the time taken to configure traffic rules from 1-2 weeks to a maximum of approximately 6 hours, which would decrease the cost of deployment. Tests show that in the worst case the developed solution is able to run continuously for 24 hours 82% of the
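
The one-step updates of the two algorithms compared in the thesis differ only in how the next value is bootstrapped; a minimal side-by-side sketch (illustrative constants, with Q as a nested table such as a list of lists):

    alpha, gamma = 0.1, 0.9

    def q_learning_update(Q, s, a, r, s_next):
        # off-policy: bootstraps from the greedy action in the next state
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

    def sarsa_update(Q, s, a, r, s_next, a_next):
        # on-policy: bootstraps from the action actually taken next
        Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])
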
APA, Harvard, Vancouver, ISO, and other styles
25

PEREIRA, ADRIANO BRITO. "PESSIMISTIC Q-LEARNING: AN ALGORITHM TO CREATE BOTS FOR TURN-BASED GAMES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2012. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28809@1.

Full text
Abstract:
This document presents a new reinforcement learning algorithm, Pessimistic Q-Learning. Our motivation is to solve the problem of generating bots capable of playing turn-based games, and to contribute to obtaining better results through this extension of the Q-Learning algorithm. Pessimistic Q-Learning exploits the flexibility of the computations generated by traditional Q-Learning without resorting to brute force. To measure the quality of the bo
APA, Harvard, Vancouver, ISO, and other styles
26

Soto, Santibanez Miguel Angel. "BUILDING AN ARTIFICIAL CEREBELLUM USING A SYSTEM OF DISTRIBUTED Q-LEARNING AGENTS." Diss., The University of Arizona, 2010. http://hdl.handle.net/10150/194811.

Full text
Abstract:
About 400 million years ago sharks developed a separate co-processor in their brains that not only made them faster but also more precisely coordinated. This co-processor, which is nowadays called cerebellum, allowed sharks to outperform their peers and survive as one of the fittest. For the last 40 years or so, researchers have been attempting to provide robots and other machines with this type of capability. This thesis discusses currently used methods to create artificial cerebellums and points out two main shortcomings: 1) framework usability issues and 2) building blocks incompatibility i
APA, Harvard, Vancouver, ISO, and other styles
27

Cesaro, Enrico <1992&gt. "Q-Learning: un algoritmo ad apprendimento per rinforzo applicato al trading finanziario." Master's Degree Thesis, Università Ca' Foscari Venezia, 2018. http://hdl.handle.net/10579/12287.

Full text
Abstract:
"In an efficient market prices fully reflect all available information". È con questa celeberrima frase che Fama, nel 1969, teorizzava le ipotesi di un mercato efficiente, presupponendo informazioni disponibili a costo zero ed individui razionali. Il più grande contributo di Fama fu quello di accendere un dibattito ultra trentennale che ha spinto la letteratura accademica alla ricerca di teorie e prove empiriche che confutassero il lavoro dell'economista statunitense, attingendo a materie interdisciplinari che considerassero non solo teorie economiche, ma anche comportamentali ed ambientali. R
APA, Harvard, Vancouver, ISO, and other styles
28

Singh, Isha. "Reinforcement Learning For Multiple Time Series." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1573223551346074.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Howe, Dustin. "Quantile Regression Deep Q-Networks for Multi-Agent System Control." Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1505241/.

Full text
Abstract:
Training autonomous agents that are capable of performing their assigned job without fail is the ultimate goal of deep reinforcement learning. This thesis introduces a dueling Quantile Regression Deep Q-network, where the network learns the state value quantile function and advantage quantile function separately. With this network architecture the agent is able to learn to control simulated robots in the Gazebo simulator. Carefully crafted reward functions and state spaces must be designed for the agent to learn in complex non-stationary environments. When trained for only 100,000 timesteps,
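
As a sketch of the quantile-regression ingredient only (the dueling value/advantage split and the full network are beyond this snippet, and the thesis may use the Huber-smoothed variant), the pinball loss over predicted quantiles can be written as:

    import numpy as np

    def quantile_loss(theta, targets, taus):
        """theta: (N,) predicted quantiles; targets: (M,) TD targets; taus: (N,) midpoints."""
        u = targets[None, :] - theta[:, None]        # pairwise TD errors
        weight = np.abs(taus[:, None] - (u < 0.0))   # asymmetric quantile weighting
        return float(np.mean(weight * np.abs(u)))

    N = 4
    taus = (np.arange(N) + 0.5) / N                  # fixed quantile midpoints
    loss = quantile_loss(np.zeros(N), np.array([0.2, -0.1, 0.4]), taus)
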
APA, Harvard, Vancouver, ISO, and other styles
30

Dunn, Noah M. "A Modified Q-Learning Approach for Predicting Mortality in Patients Diagnosed with Sepsis." Miami University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=miami1618439089761747.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Webb, Samuel J. "Interpretation and mining of statistical machine learning (Q)SAR models for toxicity prediction." Thesis, University of Surrey, 2015. http://epubs.surrey.ac.uk/807269/.

Full text
Abstract:
Structure Activity Relationship (SAR) modelling capitalises on techniques developed within the computer science community, particularly in the fields of machine learning and data mining. These machine learning approaches are often developed for the optimisation of model accuracy, which can come at the expense of the interpretation of the prediction. Highly predictive models should be the goal of any modeller; however, the intended users of the model and all factors relating to usage of the model should be considered. One such aspect is the clarity, understanding and explanation for the predicti
APA, Harvard, Vancouver, ISO, and other styles
32

Ogunniyi, Samuel. "Energy efficient path planning: the effectiveness of Q-learning algorithm in saving energy." Master's thesis, University of Cape Town, 2014. http://hdl.handle.net/11427/13308.

Full text
Abstract:
Includes bibliographical references.<br>In this thesis the author investigated the use of a Q-learning based path planning algorithm to investigate how effective it is in saving energy. It is important to pursue any means to save energy in this day and age, due to the excessive exploitation of natural resources and in order to prevent drops in production in industrial environments where less downtime is necessary or other applications where a mobile robot running out of energy can be costly or even disastrous, such as search and rescue operations or dangerous environment navigation. The study
APA, Harvard, Vancouver, ISO, and other styles
33

Lima Júnior, Manoel Leandro de. "Uma contribuição à solução do problema dos k-servos usando aprendizagem por reforço." Universidade Federal do Rio Grande do Norte, 2005. http://repositorio.ufrn.br:8080/jspui/handle/123456789/15405.

Full text
Abstract:
This work proposes a new online algorithm for solving the k-Server Problem (KSP). The performance of this solution is compared with that of other algorithms in the literature, namely the Harmonic and Work Function algorithms, which have been shown to be competitive, making them significant benchmarks for comparison. An algorithm that performs efficiently relative to them tends to be com
APA, Harvard, Vancouver, ISO, and other styles
34

Magnus, Sonia de Paula Faria. "Estratégias de aprendizagem em língua estrangeira: um estudo 'Q'." [s.n.], 2005. http://repositorio.unicamp.br/jspui/handle/REPOSIP/269241.

Full text
Abstract:
Advisor: Linda Gentry El-Dash. Master's dissertation, Universidade Estadual de Campinas, Instituto de Estudos da Linguagem. The main objective of this study is to investigate which foreign language learning strategies (cognitive, metacognitive and socio-affective) are consciously used by 7th and 8th grade learners in Brazilian public schools. Demographic aspects
APA, Harvard, Vancouver, ISO, and other styles
35

Funkquist, Mikaela, and Minghua Lu. "Distributed Optimization Through Deep Reinforcement Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-293878.

Full text
Abstract:
Reinforcement learning methods allow self-learning agents to play video and board games autonomously. This project aims to study the efficiency of the reinforcement learning algorithms Q-learning and deep Q-learning for dynamical multi-agent problems. The goal is to train robots to optimally navigate through a warehouse without colliding. A virtual environment was created, in which the learning algorithms were tested by simulating moving agents. The algorithms' efficiency was evaluated by how fast the agents learned to perform predetermined tasks. The results show that Q-learning excels in simple p
APA, Harvard, Vancouver, ISO, and other styles
36

Damio, Siti Maftuhah. "Seeking Malay trainee English teachers' perceptions of autonomy in language learning using Q methodology." Thesis, University of Nottingham, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.594590.

Full text
Abstract:
Greater learner autonomy has, in recent decades, become recognised as a worthy goal in the field of foreign language learning. However, it is difficult to find a consensus view in the literature concerning the definition of the concept. One area of debate is the extent to which autonomy in language learning (ALL) is universally applicable, or whether versions of autonomy exist which are defined culturally or contextually. To attempt to address this debate, a study was carried out in the specific context of Malaysian higher education. The subjects were four cohorts, totalling 31 participants, o
APA, Harvard, Vancouver, ISO, and other styles
37

Ondroušek, Vít. "Využití opakovaně posilovaného učení pro řízení čtyřnohého robotu." Doctoral thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2011. http://www.nusl.cz/ntk/nusl-233985.

Full text
Abstract:
The Ph.D. thesis is focused on using reinforcement learning for four-legged robot control. The main aim is to create an adaptive control system for the walking robot, which will be able to plan the walking gait through the Q-learning algorithm. This aim is achieved using the design of a complex three-layered architecture, which is based on the DEDS paradigm. A small set of elementary reactive behaviors forms the basis of the proposed solution. The set of composite control laws is designed using simultaneous activations of these behaviors. Both types of controllers are able to operate on the pla
APA, Harvard, Vancouver, ISO, and other styles
38

Kung, Teju (孔德如). "Gait Balancing by Q-learning." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/12698998321901616016.

Full text
Abstract:
碩士<br>國立中正大學<br>電機工程研究所<br>99<br>In the context of the humanoid biped robot, for building a robot model with 18 dimensions, and applying this model to achieve the balance of robot behavior at the same time needs for large amount of calculation of mathematical derivations. The study on biped walking and balance control using reinforcement learning is presented in this paper. The algorithm can lead a robot learn to walk without any previous knowledge about any explicit dynamics model. Meanwhile, to put applications of reinforcement learning into more practice, it is an important issue to equip t
APA, Harvard, Vancouver, ISO, and other styles
39

Gaskett, Chris. "Q-Learning for Robot Control." Phd thesis, 2002. http://hdl.handle.net/1885/47080.

Full text
Abstract:
Q-Learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received rewards. Q-Learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing
APA, Harvard, Vancouver, ISO, and other styles
40

Gaskett, Chris. "Q-Learning for robot control." Thesis, 2002. http://hdl.handle.net/1885/47080.

Full text
Abstract:
Q-Learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received rewards. Q-Learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing
APA, Harvard, Vancouver, ISO, and other styles
41

"Asynchronous stochastic approximation and Q-learning." Massachusetts Institute of Technology, Laboratory for Information and Decision Systems], 1993. http://hdl.handle.net/1721.1/3312.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Liu, Hsiu-Chen (劉繡禎). "Combining Q-Learning with Hybrid Learning Approach in RoboCup." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/s2z5ry.

Full text
Abstract:
碩士<br>國立臺北科技大學<br>電資碩士班<br>101<br>RoboCup is an international competition developed in 1997. The mission is “By mid-21st century, a team of fully autonomous humanoid robot soccer players shall win the soccer game, comply with the official rule of the FIFA, against the winner of the most recent World Cup”. For academic, RoboCup provides an excellent test bed for machine learning. As in a soccer game, environment states are constantly changing. Therefore, how to make a soccer agent learn autonomously to act with the best responses has becomes an important issue. The paper “Applying Hybrid Le
APA, Harvard, Vancouver, ISO, and other styles
43

Herrmann, Michael, and Ralf Der. "Efficient Q-Learning by Division of Labor." 1995. https://ul.qucosa.de/id/qucosa%3A32942.

Full text
Abstract:
Q-learning as well as other learning paradigms depend strongly on the representation of the underlying state space. As a special case of the hidden state problem we investigate the effect of a self-organizing discretization of the state space in a simple control problem. We apply the neural gas algorithm with adaptation of learning rate and neighborhood range to a simulated cart-pole problem. The learning parameters are determined by the ambiguity of successful actions inside each cell.
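
For orientation, a bare neural-gas adaptation step of the kind the paper builds on (without the authors' ambiguity-driven adaptation of learning rate and neighborhood range) looks like this:

    import numpy as np

    rng = np.random.default_rng(0)
    codebook = rng.uniform(-1, 1, size=(16, 4))   # 16 cells over a 4-D state space

    def neural_gas_step(x, eps=0.05, lam=2.0):
        """Pull every codebook vector toward x with rank-decaying strength."""
        dists = np.linalg.norm(codebook - x, axis=1)
        ranks = np.argsort(np.argsort(dists))     # rank 0 = closest cell
        codebook += eps * np.exp(-ranks / lam)[:, None] * (x - codebook)

    neural_gas_step(rng.uniform(-1, 1, size=4))
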
APA, Harvard, Vancouver, ISO, and other styles
44

Tsai, Shih-Hung (蔡世宏). "Fuzzy Q-Learning based Hybrid ARQ for HSDPA." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/31870432212612583075.

Full text
Abstract:
碩士<br>國立交通大學<br>電機學院通訊與網路科技產業專班<br>95<br>In order to provide higher speed and more effective downlink packet data service in 3G, high speed downlink packet access (HSDPA) is proposed by 3rd generation partnership project (3GPP). HSDPA adopts a technique called adaptive modulation and coding (AMC) to use different modulation orders and coding rates corresponding to different channel conditions. It provides more multi-codes for transmissions, and also adopts the hybrid automatic retransmission request (HARQ) scheme to make the transmission mechanism more effective. In order to adapt the changes
APA, Harvard, Vancouver, ISO, and other styles
45

Wu, Li-Hsuan (吳俐萱). "An Improved Dyna-Q algorithm via Indirect Learning." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/14795815166083051340.

Full text
Abstract:
碩士<br>國立中正大學<br>電機工程研究所<br>101<br>In this thesis, we applied more algorithms, such as Ant Colony Optimization (ACO), Prioritized Sweeping, to improve the problem of learning speed in Dyna-Q learning algorithms. The agent interacts with environment and learns policy by Q-learning, and builds interactive information to the virtual backward prediction model. As the agent explores the unknown environment, makes a decision action by the exploration factor. In Dyna architecture, the planning method produces the random state-action pairs that have been experienced; however, prioritized sweeping (brea
APA, Harvard, Vancouver, ISO, and other styles
46

Su, Wei-De (蘇偉德). "Combination of Q-learning and Fuzzy-State for Learning of RoboCup Agent." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/jg434k.

Full text
Abstract:
碩士<br>國立臺北科技大學<br>資訊工程系研究所<br>98<br>Artificial intelligence always been interesting in computer science,and in this area machine learning is the key to success, RoboCup(Robot World Cup Tournament) is a competition game which has already become a popular research domain in recent years, includes the real robot as well as computer simulation games and also provide comprehensive rules and mechanisms. In Academic,it provides a best test-bed for machine learning. As the soccer game, the environment states are always changing.Therefor, in this paper, we use the Q-Learning method that is a kind of re
APA, Harvard, Vancouver, ISO, and other styles
47

Lin, Shin-Yi (林欣儀). "Hardware Implementation of a FAST-based Q-learning Algorithm." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/01215306387784303539.

Full text
Abstract:
碩士<br>國立中正大學<br>電機工程所<br>94<br>This thesis studies a modified Flexible Adaptable-Size Topology (FAST) Neural Architecture, and its integration with reinforcement algorithms. Through the control of balancing an inverted pendulum, the performance of the proposed architecture is verified. In the article, that an architecture, in which the FAST algorithm and the Adaptive Heuristic Critic (AHC) algorithm is combined, is studied. The performance of this architecture is not as what it has been expected when balancing an inverted pendulum is simulated. Hence, we modify three portions of the FAST algor
APA, Harvard, Vancouver, ISO, and other styles
48

Tsai, Cheng-Yu (蔡承佑). "Online Learning System Using Q&A management mechanism." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/90277341659478981179.

Full text
Abstract:
碩士<br>國立中央大學<br>資訊工程研究所<br>98<br>Students usually turn to on-line information when they study. Thus they can make up for the infertility of the textbook contents, solving their questions and acquiring more complete knowledge. However, sometimes students can not find satisfying solution on the web, then except for asking somebody in real-life, they can also post a question on the forum and discuss with others in the forum to come up with a proper answer.   However, because the forum is independent from the textbook, discussions that aimed at some particular concepts in the textbook is scattered
APA, Harvard, Vancouver, ISO, and other styles
49

Lu, Xiaoqi. "Regularized Greedy Gradient Q-Learning with Mobile Health Applications." Thesis, 2021. https://doi.org/10.7916/d8-zv13-2p78.

Full text
Abstract:
Recent advances in health and technology have made mobile apps a viable approach to delivering behavioral interventions in areas including physical activity encouragement, smoking cessation, substance abuse prevention, and mental health management. Due to the chronic nature of most of the disorders and the heterogeneity among mobile users, delivery of the interventions needs to be sequential and tailored to individual needs. We operationalize the sequential decision making via a policy that takes a mobile user's past usage pattern and health status as input and outputs an app/intervention recommenda
APA, Harvard, Vancouver, ISO, and other styles
50

Yin, Huang-Chiao (黃巧瑩). "HARQ Process for HSDPA by Fuzzy Q-learning Technique." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/15151366792629085484.

Full text
Abstract:
碩士<br>國立交通大學<br>電信工程系所<br>96<br>In order to provide higher speed and more effective downlink packet data service in 3G, high speed downlink packet access (HSDPA) is proposed by 3rd generation partnership project (3GPP). An important QoS requirement defined in spec for the hybrid automatic retransmission request (HARQ) process is to choose a suitable MCS to maintain the initial block error rate (BLER) smaller than 0.1 based on the channel quality information. In this thesis, we proposed a fuzzy Q-learning based HARQ (FQL-HARQ) scheme for HSDPA to solve this problem. The HARQ scheme is modeled a
APA, Harvard, Vancouver, ISO, and other styles