Dissertations on the topic "Q-learning"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the top 50 dissertations for research on the topic "Q-learning".
Next to every work in the bibliography there is an "Add to bibliography" option. Use it, and your bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scholarly publication as a PDF and read an online annotation of the work, if the relevant parameters are available in the metadata.
Browse dissertations from a wide range of disciplines and compile your bibliography correctly.
Gaskett, Chris. „Q-Learning for Robot Control“. The Australian National University. Research School of Information Sciences and Engineering, 2002. http://thesis.anu.edu.au./public/adt-ANU20041108.192425.
Gaskett, Chris. „Q-Learning for robot control“. View thesis entry in Australian Digital Theses Program, 2002. http://eprints.jcu.edu.au/623/1/gaskettthesis.pdf.
Laivamaa, J. (Juuso). „Reinforcement Q-Learning using OpenAI Gym“. Bachelor's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201903151329.
Del, Ben Enrico <1997>. „Reinforcement Learning: a Q-Learning Algorithm for High Frequency Trading“. Master's Degree Thesis, Università Ca' Foscari Venezia, 2021. http://hdl.handle.net/10579/20411.
Karlsson, Daniel. „Hyperparameter optimisation using Q-learning based algorithms“. Thesis, Karlstads universitet, Fakulteten för hälsa, natur- och teknikvetenskap (from 2013), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-78096.
Machine learning algorithms have many areas of application, both academic and industrial. Examples of applications are classification of diffraction patterns in materials science and classification of properties of chemical compounds in the pharmaceutical industry. For these algorithms to perform well, they need to be optimised. Part of the optimisation takes place when the algorithms are trained, but there are components that cannot be trained. These hyperparameters must be tuned separately. The focus of this work was the optimisation of hyperparameters for classification algorithms based on convolutional neural networks. The purpose of the thesis was to investigate the possibility of using reinforcement learning algorithms, primarily Q-learning, as the optimising algorithm. Three different algorithms were investigated: Q-learning, double Q-learning, and an algorithm inspired by Q-learning that was developed during the course of the work. The algorithms were evaluated on different test problems and compared against results achieved with a random search of the hyperparameter space, which is one of the more common methods for optimising this type of algorithm. All three algorithms showed some form of learning, but only the Q-learning-inspired algorithm performed better than the random search. An iterative implementation of the Q-learning-inspired algorithm was also developed. The iterative method allowed the available hyperparameter space to be refined between iterations. This led to further improvements in the results, indicating that in some cases the computation time could be reduced by up to 40% compared with the random search while maintaining or improving the results.
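The thesis itself is not reproduced here, but the basic idea of letting Q-learning drive a hyperparameter search can be sketched in a few lines: each action fixes one candidate configuration, and the validation score of the trained network is used as the reward. The candidate grid and the train_and_evaluate function below are hypothetical placeholders, not the author's code.

```python
import random
from collections import defaultdict

# Hypothetical search space: each "action" fixes one hyperparameter configuration.
ACTIONS = [
    {"learning_rate": lr, "batch_size": bs}
    for lr in (1e-2, 1e-3, 1e-4)
    for bs in (32, 64, 128)
]

def train_and_evaluate(config):
    """Placeholder: train a CNN with `config` and return a validation accuracy in [0, 1]."""
    return random.random()

def q_learning_search(episodes=50, alpha=0.5, epsilon=0.2):
    # Single-state, bandit-style Q-table: one Q-value per hyperparameter configuration.
    q = defaultdict(float)
    for _ in range(episodes):
        if random.random() < epsilon:                      # explore a random configuration
            a = random.randrange(len(ACTIONS))
        else:                                              # exploit the best estimate so far
            a = max(range(len(ACTIONS)), key=lambda i: q[i])
        reward = train_and_evaluate(ACTIONS[a])            # validation accuracy as reward
        q[a] += alpha * (reward - q[a])                    # one-state Q-learning update
    best = max(range(len(ACTIONS)), key=lambda i: q[i])
    return ACTIONS[best], q[best]

print(q_learning_search())
```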
Finnman, Peter, and Max Winberg. „Deep reinforcement learning compared with Q-table learning applied to backgammon“. Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186545.
Patel, Purvag. „Improving Computer Game Bots' behavior using Q-Learning“. Available to subscribers only, 2009. http://proquest.umi.com/pqdweb?did=1966544161&sid=3&Fmt=2&clientId=1509&RQT=309&VName=PQD.
Burkov, Andriy. „Adaptive Dynamics Learning and Q-initialization in the Context of Multiagent Learning“. Thesis, Université Laval, 2007. http://www.theses.ulaval.ca/2007/24476/24476.pdf.
Multiagent learning is a promising direction of modern and future research in the context of intelligent systems. While the single-agent case has been well studied in the last two decades, the multiagent case has not been broadly studied due to its complexity. When several autonomous agents learn and act simultaneously, the environment becomes strictly unpredictable, and the assumptions made in the single-agent case, such as stationarity and the Markov property, often do not hold in the multiagent context. In this Master's work we study what has been done in this research field and propose an original approach to multiagent learning in the presence of adaptive agents. We explain why such an approach gives promising results by comparing it with other existing approaches. It is important to note that one of the most challenging problems of all multiagent learning algorithms is their high computational complexity. This is due to the fact that the state space size of a multiagent problem is exponential in the number of agents acting in the environment. In this work we propose a novel approach to reducing the complexity of multiagent reinforcement learning. Such an approach makes it possible to significantly reduce the part of the state space that the agents need to visit in order to learn an efficient solution. We then evaluate our algorithms on a set of empirical tests and give a preliminary theoretical result, which is a first step towards establishing the validity of our approaches to multiagent learning.
Cunningham, Bryan. „Non-Reciprocating Sharing Methods in Cooperative Q-Learning Environments“. Thesis, Virginia Tech, 2012. http://hdl.handle.net/10919/34610.
Master of Science
Andersson, Gabriel, and Martti Yap. „Att spela 'Breakout' med hjälp av 'Deep Q-Learning'“. Thesis, KTH, Skolan för teknikvetenskap (SCI), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-255799.
We cover in this report the implementation of a reinforcement learning (RL) algorithm capable of learning how to play the game 'Breakout' on the Atari Learning Environment (ALE). The non-human player (agent) is given no prior information about the game and must learn from the same sensory input that a human would typically receive when playing. The aim is to reproduce previous results by optimizing the agent-driven control of 'Breakout' so as to surpass a typical human score. To this end, the problem is formalized by modeling it as a Markov decision process. We apply the celebrated Deep Q-Learning algorithm with action masking to achieve an optimal strategy. We find our agent's average score to be just below the human benchmark: an average score of 20, approximately 65% of the human counterpart. We discuss a number of implementation choices that boosted agent performance, as well as further techniques that could lead to improvements in the future.
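As a purely illustrative sketch (not the authors' implementation), the two ingredients named above, namely ε-greedy action selection restricted to a valid-action mask and the one-step target used to regress a deep Q-network, can be written as follows; the toy Q-values and mask are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_epsilon_greedy(q_values, valid_mask, epsilon=0.05):
    """Pick an action among the valid ones; q_values and valid_mask have one entry per action."""
    valid_actions = np.flatnonzero(valid_mask)
    if rng.random() < epsilon:
        return int(rng.choice(valid_actions))
    masked_q = np.where(valid_mask, q_values, -np.inf)   # invalid actions can never win the argmax
    return int(np.argmax(masked_q))

def q_learning_target(reward, done, next_q_values, next_valid_mask, gamma=0.99):
    """One-step target r + gamma * max_a' Q(s', a'), the regression target for Q(s, a)."""
    if done:
        return reward
    masked_next = np.where(next_valid_mask, next_q_values, -np.inf)
    return reward + gamma * float(np.max(masked_next))

# Toy usage with 4 actions, of which only 3 are currently valid.
q = np.array([0.1, 0.7, -0.2, 0.3])
mask = np.array([True, True, False, True])
a = masked_epsilon_greedy(q, mask)
y = q_learning_target(reward=1.0, done=False, next_q_values=q, next_valid_mask=mask)
print(a, y)
```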
Wang, Ying. „Cooperative and intelligent control of multi-robot systems using machine learning“. Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/905.
Renner, Michael Robert. „Machine Learning Simulation: Torso Dynamics of Robotic Biped“. Thesis, Virginia Tech, 2007. http://hdl.handle.net/10919/34602.
Master of Science
Chapman, Kevin L. „A Distributed Q-learning Classifier System for task decomposition in real robot learning problems“. Thesis, This resource online, 1996. http://scholar.lib.vt.edu/theses/available/etd-03042009-041449/.
Ekelund, Kalle. „Q-Learning: Ett sätt att lära agenter att spela fotboll“. Thesis, Högskolan i Skövde, Institutionen för kommunikation och information, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-8497.
Hagen, Stephanus Hendrikus Gerhardus ten. „Continuous state space Q-learning for control of nonlinear systems“. [S.l. : Amsterdam : s.n.] ; Universiteit van Amsterdam [Host], 2001. http://dare.uva.nl/document/58530.
Backstad, Sebastian. „Federated Averaging Deep Q-Network: A Distributed Deep Reinforcement Learning Algorithm“. Thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-149637.
Koppula, Sreedevi. „Automated GUI Tests Generation for Android Apps Using Q-learning“. Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984181/.
Sangalli, Andrea <1990>. „Q-Learning: an intelligent stochastic control approach for financial trading“. Master's Degree Thesis, Università Ca' Foscari Venezia, 2015. http://hdl.handle.net/10579/6428.
Popa, Veronica <1992>. „Q-Learning. An intelligent technique for financial trading systems implementation“. Master's Degree Thesis, Università Ca' Foscari Venezia, 2019. http://hdl.handle.net/10579/14103.
Ho, Junius K. (Junius Ku) 1979. „Solving the reader collision problem with a hierarchical Q-learning algorithm“. Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87404.
Includes bibliographical references (leaves 89-90).
by Junius K. Ho.
M.Eng.
Abounadi, Jinane 1966. „Stochastic approximation for non-expansive maps : application to Q-learning algorithms“. Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/10033.
Includes bibliographical references (leaves 129-133).
by Jinane Abounadi.
Ph.D.
Moritz, Johan, and Albin Winkelmann. „Stuck state avoidance through PID estimation training of Q-learning agent“. Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264562.
Reinforcement learning is based on an agent that learns by interacting with its environment. This way of learning can put the agent in situations where it gets stuck and cannot continue training. This thesis explores a method for reducing the risk that an autonomous robot falls over during learning. This is done by training a Q-learning agent to estimate a PID controller before it trains on the balancing problem. We show that our method is comparable in stability to a Q-learning agent without estimation training. During training, the robot falls fewer than half as many times when it is controlled by our method. Both agents manage to balance the robot for a full hour.
Monte, Calvo Alexander. „Learning, Evolution, and Bayesian Estimation in Games and Dynamic Choice Models“. Thesis, University of Oregon, 2014. http://hdl.handle.net/1794/18341.
Gustafsson, Robin, and Lucas Fröjdendahl. „Machine Learning for Traffic Control of Unmanned Mining Machines : Using the Q-learning and SARSA algorithms“. Thesis, KTH, Hälsoinformatik och logistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260285.
Manual configuration of traffic control for unmanned mining machines can be a time-consuming process. If this configuration could be automated, it would pay off in both time and cost. This report presents a machine learning solution using Q-learning and SARSA. The results show that the configuration time could potentially be reduced from 1-2 weeks to, in the worst case, 6 hours, which would lower the cost of deployment. Tests showed that the final solution could run continuously for 24 hours with at least 82% accuracy, compared with 100% when the manual configuration is used. The conclusion is that machine learning may be usable for automatic configuration of traffic control. Further work is required to raise the accuracy to 100% so that it can replace manual configuration. More studies should be carried out to see whether this also holds and is applicable for more complex scenarios with larger mine layouts and more machines.
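For readers comparing the two algorithms named in the title, the only difference lies in the bootstrap term of the update rule. The generic sketch below is not the authors' traffic-control code; the toy states and actions are invented for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy action in the next state."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action actually taken in the next state."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

# Toy Q-table: two states ("junction", "tunnel") and two actions ("go", "wait").
Q = {s: {a: 0.0 for a in ("go", "wait")} for s in ("junction", "tunnel")}
q_learning_update(Q, "junction", "go", r=1.0, s_next="tunnel")
sarsa_update(Q, "junction", "wait", r=0.0, s_next="tunnel", a_next="go")
print(Q)
```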
PEREIRA, ADRIANO BRITO. „PESSIMISTIC Q-LEARNING: AN ALGORITHM TO CREATE BOTS FOR TURN-BASED GAMES“. PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2012. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28809@1.
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
This document presents a new reinforcement learning algorithm, Pessimistic Q-Learning. Our motivation is to solve the problem of generating bots capable of playing turn-based games, and to contribute to better results through this extension of the Q-Learning algorithm. Pessimistic Q-Learning exploits the flexibility of the calculations generated by traditional Q-Learning without resorting to brute force. To measure the quality of the generated bot, we consider quality as the sum of the potential to win and to draw in a game. Our fundamental purpose is to generate bots of good quality for different games, so the algorithm can be applied to families of turn-based games. We developed a framework called Wisebots and carried out experiments with several scenarios applied to the following traditional games: TicTacToe, Connect-4 and CardPoints. Comparing the quality of Pessimistic Q-Learning with that of traditional Q-Learning, we observed gains of 0.8 per cent in TicTacToe, obtaining an algorithm that never loses. We also observed gains of 35 per cent in Connect-4 and 27 per cent in CardPoints, raising both from the 50-60 per cent range to 90-100 per cent quality. These results illustrate the potential for improvement with the use of Pessimistic Q-Learning, suggesting its application to various types of turn-based games.
This document presents a new reinforcement learning algorithm, Pessimistic Q-Learning. Our motivation is to solve the problem of generating bots able to play turn-based games and to contribute to achieving better results through this extension of the Q-Learning algorithm. Pessimistic Q-Learning explores the flexibility of the calculations generated by traditional Q-Learning without the use of brute force. To measure the quality of a generated bot, we consider quality as the sum of the potential to win and to tie in a game. Our fundamental purpose is to generate bots of good quality for different games. Thus, we can apply this algorithm to families of turn-based games. We developed a framework called Wisebots and conducted experiments with several scenarios applied to the traditional games TicTacToe, Connect-4 and CardPoints. Comparing the quality of Pessimistic Q-Learning with traditional Q-Learning, we observed quality gains reaching 100 per cent in TicTacToe, obtaining an algorithm that never loses. We also observed gains of 35 per cent in Connect-4 and 27 per cent in CardPoints, raising both from the 60-80 per cent range to 90-100 per cent quality. These results illustrate the potential for improvement with the use of Pessimistic Q-Learning, suggesting its application to various types of turn-based games.
Soto, Santibanez Miguel Angel. „BUILDING AN ARTIFICIAL CEREBELLUM USING A SYSTEM OF DISTRIBUTED Q-LEARNING AGENTS“. Diss., The University of Arizona, 2010. http://hdl.handle.net/10150/194811.
Cesaro, Enrico <1992>. „Q-Learning: un algoritmo ad apprendimento per rinforzo applicato al trading finanziario“. Master's Degree Thesis, Università Ca' Foscari Venezia, 2018. http://hdl.handle.net/10579/12287.
Singh, Isha. „Reinforcement Learning For Multiple Time Series“. University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1573223551346074.
Howe, Dustin. „Quantile Regression Deep Q-Networks for Multi-Agent System Control“. Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1505241/.
Dunn, Noah M. „A Modified Q-Learning Approach for Predicting Mortality in Patients Diagnosed with Sepsis“. Miami University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=miami1618439089761747.
Webb, Samuel J. „Interpretation and mining of statistical machine learning (Q)SAR models for toxicity prediction“. Thesis, University of Surrey, 2015. http://epubs.surrey.ac.uk/807269/.
Ogunniyi, Samuel. „Energy efficient path planning: the effectiveness of Q-learning algorithm in saving energy“. Master's thesis, University of Cape Town, 2014. http://hdl.handle.net/11427/13308.
In this thesis the author investigated the use of a Q-learning based path planning algorithm to determine how effective it is at saving energy. Pursuing any means of saving energy is important in this day and age, owing to the excessive exploitation of natural resources, and also in order to prevent drops in production in industrial environments where little downtime can be tolerated, or in other applications where a mobile robot running out of energy can be costly or even disastrous, such as search-and-rescue operations or navigation of dangerous environments. The study was undertaken by implementing a Q-learning based path planning algorithm in several unstructured and unknown environments. A cell decomposition method was used to generate the search-space representation of the environments within which the algorithm operated. The results show that the Q-learning planner's paths on average consumed 3.04% less energy than those of the A* path planning algorithm in a square environment with 20% obstacle density, and on average 5.79% more energy than the least-energy paths for the same environment. In the case of rectangular environments, the Q-learning path planning algorithm uses 1.68% less energy than the A* algorithm and 3.26% more energy than the least-energy paths. The implication of this study is to highlight the value of applying learning algorithms to problems whose existing solutions are not learning based, in order to obtain better solutions.
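A minimal sketch of the underlying idea follows, assuming an invented 4x4 grid and made-up per-move energy costs (the thesis's cell-decomposition environments and energy model are not reproduced): the reward is the negative energy spent, so the learned policy prefers low-energy routes to the goal.

```python
import random

# Illustrative 4x4 grid; the goal is the bottom-right cell. Per-move energy costs are assumed.
SIZE, GOAL = 4, (3, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ENERGY_COST = {"up": 1.2, "down": 0.8, "left": 1.0, "right": 1.0}  # assumed asymmetric costs

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    next_state = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    # Reward is the negative energy spent, plus a bonus for reaching the goal.
    reward = -ENERGY_COST[action] + (10.0 if next_state == GOAL else 0.0)
    return next_state, reward, next_state == GOAL

Q = {(r, c): {a: 0.0 for a in ACTIONS} for r in range(SIZE) for c in range(SIZE)}

def train(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            a = (random.choice(list(ACTIONS)) if random.random() < epsilon
                 else max(Q[state], key=Q[state].get))
            nxt, r, done = step(state, a)
            Q[state][a] += alpha * (r + gamma * max(Q[nxt].values()) - Q[state][a])
            state = nxt

train()
print(max(Q[(0, 0)], key=Q[(0, 0)].get))  # lowest-energy first move learned from the start cell
```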
Lima, Júnior Manoel Leandro de. „Uma contribuição à solução do problema dos k-servos usando aprendizagem por reforço“. Universidade Federal do Rio Grande do Norte, 2005. http://repositorio.ufrn.br:8080/jspui/handle/123456789/15405.
In this work a new online algorithm is proposed for solving the k-server problem (KSP). The performance of this solution is compared with that of other algorithms in the literature, namely the Harmonic and Work Function algorithms, which have been shown to be competitive, making them meaningful benchmarks for comparison. An algorithm that performs efficiently relative to them tends to be competitive as well, although this would obviously have to be proved; such a proof, however, is beyond the scope of the present work. The algorithm presented for solving the KSP is based on reinforcement learning techniques. To this end, the problem was modelled as a multi-stage decision process, to which the Q-Learning algorithm, one of the most popular solution methods for establishing optimal policies in this type of decision problem, was applied. It should be noted, however, that the size of the storage structure used by reinforcement learning to obtain the optimal policy grows with the number of states and actions, which in turn is proportional to the number n of nodes and k of servers. When this growth is analysed mathematically, it is seen to be exponential, limiting the application of the method to smaller problems, where the number of nodes and servers is small. This problem, called the curse of dimensionality, was introduced by Bellman and implies that an algorithm cannot be executed for certain instances of a problem because the computational resources needed to produce its output are exhausted. So that the proposed solution, based exclusively on reinforcement learning, is not restricted to small applications, an alternative solution is proposed for more realistic problems involving a larger number of nodes and servers. This alternative solution is hierarchical and uses two methods for solving the KSP: reinforcement learning, applied to a reduced number of nodes obtained through an aggregation process, and a greedy method, applied to the subsets of nodes resulting from the aggregation process, where the criterion for scheduling the servers is based on the shortest distance to the demand location.
Magnus, Sonia de Paula Faria. „Estrategias de aprendizagem em lingua estrangeira : um estudo "Q"“. [s.n.], 2005. http://repositorio.unicamp.br/jspui/handle/REPOSIP/269241.
Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem
Abstract: The main goal of this study is to investigate which foreign language learning strategies (among cognitive, metacognitive and social/affective strategies) are consciously used by learners from the 7ª and 8ª séries in Brazilian public schools. Demographic aspects such as origin, age and sex were also explored in order to check whether they exert any influence on learning strategy use. Four distinct methodologies were adopted in this research: focus groups, factor analysis, a 'Q' study and interviews. The first data collection was obtained through discussions in focus groups, from which strategies were collected, analyzed, selected and submitted to a questionnaire and factor analysis. The factors which emerged from this analysis were used to compose the items in the 'Q' study, from which it was possible to identify three distinct points of view. The first one (point of view/factor 'A') showed the typical 'good student's' profile (e.g. one who takes notes and reviews exercises); the second one (point of view/factor 'B') revealed the 'lazy student's' profile from a more traditional point of view (subjects sharing this point of view do not classify words nor copy dialogues, for example); and the third (point of view/factor 'C'), the most difficult one to interpret, could only be better understood after interviews with each of the two most typical subjects for this point of view. We found that subjects who share this point of view are autonomous and usually prefer practical tasks. It was also possible to verify that each point of view was more affected by one or another demographic aspect examined in this study.
Master's degree
Teaching and Learning of Second and Foreign Languages
Master in Applied Linguistics
Funkquist, Mikaela, and Minghua Lu. „Distributed Optimization Through Deep Reinforcement Learning“. Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-293878.
Reinforcement learning methods allow self-learning units to play video and board games autonomously. The project aims to study the efficiency of the reinforcement learning methods Q-learning and deep Q-learning in dynamic problems. The goal is to train robots so that they can move through a warehouse in the best possible way without colliding. A virtual environment was created, in which the algorithms were tested by simulating moving agents. The efficiency of the algorithms was evaluated by how quickly the agents learned to perform predetermined tasks. The results show that Q-learning works well for simple problems with few agents, where systems with two active agents were solved quickly. Deep Q-learning works better for more complex systems containing more agents, but cases of suboptimal movements occurred. Both algorithms showed good potential within their respective areas, but improvements must be made before they can be used in real applications.
Bachelor's thesis in electrical engineering 2020, KTH, Stockholm
Damio, Siti Maftuhah. „Seeking Malay trainee English teachers' perceptions of autonomy in language learning using Q methodology“. Thesis, University of Nottingham, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.594590.
Ondroušek, Vít. „Využití opakovaně posilovaného učení pro řízení čtyřnohého robotu“. Doctoral thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2011. http://www.nusl.cz/ntk/nusl-233985.
Kung, Teju, and 孔德如. „Gait Balancing by Q-learning“. Thesis, 2011. http://ndltd.ncl.edu.tw/handle/12698998321901616016.
National Chung Cheng University
Graduate Institute of Electrical Engineering
99
In the context of humanoid biped robots, building a robot model with 18 dimensions and applying this model to achieve balanced robot behavior requires a large amount of mathematical derivation. This thesis presents a study of biped walking and balance control using reinforcement learning. The algorithm lets a robot learn to walk without any prior knowledge of an explicit dynamics model. Meanwhile, to make applications of reinforcement learning more practical, it is important to equip the system with a continuous action policy, because this enhances the effectiveness of learning and also makes robots better adapted to the real environment. Based on Tabu Search and conventional Q-learning, which generate a mapping between a paradigm action and a discrete state space, the proposed reinforcement learning algorithm is developed to handle reinforcement learning in a continuous action domain by means of a self-organized state aggregation mechanism. The learning architecture is developed to solve more complex control problems: it spans the basis discrete actions to construct a continuous action policy. The architecture allows the dimensionality of the state space and the cardinality of the action set to be scaled as they come to represent new knowledge or new requirements for the desired task. Simulation analysis, using the simulation model of a physical biped robot, shows that a biped robot can perform its basic walking skill with a priori knowledge and then learn to improve its behavior in terms of walking speed and restricted positions of the center of mass by incorporating human intuitive balancing knowledge and walking evaluation knowledge.
Gaskett, Chris. „Q-Learning for Robot Control“. PhD thesis, 2002. http://hdl.handle.net/1885/47080.
Gaskett, Chris. „Q-Learning for robot control“. Thesis, 2002. http://hdl.handle.net/1885/47080.
„Asynchronous stochastic approximation and Q-learning“. Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, 1993. http://hdl.handle.net/1721.1/3312.
Liu, Hsiu-Chen, and 劉繡禎. „Combining Q-Learning with Hybrid Learning Approach in RoboCup“. Thesis, 2013. http://ndltd.ncl.edu.tw/handle/s2z5ry.
National Taipei University of Technology
Master's Program in Electrical Engineering and Computer Science
101
RoboCup is an international competition founded in 1997. Its mission is: "By mid-21st century, a team of fully autonomous humanoid robot soccer players shall win a soccer game, complying with the official rules of FIFA, against the winner of the most recent World Cup". Academically, RoboCup provides an excellent test bed for machine learning. As in a real soccer game, the environment states are constantly changing; therefore, how to make a soccer agent learn autonomously to act with the best responses has become an important issue. The paper "Applying Hybrid Learning Approach to RoboCup's Strategy" discusses the hybrid learning approach in this field. In this work, to carry the concept forward, we continue to apply the hybrid learning approach for the coach agent, while for the player agent we apply the Q-Learning method. Furthermore, in order to cope with the excessive number of environment states, which slows down the learning rate, we use fuzzy states and fuzzy rules to shrink the state space and to simplify the state-action table of Q-Learning. Finally, we build a soccer team in the RoboCup Soccer simulator in which both the coach agent and the player agents have learning ability. Through experiments, we analyze and compare the learning effects and the efficiency of execution.
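The fuzzy-state idea mentioned above can be illustrated with a toy sketch: a continuous observation (here, a distance) is mapped to a handful of fuzzy labels, and the Q-table is indexed by those labels instead of raw values. The membership functions and actions below are invented, not taken from the thesis.

```python
def memberships(distance):
    """Triangular fuzzy memberships for a distance in metres (made-up break points)."""
    near = max(0.0, 1.0 - distance / 10.0)
    mid = max(0.0, 1.0 - abs(distance - 10.0) / 5.0)
    far = max(0.0, min(1.0, (distance - 5.0) / 10.0))
    return {"near": near, "mid": mid, "far": far}

def fuzzy_state(distance):
    """Collapse the continuous distance to the label with the highest membership."""
    m = memberships(distance)
    return max(m, key=m.get)

# The Q-table is now indexed by a handful of fuzzy labels instead of raw continuous values.
ACTIONS = ("dash", "pass", "shoot")
Q = {s: {a: 0.0 for a in ACTIONS} for s in ("near", "mid", "far")}
print(fuzzy_state(3.0), fuzzy_state(9.0), fuzzy_state(25.0))
```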
Herrmann, Michael, and Ralf Der. „Efficient Q-Learning by Division of Labor“. 1995. https://ul.qucosa.de/id/qucosa%3A32942.
Tsai, Shih-Hung, and 蔡世宏. „Fuzzy Q-Learning based Hybrid ARQ for HSDPA“. Thesis, 2007. http://ndltd.ncl.edu.tw/handle/31870432212612583075.
National Chiao Tung University
College of Electrical and Computer Engineering, Industrial Master's Program in Communication and Network Technology
95
In order to provide higher-speed and more effective downlink packet data service in 3G, high speed downlink packet access (HSDPA) was proposed by the 3rd Generation Partnership Project (3GPP). HSDPA adopts a technique called adaptive modulation and coding (AMC) to use different modulation orders and coding rates for different channel conditions. It provides more multi-codes for transmission, and it also adopts the hybrid automatic repeat request (HARQ) scheme to make the transmission mechanism more effective. In order to adapt to changes in channel conditions, HSDPA adopts a shorter transmission time interval (TTI) of 2 ms to achieve more effective resource allocation. In this thesis, a fuzzy Q-learning based HARQ scheme for HSDPA is proposed. With the HARQ scheme modeled as a Markov decision process (MDP), we use a fuzzy Q-learning algorithm to learn the modulation and coding rates of initial transmissions. Our objective is to satisfy the quality of service (QoS) requirements while keeping a high data rate. The simulation results show that the proposed scheme can indeed reach this objective and is feasible under different channel conditions.
Wu, Li Hsuan, and 吳俐萱. „An Improved Dyna-Q algorithm via Indirect Learning“. Thesis, 2013. http://ndltd.ncl.edu.tw/handle/14795815166083051340.
National Chung Cheng University
Graduate Institute of Electrical Engineering
101
In this thesis, we apply additional algorithms, such as Ant Colony Optimization (ACO) and prioritized sweeping, to improve the learning speed of the Dyna-Q learning algorithm. The agent interacts with the environment, learns a policy by Q-learning, and builds the interaction information into a virtual backward prediction model. As the agent explores the unknown environment, it chooses actions according to an exploration factor. In the Dyna architecture, the planning step samples random state-action pairs that have been experienced; prioritized sweeping (breadth-first planning) has been proposed to improve the Dyna-Q planning method. In this thesis, we propose two planning methods: depth-first planning and hybrid planning. Depth-first planning applies concepts from the ACO algorithm together with the exploration factor; hybrid planning combines the advantages of depth-first planning with prioritized sweeping (breadth-first planning). To address shortcomings of the model, we propose model shaping, which predicts information missing from the model. To verify the proposed methods, we run simulations in maze and mountain-car environments. The simulation results show that the proposed methods can improve the speed and efficiency of learning.
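For orientation, a generic Dyna-Q loop is sketched below: the agent updates Q from real experience, stores each transition in a learned model, and replays sampled transitions from that model for a fixed number of planning steps. The depth-first and hybrid planning variants proposed in the thesis are not reproduced; the corridor environment is a made-up example.

```python
import random

def dyna_q(env_step, states, actions, start, episodes=100, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Generic Dyna-Q: learn from real experience, then replay a learned model n times per step."""
    Q = {s: {a: 0.0 for a in actions} for s in states}
    model = {}                                            # (s, a) -> (reward, next_state)
    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = (random.choice(actions) if random.random() < epsilon
                 else max(Q[s], key=Q[s].get))
            s2, r, done = env_step(s, a)                  # real experience
            Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
            model[(s, a)] = (r, s2)                       # update the deterministic model
            for _ in range(planning_steps):               # planning: replay remembered transitions
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2].values()) - Q[ps][pa])
            s = s2
    return Q

# Toy 1-D corridor: states 0..4, reach state 4 for a reward of +1.
def corridor_step(s, a):
    s2 = min(s + 1, 4) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

Q = dyna_q(corridor_step, states=range(5), actions=["right", "left"], start=0)
print(max(Q[0], key=Q[0].get))  # expect "right"
```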
Su, Wei-De, and 蘇偉德. „Combination of Q-learning and Fuzzy-State for Learning of RoboCup Agent“. Thesis, 2010. http://ndltd.ncl.edu.tw/handle/jg434k.
National Taipei University of Technology
Graduate Institute of Computer Science and Information Engineering
98
Artificial intelligence has always been an interesting topic in computer science, and in this area machine learning is the key to success. RoboCup (Robot World Cup Tournament) is a competition that has become a popular research domain in recent years; it includes real-robot as well as computer-simulation games and also provides comprehensive rules and mechanisms. Academically, it provides an excellent test bed for machine learning. In a soccer game, the environment states are constantly changing. Therefore, in this paper we apply the Q-Learning method, a kind of reinforcement learning, to the learning of a RoboCup agent. In addition, in order to solve the problem of an excessive number of environment states, which leads to a slow learning rate, we use fuzzy states and fuzzy rules to reduce the state space and the state-action table of Q-Learning.
Lin, Shin-Yi, and 林欣儀. „Hardware Implementation of a FAST-based Q-learning Algorithm“. Thesis, 2006. http://ndltd.ncl.edu.tw/handle/01215306387784303539.
National Chung Cheng University
Institute of Electrical Engineering
94
This thesis studies a modified Flexible Adaptable-Size Topology (FAST) neural architecture and its integration with reinforcement learning algorithms. The performance of the proposed architecture is verified on the task of balancing an inverted pendulum. First, an architecture in which the FAST algorithm is combined with the Adaptive Heuristic Critic (AHC) algorithm is studied. The performance of this architecture in the inverted-pendulum simulation is not as good as expected. Hence, we modify three portions of the FAST algorithm, combine it with AHC, and achieve better results in the same simulation. However, it is difficult to realize the FAST-with-AHC structure in hardware due to the complexity of the AHC structure. The structurally simpler Q-learning and Q(λ) reinforcement learning algorithms are therefore chosen for integration with the modified FAST, and their superior performance is likewise verified through the inverted-pendulum simulation. Finally, an implementation of the modified FAST on an FPGA is developed to produce a classification chip that can learn.
Tsai, Cheng-Yu, and 蔡承佑. „Online Learning System Using Q&A management mechanism“. Thesis, 2010. http://ndltd.ncl.edu.tw/handle/90277341659478981179.
National Central University
Institute of Computer Science and Information Engineering
98
Students usually turn to online information when they study. In this way they can make up for the insufficiency of the textbook content, resolve their questions and acquire more complete knowledge. However, sometimes students cannot find a satisfying solution on the web; then, besides asking somebody in real life, they can also post a question on a forum and discuss it with others to come up with a proper answer. However, because the forum is independent of the textbook, discussions aimed at particular concepts in the textbook are scattered all over the forum. Even if students search with keywords, they often find irrelevant discussions and have to spend much time picking out the useful discussion threads they are really interested in. Moreover, when students post new questions, they must frequently go back to the threads to check whether someone has answered them. And when they want to keep discussing with others in certain threads, they must first search those threads out in the forum each time. Tracing the progression of discussion threads wastes much time and effort, making students unwilling to ask questions and discuss in an online forum. Hence we propose a framework that combines a Q&A management mechanism with the textbook content. In this way, students can easily find related discussions in the forum while they read the textbook, and they can subscribe to threads of interest and monitor their status and progression. This information helps students decide which thread is worth reading in the current context.
Lu, Xiaoqi. „Regularized Greedy Gradient Q-Learning with Mobile Health Applications“. Thesis, 2021. https://doi.org/10.7916/d8-zv13-2p78.
Yin, Huang Chiao, and 黃巧瑩. „HARQ Process for HSDPA by Fuzzy Q-learning Technique“. Thesis, 2008. http://ndltd.ncl.edu.tw/handle/15151366792629085484.
National Chiao Tung University
Institute of Communication Engineering
96
In order to provide higher-speed and more effective downlink packet data service in 3G, high speed downlink packet access (HSDPA) was proposed by the 3rd Generation Partnership Project (3GPP). An important QoS requirement defined in the specification for the hybrid automatic repeat request (HARQ) process is to choose a suitable modulation and coding scheme (MCS) that keeps the initial block error rate (BLER) below 0.1 based on the channel quality information. In this thesis, we propose a fuzzy Q-learning based HARQ (FQL-HARQ) scheme for HSDPA to solve this problem. The HARQ scheme is modeled as a Markov decision process (MDP). On the one hand, a fuzzy rule is designed to maintain the BLER requirement by separating the decision into different cases based on short-term BLER performance. On the other hand, by considering both link adaptation and the HARQ version, the Q-learning algorithm is used to learn the performance of each MCS under different environments. After learning, the MCS with the highest throughput that does not violate the BLER requirement is chosen for the initial transmission. The simulation results show that the proposed scheme can indeed choose a suitable MCS for the initial transmission while taking channel-information delay into consideration. Compared with other traditional schemes, the FQL-HARQ scheme achieves higher system throughput while maintaining the BLER requirement.
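One plausible reading of the decision rule described above (not the thesis's actual scheme) is sketched below: per-MCS throughput and BLER are tracked with Q-style running averages, and the highest-throughput MCS whose estimated BLER meets the 0.1 target is selected. The MCS names and rates are illustrative assumptions.

```python
# Candidate MCS levels with assumed peak rates (kbit/s per TTI); values are illustrative only.
MCS_RATES = {"QPSK-1/3": 120.0, "QPSK-3/4": 270.0, "16QAM-1/2": 360.0, "16QAM-3/4": 540.0}

q_rate = {m: 0.0 for m in MCS_RATES}    # learned average throughput per MCS
bler = {m: 0.0 for m in MCS_RATES}      # running estimate of initial-transmission BLER
ALPHA = 0.1                             # learning rate for both running averages

def update(mcs, block_error):
    """After each initial transmission, update the estimates for the MCS that was used."""
    achieved = 0.0 if block_error else MCS_RATES[mcs]
    q_rate[mcs] += ALPHA * (achieved - q_rate[mcs])
    bler[mcs] += ALPHA * ((1.0 if block_error else 0.0) - bler[mcs])

def choose_mcs(bler_target=0.1):
    """Highest learned throughput among the MCSs whose estimated BLER meets the target."""
    feasible = [m for m in MCS_RATES if bler[m] <= bler_target]
    pool = feasible or list(MCS_RATES)          # fall back to all if nothing is feasible yet
    return max(pool, key=lambda m: q_rate[m])

update("16QAM-3/4", block_error=True)
update("QPSK-3/4", block_error=False)
print(choose_mcs())
```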