Academic literature on the topic 'Policy gradient methods'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Policy gradient methods.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Journal articles on the topic "Policy gradient methods"
Peters, Jan. "Policy gradient methods." Scholarpedia 5, no. 11 (2010): 3698. http://dx.doi.org/10.4249/scholarpedia.3698.
Cai, Qingpeng, Ling Pan, and Pingzhong Tang. "Deterministic Value-Policy Gradients." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3316–23. http://dx.doi.org/10.1609/aaai.v34i04.5732.
Zhang, Matthew S., Murat A. Erdogdu, and Animesh Garg. "Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 9066–73. http://dx.doi.org/10.1609/aaai.v36i8.20891.
Akella, Ravi Tej, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Animashree Anandkumar, and Yisong Yue. "Deep Bayesian Quadrature Policy Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 6600–6608. http://dx.doi.org/10.1609/aaai.v35i8.16817.
Wang, Lin, Xingang Xu, Xuhui Zhao, Baozhu Li, Ruijuan Zheng, and Qingtao Wu. "A randomized block policy gradient algorithm with differential privacy in Content Centric Networks." International Journal of Distributed Sensor Networks 17, no. 12 (December 2021): 155014772110599. http://dx.doi.org/10.1177/15501477211059934.
Le, Hung, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, and Svetha Venkatesh. "Episodic Policy Gradient Training." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7317–25. http://dx.doi.org/10.1609/aaai.v36i7.20694.
Cohen, Andrew, Xingye Qiao, Lei Yu, Elliot Way, and Xiangrong Tong. "Diverse Exploration via Conjugate Policies for Policy Gradient Methods." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3404–11. http://dx.doi.org/10.1609/aaai.v33i01.33013404.
Zhang, Junzi, Jongho Kim, Brendan O'Donoghue, and Stephen Boyd. "Sample Efficient Reinforcement Learning with REINFORCE." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10887–95. http://dx.doi.org/10.1609/aaai.v35i12.17300.
Yu, Hai-Tao, Degen Huang, Fuji Ren, and Lishuang Li. "Diagnostic Evaluation of Policy-Gradient-Based Ranking." Electronics 11, no. 1 (December 23, 2021): 37. http://dx.doi.org/10.3390/electronics11010037.
Baxter, J., and P. L. Bartlett. "Infinite-Horizon Policy-Gradient Estimation." Journal of Artificial Intelligence Research 15 (November 1, 2001): 319–50. http://dx.doi.org/10.1613/jair.806.
Full textDissertations / Theses on the topic "Policy gradient methods"
Greensmith, Evan. "Policy Gradient Methods: Variance Reduction and Stochastic Convergence." The Australian National University, Research School of Information Sciences and Engineering, 2005. http://thesis.anu.edu.au./public/adt-ANU20060106.193712.
Greensmith, Evan. "Policy gradient methods: variance reduction and stochastic convergence." View thesis entry in Australian Digital Theses Program, 2005. http://thesis.anu.edu.au/public/adt-ANU20060106.193712/index.html.
Yuan, Rui. "Stochastic Second Order Methods and Finite Time Analysis of Policy Gradient Methods." Electronic thesis or dissertation, Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT010.
Full textTo solve large scale machine learning problems, first-order methods such as stochastic gradient descent and ADAM are the methods of choice because of their low cost per iteration. The issue with first order methods is that they can require extensive parameter tuning, and/or knowledge of the parameters of the problem. There is now a concerted effort to develop efficient stochastic second order methods to solve large scale machine learning problems. The motivation is that they require less parameter tuning and converge for wider variety of models and datasets. In the first part of the thesis, we presented a principled approach for designing stochastic Newton methods for solving both nonlinear equations and optimization problems in an efficient manner. Our approach has two steps. First, we can re-write the nonlinear equations or the optimization problem as desired nonlinear equations. Second, we apply new stochastic second order methods to solve this system of nonlinear equations. Through our general approach, we showcase many specific new second-order algorithms that can solve the large machine learning problems efficiently without requiring knowledge of the problem nor parameter tuning. In the second part of the thesis, we then focus on optimization algorithms applied in a specific domain: reinforcement learning (RL). This part is independent to the first part of the thesis. To achieve such high performance of RL problems, policy gradient (PG) and its variant, natural policy gradient (NPG), are the foundations of the several state of the art algorithms (e.g., TRPO and PPO) used in deep RL. In spite of the empirical success of RL and PG methods, a solid theoretical understanding of even the “vanilla” PG has long been elusive. By leveraging the RL structure of the problem together with modern optimization proof techniques, we derive new finite time analysis of both PG and NPG. Through our analysis, we also bring new insights to the methods with better hyperparameter choices
Pianazzi, Enrico. "A deep reinforcement learning approach based on policy gradient for mobile robot navigation." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.
Greensmith, Evan. "Policy Gradient Methods: Variance Reduction and Stochastic Convergence." PhD thesis, 2005. http://hdl.handle.net/1885/47105.
Full text"Adaptive Curvature for Stochastic Optimization." Master's thesis, 2019. http://hdl.handle.net/2286/R.I.53675.
Full textDissertation/Thesis
Masters Thesis Computer Science 2019
Pereira, Bruno Alexandre Barbosa. "Deep reinforcement learning for robotic manipulation tasks." Master's thesis, 2021. http://hdl.handle.net/10773/33654.
Full textOs avanços recentes na Inteligência Artificial (IA) demonstram um conjunto de novas oportunidades para a robótica. A Aprendizagem Profunda por Reforço (DRL) é uma subárea da IA que resulta da combinação de Aprendizagem Profunda (DL) com Aprendizagem por Reforço (RL). Esta subárea define algoritmos de aprendizagem automática que aprendem diretamente por experiência e oferece uma abordagem compreensiva para o estudo da interação entre aprendizagem, representação e a decisão. Estes algoritmos já têm sido utilizados com sucesso em diferentes domínios. Nomeadamente, destaca-se a aplicação de agentes de DRL que aprenderam a jogar vídeo jogos da consola Atari 2600 diretamente a partir de pixels e atingiram um desempenho comparável a humanos em 49 desses jogos. Mais recentemente, a DRL em conjunto com outras técnicas originou agentes capazes de jogar o jogo de tabuleiro Go a um nível profissional, algo que até ao momento era visto como um problema demasiado complexo para ser resolvido devido ao seu enorme espaço de procura. No âmbito da robótica, a DRL tem vindo a ser utilizada em problemas de planeamento, navegação, controlo ótimo e outros. Nestas aplicações, as excelentes capacidades de aproximação de funções e aprendizagem de representação das Redes Neuronais Profundas permitem à RL escalar a problemas com espaços de estado e ação multidimensionais. Adicionalmente, propriedades inerentes à DRL fazem a transferência de aprendizagem útil ao passar da simulação para o mundo real. Esta dissertação visa investigar a aplicabilidade e eficácia de técnicas de DRL para aprender políticas de sucesso no domínio das tarefas de manipulação robótica. Inicialmente, um conjunto de três problemas clássicos de RL foram resolvidos utilizando algoritmos de RL e DRL de forma a explorar a sua implementação prática e chegar a uma classe de algoritmos apropriados para estas tarefas de robótica. Posteriormente, foi definida uma tarefa em simulação onde um agente tem como objetivo controlar um manipulador com 6 graus de liberdade de forma a atingir um alvo com o seu terminal. Esta é utilizada para avaliar o efeito no desempenho de diferentes representações do estado, hiperparâmetros e algoritmos do estado da arte de DRL, o que resultou em agentes com taxas de sucesso elevadas. O foco é depois colocado na velocidade e restrições de tempo do posicionamento do terminal. Para este fim, diferentes sistemas de recompensa foram testados para que um agente possa aprender uma versão modificada da tarefa anterior para velocidades de juntas superiores. Neste cenário, foram verificadas várias melhorias em relação ao sistema de recompensa original. Finalmente, uma aplicação do melhor agente obtido nas experiências anteriores é demonstrada num cenário implicado de captura de bola.
Master's degree in Computer and Telematics Engineering
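A common way to implement the reward systems mentioned in the abstract above is a shaped, distance-based reward with a speed term. The snippet below is a minimal, hypothetical illustration of that idea; the function name, success radius, bonus, and penalty weight are assumptions for illustration and are not taken from the dissertation.

import numpy as np

def reach_reward(end_effector_pos, target_pos, joint_velocities,
                 success_radius=0.05, velocity_penalty=0.01):
    """Hypothetical shaped reward for a reaching task: dense negative distance
    to the target, a sparse bonus on reaching it, and a small penalty on joint
    speed to encourage smooth, timely motion (illustrative values only)."""
    distance = np.linalg.norm(np.asarray(end_effector_pos) - np.asarray(target_pos))
    reward = -distance                                    # dense shaping term
    if distance < success_radius:
        reward += 10.0                                    # sparse success bonus
    reward -= velocity_penalty * np.linalg.norm(joint_velocities)  # speed regularization
    return reward

# Example: reward for an end effector about 12 cm from the target, moving slowly.
print(reach_reward([0.40, 0.10, 0.30], [0.50, 0.15, 0.35], [0.2] * 6))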
Chong, Kiah-Yang (張家揚). "Design and Implementation of Fuzzy Policy Gradient Gait Learning Method for Humanoid Robot." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/90100127378597192142.
National Cheng Kung University (國立成功大學), Department of Electrical Engineering, Master's and Doctoral Program (電機工程學系碩博士班), academic year 98 (2009–2010).
The design and implementation of the Fuzzy Policy Gradient Learning (FPGL) method for a small-sized humanoid robot are proposed in this thesis. The thesis not only introduces the mechanical structure of the humanoid robot, named aiRobots-V, and the hardware system adopted on it, but also improves and parameterizes the robot's gait pattern. Arm movement is added to the gait pattern to reduce the tilt of the trunk while walking. FPGL is an integrated machine learning method based on Policy Gradient Reinforcement Learning (PGRL) and fuzzy logic, designed to improve the efficiency and speed of gait learning. The humanoid robot is trained with FPGL using the walking distance over a constant number of walking cycles as the reward, so that it automatically learns a faster and more stable gait. The trunk tilt is used as the reward for learning the arm movement within the walking cycle. Experimental results show that FPGL can train the gait pattern from a walking speed of 9.26 mm/s to 162.27 mm/s in about an hour. The training data also show that the method improves the efficiency of basic PGRL by up to 13%. The effect of arm movement in reducing trunk tilt is likewise confirmed by the experimental results. The robot was also used to participate in the throw-in technical challenge of RoboCup 2010.
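PGRL gait learning of the kind referenced in this abstract typically follows a finite-difference policy-gradient search over gait parameters: several perturbed parameter sets are walked and scored, and every parameter is nudged toward the perturbation direction that walked farthest. The following is a minimal, hypothetical sketch of that general scheme; the parameter names, perturbation sizes, and the toy walk_and_measure surrogate are illustrative assumptions, not the thesis's actual implementation.

import math
import random

# Hypothetical gait parameters and perturbation sizes (illustrative names and
# values; not the actual gait parameterization used in the thesis).
params = {"step_length": 30.0, "step_height": 10.0, "hip_sway": 5.0}
EPS = {"step_length": 2.0, "step_height": 1.0, "hip_sway": 0.5}
STEP_SIZE = 2.0       # overall size of each gradient-ascent step
NUM_POLICIES = 12     # perturbed gaits evaluated per iteration

def walk_and_measure(p):
    """Stand-in for a real trial: walk for a fixed number of cycles with gait
    parameters p and return the distance covered (the reward). A toy quadratic
    surrogate is used here so the sketch runs end-to-end."""
    ideal = {"step_length": 35.0, "step_height": 12.0, "hip_sway": 4.0}
    return 200.0 - sum((p[k] - ideal[k]) ** 2 for k in p)

def finite_difference_update(params):
    """One iteration of a finite-difference policy-gradient search."""
    # Evaluate randomly perturbed copies of the current gait parameters.
    deltas, rewards = [], []
    for _ in range(NUM_POLICIES):
        d = {k: random.choice((-EPS[k], 0.0, EPS[k])) for k in params}
        deltas.append(d)
        rewards.append(walk_and_measure({k: params[k] + d[k] for k in params}))

    # For each parameter, compare the average reward of the -eps / 0 / +eps groups.
    adjustment = {}
    for k in params:
        groups = {-1: [], 0: [], 1: []}
        for d, r in zip(deltas, rewards):
            sign = 0 if d[k] == 0.0 else (1 if d[k] > 0 else -1)
            groups[sign].append(r)
        if not groups[1] or not groups[-1]:
            adjustment[k] = 0.0   # not enough samples to estimate a direction
            continue
        avg_plus = sum(groups[1]) / len(groups[1])
        avg_minus = sum(groups[-1]) / len(groups[-1])
        avg_zero = sum(groups[0]) / len(groups[0]) if groups[0] else float("-inf")
        if avg_zero >= avg_plus and avg_zero >= avg_minus:
            adjustment[k] = 0.0   # the unperturbed value already looks best
        else:
            adjustment[k] = avg_plus - avg_minus

    # Normalize the adjustment vector and take a fixed-size step uphill.
    norm = math.sqrt(sum(a * a for a in adjustment.values()))
    if norm > 0.0:
        params = {k: params[k] + STEP_SIZE * adjustment[k] / norm for k in params}
    return params

if __name__ == "__main__":
    for _ in range(30):
        params = finite_difference_update(params)
    print(params)  # should drift toward gait parameters that walk farther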
Books on the topic "Policy gradient methods"
Lapan, Maxim. Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more. Packt Publishing, 2018.
Book chapters on the topic "Policy gradient methods"
Zeugmann, Thomas, Pascal Poupart, James Kennedy, Xin Jin, Jiawei Han, Lorenza Saitta, Michele Sebag, et al. "Policy Gradient Methods." In Encyclopedia of Machine Learning, 774–76. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_640.
Hu, Michael. "Policy Gradient Methods." In The Art of Reinforcement Learning, 177–96. Berkeley, CA: Apress, 2023. http://dx.doi.org/10.1007/978-1-4842-9606-6_9.
Peters, Jan, and J. Andrew Bagnell. "Policy Gradient Methods." In Encyclopedia of Machine Learning and Data Mining, 1–4. Boston, MA: Springer US, 2016. http://dx.doi.org/10.1007/978-1-4899-7502-7_646-1.
Peters, Jan, and J. Andrew Bagnell. "Policy Gradient Methods." In Encyclopedia of Machine Learning and Data Mining, 982–85. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_646.
Hu, Michael. "Advanced Policy Gradient Methods." In The Art of Reinforcement Learning, 205–20. Berkeley, CA: Apress, 2023. http://dx.doi.org/10.1007/978-1-4842-9606-6_11.
Semmler, Markus. "Fisher Information Approximations in Policy Gradient Methods." In Reinforcement Learning Algorithms: Analysis and Applications, 59–67. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-41188-6_6.
Hansel, Kay, Janosch Moos, and Cedric Derstroff. "Benchmarking the Natural Gradient in Policy Gradient Methods and Evolution Strategies." In Reinforcement Learning Algorithms: Analysis and Applications, 69–84. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-41188-6_7.
Jiang, Xuesong, Zhipeng Li, and Xiumei Wei. "Asynchronous Methods for Multi-agent Deep Deterministic Policy Gradient." In Neural Information Processing, 711–21. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-04179-3_63.
Levy, Kfir Y., and Nahum Shimkin. "Unified Inter and Intra Options Learning Using Policy Gradient Methods." In Lecture Notes in Computer Science, 153–64. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-29946-9_17.
Sabbioni, Luca, Francesco Corda, and Marcello Restelli. "Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes." In Machine Learning and Knowledge Discovery in Databases: Research Track, 506–23. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-43421-1_30.
Full textConference papers on the topic "Policy gradient methods"
Peters, Jan, and Stefan Schaal. "Policy Gradient Methods for Robotics." In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2006. http://dx.doi.org/10.1109/iros.2006.282564.
Ståhlberg, Simon, Blai Bonet, and Hector Geffner. "Learning General Policies with Policy Gradient Methods." In 20th International Conference on Principles of Knowledge Representation and Reasoning {KR-2023}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/kr.2023/63.
Li, Dong, Dongbin Zhao, Qichao Zhang, and Chaomin Luo. "Policy gradient methods with Gaussian process modelling acceleration." In 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017. http://dx.doi.org/10.1109/ijcnn.2017.7966065.
Shi, Wenjie, Shiji Song, and Cheng Wu. "Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/475.
Ma, Xiaobai, Katherine Driggs-Campbell, Zongzhang Zhang, and Mykel J. Kochenderfer. "Monte Carlo Tree Search for Policy Optimization." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/432.
Ziemann, Ingvar, Anastasios Tsiamis, Henrik Sandberg, and Nikolai Matni. "How are policy gradient methods affected by the limits of control?" In 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022. http://dx.doi.org/10.1109/cdc51059.2022.9992612.
Ding, Yuhao, Junzi Zhang, and Javad Lavaei. "Local Analysis of Entropy-Regularized Stochastic Soft-Max Policy Gradient Methods." In 2023 European Control Conference (ECC). IEEE, 2023. http://dx.doi.org/10.23919/ecc57647.2023.10178123.
Peng, Zilun, Ahmed Touati, Pascal Vincent, and Doina Precup. "SVRG for Policy Evaluation with Fewer Gradient Evaluations." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/374.
Gronauer, Sven, Martin Gottwald, and Klaus Diepold. "The Successful Ingredients of Policy Gradient Algorithms." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/338.
Riedmiller, Martin, Jan Peters, and Stefan Schaal. "Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark." In 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. IEEE, 2007. http://dx.doi.org/10.1109/adprl.2007.368196.
Full textReports on the topic "Policy gradient methods"
Umberger, Pierce. Experimental Evaluation of Dynamic Crack Branching in Poly(methyl methacrylate) (PMMA) Using the Method of Coherent Gradient Sensing. Fort Belvoir, VA: Defense Technical Information Center, February 2010. http://dx.doi.org/10.21236/ada518614.
A Decision-Making Method for Connected Autonomous Driving Based on Reinforcement Learning. SAE International, December 2020. http://dx.doi.org/10.4271/2020-01-5154.