Academic literature on the topic 'Hindsight Optimization'
Journal articles on the topic "Hindsight Optimization"
Wu, Gang, E. K. P. Chong, and R. Givan. "Burst-level congestion control using hindsight optimization." IEEE Transactions on Automatic Control 47, no. 6 (June 2002): 979–91. http://dx.doi.org/10.1109/tac.2002.1008362.
Javdani, Shervin, Henny Admoni, Stefania Pellegrinelli, Siddhartha S. Srinivasa, and J. Andrew Bagnell. "Shared autonomy via hindsight optimization for teleoperation and teaming." International Journal of Robotics Research 37, no. 7 (June 2018): 717–42. http://dx.doi.org/10.1177/0278364918776060.
Wang, Kelvin C. P., and John P. Zaniewski. "20/30 Hindsight: The New Pavement Optimization in the Arizona State Highway Network." Interfaces 26, no. 3 (June 1996): 77–89. http://dx.doi.org/10.1287/inte.26.3.77.
Wang, Guanghui, Shiyin Lu, Yao Hu, and Lijun Zhang. "Adapting to Smoothness: A More Universal Algorithm for Online Convex Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 4 (April 3, 2020): 6162–69. http://dx.doi.org/10.1609/aaai.v34i04.6081.
Garivaltis, Alex. "Cover's Rebalancing Option With Discrete Hindsight Optimization." SSRN Electronic Journal, 2019. http://dx.doi.org/10.2139/ssrn.3346107.
Garivaltis, Alex. "Cover's Rebalancing Option with Discrete Hindsight Optimization." Journal of Derivatives, April 20, 2021, jod.2021.1.135. http://dx.doi.org/10.3905/jod.2021.1.135.
Dissertations / Theses on the topic "Hindsight Optimization"
Olsen, Alan. "Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes." DigitalCommons@USU, 2011. https://digitalcommons.usu.edu/etd/1035.
Nathanael, Johanes, and 郭子聖. "Hindsight Proximal Policy Optimization." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/2t45jq.
National Chiao Tung University, International Graduate Program of Electrical Engineering and Computer Science, academic year 107 (2018).
Reinforcement learning (RL) has become a crucial part of machine learning and is arguably the framework closest to the way human beings naturally learn. The easiest way to grasp it is to consider how an infant learns: moving hands and legs and observing the surrounding things. No explicit teacher is present during the learning process, but the infant's sensorimotor movement interacts directly with the environment, and throughout a lifespan this interaction becomes the major feedback and source of knowledge from which to improve and evolve. In reinforcement learning terms, the infant is an agent that, given a situation called a state, chooses among a set of possible actions according to a policy. The agent receives feedback in the form of a reward and learns how to achieve a particular goal in an environment modeled as a Markov decision process.

This thesis presents deep reinforcement learning for robotic automation with continuous action spaces. Both in experimental justification and in theoretical formulation, parameterized policies have shown remarkable performance, especially when combined with recent advances in deep neural networks. Learning in RL always has to balance the dilemma between exploration and exploitation: in the early learning stage, an agent has no knowledge of its environment and therefore needs to explore its surroundings. The learning process runs with one of two techniques, off-policy or on-policy. An off-policy technique uses different policies for exploration and exploitation: a behavior policy is run for exploration while a learning policy is optimized for exploitation. In deep Q-learning, for example, the behavior policy is ε-greedy, so the agent sometimes acts at random, while the learning policy selects the optimal action with the largest Q-value. With an on-policy technique, the behavior policy equals the learning policy, so exploration is carried out by a policy that is not yet optimal. In the end, both techniques eventually converge to an optimal policy.

Next, consider the case in which you are an adventurer. You suddenly receive a treasure map and are excited to find the treasure. You have to perform many actions until you finally achieve what you want, and if you receive reward feedback only upon finding the treasure, then most of the time you receive no reward and no useful feedback. This is the so-called sparse reward problem. Unlike supervised learning, which is driven by class labels, the agent in reinforcement learning has to learn from sparse rewards, and if the state space is massive it becomes very difficult to carry out a desirable learning process.

This thesis develops an on-policy reinforcement learning method that lets the agent explore under sparse rewards. One of the most prominent methods in this scope is hindsight experience replay (HER), whose idea is to learn from failure: rather than waiting until the end of an episode for the intended goal to be achieved, we treat goal states the agent actually reached as alternative goals and provide feedback for them, even though the intended goal was never reached. Traditional HER, however, is constrained to off-policy algorithms.
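To make the goal-relabeling idea behind HER concrete, here is a minimal sketch of the "future" relabeling strategy from the original HER paper. It is an illustrative reconstruction, not the procedure from this thesis; `compute_reward` and the episode tuple layout are placeholder assumptions.

```python
import random

def relabel_with_hindsight(episode, compute_reward, k=4):
    """HER-style goal relabeling: pretend that states actually reached
    later in the episode were the intended goal, so that even failed
    episodes yield transitions with useful (non-sparse) reward signal.

    `episode` is a list of (state, action, achieved_goal, desired_goal)
    tuples and `compute_reward(achieved, goal)` is the sparse reward
    function; both names are placeholders for this sketch.
    """
    transitions = []
    for t, (state, action, achieved, desired) in enumerate(episode):
        # Keep the original transition with the true (likely missed) goal.
        transitions.append((state, action, desired,
                            compute_reward(achieved, desired)))
        # Add k copies relabeled with goals achieved later in the episode.
        for _ in range(k):
            _, _, future_goal, _ = random.choice(episode[t:])
            transitions.append((state, action, future_goal,
                                compute_reward(achieved, future_goal)))
    return transitions
```

Because the relabeled transitions were generated while pursuing a different goal, replaying them is straightforward only for off-policy learners, which is exactly the limitation the thesis addresses next.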
As a consequence, HER cannot be generalized to continuous tasks where state-of-the-art performance is achieved by on-policy algorithms. In addition, an off-policy algorithm runs two different policies at the same time, so the stability of the learning process is not assured. We are therefore motivated to combine a state-of-the-art on-policy algorithm with hindsight experience replay, improving both stability and performance under sparse rewards. The new algorithm is implemented using importance sampling, which ensures stability when trajectories are relabeled with different goal states. Beyond that, this thesis also presents how to generate better goals to boost learning efficiency: whereas a model-based method is usually used to represent the environment, we instead build the generator on the goal distribution.

For the experiments, the first environment used to demonstrate these ideas is a simple classical problem, the pendulum, which has dense rewards. It was chosen to show that the hindsight idea can also be combined with a dense-reward environment. Our experimental results show that the stability property is preserved and that the model-based goal generator boosts sample efficiency, supporting the claims above. For sparse rewards, we evaluate on the bit-flipping environment and on the multi-goal environments built under MuJoCo, which are sparse and have proven to be quite complicated. One of them is a manipulation task in which a robot arm must pick up an object and move it elsewhere: the arm has to learn not only where the object is but also how to grasp it correctly.
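The key technical move the abstract describes, using importance sampling so that on-policy updates remain valid after goals are relabeled, might look roughly like the sketch below in a PPO-style setting. This is an illustrative reconstruction under stated assumptions, not the algorithm from the thesis; the function name and all arguments are hypothetical.

```python
import math
import torch

def hindsight_ppo_loss(new_logp, old_logp, relabeled_logp, original_logp,
                       advantages, clip_eps=0.2, log_ratio_cap=math.log(10.0)):
    """Sketch of a PPO-style surrogate with an extra importance weight that
    corrects for trajectories being relabeled with hindsight goals. All
    arguments are per-timestep tensors; a full implementation would sum
    the hindsight log-ratio over each trajectory rather than per step.
    """
    # Standard PPO ratio between the current and the previous policy.
    ppo_ratio = torch.exp(new_logp - old_logp)
    # Hindsight importance weight: likelihood of the taken actions when the
    # policy is conditioned on the relabeled goal vs. the goal the data was
    # actually collected with. Clamped before exponentiation for stability.
    hs_weight = torch.exp(torch.clamp(relabeled_logp - original_logp,
                                      max=log_ratio_cap)).detach()
    unclipped = ppo_ratio * advantages
    clipped = torch.clamp(ppo_ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negate because optimizers minimize; PPO maximizes the clipped surrogate.
    return -(hs_weight * torch.min(unclipped, clipped)).mean()
```

Clamping the log-ratio before exponentiating, and detaching the hindsight weight so gradients flow only through the PPO ratio, reflect the stability concern the abstract emphasizes.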
Johnson, Frances Laird. "Dependency-directed reconsideration: An anytime algorithm for hindsight knowledge-base optimization." 2006. http://proquest.umi.com/pqdweb?did=1172122381&sid=1&Fmt=2&clientId=39334&RQT=309&VName=PQD.
Title from PDF title page (viewed on Dec. 13, 2006). Available through UMI ProQuest Digital Dissertations. Thesis adviser: Stuart C. Shapiro. Includes bibliographical references.
Conference papers on the topic "Hindsight Optimization"
Zhang, Hanbo, Site Bai, Xuguang Lan, David Hsu, and Nanning Zheng. "Hindsight Trust Region Policy Optimization." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/459.
Javdani, Shervin, Siddhartha Srinivasa, and J. Andrew Bagnell. "Shared Autonomy via Hindsight Optimization." In Robotics: Science and Systems 2015. Robotics: Science and Systems Foundation, 2015. http://dx.doi.org/10.15607/rss.2015.xi.032.
Godoy, Julio, Ioannis Karamouzas, Stephen J. Guy, and Maria Gini. "Anytime Navigation with Progressive Hindsight Optimization." In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014). IEEE, 2014. http://dx.doi.org/10.1109/iros.2014.6942639.
Pellegrinelli, Stefania, Henny Admoni, Shervin Javdani, and Siddhartha Srinivasa. "Human-robot shared workspace collaboration via hindsight optimization." In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016. http://dx.doi.org/10.1109/iros.2016.7759147.
Zhu, Menghui, Minghuan Liu, Jian Shen, Zhicheng Zhang, Sheng Chen, Weinan Zhang, Deheng Ye, Yong Yu, Qiang Fu, and Wei Yang. "MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/480.
Yang, Deyu, Hanbo Zhang, and Xuguang Lan. "Research on Complex Robot Manipulation Tasks Based on Hindsight Trust Region Policy Optimization." In 2020 Chinese Automation Congress (CAC). IEEE, 2020. http://dx.doi.org/10.1109/cac51589.2020.9327251.
Dayapule, Durga Harish, Aswin Raghavan, Prasad Tadepalli, and Alan Fern. "Emergency Response Optimization using Online Hybrid Planning." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/656.
Guo, Qingyu, Bo An, and Long Tran-Thanh. "Playing Repeated Network Interdiction Games with Semi-Bandit Feedback." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/515.