Academic literature on the topic 'Hindsight Optimization'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Hindsight Optimization.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Hindsight Optimization"

1

Wu, Gang, E. K. P. Chong, and R. Givan. "Burst-level congestion control using hindsight optimization." IEEE Transactions on Automatic Control 47, no. 6 (June 2002): 979–91. http://dx.doi.org/10.1109/tac.2002.1008362.

2

Javdani, Shervin, Henny Admoni, Stefania Pellegrinelli, Siddhartha S. Srinivasa, and J. Andrew Bagnell. "Shared autonomy via hindsight optimization for teleoperation and teaming." International Journal of Robotics Research 37, no. 7 (June 2018): 717–42. http://dx.doi.org/10.1177/0278364918776060.

Abstract:
In shared autonomy, a user and autonomous system work together to achieve shared goals. To collaborate effectively, the autonomous system must know the user’s goal. As such, most prior works follow a predict-then-act model, first predicting the user’s goal with high confidence, then assisting given that goal. Unfortunately, confidently predicting the user’s goal may not be possible until they have nearly achieved it, causing predict-then-act methods to provide little assistance. However, the system can often provide useful assistance even when confidence for any single goal is low (e.g. move towards multiple goals). In this work, we formalize this insight by modeling shared autonomy as a partially observable Markov decision process (POMDP), providing assistance that minimizes the expected cost-to-go with an unknown goal. As solving this POMDP optimally is intractable, we use hindsight optimization to approximate. We apply our framework to both shared-control teleoperation and human–robot teaming. Compared with predict-then-act methods, our method achieves goals faster, requires less user input, decreases user idling time, and results in fewer user–robot collisions.
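As a rough, hedged illustration of the idea summarized in this abstract (not the authors' implementation), hindsight optimization can be sketched as follows: sample goals from the current belief and, for each candidate action, average the cost-to-go that would be incurred if that goal were known. All names here (`hindsight_action`, `cost_to_go`, the belief layout) are illustrative assumptions.

```python
import numpy as np

def hindsight_action(belief, actions, cost_to_go, n_samples=32, seed=0):
    """Choose the action minimizing expected cost-to-go under an uncertain goal.

    belief     : dict mapping each candidate goal to its probability
    actions    : iterable of candidate actions
    cost_to_go : callable (action, goal) -> estimated cost-to-go assuming
                 that goal is known (the 'hindsight' value)
    """
    rng = np.random.default_rng(seed)
    goals = list(belief)
    probs = np.array([belief[g] for g in goals], dtype=float)
    probs /= probs.sum()
    sampled = rng.choice(len(goals), size=n_samples, p=probs)

    best_action, best_value = None, float("inf")
    for action in actions:
        # Expected cost-to-go: average the known-goal value over sampled goals.
        value = np.mean([cost_to_go(action, goals[i]) for i in sampled])
        if value < best_value:
            best_action, best_value = action, value
    return best_action
```

In the shared-autonomy setting of the paper, the belief over goals would come from a goal predictor and the cost-to-go from a per-goal value function; the sketch only shows the expectation-over-goals structure.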
3

Wang, Kelvin C. P., and John P. Zaniewski. "20/30 Hindsight: The New Pavement Optimization in the Arizona State Highway Network." Interfaces 26, no. 3 (June 1996): 77–89. http://dx.doi.org/10.1287/inte.26.3.77.

4

Wang, Guanghui, Shiyin Lu, Yao Hu, and Lijun Zhang. "Adapting to Smoothness: A More Universal Algorithm for Online Convex Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 6162–69. http://dx.doi.org/10.1609/aaai.v34i04.6081.

Abstract:
We aim to design universal algorithms for online convex optimization, which can handle multiple common types of loss functions simultaneously. The previous state-of-the-art universal method has achieved the minimax optimality for general convex, exponentially concave and strongly convex loss functions. However, it remains an open problem whether smoothness can be exploited to further improve the theoretical guarantees. In this paper, we provide an affirmative answer by developing a novel algorithm, namely UFO, which achieves O(√L*), O(d log L*) and O(log L*) regret bounds for the three types of loss functions respectively under the assumption of smoothness, where L* is the cumulative loss of the best comparator in hindsight, and d is dimensionality. Thus, our regret bounds are much tighter when the comparator has a small loss, and ensure the minimax optimality in the worst case. In addition, it is worth pointing out that UFO is the first to achieve the O(log L*) regret bound for strongly convex and smooth functions, which is tighter than the existing small-loss bound by an O(d) factor.
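For readers unfamiliar with the notation in this abstract, the following is a standard statement of regret and of the small-loss quantity L* (a sketch in generic notation; the paper's exact setting may differ).

```latex
\[
\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{X}} \sum_{t=1}^{T} f_t(\mathbf{x}),
\qquad
L^{*} = \min_{\mathbf{x} \in \mathcal{X}} \sum_{t=1}^{T} f_t(\mathbf{x}),
\]
```

so the O(√L*), O(d log L*) and O(log L*) bounds quoted above are "small-loss" bounds: they tighten whenever the best fixed comparator in hindsight incurs little cumulative loss.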
5

Garivaltis, Alex. "Cover's Rebalancing Option With Discrete Hindsight Optimization." SSRN Electronic Journal, 2019. http://dx.doi.org/10.2139/ssrn.3346107.

6

Garivaltis, Alex. "Cover’s Rebalancing Option with Discrete Hindsight Optimization." Journal of Derivatives, April 20, 2021, jod.2021.1.135. http://dx.doi.org/10.3905/jod.2021.1.135.


Dissertations / Theses on the topic "Hindsight Optimization"

1

Olsen, Alan. "Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes." DigitalCommons@USU, 2011. https://digitalcommons.usu.edu/etd/1035.

Abstract:
Partially-observable Markov decision processes (POMDPs) are especially good at modeling real-world problems because they allow for sensor and effector uncertainty. Unfortunately, such uncertainty makes solving a POMDP computationally challenging. Traditional approaches, which are based on value iteration, can be slow because they find optimal actions for every possible situation. With the help of the Fast Forward (FF) planner, FF-Replan and FF-Hindsight have shown success in quickly solving fully-observable Markov decision processes (MDPs) by solving classical planning translations of the problem. This thesis extends the concept of problem determination to POMDPs by sampling action observations (similar to how FF-Replan samples action outcomes) and guiding the construction of policy trajectories with a conformant (as opposed to classical) planning heuristic. The resultant planner is called POND-Hindsight.
2

Nathanael, Johanes, and 郭子聖. "Hindsight Proximal Policy Optimization." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/2t45jq.

Abstract:
Master's thesis, National Chiao Tung University, International Graduate Program of Electrical Engineering and Computer Science, 2018 (ROC year 107).
Reinforcement learning (RL) has become a crucial part of machine learning and is arguably the framework closest to the way human beings naturally learn. It is perhaps easiest to understand by looking at how an infant learns: moving hands and legs and observing the surrounding things. No explicit teacher is present during the learning process, but the infant's sensorimotor movements interact directly with the environment, and throughout a lifespan this interaction becomes the major source of feedback and knowledge for improvement. In reinforcement learning, this infant is an agent that, in a given situation called a state, chooses from a set of possible actions according to a policy. The agent receives feedback in the form of a reward and learns how to achieve a particular goal in an environment modeled by a Markov decision process. This thesis presents deep reinforcement learning for robotic automation in the presence of a continuous action space. Both experimental justification and theoretical formulation have shown that parameterized policies perform remarkably well, especially when combined with recent advances in deep neural networks. Learning in reinforcement learning always has to compromise between exploration and exploitation. In the early learning stage, an agent has no knowledge of the environment and therefore needs to explore its surroundings. The learning process can be run with one of two techniques, off-policy or on-policy. The off-policy technique uses different policies for exploration and exploitation: a behavior policy is run for exploration while the learning policy is used for exploitation. In off-policy deep Q-learning, for example, the behavior policy is based on greedy exploration, where the agent literally and randomly scrutinizes its surroundings, while the learning policy selects the optimal action with the largest Q value. With the on-policy technique, by contrast, the behavior policy is equal to the learning policy, which is non-optimal during learning. In the end, both techniques eventually converge to an optimal policy.

Next, consider the case of an adventurer who suddenly receives a special map and is excited to find the treasure. Many actions have to be performed before the goal is finally achieved, and if a reward is received only when the treasure is found, then most of the time there is no reward or useful feedback: the so-called sparse reward problem. Unlike supervised learning based on class labels, the agent in reinforcement learning has to learn with sparse rewards, and if the state space is massive it becomes very difficult to carry out a desirable reinforcement learning process. This thesis develops an on-policy reinforcement learning method for an agent to explore with sparse rewards. One of the most prominent methods in this scope is hindsight experience replay (HER), whose idea is to learn from failure: rather than waiting until the end of an episode to achieve the goal, we assume that a different goal state can be achieved across episodes and provide feedback despite the inability to reach the original goal. However, traditional HER is constrained to off-policy algorithms.

As a result, some continuous tasks whose state-of-the-art performance relies on on-policy algorithms could not be generalized. In addition, an off-policy algorithm runs two different policies at the same time, so the stability of the learning process is not assured. We are therefore motivated to combine a state-of-the-art on-policy algorithm with hindsight experience replay to improve stability and performance under sparse rewards. The new algorithm is implemented using importance sampling, which ensures stability when drawing different goal states. Beyond that, this thesis also presents how to generate better goals to boost learning efficiency. A model-based method is usually used to represent the environment; instead, our implementation is based on the goal distribution. For the experiments, the first environment used to demonstrate our ideas is a simple classical problem, the pendulum, which has dense rewards. We chose it to show that it is possible to combine the idea of hindsight with a dense-reward environment. Our experimental results show that the stability property is preserved and that, by implementing a model-based goal generator, sample efficiency is boosted, which supports our earlier claim. For sparse rewards, we evaluate in the bit-flipping environment and in the multi-goal environments developed under MuJoCo, which are sparse and have proven to be quite complicated. One of them is the locomotion task, where a robot arm has to pick up a certain object and move it to another place: the arm not only has to learn where the object is, it also has to learn how to grasp it correctly.
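Below is a minimal, generic sketch of the two ingredients this abstract combines, hindsight goal relabeling and importance sampling; the step layout, function names, and reward convention are illustrative assumptions, not the thesis's code.

```python
import numpy as np

def relabel_with_hindsight(trajectory, reward_fn, seed=0):
    """HER-style relabeling: replace the original goal with one actually achieved.

    trajectory : list of dicts with keys 'state', 'action', 'achieved_goal'
    reward_fn  : callable (achieved_goal, goal) -> reward under the new goal
    """
    rng = np.random.default_rng(seed)
    # Pick one achieved state from the episode as the substitute ("hindsight") goal.
    new_goal = trajectory[rng.integers(len(trajectory))]["achieved_goal"]
    relabeled = [
        {**step, "goal": new_goal, "reward": reward_fn(step["achieved_goal"], new_goal)}
        for step in trajectory
    ]
    return relabeled, new_goal

def importance_weight(logp_under_new_goal, logp_under_old_goal):
    """Per-step importance ratio correcting an on-policy update for the fact
    that actions were generated while the policy conditioned on the old goal."""
    return np.exp(logp_under_new_goal - logp_under_old_goal)
```

The weight would multiply the policy-gradient term for each relabeled step, which is the usual way importance sampling reconciles data gathered under one goal with an update aimed at another.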
3

Johnson, Frances Laird. "Dependency-directed reconsideration: An anytime algorithm for hindsight knowledge-base optimization." 2006. http://proquest.umi.com/pqdweb?did=1172122381&sid=1&Fmt=2&clientId=39334&RQT=309&VName=PQD.

Abstract:
Thesis (Ph.D.)--State University of New York at Buffalo, 2006.
Title from PDF title page (viewed on Dec. 13, 2006). Available through UMI ProQuest Digital Dissertations. Thesis adviser: Shapiro, Stuart C. Includes bibliographical references.

Conference papers on the topic "Hindsight Optimization"

1

Zhang, Hanbo, Site Bai, Xuguang Lan, David Hsu, and Nanning Zheng. "Hindsight Trust Region Policy Optimization." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/459.

Abstract:
Reinforcement Learning (RL) with sparse rewards is a major challenge. We propose Hindsight Trust Region Policy Optimization (HTRPO), a new RL algorithm that extends the highly successful TRPO algorithm with hindsight to tackle the challenge of sparse rewards. Hindsight refers to the algorithm's ability to learn from information across goals, including past goals not intended for the current task. We derive the hindsight form of TRPO, together with QKL, a quadratic approximation to the KL divergence constraint on the trust region. QKL reduces variance in KL divergence estimation and improves stability in policy updates. We show that HTRPO has similar convergence properties to TRPO. We also present Hindsight Goal Filtering (HGF), which further improves the learning performance for suitable tasks. HTRPO has been evaluated on various sparse-reward tasks, including Atari games and simulated robot control. Experimental results show that HTRPO consistently outperforms TRPO, as well as HPG, a state-of-the-art policy gradient algorithm for RL with sparse rewards.
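As context for the "quadratic approximation to the KL divergence constraint" mentioned above, the standard second-order expansion used in TRPO-style trust-region methods is the following (generic notation; the paper's QKL construction is defined in the paper itself).

```latex
\[
\max_{\theta} \; \mathbb{E}\!\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)} \, A^{\pi_{\theta_{\mathrm{old}}}}(s, a) \right]
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\!\left( \pi_{\theta_{\mathrm{old}}} \,\|\, \pi_{\theta} \right)
\approx \tfrac{1}{2} (\theta - \theta_{\mathrm{old}})^{\top} F(\theta_{\mathrm{old}}) (\theta - \theta_{\mathrm{old}}) \le \delta,
\]
```

where F is the Fisher information matrix of the old policy and δ is the trust-region radius; replacing the exact KL term with such a quadratic is what makes the constrained update tractable.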
2

Javdani, Shervin, Siddhartha Srinivasa, and Andrew Bagnell. "Shared Autonomy via Hindsight Optimization." In Robotics: Science and Systems 2015. Robotics: Science and Systems Foundation, 2015. http://dx.doi.org/10.15607/rss.2015.xi.032.

3

Godoy, Julio, Ioannis Karamouzas, Stephen J. Guy, and Maria Gini. "Anytime navigation with Progressive Hindsight optimization." In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014). IEEE, 2014. http://dx.doi.org/10.1109/iros.2014.6942639.

4

Pellegrinelli, Stefania, Henny Admoni, Shervin Javdani, and Siddhartha Srinivasa. "Human-robot shared workspace collaboration via hindsight optimization." In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016. http://dx.doi.org/10.1109/iros.2016.7759147.

5

Zhu, Menghui, Minghuan Liu, Jian Shen, Zhicheng Zhang, Sheng Chen, Weinan Zhang, Deheng Ye, Yong Yu, Qiang Fu, and Wei Yang. "MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/480.

Abstract:
In goal-oriented reinforcement learning, relabeling the raw goals in past experience to provide agents with hindsight ability is a major solution to the reward sparsity problem. In this paper, to enhance the diversity of relabeled goals, we develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels the goals by looking into the future with a learned dynamics model. Besides, to improve sample efficiency, we propose to use the dynamics model to generate simulated trajectories for policy training. By integrating these two improvements, we introduce the MapGo framework (Model-Assisted Policy optimization for Goal-oriented tasks). In our experiments, we first show the effectiveness of the FGI strategy compared with the hindsight one, and then show that the MapGo framework achieves higher sample efficiency when compared to model-free baselines on a set of complicated tasks.
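A toy sketch of the "relabel by looking into the future with a learned dynamics model" idea described in this abstract; all names here (`foresight_goal`, `dynamics_model`, `policy`) are placeholders, and the actual FGI strategy is specified in the paper.

```python
def foresight_goal(trajectory, dynamics_model, policy, horizon=5):
    """Roll a learned dynamics model forward from the end of a trajectory and
    return the predicted future achieved state as a relabeled goal.

    trajectory     : list of dicts with keys 'state' and 'goal'
    dynamics_model : callable (state, action) -> predicted next state
    policy         : callable (state, goal) -> action
    """
    state = trajectory[-1]["state"]
    goal = trajectory[-1]["goal"]
    for _ in range(horizon):
        action = policy(state, goal)
        state = dynamics_model(state, action)
    return state  # model-predicted future state, used as the new goal
```

Compared with plain hindsight relabeling, which reuses only states the agent has already visited, the learned model lets relabeled goals extend beyond the recorded trajectory, which is the diversity gain the abstract refers to.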
6

Yang, Deyu, Hanbo Zhang, and Xuguang Lan. "Research on Complex Robot Manipulation Tasks Based on Hindsight Trust Region Policy Optimization." In 2020 Chinese Automation Congress (CAC). IEEE, 2020. http://dx.doi.org/10.1109/cac51589.2020.9327251.

7

Dayapule, Durga Harish, Aswin Raghavan, Prasad Tadepalli, and Alan Fern. "Emergency Response Optimization using Online Hybrid Planning." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/656.

Abstract:
This paper poses the planning problem faced by the dispatcher responding to urban emergencies as a Hybrid (Discrete and Continuous) State and Action Markov Decision Process (HSA-MDP). We evaluate the performance of three online planning algorithms based on hindsight optimization for HSA-MDPs on real-world emergency data in the city of Corvallis, USA. The approach takes into account and respects the policy constraints imposed by the emergency department. We show that our algorithms outperform a heuristic policy commonly used by dispatchers by significantly reducing the average response time as well as lowering the fraction of unanswered calls. Our results give new insights into the problem such as withholding of resources for future emergencies in some situations.
8

Guo, Qingyu, Bo An, and Long Tran-Thanh. "Playing Repeated Network Interdiction Games with Semi-Bandit Feedback." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/515.

Abstract:
We study repeated network interdiction games with no prior knowledge of the adversary and the environment, which can model many real-world network security domains. Existing works often require plenty of information to be available to the defender and neglect the frequent interactions between the two players; these assumptions are unrealistic and impractical, and thus not suitable for our setting. As such, we provide the first defender strategy that enjoys nice theoretical and practical performance guarantees, by applying the adversarial online learning approach. In particular, we model the repeated network interdiction game with no prior knowledge as an online linear optimization problem, for which a novel and efficient online learning algorithm, SBGA, is proposed, which exploits the unique semi-bandit feedback in network security domains. We prove that SBGA achieves sublinear regret against an adaptive adversary, compared with both the best fixed strategy in hindsight and a near-optimal adaptive strategy. Extensive experiments also show that SBGA significantly outperforms existing approaches with a fast convergence rate.