Academic literature on the topic 'Sparse rewards environment'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Sparse rewards environment.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever this information is available in the metadata.

Journal articles on the topic "Sparse rewards environment"

1

Bougie, Nicolas, and Ryutaro Ichise. "Skill-based curiosity for intrinsically motivated reinforcement learning." Machine Learning 109, no. 3 (2019): 493–512. http://dx.doi.org/10.1007/s10994-019-05845-8.

Full text
Abstract:
Reinforcement learning methods rely on rewards provided by the environment that are extrinsic to the agent. However, many real-world scenarios involve sparse or delayed rewards. In such cases, the agent can develop its own intrinsic reward function called curiosity to enable the agent to explore its environment in the quest of new skills. We propose a novel end-to-end curiosity mechanism for deep reinforcement learning methods, that allows an agent to gradually acquire new skills. Our method scales to high-dimensional problems, avoids the need of directly predicting the future, and …
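
The entry above describes adding an intrinsic curiosity signal on top of a sparse extrinsic reward. As a rough, hedged illustration of that general pattern only (not the skill-based mechanism the authors propose), the sketch below shapes the training reward with a generic count-based novelty bonus; the coefficient `beta` and the state discretisation are hypothetical placeholders.

```python
from collections import defaultdict
import math

# Hedged sketch: a generic count-based novelty bonus added to a sparse extrinsic
# reward. This is NOT the skill-based curiosity mechanism of the paper above;
# it only illustrates the common pattern of shaping reward with an intrinsic term.
class NoveltyBonus:
    def __init__(self, beta=0.1):
        self.counts = defaultdict(int)  # visit counts per discretised state
        self.beta = beta                # weight of the intrinsic term (illustrative)

    def shaped_reward(self, extrinsic_reward, state_key):
        """state_key: any hashable discretisation of the observation."""
        self.counts[state_key] += 1
        intrinsic = 1.0 / math.sqrt(self.counts[state_key])  # rarer states earn more
        return extrinsic_reward + self.beta * intrinsic
```
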
2

Jiang, Jiechuan, and Zongqing Lu. "Generative Exploration and Exploitation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (2020): 4337–44. http://dx.doi.org/10.1609/aaai.v34i04.5858.

Full text
Abstract:
Sparse reward is one of the biggest challenges in reinforcement learning (RL). In this paper, we propose a novel method called Generative Exploration and Exploitation (GENE) to overcome sparse reward. GENE automatically generates start states to encourage the agent to explore the environment and to exploit received reward signals. GENE can adaptively tradeoff between exploration and exploitation according to the varying distributions of states experienced by the agent as the learning progresses. GENE relies on no prior knowledge about the environment and can be combined with any RL algorithm, …
3

Dharmavaram, Akshay, Matthew Riemer, and Shalabh Bhatnagar. "Hierarchical Average Reward Policy Gradient Algorithms (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (2020): 13777–78. http://dx.doi.org/10.1609/aaai.v34i10.7160.

Full text
Abstract:
Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address the issue of long term credit assignment by leveraging temporal abstractions. However, when dealing with extended timescales, discounting future rewards can lead to incorrect credit assignments. In this work, we address this issue by extending the hierarchical option-critic policy gradient theorem for the average reward criterion. Our proposed framework aims to maximize the long-term reward obtained in the steady-state of the Markov chain defined by the agent's policy. Furthermore, we use an …
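
For context on the average-reward criterion mentioned above: instead of discounting the future, the temporal-difference error subtracts a running estimate of the reward rate. A minimal tabular sketch of that idea (standard differential TD learning, not the hierarchical option-critic algorithm from the paper) follows; step sizes are illustrative.

```python
# Hedged sketch: average-reward (differential) TD(0) value update, contrasting
# with the discounted criterion. Textbook differential TD learning, not the
# hierarchical option-critic algorithm from the entry above.
def average_reward_td_update(V, rho, s, r, s_next, alpha=0.1, eta=0.01):
    """V: dict of state values; rho: current estimate of average reward per step."""
    delta = r - rho + V.get(s_next, 0.0) - V.get(s, 0.0)  # differential TD error
    V[s] = V.get(s, 0.0) + alpha * delta                   # update value estimate
    rho = rho + eta * delta                                 # update average-reward estimate
    return V, rho
```
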
4

Huang, Xiao, and Juyang Weng. "Inherent Value Systems for Autonomous Mental Development." International Journal of Humanoid Robotics 04, no. 02 (2007): 407–33. http://dx.doi.org/10.1142/s0219843607001011.

Full text
Abstract:
The inherent value system of a developmental agent enables autonomous mental development to take place right after the agent's "birth." Biologically, it is not clear what basic components constitute a value system. In the computational model introduced here, we propose that inherent value systems should have at least three basic components: punishment, reward and novelty with decreasing weights from the first component to the last. Punishments and rewards are temporally sparse but novelty is temporally dense. We present a biologically inspired computational architecture that guides development …
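
The abstract above proposes a value system with three components, punishment, reward, and novelty, weighted in decreasing order of importance. A toy weighted combination of those signals is sketched below; the weights are arbitrary illustrations of the "decreasing weights" constraint, not values taken from the paper.

```python
# Hedged sketch: combining punishment, reward, and novelty with decreasing
# weights, as the abstract above suggests. Weight values are illustrative only.
def inherent_value(punishment, reward, novelty,
                   w_punishment=1.0, w_reward=0.5, w_novelty=0.25):
    assert w_punishment >= w_reward >= w_novelty, "weights should decrease"
    # punishment lowers value; reward and novelty raise it
    return -w_punishment * punishment + w_reward * reward + w_novelty * novelty
```
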
5

Potjans, Wiebke, Abigail Morrison, and Markus Diesmann. "A Spiking Neural Network Model of an Actor-Critic Learning Agent." Neural Computation 21, no. 2 (2009): 301–39. http://dx.doi.org/10.1162/neco.2008.08-07-593.

Full text
Abstract:
The ability to adapt behavior to maximize reward as a result of interactions with the environment is crucial for the survival of any higher organism. In the framework of reinforcement learning, temporal-difference learning algorithms provide an effective strategy for such goal-directed adaptation, but it is unclear to what extent these algorithms are compatible with neural computation. In this article, we present a spiking neural network model that implements actor-critic temporal-difference learning by combining local plasticity rules with a global reward signal. The network is capable of …
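
The spiking network in the entry above implements actor-critic temporal-difference learning with a global reward signal. For orientation, the tabular computation it approximates can be sketched roughly as follows; this is a generic textbook actor-critic update with illustrative names and step sizes, not the spiking implementation itself.

```python
import numpy as np

# Hedged sketch: generic tabular actor-critic TD learning, the abstract
# computation approximated by the spiking network in the entry above.
def actor_critic_update(V, prefs, s, a, r, s_next,
                        alpha_v=0.1, alpha_pi=0.1, gamma=0.99):
    """V: state-value table (dict); prefs: action-preference table keyed by (s, a)."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)     # TD error (one global signal)
    V[s] = V.get(s, 0.0) + alpha_v * delta                      # critic update
    prefs[(s, a)] = prefs.get((s, a), 0.0) + alpha_pi * delta   # actor update
    return delta

def softmax_action(prefs, s, n_actions, rng=np.random.default_rng()):
    """Sample an action from a softmax over the actor's preferences in state s."""
    logits = np.array([prefs.get((s, a), 0.0) for a in range(n_actions)])
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(n_actions, p=p))
```
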
6

Ramakrishnan, Santhosh K., Dinesh Jayaraman, and Kristen Grauman. "Emergence of exploratory look-around behaviors through active observation completion." Science Robotics 4, no. 30 (2019): eaaw6326. http://dx.doi.org/10.1126/scirobotics.aaw6326.

Full text
Abstract:
Standard computer vision systems assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is a major challenge in itself. We address the problem of learning to look around: How can an agent learn to acquire informative visual observations? We propose a reinforcement learning solution, where the agent is rewarded for reducing its uncertainty about the unobserved portions of its environment. Specifically, the agent is trained to select a short sequence of glimpses, after which it must infer the appearance of its full …
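
The agent above is rewarded for reducing its uncertainty about unobserved parts of the scene. One simple way to express such a reward, using reconstruction error as a hypothetical stand-in for the model's uncertainty measure (the paper's actual objective may differ), is the drop in error after each glimpse:

```python
import numpy as np

# Hedged sketch: reward an agent for how much a new glimpse reduces its
# reconstruction error of the full scene. Reconstruction error is a hypothetical
# stand-in for the uncertainty measure used in the entry above.
def reconstruction_error(predicted_full_view, true_full_view):
    return float(np.mean((np.asarray(predicted_full_view) - np.asarray(true_full_view)) ** 2))

def uncertainty_reduction_reward(pred_before, pred_after, true_full_view):
    """Positive when the latest glimpse made the agent's reconstruction better."""
    return (reconstruction_error(pred_before, true_full_view)
            - reconstruction_error(pred_after, true_full_view))
```
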
7

Kim, and Park. "Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning." Symmetry 11, no. 11 (2019): 1352. http://dx.doi.org/10.3390/sym11111352.

Full text
Abstract:
In terms of deep reinforcement learning (RL), exploration is highly significant in achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can demonstrate efficient exploration behavior. A random ε-greedy policy exploits additional replay buffers in an environment of sparse and binary rewards, such as in the real-time online detection of network securities by verifying whether the network is “normal or …
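
For readers who want the baseline behaviour referenced above: ε-greedy selects a random action with probability ε and the greedy action otherwise. A minimal sketch of that baseline (plain DQN-style action selection, not the multi-buffer variant proposed in the paper) is:

```python
import random
import numpy as np

# Hedged sketch: standard epsilon-greedy action selection over Q-values.
# The paper above builds on this baseline with multiple random epsilon-buffers;
# only the plain baseline is shown here.
def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: 1-D array of action values for the current state."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: uniform random action
    return int(np.argmax(q_values))             # exploit: greedy action
```
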
8

Navidi, Neda, and Rene Landry. "New Approach in Human-AI Interaction by Reinforcement-Imitation Learning." Applied Sciences 11, no. 7 (2021): 3068. http://dx.doi.org/10.3390/app11073068.

Full text
Abstract:
Reinforcement Learning (RL) provides effective results with an agent learning from a stand-alone reward function. However, it presents unique challenges with large amounts of environment states and action spaces, as well as in the determination of rewards. Imitation Learning (IL) offers a promising solution for those challenges using a teacher. In IL, the learning process can take advantage of human-sourced assistance and/or control over the agent and environment. A human teacher and an agent learner are considered in this study. The teacher takes part in the agent’s training towards dealing …
9

Dai, Tianhong, Hengyan Liu, and Anil Anthony Bharath. "Episodic Self-Imitation Learning with Hindsight." Electronics 9, no. 10 (2020): 1742. http://dx.doi.org/10.3390/electronics9101742.

Full text
Abstract:
Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. Compared to the original self-imitation learning algorithm, which samples good state–action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation learning. A selection module is introduced to filter uninformative samples from each episode of the update. The proposed method overcomes the limitations of the standard self-imitation learning algorithm, …
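
"Hindsight" in the entry above means relabelling an episode as if the goal had been whatever the agent actually achieved, so that sparse failures still yield learning signal. A rough, generic sketch of such relabelling (hindsight-experience-replay style, with hypothetical field names; the paper adds a trajectory-selection module and an adaptive loss on top of this idea) follows.

```python
# Hedged sketch: hindsight relabelling of one episode, in the spirit of
# hindsight experience replay. Field names (state, achieved_goal, goal, ...) are
# hypothetical; this is not the authors' exact selection module.
def relabel_with_hindsight(episode, reward_fn):
    """episode: list of dicts with keys state, action, achieved_goal, goal."""
    final_goal = episode[-1]["achieved_goal"]        # pretend this was the goal all along
    relabelled = []
    for step in episode:
        new_step = dict(step, goal=final_goal)
        new_step["reward"] = reward_fn(step["achieved_goal"], final_goal)
        relabelled.append(new_step)
    return relabelled
```
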
10

Ashraf, Nesma M., Reham R. Mostafa, Rasha H. Sakr, and M. Z. Rashad. "Optimizing hyperparameters of deep reinforcement learning for autonomous driving based on whale optimization algorithm." PLOS ONE 16, no. 6 (2021): e0252754. http://dx.doi.org/10.1371/journal.pone.0252754.

Full text
Abstract:
Deep Reinforcement Learning (DRL) enables agents to make decisions based on a well-designed reward function that suits a particular environment without any prior knowledge related to a given environment. The adaptation of hyperparameters has a great impact on the overall learning process and the learning processing times. Hyperparameters should be accurately estimated while training DRL algorithms, which is one of the key challenges that we attempt to address. This paper employs a swarm-based optimization algorithm, namely the Whale Optimization Algorithm (WOA), for optimizing the …

Dissertations / Theses on the topic "Sparse rewards environment"

1

Hanski, Jari, and Kaan Baris Biçak. "An Evaluation of the Unity Machine Learning Agents Toolkit in Dense and Sparse Reward Video Game Environments." Thesis, Uppsala universitet, Institutionen för speldesign, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444982.

Full text
Abstract:
In computer games, one use case for artificial intelligence is to create interesting problems for the player. To do this, new techniques such as reinforcement learning allow game developers to create artificial intelligence agents with human-like or superhuman abilities. The Unity ML-agents toolkit is a plugin that provides game developers with access to reinforcement algorithms without expertise in machine learning. In this paper, we compare reinforcement learning methods and provide empirical training data from two different environments. First, we describe the chosen reinforcement …

Book chapters on the topic "Sparse rewards environment"

1

Hensel, Maximilian. "Exploration Methods in Sparse Reward Environments." In Reinforcement Learning Algorithms: Analysis and Applications. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-41188-6_4.

Full text
2

Jeewa, Asad, Anban W. Pillay, and Edgar Jembere. "Learning to Generalise in Sparse Reward Navigation Environments." In Artificial Intelligence Research. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-66151-9_6.

Full text
3

Bonnet, Roger-Maurice. "Rewards and Power of Space." In Our Space Environment, Opportunities, Stakes and Dangers. EPFL Press, 2015. http://dx.doi.org/10.1201/b19345-3.

Full text
4

"Rewards and Power of Space." In Our Space Environment, Opportunities, Stakes and Dangers. EPFL Press, 2016. http://dx.doi.org/10.1201/b19345-6.

Full text
5

Lemmers-Jansen, Imke L. J., Anne-Kathrin J. Fett, and Lydia Krabbendam. "Neural correlates of urban risk environments." In Psychotic Disorders, edited by Alaptagin Khan, Kyoko Ohashi, Maria Maier, and Martin H. Teicher. Oxford University Press, 2020. http://dx.doi.org/10.1093/med/9780190653279.003.0051.

Full text
Abstract:
Epidemiological studies suggest that the observed association between urbanicity and psychosis may be explained in part by social deprivation, reduced social capital affecting cohesion and trust, and minority group and ethnic density effects, which in turn may represent aspects of “social defeat.” In addition, urbanicity is also associated with pollution, noise, and lack of green space, which may negatively impact a range of health outcomes. This chapter reviews the neuroimaging literature on brain function, structure, and connectivity in relation to urbanicity. Research in patients with psychosis has mostly shown associations of urbanicity with brain functioning, rather than structure or connectivity. Neuroimaging research in healthy individuals may support altered social stress processing as a possible explanatory mechanism. Altered reward processing associated with urbanicity may provide an explanation for the possible influence of urban environments on dopamine dysregulation and the pathogenesis of psychosis. Mentalizing and sensory gating deficits may also mediate some of the negative effects of the city on mental health. A sustained effort toward exact replication is required to further develop this promising field.
6

Coello, Yann, and Tina Iachini. "The social dimension of peripersonal space." In The World at Our Fingertips. Oxford University Press, 2021. http://dx.doi.org/10.1093/oso/9780198851738.003.0015.

Full text
Abstract:
Peripersonal space (PPS) is a dynamic representation of the space around the body subserving primarily the organization of goal-directed behaviours towards stimuli with the highest reward value. It must also be viewed as a space where potentially harmful stimuli receive specific attention in order to protect the body from the hazards ahead. In the present chapter, we will highlight the anticipatory motor nature of PPS representation and its dynamic properties. We will also show that stimuli in PPS receive particular attention that fosters perceptual and cognitive processes. Finally, we will propose that PPS serves as a mediation zone between the body and the environment, protecting the body from external threats and, as such, contributing to the organization of the social life.
7

Levitt, Roberta, and Joseph Piro. "Game-Changer." In Gamification. IGI Global, 2015. http://dx.doi.org/10.4018/978-1-4666-8200-9.ch040.

Full text
Abstract:
Technology integration and Information and Communication Technology (ICT)-based education have enhanced the teaching and learning process by introducing a range of web-based instructional resources for classroom practitioners to deepen and extend instruction. One of the most durable of these resources has been the WebQuest. Introduced around the mid-1990s, it involves an inquiry-centered activity in which some or all of the information learners interact with comes from digital artifacts located on the Internet. WebQuests still retain much of their popularity and educational relevance and have shown remarkable staying power. Because of this, recontextualizing the WebQuest and situating it within the modern-day trend of the “gamification” of instructional design is examined, together with how the WebQuest can promote solid academic gain by placing students inside a learning space patterned after a multi-user virtual environment. This structure includes emphasis on teamwork and socially responsible problem-solving, intense task immersion, task game flow and scalability, and reward cycles. The authors also discuss how including an upgraded WebQuest informed by Common Core Grade-Specific Learning Standards in pre-service education curriculum can advance multiple facets of teacher education with candidates who are acquiring, learning, applying, and integrating pedagogical, technological, and content-area skills. Further, the authors offer suggestions for new directions in the use of web-based resources in 21st century education enterprise.
8

Gosz, James R. "The Long-Term Ecological Research Stimulus: Research, Education, and Leadership Development at Individual and Community Levels." In Long-Term Ecological Research. Oxford University Press, 2016. http://dx.doi.org/10.1093/oso/9780199380213.003.0058.

Full text
Abstract:
Through the Long-Term Ecological Research (LTER) program, I have learned to appreciate the complexity of environmental dynamics when they are analyzed at multiple time and space scales. My experience as a postdoctoral fellow and in the LTER program facilitated much of my understanding of interdisciplinary research because of access to multiple disciplinary approaches and accumulation of long-term and multiple- scale information. My teaching of science benefited through recognition of the need for a combination of a deep understanding of each discipline’s role in an issue (reductionist approach) and the collaborative need for integrating disciplines to fully understand complexity. No single discipline can answer the complexity in an environmental question. I have improved my communication with the public through the combination of teaching and research reporting. The challenge is to develop the information in ways that can be communicated: free of scientific jargon, containing only essential data, and developed in scenarios that are recognized as real-life situations. The public has many forms and levels of understanding—there are K to gray and ordinary citizens and policy-makers; consequently, communication needs to be targeted appropriately. I value the role of collaboration; there is tremendous satisfaction and reward from working in teams that can accomplish so much more than can an individual. This collaboration requires compromise, interaction, and time, but those that strive for this approach to science are well recognized. I am fortunate in being in positions that have created opportunities for sustaining a long career in stimulating interdisciplinary and collaborative science. I had a traditional forest management and soil science education (Michigan Technological University and the University of Idaho). However, my entrée into ecosystem science was set up by my very valuable postdoctoral fellowship at the Hubbard Brook Experimental Forest under the guidance of Gene Likens from 1969 to 1970, before the formation of the LTER program. The Hubbard Brook experience, quite literally, educated me about systems thinking, with the watershed approach to understanding integrated responses from complex, multifactor interactions and influences of forest management as disturbances.
9

Ginsberg, Benjamin. "What Is to Be Done." In The Fall of the Faculty. Oxford University Press, 2011. http://dx.doi.org/10.1093/oso/9780199782444.003.0010.

Full text
Abstract:
Professors, taken as a group, are far from perfect. They can be petty, foolish, venal, lazy, and quarrelsome. Nevertheless, at its best, the university is a remarkable institution. It is a place where ideas are taken seriously; where notions that are taken as givens elsewhere are problematized; where what has seemed to be reality can be bent and reshaped by the power of the mind. The university is also a vitally important social institution. Chief Justice Warren, quoted in chapter 2, said American society would “stagnate and die” without free scholarly inquiry. In truth, society would not die, but it would become more stagnant without the philosophical and scientific concepts that are conceived and debated on university campuses. In the sciences, university laboratories continue to be a source of ideas that promise not only to improve established technologies but, more important, to spark the development of new technologies. This is why the Bayh-Dole Act and its encouragement of patent thickets and an anticommons in the scientific realm is potentially so destructive. In the humanities, the university is one of the few institutions to encourage and incubate new visions and modes of thought. Where else are smart people paid primarily to think and rewarded for thinking things that haven’t been thought before? The university, moreover, is a bastion of relatively free expression and, hence, one of the few places where new ideas can be discussed and sharpened. The old left, new left, neocons, and neoliberals of recent years all had their roots in academia. Political impulses that changed American life, including the “new politics movement,” the peace movement, civil rights movement, feminist movement, gay rights movement, environmental movement, the conservative legal movement, and a host of others were nurtured, if not launched, on university campuses. And why not? The university is a natural center of ideological ferment and dissent. The recipe is a simple one. Take large numbers of young people, add a few iconoclastic faculty members, liberally sprinkle with new ideas, place into a Bohemian culture, and half bake.

Conference papers on the topic "Sparse rewards environment"

1

Camacho, Alberto, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, and Sheila A. McIlraith. "LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/840.

Full text
Abstract:
In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be …
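
A reward machine, as used above, is a finite automaton whose transitions emit rewards, exposing the structure of the reward function to the learner. A toy example is sketched below; the event labels, states, and rewards are illustrative and are not taken from the paper.

```python
# Hedged sketch: a toy reward machine. Each transition maps
# (machine state, event label) -> (next machine state, reward).
# Labels, states, and rewards are illustrative, not from the entry above.
class RewardMachine:
    def __init__(self, transitions, initial_state="u0"):
        self.transitions = transitions   # dict: (state, label) -> (next_state, reward)
        self.state = initial_state

    def step(self, label):
        """Advance the machine on an event label; unknown labels self-loop with 0 reward."""
        next_state, reward = self.transitions.get((self.state, label), (self.state, 0.0))
        self.state = next_state
        return reward

# Example: reward 1.0 only after observing "got_key" and then "at_door".
rm = RewardMachine({
    ("u0", "got_key"): ("u1", 0.0),
    ("u1", "at_door"): ("u2", 1.0),
})
```
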
2

Bougie, Nicolas, and Ryutaro Ichise. "Towards High-Level Intrinsic Exploration in Reinforcement Learning." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/733.

Full text
Abstract:
Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, which entails that exploration remains one of the key challenges of DRL. Instead of solely relying on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as exploration signal. While they hold promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity …
3

Seurin, Mathieu, Florian Strub, Philippe Preux, and Olivier Pietquin. "Don’t Do What Doesn’t Matter: Intrinsic Motivation with Action Usefulness." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/406.

Full text
Abstract:
Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidances have thus been developed toward alleviating the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the environment's salient interaction opportunities. We propose a new exploration method, called Don't Do What Doesn't Matter (DoWhaM), shifting the emphasis from state novelty to state with relevant …
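
DoWhaM, as summarised above, shifts the intrinsic bonus from novel states toward actions that actually affect the environment. A crude simplification of that idea is sketched below; the counting scheme and bonus form are hypothetical and are not the paper's exact formulation.

```python
from collections import defaultdict

# Hedged sketch: an intrinsic bonus for actions that rarely have an effect but
# did change the state this time. A crude simplification of the "action
# usefulness" idea, not the exact DoWhaM bonus from the entry above.
class ActionUsefulnessBonus:
    def __init__(self):
        self.used = defaultdict(int)       # how often each action was taken
        self.effective = defaultdict(int)  # how often it changed the state

    def bonus(self, action, state, next_state):
        """Assumes states are directly comparable (e.g., tuples or ints)."""
        self.used[action] += 1
        if next_state != state:            # the action "mattered" this step
            self.effective[action] += 1
            # rarely-effective actions that worked earn a larger bonus
            return 1.0 - self.effective[action] / self.used[action]
        return 0.0
```
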
4

Lin, Xingyu, Pengsheng Guo, Carlos Florensa, and David Held. "Adaptive Variance for Changing Sparse-Reward Environments." In 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019. http://dx.doi.org/10.1109/icra.2019.8793650.

Full text
5

Song, Shihong, Jiayi Weng, Hang Su, Dong Yan, Haosheng Zou, and Jun Zhu. "Playing FPS Games With Environment-Aware Hierarchical Reinforcement Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/482.

Full text
Abstract:
Learning rational behaviors in First-person-shooter (FPS) games is a challenging task for Reinforcement Learning (RL) with the primary difficulties of huge action space and insufficient exploration. To address this, we propose a hierarchical agent based on combined options with intrinsic rewards to drive exploration. Specifically, we present a hierarchical model that works in a manager-worker fashion over two levels of hierarchy. The high-level manager learns a policy over options, and the low-level workers, motivated by intrinsic reward, learn to execute the options. Performance is further …
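
The manager-worker decomposition described above has a high-level policy choose options while low-level workers act under intrinsic reward. A skeletal control loop illustrating that two-level structure is sketched below; the `env`, `manager`, and `workers` interfaces (and the classic gym-style `step` signature) are hypothetical and are not the paper's agent.

```python
# Hedged sketch: a skeletal manager-worker loop for hierarchical RL.
# `env`, `manager`, and `workers` are hypothetical interfaces used only to show
# the two-level structure; this is not the agent from the entry above.
def run_episode(env, manager, workers, option_len=10, max_steps=1000):
    state, done, t = env.reset(), False, 0
    while not done and t < max_steps:
        option = manager.select_option(state)              # high level: pick an option
        option_return, start_state = 0.0, state
        for _ in range(option_len):
            action = workers[option].act(state)            # low level: execute the option
            next_state, ext_reward, done, _ = env.step(action)
            int_reward = workers[option].intrinsic_reward(state, action, next_state)
            workers[option].update(state, action, int_reward, next_state, done)
            option_return += ext_reward
            state, t = next_state, t + 1
            if done:
                break
        manager.update(start_state, option, option_return, state)  # manager learns from env reward
    return t
```
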
6

Wang, Rundong, Runsheng Yu, Bo An, and Zinovi Rabinovich. "I²HRL: Interactive Influence-based Hierarchical Reinforcement Learning." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/433.

Full text
Abstract:
Hierarchical reinforcement learning (HRL) is a promising approach to solve tasks with long time horizons and sparse rewards. It is often implemented as a high-level policy assigning subgoals to a low-level policy. However, it suffers the high-level non-stationarity problem since the low-level policy is constantly changing. The non-stationarity also leads to the data efficiency problem: policies need more data at non-stationary states to stabilize training. To address these issues, we propose a novel HRL method: Interactive Influence-based Hierarchical Reinforcement Learning (I²HRL). First, …
7

Massari, Francesco, Martin Biehl, Lisa Meeden, and Ryota Kanai. "Experimental Evidence that Empowerment May Drive Exploration in Sparse-Reward Environments." In 2021 IEEE International Conference on Development and Learning (ICDL). IEEE, 2021. http://dx.doi.org/10.1109/icdl49984.2021.9515647.

Full text
8

Wulur, Christoper, Cornelius Weber, and Stefan Wermter. "Planning-integrated Policy for Efficient Reinforcement Learning in Sparse-reward Environments." In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021. http://dx.doi.org/10.1109/ijcnn52387.2021.9533509.

Full text
9

Andres, Alain, Esther Villar-Rodriguez, Aritz D. Martinez, and Javier Del Ser. "Collaborative Exploration and Reinforcement Learning between Heterogeneously Skilled Agents in Environments with Sparse Rewards." In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021. http://dx.doi.org/10.1109/ijcnn52387.2021.9534146.

Full text
10

Juliani, Arthur, Ahmed Khalifa, Vincent-Pierre Berges, et al. "Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/373.

Full text
Abstract:
The rapid pace of recent research in AI has been driven in part by the presence of fast and challenging simulation environments. These environments often take the form of games; with tasks ranging from simple board games, to competitive video games. We propose a new benchmark - Obstacle Tower: a high fidelity, 3D, 3rd person, procedurally generated environment. An agent in Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, …