Dissertations / Theses on the topic 'Processus décisionnel de Markov (MDP)'
Consult the top 17 dissertations / theses for your research on the topic 'Processus décisionnel de Markov (MDP)'.
Alizadeh, Pegah. "Elicitation and planning in Markov decision processes with unknown rewards." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCD011/document.
Full text: Markov decision processes (MDPs) are models for solving sequential decision problems in which a user interacts with the environment and adapts her policy by taking numerical reward signals into account. Solving an MDP amounts to formulating the user's behavior in the environment as a policy function that specifies which action to choose in each situation. In many real-world decision problems users have various preferences, so the gains of actions in states differ and must be re-decoded for each user. In this dissertation we are interested in solving MDPs for users with different preferences. We use a model named vector-valued MDP (VMDP), with vector rewards. We propose a propagation-search algorithm that assigns a vector value function to each policy and identifies each user with a preference vector over the existing set of preferences, where the preference vector satisfies the user's priorities. Since the user's preference vector is not known, we present several methods for solving VMDPs while approximating it. We introduce two algorithms that reduce the number of queries needed to find a user's optimal policy: 1) a propagation-search algorithm, in which we propagate a set of possible optimal policies for the given MDP without knowing the user's preferences; 2) an interactive value iteration (IVI) algorithm on VMDPs, namely the Advantage-Based Value Iteration (ABVI) algorithm, which uses clustering and regrouping of advantages. We also demonstrate how the ABVI algorithm behaves for two different types of users: confident and uncertain. Finally, we work on minimax-regret approximation as a method for finding the optimal policy with respect to the limited information about the user's preferences: every objective in the system is only bounded between an upper and a lower bound, while the system is unaware of the user's preferences among them. We propose a heuristic minimax-regret approximation method for solving MDPs with unknown rewards that is faster and less complex than existing methods in the literature.
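As background for this abstract (our notation, not necessarily the thesis's exact formulation), a vector-valued MDP is usually scalarized by an unknown preference vector $w$, and minimax regret then ranks policies by their worst-case loss over the set $\mathcal{W}$ of preferences still consistent with the user's answers:

$$ V^{\pi}_{w}(s) = w^{\top} V^{\pi}(s), \qquad V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t}\, \mathbf{r}(s_t, a_t) \,\middle|\, s_0 = s\right] $$

$$ \pi^{*} \in \arg\min_{\pi} \; \max_{w \in \mathcal{W}} \Big( \max_{\pi'} w^{\top} V^{\pi'} - w^{\top} V^{\pi} \Big) $$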
Boussard, Matthieu. "Planification multi-agents multi-objectifs : modèle et algorithme." Caen, 2008. http://www.theses.fr/2008CAEN2065.
Full text: This thesis deals with the coordination of a group of autonomous agents in the real world. We therefore have to take into account uncertainty about action outcomes, about the other agents' behavior, and about changes in the environment. We use Markov decision processes (MDPs), which allow these uncertainties to be managed within a decision process. In order to handle interactions with the other agents, we give a formalism to express them, as well as a way to integrate them into an on-line decision process. This is an extension of Markov decision processes in which each agent tries to optimize its own reward as well as the welfare of the group; it is a multicriteria decision problem, for which we provide a solution. Once this formalism is built, we tackle some classical coordination problems: platooning, spatial coverage and coalition formation. These applications allowed us to apply the principles given at the beginning of the thesis with success. Extensions of this work will deal with on-line learning, as well as game theory, in order to detect and resolve deadlocks.
Lelerre, Mathieu. "Processus Décisionnels de Markov pour l'autonomie ajustable et l'interaction hétérogène entre engins autonomes et pilotés." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMC246/document.
Full text: Robots will be used more and more in both civil and military fields. These robots, operating in fleets, can accompany soldiers in combat or accomplish a mission while being supervised by a control center. Given the requirements of a military operation, it is hard to let robots decide their actions without an operator's agreement or supervision, depending on the situation. In this thesis we focus on two problems. First, we try to exploit adjustable autonomy so that a robot accomplishes its mission as efficiently as possible while respecting restrictions, assigned by an operator, on its autonomy level. To this end, a restriction level can be defined for given sets of states and actions; such a restriction can, for example, require the robot to be tele-operated in order to access a dangerous zone. Secondly, we consider that several robots may be deployed at the same time. These robots have to coordinate to accomplish their objectives. However, since operators can take control of some robots, coordination becomes harder: the operator has preferences, perceptions, hesitations and stress that are not modeled by the agent, so it is difficult to estimate his next actions and hence to coordinate with him. We propose in this thesis an approach to estimate the policy executed by a tele-operated robot using learning methods, based on the actions observed from this robot. The notion of planning is central to this work, which is based on planning models such as Markov decision processes.
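A minimal sketch (not from the thesis; all names, state labels and actions below are invented) of one common way to encode adjustable autonomy: per autonomy level, mask which actions the planner may select on its own in a given set of states, the rest requiring an operator.

```python
# Hedged sketch: action masking per autonomy level before MDP planning.
from enum import Enum

class Autonomy(Enum):
    FULL = 0          # robot may pick any action
    RESTRICTED = 1    # some (state, action) pairs need operator approval
    TELEOPERATED = 2  # robot only executes operator commands

# Hypothetical restriction table: (autonomy level, state label) -> allowed actions
RESTRICTIONS = {
    (Autonomy.RESTRICTED, "danger_zone"): {"wait", "request_teleoperation"},
}

def allowed_actions(level, state_label, all_actions):
    """Return the actions the planner may consider in this state."""
    if level is Autonomy.TELEOPERATED:
        return {"request_teleoperation"}
    return RESTRICTIONS.get((level, state_label), set(all_actions))

# A planner (e.g. value iteration) would then maximise only over
# allowed_actions(level, s, A) instead of the full action set A.
print(allowed_actions(Autonomy.RESTRICTED, "danger_zone", {"move", "wait"}))
```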
Yin, Biao. "Contrôle adaptatif des feux de signalisation dans les carrefours : modélisation du système de trafic dynamique et approches de résolution." Thesis, Belfort-Montbéliard, 2015. http://www.theses.fr/2015BELF0279/document.
Full text: Adaptive traffic signal control is a decision-making optimization problem that is constantly addressed in order to relieve traffic congestion at urban intersections. Intelligent algorithms are widely used to improve control performance measures such as traffic delay. In this thesis we study the problem comprehensively with a microscopic, dynamic, discrete-time model, and investigate the related algorithms both for isolated-intersection and distributed network control. We first focus on dynamic modeling of adaptive traffic signal control and network loading. The proposed adaptive phase sequence (APS) mode is highlighted as one of the signal phase control mechanisms. The signal control problem at intersections is formulated as a Markov decision process (MDP); in particular, the concept of a tunable system state is proposed for traffic network coordination, and a new vehicle-following model supports the network loading environment. Based on this model, signal control methods are studied with optimal and near-optimal algorithms in turn. Two exact dynamic programming (DP) algorithms are investigated, and the results show the limitations of DP solutions when a large state space appears in complex cases. Because of the computational burden and unknown model information in DP, an approximate dynamic programming (ADP) approach is suggested. Finally, an online near-optimal algorithm using ADP with RLS-TD(λ) is confirmed. In simulation experiments, especially with the integration of APS, the proposed algorithm shows a clear advantage in performance measures and computational efficiency.
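For reference (standard textbook form, not the thesis's exact notation), TD(λ) with a linear value approximation $\tilde V_{\theta}(s) = \theta^{\top}\phi(s)$ updates an eligibility trace and the weights as below; RLS-TD(λ) replaces the scalar step size $\alpha_t$ with a gain maintained by recursive least squares, which typically needs far fewer samples:

$$ z_{t} = \gamma\lambda\, z_{t-1} + \phi(s_{t}), \qquad \delta_{t} = r_{t+1} + \gamma\,\theta_{t}^{\top}\phi(s_{t+1}) - \theta_{t}^{\top}\phi(s_{t}), \qquad \theta_{t+1} = \theta_{t} + \alpha_{t}\,\delta_{t}\, z_{t}. $$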
Bonneau, Mathieu. "Échantillonnage adaptatif optimal dans les champs de Markov, application à l'échantillonnage d'une espèce adventice." Toulouse 3, 2012. http://thesesups.ups-tlse.fr/1909/.
Full text: This work is divided into two parts: (i) the theoretical study of the problem of adaptive sampling in Markov random fields (MRFs) and (ii) the modeling of the problem of weed sampling in a crop field and the design of adaptive sampling strategies for this problem. For the first part, we modeled the problem of finding an optimal sampling strategy as a finite-horizon Markov decision process (MDP). We then proposed a generic algorithm for computing an approximate solution to any finite-horizon MDP with a known model. This algorithm, called Least-Squared Dynamic Programming (LSDP), combines the concepts of dynamic programming and reinforcement learning. It was then adapted to compute adaptive sampling strategies for any type of MRF distribution and observation costs, and evaluated experimentally on simulated problems. For the second part, we modeled the spatial repartition of weeds in the MRF framework, built a cost model adapted to the weed sampling problem, and used both models together to design adaptive sampling strategies with the LSDP algorithm. Based on real-world data, these strategies were compared to a simple heuristic and to static sampling strategies classically used for weed sampling.
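The following is a generic sketch, not the thesis's LSDP code, of least-squares backward induction for a small finite-horizon MDP with a linear value approximation; the model, features and horizon are invented for illustration.

```python
# Hedged sketch: fit V_t(s) ~ theta_t . phi(s) by backward induction with
# least-squares regression on Bellman backup targets.
import numpy as np

n_states, n_actions, horizon = 5, 2, 3
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] over s'
R = rng.normal(size=(n_states, n_actions))                        # immediate rewards

def phi(s):                       # toy feature map
    return np.array([1.0, s, s * s])

theta = np.zeros((horizon + 1, 3))  # V_horizon = 0
for t in reversed(range(horizon)):
    # Bellman backup targets at each state, using the fitted V_{t+1}
    v_next = np.array([phi(s2) @ theta[t + 1] for s2 in range(n_states)])
    targets = np.array([max(R[s, a] + P[s, a] @ v_next for a in range(n_actions))
                        for s in range(n_states)])
    X = np.array([phi(s) for s in range(n_states)])
    theta[t], *_ = np.linalg.lstsq(X, targets, rcond=None)  # least-squares fit

print("fitted V_0:", X @ theta[0])
```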
Radoszycki, Julia. "Résolution de processus décisionnels de Markov à espace d'état et d'action factorisés - Application en agroécologie." Thesis, Toulouse, INSA, 2015. http://www.theses.fr/2015ISAT0022/document.
Full text: This PhD thesis focuses on the resolution of problems of sequential decision making under uncertainty, modelled as Markov decision processes (MDPs) whose state and action spaces are both of high dimension. Solving these problems with a good compromise between quality of approximation and scaling is still a challenge. Algorithms for solving this type of problem are rare when the dimension of both spaces exceeds 30, and they impose certain limits on the nature of the problems that can be represented. We proposed a new framework, called F3MDP, as well as associated approximate resolution algorithms. An F3MDP is a Markov decision process with factored state and action spaces (FA-FMDP) whose solution policies are constrained to be in a certain factored form, and can be stochastic. The algorithms we proposed belong to the family of approximate policy iteration algorithms and make use of continuous optimisation techniques and inference methods for graphical models. These policy iteration algorithms have been validated on a large number of numerical experiments. For small F3MDPs, for which the optimal global policy is available, they provide policy solutions that are close to the optimal global policy. For larger problems from the graph-based Markov decision process (GMDP) subclass, they are competitive with state-of-the-art algorithms in terms of quality. We also show that our algorithms can deal with F3MDPs of very large size outside the GMDP subclass, on toy problems inspired by real problems in agronomy and ecology; the state and action spaces are then both of dimension 100 and of size 2^100, and we compare the quality of the returned policies with that of expert policies. In the second part of the thesis, we applied the framework and the proposed algorithms to determine ecosystem-service management strategies in an agricultural landscape. Weed species, i.e. wild plants of agricultural environments, have antagonistic functions, being at the same time in competition with the crop for resources and keystone species in trophic networks of agroecosystems. We explore which organizations of the landscape (here composed of oilseed rape, wheat and pasture) in space and time allow production services (production of cereals, fodder and honey), regulation services (regulation of weed populations and wild pollinators) and cultural services (conservation of weed species and wild pollinators) to be provided at the same time. We developed a model of weed and pollinator dynamics, and reward functions modelling different objectives (production, conservation of biodiversity, or trade-offs between services). The state space of this F3MDP is of size 3^2100 and the action space of size 3^100, which makes this F3MDP of substantial size. By solving it, we identified various landscape organizations that provide different sets of ecosystem services, differing in the magnitude of each of the three classes of ecosystem services.
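In generic factored-MDP notation (illustrative; the exact variable scopes are part of the problem definition), both the transition model and the constrained policies factor over the state variables $s_i$ and action variables $a_j$, which is what makes spaces of size $2^{100}$ or $3^{100}$ tractable:

$$ P(s' \mid s, a) = \prod_{i=1}^{n} P_{i}\big(s'_{i} \mid \mathrm{pa}_{i}(s, a)\big), \qquad \pi(a \mid s) = \prod_{j=1}^{m} \pi_{j}\big(a_{j} \mid \mathrm{sc}_{j}(s)\big), $$

where $\mathrm{pa}_{i}$ and $\mathrm{sc}_{j}$ select small subsets of the variables.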
El Falou, Salah. "Programmation répartie, optimisation par agent mobile." PhD thesis, Université de Caen, 2006. http://tel.archives-ouvertes.fr/tel-00123168.
Full text: […] and to exchange information between these different entities. Mobile agents appear in this context as a promising solution for building flexible applications that can adapt to the constraints of the application and of the execution environment. In this thesis, mobility is studied from two angles. On the one hand, sending code to the server makes it possible to adapt remote services to the client's requirements, which reduces network traffic. On the other hand, an overloaded machine can delegate the execution of some of its tasks to another machine, which saves execution time. An architecture based on mobile-agent technology is proposed; it enables load balancing in a distributed application. The proposed architecture is decentralized and load balancing is performed dynamically. A mobile collector agent is used to build a global view of the system. To reduce traffic, we propose communication through a hybrid intelligent agent, which uses two modes, client/server or migration (local exchange), for its communication. A Markov decision process is used to find the optimal policy for the agent's movements. Experimental work on concrete problems validates the proposed algorithms.
Thomas, Vincent. "Proposition d'un formalisme pour la construction automatique d'interactions dans les systèmes multi-agents réactifs." PhD thesis, Université Henri Poincaré - Nancy I, 2005. http://tel.archives-ouvertes.fr/tel-00011094.
Full text: Existing formalisms such as DEC-POMDPs manage to represent multi-agent problems, but they do not represent, at the individual level, the notion of interaction that is fundamental in collective systems. This induces a significant algorithmic complexity in the resolution algorithms. In order to give agents the ability to apprehend the presence of other agents and to structure multi-agent systems implicitly, this thesis proposes an original formalism, the interac-DEC-POMDP, inspired by DEC-POMDPs and by Hamelin, a simulation developed during this thesis and derived from experiments conducted in ethology. The specificity of this formalism lies in the ability given to agents to interact directly and locally with one another. This possibility allows decisions to be made at an intermediate level, between global decisions involving all the agents and purely individual decisions. We have also proposed a decentralized algorithm based on reinforcement-learning techniques and a heuristic distribution of the agents' gains during interactions. An experimental study allowed us to validate its ability to produce, for restrictions of the formalism, relevant and adaptive collective behaviors without any agent having a global view of the system.
Guillot, Matthieu. "Le problème du plus court chemin stochastique et ses variantes : fondements et applications à l'optimisation de stratégie dans le sport." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM024.
Full text: A golf course consists of eighteen holes. On each hole, the golfer has to move the ball from the tee to the flag in a minimum number of shots. Under some assumptions, the golfer's problem can be modeled as a stochastic shortest path (SSP) problem. The SSP problem is a special case of Markov decision processes in which an agent evolves dynamically in a finite set of states. In each state, the agent chooses an action that leads it to another state following a known probability distribution, and this action induces a cost. There exists a 'sink node' in which the agent, once there, stays with probability one and cost zero. The goal of the agent is to reach the sink node with minimum expected cost. In the first chapter, we study the SSP problem theoretically. We define a new framework in which the assumptions needed for the existence of an optimal policy are weakened, prove that the most famous algorithms still converge in this setting, and define a new exact algorithm based on the primal-dual algorithm. In the second chapter we detail the modeling of the golfer's problem as an SSP. Thanks to the ShotLink database, we create 'numerical clones' of players and simulate these clones on different golf courses in order to predict professional golfers' scores. We apply our model to two competitions: the 2017 Masters at Augusta and the 2018 Ryder Cup. In the third chapter, we study the natural two-player extension of the SSP problem: stochastic shortest path games. We study two special cases and, in particular, linear programming formulations of these games.
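As a toy illustration of the SSP setting described above (not from the thesis; states, actions and probabilities are invented), value iteration computes the minimum expected number of shots to reach the sink:

```python
# Hedged illustration: value iteration on a tiny golf-like SSP.
# Every shot costs 1; the sink "holed" is absorbing with cost 0.
sink = "holed"
states = ["tee", "fairway", "green", sink]
# transitions[s][a] = list of (next_state, probability)
transitions = {
    "tee":     {"driver": [("fairway", 0.8), ("tee", 0.2)],
                "iron":   [("fairway", 0.6), ("green", 0.3), ("tee", 0.1)]},
    "fairway": {"iron":   [("green", 0.7), ("fairway", 0.3)]},
    "green":   {"putt":   [(sink, 0.9), ("green", 0.1)]},
}

V = {s: 0.0 for s in states}
for _ in range(200):                      # iterate until (near) convergence
    for s, acts in transitions.items():
        V[s] = min(1.0 + sum(p * V[s2] for s2, p in outcomes)
                   for outcomes in acts.values())

print({s: round(v, 2) for s, v in V.items()})  # expected shots to hole out
```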
Hamila, Mohammed Amine. "Planification multi-agents dans un cadre markovien : les jeux stochastiques à somme générale." Thesis, Valenciennes, 2012. http://www.theses.fr/2012VALE0014/document.
Full text: Planning an agent's actions in a dynamic and uncertain environment has been extensively studied. The framework of Markov decision processes provides tools to model and solve such problems, and game theory allows the study of strategic interactions between multiple agents in a given game. The framework of stochastic games is considered a generalization of both Markov decision processes and game theory: it allows systems with multiple agents and multiple states to be modeled. However, planning in a multi-agent system is considered difficult: each agent's decisions depend not only on its own actions but also on the actions of the other agents. The work presented in this thesis focuses on decision making in distributed multi-agent systems. Existing work in this field allows the theoretical resolution of stochastic games but places severe restrictions and ignores some crucial problems of the model. We propose a decentralized planning algorithm for the model of stochastic games. Our proposal is based on the value iteration algorithm and on the concept of Nash equilibrium. To improve the resolution process and to deal with large problems, we sought to ease decision making and limit the set of joint actions at each stage. The proposed algorithm was validated on a coordination problem including several agents, and various experiments were conducted to assess the quality of the resulting solution.
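For context, a standard way to write the value-iteration backup for general-sum stochastic games (illustrative notation; the thesis's decentralized algorithm differs in how equilibria are selected and joint actions restricted) backs up each player's action values through a Nash equilibrium of the stage game induced at the next state:

$$ Q^{i}_{t+1}(s, \mathbf{a}) = r^{i}(s, \mathbf{a}) + \gamma \sum_{s'} P(s' \mid s, \mathbf{a})\; \mathrm{Nash}^{i}\!\big(Q^{1}_{t}(s', \cdot), \dots, Q^{n}_{t}(s', \cdot)\big), $$

where $\mathbf{a}$ is the joint action and $\mathrm{Nash}^{i}$ returns player $i$'s expected payoff under an equilibrium of the matrix game defined by the $Q_{t}(s', \cdot)$ values.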
Ferrari, Fabio Valerio. "Cooperative POMDPs for human-robot joint activities." Thesis, Normandie, 2017. http://www.theses.fr/2017NORMC257/document.
Full text: This thesis presents a novel method for ensuring cooperation between humans and robots in public spaces, under the constraint of human behavior uncertainty. It introduces a hierarchical and flexible framework based on POMDPs. The framework partitions the overall joint activity into independent planning modules, each dealing with a specific aspect of the joint activity: either ensuring the human-robot cooperation, or proceeding with the task to achieve. The cooperation part can be solved independently from the task and executed as a finite-state machine in order to contain the online planning effort. To do so, we introduce a belief shift function and describe how to use it to transform a POMDP policy into an executable finite-state machine. The developed framework has been implemented in a real application scenario as part of the COACHES project. The thesis describes the Escort mission used as the testbed application and the details of the implementation on the real robots. This scenario has also been used to carry out several experiments and to evaluate our contributions.
Desquesnes, Guillaume Louis Florent. "Distribution de Processus Décisionnels Markoviens pour une gestion prédictive d’une ressource partagée : application aux voies navigables des Hauts-de-France dans le contexte incertain du changement climatique." Thesis, Ecole nationale supérieure Mines-Télécom Lille Douai, 2018. http://www.theses.fr/2018MTLD0001/document.
Full text: The work of this thesis aims to introduce and implement predictive management, under uncertainty, of the water resource of inland waterway networks. The objective is to provide a water management plan that optimizes the navigation conditions of the entire supervised network over a specified horizon. The expected solution must make the network resilient to the probable effects of climate change and to changes in waterway traffic. First, a generic modeling of a resource distributed over a network is proposed. This modeling, based on Markov decision processes, takes into account the numerous uncertainties affecting the considered networks; its objective is to cover all possible cases, foreseen or not, in order to provide resilient management of those networks. The second contribution is a distribution of the model over several agents to facilitate scaling: the network's control capacities are divided among the agents, so each agent has only local knowledge of the supervised network. As a result, agents require coordination to provide efficient management of the network. An iterative resolution, with exchanges of temporary plans between agents, is used to obtain local management policies for each agent. Finally, experiments were carried out on realistic and real networks of the French waterways to observe the quality of the solutions produced, and several different climatic scenarios were simulated to test the resilience of the produced policies.
Souza Oliveira, Camila Helena. "Reliability and cost efficiency in coding-based in-network data storage and data retrieval for IoT/WSNs." Thesis, Paris Est, 2015. http://www.theses.fr/2015PESC1134/document.
Full text: Wireless sensor networks (WSNs) are made up of small devices limited in terms of memory, processing and energy capacity. They work interconnected and autonomously in order to monitor a region or an object of interest. The development of more powerful devices (with new capabilities such as energy harvesting and actuation) at lower cost has made WSNs a crucial element in the emergence of the Internet of Things (IoT). Nonetheless, given the new applications and services offered in the IoT scenario, new issues arise in the data management performed in WSNs. Indeed, in this new context, WSNs have to deal with a large amount of data, now consumed on demand, while ensuring a good trade-off between its reliability and retrievability and the energy consumption. In the scope of this thesis, we are interested in data management in WSNs in the context of the IoT. Specifically, we approach the problem of in-network data storage by posing the following question: how can data be stored for a short term in WSNs so that it can be easily retrieved by consumers while ensuring the best trade-off between data reliability and conservation of energy resources? First, we propose a reliable data storage scheme based on network coding, assuming a communication model defined by the publish/subscribe paradigm. We validate the efficiency of our proposal by a theoretical analysis corroborated by a simulation evaluation. The results show that our scheme achieves 80% reliability in data delivery with the best cost-benefit compared to other data storage schemes. Aiming to further improve the performance of the data storage scheme proposed in our first contribution, we propose its optimization, modeling it as a Markov decision process (MDP), in order to store data with an optimal trade-off between reliability and communication overhead (in this context also seen as energy consumption), in an autonomous and adaptive way. To the best of our knowledge, our optimized data storage scheme is the only one to ensure data reliability while adapting itself to the service requirements and network conditions. In addition, we propose a generalization of the mathematical model used in our first contribution, and a system model that defines the integration of WSNs running our data storage scheme in the context for which it was envisaged, the IoT. Our performance evaluation shows that our optimization allows consumers to retrieve up to 70% more packets than a scheme without optimization, while increasing the network lifetime by 43%. Finally, after having sought the best trade-off between reliability and cost, we focus on an auxiliary way to reduce the energy consumption of the sensor nodes. As our third contribution, we propose a two-part study measuring how much node activity scheduling can save energy. First, we propose an improvement of the duty-cycle mechanism defined in IEEE 802.15.4. Then, we propose a duty-cycle mechanism introduced into our data storage scheme, aiming at saving energy in the storage nodes. The simulation results show that our solution for the duty-cycle mechanism in 802.15.4 leads to considerable energy savings. However, the duty cycle in our data storage scheme did not yield additional energy savings: since our optimized scheme already saves as much energy as possible while ensuring high reliability, the duty-cycle mechanism cannot improve energy savings without compromising data reliability. Nonetheless, this result corroborates that our scheme indeed operates at the optimal trade-off between reliability and communication overhead (energy consumption).
Sprauel, Jonathan. "Conception sûre et optimale de systèmes dynamiques critiques auto-adaptatifs soumis à des événements redoutés probabilistes." Thesis, Toulouse, ISAE, 2016. http://www.theses.fr/2016ESAE0003/document.
Full text: This study takes place in the broad field of Artificial Intelligence, specifically at the intersection of two domains: Automated Planning and Formal Verification in probabilistic environments. In this context, it raises the question of the integration of new technologies into critical systems, and the complexity this entails: how can we ensure that adding intelligence to a system, in the form of autonomy, is not done at the expense of safety? To address this issue, this study aims to develop a tool-supported process for designing critical, self-adaptive systems. Throughout this document, innovations are therefore proposed in formal modeling methods and in algorithms for safe and optimal planning.
Paniah, Crédo. "Approche multi-agents pour la gestion des fermes éoliennes offshore." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112067/document.
Full text: Renewable energy sources (RES) have grown remarkably in the last few decades. Compared to conventional energy sources, renewable generation is more available, sustainable and environment-friendly: for example, there are no greenhouse gas emissions during energy generation. However, while electrical network stability requires equality of production and consumption, and the electricity market requires producers to contract their future production a priori and to respect their delivery commitments or pay substantial penalties, RES are mostly uncontrollable and their behavior is difficult to forecast accurately. De facto, they jeopardize the stability of the physical network and the competitiveness of renewable producers in the market. The Winpower project aims to design realistic, robust and stable control strategies for offshore networks connecting renewable sources and controllable storage devices, owned by different autonomous actors, to the main electricity system. Each actor must embed its own local device control strategy, but a global network management mechanism, jointly decided between connected actors, should be designed as well. We assume that the actors participate in the market as a single entity (the coalition of actors connected by the Winpower network), allowing the coalition to facilitate network management through resource aggregation, renewable producers to take advantage of the flexibility of controllable sources to handle market penalty risks, and storage device owners to leverage their resources on the market and/or in the management of renewable imbalances. This work tackles the market participation of the coalition as a cooperative virtual power plant. For this purpose, we describe a multi-agent architecture through the definition of intelligent agents managing and operating the actors' resources and the description of these agents' interactions; it allows local constraints and objectives to be combined with the global network management objective. We formalize the aggregation and planning of resource utilization as a Markov decision process (MDP), a formal model suited to sequential decision making in uncertain environments. Its aim is to define the sequence of actions that maximizes the expected actual income from market participation, while decisions over controllable resources have uncertain outcomes. However, the market participation decision is made before actual operation, when renewable generation is still uncertain; thus, the Markov decision process is intractable, as its state in each decision time slot is not fully observable. To solve this partially observable MDP (POMDP), we decompose it into a classical MDP and an information state (a probability distribution over renewable generation errors). The resulting information-state MDP (IS-MDP) is solved with an adaptation of backwards induction, a classical MDP resolution algorithm. We then describe a common simulation framework to compare our proposed methodology to other strategies, including the state of the art in the market participation of renewable generation. Simulation results validate the resource aggregation strategy and confirm that cooperation is beneficial to renewable producers and storage device owners when they participate in the electricity market. The proposed architecture is designed to allow decision making to be distributed between the coalition's actors through the implementation of a suitable coordination mechanism, and we propose distribution methodologies to this end.
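As a rough sketch of what such a decomposition looks like (generic notation, not the thesis's), backwards induction is applied over pairs $(x, b)$, where $x$ is the observable part of the state and $b$ the information state, i.e. a distribution over the renewable forecast errors $\varepsilon$:

$$ V_{T}(x, b) = 0, \qquad V_{t}(x, b) = \max_{a} \; \mathbb{E}_{\varepsilon \sim b}\Big[ r_{t}(x, a, \varepsilon) + V_{t+1}\big( f(x, a, \varepsilon),\; \tau_{t}(b) \big) \Big], $$

where $f$ denotes the transition of the controllable resources and $\tau_{t}$ the information-state update.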
Studzinski, Perotto Filipo. "Un Mécanisme Constructiviste d'Apprentissage Automatique d'Anticipations pour des Agents Artificiels Situés." Phd thesis, Institut National Polytechnique de Toulouse - INPT, 2010. http://tel.archives-ouvertes.fr/tel-00620755.
Full textPerotto, Filipo Studzinski. "Um mecanismo construtivista para aprendizagem de antecipações em agentes artificiais situados." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2010. http://hdl.handle.net/10183/27653.
Full text: This research is characterized, first, by a theoretical discussion of the concept of an autonomous agent, based on elements taken from the Situated AI and Affective AI paradigms. The thesis then presents the problem of learning world models, providing a bibliographic review of related work. From these discussions, the CAES architecture and the CALM mechanism are presented. CAES (Coupled Agent-Environment System) is an architecture for describing systems based on the agent-environment dichotomy; it defines the agent and the environment as two partially open systems in dynamic coupling. The agent is composed of two subsystems, mind and body, following the principles of situativity and intrinsic motivation. CALM (Constructivist Anticipatory Learning Mechanism) is based on the constructivist approach to Artificial Intelligence. It allows a situated agent to build a world model in partially deterministic and partially observable environments, in the form of a factored, partially observable Markov decision process (FPOMDP). The constructed world model is then used by the agent to define a policy of actions in order to improve its own performance.