Academic literature on the topic 'Processus décisionnel de Markov (MDP)'
Dissertations / Theses on the topic "Processus décisionnel de Markov (MDP)"
Alizadeh, Pegah. "Elicitation and planning in Markov decision processes with unknown rewards." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCD011/document.
Markov decision processes (MDPs) are models for solving sequential decision problems in which a user interacts with the environment and adapts her policy by taking numerical reward signals into account. Solving an MDP amounts to formulating the user's behaviour in the environment as a policy function that specifies which action to choose in each situation. In many real-world decision problems, users have varying preferences; the gains of actions in states therefore differ and must be re-derived for each user. In this dissertation, we are interested in solving MDPs for users with different preferences. We use a model named vector-valued MDP (VMDP), with vector rewards. We propose a propagation-search algorithm that assigns a vector-valued function to each policy and identifies each user with a preference vector over the existing set of preferences, where the preference vector satisfies the user's priorities. Since the user's preference vector is not known, we present several methods for solving VMDPs while approximating it. We introduce two algorithms that reduce the number of queries needed to find a user's optimal policy: 1) a propagation-search algorithm, in which we propagate a set of possibly optimal policies for the given MDP without knowing the user's preferences; 2) an interactive value iteration (IVI) algorithm on VMDPs, namely the advantage-based value iteration (ABVI) algorithm, which uses clustering and regrouping of advantages. We also demonstrate how the ABVI algorithm behaves for two different types of users: confident and uncertain. We finally work on a minimax-regret approximation method for finding the optimal policy with respect to limited information about the user's preferences: all possible objectives in the system are merely bounded between lower and upper bounds, while the system is unaware of the user's preferences among them. We propose a heuristic minimax-regret approximation method for solving MDPs with unknown rewards that is faster and less complex than existing methods in the literature.
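The interplay between vector rewards and a preference vector can be made concrete with a small sketch. Below is a minimal illustration assuming a *known* preference vector w that scalarizes the vector rewards, after which standard value iteration applies; the MDP instance, the weights, and all names are invented for illustration and are not taken from the thesis, which treats w as unknown and elicits it through queries.

```python
import numpy as np

# Illustrative vector-valued MDP: 3 states, 2 actions, 2 reward criteria.
n_states, n_actions, n_criteria = 3, 2, 2
rng = np.random.default_rng(0)
P = [rng.dirichlet(np.ones(n_states), size=n_states) for _ in range(n_actions)]   # P[a][s, s'] transitions
R = [rng.uniform(0.0, 1.0, size=(n_states, n_criteria)) for _ in range(n_actions)]  # vector rewards

w = np.array([0.7, 0.3])   # hypothetical user preference vector over the two criteria
gamma = 0.95

# Standard value iteration on the scalarized reward w . R(s, a).
V = np.zeros(n_states)
for _ in range(10_000):
    Q = np.array([R[a] @ w + gamma * P[a] @ V for a in range(n_actions)])  # shape (A, S)
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-9:
        break
    V = V_new

print("greedy policy per state:", Q.argmax(axis=0))
```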
Boussard, Matthieu. "Planification multi-agents multi-objectifs : modèle et algorithme." Caen, 2008. http://www.theses.fr/2008CAEN2065.
This thesis deals with the coordination of a group of autonomous agents in the real world. We therefore have to take into account uncertainty about action outcomes, about other agents' behaviour, and about changes in the environment. We use Markov decision processes (MDPs), which allow these uncertainties to be handled within a decision process. In order to manage interactions with the other agents, we give a formalism to express them, as well as a solution for integrating them into an online decision process. This is an extension of Markov decision processes in which each agent tries to optimize its own reward as well as the welfare of the group. This is a multicriteria decision problem, for which we propose a solution. Once this formalism is built, we tackle some classical coordination problems: platooning, spatial coverage, and coalition formation. These applications allow us to apply the principles set out at the beginning of the thesis with success. Extensions of this work concern online learning, as well as game theory for detecting and resolving deadlocks.
Lelerre, Mathieu. "Processus Décisionnels de Markov pour l'autonomie ajustable et l'interaction hétérogène entre engins autonomes et pilotés." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMC246/document.
Robots will be used more and more in both civil and military fields. These robots, operating in fleets, can accompany soldiers in combat or accomplish a mission while being supervised by a control centre. Given the requirements of a military operation, it is problematic to let robots decide their actions without an operator's agreement or oversight, depending on the situation. In this thesis, we focus on two problems. First, we exploit adjustable autonomy to make a robot accomplish its mission as efficiently as possible while respecting restrictions on its autonomy level assigned by an operator. To this end, a restriction level can be defined for given sets of states and actions; such a restriction can require, for example, that the robot be tele-operated in order to enter a dangerous zone. Second, we consider that several robots can be deployed at the same time. These robots have to coordinate to accomplish their objectives. However, since operators can take control of some robots, coordination becomes harder: the operator has preferences, perceptions, hesitations and stress that are not modelled by the agent. It is then hard to estimate the operator's next actions, and thus to coordinate with him. We propose an approach for estimating the policy executed by a tele-operated robot with learning methods, based on the actions observed from this robot. The notion of planning is central to this work, which builds on planning models such as Markov decision processes.
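One simple way such autonomy restrictions could be encoded is as a table mapping (state, action) pairs to a required autonomy level, which filters the action set before planning. A minimal sketch, with hypothetical states, actions, and levels (not the thesis's formalism):

```python
from enum import IntEnum

class Autonomy(IntEnum):
    FORBIDDEN = 0      # action may never be taken
    TELEOPERATED = 1   # action allowed only under operator control
    AUTONOMOUS = 2     # robot may decide on its own

# Hypothetical restriction table: (state, action) -> required autonomy level.
restrictions = {
    ("danger_zone", "enter"): Autonomy.TELEOPERATED,
    ("danger_zone", "fire"): Autonomy.FORBIDDEN,
}

def allowed_actions(state, actions, operator_present):
    """Filter the action set before planning, given operator availability."""
    allowed = []
    for a in actions:
        level = restrictions.get((state, a), Autonomy.AUTONOMOUS)
        if level == Autonomy.AUTONOMOUS:
            allowed.append(a)
        elif level == Autonomy.TELEOPERATED and operator_present:
            allowed.append(a)
    return allowed

print(allowed_actions("danger_zone", ["enter", "fire", "retreat"], operator_present=False))
# -> ['retreat']
```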
Yin, Biao. "Contrôle adaptatif des feux de signalisation dans les carrefours : modélisation du système de trafic dynamique et approches de résolution." Thesis, Belfort-Montbéliard, 2015. http://www.theses.fr/2015BELF0279/document.
Adaptive traffic signal control is a decision-making optimization problem that is constantly addressed in order to relieve traffic congestion at urban intersections. Intelligent algorithms are widely used to improve control performance measures such as traffic delay. In this thesis, we study this problem comprehensively with a microscopic, dynamic discrete-time model, and investigate the related algorithms both for isolated intersections and for distributed network control. We first focus on dynamic modelling for adaptive traffic signal control and network loading. The proposed adaptive phase sequence (APS) mode is highlighted as one of the signal phase control mechanisms. Signal control at intersections is fundamentally formulated as a Markov decision process (MDP); in particular, the concept of a tunable system state is proposed for traffic network coordination, and a new vehicle-following model supports the network loading environment. Based on this model, signal control methods are studied with optimal and near-optimal algorithms in turn. Two exact dynamic programming (DP) algorithms are investigated, and the results show the limitations of exact DP solutions when a large state space appears in complex cases. Because of the computational burden and the unknown model information in DP, approximate dynamic programming (ADP) is suggested instead. Finally, an online near-optimal algorithm using ADP with RLS-TD(λ) is validated. In simulation experiments, especially with the integration of APS, the proposed algorithm shows a clear advantage in performance measures and computational efficiency.
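RLS-TD(λ) is a recursive least-squares variant of temporal-difference learning with eligibility traces. As a rough sketch of the underlying idea, here is the plain gradient form of TD(λ) with a linear value approximation; the thesis's feature construction and recursive least-squares machinery are not reproduced, and all data below are synthetic:

```python
import numpy as np

def td_lambda(features, next_features, rewards, lam=0.9, gamma=0.95, alpha=0.01):
    """Plain TD(lambda) with a linear value approximation V(s) = theta . phi(s).

    features[t] / next_features[t]: feature vectors of s_t and s_{t+1};
    rewards[t]: observed one-step reward. (The thesis uses the recursive
    least-squares variant RLS-TD(lambda); this is the simpler gradient form.)
    """
    theta = np.zeros(features.shape[1])
    z = np.zeros_like(theta)                              # eligibility trace
    for phi, phi2, r in zip(features, next_features, rewards):
        delta = r + gamma * phi2 @ theta - phi @ theta    # TD error
        z = gamma * lam * z + phi                         # decay, then accumulate
        theta += alpha * delta * z
    return theta

# Synthetic run on random two-dimensional features.
rng = np.random.default_rng(1)
print(td_lambda(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)), rng.normal(size=500)))
```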
Bonneau, Mathieu. "Échantillonnage adaptatif optimal dans les champs de Markov, application à l'échantillonnage d'une espèce adventice." Toulouse 3, 2012. http://thesesups.ups-tlse.fr/1909/.
This work is divided into two parts: (i) the theoretical study of the problem of adaptive sampling in Markov random fields (MRFs) and (ii) the modelling of the problem of weed sampling in a crop field and the design of adaptive sampling strategies for this problem. For the first point, we modelled the problem of finding an optimal sampling strategy as a finite-horizon Markov decision process (MDP). We then proposed a generic algorithm for computing an approximate solution to any finite-horizon MDP with a known model. This algorithm, called Least-Squares Dynamic Programming (LSDP), combines the concepts of dynamic programming and reinforcement learning. It was then adapted to compute adaptive sampling strategies for any type of MRF distribution and observation costs. An experimental evaluation of this algorithm was performed on simulated problems. For the second point, we first modelled the spatial distribution of weeds in the MRF framework. Second, we built a cost model adapted to the weed sampling problem. Finally, both models were used together to design adaptive sampling strategies with the LSDP algorithm. Based on real-world data, these strategies were compared to a simple heuristic and to the static sampling strategies classically used for weed sampling.
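For a finite-horizon MDP with a known model, the exact solution is given by backward induction, the baseline that an approximate method such as LSDP emulates on state spaces too large to enumerate. A minimal sketch on an invented instance:

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Exact finite-horizon DP: V_t(s) = max_a [ R(s,a) + sum_s' P(s'|s,a) V_{t+1}(s') ].

    P: transitions, shape (A, S, S); R: rewards, shape (S, A).
    Returns the value at the start of the horizon and one greedy
    policy per stage.
    """
    V = np.zeros(P.shape[1])
    policies = []
    for _ in range(horizon):
        Q = R + np.einsum("asn,n->sa", P, V)  # Q[s, a] = R(s, a) + E[V(s')]
        policies.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    policies.reverse()                        # policies[t] is the decision rule for stage t
    return V, policies

# Tiny illustrative instance: 2 actions, 3 states, horizon 5.
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(3), size=(2, 3))    # rows P[a, s, :] sum to 1
R = rng.uniform(size=(3, 2))
V, policies = backward_induction(P, R, horizon=5)
print(V, policies[0])
```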
Radoszycki, Julia. "Résolution de processus décisionnels de Markov à espace d'état et d'action factorisés - Application en agroécologie." Thesis, Toulouse, INSA, 2015. http://www.theses.fr/2015ISAT0022/document.
This PhD thesis focuses on the resolution of problems of sequential decision making under uncertainty, modelled as Markov decision processes (MDPs) whose state and action spaces are both of high dimension. Solving these problems with a good compromise between quality of approximation and scaling remains a challenge. Algorithms for solving this type of problem are rare when the dimension of both spaces exceeds 30, and they impose certain limits on the nature of the problems that can be represented. We proposed a new framework, called F3MDP, together with associated approximate resolution algorithms. An F3MDP is a Markov decision process with factored state and action spaces (FA-FMDP) whose solution policies are constrained to be of a certain factored form, and can be stochastic. The algorithms we proposed belong to the family of approximate policy iteration algorithms and make use of continuous optimisation techniques and of inference methods for graphical models. These policy iteration algorithms have been validated in a large number of numerical experiments. For small F3MDPs, for which the globally optimal policy is available, they provide policies that are close to the global optimum. For larger problems from the graph-based Markov decision process (GMDP) subclass, they are competitive with state-of-the-art algorithms in terms of quality. We also show that our algorithms can deal with F3MDPs of very large size outside the GMDP subclass, on toy problems inspired by real problems in agronomy and ecology; the state and action spaces are then both of dimension 100 and of size 2^100. In this case, we compare the quality of the returned policies with that of expert policies. In the second part of the thesis, we applied the framework and the proposed algorithms to determine ecosystem service management strategies in an agricultural landscape. Weed species, i.e. wild plants of agricultural environments, have antagonistic functions, being at the same time in competition with the crop for resources and keystone species in the trophic networks of agroecosystems. We explore which organizations of the landscape (here composed of oilseed rape, wheat and pasture) in space and time make it possible to provide at the same time production services (production of cereals, fodder and honey), regulation services (regulation of weed populations and wild pollinators) and cultural services (conservation of weed species and wild pollinators). We developed a model of weed and pollinator dynamics, and reward functions modelling different objectives (production, conservation of biodiversity, or trade-offs between services). The state space of this F3MDP is of size 3^2100 and its action space of size 3^100, so this F3MDP is of substantial size. By solving it, we identified various landscape organizations that provide different sets of ecosystem services, differing in the magnitude of each of the three classes of ecosystem services.
El, Falou Salah. "Programmation répartie, optimisation par agent mobile." Phd thesis, Université de Caen, 2006. http://tel.archives-ouvertes.fr/tel-00123168.
[…] and to exchange information between these different entities. In this context, mobile agents appear as a promising solution for building flexible applications that can adapt to the constraints of the application and of the execution environment. In this thesis, mobility is studied from two angles. On the one hand, sending code to the server makes it possible to adapt remote services to the client's requirements, which reduces network traffic. On the other hand, an overloaded machine can delegate the execution of some of its tasks to another machine, which saves execution time. An architecture based on mobile-agent technology is proposed; it provides load balancing in a distributed application. The proposed architecture is decentralized, and load balancing is performed dynamically. A collector mobile agent is used to build a global view of the system. To reduce traffic, we propose communication through a hybrid intelligent agent: the agent uses two modes, client/server or migration (local exchange), for its communication. A Markov decision process is used to find the optimal policy for the agent's movements. Experimental work on concrete problems validates the proposed algorithms.
Thomas, Vincent. "Proposition d'un formalisme pour la construction automatique d'interactions dans les systèmes multi-agents réactifs." Phd thesis, Université Henri Poincaré - Nancy I, 2005. http://tel.archives-ouvertes.fr/tel-00011094.
Existing formalisms such as DEC-POMDPs can represent multi-agent problems, but they do not capture, at the individual level, the notion of interaction that is fundamental in collective systems. This induces a significant algorithmic complexity in the resolution algorithms. In order to give agents the ability to apprehend the presence of other agents, and to structure multi-agent systems implicitly, this thesis proposes an original formalism, the interac-DEC-POMDP, inspired by DEC-POMDPs and by Hamelin, a simulation developed during this thesis and derived from experiments conducted in ethology. The specificity of this formalism lies in the ability given to the agents to interact directly and locally with one another. This makes decision making possible at an intermediate level, between global decisions involving all the agents and purely individual decisions.
We also proposed a decentralized algorithm based on reinforcement learning techniques and on a heuristic distribution of the agents' gains during interactions. An experimental approach allowed us to validate its ability to produce, for restrictions of the formalism, relevant adaptive collective behaviours without any agent having a global view of the system.
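The interac-DEC-POMDP machinery is not reproduced here, but the idea of heuristically redistributing gains between interacting agents can be sketched on top of ordinary tabular Q-learning; the 50/50 split and all names below are arbitrary illustrations, not the thesis's algorithm:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9

def make_agent():
    return defaultdict(float)   # tabular Q-values: (state, action) -> value

def q_update(Q, s, a, r, s2, actions):
    """One standard Q-learning backup."""
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def interact(Q1, Q2, s, a1, a2, joint_reward, s2, actions, share=0.5):
    """Heuristic gain sharing: the joint reward produced by a local
    interaction is split between the two participants (here 50/50,
    an arbitrary choice), and each agent learns from its own share."""
    q_update(Q1, s, a1, share * joint_reward, s2, actions)
    q_update(Q2, s, a2, (1 - share) * joint_reward, s2, actions)

Q1, Q2 = make_agent(), make_agent()
interact(Q1, Q2, s="nest", a1="push", a2="wait", joint_reward=1.0,
         s2="nest", actions=["push", "wait"])
print(Q1[("nest", "push")], Q2[("nest", "wait")])   # -> 0.05 0.05
```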
Guillot, Matthieu. "Le problème du plus court chemin stochastique et ses variantes : fondements et applications à l'optimisation de stratégie dans le sport." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM024.
A golf course consists of eighteen holes. On each hole, the golfer has to move the ball from the tee to the flag in a minimum number of shots. Under certain assumptions, the golfer's problem can be modelled as a stochastic shortest path (SSP) problem. The SSP problem is a special case of Markov decision processes in which an agent evolves dynamically in a finite set of states. In each state, the agent chooses an action that leads it to another state following a known probability distribution, and this action induces a cost. There is a 'sink node' in which the agent, once there, stays with probability one and at cost zero. The goal of the agent is to reach the sink node with minimum expected cost. In the first chapter, we study the SSP problem theoretically. We define a new framework in which the assumptions needed for the existence of an optimal policy are weakened, prove that the best-known algorithms still converge in this setting, and define a new algorithm, based on the primal-dual algorithm, to solve the problem exactly. In the second chapter we detail the model of the golfer's problem as an SSP. Thanks to the ShotLink database, we create 'numerical clones' of players and simulate these clones on different golf courses in order to predict professional golfers' scores. We apply our model to two competitions: the Masters at Augusta in 2017 and the Ryder Cup in 2018. In the third chapter, we study the natural two-player extension of the SSP problem: stochastic shortest path games. We study two special cases, and in particular linear programming formulations of these games.
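The expected-cost criterion for an SSP can be illustrated with standard value iteration over costs, with an absorbing zero-cost sink; the tiny instance below is invented and is unrelated to the golf model:

```python
import numpy as np

def ssp_value_iteration(P, C, sink, tol=1e-10):
    """Value iteration for a stochastic shortest path problem.

    P[a][s, s2]: transition probabilities; C[s, a]: action costs;
    sink: absorbing zero-cost target state. Minimizes the expected
    total cost of reaching the sink.
    """
    J = np.zeros(C.shape[0])
    while True:
        Q = np.stack([C[:, a] + P[a] @ J for a in range(len(P))], axis=1)
        Q[sink, :] = 0.0                       # staying at the sink costs nothing
        J_new = Q.min(axis=1)
        if np.abs(J_new - J).max() < tol:
            return J_new, Q.argmin(axis=1)
        J = J_new

# 3-state chain with sink state 2: a safe step forward vs. a risky jump.
P = [np.array([[0.0, 1.0, 0.0],    # action 0: one step forward, surely
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]]),
     np.array([[0.5, 0.0, 0.5],    # action 1: reach the sink with prob. 0.5
               [0.5, 0.0, 0.5],
               [0.0, 0.0, 1.0]])]
C = np.array([[1.0, 1.5],
              [1.0, 1.5],
              [0.0, 0.0]])
J, policy = ssp_value_iteration(P, C, sink=2)
print(J, policy)   # expected costs-to-go and a greedy policy
```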
Hamila, Mohammed Amine. "Planification multi-agents dans un cadre markovien : les jeux stochastiques à somme générale." Thesis, Valenciennes, 2012. http://www.theses.fr/2012VALE0014/document.
Planning an agent's actions in a dynamic and uncertain environment has been extensively studied. The framework of Markov decision processes provides tools to model and solve such problems, and the field of game theory allows the study of strategic interactions between multiple agents in a given game. The framework of stochastic games is a generalization of both Markov decision processes and game theory: it makes it possible to model systems with multiple agents and multiple states. However, planning in a multi-agent system is considered difficult: an agent's decisions depend not only on its own actions but also on the actions of the other agents. The work presented in this thesis focuses on decision making in distributed multi-agent systems. Existing works in this field allow the theoretical resolution of stochastic games, but place severe restrictions and ignore some crucial problems of the model. We propose a decentralized planning algorithm for the stochastic game model, based on the value iteration algorithm and on the concept of Nash equilibrium. To improve the resolution process and to deal with large problems, we sought to ease decision making and to limit the set of joint actions at each stage. The proposed algorithm was validated on a coordination problem including several agents, and various experiments were conducted to assess the quality of the resulting solution.
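In a Nash-equilibrium-based value iteration, each state induces a stage game built from the players' current value estimates, and an equilibrium of that game determines the joint action. As a small illustration of that inner step, here is a brute-force search for the pure Nash equilibria of a two-player stage game; the payoff matrices are invented:

```python
import numpy as np

def pure_nash_equilibria(A, B):
    """Return all pure Nash equilibria (i, j) of a two-player stage game.

    A[i, j]: payoff of player 1, B[i, j]: payoff of player 2. In an
    equilibrium-based value iteration, A and B would be the players'
    Q-value matrices at the current state; here they are invented.
    """
    eqs = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max():
                eqs.append((i, j))   # neither player can gain by deviating
    return eqs

# Classic coordination game: both agents prefer matching actions.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[2.0, 0.0], [0.0, 1.0]])
print(pure_nash_equilibria(A, B))   # -> [(0, 0), (1, 1)]
```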