Dissertations / Theses on the topic 'Fouille de Données Textuelles Hétérogènes'
Consult the dissertations / theses listed below for your research on the topic 'Fouille de Données Textuelles Hétérogènes.'
Alencar, Medeiros Gabriel Henrique. "PreDiViD: Towards the Prediction of the Dissemination of Viral Disease contagion in a pandemic setting." Electronic Thesis or Diss., Normandie, 2025. http://www.theses.fr/2025NORMR005.
Event-Based Surveillance (EBS) systems are essential for detecting and tracking emerging health phenomena such as epidemics and public health crises. However, they face limitations, including strong dependence on human expertise, challenges processing heterogeneous textual data, and insufficient consideration of spatiotemporal dynamics. To overcome these issues, we propose a hybrid approach combining knowledge-driven and data-driven methodologies, anchored in the Propagation Phenomena Ontology (PropaPhen) and the Description-Detection-Prediction Framework (DDPF), to enhance the description, detection, and prediction of propagation phenomena. PropaPhen is a FAIR ontology designed to model the spatiotemporal spread of phenomena. It has been specialized in the biomedical domain through the integration of UMLS and World-KG, leading to the creation of the BioPropaPhenKG knowledge graph. The DDPF framework consists of three modules: description, which generates domain-specific ontologies; detection, which applies relation extraction techniques to heterogeneous textual sources; and prediction, which uses advanced clustering methods. Tested on COVID-19 and Monkeypox datasets and validated against WHO data, DDPF demonstrated its effectiveness in detecting and predicting spatiotemporal clusters. Its modular architecture ensures scalability and adaptability to various domains, opening perspectives in public health, environmental monitoring, and social phenomena.
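To give a concrete flavour of the prediction step described above, the following minimal sketch groups geolocated, time-stamped case reports into spatiotemporal clusters. It is a hypothetical illustration only: DBSCAN stands in for the "advanced clustering methods" mentioned in the abstract, and the coordinates, day indices and epsilon value are invented assumptions, not the thesis's actual pipeline.

```python
# Minimal sketch: spatiotemporal clustering of geolocated case reports.
# Hypothetical data and parameters; DBSCAN is an illustrative stand-in.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Each row: (latitude, longitude, day index of the report)
reports = np.array([
    [48.85, 2.35, 0], [48.86, 2.34, 1], [48.84, 2.36, 2],   # cluster around Paris, week 1
    [45.76, 4.83, 10], [45.75, 4.85, 11],                    # cluster around Lyon, week 2
    [43.30, 5.37, 30],                                       # isolated report (noise)
])

# Put space and time on comparable scales before clustering.
X = StandardScaler().fit_transform(reports)

labels = DBSCAN(eps=0.8, min_samples=2).fit_predict(X)
for report, label in zip(reports, labels):
    print(report, "-> cluster", label)   # label -1 marks noise
```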
Tisserant, Guillaume. "Généralisation de données textuelles adaptée à la classification automatique." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS231/document.
Text classification has long been studied. Early on, documents of different types were grouped together in order to centralize knowledge, and classification and indexing systems were created to make it easy to find documents according to readers' needs. With the growing number of documents and the advent of computers and the Internet, implementing automatic text classification systems has become a critical issue. However, textual data, complex and rich by nature, are difficult to process automatically. In this context, this thesis proposes an original methodology to organize textual information and facilitate access to it. Our automatic classification approach and our semantic information extraction enable relevant information to be found quickly. Specifically, this manuscript presents new forms of text representation that facilitate their processing for automatic classification. A partial generalization of textual data (the GenDesc approach) based on statistical and morphosyntactic criteria is proposed. Moreover, this thesis focuses on phrase construction and on the use of semantic information to improve the representation of documents. We demonstrate through numerous experiments the relevance and genericity of our proposals and show that they improve classification results. Finally, as social networks are developing strongly, a method for the automatic generation of semantic hashtags is proposed. Our approach is based on statistical measures, semantic resources and the use of syntactic information. The generated hashtags can then be exploited for information retrieval tasks on large volumes of data.
Azé, Jérôme. "Extraction de Connaissances à partir de Données Numériques et Textuelles." Phd thesis, Université Paris Sud - Paris XI, 2003. http://tel.archives-ouvertes.fr/tel-00011196.
The analysis of such data is often constrained by the definition of a minimal support threshold used to filter out uninteresting knowledge. Data experts often find it difficult to set this support. We proposed a method, based on the use of quality measures, that makes it possible not to fix a minimal support. We focused on the extraction of knowledge in the form of association rules. These rules must satisfy one or more quality criteria in order to be considered interesting and proposed to the expert. We proposed two quality measures combining different criteria that allow interesting rules to be extracted, and thus an algorithm that extracts these rules without using the minimal support constraint. The behaviour of our algorithm was studied in the presence of noisy data, which highlighted the difficulty of automatically extracting reliable knowledge from noisy data. One of the solutions we proposed consists in evaluating each rule's resistance to noise and informing the expert of it during the analysis and validation of the obtained knowledge. Finally, a study on real data was carried out as part of a text mining process. The knowledge sought in these texts consists of association rules between concepts defined by the expert and specific to the domain under study. We proposed a tool to extract this knowledge and to assist the expert in validating it. The various results obtained show that interesting knowledge can be obtained from textual data while minimizing the expert's involvement in the association rule extraction phase.
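To make the idea concrete, here is a minimal sketch of rule extraction driven by quality measures rather than a minimal support threshold. The thesis's actual measures are not reproduced; confidence and lift are used purely as illustrative stand-ins, the transactions are invented, and only single-antecedent rules are enumerated.

```python
# Sketch: rank single-antecedent association rules A -> B by quality
# measures (confidence, lift) instead of filtering on a minimum support.
from itertools import permutations

transactions = [
    {"fever", "cough", "flu"},
    {"fever", "flu"},
    {"cough", "cold"},
    {"fever", "cough", "flu"},
    {"cough", "cold", "fatigue"},
]
n = len(transactions)
items = set().union(*transactions)

def freq(itemset):
    return sum(itemset <= t for t in transactions) / n

rules = []
for a, b in permutations(items, 2):
    support_ab = freq({a, b})
    if support_ab == 0:
        continue
    confidence = support_ab / freq({a})
    lift = confidence / freq({b})
    rules.append((lift, confidence, a, b))

# No support threshold: every non-empty rule is kept and ranked by its quality.
for lift, conf, a, b in sorted(rules, reverse=True)[:5]:
    print(f"{a} -> {b}: confidence={conf:.2f}, lift={lift:.2f}")
```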
Fize, Jacques. "Mise en correspondance de données textuelles hétérogènes fondée sur la dimension spatiale." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS099.
With the rise of Big Data, the processing of Volume, Velocity (growth and evolution) and Variety of data concentrates the efforts of communities to exploit these new resources, which have become so important that they are considered the new "black gold". In recent years, volume and velocity have been aspects of the data that are well controlled, unlike variety, which remains a major challenge. This thesis presents two contributions in the field of heterogeneous data matching, with a focus on the spatial dimension. The first contribution is based on a two-step process for matching heterogeneous textual data: georepresentation and geomatching. In the first phase, we propose to represent the spatial dimension of each document in a corpus through a dedicated structure, the Spatial Textual Representation (STR). This graph representation is composed of the spatial entities identified in the document, as well as the spatial relationships they maintain. To identify the spatial entities of a document and their spatial relationships, we propose a dedicated resource, called Geodict. The second phase, geomatching, computes the similarity between the generated representations (STRs). Based on the nature of the STR structure (i.e. a graph), different graph matching algorithms were studied. To assess the relevance of a match, we propose a set of six criteria based on a definition of the spatial similarity between two documents. The second contribution is based on the thematic dimension of textual data and its participation in the spatial matching process. We propose to identify the themes that appear in the same contextual window as certain spatial entities, with the objective of inducing some of the implicit spatial similarities between documents. To do this, we propose to extend the STR structure using two concepts: the thematic entity and the thematic relationship. A thematic entity represents a concept specific to a particular field (agronomic, medical) and is represented by the different spellings present in a terminological resource, in this case a vocabulary. A thematic relationship links a spatial entity to a thematic entity if they appear in the same window. The selected vocabularies and the new form of STR integrating the thematic dimension are evaluated according to their coverage of the studied corpora, as well as their contribution to the heterogeneous textual matching process on the spatial dimension.
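By way of illustration only, the sketch below builds two toy STR-like graphs and compares them with a naive Jaccard overlap of their entities and relation triples. This is a simplification under stated assumptions: the entities and relations are invented, and the thesis's Geodict resource, graph-matching algorithms and six similarity criteria are not reproduced here.

```python
# Sketch: toy STR-like graphs and a naive Jaccard similarity between them.
# Entities and relations are invented examples; real STRs come from Geodict.
import networkx as nx

def build_str(triples):
    g = nx.DiGraph()
    for subj, relation, obj in triples:
        g.add_edge(subj, obj, relation=relation)
    return g

str_a = build_str([("Montpellier", "near", "Sète"),
                   ("Montpellier", "in", "Occitanie")])
str_b = build_str([("Montpellier", "near", "Sète"),
                   ("Montpellier", "in", "France")])

def jaccard(set_a, set_b):
    return len(set_a & set_b) / len(set_a | set_b) if set_a | set_b else 0.0

def triples(g):
    return {(u, d["relation"], v) for u, v, d in g.edges(data=True)}

entity_sim = jaccard(set(str_a.nodes), set(str_b.nodes))
relation_sim = jaccard(triples(str_a), triples(str_b))
print(f"entity overlap: {entity_sim:.2f}, relation overlap: {relation_sim:.2f}")
```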
Holat, Pierre. "Fouille de motifs et modélisation statistique pour l'extraction de connaissances textuelles." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCD045.
In natural language processing, two main approaches are used: machine learning and data mining. In this context, cross-referencing pattern-based data mining methods and statistical machine learning methods is a promising but hardly explored avenue. In this thesis, we present three major contributions: the introduction of delta-free patterns, used as statistical model features; the introduction of a semantic similarity constraint for the mining, computed using a statistical model; and the introduction of sequential labeling rules, created from the patterns and selected by a statistical model.
Séguéla, Julie. "Fouille de données textuelles et systèmes de recommandation appliqués aux offres d'emploi diffusées sur le web." Thesis, Paris, CNAM, 2012. http://www.theses.fr/2012CNAM0801/document.
In recent years, the expansion of e-recruitment has led to the multiplication of web channels dedicated to job postings. In an economic context where cost control is fundamental, assessing and comparing the performance of recruitment channels has become necessary. The purpose of this work is to develop a decision-making tool intended to guide recruiters when they post a job on the Internet. This tool provides recruiters with the expected performance on job boards for a given job offer. First, we identify the potential predictors of a recruiting campaign's performance. Then, we apply text mining techniques to the job offer texts in order to structure postings and to extract information relevant to improving their description in a predictive model. The job offer performance prediction algorithm is based on a hybrid recommender system, suited to the cold-start problem. The hybrid system, based on a supervised similarity measure, outperforms standard multivariate models. Our experiments are conducted on a real dataset coming from a job posting database.
Zenasni, Sarah. "Extraction d'information spatiale à partir de données textuelles non-standards." Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS076/document.
The extraction of spatial information from textual data has become an important research topic in the field of Natural Language Processing (NLP). It meets a crucial need in the information society, in particular to improve the efficiency of Information Retrieval (IR) systems for different applications (tourism, spatial planning, opinion analysis, etc.). Such systems require a detailed analysis of the spatial information contained in the available textual data (web pages, e-mails, tweets, SMS, etc.). However, the multitude and variety of these data, as well as the regular emergence of new forms of writing, make the automatic extraction of information from such corpora difficult. To meet these challenges, we propose, in this thesis, new text mining approaches allowing the automatic identification of variants of spatial entities and relations from textual data of mediated communication. These approaches are based on three main contributions that provide intelligent navigation methods. Our first contribution focuses on the problem of recognition and identification of spatial entities in corpora of short messages (SMS, tweets) characterized by weakly standardized modes of writing. The second contribution is dedicated to the identification of new forms/variants of spatial relations in these specific corpora. Finally, the third contribution concerns the identification of the semantic relations associated with textual spatial information.
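As a rough illustration of the first contribution, the sketch below matches non-standard spellings of place names, as found in SMS-like messages, against a small gazetteer using approximate string matching. The gazetteer, the messages and the similarity cutoff are invented assumptions; the thesis's actual methods are richer than this.

```python
# Sketch: link weakly standardized place-name variants (as found in SMS/tweets)
# to canonical entries of a small, hypothetical gazetteer.
import difflib

gazetteer = ["Montpellier", "Marseille", "Toulouse", "Perpignan"]

messages = [
    "on se voit a montpel ce soir ?",
    "jsuis a marseil la semaine prochaine",
    "rdv toulouz samedi",
]

for msg in messages:
    for token in msg.split():
        matches = difflib.get_close_matches(token.capitalize(), gazetteer,
                                            n=1, cutoff=0.7)
        if matches:
            print(f"{token!r:12} -> {matches[0]}")
```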
Pantin, Jérémie. "Détection et caractérisation sémantique de données textuelles aberrantes." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS347.pdf.
Machine learning addresses the problem of handling dedicated tasks over a wide variety of data. Such algorithms can be easy or hard to apply depending on the data. Low-dimensional data (two or three dimensions) with an intuitive representation (e.g. the average baguette price by year) are easier for a human to interpret and explain than data with thousands of dimensions. For low-dimensional data, an error leads to a significant shift with respect to normal data, but the situation is different for high-dimensional data. Outlier detection (also called anomaly detection or novelty detection) is the study of outlying observations in order to determine what is normal and what is abnormal. The methods that perform this task are algorithms and models based on data distributions. Different families of approaches can be found in the outlier detection literature, and they are mostly independent of ground truth. They perform outlier analysis by detecting the principal behaviors of the majority of observations; data that deviate from the normal distribution are then considered noise or outliers. We detail the application of outlier detection to text. Despite recent progress in natural language processing, computers still lack a deep understanding of human language in the absence of additional information. For instance, the sentence "A smile is a curve that sets everything straight" has several levels of understanding, and a machine may struggle to choose the right reading. This thesis presents the analysis of high-dimensional outliers, applied to text. Recent advances in anomaly and outlier detection are not well represented for text data, and we propose to highlight the main differences that high-dimensional outliers entail. We also address ensemble methods, which are nearly nonexistent in the literature for this context. Finally, an application of outlier detection to improving abstractive summarization results is presented. We propose GenTO, a method that prepares and generates splits of data into which anomalies and outliers are inserted. Based on this method, an evaluation and benchmark of outlier detection approaches on documents is proposed. The proposed taxonomy makes it possible to identify difficult, hierarchised outliers that the literature tackles without naming them. Moreover, learning without supervision often makes models rely on some hyperparameter. For instance, the Local Outlier Factor relies on the k nearest neighbors to compute the local density, so choosing the right value for k is crucial; in this regard, we explore the influence of this parameter for text data. While choosing a single model can lead to obvious biases on real-world data, ensemble methods mitigate this problem and are particularly effective for outlier analysis: selecting several values for one hyperparameter can help to detect strong outliers. Feature importance is then tackled, as it can help a human to understand the output of a black-box model, and the interpretability of outlier detection models is thus questioned. We find that, for numerous datasets, a small number of features can be selected as an oracle, and that associating complete models with restrained models helps to mitigate the black-box effect of some approaches. In some cases, outlier detection amounts to noise removal or anomaly detection, and some applications can benefit from the characteristics of such a task. Email spam detection and fake news detection are examples, but we propose to use outlier detection approaches for weak-signal exploration in a marketing project. We find that models from the literature help to improve unsupervised abstractive summarization and also to find weak signals in text.
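To illustrate the ensemble idea mentioned above, here is a minimal sketch that averages Local Outlier Factor scores over several values of the n_neighbors hyperparameter on TF-IDF vectors. The corpus and the range of k are invented, and this is a generic illustration rather than the GenTO benchmark itself.

```python
# Sketch: ensemble of Local Outlier Factor detectors over several k values,
# applied to TF-IDF representations of a tiny, invented corpus.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import LocalOutlierFactor

documents = [
    "the match ended with a late goal",
    "the striker scored twice in the second half",
    "the goalkeeper saved a penalty in extra time",
    "the team won the league after a tense final",
    "quarterly inflation figures surprised the central bank",   # topical outlier
]

X = TfidfVectorizer().fit_transform(documents).toarray()

scores = []
for k in (2, 3, 4):   # several hyperparameter values instead of a single one
    lof = LocalOutlierFactor(n_neighbors=k, metric="cosine")
    lof.fit(X)
    # Higher value = more outlying (negate sklearn's negative_outlier_factor_).
    scores.append(-lof.negative_outlier_factor_)

consensus = np.mean(scores, axis=0)
for doc, score in sorted(zip(documents, consensus), key=lambda p: -p[1]):
    print(f"{score:.2f}  {doc}")
```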
Hussain, Syed Fawad. "Une nouvelle mesure de co-similarité : applications aux données textuelles et génomique." Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM049.
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into homogeneous and contrasted groups (clusters). As datasets become larger and more varied, adaptations to existing algorithms are required to maintain cluster quality. In this regard, high-dimensional data pose some problems for traditional clustering algorithms, known as the curse of dimensionality. This thesis proposes a co-similarity based algorithm built on the concept of higher-order co-occurrences, which are extracted from the given data. In the case of text analysis, for example, document similarity is calculated based on word similarity, which in turn is calculated on the basis of document similarity. Using this iterative approach, we can bring similar documents closer together even if they do not share the same words, provided they share similar words. This approach does not need external linguistic resources such as a thesaurus. Furthermore, it can be extended to incorporate prior knowledge from a training dataset for the task of text categorization: prior category labels coming from the training set can be used to influence the similarity measures between words so as to better classify incoming test documents among the different categories. Thus, the same conceptual approach, which can be expressed in the framework of graph theory, can be used for both clustering and categorization tasks depending on the amount of prior information available. Our results show a significant increase in accuracy with respect to the state of the art in both one-way and two-way clustering on the different datasets tested.
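The iterative idea described above can be sketched as follows: document similarity is recomputed from word similarity and vice versa, a few times, starting from a raw document-term matrix. This is a simplified, hypothetical rendering of the co-similarity principle (the normalisation is naive and the matrix is a toy example), not the thesis's exact algorithm.

```python
# Sketch of the co-similarity idea: document similarity computed from word
# similarity, which is in turn computed from document similarity, iteratively.
import numpy as np

# Rows = documents, columns = words (toy document-term matrix).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

def normalize(S):
    """Scale a similarity matrix so its entries stay in [0, 1]."""
    return S / S.max() if S.max() > 0 else S

# Start with identity similarities (each document/word similar only to itself).
sim_docs = np.eye(A.shape[0])
sim_words = np.eye(A.shape[1])

for _ in range(3):                       # a few iterations are enough here
    sim_docs = normalize(A @ sim_words @ A.T)
    sim_words = normalize(A.T @ sim_docs @ A)

print(np.round(sim_docs, 2))             # documents 0-1 and 2-3 come out similar
```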
El, Haddadi Anass. "Fouille multidimensionnelle sur les données textuelles visant à extraire les réseaux sociaux et sémantiques pour leur exploitation via la téléphonie mobile." Toulouse 3, 2011. http://thesesups.ups-tlse.fr/1378/.
Competition is a fundamental concept of the liberal economic tradition that requires companies to resort to Competitive Intelligence (CI) in order to be advantageously positioned on the market, or simply to survive. Nevertheless, it is well known that it is not the strongest of organizations that survives, nor the most intelligent, but rather the one most adaptable to change, the dominant factor in society today. Therefore, companies are required to remain constantly in a wakeful state, watching for any change in order to devise appropriate solutions in real time. However, for a successful vigil, we should not be satisfied with merely monitoring opportunities; above all, we must anticipate risks. The external risk factors have never been so numerous: extremely dynamic and unpredictable markets, new entrants, mergers and acquisitions, sharp price reductions, rapid changes in consumption patterns and values, fragility of brands and their reputation. To face all these challenges, our research consists in proposing a Competitive Intelligence System (CIS) designed to provide online services. Through descriptive and exploratory statistical methods, Xplor EveryWhere displays, in a very short time, new strategic knowledge such as: the profile of the actors, their reputation, their relationships, their sites of action, their mobility, emerging issues and concepts, terminology, promising fields, etc. The need for security in Xplor EveryWhere arises from the strategic nature of the information conveyed, which has quite a substantial value. Such security should not be considered an additional option that a CIS provides merely to distinguish itself from others, especially as the leak of this information is not the result of inherent weaknesses in corporate computer systems but is above all an organizational issue. With Xplor EveryWhere we completed the reporting service, especially the mobility aspect. With this system, it is possible to view up-to-date information, since the strategic database server is accessed in real time and fed daily by watchers, who can enter information at trade shows, during customer visits or after meetings.
Garcelon, Nicolas. "Problématique des entrepôts de données textuelles : dr Warehouse et la recherche translationnelle sur les maladies rares." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB257/document.
The repurposing of clinical data for research has become widespread with the development of clinical data warehouses. These data warehouses are modeled to integrate and explore structured data related to thesauri. Such data come mainly from machines (biology, genetics, cardiology, etc.) but also from manual data entry forms. Care production also provides a large amount of textual data from hospital reports (hospitalization, surgery, imaging, anatomopathology, etc.) and from free-text areas in electronic forms. This mass of data, little used by conventional warehouses, is an indispensable source of information in the context of rare diseases. Indeed, free text makes it possible to describe the clinical picture of a patient with greater precision, expressing the absence of signs as well as uncertainty. Particularly for patients who are still undiagnosed, the doctor describes the patient's medical history outside any nosological framework. This wealth of information makes clinical text a valuable source for translational research. However, exploiting it requires appropriate algorithms and tools to enable optimized re-use by doctors and researchers. We present in this thesis a data warehouse centered on the clinical document, which we have modeled, implemented and evaluated. In three use cases for translational research in the context of rare diseases, we addressed the problems inherent in textual data: (i) recruitment of patients through a search engine adapted to textual data (handling negation and detecting family history), (ii) automated phenotyping from textual data, and (iii) diagnosis by similarity between patients based on phenotyping. We evaluated these methods on the data warehouse of the Necker-Enfants Malades hospital, created and fed during this thesis, which integrates about 490,000 patients and 4 million reports. These methods and algorithms were integrated into the Dr Warehouse software, developed during the thesis and distributed as open source since September 2017.
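As a toy illustration of the negation handling mentioned in use case (i), the sketch below flags a phenotype term as negated when a simple negation cue precedes it in the same sentence. The cue list and the French sentences are invented; Dr Warehouse's actual detection is more elaborate and also covers family history.

```python
# Sketch: rule-based negation detection around a phenotype term in clinical
# sentences (NegEx-like idea, heavily simplified; cues are illustrative).
import re

NEGATION_CUES = ["pas de", "absence de", "aucun", "aucune", "sans"]

def is_negated(sentence, term):
    """Return True if a negation cue appears shortly before the term."""
    s = sentence.lower()
    pos = s.find(term.lower())
    if pos == -1:
        return False
    window = s[max(0, pos - 30):pos]          # look a few words back
    return any(re.search(r"\b" + re.escape(cue) + r"\b", window)
               for cue in NEGATION_CUES)

sentences = [
    "Examen clinique : pas de cardiopathie retrouvée.",
    "Le patient présente une cardiopathie congénitale.",
    "Absence de retard de croissance.",
]
for s in sentences:
    for term in ("cardiopathie", "retard de croissance"):
        if term in s.lower():
            print(f"{term!r} negated={is_negated(s, term)} | {s}")
```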
Ramiandrisoa, Iarivony. "Extraction et fouille de données textuelles : application à la détection de la dépression, de l'anorexie et de l'agressivité dans les réseaux sociaux." Thesis, Toulouse 3, 2020. http://www.theses.fr/2020TOU30191.
Our research mainly focuses on tasks with an applicative purpose: depression and anorexia detection on the one hand, and aggression detection on the other, both from messages posted by users on social media platforms. We have also proposed an unsupervised keyphrase extraction method. These three pieces of work were initiated at different times during this thesis. Our first contribution concerns automatic keyphrase extraction from scientific documents and news articles. More precisely, we improve an unsupervised graph-based method, addressing the weaknesses of graph-based methods by combining existing solutions. We evaluated our approach on eleven data collections: five containing long documents, four containing short documents, and two containing news articles. We showed that our proposal improves the results in certain contexts. The second contribution of this thesis is a solution for early detection of depression and anorexia. We proposed models that use classical classifiers, namely logistic regression and random forest, based on (a) features and (b) sentence embeddings. We evaluated our models on the eRisk data collections and observed that feature-based models perform very well on precision-oriented measures for both depression and anorexia detection, while the model based on sentence embeddings performs better on ERDE_50 and recall-oriented measures. We also obtained results better than the state of the art on precision and ERDE_50 for depression detection, and on precision and recall for anorexia detection. Our last contribution is an approach for aggression detection in messages posted by users on social networks. We reused the same models as for depression and anorexia detection and added models based on deep learning. We evaluated our models on the data collections of the TRAC shared task. Our models using deep learning provide better results than those using classical classifiers. Our results in this part of the thesis rank in the middle (fifth or ninth) compared to the competitors, although we obtained the best result on one of the data collections.
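To give a flavour of the second contribution, the sketch below trains the two classical classifiers mentioned above on a tiny invented corpus, using plain TF-IDF features as a stand-in for the thesis's features and sentence embeddings; the eRisk data and the actual feature set are not reproduced.

```python
# Sketch: classical classifiers (logistic regression, random forest) for
# depression detection from user messages. Corpus and labels are invented.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "i feel hopeless and tired all the time",
    "nothing matters anymore, i can't sleep",
    "had a great run this morning with friends",
    "excited about my new job starting next week",
]
labels = [1, 1, 0, 0]   # 1 = at-risk, 0 = control (toy annotation)

for clf in (LogisticRegression(), RandomForestClassifier(n_estimators=50)):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(messages, labels)
    pred = model.predict(["i am so tired and everything feels hopeless"])
    print(type(clf).__name__, "->", pred[0])
```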
Dermouche, Mohamed. "Modélisation conjointe des thématiques et des opinions : application à l'analyse des données textuelles issues du Web." Thesis, Lyon 2, 2015. http://www.theses.fr/2015LYO22007/document.
This work lies at the junction of two domains: topic modeling and sentiment analysis. The problem we propose to tackle is the joint and dynamic modeling of topics (subjects) and sentiments (opinions) on the Web. In the literature, the task is usually divided into sub-tasks that are treated separately, and models that operate this way fail to capture the topic-sentiment interaction and association. In this work, we propose a joint modeling of topics and sentiments that takes the associations between them into account. We are also interested in the dynamics of topic-sentiment associations. To this end, we adopt a statistical approach based on probabilistic topic models. Our main contributions can be summarized in two points: 1. TS (Topic-Sentiment model): a new probabilistic topic model for the joint extraction of topics and sentiments. This model characterizes the extracted topics with distributions over sentiment polarities; the goal is to discover the sentiment proportions specific to each of the extracted topics. 2. TTS (Time-aware Topic-Sentiment model): a new probabilistic model to characterize topic-sentiment dynamics. Relying on the documents' time information, TTS characterizes the quantitative evolution of each extracted topic-sentiment pair. We also present two other contributions: a new evaluation framework for measuring the performance of topic extraction methods, and a new hybrid method for sentiment detection and classification from text, based on combining supervised machine learning and prior knowledge. All the proposed methods are tested on real-world data using adapted evaluation frameworks.
Egho, Elias. "Extraction de motifs séquentiels dans des données séquentielles multidimensionnelles et hétérogènes : une application à l'analyse de trajectoires de patients." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0066/document.
All domains of science and technology produce large and heterogeneous data. Although a lot of work has been done in this area, mining such data is still a challenge, and no previous research work targets the mining of heterogeneous multidimensional sequential data. This thesis proposes a contribution to knowledge discovery in heterogeneous sequential data. We study three research directions: (i) extraction of sequential patterns, (ii) classification and (iii) clustering of sequential data. Firstly, we generalize the notion of a multidimensional sequence by considering complex and heterogeneous sequential structures. We present a new approach called MMISP to extract sequential patterns from heterogeneous sequential data. MMISP generates a large number of sequential patterns, as is usually the case for pattern enumeration algorithms. To overcome this problem, we propose a novel way of considering heterogeneous multidimensional sequences by mapping them into pattern structures, and we develop a framework for enumerating only the patterns satisfying given constraints. The second research direction concerns the classification of heterogeneous multidimensional sequences. We use Formal Concept Analysis (FCA) as a classification method and show interesting properties of concept lattices and of the stability index for classifying sequences into a concept lattice and selecting interesting groups of sequences. The third research direction concerns the clustering of heterogeneous multidimensional sequential data. We focus on the notion of common subsequences to define the similarity between a pair of sequences composed of lists of itemsets, and we use this similarity measure to build a similarity matrix between sequences and to separate them into different groups. We present theoretical results and an efficient dynamic programming algorithm to count the number of common subsequences between two sequences without enumerating all subsequences. The system resulting from this research work was applied to analyze and mine patient healthcare trajectories in oncology. Data are taken from a medico-administrative database including all information about the hospitalizations of patients in the Lorraine region (France). The system makes it possible to identify and characterize episodes of care for specific sets of patients. Results were discussed and validated with domain experts.
Siolas, Georges. "Modèles probabilistes et noyaux pour l'extraction d'informations à partir de documents." Paris 6, 2003. http://www.theses.fr/2003PA066487.
Haddad, Mohamed Hatem. "Extraction et impact des connaissances sur les performances des systèmes de recherche d'information." Phd thesis, Université Joseph Fourier (Grenoble), 2002. http://tel.archives-ouvertes.fr/tel-00004459.
Karouach, Saïd. "Visualisations interactives pour la découverte de connaissances, concepts, méthodes et outils." Toulouse 3, 2003. http://www.theses.fr/2003TOU30082.
Hô, Dinh Océane. "Caractérisation différentielle de forums de discussion sur le VIH en vietnamien et en français : Éléments pour la fouille comportementale du web social." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCF022/document.
The standard discourse produced by official organisations is confronted with the unofficial or informal discourse of the social web. Empowering people to express themselves results in a new balance of authority when it comes to knowledge, and changes the way people learn. Social web discourse is available to everyone and its volume is growing fast, which opens up new fields for both the humanities and the social sciences to investigate. The latter, however, are not yet equipped to engage with such complex and little-analysed data. The aim of this dissertation is to investigate how far social web discourse can help supplement official discourse. We set out a method to collect and analyse data that is in line with the characteristics of a digital environment, namely data size, anonymity, transience and structure. We focus on forums, where such discourse is built, and test our method on a specific social issue, i.e. the HIV/AIDS epidemic in Vietnam. This field of investigation encompasses several related questions that have to do with health, society, the evolution of morals, and the mismatch between different kinds of discourse. Our study is also grounded in the analysis of a comparable French corpus dealing with the same topic, whose genre and discourse characteristics are equivalent to those of the Vietnamese one: this two-pronged approach highlights the specific features of different socio-cultural environments.
Saneifar, Hassan. "Locating Information in Heterogeneous log files." Thesis, Montpellier 2, 2011. http://www.theses.fr/2011MON20092/document.
In this thesis, we present contributions to the challenging issues encountered in question answering and in locating information in complex textual data such as log files. Question answering systems (QAS) aim to find a relevant fragment of a document which can be regarded as the best possible concise answer to a question asked by a user. In this work, we seek to propose a complete solution to locate information in a special kind of textual data, i.e., log files generated by EDA design tools. Nowadays, in many application areas, modern computing systems are instrumented to generate huge reports about occurring events in the form of log files. Log files are generated in every computing field to report the status of systems, products, or even the causes of problems that can occur. Log files may also include data about critical parameters, sensor outputs, or a combination of those. Analyzing log files, as an attractive approach to automatic system management and monitoring, has been enjoying a growing amount of attention [Li et al., 2005]. Although the process of generating log files is quite simple and straightforward, log file analysis can be a tremendous task that requires enormous computational resources, a long time and sophisticated procedures [Valdman, 2004]. Indeed, many kinds of log files generated in some application domains are not systematically exploited in an efficient way because of their special characteristics. In this thesis, we are mainly interested in log files generated by Electronic Design Automation (EDA) systems. Electronic design automation is a category of software tools for designing electronic systems such as printed circuit boards and Integrated Circuits (IC). In this domain, to ensure design quality, a number of quality check rules should be verified, and their verification is principally performed by analyzing the generated log files. In the case of large designs, where the design tools may generate megabytes or gigabytes of log files each day, the problem is to wade through all of this data to locate the critical information needed to verify the quality check rules. These log files typically include a substantial amount of data; accordingly, manually locating information is a tedious and cumbersome process. Furthermore, the particular characteristics of log files, especially those generated by EDA design tools, raise significant challenges for retrieving information from them: their specific features limit the usefulness of manual analysis techniques and static methods, and automated analysis of such logs is complex due to their heterogeneous and evolving structures and their large, non-fixed vocabulary. In this thesis, each contribution answers questions raised by the data specificities or domain requirements, and we investigate throughout this work the main concern "how can the specificities of log files influence information extraction and natural language processing methods?". In this context, a key challenge is to provide approaches that take the log file specificities into account while considering the issues specific to QA in restricted domains. Our contributions are the following. > Proposing a novel method to recognize and identify the logical units in log files so as to segment them according to their structure. We thus propose a method to characterize complex logical units found in log files according to their syntactic characteristics. Within this approach, we propose an original type of descriptor to model the textual structure and layout of text documents. > Proposing an approach to locate the requested information in log files based on passage retrieval. To improve the performance of passage retrieval, we propose a novel query expansion approach to adapt an initial query to all types of corresponding log files and to overcome difficulties such as vocabulary mismatch. Our query expansion approach relies on two relevance feedback steps: in the first, we determine the explicit relevance feedback by identifying the context of questions; the second consists of a novel type of pseudo-relevance feedback. Our method is based on a new term weighting function, called TRQ (Term Relatedness to Query), introduced in this work, which scores corpus terms according to their relatedness to the query. We also investigate how to apply our query expansion approach to documents from general domains. > Studying the use of morpho-syntactic knowledge in our approaches. For this purpose, we are interested in terminology extraction from log files and introduce our approach, named Exterlog (EXtraction of TERminology from LOGs). To evaluate the extracted terms and choose the most relevant ones, we propose a candidate term evaluation method using a measure based on the Web, combined with statistical measures, that takes the context of log files into account.
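To illustrate the relevance-feedback idea in the second contribution, here is a minimal sketch of pseudo-relevance feedback over TF-IDF vectors: the query is enriched with the strongest terms of its top-ranked passages. The Rocchio-style expansion and its weights are a generic stand-in, not the TRQ weighting function introduced in the thesis, and the log lines are invented.

```python
# Sketch: pseudo-relevance feedback for query expansion over toy "log" passages.
# Rocchio-style expansion stands in for the thesis's TRQ-based approach.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "timing check failed on clock domain ddr_clk setup violation",
    "power report total dynamic power 12 mw leakage 3 mw",
    "setup violation detected path from reg_a to reg_b slack negative",
    "design rule check completed no violations found",
]
query = "setup violation"

vectorizer = TfidfVectorizer()
P = vectorizer.fit_transform(passages)
q = vectorizer.transform([query])

# First pass: take the top-2 passages as pseudo-relevant.
scores = cosine_similarity(q, P).ravel()
top = scores.argsort()[::-1][:2]

# Expand the query with the centroid of the pseudo-relevant passages.
expanded = 0.7 * q.toarray() + 0.3 * P[top].toarray().mean(axis=0)

terms = np.array(vectorizer.get_feature_names_out())
strongest = terms[expanded.ravel().argsort()[::-1][:6]]
print("expanded query terms:", ", ".join(strongest))
```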
Ailem, Melissa. "Sparsity-sensitive diagonal co-clustering algorithms for the effective handling of text data." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB087.
In the current context, there is a clear need for text mining techniques to analyse the huge quantity of unstructured text documents available on the Internet. These textual data are often represented by sparse, high-dimensional matrices where rows and columns represent documents and terms respectively. Thus, it is worthwhile to simultaneously group these documents and terms into meaningful clusters, making this substantial amount of data easier to handle and interpret. Co-clustering techniques serve just this purpose. Although many existing co-clustering approaches have been successful in revealing homogeneous blocks in several domains, these techniques are still challenged by the high dimensionality and sparsity exhibited by document-term matrices. Due to this sparsity, several co-clusters are primarily composed of zeros; while homogeneous, these co-clusters are irrelevant and must be filtered out in a post-processing step to keep only the most significant ones. The objective of this thesis is to propose new co-clustering algorithms tailored to take these sparsity-related issues into account. The proposed algorithms seek a block-diagonal structure and allow the most useful co-clusters to be identified straightaway, which makes them especially effective for the text co-clustering task. Our contributions can be summarized as follows. First, we introduce and demonstrate the effectiveness of a novel co-clustering algorithm based on a direct maximization of graph modularity. While existing graph-based co-clustering algorithms rely on spectral relaxation, the proposed algorithm uses an iterative alternating optimization procedure to reveal the most meaningful co-clusters in a document-term matrix. Moreover, the proposed optimization has the advantage of avoiding the computation of eigenvectors, a task which is prohibitive for high-dimensional data; this is an improvement over spectral approaches, where eigenvector computation is necessary to perform the co-clustering. Second, we use an even more powerful approach to discover block-diagonal structures in document-term matrices. We rely on mixture models, which offer strong theoretical foundations and considerable flexibility, making it possible to uncover various specific cluster structures. More precisely, we propose a rigorous probabilistic model based on the Poisson distribution and the well-known Latent Block Model. Interestingly, this model includes sparsity in its formulation, which makes it particularly effective for text data. Estimating this model's parameters under the Maximum Likelihood (ML) and Classification Maximum Likelihood (CML) approaches, we propose four co-clustering algorithms: a hard, a soft, a stochastic, and a fourth algorithm which leverages the benefits of both the soft and stochastic variants simultaneously. As a last contribution of this thesis, we propose a new biomedical text mining framework that includes some of the above-mentioned co-clustering algorithms. This work shows the contribution of co-clustering to a real biomedical text mining problem. The proposed framework is able to suggest new clues about the results of genome-wide association studies (GWAS) by mining PubMed abstracts. It has been tested on asthma and allowed us to assess the strength of associations between asthma genes reported in previous GWAS, as well as to discover new candidate genes likely associated with asthma. In a nutshell, while several text co-clustering algorithms already exist, their performance can be substantially increased if more appropriate models and algorithms are available. According to the extensive experiments done on several challenging real-world text datasets, we believe that this thesis has served this objective well.
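For readers who want to try diagonal co-clustering on a document-term matrix, the sketch below uses scikit-learn's SpectralCoclustering as a readily available baseline. Note the hedge: the thesis's own algorithms maximize modularity directly or fit a Poisson latent block model, which this spectral baseline does not do, and the corpus is invented.

```python
# Sketch: diagonal co-clustering of a small document-term matrix with a
# spectral baseline (the thesis's modularity/Poisson models are not used here).
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "gene expression regulation in asthma patients",
    "asthma genes and allergic inflammation",
    "stock market volatility and interest rates",
    "central bank policy moves the stock market",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)

model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(X)

terms = np.array(vectorizer.get_feature_names_out())
for k in range(2):
    docs_k = [documents[i] for i in np.where(model.row_labels_ == k)[0]]
    terms_k = terms[model.column_labels_ == k]
    print(f"co-cluster {k}: {len(docs_k)} docs, terms: {', '.join(terms_k[:6])}")
```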
Duchêne, Florence. "Fusion de Données Multicapteurs pour un Système de Télésurveillance Médicale de Personnes à Domicile." Phd thesis, Université Joseph Fourier (Grenoble), 2004. http://tel.archives-ouvertes.fr/tel-00007607.
Full textMorbieu, Stanislas. "Leveraging textual embeddings for unsupervised learning." Electronic Thesis or Diss., Université Paris Cité, 2020. http://www.theses.fr/2020UNIP5191.
Textual data is ubiquitous and is a useful pool of information for many companies. In particular, the web provides an almost inexhaustible source of textual data that can be used for recommender systems, business or technology watch, information retrieval, etc. Recent advances in natural language processing have made it possible to capture the meaning of words in their context in order to improve automatic translation, text summarization, or the classification of documents into predefined categories. However, most of these applications often rely on significant human intervention to annotate corpora: in supervised classification, for example, this annotation consists in providing algorithms with examples of category assignments to documents. The algorithm then learns to reproduce human judgment and applies it to new documents. The object of this thesis is to take advantage of these recent advances, which capture the semantics of text, and to use them in an unsupervised framework. The contributions of this thesis revolve around three main axes. First, we propose a method to transfer the information captured by a neural network to the co-clustering of documents and words. Co-clustering consists in partitioning the two dimensions of a data matrix simultaneously, thus forming both groups of similar documents and groups of coherent words. It facilitates the interpretation of a large corpus of documents, since groups of documents can be characterized by groups of words, thereby summarizing a large corpus of text. More precisely, we train the Paragraph Vectors algorithm on an augmented dataset while varying its hyperparameters, cluster the documents from the different vector representations, and apply a consensus algorithm to the different partitions. A constrained co-clustering of the co-occurrence matrix between terms and documents is then applied to preserve the consensus partitioning. This method results in a significantly better quality of document partitioning on various document corpora and retains the advantage of the interpretability offered by co-clustering. Secondly, we present a method for evaluating co-clustering algorithms by exploiting vector representations of words called word embeddings. Word embeddings are vectors constructed from large volumes of text, one major characteristic of which is that two semantically close words have embeddings that are close in terms of cosine distance. Our method measures the matching between the partition of the documents and the partition of the words, thus offering, in a totally unsupervised setting, a measure of the quality of the co-clustering. Thirdly, we are interested in the recommendation of classified ads. We present a system that recommends similar classified ads when one is being viewed. The descriptions of classified ads are often short and syntactically incorrect, and the use of synonyms makes it difficult for traditional systems to measure semantic similarity accurately. In addition, the high renewal rate of ads that are still valid (product not yet sold) imposes choices that keep computation time low. Our method, simple to implement, responds to this use case and is again based on word embeddings. Using them has advantages but also involves some difficulties: the creation of such vectors requires choosing the values of some parameters, and the difference between the corpus on which the word embeddings were built upstream and the one on which they are used raises the problem of out-of-vocabulary words, which have no vector representation. To overcome these problems, we present an analysis of the impact of the different parameters on word embeddings, as well as a study of the methods for dealing with out-of-vocabulary words.
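As a rough sketch of the first contribution's pipeline, documents are embedded under several hyperparameter settings, clustered each time, and the partitions are merged through a co-association consensus. This is hedged: gensim's Doc2Vec (gensim 4.x API) stands in for Paragraph Vectors, the corpus and hyperparameter grid are invented, and the final constrained co-clustering step is omitted.

```python
# Sketch: Paragraph Vectors (gensim Doc2Vec) under varied hyperparameters,
# k-means on each embedding, then a simple co-association consensus.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument   # gensim >= 4
from sklearn.cluster import KMeans

documents = [
    "the team won the cup after extra time",
    "a late goal decided the championship final",
    "the central bank raised interest rates again",
    "markets fell after the inflation report",
]
tagged = [TaggedDocument(words=doc.split(), tags=[i])
          for i, doc in enumerate(documents)]

n = len(documents)
settings = [{"vector_size": 16, "window": 2}, {"vector_size": 32, "window": 3}]
coassoc = np.zeros((n, n))

for params in settings:                                  # vary hyperparameters
    model = Doc2Vec(tagged, min_count=1, epochs=40, seed=0, **params)
    vectors = np.array([model.dv[i] for i in range(n)])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
    coassoc += labels[:, None] == labels[None, :]        # co-association matrix

# Consensus partition: greedily group documents co-clustered in every setting.
agree = coassoc == len(settings)
consensus = -np.ones(n, dtype=int)
for i in range(n):
    if consensus[i] == -1:
        consensus[agree[i]] = consensus.max() + 1
print(consensus)
```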
Yahaya, Alassan Mahaman Sanoussi. "Amélioration du système de recueils d'information de l'entreprise Semantic Group Company grâce à la constitution de ressources sémantiques." Thesis, Paris 10, 2017. http://www.theses.fr/2017PA100086/document.
Taking the semantic aspect of textual data into account during the classification task has become a real challenge in the last ten years. This difficulty is compounded by the fact that most of the data available on social networks are short texts, which in particular makes methods based on the "bag of words" representation ineffective. The approach proposed in this research project differs from the approaches proposed in previous work on the enrichment of short messages for three reasons. First, we do not use external knowledge such as Wikipedia, because the short messages processed by the company typically come from specific domains. Secondly, the data to be processed are not used for the creation of resources, because of the way the tool operates. Thirdly, to our knowledge there is no work that, on the one hand, uses structured data such as the company's data to build semantic resources and, on the other hand, measures the impact of enrichment on an interactive system for grouping text streams. In this thesis, we propose the creation of resources enabling short messages to be enriched in order to improve the performance of the semantic grouping tool of the company Succeed Together. The tool implements supervised and unsupervised classification methods. To build these resources, we use sequential data mining techniques.