Dissertations / Theses on the topic 'Semantic Annotation'

Consult the top 50 dissertations / theses for your research on the topic 'Semantic Annotation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Cardoso, Silvio Domingos. "MAISA - Maintenance of semantic annotations." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS338/document.

Full text
Abstract:
Semantic annotations are used in a wide range of applications, from information retrieval to decision support. Annotations are produced by associating concept labels from Knowledge Organization Systems (KOS), i.e. ontologies, thesauri or dictionaries, with pieces of digital information, e.g. images or texts. Annotations enable machines to interpret, link, and use vast amounts of data. However, the dynamic nature of KOS may affect annotations each time a new version of a KOS is released: new concepts can be added, obsolete ones removed, and the definition of existing concepts may be refined through the modification of their labels or properties. As a result, many annotations can lose their relevance, thus hindering the intended use and exploitation of annotated data. To solve this problem, methods to keep the annotations up to date are required.
In this thesis we propose a framework called MAISA to tackle the problem of adapting outdated annotations when the KOS used to create them changes. We distinguish two cases. In the first one, we consider that annotations are directly modifiable. For this case, we propose a rule-based approach combining information derived from the evolution of the KOS with external knowledge from the Web. In the second case, we consider that the annotations are not modifiable, as is often true of annotations associated with patient data. The goal is then to keep the annotated documents searchable even if the annotations were produced with a given KOS version and the user queries the system with another version. For this case, we design a knowledge graph that represents a KOS together with its successive versions, and a query enrichment mechanism that extracts the history of a concept from this graph and adds the recovered labels to the initial query. We experimentally evaluate MAISA on realistic case studies built from four well-known biomedical KOS: ICD-9-CM, MeSH, NCIt and SNOMED CT. We show, using standard metrics, that the proposed approach improves the maintenance of semantic annotations in both cases considered.
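To make the query-enrichment idea concrete, the following minimal sketch (illustrative only; the concept identifier, labels and function names are hypothetical, not taken from MAISA) expands a query with every label a concept has carried across KOS versions:

    # Hypothetical history: a concept ID mapped to the labels it carried
    # across successive KOS versions.
    concept_history = {
        "C0011849": ["diabetes mellitus", "diabetes", "DM"],
    }

    def enrich_query(query_terms, history):
        """Add the historical labels of each queried concept to the query."""
        enriched = list(query_terms)
        for term in query_terms:
            for concept_id, labels in history.items():
                if term in labels:
                    # include every label the concept ever had
                    enriched.extend(l for l in labels if l not in enriched)
        return enriched

    print(enrich_query(["diabetes"], concept_history))
    # ['diabetes', 'diabetes mellitus', 'DM']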
APA, Harvard, Vancouver, ISO, and other styles
2

Aydinlilar, Merve. "Semi-automatic Semantic Video Annotation Tool." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613966/index.pdf.

Full text
Abstract:
Semantic annotation of video content is necessary for the indexing and retrieval tasks of video management systems. Currently, it is not possible to extract all high-level semantic information from video data automatically. Video annotation tools assist users in generating annotations to represent video data. Generated annotations can also be used for testing and evaluation of content-based retrieval systems. In this study, a semi-automatic semantic video annotation tool is presented. Generated annotations are in the MPEG-7 metadata format to ensure interoperability. With the help of image processing and pattern recognition solutions, the annotation process is partly automated and annotation time is reduced. Annotations can be made for spatio-temporal decompositions of video data. Extraction of low-level visual descriptions is included to obtain complete descriptions.
APA, Harvard, Vancouver, ISO, and other styles
3

Wong, Chun Fan. "Automatic semantic image annotation and retrieval." HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1188.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Di Francescomarino, Chiara. "Semantic annotation of business process models." Doctoral thesis, Università degli studi di Trento, 2011. https://hdl.handle.net/11572/367849.

Full text
Abstract:
In recent decades, business process models have increasingly been used by companies for different purposes, such as documenting enacted processes or enabling and improving communication among stakeholders (e.g., designers and implementers). Aside from these differences, all the roles played by process models involve human actors (e.g., business designers, business analysts, re-engineers) and hence demand readability and ease of use, beyond correctness and reasonable completeness. It often happens, however, that process models are large and intricate, and thus potentially difficult to understand and manage. In this thesis we propose techniques aimed at supporting business designers and analysts in the management of business process models. The core of the proposal is the enrichment of process models with semantic annotations from domain ontologies and the formalization of both structural and domain information in a shared knowledge base, thus opening the possibility of exploiting reasoning to support business experts in their work. In detail, this thesis investigates some of the services that can be provided on top of process semantic annotation, such as the automatic verification of process constraints, the automated querying of process models, and the semi-automatic mining, documentation and modularization of crosscutting concerns. Moreover, special care is devoted to supporting designers and analysts when process models are not available or have to be semantically annotated. Specifically, an approach for recovering process models from (Web) applications and metrics for evaluating the understandability of the recovered models are investigated. Techniques for suggesting candidate semantic annotations are also proposed. The results obtained by applying the presented techniques have been validated by means of case studies, performance evaluations and empirical investigations.
APA, Harvard, Vancouver, ISO, and other styles
5

Di Francescomarino, Chiara. "Semantic annotation of business process models." Doctoral thesis, University of Trento, 2011. http://eprints-phd.biblio.unitn.it/547/1/DiFrancescomarino_Chiara.pdf.

Full text
Abstract:
In recent decades, business process models have increasingly been used by companies for different purposes, such as documenting enacted processes or enabling and improving communication among stakeholders (e.g., designers and implementers). Aside from these differences, all the roles played by process models involve human actors (e.g., business designers, business analysts, re-engineers) and hence demand readability and ease of use, beyond correctness and reasonable completeness. It often happens, however, that process models are large and intricate, and thus potentially difficult to understand and manage. In this thesis we propose techniques aimed at supporting business designers and analysts in the management of business process models. The core of the proposal is the enrichment of process models with semantic annotations from domain ontologies and the formalization of both structural and domain information in a shared knowledge base, thus opening the possibility of exploiting reasoning to support business experts in their work. In detail, this thesis investigates some of the services that can be provided on top of process semantic annotation, such as the automatic verification of process constraints, the automated querying of process models, and the semi-automatic mining, documentation and modularization of crosscutting concerns. Moreover, special care is devoted to supporting designers and analysts when process models are not available or have to be semantically annotated. Specifically, an approach for recovering process models from (Web) applications and metrics for evaluating the understandability of the recovered models are investigated. Techniques for suggesting candidate semantic annotations are also proposed. The results obtained by applying the presented techniques have been validated by means of case studies, performance evaluations and empirical investigations.
APA, Harvard, Vancouver, ISO, and other styles
6

Reeve, Lawrence H. "Semantic annotation and summarization of biomedical text." Philadelphia, Pa.: Drexel University, 2007. http://hdl.handle.net/1860/1779.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Ullah, Irfan. "Semantic multimedia modelling & interpretation for annotation." Thesis, Middlesex University, 2011. http://eprints.mdx.ac.uk/9129/.

Full text
Abstract:
The emergence of multimedia-enabled devices, particularly the incorporation of cameras in mobile phones, together with accelerating advances in low-cost storage, has drastically boosted the rate of multimedia data production. Witnessing such ubiquity of digital images and videos, the research community has turned its attention to their meaningful utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized in an intelligent way, leaning on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community has progressively shifted its emphasis to the personalization of these media. The main impediment in image and video analysis is the semantic gap: the discrepancy between a user's high-level interpretation of an image or video and its low-level computational interpretation. Content-based image and video annotation systems are remarkably susceptible to the semantic gap due to their reliance on low-level visual features for delineating semantically rich image and video contents. However, visual similarity is not semantic similarity, so there is a need to break through this dilemma in an alternative way. The semantic gap can be narrowed by incorporating high-level and user-generated information in the annotation. High-level descriptions of images and videos are more proficient at capturing the semantic meaning of multimedia content, but it is not always possible to collect this information. It is commonly agreed that the problem of high-level semantic annotation of multimedia is still far from being solved. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high-level annotation, intending to bridge the gap between visual features and semantics. It proposes a framework for annotation enhancement and refinement for object/concept-annotated image and video datasets. The overall theme is to first purify the datasets of noisy keywords and then expand the concepts lexically and commonsensically to fill the vocabulary and lexical gap and achieve high-level semantics for the corpus. The dissertation also explores a novel approach for high-level semantic (HLS) propagation through image corpora. HLS propagation takes advantage of the semantic intensity (SI), the concept-dominance factor in an image, together with annotation-based semantic similarity between images: an image is a combination of various concepts, some more dominant than others, and the semantic similarity of two images is based on their SI and the semantic similarity of their concepts. Moreover, HLS propagation exploits clustering techniques to group similar images, so that a single effort by a human expert to assign high-level semantics to a randomly selected image can be propagated to the other images in its cluster. The investigation has been carried out on the LabelMe image and LabelMe video datasets. Experiments exhibit that the proposed approaches achieve a noticeable improvement towards bridging the semantic gap and reveal that the proposed system outperforms traditional systems.
APA, Harvard, Vancouver, ISO, and other styles
8

Felt, Paul L. "Facilitating Corpus Annotation by Improving Annotation Aggregation." BYU ScholarsArchive, 2015. https://scholarsarchive.byu.edu/etd/5678.

Full text
Abstract:
Annotated text corpora facilitate the linguistic investigation of language as well as the automation of natural language processing (NLP) tasks. NLP tasks include problems such as spam email detection, grammatical analysis, and identifying mentions of people, places, and events in text. However, constructing high-quality annotated corpora can be expensive. Cost can be reduced by employing low-cost internet workers in a practice known as crowdsourcing, but the resulting annotations are often inaccurate, decreasing the usefulness of a corpus. This inaccuracy is typically mitigated by collecting multiple redundant judgments and aggregating them (e.g., via majority vote) to produce high-quality consensus answers. We improve the quality of consensus labels inferred from imperfect annotations in a number of ways. We show that transfer learning can be used to derive benefit from outdated annotations which would typically be discarded. We show that, contrary to popular preference, annotation aggregation models that take a generative data modeling approach tend to outperform those that take a conditional approach. We leverage this insight to develop csLDA, a novel annotation aggregation model that improves on the state of the art for a variety of annotation tasks. When data does not permit generative data modeling, we identify a conditional data modeling approach based on vector-space text representations that achieves state-of-the-art results on several unusual semantic annotation tasks. Finally, we identify a family of models capable of aggregating annotation data containing heterogeneous annotation types such as label frequencies and labeled features. We present a multi-annotator active learning algorithm for this model family that jointly selects an annotator, data items, and an annotation type.
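As a point of reference for the aggregation problem described above, the simplest strategy the abstract mentions, majority vote, can be sketched as follows (hypothetical data; this baseline is not the csLDA model developed in the thesis):

    from collections import Counter

    # Redundant crowd judgments per item (hypothetical data).
    judgments = {
        "doc1": ["spam", "spam", "ham"],
        "doc2": ["ham", "ham", "spam", "ham"],
    }

    def majority_vote(judgments):
        """Aggregate redundant labels into consensus answers."""
        return {item: Counter(labels).most_common(1)[0][0]
                for item, labels in judgments.items()}

    print(majority_vote(judgments))  # {'doc1': 'spam', 'doc2': 'ham'}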
APA, Harvard, Vancouver, ISO, and other styles
9

Lin, Yun. "Semantic Annotation for Process Models : Facilitating Process Knowledge Management via Semantic Interoperability." Doctoral thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2008. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-2119.

Full text
Abstract:
Business process models representing process knowledge about doing business are necessary for designing Information Systems (IS) solutions in enterprises. Interoperability of business process knowledge in legacy systems is crucial for enterprise systems interoperation and integration, given increasing enterprise cooperation and business exchange. Many modern technologies and approaches support business process interoperability at the instance level or the protocol level, such as BPML, WSDL and SOAP. However, we argue that a holistic approach is necessary for semantic interoperability of business process models at the conceptual level when considering process models as reusable process knowledge for other (new or integrated) IS solutions. This brings requirements to manage the semantic heterogeneity of process knowledge in process models distributed across different enterprise systems. Semantic annotation is an approach to achieve semantic interoperability of heterogeneous resources; however, it has usually been applied to enhance the semantics of unstructured and structured artifacts (e.g. textual resources [72] [49], and Web services [166] [201]).

The aim of the research is to introduce an ontology-based semantic annotation approach to enrich and reconcile the semantics of process models, a kind of semi-structured artifact, for managing process knowledge. The approach brings together techniques in process modeling, ontology building, semantic matching, and Description Logic inference in order to provide a comprehensive semantic annotation framework. Furthermore, a prototype system that supports the process of ontology-based semantic annotation of heterogeneous process models is described. The practical goal of our approach is to facilitate process knowledge management activities (e.g. discovery, reuse, and integration of process knowledge/models) through enhanced semantic interoperability.

A survey has been performed, identifying semantic heterogeneity in process modeling and investigating semantic technology from theoretical and practical views. Based on the results of the survey, a comprehensive semantic annotation framework has been developed, which provides a method to manage the semantic heterogeneity of process models from the following perspectives: first, basic descriptions of process models (profile annotation); second, process modeling languages (meta-model annotation); third, contents of process models (model annotation); and finally, intentions of process model owners (goal annotation). Applying the semantic annotation framework, an ontology-based annotation method has been elaborated, which results in two categories of research activity: ontology building and semantic mapping. In ontology building, we use the Web Ontology Language (OWL), a Semantic Web technology for modeling ontologies. GPO (General Process Ontology), comprising core concepts of most process modeling languages, is proposed; domain concepts are classified in the corresponding categories of GPO as a domain ontology; and design principles for building a goal ontology are introduced in order to serve the annotation of process models pragmatically. In semantic mapping, a set of mapping strategies is developed to conduct the annotation, considering the semantic relationships between model artifacts and ontology references as well as the semantic inference mechanism supported by OWL DL (Description Logic). The annotation method is finally formalized into a process semantic annotation model, PSAM.

The proposed approach has been implemented in a prototype annotation tool, ProSEAT, to facilitate the annotation process. Procedures for applying the semantic annotation approach with the tool are described through an exemplar study. The annotation approach and the prototype tool are evaluated using a quality framework. Furthermore, the applicability of the annotation results is validated by going through a process knowledge management application. The Semantic Web Rule Language (SWRL) is applied in the application demonstration. We argue that the ontology-based annotation approach combined with Semantic Web technology is a feasible way to reconcile semantic heterogeneity in process knowledge management. Limitations and future work are discussed after concluding this research work.

The contributions of this thesis are summarized as follows. First, a general process ontology is proposed for unifying process representations at a high level of abstraction. Second, a semantic annotation framework is introduced to describe process knowledge systematically. Third, ontology-based annotation methods are elaborated and formalized. Fourth, an annotation system, utilizing the developed formal methods, is designed and implemented. Fifth, a process knowledge management system is outlined as the platform for manipulating the annotation results. Moreover, the application of the approach is demonstrated through a process model integration example.
APA, Harvard, Vancouver, ISO, and other styles
10

Sordo, Mohamed. "Semantic annotation of music collections: A computational approach." Doctoral thesis, Universitat Pompeu Fabra, 2012. http://hdl.handle.net/10803/79132.

Full text
Abstract:
Music consumption has changed drastically in the last few years. With the arrival of digital music, the cost of production has dropped substantially. The expansion of the World Wide Web has helped to promote the exploration of much more music content. Online stores, such as iTunes or Amazon, own music collections in the order of millions of songs. Accessing these large collections in an effective manner is still a big challenge. In this dissertation we focus on the problem of annotating music collections with semantic words, also called tags. The foundations of all the methods used in this dissertation are techniques from the fields of information retrieval, machine learning, and signal processing. We propose an automatic music annotation algorithm that uses content-based audio similarity to propagate tags among songs. The algorithm is evaluated extensively using multiple music collections of varying size and data quality, including a large collection of more than half a million songs annotated with social tags derived from a music community. We assess the quality of our proposed algorithm by comparing it with several state-of-the-art approaches. We also discuss the importance of using evaluation measures that cover different dimensions: per-song and per-tag evaluation. Our proposal achieves state-of-the-art results, and ranked high in the MIREX 2011 evaluation campaign. The obtained results also show some limitations of automatic tagging, related to data inconsistencies, correlation of concepts and the difficulty of capturing some personal tags with content information. This is more evident in music communities, where users can annotate songs with any free-text word. In order to tackle these issues, we present an in-depth study of the nature of music folksonomies. We concretely study whether tag annotations made by a large community (i.e. a folksonomy) correspond to a more controlled, structured vocabulary produced by experts in the music and psychology fields. Results reveal that some tags are clearly defined and understood both by the experts and by the wisdom of crowds, while it is difficult to achieve a common consensus on the meaning of other tags. Finally, we extend our previous work to a wide range of semantic concepts. We present a novel way to uncover facets implicit in social tagging, and classify the tags with respect to these semantic facets. The latter findings can help to understand the nature of social tags, and thus be beneficial for further improvement of semantic tagging of music. Our findings have significant implications for music information retrieval systems that assist users in exploring large music collections, digging for content they might like.
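The tag-propagation step can be pictured with a minimal nearest-neighbour sketch (toy feature vectors and tags are assumed; the audio similarity actually used in the thesis is far richer):

    import numpy as np

    # Hypothetical content-based feature vectors and social tags.
    features = {"song_a": np.array([0.9, 0.1]), "song_b": np.array([0.2, 0.8])}
    tags = {"song_a": {"jazz", "mellow"}}

    def propagate_tags(query_vec, features, tags, k=1):
        """Propagate tags from the k most similar annotated songs."""
        annotated = [(s, np.linalg.norm(query_vec - v))
                     for s, v in features.items() if s in tags]
        annotated.sort(key=lambda sv: sv[1])
        result = set()
        for song, _ in annotated[:k]:
            result |= tags[song]
        return result

    print(propagate_tags(np.array([0.88, 0.15]), features, tags))
    # {'jazz', 'mellow'}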
APA, Harvard, Vancouver, ISO, and other styles
11

Gkotsoulia, Paraskevi A. "An entailment-based approach to semantic role annotation." Thesis, University of Essex, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.528842.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Cui, Hong. "Semantic annotation of morphological descriptions: an overall strategy." BioMed Central, 2010. http://hdl.handle.net/10150/610209.

Full text
Abstract:
BACKGROUND: Large volumes of morphological descriptions of whole organisms have been created as print or electronic text in a human-readable format. Converting the descriptions into computer-readable formats gives a new life to the valuable knowledge on biodiversity. Research in this area started 20 years ago, yet not sufficient progress has been made to produce an automated system that requires only minimal human intervention but works on descriptions of various plant and animal groups. This paper attempts to examine the hindering factors by identifying the mismatches between existing research and the characteristics of morphological descriptions. RESULTS: This paper reviews the techniques that have been used for automated annotation, reports exploratory results on the characteristics of morphological descriptions as a genre, and identifies challenges facing automated annotation systems. Based on these findings, the paper proposes an overall strategy for converting descriptions of various taxon groups with the least human effort. CONCLUSIONS: A combined unsupervised and supervised machine learning strategy is needed to construct domain ontologies and lexicons and to ultimately achieve automated semantic annotation of morphological descriptions. Further, we suggest that each effort in creating a new description or annotating an individual description collection should be shared and contribute to the "biodiversity information commons" for the Semantic Web. This cannot be done without a sound strategy and a close partnership between and among information scientists and biologists.
APA, Harvard, Vancouver, ISO, and other styles
13

Paiva Nogueira, Tales. "A Framework for Automatic Annotation of Semantic Trajectories." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM004/document.

Full text
Abstract:
Location data is ubiquitous in many aspects of our lives, and we are witnessing an increasing usage of this kind of data by a variety of applications. As a consequence, information systems are required to deal with large datasets containing raw data in order to build high-level abstractions. Semantic Web technologies offer powerful representation tools for pervasive applications, and the convergence of location-based services and Semantic Web standards allows easier interlinking and annotation of trajectories. In this thesis, we focus on modeling mobile object trajectories in the context of the Semantic Web. First, we propose an ontology that allows the representation of generic episodes; our model also handles contextual elements that may be related to trajectories. Second, we propose a framework containing three algorithms for automatic annotation of trajectories. The first one detects moves, stops, and noisy data; the second one compresses generic time series and creates episodes that summarize the evolution of trajectory characteristics; the third one exploits the linked data cloud to annotate trajectories with intersecting geographic elements, using data from OpenStreetMap. As results of this thesis, we have a new ontology that can represent spatiotemporal phenomena at different levels of granularity. Moreover, our framework offers three novel algorithms for trajectory annotation. The move-stop-noise detection method is able to deal with irregularly sampled traces and does not depend on external data about the underlying geography; our time series compression method finds values that summarize a series while avoiding overly short segments; and our spatial annotation algorithm explores linked data and the relationships among concepts to find relevant types of spatial features to describe the environment where the trajectory took place.
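A rough flavour of the move-stop detection can be conveyed with a simple speed-threshold sketch (hypothetical trace and threshold; the algorithm in the thesis additionally handles noise and irregular sampling):

    from math import hypot

    # Hypothetical trace of (x, y, timestamp-in-seconds) fixes.
    trace = [(0.0, 0.0, 0), (0.1, 0.0, 60), (0.1, 0.1, 120), (5.0, 5.0, 180)]

    def label_episodes(trace, speed_threshold=0.01):
        """Label each segment between consecutive fixes as 'stop' or 'move'."""
        episodes = []
        for (x1, y1, t1), (x2, y2, t2) in zip(trace, trace[1:]):
            speed = hypot(x2 - x1, y2 - y1) / max(t2 - t1, 1e-9)
            episodes.append("stop" if speed < speed_threshold else "move")
        return episodes

    print(label_episodes(trace))  # ['stop', 'stop', 'move']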
APA, Harvard, Vancouver, ISO, and other styles
14

Li, Honglin. "Hierarchical video semantic annotation: the vision and techniques." Connect to this title online, 2003. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1071863899.

Full text
Abstract:
Thesis (Ph. D.)--Ohio State University, 2003. Title from first page of PDF file. Document formatted into pages; contains xv, 146 p.; also includes graphics. Includes bibliographical references (p. 136-146).
APA, Harvard, Vancouver, ISO, and other styles
15

Cutrona, Vincenzo. "Semantic Table Annotation for Large-Scale Data Enrichment." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2021. http://hdl.handle.net/10281/317044.

Full text
Abstract:
Data are the new oil: they represent one of the main value-creating assets. Data analytics has become a crucial component of scientific studies and business decisions in recent years and has led researchers to define novel methodologies to represent, manage, and analyze data. Simultaneously, the growth of computing power has enabled the analysis of huge amounts of data, allowing people to extract useful information from collected data. Predictive analytics plays a crucial role in many applications since it provides more knowledge to support business decisions. Among the statistical techniques available to support predictive analytics, machine learning is the technique capable of solving many different classes of problems, and the one that has benefited the most from the growth of computing power. In recent years, more complex and accurate machine learning models have been proposed, requiring an increasing amount of current and historical data to perform at their best. The demand for such a massive amount of data to train machine learning models represents an initial hurdle for data scientists, because the information needed is usually scattered across different data sets that have to be integrated manually. As a consequence, data enrichment has become a critical task in the data preparation process, and nowadays most data science projects involve a time-costly data preparation process aimed at enriching a core data set with additional information from various external sources, to improve the sturdiness of the resulting trained models. Easing the design of the enrichment process for data scientists is challenging, as is supporting the enrichment process at a large scale. Despite the growing importance of the enrichment task, it is still supported only to a limited extent by existing solutions, delegating most of the effort to the data scientist, who is in charge of both detecting the data sets that contain the needed information and integrating them. In this thesis, we introduce a methodology to support the data enrichment task, which focuses on harnessing semantics as the key factor, by providing users with a semantics-aided tool to design the enrichment process, along with a platform to execute the process at a business scale. We illustrate how data enrichment can be addressed via tabular data transformations exploiting semantic table interpretation methods, discussing implementation techniques to support the enactment of the resulting process on large data sets. We experimentally demonstrate the scalability and run-time efficiency of the proposed solution by employing it in a real-world scenario. Finally, we introduce a new benchmark dataset to evaluate the performance and scalability of existing semantic table annotation algorithms, and we propose an efficient novel approach to improve the performance of such algorithms.
APA, Harvard, Vancouver, ISO, and other styles
16

Bannour, Hichem. "Building and Using Knowledge Models for Semantic Image Annotation." PhD thesis, Ecole Centrale Paris, 2013. http://tel.archives-ouvertes.fr/tel-00905953.

Full text
Abstract:
This dissertation proposes a new methodology for building and using structured knowledge models for automatic image annotation. Specifically, our first proposals deal with the automatic building of explicit and structured knowledge models, such as semantic hierarchies and multimedia ontologies, dedicated to image annotation. We propose a new approach for building semantic hierarchies faithful to image semantics, based on a new image-semantic similarity measure between concepts and on a set of rules that connect the most related concepts until the final hierarchy is built. Afterwards, we propose to go further in the modeling of image semantics by building explicit knowledge models that incorporate richer semantic relationships between image concepts. We therefore propose a new approach for automatically building multimedia ontologies consisting of subsumption relationships between concepts, as well as other semantic relationships such as contextual and spatial relations. Fuzzy description logics are used as a formalism to represent our ontology and to deal with the uncertainty and imprecision of concept relationships. In order to assess the effectiveness of the built knowledge models, we subsequently use them in a framework for image annotation. We propose an approach, based on the structure of semantic hierarchies, to effectively perform hierarchical image classification. Furthermore, we propose a generic approach for image annotation combining machine learning techniques, such as hierarchical image classification, with fuzzy ontological reasoning in order to achieve semantically relevant image annotation. Empirical evaluations of our approaches have shown significant improvements in image annotation accuracy.
APA, Harvard, Vancouver, ISO, and other styles
17

Levy, Mark. "Retrieval and annotation of music using latent semantic models." Thesis, Queen Mary, University of London, 2012. http://qmro.qmul.ac.uk/xmlui/handle/123456789/2969.

Full text
Abstract:
This thesis investigates the use of latent semantic models for annotation and retrieval from collections of musical audio tracks. In particular latent semantic analysis (LSA) and aspect models (or probabilistic latent semantic analysis, pLSA) are used to index words in descriptions of music drawn from hundreds of thousands of social tags. A new discrete audio feature representation is introduced to encode musical characteristics of automatically-identified regions of interest within each track, using a vocabulary of audio muswords. Finally a joint aspect model is developed that can learn from both tagged and untagged tracks by indexing both conventional words and muswords. This model is used as the basis of a music search system that supports query by example and by keyword, and of a simple probabilistic machine annotation system. The models are evaluated by their performance in a variety of realistic retrieval and annotation tasks, motivated by applications including playlist generation, internet radio streaming, music recommendation and catalogue search.
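The latent semantic indexing referred to here rests on a truncated SVD of a word-by-track matrix; the toy sketch below (hypothetical counts, not data from the thesis) shows the basic construction:

    import numpy as np

    # Toy tag-by-track count matrix (rows: tag words, columns: tracks).
    X = np.array([[3, 0, 1],
                  [0, 2, 0],
                  [2, 0, 2]], dtype=float)

    # Latent semantic analysis: keep k latent dimensions of the SVD.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    tracks_latent = (np.diag(s[:k]) @ Vt[:k]).T  # each track as a k-dim vector

    # Tracks that are close in this latent space share tag structure,
    # which supports retrieval and machine annotation.
    print(tracks_latent)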
APA, Harvard, Vancouver, ISO, and other styles
18

Juby, Benjamin Paul. "Enhancing distributed real-time collaboration with automatic semantic annotation." Thesis, University of Southampton, 2005. https://eprints.soton.ac.uk/427164/.

Full text
Abstract:
Distributed real-time collaboration, such as group-to-group videoconferencing, is becoming increasingly popular. However, this form of collaboration tends to be less effective than co-located interactions and there is a significant body of research that has sought to improve the collaboration technology through a variety of methods. Some of this research has focused on adding annotations that explicitly represent events that take place during the course of a collaboration session. While this approach shows promise, existing work has in general lacked high-level semantics, which limits the scope for automated processing of these annotations. Furthermore, the systems tend not to work in real-time and therefore only provide benefit during the replay of recorded sessions. The systems also often require significant effort from the session participants to create the annotations. This thesis presents a general-purpose framework and proof of concept implementation for the automated, real-time annotation of live collaboration sessions. It uses technologies from the Semantic Web to introduce machine-processable semantics. This enables inference to be used to automatically generate annotations by inferring high-level events from basic events captured during collaboration sessions. Furthermore, the semantic approach allows the framework to support a high level of interoperability, reuse and extensibility. The real-time nature of the framework means that the annotations can be displayed to meeting participants during a live session, so they can directly be of benefit during the session as well as being archived for later indexing and replay of a session recording. The semantic annotations are authored in RDF (Resource Description Framework) and are compliant with an OWL (Web Ontology Language) ontology. Both these languages are World Wide Web Consortium (W3C) recommendations. The framework uses rule-based inference combined with knowledge from an external triplestore to generate the annotations. A shared buffer called a tuple space is used for sharing these annotations between distributed sites. The proof of concept implementation uses existing Access Grid videoconferencing technology as an example application domain, to which speaker identification and participant tracking are added as examples of semantic annotations.
APA, Harvard, Vancouver, ISO, and other styles
19

Alec, Céline. "Enrichissement et peuplement d’ontologie à partir de textes et de données du LOD : Application à l’annotation automatique de documents." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS228/document.

Full text
Abstract:
This thesis deals with an approach, guided by an ontology, designed to annotate the documents of a corpus where each document describes an entity of the same type. In our context, all documents have to be annotated with concepts that are usually too specific to be explicitly mentioned in the texts. In addition, the annotation concepts are initially represented only by their name, without any semantic information attached to them. Finally, the characteristics of the entities described in the documents are incomplete. To accomplish this particular document annotation process, we propose an approach called SAUPODOC (Semantic Annotation Using Population of Ontology and Definitions of Concepts) which combines several tasks to (1) populate and (2) enrich a domain ontology. The population step (1) adds to the ontology information from the documents in the corpus but also from the Web of Data (Linked Open Data or LOD). The LOD represents today a promising source for many applications of the Semantic Web, provided that appropriate data acquisition techniques are developed. In the setting of SAUPODOC, the ontology population has to take into account the diversity of the data in the LOD: multiple, equivalent, multi-valued or absent properties. The correspondences to be established, between the vocabulary of the ontology to be populated and that of the LOD, are complex, so we propose a model to facilitate their specification. We then show how this model is used to automatically generate SPARQL queries, thus facilitating the querying of the LOD and the population of the ontology. The ontology, once populated, is then enriched (2) with the annotation concepts and their definitions, which are learned from examples of annotated documents. Reasoning on these definitions finally provides the desired annotations. Experiments have been conducted in two application domains, and the results, compared with the annotations obtained with classifiers, show the benefits of the approach.
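The automatic generation of SPARQL queries from the correspondence model might look roughly like this (a hypothetical sketch: the mapping, property names and query shape are illustrative, not those defined in the thesis):

    # Hypothetical mapping: one ontology property may correspond to
    # several equivalent LOD properties.
    PREFIXES = ("PREFIX dbo: <http://dbpedia.org/ontology/> "
                "PREFIX dbp: <http://dbpedia.org/property/> ")
    mapping = {"hasDirector": ["dbo:director", "dbp:director"]}

    def build_sparql(entity_uri, onto_property, mapping):
        """Generate a SPARQL query trying every equivalent LOD property."""
        union = " UNION ".join(
            "{ <%s> %s ?value . }" % (entity_uri, p)
            for p in mapping[onto_property])
        return PREFIXES + "SELECT DISTINCT ?value WHERE { %s }" % union

    print(build_sparql("http://dbpedia.org/resource/Alien_(film)",
                       "hasDirector", mapping))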
APA, Harvard, Vancouver, ISO, and other styles
20

Vicient, Monllaó Carlos. "Moving towards the semantic web: enabling new technologies through the semantic annotation of social contents." Doctoral thesis, Universitat Rovira i Virgili, 2015. http://hdl.handle.net/10803/285334.

Full text
Abstract:
Social Web technologies have caused an exponential growth of the documents available through the Web, making enormous amounts of textual electronic resources available. Users may be overwhelmed by such an amount of content and, therefore, the automatic analysis and exploitation of all this information is of interest to the data mining community. Data mining algorithms exploit features of the entities in order to characterise, group or classify them according to their resemblance. Data by itself does not carry any meaning; it needs to be interpreted to convey information. Classical data analysis methods did not aim to "understand" the content: data were treated as meaningless numbers, and statistics were calculated on them to build models that were interpreted manually by human domain experts. Nowadays, motivated by the Semantic Web, many researchers have proposed semantically grounded data classification and clustering methods that are able to exploit textual data at a conceptual level. However, they usually rely on pre-annotated inputs to be able to semantically interpret textual data such as the content of Web pages. The usability of all these methods is tied to the linkage between data and its meaning. This work focuses on the development of a general methodology able to detect the most relevant features of a particular textual resource, finding out their semantics (associating them with concepts modelled in ontologies) and detecting its main topics. The proposed methods are unsupervised (avoiding the manual annotation bottleneck), domain-independent (applicable to any area of knowledge) and flexible (able to deal with heterogeneous resources: raw text documents, semi-structured user-generated documents such as Wikipedia articles, or short and noisy tweets). The methods have been evaluated in different fields (Tourism, Oncology). This work is a first step towards the automatic semantic annotation of documents, needed to pave the way towards the Semantic Web vision.
APA, Harvard, Vancouver, ISO, and other styles
21

Al, Asswad Mohammad Mourhaf. "Semantic information systems engineering : a query-based approach for semi-automatic annotation of web services." Thesis, Brunel University, 2011. http://bura.brunel.ac.uk/handle/2438/5441.

Full text
Abstract:
There has been an increasing interest in Semantic Web services (SWS) as a proposed solution to facilitate automatic discovery, composition and deployment of existing syntactic Web services. Successful implementation and wider adoption of SWS by research and industry are, however, profoundly dependent on the existence of effective and easy-to-use methods for service semantic description. Unfortunately, Web service semantic annotation is currently performed by manual means. Manual annotation is a difficult, error-prone and time-consuming task, and few approaches exist that aim to semi-automate it. Existing approaches are difficult to use since they require ontology building. Moreover, these approaches employ ineffective matching methods and suffer from the Low Percentage Problem. The latter problem happens when a small number of service elements - in comparison to the total number of elements - are annotated in a given service. This research addresses the Web service annotation problem by developing a semi-automatic annotation approach that allows SWS developers to effectively and easily annotate their syntactic services. The proposed approach does not require application ontologies to model service semantics. Instead, a standard query template is used: this template is filled with data and semantics extracted from WSDL files in order to produce query instances. The input of the annotation approach is the WSDL file of a candidate service and a set of ontologies. The output is an annotated WSDL file. The proposed approach is composed of five phases: (1) concept extraction; (2) concept filtering and query filling; (3) query execution; (4) results assessment; and (5) SAWSDL annotation. The query execution engine makes use of name-based and structural matching techniques. The name-based matching is carried out by CN-Match, a novel matching method and tool that is developed and evaluated in this research. The proposed annotation approach is evaluated using a set of existing Web services and ontologies. Precision (P), Recall (R), F-Measure (F) and the percentage of annotated elements are used as evaluation metrics. The evaluation reveals that the proposed approach is effective since - in relation to manual results - accurate and almost complete annotation results are obtained. In addition, a high percentage of annotated elements is achieved because the approach makes use of effective ontology extension mechanisms.
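As a rough illustration of name-based matching between WSDL element names and ontology concept labels (a minimal sketch under invented names and an assumed threshold; it is not the CN-Match tool, whose internals are not described in the abstract):

import re
from difflib import SequenceMatcher

def tokenize(name: str) -> str:
    # Split camelCase/underscored identifiers into lowercase words,
    # e.g. "getFlightPrice" -> "get flight price"
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return " ".join(p.lower() for p in parts)

def name_similarity(a: str, b: str) -> float:
    # Normalized similarity in [0, 1] between two tokenized names
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()

def best_concept(element, concepts, threshold=0.5):
    # Return the ontology concept whose label best matches the WSDL
    # element name, or None if nothing clears the threshold
    score, concept = max((name_similarity(element, c), c) for c in concepts)
    return concept if score >= threshold else None

# Hypothetical WSDL element names and ontology concept labels
for element in ["getFlightPrice", "departureAirport"]:
    print(element, "->", best_concept(element, ["FlightPrice", "Airport", "Itinerary"]))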
APA, Harvard, Vancouver, ISO, and other styles
22

Adnan, Mehnaz. "A semantic annotation framework for patient-friendly electronic discharge summaries." Thesis, University of Auckland, 2011. http://hdl.handle.net/2292/10272.

Full text
Abstract:
Discharge summaries are intended to include information necessary to communicate the post-discharge framework of care to care providers as well as patients and their families. An important aspect is the availability of easily understandable discharge information to empower patients as partners in their post-discharge care. However, these summaries are found to impose comprehension barriers for consumers. We explore semantic annotation as an approach to improve discharge summaries by assigning links of various semantic types to entities in the text. Our approach is grounded in automated text analysis and panel assessment of a corpus of 200 Electronic Discharge Summaries (EDSs) to identify the barriers to patient use of these summaries. These analyses identified the presence of advanced clinical vocabulary, abbreviations and inadequate patient advice as major obstacles. In response to the findings from the corpus analyses, we implemented two components, SemLink and SemAssist. Both of these components use the Unified Medical Language System (UMLS) and the Open Access Collaboratives' Consumer Health Vocabulary (CHV) as biomedical vocabularies and the General Architecture for Text Engineering (GATE) as a natural language processing framework. SemLink is designed to provide readability support for EDS text by adding hyperlinks to the most appropriate and readable consumer-based web resource for difficult terms and phrases. SemLink was developed iteratively and can embed its results in portable document format (PDF). In a preliminary automated evaluation, SemLink achieved 95% precision in hyperlinking topically relevant Web resources, in which 83% of hyperlinks could be restricted to resources of reading grade-level 8 or less. In the final evaluation by expert feedback, SemLink generated 65% topically relevant hyperlinks as agreed by the majority of the experts. SemAssist is designed as an interactive ontology-based Clinical Decision Support System to assist EDS authors in providing optimal medication advice for patients. The system offers pre-formulated auto-text and an alert critique about the inclusion of advice on side effects, required patient actions and follow-up related to post-discharge care for a set of high-risk medications. Together, SemLink and SemAssist illustrate the application of a semantic annotation framework to support consumers in getting the most from their EDSs by exploiting both dynamic hyperlinking and authoring support. Our approach may have a wider range of applications to support other health-related document types and clinical users.
APA, Harvard, Vancouver, ISO, and other styles
23

Chan, Ching Lap. "Semantic annotation in knowledge engineering, e-learning and computational linguistics." Thesis, City University London, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.576943.

Full text
Abstract:
In the early stages of this work, a comprehensive study of semantic annotation was carried out. The study focuses on the annotation requirements of human knowledge acquisition in knowledge engineering, e-learning and computational linguistics. The study found that annotation of natural languages for linguistic analysis creates complicated data structures. Due to this complexity, almost all existing annotation schemes are designed to support only one application domain at a time. Discovery of new knowledge by means of cross-domain text analysis is limited by the capability of these annotation schemes. To act on the findings of the study and address this problem, a new general-purpose annotation archival scheme has been developed to, among other goals, (1) enable true cross-domain data analysis in knowledge engineering, e-learning and computational linguistics, and (2) organize the complex structures of human knowledge annotation in an accessible manner, so they can be analyzed in multiple layers through retrieval, search, visualization, etc. To further verify the contributions of the new semantic annotation scheme in real applications, experiments have been carried out in several areas, namely (1) collaborative retrieval of complex linguistic information, (2) computer-assisted production of learning material and (3) relevancy comparison between texts. In (1), the annotation scheme is applied in a cloud-based platform for hosting parallel multilingual corpora, leading to new applications such as computer-assisted pattern visualization, speech analysis, speech-to-text transcription and statistical analysis. In (2), the annotation scheme supports applications that produce reader-friendly learning material suites for teachers, and as a result improve learning quality. In (3), the annotation scheme supports a text comparison platform that carries out writing assessment semantically.
APA, Harvard, Vancouver, ISO, and other styles
24

Mendes, Pablo N. "Adaptive Semantic Annotation of Entity and Concept Mentions in Text." Wright State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=wright1401665504.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Garrido, Marquez Ivan. "Dynamics in semantic annotation, a perspective of information access system." Thesis, Sorbonne Paris Cité, 2019. http://www.theses.fr/2019USPCD008.

Full text
Abstract:
Information is growing and evolving every day and in every human activity. Documents of different modalities store our information, and the dynamic nature of information is given by a flow of documents. The huge and ever-growing document collections open up the need for organizing, relating and searching for information in an efficient way. Although full-text search tools have been developed, people continue to categorize documents, often using automatic classification tools. These annotation categories can be considered a form of semantic indexing: classifying newspaper articles or blog posts allows journalists or readers to quickly find documents that have been published in the past in relation to a given topic. However, the quality of an index based on semantic annotation often deteriorates with time due to the dynamics of the information it describes: some categories are misused or forgotten by indexers, others become obsolete or too general to be useful. Through this study we introduce a dynamic perspective on semantic annotation. This perspective considers the passage of time and the permanent flow of documents that makes collections grow and their annotation systems extend and evolve. We also bring a vision of the quality of annotation systems based on the notion of information access. Traditionally, the quality of an annotation is considered in terms of semantic adequacy between the contents of the documents and the annotation terms that describe them. In our vision, the quality of an annotation vocabulary depends on the amount and complexity of information to be navigated by a user while searching for a certain topic.
To address the problem of the dynamics in semantic annotation, this work proposes a modular architecture for dynamic semantic annotation. This architecture models the activities involved in the semantic annotation process as abstract modules dedicated to the different tasks that users have to perform. As a case study we took blog annotation. We gathered a corpus containing up to 10 years of blog posts annotated with categories and tags, and we analyzed the observed annotation habits. By testing automatic tag and category suggestion strategies, we measure the impact of the dynamics in the annotation system. We propose some strategies to control this impact, which help to evaluate the obsolescence of examples. Finally, we propose a framework relying on three quality metrics and an interactive method to recover the quality of an indexing system based on semantic annotation. The metrics are evaluated over time to observe the degradation in indexing quality. A series of studied examples are presented to observe the performance of the measures in guiding the restructuring of the indexing annotation system.
APA, Harvard, Vancouver, ISO, and other styles
26

Anderson, Neil David Alan. "Data extraction & semantic annotation from web query result pages." Thesis, Queen's University Belfast, 2016. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.705642.

Full text
Abstract:
Our unquenchable thirst for knowledge is one of the few things that really defines our humanity. Yet the Information Age, which we have created, has left us floating aimlessly in a vast ocean of unintelligible data. Hidden Web databases are one massive source of structured data. The contents of these databases are, however, often only accessible through a query posed by a user. The data returned in these Query Result Pages is intended for human consumption and, as such, has nothing more than an implicit semantic structure which can be understood visually by a human reader, but not by a computer. This thesis presents an investigation into the processes of extraction and semantic understanding of data from Query Result Pages. The work is multi-faceted and includes, at the outset, the development of a vision-based data extraction tool. This work is followed by the development of a number of algorithms which make use of machine learning-based techniques, first to align the extracted data into semantically similar groups and then to assign a meaningful label to each group. Part of the work undertaken in fulfilment of this thesis has also addressed the lack of large, modern datasets containing a wide range of result pages representative of those typically found online today. In particular, a new, innovative crowdsourced dataset is presented. Finally, the work concludes by examining techniques from the complementary research field of Information Extraction. An initial, critical assessment of how these mature techniques could be applied to this research area is provided.
APA, Harvard, Vancouver, ISO, and other styles
27

Bihler, Pascal [Verfasser]. "The Semantic Shadow : Combining User Interaction with Context Information for Semantic Web-Site Annotation / Pascal Bihler." Bonn : Universitäts- und Landesbibliothek Bonn, 2011. http://d-nb.info/1016006292/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Yuee, Liu. "Ontology-based image annotation." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/39611/1/Liu_Yuee_Thesis.pdf.

Full text
Abstract:
With regard to the long-standing problem of the semantic gap between low-level image features and high-level human knowledge, the image retrieval community has recently shifted its emphasis from low-level feature analysis to high-level image semantics extraction. User studies reveal that users tend to seek information using high-level semantics. Therefore, image semantics extraction is of great importance to content-based image retrieval because it allows the users to freely express what images they want. Semantic content annotation is the basis for semantic content retrieval. The aim of image annotation is to automatically obtain keywords that can be used to represent the content of images. The major research challenges in image semantic annotation are: what is the basic unit of semantic representation? how can the semantic unit be linked to high-level image knowledge? how can contextual information be stored and utilized for image annotation? In this thesis, Semantic Web technology (i.e. ontology) is introduced to the image semantic annotation problem. The Semantic Web, the next generation of the Web, aims at making the content of whatever type of media understandable not only to humans but also to machines. Due to the large amounts of multimedia data prevalent on the Web, researchers and industries are beginning to pay more attention to the Multimedia Semantic Web. Semantic Web technology provides a new opportunity for multimedia-based applications, but research in this area is still in its infancy. Whether ontology can be used to improve image annotation, and how best to use ontology in semantic representation and extraction, is still a worthwhile investigation. This thesis deals with the problem of image semantic annotation using ontology and machine learning techniques in four phases, as below. 1) Salient object extraction. A salient object serves as the basic unit in image semantic extraction as it captures the common visual property of the objects. Image segmentation is often used as the first step for detecting salient objects, but most segmentation algorithms often fail to generate meaningful regions due to over-segmentation and under-segmentation. We develop a new salient object detection algorithm by combining multiple homogeneity criteria in a region merging framework. 2) Ontology construction. Since real-world objects tend to exist in a context within their environment, contextual information has been increasingly used for improving object recognition. In the ontology construction phase, visual-contextual ontologies are built from a large set of fully segmented and annotated images. The ontologies are composed of several types of concepts (i.e. mid-level and high-level concepts) and domain contextual knowledge. The visual-contextual ontologies stand as a user-friendly interface between low-level features and high-level concepts. 3) Image object annotation. In this phase, each object is labelled with a mid-level concept from the ontologies. First, a set of candidate labels is obtained by training Support Vector Machines with features extracted from the salient objects. After that, contextual knowledge contained in the ontologies is used to obtain the final labels by removing ambiguous concepts. 4) Scene semantic annotation. The scene semantic extraction phase obtains the scene type by using both mid-level concepts and domain contextual knowledge in the ontologies.
Domain contextual knowledge is used to create a scene configuration that describes which objects co-exist with which scene type more frequently. The scene configuration is represented in a probabilistic graph model, and probabilistic inference is employed to calculate the scene type given an annotated image. To evaluate the proposed methods, a series of experiments has been conducted on a large set of fully annotated outdoor scene images. These include a subset of the Corel database, a subset of the LabelMe dataset, the evaluation dataset of localized semantics in images, the spatial context evaluation dataset, and the segmented and annotated IAPR TC-12 benchmark.
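As a rough sketch of phase 3 - SVM-produced candidate labels filtered by contextual compatibility - using toy features, toy concepts and an assumed pairwise compatibility table, none of which come from the thesis:

import numpy as np
from sklearn.svm import SVC

# Stand-in training data: feature vectors for salient objects,
# labelled with mid-level concepts
rng = np.random.default_rng(0)
X_train = rng.random((60, 16))
y_train = rng.choice(["sky", "water", "sand"], size=60)

clf = SVC(probability=True).fit(X_train, y_train)

# Toy contextual knowledge: which concepts may co-occur in one scene
compatible = {"sky": {"water", "sand"}, "water": {"sky"}, "sand": {"sky"}}

def annotate(regions):
    # Step 1: the SVM yields ranked candidate labels for each region
    probs = clf.predict_proba(regions)
    ranked = [list(clf.classes_[p.argsort()[::-1]]) for p in probs]
    # Step 2: keep each region's best candidate that is compatible
    # with the labels already assigned (ambiguity removal)
    labels = []
    for cands in ranked:
        pick = next((c for c in cands
                     if all(c == x or c in compatible[x] for x in labels)),
                    cands[0])
        labels.append(pick)
    return labels

print(annotate(rng.random((3, 16))))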
APA, Harvard, Vancouver, ISO, and other styles
29

Khalili, Ali. "A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration." Doctoral thesis, Universitätsbibliothek Leipzig, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-159956.

Full text
Abstract:
The Semantic Web and Linked Data movements, with the aim of creating, publishing and interconnecting machine-readable information, have gained traction in the last years. However, the majority of information is still contained in, and exchanged using, unstructured documents, such as Web pages, text documents, images and videos. This can also not be expected to change, since text, images and videos are the natural way in which humans interact with information. Semantic structuring of content, on the other hand, provides a wide range of advantages compared to unstructured information. Semantically-enriched documents facilitate information search and retrieval, presentation, integration, reusability, interoperability and personalization. Looking at the life-cycle of semantic content on the Web of Data, we see quite some progress on the backend side in storing structured content and in linking data and schemata. Nevertheless, the currently least developed aspect of the semantic content life-cycle is, from our point of view, the user-friendly manual and semi-automatic creation of rich semantic content. In this thesis, we propose a semantics-based user interface model which aims to reduce the complexity of the underlying technologies for semantic enrichment of content by Web users. By surveying existing tools and approaches for semantic content authoring, we extracted a set of guidelines for designing efficient and effective semantic authoring user interfaces. We applied these guidelines to devise a semantics-based user interface model called WYSIWYM (What You See Is What You Mean), which enables integrated authoring, visualization and exploration of unstructured and (semi-)structured content. To assess the applicability of our proposed WYSIWYM model, we incorporated the model into four real-world use cases comprising two general and two domain-specific applications. These use cases address four aspects of the WYSIWYM implementation: 1) its integration into existing user interfaces, 2) utilizing it for lightweight text analytics to incentivize users, 3) dealing with crowdsourcing of semi-structured e-learning content, 4) incorporating it for authoring of semantic medical prescriptions.
APA, Harvard, Vancouver, ISO, and other styles
30

Wong, Ping-wai. "Semantic annotation of Chinese texts with message structures based on HowNet." Click to view the E-thesis via HKUTO, 2007. http://sunzi.lib.hku.hk/hkuto/record/B38212389.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Al-Sultany, Ghaidaa Abdalhussein Billal. "Automatic message annotation and semantic interface for context aware mobile computing." Thesis, Brunel University, 2012. http://bura.brunel.ac.uk/handle/2438/6564.

Full text
Abstract:
In this thesis, the concept of mobile messaging awareness has been investigated by designing and implementing a framework able to annotate short text messages with a context ontology for semantic reasoning and classification purposes. Keywords in a text message are identified and annotated with concepts, entities and knowledge drawn from the ontology without the need for a learning process, and the proposed framework supports semantic-reasoning-based message awareness for categorization purposes. The first stage of the research develops a framework for facilitating mobile communication with short annotated text messages (SAMS), which annotates short text messages with part-of-speech tags augmented with internal and external metadata. In the SAMS framework the annotation process is carried out automatically at the time of composing a message. The metadata is collected from the device's file system and the message header information, and is then combined with the message's tagged keywords to form an XML file. The significance of the annotation process is that it assists the proposed framework during search and retrieval in identifying the tagged keywords; Semantic Web technologies are utilised to improve the reasoning mechanism. The framework was subsequently extended into Contextual Ontology-based Short Text Message reasoning (SOIM). SOIM further enhances the search capabilities of SAMS by combining short text message annotation and semantic reasoning with a domain ontology, where the domain ontology is modelled as a set of ontological knowledge modules that capture the features of contextual entities and of particular events or situations. Fundamentally, SOIM relies on hierarchical semantic distance to compute an approximate degree of match between a new set of relevant keywords and their corresponding abstract class in the domain ontology. Adopting a contextual ontology improves the framework's text comprehension and message categorization. Fuzzy Set and Rough Set theory have been integrated with SOIM to improve its inference capabilities and system efficiency. Since SOIM chooses the pattern matched to a message based on its degree of similarity, the issue of choosing the best retrieved pattern arises at the decision-making stage. A fuzzy rule-based reasoning classifier, adopting Fuzzy Set theory for decision making, has been applied on top of the SOIM framework in order to increase the accuracy of the classification process and produce clearer decisions. The issue of uncertainty in the system has been addressed by utilising Rough Set theory, whereby irrelevant and indecisive properties that negatively affect the framework's efficiency are ignored during the matching process.
APA, Harvard, Vancouver, ISO, and other styles
32

Wong, Ping-wai, and 黃炳蔚. "Semantic annotation of Chinese texts with message structures based on HowNet." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B38212389.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Al-Khalifa, Hend S. "Automatic document-level semantic metadata annotation using folksonomies and domain ontologies." Thesis, University of Southampton, 2007. https://eprints.soton.ac.uk/264181/.

Full text
Abstract:
The last few years have witnessed a fast growth of the concept of Social Software, be it video sharing such as YouTube, photo sharing such as Flickr, community building such as MySpace, or social bookmarking such as del.icio.us. These websites contain valuable user-generated metadata called folksonomies. Folksonomies are ad hoc, lightweight knowledge representation artefacts that describe web resources using people's own vocabulary. The cheap metadata contained in such websites presents potential opportunities for researchers to benefit from. This thesis presents a novel tool that uses folksonomies to automatically generate metadata with educational semantics, in an attempt to provide semantic annotations for bookmarked web resources and to help make the vision of the Semantic Web a reality. The tool comprises two components: the tag normalisation process and the semantic annotation process. The tool uses the del.icio.us social bookmarking service as a source of folksonomy tags. The tool was applied to a case study consisting of a framework for evaluating the usefulness of the generated semantic metadata within the context of a particular eLearning application. This implementation of the tool was evaluated along three dimensions: the quality, the searchability and the representativeness of the generated semantic metadata. The results show that folksonomy tags are acceptable for creating semantic metadata. Moreover, folksonomy tags showed the power of aggregating people's intelligence. The novel contribution of this work is the design of a tool that utilises folksonomy tags to automatically generate metadata with fine-grained, extra educational semantics.
APA, Harvard, Vancouver, ISO, and other styles
34

Yaprakkaya, Gokhan. "Face Identification, Gender And Age Groups Classifications For Semantic Annotation Of Videos." Thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12612848/index.pdf.

Full text
Abstract:
This thesis presents a robust face recognition method and a combination of methods for gender identification and age group classification for the semantic annotation of videos. A 256-bin local binary pattern (LBP) histogram and pixel intensity differences are used as facial features for gender classification. DCT Mod2 features and edge detection results around facial landmarks are used as facial features for age group classification. In the gender classification module, a Random Trees classifier is trained with the LBP features and an AdaBoost classifier is trained with the pixel intensity differences. In the age group classification module, DCT Mod2 features are used to train one Random Trees classifier and LBP features around facial landmark points are used to train another. In the face identification module, DCT Mod2 features of the detected faces, morphed by a two-dimensional face morphing method based on Active Appearance Models and barycentric coordinates, are used as inputs to a nearest-neighbour classifier with weights obtained from the trained Random Forest classifier. Different feature extraction methods were tried and compared, and the best-performing one was chosen for the face recognition module. We compared our classification results with the results of successful earlier works in experiments performed on the same datasets, and obtained satisfactory results.
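A minimal sketch of the LBP-based gender classification step, using scikit-image for the LBP codes and scikit-learn's RandomForestClassifier as a stand-in for the Random Trees classifier; the face crops and labels below are random placeholders, not real data:

import numpy as np
from skimage.feature import local_binary_pattern  # pip install scikit-image
from sklearn.ensemble import RandomForestClassifier

def lbp_histogram(gray_face):
    # 256-bin histogram of basic 8-neighbour LBP codes over a face crop
    codes = local_binary_pattern(gray_face, P=8, R=1, method="default")
    hist, _ = np.histogram(codes, bins=256, range=(0, 256), density=True)
    return hist

# Placeholder "face crops" and binary gender labels
rng = np.random.default_rng(0)
faces = rng.integers(0, 256, size=(40, 64, 64), dtype=np.uint8)
labels = rng.integers(0, 2, size=40)  # 0 = female, 1 = male (toy labels)

X = np.array([lbp_histogram(f) for f in faces])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
print(clf.predict(X[:5]))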
APA, Harvard, Vancouver, ISO, and other styles
35

Tao, Cui. "Ontology generation, information harvesting, and semantic annotation for machine-generated web pages /." Diss., CLICK HERE for online access, 2009. http://contentdm.lib.byu.edu/ETD/image/etd2762.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Tao, Cui. "Ontology Generation, Information Harvesting and Semantic Annotation for Machine-Generated Web Pages." BYU ScholarsArchive, 2008. https://scholarsarchive.byu.edu/etd/1646.

Full text
Abstract:
The current World Wide Web is a web of pages. Users have to guess possible keywords that might lead, through search engines, to the pages that contain information of interest, and then browse hundreds or even thousands of the returned pages in order to obtain what they want. This frustrating problem motivates an approach to turn the web of pages into a web of knowledge, so that web users can query the information of interest directly. This dissertation provides a step in this direction and a way to partially overcome the challenges. Specifically, this dissertation shows how to turn machine-generated web pages, like those on the hidden web, into semantic web pages for the web of knowledge. We design and develop three systems to address this challenge: TISP (Table Interpretation for Sibling Pages), TISP++, and FOCIH (Form-based Ontology Creation and Information Harvesting). TISP can automatically interpret hidden-web tables. Given interpreted tables, TISP++ can generate ontologies and semantically annotate the information present in the interpreted tables automatically. In this way, hidden information can be made publicly accessible. We also provide users with a way to generate personalized ontologies. FOCIH provides users with an interface through which they can express their own view by creating a form that specifies the information they want. Based on the form, FOCIH can generate user-specific ontologies, and based on patterns in machine-generated pages, FOCIH can harvest information and annotate these pages with respect to the generated ontology. Users can directly query the annotated information. With these contributions, this dissertation serves as a foundational pillar for turning the current web of pages into a web of knowledge.
APA, Harvard, Vancouver, ISO, and other styles
37

Hou, Jun. "Text mining with semantic annotation : using enriched text representation for entity-oriented retrieval, semantic relation identification and text clustering." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/79206/1/Jun_Hou_Thesis.pdf.

Full text
Abstract:
This project is a step forward in the study of text mining, where enhanced text representation with semantic information plays a significant role. It develops effective methods for entity-oriented retrieval, semantic relation identification and text clustering utilizing semantically annotated data. These methods are based on an enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several state-of-the-art benchmark methods on real-life datasets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms of an entity relation, and handles the clustering of documents with multiple feature spaces.
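To illustrate the general idea of an enriched text representation for clustering (a toy sketch: the term-to-concept mapping, documents and cluster count are invented, and a simple dictionary lookup stands in for real Wikipedia-based semantic annotation):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mapping from surface terms to Wikipedia concept titles
concepts = {"jaguar": "Jaguar_Cars", "python": "Python_programming"}

def enrich(doc):
    # Append a concept token for each mapped term so the vector space
    # carries semantic features alongside the raw words
    extra = [c for term, c in concepts.items() if term in doc.lower()]
    return doc + " " + " ".join(extra)

docs = ["The jaguar dealership opened downtown",
        "Python makes text mining easy",
        "A new jaguar model was unveiled"]

X = TfidfVectorizer().fit_transform(enrich(d) for d in docs)
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))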
APA, Harvard, Vancouver, ISO, and other styles
38

Hatem, Muna Salman. "A framework for semantic web implementation based on context-oriented controlled automatic annotation." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/3207.

Full text
Abstract:
The Semantic Web is the vision of the future Web. Its aim is to enable machines to process Web documents in a way that makes it possible for computer software to "understand" the meaning of the document contents. Each document on the Semantic Web is to be enriched with metadata that express the semantics of its contents. Many infrastructures, technologies and standards have been developed and have proven their theoretical use for the Semantic Web, yet very few applications have been created. Most current Semantic Web applications were developed for research purposes. This project investigates the major factors restricting the wide spread of Semantic Web applications. We identify the two most important requirements for a successful implementation as the automatic production of semantically annotated documents, and the creation and maintenance of a semantics-based knowledge base. This research proposes a framework for Semantic Web implementation based on context-oriented controlled automatic annotation; for short, we call the framework the Semantic Web Implementation Framework (SWIF) and the system that implements this framework the Semantic Web Implementation System (SWIS). The proposed architecture provides for a Semantic Web implementation of stand-alone websites that automatically annotates Web pages before they are uploaded to the Intranet or Internet, and maintains persistent storage of Resource Description Framework (RDF) data for both the domain memory, denoted by Control Knowledge, and the metadata of the Web site's pages. We believe that the presented implementation of the major parts of SWIS introduces a system competitive with current state-of-the-art annotation tools and knowledge management systems; this is because it handles input documents in the context in which they are created, in addition to the automatic learning and verification of knowledge using only the available computerized corporate databases. In this work, we introduce the concept of Control Knowledge (CK), which represents the application's domain memory, and use it to verify the extracted knowledge. Learning is based on the number of occurrences of the same piece of information in different documents. We introduce the concept of Verifiability in the context of annotation by comparing the extracted text's meaning with the information in the CK, using the proposed database table Verifiability_Tab. We use the linguistic concept of Thematic Role in investigating and identifying the correct meaning of words in text documents; this helps correct relation extraction. The verb lexicon used contains the argument structure of each verb together with the thematic structure of the arguments. We also introduce a new method to chunk conjoined statements and identify the missing subject of the produced clauses. We use semantic classes of verbs that relate a list of verbs to a single property in the ontology, which helps in disambiguating the verb in the input text to enable better information extraction and annotation. Consequently, we propose the following definition for the annotated document, or what is sometimes called the 'Intelligent Document': 'The Intelligent Document is the document that clearly expresses its syntax and semantics for human use and software automation'. This work introduces a promising improvement to the quality of the automatically generated annotated document and the quality of the automatically extracted information in the knowledge base.
Our approach to using Semantic Web technology opens new opportunities for diverse areas of application; e-learning applications, for example, can be greatly improved and become more effective.
APA, Harvard, Vancouver, ISO, and other styles
39

Austrheim, Aanund, and Terje Olsen. "A Graphical User Interface for Automated Semantic Web service Annotation, Composition and Execution." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-10765.

Full text
Abstract:
We have implemented a graphical user interface for the ADIS system as a realization of the ADIS concepts and theories. The system as a whole lets the user discover, annotate, publish and execute composite Web services; with this concrete implementation we have proved the concept of using ontologies in Web service matchmaking, as well as realizing automated Web service composition and semi-automatic Web service execution.
APA, Harvard, Vancouver, ISO, and other styles
40

Yao, Wei [Verfasser]. "Semantic annotation and object extraction for very high resolution satellite images / Wei Yao." Siegen : Universitätsbibliothek der Universität Siegen, 2018. http://d-nb.info/1152078720/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Dotsika, Fefie. "Semantic technologies : from niche to the mainstream of Web 3? : a comprehensive framework for web information modelling and semantic annotation." Thesis, University of Westminster, 2012. https://westminsterresearch.westminster.ac.uk/item/8z712/semantic-technologies-from-niche-to-the-mainstream-of-web-3-a-comprehensive-framework-for-web-information-modelling-and-semantic-annotation.

Full text
Abstract:
Context: Web information technologies developed and applied in the last decade have considerably changed the way web applications operate and have revolutionised information management and knowledge discovery. Social technologies, user-generated classification schemes and formal semantics have a far-reaching sphere of influence. They promote collective intelligence, support interoperability, enhance sustainability and instigate innovation. Contribution: The research carried out and consequent publications follow the various paradigms of semantic technologies, assess each approach, evaluate its efficiency, identify the challenges involved and propose a comprehensive framework for web information modelling and semantic annotation, which is the thesis’ original contribution to knowledge. The proposed framework assists web information modelling, facilitates semantic annotation and information retrieval, enables system interoperability and enhances information quality. Implications: Semantic technologies coupled with social media and end-user involvement can instigate innovative influence with wide organisational implications that can benefit a considerable range of industries. The scalable and sustainable business models of social computing and the collective intelligence of organisational social media can be resourcefully paired with internal research and knowledge from interoperable information repositories, back-end databases and legacy systems. Semantified information assets can free human resources so that they can be used to better serve business development, support innovation and increase productivity.
APA, Harvard, Vancouver, ISO, and other styles
42

Elias, Mturi. "Design of Business Process Model Repositories : Requirements, Semantic Annotation Model and Relationship Meta-model." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-117035.

Full text
Abstract:
Business process management is fast becoming one of the most important approaches for designing contemporary organizations and information systems. A critical component of business process management is business process modelling. It is widely accepted that modelling business processes from scratch is a complex, time-consuming and error-prone task, yet the efforts made to model these processes are seldom reused beyond their original purpose. Reuse of business process models has the potential to overcome the challenges of modelling business processes from scratch. Process model repositories, properly populated, are certainly a step toward supporting the reuse of process models. This thesis starts with the observation that existing process model repositories for supporting process model reuse suffer from several shortcomings that affect their usability in practice. Firstly, most of the existing repositories are proprietary; therefore, they can only be enhanced or extended with new models by the owners of the repositories. Secondly, it is difficult to locate and retrieve relevant process models from a large collection. Thirdly, process models are not goal-related, making it difficult to gain an understanding of the business goals that are realized by a certain model. Finally, process model repositories lack a clear mechanism for identifying and defining the relationships between business processes, and as a result it is difficult to identify related processes. Following a design science research paradigm, this thesis proposes an open and language-independent process model repository with an efficient retrieval system to support process model reuse. The proposed repository is grounded on four original and interrelated contributions: (1) a set of requirements that a process model repository should possess to increase the probability of process model reuse; (2) a context-based process semantic annotation model for semantically annotating process models to facilitate their effective retrieval; (3) a business process relationship meta-model for identifying and defining the relationships of process models in the repository; and (4) an architecture of a process model repository for process model reuse. The models and architecture produced in this thesis were evaluated to test their utility, quality and efficacy. The semantic annotation model was evaluated through two empirical studies using controlled experiments. The conclusion drawn from the two studies is that the annotation model improves searching, navigation and understanding of process models. The process relationship meta-model was evaluated using an informed argument to determine the extent to which it meets the established requirements; the results of the analysis revealed that it does. An analysis of the architecture against the requirements likewise indicates that the architecture meets the established requirements.
APA, Harvard, Vancouver, ISO, and other styles
43

Tang, Lilian Hongying. "Semantic analysis of image content for intelligent retrieval and automatic annotation of medical images." Thesis, University of Cambridge, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621732.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Andrade, Guidson Coelho de. "Semantic enrichment of American English corpora through automatic semantic annotation based on top-level ontologies using the CRF clas- sification model." Universidade Federal de Viçosa, 2018. http://www.locus.ufv.br/handle/123456789/21639.

Full text
Abstract:
Textual databases carry with them human-perceived meanings, but those meanings are difficult for computers to interpret. In order for machines to understand the semantics attached to texts, and not only their syntax, it is necessary to add extra information to these corpora. Semantic annotation is the task of incorporating this information by adding metadata to lexical items. This information can be ontological concepts that help define the nature of a word in order to give it some meaning. However, annotating texts according to an ontology is still a task that requires time and effort from annotators trained for this purpose. Another approach to be considered is the use of automatic semantic annotation tools that use machine learning techniques to classify annotated terms. This approach demands a database for training the algorithms, in this case corpora pre-annotated according to the semantic dimension to be explored. However, this methodological lineage has limited resources to meet the needs of learning methods: there is a large lack of semantically annotated corpora and an even larger absence of ontologically annotated corpora, hindering the advance of the area of automatic semantic annotation. The purpose of the present work is to assist in the semantic enrichment of American English texts by automatically annotating them based on a top-level ontology through the Conditional Random Fields (CRF) supervised learning model. After the selection of the Open American National Corpus as the linguistic database and Schema.org as the ontology, the work was divided into two stages. First, the pre-processed and corrected corpus was submitted to a hybrid annotation: first with a rule-based annotator, and later a complementary manual annotation. Both annotation tasks were driven by the concepts and definitions of the eight classes from the top level of the selected ontology. With the ontologically annotated corpus in hand, the automatic annotation process was started using the CRF learning method. The prediction model took into account the linguistic and structural features of the terms to classify them under the eight ontological types. The results obtained during the evaluation of the model were very satisfactory and met the objective of the research. Although this is a new approach to semantic annotation with little basis for comparison, the work presented promising results for advancing research in the area of automatic semantic enrichment based on top-level ontologies.
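A minimal sketch of CRF-based token classification of the kind described, using the sklearn-crfsuite package; the feature set, toy sentences and tags below are illustrative assumptions, not the thesis's actual feature templates or class inventory:

import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(sent, i):
    # Linguistic/structural cues of the kind such a model can exploit
    w = sent[i]
    return {
        "lower": w.lower(),
        "istitle": w.istitle(),
        "isdigit": w.isdigit(),
        "suffix3": w[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy training sentences tagged with top-level ontology classes
# ("O" marks tokens outside any class; labels follow Schema.org's top level)
sents = [["Alice", "visited", "Boston"], ["Acme", "hired", "Bob"]]
tags = [["Person", "O", "Place"], ["Organization", "O", "Person"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)

test = ["Carol", "left", "Paris"]
print(crf.predict([[token_features(test, i) for i in range(len(test))]]))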
APA, Harvard, Vancouver, ISO, and other styles
45

Li, Chun. "Ontology-driven semantic annotations for multiple engineering viewpoints in computer aided design." Thesis, University of Bath, 2012. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.558858.

Full text
Abstract:
Engineering design involves a series of activities to handle data, including capturing and storing data, retrieval and manipulation of data. This also applies throughout the entire product lifecycle (PLC). Unfortunately, a closed loop of knowledge and information management system has not been implemented for the PLC. As part of product lifecycle management (PLM) approaches, computer-aided design (CAD) systems are extensively used from embodiment and detail design stages in mechanical engineering. However, current CAD systems lack the ability to handle semantically-rich information, thus to represent, manage and use knowledge among multidisciplinary engineers, and to integrate various tools/services with distributed data and knowledge. To address these challenges, a general-purpose semantic annotation approach based on CAD systems in the mechanical engineering domain is proposed, which contributes to knowledge management and reuse, data interoperability and tool integration. In present-day PLM systems, annotation approaches are currently embedded in software applications and use diverse data and anchor representations, making them static, inflexible and difficult to incorporate with external systems. This research will argue that it is possible to take a generalised approach to annotation with formal annotation content structures and anchoring mechanisms described using general-purpose ontologies. In this way viewpoint-oriented annotation may readily be captured, represented and incorporated into PLM systems together with existing annotations in a common framework, and the knowledge collected or generated from multiple engineering viewpoints may be reasoned with to derive additional knowledge to enable downstream processes. Therefore, knowledge can be propagated and evolved through the PLC. Within this framework, a knowledge modelling methodology has also been proposed for developing knowledge models in various situations. In addition, a prototype system has been designed and developed in order to evaluate the core contributions of this proposed concept. According to an evaluation plan, cost estimation and finite element analysis as case studies have been used to validate the usefulness, feasibility and generality of the proposed framework. Discussion has been carried out based on this evaluation. As a conclusion, the presented research work has met the original aim and objectives, and can be improved further. At the end, some research directions have been suggested.
APA, Harvard, Vancouver, ISO, and other styles
46

Traore, Lamine. "Semantic modeling of an histopathology image exploration and analysis tool." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066621/document.

Full text
Abstract:
The formalization of clinical data has been achieved and adopted in several areas of healthcare, such as the prevention of medical errors, standardization, and guidelines for good practice and recommendations. However, the community has not yet managed to take full advantage of the value of these data. The major problem remains the difficulty of integrating these data and the associated semantic services for the benefit of quality of care. Objective: the methodological objective of this work is to formalize, process and integrate histopathology and imaging knowledge based on standardized protocols and reference terminologies, using semantic web languages. The applicative objective is to leverage this knowledge in a platform that facilitates the exploration of virtual slides, improves collaboration between pathologists, and makes decision support systems more reliable in the specific context of breast cancer diagnosis. It is important to note that our goal is not to replace the clinician, but rather to support them and ease their heavy daily workload: the final word remains with the pathologists. Approach: we adopted a transversal approach to the formal representation of histopathology and imaging knowledge in the cancer grading process, relying on semantic web technologies. Recently, anatomic pathology (AP) has seen the introduction of several tools such as high-resolution histopathological slide scanners, efficient software viewers for large-scale histopathological images and virtual slide technologies. These initiatives created the conditions for a broader adoption of computer-aided diagnosis based on whole slide images (WSI), with the hope of a possible contribution to decreasing inter-observer variability. Besides this, automatic image analysis algorithms represent a very promising way to support pathologists' laborious tasks during the diagnosis process. Similarly, in order to reduce inter-observer variability between AP reports of malignant tumours, the College of American Pathologists edited 67 organ-specific Cancer Checklists and associated Protocols (CAP-CC&P). Each checklist includes a set of AP observations that are relevant in the context of a given organ-specific cancer and have to be reported by the pathologist; the associated protocol includes interpretation guidelines for most of the required observations. All these changes and initiatives raise a number of scientific challenges, such as the sustainable management of the available semantic resources associated with the diagnostic interpretation of AP images by both humans and computers. In this context, reference vocabularies and formalization of the associated knowledge are especially needed to annotate histopathology images with labels complying with semantic standards. In this research work, we present our contribution in this direction: a sustainable way to bridge the content, features, performance and usability gaps between histopathology and WSI analysis.
APA, Harvard, Vancouver, ISO, and other styles
47

Kucuk, Dilek. "Exploiting Information Extraction Techniques For Automatic Semantic Annotation And Retrieval Of News Videos In Turkish." Phd thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613043/index.pdf.

Full text
Abstract:
Information extraction (IE) is known to be an effective technique for automatic semantic indexing of news texts. In this study, we propose a text-based, fully automated system for the semantic annotation and retrieval of news videos in Turkish, which exploits several IE techniques on the video texts. The IE techniques employed by the system include named entity recognition, automatic hyperlinking, person entity extraction with coreference resolution, and event extraction. The system uses the outputs of the components implementing these IE techniques as the semantic annotations for the underlying news video archives. Apart from the IE components, the proposed system comprises a news video database as well as components for news story segmentation, sliding text recognition, and semantic video retrieval. We also propose a semi-automatic counterpart of the system in which the only manual intervention takes place during text extraction. Both systems are executed on genuine video data sets consisting of videos broadcast by the Turkish Radio and Television Corporation. The current study is significant as it proposes the first fully automated system to facilitate semantic annotation and retrieval of news videos in Turkish; moreover, the proposed system and its semi-automated counterpart are quite generic and could be customized to build similar systems for video archives in other languages. IE research on Turkish texts is known to be rare, and within the course of this study we have proposed and implemented novel techniques for several IE tasks on Turkish texts. As an application example, we demonstrate the utilization of the implemented IE components to facilitate multilingual video retrieval.
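As a hedged illustration of the simplest kind of IE component in such a pipeline, the sketch below implements a toy gazetteer-based named entity recognizer; the entity lists and example sentence are invented, and the thesis's actual Turkish IE components are considerably more sophisticated.

# Toy gazetteer-based named entity recognizer, the simplest kind of IE
# component in such a pipeline (entity lists and sentence are invented).
GAZETTEER = {
    "Ankara": "LOCATION",
    "TRT": "ORGANIZATION",
}

def annotate(text):
    # Return (token, entity_type) pairs for every gazetteer hit.
    annotations = []
    for raw in text.split():
        token = raw.strip(".,")
        label = GAZETTEER.get(token)
        if label:
            annotations.append((token, label))
    return annotations

print(annotate("TRT broadcast the news from Ankara."))
# [('TRT', 'ORGANIZATION'), ('Ankara', 'LOCATION')]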
APA, Harvard, Vancouver, ISO, and other styles
48

Dytrych, Jaroslav. "Sémantická anotace textu." Doctoral thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-412580.

Full text
Abstract:
This thesis deals with intelligent systems for supporting the semantic annotation of text. It discusses the motivation for creating such systems and the state of the art in the areas where they are used. The thesis also describes a newly proposed and implemented annotation system which realizes advanced semantic filtering functions and presents alternative annotation suggestions in a unique way. The results of the completed experiments clearly show the advantages of the proposed solution. They also prove that the user interface of annotation tools affects the annotation process. The information displayed for the task of disambiguating ambiguous entity names was optimised, and the proposed methods for speeding up annotation and increasing the quality of the created annotations were experimentally evaluated. A comparison with the general-purpose Protégé tool demonstrated the benefits of the created system for the collaborative creation of ontologies anchored in text. In the conclusion, all achieved results are analysed and summarised.
APA, Harvard, Vancouver, ISO, and other styles
49

Lindberg, Hampus. "Semantic Segmentation of Iron Ore Pellets in the Cloud." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-86896.

Full text
Abstract:
This master's thesis evaluates data annotation, semantic segmentation and Docker for use in AWS. The data provided had to be annotated for use as a dataset in the creation of a neural network; different neural network models were then compared based on performance. Since AWS offers the option of using Docker containers, that option was examined, and the different tools available in AWS SageMaker were analyzed for bringing a neural network to the cloud. Images were annotated in Ilastik, giving a dataset of 276 images, and a neural network was created in PyTorch using the library Segmentation Models PyTorch, which offered the option of trying different models. The network was first developed in a notebook in Google Colab for a quick setup and easy testing. The dataset was then uploaded to AWS S3 and the notebook was moved from Colab to an AWS instance, where the dataset could be loaded from S3. A Docker container was created and packaged with the necessary packages and libraries, as well as the training and inference code, and pushed to the ECR (Elastic Container Registry). This container could then be used to run training jobs in SageMaker, resulting in a trained model stored in S3; the hyperparameter tuning tool was also examined to obtain a better-performing model. The two deployment methods in SageMaker were then investigated to understand the entire machine learning solution. The images annotated in Ilastik were deemed sufficient, as the neural network results were satisfactory. The created neural network was able to use all of the models accessible from Segmentation Models PyTorch, which enabled many options. By using a Docker container, all of the tools available in SageMaker could be used with the created neural network packaged in the container and pushed to the ECR. Training jobs were run in SageMaker using the container to obtain a trained model, which was saved to AWS S3. Hyperparameter tuning yielded better results than the manually tested parameters and produced the best neural network of the project. The model deemed best was Unet++ in combination with the DPN98 encoder. The two deployment methods in SageMaker were explored; each is beneficial in different ways, so the choice has to be reconsidered for each project. On analysis, the cloud solution was deemed the better alternative compared to an in-house solution in all three aspects measured: price, performance and scalability.
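Since the abstract names the winning combination explicitly, a minimal sketch of instantiating Unet++ with a DPN98 encoder via Segmentation Models PyTorch may be useful; the hyperparameters and class count below are placeholders, not those used in the thesis.

# Minimal sketch: Unet++ with a DPN98 encoder via Segmentation Models PyTorch,
# the combination the thesis found best; hyperparameters are placeholders.
# Requires: pip install torch segmentation-models-pytorch
import torch
import segmentation_models_pytorch as smp

model = smp.UnetPlusPlus(
    encoder_name="dpn98",        # encoder backbone reported as best
    encoder_weights="imagenet",  # pretrained encoder weights
    in_channels=3,               # RGB input images
    classes=2,                   # e.g. pellet vs. background (assumed)
)

# One forward pass on a dummy batch (N, C, H, W); output is per-pixel class logits.
x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 2, 256, 256])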
APA, Harvard, Vancouver, ISO, and other styles
50

Djemaa, Marianne. "Stratégie domaine par domaine pour la création d'un FrameNet du français : annotations en corpus de cadres et rôles sémantiques." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCC007/document.

Full text
Abstract:
This thesis describes the creation of the French FrameNet (FFN), a French-language FrameNet-type resource built using both the Berkeley FrameNet (Baker et al., 1998) and two morphosyntactic treebanks: the French Treebank (Abeillé et al., 2003) and the Sequoia Treebank (Candito and Seddah, 2012). The Berkeley FrameNet allows for semantic annotation of prototypical situations and their participants. It consists of: a) a structured set of prototypical situations, called frames, which incorporate semantic characterizations of the situations' participants (Frame Elements, or FEs); b) a lexicon of lexical units (LUs) which can evoke those frames; c) a set of English-language frame annotations. In order to create the FFN, we designed a "domain by domain" methodology: we defined four "domains", each centered on a specific notion (cause, verbal communication, cognitive stance, or commercial transaction). We then sought to obtain full frame and lexical coverage for these domains, and annotated the first 100 corpus occurrences of each LU in our domains. This strategy guarantees a greater consistency in frame structuring than other approaches and is conducive to work on both intra-domain and inter-domain frame polysemy. Annotating frames on continuous text, without selecting particular LU occurrences, preserves the natural distribution of lexical and syntactic characteristics of frame-evoking elements in our corpus. At the present time, the FFN includes 105 distinct frames and 873 distinct LUs, which combine into 1,109 LU-frame pairs (i.e. 1,109 senses). 16,167 frame occurrences, as well as their FEs, have been annotated in our corpus. In this thesis, I first situate the FrameNet model in a larger theoretical background. I then justify our use of the Berkeley FrameNet as our base resource and explain why we used a domain-by-domain methodology. Next, I clarify some specific BFN notions that we found too vague to be applied coherently in building the FFN; in particular, I introduce more directly syntactic criteria both for defining a frame's lexical perimeter and for differentiating core FEs from non-core ones. I then describe the creation of the FFN itself, first delimiting the structure of frames used in the resource and creating a lexicon for these frames. I introduce in detail the Cognitive Stances notional domain, which includes frames having to do with a cognizer's degree of certainty about some particular content. Next, I describe our methodology for annotating the corpus with frames and FEs, and analyze our treatment of several specific linguistic phenomena that required additional consideration (such as object complement constructions). Finally, I give quantified information about the current status of the FFN and its evaluation, and conclude with some perspectives on improving and exploiting the FFN.
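To make the FrameNet annotation model concrete, the sketch below represents one frame annotation as a small data structure: a lexical unit evokes a frame, and sentence spans fill its frame elements. The example sentence, frame name and role names are illustrative inventions, not entries from the FFN.

# Illustrative data structures for one FrameNet-style annotation
# (sentence, frame and role names are invented for illustration).
from dataclasses import dataclass

@dataclass
class FrameElement:
    role: str    # frame element (role) name, e.g. "Cognizer"
    span: tuple  # (start, end) character offsets in the sentence

@dataclass
class FrameAnnotation:
    sentence: str
    lexical_unit: str  # the frame-evoking word (LU)
    frame: str         # the frame it evokes
    elements: list     # frame elements realized in the sentence

anno = FrameAnnotation(
    sentence="Marie doubts that the results are final.",
    lexical_unit="doubts",
    frame="Cognitive_stance",
    elements=[
        FrameElement(role="Cognizer", span=(0, 5)),   # "Marie"
        FrameElement(role="Content", span=(13, 39)),  # "that the results are final"
    ],
)
print(anno.frame, [fe.role for fe in anno.elements])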
APA, Harvard, Vancouver, ISO, and other styles