To see other types of publications on this topic, follow the link: Collections of texts.

Dissertations / Theses on the topic 'Collections of texts'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Collections of texts.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Brooks, Laura Jeanice. "French chansons collections on the texts of Pierre de Ronsard, 1570-1580 /." Ann Arbor : UMI, 2000. http://catalogue.bnf.fr/ark:/12148/cb37103588v.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Barron, Caroline. "Tourists and texts : Latin inscriptions in the Grand Tour collections of eighteenth-century England." Thesis, King's College London (University of London), 2015. https://kclpure.kcl.ac.uk/portal/en/theses/tourists-and-texts(70feb3de-1582-437b-b4e8-d7a2eb314620).html.

Full text
Abstract:
This thesis examines the acquisition of Latin inscriptions by the Grand Tourists of eighteenth-century England. While there are many previous surveys of the private collections of antiquities made in this period, there has been no comprehensive study of the inscriptions in their own right. Previous research has focused on the collection and display of ancient statuary, but the Latin inscriptions that were included in the majority of collections in this period have largely been overlooked, or considered 'minor' objects by comparison. This thesis investigates the types of inscriptions that were acquired by collectors such as Thomas Hollis, William Weddell, the 1st Earl of Shelburne and Charles Townley, the objects on which the inscriptions were placed, and the motivations behind their acquisition, and suggests that they were included in collections throughout the eighteenth century for very specific reasons. Analysis of the content of the inscriptions and the way in which they were displayed has identified the different intellectual and aesthetic value attributed to them by the Tourists, from an antiquarian interest in their potential to deliver historical facts to their utility in aesthetically pleasing gallery arrangements. It also argues that these responses are indicative of the changing perception of antiquity in the eighteenth century. Archival material has been used to clarify the process by which the inscriptions were acquired and to illustrate how the interests and aesthetic criteria of the Tourists drove the art market and the dealers of antiquities in Rome. This thesis suggests that, far from the 'minor' status accorded to them in most previous studies, inscriptions played a vital role in the Grand Tourists' experience of antiquity in the eighteenth century.
APA, Harvard, Vancouver, ISO, and other styles
3

McNutt, Genevieve Theodora. "Joseph Ritson and the publication of early English literature." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31497.

Full text
Abstract:
This thesis examines the work of antiquary and scholar Joseph Ritson (1752-1803) in publishing significant and influential collections of early English and Scottish literature, including the first collection of medieval romance, by going beyond the biographical approaches to Ritson's work typical of nineteenth- and twentieth-century accounts, incorporating an analysis of Ritson's contributions to specific fields into a study of the context which made his work possible. It makes use of the 'Register of Manuscripts Sent to the Reading Room of the British Museum' to shed new light on Ritson's use of the manuscript collections of the British Museum. The thesis argues that Ritson's early polemic attacks on Thomas Warton, Thomas Percy, and the editors of Shakespeare allowed Ritson to establish his own claims to expertise and authority, built upon the research he had already undertaken in the British Museum and other public and private collections. Through his publications, Ritson experimented with different strategies for organizing, systematizing, interpreting and presenting his research, constructing very different collections for different kinds of texts, and different kinds of readers. A comparison of Ritson's three major collections of songs - A Select Collection of English Songs (1783), Ancient Songs (1790), and Scotish Songs (1794) - demonstrates some of the consequences of his decisions, particularly the distinction made between English and Scottish material. Although Ritson's Robin Hood (1795) is the most frequently reprinted of his collections, and one of the best studied, approaching this work within the immediate context of Ritson's research and other publications, rather than its later reception, offers some explanation for its more idiosyncratic features. Finally, Ritson's Ancient Engleish Metrical Romance's (1802) provides a striking example of Ritson's participation in collaborative networks and the difficulty of finding an audience and a market for editions of early English literature at the beginning of the nineteenth century.
APA, Harvard, Vancouver, ISO, and other styles
4

Mota, Denysson Axel Ribeiro. "Representação e recuperação de informação em acervos digitais nos contextos da web semântica e web pragmática: um estudo crítico." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/27/27151/tde-27012016-135403/.

Full text
Abstract:
A comparative study of the Semantic Web and Pragmatic Web proposals, grounded in theories of information organization and retrieval, with the aim of proposing a data representation model that includes contexts in order to improve the quality of information retrieval processes. The research focuses on the representation and retrieval of information in bibliographic collections, especially with the use of RDF and Topic Maps. To this end, the origins of the Semantic Web and the Pragmatic Web are presented, together with the fundamental concepts of this environment: language; representation and retrieval of information and knowledge; terms and terminology; semantics and pragmatics. The methodology consisted of the analysis and discussion of these key concepts, Entity-Relationship modelling, and encoding in RDF and XTM to represent contexts, applied to a repository of scientific texts. Based on a critical analysis of the information organization and retrieval proposals of the Semantic Web and the Pragmatic Web, a context-aware information representation is proposed that could improve the relevance of information retrieval results on the WWW. The contexts represented are: Citations, Document Source Domain, Keyword Domain, Person's Education Areas, Person's Publication Areas, Journal's Publication Areas, and Person's Interests. The research showed that there are limits to introducing contexts into information systems, and that terms such as 'semantic' and 'pragmatic' require a critical approach; indeed, the operationalization of semantic and pragmatic concepts is still far from reality in contemporary information systems on the WWW. The research is interdisciplinary in nature, addressing problems discussed both in Computer Science and in Information Science: although the object of research originates in Computer Science, it requires the theories and methods of information representation studied in Information Science to be developed appropriately.
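Since the abstract centres on encoding document contexts in RDF, a minimal sketch of what such context triples might look like is given below, using the rdflib library. The namespace, property names, and values are invented for illustration and do not reproduce the thesis's actual vocabulary or model.

```python
from rdflib import Graph, Literal, Namespace, URIRef

# Hypothetical context vocabulary; not the thesis's actual schema.
CTX = Namespace("http://example.org/context#")

g = Graph()
doc = URIRef("http://example.org/doc/123")

# Attach a few of the context dimensions listed in the abstract
# (document source domain, keyword domain, author's publication areas).
g.add((doc, CTX.sourceDomain, Literal("Information Science")))
g.add((doc, CTX.keywordDomain, Literal("Semantic Web")))
g.add((doc, CTX.authorPublicationArea, Literal("Computer Science")))

print(g.serialize(format="turtle"))
```

A retrieval front end could then filter results by any combination of these context properties.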
APA, Harvard, Vancouver, ISO, and other styles
5

Shokouhi, Milad. "Federated Text Retrieval from Independent Collections." RMIT University. Computer Science and Information Technology, 2008. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080521.151632.

Full text
Abstract:
Federated information retrieval is a technique for searching multiple text collections simultaneously. Queries are submitted to a subset of collections that are most likely to return relevant answers. The results returned by selected collections are integrated and merged into a single list. Federated search is preferred over centralized search alternatives in many environments. For example, commercial search engines such as Google cannot index uncrawlable hidden web collections; federated information retrieval systems can search the contents of hidden web collections without crawling. In enterprise environments, where each organization maintains an independent search engine, federated search techniques can provide parallel search over multiple collections. There are three major challenges in federated search. For each query, a subset of collections that are most likely to return relevant documents are selected. This creates the collection selection problem. To be able to select suitable collections, federated information retrieval systems acquire some knowledge about the contents of each collection, creating the collection representation problem. The results returned from the selected collections are merged before the final presentation to the user. This final step is the result merging problem. In this thesis, we propose new approaches for each of these problems. Our suggested methods, for collection representation, collection selection, and result merging, outperform state-of-the-art techniques in most cases. We also propose novel methods for estimating the number of documents in collections, and for pruning unnecessary information from collection representations sets. Although management of document duplication has been cited as one of the major problems in federated search, prior research in this area often assumes that collections are free of overlap. We investigate the effectiveness of federated search on overlapped collections, and propose new methods for maximizing the number of distinct relevant documents in the final merged results. In summary, this thesis introduces several new contributions to the field of federated information retrieval, including practical solutions to some historically unsolved problems in federated search, such as document duplication management. We test our techniques on multiple testbeds that simulate both hidden web and enterprise search environments.
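To make the collection selection problem concrete, here is a toy Python sketch that ranks collections by simple document-frequency statistics. The collection names, statistics, and scoring formula are invented for illustration; real selection methods such as CORI or ReDDE, and the thesis's own algorithms, use much richer evidence.

```python
import math
from collections import Counter

# Toy collection representations: document-frequency statistics per collection.
# A real federated system would build these from sampled documents.
collections = {
    "news":   Counter({"election": 120, "market": 40, "goal": 5}),
    "sports": Counter({"goal": 200, "match": 150, "election": 2}),
}
sizes = {"news": 10_000, "sports": 8_000}

def score_collection(query_terms, name):
    """Rank a collection by a simple df-based score
    (illustrative only; CORI/ReDDE use richer statistics)."""
    df, n = collections[name], sizes[name]
    return sum(math.log(1 + df[t]) - math.log(n) for t in query_terms)

query = ["election", "market"]
ranked = sorted(collections, key=lambda c: score_collection(query, c), reverse=True)
print(ranked)  # collections most likely to return relevant answers first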
APA, Harvard, Vancouver, ISO, and other styles
6

Song, Min. "Robust knowledge extraction over large text collections." Philadelphia, Pa.: Drexel University, 2005. http://dspace.library.drexel.edu/handle/1860/495.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Bray, Steven Russell. "Role efficacy within interdependent teams, measurement development and tests of theory." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/nq32817.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Duan, Yijun. "History-related Knowledge Extraction from Temporal Text Collections." Kyoto University, 2020. http://hdl.handle.net/2433/253410.

Full text
Abstract:
Kyoto University (京都大学). Doctoral thesis (new system), Doctor of Informatics (博士(情報学)), degree no. 甲第22574号 / 情博第711号. Graduate School of Informatics, Department of Social Informatics. Examining committee: Prof. Masatoshi Yoshikawa (吉川正俊), Prof. Hisashi Kashima (鹿島久嗣), Prof. Keishi Tajima (田島敬史), and Program-Specific Associate Professor Adam Wladyslaw Jatowt. Conferred under Article 4, Paragraph 1 of the Degree Regulations.
APA, Harvard, Vancouver, ISO, and other styles
9

Young-Lai, Matthew. "Text structure recognition using a region algebra." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/NQ60576.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Patchala, Jagadeesh. "Data Mining Algorithms for Discovering Patterns in Text Collections." University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1458299372.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Young, Steven C. "Application of aquifer tests and sedimentological concepts to characterize the hydrological properties of a fluvial deposit." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/nq21400.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Walker, Daniel David. "Bayesian Text Analytics for Document Collections." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3530.

Full text
Abstract:
Modern document collections are too large to annotate and curate manually. As increasingly large amounts of data become available, historians, librarians and other scholars increasingly need to rely on automated systems to efficiently and accurately analyze the contents of their collections and to find new and interesting patterns therein. Modern techniques in Bayesian text analytics are becoming widespread and have the potential to revolutionize the way that research is conducted. Much work has been done in the document modeling community towards this end, though most of it is focused on modern, relatively clean text data. We present research for improved modeling of document collections that may contain textual noise or that may include real-valued metadata associated with the documents. This class of documents includes many historical document collections. Indeed, our specific motivation for this work is to help improve the modeling of historical documents, which are often noisy and/or have historical context represented by metadata. Many historical documents are digitized by means of Optical Character Recognition (OCR) from document images of old and degraded original documents. Historical documents also often include associated metadata, such as timestamps, which can be incorporated in an analysis of their topical content. Many techniques, such as topic models, have been developed to automatically discover patterns of meaning in large collections of text. While these methods are useful, they can break down in the presence of OCR errors. We show the extent to which this performance breakdown occurs. The specific types of analyses covered in this dissertation are document clustering, feature selection, unsupervised and supervised topic modeling for documents with and without OCR errors, and a new supervised topic model that uses Bayesian nonparametrics to improve the modeling of document metadata. We present results in each of these areas, with an emphasis on studying the effects of noise on the performance of the algorithms and on modeling the metadata associated with the documents. In this research we effectively: improve the state of the art in both document clustering and topic modeling; introduce a useful synthetic dataset for historical document researchers; and present analyses that empirically show how existing algorithms break down in the presence of OCR errors.
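As a point of reference for the kind of analysis the abstract describes, the sketch below fits a standard topic model to a tiny corpus with simulated OCR noise using scikit-learn. The data and the min_df heuristic for dropping rare OCR errors are invented for illustration; they are not the dissertation's models.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny corpus with simulated OCR noise ("tbe" for "the").
docs = [
    "tbe court heard the case on monday",
    "the court ruled on tbe appeal",
    "wheat prices rose as tbe harvest failed",
    "farmers reported a poor wheat harvest",
]

# min_df > 1 drops hapax OCR errors; a crude but common mitigation.
vec = CountVectorizer(min_df=2)
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:3]]
    print(f"topic {k}: {top}")
```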
APA, Harvard, Vancouver, ISO, and other styles
13

Zhou, Mei. "The suffix-signature method for searching for phrases in text." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp05/nq22257.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Ball, Liezl Hilde. "Enhancing digital text collections with detailed metadata to improve retrieval." Thesis, University of Pretoria, 2020. http://hdl.handle.net/2263/79015.

Full text
Abstract:
Digital text collections are increasingly important, as they enable researchers to explore new ways of interacting with texts through the use of technology. Various tools have been developed to facilitate exploring and searching in text collections at a fairly low level of granularity. Ideally, it should be possible to filter the results at a greater level of granularity to retrieve only specific instances in which the researcher is interested. The aim of this study was to investigate to what extent detailed metadata could be used to enhance texts in order to improve retrieval. To do this, the researcher had to identify metadata that would be useful as filters and find ways in which these metadata can be applied to or encoded in texts. The researcher also had to evaluate existing tools to determine to what extent current tools support retrieval on a fine-grained level. After identifying useful metadata and reviewing existing tools, the researcher could suggest a metadata framework that could be used to encode texts on a detailed level. Metadata in five different categories were used, namely morphological, syntactic, semantic, functional and bibliographic. A further contribution of this metadata framework was the addition of in-text bibliographic metadata, for use where sections in a text have different properties from those of the main text. The suggested framework had to be tested to determine if retrieval was indeed improved. In order to do so, a selection of texts was encoded with the suggested framework and a prototype was developed to test the retrieval. The prototype receives the encoded texts and stores the information in a database. A graphical user interface was developed to enable searching the database in an easy and intuitive manner. The prototype demonstrates that it is possible to search for words or phrases with specific properties when detailed metadata are applied to texts. The fine-grained metadata from five different categories enable retrieval at a greater level of granularity and specificity. It is therefore recommended that detailed metadata be used to encode texts in order to improve retrieval in digital text collections. Keywords: metadata, digital humanities, digital text collections, retrieval, encoding
Thesis (DPhil (Information Science))--University of Pretoria, 2020.
Information Science
DPhil (Information Science)
Unrestricted
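A minimal sketch of the kind of fine-grained filtering this thesis argues for follows. The metadata tags and category names are hypothetical stand-ins, not the framework's actual scheme.

```python
# Each token carries metadata drawn from several categories, in the spirit
# of the five categories the thesis proposes. Tags here are invented.
tokens = [
    {"word": "bank", "pos": "noun", "sense": "river_edge",  "function": "heading"},
    {"word": "bank", "pos": "noun", "sense": "institution", "function": "body"},
    {"word": "banks", "pos": "verb", "sense": "institution", "function": "body"},
]

def search(word=None, **filters):
    """Return tokens matching a word plus any metadata filters."""
    hits = [t for t in tokens if word is None or t["word"] == word]
    return [t for t in hits if all(t.get(k) == v for k, v in filters.items())]

# Retrieve only 'bank' used as a noun meaning a financial institution,
# and only in body text: retrieval at a finer granularity than plain
# keyword search allows.
print(search("bank", pos="noun", sense="institution", function="body"))
```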
APA, Harvard, Vancouver, ISO, and other styles
15

Li, Liuqing. "Event-related Collections Understanding and Services." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/97365.

Full text
Abstract:
Event-related collections, including both tweets and webpages, have valuable information, and are worth exploring in interdisciplinary research and education. Unfortunately, such data is noisy, so this variety of information has not been adequately exploited. Further, for better understanding, more knowledge hidden behind events needs to be unearthed. Regarding these collections, different societies may have different requirements in particular scenarios. Some may need relatively clean datasets for data exploration and data mining. Social researchers require preprocessing of information, so they can conduct analyses. General societies are interested in the overall descriptions of events. However, few systems, tools, or methods exist to support the flexible use of event-related collections. In this research, we propose a new, integrated system to process and analyze event-related collections at different levels (i.e., data, information, and knowledge). It also provides various services and covers the most important stages in a system pipeline, including collection development, curation, analysis, integration, and visualization. Firstly, we propose a query likelihood model with pre-query design and post-query expansion to rank a webpage corpus by query generation probability, and retrieve relevant webpages from event-related tweet collections. We further preserve webpage data into WARC files and enrich original tweets with webpages in JSON format. As an application of data management, we conduct an empirical study of the embedded URLs in tweets based on collection development and data curation techniques. Secondly, we develop TwiRole, an integrated model for 3-way user classification on Twitter, which detects brand-related, female-related, and male-related tweeters through multiple features with both machine learning (i.e., random forest classifier) and deep learning (i.e., an 18-layer ResNet) techniques. As guidance to user-centered social research at the information level, we combine TwiRole with a pre-trained recurrent neural network-based emotion detection model, and carry out tweeting pattern analyses on disaster-related collections. Finally, we propose a tweet-guided multi-document summarization (TMDS) model, which generates summaries of the event-related collections by using tweets associated with those events. The TMDS model also considers three aspects of named entities (i.e., importance, relatedness, and diversity) as well as topics, to score sentences in webpages, and then rank selected relevant sentences in proper order for summarization. The entire system is realized using many technologies, such as collection development, natural language processing, machine learning, and deep learning. For each part, comprehensive evaluations are carried out, that confirm the effectiveness and accuracy of our proposed approaches. Regarding broader impact, the outcomes proposed in our study can be easily adopted or extended for further event analyses and service development.
Doctor of Philosophy
Event-related collections, including both tweets and webpages, have valuable information. They are worth exploring in interdisciplinary research and education. Unfortunately, such data is noisy. Many tweets and webpages are not relevant to the events. This leads to difficulties during data analysis of the datasets, as well as explanation of the results. Further, for better understanding, more knowledge hidden behind events needs to be unearthed. Regarding these collections, different groups of people may have different requirements. Some may need relatively clean datasets for data exploration. Some require preprocessing of information, so they can conduct analyses, e.g., based on tweeter type or content topic. General societies are interested in the overall descriptions of events. However, few systems, tools, or methods exist to support the flexible use of event-related collections. Accordingly, we describe our new framework and integrated system to process and analyze event-related collections. It provides varied services and covers the most important stages in a system pipeline. It has sub-systems to clean, manage, analyze, integrate, and visualize event-related collections. It takes an event-related tweet collection as input and generates an event-related webpage corpus by leveraging Wikipedia and the URLs embedded in tweets. It also combines and enriches original tweets with webpages. As an application of data management, we conduct an empirical study of tweets and their embedded URLs. We developed TwiRole for 3-way user classification on Twitter. It detects brand-related, female-related, and male-related tweeters through their profiles, tweets, and images. To aid user-centered social research, we combine TwiRole with an existing emotion detection tool, and carry out tweeting pattern analyses on disaster-related collections. Finally, we propose a tweet-guided multi-document summarization (TMDS) model and service, which generates summaries of the event-related collections by using tweets associated with those events. It extracts important sentences across different topics from webpages, and organizes them in proper order. The entire system is realized using many technologies, such as collection development, natural language processing, machine learning, and deep learning. For each part, comprehensive evaluations help confirm the effectiveness and accuracy of our proposed approaches. Regarding broader impact, our methods and system can be easily adopted or extended for further event analyses and service development.
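The first contribution mentioned above is a query likelihood ranking model. The sketch below shows a standard Dirichlet-smoothed query likelihood scorer as a reference point; the thesis's pre-query design and post-query expansion steps are omitted, and all data here are invented.

```python
import math
from collections import Counter

def ql_score(query, doc_tokens, coll_counts, coll_len, mu=2000):
    """Dirichlet-smoothed query likelihood, log P(q | d).
    Standard formulation; assumes every query term occurs in the collection."""
    tf = Counter(doc_tokens)
    dl = len(doc_tokens)
    score = 0.0
    for term in query:
        p_coll = coll_counts[term] / coll_len   # collection language model
        score += math.log((tf[term] + mu * p_coll) / (dl + mu))
    return score

coll = Counter("storm damage relief flood storm rescue".split())
doc = "flood damage across the coast after the storm".split()
print(ql_score(["storm", "relief"], doc, coll, sum(coll.values())))
```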
APA, Harvard, Vancouver, ISO, and other styles
16

Leong, Elaine. "Medical recipe collections in seventeenth-century England : knowledge, text and gender." Thesis, University of Oxford, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.432177.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Chakravarty, Saurabh. "A Large Collection Learning Optimizer Framework." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/78302.

Full text
Abstract:
Content is generated on the web at an increasing rate. The type of content varies from text on a traditional webpage to text on social media portals (e.g., social network sites and microblogs). One such example of social media is the microblogging site Twitter. Twitter is known for its high level of activity during live events, natural disasters, and events of global importance. Challenges with the data in the Twitter universe include the limit of 140 characters on the text length. Because of this limitation, the vocabulary in the Twitter universe includes short abbreviations of sentences, emojis, hashtags, and other non-standard usage. Consequently, traditional text classification techniques are not very effective on tweets. Fortunately, sophisticated text processing techniques like cleaning, lemmatizing, and removal of stop words and special characters will give us clean text which can be further processed to derive richer word semantic and syntactic relationships using state-of-the-art feature selection techniques like Word2Vec. Machine learning techniques, using word features that capture semantic and context relationships, can be of benefit regarding classification accuracy. Improving text classification results on Twitter data would pave the way to categorize tweets relative to human-defined real-world events. This would allow diverse stakeholder communities to interactively collect, organize, browse, visualize, analyze, summarize, and explore content and sources related to crises, disasters, human rights, inequality, population growth, resiliency, shootings, sustainability, violence, etc. Having the events classified into different categories would help us study causality and correlations among real-world events. To check the efficacy of our classifier, we would compare our experimental results with an Association Rules (AR) classifier. This classifier composes its rules around the most discriminating words in the training data. The hierarchy of rules, along with an ability to tune to a support threshold, makes it an effective classifier for scenarios where short text is involved. Traditionally, developing classification systems for these purposes requires a great degree of human intervention. Constantly monitoring new events, and curating training and validation sets, is tedious and time intensive. Significant human capital is required for such annotation endeavors. Considerable effort is also required to tune the classifier for best performance. Developing and tuning classifiers manually using human intervention would not be a viable option if we are to monitor events and trends in real-time. We want to build a framework that would require very little human intervention to build and choose the best-performing among the available classification techniques in our system. Another challenge with classification systems is related to their performance with unseen data. For the classification of tweets, we are continually faced with a situation where a given event contains a certain keyword that is closely related to it. If a classifier built for a particular event overfits what is a biased sample with limited generality, its accuracy may be reduced when faced with new tweets containing different keywords. We propose building a system that will use very little training data in the initial iteration and will be augmented with automatically labelled training data from a collection that stores all the incoming tweets.
A system that is trained on incoming tweets that are labelled using sophisticated techniques based on rich word vector representation would perform better than a system that is trained on only the initial set of tweets. We also propose to use sophisticated deep learning techniques like Convolutional Neural Networks (CNN) that can capture the combination of the words using an n-gram feature representation. Such sophisticated feature representation could account for the instances when the words occur together. We divide our case studies into two phases: preliminary and final case studies. The preliminary case studies focus on selecting the best feature representation and classification methodology out of the AR and the Word2Vec based Logistic Regression classification techniques. The final case studies focus on developing the augmented semi-supervised training methodology and the framework to develop a large collection learning optimizer to generate a highly performant classifier. For our preliminary case studies, we are able to achieve an F1 score of 0.96 that is based on Word2Vec and Logistic Regression. The AR classifier achieved an F1 score of 0.90 on the same data. For our final case studies, we are able to show improvements of F1 score from 0.58 to 0.94 in certain cases based on our augmented training methodology. Overall, we see improvement in using the augmented training methodology on all datasets.
Master of Science
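As an illustration of the Word2Vec-plus-Logistic-Regression pipeline the case studies evaluate, here is a minimal sketch using gensim and scikit-learn. The toy tweets, labels, and hyperparameters are invented, and the thesis's augmented training loop is not reproduced.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Toy labelled tweets (1 = about shootings, 0 = not); real training data
# would come from the collection itself.
texts = [("police respond to shooting downtown", 1),
         ("two injured in mall shooting", 1),
         ("great concert downtown tonight", 0),
         ("enjoying the sunny weather", 0)]
sents = [t.split() for t, _ in texts]

w2v = Word2Vec(sents, vector_size=25, min_count=1, seed=0)

def embed(tokens):
    """Average the word vectors: the simple document representation
    used here to feed a linear classifier."""
    return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

X = np.stack([embed(s) for s in sents])
y = [label for _, label in texts]
clf = LogisticRegression().fit(X, y)
print(clf.predict([embed("shots fired near the mall".split())]))
```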
APA, Harvard, Vancouver, ISO, and other styles
18

Jordanov, Dimitar Dimitrov. "Similarity Search in Document Collections." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236746.

Full text
Abstract:
The main goal of this work is to assess the performance of the freely distributed Semantic Vectors package and the MoreLikeThis class from the Apache Lucene package. The thesis compares these two approaches and introduces methods that may lead to improved search quality.
APA, Harvard, Vancouver, ISO, and other styles
19

Makuta, Marzena H. "A computational model of lexical cohesion analysis and its application to the evaluation of text coherence." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp02/NQ30625.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Popovici, Eugen-Costin. "Information retrieval of text, structure and sequential data in heterogeneous XML document collections." Lorient, 2008. http://www.theses.fr/2008LORIS110.

Full text
Abstract:
Digital documents today are complex objects that combine text, structure, metadata and multimedia information in heterogeneous ways. The XML markup language has progressively become the standard used to represent such documents in digital libraries, product catalogues, scientific data repositories and across the Web. Managing documents stored in XML formats requires the development of appropriate methods and tools for indexing, searching, filtering and mining. In particular, the filtering and searching functions of retrieval systems should be able to answer queries based on incomplete, imprecise or even erroneous knowledge of both the structure and the content of the XML documents. Moreover, these functions must keep their algorithmic complexity compatible with the complexity and the ever-growing volume of the data, so that the system scales. In this thesis, we study methods and develop tools for indexing and searching heterogeneous multimedia information stored in collections of XML documents. More precisely, we address the question of similarity search over composite data described by structural, textual and sequential elements. Building on the structural part of the XML documents, we define a flexible model for the representation, indexing and querying of heterogeneous types of sequential data. The matching mechanism simultaneously exploits the structural organization of the sequential/textual data and the relevance and characteristics of the unstructured content of the indexed documents. We also design and evaluate methods for the approximate matching of structural constraints in an XML Information Retrieval (IR) framework, as well as algorithms for detecting and suggesting 'best entry points' for direct access to the information sought in an XML document. Finally, we explore the use of a dedicated hardware architecture to accelerate the most computationally expensive processing steps of our structured information retrieval application.
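To give a concrete flavour of 'best entry point' detection in XML retrieval, the sketch below scores each element of a toy document by query-term overlap and reports element paths. This is an invented, simplified stand-in, not the thesis's matching mechanism.

```python
import xml.etree.ElementTree as ET

xml = """<article><sec><title>Methods</title>
<p>query relaxation over xml structure</p></sec>
<sec><title>Results</title><p>retrieval quality</p></sec></article>"""

root = ET.fromstring(xml)

def best_entry_points(elem, query, path=""):
    """Score each element by query-term overlap of its text content and
    yield (path, score) pairs; a toy stand-in for best-entry-point
    detection."""
    here = f"{path}/{elem.tag}"
    text = " ".join(elem.itertext()).lower()
    score = sum(term in text for term in query)
    if score:
        yield here, score
    for child in elem:
        yield from best_entry_points(child, query, here)

for path, score in sorted(best_entry_points(root, ["xml", "query"]),
                          key=lambda x: -x[1]):
    print(path, score)
```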
APA, Harvard, Vancouver, ISO, and other styles
21

Utz, Laura Lee. "Museum Educator as Advocate for the Visitor: Organizing the Texas Fashion Collection's 25th Anniversary Exhibition Suiting the Modern Woman." Thesis, University of North Texas, 1997. https://digital.library.unt.edu/ark:/67531/metadc277589/.

Full text
Abstract:
Suiting the Modern Woman documented the evolution of women's power dressing in the 20th century by featuring four major components: thirteen period suit silhouettes; the power suits of twenty-eight influential and successful high-profile Texas women; a look at the career and creations of Dallas designer Richard Brooks, who created the professional wardrobe for former Texas Governor Ann Richards; and a media room which showcased images of working women in television and movie clips, advertisements, cartoons, and fashion guidebooks. The exhibition served as an application of contemporary museum education theory. Acting as both the exhibition coordinator and educator provided an opportunity to develop interpretative strategies and create a meaningful visitor experience.
APA, Harvard, Vancouver, ISO, and other styles
22

Barackman, Martin Lee 1953, and Martin Lee 1953 Barackman. "Diverging flow tracer tests in fractured granite: equipment design and data collection." Thesis, The University of Arizona, 1986. http://hdl.handle.net/10150/191896.

Full text
Abstract:
Down-hole injection and sampling equipment was designed and constructed in order to perform diverging-flow tracer tests. The tests were conducted at a field site about 8 km southeast of Oracle, Arizona, as part of a project sponsored by the U.S. Nuclear Regulatory Commission to study mass transport of fluids in saturated, fractured granite. The tracer injection system was designed to provide a steady flow of water or tracer solution to a packed-off interval of the borehole and allow for monitoring of down-hole tracer concentration and pressure in the injection interval. The sampling system was designed to collect small-volume samples from multiple points in an adjacent borehole. Field operation of the equipment demonstrated the importance of prior knowledge of the location of interconnecting fractures before tracer testing and the need for down-hole mixing of the tracer solution in the injection interval. The field tests were designed to provide data that could be analyzed to provide estimates of dispersivity and porosity of the fractured rock. Although analysis of the data is beyond the scope of this thesis, the detailed data are presented in four appendices.
APA, Harvard, Vancouver, ISO, and other styles
23

Montreuil, Sophie. "Le livre en série : histoire et théorie de la collection littéraire." Thesis, McGill University, 2001. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=38243.

Full text
Abstract:
This doctoral thesis examines the literary series [collection littéraire], considered at one and the same time as a form of publication defined and redefined by the publisher since the invention of the printing press, and as a paratextual component that has the ability to act on the process of reading the text. An original aspect of this work is that it combines in the same analysis fields of knowledge that are rarely studied together: the history of the book and of publishing, the sociology of literature and in particular the theory of the literary institution, the theory of paratextuality and reader-response theory. This thesis examines separately the two dimensions of the topic but follows a logical progression that concludes with a third section. The first section explores the hypothesis that the literary series is the outcome of a long process of definition and specialization which has accompanied the evolution of French publishing and literature. It then goes on to examine cases illustrating the "convergence" of the two, such as the "Bibliothèque Bleue", the "Bibliothèque universelle des romans", the "Bibliothèque Charpentier", the collections of livraisons illustrées published in the 1850s, the "Collection Michel Lévy" and a few collections published by Flammarion and Fayard. Following a rereading of the Genettian paratexte (1987) that reviews and further refines the parameters of the concept (its boundaries, its components and their functions) in order to increase its scope of action, the second section explores in depth the essence of the encounter between the series and literature itself and proposes a theory of the series which positions it in relation to a community of readers and recognizes a different functioning, different risks and effects depending on whether it is destined for a specialized public or the general public. Finally, the third section picks up the historical thread that the first section suspended at the beginning of the 20th century.
APA, Harvard, Vancouver, ISO, and other styles
24

Rossi, Rafael Geraldeli. "Representação de coleções de documentos textuais por meio de regras de associação." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-31082011-125648/.

Full text
Abstract:
The number of textual documents available in digital format has grown incessantly, and Text Mining techniques are increasingly used to organize and extract knowledge from large collections of textual documents. To use these techniques, the documents must be represented in an appropriate format. Most Text Mining research uses the bag-of-words approach, which takes each word in the collection as a possible feature, ignoring word order and punctuation or structural information, and is characterized by high dimensionality and sparse data. On the other hand, most concepts are composed of more than one word, such as Artificial Intelligence, Neural Network, and Text Mining. Existing approaches that generate features composed of more than one word suffer from problems beyond those of the bag-of-words representation, such as the generation of features with little meaning and a much higher dimensionality. This master's thesis proposes an approach to represent textual documents named bag-of-related-words. The approach generates features composed of related words using association rules. With association rules, the aim is to identify relationships among the words of a document and to reduce dimensionality, since only words that occur or co-occur above a given frequency threshold are used to generate rules. Different ways of mapping a document into transactions to enable the extraction of association rules are analyzed, as are several objective interest measures applied to the rules to extract more meaningful features and to reduce the number of features. To evaluate how much the bag-of-related-words representation can aid the organization and knowledge extraction of textual document collections, and the interpretability of the results, three groups of experiments were carried out: 1) textual document classification, to assess how well the bag-of-related-words features distinguish document categories; 2) textual document clustering, to assess the quality of the clusters obtained with the bag-of-related-words representation and thereby support the construction of topic hierarchies; and 3) construction and evaluation of topic hierarchies by domain experts. All results and dimensionalities were compared with the bag-of-words representation. The experiments show that the features of the bag-of-related-words representation have predictive power as good as those of the bag-of-words representation, that clustering quality was equally good, and that in the evaluation of topic hierarchies by domain experts the bag-of-related-words representation gave better results on every criterion analyzed.
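The core of the bag-of-related-words idea, generating compound features from association rules over within-document word co-occurrence, can be sketched in a few lines. The transactions, thresholds, and confidence test below are illustrative simplifications of the thesis's procedure.

```python
from collections import Counter
from itertools import combinations

docs = [["neural", "network", "learning"],
        ["neural", "network", "training"],
        ["text", "mining", "association", "rules"],
        ["text", "mining", "neural", "network"]]

# Treat each document as a transaction; count words and word pairs.
word_n = Counter(w for d in docs for w in set(d))
pair_n = Counter(p for d in docs for p in combinations(sorted(set(d)), 2))

min_support, min_conf = 2, 0.6
features = []
for (a, b), n in pair_n.items():
    # Keep a pair as a compound feature if the rule a -> b (or b -> a)
    # is frequent and confident enough; thresholds are illustrative.
    if n >= min_support and max(n / word_n[a], n / word_n[b]) >= min_conf:
        features.append(f"{a}_{b}")

print(features)  # ['network_neural', 'mining_text']
```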
APA, Harvard, Vancouver, ISO, and other styles
25

McDaniel, Thomas Rudy. "A SOFTWARE-BASED KNOWLEDGE MANAGEMENT SYSTEM USING NARRATIVE TEXTS." Doctoral diss., University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4372.

Full text
Abstract:
Technical and professional communicators have in recent research been challenged to make significant contributions to the field of knowledge management, and to learn or create the new technologies allowing them to do so. The purpose of this dissertation is to make such a combined theoretical and applied contribution from the context of the emerging discipline of Texts and Technology. This dissertation explores the field of knowledge management (KM), particularly its relationship to the related study of artificial intelligence (AI), and then recommends a KM software application based on the principles of narratology and narrative information exchange. The focus of knowledge is shifted from the reductive approach of data and information to a holistic approach of meaning and the way people make sense of complex events as experiences expressed in stories. Such an analysis requires a discussion of the evolution of intelligent systems and narrative theory as well as an examination of existing computerized and non-computerized storytelling systems. After a thorough discussion of these issues, an original software program that is used to collect, analyze, and distribute thematic stories within any hierarchical organization is modeled, exemplified, and explained in detail.
Ph.D.
Department of English
Arts and Sciences
Texts and Technology
APA, Harvard, Vancouver, ISO, and other styles
26

Desjardins, Michael. "The origins and development of the notion of isostheneia in Greek scepticism: A collection of texts." Thesis, University of Ottawa (Canada), 1996. http://hdl.handle.net/10393/10379.

Full text
Abstract:
The research collected texts in which ancient authors wrote of issues having to do with the sceptical notion of isostheneia. The collection finds its beginnings under the auspices of Calliope, the muse of fine speaking, and in the tendency to produce accounts in antithetical terms or in opposition to one another. In the Classical period anticipations of the development of the notion are found in the thinking of the physicists and the speculation of the physicians. Most significant for the development of the notion seems to have been the emergence of some of the differences between rhetoric and dialectic, the one elaborated under the pressure of the practice of the Sophists and Isocrates, the other isolated by Socrates and detailed by Plato as a philosophical method. The medical communities seem to have produced a paradigm of balance between opposed elements as the foundation of vitality and health. At the end of the Classical period, Aristotle appears to have provided some model for the sceptical notion in his practice of arguing in utramque partem, and to have anticipated it in his description of perplexity. Both Plato and Aristotle were familiar with some of the modes which were later collected by Aenesidemus. In the Hellenistic period it appears to have been in the ad hominen argument of Arcesilaus that the sceptical notion first became articulate as the basis of suspension of judgment. With Carneades the practice of arguing both for and against any proposition and relying heavily on rhetoric appears to have been the model on which the sceptical way was being fashioned. A controversy between Epicureans and Stoics over how to decide between acts of assent founded on equally reliable sense perceptions is suspected to be at the basis of the articulation of the notion of isostheneia. In the Hellenistic period the development of the notion seems also to have been assisted by the requirement for some therapeutical intervention by means of which health might be restored. With the Imperial period the sceptical notion first became apparent in the literature: Greek words from the root $\iota\sigma\sigma\sigma\theta\epsilon\nu$--which would be later used to name it seem to have begun to find their way into texts from its beginning. Some evidence is introduced to indicate that Philo Judaeus had knowledge of the subject of this study. By the time that Plutarch wrote Adversus Colotem the notion had become fully articulate. Later in the second century authors of the second sophistic also appear to have been comfortable with the notion at the basis of the sceptical way. Galen used the word on many occasions to describe anatomical and physiological details and a passage is included to indicate that he had knowledge of the notion. Sextus Empiricus compiled the arguments of the Pyrrhonians sometime around the end of the second century, and used words from the root $\iota\sigma\sigma\sigma\theta\epsilon\nu$--to identify the notion. Late in the Imperial period and reflecting what was to occur in the Medieval Latin west, Augustine seems to have been unaware that the equal persuasiveness of incompatible accounts was the basis for withholding assent. In the Greek east the notion continued to appear in some literature produced after the end of the texts known as ancient philosophy.
APA, Harvard, Vancouver, ISO, and other styles
27

Ward, Erik. "Tweet Collect: short text message collection using automatic query expansion and classification." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-194961.

Full text
Abstract:
The growing number of Twitter users creates large amounts of messages that contain valuable information for market research. These messages, called tweets, are short, contain Twitter-specific writing styles, and are often idiosyncratic, giving rise to a vocabulary mismatch between the keywords typically chosen for tweet collection and the words used to describe television shows. A method is presented that uses a new form of query expansion that generates pairs of search terms and takes into consideration the language usage of Twitter to access user data that would otherwise be missed. Supervised classification, without manually annotated data, is used to maintain precision by comparing collected tweets with external sources. The method is implemented, as the Tweet Collect system, in Java, utilizing many processing steps to improve performance. The evaluation was carried out by collecting tweets about five different television shows during their time of airing, indicating, on average, a 66.5% increase in the number of relevant tweets compared with using the title of the show as the search terms, at 68.0% total precision. Classification gives a slightly lower average increase of 55.2% in the number of tweets and a greatly increased 82.0% total precision. The utility of an automatic system for tracking topics that can find additional keywords is demonstrated. Implementation considerations and possible improvements are discussed that can lead to improved performance.
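As a rough illustration of query expansion that generates pairs of search terms from already-collected tweets, consider the sketch below. The seed query, stop list, and scoring are invented and far simpler than the Tweet Collect system's method.

```python
import re
from collections import Counter

# Tweets matched by the seed query (the show title); in practice these
# come from the streaming API. Names and data are invented.
seed_tweets = [
    "watching #breakingbad tonight, walter is unreal",
    "that walter scene in breaking bad... wow",
    "breaking bad finale tonight, jesse better survive",
]

STOP = {"the", "in", "is", "that", "a", "tonight"}

def expansion_pairs(tweets, seed="breaking bad", k=2):
    """Pick the k terms that co-occur most with the seed and pair each
    with it, yielding new search queries; a toy version of pairwise
    query expansion."""
    words = Counter(w for t in tweets
                    for w in re.findall(r"[a-z#]+", t.lower())
                    if w not in STOP and w not in seed)
    return [(seed, w) for w, _ in words.most_common(k)]

print(expansion_pairs(seed_tweets))  # e.g. [('breaking bad', 'walter'), ...]
```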
APA, Harvard, Vancouver, ISO, and other styles
28

Celikik, Marjan [Verfasser], and Hannah [Akademischer Betreuer] Bast. "Efficient error-tolerant search on large text collections = Effiziente fehlertolerante Suche auf große Datenmengen." Freiburg : Universität, 2013. http://d-nb.info/1114829188/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Moreau, Nicolas. "Modèles collectifs d'offre de travail : analyses et tests sur données françaises." Aix-Marseille 2, 2002. http://www.theses.fr/2002AIX24016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Crisafi, Anthony. "OUTSIDE THE FRAME: TOWARDS A PHENOMENOLOGY OF TEXTS AND TECHNOLOGY." Doctoral diss., University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4145.

Full text
Abstract:
The subject of my dissertation is how phenomenology can be used as a tool for understanding the intersection between texts and technology. What I am suggesting here is that, specifically in connection with the focus of our program in Texts and Technology, there are very significant questions concerning how digital communications technology extends our humanity, and more importantly what kind of epistemological and ontological questions are raised because of this. There needs to be a coherent theory for Texts and Technology that will help us to understand this shift, and I feel that this should be the main focus for the program itself. In this dissertation I present an analysis of the different phenomenological aspects of the study of Texts and Technology. For phenomenologists such as Husserl, Heidegger, and Merleau-Ponty, technology, in all of its forms, is the way in which human consciousness is embodied. Through the creation and manipulation of technology, humanity extends itself into the physical world. Therefore, I feel we must try to understand this extension as more than merely a reflection of materialist practices, because first and foremost we are discussing how the human mind uses technology to further its advancement. I will detail some of the theoretical arguments both for and against the study of technology as a function of human consciousness. I will focus on certain issues, such as problems of archiving and copyright, as central to the field. I will further argue how from a phenomenological standpoint we are in the presence of a phenomenological shift from the primacy of print towards a more hybrid system of representing human communications.
Ph.D.
Department of English
Arts and Humanities
Texts and Technology PhD
APA, Harvard, Vancouver, ISO, and other styles
31

Taylor, Jennifer Renee. "Ocular demonstrations: cross-dressing and the body in early american texts." Honors in the Major Thesis, University of Central Florida, 2001. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/250.

Full text
Abstract:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Arts and Sciences
English
APA, Harvard, Vancouver, ISO, and other styles
32

Kearley, Miranda S. "Traumatic desire in three gothic texts : The Monk, Dracula, and Lost." Honors in the Major Thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1096.

Full text
Abstract:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Arts and Humanities
English Literature
APA, Harvard, Vancouver, ISO, and other styles
33

Tran, Anh Xuan. "Identifying latent attributes from video scenes using knowledge acquired from large collections of text documents." Thesis, The University of Arizona, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3634275.

Full text
Abstract:

Peter Drucker, a well-known influential writer and philosopher in the field of management theory and practice, once claimed that “the most important thing in communication is hearing what isn't said.” It is not difficult to see that a similar concept also holds in the context of video scene understanding. In almost every non-trivial video scene, most important elements, such as the motives and intentions of the actors, can never be seen or directly observed, yet the identification of these latent attributes is crucial to our full understanding of the scene. That is to say, latent attributes matter.

In this work, we explore the task of identifying latent attributes in video scenes, focusing on the mental states of participant actors. We propose a novel approach to the problem based on the use of large text collections as background knowledge and minimal information about the videos, such as activity and actor types, as query context. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms, as well as their distribution weights. We develop and test several largely unsupervised information extraction models that identify the mental state labels of human participants in video scenes given some contextual information about the scenes. We show that these models produce complementary information and their combination significantly outperforms the individual models, and improves performance over several baseline methods on two different datasets. We present an extensive analysis of our models and close with a discussion of our findings, along with a roadmap for future research.
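As a rough illustration of how several unsupervised extractors might be combined so that semantically related mental-state labels reinforce one another, consider the sketch below; the scoring function, the per-model weights, and the relatedness measure are placeholders invented for illustration, not the thesis's actual models.

    def combine_models(model_outputs, relatedness, weights):
        """Combine label scores from several unsupervised extractors.

        model_outputs: list of {label: score} dicts, one per model.
        relatedness: function giving similarity in [0, 1] between two
            labels, so near-synonyms ("scared", "afraid") reinforce
            each other.
        weights: per-model weights (assumed fixed here).
        """
        labels = {l for out in model_outputs for l in out}
        combined = {}
        for label in labels:
            score = 0.0
            for w, out in zip(weights, model_outputs):
                for other, s in out.items():
                    score += w * s * relatedness(label, other)
            combined[label] = score
        return sorted(combined.items(), key=lambda kv: -kv[1])

    outputs = [{"scared": 0.8, "angry": 0.2}, {"afraid": 0.7}]
    rel = lambda a, b: 1.0 if a == b else (
        0.8 if {a, b} == {"scared", "afraid"} else 0.0)
    print(combine_models(outputs, rel, weights=[0.5, 0.5]))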

APA, Harvard, Vancouver, ISO, and other styles
34

Popovici, Eugen. "Recherche et filtrage d'information multimédia (texte, structure et séquence) dans des collections de documents XML hétérogènes." Phd thesis, Université de Bretagne Sud, 2008. http://tel.archives-ouvertes.fr/tel-00511981.

Full text
Abstract:
Digital documents today are complex data objects that combine, in a heterogeneous way, textual, structural and multimedia information as well as metadata. The generic markup language XML has gradually established itself as the preferred medium not only for exchanging data but also for storing it. Managing documents stored in XML formats requires the development of specific methods and tools for indexing, searching, filtering and mining the data. In particular, search and filtering functions must handle queries based on incomplete, imprecise, or even erroneous knowledge of the structure or content of the XML documents. These functions must also keep their algorithmic complexity compatible with the complexity of the data and, above all, with its ever-growing volume, so as to ensure the scalability of the software solutions. In this thesis, we study methods and develop tools for indexing and searching heterogeneous multimedia information stored in collections of XML documents. More precisely, we address the question of similarity search on composite data described by structural, textual and sequential elements. Building on the structural part of XML documents, we define a flexible representation, indexing and querying model for heterogeneous types of sequential data. The principles we develop implement search mechanisms that simultaneously exploit the elements of the indexed document structures and the unstructured document contents. We also evaluate the impact on the relevance of the returned results of introducing approximate alignment mechanisms for structural elements. We propose algorithms able to detect and suggest the "best entry points" for directly accessing the information sought within an XML document. Finally, we study the use of a dedicated hardware architecture to accelerate the most computationally expensive processing steps of our structured information retrieval application.
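A minimal sketch of the "best entry point" idea, assuming a toy density score (query-term matches normalized by subtree length); the thesis's algorithms also exploit structural and sequential similarity, which this illustration omits, and all names here are invented.

    import xml.etree.ElementTree as ET

    def best_entry_points(xml_text, query_terms, top_n=3):
        """Score every element by query-term matches in its subtree and
        return the most promising 'entry points' into the document.

        An element that concentrates many matches in a small subtree
        scores higher than the root, which trivially contains everything.
        """
        root = ET.fromstring(xml_text)
        scored = []
        for elem in root.iter():
            text = " ".join(elem.itertext()).lower()
            matches = sum(text.count(t.lower()) for t in query_terms)
            size = max(len(text.split()), 1)
            if matches:
                scored.append((matches / size, elem.tag))
        return sorted(scored, reverse=True)[:top_n]

    doc = ("<article><sec><title>XML search</title>"
           "<p>similarity search on XML data</p></sec>"
           "<sec><p>unrelated text</p></sec></article>")
    print(best_entry_points(doc, ["XML", "search"]))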
APA, Harvard, Vancouver, ISO, and other styles
35

Daniel, Christophe. "Conditions de travail et salaires compensations, négociations et incitations : théories et tests." Orléans, 1995. http://www.theses.fr/1995ORLE0503.

Full text
Abstract:
Following the debate between Smith and Stuart Mill on the relationship between wages and working conditions, the literature contains two competing theories of matching between workers and jobs: the theory of compensating differentials and the theory of segmentation (or cumulation), which predicts a positive relationship between wages and good working conditions. The main factor explaining the differences between these two theories is the existence of an income effect (a wealth effect or an effect of heterogeneous individual productivities). By combining the hedonic wage model (according to which the implicit price of working conditions is a wage differential) with the wages-employment collective bargaining model, we bring to light another factor: a union power effect. We thus move from a competitive situation to a bilateral monopoly situation in which the Pareto optima are located on an upward-sloping contract curve in the (wages, good working conditions) space. We then test the validity of this union effect on cross-sectional INSEE data from 1986-1987. We first explain why it is necessary to correct (simultaneously or not) for the negative biases arising from omitted variables capturing individual productivities or preferences, from the endogeneity of working conditions, and from sample selection when estimating an earnings function extended to include working conditions. We find, for the whole sample, a compensating relationship between wages and several general indexes of working conditions, and a cumulative relationship between pay and good working conditions in highly unionized sectors. These results are, moreover, more pronounced for women than for men. This research is thus one of the first in France to analyse the influence of working conditions on wage differentials, and also on the relationships between wages and their standard explanatory variables.
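As a toy illustration of the kind of extended wage equation being estimated, the sketch below regresses a synthetic log wage on schooling, experience and a working-conditions indicator; the corrections for omitted variables, endogeneity and sample selection that the thesis emphasizes are deliberately left out, and all variable names and coefficients are invented.

    import numpy as np
    import statsmodels.api as sm

    # Toy hedonic wage equation: log wage on schooling, experience and a
    # bad-working-conditions dummy. Synthetic data; a positive coefficient
    # on the dummy is read as a compensating differential.
    rng = np.random.default_rng(0)
    n = 500
    schooling = rng.normal(12, 2, n)
    experience = rng.normal(10, 5, n)
    bad_conditions = rng.binomial(1, 0.3, n)
    log_wage = (1.0 + 0.08 * schooling + 0.02 * experience
                + 0.05 * bad_conditions + rng.normal(0, 0.2, n))

    X = sm.add_constant(np.column_stack([schooling, experience,
                                         bad_conditions]))
    fit = sm.OLS(log_wage, X).fit()
    print(fit.params)  # last coefficient ~ the compensating differential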
APA, Harvard, Vancouver, ISO, and other styles
36

Zhang, Nan. "TRANSFORM BASED AND SEARCH AWARE TEXT COMPRESSION SCHEMES AND COMPRESSED DOMAIN TEXT RETRIEVAL." Doctoral diss., University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3938.

Full text
Abstract:
In recent times, we have witnessed an unprecedented growth of textual information via the Internet, digital libraries and archival text in many applications. While a good fraction of this information is of transient interest, useful information of archival value will continue to accumulate. We need ways to manage, organize and transport this data from one point to another on data communications links with limited bandwidth. We must also have means to speedily find the information we need from this huge mass of data. Sometimes, a single site may also contain large collections of data such as a library database, thereby requiring an efficient search mechanism even to search within the local data. To facilitate information retrieval, an emerging ad hoc standard for uncompressed text is XML, which preprocesses the text by adding user-defined metadata such as DTDs or hyperlinks to enable searching with better efficiency and effectiveness. This increases the file size considerably, underscoring the importance of applying text compression. On account of efficiency (in terms of both space and time), there is a need to keep the data in compressed form for as long as possible. Text compression is concerned with techniques for representing digital text data in alternate representations that take less space. Not only does it help conserve storage space for archival and online data, it also helps system performance by requiring fewer secondary storage (disk or CD-ROM) accesses, and it improves network transmission bandwidth utilization by reducing the transmission time. Unlike static images or video, there is no international standard for text compression, although compressed formats like .zip, .gz and .Z files are increasingly being used. In general, data compression methods are classified as lossless or lossy. Lossless compression allows the original data to be recovered exactly. Although used primarily for text data, lossless compression algorithms are useful in special classes of images such as medical imaging, fingerprint data and astronomical images, and in databases containing mostly vital numerical data, tables and text information. Many lossy algorithms use lossless methods at the final stage of encoding, underscoring the importance of lossless methods for both lossy and lossless compression applications. In order to effectively utilize the full potential of compression techniques for future retrieval systems, we need efficient information retrieval in the compressed domain. This means that techniques must be developed to search the compressed text without decompression, or with only partial decompression, independent of whether the search is done on the text or on some inversion table corresponding to a set of keywords for the text. In this dissertation, we make the following contributions. (1) Star family compression algorithms: we propose an approach to develop a reversible transformation that can be applied to a source text and that improves existing algorithms' ability to compress it. We use a static dictionary to convert English words into predefined symbol sequences. These transformed sequences create additional context information that is superior to the original text, so we achieve some compression already at the preprocessing stage, and we present a series of transforms that successively improve performance. The star transform requires a static dictionary of a certain size; to avoid the considerable complexity of conversion, we employ the ternary tree data structure, which efficiently converts the words in the text to the words in the star dictionary in linear time. (2) Exact and approximate pattern matching in Burrows-Wheeler transformed (BWT) files: we propose a method to extract useful context information in linear time from BWT-transformed text. The auxiliary arrays obtained from the BWT inverse transform bring logarithmic search time. Approximate pattern matching can then be performed on the results of exact pattern matching to extract possible candidates, and a fast verification algorithm can be applied to those candidates, which may be just small parts of the original text. We present algorithms for both k-mismatch and k-approximate pattern matching in BWT-compressed text. A typical compression system based on BWT has Move-to-Front and Huffman coding stages after the transformation; we propose a novel approach to replace the Move-to-Front stage in order to extend the compressed-domain search capability all the way to the entropy coding stage. A modification to the Move-to-Front stage makes it possible to randomly access any part of the compressed text without referring to the part before the access point. (3) A modified LZW algorithm that allows random access and partial decoding for compressed text retrieval: although many compression algorithms provide good compression ratios and/or time complexity, LZW was the first to be studied for compressed pattern matching because of its simplicity and efficiency. Our modifications to the LZW algorithm provide the extra advantages of fast random access and partial decoding, which are especially useful for text retrieval systems. Based on this algorithm, we can provide a dynamic hierarchical semantic structure for the text, so that text search can be performed at the expected level of granularity; for example, the user can choose to retrieve a single line, a paragraph, or a file that contains the keywords. More importantly, we show that parallel encoding and decoding algorithms are trivial with the modified LZW: both encoding and decoding can easily be performed with multiple processors, and the encoding and decoding processes are independent of the number of processors.
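A minimal sketch of the dictionary-based star preprocessing idea, with a three-word toy dictionary; the actual transform uses a large static English dictionary and a ternary search tree for linear-time lookup, and the codes chosen here are illustrative only.

    # Toy "star" preprocessing transform: replace known words with short,
    # highly repetitive symbol codes. The added redundancy helps a
    # back-end compressor such as bzip2 or gzip.
    STAR_DICT = {"the": "*", "compression": "**a", "text": "**b"}
    INVERSE = {v: k for k, v in STAR_DICT.items()}

    def star_encode(text):
        # Words absent from the dictionary pass through unchanged.
        return " ".join(STAR_DICT.get(w, w) for w in text.split())

    def star_decode(encoded):
        return " ".join(INVERSE.get(w, w) for w in encoded.split())

    s = star_encode("the text compression example")
    print(s, "->", star_decode(s))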
Ph.D.
School of Computer Science
Engineering and Computer Science
Computer Science
APA, Harvard, Vancouver, ISO, and other styles
37

Kozlovski, Nikolai. "TEXT-IMAGE RESTORATION AND TEXT ALIGNMENT FOR MULTI-ENGINE OPTICAL CHARACTER RECOGNITION SYSTEMS." Master's thesis, University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3607.

Full text
Abstract:
Previous research showed that combining the results of three different optical character recognition (OCR) engines (ExperVision® OCR, Scansoft OCR, and Abbyy® OCR) using voting algorithms yields a higher accuracy rate than each of the engines achieves individually. While a voting algorithm had been realized, several aspects of automating the process and improving the accuracy rate needed further research. This thesis focuses on morphological image preprocessing and morphological restoration of the text that goes to the OCR engines, a method similar to the one used in restoring partial fingerprints. Series of morphological dilating and eroding filters with various mask shapes and sizes were applied to text of different font sizes and types, with various kinds of noise added. These images were then processed by the OCR engines, and based on the results, successful combinations of text, noise, and filters were chosen. The thesis also deals with the problem of text alignment. Each OCR engine has its own way of dealing with noise and corrupted characters; as a result, the output texts of the OCR engines have different lengths and numbers of words. This, in turn, makes it impossible to use spaces as delimiters to separate the words for processing by the voting part of the system. Text alignment determines, using various techniques, what is an extra word, what is supposed to be two or more words instead of one, which words are missing in one document compared to the other, and so on. The alignment algorithm is made up of a series of shifts in the two texts to determine which parts are similar and which are not. Since errors made by OCR engines are due to visual misrecognition, in addition to simple character comparison (equal or not), a technique was developed that allows characters to be compared based on how they look.
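A minimal sketch of character-level majority voting over the outputs of three engines, assuming the texts have already been aligned to equal length; the tie-breaking rule and the toy readings are assumptions, not details of the actual system.

    from collections import Counter

    def vote(char_triples):
        """Majority vote over aligned characters from three OCR engines.

        Assumes the outputs have already been aligned to equal length;
        ties fall back to the first engine (an arbitrary choice here).
        """
        result = []
        for chars in char_triples:
            best, count = Counter(chars).most_common(1)[0]
            result.append(best if count > 1 else chars[0])
        return "".join(result)

    aligned = list(zip("he1lo", "hello", "hcllo"))  # per-engine readings
    print(vote(aligned))  # -> "hello"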
M.S.E.E.
Department of Electrical and Computer Engineering
Engineering and Computer Science
Electrical Engineering
APA, Harvard, Vancouver, ISO, and other styles
38

O'Brien, Erica F. ""FLIPPING THE SCRIPT": FEMININE CULPABILITY MODELS IN FIFTEENTH-CENTURY IBERIAN TEXTS." Diss., Temple University Libraries, 2019. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/577421.

Full text
Abstract:
Spanish
Ph.D.
This dissertation explores the ways in which feminine culpability is verbally articulated by the male courtly lover to his beloved lady within the amorous relationship in three fifteenth-century Spanish sentimental novels: Diego de San Pedro’s Cárcel de amor, published in 1492, and two of Juan de Flores' sentimental novels, Grimalte y Gradissa and Grisel y Mirabella, both published in approximately 1495, and how these motifs of feminine culpability are subverted in the anonymous fifteenth-century Catalan chivalric novel Curial e Güelfa. This subversion of culpability motifs is facilitated in Curial e Güelfa since there is also a subversion of gender roles within the amorous relationship of the novel's protagonists: a female lover, Güelfa, who courts her male beloved, Curial. To execute this study, I begin by discussing the origins of this rhetoric of feminine culpability in patristic, Biblical and philosophical texts, illustrating their sedimentation into the collective ideologies of medieval audiences. I also examine these feminine culpability models in Provençal lyric poetry written and recited by Occitan troubadours between the eleventh and thirteenth centuries, as one of its particular genres, the mala cansó, aims not only to blame the beloved lady, but also to publicly defame her, a threat that is also ever-present in the words of the male lover in the sentimental novel. After analyzing the tactics used by the male courtly lover to blame the beloved lady for his suffering and the demise of the relationship, I demonstrate how these same tactics are employed by the female characters of Curial e Güelfa toward the beloved man. However, feminine blame still occurs in Curial e Güelfa, manifested as feminine self-blame and blame between women, while the male characters engage in self-absolution, absolution of other men, and utter shirking of the blame. The theoretical framework employed is that of medieval canon law, and the way in which culpability was determined under this law from the twelfth century onward, which was by the intentions of the offender at the time of the crime or transgression rather than the consequences of the transgression. If we examine these fifteenth-century courtly love texts, it becomes clear that the beloved lady is innocent, while the male lover himself is the culpable party. Finally, following Rouben C. Cholakian's reading of the troubadour poetry through the work of twentieth-century psychoanalyst Jacques Lacan, I conclude that although the poet-lover verbally enunciates erotic metaphors and adulating language toward his beloved lady in the guise of courtly love, the true desire that he cannot articulate is to dominate, to overpower, and possibly to eradicate the feminine. Thus, in a Lacanian sense, the notion that courtly love literature praises the woman is a fallacy. Both the poet-lover of the Provençal lyric and the courtly lover of the sentimental novel subvert the concept of alleged feminine superiority and exaltation in these texts.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
39

Balikas, Georgios. "Explorer et apprendre à partir de collections de textes multilingues à l'aide des modèles probabilistes latents et des réseaux profonds." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM054/document.

Full text
Abstract:
Text is one of the most pervasive and persistent sources of information. Content analysis of text, in its broad sense, refers to methods for studying and retrieving information from documents. Nowadays, with ever-increasing amounts of text becoming available online in several languages and different styles, content analysis of text is of tremendous importance as it enables a variety of applications. To this end, unsupervised representation learning methods such as topic models and word embeddings constitute prominent tools. The goal of this dissertation is to study and address challenging problems in this area, focusing both on the design of novel text mining algorithms and tools and on studying how these tools can be applied to text collections written in one or several languages. In the first part of the thesis we focus on topic models, and more precisely on how to incorporate prior information about text structure into such models. Topic models are built on the bag-of-words premise, and therefore words are exchangeable. While this assumption benefits the calculation of conditional probabilities, it results in a loss of information. To overcome this limitation we propose two mechanisms that extend topic models by integrating knowledge of text structure into them. We assume that the documents are partitioned into thematically coherent text segments. The first mechanism assigns the same topic to the words of a segment. The second capitalizes on the properties of copulas, a tool mainly used in the fields of economics and risk management for modelling the joint probability density of random variables while having access only to their marginals. The second part of the thesis explores bilingual topic models for comparable corpora with explicit document alignments. Typically, a document collection for such models is in the form of comparable document pairs. The documents of a pair are written in different languages and are thematically similar. Unless they are translations, the documents of a pair are similar only to some extent. Meanwhile, representative topic models assume that the documents have identical topic distributions, which is a strong and limiting assumption. To overcome it, we propose novel bilingual topic models that incorporate the notion of cross-lingual similarity between the documents that constitute the pairs in their generative and inference processes. Calculating this cross-lingual document similarity is a task in itself, which we propose to address using cross-lingual word embeddings. The last part of the thesis concerns the use of word embeddings and neural networks for three text mining applications. First, we discuss polylingual document classification, where we argue that translations of a document can be used to enrich its representation. Using an auto-encoder to obtain these robust document representations, we demonstrate improvements in the task of multi-class document classification. Second, we explore multi-task sentiment classification of tweets, arguing that jointly training classification systems using correlated tasks can improve the obtained performance; to this end, we show how to achieve state-of-the-art performance on a sentiment classification task using recurrent neural networks. The third application we explore is cross-lingual information retrieval: given a document written in one language, the task consists in retrieving the most similar documents from a pool of documents written in another language. In this line of research, we show that by adapting the transportation problem to the task of estimating document distances, one can achieve important improvements.
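As a sketch of the transport-based document distance explored in the last part, the following computes an entropy-regularized optimal transport (Sinkhorn) distance between two toy word distributions; in practice the cost matrix would come from cross-lingual word embeddings, and the regularization value and toy numbers here are arbitrary assumptions.

    import numpy as np

    def sinkhorn_distance(p, q, C, reg=0.1, iters=200):
        """Entropy-regularized optimal transport between two word
        distributions p, q with ground-cost matrix C (e.g., distances
        between cross-lingual word embeddings). A rough stand-in for
        the transport-based document distance studied in the thesis.
        """
        K = np.exp(-C / reg)
        u = np.ones_like(p)
        for _ in range(iters):
            v = q / (K.T @ u)
            u = p / (K @ v)
        plan = np.diag(u) @ K @ np.diag(v)
        return float(np.sum(plan * C))

    # Toy example: two "documents" as distributions over 3 embedded words.
    C = np.array([[0.0, 0.9, 0.8], [0.9, 0.0, 0.7], [0.8, 0.7, 0.0]])
    print(sinkhorn_distance(np.array([0.5, 0.5, 0.0]),
                            np.array([0.0, 0.5, 0.5]), C))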
APA, Harvard, Vancouver, ISO, and other styles
40

Smallwood, Ashley Michelle. "Use-wear analysis of the Clovis biface collection from the Gault site in central Texas." [College Station, Tex. : Texas A&M University, 2006. http://hdl.handle.net/1969.1/ETD-TAMU-1038.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Arifin, Zamri. "The Islamic tendency in Al-Jahiz's prose works : A study of selected texts from the Rasa'il Al-Jahiz collection." Thesis, University of Wales Trinity Saint David, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.503603.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Doan, Malani Melissa. "Effects of a Reading Strategy with Digital Social Studies Texts for Eighth Grade Students." Doctoral diss., University of Central Florida, 2012. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5192.

Full text
Abstract:
Recent data indicate that only 34% of American eighth grade students are able to demonstrate grade-level proficiency with academic reading tasks (NCES, 2011). The staggering nature of statistics such as this is even more profound when considering that high-level literacy skills combined with mastery of digital texts have become practical requirements for success in secondary education, post-secondary education, and virtually all vocational contexts. Despite this incongruent scenario, little research has been conducted to evaluate instructional methods and reading comprehension strategies with digital texts. To address this critical issue, the present study examined the effects of a metacognitive reading comprehension instructional protocol (STRUCTURE Your Reading [SYR]; Ehren, 2008) with eighth grade students using digital texts in a standard social studies classroom in an urban American school setting. The focus of the protocol was on teaching strategies and self-questioning prompts before, during, and after reading. The study employed a randomized controlled design and consisted of three conditions with a total of 4 participating teachers and 124 participating students. The study was conducted over 25 instructional days and two instructional units, with 13.83 treatment hours within the standard social studies classes. Hierarchical ANCOVA analyses revealed that when controlling for pre-test measurements, the comparison and experimental groups performed significantly better than the control group on instructional unit test scores (Unit 2), reading strategy use in all stages of reading (before, during, and after), and self-questioning prompts during reading. The comparison and experimental groups did not significantly differ in these gains, indicating that this instructional protocol is effective with both paper and digital text. These findings suggest that the SYR instructional protocol is effective with secondary students in content area classrooms when using digital text. Furthermore, they suggest that metacognition and reading comprehension strategy instruction can be successfully embedded within a content area class and result in academic and metacognitive gains. Clinical implications and future research directions are discussed.
Ph.D.
Doctorate
Education and Human Performance
Education; Communication Sciences and Disorders
APA, Harvard, Vancouver, ISO, and other styles
43

Malani, Melissa Doan. "Effects of a reading strategy with digital social studies texts for eighth grade students." Doctoral diss., University of Central Florida, 2012. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5414.

Full text
Abstract:
Recent data indicate that only 34% of American eighth grade students are able to demonstrate grade-level proficiency with academic reading tasks (NCES, 2011). The staggering nature of statistics such as this is even more profound when considering that high-level literacy skills combined with mastery of digital texts have become practical requirements for success in secondary education, post-secondary education, and virtually all vocational contexts. Despite this incongruent scenario, little research has been conducted to evaluate instructional methods and reading comprehension strategies with digital texts. To address this critical issue, the present study examined the effects of a metacognitive reading comprehension instructional protocol (STRUCTURE Your Reading (SYR); Ehren, 2008) with eighth grade students using digital texts in a standard social studies classroom in an urban American school setting. The focus of the protocol was on teaching strategies and self-questioning prompts before, during, and after reading. The study employed a randomized controlled design and consisted of three conditions with a total of 4 participating teachers and 124 participating students. The study was conducted over 25 instructional days and two instructional units, with 13.83 treatment hours within the standard social studies classes. Hierarchical ANCOVA analyses revealed that when controlling for pre-test measurements, the comparison and experimental groups performed significantly better than the control group on instructional unit test scores (Unit 2), reading strategy use in all stages of reading (before, during, and after), and self-questioning prompts during reading. The comparison and experimental groups did not significantly differ in these gains, indicating that this instructional protocol is effective with both paper and digital text. These findings suggest that the SYR instructional protocol is effective with secondary students in content area classrooms when using digital text. Furthermore, they suggest that metacognition and reading comprehension strategy instruction can be successfully embedded within a content area class and result in academic and metacognitive gains. Clinical implications and future research directions are discussed.
ID: 031001563; System requirements: World Wide Web browser and PDF reader; Mode of access: World Wide Web; Adviser: Barbara J. Ehren; Title from PDF title page (viewed August 26, 2013); Thesis (Ph.D.)--University of Central Florida, 2012; Includes bibliographical references (p. 236-261).
Ph.D.
Doctorate
Education and Human Performance
Education; Communication Sciences and Disorders
APA, Harvard, Vancouver, ISO, and other styles
44

Kiyota, Yoji. "Dialog navigator : A navigation system from vague questions to specific answers based on real-world text collections." 京都大学 (Kyoto University), 2004. http://hdl.handle.net/2433/84999.

Full text
Abstract:
As computers and their networks continue to be developed, our day-to-day lives are being surrounded by increasingly complex instruments, and we often have to ask questions about using them. At the same time, large collections of texts to answer these questions are being gathered. Therefore, there are potential answers to many of our questions that exist as texts somewhere. However, there are various gaps between our various questions and the texts, and these prevent us from accessing appropriate texts to answer our questions. The gaps are mainly composed of both expression and vagueness gaps. When we seek texts for answers using conventional keyword-based text retrieval systems, we often have trouble locating them. In contrast, when we ask experts on instruments or operators of call centers, they can resolve the various gaps by interpreting our questions flexibly and by producing some ask-backs. The problem with experts and call centers is that they are not always available. Two approaches have been studied to resolve the various gaps: the extension of keyword-based text retrieval systems, and the application of artificial intelligence techniques. However, these approaches have their respective limitations. The former uses texts or keywords as methods for ask-back questions, but these methods are not always suitable. The latter requires a specialized knowledge base described in formal languages, so it cannot be applied to existing collections with large amounts of texts. This thesis targets the real-world, large text collections provided by Microsoft Corporation, and addresses a novel methodology to resolve the gaps between various user questions and the texts. The methodology consists of two key solutions: precise and flexible methods of matching user questions with texts based on NLP (natural language processing) techniques, and ask-back methods using the matching methods. First, the matching methods, including sentence structure analysis and expression gap resolution, are described. In addition, these methods are extended into matching through metonymy, which is frequently observed in natural languages. After that, a solution for making ask-backs based on these matching methods, using two kinds of ask-backs that complement each other, is proposed. Both ask-backs navigate users from vague questions to specific answers. Finally, our methodology is evaluated through the real-world operation of a dialog system, Dialog Navigator, in which all the proposed methods are implemented. Chapter 1 discusses issues in information retrieval and presents which issues are to be solved. That is, it examines the question logs from a real-world natural-language-based text retrieval system, and organizes the types and factors of the gaps. The examination indicates that some gaps between user questions and texts cannot be resolved well by methods used in previous studies, and suggests that both interactions with users and applicability to real-world text collections are needed. Based on the discussion, a solution to deal with these gaps is proposed, by advancing an approach employed in open-domain question-answering systems, i.e., the utilization of recent NLP techniques, into resolving the various gaps. Chapter 2 proposes several methods of matching user questions with texts, based on the NLP techniques.
Of these techniques, sentence structure analysis through full parsing is essential for two reasons: first, it enables expression gaps to be resolved beyond the keyword level; second, it is indispensable in resolving vagueness gaps by providing ask-backs. Our methods include sentence structure analysis using the Japanese parser KNP, expression-gap resolution based on two kinds of dictionaries, text-collection selection through question-type estimates, and score calculations based on sentence structures. An experimental evaluation on test sets shows significant improvements in performance by our methods. Chapter 3 proposes a novel method of processing metonymy, as an extension of the matching methods proposed in Chapter 2. Metonymy is a figure of speech in which the name of one thing is substituted for that of something else to which it is related, and this frequently occurs in both user questions and texts. Namely, this chapter addresses the automatic acquisition of pairs of metonymic expressions and their interpretative expressions from large corpora, and applies the acquired pairs to resolving structural gaps caused by metonymy. Unlike previous studies on metonymy, the method targets both the recognition and the interpretation processes of metonymy. The method acquired 1,126 pairs from corpora, and over 80% of the pairs were correct as interpretations of metonymy. Furthermore, an experimental evaluation on the test sets demonstrated that introducing the acquired pairs significantly improves matching. Chapter 4 presents a strategy of navigating users from vague questions to specific texts based on the previously discussed matching methods. Of course, it is necessary to make some use of ask-backs to achieve this, and this strategy involves two approaches: description extraction as a bottom-up approach, and dialog cards as a top-down approach. The former extracts the neighborhoods of the part that matches the user question in each text through the matching methods. Such neighborhoods are mostly suitable for ask-backs that clarify vague user questions. However, if a user's question is too vague, this approach often fails. The latter covers vague questions based on the know-how of the call center; dialog cards systematize procedures for ask-backs to clarify frequently asked questions that are vague. The matching methods are also applied to match user questions with the cards. Finally, a comparison of the approaches with those used in other related work demonstrates the novelty of the approaches. Chapter 5 describes the architecture of Dialog Navigator, a dialog system in which all the proposed methods are implemented. The system uses the real-world, large text collections provided by Microsoft Corporation, and it has been open to the public on a website since April 2002. The methods were evaluated based on the real-world operational results of the system, because the various gaps to be resolved should reflect those in the real world. The evaluation proved the effectiveness of the methods: more than 70% of all user questions were answered with relevant texts, the behaviors of both users and the system were reasonable in most dialogs, and most of the extracted descriptions for ask-backs were suitably matched. Chapter 6 concludes the thesis.
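A toy version of the description-extraction idea for ask-backs: pull out the neighborhood of the part of each text that matches the user question, and offer the snippets as clarification choices. The real system matches full sentence structures produced by the KNP parser; the bag-of-words matching, the window size, and the sample texts below are simplifications invented for illustration.

    def extract_descriptions(question_terms, texts, window=8):
        """For each candidate text, pull out the neighborhood of the part
        that matches the user question, to be shown as an ask-back choice.
        """
        snippets = []
        for text in texts:
            words = text.split()
            hits = [i for i, w in enumerate(words)
                    if w.lower().strip(".,") in question_terms]
            if hits:
                lo = max(hits[0] - window, 0)
                hi = min(hits[-1] + window + 1, len(words))
                snippets.append(" ".join(words[lo:hi]))
        return snippets

    texts = ["To change the font size, open the View menu and select Zoom.",
             "Fonts can be installed from the Control Panel."]
    print(extract_descriptions({"font", "size"}, texts))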
Kyoto University (京都大学)
0048
New-system doctorate by coursework (新制・課程博士)
Doctor of Informatics (博士(情報学))
Kō No. 11209 (甲第11209号)
Jōhaku No. 135 (情博第135号)
Shelf mark: 新制||情||31 (Main Library)
UT51-2004-T178
Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
Examiners: Professor Takashi Matsuyama (chief), Professor Tatsuya Kawahara, Associate Professor Satoshi Sato
Qualified under Article 4, Paragraph 1 of the Degree Regulations
APA, Harvard, Vancouver, ISO, and other styles
45

Rouquet, David. "Multilinguisation d'ontologies dans le cadre de la recherche d'information translingue dans des collections d'images accompagnées de textes spontanés." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00743652.

Full text
Abstract:
The Web is a proliferating source of multimedia objects described in different natural languages. In order to use Semantic Web techniques to search for such objects (images, videos, etc.), we propose a content extraction method for multilingual text collections, parameterized by one or more ontologies. The extraction process is used to index multimedia objects from their textual content, as well as to build formal queries from spontaneous utterances. It is based on an interlingual annotation of the texts that preserves segmentation ambiguities and polysemy in graphs. This first step allows disambiguation processes to be "factorized" at the level of a pivot lexicon (of interlingual lexemes). An ontology is passed as a parameter to the system by aligning it automatically with the interlingual lexicon. It is thus possible to use ontologies that were not designed for multilingual use, and also to add to or extend the set of languages and their lexical coverage without modifying the ontologies. A demonstrator for multilingual image retrieval, developed for the ANR OMNIA project, gave concrete form to the proposed approaches and made it possible to evaluate both scalability and the quality of the annotations produced.
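A naive sketch of automatically aligning ontology concept labels with a pivot lexicon of interlingual lexemes by normalized string matching; the lexeme identifiers and the toy lexicon are invented for illustration, and the project's actual alignment is considerably richer.

    import unicodedata

    def normalize(label):
        # Strip accents and case so labels from different languages can
        # be compared on a common footing.
        label = unicodedata.normalize("NFKD", label.lower())
        return "".join(c for c in label if not unicodedata.combining(c))

    def align_ontology(concept_labels, pivot_lexicon):
        """Align ontology concept labels with interlingual lexemes by
        normalized string match; unmatched concepts map to None.
        """
        index = {normalize(word): lexeme
                 for lexeme, words in pivot_lexicon.items()
                 for word in words}
        return {c: index.get(normalize(c)) for c in concept_labels}

    pivot = {"lex:dog": ["dog", "chien", "perro"],
             "lex:cat": ["cat", "chat"]}
    print(align_ontology(["Chien", "Cat", "Horse"], pivot))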
APA, Harvard, Vancouver, ISO, and other styles
46

Carney, Nathaniel. "Diagnosing L2 English Learners’ Listening comprehension abilities with Scripted and Unscripted Listening Texts." Diss., Temple University Libraries, 2018. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/529140.

Full text
Abstract:
Teaching & Learning
Ph.D.
L2 listening research has moved toward a focus on understanding the process of listening. However, there are still few detailed studies of L2 listening that reveal learners' comprehension processes when listening to scripted and unscripted listening texts. Studies in which such processing has been discussed have lacked detailed diagnoses of how bottom-up and top-down processing interactively affect listeners' comprehension. This study was designed to show how listeners process and comprehend texts, with a focus on how their bottom-up and top-down processing either assist or impede their comprehension. In this study, the listening abilities of a group of 30 L1 Japanese university English language learners were diagnosed. The 30 participants were at three listening proficiency levels (high, mid, and low) based on TOEIC listening proficiency scores. The diagnostic procedure involved participants listening to two scripted and two unscripted listening texts and then reporting what they comprehended through three tasks: L1 oral recalls, L2 repetitions, and verbal reports. Other data were also collected in the study to relate the comprehension of listening texts to other important listening-related variables, including listening proficiency, lexical knowledge, listening anxiety, study abroad experience, short-term phonological memory, and working memory. The main finding of the study was that miscomprehension of listening texts was invariably multi-causal, with a combination of both bottom-up and top-down factors leading to comprehension difficulty. Although not a new finding, the study offered more detail than current research about how bottom-up and top-down processing occur interactively. Regarding the overall difficulty of the listening texts, unscripted texts were more difficult to comprehend than scripted texts, and high-proficiency participants had fewer listening difficulties overall than mid- and low-proficiency participants. Quantitative and qualitative results revealed common processing difficulties among all participants due to L1-related phonological decoding issues (e.g., /l/ vs. /r/), connected speech, unknown lexis, and a lack of familiarity with unscripted speech hesitation phenomena (e.g., um, like). Qualitative transcript examples showed how top-down knowledge influenced misinterpretations of words and phrases interactively with bottom-up information, making inaccurate understandings difficult to overcome. In addition to revealing participants' difficulties and the severity of their comprehension difficulties, the diagnostic procedure showed common strengths: key words and phrases understood well by participants. High-frequency vocabulary and shorter utterances were both shown to be comprehended well. Finally, quantitative results in the study revealed relationships between participants' listening comprehension and other important listening-related variables. Listening proficiency and listening anxiety had strong relationships with listening comprehension of the listening texts. Working memory and short-term phonological memory had no relationship with listening text comprehension. Finally, study abroad experience showed a relationship with comprehension, but with many caveats, and listening vocabulary knowledge was not related to comprehension, but again with numerous caveats to consider. Based on the results, theoretical and pedagogical implications were posed. Theoretical implications from the study relate to the understanding of four concerns in L2 listening research.
Mainly, data in the study will aid researchers' understanding of how L2 English listeners process speech interactively (i.e., with bottom-up and top-down information) for comprehension, how L2 English listeners experience connected speech, how L2 listeners deal with unknown lexis, and how L2 listeners experience difficulties with features of unscripted speech. Pedagogical implications of the study include the need for increased teacher and learner awareness of the complexity of L2 listening, the need to have learners track their own listening development, and the need for teachers to expose learners to unscripted listening texts and make them familiar with features of unscripted speech. Finally, suggestions for further research are posed, including conducting diagnostic assessments of L2 listening with listeners of different L1s and with more varied proficiency levels, using different diagnostic procedures to examine L2 listening comprehension, and using more instruments to understand listening-related variables' relationships with L2 listening comprehension.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
47

Khalifa, Najib. "Les effets de débordement des biens publics locaux : modéles théoriques et tests empiriques." Montpellier 1, 1995. http://www.theses.fr/1995MON10019.

Full text
Abstract:
The supply of local public goods and services by the central city of an agglomeration is non-optimal if part of the benefits from these goods and services goes to the suburbs without any financial contribution in return. This phenomenon is known in the economic literature as the "querelle de la centralité" (the exploitation of central cities by their suburbs). This work has two objectives. The first is to explore, from a theoretical point of view, the origin of this dispute and the means of remedying it. The second is to test the validity of the hypothesis that central cities are "exploited" by their peripheral municipalities in the French case, and to point out the effects of this exploitation on the residents of central cities.
APA, Harvard, Vancouver, ISO, and other styles
48

Laruelle, Chloé. "Édition, traduction et commentaire des fables de Babrius." Thesis, Bordeaux 3, 2017. http://www.theses.fr/2017BOR30025.

Full text
Abstract:
This doctoral thesis proposes a critical edition of the 143 Greek fables composed by Babrius in choliambic verse (1st-2nd century AD), together with a French translation and a commentary on the fables. This was achieved by thoroughly establishing the text, through a fresh examination of the witnesses in the direct tradition (papyri, ancient wax tablets and medieval manuscripts) and through the analysis of the witnesses in the indirect tradition (in particular the Suda). The corpus of fables attributed to Babrius does not permit a traditional history of the text, based on a well-defined stemma. Indeed, there are few, heterogeneous witnesses, and their readings diverge so greatly that it is often difficult to choose only one; hence, rather than allowing us to retrieve with any degree of certitude the original material intended by Babrius himself, they in fact bear testimony to the numerous rewritings and reworkings of these fables throughout the centuries. This observation was instrumental in our decision to break with the editing tradition. In effect, previous editors, in their will to reconstruct a hypothetical autograph, have often been led to rewrite problematic passages, so that what they propose is a virtual, remodelled and fixed text that is in fact unable to testify to the fascinating history of this living, constantly evolving corpus. This is why this thesis aims to elaborate an alternative history of the text, that is, one that endeavours to reconstitute the complex fortune of Babrius's fables through the history of their transmission and rewritings, and, therefore, to propose a different critical edition, one that strives to make this evolutionary process of Babrius's text perceptible to the modern reader.
APA, Harvard, Vancouver, ISO, and other styles
49

Van, Oort Jessica. "Dancing in Body and Spirit: Dance and Sacred Performance in Thirteenth-Century Beguine Texts." Diss., Temple University Libraries, 2009. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/45623.

Full text
Abstract:
Dance
Ph.D.
This study examines dance and dance-like sacred performance in four texts by or about the thirteenth-century beguines Elisabeth of Spalbeek, Hadewijch, Mechthild of Magdeburg, and Agnes Blannbekin. These women wrote about dance as a visionary experience of the joys of heaven or the relationship between God and the soul, and they also created physical performances of faith that, while not called dance by medieval authors, seem remarkably dance-like to a modern eye. The existence of these dance-like sacred performances calls into question the commonly-held belief that most medieval Christians denied their bodies in favor of their souls and considered dancing sinful. In contrast to official church prohibitions of dance I present an alternative viewpoint, that of religious Christian women who physically performed their faith. The research questions this study addresses include the following: what meanings did the concept of dance have for medieval Christians; how did both actual physical dances and the concept of dance relate to sacred performance; and which aspects of certain medieval dances and performances made them sacred to those who performed and those who observed? In a historical interplay of text and context, I thematically analyze four beguine texts and situate them within the larger tapestry of medieval dance and sacred performance. This study suggests that medieval Christian concepts of dance, sacred performance, the soul, and the body were complex and fluid; that medieval sacred performance was as much a matter of a correct inner, emotional and spiritual state as it was of appropriate outward, physical actions; and that sacred performance was a powerful, important force in medieval Europe that various Christians used to support their own beliefs or to contest the beliefs and practices of others.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
50

Fitzhugh, Shannon Leigh. "The Coherence Formation Model of Illustrated Text Comprehension: A Path Model of Attention to Multimedia Text." Diss., Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/210583.

Full text
Abstract:
Psychology
Ph.D.
The study reported here tests a model that includes several factors thought to contribute to the comprehension of static multimedia learning materials (i.e. background knowledge, working memory, attention to components as measured with eye movement measures). The model examines the effects of working memory capacity, domain specific (biology) and related domain (geoscience) background knowledge on the visual attention to static multimedia text, and their collective influence on reading comprehension. A similar model has been tested with a previous cohort of students, and has been found to have a good fit to the data (Fitzhugh, Cromley, Newcombe, Perez and Wills, 2010). The present study tests the efficacy of visual cues (signaling) on the comprehension of multimedia texts and the effects of signaling on the relationships between cognitive factors and visual attention. Analysis of Covariance indicated that signaling interacts with background knowledge. Signaling also changes the distribution of attention to varying components of the multimedia display. The path model shows that signaling alters the relationship between domain specific background knowledge (biology) and comprehension as well as that of related background knowledge (geoscience) on comprehension. The nature of the relationships indicates that the characteristics of the reading material influence the type of background knowledge that contributes to comprehension. Results are discussed in terms of their application to a classroom setting.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles