Dissertations on the topic "Date extraction"

To view other types of publications on this topic, follow the link: Date extraction.

Format your source in APA, MLA, Chicago, Harvard, and other citation styles

Choose a source type:

Browse the top 50 dissertations for research on the topic "Date extraction".

Next to every work in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication as a .pdf file and read its abstract online, when these are available in the metadata.

Browse dissertations across a wide range of disciplines and compile your bibliography correctly.

1

Akasha, Ibrahim Abdurrhman Mohamed. "Extraction and characterisation of protein fraction from date palm (Phoenix dactylifera L.) seeds." Thesis, Heriot-Watt University, 2014. http://hdl.handle.net/10399/2771.

Abstract:
To meet the challenges of protein price increases from animal sources, the development of new, sustainable and inexpensive protein sources (non-animal sources) is of great importance. Date palm (Phoenix dactylifera L.) seeds could be one of these sources. These seeds are considered a waste and a major problem to the food industry. In this thesis we report a physicochemical characterisation of date palm seed protein. Date palm seed was found to be composed of a number of components including protein and amino acids, fat, ash and fibre. The first objective of the project was to extract protein from date palm seed to produce a powder of sufficient protein content to test functional properties. This was achieved using several laboratory scale methods. Protein powders of varying protein content were produced depending on the method used. Most methods were based on solubilisation of the proteins in 0.1 M NaOH. Using this method combined with enzymatic hydrolysis of seed polysaccharides (particularly mannans) it was possible to achieve a protein powder of about 40% protein (w/w) compared to a seed protein content of about 6% (w/w). Phenol/TCA extraction gave the protein powder with the highest protein percentage of 68.24% (w/w) and this powder was used for subsequent functional testing. Several factors were found to influence seed protein extraction, such as pH, temperature, extraction time, solvent-to-sample ratio and solvent concentration. Optimum conditions for extraction were found to be pH 10, 45 °C and an extraction time of 60 min. The results showed that use of enzymes to hydrolyse and remove seed polysaccharides improved the extraction of date seed protein. Optimal improvement was obtained using Mannaway, which hydrolyses mannans and galactomannans, and which gave a powder with 34.82% (w/w) protein compared to the control of 11.15% (w/w) protein. The proteins in the extracted date seed protein were profiled using LC/MSMS. Three hundred and seventeen proteins were identified. The proteins belonged to all major functional categories. The most abundant proteins were glycinin and β-conglycinin, the two major seed storage proteins of plants. The functional properties of extracted date seed protein were investigated using a range of tests. The thermal properties of date seed proteins were consistent with a powder containing high levels of conglycinin and β-glycinin. The solubility had a similar pH profile to soy protein, but differed in absolute solubility due to differences in non-protein composition. Similarly, the water holding and oil holding capacities of date seed protein were lower than for soy protein, probably because of compositional differences. Date seed proteins were able to emulsify oils and had an emulsifying ability and emulsion stability comparable to soy protein isolate. The date seed protein was not a good foaming agent compared to soy protein or whey protein concentrate.
2

Al-Jasser, Mohammed S. "The feasibility of date processing Phoenix dactylifera L. var Sufri components using physical and pectolytic enzyme treatments." Thesis, Loughborough University, 1990. https://dspace.lboro.ac.uk/2134/6928.

Abstract:
The Sufri variety of date is widely cultivated in Saudi Arabia, where it is produced in large quantities. The high quality dates are consumed fresh, dried or preserved; the surplus and second quality dates may be damaged by improper harvesting, handling, transporting and processing. The Sufri variety is of moderate quality and there is a surplus in local markets for processing into overflows to be used as "a base" for the food industry. The present work was conducted to increase soluble solids, including sugars, in the overflows and to maintain the quality of the underflows; chemical analysis of both the overflows and the underflows revealed that the Sufri date contains proteins/amino acids and pectin in small quantities, which can be utilised as by-products. Physical treatment involved maceration at different date/water ratios at mild temperatures (30-60°C) for different times (10-30 min). Over this range the increase in soluble solids in the overflow was minimal, but the underflow retained its quality and softening of date tissues was achieved. Different extraction ratios indicated that a lower ratio produced a small, turbid overflow, while higher ratios produced overflows which were dilute. Serial extraction with the same ratio as the initial extraction was not practical. In the enzymic treatment, pectolytic enzymes were incubated with date underflows at different concentrations and temperatures for various incubation times. Overflows increased significantly over a short time and at low temperature (30 min and 30°C), indicating the effectiveness of pectolytic enzymes in releasing more of the overflows, and sugars increased in the overflows as an indication of the effect of these enzymes on date cell walls. Pure pectolytic enzymes were investigated and it was found that specificity was very important for the selection of suitable pectolytic enzymes. It is concluded that the Sufri variety of date is a good source of reducing sugars, and its by-products have a promising future.
3

Al, Bulushi Karima. "Supercritical CO2 extraction of waxes from date palm (Phoenix dactylifera) leaves : optimisation, characterisation, and applications." Thesis, University of York, 2018. http://etheses.whiterose.ac.uk/21257/.

Abstract:
Date palm leaves (Phoenix dactylifera), a low-cost, abundant, underexploited and underutilised renewable agricultural waste residue, were extracted using supercritical carbon dioxide (scCO2) to obtain valuable waxes. The extraction process was optimised using a second-order factorial design to obtain a high yield of waxes. Date palm leaves exhibited a relatively high wax yield of 3.49% compared to other agricultural residues extracted with scCO2. A diverse range of lipophilic compounds was characterised and quantified, including n-alkanes, free fatty acids, free fatty alcohols, long chain aldehydes, sterols and wax esters. Waxes extracted at different extraction pressures and temperatures exhibited significant differences in melting profile (from 35 °C for extractions at 40 °C and 80 bar to 78 °C for extractions at 100 °C and 400 bar), suggesting the opportunity to tailor the extraction to a target application. ScCO2 extraction has several advantages over organic solvent extraction, which were demonstrated in this work. Date palm leaf wax was tested as a structuring agent for sunflower oil along with other commercial natural waxes. The date palm wax based oleogel exhibited low critical gelling concentrations compared to other waxes. Chemical composition and crystal morphology for the waxes and their gels were further explored to gain a better understanding of their gelling behaviour. Date palm wax exhibited good gelling ability and high thermal stability compared to other commercial waxes. The rheological profile of the date palm wax based oleogel was comparable with that of other natural waxes, making it a promising structuring agent in the food industry. The scale-up of the scCO2 extraction was studied at semi-pilot scale and gave wax yields, chemical composition and melting profile comparable to the lab scale. Fractional extraction was attempted to further reduce the complexity of the wax, yielding three wax fractions varying in texture, composition and physical properties. Economic aspects of the extraction process were explored to further assess the viability of the process. The cost of manufacture of date palm wax was initially €14.01 kg−1 wax, which could be reduced to €8.80 kg−1 wax by pelletising the biomass; if the extracted biomass were also used to generate electricity, the cost would fall further to €3.88 kg−1 wax.
4

Poulain, d'Andecy Vincent. "Système à connaissance incrémentale pour la compréhension de document et la détection de fraude." Thesis, La Rochelle, 2021. http://www.theses.fr/2021LAROS025.

Abstract:
Document Understanding is the Artificial Intelligence discipline that gives machines the ability to read documents. At a global level, it aims to understand the function and the class of a document; at a more local level, it aims to extract specific details such as entities. The scientific challenge is to recognise more than 90% of the data, while the industrial challenge is to reach this performance with the least possible human effort to train the machine. This thesis defends that Incremental Learning methods can cope with both challenges. The proposals enable an efficient iterative training with very few document samples. For the classification task, we demonstrate (1) the continual learning of textual descriptors, (2) the benefit of the discourse sequence, and (3) the benefit of integrating an episodic memory of a few samples into the knowledge model. For the data extraction task, we demonstrate an iterative structural model, based on a star-graph representation, which is enhanced by embedding a small amount of a priori knowledge. Aware of the economic and societal impact of document fraud, this thesis also addresses this issue; our modest contribution is a study of the different fraud categories to open further research. This research work has been carried out in an atypical framework, in conjunction with industrial activities at Yooz and collaborative research projects, in particular the FEDER SECURDOC project supported by the région Nouvelle Aquitaine and the Labcom IDEAS project supported by the ANR.
5

Sottovia, Paolo. "Information Extraction from data." Doctoral thesis, Università degli studi di Trento, 2019. http://hdl.handle.net/11572/242992.

Abstract:
Data analysis is the process of inspecting, cleaning, extracting, and modeling data with the intention of extracting useful information in order to support users in their decisions. With the advent of Big Data, data analysis has become more complicated due to the volume and variety of data. This process begins with the acquisition of the data and the selection of the data that is useful for the desired analysis. With such an amount of data, even expert users are not able to inspect the data and understand whether a dataset is suitable for their purposes. In this dissertation, we focus on five problems in the broad data analysis process to help users find insights from the data when they do not have enough knowledge about the data. First, we analyze the data description problem, where the user is looking for a description of the input dataset. We introduce data descriptions: compact, readable and insightful formulas of boolean predicates that represent a set of data records. Finding the best description for a dataset is computationally expensive and task-specific; we therefore introduce a set of metrics and heuristics for generating meaningful descriptions with interactive performance. Secondly, we look at the problem of order dependency discovery, which discovers another kind of metadata that may help the user understand the characteristics of a dataset. Our approach leverages the observation that discovering order dependencies can be guided by the discovery of a more specific form of dependencies called order compatibility dependencies. Thirdly, textual data encodes much hidden information. To allow this data to reach its full potential, there has been an increasing interest in extracting structural information from it. In this regard, we propose a novel approach for extracting events based on temporal co-reference among entities. We consider an event to be a set of entities that collectively experience relationships between them in a specific period of time. We developed a distributed strategy that is able to scale with the largest online encyclopedia available, Wikipedia. Then, we deal with the evolving nature of the data by focusing on the problem of finding synonymous attributes in evolving Wikipedia infoboxes. Over time, several attributes have been used to indicate the same characteristic of an entity. This raises several issues when trying to analyze the content of different time periods. To solve them, we propose a clustering strategy that combines two contrasting distance metrics. We developed an approximate solution that we assess over 13 years of Wikipedia history, demonstrating its flexibility and accuracy. Finally, we tackle the problem of identifying movements of attributes in evolving datasets. In an evolving environment, entities not only change their characteristics, but they sometimes exchange them over time. We propose a strategy that is able to discover those cases, and we test it on real datasets. We formally present the five problems, validate them both in terms of theoretical results and experimental evaluation, and demonstrate that the proposed approaches efficiently scale to large amounts of data.
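To make the idea of a "data description" concrete, the following is a minimal, hypothetical Python/pandas sketch: it collects (column == value) predicates that hold only on a target set of records. The toy table, the column names and the simple coverage score are illustrative assumptions, not the metrics or heuristics proposed in the thesis.

# Toy "data description": boolean predicates that compactly characterise a record set.
# The table, columns and scoring below are illustrative assumptions only.
import pandas as pd

df = pd.DataFrame({
    "country": ["IT", "IT", "DE", "DE", "FR"],
    "segment": ["retail", "corp", "retail", "corp", "retail"],
    "churned": [1, 1, 0, 1, 0],
})
target = df["churned"] == 1                      # records we want to describe

def candidate_predicates(df, target, feature_cols):
    """Keep (column == value) predicates that never select a non-target record,
    ranked by how many target records they cover."""
    preds = []
    for col in feature_cols:
        for val in df[col].unique():
            sel = df[col] == val
            if sel.any() and not (sel & ~target).any():
                preds.append((col, val, int((sel & target).sum())))
    return sorted(preds, key=lambda p: -p[2])

for col, val, coverage in candidate_predicates(df, target, ["country", "segment"]):
    print(f"{col} == {val!r}   covers {coverage} of {int(target.sum())} target records")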
6

Diedhiou, Djibril. "Fractionnement analytique de la graine de neem (Azadirachta indica A. Juss.) et de la graine de dattier du désert (Balanites aegyptiaca L.) - Valorisation des constituants de la graine de neem par bioraffinage." Thesis, Toulouse, INPT, 2017. http://www.theses.fr/2017INPT0135/document.

Abstract:
Neem and desert date seeds were characterized and their fractionation prospects outlined. A process for fractionating neem seeds in a twin-screw extruder was studied with a view to the production and integrated valorization of its fractions: oil, a co-extract of azadirachtin, proteins and lipids, and the extrusion raffinate. The use of water and water/ethanol mixtures (up to 75% ethanol) as extraction solvents, with a twin-screw extruder configuration defining four zones (a feed zone, a grinding zone, a solid-liquid extraction zone and a solid/liquid separation zone), allows 83 to 86% of the azadirachtin, 86 to 92% of the lipids and 44 to 74% of the proteins of the seed to be extracted into the filtrate, thereby producing an essentially fibrous raffinate containing at most 8% lipids, 12% proteins and 0.82 g/kg azadirachtin. One of the best ways of processing the suspension that constitutes the crude filtrate is solid-liquid separation by centrifugation. This separation process yields a diluted emulsion containing 42 to 64% of the lipids and up to 41% of the proteins of the seed. Centrifugal decantation achieves this effectively, but it can have disadvantages when treating large volumes. Considered as a by-product of the treatment of the crude filtrate, the insoluble phase can contain 24 to 48% of the lipids, 32.9 to 47% of the proteins and 10 to 13% of the azadirachtin of the seed. Water proved to be the best solvent for this fractionation process. Pressing the neem seeds followed by aqueous or hydroalcoholic extraction in the same twin-screw extruder makes it possible to express up to 32% of the seed oil and to recover 20% of the seed oil in clear form, with very little azadirachtin, while ensuring better yields of azadirachtin and proteins in the crude filtrate. Two treatment pathways for the filtrates were studied: one leading to an emulsion of azadirachtin and another leading to a freeze-dried azadirachtin powder. The valorization of the fibrous extrusion raffinate was oriented towards the production of agromaterials by thermopressing. A biorefinery scheme for the neem seed and the valorization of its constituents has thus been established.
7

Bigg, Daniel. "Unsupervised financial knowledge extraction." Available from the University of Aberdeen Library and Historic Collections Digital Resources. Online version available for University member only until Jan. 1, 2014, 2009. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?application=DIGITOOL-3&owner=resourcediscovery&custom_att_2=simple_viewer&pid=33589.

8

Wackersreuther, Bianca. "Efficient Knowledge Extraction from Structured Data." Diss., lmu, 2011. http://nbn-resolving.de/urn:nbn:de:bvb:19-138079.

9

Thelen, Andrea. "Optimized surface extraction from holographic data." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=980418798.

10

Zhou, Yuanqiu. "Generating Data-Extraction Ontologies By Example." Diss., CLICK HERE for online access, 2005. http://contentdm.lib.byu.edu/ETD/image/etd1115.pdf.

11

Williams, Dean Ashley. "Combining data integration and information extraction." Thesis, Birkbeck (University of London), 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.499152.

12

Heys, Richard. "Extraction of anthropological data with ultrasound." Thesis, Brunel University, 2007. http://bura.brunel.ac.uk/handle/2438/7896.

Abstract:
Human body scanners used to extract anthropological data have a significant drawback: the subject is required to undress or wear tight fitting clothing. This thesis demonstrates an ultrasonic-based alternative to the current optical systems that can potentially operate on a fully clothed subject. To validate the concept, several experiments were performed to determine the acoustic properties of multiple garments. The results indicated that such an approach was possible. Beamforming is introduced as a method by which the ultrasonic scanning area can be increased; the concept is thoroughly studied and a clear theoretical analysis is performed. Additionally, Matlab has been used to demonstrate graphically the results of such analysis, providing an invaluable tool during the simulation, experimental and results stages of the thesis. To evaluate beamforming as a component of ultrasonic body imaging, a hardware solution was necessary. During the concept phase, both FPGAs and digital signal processors were evaluated to determine their suitability for the role. An FPGA approach was finally chosen, as it allows highly parallel operation, essential to the high acquisition speeds required by some beamforming methodologies. In addition, analogue circuitry was designed to provide an interface with the ultrasonic transducers, which included variable gain amplifiers, charge amplifiers and signal conditioning. Finally, a digital acquisition card was used to transfer data between the FPGA and a desktop computer, on which the sampled data was processed and displayed in a coherent graphical manner. The beamforming results clearly demonstrate that imaging multiple layers in air with ultrasound is a viable technique for anthropological data collection. Furthermore, a wavelet-based method of improving the axial resolution is also proposed and demonstrated.
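For readers unfamiliar with the underlying principle, a delay-and-sum receive beamformer can be sketched in a few lines of NumPy; the array geometry, sampling rate and steering angle below are assumed example values, not parameters taken from the thesis hardware.

# Minimal delay-and-sum receive beamformer for a uniform linear array.
# Element count, spacing, sampling rate and steering angle are assumed values.
import numpy as np

c = 343.0                  # speed of sound in air [m/s]
fs = 200e3                 # sampling rate [Hz]
f0 = 40e3                  # ultrasonic tone frequency [Hz]
n_elem = 8
pitch = c / f0 / 2         # half-wavelength element spacing [m]
theta = np.deg2rad(20.0)   # steering angle towards the source

t = np.arange(0, 2e-3, 1 / fs)
# Simulate a plane wave arriving from `theta`: each element sees a delayed copy.
delays = np.arange(n_elem) * pitch * np.sin(theta) / c
signals = np.array([np.sin(2 * np.pi * f0 * (t - d)) for d in delays])

# Delay-and-sum: advance each channel by its geometric delay and average.
shift_samples = np.round(delays * fs).astype(int)
aligned = np.array([np.roll(sig, -s) for sig, s in zip(signals, shift_samples)])
beam = aligned.mean(axis=0)
print("output amplitude when steered at the source:", np.abs(beam).max().round(3))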
13

Shunmugam, Nagarajan. "Operational data extraction using visual perception." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292216.

Abstract:
The information era has led truck manufacturers and logistics solution providers to incline towards software as a service (SaaS) based solutions. With advancements in software technologies like artificial intelligence and deep learning, the domain of computer vision has achieved performance boosts significant enough to compete with hardware-based solutions. Firstly, data is collected from a large number of sensors, which can increase production costs and the carbon footprint in the environment. Secondly, certain useful physical quantities/variables are impossible to measure, or measuring them turns out to be a very expensive solution. So in this dissertation, we investigate the feasibility of providing a similar solution using a single sensor (dashboard camera) to measure multiple variables. This provides a sustainable solution even when scaled up to huge fleets. The video frames that can be collected from the visual perception of the truck (i.e. the on-board camera of the truck) are processed by deep learning techniques and operational data can be extracted. Techniques such as image classification and semantic segmentation were experimented with, and their outputs show potential to replace costly hardware counterparts like lidar- or radar-based solutions.
14

Raza, Ali. "Test Data Extraction and Comparison with Test Data Generation." DigitalCommons@USU, 2011. https://digitalcommons.usu.edu/etd/982.

Abstract:
Testing an integrated information system that relies on data from multiple sources can be a challenge, particularly when the data is confidential. This thesis describes a novel test data extraction approach, called semantic-based test data extraction for integrated systems (iSTDE) that solves many of the problems associated with creating realistic test data for integrated information systems containing confidential data. iSTDE reads a consistent cross-section of data from the production databases, manipulates that data to obscure individual identities while still preserving overall semantic data characteristics that are critical to thorough system testing, and then moves that test data to an external test environment. This thesis also presents a theoretical study that compares test-data extraction with a competing technique, named test-data generation. Specifically, this thesis a) describes a comparison method that includes a comprehensive list of characteristics essential for testing the database applications organized into seven different areas, b) presents an analysis of the relative strengths and weaknesses of the different test-data creation techniques, and c) reports a number of specific conclusions that will help testers make appropriate choices.
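As a toy illustration of the underlying idea, the sketch below obscures identities while leaving the distribution of the remaining values untouched; the field names, the salt and the hashing scheme are hypothetical and far simpler than the semantic-preserving manipulations iSTDE performs.

# Toy identity obscuring: hash names, keep other value distributions intact.
# Field names, salt and hashing are hypothetical; iSTDE's manipulations are richer.
import hashlib

def pseudonymise(value, salt="test-env"):
    return hashlib.sha256((salt + value).encode()).hexdigest()[:8]

records = [
    {"name": "Ann Lee", "diagnosis": "J45", "age": 34},
    {"name": "Bo Kim",  "diagnosis": "J45", "age": 58},
]
masked = [{**r, "name": pseudonymise(r["name"])} for r in records]
print(masked)   # identities obscured, diagnosis/age distributions preserved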
15

Jungbluth, Adolfo, and Jon Li Yeng. "Quality data extraction methodology based on the labeling of coffee leaves with nutritional deficiencies." Association for Computing Machinery, 2018. http://hdl.handle.net/10757/624685.

Abstract:
The full text of this work is not available in the UPC Academic Repository due to restrictions imposed by the publisher.
Detection of nutritional deficiencies in coffee leaves is a task which is often undertaken manually by experts in the field, known as agronomists. The process they follow to carry out this task is based on observation of the different characteristics of the coffee leaves while relying on their own experience. Visual fatigue and human error in this empiric approach cause leaves to be incorrectly labeled, thus affecting the quality of the data obtained. In this context, different crowdsourcing approaches can be applied to enhance the quality of the extracted data. These approaches separately propose the use of voting systems, association rule filters and evolutive learning. In this paper, we extend the use of association rule filters and the evolutive approach by combining them in a methodology to enhance the quality of the data while guiding the users during the main stages of data extraction tasks. Moreover, our methodology proposes a reward component to engage users and keep them motivated during the crowdsourcing tasks. The dataset extracted by applying our proposed methodology in a case study on Peruvian coffee leaves reached 93.33% accuracy with 30 instances collected by 8 experts and evaluated by 2 agronomic engineers with a background in coffee leaves. This accuracy was higher than that obtained by independently implementing the evolutive feedback strategy or an empiric approach, which resulted in 86.67% and 70% accuracy respectively under the same conditions.
Peer-reviewed
16

Lee, Seungkyu Liu Yanxi. "Symmetry group extraction from multidimensional real data." [University Park, Pa.] : Pennsylvania State University, 2009. http://etda.libraries.psu.edu/theses/approved/WorldWideIndex/ETD-4720/index.html.

17

King, Brent. "Automatic extraction of knowledge from design data." Thesis, University of Sunderland, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.307964.

18

Guo, Jinsong. "Reducing human effort in web data extraction." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:04bd39dd-bfec-4c07-91db-980fcbc745ba.

Abstract:
The human effort in large-scale web data extraction significantly affects both the extraction flexibility and the economic cost. Our work aims to reduce the human effort required by web data extraction tasks in three specific scenarios. (I) Data demand is unclear, and the user has to guide the wrapper induction by annotations. To maximally save the human effort in the annotation process, wrappers should be robust, i.e., immune to webpage changes, to avoid wrapper re-generation, which requires a re-annotation process. Existing approaches primarily aim at generating accurate wrappers but barely generate robust wrappers. We prove that the XPath wrapper induction problem is NP-hard, and propose an approximate solution estimating a set of top-k robust wrappers in polynomial time. Our method also meets one additional requirement that the induction process should be noise resistant, i.e., tolerate slightly erroneous examples. (II) Data demand is clear, and the user's guidance should be avoided, i.e., the wrapper generation should be fully unsupervised. Existing unsupervised methods purely relying on the repeated patterns of HTML structures/visual information are far from being practical. Partially supervised methods, such as the state-of-the-art system DIADEM, can work well for tasks involving only a small number of domains. However, the human effort in the annotator preparation process becomes a heavier burden when the number of domains increases. We propose a new approach, called RED (abbreviation for 'redundancy'), an automatic approach exploiting content redundancy between the result page and its corresponding detail pages. RED requires no annotation (thus requires no human effort) and its wrapper accuracy is significantly higher than that of previous unsupervised methods. (III) Data quality is unknown, and the user's related decisions are blind. Without knowing the error types and the number of errors of each type in the extracted data, the extraction effort could be wasted on useless websites, and even worse, the human effort could be wasted on unnecessary or wrongly-targeted data cleaning processes. Despite the importance of error estimation, no methods have addressed it sufficiently. We focus on two types of common errors in web data, namely duplicates and violations of integrity constraints. We propose a series of error estimation approaches by adapting, extending, and synthesizing some recent innovations in diverse areas such as active learning, classifier calibration, F-measure estimation, and interactive training.
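The robustness notion can be illustrated with a small lxml example: an absolute, position-based XPath breaks as soon as the page layout changes, whereas an expression anchored on a stable attribute survives. The HTML snippets and both XPath expressions are invented for illustration and are not wrappers from the thesis.

# Toy illustration of wrapper robustness: a positional XPath vs. an anchored one.
# The HTML and the expressions are made-up examples, not wrappers from the thesis.
from lxml import html

page_v1 = "<html><body><div><span class='price'>42.00</span></div></body></html>"
page_v2 = ("<html><body><div class='ad'>new banner</div>"
           "<div><span class='price'>42.00</span></div></body></html>")

brittle = "/html/body/div[1]/span/text()"      # tied to element positions
robust  = "//span[@class='price']/text()"      # anchored on a stable attribute

for name, xp in [("brittle", brittle), ("robust", robust)]:
    for version, src in [("v1", page_v1), ("v2", page_v2)]:
        # the brittle expression returns [] once the layout changes in v2
        print(name, version, html.fromstring(src).xpath(xp))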
19

Yang, Hui. "Data extraction in holographic particle image velocimetry." Thesis, Loughborough University, 2004. https://dspace.lboro.ac.uk/2134/35012.

Abstract:
Holographic Particle Image Velocimetry (HPIV) is potentially the best technique to obtain instantaneous, three-dimensional, flow field information. Several researchers have presented their experimental results to demonstrate the power of HPIV technique. However, the challenge to find an economical and automatic means to extract and process the immense amount of data from the holograms still remains. This thesis reports on the development of complex amplitude correlation as a means of data extraction. At the same time, three-dimensional quantitative measurements for a micro scale flow is of increasing importance in the design of microfluidic devices. This thesis also reports the investigation of HPIV in micro-scale fluid flow. The author has re-examined complex amplitude correlation using a formulation of scalar diffraction in three-dimensional vector space.
20

Rangaraj, Jithendra Kumar. "Knowledge-based Data Extraction Workbench for Eclipse." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1354290498.

21

Ouahid, Hicham. "Data extraction from the Web using XML." Thesis, University of Ottawa (Canada), 2001. http://hdl.handle.net/10393/9260.

Abstract:
This thesis presents a mechanism based on eXtensible Markup Language (XML) to extract data from HTML-based Web pages and populate relational databases. This task is performed by a system called the XML-based Web Agent (XWA). The data extraction is done in three phases. First, the Web pages are converted to well-formed XML documents to facilitate their processing. Second, the data is extracted from the well-formed XML documents and formatted into valid XML documents. Finally, the valid XML documents are mapped into tables to be stored in a relational database. To extract specific data from the Web, the XWA requires information about the Web pages from which to extract the data, the location of the data within the Web pages, and how the extracted data should be formatted. This information is stored in Web Site Ontologies, which are built using a language called the Web Ontology Description Language (WONDEL). WONDEL is based on XML and the XML Pointer Language. It has been defined as part of this work to allow users to specify the data they want and let the XWA work offline to extract it and store it in a database. This has the advantage of saving users the time spent waiting for Web pages to download, and of taking advantage of the powerful query mechanisms offered by database management systems.
22

Bródka, Piotr. "Key User Extraction Based on Telecommunication Data." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-5863.

Abstract:
The number of systems that collect vast amounts of data about users has grown rapidly during the last few years. Many of these systems contain data not only about people's characteristics but also about their relationships with other system users. From this kind of data it is possible to extract a social network that reflects the connections between the system's users. Moreover, the analysis of such a social network enables the investigation of different characteristics of its users and their linkages. One type of such analysis is key user extraction. Key users are those who have the biggest impact on other network users as well as a big influence on network evolution. The knowledge obtained about these users makes it possible to investigate and predict changes within the network. This knowledge is therefore very important for the people or companies who make a profit from the network, such as a telecommunication company. The second important issue is the ability to extract these users as quickly as possible, i.e. to develop an algorithm that is time-effective in large social networks where the number of nodes and edges reaches a few million.
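A minimal sketch of the general idea follows, assuming a toy call log and using weighted PageRank as a stand-in influence measure; the thesis evaluates its own key-user measures on real telecommunication data.

# Illustrative key-user extraction from call records using PageRank as a proxy
# for influence. The call list and the metric choice are assumptions for the example.
import networkx as nx

calls = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
         ("dave", "alice"), ("erin", "alice"), ("erin", "bob")]

g = nx.DiGraph()
for caller, callee in calls:
    # accumulate call counts as edge weights
    w = g.get_edge_data(caller, callee, default={"weight": 0})["weight"]
    g.add_edge(caller, callee, weight=w + 1)

pagerank = nx.pagerank(g, weight="weight")
top = sorted(pagerank, key=pagerank.get, reverse=True)[:3]
print("candidate key users:", top)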
23

Wessman, Alan E. "A Framework for Extraction Plans and Heuristics in an Ontology-Based Data-Extraction System." Diss., CLICK HERE for online access, 2005. http://contentdm.lib.byu.edu/ETD/image/etd684.pdf.

24

Chartrand, Timothy Adam. "Ontology-Based Extraction of RDF Data from the World Wide Web." BYU ScholarsArchive, 2003. https://scholarsarchive.byu.edu/etd/56.

Abstract:
The simplicity and proliferation of the World Wide Web (WWW) has taken the availability of information to an unprecedented level. The next generation of the Web, the Semantic Web, seeks to make information more usable by machines by introducing a more rigorous structure based on ontologies. One hindrance to the Semantic Web is the lack of existing semantically marked-up data. Until there is a critical mass of Semantic Web data, few people will develop and use Semantic Web applications. This project helps promote the Semantic Web by providing content. We apply existing information-extraction techniques, in particular the BYU ontology-based data-extraction system, to extract information from the WWW based on a Semantic Web ontology to produce Semantic Web data with respect to that ontology. As an example of how the generated Semantic Web data can be used, we provide an application to browse the extracted data and the source documents together. In this sense, the extracted data is superimposed over, or is an index over, the source documents. Our experiments with ontologies in four application domains show that our approach can indeed extract Semantic Web data from the WWW with precision and recall similar to that achieved by the underlying information extraction system and make that data accessible to Semantic Web applications.
25

Morsey, Mohamed. "Efficient Extraction and Query Benchmarking of Wikipedia Data." Doctoral thesis, Universitätsbibliothek Leipzig, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-130593.

Abstract:
Knowledge bases are playing an increasingly important role for integrating information between systems and over the Web. Today, most knowledge bases cover only specific domains, they are created by relatively small groups of knowledge engineers, and it is very cost intensive to keep them up-to-date as domains change. In parallel, Wikipedia has grown into one of the central knowledge sources of mankind and is maintained by thousands of contributors. The DBpedia (http://dbpedia.org) project makes use of this large collaboratively edited knowledge source by extracting structured content from it, interlinking it with other knowledge bases, and making the result publicly available. DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Furthermore, many companies and researchers use DBpedia and its public services to improve their applications and research approaches. However, the DBpedia release process is heavy-weight and the releases are sometimes based on several months old data. Hence, a strategy to keep DBpedia always in synchronization with Wikipedia is highly required. In this thesis we propose the DBpedia Live framework, which reads a continuous stream of updated Wikipedia articles, and processes it. DBpedia Live processes that stream on-the-fly to obtain RDF data and updates the DBpedia knowledge base with the newly extracted data. DBpedia Live also publishes the newly added/deleted facts in files, in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Moreover, the new DBpedia Live framework incorporates several significant features, e.g. abstract extraction, ontology changes, and changesets publication. Basically, knowledge bases, including DBpedia, are stored in triplestores in order to facilitate accessing and querying their respective data. Furthermore, the triplestores constitute the backbone of increasingly many Data Web applications. It is thus evident that the performance of those stores is mission critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triplestore implementations. We introduce a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational and triplestores and, thus, settled on measuring performance against a relational database which had been converted to RDF by using SQL-like queries. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful to compare existing triplestores and provide results for the popular triplestore implementations Virtuoso, Sesame, Apache Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triplestores is by far less homogeneous than suggested by previous benchmarks. Further, one of the crucial tasks when creating and maintaining knowledge bases is validating their facts and maintaining the quality of their inherent data. 
This task includes several subtasks, and in this thesis we address two of those major subtasks, specifically fact validation and provenance, and data quality. The subtask of fact validation and provenance aims at providing sources for facts in order to ensure correctness and traceability of the provided knowledge. This subtask is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming as the experts have to carry out several search processes and must often read several documents. We present DeFacto (Deep Fact Validation), an algorithm for validating facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of webpages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact. On the other hand, the subtask of data quality maintenance aims at evaluating and continuously improving the quality of the data in the knowledge bases. We present a methodology for assessing the quality of knowledge bases' data, which comprises a manual and a semi-automatic process. The first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase comprises the evaluation of a large number of individual resources, according to the quality problem taxonomy, via crowdsourcing. This process is accompanied by a tool wherein a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia.
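To give a flavour of the kind of workload such triplestores serve, here is a minimal SPARQL query against the public DBpedia endpoint using the SPARQLWrapper library; the endpoint URL and the query are generic illustrations and are not taken from the benchmark's query logs.

# Minimal DBpedia SPARQL query; a generic illustration, not a benchmark query.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Phoenix_dactylifera> rdfs:label ?label .
        FILTER (lang(?label) = "en")
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["label"]["value"])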
26

Lin, Qingfen. "Enhancement, Extraction, and Visualization of 3D Volume Data." Doctoral thesis, Linköping : Univ, 2003. http://www.bibl.liu.se/liupubl/disp/disp2003/tek824s.pdf.

27

Palmer, David Donald. "Modeling uncertainty for information extraction from speech data /." Thesis, Connect to this title online; UW restricted, 2001. http://hdl.handle.net/1773/5834.

28

Tao, Cui. "Schema Matching and Data Extraction over HTML Tables." Diss., CLICK HERE for online access, 2003. http://contentdm.lib.byu.edu/ETD/image/etd279.pdf.

29

Gottlieb, Matthew. "Understanding malware autostart techniques with web data extraction /." Online version of thesis, 2009. http://hdl.handle.net/1850/10632.

30

Laidlaw, David H. Barr Alan H. "Geometric model extraction from magnetic resonance volume data /." Diss., Pasadena, Calif. : California Institute of Technology, 1995. http://resolver.caltech.edu/CaltechETD:etd-10152007-132141.

31

Pham, Nam Wilamowski Bogdan M. "Data extraction from servers by the Internet Robot." Auburn, Ala, 2009. http://hdl.handle.net/10415/1781.

32

Cheung, Jarvis T. "Representation and extraction of trends from process data." Thesis, Massachusetts Institute of Technology, 1992. http://hdl.handle.net/1721.1/13186.

33

Stachowiak, Maciej 1976. "Automated extraction of structured data from HTML documents." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/9896.

Abstract:
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.
Includes bibliographical references (leaf 45).
by Maciej Stachowiak.
M.Eng.
34

Lazzarini, Nicola. "Knowledge extraction from biomedical data using machine learning." Thesis, University of Newcastle upon Tyne, 2017. http://hdl.handle.net/10443/3839.

Abstract:
Thanks to the breakthroughs in biotechnologies that have occurred during recent years, biomedical data is accumulating at a previously unseen pace. In the field of biomedicine, decades-old statistical methods are still commonly used to analyse such data. However, the simplicity of these approaches often limits the amount of useful information that can be extracted from the data. Machine learning methods represent an important alternative due to their ability to capture complex patterns within the data that are likely missed by simpler methods. This thesis focuses on the extraction of useful knowledge from biomedical data using machine learning. Within the biomedical context, the vast majority of machine learning applications focus their effort on the generation and validation of prediction models. Rarely are the inferred models used to discover meaningful biomedical knowledge. The work presented in this thesis goes beyond this scenario and devises new methodologies to mine machine learning models for the extraction of useful knowledge. The thesis targets two important and challenging biomedical analytic tasks: (1) the inference of biological networks and (2) the discovery of biomarkers. The first task aims to identify associations between different biological entities, while the second one tries to discover sets of variables that are relevant for specific biomedical conditions. Successful solutions for both problems rely on the ability to recognise complex interactions within the data, hence the use of multivariate machine learning methods. The network inference problem is addressed with FuNeL: a protocol to generate networks based on the analysis of rule-based machine learning models. The second task, biomarker discovery, is studied with RGIFE, a heuristic that exploits the information extracted from machine learning models to guide its search for minimal subsets of variables. The extensive analysis conducted for this dissertation shows that the networks inferred with FuNeL capture relevant knowledge complementary to that extracted by standard inference methods. Furthermore, the associations defined by FuNeL are found to be more pertinent in a disease context. The biomarkers selected by RGIFE are found to be disease-relevant and to have a high predictive power. When applied to osteoarthritis data, RGIFE confirmed the importance of previously identified biomarkers, whilst also extracting novel biomarkers with possible future clinical applications. Overall, the thesis presents new, effective methods to leverage the information, often left buried, encapsulated within machine learning models and to discover useful biomedical knowledge.
35

Novelli, Noël. "Extraction de dépendances fonctionnetitre : Une approche Data Mining." Aix-Marseille 2, 2000. http://www.theses.fr/2000AIX22071.

36

Jiang, Ji Chu. "High Precision Deep Learning-Based Tabular Data Extraction." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/41699.

Abstract:
The advancements of AI methodologies and computing power enable automation and propel the Industry 4.0 phenomenon. Information and data are digitized more than ever; millions of documents are processed every day, fueled by the growth in institutions, organizations, and their supply chains. Processing documents is a time-consuming, laborious task. Automating data processing is therefore a highly important task for optimizing supply chain efficiency across all industries. Document analysis for data extraction is an impactful field, and this thesis aims to achieve the vital steps in an ideal data extraction pipeline. Data is often stored in tables since it is a structured format where the user can easily associate values and attributes. Tables can contain vital information such as specifications, dimensions, and cost. Focusing on table analysis and recognition in documents is therefore a cornerstone of data extraction. This thesis applies deep learning methodologies to automate the two main problems within table analysis for data extraction: table detection and table structure detection. Table detection is identifying and localizing the boundaries of the table. The output of the table detection model is input into the table structure detection model for structure format analysis; the output of the table detection model must therefore have high localization performance, otherwise it would affect the rest of the data extraction pipeline. Our table detection improves bounding box localization performance by incorporating a Kullback–Leibler loss function that calculates the divergence between the probability distributions of the ground-truth and predicted bounding boxes, as well as by adding a voting procedure to the non-maximum suppression step to produce better-localized merged bounding box proposals. This model improved the precision of table detection by 1.2% while achieving the same recall as other state-of-the-art models on the public ICDAR2013 dataset, and achieved state-of-the-art results of 99.8% precision on the ICDAR2017 dataset. Furthermore, our model showed large improvements especially at higher intersection over union (IoU) thresholds; at 95% IoU an improvement of 10.9% can be seen for the ICDAR2013 dataset and an improvement of 8.4% for the ICDAR2017 dataset. Table structure detection is recognizing the internal layout of a table. Researchers often approach this by detecting the rows and columns. However, for correct mapping of each individual cell's data location in the semantic extraction step, the rows and columns would have to be combined to form a matrix, which introduces additional degrees of error. Alternatively, we propose a model that directly detects each individual cell. Our model is an ensemble of state-of-the-art models: Hybrid Task Cascade as the detector and dual ResNeXt101 backbones arranged in a CBNet architecture. There is a lack of quality labeled data for table cell structure detection, so we hand-labeled the ICDAR2013 dataset, and we wish to establish a strong baseline for this dataset. Our model was compared with other state-of-the-art models that excelled at table or table structure detection, and yielded a precision of 89.2% and a recall of 98.7% on the ICDAR2013 cell structure dataset.
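One common way to realise a KL-divergence loss for bounding-box regression, in the spirit described above, is to let the network predict a Gaussian (mean and log-variance) per box offset and treat the ground truth as a delta distribution; the sketch below follows that general formulation, which may differ in detail from the loss used in the thesis.

# Sketch of a KL-style bounding-box regression loss: the prediction is a Gaussian
# per coordinate, the ground truth a delta distribution. Illustrative only.
import torch

def kl_box_loss(pred_mean, pred_log_var, target):
    """pred_mean, pred_log_var, target: tensors of shape (N, 4) with box offsets."""
    # KL(delta(target) || N(mu, sigma^2)) up to a constant:
    #   (target - mu)^2 / (2 sigma^2) + 0.5 * log(sigma^2)
    var = torch.exp(pred_log_var)
    loss = (target - pred_mean) ** 2 / (2 * var) + 0.5 * pred_log_var
    return loss.mean()

pred_mean = torch.zeros(2, 4, requires_grad=True)
pred_log_var = torch.zeros(2, 4, requires_grad=True)
target = torch.tensor([[0.1, -0.2, 0.05, 0.0], [0.3, 0.1, -0.1, 0.2]])
print(kl_box_loss(pred_mean, pred_log_var, target))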
37

Einstein, Noah. "SmartHub: Manual Wheelchair Data Extraction and Processing Device." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1555352793977171.

38

Müglich, Marcel. "Motion Feature Extraction of Video and Movie Data." Thesis, KTH, Numerisk analys, NA, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-214030.

Abstract:
Since the Video on Demand market is growing at a fast rate in terms of available content and user numbers, the task arises of matching personally relevant content to each individual user. This problem is tackled by implementing a recommendation system which finds relevant content by automatically detecting patterns in the individual user's behaviour. To find such patterns, either collaborative filtering, which evaluates patterns of user groups to draw conclusions about a single user's preferences, or content-based strategies can be applied. Content-based strategies analyze the movies watched by the individual user and extract quantifiable information from them. This information can be utilized to find relevant movies with similar features. The focus of this thesis lies on the extraction of motion features from movie and video data. Three feature extraction methods are presented and evaluated, which classify camera movement, estimate the motion intensity and detect film transitions.
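As an illustration of how motion intensity can be estimated from consecutive frames, the sketch below uses OpenCV's dense Farnebäck optical flow and averages the per-pixel displacement magnitude; the video path is a placeholder and this simple proxy is not the exact feature set or thresholds developed in the thesis.

# Per-frame motion intensity via dense optical flow; illustrative proxy only.
import cv2
import numpy as np

cap = cv2.VideoCapture("movie.mp4")          # placeholder path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)  # per-pixel displacement length
    print("mean motion intensity:", float(magnitude.mean()))
    prev_gray = gray
cap.release()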
39

García-Martín, Eva. "Extraction and Energy Efficient Processing of Streaming Data." Licentiate thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-15532.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
The interest in machine learning algorithms is increasing, in parallel with the advancements in hardware and software required to mine large-scale datasets. Machine learning algorithms account for a significant amount of the energy consumed in data centers, which impacts global energy consumption. However, machine learning algorithms are optimized towards predictive performance and scalability. Algorithms with low energy consumption are necessary for embedded systems and other resource-constrained devices, and desirable for platforms that require many computations, such as data centers. Data stream mining investigates how to process potentially infinite streams of data without the need to store all the data. This ability is particularly useful for companies that are generating data at a high rate, such as social networks. This thesis investigates algorithms in the data stream mining domain from an energy efficiency perspective. The thesis comprises two parts. The first part explores how to extract and analyze data from Twitter, with a pilot study that investigates a correlation between hashtags and followers. The second and main part investigates how energy is consumed and optimized in an online learning algorithm suitable for data stream mining tasks. The second part of the thesis focuses on analyzing, understanding, and reformulating the Very Fast Decision Tree (VFDT) algorithm, the original Hoeffding tree algorithm, into an energy-efficient version. It presents three key contributions. First, it shows how energy varies in the VFDT from a high-level view by tuning different parameters. Second, it presents a methodology to identify energy bottlenecks in machine learning algorithms, by portraying the functions of the VFDT that consume the largest amount of energy. Third, it introduces dynamic parameter adaptation for Hoeffding trees, a method to dynamically adapt the parameters of Hoeffding trees to reduce their energy consumption. The results show an average energy reduction of 23% on the VFDT algorithm.
Scalable resource-efficient systems for big data analytics
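For readers unfamiliar with the algorithm being reformulated above, the split decisions in a Hoeffding tree rest on the Hoeffding bound; a minimal sketch is given below, with function and parameter names chosen here for illustration (the thesis's energy-aware variant additionally adapts parameters such as delta dynamically).

import math

def hoeffding_bound(value_range, delta, n):
    # With probability 1 - delta, the observed mean of n samples of a variable
    # with range `value_range` lies within epsilon of the true mean.
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_best_gain, value_range, delta, n, tie_threshold=0.05):
    # Split when the best attribute beats the runner-up by more than the bound,
    # or when the bound is so small that the two candidates are effectively tied.
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_best_gain > eps) or (eps < tie_threshold)

print(round(hoeffding_bound(value_range=1.0, delta=1e-7, n=1000), 4))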
40

Nziga, Jean-Pierre. "Incremental Sparse-PCA Feature Extraction For Data Streams." NSUWorks, 2015. http://nsuworks.nova.edu/gscis_etd/365.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Intruders attempt to penetrate commercial systems daily and cause considerable financial losses for individuals and organizations. Intrusion detection systems monitor network events to detect computer security threats. An extensive amount of network data is devoted to detecting malicious activities. Storing, processing, and analyzing the massive volume of data is costly and indicates the need for efficient methods of network data reduction that do not require the data to be captured and stored first. A better approach allows the extraction of useful variables from data streams in real time and in a single pass. The removal of irrelevant attributes reduces the data fed to the intrusion detection system (IDS) and shortens the analysis time while improving classification accuracy. This dissertation introduces an online, real-time data processing method for knowledge extraction. This incremental feature extraction is based on two approaches. First, Chunk Incremental Principal Component Analysis (CIPCA) detects intrusions in data streams. Then, two novel incremental feature extraction methods, Incremental Structured Sparse PCA (ISSPCA) and Incremental Generalized Power Method Sparse PCA (IGSPCA), find malicious elements. Metrics helped compare the performance of all methods. IGSPCA was found to perform as well as or better than CIPCA overall in terms of dimensionality reduction, classification accuracy, and learning time. ISSPCA yielded better results for higher chunk values and greater accumulation-ratio thresholds. CIPCA and IGSPCA reduced the IDS dataset to 10 principal components, as opposed to 14 eigenvectors for ISSPCA. ISSPCA is more expensive in terms of learning time compared to the other techniques. This dissertation presents new methods that perform feature extraction from continuous data streams to find the small number of features necessary to express most of the data variance. Data subsets derived from a few important variables render their interpretation easier. Another goal of this dissertation was to propose incremental sparse PCA algorithms capable of processing data with concept drift and concept shift. Experiments using the WaveForm and WaveFormNoise datasets confirmed this ability. Similar to CIPCA, ISSPCA and IGSPCA updated the eigen-axes as a function of the accumulation ratio value, forming an informative eigenspace with few eigenvectors.
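The incremental sparse PCA algorithms studied in this dissertation (CIPCA, ISSPCA, IGSPCA) are not available in standard libraries; purely to illustrate the chunk-wise processing idea, the sketch below uses scikit-learn's IncrementalPCA to update an eigenspace one chunk at a time and then project the stream onto a small number of components.

import numpy as np
from sklearn.decomposition import IncrementalPCA

def reduce_stream(chunks, n_components=10):
    # Update the eigenspace chunk by chunk (single pass), then project each chunk.
    chunks = list(chunks)
    ipca = IncrementalPCA(n_components=n_components)
    for chunk in chunks:
        ipca.partial_fit(chunk)
    return [ipca.transform(chunk) for chunk in chunks]

# Synthetic stream: 5 chunks of 200 samples with 50 features each.
rng = np.random.default_rng(0)
stream = [rng.normal(size=(200, 50)) for _ in range(5)]
print(reduce_stream(stream)[0].shape)  # (200, 10)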
41

Xiang, Deliang. "Urban Area Information Extraction From Polarimetric SAR Data." Doctoral thesis, KTH, Skolan för arkitektur och samhällsbyggnad (ABE), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187951.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Polarimetric Synthetic Aperture Radar (PolSAR) has been used for various remote sensing applications, since more information can be obtained from multiple polarizations. The overall objective of this thesis is to investigate urban area information extraction from PolSAR data, with the following specific objectives: (1) to exploit polarimetric scattering model-based decomposition methods for urban areas, (2) to investigate effective methods for man-made target detection, (3) to develop edge detection and superpixel generation methods, and (4) to investigate urban area classification and segmentation. Paper 1 proposes a new scattering coherency matrix to model the cross-polarized scattering component from urban areas, which adaptively considers the polarization orientation angles of buildings; thus, the HV scattering components from forests and from oriented urban areas can be modelled separately. Paper 2 presents two urban area decompositions using this scattering model; after decomposition, urban scattering components can be extracted effectively. Paper 3 presents an improved man-made target detection method for PolSAR data based on non-stationarity and asymmetry: reflection asymmetry was incorporated into the azimuth non-stationarity extraction method to improve detection accuracy, i.e., removing natural areas and detecting small targets. In Paper 4, edge detection in PolSAR data was investigated using an SIRV model and a Gauss-shaped filter; this detector locates edge pixels accurately with few omissions, which is useful for speckle-noise reduction, superpixel generation, and other tasks. Paper 5 investigates an unsupervised classification method for PolSAR data in urban areas, in which ortho and oriented buildings can be discriminated very well. Paper 6 proposes an adaptive superpixel generation method for PolSAR images; the algorithm produces compact superpixels that adhere well to image boundaries in both natural and urban areas.
Polarimetric Synthetic Aperture Radar (PolSAR) has been used for various remote sensing applications, since more information can be obtained from multi-polarised data. The overall aim of this thesis is to investigate information extraction over urban areas from PolSAR data, with the following specific objectives: (1) to exploit polarimetric scattering model-based decomposition methods for urban areas, (2) to investigate effective methods for detecting man-made objects, (3) to develop methods for edge detection and superpixel generation, and (4) to investigate classification and segmentation of urban areas. Paper 1 proposes a new scattering coherency matrix to model the cross-polarised scattering component from urban areas, which adaptively accounts for the polarisation orientation angles of buildings. Paper 2 presents two urban-area decompositions based on this scattering model; after decomposition, the urban scattering components could be extracted effectively. Paper 3 presents an improved detection method for man-made targets in PolSAR data based on non-stationarity and asymmetry; reflection asymmetry was incorporated into the non-stationarity method to improve the accuracy of man-made object detection, i.e. removing natural areas and detecting small objects. In Paper 4, edge detection in PolSAR data was investigated using an SIRV model and a Gauss-shaped filter; this detector finds edge pixels accurately with few omissions, which is useful for noise reduction, superpixel generation and other tasks. Paper 5 explores an unsupervised classification method for PolSAR data over urban areas, in which ortho and oriented buildings can be distinguished very well. Building on Paper 4, Paper 6 proposes an adaptive superpixel generation method for PolSAR data; the algorithm produces compact superpixels that adhere well to image boundaries in both natural and urban areas.
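For background only, the coherency matrix on which such model-based decompositions operate is conventionally formed from the Pauli scattering vector; the standard textbook definition is sketched below (it is not the new cross-polarised scattering model proposed in Paper 1).

\[
\mathbf{k} = \frac{1}{\sqrt{2}}
\begin{bmatrix} S_{HH} + S_{VV} \\ S_{HH} - S_{VV} \\ 2\,S_{HV} \end{bmatrix},
\qquad
\mathbf{T} = \left\langle \mathbf{k}\,\mathbf{k}^{H} \right\rangle ,
\]
where the $S_{pq}$ are the complex scattering-matrix elements, $\langle\cdot\rangle$ denotes spatial (multilook) averaging and $H$ the conjugate transpose; model-based decompositions express $\mathbf{T}$ as a sum of canonical scattering contributions.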

QC 20160607

42

Alves, Ricardo João de Freitas. "Declarative approach to data extraction of web pages." Master's thesis, Faculdade de Ciências e Tecnologia, 2009. http://hdl.handle.net/10362/5822.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfilment of the requirements for the degree of Master in Computer Science
In the last few years, we have been witnessing a noticeable Web evolution, with the introduction of significant improvements at the technological level, such as the emergence of XHTML, CSS, Javascript, and Web 2.0, just to name a few. This, combined with other factors such as the physical expansion of the Web and its low cost, has been a great motivator for organizations and the general public to join, with a consequent growth in the number of users and in the volume of the largest global data repository. As a consequence, there has been an increasing need for regular data acquisition from the Web, which, because of its frequency, length or complexity, is only viable through automatic extractors. However, two main difficulties are inherent to automatic extractors. First, much of the Web's information is presented in visual formats mainly directed at human reading. Secondly, dynamic webpages are assembled in local memory from different sources, so some pages do not have a single source file. Therefore, this thesis proposes a new and more modern extractor, capable of keeping up with the Web's evolution, generic enough to be used in any situation, and capable of being extended and easily adapted to more particular uses. This project is an extension of an earlier one that could perform extractions on semi-structured text files. It has evolved into a modular extraction system capable of extracting data from webpages and semi-structured text files, and of being expanded to support other data source types. It also contains a more complete and generic validation system and a new data delivery system capable of performing the earlier deliveries as well as new generic ones. A graphical editor was also developed to support the extraction system's features and to allow a domain expert without computer knowledge to create extractions with only a few simple and intuitive interactions on the rendered webpage.
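The thesis's declarative extraction language and graphical editor are not reproduced here; the toy sketch below only illustrates the general idea of specifying an extraction declaratively, as a mapping from field names to XPath expressions evaluated with lxml. The rule set and page are hypothetical.

from lxml import html

# Fields are declared as XPath expressions rather than written as procedural
# parsing code; changing what is extracted means editing the rules, not the code.
RULES = {
    "title": "//h1/text()",
    "price": "//span[@class='price']/text()",
}

def extract(page_source, rules=RULES):
    tree = html.fromstring(page_source)
    return {field: tree.xpath(xpath) for field, xpath in rules.items()}

page = "<html><body><h1>Sample product</h1><span class='price'>9.50</span></body></html>"
print(extract(page))  # {'title': ['Sample product'], 'price': ['9.50']}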
43

Seegmiller, Ray D., Greg C. Willden, Maria S. Araujo, Todd A. Newton, Ben A. Abbott, and William A. Malatesta. "Automation of Generalized Measurement Extraction from Telemetric Network Systems." International Foundation for Telemetering, 2012. http://hdl.handle.net/10150/581647.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
ITC/USA 2012 Conference Proceedings / The Forty-Eighth Annual International Telemetering Conference and Technical Exhibition / October 22-25, 2012 / Town and Country Resort & Convention Center, San Diego, California
In telemetric network systems, data extraction is often an afterthought. The data description frequently changes throughout the program, so that last-minute modifications of the data extraction approach are often required. This paper presents an alternative approach in which automation of measurement extraction is supported. The central key is a formal declarative language that can be used to configure instrumentation devices as well as measurement extraction devices. The Metadata Description Language (MDL) defined by the integrated Network Enhanced Telemetry (iNET) program, augmented with a generalized measurement extraction approach, addresses this issue. This paper describes the TmNS Data Extractor Tool, as well as lessons learned from commercial systems, the iNET program and TMATS.
44

Selig, Henny. "Continuous Event Log Extraction for Process Mining." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210710.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Process mining is the application of data science technologies on transactional business data to identify or monitor processes within an organization. The analyzed data often originates from process-unaware enterprise software, e.g. Enterprise Resource Planning (ERP) systems. The differences in data management between ERP and process mining systems result in a large fraction of ambiguous cases, affected by convergence and divergence. The consequence is a chasm between the process as interpreted by process mining, and the process as executed in the ERP system. In this thesis, a purchasing process of an SAP ERP system is used to demonstrate, how ERP data can be extracted and transformed into a process mining event log that expresses ambiguous cases as accurately as possible. As the content and structure of the event log already define the scope (i.e. which process) and granularity (i.e. activity types), the process mining results depend on the event log quality. The results of this thesis show how the consideration of case attributes, the notion of a case and the granularity of events can be used to manage the event log quality. The proposed solution supports continuous event extraction from the ERP system.
Process mining is the application of data science techniques to transactional data in order to identify or monitor processes within an organization. The analysed data often originates from process-unaware enterprise software, such as SAP systems, which are centred on business documents. The differences in data management between Enterprise Resource Planning (ERP) and process mining systems result in a large share of ambiguous cases, affected by convergence and divergence. This results in a gap between the process as interpreted by process mining and the process as executed in the ERP system. In this thesis, a purchasing process in an SAP ERP system is used to show how ERP data can be extracted and transformed into a process-mining-oriented event log that expresses ambiguous cases as precisely as possible. Since the content and structure of the event log already define the scope (which process) and granularity (the activity types), the process mining results depend on the quality of the event log. The results of this thesis show how the definition of the case notion and the granularity of events can be used to improve that quality. The described solution supports continuous event log extraction from the ERP system.
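As a minimal sketch of the event log construction described above, the snippet below flattens two hypothetical ERP extracts (purchase-order headers and goods receipts) into an event log with one row per event; the table, column and activity names are illustrative, not the SAP fields used in the thesis.

import pandas as pd

orders = pd.DataFrame({
    "po_number": ["4500001", "4500002"],
    "created_at": pd.to_datetime(["2017-03-01 09:00", "2017-03-02 10:30"]),
})
receipts = pd.DataFrame({
    "po_number": ["4500001", "4500001", "4500002"],
    "posted_at": pd.to_datetime(["2017-03-05 14:00", "2017-03-06 08:15", "2017-03-07 11:45"]),
})

events = pd.concat([
    orders.rename(columns={"created_at": "timestamp"}).assign(activity="Create Purchase Order"),
    receipts.rename(columns={"posted_at": "timestamp"}).assign(activity="Post Goods Receipt"),
])
# The purchase-order number serves as the case identifier here; choosing the case
# notion is exactly the modelling decision discussed in the abstract.
event_log = (events.rename(columns={"po_number": "case_id"})
                   [["case_id", "activity", "timestamp"]]
                   .sort_values(["case_id", "timestamp"]))
print(event_log)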
45

Fontanarava, Julien. "Signal Extraction from Scans of Electrocardiograms." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-248430.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
In this thesis, we propose a Deep Learning method for fully automated digitization of ECG (Electrocardiogram) sheets. We perform the digitization of ECG sheets in three steps: layout detection, column-wise signal segmentation, and finally signal retrieval - each of them performed by a Convolutional Neural Network. These steps leverage advances in the fields of object detection and pixel-wise segmentation due to the rise of CNNs in image processing. We train each network on synthetic images that reflect the challenges of real-world data. The use of these realistic synthetic images aims at making our models robust to the variability of real-world ECG sheets. Compared with computer vision benchmarks, our networks show promising results. Our signal retrieval network significantly outperforms our implementation of the benchmark. Our column segmentation model shows robustness to overlapping signals, an issue of signal segmentation that computer vision methods are not equipped to deal with. Overall, this fully automated pipeline provides a gain in time and precision for physicians willing to digitize their ECG database.
In this thesis we propose a deep learning method for fully automated digitization of ECG sheets. We perform the digitization of the ECG sheets in three steps: layout detection, column-wise signal segmentation and, finally, signal retrieval, each carried out by a convolutional neural network. These networks are inspired by networks used for object detection and pixel-wise segmentation. We train each network on synthetic images that reflect the challenges of real-world data. The use of these realistic synthetic images aims to make our models robust to variations of real-world ECG sheets. Compared with computer vision benchmarks, our networks show promising results. Our signal retrieval network significantly outperforms our implementation of the benchmark. Our column segmentation model shows robustness to overlapping signals, an issue in signal segmentation that computer vision methods cannot handle. Overall, this fully automated pipeline provides a gain in time and precision for physicians willing to digitize their ECG databases.
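A very small sketch of the final signal-retrieval idea: given a binary mask of a single separated trace, a 1-D signal can be read off column by column. The calibration constants (millivolts per pixel, baseline row) and the interpolation are assumptions for illustration, not the network-based retrieval used in the thesis.

import numpy as np

def mask_to_signal(mask, mv_per_pixel=0.01, baseline_row=None):
    # Convert a binary trace mask (rows x cols, True where the curve is drawn)
    # into one amplitude value per pixel column.
    rows, cols = mask.shape
    if baseline_row is None:
        baseline_row = rows / 2.0
    signal = np.full(cols, np.nan)
    for c in range(cols):
        ys = np.flatnonzero(mask[:, c])
        if ys.size:                       # column contains part of the trace
            signal[c] = (baseline_row - ys.mean()) * mv_per_pixel
    # Fill columns where the trace was missing by linear interpolation.
    idx = np.arange(cols)
    good = ~np.isnan(signal)
    if good.any():
        signal = np.interp(idx, idx[good], signal[good])
    return signal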
46

Bejugam, Santosh. "Tremor quantification and parameter extraction." Thesis, Mittuniversitetet, Institutionen för informationsteknologi och medier, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-16021.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Tremor is a neurodegenerative disease causing involuntary muscle movements in human limbs. There are many types of tremor, caused by damage to the nerve cells surrounding the thalamus of the front brain chamber. It is hard to distinguish or classify tremors, as there are many possible causes behind each specific category, so every tremor type is named after its frequency type. Proper medication prescribed by a physician is possible only when the disease is identified. For this reason, there is a need for a device or technique to analyze the tremor and to extract the parameters associated with the signal. These extracted parameters can be used to classify the tremor for onward identification of the disease. Various diagnostic and treatment-monitoring equipment is available for many neuromuscular diseases. This thesis is concerned with tremor analysis for the purpose of recognizing certain other neurological disorders. A recording and analysis system for human tremor is developed. The analysis was performed based on frequency and amplitude parameters of the tremor. The Fast Fourier Transform (FFT) and higher-order spectra were used to extract frequency parameters (e.g., peak amplitude, fundamental frequency of the tremor). In order to diagnose the subjects' condition, classification was implemented using tests of statistical significance (t-test).
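As a sketch of the FFT-based parameter extraction mentioned above, the snippet below finds the fundamental tremor frequency and its peak amplitude in a recording; the sampling rate, the Hann windowing and the 3-12 Hz search band are assumptions for illustration, not the thesis's exact processing chain.

import numpy as np

def tremor_peak(signal, fs, band=(3.0, 12.0)):
    # Return (peak_frequency_hz, peak_amplitude) from the magnitude spectrum,
    # searching only inside the assumed physiological tremor band.
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()              # remove DC offset
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(signal.size)))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    i = np.argmax(np.where(in_band, spectrum, 0.0))
    return freqs[i], spectrum[i]

# Example: a synthetic 6 Hz tremor sampled at 100 Hz.
fs = 100.0
t = np.arange(0, 10, 1.0 / fs)
f, a = tremor_peak(np.sin(2 * np.pi * 6.0 * t) + 0.1 * np.random.randn(t.size), fs)
print(round(f, 1))  # ~6.0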
47

Paidipally, Anoop Rao. "Dynamic Data Extraction and Data Visualization with Application to the Kentucky Mesonet." TopSCHOLAR®, 2012. http://digitalcommons.wku.edu/theses/1160.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
There is a need to integrate large-scale databases, high-performance computing engines, and geographical information system technologies into a user-friendly web interface as a platform for data visualization and customized statistical analysis. We present some concepts and design ideas regarding dynamic data storage and extraction by making use of open-source computing and mapping technologies. We applied our methods to the Kentucky Mesonet automated weather-mapping workflow. The main components of the workflow include a web-based interface and a robust database and computing infrastructure designed for both general users and power users such as modelers and researchers.
48

Chu, Chenhui. "Integrated Parallel Data Extraction from Comparable Corpora for Statistical Machine Translation." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/199431.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
49

Arpteg, Anders. "Adaptive Semi-structured Information Extraction." Licentiate thesis, Linköping University, Linköping University, KPLAB - Knowledge Processing Lab, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5688.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:

The number of domains and tasks where information extraction tools can be used needs to be increased. One way to reach this goal is to construct user-driven information extraction systems where novice users are able to adapt them to new domains and tasks. To accomplish this goal, the systems need to become more intelligent and able to learn to extract information without need of expert skills or time-consuming work from the user.

The type of information extraction system in focus in this thesis is semi-structural information extraction. The term semi-structural refers to documents that contain not only natural language text but also additional structural information. The typical application is information extraction from World Wide Web hypertext documents. By making effective use of not only the link structure but also the structural information within each such document, user-driven extraction systems with high performance can be built.

The extraction process contains several steps where different types of techniques are used. Examples of such types of techniques are those that take advantage of structural, pure syntactic, linguistic, and semantic information. The first step that is in focus for this thesis is the navigation step that takes advantage of the structural information. It is only one part of a complete extraction system, but it is an important part. The use of reinforcement learning algorithms for the navigation step can make the adaptation of the system to new tasks and domains more user-driven. The advantage of using reinforcement learning techniques is that the extraction agent can efficiently learn from its own experience without need for intensive user interactions.

An agent-oriented system was designed to evaluate the approach suggested in this thesis. Initial experiments showed that the training of the navigation step and the approach of the system was promising. However, additional components need to be included in the system before it becomes a fully-fledged user-driven system.
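A minimal sketch of the kind of reinforcement learning update such a navigation agent could use is shown below; the tabular Q-learning form, the state/action encoding and the reward are illustrative assumptions, not the algorithm evaluated in the thesis.

from collections import defaultdict

Q = defaultdict(float)
ALPHA, GAMMA = 0.1, 0.9       # learning rate and discount factor

def q_update(state, action, reward, next_state, next_actions):
    # One Q-learning step: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example: following the "next page" link from a listing page led to a page
# containing the target record, so the agent receives a reward of 1.
q_update("listing_page", "follow_next_link", 1.0, "detail_page", ["extract", "go_back"])
print(Q[("listing_page", "follow_next_link")])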


Report code: LiU-Tek-Lic-2002:73.
50

Bosch, Vicente Juan José. "From heuristics-based to data-driven audio melody extraction." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/404678.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advancements on melody extraction and shows a promising path for future research and applications.
Identifying the melody in a music recording is a relatively easy task for humans but very difficult for computational systems. This task is known as "melody extraction", more formally defined as the automatic estimation of the pitch sequence corresponding to the melody of a polyphonic music recording. This thesis investigates the benefits of using knowledge automatically derived from data for melody extraction, combining digital signal processing and machine learning methods. We broaden the scope of research in this field by working with a varied dataset and multiple definitions of melody. We first present an extensive comparative analysis of the state of the art and carry out an evaluation in a symphonic music context. We then propose melody extraction methods based on source-filter models and the characterization of pitch contours, and evaluate them on several musical genres. Finally, we investigate contour characterization using timbre, tonal and spatial information, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to improvements in melody extraction and shows a promising path for future research and applications.
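The full polyphonic melody-extraction pipeline developed in the thesis is not reproduced here; purely as a simplified stand-in for the pitch-sequence output it describes, the sketch below runs frame-wise pitch estimation with librosa's probabilistic YIN tracker, which is a monophonic method. The pitch range and the file path are illustrative assumptions.

import librosa

def pitch_sequence(audio_path):
    # Frame-wise pitch estimates in Hz (NaN for unvoiced frames), with the
    # corresponding frame times; pyin is monophonic, unlike melody extraction
    # from polyphonic audio.
    y, sr = librosa.load(audio_path)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
    times = librosa.times_like(f0, sr=sr)
    return times, f0

# times, f0 = pitch_sequence("recording.wav")  # hypothetical file path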
