
Dissertations / Theses on the topic 'Data processing pipeline'


Consult the top 31 dissertations / theses for your research on the topic 'Data processing pipeline.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Jakubiuk, Wiktor. "High performance data processing pipeline for connectome segmentation." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/106122.

Full text
Abstract:
Thesis: M. Eng. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February 2016.
"December 2015." Cataloged from PDF version of thesis.
Includes bibliographical references (pages 83-88).
By investigating neural connections, neuroscientists try to understand the brain and reconstruct its connectome. Automated connectome reconstruction from high-resolution electron microscopy is a challenging problem, as all neurons and synapses in a volume have to be detected. A cubic millimetre of high-resolution brain tissue takes roughly a petabyte of space, which state-of-the-art pipelines are unable to process to date. A high-performance, fully automated image processing pipeline is proposed. Using a combination of image processing and machine learning algorithms (convolutional neural networks and random forests), the pipeline constructs a 3-dimensional connectome from 2-dimensional cross-sections of a mammal's brain. The proposed system achieves a low error rate (comparable with the state of the art) and is capable of processing volumes hundreds of gigabytes in size. The main contributions of this thesis are multiple algorithmic techniques for 2-dimensional pixel classification at varying accuracy/speed trade-offs, as well as a fast object segmentation algorithm. The majority of the system is parallelized for multi-core machines, and with minor additional modification it is expected to work in a distributed setting.
by Wiktor Jakubiuk.
M. Eng. in Computer Science and Engineering
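The pixel classification step described in the abstract above can be illustrated with a minimal, hypothetical sketch: a random forest trained on small per-pixel patch features, standing in for one of the faster classifier variants. The feature set, data and CNN models of the thesis are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def patch_features(img, row, col, r=2):
    """Flatten a (2r+1)x(2r+1) neighbourhood around (row, col) into a feature vector."""
    return img[row - r:row + r + 1, col - r:col + r + 1].ravel()

# Hypothetical training data: one 2D EM section and a membrane / non-membrane mask.
rng = np.random.default_rng(0)
section = rng.random((64, 64))
mask = (section > 0.7).astype(int)          # placeholder ground truth

r = 2
coords = [(i, j) for i in range(r, 64 - r) for j in range(r, 64 - r)]
X = np.array([patch_features(section, i, j, r) for i, j in coords])
y = np.array([mask[i, j] for i, j in coords])

clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
clf.fit(X, y)                               # per-pixel classifier
membrane_prob = clf.predict_proba(X)[:, 1]  # probabilities feed the segmentation stage
```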
APA, Harvard, Vancouver, ISO, and other styles
2

Nakane, Takanori. "Data processing pipeline for serial femtosecond crystallography at SACLA." Kyoto University, 2017. http://hdl.handle.net/2433/217997.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Gu, Wenyu. "Improving the performance of stream processing pipeline for vehicle data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-284547.

Full text
Abstract:
The growing amount of position-dependent data (containing both geo-position data (i.e. latitude, longitude) and vehicle/driver-related information) collected from sensors on vehicles poses a challenge to the computer programs that must process the aggregate data from many vehicles. While handling this growing amount of data, these programs need to exhibit low latency and high throughput – otherwise the value of the results of the processing is reduced. As a solution, big data and cloud computing technologies have been widely adopted by industry. This thesis examines a cloud-based processing pipeline that processes vehicle location data. The system receives real-time vehicle data and processes the data in a streaming fashion. The goal is to improve the performance of this streaming pipeline, mainly with respect to latency and cost. The work began by looking at the current solution using AWS Kinesis and AWS Lambda. A benchmarking environment was created and used to measure the current system's performance. Additionally, a literature study was conducted to find a processing framework that best meets both industrial and academic requirements. After a comparison, Flink was chosen as the new framework and a new solution was designed around it. The performance of the current solution and the new Flink solution were then compared using the same benchmarking environment. The conclusion is that the new Flink solution has 86.2% lower latency while supporting triple the throughput of the current system at almost the same cost.
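As a rough illustration of the kind of streaming job a Flink-based redesign implies, the following minimal PyFlink sketch maps incoming vehicle position events through a simple transformation. The event schema, the collection source and the field names are purely illustrative; the Kinesis connector and the actual enrichment logic of the thesis are not reproduced.

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)

# Illustrative stand-in for a streaming source: (vehicle_id, latitude, longitude, event_time_ms)
events = [
    ("v1", 59.3293, 18.0686, 1_600_000_000_000),
    ("v2", 59.3340, 18.0630, 1_600_000_000_250),
]

stream = env.from_collection(events)

# Toy "processing" step: snap each position to a coarse grid cell.
snapped = stream.map(lambda e: (e[0], round(e[1], 2), round(e[2], 2), e[3]))
snapped.print()

env.execute("vehicle_position_sketch")
```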
APA, Harvard, Vancouver, ISO, and other styles
4

González, Alejandro. "A Swedish Natural Language Processing Pipeline For Building Knowledge Graphs." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254363.

Full text
Abstract:
The concept of knowledge is proper only to the human being, thanks to the faculty of understanding. Immaterial concepts, independent of the material causes of experience, constitute evident proof of the existence of the rational soul that makes the human being a spiritual being, in a way independent of the material. Nowadays, research efforts in the field of Artificial Intelligence are trying to mimic this human capacity with computers by "teaching" them how to read and understand human language, using Machine Learning techniques related to the processing of human language. However, there are still a significant number of challenges, such as how to represent this knowledge so that it can be used by a machine to infer conclusions or provide answers. This thesis presents a Natural Language Processing pipeline that is capable of building a knowledge representation of the information contained in Swedish human-generated text. The result is a system that, given Swedish text in its raw format, builds a representation of the knowledge or information contained in that text in the form of a Knowledge Graph.
APA, Harvard, Vancouver, ISO, and other styles
5

SHARMA, DIVYA. "APPLICATION OF ML TO MAKE SENCE OF BIOLOGICAL BIG DATA IN DRUG DISCOVERY PROCESS." Thesis, DELHI TECHNOLOGICAL UNIVERSITY, 2021. http://dspace.dtu.ac.in:8080/jspui/handle/repository/18378.

Full text
Abstract:
Scientists have been working for years to assemble and accumulate data from biological sources to find solutions to many fundamental questions. A tremendous amount of data has been collected in the past and is still increasing at an exponential rate, so it has become unachievable for a human being alone to handle or analyze this data. Most data collection and maintenance is now done in digital format, and organizations therefore need better data management and analysis to convert this vast data resource into insights and achieve their objectives. The continuous explosion of information from both biomedical and healthcare sources calls for urgent solutions. Healthcare data needs to be closely combined with biomedical research data to make it more effective in providing personalized medicine and better treatment procedures. Big data analytics would therefore help in integrating large data sets for proper management, decision-making, and cost-effectiveness in any medical/healthcare organization. The scope of the thesis is to highlight the need for big data analytics in healthcare and to explain the data processing pipeline and the machine learning used to analyze big data.
APA, Harvard, Vancouver, ISO, and other styles
6

Patuzzi, Ilaria. "16S rRNA gene sequencing sparse count matrices: a count data simulator and optimal pre-processing pipelines." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3426369.

Full text
Abstract:
The study of microbial communities has deeply changed since it was first introduced in the 17th century. In the late 1970s, a breakthrough in the way bacterial communities were studied was brought about by the discovery that ribosomal RNA (rRNA) genes could be used as molecular markers to classify organisms. Some decades later, the advent of DNA sequencing technology revolutionized the study of microbial communities, permitting a culture-independent view of the overall community contained within a sample. Today, one of the most widely used approaches to microbial community profiling is based on sequencing the gene that codes for the 16S subunit of the prokaryotic ribosome (16S rRNA gene), which, being ubiquitous to all bacteria but having an exact DNA sequence unique to each species, is used as a sort of molecular fingerprint for assigning a taxonomic characterization to each community member. The advent of Next-Generation Sequencing (NGS) platforms has made 16S rRNA gene sequencing (16S rDNA-Seq) an increasingly preferred methodology for microbiome studies. Despite this, the continuous development of both experimental and computational procedures for 16S rDNA-Seq has caused an unavoidable lack of standardization in the treatment and analysis of sequencing output data. This is further complicated by the very peculiar characteristics of the matrix in which sample information is summarized after sequencing. In fact, the instrumental limit on the maximum number of obtainable sequences makes 16S rDNA-Seq data compositional, i.e. data in which the detected abundance of each bacterial species depends on the level of presence of the other populations in the sample. Additionally, 16S rDNA-Seq-derived matrices are typically highly sparse (70-95% of null values). These peculiarities make the commonly adopted loan of bulk RNA sequencing tools and approaches inappropriate for the analysis of 16S rDNA-Seq count matrices. In particular, unspecific pre-processing steps, such as normalization, risk introducing biases in the case of highly sparse matrices. The main objective of this thesis was to identify optimal pipelines that fill the above gaps in order to ensure solid and reliable conclusions from 16S rRNA-Seq data analyses. Among all the analysis steps included in a typical pipeline, this project focused on the pre-processing of count data matrices obtained from 16S rDNA-Seq experiments. This task was carried out through several steps. First, state-of-the-art methods for 16S rDNA-Seq count data pre-processing were identified through a thorough literature search, which revealed a minimal availability of specific tools and the complete absence from the usual 16S rDNA-Seq analysis pipeline of a pre-processing step in which the information loss due to sequencing is recovered (zero-imputation). At the same time, the literature search highlighted that no specific simulators were available to directly obtain synthetic 16S rDNA-Seq count data on which to perform the analyses needed to identify optimal pre-processing pipelines. Thus, a simulator of 16S rDNA-Seq sparse count matrices that considers the compositional nature of these data was developed. Then, a comprehensive benchmark analysis of forty-nine pre-processing pipelines was designed and performed to assess the performance of currently used and most recent pre-processing approaches and to test the appropriateness of including a zero-imputation step in the 16S rDNA-Seq analysis framework.
Overall, this thesis considers the 16S rDNA-Seq data pre-processing problem and provides a useful guide to robust data pre-processing when performing a 16S rDNA-Seq analysis. Additionally, the simulator proposed in this work could be a spur and a valuable tool for researchers involved in developing and testing bioinformatics methods, thus helping to fill the lack of specific tools for 16S rDNA-Seq data.
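A minimal sketch of what "compositional" means here, under invented species abundances: sequencing depth is fixed per sample, so counts are drawn from a multinomial over relative abundances, and rare species often drop to zero, producing the sparse matrices the thesis discusses. This is not the simulator developed in the thesis, only an illustration of the data-generating constraint.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_species, depth = 10, 200, 5_000   # illustrative sizes

# Skewed "true" relative abundances: a few dominant species, many rare ones.
true_abundances = rng.lognormal(mean=0.0, sigma=2.0, size=(n_samples, n_species))
true_abundances /= true_abundances.sum(axis=1, keepdims=True)

# Fixed sequencing depth per sample -> counts are compositional.
counts = np.vstack([rng.multinomial(depth, p) for p in true_abundances])

sparsity = (counts == 0).mean()
print(f"fraction of zero entries: {sparsity:.2f}")   # typically high, as in real 16S data
```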
APA, Harvard, Vancouver, ISO, and other styles
7

NIGRI, ANNA. "Quality data assessment and improvement in pre-processing pipeline to minimize impact of spurious signals in functional magnetic imaging (fMRI)." Doctoral thesis, Politecnico di Torino, 2017. http://hdl.handle.net/11583/2911412.

Full text
Abstract:
In recent years, the field of data quality assessment and signal denoising in functional magnetic resonance imaging (fMRI) has been evolving rapidly, and the identification and reduction of spurious signals in the pre-processing pipeline is one of the most discussed topics. In particular, subject motion and physiological signals, such as respiratory and/or cardiac pulsatility, have been shown to introduce false-positive activations in subsequent statistical analyses. Different measures for evaluating the impact of motion-related artefacts, such as frame-wise displacement and the root mean square of movement parameters, have been introduced, as well as different approaches for reducing these artefacts, such as linear regression of nuisance signals and scrubbing or censoring procedures. However, we identify two main drawbacks: i) the different measures used for the evaluation of motion artefacts are based on user-dependent thresholds, and ii) each study describes and applies its own pre-processing pipeline. Few studies have analysed the effect of these different pipelines on subsequent analysis methods in task-based fMRI. The first aim of the study is to obtain a tool for fMRI motion data assessment, based on auto-calibrated procedures, that detects outlier subjects and outlier volumes, targeted to each investigated sample to ensure homogeneity of the data with respect to motion. The second aim is to compare the impact of different pre-processing pipelines on task-based fMRI using the GLM, building on recent advances in resting-state fMRI pre-processing pipelines. Different output measures based on signal variability and task strength were used for the assessment.
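As a concrete example of one of the motion measures mentioned above, frame-wise displacement (FD) can be computed from the six rigid-body realignment parameters, converting rotations to millimetres on an assumed 50 mm head radius (the common Power-style convention). The thesis's auto-calibrated thresholds are not reproduced, and the motion file format is assumed.

```python
import numpy as np

def framewise_displacement(motion_params, head_radius_mm=50.0):
    """FD per volume from realignment parameters.

    motion_params: (n_volumes, 6) array with columns
    [trans_x, trans_y, trans_z, rot_x, rot_y, rot_z];
    translations in mm, rotations in radians.
    """
    deltas = np.abs(np.diff(motion_params, axis=0))
    deltas[:, 3:] *= head_radius_mm          # arc-length approximation for rotations
    fd = deltas.sum(axis=1)
    return np.concatenate([[0.0], fd])       # FD is undefined for the first volume

# Illustrative use with random parameters standing in for a real motion file.
rng = np.random.default_rng(0)
params = rng.normal(scale=0.05, size=(200, 6))
fd = framewise_displacement(params)
print("volumes exceeding a 0.5 mm threshold:", int((fd > 0.5).sum()))
```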
APA, Harvard, Vancouver, ISO, and other styles
8

Torkler, Phillipp [Verfasser], and Johannes [Akademischer Betreuer] Söding. "STAMMP : A statistical model and processing pipeline for PAR-CLIP data reveals transcriptome maps of mRNP biogenesis factors / Phillipp Torkler. Betreuer: Johannes Söding." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2015. http://d-nb.info/1072376628/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Maarouf, Marwan Younes. "XML Integrated Environment For Service-Oriented Data Management." Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1180450288.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Severini, Nicola. "Analysis, Development and Experimentation of a Cognitive Discovery Pipeline for the Generation of Insights from Informal Knowledge." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/21013/.

Full text
Abstract:
The purpose of this thesis project is to bring the application of Cognitive Discovery to an informal type of knowledge. Cognitive Discovery is a term coined by IBM Research to indicate a series of Information Extraction (IE) processes used to build a knowledge graph capable of representing knowledge from highly unstructured data such as text. Cognitive Discovery is typically applied to formal knowledge, i.e. documented text such as academic papers, business reports, patents, etc. Informal knowledge, by contrast, is provided, for example, by the recording of a conversation in a meeting or by a PowerPoint presentation, and is therefore a type of knowledge that is not formally defined. The idea behind the project is the same as that of the original Cognitive Discovery project: processing natural language in order to build a knowledge graph that can be interrogated in different ways. The architecture of this knowledge graph depends on the use case, but it tends to be a network of entity nodes connected to each other through semantic relationships and to nodes containing structural data such as a paragraph, an image or a slide from a presentation. The creation of this graph requires a series of steps, a data processing pipeline in which, starting from the raw data (in the prototype, the audio file of the conversation), a series of features is extracted and processed, such as entities, semantic relationships between entities, and main concepts. Once the graph has been created, it is necessary to define an engine for querying and/or generating insights from the knowledge graph. In general, the graph database infrastructure also provides a language for querying the graph; however, to make the application usable even by those who lack the technical knowledge needed to learn the query language, a component has been defined that processes a natural language query in order to query the graph.
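A minimal sketch of the graph-building step described above, assuming hypothetical (subject, relation, object) triples have already been extracted from a meeting transcript by an upstream IE step; the thesis's actual extraction components and graph database are not reproduced here.

```python
import networkx as nx

# Hypothetical triples extracted from a meeting transcript by an upstream IE step.
triples = [
    ("Project Alpha", "has_deadline", "Q3"),
    ("Alice", "responsible_for", "Project Alpha"),
    ("Project Alpha", "mentioned_in", "slide_12"),
]

graph = nx.MultiDiGraph()
for subj, rel, obj in triples:
    graph.add_edge(subj, obj, relation=rel)   # entity and structural nodes, typed edges

# Toy "query": which nodes are directly connected to "Project Alpha"?
neighbours = list(graph.successors("Project Alpha")) + list(graph.predecessors("Project Alpha"))
print(sorted(set(neighbours)))
```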
APA, Harvard, Vancouver, ISO, and other styles
11

Lundgren, Therese. "Digitizing the Parthenon using 3D Scanning : Managing Huge Datasets." Thesis, Linköping University, Department of Science and Technology, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2636.

Full text
Abstract:

Digitizing objects and environments from the real world has become an important part of creating realistic computer graphics. Through the use of structured lighting and laser time-of-flight measurements, the capturing of geometric models is now a common process. The results are visualizations in which viewers gain new possibilities for both visual and intellectual experiences.

This thesis presents the reconstruction of the Parthenon temple and its environment in Athens, Greece, using a 3D laser-scanning technique.

In order to reconstruct a realistic model using 3D scanning techniques, there are various phases in which the acquired datasets have to be processed. The data has to be organized, registered and integrated, in addition to pre- and post-processing. This thesis describes the development of a suitable and efficient data processing pipeline for the given data.

The approach differs from previous scanning projects in that this large-scale object is digitized at very high resolution. In particular, the issue of managing and processing huge datasets is described.

Finally, the processing of the datasets in the different phases and the resulting 3D model of the Parthenon are presented and evaluated.
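A schematic of such a scan-processing pipeline, expressed as a minimal sketch; the stage names follow the abstract (organize, register, integrate), but the function bodies are illustrative placeholders rather than the thesis's actual algorithms.

```python
import numpy as np

def organize(scans):
    """Group raw scans; here simply drop empty ones."""
    return [s for s in scans if len(s) > 0]

def register(scans):
    """Bring scans into a common frame; here a placeholder rigid transform per scan."""
    registered = []
    for i, pts in enumerate(scans):
        offset = np.array([i * 0.01, 0.0, 0.0])   # stand-in for a real alignment result
        registered.append(pts + offset)
    return registered

def integrate(scans):
    """Merge registered scans into a single point set."""
    return np.vstack(scans)

rng = np.random.default_rng(1)
raw_scans = [rng.random((1000, 3)) for _ in range(4)]   # hypothetical point clouds
model = integrate(register(organize(raw_scans)))
print(model.shape)
```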

APA, Harvard, Vancouver, ISO, and other styles
12

Ayala, Cabrera David. "Characterization of components of water supply systems from GPR images and tools of intelligent data analysis." Doctoral thesis, Universitat Politècnica de València, 2015. http://hdl.handle.net/10251/59235.

Full text
Abstract:
Over time, due to multiple operational and maintenance activities, the networks of water supply systems (WSSs) undergo interventions and modifications, or are even closed. In many cases, these activities are not properly registered. Knowledge of the paths and characteristics (status, age, etc.) of the WSS pipes is obviously necessary for efficient and dynamic management of such systems. This problem is greatly compounded when the detection and control of leaks is considered. Access to reliable leakage information is a complex task. In many cases, leaks are detected when the damage is already considerable, which brings high social and economic costs. In this sense, non-destructive methods (e.g., ground penetrating radar - GPR) may be a constructive response to these problems, since they allow, as evidenced in this thesis, pipe paths to be ascertained, component characteristics to be identified, and water leaks to be detected while they are still incipient. The selection of GPR in this work is justified by its characteristics as a non-destructive technique that allows both metallic and non-metallic objects to be studied. Although the capture of information with GPR is usually successful, such aspects as the capture settings, the large volume of generated information, and the use and interpretation of that information require a high level of skill and experience. This dissertation may be seen as a step forward towards the development of tools able to tackle the problem of lack of knowledge about buried WSS assets. The main objective of this doctoral work is thus to generate tools and assess the feasibility of applying them to the characterization of components of WSSs from GPR images. In this work we have carried out laboratory tests specifically designed to propose, develop and evaluate methods for the characterization of buried WSS components. Additionally, we have conducted field tests, which have enabled us to determine the feasibility of implementing such methodologies under uncontrolled conditions. The methodologies developed are based on techniques of intelligent data analysis. The basic principle of this work has been the processing of data obtained through GPR in search of useful information about WSS components, with special emphasis on the pipes. After performing numerous activities, one can conclude that, using GPR images, it is feasible to obtain more information than the typical identification of hyperbolae currently performed, and that this information can be observed directly and more simply using the methodologies proposed in this doctoral work. These methodologies also prove that it is feasible to identify patterns (especially with the pre-processing algorithm termed Agent race) that provide a fairly good approximation of the location of leaks in WSSs. In the case of pipes, other characteristics such as diameter and material can also be obtained. The main outcomes of this thesis are a series of tools we have developed to locate, identify and visualize WSS components from GPR images. Most interestingly, the data are synthesized and reduced in such a way that the characteristics of the different components recorded in the GPR images are preserved. The ultimate goal is that the developed tools facilitate decision-making in the technical management of WSSs, and that such tools can be operated even by personnel with limited experience in handling non-destructive methodologies, specifically GPR.
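As a toy illustration of the kind of matrix processing involved, the sketch below flags columns (traces) of a simulated GPR B-scan whose energy deviates strongly from the median, a generic anomaly cue; it is not the Agent race algorithm or any other method developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated B-scan: rows are time samples, columns are traces along the survey line.
bscan = rng.normal(scale=1.0, size=(256, 120))
bscan[:, 60:64] += rng.normal(scale=4.0, size=(256, 4))   # injected anomaly (e.g., a buried object)

trace_energy = (bscan ** 2).sum(axis=0)
median = np.median(trace_energy)
mad = np.median(np.abs(trace_energy - median))            # robust spread estimate

candidates = np.where(trace_energy > median + 5 * mad)[0]
print("candidate anomalous traces:", candidates)
```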
Ayala Cabrera, D. (2015). Characterization of components of water supply systems from GPR images and tools of intelligent data analysis [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59235
APA, Harvard, Vancouver, ISO, and other styles
13

Eriksson, Caroline, and Emilia Kallis. "NLP-Assisted Workflow Improving Bug Ticket Handling." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301248.

Full text
Abstract:
Software companies spend a lot of resources on debugging, a process where previous solutions can help in solving current problems. The bug tickets containing this information are often time-consuming to read. To minimize the time spent on debugging and to make sure that the knowledge from prior solutions is kept within the company, an evaluation was made to see if summaries could make this process more efficient. Abstractive and extractive summarization models were tested for this task, and fine-tuning of the bert-extractive-summarizer was performed. The model-generated summaries were compared in terms of perceived quality, speed, similarity to each other, and summary length. The average description summary contained part of the information needed, and the proposed solution was either well documented or did not address the problem at all. The fine-tuned extractive model and the abstractive model BART provided good conditions for generating summaries containing all the information needed.
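A minimal sketch of the abstractive side of this comparison, using the Hugging Face transformers pipeline with a public BART summarization checkpoint; the tickets, fine-tuning procedure and evaluation setup of the thesis are not reproduced, and the example text is invented.

```python
from transformers import pipeline

# Public BART checkpoint fine-tuned for summarization (not the thesis's fine-tuned models).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

ticket = (
    "After upgrading to version 2.3 the export job fails with a timeout. "
    "Increasing the worker pool did not help. The issue was traced to a "
    "missing index on the orders table; adding the index resolved the timeout."
)

summary = summarizer(ticket, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```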
APA, Harvard, Vancouver, ISO, and other styles
14

Heidar, Ryad. "Architectures pour le traitement d'images." Grenoble INPG, 1989. http://www.theses.fr/1989INPG0076.

Full text
Abstract:
The author proposes an image processing system and its hardware implementation using pipeline techniques and a processing approach based on image-frame recirculation. The system is based on the association of dual-access image memories with real-time pipelined operators, such as a programmable 3x3 mask filter. Separating the memory buses avoids rapid saturation of processing speed and data-transfer rates when processing a large volume of information. The performance is evaluated and proposals for future development are presented.
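To make the 3x3 mask filter concrete, here is a minimal software sketch of the operation that such pipelined hardware would apply to each frame, assuming a simple smoothing kernel as the programmed mask; the thesis describes a hardware implementation, not this code.

```python
import numpy as np
from scipy.signal import convolve2d

# Programmable 3x3 mask; a uniform smoothing kernel is used here as an example.
mask = np.full((3, 3), 1.0 / 9.0)

frame = np.random.default_rng(3).integers(0, 256, size=(64, 64)).astype(float)
filtered = convolve2d(frame, mask, mode="same", boundary="symm")

print(filtered.shape)   # one filtered frame per pass through the pipeline
```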
APA, Harvard, Vancouver, ISO, and other styles
15

Roy, Simon A. "Data processing pipelines tailored for imaging Fourier-transform spectrometers." Thesis, Université Laval, 2008. http://www.theses.ulaval.ca/2008/25682/25682.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Giovanelli, Joseph. "AutoML: A new methodology to automate data pre-processing pipelines." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20422/.

Full text
Abstract:
It is well known that we are living in the Big Data Era. Indeed, the exponential growth of Internet of Things, Web of Things and Pervasive Computing systems has greatly increased the amount of stored data. Thanks to the availability of data, the Data Scientist has become one of the most sought-after figures, capable of transforming data, performing analysis on it, and applying Machine Learning techniques to improve the business decisions of companies. Yet Data Scientists do not scale. It is almost impossible to balance their number against the effort required to analyze the ever-growing volumes of available data. Furthermore, today more and more non-experts use Machine Learning tools to perform data analysis without having the required knowledge. To this end, tools that help them throughout the Machine Learning process have been developed and are typically referred to as AutoML tools. However, even with such tools, raw data (i.e., data that has not been pre-processed) is rarely ready to be consumed and generally performs poorly when consumed in raw form. A pre-processing phase (i.e., the application of a set of transformations), which improves the quality of the data and makes it suitable for the algorithms, is usually required. Most AutoML tools do not consider this preliminary part, even though it has already been shown to improve the final performance. Moreover, the few works that actually support pre-processing provide just the application of a fixed series of transformations decided a priori, not considering the nature of the data, the algorithm used, or simply the fact that the order of the transformations could affect the final result. In this thesis we propose a new methodology that provides a series of pre-processing transformations tailored to the specific case at hand. Our approach analyzes the nature of the data, the algorithm we intend to use, and the impact that the order of transformations could have.
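A minimal sketch of why transformation order matters, assuming a generic scikit-learn setup (scaling and PCA around the same classifier); the datasets, search strategy and transformation catalogue of the thesis are not reproduced.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The same two transformations in different orders around the same final algorithm.
scale_then_pca = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression(max_iter=5000)),
])
pca_then_scale = Pipeline([
    ("pca", PCA(n_components=5)),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=5000)),
])

for name, pipe in [("scale -> pca", scale_then_pca), ("pca -> scale", pca_then_scale)]:
    score = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy = {score:.3f}")
```

Because PCA is sensitive to feature scale, the two orderings select different components and typically yield different accuracies, which is the kind of effect the proposed methodology takes into account.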
APA, Harvard, Vancouver, ISO, and other styles
17

Tallberg, Sebastian. "A COMPARISON OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING PIPELINES." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-48744.

Full text
Abstract:
In recent years there has been an increasing demand for real-time streaming applications that handle large volumes of data with low latency. Examples of such applications include real-time monitoring and analytics, electronic trading, advertising, fraud detection, and more. In a streaming pipeline the first step is ingesting the incoming data events, after which they can be sent off for processing. Choosing the correct tool that satisfies application requirements is an important technical decision that must be made. This thesis focuses entirely on the data ingestion part by evaluating three different platforms: Apache Kafka, Apache Pulsar and Redis Streams. The platforms are compared on both characteristics and performance. Architectural and design differences reveal that Kafka and Pulsar are more suited to use cases involving long-term persistent storage of events, whereas Redis is a potential solution when only short-term persistence is required. They all provide means for scalability and fault tolerance, ensuring high availability and reliable service. Two metrics, throughput and latency, were used to evaluate performance in a single-node cluster. Kafka proves to be the most consistent in throughput but performs the worst in latency. Pulsar manages high throughput with small message sizes but struggles with larger message sizes. Pulsar performs the best in overall average latency across all message sizes tested, followed by Redis. The tests also show that Redis is the most inconsistent in terms of throughput potential across different message sizes.
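As a small illustration of the ingestion step being compared, the sketch below produces and consumes events with Redis Streams via redis-py and measures a rough produce-to-consume delay; it assumes a locally running Redis server and is not the thesis's benchmarking harness.

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)   # assumes a locally running Redis server
stream = "ingest-demo"

# Produce a few events, embedding the send timestamp in each message.
for i in range(100):
    r.xadd(stream, {"payload": f"event-{i}", "sent_ns": time.time_ns()})

# Consume them back and compute a rough produce-to-consume delay.
entries = r.xread({stream: "0"}, count=100)
delays_ms = [
    (time.time_ns() - int(fields[b"sent_ns"])) / 1e6
    for _, messages in entries
    for _, fields in messages
]
print(f"mean delay: {sum(delays_ms) / len(delays_ms):.2f} ms")
```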
APA, Harvard, Vancouver, ISO, and other styles
18

Harrison, William. "Malleability, obliviousness and aspects for broadcast service attachment." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2010/4138/.

Full text
Abstract:
An important characteristic of Service-Oriented Architectures is that clients do not depend on the service implementation's internal assignment of methods to objects. It is perhaps the most important technical characteristic that differentiates them from more common object-oriented solutions. This characteristic makes clients and services malleable, allowing them to be rearranged at run-time as circumstances change. That improvement in malleability is impaired by requiring clients to direct service requests to particular services. Ideally, the clients are totally oblivious to the service structure, as they are to aspect structure in aspect-oriented software. Removing knowledge of a method implementation's location, whether in object or service, requires re-defining the boundary line between programming language and middleware, making clearer specification of dependence on protocols, and bringing the transaction-like concept of failure scopes into language semantics as well. This paper explores consequences and advantages of a transition from object-request brokering to service-request brokering, including the potential to improve our ability to write more parallel software.
APA, Harvard, Vancouver, ISO, and other styles
19

Du, Wei. "Advanced middleware support for distributed data-intensive applications." Connect to resource, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1126208308.

Full text
Abstract:
Thesis (Ph. D.)--Ohio State University, 2005.
Title from first page of PDF file. Document formatted into pages; contains xix, 183 p.; also includes graphics (some col.). Includes bibliographical references (p. 170-183). Available online via OhioLINK's ETD Center
APA, Harvard, Vancouver, ISO, and other styles
20

Li, Yunming. "Machine vision algorithms for mining equipment automation." Thesis, Queensland University of Technology, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
21

Hobson, Alan George Cawood. "Optimising the renewal of natural gas reticulation pipes using GIS." Thesis, Stellenbosch : Stellenbosch University, 2002. http://hdl.handle.net/10019.1/52980.

Full text
Abstract:
Thesis (MA)--University of Stellenbosch, 2002.
A major concern for Energex, Australia's largest energy utility in South East Queensland, is the escape of natural gas from their reticulation systems. Within many of the older areas of Brisbane, these networks operate primarily at low and medium pressure, with a significant percentage of mains being cast iron or steel. Over many years pipes in these networks have been replaced, yet reports show that unaccounted-for gas from the same networks remains high. Furthermore, operation and maintenance budgets for these networks are high, with many of these pipes close to the end of their economic life. When operation and maintenance costs exceed the costs of replacement, the Energex gas utility initiates projects to renew reticulation networks with polyethylene pipes. Making decisions about pipe renewal requires an evaluation of historical records from a number of sources, namely:
• gas consumption figures,
• history of leaks,
• maintenance and other related costs, and
• the loss of revenue attributable to unaccounted-for gas.
Financial justification of capital expenditure has always been a requirement for renewal projects at the Energex gas utility; however, the impact of deregulation in the energy utility market has necessitated a review of their financial assessment of capital projects. The Energex gas utility has developed an application that evaluates the financial viability of renewal projects. This research demonstrates the role of GIS integration with the Energex financial application. The results of this study showed that a GIS-integrated renewal planning approach brings significant benefits, including:
• efficient selection of a sub-network based on pipe connectivity,
• discovery of hidden relationships between spatially enabled alphanumeric data and environmental information that improves decision-making, and
• enhanced testing of proposed renewal design options by scrutinizing the attributes of spatial data.
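A minimal sketch of the renewal test described above (renew when ongoing operation and maintenance costs exceed an annualized replacement cost); the figures and the actual financial model used by Energex are hypothetical here.

```python
def should_renew(annual_om_cost, replacement_cost, lifetime_years, discount_rate=0.06):
    """Compare yearly O&M spend against an equivalent annual replacement cost."""
    annuity_factor = (1 - (1 + discount_rate) ** -lifetime_years) / discount_rate
    equivalent_annual_replacement = replacement_cost / annuity_factor
    return annual_om_cost > equivalent_annual_replacement

# Hypothetical sub-network: 1.2 km of cast-iron main.
print(should_renew(annual_om_cost=45_000, replacement_cost=400_000, lifetime_years=50))
```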
APA, Harvard, Vancouver, ISO, and other styles
22

Corista, Pedro André da Silva. "IoT data processing pipeline in FoF perspective." Master's thesis, 2017. http://hdl.handle.net/10362/28225.

Full text
Abstract:
With the development of contemporary industry, the concepts of ICT and IoT are gaining more importance, as they are the foundation for the systems of the future. Most current solutions converge on transforming the traditional industry into new smart interconnected factories, aware of their context, adaptable to different environments and capable of fully using their resources. However, the full potential of ICT manufacturing has not been achieved, since there is no universal or standard architecture or model that can be applied to all existing systems to tackle the heterogeneity of the existing devices. In a typical factory there exists a large amount of information that needs to be processed by the system in order to define event rules according to the related contextual knowledge and later execute the needed actions. However, this information is sometimes heterogeneous, meaning that it cannot be accessed or understood by the components of the system. This dissertation analyses the existing theories and models that may lead to seamless and homogeneous data exchange and contextual interpretation. A framework based on these theories is proposed, which aims to explore the formalization of situational context in order to provide appropriate actions.
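A toy sketch of the "event rules plus context" idea described above: a rule fires only when the sensor reading and the current situational context match. The rule, sensor names and context values are invented for illustration and do not come from the dissertation.

```python
from dataclasses import dataclass

@dataclass
class Context:
    shift: str          # e.g. "night"
    machine_state: str  # e.g. "running"

def evaluate_rules(reading: dict, ctx: Context) -> list[str]:
    """Return the actions triggered by one sensor reading under the given context."""
    actions = []
    # Rule: overheating while the machine is running during the night shift -> alert maintenance.
    if reading["sensor"] == "temperature" and reading["value"] > 80 \
            and ctx.machine_state == "running" and ctx.shift == "night":
        actions.append("notify_maintenance")
    return actions

print(evaluate_rules({"sensor": "temperature", "value": 85},
                     Context(shift="night", machine_state="running")))
```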
APA, Harvard, Vancouver, ISO, and other styles
23

Santos, João Guilherme Basílio dos. "Photometry Data Processing for ESA's CHEOPS space mission." Master's thesis, 2018. http://hdl.handle.net/10316/86206.

Full text
Abstract:
Master's dissertation in Astrophysics and Instrumentation for Space presented to the Faculdade de Ciências e Tecnologia
The research surrounding the search for and study of extra-solar planets is growing quickly and is defined as one of the main priorities of the European Space Agency (ESA) for the Cosmic Vision 2015-2025 programme, with missions such as CHEOPS, to be launched at the beginning of 2019, PLATO (PLAnetary Transits and Oscillations of stars) around 2026, and ARIEL (Atmospheric Remote-sensing Exoplanet Large-survey) in 2028. These missions will be fundamental for the comprehension and study of the field of planetary science. The role of charge-coupled devices (CCDs) in this field is notable, as these detectors have been and will be extensively used in both space and ground-based surveys. Their high efficiency in the optical region of the electromagnetic spectrum is one of the most important factors in the choice of this detector. In subsection 2.3 we present the CCD and its main characteristics. Throughout the years it has been necessary not only to collect and store the information gathered by these detectors, but also to pre-process the collected raw data, removing errors that naturally arise from the detector's and electronics' intrinsic characteristics, as well as from environmental effects. This correction process later facilitates the subsequent data analysis done by the science teams. The CHEOPS mission is no exception, since a data reduction pipeline, hereafter DRP, has been developed. This pipeline is software developed in the Python programming language that corrects the raw images for undesired instrumental and astrophysical signals and then extracts the light curve of the target star. The light curve represents the variation of the flux received from a star with time. It may be used in exoplanet science to study the flux changes created by a planet transiting its host star.
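A minimal sketch of the two stages the abstract describes, image correction and light-curve extraction, using standard bias/dark/flat calibration and simple aperture photometry on synthetic frames; this is not the CHEOPS DRP itself, and all arrays below are invented.

```python
import numpy as np

def calibrate(raw, bias, dark, flat):
    """Standard CCD calibration: remove bias and dark, divide by the normalized flat."""
    return (raw - bias - dark) / (flat / flat.mean())

def aperture_flux(image, center, radius):
    """Sum the pixel values inside a circular aperture around the target star."""
    yy, xx = np.mgrid[:image.shape[0], :image.shape[1]]
    mask = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    return image[mask].sum()

rng = np.random.default_rng(5)
bias = np.full((64, 64), 100.0)
dark = np.full((64, 64), 2.0)
flat = rng.normal(1.0, 0.01, (64, 64))

light_curve = []
for _ in range(20):                                   # 20 synthetic exposures
    star = np.zeros((64, 64))
    star[30:34, 30:34] = 500.0                        # toy stellar image
    raw = bias + dark + star * flat + rng.normal(0, 3, (64, 64))
    frame = calibrate(raw, bias, dark, flat)
    light_curve.append(aperture_flux(frame, center=(32, 32), radius=5))

print(np.round(light_curve[:5], 1))
```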
APA, Harvard, Vancouver, ISO, and other styles
24

Morris, Joseph P. "An analysis pipeline for the processing, annotation, and dissemination of expressed sequence tags." 2009. http://etd.louisville.edu/data/UofL0482t2009.pdf.

Full text
Abstract:
Thesis (M.Eng.)--University of Louisville, 2009.
Title and description from thesis home page (viewed May 22, 2009). Department of Computer Engineering and Computer Science. Vita. "May 2009." Includes bibliographical references (p. 39-41).
APA, Harvard, Vancouver, ISO, and other styles
25

(9708467), Siddhant Srinath Betrabet. "Data Acquisition and Processing Pipeline for E-Scooter Tracking Using 3D LIDAR and Multi-Camera Setup." Thesis, 2021.

Find full text
Abstract:

Analyzing behaviors of objects on the road is a complex task that requires data from various sensors and their fusion to recreate the movement of objects with a high degree of accuracy. A data collection and processing system is thus needed to track the objects accurately and to build a clear map of their trajectories relative to the various coordinate frame(s) of interest. Detection and tracking of moving objects (DATMO) and simultaneous localization and mapping (SLAM) are the tasks that need to be solved in conjunction to create a clear map of the road comprising the moving and static objects.

Solutions to these computational problems are commonly used to aid scenario reconstruction for the objects of interest. The tracking of objects can be done in various ways, utilizing sensors such as monocular or stereo cameras, Light Detection and Ranging (LIDAR) sensors, and Inertial Navigation Systems (INS). One relatively common method for solving DATMO and SLAM utilizes a 3D LIDAR and multiple monocular cameras in conjunction with an inertial measurement unit (IMU); this provides redundancy, so that sensor fusion can maintain object classification and tracking in cases when the traditional algorithms for a single sensor prove ineffectual because that sensor falls short due to its limitations. Using the IMU together with sensor-fusion methods largely eliminates the need for an expensive INS rig. Fusing these sensors allows for more effective tracking that exploits the full potential of each sensor while enabling methods that increase perceptual accuracy.

The focus of this thesis is the dock-less e-scooter, and the primary goal is to track its movements effectively and accurately with respect to cars on the road and the world. Since it is relatively more common to observe a car on the road than an e-scooter, we propose a data collection system that can be built on top of an e-scooter, together with an offline processing pipeline, to collect data for understanding the behavior of the e-scooters themselves. In this thesis, we explore a data collection system involving a 3D LIDAR sensor, multiple monocular cameras, and an IMU on an e-scooter, as well as an offline method for processing the collected data to aid scenario reconstruction.
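The abstract describes DATMO and sensor fusion only in general terms. As a purely illustrative aside (not code from the thesis), the sketch below shows the kind of constant-velocity Kalman filter such a tracking pipeline could use once LIDAR/camera detections have been fused into (x, y) positions; all matrices and noise values are assumed for the example.

    import numpy as np

    class ConstantVelocityTrack:
        # Minimal 2D constant-velocity Kalman filter for one tracked object.
        def __init__(self, x, y, dt=0.1):
            self.state = np.array([x, y, 0.0, 0.0])          # [x, y, vx, vy]
            self.P = np.eye(4)                                # state covariance
            self.F = np.array([[1, 0, dt, 0],                 # constant-velocity motion model
                               [0, 1, 0, dt],
                               [0, 0, 1, 0],
                               [0, 0, 0, 1]], dtype=float)
            self.H = np.array([[1, 0, 0, 0],                  # only position is observed
                               [0, 1, 0, 0]], dtype=float)
            self.Q = 0.01 * np.eye(4)                         # assumed process noise
            self.R = 0.25 * np.eye(2)                         # assumed measurement noise

        def predict(self):
            self.state = self.F @ self.state
            self.P = self.F @ self.P @ self.F.T + self.Q

        def update(self, z):
            # z: fused (x, y) detection associated with this track.
            y = z - self.H @ self.state
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.state = self.state + K @ y
            self.P = (np.eye(4) - K @ self.H) @ self.P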


APA, Harvard, Vancouver, ISO, and other styles
26

Betrabet, Siddhant S. "Data Acquisition and Processing Pipeline for E-Scooter Tracking Using 3d Lidar and Multi-Camera Setup." Thesis, 2020. http://hdl.handle.net/1805/24776.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Analyzing behaviors of objects on the road is a complex task that requires data from various sensors and their fusion to recreate the movement of objects with a high degree of accuracy. A data collection and processing system is thus needed to track the objects accurately and to build a clear map of their trajectories relative to the various coordinate frame(s) of interest. Detection and tracking of moving objects (DATMO) and simultaneous localization and mapping (SLAM) are the tasks that need to be solved in conjunction to create a clear map of the road comprising the moving and static objects. Solutions to these computational problems are commonly used to aid scenario reconstruction for the objects of interest. The tracking of objects can be done in various ways, utilizing sensors such as monocular or stereo cameras, Light Detection and Ranging (LIDAR) sensors, and Inertial Navigation Systems (INS). One relatively common method for solving DATMO and SLAM utilizes a 3D LIDAR and multiple monocular cameras in conjunction with an inertial measurement unit (IMU); this provides redundancy, so that sensor fusion can maintain object classification and tracking in cases when the traditional algorithms for a single sensor prove ineffectual because that sensor falls short due to its limitations. Using the IMU together with sensor-fusion methods largely eliminates the need for an expensive INS rig. Fusing these sensors allows for more effective tracking that exploits the full potential of each sensor while enabling methods that increase perceptual accuracy. The focus of this thesis is the dock-less e-scooter, and the primary goal is to track its movements effectively and accurately with respect to cars on the road and the world. Since it is relatively more common to observe a car on the road than an e-scooter, we propose a data collection system that can be built on top of an e-scooter, together with an offline processing pipeline, to collect data for understanding the behavior of the e-scooters themselves. In this thesis, we explore a data collection system involving a 3D LIDAR sensor, multiple monocular cameras, and an IMU on an e-scooter, as well as an offline method for processing the collected data to aid scenario reconstruction.
APA, Harvard, Vancouver, ISO, and other styles
27

Kaever, Peter, Wolfgang Oertel, Axel Renno, Peter Seidel, Markus Meyer, Markus Reuter, and Stefan König. "A Versatile Sensor Data Processing Framework for Resource Technology." 2021. https://htw-dresden.qucosa.de/id/qucosa%3A75233.

Full text
Abstract:
Novel sensors with the ability to collect qualitatively new information offer the potential to improve experimental infrastructure and methods in the field of research technology. In order to get full access to this information, the entire chain from detector readout and data transfer, through proper data and knowledge models, up to complex application functions has to be covered. The extension of existing scientific instruments comprises the integration of diverse sensor information into existing hardware, based on the expansion of pivotal event schemes and data models. Due to its flexible approach, the proposed framework has the potential to integrate additional sensor types and offers migration capabilities to high-performance computing platforms. Two different implementation setups prove the flexibility of this approach: one extends the sensitivity and material-analysis capabilities of a secondary ion mass spectrometry device, the other implements a functional prototype for the online analysis of recyclate. Both setups can be regarded as complementary parts of a highly topical and ground-breaking scientific application field. The requirements and possibilities resulting from different hardware concepts on the one hand and diverse application fields on the other form the basis for the development of a versatile software framework. In order to support complex and efficient application functions under heterogeneous and flexible technical conditions, a software technology is proposed that offers modular processing pipeline structures with internal and external data interfaces, backed by a knowledge base with respective configuration and conclusion mechanisms. Contents: 1. Introduction; 2. Hardware Architecture and Application Background; 3. Software Concept; 4. Experimental Results; 5. Conclusion and Outlook.
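The framework itself is not reproduced here; the sketch below is only a minimal, hypothetical Python illustration of the modular pipeline idea the abstract describes, in which sensor-input, processing, and output stages are chained and configured independently (the stage names and calibration gain are invented for the example).

    from typing import Any, Callable, Iterable

    class Pipeline:
        # Chain independent processing stages; each stage maps a record to a record.
        def __init__(self, stages: Iterable[Callable[[Any], Any]]):
            self.stages = list(stages)

        def process(self, record: Any) -> Any:
            for stage in self.stages:
                record = stage(record)
            return record

    # Hypothetical stages, for illustration only.
    def read_sensor(record):
        record["counts"] = record.get("raw", 0)
        return record

    def calibrate(record):
        record["value"] = record["counts"] * 0.5   # assumed calibration gain
        return record

    pipeline = Pipeline([read_sensor, calibrate])
    print(pipeline.process({"raw": 42}))           # {'raw': 42, 'counts': 42, 'value': 21.0}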
APA, Harvard, Vancouver, ISO, and other styles
28

Kikta, Marcel. "Vyhodnocování relačních dotazů v proudově orientovaném prostředí" [Evaluation of relational queries in a stream-oriented environment]. Master's thesis, 2014. http://www.nusl.cz/ntk/nusl-341206.

Full text
Abstract:
This thesis deals with the design and implementation of an optimizer and a transformer of relational queries. Firstly, the thesis describes the theory of relational query compilers. Secondly, we present the data structures and algorithms used in the implemented tool. Finally, the important implementation details of the developed tool are discussed. Part of the thesis is the selection of the supported relational algebra operators and the design of an appropriate input format. The input of the implemented software is a query written in an XML file in the form of relational algebra. The query is optimized and transformed into a physical plan that is executed in the parallelization framework Bobox. The developed compiler outputs the physical plan in the Bobolang language, which serves as the input for Bobox.
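The abstract outlines the compiler's flow (XML relational algebra in, optimized Bobolang physical plan out) without detail. The following sketch is not the thesis implementation and does not reproduce Bobolang syntax; it is a minimal, assumed illustration of the transformation step, mapping logical relational-algebra operators onto hypothetical physical operator names over a small plan tree.

    # A logical plan node is modelled as a tuple: (operator, [child nodes], arguments).
    LOGICAL_TO_PHYSICAL = {
        "selection": "filter",
        "projection": "project",
        "join": "hash_join",   # a real optimizer would pick the join algorithm from statistics
    }

    def to_physical(node):
        op, children, args = node
        return (LOGICAL_TO_PHYSICAL.get(op, op),
                [to_physical(child) for child in children],
                args)

    logical = ("selection",
               [("join", [("scan", [], "R"), ("scan", [], "S")], "R.id = S.id")],
               "R.a > 5")
    print(to_physical(logical))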
APA, Harvard, Vancouver, ISO, and other styles
29

Karlsson, Christoffer. "The performance impact from processing clipped triangles in state-of-the-art games." Thesis, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16853.

Full text
Abstract:
Background. Modern game applications push hardware to its limits and affect how graphics hardware and APIs are designed. In games, rendering geometry plays a vital role, and the implementation of optimization techniques, such as view frustum culling, is generally necessary to meet the quality expected by the customers. Failing to optimize a game application can potentially lead to higher system requirements or less quality in terms of visual effects and content. Many optimization techniques, and studies of their performance, exist. However, no research was found that analyzed the utilization of GPU computational resources in state-of-the-art games. Objectives. The aim of this thesis was to investigate the potential problem of commercial game applications wasting computational resources. Specifically, the focus was set on the triangle data processed in the geometry stage of the graphics pipeline and the amount of triangles discarded through clipping. Methods. The objectives were met by conducting a case study and an empirical data analysis of the amount of triangles and entire draw calls that were discarded through clipping, as well as the vertex data size and the time spent processing these triangles, in eight games. The data was collected using Triangelplockaren, a tool which collects the triangle data that reaches the rasterizer stage. This data was then analyzed and discussed through relational findings in the results. Results. The results consisted of 30 captures of benchmark and gameplay sessions. The average of each captured session was used to make observations and to draw conclusions. Conclusions. This study showed evidence of noteworthy amounts of data being processed by the GPU that are later discarded through clipping in the graphics pipeline. This was seen in all of the game applications included in this study. While it was impossible to draw conclusions regarding the direct impact on performance, it was safe to say that the amount of clipped geometry relative to the geometry processed was significant in each of the analyzed cases, and in many cases extreme.
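To make the notion of triangles "discarded through clipping" concrete, here is a small illustrative test (not from the thesis or from Triangelplockaren) for the common case where an entire triangle falls outside one frustum plane in clip space; an OpenGL-style clip volume (-w <= x, y, z <= w) is assumed.

    import numpy as np

    def culled_by_clipping(clip_verts):
        # clip_verts: 3x4 array of triangle vertices in homogeneous clip space.
        # If all three vertices lie outside the same frustum plane, the whole
        # triangle is discarded before rasterization.
        x, y, z, w = clip_verts[:, 0], clip_verts[:, 1], clip_verts[:, 2], clip_verts[:, 3]
        planes = [x < -w, x > w, y < -w, y > w, z < -w, z > w]
        return any(p.all() for p in planes)

    # Example: a triangle entirely to the right of the view volume is clipped away.
    tri = np.array([[2.0, 0.0, 0.5, 1.0],
                    [3.0, 1.0, 0.5, 1.0],
                    [2.5, -1.0, 0.5, 1.0]])
    print(culled_by_clipping(tri))   # True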
APA, Harvard, Vancouver, ISO, and other styles
30

Wilson, Derek Alan. "A Dredging Knowledge-Base Expert System for Pipeline Dredges with Comparison to Field Data." 2010. http://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8653.

Full text
Abstract:
A Pipeline Analytical Program and Dredging Knowledge-Base Expert-System (DKBES) determine a pipeline dredge's production and resulting cost and schedule. Pipeline dredge engineering presents a complex and dynamic process necessary to maintain navigable waterways. Dredge engineers use pipeline engineering and slurry transport principles to determine the production rate of a pipeline dredge system. Engineers then use cost engineering factors to determine the expense of the dredge project. Previous work in engineering incorporated an object-oriented expert-system to determine cost and scheduling of mid-rise building construction, where data objects represent the fundamental elements of the construction process within the program execution. A previously developed dredge cost estimating spreadsheet program, which uses hydraulic engineering and slurry transport principles, determines the performance metrics of a dredge pump and pipeline system. This study focuses on combining hydraulic analysis with the functionality of an expert-system to determine the performance metrics of a dredge pump and pipeline system and its resulting schedule. Field data from the U.S. Army Corps of Engineers pipeline dredge, Goetz, and several contract daily dredge reports show how accurately the DKBES can predict pipeline dredge production. Real-time dredge instrumentation data from the Goetz compares the accuracy of the Pipeline Analytical Program to actual dredge operation. Comparison of the Pipeline Analytical Program to pipeline daily dredge reports shows how accurately the Pipeline Analytical Program can predict a dredge project's schedule over several months. Both of these comparisons determine the accuracy and validity of the Pipeline Analytical Program and DKBES as they calculate the performance metrics of the pipeline dredge project. The results of the study determined that the Pipeline Analytical Program compared closely to the Goetz field data where only pump and pipeline hydraulics affected the dredge production. Results from the dredge projects determined that the Pipeline Analytical Program underestimated actual long-term dredge production. Study results identified key similarities and differences between the DKBES and the spreadsheet program in terms of cost and scheduling. The study then draws conclusions based on these findings and offers recommendations for further use.
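As a rough, hedged illustration of the slurry-transport relation the abstract alludes to (this is not the DKBES or the spreadsheet program), solids production can be approximated as pipe cross-sectional area times slurry velocity times volumetric solids concentration; the numbers below are invented for the example.

    import math

    def solids_production_m3_per_h(pipe_diameter_m, velocity_m_s, volumetric_concentration):
        # Solids production ~ pipe cross-sectional area x slurry velocity x solids fraction.
        area = math.pi * pipe_diameter_m ** 2 / 4.0
        return area * velocity_m_s * volumetric_concentration * 3600.0

    # Invented example: 0.6 m pipe, 5 m/s slurry velocity, 15 % solids by volume.
    print(round(solids_production_m3_per_h(0.6, 5.0, 0.15)))   # about 763 m^3 of solids per hour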
APA, Harvard, Vancouver, ISO, and other styles
31

"Data processing pipelines tailored for imaging Fourier-transform spectrometers." Thesis, Université Laval, 2008. http://www.theses.ulaval.ca/2008/25682/25682.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles