Dissertations / Theses on the topic 'Very large data sets'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Very large data sets.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.
Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.
Quddus, Syed. "Accurate and efficient clustering algorithms for very large data sets." Thesis, Federation University Australia, 2017. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/162586.
Doctor of Philosophy
Harrington, Justin. "Extending linear grouping analysis and robust estimators for very large data sets." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/845.
Sandhu, Jatinder Singh. "Combining exploratory data analysis and scientific visualization in the study of very large, space-time data sets." The Ohio State University, 1990. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487683401443166.
Geppert, Leo Nikolaus. "Bayesian and frequentist regression approaches for very large data sets." Supervised by Katja Ickstadt; reviewed by Andreas Groll. Dortmund : Universitätsbibliothek Dortmund, 2018. http://d-nb.info/1181427479/34.
McNeil, Vivienne Heather. "Assessment methodologies for very large, irregularly collected water quality data sets with special reference to the natural waters of Queensland." Thesis, Queensland University of Technology, 2001.
Cordeiro, Robson Leonardo Ferreira. "Data mining in large sets of complex data." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-22112011-083653/.
The growth in volume and complexity of the data stored in organizations makes knowledge extraction with mining techniques a task that is at once essential for putting those data to good use in strategic decision making and computationally expensive. The cost comes from the need to explore a large number of cases, in different combinations, to obtain the desired knowledge. Traditionally, the data to be explored are represented as numerical or categorical attributes in a table in which each tuple describes one case of the set under analysis. Although the same tasks developed for traditional data are also needed for more complex data, such as images, graphs, audio, and long texts, the complexity of the analyses and the computational cost involved increase significantly, rendering most current analysis techniques impractical when applied to large amounts of such complex data. Special mining techniques must therefore be developed. This doctoral work aims at creating new mining techniques for large sets of complex data. Specifically, two new clustering techniques and one new labeling and summarization technique were developed; they are fast, scalable, and well suited to the analysis of large sets of complex data. The proposed techniques were evaluated on real, terabyte-scale databases containing up to billions of complex objects, and they consistently produced high-quality results, being in almost all cases at least one order of magnitude faster than the most efficient related work. The real data used come from the following applications: automatic breast cancer diagnosis, satellite image analysis, and graph mining applied to a large web graph collected by Yahoo! as well as to a graph of all users of the Twitter social network and their connections. These results indicate that our algorithms enable real-time applications that could potentially not be built without this doctoral work, such as a global-scale system to support medical diagnosis in real time, or a system to search for deforestation areas in the Amazon rainforest in real time.
Chaudhary, Amitabh. "Applied spatial data structures for large data sets." Available to US Hopkins community, 2002. http://wwwlib.umi.com/dissertations/dlnow/3068131.
Arvidsson, Johan. "Finding delta difference in large data sets." Thesis, Luleå tekniska universitet, Datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-74943.
Tricker, Edward A. "Detecting anomalous aggregations of data points in large data sets." Thesis, Imperial College London, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.512050.
Romig, Phillip R. "Parallel task processing of very large datasets." [Lincoln, Neb. : University of Nebraska-Lincoln], 1999. http://international.unl.edu/Private/1999/romigab.pdf.
Bate, Steven Mark. "Generalized linear models for large dependent data sets." Thesis, University College London (University of London), 2004. http://discovery.ucl.ac.uk/1446542/.
Hennessey, Anthony. "Statistical shape analysis of large molecular data sets." Thesis, University of Nottingham, 2018. http://eprints.nottingham.ac.uk/52088/.
Dementiev, Roman. "Algorithm engineering for large data sets : hardware, software, algorithms." Saarbrücken : VDM, Müller, 2006. http://d-nb.info/986494429/04.
Dementiev, Roman. "Algorithm engineering for large data sets : hardware, software, algorithms." Saarbrücken : VDM-Verl. Dr. Müller, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=3029033&prov=M&dok_var=1&dok_ext=htm.
Nair, Sumitra Sarada. "Function estimation using kernel methods for large data sets." Thesis, University of Sheffield, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.444581.
Rauschenberg, David Edward. "Computer-graphical exploration of large data sets from teletraffic." Diss., The University of Arizona, 1994. http://hdl.handle.net/10150/186645.
Farran, Bassam. "One-pass algorithms for large and shifting data sets." Thesis, University of Southampton, 2010. https://eprints.soton.ac.uk/159173/.
Mangalvedkar, Pallavi Ramachandra. "GPU-assisted rendering of large tree-shaped data sets." Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1195491112.
Toulis, Panagiotis. "Implicit methods for iterative estimation with large data sets." Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:33493434.
Full textStatistics
Kışınbay, Turgut. "Predictive ability or data snopping? : essays on forecasting with large data sets." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=85018.
Simonet, Anthony. "Active Data - Enabling Smart Data Life Cycle Management for Large Distributed Scientific Data Sets." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1004/document.
In all domains, scientific progress relies more and more on our ability to exploit ever-growing volumes of data. However, as data volumes increase, their management becomes more difficult. A key point is dealing with the complexity of data life cycle management, i.e. all the operations that happen to data between their creation and their deletion: transfer, archiving, replication, disposal, etc. These formerly straightforward operations become intractable when data volumes grow dramatically, because of the heterogeneity of data management software on the one hand, and the complexity of the infrastructures involved on the other. In this thesis, we introduce Active Data: a meta-model, an implementation, and a programming model that make it possible to represent, formally and graphically, the life cycle of data distributed across an assemblage of heterogeneous systems and infrastructures, naturally exposing replication, distribution, and different data identifiers. Once connected to existing applications, Active Data exposes the progress of data through their life cycle at runtime to users and programs, and keeps track of data as they pass from one system to another. The Active Data programming model allows code to be executed at each step of the data life cycle. Programs developed with Active Data have access at any time to the complete state of the data in every system and infrastructure they are distributed to. We present micro-benchmarks and usage scenarios that demonstrate the expressivity of the programming model and the quality of the implementation. Finally, we describe a data surveillance framework based on Active Data, built for the Advanced Photon Source experiment, that allows scientists to monitor the progress of their data, automate most manual tasks, extract relevant notifications from a huge number of events, and detect and recover from errors without human intervention. This work opens up interesting perspectives in data provenance and open data in particular, while facilitating collaboration between scientists from different communities.
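The hook-based programming model described in this abstract can be pictured with a minimal sketch. The Dataset class, the life-cycle state names, and the on()/advance() API below are hypothetical illustrations of the general idea, not the actual Active Data meta-model or its implementation:

```python
class Dataset:
    # Hypothetical life-cycle steps and legal transitions for one data set.
    TRANSITIONS = {"created": {"transferred", "deleted"},
                   "transferred": {"archived", "replicated", "deleted"},
                   "replicated": {"archived", "deleted"},
                   "archived": {"deleted"},
                   "deleted": set()}

    def __init__(self, name):
        self.name, self.state = name, "created"
        self.handlers = {}  # life-cycle step -> callbacks to run on entry

    def on(self, state, callback):
        """Register user code to execute when the data enters a given step."""
        self.handlers.setdefault(state, []).append(callback)

    def advance(self, new_state):
        """Move the data to the next life-cycle step and fire its handlers."""
        if new_state not in self.TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        for callback in self.handlers.get(new_state, []):
            callback(self)  # user code sees the data set's current state

ds = Dataset("run-42")
ds.on("transferred", lambda d: print(f"{d.name}: verifying checksum"))
ds.on("archived", lambda d: print(f"{d.name}: notifying collaborators"))
ds.advance("transferred")
ds.advance("archived")
```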
Schwartz, Jeremy (Jeremy D. ). "A modified experts algorithm : using correlation to speed convergence with very large sets of experts." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/35642.
Includes bibliographical references (p. 121).
This paper discusses a modification to the Exploration-Exploitation Experts (EEE) algorithm. The EEE is a generalization of the standard experts algorithm designed for use in reactive environments. In these problems, the algorithm can only learn about the expert that it follows at any given stage; as a result, its convergence rate depends heavily on the number of experts it must consider. We adapt the algorithm for use with a very large set of experts, capitalizing on the fact that when a set of experts is large, many experts in the set tend to display similar behavior. We quantify this similarity with a concept called correlation, and use this correlation information to improve the convergence rate of the algorithm with respect to the number of experts. Experimental results show that, given the proper conditions, the convergence rate of the modified algorithm can be independent of the size of the expert space. (An illustrative sketch follows this entry.)
by Jeremy Schwartz.
S.M.
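The correlation idea referenced in the abstract above can be illustrated with a toy follow-the-expert loop in which the reward observed for the followed expert is shared with correlated experts, so the effective number of observations grows much faster than the number of phases. The environment, the correlation matrix, and all parameters below are hypothetical; this is an illustrative sketch, not the algorithm developed in the thesis:

```python
import random

def run_phases(experts, corr, reward, n_phases=2000, explore=0.1):
    """Follow one expert per phase; share its observed reward with
    correlated experts in proportion to corr[i][j]."""
    mean = {j: 0.0 for j in experts}  # estimated reward per expert
    n = {j: 0.0 for j in experts}     # effective observation count
    for _ in range(n_phases):
        if random.random() < explore:
            i = random.choice(experts)               # exploration phase
        else:
            i = max(experts, key=lambda j: mean[j])  # exploitation phase
        r = reward(i)                 # only the followed expert's payoff is seen
        for j in experts:             # propagate feedback to similar experts
            w = corr[i][j]
            if w > 0.0:
                n[j] += w
                mean[j] += w * (r - mean[j]) / n[j]
    return max(experts, key=lambda j: mean[j])

# Hypothetical environment: 100 experts whose true payoffs vary smoothly,
# so neighbouring experts behave similarly (high correlation).
experts = list(range(100))
true_payoff = [1.0 - abs(k - 63) / 100.0 for k in experts]
corr = [[max(0.0, 1.0 - abs(a - b) / 5.0) for b in experts] for a in experts]
reward = lambda i: true_payoff[i] + random.gauss(0.0, 0.1)
print("best expert found:", run_phases(experts, corr, reward))
```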
Ljung, Patric. "Efficient Methods for Direct Volume Rendering of Large Data Sets." Doctoral thesis, Norrköping : Department of Science and Technology, Linköping University, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-7232.
Full textLam, Heidi Lap Mun. "Visual exploratory analysis of large data sets : evaluation and application." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/839.
Full textUminsky, David. "Generalized Spectral Analysis for Large Sets of Approval Voting Data." Scholarship @ Claremont, 2003. https://scholarship.claremont.edu/hmc_theses/157.
Yagoubi, Djamel Edine. "Indexing and analysis of very large masses of time series." Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS084/document.
Time series arise in many application domains such as finance, agronomy, health, earth monitoring, and weather forecasting, to name a few. Because of advances in sensor technology, such applications may produce millions to trillions of time series per day, requiring fast analytical and summarization techniques. The processing of these massive volumes of data has opened up new challenges in time series data mining. In particular, there is a need to improve indexing techniques, which have shown poor performance when processing large databases. In this thesis, we focus on the problem of parallel similarity search in such massive sets of time series. For this, we first need to develop efficient search operators that can query a very large distributed database of time series with low response times. The search operator can be implemented by using an index constructed before executing the queries. The objective of indexes is to improve the speed of data retrieval operations: in databases, an index is a data structure which, based on search criteria, efficiently locates data entries satisfying the requirements. Indexes often make the response time of the lookup operation sublinear in the database size. After reviewing the state of the art, we propose three novel approaches for parallel indexing and querying of large time series datasets. First, we propose DPiSAX, a novel and efficient parallel solution that includes a parallel index construction algorithm that takes advantage of distributed environments to efficiently build iSAX-based indexes over vast volumes of time series. Our solution also involves a parallel query processing algorithm that, given a similarity query, exploits the available processors of the distributed system to answer the query efficiently in parallel by using the constructed parallel index. Second, we propose RadiusSketch, a random projection-based approach that scales nearly linearly in parallel environments and provides high-quality answers. RadiusSketch includes a parallel index construction algorithm that takes advantage of distributed environments to efficiently build sketch-based indexes over very large databases of time series, and then query the databases in parallel. Third, we propose ParCorr, an efficient parallel solution for detecting similar time series across distributed data streams. ParCorr uses the sketch principle for representing the time series. Our solution includes a parallel approach for incremental computation of the sketches in sliding windows, and a partitioning approach that projects sketch vectors of time series into subvectors and builds a distributed grid structure. Our solutions have been evaluated using real and synthetic datasets, and the results confirm their high efficiency compared to the state of the art.
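The sketch principle mentioned in this abstract can be illustrated in a few lines: project every series onto the same small set of random vectors, use distances between the short sketches to pre-filter candidates, and verify only the survivors against the full series. The sizes and toy data below are hypothetical; this is a sketch of the principle, not the thesis' DPiSAX, RadiusSketch, or ParCorr implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, length, sketch_size = 1000, 256, 16

series = rng.normal(size=(n_series, length))
series[1] = series[0] + 0.01 * rng.normal(size=length)  # plant a near-duplicate

# One shared random projection matrix; sketches approximately preserve distances.
R = rng.normal(size=(sketch_size, length)) / np.sqrt(sketch_size)
sketches = series @ R.T  # shape (n_series, sketch_size)

# Pre-filter candidates close to series[0] in sketch space, then verify exactly.
d_sketch = np.linalg.norm(sketches - sketches[0], axis=1)
candidates = np.argsort(d_sketch)[1:6]  # skip series[0] itself
exact = {int(i): round(float(np.linalg.norm(series[i] - series[0])), 3)
         for i in candidates}
print("candidate -> exact distance:", exact)
```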
Lundell, Fredrik. "Out-of-Core Multi-Resolution Volume Rendering of Large Data Sets." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70162.
Full textMånsson, Per. "Database analysis and managing large data sets in a trading environment." Thesis, Linköpings universitet, Databas och informationsteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-104193.
Full textCarter, Caleb. "High Resolution Visualization of Large Scientific Data Sets Using Tiled Display." Fogler Library, University of Maine, 2007. http://www.library.umaine.edu/theses/pdf/CarterC2007.pdf.
Full textMemarsadeghi, Nargess. "Efficient algorithms for clustering and interpolation of large spatial data sets." College Park, Md. : University of Maryland, 2007. http://hdl.handle.net/1903/6839.
Full textThesis research directed by: Computer Science. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
Sips, Mike. "Pixel-based visual data mining in large geo-spatial point sets." Konstanz : Hartung-Gorre, 2006. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=014881714&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.
Coudret, Raphaël. "Stochastic modelling using large data sets : applications in ecology and genetics." PhD thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00865867.
Winter, Eitan E. "Evolutionary analyses of protein-coding genes using large biological data sets." Thesis, University of Oxford, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.427615.
Mostafa, Nour. "Intelligent dynamic caching for large data sets in a grid environment." Thesis, Queen's University Belfast, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.602689.
Kim, Hyeyoen. "Large data sets and nonlinearity : essays in international finance and macroeconomics." Thesis, University of Warwick, 2009. http://wrap.warwick.ac.uk/3747/.
Nguyen, Minh Quoc. "Toward accurate and efficient outlier detection in high dimensional and large data sets." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34657.
Towfeek, Ajden. "Multi-Resolution Volume Rendering of Large Medical Data Sets on the GPU." Thesis, Linköping University, Department of Science and Technology, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10715.
Volume rendering techniques can be powerful tools for visualizing medical data sets, and their ability to capture 3-D internal structures makes them attractive. Scanning equipment produces medical images of rapidly increasing resolution, resulting in heavily increased data set sizes. Despite the great amount of processing power CPUs deliver, the required image quality can be hard to obtain in real-time rendering. It is therefore highly desirable to optimize the rendering process.
Modern GPUs possess much more computational power and are available for general-purpose programming through high-level shading languages. Efficient representations of the data are crucial due to the limited memory provided by the GPU. This thesis describes the theoretical background and the implementation of an approach presented by Patric Ljung, Claes Lundström and Anders Ynnerman at Linköping University. The main objective is to implement a fully working multi-resolution framework, with two separate pipelines for pre-processing and real-time rendering, which uses the GPU to visualize large medical data sets.
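The resolution-selection step at the core of such a multi-resolution framework can be pictured as a greedy budget allocation: every block starts at the coarsest level, and the block with the best significance-per-byte payoff is refined until the GPU memory budget is exhausted. The block names, significance scores, costs, and budget below are hypothetical; this illustrates the general idea, not the implemented pipeline:

```python
import heapq

# Hypothetical blocks: (name, significance). One refinement level doubles the
# resolution per axis, multiplying a block's memory footprint by 8.
blocks = [("bone", 9.0), ("vessel", 7.5), ("soft tissue", 3.0), ("air", 0.2)]
base_cost = 16 * 1024   # bytes per block at the coarsest level (assumed)
budget = 1_500_000      # assumed GPU memory budget in bytes
max_level = 4

level = {name: 0 for name, _ in blocks}
used = base_cost * len(blocks)

# Max-heap on significance gained per extra byte of the next refinement.
heap = [(-sig / (base_cost * 7), name, sig) for name, sig in blocks]
heapq.heapify(heap)
while heap:
    _, name, sig = heapq.heappop(heap)
    extra = base_cost * 8 ** level[name] * 7  # cost of going one level finer
    if level[name] >= max_level or used + extra > budget:
        continue  # cannot refine this block any further
    level[name] += 1
    used += extra
    heapq.heappush(heap, (-sig / (extra * 8), name, sig))

print("levels:", level, "bytes used:", used)
```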
González, David Muñoz. "Discovering unknown equations that describe large data sets using genetic programming techniques." Thesis, Linköping University, Department of Electrical Engineering, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2639.
FIR filters are widely used nowadays, with applications ranging from MP3 players, Hi-Fi systems, and digital TVs to communication systems such as wireless links. They are implemented on DSPs, and several trade-offs make it important to estimate the required filter order as exactly as possible.
In order to find a better estimate of the filter order than the existing ones, gene expression programming (GEP) is used. GEP is a genetic algorithm that can be used for function finding. It is implemented in a commercial application which, once the appropriate input file and settings have been provided, evolves the individuals in the input file until a good solution is found. This thesis is the first in this new line of research.
The aim has been not only to reach the desired estimate but also to pave the way for further investigation.
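For context, a classical example of the existing estimates that such work tries to improve on is Kaiser's formula, which maps the desired stopband attenuation and the normalized transition width to an approximate FIR order. A minimal sketch (the thesis' GEP-derived estimator is not reproduced here):

```python
import math

def kaiser_fir_order(atten_db, f_pass, f_stop, f_s):
    """Kaiser's classic estimate of the FIR order needed for a lowpass design
    with the given stopband attenuation (dB) and band edges (Hz)."""
    delta_f = abs(f_stop - f_pass) / f_s  # normalized transition width
    return math.ceil((atten_db - 7.95) / (14.36 * delta_f))

# Example: 60 dB attenuation, passband edge 1 kHz, stopband edge 1.2 kHz,
# sampled at 8 kHz -> an order of roughly 145.
print(kaiser_fir_order(60, 1000, 1200, 8000))
```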
Bäckström, Daniel. "Managing and Exploring Large Data Sets Generated by Liquid Separation - Mass Spectrometry." Doctoral thesis, Uppsala University, Analytical Chemistry, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8223.
A trend in natural science, and especially in analytical chemistry, is the increasing need to analyse large numbers of complex samples with low analyte concentrations. Biological samples (urine, blood, plasma, cerebrospinal fluid, tissue, etc.) are often suitable for analysis with liquid separation mass spectrometry (LS-MS), resulting in two-way data tables (time vs. m/z). Such biological 'fingerprints', taken for all samples in a study, correspond to a large amount of data. Detailed characterization requires a high sampling rate in combination with high mass resolution and a wide mass range, which presents a challenge in data handling and exploration. This thesis describes methods for managing and exploring large data sets made up of such detailed 'fingerprints' (represented as data matrices).
The methods were implemented as scripts and functions in Matlab, a widespread environment for matrix manipulation. A single-file structure holding the imported data facilitated both easy access and fast manipulation. Routines for baseline removal and noise reduction were intended to reduce the amount of data without losing relevant information. A tool for visualizing and exploring single runs was also included. When two or more 'fingerprints' are compared, they usually have to be aligned first, because of unintended shifts in analyte positions in time and m/z. A PCA-like multivariate method proved to be less sensitive to such shifts, and an ANOVA implementation made it easier to find systematic differences within the data sets.
The above strategies and methods were applied to complex samples such as plasma, protein digests, and urine. The fields of application included urine profiling (paracetamol intake; beverage effects), peptide mapping (different digestion protocols), and the search for potential biomarkers (appendicitis diagnosis). The influence of the experimental factors was visualized by PCA score plots as well as clustering diagrams (dendrograms).
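The PCA-based comparison described above can be reproduced in miniature: unfold each two-way run (time vs. m/z) into a row vector, mean-center across runs, and compute score vectors via an SVD. The simulated 'fingerprints' below are hypothetical, and the sketch uses Python/numpy rather than the Matlab environment the thesis used:

```python
import numpy as np

rng = np.random.default_rng(1)
n_runs, n_times, n_mz = 12, 50, 40

# Simulated fingerprints: two groups differing in one peak's intensity.
runs = rng.normal(0.0, 0.05, size=(n_runs, n_times, n_mz))
runs[:6, 20:25, 10:13] += 1.0   # group A: peak at a fixed (time, m/z) region
runs[6:, 20:25, 10:13] += 1.5   # group B: same peak, higher intensity

X = runs.reshape(n_runs, -1)    # unfold each run into one row
X = X - X.mean(axis=0)          # mean-center across runs
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * s                  # PCA scores; PC1 separates the two groups

print(np.round(scores[:, 0], 2))
```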
Cutchin, Andrew E. Donahoo, Michael J. "Towards efficient and practical reliable bulk data transport for large receiver sets." Waco, Tex. : Baylor University, 2007. http://hdl.handle.net/2104/5140.
Dutta, Soumya. "In Situ Summarization and Visual Exploration of Large-scale Simulation Data Sets." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524070976058567.
Blanc, Trevor Jon. "Analysis and Compression of Large CFD Data Sets Using Proper Orthogonal Decomposition." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/5303.
Deri, Joya A. "Graph Signal Processing: Structure and Scalability to Massive Data Sets." Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/725.
Quiroz, Matias. "Bayesian Inference in Large Data Problems." Doctoral thesis, Stockholms universitet, Statistiska institutionen, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-118836.
At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 1: Submitted. Paper 2: Submitted. Paper 3: Manuscript. Paper 4: Manuscript.
Boukorca, Ahcène. "Hypergraphs in the Service of Very Large Scale Query Optimization. Application : Data Warehousing." Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2016. http://www.theses.fr/2016ESMA0026/document.
The emergence of the Big Data phenomenon has introduced new, growing, and urgent needs to share data between users and communities, which has engendered a large number of queries that the DBMS must handle. This problem is compounded by further needs for query recommendation and exploration. Since data processing still relies on solutions for query optimization, physical design, and deployment architectures, and since these solutions are the results of query-based combinatorial problems, it is essential to revisit traditional methods to meet the new scalability needs. This thesis focuses on the problem of very large query workloads and proposes a scalable approach, implemented in a framework called Big-queries, based on the hypergraph: a flexible data structure with great modeling power that allows the accurate formulation of many problems in combinatorial scientific computing. This approach is the result of a collaboration with the company Mentor Graphics. It aims to capture the interaction between queries in a unified query plan and to use partitioning algorithms to ensure scalability and to derive optimal optimization structures (materialized views and data partitioning). The unified plan is also used in the deployment phase of parallel data warehouses, partitioning the data into fragments and allocating these fragments to the corresponding processing nodes. An intensive experimental study showed the interest of our approach in terms of algorithm scalability and minimization of query response time.
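The queries-as-hyperedges idea can be pictured with a toy workload: each query is a hyperedge over the operations (for example, join nodes) it uses, and partitioning yields groups of queries that can share a unified plan. The queries and operation names below are hypothetical, and the naive greedy merge merely stands in for real hypergraph partitioners (such as hMETIS or PaToH); this is not the Big-queries framework itself:

```python
from itertools import combinations

# Hypothetical workload: query -> set of operations (e.g., shared join nodes).
queries = {
    "q1": {"join_sales_dates", "join_sales_stores"},
    "q2": {"join_sales_dates", "join_sales_products"},
    "q3": {"join_sales_stores"},
    "q4": {"join_inventory_suppliers", "join_inventory_parts"},
    "q5": {"join_inventory_parts"},
}

# Start with one group per query, then repeatedly merge any two groups whose
# operation sets overlap; connected queries end up sharing one unified plan.
groups = [({q}, set(ops)) for q, ops in queries.items()]
merged = True
while merged:
    merged = False
    for a, b in combinations(range(len(groups)), 2):
        if groups[a][1] & groups[b][1]:
            qs_a, ops_a = groups[a]
            qs_b, ops_b = groups[b]
            groups[a] = (qs_a | qs_b, ops_a | ops_b)
            del groups[b]
            merged = True
            break

for qs, ops in groups:
    print(sorted(qs), "share", sorted(ops))
```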
Bresell, Anders. "Characterization of protein families, sequence patterns, and functional annotations in large data sets." Doctoral thesis, Linköping : Department of Physics, Chemistry and Biology, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10565.
Castro, Jose R. "Modifications to the Fuzzy-ARTMAP algorithm for distributed learning in large data sets." Doctoral diss., University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4449.
Ph.D.
School of Electrical and Computer Engineering
Engineering and Computer Science
Electrical and Computer Engineering
Brind'Amour, Katherine. "Maternal and Child Health Home Visiting Evaluations Using Large, Pre-Existing Data Sets." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1468965739.
Nyumbeka, Dumisani Joshua. "Using data analysis and information visualization techniques to support the effective analysis of large financial data sets." Thesis, Nelson Mandela Metropolitan University, 2016. http://hdl.handle.net/10948/12983.
Li, Yanrong. "Techniques for improving clustering and association rules mining from very large transactional databases." Thesis, Curtin University, 2009. http://hdl.handle.net/20.500.11937/907.