
Journal articles on the topic 'Very large data sets'

Consult the top 50 journal articles for your research on the topic 'Very large data sets.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Zhang, Kui, Linlin Ge, Zhe Hu, Alex Hay-Man Ng, Xiaojing Li, and Chris Rizos. "Phase Unwrapping for Very Large Interferometric Data Sets." IEEE Transactions on Geoscience and Remote Sensing 49, no. 10 (October 2011): 4048–61. http://dx.doi.org/10.1109/tgrs.2011.2130530.

2

Kettaneh, Nouna, Anders Berglund, and Svante Wold. "PCA and PLS with very large data sets." Computational Statistics & Data Analysis 48, no. 1 (January 2005): 69–85. http://dx.doi.org/10.1016/j.csda.2003.11.027.

3

Bottou, Léon, and Yann Le Cun. "On-line learning for very large data sets." Applied Stochastic Models in Business and Industry 21, no. 2 (2005): 137–51. http://dx.doi.org/10.1002/asmb.538.

4

Cressie, Noel, and Gardar Johannesson. "Fixed rank kriging for very large spatial data sets." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70, no. 1 (January 4, 2008): 209–26. http://dx.doi.org/10.1111/j.1467-9868.2007.00633.x.

5

Harrison, L. M., and G. G. R. Green. "A Bayesian spatiotemporal model for very large data sets." NeuroImage 50, no. 3 (April 2010): 1126–41. http://dx.doi.org/10.1016/j.neuroimage.2009.12.042.

6

Kazar, Baris. "High performance spatial data mining for very large data-sets." ACM SIGPLAN Notices 38, no. 10 (October 2003): 1. http://dx.doi.org/10.1145/966049.781509.

7

Angiulli, F., and G. Folino. "Distributed Nearest Neighbor-Based Condensation of Very Large Data Sets." IEEE Transactions on Knowledge and Data Engineering 19, no. 12 (December 2007): 1593–606. http://dx.doi.org/10.1109/tkde.2007.190665.

8

Maarel, Eddy, Ileana Espejel, and Patricia Moreno-Casasola. "Two-step vegetation analysis based on very large data sets." Vegetatio 68, no. 3 (January 1987): 139–43. http://dx.doi.org/10.1007/bf00114714.

9

Hathaway, Richard J., and James C. Bezdek. "Extending fuzzy and probabilistic clustering to very large data sets." Computational Statistics & Data Analysis 51, no. 1 (November 2006): 215–34. http://dx.doi.org/10.1016/j.csda.2006.02.008.

10

Wang, Liang, James C. Bezdek, Christopher Leckie, and Ramamohanarao Kotagiri. "Selective sampling for approximate clustering of very large data sets." International Journal of Intelligent Systems 23, no. 3 (2008): 313–31. http://dx.doi.org/10.1002/int.20268.

11

Wu, Weili, Hong Gao, and Jianzhong Li. "New Algorithm for Computing Cube on Very Large Compressed Data Sets." IEEE Transactions on Knowledge and Data Engineering 18, no. 12 (December 2006): 1667–80. http://dx.doi.org/10.1109/tkde.2006.195.

12

Dong, Jian-xiong, A. Krzyzak, and C. Y. Suen. "Fast SVM training algorithm with decomposition on very large data sets." IEEE Transactions on Pattern Analysis and Machine Intelligence 27, no. 4 (April 2005): 603–18. http://dx.doi.org/10.1109/tpami.2005.77.

13

Cautis, Bogdan, Alin Deutsch, Nicola Onose, and Vasilis Vassalos. "Querying XML data sources that export very large sets of views." ACM Transactions on Database Systems 36, no. 1 (March 2011): 1–42. http://dx.doi.org/10.1145/1929934.1929939.

14

Dzwinel, Witold, and Rafał Wcisło. "Very Fast Interactive Visualization of Large Sets of High-dimensional Data." Procedia Computer Science 51 (2015): 572–81. http://dx.doi.org/10.1016/j.procs.2015.05.325.

15

Campobello, Giuseppe, Mirko Mantineo, Giuseppe Patanè, and Marco Russo. "LBGS: a smart approach for very large data sets vector quantization." Signal Processing: Image Communication 20, no. 1 (January 2005): 91–114. http://dx.doi.org/10.1016/j.image.2004.10.001.

16

Sardjono, Sardjono, R. Yadi Rakhman Alamsyah, Marwondo Marwondo, and Elia Setiana. "Data Cleansing Strategies on Data Sets Become Data Science." International Journal of Quantitative Research and Modeling 1, no. 3 (September 3, 2020): 145–56. http://dx.doi.org/10.46336/ijqrm.v1i3.71.

Abstract:
The digital era continues to grow with the increasing use of smartphones, and many organizations and companies have implemented systems to support their business. This growth increases the volume of data that is used and disseminated, over both open and closed internet networks. Because large volumes of data must be processed and gathered from different storage resources, a strategy is needed to carry out data cleansing effectively and efficiently, so that the resulting data set can serve as mature and very useful information for business purposes. The R language can process large and complex data loaded from different storage resources, but using it to full effect requires basic skills in preparing the data sets that will become data science assets for organizations or companies through good data cleansing techniques. This research on data cleansing strategies for organizationally owned data sets describes, step by step, how to obtain data that is genuinely useful for data science, so that the data generated after the cleansing process is meaningful and useful for decision making. It also gives beginning data scientists a basic overview of and guide to staged data cleansing, and provides a way to analyze the results of executing the functions used.
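The study above works in R; as a rough illustration of the kind of staged cleansing workflow it describes (load from several storage resources, profile, standardize types, handle duplicates and missing values), here is a minimal Python/pandas sketch. The file names and column names are hypothetical and not taken from the paper.

```python
# Minimal staged data-cleansing sketch (hypothetical files and columns, not the paper's R code).
import pandas as pd

# 1. Load from different storage resources and combine.
frames = [pd.read_csv("sales_branch_a.csv"), pd.read_json("sales_branch_b.json")]
df = pd.concat(frames, ignore_index=True)

# 2. Profile the raw data before touching it.
df.info()
print(df.isna().sum())

# 3. Standardize types and obvious inconsistencies.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df["customer"] = df["customer"].str.strip().str.title()

# 4. Handle duplicates and missing values explicitly.
df = df.drop_duplicates()
df = df.dropna(subset=["order_date", "amount"])   # rows that cannot be used
df["region"] = df["region"].fillna("unknown")     # recoverable gaps

# 5. Keep a record of what the cleansing did and persist the result.
print(f"{len(df)} rows remain after cleansing")
df.to_csv("sales_clean.csv", index=False)
```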
17

Kriege, Nils, Petra Mutzel, and Till Schäfer. "Practical SAHN Clustering for Very Large Data Sets and Expensive Distance Metrics." Journal of Graph Algorithms and Applications 18, no. 4 (2014): 577–602. http://dx.doi.org/10.7155/jgaa.00338.

18

van Teijlingen, Alexander, and Tell Tuttle. "Beyond Tripeptides: Two-Step Active Machine Learning for Very Large Data Sets." Journal of Chemical Theory and Computation 17, no. 5 (April 27, 2021): 3221–32. http://dx.doi.org/10.1021/acs.jctc.1c00159.

19

Littau, David, and Daniel Boley. "CLUSTERING VERY LARGE DATA SETS USING A LOW MEMORY MATRIX FACTORED REPRESENTATION." Computational Intelligence 25, no. 2 (May 2009): 114–35. http://dx.doi.org/10.1111/j.1467-8640.2009.00331.x.

20

Yildiz, Beytullah, Kesheng Wu, Suren Byna, and Arie Shoshani. "Parallel membership queries on very large scientific data sets using bitmap indexes." Concurrency and Computation: Practice and Experience 31, no. 15 (January 28, 2019): e5157. http://dx.doi.org/10.1002/cpe.5157.

21

Peköz, Erol A., Michael Shwartz, Cindy L. Christiansen, and Dan Berlowitz. "Approximate models for aggregate data when individual-level data sets are very large or unavailable." Statistics in Medicine 29, no. 21 (August 26, 2010): 2180–93. http://dx.doi.org/10.1002/sim.3979.

22

Keim, Daniel A., Ming C. Hao, Umesh Dayal, and Meichun Hsu. "Pixel Bar Charts: A Visualization Technique for Very Large Multi-Attribute Data Sets." Information Visualization 1, no. 1 (March 2002): 20–34. http://dx.doi.org/10.1057/palgrave.ivs.9500003.

Abstract:
Simple presentation graphics are intuitive and easy-to-use, but show only highly aggregated data presenting only a very small number of data values (as in the case of bar charts) and may have a high degree of overlap occluding a significant portion of the data values (as in the case of the x-y plots). In this article, the authors therefore propose a generalization of traditional bar charts and x-y plots, which allows the visualization of large amounts of data. The basic idea is to use the pixels within the bars to present detailed information of the data records. The so-called pixel bar charts retain the intuitiveness of traditional bar charts while allowing very large data sets to be visualized in an effective way. It is shown that, for an effective pixel placement, a complex optimization problem has to be solved. The authors then present an algorithm which efficiently solves the problem. The application to a number of real-world e-commerce data sets shows the wide applicability and usefulness of this new idea, and a comparison to other well-known visualization techniques (parallel coordinates and spiral techniques) shows a number of clear advantages.
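As a rough sketch of the idea (one bar per category, one pixel per record inside the bar, colored by an attribute of that record rather than by an aggregate), the following Python/matplotlib snippet renders a toy pixel bar chart. The data and the fixed bar width are invented, and the paper's optimized pixel-placement algorithm is not reproduced.

```python
# Toy pixel bar chart: one bar per category, one pixel per record, colored by an attribute.
# Illustrative only; it does not implement the paper's pixel-placement optimization.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
categories = ["A", "B", "C", "D"]
counts = [900, 1500, 600, 1200]            # number of records per category (invented)
bar_width, gap = 30, 5                     # pixels per bar and spacing between bars

height = max(int(np.ceil(c / bar_width)) for c in counts)
canvas = np.full((height, len(categories) * (bar_width + gap)), np.nan)

for i, n in enumerate(counts):
    values = np.sort(rng.random(n))        # attribute used for pixel ordering and coloring
    col0 = i * (bar_width + gap)
    for j, v in enumerate(values):         # fill the bar column by column, bottom-up
        col, row = divmod(j, height)
        canvas[height - 1 - row, col0 + col] = v

plt.imshow(canvas, aspect="auto", cmap="viridis", interpolation="nearest")
plt.xticks([i * (bar_width + gap) + bar_width / 2 for i in range(len(categories))], categories)
plt.yticks([])
plt.colorbar(label="record attribute")
plt.title("Toy pixel bar chart")
plt.show()
```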
23

Keim, Daniel A., Ming C. Hao, Umesh Dayal, and Meichun Hsu. "Pixel bar charts: a visualization technique for very large multi-attribute data sets." Information Visualization 1, no. 1 (March 2002): 20–34. http://dx.doi.org/10.1057/palgrave/ivs/9500003.

24

Harrington, Justin, and Matias Salibián-Barrera. "Finding approximate solutions to combinatorial problems with very large data sets using BIRCH." Computational Statistics & Data Analysis 54, no. 3 (March 2010): 655–67. http://dx.doi.org/10.1016/j.csda.2008.08.001.

25

Starikov, Valentin S., Maxim L. Nee, and Anastasia A. Ivanova. "Transnationalism Online: Exploring Migration Processes with Large Data Sets." Monitoring of Public Opinion: Economic and Social Changes, no. 5 (November 10, 2018): 0. http://dx.doi.org/10.14515/monitoring.2018.5.17.

Abstract:
The exponential growth of online technologies in everyday life transforms the very contours of social phenomena, processes, and institutions well known to sociologists. We discuss these transformations in two interrelated areas: transnational migration and extremism. First, the paper proposes an approach to examine «transnationalism online» as a sub-set of transnational migration studies. Second, it presents a critical review of how contemporary scholars study extremist activities and discourse of those who are involved in migration with a special focus on online manifestations of extremism. In a concluding part of the paper we present theoretical and methodological comments on the paths in examining the «dark side» of transnationalism online.
26

Appice, Annalisa, Michelangelo Ceci, Antonio Turi, and Donato Malerba. "A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets." Intelligent Data Analysis 15, no. 1 (January 19, 2011): 69–88. http://dx.doi.org/10.3233/ida-2010-0456.

27

Maupin, Valérie. "Combining asynchronous data sets in regional body-wave tomography." Geophysical Journal International 224, no. 1 (October 5, 2020): 401–15. http://dx.doi.org/10.1093/gji/ggaa473.

Abstract:
Regional body-wave tomography is a very popular tomographic method consisting in inverting relative traveltime residuals of teleseismic body waves measured at regional networks. It is well known that the resulting inverse seismic model is relative to an unknown vertically varying reference model. If jointly inverting data obtained with networks in the vicinity of each other but operating at different times, the relative velocity anomalies in different areas of the model may have different reference levels, possibly introducing large-scale biases in the model that may compromise the interpretation. This is very unfortunate as we have numerous examples of asynchronous network deployments which would benefit from a joint analysis. We show here how a simple improvement in the formulation of the sensitivity kernels allows us to mitigate this problem. Using sensitivity kernels that take into account that data processing implies a zero mean residual for each event, the large-scale biases that otherwise arise in the inverse model using data from asynchronous station deployment are largely removed. We illustrate this first with a very simple 3-station example, and then compare the results obtained using the usual and the relative kernels in synthetic tests with more realistic station coverage, simulating data acquisition at two neighbouring asynchronous networks.
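The kernel modification described here can be illustrated numerically in a few lines: if the processing removes the per-event mean from the residuals, the corresponding rows of the sensitivity matrix should have their per-event mean removed as well. The numpy sketch below uses synthetic sizes and random numbers, not the paper's kernels.

```python
# Sketch: make sensitivity kernels consistent with relative (per-event demeaned) residuals.
import numpy as np

rng = np.random.default_rng(1)
n_stations, n_cells = 8, 50
G = rng.random((n_stations, n_cells))   # absolute traveltime kernels for one event
t = rng.random(n_stations)              # absolute residuals recorded for that event

# Data processing removes the event mean, so the observations are relative residuals:
t_rel = t - t.mean()

# Remove the same per-event mean from the kernel rows, so that G_rel @ m predicts t_rel
# for any model m that explains the absolute residuals.
G_rel = G - G.mean(axis=0, keepdims=True)

m = rng.random(n_cells)                 # arbitrary model
print(np.allclose(G_rel @ m, (G @ m) - (G @ m).mean()))   # True
```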
28

Canning, Anat, and Gerald H. F. Gardner. "Regularizing 3-D data sets with DMO." GEOPHYSICS 61, no. 4 (July 1996): 1103–14. http://dx.doi.org/10.1190/1.1444031.

Abstract:
The combination of DMO and [Formula: see text] is used here to change the original acquisition geometry of a 3-D seismic data set into a more convenient form. For example, irregular 3-D surveys can be projected onto a regular midpoint‐offset grid with zero source‐receiver azimuth and equal increments in offset. The algorithm presented here is based on a new, nonaliased 3-D DMO algorithm in (f, x) domain. It does not require any knowledge of the velocity function for constant or rms velocity variations. The computer program was designed to process and to output very large multifold 3-D data sets. A synthetic example of a point diffractor in 3-D space and a 3-D experiment in a physical modeling tank are used to demonstrate the procedure. In both cases, the results obtained after the data set is regularized are compared with a data set that was acquired initially with the desired configuration. These comparisons show very good agreement. Analysis of the procedure indicates that it may not reconstruct AVO correctly. This is an inherent problem that occurs because the reorganization procedure changes the angle of incidence.
29

Reiter, Lukas, Manfred Claassen, Sabine P. Schrimpf, Marko Jovanovic, Alexander Schmidt, Joachim M. Buhmann, Michael O. Hengartner, and Ruedi Aebersold. "Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry." Molecular & Cellular Proteomics 8, no. 11 (July 16, 2009): 2405–17. http://dx.doi.org/10.1074/mcp.m900317-mcp200.

30

Toumoulin, C., C. Boldak, J. L. Dillenseger, J. L. Coatrieux, and Y. Rolland. "Fast detection and characterization of vessels in very large 3-D data sets using geometrical moments." IEEE Transactions on Biomedical Engineering 48, no. 5 (May 2001): 604–6. http://dx.doi.org/10.1109/10.918601.

31

Boskova, Veronika, and Tanja Stadler. "PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences." Molecular Biology and Evolution 37, no. 10 (June 3, 2020): 3061–75. http://dx.doi.org/10.1093/molbev/msaa136.

Abstract:
Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
32

Pham, D. T., and A. A. Afify. "SRI: A Scalable Rule Induction Algorithm." Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 220, no. 4 (April 1, 2006): 537–52. http://dx.doi.org/10.1243/09544062c18304.

Abstract:
Rule induction as a method for constructing classifiers is particularly attractive in data mining applications, where the comprehensibility of the generated models is very important. Most existing techniques were designed for small data sets and thus are not practical for direct use on very large data sets because of their computational inefficiency. Scaling up rule induction methods to handle such data sets is a formidable challenge. This article presents a new algorithm for rule induction that can efficiently extract accurate and comprehensible models from large and noisy data sets. This algorithm has been tested on several complex data sets, and the results prove that it scales up well and is an extremely effective learner.
33

Mudunuri, Uma S., Mohamad Khouja, Stephen Repetski, Girish Venkataraman, Anney Che, Brian T. Luke, F. Pascal Girard, and Robert M. Stephens. "Knowledge and Theme Discovery across Very Large Biological Data Sets Using Distributed Queries: A Prototype Combining Unstructured and Structured Data." PLoS ONE 8, no. 12 (December 2, 2013): e80503. http://dx.doi.org/10.1371/journal.pone.0080503.

34

Kim, TaeHyung, Marc S. Tyndel, Haiming Huang, Sachdev S. Sidhu, Gary D. Bader, David Gfeller, and Philip M. Kim. "MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets." Nucleic Acids Research 40, no. 6 (December 31, 2011): e47-e47. http://dx.doi.org/10.1093/nar/gkr1294.

35

Gehrke, S., and B. T. Beshah. "RADIOMETRIC NORMALIZATION OF LARGE AIRBORNE IMAGE DATA SETS ACQUIRED BY DIFFERENT SENSOR TYPES." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B1 (June 3, 2016): 317–26. http://dx.doi.org/10.5194/isprsarchives-xli-b1-317-2016.

Abstract:
Generating seamless mosaics of aerial images is a particularly challenging task when the mosaic comprises a large number of images, collected over longer periods of time and with different sensors under varying imaging conditions. Such large mosaics typically consist of very heterogeneous image data, both spatially (different terrain types and atmosphere) and temporally (unstable atmospheric properties and even changes in land coverage).

We present a new radiometric normalization or, respectively, radiometric aerial triangulation approach that takes advantage of our knowledge about each sensor’s properties. The current implementation supports medium and large format airborne imaging sensors of the Leica Geosystems family, namely the ADS line-scanner as well as DMC and RCD frame sensors. A hierarchical modelling – with parameters for the overall mosaic, the sensor type, different flight sessions, strips and individual images – allows for adaptation to each sensor’s geometric and radiometric properties. Additional parameters at different hierarchy levels can compensate radiometric differences of various origins to compensate for shortcomings of the preceding radiometric sensor calibration as well as BRDF and atmospheric corrections. The final, relative normalization is based on radiometric tie points in overlapping images, absolute radiometric control points and image statistics. It is computed in a global least squares adjustment for the entire mosaic by altering each image’s histogram using a location-dependent mathematical model. This model involves contrast and brightness corrections at radiometric fix points with bilinear interpolation for corrections in-between. The distribution of the radiometry fixes is adaptive to each image and generally increases with image size, hence enabling optimal local adaptation even for very long image strips as typically captured by a line-scanner sensor.

The normalization approach is implemented in HxMap software. It has been successfully applied to large sets of heterogeneous imagery, including the adjustment of original sensor images prior to quality control and further processing as well as radiometric adjustment for ortho-image mosaic generation.
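A heavily simplified sketch of the central computation (a single global least-squares adjustment that estimates a contrast/brightness correction per image from radiometric tie points in overlaps, with one image held fixed as the reference) is given below in Python/numpy. The tie-point values are invented, and the hierarchical, location-dependent model of the HxMap implementation is not reproduced.

```python
# Toy global least-squares radiometric normalization: one gain/offset per image,
# estimated from tie-point intensities in overlapping images (image 0 is the reference).
import numpy as np

# (image_i, intensity_i, image_j, intensity_j) for each radiometric tie point (invented data)
ties = [
    (0, 100.0, 1, 80.0),
    (0, 150.0, 1, 122.0),
    (1, 90.0, 2, 110.0),
    (1, 140.0, 2, 171.0),
    (0, 120.0, 2, 118.0),
]
n_images = 3
# Unknowns x = [a_1, b_1, a_2, b_2, ...]; image 0 has a = 1, b = 0 by construction.
A = np.zeros((len(ties), 2 * (n_images - 1)))
rhs = np.zeros(len(ties))

def add_term(row, img, value, sign):
    if img == 0:
        rhs[row] -= sign * value                 # reference terms move to the right-hand side
    else:
        A[row, 2 * (img - 1)] += sign * value    # gain coefficient a_img
        A[row, 2 * (img - 1) + 1] += sign        # offset coefficient b_img

# Each tie point requires a_i * v_i + b_i = a_j * v_j + b_j.
for row, (i, vi, j, vj) in enumerate(ties):
    add_term(row, i, vi, +1.0)
    add_term(row, j, vj, -1.0)

x, *_ = np.linalg.lstsq(A, rhs, rcond=None)
for k in range(1, n_images):
    print(f"image {k}: gain={x[2 * (k - 1)]:.3f} offset={x[2 * (k - 1) + 1]:.3f}")
```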
36

Gehrke, S., and B. T. Beshah. "RADIOMETRIC NORMALIZATION OF LARGE AIRBORNE IMAGE DATA SETS ACQUIRED BY DIFFERENT SENSOR TYPES." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B1 (June 3, 2016): 317–26. http://dx.doi.org/10.5194/isprs-archives-xli-b1-317-2016.

Abstract:
Generating seamless mosaics of aerial images is a particularly challenging task when the mosaic comprises a large number of images, collected over longer periods of time and with different sensors under varying imaging conditions. Such large mosaics typically consist of very heterogeneous image data, both spatially (different terrain types and atmosphere) and temporally (unstable atmospheric properties and even changes in land coverage).

We present a new radiometric normalization or, respectively, radiometric aerial triangulation approach that takes advantage of our knowledge about each sensor’s properties. The current implementation supports medium and large format airborne imaging sensors of the Leica Geosystems family, namely the ADS line-scanner as well as DMC and RCD frame sensors. A hierarchical modelling – with parameters for the overall mosaic, the sensor type, different flight sessions, strips and individual images – allows for adaptation to each sensor’s geometric and radiometric properties. Additional parameters at different hierarchy levels can compensate radiometric differences of various origins to compensate for shortcomings of the preceding radiometric sensor calibration as well as BRDF and atmospheric corrections. The final, relative normalization is based on radiometric tie points in overlapping images, absolute radiometric control points and image statistics. It is computed in a global least squares adjustment for the entire mosaic by altering each image’s histogram using a location-dependent mathematical model. This model involves contrast and brightness corrections at radiometric fix points with bilinear interpolation for corrections in-between. The distribution of the radiometry fixes is adaptive to each image and generally increases with image size, hence enabling optimal local adaptation even for very long image strips as typically captured by a line-scanner sensor.

The normalization approach is implemented in HxMap software. It has been successfully applied to large sets of heterogeneous imagery, including the adjustment of original sensor images prior to quality control and further processing as well as radiometric adjustment for ortho-image mosaic generation.
37

Yamada, Ryo, Daigo Okada, Juan Wang, Tapati Basak, and Satoshi Koyama. "Interpretation of omics data analyses." Journal of Human Genetics 66, no. 1 (May 8, 2020): 93–102. http://dx.doi.org/10.1038/s10038-020-0763-5.

Abstract:
Omics studies attempt to extract meaningful messages from large-scale and high-dimensional data sets by treating the data sets as a whole. The concept of treating data sets as a whole is important in every step of the data-handling procedures: the pre-processing step of data records, the step of statistical analyses and machine learning, translation of the outputs into human natural perceptions, and acceptance of the messages with uncertainty. In the pre-processing, the method by which to control the data quality and batch effects are discussed. For the main analyses, the approaches are divided into two types and their basic concepts are discussed. The first type is the evaluation of many items individually, followed by interpretation of individual items in the context of multiple testing and combination. The second type is the extraction of fewer important aspects from the whole data records. The outputs of the main analyses are translated into natural languages with techniques, such as annotation and ontology. The other technique for making the outputs perceptible is visualization. At the end of this review, one of the most important issues in the interpretation of omics data analyses is discussed. Omics studies have a large amount of information in their data sets, and every approach reveals only a very restricted aspect of the whole data sets. The understandable messages from these studies have unavoidable uncertainty.
38

Figueroa, Juan Luis Peñaloza, and Carmen Vargas Pérez. "Business Strategies Based on Large Sets of Data and Interaction: Business Intelligence." European Journal of Economics and Business Studies 9, no. 1 (October 6, 2017): 156. http://dx.doi.org/10.26417/ejes.v9i1.p156-167.

Abstract:
The dominant perspective in Business Intelligence (BI) projects and applications has been the technological conception, usually focused on the technical-instrumental nature of computing. This conception has avoided the change in the paradigm from a business model based on the use of tangible resources in favour of one based on the exploitation of intangible resources (data, interaction, networks, etc.). This would explain why applied BI projects remain anchored in the old organization and operational patterns of traditional businesses in most companies. The technological perception of BI gives continuity to the stovepipe activity of companies, both in their management and in their organizational structures, where the impact of interaction as a generator of business opportunities is very limited, and often non-existent; and the effect of large volumes of data as a value generator is reduced to an operational and technical problem. Hence, the importance of considering BI as a new business philosophy that entails new forms of business organization and a new way of management based on the interaction and analysis of large volumes of internally generated data. Our interest is not only to emphasize the nature of the new business philosophy in the application of BI, but to carry out a discussion about the organizational and operational structure of businesses (according to a conception based on interaction and data as generators of business value) and about actionable intelligence.
39

Gebicke-Haerter, P. "Molecular systems biology and management of complex data sets." European Psychiatry 26, S2 (March 2011): 2223. http://dx.doi.org/10.1016/s0924-9338(11)73925-6.

Abstract:
Theoretically, high throughput technologies that have become available in molecular biology and are continuously refined to generate even more reliable datasets permit more and more insights into ongoing dynamic events that may result in improvements of biological systems or the development of diseases. The more comprehensive datasets are the more they reflect the status of a molecular system. If they are obtained at various time points, they encompass its development. In practice, however, their biological interpretation remains a challenge. Presently, we do not have the tools required to decode the full biological message encrypted in e.g. expression profiling, genome-wide DNA-methylation patterns or in the so-called “histone code”. Consequently, strategies are aimed at picking molecular subnetworks that we are familiar with from previous work. Alternatively, unbiased approaches use as many data as possible for insertion into mathematical programs to perform time-dependent computer simulations of increasingly larger networks. Although this strategy appears to be straightforward and very attractive, its mandatory extension to large molecular networks presently reveals a lack of efficient mathematical tools and computational power. Advantages and shortcomings of available algorithms will be discussed.
40

Hosseini, Kasra, and Karin Sigloch. "ObspyDMT: a Python toolbox for retrieving and processing large seismological data sets." Solid Earth 8, no. 5 (October 12, 2017): 1047–70. http://dx.doi.org/10.5194/se-8-1047-2017.

Abstract:
We present obspyDMT, a free, open-source software toolbox for the query, retrieval, processing and management of seismological data sets, including very large, heterogeneous and/or dynamically growing ones. ObspyDMT simplifies and speeds up user interaction with data centers, in more versatile ways than existing tools. The user is shielded from the complexities of interacting with different data centers and data exchange protocols and is provided with powerful diagnostic and plotting tools to check the retrieved data and metadata. While primarily a productivity tool for research seismologists and observatories, easy-to-use syntax and plotting functionality also make obspyDMT an effective teaching aid. Written in the Python programming language, it can be used as a stand-alone command-line tool (requiring no knowledge of Python) or can be integrated as a module with other Python codes. It facilitates data archiving, preprocessing, instrument correction and quality control – routine but nontrivial tasks that can consume much user time. We describe obspyDMT's functionality, design and technical implementation, accompanied by an overview of its use cases. As an example of a typical problem encountered in seismogram preprocessing, we show how to check for inconsistencies in response files of two example stations. We also demonstrate the fully automated request, remote computation and retrieval of synthetic seismograms from the Synthetics Engine (Syngine) web service of the Data Management Center (DMC) at the Incorporated Research Institutions for Seismology (IRIS).
41

Zhang, Wangda, Junyoung Kim, Kenneth A. Ross, Eric Sedlar, and Lukas Stadler. "Adaptive code generation for data-intensive analytics." Proceedings of the VLDB Endowment 14, no. 6 (February 2021): 929–42. http://dx.doi.org/10.14778/3447689.3447697.

Abstract:
Modern database management systems employ sophisticated query optimization techniques that enable the generation of efficient plans for queries over very large data sets. A variety of other applications also process large data sets, but cannot leverage database-style query optimization for their code. We therefore identify an opportunity to enhance an open-source programming language compiler with database-style query optimization. Our system dynamically generates execution plans at query time, and runs those plans on chunks of data at a time. Based on feedback from earlier chunks, alternative plans might be used for later chunks. The compiler extension could be used for a variety of data-intensive applications, allowing all of them to benefit from this class of performance optimizations.
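A highly simplified sketch of the chunk-at-a-time, feedback-driven execution idea is shown below: each candidate plan is timed on one early chunk, and the fastest plan is then used for the remaining chunks. The query, the two plans and the chunk size are invented for illustration and have nothing to do with the paper's compiler integration.

```python
# Sketch: pick between alternative execution plans using feedback from early chunks.
# Invented query: count values above a threshold; two semantically equivalent "plans".
import time
import numpy as np

def plan_vectorized(chunk, threshold):
    return int((chunk > threshold).sum())

def plan_python_loop(chunk, threshold):
    return sum(1 for v in chunk if v > threshold)

def run_query(chunks, threshold, plans):
    # Phase 1: run each candidate plan on one early chunk and record its timing.
    timings, total = {}, 0
    for plan, chunk in zip(plans, chunks):
        start = time.perf_counter()
        total += plan(chunk, threshold)
        timings[plan] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    # Phase 2: use the fastest plan for all remaining chunks.
    for chunk in chunks[len(plans):]:
        total += best(chunk, threshold)
    return total, best.__name__

rng = np.random.default_rng(2)
chunks = [rng.random(100_000) for _ in range(20)]
count, chosen = run_query(chunks, 0.9, [plan_vectorized, plan_python_loop])
print(count, "values matched; plan chosen:", chosen)
```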
42

Blome, Mark, Hansruedi Maurer, and Stewart Greenhalgh. "Geoelectric experimental design — Efficient acquisition and exploitation of complete pole-bipole data sets." GEOPHYSICS 76, no. 1 (January 2011): F15—F26. http://dx.doi.org/10.1190/1.3511350.

Abstract:
Exploiting the information content offered by geoelectric data in an efficient manner requires careful selection of the electrode configurations to be used. This can be achieved using sequential experimental design techniques proposed over the past few years. However, these techniques become impractical when large-scale 2D or 3D experiments have to be designed. Even if sequential experimental design were applicable, acquisition of the resulting data sets would require an unreasonably large effort using traditional multielectrode arrays. We present a new, fully parallelized pole-bipole measuring strategy by which large amounts of data can be acquired swiftly. Furthermore, we introduce a new experimental design concept that is based on “complete” data sets in terms of linear independence. Complete data sets include a relatively small number of basis electrode configurations, from which any other configuration can be reconstructed by superposition. The totality of possible configurations is referred to as the comprehensive data set. We demonstrate the benefits of such reconstructions using eigenvalue analyses for the case of noise-free data. In the presence of realistic noise, such reconstructions lead to unstable results when only four-point (bipole-bipole) configurations are considered. In contrast, complete three-point (pole-bipole) data sets allow more stable reconstructions. Moreover, complete pole-bipole data sets can be acquired very efficiently with a fully parallelized system. Resolution properties of complete pole-bipole data sets are illustrated using both noise-free and noisy synthetic data sets. We also show results from a field survey performed over a buried waste disposal site, which further demonstrates the usefulness of our approach. Although this paper is restricted to 2D examples, it is trivial to extend the concept to 3D surveys, where the advantages of parallelized pole-bipole data acquisition become very significant.
43

Davies, R. J. "A new batch-processing data-reduction application for X-ray diffraction data." Journal of Applied Crystallography 39, no. 2 (March 12, 2006): 267–72. http://dx.doi.org/10.1107/s0021889806008697.

Abstract:
Modern synchrotron radiation facility beamlines offer high-brilliance beams and sensitive area detectors. Consequently, experiments such as scanning X-ray microdiffraction can generate large data sets within relatively short time periods. In these specialist fields there are currently very few automated data-treatment solutions to tackle the large data sets produced. Where there is existing software, it is either insufficiently specialized or cannot be operated in a batch-wise processing mode. As a result, a large gap exists between the rate at which X-ray diffraction data can be generated and the rate at which they can be realistically analysed. This article describes a new software application to perform batch-wise data reduction. It is designed to operate in combination with the commonly used Fit2D program. Through the use of intuitive file selection, numerous processing lists and a generic operation sequence, it is capable of the batch-wise reduction of up to 60 000 diffraction patterns during each treatment session. It can perform automated intensity corrections to large data series, perform advanced background-subtraction operations and automatically organizes results. Integration limits can be set graphically on-screen, uniquely derived from existing peak positions or globally calculated from user-supplied values. The software represents a working solution to a hitherto unsolved problem.
44

Liu, Xiaomei, Lawrence O. Hall, and Kevin W. Bowyer. "Comments on “A Parallel Mixture of SVMs for Very Large Scale Problems”." Neural Computation 16, no. 7 (July 1, 2004): 1345–51. http://dx.doi.org/10.1162/089976604323057416.

Abstract:
Collobert, Bengio, and Bengio (2002) recently introduced a novel approach to using a neural network to provide a class prediction from an ensemble of support vector machines (SVMs). This approach has the advantage that the required computation scales well to very large data sets. Experiments on the Forest Cover data set show that this parallel mixture is more accurate than a single SVM, with 90.72% accuracy reported on an independent test set. Although this accuracy is impressive, their article does not consider alternative types of classifiers. We show that a simple ensemble of decision trees results in a higher accuracy, 94.75%, and is computationally efficient. This result is somewhat surprising and illustrates the general value of experimental comparisons using different types of classifiers.
45

Pike, Rob, Sean Dorward, Robert Griesemer, and Sean Quinlan. "Interpreting the Data: Parallel Analysis with Sawzall." Scientific Programming 13, no. 4 (2005): 277–98. http://dx.doi.org/10.1155/2005/962135.

Abstract:
Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new procedural programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design – including the separation into two phases, the form of the programming language, and the properties of the aggregators – exploits the parallelism inherent in having data and computation distributed across many machines.
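The two-phase model described here (a per-record filtering phase that emits values to a much smaller aggregation phase, with both phases distributed) can be sketched in ordinary Python; the snippet below is only an illustration of the programming model, not Sawzall itself, and the log records are invented.

```python
# Sketch of the filter/emit -> aggregate model on chunked log records (not Sawzall itself).
from collections import Counter
from multiprocessing import Pool

RECORDS = [                                      # invented "web log" records
    {"url": "/home", "bytes": 512},
    {"url": "/search", "bytes": 2048},
    {"url": "/home", "bytes": 128},
    {"url": "/about", "bytes": 4096},
] * 25_000

def filter_phase(chunk):
    """Per-record phase: inspect one record at a time and emit (key, value) pairs."""
    emitted = Counter()
    for rec in chunk:
        if rec["bytes"] > 256:                   # the 'query' expressed as a filter
            emitted[rec["url"]] += rec["bytes"]
    return emitted

def aggregate_phase(partials):
    """Aggregation phase: collate the values emitted by all filter workers."""
    total = Counter()
    for part in partials:
        total += part
    return total

if __name__ == "__main__":
    chunk_size = 10_000
    chunks = [RECORDS[i:i + chunk_size] for i in range(0, len(RECORDS), chunk_size)]
    with Pool() as pool:                         # both phases distribute naturally
        partials = pool.map(filter_phase, chunks)
    print(aggregate_phase(partials).most_common(3))
```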
46

Görbitz, Carl Henrik. "What is the best crystal size for collection of X-ray data? Refinement of the structure of glycyl-L-serine based on data from a very large crystal." Acta Crystallographica Section B Structural Science 55, no. 6 (December 1, 1999): 1090–98. http://dx.doi.org/10.1107/s0108768199008721.

Abstract:
The dipeptide Gly-L-Ser was crystallized as part of a study on hydrogen-bonding patterns in the structures of dipeptides. Hydrogen-bond donors and acceptors have been assigned ranks (1 is best, 2 is next best etc.), and the observed hydrogen-bond connectivity is compared with the hypothetical pattern in which the rank n donor associates with the rank n acceptor (n = 1, 2, . . .), and with the pattern observed in the retroanalogue L-Ser-Gly, which contains the same functional groups. Crystallization of the title compound produced very bulky crystals. Rather than reducing the size of one of these before data collection, three data sets with different exposure times were collected with a Siemens SMART CCD diffractometer on a very large specimen (2.2 × 2.0 × 0.8 mm). The crystal was subsequently shaped into a 0.30 mm-diameter sphere for collection of two additional data sets. The discussion of the refinement results focuses on the effect of absorption correction for the various data sets, and a comparison of geometrical and thermal parameters. One advantage of using a large crystal, the great speed with which data can be obtained, has been exemplified by collection of a complete data set of good quality in less than 25 min.
47

Wlodarczyk-Sielicka, Marta, and Wioleta Blaszczak-Bak. "Processing of Bathymetric Data: The Fusion of New Reduction Methods for Spatial Big Data." Sensors 20, no. 21 (October 30, 2020): 6207. http://dx.doi.org/10.3390/s20216207.

Abstract:
Floating autonomous vehicles are very often equipped with modern systems that collect information about the situation under the water surface, e.g., the depth or type of bottom and obstructions on the seafloor. One such system is the multibeam echosounder (MBES), which collects very large sets of bathymetric data. The development and analysis of such large sets are laborious and expensive. Reduction of the spatial data obtained from bathymetric and other systems collecting spatial data is currently widely used. In commercial programs used in the development of data from hydrographic systems, methods of interpolation to a specific mesh size are very frequently used. The authors of this article previously proposed the original true bathymetric data reduction (TBDRed) and Optimum Dataset (OptD) reduction methods, which maintain the actual position and depth for each of the measured points, without their interpolation. The effectiveness of the proposed methods has already been presented in previous articles. This article proposes the fusion of original reduction methods, which is a new and innovative approach to the problem of bathymetric data reduction. The article contains a description of the methods used and the methodology of developing bathymetric data. The proposed fusion of reduction methods allows the generation of numerical models that can be a safe, reliable source of information, and a basis for design. Numerical models can also be used in comparative navigation, during the creation of electronic navigation maps and other hydrographic products.
48

Stefansson, H. Narfi, Kevin W. Eliceiri, Charles F. Thomas, Amos Ron, Ron DeVore, Robert Sharpley, and John G. White. "Wavelet Compression of Three-Dimensional Time-Lapse Biological Image Data." Microscopy and Microanalysis 11, no. 1 (January 28, 2005): 9–17. http://dx.doi.org/10.1017/s1431927605050014.

Abstract:
The use of multifocal-plane, time-lapse recordings of living specimens has allowed investigators to visualize dynamic events both within ensembles of cells and individual cells. Recordings of such four-dimensional (4D) data from digital optical sectioning microscopy produce very large data sets. We describe a wavelet-based data compression algorithm that capitalizes on the inherent redundancies within multidimensional data to achieve higher compression levels than can be obtained from single images. The algorithm will permit remote users to roam through large 4D data sets using communication channels of modest bandwidth at high speed. This will allow animation to be used as a powerful aid to visualizing dynamic changes in three-dimensional structures.
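As a hedged sketch of the general approach (a multidimensional wavelet decomposition followed by discarding small coefficients), the following Python snippet applies PyWavelets to a synthetic 3-D volume. It is not the authors' codec: quantization, encoding and the time dimension are omitted.

```python
# Sketch: exploit redundancy in a 3-D volume by wavelet decomposition plus thresholding.
# Not the paper's codec; quantization and entropy coding are omitted.
import numpy as np
import pywt

rng = np.random.default_rng(3)
x, y, z = np.meshgrid(*(np.linspace(0, 1, 64),) * 3, indexing="ij")
volume = np.sin(6 * x) * np.cos(4 * y) * z + 0.01 * rng.standard_normal(x.shape)

coeffs = pywt.wavedecn(volume, wavelet="db4", mode="periodization", level=3)
arr, slices = pywt.coeffs_to_array(coeffs)            # flatten for easy thresholding

threshold = np.quantile(np.abs(arr), 0.95)            # keep roughly the 5% largest coefficients
arr_compressed = pywt.threshold(arr, threshold, mode="hard")

reconstructed = pywt.waverecn(
    pywt.array_to_coeffs(arr_compressed, slices, output_format="wavedecn"),
    wavelet="db4", mode="periodization")

kept = np.count_nonzero(arr_compressed) / arr.size
err = np.linalg.norm(reconstructed - volume) / np.linalg.norm(volume)
print(f"kept {kept:.1%} of coefficients, relative error {err:.3%}")
```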
49

Holemans, Thomas, Zhu Yang, and Maarten Vanierschot. "Efficient Reduced Order Modeling of Large Data Sets Obtained from CFD Simulations." Fluids 7, no. 3 (March 17, 2022): 110. http://dx.doi.org/10.3390/fluids7030110.

Abstract:
The ever-increasing computational power has shifted direct numerical simulations towards higher Reynolds numbers and large eddy simulations towards industrially-relevant flow scales. However, this increase in both temporal and spatial resolution has severely increased the computational cost of model order reduction techniques. Reducing the full data set to a smaller subset in order to perform reduced-order modeling (ROM) may be an interesting method to keep the computational effort reasonable. Moreover, non-tomographic particle image velocimetry measurements obtain a 2D data set of a 3D flow field and an interesting research question would be to quantify the difference between this 2D ROM compared to the 3D ROM of the full flow field. To provide an answer to both issues, the aim of this study was to test a new method for obtaining POD basis functions from a small subset of data initially and using them afterwards in the ROM of either the complete data set or the reduced data set. Hence, no new method of ROM is presented, but we demonstrate a procedure to significantly reduce the computational effort required for the ROM of very large data sets and a quantification of the error introduced by reducing the size of those data sets. The method applies eigenvalue decomposition on a small subset of data extracted from a full 3D simulation and the obtained temporal coefficients are projected back on the 3D velocity fields to obtain the 3D spatial modes. To test the method, an annular jet was chosen as a flow topology due to its simple geometry and the rich dynamical content of its flow field. First, a smaller data set is extracted from the 2D cross-sectional planes and ROM is performed on that data set. Secondly, the full 3D spatial structures are reconstructed by projecting the temporal coefficients back on the 3D velocity fields and the 2D spatial structures by projecting the temporal coefficients back on the 2D velocity fields. It is shown that two perpendicular lateral planes are sufficient to capture the relevant large-scale structures. As such, the total processing time can be reduced by a factor of 136 and up to 22 times less RAM is needed to complete the ROM processing.
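A compact numpy sketch of the procedure summarized above (snapshot POD computed on a small subset of points, with the temporal coefficients then projected back onto the full snapshots to recover full-field spatial modes) is given below. The synthetic data and the mode count are invented; this illustrates the method of snapshots, not the authors' CFD pipeline.

```python
# Snapshot POD on a subset of points, with spatial modes recovered on the full field
# by projecting the temporal coefficients back onto the full snapshots (synthetic data).
import numpy as np

rng = np.random.default_rng(4)
n_full, n_sub, n_t = 20_000, 500, 200            # full grid points, subset points, snapshots

# Synthetic flow: two spatial structures modulated in time, plus noise.
modes_true = rng.standard_normal((n_full, 2))
amps = np.stack([np.sin(0.1 * np.arange(n_t)), np.cos(0.05 * np.arange(n_t))])
X_full = modes_true @ amps + 0.01 * rng.standard_normal((n_full, n_t))
X_full -= X_full.mean(axis=1, keepdims=True)     # work with fluctuations

subset = rng.choice(n_full, n_sub, replace=False)
X_sub = X_full[subset]                           # the cheap, plane-like subset

# Method of snapshots on the subset only: temporal correlation matrix and its eigenpairs.
C = X_sub.T @ X_sub / n_t
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1][:2]            # keep the two most energetic modes
a = eigvecs[:, order]                            # temporal coefficients (n_t x 2)

# Project the temporal coefficients back on the FULL snapshots to get full spatial modes.
phi_full = X_full @ a                            # (n_full x 2), up to scaling

# Quality check: how well does the recovered subspace span the true spatial structures?
proj = phi_full @ np.linalg.lstsq(phi_full, modes_true, rcond=None)[0]
print("residual of true modes outside recovered subspace:",
      np.linalg.norm(modes_true - proj) / np.linalg.norm(modes_true))
```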
50

Wang, Zhanquan, Taoli Han, and Huiqun Yu. "Research of MDCOP mining based on time aggregated graph for large spatio-temporal data sets." Computer Science and Information Systems 16, no. 3 (2019): 891–914. http://dx.doi.org/10.2298/csis180828032w.

Abstract:
Discovering mixed-drove spatiotemporal co-occurrence patterns (MDCOPs) is important for network security, for example in detecting distributed denial of service (DDoS) attacks. A DDoS attack usually shows many features, such as the server CPU being heavily occupied for a long time and bandwidth being consumed. In distributed cooperative intrusion, the feature information from multiple intrusion detection sources should be analyzed simultaneously to find the spatial correlation among the feature information. In addition to spatial correlation, intrusion also has temporal correlation: some invasions penetrate gradually, and attacks are the result of cumulative effects over a period of time. It is therefore necessary to discover MDCOPs in network security. However, it is difficult to mine MDCOPs from large attack event data sets because mining MDCOPs is computationally very expensive. In information security, the set of candidate co-occurrence attack event data sets is exponential in the number of object types, and the spatiotemporal data sets are too large to be managed in memory. To reduce the number of candidate co-occurrence instances, we present a computationally efficient MDCOP Graph Miner algorithm based on a Time Aggregated Graph, which can deal with large attack event data sets by means of a file index. The correctness, completeness and efficiency of the proposed methods are analyzed.