Academic literature on the topic 'Big Data Science'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Big Data Science.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Big Data Science"

1

Wright, Alex. "Big data meets big science." Communications of the ACM 57, no. 7 (July 2014): 13–15. http://dx.doi.org/10.1145/2617660.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Broome, Marion E. "Big data, data science, and big contributions." Nursing Outlook 64, no. 2 (March 2016): 113–14. http://dx.doi.org/10.1016/j.outlook.2016.02.001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

McCartney, Patricia R. "Big Data Science." MCN, The American Journal of Maternal/Child Nursing 40, no. 2 (2015): 130. http://dx.doi.org/10.1097/nmc.0000000000000118.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Morik, Katharina, Christian Bockermann, and Sebastian Buschjäger. "Big Data Science." KI - Künstliche Intelligenz 32, no. 1 (December 20, 2017): 27–36. http://dx.doi.org/10.1007/s13218-017-0522-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Tonidandel, Scott, Eden B. King, and Jose M. Cortina. "Big Data Methods." Organizational Research Methods 21, no. 3 (November 16, 2016): 525–47. http://dx.doi.org/10.1177/1094428116677299.

Full text
Abstract:
Advances in data science, such as data mining, data visualization, and machine learning, are extremely well-suited to address numerous questions in the organizational sciences given the explosion of available data. Despite these opportunities, few scholars in our field have discussed the specific ways in which the lens of our science should be brought to bear on the topic of big data and big data's reciprocal impact on our science. The purpose of this paper is to provide an overview of the big data phenomenon and its potential for impacting organizational science in both positive and negative ways. We identify the biggest opportunities afforded by big data along with the biggest obstacles, and we discuss specifically how we think our methods will be most impacted by the data analytics movement. We also provide a list of resources to help interested readers incorporate big data methods into their existing research. Our hope is that we stimulate interest in big data, motivate future research using big data sources, and encourage the application of associated data science techniques more broadly in the organizational sciences.
APA, Harvard, Vancouver, ISO, and other styles
6

Mathias, Elton, Roveena Goveas, and Manish Rajak. "Clinical Research - A Big Data Science Approach." International Journal of Trend in Scientific Research and Development 2, no. 2 (February 28, 2018): 1075–78. http://dx.doi.org/10.31142/ijtsrd9547.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chen, Yong, Hong Chen, Anjee Gorkhali, Yang Lu, Yiqian Ma, and Ling Li. "Big data analytics and big data science: a survey." Journal of Management Analytics 3, no. 1 (January 2, 2016): 1–42. http://dx.doi.org/10.1080/23270012.2016.1141332.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Wang, Chunpeng, Ullrich Steiner, and Alessandro Sepe. "Synchrotron Big Data Science." Small 14, no. 46 (September 17, 2018): 1802291. http://dx.doi.org/10.1002/smll.201802291.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Delaney, Connie White, Jane Englebright, and Thomas Clancy. "Nursing Big Data Science." Journal of Nursing Scholarship 53, no. 3 (May 2021): 259–61. http://dx.doi.org/10.1111/jnu.12664.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Saez-Rodriguez, Julio, Markus M. Rinschen, Jürgen Floege, and Rafael Kramann. "Big science and big data in nephrology." Kidney International 95, no. 6 (June 2019): 1326–37. http://dx.doi.org/10.1016/j.kint.2018.11.048.

Full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Big Data Science"

1

Islam, Md Zahidul. "A Cloud Based Platform for Big Data Science." Thesis, Linköpings universitet, Programvara och system, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-103700.

Full text
Abstract:
With the advent of cloud computing, resizable, scalable infrastructures for data processing are now available to everyone. Software platforms and frameworks that support data-intensive distributed applications, such as Amazon Web Services and Apache Hadoop, provide users with the necessary tools and infrastructure to work with thousands of scalable computers and process terabytes of data. However, writing scalable applications that run on top of these distributed frameworks is still a demanding and challenging task. The thesis aimed to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large data sets, collectively known as “big data”. The term “big data” in this thesis refers to large, diverse, complex, longitudinal and/or distributed data sets generated from instruments, sensors, internet transactions, email, social networks, Twitter streams, and/or all digital sources available today and in the future. We introduced architectures and concepts for implementing a cloud-based infrastructure for analyzing large volumes of semi-structured and unstructured data. We built and evaluated an application prototype for collecting, organizing, processing, visualizing and analyzing data from the retail industry gathered from indoor navigation systems and social networks (Twitter, Facebook etc). Our finding was that developing a large-scale data analysis platform is often quite complex when the processed data are expected to grow continuously in the future. The architecture varies depending on requirements. If we want to build a data warehouse and analyze the data afterwards (batch processing), the best choices are Hadoop clusters with Pig or Hive. This architecture has been proven at Facebook and Yahoo for years. On the other hand, if the application involves real-time data analytics, then the recommendation is Hadoop clusters with Storm, which has been used successfully at Twitter. After evaluating the developed prototype we introduced a new architecture that can handle large-scale batch and real-time data. We also proposed an upgrade of the existing prototype to handle real-time indoor navigation data.
APA, Harvard, Vancouver, ISO, and other styles
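
The abstract above recommends Hadoop with Pig or Hive for batch workloads and Storm for real-time analytics. As a rough, illustrative sketch of the batch side only (not code from the thesis), here is a word-count mapper and reducer in Python following the Hadoop Streaming convention of tab-separated key/value lines on stdin and stdout; the script name and the sample invocation below are assumptions.

```python
#!/usr/bin/env python3
"""wordcount_streaming.py -- toy Hadoop Streaming job (illustrative only).

Run the mapper with "map" as the first argument and the reducer with no
argument; Hadoop Streaming sorts the mapper output by key before it
reaches the reducer, so the reducer only needs a running total.
"""
import sys


def mapper():
    # Emit "word<TAB>1" for every whitespace-separated token on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")


def reducer():
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current and current is not None:
            print(f"{current}\t{total}")
            total = 0
        current = word
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")


if __name__ == "__main__":
    mapper() if sys.argv[1:2] == ["map"] else reducer()
```

Locally, `cat input.txt | python3 wordcount_streaming.py map | sort | python3 wordcount_streaming.py reduce` simulates the sort-and-reduce pipeline that the Hadoop Streaming jar would run across the cluster.
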
2

Al-Hashemi, Idrees Yousef. "Applying data mining techniques over big data." Thesis, Boston University, 2013. https://hdl.handle.net/2144/21119.

Full text
Abstract:
The rapid development of information technology in recent decades means that data appear in a wide variety of formats — sensor data, tweets, photographs, raw data, and unstructured data. Statistics show that there were 800,000 petabytes stored in the world in 2000. Today’s internet has about 0.1 zettabytes of data (a ZB is about 10²¹ bytes), and this number will reach 35 ZB by 2020. With such an overwhelming flood of information, present data management systems are not able to scale to this huge amount of raw, unstructured data—in today’s parlance, Big Data. In the present study, we show the basic concepts and design of Big Data tools, algorithms, and techniques. We compare the classical data mining algorithms to the Big Data algorithms by using Hadoop/MapReduce as a core implementation of Big Data for scalable algorithms. We implemented the K-means algorithm and the A-priori algorithm with Hadoop/MapReduce on a 5-node Hadoop cluster. We explore NoSQL databases for semi-structured, massively scalable data by using MongoDB as an example. Finally, we compare the performance of HDFS (Hadoop Distributed File System) and MongoDB data storage for these two algorithms.
APA, Harvard, Vancouver, ISO, and other styles
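
The thesis above implements K-means and A-priori on a 5-node Hadoop/MapReduce cluster. The distributed versions are not reproduced here; the standard-library Python sketch below shows only the assign-then-update loop of plain K-means that such a MapReduce job parallelises (assignment in the map phase, centroid averaging in the reduce phase). The cluster count, iteration limit, and toy data are illustrative assumptions.

```python
import random


def kmeans(points, k=3, iters=20, seed=0):
    """Plain K-means on 2-D points: assign each point to its nearest
    centroid, then recompute centroids, for a fixed number of
    iterations (no convergence test, for brevity)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k),
                          key=lambda i: (x - centroids[i][0]) ** 2
                                        + (y - centroids[i][1]) ** 2)
            clusters[nearest].append((x, y))
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster emptied out
                centroids[i] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids


if __name__ == "__main__":
    random.seed(1)
    data = [(random.gauss(cx, 0.5), random.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(100)]
    print(kmeans(data, k=3))
```
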
3

Neagu, Daniel, and A.-N. Richarz. "Big data in predictive toxicology." Royal Society of Chemistry, 2019. http://hdl.handle.net/10454/17603.

Full text
Abstract:
The rate at which toxicological data is generated is continually becoming more rapid and the volume of data generated is growing dramatically. This is due in part to advances in software solutions and cheminformatics approaches which increase the availability of open data from chemical, biological, toxicological and high-throughput screening resources. However, the amplified pace and capacity of data generation achieved by these novel techniques presents challenges for organising and analysing data output. Big Data in Predictive Toxicology discusses these challenges as well as the opportunities of new techniques encountered in data science. It addresses the nature of toxicological big data, their storage, analysis and interpretation. It also details how these data can be applied in toxicity prediction, modelling and risk assessment.
APA, Harvard, Vancouver, ISO, and other styles
4

Cheelangi, Madhusudan. "Result Distribution in Big Data Systems." Thesis, University of California, Irvine, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=1539891.

Full text
Abstract:

We are building a Big Data Management System (BDMS) called AsterixDB at UCI. Since AsterixDB is designed to operate on large volumes of data, the results for its queries can be potentially very large, and AsterixDB is also designed to operate under high-concurrency workloads. As a result, we need a specialized mechanism to manage these large volumes of query results and deliver them to the clients. In this thesis, we present an architecture and an implementation of a new result distribution framework that is capable of handling large volumes of results under high-concurrency workloads. We present the various components of this result distribution framework and show how they interact with each other to manage large volumes of query results and deliver them to clients. We also discuss various result distribution policies that are possible with our framework and compare their performance through experiments.

We have implemented a REST-like HTTP client interface on top of the result distribution framework to allow clients to submit queries and obtain their results. This client interface provides two modes for clients to choose from to read their query results: synchronous mode and asynchronous mode. In synchronous mode, query results are delivered to a client as a direct response to its query within the same request-response cycle. In asynchronous mode, a query handle is returned instead to the client as a response to its query. The client can store the handle and send another request later, including the query handle, to read the result for the query whenever it wants. The architectural support for these two modes is also described in this thesis. We believe that the result distribution framework, combined with this client interface, successfully meets the result management demands of AsterixDB.

APA, Harvard, Vancouver, ISO, and other styles
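
The thesis above describes a client interface with a synchronous mode and an asynchronous mode in which a query handle is returned and redeemed later. The Python sketch below imitates that handle pattern with a thread pool; it is a generic illustration of the idea, not AsterixDB's actual REST API, and every class, method, and query string in it is invented for the example.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class QueryService:
    """Toy stand-in for a result-distribution endpoint: submit() returns a
    handle immediately (asynchronous mode); fetch() redeems the handle
    later; query_sync() answers within the same call (synchronous mode)."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._handles = {}

    def _run(self, query):
        # Placeholder "execution"; a real system would stream result
        # partitions back from the cluster here.
        return f"results for: {query}"

    def submit(self, query):
        handle = str(uuid.uuid4())
        self._handles[handle] = self._pool.submit(self._run, query)
        return handle

    def fetch(self, handle, timeout=None):
        # Blocks until the result for this handle is ready.
        return self._handles.pop(handle).result(timeout=timeout)

    def query_sync(self, query):
        return self._run(query)


if __name__ == "__main__":
    svc = QueryService()
    h = svc.submit("SELECT * FROM big_dataset")   # client keeps the handle
    print(svc.fetch(h))                           # ...and redeems it later
    print(svc.query_sync("SELECT 1"))
```
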
5

Abidi, Faiz Abbas. "Remote High Performance Visualization of Big Data for Immersive Science." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/78210.

Full text
Abstract:
Remote visualization has emerged as a necessary tool in the analysis of big data. High-performance computing clusters can provide several benefits in scaling to larger data sizes, from parallel file systems to larger RAM profiles to parallel computation among many CPUs and GPUs. For scalable data visualization, remote visualization tools and infrastructure are critical, since only pixels and interaction events are sent over the network instead of the data. In this paper, we present our pipeline using VirtualGL, TurboVNC, and ParaView to render over 40 million points using remote HPC clusters and project over 26 million pixels in a CAVE-style system. We benchmark the system by varying the video stream compression parameters supported by TurboVNC and establish some best practices for typical usage scenarios. This work will help research scientists and academicians in scaling their big data visualizations for real-time interaction.
APA, Harvard, Vancouver, ISO, and other styles
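
The paper above streams rendered pixels (via VirtualGL and TurboVNC) instead of shipping the data. As a back-of-the-envelope illustration of the two quantities involved in that trade-off, the short Python calculation below compares an uncompressed pixel stream with a one-off raw point transfer; the frame rate, colour depth, and bytes-per-point figures are assumptions, not numbers from the thesis, and TurboVNC's compression would shrink the pixel stream considerably.

```python
# Rough bandwidth comparison: shipping rendered pixels vs. shipping raw points.
# All parameters below are illustrative assumptions, not figures from the thesis.
pixels          = 26_000_000   # projected pixels (order of magnitude from the abstract)
bytes_per_pixel = 3            # assumed 24-bit colour, before VNC compression
fps             = 10           # assumed interactive frame rate

points          = 40_000_000   # rendered points (order of magnitude from the abstract)
bytes_per_point = 12           # assumed: three 32-bit floats per point

pixel_stream_mb_s = pixels * bytes_per_pixel * fps / 1e6
raw_transfer_mb   = points * bytes_per_point / 1e6

print(f"uncompressed pixel stream: ~{pixel_stream_mb_s:,.0f} MB/s")
print(f"one-off raw point transfer: ~{raw_transfer_mb:,.0f} MB")
```
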
6

Wang, Jiayin. "Building Efficient Large-Scale Big Data Processing Platforms." Thesis, University of Massachusetts Boston, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10262281.

Full text
Abstract:

In the era of big data, many cluster platforms and resource management schemes have been created to satisfy the increasing demand for processing large volumes of data. A general big data processing job consists of multiple stages, and each stage represents a generally defined data operation such as filtering and sorting. To parallelize the job execution in a cluster, each stage includes a number of identical tasks that can be concurrently launched at multiple servers. Practical clusters often involve hundreds or thousands of servers processing a large batch of jobs. Resource management, which governs cluster resource allocation and job execution, is extremely critical for system performance.

Generally speaking, there are three main challenges in resource management of the new big data processing systems. First, while there are various pending tasks from different jobs and stages, it is difficult to determine which ones deserve priority to obtain the resources for execution, considering the tasks' different characteristics such as resource demand and execution time. Second, there are dependencies among the tasks that can be concurrently running. For any two consecutive stages of a job, the output data of the former stage is the input data of the latter. The resource management has to comply with such dependencies. The third challenge is the inconsistent performance of the cluster nodes. In practice, the run-time performance of every server varies. The resource management needs to dynamically adjust the resource allocation according to the performance change of each server.

The resource management in existing platforms and prior work often relies on fixed user-specific configurations and assumes consistent performance on each node. The performance, however, is not satisfactory under various workloads. This dissertation aims to explore new approaches to improving the efficiency of large-scale big data processing platforms. In particular, the run-time dynamic factors are carefully considered when the system allocates the resources. New algorithms are developed to collect run-time data and predict the characteristics of jobs and the cluster. We further develop resource management schemes that dynamically tune the resource allocation for each stage of every running job in the cluster. New findings and techniques in this dissertation will certainly provide valuable and inspiring insights to other similar problems in the research community.

APA, Harvard, Vancouver, ISO, and other styles
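
The dissertation above centres on deciding which pending tasks receive cluster resources while respecting stage dependencies and uneven node performance. As a deliberately simplified illustration (not the dissertation's algorithm), the Python sketch below releases tasks whose prerequisite stages have finished and greedily places each one on the node reporting the most free slots; the task graph and node capacities are invented for the example.

```python
import heapq


def schedule(tasks, nodes):
    """Greedy placement sketch: release tasks whose prerequisites are done
    and put each one on the node currently reporting the most free slots.
    tasks maps task name -> list of prerequisite task names;
    nodes maps node name -> free slot count."""
    done, order = set(), []
    heap = [(-free, name) for name, free in nodes.items()]  # max-heap by free slots
    heapq.heapify(heap)
    pending = dict(tasks)
    while pending:
        ready = [t for t, prereqs in pending.items()
                 if all(p in done for p in prereqs)]
        if not ready:
            raise ValueError("dependency cycle or missing prerequisite")
        for task in ready:
            neg_free, node = heapq.heappop(heap)
            order.append((task, node))
            done.add(task)
            del pending[task]
            # One slot is consumed; a real scheduler would return it when the
            # task completes and re-measure capacity as node performance drifts.
            heapq.heappush(heap, (neg_free + 1, node))
    return order


if __name__ == "__main__":
    dag = {"map1": [], "map2": [], "reduce": ["map1", "map2"]}
    print(schedule(dag, {"nodeA": 4, "nodeB": 2}))
```
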
7

Da, Yanan. "A Big Spatial Data System for Efficient and Scalable Spatial Data Processing." Thesis, Southern Illinois University at Edwardsville, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10682760.

Full text
Abstract:

Today, a large amount of spatial data is generated from a variety of sources, such as mobile devices, sensors, and satellites. Traditional spatial data processing techniques no longer satisfy the efficiency and scalability requirements for large-scale spatial data processing. Existing Big Data processing frameworks such as Hadoop and Spark have been extended to support effective large-scale spatial data processing. In addition to processing data in distributed schemes utilizing computer clusters for efficiency and scalability, single-node performance can also be improved by making use of multi-core processors. In this thesis, we investigate approaches to parallelize line segment intersection algorithms for spatial computations on multi-core processors, which can be used as node-level algorithms for distributed spatial data processing. We first provide our design of line segment intersection algorithms and introduce parallelization techniques. Then, we describe experimental results using multiple data sets and examine speedups with varying numbers of processing cores. Equipped with the efficient underlying algorithm for spatial computation, we investigate how to build a native big spatial data system from the ground up. We provide a system design for distributed large-scale spatial data management and processing using a two-level hash-based Quadtree index as well as algorithms for spatial operations.

APA, Harvard, Vancouver, ISO, and other styles
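
Line segment intersection is the node-level primitive that the thesis above parallelises across cores. The Python sketch below shows the classic orientation-based intersection test for two segments; it is a textbook formulation offered for illustration only, the multi-core decomposition and the two-level hash-based Quadtree index from the thesis are not reproduced, and the helper names are our own.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p):
    > 0 counter-clockwise, < 0 clockwise, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def on_segment(p, q, r):
    """True if a collinear point r lies within the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0]) and
            min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """Do segments ab and cd intersect (including touching endpoints)?"""
    o1, o2 = orientation(a, b, c), orientation(a, b, d)
    o3, o4 = orientation(c, d, a), orientation(c, d, b)
    if o1 != o2 and o3 != o4:          # proper crossing
        return True
    # collinear special cases
    return ((o1 == 0 and on_segment(a, b, c)) or
            (o2 == 0 and on_segment(a, b, d)) or
            (o3 == 0 and on_segment(c, d, a)) or
            (o4 == 0 and on_segment(c, d, b)))


if __name__ == "__main__":
    print(segments_intersect((0, 0), (4, 4), (0, 4), (4, 0)))  # True
    print(segments_intersect((0, 0), (1, 1), (2, 2), (3, 3)))  # False
```
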
8

Mattasantharam, R. (Rubini). "3D web visualization of continuous integration big data." Master's thesis, University of Oulu, 2018. http://urn.fi/URN:NBN:fi:oulu-201812063239.

Full text
Abstract:
Continuous Integration (CI) is a practice used to automate the software build and its tests for every code integration to a shared repository. CI runs thousands of test scripts every day in a software organization. Every test produces data such as test result logs containing errors and warnings, performance measurements, and build metrics. This data volume tends to grow at unprecedented rates for the builds produced in the CI system, and the amount of integrated test result data grows over time. Visualizing and manipulating such real-time, dynamic data is a challenge for organizations. 2D visualization of big data has been in active use in the software industry. Though 2D visualization has numerous advantages, this study focuses on the 3D representation of CI big data and its advantages over 2D visualization. Interactivity with the data and system, and accessibility of the data anytime, anywhere, are two important requirements for the system to be usable. Thus, the study focused on creating a 3D user interface to visualize CI system data in a 3D web environment. Three-dimensional user interfaces have been studied by many researchers, who have identified various advantages of 3D visualization along with various interaction techniques and described how such systems are useful in real-world 3D applications. However, the usability of 3D user interfaces for visualization has not yet reached a desirable level, especially in the software industry, owing to its complex data. The purpose of this thesis is to explore the use of 3D data visualization that could help the CI system users of a beneficiary organization in interpreting and exploring CI system data. The study focuses on designing and creating a 3D user interface to provide a more effective and usable system for CI data exploration. Design science research is chosen as a suitable framework to conduct the study. The study identifies the advantages of applying 3D visualization to software system data and then explores how 3D visualization could help users in exploring the software data through visualization and its features. The results of the study reveal that 3D visualization helps the beneficiary organization view and compare multiple datasets in a single screen space, and see both a holistic view of large datasets and focused details of multiple datasets of various categories in a single screen space. The results also indicate that 3D visualization helps the beneficiary organization's CI team represent big data better in 3D than in 2D.
APA, Harvard, Vancouver, ISO, and other styles
9

Chen, Guo. "Implementation of Cumulative Probability Models for Big Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1619624862283514.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Wilson, David S. "Correlated Sample Synopsis on Big Data." Youngstown State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1544264480082086.

Full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "Big Data Science"

1

Jiang, Zhe, and Shashi Shekhar. Spatial Big Data Science. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-60195-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

EMC Education Services. Data Science & Big Data Analytics. Indianapolis, IN, USA: John Wiley & Sons, Inc, 2015. http://dx.doi.org/10.1002/9781119183686.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Mahmood, Zaigham, ed. Data Science and Big Data Computing. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-31861-5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Mishra, Durgesh Kumar, Xin-She Yang, and Aynur Unal, eds. Data Science and Big Data Analytics. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-10-7641-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Foster, Ian. Big Data and Social Science. Boca Raton, FL: Chapman and Hall/CRC, 2016. http://dx.doi.org/10.1201/9781315368238.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Jones, Michael N., ed. Big Data in Cognitive Science. New York, NY: Psychology Press, 2016. http://dx.doi.org/10.4324/9781315413570.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Cui, Zhen, Jinshan Pan, Shanshan Zhang, Liang Xiao, and Jian Yang, eds. Intelligence Science and Big Data Engineering. Big Data and Machine Learning. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-36204-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Lee, Roger, ed. Big Data, Cloud Computing, and Data Science Engineering. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-24405-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Lee, Roger, ed. Big Data, Cloud Computing, Data Science & Engineering. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-319-96803-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Peng, Yuxin, Kai Yu, Jiwen Lu, and Xingpeng Jiang, eds. Intelligence Science and Big Data Engineering. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-02698-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "Big Data Science"

1

Kauermann, Göran. "Data Science als Studiengang." In Big Data, 87–95. Wiesbaden: Springer Fachmedien Wiesbaden, 2017. http://dx.doi.org/10.1007/978-3-658-20083-1_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Martinez, Lourdes S. "Data Science." In Encyclopedia of Big Data, 1–4. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-32001-4_60-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kreuter, Frauke, Florian Keusch, Evgenia Samoilova, and Karin Frößinger. "International Program in Survey and Data Science." In Big Data, 27–41. Wiesbaden: Springer Fachmedien Wiesbaden, 2017. http://dx.doi.org/10.1007/978-3-658-20083-1_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Jeyaraj, Rathinaraja, Ganeshkumar Pugalendhi, and Anand Paul. "Data Science." In Big Data with Hadoop MapReduce, 357–69. Apple Academic Press, 2020. http://dx.doi.org/10.1201/9780429321733-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lake, Peter, and Paul Crowther. "Big Data." In Undergraduate Topics in Computer Science, 135–59. London: Springer London, 2013. http://dx.doi.org/10.1007/978-1-4471-5601-7_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Domdouzis, Konstantinos, Peter Lake, and Paul Crowther. "Big Data." In Undergraduate Topics in Computer Science, 141–63. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-42224-0_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Fox, Charles. "“Data Science” and “Big Data”." In Springer Textbooks in Earth Sciences, Geography and Environment, 1–14. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-72953-4_1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Verhoef, Peter C., Edwin Kooge, Natasha Walk, and Jaap E. Wieringa. "Data science and big data." In Creating Value with Data Analytics in Marketing, 1–5. 2nd ed. London: Routledge, 2021. http://dx.doi.org/10.4324/9781003011163-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Webb, Stephen. "Big Data." In New Light Through Old Windows: Exploring Contemporary Science Through 12 Classic Science Fiction Tales, 181–99. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-03195-4_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Morini, Marco. "Political Science." In Encyclopedia of Big Data, 1–3. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-319-32001-4_166-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Big Data Science"

1

Getoor, Lise. "Responsible Data Science." In 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019. http://dx.doi.org/10.1109/bigdata47090.2019.9006129.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Haug, Frank S. "Bad big data science." In 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016. http://dx.doi.org/10.1109/bigdata.2016.7840935.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Shamsuddin, Siti Mariyam, and Shafaatunnur Hasan. "Data science vs big data @ UTM big data centre." In 2015 International Conference on Science in Information Technology (ICSITech). IEEE, 2015. http://dx.doi.org/10.1109/icsitech.2015.7407766.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Watson, Alex, Deepigha Shree Vittal Babu, and Suprio Ray. "Sanzu: A data science benchmark." In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. http://dx.doi.org/10.1109/bigdata.2017.8257934.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Baumann, Peter, and Dimitar Misev. "Enhancing science support in SQL." In 2015 IEEE International Conference on Big Data (Big Data). IEEE, 2015. http://dx.doi.org/10.1109/bigdata.2015.7364007.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Tahsin, Anika, and Md Manzurul Hasan. "Big Data & Data Science." In ICCA 2020: International Conference on Computing Advancements. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3377049.3377051.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Pearl, Judea. "The new science of cause and effect, with reflections on data science and artificial intelligence." In 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019. http://dx.doi.org/10.1109/bigdata47090.2019.9005644.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Saltz, Jeffrey S., and Nancy W. Grady. "The ambiguity of data science team roles and the need for a data science workforce framework." In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. http://dx.doi.org/10.1109/bigdata.2017.8258190.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Dorr, Bonnie J., Craig S. Greenberg, Peter Fontana, Mark Przybocki, Marion Le Bras, Cathryn Ploehn, Oleg Aulov, and Wo Chang. "The NIST data science evaluation series: Part of the NIST information access division data science initiative." In 2015 IEEE International Conference on Big Data (Big Data). IEEE, 2015. http://dx.doi.org/10.1109/bigdata.2015.7364096.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Underwood, William, David Weintrop, Michael Kurtz, and Richard Marciano. "Introducing Computational Thinking into Archival Science Education." In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018. http://dx.doi.org/10.1109/bigdata.2018.8622511.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Big Data Science"

1

Metzler, Katie, David A. Kim, Nick Allum, and Angella Denman. Who Is Doing Computational Social Science? Trends in Big Data Research: A SAGE White Paper. SAGE Publishing, September 2016. http://dx.doi.org/10.4135/wp160926.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Neeley, Aimee, Stace E. Beaulieu, Chris Proctor, Ivona Cetinić, Joe Futrelle, Inia Soto Ramos, Heidi M. Sosik, et al. Standards and practices for reporting plankton and other particle observations from images. Woods Hole Oceanographic Institution, July 2021. http://dx.doi.org/10.1575/1912/27377.

Full text
Abstract:
This technical manual guides the user through the process of creating a data table for the submission of taxonomic and morphological information for plankton and other particles from images to a repository. Guidance is provided on producing the documentation that should accompany the submission of plankton and other particle data to a repository, describing data collection and processing techniques, and outlining the creation of a data file. Field names include scientificName, which represents the lowest-level taxonomic classification (e.g., genus if not certain of species, family if not certain of genus), and scientificNameID, the unique identifier from a reference database such as the World Register of Marine Species or AlgaeBase. The data table described here includes the field names associatedMedia, scientificName/scientificNameID for both automated and manual identification, biovolume, area_cross_section, length_representation and width_representation. Additional steps that instruct the user on how to format their data for submission to the Ocean Biodiversity Information System (OBIS) are also included. Examples of documentation and data files are provided for the user to follow. The documentation requirements and data table format are approved by both NASA’s SeaWiFS Bio-optical Archive and Storage System (SeaBASS) and the National Science Foundation’s Biological and Chemical Oceanography Data Management Office (BCO-DMO).
APA, Harvard, Vancouver, ISO, and other styles
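
The manual above defines a flat data table whose fields include associatedMedia, scientificName, scientificNameID, biovolume, area_cross_section, length_representation and width_representation. As a small, unofficial illustration of such a table, the Python snippet below writes a one-row CSV using those column names; the example values are placeholders (the scientificNameID shown is a dummy identifier in the World Register of Marine Species URN format, not a real record), and the units in the comments are assumptions.

```python
import csv

# Column names taken from the field list in the manual; the row below is a
# made-up example record, not real observational data.
FIELDS = ["associatedMedia", "scientificName", "scientificNameID",
          "biovolume", "area_cross_section",
          "length_representation", "width_representation"]

row = {
    "associatedMedia": "https://example.org/images/IFCB_roi_00001.png",
    "scientificName": "Thalassiosira",                        # genus-level call
    "scientificNameID": "urn:lsid:marinespecies.org:taxname:0000000",  # placeholder
    "biovolume": 1234.5,             # e.g. cubic micrometres (assumed unit)
    "area_cross_section": 321.0,     # e.g. square micrometres (assumed unit)
    "length_representation": 45.2,   # e.g. micrometres (assumed unit)
    "width_representation": 30.1,
}

with open("plankton_particles.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerow(row)
```
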
3

Saville, Alan, and Caroline Wickham-Jones, eds. Palaeolithic and Mesolithic Scotland : Scottish Archaeological Research Framework Panel Report. Society of Antiquaries of Scotland, June 2012. http://dx.doi.org/10.9750/scarf.06.2012.163.

Full text
Abstract:
Why research Palaeolithic and Mesolithic Scotland? Palaeolithic and Mesolithic archaeology sheds light on the first colonisation and subsequent early inhabitation of Scotland. It is a growing and exciting field where increasing Scottish evidence has been given wider significance in the context of European prehistory. It extends over a long period, which saw great changes, including substantial environmental transformations, and the impact of, and societal response to, climate change. The period as a whole provides the foundation for the human occupation of Scotland and is crucial for understanding prehistoric society, both for Scotland and across North-West Europe. Within the Palaeolithic and Mesolithic periods there are considerable opportunities for pioneering research. Individual projects can still have a substantial impact and there remain opportunities for pioneering discoveries, including cemeteries, domestic and other structures, and stratified sites, and for exploring the huge evidential potential of water-logged and underwater sites. Palaeolithic and Mesolithic archaeology also stimulates and draws upon exciting multi-disciplinary collaborations.
Panel Task and Remit: The panel remit was to review critically the current state of knowledge and consider promising areas of future research into the earliest prehistory of Scotland. This was undertaken with a view to improved understanding of all aspects of the colonization and inhabitation of the country by peoples practising a wholly hunter-fisher-gatherer way of life prior to the advent of farming. In so doing, it was recognised as particularly important that both environmental data (including vegetation, fauna, sea level, and landscape work) and cultural change during this period be evaluated. The resultant report outlines the different areas of research in which archaeologists interested in early prehistory work, and highlights the research topics to which they aspire. The report is structured by theme: history of investigation; reconstruction of the environment; the nature of the archaeological record; methodologies for recreating the past; and finally, the lifestyles of past people – the latter representing both a statement of current knowledge and the ultimate aim for archaeologists, the goal of all the former sections. The document is reinforced by material online which provides further detail and resources. The Palaeolithic and Mesolithic panel report of ScARF is intended as a resource to be utilised, built upon, and kept updated, hopefully by those it has helped inspire and inform as well as those who follow in their footsteps.
Future Research: The main recommendations of the panel report can be summarized under four key headings:
• Visibility: Due to the considerable length of time over which sites were formed, and the predominant mobility of the population, early prehistoric remains are to be found right across the landscape, although they often survive as ephemeral traces and in low densities. Therefore, all archaeological work should take into account the expectation of encountering early prehistoric remains. This applies equally to both commercial and research archaeology, and to amateur activity which often makes the initial discovery. This should not be seen as an obstacle, but as a benefit, and not finding such remains should be cause for question. There is no doubt that important evidence of these periods remains unrecognised in private, public, and commercial collections and there is a strong need for backlog evaluation, proper curation and analysis. The inadequate representation of Palaeolithic and Mesolithic information in existing national and local databases must be addressed.
• Collaboration: Multi-disciplinary, collaborative, and cross-sector approaches must be encouraged – site prospection, prediction, recognition, and contextualisation are key areas to this end. Reconstructing past environments and their chronological frameworks, and exploring submerged and buried landscapes offer existing examples of fruitful, cross-disciplinary work. Palaeolithic and Mesolithic archaeology has an important place within Quaternary science and the potential for deeply buried remains means that geoarchaeology should have a prominent role.
• Innovation: Research-led projects are currently making a substantial impact across all aspects of Palaeolithic and Mesolithic archaeology; a funding policy that acknowledges risk and promotes the innovation that these periods demand should be encouraged. The exploration of lesser known areas, work on different types of site, new approaches to artefacts, and the application of novel methodologies should all be promoted when engaging with the challenges of early prehistory.
• Tackling the ‘big questions’: Archaeologists should engage with the big questions of earliest prehistory in Scotland, including the colonisation of new land, how lifestyles in past societies were organized, the effects of and the responses to environmental change, and the transitions to new modes of life. This should be done through a holistic view of the available data, encompassing all the complexities of interpretation and developing competing and testable models. Scottish data can be used to address many of the currently topical research topics in archaeology, and will provide a springboard to a better understanding of early prehistoric life in Scotland and beyond.
APA, Harvard, Vancouver, ISO, and other styles
4

Holland, Darren, and Nazmina Mahmoudzadeh. Foodborne Disease Estimates for the United Kingdom in 2018. Food Standards Agency, January 2020. http://dx.doi.org/10.46756/sci.fsa.squ824.

Full text
Abstract:
In February 2020 the FSA published two reports which produced new estimates of foodborne norovirus cases. These were the ‘Norovirus Attribution Study’ (NoVAS study) (O’Brien et al., 2020) and the accompanying internal FSA technical review ‘Technical Report: Review of Quantitative Risk Assessment of foodborne norovirus transmission’ (NoVAS model review) (Food Standards Agency, 2020). The NoVAS study produced a Quantitative Microbiological Risk Assessment model (QMRA) to estimate foodborne norovirus. The NoVAS model review considered the impact of using alternative assumptions and other data sources on these estimates. From these two pieces of work, a revised estimate of foodborne norovirus was produced. The FSA has therefore updated its estimates of annual foodborne disease to include these new results and also to take account of more recent data related to other pathogens. The estimates produced include:
• Estimates of GP presentations and hospital admissions for foodborne norovirus based on the new estimates of cases. The NoVAS study only produced estimates for cases.
• Estimates of foodborne cases, GP presentations and hospital admissions for 12 other pathogens
• Estimates of unattributed cases of foodborne disease
• Estimates of total foodborne disease from all pathogens
Previous estimates: An FSA-funded research project ‘The second study of infectious intestinal disease in the community’, published in 2012 and referred to as the IID2 study (Tam et al., 2012), estimated that there were 17 million cases of infectious intestinal disease (IID) in 2009. These include illness caused by all sources, not just food. Of these 17 million cases, around 40% (around 7 million) could be attributed to 13 known pathogens. These pathogens included norovirus. The remaining 60% of cases (equivalent to 10 million cases) were unattributed cases. These are cases where the causal pathogen is unknown. Reasons for this include the causal pathogen was not tested for, the test was not sensitive enough to detect the causal pathogen or the pathogen is unknown to science. A second project ‘Costed extension to the second study of infectious intestinal disease in the community’, published in 2014 and known as the IID2 extension (Tam, Larose and O’Brien, 2014), estimated that there were 566,000 cases of foodborne disease per year caused by the same 13 known pathogens. Although a proportion of the unattributed cases would also be due to food, no estimate was provided for this in the IID2 extension.
New estimates: We estimate that there were 2.4 million cases of foodborne disease in the UK in 2018 (95% credible intervals 1.8 million to 3.1 million), with 222,000 GP presentations (95% Cred. Int. 150,000 to 322,000) and 16,400 hospital admissions (95% Cred. Int. 11,200 to 26,000). Of the estimated 2.4 million cases, 0.9 million (95% Cred. Int. 0.7 million to 1.2 million) were from the 13 known pathogens included in the IID2 extension and 1.4 million (95% Cred. Int. 1.0 million to 2.0 million) for unattributed cases. Norovirus was the pathogen with the largest estimate with 383,000 cases a year. However, this estimate is within the 95% credible interval for Campylobacter of 127,000 to 571,000. The pathogen with the next highest number of cases was Clostridium perfringens with 85,000 (95% Cred. Int. 32,000 to 225,000). While the methodology used in the NoVAS study does not lend itself to producing credible intervals for cases of norovirus, this does not mean that there is no uncertainty in these estimates.
There were a number of parameters used in the NoVAS study which, while based on the best science currently available, were acknowledged to have uncertain values. Sensitivity analysis undertaken as part of the study showed that changes to the values of these parameters could make big differences to the overall estimates. Campylobacter was estimated to have the most GP presentations with 43,000 (95% Cred. Int. 19,000 to 76,000), followed by norovirus with 17,000 (95% Cred. Int. 11,000 to 26,000) and Clostridium perfringens with 13,000 (95% Cred. Int. 6,000 to 29,000). For hospital admissions, Campylobacter was estimated to have 3,500 (95% Cred. Int. 1,400 to 7,600), followed by norovirus with 2,200 (95% Cred. Int. 1,500 to 3,100) and Salmonella with 2,100 admissions (95% Cred. Int. 400 to 9,900). As many of these credible intervals overlap, any ranking needs to be undertaken with caution. While the estimates provided in this report are for 2018, the methodology described can be applied to future years.
APA, Harvard, Vancouver, ISO, and other styles
5

African Open Science Platform Part 1: Landscape Study. Academy of Science of South Africa (ASSAf), 2019. http://dx.doi.org/10.17159/assaf.2019/0047.

Full text
Abstract:
This report maps the African landscape of Open Science – with a focus on Open Data as a sub-set of Open Science. Data to inform the landscape study were collected through a variety of methods, including surveys, desk research, engagement with a community of practice, networking with stakeholders, participation in conferences, case study presentations, and workshops hosted. Although the majority of African countries (35 of 54) demonstrate commitment to science through their investment in research and development (R&D), academies of science, ministries of science and technology, policies, recognition of research, and participation in the Science Granting Councils Initiative (SGCI), the following countries demonstrate the highest commitment and political willingness to invest in science: Botswana, Ethiopia, Kenya, Senegal, South Africa, Tanzania, and Uganda. In addition to existing policies in Science, Technology and Innovation (STI), the following countries have made progress towards Open Data policies: Botswana, Kenya, Madagascar, Mauritius, South Africa and Uganda. Only two African countries (Kenya and South Africa) at this stage contribute 0.8% of their GDP (Gross Domestic Product) to R&D (Research and Development), which is the closest to the AU’s (African Union’s) suggested 1%. Countries such as Lesotho and Madagascar ranked as 0%, while the R&D expenditure for 24 African countries is unknown. In addition to this, science globally has become fully dependent on stable ICT (Information and Communication Technologies) infrastructure, which includes connectivity/bandwidth, high performance computing facilities and data services. This is especially applicable since countries globally are finding themselves in the midst of the 4th Industrial Revolution (4IR), which is not only “about” data, but which “is” data. According to an article by Alan Marcus (2015) (Senior Director, Head of Information Technology and Telecommunications Industries, World Economic Forum), “At its core, data represents a post-industrial opportunity. Its uses have unprecedented complexity, velocity and global reach. As digital communications become ubiquitous, data will rule in a world where nearly everyone and everything is connected in real time. That will require a highly reliable, secure and available infrastructure at its core, and innovation at the edge.” Every industry is affected as part of this revolution – also science. An important component of the digital transformation is “trust” – people must be able to trust that governments and all other industries (including the science sector) adequately handle and protect their data. This requires accountability on a global level, and digital industries must embrace the change and go for a higher standard of protection. “This will reassure consumers and citizens, benefitting the whole digital economy”, says Marcus. A stable and secure information and communication technologies (ICT) infrastructure – currently provided by the National Research and Education Networks (NRENs) – is key to advance collaboration in science. The AfricaConnect2 project (AfricaConnect (2012–2014) and AfricaConnect2 (2016–2018)), through establishing connectivity between National Research and Education Networks (NRENs), is planning to roll out AfricaConnect3 by the end of 2019.
The concern, however, is that selected African governments (with the exception of a few countries such as South Africa, Mozambique, Ethiopia and others) have low awareness of the impact the Internet has today on all societal levels, how much ICT (and the 4th Industrial Revolution) have affected research, and the added value an NREN can bring to higher education and research in addressing the respective needs, which is far more complex than simply providing connectivity. Apart from more commitment and investment in R&D, African governments – to become and remain part of the 4th Industrial Revolution – have no option other than to acknowledge and commit to the role NRENs play in advancing science towards addressing the SDG (Sustainable Development Goals). For successful collaboration and direction, it is fundamental that policies within one country are aligned with one another. Alignment on a continental level is crucial for the future Pan-African African Open Science Platform to be successful. Both the HIPSSA (Harmonization of ICT Policies in Sub-Saharan Africa) project and WATRA (the West Africa Telecommunications Regulators Assembly) have made progress towards the regulation of the telecom sector, and in particular of bottlenecks which curb the development of competition among ISPs. A study under HIPSSA identified potential bottlenecks in access at an affordable price to the international capacity of submarine cables and suggested means and tools used by regulators to remedy them. Work on the recommended measures and making them operational continues in collaboration with WATRA. In addition to sufficient bandwidth and connectivity, high-performance computing facilities and services in support of data sharing are also required. The South African National Integrated Cyberinfrastructure System (NICIS) has made great progress in planning and setting up a cyberinfrastructure ecosystem in support of collaborative science and data sharing. The regional Southern African Development Community (SADC) Cyber-infrastructure Framework provides a valuable roadmap towards high-speed Internet, developing human capacity and skills in ICT technologies, high-performance computing and more. The following countries have been identified as having high-performance computing facilities, some as a result of the Square Kilometre Array (SKA) partnership: Botswana, Ghana, Kenya, Madagascar, Mozambique, Mauritius, Namibia, South Africa, Tunisia, and Zambia. More and more NRENs – especially the Level 6 NRENs (Algeria, Egypt, Kenya, South Africa, and recently Zambia) – are exploring offering additional services, also in support of data sharing and transfer. The following NRENs already allow for running data-intensive applications and sharing of high-end computing assets, bio-modelling and computation on high-performance/supercomputers: KENET (Kenya), TENET (South Africa), RENU (Uganda), ZAMREN (Zambia), EUN (Egypt) and ARN (Algeria). Fifteen higher education training institutions from eight African countries (Botswana, Benin, Kenya, Nigeria, Rwanda, South Africa, Sudan, and Tanzania) have been identified as offering formal courses on data science. In addition to formal degrees, a number of international short courses have been developed and free international online courses are also available as an option to build capacity and integrate as part of curricula.
The small number of higher education or research-intensive institutions offering data science is, however, insufficient, and there is a desperate need for more training in data science. The CODATA-RDA Schools of Research Data Science aim at addressing the continental need for foundational data skills across all disciplines, along with training conducted by The Carpentries programme (specifically Data Carpentry). Thus far, CODATA-RDA schools in collaboration with AOSP, integrating content from Data Carpentry, were presented in Rwanda (in 2018), and during 17–29 June 2019, in Ethiopia. Awareness regarding Open Science (including Open Data) is evident through the 12 Open Science-related Open Access/Open Data/Open Science declarations and agreements endorsed or signed by African governments; 200 Open Access journals from Africa registered on the Directory of Open Access Journals (DOAJ); 174 Open Access institutional research repositories registered on openDOAR (Directory of Open Access Repositories); 33 Open Access/Open Science policies registered on ROARMAP (Registry of Open Access Repository Mandates and Policies); 24 data repositories registered with the Registry of Data Repositories (re3data.org) (although the pilot project identified 66 research data repositories); and one data repository assigned the CoreTrustSeal. Although this is a start, far more needs to be done to align African data curation and research practices with global standards. Funding to conduct research remains a challenge. African researchers mostly fund their own research, and there are few incentives for them to make their research and accompanying data sets openly accessible. Funding and peer recognition, along with an enabling research environment conducive for research, are regarded as major incentives. The landscape report concludes with a number of concerns about sharing research data openly, as well as challenges in terms of Open Data policy, ICT infrastructure supportive of data sharing, capacity building, lack of skills, and the need for incentives. Although great progress has been made in terms of Open Science and Open Data practices, more awareness needs to be created and further advocacy efforts are required for buy-in from African governments. A federated African Open Science Platform (AOSP) will not only encourage more collaboration among researchers in addressing the SDGs, but it will also benefit the many stakeholders identified as part of the pilot phase. The time is now, for governments in Africa, to acknowledge the important role of science in general, but specifically Open Science and Open Data, through developing and aligning the relevant policies, investing in an ICT infrastructure conducive for data sharing through committing funding to making NRENs financially sustainable, incentivising open research practices by scientists, and creating opportunities for more scientists and stakeholders across all disciplines to be trained in data management.
APA, Harvard, Vancouver, ISO, and other styles