Journal articles on the topic 'Public datasets'

Consult the top 50 journal articles for your research on the topic 'Public datasets.'

1

Muturi, Peter N., Andrew M. Kahonge, and Christopher Kipchumba Chepken. "Assessing Identity Disclosure Risk in the Absence of Identified Datasets in the Public Domain." East African Journal of Information Technology 5, no. 1 (2022): 62–75. http://dx.doi.org/10.37284/eajit.5.1.773.

Abstract:
Data release is essential in supporting data analytics and secondary data analyses. However, data curators need to ensure the released datasets preserve data subjects’ privacy and retain analytical utility. Data privacy is achieved through the anonymisation of datasets before release. The risk of disclosure posed to the dataset should inform the level of anonymisation to be undertaken. As anonymisation achieves data privacy, it reduces the analytical utility of the dataset by introducing alterations to the original data values. Therefore, data curators require an appropriate estimate of the da
2

Chang, Nai Chen, Elissa Aminoff, John Pyles, Michael Tarr, and Abhinav Gupta. "Scaling Up Neural Datasets: A public fMRI dataset of 5000 scenes." Journal of Vision 18, no. 10 (2018): 732. http://dx.doi.org/10.1167/18.10.732.

3

Sinn, Donghee, Sujin Kim, and Sue Yeon Syn. "Public Library Innovation Inside Out." Proceedings of the Association for Information Science and Technology 60, no. 1 (2023): 1128–30. http://dx.doi.org/10.1002/pra2.967.

Abstract:
This poster presents public library innovations during the Covid-19 pandemic. Many public libraries quickly adapted to the pandemic environment, changing and improving their operations and services to meet the new challenges and demands from their users. We collected two datasets to investigate these innovations: the first dataset comprised 751 tweets from the 12 largest public libraries in the U.S., and the second dataset included 72 articles from 3 major professional magazines. These datasets were analyzed to identify innovative services provided between 2020 and 2021. A rigorous con
4

Kolmogortseva, Karina, Soo-Hyung Kim, and Aera Kim. "A Review of Public Datasets for Keystroke-based Behavior Analysis." Korean Institute of Smart Media 13, no. 7 (2024): 18–26. http://dx.doi.org/10.30693/smj.2024.13.7.18.

5

Tian, Feng, Feng Zhi, and Ruofan Zhao. "Datasets of Public Metrological Standard Data." China Scientific Data 7, no. 1 (2022): A148. http://dx.doi.org/10.11922/11-6035.csd.2021.0062.zh.

6

Anderson, B. "SP-0368 Dealing with public datasets." Radiotherapy and Oncology 182 (May 2023): S277–S278. http://dx.doi.org/10.1016/s0167-8140(23)67373-6.

7

Kirtac, Kadir, Nizamettin Aydin, Joël L. Lavanchy, et al. "Surgical Phase Recognition: From Public Datasets to Real-World Data." Applied Sciences 12, no. 17 (2022): 8746. http://dx.doi.org/10.3390/app12178746.

Abstract:
Automated recognition of surgical phases is a prerequisite for computer-assisted analysis of surgeries. The research on phase recognition has been mostly driven by publicly available datasets of laparoscopic cholecystectomy (Lap Chole) videos. Yet, videos observed in real-world settings might contain challenges, such as additional phases and longer videos, which may be missing in curated public datasets. In this work, we study (i) the possible data distribution discrepancy between videos observed in a given medical center and videos from existing public datasets, and (ii) the potential impact
8

Kolmogortseva, Karina, Soo-Hyung Kim, and Aera Kim. "[Corrigendum] A Review of Public Datasets for Keystroke-based Behavior Analysis." Korean Institute of Smart Media 13, no. 9 (2024): 38. http://dx.doi.org/10.30693/smj.2024.13.9.38.

9

Di, Yanghua, Zhiguo Jiang, and Haopeng Zhang. "A Public Dataset for Fine-Grained Ship Classification in Optical Remote Sensing Images." Remote Sensing 13, no. 4 (2021): 747. http://dx.doi.org/10.3390/rs13040747.

Abstract:
Fine-grained visual categorization (FGVC) is an important and challenging problem due to large intra-class differences and small inter-class differences caused by deformation, illumination, angles, etc. Although major advances have been achieved in natural images in the past few years due to the release of popular datasets such as the CUB-200-2011, Stanford Cars and Aircraft datasets, fine-grained ship classification in remote sensing images has been rarely studied because of relative scarcity of publicly available datasets. In this paper, we investigate a large amount of remote sensing image
10

Ferenc, Rudolf, Zoltán Tóth, Gergely Ladányi, István Siket, and Tibor Gyimóthy. "A public unified bug dataset for java and its assessment regarding metrics and bug prediction." Software Quality Journal 28, no. 4 (2020): 1447–506. http://dx.doi.org/10.1007/s11219-020-09515-0.

Abstract:
Bug datasets have been created and used by many researchers to build and validate novel bug prediction models. In this work, our aim is to collect existing public source code metric-based bug datasets and unify their contents. Furthermore, we wish to assess the plethora of collected metrics and the capabilities of the unified bug dataset in bug prediction. We considered 5 public datasets and we downloaded the corresponding source code for each system in the datasets and performed source code analysis to obtain a common set of source code metrics. This way, we produced a unified bug dat
11

Hwang, Yun-Young, et al. "Linked Method of Open Government Data by Datasets Oriented." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 6 (2021): 780–85. http://dx.doi.org/10.17762/turcomat.v12i6.2095.

Abstract:
To make public data more useful, it is necessary to provide relevant datasets that meet users' needs. We introduce a method for linking datasets, deriving linkages between fields of structured datasets provided by public data portals. We defined a dataset and the connectivity between datasets. This connectivity is based on the dataset metadata and on the linkage between the actual data field names and values. We constructed standard field names and, based on this standard, established the relationships between the datasets. This pa
12

Scheuerman, Morgan Klaus, Katy Weathington, Tarun Mugunthan, Emily Denton, and Casey Fiesler. "From Human to Data to Dataset: Mapping the Traceability of Human Subjects in Computer Vision Datasets." Proceedings of the ACM on Human-Computer Interaction 7, no. CSCW1 (2023): 1–33. http://dx.doi.org/10.1145/3579488.

Abstract:
Computer vision is a "data hungry" field. Researchers and practitioners who work on human-centric computer vision, like facial recognition, emphasize the necessity of vast amounts of data for more robust and accurate models. Humans are seen as a data resource which can be converted into datasets. The necessity of data has led to a proliferation of gathering data from easily available sources, including "public" data from the web. Yet the use of public data has significant ethical implications for the human subjects in datasets. We bridge academic conversations on the ethics of using publicly o
13

Nogueira-Rodríguez, Alba, Miguel Reboiro-Jato, Daniel Glez-Peña, and Hugo López-Fernández. "Performance of Convolutional Neural Networks for Polyp Localization on Public Colonoscopy Image Datasets." Diagnostics 12, no. 4 (2022): 898. http://dx.doi.org/10.3390/diagnostics12040898.

Abstract:
Colorectal cancer is one of the most frequent malignancies. Colonoscopy is the de facto standard for precancerous lesion detection in the colon, i.e., polyps, during screening studies or after facultative recommendation. In recent years, artificial intelligence, and especially deep learning techniques such as convolutional neural networks, have been applied to polyp detection and localization in order to develop real-time CADe systems. However, the performance of machine learning models is very sensitive to changes in the nature of the testing instances, especially when trying to reproduce res
14

Jang, Ryoungwoo, Namkug Kim, Miso Jang, et al. "Assessment of the Robustness of Convolutional Neural Networks in Labeling Noise by Using Chest X-Ray Images From Multiple Centers." JMIR Medical Informatics 8, no. 8 (2020): e18089. http://dx.doi.org/10.2196/18089.

Abstract:
Background: Computer-aided diagnosis on chest x-ray images using deep learning is widely studied in medicine. Many studies are based on public datasets, such as the National Institutes of Health (NIH) dataset and the Stanford CheXpert dataset. However, these datasets are labeled using classical natural language processing, which may introduce some label errors. Objective: This study aimed to investigate the robustness of deep convolutional neural networks (CNNs) for binary classification of posteroanterior chest x-rays under random incorrect labeling. Methods: We trained
15

Sarwati Rahayu, Sulis Sandiwarno, Erwin Dwika Putra, Marissa Utami, and Hadiguna Setiawan. "Model Sequential Resnet50 Untuk Pengenalan Tulisan Tangan Aksara Arab" [Sequential ResNet50 Model for Arabic Script Handwriting Recognition]. JSAI (Journal Scientific and Applied Informatics) 6, no. 2 (2023): 234–41. http://dx.doi.org/10.36085/jsai.v6i2.5379.

Abstract:
Research on Arabic handwriting recognition is still limited, and few public datasets of Arabic script exist. Therefore, each study usually uses its own dataset. However, public datasets have recently become available, creating opportunities to compare methods on the same dataset. This study aimed to identify the transfer learning model with the best accuracy for Arabic script handwriting recognition. The results of the experiment using ResNet50 are as follows: training accuracy is 91.
16

Karia, Adrian Jackob, Juma Said Ally, and Stanley Leonard. "Enhancing Coffee Leaf Rust Detection Using DenseNet201: A Comprehensive Analysis of the Mbozi and Public Datasets in Songwe, Tanzania." African Journal of Empirical Research 6, no. 1 (2025): 171–88. https://doi.org/10.51867/ajernet.6.1.17.

Abstract:
Coffee Leaf Rust (CLR) is a devastating fungal disease worldwide that threatens coffee production, disrupting economies and farmers' livelihoods. Traditional methods of detecting CLR rely heavily on machine-learning (ML) models trained on weakly collected datasets and on physical inspection, which is tedious, time-consuming, and subject to human error. This study explores the performance of the DenseNet201 model using three datasets: Mbozi, Public, and Combined (a merger of Mbozi and Public datasets). Machine Learning Theory guided this research. The study objective is to assess the influ
17

Feng, Eric, and Xijin Ge. "DataViz: visualization of high-dimensional data in virtual reality." F1000Research 7 (October 23, 2018): 1687. http://dx.doi.org/10.12688/f1000research.16453.1.

Abstract:
Virtual reality (VR) simulations promote interactivity and immersion, and provide an opportunity that may help researchers gain insights from complex datasets. To explore the utility and potential of VR in graphically rendering large datasets, we have developed an application for immersive, 3-dimensional (3D) scatter plots. Developed using the Unity development environment, DataViz enables the visualization of high-dimensional data with the HTC Vive, a relatively inexpensive and modern virtual reality headset available to the general public. DataViz has the following features: (1) principal co
18

Klöti, Rowan, Bernhard Ager, Vasileios Kotronis, George Nomikos, and Xenofontas Dimitropoulos. "A Comparative Look into Public IXP Datasets." ACM SIGCOMM Computer Communication Review 46, no. 1 (2016): 21–29. http://dx.doi.org/10.1145/2875951.2875955.

19

Dearwent, Steve M., Robert R. Jacobs, and John B. Halbert. "Locational uncertainty in georeferencing public health datasets." Journal of Exposure Science & Environmental Epidemiology 11, no. 4 (2001): 329–34. http://dx.doi.org/10.1038/sj.jea.7500173.

20

Oakden-Rayner, Luke. "Exploring Large-scale Public Medical Image Datasets." Academic Radiology 27, no. 1 (2020): 106–12. http://dx.doi.org/10.1016/j.acra.2019.10.006.

21

Gondohanindijo, Jutono, Edi Noersasongko, Pujiono Pujiono, and Muljono Muljono. "Analysis Kernel and Feature: Impact on Classification Performance on Speech Emotion Using Machine Learning." Jurnal Ilmiah Teknik Elektro Komputer dan Informatika 10, no. 3 (2024): 507–19. https://doi.org/10.26555/jiteki.v10i3.29022.

Abstract:
The main objective of this study is to test the machine learning kernel's selection against the characteristics of the data set used, resulting in good classification performance. The goal of speech emotion recognition is to improve computers' ability to detect and process human emotions in order to improve their ability to respond to interactions between people and computers. It can be applied to feedback on talks, including sentimental or emotional content, as well as the detection of human mental health. One field of data mining work is Speech Emotion Recognition. One of the important thing
22

Xiaoli, Lingzi, Jill V. Hagey, Daniel J. Park, et al. "Benchmark datasets for SARS-CoV-2 surveillance bioinformatics." PeerJ 10 (September 5, 2022): e13821. http://dx.doi.org/10.7717/peerj.13821.

Abstract:
Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has spread globally and is being surveilled with an international genome sequencing effort. Surveillance consists of sample acquisition, library preparation, and whole genome sequencing. This has necessitated a classification scheme detailing Variants of Concern (VOC) and Variants of Interest (VOI), and the rapid expansion of bioinformatics tools for sequence analysis. These bioinformatic tools are means for major actionable results: maintaining quality assurance and check
23

Mallon, Melissa. "Statistics and Datasets." Public Services Quarterly 15, no. 1 (2019): 24–33. http://dx.doi.org/10.1080/15228959.2018.1556147.

24

Edwards, John, Kaden Hart, and Raj Shrestha. "Review of CSEDM Data and Introduction of Two Public CS1 Keystroke Datasets." Journal of Educational Data Mining 15, no. 1 (2023): 1–31. https://doi.org/10.5281/zenodo.7646659.

Abstract:
Analysis of programming process data has become popular in computing education research and educational data mining in the last decade. This type of data is quantitative, often of high temporal resolution, and it can be collected non-intrusively while the student is in a natural setting. Many levels of granularity can be obtained, such as submission, compilation, edit, and keystroke events, with keystroke-level logs being the most fine-grained of commonly used dataset types. However, the lack of open datasets, especially at the keystroke level, is notable. There are several reasons for this fa
25

Jekateryńczuk, Gabriel, Rafał Szadkowski, and Zbigniew Piotrowski. "UaVirBASE: A Public-Access Unmanned Aerial Vehicle Sound Source Localization Dataset." Applied Sciences 15, no. 10 (2025): 5378. https://doi.org/10.3390/app15105378.

Abstract:
This article presents UaVirBASE, a publicly available dataset for the sound source localization (SSL) of unmanned aerial vehicles (UAVs). The dataset contains synchronized multi-microphone recordings captured under controlled conditions, featuring variations in UAV distances, altitudes, azimuths, and orientations relative to a fixed microphone array. UAV orientations include front, back, left, and right-facing configurations. UaVirBASE addresses the growing need for standardized SSL datasets tailored for UAV applications, filling a gap left behind by existing databases that often lack such spe
26

Beg, Sheza Waqar, Sharique Ahmad, and Saeeda Wasim. "Harnessing Public Multimodal Datasets: Revolutionizing Scientific Research and Innovation." Haya: The Saudi Journal of Life Sciences 9, no. 7 (2024): 299–304. http://dx.doi.org/10.36348/sjls.2024.v09i07.008.

Abstract:
Multimodal datasets, integrating data from multiple sources such as text, images, audio, and physiological signals, have become increasingly valuable in scientific research. These datasets provide a comprehensive understanding of complex phenomena, facilitating advancements in fields like medicine, psychology, computer vision, and natural language processing. Publicly available multimodal datasets have democratized access to high-quality data, enabling researchers worldwide to contribute to and benefit from scientific advancements. This review article examines the significance of public multim
27

Tsai, Chi-Yi, Wei-Hsuan Shih, and Humaira Nisar. "Three-Stage Recursive Learning Technique for Face Mask Detection on Imbalanced Datasets." Mathematics 12, no. 19 (2024): 3104. http://dx.doi.org/10.3390/math12193104.

Abstract:
In response to the COVID-19 pandemic, governments worldwide have implemented mandatory face mask regulations in crowded public spaces, making the development of automatic face mask detection systems critical. To achieve robust face mask detection performance, a high-quality and comprehensive face mask dataset is required. However, due to the difficulty in obtaining face samples with masks in the real world, public face mask datasets are often imbalanced, leading to data imbalance in model training and negatively impacting detection performance. To address this problem, this paper p
28

Maddox, George L. "Research as a Public Enterprise: Social Science Data on Ageing in the Public Domain." Ageing and Society 17, no. 3 (1997): 323–35. http://dx.doi.org/10.1017/s0144686x97006442.

Abstract:
A revolution is occurring in information exchange among gerontologists worldwide. For research investigators the increasingly easy accessibility of public use datasets promises to facilitate both research training and useful exchange of evidence. A brief history of the development of public use datasets for research in ageing is provided, and datasets of particular interest are described. While the illustrations focus on experience in the United States the implications of these developments for training and communication among gerontologists worldwide are noted.
29

Dominguez-Gimeno, S., R. Igual, and C. Medrano. "Analysis of public datasets of power quality distortions." Renewable Energy and Power Quality Journal 18 (June 2020): 321–26. http://dx.doi.org/10.24084/repqj18.317.

30

Tannenbaum, Cara. "Gender-based analysis using existing public health datasets." Canadian Journal of Public Health 111, no. 2 (2020): 151–54. http://dx.doi.org/10.17269/s41997-020-00302-9.

31

Nishio, Mizuho, Mari Nishio, Naoe Jimbo, and Kazuaki Nakane. "Homology-Based Image Processing for Automatic Classification of Histopathological Images of Lung Tissue." Cancers 13, no. 6 (2021): 1192. http://dx.doi.org/10.3390/cancers13061192.

Abstract:
The purpose of this study was to develop a computer-aided diagnosis (CAD) system for automatic classification of histopathological images of lung tissues. Two datasets (private and public datasets) were obtained and used for developing and validating CAD. The private dataset consists of 94 histopathological images that were obtained for the following five categories: normal, emphysema, atypical adenomatous hyperplasia, lepidic pattern of adenocarcinoma, and invasive adenocarcinoma. The public dataset consists of 15,000 histopathological images that were obtained for the following three categor
32

Dlamini, Nkosikhona, and Terence L. van Zyl. "Comparing Class-Aware and Pairwise Loss Functions for Deep Metric Learning in Wildlife Re-Identification." Sensors 21, no. 18 (2021): 6109. http://dx.doi.org/10.3390/s21186109.

Abstract:
Similarity learning using deep convolutional neural networks has been applied extensively in solving computer vision problems. This attraction is supported by its success in one-shot and zero-shot classification applications. The advances in similarity learning are essential for smaller datasets or datasets in which few class labels exist per class such as wildlife re-identification. Improving the performance of similarity learning models comes with developing new sampling techniques and designing loss functions better suited to training similarity in neural networks. However, the impact of th
33

Cambazoglu, B. Barla, Mark Sanderson, Falk Scholer, and Bruce Croft. "A review of public datasets in question answering research." ACM SIGIR Forum 54, no. 2 (2020): 1–23. http://dx.doi.org/10.1145/3483382.3483389.

Abstract:
Recent years have seen an increase in the number of publicly available datasets that are released to foster research in question answering systems. In this work, we survey the available datasets and also provide a simple, multi-faceted classification of those datasets. We further survey the most recent evaluation results that form the current state of the art in question answering research by exploring related research challenges and associated online leaderboards. Finally, we provide a discussion around the existing online challenges and provide a wishlist of datasets whose release could bene
34

Cahyana, Nur Heri, Yuli Fauziah, and Agus Sasmito Aribowo. "The Comparison of Tree-Based Ensemble Machine Learning for Classifying Public Datasets." RSF Conference Series: Engineering and Technology 1, no. 1 (2021): 407–13. http://dx.doi.org/10.31098/cset.v1i1.412.

Abstract:
This study aims to determine which tree-based ensemble machine learning methods best classify the 34 datasets used. It also examines the relationship between the number of records and columns of each test dataset and the number of estimators (trees) for each ensemble model, namely Random Forest, Extra Tree Classifier, AdaBoost, and Gradient Boosting. The four methods are compared in terms of maximum accuracy and the number of estimators when classifying the test dataset. Based on the results of the experiments above, tree-based ensemble machine learni
35

Shi, Haonan, Tu Ouyang, and An Wang. "Unveiling Client Privacy Leakage from Public Dataset Usage in Federated Distillation." Proceedings on Privacy Enhancing Technologies 2025, no. 4 (2025): 201–15. https://doi.org/10.56553/popets-2025-0127.

Abstract:
Federated Distillation (FD) has emerged as a popular federated training framework, enabling clients to collaboratively train models without sharing private data. Public Dataset-Assisted Federated Distillation (PDA-FD), which leverages public datasets for knowledge sharing, has become widely adopted. Although PDA-FD enhances privacy compared to traditional Federated Learning, we demonstrate that the use of public datasets still poses significant privacy risks to clients' private training data. This paper presents the first comprehensive privacy analysis of PDA-FD in the presence of an honest-bu
36

Kamel Boulos, M. N., and P. AbdelMalik. "Multidimensional Point Transform for Public Health Practice." Methods of Information in Medicine 51, no. 1 (2012): 63–73. http://dx.doi.org/10.3414/me11-01-0001.

Abstract:
Background: With increases in spatial information and enabling technologies, location-privacy concerns have been on the rise. A commonly proposed solution in public health involves random perturbation; however, consideration of individual dimensions (attributes) has been weak. Objectives: The current study proposes a multidimensional point transform (MPT) that integrates the spatial dimension with other dimensions of interest to comprehensively anonymise data. Methods: The MPT relies on the availability of a base population, a subset patient dataset, and shared dimensions of interest. Pe
37

Sultana, Tangina, Umair Qudus, Muhammad Umair, and Md Delowar Hossain. "An Efficient Framework for Finding Similar Datasets Based on Ontology." Electronics 13, no. 22 (2024): 4417. http://dx.doi.org/10.3390/electronics13224417.

Abstract:
Governments are embracing an open data philosophy and making their data freely available to the public to encourage innovation and increase transparency. However, the number of available datasets is still limited. Finding relationships between related datasets on different data portals enables users to search the relevant datasets. These datasets are generated from the training data, which need to be curated by the user query. However, relevant dataset retrieval is an expensive operation due to the preparation procedure for each dataset. Moreover, it requires a significant amount of space and
38

Muhtar, Yusnur, Mahpirat Muhammat, Nurbiya Yadikar, Alimjan Aysa, and Kurban Ubul. "FC-ResNet: A Multilingual Handwritten Signature Verification Model Using an Improved ResNet with CBAM." Applied Sciences 13, no. 14 (2023): 8022. http://dx.doi.org/10.3390/app13148022.

Abstract:
Offline signature verification is a widely used biometric method in finance, law, and administrative procedures. However, existing deep convolutional neural network models perform poorly on signature datasets that span different regions and ethnic people, while also suffering from problems such as large parameter counts and slow inference speeds. To address these issues, we propose an improved residual network model (FC-ResNet). This model introduces a convolutional block attention module into the classical residual network to adapt to the diversity and variability of signatures, while also co
39

Tang, Ruixiang, Qizhang Feng, Ninghao Liu, Fan Yang, and Xia Hu. "Did You Train on My Dataset? Towards Public Dataset Protection with CleanLabel Backdoor Watermarking." ACM SIGKDD Explorations Newsletter 25, no. 1 (2023): 43–53. http://dx.doi.org/10.1145/3606274.3606279.

Abstract:
The vast amount of training data available on the Internet has been a key factor in the success of deep learning models. However, this abundance of publicly available data also raises concerns about the unauthorized exploitation of datasets for commercial purposes, which dataset licenses forbid. In this paper, we propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data. By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
40

Yan, Zhengjun, Liming Wang, Kui Qin, et al. "Unsupervised Domain Adaptation for Forest Fire Recognition Using Transferable Knowledge from Public Datasets." Forests 14, no. 1 (2022): 52. http://dx.doi.org/10.3390/f14010052.

Abstract:
Deep neural networks (DNNs) have driven the recent advances in fire detection. However, existing methods require large-scale labeled samples to train data-hungry networks, which are difficult to collect and even more laborious to label. This paper applies unsupervised domain adaptation (UDA) to transfer knowledge from a labeled public fire dataset to another unlabeled one in practical application scenarios for the first time. Then, a transfer learning benchmark dataset called Fire-DA is built from public datasets for fire recognition. Next, the Deep Subdomain Adaptation Network (DSAN) and the
41

Debelee, Taye Girma, Abrham Gebreselasie, Friedhelm Schwenker, Mohammadreza Amirian, and Dereje Yohannes. "Classification of Mammograms Using Texture and CNN Based Extracted Features." Journal of Biomimetics, Biomaterials and Biomedical Engineering 42 (July 2019): 79–97. http://dx.doi.org/10.4028/www.scientific.net/jbbbe.42.79.

Abstract:
In this paper, a modified adaptive K-means (MAKM) method is proposed to extract the region of interest (ROI) from the local and public datasets. The local image datasets were collected from Bethezata General Hospital (BGH), and the public datasets are from the Mammographic Image Analysis Society (MIAS). The same number of images is used for both datasets: 112 abnormal and 208 normal. Two texture features (GLCM and Gabor) extracted from ROIs and one set of CNN-based features are considered in the experiment. CNN features are extracted using a pre-trained Inception-V3 model after simple preprocessing and c
42

Sielemann, Katharina, Alenka Hafner, and Boas Pucker. "The reuse of public datasets in the life sciences: potential risks and rewards." PeerJ 8 (September 22, 2020): e9954. http://dx.doi.org/10.7717/peerj.9954.

Abstract:
The ‘big data’ revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define ‘successful reuse’ as the use of previously published data to enable novel scientific findings
43

Wang, Shengbo, David García-Seisdedos, Ananth Prakash, et al. "Integrated view and comparative analysis of baseline protein expression in mouse and rat tissues." PLOS Computational Biology 18, no. 6 (2022): e1010174. http://dx.doi.org/10.1371/journal.pcbi.1010174.

Abstract:
The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analyses of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively), to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 different tissues across 14 organs, comprising 9 mouse and 3 rat strains, respective
44

Reiss, Attila, Ina Indlekofer, Philip Schmidt, and Kristof Van Laerhoven. "Deep PPG: Large-Scale Heart Rate Estimation with Convolutional Neural Networks." Sensors 19, no. 14 (2019): 3079. http://dx.doi.org/10.3390/s19143079.

Abstract:
Photoplethysmography (PPG)-based continuous heart rate monitoring is essential in a number of domains, e.g., for healthcare or fitness applications. Recently, methods based on time-frequency spectra emerged to address the challenges of motion artefact compensation. However, existing approaches are highly parametrised and optimised for specific scenarios of small, public datasets. We address this fragmentation by contributing research into the robustness and generalisation capabilities of PPG-based heart rate estimation approaches. First, we introduce a novel large-scale dataset (called PPG-DaL
45

Barriere, Valentin, and Alexandra Balahur. "Multilingual Multi-Target Stance Recognition in Online Public Consultations." Mathematics 11, no. 9 (2023): 2161. http://dx.doi.org/10.3390/math11092161.

Abstract:
Machine Learning is an interesting tool for stance recognition in a large-scale context, in terms of data size, but also regarding the topics and themes addressed or the languages employed by the participants. Public consultations of citizens using online participatory democracy platforms offer this kind of setting and are good use cases for automatic stance recognition systems. In this paper, we propose to use three datasets of public consultations, in order to train a model able to classify the stance of a citizen within a text, towards a proposal or a debate question. We studied stance dete
46

Grannis, Shaun J., Huiping Xu, Joshua R. Vest, et al. "Evaluating the effect of data standardization and validation on patient matching accuracy." Journal of the American Medical Informatics Association 26, no. 5 (2019): 447–56. http://dx.doi.org/10.1093/jamia/ocy191.

Abstract:
Objective: This study evaluated the degree to which recommendations for demographic data standardization improve patient matching accuracy using real-world datasets. Materials and Methods: We used 4 manually reviewed datasets, containing a random selection of matches and nonmatches. Matching datasets included health information exchange (HIE) records, public health registry records, Social Security Death Master File records, and newborn screening records. Standardized fields included last name, telephone number, social security number, date of birth, and address. Matching performance w
47

Howland, Matthew D., Brady Liss, Thomas E. Levy, and Mohammad Najjar. "Integrating Digital Datasets into Public Engagement through ArcGIS StoryMaps." Advances in Archaeological Practice 8, no. 4 (2020): 351–60. http://dx.doi.org/10.1017/aap.2020.14.

Abstract:
Archaeologists have a responsibility to use their research to engage people and provide opportunities for the public to interact with cultural heritage and interpret it on their own terms. This can be done through hypermedia and deep mapping as approaches to public archaeology. In twenty-first-century archaeology, scholars can rely on vastly improved technologies to aid them in these efforts toward public engagement, including digital photography, geographic information systems, and three-dimensional models. These technologies, even when collected for analysis or documentation, can be
48

Ramzi, Zaccharie, Philippe Ciuciu, and Jean-Luc Starck. "Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets." Applied Sciences 10, no. 5 (2020): 1816. http://dx.doi.org/10.3390/app10051816.

Abstract:
Deep learning is starting to offer promising results for reconstruction in Magnetic Resonance Imaging (MRI). A lot of networks are being developed, but the comparisons remain hard because the frameworks used are not the same among studies, the networks are not properly re-trained, and the datasets used are not the same among comparisons. The recent release of a public dataset, fastMRI, consisting of raw k-space data, encouraged us to write a consistent benchmark of several deep neural networks for MR image reconstruction. This paper shows the results obtained for this benchmark, allowing to co
49

Casilari, Eduardo, José-Antonio Santoyo-Ramón, and José-Manuel Cano-García. "Analysis of Public Datasets for Wearable Fall Detection Systems." Sensors 17, no. 7 (2017): 1513. http://dx.doi.org/10.3390/s17071513.

50

Hartley, Matthew, and Gerard Kleywegt. "Towards Public Archiving of Large, Multi-Modal Imaging Datasets." Microscopy and Microanalysis 28, no. S1 (2022): 1526–27. http://dx.doi.org/10.1017/s1431927622006134.
