Academic literature on the topic 'Data anonymization'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Data anonymization.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Data anonymization"

1

Lingala, Thirupathi, C. Kishor Kumar Reddy, B. V. Ramana Murthy, Rajashekar Shastry, and YVSS Pragathi. "L-Diversity for Data Analysis: Data Swapping with Customized Clustering." Journal of Physics: Conference Series 2089, no. 1 (November 1, 2021): 012050. http://dx.doi.org/10.1088/1742-6596/2089/1/012050.

Abstract:
Data anonymization should support the analysts who intend to use the anonymized data. Releasing datasets that contain personal information requires anonymization that balances privacy concerns while preserving the utility of the data. This work shows how choosing anonymization techniques with the data analysts' requirements in mind improves effectiveness both quantitatively, by minimizing the discrepancy between querying the original data and the anonymized result, and qualitatively, by simplifying the workflow for querying the data.
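
The notions in this abstract can be made concrete with a toy example. Below is a minimal sketch (not the authors' swapping-with-clustering algorithm; the table and attribute names are invented) of how k-anonymity and distinct l-diversity are measured: rows sharing the same quasi-identifier values form an equivalence class, k is the size of the smallest class, and l is the smallest number of distinct sensitive values in any class.

```python
from collections import defaultdict

# (age_range, zip_prefix) are quasi-identifiers; the condition is sensitive.
records = [
    ("30-39", "481**", "flu"),
    ("30-39", "481**", "cancer"),
    ("30-39", "481**", "flu"),
    ("40-49", "482**", "diabetes"),
    ("40-49", "482**", "flu"),
]

# Group rows into equivalence classes by their quasi-identifier values.
classes = defaultdict(list)
for *qi, sensitive in records:
    classes[tuple(qi)].append(sensitive)

k = min(len(vals) for vals in classes.values())       # smallest class size
l = min(len(set(vals)) for vals in classes.values())  # fewest distinct sensitive values

print(f"table is {k}-anonymous and {l}-diverse")      # -> table is 2-anonymous and 2-diverse
```
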
2

Tomás, Joana, Deolinda Rasteiro, and Jorge Bernardino. "Data Anonymization: An Experimental Evaluation Using Open-Source Tools." Future Internet 14, no. 6 (May 30, 2022): 167. http://dx.doi.org/10.3390/fi14060167.

Abstract:
In recent years, the use of personal data in marketing, scientific and medical investigation, and forecasting future trends has increased considerably. This information is used by governments, companies, and individuals, and should not contain any sensitive information that allows the identification of an individual. Data anonymization is therefore essential nowadays: it changes the original data to make it difficult to identify an individual. ARX Data Anonymization and Amnesia are two popular open-source tools that simplify this process. In this paper, we evaluate these tools in two ways: with the OSSpal methodology, and using a public dataset with the most recent tweets about the Pfizer and BioNTech vaccine. The assessment with the OSSpal methodology determines that ARX Data Anonymization has better results than Amnesia. In the experimental evaluation using the public dataset, it is possible to verify that Amnesia has some errors and limitations, although its anonymization process is simpler. With ARX Data Anonymization it is possible to upload big datasets, and the tool does not show any errors in the anonymization process. We conclude that ARX Data Anonymization is the recommended tool for data anonymization.
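
Both tools automate the same core operation: recoding raw quasi-identifier values into coarser bins before release. A minimal sketch of that generalization step (illustrative only; this is neither tool's API, and the bin width and number of retained ZIP digits are arbitrary choices):

```python
def generalize_age(age: int) -> str:
    # Generalize an exact age to a ten-year interval.
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"

def generalize_zip(zip_code: str, keep: int = 3) -> str:
    # Keep the first `keep` digits and mask the rest.
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

print(generalize_age(34), generalize_zip("48104"))  # -> 30-39 481**
```
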
3

Pejić Bach, Mirjana, Jasmina Pivar, and Ksenija Dumičić. "Data anonymization patent landscape." Croatian Operational Research Review 8, no. 1 (March 31, 2017): 265–81. http://dx.doi.org/10.17535/crorr.2017.0017.

4

Lasko, Thomas A., and Staal A. Vinterbo. "Spectral Anonymization of Data." IEEE Transactions on Knowledge and Data Engineering 22, no. 3 (March 2010): 437–46. http://dx.doi.org/10.1109/tkde.2009.88.

5

Krishnakumar, Sivasankari K., and Uma Maheswari K. M. "A Comprehensive Review on Data Anonymization Techniques for Social Networks." Webology 19, no. 1 (January 20, 2022): 380–405. http://dx.doi.org/10.14704/web/v19i1/web19028.

Abstract:
Many individuals all around the world use social media to communicate information. Numerous companies utilize social data mining to deduce many fascinating facts from social data modeled as a complicated network structure. However, releasing social data has a direct and indirect impact on the privacy of many of its users. Recently, several anonymization techniques have been created to safeguard personal information about users and their relationships on social media. This study presents a comprehensive survey of various data anonymization strategies for social network data and analyzes their advantages and disadvantages. It also addresses the major research concerns surrounding the efficiency of anonymization approaches.
6

Ji, Shouling, Weiqing Li, Mudhakar Srivatsa, Jing Selena He, and Raheem Beyah. "General Graph Data De-Anonymization." ACM Transactions on Information and System Security 18, no. 4 (May 6, 2016): 1–29. http://dx.doi.org/10.1145/2894760.

7

Jang, Sung-Bong, and Young-Woong Ko. "Efficient multimedia big data anonymization." Multimedia Tools and Applications 76, no. 17 (December 1, 2015): 17855–72. http://dx.doi.org/10.1007/s11042-015-3123-2.

8

Kumar, Sindhe Phani, and R. Anandan. "Data Verification of Logical Pk-Anonymization with Big Data Application and Key Generation in Cloud Computing." Journal of Function Spaces 2022 (June 23, 2022): 1–10. http://dx.doi.org/10.1155/2022/8345536.

Abstract:
Background. As more data becomes available about how frequently the cloud can be updated, a more comprehensive picture of its safety is emerging. The proposed work uses a cloud-based incremental clustering mechanism to cluster and refresh a large number of datasets efficiently. Purpose. Data is anonymized at the point of collection in order to safeguard it. Pk-Anonymization, the area's first randomization method, is more secure than K-Anonymization. A cloud service provider (CSP) is an independent company that provides cloud-based network and computing resources. Customers' security and connection protection must be verified by an authority before data may be transferred to cloud servers for storage. Method. This work proposes Logical Pk-Anonymization and key-generation techniques to verify cloud records and to store sensitive information in the cloud. The proposed framework handles large cloud-based datasets through MapReduce, yielding a parallel data-preparation scheme; as new data arrives over time, the anonymization techniques maintain both privacy and high data utility during updates; and information loss and processing time are reduced for substantial amounts of data. As a result, privacy protection and data utility remain in balance.
9

Ead, Waleed M., et al. "A General Framework Information Loss of Utility-Based Anonymization in Data Publishing." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 5 (April 11, 2021): 1450–56. http://dx.doi.org/10.17762/turcomat.v12i5.2102.

Abstract:
To build an anonymization, the data anonymizer must settle the following three issues: first, which data are to be preserved? Second, which adversary background knowledge could be used to disclose the anonymized data? Third, what is the intended usage of the anonymized data? Different anonymization techniques follow from these three questions, according to different adversary background knowledge and information usage (information utility). In other words, different anonymization techniques lead to different information loss. In this paper, we propose a general framework for utility-based anonymization that minimizes the information loss in published data, with a trade-off guarantee of achieving the required privacy level.
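
One simple instance of the information loss this framework trades against privacy is to score each generalized numeric cell by the fraction of its attribute's domain it spans, then average over cells; 0 means no generalization and 1 means full suppression. The metric and values below are an illustrative assumption, not the paper's formulation:

```python
def cell_loss(interval: tuple[int, int], domain: tuple[int, int]) -> float:
    # Loss of one generalized cell: width of its interval relative to the domain.
    (lo, hi), (dlo, dhi) = interval, domain
    return (hi - lo) / (dhi - dlo)

age_domain = (0, 99)
generalized_ages = [(30, 39), (30, 39), (40, 49), (0, 99)]  # last cell fully suppressed

loss = sum(cell_loss(c, age_domain) for c in generalized_ages) / len(generalized_ages)
print(f"average information loss: {loss:.2f}")  # -> 0.32
```
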
10

Gießler, Fina, Maximilian Thormann, Bernhard Preim, Daniel Behme, and Sylvia Saalfeld. "Facial Feature Removal for Anonymization of Neurological Image Data." Current Directions in Biomedical Engineering 7, no. 1 (August 1, 2021): 130–34. http://dx.doi.org/10.1515/cdbme-2021-1028.

Abstract:
Interdisciplinary exchange of medical datasets between clinicians and engineers is essential for clinical research. Due to the Data Protection Act, which preserves the rights of patients, full anonymization is necessary before any exchange can take place. Owing to the continuous improvement of the image quality of tomographic datasets, anonymization of patient-specific information alone is not sufficient. In this work, we present a prototype that reliably obscures the facial features of patient data, thus enabling anonymization of neurological datasets in image space.
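
To picture what such defacing does: the voxels around the facial surface of a head volume are blanked while the brain is left untouched. The sketch below is deliberately crude, with a random stand-in volume and a hypothetical half-space playing the role of the authors' facial-feature detection:

```python
import numpy as np

vol = np.random.rand(64, 64, 64)   # stand-in for a tomographic head volume
z, y, x = np.mgrid[0:64, 0:64, 0:64]

# Hypothetical half-space covering the face region of this synthetic volume.
face_mask = y + 0.5 * z > 80
vol[face_mask] = 0                 # blank the facial-surface voxels

print(f"removed {face_mask.mean():.0%} of voxels")
```
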

Dissertations / Theses on the topic "Data anonymization"

1

Lasko, Thomas A. "Spectral anonymization of data." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/42055.

Abstract:
Data anonymization is the process of conditioning a dataset such that no sensitive information can be learned about any specific individual, but valid scientific analysis can nevertheless be performed on it. It is not sufficient to simply remove identifying information, because the remaining data may be enough to infer the individual source of a record (a reidentification disclosure) or to otherwise learn sensitive information about a person (a predictive disclosure). The only known way to prevent these disclosures is to remove additional information from the dataset. Dozens of anonymization methods have been proposed over the past few decades; most work by perturbing or suppressing variable values. None have been successful at simultaneously providing perfect privacy protection and allowing perfectly accurate scientific analysis. This dissertation makes the new observation that the anonymizing operations do not need to be made in the original basis of the dataset. Operating in a different, judiciously chosen basis can improve privacy protection, analytic utility, and computational efficiency. I use the term 'spectral anonymization' to refer to anonymizing in a spectral basis, such as the basis provided by the data's eigenvectors. Additionally, I propose new measures of reidentification and prediction risk that are more generally applicable and more informative than existing measures. I also propose a measure of analytic utility that assesses the preservation of the multivariate probability distribution. Finally, I propose the demanding reference standard of nonparticipation in the study to define adequate privacy protection. I give three examples of spectral anonymization in practice. The first example improves basic cell swapping from a weak algorithm to one competitive with state-of-the-art methods merely by a change of basis. The second example demonstrates avoiding the curse of dimensionality in microaggregation. The third describes a powerful algorithm that reduces computational disclosure risk to the same level as that of nonparticipants and preserves at least 4th-order interactions in the multivariate distribution. No previously reported algorithm has achieved this combination of results.
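
The dissertation's central move can be sketched in a few lines: rotate the data into the basis of its singular vectors, apply a conventional anonymizing operation there (cell swapping, as in the first example above), and rotate back. The toy version below illustrates only the change of basis, not the dissertation's actual algorithms or risk measures:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] += 0.8 * X[:, 0]               # introduce correlation between two columns

# Change of basis: express the centered data in its spectral (singular-vector) basis.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                         # coordinates in the spectral basis

# Cell swapping in the spectral basis: permute each spectral column independently.
for j in range(scores.shape[1]):
    scores[:, j] = rng.permutation(scores[:, j])

X_anon = scores @ Vt + X.mean(axis=0)  # rotate back to the original basis

# Cross-column covariance is approximately preserved despite the swapping.
print(round(np.cov(X, rowvar=False)[0, 1], 2),
      round(np.cov(X_anon, rowvar=False)[0, 1], 2))
```
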
2

Reje, Niklas. "Synthetic Data Generation for Anonymization." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-276239.

Abstract:
Because of regulations, but also from a need to find willing participants for surveys, any released data needs some sort of privacy preservation. Privacy preservation, however, always requires some reduction of the utility of the data; how much varies with the method. Synthetic data generation seeks to be a privacy-preserving alternative that protects the privacy of participants by generating new records that do not correspond to any real individuals or organizations but still preserve the relationships and information within the original dataset. For a method to see wide adoption, however, it needs to be shown to be useful: even if it were privacy preserving, if it cannot support usable research it will never be used. We investigated four methods for synthetic data generation, Parametric methods, Decision Trees, Saturated Model with Parametric, and Saturated Model with Decision Trees, and how datasets affect those methods with regard to utility, together with restrictions on how much data can be released and time limitations. By comparing inferences made on the original and the synthetic datasets, we saw that a large number of synthetic datasets, about 10 or more, needs to be released for good utility, and that the more datasets are released, the more stable the inferences are. Using as many variables as possible in the imputation of each variable is best for generating synthetic datasets for general usage, but being selective about which variables are used for each imputation can be better for specific inferences that match the preserved relationships. Being selective also helps keep down the time complexity of generating synthetic datasets. Compared with k-anonymity, the results depended heavily on how much we included as quasi-identifiers, but regardless, the synthetic data generation methods could produce inferences at least as close to the original as inferences made from the k-anonymized datasets, and synthetic data more often performed better. We found that Saturated Model with Decision Trees is the overall best method, due to high utility and stable generation time regardless of the dataset. Decision Trees on their own came second, with results very close to Saturated Model with Decision Trees but slightly worse on categorical variables. Third best was Saturated Model with Parametric, with often good utility, though not on datasets with few categorical variables, and occasionally a very long generation time. Parametric methods were the worst, with poor utility on all datasets and an unstable generation time that could be very long.
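
As a flavor of the methods compared, here is a minimal sequential-synthesis sketch: the first variable is bootstrapped, and each later variable is imputed from the already-synthesized ones, here by a decision tree with added residual noise. This shows the general idea only, with invented data, not the thesis's full multi-variable procedure:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = 2 * x1 + rng.normal(scale=0.5, size=n)   # original data with a known relationship

# Step 1: synthesize x1 by resampling its observed values.
syn_x1 = rng.choice(x1, size=n, replace=True)

# Step 2: synthesize x2 from syn_x1 using a tree fitted on the original pair.
tree = DecisionTreeRegressor(min_samples_leaf=25).fit(x1.reshape(-1, 1), x2)
residual_sd = np.std(x2 - tree.predict(x1.reshape(-1, 1)))
syn_x2 = tree.predict(syn_x1.reshape(-1, 1)) + rng.normal(scale=residual_sd, size=n)

# The synthetic pair preserves the original correlation without copying any row.
print(round(np.corrcoef(x1, x2)[0, 1], 2), round(np.corrcoef(syn_x1, syn_x2)[0, 1], 2))
```
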
3

Miracle, Jacob M. "De-Anonymization Attack Anatomy and Analysis of Ohio Nursing Workforce Data Anonymization." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1482825210051101.

4

Sivakumar, Anusha. "Enhancing Privacy Of Data Through Anonymization." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177349.

Abstract:
A steep rise in the availability of personal data has resulted in endless opportunities for data scientists who utilize this open data for research. However, such easy availability of complex personal data challenges the privacy of the individuals represented in the data. To protect privacy, traditional methods such as using pseudonyms or blurring the identity of individuals are applied before releasing data. These traditional methods alone are not sufficient, because combining released data with other publicly available data or background knowledge can identify individuals. A potential solution to this privacy-loss problem is to anonymize data so that it cannot be linked to the individuals represented in it. In research involving personal data, anonymization becomes more important than ever; yet if we alter data to preserve the privacy of research participants, the resulting data becomes almost useless for much research. Therefore, preserving the privacy of the individuals represented in the data while minimizing the data loss caused by privacy preservation is vital. In this project, we first study the different cases in which attacks take place, the different forms of attacks, and existing solutions to prevent them. After carefully examining the literature and the problem at hand, we propose a solution that preserves the privacy of research participants as much as possible while keeping the data useful to researchers. To support our solution, we consider the case of Digital Footprints, which collects and publishes Facebook data with the consent of the users.
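
The linkage risk motivating this work is easy to demonstrate. A minimal sketch with invented records, in the spirit of the classic voter-roll attacks: a 'de-identified' release is joined with public data on shared quasi-identifiers.

```python
# Released health data: names removed, but quasi-identifiers retained.
released = [("1968-07-12", "48104", "F", "HIV+")]

# Publicly available data the adversary already holds.
voter_roll = [("Alice Smith", "1968-07-12", "48104", "F"),
              ("Bob Jones",   "1971-02-03", "48104", "M")]

# Join on (date of birth, ZIP, sex) to re-identify the record.
reidentified = [(name, diagnosis)
                for name, dob, zc, sex in voter_roll
                for bdate, rzip, rsex, diagnosis in released
                if (dob, zc, sex) == (bdate, rzip, rsex)]

print(reidentified)  # -> [('Alice Smith', 'HIV+')]
```
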
5

Folkesson, Carl. "Anonymization of directory-structured sensitive data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-160952.

Abstract:
Data anonymization is a relevant and important field within data privacy, which tries to find a good balance between utility and privacy in data. The field is especially relevant since the GDPR came into force, because the GDPR does not regulate anonymous data. This thesis focuses on anonymization of directory-structured data, that is, data structured into a tree of directories. In the thesis, four of the most common models for anonymization of tabular data, k-anonymity, ℓ-diversity, t-closeness and differential privacy, are adapted for anonymization of directory-structured data. This adaptation is done by creating three different approaches for anonymizing directory-structured data: SingleTable, DirectoryWise and RecursiveDirectoryWise. These models and approaches are compared and evaluated using five metrics and three attack scenarios. The results show that there is always a trade-off between utility and privacy when anonymizing data. In particular, the differential privacy model with the RecursiveDirectoryWise approach gives the highest privacy, but also the highest information loss. Conversely, the k-anonymity model with the SingleTable approach, or the t-closeness model with the DirectoryWise approach, gives the lowest information loss, but also the lowest privacy. The differential privacy model and the RecursiveDirectoryWise approach were also shown to give the best protection against the chosen attacks. Finally, it was concluded that the differential privacy model with the RecursiveDirectoryWise approach was the most suitable combination for complying with the GDPR when anonymizing directory-structured data.
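
Of the four models compared, differential privacy is the only one defined by a randomized mechanism rather than by a property of the released table. A minimal sketch of its standard building block, the Laplace mechanism for a count query (illustrative only, not the thesis's adaptation to directory structures):

```python
import numpy as np

def dp_count(values, predicate, eps, rng):
    # A counting query has sensitivity 1: one person changes the count by at most 1,
    # so Laplace noise with scale 1/eps yields eps-differential privacy.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(scale=1.0 / eps)

rng = np.random.default_rng(7)
ages = [23, 35, 41, 52, 38, 29, 61]
print(dp_count(ages, lambda a: a >= 40, eps=0.5, rng=rng))  # noisy answer near 3
```
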
6

Cohen, Aloni. "New guarantees for cryptographic circuits and data anonymization." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122737.

Abstract:
The first part of this thesis presents new definitions and constructions for three modern problems in cryptography: watermarking cryptographic circuits, updatable cryptographic circuits, and proxy reencryption. The second part is dedicated to advancing the understanding of data anonymization. We examine what it means for a data anonymization mechanism to prevent singling out in a data release, a necessary condition to be considered effectively anonymized under the European Union's General Data Protection Regulation. We also demonstrate that heretofore theoretical privacy attacks against ad-hoc privacy-preserving technologies are in fact realistic and practical.
7

Hassan, Fadi Abdulfattah Mohammed. "Utility-Preserving Anonymization of Textual Documents." Doctoral thesis, Universitat Rovira i Virgili, 2021. http://hdl.handle.net/10803/672012.

Abstract:
Every day, people post a significant amount of data on the Internet, such as tweets, reviews, photos, and videos. Organizations collecting these types of data use them to extract information in order to improve their services or for commercial purposes. Yet, if the collected data contain sensitive personal information, they cannot be shared with third parties or released publicly without consent or adequate protection of the data subjects. Privacy-preserving mechanisms provide ways to sanitize data so that identities and/or confidential attributes are not disclosed. A great variety of mechanisms have been proposed to anonymize structured databases with numerical and categorical attributes; however, automatically protecting unstructured textual data has received much less attention. In general, textual data anonymization requires, first, detecting pieces of text that may disclose sensitive information and, then, masking those pieces via suppression or generalization. In this work, we leverage several technologies to anonymize textual documents. We first improve state-of-the-art techniques based on sequence labeling. After that, we extend them to make them more aligned with the notion of privacy risk and with privacy requirements. Finally, we propose a complete framework based on word embedding models that captures a broader notion of data protection and provides flexible protection driven by privacy requirements. We also leverage ontologies to preserve the utility of the masked text, that is, its semantics and readability. Extensive experimental results show that our methods outperform the state of the art by providing more robust anonymization while reasonably preserving the utility of the protected outcomes.
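
The detect-then-mask pipeline described here can be approximated in a few lines with off-the-shelf NER. The sketch below substitutes spaCy's pretrained tagger for the thesis's sequence-labeling and word-embedding models and assumes the en_core_web_sm model is installed; real sensitive-span detection is considerably harder:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed to be installed
text = "John Smith visited Barcelona on 3 May 2021."
doc = nlp(text)

# Replace detected entities right-to-left so earlier character offsets stay valid.
masked = text
for ent in reversed(doc.ents):
    masked = masked[:ent.start_char] + f"[{ent.label_}]" + masked[ent.end_char:]

print(masked)  # e.g. -> [PERSON] visited [GPE] on [DATE].
```
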
8

Michel, Axel. "Personalising privacy constraints in Generalization-based Anonymization Models." Thesis, Bourges, INSA Centre Val de Loire, 2019. http://www.theses.fr/2019ISAB0001/document.

Abstract:
The benefits of performing big data computations over individuals' microdata are manifold, in the medical, energy, and transportation fields to cite only a few, and this interest is growing with the emergence of smart-disclosure initiatives around the world. However, these computations often expose microdata to privacy leakages, explaining the reluctance of individuals to participate in studies despite the privacy guarantees promised by statistical institutes. To regain individuals' trust, it becomes essential to propose user-empowerment solutions, that is to say, to allow individuals to control the privacy parameters used in computations over their microdata. This work proposes a novel concept of personalized anonymization based on data generalization and user empowerment. Firstly, this manuscript proposes a novel approach to pushing personalized privacy guarantees into the processing of database queries, so that individuals can disclose different amounts of information (i.e., data at different levels of accuracy) depending on their own perception of the risk. Moreover, we propose a decentralized computing infrastructure based on secure hardware that enforces these personalized privacy guarantees all along the query execution process. Secondly, this manuscript studies the personalization of anonymity guarantees when publishing data. We propose the adaptation of existing heuristics and a new approach based on constraint programming. Experiments have been carried out to show the impact of such personalization on data quality. Individuals' privacy constraints have been built and simulated realistically, based on the results of sociological studies.
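
The idea of personalized guarantees can be illustrated with the simplest possible check: if each individual requests a personal k, an equivalence class is acceptable only when its size meets the strictest request among its members. A toy sketch with invented values (the thesis's heuristics and constraint-programming encoding go far beyond this):

```python
from collections import defaultdict

# Each row: (generalized quasi-identifier value, owner's personal k requirement).
rows = [("30-39", 2), ("30-39", 3), ("30-39", 3),
        ("40-49", 3), ("40-49", 2)]

classes = defaultdict(list)
for qi, personal_k in rows:
    classes[qi].append(personal_k)

for qi, ks in classes.items():
    verdict = "satisfies" if len(ks) >= max(ks) else "violates"
    print(f"class {qi} (size {len(ks)}) {verdict} strictest personal k={max(ks)}")
```
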
9

Sakpere, Aderonke Busayo. "Usability heuristics for fast crime data anonymization in resource-constrained contexts." Doctoral thesis, University of Cape Town, 2018. http://hdl.handle.net/11427/28157.

Abstract:
This thesis considers the case of mobile crime-reporting systems that have emerged as an effective and efficient data collection method in low- and middle-income countries. Analyzing the data can be helpful in addressing crime. Since law enforcement agencies in resource-constrained contexts typically do not have the expertise to handle these tasks, a cost-effective strategy is to outsource the data analytics tasks to third-party service providers. However, because of the sensitivity of the data, it is expedient to consider the issue of privacy. More specifically, this thesis considers the issue of finding computationally low-intensive solutions to protecting the data even from an 'honest-but-curious' service provider, while at the same time generating datasets that can be queried efficiently and reliably. This thesis offers a three-pronged solution. First, a mobile application was created to facilitate crime reporting in a usable, secure, and privacy-preserving manner. Second, a streaming data anonymization algorithm is proposed, which analyzes reported data based on occurrence rate rather than at a preset time on a static repository. Finally, the third step considers the use of privacy preferences in creating anonymized datasets. By taking user preferences into account, the efficiency of the anonymization process is improved, which is beneficial in enabling fast data anonymization. Results from the prototype implementation and usability tests indicate that having a usable and covert crime-reporting application encourages users to report crime occurrences. Anonymizing streaming data contributes to faster crime resolution times, and user privacy preferences are helpful in relaxing privacy constraints, which makes the data more usable from the querying perspective. This research presents considerable evidence that a three-pronged solution to the issue of anonymity during crime reporting in a resource-constrained environment is promising. The solution can further assist law enforcement agencies in partnering with third parties to derive useful crime-pattern knowledge without infringing on users' privacy. In the future, this research can be extended to other low- and middle-income countries.
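
The streaming requirement can be pictured as a buffer-and-release loop: reports accumulate until an equivalence class is large enough to publish, which bounds how long any report waits. This is only the general shape of streaming k-anonymization, with invented districts, not the thesis's occurrence-rate algorithm:

```python
from collections import defaultdict

K = 3                       # required equivalence-class size before release
buffer = defaultdict(list)  # pending reports, grouped by generalized location

def report(district, details):
    buffer[district].append(details)
    if len(buffer[district]) >= K:  # class large enough: release and clear it
        batch = buffer.pop(district)
        print(f"release {len(batch)} reports generalized to district {district}")

for d in ["north", "north", "south", "north", "south", "south"]:
    report(d, "incident details")
```
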
10

Ji, Shouling. "Evaluating the security of anonymized big graph/structural data." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54913.

Abstract:
We studied the security of anonymized big graph data. Our main contributions include new de-anonymization (DA) attacks; comprehensive anonymity, utility, and de-anonymizability quantifications; and a secure graph data publishing/sharing system, SecGraph. New DA attacks: we present two novel graph DA frameworks, cold-start single-phase Optimization-based DA (ODA) and De-anonymizing Social-Attribute Graphs (De-SAG). Unlike existing seed-based DA attacks, ODA does not require a priori knowledge. In addition, ODA's DA results can facilitate existing DA attacks by providing more seed information. De-SAG is the first attack that takes into account both graph structure and attribute information. Through extensive evaluations leveraging real-world graph data, we validated the performance of both ODA and De-SAG. Graph anonymity, utility, and de-anonymizability quantifications: we developed new techniques that enable comprehensive evaluation of graph data anonymity, utility, and de-anonymizability. First, we proposed the first seed-free graph de-anonymizability quantification framework under a general data model, which provides the theoretical foundation for seed-free DA attacks. Second, we conducted the first seed-based quantification of the perfect and partial de-anonymizability of graph data. Our quantification closes the gap between seed-based DA practice and theory. Third, we conducted the first attribute-based anonymity analysis for Social-Attribute Graph (SAG) data. Together with existing structure-based de-anonymizability quantifications, this gives data owners and researchers a more complete understanding of the privacy of graph data. Fourth, we conducted the first graph Anonymity-Utility-De-anonymizability (AUD) correlation quantification and provided closed forms to explicitly demonstrate such correlation. Finally, based on our quantifications, we conducted large-scale evaluations leveraging 100+ real-world graph datasets generated by various computer systems and services, demonstrating the datasets' anonymity, utility, and de-anonymizability, as well as the significance and validity of our quantifications. SecGraph: we designed, implemented, and evaluated the first uniform and open-source secure graph data publishing/sharing (SecGraph) system. SecGraph enables data owners and researchers to conduct accurate comparative studies of anonymization/DA techniques, and to comprehensively understand the resistance or vulnerability of existing or newly developed anonymization techniques, the effectiveness of existing or newly developed DA attacks, and the graph and application utilities of anonymized data.
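
A cartoon of the structural attacks studied here: compute a structural signature for every node in the anonymized graph and in an auxiliary graph the attacker holds, then match nodes whose signatures are unique. The sketch below uses a trivially relabeled copy as the auxiliary graph, so it greatly overstates how easy real attacks are; the signature choice is an illustrative assumption:

```python
import networkx as nx

def signature(g, v):
    # Degree plus the sorted multiset of neighbor degrees.
    return (g.degree(v), tuple(sorted(g.degree(u) for u in g[v])))

anonymized = nx.karate_club_graph()                             # stands in for released data
auxiliary = nx.relabel_nodes(anonymized, lambda v: f"user{v}")  # attacker's side information

# Index auxiliary nodes by signature, then keep only unambiguous matches.
sig_index = {}
for v in auxiliary:
    sig_index.setdefault(signature(auxiliary, v), []).append(v)

matches = {v: cands[0] for v in anonymized
           if len(cands := sig_index.get(signature(anonymized, v), [])) == 1}

print(f"uniquely re-identified {len(matches)} of {anonymized.number_of_nodes()} nodes")
```
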

Books on the topic "Data anonymization"

1

Heinrich, Ulrike I., and SpringerLink (Online service), eds. Anonymization. London: Springer London, 2012.

2

Gkoulalas-Divanis, Aris. Anonymization of Electronic Medical Records to Support Clinical Analysis. New York, NY: Springer New York, 2013.

3

Weber, Rolf H., and Ulrike I. Heinrich. Anonymization. Springer, 2012.

4

Raghunathan, Balaji. The Complete Book of Data Anonymization. Auerbach Publications, 2013. http://dx.doi.org/10.1201/b13097.

5

El Emam, Khaled, and Luk Arbuckle. Building an Anonymization Pipeline: Creating Safe Data. O'Reilly Media, Incorporated, 2020.

6

The Complete Book of Data Anonymization (Infosys Press). Taylor & Francis Ltd, 2012.

7

Raghunathan, Balaji. Complete Book of Data Anonymization: From Planning to Implementation. Auerbach Publishers, Incorporated, 2013.


Book chapters on the topic "Data anonymization"

1

Domingo-Ferrer, Josep, and Jordi Soria-Comas. "Data Anonymization." In Lecture Notes in Computer Science, 267–71. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-17127-2_21.

2

Smith, Mick, and Rajeev Agrawal. "Anonymization Techniques." In Encyclopedia of Big Data, 30–33. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-319-32010-6_9.

3

Smith, Mick, and Rajeev Agrawal. "Anonymization Techniques." In Encyclopedia of Big Data, 1–4. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-32001-4_9-1.

4

Chester, Sean, Bruce M. Kapron, Gautam Srivastava, Venkatesh Srinivasan, and Alex Thomo. "Anonymization and De-anonymization of Social Network Data." In Encyclopedia of Social Network Analysis and Mining, 1–9. New York, NY: Springer New York, 2017. http://dx.doi.org/10.1007/978-1-4614-7163-9_22-1.

5

Chester, Sean, Bruce M. Kapron, Gautam Srivastava, Venkatesh Srinivasan, and Alex Thomo. "Anonymization and De-anonymization of Social Network Data." In Encyclopedia of Social Network Analysis and Mining, 48–56. New York, NY: Springer New York, 2014. http://dx.doi.org/10.1007/978-1-4614-6170-8_22.

6

Chester, Sean, Bruce M. Kapron, Gautam Srivastava, Venkatesh Srinivasan, and Alex Thomo. "Anonymization and De-anonymization of Social Network Data." In Encyclopedia of Social Network Analysis and Mining, 78–86. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4939-7131-2_22.

7

Gkoulalas-Divanis, Aris, and Grigorios Loukides. "Overview of Patient Data Anonymization." In SpringerBriefs in Electrical and Computer Engineering, 9–30. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-5668-1_2.

8

Torra, Vicenç, and Guillermo Navarro-Arribas. "Big Data Privacy and Anonymization." In Privacy and Identity Management. Facing up to Next Steps, 15–26. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-55783-0_2.

9

Delanaux, Remy, Angela Bonifati, Marie-Christine Rousset, and Romuald Thion. "Query-Based Linked Data Anonymization." In Lecture Notes in Computer Science, 530–46. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00671-6_31.

10

Mohammed, Alip, and Benjamin C. M. Fung. "Data Anonymization with Differential Privacy." In Encyclopedia of Machine Learning and Data Science, 1–5. New York, NY: Springer US, 2022. http://dx.doi.org/10.1007/978-1-4899-7502-7_990-1.


Conference papers on the topic "Data anonymization"

1

Ovalle Lopez, Diana Sofia, and Robert Vann. "Linguistic Analysis, Ethical Practice, and Quality Assurance in Anonymizing Recordings of Spoken Language for Deposit in Digital Archives." In International Workshop on Digital Language Archives. University of North Texas, 2021. http://dx.doi.org/10.12794/langarc1851180.

Abstract:
This report considers linguistic analyses as matters of ethical practice and quality assurance in the anonymization of recordings of spoken language for deposit in a digital language archive. Ethically, researchers must be committed to protecting the identities of primary data providers. Accordingly, conducting pragmatic analyses before initiating technical anonymization procedures can aid in determining exactly what discourse, in what contexts, might constitute identifying information in need of anonymization. Qualitatively, one of the main goals of language documentation is to preserve as much primary data as possible for future research. Accordingly, conducting phonotactic analyses with the help of computer software can aid in determining precise chronometer readings for each tonal insertion, so as to excise as little primary data as possible during anonymization. These findings warrant further research on anonymization protocols in digital language archive projects.
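
The mechanics of a tonal insertion are simple once the time span is known. A minimal sketch with an invented signal and timestamps (real workflows use audio editors, and the phonotactic analysis described above supplies the precise span):

```python
import numpy as np

sr = 16_000
audio = np.random.randn(10 * sr) * 0.1  # stand-in for a 10-second speech recording

def bleep(signal, start_s, end_s, sr, freq=440.0):
    # Replace the identifying span with a pure tone of matching length.
    i, j = int(start_s * sr), int(end_s * sr)
    t = np.arange(j - i) / sr
    out = signal.copy()
    out[i:j] = 0.1 * np.sin(2 * np.pi * freq * t)
    return out

# Hypothetical chronometer readings for one identifying utterance.
anonymized = bleep(audio, start_s=3.20, end_s=3.85, sr=sr)
```
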
2

Martinelli, Fabio, and Mina SheikhAlishahi. "Distributed Data Anonymization." In 2019 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). IEEE, 2019. http://dx.doi.org/10.1109/dasc/picom/cbdcom/cyberscitech.2019.00113.

3

Iwuchukwu, Tochukwu, David J. DeWitt, AnHai Doan, and Jeffrey F. Naughton. "K-Anonymization as Spatial Indexing: Toward Scalable and Incremental Anonymization." In 2007 IEEE 23rd International Conference on Data Engineering. IEEE, 2007. http://dx.doi.org/10.1109/icde.2007.369024.

4

Ji, Shouling, Weiqing Li, Mudhakar Srivatsa, and Raheem Beyah. "Structural Data De-anonymization." In CCS'14: 2014 ACM SIGSAC Conference on Computer and Communications Security. New York, NY, USA: ACM, 2014. http://dx.doi.org/10.1145/2660267.2660278.

5

De Capitani di Vimercati, Sabrina, Dario Facchinetti, Sara Foresti, Gianluca Oldani, Stefano Paraboschi, Matthew Rossi, and Pierangela Samarati. "Scalable Distributed Data Anonymization." In 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops). IEEE, 2021. http://dx.doi.org/10.1109/percomworkshops51409.2021.9431063.

6

Malekzadeh, Mohammad, Richard G. Clegg, Andrea Cavallaro, and Hamed Haddadi. "Mobile sensor data anonymization." In IoTDI '19: International Conference on Internet-of-Things Design and Implementation. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3302505.3310068.

7

Gionis, Aristides, Arnon Mazza, and Tamir Tassa. "k-Anonymization Revisited." In 2008 IEEE 24th International Conference on Data Engineering (ICDE 2008). IEEE, 2008. http://dx.doi.org/10.1109/icde.2008.4497483.

8

Deng, Xiaofeng, Fan Zhang, and Hai Jin. "Data Anonymization for Big Crowdsourcing Data." In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2019. http://dx.doi.org/10.1109/infocomwkshps47286.2019.9093748.

9

Thouvenot, Maxime, Olivier Cure, and Philippe Calvez. "Knowledge Graph Anonymization using Semantic Anatomization." In 2020 IEEE International Conference on Big Data (Big Data). IEEE, 2020. http://dx.doi.org/10.1109/bigdata50022.2020.9377824.

10

Doka, Katerina, Mingqiang Xue, Dimitrios Tsoumakos, Panagiotis Karras, Alfredo Cuzzocrea, and Nectarios Koziris. "Heterogeneous k-anonymization with high utility." In 2015 IEEE International Conference on Big Data (Big Data). IEEE, 2015. http://dx.doi.org/10.1109/bigdata.2015.7363963.
