Dissertations / Theses on the topic 'Data anonymization'
Consult the top 40 dissertations / theses for your research on the topic 'Data anonymization.'
You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Lasko, Thomas A. (Thomas Anton) 1965. "Spectral anonymization of data." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/42055.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 87-96).
Data anonymization is the process of conditioning a dataset such that no sensitive information can be learned about any specific individual, but valid scientific analysis can nevertheless be performed on it. It is not sufficient to simply remove identifying information because the remaining data may be enough to infer the individual source of the record (a reidentification disclosure) or to otherwise learn sensitive information about a person (a predictive disclosure). The only known way to prevent these disclosures is to remove additional information from the dataset. Dozens of anonymization methods have been proposed over the past few decades; most work by perturbing or suppressing variable values. None have been successful at simultaneously providing perfect privacy protection and allowing perfectly accurate scientific analysis. This dissertation makes the new observation that the anonymizing operations do not need to be made in the original basis of the dataset. Operating in a different, judiciously chosen basis can improve privacy protection, analytic utility, and computational efficiency. I use the term 'spectral anonymization' to refer to anonymizing in a spectral basis, such as the basis provided by the data's eigenvectors. Additionally, I propose new measures of reidentification and prediction risk that are more generally applicable and more informative than existing measures. I also propose a measure of analytic utility that assesses the preservation of the multivariate probability distribution. Finally, I propose the demanding reference standard of nonparticipation in the study to define adequate privacy protection. I give three examples of spectral anonymization in practice. The first example improves basic cell swapping from a weak algorithm to one competitive with state-of-the-art methods merely by a change of basis. The second example demonstrates avoiding the curse of dimensionality in microaggregation. The third describes a powerful algorithm that reduces computational disclosure risk to the same level as that of nonparticipants and preserves at least 4th-order interactions in the multivariate distribution. No previously reported algorithm has achieved this combination of results.
by Thomas Anton Lasko.
Ph.D.
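To make the idea of spectral anonymization in the abstract above concrete, here is a minimal, hypothetical sketch (our illustration, not Lasko's algorithm): a basic cell-swapping operation applied not to the original attributes but to the records' coordinates in the basis of singular vectors, after which the data are mapped back.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_cell_swap(data: np.ndarray, n_swaps: int = 200) -> np.ndarray:
    """Swap random pairs of entries within each column of the spectral scores."""
    mean = data.mean(axis=0)
    u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    scores = u * s                        # spectral coordinates, one row per record
    for j in range(scores.shape[1]):      # perturb column by column in the new basis
        for _ in range(n_swaps):
            a, b = rng.integers(0, scores.shape[0], size=2)
            scores[[a, b], j] = scores[[b, a], j]
    return scores @ vt + mean             # change back to the original basis

demo = rng.normal(size=(500, 4))
anonymized = spectral_cell_swap(demo)
print(np.allclose(demo.mean(axis=0), anonymized.mean(axis=0)))  # True: means survive
```

Swapping permutes values within each spectral column, so column means and the variance along each singular direction are preserved exactly, while individual records change.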
Reje, Niklas. "Synthetic Data Generation for Anonymization." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-276239.
Due to legislation, but also to attract willing study participants, published data needs some form of privacy protection. Privacy protection always requires some reduction in the usefulness of the data, and how much varies between methods. Synthetic data generation is a privacy-protecting alternative that tries to protect participants by generating new records that do not correspond to any real individual or organization but that preserve the same relationships and information as the original data. For a method to gain wide adoption it must prove useful, because even if it protects privacy it will never be used if it is not useful for research. We examined four different methods for synthetic data generation: Parametric, Decision Trees, Saturated Model with Parametric, and Saturated Model with Decision Trees, and the effect different data have on these methods from a utility perspective, as well as time constraints and restrictions on the amount of data that can be published. By comparing inferences made on the synthetic datasets and the original datasets, we found that a large number of synthetic datasets, roughly 10 or more, must be published to achieve good utility, and that the more datasets are published, the more stable the inferences become. We found that using as many variables as possible in the imputation of a variable is best for generating synthetic data for general use, but that being selective about which variables are used in the imputation can be better for specific inferences that match the preserved relationships. Being selective also helps keep down the time complexity of generating the synthetic data. Compared with k-anonymity, we found that the results depended heavily on how many variables we included as quasi-identifiers, but that inferences drawn from the generated synthetic data were at least as close to those drawn from the original data as with k-anonymity, if not more often closer. We found Saturated Model with Decision Trees to be the best method thanks to its high utility and stable generation time regardless of dataset. Decision Trees was second best, with similar results but somewhat worse performance on categorical variables. Third best was Saturated Model with Parametric, which often achieved good utility but not on datasets with few categorical variables, and which sometimes had a long generation time. Parametric was the worst, with poor utility on all datasets and an unstable generation time that could sometimes be very long.
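As a toy illustration of the release pattern evaluated above (our own sketch, not Reje's code), the following generates m fully synthetic copies of a dataset with a simple parametric model and combines the analyses run on each copy, since a single synthetic copy gives unstable inferences:

```python
import numpy as np

rng = np.random.default_rng(1)
original = rng.normal(loc=170.0, scale=10.0, size=1_000)  # e.g., a height variable

def synthesize(data: np.ndarray) -> np.ndarray:
    """Parametric synthesis: fit a normal model, then draw entirely new records."""
    return rng.normal(loc=data.mean(), scale=data.std(ddof=1), size=data.size)

m = 10  # the thesis found roughly 10 or more released copies give stable inferences
estimates = [synthesize(original).mean() for _ in range(m)]
print(f"combined: {np.mean(estimates):.2f}  original: {original.mean():.2f}")
```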
Miracle, Jacob M. "De-Anonymization Attack Anatomy and Analysis of Ohio Nursing Workforce Data Anonymization." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1482825210051101.
Sivakumar, Anusha. "Enhancing Privacy Of Data Through Anonymization." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177349.
A sharp increase in the availability of personal data has opened endless opportunities for data scientists to exploit these data for research. One consequence is that preserving individuals' privacy becomes difficult because of the sheer amount of information available. Traditional ways to protect privacy, such as using pseudonyms and aliases before publishing personal data, are not sufficient on their own; there are always ways to link the data back to real individuals. A potential solution to this problem is to use anonymization techniques that transform data about an individual in a controlled way, making it harder to connect the data to a person. In studies that involve personal data, anonymization is becoming increasingly important. If we alter the data to preserve the privacy of research participants before publishing, the resulting data can become nearly useless for many studies. Preserving the privacy of the individuals represented in the data while minimizing the data loss caused by privacy preservation is therefore essential. In this thesis, we study the situations in which attacks can occur, the different forms of attacks, and existing solutions to prevent them. After carefully reviewing the literature and the problem, we propose a theoretical solution to preserve the privacy of research participants as far as possible while keeping the data useful for research. To support our solution, we consider Digital Footprints, which collects Facebook data with the consent of the users and releases the stored information through various user interfaces.
Folkesson, Carl. "Anonymization of directory-structured sensitive data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-160952.
Cohen, Aloni (Aloni Jonathan). "New guarantees for cryptographic circuits and data anonymization." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122737.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 305-320).
The first part of this thesis presents new definitions and constructions for three modern problems in cryptography: watermarking cryptographic circuits, updatable cryptographic circuits, and proxy reencryption. The second part is dedicated to advancing the understanding of data anonymization. We examine what it means for a data anonymization mechanism to prevent singling out in a data release, a necessary condition to be considered effectively anonymized under the European Union's General Data Protection Regulation. We also demonstrate that heretofore theoretical privacy attacks against ad hoc privacy-preserving technologies are in fact realistic and practical.
by Aloni Jonathan Cohen.
Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Hassan, Fadi Abdulfattah Mohammed. "Utility-Preserving Anonymization of Textual Documents." Doctoral thesis, Universitat Rovira i Virgili, 2021. http://hdl.handle.net/10803/672012.
Every day, people post a significant amount of data on the Internet, such as tweets, reviews, photos, and videos. Organizations collecting these types of data use them to extract information in order to improve their services or for commercial purposes. Yet, if the collected data contain sensitive personal information, they cannot be shared with third parties or released publicly without consent or adequate protection of the data subjects. Privacy-preserving mechanisms provide ways to sanitize data so that identities and/or confidential attributes are not disclosed. A great variety of mechanisms have been proposed to anonymize structured databases with numerical and categorical attributes; however, automatically protecting unstructured textual data has received much less attention. In general, textual data anonymization requires, first, to detect pieces of text that may disclose sensitive information and, then, to mask those pieces via suppression or generalization. In this work, we leverage several technologies to anonymize textual documents. We first improve state-of-the-art techniques based on sequence labeling. After that, we extend them to make them more aligned with the notion of privacy risk and the privacy requirements. Finally, we propose a complete framework based on word embedding models that captures a broader notion of data protection and provides flexible protection driven by privacy requirements. We also leverage ontologies to preserve the utility of the masked text, that is, its semantics and readability. Extensive experimental results show that our methods outperform the state of the art by providing more robust anonymization while reasonably preserving the utility of the protected outcomes.
Michel, Axel. "Personalising privacy constraints in Generalization-based Anonymization Models." Thesis, Bourges, INSA Centre Val de Loire, 2019. http://www.theses.fr/2019ISAB0001/document.
The benefit of performing Big Data computations over individuals' microdata is manifold, in the medical, energy or transportation fields to cite only a few, and this interest is growing with the emergence of smart-disclosure initiatives around the world. However, these computations often expose microdata to privacy leakages, explaining the reluctance of individuals to participate in studies despite the privacy guarantees promised by statistical institutes. To regain individuals' trust, it becomes essential to propose user-empowerment solutions, that is to say, to allow individuals to control the privacy parameter used to make computations over their microdata. This work proposes a novel concept of personalized anonymisation based on data generalization and user empowerment. Firstly, this manuscript proposes a novel approach to push personalized privacy guarantees into the processing of database queries so that individuals can disclose different amounts of information (i.e. data at different levels of accuracy) depending on their own perception of the risk. Moreover, we propose a decentralized computing infrastructure based on secure hardware enforcing these personalized privacy guarantees all along the query execution process. Secondly, this manuscript studies the personalization of anonymity guarantees when publishing data. We propose the adaptation of existing heuristics and a new approach based on constraint programming. Experiments have been done to show the impact of such personalization on the data quality. Individuals' privacy constraints have been built realistically using social statistics studies.
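A minimal illustration of the personalised-generalization idea described above (a toy example under our own assumptions, not the thesis code): each individual chooses a privacy level that controls how coarsely their quasi-identifier is generalized before release.

```python
def generalize_zip(zip_code: str, level: int) -> str:
    """Mask the last `level` digits of a ZIP code (level 0 releases the exact value)."""
    if level <= 0:
        return zip_code
    level = min(level, len(zip_code))
    return zip_code[:-level] + "*" * level

# (zip code, privacy level chosen by the individual)
records = [("45000", 0), ("45100", 2), ("45200", 4)]
print([generalize_zip(z, lvl) for z, lvl in records])  # ['45000', '451**', '4****']
```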
Sakpere, Aderonke Busayo. "Usability heuristics for fast crime data anonymization in resource-constrained contexts." Doctoral thesis, University of Cape Town, 2018. http://hdl.handle.net/11427/28157.
Ji, Shouling. "Evaluating the security of anonymized big graph/structural data." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54913.
Sehatkar, Morvarid. "Towards a Privacy Preserving Framework for Publishing Longitudinal Data." Thesis, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31629.
Gidofalvi, Gyözö. "Spatio-Temporal Data Mining for Location-Based Services." Doctoral thesis, Geomatic ApS - Center for Geoinformatics, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-86310.
Raza, Ali. "Test Data Extraction and Comparison with Test Data Generation." DigitalCommons@USU, 2011. https://digitalcommons.usu.edu/etd/982.
Brunet, Solenn. "Conception de mécanismes d'accréditations anonymes et d'anonymisation de données." Thesis, Rennes 1, 2017. http://www.theses.fr/2017REN1S130/document.
The emergence of personal mobile devices, with communication and positioning features, is leading to new use cases and personalized services. However, they imply a significant collection of personal data and therefore require appropriate security solutions. Indeed, users are not always aware of the personal and sensitive information that can be inferred from their use. The main objective of this thesis is to show how cryptographic mechanisms and data anonymization techniques can reconcile privacy, security requirements and utility of the service provided. In the first part, we study keyed-verification anonymous credentials which guarantee the anonymity of users with respect to a given service provider: a user proves that she is granted access to its services without revealing any additional information. We introduce new such primitives that offer different properties and are of independent interest. We use these constructions to design three privacy-preserving systems: a keyed-verification anonymous credentials system, a coercion-resistant electronic voting scheme and an electronic payment system. Each of these solutions is practical and proven secure. Indeed, for two of these contributions, implementations on SIM cards have been carried out. Nevertheless, some kinds of services still require using or storing personal data for compliance with a legal obligation or for the provision of the service. In the second part, we study how to preserve users' privacy in such services. To this end, we propose an anonymization process for mobility traces based on differential privacy. It allows us to provide anonymous databases by limiting the added noise. Such databases can then be exploited for scientific, economic or societal purposes, for instance.
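The mobility-trace part of the abstract relies on differential privacy; the standard Laplace mechanism below is a generic sketch of that kind of release (illustrative parameters and data, not Brunet's construction), assuming each individual contributes at most one visit per cell so the count sensitivity is 1.

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_counts(counts: np.ndarray, epsilon: float) -> np.ndarray:
    """Release visit counts with epsilon-differential privacy (sensitivity 1)."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return np.clip(np.round(counts + noise), 0, None)  # post-processing keeps the guarantee

visits = np.array([120, 45, 3, 0, 77])  # visits per spatial cell
print(dp_counts(visits, epsilon=0.5))
```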
Brown, Emily Elizabeth. "Adaptable Privacy-preserving Model." Diss., NSUWorks, 2019. https://nsuworks.nova.edu/gscis_etd/1069.
Paulson, Jörgen. "The Effect of 5-anonymity on a classifier based on neural network that is applied to the adult dataset." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-17918.
Affonso, Elaine Parra [UNESP]. "A insciência do usuário na fase de coleta de dados: privacidade em foco." Universidade Estadual Paulista (UNESP), 2018. http://hdl.handle.net/11449/154737.
No funding was received.
Data collection has become a predominant activity in many digital media, and computer networks, especially the Internet, are essential to this phase. To minimize the complexity involved in using applications and media, the relationship between user and technology has been supported by ever friendlier interfaces, which often lets data collection happen imperceptibly. This leaves users unaware of the collection performed by data holders, a situation that may harm the right to privacy of both the user and the people referenced in the data. To make users aware of data collection, digital environments provide privacy policies with information about this phase, seeking compliance with the laws and regulations that protect personal data, widely represented in the academic literature through models and techniques for anonymization. Lack of awareness about data collection can shape how individuals perceive threats to their privacy and which actions they should take to extend the protection of their data, and it can also be reinforced by the lack of action and research in several areas of knowledge. In view of the above, the objective of this thesis is to characterize the context that favors the user's lack of awareness while being the target of data collection phases in digital environments, considering privacy implications. To that end, exploratory-descriptive research with a qualitative approach was adopted, using methodological triangulation across a theoretical framework that covers anonymization in the data collection phase, the legislation that supports the protection of personal data, and the data collection performed by technologies. Regarding research on data protection by anonymization, there is a shortage of work on the data collection phase, since most research has concentrated on measures for sharing anonymized data, and when anonymization is applied at collection time the emphasis has been on location data. When addressing elements involved in the collection phase, legislation often presents these concepts in a generalized way, mainly regarding consent to collection; even the collection activity itself surfaces in most laws only under the term processing. Most laws have no specific provision for data collection, a factor that can deepen users' unawareness of the collection of their data. Technical terms such as anonymization, cookies and traffic data appear in the laws only sparsely and are often not specifically linked to the collection phase. Quasi-identifier data stand out in the data collected by digital environments, a scenario that can further amplify privacy threats because such data can be correlated and used to build profiles of individuals. The opacity promoted by abstraction in data collection by technological devices goes beyond the user's lack of awareness, causing incalculable threats to privacy and undoubtedly widening the informational asymmetry between data holders and users. It is concluded that users' lack of awareness of their interaction with digital environments can reduce their autonomy to control their data and accentuate privacy breaches. However, privacy in data collection is strengthened when users are aware of the actions linked to their data, which should be determined by privacy policies, laws and academic research: the three elements identified in this work as contributing to the scenario that fosters the user's lack of awareness.
Heinrich, Jan-Philipp, Carsten Neise, and Andreas Müller. "Ähnlichkeitsmessung von ausgewählten Datentypen in Datenbanksystemen zur Berechnung des Grades der Anonymisierung." Universitätsbibliothek Chemnitz, 2018. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-233422.
Nilsson, Mattias, and Sebastian Olsson. "Bluetooth-enheter i offentliga rummet och anonymisering av data." Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20025.
The Internet of Things (IoT) provides great opportunities to collect data for different purposes, such as estimating the number of people present in order to control the heat in a room. Furthermore, IoT systems can automate tasks that help us humans. This study looks at the type of data that can be interesting to gather in order to estimate the number of people in a public place, and at how sensitive data can be anonymized when gathered. To do this, Bluetooth devices were chosen to investigate how well MAC addresses work for estimating the number of people. For collecting MAC addresses, a proof-of-concept system was developed in which an Android application collects the addresses and anonymizes them before they are stored in a database. The application anonymizes each unique MAC address according to three levels of security. Field studies were conducted in which the number of people was counted visually and an anonymous collection of MAC addresses was then made. The conclusion was that Bluetooth will be difficult to use for estimating the number of people because not everyone has Bluetooth switched on. The application developed demonstrates that data can be collected safely, without violating privacy.
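The abstract does not describe the application's three anonymization levels, so the sketch below shows one plausible scheme (our assumption, not the thesis design): a salted SHA-256 hash of the MAC address, truncated more aggressively at higher privacy levels so that collisions increase and individual devices become harder to single out.

```python
import hashlib

SALT = b"rotate-me-daily"  # hypothetical deployment secret

def anonymize_mac(mac: str, level: int) -> str:
    """Level 1 keeps 16 hex chars of the hash, level 2 keeps 8, level 3 keeps 4."""
    digest = hashlib.sha256(SALT + mac.lower().encode()).hexdigest()
    keep = {1: 16, 2: 8, 3: 4}[level]
    return digest[:keep]

print(anonymize_mac("AA:BB:CC:DD:EE:FF", level=2))
```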
Clause, James Alexander. "Enabling and supporting the debugging of software failures." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/39514.
Nuñez del Prado Cortez, Miguel. "Inference attacks on geolocated data." Thesis, Toulouse, INSA, 2013. http://www.theses.fr/2013ISAT0028/document.
In recent years, we have observed the development of connected and nomadic devices such as smartphones, tablets and laptops that allow individuals to use location-based services (LBSs), which personalize the service they offer according to the positions of users, on a daily basis. Nonetheless, LBSs raise serious privacy issues, which are often not perceived by the end users. In this thesis, we are interested in understanding the privacy risks related to the dissemination and collection of location data. To address this issue, we developed inference attacks such as the extraction of points of interest (POIs) and their semantics, the prediction of the next location, as well as the de-anonymization of mobility traces, based on a mobility model that we have coined as the mobility Markov chain. Afterwards, we proposed a classification of inference attacks in the context of location data based on the objectives of the adversary. In addition, we evaluated the effectiveness of some sanitization measures in limiting the efficiency of inference attacks. Finally, we developed a generic platform called GEPETO (for GEoPrivacy-Enhancing TOolkit) that can be used to test the developed inference attacks.
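A minimal sketch of the "mobility Markov chain" named above (our illustration, not the thesis implementation): states are visited places, transition probabilities are estimated from an individual's location history, and next-place prediction simply follows the most frequent outgoing transition.

```python
from collections import Counter, defaultdict

trace = ["home", "work", "gym", "home", "work", "restaurant", "home", "work", "gym"]

transitions = defaultdict(Counter)
for here, there in zip(trace, trace[1:]):
    transitions[here][there] += 1  # count observed moves between consecutive places

def predict_next(state: str) -> str:
    return transitions[state].most_common(1)[0][0]

print(predict_next("work"))  # -> 'gym' (observed twice, vs. 'restaurant' once)
```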
Delanaux, Rémy. "Intégration de données liées respectueuse de la confidentialité." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1303.
Individual privacy is a major and largely unexplored concern when publishing new datasets in the context of Linked Open Data (LOD). The LOD cloud forms a network of interconnected and publicly accessible datasets in the form of graph databases modeled using the RDF format and queried using the SPARQL language. This heavily standardized context is nowadays extensively used by academics, public institutions and some private organizations to make their data available. Yet some industrial and private actors may be discouraged by potential privacy issues. To this end, we introduce and develop a declarative framework for privacy-preserving Linked Data publishing in which privacy and utility constraints are specified as policies, that is, sets of SPARQL queries. Our approach is data-independent and only inspects the privacy and utility policies in order to determine the sequence of anonymization operations applicable to any graph instance for satisfying the policies. We prove the soundness of our algorithms and gauge their performance through experimental analysis. Another aspect to take into account is that a new dataset published to the LOD cloud is indeed exposed to privacy breaches due to the possible linkage to objects already existing in other LOD datasets. In the second part of this thesis, we thus focus on the problem of building safe anonymizations of an RDF graph to guarantee that linking the anonymized graph with any external RDF graph will not cause privacy breaches. Given a set of privacy queries as input, we study the data-independent safety problem and the sequence of anonymization operations necessary to enforce it. We provide sufficient conditions under which an anonymization instance is safe given a set of privacy queries. Additionally, we show that our algorithms are robust in the presence of sameAs links that can be explicit or inferred by additional knowledge. To conclude, we evaluate the impact of this safety-preserving solution on given input graphs through experiments. We focus on the performance and the utility loss of this anonymization framework on both real-world and artificial data. We first discuss and select utility measures to compare the original graph to its anonymized counterpart, then define a method to generate new privacy policies from a reference one by inserting incremental modifications. We study the behavior of the framework on four carefully selected RDF graphs. We show that our anonymization technique is effective with reasonable runtime on quite large graphs (several million triples) and is gradual: the more specific the privacy policy is, the smaller its impact. Finally, using structural graph-based metrics, we show that our algorithms are not very destructive even when privacy policies cover a large part of the graph. By designing a simple and efficient way to ensure privacy and utility in plausible usages of RDF graphs, this new approach suggests many extensions and, in the long run, more work on privacy-preserving data publishing in the context of Linked Open Data.
Eisoldt, Martin, Carsten Neise, and Andreas Müller. "Analyse verschiedener Distanzmetriken zur Messung des Anonymisierungsgrades theta." Technische Universität Chemnitz, 2019. https://monarch.qucosa.de/id/qucosa%3A34715.
Trujillo, Rasúa Rolando. "Privacy in rfid and mobile objects." Doctoral thesis, Universitat Rovira i Virgili, 2012. http://hdl.handle.net/10803/86942.
Radio Frequency Identification (RFID) is a technology aimed at efficiently identifying and tracking goods and assets. Such identification may be performed without requiring line-of-sight alignment or physical contact between the RFID tag and the RFID reader, whilst tracking is naturally achieved due to the short interrogation field of RFID readers. That is why the reduction in price of RFID tags has been accompanied by increasing attention paid to this technology. However, since tags are resource-constrained devices sending identification data wirelessly, designing secure and private RFID identification protocols is a challenging task. This scenario is even more complex when scalability must be met by those protocols. Assuming the existence of a lightweight, secure, private and scalable RFID identification protocol, there exist other concerns surrounding the RFID technology. Some of them arise from the technology itself, such as distance checking, but others are related to the potential of RFID systems to gather huge amounts of tracking data. Publishing and mining such moving-object data is essential to improve the efficiency of supervisory control, asset management and localisation, transportation, etc. However, obvious privacy threats arise if an individual can be linked with some of those published trajectories. The present dissertation contributes to the design of algorithms and protocols aimed at dealing with the issues explained above. First, we propose a set of protocols and heuristics based on a distributed architecture that improve the efficiency of the identification process without compromising privacy or security. Moreover, we present a novel distance-bounding protocol based on graphs that is extremely low-resource consuming. Finally, we present two trajectory anonymisation methods aimed at preserving the individuals' privacy when their trajectories are released.
Klimek, Martin. "Neuroinformatika a sdílení dat z lékařských zobrazovacích systémů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2010. http://www.nusl.cz/ntk/nusl-218660.
Šejvlová, Ludmila. "Porovnání přístupů ke generování umělých dat." Master's thesis, Vysoká škola ekonomická v Praze, 2017. http://www.nusl.cz/ntk/nusl-358804.
Pàmies Estrems, David. "Contributions to Lifelogging Protection In Streaming Environments." Doctoral thesis, Universitat Rovira i Virgili, 2020. http://hdl.handle.net/10803/669809.
Every day, more than five billion people generate some kind of data over the Internet. As a tool for accessing that information, we need to use search services, either in the form of Web Search Engines or through Personal Assistants. On each interaction with them, our record of actions, via logs, is used to offer a more useful experience. For companies, logs are also very valuable since they offer a way to monetize the service. Monetization is achieved by selling data to third parties; however, query logs could potentially expose sensitive user information: identifiers, sensitive data from users (such as diseases, sexual tendencies, religious beliefs), or be used for what is called "life-logging": a continuous record of one's daily activities. Current regulations oblige companies to protect this personal information. Protection systems for closed data sets have previously been proposed, most of them working with atomic files or structured data. Unfortunately, those systems do not fit when used in the growing real-time unstructured data environment posed by Internet services. This thesis aims to design techniques to protect the user's sensitive information in a non-structured real-time streaming environment, guaranteeing a trade-off between data utility and protection. In this regard, three proposals are made for efficient log protection. The first is a new method to anonymize query logs, based on probabilistic k-anonymity, together with some de-anonymization tools to determine possible data leaks. The second method improves on it with a configurable trade-off between privacy and usability, achieving a great improvement in terms of data utility. Our final contribution concerns Internet-based Personal Assistants. The information generated by these devices is likely to be considered life-logging, and it can increase the user's privacy risks. The proposal is a protection scheme that combines log anonymization and sanitizable signatures.
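A simplified streaming sketch of the batching idea behind probabilistic k-anonymity as we read it from the abstract (not the thesis implementation): buffer incoming (user, query) pairs and release a batch only once it mixes queries from at least k distinct users, so any released query could plausibly belong to any of them.

```python
from collections import deque

K = 3  # illustrative anonymity parameter
buffer = deque()

def on_query(user: str, query: str) -> None:
    buffer.append((user, query))
    users = {u for u, _ in buffer}
    if len(users) >= K:  # enough distinct users mixed in: safe to release
        batch = [q for _, q in buffer]
        buffer.clear()
        print(f"release (attributable to any of {len(users)} users): {batch}")

for u, q in [("alice", "flu symptoms"), ("alice", "train times"),
             ("bob", "tax return"), ("carol", "flight prices")]:
    on_query(u, q)
```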
Skřivánková, Barbora. "Anonymizace SPZ vozidel." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255442.
Dölle, Lukas. "Der Schutz der Privatsphäre bei der Anfragebearbeitung in Datenbanksystemen." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät, 2016. http://dx.doi.org/10.18452/17531.
Over the last ten years, many techniques for privacy-preserving data publishing have been proposed. Most of them anonymize a complete data table such that sensitive values cannot clearly be assigned to individuals. Their privacy is considered to be adequately protected if an adversary cannot discover the actual value from a given set of at least k values. For this thesis we assume that users interact with a database by issuing a sequence of queries against one table. The system returns a sequence of results that contains sensitive values. The goal of this thesis is to check whether adversaries are able to link sensitive values uniquely to individuals despite anonymized result sets. So far, there exist algorithms to prevent de-anonymization for aggregate queries. Our novel approach prevents de-anonymization for arbitrary queries. We show that our approach can be transformed into matching problems in special graphs. However, finding maximum matchings in these graphs is NP-complete. Therefore, we develop several approximation algorithms that compute specific matchings in polynomial time while still maintaining privacy.
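A toy check in the spirit of the approach described above (illustration only; the thesis reduces the general problem to matchings in bipartite graphs): intersect, per individual, the candidate sensitive values implied by each anonymized query answer, and flag a breach when the intersection shrinks to a single value.

```python
# Each query answer maps an individual to the set of sensitive values it could hide.
answers = [
    {"alice": {"flu", "hiv"}, "bob": {"flu", "hiv"}},    # anonymized result of query 1
    {"alice": {"hiv", "cold"}, "bob": {"flu", "cold"}},  # anonymized result of query 2
]

candidates = {}
for result in answers:
    for person, values in result.items():
        candidates[person] = candidates.get(person, values) & values

for person, values in candidates.items():
    if len(values) == 1:  # the answers jointly pin down the sensitive value
        print(f"privacy breach: {person} -> {next(iter(values))}")
```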
Coufal, Zdeněk. "Korelace dat na vstupu a výstupu sítě Tor." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-235412.
Raad, Eliana. "Towards better privacy preservation by detecting personal events in photos shared within online social networks." Thesis, Dijon, 2015. http://www.theses.fr/2015DIJOS079/document.
Today, social networking has considerably changed why people are taking pictures all the time, everywhere they go. More than 500 million photos are uploaded and shared every day, along with more than 200 hours of videos every minute. More particularly, with the ubiquity of smartphones, social network users are now taking photos of events in their lives, travels, experiences, etc., and instantly uploading them online. Such public data sharing puts users' privacy at risk and exposes them to a surveillance that is growing at a very rapid rate. Furthermore, new techniques are used today to extract publicly shared data and combine them with other data in ways never before thought possible. However, social network users do not realize the wealth of information gathered from image data, which could be used to track all their activities at every moment (e.g., the case of cyberstalking). Therefore, in many situations (such as politics, fraud fighting, cultural critique, etc.), it becomes extremely hard to maintain individuals' anonymity when the authors of the published data need to remain anonymous. Thus, the aim of this work is to provide a privacy-preserving constraint (de-linkability) to bound the amount of information that can be used to re-identify individuals using online profile information. Firstly, we provide a framework able to quantify the re-identification threat and sanitize multimedia documents to be published and shared. Secondly, we propose a new approach to enrich the profile information of the individuals to protect. Therefore, we exploit personal events in the individuals' own posts as well as those shared by their friends/contacts. Specifically, our approach is able to detect and link users' elementary events using photos (and related metadata) shared within their online social networks. A prototype has been implemented and several experiments have been conducted to validate our different contributions.
Chen, Yi-Jie, and 陳羿傑. "Privacy Information Protection through Data Anonymization." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/3bpp9z.
Chung Yuan Christian University
Graduate Institute of Information Engineering
103 (ROC academic year, i.e., 2014)
Mobile apps are the moving power behind the prevalence of intelligent mobile devices, which in turn has brought an exponentially growing number of mobile apps being developed. The personalized and ubiquitous nature of intelligent mobile devices, together with their variety of record-taking and data-sensing capabilities, becomes a serious threat to user privacy when linked with the communication ability of the devices. How to enjoy all these conveniences and services without privacy risk is an important issue for all users of mobile devices. The available privacy-protection schemes or methods either require changes to the mobile device's system framework and core, or require complicated technical processes and skills. In this thesis, we propose a proxy-server-based approach to develop a solution practical for ordinary users. A prototype has been implemented to demonstrate the practicality and usability of the privacy-protection mechanism.
AbuSharkh, Hani. "Border-based Anonymization Method of Sharing Spatial-Temporal Data." Thesis, 2011. http://spectrum.library.concordia.ca/15131/1/AbuSharkh_MASc_F2011.pdf.
Lin, Wang Chih Jui, and 林王智瑞. "Multiple Release Anonymization for Time-Series Social Network Data." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/53612112549688181096.
National Tsing Hua University
Department of Computer Science
100 (ROC academic year, i.e., 2011)
Nowadays, social networks have gained popularity around the world. Social networks hold a lot of data about interactions among entities, and these data may contain private content about individuals. Accordingly, many studies have been proposed to protect privacy in social networks. Previous works focus only on developing privacy-preserving methods for releasing a single anonymized graph used to represent a social network. However, a single anonymized graph may not be enough for analyzing the evolution of the whole network. Therefore, in this thesis we address the novel problem of preserving the privacy of interactions among entities across multiple releases of time-series social network data, which means we release multiple anonymized graphs to represent a social network with time-series data. We point out that privacy may be revealed across the multiple releases if we apply existing methods to generate an individual anonymized graph for the network at each timestamp. We provide an anonymization method for releasing multiple anonymized graphs at one time on time-series social network data. We detail our experiment steps and evaluate the utility of the anonymized graphs by answering a series of aggregate queries. The results show that the multiple releases generated by our method answer the queries accurately.
Gao, Tianchong. "Privacy Preserving in Online Social Network Data Sharing and Publication." Thesis, 2019. http://hdl.handle.net/1805/20972.
Following the trend of online data sharing and publishing, researchers raise their concerns about the privacy problem. Online Social Networks (OSNs), for example, often contain sensitive information about individuals. Therefore, anonymizing network data before releasing it becomes an important issue. This dissertation studies the privacy preservation problem from the perspectives of both attackers and defenders. To defenders, preserving the private information while keeping the utility of the published OSN is essential in data anonymization. At one extreme, the final data equals the original one, which contains all the useful information but has no privacy protection. At the other extreme, the final data is random, which has the best privacy protection but is useless to third parties. Hence, the defenders aim to explore multiple potential methods to strike a desirable tradeoff between privacy and utility in the published data. This dissertation draws on the very fundamental problem, the definition of utility and privacy. It draws on the design of the privacy criterion, the graph abstraction model, the utility method, and the anonymization method to further address the balance between utility and privacy. To attackers, extracting meaningful information from the collected data is essential in data de-anonymization. De-anonymization mechanisms utilize the similarities between attackers' prior knowledge and published data to catch the targets. This dissertation focuses on the problems that the published data is periodic, anonymized, and does not cover the target persons. There are two thrusts in studying the de-anonymization attacks: the design of the seed mapping method and the innovation of the generating-based attack method. To conclude, this dissertation studies the online data privacy problem from both defenders' and attackers' points of view and introduces privacy and utility enhancement mechanisms from different novel angles.
Li, Yidong. "Preserving privacy in data publishing and analysis." Thesis, 2011. http://hdl.handle.net/2440/68556.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2011
Viscardi, Cecilia. "Approximate Bayesian Computation and Statistical Applications to Anonymized Data: an Information Theoretic Perspective." Doctoral thesis, 2021. http://hdl.handle.net/2158/1236316.
Peng, Wei. "Seed and Grow: An Attack Against Anonymized Social Networks." 2012. http://hdl.handle.net/1805/2884.
Digital traces left by a user of an online social networking service can be abused by a malicious party to compromise the person's privacy. This is exacerbated by the increasing overlap in user bases among various services. To demonstrate the feasibility of abuse and raise public awareness of this issue, I propose an algorithm, Seed and Grow, to identify users from an anonymized social graph based solely on graph structure. The algorithm first identifies a seed sub-graph, either planted by an attacker or divulged by collusion of a small group of users, and then grows the seed larger based on the attacker's existing knowledge of the users' social relations. This work identifies and relaxes implicit assumptions taken by previous works, eliminates arbitrary parameters, and improves identification effectiveness and accuracy. Experimental results on real-world collected datasets further corroborate my expectations and claims.
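A toy version of the "grow" step described above (our simplification, not Peng's algorithm): starting from a seed of already identified users, repeatedly pair unidentified anonymized nodes with the known user who shares the most already-mapped neighbours.

```python
anon_graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}  # anonymized graph
known_graph = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
seed = {1: "a", 2: "b"}  # planted by the attacker or obtained by collusion

def grow(seed: dict) -> dict:
    mapping = dict(seed)
    changed = True
    while changed:
        changed = False
        for v in anon_graph:
            if v in mapping:
                continue
            best, best_score = None, 0
            for u in known_graph:
                if u in mapping.values():
                    continue
                # count already-identified neighbours shared by v and candidate u
                score = sum(1 for n in anon_graph[v]
                            if n in mapping and mapping[n] in known_graph[u])
                if score > best_score:
                    best, best_score = u, score
            if best is not None:
                mapping[v] = best
                changed = True
    return mapping

print(grow(seed))  # -> {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
```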
Duaso, Calés Rosario. "La protection des données personnelles contenues dans les documents publics accessibles sur Internet : le cas des données judiciaires." Thèse, 2002. http://hdl.handle.net/1866/2435.
The upheavals generated by the new means of disseminating public data, together with the multiple possibilities offered by the Internet, such as information storage, comprehensive memory tools and the use of search engines, give rise to major issues related to privacy protection. The dissemination of public data in digital format causes a shift in our scales of time and space, and changes the traditional concept of public nature previously associated with the "paper" universe. We will study the means of protecting privacy, and the conditions for accessing and using the personal information, sometimes of a "sensitive" nature, which is contained in the public documents posted on the Internet. The characteristics of the information available through judicial data banks require special protection solutions, so that the necessary balance can be found between the principle of judicial transparency and the right to privacy.
"Mémoire présenté à la faculté des études supérieures en vue de l'obtention du grade de maître en droit (LL.M.)"