
Dissertations / Theses on the topic 'Data Mining And KDD'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Data Mining And KDD.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Ferreira, José Alves. "Data mining em banco de dados de eletrocardiograma." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/98/98131/tde-15072014-094917/.

Full text
Abstract:
In this study, the exploration of an electrocardiogram (ECG) database, obtained from the Tele-ECG system of the Dante Pazzanese Institute of Cardiology, was proposed, applying data mining to find patterns that could contribute, in the future, to the acquisition of knowledge in electrocardiogram analysis. The proposed methodology investigates the data in search of patterns without using the ECG trace itself. Three open-source data mining packages (Weka, Orange and R-Project) were used, each containing a set of algorithmic implementations of various data mining techniques, all in the public domain. Known rules were found (confirmed by a medical expert in electrocardiogram analysis), demonstrating the validity of the methodology.
APA, Harvard, Vancouver, ISO, and other styles
2

Carvalho, Renata Azevedo Santos. "Data mining no contexto de customer relationship management em uma franquia coca cola company." Universidade Federal de Pernambuco, 2010. https://repositorio.ufpe.br/handle/123456789/2406.

Full text
Abstract:
Data mining is a multidisciplinary research area, drawing on database technology, artificial intelligence, neural networks, machine learning, statistics and data visualization, whose specific goal is the discovery of new knowledge that may be hidden in large masses of data. Since one of the main objectives of a corporation is to know its customers, this knowledge must span several levels, from the type of product desired to the kinds of offers customers are willing to accept even when the products are not essential at the moment. This form of targeted marketing can reach the extreme of an individual relationship with each customer, to the extent that the company chooses to invest in successive segmentations (classifications) of its clientele. This work therefore applies mining techniques, together with CRM guidelines, to a Coca-Cola franchise in order to generate a new classification of its customers and to help meet annual sales targets through the creation of new marketing activities based on the analysis of the mined data.
APA, Harvard, Vancouver, ISO, and other styles
3

Petersen, Rebecca. "Data Mining for Network Intrusion Detection : A comparison of data mining algorithms and an analysis of relevant features for detecting cyber-attacks." Thesis, Mittuniversitetet, Avdelningen för informations- och kommunikationssystem, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-28002.

Full text
Abstract:
Data mining can be defined as the extraction of implicit, previously unknown, and potentially useful information from data. Numerous researchers have been developing security technology and exploring new methods to detect cyber-attacks with the DARPA 1998 dataset for intrusion detection and the modified versions of this dataset, KDDCup99 and NSL-KDD, but until now no one has examined the performance of the Top 10 data mining algorithms selected by experts in data mining. The classification learning algorithms compared in this thesis are C4.5, CART, k-NN and Naïve Bayes. The performance of these algorithms is compared by accuracy, error rate and average cost on modified versions of the NSL-KDD train and test datasets, where the instances are classified into normal traffic and four cyber-attack categories: DoS, Probing, R2L and U2R. Additionally, the most important features for detecting cyber-attacks, across all categories and within each category, are evaluated with Weka's Attribute Evaluator and ranked according to information gain. The results show that the classification algorithm with the best performance on the dataset is k-NN. The most important features for detecting cyber-attacks are basic features, such as the duration in seconds of a network connection, the protocol used for the connection, the network service used, the normal or error status of the connection and the number of data bytes sent. The most important features for detecting DoS, Probing and R2L attacks are basic features, and the least important are content features; for U2R attacks, by contrast, the content features are the most important.
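For readers who want to reproduce this kind of comparison, the following is a minimal scikit-learn sketch, not the thesis code (which used Weka). It assumes a hypothetical preprocessed, numeric NSL-KDD-style CSV with a label column, approximates C4.5 and CART with entropy- and Gini-based decision trees, and ranks features by mutual information as a stand-in for information gain.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import mutual_info_classif

# Hypothetical preprocessed, numeric NSL-KDD-style dump with a 'label' column
# holding normal/DoS/Probing/R2L/U2R.
df = pd.read_csv("nsl_kdd_preprocessed.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "C4.5-like tree": DecisionTreeClassifier(criterion="entropy"),  # C4.5 analogue
    "CART": DecisionTreeClassifier(criterion="gini"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: accuracy={acc:.3f}, error rate={1 - acc:.3f}")

# Rank features by information gain, in the spirit of Weka's InfoGainAttributeEval.
gain = pd.Series(mutual_info_classif(X_tr, y_tr), index=X.columns)
print(gain.sort_values(ascending=False).head(10))
```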
APA, Harvard, Vancouver, ISO, and other styles
4

Fabian, Jaroslav. "Využití technik Data Mining v různých odvětvích." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2014. http://www.nusl.cz/ntk/nusl-224335.

Full text
Abstract:
This master's thesis concerns the use of data mining techniques in the banking, insurance and shopping centre industries. The thesis theoretically describes the algorithms and the CRISP-DM methodology for data mining processes. Drawing on this theoretical knowledge and these methods, the thesis suggests possible solutions for the various industries within their business intelligence processes.
APA, Harvard, Vancouver, ISO, and other styles
5

Howard, Craig M. "Tools and techniques for knowledge discovery." Thesis, University of East Anglia, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368357.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Silva, Adinarte Correa da. "Modelo para análise de dados de gerência de redes utilizando técnicas de KDD." Florianópolis, SC, 2002. http://repositorio.ufsc.br/xmlui/handle/123456789/83701.

Full text
Abstract:
Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação.
APA, Harvard, Vancouver, ISO, and other styles
7

Vargas, Emmanuel Roque, Montesinos Ricardo Cadillo, and David Mauricio. "Prediction of financial product acquisition for Peruvian savings and credit associations." Institute of Electrical and Electronics Engineers Inc, 2020. http://hdl.handle.net/10757/656581.

Full text
Abstract:
The full text of this work is not available in the UPC Academic Repository owing to restrictions imposed by the publisher. Savings and credit cooperatives in Peru are of great importance for their participation in the economy, reaching, in 2019, deposits and assets of more than 2,890,191,000. However, they do not invest in predictive technologies to identify the customers most likely to purchase a financial product, making marketing campaigns unproductive. In this work, a machine learning model is proposed to identify the clients most likely to acquire a financial product of Peruvian savings and credit cooperatives. The model was implemented using IBM SPSS Modeler for predictive analysis, and tests were performed on 40,000 records covering 10,000 clients, obtaining 91.25% accuracy on data not used in training. Peer reviewed.
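A hedged sketch of the kind of purchase-propensity model the abstract describes; the thesis used IBM SPSS Modeler, so scikit-learn's logistic regression stands in here, and the file name and column names are invented.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical extract: one row per member, numeric features plus a 0/1 target.
df = pd.read_csv("cooperative_members.csv")
X = df.drop(columns=["member_id", "bought_product"])
y = df["bought_product"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))

# Rank members by purchase probability so a campaign can target the top of the list.
ranked = df.loc[X_te.index, ["member_id"]].assign(propensity=clf.predict_proba(X_te)[:, 1])
print(ranked.nlargest(10, "propensity"))
```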
APA, Harvard, Vancouver, ISO, and other styles
8

Orlygsdottir, Brynja. "Using knowledge discovery to identify potentially useful patterns of health promotion behavior of 10-12 year old Icelandic children." Diss., University of Iowa, 2008. http://ir.uiowa.edu/etd/6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Brax, Christoffer. "Recurrent neural networks for time-series prediction." Thesis, University of Skövde, Department of Computer Science, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-480.

Full text
Abstract:
Recurrent neural networks have been used for time-series prediction with good results. In this dissertation, recurrent neural networks are compared with time-delayed feed-forward networks, feed-forward networks and linear regression models on a prediction task. The data used in all experiments is real-world sales data containing two kinds of segments: campaign segments and non-campaign segments. The task is to predict sales under campaigns, and it is evaluated whether more accurate predictions can be made using only the campaign segments of the data.

Throughout the entire project, a knowledge discovery process identified in the literature was used to give a structured work process. The results show that the recurrent network is not better than the other evaluated algorithms; in fact, the time-delayed feed-forward neural network gave the best predictions. The results also show that more accurate predictions could be made using only information from campaign segments.
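For illustration, a minimal sketch of the time-delayed feed-forward setup that the dissertation found to perform best: lagged values of a sales series feed an ordinary feed-forward network. The series here is synthetic and the window length is an assumption.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for the sales series used in the dissertation.
sales = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.standard_normal(400)

window = 7  # assumed time-delay window: the last 7 observations feed the network
X = np.array([sales[i:i + window] for i in range(len(sales) - window)])
y = sales[window:]
split = int(0.8 * len(X))

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X[:split], y[:split])
print("test R^2:", round(model.score(X[split:], y[split:]), 3))
```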
APA, Harvard, Vancouver, ISO, and other styles
10

Delattorre, Joyce Paula Martin. "Arcabouço teórico para mineração de dados de defeitos construtivos em modelos BIM." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/3/3146/tde-05122016-152544/.

Full text
Abstract:
In the construction market, BIM (Building Information Modeling) is no longer a fad adopted by a few pioneers but the centerpiece of technology in the Architecture, Engineering and Construction (AEC) market, addressing aspects of the design, construction and operation of buildings. In addition to engineering design information, the BIM model allows for the storage and management of information from the construction process, facilities operations and building maintenance. Alongside this, the amount of information stored in models, and the opportunity to identify patterns related to the geometry and topology of construction components, also grows. For the analysis of this information, the use of appropriate data processing techniques is essential. KDD (Knowledge Discovery in Databases) and, specifically, data mining are among the existing techniques for knowledge extraction from large databases. Focusing on data from construction defects, and considering that a BIM model is not a standard data repository to which standard data mining techniques can be applied directly, this research aimed to develop a theoretical framework that defines the requirements and procedures for the use of data mining techniques on construction-defect data in BIM models, providing a conceptual basis for their practical application. It rests on the premise that data mining applied to BIM models can retrieve patterns that are influenced by the geometry of building elements, and that these patterns can be useful for analyzing issues of construction quality, productivity, maintenance and post-occupancy, among others. In addition to the theoretical framework, this research developed a standard set of BIM components for recording construction-defect data and proposed a structure for categorizing the relations between defects and BIM components, so as to make information relevant to the data mining process explicit.
APA, Harvard, Vancouver, ISO, and other styles
11

Silva, Marcelo Cicero Ribeiro da. "Aprendizagem de máquina em apoio a diagnóstico em ortopedia." Pontifícia Universidade Católica de Campinas, 2016. http://tede.bibliotecadigital.puc-campinas.edu.br:8080/jspui/handle/tede/902.

Full text
Abstract:
Pontifícia Universidade Católica de Campinas (PUC-Campinas). One of the main drivers of change in a competitive landscape is the steady progress of information and communication technology (ICT). Much of the difficulty in decision making lies in turning data and information into knowledge, especially when the databases concern health. With the evolution of technology and machine learning, computers are now capable of learning in sophisticated ways, which allows them to assist in medical diagnosis, generating a second opinion for the medical professional and thus providing a better service to the community. The objective of this research is to develop a computational model, supported by data mining with machine learning techniques and using communication devices integrated with information and communication technologies, that offers efficient support for medical diagnosis in orthopedics. As a proof of concept, a public database on the spine is used, with the specific objective of assisting the physician in detecting listhesis and disc herniation. The application works with the concept of Knowledge Discovery in Databases: data mining, through classification algorithms, turns data into information useful for supporting the medical professional in making a diagnosis. The research explores and selects, in the Weka data mining tool, the most appropriate algorithm among the many available, seeking the highest diagnostic accuracy while enabling a mobile solution. The dynamics structured in this work should allow the system to be enriched with each new patient treated, so that the platform becomes more efficient and effective as it grows. The resulting computational model is expected to serve as a second opinion in support of the medical professional's diagnosis. The results were satisfactory, with an average accuracy above 86%. Among the expected benefits, the model may assist in the training of new professionals during medical residency and reduce problems arising from medical errors, increasing efficiency during care and saving time and money.
APA, Harvard, Vancouver, ISO, and other styles
12

Madureira, Erikson Manuel Geraldo Vieira de. "Análise de mercado : clustering." Master's thesis, Instituto Superior de Economia e Gestão, 2016. http://hdl.handle.net/10400.5/13122.

Full text
Abstract:
Master's in Economic and Business Decision Making. This report describes the activities performed during an internship at the company Quidgest. As the company needed to study its various lines of business, it was decided to extract and identify the information contained in the company's database. To this end, the data analysis process known as Knowledge Discovery in Databases (KDD) was used. The biggest challenge in applying this process was the large accumulation of information by the company, which intensified from 2013 onwards. Of the phases of the KDD process, the most relevant is data mining, in which the characterizing variables required for the analysis at hand are studied. Cluster analysis was chosen for the data mining phase so that the analysis would be efficient and effective and would yield results that are easy to read. After the development of the KDD process, it was decided that the data mining phase could be implemented so as to facilitate future analyses by the company. To implement this phase, cluster analysis techniques were used and a user-centered program was developed in VBA/Excel. The program was tested on a concrete case from the company: determining which current customers contributed most to the company's growth from 2013 to 2015. Applying the program to this case produced results and information that were then analyzed and interpreted.
APA, Harvard, Vancouver, ISO, and other styles
13

Cridelich, Carine Caroline. "Influence of restraint systems during an automobile crash: prediction of injuries for frontal impact sled tests based on biomechanical data mining." Thesis, Besançon, 2015. http://www.theses.fr/2015BESA2009.

Full text
Abstract:
Safety is one of the most important considerations when buying a new car. A car has to pass the crash tests defined by legislation before it can be sold in a country, which drives the development of safety systems such as airbags and seat belts. Additionally, ratings like Euro NCAP and US NCAP provide an independent evaluation of car safety. Frontal sled tests are thus carried out to confirm the protection level of the vehicle, and the results are mainly based on injury assessment reference values derived from physical parameters measured in dummies. This doctoral thesis presents an approach for the treatment of the input data (i.e. parameters of the restraint systems defined by experts) followed by a classification of frontal sled tests according to those parameters. The study is based only on data from the passenger side; the data collected for the driver were not complete enough to produce satisfying results. The main objective is to create a model that evaluates the input parameters' influence on injury severity and helps engineers predict sled test results according to the chosen legislation or rating. The dummy biomechanical values (the outputs of the model) have been grouped into clusters in order to define injury groups. The model and various algorithms have been implemented in a graphical user interface for better practical daily use.
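A minimal sketch of the output-clustering step described above. The three channels are invented stand-ins for the measured dummy values, and k-means with three clusters is an assumption, not necessarily the thesis's exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Invented stand-ins for three measured channels per sled test,
# e.g. head acceleration, chest deflection, femur load.
readings = rng.normal(size=(120, 3)) * [60, 30, 4] + [80, 35, 6]

X = StandardScaler().fit_transform(readings)  # put channels on a common scale
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for k in range(3):
    print(f"cluster {k}: {np.sum(labels == k)} tests, "
          f"mean readings {readings[labels == k].mean(axis=0).round(1)}")
```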
APA, Harvard, Vancouver, ISO, and other styles
14

Patrício, Júnior José Carlos Almeida. "Mining Knowledge TV: Uma Abordagem de Ambiente de KDD com Ênfase em Mineração de Dados no Ambiente da Knowledge TV." Universidade Federal da Paraíba, 2012. http://tede.biblioteca.ufpb.br:8080/handle/tede/6068.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. Interactive digital TV brings many innovations to the existing analog scenario, such as improved sound and image quality and a greater number of channels, programs and services available to the user. However, the information representing the multimedia content is structured in a traditional format of tables and files, providing neither interoperability between the pieces of information nor semantics. The Knowledge TV (KTV) project therefore proposes to organize the data in digital TV semantically, using concepts from the Semantic Web and knowledge representation, and provides an architecture for application development. A major component of KTV is an environment for knowledge discovery in databases supported by semantic concepts. This environment aims to discover useful knowledge in the data through data mining, organize it semantically and make it available as a service. In this context, this work specifies and develops that environment, called Mining Knowledge TV (MKTV) in this approach.
APA, Harvard, Vancouver, ISO, and other styles
15

Conti, Fabieli de. "Mineração de dados no Moodle: análise de prazos de entrega de atividades." Universidade Federal de Santa Maria, 2011. http://repositorio.ufsm.br/handle/1/5389.

Full text
Abstract:
Virtual learning environments (VLEs) have become common practice as a course tool in both distance and classroom teaching, as they support communication among those involved. This study describes research carried out on the data generated by interaction with the Moodle VLE of an educational institution, focusing on the analysis of due dates and actual submission dates for assignments in the course environment. The objective is to obtain relevant information about how course assignments are posted in the learning environment, to guide actions that reduce submissions after the due date or close to the deadline, and to propose a transparent, automatic approach to integrating KDD activities into the Moodle environment, where the data mining stage is restricted to the algorithms selected within this study (EM and J48) and the results are presented in a simplified manner in the Moodle user interface. The study considers the period during which each assignment remained open for posting, the course the assignment belongs to and the time at which the posting was actually made. It was carried out following the steps of the knowledge discovery in databases process, using the Weka tool. The KDD process performed on our database showed that postings cluster near the final deadline when assignments remain open for more than 15 days, and that graduate courses tend to allow longer posting periods than undergraduate courses while also presenting more postings made late or close to the deadline. In this context, shorter assignment periods are recommended: they increase postings soon after an assignment opens and give teachers faster feedback on each student's learning process, making it possible to take corrective action in time to avoid failure or dropout. The implementation of the KDD process within Moodle enables experimentation by users in an automatic and simplified manner.
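A small sketch of the EM step named in the abstract, using scikit-learn's GaussianMixture (an EM implementation) on synthetic submission lead times; the two-component split between early and last-minute posters is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Synthetic lead times: hours between submission and deadline, mixing
# early posters with last-minute posters.
lead_hours = np.concatenate([rng.normal(200, 50, 300),
                             rng.normal(8, 4, 300)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(lead_hours)
print("component means (hours before deadline):", gm.means_.ravel().round(1))
print("component weights:", gm.weights_.round(2))
```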
APA, Harvard, Vancouver, ISO, and other styles
16

Ribeiro, Marcela Xavier. "Mineração de dados em múltiplas tabelas fato de um data warehouse." Universidade Federal de São Carlos, 2004. https://repositorio.ufscar.br/handle/ufscar/299.

Full text
Abstract:
Financiadora de Estudos e Projetos. The progress of information technology has allowed ever larger amounts of data to be stored. Those data, when submitted to a process of knowledge discovery, can yield interesting results. Data warehouses are repositories of high-quality data. A procedure that has been adopted in large companies is the joint use of data warehouse and data mining technologies, where the knowledge discovery process takes advantage of the high quality of the warehouse's data. When a data warehouse covers more than one subject, it also has more than one fact table. The joint analysis of multiple fact tables can reveal interesting knowledge, for instance the relationship between purchases and sales in a company. This research presents a technique to mine data from multiple fact tables of a data warehouse, a new kind of association rule mining.
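The thesis proposes its own multi-fact-table variant of association-rule mining; as background, here is a sketch of the standard single-table case it extends, using the mlxtend library on a toy one-hot basket matrix.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot basket matrix: rows are transactions, columns are items.
baskets = pd.DataFrame({
    "bread":  [1, 1, 0, 1, 1],
    "butter": [1, 1, 0, 0, 1],
    "milk":   [0, 1, 1, 1, 0],
}).astype(bool)

frequent = apriori(baskets, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```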
APA, Harvard, Vancouver, ISO, and other styles
17

Oliveira, Robson Butaca Taborelli de. "O processo de extração de conhecimento de base de dados apoiado por agentes de software." Universidade de São Paulo, 2000. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-23092001-231242/.

Full text
Abstract:
Nowadays, commercial and scientific application systems generate huge amounts of data that cannot be easily analyzed without appropriate tools and techniques. A great number of these applications are also Internet-based, with distributed data, which makes tasks such as data collection even more difficult. The field of computer science called Knowledge Discovery in Databases (KDD) deals with the use and creation of tools and techniques that allow knowledge to be discovered automatically from data. In a computer network environment, some stages of the KDD process, such as data collection and processing, are harder to carry out, so new technologies can be employed to aid the knowledge discovery process. Software agents are computer programs with properties such as autonomy, reactivity and mobility that can be used for this purpose. In this context, this work presents the proposal of a multi-agent system, called Minador, to support the execution and management of the Knowledge Discovery in Databases process.
APA, Harvard, Vancouver, ISO, and other styles
18

Prášil, Zdeněk. "Využití data miningu v řízení podniku." Master's thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-150279.

Full text
Abstract:
The thesis focuses on data mining and its use in the management of an enterprise, and is structured into a theoretical and a practical part. The aim of the theoretical part was to identify: 1/ the most used data mining methods, 2/ the typical application areas, 3/ the typical problems solved in those areas. The aim of the practical part was: 1/ to demonstrate the use of data mining in a small Czech e-shop to understand the structure of its sales data, 2/ to demonstrate how data mining analysis can help to improve marketing results. In my analysis of the literature I found that decision trees, linear and logistic regression, neural networks, segmentation methods and association rules are the most used data mining methods, and that CRM and marketing, financial institutions, insurance and telecommunication companies, retail trade and production are the application areas using data mining the most. The typical data mining tasks focus on the relationships between sales and customers in order to do better business. In the analysis of the e-shop data I revealed which types of goods are bought together; based on this, I proposed that a strategy supporting this type of shopping is crucial for business success. In conclusion, I showed that data mining methods are suitable even for a small e-shop and can improve its marketing strategy.
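A small pandas sketch of the "bought together" analysis described above, on invented order data: a co-occurrence matrix counts how often two product categories appear in the same order.

```python
import pandas as pd

# Invented order lines: which product categories appear in which orders.
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 3, 3, 3],
    "category": ["toys", "books", "toys", "games", "books", "games", "toys"],
})

basket = pd.crosstab(orders["order_id"], orders["category"]).clip(upper=1)
co_occurrence = basket.T @ basket  # category-by-category co-purchase counts
print(co_occurrence)
```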
APA, Harvard, Vancouver, ISO, and other styles
19

Rebolledo, Lorca Víctor. "Plataforma para la Extracción y Almacenamiento del Conocimiento Extraído de los Web Data." Tesis, Universidad de Chile, 2008. http://www.repositorio.uchile.cl/handle/2250/101971.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Beth, Madariaga Daniel Guillermo. "Identificación de las tendencias de reclamos presentes en reclamos.cl y que apunten contra instituciones de educación y organizaciones públicas." Tesis, Universidad de Chile, 2012. http://www.repositorio.uchile.cl/handle/2250/113396.

Full text
Abstract:
Industrial Civil Engineer. This memoir seeks to verify, through a practical, applied experiment, whether Web Opinion Mining (WOM) techniques and software tools make it possible to determine the general trends present in a set of opinions on the Web; specifically, the complaints published on the website Reclamos.cl that are directed at institutions in the Chilean education and government sectors. Consumers increasingly use the Web to publish their positive and negative assessments of what they purchase in the market, which makes it a gold mine for many institutions, especially for identifying the strengths and weaknesses of the products and services they offer, their public image, and several other aspects. Concretely, the experiment was carried out by building and running a software application that integrates and implements WOM concepts, such as Knowledge Discovery from Data (KDD), as a methodological framework for achieving the stated objective, and Latent Dirichlet Allocation (LDA) for detecting topics within the contents of the complaints under study. Object-oriented programming in Python and storage in relational databases were used, and prefabricated tools were incorporated to simplify certain required tasks. Running the application downloaded the web pages containing the complaints of interest, detecting 6,460 such complaints, directed at 245 institutions and published between 13 July 2006 and 5 December 2011. Using stop-word lists and lemmatization tools, the application also processed the contents of the complaints, keeping only the canonical forms of the words that carried meaning. The application then ran several LDA analyses over these contents, arbitrarily defined to be executed for each detected institution, both over the full set of its complaints and over segments grouped by year of publication, generating for each analysis 20 topics of 30 words each. From the LDA results, and through manual reading and interpretation of the words constituting each set of topics, phrases and sentences were composed to link them together, so as to obtain an interpretation reflecting the trend toward which the complaints represented in those results pointed. It was concluded that the general trends of complaints can be detected using WOM techniques, though with caveats: because the trends emerge from a manual interpretation process, subjectivity can arise around the object of those trends, depending on, among other factors, the interests and experience of the person interpreting the results.
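A sketch of the LDA configuration described (20 topics of 30 words each), using scikit-learn rather than the memoir's own Python pipeline; the complaint texts below are placeholders for the real lemmatized contents.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus; the real input is the lemmatized complaint texts.
complaints = [
    "billing error and no response from the institution",
    "late reply to an enrollment complaint",
    "refund never processed after cancellation",
] * 10

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(complaints)
lda = LatentDirichletAllocation(n_components=20, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_[:3]):  # show the first 3 topics
    top = [terms[i] for i in topic.argsort()[-30:][::-1]]
    print(f"topic {k}: {', '.join(top[:10])} ...")
```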
APA, Harvard, Vancouver, ISO, and other styles
21

Moretti, Caio Benatti. "Análise de grandezas cinemáticas e dinâmicas inerentes à hemiparesia através da descoberta de conhecimento em bases de dados." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/18/18149/tde-13062016-184240/.

Full text
Abstract:
As a result of higher life expectancy worldwide, the probability of accidents and physical trauma in daily life grows, which entails an increasing demand for rehabilitation. Physical therapy, under the robotic rehabilitation paradigm with serious games, offers the patient better motivation and engagement with the treatment, a method recommended by the American Heart Association (AHA), which assigns it the highest rating (Level A) for inpatients and outpatients. However, the rich potential of the data analysis made possible by the robotic devices involved is poorly exploited, discarding the opportunity to add valuable information to treatments. This work applies knowledge discovery techniques to classify the performance of patients diagnosed with chronic hemiparesis. The patients, inserted into a robotic rehabilitation environment, exercised with the InMotion ARM, a robotic device for upper-limb rehabilitation that also collects performance data. A knowledge discovery roadmap was applied to the collected data in order to preprocess and transform it and to perform data mining through machine learning methods. The strategy culminated in a pattern classifier able to distinguish hemiparetic sides with an accuracy of 94%, with eight attributes feeding the input of the obtained mechanism. The interpretation of these attributes showed that force-related data are the most significant, comprising half of the composition of a sample.
APA, Harvard, Vancouver, ISO, and other styles
22

Marques, Delano Brandes. "SISTEMA INTEGRADO DE MONITORAMENTO E CONTROLE DA QUALIDADE DE COMBUSTÍVEL." Universidade Federal do Maranhão, 2004. http://tedebc.ufma.br:8080/jspui/handle/tede/348.

Full text
Abstract:
This work presents studies aimed at the implementation of an integrated system that, besides allowing better, more practical and more efficient monitoring, makes possible the control and optimization of problems related to the oil industry. In order to guarantee fuel quality and standardization, it is indispensable to develop efficient tools that allow monitoring from any point and for any type of fuel. Considering the variety of criteria, decision making should be based on the evaluation of the most varied types of spatial and non-spatial data. To this end, the knowledge discovery in databases process is used, emphasizing the data warehouse and data mining steps allied to a geographic information system. The system aims to cover several fuel monitoring regions. From a survey and analysis of the different information used in the ANP databases, a data warehouse model was proposed. Data mining techniques (principal component analysis, cluster analysis and multiple regression) were then applied in order to obtain knowledge (patterns).
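A minimal sketch of the mining chain named at the end of the abstract, showing PCA followed by clustering on invented fuel-quality measurements; the feature meanings and cluster count are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Invented fuel-quality measurements, e.g. density, octane rating, sulfur content.
samples = rng.normal(size=(200, 6))

X = StandardScaler().fit_transform(samples)
pca = PCA(n_components=2).fit(X)
print("variance explained:", pca.explained_variance_ratio_.round(2))

scores = pca.transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
print("samples per cluster:", np.bincount(labels))
```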
APA, Harvard, Vancouver, ISO, and other styles
23

Neto, Cantídio de Moura Campos. "Análise inteligente de dados em um banco de dados de procedimentos em cardiologia intervencionista." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/98/98131/tde-18102016-085650/.

Full text
Abstract:
This study spans two areas of knowledge: medicine and computer science. It consists of applying the process of Knowledge Discovery in Databases (KDD) to a real medical database, the DESIRE Registry. The DESIRE Registry is the longest-running registry in interventional cardiology worldwide; it is single-center and has followed, for more than 13 years, 5,614 patients revascularized solely with drug-eluting stent implants. The objective is to use this technique to create a descriptive model that classifies patients according to the risk of major adverse cardiac events (MACE) and to evaluate its performance objectively. The rules drawn from the model are then presented to users to assess the degree of novelty of their content and its agreement with expert knowledge. Symbolic classification models were created with decision trees and classification rules, using the C4.5, Ripper and CN2 algorithms for the data mining step, where the class attribute is the presence or absence of a MACE. As the classification is binary, the models were evaluated objectively by metrics associated with the confusion matrix, such as accuracy, sensitivity and area under the ROC curve, among others. The data mining algorithm automatically and exhaustively processes the attributes of each patient to identify those most strongly associated with the class attribute (cardiac event), on which the rules are based. The main rules of these models were extracted indirectly, through the decision tree, or directly, through the classification rules, exposing the most influential and predictive variables according to the mining algorithm. The models allowed a better understanding of the application domain, relating the influence of routine details and situations associated with the medical procedure, and made it possible to analyze the probability of occurrence and non-occurrence of events in various situations. The induced models followed a logic of interpretation of the data and facts with the participation of the domain expert. Thirty-two rules were generated, of which three were rejected, 20 were expected rules without novelty, and 9 were considered less expected rules with a degree of agreement of 50% or more, which makes them candidates for investigation of their possible importance. These models can be updated by reapplying the mining algorithm to the database with the most recent data. The potential of interpretable symbolic models in medicine is great when allied with professional experience, contributing to evidence-based medicine.
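Because the evaluation relies on confusion-matrix metrics, here is a short scikit-learn sketch computing them on toy labels and scores for a binary MACE / no-MACE classifier.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             recall_score, roc_auc_score)

# Toy labels and scores for a binary MACE / no-MACE classifier.
y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 0])
y_score = np.array([0.1, 0.3, 0.8, 0.6, 0.2, 0.4, 0.1, 0.9, 0.5, 0.2])
y_pred  = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:", accuracy_score(y_true, y_pred))
print("sensitivity (recall):", recall_score(y_true, y_pred))
print("specificity:", tn / (tn + fp))
print("AUC:", roc_auc_score(y_true, y_score))
```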
APA, Harvard, Vancouver, ISO, and other styles
24

Homoliak, Ivan. "Metriky pro detekci útoků v síťovém provozu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236525.

Full text
Abstract:
This publication aims to propose and apply new metrics for intrusion detection in network traffic, based on an analysis of existing metrics, of network traffic, and of the behavioral characteristics of known attacks. The main goal of the thesis is to propose and implement a new collection of metrics capable of detecting zero-day attacks.
APA, Harvard, Vancouver, ISO, and other styles
25

Mohd, Saudi Madihah. "A new model for worm detection and response : development and evaluation of a new model based on knowledge discovery and data mining techniques to detect and respond to worm infection by integrating incident response, security metrics and apoptosis." Thesis, University of Bradford, 2011. http://hdl.handle.net/10454/5410.

Full text
Abstract:
Worms have improved and integrated a range of sophisticated techniques, which makes the detection and response processes much harder and longer than in the past. Therefore, in this thesis, a STAKCERT (Starter Kit for Computer Emergency Response Team) model is built to detect worm attacks in order to respond to worms more efficiently. The novelty and strengths of the STAKCERT model lie in the method implemented, which consists of the STAKCERT KDD processes and the development of the STAKCERT worm classification, the STAKCERT relational model and the STAKCERT worm apoptosis algorithm. The new concept introduced in this model, named apoptosis, is borrowed from the human immune system and has been mapped into a security perspective. Furthermore, the encouraging results achieved by this research are validated by applying security metrics to assign the weight and severity values that trigger the apoptosis. To optimise the performance results, standard operating procedures (SOP) for worm incident response involving static and dynamic analyses, knowledge discovery in databases (KDD) techniques for modelling the STAKCERT model, and data mining algorithms were used. The STAKCERT model has produced encouraging results and outperformed comparable existing work for worm detection, with an overall accuracy rate of 98.75%, a 0.2% false positive rate and a 1.45% false negative rate. Worm response achieved an accuracy rate of 98.08%, which other researchers can later use as a comparison with their work.
APA, Harvard, Vancouver, ISO, and other styles
26

Schneider, Luís Felipe. "Aplicação do processo de descoberta de conhecimento em dados do poder judiciário do estado do Rio Grande do Sul." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2003. http://hdl.handle.net/10183/8968.

Full text
Abstract:
With the purpose of exploring existing connections among data, a space has been created for the search for knowledge and useful unknown information in large sets of stored data. This field was dubbed Knowledge Discovery in Databases (KDD) and was formalized in 1989. KDD consists of a process made up of iterative and interactive stages or phases. This work is based on the CRISP-DM methodology. Regardless of the methodology used, the process features a phase that may be considered the nucleus of KDD, "data mining" (or modeling, according to CRISP-DM), with which the concept of a class of problem type is associated, as well as the techniques and algorithms that may be employed in a KDD application. Highlighted here are the association and clustering classes, the techniques associated with them, and the Apriori and K-means algorithms. All of this is embodied in the selected data mining tool, Weka (Waikato Environment for Knowledge Analysis). The research plan focuses on applying the KDD process to the Judiciary Power's core activity, the judgment of court proceedings, seeking discoveries based on the influence of the procedural classification on the incidence of proceedings, the processing time, the kinds of sentences pronounced and the presence of a hearing. The search for defendants' profiles in criminal proceedings, according to characteristics such as sex, marital status, educational background, profession and race, is also explored. Chapters 2 and 3 present the theoretical grounds of KDD, detailing the CRISP-DM methodology. Chapter 4 explores the application performed on the data of the Judiciary Power and, lastly, Chapter 5 draws the conclusions.
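As a toy illustration of the kind of association mining this thesis performs with Weka's Apriori, the following sketch (hypothetical case attributes, not the actual judiciary data) counts co-occurring attribute values and keeps the pairs above a minimum support:

    # Hedged sketch of one Apriori-style pass: each court case is a transaction
    # of attribute=value items; frequent pairs are candidates for rules.
    from collections import Counter
    from itertools import combinations

    cases = [  # invented transactions
        {"class=criminal", "hearing=yes", "sentence=conviction"},
        {"class=criminal", "hearing=yes", "sentence=acquittal"},
        {"class=civil", "hearing=no", "sentence=settlement"},
        {"class=criminal", "hearing=yes", "sentence=conviction"},
    ]
    min_support = 0.5
    pair_counts = Counter(frozenset(p) for case in cases
                          for p in combinations(sorted(case), 2))
    frequent = {tuple(sorted(pair)): n / len(cases)
                for pair, n in pair_counts.items() if n / len(cases) >= min_support}
    print(frequent)  # e.g. ('class=criminal', 'hearing=yes') -> 0.75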
APA, Harvard, Vancouver, ISO, and other styles
27

Bhaskaran, Subhashini Sailesh. "An Investigation into the Knowledge Discovery and Data Mining (KDDM) process to generate course taking pattern characterised by contextual factors of students in Higher Education Institution (HEI)." Thesis, Brunel University, 2017. http://bura.brunel.ac.uk/handle/2438/15880.

Full text
Abstract:
Knowledge Discovery and Data Mining (KDDM), a growing field of study argued to be very useful for discovering knowledge hidden in large datasets, is slowly finding application in Higher Education Institutions (HEIs). While the literature shows that KDDM processes enable the discovery of knowledge useful for improving the performance of organisations, the limitations surrounding them contradict this argument. While extending the usefulness of KDDM processes to support HEIs, challenges were encountered, such as the discovery of course taking patterns in educational datasets associated with contextual information. The literature argues that existing KDDM processes suffer from limitations arising from their inability to generate patterns associated with contextual information; this research tested that claim and developed an artefact that overcame the limitation. Design Science methodology was used to test and evaluate the KDDM artefact. The research used the CRISP-DM process model to test the educational dataset using the attributes course taking pattern, course difficulty level, optimum CGPA and time-to-degree, applying clustering, association rule and classification techniques. The results showed that neither clustering nor association rules produced course taking patterns. Classification produced course taking patterns that were partially linked to CGPA and time-to-degree, but optimum CGPA and time-to-degree could not be linked with contextual information. Hence the CRISP-DM process was modified to include three new stages, namely contextual data understanding, contextual data preparation and an additional data preparation (merging) stage, to see whether the contextual dataset could be separately mined and associated with the course taking pattern. The CRISP-DM model and the modified CRISP-DM model were tested as per the guidelines of Chapman et al. (2000). Process theory was used as the basis for the modification of the CRISP-DM process. Results showed that the course taking pattern, contextualised by the course difficulty level pattern, predicts optimum CGPA and time-to-degree. This research has contributed to knowledge by developing a new artefact (contextual factor mining in the CRISP-DM process) to predict optimum CGPA and optimum time-to-degree using the course taking pattern and the course difficulty level pattern. The contribution to theory lies in extending the application of several theories to explain the development, testing and evaluation of the KDDM artefact; the enhancement of a genetic algorithm (GA) to mine the course difficulty level pattern along with the course taking pattern is a further contribution, as is a pseudocode to verify the presence of the course difficulty level pattern. The contribution to practice lies in demonstrating the usefulness of the modified CRISP-DM process for prediction and simulation of the course taking pattern to predict the optimum CGPA and time-to-degree, thereby demonstrating that the artefact can be deployed in practice.
APA, Harvard, Vancouver, ISO, and other styles
28

Cai, Chun Hing. "Mining association rules with weighted items." Hong Kong : Chinese University of Hong Kong, 1998. http://www.cse.cuhk.edu.hk/%7Ekdd/assoc%5Frule/thesis%5Fchcai.pdf.

Full text
Abstract:
Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. Description based on contents viewed Mar. 13, 2007; title from title screen. Includes bibliographical references (p. 99-103). Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
29

Jdey, Aloui Imen. "Contribution des techniques de fusion et de classification des images au processus d'aide à la reconnaissance des cibles radar non coopératives." Thesis, Brest, 2014. http://www.theses.fr/2014BRES0008.

Full text
Abstract:
The automatic recognition of non-cooperative targets is very important in various fields; this is the case for applications in uncertain aerial and maritime environments. It is therefore necessary to introduce innovative methods for the processing and identification of radar targets, which is the context of this work. The proposed methodology is based on the Knowledge Discovery from Data (KDD) process for developing a complete radar image recognition chain, trying to optimize every step of the processing chain. The experimental system used is based on an ISAR image acquisition system in the anechoic chamber of ENSTA Bretagne. This system allowed the quality of the data entering the recognition process (KDD) to be controlled. We studied the stages of the composite process from acquisition to the interpretation and evaluation of recognition results. We focused on the central stage, data mining, considered the heart of the developed process. This stage is composed of two main phases: classification and the combination of classifier results, the latter called decisional fusion. We have shown that this last phase plays an important role in improving the results for decision making by taking into account the imperfections in radar data, notably uncertainty and imprecision. The results obtained with the different classification techniques in a first step (kNN, SVM and MLP) and the decision fusion techniques in a second step (Bayes, majority vote, belief theory, fuzzy fusion) are the subject of an analytical and comparative study in terms of performance.
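To give one concrete (and deliberately simplified) instance of the decisional fusion phase described above, the sketch below applies a plain majority vote over the labels emitted by several classifiers; the class names and setup are invented:

    # Minimal sketch: majority-vote fusion of per-classifier labels.
    from collections import Counter

    def majority_vote(labels):
        """labels: one predicted class per classifier for a single target."""
        return Counter(labels).most_common(1)[0][0]

    # e.g. kNN, SVM and MLP each label the same ISAR image
    print(majority_vote(["ship", "ship", "aircraft"]))  # -> ship

The fusion techniques the thesis compares (Bayes, belief theory, fuzzy fusion) replace this vote with weighted combinations that model each classifier's uncertainty.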
APA, Harvard, Vancouver, ISO, and other styles
30

Peoples, Bruce E. "Méthodologie d'analyse du centre de gravité de normes internationales publiées : une démarche innovante de recommandation." Thesis, Paris 8, 2016. http://www.theses.fr/2016PA080023.

Full text
Abstract:
"Standards make a positive contribution to the world we live in. They facilitate trade, spread knowledge, disseminate innovative advances in technology, and share good management and conformity assessment practices." There are a multitude of standard and standard consortia organizations producing market-relevant standards, specifications, and technical reports in the domain of Information Communication Technology (ICT). With the number of ICT-related standards and specifications numbering in the thousands, it is not readily apparent to users how these standards inter-relate to form the basis of technical interoperability. There is a need to develop and document a process to identify how standards inter-relate to form a basis of interoperability in multiple contexts: at a general horizontal technology level that covers all domains, and within specific vertical technology domains and sub-domains. By analyzing which standards inter-relate through normative referencing, key standards can be identified as technical centers of gravity, allowing identification of the specific standards that are required for the successful implementation of the standards that normatively reference them, and forming a basis for interoperability across horizontal and vertical technology domains. This thesis focuses on defining a methodology to analyze ICT standards to identify normatively referenced standards that form technical centers of gravity, utilizing Data Mining (DM) and Social Network Analysis (SNA) graph technologies as a basis of analysis. As a proof of concept, the methodology focuses on the International Standards (IS) published by the International Organization for Standardization/International Electrotechnical Commission; Joint Technical Committee 1, Sub-committee 36 Learning, Education, and Training (ISO/IEC JTC1 SC36). The process is designed to be scalable to larger document sets within ISO/IEC JTC1 covering all JTC1 Sub-Committees, and possibly other Standards Development Organizations (SDOs). Chapter 1 provides a review of the literature of previous standard analysis projects and an analysis of the components used in this thesis, such as data mining and graph theory. The identification of a dataset for testing the developed methodology, containing the published International Standards needed for analysis and forming specific technology domains and sub-domains, is the focus of Chapter 2. Chapter 3 describes the specific methodology developed to analyze published International Standards documents, and to create and analyze the graphs to identify technical centers of gravity. Chapter 4 presents the analysis of data which identifies technical center of gravity standards for the ICT learning, education, and training standards produced in ISO/IEC JTC1 SC36. The conclusions of the analysis are contained in Chapter 5. Recommendations for further research using the output of the developed methodology are contained in Chapter 6.
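As a sketch of the graph analysis the thesis describes (with invented edges, not the actual SC36 reference data), normative references can be modelled as a directed graph and candidate centres of gravity surfaced by in-degree or PageRank:

    # Hedged sketch: standards as nodes, normative references as directed edges.
    import networkx as nx

    g = nx.DiGraph()
    g.add_edges_from([          # "A normatively references B" -> edge A -> B
        ("ISO/IEC 19788-2", "ISO/IEC 19788-1"),
        ("ISO/IEC 19788-3", "ISO/IEC 19788-1"),
        ("ISO/IEC 19788-1", "ISO/IEC 11179-3"),
    ])
    print(sorted(g.in_degree(), key=lambda kv: -kv[1])[0])   # highest in-degree
    print(max(nx.pagerank(g).items(), key=lambda kv: kv[1]))  # highest PageRank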
APA, Harvard, Vancouver, ISO, and other styles
31

Mrázek, Michal. "Data mining." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2019. http://www.nusl.cz/ntk/nusl-400441.

Full text
Abstract:
The aim of this master's thesis is the analysis of multidimensional data. Three dimensionality reduction algorithms are introduced, and it is shown how to manipulate text documents using basic methods of natural language processing. The goal of the practical part of the thesis is to process real-world data from an internet forum: posted messages are transformed into a numerical representation, projected into two-dimensional space and visualized, and the topics of the messages are then discovered. In the last part, a few selected algorithms are compared.
APA, Harvard, Vancouver, ISO, and other styles
32

Curtin, Ryan Ross. "Improving dual-tree algorithms." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54354.

Full text
Abstract:
This large body of work is entirely centered around dual-tree algorithms, a class of algorithm based on spatial indexing structures that often provide large amounts of acceleration for various problems. This work focuses on understanding dual-tree algorithms using a new, tree-independent abstraction, and using this abstraction to develop new algorithms. Stated more clearly, the thesis of this entire work is that we may improve and expand the class of dual-tree algorithms by focusing on and providing improvements for each of the three independent components of a dual-tree algorithm: the type of space tree, the type of pruning dual-tree traversal, and the problem-specific BaseCase() and Score() functions. This is demonstrated by expressing many existing dual-tree algorithms in the tree-independent framework, and focusing on improving each of these three pieces. The result is a formidable set of generic components that can be used to assemble dual-tree algorithms, including faster traversals, improved tree theory, and new algorithms to solve the problems of max-kernel search and k-means clustering.
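As a schematic illustration of the tree-independent abstraction described above (invented here, not the dissertation's actual code, and assuming both trees expose is_leaf(), points and children and branch at the same levels), the pruning dual traversal can be separated from the two problem-specific callbacks:

    # Pruning dual depth-first traversal; score() returns float('inf') to prune.
    def dual_tree(qnode, rnode, base_case, score):
        if score(qnode, rnode) == float("inf"):
            return                                  # bound proves no useful work here
        if qnode.is_leaf() and rnode.is_leaf():
            for q in qnode.points:
                for r in rnode.points:
                    base_case(q, r)                 # problem-specific point-pair work
        else:
            for qc in qnode.children:               # recurse on all child combinations
                for rc in rnode.children:
                    dual_tree(qc, rc, base_case, score)

A nearest-neighbour search, for instance, would make base_case() update the best candidate for q and score() compare a node-pair distance bound against it.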
APA, Harvard, Vancouver, ISO, and other styles
33

Winck, Ana Trindade. "Processo de KDD para auxílio à reconfiguração de ambientes virtualizados." Pontifícia Universidade Católica do Rio Grande do Sul, 2007. http://hdl.handle.net/10923/1467.

Full text
Abstract:
Xen is a paravirtualizer that allows the simultaneous execution of several virtual machines (VMs), each with its own operating system. These VMs consume resources at different levels. With the aim of improving Xen performance, it is interesting to assess the best resource allocation for a given Xen machine when different VMs are executed, and the respective parameters to adopt. To support the eventual reconfiguration of parameters, this study puts forward a complete process of knowledge discovery in databases (KDD process) to (i) capture VM performance data, (ii) organize these data in an analytical model, and (iii) apply data mining techniques to suggest new parameters. First, VM performance data are obtained by benchmarking each operating system. These data are stored in a data warehouse specifically modeled to store capture records of benchmark metrics. The stored data are conveniently prepared for use by data mining algorithms. The predictive models generated are then enriched with high-level reconfiguration instructions. Given a configuration in use, these models aim to suggest the best set of configuration parameters to modify the environment and achieve an overall performance gain. The proposed process was implemented and tested on a significant set of benchmark executions, demonstrating the quality and breadth of the solution.
APA, Harvard, Vancouver, ISO, and other styles
34

Payyappillil, Hemambika. "Data mining framework." Morgantown, W. Va. : [West Virginia University Libraries], 2005. https://etd.wvu.edu/etd/controller.jsp?moduleName=documentdata&jsp%5FetdId=3807.

Full text
Abstract:
Thesis (M.S.)--West Virginia University, 2005. Title from document title page. Document formatted into pages; contains vi, 65 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 64-65).
APA, Harvard, Vancouver, ISO, and other styles
35

Hrnčíř, Jan. "Aplikace systému LISp-Miner na rozsáhlá reálná data." Master's thesis, Vysoká škola ekonomická v Praze, 2017. http://www.nusl.cz/ntk/nusl-359106.

Full text
Abstract:
This dissertation thesis describes an advanced method of knowledge discovery in databases (KDD) implemented in the LISp-Miner system. The goal is to show the possibilities of the coordinated use of the analytical tools and the complex GUHA procedures in this system. The thesis follows the CRISP-DM methodology, which is described first; the work then proceeds according to this methodology in the following sections. The author first introduces readers to the domain area and then to the data itself, which are prepared for the needs of the analysis. The analytical questions that are answered are drawn from the literature focused on the domain area. The work is intended as a guide for LISp-Miner users, so the use of the analytical tools and the GUHA procedures is described in the way that is easiest to understand.
APA, Harvard, Vancouver, ISO, and other styles
36

Abedjan, Ziawasch. "Improving RDF data with data mining." Phd thesis, Universität Potsdam, 2014. http://opus.kobv.de/ubp/volltexte/2014/7133/.

Full text
Abstract:
Linked Open Data (LOD) comprises very many and often large public data sets and knowledge bases. Those datasets are mostly presented in the RDF triple structure of subject, predicate, and object, where each triple represents a statement or fact. Unfortunately, the heterogeneity of available open data requires significant integration steps before it can be used in applications. Meta information, such as ontological definitions and exact range definitions of predicates, is desirable and ideally provided by an ontology. However, in the context of LOD, ontologies are often incomplete or simply not available. Thus, it is useful to automatically generate meta information, such as ontological dependencies, range definitions, and topical classifications. Association rule mining, which was originally applied for sales analysis on transactional databases, is a promising and novel technique to explore such data. We designed an adaptation of this technique for mining RDF data and introduce the concept of "mining configurations", which allows us to mine RDF data sets in various ways. Different configurations enable us to identify schema and value dependencies that in combination result in interesting use cases. To this end, we present rule-based approaches for auto-completion, data enrichment, ontology improvement, and query relaxation. Auto-completion remedies the problem of inconsistent ontology usage, providing an editing user with a sorted list of commonly used predicates. A combination of different configurations extends this approach to create completely new facts for a knowledge base. We present two approaches for fact generation: a user-based approach, where a user selects the entity to be amended with new facts, and a data-driven approach, where an algorithm discovers entities that have to be amended with missing facts. As knowledge bases constantly grow and evolve, another approach to improve the usage of RDF data is to improve existing ontologies. Here, we present an association rule based approach to reconcile ontology and data. Interlacing different mining configurations, we infer an algorithm to discover synonymously used predicates. Those predicates can be used to expand query results and to support users during query formulation. We provide a wide range of experiments on real world datasets for each use case. The experiments and evaluations show the added value of association rule mining for the integration and usability of RDF data and confirm the appropriateness of our mining configuration methodology.
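A toy sketch of one such mining configuration (invented triples, much simplified relative to the thesis) treats each subject as a transaction and its predicates as items, so that predicate co-occurrence counts can drive auto-completion suggestions:

    # Hedged sketch: predicates that frequently co-occur on the same subjects.
    from collections import defaultdict
    from itertools import permutations

    triples = [
        ("Berlin", "capitalOf", "Germany"),
        ("Berlin", "population", "3600000"),
        ("Paris", "capitalOf", "France"),
        ("Paris", "population", "2100000"),
    ]
    predicates_of = defaultdict(set)
    for s, p, o in triples:
        predicates_of[s].add(p)
    cooc = defaultdict(int)
    for preds in predicates_of.values():
        for a, b in permutations(preds, 2):
            cooc[(a, b)] += 1
    # subjects that have capitalOf usually also have population:
    print(cooc[("capitalOf", "population")])  # -> 2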
APA, Harvard, Vancouver, ISO, and other styles
37

Liu, Tantan. "Data Mining over Hidden Data Sources." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1343313341.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Taylor, Phillip. "Data mining of vehicle telemetry data." Thesis, University of Warwick, 2015. http://wrap.warwick.ac.uk/77645/.

Full text
Abstract:
Driving is a safety-critical task that requires a high level of attention and workload from the driver. Despite this, people often perform secondary tasks such as eating or using a mobile phone, which increase workload levels and divert cognitive and physical attention from the primary task of driving. As well as these distractions, the driver may also be overloaded for other reasons, such as dealing with an incident on the road or holding conversations in the car. One solution to this distraction problem is to limit the functionality of in-car devices while the driver is overloaded. This can take the form of withholding an incoming phone call or delaying the display of a non-urgent piece of information about the vehicle. In order to design and build these adaptations in the car, we must first have an understanding of the driver's current level of workload. Traditionally, driver workload has been monitored using physiological sensors or camera systems in the vehicle. However, physiological systems are often intrusive and camera systems can be expensive and are unreliable in poor light conditions. It is important, therefore, to use methods that are non-intrusive, inexpensive and robust, such as sensors already installed on the car and accessible via the Controller Area Network (CAN) bus. This thesis presents a data mining methodology for this problem, as well as for others in domains with similar types of data, such as human activity monitoring. It focuses on the variable selection stage of the data mining process, where inputs are chosen for models to learn from and make inferences. Selecting inputs from vehicle telemetry data is challenging because there are many irrelevant variables with a high level of redundancy. Furthermore, data in this domain often contain biases because only relatively small amounts can be collected and processed, leading to some variables appearing more relevant to the classification task than they really are. Over the course of this thesis, a detailed variable selection framework that addresses these issues for telemetry data is developed. A novel blocked permutation method is developed and applied to mitigate biases when selecting variables from potentially biased temporal data. This approach is computationally infeasible when variable redundancies are also considered, and so a novel permutation redundancy measure with similar properties is proposed. Finally, a known redundancy structure between features in telemetry data is used to enhance the feature selection process in two ways. First, the benefits of performing raw signal selection, feature extraction, and feature selection in different orders are investigated. Second, a two-stage variable selection framework is proposed and the two permutation based methods are combined. Throughout the thesis, it is shown through classification evaluations and inspection of the features that these permutation based selection methods are appropriate for selecting features from CAN-bus data.
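The blocked permutation idea can be sketched as follows (a simplification on assumed data; the thesis's exact procedure may differ): permuting a candidate signal in contiguous blocks rather than point-wise preserves its short-range temporal structure while still breaking its relationship to the class label:

    # Hedged sketch: shuffle whole blocks of a time series, not single samples.
    import random

    def blocked_permutation(signal, block_size):
        blocks = [signal[i:i + block_size]
                  for i in range(0, len(signal), block_size)]
        random.shuffle(blocks)
        return [x for block in blocks for x in block]

    speed = [30, 31, 33, 34, 50, 52, 55, 54, 20, 21]  # invented CAN-bus signal
    print(blocked_permutation(speed, block_size=2))

The drop in a classifier's performance when a variable is permuted this way then estimates that variable's relevance with less bias than a point-wise shuffle.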
APA, Harvard, Vancouver, ISO, and other styles
39

Sherikar, Vishnu Vardhan Reddy. "I2MAPREDUCE: DATA MINING FOR BIG DATA." CSUSB ScholarWorks, 2017. https://scholarworks.lib.csusb.edu/etd/437.

Full text
Abstract:
This project is an extension of "i2MapReduce: Incremental MapReduce for Mining Evolving Big Data". i2MapReduce is used for incremental big data processing; it uses a fine-grained incremental engine and a general-purpose iterative model that includes iterative algorithms such as PageRank, Fuzzy C-Means (FCM), Generalized Iterated Matrix-Vector Multiplication (GIM-V) and Single Source Shortest Path (SSSP). The main purpose of this project is to reduce input/output overhead, avoid incurring the cost of re-computation and avoid stale data mining results. Finally, the performance of i2MapReduce is analyzed by comparing the resulting graphs.
APA, Harvard, Vancouver, ISO, and other styles
40

Zhang, Nan. "Privacy-preserving data mining." [College Station, Tex. : Texas A&M University, 2006. http://hdl.handle.net/1969.1/ETD-TAMU-1080.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Hulten, Geoffrey. "Mining massive data streams /." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/6937.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Büchel, Nina. "Faktorenvorselektion im Data Mining /." Berlin : Logos, 2009. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=019006997&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Shao, Junming. "Synchronization Inspired Data Mining." Diss., lmu, 2011. http://nbn-resolving.de/urn:nbn:de:bvb:19-137356.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Wang, Xiaohong. "Data mining with bilattices." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ59344.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Knobbe, Arno J. "Multi-relational data mining /." Amsterdam [u.a.] : IOS Press, 2007. http://www.loc.gov/catdir/toc/fy0709/2006931539.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

丁嘉慧 and Ka-wai Ting. "Time sequences: data mining." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B31226760.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Wan, Chang, and 萬暢. "Mining multi-faceted data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/197527.

Full text
Abstract:
Multi-faceted data contains different types of objects and relationships between them. With the rapid growth of web-based services, multi-faceted data are increasing (e.g. Flickr, Yago, IMDB), offering richer information to infer users' preferences and provide them with better services. In this study, we look at two types of multi-faceted data, social tagging systems and heterogeneous information networks, and at how to improve services such as resource retrieval and classification on them. In social tagging systems, resources such as images and videos are annotated with descriptive words called tags. It has been shown that tag-based resource searching and retrieval is much more effective than content-based retrieval. With the advances in mobile technology, many resources are also geo-tagged with location information. We observe that a traditional tag (word) can carry different semantics at different locations. We study how location information can be used to help distinguish the different semantics of a resource's tags and thus to improve retrieval accuracy. Given a search query, we propose a location-partitioning method that partitions all locations into regions such that the user query carries distinguishing semantics in each region. Based on the identified regions, we utilize location information in estimating the ranking scores of resources for the given query. These ranking scores are learned using the Bayesian Personalized Ranking (BPR) framework. Two algorithms, namely LTD and LPITF, which apply Tucker Decomposition and Pairwise Interaction Tensor Factorization, respectively, for modeling the ranking score tensor, are proposed. Through experiments on real datasets, we show that LTD and LPITF outperform other tag-based resource retrieval methods. A heterogeneous information network (HIN) is used to model objects of different types and their relationships. Meta-paths are sequences of object types; they are used to represent complex relationships between objects beyond what links in a homogeneous network capture. We study the problem of classifying objects in an HIN. We propose class-level meta-paths and study how they can be used to (1) build more accurate classifiers and (2) improve active learning in identifying objects for which training labels should be obtained. We show that class-level meta-paths and object classification exhibit interesting synergy. Our experimental results show that the use of class-level meta-paths results in very effective active learning and good classification performance in HINs.
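As a toy illustration of meta-paths (invented objects, far simpler than the thesis's HINs), the composite relation Author -> Paper -> Venue can be enumerated by chaining the two link types:

    # Minimal sketch: following a meta-path in a toy heterogeneous network.
    papers_by_author = {"alice": ["p1", "p2"], "bob": ["p1"]}
    venue_of_paper = {"p1": "KDD", "p2": "ICML"}

    def author_paper_venue(author):
        """Targets reachable from an author via the Author-Paper-Venue meta-path."""
        return [venue_of_paper[p] for p in papers_by_author.get(author, [])]

    print(author_paper_venue("alice"))  # -> ['KDD', 'ICML']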
APA, Harvard, Vancouver, ISO, and other styles
48

García-Osorio, César. "Data mining and visualization." Thesis, University of Exeter, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.414266.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Grant J. (Grant Jenhorn) 1979. "Algorithms for data mining." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/38315.

Full text
Abstract:
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 81-89). Data of massive size are now available in a wide variety of fields and come with great promise. In theory, these massive data sets allow data mining and exploration on a scale previously unimaginable. However, in practice, it can be difficult to apply classic data mining techniques to such massive data sets due to their sheer size. In this thesis, we study three algorithmic problems in data mining with consideration to the analysis of massive data sets. Our work is both theoretical and experimental - we design algorithms and prove guarantees for their performance and also give experimental results on real data sets. The three problems we study are: 1) finding a matrix of low rank that approximates a given matrix, 2) clustering high-dimensional points into subsets whose points lie in the same subspace, and 3) clustering objects by pairwise similarities/distances.
APA, Harvard, Vancouver, ISO, and other styles
50

Anwar, Muhammad Naveed. "Data mining of audiology." Thesis, University of Sunderland, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.573120.

Full text
Abstract:
This thesis describes the data mining of a large set of patient records from the hearing aid clinic at James Cook University Hospital in Middlesbrough, UK. As is typical of medical data in general, these audiology records are heterogeneous, containing three different types of data: audiograms (graphs of hearing ability at different frequencies), structured tabular data (such as gender, date of birth and diagnosis), and unstructured text (specific observations made about each patient in a free-text or comment field). This audiology data set is unique, as it contains records of patients prescribed with both ITE and BTE hearing aids. ITE hearing aids are not generally available on the British National Health Service in England, as they are more expensive than BTE hearing aids. However, both types of aid are prescribed at James Cook University Hospital in Middlesbrough, UK, which is also an important feature of this data. There are two research questions for this research: Which factors influence the choice of ITE (in the ear) as opposed to BTE (behind the ear) hearing aids? For patients diagnosed with tinnitus (ringing in the ear), which factors influence the decision whether to fit a tinnitus masker (a gentle sound source, worn like a hearing aid, designed to drown out tinnitus)? A number of data mining techniques, such as clustering of audiograms, association analysis of variables (such as age, gender, diagnosis, masker, mould and free-text keywords) using contingency tables, and principal component analysis on audiograms, were used to find candidate variables to be combined into a decision support system (DSS), where unseen patient records are presented to the system and the relative likelihood that a patient should be fitted with an ITE as opposed to a BTE aid, or with a tinnitus masker as opposed to no masker, is returned. The DSS was created using the techniques of logistic regression, naïve Bayesian analysis and Bayesian networks, and these systems were tested using 5-fold cross-validation to see which of the techniques produced the better results. The advantage of these techniques for the combination of evidence is that it is easy to see which variables contributed to the final decision. The constructed models and the data behind them were validated by presenting them to the principal audiologist, Dr. Robertshaw, at James Cook University Hospital in Middlesbrough for comments and suggestions for improvement. The techniques developed in this thesis for the construction of prediction models were also used successfully on a different audiology data set from Malaysia. These decisions are typically made by audiology technicians working in the out-patient clinics, on the basis of audiogram results and in consultation with the patients. In many cases, the choice is clear-cut, but at other times the technicians might benefit from a second opinion given by an automatic system with an explanation of how that second opinion was arrived at.
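As a hedged sketch of the kind of decision support model described (invented toy data and feature coding, not the thesis's actual features or coefficients), a logistic regression can return the relative likelihood of an ITE versus a BTE fitting:

    # Illustrative only: logistic-regression scoring of the ITE-vs-BTE choice.
    from sklearn.linear_model import LogisticRegression

    # hypothetical features: [age, mean hearing loss in dB, tinnitus (0/1)]
    X = [[70, 40, 0], [55, 60, 1], [80, 75, 0], [45, 35, 1]]
    y = [1, 0, 0, 1]                      # 1 = ITE fitted, 0 = BTE fitted
    model = LogisticRegression().fit(X, y)
    print(model.predict_proba([[60, 50, 0]]))  # [P(BTE), P(ITE)] for a new patient

Because the fitted coefficients are inspectable, such a model keeps the property the abstract highlights: it is easy to see which variables contributed to the final decision.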
APA, Harvard, Vancouver, ISO, and other styles