To view the other types of publications on this topic, follow this link: Precisión y recall.

Dissertations on the topic "Precisión y recall"

Familiarize yourself with the top 27 dissertations for research on the topic "Precisión y recall".

Next to every work in the list of references, the option "Add to bibliography" is available. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scientific publication in PDF format and read an online abstract of the work, if the relevant parameters are available in the metadata.

Browse dissertations from a wide range of subject areas and compile your bibliography correctly.

1

Parkin, Jennifer. „Memory for spatial mental models : examining the precision of recall“. Thesis, Loughborough University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.415926.

2

Al-Dallal, Ammar Sami. „Enhancing recall and precision of web search using genetic algorithm“. Thesis, Brunel University, 2012. http://bura.brunel.ac.uk/handle/2438/7379.

Annotation:
Due to the rapid growth in the number of Web pages, web users encounter two main problems: many of the retrieved documents are not related to the user query (low precision), and many relevant documents are not retrieved at all (low recall). Information Retrieval (IR) is an essential technique for Web search, and different approaches and techniques have been developed for it. Because of its parallel mechanism for searching high-dimensional spaces, the Genetic Algorithm (GA) has been adopted to solve many optimization problems, IR among them. This thesis proposes a GA-based search model for retrieving HTML documents, called IR Using GA, or IRUGA. It is composed of two main units. The first is a document indexing unit that indexes the HTML documents. The second is the GA mechanism, which applies selection, crossover, and mutation operators to produce the final result, while a specially designed fitness function is used to evaluate the documents. The performance of IRUGA is investigated using the speed of convergence of the retrieval process, precision at rank N, recall at rank N, and precision at recall N. In addition, the proposed fitness function is compared experimentally with the Okapi BM25 function and the Bayesian inference network model function. Moreover, IRUGA is compared with traditional IR using the same fitness function to examine the time each technique requires to retrieve the documents. The new techniques developed for document representation, the GA operators and the fitness function achieve an improvement of over 90% on the recall and precision measures, and the relevance of the retrieved documents is much higher than that of documents retrieved by the other models. Moreover, an extensive comparison of techniques applied to the GA operators is performed, highlighting the strengths and weaknesses of each. Overall, IRUGA is a promising technique in the Web search domain that provides high-quality search results in terms of recall and precision.
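The ranked-retrieval measures named here, precision at rank N and recall at rank N, can be sketched in a few lines of Python. The document IDs, relevance judgements and cutoff below are illustrative placeholders, not data from the thesis:

```python
def precision_at_n(ranked_ids, relevant_ids, n):
    """Fraction of the top-n retrieved documents that are relevant."""
    top = ranked_ids[:n]
    hits = sum(1 for doc in top if doc in relevant_ids)
    return hits / n

def recall_at_n(ranked_ids, relevant_ids, n):
    """Fraction of all relevant documents that appear in the top n."""
    top = ranked_ids[:n]
    hits = sum(1 for doc in top if doc in relevant_ids)
    return hits / len(relevant_ids) if relevant_ids else 0.0

# Illustrative example: 10 retrieved documents, 4 relevant documents in total.
ranked = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant = {"d7", "d9", "d2", "d0"}
print(precision_at_n(ranked, relevant, 5))  # 2/5 = 0.4
print(recall_at_n(ranked, relevant, 5))     # 2/4 = 0.5
```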
3

Klitkou, Gabriel. „Automatisk trädkartering i urban miljö : En fjärranalysbaserad arbetssättsutveckling“. Thesis, Högskolan i Gävle, Avdelningen för Industriell utveckling, IT och Samhällsbyggnad, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-27301.

Annotation:
Digital urban tree registers serve many purposes and facilitate the administration, care and management of urban trees within a city or municipality. Currently, mapping of urban tree stands is carried out manually with methods which are both laborious and time consuming. The aim of this study is to establish a way of operation based on the use of existing LiDAR data and orthophotos to automatically detect individual trees. A tree extraction was performed using the extensions LIDAR Analyst and Feature Analyst for ArcMap. This was carried out over the extent of the city district committee area of Östermalm in the city of Stockholm, Sweden. The results were compared to the city's urban tree register and validated by calculating their Precision and Recall. This showed that Feature Analyst generated the result with the highest accuracy. The derived trees were represented by polygons, which despite their high accuracy make the result unsuitable for detecting individual tree positions. Even though the use of LIDAR Analyst produced a less precise tree mapping result, individual tree positions were detected satisfactorily, especially in areas with sparser, regular tree stands. The study concludes that the two tools complement each other and compensate for each other's shortcomings: Feature Analyst maps an acceptable tree coverage, while LIDAR Analyst more accurately identifies individual tree positions. Thus, a combination of the two results could be used for individual tree mapping.
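For a tree-detection result like the one validated above, precision and recall can be computed by matching detected positions to reference trees within a distance tolerance. The following Python sketch uses made-up coordinates, a greedy matching rule and a 3 m tolerance, none of which are taken from the study:

```python
import math

def match_detections(detected, reference, tol=3.0):
    """Greedily match each detected point to an unused reference point within tol metres."""
    used = set()
    tp = 0
    for dx, dy in detected:
        best, best_dist = None, tol
        for i, (rx, ry) in enumerate(reference):
            if i in used:
                continue
            d = math.hypot(dx - rx, dy - ry)
            if d <= best_dist:
                best, best_dist = i, d
        if best is not None:
            used.add(best)
            tp += 1
    fp = len(detected) - tp   # detections with no reference tree nearby
    fn = len(reference) - tp  # reference trees that were missed
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if reference else 0.0
    return precision, recall

# Illustrative coordinates in metres.
detected = [(0.0, 0.0), (5.0, 5.0), (20.0, 1.0)]
reference = [(0.5, 0.2), (5.5, 4.9), (10.0, 10.0)]
print(match_detections(detected, reference))  # (0.666..., 0.666...)
```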
4

Johansson, Ann, und Karolina Johansson. „Utvärdering av sökmaskiner : en textanalys kring utvärderingar av sökmaskiner på Webben“. Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-18323.

Annotation:
The purpose of this thesis is to analyse studies that evaluate Web search engines. This is done in four categories: the researchers' purpose, the evaluation measurements, the relevance, and the time aspect. Our method is based on a text analysis in which we apply content analysis to sixteen evaluation experiments. Our results indicate fundamental differences in the way the researchers tackle the problem of evaluating Web search engines. We think that, despite the differences we have identified, it is necessary to perform evaluation experiments so that methods can be developed that guarantee the quality of Web search engines. Providing people with the kind of information they need is the main task of Web search engines, and in an increasing flow of information that task will become even more important. Evaluation of Web search engines can help improve their efficiency and in that way strengthen their role as important information resources.
Thesis level: D
5

Carlsson, Bertil. „Guldstandarder : dess skapande och utvärdering“. Thesis, Linköping University, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-19954.

Annotation:

Research on producing good automatic summaries has grown steadily in recent years, driven by the demand in both the private and public sectors to absorb more information than is possible today. Rather than reading entire reports and informational texts, people want to be able to read a summary of them conveniently and thereby get through more of them. To know whether these automatic summarizers maintain a good standard, they must be evaluated in some way. This is often done by looking at how much information is included in the summary and how much is left out. For this to be verifiable, a so-called gold standard is needed: a summary that acts as an answer key against the automatically summarized texts.

This report deals with gold standards and how they are created. In the project, five gold standards for informational texts from Försäkringskassan were created and evaluated, with positive results.

6

Nordh, Andréas. „Musikwebb : En evaluering av webbtjänstens återvinningseffektivitet“. Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-19907.

Annotation:
The aim of this thesis was to evaluate the music downloading service Musikwebb regarding its indexing and retrieval effectiveness. This was done by performing various kinds of searches in the system. The outcomes of these searches were then analysed according to the criteria specificity, precision, recall, exclusivity and authority control. The study showed that Musikwebb had several flaws regarding its retrieval effectiveness. The most prominent cases were the criteria exclusivity and specificity. Several of Musikwebb's classes could be regarded as nearly identical, and the average number of songs in each class was over 50,000. As this study shows, having over 50,000 unique entries in a class causes problems for the effectiveness of the browsing technique. The author recommends that the developers of Musikwebb acquire their licensed material from All Music Guide, including implementing the All Music Guide classification system.
7

Santos, Juliana Bonato dos. „Automatizando o processo de estimativa de revocação e precisão de funções de similaridade“. reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2008. http://hdl.handle.net/10183/15889.

Annotation:
Traditional database query mechanisms, which use the equality criterion, become ineffective when the stored data have spelling and format variations. In such cases, it is necessary to use similarity functions instead of boolean operators. Query mechanisms that use similarity functions return a ranking of elements ordered by their score with respect to the query object. To define the relevant elements that must be returned in this ranking, a threshold value can be used. However, defining the appropriate threshold value is complex, because it depends on the similarity function used and the semantics of the queried data. One way to help choose an appropriate threshold is to evaluate the quality of similarity function results using different threshold values on a database sample. This work presents an automatic method to estimate the quality of similarity functions through recall and precision measures computed for different thresholds. The results obtained by this method can be used as metadata and, given the requirements of a specific application, assist in setting the appropriate threshold value. This process uses clustering methods and cluster validity measures to eliminate human intervention during the estimation of recall and precision.
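The core idea of the abstract, estimating recall and precision of a similarity function at different thresholds, can be illustrated as follows. The similarity scores and match labels are hard-coded placeholders; in the thesis they come from clustering-based estimation rather than manual labelling:

```python
def precision_recall_at_thresholds(scored_pairs, thresholds):
    """scored_pairs: list of (similarity_score, is_true_match) tuples."""
    total_matches = sum(1 for _, m in scored_pairs if m)
    results = {}
    for t in thresholds:
        accepted = [m for s, m in scored_pairs if s >= t]
        tp = sum(accepted)
        # Vacuously precise when nothing passes the threshold.
        precision = tp / len(accepted) if accepted else 1.0
        recall = tp / total_matches if total_matches else 0.0
        results[t] = (precision, recall)
    return results

# Illustrative scores from some similarity function on a labelled sample.
pairs = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.40, False), (0.30, False)]
for t, (p, r) in precision_recall_at_thresholds(pairs, [0.9, 0.75, 0.5]).items():
    print(f"threshold {t}: precision={p:.2f} recall={r:.2f}")
```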
8

Chiow, Sheng-wey. „A precision measurement of the photon recoil using large area atom interferometry /“. May be available electronically:, 2008. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.

9

Lopes, Miguel. „Inference of gene networks from time series expression data and application to type 1 Diabetes“. Doctoral thesis, Universite Libre de Bruxelles, 2015. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/216729.

Annotation:
The inference of gene regulatory networks (GRN) is of great importance to medical research, as causal mechanisms responsible for phenotypes are unravelled and potential therapeutical targets identified. In type 1 diabetes, insulin producing pancreatic beta-cells are the target of an auto-immune attack leading to apoptosis (cell suicide). Although key genes and regulations have been identified, a precise characterization of the process leading to beta-cell apoptosis has not been achieved yet. The inference of relevant molecular pathways in type 1 diabetes is then a crucial research topic. GRN inference from gene expression data (obtained from microarrays and RNA-seq technology) is a causal inference problem which may be tackled with well-established statistical and machine learning concepts. In particular, the use of time series facilitates the identification of the causal direction in cause-effect gene pairs. However, inference from gene expression data is a very challenging problem due to the large number of existing genes (in human, over twenty thousand) and the typical low number of samples in gene expression datasets. In this context, it is important to correctly assess the accuracy of network inference methods. The contributions of this thesis are on three distinct aspects. The first is on inference assessment using precision-recall curves, in particular using the area under the curve (AUPRC). The typical approach to assess AUPRC significance is using Monte Carlo, and a parametric alternative is proposed. It consists on deriving the mean and variance of the null AUPRC and then using these parameters to fit a beta distribution approximating the true distribution. The second contribution is an investigation on network inference from time series. Several state of the art strategies are experimentally assessed and novel heuristics are proposed. One is a fast approximation of first order Granger causality scores, suited for GRN inference in the large variable case. Another identifies co-regulated genes (ie. regulated by the same genes). Both are experimentally validated using microarray and simulated time series. The third contribution of this thesis is on the context of type 1 diabetes and is a study on beta cell gene expression after exposure to cytokines, emulating the mechanisms leading to apoptosis. 8 datasets of beta cell gene expression were used to identify differentially expressed genes before and after 24h, which were functionally characterized using bioinformatics tools. The two most differentially expressed genes, previously unknown in the type 1 Diabetes literature (RIPK2 and ELF3) were found to modulate cytokine induced apoptosis. A regulatory network was then inferred using a dynamic adaptation of a state of the art network inference method. Three out of four predicted regulations (involving RIPK2 and ELF3) were experimentally confirmed, providing a proof of concept for the adopted approach.
Doctorate in Sciences
info:eu-repo/semantics/nonPublished
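The parametric alternative for assessing AUPRC significance mentioned in the abstract amounts to a method-of-moments fit of a beta distribution to the null mean and variance. A minimal sketch, with placeholder values for the null moments and the observed score:

```python
from scipy import stats

def beta_from_moments(mean, var):
    """Method-of-moments fit of a Beta(a, b) distribution on [0, 1].

    Requires var < mean * (1 - mean); otherwise no beta distribution matches.
    """
    common = mean * (1.0 - mean) / var - 1.0
    a = mean * common
    b = (1.0 - mean) * common
    return a, b

# Placeholder null moments, e.g. derived analytically or from a few Monte Carlo runs.
mean_null, var_null = 0.12, 0.002
a, b = beta_from_moments(mean_null, var_null)

observed_auprc = 0.35  # placeholder observed score
p_value = stats.beta.sf(observed_auprc, a, b)  # P(null AUPRC >= observed)
print(a, b, p_value)
```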
10

Afram, Gabriel. „Genomsökning av filsystem för att hitta personuppgifter : Med Linear chain conditional random field och Regular expression“. Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-34069.

Annotation:
The new General Data Protection Regulation (GDPR) applies to all companies within the European Union after 25 May. This means stricter legal requirements for companies that store personal data in any form. The goal of this project is therefore to make it easier for companies to meet the new legal requirements, by creating a tool that searches file systems and visually shows the user, in a graphical user interface, which files contain personal data. The tool uses Named Entity Recognition with the Linear Chain Conditional Random Field algorithm, a supervised learning method in machine learning. This algorithm is used in the project to find names and addresses in files. The different models are trained with different parameters, and the training is done using the Stanford NER library in Java. The models are tested on a test file containing 45,000 words, for which the models themselves predict the classes of all words in the file. The models are then compared with each other using precision, recall and F-score to find the best model. The tool also uses regular expressions to find e-mail addresses, IP numbers and social security numbers. The results for the final machine learning model show that it does not find all names and addresses, but this can be improved by increasing the training data. However, that requires a more powerful computer than the one used in this project. An analysis of how the Swedish language is structured would also be needed in order to apply the most appropriate parameters when training the model.
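The regular-expression side of such a tool can be sketched as below. The patterns for e-mail addresses, IPv4 addresses and Swedish personal identity numbers are rough illustrative approximations, not the expressions used in the thesis:

```python
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    # IPv4: four 1-3 digit groups; a stricter pattern would also check the 0-255 range.
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # Swedish personnummer: YYMMDD-XXXX or YYYYMMDD-XXXX, with optional separator.
    "personnummer": re.compile(r"\b(?:\d{6}|\d{8})[-+]?\d{4}\b"),
}

def find_personal_data(text):
    """Return all pattern matches found in a piece of text, grouped by type."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

sample = "Contact anna@example.se from 192.168.1.10, personnummer 850709-1234."
print(find_personal_data(sample))
```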
11

Li, Chaoyang, und Ke Liu. „Smart Search Engine : A Design and Test of Intelligent Search of News with Classification“. Thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-37601.

Annotation:
Background: Google, Bing and Baidu are the most commonly used search engines in the world, but they have problems. For example, when searching for "Jaguar", most of the results are about cars, not the animal. This is the problem of polysemy: search engines provide the most popular, but not necessarily the most correct, results. Aim: We want to design and implement a search function and explore whether classifying news can improve the precision of users' news searches. Method: We collect data with a web crawler that crawls news articles from BBC News. We then use NLTK and an inverted index for data pre-processing, and BM25 for ranking. Results: Compared to the normal search function, our function has a lower recall rate and a higher precision. Conclusions: The search function can improve precision when people search for news. Implications: The function can be used not only to search news but to search anything; combined with machine learning to analyse users' search habits, it could search and classify even more accurately.
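The BM25 ranking step mentioned in the method can be sketched with a generic Okapi BM25 scorer. The toy documents, naive tokenisation and default parameters k1 = 1.5, b = 0.75 are assumptions for illustration, not the implementation described in the thesis:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against the query terms with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(term for d in docs for term in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [
    "jaguar car speed review".split(),
    "jaguar animal habitat rainforest".split(),
    "football match review".split(),
]
print(bm25_scores(["jaguar", "animal"], docs))  # the second document scores highest
```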
12

Gebert, Florian [Verfasser]. „Precision measurement of the isotopic shift in calcium ions using photon recoil spectroscopy / Florian Gebert“. Hannover : Technische Informationsbibliothek und Universitätsbibliothek Hannover (TIB), 2015. http://d-nb.info/1072060299/34.

13

Javar, Shima. „Measurement and comparison of clustering algorithms“. Thesis, Växjö University, School of Mathematics and Systems Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-1735.

Annotation:

In this project, a number of different clustering algorithms are described and their workings explained. They are compared to each other by implementing them on a number of graphs with a known architecture.

These clustering algorithms, in the order they are implemented, are as follows: Nearest neighbour hillclimbing, Nearest neighbour big step hillclimbing, Best neighbour hillclimbing, Best neighbour big step hillclimbing, Gem 3D, K-means simple, K-means Gem 3D, One cluster and One cluster per node.

The graphs are Unconnected, Directed KX, Directed Cycle KX and Directed Cycle.

The results of these clusterings are compared with each other according to three criteria: Time, Quality and Extremity of nodes distribution. This enables us to find out which algorithm is most suitable for which graph. These artificial graphs are then compared with the reference architecture graph to reach the conclusions.

14

Aula, Lara. „Improvement of Optical Character Recognition on Scanned Historical Documents Using Image Processing“. Thesis, Högskolan i Gävle, Datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-36244.

Annotation:
In an effort to improve the accessibility of historical documents, digitization of historical archives has been an ongoing process at many institutions since the origination of Optical Character Recognition. The old, scanned documents can contain deterioration acquired over time or caused by old printing methods. Common visual attributes seen in the documents are variations in style and font, broken characters, varying ink intensity, noise, damage caused by folding or ripping, and more. Many of these attributes are unfavourable for modern Optical Character Recognition tools and can lead to failed character recognition. This study approaches the stated problem by using image processing methods to improve the result of character recognition. Furthermore, common image quality characteristics of scanned historical documents with unidentifiable text are analyzed. The Optical Character Recognition tool used to conduct this research was the open-source Tesseract software. Image processing methods such as Gaussian lowpass filtering, Otsu's optimum thresholding method and morphological operations were used to prepare the historical documents for Tesseract. The OCR output was evaluated using precision and recall, and it was seen that recall improved by 63 percentage points and precision by 18 percentage points. This shows that using image pre-processing methods to increase the readability of historical documents for Optical Character Recognition tools is effective. It was further seen that the characteristics that are especially disadvantageous for Tesseract are font deviations, the occurrence of extraneous objects, character fading, broken characters, and Poisson noise.
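The pre-processing chain described above (Gaussian lowpass filtering, Otsu thresholding, morphological operations, then Tesseract) can be sketched with OpenCV and pytesseract. The kernel sizes, the choice of a morphological opening and the file path are illustrative assumptions, not the study's tuned settings:

```python
import cv2
import pytesseract

def preprocess_and_ocr(path):
    """Denoise, binarise and clean a scanned page, then run Tesseract on it."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Gaussian low-pass filter to suppress high-frequency scanner noise.
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu's method picks the global binarisation threshold automatically.
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # A small morphological opening removes isolated specks left after binarisation.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    return pytesseract.image_to_string(cleaned)

print(preprocess_and_ocr("scanned_page.png"))  # the path is a placeholder
```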
15

Jelassi, Mohamed Nidhal. „Un système personnalisé de recommandation à partir de concepts quadratiques dans les folksonomies“. Thesis, Clermont-Ferrand 2, 2016. http://www.theses.fr/2016CLF22693/document.

Annotation:
Recommender systems are now popular both commercially and within the research community, where many approaches have been suggested for providing recommendations. Users of folksonomies share items (e.g., movies, books, bookmarks) by annotating them with freely chosen tags. In the Web 2.0 age, users have become the core of the system, since they are both the contributors and the creators of the information. It is therefore of paramount importance to match their needs by providing more targeted recommendations. For this purpose, we consider a new dimension in a folksonomy classically composed of three dimensions and propose an approach to group users with close interests through quadratic concepts. We then use these structures to propose our personalized recommendation system of users, tags and resources. We carried out extensive experiments on two real-life datasets, MovieLens and BookCrossing, which show good results in terms of precision and recall as well as a promising social evaluation. Moreover, we study some of the key assessment metrics, namely coverage, diversity, adaptivity, serendipity and scalability. In addition, we conduct a user study as a valuable complement to our evaluation in order to gain further insight. Finally, we propose a new algorithm that maintains a set of triadic concepts without re-scanning the whole folksonomy. Initial results comparing the performance of our proposal with re-running the whole extraction process from scratch on four real-life datasets show its efficiency.
16

Pakyurek, Muhammet. „A Comparative Evaluation Of Foreground / Background Segmentation Algorithms“. Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614666/index.pdf.

Annotation:
M.Sc. thesis, Department of Electrical and Electronics Engineering. Supervisor: Prof. Dr. Gözde Bozdagi Akar. September 2012, 77 pages. Foreground/background segmentation is a process which separates the stationary objects from the moving objects in a scene. It plays a significant role in computer vision applications. In this study, several background/foreground segmentation algorithms are analyzed by changing their critical parameters individually to see the sensitivity of the algorithms to some common difficulties in background segmentation applications. These difficulties are illumination level, camera view angle, noise level, and range of the objects. The study is mainly comprised of two parts. In the first part, some well-known algorithms based on pixel difference, probability, and codebooks are explained and implemented, with implementation details provided. The second part includes the evaluation of the performance of the algorithms, based on the comparison between the foreground/background regions indicated by the algorithms and the ground truth. Therefore, some metrics including precision, recall and F-measure are defined first. Then, the dataset videos, covering different scenarios, are run through each algorithm to compare the performances. Finally, the performance of each algorithm, along with the optimal values of its parameters, is reported based on F-measure.
17

Wahab, Nor-Ul. „Evaluation of Supervised Machine LearningAlgorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor Data“. Thesis, Högskolan Dalarna, Mikrodataanalys, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:du-28962.

Annotation:
A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter, or soot, from the exhaust gas of a diesel engine. Frequently replacing the DPF is a waste of resources, while waiting for full utilization is risky and very costly; so what is the optimal time/mileage at which to change the DPF? Answering this question is very difficult without knowing when the DPF was changed in a vehicle. We look for the answer with supervised machine learning algorithms for detecting anomalies in vehicles' off-board sensor data (operational data of vehicles). A filter change is considered an anomaly because it is rare compared to normal data. Non-sequential machine learning algorithms for anomaly detection, namely one-class support vector machine (OC-SVM), k-nearest neighbour (K-NN), and random forest (RF), are applied for the first time to the DPF dataset. The dataset is unbalanced, and accuracy is found to be a misleading performance measure for the algorithms. Precision, recall, and F1-score are found to be good measures of the performance of the machine learning algorithms when the data is unbalanced. RF gave the highest F1-score of 0.55, compared with K-NN (0.52) and OC-SVM (0.51). This means that RF performs better than K-NN and OC-SVM, but after further investigation it is concluded that the results are not satisfactory. A sequential approach should instead be tried, which could yield a better result.
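The point about accuracy being misleading on unbalanced data, while precision, recall and F1 remain informative, can be reproduced on synthetic data with scikit-learn. The generated dataset, its roughly 1% anomaly rate and the random-forest settings are assumptions; only the RF part of the thesis' comparison is sketched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic unbalanced data: roughly 1% of samples represent the rare "filter changed" class.
X, y = make_classification(n_samples=20000, n_features=10, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Accuracy looks excellent simply because the majority class dominates.
print("accuracy:", accuracy_score(y_te, pred))
precision, recall, f1, _ = precision_recall_fscore_support(y_te, pred, pos_label=1, average="binary")
print("precision:", precision, "recall:", recall, "F1:", f1)
```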
18

Kondapalli, Swetha. „An Approach To Cluster And Benchmark Regional Emergency Medical Service Agencies“. Wright State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1596491788206805.

19

Massaccesi, Luciano. „Machine Learning Software for Automated Satellite Telemetry Monitoring“. Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20502/.

Annotation:
During the lifetime of a satellite, malfunctions may occur. Unexpected behaviour is monitored using sensors all over the satellite. The telemetry values are sent to Earth and analysed in search of anomalies. These anomalies could be detected by humans, but this is considerably expensive. To lower the costs, machine learning techniques can be applied. In this research, many different machine learning techniques are tested and compared using satellite telemetry data provided by OHB System AG. The fact that the anomalies are collective, together with some properties of the data, is exploited to improve the performance of the machine learning algorithms. Since the data comes from a real spacecraft, it has some shortcomings: it covers only a small time span and, because of the spacecraft's healthiness, contains no critical anomalies. Some steps are therefore taken to improve the evaluation of the algorithms.
20

Čeloud, David. „Vyhledávání informací TRECVid Search“. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237260.

Annotation:
This master's thesis deals with Information Retrieval. It summarizes knowledge in the field of Information Retrieval theory. Furthermore, the work gives an overview of the models used in Information Retrieval, the data involved, and current issues and their possible solutions. The practical part of the thesis focuses on the implementation of methods for information retrieval in textual data. The last part is dedicated to experiments validating the implementation and its possible improvements.
21

Nilsson, Olof. „Visualization of live search“. Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-102448.

Annotation:
The classical search engine result page is used for many interactions with search results. While these are effective at communicating relevance, they do not present the context well. By giving the user an overview in the form of a spatialized display, in a domain that has a physical analog that the user is familiar with, context should become pre-attentive and obvious to the user. A prototype has been built that takes public medical information articles and assigns these to parts of the human body. The articles are indexed and made searchable. A visualization presents the coverage of a query on the human body and allows the user to interact with it to explore the results. Through usage cases the function and utility of the approach is shown.
22

Chang, Jing-Shin, und 張景新. „Automatic Lexicon Acquisition and Precision-Recall Maximization for Untagged Text Corpora“. Thesis, 1997. http://ndltd.ncl.edu.tw/handle/62275166919316022114.

Annotation:
Ph.D. dissertation
National Tsing Hua University
Department of Electrical Engineering
Academic year: 85 (ROC calendar)
Automatic lexicon acquisition from large text corpora is surveyed in this dissertation, with special emphases on optimization techniques for maximizing the joint precision-recall performance. Both English compound word extraction and Chinese unknown word identification tasks are studied in order to explore precision-recall optimization techniques in different languages of different complexity using different available resources. In the English compound word extraction task, the simplest system architecture, which assumes that the lexicon extraction task is conducted using a classifier (or a filter) based on a set of multiple association features, is studied. Under such circumstances, a two stage optimization scheme is proposed, in which the first stage aims at minimizing classification error and the second stage focuses on maximizing joint precision-recall, starting from the minimum error status. To achieve minimum error rate, various approaches are used to improve the error rate performance of the classifier. In addition, a non-linear learning algorithm is developed for achieving maximum precision-recall performance in terms of a user-specified objective function of precision and recall. In the Chinese unknown word extraction task, where contextual information as well as word association metrics are used, an iterative approach, which allows us to improve both precision and recall simultaneously, is proposed to iteratively improve the precision and recall performance. For the English compound word extraction task, the weighted precision and recall (WPR) using the proposed approach can achieve as high as about 88% for bigram compounds, and 88% for trigram compounds for a training (testing) corpus of 20715 (2301) sentences sampled from technical manuals of cars. The F-measure performances are about 84% for bigrams and 86% for trigrams. By applying the proposed optimization method, the precision and recall profile is observed to follow the preferred criteria of different lexicographers. For the Chinese unknown word identification task, experiment results show that both precision and recall rates are improved almost monotonically, in contrast to non-iterative segmentation-merging-filtering-and-disambiguation approaches, which often sacrifice precision for recall or vice versa. With a corpus of 311,591 sentences, the performance is 76% (bigram), 54% (trigram), and 70% (quadgram) in F-measure, which is significantly better than using the non-iterative approach with F-measures of 74% (bigram), 46% (trigram), and 58% (quadgram).
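The user-specified objective function of precision and recall referred to above can be made concrete with a weighted harmonic combination, where the weight encodes a lexicographer's preference for precision over recall. This is a generic sketch, not the objective actually used in the dissertation:

```python
def weighted_pr(precision, recall, alpha=0.5):
    """Weighted harmonic combination of precision and recall.

    alpha=0.5 gives the balanced F1 measure; larger alpha favours precision.
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)

# A precision-leaning preference (alpha=0.8) compared with the balanced measure.
print(weighted_pr(0.88, 0.84, alpha=0.5))  # ~0.86, plain F1
print(weighted_pr(0.88, 0.84, alpha=0.8))  # weighted towards precision
```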
23

Gallé, Matthias. „Algoritmos para la búsqueda eficiente de instancias similares“. Bachelor's thesis, 2007. http://hdl.handle.net/11086/8.

Annotation:
Thesis (Licenciatura in Computer Science), Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía y Física, 2007.
In this work we take on the challenge of searching for similar objects within a very large collection of such objects. We find two difficulties in this problem: first, defining a similarity measure between two objects, and then implementing an algorithm that, based on that measure, efficiently finds the objects that are sufficiently alike. The solution presented uses a measure based strongly on the concepts of precision and recall, yielding a measure similar to the Jaccard index. The efficiency of the algorithm lies in first generating groups of similar objects, and only afterwards looking those objects up in the database. We use this algorithm in two applications: on the one hand, on a database of users who rate films, in order to predict those ratings; on the other hand, to find genetic profiles that may have contributed to a piece of genetic evidence.
Matthias Gallé.
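The precision/recall-based, Jaccard-like similarity mentioned in the abstract can be illustrated for two objects represented as feature sets; precision and recall of one set against the other sit next to the plain Jaccard index. The feature sets below are invented, and this is a generic reconstruction rather than the thesis' exact definition:

```python
def set_precision_recall(candidate, reference):
    """Precision/recall of a candidate feature set against a reference feature set."""
    inter = len(candidate & reference)
    precision = inter / len(candidate) if candidate else 0.0
    recall = inter / len(reference) if reference else 0.0
    return precision, recall

def jaccard(a, b):
    """Classic Jaccard index: shared features over all features."""
    return len(a & b) / len(a | b) if a | b else 0.0

a = {"drama", "thriller", "1999", "usa"}
b = {"drama", "thriller", "2001", "usa", "crime"}
p, r = set_precision_recall(a, b)
print(p, r)            # 0.75, 0.6
print(jaccard(a, b))   # 3 / 6 = 0.5
```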
24

Wang, Hui-Ju, und 王惠如. „Using concept map navigation to improve the precision and recall of document searching“. Thesis, 2006. http://ndltd.ncl.edu.tw/handle/gkecnw.

Annotation:
Master's thesis
Ming Chuan University
Master's Program, Department of Information Management
Academic year: 94 (ROC calendar)
Users need to find an enormous number of papers to solve their problems or to use as reference material. Currently, the two most common search methods are keyword searching and natural language searching. The advantage of keyword searching is that it is simple, convenient and comprehensive; yet keywords may not fully express the user's intended semantics, generating many search results that are not the ones most desired. Natural language searching, in turn, gives a more complete description than keyword searching, improving precision. Its disadvantage is that the semantics of a sentence are more personalized: the search results may be disappointing if the user's understanding differs from that of the database. In running text, concepts emerge from the relations between terms, and these concepts express what an article is about. Association rules have given good results for retrieval and document classification, and concept map navigation shows the links between these concepts. This research proposes concept map navigation searching, which extracts the main concepts of documents and links them through association rules to form a concept map that users can browse to search for the data they need, thus addressing the low precision caused by unclear semantic expression. Finally, we implemented our approach and used conference papers as samples to test precision and recall. The results show that concept map navigation can improve precision and recall by more than 5%, indicating that our approach is feasible and effective.
25

Garrett, LeAnn. „Authority control and its influence on recall and precision in an online bibliographic catalog“. 1997. http://books.google.com/books?id=_PHgAAAAMAAJ.

26

El, Demerdash Osama. „Mining photographic collections to enhance the precision and recall of search results using semantically controlled query expansion“. Thesis, 2013. http://spectrum.library.concordia.ca/977207/1/ElDemerdash_PhD_S2013.pdf.

Annotation:
Driven by a larger and more diverse user base and datasets, modern Information Retrieval techniques are striving to become contextually aware in order to provide users with a more satisfactory search experience. While text-only retrieval methods are significantly more accurate and faster to render results than purely visual retrieval methods, the latter provide a rich complementary medium which can be used to obtain relevant results different from those obtained using text-only retrieval. Moreover, visual retrieval methods can be used to learn the user's context and preferences, in particular the user's relevance feedback, and exploit them to narrow the search down to more accurate results. Despite the overall deficiency in precision of visual retrieval results, the top results are accurate enough to be used for query expansion, when expanded in a controlled manner. The method we propose overcomes the usual pitfalls of visual retrieval: 1. the hardware barrier giving rise to prohibitively slow systems; 2. results dominated by noise; 3. a significant gap between the low-level features and the semantics of the query. In our thesis, the first barrier is overcome by employing simple block-based visual features which outperform a method based on MPEG-7 features, especially at early precision (precision of the top results). For the second obstacle, lists of words semantically weighted according to their degree of relation to the original query, or to relevance feedback from example images, are formed. These lists provide filters through which the confidence in the candidate results is assessed for inclusion in the results. This allows for more reliable Pseudo-Relevance Feedback (PRF). This technique is then used to bridge the third barrier, the semantic gap: it consists of a second-step query, re-querying the dataset with a query expanded with weighted words obtained from the initial query and semantically filtered (SF) without human intervention. We developed our PRF-SF method on the IAPR TC-12 benchmark dataset of 20,000 tourist images, obtaining promising results, and tested it on the different and much larger Belga benchmark dataset of approximately 500,000 news images originating from a different source. Our experiments confirmed the potential of the method in improving the overall Mean Average Precision and recall, as well as the level of diversity of the results measured using cluster recall.
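The controlled query-expansion step (taking terms from top-ranked results, weighting them and filtering them semantically before re-querying) can be outlined as follows. The frequency weighting and whitelist-style semantic filter are deliberately simple stand-ins for the method described in the thesis:

```python
from collections import Counter

def expand_query(query_terms, top_result_docs, allowed_terms, k=5):
    """Pseudo-relevance feedback: add the k most frequent terms from the top results,
    keeping only terms that pass a (here: whitelist-based) semantic filter."""
    counts = Counter(
        term
        for doc in top_result_docs
        for term in doc
        if term not in query_terms and term in allowed_terms
    )
    expansion = [term for term, _ in counts.most_common(k)]
    return list(query_terms) + expansion

# Illustrative data: tokenised captions of the top-ranked images and a tiny "semantic" whitelist.
top_docs = [["beach", "sunset", "palm", "sea"], ["beach", "sea", "boat"], ["sea", "harbour", "boat"]]
allowed = {"beach", "sea", "boat", "sunset", "harbour", "palm"}
print(expand_query(["beach"], top_docs, allowed, k=3))  # ['beach', 'sea', 'boat', ...]
```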
27

„Utility of Considering Multiple Alternative Rectifications in Data Cleaning“. Master's thesis, 2013. http://hdl.handle.net/2286/R.I.18825.

Annotation:
Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise presupposes that it is in fact possible for the cleaning process to uniquely recover the clean version of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say, the "most likely" candidate) per tuple. Such an approach can lead to loss of information; for example, consider a situation where there are three equally likely clean candidates for a dirty tuple. An appealing alternative that avoids such information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. For example, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would it affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates for each dirty tuple, along with the probability that they are the correct cleaned version. I represent these alternatives as tuples in a tuple-disjoint probabilistic database, and use the Mystiq system to process queries on it. This probabilistic reconstruction (called BayesWipe-PDB) is compared to a deterministic reconstruction (called BayesWipe-DET), where the most likely clean candidate for each tuple is chosen and the rest of the alternatives discarded.
Dissertation/Thesis
M.S. Computer Science 2013