Dissertations / Theses on the topic '2D Convolution Neural Network (CNN)'

Consult the top 15 dissertations / theses for your research on the topic '2D Convolution Neural Network (CNN).'


1

Kapoor, Rishika. "Malaria Detection Using Deep Convolution Neural Network." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613749143868579.

2

Shuvo, Md Kamruzzaman. "Hardware Efficient Deep Neural Network Implementation on FPGA." OpenSIUC, 2020. https://opensiuc.lib.siu.edu/theses/2792.

Abstract:
In recent years, there has been a significant push to implement Deep Neural Networks (DNNs) on edge devices, which requires power- and hardware-efficient circuits to carry out the intensive matrix-vector multiplication (MVM) operations. This work presents hardware-efficient MVM implementation techniques using bit-serial arithmetic and a novel MSB-first computation circuit. The proposed designs take advantage of the pre-trained network weight parameters, which are already known at the design stage. Thus, partial computation results can be pre-computed and stored in look-up tables, and the MVM results can then be computed in a bit-serial manner without using multipliers. The proposed circuit for the convolution filters and the rectified linear activation function used in deep neural networks performs computation in an MSB-first, bit-serial manner. It can predict early whether the outcome of a filter computation will be negative and then terminate the remaining computation to save power. The benefits of the proposed MVM implementation techniques are demonstrated by comparing the proposed design with a conventional implementation. The proposed circuit is implemented on an FPGA and shows significant power and performance improvements compared with conventional designs implemented on the same FPGA.
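The look-up-table idea above can be illustrated in the style of distributed arithmetic. What follows is a minimal Python sketch, not the thesis's circuit. It assumes a single output neuron, integer weights, and unsigned `bits`-bit integer inputs, and the function names are hypothetical.

Because the weights are fixed, every subset sum of the weights is precomputed once. The inputs are then consumed one bit plane at a time, MSB first. The computation stops as soon as the result is guaranteed to be negative, at which point the ReLU output is 0.

```python
def build_lut(weights):
    """Precompute the sum of weights for every subset of inputs (2**N entries).

    Bit i of the table address says whether input i contributes.
    """
    return [
        sum(w for i, w in enumerate(weights) if (mask >> i) & 1)
        for mask in range(1 << len(weights))
    ]


def bit_serial_relu_dot(weights, xs, bits):
    """ReLU(sum_i weights[i] * xs[i]) computed MSB-first without multipliers.

    xs must be unsigned integers in [0, 2**bits). Returns (output, bit_planes_used);
    bit_planes_used < bits means the computation was terminated early.
    """
    if len(weights) != len(xs):
        raise ValueError("weights and xs must have the same length")
    lut = build_lut(weights)                     # fixed weights -> built once, ahead of time
    pos_sum = sum(w for w in weights if w > 0)   # largest value any LUT entry can take
    acc = 0
    for b in range(bits - 1, -1, -1):            # MSB first
        # Form the LUT address from bit b of every input.
        address = 0
        for i, x in enumerate(xs):
            address |= ((x >> b) & 1) << i
        acc = 2 * acc + lut[address]             # shift-and-add instead of multiply
        # The b remaining lower bit planes can add at most pos_sum * (2**b - 1).
        upper_bound = acc * (1 << b) + pos_sum * ((1 << b) - 1)
        if upper_bound < 0:                      # result must be negative -> ReLU gives 0
            return 0, bits - b
    return max(acc, 0), bits
```

Each iteration needs only one table lookup, a shift, and an add, which is why no multiplier is required. The table grows as 2^N, so distributed-arithmetic designs typically split long dot products into small groups.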
3

Ďuriš, Denis. "Detekce ohně a kouře z obrazového signálu [Fire and Smoke Detection from the Image Signal]." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-412968.

Abstract:
This diploma thesis deals with the detection of fire and smoke from the image signal. The approach of this work uses a combination of convolutional and recurrent neural networks. The machine learning models created in this work contain inception modules and long short-term memory (LSTM) blocks. The research part describes selected machine learning models used to solve the problem of fire detection in static and dynamic image data. As part of the solution, a dataset of videos and still images was created to train the designed neural networks. The results of this approach are evaluated in the conclusion.
4

Andersson, Viktor. "Semantic Segmentation: Using Convolutional Neural Networks and Sparse Dictionaries." Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139367.

Abstract:
The two main bottlenecks when using deep neural networks are data dependency and training time. This thesis proposes a novel method for weight initialization of the convolutional layers in a convolutional neural network, introducing the use of sparse dictionaries for this purpose. A sparse dictionary optimized on domain-specific data can be seen as a set of intelligent feature-extracting filters. This thesis investigates the effect of using such filters as kernels in the convolutional layers of the neural network: how do they affect training time and final performance? The dataset used is Cityscapes, a library of 25,000 labeled road-scene images. The sparse dictionary was acquired using the K-SVD method. The filters were added to two different networks, one much deeper than the other, whose performance was tested individually; results are presented for both networks. The results show that filter initialization is an important aspect that should be taken into consideration when training deep networks for semantic segmentation.
5

Sparr, Henrik. "Object detection for a robotic lawn mower with neural network trained on automatically collected data." Thesis, Uppsala universitet, Datorteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444627.

Abstract:
Machine vision is a hot research topic, with findings published at a high pace and more and more companies developing automated vehicles. Robotic lawn mowers are also increasing in popularity, but most mowers still use relatively simple methods for cutting the lawn. No previous work has been published on machine learning networks that improve between cutting sessions by automatically collecting data and then using it for training. A data acquisition pipeline and a neural network architecture that could help the mower avoid collisions were therefore developed. Nine neural networks were tested, of which a convolutional one reached the highest accuracy. The performance of the data acquisition routine and the networks shows that it is possible to design an object detection model that improves between runs.
6

Pradels, Léo. "Efficient CNN inference acceleration on FPGAs: a pattern pruning-driven approach." Electronic Thesis or Diss., Université de Rennes (2023-....), 2024. http://www.theses.fr/2024URENS087.

Abstract:
CNN-based deep learning models provide state-of-the-art performance in image and video processing tasks, particularly for image enhancement or classification.
However, these models are computationally and memory-intensive, making them unsuitable for meeting real-time constraints on embedded FPGA systems. As a result, compressing these CNNs and designing inference accelerator architectures that integrate compression in a hardware-software co-design approach is essential. While software optimizations such as pruning have been proposed, they often lack the structure needed for effective accelerator integration. To address these limitations, this thesis focuses on accelerating CNNs on FPGAs while complying with real-time constraints on embedded systems. This is achieved through several key contributions. First, it introduces pattern pruning, which imposes structure on network sparsity, enabling efficient hardware acceleration with minimal accuracy loss due to compression. Second, a scalable accelerator for CNN inference is presented, which adapts its architecture based on input performance criteria, FPGA specifications, and the target CNN model architecture. An efficient method for integrating pattern pruning within the accelerator and a complete flow for CNN acceleration are proposed. Finally, improvements in network compression are explored through Shift&Add quantization, which modifies FPGA computation methods while maintaining baseline network accuracy.
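Pattern pruning restricts each convolution kernel to one of a small set of allowed sparsity masks, so the hardware only has to support those few shapes. A minimal Python sketch for 3×3 kernels follows.

Two parts of it are illustrative assumptions, not taken from the thesis:

- the four-entry pattern library `PATTERNS`;
- the selection rule, which keeps the pattern that preserves the most absolute weight.

The function name is hypothetical.

```python
# Hypothetical pattern library: each pattern keeps 4 of the 9 (row, col) positions.
PATTERNS = [
    frozenset({(0, 1), (1, 0), (1, 1), (1, 2)}),
    frozenset({(1, 0), (1, 1), (1, 2), (2, 1)}),
    frozenset({(0, 1), (1, 1), (1, 2), (2, 1)}),
    frozenset({(0, 1), (1, 0), (1, 1), (2, 1)}),
]


def apply_best_pattern(kernel, patterns=PATTERNS):
    """Prune a 3x3 kernel (nested lists) to the pattern that keeps the most |weight|.

    Returns (pattern_index, pruned_kernel); ties go to the lowest index.
    """
    def kept_magnitude(pattern):
        return sum(abs(kernel[r][c]) for r, c in pattern)

    best = max(range(len(patterns)), key=lambda i: kept_magnitude(patterns[i]))
    keep = patterns[best]
    # Zero every position that the chosen pattern does not keep.
    pruned = [[kernel[r][c] if (r, c) in keep else 0.0 for c in range(3)]
              for r in range(3)]
    return best, pruned
```

Storing only the pattern index plus the four surviving weights per kernel is one way an accelerator could exploit this kind of structured sparsity.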
7

Jafari, Muhammad Reza. "Persian Sign Gesture Translation to English Spoken Language on Smartphone." Thesis, Delhi Technological University, 2020. http://dspace.dtu.ac.in:8080/jspui/handle/repository/18787.

Abstract:
Hearing-impaired people and others with verbal challenges have difficulty communicating with society; sign language is how they communicate content such as numbers or phrases. Communication becomes even more challenging with people from other countries who use different languages. Additionally, sign language differs from one country to another, so learning one sign language does not mean learning all sign languages. Translating a word from a sign language into a spoken language is a challenge, and translating that word into yet another language is an even bigger one. In such cases, two interpreters are needed: one from the sign language to the source spoken language and one from the source language to the target language. Although ample research has been done on sign recognition, this work focuses on translating gestures from one language to another. In this study, a smartphone-based approach to sign language recognition is proposed, because smartphones are available worldwide. Since smartphones have limited computational power, a client-server application is proposed in which most processing tasks are done on the server side. In this client-server system, the client is a smartphone application that captures images of sign gestures and sends them to the server for recognition. The server processes the data and returns the translation of the sign to the client. On the server side, where most of the sign recognition takes place, the background of the sign image is removed using the Hue, Saturation, Value (HSV) color space and set to black. The sign gesture is then segmented by detecting the largest connected component in the frame. The extracted features are binary pixels, and a Convolutional Neural Network (CNN) is used to classify the sign images. After classification, the letter for a given sign is assigned, and a word is formed from the sequence of letters.
The word is translated into the target language, in this case English, and the result is returned to the client application.
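The segmentation step described above keeps only the largest connected region of foreground pixels. It can be sketched in Python with a breadth-first search.

This sketch assumes the HSV-based background removal has already produced a binary mask (1 = foreground) and uses 4-connectivity. Both the function name and the connectivity choice are assumptions, not details from the thesis.

```python
from collections import deque


def largest_component(mask):
    """Return a copy of a binary mask keeping only its largest 4-connected region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one component with BFS, collecting its pixel coordinates.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):   # first-found component wins ties
                best = component
    keep = set(best)
    return [[1 if (y, x) in keep else 0 for x in range(cols)] for y in range(rows)]
```

The resulting single-region mask is what would then be passed on as binary pixel features for CNN classification.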
8

Abidi, Azza. "Investigating Deep Learning and Image-Encoded Time Series Approaches for Multi-Scale Remote Sensing Analysis in the context of Land Use/Land Cover Mapping." Electronic Thesis or Diss., Université de Montpellier (2022-....), 2024. http://www.theses.fr/2024UMONS007.

Abstract:
In this thesis, the potential of machine learning (ML) for enhancing the mapping of complex Land Use and Land Cover (LULC) patterns using Earth Observation data is explored. Traditionally, mapping methods relied on manual and time-consuming classification and interpretation of satellite images, which are susceptible to human error. However, the application of ML, particularly through neural networks, has automated and improved the classification process, resulting in more objective and accurate results. Additionally, the integration of Satellite Image Time Series (SITS) data adds a temporal dimension to spatial information, offering a dynamic view of the Earth's surface over time. This temporal information is crucial for accurate classification and informed decision-making in various applications. The precise and current LULC information derived from SITS data is essential for guiding sustainable development initiatives, resource management, and mitigation of environmental risks. The LULC mapping process using ML involves data collection, preprocessing, feature extraction, and classification using various ML algorithms.
Two main classification strategies for SITS data have been proposed: pixel-level and object-based approaches. While both approaches have shown effectiveness, they also pose challenges, such as the inability to capture contextual information in pixel-based approaches and the complexity of segmentation in object-based approaches. To address these challenges, this thesis aims to implement a method based on multi-scale information to perform LULC classification, coupling spectral and temporal information through a combined pixel-object methodology and applying a methodological approach to efficiently represent multivariate SITS data, with the aim of reusing the many research advances proposed in the field of computer vision.
9

Šůstek, Martin. "Word2vec modely s přidanou kontextovou informací [Word2vec Models with Added Contextual Information]." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363837.

Abstract:
This thesis is concerned with explaining the word2vec models. Even though word2vec was introduced only recently (2013), many researchers have already tried to extend, understand, or at least use the model, because it provides surprisingly rich semantic information. This information is encoded in an N-dimensional vector representation and can be recalled by performing algebraic operations on the vectors. In addition, I suggest modifications to the model in order to obtain different word representations. To achieve this, I use public image datasets. This thesis also includes parts dedicated to a word2vec extension based on a convolutional neural network.
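The algebraic operations on word vectors mentioned above usually refer to vector-offset analogies such as king - man + woman ≈ queen. A minimal Python sketch follows.

It uses made-up three-dimensional toy vectors; real word2vec embeddings are learned from text and typically have a few hundred dimensions. `analogy` is a hypothetical helper, not part of any word2vec library.

```python
import math

# Made-up 3-dimensional toy vectors, for illustration only.
TOY_VECTORS = {
    "man":   [0.0, 1.0, 0.1],
    "woman": [0.0, -1.0, 0.1],
    "king":  [1.0, 1.0, 0.2],
    "queen": [1.0, -1.0, 0.2],
    "apple": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the word closest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded, as is common practice for analogy tests.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))
```

For example, `analogy(TOY_VECTORS, "man", "king", "woman")` evaluates king - man + woman and picks the nearest remaining word by cosine similarity.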
10

Marek, Jan. "Rekonstrukce chybějících části obličeje pomocí neuronové sítě [Reconstruction of Missing Parts of the Face Using a Neural Network]." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-433506.

Abstract:
The aim of this thesis is to create a neural network capable of reconstructing faces from photographs in which part of the face is covered by a mask. Concepts used in the development of convolutional neural networks and generative adversarial networks are presented. Concepts used in neural networks specifically for the reconstruction of face photographs are then described. A generative adversarial network model is introduced that uses a combination of gated convolutional layers and multi-scale blocks and is capable of realistically filling in the regions of the face covered by a mask.
11

Hameed, Khurram. "Computer vision based classification of fruits and vegetables for self-checkout at supermarkets." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2022. https://ro.ecu.edu.au/theses/2519.

Abstract:
The field of machine learning, and in particular methods that improve the capability of machines to perform a wider variety of generalised tasks, is among the most rapidly growing research areas in today’s world. Current applications of machine learning and artificial intelligence can be divided into several significant fields, namely computer vision, data science, real-time analytics and Natural Language Processing (NLP). All these applications are being used to help computer-based systems operate more usefully in everyday contexts. Computer vision research is currently active in a wide range of areas, such as the development of autonomous vehicles, object recognition, Content Based Image Retrieval (CBIR), image segmentation and terrestrial analysis from space (e.g. crop estimation). Despite significant prior research, the area of object recognition still has many topics to be explored. This PhD thesis focuses on using advanced machine learning approaches to enable the automated recognition of fresh produce (i.e. fruits and vegetables) at supermarket self-checkouts. This type of complex classification task is one of the most recently emerging applications of advanced computer vision approaches, and it is a productive research topic in this field due to the limited means of representing the features and machine learning techniques for classification. Fruits and vegetables exhibit significant inter- and intra-class variance in weight, shape, size, colour and texture, which makes the classification challenging. Effective fruit and vegetable classification has important applications in daily life, e.g. crop estimation, fruit classification, robotic harvesting and fruit quality assessment. One potential application of this classification capability is supermarket self-checkouts. Increasingly, supermarkets are introducing self-checkouts in stores to make the checkout process easier and faster.
However, this raises a number of challenges, as not all goods can readily be sold with packaging and barcodes, for instance loose fresh items (e.g. fruits and vegetables). Adding barcodes to these types of items individually is impractical, and pre-packaging limits the freedom of choice when selecting fruits and vegetables and creates additional waste, hence reducing customer satisfaction. The current situation, which relies on customers correctly identifying produce themselves, leaves open the potential for incorrect billing, either due to inadvertent error or due to intentional fraudulent misclassification, resulting in financial losses for the store. To address this problem, the main goals of this PhD work are: (a) exploring the types of visual and non-visual sensors that could be incorporated into a self-checkout system for classification of fruits and vegetables, (b) determining a suitable feature representation method for fresh produce items available at supermarkets, (c) identifying optimal machine learning techniques for classification within this context and (d) evaluating our work relative to the state-of-the-art object classification results presented in the literature. An in-depth analysis of related computer vision literature and techniques is performed to identify and implement possible solutions. A progressive process distribution approach is used for this project, in which the task of computer vision based fruit and vegetable classification is divided into pre-processing and classification techniques. Different classification techniques have been implemented and evaluated as possible solutions to this problem. Both visual and non-visual features of fruit and vegetables are exploited to perform the classification. Novel classification techniques have been carefully developed to deal with the complex and highly variant physical features of fruit and vegetables while taking advantage of both visual and non-visual features.
The classification techniques are tested individually and in ensembles to achieve higher effectiveness. Significant results have been obtained, from which it can be concluded that fruit and vegetable classification is a complex task with many challenges. It is also observed that a larger dataset can better capture the complex, variant features of fruit and vegetables. Complex multidimensional features can be extracted from larger datasets to generalise to a higher number of classes. However, developing a larger multiclass dataset is an expensive and time-consuming process. The effectiveness of classification techniques can be significantly improved by subtracting background occlusions and complexities. It is also worth mentioning that an ensemble of simple, less complicated classification techniques can achieve effective results even when applied to fewer features for a smaller number of classes. Combining visual and non-visual features can reduce the difficulty a classification technique faces in dealing with a higher number of classes with similar physical features. Classification of fruit and vegetables with similar physical features (i.e. colour and texture) needs careful estimation and hyper-dimensional embedding of visual features. Implementing rigorous classification penalties as loss functions can achieve this goal at the cost of time and computational requirements. There is a significant need to develop larger datasets for different fruit- and vegetable-related computer vision applications. Considering more sophisticated loss function penalties and discriminative hyper-dimensional feature embedding techniques can significantly improve the effectiveness of classification techniques for fruit and vegetable applications.
12

Stynsberg, John. "Incorporating Scene Depth in Discriminative Correlation Filters for Visual Tracking." Thesis, Linköpings universitet, Datorseende, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-153110.

Abstract:
Visual tracking is a computer vision problem in which the task is to follow a target through a video sequence. Tracking has many important real-world applications in fields such as autonomous vehicles and robot vision. Since visual tracking does not assume any prior knowledge about the target, it faces challenges such as occlusion, appearance change, background clutter and scale change. In this thesis we try to improve the capabilities of tracking frameworks based on discriminative correlation filters by incorporating scene depth information. We utilize scene depth information on three main levels. First, we use raw depth information to segment the target from its surroundings, enabling occlusion detection and scale estimation. Second, we investigate different visual features calculated from depth data to decide which features are good at encoding geometric information available solely in depth data. Third, we investigate handling missing data in the depth maps using a modified version of the normalized convolution framework. Finally, we introduce a novel approach for parameter search that uses genetic algorithms to find the best hyperparameters for our tracking framework. Experiments show that depth data can be used to estimate scale changes and handle occlusions. In addition, visual features calculated from depth are more representative when combined with color features. It is also shown that utilizing normalized convolution improves the overall performance in some cases. Lastly, the use of genetic algorithms for hyperparameter search leads to accuracy gains as well as some insights into the performance of different components within the framework.
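Normalized convolution (Knutsson and Westin) fills gaps in a signal. It convolves the certainty-weighted signal and divides the result by the convolved certainty. The thesis uses a modified version that is not reproduced here; the sketch below is the standard formulation only.

It assumes:

- a depth map given as nested lists;
- a certainty map that is 0 at missing pixels and 1 elsewhere;
- a small symmetric applicability kernel, so that correlation and convolution coincide.

The function name is hypothetical.

```python
def normalized_convolution(signal, certainty, applicability, eps=1e-12):
    """Standard normalized convolution: (a * (c . f)) / (a * c).

    signal, certainty: 2-D nested lists of equal shape (certainty 0 = missing).
    applicability: small symmetric 2-D kernel a with odd side lengths.
    Pixels whose neighbourhood contains no certain data are set to 0.0.
    """
    rows, cols = len(signal), len(signal[0])
    kr, kc = len(applicability) // 2, len(applicability[0]) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            num = den = 0.0
            for i, krow in enumerate(applicability):
                for j, a in enumerate(krow):
                    y, x = r + i - kr, c + j - kc
                    if 0 <= y < rows and 0 <= x < cols:   # ignore pixels outside the map
                        w = a * certainty[y][x]           # applicability times certainty
                        num += w * signal[y][x]
                        den += w
            out[r][c] = num / den if den > eps else 0.0
    return out
```

Because missing pixels get zero certainty, they contribute nothing to either sum. Their output value is therefore interpolated from the surrounding valid depth values.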
13

Gaikwad, Akash S. "Pruning Convolution Neural Network (SqueezeNet) for Efficient Hardware Deployment." Thesis, 2018. http://hdl.handle.net/1805/17923.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI). In recent years, deep learning models have become popular in real-time embedded applications, but their hardware deployment is complicated by limited resources such as memory, computational power, and energy. Recent research in deep learning focuses on reducing the model size of Convolution Neural Networks (CNNs) through compression techniques such as architectural compression, pruning, quantization, and encoding (e.g., Huffman encoding). Network pruning is one of the most promising techniques for addressing these problems. This thesis proposes methods to prune the convolution neural network SqueezeNet without introducing network sparsity in the pruned model. Three methods are proposed to decrease the model size of the CNN without a significant drop in accuracy: (1) pruning based on a Taylor expansion of the change in the cost function, ΔC; (2) pruning based on L2 normalization of activation maps; and (3) pruning based on a combination of methods 1 and 2. The proposed methods use these criteria to rank the convolution kernels and prune the lower-ranked filters, after which the SqueezeNet model is fine-tuned by backpropagation. Transfer learning is used to train SqueezeNet on the CIFAR-10 dataset. Results show that the proposed approach reduces the SqueezeNet model size by 72% without a significant drop in accuracy (the optimal pruning-efficiency result). Results also show that pruning based on the combination of the Taylor expansion of the cost function and L2 normalization of activation maps achieves better pruning efficiency than either individual criterion, and that most of the pruned kernels come from mid- and high-level layers. The pruned model was deployed on BlueBox 2.0 using RTMaps software, and its performance was evaluated.
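The two ranking criteria listed above can be sketched in Python for the feature maps of one layer.

The sketch assumes that activations and gradients have already been collected, flattened over the batch and spatial positions. It uses two assumed scoring forms:

- criterion 1: the first-order Taylor estimate |mean(a · ∂C/∂a)|;
- criterion 2: the L2 norm of each activation map, as one reading of "L2 normalization of activation maps".

For criterion 3, it normalizes each score list to sum to 1 and adds the two. That combination rule and all function names are assumptions, since the abstract does not specify them.

```python
import math


def taylor_scores(activations, gradients):
    """Criterion 1 (assumed form): |mean(a * dC/da)| for each feature map."""
    return [abs(sum(a * g for a, g in zip(fa, fg)) / len(fa))
            for fa, fg in zip(activations, gradients)]


def l2_scores(activations):
    """Criterion 2 (assumed form): L2 norm of each activation map."""
    return [math.sqrt(sum(a * a for a in fmap)) for fmap in activations]


def _normalize(scores):
    """Scale scores to sum to 1 so the two criteria are comparable."""
    total = sum(scores)
    return [s / total if total else 0.0 for s in scores]


def filters_to_prune(activations, gradients, n_prune):
    """Indices of the n_prune lowest-ranked filters under the combined criterion 3."""
    combined = [t + l for t, l in zip(_normalize(taylor_scores(activations, gradients)),
                                      _normalize(l2_scores(activations)))]
    ranked = sorted(range(len(combined)), key=lambda k: combined[k])  # lowest first
    return sorted(ranked[:n_prune])
```

In the workflow the abstract describes, the selected filters would be removed and the network then fine-tuned by backpropagation. This is typically repeated over several pruning iterations.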
14

Gaikwad, Akash. "Pruning Convolution Neural Network (SqueezeNet) for Efficient Hardware Deployment." Thesis, 2019.

Abstract:
In recent years, deep learning models have become popular in real-time embedded applications, but their hardware deployment is complicated by limited resources such as memory, computational power, and energy. Recent research in deep learning focuses on reducing the model size of Convolution Neural Networks (CNNs) through compression techniques such as architectural compression, pruning, quantization, and encoding (e.g., Huffman encoding). Network pruning is one of the most promising techniques for addressing these problems. This thesis proposes methods to prune the convolution neural network SqueezeNet without introducing network sparsity in the pruned model. Three methods are proposed to decrease the model size of the CNN without a significant drop in accuracy: (1) pruning based on a Taylor expansion of the change in the cost function, ΔC; (2) pruning based on L2 normalization of activation maps; and (3) pruning based on a combination of methods 1 and 2. The proposed methods use these criteria to rank the convolution kernels and prune the lower-ranked filters, after which the SqueezeNet model is fine-tuned by backpropagation. Transfer learning is used to train SqueezeNet on the CIFAR-10 dataset. Results show that the proposed approach reduces the SqueezeNet model size by 72% without a significant drop in accuracy (the optimal pruning-efficiency result). Results also show that pruning based on the combination of the Taylor expansion of the cost function and L2 normalization of activation maps achieves better pruning efficiency than either individual criterion, and that most of the pruned kernels come from mid- and high-level layers. The pruned model was deployed on BlueBox 2.0 using RTMaps software, and its performance was evaluated.
15

Koirala, Anand. "Precision agriculture: Exploration of machine learning approaches for assessing mango crop quantity." Thesis, 2020. https://figshare.com/articles/thesis/Precision_agriculture_Exploration_of_machine_learning_approaches_for_assessing_mango_crop_quantity/13411625.

Abstract:
A machine vision-based system is proposed to replace the current in-orchard manual estimates of mango fruit yield, to inform harvest resourcing and marketing. The state of the art in fruit detection was reviewed, highlighting the recent move from traditional image segmentation methods to convolution neural network (CNN)-based deep learning methods. An experimental comparison of several deep learning-based object detection frameworks (single-shot detectors versus two-stage detectors) and several standard CNN architectures was undertaken for detection of mango panicles and fruit in tree images. The machine vision system used images of individual trees captured at night from a moving platform mounted with a Global Navigation Satellite System (GNSS) receiver and an LED panel floodlight. YOLO, a single-shot object detection framework, was redesigned and named MangoYOLO. MangoYOLO outperformed existing state-of-the-art deep learning object detection frameworks in terms of fruit detection time and accuracy, and was robust in use across different cultivars and cameras. MangoYOLO achieved an F1 score of 0.968 and an average precision of 0.983, and required just 70 ms per image (2048 × 2048 pixels) and 4417 MB of memory. The annotated image dataset was made publicly available. Approaches were trialled to relate the fruit counts from tree images to the actual harvest count at the individual tree level. Machine vision-based estimates of fruit load ranged from -11% to +14% of packhouse fruit counts. However, estimation of fruit yield (t/ha) requires estimation of fruit size as well as fruit number. A fruit sizing app for smartphones was developed as an affordable in-field solution. The solution was based on segmentation of the fruit in the image using colour features and estimation of the camera to fruit perimeter distance based on fruit allometrics.
For mango fruit, RMSEs of 5.3 and 3.7 mm were achieved on length and width measurements under controlled lighting, and RMSEs of 5.5 and 4.6 mm were obtained in-field under ambient lighting. Further, estimation of harvest timing can be informed by assessment of the spread of flowering. Deep learning object detection methods were deployed to assess the number and development stage of mango panicles on the tree. Methods to deal with different orientations of flower panicles in tree images were implemented. An R² > 0.8 was achieved between the machine vision count of panicles in images and the in-field human count per tree. Similarly, a mean average precision of 69.1% was achieved for classification of panicle stages. These machine vision systems form a foundation for estimation of crop load and harvest timing, and for automated harvesting.