Journal articles on the topic 'Multi-view architectures'

1

Tehrani, Mehrdad Panahpour, Michael Droese, Toshiaki Fujii, and Masayuki Tanimoto. "Distributed Source Coding Architectures for Multi-view Images." Journal of the Institute of Image Information and Television Engineers 58, no. 10 (2004): 1461–64. http://dx.doi.org/10.3169/itej.58.1461.

2

Wang, Liang Hao, Ming Xi, Dong Xiao Li, and Ming Zhang. "A Network-Friendly Architecture for Multi-View Video Coding (MVC)." Advanced Materials Research 121-122 (June 2010): 678–81. http://dx.doi.org/10.4028/www.scientific.net/amr.121-122.678.

Abstract:
Multi-view Video Coding (MVC) is a promising technology because of its 3D effect and interactive (multi-viewpoint) functions. In this paper, a network-friendly architecture for MVC is proposed. To exploit temporal as well as inter-view dependencies between adjacent cameras, two main features of the coder are used: hierarchical B pictures and FGS (fine granularity scalability). Coding results for the proposed multi-view coder are compared with those of traditional coding architectures and show that the proposed scheme outperforms the other approaches for the tested sequence.
3

Koutris, Aristotelis, Theodoros Siozos, Yannis Kopsinis, et al. "Deep Learning-Based Indoor Localization Using Multi-View BLE Signal." Sensors 22, no. 7 (2022): 2759. http://dx.doi.org/10.3390/s22072759.

Abstract:
In this paper, we present a novel Deep Neural Network-based indoor localization method that estimates the position of a Bluetooth Low Energy (BLE) transmitter (tag) by using the received signals’ characteristics at multiple Anchor Points (APs). We use the received signal strength indicator (RSSI) value and the in-phase and quadrature-phase (IQ) components of the received BLE signals at a single time instance to simultaneously estimate the angle of arrival (AoA) at all APs. Through supervised learning on simulated data, various machine learning (ML) architectures are trained to perform AoA estimation using varying subsets of anchor points. In the final stage of the system, the estimated AoA values are fed to a positioning engine which uses the least squares (LS) algorithm to estimate the position of the tag. The proposed architectures are trained and rigorously tested on several simulated room scenarios and are shown to achieve a localization accuracy of 70 cm. Moreover, the proposed systems possess generalization capabilities by being robust to modifications in the room’s content or anchors’ configuration. Additionally, some of the proposed architectures have the ability to distribute the computational load over the APs.
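As a rough illustration of the last stage described in this abstract, where estimated AoA values are passed to a least-squares positioning engine, the sketch below triangulates a 2D tag position from bearings. It is not the authors' implementation: the function name `locate_from_aoa`, the 2D geometry, the anchor coordinates, and the noise-free angles are all assumptions made for the example.

```python
import math


def locate_from_aoa(anchors, angles):
    """Least-squares 2D position from bearing angles (radians) measured at anchors.

    Hypothetical helper for illustration only; not taken from the cited paper.
    """
    # Each bearing defines a line through its anchor p with direction
    # (cos t, sin t). With the normal n = (-sin t, cos t), the tag position x
    # satisfies n . x = n . p. Stacking one such row per anchor gives A x = b,
    # solved here through the 2x2 normal equations (A^T A) x = A^T b.
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (px, py), theta in zip(anchors, angles):
        nx, ny = -math.sin(theta), math.cos(theta)
        b = nx * px + ny * py
        s11 += nx * nx
        s12 += nx * ny
        s22 += ny * ny
        t1 += nx * b
        t2 += ny * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("bearings are (nearly) parallel; position is not determined")
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y


# Assumed room layout: three anchors and a tag at (1, 2), with exact bearings.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
tag = (1.0, 2.0)
angles = [math.atan2(tag[1] - py, tag[0] - px) for px, py in anchors]
x, y = locate_from_aoa(anchors, angles)
print(round(x, 6), round(y, 6))
```

With exact bearings the estimate recovers the assumed tag position up to floating-point error. With noisy AoA estimates the same solver returns the point that minimizes the summed squared perpendicular distance to the bearing lines.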
4

Goktug Gurler, C., Anil Aksay, Gozde Bozdagi Akar, and A. Murat Tekalp. "Architectures for multi-threaded MVC-compliant multi-view video decoding and benchmark tests." Signal Processing: Image Communication 25, no. 5 (2010): 325–34. http://dx.doi.org/10.1016/j.image.2010.01.002.

5

Ahn, Jun Hyong, Heung Cheol Kim, Jong Kook Rhim, et al. "Multi-View Convolutional Neural Networks in Rupture Risk Assessment of Small, Unruptured Intracranial Aneurysms." Journal of Personalized Medicine 11, no. 4 (2021): 239. http://dx.doi.org/10.3390/jpm11040239.

Abstract:
Auto-detection of cerebral aneurysms via convolutional neural network (CNN) is being increasingly reported. However, few studies to date have accurately predicted rupture risk rather than the diagnosis itself. We developed a multi-view CNN for the prediction of rupture risk involving small unruptured intracranial aneurysms (UIAs) based on three-dimensional (3D) digital subtraction angiography (DSA). The performance of a multi-view CNN-ResNet50 in accurately predicting the rupture risk (high vs. non-high) of UIAs in the anterior circulation measuring less than 7 mm in size was compared with various CNN architectures (AlexNet and VGG16), with architectures of a similar type but with different layers (ResNet101 and ResNet152), and with a single image-based CNN (single-view ResNet50). The sensitivity, specificity, and overall accuracy of risk prediction were estimated and compared according to CNN architecture. The study included 364 UIAs in the training dataset and 93 in the test dataset. A multi-view CNN-ResNet50 exhibited a sensitivity of 81.82 (66.76–91.29)%, a specificity of 81.63 (67.50–90.76)%, and an overall accuracy of 81.72 (66.98–90.92)% for risk prediction. AlexNet, VGG16, ResNet101, ResNet152, and single-view CNN-ResNet50 showed similar specificity. However, their sensitivity and overall accuracy were lower (AlexNet, 63.64% and 76.34%; VGG16, 68.18% and 74.19%; ResNet101, 68.18% and 73.12%; ResNet152, 54.55% and 72.04%; and single-view CNN-ResNet50, 50.00% and 64.52%) than those of multi-view CNN-ResNet50. The F1 score was highest for multi-view CNN-ResNet50 (80.90 (67.29–91.81)%). Our study suggests that multi-view CNN-ResNet50 may be a feasible tool for assessing the rupture risk in small-sized UIAs.
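The sensitivity, specificity, overall accuracy, and F1 score quoted in this abstract all derive from a binary confusion matrix. The sketch below shows the standard formulas. The abstract does not report the confusion-matrix counts, so the counts used here are an assumption: one set of values that sums to the 93 test cases and happens to reproduce the reported point estimates. The function name `binary_metrics` is likewise hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),              # true positive rate (recall)
        "specificity": tn / (tn + fp),              # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),          # harmonic mean of precision and recall
    }


# Assumed counts (not given in the abstract): 44 high-risk and 49 non-high-risk cases.
metrics = binary_metrics(tp=36, fn=8, tn=40, fp=9)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
# sensitivity: 81.82%
# specificity: 81.63%
# accuracy: 81.72%
# f1: 80.90%
```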
6

Bajraktari, Flakë, and Peter P. Pott. "Multi-view surgical phase recognition during laparoscopic cholecystectomy." Current Directions in Biomedical Engineering 10, no. 4 (2024): 45–48. https://doi.org/10.1515/cdbme-2024-2011.

Abstract:
In the realm of laparoscopic procedures, intelligent context-aware assistance systems hold promise for enhancing surgical workflows and patient safety. This study employs a multi-view approach to recognize surgical phases, combining data from a laparoscopic camera and an in-room camera simultaneously. The study aimed to improve phase recognition accuracy using a Transformer-based model with late sensor fusion, which yielded mixed results. The data poses significant challenges, as self-recorded videos are insufficient for extracting relevant information, necessitating real-world data. Additionally, the overall model needs refinement, as certain components degrade performance with poor data. This research highlights the complexities and opportunities in integrating multi-view data for surgical phase recognition, emphasizing the importance of diverse data collection strategies and model architectures for real-world surgical settings.
7

Zhao, Haimei, Qiming Zhang, Shanshan Zhao, Zhe Chen, Jing Zhang, and Dacheng Tao. "SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (2024): 7460–68. http://dx.doi.org/10.1609/aaai.v38i7.28577.

Abstract:
Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging and may lead to inferior performance. Although distilling precise 3D geometry knowledge from LiDAR data could help tackle this challenge, the benefits of LiDAR information could be greatly hindered by the significant modality gap between different sensory modalities. To address this issue, we propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy. Specifically, we devise multi-modal architectures for both teacher and student models, including a LiDAR-camera fusion-based teacher and a simulated fusion-based student. Owing to the "identical" architecture design, the student can mimic the teacher to generate multi-modal features with merely multi-view images as input, where a geometry compensation module is introduced to bridge the modality gap. Furthermore, we propose a comprehensive multi-modal distillation scheme that supports intra-modal, cross-modal, and multi-modal fusion distillation simultaneously in the Bird's-eye-view space. Incorporating them together, our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment. Extensive experiments validate the effectiveness and superiority of SimDistill over state-of-the-art methods, achieving an improvement of 4.8% mAP and 4.1% NDS over the baseline detector. The source code will be released at https://github.com/ViTAE-Transformer/SimDistill.
8

Suwarningsih, Wiwin, Ana Heryana, Dianadewi Riswantini, Ekasari Nugraheni, and Dikdik Krisnandi. "The multi-tenancy queueing system “QuAntri” for public service mall." Bulletin of Electrical Engineering and Informatics 11, no. 5 (2022): 2663–71. http://dx.doi.org/10.11591/eei.v11i5.4348.

Abstract:
In the new-normal era, public services must make various adjustments to keep the community safe during the COVID-19 pandemic. The Public Service Mall is an initiative to put several public service offices in a centralized location. However, it will create a crowd of people who want access to public services. This paper evaluates multi-tenant models with the rapid adaptation of cloud computing technology for organizations of all shapes and sizes, focusing on multi-tenants and multi-services, where each tenant might have multiple services to offer. We also propose a multi-tenant architecture that can serve queues in several places to prevent the spread of COVID-19 due to crowds of people in public places. The design of multi-tenant and multi-service applications should consider various aspects such as security, database, data communication, and user interface. We designed and built the "QuAntri" business logic to simplify the process for multi-services in each tenant. The developed system is expected to improve tenants' performance and reduce crowding in public services. We compared our agile method for system development with some of the previous multi-tenant architectures. Our experiments showed that our method overall is better than the referenced model-view-controller (MVC), model-view-presenter (MVP), and model-model-view-presenter (M-MVP).
9

Debats, Oscar A., Geert J. S. Litjens, and Henkjan J. Huisman. "Lymph node detection in MR Lymphography: false positive reduction using multi-view convolutional neural networks." PeerJ 7 (November 22, 2019): e8052. http://dx.doi.org/10.7717/peerj.8052.

Abstract:
Purpose: To investigate whether multi-view convolutional neural networks can improve a fully automated lymph node detection system for pelvic MR Lymphography (MRL) images of patients with prostate cancer. Methods: A fully automated computer-aided detection (CAD) system had been previously developed to detect lymph nodes in MRL studies. The CAD system was extended with three types of 2D multi-view convolutional neural networks (CNN) aiming to reduce false positives (FP). A 2D multi-view CNN is an efficient approximation of a 3D CNN, and three types were evaluated: a 1-view, 3-view, and 9-view 2D CNN. The three deep learning CNN architectures were trained and configured on retrospective data of 240 prostate cancer patients that received MRL images as the standard of care between January 2008 and April 2010. The MRL used ferumoxtran-10 as a contrast agent and comprised at least two imaging sequences: a 3D T1-weighted and a 3D T2*-weighted sequence. A total of 5089 lymph nodes were annotated by two expert readers, reading in consensus. A first experiment compared the performance with and without CNNs and a second experiment compared the individual contribution of the 1-view, 3-view, or 9-view architecture to the performance. The performances were visually compared using free-response receiver operating characteristic (FROC) analysis and statistically compared using partial area under the FROC curve analysis. Training and analysis were performed using bootstrapped FROC and 5-fold cross-validation. Results: Adding multi-view CNNs significantly (p < 0.01) reduced false positive detections. The 3-view and 9-view CNN outperformed (p < 0.01) the 1-view CNN, reducing FP from 20.6 to 7.8/image at 80% sensitivity. Conclusion: Multi-view convolutional neural networks significantly reduce false positives in a lymph node detection system for MRL images, and three orthogonal views are sufficient. At the achieved level of performance, CAD for MRL may help speed up finding lymph nodes and assessing them for potential metastatic involvement.
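To make the idea of a 3-view 2D input more concrete, the sketch below cuts three orthogonal 2D patches (axial, coronal, sagittal) through a candidate location in a 3D volume. The abstract does not describe how patches were extracted, so everything here is an assumption for illustration: the function name `orthogonal_views`, the `volume[z][y][x]` indexing, the square patch size, and the use of plain nested lists to stay dependency-free.

```python
def orthogonal_views(volume, center, half):
    """Return axial, coronal and sagittal patches through `center`.

    `volume` is a nested list indexed volume[z][y][x]; each returned patch is
    a square of side 2 * half + 1. Hypothetical helper, not from the paper.
    """
    cz, cy, cx = center
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    # Reject patches that would cross the volume border; negative indices
    # would otherwise silently wrap around in Python.
    for c, size in ((cz, depth), (cy, height), (cx, width)):
        if c - half < 0 or c + half >= size:
            raise ValueError("patch extends beyond the volume")
    zs = range(cz - half, cz + half + 1)
    ys = range(cy - half, cy + half + 1)
    xs = range(cx - half, cx + half + 1)
    axial = [[volume[cz][y][x] for x in xs] for y in ys]     # z fixed
    coronal = [[volume[z][cy][x] for x in xs] for z in zs]   # y fixed
    sagittal = [[volume[z][y][cx] for y in ys] for z in zs]  # x fixed
    return axial, coronal, sagittal


# Toy 5x5x5 volume whose voxel value encodes its own (z, y, x) coordinates.
volume = [[[100 * z + 10 * y + x for x in range(5)] for y in range(5)] for z in range(5)]
axial, coronal, sagittal = orthogonal_views(volume, center=(2, 2, 2), half=1)
```

Each patch could then be fed to its own 2D CNN branch or stacked as channels. Which fusion the cited system uses is not stated in the abstract.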
10

Shelar, M. D., Aishwarya C. Ayare, Trushna S. Bagade, Shivashree P. Nimbalkar, and Muskan A. Mujawar. "Multi-View Feature Fusion for Effective Malware Classification Using Deep Learning." International Journal of Pharmaceutical Research and Applications 10, no. 3 (2025): 1070–76. https://doi.org/10.35629/4494-100310701076.

Abstract:
The rapid increase in global malware infections has necessitated the development of robust malware detection systems to mitigate threats, such as ransomware and crypto-miners, that aim for financial gain. This work presents a deep learning-based Convolutional Neural Network (CNN) model for classifying malware in Portable Executable (PE) binary files using a fusion feature set approach. An extensive evaluation of various deep learning architectures and machine learning classifiers, including Support Vector Machines (SVM), was conducted across multi-aspect feature sets encompassing static, dynamic, and image-based features, ultimately selecting the CNN model for optimal performance. The model achieved an accuracy using fusion feature sets, demonstrating its robustness and generalizability on unseen malware datasets.
11

Shelar, M. D., Aishwarya C. Ayare, Trushna S. Bagade, Shivashree P. Nimbalkar, and Muskan A. Mujawar. "Multi-View Feature Fusion for Effective Malware Classification Using Deep Learning." International Journal of Advances in Engineering and Management 7, no. 6 (2025): 01–09. https://doi.org/10.35629/5252-07060109.

Abstract:
The rapid increase in global malware infections has necessitated the development of robust malware detection systems to mitigate threats, such as ransomware and cryptominers, that aim for financial gain. This work presents a deep learning-based Convolutional Neural Network (CNN) model for classifying malware in Portable Executable (PE) binary files using a fusion feature set approach. An extensive evaluation of various deep learning architectures and machine learning classifiers, including Support Vector Machines (SVM), was conducted across multi-aspect feature sets encompassing static, dynamic, and image-based features, ultimately selecting the CNN model for optimal performance. The model achieved an accuracy using fusion feature sets, demonstrating its robustness and generalizability on unseen malware datasets.
12

Fuentes Reyes, M., P. d’Angelo, and F. Fraundorfer. "AN EVALUATION OF STEREO AND MULTIVIEW ALGORITHMS FOR 3D RECONSTRUCTION WITH SYNTHETIC DATA." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-1/W2-2023 (December 13, 2023): 1021–28. http://dx.doi.org/10.5194/isprs-archives-xlviii-1-w2-2023-1021-2023.

Abstract:
The reconstruction of 3D scenes from images has usually been addressed with two different strategies, namely stereo and multiview. The former requires rectified images and generates a disparity map, while the latter relies on the camera parameters and directly retrieves a depth map. For both cases, deep learning architectures have shown an outstanding performance. However, due to the differences between input and output data, the two strategies are difficult to compare on a common scene. Moreover, for remote sensing applications multi-view data is hard to acquire and the ground truth is either sparse or affected by outliers. Hence, in this article we evaluate the performance of stereo and multi-view architectures trained on synthetic data resembling remote sensing images. The data has been processed and organized to be compatible with both kinds of neural networks. For a fair comparison, training and testing are done only with two views. We focus on the accuracy of the reconstruction, as well as the impact of the depth range and the baseline of the stereo array. Results are presented for deep learning architectures and non-learning algorithms.
13

Joshi, Pratibha T., Gurpreet Singh Saini, and Shivaji D. Pawara. "Application Of Densenet Architecture And Its Variants Towards Breast Cancer Detection: A Multi-View Analysis." International Journal of Environmental Sciences 11, no. 9s (2025): 835–54. https://doi.org/10.64252/azskbm54.

Abstract:
Artificial Intelligence has made giant strides in medical image classification through the development of Convolutional Neural Networks (CNNs) in the past decade. Different CNN architectures such as DenseNet and ResNet are used in the medical industry to identify patterns and features leading to a faster diagnosis. The fundamental motivation behind this research article is to study the application of different variants of the DenseNet architecture (DenseNet121, DenseNet169, and DenseNet201) towards breast cancer detection and to provide a comparative analysis of these variants using digital mammography. Two mediolateral oblique (MLO) and two craniocaudal (CC) views of a single patient are used to extract the distinct features for breast cancer detection. The proposed research utilizes 9695 digital mammography images for this study. All input images are classified into three categories, Benign, Cancer, and Normal, with the help of expert radiologists as ground truth. The performance of all proposed classifiers is tested with different metrics such as precision, responsiveness, and specificity. The concluding results demonstrate that these DenseNet architecture variants have delivered an exemplary performance with the highest accuracy of 94.90% during training and 96.924% during testing on CC views. Precision, Recall, and F1 scores are 0.965, 0.969, and 0.967, respectively. A comparative analysis of the proposed model with its variants and other state-of-the-art methods is provided. Comparative research shows that DenseNet architecture can provide more accurate results when only left CC views are used as input. Acquired outcomes are again validated qualitatively with a radiologist expert in the field of breast cancer. The proposed architecture achieved state-of-the-art results with fewer images and with less computation.
14

Zhang, Yueping, Ao-Jan Su, and Guofei Jiang. "Understanding data center network architectures in virtualized environments: A view from multi-tier applications." Computer Networks 55, no. 9 (2011): 2196–208. http://dx.doi.org/10.1016/j.comnet.2011.03.001.

15

Mao, Wenju, Zhijie Liu, Heng Liu, Fuzeng Yang, and Meirong Wang. "Research Progress on Synergistic Technologies of Agricultural Multi-Robots." Applied Sciences 11, no. 4 (2021): 1448. http://dx.doi.org/10.3390/app11041448.

Abstract:
Multi-robots have shown good application prospects in agricultural production. Studying the synergistic technologies of agricultural multi-robots can not only improve the efficiency of the overall robot system and meet the needs of precision farming but also solve the problems of decreasing effective labor supply and increasing labor costs in agriculture. Therefore, starting from the point of view of agricultural multi-robot system architectures, this paper reviews the representative research results of five synergistic technologies of agricultural multi-robots in recent years, namely, environment perception, task allocation, path planning, formation control, and communication, and summarizes the technological progress and development characteristics of these five technologies. Finally, because of these development characteristics, it is shown that the trends and research focus for agricultural multi-robots are to optimize the existing technologies and apply them to a variety of agricultural multi-robots, such as building a hybrid architecture of multi-robot systems, SLAM (simultaneous localization and mapping), cooperation learning of robots, hybrid path planning and formation reconstruction. While synergistic technologies of agricultural multi-robots are extremely challenging in production, in combination with previous research results for real agricultural multi-robots and social development demand, we conclude that it is realistic to expect automated multi-robot systems in the future.
16

Al Farid, Fahmid, Ahsanul Bari, Abu Saleh Musa Miah, Sarina Mansor, Jia Uddin, and S. Prabha Kumaresan. "A Structured and Methodological Review on Multi-View Human Activity Recognition for Ambient Assisted Living." Journal of Imaging 11, no. 6 (2025): 182. https://doi.org/10.3390/jimaging11060182.

Abstract:
Ambient Assisted Living (AAL) leverages technology to support the elderly and individuals with disabilities. A key challenge in these systems is efficient Human Activity Recognition (HAR). However, no study has systematically compared single-view (SV) and multi-view (MV) Human Activity Recognition approaches. This review addresses this gap by analyzing the evolution from single-view to multi-view recognition systems, covering benchmark datasets, feature extraction methods, and classification techniques. We examine how activity recognition systems have transitioned to multi-view architectures using advanced deep learning models optimized for Ambient Assisted Living, thereby improving accuracy and robustness. Furthermore, we explore a wide range of machine learning and deep learning models—including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCNs), and Graph Convolutional Networks (GCNs)—along with lightweight transfer learning methods suitable for environments with limited computational resources. Key challenges such as data remediation, privacy, and generalization are discussed, alongside potential solutions such as sensor fusion and advanced learning strategies. This study offers comprehensive insights into recent advancements and future directions, guiding the development of intelligent, efficient, and privacy-compliant Human Activity Recognition systems for Ambient Assisted Living applications.
17

Yan, Fengyu, Xiaobao Wang, Dongxiao He, Longbiao Wang, Jianwu Dang, and Di Jin. "HeterGP: Bridging Heterogeneity in Graph Neural Networks with Multi-View Prompting." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 20 (2025): 21895–903. https://doi.org/10.1609/aaai.v39i20.35496.

Abstract:
The challenges tied to unstructured graph data are manifold, primarily falling into node, edge, and graph-level problem categories. Graph Neural Networks (GNNs) serve as effective tools to tackle these issues. However, individual tasks often demand distinct model architectures, and training these models typically requires abundant labeled data, a luxury often unavailable in practical settings. Recently, various "prompt tuning" methodologies have emerged to empower GNNs to adapt to multi-task learning with limited labels. The crux of these methods lies in bridging the gap between pre-training tasks and downstream objectives. Nonetheless, a prevalent oversight in existing studies is the homophily-centric nature of prompt tuning frameworks, disregarding scenarios characterized by high heterogeneity. To remedy this oversight, we introduce a novel prompting strategy named HeterGP tailored for highly heterophilic scenarios. Specifically, we present a dual-view approach to capture both homophilic and heterophilic information, along with a prompt graph design that encompasses token initialization and insertion patterns. Through extensive experiments conducted in a few-shot context encompassing node and graph classification tasks, our method showcases superior performance in highly heterophilic environments compared to state-of-the-art prompt tuning techniques.
18

Zhu, Binglin, Fusang Liu, Ziwen Xie, Yan Guo, Baoguo Li, and Yuntao Ma. "Quantification of light interception within image-based 3-D reconstruction of sole and intercropped canopies over the entire growth season." Annals of Botany 126, no. 4 (2020): 701–12. http://dx.doi.org/10.1093/aob/mcaa046.

Abstract:
Background and Aims: Light interception is closely related to canopy architecture. Few studies based on multi-view photography have been conducted in a field environment, particularly studies that link 3-D plant architecture with a radiation model to quantify the dynamic canopy light interception. In this study, we combined realistic 3-D plant architecture with a radiation model to quantify and evaluate the effect of differences in planting patterns and row orientations on canopy light interception. Methods: The 3-D architectures of maize and soybean plants were reconstructed for sole crops and intercrops based on multi-view images obtained at five growth dates in the field. We evaluated the accuracy of the calculated leaf length, maximum leaf width, plant height and leaf area according to the measured data. The light distribution within the 3-D plant canopy was calculated with a 3-D radiation model. Finally, we evaluated canopy light interception in different row orientations. Key Results: There was good agreement between the measured and calculated phenotypic traits, with an R² > 0.97. The light distribution was more uniform for intercropped maize and more concentrated for sole maize. At the maize silking stage, 85% of radiation was intercepted by approx. 55% of the upper canopy region for maize and by approx. 33% of the upper canopy region for soybean. There was no significant difference in daily light interception between the different row orientations for the entire intercropping and sole systems. However, for intercropped maize, near east–west orientations showed approx. 19% higher daily light interception than near south–north orientations. For intercropped soybean, daily light interception showed the opposite trend. It was approx. 49% higher for near south–north orientations than for near east–west orientations. Conclusions: The accurate reconstruction of 3-D plants grown in the field based on multi-view images provides the possibility for high-throughput 3-D phenotyping in the field and allows a better understanding of the relationship between canopy architecture and the light environment.
19

O’Connell, Thomas P., Tyler Bonnen, Yoni Friedman, et al. "Approximating Human-Level 3D Visual Inferences With Deep Neural Networks." Open Mind 9 (2025): 305–24. https://doi.org/10.1162/opmi_a_00189.

Abstract:
Humans make rich inferences about the geometry of the visual world. While deep neural networks (DNNs) achieve human-level performance on some psychophysical tasks (e.g., rapid classification of object or scene categories), they often fail in tasks requiring inferences about the underlying shape of objects or scenes. Here, we ask whether and how this gap in 3D shape representation between DNNs and humans can be closed. First, we define the problem space: after generating a stimulus set to evaluate 3D shape inferences using a match-to-sample task, we confirm that standard DNNs are unable to reach human performance. Next, we construct a set of candidate 3D-aware DNNs including 3D neural field (Light Field Network), autoencoder, and convolutional architectures. We investigate the role of the learning objective and dataset by training single-view (the model only sees one viewpoint of an object per training trial) and multi-view (the model is trained to associate multiple viewpoints of each object per training trial) versions of each architecture. When the same object categories appear in the model training and match-to-sample test sets, multi-view DNNs approach human-level performance for 3D shape matching, highlighting the importance of a learning objective that enforces a common representation across viewpoints of the same object. Furthermore, the 3D Light Field Network was the model most similar to humans across all tests, suggesting that building in 3D inductive biases increases human-model alignment. Finally, we explore the generalization performance of multi-view DNNs to out-of-distribution object categories not seen during training. Overall, our work shows that multi-view learning objectives for DNNs are necessary but not sufficient to make similar 3D shape inferences as humans and reveals limitations in capturing human-like shape inferences that may be inherent to DNN modeling approaches. We provide a methodology for understanding human 3D shape perception within a deep learning framework and highlight out-of-domain generalization as the next challenge for learning human-like 3D representations with DNNs.
20

Wu, Min, Sirui Xu, Ziwei Wang, et al. "ICT-Net: A Framework for Multi-Domain Cross-View Geo-Localization with Multi-Source Remote Sensing Fusion." Remote Sensing 17, no. 12 (2025): 1988. https://doi.org/10.3390/rs17121988.

Abstract:
Traditional single neural network-based geo-localization methods for cross-view imagery primarily rely on polar coordinate transformations while suffering from limited global correlation modeling capabilities. To address these fundamental challenges of weak feature correlation and poor scene adaptation, we present a novel framework termed ICT-Net (Integrated CNN-Transformer Network) that synergistically combines convolutional neural networks with Transformer architectures. Our approach harnesses the complementary strengths of CNNs in capturing local geometric details and Transformers in establishing long-range dependencies, enabling comprehensive joint perception of both local and global visual patterns. Furthermore, capitalizing on the Transformer’s flexible input processing mechanism, we develop an attention-guided non-uniform cropping strategy that dynamically eliminates redundant image patches with minimal impact on localization accuracy, thereby achieving enhanced computational efficiency. To facilitate practical deployment, we propose a deep embedding clustering algorithm optimized for rapid parsing of geo-localization information. Extensive experiments demonstrate that ICT-Net establishes new state-of-the-art localization accuracy on the CVUSA benchmark, achieving a top-1 recall rate improvement of 8.6% over previous methods. Additional validation on a challenging real-world dataset collected at Beihang University (BUAA) further confirms the framework’s effectiveness and practical applicability in complex urban environments, particularly showing 23% higher robustness to vegetation variations.
21

Chidera Ogeawuchi, Jeffrey, Abel Chukwuemeke Uzoka, Abraham Ayodeji Abayomi, Oluwademilade Aderemi Agboola, Toluwase Peter Gbenle, and Samuel Owoade. "Advancements in Scalable Data Modeling and Reporting for SaaS Applications and Cloud Business Intelligence." International Journal of Advanced Multidisciplinary Research and Studies 4, no. 6 (2024): 2155–62. https://doi.org/10.62225/2583049x.2024.4.6.4267.

Abstract:
As Software-as-a-Service (SaaS) applications and cloud-based Business Intelligence (BI) platforms proliferate across industries, the demand for scalable, responsive, and intelligent data modeling and reporting solutions has become paramount. This paper presents a comprehensive conceptual and technical exploration of scalable data architectures and reporting mechanisms tailored for SaaS ecosystems and cloud-native BI environments. It begins by contextualizing the rise of multi-tenant cloud systems and outlines the need for resilient data modeling practices that balance extensibility with operational efficiency. Foundational concepts such as multi-tenant architecture patterns, schema design, and metadata governance are examined, followed by a review of emerging innovations in real-time pipelines, self-service analytics, and cost-optimized performance strategies. The paper further investigates the integration of automation and machine learning into data workflows, highlighting AutoML for predictive reporting, AI-based data quality monitoring, and automated documentation through semantic modeling tools. Critical challenges are addressed, including trade-offs between scalability and system complexity, evolving compliance and privacy standards, and the ethical implications of AI-enhanced reporting. Finally, the discussion identifies emerging trends such as generative AI, data fabric architectures, and edge analytics, offering a roadmap for future research and development in the domain of scalable cloud data systems. This work contributes to both academic discourse and practical understanding by framing a holistic view of the data lifecycle in contemporary SaaS and BI applications.
22

Núñez, Lorena, Jesús Savage, Miguel Moctezuma-Flores, Luis Contreras, Marco Negrete, and Hiroyuki Okada. "Multi-View Object Recognition and Pose Sequence Estimation Using HMMs." Journal of Robotics and Mechatronics 37, no. 3 (2025): 579–93. https://doi.org/10.20965/jrm.2025.p0579.

Abstract:
This work proposes an integration of a vision system for a service robot when its gripper holds an object. Based on the particular conditions of the problem, the solution is modular and allows one to use various options to extract features and classify data. Since the robot can move the object and has information about its position, the proposed solution takes advantage of this by applying preprocessing techniques to improve the performance of classifiers that can be considered weak. In addition to being able to classify the object, it is possible to infer the sequence of movements that it carries out using hidden Markov models (HMMs). The system was tested using a public dataset, the COIL-100, as well as with a dataset of real objects using the human support robot (HSR). The results show that the proposed vision system is able to work with a low number of shots in each class. Two HMM architectures are tested. In order to enhance classification by adding information from multiple perspectives, various criteria were analyzed. A simple model is built to integrate information and infer object movements. The system also has a next-best-view algorithm where different parameters are tested in order to improve both accuracy in the classification of the object and its pose, especially in objects that may be similar in several of their views. The system was tested using the COIL-100 dataset and with real objects in common use and an HSR robot to take the real dataset. In general, using relatively few shots of each class and a plain computer, consistent results were obtained, requiring only 8.192×10⁻³ MFLOPs for sequence processing using concatenated HMMs compared to 404.34 MFLOPs for CNN+LSTM.
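Inferring a sequence of hidden poses from per-view observations with an HMM is commonly done with Viterbi decoding. The sketch below is a generic textbook version, not the authors' model: the two pose states, the observation labels, and every probability are made-up toy values.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: probability of the best path that ends in state s at step t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this step.
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy model: the object shows its "front" or "side" to the camera.
states = ["front", "side"]
start_p = {"front": 0.5, "side": 0.5}
trans_p = {"front": {"front": 0.8, "side": 0.2},
           "side": {"front": 0.2, "side": 0.8}}
emit_p = {"front": {"logo_seen": 0.9, "logo_hidden": 0.1},
          "side": {"logo_seen": 0.1, "logo_hidden": 0.9}}
observed = ["logo_seen", "logo_seen", "logo_hidden", "logo_hidden"]

poses = viterbi(observed, states, start_p, trans_p, emit_p)
print(poses)  # ['front', 'front', 'side', 'side']
```

For long sequences, log-probabilities would be preferable to raw products to avoid numerical underflow.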
23

Efimov, A. O., I. I. Livshits, M. O. Meshcheryakov, E. A. Rogozin, and V. R. Romanova. "On certain aspects of standardization and operating conditions of automated systems." Herald of Dagestan State Technical University. Technical Sciences 50, no. 4 (2024): 101–8. http://dx.doi.org/10.21822/2073-6185-2023-50-4-101-108.

Abstract:
Objective. In this paper, the main aspects of the operating conditions of the AS are considered, as well as the issues of standardization of the stages of the life cycle of the AS (creation, commissioning, maintenance, etc.) at the state level. In this subject area, the technological features of building an AS based on various technical architectures are briefly considered, since both foreign processors based on x86-64 architectures and processors of domestic development based on the Advanced RISC Machine architecture are currently applicable. The use of various components of the AS requires additional study in terms of ordering the composition and configuration of specific SPI. Since each processor has a multi-level architecture, this fact objectively complicates the possibilities for full security testing and detection of all vulnerabilities. Method. In the course of the work, the threats and vulnerabilities of individual components of the AS from the point of view of intentional and unintentional threats are considered. The information on the main state standards applied to ensure the protection of information in the AS at the present time is summarized. Result. The main features of the operating conditions of the AS are considered and it is determined that the vulnerabilities of the components are due to the imperfection of the procedures for developing and covering testing of hardware and software. It is determined that in order to protect information in the AS, it is necessary to build a multi-level protection system with state accreditation. Conclusion. Proposals are presented for the application of state standardization for the protection of information in the AS, taking into account the current and prospective threat landscape, including taking into account the design features (undeclared capabilities) of the components. Overcoming threats is possible with the creation of a multi-level information protection system with state accreditation.
24

Ugile, Tukaram, and Dr Nilesh Uke. "Transformer Architectures for Computer Vision: A Comprehensive Review and Future Research Directions." Journal of Dynamics and Control 9, no. 3 (2025): 70–79. https://doi.org/10.71058/jodac.v9i3005.

Abstract:
Transformers have had a revolutionary impact on Natural Language Processing (NLP) and have started making significant contributions to Computer Vision problems. This paper provides a comprehensive review of Transformer architectures in Computer Vision, giving a detailed view of their evolution from Vision Transformers (ViTs) to more advanced variants such as Swin Transformer, Transformer-XL, and hybrid CNN-Transformer models. We study the advantages of Transformers over traditional Convolutional Neural Networks (CNNs), their applications in Object Detection, Image Classification, and Video Analysis, and their computational challenges. Finally, we discuss future research directions, including self-attention mechanisms, multi-modal learning, and lightweight architectures for Edge Computing.
25

Aung, Aye Nyein, Che-Wei Liao, and Jeih-Weih Hung. "Effective Monoaural Speech Separation through Convolutional Top-Down Multi-View Network." Future Internet 16, no. 5 (2024): 151. http://dx.doi.org/10.3390/fi16050151.

Abstract:
Speech separation, sometimes known as the “cocktail party problem”, is the process of separating individual speech signals from an audio mixture that includes ambient noises and several speakers. The goal is to extract the target speech in this complicated sound scenario and either make it easier to understand or increase its quality so that it may be used in subsequent processing. Speech separation on overlapping audio data is important for many speech-processing tasks, including natural language processing, automatic speech recognition, and intelligent personal assistants. New speech separation algorithms are often built on a deep neural network (DNN) structure, which seeks to learn the complex relationship between the speech mixture and any specific speech source of interest. DNN-based speech separation algorithms outperform conventional statistics-based methods, although they typically need a lot of processing and/or a larger model size. This study presents a new end-to-end speech separation network called ESC-MASD-Net (effective speaker separation through convolutional multi-view attention and SuDoRM-RF network), which has relatively fewer model parameters compared with the state-of-the-art speech separation architectures. The network is partly inspired by the SuDoRM-RF++ network, which uses multiple time-resolution features with downsampling and resampling for effective speech separation. ESC-MASD-Net incorporates the multi-view attention and residual conformer modules into SuDoRM-RF++. Additionally, the U-Convolutional block in ESC-MASD-Net is refined with a conformer layer. Experiments conducted on the WHAM! dataset show that ESC-MASD-Net outperforms SuDoRM-RF++ significantly in the SI-SDRi metric. Furthermore, the use of the conformer layer has also improved the performance of ESC-MASD-Net.
26

Xu, Kele, Kang You, Ming Feng, and Boqing Zhu. "Trust-worth multi-representation learning for audio classification with uncertainty estimation." Journal of the Acoustical Society of America 153, no. 3_supplement (2023): A125. http://dx.doi.org/10.1121/10.0018383.

Abstract:
Multi-view learning has been explored for audio classification tasks, exploiting different representations of audio signals, ranging from MFCC and CQT to raw signals. The quality of each view may vary for different audio signals, and the appropriate uncertainty quantification for each view has not been fully explored. In this work, we explore a trusted multi-view learning framework for classification tasks in order to fully incorporate different views. Our framework consists of three parallel Transformer branches (Gammatone spectrogram, log-Mel and CQT), which are combined using the uncertainty estimate of each branch. In addition to computing the classification probabilities, the uncertainty of each representation can also be obtained using the framework. We first calculate the evidence based on feature vectors to obtain the probabilities and the uncertainty of classification problems for the Gammatone, log-Mel and CQT branches. By integrating the confidence from each of the different representations using the Dempster–Shafer theory, the classification framework can provide higher accuracy and confidence. To demonstrate the effectiveness of the proposed framework, we conduct experiments on the GTZAN dataset. The obtained results show that our method reaches an accuracy of 83.0%, which significantly outperforms single representation-based methods while providing uncertainty estimation for different views.
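The abstract does not give the fusion formulas. A common way to realize evidence-based fusion with Dempster–Shafer theory is the reduced combination rule over subjective-logic opinions, where each view yields per-class belief masses plus one uncertainty mass. The sketch below assumes that formulation; the function names and the evidence values are illustrative only and are not taken from the paper.

```python
from functools import reduce


def opinion_from_evidence(evidence):
    """Turn non-negative per-class evidence into (belief masses, uncertainty)."""
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # Dirichlet strength, sum of (e_k + 1)
    beliefs = [e / strength for e in evidence]
    return beliefs, num_classes / strength


def combine(opinion_a, opinion_b):
    """Fuse two opinions with the reduced Dempster combination rule."""
    beliefs_a, unc_a = opinion_a
    beliefs_b, unc_b = opinion_b
    # Conflict: belief mass the two views assign to different classes.
    conflict = sum(
        beliefs_a[i] * beliefs_b[j]
        for i in range(len(beliefs_a))
        for j in range(len(beliefs_b))
        if i != j
    )
    scale = 1.0 - conflict
    beliefs = [
        (a * b + a * unc_b + b * unc_a) / scale
        for a, b in zip(beliefs_a, beliefs_b)
    ]
    return beliefs, (unc_a * unc_b) / scale


# Hypothetical evidence from three branches for a three-class problem.
branch_evidence = {
    "gammatone": [9.0, 1.0, 0.0],
    "log_mel": [4.0, 0.0, 1.0],
    "cqt": [1.0, 1.0, 1.0],  # an uninformative view with high uncertainty
}
opinions = [opinion_from_evidence(e) for e in branch_evidence.values()]
fused_beliefs, fused_uncertainty = reduce(combine, opinions)
predicted_class = max(range(len(fused_beliefs)), key=fused_beliefs.__getitem__)
```

Under this rule the fused uncertainty is never larger than that of any single view, and a view with little evidence (here the hypothetical `cqt` branch) has correspondingly little influence on the fused beliefs.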
27

Wang, Jinglu, Bo Sun, and Yan Lu. "MVPNet: Multi-View Point Regression Networks for 3D Object Reconstruction from A Single Image." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8949–56. http://dx.doi.org/10.1609/aaai.v33i01.33018949.

Abstract:
In this paper, we address the problem of reconstructing an object’s surface from a single image using generative networks. First, we represent a 3D surface with an aggregation of dense point clouds from multiple views. Each point cloud is embedded in a regular 2D grid aligned on an image plane of a viewpoint, making the point cloud convolution-favored and ordered so as to fit into deep network architectures. The point clouds can be easily triangulated by exploiting connectivities of the 2D grids to form mesh-based surfaces. Second, we propose an encoder-decoder network that generates such kind of multiple view-dependent point clouds from a single image by regressing their 3D coordinates and visibilities. We also introduce a novel geometric loss that is able to interpret discrepancy over 3D surfaces as opposed to 2D projective planes, resorting to the surface discretization on the constructed meshes. We demonstrate that the multi-view point regression network outperforms state-of-the-art methods with a significant improvement on challenging datasets.
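Triangulating a point cloud that is stored on a regular 2D grid only needs the grid connectivity: each cell of four neighbouring points is split into two triangles, and triangles touching an invisible point are skipped. The sketch below illustrates that idea under simple assumptions (row-major indexing, a boolean visibility mask); the function name and the diagonal used to split each cell are arbitrary choices, not details from MVPNet.

```python
def triangulate_grid(visible):
    """Return triangles as index triples for a row-major H x W point grid.

    visible[r][c] is True when the point stored at grid cell (r, c) is valid.
    """
    height, width = len(visible), len(visible[0])

    def index(r, c):
        # Flat index of the point at row r, column c.
        return r * width + c

    triangles = []
    for r in range(height - 1):
        for c in range(width - 1):
            top_left, top_right = (r, c), (r, c + 1)
            bottom_left, bottom_right = (r + 1, c), (r + 1, c + 1)
            # Split the cell along the top-right / bottom-left diagonal.
            for tri in ((top_left, bottom_left, top_right),
                        (top_right, bottom_left, bottom_right)):
                if all(visible[i][j] for i, j in tri):
                    triangles.append(tuple(index(i, j) for i, j in tri))
    return triangles


# A 2 x 2 grid whose bottom-right point is not visible from this viewpoint.
mask = [[True, True],
        [True, False]]
print(triangulate_grid(mask))  # [(0, 2, 1)]
```

The returned index triples refer to positions in the flattened list of regressed 3D coordinates, so the same connectivity can be reused for every viewpoint with the same grid size.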
28

Yalcin, Ilyas, Recep Can, Candan Gokceoglu, and Sultan Kocaman. "A Novel Rock Mass Discontinuity Detection Approach with CNNs and Multi-View Image Augmentation." ISPRS International Journal of Geo-Information 13, no. 6 (2024): 185. http://dx.doi.org/10.3390/ijgi13060185.

Abstract:
Discontinuity is a key element used by geoscientists and civil engineers to characterize rock masses. The traditional approach to detecting and measuring rock discontinuity relies on fieldwork, which poses dangers to human life. Photogrammetric pattern recognition and 3D measurement techniques offer new possibilities without direct contact with rock masses. This study proposes a new approach to detect discontinuities using close-range photogrammetric techniques and convolutional neural networks (CNNs) trained on a small amount of data. Investigations were conducted on basalts in Bala, Ankara, Türkiye. A total of 34 multi-view images were collected with a remotely piloted aircraft system (RPAS), and discontinuity lines were manually delineated on a point cloud generated from these images. The lines were back-projected onto the raw images to increase the amount of data, a process we call multi-view (3D) augmentation. We further evaluated radiometric and geometric augmentation methods, the contribution of multi-view augmentation to the proposed model, and the transfer learning performance of six different CNN architectures. The highest performance was achieved with U-Net + SE-ResNeXt-50 with an F1-score of 90.6%. The CNN model trained from scratch with local features also yielded a similar F1-score (91.7%), which is the highest performance reported in the literature.
29

Ganeshkumar Palanisamy. "From Data Lakes to Data Fabric/Mesh: The Future of Enterprise Data Platforms in a Multi-Cloud World." Journal of Computer Science and Technology Studies 7, no. 5 (2025): 23–34. https://doi.org/10.32996/jcsts.2025.7.5.4.

Abstract:
The evolution of enterprise data management has seen a shift from traditional data warehouses to data lakes, and now towards data fabrics and data meshes. This article explores this progression, particularly in the context of multi-cloud environments. Data fabrics provide a unified, virtualized view of data across disparate sources, while data meshes decentralize data ownership and governance, aligning with modern, agile development practices. The article discusses how these architectures can be implemented in multi-cloud setups, ensuring data consistency, security, and performance. It also addresses the challenges and opportunities presented by this shift, such as integrating AI and ML workloads into the data platform. The article concludes with a forward-looking perspective on how enterprise data platforms will continue to evolve to meet the demands of a multi-cloud, data-driven world.
30

Mrozek, Mirosław. "Multi-Agent Control System for the Movement of Uniaxial Objects." Solid State Phenomena 237 (August 2015): 183–88. http://dx.doi.org/10.4028/www.scientific.net/ssp.237.183.

Abstract:
Multi-agent systems are used mainly in IT solutions and control groups of robots. From the point of view of classical control architectures, they are a kind of distributed systems in which nodes perform advanced algorithms, usually associated with the technology of artificial intelligence, and they can be considered as agents. The article describes the multi-agents control system of objects of uniaxial movements. An example of such a system to control a repository with movable racks with electric motors is presented. Each rack acts as an agent through the implemented control of the resources of embedded microcontrollers. Such a system provides high quality control, guaranteeing long-lasting, trouble-free operation while maintaining the safety of both service and stored items.
31

Griffiths, David, and Jan Boehm. "A Review on Deep Learning Techniques for 3D Sensed Data Classification." Remote Sensing 11, no. 12 (2019): 1499. http://dx.doi.org/10.3390/rs11121499.

Abstract:
Over the past decade, deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatic understanding of 3D sensed data, such as point clouds, are comparatively immature. However, with a range of important applications from indoor robotics navigation to national scale remote sensing, there is a high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper we review the current state-of-the-art deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We review the current main approaches, including RGB-D, multi-view, volumetric and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion about the future of deep learning for 3D sensed data, using literature to justify the areas where future research would be most valuable.
32

Paris, Nicolas. "POMPC: A C Language for Data Parallelism." International Journal of Modern Physics C 04, no. 01 (1993): 85–96. http://dx.doi.org/10.1142/s0129183193000094.

Abstract:
POMPC is a parallel language dedicated to the programming of Massively Parallel Computers according to a synchronous Data Parallel model. It is an extension of the ANSI C language and follows its philosophy. Parallelism is explicitly handled by the definition of collections of parallel variables and the definition of communication primitives. A methodology is presented in order to easily port the language on different target architectures. Virtualization is introduced to handle simultaneously several collections of different sizes and shapes. Virtualization management is a key point of the compilation process. Programmer, architecture, compilation and system points of view lead to a special implementation of the virtualization mixing physical and virtual parallel objects. The implementation of the virtualization is adapted for the development of communication libraries and also adapted to enlarge the asynchronous sections of code for SPMD architecture. The portability of the POMPC language is validated by several implementations for mono/multi-process simulation on UNIX machines, for the Connection Machine CM-2, for the MasPar MP-1 and a compiler is in preparation for the iPSC-860.
33

Alouane-Ksouri, Sonia, and Minyar Sassi Hidri. "Fuzzy Learning of Co-Similarities from Large-Scale Documents." International Journal of Fuzzy System Applications 4, no. 4 (2015): 70–86. http://dx.doi.org/10.4018/ijfsa.2015100104.

Abstract:
To analyze and explore large textual corpus, we are generally limited by the available main memory. This may lead to a proliferation of processor load due to greedy computing. The authors propose to deal with this problem to compute co-similarities from large-scale documents. The authors propose to enhance co-similarity learning by upstream and downstream parallel computing. The first deploys the fuzzy linear model in a Grid environment. The second deals with multi-view datasets while introducing different architectures by using several instances of a fuzzy triadic similarity algorithm.
34

Naseer Ahamed Mohammed. "The convergence horizon: Cloud-native technologies reshaping society and infrastructure." World Journal of Advanced Engineering Technology and Sciences 15, no. 1 (2025): 538–46. https://doi.org/10.30574/wjaets.2025.15.1.0212.

Abstract:
This article explores the transformative impact of cloud-native technologies on society and infrastructure beyond their technical implementations. Beginning with an overview of market growth and architectural principles, the discussion explores how these technologies drive environmental sustainability through optimized resource utilization and reduced carbon emissions. The article continues by examining workforce evolution, including emerging specializations, skills requirements, and the shift toward distributed work models. The article further follows how cloud-native platforms democratize technology access for smaller organizations and emerging markets by reducing capital barriers and enabling innovation. Ethical and governance considerations, including data sovereignty, privacy challenges in multi-tenant architectures, and regulatory complexities in cross-border operations, are thoroughly addressed. The conclusion synthesizes these findings to present a comprehensive view of how cloud-native technologies function as socioeconomic catalysts while offering recommendations for policymakers and technology leaders to maximize benefits while mitigating potential risks.
35

Yoshida, Naoto. "Homeostatic Agent for General Environment." Journal of Artificial General Intelligence 8, no. 1 (2018): 1–22. http://dx.doi.org/10.1515/jagi-2017-0001.

Abstract:
One of the essential aspects of biological agents is dynamic stability. This aspect, called homeostasis, is widely discussed in ethology, neuroscience and during the early stages of artificial intelligence. Ashby’s homeostats are general-purpose learning machines for stabilizing essential variables of the agent in the face of general environments. However, despite their generality, the original homeostats couldn’t be scaled because they searched their parameters randomly. In this paper, first we re-define the objective of homeostats as the maximization of a multi-step survival probability from the viewpoint of sequential decision theory and probabilistic theory. Then we show that this optimization problem can be treated by using reinforcement learning algorithms with special agent architectures and theoretically-derived intrinsic reward functions. Finally we empirically demonstrate that agents with our architecture automatically learn to survive in a given environment, including environments with visual stimuli. Our survival agents can learn to eat food, avoid poison and stabilize essential variables through theoretically-derived single intrinsic reward formulations.
36

Casellas, Ramon, Ricardo Martínez, Ricard Vilalta, and Raül Muñoz. "Abstraction and Control of Multi-Domain Disaggregated Optical Networks with OpenROADM Device Models." IEEE Journal of Lightwave Technology 38, no. 9 (2020): 2606–15. https://doi.org/10.1109/JLT.2020.2969248.

Abstract:
Network operators are evolving their optical transport networks in order to make them cost effective. In some scenarios, this means considering adopting software-defined networking principles along with open and standard interfaces, leveraging the underlying hardware programmability while, at the same time, considering the benefits of (partial) disaggregation, in view of the potential benefits of decoupling terminal devices from the line systems or of separating the hardware from the controlling software. In this evolution, operators often segment their networks into domains. Reasons include the need to scale, or confidentiality and/or vendor interoperability constraints. Additionally, the need to virtualize the (multi-domain) transport network has emerged as a key requirement to support functions such as network slicing and partitioning, and to empower end users to control their allocated partitions, enabling new business models related to multi-tenancy. In this context, several standards-defining organizations have been working on architectures, interfaces, and protocols to support requirements, such as the Abstraction and Control of Traffic Engineering Networks of the Internet Engineering Task Force, known as ACTN. In this article, we experimentally validate a control plane architecture for multi-domain disaggregated transport networks that relies on the deployment of network elements compliant with the OpenROADM multi source agreement device model. We demonstrate the abstraction and control of such networks in line with the ACTN framework and we show the applicability of the approach with a proof-of-concept testbed implementation.
37

Agarla, Mirko, Paolo Napoletano, and Raimondo Schettini. "Quasi Real-Time Apple Defect Segmentation Using Deep Learning." Sensors 23, no. 18 (2023): 7893. http://dx.doi.org/10.3390/s23187893.

Abstract:
Defect segmentation of apples is an important task in the agriculture industry for quality control and food safety. In this paper, we propose a deep learning approach for the automated segmentation of apple defects using convolutional neural networks (CNNs) based on a U-shaped architecture with skip-connections only within the noise reduction block. An ad-hoc data synthesis technique has been designed to increase the number of samples and at the same time to reduce neural network overfitting. We evaluate our model on a dataset of multi-spectral apple images with pixel-wise annotations for several types of defects. In this paper, we show that our proposal outperforms in terms of segmentation accuracy general-purpose deep learning architectures commonly used for segmentation tasks. From the application point of view, we improve the previous methods for apple defect segmentation. A measure of the computational cost shows that our proposal can be employed in real-time (about 100 frame-per-second on GPU) and in quasi-real-time (about 7/8 frame-per-second on CPU) visual-based apple inspection. To further improve the applicability of the method, we investigate the potential of using only RGB images instead of multi-spectral images as input images. The results prove that the accuracy in this case is almost comparable with the multi-spectral case.
38

Thompson, Alison, Kelly Thorp, Matthew Conley, et al. "Comparing Nadir and Multi-Angle View Sensor Technologies for Measuring in-Field Plant Height of Upland Cotton." Remote Sensing 11, no. 6 (2019): 700. http://dx.doi.org/10.3390/rs11060700.

Abstract:
Plant height is a morphological characteristic of plant growth that is a useful indicator of plant stress resulting from water and nutrient deficit. While height is a relatively simple trait, it can be difficult to measure accurately, especially in crops with complex canopy architectures like cotton. This paper describes the deployment of four nadir view ultrasonic transducers (UTs), two light detection and ranging (LiDAR) systems, and an unmanned aerial system (UAS) with a digital color camera to characterize plant height in an upland cotton breeding trial. The comparison of the UTs with manual measurements demonstrated that the Honeywell and Pepperl+Fuchs sensors provided more precise estimates of plant height than the MaxSonar and db3 Pulsar sensors. Performance of the multi-angle view LiDAR and UAS technologies demonstrated that the UAS derived 3-D point clouds had stronger correlations (0.980) with the UTs than the proximal LiDAR sensors. As manual measurements require increased time and labor in large breeding trials and are prone to human error reducing repeatability, UT and UAS technologies are an efficient and effective means of characterizing cotton plant height.
39

Sharada, Gupta, and N. Eshwarappa Murundi. "Breast cancer detection through attention based feature integration model." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 2 (2024): 2254–64. https://doi.org/10.11591/ijai.v13.i2.pp2254-2264.

Abstract:
Breast cancer is detected by screening mammography wherein X-rays are used to produce images of the breast. Mammograms for screening can detect breast cancer early. This research focuses on the challenges of using multi-view mammography to diagnose breast cancer. By examining numerous perspectives of an image, an attention-based feature-integration mechanism (AFIM) model that concentrates on local abnormal areas associated with cancer and displays the essential features considered for evaluation, analyzing cross-view data. This is segmented into two views the bi-lateral attention module (BAM) module integrates the left and right activation maps for a similar projection is used to create a spatial attention map that highlights the impact of asymmetries. Here the module's focus is on data gathering through mediolateral oblique (MLO) and bilateral craniocaudal (CC) for each breast to develop an attention module. The proposed AFIM model generates using spatial attention maps obtained from the identical image through other breasts to identify bilaterally uneven areas and class activation map (CAM) generated from two similar breast images to emphasize the feature channels connected to a single lesion in a breast. AFIM model may easily be included in ResNet-style architectures to develop multi-view classification models.
40

Sharifi, Ali Asghar, Ali Zoljodi, and Masoud Daneshtalab. "TrajectoryNAS: A Neural Architecture Search for Trajectory Prediction." Sensors 24, no. 17 (2024): 5696. http://dx.doi.org/10.3390/s24175696.

Abstract:
Autonomous driving systems are a rapidly evolving technology. Trajectory prediction is a critical component of autonomous driving systems that enables safe navigation by anticipating the movement of surrounding objects. Lidar point-cloud data provide a 3D view of solid objects surrounding the ego-vehicle. Hence, trajectory prediction using Lidar point-cloud data performs better than 2D RGB cameras due to providing the distance between the target object and the ego-vehicle. However, processing point-cloud data is a costly and complicated process, and state-of-the-art 3D trajectory predictions using point-cloud data suffer from slow and erroneous predictions. State-of-the-art trajectory prediction approaches suffer from handcrafted and inefficient architectures, which can lead to low accuracy and suboptimal inference times. Neural architecture search (NAS) is a method proposed to optimize neural network models by using search algorithms to redesign architectures based on their performance and runtime. This paper introduces TrajectoryNAS, a novel neural architecture search (NAS) method designed to develop an efficient and more accurate LiDAR-based trajectory prediction model for predicting the trajectories of objects surrounding the ego vehicle. TrajectoryNAS systematically optimizes the architecture of an end-to-end trajectory prediction algorithm, incorporating all stacked components that are prerequisites for trajectory prediction, including object detection and object tracking, using metaheuristic algorithms. This approach addresses the neural architecture designs in each component of trajectory prediction, considering accuracy loss and the associated overhead latency. Our method introduces a novel multi-objective energy function that integrates accuracy and efficiency metrics, enabling the creation of a model that significantly outperforms existing approaches. 
Through empirical studies, TrajectoryNAS demonstrates its effectiveness in enhancing the performance of autonomous driving systems, marking a significant advancement in the field. Experimental results reveal that TrajectoryNAS yields a minimum of 4.8 higher accuracy and 1.1× lower latency over competing methods on the NuScenes dataset.
41

Zou, Yanmei, Hongshan Yu, Zhengeng Yang, Zechuan Li, and Naveed Akhtar. "Improved MLP Point Cloud Processing with High-Dimensional Positional Encoding." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (2024): 7891–99. http://dx.doi.org/10.1609/aaai.v38i7.28625.

Abstract:
Multi-Layer Perceptron (MLP) models are the bedrock of contemporary point cloud processing. However, their complex network architectures obscure the source of their strength. We first develop an “abstraction and refinement” (ABS-REF) view for the neural modeling of point clouds. This view elucidates that whereas the early models focused on the ABS stage, the more recent techniques devise sophisticated REF stages to attain performance advantage in point cloud processing. We then borrow the concept of “positional encoding” from transformer literature, and propose a High-dimensional Positional Encoding (HPE) module, which can be readily deployed to MLP based architectures. We leverage our module to develop a suite of HPENet, which are MLP networks that follow ABS-REF paradigm, albeit with a sophisticated HPE based REF stage. The developed technique is extensively evaluated for 3D object classification, object part segmentation, semantic segmentation and object detection. We establish new state-of-the-art results of 87.6 mAcc on ScanObjectNN for object classification, and 85.5 class mIoU on ShapeNetPart for object part segmentation, and 72.7 and 78.7 mIoU on Area-5 and 6-fold experiments with S3DIS for semantic segmentation. The source code for this work is available at https://github.com/zouyanmei/HPENet.
42

Vinodkrishnan, Kulathumani, Nikhil Chandhok, Arjan Durresi, Raj Jain, Ramesh Jagannathan, and Srinivasan Seetharaman. "Survivability in IP over WDM networks." Journal of High Speed Networks 10, no. 2 (2001): 79–90. https://doi.org/10.3233/hsn-2001-200.

Abstract:
The Internet is emerging as the new universal telecommunication medium. IP over WDM has been envisioned as one of the most attractive architectures for the new Internet. Consequently survivability is a crucial concern in designing IP over WDM networks. This paper presents a survey of the survivability mechanisms for IP over WDM networks and thus is intended to provide a summary of what has been done in this area and help further research. A number of optical layer protection techniques have been discussed. They are examined from the point of view of cost, complexity, and application. Survivability techniques are being made available at multiple layers of the network. This paper also studies the recovery features of each network layer and explains the impact of interaction between layers on survivability. The advantages and issues of multi‐layer survivability have been identified. The main idea is that the optical layer can provide fast protection while the higher layers can provide intelligent restoration. With this idea in mind, a new scheme of carrying IP over WDM using MPLS or Multi Protocol Lambda‐Switching has been discussed. Finally, an architecture is suggested by means of which the optical layer can perform an automatic protection switch, with priority considerations with the help of signaling from the higher layers.
43

Nathanael, Oliverio Theophilus, and Simeon Yuda Prasetyo. "Color and Attention for U : Modified Multi Attention U-Net for a Better Image Colorization." JOIV : International Journal on Informatics Visualization 8, no. 3 (2024): 1453. http://dx.doi.org/10.62527/joiv.8.3.1828.

Abstract:
Image colorization is a tedious task that requires creativity and understanding of the image context and semantic information. Many models have been made by harnessing various deep learning architectures to learn the plausible colorization. With the rapid discovery of new architecture and image generation techniques, more powerful options can be explored and improved for image colorization tasks. This research explores a new architecture to colorize an image by using pre-trained embeddings on U-Net combined with several attention modules across the model. Using embeddings from a pre-trained classifier provides a high-level feature extraction from the image. Conversely, multi-attention gives a little taste of image segmentation so that the model can distinguish objects in the image and further enhance the additional information given by the pre-trained embeddings. Adversarial training is also utilized as a normalization to make the generated image more realistic. This research preferred PatchGAN over base GAN as the discriminator model to ensure that the colorization has a consistent quality across all patches. The study shows that this U-Net modification can improve the generated image quality compared to a normal U-Net. The proposed architecture reaches an FID of 48.6253, SSIM of 0.8568, and PSNR of 19.7831 by only training it for 25 epochs; hence, this research offers another view of image colorization by using modules that are often used for image segmentation tasks.
44

Chen, Kun, Xin Li, and Huaiqing Wang. "On the model design of integrated intelligent big data analytics systems." Industrial Management & Data Systems 115, no. 9 (2015): 1666–82. http://dx.doi.org/10.1108/imds-03-2015-0086.

Abstract:
Purpose – Although big data analytics has reaped great business rewards, big data system design and integration still face challenges resulting from the demanding environment, including challenges involving variety, uncertainty, and complexity. These characteristics in big data systems demand flexible and agile integration architectures. Furthermore, a formal model is needed to support design and verification. The purpose of this paper is to resolve the two problems with a collective intelligence (CI) model. Design/methodology/approach – In the conceptual CI framework as proposed by Schut (2010), a CI design should be comprised of a general model, which has formal form for verification and validation, and also a specific model, which is an implementable system architecture. After analyzing the requirements of system integration in big data environments, the authors apply the CI framework to resolve the integration problem. In the model instantiation, the authors use multi-agent paradigm as the specific model, and the hierarchical colored Petri Net (PN) as the general model. Findings – First, multi-agent paradigm is a good implementation for reuse and integration of big data analytics modules in an agile and loosely coupled method. Second, the PN models provide effective simulation results in the system design period. It gives advice on business process design and workload balance control. Third, the CI framework provides an incrementally build and deployed method for system integration. It is especially suitable to the dynamic data analytics environment. These findings have both theoretical and managerial implications. Originality/value – In this paper, the authors propose a CI framework, which includes both practical architectures and theoretical foundations, to solve the system integration problem in big data environment. It provides a new point of view to dynamically integrate large-scale modules in an organization. 
This paper also has practical suggestions for Chief Technical Officers, who want to employ big data technologies in their companies.
45

Guptha, Sharada, and Murundi N. Eshwarappa. "Breast cancer detection through attention based feature integration model." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 2 (2024): 2254. http://dx.doi.org/10.11591/ijai.v13.i2.pp2254-2264.

Abstract:
Breast cancer is detected by screening mammography wherein X-rays are used to produce images of the breast. Mammograms for screening can detect breast cancer early. This research focuses on the challenges of using multi-view mammography to diagnose breast cancer. By examining numerous perspectives of an image, an attention-based feature-integration mechanism (AFIM) model that concentrates on local abnormal areas associated with cancer and displays the essential features considered for evaluation, analyzing cross-view data. This is segmented into two views the bi-lateral attention module (BAM) module integrates the left and right activation maps for a similar projection is used to create a spatial attention map that highlights the impact of asymmetries. Here the module's focus is on data gathering through medio-lateral oblique (MLO) and bilateral craniocaudal (CC) for each breast to develop an attention module. The proposed AFIM model generates using spatial attention maps obtained from the identical image through other breasts to identify bilaterally uneven areas and class activation map (CAM) generated from two similar breast images to emphasize the feature channels connected to a single lesion in a breast. AFIM model may easily be included in ResNet-style architectures to develop multi-view classification models.
46

S, Janakiraman. "AI-POWERED DEPTH ESTIMATION USING DEEP LEARNING." International Scientific Journal of Engineering and Management 04, no. 06 (2025): 1–9. https://doi.org/10.55041/isjem04525.

Abstract:
Depth estimation is a crucial component in many computer vision domains, such as autonomous navigation, robotics, and augmented reality. This project investigates the use of deep learning to enhance depth prediction capabilities, aiming to deliver accurate and real-time 3D scene understanding. Utilizing advanced neural architectures like convolutional neural networks (CNNs), we introduce a novel approach for deriving depth information from either single-view or multi-view imagery. The proposed model effectively captures spatial context and depth indicators from extensive training datasets, leading to improved precision and resilience in complex environments. Through AI-driven adaptive learning, the system can generalize across varied scenes and challenging conditions. The results of this work are expected to drive advancements in immersive technologies, autonomous decision-making, and situational awareness in dynamic settings. Keywords: Enhanced Reality Visualization, Intelligent Learning Adaptation, Context-Aware Scene Interpretation, Artificial Intelligence-Based Frameworks, Evolving Real-World Settings
47

Muhamad, Wardani, and Wawa Wikusna. "Software Architecture of E-assessment on Higher Education." IJAIT (International Journal of Applied Information Technology) 1, no. 02 (2017): 102. http://dx.doi.org/10.25124/ijait.v1i02.1030.

Abstract:
Computer technology has been used to support the learning process at the university. The learning process generally involves students and teachers, who work through the materials of subject courses and regularly evaluate student competencies. Teachers can evaluate student competencies or knowledge by e-assessment. E-assessment is one of the domains of e-learning and involves the use of computers in assessment, including the setting, delivery, marking, and reporting of assessments. The major benefit of an e-assessment system is its flexibility in terms of global access and the devices used for access. When developing an e-assessment system, we focus on two aspects of a multi-dimensional approach: user-friendliness and a student-centric nature. Because of the system's complexity, the software architecture needs to be defined so that software developers can develop the software properly. Designing the software architecture makes it possible to clearly define a view of the system that includes the system components, the behavior of those components, and the ways the components interact. Architecture Description Language (ADL) has been used to describe the software because it provides a concrete syntax and a formal framework for characterizing architectures. As a result, the design of the e-assessment system architecture can meet the required quality attributes. Using notation to explain the ADL provides a more complete description than explaining the ADL in text alone. Furthermore, the e-assessment system architecture design is expected to be used as a reference for software development in establishing an e-assessment system.
48

Chen, Hanyue, Wenjiang Huang, Wang Li, Zheng Niu, Liming Zhang, and Shihe Xing. "Estimation of LAI in Winter Wheat from Multi-Angular Hyperspectral VNIR Data: Effects of View Angles and Plant Architecture." Remote Sensing 10, no. 10 (2018): 1630. http://dx.doi.org/10.3390/rs10101630.

Abstract:
View angle effects present in crop canopy spectra are critical for the retrieval of the crop canopy leaf area index (LAI). In the past, the angular effects on spectral vegetation indices (VIs) for estimating LAI, especially in crops with different plant architectures, have not been carefully assessed. In this study, we assessed the effects of the view zenith angle (VZA) on relationships between the spectral VIs and LAI. We measured the multi-angular hyperspectral reflectance and LAI of two cultivars of winter wheat, erectophile (W411) and planophile (W9507), across different growing seasons. The reflectance of each angle was used to calculate a variety of VIs that have already been published in the literature as well as all possible band combinations of Normalized Difference Spectral Indices (NDSIs). The above indices, along with the raw reflectance of representative bands, were evaluated with measured LAI across the view zenith angle for each cultivar of winter wheat. Data analysis was also supported by the use of the PROSAIL (PROSPECT + SAIL) model to simulate a range of bidirectional reflectance. The study confirmed that the strength of linear relationships between different spectral VIs and LAI did express different angular responses depending on plant type. LAI–VI correlations were generally stronger in erectophile than in planophile wheat types, especially at the zenith angle where the background is expected to be more evident for erectophile type wheat. The band combinations and formulas of the indices also played a role in shaping the angular signatures of the LAI–VI correlations. Overall, off-nadir angles served better than nadir angle and narrow-band indices, especially NDSIs with combinations of a red-edge (700~720 nm) and a green band, were more useful for LAI estimation than broad-band indices for both types of winter wheat. But the optimal angles much differed between two plant types and among various VIs. 
High significance (R² > 0.9) could be obtained by selecting appropriate VIs and view angles on both the backward and forward scattering direction. These results from the in-situ measurements were also corroborated by the simulation analysis using the PROSAIL model. For the measured datasets, the highest coefficient was obtained by NDSI(536,720) at −35° in the backward (R² = 0.971) and NDSI(571,707) at 55° in the forward scattering direction (R² = 0.984) for the planophile and erectophile varieties, respectively. This work highlights the influence of view geometry and plant architecture. The identification of crop plant type is highly recommended before using remote sensing VIs for the large-scale mapping of vegetation biophysical variables.
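As a rough illustration of the indices named in the abstract above, a normalized difference spectral index is commonly computed as the difference of two band reflectances divided by their sum. The sketch below assumes that NDSI(a, b) means (R_a − R_b) / (R_a + R_b); the band order, the function name `ndsi`, and the reflectance values are assumptions for illustration and are not taken from the paper.

```python
def ndsi(reflectance_a: float, reflectance_b: float) -> float:
    """Normalized difference of two band reflectances: (a - b) / (a + b)."""
    total = reflectance_a + reflectance_b
    if total == 0:
        # The index is undefined when both reflectances are zero.
        raise ValueError("band reflectances sum to zero")
    return (reflectance_a - reflectance_b) / total


# Hypothetical reflectances at 536 nm and 720 nm (made-up values, not measured data).
example_index = ndsi(0.08, 0.32)
```

The index is bounded between −1 and 1 for non-negative reflectances, which is one reason such ratios are convenient to compare across view angles.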
49

Bhargava Pananthula, Manoj Kumar. "3D Image Reconstruction from Single 2D Image using Deep Learning." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 04 (2025): 1–9. https://doi.org/10.55041/ijsrem44955.

Abstract:
Accurate 3D reconstruction from 2D images plays a critical role in various applications including medical imaging, robotics, autonomous navigation, and augmented reality. Traditional reconstruction techniques often require multiple viewpoints or sensor setups, limiting their feasibility in resource-constrained environments. In this work, we propose a deep learning-based monocular 3D reconstruction pipeline that generates high-quality 3D models from a single RGB image. The core of this framework lies in a custom U-Net++ architecture, designed and trained on the NYU Depth V2 dataset for robust depth estimation. This model is evaluated against state-of-the-art alternatives including MiDaS (DPT-Hybrid), Depth Anything V2, and GLPN to assess its performance across accuracy, efficiency, generalization, and visualization quality. The proposed pipeline performs image preprocessing, depth map prediction, and 3D point cloud generation using Open3D, followed by mesh reconstruction techniques like Poisson Surface Reconstruction. The evaluation metrics include MSE, SSIM, PSNR, and R² Score for depth maps, alongside qualitative analysis of 3D reconstruction quality. Comparative results demonstrate that while GLPN yields the most consistent performance, the Custom U-Net++ model achieves competitive accuracy with significantly improved efficiency and adaptability, making it suitable for real-time or domain-specific deployments. This research highlights the potential of lightweight, custom-designed architectures for scalable and robust single-view 3D reconstruction. Future directions include multi-view integration, dataset expansion, and enhancing interpretability through uncertainty estimation techniques. Keywords: Monocular Depth Estimation, 3D Reconstruction, U-Net++, MiDaS, GLPN, Deep Learning, Point Clouds, Open3D.
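The step from a predicted depth map to a 3D point cloud, mentioned in the abstract above, is often done by back-projecting each pixel through a pinhole camera model. The sketch below is a minimal pure-Python version of that idea and is not the authors' pipeline (which the abstract says uses Open3D); the function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for illustration.

```python
def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map (list of rows) into a list of (x, y, z) points.

    Assumes a pinhole camera with focal lengths fx, fy and principal
    point (cx, cy), all in pixels; depth values share the output unit.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:                     # skip missing or invalid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map and intrinsics (made-up values).
cloud = backproject([[1.0, 2.0], [0.0, 4.0]], fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

A list of points like `cloud` could then be handed to a point cloud or meshing library; that hand-off is left out here to keep the sketch dependency-free.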
50

Dongarra, Jack, Laura Grigori, and Nicholas J. Higham. "Numerical algorithms for high-performance computational science." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 378, no. 2166 (2020): 20190066. http://dx.doi.org/10.1098/rsta.2019.0066.

Abstract:
A number of features of today’s high-performance computers make it challenging to exploit these machines fully for computational science. These include increasing core counts but stagnant clock frequencies; the high cost of data movement; use of accelerators (GPUs, FPGAs, coprocessors), making architectures increasingly heterogeneous; and multiple precisions of floating-point arithmetic, including half-precision. Moreover, as well as maximizing speed and accuracy, minimizing energy consumption is an important criterion. New generations of algorithms are needed to tackle these challenges. We discuss some approaches that we can take to develop numerical algorithms for high-performance computational science, with a view to exploiting the next generation of supercomputers. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.