Academic literature on the topic 'Descripteur audio'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Descripteur audio.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Descripteur audio"

1

Li, Francis F. "Soft-Computing Audio Classification as a Pre-Processor for Automated Content Descriptor Generation." International Journal of Computer and Communication Engineering 3, no. 2 (2014): 101–4. http://dx.doi.org/10.7763/ijcce.2014.v3.300.

2

Wątrobiński, Damian. "Dylemat audiodeskryptora w procesie przekładu audiowizualnego." Investigationes Linguisticae 39 (May 31, 2019): 140–50. http://dx.doi.org/10.14746/il.2018.39.11.

Abstract:
The aim of this article is to point out the dilemma concerning the transmission of emotions that faces the audio describer when creating audio description for reproductions of paintings. It indicates the importance of visual forms; the dependencies between the translator, the translation, and emotions; and the different approaches to creating audio description, with particular emphasis on the transfer of emotions. The theoretical considerations are supplemented by a qualitative analysis of two emotionally colored audio descriptions.
3

Moore, Austin. "Dynamic Range Compression and the Semantic Descriptor Aggressive." Applied Sciences 10, no. 7 (March 30, 2020): 2350. http://dx.doi.org/10.3390/app10072350.

Abstract:
In popular music productions, the lead vocal is often the main focus of the mix, and engineers will work to impart creative colouration onto this source. This paper conducts listening experiments to test whether there is a correlation between perceived distortion and the descriptor "aggressive", which is often used to describe the sonic signature of the Universal Audio 1176, a much-used dynamic range compressor in professional music production. The results show that compression settings which impart audible distortion are perceived as aggressive by the listener, and that there is a strong correlation between the subjective listener scores for "distorted" and "aggressive". Additionally, a strong correlation was shown between compression settings rated with high aggressive scores and the audio feature roughness.
4

Bloit, Julien, Nicolas Rasamimanana, and Frédéric Bevilacqua. "Modeling and segmentation of audio descriptor profiles with segmental models." Pattern Recognition Letters 31, no. 12 (September 2010): 1507–13. http://dx.doi.org/10.1016/j.patrec.2009.11.003.

5

Peng, Yu Qing, Wei Liu, Cui Cui Zhao, and Tie Jun Li. "Detection of Violent Video with Audio-Visual Features Based on MPEG-7." Applied Mechanics and Materials 411-414 (September 2013): 1002–7. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.1002.

Abstract:
In order to solve the problem that there is no effective way to detect violent video on the network, a new method using MPEG-7 audio and visual features to detect violent video was put forward. For feature extraction, the method selects features related to audio, color, space, time, and motion. Several MPEG-7 descriptors were added or improved: an instantaneous audio feature was added, the motion intensity descriptor was customized, and a new method to extract the dominant color of video was proposed. A BP neural network optimized by a genetic algorithm (GA) was used to fuse the features. Experiments show that the selected features are representative and discriminative and can reduce data redundancy, that the neural-network fusion model is more robust, and that fusing audio and visual features improves the recall and precision of video detection.
6

Wu, Pingping, Hong Liu, Xiaofei Li, Ting Fan, and Xuewu Zhang. "A Novel Lip Descriptor for Audio-Visual Keyword Spotting Based on Adaptive Decision Fusion." IEEE Transactions on Multimedia 18, no. 3 (March 2016): 326–38. http://dx.doi.org/10.1109/tmm.2016.2520091.

7

XIE, ZHIBING, and LING GUAN. "MULTIMODAL INFORMATION FUSION OF AUDIO EMOTION RECOGNITION BASED ON KERNEL ENTROPY COMPONENT ANALYSIS." International Journal of Semantic Computing 07, no. 01 (March 2013): 25–42. http://dx.doi.org/10.1142/s1793351x13400023.

Abstract:
This paper focuses on the application of novel information-theoretic tools in the area of information fusion. Feature transformation and fusion are critical for the performance of information fusion; however, the majority of existing works depend on second-order statistics, which are optimal only for Gaussian-like distributions. In this paper, the integration of information fusion techniques and kernel entropy component analysis provides a new information-theoretic tool. The fusion of features is realized using a descriptor of information entropy and is optimized by entropy estimation. A novel multimodal information fusion strategy for audio emotion recognition based on kernel entropy component analysis (KECA) is presented. The effectiveness of the proposed solution is evaluated through experimentation on two audiovisual emotion databases. Experimental results show that the proposed solution outperforms existing methods, especially when the dimension of the feature space is substantially reduced. The proposed method offers a general theoretical analysis that provides an approach to applying information theory to multimedia research.
8

Nanni, Loris, Sheryl Brahnam, Alessandra Lumini, and Gianluca Maguolo. "Animal Sound Classification Using Dissimilarity Spaces." Applied Sciences 10, no. 23 (November 30, 2020): 8578. http://dx.doi.org/10.3390/app10238578.

Abstract:
The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs) designed using four different backbones with different clustering techniques for training SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one for cat and another for bird vocalizations. The proposed approach uses clustering methods to determine a set of centroids (in both a supervised and unsupervised fashion) from the spectrograms in the dataset. Such centroids are exploited to generate the dissimilarity space through the Siamese networks. In addition to feeding the SNNs with spectrograms, experiments process the spectrograms using the heterogeneous auto-similarities of characteristics. Once the similarity spaces are computed, each pattern is “projected” into the space to obtain a vector space representation; this descriptor is then coupled to a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad-hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best standalone approach is also evaluated on the challenging Dataset for Environmental Sound Classification (ESC50) dataset.
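The dissimilarity-space idea described in this abstract can be illustrated with a minimal sketch: each pattern is represented by its vector of distances to a set of clustering-derived centroids, and an SVM is trained on those vectors. This is not the authors' code; the synthetic data, centroid count, and plain Euclidean distance (standing in for the Siamese-network similarity) are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# Toy stand-ins for spectrogram feature vectors of two vocalization classes.
rng = np.random.default_rng(1)
class_a = rng.normal(loc=0.0, size=(50, 20))
class_b = rng.normal(loc=3.0, size=(50, 20))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

# Step 1: determine a set of centroids with a clustering method.
centroids = KMeans(n_clusters=5, n_init=10, random_state=1).fit(X).cluster_centers_

# Step 2: "project" each pattern into the dissimilarity space:
# its vector of distances to every centroid.
D = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Step 3: couple the dissimilarity vectors to a support vector machine.
clf = SVC().fit(D, y)
print(clf.score(D, y))
```

In the paper, the distances come from learned Siamese networks over spectrograms; the sketch only shows how the dissimilarity vectors feed the SVM.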
9

Castro, F. M., M. J. Marín-Jiménez, N. Guil Mata, and R. Muñoz-Salinas. "Fisher Motion Descriptor for Multiview Gait Recognition." International Journal of Pattern Recognition and Artificial Intelligence 31, no. 01 (January 2017): 1756002. http://dx.doi.org/10.1142/s021800141756002x.

Abstract:
The goal of this paper is to identify individuals by analyzing their gait. Instead of using binary silhouettes as input data (as done in many previous works), we propose and evaluate the use of motion descriptors based on densely sampled short-term trajectories. We take advantage of state-of-the-art people detectors to define custom spatial configurations of the descriptors around the target person, obtaining a rich representation of the gait motion. The local motion features (described by the Divergence-Curl-Shear descriptor [M. Jain, H. Jegou and P. Bouthemy, Better exploiting motion for better action recognition, in Proc. IEEE Conf. Computer Vision Pattern Recognition (CVPR) (2013), pp. 2555–2562]) extracted on the different spatial areas of the person are combined into a single high-level gait descriptor by using the Fisher Vector encoding [F. Perronnin, J. Sánchez and T. Mensink, Improving the Fisher kernel for large-scale image classification, in Proc. European Conf. Computer Vision (ECCV) (2010), pp. 143–156]. The proposed approach, coined Pyramidal Fisher Motion, is experimentally validated on the 'CASIA' dataset (parts B and C) [S. Yu, D. Tan and T. Tan, A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition, in Proc. Int. Conf. Pattern Recognition, Vol. 4 (2006), pp. 441–444], the 'TUM GAID' dataset [M. Hofmann, J. Geiger, S. Bachmann, B. Schuller and G. Rigoll, The TUM Gait from Audio, Image and Depth (GAID) database: Multimodal recognition of subjects and traits, J. Vis. Commun. Image Represent. 25(1) (2014) 195–206], the 'CMU MoBo' dataset [R. Gross and J. Shi, The CMU Motion of Body (MoBo) database, Technical Report CMU-RI-TR-01-18, Robotics Institute (2001)], and the recent 'AVA Multiview Gait' dataset [D. López-Fernández, F. Madrid-Cuevas, A. Carmona-Poyato, M. Marín-Jiménez and R. Muñoz-Salinas, The AVA multi-view dataset for gait recognition, in Activity Monitoring by Multiple Distributed Sensing, Lecture Notes in Computer Science (Springer, 2014), pp. 26–39]. The results show that this new approach achieves state-of-the-art results in the problem of gait recognition, allowing us to recognize walking people from diverse viewpoints on single and multiple camera setups, wearing different clothes, carrying bags, walking at diverse speeds, and not limited to straight walking paths.
10

Yang, Ming Liang, and Wei Ping Ding. "The Exploration of Evaluation Method about the Driving Electromotor Acoustic Comfort of the Pure Electric Vehicles." Applied Mechanics and Materials 224 (November 2012): 113–18. http://dx.doi.org/10.4028/www.scientific.net/amm.224.113.

Abstract:
The driving electromotor noise of a pure electric bus was taken as the evaluation object in this paper. The noise signals were gathered over dual channels, human hearing was simulated through synthetic stereo, and the signals were processed into a series of noise samples for subjective testing, generated according to a progressive 3 dB attenuation of the noise sound pressure level. The authors then investigated subjects' comfort/discomfort under the various noise samples through high-fidelity audio playback, described the subjective feelings with 'descriptors', and quantified them with scores. On this basis, the correlation between subjective feelings of acoustic comfort and discomfort was revealed, and the noise sample sets corresponding to the comfort feeling were identified. From these, an evaluation method for electromotor acoustic comfort was established.

Dissertations / Theses on the topic "Descripteur audio"

1

Tardieu, Damien. "Modèles d'instruments pour l'aide à l'orchestration." Paris 6, 2008. http://www.theses.fr/2008PA066522.

Abstract:
This thesis deals with the design of instrument models for computer-aided orchestration. Orchestration is defined as follows: find the combinations of instrumental sounds whose timbre comes as close as possible to a target sound provided by the user. The system's knowledge of instrument timbre is extracted from sample databases. First, a set of descriptors of instrument timbre is presented. We then propose a probabilistic model of these descriptors, based on a taxonomy of instrumental playing techniques and on approximating the distribution of the descriptors of instrumental sounds by a product of normal distributions. The model parameters are learned from the sample databases. From the instrument models, we derive the model of a combination in order to evaluate the resemblance between that combination and the target sound.
2

Bloit, Julien. "Intéraction musicale et geste sonore : modélisation temporelle de descripteurs audio." Paris 6, 2010. http://www.theses.fr/2010PA066614.

Abstract:
This thesis deals with the modeling of instrumental sounds in a context of musical interaction between a performer and an electronic part. When this interaction involves extracting symbolic information, existing models most often assume that the signal is structured into notes, defined by their pitch, duration, and intensity values. However, this representation cannot account for more contemporary instrumental vocabularies which, through the use of particular playing techniques, explore other musical dimensions, notably timbral and temporal ones. Rather than undertaking an exhaustive modeling of contemporary instrumental sonorities, we propose to consider that a vocabulary of sound gestures can be represented by a combination of characteristic profiles along several perceptual dimensions. A sound gesture is then modeled by trajectories over several streams of audio descriptors that approximate these dimensions. Within a Bayesian framework, we study a multi-stream model capable of taking into account the asynchrony between several hidden processes, as well as the statistical dependence between descriptors. On each stream, we then propose to model the trajectories with segmental models, whose structure better accounts for durations and for correlations between successive observations than models whose observations are limited to the scale of a single time frame. We then examine the link between model topology and real-time decoding, notably in terms of the accuracy/latency trade-off.
3

Coleman, Graham Keith. "Descriptor control of sound transformations and mosaicing synthesis." Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/392138.

Abstract:
Sampling, as a musical or synthesis technique, is a way to reuse recorded musical expressions. In this dissertation, several ways to expand sampling synthesis are explored, especially mosaicing synthesis, which imitates target signals by transforming and compositing source sounds, in the manner of a mosaic made of broken tile. One branch of extension consists of the automatic control of sound transformations towards targets defined in a perceptual space. The approach chosen uses models that predict how the input sound will be transformed as a function of the selected parameters. In one setting, the models are known, and numerical search can be used to find sufficient parameters; in the other, they are unknown and must be learned from data. Another branch focuses on the sampling itself. By mixing multiple sounds at once, perhaps it is possible to make better imitations, e.g. in terms of the harmony of the target. However, using mixtures leads to new computational problems, especially if properties like continuity, important to high quality sampling synthesis, are to be preserved. A new mosaicing synthesizer is presented which incorporates all of these elements: supporting automatic control of sound transformations using models, mixtures supported by perceptually relevant harmony and timbre descriptors, and preservation of continuity of the sampling context and transformation parameters. Using listening tests, the proposed hybrid algorithm was compared against classic and contemporary algorithms, and the hybrid algorithm performed well on a variety of quality measures.
4

Essid, Slim. "Classification automatique des signaux audio-fréquences : reconnaissance des instruments de musique." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2005. http://pastel.archives-ouvertes.fr/pastel-00002738.

Abstract:
The purpose of this thesis is to contribute to improving the automatic identification of musical instruments in realistic contexts (on music solos, but also on multi-instrumental pieces). We address the problem with an automatic classification approach, striving to find high-performing realizations of the various modules constituting the proposed system. We adopt a hierarchical classification scheme based on taxonomies of instruments and of instrument mixtures. These taxonomies are inferred by means of a hierarchical clustering algorithm exploiting robust probabilistic distances computed using a kernel method. The system exploits a new algorithm for automatic feature selection to produce an efficient description of the audio signals which, combined with support vector machines, achieves high recognition rates on sound excerpts reflecting the diversity of musical practice and of recording conditions encountered in the real world. Our architecture thus manages to identify up to four instruments played simultaneously, from jazz excerpts including percussion.
5

Ramona, Mathieu. "Classification automatique de flux radiophoniques par Machines à Vecteurs de Support." Phd thesis, Télécom ParisTech, 2010. http://pastel.archives-ouvertes.fr/pastel-00529331.

Abstract:
We present a speech/music audio classification system that takes advantage of the excellent statistical properties of Support Vector Machines. This problem raises three questions: how to efficiently exploit SVMs, an inherently discriminative method, on a problem with more than two classes; how to characterize an audio signal in a relevant way; and, finally, how to handle the temporal aspect of the problem. We propose a hybrid multi-class classification system combining the one-against-one and dendrogram-based approaches, which allows the estimation of posterior probabilities. These are exploited to apply post-processing methods that take into account the interdependencies between neighboring frames. We thus propose a classification method applying Hidden Markov Models (HMMs) to the posterior probabilities, as well as an approach based on detecting boundaries between segments with "homogeneous" acoustic content. Furthermore, since the audio signal is characterized by a large collection of audio descriptors, we propose new feature selection algorithms based on the recent Kernel Alignment criterion, a criterion we also exploited for kernel selection in the classification process. The proposed algorithms are compared with the most efficient state-of-the-art methods, to which they constitute a relevant alternative in terms of computation and storage costs. The system built on these contributions took part in the ESTER 2 evaluation campaign, which we present along with our results.
6

Roche, Fanny. "Music sound synthesis using machine learning : Towards a perceptually relevant control space." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALT034.

Abstract:
One of the main challenges of the synthesizer market and the research in sound synthesis nowadays lies in proposing new forms of synthesis allowing the creation of brand new sonorities while offering musicians more intuitive and perceptually meaningful controls to help them reach the perfect sound more easily. Indeed, today's synthesizers are very powerful tools that provide musicians with a considerable amount of possibilities for creating sonic textures, but the control of parameters still lacks user-friendliness and may require some expert knowledge about the underlying generative processes. In this thesis, we are interested in developing and evaluating new data-driven machine learning methods for music sound synthesis allowing the generation of brand new high-quality sounds while providing high-level perceptually meaningful control parameters.The first challenge of this thesis was thus to characterize the musical synthetic timbre by evidencing a set of perceptual verbal descriptors that are both frequently and consensually used by musicians. Two perceptual studies were then conducted: a free verbalization test enabling us to select eight different commonly used terms for describing synthesizer sounds, and a semantic scale analysis enabling us to quantitatively evaluate the use of these terms to characterize a subset of synthetic sounds, as well as analyze how consensual they were.In a second phase, we investigated the use of machine learning algorithms to extract a high-level representation space with interesting interpolation and extrapolation properties from a dataset of sounds, the goal being to relate this space with the perceptual dimensions evidenced earlier. Following previous studies interested in using deep learning for music sound synthesis, we focused on autoencoder models and realized an extensive comparative study of several kinds of autoencoders on two different datasets. 
These experiments, together with a qualitative analysis made with a non-real-time prototype developed during the thesis, allowed us to validate the use of such models, and in particular the variational autoencoder (VAE), as relevant tools for extracting a high-level latent space in which we can navigate smoothly and create new sounds. However, so far, no link between this latent space and the perceptual dimensions evidenced by the perceptual tests emerged naturally. As a final step, we thus tried to enforce perceptual supervision of the VAE by adding a regularization during the training phase. Using the subset of synthetic sounds used in the second perceptual test and the corresponding perceptual grades along the eight perceptual dimensions provided by the semantic scale analysis, it was possible to constrain, to a certain extent, some dimensions of the VAE high-level latent space so as to match these perceptual dimensions. A final comparative test was then conducted in order to evaluate the efficiency of this additional regularization for conditioning the model and (partially) enabling a perceptual control of music sound synthesis.
7

Jui-Yu, Lee, and 李瑞育. "Music Identification Using MPEG-7 Audio Descriptor." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/94294621961519732529.

Abstract:
Master's thesis, National Taipei University of Technology, Department of Computer Science and Information Engineering, academic year 92 (2003/04).
Unlike MPEG-1, MPEG-2, or MPEG-4, MPEG-7 focuses on describing multimedia data instead of compressing it. Using MPEG-7, one can create a multimedia database that eases the search of multimedia content. This thesis investigates the use of the Audio Signature Descriptor from the MPEG-7 Audio part for music identification. In order to evaluate the discrimination power of the descriptors, the test sound tracks are distorted by cropping, resampling, perceptual audio coding (MPEG-1 Layer-3 coded signals at 96, 128, and 192 kbps stereo), volume change, and added noise. In terms of fast search methods, we discuss how to reduce the computational complexity using a multi-resolution scheme. The experimental results show that the proposed approach is promising.
8

Chen, Wei-Hua, and 陳威華. "Music Retrieval System Using MPEG-7 Audio Descriptor." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/y2d9h7.

Abstract:
Master's thesis, National Taipei University of Technology, Graduate Institute of Computer Science and Information Engineering, academic year 95 (2006/07).
In this thesis, we propose a music retrieval system. The main concept is to identify whether a piece of sound track matches one in the song database by using an MPEG-7 audio descriptor. However, the practicability of such a system depends on an efficient searching method: if comparing the query song against the songs in the database costs too much time, the system's practicability decreases. Based on the Audio Signature Descriptor, we propose methods for dimension reduction and the use of a KD-tree for multidimensional nearest-neighbor searching. This decreases the overall comparison time and increases the practical value of our system. We also apply several methods to improve the system's error rates (i.e., decrease the FAR and FRR) and benchmark those methods with ROC graphs. Finally, we use multi-resolution search to implement our system.
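The KD-tree nearest-neighbor search mentioned in this abstract can be sketched as follows. This is not the thesis's implementation; the database size, descriptor dimensionality, and synthetic feature values are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical database of dimension-reduced audio-signature descriptors:
# one 8-dimensional feature vector per song (values are synthetic).
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 8))

# Build the KD-tree once; each query then costs O(log n) on average
# instead of a linear scan over all songs.
tree = cKDTree(database)

# A query descriptor extracted from an unknown recording
# (here: a slightly perturbed copy of song #42).
query = database[42] + 0.01 * rng.normal(size=8)

distance, index = tree.query(query, k=1)
print(index)  # nearest song in the database
```

The same structure extends to the multi-resolution scheme by querying a coarse tree first and refining among the returned candidates.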
9

Lin, Yu-Chu, and 林祐竹. "Performance Evaluation of a Musical Retrieval System based on MPEG-7 Audio Signature Descriptor for Mobile Phone Recorded Audio." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/49f57m.

Abstract:
Master's thesis, National Taipei University of Technology, Graduate Institute of Computer Science and Information Engineering, academic year 102 (2013/14).
This thesis studies the performance of a music database system based on MPEG-7 audio signature descriptors, which accepts mobile-phone-recorded audio as the query. In this study, we first investigate the possibility of convolving a room impulse response with the reference audio to replace the mobile-phone recording. By comparing the waveforms, we conclude that this approach is highly feasible. We next add environmental noise to the simulated recorded audio as the test audio to examine various strategies for improving the identification accuracy. Simulation results reveal that filtering along the frequency axis provides higher accuracy in noisy environments. Next, we find that comparing 8 to 12 subbands is sufficient. Our last experiment concerns the accuracy versus the number of (dimension-reduced) descriptors. The results show that the identification accuracy drops dramatically if the number of dimension-reduced features falls below a certain level.
10

Hung, Ming-Jen, and 洪名人. "Music identification and retrieval using MPEG-7 audio signature descriptor with dimensionality reduction by ICA and factor analysis." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/5c88r4.

Abstract:
Master's thesis, National Taipei University of Technology, Graduate Institute of Computer Science and Information Engineering, academic year 98 (2009/10).
This thesis studies the use of MPEG-7 audio descriptors as features for a music retrieval system. Since the MPEG-7 audio descriptors representing a song contain a large number of data points, the thesis uses ICA (Independent Component Analysis) and FA (Factor Analysis) for dimensionality reduction, to cut the search time while maintaining a good recognition rate. In addition, we test the identification capabilities of the audio descriptors on music recorded from a speaker through a microphone and on music with artificial noise. To function as a retrieval system, we also propose a method to determine whether a piece of music is in the database.
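The ICA- and FA-based dimensionality reduction described above can be sketched with scikit-learn. The matrix shape, component count, and synthetic data are illustrative assumptions, not values from the thesis.

```python
import numpy as np
from sklearn.decomposition import FastICA, FactorAnalysis

# Hypothetical matrix of MPEG-7 audio-signature descriptors:
# 500 analysis frames x 32 subband values (synthetic data).
rng = np.random.default_rng(2)
descriptors = rng.normal(size=(500, 32))

# Reduce each frame to 8 components with ICA ...
ica = FastICA(n_components=8, random_state=2)
reduced_ica = ica.fit_transform(descriptors)

# ... or with Factor Analysis.
fa = FactorAnalysis(n_components=8, random_state=2)
reduced_fa = fa.fit_transform(descriptors)

print(reduced_ica.shape, reduced_fa.shape)
```

Either reduced representation can then be matched against the database in place of the full descriptor matrix, trading some recognition accuracy for search speed.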

Conference papers on the topic "Descripteur audio"

1

Luo, Ruwei, and Yun Cheng. "The algorithm of descriptor based on LPP and SIFT." In 2014 International Conference on Audio, Language and Image Processing (ICALIP). IEEE, 2014. http://dx.doi.org/10.1109/icalip.2014.7009795.

2

Karpaka Murthy, M., S. Seetha, and Flavio L. C. Padua. "Generating MPEG 7 audio descriptor for content-based retrieval." In 2011 IEEE Recent Advances in Intelligent Computational Systems (RAICS). IEEE, 2011. http://dx.doi.org/10.1109/raics.2011.6069356.

3

Peng, Yong Kang, Yi Lai Zhang, Xi En Cheng, Yi Cheng Li, and Shi Dong Zhao. "An Object Detection Method Based on the Joint Feature of the H-S Color Descriptor and the SIFT Feature." In 2018 International Conference on Audio, Language and Image Processing (ICALIP). IEEE, 2018. http://dx.doi.org/10.1109/icalip.2018.8455641.

4

Wang, Jia-Ching, Jhing-Fa Wang, Kuok Wai He, and Cheng-Shu Hsu. "Environmental Sound Classification using Hybrid SVM/KNN Classifier and MPEG-7 Audio Low-Level Descriptor." In The 2006 IEEE International Joint Conference on Neural Network Proceedings. IEEE, 2006. http://dx.doi.org/10.1109/ijcnn.2006.246644.

