
Dissertations / Theses on the topic 'Digital audio : Signal processing'

Consult the top 50 dissertations / theses for your research on the topic 'Digital audio : Signal processing.'


1

Bland, Denise. "Alias-free signal processing of nonuniformly sampled signals." Thesis, University of Westminster, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.322992.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Lindström, Fredric. "Digital signal processing methods and algorithms for audio conferencing systems /." Karlskrona : Department of Signal Processing, School of Engineering, Blekinge Institute of Technology, 2007. http://www.bth.se/fou/Forskinfo.nsf/allfirst2/9cc008f2fa400e82c12572bb00331533?OpenDocument.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Balraj, Navaneethakrishnan. "AUTOMATED ACCIDENT DETECTION IN INTERSECTIONS VIA DIGITAL AUDIO SIGNAL PROCESSING." MSSTATE, 2003. http://sun.library.msstate.edu/ETD-db/theses/available/etd-10212003-102715/.

Full text
Abstract:
The aim of this thesis is to design a system for automated accident detection in intersections. The input to the system is a three-second audio signal. The system can be operated in two modes: two-class and multi-class. The output of the two-class system is a label of "crash" or "non-crash". In the multi-class system, the output is the label of "crash" or various non-crash incidents including "pile drive", "brake", and "normal-traffic" sounds. The system has three main steps in processing the input audio signal: feature extraction, feature optimization and classification. Five different methods of feature extraction are investigated and compared; they are based on the discrete wavelet transform, fast Fourier transform, discrete cosine transform, real cepstrum transform and Mel frequency cepstral transform. Linear discriminant analysis (LDA) is used to optimize the features obtained in the feature extraction stage by linearly combining the features using different weights. Three types of statistical classifiers are investigated and compared: the nearest neighbor, nearest mean, and maximum likelihood methods. Data collected from Jackson, MS and Starkville, MS and the crash signals obtained from the Texas Transportation Institute crash test facility are used to train and test the designed system. The results showed that the wavelet-based feature extraction method with LDA and the maximum likelihood classifier is the optimum design. This wavelet-based system is also computationally inexpensive compared to the other methods. The system produced classification accuracies of 95% to 100% when the input signal has a signal-to-noise ratio of at least 0 decibels. These results show that the system is capable of effectively classifying "crash" or "non-crash" for a given input audio signal.
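As an illustration of the pipeline summarised above, the following Python sketch combines wavelet-energy features, LDA, and a Gaussian maximum-likelihood decision, which is one plausible reading of the best-performing configuration; the wavelet choice, feature layout, and data arrays are assumptions, not the thesis implementation.

```python
# Minimal sketch of a crash / non-crash audio classifier in the spirit of the
# pipeline above: wavelet features -> LDA -> maximum-likelihood (Gaussian) rule.
# The wavelet, feature layout, and training arrays are illustrative assumptions.
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def wavelet_features(signal, wavelet="db4", levels=5):
    """Average energy of each wavelet subband of a 3-second audio clip."""
    coeffs = pywt.wavedec(signal, wavelet, level=levels)
    return np.array([np.sum(c ** 2) / len(c) for c in coeffs])

def train(clips, labels):
    y = np.asarray(labels)
    X = np.vstack([wavelet_features(c) for c in clips])
    lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
    Z = lda.transform(X)
    # Per-class Gaussian parameters for the maximum-likelihood decision rule.
    params = {k: (Z[y == k].mean(), Z[y == k].std() + 1e-9) for k in np.unique(y)}
    return lda, params

def classify(clip, lda, params):
    z = lda.transform(wavelet_features(clip).reshape(1, -1))[0, 0]
    # Choose the class whose Gaussian gives the highest likelihood of z.
    def loglik(mu_sigma):
        mu, sigma = mu_sigma
        return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma)
    return max(params, key=lambda k: loglik(params[k]))
```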
APA, Harvard, Vancouver, ISO, and other styles
4

Amphlett, Robert W. "Multiprocessor techniques for high quality digital audio." Thesis, University of Bristol, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.337273.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Ekström, Mattias. "Acoustic feedback suppression in audio mixer for PA applications." Thesis, Umeå universitet, Institutionen för fysik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-136841.

Full text
Abstract:
When a speaker is addressing an audience, a PA system consisting of a microphone and a loudspeaker is often used. If the microphone picks up too much of the loudspeaker energy, acoustic feedback in the form of an unwanted characteristic howling can occur. Limes Audio is a software company that specializes in improving sound quality in digital communications, mainly conference telephony, and has developed a reference product, the Magneto mixer, to demonstrate the capability of their software TrueVoice. The company now wishes to expand the field of usage for the Magneto mixer to enable it to work as a microphone mixer in PA scenarios, and for this, a feedback suppression feature is needed. This master’s thesis aims at surveying the market and the literature in the field and specifying the requirements for a feedback suppression feature. Three methods for suppressing howling feedback are evaluated through simulations and compared in terms of maximum stable gain (MSG) and subjective listening experience. The method that performed the best based on these criteria was acoustic feedback cancellation with a 5 Hz frequency shift on the loudspeaker signal. This method makes use of an adaptive filter to model the acoustic feedback path and to remove the feedback component from the microphone signal. In the simulations, the method was able to increase the stable gain by approximately 10 dB while maintaining a good sound quality.
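To make the winning method concrete, here is a minimal Python sketch of adaptive feedback cancellation combined with a 5 Hz frequency shift on the loudspeaker signal. It is not the TrueVoice or Magneto implementation; the filter length, step size, and sample rate are assumptions.

```python
# Minimal sketch of adaptive feedback cancellation with a 5 Hz frequency shift,
# in the spirit of the method described above. Filter length, NLMS step size,
# and the sample rate are assumptions, not the product's parameters.
import numpy as np
from scipy.signal import hilbert

fs = 16000
f_shift = 5.0            # frequency shift applied to the loudspeaker signal
L = 256                  # adaptive filter length (taps)
mu, eps = 0.1, 1e-6      # NLMS step size and regularisation

def frequency_shift(x, shift_hz, fs):
    """Shift the whole spectrum of x by shift_hz using the analytic signal."""
    n = np.arange(len(x))
    return np.real(hilbert(x) * np.exp(2j * np.pi * shift_hz * n / fs))

def cancel_feedback(mic, loudspeaker):
    """NLMS estimate of the feedback path; returns the feedback-reduced signal."""
    w = np.zeros(L)                  # adaptive model of the feedback path
    buf = np.zeros(L)                # most recent loudspeaker samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = loudspeaker[n]
        y_hat = w @ buf              # predicted feedback component
        e = mic[n] - y_hat           # error = speech + residual feedback
        out[n] = e
        w += mu * e * buf / (buf @ buf + eps)   # NLMS update
    return out

# Usage idea: shift the signal sent to the loudspeaker, then adapt against it.
# speaker_out = frequency_shift(mixer_output, f_shift, fs)
# cleaned_mic = cancel_feedback(mic_signal, speaker_out)
```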
APA, Harvard, Vancouver, ISO, and other styles
6

Chiu, Leung Kin. "Efficient audio signal processing for embedded systems." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44775.

Full text
Abstract:
We investigated two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller," using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management by suppressing signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. A machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier.
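The loudspeaker enhancement described above combines bass extension with dynamic range compression; the short Python sketch below shows only a generic feed-forward compressor of the kind that could serve as that second stage. The threshold, ratio, and time constants are illustrative assumptions, not values from the dissertation.

```python
# A small sketch of a feed-forward dynamic range compressor, one ingredient of
# the "richer and fuller" piezo enhancement described above. All parameter
# values here are assumptions for illustration only.
import numpy as np

def compress(x, fs, threshold_db=-20.0, ratio=4.0, attack_ms=5.0, release_ms=50.0):
    """Compress a mono float signal in [-1, 1] above threshold_db."""
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = np.empty_like(x, dtype=float)
    for n, s in enumerate(x):
        level = abs(s)
        a = a_att if level > env else a_rel      # attack / release smoothing
        env = a * env + (1 - a) * level
        level_db = 20 * np.log10(max(env, 1e-9))
        over = max(0.0, level_db - threshold_db)  # amount above the threshold
        gain_db = -over * (1 - 1 / ratio)         # gain reduction in dB
        out[n] = s * 10 ** (gain_db / 20)
    return out
```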
APA, Harvard, Vancouver, ISO, and other styles
7

Lipstreu, William F. "Digital Signal Processing Laboratory Using Real-Time Implementations of Audio Applications." Cleveland, Ohio : Case Western Reserve University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=case1240836810.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Lanciani, Christopher A. "Compressed-domain processing of MPEG audio signals." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13760.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Trinkaus, Trevor R. "Perceptual coding of audio and diverse speech signals." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13883.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Vemulapalli, Smita. "Audio-video based handwritten mathematical content recognition." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45958.

Full text
Abstract:
Recognizing handwritten mathematical content is a challenging problem, and more so when such content appears in classroom videos. However, given the fact that in such videos the handwritten text and the accompanying audio refer to the same content, a combination of video and audio based recognizer has the potential to significantly improve the content recognition accuracy. This dissertation, using a combination of video and audio based recognizers, focuses on improving the recognition accuracy associated with handwritten mathematical content in such videos. Our approach makes use of a video recognizer as the primary recognizer and a multi-stage assembly, developed as part of this research, is used to facilitate effective combination with an audio recognizer. Specifically, we address the following challenges related to audio-video based handwritten mathematical content recognition: (1) Video Preprocessing - generates a timestamped sequence of segmented characters from the classroom video in the face of occlusions and shadows caused by the instructor, (2) Ambiguity Detection - determines the subset of input characters that may have been incorrectly recognized by the video based recognizer and forwards this subset for disambiguation, (3) A/V Synchronization - establishes correspondence between the handwritten character and the spoken content, (4) A/V Combination - combines the synchronized outputs from the video and audio based recognizers and generates the final recognized character, and (5) Grammar Assisted A/V Based Mathematical Content Recognition - utilizes a base mathematical speech grammar for both character and structure disambiguation. Experiments conducted using videos recorded in a classroom-like environment demonstrate the significant improvements in recognition accuracy that can be achieved using our techniques.
APA, Harvard, Vancouver, ISO, and other styles
11

Vercellesi, G. "Digital Audio Processing in MP3 Compressed Domain and Evaluation of Perceived Audio Quality." Doctoral thesis, Università degli Studi di Milano, 2006. http://hdl.handle.net/2434/36412.

Full text
Abstract:
The state of the art offers many digital audio signal processing techniques that operate in the uncompressed (PCM, Pulse Code Modulation) domain. The literature contains numerous methods for modifying an audio signal in both the time and the frequency domain, for example to normalize its intensity or to apply filters and special effects. The MP3 format, however, has not been studied in comparable depth: the most significant works address the MP1 and MP2 formats, there is no exhaustive formalization of digital audio signal processing in the MP3 compressed domain, and no software framework exists for developing and implementing arbitrary processing algorithms in that domain. Only a few simple tools exist that directly split and join MP3 files or adjust the volume in a rudimentary way. In this dissertation we define different approaches for developing arbitrary digital signal processing algorithms in the MP3 compressed domain. The contributions of this dissertation are: the formalization of the problem of direct MP3 processing by defining different approaches (or levels) with respect to the various steps of the decoding/encoding chain; the development of algorithms for the MP3 format that work as close as possible to the MP3 domain; and the improvement and customization of the methods and protocols described in the recommendations of the International Telecommunication Union (ITU-R) for evaluating objective and subjective perceived audio quality. We define three different domains in which MP3-coded audio information can be manipulated. We develop algorithms for frame shifting, RMS-based gain control, filtering and channel selection; the filters and channel selection are used to downgrade MP3 files. For each algorithm we choose the best approach, seeking the best trade-off among computation time, perceived audio quality and problems related to unmasking and aliasing. This formalization provides the conceptual basis for a software framework that allows any algorithm to be ported from the PCM domain to the MP3 domain. Finally, we improve and customize the ITU-R methods and protocols for evaluating objective and subjective perceived audio quality, assess the objective performance of modern MP3 codecs with respect to tandem coding, study how reliable the objective tests are by comparing them with the subjective ones, and compare MP3-coded audio processed with the traditional approach against audio processed with the direct approach to editing.
APA, Harvard, Vancouver, ISO, and other styles
12

TERENZI, Alessandro. "Innovative Digital Signal Processing Methodologies for Identification and Analysis of Real Audio Systems." Doctoral thesis, Università Politecnica delle Marche, 2021. http://hdl.handle.net/11566/287822.

Full text
Abstract:
Many real-world audio systems exist; each has its own characteristics, but almost all of them can be identified by the fact that they generate or modify a sound. If a natural or artificial system can be defined as a sound system, then digital signal processing techniques can be applied to study and emulate it. This thesis discusses innovative digital signal processing methodologies applied to real audio systems. In particular, three different audio systems are considered: vacuum-tube based nonlinear audio devices, with particular attention to guitar and hi-fi amplifiers; the room acoustic environment and its effect on sound propagation; and finally the sound emitted by honey bees in a beehive. Regarding the first system, innovative approaches for the identification of Volterra series and Hammerstein models are proposed, in particular an approach to overcome some limitations of Volterra series identification. The application of a sub-band structure to reduce the computational cost and increase the convergence speed of adaptive Hammerstein model identification is proposed as well. Finally, an innovative approach for measuring several distortion parameters from a single measurement, exploiting a generalized Hammerstein model, is presented. For the second system, the results of applying a multi-point equalizer in two different situations are reported. In the first case, it is shown that multi-point equalization can be used not only to compensate the acoustical anomalies of a room, but also to improve the frequency response of vibrating transducers mounted on a rigid surface. The second contribution shows how a sub-band approach can reduce the computational cost and increase the speed of an adaptive algorithm for multi-point, multi-channel equalization. Finally, the focus turns to a natural sound system, a honey bee colony. An innovative acquisition system for monitoring the sound of beehives is presented; the approaches developed for sound analysis are then described and applied to recordings from two real-world situations, and the results obtained with classification algorithms are reported. The final part of the work presents some minor contributions, still related to signal processing applied to real sound systems: an implementation of an active noise control system, and two digital-effect algorithms, the first of which improves the sound performance of compact loudspeakers while the second generates a stereophonic effect for electric guitars.
APA, Harvard, Vancouver, ISO, and other styles
13

Yu, Jie. "Design and analysis of fixed and adaptive sigma-delta modulators." Thesis, King's College London (University of London), 1992. https://kclpure.kcl.ac.uk/portal/en/theses/design-and-analysis-of-fixed-and-adaptive-sigmadelta-modulators(6013d6b6-09fe-46bf-bd4b-5499cc30f4dc).html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Linton, Ken N. "Digital mixing consoles : parallel architectures and taskforce scheduling strategies." Thesis, Durham University, 1995. http://etheses.dur.ac.uk/5371/.

Full text
Abstract:
This thesis is concerned specifically with the implementation of large-scale professional DMCs. The design of such multi-DSP audio products is extremely challenging: one cannot simply lash together n DSPs and obtain n-times the performance of a sole device. M-P models developed here show that topology and IPC mechanisms have critical design implications. Alternative processor technologies are investigated with respect to the requirements of DMC architectures. An extensive analysis of M-P topologies is undertaken using the metrics provided by the TPG tool. Novel methods supporting DSP message-passing connectivity lead to the development of a hybrid audio M-P (HYMIPS) employing these techniques. A DMC model demonstrates the impact of task allocation on ASP M-P architectures. Five application-specific heuristics and four static-labelling schemes are developed for scheduling console taskforces on M-Ps. An integrated research framework and DCS engine enable scheduling strategies to be analysed with regard to the DMC problem domain. Three scheduling algorithms (CPM, DYN and AST) and three IPC mechanisms (FWE, NSL and NML) are investigated. Dynamic-labelling strategies and mix-bus granularity issues are further studied in detail. To summarise, this thesis elucidates those topologies, construction techniques and scheduling algorithms appropriate to professional DMC systems.
APA, Harvard, Vancouver, ISO, and other styles
15

Lucey, Simon. "Audio-visual speech processing." Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Full text
Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic-based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g., acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular for the tasks of isolated-word speech recognition and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so that their combined performance is greater than that of any of the classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need to combine independent classifiers.
APA, Harvard, Vancouver, ISO, and other styles
16

Jacobs, Deon. "Digital pulse width modulation for Class-D audio amplifiers." Thesis, Stellenbosch : University of Stellenbosch, 2006. http://hdl.handle.net/10019.1/1574.

Full text
Abstract:
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006.
Digital audio data storage media have long been used within the consumer market. Today, because of the advancement of processor clock speeds and increased MOSFET switching capabilities, digital audio data formats can be directly amplified using power electronic inverters. These amplifiers, known as Class-D, have an advantage over their analogue counterparts because of their high efficiency. This thesis deals with the signal processing algorithms necessary to convert the digital audio data obtained from the source to a digital pulse width modulated signal which controls a full-bridge inverter for audio amplification. These algorithms address difficulties experienced in the past which prevented high-fidelity digital pulse width modulators from being implemented. The signal processing algorithms are divided into modular blocks, each of which is defined in theory, designed and simulated in Matlab® and then implemented within VHDL firmware. These firmware blocks are then used to realize a Class-D audio amplifier.
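As a rough illustration of the core idea, the following Python sketch generates a uniform-sampling digital PWM stream from audio samples, the kind of signal that would drive a full-bridge Class-D stage; the carrier resolution and the absence of noise shaping are simplifications relative to the thesis's VHDL design.

```python
# Illustrative sketch of uniform-sampling digital pulse width modulation for a
# Class-D stage: each audio sample sets the duty cycle of one carrier period.
# Counter resolution and sample rate are assumptions, not the thesis design.
import numpy as np

def digital_pwm(audio, counts_per_period=512):
    """audio: mono samples in [-1, 1], one PWM period per sample.
    Returns a 0/1 waveform at counts_per_period times the audio rate."""
    duty = np.clip((np.asarray(audio) + 1.0) / 2.0, 0.0, 1.0)   # [-1,1] -> [0,1]
    ramp = np.arange(counts_per_period) / counts_per_period
    # Output is high while the ramp counter is below the duty value.
    return (ramp[None, :] < duty[:, None]).astype(np.uint8).ravel()

# Example: a 1 kHz tone sampled at 48 kHz becomes a 0/1 stream at 48 kHz * 512.
fs = 48000
t = np.arange(fs // 100) / fs
pwm = digital_pwm(0.8 * np.sin(2 * np.pi * 1000 * t))
```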
APA, Harvard, Vancouver, ISO, and other styles
17

Rocha, Ryan D. "A Frequency-Domain Method for Active Acoustic Cancellation of Known Audio Sources." DigitalCommons@CalPoly, 2014. https://digitalcommons.calpoly.edu/theses/1240.

Full text
Abstract:
Active noise control (ANC) is a real-time process in which a system measures an external, unwanted sound source and produces a canceling waveform. The cancellation is due to destructive interference by a perfect copy of the received signal phase-shifted by 180 degrees. Existing active noise control systems process the incoming and outgoing audio on a sample-by-sample basis, requiring a high-speed digital signal processor (DSP) and analog-to-digital converters (ADCs) with strict timing requirements on the order of tens of microseconds. These timing requirements determine the maximum sample rate and bit size as well as the maximum attenuation that the system can achieve. In traditional noise cancellation systems, the general assumption is that all unwanted sound is indeterminate. However, there are many instances in which an unwanted sound source is predictable, such as in the case of a song. This thesis presents a method for active acoustic cancellation of a known audio signal using the frequency characteristics of the known audio signal compared to that of a sampled, filtered excerpt of the same known audio signal. In this procedure, we must first correctly locate the sample index at which a measured audio excerpt begins via the cross-correlation function. Next, we obtain the frequency characteristics of both the known source (WAVE file of the song) and the measured unwanted audio by taking the Fast Fourier Transform (FFT) of each signal, and calculate the effective environmental transfer function (degradation function) by taking the ratio of the two complex frequency-domain results. Finally, we attempt to recreate the environmental audio from the known data and produce an inverted, synchronized, and amplitude-matched signal to cancel the audio via destructive interference. Throughout the process, we employ many signal conditioning methods such as FIR filtering, median filtering, windowing, and deconvolution. We illustrate this frequency-domain method in National Instruments' LabVIEW running on the Windows operating system, and discuss its reliability, areas for improvement, and potential future applications in mobile technologies. We show that under ideal conditions (unwanted sound is a known white noise source, and microphone, loudspeaker, and environmental filter frequency responses are all perfectly flat), we can achieve a theoretical maximum attenuation of approximately 300 dB. If we replace the white noise source with an actual song and the environmental filter with a low-order linear filter, then we can achieve maximum attenuation in the range of 50-70 dB. However, in a real-world environment, with additional noise and imperfect microphones, speakers, synchronization, and amplitude-matching, we can expect to see attenuation values in the range of 10-20 dB.
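The following Python sketch mirrors the steps described above in a highly simplified form: cross-correlation to find where the measured excerpt starts in the known recording, an FFT ratio as the degradation-function estimate, and a phase-inverted reconstruction. Block handling, regularisation, and the thesis's signal conditioning (FIR/median filtering, windowing, deconvolution) are omitted or replaced by assumptions.

```python
# Sketch of the frequency-domain idea above: align a known reference recording
# with the measured audio via cross-correlation, estimate the environmental
# transfer function as an FFT ratio, and synthesise an inverted canceller.
import numpy as np

def align_offset(measured, reference):
    """Sample offset of the measured excerpt inside the known reference."""
    corr = np.correlate(reference, measured, mode="valid")
    return int(np.argmax(corr))

def cancellation_signal(measured, reference_excerpt, eps=1e-6):
    """Estimate H = FFT(measured)/FFT(reference) and return the anti-signal."""
    M = np.fft.rfft(measured)
    R = np.fft.rfft(reference_excerpt)
    H = M / (R + eps)                        # degradation (environment) estimate
    predicted = np.fft.irfft(H * R, n=len(measured))
    return -predicted                        # 180-degree phase inversion

# Usage idea (mic_block and song_samples are assumed numpy arrays):
# start = align_offset(mic_block, song_samples)
# anti = cancellation_signal(mic_block, song_samples[start:start + len(mic_block)])
```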
APA, Harvard, Vancouver, ISO, and other styles
18

Amatriain, Xavier. "An Object-oriented metamodel for digital signal processing with a focus on audio and music." Doctoral thesis, Universitat Pompeu Fabra, 2005. http://hdl.handle.net/10803/667051.

Full text
Abstract:
Classical models for information transmission such as Shannon and Weaver's still tend to be looked at as the only possible scenarios where signal processing applications can be formally modeled. Meanwhile, other disciplines like Computer Science have developed different paradigms that offer the possibility of looking at the same problem from a different perspective. One of the most favored approaches for software analysis and design is the Object Oriented paradigm, which proposes to model a system in terms of objects and relations between objects. An object is an instance of a real world or abstract concept and it is made up of an identity, a state, and a behavior. An object oriented system is thus described in terms of its internal objects, messages that are passed in between them and the way these objects respond to incoming messages by executing a particular method. Although object oriented technologies have been applied to signal processing systems, no previous comprehensive approach has been made to translate all the advantages and consequences, both practical and formal, of this paradigm to the signal processing domain. This dissertation defends the thesis that a generic signal processing system can be thoroughly and effectively described using the object oriented paradigm. For doing so, the Digital Signal Processing Object Oriented Metamodel offers a classification of signal processing objects in terms of their role in a DSP system. Objects are classified into two main categories: objects that process and objects that act as data containers. This OO metamodel turns out to be closely related to Dataflow Process Networks, a graphical model of computation that has already proven useful for modeling signal processing systems. In our study we highlight the similarities of both models to conclude that object-orientation is in fact a superset of process-oriented models and therefore the object-oriented paradigm can be proposed as a general approach for system modeling. Furthermore, it turns out that nowadays the natural target for many signal processing applications is the computer and its software environment and the object-oriented paradigm becomes a natural conceptual framework where the different development phases fit. CLAM (C++ Library for Audio and Music) is a framework for developing music and audio applications that has been designed bearing this conceptual model in mind. CLAM is both the origin and the proof of concept of the Metamodel. On one hand its design process and rationale has led to the definition of the metamodel. On the other hand, it demonstrates that the metamodel proposed is more than an abstract wish-list and can be used to model working and efficient applications in the music and audio domain. The basic Object Oriented metamodel for signal processing systems can be extended to include the idea of Content Based Processing. OO concepts like Inheritance Hierarchies, Polymorphism or Late Binding can be used to model run-time classification of media objects and to deal with the semantic information present in the signal rather than just treating the signal itself. This leads us to the definition of a new metamodel of information transmission that, unlike the traditional ones, does care about meaning. Finally, the OO paradigm can also be used to model higher-level symbolic domains related to signal processing. For example, music (as a whole) can be effectively modeled using the OO paradigm. An OO model for music is proposed as an instance of the basic signal processing metamodel and the MetriX language is presented as its proof of concept.
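To give a flavour of the metamodel's central distinction between processing objects and data objects, here is a toy sketch in Python rather than the C++ used by CLAM; the class and method names are invented for the example.

```python
# Toy illustration of the metamodel's two object categories -- processing
# objects and data objects. Names are hypothetical, not CLAM's API.
import numpy as np

class AudioBuffer:                       # a "data" object: identity + state
    def __init__(self, samples, sample_rate):
        self.samples = np.asarray(samples, dtype=float)
        self.sample_rate = sample_rate

class Gain:                              # a "processing" object: behaviour
    def __init__(self, gain_db=0.0):
        self.gain_db = gain_db

    def do(self, audio: AudioBuffer) -> AudioBuffer:
        factor = 10 ** (self.gain_db / 20)
        return AudioBuffer(factor * audio.samples, audio.sample_rate)

# A dataflow-style network is then processing objects passing data objects:
# out = Gain(-6.0).do(AudioBuffer(x, 44100))
```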
APA, Harvard, Vancouver, ISO, and other styles
19

Bengtsson, Fredrik, and Rikard Berglund. "Digital compensation of distortion in audio systems." Thesis, Linköping University, Department of Electrical Engineering, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-56392.

Full text
Abstract:

The advancement of computational power in low-cost FPGAs gives the opportunity to implement real-time compensation of loudspeakers and audio systems. The need for expensive commercial audio systems is reduced when the fidelity of much cheaper audio systems can easily be improved by real-time compensation. The topic of this thesis is to investigate and evaluate methods for digital compensation of distortion in audio systems. More specifically, a VHDL module is implemented to, when necessary, alleviate the problem of drastically deteriorating bass fidelity that appears when the input power is too high.

APA, Harvard, Vancouver, ISO, and other styles
20

Anantharaman, B. "Compressed Domain Processing of MPEG Audio." Thesis, Indian Institute of Science, 2001. http://hdl.handle.net/2005/68.

Full text
Abstract:
MPEG audio compression techniques significantly reduce the storage and transmission requirements for high-quality digital audio. However, compression complicates the processing of audio in many applications. If a compressed audio signal is to be processed, a direct method would be to decode the compressed signal, process the decoded signal and re-encode it. This is computationally expensive due to the complexity of the MPEG filter bank. This thesis deals with the processing of MPEG compressed audio. The main contributions of this thesis are a) extracting wavelet coefficients in the MPEG compressed domain, b) wavelet-based pitch extraction in the MPEG compressed domain, c) time scale modification of MPEG audio, and d) watermarking of MPEG audio. The research contributions start with a technique for calculating several levels of wavelet coefficients from the output of the MPEG analysis filter bank. The technique exploits the Toeplitz structure which arises when the MPEG and wavelet filter banks are represented in matrix form. The computational complexities of extracting several levels of wavelet coefficients after decoding the compressed signal and directly from the output of the MPEG analysis filter bank are compared. The proposed technique is found to be computationally efficient for extracting higher levels of wavelet coefficients. Extracting pitch in the compressed domain becomes essential when large multimedia databases need to be indexed. For example, one may be interested in listening to a particular speaker or to male/female audio segments in a multimedia document. For this application, pitch information is one of the most basic and important features required. Pitch is basically the time interval between two successive glottal closures. Glottal closures are accompanied by sharp transients in the speech signal, which in turn give rise to local maxima in the wavelet coefficients. Pitch can therefore be calculated by finding the time interval between two successive maxima in the wavelet coefficients. It is shown that the computational complexity of extracting pitch in the compressed domain is less than 7% of that of uncompressed-domain processing. An algorithm for extracting pitch in the compressed domain is proposed, and its results for synthetic signals and for words uttered by male and female speakers are reported. In a number of important applications, one needs to modify an audio signal to render it more useful than the original. Typical applications include changing the time evolution of an audio signal (increasing or decreasing the rate of articulation of a speaker), or adapting a given audio sequence to a given video sequence. In this thesis, time scale modifications are obtained in the subband domain such that when the modified subband signals are given to the MPEG synthesis filter bank, the desired time scale modification of the decoded signal is achieved. This is done by making use of sinusoidal modeling [1]. Each subband signal is modeled in terms of parameters such as amplitude, phase and frequency and is subsequently synthesised using these parameters with Ls = k La, where Ls is the length of the synthesis window, k is the time scale factor and La is the length of the analysis window. As the PCM version of the time-scaled signal is not available, psychoacoustic-model-based bit allocation cannot be used; hence a new bit allocation is done using a subband coding algorithm. This method has been satisfactorily tested for time scale expansion and compression of speech and music signals. The recent growth of multimedia systems has increased the need for protecting digital media. Digital watermarking has been proposed as a method for protecting digital documents. The watermark needs to be added to the signal in such a way that it does not cause audible distortions. However, the idea behind lossy MPEG encoders is to remove, or make insignificant, those portions of the signal which do not affect human hearing. This renders the watermark insignificant, and hence proving ownership of the signal becomes difficult when an audio signal is compressed. The existing compressed-domain methods merely change the bits or the scale factors according to a key. Though simple, these methods are not robust to attacks. Further, these methods require the original signal to be available in the verification process. In this thesis we propose a watermarking method based on the spread spectrum technique which does not require the original signal during the verification process. It is also shown to be more robust than the existing methods. In our method the watermark is spread across many subband samples. Here two factors need to be considered: a) the watermark is to be embedded only in those subbands in which the added noise remains inaudible, and b) the watermark should be added to those subbands which have sufficient bit allocation so that the watermark does not become insignificant due to lack of bit allocation. Embedding the watermark in the lower subbands would cause distortion, and in the higher subbands it would prove futile as the bit allocation in these subbands is practically zero. Considering all these factors, one can introduce noise to samples across many frames corresponding to subbands 4 to 8. In the verification process, it is sufficient to have the key/code and the possibly attacked signal. This method has been satisfactorily tested for robustness to scale-factor and LSB changes and to MPEG decoding and re-encoding.
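Of the contributions above, the spread-spectrum watermarking idea is easy to sketch: a key-driven pseudo-random sequence is spread over the samples of subbands 4 to 8 and later detected blindly by correlation. The Python below is only an illustration under those assumptions; the embedding strength, detection threshold, and subband array layout are not taken from the thesis.

```python
# Compact sketch of spread-spectrum watermarking of MPEG subband samples.
# Real embedding must also respect the psychoacoustic bit allocation.
import numpy as np

def embed(subbands, key, strength=0.01, bands=range(4, 9)):
    """subbands: array [n_subbands, n_samples]; adds a key-driven PN sequence."""
    rng = np.random.default_rng(key)
    marked = subbands.copy()
    for b in bands:
        pn = rng.choice([-1.0, 1.0], size=subbands.shape[1])
        marked[b] += strength * pn           # spread the mark over many samples
    return marked

def detect(subbands, key, strength=0.01, bands=range(4, 9)):
    """Blind detection: correlate the received subbands with the key's PN codes."""
    rng = np.random.default_rng(key)
    score = 0.0
    for b in bands:
        pn = rng.choice([-1.0, 1.0], size=subbands.shape[1])
        score += np.dot(subbands[b], pn) / len(pn)
    # Expected score is roughly strength * number of marked bands when present.
    return score > 0.5 * strength * len(bands)
```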
APA, Harvard, Vancouver, ISO, and other styles
21

Trombley, Michael. "Design of a Programmable Four-Preset Guitar Pedal." Wright State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=wright1515591271810386.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Fan, Yun-Hui. "A stereo audio coder with a nearly constant signal-to-noise ratio." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/14788.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Langelaar, Johannes, Mattsson Adam Strömme, and Filip Natvig. "Development of real time audio equalizer application using MATLAB App Designer." Thesis, Uppsala universitet, Signaler och System, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-388577.

Full text
Abstract:
This paper outlines the design of a high-precision graphic audio equalizer built from digital filters in parallel, along with its implementation in MATLAB App Designer. The equalizer comprises 31 bands spaced at one-third-octave frequency ratios, and its frequency response is controlled by 63 filters. The application can process audio signals in real time, both recorded from a microphone and read from audio files. While processing, it displays an FFT plot of the output sound, also in real time, with a knob by which the refresh rate can be adjusted. The actual frequency response proved to match the desired one accurately, but the matching is computationally demanding. Even higher accuracy would entail a computational complexity beyond the power of ordinary computers and was thus judged inappropriate. As a result, the final application provides most laptops with both high precision and proper functionality.
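For readers unfamiliar with the structure, a parallel one-third-octave graphic equaliser can be sketched as a bank of band-pass filters whose weighted outputs are summed, as below in Python with SciPy. The resonator design and Q value are assumptions, the interaction between overlapping bands is ignored, and the thesis's 63-filter MATLAB design differs in detail.

```python
# Rough sketch of a parallel one-third-octave graphic equaliser: 31 band-pass
# resonators whose weighted outputs are summed. Filter type and Q are assumptions.
import numpy as np
from scipy.signal import iirpeak, lfilter

fs = 44100
# 31 one-third-octave centres around 1 kHz (~20 Hz to ~20 kHz).
centres = 1000.0 * 2.0 ** (np.arange(-17, 14) / 3.0)

def make_bank(fs, q=4.32):
    """One second-order resonator per one-third-octave band."""
    return [iirpeak(fc, q, fs=fs) for fc in centres if fc < fs / 2]

def equalise(x, gains_db, bank):
    """Parallel structure: weighted sum of the band-pass outputs."""
    gains = 10.0 ** (np.asarray(gains_db) / 20.0)
    y = np.zeros(len(x), dtype=float)
    for (b, a), g in zip(bank, gains):
        y += g * lfilter(b, a, x)
    return y

# Usage idea: boost the lowest 10 bands by 6 dB and cut the rest by 3 dB.
# bank = make_bank(fs)
# y = equalise(x, [6] * 10 + [-3] * 21, bank)
```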
APA, Harvard, Vancouver, ISO, and other styles
24

Townsend, Phil. "Enhancements to the Generalized Sidelobe Canceller for Audio Beamforming in an Immersive Environment." UKnowledge, 2009. http://uknowledge.uky.edu/gradschool_theses/645.

Full text
Abstract:
The Generalized Sidelobe Canceller is an adaptive algorithm for optimally estimating the parameters for beamforming, the signal processing technique of combining data from an array of sensors to improve SNR at a point in space. This work focuses on the algorithm’s application to widely-separated microphone arrays with irregular distributions used for human voice capture. Methods are presented for improving the performance of the algorithm’s blocking matrix, a stage that creates a noise reference for elimination, by proposing a stochastic model for amplitude correction and enhanced use of cross correlation for phase correction and time-difference of arrival estimation via a correlation coefficient threshold. This correlation technique is also applied to a multilateration algorithm for an efficient method of explicit target tracking. In addition, the underlying microphone array geometry is studied with parameters and guidelines for evaluation proposed. Finally, an analysis of the stability of the system is performed with respect to its adaptation parameters.
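One ingredient described above, accepting a time-difference-of-arrival estimate only when the normalised cross-correlation peak is strong enough, can be sketched in a few lines of Python; the threshold value and the normalisation are assumptions, not the thesis parameters.

```python
# Sketch of cross-correlation TDOA estimation gated by a correlation-coefficient
# threshold, in the spirit of the approach above. Threshold is an assumption.
import numpy as np

def tdoa(x1, x2, fs, min_coeff=0.3):
    """Return (delay_seconds, peak_coefficient), or (None, peak) if unreliable."""
    x1 = x1 - np.mean(x1)
    x2 = x2 - np.mean(x2)
    corr = np.correlate(x1, x2, mode="full")
    corr /= (np.linalg.norm(x1) * np.linalg.norm(x2) + 1e-12)  # normalise peak
    best = int(np.argmax(corr))
    lag = best - (len(x2) - 1)
    peak = corr[best]
    if peak < min_coeff:          # weak correlation: reject the estimate
        return None, peak
    return lag / fs, peak
```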
APA, Harvard, Vancouver, ISO, and other styles
25

Chavez, Rudy, Frank Favela, Adrian Ontiveros, Matthew Smith, and Matthew Wallace. "Design and Development of a Digital Signal Processing System that Responds Automatically to an Audio Trigger Event." International Foundation for Telemetering, 2013. http://hdl.handle.net/10150/579586.

Full text
Abstract:
ITC/USA 2013 Conference Proceedings / The Forty-Ninth Annual International Telemetering Conference and Technical Exhibition / October 21-24, 2013 / Bally's Hotel & Convention Center, Las Vegas, NV
This paper presents the development of a signal processing system that responds automatically to an audio trigger event. The audio trigger event, for example, can be a gunshot, and the system's response is to fire back at the source. The proposed system uses microcontrollers to digitally process audio signals coming from the audio trigger. Once the event is detected, the location of that source relative to the base location is estimated and retaliatory measures are automatically activated by the system. In our study, gunshot sounds are replaced by recorded audio tones and the retaliatory mechanism consists of a Nerf dart being fired toward the sound source. Sound localization is achieved via time-stamping the digitized microphone signals. With an array of microphones, angular components as well as radial components can be determined. Servo motors are used to control the turret-type mechanism for firing Nerf darts back at the source. The project has potential for both lethal and non-lethal responses to a firearm discharge. The work is based on a 2013 senior undergraduate capstone project.
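As a small illustration of how such a system can turn microphone timing into an aiming angle, the sketch below applies the standard far-field relation between a microphone pair's time difference and the angle of arrival; the microphone spacing and speed of sound are example values rather than the project's hardware parameters.

```python
# Bearing from a microphone-pair time difference (far-field approximation).
# Spacing and sound speed are example values, not the project's hardware.
import numpy as np

C = 343.0        # speed of sound in m/s
D = 0.30         # microphone spacing in metres (assumed)

def bearing_deg(delta_t):
    """Angle of arrival relative to the array broadside, from a pair's TDOA."""
    s = np.clip(C * delta_t / D, -1.0, 1.0)   # keep arcsin in range
    return np.degrees(np.arcsin(s))

# Example: a 0.4 ms time difference corresponds to roughly 27 degrees off broadside.
# print(bearing_deg(0.0004))
```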
APA, Harvard, Vancouver, ISO, and other styles
26

Lenssen, Nathan. "Applications of Fourier Analysis to Audio Signal Processing: An Investigation of Chord Detection Algorithms." Scholarship @ Claremont, 2013. http://scholarship.claremont.edu/cmc_theses/704.

Full text
Abstract:
The discrete Fourier transform has become an essential tool in the analysis of digital signals. Applications have become widespread since the discovery of the Fast Fourier Transform and the rise of personal computers. The field of digital signal processing is an exciting intersection of mathematics, statistics, and electrical engineering. In this study we aim to gain understanding of the mathematics behind algorithms that can extract chord information from recorded music. We investigate basic music theory, introduce and derive the discrete Fourier transform, and apply Fourier analysis to audio files to extract spectral data.
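The kind of algorithm investigated can be illustrated with a minimal Python chord detector: take the DFT of a frame, fold the magnitude spectrum into twelve pitch classes, and correlate with binary major/minor triad templates. The tuning reference, frequency range, and templates are assumptions, not the thesis's exact method.

```python
# Minimal DFT-based chord detector: spectrum -> pitch-class profile -> templates.
# Tuning reference (A4 = 440 Hz) and binary triad templates are assumptions.
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma(frame, fs):
    """Fold DFT magnitude into a 12-bin pitch-class profile."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    pcp = np.zeros(12)
    for f, mag in zip(freqs[1:], spectrum[1:]):
        if 55.0 <= f <= 5000.0:
            midi = 69 + 12 * np.log2(f / 440.0)     # frequency -> MIDI number
            pcp[int(round(midi)) % 12] += mag
    return pcp / (np.max(pcp) + 1e-12)

def detect_chord(frame, fs):
    """Best-matching major or minor triad by correlation with binary templates."""
    pcp = chroma(frame, fs)
    best, best_score = None, -np.inf
    for root in range(12):
        for name, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            template = np.zeros(12)
            template[[(root + i) % 12 for i in intervals]] = 1.0
            score = float(np.dot(pcp, template))
            if score > best_score:
                best, best_score = f"{NOTE_NAMES[root]}{name}", score
    return best
```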
APA, Harvard, Vancouver, ISO, and other styles
27

Jackson, Judith. "Generative Processes for Audification." Oberlin College Honors Theses / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=oberlin1528280288385596.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Prätzlich, Thomas [author], and Meinard Müller [referee]. "Freischütz Digital: Processing Audio Signals in Complex Music Scenarios / Thomas Prätzlich ; Gutachter: Meinard Müller." Erlangen : Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 2016. http://d-nb.info/1123284318/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Yoo, Heejong. "Low-Power Audio Input Enhancement for Portable Devices." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/6821.

Full text
Abstract:
With the development of VLSI and wireless communication technology, portable devices such as personal digital assistants (PDAs), pocket PCs, and mobile phones have gained a lot of popularity. Many such devices incorporate a speech recognition engine, enabling users to interact with the devices using voice-driven commands and text-to-speech synthesis. The power consumption of DSP microprocessors has been consistently decreasing by half about every 18 months, following Gene's law. The capacity of signal processing, however, is still significantly constrained by the limited power budget of these portable devices. In addition, analog-to-digital (A/D) converters can also limit the signal processing of portable devices. Many systems require very high-resolution and high-performance A/D converters, which often consume a large fraction of the limited power budget of portable devices. The proposed research develops a low-power audio signal enhancement system that combines programmable analog signal processing and traditional digital signal processing. By utilizing analog signal processing based on floating-gate transistor technology, the power consumption of the overall system as well as the complexity of the A/D converters can be reduced significantly. The system can be used as a front end of portable devices in which enhancement of audio signal quality plays a critical role in automatic speech recognition systems on portable devices. The proposed system performs background audio noise suppression in a continuous-time domain using analog computing elements and acoustic echo cancellation in a discrete-time domain using an FPGA.
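The AdaBoost-based feature selection mentioned above can be sketched with scikit-learn: boosted decision stumps are trained on labelled acoustic features and the features they rely on most are kept. The dataset shapes, number of rounds, and cut-off are assumptions, and the thesis targets a floating-gate analog implementation rather than this software stand-in.

```python
# Sketch of AdaBoost-based feature selection for a sound detection task.
# Shapes, round count, and the cut-off are illustrative assumptions.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def select_features(X, y, n_keep=8, n_rounds=50):
    """X: [n_examples, n_features] acoustic features, y: 0/1 detection labels.
    Returns the indices of the n_keep features the boosted stumps use most."""
    # scikit-learn's default AdaBoost weak learner is a depth-1 decision tree
    # (a stump), so feature_importances_ reflects how often each feature is used.
    clf = AdaBoostClassifier(n_estimators=n_rounds).fit(X, y)
    return np.argsort(clf.feature_importances_)[::-1][:n_keep]
```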
APA, Harvard, Vancouver, ISO, and other styles
30

Lapierre, Jimmy. "Approches paramétriques pour le codage audio multicanal." Mémoire, Université de Sherbrooke, 2007. http://savoirs.usherbrooke.ca/handle/11143/1355.

Full text
Abstract:
In order to fulfill our communications and entertainment needs, there is no doubt that speech and audio must be encoded in digital format. In "CD" quality, this requires a bit rate of 1411.2 kb/s for a stereo signal. Such a large amount of data quickly becomes prohibitive for long-term storage of audio or for transmission on some networks, especially in real time (hence the universal adoption of the MP3 format). Moreover, in recent years the number of musical and cinematographic productions available in five channels or more has continually increased. In order to maintain an acceptable bit rate for a given application, a low bit-rate audio coder must therefore exploit the redundancies between audio channels and binaural psychoacoustics. Perceptual audio coding, and more specifically parametric audio coding, offers the possibility of achieving much lower bit rates by taking into account the limits of human hearing (psychoacoustics). This research therefore concentrates on parametric audio coding of more than one audio channel.
APA, Harvard, Vancouver, ISO, and other styles
31

Gál, Marek. "Univerzální měřicí rozhraní pro digitální audio signál." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-240887.

Full text
Abstract:
This master’s thesis deals with the modification of an existing project used as a tool for monitoring and measuring the I2S digital audio interface. The original design, Polymorphic USB – I2S Interface, was created by Ing. Martin Stejskal. The modifications are based on a year of practical experience gained while the device was tested, and they address new requirements for its extension. The work describes and justifies the individual changes to the hardware and software parts of the project.
APA, Harvard, Vancouver, ISO, and other styles
32

Markle, Blake L. "A comparative study of time-stretching algorithms for audio signals /." Thesis, McGill University, 2001. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=31119.

Full text
Abstract:
Algorithms exist that perform independent transformations on the frequency or duration of a digital audio signal. These processes yield different results for different types of audio signals. A comparative study of granular and phase vocoder algorithms, their implementations, and their respective effects on audio signals was made to determine which algorithm is best suited to a particular type of audio signal.
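As a point of reference for the granular approach compared in this thesis, the sketch below shows a minimal overlap-add (OLA) time-stretcher in Python/NumPy. It is an illustrative simplification only (no transient handling or phase coherence, unlike a phase vocoder), and the function name, grain size, and hop values are arbitrary choices, not parameters taken from the thesis.

```python
import numpy as np

def ola_time_stretch(x, rate, grain=2048, hop_out=512):
    """Naive granular (overlap-add) time stretch.

    rate > 1 shortens the signal, rate < 1 lengthens it.
    """
    hop_in = int(round(hop_out * rate))           # analysis hop derived from the stretch rate
    window = np.hanning(grain)
    n_out = int(np.ceil(len(x) / rate)) + grain
    y = np.zeros(n_out)
    norm = np.zeros(n_out)
    out_pos = 0
    for in_pos in range(0, len(x) - grain, hop_in):
        g = x[in_pos:in_pos + grain] * window     # windowed grain from the input
        y[out_pos:out_pos + grain] += g           # overlap-add at the synthesis hop
        norm[out_pos:out_pos + grain] += window
        out_pos += hop_out
    norm[norm < 1e-8] = 1.0                       # avoid division by zero at the edges
    return (y / norm)[:out_pos + grain]

# Example: stretch one second of a 440 Hz tone to roughly twice its length.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
stretched = ola_time_stretch(tone, rate=0.5)
```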
APA, Harvard, Vancouver, ISO, and other styles
33

Mason, Michael. "Hybrid coding of speech and audio signals." Thesis, Queensland University of Technology, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
34

Streich, Sebastian. "Music complexity: a multi-faceted description of audio content." Doctoral thesis, Universitat Pompeu Fabra, 2007. http://hdl.handle.net/10803/7545.

Full text
Abstract:
Esta tesis propone un juego de algoritmos que puede emplearse para computar estimaciones de las distintas facetas de complejidad que ofrecen señales musicales auditivas. Están enfocados en los aspectos de acústica, ritmo, timbre y tonalidad. Así pues, la complejidad musical se entiende aquí en el nivel más basto del común acuerdo entre oyentes humanos. El objetivo es obtener juicios de complejidad mediante computación automática que resulten similares al punto de vista de un oyente ingenuo. La motivación de la presente investigación es la de mejorar la interacción humana con colecciones de música digital. Según se discute en la tesis,hay toda una serie de tareas a considerar, como la visualización de una colección, la generación de listas de reproducción o la recomendación automática de música. A través de las estimaciones de complejidad musical provistas por los algoritmos descritos, podemos obtener acceso a un nivel de descripción semántica de la música que ofrecerá novedosas e interesantes soluciones para estas tareas.
This thesis proposes a set of algorithms that can be used to compute estimates of music complexity facets from musical audio signals. They focus on aspects of acoustics, rhythm, timbre, and tonality. Music complexity is thereby considered on the coarse level of common agreement among human listeners. The target is to obtain complexity judgments through automatic computation that resemble a naive listener's point of view. The motivation for the presented research lies in the enhancement of human interaction with digital music collections. As we will discuss, there is a variety of tasks to be considered, such as collection visualization, play-list generation, or the automatic recommendation of music. Through the music complexity estimates provided by the described algorithms we can obtain access to a level of semantic music description, which allows for novel and interesting solutions of these tasks.
APA, Harvard, Vancouver, ISO, and other styles
35

Lapierre, Jimmy. "Amélioration de codecs audio standardisés avec maintien de l'interopérabilité." Thèse, Université de Sherbrooke, 2016. http://hdl.handle.net/11143/8816.

Full text
Abstract:
Résumé : L’audio numérique s’est déployé de façon phénoménale au cours des dernières décennies, notamment grâce à l’établissement de standards internationaux. En revanche, l’imposition de normes introduit forcément une certaine rigidité qui peut constituer un frein à l’amélioration des technologies déjà déployées et pousser vers une multiplication de nouveaux standards. Cette thèse établit que les codecs existants peuvent être davantage valorisés en améliorant leur qualité ou leur débit, même à l’intérieur du cadre rigide posé par les standards établis. Trois volets sont étudiés, soit le rehaussement à l’encodeur, au décodeur et au niveau du train binaire. Dans tous les cas, la compatibilité est préservée avec les éléments existants. Ainsi, il est démontré que le signal audio peut être amélioré au décodeur sans transmettre de nouvelles informations, qu’un encodeur peut produire un signal amélioré sans ajout au décodeur et qu’un train binaire peut être mieux optimisé pour une nouvelle application. En particulier, cette thèse démontre que même un standard déployé depuis plusieurs décennies comme le G.711 a le potentiel d’être significativement amélioré à postériori, servant même de cœur à un nouveau standard de codage par couches qui devait préserver cette compatibilité. Ensuite, les travaux menés mettent en lumière que la qualité subjective et même objective d’un décodeur AAC (Advanced Audio Coding) peut être améliorée sans l’ajout d’information supplémentaire de la part de l’encodeur. Ces résultats ouvrent la voie à davantage de recherches sur les traitements qui exploitent une connaissance des limites des modèles de codage employés. Enfin, cette thèse établit que le train binaire à débit fixe de l’AMR WB+ (Extended Adaptive Multi-Rate Wideband) peut être compressé davantage pour le cas des applications à débit variable. Cela démontre qu’il est profitable d’adapter un codec au contexte dans lequel il est employé.
Abstract : Digital audio applications have grown exponentially during the last decades, in good part because of the establishment of international standards. However, imposing such norms necessarily introduces hurdles that can impede the improvement of technologies that have already been deployed, potentially leading to a proliferation of new standards. This thesis shows that existing coders can be better exploited by improving their quality or their bitrate, even within the rigid constraints posed by established standards. Three aspects are studied, namely the enhancement of the encoder, the decoder and the bit stream. In every case, compatibility with the other elements of the existing coder is maintained. Thus, it is shown that the audio signal can be improved at the decoder without transmitting new information, that an encoder can produce an improved signal without modifying its decoder, and that a bit stream can be optimized for a new application. In particular, this thesis shows that even a standard like G.711, which has been deployed for decades, has the potential to be significantly improved after the fact. This contribution has even served as the core for a new standard embedded coder that had to maintain that compatibility. It is also shown that the subjective and objective audio quality of the AAC (Advanced Audio Coding) decoder can be improved, without adding any extra information from the encoder, by better exploiting the knowledge of the coder model’s limitations. Finally, it is shown that the fixed-rate bit stream of the AMR-WB+ (Extended Adaptive Multi-Rate Wideband) can be compressed more efficiently when considering a variable-bit-rate scenario, showing the need to adapt a coder to its use case.
APA, Harvard, Vancouver, ISO, and other styles
36

El, Gemayel Tarek. "Feasibility of Using Electrical Network Frequency Fluctuations to Perform Forensic Digital Audio Authentication." Thèse, Université d'Ottawa / University of Ottawa, 2013. http://hdl.handle.net/10393/24383.

Full text
Abstract:
Extracting the Electric Network Frequency (ENF) fluctuations from an audio recording and comparing it to a reference database is a new technology intended to perform forensic digital audio authentication. The objective of this thesis is to implement and design a range of programs and algorithms for capturing and extracting ENF signals. The developed C-program combined with a probe can be used to build the reference database. Our implementation of the Short-Time Fourier Transform method is intended for the ENF extraction of longer signals while our novel proposed use of the Autoregressive parametric method and our implementation of the zero-crossing approach tackle the case of shorter recordings. A Graphical User Interface (GUI) was developed to facilitate the process of extracting the ENF fluctuations. The whole process is tested and evaluated for various scenarios ranging from long to short recordings.
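A minimal illustration of the STFT-based ENF tracking idea described above, assuming a 60 Hz mains frequency and using SciPy; the function name, window length, overlap, and search band are placeholder values for this sketch, not the thesis's settings.

```python
import numpy as np
from scipy.signal import stft

def track_enf(x, fs, nominal=60.0, band=1.0, win_s=8.0):
    """Estimate the ENF trajectory as the spectral peak near the nominal mains frequency."""
    nperseg = int(win_s * fs)
    f, t, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    mask = (f >= nominal - band) & (f <= nominal + band)   # narrow band around the nominal frequency
    sub = np.abs(Z[mask, :])
    peak_bins = np.argmax(sub, axis=0)                     # strongest bin per frame
    return t, f[mask][peak_bins]                           # one ENF estimate per frame

# Synthetic check: a hum whose frequency drifts slowly around 60 Hz.
fs = 1000
t = np.arange(120 * fs) / fs
enf = 60 + 0.02 * np.sin(2 * np.pi * t / 60)
hum = np.sin(2 * np.pi * np.cumsum(enf) / fs)
frames, estimate = track_enf(hum, fs)
```

In practice the frequency estimate would be refined (e.g., by interpolating around the peak) before comparing it to a reference database, a step this sketch omits.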
APA, Harvard, Vancouver, ISO, and other styles
37

Zhao, Yue. "Independent Component Analysis Enhancements for Source Separation in Immersive Audio Environments." UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/34.

Full text
Abstract:
In immersive audio environments with distributed microphones, Independent Component Analysis (ICA) can be applied to uncover signals from a mixture of other signals and noise, such as in a cocktail party recording. ICA algorithms have been developed for instantaneous source mixtures and convolutional source mixtures. While ICA for instantaneous mixtures works when no delays exist between the signals in each mixture, distributed microphone recordings typically result in various delays of the signals over the recorded channels. The convolutive ICA algorithm should account for delays; however, it requires many parameters to be set and often has stability issues. This thesis introduces the Channel Aligned FastICA (CAICA), which requires knowledge of the source distance to each microphone, but does not require knowledge of noise sources. Furthermore, the CAICA is combined with Time Frequency Masking (TFM), yielding better extraction of the signal of interest (SOI) even in low-SNR environments. Simulations were conducted as ranking experiments that tested the performance of three algorithms: Weighted Beamforming (WB), CAICA, and CAICA with TFM. The Closest Microphone (CM) recording is used as a reference for all three. Statistical analyses on the results demonstrated superior performance for the CAICA with TFM. The algorithms were applied to experimental recordings to support the conclusions of the simulations. These techniques can be deployed on mobile platforms, used in surveillance for capturing human speech, and potentially adapted to biomedical fields.
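For context, the sketch below applies plain FastICA to an instantaneous two-source mixture using scikit-learn; it does not implement the thesis's CAICA (the channel alignment based on source-to-microphone distance) or the time-frequency masking step, and the synthetic sources and mixing matrix are arbitrary examples.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
fs, dur = 8000, 3
t = np.arange(dur * fs) / fs

# Two synthetic "sources": a tone and white noise.
s1 = np.sin(2 * np.pi * 220 * t)
s2 = rng.standard_normal(t.size)
S = np.c_[s1, s2]

# Instantaneous (delay-free) mixing -- the case plain FastICA handles.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = S @ A.T

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)   # recovered sources, up to scale and permutation
```

With real distributed-microphone recordings, the per-channel delays would first have to be compensated (the "channel aligned" step), which is exactly what distinguishes CAICA from this baseline.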
APA, Harvard, Vancouver, ISO, and other styles
38

Leis, John W. "Spectral coding methods for speech compression and speaker identification." Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36062/7/36062_Digitised_Thesis.pdf.

Full text
Abstract:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
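As a generic illustration of the vector quantization step discussed above (not the thesis's product-code structure or its training algorithms), a minimal nearest-codeword encoder in NumPy; all names and sizes here are illustrative.

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance) for each input vector."""
    # Pairwise squared distances between every input vector and every codeword.
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d, axis=1)

# Toy example: quantize 10-dimensional "spectral" vectors with a 64-word codebook.
rng = np.random.default_rng(1)
codebook = rng.standard_normal((64, 10))      # in practice trained, e.g. with k-means / LBG
frames = rng.standard_normal((100, 10))
indices = vq_encode(frames, codebook)          # 64 codewords -> 6 bits per frame
```

The exhaustive search shown here is exactly the encoder complexity problem the abstract mentions; product-code and fast-search structures exist to avoid it.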
APA, Harvard, Vancouver, ISO, and other styles
39

Hill, Adam J. "Analysis, modeling and wide-area spatiotemporal control of low-frequency sound reproduction." Thesis, University of Essex, 2012. http://hdl.handle.net/10545/230034.

Full text
Abstract:
This research aims to develop a low-frequency response control methodology capable of delivering a consistent spectral and temporal response over a wide listening area. Low-frequency room acoustics are naturally plagued by room modes, a result of standing waves at frequencies where an integer number of half-wavelengths fits into one or more room dimensions. The standing wave pattern is different for each modal frequency, causing a complicated sound field exhibiting a highly position-dependent frequency response. Enhanced systems are investigated with multiple degrees of freedom (independently controllable sound radiating sources) to provide adequate low-frequency response control. The proposed solution, termed a chameleon subwoofer array or CSA, adopts the most advantageous aspects of existing room-mode correction methodologies while emphasizing efficiency and practicality. Multiple degrees of freedom are ideally achieved by employing what is designated a hybrid subwoofer, which provides four orthogonal degrees of freedom configured within a modest-sized enclosure. The CSA software algorithm integrates both objective and subjective measures to address listener preferences, including the possibility of individual real-time control. CSAs and existing techniques are evaluated within a novel acoustical modeling system (FDTD simulation toolbox) developed to meet the requirements of this research. Extensive virtual development of CSAs has led to experimentation using a prototype hybrid subwoofer. The resulting performance is in line with the simulations, whereby variance across a wide listening area is reduced by over 50% with only four degrees of freedom. A supplemental novel correction algorithm addresses correction issues at select narrow frequency bands. These frequencies are filtered from the signal and replaced using virtual bass to maintain all aural information, a psychoacoustic effect giving the impression of low-frequency content. Virtual bass is synthesized using an original hybrid approach combining two mainstream synthesis procedures while suppressing each method's inherent weaknesses. This algorithm is demonstrated to improve CSA output efficiency while maintaining acceptable subjective performance.
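For reference, the modal frequencies of an ideal rectangular room of dimensions $L_x \times L_y \times L_z$ follow the standard relation below, with $c$ the speed of sound and $n_x, n_y, n_z$ non-negative integers (not all zero):

$$
f_{n_x, n_y, n_z} = \frac{c}{2} \sqrt{\left(\frac{n_x}{L_x}\right)^2 + \left(\frac{n_y}{L_y}\right)^2 + \left(\frac{n_z}{L_z}\right)^2}
$$

Axial modes (only one non-zero index) are typically the strongest contributors at low frequencies, which is why they dominate the position-dependent response described in the abstract.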
APA, Harvard, Vancouver, ISO, and other styles
40

Frenštátský, Petr. "Softwarový analyzátor zvukových efektů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-220634.

Full text
Abstract:
The use of personal computers for the conditioning and measurement of audio devices has increased significantly since digital signal processing (DSP) was introduced. The spread of DSP has made it possible to implement analyses that obtain frequency and linearity characteristics, distortion parameters (THD, THD+N, WHD, SINAD), crosstalk, and signal-to-noise ratio. In this work a software analyser is developed that is able to obtain qualitative parameters of hardware audio devices connected to a sound card. The ASIO driver is used for efficient communication between the sound card and the personal computer. The application is also capable of measuring audio effects implemented as VST plug-ins. The software is developed in C++ and the implemented analyses are based on the AES17 recommendation.
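To illustrate one of the measurements listed above, the sketch below estimates THD from an FFT of a recorded sine response. It is a simplification: the thesis follows the AES17 recommendation, which additionally prescribes filtering and measurement conditions not reproduced here, and the function name and parameters are illustrative.

```python
import numpy as np

def thd_percent(x, fs, f0, n_harmonics=5):
    """Rough THD: ratio of harmonic amplitudes to the fundamental, taken from an FFT."""
    window = np.hanning(len(x))
    spectrum = np.abs(np.fft.rfft(x * window))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)

    def peak(f):
        # Amplitude of the bin nearest to frequency f.
        return spectrum[np.argmin(np.abs(freqs - f))]

    fundamental = peak(f0)
    harmonics = np.sqrt(sum(peak(k * f0) ** 2 for k in range(2, n_harmonics + 2)))
    return 100.0 * harmonics / fundamental

# Example: a 1 kHz tone with small 2nd and 3rd harmonic components added.
fs = 48000
t = np.arange(fs) / fs
x = (np.sin(2 * np.pi * 1000 * t)
     + 0.01 * np.sin(2 * np.pi * 2000 * t)
     + 0.005 * np.sin(2 * np.pi * 3000 * t))
print(round(thd_percent(x, fs, 1000), 2))   # roughly 1.1 (%)
```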
APA, Harvard, Vancouver, ISO, and other styles
41

Rášo, Ondřej. "Objektivní měření a potlačování šumu v hudebním signálu." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-233609.

Full text
Abstract:
The dissertation thesis focuses on objective assessment and reduction of disturbing background noise in a musical signal. In this work, a new algorithm for the assessment of background noise audibility is proposed. The listening tests performed show that this new algorithm better predicts the background noise audibility than the existing algorithms do. An advantage of this new algorithm is the fact that it can be used even in the case of a general audio signal and not only musical signal, i.e. in the case when the audibility of one sound on the background of another sound is assessed. The existing algorithms often fail in this case. The next part of the dissertation thesis deals with an adaptive segmentation scheme for the segmentation of long-term musical signals into short segments of different lengths. A new adaptive segmentation scheme is then introduced here. It has been shown that this new adaptive segmentation scheme significantly improves the subjectively perceived quality of the musical signal from the output of noise reduction systems which use this new adaptive segmentation scheme. The quality improvement is better than that achieved by other segmentation schemes tested.
APA, Harvard, Vancouver, ISO, and other styles
42

Hiljanen, Henric, and Jonathan Karlsson. "JUCE vs. FAUST : En jämförande studie i prestanda mellan plugins." Thesis, Tekniska Högskolan, Jönköping University, JTH, Datateknik och informatik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-50355.

Full text
Abstract:
Purpose – Examine whether there is any difference in performance between the C++ framework JUCE and the domain-specific programming language FAUST, in order to create a decision basis that facilitates choosing between them when developing plugins. Method – An experimental study in which two delay plugins with identical functionality were developed and compared in latency, CPU load and memory usage. The experiment consisted of three test cases and was performed on three different computers. Findings – FAUST performed better than JUCE regarding latency and CPU load during the experiment. JUCE, on the other hand, performed better than FAUST regarding memory usage. Implications – This study has made it easier to make a performance-based decision when choosing between JUCE and FAUST for the development of plugins. Limitations – Time restrictions have led to only comparing JUCE and FAUST, leaving other relevant alternatives aside. They have also led to only one type of plugin being developed. The results of the study cannot be generalized or applied to other frameworks and programming languages whose purpose is to ease processing of digital signals.
Syfte – Undersöka om det är någon skillnad i prestanda mellan C++-ramverket JUCE och det domänspecifika programmeringsspråket FAUST för att skapa ett beslutsunderlag för att underlätta val mellan dem vid utveckling av plugins. Metod – En experimentell studie där två delay-plugins med identisk funktionalitet utvecklades och jämfördes i latency, CPU-belastning och minnesanvändning. Experimentet bestod av tre testfall och utfördes på tre olika datorer. Resultat – FAUST presterade bättre än JUCE gällande latency och CPU-belastning under experimentet. JUCE presterade däremot bättre gällande minnesanvändning. Implikationer – Denna studie har gjort det lättare att fatta ett beslut baserat på prestanda vid val mellan JUCE och FAUST beträffande utveckling av plugins. Begränsningar – Tidsbegränsningar har lett till att endast en jämförelse mellan JUCE och FAUST har genomförts. Andra relevanta alternativ har uteslutits på grund av detta. Det har också medfört att endast en typ av plugin har utvecklats. Studiens resultat kan inte tillämpas eller generaliseras till andra ramverk och domänspecifika programmeringsspråk vars syfte är att bearbeta digitala ljudsignaler.
APA, Harvard, Vancouver, ISO, and other styles
43

Colón, Guillermo J. "Avian musing feature space analysis." Thesis, Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44754.

Full text
Abstract:
The purpose of this study was to analyze the possibility of utilizing known signal processing and machine learning algorithms to correlate environmental data with chicken vocalizations. The specific musings analyzed consist not of one chicken's vocalizations but of those of a whole collective; the task therefore becomes a chatter problem. There have been similar attempts to create such a correlation in the past, but with singled-out birds instead of a multitude. This study was performed on broiler chickens (birds used in meat production). One of the reasons why this correlation is useful is for the purpose of an automated control system. Utilizing the chickens' own vocalizations to determine the temperature, the humidity, and the levels of ammonia, among other environmental factors, reduces, and might even remove, the need for sophisticated sensors. Another factor that this study aimed to correlate with vocalization was stress in the chickens. This has great implications for animal welfare, to guarantee that the animals are being properly taken care of. Also, it has been shown that the meat of non-stressed chickens is of much better quality than that of stressed ones. The audio was filtered and certain features were extracted to predict stress. The features considered were loudness, spectral centroid, spectral sparsity, temporal sparsity, transient index, temporal average, temporal standard deviation, temporal skewness, and temporal kurtosis. In the end, out of all the features analyzed, kurtosis and loudness proved to be the best features for identifying stressed birds in audio.
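Two of the features listed above, computed in a minimal frame-wise form (spectral centroid and temporal kurtosis, plus an RMS loudness proxy); this is a generic sketch with illustrative names and frame sizes, not the feature extraction code used in the study.

```python
import numpy as np
from scipy.stats import kurtosis

def spectral_centroid(frame, fs):
    """Centre of mass of the magnitude spectrum of one frame ("brightness")."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1 / fs)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def frame_features(x, fs, frame_len=1024, hop=512):
    """Return per-frame (centroid, temporal kurtosis, RMS) feature rows."""
    feats = []
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len]
        feats.append((spectral_centroid(frame, fs),
                      kurtosis(frame),                  # peakedness of the sample distribution
                      np.sqrt(np.mean(frame ** 2))))    # loudness proxy
    return np.array(feats)

# Stand-in usage with one second of noise instead of a barn recording.
fs = 16000
x = np.random.default_rng(2).standard_normal(fs)
features = frame_features(x, fs)
```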
APA, Harvard, Vancouver, ISO, and other styles
44

Bayle, Yann. "Apprentissage automatique de caractéristiques audio : application à la génération de listes de lecture thématiques." Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0087/document.

Full text
Abstract:
Ce mémoire de thèse de doctorat présente, discute et propose des outils de fouille automatique de mégadonnées dans un contexte de classification supervisée musicale. L'application principale concerne la classification automatique des thèmes musicaux afin de générer des listes de lecture thématiques. Le premier chapitre introduit les différents contextes et concepts autour des mégadonnées musicales et de leur consommation. Le deuxième chapitre s'attelle à la description des bases de données musicales existantes dans le cadre d'expériences académiques d'analyse audio. Ce chapitre introduit notamment les problématiques concernant la variété et les proportions inégales des thèmes contenus dans une base, qui demeurent complexes à prendre en compte dans une classification supervisée. Le troisième chapitre explique l'importance de l'extraction et du développement de caractéristiques audio et musicales pertinentes afin de mieux décrire le contenu des éléments contenus dans ces bases de données. Ce chapitre explique plusieurs phénomènes psychoacoustiques et utilise des techniques de traitement du signal sonore afin de calculer des caractéristiques audio. De nouvelles méthodes d'agrégation de caractéristiques audio locales sont proposées afin d'améliorer la classification des morceaux. Le quatrième chapitre décrit l'utilisation des caractéristiques musicales extraites afin de trier les morceaux par thèmes et donc de permettre les recommandations musicales et la génération automatique de listes de lecture thématiques homogènes. Cette partie implique l'utilisation d'algorithmes d'apprentissage automatique afin de réaliser des tâches de classification musicale. Les contributions de ce mémoire sont résumées dans le cinquième chapitre, qui propose également des perspectives de recherche dans l'apprentissage automatique et l'extraction de caractéristiques audio multi-échelles.
This doctoral dissertation presents, discusses and proposes tools for automatic information retrieval in big musical databases. The main application is the supervised classification of musical themes to generate thematic playlists. The first chapter introduces the different contexts and concepts around big musical databases and their consumption. The second chapter focuses on the description of existing music databases as part of academic experiments in audio analysis. This chapter notably introduces issues concerning the variety and unequal proportions of the themes contained in a database, which remain complex to take into account in supervised classification. The third chapter explains the importance of extracting and developing relevant audio features in order to better describe the content of music tracks in these databases. This chapter explains several psychoacoustic phenomena and uses sound signal processing techniques to compute audio features. New methods of aggregating local audio features are proposed to improve song classification. The fourth chapter describes the use of the extracted audio features in order to sort the songs by themes and thus to allow musical recommendations and the automatic generation of homogeneous thematic playlists. This part involves the use of machine learning algorithms to perform music classification tasks. The contributions of this dissertation are summarized in the fifth chapter, which also proposes research perspectives in machine learning and the extraction of multi-scale audio features.
APA, Harvard, Vancouver, ISO, and other styles
45

CHEMLA, ROMEU SANTOS AXEL CLAUDE ANDRE'. "MANIFOLD REPRESENTATIONS OF MUSICAL SIGNALS AND GENERATIVE SPACES." Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/700444.

Full text
Abstract:
Tra i diversi campi di ricerca nell’ambito dell’informatica musicale, la sintesi e la generazione di segnali audio incarna la pluridisciplinalità di questo settore, nutrendo insieme le pratiche scientifiche e musicale dalla sua creazione. Inerente all’informatica dalla sua creazione, la generazione audio ha ispirato numerosi approcci, evolvendo colle pratiche musicale e gli progressi tecnologici e scientifici. Inoltre, alcuni processi di sintesi permettono anche il processo inverso, denominato analisi, in modo che i parametri di sintesi possono anche essere parzialmente o totalmente estratti dai suoni, dando una rappresentazione alternativa ai segnali analizzati. Per di più, la recente ascesa dei algoritmi di l’apprendimento automatico ha vivamente interrogato il settore della ricerca scientifica, fornendo potenti data-centered metodi che sollevavano diversi epistemologici interrogativi, nonostante i sui efficacia. Particolarmente, un tipo di metodi di apprendimento automatico, denominati modelli generativi, si concentrano sulla generazione di contenuto originale usando le caratteristiche che hanno estratti dei dati analizzati. In tal caso, questi modelli non hanno soltanto interrogato i precedenti metodi di generazione, ma anche sul modo di integrare questi algoritmi nelle pratiche artistiche. Mentre questi metodi sono progressivamente introdotti nel settore del trattamento delle immagini, la loro applicazione per la sintesi di segnali audio e ancora molto marginale. In questo lavoro, il nostro obiettivo e di proporre un nuovo metodo di audio sintesi basato su questi nuovi tipi di generativi modelli, rafforazti dalle nuove avanzati dell’apprendimento automatico. Al primo posto, facciamo una revisione dei approcci esistenti nei settori dei sistemi generativi e di sintesi sonore, focalizzando sul posto di nostro lavoro rispetto a questi disciplini e che cosa possiamo aspettare di questa collazione. In seguito, studiamo in maniera più precisa i modelli generativi, e come possiamo utilizzare questi recenti avanzati per l’apprendimento di complesse distribuzione di suoni, in un modo che sia flessibile e nel flusso creativo del utente. Quindi proponiamo un processo di inferenza / generazione, il quale rifletta i processi di analisi/sintesi che sono molto usati nel settore del trattamento del segnale audio, usando modelli latenti, che sono basati sull’utilizzazione di un spazio continuato di alto livello, che usiamo per controllare la generazione. Studiamo dapprima i risultati preliminari ottenuti con informazione spettrale estratte da diversi tipi di dati, che valutiamo qualitativamente e quantitativamente. Successiva- mente, studiamo come fare per rendere questi metodi più adattati ai segnali audio, fronteggiando tre diversi aspetti. Primo, proponiamo due diversi metodi di regolarizzazione di questo generativo spazio che sono specificamente sviluppati per l’audio : una strategia basata sulla traduzione segnali / simboli, e una basata su vincoli percettivi. Poi, proponiamo diversi metodi per fronteggiare il aspetto temporale dei segnali audio, basati sull’estrazione di rappresentazioni multiscala e sulla predizione, che permettono ai generativi spazi ottenuti di anche modellare l’aspetto dinamico di questi segnali. Per finire, cambiamo il nostro approccio scientifico per un punto di visto piú ispirato dall’idea di ricerca e creazione. 
Primo, descriviamo l’architettura e il design della nostra libreria open-source, vsacids, sviluppata per permettere a esperti o non-esperti musicisti di provare questi nuovi metodi di sintesi. Poi, proponiamo una prima utilizzazione del nostro modello con la creazione di una performance in real- time, chiamata ægo, basata insieme sulla nostra libreria vsacids e sull’uso di une agente di esplorazione, imparando con rinforzo nel corso della composizione. Finalmente, tramo dal lavoro presentato alcuni conclusioni sui diversi modi di migliorare e rinforzare il metodo di sintesi proposto, nonché eventuale applicazione artistiche.
Among the diverse research fields within computer music, synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing both scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving both with musical practices and scientific/technical advances. Moreover, some synthesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can also be partially or totally extracted from actual sounds, providing an alternative representation of the analyzed audio signals. On top of that, the recent rise of machine learning algorithms earnestly questioned the field of scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. Especially, a family of machine learning methods, called generative models, are focused on the generation of original content using features extracted from an existing dataset. In that case, such methods not only questioned previous approaches in generation, but also the way of integrating these methods into existing creative processes. While these new generative frameworks are progressively introduced in the domain of image generation, the application of such generative techniques in audio synthesis is still marginal. In this work, we aim to propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work inserts itself in both practices and what can be expected from their collation. Subsequently, we focus a little more on generative models, and how modern advances in the domain can be exploited to allow us to learn complex sound distributions, while being sufficiently flexible to be integrated in the creative flow of the user. We then propose an inference / generation process, mirroring analysis/synthesis paradigms that are natural in the audio processing domain, using latent models that are based on a continuous higher-level space, which we use to control the generation. We first provide preliminary results of our method applied on spectral information, extracted from several datasets, and evaluate both qualitatively and quantitatively the obtained results. Subsequently, we study how to make these methods more suitable for learning audio data, tackling successively three different aspects. First, we propose two different latent regularization strategies specifically designed for audio, based on signal/symbol translation and on perceptual constraints. Then, we propose different methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, allowing the obtained generative spaces to also model the dynamics of the signal. As a last chapter, we swap our scientific approach to a more research & creation-oriented point of view: first, we describe the architecture and the design of our open-source library, vsacids, aiming to be used by expert and non-expert music makers as an integrated creation tool. 
Then, we propose a first musical use of our system through the creation of a real-time performance, called aego, based jointly on our framework vsacids and an explorative agent using reinforcement learning, trained during the performance. Finally, we draw some conclusions on the different ways to improve and reinforce the proposed generation method, as well as possible further creative applications.
À travers les différents domaines de recherche de la musique computationnelle, l’analysie et la génération de signaux audio sont l’exemple parfait de la trans-disciplinarité de ce domaine, nourrissant simultanément les pratiques scientifiques et artistiques depuis leur création. Intégrée à la musique computationnelle depuis sa création, la synthèse sonore a inspiré de nombreuses approches musicales et scientifiques, évoluant de pair avec les pratiques musicales et les avancées technologiques et scientifiques de son temps. De plus, certaines méthodes de synthèse sonore permettent aussi le processus inverse, appelé analyse, de sorte que les paramètres de synthèse d’un certain générateur peuvent être en partie ou entièrement obtenus à partir de sons donnés, pouvant ainsi être considérés comme une représentation alternative des signaux analysés. Parallèlement, l’intérêt croissant soulevé par les algorithmes d’apprentissage automatique a vivement questionné le monde scientifique, apportant de puissantes méthodes d’analyse de données suscitant de nombreux questionnements épistémologiques chez les chercheurs, en dépit de leur effectivité pratique. En particulier, une famille de méthodes d’apprentissage automatique, nommée modèles génératifs, s’intéressent à la génération de contenus originaux à partir de caractéristiques extraites directement des données analysées. Ces méthodes n’interrogent pas seulement les approches précédentes, mais aussi sur l’intégration de ces nouvelles méthodes dans les processus créatifs existants. Pourtant, alors que ces nouveaux processus génératifs sont progressivement intégrés dans le domaine la génération d’image, l’application de ces techniques en synthèse audio reste marginale. Dans cette thèse, nous proposons une nouvelle méthode d’analyse-synthèse basés sur ces derniers modèles génératifs, depuis renforcés par les avancées modernes dans le domaine de l’apprentissage automatique. Dans un premier temps, nous examinerons les approches existantes dans le domaine des systèmes génératifs, sur comment notre travail peut s’insérer dans les pratiques de synthèse sonore existantes, et que peut-on espérer de l’hybridation de ces deux approches. Ensuite, nous nous focaliserons plus précisément sur comment les récentes avancées accomplies dans ce domaine dans ce domaine peuvent être exploitées pour l’apprentissage de distributions sonores complexes, tout en étant suffisamment flexibles pour être intégrées dans le processus créatif de l’utilisateur. Nous proposons donc un processus d’inférence / génération, reflétant les paradigmes d’analyse-synthèse existant dans le domaine de génération audio, basé sur l’usage de modèles latents continus que l’on peut utiliser pour contrôler la génération. Pour ce faire, nous étudierons déjà les résultats préliminaires obtenus par cette méthode sur l’apprentissage de distributions spectrales, prises d’ensembles de données diversifiés, en adoptant une approche à la fois quantitative et qualitative. Ensuite, nous proposerons d’améliorer ces méthodes de manière spécifique à l’audio sur trois aspects distincts. D’abord, nous proposons deux stratégies de régularisation différentes pour l’analyse de signaux audio : une basée sur la traduction signal/ symbole, ainsi qu’une autre basée sur des contraintes perceptives. 
Nous passerons par la suite à la dimension temporelle de ces signaux audio, proposant de nouvelles méthodes basées sur l’extraction de représentations temporelles multi-échelle et sur une tâche supplémentaire de prédiction, permettant la modélisation de caractéristiques dynamiques par les espaces génératifs obtenus. En dernier lieu, nous passerons d’une approche scientifique à une approche plus orientée vers un point de vue recherche & création. Premièrement, nous présenterons notre librairie open-source, vsacids, visant à être employée par des créateurs experts et non-experts comme un outil intégré. Ensuite, nous proposons une première utilisation musicale de notre système par la création d’une performance temps réel, nommée ægo, basée à la fois sur notre librarie et sur un agent d’exploration appris dynamiquement par renforcement au cours de la performance. Enfin, nous tirons les conclusions du travail accompli jusqu’à maintenant, concernant les possibles améliorations et développements de la méthode de synthèse proposée, ainsi que sur de possibles applications créatives.
APA, Harvard, Vancouver, ISO, and other styles
46

Bianchi, André Jucovsky. "Processamento de áudio em tempo real em dispositivos computacionais de alta disponibilidade e baixo custo." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-23012014-190028/.

Full text
Abstract:
Neste trabalho foi feita uma investigação sobre a realização de processamento de áudio digital em tempo real utilizando três dispositivos com características computacionais fundamentalmente distintas porém bastante acessíveis em termos de custo e disponibilidade de tecnologia: Arduino, GPU e Android. Arduino é um dispositivo com licenças de hardware e software abertas, baseado em um microcontrolador com baixo poder de processamento, muito utilizado como plataforma educativa e artística para computações de controle e interface com outros dispositivos. GPU é uma arquitetura de placas de vídeo com foco no processamento paralelo, que tem motivado o estudo de modelos de programação específicos para sua utilização como dispositivo de processamento de propósito geral. Android é um sistema operacional para dispositivos móveis baseado no kernel do Linux, que permite o desenvolvimento de aplicativos utilizando linguagem de alto nível e possibilita o uso da infraestrutura de sensores, conectividade e mobilidade disponível nos aparelhos. Buscamos sistematizar as limitações e possibilidades de cada plataforma através da implementação de técnicas de processamento de áudio digital em tempo real e da análise da intensidade computacional em cada ambiente.
This dissertation describes an investigation of real-time audio signal processing using three platforms with fundamentally distinct computational characteristics, but which are highly available in terms of cost and technology: Arduino, GPU boards and Android devices. Arduino is a device with open hardware and software licences, based on a microcontroller with low processing power, widely used as an educational and artistic platform for control computations and interfacing with other devices. GPU is a video card architecture focusing on parallel processing, which has motivated the study of specific programming models for its use as a general-purpose processing device. Android is an operating system for mobile devices based on the Linux kernel, which allows the development of applications using a high-level language and enables the use of the sensors, connectivity and mobile infrastructure available on the devices. We seek to systematize the limitations and possibilities of each platform through the implementation of real-time digital audio processing techniques and the analysis of computational intensity in each environment.
APA, Harvard, Vancouver, ISO, and other styles
47

Hammarqvist, Ulf. "Audio editing in the time-frequency domain using the Gabor Wavelet Transform." Thesis, Uppsala universitet, Centrum för bildanalys, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-153634.

Full text
Abstract:
Visualization, processing and editing of audio, directly on a time-frequency surface, is the scope of this thesis. More precisely, the scalogram produced by a Gabor wavelet transform is used, which is a powerful alternative to traditional techniques where the waveform is the main visual aid and editing is performed by parametric filters. Reconstruction properties, scalogram design and enhancements, as well as audio manipulation algorithms, are investigated for this audio representation. The scalogram is designed to allow a flexible choice of time-frequency ratio while maintaining high-quality reconstruction. To this end, the Loglet is used, which is observed to be the most suitable filter choice. Reassignment is tested, and a novel weighting function using partial derivatives of phase is proposed. An audio interpolation procedure is developed and shown to perform well in listening tests. The feasibility of using the transform coefficients directly for various purposes is investigated. It is concluded that pitch shifts are hard to describe in the framework, while noise thresholding works well. A downsampling scheme is suggested that saves on operations and memory consumption and significantly speeds up real-world implementations. Finally, a scalogram 'compression' procedure is developed, allowing the caching of an approximate scalogram.
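The noise-thresholding operation reported to work well above amounts to attenuating time-frequency coefficients whose magnitude falls below an estimated noise floor. A generic version of that idea is sketched below using a plain STFT; the thesis operates on a Gabor wavelet scalogram rather than an STFT, so this is only an analogy, and the function name and parameters are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_threshold(x, fs, noise_clip, factor=2.0, nperseg=1024):
    """Zero time-frequency bins below factor * per-band noise-floor estimate, then resynthesize."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise_clip, fs=fs, nperseg=nperseg)
    floor = factor * np.abs(N).mean(axis=1, keepdims=True)   # per-frequency noise estimate
    mask = np.abs(Z) >= floor                                 # keep only bins above the floor
    _, y = istft(Z * mask, fs=fs, nperseg=nperseg)
    return y

# Toy usage: denoise a tone buried in noise, using a noise-only excerpt as the reference.
fs = 16000
rng = np.random.default_rng(4)
noise = 0.3 * rng.standard_normal(2 * fs)
signal = np.sin(2 * np.pi * 440 * np.arange(2 * fs) / fs) + noise
cleaned = spectral_threshold(signal, fs, noise[:fs])
```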
APA, Harvard, Vancouver, ISO, and other styles
48

Ištvánek, Matěj. "Analýza interpretace hudby metodami číslicového zpracování signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400860.

Full text
Abstract:
This diploma thesis deals with methods of onset and tempo detection in audio signals using specific digital processing techniques. It analyzes and describes the issue from both the musical and the technical side. First, several implementations using different programming environments are tested. The system with the highest detection accuracy and adjustable parameters is selected and then used to test functionality on the reference database. Then, an extension of the algorithm based on the Teager-Kaiser energy operator in the preprocessing stage is created. The accuracy of both systems is compared – on average, the operator increased the accuracy of detection of the global tempo and of inter-beat intervals. Finally, a second dataset is analyzed, containing 33 different interpretations of the first movement of Bedřich Smetana's String Quartet No. 1 in E minor, "From My Life". The results show that the average tempo of the entire first movement decreases slightly with the later year of the recording.
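The Teager-Kaiser energy operator used in the preprocessing extension has a simple discrete form, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1); a minimal NumPy version is sketched below for illustration (not the thesis implementation, and the edge handling is an arbitrary choice).

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]      # simple edge handling
    return psi

# The operator reacts strongly to sudden amplitude or frequency changes, which is why
# applying it before an onset detection function can sharpen the detected peaks.
fs = 44100
t = np.arange(fs) / fs
burst = np.sin(2 * np.pi * 440 * t) * (t > 0.5)   # tone starting abruptly at 0.5 s
energy = teager_kaiser(burst)
```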
APA, Harvard, Vancouver, ISO, and other styles
49

Mačák, Jaromír. "Číslicová simulace kytarových zesilovačů jako zvukových efektů v reálném čase." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2012. http://www.nusl.cz/ntk/nusl-233567.

Full text
Abstract:
The thesis deals with the real-time digital simulation of guitar amplifiers as nonlinear analog musical effects. The main goal of the work is the design of algorithms that enable the simulation of complex systems in real time. These algorithms are mostly based on the automated DK-method and on the approximation of nonlinear functions. The quality of the designed algorithms is assessed by means of listening tests.
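For illustration only, the kind of nonlinearity such simulations approximate can be as simple as a memoryless soft clipper; the sketch below is a generic tanh waveshaper with arbitrary parameters, not the automated DK-method state-space simulation developed in the thesis.

```python
import numpy as np

def soft_clip(x, drive=5.0):
    """Memoryless tanh waveshaper as a crude stand-in for tube-style clipping."""
    return np.tanh(drive * x) / np.tanh(drive)

fs = 48000
t = np.arange(fs) / fs
guitar_like = 0.5 * np.sin(2 * np.pi * 110 * t)   # placeholder input signal
distorted = soft_clip(guitar_like)
```

A full circuit simulation additionally models the filter networks and the state of reactive components, which is precisely what the DK-method formalizes and what approximation of the nonlinear functions makes feasible in real time.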
APA, Harvard, Vancouver, ISO, and other styles
50

Panenka, Vojtěch. "Sluchátka s adaptivním potlačením šumu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413245.

Full text
Abstract:
The thesis analyzes the technologies used in the design of headphones with integrated active ambient noise cancellation and examines how adaptive filters can be used to simplify development and achieve more effective attenuation.
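A minimal LMS adaptive filter of the kind evaluated for noise cancellation is sketched below; practical ANC headphones use feedforward/feedback FxLMS structures with secondary-path modelling, which this illustrative sketch (with arbitrary names and parameters) deliberately omits.

```python
import numpy as np

def lms(reference, desired, n_taps=32, mu=0.01):
    """Basic LMS: adapt FIR weights so the filtered reference tracks the desired signal.

    Returns the error signal, i.e. what remains after cancellation.
    """
    w = np.zeros(n_taps)
    error = np.zeros(len(desired))
    for n in range(n_taps, len(desired)):
        x = reference[n - n_taps:n][::-1]      # most recent samples first
        y = np.dot(w, x)                       # filter output (anti-noise estimate)
        error[n] = desired[n] - y
        w += 2 * mu * error[n] * x             # gradient-descent weight update
    return error

# Toy check: the "microphone" signal is a delayed, scaled copy of the reference noise.
rng = np.random.default_rng(3)
noise = rng.standard_normal(20000)
mic = 0.8 * np.roll(noise, 5)
residual = lms(noise, mic)                     # residual power drops as the filter converges
```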
APA, Harvard, Vancouver, ISO, and other styles