Auditory Computation - Computational issues in auditory...


Computational issues in auditory recognition
Cognitive Neuroscience

The challenges?
- http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html

The problems
- Identify what the sounds are and where they are coming from
  - What:
    1. Segregating the auditory scene
    2. Identifying auditory "objects"
       - Specificity
       - Invariance (object constancy): sounds and words must be recognized across variability in source (speaker)
  - Where:
    - Location must be computed because (unlike vision) it is not "given" by the geometry of the sensory surface

- Sound: audible variations in air pressure (movement of air molecules); some of these variations are periodic
- Frequency: the number of compressed or rarefied patches of air molecules that pass by our ears each second; determines pitch
- Cycle: time between successive compressed patches
- Hertz: the number of cycles per second; the human auditory system responds to 20 to 20,000 Hz
- Amplitude: difference in pressure between compressed and rarefied patches of air; determines volume

Recognizing auditory objects: Basic computations
Note: arrows may be bidirectional

The "cocktail party problem" (Cherry, 1953)
- In most listening environments, a mixture of sounds reaches our ears.
- We are able to attend to a particular voice or a particular musical instrument in these situations.
- How does the brain achieve this apparently effortless segregation of concurrent sounds?
  - Stimulus-driven processes
  - Knowledge-driven processes

Audition is "nonveridical": not simply stimulus-driven, but also knowledge-driven
  (1) Illusions
  (2) Gestalt grouping principles

Grouping by proximity (similarity)
- 2 tone streams of very different freq
- 2 tone streams of similar freq
- Shows the importance of speed and frequency separation of sounds in the formation of substreams.
- Segregation is favored both by faster sequences and by larger separations between the frequencies of high and low tones.
- The role of speed is seen as the sequence gradually speeds up. At slow speeds there is no segregation, but at high speeds there may be, depending on the frequency separation.
- Figure axes: freq, time
Albert S. Bregman & Pierre A. Ahad, Department of Psychology, McGill University.

Closure/continuity
- The auditory system can segregate the streams and offer a "guess" as to what is happening behind the noise (perceptual filling-in)
Albert S. Bregman & Pierre A. Ahad, Department of Psychology, McGill University.

Grouping by common fate
- This is an example of the grouping of the frequency components of a complex tone as a result of parallel frequency changes.
- The components are arbitrarily divided into two subsets. Then the same modulation is applied to all the members of one subset, while the other subset remains steady, as shown in the figure.
- While this happens, the two sets are heard as separate sounds.
- Finally, when the partials come together to form a single steady harmonic series, they are heard again as a single tone.
- This pattern is played twice with a brief pause between repetitions.
Albert S. Bregman & Pierre A. Ahad, Department of Psychology, McGill University.

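The proximity demonstration depends on two stimulus parameters: how fast the high and low tones alternate, and how far apart their frequencies are. A minimal sketch of how such an alternating sequence could be synthesized is given here. It is not the code behind the Bregman & Ahad demonstrations; the sample rate, amplitude, tone frequencies, durations, and all function names are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for illustration)


def sine_tone(freq_hz, duration_s, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Pure tone: amplitude * sin(2*pi*f*t), sampled at sample_rate.

    freq_hz is the number of cycles per second (sets the pitch);
    amplitude is the peak deviation (sets the loudness).
    """
    n_samples = int(round(duration_s * sample_rate))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def alternating_sequence(low_hz, high_hz, tone_s, n_pairs):
    """High-low-high-low... sequence of equal-length tones.

    Shorter tone_s means a faster sequence; a larger gap between
    low_hz and high_hz means a larger frequency separation.
    """
    samples = []
    for _ in range(n_pairs):
        samples.extend(sine_tone(high_hz, tone_s))
        samples.extend(sine_tone(low_hz, tone_s))
    return samples


def semitone_separation(low_hz, high_hz):
    """Frequency separation in semitones (12 semitones per octave)."""
    return 12 * math.log2(high_hz / low_hz)


# Hypothetical stimuli: same tones, slow versus fast presentation.
slow = alternating_sequence(400, 800, tone_s=0.4, n_pairs=4)
fast = alternating_sequence(400, 800, tone_s=0.1, n_pairs=4)
separation = semitone_separation(400, 800)  # one octave, 12 semitones
```

The sample lists could be written to a sound file to listen for the point at which the single up-and-down melody splits into a separate high stream and low stream. The sketch only builds the stimuli; it does not model the segregation itself.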
Recognizing auditory objects: basic computations
Note: arrows may be bidirectional

Specialized systems
- Speech
- Environmental sounds
- Music
Unclear at what point in the processing stream these systems become differentiated

Speech
- Identify what the sounds are and where they are coming from
  - What:
    - Goal: achieve object constancy (invariance), despite:
      - Variability due to the other sounds in the word
      - Variability across speaker

Speech: Spectrograph

Variability due to coarticulation
- The segmentation problem: you cannot cut the stream and isolate the first phoneme, due to:
- Coarticulation: sounds are produced differently depending on the preceding and following sounds

Invariance: The segmentation problem
- Coarticulation:
  - Every sound is produced differently, depending on the other sounds in the syllable
  -> The same phoneme is not produced in an invariant manner, nor can it be matched to an invariant acoustic template

Invariance: The speaker problem
The problem for the listener (human or computer)

Top-down Influences
- Audition is "nonveridical": not simply stimulus-driven but also knowledge-driven

Phonology: Language-specific knowledge
- Inventory: our knowledge of the legal sounds of a language
  - Phoneme: smallest unit of a language sound that serves to distinguish one word from another: pot/pod, rot/lot, *spot/pʰot
- Rules: our knowledge of the legal combination of sounds
  - English: blin / *bnin
  - Arabic: *blin / bnin
- Lexicon: the specific combinations of sounds that comprise the meaningful units of a language
  - English: k t
  - Spanish: g a t ou

Top-down influences: Categorical Perception
- Present sound tokens that are equidistant (da ... ga)
- Listeners perceive a categorical boundary (dad / gag)

Top-down influences: Phonemic restoration
- A phoneme is heard where one has been deleted and replaced by a noise burst
- Diagram levels: word, phonemes, sound

Summary
- Major challenges for auditory processing:
  - Where?
  - What?
    - Segregation of a scene into possible objects
    - Object identification
      - Achieving object invariance: identifying what is common between the signal and stored memory representations
- Solutions:
  - Grouping principles/processes
  - Specialized subsystems
  - Stimulus + knowledge driven processing ...
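
The categorical perception point above (equidistant tokens, yet a sharp perceived boundary) can be illustrated with a small numerical sketch. It assumes that labelling follows a steep logistic function of position along the da-ga continuum. The logistic form, the number of tokens, the boundary position, and the slope are all hypothetical choices for illustration, not values taken from the lecture or from any data set.

```python
import math


def p_ga(step, boundary=4.5, slope=3.0):
    """Assumed probability of labelling a token "ga".

    step is the token's position on an equally spaced continuum;
    boundary and slope are hypothetical model parameters.
    """
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))


def label(step):
    """Most likely label for a token under the assumed model."""
    return "ga" if p_ga(step) >= 0.5 else "da"


steps = list(range(1, 9))            # 8 physically equidistant tokens
labels = [label(s) for s in steps]   # 1-4 fall on the "da" side, 5-8 on "ga"

# Same physical step size, very different change in labelling:
within_category = p_ga(2) - p_ga(1)  # both tokens heard as "da"
across_boundary = p_ga(5) - p_ga(4)  # tokens straddle the boundary
```

Under these assumptions, neighbouring tokens on the same side of the boundary receive almost identical labelling probabilities, while the pair that straddles the boundary differs sharply, even though every pair is separated by the same physical step.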

This note was uploaded on 07/29/2008 for the course NEUROSCIEN 70 taught by Professor Whitney during the Spring '08 term at Johns Hopkins.
