Perceptual Impact of System Latency on Localization of Virtual Sounds
Principal Investigator
Elizabeth Wenzel
Abstract
Surprisingly little is known about the impact of latency on dynamic auditory localization, although it is clearly a critical issue for virtual environments. Excessive latencies may degrade basic perceptual skills such as the ability to localize a sound source in three-dimensional space. The impact of latency on performance may be even more critical in multi-modal displays, such as a virtual environment that combines visual and auditory information. For example, recent research in visual virtual displays indicates that latencies of 31 milliseconds or more can increase errors in distance judgments for virtual targets when observers are allowed to move their heads, although their performance remains better than that of stationary observers. Delays of this magnitude or greater are typical in many virtual display systems.
In a virtual acoustic environment, the total system latency (TSL) refers to the time elapsed from the transduction of an action, such as a head movement, until the consequences of that action cause the equivalent change in the virtual sound source. This paper reports on the impact of increasing TSL on localization accuracy when head motion is enabled. Five subjects estimated the location of 12 virtual sound sources, rendered with individualized head-related transfer functions (HRTFs), at latencies of 33.8, 100.4, 250.4 or 500.3 milliseconds in an absolute judgment paradigm. The task was to indicate the apparent azimuth, elevation and distance of a virtual source using a graphical response method. Subjects also rated the perceived latency on each trial. Localization performance measures included plots of the estimated vs. target locations, azimuth and elevation confusion rates, externalization rates, and average error angles. In an effort to characterize the listeners' localization strategies, their head motions were also recorded and summarized in terms of the maximum deviations of pitch, roll and yaw on each localization trial.
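To make two of these measures concrete, the following minimal sketch computes an error angle and flags a front-back confusion for a single trial. The abstract does not give the formulas used in the study, so everything here is an assumption: the error angle is taken as the great-circle angle between the target and response directions, a confusion is counted whenever the response falls in the opposite front/back hemisphere from the target, and the coordinate convention and function names are hypothetical.

```python
import math

def to_unit_vector(azimuth_deg, elevation_deg):
    """Convert a direction in degrees to a unit vector.

    Assumed convention (not taken from the study): azimuth 0 is straight
    ahead and increases to the right; elevation 0 is ear level and
    increases upward.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.cos(az)  # front (+) / back (-)
    y = math.cos(el) * math.sin(az)  # right (+) / left (-)
    z = math.sin(el)                 # up (+) / down (-)
    return (x, y, z)

def error_angle_deg(target, response):
    """Great-circle angle between two (azimuth, elevation) directions."""
    t = to_unit_vector(*target)
    r = to_unit_vector(*response)
    dot = sum(a * b for a, b in zip(t, r))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))

def is_front_back_confusion(target, response):
    """True when target and response lie in opposite front/back hemispheres.

    This simplified rule is an assumption; published criteria usually add
    a margin for targets near the interaural axis (azimuth near +/-90).
    """
    target_x = to_unit_vector(*target)[0]
    response_x = to_unit_vector(*response)[0]
    return target_x * response_x < 0
```

For example, a target at 30° azimuth judged at 150° azimuth would be flagged as a front-back confusion under this rule, and the confusion rate for a condition would be the fraction of trials flagged.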
Previous studies have produced baseline performance data for localization of spatialized sound using static (non-head-coupled) virtual sounds. Such stimuli tend to produce increased localization errors (relative to real sound sources) including increased confusion rates (sound heard with a front-back or up-down error), decreased elevation accuracy, and failures of externalization (sound heard inside the head). Enabling head motion typically improves localization performance, particularly by reducing front-back confusions. Here, the data indicated that localization was generally accurate, even with a latency as long as 500.3 milliseconds. Front-back confusions were minimal and almost all stimuli were externalized by all subjects. Both azimuth confusions and externalization were unaffected by latency. Elevation confusions and error angles increased with latency, although the increases were significant only for the largest latency tested, 500.3 milliseconds. Mean latency ratings, on the other hand, indicated that a latency of 250.4 milliseconds was noticeable to the subjects.
Overall, subjects' localization strategies were only moderately affected by latency. Yawing motions, a method for disambiguating front from rear locations, remained the primary strategy for the listeners in all conditions. Pitching and rolling motions appear to be moderately inhibited by increased latency in the stimuli. Although the individual head motion traces have not yet been examined in detail, it appears that the maximum angular velocities of the head motions (in particular, yaw) are similar to those observed in previous studies, e.g., about 175°/s.
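As an illustration of how a head-motion trace might be reduced to these summary values, the sketch below computes the maximum deviation and the peak angular velocity for one rotation axis. It is not the analysis used in the study: the function name is hypothetical, deviation is assumed to be measured from the first sample of the trial, velocity is a simple finite difference, and the angles are assumed to be unwrapped already.

```python
def summarize_head_trace(times_s, angles_deg):
    """Return (max deviation in degrees, peak angular velocity in deg/s).

    Assumptions (not from the study): deviation is measured from the first
    sample of the trial, velocity is a finite difference between adjacent
    samples, and the angles contain no jumps at +/-180 degrees.
    """
    start = angles_deg[0]
    max_deviation = max(abs(a - start) for a in angles_deg)

    peak_velocity = 0.0
    for i in range(1, len(angles_deg)):
        dt = times_s[i] - times_s[i - 1]
        if dt > 0:  # skip repeated or out-of-order timestamps
            velocity = abs(angles_deg[i] - angles_deg[i - 1]) / dt
            peak_velocity = max(peak_velocity, velocity)
    return max_deviation, peak_velocity


# Made-up yaw samples for illustration only, not data from the experiment.
sample_times_s = [0.0, 0.1, 0.2, 0.3]
sample_yaw_deg = [0.0, 10.0, 27.5, 20.0]
yaw_deviation, yaw_peak_velocity = summarize_head_trace(sample_times_s, sample_yaw_deg)
```

The same function would be applied separately to the pitch and roll traces of each trial.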
Together with the results of previous studies, these data suggest that head motion provides powerful cues for localization of virtual sounds. These dynamic cues counteract many disruptive factors in the stimulus, including non-individualized HRTFs, conflicting interaural cues, and increased latency. The fact that accuracy was generally comparable for the shortest and longest latencies tested here suggests that listeners are largely able to ignore latency during active localization. Apparently, this is possible even though latencies of this magnitude produce an obvious spatial "slewing" of the sound source such that it is no longer stabilized in space as the head is reoriented. It may be that the localization task per se is not the most sensitive test of the impact of latency in a virtual audio system. Other tasks that are more directly dependent on temporal synchrony, such as tracking an auditory-visual virtual object, may be much more sensitive to latency effects. The latency rating data suggest that a discrimination paradigm, such as that used to measure minimum audible movement angles, may also be more likely to indicate changes in listeners' behavior at lower latencies.
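The size of the slewing effect can be estimated with simple arithmetic: while the head turns, the rendered source lags behind by roughly the head's angular velocity multiplied by the latency. The sketch below applies this to the four latencies tested, using the peak yaw velocity of about 175°/s quoted earlier. Treating the head as rotating at peak velocity for the whole latency interval is an assumption made here for illustration, so the results are upper bounds rather than values measured in the study.

```python
LATENCIES_MS = (33.8, 100.4, 250.4, 500.3)  # latencies tested in the study
PEAK_YAW_VELOCITY_DEG_PER_S = 175.0         # approximate peak yaw velocity

def slew_angle_deg(latency_ms, head_velocity_deg_per_s):
    """Angular lag of the rendered source behind its intended position.

    Assumes constant head velocity over the latency interval, which is a
    simplification introduced for this example.
    """
    return head_velocity_deg_per_s * latency_ms / 1000.0

for latency_ms in LATENCIES_MS:
    lag = slew_angle_deg(latency_ms, PEAK_YAW_VELOCITY_DEG_PER_S)
    print(f"{latency_ms:6.1f} ms -> up to {lag:.1f} deg of slew")
```

Under this assumption the lag grows from roughly 6° at 33.8 milliseconds to roughly 88° at 500.3 milliseconds, which is consistent with the slewing being obvious at the longest latency.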