Sensory Evaluation of Sound

  1. 538 pages
  2. English

About This Book

Sensory Evaluation of Sound provides a detailed review of the latest sensory evaluation techniques, specifically applied to the evaluation of sound and audio. This three-part book commences with an introduction to the fundamental role of sound and hearing, which is followed by an overview of sensory evaluation methods and associated univariate and multivariate statistical analysis techniques. The final part of the book provides several chapters with concrete real-world applications of sensory evaluation, ranging from telecommunications, hearing aid design and binaural sound, via the latest research in concert hall acoustics, through to audio-visual interaction. Aimed at the engineer, researcher, university student or manager, the book gives insight into advanced methods for sensory evaluation, with many application examples.



  • Introduces the fundamentals of hearing and the value of sound
  • Provides a firm theoretical basis for advanced techniques in sensory evaluation of sound that are then illustrated with concrete examples from university research through to industrial product development
  • Includes chapters on sensory evaluation practices and methods as well as univariate and multivariate statistical analysis
  • Six application chapters covering a wide range of concrete sensory evaluation study examples including insight into audio-visual assessment
  • Includes data analysis with several associated downloadable datasets
  • Provides extensive references to the existing research literature, textbooks and standards

Sensory Evaluation of Sound by Nick Zacharov is available in PDF and/or ePUB format and is catalogued under Mathematics & Probability & Statistics.

Information

Publisher: CRC Press
Year: 2018
ISBN: 9780429769900
Edition: 1
SECTION III Application
CHAPTER 8
Telecommunications Applications
Alexander Raake
Technical University of Ilmenau, Ilmenau, Germany
Janto Skowronek
Technical University of Ilmenau, Ilmenau, Germany
Michał Sołoducha
Technical University of Ilmenau, Ilmenau, Germany
CONTENTS
8.1 Introduction
8.2 Principles of speech quality
8.2.1 Historical background
8.2.2 Quality, elements and features
8.2.3 From traditional test methods to new developments
8.3 Speech quality vs. intelligibility
8.3.1 General considerations
8.3.2 Quality and intelligibility for packet loss degradations in VoIP
8.4 Assessing speech quality with terminal devices
8.5 Speech quality dimensions
8.5.1 Listening tests
8.5.1.1 Test methods
8.5.1.2 Insights
8.5.2 New trends: Conversation tests
8.6 Speech quality and delay
8.7 Multiparty quality tests
8.7.1 Standardised method for telemeeting assessment
8.7.2 Spatial audio meeting assessment
8.7.3 Assessment of asymmetric conditions
8.8 Summary and outlook
This chapter provides an overview that distinguishes speech quality and speech-service-related quality of experience from the case of non-speech audio. It provides insights into current research activities and trends that address the main aspect of speech in a telecommunications setting, namely to convey speech information across distances. After an introduction to the topic, the underlying concepts and terms of speech quality and its assessment are laid out. The chapter then addresses a number of key aspects in current speech quality research. The relation between intelligibility and quality is addressed, linking the function of speech with judgements of its quality, for different example types of transmission impairments. Like audio quality (and actually any sensory quality), speech quality is multidimensional by nature. The chapter summarises research on the multidimensional character of speech, starting with previous work on pure listening-related features and moving on to more recent approaches that address full conversations, including phases of speaking and interaction between interlocutors. In a subsequent part, the impact of specific degradations of a telecommunications link, such as pure delay, on the conversational flow and communication itself is discussed, as well as the implications for speech quality and the conversational experience. Most approaches consider speech quality in terms of an integral judgement that does not distinguish the impact due to the end-user terminal from that due to the network transmission. Based on older and recent work by the authors, the chapter discusses the contributions of technical degradations on the one hand, and those due to a different mindset of the user on the other. Since speech-based interaction in telecommunications nowadays often involves more than two interlocutors within one call, the chapter devotes its final part to the topic of multiparty quality tests.
At the end of the chapter, we provide an outlook on future developments regarding sensory evaluation in telecommunications.
8.1 INTRODUCTION
Language is a communication system particular to humans. Speech is a subsystem of language and, according to Sebeok (1996), is “[...] communication by means of language in the acoustic channel”. Human interlocutors, for example in a telephone conversation, communicate by exchanging speech signs. Thus, they are able to convey abstracted information in acoustic, that is, physical form – with little effort. The invention and practical implementations of the telephone between 1840 and 1875, based on the pioneering work of various individuals such as Charles Bourseul, Johann Philipp Reis, Antonio Meucci, Elisha Gray and Alexander Graham Bell, enabled speech to be transmitted across long distances and people to communicate with each other from afar. In spite of a quest for fidelity ongoing at that time (Thompson (1995); for more details see Section 8.2.1), the speech quality related to the telephone was very different from the typical acoustics of real-life conversations. Due to the advantage of far-distance (“tele-”) communication, the relatively poor quality was accepted by the early users.
After its revolutionary development, telephony has undergone substantial changes over time. The first improvements came shortly after Bell’s invention of the first deployable telephone design in 1876, with Edison’s invention of the carbon microphone in 1878. Since then, more dramatic changes have followed. With mobile telephony slowly rising, first with analogue solutions in the late 1940s, it became possible to get rid of the wire. The first more widely deployed services were primarily focused on the car industry, mainly due to the substantial size of the equipment required. More widespread mobile telephony started with analogue technology with the 1st generation (“1G”) system around 1980, first offered by NTT and shortly after also available in other variants in the US and Europe. In the 1990s, 2G networks introduced the digital GSM (Global System for Mobile Communications, ETSI) standard in Europe and the CDMA (code-division multiple access) standard in the US. For the sake of wireless mobility, users accepted considerably worse quality than achieved with fixed lines. With mobility came new problems such as background noise, wind noise, additional delay and larger talker echo. With the advent of digital signal processing, smart solutions for improving telephone connections could evolve, such as (low-bitrate) speech coding, noise suppression, automatic gain control, echo cancellation, and speech enhancement (for example, near-end listener enhancement; see Sauert and Vary (2006)).
Until recently, the transmitted speech bandwidth remained around the initial 300–3400 Hz. With voice over internet protocol (VoIP) technology, the first sparse usage of digital wideband speech transmission for journalists and the like with the ITU-T G.722 codec was complemented by a steady increase of the transmitted-speech bandwidth, to wideband (WB, 50–7000 Hz), super-wideband (SWB, up to 14 kHz) and fullband (FB, up to 20 kHz). This, too, was enabled by advances in digital signal processing, more precisely by the evolution of low-bitrate speech codecs. This extension of the transmitted frequency range has led to an improved (that is, reduced) listening effort, better speaker recognition and generally improved sound quality (Raake, 2006). Here, the domains of speech and audio continue to converge, with audio codecs becoming low-delay and speech codecs becoming capable of meaningfully encoding (background) audio as well. With this additional bandwidth, greater demands are placed upon the performance of the numerous speech enhancement algorithms within the telecommunication chain. One of the technologies that accelerated the move to wideband and beyond was VoIP, which itself introduces some degradations of speech quality. For example, in comparison to non-satellite PSTN (public switched telephone networks), VoIP-based services show an increased delay and possibly resulting echo. Furthermore, due to the packet losses inherent to the underlying packet-based (and not circuit-switched) delivery, they at times show audible artefacts. Here, too, digital signal processing has enabled some countermeasures, and with improvements in network technology and ever-increasing throughput, VoIP can sound substantially better than narrowband PSTN, often referred to as POTS (plain old telephony service).
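The bandwidth categories named above can be summarised in a small lookup. The following Python sketch is only illustrative: the upper band edges are the ones quoted in the text, while the abbreviation "NB" for narrowband, the function name classify_bandwidth and the rule of picking the narrowest category that covers a given upper cutoff are assumptions made for this example, not definitions from the chapter or from any standard.

```python
# Upper band edges in Hz as quoted in the text (3400 Hz, 7 kHz, 14 kHz, 20 kHz).
# "NB" for narrowband is an assumed abbreviation, by analogy with WB/SWB/FB.
BAND_UPPER_EDGES_HZ = [
    ("NB", 3400.0),
    ("WB", 7000.0),
    ("SWB", 14000.0),
    ("FB", 20000.0),
]


def classify_bandwidth(upper_cutoff_hz: float) -> str:
    """Return the narrowest category whose upper edge covers the cutoff.

    This selection rule is a hypothetical convention for illustration only.
    """
    if upper_cutoff_hz <= 0:
        raise ValueError("upper cutoff must be a positive frequency in Hz")
    for label, upper_edge_hz in BAND_UPPER_EDGES_HZ:
        if upper_cutoff_hz <= upper_edge_hz:
            return label
    # Anything above 20 kHz is still treated as fullband in this sketch.
    return "FB"


if __name__ == "__main__":
    for cutoff in (3400, 7000, 14000, 20000):
        print(cutoff, "Hz ->", classify_bandwidth(cutoff))
```

Only the upper band edge is used here because the text gives lower edges for the 300–3400 Hz band and for wideband alone.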
The list of developments that have led to speech quality improvements and degradations is one thing; the actual quantification of these degradations or improvements is another. Testing with human subjects is of special interest to telecommunication operators and equipment manufacturers, as it is considered the only valid way to gain insights into how end-users perceive and evaluate telecommunications systems and services. In addition to mere listening, a telephone conversation – or more generally a mediated conversation that may also involve video – comprises different phases and states, which include speech periods and exchange, that is, active and interactive components (ITU-T, 1992; Raake, 2006; Möller et al., 2017). For telecommunication systems, testing may address different aspects of mediated conversations, including comprehensibility, intelligibility, communicability, or the delight or annoyance due to the system capabilities and performance (Raake, 2006; Qualinet, 2013). In this chapter, all of these aspects are categorised under the terms speech quality and (speech) quality of experience (Jekosch, 2006; Raake and Egger, 2014). The concepts underlying these terms are briefly discussed – in light of the research addressed in this chapter – in Section 8.2.
There are a number of established methods for speech quality testing, which are briefly summarised in Section 8.2.3. In the remainder of the chapter, novel approaches to speech quality assessment and methods for assessing speech quality of experience are discussed, and novel application domains for established methods are highlighted. As for all kinds of sensory evaluation, the development of dedicated methods, or the application of existing methods to novel applications, arises from the specific requirements of the items or phenomena under test.
In this chapter, a selection of novel evaluation methods and applications is addressed that were required either (i) for studying novel or previously known “speech-quality issues” (which may also be quality improvements) in novel ways, or (ii) due to the application of novel speech communication paradigms. Examples of (i) are the extension of the view on speech quality by bridging the link to functional aspects of speech such as intelligibility (Section 8.3), the intention to distinguish the effects due to end-user equipment from those due to the network (Section 8.4), or the extension of multivariate analysis approaches from listening tests towards the inclusion of speaking and conversational phases (Section 8.5). Examples of (ii) are multiparty conferencing systems and their extension towards spatial audio conferencing (Section 8.7).
An example of a quality issue, which can happen in telecommunications as well as in face-to-face communication, is a poor signal-to-noise ratio, which may affect the audibility or understanding of the desired sound (i.e. speech), and may be associated with annoyance due to the unwanted sound (e.g. noise). Here, the degree of communicability of the communication system (telecommunication system and/or environment) plays a crucial role (Raake, 2006), that is, the degree of effectiveness and efficiency of the speech communication that the system enables. Remarkably, humans have different means at their disposal to compensate for the effect of degraded reception when they produce the desired sound themselves during speaking. For example, in noisy environments, interlocutors speak in Lombard speech, that is, they raise their voice and stress syllables differently from normal speech (e.g. Lombard, 1911; Lane et al., 1970). Similarly, when speaking to hearing-impaired or non-native speakers, persons may use clear speech, that is, over-articulated and slowed-down louder speech, to enhance the comprehensibility of their utterances (e.g. Payton et al., 1994).
The talker is often unaware of actively applying means of situational quality improvements. It is important to note that the respective recipient of such context-adapted speech will typically find it to be normal and appropria...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Dedication
  6. Table of Contents
  7. Foreword
  8. Preface
  9. Acknowledgements
  10. Nomenclature and Abbreviations
  11. Contributors
  12. SECTION I Background
  13. SECTION II Theory and Practice
  14. SECTION III Application
  15. SECTION IV Annexes
  16. APPENDIX A ■ Description of Data Sets
  17. Bibliography
  18. Index