Coherent Multimodal Instrument Design – Case of Granular Synthesizer by Myungin Lee

This talk discusses the properties of coherent multimodal instruments based on numerical crossmodal observation. In particular, by introducing the design process of a multimodal granular synthesizer, this presentation aims to contribute to reorganizing the design of multimodal instruments beyond both old and recent conventions.

The digital medium provides great freedom for new artistic expression through advanced audio, graphics, interfaces, and algorithms, including machine learning. However, while our nature is multimodal, these modalities are inherently separate in the digital domain, and the computational platform allows innumerable varieties of linkages among them. For this reason, the holistic multimodal experience depends heavily on how the different modalities are designed and connected.


Crossmodal Correspondence & Coherence

In our multimodal experience, signals sensed through one sensory modality can influence the processing of information received through another. This phenomenon is called crossmodal interaction. To achieve a "natural" mapping of features, or dimensions, of experience across sensory modalities, we innately rely on crossmodal interaction to find crossmodal correspondences among different modalities. Such natural mapping requires the systematic, consistent, and logical connections that bridge a "coherent" multimodal experience. A classic example of crossmodal correspondence is the takete–maluma effect.

In this experiment by Wolfgang Köhler in 1929, participants were asked to match the words "takete" and "maluma" with the two figures above. About 97% of participants related the word "takete" to the left shape, whereas "maluma" was matched with the right shape.

To extend this experiment into a numerical analysis, let us extract a few features that represent the characteristics of the two figures and words. The figure below shows the extracted features: a geometric observation, the waveform from text-to-speech, the audio spectrogram, and a pitch estimate.
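Pitch, one of the audio features listed above, is simple enough to sketch numerically. Below is a minimal, self-contained C++ sketch of an autocorrelation-based pitch estimate; it is illustrative only, not the analysis code used in the study, and the function name `estimatePitchHz` and the synthetic test tone are assumptions.

```cpp
// Autocorrelation-based pitch estimate for a mono waveform (illustrative sketch).
#include <cmath>
#include <cstdio>
#include <vector>

// Returns the dominant pitch in Hz within [fMin, fMax], or 0 if none is found.
double estimatePitchHz(const std::vector<float>& x, double sampleRate,
                       double fMin = 60.0, double fMax = 500.0) {
    const size_t lagMin = static_cast<size_t>(sampleRate / fMax);
    const size_t lagMax = static_cast<size_t>(sampleRate / fMin);
    double bestCorr = 0.0;
    size_t bestLag = 0;
    for (size_t lag = lagMin; lag <= lagMax && lag < x.size(); ++lag) {
        double corr = 0.0;
        for (size_t n = 0; n + lag < x.size(); ++n)
            corr += static_cast<double>(x[n]) * x[n + lag];
        if (corr > bestCorr) { bestCorr = corr; bestLag = lag; }
    }
    return bestLag ? sampleRate / static_cast<double>(bestLag) : 0.0;
}

int main() {
    // A 220 Hz test tone stands in for the waveform of a spoken word.
    const double pi = 3.14159265358979323846;
    const double sr = 16000.0;
    std::vector<float> tone(1600);
    for (size_t n = 0; n < tone.size(); ++n)
        tone[n] = static_cast<float>(std::sin(2.0 * pi * 220.0 * n / sr));
    std::printf("estimated pitch: %.1f Hz\n", estimatePitchHz(tone, sr));
    return 0;
}
```

Applied to the text-to-speech waveforms of "takete" and "maluma", such an estimate yields one numeric descriptor that can then be set against the geometric features of the two shapes.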


Interface

The instrument's interface is the membrane of interaction between human and technology. Designing an interface for a digital instrument, in particular, requires substantial effort to achieve coherent crossmodal interaction. While the characteristics of an acoustic instrument are determined by its physicality, including its structure, resonance, texture, and space, digital instruments on computational platforms are inherently non-physical. The digital instrument designer can separate sound production from the means of control. This circumstance gives great freedom to instrument design. At the same time, designing an interface to interact with sound material that is now detached from physicality is challenging. Nevertheless, our bodies and movements remain the most expressive tools we have.


To assess the characteristics of coherent instruments, this study proposes a model that interprets the experience of the music, instrumentalist, instrument, and audience as a function. 

This model derives two properties of coherent instruments: time invariance and a perceptible interface. Time invariance means that the input and output characteristics of a system do not change with time. The second property states that the audience can identify and relate the gestural event to the resulting sound. Both properties are necessary for our brain to synthesize information from systematic, consistent, and logical crossmodal stimuli through multisensory integration.
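As a minimal formal sketch in standard signals-and-systems notation (not necessarily the dissertation's exact formulation), writing the instrument as a mapping T from a gesture signal x(t) to a sound signal y(t), time invariance states that delaying the gesture only delays the sound:

```latex
% Time invariance: a gesture shifted by \tau produces the same sound, shifted by \tau.
y(t) = T\{x(t)\}
\quad \Longrightarrow \quad
y(t - \tau) = T\{x(t - \tau)\} \quad \text{for all } \tau .
```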

These observations raise underlying questions:

- How do we decide which multimodal experience is coherent?
- What is the "natural" mapping of multimodal features?
- Is a user study the only method to evaluate the experience?
- Is there a way to numerically analyze the level of multimodal coherence?
- What opportunities does the multimodal experience allow compared with the monomodal experience?


This study discusses the key elements of multimodal instrument design and its potential benefits through case studies. The design process of the multimodal granular synthesizer, AlloThresher, is introduced below.


- AlloThresher: Multimodal Granular Synthesizer

<iframe src="https://www.youtube.com/embed/nSmKgdPC1ek?feature=shared" width="560" height="314" allowfullscreen="allowfullscreen"></iframe>

AlloThresher is a multimodal instrument based on granular synthesis with a gestural interface. Granular synthesis is a sound synthesis method that creates complex tones by combining and mixing simple microsonic elements called grains. Holding a smartphone with a gyroscope and accelerometer in each hand, the performer can precisely and spontaneously control the parameters of the granular synthesis in real time. The devised gestural interface also incorporates an adaptive filter and reverberation, adding expressiveness. The modulated spectrogram of each grain, together with post-processing, generates the corresponding visuals, which morph and blend dynamically with the instrumentalist's performance. The entire software is programmed in C++ and optimized for real-time multimodality. By removing conventional controls such as knobs and sliders, the instrument exploits the seamless connection between modalities that the gestural interface affords. The instrumentalist's physical presence and gestures become part of the space and the performance, so the audience can simultaneously observe and cohesively connect the audio, the visuals, and the interface.
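To make the synthesis core concrete, here is a minimal, self-contained C++ sketch of granular synthesis that sums short, Hann-windowed sine grains into an output buffer. It is not the AlloThresher source; the `Grain` struct, `renderGrains`, and the parameter values are illustrative assumptions.

```cpp
// Minimal granular-synthesis sketch: sum short, Hann-windowed sine grains.
// Illustrative only; not the AlloThresher implementation.
#include <cmath>
#include <cstdio>
#include <vector>

struct Grain {
    double onsetSec;   // when the grain starts
    double durSec;     // grain length (typically a few to ~100 ms)
    double freqHz;     // grain pitch
    double amp;        // grain amplitude
};

std::vector<float> renderGrains(const std::vector<Grain>& grains,
                                double sampleRate, double totalSec) {
    std::vector<float> out(static_cast<size_t>(sampleRate * totalSec), 0.0f);
    const double pi = 3.14159265358979323846;
    for (const Grain& g : grains) {
        const size_t start = static_cast<size_t>(g.onsetSec * sampleRate);
        const size_t len   = static_cast<size_t>(g.durSec * sampleRate);
        if (len < 2) continue;
        for (size_t n = 0; n < len && start + n < out.size(); ++n) {
            const double t = n / sampleRate;
            // Hann window removes clicks at grain boundaries.
            const double w = 0.5 * (1.0 - std::cos(2.0 * pi * n / (len - 1)));
            out[start + n] += static_cast<float>(g.amp * w *
                              std::sin(2.0 * pi * g.freqHz * t));
        }
    }
    return out;
}

int main() {
    // A short cloud of three overlapping grains.
    std::vector<Grain> cloud = {
        {0.00, 0.050, 440.0, 0.3},
        {0.02, 0.040, 660.0, 0.2},
        {0.05, 0.060, 330.0, 0.3},
    };
    std::vector<float> buffer = renderGrains(cloud, 48000.0, 0.2);
    std::printf("rendered %zu samples\n", buffer.size());
    return 0;
}
```

In the instrument itself, rather than a fixed grain list, the gesture data from the two smartphones would continuously modulate parameters of this kind (grain duration, pitch, amplitude, density) while the same grain data feeds the visuals.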


Additionally, this study suggests a way to use correlation coefficients between modalities to assess crossmodal correspondence. 
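A minimal sketch of such an assessment, assuming two time-aligned per-frame feature sequences (for example, an audio feature and a visual feature extracted frame by frame), is a Pearson correlation coefficient. The function name `pearson` and the sample values below are illustrative; the actual feature pairs and procedure are described in the dissertation.

```cpp
// Pearson correlation between two aligned feature sequences (illustrative sketch).
// Assumes x and y have equal length of at least two frames.
#include <cmath>
#include <cstdio>
#include <vector>

double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}

int main() {
    // Hypothetical per-frame features: an audio descriptor vs. a visual descriptor.
    std::vector<double> audioFeature  = {0.1, 0.4, 0.8, 0.6, 0.2};
    std::vector<double> visualFeature = {0.2, 0.5, 0.9, 0.5, 0.1};
    std::printf("correlation = %.3f\n", pearson(audioFeature, visualFeature));
    return 0;
}
```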


Related Articles

- Myungin Lee, "Coherent Digital Multimodal Instrument Design and the Evaluation of Crossmodal Correspondence," Ph.D. dissertation, August 2023. https://escholarship.org/uc/item/3gb4h770

- Myungin Lee, Jongwoo Yim, "AlloThresher: Multimodal Granular Synthesizer," International Computer Music Conference (ICMC), June 2024. https://www.researchgate.net/publication/382330556_AlloThresher_Multimodal_Granular_Synthesizer