DiphoneStudio
In speech, a diphone is defined as a transition between two phonemes. In a musical context, this can be taken to mean not only a transition between two vocal sounds but, more generally, a transition between two sounds of any kind, whether instrumental, vocal, or recorded "sound objects". The definition of a musical "diphone" can also be extended to include a single stable sound or a silence.

The idea of synthesis using diphones was conceived in the late 1980s by Xavier Rodet, in an attempt to address the problem of convincingly synthesizing a musical phrase made up of both transient and stable sounds. With traditional analog studio techniques, a series of transient and stable sounds can be concatenated by splicing small pieces of tape end to end. In today's digital studios, this is done by cross-fading. In both cases, the result often sounds far from convincing, since the inner contents of the spliced or faded sounds do not generally match. In Rodet's system of "generalized diphone control and synthesis", sounds are carefully analyzed, and it is the analysis data that is "spliced" or "faded" together, by interpolating the values between neighboring segments of analysis data. The resulting interpolated analysis can then be resynthesized, providing a much cleaner and more natural result than could be obtained through simple tape-splicing or cross-fading.

Diphone was first implemented on UNIX workstations in 1988 by Xavier Rodet and Philippe Depalle, using source-filter synthesis and, later, additive synthesis. It had only a very rudimentary graphical user interface, which allowed analysis data segments to be placed consecutively on the screen. The first Macintosh version of Diphone, completed in 1996, was (and continues to be) programmed by Adrien Lefevre.
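
The idea of interpolating analysis data rather than cross-fading audio, as described above, can be illustrated with a small sketch. This is not Diphone's actual algorithm or data format: it assumes a deliberately simplified additive model in which each analysis frame is a list of (frequency, amplitude) pairs, one per partial, with partials matched by index. The function names `interpolate_frames` and `resynthesize` are hypothetical and chosen only for this illustration.

```python
import math


def interpolate_frames(frame_a, frame_b, steps):
    """Linearly interpolate between two analysis frames.

    Assumption: a frame is a list of (frequency_hz, amplitude) pairs,
    one per partial, and partials are matched by their list index.
    Returns `steps` frames, starting at frame_a and ending at frame_b.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of partials")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at frame_a, 1.0 at frame_b
        frames.append([
            ((1 - t) * fa + t * fb, (1 - t) * aa + t * ab)
            for (fa, aa), (fb, ab) in zip(frame_a, frame_b)
        ])
    return frames


def resynthesize(frames, samples_per_frame, sample_rate=44100):
    """Simplified additive resynthesis of a sequence of frames.

    Each partial keeps a running phase, so its frequency can change
    from frame to frame without a discontinuity in the waveform.
    """
    phases = [0.0] * len(frames[0])
    out = []
    for frame in frames:
        for _ in range(samples_per_frame):
            sample = 0.0
            for k, (freq, amp) in enumerate(frame):
                sample += amp * math.sin(phases[k])
                phases[k] += 2 * math.pi * freq / sample_rate
            out.append(sample)
    return out


# Transition from a 440 Hz partial to a quieter 660 Hz partial.
transition = interpolate_frames([(440.0, 1.0)], [(660.0, 0.5)], steps=5)
audio = resynthesize(transition, samples_per_frame=10)
```

In this sketch the middle of the transition contains a single partial partway between the two frequencies. A cross-fade of the two original audio signals would instead contain both the 440 Hz and the 660 Hz components at once, each at reduced level, which is one way to picture why mismatched "inner contents" make spliced or faded sounds unconvincing.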