Real-Time Performance via User Interfaces to Musical Structures
Stephen Travis Pope--CCRMA, Dept. of Music, Stanford University
June, 1991; revised January, 1993
Author's current address:
CNMAT, Dept. of Music, U. C. Berkeley--stp@CNMAT.Berkeley.edu
(since 1996, CREATE, Dept. of Music, U. C. Santa Barbara--stp@create.ucsb.edu)
Originally in:
Proceedings of the Int'l Workshop on Man-Machine Interaction in Live Performance. (Pisa, Italy, June, 1991) pp. 167-176.
Reprinted in:
INTERFACE 22(3): 195-212. (August, 1993)
Abstract
This informal and subjective presentation will introduce and compare several
software systems written by myself and others for computer music
composition and performance based on higher-level abstractions of musical
data structures. I will then evaluate a few of the issues in real-time
interaction with structural descriptions of musical data.
The premise is that very interesting live-performance software environments
could be based on existing technology for structural music description, but
that much of the current real-time performance-oriented software for music is
rather limited, in that it supports only very low-level notions of musical
structures. The examples will demonstrate various systems for graphical
interaction with procedural, knowledge-based, hierarchical and/or stochastic
music description systems that could be used for live performance.
Introduction
This paper discusses various types of software man-machine interfaces to
middle- and high-level musical structures in terms of their applicability to
live performance. My contention is that many good ideas for graphical
structure-editing interfaces already exist in the literature and could
well be used for controlling performance at levels higher than those
addressed by most current systems. The software systems described here can be
divided into two classes: those that are primarily designed for use by
composers; and novel systems for graphical interaction with structured data.
The paper opens with several comments about formalisms for composition, and
how these might be relevant to live performance. This section is followed by
a collection of annotated examples of software man-machine interfaces for
composition or other tasks, in which the pertinence of each to live
performance is discussed.
Formalisms for Composition
A number of compositional formalisms have been developed through the ages
for various types of music. The relationship between musical form and
structure, on the one hand, and compositional methods, on the other, is
taught in depth in music academies as part of our composition or performance
training.
In many musics, the role of text is central in the development of simple
musical structures, which often mirror the text's structure, as in the case of
many songs or liturgical musics. The central role of diatonic harmony as a
structure-giving element in the western music of the last 400 years is also
obvious. In the early 20th century, diatonic harmony was replaced with
12-tone formalisms without a parallel advancement in the structural aspects.
Alban Berg, for example, wrote elaborate sonata-allegro forms in 12-tone technique.
When looking at formalisms for composition as algorithms in the computer
science sense, one should cite Donald Knuth's definition of algorithm in
terms of finiteness, definiteness, input, output, and effectiveness (Knuth
1973). According to this definition, Guido d'Arezzo's hand and Philippe de
Vitry's use of isorhythm both count as compositional algorithms.
Historically, composers have always used the highest technology available to
them in composition (although I know of composers who would argue with this
statement), so it can be informative to investigate the impact of 20th
century mathematics and information theory on this field. Computer software
has been applied to music composition since the early days of the
availability of computers. The first attempts (in the 1950s) to use software
for composition fell basically into two areas: implementing complex
stochastic methods; and realizing strict serialism (Hiller 1970).
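To make the notion of a compositional algorithm concrete, here is a minimal sketch of the stochastic branch of that early work: a first-order Markov chain over scale degrees. The transition table and function names are invented for illustration (Hiller's rule systems were far more elaborate), but the procedure is finite, definite, and effective, with input and output, in Knuth's sense.

```python
import random

# A toy first-order Markov chain over scale degrees: each degree maps to
# the degrees that may follow it. The table is invented for illustration.
TRANSITIONS = {
    0: [2, 4, 7],
    2: [0, 4, 5],
    4: [2, 5, 7],
    5: [4, 7],
    7: [0, 4],
}

def stochastic_melody(length, start=0, seed=None):
    """Generate a melody as a list of scale degrees by a random walk."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        melody.append(rng.choice(TRANSITIONS[melody[-1]]))
    return melody

print(stochastic_melody(16, seed=42))
```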
More recently, contemporary mathematical and software techniques have led
to a wide range of attempts (with generally aesthetically questionable
results) in the areas of fractal techniques based on self-similarity over
many orders of magnitude, procedural techniques based on the
composition-as-programming paradigm (Loy 1989), the use of generative
grammars for form generation (Roads 1978), and more advanced artificial
intelligence software techniques for knowledge-based composition. The
literature in this field is rich.
The relevance of this to live performance is that many software
implementations of such formalisms can be executed in real time by
contemporary PC- or workstation-class computers. The more complex issue is
how to construct man-machine interfaces that facilitate interaction with the
structures that are relevant to the chosen formalism or algorithm in a live
performance situation. The examples discussed below are all well-known and
documented software systems; the newest of them (T-R Trees) is two years old.
I believe that any of them could be used to great effect in real-time
performance situations where the abstractions they (re)present can be mapped
onto the piece's compositional structure.
Software Interfaces for Composers
In the first group of examples below, several software systems for music
processing are described and illustrated that take novel approaches to the
processes of music composition or performance, or present novel music
description languages. The second example section presents several related
technologies that the author believes might profitably flow into software
tools for the performance of music. It is interesting to note that (though
not intended as a selection criterion) all but one of these examples (ARA)
are written in the Smalltalk-80 programming system, and make use of the
system's object-oriented design, and rapid prototyping and incremental
refinement features, often presenting them to the end user in an extensible package.
Novel Music Processing Systems
Sound Kit
Mark Lentczner's Sound Kit (Lentczner 1985) is a novel example of the
category of sampled-signal editing tools. In Sound Kit, sampled sounds may be
edited using the menu of operations visible in the view to the lower-right of
Figure 1. The difference between this and other sample editors is that the
system maintains an operation tree (visible in the upper-left view in Figure
1), which can be used to describe sounds in terms of Smalltalk-80-language
messages. In the example, one can see the results of several copy/cut/paste
operations in terms of the Smalltalk-80 messages copyFrom:to: and pasteAt:.
This is a simple technique for capturing the worker's process and presenting
it to him or her in a form that may be useful for evaluation of larger-scale
patterns. This is also a possible format for describing and interacting with
interesting real-time transformations of sampled sound (i.e., the kind that
are often undertaken in performances using IRCAM's 4X signal processor). The
operation trees allow a fair level of abstraction in the representation and
manipulation of sample processing scripts, and thus they could effectively be
used for interaction with a signal processor that is processing live sound.
Figure 1: Sound Kit sound view and menu (in front) and operation tree (behind).
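To illustrate the idea, here is a minimal sketch (in Python, with all class and method names hypothetical rather than Sound Kit's actual API) of an operation tree that records each edit as a message-send node, so that the editing script can later be inspected or replayed:

```python
# Each edit on a sampled sound is recorded as a node naming the message
# sent and its arguments; the accumulated nodes form the operation history.
class OpNode:
    def __init__(self, selector, args, source):
        self.selector, self.args, self.source = selector, args, source

class Sound:
    def __init__(self, samples, history=None):
        self.samples = samples
        self.history = history or []

    def copy_from_to(self, start, stop):
        node = OpNode('copyFrom:to:', (start, stop), self)
        return Sound(self.samples[start:stop], self.history + [node])

    def paste_at(self, other, index):
        node = OpNode('pasteAt:', (index,), self)
        samples = self.samples[:index] + other.samples + self.samples[index:]
        return Sound(samples, self.history + other.history + [node])

    def script(self):
        """Render the history as Smalltalk-style message sends."""
        return [f"{n.selector} {n.args}" for n in self.history]

snd = Sound(list(range(100)))
excerpt = snd.copy_from_to(10, 20)
result = snd.paste_at(excerpt, 50)
print(result.script())
```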
Kyma
The Kyma system, developed by Carla Scaletti (Scaletti 1989), and now
commercialized by Symbolic Sound Corporation, is a music composition package
that is linked to a real-time digital signal processor (named, for some
reason, the Capybara, after a large and ugly rodent). In Kyma, no differentiation is
made between sounds, scores and compositions; structures are built as
hierarchies or concatenations of sub-sounds. The hierarchies can be described
by directed acyclic graphs (DAGs), where the leaves of the graphs represent
component sounds and the arcs can be marked to denote alterations of the
sub-sound, as shown in Figure 2. Thus Kyma offers the musique concrète
abstraction of composition as sound manipulation within a system that scales
well over many orders of magnitude (i.e., up to very large scale
compositions). This is one of the few current music software systems that
breaks away from the paradigms of CMN editors or tape-recorder-oriented
sample editors. Kyma also allows the specification of new sounds using
refinement of existing sounds, so that scores can include comprehensive sound
inheritance hierarchies. This system has already been used in live
performance by its creators and has aptly demonstrated its effectiveness
under these circumstances.
Figure 2: Kyma sound view showing a DAG for three plucks with delays and transpositions.
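A minimal sketch of this kind of structure, with invented names (this is not Kyma's object model), might represent a composite sound as arcs to sub-sounds, each arc marked with a delay and a transposition; note that one leaf shared by several arcs makes the structure a DAG rather than a tree:

```python
# Leaves are component sounds; composite nodes hold arcs marked with
# (delay, transpose) alterations, as in the DAG of Figure 2.
class Leaf:
    def __init__(self, name):
        self.name = name
    def events(self, time=0.0, transpose=0):
        return [(time, self.name, transpose)]

class Node:
    """A composite sound: children plus per-arc (delay, transpose) marks."""
    def __init__(self, *arcs):
        self.arcs = arcs          # each arc: (child, delay, transpose)
    def events(self, time=0.0, transpose=0):
        out = []
        for child, delay, trans in self.arcs:
            out += child.events(time + delay, transpose + trans)
        return sorted(out)

pluck = Leaf('pluck')             # one leaf shared by three arcs: a DAG
phrase = Node((pluck, 0.0, 0), (pluck, 0.5, 7), (pluck, 1.0, 12))
print(phrase.events())            # three plucks, delayed and transposed
```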
ARA
My 1984 ARA system used knowledge-based software techniques to present a
composer's user interface based on the paradigm of developing a
composition-specific vocabulary to describe each work. When using ARA, a
composer starts by defining a set of basic melodic and rhythmical materials
(visible in the load_set expressions in the motive area at the top of the
window in Figure 3), and then describes these materials using freely-chosen
adjectives (such as those visible in the load_attribute expressions in Figure
3). Larger structures are built by moving into the higher-level areas (such
as the voice area shown in the middle of Figure 3) while keeping some set of
attributes marked as "active." The active attributes are shown as a stack in
the menu that is visible to the right of the main view in Figure 3, where
something "round, brown, dark" is being set to follow the "consonant,
rolling, harder" section. In this way, the composer characterizes his
materials at each level of hierarchy using a vocabulary of his or her own
design, and the system uses this information to select and manipulate the
musical materials. The image of using this type of system in performance
entices me. I'd love to be able to play an instrument while instructing the
accompaniment system to use "small pointy blue" timbres and mix them with my
instrument into a "smooth and dark" textured mix.
Figure 3: The ARA user interface, showing the motive and voice areas opened
and the gen, mix and play areas collapsed.
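The selection mechanism can be sketched minimally as attribute-set matching; the motives and adjectives below are invented for illustration, and ARA's actual knowledge base was considerably richer:

```python
# Motives are tagged with freely-chosen adjectives; selection picks the
# material whose attributes best overlap the currently "active" descriptors.
MOTIVES = {
    'm1': {'round', 'brown', 'dark'},
    'm2': {'consonant', 'rolling', 'harder'},
    'm3': {'pointy', 'blue', 'dark'},
}

def best_match(active, motives=MOTIVES):
    """Return the motive sharing the most attributes with the active set."""
    return max(motives, key=lambda m: len(motives[m] & set(active)))

print(best_match(['round', 'brown', 'dark']))   # -> 'm1'
print(best_match(['dark', 'blue']))             # -> 'm3'
```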
DoubleTalk
DoubleTalk (Pope 1986) is a system whereby logic-marked Petri nets (called
predicate-transition or PrT diagrams) are used for composition. The network
can be thought of as a finite state automaton or transition diagram whereby a
logic description language (a Prolog variant) is used to describe the
conditions under which network transitions will fire, and to describe the
types of tokens that flow within the network. Figure 4 shows a DoubleTalk net
editor (with several of the system menus visible), a net marking editor (with
which tokens may be placed in the net), and a form type definition editor
(with which one defines token types). The system falls in the no-man's-land
somewhere between composition and performance systems, since low-level
networks can be isomorphic to note-by-note scores whose performance can be
influenced in real time, whereas high-level nets can be thought of as
abstract machines for semi- (or wholly-) deterministic compositional
structures. In terms of the real-time use of such a system, one can simply
view it as a vastly more powerful and scalable version of the now popular
graphical data-flow editors such as Miller Puckette's MAX.
Figure 4: DoubleTalk's net, marking and form type editors showing a section
of the score of my work Requiem Aeternam Dona Eis.
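As a rough illustration of the mechanism (not DoubleTalk's Prolog-based implementation), the following toy predicate-transition net fires a transition only when its guard predicate holds over the tokens it would consume; the net, the guard, and the token-transformation rule are all invented for illustration:

```python
# Places hold lists of tokens; a transition is enabled when every input
# place is non-empty and the guard predicate holds over the front tokens.
class Transition:
    def __init__(self, inputs, outputs, guard):
        self.inputs, self.outputs, self.guard = inputs, outputs, guard

    def enabled(self, marking):
        return (all(marking[p] for p in self.inputs)
                and self.guard([marking[p][0] for p in self.inputs]))

    def fire(self, marking):
        tokens = [marking[p].pop(0) for p in self.inputs]
        for p in self.outputs:
            marking[p].append(sum(tokens))  # a simple transformation rule
        return marking

marking = {'a': [60, 64], 'b': [7], 'c': []}
up_a_fifth = Transition(['a', 'b'], ['c'], guard=lambda toks: toks[0] < 72)
while up_a_fifth.enabled(marking):
    marking = up_a_fifth.fire(marking)
print(marking)   # {'a': [64], 'b': [], 'c': [67]}
```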
EventGenerators in the MODE
The Musical Object Development Environment (MODE) (Pope 1991) has several
components that relate to compositional algorithms and tools. (Pope 1989b)
describes the system's framework of "EventGenerators" (EGens), objects that
embody the form of some "middle-level" musical structure such as a chord,
cluster or ostinato. EGens can be used to build systems that support a
methodology of "composition by refinement" (Pope 1989a) whereby the composer
begins to describe a composition in rough terms in terms of general-purpose
EGens, and then refines the description or behaviors of these objects to more
exactly specify his or her wishes. To my knowledge, there is very little work
being undertaken at present aimed at bridging the gap between notes and
higher-level musical structures. I used the EGens system in a live MIDI
experiment called Day, samples of which I played to accompany a talk about
EGens at the 1989 ICMC. In Day there were some sections where the user
interacted with autonomously-running processes via EGen editors, and others
where EGens polled MIDI input and responded to it in various manners.
Figure 5: MODE EventGenerator example showing a code fragment that defines a
"DynamicPodCloud" object and the resulting event list displayed in
Hauer-Steffens (piano-roll-like) notation.
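The flavor of composition by refinement can be sketched as follows; the class and parameter names are hypothetical and do not reproduce the MODE's actual EventGenerator protocol:

```python
import random

# An EventGenerator embodies some middle-level structure and answers a list
# of (time, pitch, amplitude) events; refinement means subclassing and
# overriding its behavior.
class EventGenerator:
    def events(self):
        raise NotImplementedError

class Cloud(EventGenerator):
    """A stochastic cloud: n events scattered over a duration and range."""
    def __init__(self, n, duration, low, high, seed=None):
        self.n, self.duration = n, duration
        self.low, self.high = low, high
        self.rng = random.Random(seed)
    def events(self):
        return sorted((self.rng.uniform(0, self.duration),
                       self.rng.randint(self.low, self.high), 0.5)
                      for _ in range(self.n))

class DynamicCloud(Cloud):
    """Refinement: amplitude now swells linearly across the cloud."""
    def events(self):
        evs = super().events()
        return [(t, p, t / self.duration) for t, p, _ in evs]

print(DynamicCloud(8, 4.0, 48, 72, seed=1).events())
```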
T-R Trees in the MODE
Another component of the MODE is a collection of tools based on Fred
Lerdahl and Ray Jackendoff's "generative theory of tonal music" (Lerdahl and
Jackendoff 1983).
In this package, patterns of tension and relaxation (or decreasing and
increasing stability) in musical motives are analyzed into hierarchical
structures according to a set of well-formedness and preference rules defined
by Lerdahl and Jackendoff. The similarities between these
"tension-relaxation" trees and the prosodic stress trees used by linguists to
represent the inflection of spoken utterance can be used to manipulate spoken
text and musical materials using the same paradigms. The novelty of the T-R
trees system stems from the fact that the hierarchies described in the
generative theory (as well as prosody), bridge the gap between expressive and
structural hierarchies in music and text-something that is visibly missing in
most software tools for composers. The use of the T-R Trees system with
speech processing cannot currently be used in real time (owing to its use of
the phase vocoder for pitch, duration, and spectral processing), but the
possibilities for using the system to generate of modify MIDI streams or
commands to simpler signal processing systems are numerous and worthy of
further investigation.
Figure 6: A simple T-R Tree editor example showing the prosody of a specific
way of reading the word dunkelkammergespräche.
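A minimal sketch of such a stress tree (invented for illustration, in the spirit of prosodic strong/weak trees rather than the MODE's actual T-R Trees code) assigns each leaf a relative weight that could drive, for example, MIDI velocity scaling:

```python
# Each interior node marks which branch is the more stable ("head") event;
# traversal assigns every leaf a depth, where smaller depth = stronger.
class TRNode:
    def __init__(self, left, right, head='left'):
        self.left, self.right, self.head = left, right, head

def stresses(tree, depth=0, out=None):
    """Collect (leaf, depth) pairs; smaller depth = stronger stress."""
    if out is None:
        out = []
    if isinstance(tree, str):                     # leaf: syllable or note
        out.append((tree, depth))
        return out
    for branch in ('left', 'right'):
        child = getattr(tree, branch)
        weaker = 0 if branch == tree.head else 1  # non-head branch is weaker
        stresses(child, depth + weaker, out)
    return out

# A possible stress tree for the syllables "dun-kel-kam-mer":
tree = TRNode('dun', TRNode('kel', TRNode('kam', 'mer'), head='right'))
for syllable, depth in stresses(tree):
    print(syllable, 'velocity', 100 - 15 * depth)
```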
Harmonic Analysis Tool
Paul Alderman's Harmonic Analysis Tool (HAT) is a system that analyzes
musical chords using traditional rule-based expert system techniques. It is
useful in that it can be presented with arbitrary pitch sets, which it
attempts to analyze using the rules of diatonic harmony. The analysis proceeds from
chord analysis to key analysis (if more than one chord is present), using
rules about cadences and modulation. HAT's uniqueness lies in its ability to
analyze a given (possibly unstructured) input in terms of a given rule set.
Its current rule-base only addresses diatonic harmony, but could be augmented
or replaced with a different rule-base relatively easily. For live
performance, this is simply an example of a system for somewhat higher-level
musical feature extraction, the results of which could be used in any number
of ways.
Figure 7: The Harmonic Analysis Tool showing the analysis of a ninth chord;
the system's rule base only included chords up to the seventh, so it analyzes
this as a major seventh chord.
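The rule-matching idea can be sketched as template matching over pitch-class sets; the templates below stop at sevenths, so, like the rule base described in the figure caption, a ninth chord receives a seventh-chord reading. This is hypothetical code, not HAT's rule language:

```python
from itertools import combinations

# Interval templates (in semitones between adjacent chord degrees).
TEMPLATES = {
    (4, 3): 'major triad',
    (3, 4): 'minor triad',
    (4, 3, 3): 'dominant seventh',
    (4, 3, 4): 'major seventh',
    (3, 4, 3): 'minor seventh',
}

def analyze(pitches):
    pcs = sorted(set(p % 12 for p in pitches))
    # Try the full pitch-class set first, then subsets with one pc dropped,
    # mimicking a rule base that ignores extensions it does not know about.
    for size in (len(pcs), len(pcs) - 1):
        for subset in combinations(pcs, size):
            for root in subset:
                degrees = sorted((p - root) % 12 for p in subset)
                ivs = tuple(b - a for a, b in zip(degrees, degrees[1:]))
                if ivs in TEMPLATES:
                    return root, TEMPLATES[ivs]
    return None

print(analyze([60, 64, 67]))           # -> (0, 'major triad')
print(analyze([60, 64, 67, 71, 74]))   # ninth chord read as 'major seventh'
```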
Examples of Related Technologies
Alternate Reality Kit
The Alternate Reality Kit (ARK) developed by Randy Smith at the Xerox Palo
Alto Research Center (PARC) (Smith 1986; 1987), is another example of visual
programming whereby the system presents the user with a simulated reality
within which he or she can program the laws of interaction of objects. In the
example shown in Figure 8 one can see several operation switches, the
warehouse of all objects, and the copy-object button (with the Xerox logo, of
course). The example shows two pens (the circles to the right of the drawing
tables in the upper portion of Figure 8) that have been enrolled in the
spring force and the Newtonian laws of motion. They therefore oscillate
around each other in the manner of binary stars. The rectangular view in the
center-top of the figure is a piece of virtual paper that I "threw" under the
pens, creating the sinusoid-like drawings on it. ARK has been used to build
simulations ranging in size up to comprehensive models of the interaction of
elementary particles in bubble chambers. It is of interest here because it
removes users entirely from the domain of computer programming and lets them
remain in their own domain of expertise, without limiting their ability to
extend the system or to build new models. I like the idea of building virtual
worlds that consist of interacting objects for modeling the processes within
a musical piece, and then interacting with them in live performance based on
an animated alternate reality.
Figure 8: Alternate Reality Kit example; the hand icon is the user's mouse.
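The physics of that demonstration reduces to a textbook coupled-spring system; a minimal sketch follows, purely illustrative (ARK itself is a direct-manipulation Smalltalk-80 environment, not a scripting one), with simple Euler integration along one dimension:

```python
# Two objects "enrolled in" a spring force and Newtonian motion: the spring
# pulls them together and pushes them apart symmetrically, so they oscillate.
def step(x1, v1, x2, v2, k=1.0, rest=1.0, m=1.0, dt=0.01):
    stretch = (x2 - x1) - rest          # Hooke's law along one dimension
    f = k * stretch
    v1 += (f / m) * dt                  # force accelerates the first pen...
    v2 -= (f / m) * dt                  # ...and decelerates the second
    return x1 + v1 * dt, v1, x2 + v2 * dt, v2

x1, v1, x2, v2 = 0.0, 0.0, 2.0, 0.0     # start with the spring stretched
for i in range(400):
    x1, v1, x2, v2 = step(x1, v1, x2, v2)
    if i % 100 == 0:
        print(round(x1, 3), round(x2, 3))
```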
ThinkerToy
The ThinkerToy system developed by Steven Gutfreund (Gutfreund 1987) is a
graphical environment for modeling decision support problems. It uses iconic
presentations of possibly complex mathematical operations in the form of
"ManiplIcons," which have both graphical and semantical properties.
ThinkerToy applications are generally configured by system experts for use by
domain experts. The system expert analyzes the domain and designs a set of
appropriate ManiplIcons; the domain expert uses this palette of operations,
and may extend it within the graphical manipulation paradigm. Figure 9 shows
examples of ThinkerToy in action: user interfaces constructed for
statistical data manipulation. Figure 9a shows the iconic
buttons for several operations that are defined for array-type data. Figure
9b shows a more verbose version of several operations whereby the
Smalltalk-80 language messages are visible. In Figure 9c one sees a control
panel built for a statistician or scientist who is evaluating data gathered
from the Pioneer spacecraft on its fly-by of the planet Saturn. To the left of
and below the main graph view, one can see the icons for various operations
on the data; the three large buttons in the lower-left corner, for example,
represent three methods of grouping the data. For musical applications, the
idea of a domain-specific, extensible set of default operations integrated in
a graphical data manipulation environment seems quite compelling. (Gutfreund
1987) contains several other ThinkerToy examples that provide good food for
thought in light of possible musical applications. The applicability of this
type of system to live performance processing of event-oriented or
signal-oriented data is obvious.
Figure 9a: ThinkerToy control board for numerical arrays, showing several of
the possible operations in their iconic (ManiplIcon) form.
Figure 9b: User interface components for concrete accessory tools that
operate on arrays showing the Smalltalk-80 messages they send to their operands.
Figure 9c: A ThinkerToy user interface for data analysis showing a data
graph and ManiplIcons for several types of operations.
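The core idea, operations as first-class objects in an extensible palette, can be sketched minimally as follows (all names invented for illustration, not ThinkerToy's actual classes):

```python
# An operation carries both a label (standing in for its icon) and a
# function; a control panel is just a palette of such objects, which the
# domain expert can compose or extend without leaving the paradigm.
class ManiplIcon:
    def __init__(self, label, fn):
        self.label, self.fn = label, fn
    def __call__(self, data):
        return self.fn(data)

# A default palette a "system expert" might configure for array data:
palette = {
    'mean':   ManiplIcon('mean', lambda xs: sum(xs) / len(xs)),
    'smooth': ManiplIcon('smooth', lambda xs: [(a + b) / 2
                                               for a, b in zip(xs, xs[1:])]),
    'group3': ManiplIcon('group3', lambda xs: [xs[i:i + 3]
                                               for i in range(0, len(xs), 3)]),
}

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(palette['mean'](data))
print(palette['mean'](palette['smooth'](data)))   # operations compose
print(palette['group3'](data))                    # one way of grouping data

# The domain expert extends the palette within the same paradigm:
palette['span'] = ManiplIcon('span', lambda xs: max(xs) - min(xs))
print(palette['span'](data))
```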
Chinese Temple Design Tool Kit
Ranjit Makkuni's Chinese Temple Design Toolkit (CTDTK), also developed at
Xerox PARC, is an example of a software package for capturing and aiding
the process of refining a design. The process of designing façades for
Chinese temples is modeled as consisting of the phases of definition of a
vocabulary of basic elements, design of a compositional topology, and
integration of a finished design. The system's various editors use
phase-specific input techniques for mapping gestures or graphical
configurations onto the properties of temple elements or configurations.
An example of this is the vocabulary editor shown in Figure 10a, where a mouse
gesture (shown in the left part of the view) is mapped onto the spacing and
orientation of tiles. Once the vocabulary is defined, a topology editor
(shown in Figure 10b), may be used to build a larger structure from a
collection of basic elements. Any number of transformations are possible in
this editor, such as mappings based on mirror symmetry. Each element of a
design has a history, and these can be used to annotate the process of
refinement via threads: lines between elements, topologies, and designs that
show the heritage of designs within a process. Scene editors are used to
represent the overall process of design, as illustrated in Figure 10c, where
the weight and shape of line connections represent the strength of
relationships, and design scenes are constructed to capture higher-level
aspects of the process. The obvious relevance to composers is that CTDTK
aids the introspection and process-management aspects of creative activity
and allows the designer to develop so-called
"libraries of process." This type of system seems extremely useful for groups
such as our local Tonus Finalis ensemble, which practices structured
improvisation. Having the computer act as an "actively-learning listener"
during several rehearsals of a performance and then allowing the performers
(or a third party) to interact with models of their processes or their
interaction during a performance could lead to very interesting results.
Figure 10a: A CTDTK Vocabulary editor whereby the mapping of a gesture onto
the spacing of tiles can be designed.
Figure 10b: A CTDTK topology editor showing the palette of vocabulary
elements at the bottom and the editor in which the user places them at the top.
Figure 10c: A scene editor that represents the design process. The paths
connecting the various editors represent types and strengths of relationships
between items.
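The gesture-to-parameter mapping of the vocabulary editor can be sketched minimally as follows; the feature extraction and parameter names are invented for illustration and do not reproduce CTDTK's actual mapping:

```python
import math

# A mouse gesture (a polyline of points) is reduced to features, here just
# its overall extent and direction, which set the spacing and orientation
# of a row of tiles.
def gesture_to_tiles(points, n_tiles=8):
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    spacing = length / max(n_tiles - 1, 1)              # extent -> spacing
    angle = math.atan2(y1 - y0, x1 - x0)                # slope -> orientation
    return [{'x': x0 + i * spacing * math.cos(angle),
             'y': y0 + i * spacing * math.sin(angle),
             'angle_deg': math.degrees(angle)} for i in range(n_tiles)]

tiles = gesture_to_tiles([(0, 0), (40, 10), (80, 20)])
print(tiles[0], tiles[-1], sep='\n')
```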
Issues
The examples above are intended to demonstrate the powerful abstractions
that have been developed for representing and manipulating middle- and
high-level musical constructs. I believe that most of them could effectively
be applied in demanding real-time environments to produce flexible, powerful
and abstract performance instruments. The primary difficulties I see in the
way of this actually taking place are the poor support for multi-level
hardware and software architectures for performance instruments, and the poor
software development environments for real-time applications.
The first issue, that of system architecture, can be related to two primary
problems: system complexity and cost; and data interchange protocols. Many
manufacturers (and even many users) still believe that it would be
prohibitively expensive to allocate separate processors to critical real-time
and user-interface tasks. The recent multi-DSP systems developed for NeXT
machines at IRCAM and by Ariel Corp. point in a new direction in this area.
The harder problem is that of standardized communication protocols for the
exchange of complex data between processors. The two extremes of MIDI and
sample streams simply do not suffice if we are to develop better performance instruments.
Given adequate solutions to these two problems, I believe it will soon be
unacceptable to offer performance tools based on the current, very-low-level
representations based on events and signals.
Conclusions
This short, informal essay was intended to raise several questions and present
several rather unorthodox opinions of mine on issues related to computer
music systems for live performance. I hope to have demonstrated the power of
higher-level abstractions in music representation, presentation and
manipulation in terms of a collection of software user interfaces that could
be used today to control performance instruments.
References
Gutfreund, S. H. 1987. "ManiplIcons in ThinkerToy." Proceedings of the 1987
ACM Conference on Object-Oriented Programming Systems, Languages and
Applications (OOPSLA). pp. 307-317.
Hiller, L. 1970. "Music Composed with Computers." in H. B. Lincoln, ed. The
Computer and Music. Ithaca, New York: Cornell University Press.
Lentczner, M. 1985. "Sound Kit: A Sound Manipulator." Proceedings of the
International Computer Music Conference. San Francisco: Computer Music Association.
Lerdahl, F., and R. Jackendoff. 1983. A Generative Theory of Tonal Music.
Cambridge: MIT Press.
Loy, D. G. 1989. "Composing with Computers: A Survey of Some Compositional
Formalisms and Music Programming Languages." in M. Mathews and J. Pierce,
eds. Current Directions in Computer Music Research. Cambridge: MIT Press.
Makkuni, R. 1986. "Representing the Process of Composing Chinese Temples."
Design Computing 1(3): 216-235.
Makkuni, R. 1987. "Gestural Representation of the Process of Composing
Chinese Temples." IEEE Computer Graphics and Applications. 7(12): 45-61.
Pope, S. T. 1986. "Music Notation and the Representation of Musical
Structure and Knowledge" Perspectives of New Music 24(2):156-189.
Pope, S. T. 1989a. "Composition by Refinement." Proceedings of the AIMI
Cagliari Computer Music Conference. Venice, AIMI.
Pope, S. T. 1989b. "Modeling Musical Structures as
EventGenerators." Proceedings of the International Computer Music Conference.
San Francisco: Computer Music Association.
Pope, S. T. 1991. "Introduction to MODE: The Musical Object Development
Environment." in S. T. Pope, ed. The Well-Tempered Object: Musical
Applications of Object-Oriented Software Technology. Cambridge: MIT Press.
Puckette, M. 1991. "Combining Event and Signal Processing in the MAX
Graphical Programming Environment." Computer Music Journal 15(3).
Roads, C. 1978. Composing Grammars. (Monograph) San Francisco: Computer
Music Association.
Scaletti, C. 1989. "The Kyma/Platypus Computer Music Workstation." Computer
Music Journal 13(2): 23-38. Also reprinted in S. T. Pope, ed. The Well-Tempered Object:
Musical Applications of Object-Oriented Software Technology. Cambridge: MIT Press.
Scaletti, C. 1991. "A Kyma Update." in S. T. Pope, ed. The Well-Tempered
Object: Musical Applications of Object-Oriented Software Technology.
Cambridge: MIT Press.
Smith, R. B. 1986. "The Alternate Reality Kit: An Animated Environment for
Creating Interactive Simulations." Proceedings of the IEEE Workshop in Visual
Languages. pp. 99-106.
Smith, R. B. 1987. "Experiences with the Alternate Reality Kit: An Example
of the Tension between Literalism and Magic." IEEE Computer Graphics and
Applications. 7(9): 42-50.
Figure Credits
Figure 1: (Lentczner 1985) Copyright Mark Lentczner; used by permission.
Figure 2: (Scaletti 1991) Copyright MIT Press; used by permission.
Figure 4: (Pope 1986)
Figure 5: (Pope 1989b)
Figures 9a, b: (Gutfreund 1987) Copyright Association for Computing
Machinery; used by permission.
Figures 10a, b, c: (Makkuni 1986) Copyright John Wiley & Sons; used by permission.
Pisa/INTERFACE Paper (c) 1995 Stephen Travis Pope. All Rights Reserved.
[stp@CNMAT.Berkeley.edu--LastEditDate: 1995.09.17]