Abstracts of Selected Publications by Stephen T. Pope

Topics


Best-sellers

Automatic Labeling and Control of Audio Algorithms by Audio Recognition (with Jason LeBoeuf)

U. S. Patent 9,031,243
A method is disclosed for controlling a multimedia software application using high-level metadata features and symbolic object labels derived from an audio source: a first pass of low-level signal analysis is performed, followed by a stage of statistical and perceptual processing, and then by a symbolic machine-learning or data-mining component. This multi-stage analysis system delivers high-level metadata features, sound object identifiers, stream labels, or other symbolic metadata to the application scripts or programs, which use the data to configure processing chains or map it to other media. Embodiments of the invention can be incorporated into multimedia content players, musical instruments, recording studio equipment, installed and live sound equipment, broadcast equipment, metadata-generation applications, software-as-a-service applications, search engines, and mobile devices. Get the PDF file
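The three analysis stages the abstract describes can be sketched as a toy pipeline. Everything here is an illustrative stand-in, not the patented method: the features (RMS energy, zero-crossing rate), the statistics, and the threshold "classifier" are all invented for the example.

```python
import statistics

def low_level_features(frames):
    """Stage 1: per-frame low-level signal analysis (RMS + zero-crossing rate)."""
    feats = []
    for frame in frames:
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)
        feats.append((rms, zcr))
    return feats

def statistical_stage(feats):
    """Stage 2: statistical/perceptual reduction of frame features to stream statistics."""
    return {"rms_mean": statistics.mean(f[0] for f in feats),
            "zcr_mean": statistics.mean(f[1] for f in feats)}

def symbolic_stage(stats, zcr_threshold=0.3):
    """Stage 3: symbolic labeling -- a trivial stand-in for a trained
    machine-learning or data-mining component."""
    return "noisy" if stats["zcr_mean"] > zcr_threshold else "tonal"
```

An application script would then use the returned label (e.g., "noisy" vs. "tonal") to configure a processing chain or map it to other media.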

Method and apparatus for analyzing animal vocalizations, extracting identification characteristics, and using databases of these characteristics for identifying the species of vocalizing animals (with Tom Stephenson)

U. S. Patent 9,177,559
A method for capturing and analyzing audio, in particular the sounds of vocalizing animals including birds, frogs, and mammals, which uses the resulting analysis parameters to establish a database of identification characteristics for the vocalizations of known species. The same analysis can then be applied to recordings of unknown species to identify the species producing a given vocalization. The method uses a unique multi-stage analysis in which a first-stage analysis is followed by segmentation of a vocalization into its structural components, such as Parts, Elements, and Sections. Further analysis of the individual Parts, Elements, Sections, and other song structures produces a wide range of parameters, which are then used to assign to each group of known species a diagnostic set of structural and qualitative criteria. Subsequently, the vocalizations of unknown species can be similarly analyzed, and the resulting parameters can be used to match the unknown data sample against the database of similarly analyzed audio features from a plurality of known species. Get the PDF file
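The final matching step can be sketched as a nearest-neighbor lookup against per-species parameter vectors. This is only an illustration of the database-matching idea, not the patented analysis: the feature names and values below are invented.

```python
def distance(a, b):
    """Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def identify(unknown, database):
    """Return the known species whose stored vector is nearest to `unknown`."""
    return min(database, key=lambda species: distance(unknown, database[species]))

# Toy database: (mean pitch in kHz, element count, mean element duration in s)
db = {"Song Sparrow": (4.0, 9.0, 0.08),
      "Spring Peeper": (3.0, 1.0, 0.15)}
```

A real system would of course use many more parameters (the structural and qualitative criteria derived from Parts, Elements, and Sections) and a more robust matcher.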

Method and system for scalable multi-stage music cover-song detection (with D. Della Santa and J. Trevino)

Provisional patent application U.S. 62/944,798 filed December 6, 2019
A cover-song detection (CSD) method and system used to determine whether one musical selection is a variation or “cover” of another. A computer processor is presented with a plurality of musical selections in digital form (the song database), and these data files are analyzed to generate one or more multi-valued feature vectors for each. Later, a musical selection (called the query, assumed not to be one of the plurality of musical selections) is presented to the system, and similar analysis is used to generate one or more multi-valued feature vectors for it.
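The query-matching step can be sketched as follows; this shows only the feature-vector comparison idea, not the multi-stage scalable system of the application. The choice of an averaged 12-bin chroma profile as the feature vector, and cosine similarity as the measure, are illustrative assumptions.

```python
def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def rank_candidates(query_vec, song_db):
    """Score the query against every song in the database; best match first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in song_db.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A scalable multi-stage design would use a cheap comparison like this one to prune the candidate list before applying a more expensive (e.g., alignment-based) second stage.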

The Big MAT Book: Courseware for Audio & Multimedia Engineering (in 3 volumes)

MAT/CREATE, 2008, 665 pages
Multimedia engineering is a broad and complex topic. It is also one of the fastest-growing and most valuable fields of research and development within electronic technology. The book before you is an anthology of curriculum materials developed over the space of 12 years at the University of California, Santa Barbara for students in UCSB’s Graduate Program in Media Arts and Technology.
The Big MAT Book consists of the presentation slides for eleven ten-week courses, amounting to almost 500 hours of presentation time. For each of the eleven courses, the presentation slides are accompanied by the tables of contents of the course readers and an overview of the example code archives. These resources are available for download from the MAT or HeavenEverywhere web sites (see http://HeavenEverywhere.com/TheBigMATBook).
The multimedia engineering courses included here cover theory and practice, hardware and software, visual and audio media, and arts as well as entertainment applications. Some of the courses (the first two chapters) are required of all MAT graduate students, and thus must target less-technical and also non-audio-centric students. The bulk of this material, though, consists of elective courses that have somewhat higher-level prerequisites and assume basic knowledge of acoustics and some (minimal) programming experience in mainstream programming languages.  Get the PDF file

The Allosphere: An Immersive Multimedia Instrument for Scientific Data Discovery and Artistic Exploration (with Xavier Amatriain, JoAnn Kuchera-Morin and Tobias Hollerer)

IEEE Transactions on Multimedia, 2008.
The UCSB Allosphere is a 3-story-high spherical space in which fully immersive environments can be experienced.  It allows for the exploration of large-scale data sets in an environment that is at the same time multimodal, multimedia, multi-user, immersive, and interactive. The Allosphere is being used for research into scientific visualization/auralization and data exploration but also as a research environment for behavioral/cognitive scientists and artists. The facility consists of a perforated aluminum sphere, ten meters in diameter, suspended inside a near-anechoic cube. The Allosphere is being equipped with high-resolution active stereo projectors, a complete 3D sound system with hundreds of speakers, and novel interfaces. Once fully equipped it will enable seamless immersive projection and 3D audio. In this article we give an overview of the purpose of the instrument as well as the systems that are being put in place to equip such a unique environment. We also review the first results and experiences in developing and using the Allosphere in several prototype projects. Get the PDF file

“The Acoustics of a large 3D Immersive Environment: The Allosphere at UCSB,” (with D. Conant, T. Hoover and K. McNally)

Proc. 2008 ASA-EAA Joint Conference on Acoustics. Paris.
The Allosphere is a new audio/visual immersion space for the California Nanosystems Institute at the University of California, Santa Barbara, used for both scientific and performing-arts studies. This 3-story sphere with central-axis catwalk permits an unusually large experiential region. The huge perforated-metal visual projection sphere, with its principal listening locations centered inside the sphere, introduces multiple considerations and compromises, especially since the ideal acoustical environment is anechoic. Video projection requires opaque light reflectivity of the concave projection surface, while audio demands extreme sound transmissibility of the screen plus full-range sound absorptivity outside the sphere. The design requires high-fidelity spatialization of a large number of simulated sound sources over a large region near the core, and support of vector-based amplitude panning, Ambisonic playback, and wave-field synthesis. This paper discusses considerations that both conform to, and lie outside of, traditional acoustical analysis methodologies, and briefly reviews the electroacoustic systems design.
Get the PDF file
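Of the spatialization techniques the abstract names, vector-based amplitude panning (VBAP) is the simplest to sketch. The two-speaker base case is the familiar constant-power pan law; this toy (not the Allosphere's actual implementation) maps a position between a speaker pair to a pair of gains whose powers sum to one.

```python
import math

def pan_gains(position):
    """Constant-power panning between two speakers.
    `position` in [0, 1]: 0 = fully left, 1 = fully right.
    Returns (left_gain, right_gain) with left**2 + right**2 == 1."""
    theta = position * math.pi / 2.0
    return math.cos(theta), math.sin(theta)
```

Full VBAP generalizes this to triplets of speakers on a sphere, solving for three gains whose weighted speaker vectors sum to the desired source direction.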

“Interchange Formats for Spatial Audio”

(invited position paper) Proc. 2008 Int’l Computer Music Conference (ICMC), Belfast.
Space has been a central parameter in electroacoustic music composition and performance since its origins. Nevertheless, the design of a standardized interchange format for spatial audio performances is a complex task that poses a diverse set of constraints and problems. This position paper attempts to describe the current state of the art in terms of what can be called “easy” today, and what areas pose as-yet unsolved technical or theoretical problems. The paper ends with a set of comments on the process of developing a widely usable spatial sound interchange format. Get the PDF file

Scripting and Tools for Analysis/Resynthesis of Audio

Proceedings of the 2007 International Computer Music Conference.
Software tools for audio analysis, signal processing and synthesis come in many flavors; in general they fall into one of two categories: interactive tools with limited extensibility, or non-graphical scripting languages. It has been our attempt to combine the best features of these two worlds into one framework that supports both (a) the easy development of GUI-based applications for digital audio signal processing (DASP), and (b) an extensible text-based scripting language with built-in libraries for DASP applications. The goal is to combine the good performance of optimized low-level code for the signal processing number-crunching, with a powerful, flexible scripting language and GUI construction tools for application development. We investigate the solutions to this dilemma on the basis of four concrete examples in which DASP tools have been used together with the Siren music/sound package for Smalltalk. Get the PDF file

Teaching Digital Audio Programming: Notes on a Two-year Course Sequence

Proceedings of the 2007 International Computer Music Conference.
The MAT 240 Digital Audio Programming course sequence is a six-quarter (i.e., two-year) practical workshop class devoted to teaching digital audio processing techniques and software development at the graduate level. It has been delivered through several complete iterations at UCSB since 2000. In this paper, we will introduce the course sequence topics, describe what students actually do and learn in the course, and evaluate our challenges, successes and failures. Get the PDF file

Immersive Audio and Music in the Allosphere (with Xavier Amatriain, Tobias Hollerer, and JoAnn Kuchera-Morin)

Proceedings of the 2007 International Computer Music Conference.
The UCSB Allosphere is a 3-story-high spherical instrument in which virtual environments and performances can be experienced in full immersion. It is made of a perforated aluminum sphere, ten meters in diameter, suspended inside an anechoic cube. The space is now being equipped with high-resolution active stereo projectors, a 3D sound system with several hundred speakers, and with tracking and interaction mechanisms. The Allosphere allows for the exploration of large-scale data sets in an environment that is at the same time multimodal, multimedia, multi-user, immersive, and interactive. This novel and unique instrument will be used for research into scientific visualization/auralization and data exploration, and as a research environment for behavioral and cognitive scientists. It will also serve as a research and performance space for artists exploring new forms of art. In particular, the Allosphere has been carefully designed to allow for immersive music applications. In this paper, we give an overview of the instrument, focusing on the audio subsystem. We present first results and our experiences in developing and using the Allosphere in several prototype projects. Get the PDF file

The Siren 7.5 Package for Music and Sound in Smalltalk

MAT/CREATE Internal Report, 2007
Siren is a programming framework for developing music/sound applications in the Smalltalk programming system. It has been under development for more than 20 years, and the newest version (7.5) has a collection of major updates and new subsystems. This paper briefly introduces Siren, and then concentrates on the significant new features, interfaces, and applications in Siren 7.5. Get the PDF file

Software Models and Frameworks for Sound Composition, Synthesis, and Analysis: The Siren, CSL, and MAK Music Languages

Anthology, June, 2005, updated May, 2007, 462 pages
Music is an undeniably complex phenomenon, so the design of abstract representations, formal models, and description languages for music-related data can be expected to be a rich domain. Music-making consists of a variety of diverse activities, and each of these presents different requirements for developers of new abstract and concrete data formats for musician users.
The topic of this work is the design of formal models and languages for a set of common musical activities including (but not limited to) composition, performance and production, and semantic analysis. The background of this work is the 50-year history of computer music programming languages, which began with low-level and (by today’s standards) simplistic notations for signal synthesis routines and compositional algorithms. Over these 50 years, many generations of new ideas have been applied to programming language design, and the topics of formal modeling and explicit knowledge representation have arisen and taken an important place in computer science, and thus in computer music.
The three concrete systems presented in this anthology have been developed and refined over a period of 25 years, and address the areas, respectively, of (a) music composition (Siren), (b) sound synthesis and processing (CSL), and (c) music data analysis for information retrieval (MAK). In each successive generation of refinement of these concrete languages, the underlying models and metamodels have been considered and incrementally merged, so that the current generations (Siren 7, CSL 4, and MAK 4) share both superficial and deep models and expressive facilities. This allows the user (assumed to be a composer, performer, or musicologist) to share data and functionality across these domains, and, as will be demonstrated, to extend the models and frameworks into new areas with relative ease.
The significant contributions of this work to the literature can be found in (a) the set of design criteria and trade-offs developed for music language developers, (b) the new object-oriented design patterns for computer music systems, and (c) the trans-disciplinary design of the three specific languages for composers, performer/producers, and musicologists presented here.  Get the PDF file


MODE & Siren: Smalltalk and Music

The Siren 7.5 Package for Music and Sound in Smalltalk

MAT/CREATE Internal Report, 2007
Siren is a programming framework for developing music/sound applications in the Smalltalk programming system. It has been under development for more than 20 years, and the newest version (7.5) has a collection of major updates and new subsystems. This paper briefly introduces Siren, and then concentrates on the significant new features, interfaces, and applications in Siren 7.5. Get the PDF file

Metamodels and Design Patterns in CSL4 (with Xavier Amatriain, Lance Putnam, Jorge Castellanos, and Ryan Avery)

Proceedings of the 2006 International Computer Music Conference
The task of building a description language for audio synthesis and processing consists of balancing a variety of conflicting demands and constraints such as easy learning curve, usability, flexibility, extensibility, and run-time performance. There are many alternatives as to what a modern language for describing signal processing patches should look like. This paper describes the object-oriented models and design patterns used in version 4 of the CREATE Signal Library (CSL), a full rewrite that included an effort to use concepts from the “4MS” metamodel for multimedia systems, and to integrate a set of design patterns for signal processing. We refer the reader to other publications for an introduction to CSL, and will concentrate on design and implementation choices in CSL4 that simplify the kernel classes, improve their performance, and ease their extension while using best-practice software engineering techniques. Get the PDF file
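The central design pattern in unit-generator frameworks of this kind is the buffer-oriented "pull" model: each node in a signal processing patch fills an output buffer on request, first pulling buffers from its inputs. The sketch below illustrates the pattern in Python; the class names are generic illustrations, not CSL's actual C++ API.

```python
class UnitGenerator:
    """Abstract node in a signal processing graph (pull model)."""
    def next_buffer(self, n):
        raise NotImplementedError

class Constant(UnitGenerator):
    """A source: emits a constant value."""
    def __init__(self, value):
        self.value = value
    def next_buffer(self, n):
        return [self.value] * n

class Gain(UnitGenerator):
    """An effect: scales its input -- a one-input, one-output node."""
    def __init__(self, source, gain):
        self.source, self.gain = source, gain
    def next_buffer(self, n):
        # Pull a buffer from upstream, then process it.
        return [x * self.gain for x in self.source.next_buffer(n)]
```

A patch is then just a composition of such nodes, e.g. `Gain(Constant(1.0), 0.5)`, from which the output driver repeatedly pulls buffers.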

Recent Developments in Siren: Modeling, Control, and Interaction for Large-scale Distributed Music Software (with Chandrasekhar Ramakrishnan)

Proceedings of the 2003 International Computer Music Conference.
This paper describes recent advances in platform-independent object-oriented software for music and sound processing. The Siren system is the result of almost 20 years of continuous development in the Smalltalk programming language; it incorporates an abstract music representation language, interfaces for real-time I/O in several media, a user interface framework, and connections to object databases. To support ambitious compositional and performance applications, the system is integrated with a scalable realtime distributed processing framework. Rather than presenting a system overview (Siren is exhaustively documented elsewhere), we discuss the new features of the system here, including its integration with new DSP frameworks, new I/O interfaces, and its use in several recent compositions. Get the PDF file

Music and Sound Processing in Squeak Using Siren

Invited Chapter in Squeak: Open Personal Computing and Multimedia edited by Mark Guzdial and Kim Rose. Prentice-Hall, 2002.
The Siren system is a general-purpose music composition and production framework integrated with Squeak Smalltalk (1); it is a Smalltalk class library of about 200 classes for building musical applications. Siren runs on a variety of platforms with support for real-time MIDI and multi-channel audio I/O. The system's source code is available for free on the Internet; see the Siren home page at the URL http://www.create.ucsb.edu/Siren. This chapter concentrates on (a) the Smoke music description language, (b) the real-time MIDI and sound I/O facilities, and (c) the GUIs for the 2.7 version of Siren. It is intended for a Squeak programmer who is interested in music and sound applications, or for a computer music enthusiast who is interested in Squeak applications. Get the PDF file

The Musical Object Development Environment (MODE)--Ten Years of Music Software in Smalltalk

Proceedings of the 1994 International Computer Music Conference.
The author has developed a family of software tool kits for composers with the Smalltalk-80 programming system over the last decade. The current MODE Version 2 system supports structured composition, flexible graphical editing of high- and low-level musical objects, real-time MIDI I/O, software sound synthesis and processing, and other tasks. This poster will introduce the MODE and SmOKe, its representation language, and survey the various end-user applications it includes. The discussion will evaluate the system's performance and requirements. Get the PDF file

The Interim DynaPiano: An Integrated Tool and Instrument for Composers

Computer Music Journal 16:3, Fall, 1992, 21 p.
The Interim DynaPiano (IDP) is an integrated computer hardware/software configuration for music composition, production, and performance based on a Sun Microsystems Inc. SPARCstation computer and the Musical Object Development Environment (MODE) software. The IDP SPARCstation is a powerful hardware-accelerated color graphics RISC- (reduced instruction set computer) based workstation computer running the UNIX operating system. It is augmented by large RAM and disk memories and coprocessors and interfaces for real-time sampled sound and MIDI I/O. The MODE is a large hierarchy of object-oriented software components for music written in the Smalltalk-80 language and programming system. MODE software applications in IDP support flexible structured music composition, sampled sound recording and processing, and real-time music performance using MIDI or sampled sounds.   The motivation for the development of IDP is to build a powerful, flexible, and portable computer-based composer's tool and musical instrument that is affordable by a professional composer (i.e., around the price of a good piano or MIDI studio). The hardware and low-level software of the system consist entirely of off-the-shelf commercial components. The goal of the high-level and application software is to exhibit good object-oriented design principles and elegant modern software engineering practice. The basic configuration of the system is consistent with a whole series of "intelligent composer's assistants" based on a core technology that has been stable for a decade. This article presents an overview of the hardware and software components of the current IDP system. The background section discusses several of the design issues in IDP in terms of definitions and a set of examples from the literature. The hardware system configuration is presented next, and the rest of the article is a description of the MODE signal and event representations, software libraries, and application examples.  
Get the PDF file

The SmOKe Music Representation, Description Language, and Interchange Format

Proceedings of the 1992 International Computer Music Conference.
The Smallmusic Object Kernel (SmOKe) is an object-oriented representation, description language and interchange format for musical parameters, events, and structures. The author believes this representation, and its proposed linear ASCII description, to be well-suited as a basis for: (1) concrete description interfaces in other languages, (2) specially-designed binary storage and interchange formats, and (3) use within and between interactive multimedia, hypermedia applications in several application domains. The textual versions of SmOKe share the terseness of note-list-oriented music input languages, the flexibility and extensibility of "real" music programming languages, and the non-sequential description and annotation features of hypermedia description formats.   This description defines SmOKe's basic concepts and constructs, and presents examples of the music magnitudes and event structures. The intended audience for this discussion is programmers and musicians working with digital-technology-based multimedia tools who are interested in the design issues related to music representations, and are familiar with the basic concepts of software engineering. Two other documents ([Smallmusic 1992] and [Pope 1992]), describe the SmOKe language, and the MODE environment within which it has been implemented, in more detail. Get the PDF file
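The core ideas of the representation (typed music magnitudes, events as property lists, and event lists that are themselves events) can be rendered in a few lines. This is a Python toy following the paper's vocabulary, not SmOKe's actual Smalltalk syntax.

```python
class Duration:
    """A music magnitude: time extent in seconds."""
    def __init__(self, seconds):
        self.seconds = seconds

class Pitch:
    """A music magnitude: pitch as a MIDI key number."""
    def __init__(self, midi_key):
        self.midi_key = midi_key

class MusicEvent:
    """An event: a duration plus arbitrary named properties."""
    def __init__(self, duration, **properties):
        self.duration = duration
        self.properties = properties

class EventList:
    """Events paired with onset times; composable into hierarchies."""
    def __init__(self):
        self.events = []   # list of (onset_seconds, MusicEvent)
    def add(self, onset, event):
        self.events.append((onset, event))
    def total_duration(self):
        return max(t + e.duration.seconds for t, e in self.events)
```

Because properties are open-ended, an event can carry any parameter a composer or application needs, which is the flexibility/extensibility point the abstract makes.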

Modeling Musical Structures as EventGenerators

Proceedings of the 1989 International Computer Music Conference.
There is a broad range of music description languages. The common terms for describing musical structures define a vocabulary that every musician learns as part of his or her training. The terms we take for granted in describing music can be used for building generative software description languages. This paper describes recent work modeling higher-level musical structures in terms of objects that understand specialized sub-languages for creation of, and interaction with, musical structures. The goal is to provide tools for composers to describe compositions by incrementally refining the behaviors of a hierarchical collection of structure models. Get the PDF file

T-R Trees in the MODE (A Tree Editor Based Loosely on Fred's Theory)

Proceedings of the 1991 International Computer Music Conference.
The T-R Trees software system is a set of software tools for the graphical and programmatic manipulation of expressive and structural hierarchies in music composition. It is loosely based on the hierarchies described in Fred Lerdahl and Ray Jackendoff's landmark book A Generative Theory of Tonal Music--weighted grouping and prolongational reduction trees (also called tension-relaxation or T-R trees). This article describes T-R tree derivation, editing, and application in score representation and management. Get the PDF file

Distributed Processing

The Distributed Processing Environment for High-Performance Distributed Multimedia Applications (with Andreas Engberg, Frode Holm, and Ahmi Wolf)

Proc. 2001 IEEE Multimedia Technology and Applications Conference
Our group is involved in implementing large-scale multimedia software for application areas ranging from multi-user virtual worlds to complex real-time sound synthesis. We call this class of system High-Performance Distributed Multimedia (HPDM) software. The Distributed Processing Environment (DPE) is an infrastructure for configuring and managing HPDM software. It consists of several components that allow the start-up, monitoring, and shut-down of software services on a network. This report describes the design and implementation of the prototype DPE system, which we built for the ATON project.  Get the PDF file

The Real-time (Multimedia) Interface Description Language: RIDL (with Andreas Engberg and Frode Holm)

Proc. 2001 IEEE Multimedia Technology and Applications Conference
The Real-time Multimedia Interface Description Language—RIDL—is an extension of the CORBA IDL for use in building distributed real-time multimedia software systems. We designed RIDL to integrate quality-of-service (QoS) information, as well as configuration requirements, into the IDL interface descriptions of our software components. We have built a flexible first-generation RIDL compiler and associated repositories.  Get the PDF file

All About CRAM: The CREATE Real-time Application Manager

CREATE Internal Report
The CREATE Real-time Applications Manager (CRAM) is a framework for developing, deploying, and managing distributed real-time software. It has evolved in our group at UCSB through three implementations over the space of five years. The background of CRAM is the work done since the early 1990s on distributed processing environments (DPEs), which started in the telecommunications industry (see Appendix 1). CRAM is unusual among DPEs in that it is very light-weight and efficient, but also fault-tolerant, and that it supports both planning-time and run-time load balancing as required by real-time applications. Its main application areas to date are large-scale music performance systems and distributed virtual environments. Get the PDF file.

ATON Report 2001.06.1: ATON/UCSB Final Report

CREATE Internal Report
The ATON Project was an ambitious, large-scale, multi-year R&D effort undertaken by three teams collaborating across several disciplines. The original project description (see the ATON web site http://www.create.ucsb.edu/ATON/overview.html) stated, “The project involves topics as diverse as robotics, computer vision, distributed multimedia processing, and virtual reality.” For the ATON system, we needed to build a virtual environment (VE) that allows one or more users to control robots and video cameras located anywhere in the state of California, and to “see through the eyes” of the robots to manage traffic incidents. This implies a kind of wide-area distributed real-time multimedia system that we call High-Performance Distributed Multimedia (HPDM) software. This report summarizes the work carried out in the CREATE Lab at UCSB as part of the DiMI ATON Project between 1999 and 2001. We describe the background of the ATON Project, and discuss our efforts, relating them to our published reports and concrete deliverables.  Get the PDF file.


Computer Music and Music Composition

Producing Kombination XI: Using Modern Hardware and Software Systems for Composition

Leonardo Music Journal, 2(1): 23-28, 1992.
This article discusses two topics related to the realization of my composition "Kombination XI: A Ritual Place for Live and Processed Voices." These are the score's structure representation language and the software tools for manipulating it using graphical structure editors, and the process of realization using several different digital signal processing software and hardware systems. The reason for focusing on the first issue is the attempt to build a notation and set of software tools based on weighted trees that span the expressive and structural domains of music. The second topic is of interest as an example of the possibility of using several types of computer hardware and software in concert as one instrument. Numerous score and structure description and editing examples, and documentation of the realization process are presented. Get the PDF file

Fifteen Years of Computer-assisted Composition

Proceedings of the 2nd Brazilian Symposium on Computer Music, 1995.
This paper describes several generations of computer music systems and the music they have enabled. It will introduce the software tools used in some of my music compositions realized in the years 1979-94 at a variety of studios using various software and hardware systems and programming languages. These tools use a wide range of compositional methods, including (among others): high-level graphical notations, limited stochastic selection, Markov transition tables, forward-chaining expert systems, non-deterministic Petri networks, and hierarchical rule-based knowledge systems. The paper begins by defining several of the terms that are frequently used in the computer music literature with respect to computer-aided composition and realization, and introduces several of the categories of modern models of music composition. A series of in-depth examples are then drawn from my works of the last 15 years, describing the models and the software tools, and demonstrating the resulting music. Get the PDF file

Computer Music Workstations I Have Known and Loved

Proceedings of the 1995 International Computer Music Conference.
This paper introduces a set of design criteria and points of current debate in the development of computer music workstations. It surveys the systems of the last ten years and makes several subjective comments on the design and implementation of computer-based tools for music composition, production, and live performance. The intent is to focus the reader's attention on the issues of hardware architecture and software support in defining computer-based tools and instruments. Get the PDF file

Why is Good Electroacoustic Music So Good? Why is Bad Electroacoustic Music So Bad?

(expanded version of the Editor's Note in CMJ 18:3 with responses). YLEM Newsletter 15:4 (July/August, 1995), 4 p. Get the ASCII text file

Real-Time Performance via User Interfaces to Musical Structures

Proceedings of the Int'l Workshop on Man-Machine Interaction in Live Performance, Pisa, Italy, June, 1991. reprinted in Interface 22(3): 195-212. 9 p.
This informal and subjective presentation will introduce and compare several software systems written by myself and others for computer music composition and performance based on higher-level abstractions of musical data structures. I will then evaluate a few of the issues in real-time interaction with structural descriptions of musical data.  The premise is that very interesting live-performance software environments could be based on existing technology for structural music description, but that much of the current real-time performance-oriented software for music is rather limited in that it supports only very low-level notions of musical structures. The examples will demonstrate various systems for graphical interaction with procedural, knowledge-based, hierarchical and/or stochastic music description systems that could be used for live performance. Get the PDF file (without figures) Read the HTML version (*with* figures)

Web.La.Radia: Social, Economic, and Political Aspects of Music and Digital Media

Invited Paper, Salzburg Symposium on New Media Technology and Networking for Creative Applications (1997). Reprinted in Proceedings of the 1997 International Computer Music Conference, Thessaloniki. Reprinted in Computer Music Journal 23:1, Spring, 1999, 10 p.
This informal essay addresses the current status and trajectory of media art and media technology. In formulating my ideas on these topics, I found myself being drawn away from my usual technical concerns, and increasingly to the sociology, economics, and political relationships of electronic media art and its modes of production and dissemination. There are several rather bold statements below on the subject of new media art and art-making on the world-wide web, and I rely heavily on a series of quotes taken from the literature to make my points, without the implication that I necessarily agree with every one of them. I take a critical stance in these comments, but still do not wish to be considered a “web-Luddite.” I use the web daily, and it is a major component of my research. On the other hand, I am very concerned by several trends I see in the web culture and feel that it is necessary to draw attention to them. Get the PDF File

Music Information Retrieval and Databases

Automatic Labeling and Control of Audio Algorithms by Audio Recognition (with Jason LeBoeuf)

U. S. Patent Application 20110075851, 2010
Controlling a multimedia software application using high-level metadata features and symbolic object labels derived from an audio source, wherein a first-pass of low-level signal analysis is performed, followed by a stage of statistical and perceptual processing, followed by a symbolic machine-learning or data-mining processing component is disclosed. This multi-stage analysis system delivers high-level metadata features, sound object identifiers, stream labels or other symbolic metadata to the application scripts or programs, which use the data to configure processing chains, or map it to other media. Embodiments of the invention can be incorporated into multimedia content players, musical instruments, recording studio equipment, installed and live sound equipment, broadcast equipment, metadata-generation applications, software-as-a-service applications, search engines, and mobile devices. Get the PDF file
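The three-stage pipeline this abstract describes (low-level signal analysis, then statistical/perceptual processing, then symbolic labeling) can be sketched minimally as follows. All function names, features, and the labeling rule are illustrative assumptions, not taken from the patent:

```python
# Toy sketch of a three-stage audio-to-symbol pipeline:
# (1) low-level signal analysis, (2) statistics over windows,
# (3) a symbolic labeling rule. Names are hypothetical.
import math

def low_level_features(samples):
    """Stage 1: crude low-level descriptors of one sample window."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # zero-crossing rate as a rough noisiness proxy
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0) / (n - 1)
    return {"rms": rms, "zcr": zcr}

def perceptual_stage(windows):
    """Stage 2: average the per-window features over a selection."""
    keys = windows[0].keys()
    return {k: sum(w[k] for w in windows) / len(windows) for k in keys}

def symbolic_stage(stats, threshold=0.1):
    """Stage 3: map statistics to a symbolic label (toy rule)."""
    return "noisy" if stats["zcr"] > threshold else "tonal"

# A downstream application script would then use the label to
# configure a processing chain or map it to other media.
windows = [low_level_features([math.sin(0.1 * i) for i in range(100)])]
label = symbolic_stage(perceptual_stage(windows))
```

In a real system the third stage would be a trained classifier rather than a threshold, but the data flow from signal features to symbolic metadata is the same.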

Feature Extraction and Database Design for Music Software (with Frode Holm and Alexandre Kouznetsov)

Proceedings of the 2004 International Computer Music Conference
Persistent storage and access of sound/music meta-data is an increasingly relevant topic to the developers of multimedia software. This paper focuses on the design of music signal analysis tools and database formats for modern applications. It is partly tutorial in nature, and partly a discussion of design issues. We begin with a high-level overview of the dimensions of music database (MDB) software, and then walk through the common feature extraction techniques. A requirements analysis of several application categories will allow us to carefully determine which features might be most useful for them. This leads us to suggest concrete architectural and design criteria, and to close by introducing several of our recent implemented systems. The authors believe that much current MDB software suffers due to ad-hoc design of analysis systems and feature vectors, which often incorporate only low-level features and are not tuned for the application at hand. Our goal is to advance the state of the art of music meta-data extraction and database design by fostering a better engineering practice in the construction of high-level feature vectors and analysis engines for music software. Get the PDF file

The FASTLab Music Analysis Kernel

FASTLab Internal Report
The FASTLab Music Analysis Kernel (FMAK) is a software package for building and using music and sound databases. It consists of four main interfaces: analysis, segmentation, clustering, and classification. The FMAK analyzer computes both low-level and high-level features (called feature vectors or meta-data) from musical selections. The segmenter takes these feature vectors and finds the phrase, verse, and section breaks in music, thus discovering the musical form and allowing us to reduce the number of feature vectors we need to store. The clustering functions support data mining in large databases of feature vectors by grouping the data into well-defined genre clusters. The classifier adds customizable database pruning and run-time distance metrics for using genre databases. These four components can be used in a variety of ways to build software applications that process large volumes of multimedia data. Get the PDF file
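The analyzer/segmenter/classifier chain described above can be illustrated with a toy sketch. The function names, feature shapes, and the change-detection rule are assumptions for illustration only, not the actual FMAK API; clustering is stood in for by precomputed genre centroids:

```python
# Illustrative sketch of an analyze -> segment -> classify chain.
# All names and data shapes are hypothetical.

def analyze(track):
    """Produce one small feature vector per analysis window (toy)."""
    return [[float(x), float(x % 3)] for x in track]

def segment(vectors, jump=2.0):
    """Find section breaks where adjacent vectors differ sharply,
    then keep one averaged vector per section (fewer vectors to store)."""
    breaks = [0]
    for i in range(1, len(vectors)):
        if abs(vectors[i][0] - vectors[i - 1][0]) >= jump:
            breaks.append(i)
    breaks.append(len(vectors))
    sections = []
    for a, b in zip(breaks, breaks[1:]):
        chunk = vectors[a:b]
        sections.append([sum(v[d] for v in chunk) / len(chunk) for d in (0, 1)])
    return sections

def classify(vector, genre_centroids):
    """Nearest-centroid genre label via a run-time distance metric."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(vector, entry[1]))
    return min(genre_centroids, key=dist)[0]
```

For example, `segment(analyze([1, 2, 3, 10, 11]))` collapses five windows into two section vectors, and `classify` labels a section by its nearest genre centroid.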

Expert Mastering Assistant (EMA) Version 2.0 Technical Documentation (with Alex Kouznetsov)

FASTLab Internal Report
This document describes the design and implementation of the “Expert Mastering Assistant” (EMA) tool, version 2.0, developed by the UCSB Center for Research in Electronic Art Technology (CREATE) and FASTLab Inc. for the Panasonic Spin-Up Fund. The EMA is a prototype artificial-intelligence-based software tool that “listens” to a set of musical selections and gives expert advice to a mastering engineer, suggesting parameters for the modules that perform the signal processing: equalization, compression, reverberation, etc. The EMA suite consists of two major components: the interactive EMA application, which analyzes and processes individual songs with real-time interactivity, and a number of development applications that are required as part of the expert system training process (Figure 1). Get the PDF file

The Open Music Network Infrastructure (OMNI)

CREATE Internal Report
This proposal describes the Open Music Network Infrastructure (OMNI), an Internet-based music service that aims to provide music content providers with a new forum in which to attract music consumers, enabling the so-called “second music industry.” The OMNI system consists of content provider interfaces, a large-scale artificial-intelligence-assisted “smart” music/sound database, and listener services that allow users to select musical selections based on their personal taste. The most distinctive feature of OMNI relative to other web-based music services is its use of a smart indexing and search component in the database, which helps little-known musicians find an audience that would like their songs. This document is aimed at a semi-technical reader. Get the PDF file

Content Analysis and Queries in a Sound and Music Database

Proceedings of the 1999 International Computer Music Conference.
The Paleo database project at CREATE aims to develop and deploy a large-scale integrated sound and music database that supports several kinds of content and analysis data and several domains of queries. The basic components of the Paleo system are: (1) a scalable general-purpose object database system, (2) a comprehensive suite of sound/music analysis (feature extraction) tools, (3) a distributed interface to the database, and (4) prototype end-user applications. The Paleo system is based on a rich set of signal and event analysis programs for feature extraction from sound and music data. The premise is that, in order to support several kinds of queries, we need to extract a wide range of different kinds of features from the data as it is loaded into the database, and possibly to analyze still more in response to queries. The results of these analyses will be very long "feature vectors" (or multi-level indices) that describe the contents of the database. To be useful for a wide range of applications, the Paleo system must allow several different kinds of queries, i.e., it needs to manage large and changing feature vectors. As data in the database is used, the feature vectors can be simplified. This might mean discarding spectral analysis data for speech sounds, or metrical grouping trees for unmetered music. This is what sets Paleo apart from most other media database projects: the use of complex and dynamic feature vectors and indices. This paper introduces the Paleo system's architecture, and then focuses on three issues: the signal and event analysis routines, the use of constraints in analysis and queries, and the object storage layer and formats. Some examples of Paleo usage are also given. Get the PDF file of the text Get the PDF File of the presentation slides
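The idea of a "dynamic feature vector" that grows on demand and is pruned as items are used can be sketched with a simple map-backed structure. The class and method names are illustrative assumptions, not Paleo's actual storage-layer API:

```python
# Toy sketch of a dynamic feature vector: a per-item feature map
# that can grow when more analysis is run, and be pruned once some
# features prove irrelevant for that item. Names are hypothetical.

class DynamicFeatureVector:
    def __init__(self, **features):
        self.features = dict(features)

    def add(self, name, value):
        # analyze more in response to a query, storing the result
        self.features[name] = value

    def prune(self, names):
        # e.g. drop spectral data for speech sounds, or metrical
        # grouping trees for unmetered music
        for n in names:
            self.features.pop(n, None)

fv = DynamicFeatureVector(rms=0.3, spectral_centroid=1200.0)
fv.add("meter_tree", None)                       # later analysis pass
fv.prune(["spectral_centroid", "meter_tree"])    # simplify after use
```

The point of the sketch is only the life cycle: features accumulate per query domain and shrink once the item's useful index is known.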

Spatial and 3-D Sound Systems

Immersive Audio and Music in the Allosphere (with Xavier Amatriain, Tobias Hollerer, and JoAnn Kuchera-Morin)

Proceedings of the 2007 International Computer Music Conference.
The UCSB Allosphere is a 3-story-high spherical instrument in which virtual environments and performances can be experienced in full immersion. It is made of a perforated aluminum sphere, ten meters in diameter, suspended inside an anechoic cube. The space is now being equipped with high-resolution active stereo projectors, a 3D sound system with several hundred speakers, and with tracking and interaction mechanisms. The Allosphere allows for the exploration of large-scale data sets in an environment that is at the same time multimodal, multimedia, multi-user, immersive, and interactive. This novel and unique instrument will be used for research into scientific visualization/auralization and data exploration, and as a research environment for behavioral and cognitive scientists. It will also serve as a research and performance space for artists exploring new forms of art. In particular, the Allosphere has been carefully designed to allow for immersive music applications. In this paper, we give an overview of the instrument, focusing on the audio subsystem. We present first results and our experiences in developing and using the Allosphere in several prototype projects. Get the PDF file

Audio in the UCSB CNSI AlloSphere

MAT/CNSI Internal Report
The UCSB AlloSphere is a joint effort of the California NanoSystems Institute (CNSI) and the graduate program in Media Arts and Technology (MAT) at the University of California Santa Barbara (UCSB). It is currently under construction, with completion scheduled for the first half of 2006. The AlloSphere is designed as an immersive computational interface for 10 to 20 users, featuring surround-sound data sonification and immersive visualization (i.e., 3D audio and video projection) on a spherical surface. It will provide interactive control by means of microphone arrays, cameras, and mechanical and magnetic input tracking. The actual shape of the AlloSphere can be described as two hemispheres with 16-foot radii pulled 8 feet apart, placed in a 3-story anechoic chamber. A 7-foot-wide bridge runs across the center, supporting the users. This document describes the requirements for the audio component of the AlloSphere, introduces the three prevalent spatial sound processing technologies in use today, and outlines the AlloSphere audio input and projection design and implementation plan, from low-level transducer elements to high-level network protocols. Get the PDF file

The State of the Art in Sound Spatialization

There are several aspects to the field of spatial sound, each of which poses different challenges and offers different potential applications. Although our understanding of aural perception is still incomplete, we are able to both synthesize and record spatial sound fields, and to render sound such that the fidelity of localization is very high (for a specific listener). There are several well-known and effective techniques for creating the perceptual cues that our brains use to localize sound, but the systems that scale well to large spaces or to many listeners are not the same ones that give the best localization fidelity. The formal study of spatial sound performance in larger spaces (e.g., concert halls) is still in its (relative) infancy. Most work in this area has been ad hoc, treating the spatial sound performance situation more as an instrumental performance than as a controlled experiment. This presentation will explore the aspects of aural perception that contribute to the difficulties, and the potential, in the recording and playback of spatial sound, and will survey the current techniques used in this area. Get the PDF File

Building Sound into a Virtual Environment: An Aural Perspective Engine for a Distributed Interactive Virtual Environment (An APE for a DIVE). (with Lennart E. Fahlén)

Report of the Distributed Systems Laboratory of the Swedish Institute for Computer Science, Stockholm, August, 1992.
We have investigated the addition of spatially-localized sound to an existing graphics-oriented synthetic environment (virtual reality system). To build "3-D audio" systems that are robust, listener-independent, real-time, multi-source, and able to give stable sound localization is beyond the current state of the art, even using expensive special-purpose hardware. The "auralizer" or "aural renderer" described here was built as a test-bed for experimenting with the known techniques for generating sound localization cues based on the geometrical models available in a synthetic 3-D world. This paper introduces the psychoacoustical background of sound localization, and then describes the design and usage of the DIVE auralizer. We close by evaluating the system's implementation and performance. Get the PDF file
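Two of the classic localization cues such an auralizer can compute from scene geometry are the interaural time difference (ITD) and interaural level difference (ILD). The sketch below uses the standard Woodworth spherical-head approximation for ITD and a simple cosine pan law for ILD; the constants and function names are illustrative, not from the DIVE system:

```python
# Toy geometric localization cues for a source at azimuth theta
# (radians, 0 = straight ahead). ITD uses the Woodworth
# spherical-head approximation; ILD is a crude cosine pan law.
import math

HEAD_RADIUS = 0.0875     # meters, typical adult head
SPEED_OF_SOUND = 343.0   # m/s at room temperature

def itd_seconds(theta):
    """Interaural time difference: extra path length around the head."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

def ild_gains(theta):
    """Interaural level difference as equal-power left/right gains."""
    left = math.cos(theta / 2 + math.pi / 4)
    right = math.sin(theta / 2 + math.pi / 4)
    return left, right
```

At theta = 0 the two ear gains are equal and the ITD is zero; at 90 degrees the ITD comes out near the familiar 650-microsecond maximum for a human head.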

The Use of 3-D Audio in a Synthetic Environment (with Lennart E. Fahlén)

Proceedings of the 1993 AIMI Colloquium, Milan, Italy.
(See the above abstract.) Get the PDF file

Machine Tongues--Computer Music Journal Survey and Tutorial Articles

Machine Tongues XI: Object-oriented Software Design

Computer Music Journal 13(2):9-22, Summer, 1989
Object-oriented programming is a term that represents a collection of new techniques for problem-solving and software engineering. Two previous articles in this "Machine Tongues" series have introduced object-oriented programming, presenting tutorials to this technology, and describing its application to music modeling and software development (Krasner 1980, Lieberman 1982). This paper discusses the new problem-solving techniques that constitute the object-oriented design methodology. Object-oriented analysis, synthesis, design and implementation are presented, while stressing the issues of design by analytical modeling, design for reuse, and the development of software packages in terms of frameworks, toolkits and customizable applications. Numerous object-oriented software description examples and architectural structures are presented, including music modeling, representation and interactive applications. This essay will outline object-oriented problem-solving and software design in a language-independent manner. Examples will be taken primarily from the Smalltalk-80 (TM of ParcPlace Systems) programming system, but the reader need only refer to some of the other articles in this issue of Computer Music Journal for descriptions of systems based on other languages and programming environments. No basic introduction to the terms or techniques of object-oriented languages will be presented here. Get the PDF file

Machine Tongues XV: Three Packages for Software Sound Synthesis

Computer Music Journal 17(2): 23-54, Summer, 1993
The origin of the technology and methodology of modern computer music is certainly the Music V family of software sound synthesis systems developed since the late 1950s. In the "old days," this consisted of batch computer processing of musical programs expressed in terms of instrument definitions (programs) and score note lists (input data), generating sampled sound output data to off-line storage for later performance. The noticeable rekindling of interest in programs and languages for software sound synthesis (SWSS) and software digital audio signal processing (DSP) using general-purpose computers is due to a number of factors, not least among them the dramatic increase in the power of personal workstations over the last five years. There are currently three widely-used, portable, C-language SWSS tools: (in alphabetical order) cmix (Lansky 1990), cmusic (Moore 1990), and Csound (Vercoe 1991). This article will discuss the technology of SWSS and then present and compare these three systems. It is divided into three parts; the first introduces SWSS in terms of progressive examples. Part two compares the three systems using the same two instrument/score examples written in each of them. The final section presents informal benchmark tests of the systems run on two different hardware platforms (a Sun Microsystems SPARCstation-2 IPX and a NeXT Computer Inc. TurboCube machine) and subjective comments on various features of the languages and programming environments of state-of-the-art SWSS software. Get the PDF file

Machine Tongues XVIII. A Child's Garden of Sound File Formats (with Guido Van Rossum)

Computer Music Journal 19(1): 25-63 Spring, 1995.
This article introduces a few of the many ways that sound data can be stored in computer files, and describes several of the file formats that are in common use for this purpose. This text is an expanded and edited version of a "frequently asked questions" (FAQ) document that is updated regularly by one of the authors (van Rossum). Extensive references are given here to printed and network-accessible machine-readable documentation and source code resources. Get the PDF file

Object-Oriented Programming and Design Patterns

Metamodels and Design Patterns in CSL4 (with Xavier Amatriain, Lance Putnam, Jorge Castellanos, and Ryan Avery)

Proceedings of the 2006 International Computer Music Conference.
The task of building a description language for audio synthesis and processing consists of balancing a variety of conflicting demands and constraints such as easy learning curve, usability, flexibility, extensibility, and run-time performance. There are many alternatives as to what a modern language for describing signal processing patches should look like. This paper describes the object-oriented models and design patterns used in version 4 of the CREATE Signal Library (CSL), a full rewrite that included an effort to use concepts from the "4MS" metamodel for multimedia systems, and to integrate a set of design patterns for signal processing. We refer the reader to other publications for an introduction to CSL, and will concentrate on design and implementation choices in CSL4 that simplify the kernel classes, improve their performance, and ease their extension while using best-practice software engineering techniques. Get the PDF file
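The core design pattern in signal libraries of this kind is a graph of unit generators, where each node produces buffers on demand and processors pull from their inputs. The sketch below illustrates that pattern in Python rather than CSL's C++; the class names and the pull-model API are assumptions for illustration, not CSL4's actual classes:

```python
# Minimal pull-model unit-generator graph: each node implements
# next_buffer(n), and processing nodes pull from their inputs.
# Names are illustrative, not CSL's.
import math

class UnitGenerator:
    def next_buffer(self, n):
        raise NotImplementedError

class Sine(UnitGenerator):
    """A source: sine oscillator with persistent phase."""
    def __init__(self, freq, rate=44100.0):
        self.phase = 0.0
        self.incr = 2 * math.pi * freq / rate

    def next_buffer(self, n):
        out = [math.sin(self.phase + self.incr * i) for i in range(n)]
        self.phase += self.incr * n
        return out

class Gain(UnitGenerator):
    """A processor: scales the buffer pulled from its input."""
    def __init__(self, input_ugen, amp):
        self.input, self.amp = input_ugen, amp

    def next_buffer(self, n):
        return [s * self.amp for s in self.input.next_buffer(n)]

# Patches are built by composition, then pulled from the graph root.
patch = Gain(Sine(440.0), 0.5)
buf = patch.next_buffer(64)
```

The design choice worth noting is that sources and processors share one interface, so arbitrary patches compose without the graph root knowing what is underneath it.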

The Well-Tempered Object: Musical Applications of Object-Oriented Software Technology -- A Structured Anthology on Software Science and Systems based on Articles from Computer Music Journal 1980-89

Compiled and edited by Stephen Travis Pope. Published by MIT Press, 1991

See Well-Tempered Object Web Page

A Description of the Model-View-Controller User Interface Paradigm in the Smalltalk-80 System (The MVC Cookbook) (with Glenn Krasner)

Journal of Object-Oriented Programming 1(3):26-49
This essay describes the Model-View-Controller (MVC) programming paradigm and methodology used in the Smalltalk-80 (TM) programming system. MVC programming is the application of a three-way factoring, whereby objects of different classes take over the operations related to the application domain, the display of the application's state, and the user interaction with the model and the view. We present several extended examples of MVC implementations and of the layout of composite application views. The Appendices provide reference materials for the Smalltalk-80 programmer wishing to understand and use MVC better within the Smalltalk-80 system. Get the PDF file
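The three-way factoring described above can be sketched in a few lines: the model holds application-domain state, views display it, and a controller routes user input to the model. This is a minimal illustration with the Smalltalk-80 dependency mechanism reduced to a simple observer list; the names are illustrative:

```python
# Tiny MVC sketch: model = domain state, view = display,
# controller = user-input handling. Change propagation is a
# simplified observer broadcast.

class Model:
    def __init__(self):
        self.value, self.views = 0, []

    def attach(self, view):
        self.views.append(view)

    def set_value(self, v):
        self.value = v
        for view in self.views:      # broadcast a change notice
            view.update(self)

class View:
    def __init__(self):
        self.rendered = None

    def update(self, model):         # re-display the model's state
        self.rendered = f"value = {model.value}"

class Controller:
    def __init__(self, model):
        self.model = model

    def handle_input(self, v):       # user action -> model change
        self.model.set_value(v)

model, view = Model(), View()
model.attach(view)
Controller(model).handle_input(42)
```

Because views observe the model rather than being called by the controller, several views of the same model stay consistent automatically, which is the point of the factoring.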


Presentation Slides

Keynote Speech from the CWU Symposium on Undergraduate Research and Creative Expression (SOURCE)

Get the slides as a PDF File See also STP's SOURCE Links

The State of the Art in "Sound and Music Computing"

Slides for a presentation given at the weekly computer science colloquium, UCSB, Feb. 7, 1996. Get the PDF file

Composition by Refinement

Presentation at the AIMI Conference, 1989.
Description of the use of the HyperScore ToolKit for composition.  Get the PDF file

Building Large-scale Interactive Systems with OSC, Siren, CSL, and CRAM

UC Berkeley AudioIcon Workshop, 2003
Get the PDF file


CREATE White Papers and Project Reports

Distributed Multimedia Systems R&D at CREATE

Since 1996, the UCSB Center for Research in Electronic Art Technology (CREATE) has been the home of a series of projects on distributed software systems for real-time and multimedia applications. Several aspects of our work are relevant to new classes of applications as more and more systems are built using distributed object software technology for real-time services. This white paper describes our previous projects and innovations in this area and our plans for the future. Get the PDF file.

Research on Spatial and Surround Sound at CREATE

Researchers at the UCSB Center for Research in Electronic Art Technology (CREATE) have been developing spatial sound performance systems and multichannel surround sound rendering software for several years. We use these systems as components of immersive user interfaces for a variety of applications, as well as for the performance of spatialized music. This white paper surveys our previous work in the field and describes our plans for the future. Get the PDF file.

Research on Music/Sound Databases at CREATE

Large-scale storage of sound and music has only become possible in the last decade. With this, and the new possibility for wide-area distribution of multimedia over the Internet, there arose a new requirement for flexible and powerful databases for musical and audio data. Since 1996, our work at CREATE has focused on database frameworks for multimedia applications, and on analysis and feature extraction techniques for music and sound databases. This white paper describes our results and presents several of our plans for future applications. Get the PDF file.

Application and User Interface Development at CREATE

The history of computer applications in music reaches back into the 1950s. Only recently, however, has it been possible to control complex musical processes such as algorithmic composition or sophisticated sound synthesis programs in real-time. Advanced software and hardware technology also allows us to develop user interfaces that enable non-musicians (and even non-readers) to be musically creative. These two domains of application development and user interface construction have been important tasks at CREATE for ten years. We present examples of tools we've developed below, and discuss what features they introduce that might be useful to other application areas. Get the PDF file.

The CREATE Signal Library (“Sizzle”): Design, Issues, and Applications (with Chandrasekhar Ramakrishnan)

Proceedings of the 2003 International Computer Music Conference
The CREATE Signal Library (CSL) is a portable general-purpose software framework for sound synthesis and digital audio signal processing. It is implemented as a C++ class library to be used as a standalone synthesis server, or embedded as a library into other programs. The first section of this paper describes the overall design of CSL version 3 and gives a series of progressive code examples. We also present CSL's facilities for network I/O of control and sample streams, and the development and deployment of distributed CSL systems. More interesting is the discussion that follows of the design issues we faced in implementing CSL, and the presentation of a few of the applications in which we've used CSL over the last year. Get the PDF file.


See Also

  • Full Bibliography
  • List of Musical Compositions
  • Example Reviews of My Music
  • Computer Music Journal WWW/FTP Archives (many music-related links)
  • Return to home page

    For more detailed information, mail a letter to STP.

    [Stephen Travis Pope, stp@create.ucsb.edu]