Ramirez and Hazan, 2005: Generating Expressive Music Performances

http://www.jimdavies.org/summaries/

Rafael Ramirez and Amaury Hazan, A Learning Scheme for Generating Expressive Music Performances of Jazz Standards, International Joint Conference on Artificial Intelligence, pp. 1628-1629, 2005.

@InProceedings{RamirezHazan2005,
  title =	{A Learning Scheme for Generating Expressive Music
	 Performances of Jazz Standards},
  author =	      {Ramirez, Rafael and Hazan, Amaury},
  year =	      {2005},
  bibdate =	      {2005-12-09},
  bibsource =	      {DBLP,
	     http://dblp.uni-trier.de/db/conf/ijcai/ijcai2005.html#RamirezH05},
  booktitle =	{International Joint Conference on Artificial Intelligence},
  crossref =	{conf/ijcai/2005},
  pages =  {1628--1629},
  URL =	   {http://www.ijcai.org/papers/post-0429.pdf},
}

Author of the summary: Robert Bertschi, 2006, 2rab@qlink.queensu.ca

Cite this paper for:

Recent studies of expressive music performance that apply machine learning techniques.

-------------------

The actual paper can be found at http://www.ijcai.org/papers/post-0429.pdf

Ramirez and Hazan describe an approach for generating an expressive performance of a monophonic jazz melody. Their system has three components:

1. A melodic transcription component which extracts a set of acoustic features from monophonic recordings [p. 1]

2. A machine learning component which induces an expressive transformation model from the set of extracted acoustic features [p. 1]

3. A melody synthesis component which generates expressive monophonic output (MIDI or audio) from inexpressive melody descriptions using the induced expressive transformation model [p. 1]

In the past, approaches to generating an expressive performance of a melody have been based mainly on statistical analysis (Repp, 1992), mathematical modelling (Todd, 1992) and analysis-by-synthesis (Friberg, 1995).

All of these methods rely on a researcher to devise a theory or mathematical model that captures the different aspects of an expressive performance.

Ramirez and Hazan follow a more recent idea: applying machine learning techniques to the study of expressive performance. Others have also used this approach in different ways:

1. Widmer (2002) focused on discovering general rules of expressive classical piano performance.

2. Lopez de Mantaras (2002) developed a case-based reasoning system able to infer a set of expressive transformations; however, its predictions could not be explained.

Ramirez and Hazan explore expressive performance using inductive machine learning. The data set used in this study is composed of 1936 performance notes.

They are mainly concerned with note-level (duration, onset and energy) and intra-note-level (pitch and amplitude shape) expressive transformations. Each note in the data set is described by attributes representing its own properties (duration, metrical position and envelope) and some information about its context (the durations of the previous and following notes).
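To make the attribute description concrete, here is a minimal Python sketch of a per-note descriptor built from a score. The field names, the (pitch, onset, duration) input format, the 4-beat bar and the helper `describe_melody` are all hypothetical; the envelope attributes, which come from the acoustic analysis, are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class NoteDescriptor:
    # Properties of the note itself (illustrative names, not from the paper).
    pitch: int                 # MIDI note number
    duration: float            # nominal duration in beats
    metrical_position: float   # beats from the start of the bar
    # Context: durations of the neighbouring notes (None at melody boundaries).
    prev_duration: Optional[float]
    next_duration: Optional[float]

def describe_melody(notes: List[Tuple[int, float, float]],
                    beats_per_bar: float = 4.0) -> List[NoteDescriptor]:
    """Turn (pitch, onset_in_beats, duration_in_beats) tuples into descriptors."""
    descriptors = []
    for i, (pitch, onset, duration) in enumerate(notes):
        prev_d = notes[i - 1][2] if i > 0 else None
        next_d = notes[i + 1][2] if i + 1 < len(notes) else None
        descriptors.append(NoteDescriptor(
            pitch=pitch,
            duration=duration,
            metrical_position=onset % beats_per_bar,
            prev_duration=prev_d,
            next_duration=next_d,
        ))
    return descriptors
```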

Several machine learning methods can be applied to this task. The methods considered were:

regression trees

model trees

support vector machines

Model trees were found to be the most accurate, so they are the learning component used by Ramirez and Hazan. Rule-based models were also used to explain the predictions made by the tool.
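A model tree differs from a regression tree in that each leaf holds a linear model instead of a constant value. The sketch below is a deliberately tiny illustration, assuming a single numeric input (for example a note's nominal duration) and a single numeric target (for example a duration deviation): it makes one threshold split and fits an ordinary least-squares line in each leaf. It is not the M5 algorithm or the authors' learner, and the names `fit_model_stump` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns ((a, b), sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return (a, b), sse

def fit_model_stump(xs, ys, min_leaf=2):
    """Depth-1 model tree: one threshold split with a linear model in each leaf."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        left, right = pairs[:k], pairs[k:]
        if left[-1][0] == right[0][0]:
            continue  # equal x values cannot be separated by a threshold
        lm, lsse = fit_line([p[0] for p in left], [p[1] for p in left])
        rm, rsse = fit_line([p[0] for p in right], [p[1] for p in right])
        if best is None or lsse + rsse < best[0]:
            threshold = (left[-1][0] + right[0][0]) / 2
            best = (lsse + rsse, threshold, lm, rm)
    if best is None:
        raise ValueError("not enough distinct points to split")
    _, threshold, lm, rm = best
    return threshold, lm, rm

def predict(stump, x):
    """Route x to a leaf by the threshold, then apply that leaf's line."""
    threshold, (la, lb), (ra, rb) = stump
    return la + lb * x if x <= threshold else ra + rb * x
```

A regression tree with the same split would predict each leaf's mean; the leaf lines let the model follow a trend inside each region, which is the property that distinguishes model trees.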

The learning scheme used is as follows:

1. Apply k-means clustering to all the notes, using 5 clusters. Each note is characterized by its attack, sustain and release.

2. Apply a classification algorithm (classification trees) to predict the cluster to which each note belongs, using the note descriptors above.

3. Given a note, apply a nearest neighbour algorithm to find the most similar note within its cluster, with distance measured in pitch and duration.
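Steps 1 and 3 of this scheme might be sketched in Python as follows, assuming each training note carries an envelope vector (attack, sustain, release) for clustering and a (pitch, duration) pair for the nearest-neighbour search. Plain Lloyd's k-means with random initial centroids, unweighted Euclidean distance and the function names are assumptions; step 2 is left out, so the target cluster is passed in directly rather than predicted by a classification tree.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]

def dist(a: Vector, b: Vector) -> float:
    """Unweighted Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm; returns the cluster index of each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def nearest_in_cluster(query: Vector, notes: List[Vector],
                       labels: List[int], cluster: int) -> int:
    """Index of the note in `cluster` closest to `query` in (pitch, duration) space."""
    candidates = [i for i, lab in enumerate(labels) if lab == cluster]
    return min(candidates, key=lambda i: dist(query, notes[i]))
```

Because pitch (in semitones) and duration have different scales, a practical version would probably normalise or weight the two dimensions before measuring distance.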

Once the learning scheme has produced all the notes, they are linked together and an algorithm is applied to smooth the transitions between them.

The melody synthesis component uses the induced models to transform the inexpressive melody into an expressive one. The notes are first transformed according to the computed duration, onset and energy deviations, and are then joined together using an algorithm that optimizes the transitions between notes.
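The deviation step can be sketched as below. The representation is an assumption, not taken from the paper: each note has a nominal onset, duration and energy, and each predicted deviation is an additive onset shift plus multiplicative duration and energy ratios. The transition handling is a crude stand-in (trimming any note that would overlap its successor), not the authors' transition-optimizing algorithm, and `perform` is a hypothetical name.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Note:
    pitch: int
    onset: float     # seconds
    duration: float  # seconds
    energy: float    # loudness on an arbitrary scale

@dataclass(frozen=True)
class Deviation:
    onset_shift: float     # seconds added to the nominal onset
    duration_ratio: float  # performed duration / nominal duration
    energy_ratio: float    # performed energy / nominal energy

def perform(score: List[Note], deviations: List[Deviation]) -> List[Note]:
    """Apply predicted per-note deviations, then remove overlaps between notes."""
    performed = [
        replace(n,
                onset=n.onset + d.onset_shift,
                duration=n.duration * d.duration_ratio,
                energy=n.energy * d.energy_ratio)
        for n, d in zip(score, deviations)
    ]
    # Crude transition fix (assumption): a monophonic line cannot sound two
    # notes at once, so shorten any note that runs into the next onset.
    for i in range(len(performed) - 1):
        cur, nxt = performed[i], performed[i + 1]
        if cur.onset + cur.duration > nxt.onset:
            performed[i] = replace(cur, duration=max(0.0, nxt.onset - cur.onset))
    return performed
```

Large negative onset shifts could reorder notes; this sketch does not handle that case.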

Examples of expressive melodies produced by this model: www.iua.upf.es/~rramirez/promusic/demo.wav

Cognitive Science Summaries Webmaster: Jim Davies (jim@jimdavies.org)

Last modified: Thu Apr 15 11:07:19 EDT 1999 .