
seq2seq model

Data Processing

MUSIC ACCOMPANIMENT

Using a Seq2Seq model, the system takes a lead melody as input and generates an accompaniment melody aligned to the same beat. The generated melody data is then converted into MIDI notes and timing data and written into a new MIDI file. With the help of Logic Pro X, I bounced the merged MIDI files to MP3. The SoundCloud playlists showcase selected outputs.
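The step that turns the generated melody back into notes can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes the model output has already been thresholded into a binary time-by-pitch matrix using the fixed 0.5 s step described under Data Processing, and it merges consecutive active steps of the same pitch into one sustained note. The helper names `matrix_to_notes` and `write_midi` are hypothetical. `write_midi` uses pretty_midi's `PrettyMIDI`, `Instrument` and `Note` classes.

```python
import numpy as np

STEP = 0.5  # seconds per time step (the fixed interval used in this project)


def matrix_to_notes(roll, step=STEP):
    """Convert a binary (time_steps, 128) matrix into (pitch, start, end) tuples.

    Consecutive active steps of the same pitch are merged into one sustained note.
    """
    notes = []
    num_steps, num_pitches = roll.shape
    for pitch in range(num_pitches):
        start = None
        # Iterate one step past the end so a note still sounding at the end is closed.
        for t in range(num_steps + 1):
            active = t < num_steps and roll[t, pitch] > 0
            if active and start is None:
                start = t  # note begins at this step
            elif not active and start is not None:
                notes.append((pitch, start * step, t * step))  # note ends here
                start = None
    # Order by start time, then pitch, as a MIDI writer would expect.
    return sorted(notes, key=lambda n: (n[1], n[0]))


def write_midi(notes, path, program=0, velocity=100):
    """Write (pitch, start, end) tuples to a MIDI file (requires pretty_midi)."""
    import pretty_midi  # imported lazily so matrix_to_notes works without it

    pm = pretty_midi.PrettyMIDI()
    instrument = pretty_midi.Instrument(program=program)
    for pitch, start, end in notes:
        instrument.notes.append(
            pretty_midi.Note(velocity=velocity, pitch=pitch, start=start, end=end)
        )
    pm.instruments.append(instrument)
    pm.write(path)


roll = np.zeros((4, 128), dtype=np.int8)
roll[0:2, 64] = 1  # pitch 64 held for two steps
roll[2, 67] = 1    # pitch 67 for one step
print(matrix_to_notes(roll))  # [(64, 0.0, 1.0), (67, 1.0, 1.5)]
```

One consequence of this merge rule is that a pitch repeated on two adjacent steps cannot be told apart from one held note; a fixed-step matrix simply does not record note onsets separately.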

Model: I chose an RNN model because RNNs are good at handling sequential data. In music, time is a crucial parameter, so an RNN is a natural fit. Here is also a link to an article by Andrej Karpathy about how powerful RNNs are. For music accompaniment, I chose a particular RNN architecture, the Encoder-Decoder model, also called the Seq2Seq model. It removes the constraint that the output sequence must have the same length as the input. The Seq2Seq model consists of an encoder RNN, a decoder RNN, and state vectors that pass the encoder's summary of the input to the decoder.
I referred to the Seq2Seq example in the Keras documentation, which is an English-to-French translator.
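To make the encoder-decoder idea concrete, here is a toy forward pass in plain NumPy. It is a sketch, not the Keras model used in this project: it swaps the LSTM layers for a vanilla tanh RNN, uses random untrained weights, and the sizes `HIDDEN` and `NUM_PITCHES` and the all-zero start input are assumptions for illustration. It shows the two properties described above: the encoder's final hidden state is the only information handed to the decoder, and the decoder can run for a different number of steps than the input has.

```python
import numpy as np

NUM_PITCHES = 128  # one-hot dimension, MIDI pitches 0-127
HIDDEN = 32        # size of the state vector (hypothetical)

rng = np.random.default_rng(0)
# Encoder and decoder have separate weights (random, untrained).
W_enc_x = rng.normal(scale=0.1, size=(HIDDEN, NUM_PITCHES))
W_enc_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_dec_x = rng.normal(scale=0.1, size=(HIDDEN, NUM_PITCHES))
W_dec_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_out = rng.normal(scale=0.1, size=(NUM_PITCHES, HIDDEN))


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def encode(inputs):
    """Run the encoder RNN over a (steps, 128) sequence; return the final state."""
    h = np.zeros(HIDDEN)
    for x in inputs:
        h = np.tanh(W_enc_x @ x + W_enc_h @ h)
    return h  # the state vector passed to the decoder


def decode(state, num_steps):
    """Greedy decoding: each predicted pitch is fed back as the next input."""
    h = state
    x = np.zeros(NUM_PITCHES)  # start-of-sequence input (all zeros here)
    outputs = []
    for _ in range(num_steps):
        h = np.tanh(W_dec_x @ x + W_dec_h @ h)
        probs = softmax(W_out @ h)
        pitch = int(np.argmax(probs))
        outputs.append(pitch)
        x = np.zeros(NUM_PITCHES)
        x[pitch] = 1.0  # one-hot encode the prediction for the next step
    return outputs


lead = np.zeros((6, NUM_PITCHES))
lead[np.arange(6), [60, 62, 64, 65, 67, 69]] = 1.0  # a six-step lead line
accompaniment = decode(encode(lead), num_steps=4)     # output length differs
print(len(accompaniment), accompaniment)
```

In the Keras example, training does not use this feedback loop: the decoder is fed the true previous target step (teacher forcing), and a loop like `decode` is only used at inference time to sample outputs.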

Dataset: Music files come in many formats, but MIDI is the most convenient, because a MIDI file gives direct access to the instrument type, timing data and pitch data. I downloaded the MIDI files from this link.
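As an illustration of why MIDI is convenient: a MIDI track stores note-on and note-off events, each with a pitch and a delta time in ticks, and converting ticks to seconds only needs the file's ticks-per-quarter-note resolution and the tempo in microseconds per quarter note. The sketch below assumes a single constant tempo and uses a hand-written, hypothetical `events` list instead of a real file. In practice pretty_midi performs this conversion, including tempo changes, when it loads a file.

```python
def ticks_to_seconds(ticks, ticks_per_beat, tempo_us_per_beat):
    """Convert a tick count to seconds at a constant tempo."""
    return ticks * tempo_us_per_beat / (1_000_000 * ticks_per_beat)


def events_to_notes(events, ticks_per_beat=480, tempo_us_per_beat=500_000):
    """Turn (delta_ticks, kind, pitch) events into (pitch, start_s, end_s) notes.

    kind is "on" or "off"; delta_ticks is the time since the previous event,
    as in a MIDI track. 500000 microseconds per beat is 120 BPM.
    Real files may also encode note-off as note-on with velocity 0; that case
    is omitted in this simplified sketch.
    """
    now = 0           # absolute time in ticks
    open_notes = {}   # pitch -> absolute start tick
    notes = []
    for delta, kind, pitch in events:
        now += delta
        if kind == "on":
            open_notes[pitch] = now
        elif kind == "off" and pitch in open_notes:
            start = open_notes.pop(pitch)
            notes.append((
                pitch,
                ticks_to_seconds(start, ticks_per_beat, tempo_us_per_beat),
                ticks_to_seconds(now, ticks_per_beat, tempo_us_per_beat),
            ))
    return notes


events = [(0, "on", 64), (480, "off", 64), (0, "on", 67), (240, "off", 67)]
print(events_to_notes(events))  # [(64, 0.0, 0.5), (67, 0.5, 0.75)]
```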

Data Processing: Before training, the data has to be converted from MIDI files into 3D arrays. I first read each MIDI file with the Python pretty_midi library and extracted its note data. I then separated the lead track and the accompaniment track into two lists, each holding that track's data for every piece in the dataset. From these two lists I built three 3D matrices: the encoder input data, the decoder input data, and the decoder target (output) data. Each matrix is initialized to zeros. The first axis indexes the training pieces; the second axis is time, divided into fixed 0.5 s steps; and the third axis is pitch, covering all 128 MIDI pitches (0 to 127). Using one-hot encoding, the matrices are then filled in from the list data. For example, if pitch 64 is on from 0 s to 0.5 s, the entry at index 64 of the first time step is set to 1, and all other positions stay 0.
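The encoding described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the project's code: each piece is assumed to be already split into lead and accompaniment lists of `(pitch, start, end)` tuples in seconds (for example from pretty_midi's `Note.pitch`, `Note.start` and `Note.end`), every piece is cut or zero-padded to a hypothetical fixed length `MAX_STEPS`, and a time step is switched on if a note overlaps it at all. The decoder arrays follow the Keras example's convention that the target is one step ahead of the decoder input (teacher forcing). Because several pitches can sound at once, a row may contain more than one 1, so it is strictly multi-hot rather than one-hot.

```python
import math

import numpy as np

STEP = 0.5         # seconds per time step
NUM_PITCHES = 128  # MIDI pitch range 0-127
MAX_STEPS = 8      # hypothetical fixed length per piece


def notes_to_roll(notes, max_steps=MAX_STEPS, step=STEP):
    """Set a 1 in every time step that a (pitch, start, end) note overlaps."""
    roll = np.zeros((max_steps, NUM_PITCHES), dtype=np.float32)
    for pitch, start, end in notes:
        first = math.floor(start / step)
        last = math.ceil(end / step)  # exclusive upper bound
        roll[first:min(last, max_steps), pitch] = 1.0  # steps past MAX_STEPS are cut
    return roll


def build_dataset(pieces):
    """pieces: list of (lead_notes, accompaniment_notes) pairs, one per piece."""
    n = len(pieces)
    # Axes: (piece, time step, pitch), all zeros to start.
    encoder_input = np.zeros((n, MAX_STEPS, NUM_PITCHES), dtype=np.float32)
    decoder_input = np.zeros_like(encoder_input)
    decoder_target = np.zeros_like(encoder_input)
    for i, (lead, acc) in enumerate(pieces):
        encoder_input[i] = notes_to_roll(lead)
        acc_roll = notes_to_roll(acc)
        decoder_target[i] = acc_roll
        # Teacher forcing: at step t the decoder sees accompaniment step t-1;
        # step 0 stays all-zero as a start marker.
        decoder_input[i, 1:] = acc_roll[:-1]
    return encoder_input, decoder_input, decoder_target


# One piece: lead pitch 64 from 0 s to 0.5 s, accompaniment pitch 48 from 0 s to 1 s.
pieces = [([(64, 0.0, 0.5)], [(48, 0.0, 1.0)])]
enc_in, dec_in, dec_tgt = build_dataset(pieces)
print(enc_in.shape)                        # (1, 8, 128)
print(enc_in[0, 0, 64], enc_in[0, 1, 64])  # 1.0 0.0
```

The Keras example uses a dedicated start character rather than an all-zero frame; the zero frame here is a simplification for fixed-length piano-roll data.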

Conclusion and Future Work: I learned a lot about this particular model and about machine learning algorithms in general. There is also much more that could be done in music accompaniment, such as more dynamic timing, adding tracks for more instruments, and creating more variations. Music accompaniment has a lot of potential as well, for example real-time accompaniment, where a person sings and the machine automatically generates an accompaniment melody for the singer.
