I've created a diffusion-based generative assistant that makes writing new melodies much easier, even for non-musicians like me. These are meant to be just the catchy "hook" parts of songs, so more work is required to turn them into full songs, but existing products already handle that well: there are plugins that can suggest possible chord progressions for a melody, and there is even good singing-synthesis software (Synthesizer V Studio), which I used without any tweaks to make the "vocal" playlist.
This side project turned out to be quite challenging because of how little data there is to train on - several orders of magnitude less than DALL-E or GPT-3 had available - so it required a deep dive into recent research on generalization and augmentation techniques, plus some feature engineering.
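The post doesn't say which augmentation techniques were used, but for symbolic music the most common ones are pitch transposition and tempo scaling. A minimal sketch, assuming melodies are stored as (pitch, onset, duration) tuples - purely an illustration, not the author's pipeline:

```python
# Hypothetical symbolic-music augmentation: transpose pitches and stretch
# time to multiply a tiny dataset. Note format assumed here:
# (MIDI pitch, onset in beats, duration in beats).

def transpose(notes, semitones):
    """Shift every pitch by a fixed number of semitones."""
    return [(pitch + semitones, start, dur) for pitch, start, dur in notes]

def time_stretch(notes, factor):
    """Scale onsets and durations, simulating a tempo change."""
    return [(pitch, start * factor, dur * factor) for pitch, start, dur in notes]

def augment(melody, semitone_range=range(-5, 7), factors=(0.9, 1.0, 1.1)):
    """Expand one melody into many transposed / stretched variants."""
    return [time_stretch(transpose(melody, s), f)
            for s in semitone_range for f in factors]

melody = [(60, 0.0, 0.5), (62, 0.5, 0.5), (64, 1.0, 1.0)]
variants = augment(melody)
print(len(variants))  # 12 transpositions x 3 tempo factors = 36 variants
```

Transposition is usually kept within an octave so the melody stays in a realistic register; the original (shift 0, factor 1.0) is included among the variants.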
Various other instruments:
SoundCloud electric piano: https://soundcloud.com/lech-mazur-995769534/sets/ai-assistan...
SoundCloud vocal: https://soundcloud.com/lech-mazur-995769534/sets/ai-assistan...
Are you familiar with aiva.ai? They also use midi format. Though I thought after Jukebox everyone would switch to raw audio.
I think AI music generation will end up being as big a thing as DALL-E is for images and GPT-3 for text. Possibly bigger, because people seem to have less intuition for creating music than they do for images and words.
This is really neat. Is there a Colab or Jupyter notebook we can look at?
After listening to the same pieces over and over on my kids' music boxes and whatnot, I've really wanted a music box that auto-generates a new 15-second piece every time instead of playing a static loop.
I've recently had pretty good luck with a mix of SUNDAE (https://arxiv.org/abs/2112.06749) and coconet (https://arxiv.org/abs/1903.07227) and/or Music Transformer based internal models for modeling very small datasets of polyphonic "midified" music. Research paper hopefully soon to come... Not sure what your pipeline looks like, but those papers might be worth putting on your radar. And as you mention, symbolic music datasets are surprisingly small, surprisingly low quality, and generally a huge pain to work with. Cool stuff - I like the sax!
For anyone unfamiliar with diffusion models (and coconet / OrderlessNADE), one of their really nice properties, as opposed to "standard" autoregressive (GPT / RNN) style models, is that you can specify any part of a piece and have the model fill in any other part - rather than being forced to specify the "past" and predict only the "future". The coconet "doodle" is a good example of this interface at work (https://www.google.com/doodles/celebrating-johann-sebastian-...)
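The infilling loop described above can be sketched as Gibbs-style resampling over the unfixed positions, in the spirit of coconet / OrderlessNADE. This is a toy illustration: `predict` is a stand-in for a real trained network's conditional distribution, not any actual library call.

```python
# Toy order-agnostic infilling: the user pins any subset of positions,
# and the "model" repeatedly resamples the rest conditioned on context.
import random

MASK = None

def predict(seq, i):
    """Placeholder for a trained model's prediction for position i given
    the partially filled sequence; here it just draws a random pitch."""
    return random.choice(range(60, 72))

def infill(seq, fixed, steps=50):
    """Resample non-fixed positions one at a time, conditioning on
    everything else. `fixed` holds the indices the user specified."""
    seq = list(seq)
    free = [i for i in range(len(seq)) if i not in fixed]
    for i in free:            # initialize all masked slots
        seq[i] = predict(seq, i)
    for _ in range(steps):    # then iteratively resample them
        i = random.choice(free)
        seq[i] = predict(seq, i)
    return seq

# Fix the first and last notes; the model fills in the middle.
melody = [60, MASK, MASK, MASK, 67]
result = infill(melody, fixed={0, 4})
```

The key contrast with an autoregressive model is that `fixed` can be any index set - endpoints, every other beat, one voice of a chorale - not just a prefix.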
XLNet had some of this promise too (https://arxiv.org/abs/1906.08237), but I never had much luck with it as a pure generator. Autoregressive Diffusion Models (https://openreview.net/forum?id=Lm8T39vLDTE) have similar properties, but I haven't had time to suss out the subtle differences yet.
Am I the only one who listened to a few and didn't find any of them catchy at all?
This is fascinating. I've always loved the idea of mapping a .midi melody to a "vocal" track and transforming it accordingly - akin to Autotune The News, but with AI/ML instead of by hand.
In a sense, the lyrics could be spoken and then applied to the .midi melody. The result would sound similar to the SoundCloud vocal link OP posted, but with "words".