Hacker News Re-Imagined

Show HN: Catchy melodies made with a diffusion-based neural net assistant

I've created a diffusion-based neural net generative assistant that makes composing new melodies much easier, even for non-musicians like me. These are meant to be just the catchy "hook" parts of songs, so more work is required to turn them into full songs, but existing products already handle that well: there are plugins that can suggest possible chord progressions for a melody, and there is even good singing-synthesis software (Synthesizer V Studio, which I used without any tweaks to make the "voice" playlist).

This side project turned out to be quite challenging because of how little data there is to train on - several orders of magnitude less than DALL-E or GPT-3 had available for their training. It required a deep dive into recent research on generalization and augmentation techniques, plus some feature engineering.
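
The post doesn't detail which augmentation techniques were used, but for symbolic (MIDI-style) music the standard ones are pitch transposition and uniform time-stretching. A minimal sketch, assuming a simple `(midi_pitch, onset_beats, duration_beats)` note representation (an illustration only, not the author's pipeline):

```python
def transpose(notes, semitones):
    """Shift every pitch by `semitones`, clamping to the MIDI range 0-127."""
    return [(min(127, max(0, p + semitones)), on, dur) for p, on, dur in notes]

def time_stretch(notes, factor):
    """Scale onsets and durations by `factor` (e.g. 0.5 = double tempo)."""
    return [(p, on * factor, dur * factor) for p, on, dur in notes]

def augment(melody, semitone_range=range(-5, 7), factors=(0.5, 1.0, 2.0)):
    """Expand one melody into many transposed / stretched variants."""
    variants = []
    for s in semitone_range:
        for f in factors:
            variants.append(time_stretch(transpose(melody, s), f))
    return variants

melody = [(60, 0.0, 1.0), (62, 1.0, 1.0), (64, 2.0, 2.0)]  # C-D-E
print(len(augment(melody)))  # 12 semitone shifts x 3 tempo factors = 36
```

Because both transforms preserve the melody's contour and rhythmic relationships, each training example can be multiplied many times over - one way to cope with a dataset orders of magnitude smaller than image or text corpora.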

Various other instruments:

Voice: https://www.youtube.com/playlist?list=PLoCzMRqh5SkE1yC8_WtJ-...

Synth: https://www.youtube.com/playlist?list=PLoCzMRqh5SkFj7RNZvjr7...

Bell: https://www.youtube.com/playlist?list=PLoCzMRqh5SkEYHYvHX9m9...

Guitar: https://www.youtube.com/playlist?list=PLoCzMRqh5SkGKvfkP2Oex...

Sax: https://www.youtube.com/playlist?list=PLoCzMRqh5SkHfsZgzzdSh...

Grand Piano: https://www.youtube.com/playlist?list=PLoCzMRqh5SkFMch5x60uh...

SoundCloud electric piano: https://soundcloud.com/lech-mazur-995769534/sets/ai-assistan...

SoundCloud vocal: https://soundcloud.com/lech-mazur-995769534/sets/ai-assistan...

  • 37 points
  • 7 days ago

  • @zone411
  • Created a post



@p1esk 6 days

Replying to @zone411 🎙

Are you familiar with aiva.ai? They also use midi format. Though I thought after Jukebox everyone would switch to raw audio.



@armchairhacker 7 days

Replying to @zone411 🎙

I think AI in music generation will end up being a big thing like with DALL-E for images and GPT3 for text. Possibly more because it seems like people have less intuition for creating music than they do for images and words.



@stolenmerch 7 days

Replying to @zone411 🎙

This is really neat. Is there a colab or Jupyter notebook we can look at?



@mcphage 7 days

Replying to @zone411 🎙

After listening to the same pieces over and over again on my kids' music boxes and whatnot, I've really wanted a music box that auto-generates a new 15-second piece every time instead of playing a static loop.



@kastnerkyle 6 days

Replying to @zone411 🎙

I've had pretty good luck recently with a mix of SUNDAE (https://arxiv.org/abs/2112.06749) and coconet (https://arxiv.org/abs/1903.07227) and/or Music Transformer-based internal models for modeling very small datasets of polyphonic "midified" music. A research paper is hopefully soon to come... Not sure what your pipeline looks like, but those papers might be worth putting on your radar. And as you mention, symbolic music datasets are surprisingly small, surprisingly low quality, and generally a huge pain to work with. Cool stuff - I like the sax!

For anyone unfamiliar with diffusion models (and coconet / OrderlessNADE): one of their really nice properties, as opposed to "standard" autoregressive (GPT / RNN) style models, is that you can specify any part and fill in any other part, rather than being forced to specify the "past" and predict only the "future". The coconet "doodle" is a good example of this interface at work (https://www.google.com/doodles/celebrating-johann-sebastian-...)
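
That "fill in any part" interface can be sketched in a few lines. Below, a toy stand-in model (not coconet or any real network) resamples only the masked positions, in random order, conditioned on known notes on *both* sides - information an autoregressive model could only take from the left:

```python
import random

def toy_model(seq, i):
    """Stand-in predictor: propose a pitch near the nearest known
    neighbors on BOTH sides of position i (a real model would be a
    trained network; this is purely illustrative)."""
    left = next((p for p in reversed(seq[:i]) if p is not None), None)
    right = next((p for p in seq[i + 1:] if p is not None), None)
    known = [p for p in (left, right) if p is not None]
    base = sum(known) // len(known) if known else 60
    return base + random.choice([-1, 0, 1])

def infill(seq, sweeps=5, seed=0):
    """Fill masked positions (None); user-pinned notes stay fixed."""
    random.seed(seed)
    seq = list(seq)
    masked = [i for i, p in enumerate(seq) if p is None]
    for _ in range(sweeps):          # Gibbs-style resampling sweeps
        random.shuffle(masked)
        for i in masked:
            seq[i] = toy_model(seq, i)
    return seq

# Pin the first and last note, let the model fill the middle.
print(infill([60, None, None, None, 67]))
```

The key point is in `infill`: the pinned endpoints are never overwritten, and every resampled position sees the entire current sequence, which is what makes arbitrary-order editing and inpainting possible.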

XLNet had some of this promise too (https://arxiv.org/abs/1906.08237) but I never had much luck with it as a pure generator. Autoregressive Diffusion models (https://openreview.net/forum?id=Lm8T39vLDTE) have similar properties, but I haven't had time to sus out the subtle differences yet.



@electric_muse 6 days

Replying to @zone411 🎙

Nice work. Somehow this reminds me of Harvest Moon 64 music. I'm not sure these felt necessarily catchy to me (I think catchiness comes from repetition), but they were incredibly pleasant and interesting.

Also, obligatory re: earworms: https://www.youtube.com/watch?v=iDjX4-LKqCA



@lostmsu 4 days

Replying to @zone411 🎙

Am I the only one who listened to a few and found none of them catchy at all?



@cuttysnark 7 days

Replying to @zone411 🎙

This is fascinating. I've always loved the idea of a MIDI melody being mapped to a "vocal" track and transformed accordingly - akin to Autotune The News, but with AI/ML instead of by hand.

In a sense, the lyrics could be spoken and then applied to the MIDI melody. The result would sound similar to the vocal SoundCloud link the OP posted, but with "words".


