My fork with experiments: https://github.com/spsanps/diffwave-sashimi
S4 → Sashimi (S4-based audio generation architecture) → "It's Raw! Audio Generation with State-Space Models", Karan Goel, Albert Gu, Chris Donahue, Christopher Ré
Sashimi → https://github.com/albertfgu/diffwave-sashimi → unconditional generation, spectrogram-based generation (diffusion)
S4 models → Sashimi → Can it do something like Deep Performer? (score → audio)
Only used the Bach Violin Dataset (small, aligned, good for experiments).
Used 1 s samples, at either 8 kHz or 16 kHz. A rough sketch of the clip preparation is below.
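A minimal sketch of how 1 s clips could be cut from the recordings, assuming torchaudio; the file path and sample rates are illustrative, not the fork's actual data pipeline.

```python
# Sketch: resample a recording and split it into 1 s chunks.
# Assumptions: torchaudio is available; the path below is a placeholder.
import torch
import torchaudio

def load_1s_chunks(wav_path: str, target_sr: int = 16_000) -> torch.Tensor:
    """Load a recording, resample to target_sr, and split into 1 s chunks."""
    wav, sr = torchaudio.load(wav_path)          # (channels, samples)
    wav = wav.mean(dim=0, keepdim=True)          # mix down to mono
    if sr != target_sr:
        wav = torchaudio.functional.resample(wav, sr, target_sr)
    n = wav.shape[1] // target_sr                # number of whole seconds
    wav = wav[:, : n * target_sr]                # drop the ragged tail
    return wav.reshape(n, target_sr)             # (n_chunks, target_sr)

chunks = load_1s_chunks("bach-violin/emil-telmanyi_bwv1001_mov1.wav", 8_000)
```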
Tried (from worst to best):
MIDI → synthesized waveform
Example synthesized waveform: emil-telmanyi_bwv1001_mov1_syn.wav (see the sketch below)
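A sketch of what the MIDI → synthesized waveform step could look like, assuming pretty_midi with pyfluidsynth; the MIDI file and soundfont paths are placeholders, and this is not necessarily how the dataset's `_syn.wav` files were produced.

```python
# Sketch: render a MIDI score to a waveform via FluidSynth.
# Assumptions: pretty_midi + pyfluidsynth installed; paths are placeholders.
import pretty_midi
import soundfile as sf

midi = pretty_midi.PrettyMIDI("bwv1001_mov1.mid")
audio = midi.fluidsynth(fs=16_000, sf2_path="violin.sf2")  # mono float array
sf.write("bwv1001_mov1_syn.wav", audio, 16_000)
```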
Synthesized → U-Net (S4) → original output
This didn't work! The model wasn't really training. (Though perhaps the model was too small, or there was some other problem with how I set it up.) A sketch of the training step is below.
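A minimal sketch of the regression setup that failed to train, assuming paired 1 s synthesized/original clips; the `nn.Conv1d` stand-in, loss choice, and optimizer settings are illustrative, with the fork's actual S4 U-Net taking the model's place.

```python
# Sketch: regress the real recording from its synthesized twin.
# Assumptions: the real model is the fork's S4 U-Net mapping
# (batch, 1, samples) → (batch, 1, samples); Conv1d is a stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Conv1d(1, 1, kernel_size=9, padding=4)   # placeholder for the S4 U-Net
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

def train_step(syn: torch.Tensor, orig: torch.Tensor) -> float:
    """One gradient step on a batch of paired 1 s clips."""
    optimizer.zero_grad()
    pred = model(syn)                 # same time resolution in and out
    loss = F.l1_loss(pred, orig)      # illustrative choice; L2 also plausible
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch of 1 s clips at 8 kHz, just to show the expected shapes.
syn = torch.randn(4, 1, 8_000)
orig = torch.randn(4, 1, 8_000)
print(train_step(syn, orig))
```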