Sound segmentation
I was absolutely excited to see Segment Anything until I read "picture/video"...
Does Facebook also plan to "segment sound"? For example, in movies you have background music, background noise, foreground noise, voices and thousands of sound effects. Is Facebook working toward "one day" truly being able to segment everything: picture, video and sound?
As an audio researcher in machine learning: this is one of the most difficult areas to segment, especially when separating foreground noise from background noise.
I think Demucs does a good job on vocals and on music in general, but it's not perfect.
Correct, segmenting sounds is extremely difficult (kinda like solving billions of Fourier transforms). Maybe one day we'll get it.
My reason for asking is very simple: as soon as we have this, we would be able to segment movie scene sounds (voices, sound effects, background sounds, etc.). From there it isn't far to a movie created entirely by AI :-)
Thanks for mentioning Demucs, I will check it out...
You might actually be able to do this right now with Demucs.
Just make sure you have the highest audio quality possible.
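Demucs can split the stream into vocals and instruments (its default models output drums, bass, vocals and other stems).
You could then take the vocals and phase-cancel them against the original file to get the audio without vocals; cancelling the instrument stems as well leaves the background audio without music.
You could also do some audio processing here to clean up the artefacts of the segmentation, as it causes phasing issues, but it's doable.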
Go check it out: https://github.com/adefossez/demucs
Meta originally hosted the source, but the author left Meta and still maintains it at the link above.
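To make the phase-cancel step concrete, here is a minimal sketch in plain Python. It is not Demucs code: the function name `phase_cancel` and the toy sine-wave "stems" are assumptions made up for illustration, standing in for sample-aligned audio you would load from the separated files.

```python
import math


def phase_cancel(mixture, stems):
    """Subtract every separated stem from the mixture, sample by sample.

    Subtracting a stem is the same as inverting its phase and summing it in.
    All signals are equal-length lists of float samples (mono, same rate).
    """
    for stem in stems:
        if len(stem) != len(mixture):
            raise ValueError("stems must be sample-aligned with the mixture")
    return [m - sum(s[i] for s in stems) for i, m in enumerate(mixture)]


# Toy signals standing in for real audio (hypothetical: 8 kHz mono, 1 second).
rate = 8000
t = [n / rate for n in range(rate)]
vocals = [0.5 * math.sin(2 * math.pi * 220 * x) for x in t]
music = [0.3 * math.sin(2 * math.pi * 440 * x) for x in t]
background = [0.1 * math.sin(2 * math.pi * 50 * x) for x in t]
mixture = [v + m + b for v, m, b in zip(vocals, music, background)]

# Cancel both stems; what is left should be the background.
residual = phase_cancel(mixture, [vocals, music])
max_err = max(abs(r - b) for r, b in zip(residual, background))
print(f"max difference from true background: {max_err:.2e}")
```

Here the stems are exact, so the residual matches the background almost perfectly. With real separated stems they are only estimates, so expect leakage and the phasing artefacts mentioned above.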
That's a pretty cool idea: extract the vocals and music, then apply them inverted to the original audio. The only problem I see is with different sound effects occurring together, for example a gunshot and glass cracking. Maybe I'll find a way somehow. Thank you very much for the input, I appreciate it...
Yeah, this loops back to the fundamental issue you originally mentioned. Even if you FFT every frequency, every single sample in an ADC's PCM stream will be a quantised point.
There will always be loss of information just by the nature of quantization. The rabbit hole of signal processing :D
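A quick way to see that loss is to round a signal to 16-bit PCM levels and measure how far the samples move. This is only a rough sketch under assumptions of my own: 16-bit depth, a 1 kHz test tone, and a made-up helper called `quantize_16bit`.

```python
import math


def quantize_16bit(x):
    """Round a sample in [-1.0, 1.0] to the nearest 16-bit PCM level and back."""
    level = round(x * 32767)
    level = max(-32768, min(32767, level))  # clamp to the signed 16-bit range
    return level / 32767


# One second of a 1 kHz sine at 44.1 kHz (assumed test signal).
rate = 44100
signal = [0.8 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
quantized = [quantize_16bit(s) for s in signal]

# Rounding can move a sample by at most half a quantisation step.
max_error = max(abs(q - s) for q, s in zip(quantized, signal))
half_step = 0.5 / 32767
print(f"max error {max_error:.3e}, half step {half_step:.3e}")
```

The error per sample is tiny but never exactly zero across the signal, and whatever was rounded away cannot be recovered by any later processing.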
very good idea💡