Sound segmentation

by snapo - opened 17 days ago

17 days ago

I was absolutely excited to see Segment Anything, until I read that it covers pictures/video...
Does Facebook also plan to "segment sound"? For example, in movies you have background music, background noise, foreground noise, voices and thousands of sound effects. Is Facebook working toward "one day" being able to truly segment everything: pictures, video and sound?

Tesseract3D

17 days ago

As an audio researcher in machine learning: this is one of the most difficult areas to segment, especially separating foreground noise from background noise.
I think Demucs does a good job of separating vocals and music in general, but it's not perfect.

snapo

16 days ago

Correct, segmenting sounds is extremely difficult (kind of like solving billions of Fourier transforms). Maybe one day we'll get there.
My reason for it is very simple: as soon as we have this, we could segment movie scene sounds (voices, sound effects, background sounds, etc.), and from there it isn't far to a movie created entirely by AI :-)

Thanks for mentioning Demucs, I will check it out...

Tesseract3D

16 days ago

You might actually be able to do this right now with Demucs.
Just make sure you have the highest audio quality possible.
Demucs can split the track into stems such as vocals and the instrumental accompaniment.
You could then phase-cancel those stems (vocals and music) against the original file to get the background audio without voices or music.
You could also do some audio processing afterwards to clean up the segmentation artifacts, since the cancellation causes phasing issues, but it's doable.

Go check it out: https://github.com/adefossez/demucs
Meta originally hosted the source, but the author has since left Meta and still maintains it.
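A minimal sketch of the phase-cancellation step, assuming the separated stems are already saved at the same sample rate, length and channel layout as the original and are sample-aligned with it. Demucs itself is not called here; synthetic sine tones (hypothetical stand-ins) play the roles of vocals, music and effects, and `phase_cancel` is an illustrative helper, not part of Demucs.

```python
import numpy as np

def phase_cancel(mixture: np.ndarray, *stems: np.ndarray) -> np.ndarray:
    """Subtract separated stems from the original mix by inverting their polarity and summing.

    Assumes every stem has the same sample rate, length and channel layout as the mix
    and is sample-aligned with it.
    """
    out = mixture.astype(np.float64)  # astype copies, so the caller's array is untouched
    for stem in stems:
        if stem.shape != mixture.shape:
            raise ValueError("each stem must have the same shape as the mixture")
        out += -stem  # polarity inversion = 180-degree phase shift at every frequency
    return out

sr = 44_100
t = np.arange(sr) / sr                           # 1 second of sample times
vocals = 0.4 * np.sin(2 * np.pi * 440 * t)       # hypothetical "vocals" stem
music = 0.3 * np.sin(2 * np.pi * 110 * t)        # hypothetical "music" stem
effects = 0.1 * np.sin(2 * np.pi * 3000 * t)     # stands in for background noise/effects
mixture = vocals + music + effects

# Perfectly aligned stems: only the effects remain.
leftover = phase_cancel(mixture, vocals, music)
# Vocals stem off by a single sample: cancellation is incomplete and vocal energy leaks through.
shifted = phase_cancel(mixture, np.roll(vocals, 1), music)

print("aligned residual:", np.max(np.abs(leftover - effects)))
print("1-sample offset residual:", np.max(np.abs(shifted - effects)))
```

The second case hints at why separation artifacts show up as phasing: any stem that is not a sample-exact copy of what is in the mix leaves a residue after subtraction, which is what the extra clean-up processing has to deal with.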

snapo

16 days ago

That's a pretty cool idea, to extract the vocals and music and then inverse-apply them to the original audio... The only problem I see is with different sound effects playing together, for example a gunshot and glass cracking. Maybe I'll find a way somehow. Thank you really much for the input, I appreciate it...

Tesseract3D

16 days ago

Yeah, this loops back to the fundamental issue you mentioned originally. Even if you FFT every frequency, every single sample in a DAC's PCM stream is a quantized value.
There will always be some loss of information just by the nature of quantization. The rabbit hole of signal processing :D
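To make the quantization point concrete, here is a small sketch assuming plain uniform PCM quantization without dither (`quantize_pcm` and `quantization_error` are hypothetical helpers written for illustration). It rounds a test tone to 8-bit and 16-bit codes and measures the error: each extra bit shrinks it (roughly 6 dB of SNR per bit as a rule of thumb), but it never reaches zero.

```python
import numpy as np

def quantize_pcm(x: np.ndarray, bits: int = 16) -> np.ndarray:
    """Round float samples in [-1, 1) to signed `bits`-bit PCM codes, then map back to floats."""
    scale = 2 ** (bits - 1)
    codes = np.clip(np.round(x * scale), -scale, scale - 1)
    return codes / scale

def quantization_error(x: np.ndarray, bits: int) -> tuple[float, float]:
    """Return (max absolute error, signal-to-quantization-noise ratio in dB)."""
    err = x - quantize_pcm(x, bits)
    snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(err ** 2))
    return float(np.max(np.abs(err))), float(snr_db)

sr = 44_100
t = np.arange(sr) / sr
signal = 0.9 * np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone, 1 second, no clipping

for bits in (8, 16):
    max_err, snr_db = quantization_error(signal, bits)
    print(f"{bits}-bit: max error {max_err:.2e}, SNR {snr_db:.1f} dB")
```

The maximum error is bounded by half a quantization step, so the information lost per sample is small at 16 bits but always present.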

likewendy

13 days ago

Very good idea💡
