Training Repository & AudioCaps2.0
Hello again here too! 🤗
First of all, thanks again for sharing your work.
I wanted to know whether it would be possible to share the model training repository here as well, so that I can learn about the full approach and the hyperparameters you chose.
Additionally, I saw in the AudioCaps repo that a more updated version of the dataset (version 2.0) has been released.
Do you, by any chance, plan to train a more up-to-date model on more data?
Thanks in advance.
Hi, thank you for your comment. The training and evaluation repo is here.
Also, thank you for the reminder about the AudioCaps update. I am planning to include more data in training, possibly AudioCaps v2.0 and WavCaps.
Thank you very much for sharing!
Awesome! Do you have an estimated timeline for the updated model you are planning?
Are you planning to update it under the same repository on Hugging Face or as a completely new model?
Thanks again :)
Hi, sorry for the late response. I was busy with my PhD thesis deadline and missed the message.
I am going to train the model, but it may take some time to find the best settings for the new training corpus, so I estimate it will take about two weeks to get the new model (if computing resources in my lab are available). I will release it under the same repository.
Thanks for the response :)
Best of luck with your PhD work.
I’ll keep an eye on the repo over the next few weeks to check for the update.
Thanks again!
Sorry for the long wait! The exploration took much longer than expected, since I made many changes to the code base and the model architecture this time, and the original code was not very robust.
Besides, I tried incorporating AudioSet strong and WavCaps into training to improve generalization to music data, but results on the AudioGrounding test set show that these data do not bring further improvement.
Finally, the new model is available here. Feel free to try it and check whether the performance indeed improves.
Thank you very much for the update!
Again, many thanks for sharing your work!
I’ll definitely give it a try :)