More than 40 GB of VRAM consumed by PaddleOCR-VL on an A100 GPU!!!

#59
by sayed99 - opened


This is running on Colab with the official code snippet from the model card, using the Transformers library. It consumed more than 40 GB of VRAM during inference on a single page, even though it is a simple page containing no more than 300 Arabic words. It also took more than 2 minutes to generate the result, which is far from optimal. I will investigate whether this is caused by a missing configuration, the attention implementation, or something else.

For reference, here is the sample image I ran it on:

page_1

I modified the code snippet to use the FlashAttention 2 implementation, and performance improved from 2 minutes to 19 seconds, while GPU VRAM usage dropped from the massive 45 GB to 3.3 GB.
Could I open a pull request with the new snippet?
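For context, here is a minimal sketch of the kind of change involved, assuming the model is loaded through the Transformers auto classes as in the model card (the exact model/processor classes, prompt format, and generation call should follow the official snippet); the relevant difference is the `attn_implementation` argument together with a half-precision dtype:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "PaddlePaddle/PaddleOCR-VL"

# Load in bfloat16 and request FlashAttention 2 instead of the default
# attention implementation; this is what cuts both latency and peak VRAM.
# flash-attn must be installed separately (e.g. `pip install flash-attn`).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
    device_map="cuda",
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

# The rest of the inference (building the prompt with the page image and
# calling model.generate) stays exactly as in the model card snippet.
```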

@sayed99, could you point to exactly where you made the change?

@maksym-ostapenko
Hi! I recently updated the README to modify the Transformers code and enable the use of FlashAttention 2. This change significantly reduces memory usage and improves performance.

Here's the pull request I opened for the model card README: Model Card PR

PaddlePaddle org


Contributions are highly welcome!

@sayed99 can you please share your notebook?
I have been trying to run the model on Colab with no luck.

@Vinci Hi! You can find the updated, optimized Colab code under the new section "Click to expand: Use flash-attn to boost performance and reduce memory usage" on the model card.
And no problem, here is the full notebook of the experiment. Please comment out the first part, though, as I assume it will crash on the free T4 GPU due to limited memory; go directly to the second section, which uses flash-attn.
paddle-paddle-inference
