** NOTE: The get_text_features() method applies a final projection to 512 dimensions, followed by normalization, after this forward pass.

=========================================================================================================
Layer (type:depth-idx)                                  Output Shape              Param #
=========================================================================================================
ClapTextModel                                           [1, 768]                  --
├─ClapTextEmbeddings: 1-1                               [1, 5, 768]               --
│    └─Embedding: 2-1                                   [1, 5, 768]               (38,603,520)
│    └─Embedding: 2-2                                   [1, 5, 768]               (768)
│    └─Embedding: 2-3                                   [1, 5, 768]               (394,752)
│    └─LayerNorm: 2-4                                   [1, 5, 768]               (1,536)
│    └─Dropout: 2-5                                     [1, 5, 768]               --
├─ClapTextEncoder: 1-2                                  [1, 5, 768]               --
│    └─ModuleList: 2-6                                  --                        --
│    │    └─ClapTextLayer: 3-1                          [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-2                          [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-3                          [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-4                          [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-5                          [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-6                          [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-7                          [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-8                          [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-9                          [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-10                         [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-11                         [1, 5, 768]               (7,087,872)
│    │    └─ClapTextLayer: 3-12                         [1, 5, 768]               (7,087,872)
├─ClapTextPooler: 1-3                                   [1, 768]                  --
│    └─Linear: 2-7                                      [1, 768]                  (590,592)
│    └─Tanh: 2-8                                        [1, 768]                  --
=========================================================================================================
Total params: 124,645,632
Trainable params: 0
Non-trainable params: 124,645,632
Total mult-adds (M): 124.65
=========================================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 4.18
Params size (MB): 498.58
Estimated Total Size (MB): 502.77
=========================================================================================================
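
The parameter counts in the table can be cross-checked by hand. The sketch below recomputes them in plain Python, assuming RoBERTa-base-style dimensions (vocabulary 50265, 514 positions, 1 token type, hidden size 768, feed-forward size 3072). None of these values is stated in the summary itself; they are inferred from the per-row counts, and the helper names are hypothetical.

```python
# Assumed dimensions (inferred from the table, not stated in it).
VOCAB_SIZE = 50265       # 50265 * 768 = 38,603,520 (Embedding 2-1)
TYPE_VOCAB_SIZE = 1      # 1 * 768 = 768            (Embedding 2-2)
MAX_POSITIONS = 514      # 514 * 768 = 394,752      (Embedding 2-3)
HIDDEN = 768
INTERMEDIATE = 3072      # assumed feed-forward width (4 * HIDDEN)
NUM_LAYERS = 12


def linear(in_features, out_features):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features


def layer_norm(dim):
    """Parameters of a LayerNorm: one scale and one shift per feature."""
    return 2 * dim


# Three embedding tables plus the LayerNorm that follows them.
embeddings = (
    VOCAB_SIZE * HIDDEN
    + TYPE_VOCAB_SIZE * HIDDEN
    + MAX_POSITIONS * HIDDEN
    + layer_norm(HIDDEN)
)

# One transformer layer: query, key, value and attention-output projections,
# a LayerNorm, the two feed-forward projections, and a second LayerNorm.
per_layer = (
    4 * linear(HIDDEN, HIDDEN)
    + layer_norm(HIDDEN)
    + linear(HIDDEN, INTERMEDIATE)
    + linear(INTERMEDIATE, HIDDEN)
    + layer_norm(HIDDEN)
)

# Pooler: a single dense layer (Tanh has no parameters).
pooler = linear(HIDDEN, HIDDEN)

total = embeddings + NUM_LAYERS * per_layer + pooler

# Size in MB, assuming 4-byte float32 parameters and 1 MB = 10**6 bytes.
params_mb = total * 4 / 1e6

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
print(f"size (MB): {params_mb:.2f}")
```

Under these assumptions the script reproduces the 7,087,872 parameters per ClapTextLayer, the 124,645,632 total, and the 498.58 MB parameter size reported above.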