Causal masking used, but EmbeddingGemma is supposed to be using bidirectional attention

#36 · opened by tltl123

Hi, just to check: I read that EmbeddingGemma uses bidirectional attention.

But from what I can see in the transformers code, it seems that a causal mask is used.

This would produce different results from actual bidirectional attention.

Is this intended/correct?
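For example, a quick check like the sketch below should distinguish the two behaviors: with a causal mask, the hidden states of a prefix don't change when more tokens are appended, whereas with bidirectional attention they do. (This is just a sketch; it assumes you have access to google/embeddinggemma-300m and that the short prompt tokenizes to a prefix of the long one, which the assert verifies.)

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "google/embeddinggemma-300m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

short = tokenizer("The quick brown fox", return_tensors="pt")
full = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")

n = short["input_ids"].shape[1]
# Sanity check: the short input's tokens must be a prefix of the full input's tokens.
assert torch.equal(short["input_ids"][0], full["input_ids"][0, :n])

with torch.no_grad():
    h_short = model(**short).last_hidden_state
    h_full = model(**full).last_hidden_state

# With a causal mask the first n token states are identical in both runs;
# with bidirectional attention the extra context changes them.
print("prefix states unchanged (causal-like):",
      torch.allclose(h_short[0, :n], h_full[0, :n], atol=1e-4))
```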

Hi there 👋 A bidirectional mask is indeed necessary for this model (see https://huggingface.co/google/embeddinggemma-300m/blob/main/config.json#L57).
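If you want to confirm this from the published config rather than reading the modeling code, something like the following sketch works. It just prints the masking/attention-related keys, since the exact key name is easiest to verify against the config.json linked above:

```python
from transformers import AutoConfig

# Load the config for EmbeddingGemma and show any entries related to masking/attention.
config = AutoConfig.from_pretrained("google/embeddinggemma-300m").to_dict()
print({k: v for k, v in config.items()
       if any(s in k for s in ("causal", "bidirectional", "attn", "attention"))})
```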

Make sure you're using the latest version of transformers, where this is taken into account: https://github.com/huggingface/transformers/blob/a7f29523361b2cc12e51c1f5133d95f122f6f45c/src/transformers/models/gemma3/modeling_gemma3.py#L565
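If in doubt, a quick way to confirm which version you're running before re-checking (the mask handling linked above is only present in recent releases):

```python
# Print the installed transformers version; compare it against the latest release
# on PyPI before re-running the checks above.
import transformers

print(transformers.__version__)
```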
