Causal masking used, but EmbeddingGemma is supposed to be using bidirectional attention

#36 · opened by tltl123

Hi, just to check: I read that EmbeddingGemma uses bidirectional attention.

But from what I can see in the transformers code, it seems that a causal mask is used.

This would produce different results from actual bidirectional attention.

Is this intended/correct?
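For example, a quick check like the sketch below should distinguish the two behaviors: with a causal mask, the hidden states of a prefix don't change when more tokens are appended, whereas with bidirectional attention they do. (This is just a sketch; it assumes you have access to google/embeddinggemma-300m and that the short prompt tokenizes to a prefix of the long one, which the assert verifies.)

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "google/embeddinggemma-300m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

short = tokenizer("The quick brown fox", return_tensors="pt")
full = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")

n = short["input_ids"].shape[1]
# Sanity check: the short input's tokens must be a prefix of the full input's tokens.
assert torch.equal(short["input_ids"][0], full["input_ids"][0, :n])

with torch.no_grad():
    h_short = model(**short).last_hidden_state
    h_full = model(**full).last_hidden_state

# With a causal mask the first n token states are identical in both runs;
# with bidirectional attention the extra context changes them.
print("prefix states unchanged (causal-like):",
      torch.allclose(h_short[0, :n], h_full[0, :n], atol=1e-4))
```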

Hi there 👋 A bidirectional mask is indeed necessary for this model (see https://huggingface.co/google/embeddinggemma-300m/blob/main/config.json#L57).
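If you want to confirm this from the published config rather than reading the modeling code, something like the following sketch works. It just prints the masking/attention-related keys, since the exact key name is easiest to verify against the config.json linked above:

```python
from transformers import AutoConfig

# Load the config for EmbeddingGemma and show any entries related to masking/attention.
config = AutoConfig.from_pretrained("google/embeddinggemma-300m").to_dict()
print({k: v for k, v in config.items()
       if any(s in k for s in ("causal", "bidirectional", "attn", "attention"))})
```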

Make sure you're using the latest version of transformers, where this is taken into account: https://github.com/huggingface/transformers/blob/a7f29523361b2cc12e51c1f5133d95f122f6f45c/src/transformers/models/gemma3/modeling_gemma3.py#L565
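If in doubt, a quick way to confirm which version you're running before re-checking (the mask handling linked above is only present in recent releases):

```python
# Print the installed transformers version; compare it against the latest release
# on PyPI before re-running the checks above.
import transformers

print(transformers.__version__)
```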
