This is a discrete text diffusion model, adapted from jhu-clsp/mmBERT-base.

It was trained on a 760k-sample SFT dataset, with 50% drawn from allenai/tulu-3-sft-mixture and 50% from Vikhrmodels/GrandMaster-PRO-MAX. The resulting performance in English is acceptable; in Russian the model tends to fall into repetition loops.

Training was carried out using my own fork of the dllm repository. Training command:

```shell
python -m accelerate.commands.launch --config_file scripts/accelerate_configs/zero2.yaml --num_processes 8 \
    examples/bert/sft.py \
    --model_name_or_path "jhu-clsp/mmBERT-base" \
    --dataset_args "allenai/tulu-3-sft-mixture|Vikhrmodels/GrandMaster-PRO-MAX" \
    --max_length 1024 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 12 \
    --per_device_eval_batch_size 12 \
    --save_steps 0.1 \
    --output_dir "models/etmmBERT-base/tulu-3-grandmaster/epochs-10-bs-384-len-1024"
```
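The output directory name encodes a total batch size of 384, while the command only shows 8 processes at a per-device batch of 12. One plausible reading (an assumption on my part, since no such flag appears above) is a gradient-accumulation factor of 4:

```python
# Reconstructing the effective batch size implied by the run name
# "bs-384". The gradient_accumulation_steps value is an assumption,
# not shown in the training command.
num_processes = 8
per_device_train_batch_size = 12
gradient_accumulation_steps = 4  # assumed

effective_batch_size = (
    num_processes * per_device_train_batch_size * gradient_accumulation_steps
)
print(effective_batch_size)  # 384
```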

To combat the repetition issues, it is recommended to set a higher temperature. To run the model, use:

```shell
python examples/bert/chat.py --model_name_or_path path/to/checkpoint --temperature 1.0
```
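For intuition, discrete text diffusion decodes by starting from an all-masked sequence and iteratively committing the most confident predictions, with temperature flattening the distribution to discourage loops. The sketch below is a toy, dependency-free illustration of that mask-predict-style loop; it is not the dllm implementation, and `score_fn` stands in for the real model's per-position logits:

```python
import math
import random

MASK = "<mask>"

def softmax(logits, temperature=1.0):
    # Temperature > 1 flattens the distribution, which helps against
    # the repetition loops mentioned above.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def decode(score_fn, vocab, length, steps=4, temperature=1.0, seed=0):
    """Toy discrete-diffusion decoder: fill masked positions over several
    steps, committing the highest-confidence samples first.

    score_fn(seq, i) -> list of logits over `vocab` for position i
    (a stand-in for the real model; hypothetical interface).
    """
    rng = random.Random(seed)
    seq = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in seq:
        candidates = []
        for i, tok in enumerate(seq):
            if tok != MASK:
                continue
            probs = softmax(score_fn(seq, i), temperature)
            j = rng.choices(range(len(vocab)), weights=probs)[0]
            candidates.append((probs[j], i, vocab[j]))
        # Commit only the most confident predictions this step;
        # the rest stay masked and are re-scored next iteration.
        candidates.sort(reverse=True)
        for _, i, tok in candidates[:per_step]:
            seq[i] = tok
    return seq
```

The confidence-first commit order is what distinguishes this family of decoders from left-to-right autoregressive generation: every position is a candidate at every step.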

Generation example:

```
[Question]
Who are you?
[/Question]

[Answer]
As an artificial intelligence, I am not a human. I am a program designed to process and analyze information, data, and algorithms to provide useful responses and support users with queries. My purpose is to assist with information, answer questions, and provide responses based on user input, data, and algorithms.

I am designed to process and analyze information, data, and algorithms to provide useful responses and support users with queries. My purpose is to assist with information, answer questions, and provide responses based on user input, data, and algorithms.
[/Answer]

<pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>
```