## Highlights

- Base model: google/vit-base-patch16-224
- Parameters: ~97.2M
- Training samples: 2,000,000 curated plant occurrences
- Species coverage: ~14,000 unique species
- Source data: GBIF (research-grade iNaturalist images)
- Training method: End-to-end supervised fine-tuning
- Use-case: Fast species-level classification from a single photo
## Example Usage

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

model_id = "juppy44/plant-identification-2m-vit-b"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)

# Load an image from a URL (replace with your own photo).
url = "https://example.com/plant.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess and run inference without tracking gradients.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Report the five most likely species.
pred = logits.softmax(dim=-1)[0]
topk = torch.topk(pred, k=5)
for prob, idx in zip(topk.values, topk.indices):
    label = model.config.id2label[idx.item()]
    print(f"{label}: {prob.item():.4f}")
```
## Intended Applications
- Ecological surveys
- Nursery and horticulture tools
- Restoration and revegetation workflows
- Field research and biodiversity monitoring
- Citizen science and educational platforms
- Image-based species tagging pipelines (see the batch-tagging sketch below)
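As a sketch of such a tagging pipeline, the loop below batch-classifies a folder of photos; the `photos/` directory and batch size of 16 are placeholder choices, not part of this model card:

```python
from pathlib import Path

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "juppy44/plant-identification-2m-vit-b"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id).eval()

# Tag every JPEG in a directory, 16 images per forward pass.
paths = sorted(Path("photos").glob("*.jpg"))
for start in range(0, len(paths), 16):
    batch = paths[start : start + 16]
    images = [Image.open(p).convert("RGB") for p in batch]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    top = probs.argmax(dim=-1)
    for path, idx, prob in zip(batch, top, probs.max(dim=-1).values):
        print(f"{path.name}\t{model.config.id2label[idx.item()]}\t{prob.item():.3f}")
```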
Data & Training Details
Dataset Construction
- Sourced from GBIF occurrences with valid species and image metadata.
- Cleaned and deduplicated.
- Species filtered to those with ≥ 20 images.
- Maximum cap of 1,000 images per species to reduce class imbalance (see the sketch after this list).
- Final training dataset: 2,000,000 images across ~14k species.
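A minimal sketch of that balancing step, assuming a cleaned list of `(species, image_url)` pairs; the function and variable names are illustrative, not taken from the actual pipeline:

```python
import random
from collections import defaultdict

MIN_IMAGES, MAX_IMAGES = 20, 1000  # thresholds quoted above

def balance(occurrences, seed=0):
    # Group image URLs by species name.
    by_species = defaultdict(list)
    for species, url in occurrences:
        by_species[species].append(url)

    rng = random.Random(seed)
    kept = []
    for species, urls in by_species.items():
        if len(urls) < MIN_IMAGES:      # drop under-represented species
            continue
        if len(urls) > MAX_IMAGES:      # cap over-represented species
            urls = rng.sample(urls, MAX_IMAGES)
        kept.extend((species, u) for u in urls)
    return kept
```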
### Training
- ViT-Base fine-tuned for 1 epoch over 2M samples.
- AdamW optimizer, standard ViT augmentations.
- Mixed-precision training on GPU (a minimal setup sketch follows).
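A hypothetical reconstruction of that setup with the Hugging Face `Trainer` is sketched below; the label maps, batch size, and learning rate are placeholders, not the exact values used to produce this checkpoint:

```python
from transformers import (
    AutoModelForImageClassification,
    Trainer,
    TrainingArguments,
)

# Placeholder label maps; the real run covered ~14k GBIF species.
id2label = {0: "species_a", 1: "species_b"}
label2id = {v: k for k, v in id2label.items()}

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,  # swap out the 1k-class ImageNet head
)

args = TrainingArguments(
    output_dir="vit-plants",
    num_train_epochs=1,              # single pass over the training set
    per_device_train_batch_size=64,
    learning_rate=5e-5,
    fp16=True,                       # mixed precision on GPU
    remove_unused_columns=False,
)

# Supply a dataset yielding {"pixel_values", "labels"} and call
# trainer.train(); AdamW is the Trainer's default optimizer.
trainer = Trainer(model=model, args=args, train_dataset=None)
```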
## Limitations

- Some species are visually indistinguishable without additional context (location, reproductive structures, etc.).
- Performance varies for rare, morphologically similar, or poorly photographed species.
- No location metadata is incorporated yet; predictions are purely image-based.
## Labels
Species names follow the canonical GBIF taxonomy (species_name).
Each class corresponds directly to a species.
You can inspect all labels via:

```python
from transformers import AutoConfig

# id2label maps class indices to canonical GBIF species names.
cfg = AutoConfig.from_pretrained("juppy44/plant-identification-2m-vit-b")
labels = cfg.id2label
```
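Continuing from the snippet above, the reverse mapping is available as well; the species name here is only a placeholder and may not be in the label set:

```python
# label2id maps canonical species names back to class indices.
idx = cfg.label2id.get("Acacia dealbata")
print(idx)  # class index, or None if the species is not a label
```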
## Performance

No formal metrics are published yet because GBIF/iNaturalist provides no clean held-out evaluation split. In informal testing the model performs strongly, and it is well suited to downstream adaptation or LoRA-based fine-tuning.
Formal benchmarks will be added when a standardized evaluation subset is released.
Fine-Tuning & Adapters
You can further specialize the model using LoRA adapters for:
- Regional subsets
- Functional groups
- Threatened species
- Agricultural crops
- Disease classification
The base model is trained broadly enough to support domain-specific adapter tuning with minimal compute; a minimal PEFT-based sketch follows.
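A minimal sketch with the PEFT library, assuming LoRA on the ViT attention projections; the rank, alpha, and target modules are illustrative defaults, not published settings for this model:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "juppy44/plant-identification-2m-vit-b"
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query", "value"],  # ViT attention projection layers
    lora_dropout=0.1,
    modules_to_save=["classifier"],     # keep the classification head trainable
)

# Wrap the base model so only the LoRA weights (and head) receive gradients.
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction trains
```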
## License
Model weights follow the same license as the underlying ViT-Base model. Users are responsible for ensuring compliance with GBIF/iNaturalist usage terms for any downstream dataset creation.