	---
	language:
	- en
	pipeline_tag: video-classification
	tags:
	- birds
	- swifts
	- MViTv2
	- Ballinrobe
	license: other
	license_name: bcs-lcs
	license_link: LICENSE
	base_model:
	- timm/mvitv2_small.fb_in1k
	library_name: transformers
	datasets:
	- odinglynn/swift-150
	---

	# SwiftViT-150

	MViT-v2 fine-tuned on 150 videos for common swift feeding behavior classification.

	## Model

	Fine-tuned `mvit_v2_s` (Kinetics-400 pretrained) on single-camera nestbox footage. It reaches ~87% validation accuracy in controlled settings and shows surprising cross-camera generalization despite being trained on a single viewpoint and a minuscule dataset (150 samples).

	## Usage
	```python
	import torch
	import torchvision

	# Rebuild the architecture, then swap in the custom 768→512→3 head
	model = torchvision.models.video.mvit_v2_s(weights=None)
	model.head = torch.nn.Sequential(
	    torch.nn.Dropout(0.5),
	    torch.nn.Linear(768, 512),
	    torch.nn.GELU(),
	    torch.nn.Dropout(0.3),
	    torch.nn.Linear(512, 3),
	)

	# Load the fine-tuned weights on CPU; move the model to a GPU afterwards if needed
	checkpoint = torch.load("swiftvit-150.pth", map_location="cpu")
	model.load_state_dict(checkpoint["model_state_dict"])
	model.eval()

	# Inference
	with torch.no_grad():
	    video = load_video()  # placeholder for your own loader; shape: [C, T, H, W]
	    output = model(video.unsqueeze(0))  # add batch dimension: [1, C, T, H, W]
	    prediction = torch.argmax(output, dim=1)
	    # 0: feeding, 1: possible_feeding, 2: not_feeding
	```
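
The usage snippet calls `load_video()` without defining it. Below is a minimal sketch of the preprocessing such a helper might perform, assuming the decoded frames arrive as a uint8 tensor of shape `[T, H, W, C]`. The name `preprocess_clip`, the uniform frame sampling, the plain resize, and the normalization constants (values commonly used with Kinetics-pretrained video models) are assumptions for illustration. The preprocessing actually used during training is not documented in this card.

```python
import torch
import torch.nn.functional as F

NUM_FRAMES = 16  # from the Architecture section
SIZE = 224       # from the Architecture section

# Assumption: Kinetics-style normalization; this card does not state the values used.
MEAN = torch.tensor([0.45, 0.45, 0.45]).view(3, 1, 1, 1)
STD = torch.tensor([0.225, 0.225, 0.225]).view(3, 1, 1, 1)


def preprocess_clip(frames: torch.Tensor) -> torch.Tensor:
    """Convert uint8 frames [T, H, W, C] into a float clip [C, 16, 224, 224]."""
    total = frames.shape[0]
    # Pick 16 evenly spaced frame indices across the whole clip.
    idx = torch.linspace(0, total - 1, NUM_FRAMES).round().long()
    clip = frames[idx].permute(0, 3, 1, 2).float() / 255.0  # [T, C, H, W] in [0, 1]
    # Resize every frame to 224x224 (aspect ratio is not preserved in this sketch).
    clip = F.interpolate(clip, size=(SIZE, SIZE), mode="bilinear", align_corners=False)
    clip = clip.permute(1, 0, 2, 3)  # [C, T, H, W], the layout the model expects
    return (clip - MEAN) / STD
```

The returned tensor can be passed to the model as `model(clip.unsqueeze(0))`. If training used a different sampling or cropping strategy, matching it at inference time will matter more than the exact code here.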

	## Architecture

	- Base: MViT-v2 Small (~34.5M params in torchvision's `mvit_v2_s`)
	- Head: Custom 768→512→3 with dropout
	- Input: 16 frames @ 224x224
	- Classes: 3 (feeding, possible_feeding, not_feeding)
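
As a quick sanity check of the head described above, the sketch below builds the 768→512→3 stack on its own, runs a batch of dummy 768-dimensional features through it, and counts its trainable parameters. The dummy features stand in for the backbone output and are purely illustrative.

```python
import torch
from torch import nn

# Same layer stack as the custom head in the usage snippet.
head = nn.Sequential(
    nn.Dropout(0.5),
    nn.Linear(768, 512),
    nn.GELU(),
    nn.Dropout(0.3),
    nn.Linear(512, 3),
)
head.eval()  # disable dropout for a deterministic forward pass

features = torch.randn(2, 768)  # dummy stand-in for backbone features
with torch.no_grad():
    logits = head(features)     # one score per class: [batch, 3]

# Weights plus biases of the two linear layers.
num_params = sum(p.numel() for p in head.parameters())
print(logits.shape, num_params)
```

The head adds roughly 0.4M parameters on top of the backbone, so almost all of the model's capacity sits in the pretrained MViT-v2 blocks.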

	## Training

	- 120 train / 30 val samples
	- Batch size: 4
	- Optimizer: AdamW (lr=1e-4, wd=0.05)
	- Scheduler: CosineAnnealingWarmRestarts
	- Mixed precision training on H100
	- Early stopping: 40 epoch patience
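
The optimizer, scheduler, and early-stopping settings listed above could be wired together roughly as follows. This is a sketch, not the original training script: the `nn.Linear` stand-in model, the restart period `T_0=10`, and the `EarlyStopping` helper are assumptions, since the card gives neither the scheduler period nor the stopping criterion's implementation. Mixed precision is omitted so the sketch runs on CPU.

```python
import torch
from torch import nn

model = nn.Linear(768, 3)  # stand-in for the fine-tuned network

# Values taken from the Training section.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

# Assumption: T_0 (epochs until the first restart) is not stated in this card.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)


class EarlyStopping:
    """Hypothetical helper: stop after `patience` epochs without a new best score."""

    def __init__(self, patience: int = 40):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc: float) -> bool:
        """Record one epoch's validation accuracy; return True when training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `scheduler.step()` would be called once per epoch after the optimizer updates, followed by `stopper.step(val_acc)` on the epoch's validation accuracy.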

	## Performance

	- Train accuracy: 100%
	- Val accuracy: 87%
	- Unexpected cross-camera generalization observed
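
With only 30 validation clips, 87% corresponds to roughly 26 correct predictions, so the figure carries considerable sampling uncertainty. The sketch below computes a 95% Wilson score interval under the assumption of exactly 26 of 30 correct; the exact count is not given in this card, and `wilson_interval` is an illustrative helper rather than part of the project.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a proportion (z=1.96 gives ~95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Assumption: 87% of 30 validation clips is taken to mean 26 correct.
low, high = wilson_interval(26, 30)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under that assumption the interval spans roughly 70% to 95%, which is worth keeping in mind when comparing this model against alternatives evaluated on the same 30 clips.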

	## Dataset

	Trained on [swift-150](https://huggingface.co/datasets/odinglynn/swift-150): 150 videos from the GABLE nestbox camera (Ireland, 2020-2025).

	## Context

	Part of climate research correlating swift feeding patterns with weather data at terabyte scale. Ballinrobe Community School entry for REDACTED.

	## Citation

	If you reference this work, cite:
	```bibtex
	@misc{swift150bcs,
	  title={Swift-150: A Dataset for Common Swift Feeding Behavior Analysis},
	  author={Odin Glynn-Martin and Culan O'Meara and Anas Rashid and Shayden D'Souza and Pádraig Foley and Mark Lally},
	  year={2025},
	  institution={Ballinrobe Community School},
	  url={https://ballinrobecommunityschool.ie},
	  note={REDACTED - Entry 2025}
	}
	```

	## License

	Proprietary. See LICENSE for restrictions.