DeepDefend
Multi-Modal Deepfake Detection System
Detect AI-generated deepfakes in videos using computer vision and audio analysis
Overview
DeepDefend is a deepfake detection system that combines video frame analysis and audio analysis to identify AI-generated synthetic media. Pre-trained models score the video and audio of each time interval, and an LLM fuses that evidence into a verdict with a plain-language explanation.
Why DeepDefend?
- Multi-Modal Analysis: Combines video and audio detection for higher accuracy
- AI-Powered Fusion: Uses an LLM to fuse the evidence and generate human-readable reports
- Interval Breakdown: Shows exactly which parts of the video are suspicious
- REST API: Easy integration with any frontend or application
Features
Core Detection Capabilities
Video Analysis
- Frame-by-frame deepfake detection using pre-trained models
- Face detection and region-specific analysis
- Suspicious region identification (eyes, mouth, face boundaries)
- Confidence scoring per frame
Audio Analysis
- Voice synthesis detection
- Spectrogram analysis for audio artifacts
- Frequency pattern recognition
- Audio splicing detection
AI-Powered Reporting
- LLM-based evidence fusion (Google Gemini)
- Natural language explanation of findings
- Verdict with confidence percentage
- Timestamped suspicious intervals
Processing Pipeline
Video Input
                              │
                    ┌───────────────────┐
                    │ Media Extraction  │ ← Extract frames (5 per interval)
                    │                   │ ← Extract audio chunks
                    └─────────┬─────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Video Analysis  │  │ Audio Analysis  │  │ Timeline Gen    │
│ • Face detect   │  │ • Spectrogram   │  │ • 2s intervals  │
│ • Region scan   │  │ • Voice synth   │  │ • Metadata      │
│ • Fake score    │  │ • Artifacts     │  │                 │
└────────┬────────┘  └────────┬────────┘  └────────┬────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              ▼
                ┌───────────────────────────┐
                │ LLM Fusion Engine         │
                │ • Combine evidence        │
                │ • Generate verdict        │
                │ • Natural language report │
                └─────────────┬─────────────┘
                              ▼
                        Final Report
                       (JSON Response)
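The interval and frame-sampling stages in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not DeepDefend's actual code: the function names `make_intervals` and `sample_frame_times` are hypothetical, the even midpoint spacing is an assumption, and only the 2-second interval length and the 5 frames per interval come from the diagram.

```python
from typing import List, Tuple


def make_intervals(duration: float, length: float = 2.0) -> List[Tuple[float, float]]:
    """Split [0, duration] into consecutive intervals of `length` seconds.

    The last interval is shortened if the duration is not a multiple of `length`.
    """
    if duration <= 0 or length <= 0:
        return []
    intervals = []
    start = 0.0
    while start < duration:
        end = min(start + length, duration)
        intervals.append((start, end))
        start += length
    return intervals


def sample_frame_times(start: float, end: float, count: int = 5) -> List[float]:
    """Return `count` timestamps evenly spaced inside [start, end).

    Each timestamp sits at the midpoint of one of `count` equal sub-slices,
    so no frame falls exactly on an interval boundary.
    """
    step = (end - start) / count
    return [start + step * (i + 0.5) for i in range(count)]


if __name__ == "__main__":
    for start, end in make_intervals(5.0):
        print((start, end), sample_frame_times(start, end))
```

Sampling at sub-slice midpoints is one simple choice; a real extractor might instead pick frames by index using the video's frame rate.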
Demo
Live Demo
API: https://deepdefend-api.hf.space
Docs: https://deepdefend-api.hf.space/docs
Example Analysis
{
  "verdict": "DEEPFAKE",
  "confidence": 87.5,
  "overall_scores": {
    "overall_video_score": 0.823,
    "overall_audio_score": 0.756,
    "overall_combined_score": 0.789
  },
  "detailed_analysis": "This video shows strong indicators of deepfake manipulation...",
  "suspicious_intervals": [
    {
      "interval": "4.0-6.0",
      "video_score": 0.891,
      "audio_score": 0.834,
      "video_regions": ["eyes", "mouth"],
      "audio_regions": ["voice_synthesis_artifacts"]
    }
  ],
  "total_intervals_analyzed": 15,
  "video_info": {
    "duration": 12.498711111111112,
    "fps": 29.923085402583734,
    "total_frames": 374,
    "file_size_mb": 31.36
  },
  "analysis_id": "4cd98ea5-8c14-4cae-8da4-689345b0aabc",
  "timestamp": "2025-10-10T23:34:35.724916"
}
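A client will usually want to pull the verdict and the flagged time ranges out of this response. The sketch below does that with the standard library only. `parse_interval` and `summarize` are hypothetical helper names, and the field names are taken from the sample above.

```python
import json
from typing import Dict, List, Tuple

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "verdict": "DEEPFAKE",
  "confidence": 87.5,
  "suspicious_intervals": [
    {
      "interval": "4.0-6.0",
      "video_score": 0.891,
      "audio_score": 0.834,
      "video_regions": ["eyes", "mouth"],
      "audio_regions": ["voice_synthesis_artifacts"]
    }
  ]
}
"""


def parse_interval(text: str) -> Tuple[float, float]:
    """Convert an interval string such as "4.0-6.0" into (start, end) seconds."""
    start, end = text.split("-", 1)
    return float(start), float(end)


def summarize(report: Dict) -> List[str]:
    """Build a headline plus one readable line per suspicious interval."""
    lines = [f'{report["verdict"]} ({report["confidence"]:.1f}% confidence)']
    for item in report.get("suspicious_intervals", []):
        start, end = parse_interval(item["interval"])
        regions = ", ".join(item.get("video_regions", [])) or "none"
        lines.append(
            f"{start:.1f}s to {end:.1f}s: video={item['video_score']:.3f}, "
            f"audio={item['audio_score']:.3f}, regions={regions}"
        )
    return lines


if __name__ == "__main__":
    for line in summarize(json.loads(SAMPLE)):
        print(line)
```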
Installation
Prerequisites
- Python 3.10 or higher
- FFmpeg installed on your system
- Google Gemini API key
Local Setup
- Clone the repository
git clone https://github.com/yourusername/deepdefend.git
cd deepdefend
- Create virtual environment
python -m venv venv
# On Linux/Mac
source venv/bin/activate
# On Windows
venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Download ML models
python models/download_model.py
This downloads about 2 GB of models from Hugging Face.
- Configure environment
cp .env.example .env
# Edit .env and add your GOOGLE_API_KEY
- Run the server
uvicorn main:app --reload
The API will be available at http://127.0.0.1:8000
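Once the server is running, a quick reachability check can confirm the setup. The sketch below uses only the standard library and requests the interactive docs page, which FastAPI serves at `/docs` by default. `docs_url` and `server_is_up` are hypothetical helper names, not part of DeepDefend.

```python
import urllib.error
import urllib.request


def docs_url(base_url: str) -> str:
    """Return the URL of the interactive API docs for a given base URL."""
    return base_url.rstrip("/") + "/docs"


def server_is_up(base_url: str = "http://127.0.0.1:8000", timeout: float = 3.0) -> bool:
    """Return True if the docs page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(docs_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```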
Docker Setup
# Build image
docker build -t deepdefend .
# Run container
docker run -p 8000:8000 -e GOOGLE_API_KEY=your_key deepdefend
Tech Stack
Backend
- Framework: FastAPI 0.109.0
- Server: Uvicorn
- ML Framework: PyTorch 2.3.1
- Transformers: Hugging Face Transformers 4.36.2
ML Models
- Video Detection: dima806/deepfake_vs_real_image_detection
- Audio Detection: mo-thecreator/Deepfake-audio-detection
- LLM Fusion: Google Gemini 2.5 Flash
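The two detection checkpoints are hosted on the Hugging Face Hub, so they can in principle be loaded through the Transformers `pipeline` API. The sketch below is an assumption about usage rather than DeepDefend's own loader in `models/load_models.py`: the task names, the `load_detectors` and `fake_probability` helpers, and the rule that a label containing "fake" marks the synthetic class are all illustrative and should be checked against each model card.

```python
from typing import Dict, List

VIDEO_MODEL_ID = "dima806/deepfake_vs_real_image_detection"
AUDIO_MODEL_ID = "mo-thecreator/Deepfake-audio-detection"


def load_detectors():
    """Load both classifiers. Imported lazily because Transformers is heavy."""
    from transformers import pipeline

    # Assumed task types; confirm them on the model cards.
    video = pipeline("image-classification", model=VIDEO_MODEL_ID)
    audio = pipeline("audio-classification", model=AUDIO_MODEL_ID)
    return video, audio


def fake_probability(predictions: List[Dict]) -> float:
    """Extract the score of the 'fake' class from classifier output.

    Classification pipelines return a list of {"label": ..., "score": ...}
    dicts. Label names vary by model, so matching on "fake" is an assumption.
    """
    for item in predictions:
        if "fake" in str(item["label"]).lower():
            return float(item["score"])
    return 0.0


if __name__ == "__main__":
    # Hand-written example output, not a real model prediction.
    example = [{"label": "Fake", "score": 0.82}, {"label": "Real", "score": 0.18}]
    print(fake_probability(example))
```

Keeping the label matching in one small function means only that function needs to change if a model uses different class names.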
Processing
- Computer Vision: OpenCV, Pillow
- Audio Processing: Librosa, SoundFile
- Video Processing: FFmpeg
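As an illustration of how FFmpeg fits into the extraction step, the sketch below builds a command that pulls a mono 16 kHz WAV track from a video. The sample rate, the channel count, and the `build_audio_command` and `extract_audio` helpers are assumptions chosen for the example; the project's actual extraction settings are not stated here.

```python
import subprocess
from typing import List


def build_audio_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that extracts the audio track as mono WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the requested rate
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run FFmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_audio_command(video_path, wav_path), check=True)


if __name__ == "__main__":
    print(" ".join(build_audio_command("input.mp4", "audio.wav")))
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.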
Deployment
- Container: Docker
- Platform: Hugging Face Spaces
Project Structure
deepdefend/
│
├── extraction/
│   ├── media_extractor.py       # Frame & audio extraction
│   └── timeline_generator.py    # Timeline creation
│
├── analysis/
│   ├── video_analyser.py        # Video deepfake detection
│   ├── audio_analyser.py        # Audio deepfake detection
│   ├── llm_analyser.py          # LLM-based fusion
│   └── prompt.py                # LLM prompts
│
├── models/
│   ├── download_model.py        # Model downloader
│   ├── load_models.py           # Model loader
│   ├── video_model/             # (Downloaded)
│   └── audio_model/             # (Downloaded)
│
├── main.py                      # FastAPI application
├── pipeline.py                  # Main detection pipeline
├── requirements.txt             # Python dependencies
├── Dockerfile                   # Container configuration
├── .gitignore
└── README.md