TikTok Bot Detection Model
Overview
This directory contains a trained Random Forest classifier for detecting bot accounts on TikTok.
- Model Version: v2
- Training Date: 2025-12-30 11:38:35
- Framework: scikit-learn 1.5.2
- Algorithm: Random Forest Classifier with GridSearchCV hyperparameter tuning
Model Performance
Final Metrics (Test Set)
| Metric | Score |
|---|---|
| Accuracy | 0.9224 (92.24%) |
| Precision | 0.9596 (95.96%) |
| Recall | 0.9094 (90.94%) |
| F1-Score | 0.9338 (93.38%) |
| ROC-AUC | 0.9773 (97.73%) |
| Average Precision | 0.9596 (95.96%) |
Model Improvement
- Baseline ROC-AUC: 0.9759
- Tuned ROC-AUC: 0.9773
- Improvement: 0.0014 (0.14 percentage points)
Files
| File | Description |
|---|---|
| tiktok_bot_detection_v2.pkl | Trained Random Forest model |
| tiktok_scaler_v2.pkl | MinMaxScaler for feature normalization |
| tiktok_features_v2.json | List of features used by the model |
| tiktok_metrics_v2.txt | Detailed performance metrics report |
| images/ | All visualization plots (13 images) |
| README.md | This file |
Dataset Information
Training Configuration
- Training Samples: 2,385
- Test Samples: 596
- Total Samples: 2,981
- Number of Features: 13
- Cross-Validation Folds: 5
- Random State: 42
Class Distribution
Training Set:
- Human (0): 951 (39.87%)
- Bot (1): 1,434 (60.13%)
Test Set:
- Human (0): 244 (40.94%)
- Bot (1): 352 (59.06%)
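As a point of reference for the accuracy figures in this README, the class counts imply a trivial baseline: a classifier that always predicts the majority class (Bot). The sketch below recomputes the class shares and that baseline from the counts listed in this section. The function and variable names are illustrative and are not part of the training notebook.

```python
# Class counts copied from the "Class Distribution" section.
train_counts = {"Human": 951, "Bot": 1434}
test_counts = {"Human": 244, "Bot": 352}


def class_shares(counts):
    """Return each class's share of the total, as a fraction."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline_accuracy(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


for name, counts in (("train", train_counts), ("test", test_counts)):
    shares = class_shares(counts)
    print(name, {label: f"{share:.2%}" for label, share in shares.items()})

print(f"Majority-class baseline (test): {majority_baseline_accuracy(test_counts):.4f}")
```

Always predicting Bot would already be right for about 59% of the test accounts, so test accuracy is best read against that floor rather than against 50%.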
Features (13)
- IsPrivate
- IsVerified
- HasProfilePic
- FollowingCount
- FollowerCount
- LikesCount
- HasInstagram
- HasYoutube
- HasBio
- HasLinkInBio
- HasPosts
- PostsCount
- FollowToFollowerRatio
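Because the scaler and model expect the same columns they were trained on, it can help to validate an input record before scaling it. The following is a minimal sketch, assuming the feature order matches the order listed in this README; `EXPECTED_FEATURES` and `order_features` are hypothetical names, and tiktok_features_v2.json remains the authoritative source for the real order.

```python
# Feature names in the order listed in this README. The order the scaler and
# model actually expect is an assumption here; tiktok_features_v2.json is the
# authoritative source.
EXPECTED_FEATURES = [
    "IsPrivate", "IsVerified", "HasProfilePic", "FollowingCount",
    "FollowerCount", "LikesCount", "HasInstagram", "HasYoutube",
    "HasBio", "HasLinkInBio", "HasPosts", "PostsCount",
    "FollowToFollowerRatio",
]


def order_features(record, expected=EXPECTED_FEATURES):
    """Return the record's values in the expected order.

    Raises ValueError if any feature is missing or unexpected.
    """
    missing = [name for name in expected if name not in record]
    extra = [name for name in record if name not in expected]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    return [record[name] for name in expected]


# Illustrative record: every feature set to 0 except the follower count.
record = {name: 0 for name in EXPECTED_FEATURES}
record["FollowerCount"] = 120
row = order_features(record)
print(len(row), row[EXPECTED_FEATURES.index("FollowerCount")])
```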
Top 5 Most Important Features
- FollowToFollowerRatio - 0.2330
- LikesCount - 0.1771
- HasInstagram - 0.1395
- FollowingCount - 0.1349
- FollowerCount - 0.1055
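To put these values in context, the sketch below accumulates the five listed importances. It assumes they are scikit-learn `feature_importances_` values, which are normalized to sum to 1 across all features; under that assumption the top five account for about 79% of the total importance.

```python
# Importances copied from the "Top 5 Most Important Features" list.
top_importances = {
    "FollowToFollowerRatio": 0.2330,
    "LikesCount": 0.1771,
    "HasInstagram": 0.1395,
    "FollowingCount": 0.1349,
    "FollowerCount": 0.1055,
}

# Assumption: the values are scikit-learn `feature_importances_`, which are
# normalized to sum to 1 across all features.
cumulative = 0.0
for name, importance in top_importances.items():
    cumulative += importance
    print(f"{name:<24}{importance:.4f}  cumulative={cumulative:.4f}")

remaining_share = 1.0 - cumulative
print(f"Share left for the other features: {remaining_share:.4f}")
```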
Hyperparameters
Best Parameters (from GridSearchCV)
- class_weight: None
- max_depth: 13
- max_features: sqrt
- min_samples_leaf: 2
- min_samples_split: 10
- n_estimators: 100
Parameter Search Space
- n_estimators: [100, 200, 300]
- max_depth: [10, 15, 20, None]
- min_samples_split: [2, 5, 10]
- min_samples_leaf: [1, 2, 4]
- max_features: ['sqrt', 'log2']
- bootstrap: [True, False]
Total combinations tested: 540
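An exhaustive grid search evaluates one candidate per element of the Cartesian product of the parameter lists. The sketch below enumerates the grid exactly as listed in this section. That grid yields 432 combinations rather than the 540 reported, and it contains neither the selected max_depth of 13 nor any class_weight options, so the grid actually searched presumably differed from the one shown here.

```python
from itertools import product

# Search space exactly as listed in the "Parameter Search Space" section.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 15, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
    "bootstrap": [True, False],
}

# Every candidate is one choice per parameter, i.e. the Cartesian product.
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

n_candidates = len(candidates)
n_fits = n_candidates * 5  # 5-fold CV fits each candidate five times

print(f"{n_candidates} candidates, {n_fits} model fits")
```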
Cross-Validation Results
Mean Scores (5-Fold Stratified CV)
- Accuracy: 0.9191 (±0.0097)
- Precision: 0.9326 (±0.0115)
- Recall: 0.9331 (±0.0166)
- F1-Score: 0.9327 (±0.0083)
- ROC-AUC: 0.9744 (±0.0055)
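One rough way to read these figures is to express each test-set score as a distance from the cross-validation mean. The sketch below does that, assuming the ± values are standard deviations across the five folds (the README does not say so explicitly). It is a sanity check on consistency between the two evaluations, not a significance test.

```python
# (mean, spread) from the 5-fold CV results and the test-set scores from the
# "Final Metrics" table. The spread is assumed to be one standard deviation.
cv_scores = {
    "Accuracy": (0.9191, 0.0097),
    "Precision": (0.9326, 0.0115),
    "Recall": (0.9331, 0.0166),
    "F1-Score": (0.9327, 0.0083),
    "ROC-AUC": (0.9744, 0.0055),
}
test_scores = {
    "Accuracy": 0.9224,
    "Precision": 0.9596,
    "Recall": 0.9094,
    "F1-Score": 0.9338,
    "ROC-AUC": 0.9773,
}


def deviation_in_stds(metric):
    """Distance of the test score from the CV mean, in CV standard deviations."""
    mean, std = cv_scores[metric]
    return (test_scores[metric] - mean) / std


for metric in cv_scores:
    print(f"{metric:<10} {deviation_in_stds(metric):+.2f} std from CV mean")
```

On these numbers, accuracy, F1-score and ROC-AUC on the test set sit within one standard deviation of the CV mean, while test precision is noticeably higher and test recall somewhat lower than their CV means.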
Visualizations
All visualizations are saved in the images/ directory:
- 01_class_distribution.png - Training/Test set class distribution
- 02_feature_correlation.png - Feature correlation with target variable
- 03_correlation_matrix.png - Feature correlation heatmap
- 04_baseline_confusion_matrix.png - Baseline model confusion matrix
- 05_baseline_roc_curve.png - Baseline ROC curve
- 06_baseline_precision_recall.png - Baseline Precision-Recall curve
- 07_baseline_feature_importance.png - Baseline feature importance
- 08_cross_validation.png - Cross-validation score distribution
- 09_tuned_confusion_matrix.png - Tuned model confusion matrix
- 10_tuned_roc_curve.png - Tuned ROC curve
- 11_tuned_precision_recall.png - Tuned Precision-Recall curve
- 12_tuned_feature_importance.png - Tuned feature importance
- 13_model_comparison.png - Baseline vs Tuned comparison
Usage Example
```python
import joblib
import pandas as pd

# Load the trained model and the fitted scaler
model = joblib.load('tiktok_bot_detection_v2.pkl')
scaler = joblib.load('tiktok_scaler_v2.pkl')

# Raw (unscaled) features for one account.
# The 0.5 values are placeholders only: in real data the Is*/Has* flags
# are 0 or 1 and the counts are raw integers.
data = {
    'IsPrivate': 0.5,
    'IsVerified': 0.5,
    'HasProfilePic': 0.5,
    'FollowingCount': 0.5,
    'FollowerCount': 0.5,
    'LikesCount': 0.5,
    'HasInstagram': 0.5,
    'HasYoutube': 0.5,
    'HasBio': 0.5,
    'HasLinkInBio': 0.5,
    'HasPosts': 0.5,
    'PostsCount': 0.5,
    'FollowToFollowerRatio': 0.5,
}

# Build a single-row DataFrame; the keys must match the training features
df = pd.DataFrame([data])

# Scale with the scaler fitted during training
df_scaled = scaler.transform(df)

# Predict the class (1 = Bot, 0 = Human) and the class probabilities
prediction = model.predict(df_scaled)[0]
probability = model.predict_proba(df_scaled)[0]

print(f"Prediction: {'Bot' if prediction == 1 else 'Human'}")
print(f"Bot Probability: {probability[1]:.4f}")
print(f"Human Probability: {probability[0]:.4f}")
```
Confusion Matrix Breakdown
Tuned Model (Test Set)
| | Predicted Human | Predicted Bot |
|---|---|---|
| Actual Human | 220 | 24 |
| Actual Bot | 18 | 334 |
- True Negatives (TN): 220 (Correctly identified humans)
- False Positives (FP): 24 (Humans incorrectly classified as bots)
- False Negatives (FN): 18 (Bots incorrectly classified as humans)
- True Positives (TP): 334 (Correctly identified bots)
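The four counts above determine the threshold-based metrics directly. The sketch below recomputes them with the standard definitions, treating Bot as the positive class. These counts give an accuracy of about 0.9295, a precision of about 0.9330 and a recall of about 0.9489, which do not equal the values in the Final Metrics table, so the two sections may have been generated from different runs.

```python
# Counts copied from the confusion matrix breakdown (positive class = Bot).
tn, fp, fn, tp = 220, 24, 18, 334

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)            # of predicted bots, how many are bots
recall = tp / (tp + fn)               # of actual bots, how many were caught
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)  # share of humans flagged as bots

for name, value in [
    ("Accuracy", accuracy),
    ("Precision", precision),
    ("Recall", recall),
    ("F1-Score", f1),
    ("False positive rate", false_positive_rate),
]:
    print(f"{name:<20}{value:.4f}")
```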
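Model Interpretation
Strengths
- High ROC-AUC score (0.9773) indicates excellent discrimination capability
- Precision (0.9596) and recall (0.9094) for the bot class are both above 0.90
- Stable cross-validation performance: the fold-to-fold spread is below 0.02 on every metric
Key Insights
- FollowToFollowerRatio (0.2330) and LikesCount (0.1771) are the two most important features
- GridSearchCV raised ROC-AUC from 0.9759 (baseline) to 0.9773, a gain of 0.0014
- Test-set accuracy (0.9224) is in line with the cross-validation mean (0.9191), which suggests the model generalizes to unseen data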
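Notes
- Feature Scaling: All features are scaled using MinMaxScaler to [0, 1] range
- Missing Values: Filled with 0 during preprocessing
- Class Balance: Moderately imbalanced (about 60% bots, 40% humans); the selected model uses no class weighting (class_weight: None)
- Model Type: Random Forest, a bagged ensemble of decision trees that is less prone to overfitting than a single tree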
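The scaling and missing-value notes above can be illustrated with plain arithmetic. The sketch below is not the scikit-learn implementation; it only mirrors the min-max formula, (value - min) / (max - min), with the minimum and maximum learned from training data and missing values filled with 0 first. The function names and the follower counts are hypothetical.

```python
def fit_min_max(column):
    """Learn the min and max of a training column, treating None as 0."""
    filled = [0 if value is None else value for value in column]
    return min(filled), max(filled)


def min_max_transform(value, col_min, col_max):
    """Map a raw value to (value - min) / (max - min); None is filled with 0 first."""
    value = 0 if value is None else value
    if col_max == col_min:
        return 0.0  # constant column: avoid dividing by zero
    return (value - col_min) / (col_max - col_min)


# Hypothetical follower counts, for illustration only.
train_followers = [10, None, 250, 1000]
col_min, col_max = fit_min_max(train_followers)

scaled_train = [min_max_transform(v, col_min, col_max) for v in train_followers]
scaled_unseen = min_max_transform(2000, col_min, col_max)

print(scaled_train)
print(scaled_unseen)  # an unseen value above the training max lands outside [0, 1]
```

The last line matters at inference time: scikit-learn's MinMaxScaler does not clip by default, so an account with counts beyond the training range is mapped outside [0, 1].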
Model Updates
To retrain the model:
- Place new training data in ../data/train_tiktok.csv
- Run the training notebook: 5_enhanced_training.ipynb
- Update this README with new metrics
Contact & Support
For questions or issues regarding this model, please refer to the main project documentation.
Generated: 2025-12-30 11:38:35
Notebook: 5_enhanced_training.ipynb
Platform: TikTok