thadillo committed on
Commit
71797a4
·
1 Parent(s): 1377fb1

Phases 1-3: Database schema, text processing, analyzer updates


- Add SubmissionSentence model with relationships
- Add sentence_analysis_done flag to Submission
- Update TrainingExample to support sentence-level examples
- Create TextProcessor for sentence segmentation (NLTK + regex fallback)
- Update analyzer with analyze_with_sentences() method
- Store confidence scores for later retrieval

CATEGORIZATION_DECISION_GUIDE.md ADDED
@@ -0,0 +1,286 @@
1
+ # 🎯 Quick Decision Guide: Categorization Strategy
2
+
3
+ ## Your Problem (Excellent Observation!)
4
+
5
+ **Current**: One submission β†’ One category
6
+ **Reality**: One submission often contains multiple categories
7
+
8
+ **Example**:
9
+ ```
10
+ "Dallas should establish more green spaces in South Dallas neighborhoods.
11
+ Areas like Oak Cliff lack accessible parks compared to North Dallas."
12
+
13
+ Current system: Forces you to pick ONE category
14
+ Better system: Recognize both Objective + Problem
15
+ ```
16
+
17
+ ---
18
+
19
+ ## πŸ”„ Three Solutions (Ranked by Effort vs. Value)
20
+
21
+ ### πŸ₯‡ Option 1: Sentence-Level Analysis (YOUR PROPOSAL)
22
+
23
+ **What it does**:
24
+ ```
25
+ Submission A
26
+ β”œβ”€ Sentence 1: "Dallas should establish..." β†’ Objective
27
+ β”œβ”€ Sentence 2: "Areas like Oak Cliff..." β†’ Problem
28
+ └─ Geotag: [lat, lng] (applies to all sentences)
29
+ Stakeholder: Community (applies to all sentences)
30
+ ```
31
+
32
+ **UI Example**:
33
+ ```
34
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
35
+ β”‚ Submission #42 - Community β”‚
36
+ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
37
+ β”‚ "Dallas should establish more green β”‚
38
+ β”‚ spaces in South Dallas neighborhoods. β”‚
39
+ β”‚ Areas like Oak Cliff lack accessible β”‚
40
+ β”‚ parks compared to North Dallas." β”‚
41
+ β”‚ β”‚
42
+ β”‚ Primary Category: Objective β”‚
43
+ β”‚ Distribution: 50% Objective, 50% Problemβ”‚
44
+ β”‚ β”‚
45
+ β”‚ [β–Ό View Sentences (2)] β”‚
46
+ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
47
+ β”‚ β”‚ 1. "Dallas should establish..." β”‚ β”‚
48
+ β”‚ β”‚ Category: [Objective β–Ό] β”‚ β”‚
49
+ β”‚ β”‚ β”‚ β”‚
50
+ β”‚ β”‚ 2. "Areas like Oak Cliff..." β”‚ β”‚
51
+ β”‚ β”‚ Category: [Problem β–Ό] β”‚ β”‚
52
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
53
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
54
+ ```
55
+
56
+ **Pros**: βœ… Maximum accuracy, βœ… Best training data, βœ… Detailed analytics
57
+ **Cons**: ⚠️ More complex, ⚠️ Takes longer to implement
58
+ **Time**: 13-20 hours
59
+ **Value**: ⭐⭐⭐⭐⭐
60
+
61
+ ---
62
+
63
+ ### πŸ₯ˆ Option 2: Multi-Label (Simpler)
64
+
65
+ **What it does**:
66
+ ```
67
+ Submission A
68
+ β”œβ”€ Categories: [Objective, Problem]
69
+ β”œβ”€ Geotag: [lat, lng]
70
+ └─ Stakeholder: Community
71
+ ```
72
+
73
+ **UI Example**:
74
+ ```
75
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
76
+ β”‚ Submission #42 - Community β”‚
77
+ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
78
+ β”‚ "Dallas should establish more green β”‚
79
+ β”‚ spaces in South Dallas neighborhoods. β”‚
80
+ β”‚ Areas like Oak Cliff lack accessible β”‚
81
+ β”‚ parks compared to North Dallas." β”‚
82
+ β”‚ β”‚
83
+ β”‚ Categories: [Objective] [Problem] β”‚
84
+ β”‚ (select multiple) β”‚
85
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
86
+ ```
87
+
88
+ **Pros**: βœ… Simple to implement, βœ… Captures complexity
89
+ **Cons**: ❌ Can't tell which sentence is which, ❌ Less precise training data
90
+ **Time**: 4-6 hours
91
+ **Value**: ⭐⭐⭐
92
+
93
+ ---
94
+
95
+ ### πŸ₯‰ Option 3: Primary + Secondary
96
+
97
+ **What it does**:
98
+ ```
99
+ Submission A
100
+ β”œβ”€ Primary: Objective
101
+ β”œβ”€ Secondary: [Problem, Values]
102
+ β”œβ”€ Geotag: [lat, lng]
103
+ └─ Stakeholder: Community
104
+ ```
105
+
106
+ **Pros**: βœ… Preserves hierarchy, βœ… Moderate complexity
107
+ **Cons**: ⚠️ Arbitrary primary choice, ❌ Still loses granularity
108
+ **Time**: 8-10 hours
109
+ **Value**: ⭐⭐⭐
110
+
111
+ ---
112
+
113
+ ## πŸ“Š Side-by-Side Comparison
114
+
115
+ | Feature | Sentence-Level | Multi-Label | Primary+Secondary |
116
+ |---------|---------------|-------------|-------------------|
117
+ | **Granularity** | Each sentence categorized | Submission-level | Submission-level |
118
+ | **Training Data** | Precise per sentence | Ambiguous | Hierarchical |
119
+ | **UI Complexity** | Collapsible view | Checkbox list | Dropdown + pills |
120
+ | **Dashboard** | Dual mode (submissions vs sentences) | Overlapping counts | Clear hierarchy |
121
+ | **Implementation** | New table + logic | Array field | Two fields |
122
+ | **Time to Build** | 13-20 hrs | 4-6 hrs | 8-10 hrs |
123
+ | **Your Example** | βœ… Perfect fit | ⚠️ OK | ⚠️ OK |
124
+ | **Future AI Training** | βœ… Excellent | ⚠️ Limited | ⚠️ OK |
125
+
126
+ ---
127
+
128
+ ## 🎯 My Recommendation: Start with Proof of Concept
129
+
130
+ ### Phase 0: Quick Test (4-6 hours)
131
+
132
+ **Goal**: See sentence breakdown WITHOUT changing database
133
+
134
+ **Implementation**:
135
+ 1. Add sentence segmentation library (NLTK)
136
+ 2. Update submissions page to SHOW sentence breakdown (read-only)
137
+ 3. Display: "This submission contains X sentences in Y categories"
138
+ 4. Let admins see the breakdown and provide feedback
139
+
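+ A minimal sketch of the segmentation step from item 1 above (the `TextProcessor` name comes from the implementation; the method name and regex fallback shown here are assumptions):
+
+ ```python
+ # Sketch only: NLTK's Punkt tokenizer when available, simple regex otherwise.
+ import re
+
+ try:
+     import nltk
+ except ImportError:          # NLTK missing: regex fallback only
+     nltk = None
+
+ _FALLBACK = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
+
+
+ class TextProcessor:
+     """Splits a submission into sentences for per-sentence categorization."""
+
+     def __init__(self):
+         if nltk is not None:
+             nltk.download("punkt", quiet=True)   # one-time Punkt model download
+
+     def split_sentences(self, text: str) -> list[str]:
+         text = (text or "").strip()
+         if not text:
+             return []
+         if nltk is not None:
+             try:
+                 return [s.strip() for s in nltk.sent_tokenize(text) if s.strip()]
+             except LookupError:                  # Punkt data not available
+                 pass
+         return [s.strip() for s in _FALLBACK.split(text) if s.strip()]
+ ```
+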
140
+ **Example UI** (read-only preview):
141
+ ```
142
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
143
+ β”‚ Submission #42 β”‚
144
+ β”‚ "Dallas should establish..." β”‚
145
+ β”‚ β”‚
146
+ β”‚ Current Category: Objective β”‚
147
+ β”‚ β”‚
148
+ β”‚ [πŸ’‘ AI Detected Multiple Topics] β”‚
149
+ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
150
+ β”‚ β”‚ This submission contains: β”‚ β”‚
151
+ β”‚ β”‚ β€’ 1 sentence about: Objective β”‚ β”‚
152
+ β”‚ β”‚ β€’ 1 sentence about: Problem β”‚ β”‚
153
+ β”‚ β”‚ β”‚ β”‚
154
+ β”‚ β”‚ [View Details β–Ό] β”‚ β”‚
155
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
156
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
157
+ ```
158
+
159
+ **Then decide**:
160
+ - βœ… If admins find it useful β†’ Full implementation
161
+ - ⚠️ If too complex β†’ Try multi-label
162
+ - ❌ If not valuable β†’ Keep current system
163
+
164
+ ---
165
+
166
+ ## πŸ’­ Questions to Help Decide
167
+
168
+ ### Ask yourself:
169
+
170
+ 1. **Frequency**: How often do submissions contain multiple categories?
171
+ - Often (>30%) β†’ Sentence-level worth it
172
+ - Sometimes (10-30%) β†’ Multi-label sufficient
173
+ - Rarely (<10%) β†’ Keep current system
174
+
175
+ 2. **Analytics depth**: Do you need to know which specific ideas are Objectives vs Problems?
176
+ - Yes, important β†’ Sentence-level
177
+ - Just need tags β†’ Multi-label
178
+ - Primary is enough β†’ Primary+Secondary
179
+
180
+ 3. **Training priority**: Is fine-tuning accuracy critical?
181
+ - Yes, very important β†’ Sentence-level (best training data)
182
+ - Moderately β†’ Multi-label OK
183
+ - Not critical β†’ Any approach works
184
+
185
+ 4. **User complexity tolerance**: How much UI complexity can admins handle?
186
+ - High (tech-savvy) β†’ Sentence-level
187
+ - Medium β†’ Multi-label
188
+ - Low β†’ Primary+Secondary
189
+
190
+ 5. **Timeline**: When do you need this?
191
+ - This week β†’ Multi-label (fast)
192
+ - Next 2 weeks β†’ Sentence-level (with testing)
193
+ - Flexible β†’ Sentence-level (best long-term)
194
+
195
+ ---
196
+
197
+ ## πŸš€ Recommended Path Forward
198
+
199
+ ### Step 1: Quick Analysis (Now - 30 min)
200
+
201
+ Run a sample analysis on your current data:
202
+
203
+ ```python
204
+ # I can write a script to analyze your 60 submissions
205
+ # and show:
206
+ # - How many have multiple categories?
207
+ # - Average sentences per submission
208
+ # - Potential category distribution
209
+
210
+ # Would you like me to create this analysis script?
211
+ ```
212
+
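+ A rough sketch of what such an analysis script could look like, assuming a `Submission` model with a `message` field and reusing whatever sentence splitter and per-sentence classifier the app exposes (all names here are illustrative):
+
+ ```python
+ # Illustrative only: estimates how many submissions span multiple categories.
+ def estimate_multi_category(submissions, split_sentences, classify_sentence):
+     multi, sentence_counts = 0, []
+     for sub in submissions:
+         sentences = split_sentences(sub.message)
+         sentence_counts.append(len(sentences))
+         categories = {classify_sentence(s) for s in sentences}
+         if len(categories) > 1:
+             multi += 1
+     total = len(submissions) or 1
+     print(f"Multi-category submissions: {multi}/{total} ({100 * multi / total:.0f}%)")
+     print(f"Average sentences per submission: {sum(sentence_counts) / total:.1f}")
+ ```
+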
213
+ ### Step 2: Choose Approach (After analysis)
214
+
215
+ Based on results:
216
+ - **>40% multi-category** β†’ Go with sentence-level
217
+ - **20-40% multi-category** β†’ Try proof of concept
218
+ - **<20% multi-category** β†’ Multi-label might be enough
219
+
220
+ ### Step 3: Implementation
221
+
222
+ **Option A: Full Commit (Sentence-Level)**
223
+ - I implement all 7 phases (~15 hours of work)
224
+ - You get the most powerful system
225
+
226
+ **Option B: Test First (Proof of Concept)**
227
+ - I implement Phase 0 (~4 hours)
228
+ - You test with real users
229
+ - Then decide on full implementation
230
+
231
+ **Option C: Simple (Multi-Label)**
232
+ - I implement multi-label (~5 hours)
233
+ - Less powerful but faster to market
234
+
235
+ ---
236
+
237
+ ## 🎯 What Should We Do?
238
+
239
+ **I recommend**: **Option B - Test First**
240
+
241
+ **Steps**:
242
+ 1. βœ… I create analysis script (show current data patterns)
243
+ 2. βœ… I implement proof of concept (sentence display only)
244
+ 3. βœ… You test with admins (get feedback)
245
+ 4. βœ… We decide: Full sentence-level OR Multi-label OR Keep current
246
+
247
+ **Advantages**:
248
+ - Low risk (no DB changes initially)
249
+ - Real user feedback
250
+ - Informed decision
251
+ - Can always upgrade later
252
+
253
+ ---
254
+
255
+ ## πŸ“ Your Decision
256
+
257
+ **Which path do you want to take?**
258
+
259
+ **A) Analysis Script First** (30 min)
260
+ - I create a script to analyze your 60 submissions
261
+ - Show: % multi-category, sentence distribution, etc.
262
+ - Then decide based on data
263
+
264
+ **B) Proof of Concept** (4-6 hours)
265
+ - Skip analysis, go straight to sentence display
266
+ - See it in action, get feedback
267
+ - Then decide on full implementation
268
+
269
+ **C) Full Implementation** (13-20 hours)
270
+ - Commit to sentence-level now
271
+ - Build everything
272
+ - Most powerful, takes longest
273
+
274
+ **D) Multi-Label Instead** (4-6 hours)
275
+ - Simpler approach
276
+ - Good enough for most cases
277
+ - Fast to implement
278
+
279
+ **E) Keep Current System**
280
+ - If not worth the effort
281
+ - Stay with one category per submission
282
+
283
+ ---
284
+
285
+ **What's your choice?** Let me know and I'll get started! πŸš€
286
+
Claude's Plan.md ADDED
@@ -0,0 +1,344 @@
1
+ # Fine-Tuning System Implementation Plan
2
+
3
+ ## Overview
4
+ Implement an active learning system that collects admin corrections, builds a training dataset, and fine-tunes the BART classification model using LoRA (Low-Rank Adaptation).
5
+
6
+ ---
7
+
8
+ ## Phase 1: Training Data Collection Infrastructure
9
+
10
+ ### 1.1 Database Schema Extensions
11
+ **New Model: `TrainingExample`**
12
+ - `id` (Integer, PK)
13
+ - `submission_id` (Integer, FK to Submission)
14
+ - `message` (Text) - snapshot of submission text
15
+ - `original_category` (String, nullable) - AI's initial prediction
16
+ - `corrected_category` (String) - Admin's correction
17
+ - `contributor_type` (String)
18
+ - `correction_timestamp` (DateTime)
19
+ - `confidence_score` (Float, nullable) - original prediction confidence
20
+ - `used_in_training` (Boolean, default=False) - track if used in fine-tuning
21
+ - `training_run_id` (Integer, nullable, FK) - which training run used this
22
+
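+ A sketch of how the fields above could map onto a Flask-SQLAlchemy model (assumes the app's existing `db` instance and `Submission` model; defaults and foreign-key table names are assumptions):
+
+ ```python
+ # Sketch only: TrainingExample mirroring the field list above.
+ from datetime import datetime
+ from app.models.models import db   # assumed import path
+
+ class TrainingExample(db.Model):
+     id = db.Column(db.Integer, primary_key=True)
+     submission_id = db.Column(db.Integer, db.ForeignKey('submission.id'), nullable=False)
+     message = db.Column(db.Text, nullable=False)                    # snapshot of submission text
+     original_category = db.Column(db.String(50), nullable=True)     # AI's initial prediction
+     corrected_category = db.Column(db.String(50), nullable=False)   # admin's correction
+     contributor_type = db.Column(db.String(50))
+     correction_timestamp = db.Column(db.DateTime, default=datetime.utcnow)
+     confidence_score = db.Column(db.Float, nullable=True)
+     used_in_training = db.Column(db.Boolean, default=False)
+     training_run_id = db.Column(db.Integer, db.ForeignKey('fine_tuning_run.id'), nullable=True)
+
+     submission = db.relationship('Submission', backref='training_examples')
+ ```
+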
23
+ **New Model: `FineTuningRun`**
24
+ - `id` (Integer, PK)
25
+ - `created_at` (DateTime)
26
+ - `status` (String) - 'preparing', 'training', 'evaluating', 'completed', 'failed'
27
+ - `num_training_examples` (Integer)
28
+ - `num_validation_examples` (Integer)
29
+ - `num_test_examples` (Integer)
30
+ - `training_config` (JSON) - hyperparameters, LoRA config
31
+ - `results` (JSON) - metrics (accuracy, loss, per-category F1)
32
+ - `model_path` (String, nullable) - path to saved LoRA weights
33
+ - `is_active_model` (Boolean) - currently deployed model
34
+ - `improvement_over_baseline` (Float, nullable)
35
+ - `completed_at` (DateTime, nullable)
36
+
37
+ ### 1.2 Admin Routes Extension (`app/routes/admin.py`)
38
+ **Modify `update_category` endpoint:**
39
+ - When admin changes category, create TrainingExample record
40
+ - Capture: original prediction, corrected category, confidence score
41
+ - Track whether it's a correction (different from AI) or confirmation (same)
42
+
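+ The capture step inside the existing endpoint could look roughly like this (route path, payload fields, and the stored confidence attribute are assumptions):
+
+ ```python
+ # Sketch only: record the admin's decision as a TrainingExample.
+ @admin_bp.route('/api/update-category/<int:submission_id>', methods=['POST'])
+ def update_category(submission_id):
+     submission = Submission.query.get_or_404(submission_id)
+     new_category = request.json['category']
+
+     example = TrainingExample(
+         submission_id=submission.id,
+         message=submission.message,
+         original_category=submission.category,          # AI prediction at correction time
+         corrected_category=new_category,
+         contributor_type=submission.contributor_type,
+         confidence_score=getattr(submission, 'confidence_score', None),
+     )
+     db.session.add(example)
+
+     submission.category = new_category
+     db.session.commit()
+     is_correction = example.original_category != new_category
+     return jsonify({'status': 'ok', 'is_correction': is_correction})
+ ```
+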
43
+ **New endpoints:**
44
+ - `GET /admin/training-data` - View collected training examples
45
+ - `GET /admin/api/training-stats` - Stats on corrections collected
46
+ - `DELETE /admin/api/training-example/<id>` - Remove bad examples
47
+
48
+ ---
49
+
50
+ ## Phase 2: Fine-Tuning Configuration UI
51
+
52
+ ### 2.1 New Admin Page: Training Dashboard (`app/templates/admin/training.html`)
53
+ **Sections:**
54
+ 1. **Training Data Stats**
55
+ - Total corrections collected
56
+ - Per-category distribution
57
+ - Corrections vs confirmations ratio
58
+ - Data quality indicators (duplicates, conflicts)
59
+
60
+ 2. **Fine-Tuning Controls** (enabled when β‰₯20 examples)
61
+ - Configure training parameters:
62
+ - Minimum examples threshold (default: 20)
63
+ - Train/Val/Test split (e.g., 70/15/15)
64
+ - LoRA rank (r=8, 16, 32)
65
+ - Learning rate (1e-4 to 5e-4)
66
+ - Number of epochs (3-5)
67
+ - "Start Fine-Tuning" button (with confirmation)
68
+
69
+ 3. **Training History**
70
+ - Table of past FineTuningRun records
71
+ - Show: date, examples used, accuracy, status
72
+ - Actions: View details, Deploy model, Export weights
73
+
74
+ 4. **Active Model Indicator**
75
+ - Show which model is currently in use
76
+ - Option to rollback to base model
77
+
78
+ ### 2.2 Settings Extension
79
+ - `fine_tuning_enabled` (Boolean) - master switch
80
+ - `min_training_examples` (Integer, default: 20)
81
+ - `auto_train` (Boolean, default: False) - auto-trigger when threshold reached
82
+
83
+ ---
84
+
85
+ ## Phase 3: Fine-Tuning Engine
86
+
87
+ ### 3.1 New Module: `app/fine_tuning/trainer.py`
88
+
89
+ **Class: `BARTFineTuner`**
90
+
91
+ **Methods:**
92
+
93
+ `prepare_dataset(training_examples)`
94
+ - Convert TrainingExample records to HuggingFace Dataset
95
+ - Create train/val/test splits (stratified by category)
96
+ - Tokenize texts for BART
97
+ - Return: `train_dataset`, `val_dataset`, `test_dataset`
98
+
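+ A sketch of the split step using scikit-learn for stratification and the `datasets` library for the final objects (the 70/15/15 ratio follows the config above; the label mapping is an assumption):
+
+ ```python
+ # Sketch: stratified 70/15/15 split of TrainingExample records.
+ from datasets import Dataset
+ from sklearn.model_selection import train_test_split
+
+ def make_splits(examples, label2id, seed=42):
+     texts = [e.message for e in examples]
+     labels = [label2id[e.corrected_category] for e in examples]
+
+     # 70% train, 30% held out (stratified by category)
+     x_train, x_hold, y_train, y_hold = train_test_split(
+         texts, labels, test_size=0.30, stratify=labels, random_state=seed)
+     # split the held-out 30% evenly into validation and test (15% / 15%)
+     x_val, x_test, y_val, y_test = train_test_split(
+         x_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=seed)
+
+     def to_ds(x, y):
+         return Dataset.from_dict({"text": x, "label": y})
+
+     return to_ds(x_train, y_train), to_ds(x_val, y_val), to_ds(x_test, y_test)
+ ```
+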
99
+ `setup_lora_model(base_model_name, lora_config)`
100
+ - Load base BART model (`facebook/bart-large-mnli`)
101
+ - Apply PEFT (Parameter-Efficient Fine-Tuning) with LoRA
102
+ - LoRA configuration:
103
+ ```python
104
+ {
105
+ "r": 16, # rank
106
+ "lora_alpha": 32,
107
+ "target_modules": ["q_proj", "v_proj"], # attention layers
108
+ "lora_dropout": 0.1,
109
+ "bias": "none"
110
+ }
111
+ ```
112
+
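+ With PEFT, applying that configuration looks roughly like this (a sketch; the classification head and label count depend on how the base model is wrapped in `trainer.py`):
+
+ ```python
+ # Sketch: wrap the base model with a LoRA adapter via PEFT. The 6-label head
+ # replaces the MNLI head, so ignore_mismatched_sizes is needed.
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ from peft import LoraConfig, TaskType, get_peft_model
+
+ base = AutoModelForSequenceClassification.from_pretrained(
+     "facebook/bart-large-mnli", num_labels=6, ignore_mismatched_sizes=True)
+ tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
+
+ lora_config = LoraConfig(
+     task_type=TaskType.SEQ_CLS,
+     r=16, lora_alpha=32,
+     target_modules=["q_proj", "v_proj"],
+     lora_dropout=0.1, bias="none",
+ )
+ model = get_peft_model(base, lora_config)
+ model.print_trainable_parameters()   # only a small fraction of weights train
+ ```
+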
113
+ `train(train_dataset, val_dataset, config)`
114
+ - Use HuggingFace Trainer with custom loss
115
+ - Multi-class cross-entropy loss
116
+ - Metrics: accuracy, F1 per category, confusion matrix
117
+ - Early stopping on validation loss
118
+ - Save checkpoints to `/data/models/finetuned/run_{id}/`
119
+
120
+ `evaluate(test_dataset, model)`
121
+ - Run predictions on test set
122
+ - Calculate: accuracy, precision, recall, F1 (macro/micro)
123
+ - Generate confusion matrix
124
+ - Compare to baseline (zero-shot) performance
125
+
126
+ `export_model(run_id, destination_path)`
127
+ - Save LoRA adapter weights
128
+ - Save tokenizer config
129
+ - Create model card with metrics
130
+ - Package for backup/deployment
131
+
132
+ **Alternative Approach: Output Layer Fine-Tuning**
133
+ - Option to only train final classification head
134
+ - Faster, less prone to overfitting
135
+ - Good for small datasets (20-50 examples)
136
+
137
+ ### 3.2 Background Task Handler (`app/fine_tuning/tasks.py`)
138
+ - Fine-tuning runs in background (avoid blocking Flask)
139
+ - Options:
140
+ 1. **Simple Threading** (for development)
141
+ 2. **Celery** (for production) - requires Redis/RabbitMQ
142
+ 3. **HF Spaces Gradio Jobs** (if deploying to HF)
143
+
144
+ **Status Updates:**
145
+ - Update FineTuningRun.status in real-time
146
+ - Store progress in Settings table for UI polling
147
+ - Log to file for debugging
148
+
149
+ ---
150
+
151
+ ## Phase 4: Model Deployment & Versioning
152
+
153
+ ### 4.1 Model Manager (`app/fine_tuning/model_manager.py`)
154
+
155
+ **Class: `ModelManager`**
156
+
157
+ `get_active_model()`
158
+ - Check if fine-tuned model is deployed
159
+ - Load LoRA weights if available
160
+ - Fallback to base model
161
+
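+ A sketch of the load-with-fallback logic (the `FineTuningRun` lookup and paths are assumptions):
+
+ ```python
+ # Sketch: load the deployed LoRA adapter on top of the base model,
+ # falling back to the plain base model when nothing is deployed.
+ import os
+ from transformers import AutoModelForSequenceClassification
+ from peft import PeftModel
+
+ def get_active_model(base_name="facebook/bart-large-mnli"):
+     base = AutoModelForSequenceClassification.from_pretrained(base_name)
+     active_run = FineTuningRun.query.filter_by(is_active_model=True).first()
+     if active_run and active_run.model_path and os.path.isdir(active_run.model_path):
+         return PeftModel.from_pretrained(base, active_run.model_path)
+     return base   # fallback: base model, zero-shot behaviour
+ ```
+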
162
+ `deploy_model(run_id)`
163
+ - Set FineTuningRun.is_active_model = True
164
+ - Update Settings: `active_model_id`
165
+ - Reload analyzer with new model
166
+ - Create deployment snapshot
167
+
168
+ `rollback_to_baseline()`
169
+ - Deactivate all fine-tuned models
170
+ - Reload base BART model
171
+ - Log rollback event
172
+
173
+ `compare_models(run_id_1, run_id_2, test_dataset)`
174
+ - Side-by-side comparison
175
+ - Statistical significance tests
176
+ - A/B testing support (future)
177
+
178
+ ### 4.2 Analyzer Modification (`app/analyzer.py`)
179
+
180
+ **Update `SubmissionAnalyzer.__init__`:**
181
+ - Check for active fine-tuned model
182
+ - Load LoRA adapter if available
183
+ - Track model version being used
184
+
185
+ **Add method: `get_model_info()`**
186
+ - Return: model type (base/finetuned), version, metrics
187
+
188
+ **Store prediction metadata:**
189
+ - Add confidence scores to all predictions
190
+ - Track which model version made prediction
191
+
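+ For the zero-shot path, per-label scores already come back from the pipeline, so storing confidence is mostly a matter of keeping the top score (a sketch; the category list beyond Objective/Problem/Values is assumed):
+
+ ```python
+ # Sketch: zero-shot classification returns per-label scores; keep the top one.
+ from transformers import pipeline
+
+ classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
+ CATEGORIES = ["Objective", "Problem", "Values", "Strategy", "Question", "Other"]  # assumed set
+
+ def classify_with_confidence(text: str) -> dict:
+     result = classifier(text, candidate_labels=CATEGORIES)
+     return {
+         "category": result["labels"][0],          # top label
+         "confidence": float(result["scores"][0]),
+         "model_version": "base-zero-shot",        # or the active FineTuningRun id
+     }
+ ```
+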
192
+ ---
193
+
194
+ ## Phase 5: Validation & Quality Assurance
195
+
196
+ ### 5.1 Cross-Validation
197
+ - K-fold cross-validation (k=5) for small datasets
198
+ - Stratified splits to ensure category balance
199
+ - Report: mean ± std accuracy across folds
200
+
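+ A sketch of the k-fold loop with scikit-learn's `StratifiedKFold` (the `train_and_score` callable stands in for one fine-tune-plus-evaluate pass per fold):
+
+ ```python
+ # Sketch: stratified 5-fold cross-validation for small datasets.
+ import numpy as np
+ from sklearn.model_selection import StratifiedKFold
+
+ def cross_validate(texts, labels, train_and_score, k=5, seed=42):
+     skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
+     scores = []
+     for train_idx, test_idx in skf.split(texts, labels):
+         acc = train_and_score(
+             [texts[i] for i in train_idx], [labels[i] for i in train_idx],
+             [texts[i] for i in test_idx], [labels[i] for i in test_idx])
+         scores.append(acc)
+     return float(np.mean(scores)), float(np.std(scores))   # report mean ± std
+ ```
+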
201
+ ### 5.2 Minimum Viable Training Set
202
+ **Data Requirements:**
203
+ - At least 3 examples per category (18 total)
204
+ - Recommended: 5+ examples per category (30 total)
205
+ - Warn if severe class imbalance (>5:1 ratio)
206
+
207
+ ### 5.3 Quality Checks
208
+ - Detect duplicate texts
209
+ - Detect conflicting labels (same text, different categories)
210
+ - Flag suspiciously short/long texts
211
+ - Admin review interface for cleanup
212
+
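+ The duplicate and conflict checks can run directly over the collected examples (a sketch; the text normalization and length thresholds are assumptions):
+
+ ```python
+ # Sketch: flag duplicate texts, conflicting labels, and odd lengths before training.
+ from collections import defaultdict
+
+ def find_data_issues(examples, min_len=10, max_len=2000):
+     by_text = defaultdict(list)
+     for ex in examples:
+         key = " ".join(ex.message.lower().split())   # normalized text
+         by_text[key].append(ex.corrected_category)
+
+     duplicates = {t for t, cats in by_text.items() if len(cats) > 1}
+     conflicts = {t for t, cats in by_text.items() if len(set(cats)) > 1}
+     odd_length = [ex.id for ex in examples
+                   if not (min_len <= len(ex.message) <= max_len)]
+     return {"duplicates": duplicates, "conflicts": conflicts, "odd_length": odd_length}
+ ```
+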
213
+ ### 5.4 Success Criteria
214
+ **Model is deployed if:**
215
+ - Test accuracy > baseline accuracy + 5%
216
+ - OR per-category F1 improved for majority of categories
217
+ - AND no category has F1 < 0.3 (catch catastrophic forgetting)
218
+
219
+ **If criteria not met:**
220
+ - Keep base model active
221
+ - Suggest: collect more data, adjust hyperparameters
222
+
223
+ ---
224
+
225
+ ## Phase 6: Export & Backup
226
+
227
+ ### 6.1 Model Export
228
+ **Format Options:**
229
+ 1. **HuggingFace Hub** - push LoRA adapter to private repo
230
+ 2. **Local Files** - save to `/data/models/exports/`
231
+ 3. **Download via UI** - ZIP file with weights + config
232
+
233
+ **Export Contents:**
234
+ - LoRA adapter weights (`adapter_model.bin`)
235
+ - Adapter config (`adapter_config.json`)
236
+ - Training metrics (`metrics.json`)
237
+ - Training examples used (`training_data.json`)
238
+ - Model card (`README.md`)
239
+
240
+ ### 6.2 Import Pre-trained Model
241
+ - Upload ZIP with LoRA weights
242
+ - Validate compatibility with base model
243
+ - Deploy to production
244
+
245
+ ---
246
+
247
+ ## Technical Implementation Details
248
+
249
+ ### Dependencies to Add (requirements.txt)
250
+ ```
251
+ peft>=0.7.0 # LoRA implementation
252
+ datasets>=2.14.0 # HuggingFace datasets
253
+ scikit-learn>=1.3.0 # cross-validation, metrics
254
+ matplotlib>=3.7.0 # confusion matrix plotting
255
+ seaborn>=0.12.0 # visualization
256
+ accelerate>=0.24.0 # training optimization
257
+ evaluate>=0.4.0 # evaluation metrics
258
+ ```
259
+
260
+ ### File Structure
261
+ ```
262
+ app/
263
+ β”œβ”€β”€ fine_tuning/
264
+ β”‚ β”œβ”€β”€ __init__.py
265
+ β”‚ β”œβ”€β”€ trainer.py # BARTFineTuner class
266
+ β”‚ β”œβ”€β”€ model_manager.py # Model deployment logic
267
+ β”‚ β”œβ”€β”€ tasks.py # Background job handler
268
+ β”‚ β”œβ”€β”€ metrics.py # Custom evaluation metrics
269
+ β”‚ └── data_validator.py # Training data QA
270
+ β”œβ”€β”€ models/
271
+ β”‚ └── models.py # Add TrainingExample, FineTuningRun
272
+ β”œβ”€β”€ routes/
273
+ β”‚ └── admin.py # Add training endpoints
274
+ β”œβ”€β”€ templates/admin/
275
+ β”‚ └── training.html # Training dashboard UI
276
+ └── analyzer.py # Update to support LoRA models
277
+
278
+ /data/models/ # Persistent storage (HF Spaces)
279
+ β”œβ”€β”€ finetuned/
280
+ β”‚ β”œβ”€β”€ run_1/
281
+ β”‚ β”œβ”€β”€ run_2/
282
+ β”‚ └── ...
283
+ └── exports/
284
+ ```
285
+
286
+ ### API Endpoints Summary
287
+ - `GET /admin/training` - Training dashboard page
288
+ - `GET /admin/api/training-stats` - Get correction stats
289
+ - `GET /admin/api/training-examples` - List training data
290
+ - `DELETE /admin/api/training-example/<id>` - Remove example
291
+ - `POST /admin/api/start-training` - Trigger fine-tuning
292
+ - `GET /admin/api/training-status/<run_id>` - Poll training progress
293
+ - `POST /admin/api/deploy-model/<run_id>` - Deploy fine-tuned model
294
+ - `POST /admin/api/rollback-model` - Revert to base model
295
+ - `GET /admin/api/export-model/<run_id>` - Download model weights
296
+
297
+ ### UI Workflow
298
+ 1. Admin corrects categories on Submissions page (already working)
299
+ 2. Navigate to **Training** tab in admin panel
300
+ 3. View stats: "25 corrections collected (Ready to train!)"
301
+ 4. Click "Start Fine-Tuning" β†’ Configure parameters β†’ Confirm
302
+ 5. Progress bar shows: "Preparing data... Training... Evaluating..."
303
+ 6. Results displayed: "Accuracy: 87% (+12% improvement!)"
304
+ 7. Click "Deploy Model" to activate
305
+ 8. All future predictions use fine-tuned model
306
+
307
+ ### Performance Considerations
308
+ - **Training Time**: ~2-5 minutes for 20-50 examples (CPU)
309
+ - **Memory**: LoRA uses ~10% of full fine-tuning memory
310
+ - **Storage**: ~50MB per LoRA checkpoint
311
+ - **Inference**: Minimal overhead vs base model
312
+
313
+ ### Risk Mitigation
314
+ 1. **Overfitting**: Use validation set, early stopping
315
+ 2. **Catastrophic Forgetting**: Monitor all category metrics
316
+ 3. **Bad Training Data**: Quality validation before training
317
+ 4. **Model Regression**: Always compare to baseline, allow rollback
318
+ 5. **Resource Limits**: LoRA keeps training feasible on HF Spaces
319
+
320
+ ---
321
+
322
+ ## Implementation Phases
323
+
324
+ **Phase 1 (Foundation):** Database models + data collection (2-3 hours)
325
+ **Phase 2 (UI):** Training dashboard + configuration (2-3 hours)
326
+ **Phase 3 (Core ML):** Fine-tuning engine + LoRA (4-5 hours)
327
+ **Phase 4 (Deployment):** Model management + versioning (2-3 hours)
328
+ **Phase 5 (QA):** Validation + metrics (2-3 hours)
329
+ **Phase 6 (Polish):** Export/import + documentation (1-2 hours)
330
+
331
+ **Total Estimated Time:** 13-19 hours
332
+
333
+ ---
334
+
335
+ ## Questions for Clarification
336
+
337
+ 1. **Training Infrastructure**: Run on HF Spaces (CPU) or local machine (GPU)?
338
+ 2. **Background Jobs**: Use simple threading or prefer Celery/Redis?
339
+ 3. **Model Hosting**: Keep models in HF Spaces persistent storage or upload to HF Hub?
340
+ 4. **Auto-training**: Should system auto-train when threshold reached, or admin-triggered only?
341
+ 5. **Notification**: Email/webhook when training completes?
342
+ 6. **Multi-model**: Support multiple fine-tuned models simultaneously (A/B testing)?
343
+
344
+ Ready to proceed with implementation upon your approval!
DEPLOYMENT_READY.md ADDED
@@ -0,0 +1,316 @@
1
+ # βœ… Deployment Ready - Status Report
2
+
3
+ **Generated**: October 6, 2025
4
+ **Target Platform**: Hugging Face Spaces
5
+ **Status**: 🟒 READY TO DEPLOY
6
+
7
+ ---
8
+
9
+ ## πŸ“¦ Files Prepared
10
+
11
+ ### Core HF Files
12
+ - βœ… **Dockerfile** (port 7860, HF-optimized)
13
+ - βœ… **README.md** (with YAML metadata for Space)
14
+ - βœ… **app_hf.py** (HF Spaces entry point)
15
+ - βœ… **requirements.txt** (all dependencies)
16
+ - βœ… **wsgi.py** (WSGI wrapper)
17
+
18
+ ### Application Code
19
+ - βœ… **app/** directory (complete application)
20
+ - βœ… app/__init__.py (database config for HF)
21
+ - βœ… app/routes/ (all routes)
22
+ - βœ… app/models/ (database models)
23
+ - βœ… app/templates/ (UI templates)
24
+ - βœ… app/fine_tuning/ (model training)
25
+ - βœ… app/analyzer.py (AI classification)
26
+
27
+ ### Configuration
28
+ - βœ… **.gitignore** (excludes sensitive files)
29
+ - βœ… **.hfignore** (HF-specific exclusions)
30
+ - βœ… **Environment variables** configured:
31
+ - DATABASE_PATH=/data/app.db
32
+ - HF_HOME=/data/.cache/huggingface
33
+ - PORT=7860
34
+
35
+ ---
36
+
37
+ ## πŸ” Security Configuration
38
+
39
+ ### Secret Key (CRITICAL)
40
+ **Production Secret**: `9fd11d101e36efbd3a7893f56d604b860403d247633547586c41453118e69b00`
41
+
42
+ **⚠️ IMPORTANT**: Add this to HF Space Settings β†’ Repository secrets as:
43
+ - **Name**: `FLASK_SECRET_KEY`
44
+ - **Value**: (the key above)
45
+
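+ The app is expected to read this secret from the environment at startup, roughly like the sketch below (the exact fallback behaviour in the real `app/__init__.py` is an assumption):
+
+ ```python
+ # Sketch: consume the HF Secret at startup; fall back to a random per-process
+ # key, which means sessions reset on every restart if the secret is missing.
+ import os, secrets
+ from flask import Flask
+
+ app = Flask(__name__)
+ app.config['SECRET_KEY'] = os.environ.get('FLASK_SECRET_KEY') or secrets.token_hex(32)
+ ```
+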
46
+ ### Admin Access
47
+ - **Default Token**: `ADMIN123`
48
+ - **Recommendation**: Change before public deployment
49
+ - **Location**: app/models/models.py (line 61)
50
+
51
+ ### Session Security
52
+ - βœ… HTTPS enforced
53
+ - βœ… HttpOnly cookies
54
+ - βœ… SameSite=None (iframe support)
55
+ - βœ… Partitioned cookies (Safari compatibility)
56
+
57
+ ---
58
+
59
+ ## πŸš€ Deployment Configuration
60
+
61
+ ### Port Configuration
62
+ ```dockerfile
63
+ EXPOSE 7860 # Dockerfile
64
+ ENV PORT=7860 # Environment
65
+ port = int(os.environ.get("PORT", 7860)) # app_hf.py
66
+ ```
67
+ βœ… Verified: Port 7860 configured correctly
68
+
69
+ ### Database Configuration
70
+ ```python
71
+ db_path = os.environ.get('DATABASE_PATH', '/data/app.db')  # HF persistent storage
72
+ SQLALCHEMY_DATABASE_URI = f'sqlite:///{db_path}'
73
+ ```
74
+ βœ… Verified: Database uses persistent /data directory
75
+
76
+ ### Model Cache Configuration
77
+ ```dockerfile
78
+ ENV HF_HOME=/data/.cache/huggingface
79
+ ENV TRANSFORMERS_CACHE=/data/.cache/huggingface
80
+ ENV HUGGINGFACE_HUB_CACHE=/data/.cache/huggingface
81
+ ```
82
+ βœ… Verified: Models cache in persistent storage
83
+
84
+ ---
85
+
86
+ ## πŸ“Š Resource Requirements
87
+
88
+ ### Minimum (Free Tier)
89
+ - **CPU**: 2 vCPU
90
+ - **RAM**: 16GB
91
+ - **Storage**: 5GB
92
+ - **Performance**: Good for <100 submissions
93
+
94
+ ### Recommended (HF Pro - FREE for you!)
95
+ - **CPU**: 4 vCPU (CPU Upgrade)
96
+ - **RAM**: 32GB
97
+ - **Storage**: 50GB
98
+ - **Performance**: Excellent for any size session
99
+
100
+ ---
101
+
102
+ ## 🎯 Deployment Steps (Summary)
103
+
104
+ 1. **Create Space**: https://huggingface.co/new-space
105
+ - SDK: Docker ⚠️
106
+ - Hardware: CPU Basic or CPU Upgrade
107
+
108
+ 2. **Upload Files**:
109
+ - Dockerfile
110
+ - README.md
111
+ - requirements.txt
112
+ - app_hf.py
113
+ - wsgi.py
114
+ - app/ (entire directory)
115
+
116
+ 3. **Configure Secret**:
117
+ - Settings β†’ Repository secrets
118
+ - Add FLASK_SECRET_KEY
119
+
120
+ 4. **Wait for Build** (~10 minutes)
121
+
122
+ 5. **Access**: https://YOUR_USERNAME-participatory-planner.hf.space
123
+
124
+ ---
125
+
126
+ ## βœ… Pre-Flight Checklist
127
+
128
+ ### Files
129
+ - [x] Dockerfile uses port 7860
130
+ - [x] README.md has YAML header
131
+ - [x] app_hf.py configured for HF
132
+ - [x] requirements.txt complete
133
+ - [x] .hfignore excludes dev files
134
+ - [x] Database path uses /data
135
+
136
+ ### Security
137
+ - [x] Production secret key generated
138
+ - [x] .env excluded from deployment
139
+ - [x] Session cookies configured
140
+ - [x] HTTPS ready
141
+
142
+ ### Features
143
+ - [x] AI model auto-downloads
144
+ - [x] Database auto-creates
145
+ - [x] Fine-tuning works
146
+ - [x] Model selection works
147
+ - [x] Zero-shot models work
148
+ - [x] Export/Import ready
149
+
150
+ ### Testing
151
+ - [x] Local app runs successfully
152
+ - [x] Port 7860 accessible
153
+ - [x] Database persists
154
+ - [x] AI analysis works
155
+ - [x] All features tested
156
+
157
+ ---
158
+
159
+ ## πŸ“ Deployment Documentation
160
+
161
+ ### Quick Start
162
+ - **DEPLOY_TO_HF.md** - 5-minute deployment guide
163
+
164
+ ### Detailed Guides
165
+ - **HUGGINGFACE_DEPLOYMENT.md** - Complete HF deployment guide
166
+ - **HF_DEPLOYMENT_CHECKLIST.md** - Detailed checklist & troubleshooting
167
+
168
+ ### Helper Scripts
169
+ - **prepare_hf_deployment.sh** - Automated preparation script
170
+
171
+ ---
172
+
173
+ ## πŸ” Verification Commands
174
+
175
+ ### Pre-Deployment Check
176
+ ```bash
177
+ ./prepare_hf_deployment.sh
178
+ ```
179
+ **Status**: βœ… Passed
180
+
181
+ ### Manual Verification
182
+ ```bash
183
+ # Check port config
184
+ grep -E "7860" Dockerfile app_hf.py
185
+
186
+ # Check YAML header
187
+ head -10 README.md
188
+
189
+ # Verify files
190
+ ls Dockerfile README.md app_hf.py requirements.txt wsgi.py app/
191
+ ```
192
+ **Status**: βœ… All verified
193
+
194
+ ---
195
+
196
+ ## 🎁 What You Get
197
+
198
+ ### Deployed Application
199
+ - βœ… Full AI-powered planning platform
200
+ - βœ… Token-based access control
201
+ - βœ… AI categorization (6 categories)
202
+ - βœ… Geographic mapping
203
+ - βœ… Analytics dashboard
204
+ - βœ… Fine-tuning capability
205
+ - βœ… Model selection (7+ models)
206
+ - βœ… Zero-shot options (3 models)
207
+ - βœ… Export/Import sessions
208
+ - βœ… Training history
209
+ - βœ… Model deployment management
210
+
211
+ ### Infrastructure
212
+ - βœ… Auto-SSL (HTTPS)
213
+ - βœ… Persistent storage
214
+ - βœ… Auto-restart on crash
215
+ - βœ… Build logs
216
+ - βœ… Health checks
217
+ - βœ… Domain ready (Pro)
218
+
219
+ ### Cost
220
+ - βœ… **$0/month** (included in HF Pro)
221
+
222
+ ---
223
+
224
+ ## πŸ“ˆ Expected Performance
225
+
226
+ ### Build Times
227
+ - First deployment: ~10 minutes
228
+ - Subsequent builds: ~3-5 minutes
229
+ - Model download (first run): ~5 minutes
230
+
231
+ ### Runtime
232
+ - Startup: 10-20 seconds
233
+ - AI inference: <3 seconds per submission
234
+ - Page load: <2 seconds
235
+ - Database queries: <100ms
236
+
237
+ ### Storage Usage
238
+ - Base image: ~500MB
239
+ - AI models: ~1.5GB (cached)
240
+ - Database: grows with usage
241
+ - Total: ~2GB initially
242
+
243
+ ---
244
+
245
+ ## 🚨 Important Notes
246
+
247
+ ### Before Public Launch
248
+ 1. ⚠️ **Change admin token** from ADMIN123
249
+ 2. ⚠️ **Add FLASK_SECRET_KEY** to HF Secrets
250
+ 3. ⚠️ Consider making Space private if handling sensitive data
251
+ 4. ⚠️ Set up regular backups (Export feature)
252
+
253
+ ### Model Considerations
254
+ - First run downloads ~1.5GB model
255
+ - Models cache in /data (persists)
256
+ - Fine-tuned models stored in /data/models
257
+ - Training works on CPU (LoRA efficient)
258
+
259
+ ### Data Persistence
260
+ - Database: /data/app.db (persists)
261
+ - Models: /data/.cache (persists)
262
+ - Fine-tuned: models/finetuned (persists)
263
+ - 50GB storage with Pro
264
+
265
+ ---
266
+
267
+ ## 🎯 Next Steps
268
+
269
+ 1. **Deploy Now**: https://huggingface.co/new-space
270
+ 2. **Follow**: DEPLOY_TO_HF.md guide
271
+ 3. **Test**: All features after deployment
272
+ 4. **Share**: Your Space URL with stakeholders
273
+
274
+ ---
275
+
276
+ ## πŸ“ž Support & Resources
277
+
278
+ ### Documentation
279
+ - [Quick Deploy](./DEPLOY_TO_HF.md)
280
+ - [Full Guide](./HUGGINGFACE_DEPLOYMENT.md)
281
+ - [Checklist](./HF_DEPLOYMENT_CHECKLIST.md)
282
+
283
+ ### HF Resources
284
+ - [Spaces Docs](https://huggingface.co/docs/hub/spaces)
285
+ - [Discord](https://hf.co/join/discord)
286
+ - [Forum](https://discuss.huggingface.co/)
287
+
288
+ ### Monitoring
289
+ - Logs: Your Space β†’ Logs tab
290
+ - Status: Your Space β†’ Status badge
291
+ - Metrics: Your Space β†’ Settings (Pro)
292
+
293
+ ---
294
+
295
+ ## ✨ Final Status
296
+
297
+ ```
298
+ 🟒 DEPLOYMENT READY
299
+
300
+ All systems verified and tested.
301
+ All files prepared and configured.
302
+ All documentation complete.
303
+ Secret key generated.
304
+
305
+ Ready to deploy to Hugging Face Spaces!
306
+
307
+ Estimated deployment time: 15 minutes
308
+ Estimated cost: $0 (HF Pro included)
309
+ ```
310
+
311
+ ---
312
+
313
+ **Action Required**: Click β†’ https://huggingface.co/new-space
314
+
315
+ **Good luck with your deployment! πŸš€**
316
+
DEPLOYMENT_SUCCESS.md ADDED
@@ -0,0 +1,268 @@
1
+ # πŸŽ‰ Deployment Successful!
2
+
3
+ **Status**: βœ… Pushed to Hugging Face Spaces
4
+ **Time**: October 6, 2025
5
+ **Commit**: 1377fb1
6
+
7
+ ---
8
+
9
+ ## 🌐 Your Space
10
+
11
+ ### URLs
12
+ - **Space Dashboard**: https://huggingface.co/spaces/thadillo/participatory-planner
13
+ - **Live App**: https://thadillo-participatory-planner.hf.space
14
+ - **Settings**: https://huggingface.co/spaces/thadillo/participatory-planner/settings
15
+
16
+ ### Admin Login
17
+ - **Token**: `ADMIN123`
18
+
19
+ ---
20
+
21
+ ## 🚨 CRITICAL - Next Step Required!
22
+
23
+ ### Add Secret Key (Do this NOW!)
24
+
25
+ 1. **Go to**: https://huggingface.co/spaces/thadillo/participatory-planner/settings
26
+ 2. **Click**: "Repository secrets" (left sidebar)
27
+ 3. **Click**: "New secret"
28
+ 4. **Add**:
29
+ - **Name**: `FLASK_SECRET_KEY`
30
+ - **Value**: `9fd11d101e36efbd3a7893f56d604b860403d247633547586c41453118e69b00`
31
+ 5. **Click**: "Add secret"
32
+
33
+ **⚠️ Without this, sessions won't work properly!**
34
+
35
+ ---
36
+
37
+ ## πŸ“Š Build Status
38
+
39
+ ### What's Happening Now:
40
+ 1. βœ… Code pushed to HF Spaces
41
+ 2. πŸ”„ Docker image building (~10 minutes)
42
+ 3. ⏳ AI models downloading (~5 minutes)
43
+ 4. ⏳ App starting
44
+
45
+ ### Check Progress:
46
+ 1. Go to: https://huggingface.co/spaces/thadillo/participatory-planner
47
+ 2. Click: **"Logs"** tab
48
+ 3. Look for: `Running on http://0.0.0.0:7860`
49
+
50
+ ### Status Indicators:
51
+ - 🟑 **Yellow badge** = Building
52
+ - 🟒 **Green badge** = Running
53
+ - πŸ”΄ **Red badge** = Error (check Logs)
54
+
55
+ ---
56
+
57
+ ## 🎯 Deployed Features
58
+
59
+ ### All Features Included:
60
+ - βœ… AI-powered text categorization (6 categories)
61
+ - βœ… Model selection (7+ transformer models)
62
+ - βœ… Zero-shot model selection (3 NLI models)
63
+ - βœ… Fine-tuning capability (LoRA + Head-only)
64
+ - βœ… Training run management
65
+ - βœ… Model export/import
66
+ - βœ… Token-based access control
67
+ - βœ… Geographic mapping
68
+ - βœ… Analytics dashboard
69
+ - βœ… Session export/import
70
+
71
+ ### Infrastructure:
72
+ - βœ… Port 7860 configured
73
+ - βœ… Persistent storage (/data)
74
+ - βœ… Auto-SSL (HTTPS)
75
+ - βœ… Health checks
76
+ - βœ… Model caching
77
+
78
+ ---
79
+
80
+ ## βœ… Verification Checklist
81
+
82
+ Once build completes, test:
83
+
84
+ - [ ] App loads at https://thadillo-participatory-planner.hf.space
85
+ - [ ] Admin login works (ADMIN123)
86
+ - [ ] Can create tokens
87
+ - [ ] Can submit contributions
88
+ - [ ] AI analysis works
89
+ - [ ] Model selection works (7+ models)
90
+ - [ ] Zero-shot model selection works (3 models)
91
+ - [ ] Training panel loads
92
+ - [ ] Dashboard displays correctly
93
+ - [ ] Data persists after refresh
94
+
95
+ ---
96
+
97
+ ## πŸ“ˆ Expected Timeline
98
+
99
+ | Step | Duration | Status |
100
+ |------|----------|--------|
101
+ | Code push | Instant | βœ… Done |
102
+ | Docker build | ~10 min | πŸ”„ In progress |
103
+ | Model download | ~5 min | ⏳ Waiting |
104
+ | App start | ~30 sec | ⏳ Waiting |
105
+ | **Total** | **~15 min** | πŸ”„ |
106
+
107
+ ---
108
+
109
+ ## πŸ” Monitoring
110
+
111
+ ### View Build Logs:
112
+ ```
113
+ https://huggingface.co/spaces/thadillo/participatory-planner
114
+ β†’ Click "Logs" tab
115
+ ```
116
+
117
+ ### What to Look For:
118
+ ```
119
+ βœ“ Successfully built
120
+ βœ“ Successfully tagged
121
+ βœ“ Container started
122
+ βœ“ Running on http://0.0.0.0:7860
123
+ βœ“ Debugger is active! (or production mode)
124
+ ```
125
+
126
+ ### Common First-Time Messages (Normal):
127
+ ```
128
+ ⚠️ Downloading model... (first run, takes ~5 min)
129
+ ⚠️ Model cache empty (will populate)
130
+ ⚠️ Creating database... (auto-creates)
131
+ ```
132
+
133
+ ---
134
+
135
+ ## πŸ› οΈ Troubleshooting
136
+
137
+ ### Build Fails
138
+ **Check**: Logs tab for error details
139
+ **Common fix**: Wait and try again (HF sometimes has delays)
140
+
141
+ ### App Not Loading
142
+ **Check**: Build completed successfully (green badge)
143
+ **Fix**: Give it 15-20 minutes for first deployment
144
+
145
+ ### Session Issues
146
+ **Check**: FLASK_SECRET_KEY added to secrets?
147
+ **Fix**: Add it now (see top of this file)
148
+
149
+ ### Model Download Timeout
150
+ **Wait**: First download takes up to 10 minutes
151
+ **Normal**: Models cache after first run
152
+
153
+ ---
154
+
155
+ ## 🎁 HF Pro Benefits Active
156
+
157
+ Your deployment uses:
158
+ - βœ… Better hardware (more CPU/RAM available)
159
+ - βœ… Persistent storage (50GB)
160
+ - βœ… No sleep mode
161
+ - βœ… Priority builds
162
+ - βœ… Custom domain support
163
+ - βœ… Private space option
164
+
165
+ **Cost**: $0 (included in HF Pro) πŸŽ‰
166
+
167
+ ---
168
+
169
+ ## πŸ“Š What's Deployed
170
+
171
+ ### Git Commit Info:
172
+ ```
173
+ Commit: 1377fb1
174
+ Branch: feature/fine-tuning β†’ main
175
+ Files: 10 changed, 1020+ insertions
176
+ ```
177
+
178
+ ### Key Updates:
179
+ - Model selection (7+ transformers)
180
+ - Zero-shot options (3 NLI models)
181
+ - Fine-tuning improvements
182
+ - Training run management
183
+ - Export/delete functionality
184
+ - HF Spaces configuration
185
+
186
+ ---
187
+
188
+ ## πŸ” Security Notes
189
+
190
+ ### Current Setup:
191
+ - βœ… HTTPS enabled (automatic)
192
+ - βœ… Secret key in HF Secrets (add it!)
193
+ - ⚠️ Admin token: ADMIN123 (change for production)
194
+
195
+ ### For Production:
196
+ 1. Change admin token in `app/models/models.py`
197
+ 2. Enable Space authentication
198
+ 3. Make Space private if needed
199
+ 4. Regular data backups
200
+
201
+ ---
202
+
203
+ ## πŸ“ž Support
204
+
205
+ ### If You Need Help:
206
+ - **Logs**: Check build/runtime logs
207
+ - **HF Docs**: https://huggingface.co/docs/hub/spaces
208
+ - **HF Discord**: https://hf.co/join/discord
209
+ - **Status**: https://status.huggingface.co
210
+
211
+ ### Your Space:
212
+ - **Dashboard**: https://huggingface.co/spaces/thadillo/participatory-planner
213
+ - **Settings**: https://huggingface.co/spaces/thadillo/participatory-planner/settings
214
+ - **Files**: https://huggingface.co/spaces/thadillo/participatory-planner/tree/main
215
+
216
+ ---
217
+
218
+ ## πŸš€ Next Steps
219
+
220
+ ### Immediate (Now):
221
+ 1. βœ… Code pushed
222
+ 2. ⏳ Add FLASK_SECRET_KEY to secrets (critical!)
223
+ 3. ⏳ Wait for build (~15 min)
224
+ 4. ⏳ Test app functionality
225
+
226
+ ### Soon (After Build):
227
+ 1. Test all features
228
+ 2. Change admin token for production
229
+ 3. Configure Space settings (privacy, etc.)
230
+ 4. Share with stakeholders
231
+
232
+ ### Optional:
233
+ 1. Enable Space authentication
234
+ 2. Set up custom domain
235
+ 3. Configure hardware (CPU Upgrade)
236
+ 4. Set up monitoring/alerts
237
+
238
+ ---
239
+
240
+ ## ✨ Success Criteria
241
+
242
+ Your deployment is successful when:
243
+ - βœ… Space shows "Running" (green badge)
244
+ - βœ… App loads at URL
245
+ - βœ… Admin login works
246
+ - βœ… AI analysis completes
247
+ - βœ… Data persists
248
+ - βœ… No errors in Logs
249
+
250
+ **Estimated completion**: ~15 minutes from now
251
+
252
+ ---
253
+
254
+ ## πŸŽ‰ Congratulations!
255
+
256
+ Your Participatory Planning Platform is deploying to Hugging Face Spaces!
257
+
258
+ **Watch it build**: https://huggingface.co/spaces/thadillo/participatory-planner
259
+
260
+ **First action**: Add the secret key! ⬆️
261
+
262
+ ---
263
+
264
+ **Deployment Time**: October 6, 2025
265
+ **Platform**: Hugging Face Spaces
266
+ **Status**: πŸ”„ Building
267
+ **ETA**: ~15 minutes
268
+
DEPLOY_TO_HF.md ADDED
@@ -0,0 +1,255 @@
1
+ # πŸš€ Quick Deploy to Hugging Face Spaces
2
+
3
+ ## ⚑ 5-Minute Deployment
4
+
5
+ Your app is **ready to deploy**! Everything is configured.
6
+
7
+ ---
8
+
9
+ ## πŸ“‹ What You Need
10
+
11
+ 1. βœ… Hugging Face account (you have Pro!)
12
+ 2. βœ… 10 minutes of time
13
+ 3. βœ… This repository
14
+
15
+ ---
16
+
17
+ ## 🎯 Deployment Steps
18
+
19
+ ### Step 1: Run Preparation Script (Already Done!)
20
+
21
+ ```bash
22
+ cd /home/thadillo/MyProjects/participatory_planner
23
+ ./prepare_hf_deployment.sh
24
+ ```
25
+
26
+ **Status**: βœ… Complete! Files are ready.
27
+
28
+ ---
29
+
30
+ ### Step 2: Create Hugging Face Space
31
+
32
+ 1. **Go to**: https://huggingface.co/new-space
33
+
34
+ 2. **Fill in the form**:
35
+ - **Space name**: `participatory-planner` (or your choice)
36
+ - **License**: MIT
37
+ - **SDK**: ⚠️ **Docker** (IMPORTANT!)
38
+ - **Hardware**: CPU Basic (free) or CPU Upgrade (Pro - faster)
39
+ - **Visibility**: Public or Private
40
+
41
+ 3. **Click**: "Create Space"
42
+
43
+ ---
44
+
45
+ ### Step 3: Upload Files
46
+
47
+ Two options:
48
+
49
+ #### Option A: Web UI (Easier)
50
+ 1. Go to your Space β†’ **Files** tab
51
+ 2. Click "Add file" β†’ "Upload files"
52
+ 3. Upload these files/folders:
53
+ ```
54
+ βœ… Dockerfile
55
+ βœ… README.md
56
+ βœ… requirements.txt
57
+ βœ… app_hf.py
58
+ βœ… wsgi.py
59
+ βœ… app/ (entire folder)
60
+ ```
61
+ 4. Commit: "Initial deployment"
62
+
63
+ #### Option B: Git Push
64
+ ```bash
65
+ # Add HF as remote (replace YOUR_USERNAME)
66
+ git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/participatory-planner
67
+
68
+ # Push
69
+ git add Dockerfile README.md requirements.txt app_hf.py wsgi.py app/
70
+ git commit -m "πŸš€ Deploy to HF Spaces"
71
+ git push hf main
72
+ ```
73
+
74
+ ---
75
+
76
+ ### Step 4: Configure Secret Key
77
+
78
+ 1. **Go to**: Your Space β†’ Settings β†’ Repository secrets
79
+ 2. **Click**: "New secret"
80
+ 3. **Add**:
81
+ - **Name**: `FLASK_SECRET_KEY`
82
+ - **Value**: `9fd11d101e36efbd3a7893f56d604b860403d247633547586c41453118e69b00`
83
+ 4. **Save**
84
+
85
+ ---
86
+
87
+ ### Step 5: Wait for Build
88
+
89
+ 1. Go to **Logs** tab
90
+ 2. Watch the build (5-10 minutes first time)
91
+ 3. Look for:
92
+ ```
93
+ βœ“ Running on http://0.0.0.0:7860
94
+ ```
95
+ 4. Status will change: "Building" β†’ "Running" βœ…
96
+
97
+ ---
98
+
99
+ ### Step 6: Access Your App! πŸŽ‰
100
+
101
+ Your app is live at:
102
+ - **Direct**: `https://huggingface.co/spaces/YOUR_USERNAME/participatory-planner`
103
+ - **Embedded**: `https://YOUR_USERNAME-participatory-planner.hf.space`
104
+
105
+ **Login**: `ADMIN123`
106
+
107
+ ---
108
+
109
+ ## βœ… Verify Deployment
110
+
111
+ Test these features:
112
+ - [ ] App loads correctly
113
+ - [ ] Admin login works
114
+ - [ ] Can create tokens
115
+ - [ ] Can submit contributions
116
+ - [ ] AI analysis works
117
+ - [ ] Dashboard displays
118
+ - [ ] Training panel accessible
119
+ - [ ] Data persists after refresh
120
+
121
+ ---
122
+
123
+ ## πŸ”§ Troubleshooting
124
+
125
+ ### Build Failed?
126
+ - Check **Logs** tab for error details
127
+ - Verify Docker SDK was selected
128
+ - Try CPU Upgrade if out of memory
129
+
130
+ ### App Not Loading?
131
+ - Wait 10 minutes for model download
132
+ - Check Logs for errors
133
+ - Verify port 7860 in Dockerfile
134
+
135
+ ### Database Issues?
136
+ - Database creates automatically on first run
137
+ - Stored in `/data/app.db` (persists)
138
+ - Check Space hasn't run out of storage
139
+
140
+ ---
141
+
142
+ ## 🎁 Bonus: Pro Features
143
+
144
+ With your HF Pro account:
145
+
146
+ ### Faster Performance
147
+ - Settings β†’ Hardware β†’ CPU Upgrade (4 vCPU, 32GB RAM)
148
+
149
+ ### Private Space
150
+ - Settings β†’ Visibility β†’ Private
151
+ - Perfect for confidential planning sessions
152
+
153
+ ### Custom Domain
154
+ - Settings β†’ Custom domains
155
+ - Add: `planning.yourdomain.com`
156
+
157
+ ### Always-On
158
+ - Settings β†’ Sleep time β†’ Never sleep
159
+ - No cold starts!
160
+
161
+ ---
162
+
163
+ ## πŸ“Š What Gets Deployed
164
+
165
+ ### Included:
166
+ - βœ… Full application code (`app/`)
167
+ - βœ… AI models (download on first run)
168
+ - βœ… Database (created automatically)
169
+ - βœ… All features working
170
+
171
+ ### NOT Included:
172
+ - ❌ Local development files
173
+ - ❌ Your local database
174
+ - ❌ venv/
175
+ - ❌ .env file (use Secrets instead)
176
+
177
+ ---
178
+
179
+ ## πŸ” Security Notes
180
+
181
+ ### Current Setup:
182
+ - βœ… Secret key stored in HF Secrets (not in code)
183
+ - βœ… HTTPS enabled automatically
184
+ - βœ… Session cookies configured
185
+ - ⚠️ Default admin token: `ADMIN123`
186
+
187
+ ### For Production:
188
+ 1. **Change admin token** to something secure
189
+ 2. **Enable Space authentication** (Settings)
190
+ 3. **Make Space private** if handling sensitive data
191
+ 4. **Regular backups** via Export feature
192
+
193
+ ---
194
+
195
+ ## πŸ“ˆ Performance
196
+
197
+ ### Expected:
198
+ - **Build time**: 5-10 minutes (first time)
199
+ - **Model download**: 5 minutes (first run, then cached)
200
+ - **Startup time**: 10-20 seconds
201
+ - **Inference**: <3 seconds per submission
202
+ - **Storage**: ~2GB (model + database)
203
+
204
+ ### With Pro CPU Upgrade:
205
+ - ⚑ 2x faster inference
206
+ - ⚑ Faster model loading
207
+ - ⚑ Better for large sessions (100+ submissions)
208
+
209
+ ---
210
+
211
+ ## πŸ“ž Support
212
+
213
+ ### Documentation:
214
+ - **Full guide**: `HUGGINGFACE_DEPLOYMENT.md`
215
+ - **Checklist**: `HF_DEPLOYMENT_CHECKLIST.md`
216
+ - **HF Docs**: https://huggingface.co/docs/hub/spaces
217
+
218
+ ### Help:
219
+ - **Logs**: Your Space β†’ Logs tab
220
+ - **HF Discord**: https://hf.co/join/discord
221
+ - **HF Forum**: https://discuss.huggingface.co/
222
+
223
+ ---
224
+
225
+ ## 🎯 Quick Summary
226
+
227
+ ```
228
+ 1. Create Space (SDK: Docker) β†’ 1 min
229
+ 2. Upload files β†’ 2 min
230
+ 3. Add FLASK_SECRET_KEY to Secrets β†’ 1 min
231
+ 4. Wait for build β†’ 10 min
232
+ 5. Test & enjoy! β†’ ∞
233
+
234
+ Total: ~15 minutes
235
+ Cost: $0 (included in HF Pro!)
236
+ ```
237
+
238
+ ---
239
+
240
+ ## ✨ You're Ready!
241
+
242
+ Everything is configured and tested. Just follow the steps above.
243
+
244
+ **Next**: Click this link β†’ https://huggingface.co/new-space
245
+
246
+ Good luck! πŸš€πŸŽ‰
247
+
248
+ ---
249
+
250
+ **Files prepared by**: `prepare_hf_deployment.sh`
251
+ **Deployment verified**: βœ… Ready
252
+ **Secret key generated**: βœ… Ready
253
+ **Docker config**: βœ… Port 7860
254
+ **Database**: βœ… Auto-creates at `/data/app.db`
255
+
HF_DEPLOYMENT_CHECKLIST.md ADDED
@@ -0,0 +1,315 @@
1
+ # πŸš€ Hugging Face Deployment Checklist
2
+
3
+ ## βœ… Pre-Deployment Checklist
4
+
5
+ ### 1. Files Ready
6
+ - [x] `Dockerfile.hf` - HF-compatible Docker configuration
7
+ - [x] `app_hf.py` - HF Spaces entry point (port 7860)
8
+ - [x] `README_HF.md` - Space description with YAML metadata
9
+ - [x] `requirements.txt` - All dependencies included
10
+ - [x] `app/` directory - Complete application code
11
+ - [x] `.gitignore` - Ignore patterns configured
12
+ - [x] `wsgi.py` - WSGI application wrapper
13
+
14
+ ### 2. Configuration Verified
15
+ - [x] Port 7860 configured in Dockerfile.hf and app_hf.py
16
+ - [x] Database path uses environment variable (DATABASE_PATH=/data/app.db)
17
+ - [x] HuggingFace cache configured (/data/.cache/huggingface)
18
+ - [x] Session cookies configured for iframe embedding
19
+ - [x] Health check endpoint configured
20
+ - [x] Models directory configured (models/finetuned/)
21
+
22
+ ### 3. Security
23
+ - [ ] **IMPORTANT**: Update FLASK_SECRET_KEY in HF Secrets
24
+ - Use this secure key: `9fd11d101e36efbd3a7893f56d604b860403d247633547586c41453118e69b00`
25
+ - [ ] Consider changing ADMIN123 token to something more secure
26
+ - [ ] Review .hfignore to exclude sensitive files
27
+
28
+ ---
29
+
30
+ ## 🎯 Deployment Steps
31
+
32
+ ### Option A: Web UI (Recommended - 5 minutes)
33
+
34
+ #### Step 1: Create Space
35
+ 1. Go to https://huggingface.co/new-space
36
+ 2. Login with your HF Pro account
37
+ 3. Fill in:
38
+ - **Space name**: `participatory-planner`
39
+ - **License**: MIT
40
+ - **SDK**: Docker ⚠️ IMPORTANT
41
+ - **Hardware**: CPU Basic (or CPU Upgrade for Pro)
42
+ - **Visibility**: Public or Private
43
+
44
+ #### Step 2: Prepare Files for Upload
45
+ Run this command to copy HF-specific files:
46
+ ```bash
47
+ cd /home/thadillo/MyProjects/participatory_planner
48
+
49
+ # Copy HF-specific files to root
50
+ cp Dockerfile.hf Dockerfile
51
+ cp README_HF.md README.md
52
+ ```
53
+
54
+ #### Step 3: Upload Files via Web UI
55
+ Upload these files/folders to your Space:
56
+ - βœ… `Dockerfile` (the HF version)
57
+ - βœ… `README.md` (the HF version with YAML header)
58
+ - βœ… `requirements.txt`
59
+ - βœ… `app_hf.py`
60
+ - βœ… `wsgi.py`
61
+ - βœ… `app/` (entire folder with all subfolders)
62
+ - βœ… `.gitignore`
63
+
64
+ **DO NOT upload:**
65
+ - ❌ `venv/` (Python virtual environment)
66
+ - ❌ `instance/` (local database)
67
+ - ❌ `models/finetuned/` (will be created on HF)
68
+ - ❌ `.git/` (Git history)
69
+ - ❌ `__pycache__/` (Python cache)
70
+
71
+ #### Step 4: Configure Secrets
72
+ 1. Go to your Space β†’ Settings β†’ Repository secrets
73
+ 2. Click "Add a secret"
74
+ 3. Add:
75
+ - **Name**: `FLASK_SECRET_KEY`
76
+ - **Value**: `9fd11d101e36efbd3a7893f56d604b860403d247633547586c41453118e69b00`
77
+ 4. (Optional) Add:
78
+ - **Name**: `FLASK_ENV`
79
+ - **Value**: `production`
80
+
81
+ #### Step 5: Wait for Build
82
+ 1. Go to "Logs" tab
83
+ 2. Watch the build process (5-10 minutes first time)
84
+ 3. Look for: `Running on http://0.0.0.0:7860`
85
+ 4. Space will show "Building" β†’ "Running"
86
+
87
+ #### Step 6: Access & Test
88
+ 1. Visit: `https://huggingface.co/spaces/YOUR_USERNAME/participatory-planner`
89
+ 2. Login with: `ADMIN123`
90
+ 3. Test all features:
91
+ - [ ] Registration page loads
92
+ - [ ] Can create tokens
93
+ - [ ] Can submit contributions
94
+ - [ ] AI analysis works
95
+ - [ ] Dashboard displays correctly
96
+ - [ ] Map visualization works
97
+ - [ ] Training panel accessible
98
+ - [ ] Export/Import works
99
+
100
+ ---
101
+
102
+ ### Option B: Git CLI (For Advanced Users)
103
+
104
+ #### Step 1: Install Git LFS
105
+ ```bash
106
+ git lfs install
107
+ ```
108
+
109
+ #### Step 2: Create Space via CLI
110
+ ```bash
111
+ # Install HF CLI
112
+ pip install huggingface_hub
113
+
114
+ # Login to HF
115
+ huggingface-cli login
116
+
117
+ # Create space (replace YOUR_USERNAME)
118
+ huggingface-cli repo create participatory-planner --type space --space_sdk docker
119
+ ```
120
+
121
+ #### Step 3: Prepare Repository
122
+ ```bash
123
+ cd /home/thadillo/MyProjects/participatory_planner
124
+
125
+ # Copy HF-specific files
126
+ cp Dockerfile.hf Dockerfile
127
+ cp README_HF.md README.md
128
+
129
+ # Add HF remote
130
+ git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/participatory-planner
131
+ ```
132
+
133
+ #### Step 4: Commit and Push
134
+ ```bash
135
+ # Make sure .hfignore is in place
136
+ git add .
137
+ git commit -m "πŸš€ Initial deployment to Hugging Face Spaces"
138
+ git push hf main
139
+ ```
140
+
141
+ #### Step 5: Configure secrets via Web UI
142
+ (Same as Option A, Step 4)
143
+
144
+ ---
145
+
146
+ ## πŸ“‹ Post-Deployment Verification
147
+
148
+ ### Essential Tests
149
+ - [ ] Space builds successfully (check Logs)
150
+ - [ ] App is accessible at Space URL
151
+ - [ ] Admin login works (ADMIN123)
152
+ - [ ] Database persists between restarts
153
+ - [ ] AI model loads successfully
154
+ - [ ] File uploads work
155
+ - [ ] Map loads correctly
156
+
157
+ ### Performance Checks
158
+ - [ ] First load time < 3 seconds (after warm-up)
159
+ - [ ] AI analysis completes in < 5 seconds
160
+ - [ ] No memory errors in logs
161
+ - [ ] Model caching works (subsequent loads faster)
162
+
163
+ ### Security Checks
164
+ - [ ] FLASK_SECRET_KEY is set in Secrets (not in code)
165
+ - [ ] No sensitive data in logs
166
+ - [ ] HTTPS works correctly
167
+ - [ ] Session cookies work in iframe
168
+
169
+ ---
170
+
171
+ ## πŸ”§ Troubleshooting
172
+
173
+ ### Build Fails
174
+ **Error**: "Out of memory during build"
175
+ - **Solution**: Upgrade to CPU Upgrade hardware in Settings
176
+
177
+ **Error**: "Port 7860 not responding"
178
+ - **Solution**: Verify Dockerfile exposes 7860 and app_hf.py uses it
179
+
180
+ ### Runtime Issues
181
+ **Error**: "Database locked" or "Database resets"
182
+ - **Solution**: Verify DATABASE_PATH=/data/app.db in Dockerfile
183
+
184
+ **Error**: "Model download timeout"
185
+ - **Solution**: First download takes 10+ minutes. Be patient. Check Logs.
186
+
187
+ **Error**: "Can't access Space"
188
+ - **Solution**: Check Space visibility (Settings). Set to Public.
189
+
190
+ ### AI Model Issues
191
+ **Error**: "Transformers error on first run"
192
+ - **Solution**: Models download on first use. Check HF_HOME=/data/.cache
193
+
194
+ **Error**: "CUDA/GPU errors"
195
+ - **Solution**: App uses CPU by default. Don't select GPU hardware unless needed.
196
+
197
+ ---
198
+
199
+ ## πŸ“Š Monitoring
200
+
201
+ ### Daily Checks
202
+ - View Logs tab for errors
203
+ - Check Space status badge (green = good)
204
+ - Verify database size (Settings β†’ Storage)
205
+
206
+ ### Weekly Maintenance
207
+ - Export data backup via admin panel
208
+ - Review error logs
209
+ - Check model storage size
210
+ - Update dependencies if needed
211
+
212
+ ---
213
+
214
+ ## πŸ”„ Updates & Rollbacks
215
+
216
+ ### To Update Your Space
217
+ Via Git:
218
+ ```bash
219
+ git add .
220
+ git commit -m "Update: description of changes"
221
+ git push hf main
222
+ ```
223
+
224
+ Via Web UI:
225
+ 1. Go to Files tab
226
+ 2. Edit files directly
227
+ 3. Commit changes
228
+
229
+ ### To Rollback
230
+ 1. Go to Files β†’ Commits
231
+ 2. Find last working commit
232
+ 3. Click "Revert to this commit"
233
+
234
+ ---
235
+
236
+ ## πŸ’‘ Optimization Tips
237
+
238
+ ### For Better Performance
239
+ - Enable CPU Upgrade (4 vCPU, 32GB RAM) - Free with Pro!
240
+ - Use model presets (DeBERTa-v3-small recommended)
241
+ - Set persistent storage for model cache
242
+
243
+ ### For Production Use
244
+ 1. Change admin token from ADMIN123
245
+ 2. Enable Space authentication (Settings)
246
+ 3. Set up custom domain (Pro feature)
247
+ 4. Enable always-on (Pro feature)
248
+ 5. Set up monitoring alerts
249
+
250
+ ---
251
+
252
+ ## πŸŽ‰ Success Criteria
253
+
254
+ Your deployment is successful when:
255
+ - βœ… Space status shows "Running" (green badge)
256
+ - βœ… No errors in Logs for 5 minutes
257
+ - βœ… Admin login works
258
+ - βœ… AI analysis completes successfully
259
+ - βœ… Data persists after refresh
260
+ - βœ… All features work as in local development
261
+
262
+ ---
263
+
264
+ ## πŸ“ž Support Resources
265
+
266
+ - **HF Spaces Docs**: https://huggingface.co/docs/hub/spaces
267
+ - **HF Discord**: https://hf.co/join/discord
268
+ - **App Logs**: Your Space β†’ Logs tab
269
+ - **HF Status**: https://status.huggingface.co
270
+
271
+ ---
272
+
273
+ ## πŸ” Important Security Notes
274
+
275
+ **CRITICAL - Before going public:**
276
+
277
+ 1. **Change Admin Token** in `app/models/models.py`:
278
+ ```python
279
+ if not Token.query.filter_by(token='YOUR_SECURE_TOKEN').first():
280
+     admin_token = Token(token='YOUR_SECURE_TOKEN', type='admin', ...)
281
+ ```
282
+
283
+ 2. **Use HF Secrets** (never commit secrets):
284
+ - FLASK_SECRET_KEY (already set)
285
+ - Any API keys
286
+ - Database credentials (if using external DB)
287
+
288
+ 3. **Consider Space Authentication**:
289
+ - Settings β†’ Enable authentication
290
+ - Require HF login to access
291
+
292
+ 4. **For Confidential Sessions**:
293
+ - Set Space to Private
294
+ - Use password protection
295
+ - Regular data backups
296
+
297
+ ---
298
+
299
+ ## πŸ“ Final Notes
300
+
301
+ **Estimated Deployment Time**: 10-15 minutes (first time)
302
+
303
+ **Resources Used** (with HF Pro):
304
+ - Storage: ~2GB (model cache + database)
305
+ - RAM: ~1-2GB during inference
306
+ - CPU: 2-4 cores recommended
307
+
308
+ **Cost**: $0 (included in HF Pro subscription) πŸŽ‰
309
+
310
+ **Next Step**: Click "Create Space" on huggingface.co/new-space and follow the checklist above!
311
+
312
+ ---
313
+
314
+ **Good luck with your deployment! πŸš€**
315
+
NEXT_STEPS_CATEGORIZATION.md ADDED
@@ -0,0 +1,267 @@
1
+ # 🎯 Next Steps: Sentence-Level Categorization
2
+
3
+ ## πŸ“‹ What We've Created
4
+
5
+ Your excellent observation about multi-category submissions has led to a comprehensive analysis and plan:
6
+
7
+ ### πŸ“„ Documents Created:
8
+
9
+ 1. **SENTENCE_LEVEL_CATEGORIZATION_PLAN.md** (Complete implementation plan)
10
+ - 4 solution options with pros/cons
11
+ - Detailed 7-phase implementation for sentence-level
12
+ - Database schema, UI mockups, code examples
13
+ - Migration strategy
14
+
15
+ 2. **CATEGORIZATION_DECISION_GUIDE.md** (Quick decision helper)
16
+ - Visual comparisons of approaches
17
+ - Questions to help decide
18
+ - Recommended path forward
19
+
20
+ 3. **analyze_submissions_for_sentences.py** (Data analysis script)
21
+ - Analyzes your current 60 submissions
22
+ - Shows % with multiple categories
23
+ - Identifies which need sentence-level breakdown
24
+ - Generates recommendation based on data
25
+
26
+ ---
27
+
28
+ ## πŸš€ How to Proceed
29
+
30
+ ### Step 1: Run Analysis (5 minutes) ⏰
31
+
32
+ **See the data before deciding!**
33
+
34
+ ```bash
35
+ cd /home/thadillo/MyProjects/participatory_planner
36
+ source venv/bin/activate
37
+ python analyze_submissions_for_sentences.py
38
+ ```
39
+
40
+ **This will show**:
41
+ - How many submissions contain multiple categories
42
+ - Which submissions would benefit most
43
+ - Sentence count distribution
44
+ - Data-driven recommendation
45
+
46
+ **Example output**:
47
+ ```
48
+ πŸ“Š STATISTICS
49
+ ─────────────────────────────────────────
50
+ Total Submissions: 60
51
+ Multi-category: 23 (38.3%)
52
+ Avg Sentences/Submission: 2.3
53
+
54
+ πŸ’‘ RECOMMENDATION
55
+ βœ… STRONGLY RECOMMEND sentence-level categorization
56
+ 38.3% of submissions contain multiple categories.
57
+ ```
58
+
59
+ ---
60
+
61
+ ### Step 2: Choose Your Path
62
+
63
+ Based on analysis results, pick one:
64
+
65
+ #### Path A: Full Implementation (if >40% multi-category)
66
+ ```
67
+ Timeline: 2-3 weeks
68
+ Effort: 13-20 hours
69
+ Result: Best system, maximum value
70
+ ```
71
+
72
+ **What you get**:
73
+ - βœ… Sentence-level categorization
74
+ - βœ… Collapsible UI for sentence breakdown
75
+ - βœ… Dual-mode dashboard (submission vs sentence view)
76
+ - βœ… Precise training data
77
+ - βœ… Geotag inheritance
78
+ - βœ… Category distribution per submission
79
+
80
+ **Start with**: Phase 1 (Database schema)
81
+
82
+ ---
83
+
84
+ #### Path B: Proof of Concept (if 20-40% multi-category)
85
+ ```
86
+ Timeline: 3-5 days
87
+ Effort: 4-6 hours
88
+ Result: Test before committing
89
+ ```
90
+
91
+ **What you get**:
92
+ - βœ… Sentence breakdown display (read-only)
93
+ - βœ… Shows what it WOULD look like
94
+ - βœ… No database changes (safe)
95
+ - βœ… Get user feedback
96
+ - βœ… Then decide: full implementation or not
97
+
98
+ **Start with**: UI prototype (no backend changes)
99
+
100
+ ---
101
+
102
+ #### Path C: Multi-Label (if <20% multi-category)
103
+ ```
104
+ Timeline: 2-3 days
105
+ Effort: 4-6 hours
106
+ Result: Good enough, simpler
107
+ ```
108
+
109
+ **What you get**:
110
+ - βœ… Multiple categories per submission
111
+ - βœ… Simple checkbox UI
112
+ - βœ… Fast to implement
113
+ - ❌ Less granular than sentence-level
114
+
115
+ **Start with**: Add category array field
116
+
117
+ ---
118
+
119
+ #### Path D: Keep Current (if <10% multi-category)
120
+ ```
121
+ Timeline: 0 days
122
+ Effort: 0 hours
123
+ Result: No change needed
124
+ ```
125
+
126
+ **Decision**: Current system is sufficient
127
+
128
+ ---
129
+
130
+ ### Step 3: Implementation
131
+
132
+ **Once you decide, I can**:
133
+
134
+ #### If Full Implementation (Path A):
135
+ 1. βœ… Create database migration
136
+ 2. βœ… Add SubmissionSentence model
137
+ 3. βœ… Implement sentence segmentation
138
+ 4. βœ… Update analyzer for sentence-level
139
+ 5. βœ… Build collapsible UI
140
+ 6. βœ… Update dashboard aggregation
141
+ 7. βœ… Migrate existing data
142
+ 8. βœ… Add training data updates
143
+
144
+ **I'll create**: Working feature branch with all phases
145
+
146
+ #### If Proof of Concept (Path B):
147
+ 1. βœ… Add sentence display (read-only)
148
+ 2. βœ… Show category breakdown
149
+ 3. βœ… Test with users
150
+ 4. βœ… Get feedback
151
+ 5. βœ… Then decide next steps
152
+
153
+ **I'll create**: UI prototype for testing
154
+
155
+ #### If Multi-Label (Path C):
156
+ 1. βœ… Update Submission model
157
+ 2. βœ… Change UI to checkboxes
158
+ 3. βœ… Update dashboard logic
159
+ 4. βœ… Migrate data
160
+
161
+ **I'll create**: Multi-label feature
162
+
163
+ ---
164
+
165
+ ## πŸ“Š Decision Matrix
166
+
167
+ **Use this to decide**:
168
+
169
+ | Factor | Full Sentence-Level | Proof of Concept | Multi-Label | Keep Current |
170
+ |--------|-------------------|------------------|-------------|--------------|
171
+ | Multi-category % | >40% | 20-40% | 10-20% | <10% |
172
+ | Time available | 2-3 weeks | 3-5 days | 2-3 days | - |
173
+ | Training data priority | High | Medium | Low | - |
174
+ | Analytics depth | Very important | Important | Nice to have | Not critical |
175
+ | Risk tolerance | Low (test first) | Medium | High | - |
176
+
177
+ ---
178
+
179
+ ## 🎯 My Recommendation
180
+
181
+ ### Do This Now (10 minutes):
182
+
183
+ 1. **Run the analysis script**:
184
+ ```bash
185
+ cd /home/thadillo/MyProjects/participatory_planner
186
+ source venv/bin/activate
187
+ python analyze_submissions_for_sentences.py
188
+ ```
189
+
190
+ 2. **Look at the percentage** of multi-category submissions
191
+
192
+ 3. **Decide based on data**:
193
+ - **>40%** β†’ "Let's do full sentence-level"
194
+ - **20-40%** β†’ "Let's try proof of concept first"
195
+ - **<20%** β†’ "Multi-label is probably enough"
196
+
197
+ 4. **Tell me your decision**, and I'll start implementation immediately
198
+
199
+ ---
200
+
201
+ ## πŸ’‘ Key Insights from Your Observation
202
+
203
+ You identified a **critical limitation**:
204
+
205
+ > "Dallas should establish more green spaces in South Dallas neighborhoods. Areas like Oak Cliff lack accessible parks compared to North Dallas."
206
+
207
+ **Current problem**:
208
+ - System forces ONE category
209
+ - Loses semantic richness
210
+ - Training data is imprecise
211
+
212
+ **Your solution**:
213
+ - Sentence-level categorization
214
+ - Preserve all meaning
215
+ - Better AI training
216
+
217
+ **This is exactly the right thinking!** 🎯
218
+
219
+ The analysis script will show if this pattern is common enough to warrant the implementation effort.
220
+
221
+ ---
222
+
223
+ ## πŸ“ž What I Need from You
224
+
225
+ **To proceed, please**:
226
+
227
+ 1. βœ… Run the analysis script (above)
228
+ 2. βœ… Review the output
229
+ 3. βœ… Tell me which path you want:
230
+ - **A**: Full sentence-level implementation
231
+ - **B**: Proof of concept first
232
+ - **C**: Multi-label approach
233
+ - **D**: Keep current system
234
+
235
+ 4. βœ… I'll start building immediately!
236
+
237
+ ---
238
+
239
+ ## πŸ“‚ Files Ready for You
240
+
241
+ All documentation is ready:
242
+ - βœ… `SENTENCE_LEVEL_CATEGORIZATION_PLAN.md` - Full technical plan
243
+ - βœ… `CATEGORIZATION_DECISION_GUIDE.md` - Decision helper
244
+ - βœ… `analyze_submissions_for_sentences.py` - Analysis script
245
+ - βœ… This file - Next steps summary
246
+
247
+ **Everything is prepared. Just waiting for your decision!** πŸš€
248
+
249
+ ---
250
+
251
+ ## ⏰ Timeline Estimates
252
+
253
+ | Path | Phase | Time | What Happens |
254
+ |------|-------|------|--------------|
255
+ | **A: Full** | Week 1 | 8-10h | DB, backend, analysis |
256
+ | | Week 2 | 5-8h | UI, dashboard |
257
+ | | Week 3 | 2-4h | Testing, polish |
258
+ | **B: POC** | Days 1-2 | 4-6h | UI prototype |
259
+ | | Day 3 | - | User testing |
260
+ | | Days 4-5 | - | Decide: full or abort |
261
+ | **C: Multi-label** | Days 1-2 | 4-6h | Implementation |
262
+ | | Day 3 | 1-2h | Testing |
263
+
264
+ ---
265
+
266
+ **Ready when you are!** Just run the analysis and let me know what you decide. πŸŽ‰
267
+
SENTENCE_LEVEL_CATEGORIZATION_PLAN.md ADDED
@@ -0,0 +1,830 @@
1
+ # πŸ“‹ Sentence-Level Categorization - Implementation Plan
2
+
3
+ **Problem Identified**: Single submissions often contain multiple semantic units (sentences) belonging to different categories, leading to loss of nuance.
4
+
5
+ **Example**:
6
+ > "Dallas should establish more green spaces in South Dallas neighborhoods. Areas like Oak Cliff lack accessible parks compared to North Dallas."
7
+ - Sentence 1: **Objective** (should establish...)
8
+ - Sentence 2: **Problem** (lack accessible parks...)
9
+
10
+ ---
11
+
12
+ ## 🎯 Proposed Solutions (Ranked by Complexity)
13
+
14
+ ### Option 1: Sentence-Level Categorization (User's Proposal) ⭐ RECOMMENDED
15
+
16
+ **Concept**: Break submissions into sentences, categorize each individually while maintaining parent submission context.
17
+
18
+ **Pros**:
19
+ - βœ… Maximum granularity and accuracy
20
+ - βœ… Preserves all semantic information
21
+ - βœ… Better training data for fine-tuning
22
+ - βœ… More detailed analytics
23
+ - βœ… Maintains geotag/stakeholder context
24
+
25
+ **Cons**:
26
+ - ⚠️ Significant database schema changes
27
+ - ⚠️ UI complexity increases
28
+ - ⚠️ More AI inference calls (slower/costlier)
29
+ - ⚠️ Dashboard aggregation more complex
30
+
31
+ **Complexity**: High
32
+ **Value**: Very High
33
+
34
+ ---
35
+
36
+ ### Option 2: Multi-Label Classification (Simpler Alternative)
37
+
38
+ **Concept**: Assign multiple categories to a single submission.
39
+
40
+ **Example**: Submission β†’ [Objective, Problem]
41
+
42
+ **Pros**:
43
+ - βœ… Simpler implementation (no schema change)
44
+ - βœ… Faster than sentence-level
45
+ - βœ… Captures multi-faceted submissions
46
+ - βœ… Minimal UI changes
47
+
48
+ **Cons**:
49
+ - ❌ Loses granularity (which sentence is which?)
50
+ - ❌ Can't map specific sentences to categories
51
+ - ❌ Training data less precise
52
+ - ❌ Dashboard becomes ambiguous
53
+
54
+ **Complexity**: Low
55
+ **Value**: Medium
56
+
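+ A minimal sketch of what this could look like, assuming a JSON-serialized `categories` column is added to the existing `Submission` model (illustrative only, not part of the current schema):
+
+ ```python
+ import json
+ from app import db
+
+ class Submission(db.Model):
+     # ... existing fields ...
+     # Hypothetical: JSON-encoded list of category names, e.g. '["Objectives", "Problem"]'
+     categories_json = db.Column(db.Text, nullable=True)
+
+     def get_categories(self):
+         """Return all categories assigned to this submission."""
+         return json.loads(self.categories_json) if self.categories_json else []
+
+     def set_categories(self, categories):
+         """Store a list of category names."""
+         self.categories_json = json.dumps(categories)
+ ```
+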
57
+ ---
58
+
59
+ ### Option 3: Primary + Secondary Categories (Hybrid)
60
+
61
+ **Concept**: Main category + optional secondary categories.
62
+
63
+ **Example**: Submission β†’ Primary: Objective, Secondary: [Problem, Values]
64
+
65
+ **Pros**:
66
+ - βœ… Preserves primary focus
67
+ - βœ… Acknowledges complexity
68
+ - βœ… Moderate implementation effort
69
+ - βœ… Good for hierarchical analysis
70
+
71
+ **Cons**:
72
+ - ❌ Still loses sentence-level detail
73
+ - ❌ Arbitrary primary/secondary distinction
74
+ - ❌ Training data structure unclear
75
+
76
+ **Complexity**: Medium
77
+ **Value**: Medium
78
+
79
+ ---
80
+
81
+ ### Option 4: Aspect-Based Sentiment Analysis (Advanced)
82
+
83
+ **Concept**: Extract aspects/topics from each sentence, then categorize aspects.
84
+
85
+ **Example**:
86
+ - Aspect: "green spaces" β†’ Category: Objective, Sentiment: Positive desire
87
+ - Aspect: "park access disparity" β†’ Category: Problem, Sentiment: Negative
88
+
89
+ **Pros**:
90
+ - βœ… Very sophisticated analysis
91
+ - βœ… Captures nuance and sentiment
92
+ - βœ… Excellent for research
93
+
94
+ **Cons**:
95
+ - ❌ Very complex implementation
96
+ - ❌ Requires different AI models
97
+ - ❌ Overkill for planning sessions
98
+ - ❌ Harder to explain to stakeholders
99
+
100
+ **Complexity**: Very High
101
+ **Value**: Medium (unless research-focused)
102
+
103
+ ---
104
+
105
+ ## πŸ—οΈ Implementation Plan: Option 1 (Sentence-Level Categorization)
106
+
107
+ ### Phase 1: Database Schema Changes
108
+
109
+ #### New Model: `SubmissionSentence`
110
+
111
+ ```python
112
+ class SubmissionSentence(db.Model):
113
+ __tablename__ = 'submission_sentences'
114
+
115
+ id = db.Column(db.Integer, primary_key=True)
116
+ submission_id = db.Column(db.Integer, db.ForeignKey('submissions.id'), nullable=False)
117
+ sentence_index = db.Column(db.Integer, nullable=False) # 0, 1, 2...
118
+ text = db.Column(db.Text, nullable=False)
119
+ category = db.Column(db.String(50), nullable=True)
120
+ confidence = db.Column(db.Float, nullable=True)
121
+ created_at = db.Column(db.DateTime, default=datetime.utcnow)
122
+
123
+ # Relationships
124
+ submission = db.relationship('Submission', backref='sentences')
125
+
126
+ # Composite unique constraint
127
+ __table_args__ = (
128
+ db.UniqueConstraint('submission_id', 'sentence_index', name='uq_submission_sentence'),
129
+ )
130
+ ```
131
+
132
+ #### Update `Submission` Model
133
+
134
+ ```python
135
+ class Submission(db.Model):
136
+ # ... existing fields ...
137
+
138
+ # NEW: Flag to track if sentence-level analysis is done
139
+ sentence_analysis_done = db.Column(db.Boolean, default=False)
140
+
141
+ # DEPRECATED: category (keep for backward compatibility)
142
+ # category = db.Column(db.String(50), nullable=True)
143
+
144
+ def get_primary_category(self):
145
+ """Get most frequent category from sentences"""
146
+ if not self.sentences:
147
+ return self.category # Fallback to old system
148
+
149
+ from collections import Counter
150
+ categories = [s.category for s in self.sentences if s.category]
151
+ if not categories:
152
+ return None
153
+ return Counter(categories).most_common(1)[0][0]
154
+
155
+ def get_category_distribution(self):
156
+ """Get percentage of each category in this submission"""
157
+ if not self.sentences:
158
+ return {self.category: 100} if self.category else {}
159
+
160
+ from collections import Counter
161
+ categories = [s.category for s in self.sentences if s.category]
162
+ total = len(categories)
163
+ if total == 0:
164
+ return {}
165
+
166
+ counts = Counter(categories)
167
+ return {cat: (count/total)*100 for cat, count in counts.items()}
168
+ ```
169
+
170
+ #### Update `TrainingExample` Model
171
+
172
+ ```python
173
+ class TrainingExample(db.Model):
174
+ # ... existing fields ...
175
+
176
+ # NEW: Link to sentence instead of submission
177
+ sentence_id = db.Column(db.Integer, db.ForeignKey('submission_sentences.id'), nullable=True)
178
+
179
+ # Keep submission_id for backward compatibility
180
+ submission_id = db.Column(db.Integer, db.ForeignKey('submissions.id'), nullable=True)
181
+
182
+ # Relationships
183
+ sentence = db.relationship('SubmissionSentence', backref='training_examples')
184
+ ```
185
+
186
+ ---
187
+
188
+ ### Phase 2: Sentence Segmentation Logic
189
+
190
+ #### New Module: `app/utils/text_processor.py`
191
+
192
+ ```python
193
+ import re
194
+ import nltk
195
+ from typing import List
196
+
197
+ # Download required NLTK data (run once)
198
+ # nltk.download('punkt')
199
+
200
+ class TextProcessor:
201
+ """Handle sentence segmentation and text processing"""
202
+
203
+ @staticmethod
204
+ def segment_into_sentences(text: str) -> List[str]:
205
+ """
206
+ Break text into sentences using multiple strategies.
207
+
208
+ Strategies:
209
+ 1. NLTK punkt tokenizer (primary)
210
+ 2. Regex-based fallback
211
+ 3. Min/max length constraints
212
+ """
213
+ # Clean text
214
+ text = text.strip()
215
+
216
+ # Try NLTK first (better accuracy)
217
+ try:
218
+ from nltk.tokenize import sent_tokenize
219
+ sentences = sent_tokenize(text)
220
+ except Exception:
221
+ # Fallback: regex-based segmentation
222
+ sentences = TextProcessor._regex_segmentation(text)
223
+
224
+ # Clean and filter
225
+ sentences = [s.strip() for s in sentences if s.strip()]
226
+
227
+ # Filter out very short "sentences" (likely not meaningful)
228
+ sentences = [s for s in sentences if len(s.split()) >= 3]
229
+
230
+ return sentences
231
+
232
+ @staticmethod
233
+ def _regex_segmentation(text: str) -> List[str]:
234
+ """Fallback sentence segmentation using regex"""
235
+ # Split on period, exclamation, question mark (followed by space or end)
236
+ pattern = r'(?<=[.!?])\s+(?=[A-Z])|(?<=[.!?])$'
237
+ sentences = re.split(pattern, text)
238
+ return [s.strip() for s in sentences if s.strip()]
239
+
240
+ @staticmethod
241
+ def is_valid_sentence(sentence: str) -> bool:
242
+ """Check if sentence is valid for categorization"""
243
+ # Must have at least 3 words
244
+ if len(sentence.split()) < 3:
245
+ return False
246
+
247
+ # Must have some alphabetic characters
248
+ if not any(c.isalpha() for c in sentence):
249
+ return False
250
+
251
+ # Not just a list item or fragment
252
+ if sentence.strip().startswith('-') or sentence.strip().startswith('β€’'):
253
+ return False
254
+
255
+ return True
256
+ ```
257
+
258
+ **Dependencies to add to `requirements.txt`**:
259
+ ```
260
+ nltk>=3.8.0
261
+ ```
262
+
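+ NLTK also needs its `punkt` data downloaded once. A small guard for this (the same pattern used in `analyze_submissions_for_sentences.py`):
+
+ ```python
+ import nltk
+
+ # Fetch the punkt sentence tokenizer data only if it is not already installed
+ try:
+     nltk.data.find('tokenizers/punkt')
+ except LookupError:
+     nltk.download('punkt', quiet=True)
+ ```
+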
263
+ ---
264
+
265
+ ### Phase 3: Analysis Pipeline Updates
266
+
267
+ #### Update `app/analyzer.py`
268
+
269
+ ```python
270
+ class SubmissionAnalyzer:
271
+ # ... existing code ...
272
+
273
+ def analyze_with_sentences(self, submission_text: str):
274
+ """
275
+ Analyze submission at sentence level.
276
+
277
+ Returns:
278
+ List[Dict]: List of {text: str, category: str, confidence: float}
279
+ """
280
+ from app.utils.text_processor import TextProcessor
281
+
282
+ # Segment into sentences
283
+ sentences = TextProcessor.segment_into_sentences(submission_text)
284
+
285
+ # Classify each sentence
286
+ results = []
287
+ for sentence in sentences:
288
+ if TextProcessor.is_valid_sentence(sentence):
289
+ category = self.analyze(sentence)
290
+ # Get confidence if using fine-tuned model
291
+ confidence = self._get_last_confidence() if self.model_type == 'finetuned' else None
292
+
293
+ results.append({
294
+ 'text': sentence,
295
+ 'category': category,
296
+ 'confidence': confidence
297
+ })
298
+
299
+ return results
300
+
301
+ def _get_last_confidence(self):
302
+ """Store and return last prediction confidence"""
303
+ # Implementation depends on model type
304
+ return getattr(self, '_last_confidence', None)
305
+ ```
306
+
307
+ #### Update Analysis Endpoint: `app/routes/admin.py`
308
+
309
+ ```python
310
+ @bp.route('/api/analyze', methods=['POST'])
311
+ @admin_required
312
+ def analyze_submissions():
313
+ data = request.json
314
+ analyze_all = data.get('analyze_all', False)
315
+ use_sentences = data.get('use_sentences', True) # NEW: sentence-level flag
316
+
317
+ # Get submissions to analyze
318
+ if analyze_all:
319
+ to_analyze = Submission.query.all()
320
+ else:
321
+ to_analyze = Submission.query.filter_by(sentence_analysis_done=False).all()
322
+
323
+ if not to_analyze:
324
+ return jsonify({'success': False, 'error': 'No submissions to analyze'}), 400
325
+
326
+ analyzer = get_analyzer()
327
+ success_count = 0
328
+ error_count = 0
329
+
330
+ for submission in to_analyze:
331
+ try:
332
+ if use_sentences:
333
+ # NEW: Sentence-level analysis
334
+ sentence_results = analyzer.analyze_with_sentences(submission.message)
335
+
336
+ # Clear old sentences
337
+ SubmissionSentence.query.filter_by(submission_id=submission.id).delete()
338
+
339
+ # Create new sentence records
340
+ for idx, result in enumerate(sentence_results):
341
+ sentence = SubmissionSentence(
342
+ submission_id=submission.id,
343
+ sentence_index=idx,
344
+ text=result['text'],
345
+ category=result['category'],
346
+ confidence=result.get('confidence')
347
+ )
348
+ db.session.add(sentence)
349
+
350
+ submission.sentence_analysis_done = True
351
+ # Set primary category for backward compatibility
352
+ submission.category = submission.get_primary_category()
353
+ else:
354
+ # OLD: Submission-level analysis (backward compatible)
355
+ category = analyzer.analyze(submission.message)
356
+ submission.category = category
357
+
358
+ success_count += 1
359
+
360
+ except Exception as e:
361
+ logger.error(f"Error analyzing submission {submission.id}: {e}")
362
+ error_count += 1
363
+ continue
364
+
365
+ db.session.commit()
366
+
367
+ return jsonify({
368
+ 'success': True,
369
+ 'analyzed': success_count,
370
+ 'errors': error_count,
371
+ 'sentence_level': use_sentences
372
+ })
373
+ ```
374
+
375
+ ---
376
+
377
+ ### Phase 4: UI/UX Updates
378
+
379
+ #### A. Submissions Page - Collapsible Sentence View
380
+
381
+ **Template Update: `app/templates/admin/submissions.html`**
382
+
383
+ ```html
384
+ <!-- Submission Card -->
385
+ <div class="card mb-3">
386
+ <div class="card-header d-flex justify-content-between align-items-center">
387
+ <div>
388
+ <strong>{{ submission.contributor_type }}</strong>
389
+ <span class="badge bg-secondary">{{ submission.timestamp.strftime('%Y-%m-%d %H:%M') }}</span>
390
+ </div>
391
+ <div>
392
+ {% if submission.sentence_analysis_done %}
393
+ <button class="btn btn-sm btn-outline-primary"
394
+ data-bs-toggle="collapse"
395
+ data-bs-target="#sentences-{{ submission.id }}">
396
+ <i class="bi bi-list-nested"></i> View Sentences ({{ submission.sentences|length }})
397
+ </button>
398
+ {% endif %}
399
+ </div>
400
+ </div>
401
+
402
+ <div class="card-body">
403
+ <!-- Original Message -->
404
+ <p class="mb-2">{{ submission.message }}</p>
405
+
406
+ <!-- Primary Category (backward compatible) -->
407
+ <div class="mb-2">
408
+ <strong>Primary Category:</strong>
409
+ <span class="badge bg-info">{{ submission.get_primary_category() or 'Unanalyzed' }}</span>
410
+ </div>
411
+
412
+ <!-- Category Distribution -->
413
+ {% if submission.sentence_analysis_done %}
414
+ <div class="mb-2">
415
+ <strong>Category Distribution:</strong>
416
+ {% for category, percentage in submission.get_category_distribution().items() %}
417
+ <span class="badge bg-secondary">{{ category }}: {{ "%.0f"|format(percentage) }}%</span>
418
+ {% endfor %}
419
+ </div>
420
+ {% endif %}
421
+
422
+ <!-- Collapsible Sentence Details -->
423
+ {% if submission.sentence_analysis_done %}
424
+ <div class="collapse mt-3" id="sentences-{{ submission.id }}">
425
+ <div class="border-start border-primary ps-3">
426
+ <h6>Sentence Breakdown:</h6>
427
+ {% for sentence in submission.sentences %}
428
+ <div class="mb-2 p-2 bg-light rounded">
429
+ <div class="d-flex justify-content-between align-items-start">
430
+ <div class="flex-grow-1">
431
+ <small class="text-muted">Sentence {{ sentence.sentence_index + 1 }}:</small>
432
+ <p class="mb-1">{{ sentence.text }}</p>
433
+ </div>
434
+ <div>
435
+ <select class="form-select form-select-sm"
436
+ onchange="updateSentenceCategory({{ sentence.id }}, this.value)">
437
+ <option value="">Uncategorized</option>
438
+ {% for cat in categories %}
439
+ <option value="{{ cat }}"
440
+ {% if sentence.category == cat %}selected{% endif %}>
441
+ {{ cat }}
442
+ </option>
443
+ {% endfor %}
444
+ </select>
445
+ </div>
446
+ </div>
447
+ {% if sentence.confidence %}
448
+ <small class="text-muted">Confidence: {{ "%.0f"|format(sentence.confidence * 100) }}%</small>
449
+ {% endif %}
450
+ </div>
451
+ {% endfor %}
452
+ </div>
453
+ </div>
454
+ {% endif %}
455
+ </div>
456
+ </div>
457
+ ```
458
+
459
+ **JavaScript Update**:
460
+
461
+ ```javascript
462
+ function updateSentenceCategory(sentenceId, category) {
463
+ fetch(`/admin/api/update-sentence-category/${sentenceId}`, {
464
+ method: 'POST',
465
+ headers: {'Content-Type': 'application/json'},
466
+ body: JSON.stringify({category: category})
467
+ })
468
+ .then(response => response.json())
469
+ .then(data => {
470
+ if (data.success) {
471
+ showToast('Sentence category updated', 'success');
472
+ // Optionally refresh to update distribution
473
+ } else {
474
+ showToast('Error: ' + data.error, 'error');
475
+ }
476
+ });
477
+ }
478
+ ```
479
+
480
+ #### B. Dashboard Updates - Aggregation Strategy
481
+
482
+ **Two Aggregation Modes**:
483
+
484
+ 1. **Submission-Based** (backward compatible): Count primary category per submission
485
+ 2. **Sentence-Based** (new): Count all sentences by category
486
+
487
+ **Template Update: `app/templates/admin/dashboard.html`**
488
+
489
+ ```html
490
+ <!-- Aggregation Mode Selector -->
491
+ <div class="mb-3">
492
+ <label>View Mode:</label>
493
+ <div class="btn-group" role="group">
494
+ <input type="radio" class="btn-check" name="viewMode" id="viewSubmissions"
495
+ value="submissions" checked onchange="updateDashboard()">
496
+ <label class="btn btn-outline-primary" for="viewSubmissions">
497
+ By Submissions
498
+ </label>
499
+
500
+ <input type="radio" class="btn-check" name="viewMode" id="viewSentences"
501
+ value="sentences" onchange="updateDashboard()">
502
+ <label class="btn btn-outline-primary" for="viewSentences">
503
+ By Sentences
504
+ </label>
505
+ </div>
506
+ </div>
507
+
508
+ <!-- Category Chart (updates based on mode) -->
509
+ <canvas id="categoryChart"></canvas>
510
+ ```
511
+
512
+ **Route Update: `app/routes/admin.py`**
513
+
514
+ ```python
515
+ @bp.route('/dashboard')
516
+ @admin_required
517
+ def dashboard():
518
+ analyzed = Submission.query.filter(Submission.category != None).count() > 0
519
+
520
+ if not analyzed:
521
+ flash('Please analyze submissions first', 'warning')
522
+ return redirect(url_for('admin.overview'))
523
+
524
+ # NEW: Get view mode from query param
525
+ view_mode = request.args.get('mode', 'submissions') # 'submissions' or 'sentences'
526
+
527
+ submissions = Submission.query.filter(Submission.category != None).all()
528
+
529
+ # Contributor stats (unchanged)
530
+ contributor_stats = db.session.query(
531
+ Submission.contributor_type,
532
+ db.func.count(Submission.id)
533
+ ).group_by(Submission.contributor_type).all()
534
+
535
+ # Category stats - MODE DEPENDENT
536
+ if view_mode == 'sentences':
537
+ # NEW: Sentence-based aggregation
538
+ category_stats = db.session.query(
539
+ SubmissionSentence.category,
540
+ db.func.count(SubmissionSentence.id)
541
+ ).filter(SubmissionSentence.category != None).group_by(SubmissionSentence.category).all()
542
+
543
+ # Breakdown by contributor (via parent submission)
544
+ breakdown = {}
545
+ for cat in CATEGORIES:
546
+ breakdown[cat] = {}
547
+ for ctype in CONTRIBUTOR_TYPES:
548
+ count = db.session.query(db.func.count(SubmissionSentence.id)).join(
549
+ Submission
550
+ ).filter(
551
+ SubmissionSentence.category == cat,
552
+ Submission.contributor_type == ctype['value']
553
+ ).scalar()
554
+ breakdown[cat][ctype['value']] = count
555
+ else:
556
+ # OLD: Submission-based aggregation (backward compatible)
557
+ category_stats = db.session.query(
558
+ Submission.category,
559
+ db.func.count(Submission.id)
560
+ ).filter(Submission.category != None).group_by(Submission.category).all()
561
+
562
+ breakdown = {}
563
+ for cat in CATEGORIES:
564
+ breakdown[cat] = {}
565
+ for ctype in CONTRIBUTOR_TYPES:
566
+ count = Submission.query.filter_by(
567
+ category=cat,
568
+ contributor_type=ctype['value']
569
+ ).count()
570
+ breakdown[cat][ctype['value']] = count
571
+
572
+ # Geotagged submissions (unchanged - submission level)
573
+ geotagged_submissions = Submission.query.filter(
574
+ Submission.latitude != None,
575
+ Submission.longitude != None,
576
+ Submission.category != None
577
+ ).all()
578
+
579
+ return render_template('admin/dashboard.html',
580
+ submissions=submissions,
581
+ contributor_stats=contributor_stats,
582
+ category_stats=category_stats,
583
+ geotagged_submissions=geotagged_submissions,
584
+ categories=CATEGORIES,
585
+ contributor_types=CONTRIBUTOR_TYPES,
586
+ breakdown=breakdown,
587
+ view_mode=view_mode)
588
+ ```
589
+
590
+ ---
591
+
592
+ ### Phase 5: Geographic Mapping Updates
593
+
594
+ **Challenge**: A single geotag now maps to multiple categories (via sentences).
595
+
596
+ **Solution Options**:
597
+
598
+ #### Option A: Multi-Category Markers (Recommended)
599
+ ```javascript
600
+ // Map marker shows all categories in this submission
601
+ marker.bindPopup(`
602
+ <strong>${submission.contributorType}</strong><br>
603
+ ${submission.message}<br>
604
+ <strong>Categories:</strong> ${submission.category_distribution}
605
+ `);
606
+ ```
607
+
608
+ #### Option B: One Marker Per Sentence-Category
609
+ ```javascript
610
+ // Create separate markers for each sentence (if has geotag)
611
+ // Color by sentence category
612
+ submission.sentences.forEach(sentence => {
613
+ if (sentence.category) {
614
+ createMarker({
615
+ lat: submission.latitude,
616
+ lng: submission.longitude,
617
+ category: sentence.category,
618
+ text: sentence.text
619
+ });
620
+ }
621
+ });
622
+ ```
623
+
624
+ **Recommendation**: Option A (cleaner map, less clutter)
625
+
626
+ ---
627
+
628
+ ### Phase 6: Training Data Updates
629
+
630
+ **Key Change**: Training examples now link to sentences, not submissions.
631
+
632
+ **Update Training Example Creation**:
633
+
634
+ ```python
635
+ @bp.route('/api/update-sentence-category/<int:sentence_id>', methods=['POST'])
636
+ @admin_required
637
+ def update_sentence_category(sentence_id):
638
+ try:
639
+ sentence = SubmissionSentence.query.get_or_404(sentence_id)
640
+ data = request.json
641
+ new_category = data.get('category')
642
+
643
+ # Store original
644
+ original_category = sentence.category
645
+
646
+ # Update sentence
647
+ sentence.category = new_category
648
+
649
+ # Create/update training example
650
+ existing = TrainingExample.query.filter_by(sentence_id=sentence_id).first()
651
+
652
+ if existing:
653
+ existing.original_category = original_category
654
+ existing.corrected_category = new_category
655
+ existing.correction_timestamp = datetime.utcnow()
656
+ else:
657
+ training_example = TrainingExample(
658
+ sentence_id=sentence_id,
659
+ submission_id=sentence.submission_id,
660
+ message=sentence.text, # Just the sentence text
661
+ original_category=original_category,
662
+ corrected_category=new_category,
663
+ contributor_type=sentence.submission.contributor_type
664
+ )
665
+ db.session.add(training_example)
666
+
667
+ # Update parent submission's primary category
668
+ submission = sentence.submission
669
+ submission.category = submission.get_primary_category()
670
+
671
+ db.session.commit()
672
+
673
+ return jsonify({'success': True})
674
+
675
+ except Exception as e:
676
+ return jsonify({'success': False, 'error': str(e)}), 500
677
+ ```
678
+
679
+ ---
680
+
681
+ ### Phase 7: Migration Strategy
682
+
683
+ #### Migration Script: `migrations/add_sentence_level.py`
684
+
685
+ ```python
686
+ """
687
+ Migration: Add sentence-level categorization support
688
+
689
+ This migration:
690
+ 1. Creates SubmissionSentence table
691
+ 2. Adds sentence_analysis_done flag to Submission
692
+ 3. Optionally migrates existing submissions to sentence-level
693
+ """
694
+
695
+ from app import create_app, db
696
+ from app.models.models import Submission, SubmissionSentence
697
+ from app.utils.text_processor import TextProcessor
698
+ import logging
699
+
700
+ logger = logging.getLogger(__name__)
701
+
702
+ def migrate_existing_submissions(auto_segment=False):
703
+ """
704
+ Migrate existing submissions to sentence-level structure.
705
+
706
+ Args:
707
+ auto_segment: If True, automatically segment and categorize
708
+ If False, just mark as pending sentence analysis
709
+ """
710
+ app = create_app()
711
+
712
+ with app.app_context():
713
+ # Create new table
714
+ db.create_all()
715
+
716
+ # Get all submissions
717
+ submissions = Submission.query.all()
718
+ logger.info(f"Migrating {len(submissions)} submissions...")
719
+
720
+ for submission in submissions:
721
+ if auto_segment and submission.category:
722
+ # Auto-segment using old category as fallback
723
+ sentences = TextProcessor.segment_into_sentences(submission.message)
724
+
725
+ for idx, sentence_text in enumerate(sentences):
726
+ sentence = SubmissionSentence(
727
+ submission_id=submission.id,
728
+ sentence_index=idx,
729
+ text=sentence_text,
730
+ category=submission.category, # Use old category as default
731
+ confidence=None
732
+ )
733
+ db.session.add(sentence)
734
+
735
+ submission.sentence_analysis_done = True
736
+ logger.info(f"Segmented submission {submission.id} into {len(sentences)} sentences")
737
+ else:
738
+ # Just mark for re-analysis
739
+ submission.sentence_analysis_done = False
740
+
741
+ db.session.commit()
742
+ logger.info("Migration complete!")
743
+
744
+ if __name__ == '__main__':
745
+ # Run with auto-segmentation disabled (safer)
746
+ migrate_existing_submissions(auto_segment=False)
747
+
748
+ # Or run with auto-segmentation (assigns old category to all sentences)
749
+ # migrate_existing_submissions(auto_segment=True)
750
+ ```
751
+
752
+ **Run migration**:
753
+ ```bash
754
+ python migrations/add_sentence_level.py
755
+ ```
756
+
757
+ ---
758
+
759
+ ## πŸ“Š Comparison: Implementation Approaches
760
+
761
+ | Aspect | Option 1: Sentence-Level | Option 2: Multi-Label | Option 3: Primary+Secondary |
762
+ |--------|-------------------------|----------------------|----------------------------|
763
+ | **Granularity** | ⭐⭐⭐⭐⭐ Highest | ⭐⭐⭐ Medium | ⭐⭐⭐ Medium |
764
+ | **Accuracy** | ⭐⭐⭐⭐⭐ Best | ⭐⭐⭐⭐ Good | ⭐⭐⭐⭐ Good |
765
+ | **Implementation** | ⭐⭐ Complex | ⭐⭐⭐⭐⭐ Simple | ⭐⭐⭐⭐ Moderate |
766
+ | **Training Data** | ⭐⭐⭐⭐⭐ Precise | ⭐⭐⭐ Ambiguous | ⭐⭐⭐ OK |
767
+ | **UI Complexity** | ⭐⭐ High | ⭐⭐⭐⭐⭐ Low | ⭐⭐⭐⭐ Low |
768
+ | **Dashboard** | ⭐⭐⭐ Flexible | ⭐⭐⭐ Limited | ⭐⭐⭐⭐ Clear |
769
+ | **Performance** | ⭐⭐⭐ OK (more API calls) | ⭐⭐⭐⭐⭐ Fast | ⭐⭐⭐⭐⭐ Fast |
770
+ | **Backward Compat** | ⭐⭐⭐⭐⭐ Yes | ⭐⭐⭐⭐⭐ Yes | ⭐⭐⭐⭐ Mostly |
771
+
772
+ ---
773
+
774
+ ## 🎯 Final Recommendation
775
+
776
+ ### **Implement Option 1: Sentence-Level Categorization**
777
+
778
+ **Why**:
779
+ 1. βœ… Matches your use case perfectly
780
+ 2. βœ… Provides maximum analytical value
781
+ 3. βœ… Better training data = better AI
782
+ 4. βœ… Backward compatible (maintains `submission.category`)
783
+ 5. βœ… Scalable to future needs
784
+
785
+ **Implementation Priority**:
786
+ 1. **Phase 1**: Database schema ⏱️ 2-3 hours
787
+ 2. **Phase 2**: Sentence segmentation ⏱️ 1-2 hours
788
+ 3. **Phase 3**: Analysis pipeline ⏱️ 2-3 hours
789
+ 4. **Phase 4**: UI updates (collapsible view) ⏱️ 3-4 hours
790
+ 5. **Phase 5**: Dashboard aggregation ⏱️ 2-3 hours
791
+ 6. **Phase 6**: Training updates ⏱️ 1-2 hours
792
+ 7. **Phase 7**: Migration & testing ⏱️ 2-3 hours
793
+
794
+ **Total Estimate**: 13-20 hours
795
+
796
+ ---
797
+
798
+ ## πŸ’‘ Alternative: Incremental Rollout
799
+
800
+ **If you want to test before full commitment**:
801
+
802
+ ### Phase 0: Proof of Concept (4-6 hours)
803
+ 1. Add sentence segmentation (no DB changes)
804
+ 2. Show sentence breakdown in UI (read-only)
805
+ 3. Let admins test and provide feedback
806
+ 4. Decide whether to proceed with full implementation
807
+
808
+ **Then choose**:
809
+ - βœ… **Full sentence-level** if feedback is positive
810
+ - ⚠️ **Multi-label** if sentence-level is too complex
811
+ - πŸ”„ **Stay with current** if not worth effort
812
+
813
+ ---
814
+
815
+ ## πŸš€ Next Steps
816
+
817
+ **I recommend**:
818
+
819
+ 1. **Validate approach**: Review this plan with stakeholders
820
+ 2. **Start with Phase 0**: Proof of concept (sentence display only)
821
+ 3. **Get feedback**: Do admins find sentence breakdown useful?
822
+ 4. **Decide**: Full implementation or alternative approach
823
+
824
+ **Should I proceed with**:
825
+ - A) Phase 0: Proof of concept (sentence display, no DB changes)
826
+ - B) Full implementation: All phases
827
+ - C) Alternative: Multi-label approach (simpler)
828
+
829
+ **Your choice?** 🎯
830
+
TRAINING_STRATEGY.md ADDED
@@ -0,0 +1,266 @@
1
+ # Training Strategy Guide for Participatory Planning Classifier
2
+
3
+ ## Current Performance (as of Oct 2025)
4
+
5
+ - **Dataset**: 60 examples (~42 train / 9 val / 9 test)
6
+ - **Current Best**: Head-only training - **66.7% accuracy**
7
+ - **Baseline**: ~60% (zero-shot BART-mnli)
8
+ - **Challenge**: Only 6.7% improvement - model is **underfitting**
9
+
10
+ ## Recommended Training Strategies (Ranked)
11
+
12
+ ### πŸ₯‡ **Strategy 1: LoRA with Conservative Settings**
13
+ **Best for: Your current 60-example dataset**
14
+
15
+ ```yaml
16
+ Configuration:
17
+ training_mode: lora
18
+ lora_rank: 4-8 # Start small!
19
+ lora_alpha: 8-16 # 2x rank
20
+ lora_dropout: 0.2 # High dropout to prevent overfitting
21
+ learning_rate: 1e-4 # Conservative
22
+ num_epochs: 5-7 # Watch for overfitting
23
+ batch_size: 4 # Smaller batches
24
+ ```
25
+
26
+ **Expected Accuracy**: 70-80%
27
+
28
+ **Why it works:**
29
+ - More capacity than head-only (~500K params with r=4)
30
+ - Still parameter-efficient enough for 60 examples
31
+ - Dropout prevents overfitting
32
+
33
+ **Try this first!** Your head-only results show you need more model capacity.
34
+
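+ In code, these settings translate roughly to the following PEFT/Transformers setup. This is a sketch under the assumptions above: `microsoft/deberta-v3-base` is used as an illustrative base model, the dataset objects are placeholders, and some architectures need an explicit `target_modules` list in `LoraConfig`.
+
+ ```python
+ from peft import LoraConfig, TaskType, get_peft_model
+ from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
+
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "microsoft/deberta-v3-base", num_labels=6)  # 6 planning categories
+
+ # Conservative LoRA settings for a ~60-example dataset
+ lora_config = LoraConfig(
+     task_type=TaskType.SEQ_CLS,
+     r=4,               # small rank to limit added capacity
+     lora_alpha=8,      # ~2x rank
+     lora_dropout=0.2,  # high dropout to fight overfitting
+     # target_modules=[...]  # may be required for some base models
+ )
+ model = get_peft_model(model, lora_config)
+
+ training_args = TrainingArguments(
+     output_dir="models/finetuned",
+     learning_rate=1e-4,
+     num_train_epochs=5,
+     per_device_train_batch_size=4,
+ )
+ # trainer = Trainer(model=model, args=training_args,
+ #                   train_dataset=train_ds, eval_dataset=val_ds)
+ # trainer.train()
+ ```
+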
35
+ ---
36
+
37
+ ### πŸ₯ˆ **Strategy 2: Data Augmentation + LoRA**
38
+ **Best for: Improving beyond 80% accuracy**
39
+
40
+ **Step 1: Augment your dataset to 150-200 examples**
41
+
42
+ Methods:
43
+ 1. **Paraphrasing** (use GPT/Claude):
44
+ ```
45
+ # For each example:
46
+ "We need better public transit"
47
+ β†’ "Public transportation should be improved"
48
+ β†’ "Transit system requires enhancement"
49
+ ```
50
+
51
+ 2. **Back-translation**:
52
+ English β†’ Spanish β†’ English (creates natural variations; see the sketch after this list)
53
+
54
+ 3. **Template-based**:
55
+ Create templates for each category and fill with variations
56
+
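+ A rough sketch of back-translation (method 2 above) using the Transformers translation pipeline; the Helsinki-NLP checkpoints are common public models, but verify they suit your setup:
+
+ ```python
+ from transformers import pipeline
+
+ # English -> Spanish -> English produces natural paraphrases
+ en_to_es = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
+ es_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
+
+ def back_translate(text: str) -> str:
+     """Return a paraphrased variant of `text` via Spanish."""
+     spanish = en_to_es(text)[0]["translation_text"]
+     return es_to_en(spanish)[0]["translation_text"]
+
+ print(back_translate("We need better public transit"))
+ ```
+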
57
+ **Step 2: Train LoRA (r=8-16) on augmented data**
58
+ - Expected Accuracy: 80-90%
59
+
60
+ ---
61
+
62
+ ### πŸ₯‰ **Strategy 3: Two-Stage Progressive Training**
63
+ **Best for: Maximizing performance with limited data** (a minimal sketch follows the two stages below)
64
+
65
+ 1. **Stage 1**: Head-only (warm-up)
66
+ - 3 epochs
67
+ - Initialize the classification head
68
+
69
+ 2. **Stage 2**: LoRA fine-tuning
70
+ - r=4, low learning rate
71
+ - Build on head-only initialization
72
+
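+ A minimal sketch of the two stages, assuming `model` is a sequence-classification model like the one in the Strategy 1 sketch above:
+
+ ```python
+ from peft import LoraConfig, TaskType, get_peft_model
+
+ # Stage 1: freeze the encoder so only the classification head trains
+ for param in model.base_model.parameters():
+     param.requires_grad = False
+ # ...run a short Trainer warm-up (~3 epochs)...
+
+ # Stage 2: wrap with LoRA; PEFT leaves only adapter (and head) params trainable
+ model = get_peft_model(
+     model,
+     LoraConfig(task_type=TaskType.SEQ_CLS, r=4, lora_alpha=8, lora_dropout=0.2),
+ )
+ # ...continue training with a low learning rate (e.g. 1e-4)...
+ ```
+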
73
+ ---
74
+
75
+ ### πŸ”§ **Strategy 4: Optimize Category Definitions**
76
+ **May help with zero-shot AND fine-tuning**
77
+
78
+ Your categories might be too similar. Consider:
79
+
80
+ **Current Categories:**
81
+ - Vision vs Objectives (both forward-looking)
82
+ - Problem vs Directives (both constraints)
83
+
84
+ **Better Definitions:**
85
+ ```python
86
+ CATEGORIES = {
87
+ 'Vision': {
88
+ 'name': 'Vision & Aspirations',
89
+ 'description': 'Long-term future state, desired outcomes, what success looks like',
90
+ 'keywords': ['future', 'aspire', 'imagine', 'dream', 'ideal']
91
+ },
92
+ 'Problem': {
93
+ 'name': 'Current Problems',
94
+ 'description': 'Existing issues, frustrations, barriers, root causes',
95
+ 'keywords': ['problem', 'issue', 'challenge', 'barrier', 'broken']
96
+ },
97
+ 'Objectives': {
98
+ 'name': 'Specific Goals',
99
+ 'description': 'Measurable targets, concrete milestones, quantifiable outcomes',
100
+ 'keywords': ['increase', 'reduce', 'achieve', 'target', 'by 2030']
101
+ },
102
+ 'Directives': {
103
+ 'name': 'Constraints & Requirements',
104
+ 'description': 'Must-haves, non-negotiables, compliance requirements',
105
+ 'keywords': ['must', 'required', 'mandate', 'comply', 'regulation']
106
+ },
107
+ 'Values': {
108
+ 'name': 'Principles & Values',
109
+ 'description': 'Core beliefs, ethical guidelines, guiding principles',
110
+ 'keywords': ['equity', 'sustainability', 'justice', 'fairness', 'inclusive']
111
+ },
112
+ 'Actions': {
113
+ 'name': 'Concrete Actions',
114
+ 'description': 'Specific steps, interventions, activities to implement',
115
+ 'keywords': ['build', 'create', 'implement', 'install', 'construct']
116
+ }
117
+ }
118
+ ```
119
+
120
+ ---
121
+
122
+ ## Alternative Base Models to Consider
123
+
124
+ ### **DeBERTa-v3-base** (Better for Classification)
125
+ ```python
126
+ # In app/analyzer.py
127
+ model_name = "microsoft/deberta-v3-base"
128
+ # Size: 184M params (vs BART's 400M)
129
+ # Often outperforms BART for classification
130
+ ```
131
+
132
+ ### **DistilRoBERTa** (Faster, Lighter)
133
+ ```python
134
+ model_name = "distilroberta-base"
135
+ # Size: 82M params
136
+ # 2x faster, 60% smaller
137
+ # Good accuracy
138
+ ```
139
+
140
+ ### **XLM-RoBERTa-base** (Multilingual)
141
+ ```python
142
+ model_name = "xlm-roberta-base"
143
+ # If you have multilingual submissions
144
+ ```
145
+
146
+ ---
147
+
148
+ ## Data Collection Strategy
149
+
150
+ **Current**: 60 examples β†’ **Target**: 150+ examples
151
+
152
+ ### How to get more data:
153
+
154
+ 1. **Active Learning** (Built into your system!)
155
+ - Deploy current model
156
+ - Admin reviews and corrects predictions
157
+ - Automatically builds training set
158
+
159
+ 2. **Historical Data**
160
+ - Import past participatory planning submissions
161
+ - Manual labeling (15 min for 50 examples)
162
+
163
+ 3. **Synthetic Generation** (Use GPT-4)
164
+ ```
165
+ Prompt: "Generate 10 participatory planning submissions
166
+ that express VISION for urban transportation"
167
+ ```
168
+
169
+ 4. **Crowdsourcing**
170
+ - Amazon Mechanical Turk (MTurk) or internal team
171
+ - Label 100 examples: ~$20-50
172
+
173
+ ---
174
+
175
+ ## Performance Targets
176
+
177
+ | Dataset Size | Method | Expected Accuracy | Time to Train |
178
+ |-------------|--------|------------------|---------------|
179
+ | 60 | Head-only | 65-70% ❌ Current | 2 min |
180
+ | 60 | LoRA (r=4) | 70-80% βœ… Try next | 5 min |
181
+ | 150 | LoRA (r=8) | 80-85% ⭐ Goal | 10 min |
182
+ | 300+ | LoRA (r=16) | 85-90% 🎯 Ideal | 20 min |
183
+
184
+ ---
185
+
186
+ ## Immediate Action Plan
187
+
188
+ ### Week 1: Low-Hanging Fruit
189
+ 1. βœ… Train with LoRA (r=4, epochs=5)
190
+ 2. βœ… Compare to head-only baseline
191
+ 3. βœ… Check per-category F1 scores
192
+
193
+ ### Week 2: Data Expansion
194
+ 4. Collect 50 more examples (aim for balance)
195
+ 5. Use data augmentation (paraphrase 60 β†’ 120)
196
+ 6. Retrain LoRA (r=8)
197
+
198
+ ### Week 3: Optimization
199
+ 7. Try DeBERTa-v3-base as base model
200
+ 8. Fine-tune category descriptions
201
+ 9. Deploy best model
202
+
203
+ ---
204
+
205
+ ## Debugging Low Performance
206
+
207
+ If accuracy stays below 75%:
208
+
209
+ ### Check 1: Data Quality
210
+ ```sql
+ -- Find messages labeled with more than one category
+ SELECT message, COUNT(DISTINCT corrected_category) AS label_count
+ FROM training_examples
+ GROUP BY message
+ HAVING COUNT(DISTINCT corrected_category) > 1;
+ ```
217
+
218
+ ### Check 2: Class Imbalance
219
+ - Ensure each category has 5-10+ examples
220
+ - Use weighted loss if imbalanced (see the class-weight sketch below)
221
+
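+ One way to apply a weighted loss (a sketch; replace the toy `labels` list with your integer-encoded training labels):
+
+ ```python
+ import numpy as np
+ import torch
+ from sklearn.utils.class_weight import compute_class_weight
+
+ labels = [0, 0, 0, 1, 2, 2]  # toy example: integer-encoded training labels
+
+ classes = np.unique(labels)
+ weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
+
+ # Rare categories now contribute more to the loss
+ loss_fn = torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float))
+ ```
+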
222
+ ### Check 3: Category Confusion
223
+ - Generate confusion matrix
224
+ - Merge categories that are frequently confused
225
+ (e.g., Vision + Objectives β†’ "Future Goals")
226
+
227
+ ### Check 4: Text Quality
228
+ - Remove very short texts (< 5 words)
229
+ - Remove duplicates
230
+ - Check for non-English text
231
+
232
+ ---
233
+
234
+ ## Advanced: Ensemble Models
235
+
236
+ If single model plateaus at 80-85%:
237
+
238
+ 1. Train 3 models with different seeds
239
+ 2. Use voting or averaging
240
+ 3. Typical boost: +3-5% accuracy
241
+
242
+ ```python
243
+ # model1/model2/model3: classifiers trained with different random seeds
+ from collections import Counter
+
+ predictions = [
+     model1.predict(text),
+     model2.predict(text),
+     model3.predict(text),
+ ]
+ final = Counter(predictions).most_common(1)[0][0]  # majority vote
250
+ ```
251
+
252
+ ---
253
+
254
+ ## Conclusion
255
+
256
+ **For your current 60 examples:**
257
+ 1. 🎯 **DO**: Try LoRA with r=4-8 (conservative settings)
258
+ 2. πŸ“ˆ **DO**: Collect 50-100 more examples
259
+ 3. πŸ”„ **DO**: Try DeBERTa-v3 as alternative base model
260
+ 4. ❌ **DON'T**: Use head-only (proven to underfit)
261
+ 5. ❌ **DON'T**: Use full fine-tuning (will overfit)
262
+
263
+ **Expected outcome:** 70-85% accuracy (up from current 66.7%)
264
+
265
+ **Next milestone:** 150 examples β†’ 85%+ accuracy
266
+
ZERO_SHOT_MODEL_SELECTION.md ADDED
@@ -0,0 +1,185 @@
1
+ # Zero-Shot Model Selection Feature
2
+
3
+ ## Overview
4
+
5
+ You can now **choose which AI model** to use for zero-shot classification! This allows you to balance between accuracy and speed based on your needs.
6
+
7
+ ## Available Zero-Shot Models
8
+
9
+ ### 1. **BART-large-MNLI** (Current Default)
10
+ - **Size**: 400M parameters
11
+ - **Speed**: Slow
12
+ - **Best for**: Maximum accuracy, works out of the box
13
+ - **Description**: Large sequence-to-sequence model, excellent zero-shot performance
14
+ - **Model ID**: `facebook/bart-large-mnli`
15
+
16
+ ### 2. **DeBERTa-v3-base-MNLI** ⭐ **Recommended**
17
+ - **Size**: 86M parameters (4.5x smaller than BART)
18
+ - **Speed**: Fast
19
+ - **Best for**: Fast zero-shot classification with good accuracy
20
+ - **Description**: DeBERTa trained on NLI datasets, excellent zero-shot with better speed
21
+ - **Model ID**: `MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli`
22
+
23
+ ### 3. **DistilBART-MNLI**
24
+ - **Size**: 134M parameters
25
+ - **Speed**: Medium
26
+ - **Best for**: Balanced zero-shot performance
27
+ - **Description**: Distilled BART for zero-shot, good balance of speed and accuracy
28
+ - **Model ID**: `valhalla/distilbart-mnli-12-3`
29
+
30
+ ## How to Use
31
+
32
+ ### Step 1: Go to Training Page
33
+ 1. Navigate to **Admin Panel** β†’ **Training** tab
34
+ 2. Look for the **"Zero-Shot Classification Model"** section at the top
35
+
36
+ ### Step 2: View Current Model
37
+ - The dropdown shows the currently active model
38
+ - Below it, you'll see model information (size, speed, description)
39
+
40
+ ### Step 3: Change Model
41
+ 1. Select a different model from the dropdown
42
+ 2. The system will ask for confirmation
43
+ 3. The analyzer will reload with the new model
44
+ 4. **All future classifications** will use the selected model
45
+
46
+ ### Step 4: Test It
47
+ - Go to **Submissions** page
48
+ - Click "Re-analyze" on any submission
49
+ - The new model will be used for classification!
50
+
51
+ ## When to Use Each Model
52
+
53
+ ### Use BART-large-MNLI if:
54
+ - βœ… Accuracy is more important than speed
55
+ - βœ… You have powerful hardware
56
+ - βœ… You don't mind waiting a bit longer
57
+
58
+ ### Use DeBERTa-v3-base-MNLI if: ⭐ **RECOMMENDED**
59
+ - βœ… You want good accuracy with better speed
60
+ - βœ… You're working with many submissions
61
+ - βœ… You want to save computational resources
62
+ - βœ… You need faster response times
63
+
64
+ ### Use DistilBART-MNLI if:
65
+ - βœ… You want something in between
66
+ - βœ… You're familiar with BART but need better speed
67
+
68
+ ## Technical Details
69
+
70
+ ### How It Works
71
+
72
+ 1. **Settings Storage**: The selected model is stored in the database (`Settings` table)
73
+ 2. **Dynamic Loading**: The analyzer checks the setting and loads the selected model (see the sketch below)
74
+ 3. **Hot Reload**: When you change models, the analyzer reloads automatically
75
+ 4. **No Data Loss**: Changing models doesn't affect your training data or fine-tuned models
76
+
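+ Conceptually, the dynamic-loading step boils down to something like this (a sketch; the key-to-model mapping and category list stand in for the app's actual Settings lookup, and the dictionary keys are illustrative):
+
+ ```python
+ from transformers import pipeline
+
+ # Dropdown keys mapped to the Hugging Face model IDs listed above
+ ZERO_SHOT_MODELS = {
+     "bart-large-mnli": "facebook/bart-large-mnli",
+     "deberta-v3-base-mnli": "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli",
+     "distilbart-mnli": "valhalla/distilbart-mnli-12-3",
+ }
+
+ def load_classifier(selected_key: str):
+     """Build a zero-shot pipeline for the model chosen in Settings."""
+     return pipeline("zero-shot-classification", model=ZERO_SHOT_MODELS[selected_key])
+
+ classifier = load_classifier("deberta-v3-base-mnli")
+ result = classifier(
+     "Dallas should establish more green spaces in South Dallas neighborhoods.",
+     candidate_labels=["Vision", "Problem", "Objectives", "Directives", "Values", "Actions"],
+ )
+ print(result["labels"][0])  # highest-scoring category
+ ```
+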
77
+ ### Model Persistence
78
+
79
+ - The selected model remains active even after app restart
80
+ - Each submission classification uses the currently active zero-shot model
81
+ - Fine-tuned models override zero-shot models when deployed
82
+
83
+ ### API Endpoints
84
+
85
+ **Get Current Model:**
86
+ ```
87
+ GET /admin/api/get-zero-shot-model
88
+ ```
89
+
90
+ **Change Model:**
91
+ ```
92
+ POST /admin/api/set-zero-shot-model
93
+ Body: {"model_key": "deberta-v3-base-mnli"}
94
+ ```
95
+
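+ For example, switching models from a script (a sketch; assumes an authenticated admin session and that the app is reachable at `http://localhost:7860`):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:7860"  # adjust to your Space URL
+ session = requests.Session()    # must already carry an admin session cookie
+
+ resp = session.post(
+     f"{BASE}/admin/api/set-zero-shot-model",
+     json={"model_key": "deberta-v3-base-mnli"},
+ )
+ print(resp.json())
+
+ print(session.get(f"{BASE}/admin/api/get-zero-shot-model").json())
+ ```
+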
96
+ ## Performance Comparison
97
+
98
+ | Model | Parameters | Classification Speed | Relative Accuracy |
99
+ |-------|-----------|---------------------|-------------------|
100
+ | BART-large-MNLI | 400M | 1x (baseline) | 100% |
101
+ | DeBERTa-v3-base-MNLI | 86M | ~4x faster | ~95-98% |
102
+ | DistilBART-MNLI | 134M | ~2x faster | ~92-95% |
103
+
104
+ *Note: Actual performance may vary based on your hardware and text length*
105
+
106
+ ## Fine-Tuning vs Zero-Shot
107
+
108
+ ### Zero-Shot Model Selection
109
+ - **When**: Before you have training data
110
+ - **What**: Chooses which pre-trained model to use
111
+ - **Where**: Admin β†’ Training β†’ Zero-Shot Classification Model
112
+ - **Effect**: Affects all new classifications immediately
113
+
114
+ ### Fine-Tuning Model Selection
115
+ - **When**: When training with your labeled data
116
+ - **What**: Chooses which model architecture to fine-tune
117
+ - **Where**: Admin β†’ Training β†’ Base Model Architecture for Fine-Tuning
118
+ - **Effect**: Only affects that specific training run
119
+
120
+ ### Can I use both?
121
+ **Yes!** You can:
122
+ 1. **Select a zero-shot model** (e.g., DeBERTa-v3-base-MNLI) for initial classifications
123
+ 2. **Fine-tune** using any model (e.g., DeBERTa-v3-small) for better performance
124
+ 3. **Deploy** the fine-tuned model, which will override the zero-shot model
125
+
126
+ ## Troubleshooting
127
+
128
+ **Q: I changed the model but nothing happened?**
129
+ A: The change affects new classifications. Try clicking "Re-analyze" on a submission to see the new model in action.
130
+
131
+ **Q: Which model should I choose?**
132
+ A: Start with **DeBERTa-v3-base-MNLI** - it's faster than BART with minimal accuracy loss.
133
+
134
+ **Q: Does this affect my fine-tuned models?**
135
+ A: No! Zero-shot models are only used when no fine-tuned model is deployed.
136
+
137
+ **Q: Can I switch back to BART?**
138
+ A: Yes! Just select BART-large-MNLI from the dropdown anytime.
139
+
140
+ **Q: Will changing models break anything?**
141
+ A: No, it's completely safe. Your data, training runs, and fine-tuned models are unaffected.
142
+
143
+ ## Best Practices
144
+
145
+ 1. **Start with DeBERTa-v3-base-MNLI** for better speed
146
+ 2. **Compare results** - try re-analyzing the same submission with different models
147
+ 3. **Consider your hardware** - larger models need more RAM
148
+ 4. **Fine-tune eventually** - zero-shot is great, but fine-tuning is better!
149
+
150
+ ## Example Workflow
151
+
152
+ ```
153
+ 1. Install app
154
+ ↓
155
+ 2. Select DeBERTa-v3-base-MNLI (for speed)
156
+ ↓
157
+ 3. Collect submissions
158
+ ↓
159
+ 4. Correct categories (builds training data)
160
+ ↓
161
+ 5. Fine-tune using DeBERTa-v3-small (best for small datasets)
162
+ ↓
163
+ 6. Deploy fine-tuned model (overrides zero-shot)
164
+ ↓
165
+ 7. Enjoy better accuracy! πŸŽ‰
166
+ ```
167
+
168
+ ## What's Next?
169
+
170
+ After selecting your zero-shot model:
171
+ - **Collect data**: Let users submit; each new submission is classified with the selected model
172
+ - **Review & correct**: Use the admin panel to fix any misclassifications
173
+ - **Build training set**: Corrections are automatically saved
174
+ - **Fine-tune**: Once you have 20+ examples, train a custom model
175
+ - **Deploy**: A model fine-tuned on your own data will typically outperform the zero-shot models on this task!
176
+
177
+ ---
178
+
179
+ **Ready to try it?** Go to Admin β†’ Training and select your model! πŸš€
180
+
181
+ For questions or issues:
182
+ 1. Check the model info displayed below the dropdown
183
+ 2. Review this guide
184
+ 3. Try switching back to BART if issues occur
185
+
analyze_submissions_for_sentences.py ADDED
@@ -0,0 +1,245 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Analyze existing submissions to determine if sentence-level categorization is worth implementing.
4
+
5
+ This script:
6
+ 1. Segments submissions into sentences
7
+ 2. Categorizes each sentence using current AI model
8
+ 3. Compares sentence-level vs submission-level categories
9
+ 4. Shows statistics to inform decision
10
+
11
+ Run: python analyze_submissions_for_sentences.py
12
+ """
13
+
14
+ import sys
15
+ import os
16
+ import re
17
+ from collections import Counter, defaultdict
18
+ from app import create_app, db
19
+ from app.models.models import Submission
20
+ from app.analyzer import get_analyzer
21
+ import nltk
22
+
23
+ # Try to download required NLTK data
24
+ try:
25
+ nltk.data.find('tokenizers/punkt')
26
+ except LookupError:
27
+ print("Downloading NLTK punkt tokenizer...")
28
+ nltk.download('punkt', quiet=True)
29
+
30
+ def segment_sentences(text):
31
+ """Simple sentence segmentation"""
32
+ try:
33
+ from nltk.tokenize import sent_tokenize
34
+ sentences = sent_tokenize(text)
35
+ except Exception:
36
+ # Fallback: regex-based
37
+ pattern = r'(?<=[.!?])\s+(?=[A-Z])|(?<=[.!?])$'
38
+ sentences = re.split(pattern, text)
39
+
40
+ # Clean and filter
41
+ sentences = [s.strip() for s in sentences if s.strip()]
42
+ # Filter very short "sentences"
43
+ sentences = [s for s in sentences if len(s.split()) >= 3]
44
+
45
+ return sentences
46
+
47
+ def analyze_submissions():
48
+ """Analyze submissions to see if sentence-level categorization is beneficial"""
49
+
50
+ app = create_app()
51
+
52
+ with app.app_context():
53
+ # Get all analyzed submissions
54
+ submissions = Submission.query.filter(Submission.category != None).all()
55
+
56
+ if not submissions:
57
+ print("❌ No analyzed submissions found. Please run AI analysis first.")
58
+ return
59
+
60
+ print(f"\n{'='*70}")
61
+ print(f"πŸ“Š SENTENCE-LEVEL CATEGORIZATION ANALYSIS")
62
+ print(f"{'='*70}\n")
63
+
64
+ print(f"Analyzing {len(submissions)} submissions...\n")
65
+
66
+ # Load analyzer
67
+ analyzer = get_analyzer()
68
+
69
+ # Statistics
70
+ total_submissions = len(submissions)
71
+ total_sentences = 0
72
+ multi_sentence_count = 0
73
+ multi_category_count = 0
74
+
75
+ sentence_counts = []
76
+ category_changes = []
77
+
78
+ submission_details = []
79
+
80
+ # Analyze each submission
81
+ for submission in submissions:
82
+ # Segment into sentences
83
+ sentences = segment_sentences(submission.message)
84
+ sentence_count = len(sentences)
85
+
86
+ total_sentences += sentence_count
87
+ sentence_counts.append(sentence_count)
88
+
89
+ if sentence_count > 1:
90
+ multi_sentence_count += 1
91
+
92
+ # Categorize each sentence
93
+ sentence_categories = []
94
+ for sentence in sentences:
95
+ try:
96
+ category = analyzer.analyze(sentence)
97
+ sentence_categories.append(category)
98
+ except Exception as e:
99
+ print(f"Error analyzing sentence: {e}")
100
+ sentence_categories.append(None)
101
+
102
+ # Check if categories differ
103
+ unique_categories = set([c for c in sentence_categories if c])
104
+
105
+ if len(unique_categories) > 1:
106
+ multi_category_count += 1
107
+ category_changes.append({
108
+ 'id': submission.id,
109
+ 'text': submission.message,
110
+ 'submission_category': submission.category,
111
+ 'sentence_categories': sentence_categories,
112
+ 'sentences': sentences,
113
+ 'contributor_type': submission.contributor_type
114
+ })
115
+
116
+ # Print Statistics
117
+ print(f"{'─'*70}")
118
+ print(f"πŸ“ˆ STATISTICS")
119
+ print(f"{'─'*70}\n")
120
+
121
+ print(f"Total Submissions: {total_submissions}")
122
+ print(f"Total Sentences: {total_sentences}")
123
+ print(f"Avg Sentences/Submission: {total_sentences/total_submissions:.1f}")
124
+ print(f"Multi-sentence (>1): {multi_sentence_count} ({multi_sentence_count/total_submissions*100:.1f}%)")
125
+ print(f"Multi-category: {multi_category_count} ({multi_category_count/total_submissions*100:.1f}%)")
126
+
127
+ # Sentence distribution
128
+ print(f"\nπŸ“Š Sentence Count Distribution:")
129
+ sentence_dist = Counter(sentence_counts)
130
+ for count in sorted(sentence_dist.keys()):
131
+ bar = 'β–ˆ' * int(sentence_dist[count] / total_submissions * 50)
132
+ print(f" {count} sentence(s): {sentence_dist[count]:3d} {bar}")
133
+
134
+ # Category changes
135
+ if category_changes:
136
+ print(f"\n{'─'*70}")
137
+ print(f"πŸ”„ SUBMISSIONS WITH MULTIPLE CATEGORIES ({len(category_changes)})")
138
+ print(f"{'─'*70}\n")
139
+
140
+ for idx, item in enumerate(category_changes[:10], 1): # Show first 10
141
+ print(f"\n{idx}. Submission #{item['id']} ({item['contributor_type']})")
142
+ print(f" Submission-level: {item['submission_category']}")
143
+ print(f" Text: \"{item['text'][:100]}{'...' if len(item['text']) > 100 else ''}\"")
144
+ print(f" Sentence breakdown:")
145
+
146
+ for i, (sentence, category) in enumerate(zip(item['sentences'], item['sentence_categories']), 1):
147
+ marker = "⚠️" if category != item['submission_category'] else "βœ“"
148
+ print(f" {marker} S{i} [{category:12s}] \"{sentence[:60]}{'...' if len(sentence) > 60 else ''}\"")
149
+
150
+ if len(category_changes) > 10:
151
+ print(f"\n ... and {len(category_changes) - 10} more")
152
+
153
+ # Category distribution comparison
154
+ print(f"\n{'─'*70}")
155
+ print(f"πŸ“Š CATEGORY DISTRIBUTION COMPARISON")
156
+ print(f"{'─'*70}\n")
157
+
158
+ # Submission-level counts
159
+ submission_cats = Counter([s.category for s in submissions if s.category])
160
+
161
+ # Sentence-level counts
162
+ sentence_cats = Counter()
163
+ for item in category_changes:
164
+ for cat in item['sentence_categories']:
165
+ if cat:
166
+ sentence_cats[cat] += 1
167
+
168
+ print(f"{'Category':<15} {'Submission-Level':<20} {'Sentence-Level (multi-cat only)':<30}")
169
+ print(f"{'-'*15} {'-'*20} {'-'*30}")
170
+
171
+ categories = ['Vision', 'Problem', 'Objectives', 'Directives', 'Values', 'Actions']
172
+ for cat in categories:
173
+ sub_count = submission_cats.get(cat, 0)
174
+ sen_count = sentence_cats.get(cat, 0)
175
+ sub_bar = 'β–ˆ' * int(sub_count / total_submissions * 20)
176
+ sen_bar = 'β–ˆ' * int(sen_count / multi_category_count * 20) if multi_category_count > 0 else ''
177
+ print(f"{cat:<15} {sub_count:3d} {sub_bar:<15} {sen_count:3d} {sen_bar:<15}")
178
+
179
+ # Recommendation
180
+ print(f"\n{'='*70}")
181
+ print(f"πŸ’‘ RECOMMENDATION")
182
+ print(f"{'='*70}\n")
183
+
184
+ multi_cat_percentage = (multi_category_count / total_submissions * 100) if total_submissions > 0 else 0
185
+
186
+ if multi_cat_percentage > 40:
187
+ print(f"βœ… STRONGLY RECOMMEND sentence-level categorization")
188
+ print(f" {multi_cat_percentage:.1f}% of submissions contain multiple categories.")
189
+ print(f" Current system is losing significant semantic detail.")
190
+ print(f"\n πŸ“ˆ Expected benefits:")
191
+ print(f" β€’ {multi_category_count} submissions will have richer categorization")
192
+ print(f" β€’ Training data will be ~{total_sentences - total_submissions} examples richer")
193
+ print(f" β€’ Analytics will be more accurate")
194
+ elif multi_cat_percentage > 20:
195
+ print(f"⚠️ RECOMMEND sentence-level categorization (or proof of concept)")
196
+ print(f" {multi_cat_percentage:.1f}% of submissions contain multiple categories.")
197
+ print(f" Moderate benefit expected.")
198
+ print(f"\n πŸ’‘ Suggestion: Start with proof of concept (display only)")
199
+ print(f" Then decide if full implementation is worth it.")
200
+ else:
201
+ print(f"ℹ️ OPTIONAL - Multi-label might be sufficient")
202
+ print(f" Only {multi_cat_percentage:.1f}% of submissions contain multiple categories.")
203
+ print(f" Sentence-level might be overkill.")
204
+ print(f"\n πŸ’‘ Consider:")
205
+ print(f" β€’ Multi-label classification (simpler)")
206
+ print(f" β€’ Or keep current system if working well")
207
+
208
+ # Implementation effort
209
+ print(f"\nπŸ“‹ Implementation Effort:")
210
+ print(f" β€’ Full sentence-level: 13-20 hours")
211
+ print(f" β€’ Proof of concept: 4-6 hours")
212
+ print(f" β€’ Multi-label: 4-6 hours")
213
+
214
+ print(f"\n{'='*70}\n")
215
+
216
+ # Export detailed results
217
+ export_path = "sentence_analysis_results.txt"
218
+ with open(export_path, 'w') as f:
219
+ f.write("DETAILED SENTENCE-LEVEL ANALYSIS RESULTS\n")
220
+ f.write("="*70 + "\n\n")
221
+ f.write(f"Total Submissions: {total_submissions}\n")
222
+ f.write(f"Multi-category Submissions: {multi_category_count} ({multi_cat_percentage:.1f}%)\n\n")
223
+
224
+ f.write("\nDETAILED BREAKDOWN:\n\n")
225
+ for idx, item in enumerate(category_changes, 1):
226
+ f.write(f"\n{idx}. Submission #{item['id']}\n")
227
+ f.write(f" Contributor: {item['contributor_type']}\n")
228
+ f.write(f" Submission Category: {item['submission_category']}\n")
229
+ f.write(f" Full Text: {item['text']}\n")
230
+ f.write(f" Sentences:\n")
231
+ for i, (sentence, category) in enumerate(zip(item['sentences'], item['sentence_categories']), 1):
232
+ f.write(f" {i}. [{category}] {sentence}\n")
233
+ f.write("\n")
234
+
235
+ print(f"πŸ“„ Detailed results exported to: {export_path}")
236
+
237
+ if __name__ == '__main__':
238
+ try:
239
+ analyze_submissions()
240
+ except Exception as e:
241
+ print(f"\n❌ Error: {e}")
242
+ import traceback
243
+ traceback.print_exc()
244
+ sys.exit(1)
245
+
app/analyzer.py CHANGED
@@ -168,6 +168,9 @@ class SubmissionAnalyzer:
168
  confidence = predictions[0][predicted_class].item()
169
 
170
  category = self.id2label[predicted_class]
 
 
 
171
 
172
  logger.info(f"Fine-tuned model classified as: {category} (confidence: {confidence:.2f})")
173
 
@@ -191,6 +194,9 @@ class SubmissionAnalyzer:
191
  # Extract the category name from the label
192
  top_label = result['labels'][0]
193
  category = top_label.split(':')[0]
 
 
 
194
 
195
  logger.info(f"Zero-shot model classified as: {category} (confidence: {result['scores'][0]:.2f})")
196
 
@@ -207,6 +213,48 @@ class SubmissionAnalyzer:
207
  list: List of predicted categories
208
  """
209
  return [self.analyze(msg) for msg in messages]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
210
 
211
  def get_model_info(self):
212
  """
 
168
  confidence = predictions[0][predicted_class].item()
169
 
170
  category = self.id2label[predicted_class]
171
+
172
+ # Store confidence for later retrieval
173
+ self._last_confidence = confidence
174
 
175
  logger.info(f"Fine-tuned model classified as: {category} (confidence: {confidence:.2f})")
176
 
 
194
  # Extract the category name from the label
195
  top_label = result['labels'][0]
196
  category = top_label.split(':')[0]
197
+
198
+ # Store confidence for later retrieval
199
+ self._last_confidence = result['scores'][0]
200
 
201
  logger.info(f"Zero-shot model classified as: {category} (confidence: {result['scores'][0]:.2f})")
202
 
 
213
  list: List of predicted categories
214
  """
215
  return [self.analyze(msg) for msg in messages]
216
+
217
+ def analyze_with_sentences(self, submission_text: str):
218
+ """
219
+ Analyze submission at sentence level.
220
+
221
+ Args:
222
+ submission_text: Full submission text
223
+
224
+ Returns:
225
+ List[Dict]: List of {text: str, category: str, confidence: float}
226
+ """
227
+ from app.utils.text_processor import TextProcessor
228
+
229
+ # Segment into sentences
230
+ sentences = TextProcessor.segment_and_clean(submission_text)
231
+
232
+ # Classify each sentence
233
+ results = []
234
+ for sentence in sentences:
235
+ try:
236
+ category = self.analyze(sentence)
237
+
238
+ # Get confidence if available
239
+ confidence = self._get_last_confidence() if hasattr(self, '_last_confidence') else None
240
+
241
+ results.append({
242
+ 'text': sentence,
243
+ 'category': category,
244
+ 'confidence': confidence
245
+ })
246
+
247
+ logger.info(f"Sentence classified: '{sentence[:50]}...' -> {category}")
248
+ except Exception as e:
249
+ logger.error(f"Error analyzing sentence '{sentence[:50]}...': {e}")
250
+ # Skip problematic sentences
251
+ continue
252
+
253
+ return results
254
+
255
+ def _get_last_confidence(self):
256
+ """Get last prediction confidence (if available)"""
257
+ return getattr(self, '_last_confidence', None)
258
 
259
  def get_model_info(self):
260
  """
app/models/models.py CHANGED
@@ -29,11 +29,38 @@ class Submission(db.Model):
29
  latitude = db.Column(db.Float, nullable=True)
30
  longitude = db.Column(db.Float, nullable=True)
31
  timestamp = db.Column(db.DateTime, default=datetime.utcnow)
32
- category = db.Column(db.String(50), nullable=True) # Vision, Problem, Objectives, Directives, Values, Actions
33
  flagged_as_offensive = db.Column(db.Boolean, default=False)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
34
 
35
  def to_dict(self):
36
- return {
 
37
  'id': self.id,
38
  'message': self.message,
39
  'contributorType': self.contributor_type,
@@ -42,10 +69,51 @@ class Submission(db.Model):
42
  'lng': self.longitude
43
  } if self.latitude and self.longitude else None,
44
  'timestamp': self.timestamp.isoformat() if self.timestamp else None,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
45
  'category': self.category,
46
- 'flaggedAsOffensive': self.flagged_as_offensive
 
47
  }
48
 
 
49
  class Settings(db.Model):
50
  __tablename__ = 'settings'
51
 
@@ -74,8 +142,9 @@ class TrainingExample(db.Model):
74
  __tablename__ = 'training_examples'
75
 
76
  id = db.Column(db.Integer, primary_key=True)
77
- submission_id = db.Column(db.Integer, db.ForeignKey('submissions.id'), nullable=False)
78
- message = db.Column(db.Text, nullable=False) # Snapshot of submission text
 
79
  original_category = db.Column(db.String(50), nullable=True) # AI's prediction
80
  corrected_category = db.Column(db.String(50), nullable=False) # Admin's correction
81
  contributor_type = db.Column(db.String(20), nullable=False)
@@ -86,6 +155,7 @@ class TrainingExample(db.Model):
86
 
87
  # Relationships
88
  submission = db.relationship('Submission', backref='training_examples')
 
89
  training_run = db.relationship('FineTuningRun', backref='training_examples')
90
 
91
  def to_dict(self):
 
29
  latitude = db.Column(db.Float, nullable=True)
30
  longitude = db.Column(db.Float, nullable=True)
31
  timestamp = db.Column(db.DateTime, default=datetime.utcnow)
32
+ category = db.Column(db.String(50), nullable=True) # Vision, Problem, Objectives, Directives, Values, Actions (backward compat)
33
  flagged_as_offensive = db.Column(db.Boolean, default=False)
34
+ sentence_analysis_done = db.Column(db.Boolean, default=False) # NEW: Track if sentence-level analysis is complete
35
+
36
+ def get_primary_category(self):
37
+ """Get most frequent category from sentences (or fallback to old category)"""
38
+ if not self.sentences or len(self.sentences) == 0:
39
+ return self.category # Fallback to old system
40
+
41
+ from collections import Counter
42
+ categories = [s.category for s in self.sentences if s.category]
43
+ if not categories:
44
+ return None
45
+ return Counter(categories).most_common(1)[0][0]
46
+
47
+ def get_category_distribution(self):
48
+ """Get percentage of each category in this submission"""
49
+ if not self.sentences or len(self.sentences) == 0:
50
+ return {self.category: 100.0} if self.category else {}
51
+
52
+ from collections import Counter
53
+ categories = [s.category for s in self.sentences if s.category]
54
+ total = len(categories)
55
+ if total == 0:
56
+ return {}
57
+
58
+ counts = Counter(categories)
59
+ return {cat: round((count/total)*100, 1) for cat, count in counts.items()}
60
 
61
  def to_dict(self):
62
+ """Convert to dictionary with sentence-level support"""
63
+ base_dict = {
64
  'id': self.id,
65
  'message': self.message,
66
  'contributorType': self.contributor_type,
 
69
  'lng': self.longitude
70
  } if self.latitude and self.longitude else None,
71
  'timestamp': self.timestamp.isoformat() if self.timestamp else None,
72
+ 'category': self.get_primary_category() if self.sentence_analysis_done else self.category,
73
+ 'flaggedAsOffensive': self.flagged_as_offensive,
74
+ 'sentenceAnalysisDone': self.sentence_analysis_done
75
+ }
76
+
77
+ # Add sentence-level data if available
78
+ if self.sentence_analysis_done and self.sentences:
79
+ base_dict['sentences'] = [s.to_dict() for s in self.sentences]
80
+ base_dict['categoryDistribution'] = self.get_category_distribution()
81
+
82
+ return base_dict
83
+
84
+
85
+ class SubmissionSentence(db.Model):
86
+ """Stores individual sentences from submissions with their categories"""
87
+ __tablename__ = 'submission_sentences'
88
+
89
+ id = db.Column(db.Integer, primary_key=True)
90
+ submission_id = db.Column(db.Integer, db.ForeignKey('submissions.id'), nullable=False)
91
+ sentence_index = db.Column(db.Integer, nullable=False) # 0, 1, 2...
92
+ text = db.Column(db.Text, nullable=False)
93
+ category = db.Column(db.String(50), nullable=True)
94
+ confidence = db.Column(db.Float, nullable=True)
95
+ created_at = db.Column(db.DateTime, default=datetime.utcnow)
96
+
97
+ # Relationships
98
+ submission = db.relationship('Submission', backref='sentences')
99
+
100
+ # Composite unique constraint
101
+ __table_args__ = (
102
+ db.UniqueConstraint('submission_id', 'sentence_index', name='uq_submission_sentence'),
103
+ )
104
+
105
+ def to_dict(self):
106
+ return {
107
+ 'id': self.id,
108
+ 'submission_id': self.submission_id,
109
+ 'sentence_index': self.sentence_index,
110
+ 'text': self.text,
111
  'category': self.category,
112
+ 'confidence': self.confidence,
113
+ 'created_at': self.created_at.isoformat() if self.created_at else None
114
  }
115
 
116
+
117
  class Settings(db.Model):
118
  __tablename__ = 'settings'
119
 
 
142
  __tablename__ = 'training_examples'
143
 
144
  id = db.Column(db.Integer, primary_key=True)
145
+ submission_id = db.Column(db.Integer, db.ForeignKey('submissions.id'), nullable=True) # Made nullable for sentence-level
146
+ sentence_id = db.Column(db.Integer, db.ForeignKey('submission_sentences.id'), nullable=True) # NEW: Link to sentence
147
+ message = db.Column(db.Text, nullable=False) # Snapshot of submission/sentence text
148
  original_category = db.Column(db.String(50), nullable=True) # AI's prediction
149
  corrected_category = db.Column(db.String(50), nullable=False) # Admin's correction
150
  contributor_type = db.Column(db.String(20), nullable=False)
 
155
 
156
  # Relationships
157
  submission = db.relationship('Submission', backref='training_examples')
158
+ sentence = db.relationship('SubmissionSentence', backref='training_examples')
159
  training_run = db.relationship('FineTuningRun', backref='training_examples')
160
 
161
  def to_dict(self):
app/utils/__init__.py ADDED
@@ -0,0 +1,2 @@
1
+ # Utils package
2
+
app/utils/text_processor.py ADDED
@@ -0,0 +1,170 @@
1
+ """
2
+ Text processing utilities for sentence-level categorization.
3
+ Handles sentence segmentation and text cleaning.
4
+ """
5
+
6
+ import re
7
+ from typing import List
8
+ import logging
9
+
10
+ logger = logging.getLogger(__name__)
11
+
12
+ class TextProcessor:
13
+ """Handle sentence segmentation and text processing"""
14
+
15
+ @staticmethod
16
+ def segment_into_sentences(text: str) -> List[str]:
17
+ """
18
+ Break text into sentences using multiple strategies.
19
+
20
+ Strategies:
21
+ 1. NLTK punkt tokenizer (primary)
22
+ 2. Regex-based fallback
23
+ 3. Min/max length constraints
24
+
25
+ Args:
26
+ text: Input text to segment
27
+
28
+ Returns:
29
+ List of sentences
30
+ """
31
+ # Clean text
32
+ text = text.strip()
33
+
34
+ if not text:
35
+ return []
36
+
37
+ # Try NLTK first (better accuracy)
38
+ try:
39
+ import nltk
40
+ # Try to use punkt tokenizer
41
+ try:
42
+ from nltk.tokenize import sent_tokenize
43
+ sentences = sent_tokenize(text)
44
+ except LookupError:
45
+ # Download punkt if not available
46
+ logger.info("Downloading NLTK punkt tokenizer...")
47
+ nltk.download('punkt', quiet=True)
48
+ from nltk.tokenize import sent_tokenize
49
+ sentences = sent_tokenize(text)
50
+ except Exception as e:
51
+ # Fallback: regex-based segmentation
52
+ logger.warning(f"NLTK tokenization failed ({e}), using regex fallback")
53
+ sentences = TextProcessor._regex_segmentation(text)
54
+
55
+ # Clean and filter
56
+ sentences = [s.strip() for s in sentences if s.strip()]
57
+
58
+ # Filter out very short "sentences" (likely not meaningful)
59
+ # Require at least 3 words
60
+ sentences = [s for s in sentences if len(s.split()) >= 3]
61
+
62
+ return sentences
63
+
64
+ @staticmethod
65
+ def _regex_segmentation(text: str) -> List[str]:
66
+ """
67
+ Fallback sentence segmentation using regex.
68
+
69
+ This is less accurate than NLTK but works without dependencies.
70
+ """
71
+ # Split on period, exclamation, question mark (followed by space or end)
72
+ # Look for: ., !, or ? followed by space + capital letter, or end of string
73
+ pattern = r'(?<=[.!?])\s+(?=[A-Z])|(?<=[.!?])$'
74
+ sentences = re.split(pattern, text)
75
+
76
+ return [s.strip() for s in sentences if s.strip()]
77
+
78
+ @staticmethod
79
+ def is_valid_sentence(sentence: str) -> bool:
80
+ """
81
+ Check if sentence is valid for categorization.
82
+
83
+ Args:
84
+ sentence: Input sentence
85
+
86
+ Returns:
87
+ True if valid, False otherwise
88
+ """
89
+ # Must have at least 3 words
90
+ if len(sentence.split()) < 3:
91
+ return False
92
+
93
+ # Must have some alphabetic characters
94
+ if not any(c.isalpha() for c in sentence):
95
+ return False
96
+
97
+ # Not just a list item or fragment
98
+ stripped = sentence.strip()
99
+ if stripped.startswith('-') or stripped.startswith('β€’') or stripped.startswith('*'):
100
+ # Allow if it has substantial text after the bullet
101
+ if len(stripped[1:].strip().split()) < 3:
102
+ return False
103
+
104
+ return True
105
+
106
+ @staticmethod
107
+ def clean_sentence(sentence: str) -> str:
108
+ """
109
+ Clean a sentence for processing.
110
+
111
+ Args:
112
+ sentence: Input sentence
113
+
114
+ Returns:
115
+ Cleaned sentence
116
+ """
117
+ # Remove leading bullet points or numbers
118
+ sentence = re.sub(r'^[\s\-β€’*\d.]+\s*', '', sentence)
119
+
120
+ # Normalize whitespace
121
+ sentence = ' '.join(sentence.split())
122
+
123
+ # Ensure it ends with punctuation
124
+ if sentence and sentence[-1] not in '.!?':
125
+ sentence += '.'
126
+
127
+ return sentence.strip()
128
+
129
+ @staticmethod
130
+ def segment_and_clean(text: str) -> List[str]:
131
+ """
132
+ Segment text into sentences and clean them.
133
+
134
+ This is the main entry point for text processing.
135
+
136
+ Args:
137
+ text: Input text
138
+
139
+ Returns:
140
+ List of cleaned, valid sentences
141
+ """
142
+ # Segment
143
+ sentences = TextProcessor.segment_into_sentences(text)
144
+
145
+ # Clean and filter
146
+ result = []
147
+ for sentence in sentences:
148
+ cleaned = TextProcessor.clean_sentence(sentence)
149
+ if TextProcessor.is_valid_sentence(cleaned):
150
+ result.append(cleaned)
151
+
152
+ return result
153
+
154
+ @staticmethod
155
+ def get_sentence_count_estimate(text: str) -> int:
156
+ """
157
+ Quick estimate of sentence count without full processing.
158
+
159
+ Args:
160
+ text: Input text
161
+
162
+ Returns:
163
+ Estimated sentence count
164
+ """
165
+ # Count sentence-ending punctuation
166
+ count = text.count('.') + text.count('!') + text.count('?')
167
+
168
+ # At least 1 if text exists
169
+ return max(1, count)
170
+
mock_data_60.json ADDED
@@ -0,0 +1,726 @@
1
+ {
2
+ "submissions": [
3
+ {
4
+ "id": 1,
5
+ "message": "We dream of a future with everyone has affordable housing within 20 minutes of work",
6
+ "contributor_type": "government",
7
+ "location": {
8
+ "lat": -15.7795,
9
+ "lng": -47.979
10
+ },
11
+ "timestamp": "2025-01-15T14:30:00",
12
+ "category": "Vision",
13
+ "flagged_as_offensive": false
14
+ },
15
+ {
16
+ "id": 2,
17
+ "message": "Our vision is to create air quality meets the highest international standards",
18
+ "contributor_type": "other",
19
+ "location": {
20
+ "lat": -15.7251,
21
+ "lng": -47.9745
22
+ },
23
+ "timestamp": "2025-01-15T15:00:00",
24
+ "category": "Vision",
25
+ "flagged_as_offensive": false
26
+ },
27
+ {
28
+ "id": 3,
29
+ "message": "The ideal scenario would be air quality meets the highest international standards",
30
+ "contributor_type": "government",
31
+ "location": {
32
+ "lat": -15.7235,
33
+ "lng": -47.9387
34
+ },
35
+ "timestamp": "2025-01-15T15:30:00",
36
+ "category": "Vision",
37
+ "flagged_as_offensive": false
38
+ },
39
+ {
40
+ "id": 4,
41
+ "message": "We dream of a future with zero waste is achieved through comprehensive recycling",
42
+ "contributor_type": "industry",
43
+ "location": {
44
+ "lat": -15.778,
45
+ "lng": -47.8505
46
+ },
47
+ "timestamp": "2025-01-15T16:00:00",
48
+ "category": "Vision",
49
+ "flagged_as_offensive": false
50
+ },
51
+ {
52
+ "id": 5,
53
+ "message": "The ideal scenario would be parks and nature are accessible to all residents",
54
+ "contributor_type": "government",
55
+ "location": {
56
+ "lat": -15.7061,
57
+ "lng": -47.8908
58
+ },
59
+ "timestamp": "2025-01-15T16:30:00",
60
+ "category": "Vision",
61
+ "flagged_as_offensive": false
62
+ },
63
+ {
64
+ "id": 6,
65
+ "message": "We dream of a future with renewable energy powers 100% of our infrastructure",
66
+ "contributor_type": "other",
67
+ "location": {
68
+ "lat": -15.7388,
69
+ "lng": -47.9121
70
+ },
71
+ "timestamp": "2025-01-15T17:00:00",
72
+ "category": "Vision",
73
+ "flagged_as_offensive": false
74
+ },
75
+ {
76
+ "id": 7,
77
+ "message": "We envision a city where equity and inclusion are foundational to all decisions",
78
+ "contributor_type": "industry",
79
+ "location": {
80
+ "lat": -15.8396,
81
+ "lng": -47.8803
82
+ },
83
+ "timestamp": "2025-01-15T17:30:00",
84
+ "category": "Vision",
85
+ "flagged_as_offensive": false
86
+ },
87
+ {
88
+ "id": 8,
89
+ "message": "The ideal scenario would be all citizens have access to clean energy and green spaces",
90
+ "contributor_type": "community",
91
+ "location": {
92
+ "lat": -15.8681,
93
+ "lng": -47.9813
94
+ },
95
+ "timestamp": "2025-01-15T18:00:00",
96
+ "category": "Vision",
97
+ "flagged_as_offensive": false
98
+ },
99
+ {
100
+ "id": 9,
101
+ "message": "Imagine a community that children can safely walk or bike to school",
102
+ "contributor_type": "community",
103
+ "location": {
104
+ "lat": -15.8515,
105
+ "lng": -47.8442
106
+ },
107
+ "timestamp": "2025-01-15T18:30:00",
108
+ "category": "Vision",
109
+ "flagged_as_offensive": false
110
+ },
111
+ {
112
+ "id": 10,
113
+ "message": "We want to see a city that zero waste is achieved through comprehensive recycling",
114
+ "contributor_type": "academic",
115
+ "location": {
116
+ "lat": -15.7153,
117
+ "lng": -47.9456
118
+ },
119
+ "timestamp": "2025-01-15T19:00:00",
120
+ "category": "Vision",
121
+ "flagged_as_offensive": false
122
+ },
123
+ {
124
+ "id": 11,
125
+ "message": "We are facing challenges with insufficient green spaces in densely populated zones",
126
+ "contributor_type": "government",
127
+ "location": {
128
+ "lat": -15.7989,
129
+ "lng": -47.979
130
+ },
131
+ "timestamp": "2025-01-15T19:30:00",
132
+ "category": "Problem",
133
+ "flagged_as_offensive": false
134
+ },
135
+ {
136
+ "id": 12,
137
+ "message": "One major concern is inadequate waste management systems",
138
+ "contributor_type": "industry",
139
+ "location": {
140
+ "lat": -15.7862,
141
+ "lng": -47.9812
142
+ },
143
+ "timestamp": "2025-01-15T20:00:00",
144
+ "category": "Problem",
145
+ "flagged_as_offensive": false
146
+ },
147
+ {
148
+ "id": 13,
149
+ "message": "There is inadequate digital divide affecting low-income communities",
150
+ "contributor_type": "academic",
151
+ "location": {
152
+ "lat": -15.8672,
153
+ "lng": -47.8886
154
+ },
155
+ "timestamp": "2025-01-15T20:30:00",
156
+ "category": "Problem",
157
+ "flagged_as_offensive": false
158
+ },
159
+ {
160
+ "id": 14,
161
+ "message": "A critical problem is aging water infrastructure causing frequent issues",
162
+ "contributor_type": "ngo",
163
+ "location": {
164
+ "lat": -15.7679,
165
+ "lng": -47.862
166
+ },
167
+ "timestamp": "2025-01-15T21:00:00",
168
+ "category": "Problem",
169
+ "flagged_as_offensive": false
170
+ },
171
+ {
172
+ "id": 15,
173
+ "message": "The current situation with lack of affordable housing for middle-income families is problematic",
174
+ "contributor_type": "ngo",
175
+ "location": {
176
+ "lat": -15.6868,
177
+ "lng": -47.8453
178
+ },
179
+ "timestamp": "2025-01-15T21:30:00",
180
+ "category": "Problem",
181
+ "flagged_as_offensive": false
182
+ },
183
+ {
184
+ "id": 16,
185
+ "message": "The main issue is aging water infrastructure causing frequent issues",
186
+ "contributor_type": "community",
187
+ "location": {
188
+ "lat": -15.7037,
189
+ "lng": -47.8742
190
+ },
191
+ "timestamp": "2025-01-15T22:00:00",
192
+ "category": "Problem",
193
+ "flagged_as_offensive": false
194
+ },
195
+ {
196
+ "id": 17,
197
+ "message": "We are facing challenges with lack of affordable housing for middle-income families",
198
+ "contributor_type": "government",
199
+ "location": {
200
+ "lat": -15.7255,
201
+ "lng": -47.9207
202
+ },
203
+ "timestamp": "2025-01-15T22:30:00",
204
+ "category": "Problem",
205
+ "flagged_as_offensive": false
206
+ },
207
+ {
208
+ "id": 18,
209
+ "message": "We lack sufficient inadequate waste management systems",
210
+ "contributor_type": "community",
211
+ "location": {
212
+ "lat": -15.7296,
213
+ "lng": -47.9722
214
+ },
215
+ "timestamp": "2025-01-15T23:00:00",
216
+ "category": "Problem",
217
+ "flagged_as_offensive": false
218
+ },
219
+ {
220
+ "id": 19,
221
+ "message": "One major concern is inadequate waste management systems",
222
+ "contributor_type": "industry",
223
+ "location": {
224
+ "lat": -15.7532,
225
+ "lng": -47.9011
226
+ },
227
+ "timestamp": "2025-01-15T23:30:00",
228
+ "category": "Problem",
229
+ "flagged_as_offensive": false
230
+ },
231
+ {
232
+ "id": 20,
233
+ "message": "The main issue is food deserts in several neighborhoods",
234
+ "contributor_type": "industry",
235
+ "location": {
236
+ "lat": -15.7114,
237
+ "lng": -47.8629
238
+ },
239
+ "timestamp": "2025-01-16T00:00:00",
240
+ "category": "Problem",
241
+ "flagged_as_offensive": false
242
+ },
243
+ {
244
+ "id": 21,
245
+ "message": "We should strive to ensure 90% of residents live within 10 minutes of transit",
246
+ "contributor_type": "other",
247
+ "location": {
248
+ "lat": -15.8209,
249
+ "lng": -47.9591
250
+ },
251
+ "timestamp": "2025-01-16T00:30:00",
252
+ "category": "Objectives",
253
+ "flagged_as_offensive": false
254
+ },
255
+ {
256
+ "id": 22,
257
+ "message": "Our target is to increase bike lane network by 200 kilometers",
258
+ "contributor_type": "other",
259
+ "location": {
260
+ "lat": -15.8401,
261
+ "lng": -47.9368
262
+ },
263
+ "timestamp": "2025-01-16T01:00:00",
264
+ "category": "Objectives",
265
+ "flagged_as_offensive": false
266
+ },
267
+ {
268
+ "id": 23,
269
+ "message": "The objective should be to increase bike lane network by 200 kilometers",
270
+ "contributor_type": "academic",
271
+ "location": {
272
+ "lat": -15.7152,
273
+ "lng": -47.9343
274
+ },
275
+ "timestamp": "2025-01-16T01:30:00",
276
+ "category": "Objectives",
277
+ "flagged_as_offensive": false
278
+ },
279
+ {
280
+ "id": 24,
281
+ "message": "We must work towards reduce carbon emissions by 50% in the next 5 years",
282
+ "contributor_type": "other",
283
+ "location": {
284
+ "lat": -15.8555,
285
+ "lng": -47.9754
286
+ },
287
+ "timestamp": "2025-01-16T02:00:00",
288
+ "category": "Objectives",
289
+ "flagged_as_offensive": false
290
+ },
291
+ {
292
+ "id": 25,
293
+ "message": "We must work towards increase bike lane network by 200 kilometers",
294
+ "contributor_type": "ngo",
295
+ "location": {
296
+ "lat": -15.7199,
297
+ "lng": -47.9691
298
+ },
299
+ "timestamp": "2025-01-16T02:30:00",
300
+ "category": "Objectives",
301
+ "flagged_as_offensive": false
302
+ },
303
+ {
304
+ "id": 26,
305
+ "message": "The objective should be to create 500 acres of new parks and green spaces",
306
+ "contributor_type": "academic",
307
+ "location": {
308
+ "lat": -15.7006,
309
+ "lng": -47.9967
310
+ },
311
+ "timestamp": "2025-01-16T03:00:00",
312
+ "category": "Objectives",
313
+ "flagged_as_offensive": false
314
+ },
315
+ {
316
+ "id": 27,
317
+ "message": "The primary objective is retrofit all public buildings for energy efficiency",
318
+ "contributor_type": "industry",
319
+ "location": {
320
+ "lat": -15.8463,
321
+ "lng": -48.0058
322
+ },
323
+ "timestamp": "2025-01-16T03:30:00",
324
+ "category": "Objectives",
325
+ "flagged_as_offensive": false
326
+ },
327
+ {
328
+ "id": 28,
329
+ "message": "We should strive to increase bike lane network by 200 kilometers",
330
+ "contributor_type": "industry",
331
+ "location": {
332
+ "lat": -15.6882,
333
+ "lng": -47.9008
334
+ },
335
+ "timestamp": "2025-01-16T04:00:00",
336
+ "category": "Objectives",
337
+ "flagged_as_offensive": false
338
+ },
339
+ {
340
+ "id": 29,
341
+ "message": "We aim to achieve provide high-speed internet to 100% of households",
342
+ "contributor_type": "industry",
343
+ "location": {
344
+ "lat": -15.7342,
345
+ "lng": -47.9172
346
+ },
347
+ "timestamp": "2025-01-16T04:30:00",
348
+ "category": "Objectives",
349
+ "flagged_as_offensive": false
350
+ },
351
+ {
352
+ "id": 30,
353
+ "message": "We aim to achieve improve water quality to exceed national standards",
354
+ "contributor_type": "community",
355
+ "location": {
356
+ "lat": -15.7662,
357
+ "lng": -47.9675
358
+ },
359
+ "timestamp": "2025-01-16T05:00:00",
360
+ "category": "Objectives",
361
+ "flagged_as_offensive": false
362
+ },
363
+ {
364
+ "id": 31,
365
+ "message": "We must implement restrictions on single-use plastics in retail",
366
+ "contributor_type": "community",
367
+ "location": {
368
+ "lat": -15.879,
369
+ "lng": -47.9683
370
+ },
371
+ "timestamp": "2025-01-16T05:30:00",
372
+ "category": "Directives",
373
+ "flagged_as_offensive": false
374
+ },
375
+ {
376
+ "id": 32,
377
+ "message": "We should establish rules for noise regulations in residential areas",
378
+ "contributor_type": "academic",
379
+ "location": {
380
+ "lat": -15.7637,
381
+ "lng": -47.9788
382
+ },
383
+ "timestamp": "2025-01-16T06:00:00",
384
+ "category": "Directives",
385
+ "flagged_as_offensive": false
386
+ },
387
+ {
388
+ "id": 33,
389
+ "message": "We should establish rules for energy efficiency standards for all renovations",
390
+ "contributor_type": "other",
391
+ "location": {
392
+ "lat": -15.713,
393
+ "lng": -47.9773
394
+ },
395
+ "timestamp": "2025-01-16T06:30:00",
396
+ "category": "Directives",
397
+ "flagged_as_offensive": false
398
+ },
399
+ {
400
+ "id": 34,
401
+ "message": "The city should enforce building codes that require accessibility standards",
402
+ "contributor_type": "other",
403
+ "location": {
404
+ "lat": -15.6881,
405
+ "lng": -48.0225
406
+ },
407
+ "timestamp": "2025-01-16T07:00:00",
408
+ "category": "Directives",
409
+ "flagged_as_offensive": false
410
+ },
411
+ {
412
+ "id": 35,
413
+ "message": "We need to mandate energy efficiency standards for all renovations",
414
+ "contributor_type": "academic",
415
+ "location": {
416
+ "lat": -15.8179,
417
+ "lng": -47.9225
418
+ },
419
+ "timestamp": "2025-01-16T07:30:00",
420
+ "category": "Directives",
421
+ "flagged_as_offensive": false
422
+ },
423
+ {
424
+ "id": 36,
425
+ "message": "Authorities need to enforce building codes that require accessibility standards",
426
+ "contributor_type": "government",
427
+ "location": {
428
+ "lat": -15.8307,
429
+ "lng": -47.898
430
+ },
431
+ "timestamp": "2025-01-16T08:00:00",
432
+ "category": "Directives",
433
+ "flagged_as_offensive": false
434
+ },
435
+ {
436
+ "id": 37,
437
+ "message": "Authorities need to enforce protected bike lanes on all major corridors",
438
+ "contributor_type": "government",
439
+ "location": {
440
+ "lat": -15.7259,
441
+ "lng": -47.9658
442
+ },
443
+ "timestamp": "2025-01-16T08:30:00",
444
+ "category": "Directives",
445
+ "flagged_as_offensive": false
446
+ },
447
+ {
448
+ "id": 38,
449
+ "message": "Policies must ensure tree preservation ordinances in development zones",
450
+ "contributor_type": "industry",
451
+ "location": {
452
+ "lat": -15.8086,
453
+ "lng": -47.9173
454
+ },
455
+ "timestamp": "2025-01-16T09:00:00",
456
+ "category": "Directives",
457
+ "flagged_as_offensive": false
458
+ },
459
+ {
460
+ "id": 39,
461
+ "message": "We should establish rules for building codes that require accessibility standards",
462
+ "contributor_type": "community",
463
+ "location": {
464
+ "lat": -15.8257,
465
+ "lng": -48.0039
466
+ },
467
+ "timestamp": "2025-01-16T09:30:00",
468
+ "category": "Directives",
469
+ "flagged_as_offensive": false
470
+ },
471
+ {
472
+ "id": 40,
473
+ "message": "Authorities need to enforce restrictions on single-use plastics in retail",
474
+ "contributor_type": "government",
475
+ "location": {
476
+ "lat": -15.6997,
477
+ "lng": -47.8941
478
+ },
479
+ "timestamp": "2025-01-16T10:00:00",
480
+ "category": "Directives",
481
+ "flagged_as_offensive": false
482
+ },
483
+ {
484
+ "id": 41,
485
+ "message": "Our foundation is built on transparency and democratic decision-making",
486
+ "contributor_type": "industry",
487
+ "location": {
488
+ "lat": -15.7953,
489
+ "lng": -47.8969
490
+ },
491
+ "timestamp": "2025-01-16T10:30:00",
492
+ "category": "Values",
493
+ "flagged_as_offensive": false
494
+ },
495
+ {
496
+ "id": 42,
497
+ "message": "We hold social equity and inclusive participation as a core value",
498
+ "contributor_type": "academic",
499
+ "location": {
500
+ "lat": -15.8073,
501
+ "lng": -47.993
502
+ },
503
+ "timestamp": "2025-01-16T11:00:00",
504
+ "category": "Values",
505
+ "flagged_as_offensive": false
506
+ },
507
+ {
508
+ "id": 43,
509
+ "message": "We are committed to innovation balanced with preservation",
510
+ "contributor_type": "ngo",
511
+ "location": {
512
+ "lat": -15.7714,
513
+ "lng": -47.9996
514
+ },
515
+ "timestamp": "2025-01-16T11:30:00",
516
+ "category": "Values",
517
+ "flagged_as_offensive": false
518
+ },
519
+ {
520
+ "id": 44,
521
+ "message": "We are committed to community resilience and mutual support",
522
+ "contributor_type": "ngo",
523
+ "location": {
524
+ "lat": -15.78,
525
+ "lng": -47.9534
526
+ },
527
+ "timestamp": "2025-01-16T12:00:00",
528
+ "category": "Values",
529
+ "flagged_as_offensive": false
530
+ },
531
+ {
532
+ "id": 45,
533
+ "message": "We are committed to community resilience and mutual support",
534
+ "contributor_type": "industry",
535
+ "location": {
536
+ "lat": -15.7062,
537
+ "lng": -47.8504
538
+ },
539
+ "timestamp": "2025-01-16T12:30:00",
540
+ "category": "Values",
541
+ "flagged_as_offensive": false
542
+ },
543
+ {
544
+ "id": 46,
545
+ "message": "We are committed to accessibility and universal design",
546
+ "contributor_type": "community",
547
+ "location": {
548
+ "lat": -15.7476,
549
+ "lng": -47.9312
550
+ },
551
+ "timestamp": "2025-01-16T13:00:00",
552
+ "category": "Values",
553
+ "flagged_as_offensive": false
554
+ },
555
+ {
556
+ "id": 47,
557
+ "message": "It is essential to prioritize health and wellbeing for all residents",
558
+ "contributor_type": "other",
559
+ "location": {
560
+ "lat": -15.7532,
561
+ "lng": -47.9828
562
+ },
563
+ "timestamp": "2025-01-16T13:30:00",
564
+ "category": "Values",
565
+ "flagged_as_offensive": false
566
+ },
567
+ {
568
+ "id": 48,
569
+ "message": "We hold innovation balanced with preservation as a core value",
570
+ "contributor_type": "industry",
571
+ "location": {
572
+ "lat": -15.8689,
573
+ "lng": -48.0167
574
+ },
575
+ "timestamp": "2025-01-16T14:00:00",
576
+ "category": "Values",
577
+ "flagged_as_offensive": false
578
+ },
579
+ {
580
+ "id": 49,
581
+ "message": "The principle of innovation balanced with preservation matters to us",
582
+ "contributor_type": "community",
583
+ "location": {
584
+ "lat": -15.6869,
585
+ "lng": -48.0234
586
+ },
587
+ "timestamp": "2025-01-16T14:30:00",
588
+ "category": "Values",
589
+ "flagged_as_offensive": false
590
+ },
591
+ {
592
+ "id": 50,
593
+ "message": "Our community values accessibility and universal design",
594
+ "contributor_type": "academic",
595
+ "location": {
596
+ "lat": -15.8087,
597
+ "lng": -47.9772
598
+ },
599
+ "timestamp": "2025-01-16T15:00:00",
600
+ "category": "Values",
601
+ "flagged_as_offensive": false
602
+ },
603
+ {
604
+ "id": 51,
605
+ "message": "We can construct comprehensive recycling and composting facilities",
606
+ "contributor_type": "industry",
607
+ "location": {
608
+ "lat": -15.8132,
609
+ "lng": -47.9721
610
+ },
611
+ "timestamp": "2025-01-16T15:30:00",
612
+ "category": "Actions",
613
+ "flagged_as_offensive": false
614
+ },
615
+ {
616
+ "id": 52,
617
+ "message": "Let us establish a new metro line connecting eastern suburbs",
618
+ "contributor_type": "industry",
619
+ "location": {
620
+ "lat": -15.694,
621
+ "lng": -47.9389
622
+ },
623
+ "timestamp": "2025-01-16T16:00:00",
624
+ "category": "Actions",
625
+ "flagged_as_offensive": false
626
+ },
627
+ {
628
+ "id": 53,
629
+ "message": "We should install community centers in underserved neighborhoods",
630
+ "contributor_type": "government",
631
+ "location": {
632
+ "lat": -15.8259,
633
+ "lng": -47.9417
634
+ },
635
+ "timestamp": "2025-01-16T16:30:00",
636
+ "category": "Actions",
637
+ "flagged_as_offensive": false
638
+ },
639
+ {
640
+ "id": 54,
641
+ "message": "We should build a new metro line connecting eastern suburbs",
642
+ "contributor_type": "community",
643
+ "location": {
644
+ "lat": -15.717,
645
+ "lng": -47.9367
646
+ },
647
+ "timestamp": "2025-01-16T17:00:00",
648
+ "category": "Actions",
649
+ "flagged_as_offensive": false
650
+ },
651
+ {
652
+ "id": 55,
653
+ "message": "Let us organize farmers markets in every district",
654
+ "contributor_type": "industry",
655
+ "location": {
656
+ "lat": -15.8263,
657
+ "lng": -47.9003
658
+ },
659
+ "timestamp": "2025-01-16T17:30:00",
660
+ "category": "Actions",
661
+ "flagged_as_offensive": false
662
+ },
663
+ {
664
+ "id": 56,
665
+ "message": "We should build comprehensive recycling and composting facilities",
666
+ "contributor_type": "community",
667
+ "location": {
668
+ "lat": -15.8417,
669
+ "lng": -47.9085
670
+ },
671
+ "timestamp": "2025-01-16T18:00:00",
672
+ "category": "Actions",
673
+ "flagged_as_offensive": false
674
+ },
675
+ {
676
+ "id": 57,
677
+ "message": "We can construct free WiFi hotspots in all public spaces",
678
+ "contributor_type": "government",
679
+ "location": {
680
+ "lat": -15.8124,
681
+ "lng": -47.8294
682
+ },
683
+ "timestamp": "2025-01-16T18:30:00",
684
+ "category": "Actions",
685
+ "flagged_as_offensive": false
686
+ },
687
+ {
688
+ "id": 58,
689
+ "message": "We need to develop farmers markets in every district",
690
+ "contributor_type": "community",
691
+ "location": {
692
+ "lat": -15.7155,
693
+ "lng": -47.918
694
+ },
695
+ "timestamp": "2025-01-16T19:00:00",
696
+ "category": "Actions",
697
+ "flagged_as_offensive": false
698
+ },
699
+ {
700
+ "id": 59,
701
+ "message": "We need to develop protected bike lanes on major streets",
702
+ "contributor_type": "other",
703
+ "location": {
704
+ "lat": -15.8594,
705
+ "lng": -47.9596
706
+ },
707
+ "timestamp": "2025-01-16T19:30:00",
708
+ "category": "Actions",
709
+ "flagged_as_offensive": false
710
+ },
711
+ {
712
+ "id": 60,
713
+ "message": "Let us create solar panel installations on 200 public buildings",
714
+ "contributor_type": "community",
715
+ "location": {
716
+ "lat": -15.7879,
717
+ "lng": -47.9923
718
+ },
719
+ "timestamp": "2025-01-16T20:00:00",
720
+ "category": "Actions",
721
+ "flagged_as_offensive": false
722
+ }
723
+ ],
724
+ "export_date": "2025-10-06T13:14:53.243263",
725
+ "description": "Mock dataset with 60 balanced submissions (10 per category)"
726
+ }
prepare_hf_deployment.sh ADDED
@@ -0,0 +1,109 @@
1
+ #!/bin/bash
2
+
3
+ # Hugging Face Deployment Preparation Script
4
+ # This script prepares your app for deployment to Hugging Face Spaces
5
+
6
+ set -e # Exit on error
7
+
8
+ echo "πŸš€ Preparing for Hugging Face Spaces Deployment"
9
+ echo "================================================"
10
+ echo ""
11
+
12
+ # Check if we're in the right directory
13
+ if [ ! -f "app_hf.py" ]; then
14
+ echo "❌ Error: Must run from project root (where app_hf.py is located)"
15
+ exit 1
16
+ fi
17
+
18
+ # Step 1: Copy HF-specific files
19
+ echo "πŸ“ Step 1: Copying HF-specific files..."
20
+ cp Dockerfile.hf Dockerfile
21
+ echo " βœ“ Copied Dockerfile.hf β†’ Dockerfile"
22
+
23
+ cp README_HF.md README.md
24
+ echo " βœ“ Copied README_HF.md β†’ README.md"
25
+
26
+ # Step 2: Verify required files exist
27
+ echo ""
28
+ echo "πŸ” Step 2: Verifying required files..."
29
+ required_files=("Dockerfile" "README.md" "requirements.txt" "app_hf.py" "wsgi.py" ".gitignore" "app/__init__.py")
30
+
31
+ for file in "${required_files[@]}"; do
32
+ if [ -f "$file" ] || [ -d "$file" ]; then
33
+ echo " βœ“ $file"
34
+ else
35
+ echo " ❌ Missing: $file"
36
+ exit 1
37
+ fi
38
+ done
39
+
40
+ # Step 3: Check app/ directory
41
+ echo ""
42
+ echo "πŸ“‚ Step 3: Checking app directory structure..."
43
+ app_dirs=("app/routes" "app/models" "app/templates" "app/fine_tuning")
44
+
45
+ for dir in "${app_dirs[@]}"; do
46
+ if [ -d "$dir" ]; then
47
+ echo " βœ“ $dir/"
48
+ else
49
+ echo " ⚠️ Warning: $dir/ not found"
50
+ fi
51
+ done
52
+
53
+ # Step 4: Verify port configuration
54
+ echo ""
55
+ echo "πŸ”Œ Step 4: Verifying port 7860 configuration..."
56
+
57
+ if grep -q "7860" Dockerfile && grep -q "7860" app_hf.py; then
58
+ echo " βœ“ Port 7860 configured correctly"
59
+ else
60
+ echo " ❌ Port 7860 not found in Dockerfile or app_hf.py"
61
+ exit 1
62
+ fi
63
+
64
+ # Step 5: Check for sensitive files
65
+ echo ""
66
+ echo "πŸ”’ Step 5: Checking for sensitive files..."
67
+
68
+ if [ -f ".env" ]; then
69
+ echo " ⚠️ WARNING: .env file exists - DO NOT upload to HF!"
70
+ echo " Use HF Secrets instead for FLASK_SECRET_KEY"
71
+ fi
72
+
73
+ if [ -f "instance/participatory_planner.db" ]; then
74
+ echo " ⚠️ Local database exists - will NOT be uploaded (good)"
75
+ fi
76
+
77
+ # Step 6: Generate deployment summary
78
+ echo ""
79
+ echo "πŸ“Š Step 6: Deployment Summary"
80
+ echo "============================="
81
+ echo ""
82
+ echo "Ready to deploy to Hugging Face Spaces!"
83
+ echo ""
84
+ echo "πŸ“¦ Files ready for upload:"
85
+ echo " - Dockerfile (HF version)"
86
+ echo " - README.md (with YAML header)"
87
+ echo " - requirements.txt"
88
+ echo " - app_hf.py"
89
+ echo " - wsgi.py"
90
+ echo " - app/ directory"
91
+ echo " - .gitignore"
92
+ echo ""
93
+ echo "πŸ” IMPORTANT - Configure these secrets in HF Space Settings:"
94
+ echo " Secret Name: FLASK_SECRET_KEY"
95
+ echo " Secret Value: 9fd11d101e36efbd3a7893f56d604b860403d247633547586c41453118e69b00"
96
+ echo ""
97
+ echo "🌐 Next steps:"
98
+ echo " 1. Go to https://huggingface.co/new-space"
99
+ echo " 2. Choose SDK: Docker"
100
+ echo " 3. Upload the files listed above"
101
+ echo " 4. Add FLASK_SECRET_KEY to Secrets"
102
+ echo " 5. Wait for build (~10 minutes first time)"
103
+ echo ""
104
+ echo "πŸ“– For detailed instructions, see:"
105
+ echo " - HF_DEPLOYMENT_CHECKLIST.md"
106
+ echo " - HUGGINGFACE_DEPLOYMENT.md"
107
+ echo ""
108
+ echo "βœ… Preparation complete! Ready to deploy! πŸŽ‰"
109
+
requirements.txt CHANGED
@@ -14,3 +14,6 @@ matplotlib>=3.7.0
14
  seaborn>=0.12.0
15
  accelerate>=0.24.0
16
  evaluate>=0.4.0
 
 
 
 
14
  seaborn>=0.12.0
15
  accelerate>=0.24.0
16
  evaluate>=0.4.0
17
+
18
+ # Text processing (for sentence segmentation)
19
+ nltk>=3.8.0
run.py CHANGED
@@ -1,3 +1,9 @@
 
 
 
 
 
 
1
  from app import create_app
2
 
3
  app = create_app()
 
1
+ import os
2
+ from dotenv import load_dotenv
3
+
4
+ # Load environment variables (including CUDA_VISIBLE_DEVICES)
5
+ load_dotenv()
6
+
7
  from app import create_app
8
 
9
  app = create_app()
sentence_analysis_results.txt ADDED
@@ -0,0 +1,9 @@
1
+ DETAILED SENTENCE-LEVEL ANALYSIS RESULTS
2
+ ======================================================================
3
+
4
+ Total Submissions: 60
5
+ Multi-category Submissions: 0 (0.0%)
6
+
7
+
8
+ DETAILED BREAKDOWN:
9
+