# Multi-Model Support Testing Guide

This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.

## Prerequisites

- Mac Studio M3 Ultra or MacBook Pro M4 Max
- Python 3.8+
- All dependencies installed (`pip install -r requirements.txt`)
- Internet connection (for downloading Code-Llama 7B)

## Quick Start

### Step 1: Start the Backend

In one terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000
```

**Expected output:**

```
INFO: Loading CodeGen 350M on Apple Silicon GPU...
INFO: ✅ CodeGen 350M loaded successfully
INFO: Layers: 20, Heads: 16
INFO: Uvicorn running on http://127.0.0.1:8000
```

### Step 2: Run the Test Script

In another terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py
```

## What the Test Script Does

The test script runs 10 comprehensive tests:

1. ✅ **Health Check** - Verifies the backend is running
2. ✅ **List Models** - Shows available models (CodeGen, Code-Llama)
3. ✅ **Current Model** - Gets info about the loaded model
4. ✅ **Model Info** - Gets detailed architecture info
5. ✅ **Generate (CodeGen)** - Tests text generation with CodeGen
6. ✅ **Switch to Code-Llama** - Loads Code-Llama 7B
7. ✅ **Model Info (Code-Llama)** - Verifies Code-Llama loaded correctly
8. ✅ **Generate (Code-Llama)** - Tests generation with Code-Llama
9. ✅ **Switch Back to CodeGen** - Verifies model unloading works
10. ✅ **Generate (CodeGen again)** - Tests that CodeGen still works

## Expected Test Duration

- Tests 1-5 (CodeGen only): ~2-3 minutes
- Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
- Tests 7-10: ~3-5 minutes

**Total first run:** ~15-20 minutes
**Subsequent runs:** ~5-10 minutes (no download)

## Manual API Testing

If you prefer to test manually, use these curl commands:

### List Available Models

```bash
curl http://localhost:8000/models | jq
```

### Get Current Model

```bash
curl http://localhost:8000/models/current | jq
```

### Switch to Code-Llama

```bash
curl -X POST http://localhost:8000/models/switch \
  -H "Content-Type: application/json" \
  -d '{"model_id": "code-llama-7b"}' | jq
```

### Generate Text

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "def fibonacci(n):\n ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": false
  }' | jq
```

### Get Model Info

```bash
curl http://localhost:8000/model/info | jq
```
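### Python Alternative

If you would rather drive the same endpoints from Python, here is a minimal sketch using `requests`. The endpoints and payloads mirror the curl commands above; note that `requests` has no default timeout, so the switch call simply blocks until the model finishes loading.

```python
import requests

BASE = "http://localhost:8000"

# List available models (same as: curl http://localhost:8000/models)
print(requests.get(f"{BASE}/models").json())

# Get the currently loaded model
print(requests.get(f"{BASE}/models/current").json())

# Switch to Code-Llama, then generate text with it
resp = requests.post(f"{BASE}/models/switch", json={"model_id": "code-llama-7b"})
resp.raise_for_status()

resp = requests.post(
    f"{BASE}/generate",
    json={
        "prompt": "def fibonacci(n):\n ",
        "max_tokens": 50,
        "temperature": 0.7,
        "extract_traces": False,
    },
)
print(resp.json())
```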
## Success Criteria

Before committing to GitHub, verify:

- ✅ All tests pass
- ✅ CodeGen generates reasonable code
- ✅ Code-Llama loads successfully
- ✅ Code-Llama generates reasonable code
- ✅ You can switch between models multiple times
- ✅ No Python errors in the backend logs
- ✅ Memory usage is reasonable (check Activity Monitor)

## Expected Model Behavior

### CodeGen 350M

- Loads in ~5-10 seconds
- Uses ~2-3GB RAM
- Generates Python code (trained on Python only)
- 20 layers, 16 attention heads

### Code-Llama 7B

- First download: ~14GB, takes 5-10 minutes
- Loads in ~30-60 seconds
- Uses ~14-16GB RAM
- Generates multiple languages
- 32 layers, 32 attention heads (GQA with 8 KV heads)

## Troubleshooting

### Backend won't start

```bash
# Check whether something is already listening on port 8000
lsof -i :8000

# Kill the existing process (substitute the PID reported by lsof)
kill -9 <PID>
```

### Import errors

```bash
# Reinstall dependencies
pip install -r requirements.txt
```

### Code-Llama download fails

- Check your internet connection
- Verify HuggingFace is accessible: `ping huggingface.co`
- Try downloading manually:

```python
from transformers import AutoModelForCausalLM

AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
```

### Out of memory

- Close other applications
- Use CodeGen only (skip the Code-Llama tests)
- Check Activity Monitor for memory usage

## Next Steps After Testing

Once all tests pass:

1. **Document any issues found**
2. **Take note of generation quality**
3. **Check if visualizations need updates** (next phase)
4. **Commit to the feature branch** (NOT main)
5. **Test frontend integration**

## Files Modified

This implementation modified/created:

**Backend:**
- `backend/model_config.py` (NEW)
- `backend/model_adapter.py` (NEW)
- `backend/model_service.py` (MODIFIED)
- `test_multi_model.py` (NEW)

**Status:** All changes are in the `feature/multi-model-support` branch

**Rollback:** `git checkout pre-multimodel` tag if needed
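## Optional: Repeated-Switch Smoke Check

The success criteria above include switching between models multiple times. Here is a minimal sketch that exercises that path via the API. Only `code-llama-7b` is confirmed by this guide; the CodeGen model id (`codegen-350m`) is an assumption, so confirm the real ids via `GET /models` before running it.

```python
import requests

BASE = "http://localhost:8000"

# "codegen-350m" is an assumed id - check GET /models for the real one
MODEL_IDS = ["code-llama-7b", "codegen-350m"]

# Switch back and forth a few times; any failed switch raises an error
for round_num in range(1, 4):
    for model_id in MODEL_IDS:
        resp = requests.post(f"{BASE}/models/switch", json={"model_id": model_id})
        resp.raise_for_status()
        print(f"Round {round_num}: switched to {model_id} OK")
```

While this runs, watch the backend logs for Python errors and Activity Monitor for memory that fails to come back down after unloading.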