Model Card: NYC Touristy Text Classifier
Model Details
- Model Type: A distilbert-base-uncased model fine-tuned for binary text classification.
- Model Date: September 19, 2025
- Developed by: [Your Name/Team Name]
- Hugging Face Hub ID: zacCMU/2025-24679-text-distilbert-predictor
Intended Use
This model is designed to classify short, descriptive texts about locations in New York City as either touristy or not_touristy. It is intended for applications that aim to categorize user-generated content, filter location reviews, or analyze descriptive narratives about urban environments.
Training Data
The model was fine-tuned on the bareethul/nyc-landmark-descriptions dataset. This dataset contains descriptions of various locations, each labeled as touristy or not_touristy.
The training process utilized both the original and an augmented version of the dataset to improve robustness and generalization.
Training Procedure
The model was trained for 5 epochs using the Hugging Face Trainer API. The training process was configured with the following key hyperparameters:
- Learning Rate: 2e-5
- Batch Size: 8 per device for both training and evaluation
- Weight Decay: 0.01
- Evaluation Strategy: Performed at the end of each epoch
- Best Model Selection: The model with the highest accuracy on the evaluation set was saved as the final version.
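The hyperparameters above could be passed to the Trainer API roughly as follows. This is a minimal sketch, not the author's training script: the `output_dir` value, the `save_strategy` setting, and the helper name `build_training_args` are assumptions added for illustration. The keyword for per-epoch evaluation has been renamed across transformers releases, so the sketch checks which one is available.

```python
import inspect

# Hyperparameters stated in this card, keyed by TrainingArguments parameter names.
HYPERPARAMS = {
    "num_train_epochs": 5,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "weight_decay": 0.01,
    # Assumption: checkpoints are saved every epoch so the best one can be reloaded.
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "accuracy",
}


def build_training_args(output_dir: str = "outputs"):
    """Build TrainingArguments from HYPERPARAMS (output_dir is a hypothetical path)."""
    # Imported lazily so the dictionary above can be inspected without transformers.
    from transformers import TrainingArguments

    params = inspect.signature(TrainingArguments.__init__).parameters
    # Recent releases use `eval_strategy`; older ones use `evaluation_strategy`.
    eval_key = "eval_strategy" if "eval_strategy" in params else "evaluation_strategy"
    return TrainingArguments(output_dir=output_dir, **{eval_key: "epoch"}, **HYPERPARAMS)
```

Selecting the best model by accuracy also requires a `compute_metrics` function passed to the Trainer that returns an `accuracy` key.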
Evaluation
The model's performance was evaluated on two separate datasets: the augmented test set and the original, un-augmented data (treated as an external validation set). The model achieved perfect scores across all standard classification metrics on both sets.
Test Results (Augmented Data):
| Metric | Value |
|---|---|
| Accuracy | 1.0000 |
| F1 | 1.0000 |
| Precision | 1.0000 |
| Recall | 1.0000 |
External Validation Results (Original Data):
| Metric | Value |
|---|---|
| Accuracy | 1.0000 |
| F1 | 1.0000 |
| Precision | 1.0000 |
| Recall | 1.0000 |
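For reference, the four metrics in the tables above can be computed from predicted and true labels as sketched below. The card does not say which averaging was used for precision, recall, and F1; this sketch assumes touristy is the positive class, and the function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "touristy"
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary task with one positive label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

A score of 1.0000 on every metric means there were no false positives and no false negatives on that set.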
Sample Prediction:
Input: 'Flower stalls line the avenues, petals bright against brownstone grit. Young lovers trade tulips, old friends share sunflowers, all believing in the promise of beauty for another day.'
True Label: not_touristy
Predicted: not_touristy (Confidence: 0.999)
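A confidence value like the one above is typically the softmax probability of the predicted class. The sketch below illustrates the arithmetic only: the logits and the label order are hypothetical, chosen to show how a large gap between two logits yields a probability close to 1, and are not taken from the model.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ["not_touristy", "touristy"]  # assumed label order
logits = [3.5, -3.4]                   # hypothetical logits for one input

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
prediction = (LABELS[best], round(probs[best], 3))
```

A confidence this high on training-like text says little about calibration on unseen data.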
Limitations and Ethical Considerations
- Dataset Specificity: This model is highly specialized for the nyc-landmark-descriptions dataset. Its performance on text describing locations outside of New York City or on different styles of prose is not guaranteed.
- Subjectivity: The labels touristy and not_touristy are inherently subjective and reflect the definitions used in the original dataset. The model's classifications may not align with every individual's perception.
- Potential for Overfitting: While the model scored perfectly on the provided test sets, this may indicate a risk of overfitting to the specific vocabulary and structure of the training data. Performance may differ on completely novel, real-world data.
How to Use
You can use this model for inference with the pipeline function from the transformers library.
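A minimal sketch of that usage follows. The helper names `load_classifier` and `classify` and the example sentence are illustrative assumptions; the returned label strings depend on the model's configuration and are assumed here to be touristy and not_touristy, as in the sample prediction.

```python
from typing import Callable, Dict, List

MODEL_ID = "zacCMU/2025-24679-text-distilbert-predictor"  # Hub ID from this card


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first call)."""
    # Imported lazily so `classify` below has no hard dependency on transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier over a list of texts and pair each text with its prediction."""
    predictions = classifier(texts)  # one {"label": ..., "score": ...} dict per text
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]


# Typical use (requires transformers, a backend such as PyTorch, and network access):
# classifier = load_classifier()
# for row in classify(["Crowds queue for the ferry to the Statue of Liberty."], classifier):
#     print(row["label"], round(row["score"], 3))
```

Keeping the classifier as a parameter makes it possible to swap in a stub for testing without downloading the model.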