Welcome to our solution for the Personalized Solutions Dataquest Challenge, where we built a predictive and recommendation system to enhance user engagement on a financial services platform.
The challenge was to:
- Predict meaningful customer interactions (`CLICK`, `CHECKOUT`) from raw behavioral data.
- Recommend relevant financial products to users based on their activity and profile.
- Dataset Description
- Data Preparation
- Exploratory Data Analysis (EDA)
- Modeling Approaches
- Evaluation Metrics
- Recommendation System
- Feature Importance
- Wow Moments
- Conclusion & Next Steps
| Column | Type | Description |
|---|---|---|
| `idcol` | User ID | Unique customer identifier |
| `interaction` | Categorical | DISPLAY, CLICK, CHECKOUT |
| `int_date` | Date | Date of interaction |
| `item` | Categorical | Item code |
| `item_type` | Category | TRANSACT, LEND, INVEST, etc. |
| `item_descrip` | Text | Description of the item |
| `page`, `tod` | Context | App page, time of day |
| `segment`, `beh_segment` | User Features | Broad and detailed segmentation |
| `active_ind` | Activity | Cold Start, Semi Active, Active |
- Date conversion and missing value handling
- Created a `rating` column: DISPLAY = 0, CLICK = 1, CHECKOUT = 2
- Train/test split using time-based logic
- Categorical encoding and feature scaling
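The rating encoding and time-based split listed above can be sketched in plain Python, so it runs without extra dependencies. Only the `interaction`, `int_date`, and `idcol` column names and the 0/1/2 mapping come from this README; the sample rows, the `add_rating` and `split_by_date` helpers, and the cutoff date are hypothetical and chosen for illustration.

```python
from datetime import date

# Mapping described above: stronger engagement gets a higher rating.
RATING_MAP = {"DISPLAY": 0, "CLICK": 1, "CHECKOUT": 2}


def add_rating(rows):
    """Return copies of the rows with a numeric `rating` field added."""
    return [{**row, "rating": RATING_MAP[row["interaction"]]} for row in rows]


def split_by_date(rows, cutoff):
    """Time-based split: rows before `cutoff` go to train, the rest to test."""
    train = [r for r in rows if r["int_date"] < cutoff]
    test = [r for r in rows if r["int_date"] >= cutoff]
    return train, test


# Hypothetical sample rows, for illustration only.
rows = [
    {"idcol": 1, "interaction": "DISPLAY", "int_date": date(2023, 1, 5)},
    {"idcol": 1, "interaction": "CLICK", "int_date": date(2023, 2, 10)},
    {"idcol": 2, "interaction": "CHECKOUT", "int_date": date(2023, 3, 1)},
]

rated = add_rating(rows)
train, test = split_by_date(rated, cutoff=date(2023, 2, 15))
```

Splitting on a date rather than at random keeps every test interaction later than every training interaction, which avoids training on information from the future.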
- Weekly interaction trends visualized for DISPLAY, CLICK, CHECKOUT
- Feature correlation heatmap showing influence of `segment`, `item_type`, `active_ind`
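As a rough illustration of the weekly trend analysis, the counts behind such a plot could be computed as below. This is a dependency-free sketch: the `weekly_counts` helper and the sample rows are assumptions, and grouping by ISO week is one possible choice, not necessarily the one used in the notebook.

```python
from collections import Counter
from datetime import date


def weekly_counts(rows):
    """Count interactions per (ISO year, ISO week, interaction type)."""
    counts = Counter()
    for row in rows:
        iso = row["int_date"].isocalendar()
        counts[(iso[0], iso[1], row["interaction"])] += 1
    return counts


# Hypothetical sample rows, for illustration only.
rows = [
    {"interaction": "DISPLAY", "int_date": date(2023, 1, 2)},
    {"interaction": "DISPLAY", "int_date": date(2023, 1, 4)},
    {"interaction": "CLICK", "int_date": date(2023, 1, 4)},
    {"interaction": "DISPLAY", "int_date": date(2023, 1, 10)},
]

counts = weekly_counts(rows)
```

Each key of `counts` identifies one point on a weekly trend line for a given interaction type.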
- Logistic Regression (baseline)
- Random Forest
- Bagging Classifier
- Gradient Boosting
- Bagging and Gradient Boosting improved performance: bagging mainly reduces variance, while boosting mainly reduces bias.
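A minimal sketch of how the four model families above might be compared, assuming scikit-learn is installed. The data here is synthetic (`make_classification`), the task is simplified to binary classification, the split is random rather than time-based, and all hyperparameters are placeholders, so the resulting scores say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features are not reproduced here.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Placeholder hyperparameters, chosen only to keep the example fast.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "bagging": BaggingClassifier(n_estimators=20, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record its F1 score on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))
```

Keeping the models in one dictionary makes it easy to evaluate them all with the same split and metric.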
We evaluated models using:
- Accuracy
- Precision
- Recall
- F1 Score
- AUC-ROC
- Log Loss
- Precision@10 (for top-N relevance)
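For reference, several of these metrics can be written out directly. The sketch below implements binary precision, recall, F1, log loss, and Precision@K in plain Python; the function names and the toy inputs are illustrative assumptions, and a library such as scikit-learn would normally be used instead.

```python
import math


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Toy inputs, for illustration only.
precision, recall, f1 = precision_recall_f1([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
loss = log_loss([1, 0], [0.5, 0.5])
p_at_4 = precision_at_k(["a", "b", "c", "d"], {"a", "c", "x"}, k=4)
```

Note that `precision_at_k` divides by `k` even when fewer than `k` items are recommended, which penalizes short recommendation lists.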
- TF-IDF + Cosine Similarity on item descriptions
- Recommended similar items per product
- SVD-based user-item matrix factorization
- Suggested items based on similar users’ behaviors
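The content-based part (TF-IDF plus cosine similarity on item descriptions) can be illustrated with a small dependency-free sketch. The descriptions, the whitespace tokenizer, the smoothed IDF formula, and the helper names are all assumptions made for this example; the SVD-based collaborative part is not shown here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of texts."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(index, vectors, top_n=2):
    """Indices of the items most similar to the item at `index`."""
    scored = [
        (cosine(vectors[index], vec), j)
        for j, vec in enumerate(vectors)
        if j != index
    ]
    scored.sort(reverse=True)
    return [j for _, j in scored[:top_n]]


# Hypothetical item descriptions, for illustration only.
descriptions = [
    "personal loan low interest",
    "home loan fixed interest",
    "unit trust investment fund",
    "retirement investment fund",
]
vectors = tfidf_vectors(descriptions)
similar_to_first = most_similar(0, vectors, top_n=1)
```

Here the first description shares terms only with the second, so the second item is returned as its nearest neighbor.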
Key influencing features:
- `segment`
- `active_ind`
- `item_type`
Visualized using:
- Random Forest Feature Importance
- Permutation Importance
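Permutation importance measures how much a model's score drops when one feature column is shuffled. The sketch below shows the idea in plain Python with a hand-written toy model; `toy_model`, the data, and the helper names are hypothetical, and a real analysis would apply the same procedure to the trained classifier.

```python
import random


def accuracy(model, X, y):
    """Share of rows where the model's prediction matches the label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Toy classifier: uses only the first feature and ignores the second."""
    return 1 if row[0] > 0 else 0


# Toy data, for illustration only.
X = [[1, 5], [-1, 3], [2, 8], [-2, 1], [3, 2], [-3, 7]]
y = [1, 0, 1, 0, 1, 0]
importances = permutation_importance(toy_model, X, y)
```

Because the toy model never reads the second feature, shuffling that column cannot change its predictions and the importance is exactly zero, while shuffling the first column degrades accuracy.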
- Transformed `interaction` into numerical ratings for modeling
- Hybrid Recommender System combining content & collaborative filtering
- Ensemble visualizations with trend lines
- Segment-based insights revealed behavioral drivers
- Time-aware test split to simulate real-world predictions
- Precision@K focused on top-N quality
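One common way to combine content-based and collaborative scores into a hybrid ranking is a weighted blend after normalization. The sketch below is an assumption about how such a blend could look, not the project's actual method: the `alpha` weight, the min-max scaling, the item codes, and the score values are all hypothetical.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so both sources are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def hybrid_rank(content_scores, collab_scores, alpha=0.5, top_n=3):
    """Rank items by alpha * content + (1 - alpha) * collaborative score."""
    content = min_max(content_scores)
    collab = min_max(collab_scores)
    items = set(content) | set(collab)
    blended = {
        item: alpha * content.get(item, 0.0) + (1 - alpha) * collab.get(item, 0.0)
        for item in items
    }
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(blended, key=lambda item: (-blended[item], item))[:top_n]


# Hypothetical scores for one user, for illustration only.
content_scores = {"LOAN_A": 0.9, "FUND_B": 0.2, "CARD_C": 0.5}
collab_scores = {"LOAN_A": 1.0, "FUND_B": 4.0, "CARD_C": 3.0}

ranking = hybrid_rank(content_scores, collab_scores, alpha=0.5)
content_only = hybrid_rank(content_scores, collab_scores, alpha=1.0)
```

With `alpha=1.0` the ranking falls back to content similarity alone, which is one simple option for users who have little or no interaction history.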
- Accurate interaction prediction with ensemble methods
- Personalized recommendations using hybrid approach
- Add time-series aware modeling
- Improve recommendations for cold-start users
- Automate pipeline for retraining and monitoring
- Simanga Mchunu
- Contact: simacoder@hotmail.com
├── data/
│   └── data.csv
├── notebooks/
│   └── model_training.ipynb
├── src/
│   ├── preprocessing.py
│   ├── modeling.py
│   └── recommender.py
├── Recommender Presentation.pdf
└── README.md