Outcome Prediction on a Simulated Diamond Dataset
Built and compared regression models on a 10,000-row training set with 30 features, then showed that the target was simulated and only weakly related to the classic diamond-pricing variables.
- Depth emerged as the dominant signal (correlation with the outcome ≈ -0.411), while price and carat showed near-zero correlation with the outcome.
- The best model was a tuned XGBoost regressor (R² = 0.4736), which outperformed both linear regression and random forest.
- Feature importance showed that depth accounted for ~57% of split decisions; the remaining predictive signal was limited by high irreducible noise.
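The correlation screening behind the depth finding can be sketched in plain Python. This is a minimal illustration, not the project's code. It assumes Pearson correlation was the measure used, which the summary does not state. The helper names `pearson` and `rank_by_abs_correlation` are hypothetical, and the toy rows are invented for illustration only. They are not drawn from the project's dataset.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation is undefined, treat as no signal
    return cov / (sx * sy)


def rank_by_abs_correlation(
    features: Dict[str, List[float]], target: List[float]
) -> List[Tuple[str, float]]:
    """Return (feature, r) pairs sorted by |r|, strongest relationship first."""
    scores = [(name, pearson(col, target)) for name, col in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy data (hypothetical, for illustration only): depth falls as outcome falls.
toy_features = {
    "depth": [60.0, 61.5, 62.0, 63.2, 64.1],
    "carat": [0.3, 1.1, 0.5, 0.9, 0.4],
}
toy_outcome = [5.0, 4.1, 3.9, 3.0, 2.2]
ranked = rank_by_abs_correlation(toy_features, toy_outcome)
print(ranked)
```

The ranking uses the absolute value of r, so a strong negative relationship, like the one reported for depth, still ranks first.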
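The model comparison ranks candidates by R², the coefficient of determination, 1 - SS_res / SS_tot. The sketch below shows that metric and a simple selection step. It assumes every model is scored on the same held-out targets. The names `r2_score` and `best_model` are hypothetical helpers, and the prediction lists are toy values that do not represent the project's linear regression, random forest, or XGBoost outputs.

```python
from typing import Dict, List, Tuple


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def best_model(
    y_true: List[float], predictions: Dict[str, List[float]]
) -> Tuple[str, Dict[str, float]]:
    """Score each model's predictions and return the name with the highest R^2."""
    scores = {name: r2_score(y_true, preds) for name, preds in predictions.items()}
    winner = max(scores, key=lambda name: scores[name])
    return winner, scores


# Toy hold-out targets and predictions (hypothetical, for illustration only).
y_holdout = [3.0, 1.0, 4.0, 1.0, 5.0]
toy_predictions = {
    "mean_baseline": [2.8] * 5,              # always predicts the mean, so R^2 = 0
    "model_a": [2.9, 1.2, 3.8, 1.1, 4.7],
    "model_b": [2.5, 2.0, 3.0, 2.5, 3.5],
}
winner, scores = best_model(y_holdout, toy_predictions)
print(winner, scores)
```

A constant mean prediction scores exactly 0, so R² = 0.4736 means the tuned model explains a little under half of the outcome variance on whatever split it was evaluated on.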
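A figure like "~57% of split decisions" is a per-feature split count divided by the total number of splits. The sketch below assumes the figure came from split-count importance, which XGBoost calls `"weight"`. Gain-based importance would produce different shares. The helper `split_shares` and the toy counts are hypothetical. The commented lines show where real counts could come from with the `xgboost` scikit-learn wrapper. Note that `Booster.get_score` omits features that were never used in a split.

```python
from typing import Dict


def split_shares(split_counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-feature split counts into fractions that sum to 1."""
    total = sum(split_counts.values())
    if total <= 0:
        raise ValueError("need at least one split")
    return {name: count / total for name, count in split_counts.items()}


# With a fitted xgboost.XGBRegressor `model`, counts could be obtained via:
#   counts = model.get_booster().get_score(importance_type="weight")
# Toy counts below are hypothetical, for illustration only.
toy_counts = {"feat_a": 120, "feat_b": 60, "feat_c": 20}
shares = split_shares(toy_counts)
print(shares)
```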
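The point about irreducible noise can be made precise under an assumed additive model, which the summary does not state: $y = f(x) + \varepsilon$, where $\varepsilon$ has zero mean and variance $\sigma_\varepsilon^2$ and is independent of the features. Even the ideal predictor $f(x)$ leaves the noise unexplained. Because independence gives $\operatorname{Var}(y) = \operatorname{Var}(f(x)) + \sigma_\varepsilon^2$, the best achievable R² in expectation is:

```latex
R^2_{\max}
  = 1 - \frac{\mathbb{E}\big[(y - f(x))^2\big]}{\operatorname{Var}(y)}
  = 1 - \frac{\sigma_\varepsilon^2}{\operatorname{Var}(f(x)) + \sigma_\varepsilon^2}
  = \frac{\operatorname{Var}(f(x))}{\operatorname{Var}(f(x)) + \sigma_\varepsilon^2}
```

Any real model satisfies $R^2 \le R^2_{\max}$, so $\sigma_\varepsilon^2 / \operatorname{Var}(y) \le 1 - R^2$. Under these assumptions, the reported R² = 0.4736 bounds the noise share of the outcome variance at roughly 53% or less. The actual share is lower to the extent that the tuned model falls short of $f(x)$.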