Feature Engineering Drives More Improvement Than Hyperparameter Tuning

The DataCo Late Delivery Predictor is an end-to-end MLOps pipeline trained on 180,000 shipment records. It flags orders that are likely to arrive late before they ship.

Metric	Value
F1-weighted score	0.69
Improvement vs. DummyClassifier baseline	+66%
Validation method	37-month walk-forward backtest
Top SHAP driver	Shipping mode
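
As a rough illustration of how an improvement over a DummyClassifier baseline can be measured, the sketch below compares weighted F1 scores on toy labels. The labels, the `most_frequent` strategy, the hypothetical `model_pred` values, and the reading of the improvement as a relative change are all assumptions for illustration. The project's actual baseline configuration is not stated here.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

# Toy labels (1 = late, 0 = on time); illustrative only, not project data.
y_train = [0] * 7 + [1] * 3
y_test = [0] * 6 + [1] * 4
X_train = [[0]] * len(y_train)  # DummyClassifier ignores the feature values
X_test = [[0]] * len(y_test)

# Assumed baseline strategy: always predict the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)

# Hypothetical predictions from a trained model on the same test rows.
model_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# zero_division=0 scores a class with no predicted samples as 0 without warning.
baseline_f1 = f1_score(y_test, baseline_pred, average="weighted", zero_division=0)
model_f1 = f1_score(y_test, model_pred, average="weighted", zero_division=0)

# Relative improvement of the model over the baseline.
relative_gain = (model_f1 - baseline_f1) / baseline_f1
print(f"baseline={baseline_f1:.3f} model={model_f1:.3f} gain={relative_gain:+.0%}")
```

Weighted F1 averages the per-class F1 scores by class frequency, so a baseline that always predicts the majority class still earns partial credit. That is why the comparison against a dummy model is more informative than the raw score alone.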

Why Feature Engineering Comes First

Feature engineering creates new signal. Hyperparameter tuning optimises within existing signal. If the signal is weak, tuning cannot rescue it.

In this dataset, shipping mode was the top driver of late deliveries according to SHAP analysis. That column already existed in the raw data. The derived features had to be constructed: days since last shipment per customer-supplier pair, rolling average delay by route over 30-day windows, and a port congestion flag derived from external weather and holiday data. Each derived feature added measurable lift. Hyperparameter tuning alone could not have discovered these patterns, because tuning only changes how a model uses the columns it is given.

In this project, feature engineering on domain variables produced the majority of the lift. Hyperparameter tuning added incremental gains on top of that foundation.

What the Full Pipeline Includes

The project uses standard MLOps tooling for reproducibility. ZenML handles pipeline orchestration. MLflow manages experiment tracking and the model registry. Validation uses a 37-month walk-forward backtest, not a cherry-picked holdout split. SHAP provides model explainability. Evidently monitors for data drift, with rollback capability. A Streamlit executive dashboard surfaces the business cost of each wrong prediction, because a missed late delivery (a false negative) has a real dollar figure attached to it.
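
The idea behind a walk-forward backtest can be sketched with the standard library alone. The version below assumes an expanding training window and one test fold per calendar month. The function names and the `min_train_months` parameter are hypothetical, and the project's exact windowing scheme is not described here.

```python
from datetime import date


def month_index(d: date) -> int:
    # Map a date to a running month counter so months compare as integers.
    return d.year * 12 + (d.month - 1)


def walk_forward_splits(ship_dates, min_train_months=12):
    """Yield (train_rows, test_rows) index lists, one fold per test month.

    Each fold trains on every row dated before the test month and tests on
    the rows inside it, so the model never sees the future.
    """
    months = [month_index(d) for d in ship_dates]
    first, last = min(months), max(months)
    for test_month in range(first + min_train_months, last + 1):
        train_rows = [i for i, m in enumerate(months) if m < test_month]
        test_rows = [i for i, m in enumerate(months) if m == test_month]
        if train_rows and test_rows:  # skip months with no data
            yield train_rows, test_rows


# Toy example: one shipment per month for six months, three months of warm-up.
example_dates = [date(2023, month, 15) for month in range(1, 7)]
folds = list(walk_forward_splits(example_dates, min_train_months=3))
for train_rows, test_rows in folds:
    print(f"train on {len(train_rows)} rows, test on rows {test_rows}")
```

Scoring each fold separately and then aggregating shows how the model would have performed month by month, which a single random holdout split cannot show.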

Most ML demos stop at the notebook. This one does not.

The code is available on GitHub.

When to Tune Anyway

Hyperparameter tuning is not useless. It adds value once feature engineering is exhausted. The mistake is tuning first, or tuning only.

Priority	Activity	Impact
1	Feature engineering	High (primary driver of lift)
2	Hyperparameter tuning	Incremental (adds on top)

The Decision Order Matters

Build the right features first, then tune. Settle that order before modelling starts, not after the model is deployed.