Stop Cold Calling, Start Targeting: Why Strategy and EDA are the Engines of Marketing Machine Learning
Transforming an 11.7% conversion rate into a precision targeting engine: A case study in customer lifecycle optimization
Machine learning in isolation is often seen as a silver bullet, but its real-world efficacy depends heavily on the depth of the exploratory data analysis (EDA) and business strategy that precede the first line of code. Without this foundation, models risk optimizing for the wrong metrics or relying on “leaky” data that holds no value in a live environment. A cold-call marketing campaign for a Portuguese bank serves as a perfect case study for this: with a baseline success rate of only 11.7%, a standard “blanket marketing” model would have been statistically accurate but commercially useless. By integrating deep feature exploration with a clear business roadmap, we can transform these raw datasets into high-precision targeting models that prioritize resource efficiency over noise.
Data Integrity over Data Deletion
In a perfect world, we would have perfect information. In the real world, we are much more likely to work with incomplete or “messy” data. For example, in the Portuguese bank dataset, the “previous outcome” variable (which records whether an earlier outreach campaign resulted in a sale) appeared to be 99% “unknown”.
Before diving into imputation methods, or just discarding the whole feature, it is important to look into why this data is missing. Cross-referencing the feature with the time elapsed since last contact revealed a critical business insight: these were first-time prospects. By remapping these to “no previous contact”, the amount of “unknown” data was reduced to 32% and a seemingly faulty variable ended up becoming a primary driver in the model.
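The remapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names `prev_outcome` and `days_since_contact`, the sentinel value -1 for "never contacted", and the sample rows are all hypothetical and not taken from the original analysis.

```python
from collections import Counter

# Hypothetical records. Field names and the -1 sentinel are assumptions.
records = [
    {"prev_outcome": "unknown", "days_since_contact": -1},
    {"prev_outcome": "unknown", "days_since_contact": -1},
    {"prev_outcome": "unknown", "days_since_contact": 92},
    {"prev_outcome": "success", "days_since_contact": 180},
    {"prev_outcome": "failure", "days_since_contact": 30},
]

NEVER_CONTACTED = -1  # assumed marker for "no earlier contact on record"


def remap_previous_outcome(row):
    """Relabel 'unknown' as 'no previous contact' for first-time prospects only."""
    first_time = row["days_since_contact"] == NEVER_CONTACTED
    if row["prev_outcome"] == "unknown" and first_time:
        return {**row, "prev_outcome": "no previous contact"}
    return row  # genuinely unknown outcomes are left untouched


cleaned = [remap_previous_outcome(r) for r in records]
before = Counter(r["prev_outcome"] for r in records)
after = Counter(r["prev_outcome"] for r in cleaned)
print("before:", dict(before))
print("after: ", dict(after))
```

Only rows that the contact history explains are relabelled; a row marked “unknown” despite a recorded earlier contact stays “unknown”, so the new category does not hide real gaps.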
Another common pitfall is unexpected data leakage. During the exploratory analysis, it was clear that call duration was a major predictor of success. Great! Let's have agents stay on the phone as long as possible… No. What this really tells us is that customers who purchase a product stay on the phone longer while they provide details and go through onboarding. Moreover, the model needs to support the marketing team in making decisions before picking up the phone, when the duration of the call is not yet known, so the feature had to be removed.
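One simple guard against this kind of leakage is to keep an explicit list of features that only exist after the call and strip them before training. The sketch below assumes hypothetical feature names (`call_duration`, `age`, `job`); it is not the original pipeline.

```python
# Features that are only known once a call has ended (hypothetical names).
POST_CALL_FEATURES = {"call_duration"}


def pre_call_features(row):
    """Keep only the features available before the agent dials."""
    return {k: v for k, v in row.items() if k not in POST_CALL_FEATURES}


row = {"age": 41, "job": "technician", "call_duration": 312}
print(pre_call_features(row))
```

Keeping the exclusion list in one place makes the "known at decision time" rule visible to anyone who adds a feature later.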
The “Accuracy Paradox” & Business Direction
In a dataset with a non-purchase rate of 88%, a model can achieve 88% accuracy by simply predicting that no one will buy. While this is technically “accurate”, it is commercially useless, because it never identifies a single buyer. This is where balancing techniques such as SMOTE (synthetic minority oversampling) and resampling are crucial.
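The paradox is easy to reproduce with plain arithmetic. The sketch below assumes a hypothetical sample of 1,000 prospects at the 11.7% conversion rate quoted earlier and scores a "model" that always predicts "no purchase".

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = purchase."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


n = 1000          # hypothetical sample size
positives = 117   # 11.7% of 1,000 prospects buy
y_true = [1] * positives + [0] * (n - positives)
y_pred = [0] * n  # the "blanket" model: nobody ever buys

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / n
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.1%}")  # high, despite doing nothing useful
print(f"recall   = {recall:.1%}")    # share of real buyers found
```

Accuracy rewards the model for the majority class alone, while recall exposes that it finds none of the 117 buyers.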
Evaluating the model requires a choice between two business realities: resource efficiency or market penetration. A precision-focused approach is ideal for teams with tight budgets who need to minimize wasted outreach. However, a recall-focused strategy is better suited for aggressive growth, where capturing every potential customer is worth the cost of a higher failure rate on individual calls.
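One common way to move between these two strategies is to change the probability threshold at which a prospect is flagged for a call. The sketch below uses made-up labels and scores purely for illustration; it does not reflect the models reported in this article.

```python
def precision_recall_at(threshold, y_true, scores):
    """Precision and recall when prospects scoring >= threshold are called."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical outcomes (1 = bought) and model scores for ten prospects.
y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.55, 0.6, 0.35, 0.4, 0.3, 0.2, 0.1, 0.05]

for threshold in (0.7, 0.3):
    p, r = precision_recall_at(threshold, y_true, scores)
    print(f"threshold {threshold}: precision = {p:.2f}, recall = {r:.2f}")
```

In this toy data, the strict threshold wastes no calls but misses half of the buyers, while the loose threshold reaches every buyer at the cost of more unsuccessful calls.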
Conclusion: Data Science Needs a Business Compass
In a marketing setting, machine learning is only as valuable as the business strategy it supports; without a clear problem to solve and a deep exploratory analysis, optimizing happens in a vacuum. A high-performing model built on “leaky” data provides zero predictive value in application, just as an “accurate” model that ignores class imbalance might fail to find a single new customer. For any customer lifecycle strategy to succeed, the data must be interrogated until it is understood, and the model must be in tune with the business's strategy.
Machine Learning Outcome
The following models were built and tested, each focusing on a different measure of success:
High precision: Random forest with SMOTE, precision = 47%, recall = 33%, accuracy = 88%
High recall: Random forest (200 trees, max depth = 12), precision = 32%, recall = 58%, accuracy = 81%
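To put these figures in business terms, precision can be read as the expected success rate per call placed. The sketch below converts the reported precision values into expected calls per sale and into lift over the 11.7% baseline. It assumes that test-set precision carries over to a live campaign, which is an assumption rather than a result from this study.

```python
BASELINE_RATE = 0.117  # conversion rate of untargeted cold calling

# Precision values as reported above.
models = {
    "blanket calling": BASELINE_RATE,
    "high precision": 0.47,
    "high recall": 0.32,
}


def calls_per_sale(success_rate):
    """Expected number of calls needed for one sale."""
    return 1 / success_rate


def lift(success_rate, baseline=BASELINE_RATE):
    """How many times better than untargeted calling."""
    return success_rate / baseline


for name, rate in models.items():
    print(f"{name}: {calls_per_sale(rate):.1f} calls per sale, "
          f"lift = {lift(rate):.1f}x")
```

Under that assumption, the precision-focused model needs roughly a quarter of the calls per sale that blanket calling does, while the recall-focused model trades some of that efficiency for reaching 58% of buyers instead of 33%.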