
Journal of Data Science


Label-efficient Response Modelling: Cost-Effective Marketing Using Cluster-Based Active Sampling
Swee Chuan Tan

https://doi.org/10.6339/25-JDS1198
Pub. online: 3 September 2025; Type: Data Science In Action; Open Access

Received: 16 April 2025
Accepted: 18 July 2025
Published: 3 September 2025

Abstract

This paper introduces a label-efficient response modelling method for settings in which the target labels are unknown a priori. Unlike most response modelling methods, which adopt a supervised or semi-supervised approach, we apply clustering to partition the data into homogeneous segments that are assumed to reflect the underlying response behaviours. We then draw a random sample from each cluster and acquire the true target label for each sampled record. This cluster-based stratified sampling approach reduces the cost of the label acquisition needed to estimate the cluster-specific response rates and the overall basic response rate. The goal is to identify a subset of the population that is more likely to respond (e.g., make a purchase) while controlling campaign costs. Targeting such a subset departs from conventional classification tasks, which require labels for all observations. We regard clusters whose response rates are significantly higher than the estimated basic response rate as high-propensity clusters and then acquire all of their remaining labels. Our experimental results show that the response rates of high-propensity clusters are at least 1.7 times the basic response rate. This suggests that the proposed approach can substantially reduce costs by targeting only high-propensity groups, and that it is useful in scenarios lacking historical ground truth.
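The workflow in the abstract (cluster, sample each cluster, buy labels for the sample, estimate cluster and basic response rates, flag clusters that are significantly above the basic rate, then buy the rest of their labels) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the author's implementation (that is in the notebook linked under Supplementary material). The abstract does not name the clustering algorithm, sampling fraction, or significance test, so the sketch assumes k-means with a farthest-first start, a 20% sample per cluster, and a one-sided one-sample z-test at the 5% level against the size-weighted basic rate. The `oracle` function is a hypothetical stand-in for paid label acquisition, and the two-segment data are synthetic.

```python
import math
import random
from collections import defaultdict


def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Lloyd's k-means (assumed clustering step) with farthest-first initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        # Next centre: the point farthest from every centre chosen so far.
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        assign = [min(range(k), key=lambda c: dist2(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centre if a cluster empties out
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def sample_and_estimate(assign, oracle, frac=0.2, z_crit=1.645, seed=0):
    """Label a random sample per cluster, estimate rates, flag high-propensity clusters."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, c in enumerate(assign):
        members[c].append(idx)
    n = len(assign)
    labels, rates, sizes = {}, {}, {}
    for c, idxs in members.items():
        size = min(len(idxs), max(1, round(frac * len(idxs))))
        picked = rng.sample(idxs, size)
        for i in picked:
            labels[i] = oracle(i)  # each oracle call stands for one paid label
        rates[c] = sum(labels[i] for i in picked) / size
        sizes[c] = size
    # Stratified estimate of the basic response rate: weight each cluster by its size.
    basic = sum(len(idxs) / n * rates[c] for c, idxs in members.items())
    high = []
    for c in members:
        # Assumed test: one-sided z-test of the cluster's sample rate vs the basic rate.
        se = math.sqrt(max(0.0, basic * (1 - basic)) / sizes[c])
        if se > 0 and (rates[c] - basic) / se > z_crit:
            high.append(c)
    return basic, rates, sorted(high), labels, members


def target_high_clusters(high, labels, members, oracle, basic):
    """Acquire the remaining labels in each flagged cluster and return its lift."""
    lifts = {}
    for c in high:
        for i in members[c]:
            if i not in labels:  # only pay for labels not already sampled
                labels[i] = oracle(i)
        rate = sum(labels[i] for i in members[c]) / len(members[c])
        lifts[c] = rate / basic  # response rate as a multiple of the basic rate
    return lifts


# Hypothetical demo data: segment A near (0, 0) responds at 5%,
# segment B near (10, 10) responds at 50%.
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(500)]
truth = [int(i % 20 == 0) if i < 500 else int(i % 2 == 0) for i in range(1000)]
calls = []


def oracle(i):
    """Hypothetical label source; records every (costly) query."""
    calls.append(i)
    return truth[i]


assign = kmeans(points, k=2)
basic, rates, high, labels, members = sample_and_estimate(assign, oracle)
lifts = target_high_clusters(high, labels, members, oracle, basic)
print(f"estimated basic rate {basic:.3f}; high-propensity clusters {high}")
print(f"lifts {lifts}; labels acquired {len(calls)} of {len(points)}")
```

In this toy run only the high-response segment should be flagged, so labels are bought for the two 100-record samples plus the remaining 400 records of the flagged cluster rather than for all 1,000 records. The farthest-first start simply keeps the demo deterministic; the z-test treats the estimated basic rate as fixed and ignores the finite-population correction, so a real campaign may prefer a different test, clustering method, or sampling fraction.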

Supplementary material

The Python notebook containing the implementation of the proposed method is available at the following link: https://drive.google.com/drive/folders/1WE8A0aZ-cKLJ45hRDMFH20CczwZ2wiWh?usp=sharing.



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
active learning; data-efficient learning; imbalanced data; predictive modelling; semi-supervised learning; stratified sampling

Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
