<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1198</article-id>
<article-id pub-id-type="doi">10.6339/25-JDS1198</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Data Science in Action</subject></subj-group></article-categories>
<title-group>
<article-title>Label-efficient Response Modelling: Cost-Effective Marketing Using Cluster-Based Active Sampling</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0009-0007-7739-6743</contrib-id>
<name><surname>Tan</surname><given-names>Swee Chuan</given-names></name><email xlink:href="mailto:jamestansc@suss.edu.sg">jamestansc@suss.edu.sg</email><xref ref-type="aff" rid="j_jds1198_aff_001">1</xref>
</contrib>
<aff id="j_jds1198_aff_001"><label>1</label>School of Business, <institution>Singapore University of Social Sciences</institution>, <country>Singapore</country></aff>
</contrib-group>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>3</day><month>9</month><year>2025</year></pub-date><volume content-type="ahead-of-print">0</volume><issue>0</issue><fpage>1</fpage><lpage>14</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1198_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The Python notebook containing the implementation of the proposed method is available at the following link: <ext-link ext-link-type="uri" xlink:href="https://colab.research.google.com/drive/1IG-9N7iakfPUUKnIskKYH2kbs_F6sdgP?usp=sharing">https://colab.research.google.com/drive/1IG-9N7iakfPUUKnIskKYH2kbs_F6sdgP?usp=sharing</ext-link>.</p>
<p>The datasets used in this study, with features ordered by importance from left to right, are available at: <ext-link ext-link-type="uri" xlink:href="https://drive.google.com/drive/folders/1WE8A0aZ-cKLJ45hRDMFH20CczwZ2wiWh?usp=sharing">https://drive.google.com/drive/folders/1WE8A0aZ-cKLJ45hRDMFH20CczwZ2wiWh?usp=sharing</ext-link>.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>16</day><month>4</month><year>2025</year></date><date date-type="accepted"><day>18</day><month>7</month><year>2025</year></date></history>
<permissions><copyright-statement>2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>This paper introduces a label-efficient response modelling method for settings in which the target labels are unknown a priori. Unlike most response modelling methods, which adopt a supervised or semi-supervised approach, we apply clustering to partition the data into homogeneous segments that are assumed to reflect the underlying response behaviours. We then take a random sample from each cluster and acquire the true target label for each sampled record. This cluster-based stratified sampling approach reduces the cost of label acquisition needed to estimate the cluster-specific and overall basic response rates. The goal is to identify a subset of the population that is more likely to respond (e.g., make a purchase) while controlling campaign costs. Subsetting the population in this way departs from conventional classification tasks, which require full labelling of all observations. We regard clusters whose response rates are significantly higher than the estimated basic response rate as high-propensity clusters and proceed to acquire all their remaining labels. Our experimental results show that the response rates of high-propensity clusters are at least 1.7 times the basic response rate. This suggests that the proposed approach significantly reduces costs by targeting only high-propensity groups and is useful in scenarios lacking historical ground truth.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>active learning</kwd>
<kwd>data-efficient learning</kwd>
<kwd>imbalanced data</kwd>
<kwd>predictive modelling</kwd>
<kwd>semi-supervised learning</kwd>
<kwd>stratified sampling</kwd>
</kwd-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1198_reflist_001">
<title>References</title>
<ref id="j_jds1198_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Ali</surname> <given-names>A</given-names></string-name>, <string-name><surname>Abd Razak</surname> <given-names>S</given-names></string-name>, <string-name><surname>Othman</surname> <given-names>SH</given-names></string-name>, <string-name><surname>Eisa</surname> <given-names>TAE</given-names></string-name>, <string-name><surname>Al-Dhaqm</surname> <given-names>A</given-names></string-name>, <string-name><surname>Nasser</surname> <given-names>M</given-names></string-name>, <etal>et al.</etal> (<year>2022</year>). <article-title>Financial fraud detection based on machine learning: A systematic literature review</article-title>. <source><italic>Applied Sciences</italic></source>, <volume>12</volume>(<issue>19</issue>): <fpage>9637</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3390/app12199637" xlink:type="simple">https://doi.org/10.3390/app12199637</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_002">
<mixed-citation publication-type="other"> <string-name><surname>Baesens</surname> <given-names>B</given-names></string-name> (<year>2004</year>). Developing intelligent systems for credit scoring using machine learning techniques. <italic>Ph.D. Thesis, Katholieke Universiteit Leuven, Belgium</italic>.</mixed-citation>
</ref>
<ref id="j_jds1198_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Chaudhuri</surname> <given-names>N</given-names></string-name>, <string-name><surname>Gupta</surname> <given-names>G</given-names></string-name>, <string-name><surname>Vamsi</surname> <given-names>V</given-names></string-name>, <string-name><surname>Bose</surname> <given-names>I</given-names></string-name> (<year>2021</year>). <article-title>On the platform but will they buy? Predicting customers’ purchase behavior using deep learning</article-title>. <source><italic>Decision Support Systems</italic></source>, <volume>149</volume>: <elocation-id>113622</elocation-id>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.dss.2021.113622" xlink:type="simple">https://doi.org/10.1016/j.dss.2021.113622</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_004">
<mixed-citation publication-type="other"> <string-name><surname>Emtiyaz</surname> <given-names>S</given-names></string-name>, <string-name><surname>Keyvanpour</surname> <given-names>M</given-names></string-name> (<year>2011</year>). Customers behavior modeling by semi-supervised learning in customer relationship management. arXiv preprint: <uri>https://arxiv.org/abs/1201.1670</uri>.</mixed-citation>
</ref>
<ref id="j_jds1198_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Gönül</surname> <given-names>FF</given-names></string-name>, <string-name><surname>Hofstede</surname> <given-names>FT</given-names></string-name> (<year>2006</year>). <article-title>How to compute optimal catalog mailing decisions</article-title>. <source><italic>Marketing Science</italic></source>, <volume>25</volume>(<issue>1</issue>): <fpage>65</fpage>–<lpage>74</lpage>. <comment>Published online: January 1, 2006</comment>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1287/mksc.1050.0136" xlink:type="simple">https://doi.org/10.1287/mksc.1050.0136</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_006">
<mixed-citation publication-type="other"> <string-name><surname>Google LLC</surname></string-name> (<year>2025</year>). Google analytics. Web analytics platform.</mixed-citation>
</ref>
<ref id="j_jds1198_ref_007">
<mixed-citation publication-type="other"> <string-name><surname>Hanssens</surname> <given-names>DM</given-names></string-name>, <string-name><surname>Leeflang</surname> <given-names>PSH</given-names></string-name>, <string-name><surname>Wittink</surname> <given-names>DR</given-names></string-name> (<year>2005</year>). Market response models and marketing practice. <italic>UCLA Anderson School of Management</italic>.</mixed-citation>
</ref>
<ref id="j_jds1198_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Haron</surname> <given-names>NHB</given-names></string-name> (<year>2022</year>). <article-title>Stratified sampling using cluster analysis</article-title>. <source><italic>AIP Conference Proceedings</italic></source>, <volume>2472</volume>(<issue>1</issue>): <elocation-id>050012</elocation-id>.</mixed-citation>
</ref>
<ref id="j_jds1198_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Haughton</surname> <given-names>D</given-names></string-name>, <string-name><surname>Oulabi</surname> <given-names>S</given-names></string-name> (<year>1993</year>). <article-title>Direct marketing modeling with CART and CHAID</article-title>. <source><italic>Journal of Direct Marketing</italic></source>, <volume>7</volume>(<issue>3</issue>): <fpage>16</fpage>–<lpage>26</lpage>. <comment>11 pages</comment>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/dir.4000070305" xlink:type="simple">https://doi.org/10.1002/dir.4000070305</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>He</surname> <given-names>H</given-names></string-name>, <string-name><surname>Garcia</surname> <given-names>EA</given-names></string-name> (<year>2009</year>). <article-title>Learning from imbalanced data</article-title>. <source><italic>IEEE Transactions on Knowledge and Data Engineering</italic></source>, <volume>21</volume>(<issue>9</issue>): <fpage>1263</fpage>–<lpage>1284</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/TKDE.2008.239" xlink:type="simple">https://doi.org/10.1109/TKDE.2008.239</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_011">
<mixed-citation publication-type="book"> <string-name><surname>Housden</surname> <given-names>M</given-names></string-name>, <string-name><surname>Thomas</surname> <given-names>B</given-names></string-name> (<year>2002</year>). <source><italic>Direct Marketing in Practice</italic></source>, <edition>1</edition>st edition. <publisher-name>Routledge</publisher-name>, <publisher-loc>London</publisher-loc>. <comment>EBook published 27 April 2012</comment>.</mixed-citation>
</ref>
<ref id="j_jds1198_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Kang</surname> <given-names>P</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>S</given-names></string-name>, <string-name><surname>MacLachlan</surname> <given-names>DL</given-names></string-name> (<year>2012</year>). <article-title>Improved response modeling based on clustering, under-sampling, and ensemble</article-title>. <source><italic>Expert Systems with Applications</italic></source>, <volume>39</volume>(<issue>8</issue>): <fpage>6738</fpage>–<lpage>6753</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.eswa.2011.12.028" xlink:type="simple">https://doi.org/10.1016/j.eswa.2011.12.028</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Lee</surname> <given-names>HJ</given-names></string-name>, <string-name><surname>Shin</surname> <given-names>H</given-names></string-name>, <string-name><surname>Hwang</surname> <given-names>SS</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>S</given-names></string-name>, <string-name><surname>MacLachlan</surname> <given-names>D</given-names></string-name> (<year>2010</year>). <article-title>Semi-supervised response modeling</article-title>. <source><italic>Journal of Interactive Marketing</italic></source>, <volume>24</volume>(<issue>1</issue>): <fpage>42</fpage>–<lpage>54</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.intmar.2009.10.004" xlink:type="simple">https://doi.org/10.1016/j.intmar.2009.10.004</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Naji</surname> <given-names>MA</given-names></string-name>, <string-name><surname>El Filali</surname> <given-names>S</given-names></string-name>, <string-name><surname>Aarika</surname> <given-names>K</given-names></string-name>, <string-name><surname>Benlahmar</surname> <given-names>EH</given-names></string-name>, <string-name><surname>Ait Abdelouhahid</surname> <given-names>R</given-names></string-name>, <string-name><surname>Debauche</surname> <given-names>O</given-names></string-name> (<year>2021</year>). <article-title>Machine learning algorithms for breast cancer prediction and diagnosis</article-title>. <source><italic>Procedia Computer Science</italic></source>, <volume>191</volume>: <fpage>487</fpage>–<lpage>492</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.procs.2021.07.062" xlink:type="simple">https://doi.org/10.1016/j.procs.2021.07.062</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_015">
<mixed-citation publication-type="other"> <string-name><surname>Moro</surname> <given-names>S</given-names></string-name>, <string-name><surname>Rita</surname> <given-names>P</given-names></string-name>, <string-name><surname>Cortez</surname> <given-names>P</given-names></string-name> (<year>2014</year>). Bank marketing. <italic>UCI Machine Learning Repository</italic>.</mixed-citation>
</ref>
<ref id="j_jds1198_ref_016">
<mixed-citation publication-type="other"> <string-name><surname>Sakar</surname> <given-names>C</given-names></string-name>, <string-name><surname>Kastro</surname> <given-names>Y</given-names></string-name> (<year>2018</year>). Online shoppers purchasing intention dataset. <italic>UCI Machine Learning Repository</italic>.</mixed-citation>
</ref>
<ref id="j_jds1198_ref_017">
<mixed-citation publication-type="journal"> <string-name><surname>Thomas</surname> <given-names>AR</given-names></string-name> (<year>2007</year>). <article-title>The end of mass marketing: Or, why all successful marketing is now direct marketing</article-title>. <source><italic>Direct Marketing: An International Journal</italic></source>, <volume>1</volume>(<issue>1</issue>): <fpage>6</fpage>–<lpage>16</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1108/17505930710734107" xlink:type="simple">https://doi.org/10.1108/17505930710734107</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Tipton</surname> <given-names>E</given-names></string-name> (<year>2013</year>). <article-title>Stratified sampling using cluster analysis: A sample selection strategy for improved generalizations from experiments</article-title>. <source><italic>Evaluation Review</italic></source>, <volume>37</volume>(<issue>2</issue>): <fpage>109</fpage>–<lpage>139</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1177/0193841X13516324" xlink:type="simple">https://doi.org/10.1177/0193841X13516324</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Tékouabou</surname> <given-names>SCK</given-names></string-name>, <string-name><surname>Gherghina</surname> <given-names>SC</given-names></string-name>, <string-name><surname>Toulni</surname> <given-names>H</given-names></string-name>, <string-name><surname>Neves Mata</surname> <given-names>P</given-names></string-name>, <string-name><surname>Mata</surname> <given-names>MN</given-names></string-name>, <string-name><surname>Martins</surname> <given-names>JM</given-names></string-name> (<year>2022</year>). <article-title>A machine learning framework towards bank telemarketing prediction</article-title>. <source><italic>Journal of Risk and Financial Management</italic></source>, <volume>15</volume>(<issue>6</issue>): <fpage>269</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3390/jrfm15060269" xlink:type="simple">https://doi.org/10.3390/jrfm15060269</ext-link></mixed-citation>
</ref>
<ref id="j_jds1198_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Yan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Nazmi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Gebru</surname> <given-names>B</given-names></string-name>, <etal>et al.</etal> (<year>2022</year>). <article-title>A clustering-based active learning method to query informative and representative samples</article-title>. <source><italic>Applied Intelligence</italic></source>, <volume>52</volume>: <fpage>13250</fpage>–<lpage>13267</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10489-021-03139-y" xlink:type="simple">https://doi.org/10.1007/s10489-021-03139-y</ext-link></mixed-citation>
</ref>
</ref-list>
</back>
</article>
