<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1151</article-id>
<article-id pub-id-type="doi">10.6339/24-JDS1151</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Efficient UCB-Based Assignment Algorithm Under Unknown Utility with Application in Mentor-Mentee Matching</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Shi</surname><given-names>Yuyang</given-names></name><email xlink:href="mailto:syuyang123@gmail.com">syuyang123@gmail.com</email><xref ref-type="aff" rid="j_jds1151_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref><xref ref-type="fn" rid="j_jds1151_fn_001">†</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Mei</surname><given-names>Yajun</given-names></name><xref ref-type="aff" rid="j_jds1151_aff_002">2</xref>
</contrib>
<aff id="j_jds1151_aff_001"><label>1</label>H. Milton Stewart School of Industrial and Systems Engineering, <institution>Georgia Institute of Technology</institution>, Atlanta, GA, <country>USA</country></aff>
<aff id="j_jds1151_aff_002"><label>2</label>Department of Biostatistics, School of Global Public Health, <institution>New York University</institution>, New York, NY, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:syuyang123@gmail.com">syuyang123@gmail.com</ext-link>.</corresp><fn id="j_jds1151_fn_001"><label>†</label>
<p>This paper is part of the first author’s PhD dissertation at Georgia Institute of Technology.</p></fn>
</author-notes>
<pub-date pub-type="ppub"><year>2024</year></pub-date><pub-date pub-type="epub"><day>24</day><month>9</month><year>2024</year></pub-date><volume content-type="ahead-of-print">0</volume><issue>0</issue><fpage>1</fpage><lpage>17</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1151_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The supplementary materials online includes: Proofs of Lemmas and Theorems used in the paper, and Python code needed to reproduce the results.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>18</day><month>6</month><year>2024</year></date><date date-type="accepted"><day>22</day><month>8</month><year>2024</year></date></history>
<permissions><copyright-statement>2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2024</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>The assignment problem, crucial in various real-world applications, involves optimizing the allocation of agents to tasks for maximum utility. While it has been well-studied in the optimization literature when the underlying utilities between all agent-task pairs are known, research is sparse when the utilities are unknown and need to be learned from data on the fly. This paper addresses this gap, as motivated by mentor-mentee matching programs at many U.S. universities. We develop an efficient sequential assignment algorithm, with the aim of nearly maximizing the overall utility simultaneously over different time periods. Our proposed algorithm is to use stochastic bandit feedback to adaptively estimate the unknown utilities through linear regression models, integrating the Upper Confidence Bound (UCB) algorithm in the multi-armed bandit problem with the Hungarian algorithm in the assignment problem. We provide theoretical bounds of our algorithm for both the estimation error and the total regret. Additionally, numerical studies are also conducted to demonstrate the practical effectiveness of our algorithm.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>bandits</kwd>
<kwd>estimation</kwd>
<kwd>optimal assignment</kwd>
<kwd>upper confidence bound</kwd>
</kwd-group>
<funding-group><funding-statement>This research was supported in part by NSF grant DMS-2015405 and by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR002378. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1151_reflist_001">
<title>References</title>
<ref id="j_jds1151_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Abbasi-Yadkori</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Pál</surname> <given-names>D</given-names></string-name>, <string-name><surname>Szepesvári</surname> <given-names>C</given-names></string-name> (<year>2011</year>). <article-title>Improved algorithms for linear stochastic bandits</article-title>. <source><italic>Advances in Neural Information Processing Systems</italic></source>, <volume>24</volume>: <fpage>2312</fpage>–<lpage>2320</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_002">
<mixed-citation publication-type="book"> <string-name><surname>Anderson</surname> <given-names>T</given-names></string-name> (<year>2008</year>). <source><italic>The Theory and Practice of Online Learning</italic></source>. <publisher-name>Athabasca University Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Anscombe</surname> <given-names>FJ</given-names></string-name> (<year>1953</year>). <article-title>Sequential estimation</article-title>. <source><italic>Journal of the Royal Statistical Society, Series B, Methodological</italic></source>, <volume>15</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>21</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Avis</surname> <given-names>D</given-names></string-name> (<year>1983</year>). <article-title>A survey of heuristics for the weighted matching problem</article-title>. <source><italic>Networks</italic></source>, <volume>13</volume>(<issue>4</issue>): <fpage>475</fpage>–<lpage>493</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Biró</surname> <given-names>P</given-names></string-name>, <string-name><surname>Gyetvai</surname> <given-names>M</given-names></string-name> (<year>2023</year>). <article-title>Online voluntary mentoring: Optimising the assignment of students and mentors</article-title>. <source><italic>European Journal of Operational Research</italic></source>, <volume>307</volume>(<issue>1</issue>): <fpage>392</fpage>–<lpage>405</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Cesa-Bianchi</surname> <given-names>N</given-names></string-name>, <string-name><surname>Lugosi</surname> <given-names>G</given-names></string-name> (<year>2012</year>). <article-title>Combinatorial bandits</article-title>. <source><italic>Journal of Computer and System Sciences</italic></source>, <volume>78</volume>(<issue>5</issue>): <fpage>1404</fpage>–<lpage>1422</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_007">
<mixed-citation publication-type="chapter"> <string-name><surname>Chen</surname> <given-names>W</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yuan</surname> <given-names>Y</given-names></string-name> (<year>2013</year>). <chapter-title>Combinatorial multi-armed bandit: General framework and applications</chapter-title>. In: Dasgupta S and McAllester D (eds.), <source><italic>Proceedings of the 30th International Conference on Machine Learning</italic></source>, <fpage>151</fpage>–<lpage>159</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_008">
<mixed-citation publication-type="chapter"> <string-name><surname>Chu</surname> <given-names>W</given-names></string-name>, <string-name><surname>Li</surname> <given-names>L</given-names></string-name>, <string-name><surname>Reyzin</surname> <given-names>L</given-names></string-name>, <string-name><surname>Schapire</surname> <given-names>R</given-names></string-name> (<year>2011</year>). <chapter-title>Contextual bandits with linear payoff functions</chapter-title>. In: Gordon G, Dunson D, and Dudík M (eds.), <source><italic>Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics</italic></source>, <fpage>208</fpage>–<lpage>214</lpage>. JMLR Workshop and Conference Proceedings.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Deshmukh</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Dogan</surname> <given-names>U</given-names></string-name>, <string-name><surname>Scott</surname> <given-names>C</given-names></string-name> (<year>2017</year>). <article-title>Multi-task learning for contextual bandits</article-title>. <source><italic>Advances in Neural Information Processing Systems</italic></source>, <volume>30</volume>: <fpage>4848</fpage>–<lpage>4856</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>DiCiccio</surname> <given-names>TJ</given-names></string-name>, <string-name><surname>Efron</surname> <given-names>B</given-names></string-name> (<year>1996</year>). <article-title>Bootstrap confidence intervals</article-title>. <source><italic>Statistical Science</italic></source>, <volume>11</volume>(<issue>3</issue>): <fpage>189</fpage>–<lpage>228</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Duan</surname> <given-names>R</given-names></string-name>, <string-name><surname>Pettie</surname> <given-names>S</given-names></string-name> (<year>2014</year>). <article-title>Linear-time approximation for maximum weight matching</article-title>. <source><italic>Journal of the ACM</italic></source>, <volume>61</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>23</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Edmonds</surname> <given-names>J</given-names></string-name>, <string-name><surname>Karp</surname> <given-names>RM</given-names></string-name> (<year>1972</year>). <article-title>Theoretical improvements in algorithmic efficiency for network flow problems</article-title>. <source><italic>Journal of the ACM</italic></source>, <volume>19</volume>(<issue>2</issue>): <fpage>248</fpage>–<lpage>264</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_013">
<mixed-citation publication-type="book"> <string-name><surname>Efron</surname> <given-names>B</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>RJ</given-names></string-name> (<year>1994</year>). <source><italic>An Introduction to the Bootstrap</italic></source>. <publisher-name>CRC Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_014">
<mixed-citation publication-type="chapter"> <string-name><surname>Erraqabi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Lazaric</surname> <given-names>A</given-names></string-name>, <string-name><surname>Valko</surname> <given-names>M</given-names></string-name>, <string-name><surname>Brunskill</surname> <given-names>E</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>YE</given-names></string-name> (<year>2017</year>). <chapter-title>Trading off rewards and errors in multi-armed bandits</chapter-title>. In: Singh A and Zhu J (eds.), <source><italic>Proceedings of the 20th International Conference on Artificial Intelligence and Statistics</italic></source>, <fpage>709</fpage>–<lpage>717</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Fang</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>H</given-names></string-name> (<year>2022</year>). <article-title>Matching for peer support: Exploring algorithmic matching for online mental health communities</article-title>. <source><italic>Proceedings of the ACM on Human–Computer Interaction</italic></source>, <volume>6</volume>(<issue>CSCW2</issue>): <fpage>1</fpage>–<lpage>37</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Gai</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Krishnamachari</surname> <given-names>B</given-names></string-name>, <string-name><surname>Jain</surname> <given-names>R</given-names></string-name> (<year>2012</year>). <article-title>Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations</article-title>. <source><italic>IEEE/ACM Transactions on Networking</italic></source>, <volume>20</volume>(<issue>5</issue>): <fpage>1466</fpage>–<lpage>1478</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_017">
<mixed-citation publication-type="book"> <string-name><surname>Ghosh</surname> <given-names>M</given-names></string-name>, <string-name><surname>Mukhopadhyay</surname> <given-names>N</given-names></string-name>, <string-name><surname>Sen</surname> <given-names>PK</given-names></string-name> (<year>2011</year>). <source><italic>Sequential Estimation</italic></source>. <publisher-name>John Wiley &amp; Sons</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Hazan</surname> <given-names>E</given-names></string-name> (<year>2016</year>). <article-title>Introduction to online convex optimization</article-title>. <source><italic>Foundations and Trends in Optimization</italic></source>, <volume>2</volume>(<issue>3–4</issue>): <fpage>157</fpage>–<lpage>325</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Kuhn</surname> <given-names>HW</given-names></string-name> (<year>1955</year>). <article-title>The Hungarian method for the assignment problem</article-title>. <source><italic>Naval Research Logistics Quarterly</italic></source>, <volume>2</volume>(<issue>1–2</issue>): <fpage>83</fpage>–<lpage>97</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Kurtzberg</surname> <given-names>JM</given-names></string-name> (<year>1962</year>). <article-title>On approximation methods for the assignment problem</article-title>. <source><italic>Journal of the ACM</italic></source>, <volume>9</volume>(<issue>4</issue>): <fpage>419</fpage>–<lpage>439</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Lai</surname> <given-names>TL</given-names></string-name>, <string-name><surname>Robbins</surname> <given-names>H</given-names></string-name> (<year>1985</year>). <article-title>Asymptotically efficient adaptive allocation rules</article-title>. <source><italic>Advances in Applied Mathematics</italic></source>, <volume>6</volume>(<issue>1</issue>): <fpage>4</fpage>–<lpage>22</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Perrault</surname> <given-names>P</given-names></string-name>, <string-name><surname>Boursier</surname> <given-names>E</given-names></string-name>, <string-name><surname>Valko</surname> <given-names>M</given-names></string-name>, <string-name><surname>Perchet</surname> <given-names>V</given-names></string-name> (<year>2020</year>). <article-title>Statistical efficiency of Thompson sampling for combinatorial semi-bandits</article-title>. <source><italic>Advances in Neural Information Processing Systems</italic></source>, <volume>33</volume>: <fpage>5429</fpage>–<lpage>5440</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_023">
<mixed-citation publication-type="chapter"> <string-name><surname>Pizzato</surname> <given-names>L</given-names></string-name>, <string-name><surname>Rej</surname> <given-names>T</given-names></string-name>, <string-name><surname>Chung</surname> <given-names>T</given-names></string-name>, <string-name><surname>Koprinska</surname> <given-names>I</given-names></string-name> <string-name><surname>Kay</surname> <given-names>J</given-names></string-name> (<year>2010</year>). <chapter-title>RECON: A reciprocal recommender for online dating</chapter-title>. In: <source><italic>Proceedings of the Fourth ACM Conference on Recommender Systems</italic></source>, <fpage>207</fpage>–<lpage>214</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_024">
<mixed-citation publication-type="journal"> <string-name><surname>Shalev-Shwartz</surname> <given-names>S</given-names></string-name> (<year>2012</year>). <article-title>Online learning and online convex optimization</article-title>. <source><italic>Foundations and Trends in Machine Learning</italic></source>, <volume>4</volume>(<issue>2</issue>): <fpage>107</fpage>–<lpage>194</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_025">
<mixed-citation publication-type="chapter"> <string-name><surname>Shi</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Mei</surname> <given-names>Y</given-names></string-name> (<year>2022</year>). <chapter-title>Efficient sequential ucb-based Hungarian algorithm for assignment problems</chapter-title>. In: <source><italic>2022 58th Annual Allerton Conference on Communication, Control, and Computing (Allerton)</italic></source>, <fpage>1</fpage>–<lpage>8</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_026">
<mixed-citation publication-type="chapter"> <string-name><surname>Simchi-Levi</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>C</given-names></string-name> (<year>2023</year>). <chapter-title>Multi-armed bandit experimental design: Online decision-making and adaptive inference</chapter-title>. In: Ruiz F, Dy J, and van de Meent J-W (eds.), <source><italic>Proceedings of the 26th International Conference on Artificial Intelligence and Statistics</italic></source>, <fpage>3086</fpage>–<lpage>3097</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_027">
<mixed-citation publication-type="journal"> <string-name><surname>Tomizawa</surname> <given-names>N</given-names></string-name> (<year>1971</year>). <article-title>On some techniques useful for solution of transportation network problems</article-title>. <source><italic>Networks</italic></source>, <volume>1</volume>(<issue>2</issue>): <fpage>173</fpage>–<lpage>194</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_028">
<mixed-citation publication-type="chapter"> <string-name><surname>Vakili</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>Y</given-names></string-name> (<year>2014</year>). <chapter-title>Time-varying stochastic multi-armed bandit problems</chapter-title>. In: <source><italic>2014 48th Asilomar Conference on Signals, Systems and Computers</italic></source>, <fpage>2103</fpage>–<lpage>2107</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_029">
<mixed-citation publication-type="chapter"> <string-name><surname>Wen</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Kveton</surname> <given-names>B</given-names></string-name>, <string-name><surname>Ashkan</surname> <given-names>A</given-names></string-name> (<year>2015</year>). <chapter-title>Efficient learning in large-scale combinatorial semi-bandits</chapter-title>. In: Bach F and Blei D (eds.), <source><italic>Proceedings of the 32nd International Conference on Machine Learning</italic></source>, <fpage>1113</fpage>–<lpage>1122</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_030">
<mixed-citation publication-type="chapter"> <string-name><surname>Xia</surname> <given-names>P</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>C</given-names></string-name> (<year>2015</year>). <chapter-title>Reciprocal recommendation system for online dating</chapter-title>. In: <source><italic>Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining</italic></source>, <fpage>234</fpage>–<lpage>241</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_031">
<mixed-citation publication-type="chapter"> <string-name><surname>Xu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>F</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Y</given-names></string-name>, <string-name><surname>He</surname> <given-names>S</given-names></string-name>, <string-name><surname>Li</surname> <given-names>X</given-names></string-name> (<year>2020</year>). <chapter-title>Contextual-bandit based personalized recommendation with time-varying user interests</chapter-title>. In: <source><italic>Proceedings of the AAAI Conference on Artificial Intelligence</italic></source>, volume <volume>34</volume>, <fpage>6518</fpage>–<lpage>6525</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1151_ref_032">
<mixed-citation publication-type="journal"> <string-name><surname>Yang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Ramdas</surname> <given-names>A</given-names></string-name>, <string-name><surname>Jamieson</surname> <given-names>KG</given-names></string-name>, <string-name><surname>Wainwright</surname> <given-names>MJ</given-names></string-name> (<year>2017</year>). <article-title>A framework for multi-a(rmed)/b(andit) testing with online FDR control</article-title>. <source><italic>Advances in Neural Information Processing Systems</italic></source>, <volume>30</volume>: <fpage>5959</fpage>–<lpage>5968</lpage>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
