
Journal of Data Science


Efficient UCB-Based Assignment Algorithm Under Unknown Utility with Application in Mentor-Mentee Matching
Yuyang Shi†, Yajun Mei

https://doi.org/10.6339/24-JDS1151
Pub. online: 24 September 2024      Type: Statistical Data Science      Open Access

† This paper is part of the first author’s PhD dissertation at Georgia Institute of Technology.

Received: 18 June 2024
Accepted: 22 August 2024
Published: 24 September 2024

Abstract

The assignment problem, crucial in many real-world applications, involves optimizing the allocation of agents to tasks for maximum utility. While it has been well studied in the optimization literature when the underlying utilities between all agent-task pairs are known, research is sparse when the utilities are unknown and must be learned from data on the fly. This paper addresses this gap, motivated by mentor-mentee matching programs at many U.S. universities. We develop an efficient sequential assignment algorithm that aims to nearly maximize the overall utility simultaneously over different time periods. Our algorithm uses stochastic bandit feedback to adaptively estimate the unknown utilities through linear regression models, integrating the Upper Confidence Bound (UCB) algorithm from the multi-armed bandit problem with the Hungarian algorithm from the assignment problem. We provide theoretical bounds for both the estimation error and the total regret of our algorithm, and we conduct numerical studies to demonstrate its practical effectiveness.
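The abstract's recipe (ridge-regression utility estimates, UCB exploration bonuses, and a Hungarian assignment step each period) can be illustrated with a minimal sketch. This is not the paper's implementation: the problem sizes, feature model, noise level, and the parameters `lam` and `alpha` are all illustrative assumptions, and SciPy's `linear_sum_assignment` stands in for the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Illustrative setup (not from the paper): n mentors and n mentees,
# d-dimensional pair features, true utility = x' theta + noise.
n, d, T = 4, 3, 200
theta_true = rng.normal(size=d)
X = rng.normal(size=(n, n, d))        # feature vector for each mentor-mentee pair

lam, alpha = 1.0, 1.0                 # ridge penalty and UCB exploration weight
V = lam * np.eye(d)                   # regularized Gram matrix
b = np.zeros(d)                       # accumulated feature-weighted rewards

for t in range(T):
    theta_hat = np.linalg.solve(V, b)  # ridge-regression estimate of theta
    Vinv = np.linalg.inv(V)
    # UCB score for every pair: point estimate plus confidence-width bonus.
    ucb = X @ theta_hat + alpha * np.sqrt(np.einsum('ijd,de,ije->ij', X, Vinv, X))
    # Hungarian step: maximize total UCB score over one-to-one assignments.
    rows, cols = linear_sum_assignment(ucb, maximize=True)
    # Observe noisy utilities for the chosen pairs and update the regression.
    for i, j in zip(rows, cols):
        x = X[i, j]
        r = x @ theta_true + 0.1 * rng.normal()
        V += np.outer(x, x)
        b += r * x

# After learning, the assignment under theta_hat should be near-optimal.
theta_hat = np.linalg.solve(V, b)
est_rows, est_cols = linear_sum_assignment(X @ theta_hat, maximize=True)
opt_rows, opt_cols = linear_sum_assignment(X @ theta_true, maximize=True)
```

Each round observes the realized utilities of all matched pairs (semi-bandit feedback), so the design matrix grows by n observations per period and the confidence bonus shrinks, trading exploration against exploitation exactly as the UCB principle prescribes.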

Supplementary material

The supplementary material online includes proofs of the lemmas and theorems used in the paper, and the Python code needed to reproduce the results.

References

 
Abbasi-Yadkori Y, Pál D, Szepesvári C (2011). Improved algorithms for linear stochastic bandits. Advances in Neural Information Processing Systems, 24: 2312–2320.
 
Anderson T (2008). The Theory and Practice of Online Learning. Athabasca University Press.
 
Anscombe FJ (1953). Sequential estimation. Journal of the Royal Statistical Society, Series B, Methodological, 15(1): 1–21.
 
Avis D (1983). A survey of heuristics for the weighted matching problem. Networks, 13(4): 475–493.
 
Biró P, Gyetvai M (2023). Online voluntary mentoring: Optimising the assignment of students and mentors. European Journal of Operational Research, 307(1): 392–405.
 
Cesa-Bianchi N, Lugosi G (2012). Combinatorial bandits. Journal of Computer and System Sciences, 78(5): 1404–1422.
 
Chen W, Wang Y, Yuan Y (2013). Combinatorial multi-armed bandit: General framework and applications. In: Dasgupta S and McAllester D (eds.), Proceedings of the 30th International Conference on Machine Learning, 151–159. PMLR.
 
Chu W, Li L, Reyzin L, Schapire R (2011). Contextual bandits with linear payoff functions. In: Gordon G, Dunson D, and Dudík M (eds.), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 208–214. JMLR Workshop and Conference Proceedings.
 
Deshmukh AA, Dogan U, Scott C (2017). Multi-task learning for contextual bandits. Advances in Neural Information Processing Systems, 30: 4848–4856.
 
DiCiccio TJ, Efron B (1996). Bootstrap confidence intervals. Statistical Science, 11(3): 189–228.
 
Duan R, Pettie S (2014). Linear-time approximation for maximum weight matching. Journal of the ACM, 61(1): 1–23.
 
Edmonds J, Karp RM (1972). Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM, 19(2): 248–264.
 
Efron B, Tibshirani RJ (1994). An Introduction to the Bootstrap. CRC Press.
 
Erraqabi A, Lazaric A, Valko M, Brunskill E, Liu YE (2017). Trading off rewards and errors in multi-armed bandits. In: Singh A and Zhu J (eds.), Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 709–717. PMLR.
 
Fang A, Zhu H (2022). Matching for peer support: Exploring algorithmic matching for online mental health communities. Proceedings of the ACM on Human–Computer Interaction, 6(CSCW2): 1–37.
 
Gai Y, Krishnamachari B, Jain R (2012). Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Transactions on Networking, 20(5): 1466–1478.
 
Ghosh M, Mukhopadhyay N, Sen PK (2011). Sequential Estimation. John Wiley & Sons.
 
Hazan E (2016). Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3–4): 157–325.
 
Kuhn HW (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2): 83–97.
 
Kurtzberg JM (1962). On approximation methods for the assignment problem. Journal of the ACM, 9(4): 419–439.
 
Lai TL, Robbins H (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1): 4–22.
 
Perrault P, Boursier E, Valko M, Perchet V (2020). Statistical efficiency of Thompson sampling for combinatorial semi-bandits. Advances in Neural Information Processing Systems, 33: 5429–5440.
 
Pizzato L, Rej T, Chung T, Koprinska I, Kay J (2010). RECON: A reciprocal recommender for online dating. In: Proceedings of the Fourth ACM Conference on Recommender Systems, 207–214.
 
Shalev-Shwartz S (2012). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2): 107–194.
 
Shi Y, Mei Y (2022). Efficient sequential UCB-based Hungarian algorithm for assignment problems. In: 2022 58th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 1–8. IEEE.
 
Simchi-Levi D, Wang C (2023). Multi-armed bandit experimental design: Online decision-making and adaptive inference. In: Ruiz F, Dy J, and van de Meent J-W (eds.), Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, 3086–3097. PMLR.
 
Tomizawa N (1971). On some techniques useful for solution of transportation network problems. Networks, 1(2): 173–194.
 
Vakili S, Zhao Q, Zhou Y (2014). Time-varying stochastic multi-armed bandit problems. In: 2014 48th Asilomar Conference on Signals, Systems and Computers, 2103–2107. IEEE.
 
Wen Z, Kveton B, Ashkan A (2015). Efficient learning in large-scale combinatorial semi-bandits. In: Bach F and Blei D (eds.), Proceedings of the 32nd International Conference on Machine Learning, 1113–1122. PMLR.
 
Xia P, Liu B, Sun Y, Chen C (2015). Reciprocal recommendation system for online dating. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 234–241. IEEE.
 
Xu X, Dong F, Li Y, He S, Li X (2020). Contextual-bandit based personalized recommendation with time-varying user interests. In: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 6518–6525.
 
Yang F, Ramdas A, Jamieson KG, Wainwright MJ (2017). A framework for multi-a(rmed)/b(andit) testing with online FDR control. Advances in Neural Information Processing Systems, 30: 5959–5968.


Copyright
2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
bandits; estimation; optimal assignment; upper confidence bound

Funding
This research was supported in part by NSF grant DMS-2015405 and by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR002378. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
