<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn><issn pub-type="ppub">1680-743X</issn><issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1067</article-id>
<article-id pub-id-type="doi">10.6339/22-JDS1067</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>EVIboost for the Estimation of Extreme Value Index Under Heterogeneous Extremes</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Wang</surname><given-names>Jiaxi</given-names></name><xref ref-type="aff" rid="j_jds1067_aff_001">1</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Hou</surname><given-names>Yanxi</given-names></name><email xlink:href="mailto:yxhou@fudan.edu.cn">yxhou@fudan.edu.cn</email><xref ref-type="aff" rid="j_jds1067_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Li</surname><given-names>Xingchi</given-names></name><xref ref-type="aff" rid="j_jds1067_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname><given-names>Tiandong</given-names></name><xref ref-type="aff" rid="j_jds1067_aff_003">3</xref>
</contrib>
<aff id="j_jds1067_aff_001"><label>1</label>School of Data Science, <institution>Fudan University</institution>, <country>China</country></aff>
<aff id="j_jds1067_aff_002"><label>2</label>Department of Statistics, <institution>Texas A&amp;M University</institution>, <country>USA</country></aff>
<aff id="j_jds1067_aff_003"><label>3</label>Shanghai Center for Mathematical Sciences, <institution>Fudan University</institution>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:yxhou@fudan.edu.cn">yxhou@fudan.edu.cn</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2023</year></pub-date><pub-date pub-type="epub"><day>3</day><month>10</month><year>2022</year></pub-date><volume>21</volume><issue>4</issue><fpage>638</fpage><lpage>657</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1067_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The following files are included in the supplementary material: (1) Programs for modeling TIR and EVIboost; (2) Code files for simulation study, along with the detailed experiment results; (3) Code and data files for financial data analysis via EVIboost; (4) Simulation results on computational time of the EVIboost model.</p>
</caption>
</supplementary-material><history><date date-type="received"><day>18</day><month>6</month><year>2022</year></date><date date-type="accepted"><day>16</day><month>9</month><year>2022</year></date></history>
<permissions><copyright-statement>2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2023</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Modeling heterogeneity in heavy-tailed distributions under a regression framework is challenging, and classical statistical methodologies usually impose conditions on the distribution models to facilitate the learning procedure. However, these conditions are likely to overlook the complex dependence structure between the heaviness of the tails and the covariates. Moreover, data sparsity in the tail regions makes inference less stable, leading to biased estimates of extreme-related quantities. This paper proposes a gradient boosting algorithm to estimate a functional extreme value index under heterogeneous extremes. The proposed algorithm is a data-driven procedure that captures complex and dynamic structures in tail distributions. We conduct extensive simulation studies to demonstrate the prediction accuracy of the proposed algorithm. In addition, we apply our method to a real-world data set to illustrate the state-dependent and time-varying properties of heavy-tail phenomena in the financial industry.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>gradient boosting</kwd>
<kwd>heterogeneous extremes</kwd>
<kwd>Pareto model</kwd>
<kwd>tail estimation</kwd>
<kwd>tree-based method</kwd>
</kwd-group>
<funding-group><award-group><funding-source xlink:href="https://doi.org/10.13039/501100001809">National Natural Science Foundation of China</funding-source><award-id>72171055</award-id></award-group><award-group><funding-source xlink:href="https://doi.org/10.13039/100007219">Natural Science Foundation of Shanghai</funding-source><award-id>20ZR1403900</award-id></award-group><funding-statement>Yanxi Hou’s research was partly supported by the National Natural Science Foundation of China Grant 72171055 and the Natural Science Foundation of Shanghai Grant 20ZR1403900.</funding-statement></funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1067_reflist_001">
<title>References</title>
<ref id="j_jds1067_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Adrian</surname> <given-names>T</given-names></string-name>, <string-name><surname>Brunnermeier</surname> <given-names>MK</given-names></string-name> (<year>2016</year>). <article-title>CoVaR</article-title>. <source><italic>The American Economic Review</italic></source>, <volume>106</volume>: <fpage>1705</fpage>–<lpage>1741</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_002">
<mixed-citation publication-type="book"> <string-name><surname>Breiman</surname> <given-names>L</given-names></string-name>, <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Stone</surname> <given-names>C</given-names></string-name>, <string-name><surname>Olshen</surname> <given-names>R</given-names></string-name> (<year>1984</year>). <source><italic>Classification and Regression Trees</italic></source>. <publisher-name>CRC Press</publisher-name>, <publisher-loc>Abingdon, United Kingdom</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Clauset</surname> <given-names>A</given-names></string-name>, <string-name><surname>Shalizi</surname> <given-names>CR</given-names></string-name>, <string-name><surname>Newman</surname> <given-names>MEJ</given-names></string-name> (<year>2009</year>). <article-title>Power-law distributions in empirical data</article-title>. <source><italic>SIAM Review</italic></source>, <volume>51</volume>(<issue>4</issue>): <fpage>661</fpage>–<lpage>703</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_004">
<mixed-citation publication-type="book"> <string-name><surname>De Haan</surname> <given-names>L</given-names></string-name>, <string-name><surname>Ferreira</surname> <given-names>A</given-names></string-name> (<year>2006</year>). <source><italic>Extreme Value Theory: An Introduction (Vol. 21)</italic></source>. <publisher-name>Springer</publisher-name>, <publisher-loc>New York</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_005">
<mixed-citation publication-type="journal"> <string-name><surname>Dekkers</surname> <given-names>A</given-names></string-name>, <string-name><surname>De Haan</surname> <given-names>L</given-names></string-name> (<year>1989</year>). <article-title>On the estimation of the extreme-value index and large quantile estimation</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>17</volume>(<issue>4</issue>): <fpage>1795</fpage>–<lpage>1832</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>DiCiccio</surname> <given-names>TJ</given-names></string-name>, <string-name><surname>Efron</surname> <given-names>B</given-names></string-name> (<year>1996</year>). <article-title>Bootstrap confidence intervals</article-title>. <source><italic>Statistical Science</italic></source>, <volume>11</volume>(<issue>3</issue>): <fpage>189</fpage>–<lpage>228</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Einmahl</surname> <given-names>J</given-names></string-name>, <string-name><surname>De Haan</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>C</given-names></string-name> (<year>2016</year>). <article-title>Statistics of heteroscedastic extremes</article-title>. <source><italic>Journal of the Royal Statistical Society, Series B, Statistical Methodology</italic></source>, <volume>78</volume>(<issue>1</issue>): <fpage>31</fpage>–<lpage>51</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name> (<year>2001</year>). <article-title>Greedy function approximation: A gradient boosting machine</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>29</volume>(<issue>6</issue>): <fpage>1189</fpage>–<lpage>1232</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Gençay</surname> <given-names>R</given-names></string-name>, <string-name><surname>Selçuk</surname> <given-names>F</given-names></string-name>, <string-name><surname>Ulugülyağcı</surname> <given-names>A</given-names></string-name> (<year>2003</year>). <article-title>High volatility, thick tails and extreme value theory in value-at-risk estimation</article-title>. <source><italic>Insurance: Mathematics &amp; Economics</italic></source>, <volume>33</volume>(<issue>2</issue>): <fpage>337</fpage>–<lpage>356</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Hall</surname> <given-names>P</given-names></string-name> (<year>1982</year>). <article-title>On some simple estimates of an exponent of regular variation</article-title>. <source><italic>Journal of the Royal Statistical Society, Series B, Methodological</italic></source>, <volume>44</volume>(<issue>1</issue>): <fpage>37</fpage>–<lpage>42</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Hill</surname> <given-names>B</given-names></string-name> (<year>1975</year>). <article-title>A simple general approach to inference about the tail of a distribution</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>3</volume>(<issue>5</issue>): <fpage>1163</fpage>–<lpage>1174</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Meinshausen</surname> <given-names>N</given-names></string-name> (<year>2006</year>). <article-title>Quantile regression forests</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>7</volume>(<issue>35</issue>): <fpage>983</fpage>–<lpage>999</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_013">
<mixed-citation publication-type="journal"> <string-name><surname>Mentch</surname> <given-names>L</given-names></string-name>, <string-name><surname>Hooker</surname> <given-names>G</given-names></string-name> (<year>2016</year>). <article-title>Quantifying uncertainty in random forests via confidence intervals and hypothesis tests</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>17</volume>(<issue>26</issue>): <fpage>1</fpage>–<lpage>41</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>Sandri</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zuccolotto</surname> <given-names>P</given-names></string-name> (<year>2008</year>). <article-title>A bias correction algorithm for the Gini variable importance measure in classification trees</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>17</volume>(<issue>3</issue>): <fpage>611</fpage>–<lpage>628</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Wager</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Efron</surname> <given-names>B</given-names></string-name> (<year>2014</year>). <article-title>Confidence intervals for random forests: The jackknife and the infinitesimal jackknife</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>15</volume>(<issue>1</issue>): <fpage>1625</fpage>–<lpage>1651</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Wang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Tsai</surname> <given-names>C</given-names></string-name> (<year>2009</year>). <article-title>Tail index regression</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>104</volume>(<issue>487</issue>): <fpage>1233</fpage>–<lpage>1240</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_017">
<mixed-citation publication-type="journal"> <string-name><surname>White</surname> <given-names>AP</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>WZ</given-names></string-name> (<year>1994</year>). <article-title>Technical note: Bias in information-based measures in decision tree induction</article-title>. <source><italic>Machine Learning</italic></source>, <volume>15</volume>(<issue>3</issue>): <fpage>321</fpage>–<lpage>329</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Xu</surname> <given-names>W</given-names></string-name>, <string-name><surname>Hou</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>D</given-names></string-name> (<year>2020</year>). <article-title>Prediction of extremal expectile based on regression models with heteroscedastic extremes</article-title>. <source><italic>Journal of Business &amp; Economic Statistics</italic></source>, <volume>40</volume>(<issue>2</issue>): <fpage>522</fpage>–<lpage>536</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Qian</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zou</surname> <given-names>H</given-names></string-name> (<year>2018</year>). <article-title>Insurance premium prediction via gradient tree-boosted Tweedie compound Poisson models</article-title>. <source><italic>Journal of Business &amp; Economic Statistics</italic></source>, <volume>36</volume>(<issue>3</issue>): <fpage>456</fpage>–<lpage>470</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>B</given-names></string-name> (<year>2005</year>). <article-title>Boosting with early stopping: Convergence and consistency</article-title>. <source><italic>The Annals of Statistics</italic></source>, <volume>33</volume>(<issue>4</issue>): <fpage>1538</fpage>–<lpage>1579</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1067_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Zhao</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>R</given-names></string-name> (<year>2018</year>). <article-title>Modeling maxima with autoregressive conditional Fréchet model</article-title>. <source><italic>Journal of Econometrics</italic></source>, <volume>207</volume>(<issue>2</issue>): <fpage>325</fpage>–<lpage>351</lpage>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
