Abstract: Quick identification of severe injury crashes can help Emergency Medical Services (EMS) better allocate their scarce resources and improve the survival of severely injured crash victims by providing them with a timely response. Data broadcast from a vehicle’s Event Data Recorder (EDR) provide an opportunity to capture crash information and send it to EMS in near real time. A key feature of EDR data is a longitudinal measure of crash deceleration. We used functional data analysis (FDA) to ascertain key features of the deceleration trajectories (absolute integral, absolute integral of its slope, and residual variance) to develop and verify a risk prediction model for serious (AIS 3+) injuries. We used data from the 2002-2012 EDR reports and the National Automotive Sampling System (NASS) Crashworthiness Data System (CDS) datasets available on the National Highway Traffic Safety Administration (NHTSA) website. We considered a variety of approaches to modeling the deceleration data, including non-penalized and penalized splines and a variable selection method, ultimately obtaining a model with a weighted AUC of 0.93. A novel feature of our approach is the use of residual variance as a measure of predictive risk. Our model can be viewed as an important first step towards developing a real-time prediction model capable of predicting the risk of severe injury in any motor vehicle crash.
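The three trajectory features named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes spline smoothers, whereas this sketch substitutes a simple moving average, and it assumes the integrals are taken over the smoothed curve. The function names (`trajectory_features`, `moving_average`, `trapezoid`) and the `half_window` parameter are hypothetical.

```python
def trapezoid(y, t):
    """Trapezoidal-rule integral of y over the time grid t."""
    return sum(0.5 * (y[i] + y[i + 1]) * (t[i + 1] - t[i])
               for i in range(len(t) - 1))


def moving_average(y, half_window):
    """Centered moving average; a stand-in for the spline smoother (assumption)."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def trajectory_features(t, accel, half_window=1):
    """Summarize one deceleration trajectory by three scalar features."""
    smooth = moving_average(accel, half_window)
    # Absolute integral of the smoothed curve.
    abs_integral = trapezoid([abs(v) for v in smooth], t)
    # Absolute integral of the slope: for a piecewise-linear curve this
    # equals the sum of absolute increments (its total variation).
    abs_slope_integral = sum(abs(smooth[i + 1] - smooth[i])
                             for i in range(len(smooth) - 1))
    # Residual variance: sample variance of observed minus smoothed values.
    resid = [a - s for a, s in zip(accel, smooth)]
    mean_r = sum(resid) / len(resid)
    resid_var = sum((r - mean_r) ** 2 for r in resid) / (len(resid) - 1)
    return {
        "abs_integral": abs_integral,
        "abs_slope_integral": abs_slope_integral,
        "residual_variance": resid_var,
    }
```

In a pipeline of the kind the abstract describes, these three numbers per crash would then enter a risk model as predictors; the choice of smoother controls how much of the signal is attributed to the residual variance.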
Abstract: Identification of representative regimes of wave height and direction under different wind conditions is complicated by issues that relate to the specification of the joint distribution of variables that are defined on linear and circular supports and the occurrence of missing values. We take a latent-class approach and jointly model wave and wind data by a finite mixture of conditionally independent Gamma and von Mises distributions. Maximum-likelihood estimates of parameters are obtained by exploiting a suitable EM algorithm that allows for missing data. The proposed model is validated on hourly marine data obtained from a buoy and two tide gauges in the Adriatic Sea.
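As a rough illustration of the latent-class idea in this abstract, the sketch below computes posterior class probabilities (the E-step quantity of an EM algorithm) for a single observation with one linear and one circular variable. It assumes that a missing value is handled by omitting its factor from the class-conditional likelihood, which is what conditional independence permits when values are missing at random. This is a simplification, not the authors' algorithm, and the parameter names (`weight`, `shape`, `rate`, `mu`, `kappa`) are hypothetical.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0 by power series; adequate for moderate x."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def gamma_logpdf(x, shape, rate):
    """Log-density of a Gamma(shape, rate) distribution at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def von_mises_logpdf(theta, mu, kappa):
    """Log-density of a von Mises(mu, kappa) distribution at angle theta (radians)."""
    return kappa * math.cos(theta - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))


def responsibilities(obs, classes):
    """Posterior class probabilities for obs = (linear, circular).

    None marks a missing value; its factor is simply left out of the product.
    """
    linear, circular = obs
    log_joint = []
    for c in classes:
        lp = math.log(c["weight"])
        if linear is not None:
            lp += gamma_logpdf(linear, c["shape"], c["rate"])
        if circular is not None:
            lp += von_mises_logpdf(circular, c["mu"], c["kappa"])
        log_joint.append(lp)
    # Normalize in log space for numerical stability.
    m = max(log_joint)
    w = [math.exp(lp - m) for lp in log_joint]
    s = sum(w)
    return [v / s for v in w]
```

When both values are missing, the posterior reduces to the mixing weights, which is a convenient sanity check on the missing-data handling.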
Abstract: Student retention is an important issue for all university policy makers due to the potential negative impact of attrition on the image of the university and the career path of the dropouts. Although this issue has been thoroughly studied by many institutional researchers using parametric techniques, such as regression analysis and logit modeling, this article attempts to bring in a new perspective by exploring the issue with three data mining techniques, namely, classification trees, multivariate adaptive regression splines (MARS), and neural networks. The data mining procedures identify transferred hours, residency, and ethnicity as crucial factors in retention. Carrying transferred hours into the university implies that the students have taken college-level classes elsewhere, suggesting that they are more academically prepared for university study than those who have no transferred hours. Although residency was found to be a crucial predictor of retention, one should not go so far as to interpret this finding as meaning that retention is affected by proximity to the university. Instead, this is a typical example of Simpson’s Paradox. The geographical information system analysis indicates that non-residents from the east coast tend to be more persistent in enrollment than their west coast schoolmates.