Scholarship & Research

Permanent URI for this community: https://scholarworks.montana.edu/handle/1/1

Search Results

Now showing 1 - 6 of 6
  • Risk mapping of wildlife–vehicle collisions across the state of Montana, USA: a machine-learning approach for imbalanced data along rural roads
    (Oxford University Press, 2024-05) Bell, Matthew; Wang, Yiyi; Ament, Rob
    Wildlife–vehicle collisions (WVCs) with large animals are estimated to cost the USA over 8 billion USD in property damage, tens of thousands of human injuries and nearly 200 human fatalities each year. Most WVCs occur on rural roads and are not collected evenly among road segments, leading to imbalanced data. There are a disproportionate number of analysis units that have zero WVC cases when investigating large geographic areas for collision risk. Analysis units with zero WVCs can reduce prediction accuracy and weaken the coefficient estimates of statistical learning models. This study demonstrates that the use of the synthetic minority over-sampling technique (SMOTE) to handle imbalanced WVC data in combination with statistical and machine-learning models improves the ability to determine seasonal WVC risk across the rural highway network in Montana, USA. An array of regularized variables describing landscape, road and traffic were used to develop negative binomial and random forest models to infer WVC rates per 100 million vehicle miles travelled. The random forest model is found to work particularly well with SMOTE-augmented data to improve the prediction accuracy of seasonal WVC risk. SMOTE-augmented data are found to improve accuracy when predicting crash risk across fine-grained grids while retaining the characteristics of the original dataset. The analyses suggest that SMOTE augmentation mitigates data imbalance that is encountered in seasonally divided WVC data. This research provides the basis for future risk-mapping models and can potentially be used to address the low rates of WVCs and other crash types along rural roads.
  • Inference of Transit Passenger Counts and Waiting Time Using Wi-Fi Signals
    (Western Transportation Institute, 2021-08) Videa, Aldo; Wang, Yiyi
    Passenger data such as real-time origin-destination (OD) flows and waiting times are central to planning public transportation services and improving visitor experience. This project explored the use of Internet of Things (IoT) Technology to infer transit ridership and waiting time at bus stops. Specifically, this study explored the use of Raspberry Pi computers, which are small and inexpensive sets of hardware, to scan the Wi-Fi networks of passengers’ smartphones. The process was used to infer passenger counts and obtain information on passenger trajectories based on Global Positioning System (GPS) data. The research was conducted as a case study of the Streamline Bus System in Bozeman, Montana. To evaluate the reliability of the data collected with the Raspberry Pi computers, the study conducted technology-based estimation of ridership, OD flows, wait time, and travel time for a comparison with ground truth data (passenger surveys, manual data counts, and bus travel times). This study introduced the use of a wireless Wi-Fi scanning device for transit data collection, called a Smart Station. It combines an innovative set of hardware and software to create a non-intrusive and passive data collection mechanism. Through the field testing and comparison evaluation with ground truth data, the Smart Station produced accurate estimates of ridership, origin-destination characteristics, wait times, and travel times. Ridership data has traditionally been collected through a combination of manual surveys and Automatic Passenger Counter (APC) systems, which can be time-consuming and expensive, with limited capabilities to produce real-time data. The Smart Station shows promise as an accurate and cost-effective alternative. 
    The advantages of using Smart Station over traditional data collection methods include the following: (1) wireless, automated data collection and retrieval; (2) real-time observation of passenger behavior; (3) negligible maintenance after programming and installing the hardware; (4) low costs of hardware, software, and installation; and (5) simple and short programming and installation time. If further validated through additional research and development, the device could help transit systems facilitate data collection for route optimization, trip planning tools, and traveler information systems.
  • Traffic Safety Along Tourist Routes in Rural Areas
    (2016-01) Wang, Yiyi; Veneziano, David; Russell, Sam; Al-Kaisy, Ahmed
    Little is known about the safety of tourist drivers in the United States. Most domestic studies have focused on traffic deaths and injuries of U.S. citizens traveling abroad and cite factors such as driving on the left, lack of seat belt use, and alcohol consumption. U.S. states that have a number of tourist attractions and the roadways to reach them may be interested in whether traffic safety is problematic for drivers who are tourists. To that end, this research investigated the contributing factors for crash severity and crash likelihood of visiting drivers in or near three national parks in rural areas. Driver-level data from the Rocky Mountain National Park in Colorado and the Sequoia and Kings Canyon National Parks in California revealed risk factors for crash severity, including age, geometry, and seat belt use. The second data set offered a more microscopic view at the road level and was used to anticipate crash frequency of visiting drivers at the road link level. Moreover, the second data set contained road geometry, traffic volume, environment, and crash counts aggregated at the segment level along a 57.8-mi stretch of U.S. Hwy 89 (a primary route to the north gate of Yellowstone National Park) in Montana that is frequently used by tourists. Crash risk factors (e.g., geometry and traffic intensity) affected local and nonlocal (tourist) drivers in different ways. Further exploration of crash trends in specific parks would be valuable in understanding the overall trends and contributors to crashes in tourism areas and to determine effective improvement measures.
  • The Impact of Weight Matrices on Parameter Estimation & Inference: A Case Study of Binary Response Using Land Use Data
    (2013-11) Wang, Yiyi; Kockelman, Kara M.; Wang, Xiaokun (Cara)
    This paper develops two new models and evaluates the impact of using different weight matrices on parameter estimates and inference in three distinct spatial specifications for discrete response. These specifications rely on a conventional, sparse, inverse-distance weight matrix for a spatial autoregressive probit (SARP), a spatial autoregressive approach where the weight matrix includes an endogenous distance-decay parameter (SARPα), and a matrix exponential spatial specification for probit (MESSP). These are applied in a binary choice setting using both simulated data and parcel-level land-use data. Parameters of all models are estimated using Bayesian methods. In simulated tests, adding a distance-decay parameter term to the spatial weight matrix improved the quality of estimation and inference, as reflected by a lower deviance information criterion (DIC) value, but the added sampling loop required to estimate the distance-decay parameter substantially increased computing times. In contrast, the MESSP model’s obvious advantage is its fast computing time, thanks to elimination of a log-determinant calculation for the weight matrix. In the model tests using actual land-use data, the MESSP approach emerged as the clear winner, in terms of fit and computing times. Results from all three models offer consistent interpretation of parameter estimates, with locations farther away from the regional central business district (CBD) and closer to roadways being more prone to (mostly residential) development (as expected). Again, the MESSP model offered the greatest computing-time savings benefits, but all three specifications yielded similar marginal effects estimates, showing how a focus on the spatial interactions and net (direct plus indirect) effects across observational units is more important than a focus on slope-parameter estimates when properly analyzing spatial data.
  • Estimation of Seasonal Daily Traffic Flow of Agricultural Products and Implications for Implementation of Automatic Traffic Recorders
    (2015-06) Forsythe, Shane; Stephens, Jerry; Wang, Yiyi
    Reliable traffic counts on a highway system are critical for sound decision making about the maintenance, operation, and expansion of the system. Portable short-term automatic traffic recorders (ATRs) are a cost-efficient way to complement traffic counts from permanent ATR sites by performing temporary traffic counts on the highway system. Complicating the collection of traffic data with these short-term devices is the seasonal variation in vehicle operations throughout the year. This work focused on predicting the spatial distribution of seasonal traffic resulting from agricultural activities by using a new method that combines geographic information system spatial functions and the four-step travel demand model. This research collected information about township grids for Montana (as proxies for trip origins), grain elevators (trip destinations), agricultural ground cover, and crop yield estimates to estimate flows in tonnage at the grid level on the road network. Results suggest that the proposed method using the location of major crops and the locations of grain elevators can be used to predict tonnage of product that will be added to individual routes. The predicted values can then be compared with reported heavy-truck traffic to locate sites that may have underrepresented traffic flows. Although this work considered specifically three crops, the method can be applied to any resource flow that has known origin and destination information. The method can be enhanced by refining assumptions of the composition of heavy trucks transporting agricultural products and by field measurements of vehicle flows to better test the validity of the model.
  • Where are the electric vehicles? A spatial model for vehicle-choice count data
    (2015-02) Chen, T. Donna; Wang, Yiyi; Kockelman, Kara M.
    Electric vehicles (EVs) are predicted to increase in market share as auto manufacturers introduce more fuel efficient vehicles to meet stricter fuel economy mandates and fossil fuel costs remain unpredictable. Reflecting spatial autocorrelation while controlling for a variety of demographic and locational (e.g., built environment) attributes, the zone-level spatial count model in this paper offers valuable information for power providers and charging station location decisions. By anticipating over 745,000 personal-vehicle registrations across a sample of 1000 census block groups in the Philadelphia region, a trivariate Poisson-lognormal conditional autoregressive (CAR) model anticipates Prius hybrid EV, other EV, and conventional vehicle ownership levels. Initial results signal higher EV ownership rates in more central zones with higher household incomes, along with significant residual spatial autocorrelation, suggesting that spatially-correlated latent variables and/or peer (neighbor) effects on purchase decisions are present. Such data sets will become more comprehensive and informative as EV market shares rise. This work’s multivariate Poisson-lognormal CAR modeling approach offers a rigorous, behaviorally-defensible framework for spatial patterns in choice behavior.