Scholarly Work - Computer Science
Permanent URI for this collection: https://scholarworks.montana.edu/handle/1/3034

Item A Comprehensive Study of Walmart Sales Predictions Using Time Series Analysis (Sciencedomain International, 2024-06) C., Cyril Neba; F., Gerard Shu; Nsuh, Gillian; A., Philip Amouda; F., Adrian Neba; Webnda, F.; Ikpe, Victory; Orelaja, Adeyinka; Sylla, Nabintou Anissia
This article presents a comprehensive study of sales predictions using time series analysis, focusing on a case study of Walmart sales data. The aim of this study is to evaluate the effectiveness of various time series forecasting techniques in predicting weekly sales data for Walmart stores. Leveraging a dataset from Kaggle comprising weekly sales data from various Walmart stores around the United States, this study explores the effectiveness of time series analysis in forecasting future sales trends. Various time series analysis techniques, including Auto Regressive Integrated Moving Average (ARIMA), Seasonal Auto Regressive Integrated Moving Average (SARIMA), Prophet, Exponential Smoothing, and Gaussian Processes, are applied to model and forecast Walmart sales data. By comparing the performance of these models, the study seeks to identify the most accurate and reliable methods for forecasting retail sales, thereby providing valuable insights for improving sales predictions in the retail sector. The study includes an extensive exploratory data analysis (EDA) phase to preprocess the data, detect outliers, and visualize sales trends over time. Additionally, the article discusses the partitioning of data into training and testing sets for model evaluation. Performance metrics such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are utilized to compare the accuracy of different time series models. The results indicate that Gaussian Processes outperform other models in terms of accuracy, with an RMSE of 34,116.09 and an MAE of 25,495.72, significantly lower than the other models evaluated. For comparison, ARIMA and SARIMA models both yielded an RMSE of 555,502.2 and an MAE of 462,767.3, while the Prophet model showed an RMSE of 567,509.2 and an MAE of 474,990.8. Exponential Smoothing also performed well with an RMSE of 555,081.7 and an MAE of 464,110.5. These findings suggest the potential of Gaussian Processes for accurate sales forecasting. However, the study also highlights the strengths and weaknesses of each forecasting methodology, emphasizing the need for further research to refine existing techniques and explore novel modeling approaches. Overall, this study contributes to the understanding of time series analysis in retail sales forecasting and provides insights for improving future forecasting endeavors.
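
A minimal sketch of the kind of evaluation pipeline this abstract describes: fit forecasters to a weekly sales series, hold out the final weeks as a test set, and score the forecasts with RMSE and MAE. The synthetic series, the 52-week seasonal period, and the specific model classes (statsmodels Exponential Smoothing and a scikit-learn Gaussian process) are assumptions for illustration, not the article's code or data.

```python
# Illustrative sketch only: compare two forecasters on a weekly sales series
# with a train/test split and RMSE/MAE, in the spirit of the study above.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t = np.arange(3 * 52)  # three years of weekly observations
sales = 50_000 + 100 * t + 8_000 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 2_000, t.size)
train, test = sales[:-26], sales[-26:]  # hold out the last 26 weeks

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

# Holt-Winters exponential smoothing with additive trend and yearly seasonality.
hw = ExponentialSmoothing(train, trend="add", seasonal="add", seasonal_periods=52).fit()
hw_pred = hw.forecast(len(test))

# Gaussian process regression on the time index (RBF kernel plus a noise term).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=20.0) + WhiteKernel(), normalize_y=True)
gp.fit(t[:-26].reshape(-1, 1), train)
gp_pred = gp.predict(t[-26:].reshape(-1, 1))

for name, pred in [("Exponential Smoothing", hw_pred), ("Gaussian Process", gp_pred)]:
    print(f"{name:22s} RMSE={rmse(test, pred):10.1f}  MAE={mae(test, pred):10.1f}")
```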

Item ACAR: Adaptive Connectivity Aware Routing for Vehicular Ad Hoc Networks in City Scenarios (Springer, 2010-02) Yang, Qing; Lim, Alvin; Li, Shuang; Fang, Jian; Agrawal, Prathima
Multi-hop vehicle-to-vehicle communication is useful for supporting many vehicular applications that provide drivers with safety and convenience. Developing multi-hop communication in vehicular ad hoc networks (VANET) is a challenging problem due to the rapidly changing topology and frequent network disconnections, which cause failure or inefficiency in traditional ad hoc routing protocols. We propose an adaptive connectivity aware routing (ACAR) protocol that addresses these problems by adaptively selecting an optimal route with the best network transmission quality based on statistical and real-time density data that are gathered through an on-the-fly density collection process. The protocol consists of two parts: 1) select an optimal route, consisting of road segments, with the best estimated transmission quality, and 2) in each road segment of the chosen route, select the most efficient multi-hop path that will improve the delivery ratio and throughput. The optimal route is selected using our transmission quality model that takes into account vehicle densities and traffic light periods to estimate the probability of network connectivity and data delivery ratio for transmitting packets. Our simulation results show that the proposed ACAR protocol outperforms existing VANET routing protocols in terms of data delivery ratio, throughput and data packet delay. Since the proposed model is not constrained by network densities, the ACAR protocol is suitable for both daytime and nighttime city VANET scenarios.

Item Advancing Retail Predictions: Integrating Diverse Machine Learning Models for Accurate Walmart Sales Forecasting (Sciencedomain International, 2024-06) C., Cyril Neba; F., Gerard Shu; Nsuh, Gillian; A., Philip Amouda; F., Adrian Neba; Webnda, F.; Ikpe, Victory; Orelaja, Adeyinka; Sylla, Nabintou Anissia
In the rapidly evolving landscape of retail analytics, the accurate prediction of sales figures holds paramount importance for informed decision-making and operational optimization. Leveraging diverse machine learning methodologies, this study aims to enhance the precision of Walmart sales forecasting, utilizing a comprehensive dataset sourced from Kaggle. Exploratory data analysis reveals intricate patterns and temporal dependencies within the data, prompting the adoption of advanced predictive modeling techniques. Through the implementation of linear regression, ensemble methods such as Random Forest, Gradient Boosting Machines (GBM), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), this research endeavors to identify the most effective approach for predicting Walmart sales. Comparative analysis of model performance showcases the superiority of advanced machine learning algorithms over traditional linear models. The results indicate that XGBoost emerges as the optimal predictor for sales forecasting, boasting the lowest Mean Absolute Error (MAE) of 1226.471, Root Mean Squared Error (RMSE) of 1700.981, and an exceptionally high R-squared value of 0.9999900, indicating near-perfect predictive accuracy. This model's performance significantly surpasses that of simpler models such as linear regression, which yielded an MAE of 35632.510 and an RMSE of 80153.858. Insights from bias and fairness measurements underscore the effectiveness of advanced models in mitigating bias and delivering equitable predictions across temporal segments. Our analysis revealed varying levels of bias across different models. Linear Regression, Multiple Regression, and GLM exhibited moderate bias, suggesting some systematic errors in predictions. Decision Tree showed slightly higher bias, while Random Forest demonstrated a unique scenario of negative bias, implying systematic underestimation of predictions. However, models like GBM, XGBoost, and LGB displayed biases closer to zero, indicating more accurate predictions with minimal systematic errors. Notably, the XGBoost model demonstrated the lowest bias, with an MAE of -7.548432 (Table 4), reflecting its superior ability to minimize prediction errors across different conditions. Additionally, fairness analysis revealed that XGBoost maintained robust performance in both holiday and non-holiday periods, with an MAE of 84273.385 for holidays and 1757.721 for non-holidays. Insights from the fairness measurements revealed that Linear Regression, Multiple Regression, and GLM showed consistent predictive performance across both subgroups. Meanwhile, Decision Tree performed similarly for holiday predictions but exhibited better accuracy for non-holiday sales, whereas Random Forest, XGBoost, GBM, and LGB models displayed lower MAE values for the non-holiday subgroup, indicating potential fairness issues in predicting holiday sales. The study also highlights the importance of model selection and the impact of advanced machine learning techniques on achieving high predictive accuracy and fairness. Ensemble methods like Random Forest and GBM also showed strong performance, with Random Forest achieving an MAE of 12238.782 and an RMSE of 19814.965, and GBM achieving an MAE of 10839.822 and an RMSE of 1700.981. This research emphasizes the significance of leveraging sophisticated analytics tools to navigate the complexities of retail operations and drive strategic decision-making. By utilizing advanced machine learning models, retailers can achieve more accurate sales forecasts, ultimately leading to better inventory management and enhanced operational efficiency. The study reaffirms the transformative potential of data-driven approaches in driving business growth and innovation in the retail sector.
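
A rough sketch of the kind of comparison and subgroup check reported above: fit a linear model and a gradient-boosted model, then report overall MAE/RMSE and separate holiday vs. non-holiday MAE in the spirit of the fairness analysis. The synthetic features and the use of scikit-learn's GradientBoostingRegressor in place of XGBoost/LightGBM are assumptions for illustration; this is not the study's pipeline.

```python
# Illustrative sketch only: compare a linear model and a boosted ensemble on
# synthetic weekly-sales features, reporting MAE/RMSE plus per-subgroup MAE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(1)
n = 4_000
store = rng.integers(1, 46, n)                    # store identifier
week = rng.integers(1, 53, n)                     # week of year
holiday = (rng.random(n) < 0.07).astype(int)      # roughly 7% holiday weeks
temp = rng.normal(60, 15, n)                      # regional temperature
X = np.column_stack([store, week, holiday, temp])
y = (20_000 + 300 * store + 5_000 * np.sin(2 * np.pi * week / 52)
     + 15_000 * holiday + 50 * temp + rng.normal(0, 2_000, n))

X_tr, X_te, y_tr, y_te, hol_tr, hol_te = train_test_split(
    X, y, holiday, test_size=0.25, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "GradientBoosting": GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                                  random_state=0),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mae = mean_absolute_error(y_te, pred)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    mae_hol = mean_absolute_error(y_te[hol_te == 1], pred[hol_te == 1])
    mae_non = mean_absolute_error(y_te[hol_te == 0], pred[hol_te == 0])
    print(f"{name:18s} MAE={mae:8.1f} RMSE={rmse:8.1f} "
          f"holiday MAE={mae_hol:8.1f} non-holiday MAE={mae_non:8.1f}")
```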

Item An Affective Computing in Virtual Reality Environments for Managing Surgical Pain and Anxiety (2019-12) Prabhu, Vishnunarayan G.; Linder, Courtney; Stanley, Laura M.; Morgan, Robert
Pain and anxiety are common accompaniments of surgery. About 90% of people indicate elevated levels of anxiety during pre-operative care, and 66% of people report moderate to high levels of pain immediately after surgery. Currently, opioids are the primary method for pain management during postoperative care, and approximately one in 16 surgical patients prescribed opioids becomes a long-term user. This, along with the current opioid epidemic crisis, calls for alternative pain management mechanisms. This research focuses on utilizing affective computing techniques to develop and deliver an adaptive virtual reality experience based on the user's physiological response to reduce pain and anxiety. Biofeedback is integrated with a virtual environment utilizing the user's heart rate variability, respiration, and electrodermal activity. Early results from Total Knee Arthroplasty patients undergoing surgery at Patewood Memorial Hospital in Greenville, SC demonstrate promising results in the management of pain and anxiety during pre- and post-operative care.

Item AirLab: Distributed Infrastructure for Wireless Measurements (USENIX, 2010) Kone, Vinod; Zheleva, Mariya; Wittie, Mike P.; Zhang, Zengbin; Zhao, Xiaohan; Zhao, Ben Y.; Belding, Elizabeth M.; Zheng, Haitao; Almeroth, Kevin C.
The importance of experimental research in the field of wireless networks is well understood. So far, researchers have either built their own testbeds, accessed third-party controlled testbeds (http://orbit-lab.org), or used publicly available traces (http://crawdad.cs.dartmouth.edu) for evaluation. While immensely useful, all these approaches have their drawbacks. Building one's own testbeds requires cost and effort, while third-party controlled testbeds do not replicate real network deployments. On the other hand, the publicly available traces are often collected using different software and hardware platforms, making it very difficult to compare results across traces. As a result, observations are often inconsistent across different networks, leading researchers to draw potentially conflicting conclusions across their own studies. To facilitate meaningful analysis of wireless networks and protocols, we need a way to collect measurement traces across a wide variety of network deployments, all using a consistent set of measurement metrics. Widespread multi-faceted data collection will provide multiple viewpoints of the same network, enabling deeper understanding of both self and exterior interference properties, spectrum usage, network usage, and a wide variety of other factors. Furthermore, data collected in this manner across a variety of heterogeneous network types, such as university, corporate, and home environments, will facilitate cross-comparison of observed network phenomena within each of these settings. To address the critical need for comparable and consistent wireless traces, we propose AirLab, a publicly accessible distributed infrastructure for wireless measurements.

Item An Architecture of Cloud-Assisted Information Dissemination in Vehicular Networks (2016-05) Binhai, Zhu; Wu, Shaoen; Yang, Qing
Vehicular network technology allows vehicles to exchange real-time information with each other, which plays a vital role in the development of future intelligent transportation systems. Existing research on vehicular networks assumes that each vehicle broadcasts collected information to neighboring vehicles, so that information is shared among vehicles. The fundamental problem of what information is delivered with which vehicle(s), however, has not been adequately studied. We propose an innovative cloud-assisted architecture to facilitate intelligent information dissemination among vehicles. Within the novel architecture, virtual social connections between vehicles are created and maintained on the cloud. Vehicles with similar driving histories are considered friends in a vehicular social network (VSN). The closeness of the relation between two vehicles in a VSN is then modeled by the three-valued subjective logic model. Based on the closeness between vehicles, only relevant information will be delivered to vehicles that are likely interested in it. The cloud-assisted architecture coordinates vehicular social connection construction, VSN maintenance, vehicle closeness assessment, and information dissemination.

Item Breakpoint distance and PQ-trees (2020-12) Haitao, Jiang; Hong, Liu; Cedric, Chauve; Binhai, Zhu
The PQ-tree is a fundamental data structure that has also been used in comparative genomics to model ancestral genomes with some uncertainty. To quantify the evolution between genomes represented by PQ-trees, in this paper we study two fundamental problems of PQ-tree comparison motivated by this application. First, we show that the problem of comparing two PQ-trees by computing the minimum breakpoint distance among all pairs of permutations generated respectively by the two considered PQ-trees is NP-complete for unsigned permutations. Next, we consider a generalization of the classical Breakpoint Median problem, where an ancestral genome is represented by a PQ-tree and p ≥ 1 permutations are given, and we want to compute a permutation generated by the PQ-tree that minimizes the sum of the breakpoint distances (denoted k) to the p permutations. We show that this problem is also NP-complete for p = 1, and is fixed-parameter tractable with respect to k for p ≥ 1.
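
For readers unfamiliar with the underlying measure, the sketch below computes the breakpoint distance between two unsigned permutations: the number of adjacencies (unordered consecutive pairs, with the two ends included) of one permutation that do not appear in the other. It only illustrates the distance itself; the paper's contribution concerns minimizing this distance over the exponentially many permutations a PQ-tree generates, which is what makes the problems above hard.

```python
# Minimal sketch of the breakpoint distance between two unsigned permutations
# of the same elements (not the paper's PQ-tree algorithms).
def breakpoint_distance(p, q):
    """Count adjacencies of p (ends included) that are not adjacencies of q."""
    lo, hi = min(p) - 1, max(p) + 1          # sentinels so the ends also count
    p = [lo] + list(p) + [hi]
    q = [lo] + list(q) + [hi]
    adj_q = {frozenset(pair) for pair in zip(q, q[1:])}
    return sum(1 for pair in zip(p, p[1:]) if frozenset(pair) not in adj_q)

if __name__ == "__main__":
    print(breakpoint_distance([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 0: same order
    print(breakpoint_distance([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # 2: only the ends differ
    print(breakpoint_distance([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 3: several adjacencies broken
```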

Item Cascading Impact of Lag on User Experience in Multiplayer Games (USENIX, 2013) Howard, Eben; Cooper, Clint; Wittie, Mike P.; Yang, Qing
Playing cooperative multiplayer games should be fun for everyone involved, and part of having fun in games is being able to perform well, be immersed, and stay engaged [13, 17]. These indicators of enjoyment are part of a user's Quality of Experience (QoE), a measure which further includes additional metrics such as attention levels and ability to succeed. Players stop playing the game when it ceases to provide a high enough QoE, especially in cooperative and social games [8, 18, 19]. Industry application development and current research both operate with the assumption that for any given individual in a group, that individual's QoE is affected only by their own network condition and not the network conditions of the other group members [4, 7, 8]. We show that this assumption is incorrect. Our research shows that the QoE of all group members is negatively affected by a single member's lag (communication delay, or loss caused by poor network conditions). Understanding a user's QoE as a function that includes other users' network conditions has the potential to improve lag mitigation strategies for multiplayer games and other group applications.

Item Comparative Investigation on CSMA/CA-Based Opportunistic Random Access for Internet of Things (IEEE, 2014-01) Tang, Chong; Song, Lixing; Balasubramani, Jagadeesh; Wu, Shaoen; Biaz, Saad; Yang, Qing; Wang, Honggang
Wireless communication is indispensable to the Internet of Things (IoT). Carrier sensing multiple access/collision avoidance (CSMA/CA) is a well-proven wireless random access protocol that gives each node an equal probability of accessing the wireless channel, which yields equal throughput in the long term regardless of the channel conditions. To exploit node diversity, which refers to the difference in channel conditions among nodes, this paper proposes two opportunistic random access mechanisms, overlapped contention and segmented contention, to favor the node with the best channel condition. In the overlapped contention, the contention windows of all nodes share the same lower bound of zero but have different upper bounds depending on channel condition. In the segmented contention, the contention window upper bound for a better channel condition is smaller than the lower bound for a worse channel condition; namely, the contention windows are segmented without any overlapping. These algorithms are also refined to provide temporal fairness and avoid starving the nodes with poor channel conditions. The proposed mechanisms are analyzed, implemented, and evaluated on a Linux-based testbed and in the NS3 simulator. Extensive comparative experiments show that both opportunistic solutions can significantly improve the network performance in throughput, delay, and jitter over the current CSMA/CA protocol. In particular, the overlapped contention scheme can offer 73.3% and 37.5% throughput improvements in infrastructure-based and ad hoc networks, respectively.

Item Computing the Tandem Duplication Distance is NP-Hard (Society for Industrial & Applied Mathematics, 2022-03) Lafond, Manuel; Zhu, Binhai; Zou, Peng
In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment; this can be represented as the string operation AXB ⇒ AXXB. Tandem exon duplications have been found in many species such as human, fly, and worm, and have been largely studied in computational biology. The tandem duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet Σ, compute the smallest sequence of TDs required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that TDs have received much attention ever since. In this paper, we focus on the special case when all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. We first prove that this problem is NP-hard when the alphabet size is unbounded, settling the 16-year-old open problem. We then show how to adapt the proof to |Σ|=4, hence proving the NP-hardness of the TD problem for any |Σ|≥4. One of the tools we develop for the reduction is a new problem called Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.
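
To make the string operation AXB ⇒ AXXB concrete, the short sketch below enumerates the strings reachable from a start string by tandem duplications and searches for the fewest duplications needed to reach a target, which is exactly the TD distance discussed above. The brute-force search is exponential and handles only toy inputs; it illustrates the operation and the distance, not any algorithm from the paper, whose results concern the hardness of this computation.

```python
# Toy brute-force illustration of tandem duplications (AXB => AXXB) and of the
# TD distance as the fewest duplications turning one string into another.
# Exponential search for tiny examples only; not an algorithm from the paper.
from collections import deque

def tandem_duplications(s):
    """All strings obtained from s by one tandem duplication of some substring."""
    return {s[:j] + s[i:j] + s[j:]          # duplicate s[i:j] in place
            for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def td_distance(source, target, max_steps=6):
    """Breadth-first search for the fewest tandem duplications from source to target."""
    frontier, seen = deque([(source, 0)]), {source}
    while frontier:
        s, steps = frontier.popleft()
        if s == target:
            return steps
        if steps == max_steps:
            continue
        for t in tandem_duplications(s):
            if len(t) <= len(target) and t not in seen:  # duplications only grow strings
                seen.add(t)
                frontier.append((t, steps + 1))
    return None  # not reachable within max_steps

if __name__ == "__main__":
    print(td_distance("ab", "abab"))        # 1: duplicate "ab"
    print(td_distance("ab", "aabb"))        # 2: duplicate "a", then "b"
    print(td_distance("abc", "abcbcabc"))   # 2: duplicate "abc", then the first "bc"
```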

Item Counterfactual Explanations of Neural Network-Generated Response Curves (IEEE, 2023-06) Morales, Giorgio; Sheppard, John
Response curves exhibit the magnitude of the response of a sensitive system to a varying stimulus. However, the response of such systems may be sensitive to multiple stimuli (i.e., input features) that are not necessarily independent. As a consequence, the shape of response curves generated for a selected input feature (referred to as the “active feature”) might depend on the values of the other input features (referred to as “passive features”). In this work we consider the case of systems whose response is approximated using regression neural networks. We propose to use counterfactual explanations (CFEs) for the identification of the features with the highest relevance on the shape of response curves generated by neural network black boxes. CFEs are generated by a genetic algorithm-based approach that solves a multi-objective optimization problem. In particular, given a response curve generated for an active feature, a CFE finds the minimum combination of passive features that need to be modified to alter the shape of the response curve. We tested our method on a synthetic dataset with 1-D inputs and two crop yield prediction datasets with 2-D inputs. The relevance ranking of features and feature combinations obtained on the synthetic dataset coincided with the analysis of the equation that was used to generate the problem. Results obtained on the yield prediction datasets revealed that the impact of passive features on fertilizer responsivity depends on the terrain characteristics of each field.

Item Cyber-Physical Systems for Water Sustainability: Challenges and Opportunities (IEEE, 2015) Wang, Zhaohui; Song, Houbing; Watkins, David W.; Ong, Keat Ghee; Xue, Pengfei; Yang, Qing; Shi, Xianming
Water plays a vital role in the proper functioning of the Earth’s ecosystems and in practically all human activities, such as agriculture, manufacturing, transportation, and energy production. The proliferation of industrial and agricultural activities in modern society, however, poses threats to water resources in the form of chemical, biological, and thermal pollution. On the other hand, tremendous advancement in science and technology offers valuable tools to address water sustainability challenges. Key technologies, including sensing technology, wireless communications and networking, hydrodynamic modeling, data analysis, and control, enable intelligent, wirelessly networked water Cyber-Physical Systems (CPS) with embedded sensors, processors, and actuators that can sense and interact with the water environment. This article will provide an overview of water CPS for sustainability from four critical aspects: sensing and instrumentation, communications and networking, computing, and control, and explore opportunities and design challenges of relevant techniques.

Item Designing multi-phased CO2 capture and storage infrastructure deployments (Elsevier BV, 2022-08) Jones, Erick C.; Yaw, Sean; Bennett, Jeffrey A.; Ogland-Hand, Jonathan D.; Strahan, Cooper; Middleton, Richard S.
CO2 capture and storage (CCS) is a climate change mitigation strategy aimed at reducing the amount of CO2 vented into the atmosphere by capturing CO2 emissions from industrial sources, transporting the CO2 via a dedicated pipeline network, and injecting it into geologic reservoirs. Designing CCS infrastructure is a complex problem requiring concurrent optimization of source selection, reservoir selection, and pipeline routing decisions. Current CCS infrastructure design methods assume that project parameters, including costs, capacities, and availability, remain constant throughout the project’s lifespan. In this research, we introduce a novel, multi-phased CCS infrastructure design model that allows for analysis of more complex scenarios in which project parameters vary across distinct phases. We demonstrate the efficacy of our approach with theoretical analysis and an evaluation using real CCS infrastructure data.

Item Dispersing and grouping points on planar segments (Elsevier BV, 2022-09) He, Xiaozhou; Lai, Wenfeng; Zhu, Binhai; Zou, Peng
Motivated by (continuous) facility location, we study the problem of dispersing and grouping points on a set of segments (of streets) in the plane. In the former problem, given a set of n disjoint line segments in the plane, we investigate the problem of computing a point on each of the n segments such that the minimum Euclidean distance between any two of these points is maximized. We prove that this 2D dispersion problem is NP-hard; in fact, it is NP-hard even if all the segments are parallel and are of unit length. This is in contrast to the polynomial solvability of the corresponding 1D problem by Li and Wang (2016), where the intervals are in 1D and are all disjoint. With this result, we also show that the Independent Set problem on Colored Linear Unit Disk Graph (meaning the convex hulls of points with the same color form disjoint line segments) remains NP-hard, and the parameterized version of it is in W[2]. In the latter problem, given a set of n disjoint line segments in the plane, we study the problem of computing a point on each of the n segments such that the maximum Euclidean distance between any two of these points is minimized. We present a factor-1.1547 approximation algorithm which runs in time. Our results can be generalized to the Manhattan distance.
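
The dispersion objective above is simple to state: choose one point per disjoint segment so that the closest pair of chosen points is as far apart as possible. The toy sketch below evaluates that objective and brute-forces it over a few evenly spaced candidate points per segment; it is only an illustration of the problem on tiny instances (the problem itself is NP-hard, as the paper shows), not the paper's reduction or its approximation algorithm for the grouping variant.

```python
# Toy illustration of the 2D dispersion objective (not the paper's algorithms):
# pick one point per disjoint segment to maximize the minimum pairwise distance.
# Brute force over sampled candidate points; exponential, tiny inputs only.
from itertools import product
from math import dist, inf

def sample_points(segment, k=5):
    """k evenly spaced candidate points on a segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = segment
    return [(x1 + t * (x2 - x1), y1 + t * (y2 - y1)) for t in (i / (k - 1) for i in range(k))]

def min_pairwise_distance(points):
    return min(dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])

def brute_force_dispersion(segments, k=5):
    """Best placement among the sampled candidates (a coarse heuristic, not exact)."""
    best_value, best_choice = -inf, None
    for choice in product(*(sample_points(s, k) for s in segments)):
        value = min_pairwise_distance(choice)
        if value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice

if __name__ == "__main__":
    # Three disjoint, parallel, unit-length segments: the hard case highlighted above.
    segments = [((0, 0), (1, 0)), ((0, 2), (1, 2)), ((0, 4), (1, 4))]
    value, placement = brute_force_dispersion(segments)
    print(round(value, 3), placement)
```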

Item Efficient Minimum Flow Decomposition via Integer Linear Programming (Mary Ann Liebert Inc, 2022-11) Dias, Fernando H.C.; Williams, Lucia; Mumey, Brendan; Tomescu, Alexandru I.
Minimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly. Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of all the exponentially many solution paths using only a quadratic number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves any instance in <13 seconds. We hope that our formulations can lie at the core of future practical RNA assembly tools. Our implementations are freely available on Github.

Item An Empirical Internet Protocol Network Intrusion Detection using Isolation Forest and One-Class Support Vector Machines (The Science and Information Organization, 2023-01) Shu Fuhnwi, Gerard; Adedoyin, Victoria; Agbaje, Janet O.
With the increasing reliance on web-based applications and services, network intrusion detection has become a critical aspect of maintaining the security and integrity of computer networks. This study empirically investigates internet protocol network intrusion detection using two machine learning techniques, Isolation Forest (IF) and One-Class Support Vector Machines (OC-SVM), combined with ANOVA F-test feature selection, and compares the effectiveness of the two algorithms in detecting network intrusions using web services. The study used the NSL-KDD dataset, encompassing hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP) web services attacks and normal traffic patterns, to comprehensively evaluate the algorithms. The performance of the algorithms is evaluated based on several metrics, such as the F1-score, detection rate (recall), precision, false alarm rate (FAR), and Area Under the Receiver Operating Characteristic (AUCROC) curve. Additionally, the study investigates the impact of different hyper-parameters on the performance of both algorithms. Our empirical results demonstrate that while both IF and OC-SVM exhibit high efficacy in detecting network intrusion attacks using web services of type HTTP, SMTP, and FTP, the One-Class Support Vector Machines outperform the Isolation Forest in terms of F1-score (SMTP), detection rate (HTTP, SMTP, and FTP), AUCROC, and a consistently low false alarm rate (HTTP). We used the t-test to determine that OC-SVM statistically outperforms IF on DR and FAR.
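
The sketch below shows the general shape of such a pipeline in scikit-learn: ANOVA F-test feature selection followed by an Isolation Forest and a One-Class SVM trained on normal traffic, scored with precision, recall (detection rate), and F1. The synthetic data stands in for NSL-KDD and the hyper-parameters are placeholders, so this is only an outline of the approach described above, not the study's code.

```python
# Outline of the IF / OC-SVM comparison with ANOVA F-test feature selection
# (synthetic stand-in data and placeholder hyper-parameters; not the study's code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.metrics import precision_score, recall_score, f1_score

# Synthetic "traffic": label 0 = normal, label 1 = attack (the rare class).
X, y = make_classification(n_samples=5000, n_features=30, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)

# ANOVA F-test feature selection, keeping the top 10 features, then scaling.
X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)
X_sel = StandardScaler().fit_transform(X_sel)

# Train the one-class models on normal traffic only; outliers are flagged as attacks.
normal = X_sel[y == 0]
models = {
    "IsolationForest": IsolationForest(contamination=0.1, random_state=0).fit(normal),
    "One-Class SVM": OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(normal),
}
for name, model in models.items():
    pred = (model.predict(X_sel) == -1).astype(int)   # -1 means outlier, i.e. attack
    print(f"{name:16s} precision={precision_score(y, pred):.3f} "
          f"recall={recall_score(y, pred):.3f} F1={f1_score(y, pred):.3f}")
```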

Item An empirical study of reliable networking for vehicular networks using IEEE 802.11n (Inderscience Publishers, Geneva, SWITZERLAND, 2014) Lee, Seungbae; Lim, Alvin; Yang, Qing
The IEEE 802.11n technology is becoming more and more prevalent in wireless networks due to its significant enhancements in network performance. To examine whether the reliability of 802.11n is sufficient for vehicular networks, we conducted extensive experiments on inter-vehicle and intra-vehicle communications in vehicular environments. From this empirical study, we found that 802.11n provides high performance with stable throughput and reliable coverage in most cases. However, 802.11n protocols do not detect frequent changes in propagation and polarisation due to vehicle mobility, and their rate adaptation algorithms improperly select multi-stream rates under channel fading conditions, although single-stream rates perform better. Moreover, an optimal antenna alignment that enables High Throughput (HT) operation using parallel data streams needs further investigation in vehicular environments. Our findings have profound implications for protocol design and appropriate configuration for reliable networking in vehicular networks using 802.11n.

Item ES-MPICH2: A Message Passing Interface with Enhanced Security (IEEE, 2012-01) Ruan, Xiaojun; Yang, Qing; Alghamdi, Mohammed I.; Yin, Shu; Qin, Xiao
An increasing number of commodity clusters are connected to each other by public networks, which have become a potential threat to security-sensitive parallel applications running on the clusters. To address this security issue, we developed a Message Passing Interface (MPI) implementation to preserve confidentiality of messages communicated among nodes of clusters in an unsecured network. We focus on MPI rather than other protocols, because MPI is one of the most popular communication protocols for parallel computing on clusters. Our MPI implementation, called ES-MPICH2, was built based on MPICH2 developed by the Argonne National Laboratory. Like MPICH2, ES-MPICH2 aims at supporting a large variety of computation and communication platforms like commodity clusters and high-speed networks. We integrated encryption and decryption algorithms into the MPICH2 library with the standard MPI interface and, thus, data confidentiality of MPI applications can be readily preserved without a need to change the source codes of the MPI applications. MPI-application programmers can fully configure any confidentiality services in ES-MPICH2, because a secured configuration file in ES-MPICH2 offers the programmers flexibility in choosing any cryptographic schemes and keys seamlessly incorporated in ES-MPICH2. We used the Sandia Micro Benchmark and Intel MPI Benchmark suites to evaluate and compare the performance of ES-MPICH2 with the original MPICH2 version. Our experiments show that overhead incurred by the confidentiality services in ES-MPICH2 is marginal for small messages. The security overhead in ES-MPICH2 becomes more pronounced with larger messages. Our results also show that security overhead can be significantly reduced in ES-MPICH2 by high-performance clusters. The executable binaries and source code of the ES-MPICH2 implementation are freely available at http://www.eng.auburn.edu/~xqin/software/es-mpich2/.
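
ES-MPICH2 itself adds the cryptography inside the MPICH2 C library, beneath an unchanged MPI interface. Purely to illustrate that idea of transparent encryption around point-to-point messaging, the sketch below wraps mpi4py send/recv calls with symmetric encryption from the Python cryptography package; the wrapper names, the use of Fernet, and the key handling are assumptions for the sketch and are not how ES-MPICH2 is implemented.

```python
# Illustration only: transparent encryption around MPI point-to-point messages,
# in the spirit of ES-MPICH2 (which does this inside the MPICH2 C library).
# Wrapper names, Fernet, and the key handling below are assumptions.
# Run with: mpiexec -n 2 python encrypted_mpi_sketch.py
from mpi4py import MPI
from cryptography.fernet import Fernet

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# A real deployment would load the key from a protected configuration file
# (as ES-MPICH2 does); broadcasting it like this is only for a runnable demo.
key = Fernet.generate_key() if rank == 0 else None
key = comm.bcast(key, root=0)
cipher = Fernet(key)

def send_encrypted(payload: bytes, dest: int, tag: int = 0) -> None:
    """Encrypt the payload, then hand the ciphertext to the usual MPI send."""
    comm.send(cipher.encrypt(payload), dest=dest, tag=tag)

def recv_encrypted(source: int, tag: int = 0) -> bytes:
    """Receive ciphertext with the usual MPI recv, then decrypt it."""
    return cipher.decrypt(comm.recv(source=source, tag=tag))

if rank == 0:
    send_encrypted(b"confidential payload", dest=1)
elif rank == 1:
    print("rank 1 received:", recv_encrypted(source=0))
```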

Item Experimental Study: A LQI-Based Ranging Technique in ZigBee Sensor Networks (Inderscience Publishers, Geneva, SWITZERLAND, 2015) Yang, Ting; Yang, Qing; Cheng, Lihua
Ranging technology, which estimates the distance between two communicating wireless nodes, has been widely used as a necessary component in localization solutions for wireless sensor networks (WSN). LQI (Link Quality Indicator) is a metric introduced in IEEE 802.15.4 that measures the error in the incoming modulation of successfully received packets that pass the CRC (Cyclic Redundancy Check). Because of their low system cost and low computational complexity, LQI-based ranging techniques are increasingly applied in ZigBee sensor networks. However, due to environmental effects and electronic noise generated by hardware, raw LQI data cannot be directly mapped to distances. To eliminate errors in LQI data and obtain higher ranging accuracy, we design and evaluate a novel LQI-based ranging technique which includes three essential data processing components: pre-correction, error compensation, and mixed regression analysis. First, anchor nodes with known locations are used in the pre-correction process to correct LQI measurements against the empirical regression function obtained from historical data. Then, error compensation is applied to eliminate the intrinsic error in LQI data. Finally, ranging results are refined by the mixed regression analysis. The proposed ranging technique is implemented and evaluated on a ZigBee sensor prototype, Tarax. Experiment results show that the average ranging error is less than 1 m, confirming that the proposed technique achieves higher ranging accuracy and is suitable for localization applications in WSN.

Item Exploiting Locality of Interest in Online Social Networks (ACM CoNEXT, 2010) Wittie, Mike P.; Pejovic, Veljko; Deek, Lara B.; Almeroth, Kevin C.; Zhao, Ben Y.
Online Social Networks (OSN) are fun, popular, and socially significant. An integral part of their success is the immense size of their global user base. To provide a consistent service to all users, Facebook, the world’s largest OSN, is heavily dependent on centralized U.S. data centers, which renders service outside of the U.S. sluggish and wasteful of Internet bandwidth. In this paper, we investigate the detailed causes of these two problems and identify mitigation opportunities. Because details of Facebook’s service remain proprietary, we treat the OSN as a black box and reverse engineer its operation from publicly available traces. We find that, contrary to current wisdom, OSN state is amenable to partitioning and that its fine-grained distribution and processing can significantly improve performance without loss in service consistency. Through simulations of reconstructed Facebook traffic over measured Internet paths, we show that user requests can be processed 79% faster and use 91% less bandwidth. We conclude that the partitioning of OSN state is an attractive scaling strategy for Facebook and other OSN services.