Traffic prediction problems concern the mobility of human crowds in urban landscapes. The study of these problems is crucial to improving the efficiency of traffic flows in urban networks and people’s quality of life. In recent years, deep learning has achieved impressive results and outperformed traditional statistical methods on traffic-related problems in the literature. However, deep learning suffers from poor interpretability. The aim of this dissertation is to investigate deep learning models applied in the traffic domain, and to interpret the behavior of these models by analyzing salient global feature importance and specific input-to-output relationships. First, a literature review of traffic prediction problems and deep learning models is included, followed by a review of analyzer methods for interpreting deep learning models. Second, we explore the usage of Twitter’s tweets as a new source of additional information in the digital age to improve the performance of the crowd flow prediction model ST-ResNet on the city-wide crowd flow prediction task, as well as to provide additional context to the predictions in the form of human natural language. Third, we take inspiration from deep learning attribution methods such as Integrated Gradients and SmoothGrad, and propose a novel improved method, SmoothTaylor, which is derived from Taylor’s theorem. Finally, we discuss future work and share some preliminary results of applying the above attribution methods to a graph-based traffic status prediction model.
@mastersthesis{Goh21Analyzing,author={Goh, Gary S. W.},title={Analyzing Deep Learning Models for Traffic Prediction},school={Singapore University of Technology and Design},address={Singapore},year={2021}}
Integrated Gradients, as an attribution method for deep neural network models, is simple to implement. However, it suffers from noisy explanations, which hinders ease of interpretation. The SmoothGrad technique was proposed to address this noisiness and smoothen the attribution maps of any gradient-based attribution method. In this paper, we present SmoothTaylor as a novel theoretical concept bridging Integrated Gradients and SmoothGrad from the perspective of Taylor’s theorem. We apply the methods to the image classification problem, using the ILSVRC2012 ImageNet object recognition dataset and a couple of pretrained image models to generate attribution maps. These attribution maps are empirically evaluated using quantitative measures of sensitivity and noise level. We further propose adaptive noising to optimize the value of the noise scale hyperparameter. From our experiments, we find that the SmoothTaylor approach together with adaptive noising is able to generate better quality saliency maps, with less noise and higher sensitivity to the relevant points in the input space, as compared to Integrated Gradients.
@inproceedings{GohLWSB21Understanding,author={Goh, Gary S. W. and Lapuschkin, Sebastian and Weber, Leander and Samek, Wojciech and Binder, Alexander},title={Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution},booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},pages={4949--4956},publisher={IEEE},year={2021},address={Virtual Event / Milan, Italy},doi={10.1109/ICPR48806.2021.9413242},arxiv={2004.10484}}
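To make the bridging idea concrete, the following is a minimal, hedged sketch of a SmoothTaylor-style attribution: first-order Taylor relevances, grad f(z) ⊙ (x − z), averaged over noisy expansion points z drawn around the input x. The toy scoring function and all hyperparameter values are illustrative only; in a real application the finite-difference gradient would be replaced by a deep network's backpropagated gradients.

```python
import numpy as np

def grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at x
    (stands in for the backpropagated gradients of a deep network)."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        d = np.zeros_like(x, dtype=float)
        d.flat[i] = eps
        g.flat[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def smooth_taylor(f, x, n_samples=50, noise_scale=0.5, seed=0):
    """First-order Taylor relevances grad f(z) * (x - z), averaged over
    noisy expansion points z ~ N(x, noise_scale^2) around the input x."""
    rng = np.random.default_rng(seed)
    attr = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        z = x + rng.normal(scale=noise_scale, size=x.shape)
        attr += grad(f, z) * (x - z)
    return attr / n_samples

# Toy scoring function in place of a trained network's class score.
f = lambda v: float(v[0] ** 2 + 2.0 * v[1])
attributions = smooth_taylor(f, np.array([1.0, 3.0]))
```

Adaptive noising, in this framing, would search over `noise_scale` to trade off the noise and sensitivity of the resulting attribution maps.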
We explore the usage of real-time tweets to explain non-recurring large-scale spatio-temporal crowd movement. The aim is to evaluate the usefulness of tweets for improving the performance of city-wide crowd flow prediction. We conduct experiments in the context of Singapore city to investigate our proposition by extending an existing crowd flow prediction model. Implemented using a deep-neural-network-based approach, an end-to-end predictive model is configured to take in tweets as additional inputs to forecast the future flow of crowds in an urban environment. We extract various features from tweets, such as tweet counts, tenses, and sentiments, as additional signals to the predictive model. From the experimental results, we show that some models are able to improve the prediction accuracy, and we share our insights on how tweets are related to crowd flows.
@inproceedings{GohKZ18Twitter,author={Goh, Gary S. W. and Koh, Jing Yu and Zhang, Yue},title={Twitter-Informed Crowd Flow Prediction},booktitle={2018 IEEE International Conference on Data Mining Workshops (ICDM)},pages={624--631},publisher={IEEE},year={2018},address={Singapore},doi={10.1109/ICDMW.2018.00097}}
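As one concrete illustration of the kind of tweet features described above, the sketch below bins hypothetical geo-tagged tweets into a spatio-temporal grid of counts and mean sentiment. The grid origin, cell size, and tweet record fields are invented for the example, not taken from the paper.

```python
import numpy as np

# Hypothetical geo-tagged tweet records: (latitude, longitude, hour slot, sentiment).
tweets = [
    (1.35, 103.82, 8, 0.6),
    (1.36, 103.85, 8, -0.2),
    (1.30, 103.80, 9, 0.1),
]

def grid_features(tweets, lat0=1.2, lon0=103.6, cell=0.05,
                  n_rows=8, n_cols=8, n_slots=24):
    """Bin tweets into (hour slot, row, col) tensors of counts and mean
    sentiment -- the shape of auxiliary input a crowd flow model could take."""
    counts = np.zeros((n_slots, n_rows, n_cols))
    sent_sum = np.zeros_like(counts)
    for lat, lon, slot, sent in tweets:
        r = int((lat - lat0) / cell)
        c = int((lon - lon0) / cell)
        if 0 <= r < n_rows and 0 <= c < n_cols:
            counts[slot, r, c] += 1
            sent_sum[slot, r, c] += sent
    # Average sentiment per cell, leaving empty cells at zero.
    mean_sent = np.divide(sent_sum, counts,
                          out=np.zeros_like(sent_sum), where=counts > 0)
    return counts, mean_sent

counts, mean_sent = grid_features(tweets)
```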
Satellite data is discrete in both space and time; it can be considered as temporal snapshots (time series) of lattice processes. As the raw datasets are often too large to host publicly, processed datasets with a coarse spatial resolution are often hosted as an alternative. Nevertheless, with a regular grid, the inhomogeneous variability in the lattice processes cannot be captured effectively. In this paper, a quadtree-based spatial data dimension reduction algorithm is demonstrated. Based on the stratum variance, this algorithm iteratively divides lattice data into strata of four. In this way, the number of strata in an area correlates with the variability of that area. A satellite-derived surface solar radiation (SSR) dataset is used for the case study. Using parallel computing, the quadtree algorithm is applied to each temporal snapshot of SSR in the dataset. The processed data are then saved in a list structure. Finally, a solar resource assessment application, namely optimizing the orientation of a photovoltaic array, is considered to demonstrate the effectiveness and efficiency of the dimension-reduced dataset.
@inproceedings{YangGJZ16Spatial,author={Yang, Dazhi and Goh, Gary S. W. and Jiang, Siwei and Zhang, Allan N.},title={Spatial Data Dimension Reduction using Quadtree: A Case Study on Satellite-derived Solar Radiation},booktitle={2016 IEEE International Conference on Big Data},pages={3807--3812},publisher={IEEE Computer Society},year={2016},address={Washington DC, USA},doi={10.1109/BigData.2016.7841052}}
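A minimal sketch of the variance-driven quadtree stratification described above, assuming a recursive split-into-four rule with an illustrative variance tolerance and minimum stratum size; the paper's exact stopping criteria and data structures may differ.

```python
import numpy as np

def quadtree_strata(grid, var_tol=0.01, min_size=2):
    """Recursively split a 2-D lattice into four quadrants while the stratum
    variance exceeds var_tol; returns leaves as (row, col, height, width, mean)."""
    leaves = []
    def split(r, c, h, w):
        block = grid[r:r + h, c:c + w]
        if h <= min_size or w <= min_size or block.var() <= var_tol:
            leaves.append((r, c, h, w, float(block.mean())))
            return
        h2, w2 = h // 2, w // 2
        split(r, c, h2, w2)                      # top-left quadrant
        split(r, c + w2, h2, w - w2)             # top-right
        split(r + h2, c, h - h2, w2)             # bottom-left
        split(r + h2, c + w2, h - h2, w - w2)    # bottom-right
    split(0, 0, grid.shape[0], grid.shape[1])
    return leaves

# Homogeneous field with one variable corner: more strata where variability is high.
rng = np.random.default_rng(0)
field = np.ones((8, 8))
field[:4, :4] = rng.normal(size=(4, 4))
leaves = quadtree_strata(field)
```

The leaves partition the lattice exactly, so each snapshot can be stored as a short list of (position, size, mean) tuples instead of the full grid.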
Coordination across a supply chain creates a win-win situation for all players in that supply chain; we address the benefits, in terms of forecast accuracy, of reconciling demand forecasts across a supply chain. In Part III of this three-part paper, we continue our discussion on optimal reconciliation of forecasts. Two contributions are made in this paper: 1) the grouped reconciliation technique is used to address forecast inconsistency in situations where more than one hierarchy can be defined in a supply chain, and 2) the minimum trace (MinT) estimator is used to further improve the reconciliation accuracy on top of the weighted least squares (WLS) approach used in the earlier parts of this three-part paper. Following the earlier works, the same set of fast-moving consumer goods data is used here, and the current results are compared to the previous ones. It is shown that the MinT reconciliation technique outperforms the WLS approach, which had previously been identified as the best reconciliation technique for the data from the bottled juice category of the Dominick’s Finer Foods dataset.
@inproceedings{YangGJZ16aForecast,author={Yang, Dazhi and Goh, Gary S. W. and Jiang, Siwei and Zhang, Allan N.},title={Forecast UPC-level FMCG demand, Part III: Grouped Reconciliation},booktitle={2016 IEEE International Conference on Big Data},pages={3813--3819},publisher={IEEE Computer Society},year={2016},address={Washington DC, USA},doi={10.1109/BigData.2016.7841053}}
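The reconciliation step above can be written compactly as ỹ = S(SᵀW⁻¹S)⁻¹SᵀW⁻¹ŷ, where S is the summing matrix and ŷ the stacked base forecasts: choosing W diagonal gives the WLS approach, while choosing W as a (shrunk) covariance of base forecast errors gives the MinT estimator. A hedged sketch on an illustrative three-node hierarchy (the numbers are invented):

```python
import numpy as np

def reconcile(y_hat, S, W):
    """Trace-minimizing reconciliation: y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat.
    W diagonal -> WLS; W = (shrunk) base-error covariance -> MinT."""
    Winv = np.linalg.inv(W)
    P = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)
    return S @ P @ y_hat

# Tiny hierarchy: total = A + B. Rows of S: total, A, B.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
y_hat = np.array([105.0, 60.0, 40.0])  # incoherent base forecasts: 60 + 40 != 105
W = np.eye(3)                          # identity weights for the illustration
y_tilde = reconcile(y_hat, S, W)       # coherent across the hierarchy
```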
Traditional manual design of analytical processes is challenging, as it requires a general analyst to have a good grasp of numerous algorithms and of the interaction effects between each technique and the data across multiple domains. Especially in today's environment of increasingly high data variety across multiple domains, this design process can be very laborious and challenging. In this paper, we describe a design optimization approach that uses design of experiments to determine a suitable design for a standardized text classification process with high classification performance. We focus on sentiment analysis as a use case for this approach, as standard analytical methods have been established for each phase of the sentiment analysis process, from data pre-processing to feature selection and classification. In our proposed approach, we present an automatic and domain-free technique for applying design of experiments to this design process, with sentiment classification evaluation metrics as the performance criteria for optimization. In addition, we show that several interpretable analyses can be made to better understand the complex interaction effects of various analytical techniques with the data, which can then guide a general analyst to select more appropriate process design parameters for better text classification performance.
@inproceedings{GohAZ16Optimizing,author={Goh, Gary S. W. and Ang, Andy J. L. and Zhang, Allan N.},title={Optimizing Performance of Sentiment Analysis through Design of Experiments},booktitle={2016 IEEE International Conference on Big Data},pages={3737--3742},publisher={IEEE Computer Society},year={2016},address={Washington DC, USA},doi={10.1109/BigData.2016.7841042}}
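A minimal sketch of the design-of-experiments idea above: run a full factorial design over a few pipeline factors and read off main effects. The factor names and the stand-in `evaluate` function are invented for illustration; in the paper the response would be a real sentiment classification metric from an actual pipeline run.

```python
from itertools import product

# Hypothetical design factors for a sentiment pipeline (names are illustrative).
factors = {
    "stemming":   [False, True],
    "ngrams":     [1, 2],
    "classifier": ["nb", "svm"],
}

def evaluate(cfg):
    """Stand-in for training and scoring one pipeline configuration;
    a real study would return, e.g., cross-validated F1."""
    score = 0.70
    score += 0.03 if cfg["ngrams"] == 2 else 0.0
    score += 0.05 if cfg["classifier"] == "svm" else 0.0
    score -= 0.01 if cfg["stemming"] else 0.0
    return score

# Full factorial design: evaluate every factor combination.
runs = []
for values in product(*factors.values()):
    cfg = dict(zip(factors.keys(), values))
    runs.append((cfg, evaluate(cfg)))

def main_effect(name):
    """Difference in mean response between the two levels of a factor."""
    levels = factors[name]
    means = [sum(s for c, s in runs if c[name] == v) / (len(runs) / 2)
             for v in levels]
    return means[1] - means[0]

best = max(runs, key=lambda r: r[1])[0]  # highest-scoring configuration
```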
The widespread use of social media and the internet offers an additional interaction channel for companies to better understand customer sentiment about their brands and products. Sentiment analysis uses text data from social media, such as customer comments and reviews, which is high-dimensional in nature. Without selection, there are typically at least thousands of features (words or phrases) that can be extracted from a text corpus, among which many are redundant or irrelevant for the sentiment classification task. Thus, it is critical to select a compact yet effective set of features to avoid complex classifier design and slow running times in the classification process. However, very few existing metrics are able to improve the efficacy of feature selection by addressing the sparsity of the feature matrix for text data, i.e., the fact that many features may appear in only a few documents. In this paper, an improved feature selection metric known as sparsity adjusted information gain (SAIG) is proposed, which modifies the conventional information gain metric and adjusts the feature ranking scores according to the sparsity of the feature vector. It is able to use fewer features to attain a targeted performance level. The experimental results show that SAIG is able to improve the performance of sentiment classification.
@inproceedings{OngGX15Sparsity,author={Ong, B. Y. and Goh, Gary S. W. and Xu, Chi},title={Sparsity Adjusted Information Gain for Feature Selection in Sentiment Analysis},booktitle={2015 IEEE International Conference on Big Data},pages={2122--2128},publisher={IEEE Computer Society},year={2015},address={Santa Clara, CA, USA},doi={10.1109/BigData.2015.7363995}}
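Since the exact SAIG adjustment is specific to the paper, the sketch below shows only the conventional information gain metric that SAIG modifies, computed for a binary term-presence feature from (positive, negative) document counts.

```python
import math

def entropy(pos, neg):
    """Binary class entropy, in bits, of (positive, negative) counts."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def info_gain(docs_with, docs_without):
    """Conventional information gain of a term-presence feature.
    docs_with / docs_without are (positive, negative) document counts
    for documents containing / not containing the term."""
    pos = docs_with[0] + docs_without[0]
    neg = docs_with[1] + docs_without[1]
    total = pos + neg
    n_with, n_without = sum(docs_with), sum(docs_without)
    return (entropy(pos, neg)
            - n_with / total * entropy(*docs_with)
            - n_without / total * entropy(*docs_without))
```

SAIG would then re-rank features by adjusting these scores according to how sparse each feature's document-frequency vector is.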
In a big data enabled environment, manufacturers and distributors may have access to previously unobserved retailer-level demand-related information. This additional information can be incorporated into demand forecasting to produce more accurate forecasts, and thus enable better stock-out management. In Part II of this two-part paper, we explore the hierarchical nature of fast-moving consumer goods (FMCG) demand (represented by sales) time series and produce one-week-ahead rolling forecasts at the universal product code (UPC) level (or distributor level, as per our definition below). We show that the hierarchical forecasting framework yields significant accuracy improvements over conventional univariate forecasting methods. The observed improvements are mainly due to the price and promotion information available at the retailer level, which is assumed to be unknown to the distributor. To reconcile forecasts according to the hierarchy, only the forecast values at the retailer level are needed; the business strategies of individual retailers remain proprietary. A freely available dataset is considered to encourage further exploration. Data exploratory analysis and visualization tools are discussed in Part I of the paper.
@inproceedings{YangGJZA15Forecast,author={Yang, Dazhi and Goh, Gary S. W. and Jiang, Siwei and Zhang, Allan N. and Akcan, Orkan},title={Forecast UPC-level FMCG demand, Part II: Hierarchical Reconciliation},booktitle={2015 IEEE International Conference on Big Data},pages={2113--2121},publisher={IEEE Computer Society},year={2015},address={Santa Clara, CA, USA},doi={10.1109/BigData.2015.7363994}}
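To make the hierarchy concrete, here is a hedged sketch of the summing-matrix structure underlying such schemes, shown with simple bottom-up aggregation of hypothetical retailer-level forecasts; the paper's hierarchical reconciliation combines forecasts from all levels rather than aggregating purely bottom-up.

```python
import numpy as np

# Toy hierarchy: one UPC sold through three retailers (the bottom level).
# Retailer-level forecasts can exploit local price/promotion information;
# the UPC (distributor) level forecast is obtained by aggregation.
retailer_forecasts = np.array([120.0, 80.0, 50.0])  # hypothetical one-week-ahead

# Summing matrix mapping bottom-level series to all levels: UPC total + retailers.
S = np.vstack([np.ones((1, 3)), np.eye(3)])

all_levels = S @ retailer_forecasts  # coherent across the hierarchy by construction
```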
We are interested in forecasting a large collection of FMCG demand time series. As FMCG demand exists in a hierarchy (from manufacturers to distributors to retailers), the bottom level of the hierarchy may contain thousands or even millions of time series. Producing aggregate-consistent forecasts while utilizing the unique features of each time series thus becomes a technical challenge. To achieve better forecasting results, exploratory analysis is often necessary to obtain insights into the underlying demand-generating mechanism of each time series. Exploratory analysis aims at discovering the so-called "exogenous factors", such as price, demand for complementary/substitutive goods, and calendar events, which can help explain some of the demand fluctuation. During forecast accuracy evaluation, outlier detection is also important; a single anomalous time series can contribute much to the overall error. However, in a big data (such as retailing scanner data) enabled environment, exploratory analysis and visualization require careful attention because of the non-scalable nature of existing methods. Scalability is essential for exogenous factor selection and outlier detection in big time series data. In Part I of this two-part paper, we introduce some exploratory analytics and visualization methods (from not scalable to very scalable) for big retailing time series. Forecasting of the hierarchical FMCG demand is addressed in Part II.
@inproceedings{YangGXZA15Forecast,author={Yang, Dazhi and Goh, Gary S. W. and Xu, Chi and Zhang, Allan N. and Akcan, Orkan},title={Forecast UPC-level FMCG demand, Part I: Exploratory Analysis and Visualization},booktitle={2015 IEEE International Conference on Big Data},pages={2106--2112},publisher={IEEE Computer Society},year={2015},address={Santa Clara, CA, USA},doi={10.1109/BigData.2015.7363993}}