\documentclass{article} % For LaTeX2e
\usepackage{nips13submit_e,times}
\usepackage{hyperref}
\usepackage{url}
%\documentstyle[nips13submit_09,times,art10]{article} % For LaTeX 2.09
%%%%%%% algorithm setup %%%%%%%%
\usepackage{algorithm}% http://ctan.org/pkg/algorithms
%\usepackage{algpseudocode}% http://ctan.org/pkg/algorithmicx
\usepackage[noend]{algpseudocode}
\usepackage{fontspec}
% \setmainfont{Hoefler Text}
\setmainfont[Mapping=tex-text]{Times New Roman}
\newcommand*\DNA{\textsc{dna}}
\newcommand*\Let[2]{\State #1 $\gets$ #2}
\algrenewcommand\alglinenumber[1]{
{\sf\footnotesize\addfontfeatures{Colour=888888,Numbers=Monospaced}#1}}
\algrenewcommand\algorithmicrequire{\textbf{Precondition:}}
\algrenewcommand\algorithmicensure{\textbf{Postcondition:}}
%\newcommand{\listofalgorithms}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{subfigure}
\usepackage{float}
\title{Human Activity Recognition with Smartphones}
\author{
Rao Fu \\
Computing Science \\
Simon Fraser University \\
\texttt{raof@sfu.ca} \\
\And
Yao Song \\
Computing Science \\
Simon Fraser University \\
\texttt{songyaos@sfu.ca} \\
\And
Weipu Zhao \\
Computing Science \\
Simon Fraser University \\
\texttt{weipuz@sfu.ca} \\
}
% The \author macro works with any number of authors. There are two commands
% used to separate the names and addresses of multiple authors: \And and \AND.
%
% Using \And between authors leaves it to \LaTeX{} to determine where to break
% the lines. Using \AND forces a linebreak at that point. So, if \LaTeX{}
% puts 3 of 4 authors names on the first line, and the last on the second
% line, try using \AND instead of \And before the third author name.
\newcommand{\fix}{\marginpar{FIX}}
\newcommand{\new}{\marginpar{NEW}}
\nipsfinalcopy % Uncomment for camera-ready version
\begin{document}
\maketitle
\begin{abstract}
This report focuses on improving classification accuracy and reducing computational complexity for the human activity recognition (HAR) problem on two public datasets, UCI and WISDM. We discuss the benefits that smartphones bring to HAR research. Our experiments indicate that combining the AdaBoost M1 algorithm with C4.5 decision trees is effective at discriminating several common human activities. Moreover, we show that correlation-based feature selection makes it feasible to reduce computational complexity while maintaining high accuracy.
\end{abstract}
\section{Introduction}
Human activity recognition (HAR) is not a new problem. Image-based HAR has been studied in the field of computer vision for a long time. The goal of HAR is shared with the broader concepts of context-aware and ubiquitous computing, in which sensors collect data from the user and try to assist the user with the task at hand. HAR has wide applications in the medical and military domains, family entertainment, and personal daily life.
In [1], automatic activity recognition techniques are used in a soldier assist system to help soldiers with their after-action reports. In [2], HAR techniques help hospital staff by estimating their working activities. In [3], daily activities are learned in order to detect abnormal behavior, so that a caregiver can be alerted when necessary. A large body of work surrounds the Microsoft Kinect, which works by capturing human movements and gestures. Some running shoes integrate motion sensors to provide performance feedback for athletes [4]. HAR is clearly becoming more important in many aspects of our lives.
Thanks to the efforts of researchers, basic activities such as sitting, walking, and running can be recognized with high accuracy given multiple sensors on the subject. However, wearing multiple sensors is not practical in daily life.
Recently, activity recognition using wearable consumer electronics has attracted growing interest. The most widespread such device is the smartphone, whose adoption has surged in recent years. New smartphones are equipped with multiple sensors for different purposes, and people carry these “sensors” nearly all the time. Smartphone sensors, especially the accelerometer, give HAR research several clear advantages.
First, smartphones allow data to be collected anywhere and at any time of day with little obtrusiveness: the subject only needs to carry the smartphone while performing the required activities. In this way, we can obtain realistic data from real-world daily activities. This is a marked improvement over lab-controlled data collection, which usually requires many sensors on different parts of the body. Lab-controlled data also tends to produce artificially high classification accuracy compared with realistic data; as shown in [5], accuracy dropped nearly $30\%$ when moving from lab data to real data.
Second, smartphones give us access to more data. In previous studies, lab equipment and other constraints usually limited experiments to fewer than $10$ subjects, which significantly restricted the generality of the results. With smartphones as sensors, it is easy to recruit $30$ subjects of different ages, and since people use their phones all the time, data can be accumulated continuously.
Overall, smartphone sensors let us train models on larger and more realistic datasets. The difficulty, however, is that we now have only one sensor source, and its position is not precisely controlled by the researchers.
Previous studies [6] have shown that recognition accuracy tends to decrease as the number of sensors decreases, and smartphone-based HAR research confirms this. In [7], the detection accuracy for some activities, such as walking upstairs or walking downstairs, is quite low; by contrast, with multiple sensors, recognizing such basic activities is not difficult [8]. It has also been noted that the position of the sensor on the body affects the recognition task. All of this motivates us to study the smartphone-based activity recognition problem.
From our point of view, there are two main ways to deal with the single-sensor problem in smartphone-based HAR. The first focuses on the data collection, feature generation, and feature selection steps: with access to more data, the trained model is more likely to predict well on future test data. The second focuses not on the dataset but on the classification algorithm trained on it; researchers have tried various learning algorithms on their own datasets.
Currently, however, different research groups use data they collected themselves from various sources, and most of it is not available to the public. Among the datasets that are available, raw data is seldom provided; only transformed feature data is accessible. This makes it impossible to aggregate data across groups or to compare the results reported by different groups, and it prevents us from generating new features from raw data. Still, given a publicly available dataset, we can try to identify the important features and gain insight into the smartphone-based activity recognition problem.
In this paper, we focus on the second approach. We make use of datasets publicly available online and try to improve classification accuracy by choosing a suitable learning algorithm. We also reduce computational complexity by employing a feature selection algorithm. The rest of the paper is organized as follows: Section 2 describes the main approaches used in the learning process, Section 3 presents and discusses our experimental results, and Section 4 concludes.
\section{Approach}
\subsection{HAR Procedures}
According to [4], the HAR process can be divided into several stages. We first obtain raw data from sensors; in the smartphone case, the data may come from the accelerometer. The raw data must then be preprocessed for the training algorithm: for a smartphone accelerometer, the time series is segmented into windows. Features are then generated from the raw data and selected. Given enough samples, we can finally train a model using a suitable learning algorithm. %A typical smartphone accelerometer will measure the 3-axial linear acceleration at a rate of $50$Hz. Given a $10$ second segment, we will have a maximum $500$ data points in total. We may sample the accelerometer at a lower frequency depending on our application.
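The preprocessing stage described above can be sketched in Python. This is an illustrative sketch, not the exact pipeline of either dataset: the window length, stride, and the three per-axis statistics below are placeholder choices.

```python
import numpy as np

def segment(signal, window_size, stride):
    """Split an (n_samples, 3) accelerometer stream into fixed-size windows."""
    starts = range(0, len(signal) - window_size + 1, stride)
    return np.array([signal[s:s + window_size] for s in starts])

def basic_features(window):
    """A few per-axis statistics; real HAR pipelines use many more features."""
    feats = []
    for axis in range(window.shape[1]):
        x = window[:, axis]
        feats += [x.mean(), x.std(), np.abs(np.diff(x)).mean()]
    return np.array(feats)

# Example: 50 s of 3-axial data at 20 Hz, 10 s windows without overlap
# (the WISDM configuration: 200 samples per window).
signal = np.random.randn(1000, 3)
windows = segment(signal, window_size=200, stride=200)
X = np.array([basic_features(w) for w in windows])
print(windows.shape, X.shape)  # (5, 200, 3) (5, 9)
```

The resulting feature matrix `X` is what the classifiers in Section 3 are trained on.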
\subsection{Datasets}
In this report, we use two datasets that are publicly available online. The first is from the WISDM group [7]. The data was collected from users carrying an Android smartphone running a data collection app in a lab environment. The phone was placed in the subject’s front leg pocket, and six activities (walking, jogging, walking upstairs, walking downstairs, sitting, and standing) were monitored. The data was sampled at $20$Hz, and the raw data was segmented into non-overlapping $10$-second windows, giving $200$ data points per window.
From $6$ basic feature types, $43$ features in total are generated from the raw data; details can be found in the original paper.
The second dataset is from [9]. The data was collected from $30$ volunteers aged $19$-$48$, each performing six activities while wearing a smartphone on the waist: walking, walking upstairs, walking downstairs, sitting, standing, and laying. The data was sampled at a maximum rate of $50$Hz. The segmentation differs from the first dataset: the window size is $2.56$ seconds, giving $128$ data points per window, and consecutive windows overlap by $50\%$.
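As a quick sanity check, the window and stride sizes of the two datasets follow directly from their sampling rates; the snippet below reproduces the UCI configuration (the recording length is an arbitrary example).

```python
# Windowing arithmetic for the two datasets.
# WISDM: 20 Hz * 10 s = 200 samples per window, stride = window (no overlap).
# UCI HAR: 50 Hz * 2.56 s = 128 samples per window, stride = 64 (50% overlap).
fs = 50
window = int(fs * 2.56)   # 128 samples
stride = window // 2      # 64 samples, i.e. 50% overlap

signal_len = 1280         # e.g. 25.6 s of recording
n_windows = len(range(0, signal_len - window + 1, stride))
print(window, stride, n_windows)  # 128 64 19
```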
\subsection{AdaBoost M1}
The boosting method was first introduced by Yoav Freund and Robert Schapire [10]. As described in [11], it works by combining several weak learning algorithms, with weights, into a strong learning algorithm that yields higher accuracy than each base learner alone. AdaBoost, short for ``adaptive boosting'', improves the original boosting algorithm by adapting to the individual learners' error rates. WEKA [8] implements the AdaBoost M1 method [12].
For the multiclass setting, Freund and Schapire [12] proved error bounds for AdaBoost M1 under the condition that each weak learner achieves prediction accuracy above $50\%$. In our experiments in Section 3, the overall accuracy already exceeds $80\%$ when we apply the C4.5 algorithm alone. Using C4.5 as the weak hypothesis in AdaBoost M1 is therefore a plausible way to improve accuracy. The full algorithm [12] is given below.
\begin{algorithm}
\caption{AdaBoost M1
\label{AdaBoost M1}}
\begin{algorithmic}[1]
\Require{\emph{\textbf{input}} $(x_{1},y_{1}),...,(x_{m},y_{m})$, $y_{i} \in Y = \{1,...,k\}$}
\Statex
\Function{AdaBoost.M1}{$x, y$}
\Let{$D_{1}(i)$}{$1/m$}
\For{$t \gets 1 \textrm{ to } T$}
\State \emph{call\ weak\ learning\ algorithm\ with} $D_{t}$
\State \emph{compute\ hypothesis} $h_{t}(x)$
\State \emph{calculate} $\epsilon_{t} = \sum_{i:h_{t}(x_{i})\neq y_{i}}D_{t}(i)$
\If{$\epsilon_{t} > 1/2$}
\Let{$T$}{$t - 1$}
\State \emph{abort\ the\ loop}
\Else
\Let{$\beta_{t}$}{$\epsilon_{t}/(1 - \epsilon_{t})$}
\EndIf
\If{$h_{t}(x_{i}) = y_{i}$}
\Let{$D_{t+1}(i)$}{$\frac{D_{t}(i)}{Z_{t}} \times \beta_{t}$}
\Else
\Let{$D_{t+1}(i)$}{$\frac{D_{t}(i)}{Z_{t}} \times 1$}
\EndIf
\EndFor
\State \Return{$h_{fin}(x) = \mathit{argmax}_{y\in Y}\sum_{t:h_{t}(x)=y}\log \frac{1}{\beta_{t}}$}
\EndFunction
\end{algorithmic}
\end{algorithm}
The algorithm maintains a weight distribution $D_{t}$ over training instances; initially, $D_{1}$ is uniform. It runs for $T$ iterations. In each iteration it calls the user-specified weak learning algorithm with distribution $D_{t}$, computes the hypothesis $h_{t}$, and sums the weights $D_{t}(i)$ of the misclassified instances to obtain the weighted error $\epsilon_{t}$. Correctly classified instances then have their weight multiplied by $\beta_{t} \leq 1$, while misclassified instances keep theirs, and the weights are renormalized to form the next distribution $D_{t+1}$. As Algorithm 1 shows, a correct classification always receives a weight factor no greater than $1$, so its weight shrinks relative to that of incorrect classifications; in this way AdaBoost M1 concentrates weight on the ``hardest'' samples for the given weak learner. Finally, for a sample $x$, the final hypothesis considers the predictions of all rounds and outputs the label $y \in Y$ that maximizes the sum of the weights $\log(1/\beta_{t})$ over the rounds in which $h_{t}(x) = y$.
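Algorithm 1 can be sketched directly in Python. As an assumption, scikit-learn's `DecisionTreeClassifier` stands in for the C4.5 weak learner (WEKA's J48 and CART-style trees differ in detail), and the distribution $D_t$ is passed as sample weights:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, T=10, max_depth=3):
    m = len(y)
    D = np.full(m, 1.0 / m)                    # D_1(i) = 1/m
    learners, betas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=max_depth).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()               # weighted error epsilon_t
        if eps > 0.5:                          # weak-learner condition violated
            break
        beta = max(eps / (1.0 - eps), 1e-10)   # beta_t (guard against eps = 0)
        D = np.where(pred == y, D * beta, D)   # shrink weights of correct samples
        D /= D.sum()                           # renormalize (Z_t)
        learners.append(h)
        betas.append(beta)
    return learners, betas

def predict(learners, betas, X, classes):
    """Final hypothesis: weighted vote with weight log(1/beta_t) per round."""
    votes = np.zeros((len(X), len(classes)))
    for h, beta in zip(learners, betas):
        pred = h.predict(X)
        for k, c in enumerate(classes):
            votes[pred == c, k] += np.log(1.0 / beta)
    return np.array(classes)[votes.argmax(axis=1)]
```

On an easy two-class problem this reproduces the boosting behavior; WEKA's implementation adds refinements (e.g. weight thresholding) not shown here.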
\subsection{Correlation-based Feature Selection}
In practice, a high-dimensional feature space can suffer from the ``curse of dimensionality''. Some features may be redundant and mislead the modeling algorithm, and taking irrelevant features into account can lead to overfitting, because it dilutes the importance of high-quality features. In either case, high dimensionality increases complexity and harms the final result. Feature selection is therefore essential in our experiments, since the original dataset has $561$ features.
Correlation-based Feature Selection (CFS) for machine learning was introduced by Mark A. Hall [13]. CFS is a filter algorithm that ranks subsets of features by the correlation-based heuristic evaluation function shown below [13]:
\begin{equation}
M_{s} = \frac{k\bar{r}_{cf}}{\sqrt{k + k(k - 1)\bar{r}_{ff}}},\qquad f\in S
\end{equation}
$M_{s}$ is the merit of a subset $S$ containing $k$ features, $\bar{r}_{cf}$ is the mean feature-class correlation, and $\bar{r}_{ff}$ is the mean feature-feature correlation. CFS chooses the subset of features with the highest merit. The chosen subset has the property that its features are highly correlated with the class but uncorrelated with each other. CFS assumes that features are conditionally independent given the class; however, it still works well under modest feature dependence.
% \subsection{Search Strategy}
% To enumerate all possible combinations of subset $S$ on a given high dimensional instance space is not feasible in reality[15]. Therefore, three search strategies are adopted to obtain optimal subset of features[14]. The first one is forward selection. It begins with $0$ features and then adding single feature to the subset on the basis of greedy algorithm until at some point, adding any possible single feature will not increase the evaluation of the subset. The second method is backward elimination. It starts with all the features and then greedily eliminate single feature as long as the elimination does not decrease the evaluation. Best first is the last strategy which can start with either no features by forward searching or full features by backward searching. A stop criterion is used to prevent from searching the entire feature subsets.
% \subsection{Complexity}
% The computation expense for CFS algorithm includes to compute feature correlation matrix, feature selection and measure of feature subset[14]. The complexity for the first and last one is $O(nd^{2})$ and $O(k^{2})$ separately, when there are $n$ instances, $d$ features and $k$ features in $S$. Complexity for feature selection differs on the basis of search strategy. For forward selection and backward elimination, in the worst case, it requires $((d^2-d))⁄2$ operations. However, for best first strategy, it depends on the stopping criterion. In sum, CFS reduces the complexity to a quadratic function with respect to the number of training samples, initial features and optimal features.
\section{Experiments}
\subsection{Comparison of Classification Algorithm}
As described in Section 2.3, AdaBoost M1 is a powerful algorithm for improving classification accuracy. We designed several experiments to compare AdaBoost on the two selected datasets against the methods originally applied to them. In particular, we chose the C4.5 decision tree as the weak learner for AdaBoost M1. Table 1 and Figure 1 show the detailed accuracy of each classification method on dataset 2, the UCI dataset [9]. In [9], a multiclass SVM and a multiclass hardware-friendly SVM were applied to dataset 2 with a $70\%$ training set and $30\%$ test set. We computed the accuracy of the plain C4.5 algorithm and of AdaBoost M1 with C4.5 as the weak learner. Since [14] states that $100$ is a suitable number of iterations for AdaBoost M1, we chose $10$ and $100$ as the two iteration counts in our experiments. To make the results broadly comparable, for each method we report both $10$-fold cross-validation and held-out test-set results. The same approaches with $10$-fold cross-validation were applied to dataset 1, the WISDM dataset [7]; Table 2 compares the results against the multilayer perceptron originally used in [7].
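The evaluation protocol above ($10$-fold cross-validation, a $70/30$ train/test split, and F-measure) can be sketched with scikit-learn. The classifiers and the synthetic data below are stand-ins assumed for illustration only, so the numbers will not match the tables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a HAR feature matrix: 6 activity classes, 43 features.
X, y = make_classification(n_samples=600, n_features=43, n_informative=20,
                           n_classes=6, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=5, random_state=0),
                           n_estimators=10, random_state=0)

for name, clf in [("C4.5-like tree", tree), ("AdaBoost(10it)", boost)]:
    cv = cross_val_score(clf, X, y, cv=10).mean()        # 10-fold CV accuracy
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
    f1 = f1_score(yte, clf.fit(Xtr, ytr).predict(Xte), average="weighted")
    print(f"{name}: 10-fold CV accuracy {cv:.3f}, test F-measure {f1:.3f}")
```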
\begin{table}[t]
\caption{Result for UCI HAR dataset: (number of attributes: $561$, training instances: $7352$ )}
\label{UCI HAR dataset Result}
\begin{center}
\begin{tabular}{lllllllll}
\multicolumn{1}{c}{\bf Method} &\multicolumn{1}{c}{\bf Test} &\multicolumn{7}{c}{\bf F-measure accuracy (\%)}
\\
\multicolumn{1}{c}{} &\multicolumn{1}{c}{\bf option} &\multicolumn{1}{c}{\bf Walking} &\multicolumn{1}{c}{\bf Upstairs} &\multicolumn{1}{c}{\bf Downstairs} &\multicolumn{1}{c}{\bf Standing} &\multicolumn{1}{c}{\bf Sitting} &\multicolumn{1}{c}{\bf Laying} &\multicolumn{1}{c}{\bf Overall}
\\ \hline \\
MC-svm &testset &$91.2$ &$77.5$ &$77.5$ &$95.0$ &$96.0$ &$100.0$ &$89.3$ \\
MC-HF-svm &testset &$91.1$ &$76.6$ &$76.7$ &$94.9$ &$95.5$ &$100.0$ &$89.0$ \\
C4.5 &10cv &$95.2$ &$94.5$ &$94.0$ &$93.8$ &$93.3$ &$100.0$ &$95.3$ \\
C4.5 &testset &$80.7$ &$72.2$ &$79.1$ &$83.0$ &$79.6$ &$100.0$ &$82.9$ \\
AdaBoost(10it) &10cv &$99.2$ &$99.2$ &$99.0$ &$96.1$ &$95.8$ &$100.0$ &$98.1$\\
AdaBoost(10it) &testset &$93.1$ &$88.6$ &$92.1$ &$87.6$ &$85.0$ &$100.0$ &$91.1$ \\
AdaBoost(100it)&10cv &$99.6$ &$99.7$ &$99.6$ &$98.4$ &$98.1$ &$100.0$ &$99.2$ \\
AdaBoost(100it)&testset &$96.2$ &$91.8$ &$94.0$ &$90.5$ &$89.1$ &$100.0$ &$93.6$ \\
\end{tabular}
\end{center}
\end{table}
\begin{figure}[t]
\begin{center}
% \fbox{\rule[-.5cm]{0cm}{4cm} \rule[-.5cm]{4cm}{0cm}}
\includegraphics[width=0.4\textwidth]{Fig2.png}
\end{center}
\caption{Test Accuracy with Full Feature Set on UCI Dataset}
\end{figure}
\begin{figure}[t]
\centering
\subfigure[10 fold CV Accuracy with Selected Feature]{
\label{Fig.1}
\includegraphics[width=0.4\textwidth]{Fig31.png}}
\subfigure[Test Accuracy with Selected Feature Set]{
\label{Fig.2}
\includegraphics[width=0.4\textwidth]{Fig32.png}}
\caption{10 fold CV Accuracy and Test Accuracy with Selected Feature Set on UCI Dataset}
\label{Fig.lable}
\end{figure}
\begin{figure}[t]
\begin{center}
% \fbox{\rule[-.5cm]{0cm}{4cm} \rule[-.5cm]{4cm}{0cm}}
\includegraphics[width=0.4\textwidth]{Fig4.png}
\end{center}
\caption{Test and CV Accuracy Curve with Increasing Computation Time}
\end{figure}
\begin{table}[t]
\caption{Result for WISDM dataset:(number of attributes: $43$, training instances: $5418$)}
\label{WISDM dataset Result}
\begin{center}
\begin{tabular}{llllllllll}
\multicolumn{1}{c}{\bf Method} &\multicolumn{1}{c}{\bf Test} &\multicolumn{7}{c}{\bf F-measure accuracy (\%)} &\multicolumn{1}{c}{\bf Model} \\
\multicolumn{1}{c}{} &\multicolumn{1}{c}{\bf option} &\multicolumn{1}{c}{\bf Walking} &\multicolumn{1}{c}{\bf Jogging} &\multicolumn{1}{c}{\bf Upstairs} &\multicolumn{1}{c}{\bf Downstairs} &\multicolumn{1}{c}{\bf Sitting} &\multicolumn{1}{c}{\bf Standing} &\multicolumn{1}{c}{\bf Overall} &\multicolumn{1}{c}{\bf build} \\
\multicolumn{1}{c}{} &\multicolumn{1}{c}{} &\multicolumn{7}{c}{} &\multicolumn{1}{c}{\bf time(s)}
\\ \hline \\
Multilayer &10cv &$95.9$ &$99.0$ &$71.2$ &$68.0$ &$98.4$ &$93.4$ &$91.2$ &$213.99$ \\
perceptron \\
C4.5 &10cv &$94.6$ &$95.8$ &$68.0$ &$65.7$ &$97.5$ &$97.2$ &$89.3$ &$1.27$ \\
AdaBoost(10it) &10cv &$97.8$ &$98.6$ &$80.9$ &$77.8$ &$98.2$ &$95.9$ &$94.0$ &$8.02$ \\
AdaBoost(100it) &10cv &$98.7$ &$98.8$ &$83.7$ &$80.6$ &$98.5$ &$96.7$ &$95.1$ &$72.08$ \\
\end{tabular}
\end{center}
\end{table}
\subsection{Result Comparison Before and After Feature Selection}
In both datasets, AdaBoost M1 clearly outperformed the other methods, especially with $100$ iterations. However, on dataset $2$, with $561$ attributes in the input feature vector, the model-building time is relatively high, which makes it impractical for real-world applications. We therefore employed the CFS feature selection method to reduce the feature dimension. After feature selection, the same classification methods were evaluated for both time consumption and classification accuracy. Table 3 compares the model-building time on the full feature set and on the selected feature set. It shows that running an additional, time-consuming feature selection step is worthwhile for a complex method such as AdaBoost: the total time for feature selection plus model building is nearly $10$ times less than building the model on the full feature set. More importantly, the accuracy on the selected feature set is comparable to that on the full feature set. Table 4 compares the accuracy of the two feature sets; the accuracy of AdaBoost with $100$ iterations on the selected feature set decreased by less than $1\%$. Detailed accuracy results on the selected feature set are listed in Figure 2 and Table 5.
\begin{table}[t]
\caption{Time Consumption with Different Feature Sets}
\label{Time Consumption}
\begin{center}
\begin{tabular}{lllll}
\multicolumn{1}{c}{\bf Method} &\multicolumn{1}{c}{\bf Test} &\multicolumn{1}{c}{\bf Model build time(s)} &\multicolumn{1}{c}{\bf Model build time(s)} &\multicolumn{1}{c}{\bf Total time(s) of feature} \\
\multicolumn{1}{c}{} &\multicolumn{1}{c}{\bf option} &\multicolumn{1}{c}{\bf with $561$ features} &\multicolumn{1}{c}{\bf with selected $50$ features} &\multicolumn{1}{c}{\bf selection and model build}
\\ \hline \\
C4.5 &10cv &$14.65$ &$1.21$ &$55.02$ \\
C4.5 &testset &$14.65$ &$1.21$ &$35.60$ \\
AdaBoost(10it) &10cv &$172.78$ &$13.34$ &$74.25$ \\
AdaBoost(10it) &testset &$172.78$ &$13.34$ &$55.26$ \\
AdaBoost(100it) &10cv &$2080.41$ &$147.38$ &$255.68$ \\
AdaBoost(100it) &testset &$2080.41$ &$147.38$ &$215.32$ \\
\end{tabular}
\end{center}
\end{table}
\begin{table}[t]
\caption{Comparison of Accuracy with Different Feature Sets}
\label{Comparison of Accuracy}
\begin{center}
\begin{tabular}{lllll}
\multicolumn{1}{c}{\bf Overall} &\multicolumn{1}{c}{\bf 10-fold with} &\multicolumn{1}{c}{\bf 10-fold with} &\multicolumn{1}{c}{\bf Test with} &\multicolumn{1}{c}{\bf Test with} \\
\multicolumn{1}{c}{\bf accuracy (\%)} &\multicolumn{1}{c}{\bf full features} &\multicolumn{1}{c}{\bf selected features} &\multicolumn{1}{c}{\bf full features} &\multicolumn{1}{c}{\bf selected features}
\\ \hline \\
C4.5 &$95.3$ &$95.5$ &$82.9$ &$84.6$ \\
AdaBoost(10it) &$98.1$ &$98.3$ &$91.1$ &$90.7$ \\
AdaBoost(100it)&$99.2$ &$99.1$ &$93.6$ &$93.1$ \\
\end{tabular}
\end{center}
\end{table}
\begin{table}[t]
\caption{Result on Reduced Feature Set: (number of attributes reduced to $50$, training instances: $7352$)}
\label{Reduced Feature Sets Result}
\begin{center}
\begin{tabular}{lllllllll}
\multicolumn{1}{c}{\bf Method} &\multicolumn{1}{c}{\bf Test} &\multicolumn{7}{c}{\bf F-measure accuracy (\%)}
\\
\multicolumn{1}{c}{} &\multicolumn{1}{c}{\bf option} &\multicolumn{1}{c}{\bf Walking} &\multicolumn{1}{c}{\bf Upstairs} &\multicolumn{1}{c}{\bf Downstairs} &\multicolumn{1}{c}{\bf Standing} &\multicolumn{1}{c}{\bf Sitting} &\multicolumn{1}{c}{\bf Laying} &\multicolumn{1}{c}{\bf Overall}
\\ \hline \\
C4.5 &10cv &$95.2$ &$94.6$ &$95.2$ &$94.0$ &$93.6$ &$100.0$ &$95.5$ \\
C4.5 &testset &$83.4$ &$79.1$ &$83.6$ &$81.8$ &$78.5$ &$100.0$ &$84.6$ \\
AdaBoost(10it) &10cv &$99.1$ &$98.8$ &$98.8$ &$96.9$ &$96.6$ &$99.9$ &$98.3$ \\
AdaBoost(10it) &testset &$92.7$ &$88.5$ &$92.0$ &$86.7$ &$83.7$ &$100.0$ &$90.7$ \\
AdaBoost(100it) &10cv &$99.6$ &$99.6$ &$99.5$ &$98.3$ &$98.1$ &$98.3$ &$99.1$ \\
AdaBoost(100it) &testset &$96.5$ &$92.3$ &$93.3$ &$89.2$ &$87.1$ &$100.0$ &$93.1$ \\
\end{tabular}
\end{center}
\end{table}
\subsection{Proper Iterations on AdaBoost M1}
The feature selection method significantly reduced the computational complexity of the AdaBoost algorithm. Since the number of iterations in AdaBoost is highly related to classification accuracy, choosing an iteration count with reasonable computation time is critical. Table 6 compares test accuracy and $10$-fold cross-validation accuracy for iteration counts from $10$ to $100$, and Figure 3 plots accuracy against time cost. With an increasing number of iterations, AdaBoost M1 shows no overfitting: both test and cross-validation accuracy increase with computation time, and the highest accuracy occurs at the largest iteration count. A trade-off between time consumption and accuracy may still be needed, especially in mobile computing.
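This kind of iteration sweep can be reproduced cheaply with scikit-learn's `staged_predict`, which scores one $T=100$ ensemble after every boosting round instead of retraining once per iteration count; the classifier and data below are again illustrative stand-ins, not the exact experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=50, n_informative=20,
                           n_classes=6, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=5, random_state=0),
                         n_estimators=100, random_state=0).fit(Xtr, ytr)

# Test accuracy after t = 1 .. T boosting rounds, from a single trained model.
acc = [np.mean(p == yte) for p in clf.staged_predict(Xte)]
for t in (10, 50, 100):
    if t <= len(acc):  # boosting may terminate early
        print(f"{t:3d} iterations: test accuracy {acc[t - 1]:.3f}")
```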
\begin{table}[t]
\caption{Accuracy and Time Cost with Increasing Iteration Number in AdaBoost}
\label{Accuracy and Time Cost in AdaBoost}
\begin{center}
\begin{tabular}{lllllllllll}
\multicolumn{1}{c}{\bf \# of iterations} &\multicolumn{1}{c}{\bf $10$} &\multicolumn{1}{c}{\bf $20$} &\multicolumn{1}{c}{\bf $30$} &\multicolumn{1}{c}{\bf $40$} &\multicolumn{1}{c}{\bf $50$} &\multicolumn{1}{c}{\bf $60$} &\multicolumn{1}{c}{\bf $70$} &\multicolumn{1}{c}{\bf $80$} &\multicolumn{1}{c}{\bf $90$} &\multicolumn{1}{c}{\bf $100$}
\\ \hline \\
Overall 10CV Acc. &$98.31$ &$98.61$ &$98.90$ &$98.98$ &$99.02$ &$99.02$ &$99.05$ &$99.09$ &$99.13$ &$99.14$ \\
Overall Test Acc. &$90.70$ &$91.86$ &$92.43$ &$92.16$ &$92.57$ &$92.77$ &$92.90$ &$92.67$ &$93.04$ &$93.15$ \\
\textbf{Time(s)} &$13.34$ &$28.28$ &$40.46$ &$55.79$ &$72.20$ &$87.04$ &$100.00$ &$115.60$ &$128.03$ &$147.38$ \\
\end{tabular}
\end{center}
\end{table}
\section{Conclusion}
This project presents experiments with AdaBoost M1 for smartphone-based human activity recognition. The detailed accuracy of AdaBoost M1 and several other methods was compared on two different datasets, and in both datasets AdaBoost M1 outperformed commonly used classifiers. To further reduce the feature dimension and make the algorithm practical, a CFS-based feature selection method was employed; the selected feature set offers remarkable time efficiency with comparable classification accuracy. Experiments also showed that training with AdaBoost M1 is robust to overfitting.
\subsubsection*{Contributions}
Rao explored AdaBoost M1 and CFS applied in WEKA. Yao did literature review and wrote the introduction. Weipu designed and performed experiments, gathered and processed experiment results.
\subsubsection*{References}
\small{
[1] Minnen David \& Tracy Westeyn \& Daniel Ashbrook \& Peter Presti \& Thad Starner. (March-April 2007) Recognizing soldier activities in the field. {\it In 4th International Workshop on Wearable and Implantable Body Sensor Networks (BSN 2007)}, pp. 236-241. Springer Berlin Heidelberg.
[2] Sanchez D. \& Tentori M. \& Favela J.. (March-April 2008) Activity Recognition for the Smart Hospital. {\it Intelligent Systems, IEEE}, vol.23, no.2, pp. 50,57.
[3] Duong T.V. \& Bui H.H.\& Phung D.Q.\& Venkatesh S.. (June 2005) Activity recognition and abnormality detection with the switching hidden semi-Markov model. {\it Computer Vision and Pattern Recognition 2005, CVPR 2005, IEEE Computer Society Conference}, vol.1, pp. 838,845, vol.1, pp. 20-25
[4] Bulling Andreas \& Ulf Blanke \& Bernt Schiele. (2014) A tutorial on human activity recognition using body-worn inertial sensors. {\it ACM Computing Surveys (CSUR) 46}, no.3 (2014): 33.
[5] Foerster F. \& Smeja M. \& Fahrenberg J.. (September 1, 1999) Detection of posture and motion by accelerometry: a validation study in ambulatory monitoring. {\it Computers in Human Behavior}, Volume 15, Issue 5, pp. 571-583
[6] Reiss A. \& Stricker D. (August 30 - September 3, 2011) Introducing a modular activity monitoring system. {\it Engineering in Medicine and Biology Society (EMBC), 2011 Annual International Conference of the IEEE}, pp. 5621-5624.
[7] Jennifer R. Kwapisz \& Gary M. Weiss \& Samuel A. Moore. (2010) Activity recognition using cell phone accelerometers. {\it Proceedings of the Fourth International Workshop on Knowledge Discovery from Sensor Data (at KDD-10)}, Washington, D.C.
[8] Mark Hall \& Eibe Frank \& Geoffrey Holmes \& Bernhard Pfahringer \& Peter Reutemann \& Ian H. Witten. (2009) The WEKA data mining software: an update. {\it SIGKDD Explorations}, vol.11, issue 1.
[9] Davide Anguita \& Alessandro Ghio \& Luca Oneto \& Xavier Parra \& Jorge L. Reyes-Ortiz. (December 2012) Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine. {\it International Workshop of Ambient Assisted Living (IWAAL 2012)}. Vitoria-Gasteiz, Spain.
[10] Freund Yoav \& Robert E. Schapire. (1995) A decision-theoretic generalization of on-line learning and an application to boosting. {\it Computational Learning Theory}, pp. 23-37. Springer Berlin Heidelberg.
[11] Freund Yoav \& Robert E. Schapire. (1999) A short introduction to boosting. {\it Journal of Japanese Society for Artificial Intelligence}, vol.14, no.5, pp. 771-780.
[12] Freund Yoav \& Robert E. Schapire. (1996) Experiments with a new boosting algorithm. {\it In ICML}, vol.96, pp. 148-156.
% [13] Yoav Freund \& Robert E. Schapire. (August 1997) A decision-theoretic generalization of on-line learning and an application to boosting. {\it Journal of Computer and System Sciences}, 55(1), pp. 130-134
[13] Hall M. A. (1999) Correlation-based feature selection for machine learning. Doctoral dissertation. The University of Waikato.
% [15] Langley P.. (1994) Selection of relevant features in machine learning. {\it In Proceedings of the AAAI Fall Symposium on Relevance}. AAAI Press.
[14] Reiss A. \& Hendeby G. \& Stricker D. (2013) A competitive approach for human activity recognition on smartphones. {\it ESANN 2013}, pp. 455-460.
}
\begin{table}[H]
\caption{Results on reduced feature sets (number of attributes reduced to $50$; training instances: $7352$)}
\label{Reduced Feature Sets Result}
\begin{center}
\begin{tabular}{lllllllll}
\multicolumn{1}{c}{\bf Method} &\multicolumn{1}{c}{\bf Test} &\multicolumn{7}{c}{\bf F-measure accuracy (\%)}
\\
\multicolumn{1}{c}{} &\multicolumn{1}{c}{\bf option} &\multicolumn{1}{c}{\bf Walking} &\multicolumn{1}{c}{\bf Upstairs} &\multicolumn{1}{c}{\bf Downstairs} &\multicolumn{1}{c}{\bf Standing} &\multicolumn{1}{c}{\bf Sitting} &\multicolumn{1}{c}{\bf Laying} &\multicolumn{1}{c}{\bf Overall}
\\ \hline \\
C4.5 &10cv &$95.2$ &$94.6$ &$95.2$ &$94.0$ &$93.6$ &$100.0$ &$95.5$ \\
C4.5 &testset &$83.4$ &$79.1$ &$83.6$ &$81.8$ &$78.5$ &$100.0$ &$84.6$ \\
AdaBoost(10it) &10cv &$99.1$ &$98.8$ &$98.8$ &$96.9$ &$96.6$ &$99.9$ &$98.3$ \\
AdaBoost(10it) &testset &$92.7$ &$88.5$ &$92.0$ &$86.7$ &$83.7$ &$100.0$ &$90.7$ \\
AdaBoost(100it) &10cv &$99.6$ &$99.6$ &$99.5$ &$98.3$ &$98.1$ &$98.3$ &$99.1$ \\
AdaBoost(100it) &testset &$96.5$ &$92.3$ &$93.3$ &$89.2$ &$87.1$ &$100.0$ &$93.1$ \\
\end{tabular}
\end{center}
\end{table}
%\begin{figure}[t]
% \begin{center}
% % \fbox{\rule[-.5cm]{0cm}{4cm} \rule[-.5cm]{4cm}{0cm}}
% \includegraphics[width=0.4\textwidth]{Fig4.png}
% \end{center}
% \caption{Test and CV Accuracy Curve with Increasing Computation Time}
% \end{figure}
\begin{table}[h]
\caption{Comparison of Accuracy with Different Feature Sets}
\label{Comparison of Accuracy}
\begin{center}
\begin{tabular}{lllll}
\multicolumn{1}{c}{\bf Overall} &\multicolumn{1}{c}{\bf 10-fold with} &\multicolumn{1}{c}{\bf 10-fold with} &\multicolumn{1}{c}{\bf Test with} &\multicolumn{1}{c}{\bf Test with} \\
\multicolumn{1}{c}{\bf accuracy} &\multicolumn{1}{c}{\bf full features} &\multicolumn{1}{c}{\bf selected features} &\multicolumn{1}{c}{\bf full features} &\multicolumn{1}{c}{\bf selected features}
\\ \hline \\
C4.5 &$95.3$ &$95.5$ &$82.9$ &$84.6$ \\
AdaBoost(10it) &$98.1$ &$98.3$ &$91.1$ &$90.7$ \\
AdaBoost(100it)&$99.2$ &$99.1$ &$93.6$ &$93.1$ \\
\end{tabular}
\end{center}
\end{table}
\begin{table}[h]
\caption{Accuracy and Time Cost with Increasing Iteration number in AdaBoost}
\label{Accuracy and Time Cost in AdaBoost}
\begin{center}
\begin{tabular}{lllllllllll}
\multicolumn{1}{c}{\bf \# of iterations} &\multicolumn{1}{c}{\bf $10$} &\multicolumn{1}{c}{\bf $20$} &\multicolumn{1}{c}{\bf $30$} &\multicolumn{1}{c}{\bf $40$} &\multicolumn{1}{c}{\bf $50$} &\multicolumn{1}{c}{\bf $60$} &\multicolumn{1}{c}{\bf $70$} &\multicolumn{1}{c}{\bf $80$} &\multicolumn{1}{c}{\bf $90$} &\multicolumn{1}{c}{\bf $100$}
\\ \hline \\
Overall 10CV Acc. &$98.31$ &$98.61$ &$98.90$ &$98.98$ &$99.02$ &$99.02$ &$99.05$ &$99.09$ &$99.13$ &$99.14$ \\
Overall Test Acc. &$90.70$ &$91.86$ &$92.43$ &$92.16$ &$92.57$ &$92.77$ &$92.90$ &$92.67$ &$93.04$ &$93.15$ \\
\textbf{Time(s)} &$13.34$ &$28.28$ &$40.46$ &$55.79$ &$72.20$ &$87.04$ &$100.00$ &$115.60$ &$128.03$ &$147.38$ \\
\end{tabular}
\end{center}
\end{table}
\end{document}