\documentclass[12pt, a4paper]{report}
\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\graphicspath{ {./images/} }
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{float}
\usepackage{color}
\usepackage{tikz}
\usetikzlibrary{shapes.geometric, arrows}
\usetikzlibrary{decorations.pathreplacing}
\usetikzlibrary{matrix,shapes,arrows,positioning,chains}
\usepackage{lscape}
%\usepackage{natbib}
\usepackage[english]{babel}
\usepackage{longtable}
\usepackage{pdflscape}
\usepackage{multirow,bigdelim,dcolumn,booktabs}
\usepackage{apacite}
\usepackage{setspace}
\usepackage[paperwidth=8.5in,left=1.5in,right=1.0in,top=1.0in,bottom=1.0in,
paperheight=11.0in]{geometry}
\doublespacing
\begin{document}
\begin{titlepage}
\centering
\singlespacing
\includegraphics[angle=0,width=7cm,height=3.0cm]{images/logo.jpg}\\
\vspace{2cm}
\large
\bf TITLE OF FYP PROPOSAL
\vspace{5cm}
Nurul Nisa' Khairol Azmi (GS123456)\\ Nora Basir (GS111111)\\ Noorezatty Mohd Yusop (GS101010) \\
\vspace{5cm}
Supervisor:\\
Jamilah Othman
\vspace{5mm}
December 2019
\date{ }
\end{titlepage}
%\maketitle
\newpage
\pagenumbering{roman}
\tableofcontents
\listoftables
\listoffigures
\newpage
\pagenumbering{arabic}
\chapter{Introduction}
\section{Background of Study}
Smoothing is a form of curve fitting whose main purpose is to trace the trend in a data series that is blurred by noise. In a data series, the trend provides direction for choosing an appropriate method of estimation. A smoothed series does not necessarily have to fit the data closely; more importantly, it must reduce noise so that the overall picture of the global behavior of the series can be captured. The pattern extracted by the smoothing process provides a guideline for choosing a suitable model for forecasting purposes. Smoothing not only helps in curve fitting but is also very useful in determining future values, because it eliminates noise that is not well behaved. \\
The definition of smoothing varies according to the field of interest. Some studies use the term filtering to refer to smoothing, for example \cite{ataman1981}, \cite{bovik1983}, \cite{gabbouj1992}, \cite{zeng1994convergence} and \cite{miao2013}. To avoid confusion, the term smoothing is used consistently throughout this research. The main concern of smoothing is to capture the underlying pattern by removing unwanted noise from the data series.
\section{Problem Statement}
A linear smoother is optimal for eliminating Gaussian noise and for tracking the trends that are common in practice \cite{bernholt2006}.
However, highly volatile noise tends to mask the general picture of a data series. The existence of noise that is not well behaved violates the assumptions of the linear model. Usually, least squares estimation is used, although it is well known for its poor performance in the presence of outliers or long-tailed distributions. \\
According to \cite{venet1990}, linear smoothers also have a strong tendency to blur important features and lack the ability to remove impulsive noise. Moreover, linear smoothers are highly vulnerable to outliers and cannot deal well with nonlinearity in a data series. Blurred edges, which lead to the loss of important information, arise from sudden changes in a series \cite{bernholt2006}. \\
Because of its ability to remove non-Gaussian noise from a data series, the median smoother is usually the favored smoothing tool. Unfortunately, the median smoother tends to oversmooth a data series since it eliminates Gaussian noise too.
\section{Research Objectives}
Since there is room for improvement in the compound smoother, this study suggests some modifications to the running median of span size 42. Existing studies focus only on noise with a long-tailed distribution. A pattern with only a small portion of contamination can easily be observed with the naked eye. Unfortunately, for data with high fluctuation, the signal may be mixed with heavy noise, making it hard to capture any possible trend. In this research, the performance of smoothers on highly volatile data is compared and evaluated. This research adds value to the existing studies and also motivates future research to extend the ideas addressed here towards a better solution.\\
Guided by the earlier discussion, the purposes of the study are summarized as follows:
\begin{enumerate}
\item To modify existing compound smoothers.
\item To determine the stability of the modified compound smoothers towards block pulses.
\item To evaluate the performance of the modified compound smoothers via a simulation procedure with higher percentages of contaminated normal noise for the sinusoidal, Doppler, Bumps, Blocks and HeaviSine functions.
\item To formulate a forecasting strategy by extracting the deterministic components of a data series.
\item To apply the proposed modified smoother to financial, environmental and agricultural data.
\end{enumerate}
\chapter{Literature Review}
\section{Introduction}
Many theories and studies extending Tukey's (1977) idea have evolved over the past few decades. Although the literature covers a variety of Tukey's approaches to estimation, this review focuses only on the smoothing techniques that have emerged since they were introduced. In this chapter, some properties of a good smoother are discussed in terms of monotonicity, effectiveness, consistency and stability. A review of the median smoother, particularly its general behavior towards Gaussian and non-Gaussian noise and its deterministic properties, is included. Various types of compound smoothers are also briefly summarized. This chapter also highlights the types of means used in the modification of the compound smoother.
\section{Properties of Smoother}
A few issues concern the properties of smoothers and how to measure the smoothness of a smoothed series. Estimating a signal with a smoother does not require strict mathematical or statistical assumptions to be fulfilled. \cite{jankowitz2007} discusses the properties of a good smoother extensively, as follows:
\begin{itemize}
\item \textbf{Effectiveness}\\
A smoother $S$ is effective if, for each $X_{t}$, $S(X_{t})$ acts as the signal and $(I-S)X_{t}=X_{t}-S(X_{t})$ is the noise, where $I$ is the identity operator \cite{rohwer2005}. The main objective in designing a smoother is to ensure that it is effective. In practice, determining whether a smoother is effective can be very difficult because the true signal is unknown. The aim is therefore to obtain a good estimate of the signal by reducing the unwanted noise in the observed series. In this study, the effectiveness of a smoother is measured via simulation studies. The simulation procedures are elaborated in Chapter 3.\\
\item \textbf{Consistency}\\
A consistent smoother is one that maintains the main features of a signal while mapping noise to 0 \cite{rohwer2005}. Consistency is closely related to the notions of idempotency and co-idempotency.
\end{itemize}
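For reference, idempotency and co-idempotency are commonly stated in operator form as sketched here. The statement uses the usual textbook definitions together with the symbols $S$ and $I$ from the effectiveness property; it is an assumption that these are the definitions intended in this review.
\begin{align*}
\text{idempotent:} \quad & S(S(X_{t})) = S(X_{t}),\\
\text{co-idempotent:} \quad & (I-S)\left[(I-S)X_{t}\right] = (I-S)X_{t}.
\end{align*}
Under these definitions, smoothing an already smoothed series changes nothing, and extracting the noise from the extracted noise returns the same noise.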
\section{Conclusion}
Some of the smoother properties discussed here are incorporated into the proposed modified compound smoother and are discussed further in Chapters 4 and 5. The median smoother is described extensively in terms of its strengths and weaknesses. A discussion of some statistical and deterministic properties of the median smoother for odd span sizes is also included. The idea of the compound smoother is extended for improvement and to provide more options for further analysis. A review of the types of means provides insight into possible modifications.
\chapter{Methodology}
\section{Introduction}
In this chapter, the compound smoother, specifically 4253HT, is discussed extensively. Simulation was performed by generating deterministic functions with added noise. The simulation procedure is based on \cite{donoho1994} and \cite{conradie2009}. The existing procedure considers only Gaussian noise or 10\% contaminated Gaussian noise.
In this study, the evaluation of the compound smoother is widened by measuring its success in recovering the signal from heavy noise. The proportion of contaminated normal (non-Gaussian) noise was increased to 25\%, 50\% and 75\%.
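One common way to write a contaminated normal noise model is sketched here. The symbols $\eta_{t}$, $p$, $\sigma_{1}$ and $\sigma_{2}$ are hypothetical and are introduced only for illustration; they are not taken from the cited simulation procedures.
\begin{equation*}
\eta_{t} \sim (1-p)\,N(0,\sigma_{1}^{2}) + p\,N(0,\sigma_{2}^{2}), \qquad \sigma_{2} > \sigma_{1}, \quad p \in \{0.10,\, 0.25,\, 0.50,\, 0.75\}.
\end{equation*}
In this sketch $p$ is the contamination proportion, so $p=0.10$ would correspond to the existing procedure and the larger values to the heavier noise levels considered in this study.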
\section{Compound Smoother-4253HT}
The compound smoother 4253HT is a combination of running medians of span sizes four, two, five and three, followed by Hanning and re-smoothing of the rough.
Let the temporal data $\textbf{X}$ be a doubly-infinite sequence of real data $\{X_{-N}, \ldots, X_{t-1}, X_{t},X_{t+1},\ldots, X_{N}\}$ (Mallows, 1980). Let $S$ be a smoother that operates on $\textbf{X}$ to generate the series of smoothed values $S(\textbf{X}_{t})$.\\
The computation of the compound smoother starts with a median smoother of window size four, re-centered by a median smoother of window size two. The smoothed values are then re-smoothed by a running median of span size five and next by a running median of span size three. Subsequently, a running weighted average (Hanning) is applied. The result of this smoothing is polished by computing the rough, or residual, applying the same smoothing algorithm to it and adding the result to the first pass. Figure \ref{algo4253ht} shows the algorithm involved in the compound smoother 4253HT.\\
\tikzset{
desicion/.style={
diamond,
draw,
text width=4em,
text badly centered,
inner sep=0pt
},
block/.style={
rectangle,
draw,
text width=30em,
text centered,
rounded corners
},
cloud/.style={
draw,
ellipse,
minimum height=2em
},
descr/.style={
fill=white,
inner sep=2.5pt
},
connector/.style={
-latex,
font=\scriptsize
},
rectangle connector/.style={
connector,
to path={(\tikztostart) -- ++(#1,0pt) \tikztonodes |- (\tikztotarget) },
pos=0.5
},
rectangle connector/.default=-2cm,
straight connector/.style={
connector,
to path=--(\tikztotarget) \tikztonodes
}
}
\tikzstyle{line} = [draw, -latex']
\begin{figure}
\small
\centering
\begin{tikzpicture}
\matrix (m)[matrix of nodes, column sep=1cm,row sep=6mm, align=center, nodes={rectangle,draw, anchor=center} ]{
|[block]| {Data Series \\ $\textbf{X}=(X_{-N},\ldots,X_{t},\ldots,X_{N})$} \\
|[block]| {Median Smoother of Window Size 4\\ $S_{1}(\textbf{X}_{t}) = \text{median}(X_{t-2}, X_{t-1}, X_{t}, X_{t+1})$} \\
|[block]| {Median Smoother of Window Size 2\\ $S_{2}(\textbf{X}_{t})=\text{median}\left[S_{1}(\textbf{X}_{t}), S_{1}(\textbf{X}_{t+1})\right]$} \\
|[block]| {Median Smoother of Window Size 5 \\$S_{3}{(\textbf{X}_{t})}=\text{median}\left[S_{2}(\textbf{X}_{t-2}),S_{2}(\textbf{X}_{t-1}), S_{2}(\textbf{X}_{t}), S_{2}(\textbf{X}_{t+1}), S_{2}(\textbf{X}_{t+2})\right]$} \\
|[block]| {Median Smoother of Window Size 3 \\$S_{4}(\textbf{X}_{t})=\text{median}[S_{3}(\textbf{X}_{t-1}),S_{3}(\textbf{X}_{t}), S_{3}(\textbf{X}_{t+1})]$} \\
|[block]| {Hanning \\ $S_{5}(\textbf{X}_{t})=\frac{1}{4}S_{4}(\textbf{X}_{t-1}) + \frac{1}{2}S_{4}(\textbf{X}_{t}) + \frac{1}{4}S_{4}(\textbf{X}_{t+1})$} \\
|[block]| {Rough \\ $\textbf{e}=(e_{-N},\ldots,e_{t},\ldots,e_{N}), e_{t}=X_{t}-S_{5}(\textbf{X}_{t})$} \\
|[block]| {Median Smoother of Window Size 4\\ $S_{1}(\textbf{e}_{t}) = \text{median}(e_{t-2}, e_{t-1}, e_{t}, e_{t+1})$} \\
|[block]| {Median Smoother of Window Size 2\\ $S_{2}(\textbf{e}_{t})=\text{median}\left[S_{1}(\textbf{e}_{t}), S_{1}(\textbf{e}_{t+1})\right]$}\\
|[block]| {Median Smoother of Window Size 5 \\$S_{3}{(\textbf{e}_{t})}=\text{median}\left[S_{2}(\textbf{e}_{t-2}),S_{2}(\textbf{e}_{t-1}), S_{2}(\textbf{e}_{t}), S_{2}(\textbf{e}_{t+1}), S_{2}(\textbf{e}_{t+2})\right]$} \\
|[block]| {Median Smoother of Window Size 3 \\$S_{4}(\textbf{e}_{t})=\text{median}[S_{3}(\textbf{e}_{t-1}),S_{3}(\textbf{e}_{t}), S_{3}(\textbf{e}_{t+1})]$} \\
|[block]| {Hanning \\ $S_{5}(\textbf{e}_{t})=\frac{1}{4}S_{4}(\textbf{e}_{t-1}) + \frac{1}{2}S_{4}(\textbf{e}_{t}) + \frac{1}{4}S_{4}(\textbf{e}_{t+1})$}\\
|[block]| {$S_{6}(\textbf{X}_{t})=S_{5}(\textbf{X}_{t})+S_{5}(\textbf{e}_{t})$} \\
}
;
\path [>=latex,->] (m-1-1) edge (m-2-1);
\path [>=latex,->] (m-2-1) edge (m-3-1);
\path [>=latex,->] (m-3-1) edge (m-4-1);
\path [>=latex,->] (m-4-1) edge (m-5-1);
\path [>=latex,->] (m-5-1) edge (m-6-1);
\path [>=latex,->] (m-6-1) edge (m-7-1);
\path [>=latex,->] (m-7-1) edge (m-8-1);
\path [>=latex,->] (m-8-1) edge (m-9-1);
\path [>=latex,->] (m-9-1) edge (m-10-1);
\path [>=latex,->] (m-10-1) edge (m-11-1);
\path [>=latex,->] (m-11-1) edge (m-12-1);
\path [>=latex,->] (m-12-1) edge (m-13-1);
\end{tikzpicture}
\caption{Algorithm of 4253HT}\label{algo4253ht}
\end{figure}
In general, the flow of computation is as follows:
\begin{enumerate}
\item Perform a running median of span size four:
\begin{eqnarray}
S_{1}(\textbf{X}_{t})&=& \text{median}(X_{t-2}, X_{t-1}, X_{t}, X_{t+1})\nonumber\\
&=& \text{median}\left[X_{(t-2)}^{*}, X_{(t-1)}^{*}, X_{(t)}^{*}, X_{(t+1)}^{*}\right]\nonumber\\
&=& \text{mean}\left[X_{(t-1)}^{*},X_{(t)}^{*}\right]\nonumber\\
&=& \frac{1}{2}\left[X_{(t-1)}^{*}+X_{(t)}^{*}\right]
\end{eqnarray}
where $X_{(t+i)}^{*}$, $i\in\{-2,-1,0,1\}$, are the ordered observations in the window of size four, so that the median is the mean of the two middle ordered observations.
\item Apply a running median of span size five and then a running median of span size three. The running median of span size five is
\begin{equation}
S_{3}(\textbf{X}_{t})=\text{median}\left[S_{2}(\textbf{X}_{t-2}),S_{2}(\textbf{X}_{t-1}), S_{2}(\textbf{X}_{t}), S_{2}(\textbf{X}_{t+1}), S_{2}(\textbf{X}_{t+2})\right].
\end{equation}
\end{enumerate}
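As a small worked illustration of the median steps, consider the hypothetical values $(X_{1},\ldots,X_{6})=(3,7,5,20,6,8)$, in which $X_{4}=20$ plays the role of an outlier. These numbers are invented for illustration only and are not part of the simulation study. Using the definitions of $S_{1}$ and $S_{2}$ given in Figure \ref{algo4253ht},
\begin{align*}
S_{1}(\textbf{X}_{3}) &= \text{median}(3,7,5,20) = \tfrac{1}{2}(5+7) = 6,\\
S_{1}(\textbf{X}_{4}) &= \text{median}(7,5,20,6) = \tfrac{1}{2}(6+7) = 6.5,\\
S_{1}(\textbf{X}_{5}) &= \text{median}(5,20,6,8) = \tfrac{1}{2}(6+8) = 7,\\
S_{2}(\textbf{X}_{3}) &= \text{median}(6,\,6.5) = 6.25,\\
S_{2}(\textbf{X}_{4}) &= \text{median}(6.5,\,7) = 6.75.
\end{align*}
In this example the outlier $20$ never enters the smoothed values, because it is always the largest ordered observation in its window. Similarly, for the Hanning step, three consecutive hypothetical values $6$, $8$ and $7$ of $S_{4}$ would give $S_{5}=\frac{1}{4}(6)+\frac{1}{2}(8)+\frac{1}{4}(7)=7.25$.\\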
\textbf{Simulation Procedure for Doppler Signal}\\
The Doppler function is a sine function that begins with small, fast waves that gradually become larger and slower as $t$ increases. The Doppler function can be expressed as
\begin{equation}
\mu_{t}=[t(1-t)]^{1/2} \sin[2\pi(1+\epsilon)/(t+\epsilon)], \qquad\epsilon=0.05.
\end{equation}
%\newpage
Since the number of observations in each function is $n=2048$, the time points are $(t_{1},\ldots,t_{n})=\left(\frac{1}{n},\ldots,1\right)$.\\
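As a quick check of the Doppler formula, two grid points can be evaluated by hand. This is only an illustrative calculation based on the stated expression with $\epsilon=0.05$. At the midpoint $t=0.5$, which is the point $t_{1024}$ when $n=2048$,
\begin{equation*}
\mu_{t}=[0.5(1-0.5)]^{1/2}\,\sin\!\left[\frac{2\pi(1.05)}{0.55}\right] = 0.5\,\sin\!\left(\frac{42\pi}{11}\right) = -0.5\,\sin\!\left(\frac{2\pi}{11}\right) \approx -0.270,
\end{equation*}
and at the endpoint $t=1$ the factor $[t(1-t)]^{1/2}$ vanishes, so $\mu_{t}=0$.\\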
\begin{figure}[H]
\centering
\includegraphics[width=0.7\textwidth]{images/doppsign.png}
\caption{Signal of Doppler} \label{dopp}
\end{figure}
Figure \ref{dopp} depicts an example of the Doppler function.\\
\begin{landscape}
\thispagestyle{empty}
\begin{table}
\centering
\caption{Summary of performance of modified 4253HT in extracting signal of sinusoidal function with noise added}\label{summary1}
\begin{tabular}{cccccc}
\hline
\multirow{2}{*}{Measurement error}&\multirow{2}{*}{Frequency}&\multicolumn{4}{c}{Noise}\\ \cline{3-6}
&&10\%&25\%&50\%&75\% \\ \hline
\multirow{3}{*}{Regression coefficient}&Low & Geometric &Geometric&Geometric& Geometric\\
&Moderate & Contraharmonic &Harmonic&Harmonic& Geometric\\
&High & Geometric &Geometric&Geometric& Geometric\\
\hline
\multirow{3}{*}{EIMSE}&Low & Quadratic &Quadratic&Quadratic& Contraharmonic\\
&Moderate & Contraharmonic &Harmonic&Harmonic& Contraharmonic\\
&High & Contraharmonic &Quadratic&Quadratic& Contraharmonic \\
\hline
\multirow{3}{*}{Variation Reduction}&Low & Contraharmonic &Contraharmonic&Contraharmonic& Contraharmonic\\
&Moderate & Contraharmonic &Contraharmonic&Contraharmonic& Contraharmonic\\
&High & Geometric &Geometric&Geometric& Geometric\\
\hline
\end{tabular}
\end{table}
\end{landscape}
\bibliographystyle{apacite}
\bibliography{bibliografi}
\section*{Appendix A}
\includegraphics[width=1\textwidth]{largepreview.png}
\end{document}