algorithm - Peak signal detection in realtime timeseries data -
update: best performing algorithm so far is one.
this question explores robust algorithms detecting sudden peaks in real-time timeseries data.
consider following dataset:
p = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1 1 1 1.1 0.9 1 1.1 1 1 0.9 1, ... 1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1 1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1 1 3, ... 2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];
(matlab format it's not language algorithm)
you can see there 3 large peaks , small peaks. dataset specific example of class of timeseries datasets question about. class of datasets has 2 general features:
- there basic noise general mean
- there large 'peaks' or 'higher data points' deviate noise.
let's assume following:
- the width of peaks cannot determined beforehand
- the height of peaks deviates other values
- the used algorithm must calculate realtime (so change each new datapoint)
for such situation, boundary value needs constructed triggers signals. however, boundary value cannot static , must determined realtime based on algorithm.
my question: algorithm calculate such thresholds in realtime? there specific algorithms such situations? well-known algorithms?
robust algorithms or useful insights highly appreciated. (can answer in language: it's algorithm)
smoothed z-score algo (very robust thresholding algorithm)
i have constructed algorithm works these types of datasets. based on principle of dispersion: if new datapoint given x number of standard deviations away moving mean, algorithm signals (also called z-score). algorithm robust because constructs separate moving mean , deviation, such signals not corrupt threshold. future signals therefore identified approximately same accuracy, regardless of amount of previous signals. algorithm takes 3 inputs: lag = lag of moving window
, threshold = z-score @ algorithm signals
, influence = influence (between 0 , 1) of new signals on mean , standard deviation
. example, lag
of 5 use last 5 observations smooth data. threshold
of 3.5 signal if datapoint 3.5 standard deviations away moving mean. , influence
of 0.5 gives signals half of influence normal datapoints have. likewise, influence
of 0 ignores signals recalculating new threshold: influence of 0 therefore robust option; 1 least.
it works follows:
pseudocode
# let y vector of timeseries data of @ least length lag+2 # let mean() function calculates mean # let std() function calculates standard deviaton # let absolute() absolute value function # settings (the ones below examples: choose best data) set lag 5; # lag 5 smoothing functions set threshold 3.5; # 3.5 standard deviations signal set influence 0.5; # between 0 , 1, 1 normal influence, 0.5 half # initialise variables set signals vector 0,...,0 of length of y; # initialise signal results set filteredy y(1),...,y(lag) # initialise filtered series set avgfilter null; # initialise average filter set stdfilter null; # initialise std. filter set avgfilter(lag) mean(y(1),...,y(lag)); # initialise first value set stdfilter(lag) std(y(1),...,y(lag)); # initialise first value i=lag+1,...,t if absolute(y(i) - avgfilter(i-1)) > threshold*stdfilter(i-1) if y(i) > avgfilter(i-1) set signals(i) +1; # positive signal else set signals(i) -1; # negative signal end # make influence lower set filteredy(i) influence*y(i) + (1-influence)*filteredy(i-1); else set signals(i) 0; # no signal set filteredy(i) y(i); end # adjust filters set avgfilter(i) mean(filteredy(i-lag),...,filteredy(i)); set stdfilter(i) std(filteredy(i-lag),...,filteredy(i)); end
demo
the matlab code demo can found @ end of answer. use demo, run , create time series clicking on upper chart. algorithm starts working after drawing lag
number of observations.
appendix 1: matlab , r code algorithm
matlab code
function [signals,avgfilter,stdfilter] = thresholdingalgo(y,lag,threshold,influence) % initialise signal results signals = zeros(length(y),1); % initialise filtered series filteredy = y(1:lag+1); % initialise filters avgfilter(lag+1,1) = mean(y(1:lag+1)); stdfilter(lag+1,1) = std(y(1:lag+1)); % loop on datapoints y(lag+2),...,y(t) i=lag+2:length(y) % if new value specified number of deviations away if abs(y(i)-avgfilter(i-1)) > threshold*stdfilter(i-1) if y(i) > avgfilter(i-1) % positive signal signals(i) = 1; else % negative signal signals(i) = -1; end % make influence lower filteredy(i) = influence*y(i)+(1-influence)*filteredy(i-1); else % no signal signals(i) = 0; filteredy(i) = y(i); end % adjust filters avgfilter(i) = mean(filteredy(i-lag:i)); stdfilter(i) = std(filteredy(i-lag:i)); end % done, return results end
example:
% data y = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1,... 1 1 1.1 0.9 1 1.1 1 1 0.9 1 1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1,... 1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1,... 1 3 2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1]; % settings lag = 30; threshold = 5; influence = 0; % results [signals,avg,dev] = thresholdingalgo(y,lag,threshold,influence); figure; subplot(2,1,1); hold on; x = 1:length(y); ix = lag+1:length(y); area(x(ix),avg(ix)+threshold*dev(ix),'facecolor',[0.9 0.9 0.9],'edgecolor','none'); area(x(ix),avg(ix)-threshold*dev(ix),'facecolor',[1 1 1],'edgecolor','none'); plot(x(ix),avg(ix),'linewidth',1,'color','cyan','linewidth',1.5); plot(x(ix),avg(ix)+threshold*dev(ix),'linewidth',1,'color','green','linewidth',1.5); plot(x(ix),avg(ix)-threshold*dev(ix),'linewidth',1,'color','green','linewidth',1.5); plot(1:length(y),y,'b'); subplot(2,1,2); stairs(signals,'r','linewidth',1.5); ylim([-1.5 1.5]);
r code
thresholdingalgo <- function(y,lag,threshold,influence) { signals <- rep(0,length(y)) filteredy <- y[0:lag] avgfilter <- null stdfilter <- null avgfilter[lag] <- mean(y[0:lag]) stdfilter[lag] <- sd(y[0:lag]) (i in (lag+1):length(y)){ if (abs(y[i]-avgfilter[i-1]) > threshold*stdfilter[i-1]) { if (y[i] > avgfilter[i-1]) { signals[i] <- 1; } else { signals[i] <- -1; } filteredy[i] <- influence*y[i]+(1-influence)*filteredy[i-1] } else { signals[i] <- 0 filteredy[i] <- y[i] } avgfilter[i] <- mean(filteredy[(i-lag):i]) stdfilter[i] <- sd(filteredy[(i-lag):i]) } return(list("signals"=signals,"avgfilter"=avgfilter,"stdfilter"=stdfilter)) }
example:
# data y <- c(1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9, 1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3, 2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1) lag <- 30 threshold <- 5 influence <- 0 # run algo lag = 30, threshold = 5, influence = 0 result <- thresholdingalgo(y,lag,threshold,influence) # plot result par(mfrow = c(2,1),oma = c(2,2,0,0) + 0.1,mar = c(0,0,2,1) + 0.2) plot(1:length(y),y,type="l",ylab="",xlab="") lines(1:length(y),result$avgfilter,type="l",col="cyan",lwd=2) lines(1:length(y),result$avgfilter+threshold*result$stdfilter,type="l",col="green",lwd=2) lines(1:length(y),result$avgfilter-threshold*result$stdfilter,type="l",col="green",lwd=2) plot(result$signals,type="s",col="red",ylab="",xlab="",ylim=c(-1.5,1.5),lwd=2)
this code (both languages) yield following result data of original question:
implementations in other languages:
appendix 2: matlab demonstration code (click make data)
function [] = robustthresholdingdemo() %% specifications lag = 5; % lag smoothing threshold = 3.5; % number of st.dev. away mean signal influence = 0.3; % when signal: how influence new data? (between 0 , 1) % 1 normal influence, 0.5 half %% start demo demoscreen(30,lag,threshold,influence); end function [signals,avgfilter,stdfilter] = thresholdingalgo(y,lag,threshold,influence) signals = zeros(length(y),1); filteredy = y(1:lag+1); avgfilter(lag+1,1) = mean(y(1:lag+1)); stdfilter(lag+1,1) = std(y(1:lag+1)); i=lag+2:length(y) if abs(y(i)-avgfilter(i-1)) > threshold*stdfilter(i-1) if y(i) > avgfilter(i-1) signals(i) = 1; else signals(i) = -1; end filteredy(i) = influence*y(i)+(1-influence)*filteredy(i-1); else signals(i) = 0; filteredy(i) = y(i); end avgfilter(i) = mean(filteredy(i-lag:i)); stdfilter(i) = std(filteredy(i-lag:i)); end end % demo screen function function [] = demoscreen(n,lag,threshold,influence) figure('position',[200 100,1000,500]); subplot(2,1,1); title(sprintf(['draw data points (%.0f max) [settings: lag = %.0f, '... 'threshold = %.2f, influence = %.2f]'],n,lag,threshold,influence)); ylim([0 5]); xlim([0 50]); h = gca; subplot(2,1,1); set(h, 'ylimmode', 'manual'); set(h, 'xlimmode', 'manual'); set(h, 'ylim', get(h,'ylim')); set(h, 'xlim', get(h,'xlim')); xg = []; yg = []; i=1:n try [xi,yi] = ginput(1); catch return; end xg = [xg xi]; yg = [yg yi]; if == 1 subplot(2,1,1); hold on; plot(h, xg(i),yg(i),'r.'); text(xg(i),yg(i),num2str(i),'fontsize',7); end if length(xg) > lag [signals,avg,dev] = ... thresholdingalgo(yg,lag,threshold,influence); area(xg(lag+1:end),avg(lag+1:end)+threshold*dev(lag+1:end),... 'facecolor',[0.9 0.9 0.9],'edgecolor','none'); area(xg(lag+1:end),avg(lag+1:end)-threshold*dev(lag+1:end),... 'facecolor',[1 1 1],'edgecolor','none'); plot(xg(lag+1:end),avg(lag+1:end),'linewidth',1,'color','cyan'); plot(xg(lag+1:end),avg(lag+1:end)+threshold*dev(lag+1:end),... 'linewidth',1,'color','green'); plot(xg(lag+1:end),avg(lag+1:end)-threshold*dev(lag+1:end),... 'linewidth',1,'color','green'); subplot(2,1,2); hold on; title('signal output'); stairs(xg(lag+1:end),signals(lag+1:end),'linewidth',2,'color','blue'); ylim([-2 2]); xlim([0 50]); hold off; end subplot(2,1,1); hold on; j=2:i plot(xg([j-1:j]),yg([j-1:j]),'r'); plot(h,xg(j),yg(j),'r.'); text(xg(j),yg(j),num2str(j),'fontsize',7); end end end
the code above has been written such recalculates full algorithm everytime. of course 1 change code such filteredy
, avgfilter
, stdfilter
saved, , values updated when new information arrives (this make algorithm faster). demonstration purposes, decided put code in single function.
if use function somewhere, please credit me or answer. if have questions regarding algorithm, post them in comments below or reach out me on linkedin.
Comments
Post a Comment