algorithm - Peak signal detection in realtime timeseries data -



update: best performing algorithm so far is one.


this question explores robust algorithms detecting sudden peaks in real-time timeseries data.

consider following dataset:

p = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1 1 1 1.1 0.9 1 1.1 1 1 0.9 1, ...      1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1 1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1 1 3, ...       2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1]; 

(matlab format it's not language algorithm)

plot of data

you can see there 3 large peaks , small peaks. dataset specific example of class of timeseries datasets question about. class of datasets has 2 general features:

  1. there basic noise general mean
  2. there large 'peaks' or 'higher data points' deviate noise.

let's assume following:

  • the width of peaks cannot determined beforehand
  • the height of peaks deviates other values
  • the used algorithm must calculate realtime (so change each new datapoint)

for such situation, boundary value needs constructed triggers signals. however, boundary value cannot static , must determined realtime based on algorithm.


my question: algorithm calculate such thresholds in realtime? there specific algorithms such situations? well-known algorithms?


robust algorithms or useful insights highly appreciated. (can answer in language: it's algorithm)

smoothed z-score algo (very robust thresholding algorithm)

i have constructed algorithm works these types of datasets. based on principle of dispersion: if new datapoint given x number of standard deviations away moving mean, algorithm signals (also called z-score). algorithm robust because constructs separate moving mean , deviation, such signals not corrupt threshold. future signals therefore identified approximately same accuracy, regardless of amount of previous signals. algorithm takes 3 inputs: lag = lag of moving window, threshold = z-score @ algorithm signals , influence = influence (between 0 , 1) of new signals on mean , standard deviation. example, lag of 5 use last 5 observations smooth data. threshold of 3.5 signal if datapoint 3.5 standard deviations away moving mean. , influence of 0.5 gives signals half of influence normal datapoints have. likewise, influence of 0 ignores signals recalculating new threshold: influence of 0 therefore robust option; 1 least.

it works follows:

pseudocode

# let y vector of timeseries data of @ least length lag+2 # let mean() function calculates mean # let std() function calculates standard deviaton # let absolute() absolute value function  # settings (the ones below examples: choose best data) set lag 5;          # lag 5 smoothing functions set threshold 3.5;  # 3.5 standard deviations signal set influence 0.5;  # between 0 , 1, 1 normal influence, 0.5 half  # initialise variables set signals vector 0,...,0 of length of y;   # initialise signal results set filteredy y(1),...,y(lag)                # initialise filtered series set avgfilter null;                          # initialise average filter set stdfilter null;                          # initialise std. filter set avgfilter(lag) mean(y(1),...,y(lag));    # initialise first value set stdfilter(lag) std(y(1),...,y(lag));     # initialise first value  i=lag+1,...,t   if absolute(y(i) - avgfilter(i-1)) > threshold*stdfilter(i-1)     if y(i) > avgfilter(i-1)       set signals(i) +1;                     # positive signal     else       set signals(i) -1;                     # negative signal     end     # make influence lower     set filteredy(i) influence*y(i) + (1-influence)*filteredy(i-1);   else     set signals(i) 0;                        # no signal     set filteredy(i) y(i);   end   # adjust filters   set avgfilter(i) mean(filteredy(i-lag),...,filteredy(i));   set stdfilter(i) std(filteredy(i-lag),...,filteredy(i)); end 

demo

demonstration of robust thresholding algorithm

the matlab code demo can found @ end of answer. use demo, run , create time series clicking on upper chart. algorithm starts working after drawing lag number of observations.


appendix 1: matlab , r code algorithm

matlab code

function [signals,avgfilter,stdfilter] = thresholdingalgo(y,lag,threshold,influence) % initialise signal results signals = zeros(length(y),1); % initialise filtered series filteredy = y(1:lag+1); % initialise filters avgfilter(lag+1,1) = mean(y(1:lag+1)); stdfilter(lag+1,1) = std(y(1:lag+1)); % loop on datapoints y(lag+2),...,y(t) i=lag+2:length(y)     % if new value specified number of deviations away     if abs(y(i)-avgfilter(i-1)) > threshold*stdfilter(i-1)         if y(i) > avgfilter(i-1)             % positive signal             signals(i) = 1;         else             % negative signal             signals(i) = -1;         end         % make influence lower         filteredy(i) = influence*y(i)+(1-influence)*filteredy(i-1);     else         % no signal         signals(i) = 0;         filteredy(i) = y(i);     end     % adjust filters     avgfilter(i) = mean(filteredy(i-lag:i));     stdfilter(i) = std(filteredy(i-lag:i)); end % done, return results end 

example:

% data y = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1,...     1 1 1.1 0.9 1 1.1 1 1 0.9 1 1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1,...     1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1,...     1 3 2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];  % settings lag = 30; threshold = 5; influence = 0;  % results [signals,avg,dev] = thresholdingalgo(y,lag,threshold,influence);  figure; subplot(2,1,1); hold on; x = 1:length(y); ix = lag+1:length(y); area(x(ix),avg(ix)+threshold*dev(ix),'facecolor',[0.9 0.9 0.9],'edgecolor','none'); area(x(ix),avg(ix)-threshold*dev(ix),'facecolor',[1 1 1],'edgecolor','none'); plot(x(ix),avg(ix),'linewidth',1,'color','cyan','linewidth',1.5); plot(x(ix),avg(ix)+threshold*dev(ix),'linewidth',1,'color','green','linewidth',1.5); plot(x(ix),avg(ix)-threshold*dev(ix),'linewidth',1,'color','green','linewidth',1.5); plot(1:length(y),y,'b'); subplot(2,1,2); stairs(signals,'r','linewidth',1.5); ylim([-1.5 1.5]); 

r code

thresholdingalgo <- function(y,lag,threshold,influence) {   signals <- rep(0,length(y))   filteredy <- y[0:lag]   avgfilter <- null   stdfilter <- null   avgfilter[lag] <- mean(y[0:lag])   stdfilter[lag] <- sd(y[0:lag])   (i in (lag+1):length(y)){     if (abs(y[i]-avgfilter[i-1]) > threshold*stdfilter[i-1]) {       if (y[i] > avgfilter[i-1]) {         signals[i] <- 1;       } else {         signals[i] <- -1;       }       filteredy[i] <- influence*y[i]+(1-influence)*filteredy[i-1]     } else {       signals[i] <- 0       filteredy[i] <- y[i]     }     avgfilter[i] <- mean(filteredy[(i-lag):i])     stdfilter[i] <- sd(filteredy[(i-lag):i])   }   return(list("signals"=signals,"avgfilter"=avgfilter,"stdfilter"=stdfilter)) } 

example:

# data y <- c(1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9,        1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,        2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1)  lag       <- 30 threshold <- 5 influence <- 0  # run algo lag = 30, threshold = 5, influence = 0 result <- thresholdingalgo(y,lag,threshold,influence)  # plot result par(mfrow = c(2,1),oma = c(2,2,0,0) + 0.1,mar = c(0,0,2,1) + 0.2) plot(1:length(y),y,type="l",ylab="",xlab="")  lines(1:length(y),result$avgfilter,type="l",col="cyan",lwd=2) lines(1:length(y),result$avgfilter+threshold*result$stdfilter,type="l",col="green",lwd=2) lines(1:length(y),result$avgfilter-threshold*result$stdfilter,type="l",col="green",lwd=2) plot(result$signals,type="s",col="red",ylab="",xlab="",ylim=c(-1.5,1.5),lwd=2) 

this code (both languages) yield following result data of original question:

thresholding example matlab code


implementations in other languages:


appendix 2: matlab demonstration code (click make data)

function [] = robustthresholdingdemo()  %% specifications lag         = 5;       % lag smoothing threshold   = 3.5;     % number of st.dev. away mean signal influence   = 0.3;     % when signal: how influence new data? (between 0 , 1)                        % 1 normal influence, 0.5 half       %% start demo demoscreen(30,lag,threshold,influence);  end  function [signals,avgfilter,stdfilter] = thresholdingalgo(y,lag,threshold,influence) signals = zeros(length(y),1); filteredy = y(1:lag+1); avgfilter(lag+1,1) = mean(y(1:lag+1)); stdfilter(lag+1,1) = std(y(1:lag+1)); i=lag+2:length(y)     if abs(y(i)-avgfilter(i-1)) > threshold*stdfilter(i-1)         if y(i) > avgfilter(i-1)             signals(i) = 1;         else             signals(i) = -1;         end         filteredy(i) = influence*y(i)+(1-influence)*filteredy(i-1);     else         signals(i) = 0;         filteredy(i) = y(i);     end     avgfilter(i) = mean(filteredy(i-lag:i));     stdfilter(i) = std(filteredy(i-lag:i)); end end  % demo screen function function [] = demoscreen(n,lag,threshold,influence) figure('position',[200 100,1000,500]); subplot(2,1,1); title(sprintf(['draw data points (%.0f max)      [settings: lag = %.0f, '...     'threshold = %.2f, influence = %.2f]'],n,lag,threshold,influence)); ylim([0 5]); xlim([0 50]); h = gca; subplot(2,1,1); set(h, 'ylimmode', 'manual'); set(h, 'xlimmode', 'manual'); set(h, 'ylim', get(h,'ylim')); set(h, 'xlim', get(h,'xlim')); xg = []; yg = []; i=1:n     try         [xi,yi] = ginput(1);     catch         return;     end     xg = [xg xi]; yg = [yg yi];     if == 1         subplot(2,1,1); hold on;         plot(h, xg(i),yg(i),'r.');          text(xg(i),yg(i),num2str(i),'fontsize',7);     end     if length(xg) > lag         [signals,avg,dev] = ...             thresholdingalgo(yg,lag,threshold,influence);         area(xg(lag+1:end),avg(lag+1:end)+threshold*dev(lag+1:end),...             'facecolor',[0.9 0.9 0.9],'edgecolor','none');         area(xg(lag+1:end),avg(lag+1:end)-threshold*dev(lag+1:end),...             'facecolor',[1 1 1],'edgecolor','none');         plot(xg(lag+1:end),avg(lag+1:end),'linewidth',1,'color','cyan');         plot(xg(lag+1:end),avg(lag+1:end)+threshold*dev(lag+1:end),...             'linewidth',1,'color','green');         plot(xg(lag+1:end),avg(lag+1:end)-threshold*dev(lag+1:end),...             'linewidth',1,'color','green');         subplot(2,1,2); hold on; title('signal output');         stairs(xg(lag+1:end),signals(lag+1:end),'linewidth',2,'color','blue');         ylim([-2 2]); xlim([0 50]); hold off;     end     subplot(2,1,1); hold on;     j=2:i         plot(xg([j-1:j]),yg([j-1:j]),'r'); plot(h,xg(j),yg(j),'r.');         text(xg(j),yg(j),num2str(j),'fontsize',7);     end end end 

the code above has been written such recalculates full algorithm everytime. of course 1 change code such filteredy, avgfilter , stdfilter saved, , values updated when new information arrives (this make algorithm faster). demonstration purposes, decided put code in single function.

if use function somewhere, please credit me or answer. if have questions regarding algorithm, post them in comments below or reach out me on linkedin.


Comments

Popular posts from this blog

java - WrongTypeOfReturnValue exception thrown when unit testing using mockito -

php - Magento - Deleted Base url key -

android - How to disable Button if EditText is empty ? -