
TAD Benchmark Datasets

Honestree 2022. 10. 20. 19:14
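  • Written to summarize the time series datasets used in six recent papers and to compare performance across their models
  • The data and the models differ from paper to paper, so the table is for reference only
  • Paper list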
    • Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy
      • Reconstruction-based method with an added Association Discrepancy loss
    • Time Series Anomaly Detection with Multi-resolution Ensemble Decoding
      • Reconstruction-based method with an added multi-resolution technique
    • Detecting Anomalous Event Sequences with Temporal Point Processes
      • Interprets OOD detection in temporal point processes as anomaly detection (AD)
    • Neural Contextual Anomaly Detection for Time Series
      • Window-based technique
    • Multi-Scale One-Class Recurrent Neural Networks for Discrete Event Sequence Anomaly Detection
      • OC-NN on event sequences; local and global latent spaces are defined separately and OC-NN is performed on each
      • A model that finds anomalies using log messages defined in advance
    • Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding
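      • A model that interprets multivariate time series data from two perspectives and learns an embedding for each
        • Temporal variation within a single variable
        • Correlation across multiple variables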
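
Several of the papers listed above rely on windowing, and the last one views multivariate data both as change over time within one variable and as correlation across variables. The following is a minimal sketch of those three ideas in plain Python. It is not taken from any of the papers; the function names (`sliding_windows`, `temporal_deltas`, `pearson`) and the toy `cpu`/`mem` values are assumptions made purely for illustration.

```python
import math

def sliding_windows(series, size, stride=1):
    """Split a sequence into overlapping windows of fixed length."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, stride)]

def temporal_deltas(values):
    """First-order differences: how one variable changes over time."""
    return [b - a for a, b in zip(values, values[1:])]

def pearson(a, b):
    """Pearson correlation between two equally long variables."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_a * var_b)

# Toy two-variable series (made-up values, not from any dataset in this post).
cpu = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mem = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

windows = sliding_windows(cpu, size=3, stride=1)  # windowing view
deltas = temporal_deltas(cpu)                     # temporal view, single variable
corr = pearson(cpu, mem)                          # inter-variable view
```

In practice the windows would be fed to a model and the two views would be learned embeddings rather than hand-computed statistics; the sketch only shows what each view measures.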

Datasets

(1) SMD (Server Machine Dataset, Su et al. (2019))

(2) PSM (Pooled Server Metrics, Abdulaal et al. (2021))

(3) MSL (Mars Science Laboratory rover) and SMAP (Soil Moisture Active Passive satellite)

  • Public datasets from NASA (Hundman et al., 2018) with 55 and 25 dimensions, respectively.
  • Contain telemetry anomaly data derived from the Incident Surprise Anomaly (ISA) reports of spacecraft monitoring systems.
  • Excluded for now because the data is difficult to obtain

(4) SWaT (Secure Water Treatment, Mathur & Tippenhauer (2016))

(5) NeurIPS-TS (NeurIPS 2021 Time Series Benchmark)

  • Proposed by Lai et al. (2021). Includes five time series anomaly scenarios categorized by a behavior-driven taxonomy: point-global, pattern-contextual, pattern-shapelet, pattern-seasonal, and pattern-trend.
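
To make two of these categories concrete, here is a minimal sketch that injects a point-global anomaly and a pattern-seasonal anomaly into a clean sinusoid. This is not the benchmark's actual generator; the function names, the sinusoidal base signal, and every parameter value (`period`, `magnitude`, `factor`, the indices) are assumptions chosen only for illustration.

```python
import math

def sine_series(length, period=20.0):
    """Baseline signal: a clean sinusoid with values in [-1, 1]."""
    return [math.sin(2 * math.pi * t / period) for t in range(length)]

def inject_point_global(series, index, magnitude=5.0):
    """Point-global: one value far outside the range of the whole series."""
    out = list(series)
    out[index] = magnitude
    return out

def inject_pattern_seasonal(series, start, end, period=20.0, factor=4.0):
    """Pattern-seasonal: a segment whose frequency differs from the rest."""
    out = list(series)
    for t in range(start, end):
        out[t] = math.sin(2 * math.pi * factor * t / period)
    return out

base = sine_series(100)
with_spike = inject_point_global(base, index=50)
with_fast_segment = inject_pattern_seasonal(base, start=60, end=80)

# Ground-truth labels for the seasonal case: 1 inside the altered segment.
labels = [1 if 60 <= t < 80 else 0 for t in range(100)]
```

The spike is detectable from its value alone, whereas every point in the altered segment stays within the normal range and only the pattern over time is unusual. That distinction is what separates the point categories from the pattern categories.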

(6) UCR Dataset

  • From the Multi-dataset Time Series Anomaly Detection Competition at KDD 2021.

(7) ECG

(8) 2D-gesture

  • Contains time series of the X-Y coordinates of an actor’s right hand. The data is extracted from a video in which the actor grabs a gun from his hip-mounted holster, moves it to the target, and returns it to the holster. The anomalous region is where the actor fails to return the gun to the holster.
  • https://www.cs.ucr.edu/~eamonn/discords/

(9) Power-demand

(10) Yahoo’s S5 Webscope

(11) RUBiS

  • A weblog dataset generated by an auction site prototype modeled after eBay.com
  • User web behavior including user_id, date, request_inf

(12) Hadoop Distributed File System (HDFS)

  • Generated by a Hadoop-based map-reduce cloud environment using benchmark workloads
  • 11,175,629 log messages, of which 2.9% are labeled as anomalies by Hadoop experts
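
Taking the two figures above at face value, a short calculation shows how imbalanced the labels are and why plain accuracy is a poor metric here. Since 2.9% is a rounded figure, the resulting anomaly count is only approximate. The variable names are hypothetical.

```python
total_messages = 11_175_629   # from the description above
anomaly_fraction = 0.029      # 2.9%, a rounded figure

# Approximate number of anomalous messages implied by the rounded percentage.
approx_anomalies = round(total_messages * anomaly_fraction)
approx_normal = total_messages - approx_anomalies

# A trivial detector that labels everything "normal" never finds an anomaly,
# yet its accuracy equals the share of normal messages.
trivial_accuracy = approx_normal / total_messages
trivial_recall = 0 / approx_anomalies

print(f"approx. anomalous messages: {approx_anomalies:,}")
print(f"trivial accuracy: {trivial_accuracy:.3f}, recall: {trivial_recall:.1f}")
```

A detector that finds nothing still reaches roughly 97% accuracy under these numbers, which is why comparisons on such data usually lean on precision, recall, and F1.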

(13) BlueGene/L (BGL)

  • 4,747,936 log messages generated by a BlueGene/L supercomputer system with 131,072 processors and 32,768 GB memory at Lawrence Livermore National Labs.

(14) WADI

  • Water Distribution

(15) ASD