TAD Benchmark Datasets
Honestree
2022. 10. 20. 19:14
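- Written to summarize the time-series datasets used in six recent papers and to compare performance across models
- Because each model uses data with different characteristics and the models themselves differ, the table is for reference only
- Paper list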
- Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy
- Reconstruction-based method with an added Association Discrepancy loss
- Time Series Anomaly Detection with Multi-resolution Ensemble Decoding
- Reconstruction-based method with an added multi-resolution technique
- Detecting Anomalous Event Sequences with Temporal Point Processes
- Interprets OOD detection for temporal point processes as anomaly detection (AD)
- Neural Contextual Anomaly Detection for Time Series
- Window-based technique
- Multi-Scale One-Class Recurrent Neural Networks for Discrete Event Sequence Anomaly Detection
- Applies a one-class neural network (OC-NN) to event sequences, with separate local and global latent spaces
- Detects anomalies using log messages defined in advance
- Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding
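- A model that interprets multivariate time series from two perspectives and learns an embedding for each
- Perspective 1: temporal changes within a single variable
- Perspective 2: correlations among multiple variables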
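To make the Anomaly Transformer idea more concrete, the association discrepancy and the resulting anomaly score can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' code: there is no Transformer here, the per-layer series associations (row-normalized attention maps) are taken as inputs, and the Gaussian scale `sigma` is fixed instead of learned. The discrepancy is the symmetric KL divergence between the Gaussian prior and the series association, averaged over layers, and the score weights each point's squared reconstruction error by Softmax(-AssDis) over time, following the paper's criterion as I understand it. All function names are hypothetical.

```python
import numpy as np


def prior_association(n: int, sigma: np.ndarray) -> np.ndarray:
    """Gaussian prior over the relative distance |i - j|, one scale per row."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :]).astype(float)  # (n, n)
    s = np.asarray(sigma, dtype=float)[:, None]                # (n, 1)
    kernel = np.exp(-dist**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
    return kernel / kernel.sum(axis=1, keepdims=True)          # rows sum to 1


def kl_rows(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Row-wise KL(p || q) for row-stochastic matrices."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return np.sum(p * np.log(p / q), axis=-1)


def association_discrepancy(priors, series) -> np.ndarray:
    """Symmetric KL between prior and series association, averaged over layers."""
    per_layer = [kl_rows(p, s) + kl_rows(s, p) for p, s in zip(priors, series)]
    return np.mean(per_layer, axis=0)                           # shape (n,)


def anomaly_score(ass_dis: np.ndarray, x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Softmax(-AssDis) over time, times the per-point squared reconstruction error."""
    logits = -ass_dis
    w = np.exp(logits - logits.max())                           # numerically stable softmax
    w /= w.sum()
    return w * np.sum((x - x_hat) ** 2, axis=-1)


if __name__ == "__main__":
    n = 8
    prior = prior_association(n, np.ones(n))
    uniform = np.full((n, n), 1.0 / n)                          # a spread-out series association
    d = association_discrepancy([prior, prior], [uniform, uniform])
    x = np.zeros((n, 3))
    x_hat = np.zeros((n, 3))
    x_hat[3] = 1.0                                              # one badly reconstructed point
    print(anomaly_score(d, x, x_hat).round(3))
```

Points whose series association stays close to the Gaussian prior (small discrepancy) receive larger weights, which matches the paper's intuition that anomalous points attend mostly to their adjacent neighbors.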
Datasets
(1) SMD (Server Machine Dataset, Su et al. (2019))
- A 5-week-long dataset collected from a large Internet company, with 38 dimensions.
- https://github.com/NetManAIOps/OmniAnomaly/tree/master/ServerMachineDataset
- https://gudwns1215.medium.com/how-to-time-series-anomaly-detection-using-deep-learning-edfc2d1dbbc8
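A minimal loader sketch for SMD-style files, assuming each data file in the repository above stores one time step per line as 38 comma-separated floats with no header, and each label file stores one 0/1 value per line. These formats are assumptions to verify against the repository, and the function names are hypothetical.

```python
import io
import numpy as np

N_DIMS = 38  # SMD dimensionality, as stated above


def load_smd_matrix(f) -> np.ndarray:
    """Read one SMD-style data file: one time step per line, comma-separated floats.

    Assumption: no header and exactly N_DIMS columns per line.
    """
    data = np.loadtxt(f, delimiter=",", dtype=np.float32, ndmin=2)
    if data.shape[1] != N_DIMS:
        raise ValueError(f"expected {N_DIMS} columns, got {data.shape[1]}")
    return data


def load_smd_labels(f) -> np.ndarray:
    """Read one label file: one 0/1 label per line (assumed format)."""
    return np.loadtxt(f, dtype=np.int64, ndmin=1)


if __name__ == "__main__":
    # In practice pass a path or an open file; an in-memory file keeps this self-contained.
    row = ",".join(["0.1"] * N_DIMS)
    x = load_smd_matrix(io.StringIO(f"{row}\n{row}\n"))
    y = load_smd_labels(io.StringIO("0\n1\n"))
    print(x.shape, y.tolist())
```

The explicit column check fails fast if a file does not match the stated 38 dimensions, which is cheaper than discovering a shape mismatch inside a model.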
(2) PSM (Pooled Server Metrics, Abdulaal et al. (2021))
- Collected from multiple application server nodes at eBay, with 26 dimensions.
- https://github.com/eBay/RANSynCoders/tree/main/data
(3) MSL (Mars Science Laboratory rover) and SMAP (Soil Moisture Active Passive satellite)
- Public datasets from NASA (Hundman et al., 2018) with 55 and 25 dimensions, respectively.
- They contain telemetry anomaly data derived from the Incident Surprise Anomaly (ISA) reports of spacecraft monitoring systems.
- Excluded for now because the data was difficult to obtain.
(4) SWaT (Secure Water Treatment, Mathur & Tippenhauer (2016))
- Obtained from 51 sensors of a critical infrastructure system under continuous operation.
- https://itrust.sutd.edu.sg/itrust-labs_datasets/
(5) NeurIPS-TS (NeurIPS 2021 Time Series Benchmark)
- Proposed by Lai et al. (2021); includes five time-series anomaly scenarios categorized by a behavior-driven taxonomy: point-global, point-contextual, pattern-shapelet, pattern-seasonal, and pattern-trend.
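To make the five behavior-driven categories concrete, here is a toy injector on a noisy sine wave. These are simplified versions written for illustration only, not the generator released by Lai et al.; the amplitudes, the default segment length, and the `base_series`/`inject` helpers are all hypothetical. The contextual point type is a single value that lies inside the global range but is wrong for its local phase.

```python
import numpy as np

KINDS = ("point-global", "point-contextual", "pattern-shapelet",
         "pattern-seasonal", "pattern-trend")


def base_series(n: int = 1000, period: int = 50, seed: int = 0) -> np.ndarray:
    """Noisy sine wave used as the 'normal' signal."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    return np.sin(2 * np.pi * t / period) + 0.05 * rng.standard_normal(n)


def inject(x: np.ndarray, kind: str, start: int, length: int = 50, period: int = 50):
    """Return (series, labels) with one simplified anomaly of the given kind."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    x = x.copy()
    y = np.zeros(len(x), dtype=int)
    if kind == "point-global":
        x[start] = x.mean() + 6 * x.std()        # far outside the global value range
        y[start] = 1
        return x, y
    if kind == "point-contextual":
        x[start] = -x[start]                     # in range globally; pick start near a peak
        y[start] = 1
        return x, y
    seg = np.arange(start, start + length)
    if kind == "pattern-shapelet":
        x[seg] = np.sign(np.sin(2 * np.pi * seg / period))  # square wave, same period
    elif kind == "pattern-seasonal":
        x[seg] = np.sin(4 * np.pi * seg / period)           # doubled frequency
    else:  # pattern-trend
        x[seg] += np.linspace(0.0, 2.0, length)             # ramp added on top of the signal
    y[seg] = 1
    return x, y
```

For example, `inject(base_series(), "pattern-seasonal", 600)` returns a copy whose points 600 to 649 oscillate twice as fast, together with a 0/1 label array marking exactly that segment.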
(6) UCR Dataset
- Released for the Multi-dataset Time Series Anomaly Detection Competition at KDD 2021.
(7) ECG
- A collection of 6 datasets for detecting anomalous beats in electrocardiogram readings.
- https://www.cs.ucr.edu/~eamonn/discords/
(8) 2D-gesture
- Time series of the X-Y coordinates of an actor's right hand. The data is extracted from a video in which the actor grabs a gun from a hip-mounted holster, moves it to the target, and returns it to the holster. The anomalous region is where the actor fails to return his gun to the holster.
- https://www.cs.ucr.edu/~eamonn/discords/
(9) Power-demand
- This contains one year of power consumption records measured by a Dutch research facility in 1997.
- https://www.cs.ucr.edu/~eamonn/discords/
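ECG, 2D-gesture and Power-demand all come from the discords page above, where the anomaly of interest is a time series discord: the subsequence whose distance to its nearest non-overlapping neighbor is the largest. A brute-force sketch follows, assuming z-normalized Euclidean distance and a user-chosen window length `m`; it costs O(n²m) and is only meant for short series. The function names are hypothetical.

```python
import numpy as np


def znorm(a: np.ndarray) -> np.ndarray:
    """Z-normalize one subsequence (constant windows become all zeros)."""
    s = a.std()
    return (a - a.mean()) / s if s > 0 else a - a.mean()


def brute_force_discord(x: np.ndarray, m: int):
    """Return (start index, nearest-neighbour distance) of the top discord of length m."""
    n = len(x) - m + 1
    subs = np.array([znorm(x[i:i + m]) for i in range(n)])
    best_idx, best_dist = -1, -np.inf
    for i in range(n):
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        d[max(0, i - m + 1):min(n, i + m)] = np.inf   # skip trivial (overlapping) matches
        nn = d.min()
        if np.isfinite(nn) and nn > best_dist:
            best_idx, best_dist = i, nn
    return best_idx, best_dist


if __name__ == "__main__":
    t = np.arange(800)
    x = np.sin(2 * np.pi * t / 40)
    x[400:420] = 0.0                                  # a flat patch breaks the rhythm
    print(brute_force_discord(x, m=40))
```

Excluding overlapping windows matters: without it, every subsequence would match its own one-step shift almost perfectly and no discord would stand out.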
(10) Yahoo’s S5 Webscope
- Contains records of real production traffic to the Yahoo website
- https://webscope.sandbox.yahoo.com/catalog.php?datatype=s
(11) RUBiS
- Web log dataset generated by an auction-site prototype modeled after eBay.com
- User web behavior including user_id, date, request_inf
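Event-sequence models, such as the multi-scale one-class RNN in the paper list, consume one ordered sequence per entity, so a web log like RUBiS is typically grouped by user first. A minimal sketch, assuming each record is a dict with the `user_id`, `date` and `request_inf` fields listed above and that dates are ISO-8601 strings; the date format and the `to_user_sequences` helper are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def to_user_sequences(records):
    """Group log records into per-user request sequences ordered by date.

    records: iterable of dicts with 'user_id', 'date' (ISO-8601, assumed) and 'request_inf'.
    """
    by_user = defaultdict(list)
    for r in records:
        ts = datetime.fromisoformat(r["date"])
        by_user[r["user_id"]].append((ts, r["request_inf"]))
    # sort each user's events by timestamp only; ties keep their original order
    return {u: [req for _, req in sorted(events, key=lambda e: e[0])]
            for u, events in by_user.items()}
```

Each resulting list (for example `["ViewItem", "BuyNow"]` for one user, with made-up request names) can then be mapped to integer event IDs before being fed to a sequence model.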
(12) Hadoop Distributed File System (HDFS)
- Generated by a Hadoop-based map-reduce cloud environment using benchmark workloads
- 11,175,629 log messages; about 2.9% of the log sessions (grouped by block ID) are labeled as anomalies by Hadoop experts
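HDFS log lines are commonly grouped into sessions by the block ID they mention (tokens of the form `blk_<number>`, where the number may be negative), and anomaly labels are usually attached to these block-level sessions. A minimal sketch using Python's `re` module; the `sessions_by_block` helper is hypothetical, and real lines carry more fields than the made-up examples below.

```python
import re
from collections import defaultdict

BLOCK_RE = re.compile(r"blk_-?\d+")  # block IDs such as blk_123 or blk_-123


def sessions_by_block(lines):
    """Map each block ID to the ordered list of log lines that mention it."""
    sessions = defaultdict(list)
    for line in lines:
        # a line is added once per distinct block it mentions
        for blk in set(BLOCK_RE.findall(line)):
            sessions[blk].append(line)
    return dict(sessions)
```

For example, given the made-up lines `"recv blk_-123 from a"`, `"alloc blk_456"` and `"done blk_-123"`, the session for `blk_-123` holds the first and third lines in their original order.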
(13) BlueGene/L (BGL)
- 4,747,936 log messages generated by a BlueGene/L supercomputer system with 131,072 processors and 32,768 GB memory at Lawrence Livermore National Labs.
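In the commonly distributed copy of the BGL log (an assumption: the version shared through the Loghub collection), each line starts with an alert tag, and `-` marks a non-alert line. Under that assumption, per-line labels and the overall anomaly ratio can be computed as below; the helper names are hypothetical, and the format should be checked against the copy actually downloaded.

```python
def bgl_line_label(line: str) -> int:
    """1 if the first token is an alert tag, 0 if it is '-' (non-alert), per the assumed format."""
    first = line.split(maxsplit=1)[0] if line.strip() else "-"
    return 0 if first == "-" else 1


def anomaly_ratio(lines) -> float:
    """Fraction of non-empty lines carrying an alert tag."""
    labels = [bgl_line_label(l) for l in lines if l.strip()]
    return sum(labels) / len(labels) if labels else 0.0
```

Unlike HDFS, this yields a label per log line; sliding or fixed time windows over these lines are then a common way to form sequences.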
(14) WADI
- Water Distribution
(15) ASD
- Application Server Dataset
- https://github.com/zhhlee/InterFusion/tree/main/data