[SED] DCASE 2023 Task 4 Baseline test2

Because the data could not be downloaded the usual way, I obtained it separately and organized it myself.

Checking what kind of data this is: audio with a 44k sampling frequency
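One way to confirm the sampling rate is to read the WAV header directly. The sketch below uses only Python's standard `wave` module and assumes the clips are uncompressed PCM WAV files. The function name `wav_info` is made up for illustration.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()          # samples per second
        channels = wf.getnchannels()      # 1 = mono, 2 = stereo
        duration = wf.getnframes() / rate # frames divided by rate gives seconds
    return rate, channels, duration
```

Calling `wav_info` on any clip from the dataset should report the rate stored in the file header, which is a quick sanity check before resampling to 16k.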

The number of files I received is listed further below.

Assuming DCASE was downloaded under /home,
go to the directory where the zip files are saved and run the following:

1. unzip -o weak.zip -d ~/DESED_task/data/dcase/dataset/audio/train

2. for file in unlabel*.zip; do unzip -o -j "$file" -d ~/DESED_task/data/dcase/dataset/audio/train/unlabel_in_domain; done

3. unzip -o -j validation.zip -d ~/DESED_task/data/dcase/dataset/audio/validation/validation

4. unzip -o -j strong_real.zip -d ~/DESED_task/data/dcase/dataset/audio/train/strong_label_real
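If `unzip` is not available, roughly the same thing can be done from Python. This is a minimal sketch of what `unzip -o -j` does (overwrite existing files, drop the folder structure inside the archive), using the standard `zipfile` module. The helper name `unzip_flat` is hypothetical, and files with the same name in different archive folders would overwrite each other, just as with `-j`.

```python
import shutil
import zipfile
from pathlib import Path


def unzip_flat(zip_path, dest_dir):
    """Extract every file in zip_path directly into dest_dir, ignoring folders.

    Existing files are overwritten. Returns the number of files extracted.
    """
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    with zipfile.ZipFile(Path(zip_path).expanduser()) as zf:
        for member in zf.infolist():
            if member.is_dir():
                continue  # skip directory entries, only files are written
            target = dest / Path(member.filename).name  # keep the base name only
            with zf.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            count += 1
    return count
```

For example, `unzip_flat("validation.zip", "~/DESED_task/data/dcase/dataset/audio/validation/validation")` would correspond to step 3 above.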

for dir in strong_label_real synthetic21_train unlabel_in_domain_16k weak_16k strong_label_real_16k unlabel_in_domain weak; do
  echo "$dir: $(ls -l $dir/*.wav 2>/dev/null | wc -l) files"
done

strong_label_real: 3410 files
synthetic21_train: 0 files
-- soundscapes: 10000 files
-- soundscapes_16k: 10000 files
unlabel_in_domain_16k: 12069 files
weak_16k: 1578 files
strong_label_real_16k: 3224 files
unlabel_in_domain: 14412 files
weak: 1578 files

validation : 1168 files
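The same count can be made from Python, which is handy when the check should run before training. A minimal sketch with `pathlib`; the function name `count_wavs` is made up for illustration, and missing directories are reported as 0 rather than raising an error.

```python
from pathlib import Path


def count_wavs(root, subdirs):
    """Count *.wav files directly inside each subdirectory of root."""
    root = Path(root).expanduser()
    counts = {}
    for name in subdirs:
        folder = root / name
        if folder.is_dir():
            counts[name] = sum(1 for _ in folder.glob("*.wav"))
        else:
            counts[name] = 0  # directory does not exist
    return counts
```

For example, `count_wavs("~/DESED_task/data/dcase/dataset/audio/train", ["weak", "unlabel_in_domain", "strong_label_real"])` returns a dictionary of counts that can be compared against the numbers above.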

My dataset

Synthetic training set with strong annotations - 10000 clips

Weak labeled training set - 1578 clips (2244 class occurrences)

Unlabeled in domain training set - 14412 clips

Strong labeled training set - 3470 clips -> 3410 clips

Train

python train_sed.py

Scenario 1 (reaction time is critical)

The system needs to react fast upon an event detection (e.g. to trigger an alarm, adapt home automation system...). The localization of the sound event is then really important. The PSDS parameters reflecting these needs are:

Detection Tolerance criterion (DTC): 0.7
Ground Truth intersection criterion (GTC): 0.7
Cost of instability across class (α_ST): 1
Cost of CTs on user experience (α_CT): 0
Maximum False Positive rate (e_max): 100

Scenario 2 (accurate classification)

The system must avoid confusion between classes, but the reaction time is less crucial than in the first scenario. The PSDS parameters reflecting these needs are:

Detection Tolerance criterion (DTC): 0.1
Ground Truth intersection criterion (GTC): 0.1
Cost of instability across class (α_ST): 1
Cross-Trigger Tolerance criterion (cttc): 0.3
Cost of CTs on user experience (α_CT): 0.5
Maximum False Positive rate (e_max): 100
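To keep the two parameter sets side by side, they can be written down as plain configuration objects. This is only a sketch for bookkeeping: the class name `PSDSScenario` and the field names are my own, and passing these values into an actual PSDS implementation is not shown here. Scenario 1 lists no cttc value, so it is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PSDSScenario:
    dtc: float                    # Detection Tolerance criterion
    gtc: float                    # Ground Truth intersection criterion
    alpha_st: float               # cost of instability across class
    alpha_ct: float               # cost of cross-triggers on user experience
    e_max: float                  # maximum false positive rate
    cttc: Optional[float] = None  # Cross-Trigger Tolerance criterion, if given


# Values copied from the two scenario descriptions above.
SCENARIO_1 = PSDSScenario(dtc=0.7, gtc=0.7, alpha_st=1.0, alpha_ct=0.0, e_max=100.0)
SCENARIO_2 = PSDSScenario(dtc=0.1, gtc=0.1, alpha_st=1.0, alpha_ct=0.5, e_max=100.0, cttc=0.3)
```

The stricter DTC/GTC of 0.7 in scenario 1 rewards precise time localization, while the nonzero α_CT in scenario 2 penalizes confusing one class for another.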

- student/scenario1

- student/scenario2

- teacher/scenario1

- teacher/scenario2

# Mean-teacher model

- semi-supervised learning
- SED in domestic environment
- useful when dealing with limited labeled data and a larger amount of unlabeled data
- strong and weak labels for training
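The core idea of a mean teacher is that the teacher's weights are an exponential moving average (EMA) of the student's weights, so the teacher can give stable targets for the unlabeled clips. Below is a minimal sketch of that update with plain Python dictionaries standing in for model parameters. The function name `ema_update`, the default `alpha=0.999`, and the warm-up rule are assumptions for illustration, not values taken from this post; a real training script would apply the same update to framework tensors.

```python
def ema_update(teacher, student, alpha=0.999, step=0):
    """Move teacher weights toward student weights with an EMA.

    teacher, student: dicts mapping parameter name -> float.
    step: global training step, used to warm up alpha (assumed rule).
    """
    # Early in training use a smaller alpha so the teacher follows the student quickly.
    alpha = min(1.0 - 1.0 / (step + 1), alpha)
    for name, s_val in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * s_val
    return teacher
```

With this update only the student is trained by gradient descent; the teacher is never back-propagated through, it just trails the student.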
