
Commit

added AMR demo because AMR was accepted at ICASSP. Congrats!
awkrail committed Dec 25, 2024
1 parent ba9da72 commit e6750b3
Showing 4 changed files with 19 additions and 3 deletions.
19 changes: 17 additions & 2 deletions README.md
@@ -11,6 +11,7 @@ It supports seven models, four features (video and audio features), and six data
Furthermore, Lighthouse supports [audio moment retrieval](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/), a task to identify relevant moments from an audio input based on a given text query.

## News
- [2024/12/24] Our work ["Language-based audio moment retrieval"](https://arxiv.org/abs/2409.15672) has been accepted at ICASSP 2025.
- [2024/10/22] [Version 1.0](https://github.com/line/lighthouse/releases/tag/v1.0) has been released.
- [2024/10/6] Our paper has been accepted at the EMNLP 2024 system demonstration track.
- [2024/09/25] Our work ["Language-based audio moment retrieval"](https://arxiv.org/abs/2409.15672) has been released. Lighthouse supports AMR.
@@ -76,9 +77,13 @@ Run `python api_example/amr_demo.py` to reproduce the AMR results.
For CPU users, set `feature_name='clip'` because CLIP+Slowfast or CLIP+Slowfast+PANNs features are very slow without GPUs.
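A minimal sketch of such a CPU-only setup is shown below. The predictor class, checkpoint path, and constructor arguments are assumptions modeled on the `api_example` scripts rather than a verified API; see `api_example/demo.py` for the exact interface.

```python
# Hypothetical CPU-only setup. The class name, checkpoint path, and keyword
# arguments below are assumptions; see api_example/demo.py for the real usage.
from lighthouse.models import CGDETRPredictor  # assumed import path

model = CGDETRPredictor(
    "results/clip_cg_detr/qvhighlights/best.ckpt",  # assumed checkpoint path
    device="cpu",
    feature_name="clip",  # CLIP-only features are the practical choice on CPU
)
```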

## Gradio demo
Run `python gradio_demo/demo.py`. Upload a video, enter a text query, and click the blue button. For the AMR demo, run `python gradio_demo/amr_demo.py`.

![Gradio demo image](images/demo_improved.png)
MR-HD demo
![Gradio demo image](images/vmr_demo.png)

AMR demo
![AMR demo image](images/amr_demo.png)

## Supported models, datasets, and features
### Models
@@ -239,6 +244,7 @@ zip -r submission.zip val_submission.jsonl test_submission.jsonl
```

## Citation
Lighthouse
```bibtex
@InProceedings{taichi2024emnlp,
author = {Taichi Nishimura and Shota Nakada and Hokuto Munakata and Tatsuya Komatsu},
@@ -247,6 +253,15 @@ zip -r submission.zip val_submission.jsonl test_submission.jsonl
year = {2024},
}
```
Audio moment retrieval
```bibtex
@InProceedings{hokuto2025icassp,
author = {Hokuto Munakata and Taichi Nishimura and Shota Nakada and Tatsuya Komatsu},
title = {Language-based Audio Moment Retrieval},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing},
year = {2025},
}
```

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Binary file added images/amr_demo.png
File renamed without changes
3 changes: 2 additions & 1 deletion lighthouse/feature_extractor/audio_encoders/clap_a.py
@@ -32,7 +32,8 @@ def __init__(self, device: str, cfg: CLAPAudioConfig):

def _preprocess(self, audio: np.ndarray, sr: int) -> torch.Tensor:
    audio_tensor = self._move_data_to_device(audio)
    audio_tensor = T.Resample(sr, self.sample_rate)(audio_tensor) # original implementation in msclap
    resampler = T.Resample(sr, self.sample_rate).to(self._device)
    audio_tensor = resampler(audio_tensor) # original implementation in msclap

    win_length = int(round(self.window_sec * self.sample_rate))
    hop_length = int(round(self.feature_time * self.sample_rate))
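The new lines build the resampler once and move it onto the encoder's device before applying it: `torchaudio.transforms.Resample` keeps its precomputed resampling kernel inside the module, so constructing it without `.to(device)` leaves that kernel on the CPU and fails when the audio tensor is on the GPU. A minimal standalone sketch of the pattern follows; the sample rates and tensor shape are made up for illustration and are not the repository's configuration.

```python
import torch
import torchaudio.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"
orig_sr, target_sr = 44100, 32000                   # illustrative rates only
audio_tensor = torch.randn(1, orig_sr).to(device)   # dummy one-second clip on `device`

# Move the transform (and its precomputed kernel) to the same device as the
# audio before calling it, mirroring the change above.
resampler = T.Resample(orig_sr, target_sr).to(device)
audio_tensor = resampler(audio_tensor)
```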

