
Commit

added AMR demo because AMR was accepted at ICASSP. Congrats!
awkrail committed Dec 25, 2024
1 parent ba9da72 commit e6750b3
Showing 4 changed files with 19 additions and 3 deletions.
19 changes: 17 additions & 2 deletions README.md
@@ -11,6 +11,7 @@ It supports seven models, four features (video and audio features), and six data
Furthermore, Lighthouse supports [audio moment retrieval](https://h-munakata.github.io/Language-based-Audio-Moment-Retrieval/), a task to identify relevant moments from an audio input based on a given text query.

## News
- [2024/12/24] Our work ["Language-based audio moment retrieval"](https://arxiv.org/abs/2409.15672) has been accepted at ICASSP 2025.
- [2024/10/22] [Version 1.0](https://github.com/line/lighthouse/releases/tag/v1.0) has been released.
- [2024/10/6] Our paper has been accepted at the EMNLP 2024 system demonstration track.
- [2024/09/25] Our work ["Language-based audio moment retrieval"](https://arxiv.org/abs/2409.15672) has been released. Lighthouse supports AMR.
@@ -76,9 +77,13 @@ Run `python api_example/amr_demo.py` to reproduce the AMR results.
For CPU users, set `feature_name='clip'` because CLIP+Slowfast or CLIP+Slowfast+PANNs features are very slow without GPUs.
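A minimal sketch of such a CPU-only setup is shown below. The predictor class, checkpoint path, and constructor arguments are assumptions modeled on the `api_example` scripts rather than a verified API; see `api_example/demo.py` for the exact interface.

```python
# Hypothetical CPU-only setup. The class name, checkpoint path, and keyword
# arguments below are assumptions; see api_example/demo.py for the real usage.
from lighthouse.models import CGDETRPredictor  # assumed import path

model = CGDETRPredictor(
    "results/clip_cg_detr/qvhighlights/best.ckpt",  # assumed checkpoint path
    device="cpu",
    feature_name="clip",  # CLIP-only features are the practical choice on CPU
)
```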

## Gradio demo
Run `python gradio_demo/demo.py`. Upload a video, enter a text query, and click the blue button. For the AMR demo, run `python gradio_demo/amr_demo.py`.

![Gradio demo image](images/demo_improved.png)
MR-HD demo
![Gradio demo image](images/vmr_demo.png)

AMR demo
![AMR demo image](images/amr_demo.png)

## Supported models, datasets, and features
### Models
@@ -239,6 +244,7 @@ zip -r submission.zip val_submission.jsonl test_submission.jsonl
```

## Citation
Lighthouse
```bibtex
@InProceedings{taichi2024emnlp,
author = {Taichi Nishimura and Shota Nakada and Hokuto Munakata and Tatsuya Komatsu},
@@ -247,6 +253,15 @@ zip -r submission.zip val_submission.jsonl test_submission.jsonl
year = {2024},
}
```
Audio moment retrieval
```bibtex
@InProceedings{hokuto2025icassp,
author = {Hokuto Munakata and Taichi Nishimura and Shota Nakada and Tatsuya Komatsu},
title = {Language-based Audio Moment Retrieval},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing},
year = {2025},
}
```

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Binary file added images/amr_demo.png
File renamed without changes
3 changes: 2 additions & 1 deletion lighthouse/feature_extractor/audio_encoders/clap_a.py
@@ -32,7 +32,8 @@ def __init__(self, device: str, cfg: CLAPAudioConfig):

def _preprocess(self, audio: np.ndarray, sr: int) -> torch.Tensor:
    audio_tensor = self._move_data_to_device(audio)
    audio_tensor = T.Resample(sr, self.sample_rate)(audio_tensor) # original implementation in msclap
    resampler = T.Resample(sr, self.sample_rate).to(self._device)
    audio_tensor = resampler(audio_tensor) # original implementation in msclap

    win_length = int(round(self.window_sec * self.sample_rate))
    hop_length = int(round(self.feature_time * self.sample_rate))
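The new lines build the resampler once and move it onto the encoder's device before applying it: `torchaudio.transforms.Resample` keeps its precomputed resampling kernel inside the module, so constructing it without `.to(device)` leaves that kernel on the CPU and fails when the audio tensor is on the GPU. A minimal standalone sketch of the pattern follows; the sample rates and tensor shape are made up for illustration and are not the repository's configuration.

```python
import torch
import torchaudio.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"
orig_sr, target_sr = 44100, 32000                   # illustrative rates only
audio_tensor = torch.randn(1, orig_sr).to(device)   # dummy one-second clip on `device`

# Move the transform (and its precomputed kernel) to the same device as the
# audio before calling it, mirroring the change above.
resampler = T.Resample(orig_sr, target_sr).to(device)
audio_tensor = resampler(audio_tensor)
```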

