Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Updated May 21, 2023 - Jupyter Notebook
Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
Wav2Vec for speech recognition, classification, and audio classification
Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"
Classification of 11 types of audio clips using MFCC features and an LSTM. Pretrained on the Speech Command Dataset with intensive data augmentation.
Transformer-based model for Speech Emotion Recognition (SER), implemented in PyTorch
A full-fledged web application. Based on sentiment classification with the NLTK library, it predicts how toxic, severely toxic, insulting, obscene, or threatening a speech is.
This repository contains the code for the paper: "DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances"
In this challenge, the goal is to learn to recognize which of several English words is pronounced in an audio recording. This is a multiclass classification task.
Speech Classification using Continuous Attention Mechanisms
Gender Classification with different Machine Learning models, using the LibriSpeech ASR dataset.
Qafar-af and Amharic voice command recognition project to control the movement of a wheelchair
A convolutional neural network for gender classification, which achieved an F1-score of 94.3% when tested on the RAVDESS dataset. Created as postgraduate coursework; the report is included. The report also discusses Sodiq Adebiy's CNN, which I'd recommend to anyone interested in emotion classification.
In this notebook, we aim to recognize speech commands using classification. For this purpose, we used the SPEECHCOMMANDS dataset and the deep convolutional model M5. The code is written in Python and designed for the PyTorch platform.
This repository contains code for all assignments in the Multimedia Computing and Applications (CSE563) course.
Fall 2021 Introduction to Deep Learning - Homework 3 Part 2 (RNN-based phoneme recognition)
A Python implementation of the Iterative Feature Normalization algorithm
CNN-based approach for audio file classification. Contains notebooks illustrating data preprocessing, feature extraction, model training, and model inference workflows, as well as the overall pipeline.
This project represents my research on dementia classification using audio data.