Audio and Speech Processing

Authors and titles for recent submissions, skipping first 25

[ total of 41 entries: 1-25 | 26-41 ]

Wed, 29 May 2024 (continued, showing last 3 of 8 entries)

[26] arXiv:2405.17809 (cross-list from cs.CL) [pdf, other]: Title: TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

Comments: Work in progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2405.17615 (cross-list from cs.SD) [pdf, other]: Title: Listenable Maps for Zero-Shot Audio Classifiers

Authors: Francesco Paissan, Luca Della Libera, Mirco Ravanelli, Cem Subakan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[28] arXiv:2405.17569 (cross-list from cs.LG) [pdf, other]: Title: Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese

Authors: Marcelo Matheus Gauy, Larissa Cristina Berti, Arnaldo Cândido Jr, Augusto Camargo Neto, Alfredo Goldman, Anna Sara Shafferman Levin, Marcus Martins, Beatriz Raposo de Medeiros, Marcelo Queiroz, Ester Cerdeira Sabino, Flaviane Romani Fernandes Svartman, Marcelo Finger

Comments: 5 pages, 2 figures, 1 table. Published in Artificial Intelligence in Medicine (AIME) 2023

Journal-ref: Artificial Intelligence in Medicine Proceedings 2023, pages 271-275

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 28 May 2024

[29] arXiv:2405.17364 [pdf, other]: Title: Speech Loudness in Broadcasting and Streaming

Authors: Matteo Torcoli, Mhd Modar Halimeh, Thomas Leitz, Yannik Grewe, Michael Kratschmer, Bernhard Neugebauer, Adrian Murtaza, Harald Fuchs, Emanuël A. P. Habets

Comments: Accepted for presentation at the Audio Engineering Society (AES) 156th Convention, June 2024, Madrid, Spain

Subjects: Audio and Speech Processing (eess.AS)
[30] arXiv:2405.16952 [pdf, other]: Title: A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

Authors: Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui

Subjects: Audio and Speech Processing (eess.AS)
[31] arXiv:2405.16834 [pdf, other]: Title: Speech enhancement deep-learning architecture for efficient edge processing

Authors: Monisankha Pal, Arvind Ramanathan, Ted Wada, Ashutosh Pandey

Subjects: Audio and Speech Processing (eess.AS)
[32] arXiv:2405.16677 [pdf, other]: Title: Crossmodal ASR Error Correction with Discrete Speech Units

Authors: Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[33] arXiv:2405.17413 (cross-list from cs.SD) [pdf, ps, other]: Title: Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization

Authors: Navin Kamuni, Dheerendra Panwar

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34] arXiv:2405.17100 (cross-list from cs.CR) [pdf, other]: Title: Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35] arXiv:2405.17028 (cross-list from cs.SD) [pdf, other]: Title: RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36] arXiv:2405.16797 (cross-list from cs.SD) [pdf, ps, other]: Title: A Real-Time Voice Activity Detection Based On Lightweight Neural

Authors: Jidong Jia, Pei Zhao, Di Wang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[37] arXiv:2405.16687 (cross-list from cs.SD) [pdf, other]: Title: Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline

Authors: Xavier Riley, Simon Dixon

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:2405.16136 (cross-list from cs.AI) [pdf, other]: Title: C3LLM: Conditional Multimodal Content Generation Using Large Language Models

Authors: Zixuan Wang, Qinkai Duan, Yu-Wing Tai, Chi-Keung Tang

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2405.16000 (cross-list from cs.SD) [pdf, other]: Title: Carnatic Raga Identification System using Rigorous Time-Delay Neural Network

Authors: Sanjay Natesan, Homayoon Beigi

Comments: 7 pages, 2 tables, 3 figures

Journal-ref: Recognition Technologies, Inc. Technical Report (2024), RTI-20240524-01

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[40] arXiv:2405.15923 (cross-list from eess.SP) [pdf, ps, other]: Title: Spiketrum: An FPGA-based Implementation of a Neuromorphic Cochlea

Authors: MHD Anas Alsakkal, Jayawan Wijekoon

Comments: To be published at "IEEE Transactions on Circuits and Systems"

Subjects: Signal Processing (eess.SP); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:2405.15863 (cross-list from cs.SD) [pdf, other]: Title: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
