Synthetic Data for AI-Driven Detection of Voice Abnormality

Beenish
Sep 8

Beenish Zia & Farshid Taghizadeh

Introduction

Artificial Intelligence (AI) is revolutionizing healthcare by enhancing diagnostics, treatment, and operational efficiency. One area that has seen limited research is the use of AI on voice data, particularly synthetic voice data, to detect medical conditions such as laryngeal cancer. This blog post briefly summarizes a study that addresses this gap by developing an AI model trained on synthetic voice data to detect laryngeal cancer. The full paper is linked below.

The Challenge

Laryngeal cancer significantly affects the voice, causing hoarseness, breathy voice, pitch shifts, and tremors. Previous studies have shown promise using real voice data, but they often rely on proprietary datasets, making replication difficult. The lack of publicly available models and datasets has hindered progress in this field.

The Solution

The authors of this study developed and open-sourced an AI model trained on synthetic voice data to distinguish between normal and diseased voices. They compared the performance of the model using both synthetic and real voice data, offering valuable insights into the potential of synthetic data for medical diagnostics.

Data Preparation

The study utilized both real and synthetic voice data. Real data was collected from volunteers, while synthetic data was generated using the Google Cloud Text-to-Speech API. A Python-based script was created to simulate the effects of laryngeal cancer on voice by modulating waveforms, shifting formants, and adding noise.
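To make the idea of simulating an abnormal voice concrete, here is a minimal sketch in plain Python. It is not the authors' script: the function name `simulate_abnormal_voice` and every parameter value are hypothetical, and the formant shift is only approximated by simple resampling, which moves pitch and formants together. A faithful formant shift would need a vocoder-style method.

```python
import math
import random


def simulate_abnormal_voice(samples, sample_rate=16000, tremor_hz=5.0,
                            tremor_depth=0.3, noise_level=0.02,
                            stretch=1.1, seed=0):
    """Apply illustrative 'abnormal voice' effects to a mono waveform.

    samples: list of floats in [-1.0, 1.0]. All defaults are hypothetical
    values chosen for illustration, not taken from the study.
    """
    rng = random.Random(seed)

    # 1) Crude spectral shift: resample by linear interpolation. Reading the
    #    signal `stretch` times slower lowers pitch and formants together.
    n_out = int(len(samples) * stretch)
    shifted = []
    for i in range(n_out):
        pos = i / stretch
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        shifted.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)

    out = []
    for i, x in enumerate(shifted):
        t = i / sample_rate
        # 2) Tremor: slow amplitude modulation of the waveform.
        gain = 1.0 - tremor_depth * 0.5 * (1.0 + math.sin(2 * math.pi * tremor_hz * t))
        # 3) Breathiness or hoarseness: additive Gaussian noise.
        y = gain * x + rng.gauss(0.0, noise_level)
        out.append(max(-1.0, min(1.0, y)))  # clip to the valid range
    return out


# A 220 Hz sine wave stands in for a one-second text-to-speech clip.
sr = 16000
clean = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
abnormal = simulate_abnormal_voice(clean, sample_rate=sr)
```

In practice the clean input would be a synthesized speech clip rather than a sine wave, and the strength of each effect would need tuning against recordings of affected voices.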

Model Development

The AI model was built using a 1D Convolutional Neural Network (CNN), which is well-suited for analyzing time-series data like audio waveforms. The model was trained on both real and synthetic data. All audio used a sample rate of 16,000 Hz to ensure a consistent input size.
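The sketch below illustrates two ideas from this section in plain Python: forcing every clip to the same length, and sliding a 1D convolution filter over the waveform. It is a hand-rolled forward pass of a single layer for illustration only, not the authors' architecture. A real model would use a deep learning framework with learned weights, and the clip length, kernel size, and stride here are assumptions.

```python
import random

SAMPLE_RATE = 16000  # Hz, as stated in the post
CLIP_SECONDS = 1     # assumed clip length, not taken from the study


def fix_length(samples, target_len=SAMPLE_RATE * CLIP_SECONDS):
    """Trim or zero-pad a waveform so every clip has the same length."""
    trimmed = list(samples[:target_len])
    return trimmed + [0.0] * (target_len - len(trimmed))


def conv1d_relu(signal, kernel, stride=1):
    """Valid 1D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    out = []
    for start in range(0, len(signal) - k + 1, stride):
        acc = sum(signal[start + j] * kernel[j] for j in range(k))
        out.append(max(0.0, acc))
    return out


def global_average(features):
    """Collapse a feature map to one number, whatever its length."""
    return sum(features) / len(features)


# A short random clip stands in for real audio; weights are random, not trained.
rng = random.Random(0)
clip = [rng.uniform(-1.0, 1.0) for _ in range(12000)]
kernel = [rng.uniform(-0.5, 0.5) for _ in range(9)]

fixed = fix_length(clip)                       # 16000 samples
features = conv1d_relu(fixed, kernel, stride=4)
score = global_average(features)               # one scalar per clip
```

Because every clip is padded or trimmed to the same number of samples, the feature map after the convolution always has the same length, which is what lets a fixed-size classifier head sit on top of it.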

Results

The model achieved impressive accuracy rates:

Synthetic Data Only: 91.6% accuracy with the smaller training set and 87.5% with the larger training set.
Real Data Only: 80% accuracy, with challenges in detecting diseased voices due to the small dataset.
Mixed Data: Combining real, synthetic, and modulated data improved accuracy, though the highest accuracy overall was achieved using synthetic data alone.

Discussion

The study highlights the potential of synthetic data for training AI models in medical diagnostics. While synthetic data offers a viable path for AI training, mixing real and synthetic data provides a more robust model. The authors emphasize the need for open-source real data to create more accurate and impactful models.

Conclusion

This study demonstrates the potential of synthetic voice data in detecting laryngeal cancer using AI. By open-sourcing their model and data, the authors provide a valuable resource for further research and development in this field. The study underscores the importance of collaboration and open data in advancing AI-driven healthcare solutions.

Link to trained model

Download full paper here:

Synthetic Data for AI-Driven Detection of Voice Abnormality
