Detecting abnormalities in respiratory sounds using neural networks


Kok Chun Yen


Automatic Detection Of Respiratory Abnormalities Using Neural Networks

Team Member

  • Kok Chun Yen

Class: Thurs 7pm


I. Executive Summary

This project explores the use of machine learning methods to detect abnormalities such as crackles and wheezes in patients' respiratory sounds. The two models considered are a k-nearest neighbours (KNN) classifier and a convolutional neural network (CNN). By extracting multiple features from a database of 920 breath sound clips, the trained KNN and CNN models correctly classified 64% and 58% of the test sample breath sound clips respectively.

II. Problem Statement

Auscultation of the lungs and heart is a standard procedure in the examination of a patient. Currently, a doctor has to apply a stethoscope to specific parts of the chest and back in order to listen for problems with the lungs and heart. From the auscultation process, the doctor then has to interpret the sounds and, based on their judgement, training and experience, make a definitive diagnosis.

The aim of this project is to apply machine learning methods to train a model to detect any abnormalities of the lungs. If successful, the model should be able to inform the user of the presence or absence of any abnormalities after listening to a patient's lung sounds. This can reduce the possibility of errors made by doctors, or even aid in auscultation training of healthcare providers. Thus, automating this process has the potential to improve the quality of patient care. In the future, the model can be upgraded by including input options for patient data such as age or weight to further improve accuracy, or even make definitive diagnoses of specific lung diseases.

III. Dataset Selected

A database of respiratory sounds was obtained from the ICBHI 2017 Challenge website. The sound clips are in .wav format. The database consists of a total of 5.5 hours of recordings containing 6898 respiratory cycles, of which 1864 contain crackles, 886 contain wheezes, and 506 contain both crackles and wheezes, across 920 annotated audio samples from 126 subjects.

The following accompanying files were also obtained from the same website:

  1. A zip folder containing the respiration cycle start and end times, the presence of crackles and the presence of wheezes of the respective cycle for each sound clip.
  2. ICBHI_Challenge_demographic_information.txt: A text file containing relevant information of the patients from which the sound clips were collected from. Information includes: Participant ID, Age, Sex, Adult BMI (kg/m2), Child Weight (kg), Child Height (cm)).
  3. ICBHI_Challenge_diagnosis.txt: A text file containing the pathology (or lack thereof) that the patient has.
  4. ICBHI_challenge_train_test.txt: A text file containing the train-test status of each patient's audio clip.

IV. Methodology

In this project, I extract relevant features from the audio clips, feed them into two different models, and evaluate their accuracy.

The following are the general steps taken to achieve the project objectives:

A. Data Processing

B. Splitting the database of audio samples into 2 groups for training and testing (Train-Test Split)

C. Priming the data

Parts D, E and F are repeated for each of the two models.

D. Building the machine learning model

E. Training the machine learning model

F. Evaluation of the model

The necessary libraries are first imported:

In [1]:
import pandas as pd
import numpy as np
import os
import math
import librosa
import librosa.display
import matplotlib.pyplot as plt
import seaborn as sns
import shutil

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

from keras.layers import (Dense, Activation, Flatten, Dropout, BatchNormalization,
                          Conv1D, GlobalAveragePooling1D, MaxPooling1D)
from keras.models import Sequential, Model
from keras import regularizers, optimizers

%matplotlib inline'ggplot')
Using TensorFlow backend.

A. Data Processing

First, the text files (one per audio sample) from the downloaded zip folder are compiled into a single dataframe. Each text file contains the start and end times of each respiration cycle, along with the presence or absence of crackles and wheezes (represented by 1: Present and 0: Absent). The recordings have different lengths and therefore contain different numbers of respiration cycles. All of this data has to be included in the compiled dataframe.

In [2]:
#Reading the database of text files, each of which corresponds to one sound clip
os.chdir('C:\\Users\\Kchun\\Desktop\\Digital Stethoscope\\ICBHI_final_database\\text files')
directory = os.fsencode("/Users/Kchun/Desktop/Digital Stethoscope/ICBHI_final_database/text files/")
i = 1
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".txt") and i == 1:
        text = pd.read_csv(filename, sep = '\t', header = None, names = ['Start', 'End', 'Crackles', 'Wheezes'])
        text.index = text.index + 1
        #Flatten each cycle's rows into one wide row, e.g. Start_1, End_1, ...
        text_out = text.stack()
        text_out.index ='{0[1]}_{0[0]}'.format)
        text_out["File"] = filename
        text_out = text_out.to_frame().T
        data = text_out
        i += 1
    elif filename.endswith(".txt"):
        text = pd.read_csv(filename, sep = '\t', header = None, names = ['Start', 'End', 'Crackles', 'Wheezes'])
        text.index = text.index + 1
        text_out = text.stack()
        text_out.index ='{0[1]}_{0[0]}'.format)
        text_out["File"] = filename
        text_out = text_out.to_frame().T
        data = data.append(text_out, sort = False, ignore_index = True)
#Remove the .txt from the file names under the column labelled 'File'
data['File'] = data['File'].str.replace('.txt', '', regex = False)
#Move 'File' to the front to match the column layout used below
data = data[['File'] + [c for c in data.columns if c != 'File']]
File Start_1 End_1 Crackles_1 Wheezes_1 Start_2 End_2 Crackles_2 Wheezes_2 Start_3 ... Crackles_31 Wheezes_31 Start_32 End_32 Crackles_32 Wheezes_32 Start_33 End_33 Crackles_33 Wheezes_33
0 101_1b1_Al_sc_Meditron 0.036 0.579 0 0 0.579 2.45 0 0 2.45 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 101_1b1_Pr_sc_Meditron 0.036 1.264 0 0 1.264 3.422 0 0 3.422 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 102_1b1_Ar_sc_Meditron 0.264 1.736 0 0 1.736 3.293 0 0 3.293 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 103_2b2_Ar_mc_LittC2SE 0.364 3.25 0 1 3.25 6.636 0 0 6.636 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 104_1b1_Al_sc_Litt3200 0 1.8771 0 0 1.8771 3.7543 0 0 3.7543 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 133 columns

As seen from the compiled dataframe named "data", there are 133 columns; the final few hold the start and end times, and the crackle and wheeze indicators, of the last respiration cycle of the audio sample with the most cycles. Most entries in those columns are blank (NaN) because the audio samples do not all have the same length.

In order to resolve this, I average the columns that indicate the presence or absence of crackles and wheezes respectively. This is added as two new columns in the dataframe. This indicates the general presence of crackles and wheezes for each audio sample in the respiratory database.

I subsequently drop all unnecessary variables including the individual start, end times of each cycle and the per cycle presence of crackles and wheezes.

In [3]:
copy = data.copy()

for index, row in copy.iterrows():
    count_crack = 0
    total_crack = 0
    count_wheez = 0
    total_wheez = 0
    for columns in range(3,135,4):
        if not math.isnan(row[columns]):
            count_crack=count_crack +1
            total_crack = total_crack + row[columns]
    copy.loc[index,'Crackles'] = (total_crack/count_crack>=0.5)
    for columns in range(4,135,4):
        if not math.isnan(row[columns]):
            count_wheez=count_wheez +1
            total_wheez = total_wheez + row[columns]
    copy.loc[index,'Wheezes'] = (total_wheez/count_wheez>=0.5)
headers = list(copy)
copy.drop(columns = headers[1:133],inplace=True)

To simplify the categorisation problem, an additional column is added with the following boolean values:

  • True: Audio sample contains neither crackles nor wheezes.
  • False: Audio sample contains either crackles or wheezes or both.
In [4]:
copy['Healthy'] = copy['Crackles'] + copy['Wheezes']

def checkifhealthy(copy):
    if copy['Healthy'] == 0:
        return True
    elif copy['Healthy'] >= 1:
        return False
    else:
        print('something went wrong')

copy['Healthy'] = copy.apply(lambda row: checkifhealthy(row), axis=1)

data = copy
File Crackles Wheezes Healthy
0 101_1b1_Al_sc_Meditron False False True
1 101_1b1_Pr_sc_Meditron False False True
2 102_1b1_Ar_sc_Meditron False False True
3 103_2b2_Ar_mc_LittC2SE False True False
4 104_1b1_Al_sc_Litt3200 False False True

Next, I combine the data from the three text files containing the demographic information, pathology diagnosis and train/test status of the patients into one dataframe. As there are multiple sound clips recorded from different parts of the body of the same patient, the demographic and diagnosis information have to be matched to the right patient. Thus, the code has to pair the same demographic information and diagnosis to sound clips from the same patient, which involves writing the same information (for the same patient) into multiple rows.

ICBHI_challenge_train_test.txt: The number of rows in the train_test dataframe matches the number of audio files present in the database. The train/test status are input into the column labelled 'EVENT'.

ICBHI_Challenge_diagnosis.txt: The number of rows in the dataframe matches the number of patients who contributed to the respiratory sound database. The patient id was input into the dataframe as the row index. The respiratory diagnosis of the patients are input into the column labelled 'DIAGNOSIS'.

ICBHI_Challenge_demographic_information.txt: The number of rows in the dataframe matches the number of patients who contributed to the respiratory sound database. The patient id was input into the dataframe as the row index. The demographic information of the patients are input into columns labelled as what they are.

In [5]:
os.chdir('C:\\Users\\Kchun\\Desktop\\Digital Stethoscope\\ICBHI_final_database')

col_1 = ['AGE', 'SEX', 'ADULT BMI', 'CHILD WEIGHT', 'CHILD HEIGHT']
demo_info = pd.read_csv("ICBHI_Challenge_demographic_information.txt", sep = '\t', header = None, names = col_1, index_col = 0)

col_2 = ['DIAGNOSIS']
diagnosis = pd.read_csv("ICBHI_Challenge_diagnosis.txt", sep = '\t', header = None, names = col_2, index_col = 0)

col_3 = ['File', 'EVENT']
train_test = pd.read_csv("ICBHI_challenge_train_test.txt", sep = '\t', header = None, names = col_3)

patient_info = pd.DataFrame(columns=['File', 'AGE', 'SEX', 'ADULT BMI', 'CHILD WEIGHT', 'CHILD HEIGHT', 'DIAGNOSIS', 'EVENT'])

patient_info['File'] = train_test['File']
#The first 3 characters of each file name are the patient id
for i, row in enumerate(patient_info['File']):
    patient_id = int(row[0:3])
    patient_info.loc[i, 'AGE'] = demo_info.loc[patient_id, 'AGE']
    patient_info.loc[i, 'SEX'] = demo_info.loc[patient_id, 'SEX']
    patient_info.loc[i, 'ADULT BMI'] = demo_info.loc[patient_id, 'ADULT BMI']
    patient_info.loc[i, 'CHILD WEIGHT'] = demo_info.loc[patient_id, 'CHILD WEIGHT']
    patient_info.loc[i, 'CHILD HEIGHT'] = demo_info.loc[patient_id, 'CHILD HEIGHT']
    patient_info.loc[i, 'DIAGNOSIS'] = diagnosis.loc[patient_id, 'DIAGNOSIS']
    patient_info.loc[i, 'EVENT'] = train_test.loc[i, 'EVENT']
0 101_1b1_Al_sc_AKGC417L 3 F NaN 19 99 URTI test
1 101_1b1_Pr_sc_AKGC417L 3 F NaN 19 99 URTI test
2 102_1b1_Ar_sc_AKGC417L 0.75 F NaN 9.8 73 Healthy test
3 103_2b2_Ar_mc_LittC2SE 70 F 33 NaN NaN Asthma test
4 104_1b1_Al_sc_Litt3200 70 F 28.47 NaN NaN COPD test

Because the diagnosis column is in string format, a machine learning model may not be able to process it. Thus, for simplicity and convenience, the different diagnoses are converted to integers and saved in a new column, "DIAGNOSIS INDEX", using the function below.

In [ ]:
d = np.unique(patient_info['DIAGNOSIS'])

def convert_to_index(patient_info):
    if patient_info['DIAGNOSIS'] == d[0]:
        return 1
    elif patient_info['DIAGNOSIS'] == d[1]:
        return 2
    elif patient_info['DIAGNOSIS'] == d[2]:
        return 3
    elif patient_info['DIAGNOSIS'] == d[3]:
        return 4
    elif patient_info['DIAGNOSIS'] == d[4]:
        return 0
    elif patient_info['DIAGNOSIS'] == d[5]:
        return 5
    elif patient_info['DIAGNOSIS'] == d[6]:
        return 6
    elif patient_info['DIAGNOSIS'] == d[7]:
        return 7        

Now we apply the function and the output shows that the conversion was successful.

Although the diagnosis column was not used in this project, this process was done just in case it might be needed in the future.

It was not used because initial experiments with the machine learning algorithms to predict diseases did not yield much success. This is largely because of the over-representation of 'COPD' and under-representation of 'Asthma' and 'LRTI' in the sample (as seen below). Restricting the sample to balance those diseases left only 120 sound clips, which is too small for deep learning to be applied constructively.

In [ ]:
patient_info['DIAGNOSIS INDEX'] = patient_info.apply(lambda row: convert_to_index(row), axis=1)

copy = patient_info.groupby('DIAGNOSIS').size()


The patient information dataframe named "patient_info" is then simply combined with the audio information dataframe named "data" into a compiled dataframe named "master". The file column from both dataframes should be matched.

I first simply concatenate the two dataframes without reference because, at first glance, the files in both dataframes appear to be in the same order. To verify this, I iterate through each row of the combined dataframe and check whether the file names from both databases match, then print a boolean indicating whether the number of matching rows equals the length of the entire dataframe.

In [6]:
#Combining the above dataframe with the dataframe containing information of the patient that produced each sound clip
master = pd.concat([data, patient_info], axis=1, join='outer')

#Check if the file names are matched after simply concatenating the 2 dataframes
count = 0
for row in range(0, 920):
    if master.iloc[row, 0][:3] == master.iloc[row, 4][:3]:
        count = count + 1
print(count == len(master))

master = master.loc[:, ~master.columns.duplicated()]

#Converts the dataframe into a csv
master.to_csv('Master_database.csv', index = False)

As the output True shows, the combination was successful and correct. This dataframe is then written to a csv named "Master_database.csv".

B. Train-Test Split

Before creating the machine learning model, the audio database has to be divided into 2 separate folders for ease of training and testing. Luckily, the database site provided the train-test status of each audio clip, assigned during the data collection process.

The file names of the audio clips are first compiled into a list named "audio_files". To prevent wrong file names from being compiled, a '.wav' file extension is required. Unnecessary columns are dropped for convenience.

In [7]:
os.chdir('C:\\Users\\Kchun\\Desktop\\Digital Stethoscope')
df = pd.read_csv('Master_database.csv')
copy = df.copy()
headers = list(copy)
copy.drop(columns = headers[1:9], inplace = True)
copy.drop(columns = 'DIAGNOSIS INDEX', inplace = True)
test = 'Test'
train = 'Train'
files = os.listdir('ICBHI_final_database\\audio files')
audio_files = []
for f in files:
    if '.wav' in f:
        audio_files.append(f)
                     File  Crackles  Wheezes  Healthy    AGE SEX  ADULT BMI  \
0  101_1b1_Al_sc_Meditron     False    False     True   3.00   F        NaN   
1  101_1b1_Pr_sc_Meditron     False    False     True   3.00   F        NaN   
2  102_1b1_Ar_sc_Meditron     False    False     True   0.75   F        NaN   
3  103_2b2_Ar_mc_LittC2SE     False     True    False  70.00   F      33.00   
4  104_1b1_Al_sc_Litt3200     False    False     True  70.00   F      28.47   

0          19.0          99.0      URTI  test                7  
1          19.0          99.0      URTI  test                7  
2           9.8          73.0   Healthy  test                0  
3           NaN           NaN    Asthma  test                1  
4           NaN           NaN      COPD  test                4  

The working directory is changed to the folder that contains the audio files. The file names are iterated over, matched to their corresponding train-test status, and the clips are moved into their respective Train and Test folders.

The number of audio clips in the train and test folders are computed and shown.

In [ ]:
os.chdir('C:\\Users\\Kchun\\Desktop\\Digital Stethoscope\\ICBHI_final_database\\audio files')

for index, f in copy.iterrows():
    if copy.loc[index, 'EVENT'] == 'test':
        shutil.move(f[0] + '.wav', test)
    elif copy.loc[index, 'EVENT'] == 'train':
        shutil.move(f[0] + '.wav', train)
    else:
        print("weird result")

train = os.listdir('Train')
print('The size of the train sample is ' + str(len(train)) + '!')
test = os.listdir('Test')
print('The size of the test sample is ' + str(len(test)) + '!')

Before moving on, the train and test sample sets are checked against the following criteria.

Both train and test sample sets should:

  • contain enough normal and abnormal lung sounds;
  • have roughly equal proportions of normal and abnormal lung sounds.
In [8]:
test_sample = df.copy()
train_sample = df.copy()
test_sample = test_sample[test_sample['EVENT'] == 'test']
train_sample = train_sample[train_sample['EVENT'] == 'train']

print(test_sample.groupby('Crackles').size())
print(train_sample.groupby('Crackles').size())
False    87
True     51
dtype: int64
False    493
True     289
dtype: int64

As observed from the output, the class proportions in the training and test sample sets are similar (roughly 63% without crackles in each), so they should be adequate for the task at hand.

C. Priming the data

Three functions are created to execute three necessary tasks to prime the data to be fed into the models. The functions and their roles are as follows:

  • extract_feature: Reads in a sound clip and converts it into different features, namely: Mel-frequency cepstral coefficients (mfcc), chroma, Mel spectrogram (mel), contrast and tonnetz. Each feature is compressed into a single-dimension array by flattening it and taking the mean, and the features are then combined into one array using np.concatenate. The function returns the combined array of features for the sound clip.
  • get_samples: Reads in the folder name at which the sound clips are saved in. This function iterates through all the sound clips in the folder, extracts its features and respective label from the database for each sound clip. The function returns an array of features and array of labels.
  • one_hot_encode: Reads in the categorical labels and converts them into a binary form for each category. Allows the machine learning algorithm to better predict values given the categories by reducing errors from magnitude of labels.
In [ ]:
def extract_feature(file_name):
    sound, sample_rate = librosa.load(file_name)
    #Short-time Fourier transform
    stft = np.abs(librosa.stft(sound))
    mfccs = np.mean(librosa.feature.mfcc(y=sound, sr=sample_rate, n_mfcc=40).T, axis=0).flatten()
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0).flatten()
    mel = np.mean(librosa.feature.melspectrogram(sound, sr=sample_rate).T, axis=0).flatten()
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T, axis=0).flatten()
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(sound), sr=sample_rate).T, axis=0).flatten()
    combined = np.concatenate((mfccs, chroma, mel, contrast, tonnetz))
    return combined

def get_samples(sub_dir):
    fn = []
    path = 'ICBHI_final_database/audio files/' + sub_dir
    for files in os.listdir(path):
        if files.endswith('.wav'):
            fn.append(files)
    samples = []
    labels = []
    for file_name in fn:
        print('reading in ' + file_name + '...')
        row = extract_feature(path + '/' + file_name)
        label = df.loc[file_name.replace('.wav', ''), 'Healthy']
        samples.append(row)
        labels.append(label)
    return np.array(samples), np.array(labels, dtype=int)

def one_hot_encode(labels):
    n_labels = len(labels)
    n_unique_labels = len(np.unique(labels))
    one_hot_encode = np.zeros((n_labels,n_unique_labels))
    one_hot_encode[np.arange(n_labels), labels] = 1
    return one_hot_encode

The "get_samples" function is applied to the audio files in the train sample set; the output arrays of features and labels are assigned to the variables tr_features and tr_labels respectively. The same is done for the test sample set. To avoid having to re-extract all the features and labels from the sound clips, the arrays of features and labels are saved as .npy files.

In [ ]:
os.chdir('C:\\Users\\Kchun\\Desktop\\Digital Stethoscope')

df = pd.read_csv('Master_database.csv', index_col = 'File')

tr_features, tr_labels = get_samples('Train')
ts_features, ts_labels = get_samples('Test')

#Save the raw features and labels; one-hot encoding is applied where needed'x_train', tr_features)'x_test', ts_features)'y_train', tr_labels)'y_test', ts_labels)


Now, the two models can be built, trained and evaluated.

KNN Model

D. Building the model - KNN

Using scikit-learn, the KNN model can be built. As part of the building process, a graph is plotted showing the proportion of variance in the data that is explained by the features extracted from the sound.

In [9]:
tr_features = np.load('x_train.npy')
ts_features = np.load('x_test.npy')
tr_labels = np.load('y_train.npy')
ts_labels = np.load('y_test.npy')

#Fit the scaler on the training features before transforming both sets
scaler = StandardScaler().fit(tr_features)
tr_features_scaled = scaler.transform(tr_features)
ts_features_scaled = scaler.transform(ts_features)

pca = PCA().fit(tr_features_scaled)

plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of Components')
plt.ylabel('Variance (%)')

As observed from the plot, a large proportion of the variance is explained by the first few components; each additional feature extracted and fed into the model yields diminishing marginal returns.
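The claim above can be quantified by counting how many principal components are needed to reach a given variance threshold. A minimal sketch on synthetic data (the 95% threshold, the 10 latent sources and the random matrix are illustrative assumptions; the real features would come from x_train.npy):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Synthetic stand-in for the 193-feature training matrix: 10 latent
# sources mixed into 193 correlated features plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 10))
mixing = rng.normal(size=(10, 193))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 193))

X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)

# Smallest number of components whose cumulative explained variance
# reaches 95% -- the "few components dominate" effect seen in the plot.
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_components_95 = int(np.argmax(cum_var >= 0.95)) + 1
print(n_components_95)
```

On data like this, a handful of components out of 193 already clears the threshold, which is the pattern the variance plot shows for the extracted sound features.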

E. Training the model - KNN

The model is trained according to the assigned grid parameters.

In [10]:
grid_params = {
    'n_neighbors': [3, 5, 7, 9, 11, 15],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan']
}

model = GridSearchCV(KNeighborsClassifier(), grid_params, cv=5, n_jobs=-1), tr_labels)
GridSearchCV(cv=5, error_score='raise-deprecating',
       estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform'),
       fit_params=None, iid='warn', n_jobs=-1,
       param_grid={'n_neighbors': [3, 5, 7, 9, 11, 15], 'weights': ['uniform', 'distance'], 'metric': ['euclidean', 'manhattan']},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

F. Evaluating the model - KNN

The test features and labels are input into the trained model to produce predictions. A prediction score showing the proportion of correct predictions made by the KNN model using the test labels is printed. The predictions are assigned to y_predict for printing if necessary.

In [11]:
print(f'Model Score: {model.score(ts_features_scaled, ts_labels)*100}')

y_predict = model.predict(ts_features_scaled)
Model Score: 63.76811594202898

The KNN model achieved an accuracy score of 63.768%.
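Accuracy alone hides how errors split between the two classes, which matters clinically (missing an abnormal clip is worse than flagging a healthy one). The confusion_matrix imported earlier can break this down; a sketch on stand-in labels (the arrays below are illustrative, not the project's actual ts_labels and y_predict):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Stand-in true/predicted labels; True = healthy, False = abnormal.
y_true = np.array([True, True, False, False, False, True])
y_pred = np.array([True, False, False, False, True, True])

# Rows are true classes, columns are predictions; `labels` fixes ordering.
cm = confusion_matrix(y_true, y_pred, labels=[True, False])
healthy_ok, healthy_missed, abn_as_healthy, abn_ok = cm.ravel()
print(cm)
# Proportion of truly abnormal clips the model caught
print(abn_ok / (abn_as_healthy + abn_ok))
```

Reporting this alongside the accuracy score would show whether the 64% is balanced across healthy and abnormal clips or dominated by one class.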

CNN Model

D. Building the model - CNN

Using Keras, a CNN model can be built with multiple convolutional layers for back-propagation to act on. A categorical cross-entropy loss function is used to build this model as this is a classification problem.

In [ ]:
#create model
model_m = Sequential()

#add model layers
model_m.add(Conv1D(100, 10, activation='relu', input_shape=(193, 1)))
model_m.add(Conv1D(100, 10, activation='relu'))
model_m.add(Conv1D(160, 10, activation='relu'))
model_m.add(Conv1D(160, 10, activation='relu'))
#Pool the convolutional output down to one vector per sample before classifying
model_m.add(GlobalAveragePooling1D())
model_m.add(Dense(2, activation='softmax'))
model_m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


E. Training the model - CNN

The model is trained by feeding the previously extracted features and corresponding labels into the built model. 100 epochs were run; since the input array is only one-dimensional, training the model does not take long.

In [ ]:
tr_features = np.load('x_train.npy')
ts_features = np.load('x_test.npy')
tr_labels = np.load('y_train.npy')
ts_labels = np.load('y_test.npy')
tr_labels = one_hot_encode(tr_labels)
ts_labels = one_hot_encode(ts_labels)

#Add a channel dimension: (samples, 193 features, 1 channel)
reshape_train = np.reshape(tr_features, (782, 193, 1))
reshape_test = np.reshape(ts_features, (138, 193, 1)), tr_labels, epochs=100, batch_size=1, verbose=1)

F. Evaluating the model - CNN

The test features and labels are input into the trained model to produce predictions. A prediction score showing the proportion of correct predictions made by the CNN model using the test labels is printed.

In [ ]:
scores = model_m.evaluate(reshape_test, ts_labels, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

The CNN model achieved an accuracy score of 58%.

V. Findings

As observed from the accuracy scores of the two models, the model with the best accuracy is the KNN model. This is interesting because the KNN model is much simpler than its counterpart, the CNN. This could be attributed to the relatively small sample of breath sounds in the database: the more sophisticated and complex model suffered from overfitting, as evidenced by its high training accuracy but poor testing accuracy.

Additionally, during initial experimentation with the models, only one feature (MFCC) was extracted from the breath sounds and used to build the models. It was subsequently discovered that extracting more features improved accuracy, but at a diminishing rate as each feature was added.

Currently, the best prediction accuracy score of 64% is still too low for application in the healthcare setting. However, this project provides the groundwork for further development of the model. The next step towards a reliable and fully automated abnormality detection algorithm for breath sounds is gathering more data. Deep learning is an extremely powerful tool. But it does require a large amount of data for it to be able to provide definitive prediction.

Weaknesses in data

The data quality makes or breaks a deep learning model. The poor accuracies could be attributed to flaws in the data as follows:

  1. Poor sound quality of the breath sounds. Many of the breath sounds are very noisy: ambient hospital noise can be heard in the background, and the patient's heartbeat often dominates the sound.

  2. Noise from the initial placement of the stethoscope was not removed. This could have been avoided by setting an offset window before the sound is read in, but is difficult because there is no consistent time by which all sounds should be offset.

  3. Lack of data. There were only 920 sounds; ideally, each label (normal and abnormal breath sounds) should have around 1000 sounds.

  4. Each sound clip had a different length. This gave some sound clips a larger effect on the model, as more of their features are extracted, and may have skewed the dataset. For example, if abnormal breath sound clips are generally longer, more features come from those sounds, biasing the model towards abnormal predictions.
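Weaknesses 2 and 4 could be mitigated at load time: librosa.load accepts offset and duration arguments, so every clip can be trimmed to a fixed window that skips the initial stethoscope placement. A sketch, where the 0.5 s offset and 10 s window are illustrative choices rather than values from this project:

```python
import numpy as np

def pad_to_length(sound, target_len):
    """Zero-pad a 1-D signal up to target_len samples (no-op if longer)."""
    if len(sound) < target_len:
        sound = np.pad(sound, (0, target_len - len(sound)))
    return sound

def load_fixed_window(path, offset=0.5, duration=10.0, sr=22050):
    import librosa  # deferred so pad_to_length stays dependency-free
    # offset skips the stethoscope-placement noise; duration caps the
    # clip so every sample contributes the same amount of signal.
    sound, rate = librosa.load(path, sr=sr, offset=offset, duration=duration)
    return pad_to_length(sound, int(duration * sr)), rate
```

With every clip the same length, no single recording dominates feature extraction, addressing the skew described in point 4.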

Future work

In the future, a Recurrent Neural Network (RNN) may be used to build a model able to process time-series data. In this case, the sound recorded during each respiration cycle can be fed into the feature extraction function and appended into a multi-dimensional, time-dependent array where each sound clip has multiple features extracted per cycle. This can also mitigate the lack of data, since many feature vectors are extracted from the same sound. This is very doable since the text files from the original database contain the start and end time of each respiration cycle and indicate the presence of crackles and wheezes in each cycle.
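Slicing each recording into its annotated respiration cycles is straightforward once the per-cycle start and end times are in a dataframe. A sketch, assuming a cycles dataframe with 'Start' and 'End' columns in seconds (the column names follow the annotation files; feature_fn is a hypothetical per-segment variant of the extract_feature function defined earlier, taking samples and a sample rate instead of a file name):

```python
import numpy as np
import pandas as pd

def slice_cycles(sound, sr, cycles):
    """Cut one loaded recording into its annotated respiration cycles.

    `cycles` has 'Start' and 'End' columns in seconds, as in the
    annotation text files; `sr` is the sample rate of `sound`.
    """
    return [sound[int(c['Start'] * sr):int(c['End'] * sr)]
            for _, c in cycles.iterrows()]

def per_cycle_features(sound, sr, cycles, feature_fn):
    # Stack one feature vector per cycle into a (n_cycles, n_features)
    # array -- the sequence an RNN would consume.
    return np.stack([feature_fn(seg, sr) for seg in slice_cycles(sound, sr, cycles)])
```

Each recording then yields a sequence of per-cycle feature vectors rather than one averaged vector, which is the input shape a recurrent model expects.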

VI. Conclusion

To conclude, 2 different machine learning models were built to detect the presence of crackles or wheezes in human respiratory sounds. Features (mfccs, chroma, mel, contrast, tonnetz) were extracted from the database of 920 breath sound clips and used to train the two models. Predictive accuracies of 64% and 58% were obtained from the KNN and CNN models respectively.

Potentially, patient-centred information could be used alongside features from breath sounds. This would allow better detection of abnormalities, since a patient's vitals (such as age, weight and smoking frequency) are likely to be highly correlated with respiratory abnormalities. For now, however, this project's aim of exploring the possibility of abnormality detection based solely on breath sounds has been met.

Published by Kok Chun Yen, August 12, 2020
