Abstract of Dissertation

Keywords : Data Mining; Decision Tree; Neural Network; Naive Bayes; Heart Disease

Objective : To study the various data mining techniques used in diagnosis and prediction systems; to study the applications and compare the performance of three predictive data mining techniques for heart disease prediction; and to study the global market and prospects of predictive diagnosis through data mining.

Background : Work on data mining began in the 1990s. In a Comparative Study on Heart Disease Prediction System by Revathi et al., Neural Network provided 100% accuracy. A paper by Wadal et al. developed a prototype Heart Disease Prediction System using 1000 records and 15 medical attributes, with the aim of finding the most effective model for predicting patients with heart disease. A 2014 study by Saravanakumar et al. proposed a frequent feature selection method for heart disease prediction. Nidhi et al. analyzed the various data mining techniques introduced in recent years for heart disease prediction. Neural Network provided 100% accuracy, compared with Naïve Bayes and Decision Tree, which showed prediction accuracies of 90.74% and 99.62% respectively.

Methodology : A scientific literature search was carried out to identify the various predictive diagnosis techniques that use data mining. The three most used techniques for heart disease prediction, i.e. Naïve Bayes, Neural Network, and Decision Tree, were compared on Accuracy, Sensitivity, and Specificity. The most used attributes, major market players, and future trends were also identified.

Findings : The findings of this study are based on a secondary literature review of heart disease prediction systems, in which sixty-five studies published from 2008 to 2016 were reviewed. Most studies used between 200 and 400 patient records (instances). Thirteen input attributes with one output attribute were most commonly used. Performance was measured with the help of a confusion matrix. The overall average Sensitivity, Specificity, and Accuracy were calculated for each technique. Among the three, Neural Network and Naïve Bayes had almost equal accuracy, at 89.8 and 89.3 per cent respectively. Decision Tree was lowest, at 84.6 per cent.
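
As an illustration of how Sensitivity, Specificity, and Accuracy follow from a confusion matrix, and how they might be averaged across studies, a minimal Python sketch is given below. The class name, the helper `mean_metric`, and all counts are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier (heart disease: yes / no)."""
    tp: int  # patients with disease, predicted as diseased
    fn: int  # patients with disease, predicted as healthy
    fp: int  # healthy patients, predicted as diseased
    tn: int  # healthy patients, predicted as healthy

    def sensitivity(self) -> float:
        # Share of diseased patients that were correctly identified.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of healthy patients that were correctly identified.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Share of all patients that were classified correctly.
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total


def mean_metric(matrices: Sequence[ConfusionMatrix],
                metric: Callable[[ConfusionMatrix], float]) -> float:
    """Unweighted average of one metric over several studies."""
    return sum(metric(m) for m in matrices) / len(matrices)


# Hypothetical counts for two studies of the same technique.
study_a = ConfusionMatrix(tp=40, fn=10, fp=5, tn=45)
study_b = ConfusionMatrix(tp=30, fn=20, fp=10, tn=40)

print(f"Sensitivity (study A): {study_a.sensitivity():.3f}")
print(f"Specificity (study A): {study_a.specificity():.3f}")
print(f"Accuracy    (study A): {study_a.accuracy():.3f}")
print(f"Mean accuracy (A, B):  "
      f"{mean_metric([study_a, study_b], ConfusionMatrix.accuracy):.3f}")
```

The average here is unweighted, so a small study counts as much as a large one; weighting by the number of records would be an alternative design choice.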

Recommendations : The market for predictive analytics has been growing manyfold, with emerging economies such as Brazil, China, and India predicted to contribute a greater share of this segment. This study showed that data mining techniques can be used efficiently to model and predict heart disease cases once the common types of attributes and the number of records are identified. In the literature reviewed, the best model for predicting heart disease was Neural Network, whose classification accuracy did not exceed 89.8 per cent.

Today, hospitals routinely store large amounts of heterogeneous data in the form of text, numbers, images, patient records, and test results. This information is valuable for diagnosis and therapy, but it is rarely used after the actual treatment and the release of the patient to improve medical procedures. Hospitals face high and increasing rates of diseases such as cardiovascular disease or heart disease. The knowledge obtained from the data can support health care workers in treatment planning or diagnosis. Powerful methods from machine learning and data mining can be adapted to meet the specific demands of medical domains.