Diabetes_Prediction_Using_Classification_Method

IRFAN ULLAH KHAN

Diabetes_Prediction_Using_Classification_Method

IRFAN ULLAH KHAN

2024, Diabetes Prediction Using Classification Method

visibility

…

description

31 pages

Abstract

The "Diabetes Prediction using classification method" is a project is on creating a predictive model that can accurately identify individuals at risk of diabetes by leveraging classification methods in machine learning. Vital health parameters are analyzed using various algorithms to forecast the probability of an individual developing diabetes. Diabetes is also called Diabetes mellitus (DM). The primary objective of this initiative is to establish a tool for the early detection and prevention of diabetes. Timely and precise prediction of diabetes risk will enable healthcare professionals to implement targeted interventions such as lifestyle adjustments, medication, and ongoing monitoring, thereby potentially mitigating or delaying diabetes-related complications.

Key takeaways
AI

The project aims to create an accurate predictive model for diabetes risk using classification methods.
Logistic Regression achieved the highest accuracy at 74.03% among the algorithms used.
Data preprocessing is critical, involving normalization, imputation, and feature selection for effective model training.
Neural Networks, SVM, Decision Trees, and Logistic Regression are the algorithms employed for diabetes prediction.
Ethical considerations in data privacy and consent are essential throughout the predictive analytics process.

Final Project Report Diabetes Prediction using classification method Project Supervisor Respected Mam Warda Fiaz Submitted By Irfan Ullah Khan BC190400474 Software Projects & Research Section, Department of Computer Sciences, Virtual University of Pakistan CERTIFICATE This is to certify that Irfan Ullah Khan (BC190400474) have worked on and completed their Software Project at Software & Research Projects Section, Department of Computer Sciences, Virtual University of Pakistan in partial fulfillment of the requirement for the degree of BS in Computer Sciences under my guidance and supervision. In our opinion, it is satisfactory and up to the mark and therefore fulfills the requirements of BS in Computer Sciences. Supervisor / Internal Examiner Warda Fiaz Supervisor, Software Projects & Research Section, Department of Computer Sciences Virtual University of Pakistan ___________________ (Signature) External Examiner/Subject Specialist Sheikh Israr Ahmad ___________________ (Signature) Accepted By: _____________ (For office use) Page | 2 Diabetes Prediction using classification method EXORDIUM In the name of Allah, the Compassionate, the Merciful. Praise be to Allah, Lord of Creation, The Compassionate, the Merciful, King of Judgment-day! You alone we worship, and to You alone we pray for help, Guide us to the straight path The path of those who You have favored, Not of those who have incurred Your wrath, Nor of those who have gone astray. Page | 3 Diabetes Prediction using classification method DEDICATION To The Last Prophet Hazrat Muhammad (‫)ﷺ‬ All words dedicated with respect to LOVING PARENTS Whose love and prays always accompanies like a shining star whenever was in darkness. . Page | 4 Diabetes Prediction using classification method ACKNOWLEDGEMENT First of all, I am great full to the ALLAH Almighty the most Merciful and Beneficent who guides us in darkness and helps I difficulties. All respect for his prophet (P.B.U.U) whose gracious favor and blessings enable me to complete this research successfully. I regarded it a great honor and privilege to express my deepest sense of gratitude and appreciation to my learned worthy and honorable guide Mam Warda Fiaz for his great guidance encouragement excellent attitude and sincere personal involvement throughout the study. I am also thankful to all those who helped me in collection of relevant material. Irfan Ullah Khan Page | 5 Diabetes Prediction using classification method PREFACE Welcome to the project report on "Diabetes Prediction using Classification Method." This report presents an in-depth exploration of the application of classification methods in the prediction of diabetes, aiming to revolutionize the early detection and management of this prevalent chronic condition. The implementation of predictive analytics in healthcare, particularly in the domain of diabetes prediction, holds significant promise in identifying individuals at risk and facilitating timely intervention. This report aims to elucidate the fundamental concepts of classification algorithms, data preprocessing, feature selection, model evaluation, and the ethical considerations inherent in leveraging predictive analytics in healthcare. The system detailed in this report is envisioned to be invaluable for healthcare professionals, data scientists, researchers, and anyone intrigued by the intersection of technology and healthcare. It is designed to equip readers with the knowledge and insights necessary to harness classification methods for diabetes prediction, ultimately contributing to improved healthcare outcomes for individuals at risk of developing diabetes. I trust that this project report will serve as a comprehensive guide for understanding and implementing predictive analytics in the context of diabetes, fostering enhanced healthcare practices and outcomes. The duration of this project is 1 year provided by the university and I have successfully completed my project in the given time. Page | 6 Diabetes Prediction using classification method 1 Table of Contents PROBLEM STATEMENT AND CHALLENGES ........................... 8 1.1 1.2 1.3 1.4 1.5 Introduction ................................................................................................................... 9 Project problem ............................................................................................................. 9 Purpose ......................................................................................................................... 10 Scope ............................................................................................................................. 10 Summary ...................................................................................................................... 10 2 2.1 2.2 2.3 2.4 2.5 2.6 PROPOSED METHODOLOGY .. 12 Introduction ................................................................................................................. 13 Purpose ......................................................................................................................... 13 Scope ............................................................................................................................. 13 Dataset Detail ............................................................................................................... 13 Proposed Methodology ............................................................................................... 15 Data Pre-processing .................................................................................................... 17 3 3.1 IMPLEMENTATION ................... 19 Implementation ........................................................................................................... 20 3.1.1 3.1.2 3.1.3 3.1.4 3.2 Tools and Language .................................................................................................... 21 3.2.1 3.2.2 3.3 3.4 4 4.1 4.2 4.3 Data Acquisition ....................................................................................................................... 20 Exploratory Data Analysis (EDA) ............................................................................................ 20 Data Preprocessing.................................................................................................................... 20 Model Evaluation ...................................................................................................................... 20 Tools: ........................................................................................................................................ 21 Language: .................................................................................................................................. 21 DATASET DESCRIPTION ....................................................................................... 22 Data Pre-processing .................................................................................................... 23 RESULT AND DISCUSSION ..... 26 RESULTS of Algorithms ............................................................................................ 27 DISCUSSION FOR BEST.......................................................................................... 28 COMPARISON OF ALGORITHMS (DIAGRAM) ............................................... 29 5 REFERENCES ........................... 30 6 APPENDIX ................................. 31 Page | 7 Diabetes Prediction using classification method Problem statement and challenges CHAPTER 1 Problem statement and challenges Page | 8 Diabetes Prediction using classification method 1.1 Introduction Diabetes Prediction employing classification technique concentrates on the harnessing of machine learning and data science strategies to come up with a predictive model when it comes to diagnosing diabetes in the patients. Manufactured from a holistic data set; this is aimed at supporting both the patient and healthcare professional in finding and diagnosing diabetes accurately; it is really remarkable. The project aims at the following: bring in and look through the data, conduct Exploratory Data Analysis so as to get useful insights, preprocess the data, train the model through the Neural Network algorithm, test its performance, and then evaluate and compare the various models of SVM, Decision Tree, and Logistic Regression. To predict diabetes status, assess model performance using metrics like - Accuracy, - Precision, - Recall, and - F1 score, as well as to identify the most precise model for the prediction is the main purpose. The project will also place the utmost significance on conserving the highest efficacy of the model for future proceedings, which would mean the improvement of diabetes diagnosis by machine learning. There are two types of diabetes: Type 1 diabetes and Type 2 diabetes. Type 1 diabetes is a health problem frequently diagnosed in children or young adults when their immune system starts attacking and gradually destroying insulin-developing cells in the pancreas and thus they have to receive regular insulin injections. Type 2 diabetes, on the other hand, is the most common and it is found among people and typically older adults, the body's inability to utilize insulin properly, which is known as insulin resistance being one particular feature. Type 2 diabetes can be mostly supplemented by lifestyle changes, medicine, and sometimes insulin therapy. 1.2 Project problem The project seeks the proper model of machine learning in determining the symptoms of diabetes with the help of relevant data from the patients. To do that, the model will be advanced by using the Neural Network model, SVM, Decision Tree, and Logistic Page | 9 Diabetes Prediction using classification method Regression. The project is designed in the way a typical machine learning life cycle is followed, data import, exploratory data analysis, data preprocessing, model training, testing, and evaluation. The objective determines which model gives the highest precision in predicting diabetes and deposits the best-performing model for the future. 1.3 Purpose This project is aiming to create a tool for early detection and prevention of diabetes. Effective prediction of the likelihood of diabetes in people, doctors can act with toys such as changing food, medication, and regular monitoring which are for preventing or delaying the diabetes-related complications. 1.4 Scope This project encompasses the acquisition and preprocessing of pertinent health data, the selection of suitable classification algorithms, the training and assessment of the predictive model, and the deployment of the model for practical application. An essential aspect involves validating the model's accuracy and dependability using real-world data to ensure its practical efficacy in clinical environments. 1.5 Summary Using machine learning and data science techniques in this project leads to the development of a predictive model for diagnosing diabetes, thus assisting patients and health providers in the accurate identification of diabetes. Diverse data used in the project includes data importation, exploratory data analysis, pre-processing, and model training with algorithms like Neural Network, SVM, Decision Tree, and Logistic Regression. Goals comprised in this project are: diabetes status prediction, model performance assessment, and identification of the most precise model. Importance of saving the bestperforming model is highlighted, thereby contributing to improvements in diabetes diagnosis Page | 10 Diabetes Prediction using classification method using machine learning techniques. This study concentrates on Type 1 and Type 2 diabetes in order to predict its presence accurately hence providing possibilities for early detection and prevention. At this juncture where one can intervene through appropriate measures if there are any signs recurrently raised concerning an individual's health status in relation to specific diseases like diabetes mellitus that affect everyone regardless of age or sex, specific recommendations should be given. Page | 11 Diabetes Prediction using classification method 2 Proposed methodology CHAPTER 2 Proposed methodology Page | 12 Diabetes Prediction using classification method 2.1 Introduction The proposed methodology will use classification techniques to predict diabetes by categorizing data into different groups or classes according to certain characteristics. Classification is a machine learning method that groups data into classes based on some specific criteria. In the case of predicting diabetes, it will apply classification algorithms to some data available to predict how likely someone will be diagnosed with diabetes. 2.2 Purpose The reason of employing classification techniques for diabetes prediction is developing a reliable and accurate model that can help in timely detection and intervention for people who may develop this disease in the future. Its aim is to enhance the effectiveness of diabetes prediction through use of machine learning algorithms, which consequently allow for proactive healthcare measures as well as personalized interventions aimed at reducing disease burden 2.3 Scope The scope of this methodology includes: Data collection: Obtaining data that are connected with diabetes, such as demographic data, lifestyle history and medical records. Selecting Features: Identifying the key factors that predict the risk of contracting diabetes. Model building: Creation and training of classification models like logistic regression, decision tree or support vector machine aimed at studying diabetes. Model evaluation: How well does a prediction model perform? Some metrics we use in model evaluation include accuracy, sensitivity (recall), specificity or precision as well as AUC (area under the curve). Interpretation and application: A model’s results are interpreted so that there are actionable insights for health care personnel, while at the same time enabling various individuals to make informed decisions on how they could prevent or manage diabetes. Let us develop a strong and credible approach through these key areas of methodology 2.4 Dataset Detail Detail: Since there are many open datasets across the web, I am going to use the Pima Indians Dataset from the UCI Machine Learning Repo since it's reliable and contains much more relevant information in regard to the Page | 13 Diabetes Prediction using classification method various attributes used in detection and prediction of this disease such as Glucose level, Insulin, Age, Body Mass Index(BMI), Diabetes Pidegree Function and many more. You can find this dataset at the link below: https://www.kaggle.com/datasets/akhilalexander/diabeticprediction The dataset consist of several medical predictor (Independent) variable and one target variable (outcome). A predictor variable includes: Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, Diabetes Pedigree Function, age, and Outcome. All about Dataset: Name of the columns with their meanings 1) Pregnancies: Number of times pregnant 2) Glucose: Plasma glucose concentration a 2 hours in an oral glucose tolerance test 3) Blood Pressure: Diastolic blood pressure (mm Hg) 4) Skin Thickness: Triceps skin fold thickness (mm) 5) Insulin: 2-Hour serum insulin (mu U/ml) 6) BMI: Body mass index (weight in kg/(height in m)^2) 7) Diabetes Pedigree Function: Diabetes pedigree function 8) Age: Age (years) 9) Outcome: Class variable (0 or 1) 268 of 768 are 1 …. (Diabetic) 500 of 768 are 0 …. (Non-Diabetic) Dataset of Diabetes Prediction Diabetes Prediction using classification method Page | 14 2.5 Proposed Methodology Every software development methodology is a platform for carrying out specific steps that are meant to help in the growth and sustenance of software. Many software development tactics have been in play from the genesis of information technology. Some of the common existing methodologies in the IT industry include.  Problem Definition: Define diabetes prediction problem and its importance for medicine. It expresses the importance of developing an accurate and dependable model for early detection.  Literature Review: Conduct in-depth research on diabetes forecasting with respect to current literature using machine learning. While tackling diabetes diagnosis Neural Networks, SVM, Decision Trees, and Logistic Regression patterns have been explored by many scholars. Review methodologies, difficulties, consequences from similar studies."  Objective Refinement:: Using knowledge from reading, refine the deliverables of this study. These should be in line with the current trends in the field and should cover all the areas that were left unaddressed.  Data Collection: Get the diabetic dataset from Kaggle using the link that will be provided to you. It is important that this dataset is able to represent a broad range of situations and can be used for training different types of machine learning algorithms as well.  Exploratory Data Analysis (EDA): Try using statistical methods and data visualization tools to familiarize yourself with the dataset. Find out how the various features are distributed, try spotting patterns and most importantly figure out the possible indicators for diabetes.  Data Pre-processing: Divide data set into 70% for training and 30% for testing purposes. This involves different preparation steps such as normalizing variables or imputing missing values that may be present in your dataset."  Model Training: A Neural Network is trained to predict diabetes using the training dataset. Optimize hyper parameters and monitor training progress.  Model Application: Page | 15 Diabetes Prediction using classification method Use SVM, Decision Tree, and Logistic Regression models in the training data. The results are then compared with those obtained from the Neural Network model.  Evaluation Metrics: Evaluate how well the model performs by confusion matrix metrics such as Accuracy, Precision, Recall, F1 score and also more in-depth analysis like assessing on the common falsehood. Find out which model has the highest Accuracy and decide whether it is the best one.  Results Analysis: Analyze and interpret the results, making comparisons between different models. Identify strengths and limitations for each model in predicting diabetes.  Model Selection: Choose the best model based on its evaluation metrics for future usage.  Save Model: Implementing a mechanism to save selected models for future predictions ensures efficiency and quick access to forecast capability.”  Ethical Considerations: Discuss the ethical considerations around data privacy, consent and predictive models being responsibly used; make sure you follow all ethical standards while using the data of patients.  Documentation and Reporting: In the documentation process, make sure you have taken note of every research step which includes methodologies used, decisions made during the process as well as their outcomes; give a detailed report showing research findings, model performance as well as possible future work recommendations. Algorithms used in this Project: a) Logistic Regression This is when the machine learning algorithm or AI is at work to sort an outcome between “zero” and “one”. b) Support Vector Machine (SVM): This is machine learning technique that attempts to logically divide samples into categories according to some criterion. c) Decision Tree: Page | 16 Diabetes Prediction using classification method At each Node of tree, the optimal condition splits our data into two based on some feature Diabetes Patient Dataset Data Preprocessing Data Splitting Training Dataset (70%) Test Dataset (30%) Apply Machine Learning Models (1) Logistic Regression (2) SVM (3) Decision Tree Performance Evaluation on Various Metrics Compare Analysis based on Accuracy Result 2.6 Data Pre-processing The data pre-processing steps for diabetes prediction using classification methods entail essential procedures that are meant to increase the accuracy and efficiency of predictive models. In diabetes prediction context, data pre-processing helps in classifying algorithms dataset preparation where classified. Below is a brief description of how these activities are done:  Data Cleaning: Data Integration  Data Transformation  Data Reduction Page | 17 Diabetes Prediction using classification method  Data Discretization  Data Normalization The data mining and machine learning procedure will be more effective and accurate by carrying out such data preprocessing tasks. It is necessary to perform this step on data to obtain information that may be considered accurate as well as meaningful by the consuming public. The predictive models for diabetes classification using machine learning methods can have their accuracy, efficiency and robustness during predicting diabetes outcomes improved if the steps of careful data preprocessing listed below are followed correctly. Page | 18 Diabetes Prediction using classification method 3 Implementation CHAPTER 3 Implementation Page | 19 Diabetes Prediction using classification method 3.1 Implementation Here’s a detailed breakdown of the implementation process that is involved in implementing this project which includes various main steps for building and testing the diabetes prediction model. 3.1.1 Data Acquisition Get the diabetes dataset at this location (https://www.kaggle.com/datasets/akhilalexander/diabeticprediction). Get it into the system that it can be processed further. 3.1.2 Exploratory Data Analysis (EDA) Perform Exploratory Data Analysis (EDA) in order to understand more about the dataset. Figure out some of the facts, ongoing processes, sequences and associations present within this dataset by studying summary statistics. Based on some of the above steps, draw diagrams for better comprehension about various properties in this data." 3.1.3 Data Preprocessing To deal with any missing values, use proper techniques such as imputation. Transform categorical data into formats that can work with machine learning algorithms. Ensure numerical features are on similar scales. Divide the dataset into training (70%) and testing (30%) subsections. Train the Neural Network model with the training data. Optimize hyper parameters for improving performance in Neural Networks. 3.1.4 Model Evaluation Application of the trained Neural Network model on test data Performance evaluation of the model using metrics such as Precision, Recall, Accuracy and F1 score Construction of a confusion matrix to evaluate how well the model performs in terms of prediction ability 6. Comparative Analysis Train other models with the algorithms SVM, Decision Tree and Logistic Regression. Using the same set of metrics as employed in step 5; assess how well each model performs. Identify the best model out of all models for diabetes prediction so that its productivity is maximized 7. Model Selection and Saving Choose the most accurate model specifically in predicting whether a person has diabetes or not save the chosen mode for future utilization and deployment the primary objective of this project is to enable patients and healthcare providers detect and prevent diabetes as early as possible by creating the best possible diabetes prediction tool By following this implementation process, the project aims to develop an accurate and reliable diabetes prediction model that can assist patients and healthcare professionals in early detection and prevention of diabetes. Page | 20 Diabetes Prediction using classification method 3.2 Tools and Language To implement the diabetes prediction project a combination of tools and programming languages that meet machine learning and data science requirements will be used. Here is a list of essential tools and languages that will be required in the study: 3.2.1 Tools: 1. 2. 3. 4. Windows 7 or higher Google Chrome Browser Anaconda Python: Python will serve as the primary programming language for developing the machine learning models due to its extensive libraries for data manipulation, analysis, and modeling (e.g., Numpy, Pandas, and Scikit-learn). 5. Excel: We used Excel for dataset. 6. Jupyter Notebook: Jupyter Notebook provides an interactive environment for code development, data visualization, and documentation, making it ideal for exploratory data analysis and model prototyping. 3.2.2 Language: Python: Python is a versatile and powerful programming language that has a wide range of applications and a promising future scope. Here's why Python is considered one of the best programming languages:  Diverse Applications: Data science, web development, AI, automation, IoT, scientific computing  Easy to Learn and Use: Simple, intuitive syntax, emphasis on readability  Extensive Libraries and Ecosystem: Vast community-developed tools and frameworks  Cross-Platform Compatibility: Runs on Windows, macOS, Linux  Growing Demand and Job Opportunities: High and increasing adoption across industries Python's wide-ranging capabilities, ease of use, and thriving ecosystem make it one of the best programming languages, with a promising future outlook across various domains.  Pandas:  Scikit-learn: Scikit-learn will be utilized for implementing machine learning algorithms like SVM, Decision Tree, and Logistic Regression, enabling model training and evaluation.  Matplotlib and Seaborn: These visualization libraries will be employed for creating informative plots and graphs to visualize data patterns and model performance.  NumPy : NumPy is a powerful Python library for scientific computing. Python is broad in scope and one of the most versatile, easy to use, and widely applied programming languages because of its extensive ecosystem. More importantly, as Python’s popularity increases, it appears likely that there will be brighter days ahead, particularly considering the field of data science, machine learning and web development. Page | 21 Diabetes Prediction using classification method 3.3 DATASET DESCRIPTION The objective of this project is to predict whether patient has diabetes or not. The dataset consist of several medical predictor (Independent) variable and one target variable (outcome). A predictor variable includes: Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, Diabetes Pedigree Function, age, and Outcome. All about Dataset: Name of the columns with their meanings 1) Pregnancies: Number of times pregnant 2) Glucose: Plasma glucose concentration a 2 hours in an oral glucose tolerance test 3) Blood Pressure: Diastolic blood pressure (mm Hg) 4) Skin Thickness: Triceps skin fold thickness (mm) 5) Insulin: 2-Hour serum insulin (mu U/ml) 6) BMI: Body mass index (weight in kg/(height in m)^2) 7) Diabetes Pedigree Function: Diabetes pedigree function 8) Age: Age (years) 9) Outcome: Class variable (0 or 1) There are 9 Attributes and 768 Instances. 268 of 768 are 1 … (1 indicates Diabetic) 500 of 768 are 0… (0 indicates Non-Diabetic) Page | 22 Diabetes Prediction using classification method Dataset of Diabetes Prediction 3.4 Data Pre-processing The data pre-processing steps for diabetes prediction using classification methods entail essential procedures that are meant to increase the accuracy and efficiency of predictive models. In diabetes prediction context, data pre-processing helps in classifying algorithms dataset preparation where classified. Below is a brief description of how these activities are done: Data Cleaning: It deals with spotting and rectifying inaccuracies or disparities in records such as absence of figures, unusual things, and copies. Ways to clean data include filling in missing data, deleting data and data conversion. Page | 23 Diabetes Prediction using classification method Data Integration: It involves coming up with a single set of data by bringing together different data from various sources. It becomes a bit of a challenge to merge data because they come in different formats, structures, and meanings. When it comes to data integration, record linkage and data fusion should also be considered as some of the techniques that can be used. Data Transformation: This entails preparing the data in a way that it can be analyzed, and some of the common methods are normalization, standardization, and discretization. Normalization scales the data i.e., brings the range of the variables to a common standpoint whereas standardization transforms the data such that its mean is zero while the standard deviation is one. Discretization on the other hand is a process through which continuous information is divided into several categories and further grouped. Data Reduction: Such data reduction methods include extracting features that are used to represent highlevel data as in methods like principal component analysis (PCA) and linear discriminant analysis (LDA). Extracting such features usually entails estimation of eigenvectors or singular vectors corresponding to the largest eigenvalues or singular values of the matrix that results from applying one of the multiple aforementioned algorithms. Data Discretization: This is where data is partitioned into parts which are separate and distinct instead of being continuous. Basically, discretization is always used for algorithms where categorical input is required like in machine learning and data mining. Ways of making data discrete include but aren’t limited to equal width binning technique, clustering or even equal frequency binning technique. Data Normalization: Normalization is only applicable to numerical columns. There are five common normalization methods: 1. Single feature scaling 2. Min-max scaling 3. Z-score normalization 4. Log scaling 5. Clipping This consist of shuffling the data into a uniform set of data such as 0-1 or -1 , Normalizing is often used for different unit data and large scales Normalization techniques include min – max , z-score and decimal scaling. The data mining and machine learning procedure will be more effective and accurate by carrying out such data preprocessing tasks. It is necessary to perform this step on data to obtain information that may be considered accurate as well as meaningful by the consuming public. Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. For machine learning, every dataset does not require normalization. Removing Null Values: There are a few ways to remove null values in Python:  Using dropna() in Pandas: The dropna() method allows you to drop rows or columns with null values.  Using filter() function: The built-in filter() function can be used to create a new list without null values. Page | 24 Diabetes Prediction using classification method  Using list comprehension: You can use a list comprehension to create a new list without null values. By removing null values; you can ensure that your data is clean and ready for further analysis or model training. This can lead to more accurate and reliable results, as well as more efficient computations. In summary, removing null values is a crucial data cleaning step in Python, as it helps to ensure the quality and reliability of your data and analysis. Out Put of Null value: So there is no null value in our project. Page | 25 Diabetes Prediction using classification method 4 Result and discussion CHAPTER 4 Result and discussion Page | 26 Diabetes Prediction using classification method 4.1 RESULTS of Algorithms In this project we have use 3 algorithms which are given below: 1) Logistic Regression: 2) Support Vector Machine (SVM): 3) Decision Tree Logistic Regression A logistic regression model is a supervised learning algorithm that is used to classify binary data. It will provide the probability that a data point is binary as a function of one or more predictor variables. For example logistic regression is commonly used in predicting customer churn rates, credit card fraud detection systems and medical diagnosis The result of this algorithm is : Algorithm Precision Recall F-1 Accuracy % Logistic Regression 0.74 0.74 0. 0.74 74.025974 Support Vector Machine (SVM) Support Vector Machine (SVM) is an algorithm for supervised learning which can serve for both regression and classification jobs. In finding the feasible hyper plane which partitions different group maximally, SVMs are employed. SVMs are acknowledged for their suitability towards disparate data while being especially powerful in high-dimensional scenarios. Image recognition, text categorization, and bioinformatics are the typical incidences where they are applied. The result of this algorithm is: Algorithm SVM Precision Recall F-1 Accuracy % 0.73 0.74 0.72 73.593074 Page | 27 Diabetes Prediction using classification method Decision Tree A decision tree is an algorithm that tells you if an observation is good or bad. It compares the features of the current record with the branch of the tree and follows it to the bottom. There are many nodes at each stage and various decisions occur at each node. All these decisions lead to a single conclusion about an observation as you want either positive judgment or a negative. However, it recursively partitions the data based on the feature that provides the most information gain leading to a hierarchy of_ decisions sill this decision is reached; hence it is known as recursive partitioning. Decision Trees are intuitive, easy to interpret, and can handle both numerical and categorical data. They are used in various fields such as medical diagnosis, quality control, and decision support_ systems for_ industrial processes. The result of this algorithm is: Algorithm Decision Tree Precision Recall F-1 Accuracy % 0.70 0.68 0.69 68.396268 4.2 DISCUSSION FOR BEST After successfully run of a 3 algorithm we get this result: Algorithm Precision Recall F-1 Accuracy % Logistic Regression 0.74 0.74 0. 0.74 74.025974 SVM Decision Tree 0.73 0.70 0.74 0.68 0.72 0.69 73.593074 68.396268 Logistic Regression provides us with higher accuracy in comparison with other methods. Its accuracy is 74.025974%. It is a binary classification learning algorithm that is supervised and models the probability of a certain binary output such as yes/no or 1/0 based on a corresponding variable. And it is important to know that the outcome variable being analyzed is binary such as 1 or 0. Therefore, the solution is the most appropriate. Page | 28 Diabetes Prediction using classification method 4.3 COMPARISON OF ALGORITHMS (DIAGRAM) Page | 29 Diabetes Prediction using classification method 5 REFERENCES 1. 2. 3. 4. 5. 6. www.academia.edu www.slideshare.net www.uml.org https://www.javatpoint.com/machine-learning-life-cycle https://www.kaggle.com/datasets/akhilalexander/diabeticprediction https://ocw.vu.edu.pk/Videos.aspx?cat=Computer+Science%2fInformation+Technology+&course= CS607 7. https://www.youtube.com/watch?v=_u-PaJCpwiU&list=PLu0W_9lII9ai6fAMHpacBmJONT7Y4BSG&index=1 8. https://vulms.vu.edu.pk/Courses/CS607/Downloads/AI_Complete_handouts_for_Printing.df 9. https://www.programiz.com/python-programming 10. https://www.tutorialspoint.com/python/index.htm 11. https://www.tutorialspoint.com/python/index.htm 12. https://www.w3schools.com/python/ 13. https://machinelearningmastery.com/machine-learning-in-python-step-by-step/ 14. https://www.geeksforgeeks.org/machine-learning-with-python/ 15. https://www.youtube.com/watch?v=ZftI2fEz0Fw 16. https://www.simplilearn.com/10-algorithms-machine-learning-engineers-need-to-know-article 17. https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/ 18. https://www.anaconda.com/ 19. https://www.python.org/ 20. https://jupyter.org/ 21. https://www.academia.edu/42929334/Diabetes_prediction_using_classification_algorithms_for_Weka 22. CS504 Software Engineering – I 23. CS605 Software Engineering – II 24. CS403 Database Management System 25. CS614 SOFTWARE PROJECT MANAGEMENT Page | 30 Diabetes Prediction using classification method 6 APPENDIX Chapter 1 PROBLEM STATEMENT AND CHALLENGES Chapter 2 PROPOSED METHODOLOGY Chapter 3 IMPLEMENTATION Chapter 4 RESULT AND DISCUSSION Author: Irfan Ullah Khan [email protected] [email protected] Page | 31 Diabetes Prediction using classification method

References (1)

CS504 Software Engineering -I 23. CS605 Software Engineering -II

FAQs

What are the key classification algorithms used in diabetes prediction?add

The study employs Neural Networks, Support Vector Machines, Decision Trees, and Logistic Regression for diabetes prediction.

How does logistic regression fare against other models in accuracy?add

Logistic Regression achieves an accuracy of 74.03%, outperforming other methods tested in the study.

What data preprocessing steps were essential for model training?add

Key data preprocessing steps included normalization, imputation of missing values, and dataset splitting into training and testing.

What evaluation metrics were used to assess model performance?add

Model performance was evaluated using accuracy, precision, recall, F1 score, and confusion matrix analysis.

How does the methodology ensure clinical applicability of the model?add

The methodology emphasizes real-world data validation and interpretable results to ensure actionable insights for healthcare professionals.

About the author

IRFAN ULLAH KHAN

Virtual University Of Pakistan, Undergraduate

Busy with Machine Learning & Data Science

Papers

Followers

View all papers from IRFAN ULLAH KHANarrow_forward

Diabetes_Prediction_Using_Classification_Method

Sign up for access to the world's latest research

Abstract

Key takeawaysAI

Related papers

References (1)

FAQs

Related papers

Related topics

Key takeaways
AI