
International Conference on Emerging Technologies in Computer Engineering

ICETCE 2022: Emerging Technologies in Computer Engineering: Cognitive Computing and Intelligent IoT, pp 39–51

Detection of Liver Disease Using Machine Learning Techniques: A Systematic Survey


Part of the Communications in Computer and Information Science book series (CCIS, volume 1591)

The rapid growth in the number of patients suffering from liver disease is a major concern worldwide. Liver disease is currently identified through liver biopsy and visual inspection of MRI scans by trained experts, which is tedious and time-consuming. There is therefore a need for an automated diagnosis system that delivers results quickly and with high accuracy. Researchers have proposed various models for detecting liver disease and its severity using machine learning algorithms. This paper presents a systematic and comprehensive review of that work, focusing on the machine learning techniques developed for liver disease prediction. It also compares the performance of the algorithms and examines the datasets used by the various authors. Finally, the conclusion discusses the challenges involved in liver disease prediction and the scope for future work.



Author information

Authors and affiliations.

Ajay Kumar Garg Engineering College, Dr. A.P.J. Abdul Kalam Technical University, Uttar Pradesh, Ghaziabad, India

Geetika Singh, Charu Agarwal & Sonam Gupta


Corresponding author

Correspondence to Geetika Singh.

Editor information

Editors and affiliations.

Aurel Vlaicu University of Arad, Arad, Romania

Prof. Valentina E. Balas

Myanmar Institute of Information Technology, Mandalay, Myanmar

Dr. G. R. Sinha

Indian Institute of Information Technology Kota, Jaipur, India

Dr. Basant Agarwal

Shobhit University, Gangoh, India

Tarun Kumar Sharma

SKIT, Jaipur, India

Pankaj Dadheech

Mehul Mahrishi


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Singh, G., Agarwal, C., Gupta, S. (2022). Detection of Liver Disease Using Machine Learning Techniques: A Systematic Survey. In: Balas, V.E., Sinha, G.R., Agarwal, B., Sharma, T.K., Dadheech, P., Mahrishi, M. (eds) Emerging Technologies in Computer Engineering: Cognitive Computing and Intelligent IoT. ICETCE 2022. Communications in Computer and Information Science, vol 1591. Springer, Cham.



Published: 26 May 2022

Publisher Name: Springer, Cham

Print ISBN: 978-3-031-07011-2

Online ISBN: 978-3-031-07012-9

eBook Packages: Computer Science, Computer Science (R0)


Prediction of Liver Disease using Machine Learning Models with PCA



Brief Research Report Article

Fatty Liver Disease Prediction Model Based on Big Data of Electronic Physical Examination Records


Fatty liver disease (FLD) is a common liver disease that poses a great threat to people's health, but there is still no optimal method for large-scale screening. This research uses machine learning algorithms, with electronic physical examination records from a health database as data support, to build a predictive model for FLD. The model shows good predictive ability on the test set, with an AUC of 0.89. Since most health databases contain large numbers of electronic physical examination records, this model might be used as a non-invasive diagnostic tool for large-scale FLD screening.

1. Introduction

Fatty liver disease (FLD) is a lesion characterized by excessive accumulation of fat in liver cells and is divided into non-alcoholic fatty liver disease (NAFLD) and alcoholic fatty liver disease (AFLD) (1). In recent years, with improved living standards, changes in lifestyle and diet, and the wide use of ultrasound and other imaging technology, the prevalence of FLD has grown rapidly (2). It has become the most common cause of chronic liver disease in both developed and developing countries (3). Studies estimate that about 25% of people worldwide and 21% of people in China have NAFLD (4, 5).

At present, the pathogenesis of NAFLD is not completely understood and there is no ideal, effective drug treatment, but the disease is reversible in its early stages. Research shows that lifestyle interventions such as energy restriction, dietary changes, and increased physical activity are particularly effective in the early stages of NAFLD (6). Early detection and treatment are therefore key. The main clinical diagnostic methods today are ultrasound, CT, and liver biopsy (7). Because of their invasiveness and complexity, they are not suitable for large-scale epidemiological screening (8–10).

Given this situation, many researchers have tried to use machine learning algorithms to build prediction models for FLD, and several models based on medical data have been proposed in recent years (11–13). The Italian scholar Giorgio Bedogni collected data on gender, age, alcohol intake, alanine aminotransferase, aspartate aminotransferase, body mass index (BMI), waist circumference, the sum of four skinfolds, etc., and established a prediction model for NAFLD (13). However, most of these models are built from questionnaire surveys and medical experiments and use features that are not easy to obtain in large quantities. The limited amount of data and the complexity of the features make these models difficult to generalize.

The purpose of this study is to establish an efficient and convenient FLD prediction model using a machine learning algorithm. Such a model can help doctors screen out patients who need further liver examination and can be applied to large-scale epidemiological screening. To facilitate generalization, we use features that are as easy to obtain as possible and as much data as possible.

2. Materials and Methods

2.1. Dataset

The development of the medical system, the popularity of electronic physical examination records, and the establishment of health databases provide data support for large-scale epidemiological research. The data set used in this study comes from the health database of a hospital in China and contains the electronic physical examination records of 44,854 patients. It includes no private patient information, only routine physical examination data and age. To simplify and generalize the model, we extracted only 129 routine physical examination items for all patients, including routine blood tests, biochemistry, routine urine tests, etc.

In this study, patients diagnosed with FLD by ultrasound were labeled 1 and the remaining patients were labeled 0. The prevalence of FLD in the data set is 23%, which is close to previously reported figures (5).

2.2. Data Preprocessing

First, for the accuracy of the model, we deleted individuals who had not undergone ultrasound examination, because we could not know whether they had FLD. Then, we deleted the items with more than 2/3 missing values, that is, items for which most people had not been examined. Finally, we randomly selected 70% of the data set as the training set and the remaining 30% as the test set.
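The three preprocessing steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the record layout, the column names `fld`, `TG`, and `rare_item`, the helper names, and the random seed are all hypothetical, and the study's actual implementation is not given in the text.

```python
import random

def drop_sparse_items(rows, max_missing_frac=2 / 3):
    """Drop examination items (columns) whose share of missing values exceeds the limit."""
    n = len(rows)
    keep = [col for col in rows[0]
            if sum(r[col] is None for r in rows) / n <= max_missing_frac]
    return [{col: r[col] for col in keep} for r in rows]

def split_train_test(rows, train_frac=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy records; None marks a missing value.
records = [{"fld": i % 2, "TG": 1.0 + 0.1 * i, "rare_item": 3.0 if i == 0 else None}
           for i in range(10)]
records.append({"fld": None, "TG": 1.4, "rare_item": 2.0})  # no ultrasound result

labelled = [r for r in records if r["fld"] is not None]  # step 1: keep known labels only
cleaned = drop_sparse_items(labelled)                     # step 2: drop sparse items
train, test = split_train_test(cleaned)                   # step 3: 70/30 random split
print(sorted(cleaned[0]), len(train), len(test))
```

Splitting before any statistics are computed keeps later steps, such as the group medians used for imputation, from seeing the test set.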

Figure 1 shows the data preprocessing process. Figure 2 shows the mean (standard deviation) of different features for FLD patients and normal people, and whether each feature passed the chi-square test at a significance level of 0.05. There are significant differences between FLD patients and normal people in male gender percentage, uric acid (UA), triglycerides (TG), alanine aminotransferase (ALT), aspartate aminotransferase (AST), gamma-glutamyl transpeptidase (GGT), age, and AST/ALT, whereas carbon dioxide (CO2), total bilirubin (TBIL), total protein (TP), and anion gap show no significant difference.

Figure 1. Data preprocessing flowchart.

Figure 2. Statistical information and chi-square test results of different features in different groups.

2.3. Missing Value Processing

Compared with medical experiments and questionnaire surveys, modeling on electronic physical examination records from a health database has the advantage that the amount of data is large and the model generalizes easily, but the disadvantage that there are many missing values. How to fill in missing values is therefore critical to modeling. The usual practice is to fill in the overall mean or median. However, the distribution of medical indicators varies with gender and age, and the range is large, so we instead fill in the median computed within each age and gender group.
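A minimal sketch of group-wise median imputation is shown below. The field names `age_group`, `sex`, and `UA`, the numeric values, and the fallback to the overall median for groups with no observed value are assumptions made for illustration, not details taken from the study.

```python
from collections import defaultdict
from statistics import median

def fill_by_group_median(rows, feature, group_keys=("age_group", "sex")):
    """Fill missing values of one feature with the median of the same age/sex group."""
    groups = defaultdict(list)
    for r in rows:
        if r[feature] is not None:
            groups[tuple(r[k] for k in group_keys)].append(r[feature])
    medians = {g: median(vals) for g, vals in groups.items()}
    # Assumed fallback: overall median when a group has no observed value.
    overall = median(v for vals in groups.values() for v in vals)
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows stay untouched
        if r[feature] is None:
            r[feature] = medians.get(tuple(r[k] for k in group_keys), overall)
        filled.append(r)
    return filled

# Hypothetical uric acid (UA) values for one age group, split by sex.
rows = [
    {"age_group": "(29, 35]", "sex": "M", "UA": 400.0},
    {"age_group": "(29, 35]", "sex": "M", "UA": 420.0},
    {"age_group": "(29, 35]", "sex": "M", "UA": None},
    {"age_group": "(29, 35]", "sex": "F", "UA": 280.0},
    {"age_group": "(29, 35]", "sex": "F", "UA": None},
]
filled = fill_by_group_median(rows, "UA")
print(filled[2]["UA"], filled[4]["UA"])
```

In practice the group medians would be computed on the training set only and then reused to fill the test set.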

Standard age groups could be used for grouping, but the result is not ideal, so we use the chi-square binning algorithm to group ages. Chi-square binning is a binning algorithm based on the chi-square test of independence. Its theoretical basis is that the lower the chi-square value between two bins, the more likely they are to have similar distributions (14). If two adjacent bins have very similar distributions they should be merged; otherwise they should be kept separate. In each step of the algorithm, the two adjacent bins with the smallest chi-square value are therefore merged, until the number of bins meets the stopping condition.

In the present study, a bin is an age group and the distribution is the prevalence of FLD. We set the expected number of bins to 5, and the result computed on the training set is: [0, 17], (17, 29], (29, 35], (35, 47], (47, 197]. Based on this age grouping, Figure 3 shows the distribution of several important features with missing values across age and gender groups. The distributions differ clearly between groups, which supports our strategy of filling in missing values by group.
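The merging procedure can be sketched as a small ChiMerge-style routine. It is a simplified illustration: the function names and toy data are hypothetical, each distinct age starts as its own bin, and the only stopping condition is the target number of bins.

```python
def chi2_pair(a, b):
    """Chi-square statistic for two adjacent bins, each given as (negatives, positives)."""
    total = sum(a) + sum(b)
    chi2 = 0.0
    for j in range(2):                       # j = 0: no FLD, j = 1: FLD
        col = a[j] + b[j]
        for row in (a, b):
            expected = sum(row) * col / total
            if expected > 0:
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2

def chi_merge(ages, labels, n_bins):
    """Merge adjacent age bins with the smallest chi-square until n_bins remain."""
    counts = {}
    for age, y in zip(ages, labels):
        counts.setdefault(age, [0, 0])[y] += 1
    # Each bin is (lowest age, highest age, (negatives, positives)).
    bins = [(age, age, tuple(c)) for age, c in sorted(counts.items())]
    while len(bins) > n_bins:
        scores = [chi2_pair(bins[i][2], bins[i + 1][2]) for i in range(len(bins) - 1)]
        i = scores.index(min(scores))        # most similar adjacent pair
        lo, _, a = bins[i]
        _, hi, b = bins[i + 1]
        bins[i:i + 2] = [(lo, hi, (a[0] + b[0], a[1] + b[1]))]
    return bins

# Toy data: prevalence is 10% at ages 20 and 21, and 50% at ages 40 and 41.
ages = [20] * 10 + [21] * 10 + [40] * 10 + [41] * 10
labels = ([0] * 9 + [1]) * 2 + ([0] * 5 + [1] * 5) * 2
result = chi_merge(ages, labels, n_bins=2)
print([(lo, hi) for lo, hi, _ in result])
```

Ages with the same prevalence are merged first, so the two surviving bins separate the low-prevalence ages from the high-prevalence ones.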

Figure 3. Violin chart: the distribution of different features under different age groups and genders.

2.4. Feature Engineering

In machine learning modeling, the quality of the features often determines the upper bound of model performance, so we perform feature engineering on the existing routine features to make the most of them. In clinical diagnosis, combinations of multiple characteristics often play an important role in judging disease. For example, AST/ALT (aspartate aminotransferase/alanine aminotransferase) is of great significance in the diagnosis of liver diseases (1). We therefore generate new features by combining existing ones.

In the present study, we use Spearman's correlation coefficient as the criterion for feature quality and use a genetic algorithm to search for the optimal solution. Spearman's correlation coefficient, also known as the rank correlation coefficient, measures the rank correlation between two variables. Because a decision tree splits on thresholds and is therefore sensitive only to the ordering of feature values, the Spearman coefficient is a suitable measure of the correlation between a feature and the target when the model is tree-based. The genetic algorithm searches for the optimal solution by simulating natural evolution (15, 16). It transforms the problem-solving process into one that resembles the crossover and mutation of chromosomal genes in biological evolution. On complex combinatorial optimization problems, it can usually obtain better results faster than some conventional optimization algorithms (16).
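To make the fitness criterion concrete, the following sketch computes Spearman's coefficient as the Pearson correlation of average ranks. The helper names and the toy values are illustrative assumptions; a library routine would normally be used instead.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

feature = [1, 2, 3, 4, 5]
cubed = [v ** 3 for v in feature]   # monotone but non-linear transform
print(spearman(feature, cubed), pearson(feature, cubed))
```

The cubed feature has a perfect rank correlation with the original even though the relationship is not linear, which is the same invariance a threshold-based tree split has.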

Figure 4 shows the feature engineering process using the genetic algorithm. An individual in the population is defined as a binary tree. Each leaf node is a feature in the data set, and each inner node is an operator in {+, -, *, /, log, sqrt}, so each individual represents an expression composed of features and operators. Fitness is the Spearman correlation coefficient between the new feature and the target. In each generation, individuals with high fitness are retained and individuals with low fitness are eliminated. The upper left of Figure 5 shows an example individual, which represents TG * AST + GLU. The upper right and lower parts of Figure 5 show the crossover and mutation operations, respectively, both of which generate new individuals by changing subtrees in a way that simulates biological variation.

Figure 4. Genetic algorithm flowchart.

Figure 5. Demonstration of individual and individual variation.

We set the number of individuals in each generation to 1,000 and the maximum depth of the binary tree to three. Using the normalized features and iterating for ten generations, we add the three individuals with the highest fitness to the data set as new features. The result is: GA_fea1 = TG + log(ALT) with fitness 0.89, GA_fea2 = TG * GGT with fitness 0.87, and GA_fea3 = (UA + AST) * log(ALT) with fitness 0.79.
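One way to represent such individuals is as nested tuples, as in the sketch below. The representation, the helper names, the sample values, and the protected forms of division, log, and sqrt (returning 0.0 on invalid input) are assumptions for illustration; the text does not say how the study handles these cases.

```python
import math

BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,        # protected division (assumption)
}
UNARY = {
    "log": lambda a: math.log(a) if a > 0 else 0.0,    # protected log (assumption)
    "sqrt": lambda a: math.sqrt(a) if a >= 0 else 0.0,
}

def evaluate(tree, row):
    """Evaluate an expression tree on one record; leaves are feature names."""
    if isinstance(tree, str):
        return row[tree]
    op, *children = tree
    args = [evaluate(c, row) for c in children]
    return BINARY[op](*args) if op in BINARY else UNARY[op](*args)

def depth(tree):
    """Number of operator levels above the deepest leaf."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(c) for c in tree[1:])

def replace_subtree(tree, path, new_subtree):
    """Return a copy of tree with the subtree at path (child indices) replaced."""
    if not path:
        return new_subtree
    op, *children = tree
    children[path[0]] = replace_subtree(children[path[0]], path[1:], new_subtree)
    return (op, *children)

# The example individual from Figure 5: TG * AST + GLU.
individual = ("+", ("*", "TG", "AST"), "GLU")
row = {"TG": 2.0, "AST": 30.0, "GLU": 5.5, "ALT": 1.0}  # hypothetical values
mutated = replace_subtree(individual, [0], ("log", "ALT"))
print(evaluate(individual, row), depth(individual), mutated)
```

Mutation replaces a subtree with a new one, and crossover can be built from the same helper by using a subtree taken from a second individual. The fitness of a candidate is then the Spearman coefficient between its evaluated values and the target.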

3. Experiments and Results

XGBoost (eXtreme Gradient Boosting) is an engineering implementation of the gradient boosting decision tree (GBDT) (17). Its core idea is to perform a second-order Taylor expansion of the loss function and to train decision trees step by step based on the resulting objective function, which greatly speeds up model training (18, 19). XGBoost has several advantages. Traditional GBDT uses only first-order derivative information in optimization, while XGBoost performs a second-order Taylor expansion of the cost function to make the result more accurate. XGBoost adds a regularization term to the cost function to control model complexity, which reduces the variance of the model, makes the learned model simpler, and helps prevent overfitting. XGBoost supports parallel computing at feature granularity, which greatly improves training speed. In addition, because XGBoost is based on decision trees, it is more interpretable than neural networks and similar algorithms, which helps us understand how each physical examination item contributes to the model (20). The present study therefore uses XGBoost for modeling.
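For reference, the second-order expansion can be written out as follows. This is the standard form given in the XGBoost paper (17), not something specific to this study; the symbols $g_i$, $h_i$, $T$, $w_j$, $\gamma$, and $\lambda$ follow that paper's notation, and the L1 penalty that the XGBoost library also offers is left out for brevity.

```latex
\begin{aligned}
\mathcal{L}^{(t)} &\approx \sum_{i=1}^{n}\Big[\, l\big(y_i,\hat{y}_i^{(t-1)}\big)
    + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big] + \Omega(f_t), \\
g_i &= \frac{\partial\, l\big(y_i,\hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^2 l\big(y_i,\hat{y}_i^{(t-1)}\big)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^2}, \\
\Omega(f_t) &= \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2,
\qquad
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}.
\end{aligned}
```

Here $f_t$ is the tree added at step $t$, $T$ is its number of leaves, $I_j$ is the set of samples that fall in leaf $j$, and $w_j^{*}$ is the optimal weight of that leaf. The $h_i$ term is what distinguishes this update from a first-order GBDT step, and $\Omega$ is the regularization term that penalizes model complexity.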

The error of a machine learning model has two components: variance and bias (21). High-bias models usually have relatively simple parameter settings and tend to underfit: performance differs little between the training set and the test set, but both are relatively low. High-variance models usually have complex parameter settings and tend to overfit: they perform well on the training set, but performance on the test set drops sharply. The usual practice is to trade off variance against bias to obtain a reasonable model. AUC (Area Under Curve) is the area under the ROC (Receiver Operating Characteristic) curve and is a commonly used indicator of the quality of a machine learning model (22). AUC is insensitive to the ratio of positive to negative samples and reflects, to a certain extent, the model's ability to rank samples (23). In the present study, we use AUC as the evaluation criterion for the XGBoost model. On the training set, Bayesian optimization of the hyperparameters is performed with three-fold cross-validation, and the results are then fine-tuned to prevent overfitting and keep the parameters reasonable. The main results are: max_depth: 3, learning_rate: 0.07, n_estimators: 150, scale_pos_weight: 2, min_child_weight: 6, gamma: 0.2, reg_alpha: 0.1.
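The ranking interpretation of AUC can be illustrated with a short sketch: AUC equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as half. The labels and scores below are made up for illustration, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical predictions: two FLD patients and three healthy people.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]

# Duplicate every negative sample to change the class ratio.
labels_more_neg = labels + [0, 0, 0]
scores_more_neg = scores + [0.5, 0.3, 0.1]

print(auc(labels, scores), auc(labels_more_neg, scores_more_neg))
```

Duplicating the negative samples changes the class ratio but leaves the AUC unchanged, which is the sense in which AUC does not depend on the ratio of positive to negative samples.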

The upper left and upper right of Figure 6 show the performance of the high-variance model and the high-bias model, respectively. The lower left shows the effect of the number of iterations on model performance: as the number of iterations increases, overfitting appears and the variance of the model grows. The lower right shows the performance of the model with the optimal hyperparameter combination. The AUC of the model reaches 0.89, which shows that the model has strong predictive ability for FLD, and performance on the test set and the training set is essentially the same, with no sign of overfitting.

Figure 6. Trade-off between variance and bias.

4. Discussion

Taking the number of times a feature is used as a split point in the decision trees as its importance, we can rank all features by importance. The left panel of Figure 7 shows the model performance obtained by gradually adding the 60 most important features to the model. The top 10 features are the most important, and the features after the 20th place are dispensable. Even if we train the model on only the top ten features, its AUC still reaches 0.87–0.88, while the model is greatly simplified.
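Split-count importance can be illustrated on a toy ensemble. The nested-dictionary tree format and the three small trees below are hypothetical and are not the internal format of any library; the sketch only shows the counting rule itself.

```python
from collections import Counter

def count_splits(node, counter):
    """Walk one tree and count how often each feature is used as a split."""
    if "feature" not in node:      # leaf node: nothing to count
        return
    counter[node["feature"]] += 1
    count_splits(node["left"], counter)
    count_splits(node["right"], counter)

def split_count_importance(trees):
    """Rank features by their total number of splits across all trees."""
    counter = Counter()
    for tree in trees:
        count_splits(tree, counter)
    return counter.most_common()

leaf = {"value": 0.0}
trees = [
    {"feature": "TG", "left": {"feature": "ALT", "left": leaf, "right": leaf}, "right": leaf},
    {"feature": "TG", "left": leaf, "right": {"feature": "GGT", "left": leaf, "right": leaf}},
    {"feature": "ALT", "left": leaf, "right": {"feature": "TG", "left": leaf, "right": leaf}},
]
ranking = split_count_importance(trees)
print(ranking)
```

Given such a ranking, the curve in the left panel of Figure 7 corresponds to retraining the model on the top k features for increasing k and recording the AUC each time.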

Figure 7. The influence of the number of features used on the model and feature importance.

The right panel of Figure 7 shows the importance of the top 10 features. According to research, the degree of fat accumulation in the liver is directly proportional to body weight. The prevalence of obesity in NAFLD patients is estimated to be 51.34% (95% CI: 41.38–61.20) (1), so many FLD patients have significantly elevated TG. In addition, when liver disease occurs, ALT and GGT increase significantly. The right panel of Figure 7 shows that TG, ALT, GGT, GA_fea1, and GA_fea2 play a vital role in the model, which is consistent with these facts. Studies have also shown that the prevalence of diabetes in NAFLD patients is estimated to be 22.51% (95% CI: 17.92–27.89) (1), and that with increasing age, metabolism slows down and people are more likely to suffer from metabolic diseases. The importance of GLU and age is therefore also understandable.

We analyzed the FLD patients in the test set who were mispredicted and found that their indicators were basically normal. We think these people may be patients with AFLD or with mild FLD, who often show no obvious symptoms or changes in indicators (1). Our data set does not include the patients' alcohol intake or body measurements, which limits our predictive ability: we cannot exclude the interference of AFLD, and we cannot use waist circumference to judge whether patients are obese (even so, the AUC of our model is still high). For the same reason, however, our model can be applied directly to the electronic physical examination records in current health databases for large-scale epidemiological screening.

5. Conclusion

In the present study, we use electronic physical examination records from a health database as data support, use the chi-square binning algorithm to help fill in missing values, and use a genetic algorithm as the optimization algorithm for feature engineering, which tentatively addresses the two disadvantages of large-scale electronic medical records: missing values and a lack of features. Finally, we established an FLD prediction model based on the XGBoost algorithm with an AUC of 0.89. The satisfactory performance of the model makes large-scale screening for FLD possible, but because the breadth of the data is limited, more data are needed for external validation before the model is applied.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics Statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin. Written informed consent was not obtained from the individual(s), nor the minor(s)' legal guardian/next of kin, for the publication of any potentially identifiable images or data included in this article.

Author Contributions

MZ proposed and implemented the idea and completed the visualization and the writing of the paper. CS was responsible for data sorting and data analysis. TL and TH assisted in writing the first draft of the paper. SL led the research and provided guidance on writing. All authors contributed to the article and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

1. Chalasani N, Younossi Z, Lavine JE, Charlton M, Cusi K, Rinella M, et al. The diagnosis and management of nonalcoholic fatty liver disease: practice guidance from the American Association for the Study of Liver Diseases. Hepatology. (2018) 67:328–57. doi: 10.1002/hep.29367

2. Brunt EM, Wong VW, Nobili V, Day CP, Sookoian S, Maher JJ, et al. Nonalcoholic fatty liver disease. Nat Rev Dis Primers. (2015) 1:15080. doi: 10.1038/nrdp.2015.80

3. Bellentani S. The epidemiology of non-alcoholic fatty liver disease. Liver Int. (2017) 37:81–4. doi: 10.1111/liv.13299

4. Younossi Z, Koenig A, Abdelatif D, Fazel Y, Henry L, Wymer M. Global epidemiology of nonalcoholic fatty liver disease-Meta-analytic assessment of prevalence, incidence, and outcomes. Hepatology. (2016) 64:73–84. doi: 10.1002/hep.28431

5. Li Z, Xue J, Chen P, Chen L, Yan S, Liu L. Prevalence of nonalcoholic fatty liver disease in mainland of China: a meta-analysis of published studies. J Gastroenterol Hepatol. (2014) 29:42–51. doi: 10.1111/jgh.12428

6. El-Agroudy N, Kurzbach A, Rodionov R, O'Sullivan J, Roden M, Birkenfeld A, et al. Are lifestyle therapies effective for NAFLD treatment? Trends Endocrinol Metabol. (2019) 30:701–9. doi: 10.1016/j.tem.2019.07.013

7. Mishra P, Younossi ZM. Abdominal ultrasound for diagnosis of non alcoholic fatty liver disease (NAFLD). Am J Gastroenterol. (2007) 102:2716–7. doi: 10.1111/j.1572-0241.2007.01520.x

8. Noureddin M, Lam J, Peterson MR, Middleton M, Hamilton G, Le TA, et al. Utility of magnetic resonance imaging versus histology for quantifying changes in liver fat in nonalcoholic fatty liver disease trials. Hepatology. (2013) 58:1930–40. doi: 10.1002/hep.26455

9. Sumida Y, Nakajima A, Itoh Y. Limitations of liver biopsy and non invasive diagnostic tests for the diagnosis of nonalcoholic fatty liver disease/nonalcoholic steatohepatitis. World J Gastroenterol. (2014) 20:475–85. doi: 10.3748/wjg.v20.i2.475

10. Zeng N, Li H, Wang Z, Liu W, Liu S, Alsaadi F, et al. Deep-reinforcement-learning-based images segmentation for quantitative analysis of gold immunochromatographic strip. Neurocomputing. (2021) 425:173–80. doi: 10.1016/j.neucom.2020.04.001

11. Yip T, Ma A, Wong V, Tse Y, Chan H, Yuen P, et al. Laboratory parameter based machine learning model for excluding non-alcoholic fatty liver disease (NAFLD) in the general population. Aliment Pharmacol Ther. (2017) 46:447–56. doi: 10.1111/apt.14172

12. Poynard T, Ratziu V, Naveau S, Thabut D, Charlotte F, Messous D, et al. The diagnostic value of biomarkers (SteatoTest) for the prediction of liver steatosis. Comp Hepatol. (2005) 4:1–14. doi: 10.1186/1476-5926-4-10

13. Bedogni G, Bellentani S, Miglioli L, Masutti F, Passalacqua M, Castiglione A, et al. The Fatty Liver Index: a simple and accurate predictor of hepatic steatosis in the general population. BMC Gastroenterol. (2006) 6:33. doi: 10.1186/1471-230X-6-33

14. Franke TM, Ho T, Christie CA. The chi-square test: often used and more often misinterpreted. Am J Eval. (2012) 33:448–58. doi: 10.1177/1098214011426594

15. Zeng N, Song D, Li H, You Y, Liu Y, Alsaadic F. A competitive mechanism integrated multi-objective whale optimization algorithm with differential devolution. Neurocomputing. (2021) 432:170–82. doi: 10.1016/j.neucom.2020.12.065

16. Mitchell M. An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press (1998).

17. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD'16 - XGBoost. San Francisco, CA: ACM Press (2016). p. 785–94. doi: 10.1145/2939672.2939785

18. Dietterich TG. Ensemble methods in machine learning. In: Kittler J, Roli F, editors. International Workshop on Multiple Classifier Systems. Berlin; Heidelberg: Springer (2000). p. 1–15.

19. Zopluoglu C. Detecting examinees with item preknowledge in large-scale testing using extreme gradient boosting (XGBoost). Educ Psychol Meas. (2019) 79:13164419839439. doi: 10.1177/0013164419839439

20. Zeng N, Wang Z, Zineddin B, Li Y, Du M, Xiao L, et al. Image-based quantitative analysis of gold immunochromatographic strip via cellular neural network approach. IEEE Trans Med Imaging. (2014) 33:1129–36. doi: 10.1109/TMI.2014.2305394

21. Mehta P, Bukov M, Wang C-H, Day AG, Richardson C, Fisher CK, et al. A High-Bias, Low-Variance Introduction to Machine Learning for Physicists. (2018). Available online at:

22. McClish DK. Analyzing a portion of the ROC curve. Med Decis Making. (1989) 9:190–5.

23. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. (1988) 44:837–45.

Keywords: fatty liver disease, electronic medical records, genetic algorithm, machine learning, XGBoost, chi-square binning algorithm

Citation: Zhao M, Song C, Luo T, Huang T and Lin S (2021) Fatty Liver Disease Prediction Model Based on Big Data of Electronic Physical Examination Records. Front. Public Health 9:668351. doi: 10.3389/fpubh.2021.668351

Received: 16 February 2021; Accepted: 11 March 2021; Published: 12 April 2021.


Copyright © 2021 Zhao, Song, Luo, Huang and Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shiming Lin,

This article is part of the Research Topic

Data-Enabled Intelligence for Medical Technology Innovation


Efficient Prediction of Liver Disease using Selected Attributes

Maham Irfan

2017, VFAST Transactions on Software Engineering

Related Papers

International Journal of Latest Technology in Engineering, Management & Applied Science -IJLTEMAS (

Data mining refers to extracting knowledge and information from large amounts of data. Its main purpose is data analysis. Techniques used in data mining include association rule mining, sequential pattern mining, clustering, and classification. Classification is a data mining technique used to predict the class label, or class membership, of data. In this paper, we present the basic classification techniques, covering several major kinds: decision trees (DTs), Naive Bayes, K-nearest neighbor (K-NN), and artificial neural networks (ANN). The main goal of this survey is to provide a comparative review of various classification techniques in data mining.


Journal of Computer Science

International Journal of Advanced Computer Science and Applications

International Journal of Information Technology and Computer Science

Abhilasha Nakra

International Journal of Computer Science and Information Technology

shelly gupta

shambel kefelegn

Data mining exploits existing trends to obtain analytical results. The healthcare industry collects massive amounts of data, which data mining can search for hidden information to support analysis and decision making. Machine learning algorithms are used to predict liver disorders based on class labels. The main contribution of this paper is to predict liver disorders with machine learning algorithms combined through an ensemble approach, in order to obtain better accuracy than previous studies. The voting ensemble framework in this paper combines several machine learning algorithms (support vector machine, Naïve Bayes, Random Tree, K-Nearest Neighbour, and J48 decision tree) to produce better accuracy with a lower error rate. Keywords: Data mining, Machine learning, Stacking, SVM, NB, K-NN, J48 and Random tree
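As a rough illustration of the voting idea described in this abstract, the sketch below combines the label predictions of several base classifiers by simple majority (hard) voting. It is not the authors' implementation: the function name `majority_vote`, the tie-breaking rule, and the per-classifier predictions are all hypothetical, and the base classifiers themselves are assumed to have been trained elsewhere.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine label predictions from several classifiers by hard voting.

    per_model_predictions: list of equal-length lists, one list per classifier.
    Ties go to the label seen first in model order (an arbitrary choice made
    for this sketch).
    """
    if not per_model_predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(per_model_predictions[0])
    if any(len(p) != n_samples for p in per_model_predictions):
        raise ValueError("all classifiers must predict the same number of samples")

    combined = []
    for votes in zip(*per_model_predictions):
        # Counter.most_common keeps first-seen order among equal counts.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of five base classifiers on four patients
# (1 = liver disorder, 0 = no liver disorder). These values are made up.
svm = [1, 0, 1, 0]
naive_bayes = [1, 1, 0, 0]
random_tree = [0, 0, 1, 0]
knn = [1, 0, 1, 1]
j48 = [1, 1, 0, 0]

ensemble = majority_vote([svm, naive_bayes, random_tree, knn, j48])
print(ensemble)
```

With an odd number of voters and two classes, as here, ties cannot occur; the tie-breaking rule only matters for other configurations.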

Informatica, Slovenian Society Informatika

Dr. Naiyar Iqbal , Mohammad Islam

According to World Health Organization (WHO) records, the number of dengue patients is increasing rapidly, and dengue has now been recorded on every continent. WHO reports that the number of dengue cases announced each year expanded from 0.4 to 1.3 million during 1996 to 2005 and then reached 2.2 to 3.2 million during 2010 to 2015. Consequently, it is essential to have a system that can quickly and adequately detect the prevalence of a dengue outbreak in a large number of specimens. The capability of seven prominent machine learning methods was assessed for forecasting dengue outbreaks. These methods are evaluated using eight performance parameters. The LogitBoost ensemble model is reported to have the highest classification accuracy, 92%, with sensitivity and specificity of 90% and 94%, respectively. Summary: Seven machine learning algorithms are analyzed on a dengue fever outbreak, and LogitBoost achieved the best results.
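Accuracy, sensitivity, and specificity, as quoted in this abstract, are all derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical and were chosen only so that the arithmetic is easy to follow; they are not taken from the paper, and the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positive cases predicted correctly / incorrectly.
    tn/fp: negative cases predicted correctly / incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,   # true positive rate
        "specificity": tn / negatives,   # true negative rate
    }


# Hypothetical counts for 100 positive and 100 negative cases.
metrics = classification_metrics(tp=90, fn=10, tn=94, fp=6)
print(metrics)
```

When the two classes are the same size, as in this made-up example, accuracy is the average of sensitivity and specificity; with imbalanced classes that no longer holds.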

hamid karimkhani

Breast cancer is one of the deadliest diseases, the most common of all cancers, and the leading cause of cancer deaths in women worldwide. Classifying breast cancer data can be useful for predicting disease outcomes or discovering the genetic behavior of tumors. In this paper we present a comparative survey of data mining techniques for the diagnosis and prediction of breast cancer, together with an analysis of the prediction of survivability rates of breast cancer patients. The data used is the SEER Public-Use Data.

International Research Group - IJET JOURNAL

Data mining techniques play a very important role in the health care industry. Liver disease is a growing problem, owing to changes in people's lifestyles. Various authors have worked on data classification using techniques such as Decision Tree, Support Vector Machine, Naïve Bayes, and Artificial Neural Network (ANN). These techniques can be very useful for timely and accurate classification and prediction of diseases and for better patient care. The main focus of this work is to analyze how different authors have used data mining techniques to predict and classify liver patients.


Issam Salman


WSEAS Transactions on Information Science and Applications archive

sirage zeynu

Journal of Food Quality

Khongdet Phasinam , Tamal Mondal

Gormanukonda Ravi Kumar

Asian Journal of Research in Computer Science

Amira Bibo Sallow

Dr. Velmurugan Thambusamy

Ahmed Azouz

Computational Vision and Bio-Inspired Computing

Md. Faisal Faruque

Third International Conference on Advances in Bio-Informatics and Environmental Engineering - ICABEE 2015

Kemal Tutuncu , Murat KOKLU

The 2011 International Joint Conference on Neural Networks

Vincent Lemaire

Handbook of Medical and Healthcare Technologies

Dubravko Culibrk

Saria Eltalhi

durga kinge

ICST Transactions on Scalable Information Systems

shiva reddy

The International Conference on Electrical Engineering

ahmed elbohy

Design Engineering

Arvind Mewada

Kanza Hanif

Kwetishe J Danjuma

Alaa Nassar

International Journal of Applied Metaheuristic Computing

International Journal of Scientific Research in Computer Science, Engineering and Information Technology

International Journal of Scientific Research in Computer Science, Engineering and Information Technology IJSRCSEIT

aysha ashraf

BioData Mining

Giovanni Felici

… in Bioinformatics and …

muskan kukreja

Williamjeet Singh

Kemal Tutuncu

George Theofilis

Tina Mathew

Pablo Sotuyo Blanco

Manoj Jayabalan

BMC Bioinformatics

Vikas Chaurasia

International Conference on Emerging Applications and Technologies for Industry 4.0 (EATI’2020)

Udoinynag Godwin Inyang

Gopinath Palai

amira abdalla

Gordon Ondego

Nursel Selver Ruzgar

EAI Endorsed Transactions on Pervasive Health and Technology

Dr. Anjana S Chandran

International Journal of Database Management Systems ( IJDMS ) , B. R



Liver disease prediction using ML techniques

Syed Nawaz Pasha , Dadi Ramesh , Sallauddin Mohmmad , Navya P. , P. Anil Kishan , C. H. Sandeep; Liver disease prediction using ML techniques. AIP Conference Proceedings 24 May 2022; 2418 (1): 020010.


The number of liver disease patients is rising sharply as a result of excessive alcohol consumption, inhalation of harmful gases, and intake of contaminated food, pickles, and drugs. This places a heavy burden on doctors, who must determine whether or not a patient has liver disease. This paper helps reduce that burden by analyzing patients' conditions using machine learning techniques[1]. We also compare several machine learning algorithms, namely random forest, logistic regression, and SVM, on their accuracy in predicting liver disease.
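A comparison of this kind usually comes down to scoring each model's predictions against the same held-out labels. The sketch below assumes that setup. The labels, the predictions, and the neutral names `model_a`, `model_b`, and `model_c` are hypothetical placeholders for three trained classifiers; they say nothing about which algorithm performed best in the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical held-out labels (1 = liver disease, 0 = healthy).
y_true = [1, 1, 0, 1, 0, 0, 1, 0]

# Hypothetical predictions from three already-trained classifiers.
predictions = {
    "model_a": [1, 1, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_c": [1, 1, 1, 1, 0, 1, 0, 0],
}

scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("highest accuracy:", best)
```

Accuracy alone can be misleading on imbalanced medical data, so a fuller comparison would normally also report sensitivity and specificity.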



  1. (PDF) Strategic Analysis in Prediction of Liver Disease Using Different

  4. (PDF) Liver Disease Prediction using SVM and Naïve Bayes Algorithms

  5. (PDF) Fatty Liver Disease Prediction Using Supervised Learning

  6. (PDF) Fatty Liver Disease Prediction Model Based on Big Data of




  1. Prediction of Liver Patients Using Machine Learning Algorithms

    2 Literature Survey. Previous research on detecting patients with liver disease has reported accuracies above 90%, using either machine learning or data mining techniques. In [2], a three-phase analysis employs normalization and PSO, followed by the application of classification algorithms; J-48 showed the highest accuracy.

  2. Liver Disease Prediction Using Machine Learning Algorithms

    In human beings, the liver is a primary organ that performs many functions, including the production of bile, excretion of bile and bilirubin, meta...

  3. Detection of Liver Disease Using Machine Learning Techniques: A

    This paper presents a detailed study of the research work conducted in the detection of liver disease and its severity using machine learning models. ... Nahar, N., Ara, F.: Liver disease prediction by using different decision tree techniques. Int. J. ... (2017) Google Scholar Patel, H., Thakur, G.: An improved fuzzy k-nearest neighbor algorithm ...

  4. (PDF) Prediction of Liver Diseases Based on Machine ...

    ... As a rule, metaheuristic algorithms are evaluated based on their ability to solve decision-making difficulties and their significant balance between exploring and exploiting [9] [10] [11] [12]...

  5. Supervised Machine Learning Models for Liver Disease Risk Prediction

    A brief overview of the dataset's characteristics is shown in Table 1. Table 1. Dataset Description. 2.2. Liver Disease Risk Prediction. Nowadays, clinicians and health carers exploit machine-learning models to develop efficient tools for the risk assessment of a disease occurrence based on several risk factors.

  6. Software-based Prediction of Liver Disease with Feature Selection and

    In this paper, the main focus is to predict the liver disease based on a software engineering approach using classification and feature selection technique. The implementation of proposed work is done on Indian Liver Patient Dataset (ILPD) from the University of California, Irvine database.

  7. Prediction of Liver Disease using Classification Algorithms

    ... Singh et al. [12] used different techniques for liver disease classification. Support vector machines, K-Nearest Neighbor, and logistic regression are the methods utilized for this task....

  8. Prediction of Liver Disease using Machine Learning Models with PCA

    The liver is one of the most vital organs of the human body. There are various disorders of liver that need early treatment by doctors. Early diagnosis and treating the patients are significant to reduce the risk. Healthcare system can benefit from various Machine Learning (ML) models to predict diseases in early stage. The aim of this study is to predict liver disease using different ML ...

  9. Statistical Machine Learning Approaches to Liver Disease Prediction

    Medical diagnoses have important implications for improving patient care, research, and policy. For a medical diagnosis, health professionals use different kinds of pathological methods to make decisions on medical reports in terms of the patients' medical conditions. Recently, clinicians have been actively engaged in improving medical diagnoses. The use of artificial intelligence and ...

  10. Frontiers

    Fatty liver disease (FLD) is a lesion with excessive accumulation of fat in liver cells, which is divided into non-alcoholic fatty liver disease (NAFLD) and alcoholic fatty liver disease (AFLD) ( 1 ).

  11. Analysis of Liver Disease Prediction Methods and Future Research

    Golmei Shaheamlung. In the 21st century, liver disease has been increasing all over the world. According to the latest survey report, the death toll from liver disease has risen to approximately 2 million per year worldwide. Liver disease accounts for 3.5% of deaths worldwide.

  12. Efficient Prediction of Liver Disease using Selected Attributes

    Efficient Prediction of Liver Disease using Selected Attributes ... 2411-6327 Volume 12, Number 1, January-April, 2017, pp. 10-18. EFFICIENT PREDICTION OF LIVER DISEASE USING SELECTED ATTRIBUTES. MUJTABA HASSAN, MAHAM IRFAN, SALAH-U-DIN AYUBI. Department of Software ...

  13. Liver disease diagnosis and prediction by hybrid data mining approach

    This hybrid technique of computation of the decision table saves the information based on the preferred arrangement of features and uses the model as a query table during the preparation of information predictions. This paper proposes a Liver infections Prediction framework for the general public to forestall the reason for the demise.

  14. Journal of Physics: Conference Series PAPER OPEN ACCESS ...

    Sharma and K C Juglan. Application Progress of Bear Bile Powder and Ursodeoxycholic Acid in Liver Disease and its Mechanism of Action, Xin Zhao, Ruipeng Wang, Zhengmin Cao et al. Assessment of liver function in acute or chronic liver disease by the methacetin breath test: a tool for decision making in clinical hepatology, Gadi Lalazar and Yaron Ilan

  15. Liver disease prediction using ML techniques

    INTERNATIONAL CONFERENCE ON RESEARCH IN SCIENCES, ENGINEERING & TECHNOLOGY. 12-13 February 2021 ... This places a heavy burden on doctors, who must determine whether or not a patient has liver disease. This paper helps reduce that burden by analyzing patients' conditions using machine learning techniques[1 ...