Character N-gram model for toxicity prediction
Abstract
Molecular toxicity prediction is a crucial step in the drug discovery process, with a direct bearing on human health and medical outcomes. Accurately assessing a molecule's toxicity helps weed out low-quality compounds early in drug discovery, reducing costly failures later in the drug development process. Computational models have been used to automate molecular toxicity prediction. In this paper, a machine learning-based model is proposed in which character N-grams are extracted from simplified molecular-input line-entry system (SMILES) strings and weighted with the term frequency-inverse document frequency (TF-IDF) representation scheme. Multiple machine learning classifiers have been implemented: logistic regression (LR), support vector machine (SVM), random forest (RF), decision tree (DT), k-nearest neighbors (KNN), AdaBoost, multi-layer perceptron (MLP), and stochastic gradient descent (SGD). A wide range of N-gram models was evaluated, and the trigram model gave the best results. RF and SVM achieved 85% and 84% accuracy, respectively. Compared with state-of-the-art models, these results are acceptable given that only minimal resources were used.
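The abstract does not spell out the feature pipeline, so the following is a minimal, dependency-free Python sketch of one common way to turn SMILES strings into TF-IDF-weighted character trigrams and classify them with k-nearest neighbors (one of the classifiers listed above). The smoothed IDF formula, L2 normalization, cosine similarity, and every function name (`char_ngrams`, `fit_idf`, `tfidf_vector`, `knn_predict`) are assumptions for illustration, not the authors' implementation. The four training molecules and their 0/1 labels are placeholders chosen only to exercise the code; they are not real toxicity annotations.

```python
import math
from collections import Counter


def char_ngrams(smiles: str, n: int = 3) -> list[str]:
    """Return the overlapping character n-grams of a SMILES string."""
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


def fit_idf(corpus: list[str], n: int = 3) -> dict[str, float]:
    """Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1."""
    num_docs = len(corpus)
    doc_freq: Counter = Counter()
    for smiles in corpus:
        doc_freq.update(set(char_ngrams(smiles, n)))  # count each n-gram once per molecule
    return {gram: math.log((1 + num_docs) / (1 + df)) + 1.0
            for gram, df in doc_freq.items()}


def tfidf_vector(smiles: str, idf: dict[str, float], n: int = 3) -> dict[str, float]:
    """Sparse, L2-normalised TF-IDF vector; n-grams unseen in training are dropped."""
    counts = Counter(g for g in char_ngrams(smiles, n) if g in idf)
    weights = {g: tf * idf[g] for g, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {g: w / norm for g, w in weights.items()} if norm else weights


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product equals the cosine similarity.
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def knn_predict(query, train_vecs, train_labels, k: int = 1):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy usage with placeholder molecules and labels (not real toxicity data).
train = ["CCO", "CCCO", "c1ccccc1", "c1ccccc1O"]
labels = [0, 0, 1, 1]
idf = fit_idf(train)
train_vecs = [tfidf_vector(s, idf) for s in train]

prediction_aliphatic = knn_predict(tfidf_vector("CCCCO", idf), train_vecs, labels)
prediction_aromatic = knn_predict(tfidf_vector("c1ccccc1N", idf), train_vecs, labels)
print(prediction_aliphatic, prediction_aromatic)
```

The hand-rolled version only makes the weighting explicit. In practice, scikit-learn's `TfidfVectorizer(analyzer="char", ngram_range=(3, 3), lowercase=False)` produces comparable character-trigram features; `lowercase=False` matters because SMILES uses letter case to distinguish aromatic from aliphatic atoms (for example `c` versus `C`).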
Keywords
Feature extraction; Machine learning; Molecular toxicity prediction; N-gram; Simplified molecular-input line-entry system
DOI: http://doi.org/10.11591/ijai.v13.i4.pp4380-4387
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IAES International Journal of Artificial Intelligence (IJ-AI)
ISSN/e-ISSN 2089-4872/2252-8938
This journal is published by the Institute of Advanced Engineering and Science (IAES).