Different machine learning models can be trained on experimental data to build predictors for activity, ADME-Tox and other properties. These models range from Partial Least Squares (PLS) regression to Random Forests, Bayesian Networks and Deep Neural Networks, each with its own advantages and drawbacks. However, no single model is the best choice for every application. Which algorithm suits a given use case depends on the amount and characteristics of the training data and on the desired outcome of the study. For instance, some models are interpretable and can provide insights into the results, while others are used as black boxes.
Moreover, models are often trained on plain encodings such as SMILES strings or on simple descriptors such as 2D fingerprints. This is a fast approach that works for some applications. These descriptors, however, lack important information, such as the 3D distribution of molecular properties. As stated above, if simplifications are made when selecting molecular descriptors, subtle differences that are key to predicting the property of interest might be lost. Pharmacelera can use the 3D molecular interaction fields generated by our high-quality hydrophobicity descriptors, which have been validated in two publications co-authored with GSK.
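The loss of 3D information can be illustrated with two toy descriptors. In the sketch below, `toy_2d_fingerprint` only sees atom types and bonds, so two conformations of the same molecule get the same bits, while `toy_3d_field` samples a Gaussian-smoothed hydrophobicity field on a grid and does change with the geometry. Both functions, the per-atom hydrophobicity weights and the molecule geometry are hypothetical and chosen for illustration only; they are neither a real fingerprint such as ECFP nor Pharmacelera's hydrophobicity descriptors.

```python
import zlib

import numpy as np

# Hypothetical per-atom hydrophobicity weights, chosen for illustration only.
HYDROPHOBICITY = {"C": 0.5, "N": -0.6, "O": -0.7}


def toy_2d_fingerprint(atoms, bonds, n_bits=64):
    """Fold bonded atom-type pairs into a set of bit indices (connectivity only)."""
    bits = set()
    for i, j in bonds:
        pair = "-".join(sorted((atoms[i], atoms[j])))
        bits.add(zlib.crc32(pair.encode()) % n_bits)
    return frozenset(bits)


def toy_3d_field(atoms, coords, grid, width=1.0):
    """Gaussian-smoothed hydrophobicity sampled at each grid point."""
    weights = np.array([HYDROPHOBICITY[a] for a in atoms])
    # Squared grid-to-atom distances, shape (n_grid_points, n_atoms).
    d2 = ((grid[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return (np.exp(-d2 / (2.0 * width**2)) * weights).sum(axis=1)


# One toy C-C-C-O chain in two conformations with identical bond lengths.
atoms = ["C", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3)]
extended = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0], [4.5, 0.0, 0.0]])
folded = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 0.0]])

# Planar 7 x 7 grid of probe points at z = 0.
axis = np.linspace(-1.0, 5.0, 7)
gx, gy = np.meshgrid(axis, axis, indexing="ij")
grid = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])

# The fingerprint takes no coordinates, so both conformers share the same bits.
fp = toy_2d_fingerprint(atoms, bonds)
field_gap = np.abs(toy_3d_field(atoms, extended, grid) - toy_3d_field(atoms, folded, grid)).max()
print("fingerprint bits:", sorted(fp))
print("max 3D field difference:", round(float(field_gap), 3))
```

A model trained only on the fingerprint cannot distinguish the two conformations, whereas the field-based descriptor can, which is the kind of subtle difference the paragraph above refers to.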