Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

Sabrina Jaeger, Simone Fulle, Samo Turk

BioMed X Innovation Center

Methods

The method uses the ZINC dataset and the ChEMBL dataset, and works as follows.

Each atom is encoded at several radii using the Morgan algorithm, and the resulting substructures are taken as the words.

The molecule is then treated as a sentence: the words (substructure identifiers) are placed in the order in which their atoms appear in the canonical SMILES.
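To make the sentence construction concrete, here is a minimal sketch in plain Python. It is not the authors' code and does not use a real Morgan implementation: the hashing scheme, the `molecule_sentence` helper, and the hand-written atom and bond lists are all assumptions made for illustration. Emitting the radius 0 identifier before the radius 1 identifier for each atom is likewise just one plausible layout.

```python
import hashlib


def _ident(text):
    """Deterministic short identifier for a substructure description."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]


def molecule_sentence(atoms, bonds, max_radius=1):
    """Return substructure 'words' in atom order, radius 0..max_radius per atom.

    atoms: element symbols, listed in the order of the (canonical) SMILES.
    bonds: pairs of atom indices.
    Toy stand-in for Morgan identifiers; bond orders, charges, etc. are ignored.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Radius 0: describe each atom by its element symbol and degree.
    layers = [[_ident(f"{sym}|{len(neighbors[i])}") for i, sym in enumerate(atoms)]]

    # Each larger radius folds in the neighbors' identifiers from the previous layer.
    for _ in range(max_radius):
        prev = layers[-1]
        layers.append([
            _ident(prev[i] + "|" + ",".join(sorted(prev[j] for j in neighbors[i])))
            for i in range(len(atoms))
        ])

    # Walk the atoms in order and emit each atom's identifiers from small to large radius.
    return [layers[r][i] for i in range(len(atoms)) for r in range(max_radius + 1)]


# Ethanol, with atoms listed in the order of its SMILES string "CCO".
sentence = molecule_sentence(["C", "C", "O"], [(0, 1), (1, 2)])
print(sentence)
```

With three atoms and radii 0 and 1 the sentence has six words. Atoms in equivalent environments, such as the two terminal carbons of propane, get the same words, which is what lets a word2vec-style model see the same substructure across many molecules.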

With this sentence formulation, word2vec learns a representation for each substructure, and the substructure vectors are summed to give the sentence (molecule) representation.
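The summation step can be sketched as follows. The vectors here are made-up numbers rather than learned word2vec embeddings, and the names `molecule_vector` and `toy_vectors` are hypothetical. Skipping words that have no vector is a simplification chosen for this sketch, not something stated in these notes.

```python
def molecule_vector(sentence, word_vectors, dim):
    """Sum the vectors of all substructure words in a molecule sentence."""
    total = [0.0] * dim
    for word in sentence:
        vec = word_vectors.get(word)
        if vec is None:
            # Simplification for this sketch: ignore words with no vector.
            continue
        for k in range(dim):
            total[k] += vec[k]
    return total


# Toy lookup table standing in for trained substructure embeddings.
toy_vectors = {
    "w_a": [1.0, 0.0, 2.0],
    "w_b": [0.5, -1.0, 0.0],
}

# A word that occurs twice contributes twice to the sum.
vec = molecule_vector(["w_a", "w_b", "w_a", "w_missing"], toy_vectors, dim=3)
print(vec)
```

Because the molecule vector is a plain sum, it has the same dimension as the substructure vectors, and repeated substructures weigh in proportion to how often they occur.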

Appendix

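Methodologically, the approach is not particularly novel; similar ideas have been around since the earliest such work, Learning to SMILE(S).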
