Reinforcement learning (RL) example with RLMolecule and Ray (5:47)

The goal of the rlmolecule library is to enable general-purpose material and molecular optimization using reinforcement learning. It explores molecular space by adding one atom or bond at a time, learning how to build molecules with desired properties. The notebook makes running your own molecular optimization easy and accessible; parameter description tables are shown below.
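As a rough illustration of what a single atom-addition step looks like at the RDKit level (a sketch only; rlmolecule defines its own action space and state representation, which are not shown here):

```python
from rdkit import Chem

# One "action" in the molecule-building process: add an atom and bond it to
# an existing atom. Illustrative only; rlmolecule's actual actions may differ.
mol = Chem.RWMol(Chem.MolFromSmiles("CC"))   # starting molecule: ethane
new_idx = mol.AddAtom(Chem.Atom("O"))        # chosen atom type to add
mol.AddBond(1, new_idx, Chem.BondType.SINGLE)
Chem.SanitizeMol(mol)
print(Chem.MolToSmiles(mol))                 # -> "CCO" (ethanol)
```

An episode is a sequence of such steps, ending when the molecule is complete and its reward is evaluated.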

[Figure RL-schematic-full.png: schematic of the full rlmolecule reinforcement-learning workflow]


| Option | Description |
|---|---|
| Starting molecule | Starting point for each molecule-building episode. |
| Atom additions | Atom types to choose from when building molecules. |
| max-atoms | Maximum number of heavy atoms. |
| max-#-actions | Maximum number of actions to allow when building molecules. |
| SA threshold | Potential molecules with a Synthetic Accessibility (SA) score greater than the threshold are not considered. Used to filter out molecules unlikely to be synthesizable. |
| Output isomeric SMILES | Option controlling whether stereochemistry information from the starting molecule is included in the output SMILES. |
| Stereoisomers | Option to consider stereoisomers as different molecules. |
| Canonicalize tautomers | Option to use RDKit's tautomer canonicalization functionality. |
| 3D embedding | Try to generate a 3D embedding of the molecule; if this fails, the molecule is removed. |
| Cache | Option to cache molecule building for a given SMILES input to speed up subsequent evaluations. |
| GDB filter | Option to apply filters from the GDB-17 paper to get more realistic, drug-like molecules, e.g., no allenes (C=C=C). See Tables 1-3 in https://doi.org/10.1021/ci300415d. |
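As a rough sketch of how several of these options could be implemented with plain RDKit (the sa_threshold default of 3.5 and the function name keep_molecule are illustrative assumptions; rlmolecule's own implementation may differ):

```python
import os
import sys

from rdkit import Chem
from rdkit.Chem import AllChem, RDConfig
from rdkit.Chem.MolStandardize import rdMolStandardize

# RDKit ships the SA scorer in its contrib directory.
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer


def keep_molecule(smiles: str, sa_threshold: float = 3.5) -> bool:
    """Apply tautomer, SA-threshold, and 3D-embedding checks to one SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    # Canonicalize tautomers with RDKit's built-in functionality.
    mol = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)
    # SA threshold: discard molecules unlikely to be synthesizable.
    if sascorer.calculateScore(mol) > sa_threshold:
        return False
    # 3D embedding: remove molecules that cannot be embedded.
    if AllChem.EmbedMolecule(Chem.AddHs(mol), randomSeed=0) < 0:
        return False
    return True

# Isomeric SMILES on/off: isomericSmiles=False drops stereochemistry, e.g.,
#   Chem.MolToSmiles(mol, isomericSmiles=False)
```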

Training parameters

| Hyperparameter | Good default value | Range of good values | Description |
|---|---|---|---|
| gamma | 1 | 0.8 - 1.0 | Float specifying the discount factor for future rewards in the Markov decision process. This can be thought of as how far into the future the agent should care about possible rewards. When the agent must act in the present to prepare for rewards in the distant future, this value should be large. |
| lr | 0.001 | 0.0001, 0.001, 0.01 | Learning rate, corresponding to the strength of each gradient descent update step. This should typically be decreased if training is unstable and the reward does not consistently increase. |
| entropy_coeff | 0.001 | 0, 0.001, 0.005, 0.01 | Coefficient of the entropy regularizer. A policy has maximum entropy when all actions are equally likely and minimum entropy when a single action probability dominates. The entropy coefficient is multiplied by the maximum possible entropy and added to the loss, which helps prevent premature convergence, where one action dominates the policy and exploration stops. |
| clip_param | 0.2 | 0.1, 0.2, 0.3 | Hyperparameter for clipping in the policy objective. Roughly, how far the new policy can move from the old policy while still improving the objective function. |
| kl_coeff | 0 | 0.0 - 1.0 | Initial coefficient for the KL divergence penalty between the old and new policies in the objective function. Larger values mean a larger penalty (smaller updates). |
| sgd_minibatch_size | 10 | 10 - 256 | Total SGD (stochastic gradient descent) minibatch size across all devices; this defines the minibatch size within each epoch. A larger batch size typically gives more stable training updates. |
| num_sgd_iter | 5 | 3 - 30 | Number of SGD iterations in each outer loop (i.e., number of epochs to execute per train batch). |
| train_batch_size | 1000 | 100 - 5000 | Training batch size. Each train iteration samples the environment for <train_batch_size> steps. |
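These names map directly onto Ray RLlib's PPO configuration. A minimal sketch, assuming the RLlib 2.x config API; the environment name "rlmolecule_env" is a placeholder for whatever the notebook actually registers with Ray:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch only: "rlmolecule_env" is a hypothetical environment name.
config = (
    PPOConfig()
    .environment("rlmolecule_env")
    .training(
        gamma=1.0,                # discount factor
        lr=0.001,                 # learning rate
        entropy_coeff=0.001,      # entropy regularization
        clip_param=0.2,           # PPO clipping
        kl_coeff=0.0,             # KL-divergence penalty
        sgd_minibatch_size=10,
        num_sgd_iter=5,
        train_batch_size=1000,
    )
)
```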

GNN policy model

| Hyperparameter | Good default value | Range of good values | Description |
|---|---|---|---|
| features | 64 | 32, 64, 128, 256 | Width of the message-passing layers. |
| num_messages | 3 | 1 - 12 | Number of message-passing steps in the graph neural network. |
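A minimal PyTorch sketch of how these two knobs (layer width and number of messages) typically enter a message-passing policy network; the atom_feature_size and num_actions values are illustrative assumptions, and rlmolecule's actual model architecture may differ:

```python
import torch
import torch.nn as nn


class MessagePassingLayer(nn.Module):
    """One message-passing step: each atom aggregates transformed
    features from its bonded neighbors, then updates its state."""

    def __init__(self, features: int):
        super().__init__()
        self.message = nn.Linear(features, features)
        self.update = nn.GRUCell(features, features)

    def forward(self, node_feats: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # adjacency: (N, N) float 0/1 bond matrix; node_feats: (N, features)
        messages = adjacency @ self.message(node_feats)  # sum over neighbors
        return self.update(messages, node_feats)


class GNNPolicy(nn.Module):
    def __init__(self, features: int = 64, num_messages: int = 3,
                 atom_feature_size: int = 16, num_actions: int = 32):
        super().__init__()
        self.embed = nn.Linear(atom_feature_size, features)
        self.layers = nn.ModuleList(
            [MessagePassingLayer(features) for _ in range(num_messages)]
        )
        self.policy_head = nn.Linear(features, num_actions)  # action logits
        self.value_head = nn.Linear(features, 1)             # state value

    def forward(self, atom_feats, adjacency):
        h = self.embed(atom_feats)
        for layer in self.layers:
            h = layer(h, adjacency)
        pooled = h.mean(dim=0)  # mean-pool atoms into a molecule embedding
        return self.policy_head(pooled), self.value_head(pooled)
```

Increasing `features` widens each layer; increasing `num_messages` lets information propagate across more bonds per forward pass.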

RL run parameters

| Hyperparameter | Good default value | Range of good values | Description |
|---|---|---|---|
| iterations | 10 | 2 - 1000 | During each iteration, a number of "episodes" are run; each episode builds a molecule and calculates its reward. The number of episodes per iteration is set by the train_batch_size option. |
| #-of-rollout workers | 1 | N/A | Number of CPU threads available for rollout workers. The workflow shows the maximum number of threads available. If this is set to fewer than the maximum, the excess threads are distributed across additional grid-search parameters (if a grid search is specified). |
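A sketch of how these run parameters might drive training, continuing the PPOConfig sketch above (the `rollouts()` call and the `episode_reward_mean` result key assume the RLlib 2.x API):

```python
# Continuing the PPO sketch above: 10 iterations, one rollout worker.
config = config.rollouts(num_rollout_workers=1)  # #-of-rollout workers
algo = config.build()

for i in range(10):           # "iterations" option
    result = algo.train()     # samples train_batch_size steps, then updates
    print(i, result["episode_reward_mean"])
```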