Troubleshooting/Tips
Common Issues
- Remote setup is complex and depends on stable communication with the remote servers.
  - Check the internet connection.
  - Retrying a compute session launch is often successful.
  - If retrying is not successful, quit and restart Chemistream.
  - Be sure to quit any VPN connection before launching compute sessions.
  - Even if a compute session does not start up and configure successfully, a remote instance may still have been started, so be sure to *terminate* any compute sessions you are not using.
- SLURM scheduler issues.
  - The #-procs/job setting sets the number of processors per cluster instance.
  - Multiple jobs will not run on a single instance.
  - If multiple small runs are needed, start a compute session with more instances, each with fewer cores (to save money).
- RLMolecule (Reinforcement Learning) image:
  - To restore TensorBoard behavior:
    - Terminate all RLMolecule compute sessions.
    - Restart the Chemistream application to free the TensorBoard ports.
  - These limitations will be fixed in newer versions.
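
As a rough illustration of the SLURM sizing advice above, the following Python sketch works out a session layout for several small runs. It assumes that all runs should execute at the same time and that each job occupies exactly one instance, as the notes above state. The helper name `plan_session` and its parameters are hypothetical and are not part of Chemistream.

```python
def plan_session(num_jobs, procs_per_job):
    """Size a session so that each job gets its own instance.

    Hypothetical helper for illustration only. It assumes one job per
    instance and that all jobs should run concurrently.
    """
    if num_jobs < 1 or procs_per_job < 1:
        raise ValueError("num_jobs and procs_per_job must be positive")
    return {
        "instances": num_jobs,                 # one instance per job
        "cores_per_instance": procs_per_job,   # matches the #-procs/job value
        "total_cores": num_jobs * procs_per_job,
    }


# Eight 4-core runs: 8 instances of 4 cores each, 32 cores in total.
plan = plan_session(num_jobs=8, procs_per_job=4)
print(plan)
```

Under these assumptions, a single large instance would run only one of the small jobs at a time and leave its remaining cores idle, which is why several smaller instances are the cheaper choice.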
Contact swsides@txcorp.com for further questions.
Reconnecting to Compute Session
Navigate to any notebook that was previously running
Open the notebook
Go to ‘File’ -> ‘Revert Notebook to Checkpoint’
The checkpoint is the state of the notebook when it was last saved using ‘File’ -> ‘Save Notebook’
Note: If a notebook or compute session is quit while a Python kernel is running, the state of the kernel will not be restored. Any jobs running in a SLURM queue will still be running.
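
After reconnecting, one way to confirm that queued jobs survived is to ask SLURM directly. The sketch below is a minimal example, assuming the standard `squeue` command is available on the path inside the compute session and that it is run from a notebook cell there. The function names `parse_squeue` and `list_my_jobs` are hypothetical helpers, not part of Chemistream.

```python
import getpass
import subprocess


def parse_squeue(text):
    """Parse 'jobid|name|state' lines into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split the ID from the left and the state from the right so that
        # a '|' inside the job name does not break parsing.
        job_id, rest = line.split("|", 1)
        name, state = rest.rsplit("|", 1)
        jobs.append({"id": job_id, "name": name, "state": state})
    return jobs


def list_my_jobs(user=None):
    """Return the current user's SLURM jobs (assumes squeue is installed)."""
    user = user or getpass.getuser()
    result = subprocess.run(
        # -h: no header, -u: filter by user, -o: job ID, job name, state
        ["squeue", "-h", "-u", user, "-o", "%i|%j|%T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)
```

Calling `list_my_jobs()` in a new notebook cell should list any jobs that are still pending or running, even though the previous kernel state is gone.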