Troubleshooting/Tips

Common Issues

Remote setup is complex and depends on stable communication with the remote servers.
  • Check the internet connection

  • Re-trying a compute session launch is often successful.

  • If re-trying is not successful, quitting and restarting Chemistream is recommended.

  • Be sure to quit any VPN connection before launching compute sessions.

  • If a compute session fails to start up and configure successfully, a remote instance may still have been started, so be sure to *terminate* any compute sessions you are not using.

SLURM scheduler issues.
  • The #-procs/job setting sets the number of processors per cluster instance.

  • Multiple jobs will not run on a single instance.

  • If multiple small runs are needed, start a compute session with more instances, each with fewer cores (to save money).
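
The cost argument behind the last point can be sketched with a little arithmetic. This is only an illustration under stated assumptions: jobs take roughly equal time, only one job runs on an instance at a time, and cost scales with the number of cores multiplied by the time they are held. The function name `core_rounds` and the example numbers are hypothetical and are not part of Chemistream or SLURM.

```python
import math


def core_rounds(n_jobs, procs_per_job, n_instances, cores_per_instance):
    """Estimate the cores held, summed over rounds of equal-length jobs.

    Assumes one job per instance at a time, so each round runs at most
    n_instances jobs while every instance is paid for in full.
    """
    if procs_per_job > cores_per_instance:
        raise ValueError("a job needs more processors than one instance has")
    rounds = math.ceil(n_jobs / n_instances)
    return rounds * n_instances * cores_per_instance


# Four small jobs, each using 2 processors (illustrative numbers only).
one_large = core_rounds(n_jobs=4, procs_per_job=2, n_instances=1, cores_per_instance=8)
four_small = core_rounds(n_jobs=4, procs_per_job=2, n_instances=4, cores_per_instance=2)
print(one_large, four_small)
```

Under these assumptions the single 8-core instance holds 32 core-rounds, because the jobs run one after another while 6 cores sit idle, whereas four 2-core instances hold 8.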

RLMolecule (Reinforcement Learning) image:
  • To restore TensorBoard behavior:
    • terminate all RLMolecule compute sessions

    • restart the Chemistream application to free the TensorBoard ports

    • these limitations will be fixed in newer versions

Contact swsides@txcorp.com for further questions.

Reconnecting to Compute Session

  • Navigate to any notebook that was previously running

  • Open the notebook

  • Go to ‘File’ -> ‘Revert Notebook to Checkpoint’

  • The checkpoint is the state of the notebook when it was last saved using ‘File’ -> ‘Save Notebook’

  • Note: if a notebook or compute session is quit while a Python kernel is running, the state of the kernel will not be restored. Any jobs already in a SLURM queue will continue to run.
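
Because queued SLURM jobs keep running after a disconnect, it can help to list them once you have reconnected. The following is a minimal sketch, assuming the standard SLURM `squeue` command is available inside the compute session. The helper names `parse_squeue` and `list_my_jobs` are illustrative and are not part of Chemistream.

```python
import getpass
import shutil
import subprocess


def parse_squeue(output):
    """Parse output of: squeue -h -o "%i|%j|%T" into a list of dicts."""
    jobs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The job ID is the first field and the state is the last;
        # the job name in between may itself contain "|".
        job_id, rest = line.split("|", 1)
        name, state = rest.rsplit("|", 1)
        jobs.append({"id": job_id, "name": name, "state": state})
    return jobs


def list_my_jobs():
    """Return the current user's SLURM jobs, or [] if squeue is not installed."""
    if shutil.which("squeue") is None:
        return []
    result = subprocess.run(
        ["squeue", "-h", "-u", getpass.getuser(), "-o", "%i|%j|%T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)


if __name__ == "__main__":
    for job in list_my_jobs():
        print(job["id"], job["name"], job["state"])
```

Running this from a notebook cell or terminal after reconnecting prints one line per job with its ID, name, and state (for example RUNNING or PENDING).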

Compute Session JupyterLab Tips

[Images: pic1, pic2]