policy export error

Hi,
When I use NFSP to train on my environment, I encounter the following error:
`RuntimeError: Function 'SoftmaxBackward0' returned nan values in its 0th output`
While debugging, I found that `self.policy(state)` outputs 0 for some actions inside `agent.update`, as shown below:
![image](https://github.com/quantumiracle/MARS/assets/35334392/83acfae2-10b0-4c64-ba5b-362b07f1cbd1)

Because some of the output probabilities are exactly 0, the corresponding values of `log_probs` are -inf.
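To illustrate what I think is happening, here is a minimal pure-Python sketch. It is not the MARS code: the function names and the example logits are made up for illustration, and it assumes `log_probs` is computed by taking the log of the softmax output. It shows that a probability which underflows to 0 gives a log-probability of -inf, while computing the log-softmax directly from the logits with the log-sum-exp trick stays finite.

```python
import math


def softmax(logits):
    """Softmax over a list of floats, shifted by the max to avoid overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def naive_log_probs(logits):
    """log(softmax(x)): yields -inf wherever a probability underflowed to 0."""
    return [math.log(p) if p > 0.0 else float("-inf") for p in softmax(logits)]


def stable_log_softmax(logits):
    """log-softmax computed directly from the logits via log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


# Hypothetical logits: the second action is extremely unlikely.
logits = [0.0, -1000.0, 2.0]

naive = naive_log_probs(logits)
stable = stable_log_softmax(logits)

print(naive)   # the second entry is -inf
print(stable)  # every entry is finite
```

If this is indeed the cause, the PyTorch counterpart would be to compute log-probabilities with `torch.nn.functional.log_softmax` on the logits, or to clamp the probabilities to a small positive minimum with `torch.clamp` before taking the log. I have not verified either against the NFSP update code.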
In my environment, `observation_space` and `action_space` are defined as follows:
self.observation_space = spaces.Box(low=0, high=1000, shape=(4,), dtype=np.float32)
self.action_space = spaces.Discrete(37)
Can you give me some suggestions?
Thanks

