Diary - A frequent mistake when using torch.distributions.Normal

19 Mar 2022

To implement the PPO algorithm in PyTorch, you will typically use torch.distributions. If your action space is continuous, you will probably use the Normal distribution. In that case, you must check the dimensions of your mean and standard deviation.

Let’s see the code below.

import torch
from torch.distributions import Normal

action = torch.Tensor([2,3]) # (2,)

a1 = torch.Tensor([2,3]) # (2,)
b1 = torch.Tensor([1,1]) # (2,)
dist1 = Normal(a1,b1)
logprob1 = dist1.log_prob(action)

print(logprob1)

a2 = torch.Tensor([[2],[3]]) # (2,1)
b2 = torch.Tensor([[1],[1]]) # (2,1)
dist2 = Normal(a2,b2)
logprob2 = dist2.log_prob(action)

print(logprob2)

a3 = torch.Tensor([[2,3]]) # (1,2)
b3 = torch.Tensor([[1,1]]) # (1,2)
dist3 = Normal(a3,b3)
logprob3 = dist3.log_prob(action)

print(logprob3)

# Result
tensor([-0.9189, -0.9189])
tensor([[-0.9189, -1.4189],
        [-1.4189, -0.9189]])
tensor([[-0.9189, -0.9189]])

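The main point is that **you should exactly match the action dimension and the mean, std dimensions.** Dummy dimensions cause issues because Normal treats its parameters as a batch of independent univariate Gaussians, not as one multivariate Gaussian, and log_prob broadcasts the action against that batch. The result takes the broadcast shape, so a (2,1) mean with a (2,) action yields a (2,2) tensor that pairs every mean with every action component.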

This issue is really hard to debug because PyTorch does not raise any error; the broadcast is perfectly legal.
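One way to make the mistake visible is to check the shapes yourself before calling log_prob. The snippet below is only a minimal sketch: `checked_log_prob` is a hypothetical helper I made up for illustration, not part of torch.distributions, and it assumes you want the mean, std, and action to have exactly the same shape.

```python
import torch
from torch.distributions import Normal


def checked_log_prob(mean, std, action):
    """Per-dimension log-probabilities, refusing any silent broadcasting."""
    # Hypothetical helper: fail loudly if the three shapes are not identical.
    if not (mean.shape == std.shape == action.shape):
        raise ValueError(
            f"shape mismatch: mean {tuple(mean.shape)}, "
            f"std {tuple(std.shape)}, action {tuple(action.shape)}"
        )
    return Normal(mean, std).log_prob(action)


action = torch.tensor([2.0, 3.0])    # (2,)
mean = torch.tensor([[2.0], [3.0]])  # (2,1), the problematic shape
std = torch.ones(2, 1)               # (2,1)

try:
    checked_log_prob(mean, std, action)
except ValueError as err:
    print(err)  # the mismatch is now reported instead of broadcast

# Reshaping the parameters to the action's shape gives the expected result.
logprob = checked_log_prob(
    mean.reshape(action.shape), std.reshape(action.shape), action
)
print(logprob.shape)  # torch.Size([2])
```

In a PPO update you would probably put a check like this right where the policy output meets the stored actions, since that is where an extra dimension usually sneaks in.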

Jae-Kyung Cho LLM Developers who was a Robotics engineer
