Improve numerical stability of torch.sigmoid by ymwangg · Pull Request #4311 · pytorch/xla

ymwangg · 2022-12-10T00:18:50Z

We recently found the torch_xla lowering of torch.sigmoid is not numerically stable on GPU. One common use-case of torch.sigmoid is to force the output value to be within [0,1].
For example, the following code fails with a nan loss because x = -5.9604645e-08, which is negative and therefore outside the [0,1] range that binary_cross_entropy expects.

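import torch
import torch_xla.core.xla_model as xm
device = xm.xla_device()
x = torch.sigmoid(torch.tensor([-16.740633], device=device))  # -5.9604645e-08 with the tanh lowering on GPU
y = torch.tensor([1.0], device=device)
print(torch.nn.functional.binary_cross_entropy(x, y))  # prints tensor(nan, device='xla:1')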

Are there any special reasons for torch_xla to use sigmoid(x) = 0.5+0.5*tanh(0.5*x) instead of sigmoid(x) = 1 / (1 + exp(-x))?
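A minimal sketch of how such a negative value could arise from the tanh form. It assumes (this is not confirmed anywhere in the thread) that the GPU tanh approximation returns a value one float32 ulp below -1 for this input. NumPy on CPU stands in for the device, and the overshoot is simulated with np.nextafter rather than measured.

```python
import numpy as np

half = np.float32(0.5)
one = np.float32(1.0)

# Assumption: tanh(0.5 * x) comes back one float32 ulp below -1.
t = np.nextafter(np.float32(-1.0), np.float32(-2.0))

# tanh form: 0.5 + 0.5 * tanh(0.5 * x)
x_tanh = half + half * t
print(x_tanh)  # -5.9604645e-08, the value reported above

# binary_cross_entropy with target 1 reduces to -log(x); log of a negative is nan.
with np.errstate(invalid="ignore"):
    loss = -np.log(x_tanh)
print(np.isnan(loss))  # True

# exp form: 1 / (1 + exp(-x)) with x = -16.740633 stays strictly positive.
x_ref = one / (one + np.exp(np.float32(16.740633)))
print(x_ref > 0)  # True
```

Under that assumption the tanh form lands exactly one ulp-of-0.5 below zero, while the exp form cannot go negative because both numerator and denominator are positive.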

ymwangg · 2022-12-12T18:58:05Z

OK, it looks like this issue exists on both CPU and GPU:
cpu: https://app.circleci.com/pipelines/github/pytorch/xla/14953/workflows/946bc70e-a4ba-4e89-be01-0d7ea7f653cb/jobs/34704
gpu: https://app.circleci.com/pipelines/github/pytorch/xla/14953/workflows/946bc70e-a4ba-4e89-be01-0d7ea7f653cb/jobs/34703

JackCaoG · 2022-12-13T00:37:24Z

I have a feeling that it might be because sigmoid(x) = 0.5+0.5*tanh(0.5*x) is faster. Let me double check.

ymwangg · 2022-12-13T19:43:30Z

Yes, the tanh implementation is slightly faster on GPU.
Using the following script:

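import time
import torch
import torch_xla.core.xla_model as xm
device = xm.xla_device()
x = torch.rand(1000000000, device=device)
xm.mark_step()
t0 = time.time()
for _ in range(100):
    for _ in range(100):
        y = torch.sigmoid(x)
    xm.mark_step()  # cut and dispatch the pending graph after every 100 sigmoid calls
t1 = time.time()
print(t1 - t0)  # elapsed wall-clock seconds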

I'm getting 1.2621409893035889 seconds with the tanh implementation (with clamp) and 1.301847219467163 seconds with the normal 1 / (1 + exp(-x)) implementation.

If we want to keep the tanh implementation, one way is to wrap it with xla::Clamp(zero, half + half * xla::Tanh(half * input), one).
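A rough NumPy sketch of that clamp idea, not the torch_xla lowering itself. The names sigmoid_tanh_clamped and overshooting_tanh are hypothetical, and overshooting_tanh only simulates a tanh that returns one float32 ulp below -1 so the effect of the clamp is visible on CPU.

```python
import numpy as np

def sigmoid_tanh_clamped(x, tanh=np.tanh):
    """clamp(0, 0.5 + 0.5 * tanh(0.5 * x), 1) in float32.

    `tanh` is injectable only so an inexact tanh can be simulated.
    """
    x = np.asarray(x, dtype=np.float32)
    half = np.float32(0.5)
    raw = half + half * np.asarray(tanh(half * x), dtype=np.float32)
    return np.clip(raw, np.float32(0.0), np.float32(1.0))

def overshooting_tanh(v):
    # Hypothetical stand-in: always returns one ulp below -1.
    return np.full_like(v, np.nextafter(np.float32(-1.0), np.float32(-2.0)))

# Without the clamp this input would give -5.9604645e-08; with it, exactly 0.
clamped = sigmoid_tanh_clamped([-16.740633], tanh=overshooting_tanh)

# With an accurate tanh the clamp does not change in-range results.
regular = sigmoid_tanh_clamped([-100.0, -16.740633, 0.0, 16.740633, 100.0])
```

The clamp only guarantees the output range; it does not recover the precision lost near the boundaries.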

JackCaoG · 2022-12-14T15:04:49Z

I talked with Blake. Speed was the main reason we used tanh, and TPU does not have this numerical instability issue. He suggested that we lower sigmoid using XlaOp Logistic(XlaOp operand); which will have different TPU and GPU implementations in the backend to handle the subtle differences between accelerators.

ymwangg · 2022-12-14T18:05:03Z

Updated, and thanks for the info. I just realized xla::Logistic is equivalent to torch.sigmoid.
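For reference, a boundary check on the textbook logistic 1 / (1 + exp(-x)) can be sketched on CPU with NumPy. This is an illustration only: it is not the unit test from this PR, and it says nothing about how any XLA backend implements xla::Logistic. The helper name logistic is hypothetical.

```python
import numpy as np

def logistic(x):
    """Textbook logistic 1 / (1 + exp(-x)) evaluated in float32."""
    x = np.asarray(x, dtype=np.float32)
    one = np.float32(1.0)
    # exp(-x) overflows to inf for very negative x; 1 / (1 + inf) is still 0.
    with np.errstate(over="ignore"):
        return one / (one + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.random(1000, dtype=np.float32)

# Push inputs far into both tails and check the outputs stay inside [0, 1].
lower = logistic(x * np.float32(-100.0))
upper = logistic(x * np.float32(100.0))
in_bounds = bool(np.all(lower >= 0.0) and np.all(upper <= 1.0))
```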

JackCaoG

Thanks!

JackCaoG added the xla:gpu label Dec 10, 2022

ymwangg force-pushed the fix_sigmoid branch from 6681a9b to 90ca591 Compare December 12, 2022 19:18

ymwangg changed the title ~~[Draft] Improve numerical stability of torch.sigmoid~~ Improve numerical stability of torch.sigmoid Dec 12, 2022

Add unit test for sigmoid boundary check

50fb372

ymwangg force-pushed the fix_sigmoid branch from 90ca591 to 7e40c0a Compare December 14, 2022 18:01

Improve numerical stability of torch.sigmoid

456d196

ymwangg force-pushed the fix_sigmoid branch from 7e40c0a to 456d196 Compare December 14, 2022 18:03

JackCaoG approved these changes Dec 14, 2022

JackCaoG added the lowering ATen Operation lowering label Dec 14, 2022

JackCaoG merged commit 453aa65 into pytorch:master Dec 21, 2022

JackCaoG merged 2 commits into pytorch:master from ymwangg:fix_sigmoid