Incorrect gradient with different stride/kernel sizes #2

@conscell

Description

@slvrfn
Thank you for the nice tutorial.
I compared the gradients produced by your implementation with those from PyTorch's conv2d. For some combinations of stride and kernel size, the results appear to be incorrect.

import numpy as np
from convolution import Conv2D

in_channels = 3
out_channels = 128
kernel_size = 3
stride = 3                               ### <-- Here, the stride is changed
padding = 1
h_in = 12
w_in = 10

h_out = (h_in + 2 * padding - kernel_size) // stride + 1
w_out = (w_in + 2 * padding - kernel_size) // stride + 1

x_shape = (4, in_channels, h_in, w_in)        # input: (N, C_in, H_in, W_in)
dout_shape = (4, out_channels, h_out, w_out)  # upstream gradient: (N, C_out, H_out, W_out)

np.random.seed(42)

x = np.random.random(x_shape)     # random data for the forward pass
dout = np.random.random(dout_shape)  # random upstream gradient for the backward pass
print('x: ', x.shape)
print('d_out: ', dout.shape)

conv = Conv2D(in_channels, out_channels, kernel_size, stride, padding)

conv_out = conv.forward(x)
print('conv_out: ', conv_out.shape)

db, dw, dx = conv.backward(dout)
print('db: ', db.shape)
print('dw: ', dw.shape)
print('dx: ', dx.shape)

import torch
xx = torch.tensor(x, requires_grad=True)
ww = torch.tensor(conv.weight, requires_grad=True)
bb = torch.tensor(conv.bias, requires_grad=True)
dd = torch.tensor(dout)

f = torch.nn.functional.conv2d(xx, ww, bb, stride, padding)
f.backward(dd)

assert np.allclose(dx, xx.grad.numpy())

The final assertion fails:

    assert np.allclose(dx, xx.grad.numpy())
AssertionError

Additionally, for kernel_size = 2 with stride = 2, the gradients are also incorrect.
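For reference, one common cause of exactly this symptom (a sketch of a hypothesis, not confirmed to be the bug in this repo): the input gradient of a strided convolution is a transposed convolution of dout with the same weights, and when stride does not evenly divide (h_in + 2*padding - kernel_size), the forward pass silently discards trailing rows/columns that the backward pass must restore via an output_padding term of (h_in + 2*padding - kernel_size) % stride. A naive backward drops this term. A minimal check with torch, using the same shapes as the repro above (stride = 3, h_in = 12, w_in = 10):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
in_channels, out_channels = 3, 8   # fewer output channels than the repro, for speed
kernel_size, stride, padding = 3, 3, 1
h_in, w_in = 12, 10

x = torch.randn(4, in_channels, h_in, w_in, dtype=torch.float64, requires_grad=True)
w = torch.randn(out_channels, in_channels, kernel_size, kernel_size, dtype=torch.float64)

out = F.conv2d(x, w, stride=stride, padding=padding)
dout = torch.randn_like(out)
out.backward(dout)  # autograd reference for dx

# Rows/columns the forward pass discarded because stride did not divide evenly:
opad_h = (h_in + 2 * padding - kernel_size) % stride  # = 2 here
opad_w = (w_in + 2 * padding - kernel_size) % stride  # = 0 here

# dx is a transposed convolution of dout with the same weights;
# output_padding restores the discarded rows/columns so dx matches x's shape.
dx = F.conv_transpose2d(dout, w, stride=stride, padding=padding,
                        output_padding=(opad_h, opad_w))

assert dx.shape == x.shape
assert torch.allclose(x.grad, dx)
```

If the custom backward omits the equivalent of output_padding (e.g. it sizes dx from the dilated dout alone), dx will disagree with autograd precisely when (h_in + 2*padding - kernel_size) % stride != 0, which covers both failing cases reported here: stride 3 with h_in = 12, and kernel_size 2 with stride 2 on odd effective sizes.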
