Sign in adjoint derivative calculation #35

@spenrich

I've been following the paper "Differentiating Through a Cone Program" and the code side-by-side, and I'm having trouble figuring out if there is a sign error in the adjoint derivative code or if I've misunderstood something.

    dw = -(x @ dx + y @ dy + s @ ds)
    dz = np.concatenate(
        [dx, D_proj_dual_cone.rmatvec(dy + ds) - ds, np.array([dw])])
    if np.allclose(dz, 0):
        r = np.zeros(dz.shape)
    elif mode == "dense":
        r = _diffcp._solve_adjoint_derivative_dense(M, MT, dz)
    else:
        r = _diffcp.lsqr(MT, dz).solution
    values = pi_z[cols] * r[rows + n] - pi_z[n + rows] * r[cols]
    dA = sparse.csc_matrix((values, (rows, cols)), shape=A.shape)
    db = pi_z[n:n + m] * r[-1] - pi_z[-1] * r[n:n + m]
    dc = pi_z[:n] * r[-1] - pi_z[-1] * r[:n]
    return dA, db, dc

It seems like, when compared to the paper, the code solves M.T @ r = dz for r, whereas the paper solves M.T @ g = -dz for g. So r = -g. But then the equations used in the code to compute (dA, db, dc) seem to match those in the paper, when they should all differ by a negative sign.
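To make the concern concrete, here is a minimal sketch of why the sign convention matters. It assumes a random dense matrix as a stand-in for M and random vectors for dz and pi_z (none of these come from diffcp itself), and it reuses only the db and dc formulas from the snippet above. Because those formulas are linear in the solved vector, swapping r for g = -r flips the sign of both results.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
N = n + m + 1

# Hypothetical stand-ins: a random invertible matrix in place of the real M,
# and random vectors in place of the real dz and pi_z.
M = rng.standard_normal((N, N))
dz = rng.standard_normal(N)
pi_z = rng.standard_normal(N)

# Convention used in the code: M.T @ r = dz.
r = np.linalg.solve(M.T, dz)
# Convention as I read it in the paper: M.T @ g = -dz.
g = np.linalg.solve(M.T, -dz)


def db_dc(v):
    """Apply the db and dc formulas from the snippet to a solved vector v."""
    db = pi_z[n:n + m] * v[-1] - pi_z[-1] * v[n:n + m]
    dc = pi_z[:n] * v[-1] - pi_z[-1] * v[:n]
    return db, dc


db_r, dc_r = db_dc(r)
db_g, dc_g = db_dc(g)

print(np.allclose(r, -g))        # the two solves differ only by sign
print(np.allclose(db_r, -db_g))  # so db flips sign between conventions
print(np.allclose(dc_r, -dc_g))  # and so does dc
```

This only demonstrates the linearity argument. It does not say which convention is the correct one for the actual derivative.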

Similarly, for the forward-mode derivative, you solve M @ dz = dQ @ pi_z for dz and use the same equations as in the paper despite the sign difference, but you multiply (dx, dy, ds) by -1 before returning, so this is fine.

Is this a sign error in the adjoint derivative, or did I get something wrong?
