new stepsize rule adaPG(pi,r) #31

Merged: 1 commit, Jan 18, 2024
37 changes: 37 additions & 0 deletions src/AdaProx.jl
@@ -272,6 +272,43 @@ function stepsize(rule::OurRule, (gamma1, gamma0), x1, grad_x1, x0, grad_x0)
    return (gamma, sigma), (gamma, gamma1)
end



struct OurRulePlus{R}
Collaborator:
Man, we really need to improve the naming convention 😅

Should we just use “author initials + year + rule” all over the module?

Collaborator:
Maybe in a separate PR, though; I can do that.

Owner Author:
I agree it is a mess. The name OurRulePlus is terrible indeed. Your suggestion sounds like a great idea.

    gamma::R
    xi::R
    nu::R
    r::R
end

function OurRulePlus(; gamma = 0, nu = 1, xi = 1, r = 1/2)
Collaborator:
Why use the default gamma = 0? I guess it's required to be positive to initialize the stepsize sequence, so you could also just not give it a default.

Owner Author:
Do we keep it in the same style as OurRule (to be updated with added sigma for the primal-dual version), or shall I remove gamma already?

    _gamma = if gamma > 0
Collaborator:
You could also avoid introducing _gamma and just do

if gamma <= 0
    error(…)
end

no?

Owner Author:
I see. This stepsize rule is fed into the PD algorithm, which is why I kept the same style as we had for OurRule. Only sigma is changed to gamma, because we should first check how exactly the stepsize rule looks there.

        gamma
    else
        error("you must provide gamma > 0")
    end
    R = typeof(_gamma)
    return OurRulePlus{R}(_gamma, R(xi), R(nu), R(r))
end

function stepsize(rule::OurRulePlus)
    gamma = rule.gamma
    return (gamma, gamma), (gamma, gamma)
Collaborator:
I think the convention we used elsewhere is for the second stepsize returned to be the dual stepsize (after fixing the ratio between the two). So here the ratio is just 1, right?

Owner Author:
Then again, this is because in the near future we should add the primal-dual stepsize version, where the first (gamma, gamma) becomes (gamma, sigma); see the sketch after this function.

end
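
A minimal sketch, not part of the diff, of the (primal, dual) return convention discussed in the thread above: from the code shown here, the first returned tuple appears to be the (primal, dual) stepsize pair and the second the rule's state. The name FixedRatioRule and its field t are hypothetical and not part of AdaProx.jl.

# Hypothetical rule with a fixed dual/primal ratio t; for OurRulePlus the ratio is 1,
# so both entries of the first tuple coincide.
struct FixedRatioRule{R}
    gamma::R   # primal stepsize
    t::R       # fixed dual/primal stepsize ratio
end

function stepsize(rule::FixedRatioRule)
    gamma = rule.gamma
    sigma = rule.t * gamma                 # dual stepsize obtained from the fixed ratio
    return (gamma, sigma), (gamma, gamma)  # (primal, dual) pair, then the rule state
end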

function stepsize(rule::OurRulePlus, (gamma1, gamma0), x1, grad_x1, x0, grad_x0)
    C = norm(grad_x1 - grad_x0)^2 / dot(grad_x1 - grad_x0, x1 - x0) |> nan_to_zero
    L = dot(grad_x1 - grad_x0, x1 - x0) / norm(x1 - x0)^2 |> nan_to_zero
    D = 1 - 2*rule.r + gamma1 * L * (gamma1 * C + 2*(rule.r - 1)) |> nan_to_zero
    gamma = gamma1 * min(
        sqrt(1 / (rule.r * (rule.nu + rule.xi)) + gamma1 / gamma0),
        sqrt((rule.nu * (1 + rule.xi) - 1) / (rule.nu * (rule.nu + rule.xi))) / sqrt(max(D, 0))
    )
    return (gamma, gamma), (gamma, gamma1)
end



function adaptive_primal_dual(
    x,
    y;
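
To make the new rule concrete, here is a self-contained sketch, an illustration rather than AdaProx.jl API, of how the same adaPG(pi, r)-style update could drive a plain gradient-descent loop on a small quadratic. The helpers nan_to_zero, adapg_gamma, and run_adapg_demo are made up for this example; the keyword defaults mirror the constructor above (nu = 1, xi = 1, r = 1/2).

using LinearAlgebra

# Illustrative stand-in for the package's NaN guard.
nan_to_zero(x) = isnan(x) ? zero(x) : x

# Same update as stepsize(rule::OurRulePlus, ...) above, written as a free function.
function adapg_gamma(gamma1, gamma0, x1, grad_x1, x0, grad_x0; nu = 1.0, xi = 1.0, r = 0.5)
    C = norm(grad_x1 - grad_x0)^2 / dot(grad_x1 - grad_x0, x1 - x0) |> nan_to_zero
    L = dot(grad_x1 - grad_x0, x1 - x0) / norm(x1 - x0)^2 |> nan_to_zero
    D = 1 - 2*r + gamma1 * L * (gamma1 * C + 2*(r - 1)) |> nan_to_zero
    return gamma1 * min(
        sqrt(1 / (r * (nu + xi)) + gamma1 / gamma0),
        sqrt((nu * (1 + xi) - 1) / (nu * (nu + xi))) / sqrt(max(D, 0))
    )
end

function run_adapg_demo(; iters = 100)
    Q = [2.0 0.0; 0.0 10.0]          # minimize f(x) = 0.5 * x' * Q * x, so grad f(x) = Q * x
    grad(x) = Q * x
    x0 = [1.0, 1.0]
    gamma0 = gamma1 = 1e-2           # any gamma > 0, as the constructor above requires
    x1 = x0 - gamma1 * grad(x0)      # one plain gradient step to obtain two iterates
    for _ in 1:iters
        gamma = adapg_gamma(gamma1, gamma0, x1, grad(x1), x0, grad(x0))
        x0, x1 = x1, x1 - gamma * grad(x1)
        gamma0, gamma1 = gamma1, gamma
    end
    return x1
end

@show norm(run_adapg_demo())   # expected to be close to zero for this convex quadratic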