Profile results #26
Testing notebook: https://gist.github.com/58ebc10424acd99d4514003e6d978076
@mrocklin Has gradient_descent been optimized (using delayed, persist, etc.) in the same way that the other functions have? I might be refactoring soon and I wanted to make sure that piece was taken care of first.
I think so
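For context, "optimized using delayed, persist, etc." here roughly means keeping the per-iteration task graph small. Below is a minimal sketch of that idea, not dask-glm's actual gradient_descent; the function name, fixed step size, and stopping rule are illustrative assumptions.

```python
import numpy as np
import dask.array as da
from dask import compute, persist

def gradient_descent_sketch(X, y, max_iter=100, step=1.0, tol=1e-8):
    """Hedged sketch of gradient descent for logistic regression on dask arrays."""
    # Persist the (large) data once so every iteration reuses the
    # materialized chunks instead of recomputing them from scratch.
    X, y = persist(X, y)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        Xbeta = X.dot(beta)
        prob = 1.0 / (1.0 + da.exp(-Xbeta))     # predicted probabilities
        grad = X.T.dot(prob - y) / n            # logistic-loss gradient
        # The gradient is tiny (length p), so compute it eagerly each
        # iteration; this keeps the graph from growing across iterations.
        (grad,) = compute(grad)
        beta_new = beta - step * grad
        if np.linalg.norm(beta_new - beta) < tol:
            break
        beta = beta_new
    return beta
```

Persisting X and y up front and pulling only the small gradient back each step is the pattern the question above is asking about.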
Note that @eriknw is working on a dask optimization that may help reduce overhead here: dask/dask#1979
I sat down with @amueller and we compared against sklearn's SGD. We found that proximal_grad and sklearn's SGDClassifier had similar runtimes on a single machine (using dask.distributed; we didn't try the threaded scheduler). Presumably SGD was being a bit smarter while dask-glm was using more hardware.
@mrocklin Did you look at ADMM? I'm starting to think that, going forward, we should only employ ADMM, Newton, and gradient_descent.
Nope, we only spent a few minutes on it. We ran the following:

Prep

```python
import dask.array as da
import numpy as np
from dask_glm.logistic import *
from dask_glm.utils import *
from distributed import Client

c = Client('localhost:8786')   # connect to an already-running scheduler

N = 1e7
chunks = 1e6
seed = 20009                   # defined but unused as written

X = da.random.random((N, 2), chunks=chunks)
y = make_y(X, beta=np.array([-1.5, 3]), chunks=chunks)
X, y = persist(X, y)           # materialize the data on the cluster
```

Dask GLM

```python
%time proximal_grad(X, y)
```

SKLearn

```python
from sklearn.linear_model import SGDClassifier

nX, ny = compute(X, y)         # pull the data into local numpy arrays
%time sgd = SGDClassifier(loss='log', n_iter=10, verbose=10, fit_intercept=False).fit(nX, ny)
```
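As a quick sanity check on either fit, one could compare the recovered coefficients against the generating beta = [-1.5, 3]. The sketch below assumes proximal_grad returns the fitted coefficient vector, which may not match dask-glm's exact return type, and both solvers' default regularization will shrink the estimates somewhat.

```python
import numpy as np

true_beta = np.array([-1.5, 3.0])

beta_glm = proximal_grad(X, y)     # assumption: returns the coefficient vector
beta_sgd = sgd.coef_.ravel()       # SGDClassifier stores coef_ with shape (1, p)

print("dask-glm :", np.asarray(beta_glm))
print("sklearn  :", beta_sgd)
print("true     :", true_beta)
```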
I haven't looked into whether we could use this data for benchmarking, but the incredibly large dataset over at https://www.kaggle.com/c/outbrain-click-prediction/data seems like it could be a good candidate. We might have to process the data a little bit before fitting a model, but I wouldn't mind taking a stab at that piece.
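For reference, a hedged sketch of what that preprocessing might look like with dask.dataframe is below. The file and column names (clicks_train.csv with display_id, ad_id, clicked) are assumptions about the Kaggle layout, and raw id columns would of course need real feature engineering before fitting.

```python
import dask.dataframe as dd

# Assumed file/column layout for the Kaggle Outbrain data; adjust to the
# actual schema before use.
clicks = dd.read_csv('clicks_train.csv')

# Toy feature matrix / label vector in the shape dask-glm solvers expect.
# A real model would join in the other tables and encode the categorical
# id columns properly rather than feed them in as raw floats.
X = clicks[['display_id', 'ad_id']].to_dask_array(lengths=True).astype('float64')
y = clicks['clicked'].to_dask_array(lengths=True)
```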
On eight m4.2xlarges I created the following dataset
I then ran the various methods within this project and recorded the profiles as bokeh plots. They are linked to below:
Additionally, I ran against a 10x larger dataset and got the following results
Most runtimes were around a minute. The BFGS solution gave wrong results.
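The bokeh plots themselves are not reproduced above. For readers who want to generate similar profile plots on a single machine, dask's built-in diagnostics can produce them; note these profilers only cover the local schedulers, not the distributed scheduler used in this thread, and the computation below is an arbitrary stand-in.

```python
from dask.diagnostics import Profiler, ResourceProfiler, visualize
import dask.array as da

# Arbitrary stand-in computation to profile.
x = da.random.random((int(1e6), 2), chunks=(int(1e5), 2))

with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
    (x * x).sum().compute()

# Writes an HTML file containing the bokeh profile plots.
visualize([prof, rprof], filename='profile.html', show=False)
```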
Notes
On larger problems with smallish chunks (8 bytes * 3 * 1e6 == 24 MB) we seem to be bound by scheduling overhead. I've created an isolated benchmark here that is representative of this overhead: https://gist.github.com/mrocklin/48b7c4b610db63b2ee816bd387b5a328
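The linked gist is not reproduced here, but a benchmark in the same spirit might look like the sketch below (this is not the gist's actual code): many small chunks with trivial per-chunk work, so wall time is dominated by scheduler overhead rather than computation.

```python
import time
import dask.array as da
from distributed import Client

client = Client()   # local cluster; point at 'localhost:8786' to match the setup above

# ~1000 chunks of ~0.8 MB each: per-chunk work is trivial, so most of the
# elapsed time is scheduling overhead.
x = da.random.random(int(1e8), chunks=int(1e5))

start = time.time()
x.sum().compute()
elapsed = time.time() - start
print(f"{x.npartitions} chunks in {elapsed:.2f}s, "
      f"roughly {1000 * elapsed / x.npartitions:.2f} ms per chunk")
```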