Profile results #26
Testing notebook: https://gist.github.com/58ebc10424acd99d4514003e6d978076
@mrocklin Has gradient_descent been optimized (using delayed, persist, etc.) in the same way that the other functions have? I might be refactoring soon and I wanted to make sure that piece was taken care of first.
I think so
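For context, "optimized using delayed, persist, etc." here roughly means keeping the per-iteration task graph small. Below is a minimal sketch of that idea, not dask-glm's actual gradient_descent; the function name, fixed step size, and stopping rule are illustrative assumptions.

```python
import numpy as np
import dask.array as da
from dask import compute, persist

def gradient_descent_sketch(X, y, max_iter=100, step=1.0, tol=1e-8):
    """Hedged sketch of gradient descent for logistic regression on dask arrays."""
    # Persist the (large) data once so every iteration reuses the
    # materialized chunks instead of recomputing them from scratch.
    X, y = persist(X, y)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        Xbeta = X.dot(beta)
        prob = 1.0 / (1.0 + da.exp(-Xbeta))     # predicted probabilities
        grad = X.T.dot(prob - y) / n            # logistic-loss gradient
        # The gradient is tiny (length p), so compute it eagerly each
        # iteration; this keeps the graph from growing across iterations.
        (grad,) = compute(grad)
        beta_new = beta - step * grad
        if np.linalg.norm(beta_new - beta) < tol:
            break
        beta = beta_new
    return beta
```

Persisting X and y up front and pulling only the small gradient back each step is the pattern the question above is asking about.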
Note that @eriknw is working on a dask optimization that may help reduce overhead here: dask/dask#1979
I sat down with @amueller and we compared against sklearn's SGD. We found that proximal_grad and sklearn's SGDClassifier had similar runtimes on a single machine (using dask.distributed; we didn't try the threaded scheduler). Presumably SGD was being a bit smarter while dask-glm was using more hardware.
@mrocklin Did you look at ADMM? I'm starting to think that, going forward, we should only employ ADMM, Newton, and gradient_descent.
Nope, we only spent a few minutes on it. We ran the following:

Prep

```python
import dask.array as da
import numpy as np
from dask_glm.logistic import *
from dask_glm.utils import *
from distributed import Client

c = Client('localhost:8786')   # connect to an already-running scheduler

N = 1e7
chunks = 1e6
seed = 20009                   # defined but unused as written

X = da.random.random((N, 2), chunks=chunks)
y = make_y(X, beta=np.array([-1.5, 3]), chunks=chunks)
X, y = persist(X, y)           # materialize the data on the cluster
```

Dask GLM

```python
%time proximal_grad(X, y)
```

SKLearn

```python
from sklearn.linear_model import SGDClassifier

nX, ny = compute(X, y)         # pull the data into local numpy arrays
%time sgd = SGDClassifier(loss='log', n_iter=10, verbose=10, fit_intercept=False).fit(nX, ny)
```
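As a quick sanity check on either fit, one could compare the recovered coefficients against the generating beta = [-1.5, 3]. The sketch below assumes proximal_grad returns the fitted coefficient vector, which may not match dask-glm's exact return type, and both solvers' default regularization will shrink the estimates somewhat.

```python
import numpy as np

true_beta = np.array([-1.5, 3.0])

beta_glm = proximal_grad(X, y)     # assumption: returns the coefficient vector
beta_sgd = sgd.coef_.ravel()       # SGDClassifier stores coef_ with shape (1, p)

print("dask-glm :", np.asarray(beta_glm))
print("sklearn  :", beta_sgd)
print("true     :", true_beta)
```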
I haven't looked into whether we could use this data for benchmarking, but the incredibly large dataset over at https://www.kaggle.com/c/outbrain-click-prediction/data seems like it could be a good candidate. We might have to process the data a little bit before fitting a model, but I wouldn't mind taking a stab at that piece.
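For reference, a hedged sketch of what that preprocessing might look like with dask.dataframe is below. The file and column names (clicks_train.csv with display_id, ad_id, clicked) are assumptions about the Kaggle layout, and raw id columns would of course need real feature engineering before fitting.

```python
import dask.dataframe as dd

# Assumed file/column layout for the Kaggle Outbrain data; adjust to the
# actual schema before use.
clicks = dd.read_csv('clicks_train.csv')

# Toy feature matrix / label vector in the shape dask-glm solvers expect.
# A real model would join in the other tables and encode the categorical
# id columns properly rather than feed them in as raw floats.
X = clicks[['display_id', 'ad_id']].to_dask_array(lengths=True).astype('float64')
y = clicks['clicked'].to_dask_array(lengths=True)
```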
On eight m4.2xlarges I created the following dataset
I then ran the various methods within this project and recorded the profiles as bokeh plots. They are linked to below:
Additionally, I ran against a 10x larger dataset and got the following results
Most runtimes were around a minute. The BFGS solution gave wrong results.
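The bokeh plots themselves are not reproduced above. For readers who want to generate similar profile plots on a single machine, dask's built-in diagnostics can produce them; note these profilers only cover the local schedulers, not the distributed scheduler used in this thread, and the computation below is an arbitrary stand-in.

```python
from dask.diagnostics import Profiler, ResourceProfiler, visualize
import dask.array as da

# Arbitrary stand-in computation to profile.
x = da.random.random((int(1e6), 2), chunks=(int(1e5), 2))

with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
    (x * x).sum().compute()

# Writes an HTML file containing the bokeh profile plots.
visualize([prof, rprof], filename='profile.html', show=False)
```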
Notes
On larger problems with smallish chunks (8 bytes * 3 * 1e6 == 24 MB) we seem to be bound by scheduling overhead. I've created an isolated benchmark here that is representative of this overhead: https://gist.github.com/mrocklin/48b7c4b610db63b2ee816bd387b5a328
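The linked gist is not reproduced here, but a benchmark in the same spirit might look like the sketch below (this is not the gist's actual code): many small chunks with trivial per-chunk work, so wall time is dominated by scheduler overhead rather than computation.

```python
import time
import dask.array as da
from distributed import Client

client = Client()   # local cluster; point at 'localhost:8786' to match the setup above

# ~1000 chunks of ~0.8 MB each: per-chunk work is trivial, so most of the
# elapsed time is scheduling overhead.
x = da.random.random(int(1e8), chunks=int(1e5))

start = time.time()
x.sum().compute()
elapsed = time.time() - start
print(f"{x.npartitions} chunks in {elapsed:.2f}s, "
      f"roughly {1000 * elapsed / x.npartitions:.2f} ms per chunk")
```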