How to handle intercept term? #13
Comments
If the question is how to concatenate a column of ones, then the answer is to use da.concatenate:
In [2]: import dask.array as da
In [3]: x = da.random.random((5, 2), chunks=(2, 2))
In [4]: o = da.ones((x.shape[0], 1), chunks=(x.chunks[0], (1,)))
In [5]: z = da.concatenate([x, o], axis=1)
In [6]: z.compute()
Out[6]:
array([[ 0.16174789, 0.06872224, 1. ],
[ 0.01018076, 0.68570003, 1. ],
[ 0.31238221, 0.91503403, 1. ],
[ 0.90225416, 0.04750495, 1. ],
[ 0.98440154, 0.22888387, 1. ]])
A naive attempt at using this to add an intercept fails:
X = da.random.random((100, 2), chunks=(50, 2))
y = make_y(X, beta=np.array([-1.0, 2]), chunks=(50,))
o = da.ones((X.shape[0], 1), chunks=(X.chunks[0], (1,)))
z = da.concatenate([X, o], axis=1)
admm(z, y)
...
ValueError: shapes (50,1) and (3,) not aligned: 1 (dim 1) != 3 (dim 0)
Traceback
File "algorithms.py", line 199, in wrapped
return func(beta, X, y) + (rho / 2) * np.dot(beta - z + u,
File "families.py", line 17, in pointwise_loss
Xbeta = X.dot(beta)
This could be an issue with how
These lines are problematic
I recommend first rechunking these arrays to have only a single chunk along columns:
In [8]: z
Out[8]: dask.array<concate..., shape=(100, 3), dtype=float64, chunksize=(50, 2)>
In [9]: z.rechunk((None, z.shape[1]))
Out[9]: dask.array<rechunk..., shape=(100, 3), dtype=float64, chunksize=(50, 3)>
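To make the chunking issue concrete, here is a small sketch that inspects the column chunks before and after rechunking. It assumes that the failing code applies the loss to each block separately, which is what the (50,1) shape in the error message suggests; the names `beta`, `dot_failed` and `z_single` are only illustrative.

```python
import numpy as np
import dask.array as da

X = da.random.random((100, 2), chunks=(50, 2))
o = da.ones((X.shape[0], 1), chunks=(X.chunks[0], (1,)))
z = da.concatenate([X, o], axis=1)

# Concatenation keeps the input column chunks side by side: one block of
# width 2 (from X) and one block of width 1 (the ones column).
print(z.chunks)

# A block of width 1 cannot be multiplied by a length-3 coefficient vector,
# which reproduces the kind of ValueError shown in the traceback.
beta = np.zeros(3)
try:
    np.ones((50, 1)).dot(beta)
    dot_failed = False
except ValueError as err:
    dot_failed = True
    print(err)

# Rechunking axis 1 into a single chunk gives every block all three columns.
z_single = z.rechunk({1: z.shape[1]})
print(z_single.chunks)
```

The dict form of rechunk is used here so that only the column axis is touched and the row chunks stay as they were.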
What is the best way to handle intercepts?
Right now, the algorithms assume the user creates a column of 1s in their dask array, à la statsmodels. However, sometimes it's convenient to have a fit_intercept option similar to scikit-learn. Having this option set to True will require a step which appends a column of 1's to the user-supplied dask array, but it won't be as simple as the corresponding numpy case. @mrocklin
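One possible shape for the appending step, combining the concatenate and rechunk suggestions from the comments, is sketched below. The helper name `add_intercept` is hypothetical and not taken from the project; a fit_intercept=True option could call something like it before handing the array to a solver.

```python
import dask.array as da


def add_intercept(X):
    """Hypothetical helper: append a column of ones to a 2-D dask array."""
    # Build the ones column with the same row chunks as X so the blocks align.
    ones = da.ones((X.shape[0], 1), chunks=(X.chunks[0], (1,)), dtype=X.dtype)
    z = da.concatenate([X, ones], axis=1)
    # Merge the column chunks so each block holds every column.
    return z.rechunk({1: z.shape[1]})


X = da.random.random((100, 2), chunks=(50, 2))
Z = add_intercept(X)
print(Z.shape, Z.chunks)
```

Placing the ones column last matches the sessions in the comments; whether the intercept should come first or last is a design choice this sketch does not settle.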