Criteo benchmark #52

Open
mrocklin opened this issue May 9, 2017 · 2 comments

Comments

@mrocklin
Member

mrocklin commented May 9, 2017

I tried out dask-glm on a subset of the Criteo data here:

https://gist.github.com/mrocklin/1a1c0b011e187a750a050eb330ac36b2

This used the following:

  1. The LogisticRegression class
  2. The lbfgs solver
  3. Mixed dense/sparse arrays from dask.array

I suspect there is still a fair amount to do here to improve both performance and the quality of the model.

cc @moody-marlin @TomAugspurger @MLnick

@mrocklin
Member Author

Some questions that arose from this work:

  1. Are our sparse matvecs performing correctly? (I wasn't getting much signal from the sparse columns at one point.)
  2. How can we transform our data or model to handle the large imbalance between not-clicked (97%) and clicked (3%) ads?
  3. How can we make L1 regularization work when we pass through non-differentiable points?
  4. Should we create transformers for the sparsification of text columns? Are there already transformers in place to do some of this that we should be using?

@MLnick
Contributor

MLnick commented May 10, 2017

For (2), typical approaches include over/under sampling (e.g. https://github.com/scikit-learn-contrib/imbalanced-learn), and using sample weights (which most sklearn estimators support).
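
As a rough illustration of the sample-weight approach, here is a minimal sketch that computes "balanced" weights, i.e. `n_samples / (n_classes * class_count)`, the same formula scikit-learn uses for `class_weight="balanced"`. The helper name `balanced_sample_weights` and the toy labels are made up for this example, and whether the dask-glm estimator accepts sample weights is not something established in this thread.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by n_samples / (n_classes * count of its class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    class_weight = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    return [class_weight[y] for y in labels]


# Toy labels with the 97% not-clicked / 3% clicked split from the question.
labels = [0] * 97 + [1] * 3
weights = balanced_sample_weights(labels)
```

With these weights each class contributes the same total weight to the loss, so the rare clicked examples are not drowned out by the not-clicked ones.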

(3): L1 regularization won't work with plain L-BFGS, since the penalty is not differentiable at zero. As mentioned in the discussion for #40, there is the "trick" to use L1 with L-BFGS; otherwise one must use OWL-QN (such as https://pypi.python.org/pypi/PyLBFGS/0.1.3).

You should be able to use regularizer='l2' though?
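
To make the non-differentiability issue concrete, here is a small sketch of a different technique from the two named above: proximal gradient descent, which handles the kink at zero with a soft-thresholding step instead of a gradient. This is neither the "trick" from #40 nor OWL-QN; the function names and the toy data are assumptions for illustration only.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value|: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def logistic_gradient(weights, rows, labels):
    """Gradient of the mean logistic loss; labels are 0 or 1."""
    grad = [0.0] * len(weights)
    for row, y in zip(rows, labels):
        z = sum(w * x for w, x in zip(weights, row))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, x in enumerate(row):
            grad[j] += (p - y) * x / len(rows)
    return grad


def proximal_gradient_l1(rows, labels, lam, step=0.5, iterations=200):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient steps."""
    weights = [0.0] * len(rows[0])
    for _ in range(iterations):
        grad = logistic_gradient(weights, rows, labels)
        # Gradient step on the smooth loss, then soft-threshold for the L1 term.
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, grad)]
    return weights


# Toy data: column 0 determines the label, column 1 is uninformative.
rows = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
labels = [1, 1, 0, 0]
sparse_fit = proximal_gradient_l1(rows, labels, lam=0.1)
all_zero_fit = proximal_gradient_l1(rows, labels, lam=0.6)
```

On this toy data the uninformative coefficient stays exactly at zero, and a large enough penalty zeroes out every coefficient, which is the sparsity behavior one wants from L1.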

(4): By "sparsification" do you mean one-hot encoding or feature-hashing type approaches? It seems you've used feature hashing here (which is what I've been using for Criteo data too). Sklearn's relevant transformers are OneHotEncoder and FeatureHasher.
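
For readers unfamiliar with the hashing trick, here is a minimal pure-Python sketch of the idea. It is not sklearn's FeatureHasher (which uses a signed MurmurHash3); the function `hash_features`, the use of MD5, the bucket count, and the example tokens are all assumptions made up for this illustration.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Map string tokens to a sparse {column_index: count} row."""
    row = {}
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 8 bytes of the digest as an integer hash.
        index = int.from_bytes(digest[:8], "big") % n_buckets
        # Colliding tokens simply add up in the same column.
        row[index] = row.get(index, 0) + 1
    return row


# Hypothetical "column=value" tokens for one ad impression.
example_tokens = ["C1=aaaa", "C2=bbbb", "C3=cccc"]
example_row = hash_features(example_tokens, n_buckets=16)
```

The output width is fixed by `n_buckets` regardless of how many distinct category values appear, which is why this approach suits high-cardinality text columns; the cost is that unrelated tokens may collide.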
