Update to Rubix ML 0.3.0
andrewdalpino committed Jan 1, 2021
1 parent c1b520b commit 0a98aaa
Showing 8 changed files with 10 additions and 4,218 deletions.
2 changes: 1 addition & 1 deletion LICENSE.md → LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2020 The Rubix ML Community
Copyright (c) 2020 Rubix ML
Copyright (c) 2020 Andrew DalPino

Permission is hereby granted, free of charge, to any person obtaining a copy
10 changes: 5 additions & 5 deletions README.md
@@ -96,7 +96,7 @@ $losses = $estimator->steps();

You'll notice that the loss should be decreasing at each epoch and changes in the loss value should get smaller the closer the learner is to converging on the minimum of the cost function.

![Cross Entropy Loss](https://raw.githubusercontent.com/RubixML/Credit/master/docs/images/training-loss.svg?sanitize=true)
![Cross Entropy Loss](https://raw.githubusercontent.com/RubixML/Credit/master/docs/images/training-loss.png)
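
The convergence behavior described above can be checked numerically rather than only by eye. The following is a minimal plain-PHP sketch, assuming the losses are available as a flat array of floats like the `$losses` variable in the README; the values shown are made up for illustration and `lossDeltas()` is a hypothetical helper, not part of Rubix ML.

```php
<?php

// Hypothetical per-epoch loss values, standing in for the $losses
// array obtained from the estimator after training.
$losses = [0.693, 0.512, 0.431, 0.396, 0.381, 0.375];

/**
 * Return the change in loss between consecutive epochs.
 * (Illustrative helper, not a library function.)
 */
function lossDeltas(array $losses): array
{
    $deltas = [];

    for ($i = 1; $i < count($losses); $i++) {
        $deltas[] = $losses[$i] - $losses[$i - 1];
    }

    return $deltas;
}

$deltas = lossDeltas($losses);

// The loss is decreasing if every epoch-to-epoch change is negative.
$decreasing = max($deltas) < 0.0;

// The steps are shrinking if each change is smaller in magnitude than the last.
$shrinking = true;

for ($i = 1; $i < count($deltas); $i++) {
    if (abs($deltas[$i]) >= abs($deltas[$i - 1])) {
        $shrinking = false;
    }
}

echo 'Decreasing: ' . ($decreasing ? 'yes' : 'no') . PHP_EOL;
echo 'Steps shrinking: ' . ($shrinking ? 'yes' : 'no') . PHP_EOL;
```

In practice a real training run may show small upward fluctuations, so a strict check like this is better treated as a sanity test than as a stopping rule.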

### Cross Validation
Once the learner has been trained, the next step is to determine if the final model can generalize well to the real world. For this process, we'll need the testing data that we set aside earlier. We'll go ahead and generate two reports that compare the predictions outputted by the estimator with the ground truth labels from the testing set.
@@ -276,12 +276,12 @@ $stats->toJSON()->write('stats.json');
### Visualizing the Dataset
The credit card dataset has 25 features and after one hot encoding it becomes 93. Thus, the vector space for this dataset is *93-dimensional*. Visualizing this type of high-dimensional data with the human eye is only possible by reducing the number of dimensions to something that makes sense to plot on a chart (1 - 3 dimensions). Such dimensionality reduction is called *Manifold Learning* because it seeks to find a lower-dimensional manifold of the data. Here we will use a popular manifold learning algorithm called [t-SNE](https://docs.rubixml.com/en/latest/embedders/t-sne.html) to help us visualize the data by embedding it into only two dimensions.

We don't need the entire dataset to generate a decent embedding so we'll take 2,000 random samples from the dataset and only embed those. The `head()` method on the dataset object will return the first *n* samples and labels from the dataset in a new dataset object. Randomizing the dataset beforehand will remove the bias as to the sequence that the data was collected and inserted.
We don't need the entire dataset to generate a decent embedding so we'll take 2,500 random samples from the dataset and only embed those. The `head()` method on the dataset object will return the first *n* samples and labels from the dataset in a new dataset object. Randomizing the dataset beforehand will remove the bias as to the sequence that the data was collected and inserted.

```php
use Rubix\ML\Datasets\Labeled;

$dataset = $dataset->randomize()->head(2000);
$dataset = $dataset->randomize()->head(2500);
```
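
To make the effect of shuffling before taking the head concrete, here is a rough plain-PHP sketch of the same idea. It is an illustration only, not how Rubix ML implements `randomize()` or `head()`; the `randomHead()` function and the toy data are hypothetical.

```php
<?php

/**
 * Shuffle samples and labels together, then keep the first $n rows.
 * (Illustrative helper, not the library's implementation.)
 */
function randomHead(array $samples, array $labels, int $n): array
{
    // Shuffle the row indices so samples and labels stay aligned.
    $order = array_keys($samples);

    shuffle($order);

    // Keep only the first $n shuffled indices.
    $order = array_slice($order, 0, $n);

    $subsetSamples = [];
    $subsetLabels = [];

    foreach ($order as $index) {
        $subsetSamples[] = $samples[$index];
        $subsetLabels[] = $labels[$index];
    }

    return [$subsetSamples, $subsetLabels];
}

// Toy data: 10 rows whose label equals the row's only feature.
$samples = array_map(function (int $i): array {
    return [$i];
}, range(0, 9));

$labels = range(0, 9);

[$subsetSamples, $subsetLabels] = randomHead($samples, $labels, 4);

echo count($subsetSamples) . ' rows kept' . PHP_EOL;
```

Because the indices are shuffled first, the rows that are kept no longer depend on the order in which the data was originally recorded, which is the bias the README text refers to.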

### Instantiating the Embedder
@@ -325,7 +325,7 @@ $ php explore.php

Here is an example of what a typical 2-dimensional embedding looks like when plotted.

![t-SNE Embedding](https://raw.githubusercontent.com/RubixML/Credit/master/docs/images/embedding.svg?sanitize=true)
![t-SNE Embedding](https://raw.githubusercontent.com/RubixML/Credit/master/docs/images/embedding.png)

> **Note**: Due to the stochastic nature of the t-SNE algorithm, every embedding will look a little different from the last. The important information is contained in the overall *structure* of the data.
@@ -345,4 +345,4 @@ Institutions: (1) Department of Information Management, Chung Hua University, Ta
>- Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
## License
The code is licensed [MIT](LICENSE.md) and the tutorial is licensed [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
The code is licensed [MIT](LICENSE) and the tutorial is licensed [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
10 changes: 3 additions & 7 deletions composer.json
@@ -3,7 +3,7 @@
"type": "project",
"description": "An example project that predicts the risk of credit card default using a Logistic Regression classifier and a 30,000 sample dataset of credit card customers.",
"homepage": "https://github.com/RubixML/Credit",
"license": "Apache-2.0",
"license": "MIT",
"keywords": [
"classification", "classifier", "credit score", "cross validation", "dataset", "data science",
"data visualization", "default risk prediction", "dimensionality reduction", "example project",
@@ -13,17 +13,13 @@
"authors": [
{
"name": "Andrew DalPino",
"email": "me@andrewdalpino.com",
"homepage": "https://andrewdalpino.com",
"homepage": "https://github.com/andrewdalpino",
"role": "Lead Engineer"
}
],
"require": {
"php": ">=7.2",
"rubix/ml": "^0.1.0"
},
"suggest": {
"ext-tensor": "For faster training and inference"
"rubix/ml": "^0.3.0"
},
"scripts": {
"explore": "@php explore.php",
Binary file added docs/images/embedding.png