2.5 (#336)
* Initial commit (#299)

* Vantage Tree (#300)

* Initial commit

* Better testing

* Improve the docs

* Rename benchmark

* Explicitly import max() function

* Fix coding style

* Wrapper interface (#314)

* Add Wrapper interface for models wrappers

* Add WrapperAware trait

* Fix PhpDoc

* Revert "Add WrapperAware trait"

This reverts commit 241abc4.

* Rename Wrapper interface to EstimatorWrapper

* PHP CS fix

* Swoole Backend (#312)

* add Swoole backend

* phpstan: ignore swoole

* feat: swoole process scheduler

* fix(swoole): redo tasks when hash collision happens

* chore(swoole): make sure coroutines are at the root of the scheduler

* chore(swoole): set affinity / bind worker to a specific CPU core

* chore(swoole): use igbinary if available

* fix: remove comment

* fix(swoole): worker cpu affinity

* fix(swoole): cpu num

* feat: scheduler improvements

* style

* chore(swoole): remove unnecessary atomics

* chore(swoole): php backwards compatibility

* fix: phpstan, socket message size

* fix: uncomment test

* style: composer fix

* Plus plus check (#317)

* Initial commit

* Allow deltas in units tests

* Swoole docs (#326)

* add Swoole backend

* phpstan: ignore swoole

* feat: swoole process scheduler

* fix(swoole): redo tasks when hash collision happens

* chore(swoole): make sure coroutines are at the root of the scheduler

* chore(swoole): set affinity / bind worker to a specific CPU core

* chore(swoole): use igbinary if available

* fix: remove comment

* fix(swoole): worker cpu affinity

* fix(swoole): cpu num

* feat: scheduler improvements

* style

* chore(swoole): remove unnecessary atomics

* chore(swoole): php backwards compatibility

* fix: phpstan, socket message size

* fix: uncomment test

* style: composer fix

* docs: Swoole backend

* Fix coding style and composer.lock

* fix(swoole): setAffinity does not exist on some versions of Swoole (#327)

* Back out Swoole Backend code

* Bump version

---------

Co-authored-by: Ronan Giron <ElGigi@users.noreply.github.com>
Co-authored-by: Mateusz Charytoniuk <mateusz.charytoniuk@protonmail.com>
3 people authored May 23, 2024
1 parent 696a2f6 commit d43a5f2
Showing 32 changed files with 926 additions and 39 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,10 @@
- 2.5.0
- Added Vantage Point Spatial tree
- Blob Generator can now `simulate()` a Dataset object
- Added Wrapper interface
- Plus Plus added check for min number of sample seeds
- LOF prevent div by 0 local reachability density

- 2.4.1
- Sentence Tokenizer fix Arabic and Farsi language support
- Optimize online variance updating
49 changes: 49 additions & 0 deletions benchmarks/Graph/Trees/VantageTreeBench.php
@@ -0,0 +1,49 @@
<?php

namespace Rubix\ML\Benchmarks\Graph\Trees;

use Rubix\ML\Graph\Trees\VantageTree;
use Rubix\ML\Datasets\Generators\Blob;
use Rubix\ML\Datasets\Generators\Agglomerate;

/**
* @Groups({"Trees"})
* @BeforeMethods({"setUp"})
*/
class VantageTreeBench
{
protected const DATASET_SIZE = 10000;

/**
* @var \Rubix\ML\Datasets\Labeled;
*/
protected $dataset;

/**
* @var VantageTree
*/
protected $tree;

public function setUp() : void
{
$generator = new Agglomerate([
'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
]);

$this->dataset = $generator->generate(self::DATASET_SIZE);

$this->tree = new VantageTree(30);
}

/**
* @Subject
* @Iterations(3)
* @OutputTimeUnit("seconds", precision=3)
*/
public function grow() : void
{
$this->tree->grow($this->dataset);
}
}
2 changes: 1 addition & 1 deletion composer.json
@@ -79,7 +79,7 @@
"@test",
"@check"
],
"analyze": "phpstan analyse -c phpstan.neon",
"analyze": "phpstan analyse -c phpstan.neon --memory-limit 1G",
"benchmark": "phpbench run --report=aggregate",
"check": [
"@putenv PHP_CS_FIXER_IGNORE_ENV=1",
12 changes: 10 additions & 2 deletions docs/datasets/generators/blob.md
@@ -17,8 +17,16 @@ A normally distributed (Gaussian) n-dimensional blob of samples centered at a gi
```php
use Rubix\ML\Datasets\Generators\Blob;

- $generator = new Blob([-1.2, -5., 2.6, 0.8, 10.], 0.25);
+ $generator = new Blob([-1.2, -5.0, 2.6, 0.8, 10.0], 0.25);
```

## Additional Methods
- This generator does not have any additional methods.
+ Fit a Blob generator to the samples in a dataset.
+ ```php
+ public static simulate(Dataset $dataset) : self
+ ```
+
+ Return the center coordinates of the Blob.
+ ```php
+ public center() : array
+ ```
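`simulate()` fits a blob by estimating one mean and one standard deviation per feature, and it rejects datasets that contain non-continuous features. The plain-PHP sketch below reproduces that calculation without Rubix ML. `fitBlob()` is a hypothetical helper, and it assumes the population variance (dividing by n); whether `Stats::meanVar()` uses the same convention is not shown in this diff.

```php
<?php

/**
 * Hypothetical helper (not part of Rubix ML): fit per-feature means and
 * standard deviations, assuming the population variance (divide by n).
 *
 * @param list<list<int|float>> $samples rectangular, continuous-only data
 * @return array{0: list<float>, 1: list<float>} [means, standard deviations]
 */
function fitBlob(array $samples): array
{
    $n = count($samples);

    if ($n === 0) {
        throw new InvalidArgumentException('Dataset must not be empty.');
    }

    $means = $stdDevs = [];

    foreach (array_keys($samples[0]) as $j) {
        $column = array_column($samples, $j);

        foreach ($column as $value) {
            // Mirror simulate(): only continuous (int or float) features are allowed.
            if (!is_int($value) && !is_float($value)) {
                throw new InvalidArgumentException('Dataset must only contain continuous features.');
            }
        }

        $mean = array_sum($column) / $n;

        $variance = 0.0;

        foreach ($column as $value) {
            $variance += ($value - $mean) ** 2;
        }

        $means[] = $mean;
        $stdDevs[] = sqrt($variance / $n);
    }

    return [$means, $stdDevs];
}
```

The returned pair has the same shape as the arguments `simulate()` passes to `new self($means, $stdDevs)`.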
28 changes: 28 additions & 0 deletions docs/graph/trees/vantage-tree.md
@@ -0,0 +1,28 @@
<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/Graph/Trees/VPTree.php">[source]</a></span>

# Vantage Tree
A Vantage Point Tree is a binary spatial tree that divides samples by their distance from the center of a cluster called the *vantage point*. Samples that are closer to the vantage point will be put into one branch of the tree while samples that are farther away will be put into the other branch.

**Interfaces:** Binary Tree, Spatial

**Data Type Compatibility:** Depends on distance kernel

## Parameters
| # | Param | Default | Type | Description |
|---|---|---|---|---|
| 1 | max leaf size | 30 | int | The maximum number of samples that each leaf node can contain. |
| 2 | kernel | Euclidean | Distance | The distance kernel used to compute the distance between sample points. |

## Example
```php
use Rubix\ML\Graph\Trees\VantageTree;
use Rubix\ML\Kernels\Distance\Euclidean;

$tree = new VantageTree(30, new Euclidean());
```

## Additional Methods
This tree does not have any additional methods.

### References
>- P. N. Yianilos. (1993). Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces.
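The split rule described in the Vantage Tree docs can be illustrated with a small recursive builder. This is a sketch of the classic vantage-point construction from Yianilos (1993), not Rubix ML's `VantageTree` code: `buildVpTree()` and `euclidean()` are hypothetical helpers, the vantage point is simply the first remaining sample, and the split radius is the median distance to it. The docs describe the vantage point as the center of a cluster, so Rubix ML's tree likely chooses it differently.

```php
<?php

/** Hypothetical helper: Euclidean distance between two equal-length vectors. */
function euclidean(array $a, array $b): float
{
    $sum = 0.0;

    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}

/**
 * Hypothetical sketch: recursively build a vantage-point tree as nested arrays.
 * Leaves hold at most $maxLeafSize samples.
 */
function buildVpTree(array $samples, int $maxLeafSize = 30): array
{
    if ($maxLeafSize < 1) {
        throw new InvalidArgumentException('Max leaf size must be at least 1.');
    }

    if (count($samples) <= $maxLeafSize) {
        return ['leaf' => true, 'samples' => $samples];
    }

    // Simplest choice for this sketch: the first sample becomes the vantage point.
    $vantage = array_shift($samples);

    $distances = array_map(fn (array $s) => euclidean($vantage, $s), $samples);

    // The median distance is the radius that splits the node into two halves.
    $sorted = $distances;
    sort($sorted);
    $radius = $sorted[intdiv(count($sorted) - 1, 2)];

    $near = $far = [];

    foreach ($samples as $i => $sample) {
        if ($distances[$i] <= $radius) {
            $near[] = $sample;
        } else {
            $far[] = $sample;
        }
    }

    return [
        'leaf' => false,
        'vantage' => $vantage,
        'radius' => $radius,
        'near' => buildVpTree($near, $maxLeafSize),
        'far' => buildVpTree($far, $maxLeafSize),
    ];
}
```

Splitting at the median keeps the two branches roughly balanced, and during a nearest-neighbor search the stored `radius` lets a query skip a branch whenever the triangle inequality rules it out.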
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -201,6 +201,7 @@ nav:
- Trees:
- Ball Tree: graph/trees/ball-tree.md
- K-d Tree: graph/trees/k-d-tree.md
- Vantage Tree: graph/trees/vantage-tree.md
- Kernels:
- Distance:
- Canberra: kernels/distance/canberra.md
15 changes: 14 additions & 1 deletion phpunit.xml
@@ -1,5 +1,18 @@
<?xml version="1.0" encoding="UTF-8"?>
- <phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" backupGlobals="false" backupStaticAttributes="false" bootstrap="vendor/autoload.php" colors="true" convertErrorsToExceptions="true" convertNoticesToExceptions="true" convertWarningsToExceptions="true" forceCoversAnnotation="true" processIsolation="false" stopOnFailure="false" xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/9.3/phpunit.xsd">
+ <phpunit
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ backupGlobals="false"
+ backupStaticAttributes="false"
+ bootstrap="vendor/autoload.php"
+ colors="true"
+ convertErrorsToExceptions="true"
+ convertNoticesToExceptions="true"
+ convertWarningsToExceptions="true"
+ forceCoversAnnotation="true"
+ processIsolation="true"
+ stopOnFailure="false"
+ xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/9.3/phpunit.xsd"
+ >
<coverage processUncoveredFiles="true">
<include>
<directory suffix=".php">src</directory>
7 changes: 7 additions & 0 deletions src/Clusterers/Seeders/PlusPlus.php
@@ -6,6 +6,7 @@
use Rubix\ML\Kernels\Distance\Distance;
use Rubix\ML\Kernels\Distance\Euclidean;
use Rubix\ML\Specifications\DatasetIsNotEmpty;
use Rubix\ML\Exceptions\RuntimeException;

use function count;

@@ -49,12 +50,18 @@ public function __construct(?Distance $kernel = null)
*
* @param Dataset $dataset
* @param int $k
* @throws RuntimeException
* @return list<list<string|int|float>>
*/
public function seed(Dataset $dataset, int $k) : array
{
DatasetIsNotEmpty::with($dataset)->check();

if ($k > $dataset->numSamples()) {
throw new RuntimeException("Cannot seed $k clusters with only "
. $dataset->numSamples() . ' samples.');
}

$centroids = $dataset->randomSubset(1)->samples();

while (count($centroids) < $k) {
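The `PlusPlus` change makes seeding fail fast when `$k` exceeds the number of samples. In k-means++ each new centroid is drawn from the samples with probability proportional to its squared distance to the nearest existing centroid, so a sample that is already a centroid has weight zero and cannot be drawn again. The standalone sketch below shows that loop with the same guard. `plusPlusSeed()` is a hypothetical helper working on plain PHP arrays with squared Euclidean distance, and it throws PHP's built-in SPL exceptions rather than Rubix ML's exception classes.

```php
<?php

/**
 * Hypothetical k-means++ seeding helper (not Rubix ML's PlusPlus class).
 *
 * @param list<list<float>> $samples
 * @return list<list<float>>
 */
function plusPlusSeed(array $samples, int $k): array
{
    $n = count($samples);

    if ($n === 0) {
        throw new InvalidArgumentException('Dataset must not be empty.');
    }

    // Same guard as the commit: k centroids need at least k samples.
    if ($k > $n) {
        throw new RuntimeException("Cannot seed $k clusters with only $n samples.");
    }

    // First centroid: a sample chosen uniformly at random.
    $centroids = [$samples[mt_rand(0, $n - 1)]];

    while (count($centroids) < $k) {
        // Weight each sample by its squared distance to the nearest centroid.
        $weights = [];

        foreach ($samples as $sample) {
            $nearest = INF;

            foreach ($centroids as $centroid) {
                $distance = 0.0;

                foreach ($sample as $j => $value) {
                    $distance += ($value - $centroid[$j]) ** 2;
                }

                $nearest = min($nearest, $distance);
            }

            $weights[] = $nearest;
        }

        $total = array_sum($weights);

        if ($total <= 0.0) {
            // Every sample coincides with a centroid (duplicate samples); pick uniformly.
            $centroids[] = $samples[mt_rand(0, $n - 1)];

            continue;
        }

        // Draw the next centroid with probability proportional to its weight.
        // Zero-weight samples (existing centroids) can never satisfy $r < $weight.
        $r = mt_rand() / mt_getrandmax() * $total;

        foreach ($weights as $i => $weight) {
            if ($r < $weight) {
                $centroids[] = $samples[$i];

                break;
            }

            $r -= $weight;
        }
        // If rounding leaves nothing selected, the while loop simply draws again.
    }

    return $centroids;
}
```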
42 changes: 42 additions & 0 deletions src/Datasets/Generators/Blob.php
@@ -4,10 +4,14 @@

use Tensor\Matrix;
use Tensor\Vector;
use Rubix\ML\DataType;
use Rubix\ML\Helpers\Stats;
use Rubix\ML\Datasets\Dataset;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Exceptions\InvalidArgumentException;

use function count;
use function sqrt;

/**
* Blob
@@ -37,6 +41,34 @@ class Blob implements Generator
*/
protected $stdDev;

/**
* Fit a Blob generator to the samples in a dataset.
*
* @param Dataset $dataset
* @throws InvalidArgumentException
* @return self
*/
public static function simulate(Dataset $dataset) : self
{
$features = $dataset->featuresByType(DataType::continuous());

if (count($features) !== $dataset->numFeatures()) {
throw new InvalidArgumentException('Dataset must only contain'
. ' continuous features.');
}

$means = $stdDevs = [];

foreach ($features as $values) {
[$mean, $variance] = Stats::meanVar($values);

$means[] = $mean;
$stdDevs[] = sqrt($variance);
}

return new self($means, $stdDevs);
}

/**
* @param (int|float)[] $center
* @param int|float|(int|float)[] $stdDev
@@ -74,6 +106,16 @@ public function __construct(array $center = [0, 0], $stdDev = 1.0)
$this->stdDev = $stdDev;
}

/**
* Return the center coordinates of the Blob.
*
* @return list<int|float>
*/
public function center() : array
{
return $this->center->asArray();
}

/**
* Return the dimensionality of the data this generates.
*
20 changes: 20 additions & 0 deletions src/EstimatorWrapper.php
@@ -0,0 +1,20 @@
<?php

namespace Rubix\ML;

/**
* Wrapper
*
* @category Machine Learning
* @package Rubix/ML
* @author Ronan Giron
*/
interface EstimatorWrapper extends Estimator
{
/**
* Return the base estimator instance.
*
* @return Estimator
*/
public function base() : Estimator;
}
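The new `EstimatorWrapper` interface adds a single method, `base()`, that exposes the wrapped estimator. The standalone sketch below illustrates how that contract lets generic code unwrap nested wrappers. `Estimator` here is a pared-down hypothetical stand-in with only a `predict()` method (the real Rubix ML interface has more), and `ConstantEstimator`, `CountingWrapper`, and `unwrap()` are illustrative names, not library API.

```php
<?php

// Hypothetical stand-in for Rubix ML's Estimator, reduced to one method.
interface Estimator
{
    public function predict(array $samples): array;
}

// Same shape as the interface added in this commit.
interface EstimatorWrapper extends Estimator
{
    /** Return the base estimator instance. */
    public function base(): Estimator;
}

/** Toy base estimator: predicts the same label for every sample. */
class ConstantEstimator implements Estimator
{
    private string $label;

    public function __construct(string $label)
    {
        $this->label = $label;
    }

    public function predict(array $samples): array
    {
        return array_fill(0, count($samples), $this->label);
    }
}

/** Toy wrapper: counts predict() calls and delegates to its base estimator. */
class CountingWrapper implements EstimatorWrapper
{
    private Estimator $base;

    private int $calls = 0;

    public function __construct(Estimator $base)
    {
        $this->base = $base;
    }

    public function predict(array $samples): array
    {
        $this->calls++;

        return $this->base->predict($samples);
    }

    public function base(): Estimator
    {
        return $this->base;
    }

    public function calls(): int
    {
        return $this->calls;
    }
}

// Generic code can peel off any number of wrappers via base().
function unwrap(Estimator $estimator): Estimator
{
    while ($estimator instanceof EstimatorWrapper) {
        $estimator = $estimator->base();
    }

    return $estimator;
}
```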