This is a demo project showcasing how to build Machine Learning (ML) based microservices with Dropwizard and PMML. The PMML file encapsulates the predictive ML function, so there is no need to write any code for the function itself. This creates a clear separation of concerns, where
- Data Scientists train a predictive function and export it to a PMML file
- Engineers load the PMML file into an evaluation function (thanks to the PMML API) and feed it inputs from the clients, which is easy to automate.
This code also introduces the PmmlProperty annotation, which helps to glue service inputs to the PMML function inputs. For example, a PMML input defined as
<DataField name="x1" optype="continuous" dataType="float"/>
is mapped to the relevant DTO (service input) as
public class UrlResource {
    @PmmlProperty(name = "x1")
    @JsonProperty
    @Max(1)
    @Min(-1)
    private int urlLength;
    ...
}
and the service's resource simply passes the DTO to the classifier
...
private final ScoreCalculator<UrlResource> phishingClassifier;
...
@POST
@Timed
public Map<String, ?> classifyResource(@Valid UrlResource urlResource) {
    return phishingClassifier.calculate(urlResource);
}
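To illustrate how an annotation like PmmlProperty can glue DTO fields to PMML input names, here is a minimal sketch based on reflection. It is not the project's actual implementation: the annotation definition, the simplified UrlResource (validation and JSON annotations omitted), and the toPmmlInputs helper are all assumptions made for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class PmmlPropertySketch {

    // Hypothetical stand-in for the project's annotation; the real definition may differ.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface PmmlProperty {
        String name();
    }

    // Simplified DTO mirroring UrlResource; only the PMML mapping is kept.
    public static class UrlResource {
        @PmmlProperty(name = "x1")
        private final int urlLength;

        public UrlResource(int urlLength) {
            this.urlLength = urlLength;
        }
    }

    // Hypothetical helper: collects every annotated field into a map
    // keyed by the PMML input name declared on the annotation.
    public static Map<String, Object> toPmmlInputs(Object dto) {
        Map<String, Object> inputs = new LinkedHashMap<>();
        for (Field field : dto.getClass().getDeclaredFields()) {
            PmmlProperty property = field.getAnnotation(PmmlProperty.class);
            if (property == null) {
                continue; // field is not a PMML input
            }
            field.setAccessible(true); // the DTO field is private
            try {
                inputs.put(property.name(), field.get(dto));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        // Prints {x1=1}
        System.out.println(toPmmlInputs(new UrlResource(1)));
    }
}
```

A map built this way could then be handed to whatever evaluates the PMML model, which keeps the DTO as the single place where service inputs and PMML inputs are tied together.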
This particular demo pretends to classify phishing from the length of URLs, which is of course not a realistic scenario, but it works for demo purposes. Special thanks to Dr. David Pryce for building the PMML file for this demo using sklearn2pmml.
Make sure you have Java 8 and Maven 3.2.x installed in order to build the project.
From the command line
mvn clean verify
The build will produce the ml-phishing-classification-service-1.0.0-SNAPSHOT-dist.tar.gz file in the target/ folder.
Unpack the tar.gz from the target/ folder.
tar -xvzf ml-phishing-classification-service-1.0.0-SNAPSHOT-dist.tar.gz
This will produce a local folder ml-phishing-classification-service/. Open it
cd ml-phishing-classification-service/
Start the service
./bin/run.sh
If you need to configure the service differently, check the etc/ folder for more details.
The src/test/resources folder contains a test script. Open the folder and execute
./test-endpoints.sh
The script makes 3 calls to the service
- one call to the healthcheck endpoint
- one call to the evaluation endpoint, which should return HTTP status 200
- one call to the evaluation endpoint, which should trigger a validation failure with HTTP status 422
The src/test/resources folder also contains a Python script. Open the folder and, while the service is running, execute
python2 data-validator.py testdata-2.json
The Python code is designed to run under Python 2.7.