MachineLearningModelGenerator

The model generator is a microservice component that generates and stores classifiers for various machine learning approaches. It is written in Java and uses the Spring framework.

Getting started

Prerequisites

1. Cache
The model generator requests specific idealo offers from the cache.
2. MongoDB
The model generator loads the results of the matching process from MongoDB and uses this data to train the classifiers. The data is expected to be split into multiple collections, one per shop, each named after the corresponding shop ID.
3. File System
The model generator saves the classifiers used by the matcher to disk, in a folder named "models".
3.1. model.json contains a serialized classifier that decides whether a parsed offer and an idealo offer match.
3.2. category.json contains a serialized neural network that classifies the category of a parsed offer.
3.3. brand.json contains a serialized neural network that classifies the brand of a parsed offer.
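The file layout above can be illustrated with a small helper that resolves model.json, category.json and brand.json inside the "models" folder and reads or writes them. This is a minimal sketch, not the service's code: ModelStore, ModelKind and the method names are hypothetical, and the classifiers are assumed to be serialized to JSON text elsewhere before being passed in.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for the "models" folder; names are illustrative, not the MLMG's API. */
final class ModelStore {
    /** The three files described above. The enum itself is an assumption. */
    enum ModelKind {
        MATCH("model.json"), CATEGORY("category.json"), BRAND("brand.json");

        final String fileName;

        ModelKind(String fileName) {
            this.fileName = fileName;
        }
    }

    private final Path modelsDir;

    ModelStore(Path baseDir) {
        this.modelsDir = baseDir.resolve("models");
    }

    /** Resolves e.g. <baseDir>/models/brand.json. */
    Path pathOf(ModelKind kind) {
        return modelsDir.resolve(kind.fileName);
    }

    /** Writes an already-serialized classifier, creating the "models" folder if it is missing. */
    Path save(ModelKind kind, String serializedJson) throws IOException {
        Files.createDirectories(modelsDir);
        return Files.writeString(pathOf(kind), serializedJson, StandardCharsets.UTF_8);
    }

    /** Reads the serialized classifier back as text; deserialization is left to the caller. */
    String load(ModelKind kind) throws IOException {
        return Files.readString(pathOf(kind), StandardCharsets.UTF_8);
    }
}
```

Keeping the file names in one enum means the matcher and the generator could share a single definition of the folder layout instead of repeating string literals.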

Configuration

Environment variables

MLMG_PORT: The port that should be used by the model generator
MONGO_IP: The IP of the MongoDB instance
MONGO_PORT: The port of the MongoDB instance
MONGO_MLMG_USER: The username to access the MongoDB
MONGO_MLMG_PW: The password to access the MongoDB
CACHE_IP: The URI of the cache microservice
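As an illustration of how these variables fit together, the sketch below reads them from a map (System.getenv() in production) and assembles a standard mongodb:// connection string. MlmgEnvironment and its fields are hypothetical; the actual service may wire these values through Spring configuration instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical holder for the environment variables listed above. */
final class MlmgEnvironment {
    final int port;
    final String mongoUri;
    final String cacheUri;

    private MlmgEnvironment(int port, String mongoUri, String cacheUri) {
        this.port = port;
        this.mongoUri = mongoUri;
        this.cacheUri = cacheUri;
    }

    /** Pass System.getenv() in production; accepting a plain map keeps this testable. */
    static MlmgEnvironment from(Map<String, String> env) {
        int port = Integer.parseInt(require(env, "MLMG_PORT"));
        String mongoUri = "mongodb://"
                + encode(require(env, "MONGO_MLMG_USER")) + ":"
                + encode(require(env, "MONGO_MLMG_PW")) + "@"
                + require(env, "MONGO_IP") + ":"
                + Integer.parseInt(require(env, "MONGO_PORT")); // fails fast on a non-numeric port
        return new MlmgEnvironment(port, mongoUri, require(env, "CACHE_IP"));
    }

    /** Fails at startup with a clear message instead of later with a connection error. */
    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable " + key);
        }
        return value;
    }

    /** Credentials must be percent-encoded in a MongoDB URI; URLEncoder emits '+' for spaces, so fix that up. */
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

Percent-encoding the user name and password matters because characters such as '@' or ':' in MONGO_MLMG_PW would otherwise break the connection string.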

Component properties

matchesPerShop: Base number of offers per shop to include in the training data. Fewer offers per shop are used if this number would cause maximumMatchesForLearning to be exceeded.
maximumMatchesForLearning: Maximum size of the training data
trainingSetPercentage: Percentage of the data assigned to the training set; the remainder forms the testing set.
labelThreshold: The minimum probability required to assign a category or brand to a parsed offer
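One way to read these properties is sketched below. The formulas are assumptions inferred from the descriptions, not taken from the service: the per-shop quota is capped at maximumMatchesForLearning divided by the number of shops, trainingSetPercentage is assumed to be given in percent (e.g. 80.0), and labelThreshold is assumed to be a probability in [0, 1]. LearningProperties and its methods are hypothetical.

```java
/** Hypothetical view of the component properties; the formulas are assumptions, not the service's code. */
final class LearningProperties {
    final int matchesPerShop;
    final int maximumMatchesForLearning;
    final double trainingSetPercentage; // assumed to be in percent, e.g. 80.0
    final double labelThreshold;        // assumed probability in [0, 1]

    LearningProperties(int matchesPerShop, int maximumMatchesForLearning,
                       double trainingSetPercentage, double labelThreshold) {
        this.matchesPerShop = matchesPerShop;
        this.maximumMatchesForLearning = maximumMatchesForLearning;
        this.trainingSetPercentage = trainingSetPercentage;
        this.labelThreshold = labelThreshold;
    }

    /** Uses matchesPerShop unless that would push the total past maximumMatchesForLearning. */
    int effectiveMatchesPerShop(int shopCount) {
        if (shopCount <= 0) {
            return 0;
        }
        return Math.min(matchesPerShop, maximumMatchesForLearning / shopCount);
    }

    /** Number of matches that go into the training set; the rest form the testing set. */
    int trainingSetSize(int totalMatches) {
        return (int) Math.round(totalMatches * trainingSetPercentage / 100.0);
    }

    /** A predicted category or brand is only assigned if its probability reaches the threshold. */
    boolean acceptsLabel(double probability) {
        return probability >= labelThreshold;
    }
}
```

For example, with matchesPerShop = 1000 and maximumMatchesForLearning = 50000, 20 shops each contribute 1000 offers, while 100 shops would each be cut down to 500.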

How it works

1. The model generator (MLMG) receives a request to generate a specific classifier (the neural network for brand or category classification, or the model for matching) or all three classifiers together.
2. If they are not already loaded, the MLMG creates the training and testing sets (when all three classifiers are requested, the sets are always created).
2.1. The MLMG gets the results matched by EAN (correct matches) from all shops and divides them randomly into a training set and a testing set.
2.2. To generate the matching model, 50% of the matching results are used for the match class and the other 50% are shuffled to form the not-match class.
3. The MLMG trains the requested classifier(s).
4. If the matching model was requested, the MLMG evaluates all trained models on the training set and chooses the best one.
5. The classifier(s) are stored in the file system.
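Steps 2.1 and 2.2 can be sketched as follows. This is an illustration under stated assumptions, not the MLMG's implementation: TrainingData, OfferPair and the method names are hypothetical, offers are reduced to IDs, and the "shuffling" for the not-match class is done by shuffling the second half and then pairing each parsed offer with the idealo offer one position further on, a swapped-in technique that guarantees (for at least two pairs with distinct IDs) that no offer keeps its true partner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of steps 2.1 and 2.2; all names are illustrative. */
final class TrainingData {
    /** A parsed offer ID paired with an idealo offer ID, labelled match (true) or not-match (false). */
    static final class OfferPair {
        final String parsedOfferId;
        final String idealoOfferId;
        final boolean match;

        OfferPair(String parsedOfferId, String idealoOfferId, boolean match) {
            this.parsedOfferId = parsedOfferId;
            this.idealoOfferId = idealoOfferId;
            this.match = match;
        }
    }

    /** Step 2.1: shuffles the EAN matches and splits them into a training set and a testing set. */
    static List<List<OfferPair>> split(List<OfferPair> eanMatches, int trainingSize, Random rnd) {
        List<OfferPair> shuffled = new ArrayList<>(eanMatches);
        Collections.shuffle(shuffled, rnd);
        int cut = Math.min(trainingSize, shuffled.size());
        return List.of(shuffled.subList(0, cut), shuffled.subList(cut, shuffled.size()));
    }

    /** Step 2.2: first half stays in the match class, second half is re-paired into the not-match class. */
    static List<OfferPair> labelled(List<OfferPair> matches, Random rnd) {
        int half = matches.size() / 2;
        List<OfferPair> result = new ArrayList<>(matches.subList(0, half));
        List<OfferPair> rest = new ArrayList<>(matches.subList(half, matches.size()));
        if (rest.size() < 2) {
            return result; // a single pair cannot be re-paired without matching itself
        }
        Collections.shuffle(rest, rnd);
        for (int i = 0; i < rest.size(); i++) {
            // Shift by one position so each parsed offer gets someone else's idealo offer.
            OfferPair partner = rest.get((i + 1) % rest.size());
            result.add(new OfferPair(rest.get(i).parsedOfferId, partner.idealoOfferId, false));
        }
        return result;
    }
}
```

A plain shuffle of the idealo offers could leave some pairs unchanged and thus label true matches as not-matches; the shift avoids that at the cost of a slightly less random pairing.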

Future work

Do not store models in the file system
Change the training and testing set generation to produce non-matches with high similarity
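One possible approach to the second item is sketched below: for each parsed offer, pick the idealo offer whose title is most similar but which is not its true match, yielding a "hard" non-match. The Jaccard similarity over lowercased title tokens is a swapped-in measure chosen for illustration; HardNegatives and its methods are hypothetical and not part of the MLMG.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical hard-negative selection using Jaccard similarity of title tokens. */
final class HardNegatives {
    /** |A ∩ B| / |A ∪ B| over lowercased word tokens; 0 when both titles are empty. */
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    private static Set<String> tokens(String title) {
        Set<String> out = new HashSet<>();
        for (String t : title.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Returns the ID of the most similar idealo offer that is not the true match, or null if there is none. */
    static String mostSimilarNonMatch(String parsedTitle, String trueMatchId, Map<String, String> idealoTitles) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, String> e : idealoTitles.entrySet()) {
            if (e.getKey().equals(trueMatchId)) {
                continue; // never use the correct match as a negative example
            }
            double score = jaccard(parsedTitle, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

For a parsed offer titled "apple iphone 13 128gb black" whose true match is "Apple iPhone 13 128GB", this would prefer "Apple iPhone 13 256GB" over an unrelated product, which is exactly the kind of near-miss a randomly shuffled non-match rarely provides.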
