Description

Main purpose

This repository is a naive implementation of a Spark-like job runner. It lets you run an application server (implemented with Scalatra) and run jobs against files in the current directory.

It works with two kinds of workers:

Standalone - the worker is deployed on the same machine as the main application and uses all available resources. If the file you want to process is larger than the available RAM, you will get an error.
Remote - the worker is deployed to Amazon EC2. When deployment starts, the size of the file you want to process is calculated and an appropriate instance type is chosen. For this to work, you need to set the AWS environment variables (AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID) and change the .pem key location in the config.
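
The checks described for the two worker kinds could be sketched roughly as follows. This is only an illustration, not the repository's code: the object name WorkerPlanning, its method names, and the size tiers are all hypothetical. The memory sizes correspond to the EC2 t2 family (1, 2, 4 and 8 GiB), but which instance types the project actually picks is an assumption.

```scala
// Hypothetical helper: names and thresholds are illustrative, not taken from the repository.
object WorkerPlanning {
  private val GiB: Long = 1024L * 1024L * 1024L

  // Assumed tiers: (instance memory in bytes, EC2 instance type), smallest first.
  private val tiers: Seq[(Long, String)] = Seq(
    (1 * GiB, "t2.micro"),
    (2 * GiB, "t2.small"),
    (4 * GiB, "t2.medium"),
    (8 * GiB, "t2.large")
  )

  /** Smallest assumed instance type whose memory exceeds the file size, if any. */
  def chooseInstanceType(fileSizeBytes: Long): Option[String] =
    tiers.collectFirst { case (memory, name) if fileSizeBytes < memory => name }

  /** Standalone check: is the file smaller than the memory available to this JVM? */
  def fitsInLocalMemory(
      fileSizeBytes: Long,
      maxMemoryBytes: Long = Runtime.getRuntime.maxMemory
  ): Boolean =
    fileSizeBytes < maxMemoryBytes

  /** Names of the required AWS environment variables that are missing or empty. */
  def missingAwsVariables(env: Map[String, String] = sys.env): Seq[String] =
    Seq("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
      .filterNot(name => env.get(name).exists(_.nonEmpty))
}
```

The file size itself can be obtained with new java.io.File(path).length(). A real deployment would probably also leave headroom for the JVM and the job's intermediate data, which this sketch ignores.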

Running jobs

Currently, clicking a file name in the web UI automatically runs the wordsCount job. For now, if you want to run your own job, you need to write an expression and place it in a file. The expression must have the type

Function1[String, String]

In the future I plan to add more user-friendly ways to create jobs, such as an ACE editor, and you will no longer have to specify the return type of your job.

Example of wordsCountJob:

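new Function[String, String] {
  def apply(file: String): String = {
    // Read all lines eagerly, then close the file handle.
    val source = scala.io.Source.fromFile(file)
    val content: List[String] = try source.getLines().toList finally source.close()

    // Split lines into words, group identical words and sum one count per occurrence.
    // Empty tokens, tokens starting with "*" and tokens containing "." are dropped.
    val jobResult = content.flatMap(line => line.split(" ")).map(word => (word, 1)).groupBy(_._1)
      .filter(_._1 != "")
      .map { case (_, pairs) => pairs.reduce((a, b) => (a._1, a._2 + b._2)) }
      .filterNot(_._1.startsWith("*"))
      .filterNot(_._1.contains("."))

    // Concatenate the (word, count) pairs without a separator.
    jobResult.mkString("")
  }
}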
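
A custom job would presumably follow the same shape: a function from a file path to a result string. The sketch below is a hypothetical line-count job, not something shipped with the repository. The names LineCountJobExample and lineCountJob are made up for illustration, and the main method only shows how such a job can be tried locally on a temporary file, outside the server.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

// Hypothetical example object; not part of the repository.
object LineCountJobExample {
  // Same shape as wordsCountJob: file path in, result string out.
  val lineCountJob: String => String = new Function1[String, String] {
    def apply(file: String): String = {
      val source = Source.fromFile(file)
      // Count the non-blank lines and always close the file handle.
      try source.getLines().count(_.trim.nonEmpty).toString
      finally source.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val tmp = Files.createTempFile("job-input", ".txt")
    try {
      Files.write(tmp, "first line\n\nsecond line\n".getBytes(StandardCharsets.UTF_8))
      println(lineCountJob(tmp.toString)) // prints "2"
    } finally Files.delete(tmp)
  }
}
```

Only the anonymous Function1 expression would go into the job file; the wrapping object and main method are there just to make the sketch runnable on its own.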

NOTE: The purpose of this repository is just to get familiar with the Scala programming language and the Actor model. Feel free to contribute if you want, but it will never run in production or be released to Maven Central.

Roadmap

ACE editor for writing job code on the fly
Streaming support
AMQP support
Package workers as Docker images

Build & Run

$ cd files-watcher
$ ./sbt
> ~ ;build ;server
> browse

If browse doesn't launch your browser, manually open http://localhost:8080/ in your browser.

Olefine/files-watcher
