Yosegi is a schema-less columnar storage format. It provides a flexible representation like JSON and efficient reading similar to other columnar storage formats.
In the big data era, data has grown too large to compress and store as is. To improve compression ratio and read performance, several columnar data formats (for example, Apache ORC and Apache Parquet) were proposed. They achieve a high compression ratio because similar values are stored together in a column, and good read performance because data is grouped by column, so only the columns that are actually used need to be read.
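To illustrate why grouping similar values by column helps compression, here is a minimal Python sketch. It is not Yosegi code: the sample records, the field names, and the helpers `make_records`, `row_layout_size`, and `column_layout_size` are hypothetical, and `zlib` merely stands in for whatever codec a real format would use.

```python
import json
import zlib


def make_records(n):
    """Build n hypothetical sample records (illustrative data only)."""
    countries = ["JP", "US", "DE", "FR"]
    return [
        {
            "id": i,
            "country": countries[i % 4],
            "status": "error" if i % 10 == 0 else "ok",
        }
        for i in range(n)
    ]


def row_layout_size(records):
    """Compressed size when whole records are stored one after another."""
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return len(zlib.compress(payload))


def column_layout_size(records):
    """Compressed size when each field is stored and compressed as its own column."""
    total = 0
    for name in ("id", "country", "status"):
        column = json.dumps([r[name] for r in records]).encode("utf-8")
        total += len(zlib.compress(column))
    return total


if __name__ == "__main__":
    records = make_records(10_000)
    print("row-oriented bytes:   ", row_layout_size(records))
    print("column-oriented bytes:", column_layout_size(records))
```

On this sample the column-oriented total comes out smaller, because the `country` and `status` columns consist almost entirely of repeated values. Real data and real codecs will show different ratios.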
However, these formats require the structure of a row (or record) to be defined before the data is saved. This means deciding how the data will be used at the time of storage, and it is often difficult to decide in advance what kind of data will be needed.
This project provides a new columnar format that does not require a schema at the time of data storage, with compression and read performance equal to (or in some cases higher than) other formats.
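As a rough illustration of storing records in columns without declaring a schema first, the sketch below discovers fields as records arrive. The function `shred` and its in-memory dictionary layout are assumptions made for this example only; they do not describe Yosegi's actual API or file layout.

```python
def shred(records):
    """Turn dict records into {field: [values...]} without a predefined schema.

    Fields are discovered as records arrive. A record that lacks a field
    contributes None in that column, so all columns keep the same length.
    """
    columns = {}
    for row_index, record in enumerate(records):
        for name, value in record.items():
            if name not in columns:
                # First time this field appears: back-fill earlier rows with None.
                columns[name] = [None] * row_index
            columns[name].append(value)
        # Pad the columns of fields that this record did not contain.
        for values in columns.values():
            if len(values) == row_index:
                values.append(None)
    return columns


if __name__ == "__main__":
    sample = [{"a": 1}, {"b": "x"}, {"a": 3, "b": "y", "c": True}]
    for name, values in shred(sample).items():
        print(name, values)
```

The point of the sketch is that no field list is fixed up front: a field that first appears in a later record simply becomes a new column.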
Analyzing big data requires storing data compactly and reading it quickly. As a columnar format, Yosegi is well suited to these needs.
A data lake is a data pool that does not require the structure of a row (a schema) to be defined at the time of data storage. Stored data can then be used by defining its schema at the time of analysis. See DataLake.
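The idea of defining the schema only at analysis time can be sketched as follows. This is an illustrative example under stated assumptions, not Yosegi's reader: `read_with_schema`, the column dictionary, and the cast-based schema are all hypothetical names chosen for the example.

```python
def read_with_schema(columns, schema):
    """Project stored columns onto a schema chosen at analysis time.

    columns: {field: [values...]} as stored, with None for missing values.
    schema:  {field: cast}, where cast is a callable such as int or float.
    Returns a list of dict rows containing only the requested fields.
    """
    row_count = max((len(v) for v in columns.values()), default=0)
    rows = []
    for i in range(row_count):
        row = {}
        for name, cast in schema.items():
            values = columns.get(name)
            raw = values[i] if values is not None and i < len(values) else None
            # Missing values stay None; present values are cast on read.
            row[name] = cast(raw) if raw is not None else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    stored = {
        "id": ["1", "2", "3"],
        "price": [None, "9.5", "3"],
        "note": ["a", None, None],
    }
    # The analyst decides now which fields matter and how to type them.
    for row in read_with_schema(stored, {"id": int, "price": float}):
        print(row)
```

The same stored columns can be read again later with a different schema, for example one that also asks for `note`, without rewriting the data.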
This project is released under the Apache License. Please use it in accordance with that license.
To get started quickly, please see the quick start.
Please see the yosegi-tools repository for details.
To find out which functions are available, look at the command list.
Yosegi supports Apache Hadoop. Please see the yosegi-hadoop repository for details.
To get started quickly, please see the quick start.
Yosegi supports Apache Hive. Please see the yosegi-hive repository for details.
To get started quickly, please see the quick start.
Yosegi supports Apache Spark. Please see the yosegi-spark repository for details.
To get started quickly, please see the quick start.
Support and discussion for Yosegi take place on the mailing list.
- Mailing list: yosegi@googlegroups.com
- Bug tracker: JIRA
We plan to provide support and host discussion of Yosegi on the mailing list. Until the mailing list is open, please contact us via GitHub.
Everyone is welcome to join this project.
For information on how to start contributing to the project, please refer to the Yosegi contribution guide.
The following environment is required.
- Mac OS X or Linux
- Java 8 Update 92 or higher (8u92+), 64-bit
- Maven 3.3.9 or later (for building)
Yosegi sources can be obtained from the Maven repository.
Compile the sources with the following command.
$ mvn clean install