cleanup and wasm movemask (#81)
* add SSE2 movemask
* add wasm_movemask and update js benchmarks
* update npm message for EAGAIN readsync error using sync read + stdin on wasm
liquidaty authored Nov 22, 2022
1 parent 1b4c4f4 commit 6f75955
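
For readers unfamiliar with the term in the commit title: a movemask packs the most significant bit of each byte in a vector into an integer bitmask. SSE2 exposes this as `_mm_movemask_epi8`, and WebAssembly SIMD provides `wasm_i8x16_bitmask`. The C sources changed by this commit are not shown on this page, so the following is only a minimal portable sketch of the operation's semantics, not the code added here; the name `movemask16_scalar` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not taken from this commit): a portable, non-SIMD
 * movemask over a 16-byte block. Bit i of the result is the most
 * significant bit of bytes[i], mirroring what a SIMD movemask returns
 * for a 16-lane byte vector. */
static uint32_t movemask16_scalar(const uint8_t bytes[16]) {
  uint32_t mask = 0;
  for (int i = 0; i < 16; i++) {
    /* bytes[i] >> 7 is 1 when the high bit is set, otherwise 0 */
    mask |= (uint32_t)(bytes[i] >> 7) << i;
  }
  return mask;
}
```

For example, a 16-byte block in which only bytes 0 and 3 have their high bit set yields the mask `0x9`. A scalar loop like this is presumably the kind of "slow" fallback that a SIMD movemask replaces.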
Showing 13 changed files with 312 additions and 74 deletions.
8 changes: 2 additions & 6 deletions README.md
@@ -75,7 +75,7 @@ that implements the expected
[app/benchmark/README.md](app/benchmark/README.md)
* Low memory usage (regardless of how big your data is) and size footprint for
both lib (~20k) and CLI executable (< 1MB)
* Easy to use as a library in a few lines of code
* Easy to use as a library in a few lines of code, via either pull or push parsing
* Includes the `zsv` CLI with the following built-in commands:
* `select`, `count`, `sql` query, `desc`ribe, `flatten`, `serialize`, `2json`,
`2db`, `stack`, `pretty`, `2tsv`, `jq`, `prop`, `rm`
@@ -157,10 +157,6 @@ choco.exe install zsv -source .\zsv-amd64-windows-mingw.nupkg
choco.exe uninstall zsv
```

**NOTE**: Windows build has a runtime dependency on `libwinpthread-1.dll`.
Please download it from here (https://wikidll.com/mingw-w64/libwinpthread-1-dll)
according to your Windows version and place it with `zsv` executable.

#### Node

The zsv parser library is available for node:
@@ -256,7 +252,7 @@ zsv sql my_population_data.csv "select * from data where population > 100000"

### Using the API

Basic examples of using the API are in [examples/lib/README.md](examples/lib/README.md).
Full application code examples can be found at [examples/lib/README.md](examples/lib/README.md).

An example of using the API, compiled to wasm and called via Javascript,
is in [examples/js/README.md](examples/js/README.md).
2 changes: 1 addition & 1 deletion app/Makefile
@@ -128,7 +128,7 @@ ifneq ($(findstring emcc,$(CC)),) # emcc
LDFLAGS+=-pthread
endif
else # not emcc
CFLAGS+= ${CFLAGS_AVX}
CFLAGS+= ${CFLAGS_AVX} ${CFLAGS_SSE}
LDFLAGS+=-lpthread # Linux explicitly requires
endif
UTILS=$(addprefix ${BUILD_DIR}/objs/utils/,$(addsuffix .o,${UTILS1}))
55 changes: 46 additions & 9 deletions configure
@@ -16,6 +16,7 @@ Configuration:
Optional configuration:
--minimal=yes do not include extra features (default=no)
--arch=ARCH use -march=ARCH. Set to 'none' for none, else defaults to 'native'
--jq-prefix=JQ_PREFIX specify directory containing lib/libjq and include/jq.h
Installation directories:
--prefix=PREFIX main installation prefix [\$PREFIX or /usr/local]
@@ -30,6 +31,7 @@ Optional features:
--try-avx512 use avx512 instructions, if available [no]
--force-avx2 force compile with (no CPU check) or without -mavx2 [auto]
--force-avx force compile with (no CPU check) or without -mavx [auto]
--force-sse2 force compile with (no CPU check) or without -msse2 [auto]
--enable-lto compile with LTO (works with some but not all platforms/compilers) [no]
--enable-whole-program compile without -fwhole-program even if no -flto [yes]
--enable-pie build with position independent executables [auto]
@@ -45,6 +47,8 @@ Some influential environment variables:
CFLAGS C compiler flags [-Os -pipe ...]
LDFLAGS Linker flags
CROSS_COMPILING=no Set to yes to disable auto-detect compilation flags
Use these variables to override the choices made by configure.
EOF
@@ -244,7 +248,7 @@ trysharedldflag () {
}

# Beginning of actual script

CROSS_COMPILING=no
CFLAGS_AUTO=
CFLAGS_TRY=
LDFLAGS_AUTO=
@@ -254,7 +258,7 @@ if [ "$CONFIGFILE" = "" ]; then
CONFIGFILE=config.mk
fi

if [ "$ARCH" = "" ]; then
if [ "$ARCH" = "" ] && [ "$CROSS_COMPILING" = "no" ]; then
ARCH=native
fi

@@ -290,6 +294,9 @@ MINIMAL=no

TRY_LTO=no
TRY_WHOLE_PROGRAM=auto
FORCE_AVX2=auto
FORCE_AVX=auto
FORCE_SSE2=auto

help=yes
usepie=auto
@@ -323,6 +330,9 @@ for arg ; do
--force-avx|--force-avx=yes) FORCE_AVX=yes;;
--force-avx=no) FORCE_AVX=no;;

--force-sse2|--force-sse2=yes) FORCE_SSE2=yes;;
--force-sse2=no) FORCE_SSE2=no;;

--enable-lto|--enable-lto=yes) TRY_LTO=yes;;
--enable-lto|--enable-lto=auto) TRY_LTO=auto;;
--disable-lto|--enable-lto=no) TRY_LTO=no;;
@@ -517,18 +527,40 @@ tryflag CFLAGS -ffunction-sections
tryflag CFLAGS -fdata-sections

CFLAGS_AVX=
if [ "$FORCE_AVX2" = "yes" ] ; then

HAVE_AVX=
if [ "$FORCE_AVX2" = "no" ]; then
tryflag CFLAGS -mno-avx2
elif [ "$FORCE_AVX2" = "yes" ] ; then
CFLAGS_AVX=-mavx2
trycpusupport avx2 || echo "warning: avx2 forced but not supported on native CPU"
elif [ "$FORCE_AVX2" != "no" ] ; then
if [ "$CROSS_COMPILING" = "no" ] ; then
trycpusupport avx2 || echo "warning: avx2 forced but not supported on native CPU"
fi
elif [ "$FORCE_AVX2" = "auto" ] && [ "$CROSS_COMPILING" = "no" ] ; then
trycpusupport avx2 && CFLAGS_AVX=-mavx2
fi

if [ "$FORCE_AVX" = "yes" ] ; then
CFLAGS_AVX=-mavx || echo "warning: avx forced but not supported on native CPU"
elif [ "$FORCE_AVX" != "no" ] && [ "$CFLAGS_AVX" = "" ] ; then
elif [ "$FORCE_AVX" = "auto" ] && [ "$CFLAGS_AVX" = "" ] && [ "$CROSS_COMPILING" = "no" ] ; then
trycpusupport avx && CFLAGS_AVX=-mavx
fi
if [ "$FORCE_AVX" = "no" ]; then
tryflag CFLAGS -mno-avx
fi

if [ "$FORCE_SSE2" = "no" ]; then
tryflag CFLAGS -mno-sse2
elif [ "$FORCE_SSE2" = "yes" ] ; then
CFLAGS_SSE=-msse2
if [ "$CROSS_COMPILING" = "no" ] ; then
trycpusupport sse2 || echo "warning: sse2 forced but not supported on native CPU"
fi
elif [ "$FORCE_SSE2" = "auto" ] && [ "$CROSS_COMPILING" = "no" ] ; then
if [ "$CFLAGS_SSE" = "" ] && [ "$CROSS_COMPILING" = "no" ] ; then
trycpusupport sse2 && tryflag CFLAGS_SSE -msse2
fi
fi

HAVE_LTO=0
if [ "$TRY_LTO" = "yes" ]; then
@@ -557,7 +589,10 @@ tryflag CFLAGS_OPT -fvisibility=hidden
tryldflag LDFLAGS_AUTO -Wl,--gc-sections

if [ "$ARCH" != "none" ] ; then
tryldflag LDFLAGS_OPT -march=$ARCH
if ! tryflag CFLAGS -march=$ARCH ; then
echo "Flag -march=$ARCH failed!"
exit 1
fi
fi
tryldflag LDFLAGS_OPT -ldl

@@ -602,7 +637,7 @@ if [ "$usetermcap" = "yes" ] || [ "$usetermcap" = "auto" ] ; then
fi
fi

if [ "$JQ_PREFIX" != "" ] && [ "$ARCH" = "native" ]; then
if [ "$JQ_PREFIX" != "" ] && [ "$CROSS_COMPILING" = "no" ] ; then
echo "checking --prefix-jq ${JQ_PREFIX}"
if ! tryldflag LDFLAGS_JQ -ljq -L${JQ_PREFIX}/lib ; then
echo "Error: Failed to compile with -ljq and -L${JQ_PREFIX}/lib"
@@ -676,9 +711,11 @@ CFLAGS_LTO = $CFLAGS_LTO
LDFLAGS_AUTO = $LDFLAGS_AUTO
HAVE_AVX512=$HAVE_AVX512
CFLAGS_AVX_512=$CFLAGS_AVX_512
CFLAGS_AVX_512=$CFLAGS_AVX_512
CFLAGS_AVX=$CFLAGS_AVX
CFLAGS_SSE=$CFLAGS_SSE
CFLAGS_DEBUG = -U_FORTIFY_SOURCE -UNDEBUG -O0 -g -Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter # -g3 -ggdb
LDFLAGS_DEBUG = -U_FORTIFY_SOURCE -UNDEBUG -O0 -g # -g3 -ggdb
CFLAGS_PIC = $CFLAGS_PIC
93 changes: 64 additions & 29 deletions examples/js/Makefile
@@ -41,7 +41,7 @@ INDEX=${BUILD_DIR}/index.html
EMJS=${BUILD_DIR}/zsv.em.js
WASM=${BUILD_DIR}/zsv.em.wasm

CFLAGS+= ${CFLAGS_PIC} -s ALLOW_MEMORY_GROWTH=1 -s EXPORTED_RUNTIME_METHODS="['setValue','addFunction','removeFunction','writeArrayToMemory']" -s RESERVED_FUNCTION_POINTERS=4 -s EXPORTED_FUNCTIONS="['_free','_malloc']"
CFLAGS+= ${CFLAGS_PIC} -s ALLOW_MEMORY_GROWTH=1 -s EXPORTED_RUNTIME_METHODS="['setValue','addFunction','removeFunction','writeArrayToMemory']" -s RESERVED_FUNCTION_POINTERS=4 -s EXPORTED_FUNCTIONS="['_free','_malloc']" -sASSERTIONS

ifeq ($(DEBUG),1)
CFLAGS += ${CFLAGS_DEBUG}
@@ -64,14 +64,14 @@ TEST_PASS=echo "${COLOR_BLUE}$@: ${COLOR_GREEN}Passed${COLOR_NONE}"
TEST_FAIL=(echo "${COLOR_BLUE}$@: ${COLOR_RED}Failed!${COLOR_NONE}" && exit 1)
#####

.PHONY: help all run clean prep node setup benchmark count_compare
.PHONY: help all run clean prep node setup benchmark count_compare select_compare

help:
@echo "make [build|run|node|test|clean]"
@echo "by default, minified code is generated, which requires running the below once:"
@echo " make setup"
@echo "alternatively, to generate non-minified code, use NO_MINIFY=1:"
@echo " make NO_MINIFY=1 [build|run|node|test]"
@echo " make NO_MINIFY=1 [build|run|node|test|benchmark]"

build: ${BROWSER_JS} ${STATIC}
@echo Built ${BROWSER_JS}
@@ -91,7 +91,7 @@ test: npm/test/select_all.js node
@mkdir -p build/test
@cp -p $< node/
@echo "Running test (example) program \`node node/select_all.js ../../data/test/desc.csv\`"
@(cd node && node select_all.js ../../../data/test/desc.csv > ../build/test/out.json 2> ../build/test/out.err1)
@(cd node && ${NODE} select_all.js ../../../data/test/desc.csv > ../build/test/out.json 2> ../build/test/out.err1)
@sed 's/[0-9.]*ms//g' < build/test/out.err1 > build/test/out.err
@cmp build/test/out.err npm/test/out.err
@cmp build/test/out.json npm/test/out.json && ${TEST_PASS} || ${TEST_FAIL}
@@ -121,6 +121,7 @@ ifeq ($(NO_MINIFY),1)
@mv $@.tmp.js $@
else
@uglifyjs $@.tmp.js -c -m > $@
rm $@.tmp.js
endif

### node package build
@@ -142,6 +143,7 @@ ifeq ($(NO_MINIFY),1)
@mv $@.tmp.js $@
else
@uglifyjs $@.tmp.js -c -m > $@
rm $@.tmp.js
endif

setup:
@@ -152,50 +154,83 @@ node: ${NODE_WASM} ${NODE_INDEX} ${NODE_PKG_FILES}


#### node benchmark
BENCHMARK_INPUT=${THIS_MAKEFILE_DIR}/../../app/benchmark/worldcitiespop_mil-sc.csv
BENCHMARK_INPUT=${THIS_MAKEFILE_DIR}/../../app/benchmark/worldcitiespop_mil.csv

NODE=node --experimental-wasm-modules

benchmark: node count_compare select_compare

count_compare:
@cp -p npm/test/count*.js node/
@cd node && (npm list | grep csv-parser) && echo "csv-parser already installed" || npm install csv-parser
@cd node && (npm list | grep csv-parser) && echo "csv-parser already installed" || npm install csv-parser papaparse

@echo "zsv count"
head -5000 ${BENCHMARK_INPUT} | node node/count.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | node node/count.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | node node/count.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/count.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/count.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/count.js 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | node node/count.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | node node/count.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | node node/count.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/count.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/count.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/count.js 2>&1 | head -1


@echo "csv-parser count"
head -5000 ${BENCHMARK_INPUT} | node node/count-csv-parser.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | node node/count-csv-parser.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | node node/count-csv-parser.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/count-csv-parser.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/count-csv-parser.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/count-csv-parser.js 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | ${NODE} node/count-csv-parser.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/count-csv-parser.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/count-csv-parser.js 2>&1 | head -1

@echo "papaparse count"
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/count-papaparse.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/count-papaparse.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/count-papaparse.js 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | node node/count-csv-parser.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | node node/count-csv-parser.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | node node/count-csv-parser.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/count-papaparse.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/count-papaparse.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/count-papaparse.js 2>&1 | head -1

select_compare:
@cp -p npm/test/select_all*.js node/
@cd node && (npm list | grep csv-parser) && echo "csv-parser already installed" || npm install csv-parser
@cd node && (npm list | grep csv-parser) && echo "csv-parser already installed" || npm install csv-parser papaparse

@echo "zsv select_all"
head -5000 ${BENCHMARK_INPUT} | node node/select_all.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | node node/select_all.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | node node/select_all.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all.js '' '[0,2]' 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all.js '' '[0,2]' 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | node node/select_all.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | node node/select_all.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all.js 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all.js '' '[0,2]' 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all.js '' '[0,2]' 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all.js 2>&1 | head -1

@echo "csv-parser select_all"
head -5000 ${BENCHMARK_INPUT} | node node/select_all-csv-parser.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | node node/select_all-csv-parser.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | node node/select_all-csv-parser.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-csv-parser.js '' '[0,2]' 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-csv-parser.js '' '[0,2]' 2>&1 | head -1

head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-csv-parser.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-csv-parser.js 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-csv-parser.js '' '[0,2]' 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-csv-parser.js '' '[0,2]' 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-csv-parser.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-csv-parser.js 2>&1 | head -1

@echo "papaparse select_all"
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-papaparse.js '' '[0,2]' 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-papaparse.js '' '[0,2]' 2>&1 | head -1

head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-papaparse.js 2>&1 | head -1
head -5000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-papaparse.js 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-papaparse.js '' '[0,2]' 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-papaparse.js '' '[0,2]' 2>&1 | head -1

head -500000 ${BENCHMARK_INPUT} | node node/select_all-csv-parser.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | node node/select_all-csv-parser.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-papaparse.js 2>&1 | head -1
head -500000 ${BENCHMARK_INPUT} | ${NODE} node/select_all-papaparse.js 2>&1 | head -1
22 changes: 14 additions & 8 deletions examples/js/README.md
@@ -44,17 +44,18 @@ this example does not require that libzsv is already installed

## Performance

Running ZSV lib from Javascript is still experimental and is not yet fully optimized. Some performance challenges are
unique to web assembly + Javascript, especially where a lot of string data
Running ZSV lib from Javascript is still experimental and is not yet fully optimized.
Some performance challenges are particular to web assembly + Javascript, e.g. where a lot of string data
is being passed between Javascript and the library (see e.g. https://hacks.mozilla.org/2019/08/webassembly-interface-types/).

Furthermore, it is unlikely that zsv-lib can approach its full performance potential
until emscripten (or gcc) [can provide a SIMD-powered movemask function](https://github.com/WebAssembly/simd/pull/201). Until then, libzsv in emscripten resorts to the "slow"
movemask, which does have a significant impact.
However, initial results are promising:

Current testing suggests that on small files (under 1 MB), zsv-lib is 30-75% faster than, for example, the `csv-parser` library. However, on larger files,
due to the aforementioned Javascript/wasm memory overhead and lack of
SIMD movemask, it can be more than 50% slower than `csv-parser`.
* Running only "count", zsv-lib is ~90%+ faster than `csv-parser` and `papaparse`
* The more cell data that is fetched, the more this advantage diminishes due to the aforementioned Javascript/wasm memory overhead.
Our benchmarking suggests that if the entire row's data is fetched, performance is about on par with both csv-parser and papaparse.
If only a portion is fetched, performance is about the same for papaparse, and faster than csv-parser (how much faster
being roughly proportional to the difference between count (~90% faster) and the
amount of total data fetched)

## All the build commands

@@ -68,6 +69,11 @@ make clean

Add MINIFY=1 to any of the above to generate minified code

To run benchmark tests:
```
make benchmark
```

To see all make options:
```
make