From f84de39b397b6eb5f044ae4d53d74f82b91aef66 Mon Sep 17 00:00:00 2001
From: Daniel Lehmann
Date: Tue, 1 Dec 2020 20:18:40 +0100
Subject: [PATCH] add README for filtering and analysis

---
 filtering-and-analysis/README.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/filtering-and-analysis/README.md b/filtering-and-analysis/README.md
index 45393f3..4217e14 100644
--- a/filtering-and-analysis/README.md
+++ b/filtering-and-analysis/README.md
@@ -1,8 +1,14 @@
 # Phase 2 and 3: Filtering and Extracting Metadata
 
-Python/ TODO
+This directory contains the code for combining the results from the different collection sources, filtering out unrepresentative binaries (e.g., binaries that do not validate or generated test files), and running static analysis tools and heuristics (e.g., for identifying the memory allocator used).
 
-Rust/ TODO
+In `Python/`, you will find scripts for (i) combining WebAssembly binaries from the different collection sources (see Phase 1) and deduplicating them based on the SHA256 hash of their contents, (ii) filtering them based on several heuristics and on projects identified as non-representative, and (iii) iteratively extracting and adding metadata to the JSON files (see `dataset-metadata/`).
+In `util/`, there are also smaller utilities, e.g., functions for producing nicer figures or for printing Python `Counter`s.
+
+In `Rust/`, you will find the source code of the static analysis tools for extracting additional information from the WebAssembly binaries.
+For example, they extract function names and strings from the data section, and perform the unmanaged stack pointer analysis.
+The tools are organized into multiple binaries and can be compiled and installed with a recent version of Rust via `cargo build && cargo install --path .`.
+For parsing WebAssembly binaries, we use the `wasmparser` library, which also supports WebAssembly extensions.
 
 Finally, for speeding up the manual analysis of many binaries and finding related ones, we implemented an approximate byte-based n-gram similarity comparison tool.
 This is available at https://github.com/hilbigan/ngrm.
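
The README mentions deduplicating binaries based on the SHA256 hash of their contents. The following is a minimal sketch of that idea, not the actual script in `Python/`; the function names `sha256_of_file` and `group_by_content_hash` are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so large binaries need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content_hash(paths):
    """Map each distinct digest to the list of paths with that content.

    Hypothetical helper: files collected from different sources that are
    byte-identical end up under the same key.
    """
    groups = {}
    for path in paths:
        groups.setdefault(sha256_of_file(path), []).append(Path(path))
    return groups
```

Each key of the returned dictionary then stands for one unique binary, and the associated list records every location where that binary was found.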