From b02155ce469d411426772dcf0714305f23fc7a7b Mon Sep 17 00:00:00 2001 From: memeeerit Date: Thu, 30 Nov 2023 16:56:41 -0500 Subject: [PATCH 1/6] readme updates --- productnameextractor/README.md | 14 +++++++++++++- reconciler/README.md | 2 +- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/productnameextractor/README.md b/productnameextractor/README.md index 4d138b76d..7936272fd 100644 --- a/productnameextractor/README.md +++ b/productnameextractor/README.md @@ -211,11 +211,23 @@ If you want to run it locally without Docker, the program will attempt to automa * **RABBIT_PASSWORD**: The password for the RabbitMQ server connection. - Default value: `guest` +* **PNE_INPUT_QUEUE**: The RabbitMQ queue name to watch for input. + * Default value: 'RECONCILER_OUT' + +* **PNE_OUTPUT_QUEUE**: The RabbitMQ queue name to send jobs to the Pathfinder. + * Default value: 'PNE_OUT_PATCH' + +* **PNE_INPUT_QUEUE**: The RabbitMQ queue name to watch jobs to the Fixfinder. + * Default value: 'PNE_OUT_FIX' + ### Product Name Extractor Variables +* **INPUT_MODE**: The way the PNE will receive input. + - Default value: `rabbit` + * **CHAR_2_VEC_CONFIG**: Name of the configuration file for the Char2Vec model. - - Default value: `c2v_model_config_50.json` + - Default value: `c2v_model_config_50.json` * **CHAR_2_VEC_WEIGHTS**: Name of the weights file for the Char2Vec model. 
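The queue and mode variables documented above all follow the same pattern: read the environment variable, fall back to the README default when it is unset. A minimal Python sketch of that behavior (the helper is hypothetical, not part of the PNE codebase; the defaults are the ones listed above):

```python
import os

# Hypothetical helper (illustrative only) showing the fallback-to-default
# behavior the variables above describe: use the environment value when
# present and non-empty, otherwise the documented default.
def env_or_default(name, default):
    value = os.environ.get(name)
    return default if value in (None, "") else value

# README defaults for two of the variables documented above
input_queue = env_or_default("PNE_INPUT_QUEUE", "RECONCILER_OUT")
input_mode = env_or_default("INPUT_MODE", "rabbit")
```

This is only a sketch of the documented defaulting contract; the actual implementation lives in the module's configuration code.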
diff --git a/reconciler/README.md b/reconciler/README.md index 5b7dd7f41..7abf79b39 100644 --- a/reconciler/README.md +++ b/reconciler/README.md @@ -185,7 +185,7 @@ A list of the environment variables is provided below: * **NVD_API_URL**: URL to NVD's CVE 2.0 API - Default value:https://services.nvd.nist.gov/rest/json/cves/2.0?pubStartDate=&pubEndDate= -* **OPENAI_KEY**: Session key for OpenAI +* **OPENAI_KEY**: Session key for OpenAI [no longer used] - Default value:sk-xxxxxxxxxxxxx * **DATA_DIR**: Path to the data directory From 9f6f75f93abc50841fdf8829494278f830ae0521 Mon Sep 17 00:00:00 2001 From: memeeerit Date: Thu, 30 Nov 2023 17:15:00 -0500 Subject: [PATCH 2/6] readme updates --- README.md | 69 ++++++++----------------------------------------------- 1 file changed, 10 insertions(+), 59 deletions(-) diff --git a/README.md b/README.md index f64af57f1..8564cb20c 100644 --- a/README.md +++ b/README.md @@ -124,11 +124,7 @@ the service and click "start". - Make sure the **NVIP_DATA_DIR** points to the nvip_data directory and the database user and password in the **Environment Variables** are correct. ### Installation & Configuration Checklist -- Not all parameters are in **Environment Variables** at the moment. -There are two additional legacy config files used for some parameters. -`src/main/resources/nvip.properties` is used to set program parameters, and `src/main/resources/db-mysql.properties` -is used to set database parameters (We might not need that anymore though). When the system is run, the config files are first searched in the application root, -if they are not found there the ones at `\src\main\resources` are used! +- All parameters are in **Environment Variables** at the moment. - Required training data and resources are stored under the `nvip_data` folder (the data directory). 
@@ -248,14 +244,6 @@ A list of the environment variables is provided below: * **NVIP_CVE_CHARACTERIZATION_LIMIT**: Limit for maximum # of CVEs to run through the characterizer - Default value: 5000 -### Exploit Finder - -* **EXPLOIT_FINDER_ENABLED**: Boolean parameter for enabling the exploit finder - - Default value: true - - -* **EXPLOIT_DB_URL**: URL used for cloning and scraping the ExploitDB Git repo - - Default value: https://gitlab.com/exploit-database/exploitdb ### Patch Finder @@ -270,37 +258,12 @@ A list of the environment variables is provided below: * **PATCHFINDER_MAX_THREADS**: Limit of maximum # of threads for patch finder - Default value: 10 -### Email Notification Service - -* **NVIP_EMAIL_USER**: Email user name for NVIP notifications - - There is no default value. - - -* **NVIP_EMAIL_PASSWORD**: Email password for NVIP notifications - - There is no default value. - - -* **NVIP_EMAIL_FROM**: Email from address for NVIP notifications (data@cve.live) - - There is no default value. - - -* **NVIP_EMAIL_PORT**: SMTP port # for NVIP notifications (ex. 587) - - There is no default value. - - -* **NVIP_EMAIL_HOST**: SMTP host domain for NVIP notifications - - There is no default value. - - -* **NVIP_EMAIL_MESSAGE_URL**: URL domain for links in NVIP email notifications (ex. http://www.cve.live) - - There is no default value. - # Component Documentation ### Overview -This project consists of 8 main components +This project consists of 6 main components * **CVE Web Crawler** - Uses Multi Threaded Web Crawling for navigating source pages to grab raw CVE data @@ -312,15 +275,14 @@ This project consists of 8 main components description for each CVE. 
- -* **CVE Characterizer** +* **CVE Characterizer** (included in the reconciler module) - This component provides automated CVSS scores and VDO Labels for each CVE via a Natural Language Processing model, which is trained - via the data provided in `nvip_data` (Model is also here as well) + via the data provided in `nvip_data` (Model is also here as well). It also uses an SSVC API running in the NVIP environment for SSVC scoring. - NIST's CVSS score summary: https://nvd.nist.gov/vuln-metrics/cvss - NIST's VDO Label summary: https://csrc.nist.gov/csrc/media/publications/nistir/8138/draft/documents/nistir_8138_draft.pdf -* **CVE Processor** +* **NVD/MITRE Comparisons** (included in the reconciler module) - This component processes the compiled CVEs by storing them in the Database, then compares each CVE in NVIP to the CVEs in NVD and MITRE to compare performance of NVIP vs NVD and MITRE. - NVD: https://nvd.nist.gov/ @@ -328,28 +290,17 @@ This project consists of 8 main components - For comparing with NVD, we're currently transitioning to NVD's 2.0 API: https://nvd.nist.gov/developers/vulnerabilities -* **CVE Product Extractor** +* **Product Name Extractor** - This component identifies affected products in a CVE via a Named Entity Recognition (NER) model. 
- - The model and it's training data is provided in `nvip_data` + - The model and its training data is provided in `nvip_data` - Each extracted product is converted as a Common Product Enumeration (CPE) string - CPE Definition and Dictionary(s): https://nvd.nist.gov/products/cpe + - - -* **CVE Exploit Finder** - - This component identifies exploits for CVEs in NVIP - - Currently, we just pull exploit data from ExploitDB: https://gitlab.com/exploit-database/exploitdb - - -* **CVE Patch Finder** +* **CVE Patch/Fix Finder** - This component identifies possible patches for CVEs - Patches are found by crawling available repos for the affected products of a CVE - Each repo is cloned, then each commit is navigated to identify patches by checking for keywords in the commit messages - Product repos are cloned in `nvip_data`, then deleted afterwards after being used - **NOTE** This component relies directly on the affected product data from product extraction - - -* **NVIP Cache Updater and NVIP Email Notifications Service** - - This last component makes sure the Web App is up-to-date with the recent vulnerabilities found in NVIP. - - This is done via collecting the CVEs found in the past week (7 days), and adds them to the `vulnerabilityaggregate` table in the database. - - The `vulnerabilityaggregate` table acts as a cache table for the Web API. - - After the cache is updated, the service sends an email notification to all admin users in NVIP. Each email notification contains a list of CVEs added to NVIP. 
+ - Fixes are found with web-scrapers similarly to the CVE crawler From f58ad04ad8bacfbbf094e78ec26051c6e77ef115 Mon Sep 17 00:00:00 2001 From: memeeerit Date: Thu, 30 Nov 2023 17:34:25 -0500 Subject: [PATCH 3/6] more --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 8564cb20c..85a955155 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,8 @@ It scrapes disclosed CVEs, scores/characterizes them automatically and stores th * Automatically extracts affected Common Platform Enumeration (CPE) products from free-form CVE descriptions. * Maintains an updated list of CVE source URLs, takes a seed URL list as input and searches for additional CVE sources. * Compares crawled CVEs against NVD and MITRE and records the comparison results under the "output" directory. -(If NVIP is run at date "MM/dd/yyyy", the output will be at "output//yyyyMMdd" path.) +(If NVIP is run at date "MM/dd/yyyy", the output will be at "output//yyyyMMdd" path.) +* NVIP consists of multiple modules which send jobs to each other via RabbitMQ, and share the `db` module as a common dependency. ## System Requirements * NVIP requires at least Java version 8. From 850c42986e651785ffc96d02aab0e6a0cc07a8b0 Mon Sep 17 00:00:00 2001 From: memeeerit Date: Thu, 30 Nov 2023 18:06:07 -0500 Subject: [PATCH 4/6] more --- README.md | 162 ++++++++++++++---------------------------------------- 1 file changed, 40 insertions(+), 122 deletions(-) diff --git a/README.md b/README.md index 85a955155..29178537d 100644 --- a/README.md +++ b/README.md @@ -125,146 +125,34 @@ the service and click "start". - Make sure the **NVIP_DATA_DIR** points to the nvip_data directory and the database user and password in the **Environment Variables** are correct. ### Installation & Configuration Checklist -- All parameters are in **Environment Variables** at the moment. +- All parameters are in **Environment Variables** at the moment. 
For more information, see each module's README and env.list. - Required training data and resources are stored under the `nvip_data` folder (the data directory). You need to configure the data directory of the project (in the **Environment Variables** and (maybe) `nvip.properties`) to point to the `nvip_data` directory. +### Running the Crawler -### Environment Variables +`docker run -d --rm --memory=10g --env-file=./nvip.env --volume=./crawler-output:/usr/local/lib/output --volume=exploit-repo:/usr/local/lib/nvip_data/exploit-repo --volume=mitre-cve:/usr/local/lib/nvip_data/mitre-cve --name=nvip-crawler ghcr.io/softwaredesignlab/nvip-crawler:latest` -The `env.list` file contains a set of environment variables that the crawler requires in order to run. -Some variables contain default values for if they're not specified, but it is advised to have them configured based on your usage. +### Running the Reconciler -Like stated previously, you can provide these variables when running the application with Docker via the `env.list` file. -If you want to run it locally without Docker, you'll need to provide the environment variables through whatever tool or IDE you're using. +`docker run -d --env-file=./nvip.env --name=nvip-reconciler ghcr.io/softwaredesignlab/nvip-reconciler:latest` -- Setting up environment variables w/ **IntelliJ**: https://www.jetbrains.com/help/objc/add-environment-variables-and-program-arguments.html +### Running the Product Name Extractor +`docker run -d --env-file=./nvip.env --name=nvip-productnameextractor ghcr.io/softwaredesignlab/nvip-productnameextractor:latest` -- Setting up environment variables w/ **VS Code**: https://code.visualstudio.com/remote/advancedcontainers/environment-variables - -**NOTE** If you're running the application with Docker, you will not need to worry about setting up the Env Vars via your IDE. 
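The four `docker run` commands above differ only in the module name, so scripted launches can build them from one template. A hedged Python sketch (image tags are the ones shown above; the helper and env-file path are illustrative, and the crawler's extra `--memory`/`--volume` flags are omitted for brevity):

```python
# Illustrative sketch: assemble the per-module `docker run` argument lists
# shown above. Only the common flags are modeled; this does not invoke Docker.
MODULES = ["crawler", "reconciler", "productnameextractor", "patchfinder"]

def docker_run_args(module, env_file="./nvip.env"):
    return [
        "docker", "run", "-d",
        f"--env-file={env_file}",
        f"--name=nvip-{module}",
        f"ghcr.io/softwaredesignlab/nvip-{module}:latest",
    ]

for m in MODULES:
    print(" ".join(docker_run_args(m)))
```

Such a list can be passed to `subprocess.run` directly, which avoids shell-quoting issues with the env-file path.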
-IF there's any change in your Env Vars, you don't need to rebuild the image (unless there's changes in the code or properties files). - -A list of the environment variables is provided below: - -### Database - -* **HIKARI_URL**: JDBC URL used for connecting to the MySQL Database. - - There is no default value. - - Use mysql://localhost:3306 for running locally, and mysql://host.docker.internal:3306 to run with docker - - -* **HIKARI_USER**: Database username used to login to the MySQL database - - There is no default value - - -* **HIKARI_PASSWORD**: Database password used to login to the MySQL database - - There is no default value - -### Runtime Data - -* **NVIP_DATA_DIR**: Directory path for data resources used by NVIP at runtime - - Default value: nvip_data - - -* **NVIP_REFRESH_NVD_LIST**: Boolean parameter that determines whether or not NVIP should refresh the existing NVD data in the nvd-cve.csv file - - Default value: true - - -* **NVIP_PARALLEL_PROCESS_THREAD_LIMIT**: Maximum # of threads for the DBParallelProcess class to use - - Default value: 9 - -* **NVIP_OUTPUT_DIR**: Output directory path for the web crawler(s) - - Default value: output/crawlers - - -* **NVIP_SEED_URLS**: Directory path for seed URLs .txt file for NVIP's web crawler(s) - - Default value: nvip_data/url-sources/nvip-seeds.txt - - -* **NVIP_WHITELIST_URLS**: Directory path for whitelisted URLs/domains for NVIP's web crawler(s) - - Default value: nvip_data/url-sources/nvip-whitelist.txt - - -* **NVIP_ENABLE_GITHUB**: Boolean parameter for enabling pulling CVEs from CVE GitHib repo: https://github.com/CVEProject/cvelist - - Default value: true - -### Crawler - -* **NVIP_CRAWLER_POLITENESS**: Time (ms) for how long the crawler should wait for each page to load - - Default value: 3000 - - -* **NVIP_CRAWLER_MAX_PAGES**: Maximum # of pages for the crawler to navigate to - - Default value: 3000 - - -* **NVIP_CRAWLER_DEPTH**: Maximum depth for the web crawler - - Default value: 1 - - -* 
**NVIP_CRAWLER_REPORT_ENABLED**: Boolean parameter for enabling error report for crawler sources. Output is logged in the specified output directory - - Default value: true - - -* **NVIP_NUM_OF_CRAWLER**: Max # of crawler threads - - Default value: 10 - -### NVD Comparison - -* **NVD_API_URL**: URL for NVD API endpoint for grabbing CVEs from NVD - - Default value: https://services.nvd.nist.gov/rest/json/cves/2.0?pubstartDate=&pubEndDate= - - Where is the start date of the request and is the end date for the request - - Start and end date values are determined on runtime, end date being the current date and start date is 120 days before - - Example: https://services.nvd.nist.gov/rest/json/cves/2.0/?pubStartDate=2021-08-04T00:00:00.000&pubEndDate=2021-10-22T00:00:00.000 - - -* **NVD_API_REQUEST_LIMIT**: Max # of requests NVIP should make to NVD to collect CVEs for performance comparison - - Default value: 10 - - Each requests grabs 2000 CVEs from NVD - -### MITRE Comparison - -* **MITRE_GITHUB_URL**: Github URL used to pull MITRE's CVE repo and compare with MITREs results - - Default value: https://github.com/CVEProject/cvelist - -### Characterizer - -* **NVIP_CVE_CHARACTERIZATION_TRAINING_DATA_DIR**: Directory path for folder that contains Characterizer traning data - - Default value: characterization - - -* **NVIP_CVE_CHARACTERIZATION_TRAINING_DATA**: List of Characterization training data files (*.csv) (Ordered aplhabetically, and separated by comma (",")) - - Default value: AttackTheater.csv,Context.csv,ImpactMethod.csv,LogicalImpact.csv,Mitigation.csv - - -* **NVIP_CVE_CHARACTERIZATION_LIMIT**: Limit for maximum # of CVEs to run through the characterizer - - Default value: 5000 - - -### Patch Finder - -* **PATCHFINDER_ENABLED**: Boolean parameter for enabling the patch finder - - Default value: true - - -* **PATCHFINDER_SOURCE_LIMIT**: Limit of maximum # of repos to scrape for patches - - Default value: 10 - - -* **PATCHFINDER_MAX_THREADS**: Limit of maximum # 
of threads for patch finder - - Default value: 10 +### Running the Patchfinder +`docker run -d --env-file=./nvip.env --name=nvip-patchfinder ghcr.io/softwaredesignlab/nvip-patchfinder:latest` # Component Documentation ### Overview -This project consists of 6 main components +This project consists of 6 main components. * **CVE Web Crawler** - Uses Multi Threaded Web Crawling for navigating source pages to grab raw CVE data @@ -305,3 +193,33 @@ This project consists of 6 main components - Product repos are cloned in `nvip_data`, then deleted afterwards after being used - **NOTE** This component relies directly on the affected product data from product extraction - Fixes are found with web-scrapers similarly to the CVE crawler + +# Project Team +- Mehdi Mirakhorli, Principal Investigator +- Ahmet Okutan, Senior Research Developer +- Chris Enoch, Senior Project Manager +- Peter Mell, Collaborator +- Igor Khokhlov, Researcher +- Joanna Cecilia Da Silva Santos, Researcher +- Danielle Gonzalez, Researcher +- Celeste Gambardella, Researcher +- Olivia Gallucci, Vulnerability Researcher +- Steven Simmons, Developer +- Ryan Bryla, Developer +- Andrew Pickard, Developer +- Brandon Cooper, Developer +- Braden Little, Developer +- Adam Pang, Developer +- Anthony Ioppolo, Developer +- Andromeda Sawtelle, Developer +- Corey Urbanke, Developer +- James McGrath, Developer +- Matt Moon, Developer +- Stephen Shadders, Developer +- Paul Vickers, Developer +- Richard Sawh, Developer +- Greg Lynskey, Developer +- Eli MacDonald, Developer +- Ryan Moore, Developer +- Mackenzie Wade, Developer + From 9d5ff0fd22b52143368643f131f1fea35731305a Mon Sep 17 00:00:00 2001 From: memeeerit Date: Thu, 30 Nov 2023 18:09:22 -0500 Subject: [PATCH 5/6] more --- README.md | 35 ++++------------------------------- 1 file changed, 4 insertions(+), 31 deletions(-) diff --git a/README.md b/README.md index 29178537d..45e7f3454 100644 --- a/README.md +++ b/README.md @@ -78,9 +78,10 @@ It scrapes disclosed CVEs, 
scores/characterizes them automatically and stores th * Click on "Database/Connect To Database" menu on MySQL Workbench and Click "Ok". Enter the password you set for user "root" earlier. You should be connected to the MySQL database. -* Open a new query editor in MySQL Workbench and execute the script provided at '\nvip_data\mysql-database\CreateAndInitializeDb.sql' to create and initialize the MySQL database. -> Please make sure the MySQL user name and password parameters in the Docker -> environment variables are updated! (Refer to **Environment Variables** section for specific DB parameters needed) +* Once you have a database created, run this command in the mysql-database/newDB directory: + +> liquibase --changeLogFile=db.init.xml --classpath=./mysql-connector-j-8.0.33.jar --url="jdbc:mysql://localhost:3306/DB Name" --username=USERNAME --password=PASSWORD update + ## 3. Build & Package Make sure you can build the project before setting it up with docker @@ -103,34 +104,6 @@ If you're using Docker (which is the prefferred way of running it), you don't ha ## 4. Install Docker and Build via Docker CLI -#### Build Crawler Image - $ docker build -t crawler . - -#### Run with Env List - $ docker run -m=10g --env-file env.list crawler - -Where `-m` is the maximum memory (RAM) the container can use during runtime, and `--env-file` is the path to -the environment variable file (in `.list` format) - -Make sure your MySQL service is running. If not, try the following: - - - (Windows) Go to services panel via windows explorer, navigate to where your MySQL service is (named MySQL80), select -the service and click "start". - - - - You can verify the service running by logging into MySQL via MySQL Command Line or MySQL Workbench -(Login will automatically fail if the service isn't running, so be sure the login credentials are correct!) 
- - - - Make sure the **NVIP_DATA_DIR** points to the nvip_data directory and the database user and password in the **Environment Variables** are correct. - -### Installation & Configuration Checklist -- All parameters are in **Environment Variables** at the moment. For more information, see each module's README and env.list. - - -- Required training data and resources are stored under the `nvip_data` folder (the data directory). -You need to configure the data directory of the project (in the **Environment Variables** and (maybe) `nvip.properties`) -to point to the `nvip_data` directory. ### Running the Crawler From c5414850c29329e9a3111061682a8b65df6dd3cf Mon Sep 17 00:00:00 2001 From: Dylan Mulligan Date: Thu, 30 Nov 2023 18:44:50 -0500 Subject: [PATCH 6/6] Update env vars --- patchfinder/README.md | 24 ++++++++++++++++++++---- productnameextractor/README.md | 20 ++++++++++++++++++-- 2 files changed, 38 insertions(+), 6 deletions(-) diff --git a/patchfinder/README.md b/patchfinder/README.md index a8e0621e8..93ff972fb 100644 --- a/patchfinder/README.md +++ b/patchfinder/README.md @@ -189,8 +189,8 @@ If you want to run it locally without Docker, the program will attempt to automa * **HIKARI_URL**: JDBC URL used for connecting to the MySQL Database. - By default, assumes that application will be run with Docker - - Use `mysql://localhost:3306/DB_NAME?useSSL=false&allowPublicKeyRetrieval=true` for running locally - - Use `mysql://host.docker.internal:3306/DB_NAME?useSSL=false&allowPublicKeyRetrieval=true` to run with Docker + - Use `mysql://localhost:3306/nvip?useSSL=false&allowPublicKeyRetrieval=true` for running locally + - Use `mysql://host.docker.internal:3306/nvip?useSSL=false&allowPublicKeyRetrieval=true` to run with Docker * **HIKARI_USER**: Database username used to log in to the database. 
@@ -212,6 +212,14 @@ If you want to run it locally without Docker, the program will attempt to automa - Default value: `host.docker.internal` +* **RABBIT_VHOST**: The virtual host for the RabbitMQ server. + - Default value: `/` + + +* **RABBIT_PORT**: The port for the RabbitMQ server. + - Default value: `5672` + + * **RABBIT_USERNAME**: The username for the RabbitMQ server connection. - Default value: `guest` @@ -223,8 +231,16 @@ If you want to run it locally without Docker, the program will attempt to automa ### Patch Finder Variables * **PF_INPUT_MODE**: Method of input for Patch Finder jobs, either 'db' or 'rabbit'. - - Default value: `db` - + - Default value: `rabbit` + + +* **PF_INPUT_QUEUE**: The RabbitMQ queue name to watch for Patch Finder jobs. + - Default value: `PNE_OUT_PATCH` + + +* **FF_INPUT_QUEUE**: The RabbitMQ queue name to watch for Fix Finder jobs. + - Default value: `PNE_OUT_FIX` + * **CVE_LIMIT**: The limit for CVEs to be processed by the Patch Finder during runtime. - Default value: `20` diff --git a/productnameextractor/README.md b/productnameextractor/README.md index 7936272fd..ccce41e14 100644 --- a/productnameextractor/README.md +++ b/productnameextractor/README.md @@ -204,6 +204,14 @@ If you want to run it locally without Docker, the program will attempt to automa - Default value: `host.docker.internal` +* **RABBIT_VHOST**: The virtual host for the RabbitMQ server. + - Default value: `/` + + +* **RABBIT_PORT**: The port for the RabbitMQ server. + - Default value: `5672` + + * **RABBIT_USERNAME**: The username for the RabbitMQ server connection. - Default value: `guest` @@ -211,13 +219,16 @@ If you want to run it locally without Docker, the program will attempt to automa * **RABBIT_PASSWORD**: The password for the RabbitMQ server connection. - Default value: `guest` + * **PNE_INPUT_QUEUE**: The RabbitMQ queue name to watch for input.
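The `RABBIT_HOST`, `RABBIT_PORT`, `RABBIT_VHOST`, `RABBIT_USERNAME`, and `RABBIT_PASSWORD` variables above together describe a single broker connection. A hedged Python sketch of combining them into an AMQP URI (the helper is illustrative, not the modules' actual connection code; note that the default vhost `/` must be percent-encoded as `%2F` when it appears in a URI path):

```python
import os
from urllib.parse import quote

# Illustrative only: compose an AMQP URI from the RABBIT_* variables
# documented above, using the README's defaults as fallbacks.
def amqp_uri():
    host = os.environ.get("RABBIT_HOST", "host.docker.internal")
    port = os.environ.get("RABBIT_PORT", "5672")
    vhost = os.environ.get("RABBIT_VHOST", "/")
    user = os.environ.get("RABBIT_USERNAME", "guest")
    password = os.environ.get("RABBIT_PASSWORD", "guest")
    # The vhost goes in the URI path, so "/" becomes "%2F".
    return f"amqp://{user}:{password}@{host}:{port}/{quote(vhost, safe='')}"
```

Most RabbitMQ client libraries also accept these five values as separate connection parameters, in which case no encoding is needed.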
* Default value: 'RECONCILER_OUT' -* **PNE_OUTPUT_QUEUE**: The RabbitMQ queue name to send jobs to the Pathfinder. + +* **PNE_OUTPUT_QUEUE_PATCH**: The RabbitMQ queue name to send jobs to the Patchfinder. * Default value: 'PNE_OUT_PATCH' -* **PNE_INPUT_QUEUE**: The RabbitMQ queue name to watch jobs to the Fixfinder. + +* **PNE_OUTPUT_QUEUE_FIX**: The RabbitMQ queue name to send jobs to the Fixfinder. * Default value: 'PNE_OUT_FIX' @@ -226,6 +237,11 @@ If you want to run it locally without Docker, the program will attempt to automa * **INPUT_MODE**: The way the PNE will receive input. - Default value: `rabbit` + +* **MAX_ATTEMPTS_PER_PAGE**: The maximum number of attempts to scrape any given page. + - Default value: `rabbit` + + * **CHAR_2_VEC_CONFIG**: Name of the configuration file for the Char2Vec model. - Default value: `c2v_model_config_50.json`
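With the renamed output-queue variables above, the PNE fans jobs out to two downstream consumers: patch jobs to the Patchfinder queue and fix jobs to the FixFinder queue. A minimal sketch of that routing (function and job-type names are hypothetical; the queue names are the README defaults):

```python
import os

# Illustrative sketch of the PNE's two output queues described above.
OUTPUT_QUEUES = {
    "patch": os.environ.get("PNE_OUTPUT_QUEUE_PATCH", "PNE_OUT_PATCH"),
    "fix": os.environ.get("PNE_OUTPUT_QUEUE_FIX", "PNE_OUT_FIX"),
}

def route_job(job_type):
    # Reject anything other than the two documented job kinds.
    if job_type not in OUTPUT_QUEUES:
        raise ValueError(f"unknown job type: {job_type}")
    return OUTPUT_QUEUES[job_type]
```

The actual modules publish over RabbitMQ; this only illustrates how the two queue-name variables partition the PNE's output.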