Merge pull request #209 from SoftwareDesignLab/readme-updates
Readme updates
ctevse authored Dec 1, 2023
2 parents 50bb82a + c541485 commit 2dd925a
Showing 4 changed files with 102 additions and 215 deletions.
261 changes: 52 additions & 209 deletions README.md
@@ -11,7 +11,8 @@ It scrapes disclosed CVEs, scores/characterizes them automatically and stores th
* Automatically extracts affected Common Platform Enumeration (CPE) products from free-form CVE descriptions.
* Maintains an updated list of CVE source URLs, takes a seed URL list as input and searches for additional CVE sources.
* Compares crawled CVEs against NVD and MITRE and records the comparison results under the "output" directory.
(If NVIP is run at date "MM/dd/yyyy", the output will be at "output//yyyyMMdd" path.)
(If NVIP is run at date "MM/dd/yyyy", the output will be at "output//yyyyMMdd" path.)
* NVIP consists of multiple modules which send jobs to each other via RabbitMQ, and share the `db` module as a common dependency.

## System Requirements
* NVIP requires at least Java version 8.
@@ -77,9 +78,10 @@ It scrapes disclosed CVEs, scores/characterizes them automatically and stores th
* Click on "Database/Connect To Database" menu on MySQL Workbench and Click "Ok". Enter the password you set for user "root" earlier. You should be connected to the MySQL database.


* Open a new query editor in MySQL Workbench and execute the script provided at '\nvip_data\mysql-database\CreateAndInitializeDb.sql' to create and initialize the MySQL database.
> Please make sure the MySQL user name and password parameters in the Docker
> environment variables are updated! (Refer to **Environment Variables** section for specific DB parameters needed)
* Once you have a database created, run this command in the mysql-database/newDB directory:

> liquibase --changeLogFile=db.init.xml --classpath=./mysql-connector-j-8.0.33.jar --url="jdbc:mysql://localhost:3306/DB Name" --username=USERNAME --password=PASSWORD update
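
As a rough illustration of how the placeholders in that command get filled in, the sketch below assembles the same liquibase invocation from three arguments and prints it instead of running it. The helper name `liquibase_cmd` and the database name `nvip` are assumptions made for this example; they are not part of NVIP or Liquibase.

```shell
#!/bin/sh
# Hypothetical helper (not part of NVIP): prints the liquibase command from the
# README for a given database name, user, and password, without running it.
liquibase_cmd() {
  db_name=$1
  db_user=$2
  db_pass=$3
  printf 'liquibase --changeLogFile=db.init.xml --classpath=./mysql-connector-j-8.0.33.jar --url="jdbc:mysql://localhost:3306/%s" --username=%s --password=%s update\n' \
    "$db_name" "$db_user" "$db_pass"
}

# Example call: the database name "nvip" is an assumption; substitute your own.
liquibase_cmd nvip USERNAME PASSWORD
```

Printing the command first makes it easy to check the JDBC URL before touching the database; once it looks right, the printed line can be run from the `mysql-database/newDB` directory.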

## 3. Build & Package
Make sure you can build the project before setting it up with Docker.
@@ -102,205 +104,28 @@ If you're using Docker (which is the preferred way of running it), you don't ha

## 4. Install Docker and Build via Docker CLI

#### Build Crawler Image
$ docker build -t crawler .

#### Run with Env List
$ docker run -m=10g --env-file env.list crawler

Where `-m` is the maximum memory (RAM) the container can use during runtime, and `--env-file` is the path to
the environment variable file (in `.list` format)

Make sure your MySQL service is running. If not, try the following:

- (Windows) Go to services panel via windows explorer, navigate to where your MySQL service is (named MySQL80), select
the service and click "start".


- You can verify the service running by logging into MySQL via MySQL Command Line or MySQL Workbench
(Login will automatically fail if the service isn't running, so be sure the login credentials are correct!)


- Make sure the **NVIP_DATA_DIR** points to the nvip_data directory and the database user and password in the **Environment Variables** are correct.

### Installation & Configuration Checklist
- Not all parameters are in **Environment Variables** at the moment.
There are two additional legacy config files used for some parameters.
`src/main/resources/nvip.properties` is used to set program parameters, and `src/main/resources/db-mysql.properties`
is used to set database parameters (We might not need that anymore though). When the system is run, the config files are first searched in the application root,
if they are not found there the ones at `\src\main\resources` are used!


- Required training data and resources are stored under the `nvip_data` folder (the data directory).
You need to configure the data directory of the project (in the **Environment Variables** and (maybe) `nvip.properties`)
to point to the `nvip_data` directory.


### Environment Variables

The `env.list` file contains a set of environment variables that the crawler requires in order to run.
Some variables have default values that apply when they are not specified, but it is advised to configure them for your usage.

As stated previously, you can provide these variables when running the application with Docker via the `env.list` file.
If you want to run it locally without Docker, you'll need to provide the environment variables through whatever tool or IDE you're using.

- Setting up environment variables w/ **IntelliJ**: https://www.jetbrains.com/help/objc/add-environment-variables-and-program-arguments.html


- Setting up environment variables w/ **VS Code**: https://code.visualstudio.com/remote/advancedcontainers/environment-variables

**NOTE** If you're running the application with Docker, you will not need to worry about setting up the Env Vars via your IDE.
If there's any change in your Env Vars, you don't need to rebuild the image (unless there are changes in the code or properties files).

A list of the environment variables is provided below:

### Database

* **HIKARI_URL**: JDBC URL used for connecting to the MySQL Database.
- There is no default value.
- Use mysql://localhost:3306 for running locally, and mysql://host.docker.internal:3306 to run with docker


* **HIKARI_USER**: Database username used to login to the MySQL database
- There is no default value


* **HIKARI_PASSWORD**: Database password used to login to the MySQL database
- There is no default value

### Runtime Data

* **NVIP_DATA_DIR**: Directory path for data resources used by NVIP at runtime
- Default value: nvip_data


* **NVIP_REFRESH_NVD_LIST**: Boolean parameter that determines whether or not NVIP should refresh the existing NVD data in the nvd-cve.csv file
- Default value: true


* **NVIP_PARALLEL_PROCESS_THREAD_LIMIT**: Maximum # of threads for the DBParallelProcess class to use
- Default value: 9

* **NVIP_OUTPUT_DIR**: Output directory path for the web crawler(s)
- Default value: output/crawlers


* **NVIP_SEED_URLS**: Directory path for seed URLs .txt file for NVIP's web crawler(s)
- Default value: nvip_data/url-sources/nvip-seeds.txt


* **NVIP_WHITELIST_URLS**: Directory path for whitelisted URLs/domains for NVIP's web crawler(s)
- Default value: nvip_data/url-sources/nvip-whitelist.txt


* **NVIP_ENABLE_GITHUB**: Boolean parameter for enabling pulling CVEs from the CVE GitHub repo: https://github.com/CVEProject/cvelist
- Default value: true

### Crawler
### Running the Crawler

* **NVIP_CRAWLER_POLITENESS**: Time (ms) for how long the crawler should wait for each page to load
- Default value: 3000
`docker run -d --rm --memory=10g --env-file=./nvip.env --volume=./crawler-output:/usr/local/lib/output --volume=exploit-repo:/usr/local/lib/nvip_data/exploit-repo --volume=mitre-cve:/usr/local/lib/nvip_data/mitre-cve --name=nvip-crawler ghcr.io/softwaredesignlab/nvip-crawler:latest`

### Running the Reconciler

* **NVIP_CRAWLER_MAX_PAGES**: Maximum # of pages for the crawler to navigate to
- Default value: 3000
`docker run -d --env-file=./nvip.env --name=nvip-reconciler ghcr.io/softwaredesignlab/nvip-reconciler:latest`

### Running the Product Name Extractor

* **NVIP_CRAWLER_DEPTH**: Maximum depth for the web crawler
- Default value: 1
`docker run -d --env-file=./nvip.env --name=nvip-productnameextractor ghcr.io/softwaredesignlab/nvip-productnameextractor:latest`

### Running the Patchfinder

* **NVIP_CRAWLER_REPORT_ENABLED**: Boolean parameter for enabling error report for crawler sources. Output is logged in the specified output directory
- Default value: true


* **NVIP_NUM_OF_CRAWLER**: Max # of crawler threads
- Default value: 10

### NVD Comparison

* **NVD_API_URL**: URL for NVD API endpoint for grabbing CVEs from NVD
- Default value: https://services.nvd.nist.gov/rest/json/cves/2.0?pubstartDate=<StartDate>&pubEndDate=<EndDate>
- Where <StartDate> is the start date of the request and <EndDate> is the end date for the request
- Start and end date values are determined at runtime: the end date is the current date and the start date is 120 days earlier
- Example: https://services.nvd.nist.gov/rest/json/cves/2.0/?pubStartDate=2021-08-04T00:00:00.000&pubEndDate=2021-10-22T00:00:00.000


* **NVD_API_REQUEST_LIMIT**: Max # of requests NVIP should make to NVD to collect CVEs for performance comparison
- Default value: 10
- Each request grabs 2000 CVEs from NVD

### MITRE Comparison

* **MITRE_GITHUB_URL**: GitHub URL used to pull MITRE's CVE repo and compare with MITRE's results
- Default value: https://github.com/CVEProject/cvelist

### Characterizer

* **NVIP_CVE_CHARACTERIZATION_TRAINING_DATA_DIR**: Directory path for the folder that contains Characterizer training data
- Default value: characterization


* **NVIP_CVE_CHARACTERIZATION_TRAINING_DATA**: List of Characterization training data files (*.csv) (ordered alphabetically and separated by commas (","))
- Default value: AttackTheater.csv,Context.csv,ImpactMethod.csv,LogicalImpact.csv,Mitigation.csv


* **NVIP_CVE_CHARACTERIZATION_LIMIT**: Limit for maximum # of CVEs to run through the characterizer
- Default value: 5000

### Exploit Finder

* **EXPLOIT_FINDER_ENABLED**: Boolean parameter for enabling the exploit finder
- Default value: true


* **EXPLOIT_DB_URL**: URL used for cloning and scraping the ExploitDB Git repo
- Default value: https://gitlab.com/exploit-database/exploitdb

### Patch Finder

* **PATCHFINDER_ENABLED**: Boolean parameter for enabling the patch finder
- Default value: true


* **PATCHFINDER_SOURCE_LIMIT**: Limit of maximum # of repos to scrape for patches
- Default value: 10


* **PATCHFINDER_MAX_THREADS**: Limit of maximum # of threads for patch finder
- Default value: 10

### Email Notification Service

* **NVIP_EMAIL_USER**: Email user name for NVIP notifications
- There is no default value.


* **NVIP_EMAIL_PASSWORD**: Email password for NVIP notifications
- There is no default value.


* **NVIP_EMAIL_FROM**: Email from address for NVIP notifications (data@cve.live)
- There is no default value.


* **NVIP_EMAIL_PORT**: SMTP port # for NVIP notifications (ex. 587)
- There is no default value.


* **NVIP_EMAIL_HOST**: SMTP host domain for NVIP notifications
- There is no default value.


* **NVIP_EMAIL_MESSAGE_URL**: URL domain for links in NVIP email notifications (ex. http://www.cve.live)
- There is no default value.

`docker run -d --env-file=./nvip.env --name=nvip-patchfinder ghcr.io/softwaredesignlab/nvip-patchfinder:latest`
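
The reconciler, product name extractor, and patchfinder commands above differ only in the module name, so they can be generated from one template. The sketch below prints the three commands rather than executing them; the function name `nvip_run_cmd` is made up for this example, and the crawler is left out because its command takes extra memory and volume flags.

```shell
#!/bin/sh
# Hypothetical helper (not part of NVIP): prints the `docker run` command for a
# module that uses only the shared flags shown in the README.
nvip_run_cmd() {
  module=$1
  printf 'docker run -d --env-file=./nvip.env --name=nvip-%s ghcr.io/softwaredesignlab/nvip-%s:latest\n' \
    "$module" "$module"
}

# Print the commands for the three modules that share the same flags.
for module in reconciler productnameextractor patchfinder; do
  nvip_run_cmd "$module"
done
```

To actually start the containers, the printed lines can be reviewed and then run by hand, assuming `./nvip.env` exists and Docker can pull the images.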

# Component Documentation


### Overview
This project consists of 8 main components
This project consists of 6 main components.

* **CVE Web Crawler**
- Uses Multi Threaded Web Crawling for navigating source pages to grab raw CVE data
@@ -312,44 +137,62 @@ This project consists of 8 main components
description for each CVE.



* **CVE Characterizer**
* **CVE Characterizer** (included in the reconciler module)
- This component provides automated CVSS scores and VDO Labels for each CVE via a Natural Language Processing model, which is trained
via the data provided in `nvip_data` (the model is stored there as well)
via the data provided in `nvip_data` (the model is stored there as well). It also uses an SSVC API running in the NVIP environment for SSVC scoring.
- NIST's CVSS score summary: https://nvd.nist.gov/vuln-metrics/cvss
- NIST's VDO Label summary: https://csrc.nist.gov/csrc/media/publications/nistir/8138/draft/documents/nistir_8138_draft.pdf


* **CVE Processor**
* **NVD/MITRE Comparisons** (included in the reconciler module)
- This component processes the compiled CVEs by storing them in the Database, then compares each CVE in NVIP to the
CVEs in NVD and MITRE to compare performance of NVIP vs NVD and MITRE.
- NVD: https://nvd.nist.gov/
- MITRE: https://www.cve.org/
- For comparing with NVD, we're currently transitioning to NVD's 2.0 API: https://nvd.nist.gov/developers/vulnerabilities


* **CVE Product Extractor**
* **Product Name Extractor**
- This component identifies affected products in a CVE via a Named Entity Recognition (NER) model.
- The model and it's training data is provided in `nvip_data`
- The model and its training data are provided in `nvip_data`
- Each extracted product is converted to a Common Platform Enumeration (CPE) string
- CPE Definition and Dictionary(s): https://nvd.nist.gov/products/cpe
-


* **CVE Exploit Finder**
- This component identifies exploits for CVEs in NVIP
- Currently, we just pull exploit data from ExploitDB: https://gitlab.com/exploit-database/exploitdb


* **CVE Patch Finder**
* **CVE Patch/Fix Finder**
- This component identifies possible patches for CVEs
- Patches are found by crawling available repos for the affected products of a CVE
- Each repo is cloned, then each commit is navigated to identify patches by checking for keywords in the commit messages
- Product repos are cloned into `nvip_data`, then deleted after use
- **NOTE** This component relies directly on the affected product data from product extraction
- Fixes are found with web-scrapers similarly to the CVE crawler

# Project Team
- Mehdi Mirakhorli, Principal Investigator
- Ahmet Okutan, Senior Research Developer
- Chris Enoch, Senior Project Manager
- Peter Mell, Collaborator
- Igor Khokhlov, Researcher
- Joanna Cecilia Da Silva Santos, Researcher
- Danielle Gonzalez, Researcher
- Celeste Gambardella, Researcher
- Olivia Gallucci, Vulnerability Researcher
- Steven Simmons, Developer
- Ryan Bryla, Developer
- Andrew Pickard, Developer
- Brandon Cooper, Developer
- Braden Little, Developer
- Adam Pang, Developer
- Anthony Ioppolo, Developer
- Andromeda Sawtelle, Developer
- Corey Urbanke, Developer
- James McGrath, Developer
- Matt Moon, Developer
- Stephen Shadders, Developer
- Paul Vickers, Developer
- Richard Sawh, Developer
- Greg Lynskey, Developer
- Eli MacDonald, Developer
- Ryan Moore, Developer
- Mackenzie Wade, Developer


* **NVIP Cache Updater and NVIP Email Notifications Service**
- This last component makes sure the Web App is up-to-date with the recent vulnerabilities found in NVIP.
- This is done by collecting the CVEs found in the past week (7 days) and adding them to the `vulnerabilityaggregate` table in the database.
- The `vulnerabilityaggregate` table acts as a cache table for the Web API.
- After the cache is updated, the service sends an email notification to all admin users in NVIP. Each email notification contains a list of CVEs added to NVIP.
24 changes: 20 additions & 4 deletions patchfinder/README.md
@@ -189,8 +189,8 @@ If you want to run it locally without Docker, the program will attempt to automa

* **HIKARI_URL**: JDBC URL used for connecting to the MySQL Database.
- By default, assumes that application will be run with Docker
- Use `mysql://localhost:3306/DB_NAME?useSSL=false&allowPublicKeyRetrieval=true` for running locally
- Use `mysql://host.docker.internal:3306/DB_NAME?useSSL=false&allowPublicKeyRetrieval=true` to run with Docker
- Use `mysql://localhost:3306/nvip?useSSL=false&allowPublicKeyRetrieval=true` for running locally
- Use `mysql://host.docker.internal:3306/nvip?useSSL=false&allowPublicKeyRetrieval=true` to run with Docker


* **HIKARI_USER**: Database username used to log in to the database.
@@ -212,6 +212,14 @@ If you want to run it locally without Docker, the program will attempt to automa
- Default value: `host.docker.internal`


* **RABBIT_VHOST**: The virtual host for the RabbitMQ server.
- Default value: `/`


* **RABBIT_PORT**: The port for the RabbitMQ server.
- Default value: `5672`


* **RABBIT_USERNAME**: The username for the RabbitMQ server connection.
- Default value: `guest`

@@ -223,8 +231,16 @@ If you want to run it locally without Docker, the program will attempt to automa
### Patch Finder Variables

* **PF_INPUT_MODE**: Method of input for Patch Finder jobs, either 'db' or 'rabbit'.
- Default value: `db`

- Default value: `rabbit`


* **PF_INPUT_QUEUE**: Input message queue for Patch Finder jobs.
- Default value: `PNE_OUT_PATCH`


* **FF_INPUT_QUEUE**: Input message queue for Fix Finder jobs.
- Default value: `PNE_OUT_FIX`


* **CVE_LIMIT**: The limit for CVEs to be processed by the Patch Finder during runtime.
- Default value: `20`
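
To make the variables above concrete, here is a minimal sketch that writes a sample `nvip.env` for a Docker run of the Patch Finder. The values are the defaults listed above where one is given; `USERNAME` and `PASSWORD` are placeholders, `HIKARI_PASSWORD` is assumed to be read by this module as it was in the main README, and the variables elided from this diff (such as the RabbitMQ host and password) are left out rather than guessed.

```shell
#!/bin/sh
# Writes a sample nvip.env into the current directory.
# Docker's --env-file format is plain VAR=value lines; values are taken
# literally, so they must not be wrapped in quotes.
cat > nvip.env <<'EOF'
# Database (placeholders: replace USERNAME and PASSWORD)
HIKARI_URL=mysql://host.docker.internal:3306/nvip?useSSL=false&allowPublicKeyRetrieval=true
HIKARI_USER=USERNAME
HIKARI_PASSWORD=PASSWORD
# RabbitMQ (documented defaults)
RABBIT_VHOST=/
RABBIT_PORT=5672
RABBIT_USERNAME=guest
# Patch Finder / Fix Finder (documented defaults)
PF_INPUT_MODE=rabbit
PF_INPUT_QUEUE=PNE_OUT_PATCH
FF_INPUT_QUEUE=PNE_OUT_FIX
CVE_LIMIT=20
EOF
```

The quoted heredoc delimiter (`'EOF'`) keeps the shell from interpreting the `&` and `?` in the JDBC URL, so the file contains the values exactly as written.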