Add url to the ScriptLocal summary for external scripts, with a new ScriptLocalNode.url_if_external method.
Renamed EventListenerEdge to EventListenerFiredEdge to make it clearer what the edge denotes.
Add event_name methods on EventListenerAddEdge, EventListenerRemoveEdge, and EventListenerEdge edges.
Lots of cleanup (e.g., deleting a bunch of double quotes in annotations) because of moving most everything to from __future__ import annotations.
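As a brief illustration of why that import makes quoted annotations unnecessary: with from __future__ import annotations, annotations are stored as strings and are not evaluated when the class or function is defined, so a class can refer to itself, or to a class defined later in the module, without quotes. The Node and Edge classes below are hypothetical stand-ins for illustration, not the project's actual classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    # Without the __future__ import this would have to be written as
    # list["Edge"], because Edge is not defined yet at this point.
    outgoing: list[Edge] = field(default_factory=list)

    # Likewise, the self-reference would otherwise need to be "Node".
    def add_edge_to(self, other: Node) -> Edge:
        edge = Edge(self, other)
        self.outgoing.append(edge)
        return edge


@dataclass
class Edge:
    source: Node
    target: Node
```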
Add ability for the event listener edges to look up where the event was added, removed, or fired in the document with EventListenerEdge.event_add_edges(), EventListenerEdge.event_fired_edges(), and EventListenerEdge.event_removed_edges() methods.
Correctly handle the case where the URL for the top-level frame is empty (which should only happen for older graphs, generated with v0.7.2 and older).
Additional options for the subframes command, allowing for filtering frames by first- and third-party security origin.
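A rough sketch of what filtering frames by party could look like, assuming a frame counts as first-party when its security origin (scheme, host, port) equals that of the top-level frame. That definition, and the names security_origin and filter_frames, are assumptions made for illustration; the subframes command may define and implement this differently.

```python
from urllib.parse import urlsplit

# Default ports, so that https://example.test and https://example.test:443
# compare as the same origin.
DEFAULT_PORTS = {"http": 80, "https": 443}


def security_origin(url):
    """Return a (scheme, host, port) tuple for the given URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def filter_frames(frame_urls, top_url, party):
    """Keep only frames that are "first" or "third" party to top_url."""
    if party not in ("first", "third"):
        raise ValueError(f"unknown party filter: {party!r}")
    top_origin = security_origin(top_url)
    keep_first = party == "first"
    return [
        url for url in frame_urls
        if (security_origin(url) == top_origin) == keep_first
    ]
```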
Add ability to export subgraphs of the larger graph, using the elm --graphml command in the ./run.py script.
Additional test coverage.
Add support for PageGraph version 0.7.2 (which adds new node types for actors).
Fix iframes test and regenerate test graphs with recent pagegraph.
Add tests for keeping track of JS calls across frames.
Add command for logging if any unattributable events occurred in the graph (i.e., cases where a script must have been executing, but we couldn't determine which script).
Fix issue with deeply recursive request loops (specifically, when the same URL could redirect to itself a large but finite number of times, before redirecting to an eventual end URL).
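One common way to avoid recursion-depth problems with long redirect chains is to walk the chain with a loop instead of a recursive call. The sketch below illustrates that general technique only; it is not the project's actual fix. The follow_redirects function and the redirects mapping (request id to a (url, next request id) pair) are hypothetical.

```python
from typing import Optional


def follow_redirects(
    start_id: int,
    redirects: dict[int, tuple[str, Optional[int]]],
) -> list[str]:
    """Walk a redirect chain iteratively and return the URLs in order.

    Requests are tracked by id rather than by URL, so the same URL may
    legitimately appear many times in a row (a URL redirecting to itself).
    """
    urls = []
    seen_ids = set()
    current: Optional[int] = start_id
    while current is not None:
        if current in seen_ids:
            # Revisiting the same request id means a true cycle,
            # which would otherwise loop forever.
            raise ValueError(f"redirect cycle at request {current}")
        seen_ids.add(current)
        url, next_id = redirects[current]
        urls.append(url)
        current = next_id
    return urls
```

Because the walk uses a loop, a chain that is thousands of hops long does not approach Python's default recursion limit.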
Move all abstract node and edge classes into pagegraph.graph.{node,edge}.abc so that the directory structure more closely matches the PageGraph type taxonomy.
Remove assumption in RequestChain class that all requests will have a result (either a completion edge or an error edge). There will be no result if the graph was serialized while the request was still in flight.
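A minimal sketch of the idea, assuming the result is modeled as an optional field that callers must check before use. RequestResultSketch and RequestChainSketch are hypothetical names for illustration and do not reflect the real RequestChain class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestResultSketch:
    # Either "complete" or "error" in this simplified model.
    kind: str


@dataclass
class RequestChainSketch:
    url: str
    # None means the graph was serialized before the request finished.
    result: Optional[RequestResultSketch] = None

    def status(self) -> str:
        """Describe the outcome without assuming a result exists."""
        if self.result is None:
            return "in flight"
        return self.result.kind
```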
Fix frame filter for requests command.
Add html command, for querying what HTML elements appeared in which pages.
Made some minor changes for Python 3.10 compatibility.
Add some tests.
Moved to pylint linting, which required a lot of code restructuring.
Parse headers in relevant requests.
Add ability to gate some functionality behind graph versions.
Report frame information for the scripts command when parsing graphs of version 0.6.3 or later.
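Version gating of this kind can be sketched as a comparison of parsed version tuples. The helper names below (parse_version, supports_frame_info) are hypothetical and only illustrate the approach; the 0.6.3 threshold is the one stated above.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "0.6.3" into (0, 6, 3)."""
    return tuple(int(part) for part in text.split("."))


def supports_frame_info(graph_version: str, minimum: str = "0.6.3") -> bool:
    """Return True if the graph version is at least the minimum version."""
    return parse_version(graph_version) >= parse_version(minimum)
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: as strings, "0.10.0" sorts before "0.6.3", while as tuples (0, 10, 0) is correctly greater than (0, 6, 3).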
Added two new commands: elm for querying information about a specific graph element (and its surrounding subgraph), and scripts for querying information about relationship chains between scripts on a page.
Added graph structure type checks for edges.
Further cleaned up how reports are serialized.
Corrected handling of redirection flows for requests through the graph, and exposed that information in the RequestChainReport structure and the requests command.
Add new js-calls query command, to allow querying what JS calls were made during page execution.
Rework how JSON reports are defined and implemented, to allow them to be mypy checked, and to make the reports more consistent.
Much faster.
Add explicitly passed --debug option on command line, to optionally perform more expensive checks of graph correctness. Started moving asserts to this, to give more useful failure information.
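A sketch of how an opt-in debug flag can gate expensive checks and report failures with more context than a bare assert. Only the --debug flag name comes from the entry above; GraphCheckError, check_graph, and build_parser are hypothetical names, and the real checks in the project will differ.

```python
import argparse


class GraphCheckError(Exception):
    """Raised when an optional graph correctness check fails."""


def check_graph(nodes: set[str], edges: list[tuple[str, str]],
                debug: bool) -> None:
    """Verify every edge endpoint exists, but only when debug is enabled."""
    if not debug:
        return
    for source, target in edges:
        for endpoint in (source, target):
            if endpoint not in nodes:
                # Unlike a bare assert, the message names the offending
                # edge and the missing node.
                raise GraphCheckError(
                    f"edge {source} -> {target} references "
                    f"unknown node {endpoint}"
                )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Query a graph.")
    parser.add_argument(
        "--debug", action="store_true",
        help="perform more expensive checks of graph correctness")
    return parser
```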
Correct assertions in code about resource nodes and connected request edges to incorporate new request redirect edges.