Cybersecurity Inconsistency Tutorial
======================================

In this tutorial, we build a small cybersecurity knowledge graph using PyReason and use it to demonstrate how PyReason detects and resolves three types of inconsistency. The graph models network assets, the software they run, and real CVEs that affect that software.

.. note::
   Find the full, executable code `here `_

Background
----------

A **CVE** (Common Vulnerabilities and Exposures) is a standardised ID for a known security vulnerability, for example CVE-2021-3156, which appears in this tutorial as the node name ``cve_2021_3156``.

A **CVSS score** rates the severity of a CVE on a 0--10 scale. We divide by 10 to normalise it into the [0, 1] range used by PyReason annotation bounds.

The CVEs in this tutorial are real entries from the `National Vulnerability Database `_:

+------------------+---------------------+-------+----------------------------------+
| CVE ID           | Software            | CVSS  | Description                      |
+==================+=====================+=======+==================================+
| cve_2021_3156    | sudo 1.9.5p1        | 7.8   | Heap buffer overflow (CWE-121)   |
+------------------+---------------------+-------+----------------------------------+
| cve_2022_0185    | Linux Kernel 5.1    | 8.4   | Stack overflow (CWE-121)         |
+------------------+---------------------+-------+----------------------------------+
| cve_2022_26923   | OpenSSL 3.0.1       | 7.5   | Double free (CWE-415)            |
+------------------+---------------------+-------+----------------------------------+

Graph
-----

The graph has three layers of nodes connected by directed edges:

.. code-block:: text

    [asset] --runs--> [software] --has_cve--> [CVE]

.. code-block:: python

    import pyreason as pr
    import networkx as nx

    # Clear any state left over from a previous run
    pr.reset()
    pr.reset_rules()

    g = nx.DiGraph()

    # Asset nodes
    g.add_nodes_from(['web_server', 'workstation_1', 'dev_server'])

    # Software nodes
    g.add_nodes_from(['sudo_1_9_5p1', 'linux_kernel_5_1', 'openssl_3_0_1'])

    # CVE nodes (constants start with lowercase per PyReason convention)
    g.add_nodes_from(['cve_2021_3156', 'cve_2022_0185', 'cve_2022_26923'])

    # Which asset runs which software
    g.add_edge('web_server', 'sudo_1_9_5p1', runs=1)
    g.add_edge('workstation_1', 'linux_kernel_5_1', runs=1)
    g.add_edge('dev_server', 'openssl_3_0_1', runs=1)

    # Which CVE affects which software
    g.add_edge('sudo_1_9_5p1', 'cve_2021_3156', has_cve=1)
    g.add_edge('linux_kernel_5_1', 'cve_2022_0185', has_cve=1)
    g.add_edge('openssl_3_0_1', 'cve_2022_26923', has_cve=1)

We then configure PyReason and load the graph:

.. code-block:: python

    pr.settings.verbose = True
    pr.settings.atom_trace = True            # record which fact or rule caused each bound change
    pr.settings.inconsistency_check = True   # detect and resolve conflicting bounds

    pr.load_graph(g)

We declare ``vulnerable`` and ``patched`` as mutually exclusive predicates by adding them to the inconsistent predicate list (IPL). Setting one automatically updates the other to its negated bound:

.. code-block:: python

    pr.add_inconsistent_predicate('vulnerable', 'patched')

Rules
-----

The rules we want to add are:

1. An asset is ``at_risk`` if it runs software that has a CVE.
2. An asset that is ``at_risk`` is also ``vulnerable`` with confidence [0.8, 1.0].
3. A ``vulnerable`` asset is also ``compromised`` with confidence [0.8, 1.0].
4. A ``compromised`` asset is unlikely to be patched -- ``patch_confidence:[0.0, 0.2]``. This will conflict with any high-confidence ``patch_confidence`` fact, creating a rule-triggered inconsistency.

.. note::
   Variables in PyReason rules must be uppercase. Constants (node names) must start with a lowercase letter.

.. code-block:: python

    pr.add_rule(pr.Rule('at_risk(X) <- runs(X,Y), has_cve(Y,Z)', 'exposure_rule'))
    pr.add_rule(pr.Rule('vulnerable(X):[0.8,1.0] <- at_risk(X)', 'vulnerability_rule'))
    pr.add_rule(pr.Rule('compromised(X):[0.8,1.0] <- vulnerable(X):[0.5,1.0]', 'compromise_rule'))
    pr.add_rule(pr.Rule('patch_confidence(X):[0.0,0.2] <- compromised(X):[0.5,1.0]', 'unpatched_rule'))

Note that ``patch_confidence`` is a separate predicate from ``patched`` and is not in the IPL. This means the rule chain can fire all the way through on ``dev_server`` without being blocked by the IPL, and the inconsistency only occurs at the final step when ``unpatched_rule`` fires.

Facts
-----

We seed the graph with CVE severity scores from NVD, normalised to [0, 1]. The last two arguments of each ``pr.Fact`` are the start and end timesteps during which the fact holds:

.. code-block:: python

    pr.add_fact(pr.Fact('severity(cve_2021_3156):[0.78,0.78]', 'sudo_cve_severity', 0, 3))
    pr.add_fact(pr.Fact('severity(cve_2022_0185):[0.84,0.84]', 'kernel_cve_severity', 0, 3))
    pr.add_fact(pr.Fact('severity(cve_2022_26923):[0.75,0.75]', 'openssl_cve_severity', 0, 3))

Inconsistency Demo 1: Monotonic Reasoning Violation
-----------------------------------------------------

PyReason's reasoning is monotonic -- bounds can only get tighter over time. Two data sources disagree on the severity of ``cve_2021_3156`` with non-overlapping bounds, which PyReason cannot reconcile:

.. code-block:: python

    pr.add_fact(pr.Fact('severity(cve_2021_3156):[0.8,1.0]', 'severity_source_A', 0, 3))
    pr.add_fact(pr.Fact('severity(cve_2021_3156):[0.0,0.1]', 'severity_source_B', 0, 3))

``[0.8, 1.0]`` and ``[0.0, 0.1]`` do not overlap. PyReason flags the conflict and resolves the annotation to ``[0.0, 1.0]`` (complete uncertainty). In the row below, the conflict is first reported when ``severity_source_A`` tries to update the seeded bound ``[0.78, 0.78]``, which is also disjoint from ``[0.8, 1.0]``. The relevant row from the atom trace -- note ``Consistent = False``:

.. code-block:: text

    Time | Node          | Label    | Old Bound    | New Bound  | Occurred Due To   | Consistent | Inconsistency Message
    0    | cve_2021_3156 | severity | [0.78, 0.78] | [0.0, 1.0] | severity_source_A | False      | Inconsistency occurred. Conflicting bounds for severity(cve_2021_3156). Update from [0.780, 0.780] to [0.800, 1.000] is not allowed. Setting bounds to [0,1] and static=True for this timestep.

Inconsistency Demo 2: IPL Conflict via Facts
--------------------------------------------

An asset management database says ``web_server`` is patched. A vulnerability scanner says it is vulnerable. Both assert high confidence:

.. code-block:: python

    pr.add_fact(pr.Fact('patched(web_server):[0.9,1.0]', 'patch_db_fact', 0, 3))
    pr.add_fact(pr.Fact('vulnerable(web_server):[0.9,1.0]', 'vuln_scanner_fact', 0, 3))

Because ``vulnerable`` and ``patched`` are in the IPL, these two facts contradict each other. PyReason resolves both to ``[0.0, 1.0]``. The relevant rows from the atom trace:

.. code-block:: text

    Time | Node       | Label      | Old Bound       | New Bound  | Occurred Due To   | Consistent | Inconsistency Message
    0    | web_server | vulnerable | [0.0, 0.099...] | [0.0, 1.0] | vuln_scanner_fact | False      | Inconsistency occurred. Grounding vulnerable(web_server) conflicts with grounding patched(web_server). Setting bounds to [0,1] and static=True for this timestep.
    0    | web_server | patched    | [0.9, 1.0]      | [0.0, 1.0] | vuln_scanner_fact | False      | Inconsistency occurred. Grounding vulnerable(web_server) conflicts with grounding patched(web_server). Setting bounds to [0,1] and static=True for this timestep.

Inconsistency Demo 3: Rule-Triggered Inconsistency
---------------------------------------------------

This demo shows an inconsistency derived entirely by the rule chain, not by directly conflicting facts. The asset management database records that ``dev_server`` has high patch confidence:

.. code-block:: python

    pr.add_fact(pr.Fact('patch_confidence(dev_server):[0.9,1.0]', 'dev_patch_db_fact', 0, 3))

The rule chain then fires across timesteps:

.. code-block:: text

    exposure_rule      -> at_risk(dev_server):[1.0,1.0]
    vulnerability_rule -> vulnerable(dev_server):[0.8,1.0]
    compromise_rule    -> compromised(dev_server):[0.8,1.0]
    unpatched_rule     -> patch_confidence(dev_server):[0.0,0.2]

At the final step, ``unpatched_rule`` infers ``patch_confidence:[0.0, 0.2]``. This conflicts with ``dev_patch_db_fact`` asserting ``patch_confidence:[0.9, 1.0]``. ``[0.0, 0.2]`` and ``[0.9, 1.0]`` do not overlap -- PyReason flags the conflict and resolves ``patch_confidence(dev_server)`` to ``[0.0, 1.0]``. The relevant row from the atom trace:

.. code-block:: text

    Time | Node       | Label            | Old Bound  | New Bound  | Occurred Due To | Consistent | Inconsistency Message
    0    | dev_server | patch_confidence | [0.9, 1.0] | [0.0, 1.0] | unpatched_rule  | False      | Inconsistency occurred. Conflicting bounds for patch_confidence(dev_server). Update from [0.900, 1.000] to [0.000, 0.200] is not allowed. Setting bounds to [0,1] and static=True for this timestep.

Running PyReason
----------------

.. code-block:: python

    interpretation = pr.reason(timesteps=3)

Expected Output
---------------

**Assets at risk:**

.. code-block:: text

    TIMESTEP 0:
           component     at_risk
    0     web_server  [1.0, 1.0]
    1  workstation_1  [1.0, 1.0]
    2     dev_server  [1.0, 1.0]

All three assets are marked ``at_risk`` because each runs software with a known CVE.

**CVE severity (Demo 1):**

.. code-block:: text

    TIMESTEP 0:
            component      severity
    0   cve_2022_0185  [0.84, 0.84]
    1  cve_2022_26923  [0.75, 0.75]
    2   cve_2021_3156    [0.0, 0.1]

The conflict on ``cve_2021_3156`` is detected and logged in the rule trace.

**Vulnerable / patched (Demo 2):**

.. code-block:: text

    TIMESTEP 0:
           component  vulnerable     patched
    0  workstation_1  [0.8, 1.0]  [0.0, 0.2]
    1     dev_server  [0.8, 1.0]  [0.0, 0.2]
    2     web_server  [0.0, 1.0]  [0.0, 1.0]

``web_server`` resolves to complete uncertainty on both predicates due to the IPL conflict.
``workstation_1`` and ``dev_server`` show normal IPL behaviour -- ``vulnerability_rule`` sets ``vulnerable:[0.8, 1.0]``, which automatically forces ``patched`` to ``[0.0, 0.2]``.

**Patch confidence / compromised (Demo 3):**

.. code-block:: text

    TIMESTEP 0:
           component patch_confidence compromised
    0     dev_server       [0.0, 1.0]  [0.8, 1.0]
    1  workstation_1       [0.0, 0.2]  [0.8, 1.0]

``dev_server`` shows ``patch_confidence:[0.0, 1.0]`` -- complete uncertainty -- because ``unpatched_rule`` inferred ``patch_confidence:[0.0, 0.2]``, which conflicted with ``dev_patch_db_fact`` asserting ``patch_confidence:[0.9, 1.0]``. ``workstation_1`` shows ``patch_confidence:[0.0, 0.2]`` with no conflict since it had no high-confidence patch fact asserted.

The full rule trace can be saved for inspection:

.. code-block:: python

    node_trace, edge_trace = pr.get_rule_trace(interpretation)
    pr.save_rule_trace(interpretation)
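
As a recap, the bound arithmetic behind the three demos can be sketched in plain Python. The helpers below (``normalise_cvss``, ``negate``, ``overlaps``, ``resolve``) are hypothetical names introduced for illustration and are not part of the PyReason API. They only mirror what the traces above show: CVSS scores divided by 10, the negated bound given to an IPL partner, and disjoint bounds being reset to ``[0.0, 1.0]``. Intersecting overlapping bounds is our assumption, based on the statement that bounds can only get tighter.

```python
from typing import Tuple

Bound = Tuple[float, float]


def normalise_cvss(score: float) -> Bound:
    """Map a CVSS score on the 0-10 scale to a point bound in [0, 1]."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    value = round(score / 10.0, 2)
    return (value, value)


def negate(bound: Bound) -> Bound:
    """Negated bound for an IPL partner: [l, u] becomes [1 - u, 1 - l]."""
    lower, upper = bound
    # Rounding hides float noise such as 0.19999999999999996.
    return (round(1.0 - upper, 2), round(1.0 - lower, 2))


def overlaps(a: Bound, b: Bound) -> bool:
    """True when the two closed intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def resolve(current: Bound, update: Bound) -> Tuple[Bound, bool]:
    """Return (new_bound, consistent) for an attempted update.

    Assumption: overlapping bounds are tightened to their intersection;
    disjoint bounds are reset to complete uncertainty, as in the traces.
    """
    if not overlaps(current, update):
        return (0.0, 1.0), False
    return (max(current[0], update[0]), min(current[1], update[1])), True


if __name__ == "__main__":
    # Demo 1: seeded severity versus source A
    print(resolve(normalise_cvss(7.8), (0.8, 1.0)))
    # Demo 2: patched:[0.9, 1.0] implies a vulnerable bound via negation
    print(resolve(negate((0.9, 1.0)), (0.9, 1.0)))
    # Demo 3: patch database fact versus unpatched_rule
    print(resolve((0.9, 1.0), (0.0, 0.2)))
```

Under these assumptions, ``negate((0.8, 1.0))`` gives ``(0.0, 0.2)``, which matches the ``patched`` bound shown for ``workstation_1`` and ``dev_server``, and all three demo calls report an inconsistent update that resolves to ``(0.0, 1.0)``.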