# 🎯 APT / Nation-State Intrusion — Starter Kit

> Complete starter bundle for APT / nation-state intrusion incident response.

**Generated:** 2026-04-18

---

## Part 1: IR Cheatsheet

### Triage
- **APT Attribution** (P1, ~120m)
  1. Enumerate confirmed techniques so far against MITRE ATT&CK (Initial Access, Execution, Persistence, Privilege Escalation, Credential Access, Discovery, Lateral Movement, Collection, Command and Control, Exfiltration).
  2. Pull threat-intel reports for groups matching the observed TTP set (MITRE ATT&CK Groups, vendor tracked groups, Mandiant/CrowdStrike/Microsoft profiles). Track 3-5 candidate groups, not one "most likely".
  3. For each candidate group, build a hunt list: additional tools they typically bring, additional persistence they typically install, C2 infrastructure patterns.
- **Container Escape** (P1, ~90m)
  1. Inspect the compromised pod spec for escape-enabling configuration: `securityContext.privileged`, dangerous capabilities, hostPath mounts to /, host namespaces, kernel module loading capability.
  2. Check for host-mounted Docker socket (/var/run/docker.sock) -- single most common container escape vector; mount implies full node compromise capability.
  3. Review eBPF / Falco alerts in the compromise window for escape-adjacent events: write to /proc, unexpected setuid, kernel module load, cgroup breakout patterns.
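The Container Escape checks above can be sketched as a small scanner over a pod object parsed from `kubectl get pod <pod> -o json`. This is a minimal sketch, not a complete policy engine: the function name `find_escape_risks` and the `DANGEROUS_CAPS` set are illustrative assumptions, and the capability list should be adjusted to local policy.

```python
# Capabilities treated as dangerous here; this set is an illustrative assumption.
DANGEROUS_CAPS = {"SYS_ADMIN", "SYS_MODULE", "SYS_PTRACE", "NET_ADMIN", "DAC_READ_SEARCH"}


def find_escape_risks(pod: dict) -> list:
    """Return human-readable findings for one pod object (hypothetical helper)."""
    spec = pod.get("spec", {})
    findings = []

    # Host namespaces are set at pod level.
    for field in ("hostPID", "hostNetwork", "hostIPC"):
        if spec.get(field):
            findings.append(f"host namespace: {field}")

    # hostPath volumes, with the Docker socket and host root called out.
    for vol in spec.get("volumes", []):
        path = (vol.get("hostPath") or {}).get("path")
        if path is None:
            continue
        if path == "/var/run/docker.sock":
            findings.append("hostPath: docker socket")
        elif path == "/":
            findings.append("hostPath: host root")
        else:
            findings.append(f"hostPath: {path}")

    # Container-level securityContext, init containers included.
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for c in containers:
        sc = c.get("securityContext") or {}
        if sc.get("privileged"):
            findings.append(f"privileged container: {c.get('name')}")
        added = set((sc.get("capabilities") or {}).get("add") or [])
        for cap in sorted(added & DANGEROUS_CAPS):
            findings.append(f"capability {cap}: {c.get('name')}")
    return findings
```

An empty result only means none of these specific settings were found; it does not rule out a kernel-level escape.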

### Containment
- **K8s Workload Isolation** (P1, ~120m)
  1. Preserve evidence first: capture pod spec (`kubectl get pod <pod> -o yaml`), image digest, and last N minutes of container logs; snapshot the node if possible.
  2. Apply a deny-all NetworkPolicy to the affected namespace or specific pod labels to block egress and internal lateral traffic.
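  3. Revoke the pod's service account bindings (e.g., `kubectl delete rolebinding <binding> -n <ns>`, or `kubectl patch rolebinding <binding> -n <ns> --type=json -p '<patch>'` to remove only the affected subject) and rotate the bound service account token.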
- **Serverless Containment** (P1, ~90m)
  1. Preserve the current deployed package: `aws lambda get-function --function-name <name>` and download via the returned time-limited URL; save environment variables separately.
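  2. Disable all event-source mappings (`aws lambda list-event-source-mappings`, then `aws lambda update-event-source-mapping --uuid <uuid> --no-enabled` for each) to stop poll-based invocations; `delete-event-source-mapping` also stops them but discards the mapping configuration. To throttle every other invocation path as well, set reserved concurrency to zero (`aws lambda put-function-concurrency --function-name <name> --reserved-concurrent-executions 0`).
  3. Replace the function's execution role with a minimum-privilege role that only allows log writes during investigation; a Lambda function must always have an execution role, so swap the role rather than trying to detach it.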
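For the deny-all NetworkPolicy in the K8s Workload Isolation steps above, a manifest can be generated and piped to `kubectl apply -f -`. The sketch below assumes a policy name of `ir-deny-all` (hypothetical) and relies on the cluster's CNI plugin actually enforcing NetworkPolicy; on a CNI that does not, the object is accepted but has no effect.

```python
import json
from typing import Optional


def deny_all_policy(namespace: str, pod_labels: Optional[dict] = None) -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress.

    With no labels the empty podSelector matches every pod in the namespace;
    with labels it isolates only the matching pods.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # The policy name is an illustrative assumption.
        "metadata": {"name": "ir-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels} if pod_labels else {},
            # Listing both types with no ingress/egress rules denies everything.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(deny_all_policy("<ns>", {"app": "<label>"}), indent=2))
```

Blocking all egress also blocks DNS, which is usually the intent during containment but will break any tooling inside the pod that resolves names.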

### Preservation
- **etcd Snapshot** (P2, ~60m)
  1. Identify cluster flavor: self-managed clusters allow direct etcd snapshot; managed clusters (EKS, GKE, AKS) require provider-specific backup paths.
  2. For self-managed: `ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-snapshot.db` with cluster certificates.
  3. Verify the snapshot with `etcdctl snapshot status /tmp/etcd-snapshot.db --write-out=table`.
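The "with cluster certificates" part of the etcd Snapshot steps above usually means passing TLS flags to `etcdctl`. The sketch below only builds the argument lists; the endpoint and certificate paths are assumptions based on a typical kubeadm layout and must be checked against the actual control-plane node.

```python
import os
import subprocess


def snapshot_commands(out_path,
                      endpoint="https://127.0.0.1:2379",      # assumed local etcd endpoint
                      pki_dir="/etc/kubernetes/pki/etcd"):     # assumed kubeadm PKI path
    """Return (save_argv, status_argv) for an etcd v3 snapshot."""
    tls = [
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",
        f"--cert={pki_dir}/server.crt",
        f"--key={pki_dir}/server.key",
    ]
    save = ["etcdctl", *tls, "snapshot", "save", out_path]
    status = ["etcdctl", "snapshot", "status", out_path, "--write-out=table"]
    return save, status


def run_snapshot(out_path):
    """Run both commands with ETCDCTL_API=3; raises if either one fails."""
    env = {**os.environ, "ETCDCTL_API": "3"}
    for argv in snapshot_commands(out_path):
        subprocess.run(argv, env=env, check=True)
```

Writing the snapshot to evidence storage rather than `/tmp` is worth considering, since the file contains every Secret in the cluster.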

### Collection
- **K8s Audit Collection** (P1, ~120m)
  1. For EKS: export `/aws/eks/<cluster>/cluster` CloudWatch Logs; filter for api, audit, authenticator, controllerManager, scheduler log types.
  2. For GKE: `gcloud logging read 'resource.type="k8s_cluster"'` with `--freshness` set to cover the incident window (the default looks back only one day); separate Data Access logs may need explicit enablement.
  3. For AKS: Query Log Analytics `AzureDiagnostics | where Category in ("kube-audit","kube-apiserver","kube-controller-manager")`.
- **Serverless Collection** (P1, ~90m)
  1. Export execution logs: Lambda via CloudWatch Logs, GCF via Cloud Logging, Azure Functions via Application Insights / Log Analytics.
  2. Export management-plane events for the function ARN / resource: CloudTrail for Lambda (`UpdateFunctionCode`, `UpdateFunctionConfiguration`), Cloud Audit Logs for GCF, Activity Log for Azure Functions.
  3. Preserve the currently-deployed package: `aws lambda get-function`, `gcloud functions describe`, equivalent for Azure; download before any rollback.
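Once CloudTrail records are exported, the Lambda management-plane events named above can be pulled out with a short filter. This is a sketch under assumptions: `events` is a list of already-parsed CloudTrail records, and matching is done by prefix because Lambda event names in CloudTrail can carry an API-version suffix after the operation name.

```python
WATCHED_PREFIXES = ("UpdateFunctionCode", "UpdateFunctionConfiguration")


def lambda_changes(events, function_name):
    """Return (time, event name, caller ARN, source IP) for changes to one function."""
    hits = []
    for ev in events:
        if ev.get("eventSource") != "lambda.amazonaws.com":
            continue
        name = ev.get("eventName", "")
        if not name.startswith(WATCHED_PREFIXES):
            continue
        target = (ev.get("requestParameters") or {}).get("functionName", "")
        # functionName may be recorded as a bare name or as a full ARN.
        if target == function_name or target.endswith(":function:" + function_name):
            hits.append((
                ev.get("eventTime"),
                name,
                (ev.get("userIdentity") or {}).get("arn"),
                ev.get("sourceIPAddress"),
            ))
    # ISO 8601 timestamps sort correctly as strings.
    return sorted(hits, key=lambda h: h[0] or "")
```

Each hit is a candidate deployment of attacker code, so the timestamps bound which package versions need to be recovered.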

### Analysis
- **Backdoor Analysis** (P2, ~240m)
  1. Extract the malicious artifact from the quarantined copy preserved during containment; verify hash against vendor advisory and calculate additional hashes (ssdeep, TLSH) for fuzzy matching.
  2. Static analysis first: `file`, `strings`, `binwalk`, entropy analysis, disassembly (Ghidra/IDA). For JS/Python supply-chain attacks, inspect package.json scripts, post-install hooks, and any `eval`/`exec` calls with obfuscated input.
  3. Dynamic analysis in an isolated lab: run in a disconnected VM with fake network services (INetSim, FakeNet-NG); capture process creation, file drops, registry changes, network I/O, and DNS queries.
- **APT Dwell Hunt** (P2, ~300m)
  1. Extend the investigation window to the full retention of each log source; document gaps where retention is shorter than suspected dwell.
  2. Hunt for living-off-the-land patterns: anomalous parent-child process chains (w3wp.exe -> cmd.exe; winlogon.exe -> unexpected child; LSASS access from unusual processes), rare command-line patterns (powershell -enc with long-base64 payloads, rundll32 with unusual DLLs).
  3. Hunt for persistence the APT commonly installs: WMI event subscriptions, scheduled-task XML with hidden flags, service DLLs in unusual paths, Registry Run keys pointing to user-writable paths, masquerading services, COM hijacking, AppInit_DLLs, Image File Execution Options debugger hijacks.
- **Escape Analysis** (P2, ~240m)
  1. Identify the escape primitive from preserved pod spec and runtime evidence: privileged flag, dangerous capabilities, host mounts, host namespaces, kernel CVE exploitation.
  2. Query the cluster for other pods with the same primitive; these are immediate candidates for the same compromise.
  3. Review admission-controller policy to understand why the primitive was allowed; gaps in policy are part of the finding.
- **Cloud IAM Escalation** (P2, ~240m)
  1. Build a "session graph": from the initial access event, walk STS AssumeRole chains / OAuth token exchanges / service-principal sign-ins until the highest-privilege identity is reached.
  2. Identify over-permissioned roles: role permissions broader than the function needs (e.g., Lambda with `*:*`, service principal with Global Admin).
  3. Hunt for privilege-escalation primitives: iam:PassRole to a higher-privilege role, sts:AssumeRole into cross-account roles, Entra ID app-consent phishing grant paths, Google Workspace super-admin role assignment.
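The "session graph" from the Cloud IAM Escalation steps can be sketched for the AWS case as a directed graph of AssumeRole calls. This is a simplified illustration: it assumes parsed CloudTrail records, maps an assumed-role session ARN back to its role by name (which ignores IAM role paths), and does not enforce time ordering between hops.

```python
import re
from collections import defaultdict

_ASSUMED = re.compile(r"^arn:aws:sts::(\d+):assumed-role/([^/]+)/.+$")


def principal_key(arn):
    """Normalise an assumed-role session ARN to the ARN of the underlying role."""
    m = _ASSUMED.match(arn)
    if m:
        return f"arn:aws:iam::{m.group(1)}:role/{m.group(2)}"
    return arn


def build_graph(events):
    """Map each calling principal to the set of roles it assumed."""
    graph = defaultdict(set)
    for ev in events:
        if ev.get("eventName") != "AssumeRole":
            continue
        caller = principal_key(ev["userIdentity"]["arn"])
        graph[caller].add(ev["requestParameters"]["roleArn"])
    return graph


def reachable(graph, start):
    """Depth-first walk listing every identity reachable from the starting principal."""
    seen, stack = [], [principal_key(start)]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        stack.extend(sorted(graph.get(node, ()), reverse=True))
    return seen
```

The output is the candidate set of identities to review for permissions; which one is "highest privilege" still has to be judged from the attached policies.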

### Eradication
- **Assume-Breach Rebuild** (P1, ~480m)
  1. Reset `krbtgt` password twice with a delay greater than max ticket lifetime between resets (Microsoft-recommended 24-48h), invalidating all existing Kerberos tickets including golden tickets.
  2. Rotate all service-account credentials, managed service account passwords, certificate-based service principals, and Entra ID application secrets/certificates; prioritize accounts with elevated or cross-tenant access.
  3. Rotate domain-admin, enterprise-admin, and schema-admin passwords, and require re-enrollment in smart-card or phishing-resistant MFA.
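The timing rule in the `krbtgt` step is simple arithmetic: the second reset should not happen until every ticket issued before the first reset has expired. A minimal sketch follows, assuming the default domain policy of a 10-hour maximum user ticket lifetime; both the lifetime and the safety margin are parameters to replace with the values from the actual Kerberos policy or the longer delay the runbook prescribes.

```python
from datetime import datetime, timedelta


def earliest_second_reset(first_reset: datetime,
                          max_ticket_lifetime_hours: float = 10,  # assumed domain default
                          safety_margin_hours: float = 2) -> datetime:
    """Earliest time at which the second krbtgt reset clears the ticket-lifetime window."""
    return first_reset + timedelta(hours=max_ticket_lifetime_hours + safety_margin_hours)


def second_reset_is_safe(first_reset: datetime, now: datetime, **kwargs) -> bool:
    """True once the delay after the first reset has elapsed."""
    return now >= earliest_second_reset(first_reset, **kwargs)
```

For example, a first reset at 09:00 with the defaults gives an earliest second reset of 21:00 the same day; passing `max_ticket_lifetime_hours=24` models the more conservative delay.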

### Post-Incident Review
- **APT Intel Share** (P3, ~240m)
  1. Assemble the IoC package: hashes with algorithms, IPs with first-seen timestamps, domains with WHOIS/DNS snapshots, email indicators, TLS certificate thumbprints, file paths, registry keys, mutexes.
  2. Publish hunt queries in a shareable format (Sigma, KQL, SPL) and tag them with the originating TTP; submit Sigma rules to the Sigma public repo when non-sensitive.
  3. Prepare a TTP narrative mapped to MITRE ATT&CK tactics, techniques, and sub-techniques; reference the ATT&CK Group ID if attribution is high-confidence.
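"Hashes with algorithms" in the IoC package can be enforced mechanically when assembling the list. The sketch below infers the algorithm from the hex digest length, which is a heuristic rather than proof (other algorithms share these lengths); the record layout is a hypothetical example, not a sharing standard.

```python
import re

_HEX = re.compile(r"^[0-9a-fA-F]+$")
_ALGO_BY_LEN = {32: "MD5", 40: "SHA-1", 64: "SHA-256"}


def label_hash(value: str) -> dict:
    """Return an IoC record with the algorithm inferred from digest length."""
    v = value.strip()
    if not _HEX.match(v):
        raise ValueError(f"not a hex digest: {value!r}")
    algo = _ALGO_BY_LEN.get(len(v))
    if algo is None:
        raise ValueError(f"unrecognised digest length {len(v)}")
    # Lower-casing keeps the package consistent for de-duplication.
    return {"type": "file-hash", "algorithm": algo, "value": v.lower()}
```

Rejecting unrecognised values at assembly time keeps truncated or mistyped hashes out of what gets shared with peers.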

## Part 2: Key Artifacts

### Security Event Log (4624/4625/4688)
**Location:** `C:\Windows\System32\winevt\Logs\Security.evtx`
**Value:** Correlating Event ID 4624 logon types (e.g., Type 3 network, Type 10 RDP) with source IPs reveals lateral movement. Failed logon bursts (4625) expose brute-force and password-spray campaigns. Process creation events (4688) with command-line auditing enabled provide a full execution timeline even when EDR is absent.
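The logon-type correlation described above can be sketched as a fan-out count: one source IP and account authenticating to several hosts over the network or RDP. The input shape is an assumption (dicts with `event_id`, `logon_type`, `source_ip`, `user`, `host` keys, as a hypothetical EVTX parser might emit), and the threshold is arbitrary.

```python
from collections import defaultdict

# Standard 4624 logon type codes.
LOGON_TYPES = {
    2: "Interactive", 3: "Network", 4: "Batch", 5: "Service", 7: "Unlock",
    8: "NetworkCleartext", 9: "NewCredentials", 10: "RemoteInteractive",
    11: "CachedInteractive",
}
REMOTE_TYPES = {3, 10}


def fan_out(events, min_hosts=3):
    """Map (source_ip, user) to the hosts it logged on to remotely, if >= min_hosts."""
    targets = defaultdict(set)
    for ev in events:
        if (ev["event_id"] == 4624
                and ev["logon_type"] in REMOTE_TYPES
                and ev.get("source_ip")):
            targets[(ev["source_ip"], ev["user"])].add(ev["host"])
    return {k: sorted(v) for k, v in targets.items() if len(v) >= min_hosts}
```

Administrative jump hosts and scanners will also show high fan-out, so results need baselining before they are treated as lateral movement.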

### SYSTEM Registry Hive
**Location:** `C:\Windows\System32\config\SYSTEM`
**Value:** Services registered under ControlSet\Services expose malicious services used for persistence and privilege escalation. The ComputerName and TimeZoneInformation keys anchor timeline analysis. MountedDevices reveals USB storage that was connected, supporting data exfiltration investigations.

### AmCache.hve
**Location:** `C:\Windows\appcompat\Programs\Amcache.hve`
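**Value:** AmCache records SHA-1 hashes for executables and drivers seen on the system, enabling immediate VirusTotal lookups even after the attacker deletes the original file. An entry shows that a file was present and catalogued, which is strong but not conclusive evidence of execution, so corroborate with Prefetch or 4688 events. For large files the hash covers only the first 31,457,280 bytes and may not match a full-file SHA-1. Associated timestamps help bound when a tool was first introduced to the system. Entries persist across reboots and are harder for an attacker to tamper with than Prefetch.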

### Scheduled Tasks
**Location:** `C:\Windows\System32\Tasks\`
**Value:** Scheduled tasks are a top persistence mechanism. Each task XML contains the exact command line and arguments the task executes, the user context it runs under, and when it was created. Comparing task creation timestamps against the intrusion timeline isolates attacker-created tasks. Tasks running as SYSTEM with encoded PowerShell or unusual binary paths are high-confidence indicators.
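The fields called out above can be pulled from a task definition with the standard library. This is a minimal sketch assuming the Task Scheduler XML namespace shown in the code and a task with a single `Exec` action; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://schemas.microsoft.com/windows/2004/02/mit/task"}


def summarize_task(xml_data):
    """Extract command, arguments, user context, and creation date from task XML.

    Task files on disk are commonly UTF-16 encoded; passing the raw bytes
    lets the parser use the encoding declared in the file.
    """
    root = ET.fromstring(xml_data)

    def text(path):
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "created": text("t:RegistrationInfo/t:Date"),
        "author": text("t:RegistrationInfo/t:Author"),
        "user_id": text("t:Principals/t:Principal/t:UserId"),
        "command": text("t:Actions/t:Exec/t:Command"),
        "arguments": text("t:Actions/t:Exec/t:Arguments"),
        "hidden": text("t:Settings/t:Hidden"),
    }
```

A `user_id` of `S-1-5-18` (SYSTEM) combined with an encoded PowerShell argument string matches the high-confidence pattern described above.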

### Full Memory Dump
**Location:** `Acquired via live capture (RAM)`
**Value:** Memory analysis is often the only reliable method to detect fileless malware, process injection, and reflective DLL loading that leave no disk artifacts. Active network connections with owning process context, decrypted credential material from LSASS, and in-memory-only scripts are all recoverable. Volatility plugins can reconstruct the full process tree, open handles, and loaded modules.

### WMI Event Subscriptions (OBJECTS.DATA)
**Location:** `C:\Windows\System32\wbem\Repository\OBJECTS.DATA`
**Value:** WMI event subscriptions are a stealthy persistence mechanism favored by advanced adversaries because they do not appear in traditional autoruns locations. Parsing OBJECTS.DATA reveals the trigger condition (e.g., system startup, user logon, interval timer) and the exact command or script payload. This persistence survives reboots and does not require files on disk if using ActiveScriptEventConsumer.

### SECURITY Hive (LSA Secrets & Cached Logons)
**Location:** `C:\Windows\System32\config\SECURITY`
**Value:** The SECURITY hive helps determine whether domain credentials, service passwords, or other LSA-protected secrets were present and potentially exposed on the host. Offline parsing in conjunction with the SYSTEM hive can recover cached logon metadata and secret blobs that reveal service-account use, scheduled task credentials, and prior administrative authentication patterns. This is especially valuable when scoping credential access or confirming whether an endpoint held reusable authentication material after a compromise.

### Unified Audit Log (UAL)
**Location:** `Microsoft Purview > Audit > Search (or Search-UnifiedAuditLog cmdlet)`
**Value:** The UAL is the single most important artifact for M365 investigations. It captures mailbox access, file downloads, sharing changes, admin role assignments, and OAuth app consents in one searchable location. Correlating ClientIP and UserAgent across operations reveals session hijacking -- when the same session appears from two different geolocations within a short window, token theft is the likely explanation. Retention depends on licensing: 180 days by default for Audit (Standard), which was 90 days before October 2023, and one year for Exchange, SharePoint, and Entra ID records under Audit (Premium) with E5.

### Azure AD (Entra ID) Sign-in Logs
**Location:** `Azure Portal > Entra ID > Monitoring > Sign-in logs (or Microsoft Graph API /auditLogs/signIns)`
**Value:** Sign-in logs are the primary source for detecting compromised identities. Filtering by ResultType reveals specific failure reasons (e.g., 50126 invalid password, 50074 MFA required, 53003 blocked by CA policy). Impossible-travel detection compares sequential sign-in locations. Non-interactive sign-in logs expose token replay attacks where stolen refresh tokens are used from attacker infrastructure without triggering MFA.
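Impossible-travel detection reduces to comparing the great-circle distance between two sign-ins with the time between them. The sketch below uses the haversine formula; the input shape (dicts with `time`, `lat`, `lon`) and the 900 km/h threshold, roughly airliner speed, are assumptions, and coordinates are taken as already extracted from the sign-in records.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(a, b):
    """Speed needed to travel between two sign-ins."""
    hours = abs((b["time"] - a["time"]).total_seconds()) / 3600
    dist = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    if hours == 0:
        return math.inf if dist > 0 else 0.0
    return dist / hours


def is_impossible(a, b, max_kmh=900.0):
    """True when the implied speed exceeds the threshold."""
    return implied_speed_kmh(a, b) > max_kmh
```

IP geolocation is coarse, and VPN or mobile-carrier egress points produce false positives, so a hit is a lead to corroborate with device and user-agent data rather than a verdict.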

### AWS CloudTrail Management Events
**Location:** `AWS CloudTrail > Event history (last 90 days) or trail delivery in S3 / CloudWatch Logs`
**Value:** CloudTrail is the primary source for reconstructing attacker activity across AWS accounts. It identifies the calling principal, source IP, user agent, request parameters, and affected resources for changes to IAM, EC2, EKS, ECR, S3, and logging configuration itself. It also reveals anti-forensics such as trail deletion, region disabling, or tampering with guardrail services.
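The anti-forensics events mentioned above can be flagged with a lookup table over parsed CloudTrail records. This is a starting sketch only: the table lists a handful of well-known logging and guardrail API calls and is not exhaustive.

```python
# Event names that commonly indicate logging or guardrail tampering (partial list).
TAMPER_EVENTS = {
    "cloudtrail.amazonaws.com": {"StopLogging", "DeleteTrail", "UpdateTrail", "PutEventSelectors"},
    "guardduty.amazonaws.com": {"DeleteDetector"},
    "config.amazonaws.com": {"StopConfigurationRecorder", "DeleteConfigurationRecorder"},
}


def tampering_events(events):
    """Return the records whose (eventSource, eventName) pair is in the table."""
    return [
        ev for ev in events
        if ev.get("eventName") in TAMPER_EVENTS.get(ev.get("eventSource"), ())
    ]
```

Legitimate administrators also call these APIs, so each hit should be checked against change records and the calling principal.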

### Kubernetes API Server Audit Log
**Location:** `Kubernetes API server audit log (--audit-log-path) or managed-cluster equivalent (AKS diagnostic settings, EKS control-plane logging, GKE Cloud Logging)`
**Value:** The K8s API audit log reconstructs the attacker's control-plane activity: what objects were created or modified, which service accounts were used, what images were deployed, which secrets were accessed. On managed clusters, enabling and forwarding control-plane logs is a prerequisite to meaningful investigation.
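For "which secrets were accessed", the JSON-lines audit log can be filtered on the `objectRef` and `verb` fields of each audit event. A minimal sketch follows, assuming one JSON event per line as written by the log backend; because events are recorded per stage, the same request may appear more than once.

```python
import json

READ_VERBS = {"get", "list", "watch"}


def secret_reads(lines):
    """Return who read which Secret, from an iterable of audit-log lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        ref = ev.get("objectRef") or {}
        if ref.get("resource") == "secrets" and ev.get("verb") in READ_VERBS:
            hits.append({
                "time": ev.get("requestReceivedTimestamp"),
                "user": (ev.get("user") or {}).get("username"),
                "verb": ev.get("verb"),
                "namespace": ref.get("namespace"),
                "name": ref.get("name"),  # absent for list/watch across a namespace
            })
    return hits
```

A `list` or `watch` on secrets exposes every Secret in scope, so those hits warrant rotation of the whole namespace rather than a single object.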

### Kubernetes etcd Snapshot
**Location:** `etcd data directory on control-plane nodes (/var/lib/etcd) or managed-equivalent snapshot (e.g., AKS etcd backup)`
**Value:** An etcd snapshot captures the exact cluster state at a moment in time, including secrets and configurations attackers may have modified. Used for control-plane forensics where runtime objects have been altered or deleted, and as a point-in-time anchor for reconstructing what the attacker saw and touched.

### Kubelet Node-Level Logs
**Location:** `Kubelet systemd journal on each node (journalctl -u kubelet) and kubelet log files (/var/log/kubelet.log)`
**Value:** Kubelet logs fill gaps between the control-plane audit log and container runtime events: they show the node's perspective on pod startup, image pulls, sidecar injection, and health-probe failures. Critical when an ephemeral container has been evicted and only node-level records remain.

### Container Runtime State and Events
**Location:** `containerd or Docker daemon log (journalctl -u containerd / journalctl -u docker), runtime state directory (/var/lib/containerd, /var/lib/docker)`
**Value:** When a compromised container has been evicted or replaced, the runtime state directory may still hold the container configuration (CRI-O/containerd JSON files), recent log tail, and layer references. Combined with the image registry, these reconstruct what actually ran and for how long.

### AWS Lambda Execution Logs
**Location:** `CloudWatch Logs log group /aws/lambda/<function-name>`
**Value:** Lambda execution logs show what each invocation did: inputs, outputs, error traces, timing anomalies, and any stdout-printed attacker activity. Combined with CloudTrail management events for the function, they reconstruct both the "who deployed" and "what happened during execution" dimensions.

## Part 3: Key Queries

### Backdoor Analysis
```
DeviceRegistryEvents | where RegistryKey has_any ("<malicious-reg-path-1>","<malicious-reg-path-2>") | project Timestamp, DeviceName, RegistryKey, RegistryValueName, RegistryValueData
```

```
DeviceFileEvents | where FolderPath has_any ("<drop-path-1>","<drop-path-2>") or FileName in~ ("<dropped-file-1>","<dropped-file-2>") | summarize by DeviceName, FolderPath, FileName
```

### APT Attribution
```
SigninLogs | where TimeGenerated > ago(90d) | where UserPrincipalName in~ (<suspected-targets>) | summarize FirstSeen=min(TimeGenerated), LastSeen=max(TimeGenerated), SignIns=count() by IPAddress, Location, UserAgent, ClientAppUsed | order by FirstSeen asc
```

```
DeviceProcessEvents | where Timestamp > ago(90d) | where InitiatingProcessFileName =~ "w3wp.exe" or InitiatingProcessFileName =~ "svchost.exe" | where FileName =~ "rundll32.exe" or FileName =~ "regsvr32.exe" | project Timestamp, DeviceName, ProcessCommandLine, InitiatingProcessCommandLine
```

### APT Dwell Hunt
```
DeviceProcessEvents | where Timestamp > ago(180d) | where ProcessCommandLine has "powershell" and ProcessCommandLine has_any ("-enc","-EncodedCommand") | where strlen(ProcessCommandLine) > 500 | summarize cnt=count() by DeviceName, InitiatingProcessFileName | order by cnt desc
```

```
DeviceRegistryEvents | where Timestamp > ago(180d) | where RegistryKey has_any ("\\Run","\\RunOnce","\\Winlogon","\\Image File Execution Options","\\Services\\") | where RegistryValueData has_any ("%AppData%","%Temp%","%Public%","\\Users\\Public\\") | project Timestamp, DeviceName, RegistryKey, RegistryValueName, RegistryValueData
```
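Hits from the encoded-PowerShell hunt above still need decoding before they can be triaged. `-EncodedCommand` takes Base64 of a UTF-16LE string, so a small helper can recover the script text. In this sketch the function name is hypothetical, and the flag matching accepts any prefix of `EncodedCommand` (such as `-e` or `-enc`), since PowerShell accepts abbreviated parameter names.

```python
import base64
import re

_FLAG = re.compile(r"[-/](e[a-z]*)\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)


def decode_encoded_command(cmdline):
    """Return the decoded script from a PowerShell command line, or None."""
    for m in _FLAG.finditer(cmdline):
        flag, blob = m.group(1).lower(), m.group(2)
        # Skip other parameters that merely start with "e", e.g. -ExecutionPolicy.
        if not "encodedcommand".startswith(flag):
            continue
        try:
            return base64.b64decode(blob, validate=True).decode("utf-16-le")
        except (ValueError, UnicodeDecodeError):
            continue  # not valid Base64 or not UTF-16LE text
    return None
```

Decoded output frequently contains a second layer (compression or another Base64 blob), which has to be unwrapped separately.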

### Assume-Breach Rebuild
```
$cutoff = (Get-Date).AddDays(-1); Get-ADUser -Filter {Enabled -eq $true -and PasswordLastSet -lt $cutoff} -Properties PasswordLastSet | Where-Object {$_.SamAccountName -match "svc|admin|backup"}
```

```
Get-ADObject -Filter {ObjectClass -eq "msDS-GroupManagedServiceAccount"} -Properties msDS-ManagedPasswordInterval,pwdLastSet
```

### APT Intel Share
```
Which IoCs in this incident have already been publicly reported, and which are novel?
```

```
Are there detection rules we built during this incident that would benefit peers in our sector?
```

### Container Escape
```
kubectl get pods -A -o json | jq '.items[] | select(any((.spec.containers + (.spec.initContainers // []))[]; .securityContext.privileged == true)) | {ns:.metadata.namespace,name:.metadata.name}'
```

```
kubectl get pods -A -o json | jq '.items[] | select(any(.spec.volumes[]?; .hostPath != null)) | {ns:.metadata.namespace,name:.metadata.name,host_paths:[.spec.volumes[] | .hostPath.path // empty]}'
```

### K8s Workload Isolation
```
kubectl get networkpolicies -A; kubectl describe networkpolicy -n <ns> <policy>
```

```
kubectl logs <pod> -n <ns> --previous --tail=-1 > pod_logs.txt
```

### Serverless Containment
```
aws lambda list-event-source-mappings --function-name <name> | jq -r '.EventSourceMappings[].UUID'
```

```
aws iam get-role-policy --role-name <exec-role> --policy-name <policy>
```

---
*Generated by DFIR Assist*