GhostPrompt · Tuğberk Akbulut

An eight-layer scanner that catches hidden instructions before a PDF ever reaches an LLM.

As more workflows feed documents straight into LLMs, prompt injection hidden inside a PDF (invisible text, malformed metadata, layered content a human reader would never see) has become a real attack surface. GhostPrompt scans a PDF across eight independent layers before the file reaches a model’s context window and returns a single verdict: SAFE, SUSPICIOUS, or DANGEROUS.

$ python3 scan.py suspicious.pdf
✓ Layer 1 — Dangerous PDF Features: CLEAN
✓ Layer 2 — Invisible Text: CLEAN
⚠ Layer 3 — Injection Patterns: ISSUES FOUND
✓ Layer 4 — Encoding/Obfuscation: CLEAN
⚠ Layer 5 — Stream Content: ISSUES FOUND
✓ Layer 6 — Zero-Width Characters: CLEAN
✓ Layer 7 — Annotation/ObjStm: CLEAN
✓ Layer 8 — XMP/Incremental Updates/Semantic: CLEAN

🔴 VERDICT: DANGEROUS

Threat coverage

Layer	Attack vector
1	JavaScript, launch actions, OpenAction, embedded files, form submissions
2	White text, tiny fonts, invisible render mode (`Tr=3`)
3	Prompt override phrases, role hijacking, jailbreak patterns
4	Base64 payloads, homoglyph obfuscation, acrostic encoding
5	Injection terms in compressed streams, structural anomalies
6	Zero-width and invisible Unicode characters (steganographic encoding)
7	Annotation field payloads (`/Contents`, `/T`, `/Subj`), payloads hidden in object streams (`/ObjStm`)
8	XMP metadata injection, incremental update overlays, semantic camouflage
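To make a few of these vectors concrete, here is a minimal sketch of what checks in the spirit of layers 2, 3, 5, and 6 might look like. It is not GhostPrompt's actual code: the function names, the phrase list, and the set of code points are all assumptions chosen for illustration, and a real scanner would need a proper PDF parser and far broader patterns. The only PDF facts relied on are that `3 Tr` sets the invisible text rendering mode and that Flate-compressed streams can be inflated with zlib.

```python
import re
import zlib

# Hypothetical phrase list; a real scanner would use a larger, tuned set.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions", re.I),
    re.compile(r"disregard\s+(?:the\s+)?system\s+prompt", re.I),
    re.compile(r"you\s+are\s+now\s+", re.I),
]

# A few common zero-width / invisible code points (not an exhaustive list).
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE",
}

# Naive stream locator; assumes the raw file bytes are searched directly.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.S)
# The operator "3 Tr" selects text rendering mode 3 (neither fill nor stroke).
INVISIBLE_RENDER_RE = re.compile(rb"\b3\s+Tr\b")


def find_injection_phrases(text):
    """Layer 3 style: return every matched override phrase in the text."""
    return [m.group(0) for p in INJECTION_PATTERNS for m in p.finditer(text)]


def find_zero_width(text):
    """Layer 6 style: return (index, name) for each invisible character."""
    return [(i, ZERO_WIDTH[ch]) for i, ch in enumerate(text) if ch in ZERO_WIDTH]


def scan_flate_streams(pdf_bytes):
    """Layer 2/5 style: inflate each stream and look for hidden-text signals."""
    findings = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed, or encoded with another filter
        if INVISIBLE_RENDER_RE.search(data):
            findings.append("invisible render mode (3 Tr)")
        findings.extend(find_injection_phrases(data.decode("latin-1")))
    return findings
```

Each function returns a list of findings rather than a boolean, which matches the idea of reporting risk signals: the caller decides how many signals, and of which kind, are enough to escalate a verdict.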

GhostPrompt reports risk signals rather than sanitizing a file, so a DANGEROUS or SUSPICIOUS verdict is meant to keep the document out of an AI pipeline, not to trigger a silent cleanup. It ships both as a CLI (`python3 scan.py document.pdf`, scriptable for shell automation) and as a self-contained Claude Skill, so a document can be screened as the first step of a Claude-based workflow before its content is pasted into context.
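As a rough illustration of using the CLI as a gate, the sketch below runs the scanner and reads the verdict from its printed output. Several things here are assumptions: that the `VERDICT:` line looks the same for SAFE and SUSPICIOUS as it does for DANGEROUS in the sample run, that the output goes to stdout, and that no exit code or machine-readable mode is available (the tool may well offer one). The helper names `parse_verdict`, `is_allowed`, and `scan` are hypothetical.

```python
import re
import subprocess
import sys

# Matches the verdict line as it appears in the sample run.
VERDICT_RE = re.compile(r"VERDICT:\s*(SAFE|SUSPICIOUS|DANGEROUS)")


def parse_verdict(output):
    """Return the verdict string found in scanner output, or None if absent."""
    match = VERDICT_RE.search(output)
    return match.group(1) if match else None


def is_allowed(verdict):
    """Fail closed: only an explicit SAFE verdict lets a document through."""
    return verdict == "SAFE"


def scan(pdf_path, scanner="scan.py"):
    """Run the CLI on one file and return its verdict (None if unparseable)."""
    result = subprocess.run(
        [sys.executable, scanner, pdf_path],
        capture_output=True,
        text=True,
    )
    return parse_verdict(result.stdout)
```

A pipeline step could then call `is_allowed(scan("document.pdf"))` and skip the file when it returns `False`. Treating a missing or unparseable verdict the same as DANGEROUS is a deliberate choice in this sketch, so that a crashed scan never lets a document through by accident.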