Cognitive Defense Agent
NLP prompt injection and jailbreak detection. Guards against adversarial text designed to manipulate downstream AI systems.
What It Does
The Cognitive Defense Agent uses a fine-tuned NLP model to detect prompt injection patterns, jailbreak attempts, role-playing attacks, and other adversarial text constructs embedded in data payloads. It is critical for pipelines that feed into AI/LLM systems.
Capabilities
- Prompt injection detection
- Jailbreak attempt identification
- Role-playing attack detection
- Adversarial text classification
- Multi-language support
- Confidence scoring
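As a rough illustration of how confidence scoring might be consumed, the sketch below maps a scan result to an action using the flag_confidence (0.75) and reject_confidence (0.95) values from the Configuration section. The type names, the decide function, the inclusive comparisons, and the three-way allow/flag/reject reading are assumptions made for illustration. They are not part of the agent's documented API.

```typescript
// Hypothetical result shape: only `adversarial_detected` and `confidence`
// appear in the documented example; nothing else is assumed here.
interface ScanResult {
  adversarial_detected: boolean;
  confidence: number; // assumed: confidence that the payload is adversarial, in [0, 1]
}

// Mirrors the `threshold` keys from the Configuration section.
interface Thresholds {
  flag_confidence: number;
  reject_confidence: number;
}

type Action = 'allow' | 'flag' | 'reject';

// Values copied from the Configuration section.
const defaultThresholds: Thresholds = {
  flag_confidence: 0.75,
  reject_confidence: 0.95,
};

// Assumed semantics: a score at or above a threshold triggers that action,
// and the stricter (reject) threshold is checked first.
function decide(result: ScanResult, t: Thresholds = defaultThresholds): Action {
  if (result.confidence >= t.reject_confidence) return 'reject';
  if (result.confidence >= t.flag_confidence) return 'flag';
  return 'allow';
}

console.log(decide({ adversarial_detected: false, confidence: 0.02 })); // allow
```

Under this reading, a payload scoring between the two thresholds would be passed along with a flag for review rather than dropped, while anything at or above the reject threshold would be blocked before it reaches the downstream model.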
Example
const result = await mcp.call('cognitive_defense_scan', {
  payload: normalizedPayload,
  stream_uuid: streamUUID,
  scan_depth: 'standard',
});
console.log(result.adversarial_detected); // false
console.log(result.confidence); // 0.02
Configuration
agent: cognitive-defense-agent
version: "1.0"
model: redqueen-cognitive-defense-v2
scan_fields: all_text
detection:
  prompt_injection: true
  jailbreak: true
threshold:
  flag_confidence: 0.75
  reject_confidence: 0.95
Related Agents
Agent Smith Prime
Security sub-swarm orchestrator. Coordinates Code Injection Specialist, Cognitive Defense Agent, and Vaccine Compiler for comprehensive threat analysis.
Code Injection Specialist
OWASP/OSINT/YARA scanning specialist. Detects SQL injection, XSS, command injection, and known malware signatures in data payloads.
Vaccine Compiler
Payload sanitization and format normalization. Removes threats, neutralizes malicious content, and produces a clean payload for downstream processing.