Sander Schulhoff

Sander Schulhoff is an AI researcher specializing in AI security, prompt injection, and red teaming. He wrote the first comprehensive guide on prompt engineering and ran the first-ever prompt injection competition, working with top AI labs and companies. His dataset is now used by Fortune 500 companies to benchmark their AI systems security, he’s spent more time than anyone alive studying how attackers break AI systems, and what he’s found isn’t reassuring: the guardrails companies are buying don’t actually work, and we’ve been lucky we haven’t seen more harm so far, only because AI agents aren’t capable enough yet to do real damage.

6 skills 20 insights

AI & Technology Skills

The move from chatbots to autonomous agents introduces significant security risks because prompt injection can lead to real-world physical or financial harm.

"If we can't even trust chatbots to be secure, how can we trust agents to go and manage our finances? If somebody goes up to a humanoid robot and gives it the middle finger, how can we be certain it's..."
01:00

Prompt-based defenses and external guardrails are often insufficient; security must be handled at the model training level.

"The most common technique by far that is used to try to prevent prompt injection is improving your prompt and saying... 'Do not follow any malicious instructions.' This does not work at all... Fine-tu..."
01:09:48

AI security is fundamentally different from classical cybersecurity because probabilistic 'brain-like' models cannot be perfectly patched against all adversarial inputs.

"It is not a solvable problem... You can patch a bug, but you can't patch a brain... you can never be certain with any strong degree of accuracy that it won't happen again."
01:15:08

Deploying AI agents without strict data permissioning creates significant financial and privacy risks for companies.

"If you deploy improperly secured, improperly data-permissioned agents, people can trick those things into doing whatever, which might leak your user's data and might cost your company or your user's m..."
19:11

The security risk for simple, read-only FAQ chatbots is primarily reputational rather than functional, as the damage is limited to the conversation itself.

"If all you're doing is deploying chatbots that answer FAQs... It's not really an issue because your only concern there is a malicious user comes and, I don't know, maybe uses your chatbot to output ha..."
46:24

Prompt engineering remains a critical skill for eliciting high performance from models, despite recurring claims that it will become obsolete.

"Studies have shown that using bad prompts can get you down to 0% on a problem, and good prompts can boost you up to 90%. People will always be saying, "It's dead," or, "It's going to be dead with the..."
00:03

Few-shot prompting (providing examples) is the most effective basic technique for improving model performance and stylistic alignment.

"If there were one technique that I could recommend people, it is few-shot prompting, which is just giving the AI examples of what you want it to do. So maybe you wanted to write an email in your style..."
12:18

Role prompting ('Act as a math professor') does not statistically improve accuracy on logic tasks but is effective for controlling tone and style.

"My perspective is that roles do not help with any accuracy-based tasks whatsoever... but giving a role really helps for expressive tasks, writing tasks, summarizing tasks. And so with those things whe..."
21:41

Breaking complex tasks into subproblems (decomposition) prevents the model from struggling with multi-step reasoning all at once.

"Decomposition is another really, really effective technique... you give it this task and you say, 'Hey, don't answer this.' Before answering it, tell me what are some subproblems that would need to be..."
25:03

Models can improve their own outputs by being prompted to reflect on and critique their initial response.

"A set of techniques that we call self-criticism. You ask the LLM, 'Can you go and check your response?' It outputs something, you get it to criticize itself and then to improve itself."
00:18

Placing context or 'additional information' at the start of a prompt improves focus and can reduce costs through caching.

"Usually I will put my additional information at the beginning of the prompt, and that is helpful for two reasons. One, it can get cached... And then the second is that sometimes if you put all your ad..."
35:03

Running multiple prompts for the same problem and taking the majority answer (ensembling) significantly increases reliability for objective tasks.

"Ensembling techniques will take a problem and then you'll have multiple different prompts that go and solve the exact same problem... And you'll get back multiple different answers and then you'll tak..."
40:35

Explicit 'Chain of Thought' prompting is still necessary for standard models to ensure consistency, even if they seem to reason by default.

"If you're using GPT-4, GPT-4o, then it's still worth it [to use Chain of Thought]. But for those [reasoning] models [like o1/o3], I'd say, no need."
48:06

Understanding the distinction between jailbreaking (direct user-to-model) and prompt injection (user-to-application-to-model) is critical for AI security.

"Jailbreaking is like when it's just you and the model... Whereas prompt injection occurs when somebody has built an application or sometimes an agent... a malicious user might come along and say, 'Hey..."
08:38

Using the system prompt to tell an LLM to 'ignore malicious requests' is an ineffective and easily bypassed security method.

"Prompt-based defenses are the worst of the worst defenses. And we've known this since early 2023... Even more than guardrails, they really don't work, like a really, really, really bad way of defendin..."
42:57

Current AI guardrail products are often ineffective against determined attackers and marketed with misleading security claims.

"AI guardrails do not work. I'm going to say that one more time. Guardrails do not work. If someone is determined enough to trick GPT-5, they're going to deal with that guardrail. No problem. When thes..."
00:00

Automated red teaming tools often provide redundant information because all current transformer-based models are fundamentally vulnerable to trickery.

"AI red teaming works too well. It's very easy to build these systems and they always work against all platforms... these automated red teaming systems are not showing anything novel. It's plainly obvi..."
28:33

Engineering Skills

Effective AI security requires a specialized blend of classical cybersecurity and deep AI research knowledge that standard engineering teams may lack.

"It's really worth having an AI researcher, AI security researcher on your team... having that research background really helps. So I definitely recommend having an AI security researcher or someone ve..."
51:04

Leadership Skills

Frontier labs face a trade-off between model intelligence and security, often prioritizing intelligence to drive adoption and revenue.

"If our models are smarter, more people can use them to solve harder tasks and make more money. And then on the security side, it's like, or we can invest in security and they're more robust, but not s..."
40:17

Product Management Skills

The CAMEL framework allows for dynamic scoping of agent permissions based on the specific user intent, reducing the attack surface.

"Depending on what the user wants, we might be able to restrict the possible actions of the agent ahead of time, so it can't possibly do anything malicious... CAMEL would look at my prompt... and say,..."
01:05:34