…into cyber_scenario Resolving merge conflict
This looks good! I'm wondering whether there are ways to incorporate XPIA attacks or converters (MaliciousQuestionGeneratorConverter; there might be more, that's just at first glance) to be a bit more creative than just updating prompts.
There definitely are! CyberStrategy is used very sparsely here, which I don't like, but I haven't found a way to reconcile the nature of cybersecurity harms (which are often sequential and iterative, and don't rely on conversions as much) with the tag-based system. It's definitely something I want to drive in a second PR.
rlundeen2 left a comment
Looks great! I recommend incorporating the changes first, but they are small.
Typo Co-authored-by: hannahwestra25 <hannahwestra@microsoft.com>
Description
Adds a cybersecurity harms scenario to PyRIT called CyberScenario, which tests a model's willingness to generate malware via single-turn or multi-turn (red teaming) attack methods. Changes listed below:
This PR is meant to be a starting point for additional cybersecurity harm scaffolding, as there are still many places where CyberScenario can be expanded.
Tests and Documentation
Unit tests focus on initialization, attack generation, execution, and scenario properties, similar to the tests for other scenarios.
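For readers unfamiliar with that test layout, here is a rough sketch of how the four areas could be covered. It deliberately avoids PyRIT's API: `StandInScenario`, its fields, and its methods are hypothetical placeholders invented for illustration, not the CyberScenario class or the tests added in this PR.

```python
import unittest
from dataclasses import dataclass, field


@dataclass
class StandInScenario:
    """Hypothetical stand-in for a scenario; NOT PyRIT's CyberScenario."""

    name: str = "Cyber"
    strategies: tuple = ("single_turn", "multi_turn")
    objectives: list = field(default_factory=lambda: ["objective-1", "objective-2"])

    def build_attacks(self):
        # One attack per (strategy, objective) pair.
        return [(s, o) for s in self.strategies for o in self.objectives]

    def run(self):
        # Placeholder execution: record one result per generated attack.
        return [
            {"strategy": s, "objective": o, "completed": True}
            for s, o in self.build_attacks()
        ]


class StandInScenarioTests(unittest.TestCase):
    def test_initialization(self):
        scenario = StandInScenario(objectives=["custom"])
        self.assertEqual(scenario.objectives, ["custom"])

    def test_attack_generation(self):
        attacks = StandInScenario().build_attacks()
        self.assertEqual(len(attacks), 4)  # 2 strategies x 2 objectives

    def test_execution(self):
        results = StandInScenario().run()
        self.assertTrue(all(r["completed"] for r in results))

    def test_properties(self):
        scenario = StandInScenario()
        self.assertEqual(scenario.name, "Cyber")
        self.assertIn("multi_turn", scenario.strategies)
```

Saved to a file, a sketch like this can be run with `python -m unittest <file>`; the real tests would exercise the actual scenario class with mocked targets instead of a stand-in.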