Through a recently completed multipronged red-teaming effort, the agency said it will develop repeatable testing datasets that can be used to evaluate large language model tools and services in the ...