Past Work


Here is a selection of past work in AI safety, bias testing, and evaluation, as well as original AI/ML/NLP research.

2025

Department of Defense Crowdsourced AI Red-Teaming Program

  • Led a large-scale medical AI evaluation with Humane Intelligence and the DOD's Chief Digital and Artificial Intelligence Office (CDAO) (link)
  • Coordinated a red teaming exercise testing Large Language Model (LLM) chatbots for military medicine applications. The exercise drew over 200 participants, including clinical providers and healthcare analysts from the Defense Health Agency (DHA), the Uniformed Services University of the Health Sciences, and the military services.

Results: 800+ findings of potential vulnerabilities and biases

  • Compared three popular LLMs across medical use cases
  • Created benchmark datasets for evaluating future vendors
  • Findings informed DOD policies for responsible Generative AI use
💡
"Since applying GenAI for such purposes within the DoD is in earlier stages of piloting and experimentation, this program acts as an essential pathfinder for generating a mass of testing data, surfacing areas for consideration, and validating mitigation options that will shape future research, development, and assurance of GenAI systems that may be deployed in the future." — Dr. Matthew Johnson, CDAO

ICLR Workshop: Multicultural and Multilingual AI Systems

  • Poster presentation: "Red Teaming for Trust: Evaluating Multicultural and Multilingual AI Systems in Asia-Pacific," on multilingual red teaming methodologies, at an International Conference on Learning Representations (ICLR) workshop. (link)
💡
"Addresses an important gap in AI safety... well-structured documentation of red-teaming in a novel context and adds original scientific evidence to literature on bias of LLMs." "This work addresses a complex and underexplored problem, making it a valuable addition to the workshop." — Program Chairs & Evaluators

Expert Speaker at Portland State University

Presented on "Measuring Meaning Through Adversarial Testing" at Portland State University's Natural Language Processing lab, bridging academic NLP research and practical AI safety deployment.

The presentation demonstrated how systematic red teaming reveals what AI systems actually understand versus what they merely appear to understand. Key topics included multilingual safety gaps (GPT-4 fails safety tests 79% of the time in low-resource languages versus less than 1% in English), multi-turn attacks that expose context limitations, and the gap between benchmark performance and real-world robustness.
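
As a rough illustration of the cross-lingual comparison discussed in the talk, the minimal Python sketch below tallies per-language safety failure rates from a set of red-team results; the record fields and sample data are hypothetical, not the actual evaluation harness.

```python
from collections import defaultdict

# Hypothetical red-team results: each record notes the prompt language and
# whether the model's response was judged a safety violation.
results = [
    {"language": "English", "violation": False},
    {"language": "English", "violation": False},
    {"language": "Zulu", "violation": True},
    {"language": "Zulu", "violation": False},
    {"language": "Thai", "violation": True},
]

def failure_rate_by_language(records):
    """Return the share of responses flagged as safety violations, per language."""
    totals, failures = defaultdict(int), defaultdict(int)
    for record in records:
        totals[record["language"]] += 1
        failures[record["language"]] += int(record["violation"])
    return {lang: failures[lang] / totals[lang] for lang in totals}

print(failure_rate_by_language(results))  # e.g. {'English': 0.0, 'Zulu': 0.5, 'Thai': 1.0}
```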

Research questions raised:

  • Can we build language-agnostic safety filters that understand harmful intent regardless of language?
  • How do we systematically evaluate cross-lingual safety across diverse linguistic contexts?
  • What feedback loops exist between adversarial testing findings and model training improvements?
💡
"Work remains to be done in bridging the gap between an AI safety red team finding significant safety results and key decision-makers deciding to devote the resources to mitigate those safety issues." — PSU student @ PortNLP

2024

Appointed to GSA's Acquisition Policy Federal Advisory Committee (GAP FAC)

Adrianna Tan was appointed to the U.S. General Services Administration's Acquisition Policy Federal Advisory Committee (GAP FAC) in 2024, advising on the integration of artificial intelligence and emerging technologies into federal procurement processes.

The committee brings together leaders from academia, industry, and government to modernize federal acquisition strategies. GAP FAC's focus includes integrating AI, data analytics, cloud computing, and cybersecurity into government procurement to drive innovation and efficiency.

💡
"GSA is ready to bring in emerging technologies and deliver solutions that meet the needs of government in this evolving landscape. This committee's invaluable expertise will help federal agencies use modern tools to drive innovation, improve efficiency and deliver better results for the American people." — Robin Carnahan, former GSA Administrator (link)

World's first multicultural and multilingual red team

  • Led a 9-country, 8-language AI safety evaluation with Singapore's Info-communications Media Development Authority (IMDA) and Humane Intelligence (link)
  • First-ever multilingual human-led red teaming exercise conducted across Asia-Pacific region.
  • 54 participants from nine countries tested AI systems in Chinese, Malay, Tamil, Bahasa Indonesia, Thai, Vietnamese, and English
  • 300+ more participated virtually

Results:

  • Published the world's first multilingual expert-led red teaming evaluation report
  • Demonstrated significant safety filter gaps in non-English languages
  • Established baseline for regional AI safety standards across East, Southeast, and South Asia
  • Created a replicable methodology for multilingual testing that addresses demographic and cultural disparities overlooked in Western-centric AI evaluations (see the sketch below)
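
A minimal sketch, with invented field names and sample data, of how findings from such an exercise might be grouped by language and harm category to surface the disparities the methodology targets; it illustrates the idea rather than the published pipeline.

```python
from collections import Counter

# Hypothetical confirmed findings from a multilingual red-teaming exercise.
findings = [
    {"language": "Tamil", "category": "cultural stereotype"},
    {"language": "Tamil", "category": "cultural stereotype"},
    {"language": "Thai", "category": "misinformation"},
    {"language": "Malay", "category": "unsafe advice"},
]

# Count findings per (language, harm category) pair so reviewers can see which
# languages and harm types surface disproportionately often.
by_language_and_category = Counter((f["language"], f["category"]) for f in findings)

for (language, category), count in by_language_and_category.most_common():
    print(f"{language:<8} {category:<22} {count}")
```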
💡
Featured at AI Action Summit 2025, Paris. Singapore Minister for Digital Development and Information Josephine Teo presented the findings at the 2025 AI Action Summit in Paris, announcing the publication of Singapore's AI Safety Red Teaming Challenge Evaluation Report. Minister Teo emphasized understanding "how LLMs perform with regard to different languages and cultures in the Asia Pacific region" and highlighted that "no one party can accomplish that alone," underscoring Singapore's commitment to international collaboration on AI safety. (link)

Nominated for Top Women in GovTech

Listed as "Adrianna Tan, Founder, Future Ethics, United States" in Meet the Women in GovTech 2024.

2021

  • Invited as a government expert to the IEEE-NYU AI Procurement Primer and roundtable (link)