A professional extension of TextAttack for multi-label adversarial example generation, with a focus on toxicity classification. Generate adversarial examples that flip multiple labels simultaneously while preserving semantic meaning and grammatical correctness.
- 🏗️ Modular Architecture: Support for multiple models (Detoxify, custom HF models) and datasets
- ⚙️ CLI Interface: User-friendly command-line tools for all major operations
- 📊 Configuration-Driven: YAML configuration for flexible attack parameters
- 🔬 Professional Testing: Comprehensive test suite with coverage reporting
- 📚 Rich Examples: Jupyter notebooks and tutorials
- 🚀 Easy Installation: Pip-installable with proper dependency management
```bash
# Install from source
git clone https://github.com/QData/TextAttack-Multilabel
cd TextAttack-Multilabel
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"

# Run setup script (optional, creates a conda environment)
python install_env.py
```

```bash
# Show help
textattack-multilabel --help
# Run a basic attack on benign samples
textattack-multilabel attack --attack benign
# Preprocess and analyze data
textattack-multilabel preprocess --data data.csv --analyze --sample benign
# Run test suite
textattack-multilabel test --coverage
```

```bash
# Download data
python example_toxic_adv_examples/download_data.py
# Run main attack script
python example_toxic_adv_examples/attack_multilabel_tae_main.py
# Run baseline example
python example_toxic_adv_examples/baseline_multiclass_toxic_adv_example_attack.py
# Run ACL23 example
python example_toxic_adv_examples/multilabel_acl2023.py
# Run tests
python test/run_tests.py --coverage
```

```
TextAttack-Multilabel/
├── .gitignore # Git ignore file
├── pyproject.toml # Package configuration
├── install_env.py # Environment setup script
├── LICENSE # License file
├── README.md # This file
├── example_toxic_adv_examples/ # Example scripts and configs
│   ├── attack_multilabel_tae_main.py # Main attack script
│   ├── baseline_multiclass_toxic_adv_example_attack.py # Baseline attack example
│   ├── download_data.py # Data download script
│   ├── multilabel_acl2023.py # ACL23 multilabel example
│   ├── config/ # Configuration files
│   │   └── toxic_adv_examples_config.yaml # Configuration for examples
│   └── __pycache__/ # Python cache
├── textattack_multilabel/ # Main package
│   ├── __init__.py # Package initialization
│   ├── attack_components.py # Attack components
│   ├── goal_function.py # Goal function implementations
│   ├── multilabel_model_wrapper.py # Model wrapper for multilabel
│   └── multilabel_target_attack_recipe.py # Multilabel attack recipes
└── test/ # Test suite
    ├── __init__.py
    ├── run_tests.py # Test runner
    ├── test_model_wrapper.py
    ├── test_multilabel_attack_recipes.py
    └── test_shared.py
```
The package uses YAML configuration files for flexible setup:
```yaml
# config/attack_config.yaml
defaults:
  model:
    type: "detoxify" # or "custom"
    variant: "original"
  dataset:
    name: "jigsaw_toxic_comments"
    sample_size: 500
  attack:
    recipe: "MultilabelACL23"
    labels_to_maximize: [] # maximize all toxic labels
    labels_to_minimize: [] # minimize currently toxic labels
    # ... more options
```

```python
import textattack

from textattack_multilabel import MultilabelModelWrapper, MultilabelACL23

# Load your model
model_wrapper = MultilabelModelWrapper(your_model, your_tokenizer, multilabel=True)
# Create attack recipe
attack = MultilabelACL23.build(
    model_wrapper=model_wrapper,
    labels_to_maximize=[0, 1, 2, 3, 4, 5], # maximize all toxic labels
    labels_to_minimize=[],
    wir_method="delete"
)
# Run attack
attacker = textattack.Attacker(attack, dataset)
results = attacker.attack_dataset()
```

See examples/getting_started.ipynb for a complete walkthrough including:
- Model setup and testing
- Creating attack recipes
- Running attacks on sample data
- Results analysis (see the sketch after this list)
- Configuration examples
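For the results-analysis step, a minimal sketch of inspecting the output of attacker.attack_dataset() from the example above is shown below. It assumes only TextAttack's standard AttackResult API (SuccessfulAttackResult, original_text(), perturbed_text()), not any helper from this package.

```python
# Minimal results inspection -- a sketch assuming TextAttack's standard
# AttackResult objects returned by attacker.attack_dataset() above.
from textattack.attack_results import SuccessfulAttackResult

successes = [r for r in results if isinstance(r, SuccessfulAttackResult)]
print(f"Successful attacks: {len(successes)}/{len(results)}")

for result in successes[:5]:  # show a few original/adversarial pairs
    print("original :", result.original_text())
    print("perturbed:", result.perturbed_text())
    print()
```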
Key scripts:
- attack_multilabel_tae_main.py: Modular attack generation supporting multiple models and datasets (see the model-wrapping sketch after this list)
- download_data.py: Secure Kaggle dataset download with environment variables
- install_env.py: Cross-platform environment setup with verification
- run_tests.py: Comprehensive test runner with coverage and parallel execution
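To illustrate the multiple-model support mentioned for attack_multilabel_tae_main.py, a minimal sketch of wrapping either backend might look like the following. It assumes the Detoxify object exposes .model and .tokenizer attributes and uses unitary/toxic-bert only as a stand-in Hugging Face checkpoint.

```python
# Sketch: wrapping either a Detoxify model or a custom HF checkpoint.
from detoxify import Detoxify
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from textattack_multilabel import MultilabelModelWrapper

# Option 1: Detoxify backend (assumes .model / .tokenizer attributes)
detox = Detoxify("original")
detox_wrapper = MultilabelModelWrapper(detox.model, detox.tokenizer, multilabel=True)

# Option 2: custom Hugging Face backend (example checkpoint only)
hf_model = AutoModelForSequenceClassification.from_pretrained("unitary/toxic-bert")
hf_tokenizer = AutoTokenizer.from_pretrained("unitary/toxic-bert")
hf_wrapper = MultilabelModelWrapper(hf_model, hf_tokenizer, multilabel=True)
```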
Run the complete test suite:
```bash
python test/run_tests.py --coverage
```

Test structure:
- Unit tests: Individual function/component testing (see the sketch after this list)
- Integration tests: Full pipeline testing
- Coverage reporting: HTML coverage reports generated
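As an illustration of the unit-test style, a hypothetical minimal test for the model wrapper might look like the sketch below. The call convention (a list of input strings returning one score row per text) is an assumption based on TextAttack's ModelWrapper interface; the actual tests live in test/test_model_wrapper.py and the other test modules.

```python
# Hypothetical example, not part of the shipped suite.
import pytest
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from textattack_multilabel import MultilabelModelWrapper


@pytest.fixture(scope="module")
def wrapper():
    name = "unitary/toxic-bert"  # example 6-label toxicity checkpoint
    model = AutoModelForSequenceClassification.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    return MultilabelModelWrapper(model, tokenizer, multilabel=True)


def test_one_score_row_per_input(wrapper):
    texts = ["you are wonderful", "you are terrible"]
    scores = wrapper(texts)  # assumed TextAttack-style __call__ on a list of strings
    assert len(scores) == len(texts)
```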
Running the setup script creates a conda environment (py3.8) with the required dependencies: textattack[tensorflow,optional], detoxify, kaggle, and sentence-transformers. The script uses Python's subprocess module for secure command execution.

```bash
python install_env.py
```
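The real logic lives in install_env.py; purely to illustrate the subprocess pattern described above, a hedged sketch might look like this (the environment name and package list come from the description, everything else is assumed):

```python
# Sketch of the subprocess pattern: commands are passed as argument lists
# and run with check=True, so there is no shell string interpolation.
import subprocess


def run(cmd):
    """Run a command given as an argument list; raise if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


run(["conda", "create", "-y", "-n", "py3.8", "python=3.8"])
run(["conda", "run", "-n", "py3.8", "pip", "install",
     "textattack[tensorflow,optional]", "detoxify", "kaggle", "sentence-transformers"])
```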
Download the Jigsaw Toxic Comments dataset from Kaggle (requires KAGGLE_USERNAME and KAGGLE_KEY environment variables):
```bash
python example_toxic_adv_examples/download_data.py
```

This uses environment variables for the Kaggle API credentials instead of plaintext credential files, for improved security.
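For reference, the environment-variable flow could look roughly like the sketch below; this is not the contents of download_data.py, and the competition slug and download path are assumptions.

```python
# Sketch of environment-variable-based Kaggle authentication.
# Export KAGGLE_USERNAME and KAGGLE_KEY in your shell before running.
import os

for var in ("KAGGLE_USERNAME", "KAGGLE_KEY"):
    if not os.environ.get(var):
        raise SystemExit(f"{var} is not set; export it before running this script.")

# The kaggle client reads credentials from the environment, so no plaintext
# kaggle.json file is needed.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
# Competition slug and output path are assumptions for illustration.
api.competition_download_files(
    "jigsaw-toxic-comment-classification-challenge", path="data/", quiet=False
)
```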
See the example scripts in example_toxic_adv_examples/, such as multilabel_acl2023.py, for complete attack examples.