Description

This work considers the task of vision-and-language inference (VLI): predicting whether an input sentence is true for given images or videos. It begins with an investigation of model robustness to a set of 13 linguistic transformations, categorized as Semantics-Preserving or Semantics-Inverting depending on whether they change the meaning of the sentence. Existing VLI models are observed to degrade to close-to-random performance when tested on these linguistic transformations, which include simple phenomena such as synonyms, antonyms, negation, swapping of subject and object, paraphrasing, and the substitution of pronouns, comparatives, and numbers. This observation is used to design STAT (Semantics-Transformed Adversarial Training), a model-agnostic and task-agnostic min-max optimization algorithm. Its inner maximization uses semantic perturbations of input sentences to find adversarial samples, and its outer minimization updates the model parameters. Extensive experiments on three benchmark datasets (NLVR2, VIOLIN, and VQA "Yes-No") demonstrate not only large gains in robustness to adversarial input sentences but also model-agnostic performance improvements. This work also presents the suite of linguistic transformations as a robustness benchmark that may benefit future research in vision-and-language robustness.
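The min-max idea above can be written as min over θ of E[max over t in T of L(f_θ(v, t(s)), t(y))], where v is the visual input, s the sentence, y its true/false label, and T a set of sentence transformations (a semantics-inverting t also flips y). The toy sketch below illustrates only that inner-max/outer-min pattern, using exhaustive search over a small discrete perturbation set. Everything in it is a hypothetical stand-in and not the thesis's implementation: the three transformations (`synonym_swap`, `antonym_swap`, `negate`) and their tiny lexicons, images represented as lists of tags, and a bag-of-words logistic-regression model with image-tag × word cross features.

```python
import math

# Hypothetical toy lexicons; a real system would draw on curated resources.
SYNONYMS = {"dog": "puppy", "big": "large"}
ANTONYMS = {"big": "small", "small": "big"}


def synonym_swap(tokens, label):
    """Semantics-preserving: substitute synonyms and keep the label."""
    return [SYNONYMS.get(t, t) for t in tokens], label


def antonym_swap(tokens, label):
    """Semantics-inverting: substitute antonyms and flip the label."""
    if not any(t in ANTONYMS for t in tokens):
        return None  # transformation does not apply to this sentence
    return [ANTONYMS.get(t, t) for t in tokens], 1 - label


def negate(tokens, label):
    """Semantics-inverting: insert 'not' after the first 'is' and flip the label."""
    if "is" not in tokens:
        return None
    i = tokens.index("is")
    return tokens[: i + 1] + ["not"] + tokens[i + 1 :], 1 - label


TRANSFORMS = [synonym_swap, antonym_swap, negate]


def features(image_tags, tokens):
    """Bag of words plus (image tag, word) cross features."""
    feats = {}
    for w in tokens:
        feats[w] = feats.get(w, 0.0) + 1.0
        for tag in image_tags:
            key = tag + "|" + w
            feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def logit(weights, image_tags, tokens):
    return sum(weights.get(k, 0.0) * v for k, v in features(image_tags, tokens).items())


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(x):
    # Stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def loss(weights, image_tags, tokens, label):
    """Binary logistic loss for a label in {0, 1}."""
    z = logit(weights, image_tags, tokens)
    return softplus(-z) if label == 1 else softplus(z)


def inner_max(weights, image_tags, tokens, label):
    """Inner maximization: pick the candidate sentence with the highest loss."""
    candidates = [(tokens, label)]
    for transform in TRANSFORMS:
        out = transform(tokens, label)
        if out is not None:
            candidates.append(out)
    return max(candidates, key=lambda c: loss(weights, image_tags, c[0], c[1]))


def stat_train(data, epochs=50, lr=0.1):
    """Outer minimization: one SGD step per example on its worst-case variant."""
    weights = {}
    for _ in range(epochs):
        for image_tags, tokens, label in data:
            adv_tokens, adv_label = inner_max(weights, image_tags, tokens, label)
            # d(loss)/d(w_k) = (sigmoid(z) - y) * x_k for the logistic loss.
            err = sigmoid(logit(weights, image_tags, adv_tokens)) - adv_label
            for k, v in features(image_tags, adv_tokens).items():
                weights[k] = weights.get(k, 0.0) - lr * err * v
    return weights


if __name__ == "__main__":
    data = [
        (["dog", "big"], "the dog is big".split(), 1),
        (["dog", "small"], "the dog is big".split(), 0),
    ]
    w = stat_train(data, epochs=200)
    for sentence in ["the dog is big", "the dog is not big", "the dog is small"]:
        p = sigmoid(logit(w, ["dog", "big"], sentence.split()))
        print(f"{sentence!r} on a big-dog image: P(true) = {p:.2f}")
```

Because this inner maximization enumerates every applicable transformation, its cost grows linearly with the size of T. The design choice here is simply to train on whichever variant the current model finds hardest, so a model that ignores words like "not" or antonyms is penalized directly.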

    Details

    Title
    • Robust Vision and Language Inference via Semantics Transformed Adversarial Training
    Contributors
    Date Created
    2021
    Resource Type
    • Text
    Note
    • Partial requirement for: M.S., Arizona State University, 2021
    • Field of study: Computer Science
