Description

Instruction tuning of language models has demonstrated the ability to enhance model generalization to unseen tasks via in-context learning using a few examples. However, typical supervised learning still requires large amounts of training data for downstream or “Held-In” tasks. Often, in real-world situations, the data available for finetuning is scarce, falling somewhere between few-shot inference and fully supervised finetuning. In this work, I demonstrate the sample efficiency of instruction-tuned models across various tasks by estimating the minimal training data required for downstream “Held-In” tasks to perform transfer learning and match the performance of state-of-the-art (SOTA) supervised models. I conduct experiments on 119 tasks from Super Natural Instructions (SuperNI) in both the single-task learning / Expert Modelling (STL) and multi-task learning (MTL) settings. My findings reveal that, in the STL setting, instruction-tuned models equipped with 25% of the downstream training data surpass SOTA performance on the downstream tasks. In the MTL setting, an instruction-tuned model trained on only 6% of the downstream training data achieves SOTA, while using 100% of the training data yields a 3.69-point improvement (ROUGE-L 74.68) over the previous SOTA. I conduct an analysis of T5 versus Tk-Instruct by developing several baselines, demonstrating that instruction tuning aids both sample efficiency and transfer learning. Additionally, I observe a consistent ∼4% performance increase in both settings when pre-finetuning is performed with instructions. Finally, I conduct a categorical study and find that, contrary to previous results, tasks in the question rewriting and title generation categories suffer under instruction tuning.
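The sketch below illustrates, under stated assumptions, the kind of subsampled-finetuning setup the abstract describes: keep only a fraction of a downstream task's training examples, format them SuperNI-style (definition plus input), and finetune an instruction-tuned seq2seq checkpoint. This is not the thesis code; the checkpoint name, hyperparameters, dataset formatting, and toy examples are illustrative assumptions on top of the Hugging Face transformers API.

```python
# Minimal sketch (assumptions, not the thesis code): finetune an
# instruction-tuned checkpoint on a subsampled fraction of one
# downstream task's training data.
import random

from torch.utils.data import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

CHECKPOINT = "allenai/tk-instruct-base-def-pos"  # assumed public Tk-Instruct release
FRACTION = 0.25                                  # e.g. 25% of the downstream train split


class SuperNIStyleDataset(Dataset):
    """Wraps (instruction, input, output) triples as seq2seq training examples."""

    def __init__(self, examples, tokenizer, max_len=512):
        self.examples, self.tok, self.max_len = examples, tokenizer, max_len

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        ex = self.examples[i]
        # SuperNI-style prompt: task definition followed by the instance input.
        source = f"Definition: {ex['instruction']} Input: {ex['input']} Output:"
        enc = self.tok(source, truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        labels = self.tok(ex["output"], truncation=True, max_length=128,
                          padding="max_length", return_tensors="pt").input_ids
        labels[labels == self.tok.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": enc.input_ids.squeeze(0),
                "attention_mask": enc.attention_mask.squeeze(0),
                "labels": labels.squeeze(0)}


def subsample(examples, fraction, seed=0):
    """Keep only `fraction` of the downstream training examples."""
    k = max(1, int(len(examples) * fraction))
    return random.Random(seed).sample(examples, k)


if __name__ == "__main__":
    # Toy placeholder examples; the actual experiments use SuperNI task files.
    train = [{"instruction": "Rewrite the question politely.",
              "input": "Where is the station?",
              "output": "Could you tell me where the station is?"}] * 40

    tok = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)
    dataset = SuperNIStyleDataset(subsample(train, FRACTION), tok)

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(output_dir="tk_subsample_out",
                                      per_device_train_batch_size=2,
                                      num_train_epochs=2,
                                      logging_steps=5,
                                      report_to="none"),
        train_dataset=dataset,
    )
    trainer.train()
```

Evaluation in the thesis is reported with ROUGE-L against the SuperNI references; swapping in the held-out split of the same task and scoring generations with any standard ROUGE implementation would complete the loop.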

    Details

    Title
    • Instruction Tuned Models Are Quick Learners with Instruction Equipped Data on Downstream Tasks
    Date Created
    • 2023
    Resource Type
    • Text
    Note
    • Partial requirement for: M.S., Arizona State University, 2023
    • Field of study: Computer Science
