Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

[TOIS 2024] Official implementation of DKMD, a dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems.

Authors

Xiaolin Chen1, Xuemeng Song1*, Liqiang Jing1, Shuo Li1, Linmei Hu2, Liqiang Nie1*

1 Shandong University, Shandong, China
2 Beijing Institute of Technology, Beijing, China
* Corresponding authors

Updates

  • [10/2023] Paper accepted at ACM Transactions on Information Systems (TOIS)
  • [10/2023] Code and model parameters released

Introduction

This repository is the official implementation of the paper "Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model", published in ACM Transactions on Information Systems (TOIS), 2024.

Text response generation for multimodal task-oriented dialog systems is an essential yet challenging task. Existing efforts still suffer from two pivotal limitations: (1) overlooking the benefit of generative pre-training, and (2) ignoring the textual context-related knowledge. To address these limitations, we propose DKMD (Dual Knowledge-enhanced generative pretrained language Model for multimodal task-oriented Dialog systems), where BART is adopted as the backbone. DKMD consists of three key components:

  • Dual Knowledge Selection: Selects context-related knowledge from the knowledge base according to both textual and visual modalities of the given context.
  • Dual Knowledge-enhanced Context Learning: Seamlessly integrates the selected knowledge into the multimodal context learning from both global and local perspectives, while exploring the cross-modal semantic relation via dual cross-modal representation refinement.
  • Knowledge-enhanced Response Generation: Comprises a revised BART decoder with an additional dot-product knowledge-decoder attention (DKDA) sub-layer to explicitly use knowledge for precise text response generation.
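The paper describes the DKDA sub-layer only at a high level here. As a rough illustrative sketch (not the repository's actual implementation), the dot-product knowledge-decoder attention can be viewed as standard scaled dot-product attention in which decoder hidden states query the selected knowledge representations. All function names, shapes, and the NumPy formulation below are assumptions for illustration; the real model operates on BART decoder states inside `model/`.

```python
import numpy as np

def knowledge_decoder_attention(decoder_states, knowledge_reps):
    """Illustrative dot-product knowledge-decoder attention.

    decoder_states: (T, d) array of decoder hidden states (queries).
    knowledge_reps: (K, d) array of selected knowledge representations,
                    used here as both keys and values.
    Returns a (T, d) array: each decoder position's knowledge summary.
    """
    d = decoder_states.shape[-1]
    # Scaled dot-product scores between every decoder state and knowledge entry.
    scores = decoder_states @ knowledge_reps.T / np.sqrt(d)  # (T, K)
    # Numerically stable softmax over the knowledge dimension.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of knowledge representations for each decoder position.
    return weights @ knowledge_reps  # (T, d)
```

In the full model this output would be fused with the decoder's self-attention and cross-attention streams before the feed-forward sub-layer; the sketch only shows the attention computation itself.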

Highlights

  • Among the first to integrate generative pretrained language models (GPLMs) into multimodal task-oriented dialog systems
  • Proposes dual knowledge selection to acquire context-related knowledge from both textual and visual modalities
  • Designs dual cross-modal representation refinement (vision-oriented and text-oriented) to capture cross-modal semantic relations
  • Devises a knowledge-enhanced BART decoder with dot-product knowledge-decoder attention for precise response generation
  • Achieves state-of-the-art performance on a public multimodal task-oriented dialog benchmark

Method / Framework


Figure 1. Overall framework of DKMD, which consists of three vital components: (a) Dual Knowledge Selection, (b) Dual Knowledge-enhanced Context Learning, and (c) Knowledge-enhanced Response Generation.


Project Structure

.
├── asserts/               # Figures and framework diagrams
├── config/                # Configuration files
├── dataset/               # Dataset and data processing scripts
├── lib/                   # Library dependencies
├── model/                 # Model architecture definitions
├── target_file/           # Target files for evaluation
├── tools/                 # Utility tools
├── util/                  # Utility functions
├── constant.py            # Constants and hyperparameters
├── train.py               # Training script
├── train.sh               # Shell script for training
├── eval_2.sh              # Shell script for evaluation
├── README.md
└── ...

Installation

1. Clone the repository

git clone https://github.com/iLearn-Lab/DKMD.git
cd DKMD

2. Prerequisites

  • Python 3.8
  • PyTorch 1.0
  • NLTK 3.7
  • transformers 4.3.2

Usage

Training

sh train.sh <gpu_id> text <model_file> <output_file>

Evaluation

The Perl script mteval-v14.pl is used to evaluate the generated text. First extract the results from the log files and convert them into an XML file; for convenience, convert.py is provided.
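The repository ships convert.py for this step. As a sanity sketch only (not the repository's script), wrapping generated responses in the XML layout that mteval-v14.pl parses might look like the following; the element and attribute names follow the NIST mteval conventions, while the `setid`, `sysid`, and `docid` values here are placeholder assumptions.

```python
from xml.sax.saxutils import escape

def to_mteval_xml(segments, setid="dkmd_eval", srclang="en", trglang="en",
                  sysid="DKMD", docid="doc1"):
    """Wrap a list of response strings into an mteval-style tstset XML string.

    Each response becomes one <seg> with a 1-based id; text is XML-escaped.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<mteval>',
        f'<tstset setid="{setid}" srclang="{srclang}" trglang="{trglang}">',
        f'<doc docid="{docid}" sysid="{sysid}">',
    ]
    for i, seg in enumerate(segments, 1):
        lines.append(f'<seg id="{i}">{escape(seg.strip())}</seg>')
    lines += ['</doc>', '</tstset>', '</mteval>']
    return "\n".join(lines)
```

A source set and reference set in the same layout (with `srcset`/`refset` in place of `tstset`) would be needed alongside this test set before invoking mteval-v14.pl.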


Citation

If you find this work useful for your research, please cite our paper:

@article{chen2024dkmd,
  title={Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model},
  author={Chen, Xiaolin and Song, Xuemeng and Jing, Liqiang and Li, Shuo and Hu, Linmei and Nie, Liqiang},
  journal={ACM Transactions on Information Systems},
  volume={42},
  number={2},
  pages={1--28},
  year={2024},
  publisher={ACM},
}

Acknowledgement

  • Thanks to our collaborators for their valuable support.
  • Thanks to the open-source community for providing useful baselines and tools.

License

This project is released under the Apache License 2.0.
