
BelfaidaMedReda/HAN-implementation


🧠 Hierarchical Attention Network (HAN) Using GRU


This repository contains a PyTorch implementation of a Hierarchical Attention Network (HAN) for document classification, built and trained in Google Colab.
The model follows the approach of Yang et al. (2016), combining bidirectional GRUs with attention mechanisms at both the word and sentence levels to produce interpretable document representations.


🧩 Project Overview

The HAN architecture mirrors the hierarchical structure of documents, where words form sentences and sentences form documents:

  • Word Encoder → encodes sequences of words into sentence representations.
  • Sentence Encoder → encodes sentences into document representations.
  • Attention Layers → weight the most informative words and sentences at each level; the learned weights can be inspected for interpretability.

[Figure: HAN architecture diagram]


🧠 Model Components

🔹 1. Self-Attention Layer (AttentionWithContext)

Implements the attention mechanism as described by Yang et al. (2016):

  • Scores each hidden state against a trainable context vector, normalizes the scores with a softmax, and returns the weighted average of the hidden states.
  • Supports extraction of attention coefficients for interpretability.
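A minimal PyTorch sketch of such a layer, following the equations in Yang et al. (2016): u_t = tanh(W h_t + b), α_t = softmax over t of u_tᵀ u_w, and s = Σ_t α_t h_t, where u_w is the trainable context vector. The constructor arguments, the initialization, and the optional boolean padding mask are assumptions for illustration, not necessarily what the notebook does.

```python
from typing import Optional

import torch
import torch.nn as nn


class AttentionWithContext(nn.Module):
    """Sketch of context-vector attention (Yang et al., 2016).

    Argument names, initialization and the padding mask are illustrative
    assumptions, not taken from the notebook.
    """

    def __init__(self, input_dim: int, return_coefficients: bool = False):
        super().__init__()
        self.proj = nn.Linear(input_dim, input_dim)            # W and b
        self.context = nn.Parameter(torch.empty(input_dim))    # u_w
        nn.init.normal_(self.context, std=0.05)
        self.return_coefficients = return_coefficients

    def forward(self, h: torch.Tensor, mask: Optional[torch.Tensor] = None):
        # h: (batch, steps, input_dim); mask: (batch, steps) bool, True = real token.
        # Each row of the mask must keep at least one True entry.
        u = torch.tanh(self.proj(h))                  # (batch, steps, input_dim)
        scores = u @ self.context                     # (batch, steps)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        alpha = torch.softmax(scores, dim=1)          # weights sum to 1 over steps
        s = (alpha.unsqueeze(-1) * h).sum(dim=1)      # weighted average: (batch, input_dim)
        if self.return_coefficients:
            return s, alpha
        return s
```

With return_coefficients=True the layer also returns alpha, which is the quantity an attention visualization would read.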

🔹 2. Sentence Encoder (AttentionBiGRU)

  • Embedding layer (trainable word vectors)
  • Bidirectional GRU to capture context in both directions
  • Word-level attention layer
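A sketch of this word-level encoder, under these assumptions: index 0 is padding, dropout is applied to the embeddings, hidden is the size of each GRU direction, and the attention is written inline (same scheme as AttentionWithContext) so the block runs on its own. The class name mirrors the README, but the internals are hypothetical.

```python
import torch
import torch.nn as nn


class AttentionBiGRU(nn.Module):
    """Sketch of a sentence encoder: embeddings -> BiGRU -> attention pooling."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden: int, dropout: float = 0.5):
        super().__init__()
        # Assumption: index 0 is reserved for padding.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.dropout = nn.Dropout(dropout)
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        # Inline context-vector attention over the 2 * hidden BiGRU states.
        self.att_proj = nn.Linear(2 * hidden, 2 * hidden)
        self.att_context = nn.Parameter(torch.randn(2 * hidden) * 0.05)

    def forward(self, words: torch.Tensor):
        # words: (batch, n_words) integer word indices
        x = self.dropout(self.embedding(words))             # (batch, n_words, embed_dim)
        h, _ = self.gru(x)                                  # (batch, n_words, 2 * hidden)
        scores = torch.tanh(self.att_proj(h)) @ self.att_context  # (batch, n_words)
        alpha = torch.softmax(scores, dim=1)                # word-level attention weights
        sentence = (alpha.unsqueeze(-1) * h).sum(dim=1)     # (batch, 2 * hidden)
        return sentence, alpha
```

Because the GRU is bidirectional, each sentence vector has 2 × hidden dimensions, e.g. 100 when hidden=50.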

🔹 3. Document Encoder (HAN)

  • Sentence encoder applied to every sentence of the document (time-distributed)
  • Bidirectional GRU for sentence sequence modeling
  • Sentence-level attention
  • Linear + sigmoid output layer for binary classification
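The full document model can be sketched by folding the sentence axis into the batch axis, which is one common way to get time-distributed behavior in PyTorch (the notebook may do it differently). The AttentionPool helper, the embedding size of 100, and the dropout placement are assumptions; hidden=50 echoes the Hidden Units value in the training configuration. The model returns the predicted probability together with both levels of attention weights.

```python
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Context-vector attention (Yang et al., 2016): pooled vector plus weights."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Parameter(torch.randn(dim) * 0.05)

    def forward(self, h):
        # h: (N, T, dim) -> pooled (N, dim), alpha (N, T)
        alpha = torch.softmax(torch.tanh(self.proj(h)) @ self.context, dim=1)
        return (alpha.unsqueeze(-1) * h).sum(dim=1), alpha


class HAN(nn.Module):
    """Sketch of a HAN document classifier; sizes and dropout placement are assumptions."""

    def __init__(self, vocab_size, embed_dim=100, hidden=50, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.word_gru = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.word_att = AttentionPool(2 * hidden)
        self.sent_gru = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.sent_att = AttentionPool(2 * hidden)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, docs):
        # docs: (batch, n_sents, n_words) integer word indices
        b, n_sents, n_words = docs.shape
        # Time-distributed step: encode every sentence with the same word-level
        # encoder by treating (batch * n_sents) sentences as one big batch.
        words = docs.reshape(b * n_sents, n_words)
        h_w, _ = self.word_gru(self.dropout(self.embedding(words)))
        sents, word_alpha = self.word_att(h_w)             # (b * n_sents, 2h)
        sents = sents.reshape(b, n_sents, -1)              # (b, n_sents, 2h)
        h_s, _ = self.sent_gru(self.dropout(sents))
        doc, sent_alpha = self.sent_att(h_s)               # (b, 2h)
        prob = torch.sigmoid(self.out(self.dropout(doc))).squeeze(-1)  # (b,)
        return prob, word_alpha.reshape(b, n_sents, n_words), sent_alpha
```

Folding sentences into the batch keeps the word-level GRU and attention weights shared across all sentences, which is what the time-distributed wrapper provides in Keras.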

⚙️ Training Configuration

| Parameter | Description | Value |
|---|---|---|
| Optimizer | Adam | lr = 0.001 |
| Loss | Binary cross-entropy | BCELoss |
| Batch size | | 64 |
| Epochs | | 15 |
| Dropout | | 0.5 |
| Hidden units | | 50 |
| Patience (early stopping) | | 2 |
| Device | | GPU (if available) |
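A training loop matching these settings might look like the sketch below. It assumes model(x) returns probabilities of shape (batch,), as BCELoss expects (a model that also returns attention weights would need a small wrapper), monitors validation loss for early stopping, and restores the best weights. The clip_norm=1.0 threshold is a placeholder: gradient clipping is mentioned in this README, but its value is not stated.

```python
import copy

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def fit(model, x_train, y_train, x_val, y_val, epochs=15, batch_size=64,
        lr=1e-3, patience=2, clip_norm=1.0):
    """Sketch: Adam + BCELoss with early stopping on validation loss.

    clip_norm is an assumed value; the README does not give the threshold.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCELoss()  # expects probabilities, i.e. a sigmoid output
    loader = DataLoader(TensorDataset(x_train, y_train), batch_size=batch_size, shuffle=True)
    best_loss, best_state, bad_epochs, history = float("inf"), None, 0, []

    for _ in range(epochs):
        model.train()
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
            optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = criterion(model(x_val.to(device)), y_val.to(device)).item()
        history.append(val_loss)

        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # stop after `patience` epochs without improvement
                break

    if best_state is not None:
        model.load_state_dict(best_state)  # keep the best validation checkpoint
    return history
```

The function returns the per-epoch validation losses, so its length shows whether early stopping triggered before the 15-epoch limit.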

✅ Results

Training achieved a validation accuracy of approximately 85%, and the word- and sentence-level attention weights can be visualized to show which parts of a document drive each prediction.


📂 Project Structure

├── data/
│ ├── docs_train.npy
│ ├── labels_train.npy
│ ├── docs_test.npy
│ ├── labels_test.npy
│ └── word_to_index.json
│
├── Lab.ipynb # Main notebook (implementation & training)
├── mini-report.pdf
└── README.md # Project documentation
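Loading the files above could look like the following sketch. It assumes the docs_*.npy arrays hold integer word indices (for example shaped documents × sentences × words) and the labels_*.npy arrays hold 0/1 labels; the function name load_data is hypothetical.

```python
import json
from pathlib import Path

import numpy as np
import torch


def load_data(data_dir="data"):
    """Hypothetical loader for the data/ layout; array contents are assumptions."""
    d = Path(data_dir)
    split = {}
    for name in ("train", "test"):
        docs = np.load(d / f"docs_{name}.npy")      # assumed integer word indices
        labels = np.load(d / f"labels_{name}.npy")  # assumed 0/1 labels
        # Long indices for nn.Embedding, float labels for BCELoss.
        split[name] = (torch.from_numpy(docs).long(), torch.from_numpy(labels).float())
    with open(d / "word_to_index.json", encoding="utf-8") as f:
        word_to_index = json.load(f)
    return split, word_to_index
```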

🚀 How to Run

▶️ Option 1: Open in Google Colab

Click below to open the notebook directly:
Open in Colab

▶️ Option 2: Run Locally

git clone https://github.com/BelfaidaMedReda/HAN-implementation.git
cd HAN-implementation
pip install torch numpy tqdm

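Then launch Lab.ipynb and run its cells in order. The key steps extract the data archive (the leading ! runs a shell command from a notebook cell) and call the notebook's train() function: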

!unzip data.zip
train()

🔍 Attention Visualization

The model provides both word-level and sentence-level attention weights, enabling explainable NLP. Example of sentence-level attention output:

27.04 ; ) First of all , Mulholland Drive is downright brilliant .
24.69 A masterpiece .
18.43 This is the kind of movie that refuse to leave your head .
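The listing above appears to show each sentence's attention weight as a percentage, highest first. Assuming that reading, a helper like the hypothetical format_sentence_attention below reproduces the layout from a document's sentences and their sentence-level weights (for example one row of the sentence attention tensor, converted with .tolist()).

```python
def format_sentence_attention(sentences, weights, top_k=3):
    """Return the top_k sentences as 'weight% sentence' lines, highest weight first.

    Hypothetical helper; assumes weights are attention coefficients summing to 1.
    """
    # float() also accepts 0-d tensors, so raw tensor elements work too.
    ranked = sorted(zip((float(w) for w in weights), sentences),
                    key=lambda pair: pair[0], reverse=True)
    return [f"{100 * w:.2f} {s}" for w, s in ranked[:top_k]]
```

Printing the returned lines one per row gives output in the same format as the example.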

🧪 Key Takeaways

  • Hands-on implementation of Hierarchical Attention Networks with PyTorch.

  • Design and use of custom attention layers.

  • Exploration of explainable deep learning for NLP.

  • Training optimization using early stopping and gradient clipping.

📖 Reference

Yang, Zichao, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy.
"Hierarchical Attention Networks for Document Classification."
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2016).
[Read Paper]

🧾 Additional Discussion: mini-report.pdf

The file mini-report.pdf included in this repository presents a concise theoretical discussion about:

  • The evolution of attention mechanisms and their improvements beyond basic self-attention.
  • The motivations for replacing recurrent operations with self-attention, as introduced in “Attention Is All You Need” by Vaswani et al. (2017).
  • A detailed analysis of the Hierarchical Attention Network (HAN) architecture, highlighting its strengths, limitations, and contextual dependencies as discussed in later works such as “Bidirectional Context-Aware Hierarchical Attention Network for Document Understanding” (Remy et al., 2019).

This report complements the implementation in the notebook by connecting practical experimentation with academic insights from seminal papers in NLP attention modeling.

About

Hierarchical Attention Network implementation in PyTorch
