iamthinbaker/gpoket2

license: apache-2.0
tags: pokemon, sprite-generation, gpt2

🎮 GPokeT2 — Pokémon Sprite Generator

A GPT-2 based autoregressive model that generates 64×64 Pokémon sprites token by token, conditioned on type, generation, evolution stage and more.

Pipeline: Pokémon sprite -> ASCII representation -> train the model (GPT2-Small)

🚀 Usage

Install dependencies:

pip install transformers huggingface_hub opencv-python torch

Generate a sprite:

import cv2
import numpy as np

from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM
from transformers import PreTrainedTokenizerFast

# Load the model
ckpt = snapshot_download("iamthinbaker/GPokeT2")
tokenizer = PreTrainedTokenizerFast.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True)

# Generate a Pokémon
image = model.generate_sprite(
    tokenizer,
    type1="fire",
    type2="dragon",
    verbose=True,
)

# Save the image (cv2.imwrite expects BGR channel order)
cv2.imwrite("pokemon.png", cv2.cvtColor(np.uint8(image), cv2.COLOR_RGB2BGR))

Available types:

normal 🥊 fighting 🔮 psychic
🔥 fire ☠️ poison 🐛 bug
💧 water 🌍 ground 🪨 rock
electric 🌪️ flying 👻 ghost
🌿 grass 🐉 dragon 🌑 dark
🧊 ice ⚙️ steel 🧚 fairy

🥖 ThinBaker's Team

This is the team I have created. (To be honest, it took many trials: the model sometimes produces very strange Pokémon.)

| Name | Type 1 | Type 2 |
|---|---|---|
| Scaborite | bug | rock |
| Tidewing | bug | water |
| Noctibell | dark | fairy |
| Umbramole | dark | ground |
| Zephyrael | flying | psychic |
| Me | water | psychic |

🧬 Model Details

Dataset

The dataset covers all sprites from every mainline Gen 3 and Gen 4 game:

| Generation | Game | Sprites |
|---|---|---|
| Gen 3 | Pokémon Emerald | 1 600 |
| Gen 3 | Pokémon FireRed / LeafGreen | 312 |
| Gen 3 | Pokémon Ruby / Sapphire | 837 |
| Gen 4 | Pokémon Diamond / Pearl | 2 528 |
| Gen 4 | Pokémon Platinum | 2 556 |
| Gen 4 | Pokémon HeartGold / SoulSilver | 2 560 |
| Total | | 10 393 |

Each sprite is then augmented to produce 12 variants before training:

| Technique | Variants | Description |
|---|---|---|
| Horizontal flip | ×2 | Each sprite is mirrored left↔right at the ASCII level (pixel order reversed per row) |
| Color shift | ×6 | All 5 non-identity permutations of the RGB channels are applied (swap R↔G, R↔B, G↔B, cycle R→G→B, cycle R→B→G), plus the original palette |

These two augmentations are independent and are combined, so each original sprite yields 2 flip variants × 6 color variants = 12 samples, for a final training set of 10 393 × 12 = 124 716 (~124 700) sequences.
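The combination of the two augmentations can be sketched in a few lines. This is a minimal illustration, not the project's pipeline: it assumes a sprite is held as a list of rows of (R, G, B) tuples, and the function name `augment` is hypothetical.

```python
from itertools import permutations


def augment(sprite):
    """Return the 12 variants (2 flips x 6 channel permutations) of a sprite.

    `sprite` is assumed to be a list of rows, each row a list of (R, G, B) tuples.
    """
    variants = []
    for flipped in (False, True):
        # Horizontal flip: reverse the pixel order within every row.
        rows = [row[::-1] for row in sprite] if flipped else sprite
        # permutations((0, 1, 2)) yields the identity first, then the 5 others.
        for perm in permutations((0, 1, 2)):
            variants.append(
                [[tuple(px[i] for i in perm) for px in row] for row in rows]
            )
    return variants


# Tiny 2x2 example sprite (values are arbitrary quantized levels).
sprite = [[(3, 0, 0), (0, 2, 0)],
          [(0, 0, 1), (3, 3, 3)]]
variants = augment(sprite)
print(len(variants))  # 12
```

The first variant is the untouched original, and the seventh is the flipped sprite with its original palette.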

Pixel → ASCII encoding

Each 64×64 sprite is serialized as a sequence of ASCII characters before being fed to the model. Each pixel is quantized to 4 levels per channel (R, G, B ∈ {0, 1, 2, 3}) and packed into a single character:

char = chr(R*16 + G*4 + B + 59)   # 64 possible color chars (';' to 'z')
char = '~'                          # white / transparent pixel

This yields a vocabulary of 65 pixel tokens (one per color, plus ~ for background), plus special row-marker tokens ([ROW_00] to [ROW_63]) that delimit each row of 64 pixels. A full sprite is therefore a sequence of 64 rows × 64 pixels = 4 096 pixel tokens.

The encoder/decoder lives in the slv layer of the pipeline (PokemonEncoder).
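As a rough illustration of the packing formula, the sketch below encodes and decodes single pixels and prefixes a row with its marker. It is not the project's PokemonEncoder. Several details are assumptions: uniform quantization (`value // 64`), rescaling levels by 85 when decoding, an explicit `transparent` flag to select the background character, and placing the row marker before the pixels. All function names are hypothetical.

```python
OFFSET = 59        # from the formula: codes 59 (';') through 122 ('z')
BACKGROUND = "~"   # white / transparent pixel


def quantize(value):
    """Map an 8-bit channel value (0-255) to one of 4 levels (assumed uniform bins)."""
    return value // 64


def encode_pixel(rgb, transparent=False):
    """Pack one pixel into a single character."""
    if transparent:
        return BACKGROUND
    r, g, b = (quantize(c) for c in rgb)
    return chr(r * 16 + g * 4 + b + OFFSET)


def decode_pixel(char):
    """Invert encode_pixel; levels are rescaled to 0-255 (assumed: level * 85)."""
    if char == BACKGROUND:
        return (255, 255, 255)
    code = ord(char) - OFFSET
    r, g, b = code // 16, (code // 4) % 4, code % 4
    return (r * 85, g * 85, b * 85)


def encode_row(index, pixels):
    """Return the row-marker token followed by one character per pixel."""
    return [f"[ROW_{index:02d}]"] + [encode_pixel(p) for p in pixels]


print(encode_pixel((255, 0, 0)))  # k
print(decode_pixel("k"))          # (255, 0, 0)
```

Decoding is lossy by design: only 4 levels per channel survive, so any input value within a 64-wide bin maps back to the same output color.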

[Figure: original sprite and its ASCII representation]

GPT2 Architecture

  • Context length: 4096
  • Embedding dim: 512
  • Layers: 12
  • Attention heads: 8

Conditioning embeddings

Every token in the sequence receives a sum of learned embeddings that condition the generation:

| Embedding | Categories | Description |
|---|---|---|
| Pokémon identity | up to N | Unique embedding per Pokémon; can be interpolated to generate novel creatures |
| Type 1 | 19 | Primary type (18 types + unknown) |
| Type 2 | 20 | Secondary type (18 types + none + unknown) |
| Generation | 10 | Game generation (Gen I–IX + margin) |
| Evolution stage | 4 | Basic / Stage 1 / Stage 2 / other |
| Has evolution | 2 | Whether the Pokémon can still evolve |
| Is shiny | 2 | Normal vs. shiny palette |
| Color shift | 6 | Which RGB permutation was applied (augmentation label) |
| Row position | 65 | Which row (0–63) the current token belongs to (spatial 2-D encoding) |
| Column position | 65 | Which column (0–63) within the row (spatial 2-D encoding) |
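The idea of summing one learned embedding per feature can be sketched without any deep-learning framework. The example below is an illustration only: the tables are randomly initialised stand-ins for learned weights, the embedding width is shrunk to 8 (the model uses 512), the Pokémon identity embedding is omitted, and the feature names and `conditioning_vector` function are hypothetical.

```python
import random

EMBED_DIM = 8  # illustrative; the model's embedding dim is 512

# Category counts taken from the table above (identity embedding omitted).
SIZES = {
    "type1": 19, "type2": 20, "generation": 10, "evolution_stage": 4,
    "has_evolution": 2, "is_shiny": 2, "color_shift": 6, "row": 65, "column": 65,
}

rng = random.Random(0)
# One lookup table per conditioning feature (random stand-in for learned weights).
TABLES = {
    name: [[rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)] for _ in range(n)]
    for name, n in SIZES.items()
}


def conditioning_vector(features):
    """Sum the embedding selected by each feature index into one vector."""
    vec = [0.0] * EMBED_DIM
    for name, index in features.items():
        vec = [v + e for v, e in zip(vec, TABLES[name][index])]
    return vec


# Feature indices here are arbitrary; the real name-to-index mapping is not shown in this README.
features = {"type1": 1, "type2": 4, "generation": 3, "row": 0, "column": 0}
vec = conditioning_vector(features)
print(len(vec))  # 8
```

Because the embeddings are summed rather than concatenated, the conditioning vector keeps the same width as the token embedding regardless of how many features are supplied.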

During training, small Gaussian noise (σ = 0.1) is added to the conditioning vector to improve robustness. Background tokens (~) are also down-weighted (×0.6) in the loss so that the model focuses on learning colored pixels.
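Both training details can be illustrated with plain Python. This is a sketch under assumptions, not the training code: the loss is taken to be a per-token negative log-likelihood, normalisation by the sum of the weights is a guess, and the names `add_noise` and `weighted_nll` are hypothetical.

```python
import math
import random

BACKGROUND = "~"
BACKGROUND_WEIGHT = 0.6  # down-weighting factor from the text above
NOISE_SIGMA = 0.1        # noise standard deviation from the text above


def add_noise(vector, sigma=NOISE_SIGMA, rng=random):
    """Add independent Gaussian noise to each component of a conditioning vector."""
    return [v + rng.gauss(0.0, sigma) for v in vector]


def weighted_nll(target_tokens, target_probs):
    """Weighted mean negative log-likelihood over one sequence.

    target_probs[i] is the probability the model assigned to target_tokens[i].
    Background tokens contribute with weight 0.6, all others with weight 1.0.
    """
    weights = [BACKGROUND_WEIGHT if t == BACKGROUND else 1.0 for t in target_tokens]
    losses = [-math.log(p) for p in target_probs]
    # Assumed normalisation: divide by the total weight rather than the token count.
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


noisy = add_noise([0.0, 1.0, 2.0])
loss = weighted_nll(["~", "k", "~", "z"], [0.9, 0.5, 0.8, 0.25])
```

With this weighting, a mistake on a colored pixel costs more than the same mistake on a background pixel, which matters because sprites are mostly background.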

⚙️ Training

Platform RunPod
GPU NVIDIA RTX A4000 (16 GB VRAM)
CUDA 12.4
Steps 5 505
Training time ~53 hours
Cost $0.26 / hour · $10 total
Precision BF16
Optimizer AdamW with cosine LR scheduler
Gradient checkpointing

🙏 Acknowledgements

Inspired by matthewRayfield/pokemon-gpt-2, which first explored the idea of generating Pokémon sprites with GPT-2. This project builds on that concept with a custom-trained model, richer metadata conditioning (type, generation, evolution stage…) and a tokenizer designed specifically for sprite sequences.

Training data sourced from:

  • PokéAPI — comprehensive Pokémon REST API providing metadata (types, generations, evolution chains…) used to build the conditioning labels.
  • Veekun — sprite repository from which the original 64×64 PNG sprites were extracted and encoded.

📬 Contact

Made by ThinBaker — feel free to reach out!

✉️ Website thinbaker.com
🖥️ GitHub github.com/iamthinbaker
🐦 Twitter twitter.com/iamthinbaker
📊 LinkedIn linkedin.com/in/delgadopanadero
▶️ YouTube youtube.com/@iamthinbaker
