sahil-shubham/gha-runner-infra

Self-Hosted GitHub Actions Runner Controller (Dual Architecture)

This project deploys two self-hosted GitHub Actions Runner Controller (ARC) setups on AWS EC2, one for AMD64 (x86_64) and one for ARM64 (aarch64), using k3s as a lightweight Kubernetes distribution. Both instances are provisioned with ECR authentication and the IAM permissions required for container operations.

Overview

This project provides:

  • Infrastructure as Code (Terraform) to provision separate AMD64 and ARM64 AWS EC2 instances.
  • Automated instance initialization (user_data.sh.tftpl) setting up Docker, k3s, Kubernetes tools (kubectl, Helm), cloning this repository, and configuring maintenance cronjobs.
  • Kubernetes cluster setup using k3s on each instance.
  • GitHub Actions Runner Controller deployment using Helm, targetable to either cluster via a dedicated script (helm/deploy.sh).
  • Cronjobs for ECR authentication refresh and Docker cleanup.
  • IAM role configuration for ECR access shared by both instances.

Prerequisites

Before you begin, ensure you have the following installed and configured:

  • Terraform: A version compatible with the configuration (check terraform/terraform.tf).
  • AWS CLI: Configured with credentials able to create the EC2, IAM, S3, DynamoDB, and Secrets Manager resources used below.
  • jq: Used locally to extract the SSH private keys from Secrets Manager output.

You will also need:

  • GitHub Personal Access Tokens (PATs):
    • repo_clone_token: A PAT with repo scope (or fine-grained read access to this specific repository), required by Terraform during instance setup. See Infrastructure Provisioning for usage. Treat this as sensitive.
    • github_token: A fine-grained PAT for ARC itself, required after instance setup. See ARC Kubernetes Setup for the exact permissions it needs.

Infrastructure Provisioning (Terraform)

This section covers setting up the underlying AWS infrastructure using Terraform.

1. Set up Remote State Storage (Optional, Recommended)

If you haven't already configured Terraform remote state with S3 and DynamoDB, follow these steps. Remote state is strongly recommended for team collaboration and state locking.

Create an S3 bucket for Terraform state (a random suffix avoids global bucket-name collisions; adjust the region if needed):

RANDOM_SUFFIX=$(openssl rand -hex 4)
BUCKET_NAME="gha-runner-tfstate-${RANDOM_SUFFIX}"
# Replace us-east-1 with your desired AWS region.
# Note: for any region other than us-east-1, you must also add
#   --create-bucket-configuration LocationConstraint=<region>
aws s3api create-bucket --bucket "${BUCKET_NAME}" --region us-east-1
aws s3api put-bucket-versioning --bucket "${BUCKET_NAME}" --versioning-configuration Status=Enabled
echo "Created S3 bucket for Terraform state: ${BUCKET_NAME}"

Create DynamoDB table for state locking (match your S3 bucket region):

# Replace us-east-1 with your desired AWS region
aws dynamodb create-table \
    --table-name gha-runner-terraform-lock \
    --attribute-definitions AttributeName=LockID,AttributeType=S \
    --key-schema AttributeName=LockID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST \
    --region us-east-1

Important: Update terraform/terraform.tf to use these backend resources:

terraform {
  # ... other config ...
  backend "s3" {
    bucket         = "YOUR_BUCKET_NAME" # Replace with your bucket name
    key            = "terraform.tfstate"
    region         = "us-east-1"      # Replace with your region
    dynamodb_table = "gha-runner-terraform-lock"
  }
}

2. Define Terraform Variables

Before applying Terraform, you must provide values for:

  • ssh_cidr_blocks: Your IP address range(s) allowed for SSH access (e.g., ["YOUR_IP/32"]).
  • repository_url: The HTTPS URL of this Git repository (e.g., https://github.com/your-org/your-repo.git).
  • repo_clone_token: A GitHub PAT with repo scope (or fine-grained read access) needed by the user_data script to clone this repository onto the instances during initialization. Treat this token as sensitive.

You can provide these using a terraform.tfvars file in the terraform/ directory (recommended for non-sensitive values like ssh_cidr_blocks and repository_url). Example:

# terraform.tfvars
ssh_cidr_blocks = ["YOUR_IP/32"]
repository_url  = "https://github.com/your-org/your-repo.git"

For the sensitive repo_clone_token, prefer an environment variable (export TF_VAR_repo_clone_token='...') over committing it to a tfvars file.

Refer to terraform/variables.tf for all available variables and their default values.

3. Deploy Infrastructure

Navigate to the Terraform directory and run the following commands:

cd terraform

# Initialize Terraform (only needed on first run or after backend/module changes)
terraform init

# Apply the configuration
terraform apply

cd ..

This provisions two EC2 instances (one AMD64, one ARM64), IAM roles, security groups, and generates SSH key pairs stored securely in AWS Secrets Manager.

Instance Initialization (user_data.sh.tftpl) Summary

During instance startup, the user_data.sh.tftpl script automatically performs the following setup tasks:

  • Installs prerequisites (git, jq, cronie).
  • Installs and configures Docker.
  • Installs k3s (lightweight Kubernetes).
  • Waits for the k3s server node and API to be ready.
  • Copies the k3s kubeconfig to /home/ec2-user/.kube/config for kubectl access.
  • Installs Kubernetes tools: kubectl, Helm, and k9s.
  • Clones this Git repository (using the repository_url and repo_clone_token) into /home/ec2-user/<repo_name>.
  • Sets up cronjobs for ECR token refresh (refresh-token.sh) and Docker system prune (docker-prune.sh).
  • Ensures correct file permissions.

Logs for this process can be found on each instance at /var/log/user-data.log. (See Troubleshooting if issues occur).

Connecting to Instances

After terraform apply completes successfully:

1. Configure SSH Access

Retrieve the private SSH keys from AWS Secrets Manager and configure your local ~/.ssh/config for easy access.

AMD64 Instance:

INSTANCE_ARCH="amd64"
aws secretsmanager get-secret-value \
    --secret-id $(terraform output -raw ssh_secret_arn_${INSTANCE_ARCH}) \
    --query 'SecretString' \
    --output text | \
    jq -r '.private_key' > ~/.ssh/gha-runner-key-${INSTANCE_ARCH}.pem
chmod 600 ~/.ssh/gha-runner-key-${INSTANCE_ARCH}.pem

# Add/Update SSH config entry in ~/.ssh/config
cat << EOF >> ~/.ssh/config

Host github-runner-${INSTANCE_ARCH}
    HostName $(terraform output -raw public_ip_${INSTANCE_ARCH})
    User ec2-user
    IdentityFile ~/.ssh/gha-runner-key-${INSTANCE_ARCH}.pem
EOF
echo "SSH Config updated for github-runner-${INSTANCE_ARCH}"

ARM64 Instance:

INSTANCE_ARCH="arm64"
aws secretsmanager get-secret-value \
    --secret-id $(terraform output -raw ssh_secret_arn_${INSTANCE_ARCH}) \
    --query 'SecretString' \
    --output text | \
    jq -r '.private_key' > ~/.ssh/gha-runner-key-${INSTANCE_ARCH}.pem
chmod 600 ~/.ssh/gha-runner-key-${INSTANCE_ARCH}.pem

# Add/Update SSH config entry in ~/.ssh/config
cat << EOF >> ~/.ssh/config

Host github-runner-${INSTANCE_ARCH}
    HostName $(terraform output -raw public_ip_${INSTANCE_ARCH})
    User ec2-user
    IdentityFile ~/.ssh/gha-runner-key-${INSTANCE_ARCH}.pem
EOF
echo "SSH Config updated for github-runner-${INSTANCE_ARCH}"

2. Connect via SSH

You can now connect to the runners using the configured hostnames:

ssh github-runner-amd64
ssh github-runner-arm64

3. Access Kubernetes (kubectl)

The user_data script automatically configures kubectl on each instance to use the local k3s cluster via /home/ec2-user/.kube/config. Once connected via SSH, you can immediately use kubectl commands:

# On github-runner-amd64 or github-runner-arm64
kubectl get nodes
kubectl get pods -A

ARC Kubernetes Setup

This setup needs to be performed manually on each instance (AMD64 and ARM64) after they are running and you have connected via SSH.

1. Create GitHub PAT for ARC

Create a GitHub Fine-grained Personal Access Token (PAT) specifically for the Actions Runner Controller. This is different from the repo_clone_token used during infrastructure setup. (Refer back to Prerequisites for the general PAT requirements).

  • Go to GitHub Settings > Developer settings > Personal access tokens > Fine-grained tokens.
  • Click "Generate new token".
  • Set an appropriate name (e.g., arc-token-amd64) and expiration (e.g., 30 days; remember to rotate it before expiry).
  • Select the repository scope: "All repositories" or specific repositories ARC will manage runners for.
  • Set the following Repository permissions:
    • Actions: Read-only
    • Administration: Read & write (Needed for ARC to register/unregister runners)
    • Metadata: Read-only
  • If configuring runners at the Organization level (instead of repo), set these Organization permissions:
    • Self-hosted runners: Read & write

Generate and securely store the token. You will need it in the next step.

2. Create Kubernetes Secret (github-token)

On each instance (connect via ssh github-runner-amd64 and ssh github-runner-arm64 separately), create the Kubernetes secret containing the PAT you just generated.

# Run this command ON the github-runner-amd64 instance
kubectl create secret generic github-token \
  --namespace=default \
  --from-literal=github_token='YOUR_AMD64_GITHUB_PAT'

# Run this command ON the github-runner-arm64 instance
kubectl create secret generic github-token \
  --namespace=default \
  --from-literal=github_token='YOUR_ARM64_GITHUB_PAT'

Replace YOUR_AMD64_GITHUB_PAT and YOUR_ARM64_GITHUB_PAT with the actual token(s). You can use the same token for both if its scope allows, but using separate tokens might be preferable for auditing or rotation. The secret name github-token and namespace default are expected by the Helm charts (see helm/values/arc-runner-set-values.yaml).

Important: Remember to repeat this secret creation/update process whenever the PAT expires! (Currently every 30 days). See Maintenance section for rotation steps.

3. Initial ECR Authentication

The user_data script sets up a cronjob to refresh ECR credentials (detailed under Maintenance), but the Docker configuration file might not exist immediately after setup. To ensure the ARC runner pods can pull images from ECR right away, manually run the refresh script once on each instance:

# Run this command ON the github-runner-amd64 instance
cd ~/gha-runner-infra 
./refresh-token.sh

# Run this command ON the github-runner-arm64 instance
cd ~/gha-runner-infra
./refresh-token.sh

This script uses the instance's IAM role to get ECR credentials and store them in /home/ec2-user/.docker/config.json, which the runner pods will use.
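The core of that refresh logic can be sketched as follows. This is a hedged approximation, not the repo's actual refresh-token.sh: the function name, region, and registry construction here are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of ECR credential refresh logic; the repo's refresh-token.sh may differ.
set -euo pipefail

# Build a Docker config.json body for a given registry and password.
# ECR uses HTTP basic auth with username "AWS", so the auth field is
# base64("AWS:<password>").
build_docker_config() {
  local registry="$1" password="$2"
  local auth
  auth=$(printf 'AWS:%s' "$password" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$registry" "$auth"
}

# On the instance, the password comes from the IAM role via the AWS CLI:
#   REGION="us-east-1"   # illustrative; use your region
#   ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
#   REGISTRY="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
#   build_docker_config "$REGISTRY" "$(aws ecr get-login-password --region "$REGION")" \
#     > /home/ec2-user/.docker/config.json
```

Because the ECR password expires after 12 hours, the 9-hour cron schedule keeps the stored credentials valid with margin to spare.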

ARC Deployment (Helm)

With the infrastructure and Kubernetes secrets ready (see ARC Kubernetes Setup), deploy the Actions Runner Controller components using the provided Helm script. This script must be run on each instance to deploy the components specific to that architecture's cluster.

Using helm/deploy.sh

Navigate to the Helm directory within the cloned repository on the instance and execute the script.

# Run ON the target instance (e.g., github-runner-amd64)
cd ~/gha-runner-infra/helm # Adjust path if your repo name is different

# --- Deployment Options ---

# Option 1: Deploy BOTH ARC Controller and Runner Set (Recommended)
# This is the default behavior if no arguments are given.
./deploy.sh

# Option 2: Deploy ONLY the ARC Controller
# Useful if you only want to update the controller itself.
# ./deploy.sh -r arc

# Option 3: Deploy BOTH ARC Controller and Runner Set (Explicit)
# Functionally the same as Option 1. The script ensures the controller
# is deployed/updated before the runner set.
# ./deploy.sh -r arc-runner-set

# --- Check Deployment ---
kubectl get pods -n arc-system # Check controller pods
kubectl get pods -n default    # Check runner-set pods (will scale based on demand)

The script performs the following actions:

  • Auto-detects the instance architecture (amd64/arm64).
  • Uses the kubeconfig at /home/ec2-user/.kube/config.
  • Verifies the github-token secret exists in the default namespace.
  • Uses helm upgrade --install to deploy/update:
    • The ARC Controller (gha-runner-scale-set-controller) to the arc-system namespace.
    • The ARC Runner Set (gha-runner-scale-set) to the default namespace.
  • Appends the architecture to the Helm release name (e.g., arc-amd64, arc-runner-set-arm64).
  • Uses values from helm/values/arc-values.yaml and helm/values/arc-runner-set-values.yaml.
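The architecture auto-detection step can be sketched like this. It is a hypothetical approximation; the actual deploy.sh may map uname output differently.

```shell
#!/usr/bin/env bash
# Sketch of architecture auto-detection; the repo's deploy.sh may differ.
set -euo pipefail

# Map `uname -m` output to the amd64/arm64 naming used in the Helm release names.
detect_arch() {
  case "$1" in
    x86_64)  echo "amd64" ;;
    aarch64) echo "arm64" ;;
    *)       echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage on the instance:
#   ARCH=$(detect_arch "$(uname -m)")
#   helm upgrade --install "arc-${ARCH}" ...            # controller
#   helm upgrade --install "arc-runner-set-${ARCH}" ... # runner set
```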

Run the ./deploy.sh command on both the github-runner-amd64 and github-runner-arm64 instances.

Customizing ARC Behavior

You can modify the behavior of the ARC controller and runner sets by editing the YAML files in the helm/values/ directory before running ./deploy.sh. Key files:

  • helm/values/arc-values.yaml: Configuration for the controller itself.
  • helm/values/arc-runner-set-values.yaml: Configuration for the runner pods, including:
    • githubConfigUrl: Target repository or organization URL.
    • template.spec.containers[0].image: Runner container image.
    • minRunners, maxRunners: Scaling parameters.
    • template.spec.labels: Labels applied to runner pods (used in runs-on).
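For orientation, a fragment of arc-runner-set-values.yaml touching the keys above might look like the following. This is an illustrative sketch only: the URLs, image, and numbers are placeholders, and your actual file in helm/values/ is authoritative.

```yaml
# Illustrative fragment; compare against the real helm/values/arc-runner-set-values.yaml.
githubConfigUrl: "https://github.com/your-org/your-repo"  # target repo or org
minRunners: 0   # kept at 0 so new runners pick up fresh ECR credentials
maxRunners: 5   # illustrative upper bound
template:
  spec:
    labels:     # matched by runs-on in workflows (arch suffix is appended)
      - arc-runner-set
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest  # placeholder image
```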

Usage in GitHub Actions Workflows

To target a specific architecture in your GitHub Actions workflows, use the runs-on key with labels that match the runner configuration. The deploy.sh script and default values (customizable as noted in ARC Deployment (Helm)) configure the runner set with labels including the architecture.

Example workflow targeting both architectures:

name: Build Multi-Arch Project

on: push

jobs:
  build-amd64:
    # Matches labels defined in helm/values/arc-runner-set-values.yaml + arch
    runs-on: arc-runner-set-amd64
    steps:
      - uses: actions/checkout@v4
      - name: Build for AMD64
        run: echo "Building on $(uname -m)..."
      # ... other AMD64 steps ...

  build-arm64:
    # Matches labels defined in helm/values/arc-runner-set-values.yaml + arch
    runs-on: arc-runner-set-arm64
    steps:
      - uses: actions/checkout@v4
      - name: Build for ARM64
        run: echo "Building on $(uname -m)..."
      # ... other ARM64 steps ...

Ensure the runs-on labels exactly match those defined in your Helm values (template.spec.labels in arc-runner-set-values.yaml) combined with the architecture suffix (e.g., -amd64 or -arm64) automatically added by the runner set configuration.

Maintenance

The setup includes automated maintenance tasks configured via cronjobs on each instance.

ECR Token Refresh

  • Script: /home/ec2-user/<repo_name>/refresh-token.sh
  • Schedule: Runs every 9 hours via cron.
  • Action: Uses the instance's IAM role to fetch fresh AWS ECR login credentials and updates /home/ec2-user/.docker/config.json.
  • Log: /home/ec2-user/logs/cron-ecr-refresh.log
  • Purpose: Ensures runner pods can continuously pull images from ECR.

Docker System Cleanup

  • Script: /home/ec2-user/<repo_name>/docker-prune.sh
  • Schedule: Runs daily at 4:00 AM via cron.
  • Action: Executes docker system prune -af to remove unused containers, networks, volumes, and images.
  • Log: /home/ec2-user/logs/cron-docker-prune.log (includes disk usage before/after).
  • Purpose: Prevents the instance disk from filling up with unused Docker resources.

Log directory (/home/ec2-user/logs/) is created automatically during instance initialization.
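Based on the schedules above, the installed crontab entries presumably look something like this (illustrative only; the exact entries are written by user_data.sh.tftpl and may differ):

```
# Illustrative crontab; actual entries are generated during instance init.
0 */9 * * * /home/ec2-user/gha-runner-infra/refresh-token.sh >> /home/ec2-user/logs/cron-ecr-refresh.log 2>&1
0 4 * * *   /home/ec2-user/gha-runner-infra/docker-prune.sh  >> /home/ec2-user/logs/cron-docker-prune.log 2>&1
```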

GitHub PAT Rotation

This process needs to be performed on each instance (amd64 and arm64) when the GitHub PAT used for the github-token secret is about to expire. Failure to rotate the token will prevent ARC from registering new runners.

Steps (perform on each instance via SSH):

  1. Generate New PAT: Create a new GitHub Fine-grained PAT with the required permissions, as described in the ARC Kubernetes Setup section.
  2. Set Architecture Variable: Define the architecture for the current instance to simplify commands:
    # On amd64 instance:
    INSTANCE_ARCH="amd64"
    # On arm64 instance:
    # INSTANCE_ARCH="arm64"
  3. Uninstall ARC Components: Uninstall the runner set and the controller using Helm. This ensures they stop using the old token.
    helm uninstall arc-runner-set-${INSTANCE_ARCH} -n default
    helm uninstall arc-${INSTANCE_ARCH} -n arc-system
  4. Delete Old Secret: Remove the existing Kubernetes secret.
    kubectl delete secret github-token -n default
  5. Create New Secret: Create the secret again using the new PAT.
    # Replace YOUR_NEW_PAT with the token generated in step 1
    kubectl create secret generic github-token \
      --namespace=default \
      --from-literal=github_token='YOUR_NEW_PAT'
  6. Redeploy ARC: Use the deployment script to reinstall the controller and runner set, which will now use the new secret.
    cd ~/gha-runner-infra/helm # Adjust path if your repo name is different
    ./deploy.sh
  • Frequency: Depends on the expiration date set for the PAT (e.g., currently every 30 days).

Important Note About Runner Configuration

The minRunners setting in helm/values/arc-runner-set-values.yaml is intentionally kept at 0 by default. This ensures that new runners start with the latest Docker configuration containing fresh ECR credentials fetched by the refresh-token.sh cronjob. Setting minRunners higher could lead to idle runners with stale ECR credentials, potentially causing image pull failures. The small delay in spinning up new runners from zero is generally preferable to authentication issues.

Troubleshooting

Common issues and potential solutions:

  1. Instance Initialization Failures:
    • Check the user data log: ssh github-runner-<arch> 'cat /var/log/user-data.log'
    • Look for errors related to package installation, k3s setup, Docker, or repository cloning. Check TF_VAR_repo_clone_token validity if cloning fails.
  2. ARC Controller or Runner Pods Not Starting/Crashing:
    • Check pod status and events: ssh github-runner-<arch> 'kubectl get pods -n <namespace>' (use arc-system for controller, default for runners)
    • Describe the failing pod: ssh github-runner-<arch> 'kubectl describe pod <pod-name> -n <namespace>'
    • Check pod logs: ssh github-runner-<arch> 'kubectl logs <pod-name> -n <namespace>'
    • Verify the github-token secret exists and is valid: ssh github-runner-<arch> 'kubectl get secret github-token -n default' (The secret data won't be shown, but its existence is key. Ensure the PAT hasn't expired or been revoked).
  3. Cannot SSH to Instance:
    • Verify Security Group rules allow SSH from your IP (ssh_cidr_blocks in Terraform).
    • Ensure your local ~/.ssh/config has the correct HostName (public IP from terraform output) and IdentityFile.
    • Check permissions on the private key file: chmod 600 ~/.ssh/gha-runner-key-<arch>.pem.
  4. Disk Space Issues:
    • Check current disk usage: ssh github-runner-<arch> 'df -h'
    • Verify the Docker prune cronjob is running and check its log: /home/ec2-user/logs/cron-docker-prune.log.
    • Manually run prune if needed: ssh github-runner-<arch> 'docker system prune -af'
    • Consider increasing the EBS volume size in terraform/main.tf if persistent space issues occur.
  5. k3s Service Issues:
    • Check k3s service status: ssh github-runner-<arch> 'sudo systemctl status k3s'
    • Check k3s logs: ssh github-runner-<arch> 'sudo journalctl -u k3s'
