How to Avoid Cold Starts in Rental GPU Dev Env [Hybrid Volumes Guide]

Originally published on LinkedIn (September 7, 2025) under the full title "How to Avoid Cold Starts in Rental GPU Dev Env [Hybrid Volumes Guide - Runpod.io + Vast.ai]".

These days it's common to need powerful infrastructure, especially GPUs. Many of you might own a gaming laptop or workstation, but your (or my) 40- or 50-series card, even an xx90 model, is no match for H100s and H200s.

Of course, the solution isn't spending $30K on a GPU for home use. Instead, you turn to cloud or marketplace services. But there's a common challenge when doing so: The Cold Start.

What is a Cold Start?

A cold start is the delay between requesting an instance and being able to do useful work on it. On rented GPUs it commonly comes from image pulls, pip/conda installs, dataset copies, and model weight loads.
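
To see which of these phases dominates on a given instance, it can help to time them before optimizing anything. The snippet below is a minimal sketch, not part of the original setup: `phase`, `timings` and `report` are hypothetical helper names, and the `time.sleep` calls stand in for real work such as installing packages or loading weights.

```python
import time
from contextlib import contextmanager

# Collected (phase name, seconds) pairs, in the order the phases ran.
timings: list[tuple[str, float]] = []


@contextmanager
def phase(name: str):
    """Time one startup phase and record it, even if the phase raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.append((name, time.perf_counter() - start))


def report() -> str:
    """Return one line per phase, slowest first."""
    rows = sorted(timings, key=lambda item: item[1], reverse=True)
    return "\n".join(f"{name:<16}{seconds:8.2f}s" for name, seconds in rows)


if __name__ == "__main__":
    # Stand-ins for real work; replace with your own startup steps.
    with phase("install_deps"):
        time.sleep(0.02)
    with phase("load_weights"):
        time.sleep(0.05)
    print(report())
```

Wrapping each step of an onstart script this way shows quickly whether the image pull, the installs, or the weight load deserves attention first.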

You obviously want short cold starts without paying around the clock for expensive GPU time. Below I'll give a pragmatic, somewhat provider-agnostic architecture you may find useful. To keep it practical I need concrete examples, so I'll be using Runpod.io (network persistence) and Vast.ai (local volumes), plus a list of other options and trade-offs.

TL;DR

Keep state (code, dependencies, models/artifacts, caches) in fast persistent storage and pre-baked container images, use a little automation to sync snapshots between providers (or to S3/MinIO), and use provider APIs and onstart scripts to pre-warm at startup. Runpod network volumes help on the Runpod side; Vast volumes are local to the host and need syncing/cloning or a central object store to be truly portable.

Why these pieces?

Pre-baked images, on-disk model artifacts, and lazy loading/prewarming significantly reduce cold-start time, because dependencies and weights are already in place when the container starts. Baking and lazy loading are industry best practices.
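
As a rough illustration of lazy loading plus prewarming, the sketch below caches an artifact the first time it is requested and offers a helper that touches every artifact once at startup. It assumes the artifacts are plain files on the persistent volume; `get_weights` and `prewarm` are hypothetical names, and reading raw bytes stands in for whatever loader your framework actually uses.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def get_weights(path: str) -> bytes:
    """Read an artifact from disk on first use; repeat calls return the cached bytes."""
    return Path(path).read_bytes()


def prewarm(paths: list[str]) -> int:
    """Load each artifact once at startup and return the total bytes read."""
    return sum(len(get_weights(p)) for p in paths)
```

Calling `prewarm([...])` from an onstart script moves the load cost to boot time, while code that never touches a given artifact never pays for it.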

Concrete step-by-step (Runpod ⇄ Vast.ai)

1) Prepare immutable artifacts

2) Runpod.io side (fast persistent & reusable)

3) Vast.ai side (spot/cheap offers)

4) Keep Runpod and Vast in sync (workflow patterns)

You have two common strategies:

A — Central S3/MinIO canonical storage (recommended)

B — Try to reuse provider-local volumes when possible

Automation flow example (GitHub Action / tiny controller)
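
One possible shape for such a tiny controller is sketched below: restore the workspace, prewarm, run the task, and always sync results back. This is an assumption about the flow rather than a prescribed implementation. The script paths match the sample repo layout shown later in this article, while `run_job`, `run_checked` and `train.py` are hypothetical names.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def run_job(task_cmd: Sequence[str], run: Runner = run_checked) -> None:
    """Restore state, prewarm, run the task, then sync results back."""
    # Restore happens outside the try block: if it fails, nothing is
    # synced back, so a half-restored workspace never overwrites S3.
    run(["bash", "scripts/sync_from_s3.sh"])
    try:
        run(["python", "scripts/prewarm.py"])
        run(task_cmd)
    finally:
        # Runs even when the task fails, so partial work is not lost.
        run(["bash", "scripts/sync_to_s3.sh"])
```

A call such as `run_job(["python", "train.py"])` would run the whole cycle; passing a different `run` callable makes the flow easy to dry-run without touching S3.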

Example: basic scripts & commands (sketches)

Vast: create a volume & instance (conceptual)

# create a volume (via vast CLI)
vastai create volume <offer-id> size_in_gb name=mywork
 
# create instance referencing that volume
# sample flags - use current Vast CLI docs for exact flags,
# including the one that attaches the volume
vastai create instance <offer-id> --image myorg/dev-pytorch:cuda \
  --env '-e VAST_S3_BUCKET=my-bucket' --disk 30

Runpod: use network volume and API

# configure runpodctl with the API key held in the environment
runpodctl config --apiKey "$RUNPOD_API_KEY"
 
# create pod (conceptual JSON) referencing networkVolume
# Use runpod docs/console to attach network volume and schedule/predict jobs

Sync to S3 (both sides)

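# from instance before shutdown
# exclude the same paths you skip on restore, otherwise --delete removes them from S3
aws s3 sync /data s3://my-bucket/projectX/working/ --delete --exclude 'big-unused/*'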
 
# on startup
aws s3 sync s3://my-bucket/projectX/working/ /data --exclude 'big-unused/*'

Other known options / alternatives (pros & cons)

Caveats & tips (practical)

More Automation

So far so good...

Now let's go a step further and build a GitHub Actions workflow that automates:

This way you'll actually be implementing the following flow:

Prerequisites

A sample repo structure

.
├── .github/workflows/
│   └── gpu-job.yml      # automation workflow
├── scripts/
│   ├── prewarm.py       # warmup (load model once)
│   ├── start_runpod.py  # Python script to call Runpod API
│   ├── start_vast.py    # Python script to call Vast API
│   ├── sync_to_s3.sh    # sync workspace -> S3
│   └── sync_from_s3.sh  # restore workspace from S3
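
The tree lists `scripts/prewarm.py` without showing it. What belongs there depends on your framework, so the following is only a sketch of one framework-independent option: reading the model files once so they land in the OS page cache before the first real load. It swaps in a file-level warm-up for "load model once"; `MODEL_DIR` and the `/workspace/models` default are assumptions, not part of the original setup.

```python
import os
from pathlib import Path

CHUNK = 16 * 1024 * 1024  # read in 16 MiB pieces to keep memory use flat


def warm_file(path: Path) -> int:
    """Read a file front to back and return the number of bytes read."""
    total = 0
    with path.open("rb") as fh:
        while chunk := fh.read(CHUNK):
            total += len(chunk)
    return total


def warm_tree(root: Path) -> int:
    """Warm every regular file under root and return the total bytes read."""
    return sum(warm_file(p) for p in sorted(root.rglob("*")) if p.is_file())


if __name__ == "__main__":
    # MODEL_DIR is a hypothetical variable; point it at your weights.
    root = Path(os.getenv("MODEL_DIR", "/workspace/models"))
    if root.is_dir():
        print(f"warmed {warm_tree(root)} bytes under {root}")
    else:
        print(f"nothing to warm: {root} does not exist")
```

If your serving code keeps the model in a long-lived process, loading it there directly is usually the better warm-up; this variant mainly helps when the loader runs later in a separate process.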

Example GitHub Actions workflow

name: GPU Job Automation
 
on:
  workflow_dispatch:
    inputs:
      provider:
        description: "GPU provider (runpod or vast)"
        required: true
        default: "runpod"
      task:
        description: "Task to run (train, predict, dev)"
        required: true
        default: "train"
 
jobs:
  gpu:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
 
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
 
      - name: Install dependencies
        run: pip install requests boto3
 
      - name: Start GPU instance
        run: |
          # Inputs arrive as environment variables instead of being spliced
          # into the script text, which avoids shell injection via inputs.
          if [ "$PROVIDER" = "runpod" ]; then
            python scripts/start_runpod.py --task "$TASK"
          else
            python scripts/start_vast.py --task "$TASK"
          fi
        env:
          PROVIDER: ${{ github.event.inputs.provider }}
          TASK: ${{ github.event.inputs.task }}
          RUNPOD_API_KEY: ${{ secrets.RUNPOD_API_KEY }}
          VAST_API_KEY: ${{ secrets.VAST_API_KEY }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}

Example Runpod starter script (sketch: scripts/start_runpod.py)

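import argparse
import os

import requests

RUNPOD_ENDPOINT = "https://api.runpod.io/graphql"

# Input type and field names follow Runpod's published GraphQL examples;
# verify them against the current API reference before relying on this.
LAUNCH_POD = """
mutation LaunchPod($input: PodFindAndDeployOnDemandInput!) {
  podFindAndDeployOnDemand(input: $input) {
    id
    imageName
    machineId
  }
}
"""

def start_pod(task: str) -> dict:
    # Forward the task name and AWS credentials as key/value pairs.
    # os.environ[...] fails loudly instead of passing the string "None".
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")
    env = [{"key": "TASK", "value": task}]
    env += [{"key": n, "value": os.environ[n]} for n in names]
    variables = {
        "input": {
            "cloudType": "ALL",
            "gpuCount": 1,
            "gpuTypeId": "NVIDIA A100 80GB PCIe",  # example id; query gpuTypes to confirm
            "name": f"dev-{task}",
            "imageName": "myorg/dev-pytorch:cuda",
            "volumeInGb": 20,
            "containerDiskInGb": 20,
            "volumeMountPath": "/workspace",
            "env": env,
            # dockerArgs carries the container start command in Runpod's examples
            "dockerArgs": "bash -c 'bash /workspace/scripts/sync_from_s3.sh && python /workspace/scripts/prewarm.py'",
        }
    }
    headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}
    r = requests.post(RUNPOD_ENDPOINT, json={"query": LAUNCH_POD, "variables": variables},
                      headers=headers, timeout=60)
    r.raise_for_status()
    body = r.json()
    if body.get("errors"):  # GraphQL can report failures in the body with HTTP 200
        raise RuntimeError(body["errors"])
    return body["data"]["podFindAndDeployOnDemand"]

if __name__ == "__main__":
    # --task matches the workflow's call; TASK from the environment is the fallback.
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", default=os.getenv("TASK", "train"))
    print(start_pod(parser.parse_args().task))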

Example Vast starter script (sketch: scripts/start_vast.py)

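import argparse
import os

import requests

# Endpoint, verb and payload mirror what the vastai CLI sends for
# `create instance`; verify against the current Vast API docs.
VAST_API = "https://console.vast.ai/api/v0"

def start_instance(task: str, offer_id: int) -> dict:
    payload = {
        "client_id": "me",
        "image": "myorg/dev-pytorch:cuda",
        "disk": 30,
        "onstart": "bash /workspace/scripts/sync_from_s3.sh && python /workspace/scripts/prewarm.py",
        "env": {
            "TASK": task,
            "AWS_ACCESS_KEY_ID": os.environ["AWS_ACCESS_KEY_ID"],
            "AWS_SECRET_ACCESS_KEY": os.environ["AWS_SECRET_ACCESS_KEY"],
            "AWS_DEFAULT_REGION": os.environ["AWS_DEFAULT_REGION"],
        },
    }
    # The API key authenticates the request; it does not belong in the payload.
    headers = {"Authorization": f"Bearer {os.environ['VAST_API_KEY']}"}
    r = requests.put(f"{VAST_API}/asks/{offer_id}/", json=payload, headers=headers, timeout=60)
    r.raise_for_status()
    return r.json()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", default=os.getenv("TASK", "train"))
    # Minimal example; in reality you'd query offers first
    parser.add_argument("--offer-id", type=int, default=123456)
    args = parser.parse_args()
    print(start_instance(args.task, args.offer_id))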
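
The Vast script hardcodes an offer id and notes that you would normally query offers first. Once you have a list of offers (from the CLI or API), choosing one can be as simple as the sketch below. The field names `gpu_name`, `dph_total` (dollars per hour) and `reliability2` are assumptions about the offer records, so check them against the actual response; `pick_offer` is a hypothetical helper.

```python
from typing import Optional


def pick_offer(offers: list[dict], gpu_name: str, max_dph: float,
               min_reliability: float = 0.95) -> Optional[dict]:
    """Return the cheapest offer that matches the GPU, price cap and reliability floor."""
    candidates = [
        o for o in offers
        if o.get("gpu_name") == gpu_name
        and o.get("dph_total", float("inf")) <= max_dph
        and o.get("reliability2", 0.0) >= min_reliability
    ]
    # min() with default=None avoids a ValueError when nothing qualifies.
    return min(candidates, key=lambda o: o["dph_total"], default=None)
```

The `id` of the returned offer would then replace the hardcoded `offer_id`; a `None` result is a signal to relax the price cap or try another provider.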

Example Sync scripts (sketch)

scripts/sync_from_s3.sh:

#!/bin/bash
set -e
aws s3 sync s3://my-bucket/projectX/working/ /workspace/ --exclude 'big-unused/*'

scripts/sync_to_s3.sh:

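#!/bin/bash
set -e
# Exclude the paths that sync_from_s3.sh skips; otherwise --delete would
# remove them from S3 because they are absent from the local workspace.
aws s3 sync /workspace/ s3://my-bucket/projectX/working/ --delete --exclude 'big-unused/*'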

No matter how expensive GPUs get, our most valuable asset remains time — and I hope these techniques help you save it.