PMetal: Fine-Tune LLMs on Your Mac — No Cloud Required
Professional-grade large language model fine-tuning that runs entirely on Apple Silicon. Your data stays on your machine, your models stay under your control, and your costs stay at the price of electricity.
What Is PMetal?
PMetal is a local LLM fine-tuning platform built specifically for Apple Silicon. It gives developers, researchers, and businesses the tools to train and adapt large language models entirely on their own hardware — no cloud accounts, no API keys, no data leaving the building.
The name is a nod to Metal, Apple's GPU compute framework that powers everything under the hood. PMetal harnesses the unified memory architecture of M-series chips to make fine-tuning workloads that once required expensive cloud GPUs practical on hardware you already own.
Key Capabilities at a Glance
- LoRA / QLoRA / DoRA: all major parameter-efficient fine-tuning methods, ready to run
- GRPO reasoning training: Group Relative Policy Optimization for training reasoning models
- 20+ model architectures: Llama, Mistral, Gemma, Phi, Qwen, and more out of the box
- Desktop GUI: point-and-click fine-tuning with real-time loss curves and progress
- TUI & CLI: fully scriptable terminal interfaces for automation and CI workflows
- Python SDK: integrate fine-tuning directly into your existing ML pipelines
The Problem: Cloud Fine-Tuning Is Expensive and Risky
For the past few years, fine-tuning a language model has meant one thing: renting cloud GPUs. A modest training run on a capable model could easily cost hundreds of dollars. A production-quality fine-tune with hyperparameter sweeps could run into the thousands. And that cost repeats every time you want to experiment, iterate, or retrain on fresh data.
Cost is only half the story. The other half is data. When you send your training data to a cloud provider, you lose control of it. For businesses handling customer information, proprietary knowledge bases, internal documents, or anything governed by HIPAA, GDPR, SOC 2, or internal compliance policies, sending that data off-premises may be unlawful, a breach of contract, or simply unacceptable.
The result has been a two-tier AI landscape: large organizations with dedicated ML infrastructure can fine-tune models on private data, while everyone else either uses generic public models or accepts the privacy trade-off of cloud training. PMetal exists to close that gap.
Cloud Fine-Tuning Pain Points
- GPU rental costs add up fast — especially during experimentation phases
- Your training data is uploaded to and processed on third-party infrastructure
- Compliance teams may prohibit sending sensitive data outside the organization
- Vendor lock-in: model weights and training artifacts live in someone else's storage
- Internet dependency creates latency, availability risk, and bandwidth costs
- No reproducibility guarantees if the provider changes their environment
The Solution: Run Everything Locally on Apple Silicon
Apple Silicon changed the calculus for local ML workloads. The M-series unified memory architecture means your CPU and GPU share the same high-bandwidth memory pool. A Mac Studio with 192GB of unified memory can comfortably hold and train models that would require A100-class cloud hardware — at a one-time hardware cost rather than an ongoing rental fee.
PMetal is built from the ground up to exploit this architecture. It uses Metal Performance Shaders for GPU-accelerated matrix operations, implements memory-efficient quantization to pack larger models into available RAM, and provides parameter-efficient fine-tuning methods (LoRA, QLoRA, DoRA) that reduce the memory footprint of training to a fraction of full fine-tuning.
The result is a complete, self-contained fine-tuning environment that installs on your Mac in minutes. Your training data never leaves your machine. Your model weights are stored wherever you choose. Your results are reproducible because nothing about your environment changes between runs.
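The unified-memory claim above can be sanity-checked with back-of-envelope arithmetic: a model's weight footprint is roughly parameter count times bytes per parameter. The sketch below is illustrative arithmetic only, not PMetal's internal memory accounting, and it ignores activations and KV cache.

```python
# Rough weight footprint: params * bits-per-param / 8.
# Illustrative estimate only -- not a PMetal benchmark.

def weight_footprint_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    return n_params * bits_per_param / 8 / 1e9

for n_params, label in [(7e9, "7B"), (13e9, "13B"), (70e9, "70B")]:
    fp16 = weight_footprint_gb(n_params, 16)
    q4 = weight_footprint_gb(n_params, 4)
    print(f"{label}: fp16 ≈ {fp16:.1f} GB, 4-bit ≈ {q4:.1f} GB")
```

A 70B model is about 140 GB at fp16 and about 35 GB at 4-bit, which is why a 192GB Mac Studio can hold either, and why quantization brings mid-size models within reach of laptop-class memory.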
Fine-Tuning Methods: LoRA, QLoRA, DoRA, and GRPO
PMetal supports the full spectrum of modern parameter-efficient fine-tuning techniques, letting you choose the right trade-off between quality, speed, and memory usage for your specific use case.
LoRA (Low-Rank Adaptation)
The gold standard for efficient fine-tuning. LoRA injects small trainable rank decomposition matrices into the model's attention layers, allowing the base model weights to remain frozen while only a tiny fraction of parameters are updated. Ideal for task-specific adaptation with minimal compute.
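The core idea fits in a few lines of NumPy: the frozen weight `W` is augmented with a trainable low-rank product `B @ A`. Shapes, the `alpha / r` scaling, and the zero-init of `B` follow the original LoRA formulation; this is an illustrative sketch, not PMetal's implementation.

```python
import numpy as np

# LoRA sketch: frozen weight W plus a trainable low-rank update B @ A.
# Illustrative only -- shapes and names are not PMetal's API.

d_out, d_in, r, alpha = 512, 512, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, rank r
B = np.zeros((d_out, r))                    # trainable, zero-init

def lora_forward(x):
    # y = W x + (alpha / r) * B A x  -- only A and B receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)

full = W.size            # parameters updated by full fine-tuning
lora = A.size + B.size   # parameters updated by LoRA
print(f"trainable fraction: {lora / full:.2%}")
```

Because `B` starts at zero, the adapted model is exactly the base model at step one, and here only about 3% of the layer's parameters are trainable; at realistic model scale the fraction is far smaller still.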
QLoRA (Quantized LoRA)
LoRA applied to a 4-bit quantized base model. QLoRA dramatically reduces the memory footprint of fine-tuning, making it possible to train 7B and 13B parameter models on hardware with 16–24GB of unified memory. The quality-to-resource ratio is exceptional.
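That memory claim can be checked with rough arithmetic: only the frozen 4-bit base weights plus the small adapters (and the adapters' optimizer state) need to reside in memory during training. The estimate below is illustrative, excludes activations and KV cache, and is not a PMetal benchmark; the 20M-adapter figure is an assumed typical LoRA configuration.

```python
def qlora_budget_gb(n_params, lora_params, base_bits=4,
                    adapter_bytes=2, optimizer_bytes=8):
    """Rough QLoRA training footprint in GB: 4-bit frozen base
    + fp16 adapters + Adam-style optimizer state for the adapters.
    Activations excluded; illustrative arithmetic only."""
    base = n_params * base_bits / 8
    adapters = lora_params * (adapter_bytes + optimizer_bytes)
    return (base + adapters) / 1e9

# 7B base model with an assumed ~20M trainable LoRA parameters
print(f"{qlora_budget_gb(7e9, 20e6):.1f} GB")
```

Under these assumptions the weights-plus-adapters budget for a 7B model is under 4 GB, which leaves generous headroom for activations on a 16GB machine and explains why 13B models remain feasible in the 16–24GB range.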
DoRA (Weight-Decomposed Low-Rank Adaptation)
A refinement of LoRA that decomposes pre-trained weights into magnitude and direction components and applies the low-rank update only to the directional component. In its paper's evaluations, DoRA outperforms LoRA at equivalent trainable-parameter counts.
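The magnitude/direction split can be sketched directly: the pretrained weight is re-expressed as a per-column magnitude vector times a normalized direction, and the LoRA update lives only inside the directional term. This follows the DoRA paper's formulation; it is an illustration, not PMetal's code.

```python
import numpy as np

# DoRA sketch: W' = m * (W + B A) / ||W + B A||_c, where ||.||_c is the
# column-wise norm and m is initialized to ||W||_c.
# Illustrative only -- not PMetal's implementation.

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
m = np.linalg.norm(W, axis=0, keepdims=True)  # trainable magnitude, init ||W||_c
A = rng.standard_normal((r, d_in)) * 0.01     # trainable low-rank pair
B = np.zeros((d_out, r))

def dora_weight():
    V = W + B @ A                              # direction updated via LoRA
    return m * V / np.linalg.norm(V, axis=0, keepdims=True)
```

With `B` zero-initialized, the merged weight starts exactly at `W`; training then adjusts length (`m`) and orientation (`B @ A`) separately, which is the decoupling DoRA credits for its gains.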
GRPO (Group Relative Policy Optimization)
PMetal is one of the first local fine-tuning tools to support GRPO, the reinforcement learning technique used to train reasoning-capable models. GRPO lets you train models that reason step-by-step, improving performance on math, coding, and logic tasks without the separate critic (value) model that PPO-style training requires.
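The "group relative" part replaces a learned value baseline: for each prompt, a group of completions is sampled, and each completion's advantage is its reward standardized against the group's mean and standard deviation. A minimal sketch of that advantage computation (illustrative, not PMetal's trainer):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: standardize each completion's reward
    against its own group's mean and standard deviation.
    Sketch of the GRPO baseline, not PMetal's trainer."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard: all-equal group
    return [(r - mu) / sigma for r in rewards]

# One prompt, a group of 4 sampled completions scored by a rule-based
# reward (e.g. 1.0 if the final answer is correct, else 0.0):
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Completions that beat their group average are reinforced and the rest are penalized, so a simple programmatic reward (a math checker, a unit test) is enough to drive reasoning training.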
Four Ways to Work with PMetal
PMetal is designed to fit into your workflow, not the other way around. Whether you prefer a graphical interface, the terminal, or Python scripting, there is a first-class PMetal experience for you.
Desktop GUI
A native macOS application with a clean visual interface for loading datasets, configuring training runs, and monitoring progress in real time. Loss curves, memory utilization, and estimated completion times are all visible at a glance. No command-line experience required.
Terminal UI (TUI)
A rich terminal interface for those who live in the command line. Keyboard-driven navigation, side-by-side training monitors, and full feature parity with the desktop GUI — all without leaving your terminal session.
CLI
A composable command-line interface designed for scripting and automation. Chain PMetal commands in shell scripts, integrate with Makefiles, or trigger fine-tuning jobs as part of a larger data pipeline without writing Python.
Python SDK
A full-featured Python library for integrating PMetal into your existing ML workflows. Programmatically configure training jobs, sweep hyperparameters, load checkpoints, and export adapters — all from within your Jupyter notebooks or training scripts.
On-Premises Means You Own Your AI
When a fine-tuned model runs in the cloud, the business relationship between you and your AI provider shapes what you can do with it. Rate limits, terms of service changes, pricing adjustments, outages, and provider decisions are outside your control. Your AI is on someone else's infrastructure, subject to someone else's policies.
PMetal takes a different philosophy: the model you train is yours, full stop. The weights live on your hardware. The training data stays in your control. There is no usage metering, no per-inference cost, and no third party that can revoke access. The only ongoing cost is electricity.
For organizations with sensitive workloads — healthcare, legal, finance, defense, or any domain with stringent compliance requirements — this is not just a nice-to-have. It is often the only acceptable path to deploying AI on private data.
Data Sovereignty by Design
PMetal has no telemetry, no cloud sync, and no requirement for an internet connection after installation. Your training runs, datasets, model checkpoints, and exported adapters are entirely local. Compliance auditors will find nothing to flag because there is nothing leaving the machine.
Get Started with PMetal
PMetal runs on any Apple Silicon Mac — M1, M2, M3, or M4, across the full lineup from MacBook Air to Mac Pro. Models from 1B to 70B+ parameters are supported depending on available unified memory. The recommended minimum is 16GB unified memory; 32GB or more unlocks the full range of capabilities.
The project is open source and available on GitHub. Full documentation, quickstart guides, example datasets, and a community forum are available at the PMetal product page.
Fine-Tuning That Respects Your Data
PMetal is the result of a straightforward belief: the hardware sitting on your desk is already powerful enough to fine-tune world-class language models. The only thing that was missing was software built specifically for it. Now it exists.