
TorchDevice

This module contains the TorchDevice class, which abstracts CUDA, MPS, and CPU backends by intercepting device calls and redirecting them to the available hardware, assisting in porting code from CUDA to MPS.

  • TorchDevice 0.5.2: Still Beta, but Three Months of Refactoring, Testing, and Real-World Breakthroughs

    TorchDevice is a compatibility layer that makes PyTorch code written for CUDA “just work” on Apple’s MPS (and vice versa), with minimal friction for developers. It’s the kind of glue that shouldn’t exist, but absolutely needs to if you live on both sides of the Apple/NVIDIA divide.

  • Announcing TorchDevice 0.0.5 Beta - Transparent Hardware Redirection for PyTorch

    We’re pleased to announce the release of TorchDevice 0.0.5 Beta, a significant milestone in simplifying hardware compatibility for PyTorch applications. This release introduces robust enhancements, thorough testing, and a powerful new CPU override feature to ensure seamless integration across CUDA and Apple Silicon (Metal) hardware.

CUDA-to-MPS Compatibility for PyTorch

TorchDevice is compatibility tooling for Python projects that were written with CUDA-first PyTorch assumptions but need to run on Apple Silicon using MPS.

It intercepts common PyTorch device calls and redirects them based on the available backend: CUDA, MPS, or CPU. The goal is not to hide every hardware difference. The goal is to reduce the first round of breakage when trying to run existing CUDA-oriented code on local Apple hardware.
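The redirection logic described above can be sketched in plain Python. This is a hypothetical helper for illustration only, not TorchDevice’s actual API: it maps a requested device string to whichever backend is actually present, preferring CUDA, then MPS, then CPU.

```python
# Minimal sketch of the redirection idea (hypothetical helper, not
# TorchDevice's real implementation): map a requested device string to
# whatever backend is available, in priority order CUDA > MPS > CPU.
def redirect_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Return the device a call asking for `requested` should land on."""
    if cuda_available:
        return "cuda"   # native CUDA available: no redirection needed
    if mps_available and requested in ("cuda", "mps"):
        return "mps"    # CUDA-oriented code transparently lands on Apple MPS
    return "cpu"        # last-resort fallback

# On an Apple Silicon machine (no CUDA, MPS present), a hard-coded
# request for "cuda" is redirected to MPS:
print(redirect_device("cuda", cuda_available=False, mps_available=True))   # mps
print(redirect_device("cuda", cuda_available=False, mps_available=False))  # cpu
```

In the real library this mapping happens transparently when torch.device is instantiated, so existing code that hard-codes "cuda" does not need to change.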

Why This Matters

Many AI and ML examples assume NVIDIA hardware. That creates friction for developers working locally on Apple Silicon, especially when testing older code, research projects, or tools that hard-code CUDA device behavior.

TorchDevice acts as a bridge for that situation. It can help local experiments get running faster, make porting work more explicit, and show where CUDA-specific assumptions still need attention.

Key Features

  • Automatic Device Redirection: Intercepts torch.device instantiation and redirects it based on available hardware (CUDA, MPS, or CPU).
  • Drop-in compatibility layer: Reduces code changes needed for common CUDA-to-MPS test runs.
  • Apple Silicon focused: Helps CUDA-oriented PyTorch code run against Apple’s Metal-backed MPS backend where practical.
  • AI and ML workflow support: Reduces friction when moving experiments between CUDA systems and local Apple hardware.
  • Explicit CPU Override: Provides a special 'cpu:-1' device specification to force CPU usage regardless of available accelerators.
  • Mocked CUDA Functions: Provides mocked implementations of CUDA-specific functions, enabling code that uses CUDA functions to run on MPS hardware.
  • Stream and Event Support: Implements full support for CUDA streams and events on MPS devices, allowing for asynchronous operations and event timing.
  • Unified Memory Handling: Handles differences in memory management between CUDA and MPS, providing reasonable values for memory-related functions.
  • Logging and Debugging: Outputs informative log messages indicating how calls are intercepted and handled, assisting in code migration and debugging.
  • Transparent Integration: Works transparently without requiring changes to existing codebases.
  • PyTorch Compiler Compatibility: Works with PyTorch’s dynamo compiler and inductor.
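The explicit CPU override listed above can be illustrated with a small sketch. The parsing below is hypothetical, not TorchDevice’s real implementation; it only shows the intended behavior: a device spec of 'cpu:-1' forces CPU even when an accelerator is available.

```python
# Hedged sketch of the 'cpu:-1' CPU override (hypothetical resolver,
# not TorchDevice's actual code path).
from typing import Optional

def resolve_device(spec: str, accelerator: Optional[str]) -> str:
    """Resolve a device spec, honoring the explicit CPU override."""
    if spec == "cpu:-1":
        return "cpu"          # explicit override wins over any accelerator
    if accelerator is not None:
        return accelerator    # otherwise prefer the available accelerator
    return "cpu"

print(resolve_device("cpu:-1", accelerator="mps"))  # cpu, despite MPS being present
print(resolve_device("cuda", accelerator="mps"))    # mps, via redirection
```

The override gives developers a deterministic escape hatch for debugging numerical differences between CPU and accelerator runs.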

Development & Availability

This is older compatibility tooling, but it remains useful for local Apple Silicon experiments and CUDA-to-MPS porting work. It is available on GitHub.

For installation, usage, and technical details, see the project’s README.


Support This Work

If this project is useful and you want to help support more work like it, you can contribute here:


Join the Discussion

Comments for this post live in GitHub Discussions. That keeps moderation in one place and gives the conversation a stable home.