June 23, 2026

Claude Design + Vercel, WebSocket Functions, on-device AI

Tool of the Week

Claude Design deploys directly to Vercel now

Claude Design now integrates Vercel as a send-to destination, converting designs to live deployments via the Vercel MCP server without leaving the canvas.

Eliminates the design-to-deployment handoff by piping Claude-generated designs directly into Vercel projects. Reduces friction between design iteration and live URL sharing for rapid feedback loops.

Replaces the manual workflow of exporting a design and then setting up a Vercel project. Requires adding Vercel as a destination in the Share menu and connecting the Vercel MCP server. Ready now: an official integration with a documented setup path.

“Vercel is now a send-to destination in Claude Design”
“Claude Design deploys the design as a new project in your connected Vercel account and returns a URL you can open and share”
“add Vercel as your destination in the 'Share' menu and connect the Vercel MCP server to get started”

claude-design · vercel · mcp · deployment · design-to-code

Dev Signal

Quick Signals

Apple releases Core AI framework for on-device LLMs

Core AI is the successor to Core ML, enabling PyTorch model conversion and deployment of up to 70B-parameter LLMs on-device via unified hardware access (CPU/GPU/Neural Engine) with quantization and palettization built-in.
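To see why quantization matters at the 70B scale mentioned above, here is a rough back-of-the-envelope sketch. The bit widths are illustrative assumptions, not documented Core AI defaults, and the estimate covers weights only (no KV cache, activations, or palettization lookup tables).

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_70B = 70e9  # parameter count taken from the "up to 70B" figure

# Assumed bit widths for illustration only.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS_70B, bits):.0f} GB")
```

Under these assumptions, 16-bit weights alone come to about 140 GB, while 4-bit weights come to about 35 GB, which is the difference between impractical and plausible on high-memory Apple Silicon.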

Eliminates per-token cloud costs, removes server dependencies, and keeps user data local—streamlining inference pipelines for developers targeting Apple Silicon. Model specialization on first load trades one-time latency for cached subsequent runs, changing how you architect cold-start handling.
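One way to handle the slow first load is to start it in the background at app launch rather than on the first user request. The sketch below shows that pattern in plain Python; `WarmModel` and `slow_loader` are hypothetical names, and the loader is a stand-in, not a Core AI call.

```python
import threading
import time


class WarmModel:
    """Lazy, thread-safe holder that pays the slow first load only once."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable that loads the model
        self._model = None
        self._lock = threading.Lock()
        self.load_count = 0

    def get(self):
        # Double-checked locking: concurrent callers trigger a single load.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
                    self.load_count += 1
        return self._model

    def warm_up_in_background(self):
        # Kick off the first load early so later calls hit the cached model.
        thread = threading.Thread(target=self.get, daemon=True)
        thread.start()
        return thread


def slow_loader():
    time.sleep(0.05)  # stand-in for first-load specialization latency
    return "model-ready"
```

Calling `WarmModel(slow_loader).warm_up_in_background()` at startup means a later `get()` either returns immediately or waits on the load already in progress, instead of starting a fresh one.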

Replaces Core ML for neural networks and transformers. Requires converting PyTorch models: export to a torch.export.ExportedProgram, then call TorchConverter().to_coreai(). Ready now with the OS release, but ecosystem maturity depends on community adoption. Start with vision or reasoning models if you target only iPhone, iPad, and Mac.

“Core AI framework, the official successor to Core ML”
“support for both custom-converted PyTorch models and pre-optimized open-source models”
“unified architecture for deploying models ranging from compact 3B-parameter vision models to large-scale LLMs, including reasoning models with up to 70B-parameter reasoning models”
“ensures user data privacy, zero server dependencies, and zero per-token cloud costs”
“unified hardware access, allowing workloads to seamlessly run across the CPU, GPU, and Neural Engine under one API”
“ahead-of-time (AOT) compilation, which shifts work off the user's device, yielding near-instant load times”
“the first attempt to use a model may take significantly longer than subsequent runs, once the model has been already cached”

on-device-inference · pytorch-conversion · apple-silicon · model-compression · llm-deployment

Data Point

GLM-5.2 passes frontier-model vibe check

GLM-5.2 adds IndexShare sparse-attention optimization and clears the 'daily driver' bar for open-weight models, with free inference via Hugging Face and local GGUF support.

Eliminates the benchmaxxing cycle: practitioners independently validate GLM-5.2 as production-ready, not just a lab artifact. Lowers the barrier to local deployment and cuts inference cost ($2.40/task vs. $31 for Fable 5).
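For a sense of scale, the quoted per-task prices can be turned into a ratio and a batch saving. The 100-task batch size is an illustrative assumption; prices are held in cents to avoid floating-point rounding.

```python
GLM_CENTS_PER_TASK = 240     # $2.40 per task, as quoted
FABLE_CENTS_PER_TASK = 3100  # $31 per task, as quoted


def cost_ratio(cheap_cents, expensive_cents):
    """How many times more expensive the second option is."""
    return expensive_cents / cheap_cents


def batch_saving_dollars(cheap_cents, expensive_cents, tasks):
    """Dollars saved by running `tasks` tasks on the cheaper option."""
    return (expensive_cents - cheap_cents) * tasks / 100


ratio = cost_ratio(GLM_CENTS_PER_TASK, FABLE_CENTS_PER_TASK)
saving = batch_saving_dollars(GLM_CENTS_PER_TASK, FABLE_CENTS_PER_TASK, tasks=100)
print(f"ratio: {ratio:.1f}x, saving over 100 tasks: ${saving:.0f}")
```

At the quoted prices that works out to roughly a 13x gap, or $2,860 over 100 tasks.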

Replaces prior open models (GLM-5.1, DeepSeek-style) as the first credible open alternative for agentic knowledge work. Requires 128GB+ VRAM for the full model, or 3-bit quantization on Apple Silicon (~26 tok/s on M3 Max). Worth trying now: the architecture change (IndexShare) and the availability strategy (free Hugging Face window, llama.cpp support) lower the barrier to prototyping. Gap: no vision support.

“multiple practitioners independently described Zhipu's GLM-5.2 as the first open-weight model that feels plausibly frontier-adjacent in daily use”
“beyond MLA and DSA inherited from prior GLM/DeepSeek-style designs, GLM-5.2 adds IndexShare, reusing sparse-attention top-k indices across groups of layers to reduce the cost of 1M-token inference”
“GLM-5.2 $2.40, while some weaker options were orders of magnitude cheaper”
“free via Hugging Face Inference Providers for a limited window, local GGUF support via llama.cpp/Unsloth”
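The IndexShare description quoted above (reusing top-k indices across groups of layers) can be illustrated with a toy sketch. This is one possible reading of that sentence, not GLM-5.2's actual implementation: the function names, the per-layer score lists, and the fixed group size are all assumptions, and only the index selection is modeled, not the attention itself.

```python
import heapq


def top_k_indices(scores, k):
    """Positions of the k largest scores, returned in ascending position order."""
    return sorted(heapq.nlargest(k, range(len(scores)), key=scores.__getitem__))


def plan_sparse_indices(layer_scores, k, group_size):
    """Choose key positions per layer, recomputing top-k only at each group start.

    Returns the per-layer index plan and the number of top-k selections performed.
    """
    plan = []
    selections = 0
    shared = None
    for layer, scores in enumerate(layer_scores):
        if layer % group_size == 0:
            shared = top_k_indices(scores, k)  # fresh selection for this group
            selections += 1
        plan.append(shared)  # later layers in the group reuse the same indices
    return plan, selections


# Toy relevance scores: 4 layers, 4 key positions each (made-up numbers).
layer_scores = [
    [0.1, 0.9, 0.3, 0.8],
    [0.9, 0.1, 0.8, 0.2],
    [0.5, 0.4, 0.7, 0.6],
    [0.0, 1.0, 0.0, 1.0],
]
plan, selections = plan_sparse_indices(layer_scores, k=2, group_size=2)
print(plan, selections)
```

With a group size of 2, the four layers need two top-k selections instead of four. The trade-off in this toy version is that layers 1 and 3 attend to positions chosen from another layer's scores.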

open-models · inference-optimization · agentic-code · sparse-attention · benchmark

Vercel Functions now serve WebSocket connections

Claude automates 95% of analytics queries via semantic layers

Sakana Fugu Ultra routes work across frontier models

Open SWE deploys async coding agents to GitHub