Claude Design + Vercel, WebSocket Functions, on-device AI — Dev Signal
June 23, 2026
Tool of the Week
Claude Design deploys directly to Vercel now
Claude Design now integrates Vercel as a send-to destination, converting designs to live deployments via the Vercel MCP server without leaving the canvas.
Eliminates the design-to-deployment handoff by piping Claude-generated designs directly into Vercel projects. Reduces friction between design iteration and live URL sharing for rapid feedback loops.
Replaces the manual workflow of exporting a design and then setting up a Vercel project. Requires connecting the Vercel MCP server to Claude Design via the Share menu. Ready now: an official integration with a documented setup path.
“Vercel is now a send-to destination in Claude Design”
“Claude Design deploys the design as a new project in your connected Vercel account and returns a URL you can open and share”
“add Vercel as your destination in the 'Share' menu and connect the Vercel MCP server to get started”
Tags: claude-design, vercel, mcp, deployment, design-to-code
Quick Signals
Apple releases Core AI framework for on-device LLMs
Core AI is the successor to Core ML, enabling PyTorch model conversion and deployment of up to 70B-parameter LLMs on-device via unified hardware access (CPU/GPU/Neural Engine) with quantization and palettization built-in.
Eliminates per-token cloud costs, removes server dependencies, and keeps user data local—streamlining inference pipelines for developers targeting Apple Silicon. Model specialization on first load trades one-time latency for cached subsequent runs, changing how you architect cold-start handling.
Replaces Core ML for neural networks and transformers. Requires PyTorch models converted via torch.export.ExportedProgram → TorchConverter().to_coreai(). Ready now with OS release, but ecosystem maturity depends on community adoption—start with vision/reasoning models if targeting iPhone/iPad/Mac only.
“Core AI framework, the official successor to Core ML”
“support for both custom-converted PyTorch models and pre-optimized open-source models”
“unified architecture for deploying models ranging from compact 3B-parameter vision models to large-scale LLMs, including reasoning models with up to 70B-parameter reasoning models”
“ensures user data privacy, zero server dependencies, and zero per-token cloud costs”
“unified hardware access, allowing workloads to seamlessly run across the CPU, GPU, and Neural Engine under one API”
“ahead-of-time (AOT) compilation, which shifts work off the user's device, yielding near-instant load times”
“the first attempt to use a model may take significantly longer than subsequent runs, once the model has been already cached”
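One way to architect around the slow first load is to start model preparation in the background at launch, so the one-time cost overlaps with startup instead of landing on the first user request. The sketch below is a platform-neutral illustration of that pattern in Python, not Core AI code: `WarmModel` and `slow_load` are hypothetical names, and the 0.2-second delay is a made-up stand-in for first-load specialization.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class WarmModel:
    """Loads a model once, in the background, and shares the result."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical: the call that triggers the slow first load
        self._lock = threading.Lock()
        self._future = None
        self._pool = ThreadPoolExecutor(max_workers=1)

    def warm_up(self):
        """Kick off loading without blocking; safe to call repeatedly."""
        with self._lock:
            if self._future is None:
                self._future = self._pool.submit(self._loader)
        return self._future

    def get(self):
        """Return the loaded model, waiting only if loading is still running."""
        return self.warm_up().result()


load_count = 0


def slow_load():
    """Stand-in for an expensive first load (assumption: 0.2 s)."""
    global load_count
    load_count += 1
    time.sleep(0.2)
    return {"name": "demo-model"}


model = WarmModel(slow_load)
model.warm_up()        # call at app launch
first = model.get()    # pays only whatever load time remains

started = time.perf_counter()
second = model.get()   # served from the already-loaded result
cached_seconds = time.perf_counter() - started
```

The loader runs exactly once no matter how many callers ask for the model, which is the property that matters when the first run is much slower than the rest.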
GLM-5.2 clears the 'daily driver' bar for open-weight models
GLM-5.2 adds the IndexShare sparse-attention optimization and, by practitioner accounts, clears the 'daily driver' bar for open-weight models, with free inference via Hugging Face and local GGUF support.
Eliminates the benchmaxxing cycle: practitioners independently validate GLM-5.2 as usable in daily work, not just a lab artifact. Lowers the barrier to local deployment and cuts inference cost ($2.40/task vs $31 for Fable 5, roughly 13× cheaper).
Replaces prior open models (GLM-5.1, DeepSeek-style) as the first credible open alternative for agentic knowledge work. Requires 128GB+ VRAM for the full model, or 3-bit quantization on Apple Silicon (~26 tok/s on M3 Max). Worth trying now: the architecture change (IndexShare) and the availability strategy (free Hugging Face window, llama.cpp support) keep the barrier to prototyping low. Gap: no vision support.
“multiple practitioners independently described Zhipu's GLM-5.2 as the first open-weight model that feels plausibly frontier-adjacent in daily use”
“beyond MLA and DSA inherited from prior GLM/DeepSeek-style designs, GLM-5.2 adds IndexShare, reusing sparse-attention top-k indices across groups of layers to reduce the cost of 1M-token inference”
“GLM-5.2 $2.40, while some weaker options were orders of magnitude cheaper”
“free via Hugging Face Inference Providers for a limited window, local GGUF support via llama.cpp/Unsloth”
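The IndexShare idea quoted above, reusing top-k indices across groups of layers, can be illustrated with a toy sketch. This is built only from that one-sentence description and is not Zhipu's implementation: the function names, the `group_size` parameter, the random data, and the use of a plain dot product as the index scorer are all assumptions. It shows the mechanics (how often the indexer runs), not the accuracy trade-off.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k_indices(query, keys, k):
    """Indexer step: score every key, keep the positions of the k best."""
    scores = [dot(query, key) for key in keys]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]


def sparse_attention(query, keys, values, indices):
    """Softmax attention restricted to the selected positions."""
    logits = [dot(query, keys[i]) for i in indices]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [
        sum(w * values[i][d] for w, i in zip(weights, indices)) / total
        for d in range(len(query))
    ]


def run_layers(query, layers, k, group_size):
    """Run the stack, refreshing top-k indices only at the start of each group."""
    indexer_calls = 0
    indices = []
    for layer_no, (keys, values) in enumerate(layers):
        if layer_no % group_size == 0:
            indices = top_k_indices(query, keys, k)
            indexer_calls += 1
        query = sparse_attention(query, keys, values, indices)
    return query, indexer_calls


random.seed(0)
DIM, TOKENS, LAYERS = 4, 64, 8


def random_matrix():
    return [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(TOKENS)]


layers = [(random_matrix(), random_matrix()) for _ in range(LAYERS)]
start = [random.uniform(-1, 1) for _ in range(DIM)]

# Indexer runs in every layer vs. once per group of four layers.
_, per_layer_calls = run_layers(start, layers, k=8, group_size=1)
output, shared_calls = run_layers(start, layers, k=8, group_size=4)
```

With eight layers and groups of four, the full-sequence scoring pass runs twice instead of eight times. That pass scales with context length, which is why sharing it matters for long-context inference.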
Vercel Functions now serve WebSocket connections
Node.js WebSocket support runs on Vercel Functions with standard libraries (ws, Socket.IO) and charges only for active CPU time, not idle connections.
Eliminates infrastructure decisions for realtime features—chat, collaborative apps, and AI streaming now deploy directly alongside existing serverless code. Active CPU pricing cuts costs for connection-heavy workloads where traditional billing burns budget on idle time.
Replaces dedicated WebSocket infrastructure or third-party realtime services. Requires only the standard Node.js ws or Socket.IO libraries, with no additional configuration. Available now in public beta, documented, and works with existing Vercel deployments.
“Vercel Functions can now serve WebSocket connections, enabling bidirectional communication between clients and server-side code on Vercel”
“With Active CPU pricing, billing only applies to the time your Function spends processing messages, not idle connection time”
“You can serve WebSocket connections using standard Node.js libraries, with no additional configuration”
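To see why billing on active time matters for connection-heavy workloads, here is a small back-of-the-envelope sketch. It models only the CPU-time component described in the quote above; the session numbers are hypothetical, no real prices are used, and any other charges are out of scope.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    open_seconds: float   # wall-clock time the socket stayed open
    message_cpu_ms: list  # CPU time spent handling each message


def wall_clock_billable_seconds(conn):
    """Idle-inclusive model: pay for the whole time the connection is open."""
    return conn.open_seconds


def active_cpu_billable_seconds(conn):
    """Active CPU model: pay only while messages are being processed."""
    return sum(conn.message_cpu_ms) / 1000


# Hypothetical chat session: open for one hour, 120 messages at 15 ms each.
chat = Connection(open_seconds=3600, message_cpu_ms=[15] * 120)

idle_inclusive = wall_clock_billable_seconds(chat)
active_only = active_cpu_billable_seconds(chat)
```

In this made-up session the connection is open for 3,600 seconds but only 1.8 seconds are spent processing messages, so the gap between the two models grows with how idle the connections are.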
Claude automates 95% of analytics queries via semantic layers
AI analytics accuracy scales with data governance and semantic layer enforcement, not model capability—Claude improved from 21% to 95% accuracy after encoding business context as reusable skills.
If you're building analytics agents or self-service BI tools, this exposes the actual constraint: your model's performance ceiling is set by metadata quality, metric definitions, and semantic consistency, not inference capability. Misaligned data foundations kill accuracy regardless of model size.
Replaces ad-hoc dashboard sprawl and metric conflicts with governed semantic layers and encoded analytical workflows. Requires dimensional modeling, centralized metric definitions, lineage tracking, and skill templates (Anthropic provides a redacted example). Worth trying now if you have fragmented analytics pipelines—the semantic layer approach is proven and language-agnostic.
“95% of business analytics queries are automated via Claude, with ~95% accuracy in aggregate”
“Claude answered only 21% of analytics questions correctly without skills”
“After encoding analytical workflows and business context as skills, accuracy rose to more than 95% overall and approached 99% in some domains”
“If data foundations are the data warehouse itself, sources of truth are the reference surfaces the agent consults to navigate it”
“AI performance is often constrained less by model capability and more by context definition”
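The core of a semantic layer can be sketched as a registry that maps the names people actually use to one governed definition per metric, so an agent looks definitions up instead of guessing. This is a minimal illustration, not Anthropic's skill format: the class names, the metric, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    sql: str  # the single governed definition
    aliases: tuple = ()


class SemanticLayer:
    """Maps the names people use to one agreed-upon definition per metric."""

    def __init__(self, metrics):
        self._by_name = {}
        for metric in metrics:
            for label in (metric.name, *metric.aliases):
                self._by_name[label.lower()] = metric

    def resolve(self, label):
        """Return the governed metric, or fail loudly instead of guessing."""
        try:
            return self._by_name[label.strip().lower()]
        except KeyError:
            raise LookupError(f"no governed definition for {label!r}") from None


# Hypothetical metric; table and column names are illustrative only.
layer = SemanticLayer([
    Metric(
        name="net_revenue",
        sql="SELECT SUM(amount - refunds) FROM fact_orders WHERE status = 'complete'",
        aliases=("revenue", "sales"),
    ),
])

by_alias = layer.resolve("Sales")
by_name = layer.resolve("net_revenue")
```

Both lookups return the same definition, and an unknown metric raises an error rather than letting the agent improvise a query, which is the kind of enforcement the accuracy figures above are attributed to.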
Sakana Fugu Ultra routes work across frontier models
Multi-agent routing system coordinates 1-3 models per request, available via AI SDK with no platform markup on inference.
Developers get Claude Mythos/Fable 5-class reasoning without vendor lock-in, with unified cost tracking and failover control through a single API endpoint.
Replaces single-model inference calls. Requires only setting `model: 'sakana/fugu-ultra'` in AI SDK. Ready now—try in playground first to validate latency/cost tradeoff for your workload.
“Fugu Ultra is built on a pool of publicly accessible frontier models, rather than running as a single model”
“routing work to 1-3 agents depending on the problem”
“AI Gateway provides a unified API for calling models, tracking usage and cost, and configuring retries, failover, and performance optimizations”
“AI Gateway reflects provider pricing with no markup and does not charge a platform fee on inference”
Open SWE runs coding tasks as an async cloud agent
Open SWE is a cloud-hosted agent that integrates directly with GitHub repos, plans before coding, reviews its own work, and opens PRs, so you no longer switch contexts between your IDE and task management.
Shifts coding agents from synchronous IDE helpers to background workers that handle multi-step tasks asynchronously, freeing you from blocking on AI execution. The human-in-the-loop design lets you steer mid-execution without restarting, matching how real teams actually work.
Ready now via the hosted version at swe.langchain.com; requires an Anthropic API key and a GitHub connection. Replaces manual task routing and PR creation for complex features; overkill for one-liners (the team is working on a local CLI for those). Worth trying if you have substantial refactors or feature tasks to delegate.
“the first open-source, async, cloud-hosted coding agent”
“It connects directly to your GitHub repositories, allowing you to delegate tasks from GitHub issues”
“Open SWE operates like another engineer on your team: it can research a codebase, create a detailed execution plan, write code, run tests, review its own work for errors, and open a pull request”
“Every task runs in a secure, isolated Daytona sandbox”
“The Planner researches the codebase to form a robust strategy first. After the code is written, the Reviewer checks for common errors, runs tests and formatters, and reflects on the changes before ever opening a PR”