-
Auto-Generated Rubric Evaluators: Building Context-Aware Evaluators for AI Agents
The auto-generated rubric evaluator system creates task-specific evaluators that score AI agents' performance against defined criteria, validated across multiple datasets for accuracy and reliability.
-
Benchmarks in Microsoft Foundry (preview): Standardized model and agent quality checks
Benchmarks in Microsoft Foundry let developers run standardized open-source tests on any model deployment or agent, instantly comparing results via a UI or API while distinguishing between overall model performance and specific deployment quality.
-
Intelligent sampling in Microsoft Foundry: the science behind selecting better production traces
Microsoft Foundry’s intelligent sampling technique uses a MinHash farthest-first algorithm to boost lexical diversity and LLM judge preference by 29% and 78%, respectively, making it ideal for evaluation and fine-tuning workflows that prioritize coverage over mirroring production frequencies.
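The MinHash farthest-first idea named in this summary can be illustrated with a minimal sketch. This is not the Foundry implementation. The function names, the word 3-shingles, the 64 seeded BLAKE2b hashes, and the choice of the first trace as the starting point are all assumptions made for illustration.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Split text into overlapping word k-grams (hypothetical tokenization)."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """For each seeded hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # the seed selects the hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_distance(a: List[int], b: List[int]) -> float:
    """1 minus the fraction of matching slots, an estimate of Jaccard distance."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - matches / len(a)


def farthest_first(texts: List[str], n_select: int, num_hashes: int = 64) -> List[int]:
    """Greedily pick indices, each time taking the trace farthest from the picks so far."""
    if not texts or n_select <= 0:
        return []
    sigs = [minhash_signature(shingles(t), num_hashes) for t in texts]
    selected = [0]  # assumption: start from the first trace for determinism
    # min_dist[i] = distance from trace i to its nearest selected trace
    min_dist = [estimated_distance(sigs[0], s) for s in sigs]
    min_dist[0] = -1.0  # mark as already selected
    while len(selected) < min(n_select, len(texts)):
        nxt = max(range(len(texts)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, s in enumerate(sigs):
            min_dist[i] = min(min_dist[i], estimated_distance(sigs[nxt], s))
        min_dist[nxt] = -1.0
    return selected
```

Calling `farthest_first(traces, 100)` on a list of trace strings returns the indices of up to 100 traces chosen for lexical spread rather than production frequency, so near-duplicate traces are picked last.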
-
Azure AI Foundry Architecture
Azure AI Foundry introduces a centralized Hub for corporate governance and granular Project workspaces that inherit security policies while isolating developer assets, enabling seamless model swaps through a unified catalog with serverless pay-as-you-go or provisioned compute options, and integrates secure data fabrics via identity-dr
-
Azure AI Document Intelligence Tutorial
Azure AI Document Intelligence’s v4.0 API introduces optimized deterministic extraction for structured templates, a new Layout Model delivering Markdown outputs ideal for RAG systems, and streamlined prebuilt models for financial documents with automatic check table parsing.
-
A Guided Tour of the New Microsoft Foundry Labs
Microsoft Foundry Labs offers hands-on access to cutting-edge AI experiments across six impactful domains, including MAI-Image-2.5 for advanced image generation and Fara 1.5 for computer-vision reasoning, enabling developers to immediately test frontier technologies without lengthy reimplementation delays.
-
Dragon Copilot and Microsoft Marketplace are transforming the way healthcare is delivered
Dragon Copilot and integration via Microsoft Marketplace aim to remove friction from overloaded clinical workflows by embedding AI directly into existing tools, addressing capacity constraints and inefficiencies in documentation, administration, and patient care.
-
New in Cowork – the UI refresh and the cost question everyone’s asking
The refreshed Cowork UI in the Microsoft 365 Copilot app introduces a new "/cost" skill that lets users track exactly how many Copilot Credits each task consumes, enhancing transparency and control over usage.
-
Unlocking the next frontier of Local AI on Windows for Telecommunications
Local AI on Windows lets telecom customer care agents respond instantly with richer context, keeps field workflows running offline, and makes network operations private by default, all on the existing telecom device fleet without new hardware.
-
The Case for an Ontology Layer in Telecoms
An ontology layer in telecoms provides a shared semantic framework that translates fragmented, unstructured data into consistent business meanings, enabling AI models to reason across network, IT/OSS/BSS, customer interaction, and external ecosystem datasets while adhering to telecom-specific standards.
-
Azure AI Foundry vs Google Vertex AI
Azure AI Foundry focuses on enterprise compliance with a Hub-and-Project hierarchy, while Google Vertex AI emphasizes a pipeline-first ecosystem integrated with GCP data services and offers superior multimodal model capabilities.
-
Revolutionizing Document Intelligence: Scaling Construction Industries with AI-Driven Extraction
This article shows how Azure AI services (Content Understanding, Foundry, Blob Storage, and OpenAI) are used to automate extraction from construction drawings, boosting productivity by enabling digital threads that reduce manual handoffs and improve project coordination.
-
A Practical, Technical Guide to Bringing AI Into Everyday Nonprofit Workflows
Microsoft Copilot for Microsoft 365 embeds AI directly into everyday tools like Word, Excel, Outlook, Teams, and PowerPoint, enabling nonprofits to automate knowledge work efficiently without needing new systems or specialized AI teams.
-
Copilot Cowork is now generally available
Copilot Cowork’s general availability introduces usage-based billing with a grace period for Frontier users and three credit tiers: light ($1-$3), medium ($4-$7), and heavy (>$7), plus the option to commit to P3 credits for discounts, reflecting its cloud
-
The Hidden Boundaries of Modern AI
The article explains that AI models process inputs as encoded tensors rather than human-readable text, highlighting the critical distinction between how humans perceive prompts and how models actually interpret them.
-
Azure OpenAI Architecture: The Decisions That Actually Matter (Part 3)
Azure OpenAI’s architecture emphasizes GenAIOps practices (evaluation pipelines, full-stack observability, Model Router patterns, and prompt governance) to turn continuous model upgrades into routine operations rather than emergency events.
-
How to Score a User Simulator: Introducing USR-8
USR-8 introduces an Eight-Metric User Simulation Rubric that separates simulator behavior from style, revealing failure modes hidden by composite scores and showing that most "good" simulators rely on prompt policy rather than orchestration code.
-
Build an Automated SLA Risk Agent with Routines in Microsoft Foundry
This tutorial demonstrates how to build an automated SLA risk agent in Microsoft Foundry using Routines, which analyzes ticket data with Azure AI Search and surfaces potential SLA breaches daily.
-
Detecting Python Vulnerabilities with GraphCodeBERT
The article introduces CSI, a novel tool that uses code structure understanding instead of regex pattern matching to achieve an F1 score over 90% in identifying real vulnerabilities, addressing the limitations of existing tools like Bandit, which rely on token patterns and often miss subtle security issues.
-
All Agents Report Back to Me: Monitoring AI Agents with Chat
The article introduces a self-hosted Rocket.Chat server that lets AI agents post status updates via a custom skill, giving humans a unified monitoring portal with secure internal access while keeping agent data private.
-
Building ShadowQuest: A Multi-Agent RPG
ShadowQuest showcases how specialized AI agents collaborate using Foundry IQ and GPT-4.1 to deliver immersive, context-aware responses in a multiplayer fantasy RPG built for the Agents League Hackathon.
-
Agents That Test Agents: A Cloud-Native Skill-Eval Harness on Foundry Hosted Agents
azure_skill_eval provides a cloud‑native harness on Foundry Hosted Agents to rigorously test skills like edu-video-script across multiple models and scenarios, using deterministic validators, an LLM judge for structured scoring, and an adversarial attacker agent to ensure robustness under varied prompts.
-
Gamifying World Improvement: A Reasoning-Agent RPG on Microsoft Foundry
The new reasoning‑agent RPG on Microsoft Foundry demonstrates how a multi‑agent system with live judges can validate complex world‑building tasks by requiring human approval at verification points, showcasing an architecture where each step, from company analysis to worker factory creation, is logged and reusable.
-
Cross-Region Model Connectivity Options in Microsoft Foundry: Supported Patterns and Tradeoffs
The article details three cross-region model connectivity patterns in Microsoft Foundry (direct connections, Azure API Management gateways, and VNet-secured APIM variants) and explains their use cases and tradeoffs, with a detailed VNet-secured implementation for enterprise deployments.