AI Agents and Models for Energy Plants

The Challenge of an Operator

In a power plant, resolving an equipment issue is far more complex than simply fixing a broken component. Operators and technicians must diagnose the problem, review engineering drawings, assess operational impacts, prepare a work plan, isolate the equipment safely, perform maintenance, verify the results, and document the entire process before the equipment can return to service.

Understanding Complex Industrial Systems

A critical part of this workflow is analyzing multiple engineering representations (P&IDs, logic diagrams, electrical schematics, single-line diagrams, physical layouts, and operational time-series data) to understand equipment behavior and system-wide dependencies. An AI assistant designed for power plant operations must therefore be able to interpret and reason over diverse engineering diagrams and multimodal industrial information.

Data Beyond the Web

Moreover, the challenge extends far beyond understanding complex engineering documents. Much of the knowledge required to operate and maintain a power plant exists as tacit expertise accumulated by experienced operators and engineers over decades. In addition, many critical data sources, procedures, and operational records are proprietary and never leave the organization. As a result, the information needed to solve real-world problems is often unavailable on the public internet, making it significantly more difficult for generic AI models to acquire the necessary domain knowledge.

PowerBench (Thermal): Building the Evaluation Foundation for Power Plants

Because no benchmark existed to measure AI capabilities in power plant operations, we first built a comprehensive evaluation suite covering the full spectrum of knowledge, reasoning, diagram understanding, and operational workflows.

Engineering Knowledge

  • Evaluates scientific and engineering knowledge required in power plants.

  • Covers undergraduate- and graduate-level topics such as thermodynamics, fluid dynamics, heat transfer, and thermal-fluid engineering.

  • Includes both multiple-choice and open-ended questions.

Operational Reasoning

  • Evaluates the ability to analyze operational scenarios and equipment issues.

  • Measures root-cause analysis, impact assessment, response planning, and consequence prediction.

  • Assessed using expert-defined rubrics.
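
To make rubric-based assessment concrete, here is a minimal sketch of how a response might be scored against an expert-defined rubric. The criterion names, the weights, and the weighted-fraction scoring rule are illustrative assumptions. They are not the actual PowerBench rubric format, and the per-criterion judgments would come from an expert or a grader model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line item (hypothetical structure, not the PowerBench schema)."""
    name: str
    weight: float


def rubric_score(criteria, met):
    """Return the weighted fraction of rubric criteria a response satisfies.

    `met` maps criterion name -> bool; criteria missing from `met`
    are treated as not satisfied.
    """
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(c.weight for c in criteria if met.get(c.name, False))
    return earned / total


# Illustrative rubric mirroring the four skills listed above (weights are assumed).
RUBRIC = [
    Criterion("root_cause", 2.0),
    Criterion("impact_assessment", 1.0),
    Criterion("response_plan", 1.0),
    Criterion("consequence_prediction", 1.0),
]

# Placeholder judgments for a single response.
judgments = {"root_cause": True, "impact_assessment": True, "response_plan": False}
score = rubric_score(RUBRIC, judgments)
print(score)  # 0.6
```

Weighting the root-cause criterion more heavily is only one possible design choice. A real rubric might instead use graded levels per criterion or hard-fail conditions for safety-critical omissions.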

Static Diagram Understanding

  • Evaluates the understanding of engineering diagrams used in power plants, including schematics, P&IDs, electrical diagrams, and control logic diagrams.

  • Includes both multiple-choice and open-ended vision-language model (VLM) tasks.

Dynamic Diagram Understanding

  • Evaluates reasoning over engineering systems and process flows.

  • Requires models to infer equipment relationships, physical mechanisms, process behavior, and the effects of operational interventions from diagrams.

  • Assessed using rubric-based generation tasks.

End-to-end Industrial Agentic Workflow

  • Evaluates real operational tasks performed by plant operators and engineers.

  • Includes document retrieval, drawing navigation, equipment relationship analysis, procedure execution, and problem-solving workflows.
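
Since the suite spans five categories with different task formats, one plausible way to summarize results is a per-category mean followed by an unweighted macro-average. The sketch below assumes each item has already been normalized to a score between 0 and 1. The category keys, the aggregation rule, and the sample values are placeholders for illustration and are not PowerBench results or its documented scoring method.

```python
from collections import defaultdict
from statistics import mean


def macro_average(results):
    """Aggregate (category, score) pairs into per-category means and an overall
    macro-average, so that large categories do not dominate small ones."""
    by_category = defaultdict(list)
    for category, score in results:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {category}: {score}")
        by_category[category].append(score)
    if not by_category:
        raise ValueError("no results to aggregate")
    per_category = {c: mean(scores) for c, scores in by_category.items()}
    overall = mean(list(per_category.values()))
    return per_category, overall


# Placeholder item scores (hypothetical; not real benchmark data).
sample = [
    ("engineering_knowledge", 1.0),
    ("engineering_knowledge", 0.0),
    ("operational_reasoning", 0.5),
]
per_category, overall = macro_average(sample)
print(per_category, overall)
```

A macro-average treats every category as equally important. A weighted scheme would be the alternative if some categories, such as the end-to-end workflow tasks, were considered more representative of real operator work.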

Training a Domain-native Foundation Model

Using the datasets and benchmarks we developed, we trained Gravity-16B-A3B, one of our latest lightweight STEM-specialized models. Despite its compact size and significantly lower training cost, the model achieved state-of-the-art performance within the power plant domain and demonstrated performance comparable to leading frontier models such as Claude Opus 4.6 and DeepSeek V4 Pro across multiple evaluation categories.

The Next Frontier

Gravity-16B-A3B validates our thesis that domain specialization can deliver frontier-level performance efficiently. We are now scaling this approach with Gravity Flash, a 100B-class industrial foundation model currently in training. By combining larger-scale reasoning capabilities with our proprietary industrial datasets and evaluation framework, we expect Gravity Flash to establish a new state of the art for industrial AI systems.