In our first episode of 2026, swyx sits down with the cofounders of Artificial Analysis to discuss the state of LLM Evals and Benchmarks, and the key trends and drivers of LLM progress for the year.

Latent Space·1/8/2026·Nousresearch Anthropic Open Source Benchmarks

Aug 26

Nous: Hermes 4 70B (nousresearch/hermes-4-70b)

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either respond directly or generate explicit ... reasoning traces before answering. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs This 70B variant is trained with the expanded post-training corpus (~60B tokens) emphasizing verified reasoning data, leading to improvements in mathematics, coding, STEM, logic, and structured outputs while maintaining general assistant performance. It supports JSON mode, schema adherence, function calling, and tool use, and is designed for greater steerability with reduced refusal rates.

OpenRouter·8/26/2025·Meta-llama Nousresearch Open Source Release

Nous: Hermes 4 405B (nousresearch/hermes-4-405b)

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with ... traces or respond directly, offering flexibility between speed and depth. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs The model is instruction-tuned with an expanded post-training corpus (~60B tokens) emphasizing reasoning traces, improving performance in math, code, STEM, and logical reasoning, while retaining broad assistant utility. It also supports structured outputs, including JSON mode, schema adherence, function calling, and tool use. Hermes 4 is trained for steerability, lower refusal rates, and alignment toward neutral, user-directed behavior.

OpenRouter·8/26/2025·Meta-llama Nousresearch Open Source Release

May 9

Nous: DeepHermes 3 Mistral 24B Preview (nousresearch/deephermes-3-mistral-24b-preview)

DeepHermes 3 (Mistral 24B Preview) is an instruction-tuned language model by Nous Research based on Mistral-Small-24B, designed for chat, function calling, and advanced multi-turn reasoning. It introduces a dual-mode system that toggles between intuitive chat responses and structured “deep reasoning” mode using special system prompts. Fine-tuned via distillation from R1, it supports structured output (JSON mode) and function call syntax for agent-based applications. DeepHermes 3 supports a **reasoning toggle via system prompt**, allowing users to switch between fast, intuitive responses and deliberate, multi-step reasoning. When activated with the following specific system instruction, the model enters a *"deep thinking"* mode—generating extended chains of thought wrapped in `` tags before delivering a final answer. System Prompt: You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside tags, and then provide your solution or response to the problem.

OpenRouter·5/9/2025·Mistral AI Nousresearch Open Source

Feb 28

Nous: DeepHermes 3 Llama 3 8B Preview (nousresearch/deephermes-3-llama-3-8b-preview)

DeepHermes 3 Preview is the latest version of our flagship Hermes series of LLMs by Nous Research, and one of the first models in the world to unify Reasoning (long chains of thought that improve answer accuracy) and normal LLM response modes into one model. We have also improved LLM annotation, judgement, and function calling. DeepHermes 3 Preview is one of the first LLM models to unify both "intuitive", traditional mode responses and long chain of thought reasoning responses into a single model, toggled by a system prompt.

OpenRouter·2/28/2025·Meta-llama Nousresearch Open Source

Feb 8

Llama 3.1 Tulu 3 405B (allenai/llama-3.1-tulu-3-405b)

Tülu 3 405B is the largest model in the Tülu 3 family, applying fully open post-training recipes at a 405B parameter scale. Built on the Llama 3.1 405B base, it leverages Reinforcement Learning with Verifiable Rewards (RLVR) to enhance instruction following, MATH, GSM8K, and IFEval performance. As part of Tülu 3’s fully open-source approach, it offers state-of-the-art capabilities while surpassing prior open-weight models like Llama 3.1 405B Instruct and Nous Hermes 3 405B on multiple benchmarks. To read more, click here.

OpenRouter·2/8/2025·Allenai Meta-llama Open Source Benchmarks

Aug 18

Nous: Hermes 3 70B Instruct (nousresearch/hermes-3-llama-3.1-70b)

Hermes 3 is a generalist language model with many improvements over Hermes 2 , including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 70B is a competitive, if not superior finetune of the Llama-3.1 70B foundation model , focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

OpenRouter·8/18/2024·Meta-llama Nousresearch Open Source Coding

Aug 16

Nous: Hermes 3 405B Instruct (free) (nousresearch/hermes-3-llama-3.1-405b)

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.

OpenRouter·8/16/2024·Meta-llama Nousresearch Open Source Release

Jul 11

Nous: Hermes 2 Theta 8B (nousresearch/hermes-2-theta-llama-3-8b)

An experimental merge model based on Llama 3, exhibiting a very distinctive style of writing. It combines the the best of Meta's Llama 3 8B and Nous Research's Hermes 2 Pro . Hermes-2 Θ (theta) was specifically designed with a few capabilities in mind: executing function calls, generating JSON output, and most remarkably, demonstrating metacognitive abilities (contemplating the nature of thought and recognizing the diversity of cognitive processes among individuals).

OpenRouter·7/11/2024·Meta-llama Nousresearch Open Source

May 27

NousResearch: Hermes 2 Pro - Llama-3 8B (nousresearch/hermes-2-pro-llama-3-8b)

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

OpenRouter·5/27/2024·Meta-llama Nousresearch Open Source Release

May 11

LLaVA v1.6 34B (liuhaotian/llava-yi-34b)

LLaVA Yi 34B is an open-source model trained by fine-tuning LLM on multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. Base LLM: NousResearch/Nous-Hermes-2-Yi-34B It was trained in December 2023.

Nousresearch News

News Feed

[AINews] Gemma 4 crosses 2 million downloads

[AINews] Good Friday

[AINews] A quiet April Fools

[AINews] H100 prices are melting UP

[AINews] Apple's War on Slop

[AINews] The high-return activity of raising your aspirations for LLMs

Artificial Analysis: The Independent Ratings Agency of AI — with George Cameron and Micah-Hill Smith

Nous: Hermes 4 70B (nousresearch/hermes-4-70b)

Nous: Hermes 4 405B (nousresearch/hermes-4-405b)

Nous: DeepHermes 3 Mistral 24B Preview (nousresearch/deephermes-3-mistral-24b-preview)

Nous: DeepHermes 3 Llama 3 8B Preview (nousresearch/deephermes-3-llama-3-8b-preview)

Llama 3.1 Tulu 3 405B (allenai/llama-3.1-tulu-3-405b)

Nous: Hermes 3 70B Instruct (nousresearch/hermes-3-llama-3.1-70b)

Nous: Hermes 3 405B Instruct (free) (nousresearch/hermes-3-llama-3.1-405b)

Nous: Hermes 2 Theta 8B (nousresearch/hermes-2-theta-llama-3-8b)

NousResearch: Hermes 2 Pro - Llama-3 8B (nousresearch/hermes-2-pro-llama-3-8b)

LLaVA v1.6 34B (liuhaotian/llava-yi-34b)

Tools

Directories

Models & Pricing

Endpoints

Rankings

News

Nousresearch News

News Feed

[AINews] Gemma 4 crosses 2 million downloads

[AINews] Good Friday

[AINews] A quiet April Fools

[AINews] H100 prices are melting *UP*

[AINews] Apple's War on Slop

[AINews] The high-return activity of raising your aspirations for LLMs

Artificial Analysis: The Independent Ratings Agency of AI — with George Cameron and Micah-Hill Smith

Nous: Hermes 4 70B (nousresearch/hermes-4-70b)

Nous: Hermes 4 405B (nousresearch/hermes-4-405b)

Nous: DeepHermes 3 Mistral 24B Preview (nousresearch/deephermes-3-mistral-24b-preview)

Nous: DeepHermes 3 Llama 3 8B Preview (nousresearch/deephermes-3-llama-3-8b-preview)

Llama 3.1 Tulu 3 405B (allenai/llama-3.1-tulu-3-405b)

Nous: Hermes 3 70B Instruct (nousresearch/hermes-3-llama-3.1-70b)

Nous: Hermes 3 405B Instruct (free) (nousresearch/hermes-3-llama-3.1-405b)

Nous: Hermes 2 Theta 8B (nousresearch/hermes-2-theta-llama-3-8b)

NousResearch: Hermes 2 Pro - Llama-3 8B (nousresearch/hermes-2-pro-llama-3-8b)

LLaVA v1.6 34B (liuhaotian/llava-yi-34b)

Tools

Directories

Models & Pricing

Endpoints

Rankings

News

[AINews] H100 prices are melting UP