News

MarkTechPost
marktechpost.com > 06/06/2026 > meet-harness-1-a-20b-retrieval-subagent-trained-with-reinforcement-learning-inside-a-stateful-search-harness-on-gpt-oss-20b

Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b

1+ hour, 16+ min ago (671+ words) Harness-1 reaches 0.730 average curated recall across eight benchmarks, trailing only Opus-4.6 among the searchers tested. Their answer is Harness-1, a 20B retrieval subagent built on gpt-oss-20b. It was trained with reinforcement learning inside a stateful search harness. The harness holds…

MarkTechPost
marktechpost.com > 05/31/2026 > parallax-a-parameterized-local-linear-attention-that-keeps-softmax-and-adds-a-learned-covariance-correction-branch

Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch

6+ day, 3+ hour ago (1040+ words) The Transformer's attention mechanism has barely changed since 2017. Most efficiency work has tried to replace softmax attention outright. A new paper takes a different route. It keeps softmax attention and bolts on a correction branch. A team of researchers from…

MarkTechPost
marktechpost.com > 05/30/2026 > trajectory-releases-a-concurrent-multi-lora-training-stack-for-continual-learning-reporting-a-2-81x-experiment-throughput-gain

Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× Experiment-Throughput Gain

1+ week, 5+ hour ago (962+ words) Most language models improve in discontinuous jumps. A team collects data, trains, and ships a new version. This takes months and produces remarkable or catastrophic behavior for users. Trajectory wants to replace that cycle with continual learning. The Trajectory team…

MarkTechPost
marktechpost.com > 05/29/2026 > nvidia-introduces-x-token-projection-guided-cross-tokenizer-kd-that-outperforms-gold-by-3-82-average-points-on-llama-3-2-1b

NVIDIA Introduces X-Token: Projection-Guided Cross-Tokenizer KD That Outperforms GOLD by +3.82 Average Points on Llama-3.2-1B

1+ week, 1+ day ago (917+ words) Knowledge distillation (KD) transfers "dark knowledge" from a large teacher model to a smaller student. The student learns from the teacher's full output probability distribution over tokens, not just correct answers. This is done via per-position Kullback-Leibler (KL) divergence over…

MarkTechPost
marktechpost.com > 05/27/2026 > sakana-ai-proposes-diffusionblocks-a-block-wise-training-framework-that-converts-residual-networks-into-independently-trainable-denoising-modules

Sakana AI Proposes DiffusionBlocks: A Block-wise Training Framework That Converts Residual Networks into Independently Trainable Denoising Modules

1+ week, 3+ day ago (900+ words) Researchers from Sakana AI and the University of Tokyo propose DiffusionBlocks. It trains transformer-based networks one block at a time. Training memory is reduced by a factor of B, where B is the number of blocks. Performance is maintained…

MarkTechPost
marktechpost.com > 05/26/2026 > memo-a-modular-framework-for-training-a-dedicated-memory-model-on-new-knowledge-without-modifying-llm-parameters

MEMO: A Modular Framework for Training a Dedicated Memory Model on New Knowledge Without Modifying LLM Parameters

1+ week, 4+ day ago (876+ words) Large language models become static after pretraining. Their knowledge does not update as the world changes. Retraining a full LLM is too expensive at modern scales. Fine-tuning risks degrading previously learned knowledge. Retrieval-augmented generation (RAG) struggles when answers require reasoning…

MarkTechPost
marktechpost.com > 05/26/2026 > design-a-complete-multimodal-rlvr-pipeline-with-open-mm-rl-vision-language-prompting-reward-scoring-and-grpo-export

Design a Complete Multimodal RLVR Pipeline with Open-MM-RL, Vision-Language Prompting, Reward Scoring, and GRPO Export

1+ week, 5+ day ago (935+ words) In this tutorial, we explore the TuringEnterprises/Open-MM-RL dataset as a practical foundation for multimodal reasoning and reinforcement learning with verifiable rewards. We load the dataset, inspect its schema, analyze domains, formats, question lengths, answer types, and image distributions,…

MarkTechPost
marktechpost.com > 05/25/2026 > step-by-step-guide-to-build-and-compare-fedavg-and-fedprox-federated-learning-on-non-iid-cifar-10-with-nvidia-flare

Step by Step Guide to Build and Compare FedAvg and FedProx Federated Learning on Non-IID CIFAR-10 with NVIDIA FLARE

1+ week, 5+ day ago (628+ words) In this tutorial, we build an advanced federated learning experiment with NVIDIA FLARE. We compare FedAvg and FedProx on a non-IID CIFAR-10 setup, where client data is split using a Dirichlet distribution to simulate realistic label imbalance across…

MarkTechPost
marktechpost.com > 05/18/2026 > stochastic-gradient-descent-sgds-frequency-bias-and-how-adam-fixes-it

Stochastic Gradient Descent (SGD)'s Frequency Bias and How Adam Fixes It

2+ week, 5+ day ago (868+ words) Modern language models are trained on data with extremely uneven token distributions. A small number of words appear in almost every sentence, while many rare but meaningful tokens occur only occasionally. This creates a hidden optimization challenge: parameters associated with…

MarkTechPost
marktechpost.com > 05/16/2026 > nous-research-proposes-lighthouse-attention-a-training-only-selection-based-hierarchical-attention-that-delivers-1-4-1-7x-pretraining-speedup-at-long-context

Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

3+ week, 9+ hour ago (1602+ words) Lighthouse takes a different approach on both design decisions. It pools queries, keys, and values symmetrically across a multi-level pyramid, and it places selection entirely outside the attention kernel. After selection, the system gathers the chosen entries into a contiguous,…
