AWS Launches Reinforcement Fine-Tuning on Amazon Bedrock
According to aws.amazon.com, Amazon Web Services has made reinforcement fine-tuning (RFT) available on Amazon Bedrock for customizing Amazon Nova and supported open-source models. The company states that RFT “delivers up to 66% accuracy gains over base models at reduced customization cost and complexity.”
Unlike supervised fine-tuning, RFT learns from reward signals rather than labeled input/output pairs. According to AWS, the model generates candidate responses, and a reward function scores them; the scorer can be a rule-based function, a trained grader model, or an LLM acting as a judge. The model weights are then updated to increase the probability of generating high-reward responses.
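The generate, score, and update loop described above can be illustrated with a toy example. AWS does not say which reinforcement learning algorithm Bedrock uses, so the sketch below is an assumption: a minimal REINFORCE-style policy gradient with a mean-reward baseline. The "model" is just a softmax over three fixed candidate strings whose logits stand in for model weights, and the rule-based `reward` function is hypothetical.

```python
import math
import random

# Toy "policy": a softmax over a fixed set of candidate responses.
# In a real LLM the probabilities come from model weights; here the logits
# themselves play the role of the weights (an illustrative assumption).
CANDIDATES = ["42", "forty-two", "I don't know"]


def reward(response: str) -> float:
    # Hypothetical rule-based reward: 1.0 for the exact expected answer.
    return 1.0 if response == "42" else 0.0


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=200, samples_per_step=8, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        probs = softmax(logits)
        # 1. Generate candidate responses by sampling from the current policy.
        idxs = rng.choices(range(len(CANDIDATES)), weights=probs, k=samples_per_step)
        # 2. Score every candidate with the reward function.
        rewards = [reward(CANDIDATES[i]) for i in idxs]
        baseline = sum(rewards) / len(rewards)  # mean reward reduces variance
        grads = [0.0] * len(logits)
        for i, r in zip(idxs, rewards):
            advantage = r - baseline
            # d log softmax(logits)[i] / d logits[j] = 1[j == i] - probs[j]
            for j in range(len(logits)):
                grads[j] += advantage * ((1.0 if j == i else 0.0) - probs[j])
        # 3. Gradient ascent: raise the probability of above-baseline responses.
        logits = [w + lr * g / samples_per_step for w, g in zip(logits, grads)]
    return softmax(logits)


probs = train()
print(dict(zip(CANDIDATES, (round(p, 3) for p in probs))))
```

No labeled "correct output" is ever shown to the policy; it only sees which of its own samples scored above the batch average, which is the core difference from supervised fine-tuning.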
AWS indicates that RFT is “particularly valuable when the desired behavior can be evaluated, but difficult to demonstrate” through static examples. The company recommends RFT for use cases including code generation, structured extraction, and content moderation.
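Structured extraction shows what "can be evaluated, but difficult to demonstrate" means in practice: a grader can check an output's shape and its consistency with the source text without anyone writing out the ideal answer. The sketch below is a hypothetical rule-based reward in plain Python, not Bedrock's reward-function interface; the invoice schema in `REQUIRED_FIELDS` and the half-credit weighting are illustrative assumptions.

```python
import json

# Hypothetical extraction schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"invoice_id": str, "total": (int, float), "currency": str}


def extraction_reward(source_text: str, model_output: str) -> float:
    """Score a JSON extraction in [0, 1] without a gold label.

    Checks only mechanically verifiable properties: the output parses as a
    JSON object, each required field has the expected type, and each
    extracted value literally occurs in the source text.
    """
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0  # unparseable output earns nothing
    if not isinstance(data, dict):
        return 0.0
    per_field = 1.0 / len(REQUIRED_FIELDS)
    score = 0.0
    for field, expected_type in REQUIRED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            continue
        score += per_field / 2  # half credit for a well-typed field
        if str(value) in source_text:
            score += per_field / 2  # half credit for grounding in the source
    return score


src = "Invoice INV-001: total 1250.5 EUR"
print(extraction_reward(src, '{"invoice_id": "INV-001", "total": 1250.5, "currency": "EUR"}'))
print(extraction_reward(src, '{"invoice_id": "INV-001", "total": 1250.5, "currency": "USD"}'))
```

Partial credit gives the policy a denser signal than a pass/fail check, at the risk of rewarding outputs that are well-formed but still wrong; which trade-off suits a given task is a design choice.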
Researchers Release Pramana Epistemic Reasoning Framework
Separately, according to arxiv.org, researchers have introduced Pramana, an approach that fine-tunes large language models using Navya-Nyaya logic, which the paper describes as “a 2,500-year-old Indian reasoning framework.” The team fine-tuned Llama 3.2-3B and DeepSeek-R1-Distill-Llama-8B on 55 Nyaya-structured logical problems and reports “100% semantic correctness on held-out evaluation.” The models, datasets, and training infrastructure have been released on Hugging Face.