<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://shuvo-iitkgp.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://shuvo-iitkgp.github.io/" rel="alternate" type="text/html" /><updated>2026-03-10T14:18:32+00:00</updated><id>https://shuvo-iitkgp.github.io/feed.xml</id><title type="html">Home</title><subtitle>Machine Learning | Data Science | Georgia Tech | Graph</subtitle><author><name>Subhajit Bag</name><email>sbag6@gatech.edu</email><uri>https://shuvo-iitkgp.github.io</uri></author><entry><title type="html">Why Language Models Hallucinate: The Epidemic of Penalizing Uncertainty</title><link href="https://shuvo-iitkgp.github.io/posts/2025/11/why-language-models-hallucinate/" rel="alternate" type="text/html" title="Why Language Models Hallucinate: The Epidemic of Penalizing Uncertainty" /><published>2025-11-07T00:00:00+00:00</published><updated>2025-11-07T00:00:00+00:00</updated><id>https://shuvo-iitkgp.github.io/posts/2025/11/why-language-models-hallucinate</id><content type="html" xml:base="https://shuvo-iitkgp.github.io/posts/2025/11/why-language-models-hallucinate/"><![CDATA[<p><img src="/images/why-language-models-hallucinate_schematic.png" />
<em>Figure: Binary grading makes “guess when unsure” optimal → higher hallucinations.<br />
Confidence-aware grading (penalize wrong answers; allow IDK) makes abstention rational → lower hallucinations.</em></p>

<h2 id="motivation">Motivation</h2>

<p>Large Language Models still ‘hallucinate’: they confidently produce false statements that sound perfectly plausible.</p>

<p><a href="https://arxiv.org/pdf/2509.04664">Kalai et al.</a> argue that hallucination isn’t a mysterious side effect of neural networks. It’s the inevitable outcome of how we train and evaluate models.</p>

<p>The core claim: models hallucinate because our benchmarks reward guessing over honesty.</p>

<hr />

<h2 id="why-hallucinations-happen">Why Hallucinations Happen</h2>

<ul>
  <li>
<p>Pretraining teaches a model to imitate text, not to know the truth.</p>
  </li>
  <li>
    <p>Post-training doesn’t fix this, because evaluation benchmarks themselves reinforce confident bluffing.</p>
  </li>
  <li>
    <p>The result: models behave like overconfident students taking a test that punishes leaving a blank answer.</p>
  </li>
</ul>

<hr />

<h2 id="what-this-paper-does-differently">What this paper does differently</h2>

<p>Instead of blaming architecture or data, the authors build a theoretical bridge between generative modeling and binary classification.</p>

<p>They show that generating valid text is statistically harder than classifying validity. If a model can’t perfectly classify ‘valid’ vs ‘invalid’, it will inevitably produce false generations.</p>

<p>They formalize this in what they call the Is-it-Valid problem, proving that:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The hallucination rate of any pretrained language model &gt;= 2 x its misclassification rate
</code></pre></div></div>

<p>This means hallucinations aren’t exotic; they’re mathematically baked into the training objective.</p>

<hr />

<h2 id="the-exam-incentive-problem">The Exam Incentive Problem</h2>

<p>After pretraining, models are fine-tuned using benchmarks that give binary scores: 1 for ‘correct’ and 0 for ‘wrong’. 
No credit for ‘I don’t know’.</p>

<p>This creates what the authors call an ‘epidemic of penalizing uncertainty’. Our current scoring systems literally teach models that guessing is better than admitting ignorance.</p>
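<p>To make the incentive concrete, here is a toy expected-score calculation (illustrative numbers and helper functions, not from the paper) comparing binary grading with a confidence-aware rule that penalizes a wrong answer by t/(1−t) points:</p>

```python
# Toy expected-score comparison: why binary grading rewards guessing.
# Assumes a model whose best guess is correct with probability p.

def expected_score_binary(p):
    # Binary grading: 1 if correct, 0 if wrong, 0 for "I don't know".
    guess = p * 1 + (1 - p) * 0
    abstain = 0.0
    return guess, abstain

def expected_score_penalized(p, t):
    # Confidence-aware grading: 1 if correct, -t/(1-t) if wrong, 0 for IDK.
    # Under this penalty, guessing only pays off when p > t.
    guess = p * 1 + (1 - p) * (-t / (1 - t))
    abstain = 0.0
    return guess, abstain

p = 0.30                      # a shaky 30%-confident guess
g, a = expected_score_binary(p)
print(g > a)                  # True: under binary grading, always guess

g, a = expected_score_penalized(p, t=0.5)
print(g > a)                  # False: abstaining now beats a 30% guess
```

Even at 30% confidence, binary grading makes guessing strictly better than abstaining; the penalized rule flips that incentive.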

<hr />

<h2 id="what-they-actually-achieved">What they actually achieved</h2>

<p>The paper’s contributions are:</p>
<ul>
  <li>A statistical proof linking hallucination to classification error.</li>
  <li>A lower bound showing hallucination rates grow with data sparsity: rare facts are the first to break.</li>
  <li>An analysis of real benchmarks showing that over 90% use binary grading, reinforcing the behavior.</li>
  <li>A simple but powerful fix: add confidence thresholds to evaluation instructions.</li>
</ul>

<hr />

<h2 id="the-fix-confidence-aware-evaluation">The Fix: Confidence Aware Evaluation</h2>

<p>Instead of inventing new hallucination tests, the authors suggest changing how all existing tests are scored. 
This way, abstaining becomes rational: being careful turns into a winning strategy. It also enables a new measure called behavioral calibration: whether a model behaves consistently across confidence levels rather than bluffing.</p>

<hr />
<h2 id="why-it-matters">Why it Matters</h2>

<p>This paper reframes hallucination as an alignment and incentive design problem, not a data or architecture issue.</p>
<ul>
  <li>It challenges leaderboard culture itself: safety depends not only on better models but on better reward structures.</li>
  <li>It offers a testable path forward: tweak benchmarks, watch hallucination rates fall.</li>
  <li>And it connects statistical theory to a moral intuition: in both humans and machines, confidence without truth is dangerous.</li>
</ul>

<hr />
<h2 id="future--limitations">Future &amp; Limitations</h2>

<ul>
  <li>This framework doesn’t solve all forms of hallucination, e.g., nonsense text or multi-fact narratives.</li>
  <li>It assumes honesty is expressible via an “IDK”, which may not capture nuance.</li>
  <li>Yet the authors make a provocative point: our grading systems are part of the problem.</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>If we reward language models for bluffing, we shouldn't be surprised when they lie. The cure for hallucination might not be better models but fairer exams.
</code></pre></div></div>]]></content><author><name>Subhajit Bag</name><email>sbag6@gatech.edu</email><uri>https://shuvo-iitkgp.github.io</uri></author><category term="-AI" /><category term="Safety" /><category term="-Language" /><category term="Models" /><category term="-Hallucination" /><summary type="html"><![CDATA[Figure: Binary grading makes “guess when unsure” optimal → higher hallucinations. Confidence-aware grading (penalize wrong answers; allow IDK) makes abstention rational → lower hallucinations.]]></summary></entry><entry><title type="html">Scalable influence and fact tracing for large language models pretraining</title><link href="https://shuvo-iitkgp.github.io/posts/2025/11/llm-influence-fact-tracing/" rel="alternate" type="text/html" title="Scalable influence and fact tracing for large language models pretraining" /><published>2025-11-07T00:00:00+00:00</published><updated>2025-11-07T00:00:00+00:00</updated><id>https://shuvo-iitkgp.github.io/posts/2025/11/llm-training-data-influence</id><content type="html" xml:base="https://shuvo-iitkgp.github.io/posts/2025/11/llm-influence-fact-tracing/"><![CDATA[<p><img src="/images/llm-influence-fact-training.png" />
<em>Figure: Difference between classical lexical retrieval and influence-based retrieval for large language models</em></p>

<h2 id="motivation">Motivation</h2>

<p>Large language models can state thousands of facts confidently — but <strong>we still cannot reliably trace <em>where</em> those facts came from in the training data</strong>.</p>

<p>This is a core challenge for:</p>
<ul>
  <li>transparency,</li>
  <li>copyright compliance,</li>
  <li>model auditing,</li>
  <li>safety-sensitive deployments.</li>
</ul>

<p>Training Data Attribution (TDA) asks: <em>“Which training examples most influenced a model’s prediction?”</em><br />
But until now, attribution methods either:</p>
<ul>
  <li><strong>did not scale</strong> beyond small models,</li>
  <li>or <strong>collapsed</strong> into noise when applied to pretraining corpora.</li>
</ul>

<p><a href="https://arxiv.org/pdf/2410.17413v3"><strong>Chang et al.</strong></a> introduce <em>TrackStar</em>, the first method that scales gradient-based influence tracing to <strong>8B-parameter LLMs trained on hundreds of millions of documents</strong>.</p>

<p>And what they find is important:
<strong>Attribution and influence are not the same.<br />
Models rarely learn facts from the sentences that contain them.</strong></p>

<hr />

<h2 id="why-fact-tracing-is-hard">Why Fact Tracing Is Hard</h2>

<ul>
  <li>Pretraining mixes millions of subtle statistical patterns.</li>
  <li>Most gradient-based influence methods explode in variance at LLM scale.</li>
  <li>Fact-containing sentences (e.g., “X was born in Y”) are often <em>not</em> the most influential examples.</li>
  <li>Classical retrievers like BM25 excel at string matching, not causal impact.</li>
</ul>

<p>The result:<br />
Human intuition about “where a model learned something” breaks down in large-scale pretraining.</p>

<hr />

<h2 id="what-this-paper-does-differently">What This Paper Does Differently</h2>

<p>Instead of relying purely on lexical similarity or uncorrected gradients, the authors combine:</p>
<ul>
  <li><strong>loss gradients</strong> over all model parameters,</li>
  <li><strong>optimizer second-moment scaling</strong> (Adafactor/Adam-style),</li>
  <li><strong>massive random projections</strong> (65k dimensions),</li>
  <li><strong>a mixed Hessian approximation</strong> (to remove template noise),</li>
  <li><strong>cosine-normalized influence scoring</strong>.</li>
</ul>
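<p>A conceptual NumPy sketch of this scoring recipe (tiny dimensions and made-up variable names; the real system corrects and projects gradients over all 8B parameters into 65k dimensions):</p>

```python
import numpy as np

# Conceptual sketch of TrackStar-style influence scoring (illustrative only):
# score(query, doc) = cosine similarity between the two loss gradients after
# optimizer second-moment correction and a shared random projection.
rng = np.random.default_rng(0)

D, K = 10_000, 64            # real scale: ~8B params, ~65k projection dims
P = rng.standard_normal((K, D)) / np.sqrt(K)   # shared random projection
v = rng.uniform(0.5, 2.0, D)                   # optimizer second-moment estimates

def influence_embedding(grad):
    # Precondition by the second moments (Adafactor/Adam-style), project,
    # then unit-normalize so scoring reduces to a dot product (cosine).
    g = grad / np.sqrt(v)
    z = P @ g
    return z / np.linalg.norm(z)

query_grad = rng.standard_normal(D)            # gradient at the queried fact
doc_grads = rng.standard_normal((3, D))        # gradients of candidate docs

q = influence_embedding(query_grad)
scores = [float(influence_embedding(g) @ q) for g in doc_grads]
print(scores)   # higher = more influential training document
```

The cosine normalization is what keeps a few high-gradient-norm documents from dominating every query, one of the variance problems that breaks naive gradient methods at scale.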

<hr />

<h2 id="what-trackstar-actually-achieves">What TrackStar Actually Achieves</h2>

<p>The paper demonstrates:</p>

<ul>
  <li>
    <p><strong>Influence ≠ attribution.</strong><br />
BM25 and Gecko retrieve fact-containing passages better than gradient methods.<br />
But these passages barely shift model predictions under tail-patching.</p>
  </li>
  <li><strong>Influential examples often lack the fact entirely.</strong><br />
They encode:
    <ul>
      <li>relational templates,</li>
      <li>entity type priors,</li>
      <li>structural patterns,</li>
      <li>distributional cues.</li>
    </ul>
  </li>
  <li>
    <p><strong>Influence correlates with lexical attribution only at larger scales.</strong><br />
Bigger models show more alignment between “cause” and “content.”</p>
  </li>
  <li><strong>TrackStar proponents change probabilities &gt;2× more than lexical proponents.</strong><br />
Even ground-truth T-REx fact sentences have weaker causal impact than TrackStar’s retrieved examples.</li>
</ul>

<p>This reframes what “learning a fact” means in LLMs.</p>

<hr />

<h2 id="why-attribution-breaks-and-influence-wins">Why Attribution Breaks (and Influence Wins)</h2>

<p>Classical retrieval assumes:</p>
<blockquote>
  <p>If a model predicts “Paris is in France,” it must have learned this from a sentence containing both words.</p>
</blockquote>

<p>But TrackStar shows the real story:</p>
<ul>
  <li>The model may have learned geography from many nearby examples.</li>
  <li>It may rely on name similarity (“Paris Hilton”).</li>
  <li>It may learn from country–capital templates.</li>
  <li>It may rely on broad distributional patterns (“France” is often the completion to “capital of…”).</li>
</ul>

<p>Influence-based methods capture these hidden causal pathways, not just literal text matches.</p>

<hr />

<h2 id="why-it-matters">Why It Matters</h2>

<p>This work provides the strongest evidence yet that:</p>

<ul>
  <li>
    <p><strong>LLMs rarely memorize facts explicitly.</strong><br />
They assemble answers from distributed patterns.</p>
  </li>
  <li>
    <p><strong>Fact tracing cannot rely on lexical matching alone.</strong><br />
For safety, auditing, and copyright, we need causal influence.</p>
  </li>
  <li>
    <p><strong>Scaling changes attribution behavior.</strong><br />
As models grow, influential examples gradually become more lexical — an emergent alignment.</p>
  </li>
  <li>
    <p><strong>Gradient-based influence can scale.</strong><br />
Even if the engineering cost (87 TB) is enormous, this sets the path for future systems.</p>
  </li>
</ul>

<hr />

<h2 id="limitations--open-questions">Limitations &amp; Open Questions</h2>

<ul>
  <li>TrackStar is expensive: storing projected gradients for C4 still requires ~87 TB.</li>
  <li>Influence estimates depend on linear approximations (gradients), not full retraining.</li>
  <li>Per-token or per-span attribution might be needed to remove document-level noise.</li>
  <li>Hessian mixing is heuristic; no theoretical foundation for λ beyond empirical tuning.</li>
  <li>Influence does not necessarily map to human-understandable explanations.</li>
</ul>

<p>Still, TrackStar moves the frontier substantially.</p>

<hr />

<h2 id="a-closing-thought">A Closing Thought</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
This paper shifts the conversation from *“Where did this fact appear?”*  
to *“What actually shaped the model’s belief?”* — a far more interesting and future-proof question for safety and transparency research.

</code></pre></div></div>]]></content><author><name>Subhajit Bag</name><email>sbag6@gatech.edu</email><uri>https://shuvo-iitkgp.github.io</uri></author><category term="-AI" /><category term="Safety" /><category term="-Language" /><category term="Models" /><category term="-Transparency" /><summary type="html"><![CDATA[Figure: Difference between the classical lexical retrieval and the influence based retrieval for large language models]]></summary></entry><entry><title type="html">Teaching Humanoids Without MoCap: Inside TWIST2’s Portable Data Collection System</title><link href="https://shuvo-iitkgp.github.io/posts/2025/11/teaching-humanoids-twist2/" rel="alternate" type="text/html" title="Teaching Humanoids Without MoCap: Inside TWIST2’s Portable Data Collection System" /><published>2025-11-05T00:00:00+00:00</published><updated>2025-11-05T00:00:00+00:00</updated><id>https://shuvo-iitkgp.github.io/posts/2025/11/twist2</id><content type="html" xml:base="https://shuvo-iitkgp.github.io/posts/2025/11/teaching-humanoids-twist2/"><![CDATA[<h3 id="motivation">Motivation</h3>
<p>How do we collect humanlike motion data for robots without a $100K motion-capture studio?</p>

<p>TWIST2 is like a GoPro for humanoid learning: small, cheap, portable, and built to scale.</p>

<hr />
<h3 id="why-humanoid-data-collection-is-hard">Why Humanoid Data Collection is Hard</h3>

<ul>
  <li>MoCap systems are accurate but expensive and bulky.</li>
  <li>VR-based systems either were limited to partial control or lacked natural motion.</li>
  <li>Humanoids need full-body, long-horizon coordination: walking, bending, grasping, and looking simultaneously.</li>
</ul>

<hr />

<h3 id="what-twist2-does-differently">What TWIST2 Does Differently</h3>
<ul>
  <li><strong>Portable Setup</strong> — A PICO 4U VR headset with two motion trackers replaces the MoCap suit.</li>
  <li><strong>Robot Side</strong> — Unitree G1 humanoid with an attachable 2-DoF neck costing $250.</li>
  <li><strong>Human Control</strong> — A single operator in VR becomes the robot. Moves arms, legs, and head naturally.</li>
</ul>

<hr />
<h3 id="the-magic-pipeline-explained-simply">The Magic Pipeline (Explained Simply)</h3>
<ul>
  <li><strong>Step 1</strong> — Human moves in VR → the PICO headset streams motion at 100 Hz</li>
  <li><strong>Step 2</strong> — Software retargets that motion to the robot’s body</li>
  <li><strong>Step 3</strong> — A learned motion tracking controller (trained via reinforcement learning) turns these into smooth, stable joint commands.</li>
  <li><strong>Step 4</strong> — Robot acts in real time (&lt;0.1 s delay)</li>
  <li><strong>Step 5</strong> — The entire run - camera view, motion data, commands is saved as demonstration data.</li>
</ul>
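<p>The five steps above can be sketched as a single loop. Every class and function below is a stand-in for illustration, not the project’s actual API:</p>

```python
import time

# Hypothetical sketch of the TWIST2 teleoperation loop described above.
# All names are placeholders; only the loop structure mirrors the pipeline.

class FakeVRStream:
    """Stub for the PICO headset + tracker stream (real rate: ~100 Hz)."""
    def __init__(self, n_frames): self.n = n_frames
    def active(self): return self.n > 0
    def read(self):
        self.n -= 1
        return {"head": (0, 0, 1.6), "hands": [(0.3, 0, 1.2), (-0.3, 0, 1.2)]}

def retarget(human_pose):
    # Step 2: map human motion onto the robot's kinematics (stubbed).
    return {"joints": human_pose["hands"]}

def track(robot_ref):
    # Step 3: the RL motion-tracking controller would run here (stubbed).
    return {"torques": robot_ref["joints"]}

def teleop_loop(stream, hz=100):
    log = []
    while stream.active():                     # Step 1: stream human motion
        pose = stream.read()
        command = track(retarget(pose))        # Steps 2-3: retarget + control
        log.append((pose, command))            # Step 5: save demonstration
        time.sleep(1.0 / hz)                   # Step 4: real-time pacing
    return log

demos = teleop_loop(FakeVRStream(3))
print(len(demos))   # 3 logged frames
```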

<hr />
<h3 id="what-they-actually-achieved">What they actually achieved</h3>
<p>A few highlights:</p>
<ul>
  <li>Folding towels with both hands</li>
  <li>Picking up baskets, opening doors, and walking through</li>
  <li>Performing dexterous pick-and-place and even kicking a box.</li>
</ul>

<p>The efficiency numbers:</p>
<ul>
  <li>100 successful demos in under 20 minutes</li>
  <li>Single operator, no calibration, no lab studio.</li>
</ul>

<hr />
<h3 id="how-robots-learn-from-the-data">How Robots Learn from the data</h3>
<p>The next layer is a hierarchical policy:</p>
<ul>
  <li>A low-level controller keeps balance and tracks motion.</li>
  <li>A high-level Diffusion Policy predicts what motion comes next from the robot’s own visual input.</li>
  <li>Result: a robot that can autonomously repeat complex whole-body tasks it learned from human teleoperation.</li>
</ul>

<hr />
<h3 id="why-it-matters">Why it matters</h3>
<p>This connects to the broader AI world:</p>
<ul>
  <li>Democratizes humanoid learning: &lt;$2K setup instead of lab infrastructure.</li>
  <li>Enables open source, reproducible datasets for humanoid RL.</li>
  <li>Moves toward robots that can learn directly from natural human demonstrations.</li>
</ul>

<hr />
<h3 id="future--limitations">Future &amp; Limitations</h3>
<p>Balancing the hype with realism:</p>
<ul>
  <li>VR tracking isn’t as precise as MoCap.</li>
  <li>High-speed motions are still hard to reproduce.</li>
  <li>But the trade-off in portability, cost, and scalability opens the door for thousands of researchers.</li>
</ul>

<hr />
<p>“The next time you put on a VR headset remember you might not just be playing a game. You could be teaching the next generation of robots how to move, see and live among us.”</p>]]></content><author><name>Subhajit Bag</name><email>sbag6@gatech.edu</email><uri>https://shuvo-iitkgp.github.io</uri></author><category term="Humanoids" /><category term="AI" /><category term="VR" /><summary type="html"><![CDATA[Motivation How do we collect humanlike motion data for robots without a $100K motion-capture studio?]]></summary></entry><entry><title type="html">What I Learned from Hackathons (and Losing One!)</title><link href="https://shuvo-iitkgp.github.io/posts/2025/10/hackathon-lessons/" rel="alternate" type="text/html" title="What I Learned from Hackathons (and Losing One!)" /><published>2025-10-29T00:00:00+00:00</published><updated>2025-10-29T00:00:00+00:00</updated><id>https://shuvo-iitkgp.github.io/posts/2025/10/hackathon-learning</id><content type="html" xml:base="https://shuvo-iitkgp.github.io/posts/2025/10/hackathon-lessons/"><![CDATA[<p>Hackathons have been among the best learning experiences of my career.</p>

<p>I’ve participated in two major ones:</p>
<ul>
  <li>🏆 <strong>Open IIT Data Analytics (2021)</strong> — <em>1st place out of 48 teams</em>. Built a music popularity predictor with Voting Classifiers (91.2% accuracy).</li>
  <li>💡 <strong>HackGT 12 (2025)</strong> — Built <em>BackpackMate AI</em>, a travel assistant using Mastra + LangChain + FastAPI.</li>
</ul>

<hr />

<h3 id="lessons-learned">Lessons learned</h3>
<ul>
  <li><strong>Speed ≠ sloppiness</strong> — rapid iteration teaches clarity under pressure.</li>
  <li><strong>LLMs are only as smart as your pipeline</strong> — retrieval design matters more than model choice.</li>
  <li><strong>Losing is learning</strong> — HackGT taught me far more than winning IIT.</li>
</ul>

<hr />

<p>Hackathons show recruiters that you can go from <em>idea to prototype</em> in hours — a skill that translates directly into startup and research settings.</p>]]></content><author><name>Subhajit Bag</name><email>sbag6@gatech.edu</email><uri>https://shuvo-iitkgp.github.io</uri></author><category term="Hackathons" /><category term="LLMs" /><category term="AI Projects" /><summary type="html"><![CDATA[Hackathons have been among the best learning experiences of my career.]]></summary></entry><entry><title type="html">5 Books That Changed How I Think About Machine Learning and Research</title><link href="https://shuvo-iitkgp.github.io/posts/2025/10/books-that-shaped-me/" rel="alternate" type="text/html" title="5 Books That Changed How I Think About Machine Learning and Research" /><published>2025-10-22T00:00:00+00:00</published><updated>2025-10-22T00:00:00+00:00</updated><id>https://shuvo-iitkgp.github.io/posts/2025/10/books-ml</id><content type="html" xml:base="https://shuvo-iitkgp.github.io/posts/2025/10/books-that-shaped-me/"><![CDATA[<p>Books have shaped how I approach ML — not just as a technical field, but as a way of thinking.</p>

<p>Here are 5 that deeply influenced me:</p>
<ol>
  <li><strong>The Master Algorithm</strong> by Pedro Domingos — A grand tour of learning paradigms.</li>
  <li><strong>The Alignment Problem</strong> by Brian Christian — A must-read on ethics and interpretability.</li>
  <li><strong>Deep Learning</strong> by Goodfellow, Bengio &amp; Courville — The bible of neural networks.</li>
  <li><strong>Weapons of Math Destruction</strong> by Cathy O’Neil — The societal side of data.</li>
  <li><strong>How Minds Change</strong> by David McRaney — Essential for anyone who communicates ideas.</li>
</ol>

<hr />

<h3 id="why-it-matters">Why it matters</h3>
<p>These books helped me see ML as more than code — as a philosophy of learning and understanding.<br />
If you’re early in your ML journey, start with <em>The Alignment Problem</em> — it will change the way you see “responsible AI.”</p>]]></content><author><name>Subhajit Bag</name><email>sbag6@gatech.edu</email><uri>https://shuvo-iitkgp.github.io</uri></author><category term="Reflection" /><category term="Machine Learning" /><category term="Research" /><summary type="html"><![CDATA[Books have shaped how I approach ML — not just as a technical field, but as a way of thinking.]]></summary></entry><entry><title type="html">What is Data Shapley? Measuring the True Value of Data</title><link href="https://shuvo-iitkgp.github.io/posts/2025/10/data-shapley/" rel="alternate" type="text/html" title="What is Data Shapley? Measuring the True Value of Data" /><published>2025-10-15T00:00:00+00:00</published><updated>2025-10-15T00:00:00+00:00</updated><id>https://shuvo-iitkgp.github.io/posts/2025/10/data-shapley</id><content type="html" xml:base="https://shuvo-iitkgp.github.io/posts/2025/10/data-shapley/"><![CDATA[<p>We often focus on model architectures — but what if the most valuable part of your ML system is your <em>data</em>?<br />
<strong>Data Shapley</strong> assigns a contribution score to each training point, measuring its impact on model performance.</p>

<p>In my ongoing project, I use <strong>TreeExplainer</strong> and <strong>validation-based importance computation</strong> to approximate Shapley values efficiently.</p>
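<p>For intuition, here is a generic Monte Carlo Data Shapley sketch with a toy nearest-centroid classifier as the model. This is illustrative only: my project uses TreeExplainer-based approximations, not this loop, and all data below is made up.</p>

```python
import numpy as np

# Generic Monte Carlo Data Shapley sketch (illustrative only).
# Utility = validation accuracy of a toy nearest-centroid classifier.
rng = np.random.default_rng(0)

def utility(train_idx, X, y, X_val, y_val):
    """Validation accuracy when training only on the points in train_idx."""
    if len(train_idx) == 0:
        return 0.5                                  # uninformed baseline
    preds = []
    for xv in X_val:
        best, best_d = 0, np.inf
        for c in (0, 1):                            # nearest class centroid
            pts = X[[i for i in train_idx if y[i] == c]]
            if len(pts) == 0:
                continue
            d = np.linalg.norm(pts.mean(axis=0) - xv)
            if d < best_d:
                best, best_d = c, d
        preds.append(best)
    return float(np.mean(np.array(preds) == y_val))

def data_shapley(X, y, X_val, y_val, n_perm=50):
    """Average each point's marginal contribution over random permutations."""
    n, phi = len(X), np.zeros(len(X))
    for _ in range(n_perm):
        perm = rng.permutation(n)
        prev = utility([], X, y, X_val, y_val)
        for k, i in enumerate(perm):
            cur = utility(perm[:k + 1], X, y, X_val, y_val)
            phi[i] += cur - prev                    # marginal contribution
            prev = cur
    return phi / n_perm

X = np.array([[0.0], [0.2], [1.0], [1.2], [0.1]])
y = np.array([0, 0, 1, 1, 1])                       # last point is mislabeled
X_val = np.array([[0.0], [0.1], [1.0], [1.1]])
y_val = np.array([0, 0, 1, 1])

phi = data_shapley(X, y, X_val, y_val)
print(phi.round(2))                                 # per-point value scores
```

A useful sanity check: by the telescoping sum, the values always add up to utility(full dataset) − utility(empty set), regardless of how many permutations you sample.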

<hr />

<h3 id="why-it-matters">Why it matters</h3>
<p>Knowing which data points help or hurt your model allows:</p>
<ul>
  <li>Smarter dataset curation</li>
  <li>Better fairness and robustness</li>
  <li>Insights into <em>which samples actually matter</em></li>
</ul>

<p>Imagine debugging a biased model not by tweaking hyperparameters — but by identifying the “toxic” data points.</p>]]></content><author><name>Subhajit Bag</name><email>sbag6@gatech.edu</email><uri>https://shuvo-iitkgp.github.io</uri></author><category term="Explainable AI" /><category term="Data Valuation" /><category term="Machine Learning" /><summary type="html"><![CDATA[We often focus on model architectures — but what if the most valuable part of your ML system is your data? Data Shapley assigns a contribution score to each training point, measuring its impact on model performance.]]></summary></entry><entry><title type="html">Enhancing Cybersecurity Risk Assessment using Temporal Knowledge Graphs</title><link href="https://shuvo-iitkgp.github.io/posts/2025/10/cybersecurity-dss/" rel="alternate" type="text/html" title="Enhancing Cybersecurity Risk Assessment using Temporal Knowledge Graphs" /><published>2025-09-13T00:00:00+00:00</published><updated>2025-09-13T00:00:00+00:00</updated><id>https://shuvo-iitkgp.github.io/posts/2025/10/cybersecurity-paper</id><content type="html" xml:base="https://shuvo-iitkgp.github.io/posts/2025/10/cybersecurity-dss/"><![CDATA[<p>My recent publication in <em>Decision Support Systems (Elsevier, 2025)</em> focuses on <strong>temporal knowledge graph-based explainable DSS</strong> for cybersecurity.</p>

<p>We created a dataset of cybersecurity policies from 190 global firms and built a <strong>temporal knowledge graph</strong> to capture entity relations over time. The model then used attention-based mechanisms to classify policy vulnerabilities.</p>

<p>Highlights:</p>
<ul>
  <li>Introduced the first <strong>temporal cybersecurity policy dataset</strong>.</li>
  <li>Automated attention unit selection for interpretability.</li>
  <li>Developed an explainable DSS that identifies and explains vulnerabilities.</li>
</ul>

<p>Link to the paper: <a href="https://www.sciencedirect.com/science/article/abs/pii/S0167923625001277">DOI</a>
YouTube video: coming soon.</p>

<h3 id="why-it-matters">Why it matters</h3>
<p>Modern enterprises face evolving threats. Our framework doesn’t just flag a risky policy — it explains <strong>which rule</strong> and <strong>why</strong> it’s risky, helping companies improve their cybersecurity posture proactively.</p>]]></content><author><name>Subhajit Bag</name><email>sbag6@gatech.edu</email><uri>https://shuvo-iitkgp.github.io</uri></author><category term="Research" /><category term="Decision Support Systems" /><category term="Knowledge Graphs" /><summary type="html"><![CDATA[My recent publication in Decision Support Systems (Elsevier, 2025) focuses on temporal knowledge graph-based explainable DSS for cybersecurity.]]></summary></entry><entry><title type="html">Explaining SENE: Manifold Learning for Distracted Driving Analysis</title><link href="https://shuvo-iitkgp.github.io/posts/2025/10/sene-manifold-learning/" rel="alternate" type="text/html" title="Explaining SENE: Manifold Learning for Distracted Driving Analysis" /><published>2023-04-15T00:00:00+00:00</published><updated>2023-04-15T00:00:00+00:00</updated><id>https://shuvo-iitkgp.github.io/posts/2025/10/sene-paper</id><content type="html" xml:base="https://shuvo-iitkgp.github.io/posts/2025/10/sene-manifold-learning/"><![CDATA[<p>My first research paper, published in <em>Engineering Applications of Artificial Intelligence (2023)</em>, proposed <strong>SENE</strong> — a novel manifold learning technique for analyzing distracted driving.</p>

<p>We developed a method that learns <strong>spatio-temporal embeddings</strong> from driver behavior and road data, enabling interpretable risk mapping across urban areas.</p>

<p>Key takeaways:</p>
<ul>
  <li>Combined <strong>spatio-temporal</strong> and <strong>praxeological</strong> features for the first time.</li>
  <li>Reduced high-dimensional driving data into meaningful manifolds.</li>
  <li>Achieved <strong>91% accuracy</strong> in predicting distraction-related risk.</li>
</ul>

<p>Link to the paper: <a href="https://www.sciencedirect.com/science/article/abs/pii/S095219762300516X">DOI</a>
YouTube video: <a href="#">Video</a> (coming soon)</p>

<h3 id="why-it-matters">Why it matters</h3>
<p>Distracted driving is one of the top causes of accidents. SENE helps policymakers and insurance companies identify high-risk regions and understand <em>why</em> certain driving behaviors lead to accidents — not just <em>that</em> they do.</p>]]></content><author><name>Subhajit Bag</name><email>sbag6@gatech.edu</email><uri>https://shuvo-iitkgp.github.io</uri></author><category term="Research" /><category term="Manifold Learning" /><category term="Explainable AI" /><summary type="html"><![CDATA[My first research paper, published in Engineering Applications of Artificial Intelligence (2023), proposed SENE — a novel manifold learning technique for analyzing distracted driving.]]></summary></entry></feed>