Posts by Tags

Scalable Influence and Fact Tracing for Large Language Model Pretraining

3 minute read

Published: November 07, 2025

Figure: Difference between classical lexical retrieval and influence-based retrieval for large language models.

Why Language Models Hallucinate: The Epidemic of Penalizing Uncertainty

3 minute read

Published: November 07, 2025

Figure: Binary grading makes “guess when unsure” optimal → higher hallucinations.
Confidence-aware grading (penalize wrong answers; allow IDK) makes abstention rational → lower hallucinations.
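
To make the caption's argument concrete, here is a minimal sketch of the expected-score arithmetic. The scoring scheme is an assumption chosen for illustration (+1 for a correct answer, minus `wrong_penalty` for a wrong one, 0 for "I don't know"), and the function names are hypothetical rather than taken from the post.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering when the answer is right with probability p_correct.

    Assumed scheme: correct = +1, wrong = -wrong_penalty, abstaining (IDK) = 0.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only when guessing beats the abstention score of 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0


# Binary grading (no penalty for wrong answers): guessing wins at any confidence above zero.
binary_low_confidence = should_answer(0.3, wrong_penalty=0.0)

# Confidence-aware grading with penalty 1: the same low-confidence guess now loses to IDK.
penalized_low_confidence = should_answer(0.3, wrong_penalty=1.0)
```

Under this assumed scheme the break-even confidence is `wrong_penalty / (1 + wrong_penalty)`, so a penalty of 0 never rewards abstention, while a penalty of 1 makes abstaining the better choice below 50% confidence.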

Teaching Humanoids Without MoCap: Inside TWIST2’s Portable Data Collection System

2 minute read

Published: November 05, 2025

Motivation

How do we collect humanlike motion data for robots without a $100K motion-capture studio?

What I Learned from Hackathons (and Losing One!)

less than 1 minute read

Published: October 29, 2025

Hackathons have been among the best learning experiences of my career.

What is Data Shapley? Measuring the True Value of Data

less than 1 minute read

Published: October 15, 2025

We often focus on model architectures — but what if the most valuable part of your ML system is your data?
Data Shapley assigns a contribution score to each training point, measuring its impact on model performance.
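
As a rough illustration of what a per-point contribution score looks like, the sketch below computes exact Shapley values on a toy dataset by averaging each point's marginal gain over every ordering of the training set. The utility (validation accuracy of a 1-nearest-neighbour classifier), the data, and the function names are all assumptions made for this example; exact enumeration is only feasible for a handful of points, and practical implementations typically approximate it by sampling permutations.

```python
from itertools import permutations


def utility(train, val):
    """Validation accuracy of a 1-nearest-neighbour classifier (0.0 for an empty training set)."""
    if not train:
        return 0.0
    correct = 0
    for x, y in val:
        nearest = min(train, key=lambda point: abs(point[0] - x))
        correct += nearest[1] == y
    return correct / len(val)


def data_shapley(train, val):
    """Exact Shapley value of each training point: mean marginal gain over all orderings."""
    n = len(train)
    values = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        subset = []
        previous = utility(subset, val)
        for i in order:
            subset.append(train[i])
            current = utility(subset, val)
            values[i] += current - previous
            previous = current
    return [v / len(orderings) for v in values]


# Toy 1-D data as (feature, label); the last training point is deliberately mislabeled.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (0.5, 1)]
val = [(0.2, 0), (0.8, 0), (3.8, 1), (4.2, 1)]
scores = data_shapley(train, val)
```

In this toy setup the scores sum to the full-data validation accuracy, and the mislabeled point receives the smallest score, which is the kind of signal a data-valuation method is meant to surface.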

Enhancing Cybersecurity Risk Assessment using Temporal Knowledge Graphs

less than 1 minute read

Published: September 13, 2025

My recent publication in Decision Support Systems (Elsevier, 2025) presents an explainable decision support system (DSS) for cybersecurity built on temporal knowledge graphs.

Explaining SENE: Manifold Learning for Distracted Driving Analysis

less than 1 minute read

Published: April 15, 2023

My first research paper, published in Engineering Applications of Artificial Intelligence (2023), proposed SENE — a novel manifold learning technique for analyzing distracted driving.

5 Books That Changed How I Think About Machine Learning and Research

less than 1 minute read

Published: October 22, 2025

Books have shaped how I approach ML — not just as a technical field, but as a way of thinking.

Subhajit Bag

-AI

-Hallucination

-Language

-Transparency

AI

AI Projects

Data Valuation

Decision Support Systems

Explainable AI

Hackathons

Humanoids

Knowledge Graphs

LLMs

Machine Learning

Manifold Learning

Models

Reflection

Research

Safety

VR
