<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>mapika.dev</title>
<link>https://mapika.dev/</link>
<atom:link href="https://mapika.dev/feed.xml" rel="self" type="application/rss+xml"/>
<description>Machine learning, written out properly. Real derivations, interactive equations, no hand-waving.</description>
<language>en</language>
<lastBuildDate>Wed, 01 Jul 2026 12:00:00 GMT</lastBuildDate>
<item>
<title><![CDATA[Softmax Is argmax with a dial]]></title>
<link>https://mapika.dev/essays/07</link>
<guid isPermaLink="true">https://mapika.dev/essays/07</guid>
<pubDate>Wed, 01 Jul 2026 12:00:00 GMT</pubDate>
<description><![CDATA[Softmax as the unique smooth argmax under entropic regularization; temperature as a Lagrange multiplier; annealing as a continuous path from uniform averaging to hard selection.]]></description>
</item>
<item>
<title><![CDATA[Coverage Without belief]]></title>
<link>https://mapika.dev/essays/10</link>
<guid isPermaLink="true">https://mapika.dev/essays/10</guid>
<pubDate>Wed, 01 Jul 2026 12:00:00 GMT</pubDate>
<description><![CDATA[Split conformal prediction produces intervals that contain the truth at least 90% of the time, for any model, any data distribution, at any finite sample size, and all it costs is sorting a list. The entire proof is one ranking argument: a fresh data point can't be told apart from the held-out points used for calibration. In this essay I work through that argument, run it live on data with no mean and no variance, and then explain what the 90% actually means and which single assumption does all the work.]]></description>
</item>
<item>
<title><![CDATA[Proofs That keep secrets]]></title>
<link>https://mapika.dev/essays/11</link>
<guid isPermaLink="true">https://mapika.dev/essays/11</guid>
<pubDate>Wed, 01 Jul 2026 12:00:00 GMT</pubDate>
<description><![CDATA[A zero-knowledge proof lets you convince someone that a statement is true without revealing anything about why it's true. In 1985 Goldwasser, Micali and Rackoff showed this is possible, and in this essay I walk through the classic protocol on a concrete example: a graph repainted with fresh random colors every round, one edge opened per challenge. A cheating prover gets caught with probability that compounds across rounds, and a simulator argument turns "the verifier learned nothing" from a slogan into a theorem.]]></description>
</item>
<item>
<title><![CDATA[Speculative Decoding loses nothing]]></title>
<link>https://mapika.dev/essays/09</link>
<guid isPermaLink="true">https://mapika.dev/essays/09</guid>
<pubDate>Wed, 01 Jul 2026 12:00:00 GMT</pubDate>
<description><![CDATA[During decoding, a language model is memory-bound: generating each new token means reading every weight out of memory again. Speculative decoding speeds this up by letting a small draft model guess several tokens ahead so the big model can check them all in one pass, and a modified rejection rule guarantees the output distribution is exactly the big model's, not an approximation of it. In this essay I prove that exactness in four lines, show that the acceptance rate equals the overlap between the two models' distributions, and work out where the speedup comes from and when it runs out.]]></description>
</item>
<item>
<title><![CDATA[The Oracle regime]]></title>
<link>https://mapika.dev/essays/08</link>
<guid isPermaLink="true">https://mapika.dev/essays/08</guid>
<pubDate>Wed, 01 Jul 2026 12:00:00 GMT</pubDate>
<description><![CDATA[Twenty-one years after the chess machine Hydra crushed Michael Adams, the argument over "AI slop" code is stuck in the years between Deep Blue's famous 1997 win and the quieter matches that actually settled the question. In this essay I take three lessons from chess — what engine analysis did to Tal's famous sacrifices, why software has no equivalent of Stockfish, and how the era of human–machine teams ended — and use them to work out where the burden of proof has flipped: wherever output can be checked, and only there.]]></description>
</item>
<item>
<title><![CDATA[Attention Is a kernel]]></title>
<link>https://mapika.dev/essays/06</link>
<guid isPermaLink="true">https://mapika.dev/essays/06</guid>
<pubDate>Fri, 01 May 2026 12:00:00 GMT</pubDate>
<description><![CDATA[Take away the learned projections and attention is a Nadaraya–Watson estimator, a method from 1964 that predicts by averaging observed values, weighting nearer points more. The softmax supplies the weights, and the projections only learn what counts as near. In this essay I work out the correspondence exactly, use it to read off what attention can and can't do, and mark where the sixty-year-old analogy stops being useful.]]></description>
</item>
<item>
<title><![CDATA[What Adam actually adapts]]></title>
<link>https://mapika.dev/essays/05</link>
<guid isPermaLink="true">https://mapika.dev/essays/05</guid>
<pubDate>Sun, 01 Mar 2026 12:00:00 GMT</pubDate>
<description><![CDATA[Adam's denominator looks like a curvature correction, but what it actually estimates is the typical size of each gradient, and in the memoryless limit it doesn't rescale the gradient so much as throw its magnitude away. In this essay I take the update rule apart term by term: what the denominator really estimates, what stepping by sign alone buys you, why the two averages run on different timescales, and the regime switch hiding inside ε.]]></description>
</item>
<item>
<title><![CDATA[The Lottery Ticket, re-drawn]]></title>
<link>https://mapika.dev/essays/04</link>
<guid isPermaLink="true">https://mapika.dev/essays/04</guid>
<pubDate>Thu, 01 Jan 2026 12:00:00 GMT</pubDate>
<description><![CDATA[A replication of iterative magnitude pruning — train a network, delete the smallest weights, reset the rest, repeat — with tighter controls than the original. The short version: what survives pruning isn't a reusable subnetwork, it's the pairing of a specific pruning pattern with a specific random initialization. In this essay I state the hypothesis precisely, run the two control experiments that rule out the popular reading, look at how the claim had to change at larger scale, and list what actually holds up.]]></description>
</item>
<item>
<title><![CDATA[Networks That never leave home]]></title>
<link>https://mapika.dev/essays/03</link>
<guid isPermaLink="true">https://mapika.dev/essays/03</guid>
<pubDate>Sat, 01 Nov 2025 12:00:00 GMT</pubDate>
<description><![CDATA[Very wide neural networks turn out to be surprisingly simple: they can fit their training data while their weights barely move, so they behave like linear models from start to finish. In this essay I build the neural tangent kernel, the object that describes this regime, explain the scaling accident that makes wide networks lazy, collect what the infinite-width limit genuinely explains, and argue that the interesting parts of deep learning happen exactly where the approximation breaks down.]]></description>
</item>
<item>
<title><![CDATA[Initialization Is a variance budget]]></title>
<link>https://mapika.dev/essays/02</link>
<guid isPermaLink="true">https://mapika.dev/essays/02</guid>
<pubDate>Mon, 01 Sep 2025 12:00:00 GMT</pubDate>
<description><![CDATA[Before training starts, a neural network is a random function, and whether it's a usable one comes down to variances. In this essay I walk through why every layer multiplies the signal's scale by a factor, why that factor has to be exactly one for deep networks to work, and how the standard initialization rules fall out of a two-line calculation you can check numerically.]]></description>
</item>
<item>
<title><![CDATA[Backprop Is just bookkeeping]]></title>
<link>https://mapika.dev/essays/01</link>
<guid isPermaLink="true">https://mapika.dev/essays/01</guid>
<pubDate>Tue, 01 Jul 2025 12:00:00 GMT</pubDate>
<description><![CDATA[Backpropagation has a reputation for being deep magic, but it's the chain rule from calculus plus a careful system for storing intermediate results. In this essay I walk through why computing gradients backwards is so much cheaper than computing them forwards, what the algorithm actually stores, and why the real cost shows up in memory rather than in compute.]]></description>
</item>
</channel>
</rss>
