<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>~iany/ Math</title><link>https://blog.iany.me/tags/math/</link><description>Recent content in Math «~iany/»</description><language>en-US</language><managingEditor>me@iany.me (Ian Yang)</managingEditor><webMaster>me@iany.me (Ian Yang)</webMaster><copyright>CC-BY-SA 4.0</copyright><lastBuildDate>Fri, 20 Feb 2026 00:00:00 +0800</lastBuildDate><atom:link href="https://blog.iany.me/tags/math/index.xml" rel="self" type="application/rss+xml"/><item><title>Power of Monoid, Beauty of Simplicity</title><link>https://blog.iany.me/2026/02/power-of-monoid-beauty-of-simplicity/</link><pubDate>Fri, 20 Feb 2026 00:00:00 +0800</pubDate><author>me@iany.me (Ian Yang)</author><guid>https://blog.iany.me/2026/02/power-of-monoid-beauty-of-simplicity/</guid><description>&lt;p&gt;A monoid is one of the smallest useful abstractions in algebra: a set closed under an associative binary operation, with an identity element. That simplicity is exactly why it shows up everywhere—from summing numbers and concatenating strings to powering divide-and-conquer algorithms and elegant data structures like finger trees. This post walks through what monoids are, why they give you &amp;ldquo;compute power&amp;rdquo; for free when you can phrase a problem in terms of them, and how to think about choosing the right monoid and predicate when you do.&lt;/p&gt;
&lt;h2 id="what-is-a-monoid"&gt;What is a monoid?&lt;/h2&gt;
&lt;p&gt;A monoid is a set &lt;code&gt;$S$&lt;/code&gt; equipped with a binary operator &lt;code&gt;$\bullet$&lt;/code&gt; and an identity element &lt;code&gt;$e$&lt;/code&gt; (&lt;a href="https://en.wikipedia.org/wiki/Monoid"&gt;Wikipedia&lt;/a&gt;).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The operator is closed on &lt;code&gt;$S$&lt;/code&gt;. For all &lt;code&gt;$a, b \in S$&lt;/code&gt;, the result &lt;code&gt;$a \bullet b$&lt;/code&gt; is also in &lt;code&gt;$S$&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The operator is associative: for all &lt;code&gt;$a,b,c \in S$&lt;/code&gt;, &lt;code&gt;$(a \bullet b) \bullet c = a \bullet (b \bullet c)$&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The identity element &lt;code&gt;$e$&lt;/code&gt; satisfies &lt;code&gt;$e \bullet a = a \bullet e = a$&lt;/code&gt; for all &lt;code&gt;$a \in S$&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, the integers under addition (&lt;code&gt;+&lt;/code&gt;) form a monoid whose identity element is &lt;code&gt;0&lt;/code&gt;. The integers under multiplication (&lt;code&gt;×&lt;/code&gt;) also form a monoid, with identity element &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The set of finite lists with the operator concatenation is a monoid since:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The operator is closed because concatenation of two finite lists is also a finite list.&lt;/li&gt;
&lt;li&gt;The operator is associative, because both &lt;code&gt;$(a \bullet b) \bullet c$&lt;/code&gt; and &lt;code&gt;$a \bullet (b \bullet c)$&lt;/code&gt; result in a new list by placing elements of &lt;code&gt;$a, b, c$&lt;/code&gt; consecutively.&lt;/li&gt;
&lt;li&gt;The identity element is the empty list.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The integers with the operator &lt;code&gt;max&lt;/code&gt; are a counterexample. The operator is closed and associative, but there is no identity element: given any integer &lt;code&gt;$e$&lt;/code&gt;, there&amp;rsquo;s always a smaller integer &lt;code&gt;$a$&lt;/code&gt; such that &lt;code&gt;$e \bullet a = e \ne a$&lt;/code&gt;. However, &lt;code&gt;max&lt;/code&gt; on a set of integers with a lower bound is a monoid. On the non-negative integers, for example, the identity element is the lower bound &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;
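&lt;p&gt;These examples can be written down directly in code. The following is a minimal Python sketch rather than a library API: the &lt;code&gt;Monoid&lt;/code&gt; class and its &lt;code&gt;concat&lt;/code&gt; method are hypothetical names introduced here for illustration, and nothing in it checks the monoid laws.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from functools import reduce


class Monoid:
    """A monoid described by its binary operator and identity element.

    Hypothetical helper for illustration; the laws are assumed, not verified.
    """

    def __init__(self, op, identity):
        self.op = op
        self.identity = identity

    def concat(self, items):
        """Combine items from left to right, starting from the identity."""
        return reduce(self.op, items, self.identity)


int_add = Monoid(lambda a, b: a + b, 0)
int_mul = Monoid(lambda a, b: a * b, 1)
list_concat = Monoid(lambda a, b: a + b, [])
nat_max = Monoid(max, 0)  # a monoid only on the non-negative integers

print(int_add.concat([1, 2, 3, 4]))       # 10
print(list_concat.concat([[1], [2, 3]]))  # [1, 2, 3]
print(nat_max.concat([]))                 # 0, the identity element
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Combining an empty collection returns the identity element, which is one reason the identity is convenient to have.&lt;/p&gt;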
&lt;h2 id="divide-and-conquer-why-associativity-matters"&gt;Divide and conquer: why associativity matters&lt;/h2&gt;
&lt;p&gt;At first glance, associativity may seem too trivial to be useful in programming. However, associativity is what enables powerful divide-and-conquer strategies, where problems can be split into parts, solved independently, and then safely recombined.&lt;/p&gt;
&lt;h3 id="exponentiation-by-squaring"&gt;Exponentiation by Squaring&lt;/h3&gt;
&lt;p&gt;Let’s begin with a simple application: repeatedly applying the binary operator to the same element.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\underbrace{a \bullet a \bullet \cdots \bullet a}_{a \text{ appears } n \text{ times}}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Instead of applying the binary operator &lt;code&gt;$n-1$&lt;/code&gt; times sequentially, we exploit associativity to group every two instances of &lt;code&gt;$a$&lt;/code&gt; together recursively. This gives us a smaller problem when &lt;code&gt;$n$&lt;/code&gt; is even:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\underbrace{(a \bullet a) \bullet \cdots \bullet (a \bullet a)}_{(a \bullet a) \text{ appears } \frac{n}{2} \text{ times}}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When &lt;code&gt;$n$&lt;/code&gt; is odd:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
a \bullet \underbrace{(a \bullet a) \bullet \cdots \bullet (a \bullet a)}_{(a \bullet a) \text{ appears } \frac{n-1}{2} \text{ times}}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We only need to compute &lt;code&gt;$a \bullet a$&lt;/code&gt; once to reduce the problem of size &lt;code&gt;$n$&lt;/code&gt; to one of size &lt;code&gt;$\lfloor n/2 \rfloor$&lt;/code&gt;. Repeating the process on &lt;code&gt;$(a \bullet a)$&lt;/code&gt; gives the &lt;a href="https://en.wikipedia.org/wiki/Exponentiation_by_squaring"&gt;Exponentiation by Squaring&lt;/a&gt; algorithm, which requires at most &lt;code&gt;$\displaystyle 2 \lfloor \log _{2}n\rfloor$&lt;/code&gt; applications of the operator. This bound grows logarithmically, so it is far smaller than &lt;code&gt;$n-1$&lt;/code&gt; for large &lt;code&gt;$n$&lt;/code&gt; (it is already strictly smaller for every &lt;code&gt;$n \ge 6$&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;For integers or real numbers under multiplication (&lt;code&gt;×&lt;/code&gt;), exponentiation by squaring is an efficient algorithm to compute positive integer powers. Since the set of elliptic curve points under point addition forms a monoid, this same method can also be used to compute &lt;a href="https://kb.iany.me/para/lets/c/Cryptography/Elliptic&amp;#43;Curve&amp;#43;Scalar&amp;#43;Multiplication"&gt;Elliptic Curve Scalar Multiplication&lt;/a&gt;.&lt;/p&gt;
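&lt;p&gt;The grouping above can be sketched for any monoid. This is a minimal iterative Python version under stated assumptions: &lt;code&gt;mpower&lt;/code&gt; is a hypothetical name, &lt;code&gt;op&lt;/code&gt; is assumed to be associative with identity &lt;code&gt;identity&lt;/code&gt;, and &lt;code&gt;n&lt;/code&gt; is assumed to be a non-negative integer.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def mpower(op, identity, a, n):
    """Combine n copies of a using the associative operator op."""
    if n != abs(n):
        raise ValueError("n must be non-negative")
    result = identity  # summary of the copies peeled off so far
    base = a           # a combined with itself 1, 2, 4, 8, ... times
    while n != 0:
        if n % 2 == 1:
            # Odd count: peel off one copy of the current base.
            result = op(result, base)
        n //= 2
        if n != 0:
            # Group adjacent pairs: the remaining problem is half the size.
            base = op(base, base)
    return result


print(mpower(lambda x, y: x * y, 1, 3, 13))    # 1594323, which is 3 ** 13
print(mpower(lambda x, y: x + y, "", "ab", 3)) # ababab
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same function works for numbers under multiplication and for strings under concatenation because it uses nothing beyond associativity and the identity element.&lt;/p&gt;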
&lt;h3 id="general-divide-and-conquer-search-algorithm"&gt;General Divide-and-Conquer Search Algorithm&lt;/h3&gt;
&lt;p&gt;We can generalize the divide-and-conquer method to search for an element in a sequence, relying solely on associativity.&lt;/p&gt;
&lt;p&gt;The result of applying the monoid operator to a sequence from left to right serves as a summary of that sequence. If a predicate can determine whether a given element is in the sequence based solely on this summary, we can devise a general divide-and-conquer search algorithm.&lt;/p&gt;
&lt;p&gt;Let’s say we want to search for a target element in the sequence &lt;code&gt;$t_1, \ldots, t_n$&lt;/code&gt;, where each &lt;code&gt;$t_i$&lt;/code&gt; belongs to a monoid &lt;code&gt;$(S, \bullet, e)$&lt;/code&gt;. Here, &lt;code&gt;$S$&lt;/code&gt; is the underlying set, &lt;code&gt;$\bullet$&lt;/code&gt; is the binary operation, and &lt;code&gt;$e$&lt;/code&gt; is the identity element.&lt;/p&gt;
&lt;p&gt;We don&amp;rsquo;t yet know which kinds of predicate work for the search algorithm. As a first guess, let the predicate &lt;code&gt;$p$&lt;/code&gt; be a function of the monoid &amp;ldquo;summary&amp;rdquo; such that &lt;code&gt;$p(t_1 \bullet \cdots \bullet t_n)$&lt;/code&gt; is true if and only if the target element &lt;code&gt;$t$&lt;/code&gt; is in the sequence &lt;code&gt;$t_1, \dots, t_n$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Assume that &lt;code&gt;$p(t_1 \bullet \cdots \bullet t_n)$&lt;/code&gt; is true, so the target element &lt;code&gt;$t$&lt;/code&gt; is in the sequence. We divide the sequence into two halves: &lt;code&gt;$t_1,\ldots,t_k$&lt;/code&gt; and &lt;code&gt;$t_{k+1},\ldots,t_n$&lt;/code&gt;, where &lt;code&gt;$1 \le k \lt n$&lt;/code&gt; so that both halves are nonempty. We then evaluate &lt;code&gt;$p(t_1 \bullet \cdots \bullet t_k)$&lt;/code&gt; to determine whether the target element lies in the first or second half, and continue the search there.&lt;/p&gt;
&lt;p&gt;Based on this observation, we can deduce the following property of the predicate: there exists an index &lt;code&gt;$x$&lt;/code&gt; such that&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
p(t_1 \bullet \cdots \bullet t_k) := \begin{cases}
\text{false} &amp;amp; \text{if } k &amp;lt; x, \\
\text{true} &amp;amp; \text{if } k \ge x.
\end{cases}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;$t_x$&lt;/code&gt; is the target element if such &lt;code&gt;$x$&lt;/code&gt; exists; otherwise, the target element does not exist in the sequence.&lt;/p&gt;
&lt;p&gt;Intuitively, the target element is the turning point at which the predicate on the running summary changes from false to true.&lt;/p&gt;
&lt;figure class="kg-image-card"&gt;
&lt;img class="kg-image"
alt="Monoid Search Turning Point.excalidraw"
loading="lazy"
src="https://blog.iany.me/2026/02/power-of-monoid-beauty-of-simplicity/Monoid%20search%20turning%20point.excalidraw.svg" /&gt;
&lt;/figure&gt;
&lt;p&gt;Note that &lt;code&gt;$p$&lt;/code&gt; is meaningful only on the summary of a prefix of the whole sequence. If we continue the search in the second half, we must therefore carry along the summary of the prefix already scanned.&lt;/p&gt;
&lt;p&gt;Now we can define the search algorithm &lt;code&gt;$\mathrm{Search}(p, s, \{t_i,\ldots,t_j\})$&lt;/code&gt; where&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;$s$&lt;/code&gt; is the summary of scanned prefix &lt;code&gt;$t_1 \bullet \cdots \bullet t_{i-1}$&lt;/code&gt; when &lt;code&gt;$i &amp;gt; 1$&lt;/code&gt; or the identity element &lt;code&gt;$e$&lt;/code&gt; otherwise.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$t_i, \ldots, t_j$&lt;/code&gt; is the sub-range to search next.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$p$&lt;/code&gt; is the predicate as defined above.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The algorithm proceeds as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;$p(s \bullet t_i \bullet \cdots \bullet t_j)$&lt;/code&gt; is false, the target element does not exist. The algorithm aborts with an error.&lt;/li&gt;
&lt;li&gt;Otherwise, if there&amp;rsquo;s only one element (&lt;code&gt;$i = j$&lt;/code&gt;), &lt;code&gt;$t_i$&lt;/code&gt; is the target element. The algorithm terminates and returns it.&lt;/li&gt;
&lt;li&gt;Otherwise, choose a pivot index &lt;code&gt;$i \le m \lt j$&lt;/code&gt; to split the sequence into two nonempty halves: &lt;code&gt;$t_i, \ldots, t_m$&lt;/code&gt; and &lt;code&gt;$t_{m+1},\ldots,t_j$&lt;/code&gt;. Test &lt;code&gt;$p(s \bullet t_i \bullet \cdots \bullet t_m)$&lt;/code&gt;:
&lt;ul&gt;
&lt;li&gt;If it is true, continue the search in the first half: &lt;code&gt;$\mathrm{Search}(p, s, \{t_i,\ldots,t_m\})$&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Otherwise, continue the search in the second half: &lt;code&gt;$\mathrm{Search}(p, s \bullet (t_i \bullet \cdots \bullet t_m),\{t_{m+1},\ldots,t_j\})$&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The algorithm starts with &lt;code&gt;$\mathrm{Search}(p, e, \{t_1, \ldots, t_n\})$&lt;/code&gt;.&lt;/p&gt;
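&lt;p&gt;The steps above can be sketched in Python as follows. This is an illustrative version under stated assumptions, not a definitive implementation: the names &lt;code&gt;search&lt;/code&gt;, &lt;code&gt;summarize&lt;/code&gt;, and &lt;code&gt;go&lt;/code&gt; are hypothetical, the sequence is a plain list, and the result is a zero-based index or &lt;code&gt;None&lt;/code&gt; instead of an error. The existence check is done once up front, since every recursive call is made on a range already known to contain the turning point. Summaries are recomputed by folding each time, so this sketch is not yet efficient.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from functools import reduce


def search(p, op, identity, items):
    """Return the zero-based index of the turning point of p, or None."""

    def summarize(lo, hi):
        # Monoid summary of items[lo:hi]; the identity for an empty range.
        return reduce(op, items[lo:hi], identity)

    def go(s, lo, hi):
        # s is the summary of the scanned prefix items[:lo];
        # the turning point is known to lie in items[lo:hi].
        if hi - lo == 1:
            return lo
        mid = (lo + hi) // 2  # both halves are nonempty
        left = op(s, summarize(lo, mid))
        if p(left):
            return go(s, lo, mid)
        return go(left, mid, hi)

    if not items or not p(summarize(0, len(items))):
        return None  # the target element does not exist
    return go(identity, 0, len(items))


# Usage with the monoid (non-negative integers, max, 0):
# find the first position where the running maximum reaches 9.
print(search(lambda s: s == 9, max, 0, [3, 9, 2, 9]))  # 1
&lt;/code&gt;&lt;/pre&gt;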
&lt;h3 id="application-random-access-sequence"&gt;Application: Random-Access Sequence&lt;/h3&gt;
&lt;p&gt;One application of the search algorithm is accessing the element at a given index in the sequence.&lt;/p&gt;
&lt;p&gt;We initialize the sequence to all 1s and use the monoid of non-negative integers with addition &lt;code&gt;$(\mathbb{N},+,0)$&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\underbrace{1, \ldots, 1}_{n \text{ times}}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The predicate to find the i-th (starting from 0) element is:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
p_i(s) := s &amp;gt; i
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It may seem silly to search for the i-th 1 in a sequence of 1s, but we can store any data in the sequence and attach the monoid values as annotations to guide the search algorithm.&lt;/p&gt;
&lt;figure class="kg-image-card"&gt;
&lt;img class="kg-image"
alt="Sequence Monoid Annotations.excalidraw"
loading="lazy"
src="https://blog.iany.me/2026/02/power-of-monoid-beauty-of-simplicity/Sequence%20Monoid%20Annotations.excalidraw.svg" /&gt;
&lt;/figure&gt;
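&lt;p&gt;As a small worked trace (the values &lt;code&gt;$n = 6$&lt;/code&gt; and &lt;code&gt;$i = 3$&lt;/code&gt; are chosen here only for illustration), the running summary of the first &lt;code&gt;$k$&lt;/code&gt; annotations is &lt;code&gt;$s_k = k$&lt;/code&gt;. The predicate &lt;code&gt;$p_3$&lt;/code&gt; turns true at &lt;code&gt;$k = 4$&lt;/code&gt;, which is the element at zero-based index 3:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\begin{array}{c|cccccc}
k &amp;amp; 1 &amp;amp; 2 &amp;amp; 3 &amp;amp; 4 &amp;amp; 5 &amp;amp; 6 \\ \hline
s_k &amp;amp; 1 &amp;amp; 2 &amp;amp; 3 &amp;amp; 4 &amp;amp; 5 &amp;amp; 6 \\
p_3(s_k) &amp;amp; \text{false} &amp;amp; \text{false} &amp;amp; \text{false} &amp;amp; \text{true} &amp;amp; \text{true} &amp;amp; \text{true}
\end{array}
\]
&lt;/code&gt;&lt;/pre&gt;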
&lt;h3 id="application-max-priority-queue"&gt;Application: Max-Priority Queue&lt;/h3&gt;
&lt;p&gt;Another application is finding the element with the max priority.&lt;/p&gt;
&lt;p&gt;We use the monoid of non-negative integers with operator &lt;code&gt;max&lt;/code&gt; &lt;code&gt;$(\mathbb{N},\mathrm{max},0)$&lt;/code&gt; and assume that the maximum value has the maximum priority.&lt;/p&gt;
&lt;p&gt;The predicate to find the element with the max priority is&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
p(s) := s = m
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;where &lt;code&gt;$m$&lt;/code&gt; is the monoid summary of the entire sequence, that is, the maximum value in the sequence. The predicate checks whether the summary of the scanned prefix equals &lt;code&gt;$m$&lt;/code&gt;, which first happens at the first element with the maximum priority.&lt;/p&gt;
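&lt;p&gt;For example, with the illustrative priorities &lt;code&gt;$2, 7, 4, 7, 1$&lt;/code&gt; (made up for this trace), the summary of the whole sequence is &lt;code&gt;$m = 7$&lt;/code&gt;, and the running maximum &lt;code&gt;$s_k$&lt;/code&gt; first equals &lt;code&gt;$m$&lt;/code&gt; at &lt;code&gt;$k = 2$&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\begin{array}{c|ccccc}
k &amp;amp; 1 &amp;amp; 2 &amp;amp; 3 &amp;amp; 4 &amp;amp; 5 \\ \hline
t_k &amp;amp; 2 &amp;amp; 7 &amp;amp; 4 &amp;amp; 7 &amp;amp; 1 \\
s_k = \max(t_1, \ldots, t_k) &amp;amp; 2 &amp;amp; 7 &amp;amp; 7 &amp;amp; 7 &amp;amp; 7 \\
p(s_k) &amp;amp; \text{false} &amp;amp; \text{true} &amp;amp; \text{true} &amp;amp; \text{true} &amp;amp; \text{true}
\end{array}
\]
&lt;/code&gt;&lt;/pre&gt;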
&lt;h3 id="annotated-search-tree"&gt;Annotated Search Tree&lt;/h3&gt;
&lt;p&gt;A natural way to support the divide-and-conquer search is an &lt;em&gt;annotated binary tree&lt;/em&gt;. Store the sequence elements at the leaves, and at each node store the monoid summary of the subtree—e.g. the sum of lengths or the maximum priority in that subtree. The predicate can then be evaluated on the left subtree’s annotation to decide whether to descend left or right, and the prefix summary is updated when going right by combining it with the left subtree’s summary.&lt;/p&gt;
&lt;figure class="kg-image-card"&gt;
&lt;img class="kg-image"
alt="Annotated Binary Tree.excalidraw"
loading="lazy"
src="https://blog.iany.me/2026/02/power-of-monoid-beauty-of-simplicity/Annotated%20binary%20tree.excalidraw.svg" /&gt;
&lt;/figure&gt;
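&lt;p&gt;A minimal Python sketch of such an annotated tree is shown below. It is an illustration under stated assumptions rather than a finished data structure: &lt;code&gt;Leaf&lt;/code&gt;, &lt;code&gt;Node&lt;/code&gt;, &lt;code&gt;build&lt;/code&gt;, and &lt;code&gt;tree_search&lt;/code&gt; are hypothetical names, &lt;code&gt;measure&lt;/code&gt; maps an element to its monoid annotation, and the tree is built once from a nonempty list with no insertion or rebalancing.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;class Leaf:
    def __init__(self, value, annotation):
        self.value = value
        self.summary = annotation


class Node:
    def __init__(self, op, left, right):
        self.left = left
        self.right = right
        # Cached monoid summary of the whole subtree.
        self.summary = op(left.summary, right.summary)


def build(op, measure, values):
    """Build a balanced annotated tree over a nonempty list of values."""
    if len(values) == 1:
        return Leaf(values[0], measure(values[0]))
    mid = len(values) // 2
    return Node(op, build(op, measure, values[:mid]),
                build(op, measure, values[mid:]))


def tree_search(p, op, identity, tree):
    """Return the value at the turning point of p, or None if p never holds."""
    s = identity  # summary of everything to the left of the current subtree
    if not p(op(s, tree.summary)):
        return None
    while isinstance(tree, Node):
        left = op(s, tree.left.summary)
        if p(left):
            tree = tree.left
        else:
            s = left  # going right: absorb the left subtree into the prefix
            tree = tree.right
    return tree.value


# Max-priority lookup: each element is annotated with its priority.
jobs = [("a", 3), ("b", 9), ("c", 2)]
tree = build(max, lambda job: job[1], jobs)
print(tree_search(lambda s: s == tree.summary, max, 0, tree))  # ('b', 9)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because each node caches its summary, every step of the descent costs one application of the operator and one predicate test, so a search takes time proportional to the height of the tree.&lt;/p&gt;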
&lt;p&gt;A plain binary tree can degenerate to a list in the worst case, so operations may become linear. A more advanced structure, the &lt;em&gt;finger tree&lt;/em&gt;&lt;sup id="fnref:1"&gt;&lt;span class="footnote-ref" role="doc-noteref"&gt;1&lt;/span&gt;&lt;/sup&gt;, keeps the tree balanced and supports efficient access at both ends and in the middle; each node carries a monoidal “measure” of its subtree, and the same search strategy applies. In Haskell, &lt;a href="https://hackage-content.haskell.org/package/containers-0.8/docs/Data-Sequence.html"&gt;Data.Sequence&lt;/a&gt; from the &lt;code&gt;containers&lt;/code&gt; library implements sequences as finger trees with size (length) as the measure, giving &lt;code&gt;$O(\log n)$&lt;/code&gt; indexing, splitting, and concatenation.&lt;/p&gt;
&lt;h3 id="utility-of-the-identity-element"&gt;Utility of the Identity Element&lt;/h3&gt;
&lt;p&gt;The general divide-and-conquer algorithm does not require a monoid; a semigroup is enough. A semigroup is a set equipped with a closed, associative binary operator that is not required to have an identity element. The presence of an identity element makes monoids more convenient to work with.&lt;/p&gt;
&lt;p&gt;The identity element serves as a natural default value or starting point for algorithms. For instance, in the search algorithm, the summary of the scanned prefix is initialized to &lt;code&gt;$e$&lt;/code&gt;. Without an identity element, we need an additional flag to indicate whether any prefix has been scanned, and the algorithm would have to branch conditionally based on that flag.&lt;/p&gt;
&lt;h2 id="the-art-of-choosing-monoid-and-predicate"&gt;The art of choosing monoid and predicate&lt;/h2&gt;
&lt;p&gt;In the random-access example we used &lt;code&gt;$(\mathbb{N}, +, 0)$&lt;/code&gt; and annotated each position with &lt;code&gt;$1$&lt;/code&gt;—the summary of a segment is its length, and the predicate &lt;code&gt;$s &amp;gt; i$&lt;/code&gt; tells us whether the &lt;code&gt;$i$&lt;/code&gt;-th element lies in the prefix we have so far. In the max-priority queue we used &lt;code&gt;$(\mathbb{N}, \max, 0)$&lt;/code&gt; (or a bounded variant): the summary is the maximum value in the segment, and the predicate &lt;code&gt;$s = m$&lt;/code&gt; identifies the segment that contains the global maximum. In both cases, the monoid was chosen so that the &lt;em&gt;combined&lt;/em&gt; summary over a range is exactly what the predicate needs to decide where to go next.&lt;/p&gt;
&lt;p&gt;The flip side is that finding both the right monoid and the right predicate can be tricky. At each step the search has access only to the monoid summary of the prefix (or segment) seen so far, so the predicate must be decided from that summary alone. The monoid must be rich enough to supply the information the predicate needs. Sometimes the natural summary (e.g. sum or max) suggests the predicate (e.g. &lt;code&gt;$s &amp;gt; i$&lt;/code&gt; or &lt;code&gt;$s = m$&lt;/code&gt;). Sometimes you must try a different carrier or operation, or encode extra information into the monoid (e.g. pairs or custom types), so that the predicate can be expressed. There is no universal recipe—it is a matter of design and experimentation. Reframe the problem as: “What do I need to know about a segment to decide the next step?” Then choose a monoid that can represent that knowledge and a predicate that uses it.&lt;/p&gt;
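&lt;p&gt;As one hedged illustration of encoding extra information (this pairing is an example added here, not one of the applications above), two monoids can be combined componentwise so that a single annotation serves both predicates:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
(\mathbb{N} \times \mathbb{N},\ \bullet,\ (0, 0)), \qquad
(a_1, b_1) \bullet (a_2, b_2) := (a_1 + a_2,\ \max(b_1, b_2))
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Annotating each element with the pair (1, priority) makes the summary of a segment the pair (length, maximum priority). The operation is associative because each component is, so the random-access predicate can read the first component and the max-priority predicate can read the second.&lt;/p&gt;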
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;Hinze, R., &amp;amp; Paterson, R. (2006). Finger trees: A simple general-purpose data structure. &lt;em&gt;Journal of Functional Programming, 16&lt;/em&gt;(2), 197–217. Cambridge University Press. &lt;a href="https://www.cs.ox.ac.uk/ralf.hinze/publications/FingerTrees.pdf"&gt;https://www.cs.ox.ac.uk/ralf.hinze/publications/FingerTrees.pdf&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description><category domain="https://blog.iany.me/post/">Posts</category><category domain="https://blog.iany.me/tags/algorithm/">Algorithm</category><category domain="https://blog.iany.me/tags/math/">Math</category><category domain="https://blog.iany.me/tags/programming/">Programming</category></item><item><title>Study on Quotient Spaces</title><link>https://blog.iany.me/2025/11/study-on-quotient-spaces/</link><pubDate>Tue, 18 Nov 2025 21:23:25 +0800</pubDate><author>me@iany.me (Ian Yang)</author><guid>https://blog.iany.me/2025/11/study-on-quotient-spaces/</guid><description>&lt;p&gt;I&amp;rsquo;m reading &lt;em&gt;Linear Algebra Done Right&lt;/em&gt; by Axler and found the section on quotient spaces difficult to understand, so I researched and took these notes.&lt;/p&gt;
&lt;h2 id="definitions"&gt;Definitions&lt;/h2&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.95 notation: $v + U$
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
&lt;p&gt;Suppose &lt;code&gt;$v \in V$&lt;/code&gt; and &lt;code&gt;$U \subseteq V$&lt;/code&gt;. Then &lt;code&gt;$v + U$&lt;/code&gt; is the subset of &lt;code&gt;$V$&lt;/code&gt; defined by&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[v + U = \{v + u : u \in U\}.\]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;The set &lt;code&gt;$v + U$&lt;/code&gt; is also called a translate. &lt;strong&gt;Note&lt;/strong&gt; that a translate is a set, not a single vector.&lt;/p&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.97 definition: &lt;em&gt;translate&lt;/em&gt;
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
Suppose &lt;code&gt;$v \in V$&lt;/code&gt; and &lt;code&gt;$U \subseteq V$&lt;/code&gt;, the set &lt;code&gt;$v + U$&lt;/code&gt; is said to be a &lt;em&gt;translate&lt;/em&gt; of &lt;code&gt;$U$&lt;/code&gt;.
&lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;The quotient space is the set of all translates of &lt;code&gt;$U$&lt;/code&gt; (a set of sets):&lt;/p&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.99 definition: &lt;em&gt;quotient space&lt;/em&gt;, $V/U$
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
&lt;p&gt;Suppose &lt;code&gt;$U$&lt;/code&gt; is a subspace of &lt;code&gt;$V$&lt;/code&gt;. Then the &lt;em&gt;quotient space&lt;/em&gt; &lt;code&gt;$V/U$&lt;/code&gt; is the set of all translates of &lt;code&gt;$U$&lt;/code&gt;. Thus&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[V/U = \{v + U : v \in V\}.\]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;The quotient space is a set of sets. Different vectors can name the same member: for some &lt;code&gt;$v_1 \ne v_2$&lt;/code&gt; in &lt;code&gt;$V$&lt;/code&gt;, &lt;code&gt;$v_1 + U$&lt;/code&gt; and &lt;code&gt;$v_2 + U$&lt;/code&gt; are the identical set, which appears only once in &lt;code&gt;$V/U$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A quotient space &lt;code&gt;$V/U$&lt;/code&gt; is formed by &amp;ldquo;collapsing&amp;rdquo; a subspace &lt;code&gt;$U$&lt;/code&gt; to zero within a larger vector space &lt;code&gt;$V$&lt;/code&gt;. This construction is based on an equivalence relation where two vectors &lt;code&gt;$x, y \in V$&lt;/code&gt; are considered equivalent if their difference lies in &lt;code&gt;$U$&lt;/code&gt;—that is, &lt;code&gt;$x \sim y$&lt;/code&gt; if and only if &lt;code&gt;$x - y \in U$&lt;/code&gt;. &lt;a href="https://en.wikipedia.org/wiki/Quotient_space_%28linear_algebra%29"&gt;wikipedia&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="lemmas"&gt;Lemmas&lt;/h2&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.101 &lt;em&gt;two translates of a subspace are equal or disjoint&lt;/em&gt;
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
&lt;p&gt;Suppose &lt;code&gt;$U$&lt;/code&gt; is a subspace of &lt;code&gt;$V$&lt;/code&gt; and &lt;code&gt;$v, w \in V$&lt;/code&gt;. Then&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
v - w \in U \iff v + U = w + U \iff (v + U) \cap (w + U) \neq \emptyset
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;If two translates are not disjoint (their intersection is not empty), they must be equal. So any two translates are either equal or disjoint.&lt;/p&gt;
&lt;p&gt;All distinct translates of a subspace are disjoint. Every &lt;code&gt;$v \in V$&lt;/code&gt; belongs to exactly one translate, namely &lt;code&gt;$v + U$&lt;/code&gt;.&lt;/p&gt;
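&lt;p&gt;For a concrete picture (an illustrative choice, not taken from the book), let &lt;code&gt;$V = \mathbf{R}^2$&lt;/code&gt; and let &lt;code&gt;$U$&lt;/code&gt; be the horizontal axis. The translates of &lt;code&gt;$U$&lt;/code&gt; are then the horizontal lines:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[\begin{align*}
U &amp;amp;= \{(x, 0) : x \in \mathbf{R}\} \\
(1, 2) + U &amp;amp;= (5, 2) + U \quad \text{because } (1, 2) - (5, 2) = (-4, 0) \in U \\
\big((1, 2) + U\big) \cap \big((1, 3) + U\big) &amp;amp;= \emptyset \quad \text{because } (1, 2) - (1, 3) = (0, -1) \notin U
\end{align*}\]
&lt;/code&gt;&lt;/pre&gt;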
&lt;p&gt;Since the quotient space &lt;code&gt;$V/U$&lt;/code&gt; is the set of translates of a subspace, it partitions &lt;code&gt;$V$&lt;/code&gt; into disjoint subsets. By using the definition of the quotient map&lt;/p&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.104 definition: &lt;em&gt;quotient map&lt;/em&gt;, $\pi$
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
&lt;p&gt;Suppose &lt;code&gt;$U$&lt;/code&gt; is a subspace of &lt;code&gt;$V$&lt;/code&gt;. The &lt;em&gt;quotient map&lt;/em&gt; &lt;code&gt;$\pi : V \to V/U$&lt;/code&gt; is the linear map defined by&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[\pi(v) = v + U\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;for each &lt;code&gt;$v \in V$&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;We can write that&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\pi(v_1) = \pi(v_2) \iff v_1 - v_2 \in U
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The quotient map has two essential properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;null space&lt;/strong&gt; of &lt;code&gt;$\pi$&lt;/code&gt; is exactly the subspace &lt;code&gt;$U$&lt;/code&gt;, because &lt;code&gt;$v+U=0+U \iff v-0 \in U \iff v \in U$&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;range&lt;/strong&gt; of &lt;code&gt;$\pi$&lt;/code&gt; is the entire quotient space &lt;code&gt;$V/U$&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="quotient-space-is-a-vector-space"&gt;Quotient Space Is a Vector Space&lt;/h2&gt;
&lt;p&gt;First define the addition and scalar multiplication operations:&lt;/p&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.102 definition: &lt;em&gt;addition and scalar multiplication on&lt;/em&gt; $V/U$
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
&lt;p&gt;Suppose &lt;code&gt;$U$&lt;/code&gt; is a subspace of &lt;code&gt;$V$&lt;/code&gt;. Then addition and scalar multiplication are defined on &lt;code&gt;$V/U$&lt;/code&gt; by&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[\begin{align*}
(v + U) + (w + U) &amp;amp;= (v + w) + U \\
\lambda(v + U) &amp;amp;= (\lambda v) + U
\end{align*}\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;for all &lt;code&gt;$v, w \in V$&lt;/code&gt; and &lt;code&gt;$\lambda \in \mathbf{F}$&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;&lt;code&gt;$v+U$&lt;/code&gt; is not the unique way to represent a member of &lt;code&gt;$V/U$&lt;/code&gt;, because there may exist &lt;code&gt;$v'\ne v$&lt;/code&gt; such that &lt;code&gt;$v + U = v' + U$&lt;/code&gt;. The operations make sense only if the choice of &lt;code&gt;$v$&lt;/code&gt; to represent a translate makes no difference.&lt;/p&gt;
&lt;p&gt;Specifically, suppose &lt;code&gt;$v_1, v_2, w_1, w_2 \in V$&lt;/code&gt; such that&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
v_1 + U = v_2 + U \quad\textrm{and}\quad w_1 + U = w_2 + U
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;From the addition definition:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\begin{align*}
(v_1+U) + (w_1+U) &amp;amp;= (v_1 + w_1) + U \\
(v_2+U) + (w_2+U) &amp;amp;= (v_2 + w_2) + U
\end{align*}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The left sides of the two equations are different representations of the same sum, so we must show that the right sides are equal: &lt;code&gt;$(v_1 + w_1)+U=(v_2+w_2)+U$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This applies to scalar multiplication as well:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\begin{align*}
\lambda(v_1 + U) &amp;amp;= (\lambda v_1) + U \\
\lambda(v_2 + U) &amp;amp;= (\lambda v_2) + U
\end{align*}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We must show that &lt;code&gt;$(\lambda v_1) + U = (\lambda v_2) + U$&lt;/code&gt;.&lt;/p&gt;
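&lt;p&gt;Both facts follow from 3.101 together with the closure of the subspace &lt;code&gt;$U$&lt;/code&gt; under addition and scalar multiplication. A sketch of the argument, starting from &lt;code&gt;$v_1 + U = v_2 + U$&lt;/code&gt; and &lt;code&gt;$w_1 + U = w_2 + U$&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[\begin{align*}
v_1 - v_2 \in U \text{ and } w_1 - w_2 \in U
&amp;amp;\implies (v_1 + w_1) - (v_2 + w_2) = (v_1 - v_2) + (w_1 - w_2) \in U \\
&amp;amp;\implies (v_1 + w_1) + U = (v_2 + w_2) + U \\
v_1 - v_2 \in U
&amp;amp;\implies \lambda v_1 - \lambda v_2 = \lambda (v_1 - v_2) \in U \\
&amp;amp;\implies (\lambda v_1) + U = (\lambda v_2) + U
\end{align*}\]
&lt;/code&gt;&lt;/pre&gt;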
&lt;h2 id="dimension"&gt;Dimension&lt;/h2&gt;
&lt;p&gt;The dimension of the quotient space is given by a simple subtraction, relating the dimension of &lt;code&gt;$V/U$&lt;/code&gt; to the &amp;ldquo;lost&amp;rdquo; dimension of &lt;code&gt;$U$&lt;/code&gt;:&lt;/p&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.105 &lt;em&gt;dimension of quotient space&lt;/em&gt;
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
&lt;p&gt;Suppose &lt;code&gt;$V$&lt;/code&gt; is finite-dimensional and &lt;code&gt;$U$&lt;/code&gt; is a subspace of &lt;code&gt;$V$&lt;/code&gt;. Then&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[\text{dim } V/U = \text{dim }V - \text{dim }U.\]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/details&gt;
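&lt;p&gt;For instance (an example chosen here for illustration), take &lt;code&gt;$V = \mathbf{R}^3$&lt;/code&gt; and let &lt;code&gt;$U$&lt;/code&gt; be the vertical axis:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
U = \{(0, 0, z) : z \in \mathbf{R}\}, \qquad \text{dim } \mathbf{R}^3/U = 3 - 1 = 2.
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each translate &lt;code&gt;$(x, y, 0) + U$&lt;/code&gt; is a vertical line, and it is determined by the two coordinates &lt;code&gt;$(x, y)$&lt;/code&gt;, which matches the dimension 2.&lt;/p&gt;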
&lt;h2 id="linear-map-from-vnull-t-to-w"&gt;Linear Map from V/(null T) to W&lt;/h2&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.106 notation: $\widetilde{T}$
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
&lt;p&gt;Suppose &lt;code&gt;$T \in \mathcal{L}(V, W)$&lt;/code&gt;. Define &lt;code&gt;$\widetilde{T}: V/(\text{null } T) \to W$&lt;/code&gt; by&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[\widetilde{T}(v + \text{null } T) = Tv.\]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;Think of merging inputs having the same output. These inputs will be the same input in the quotient space &lt;code&gt;$V/(\text{null } T)$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For any &lt;code&gt;$v_1, v_2 \in V$&lt;/code&gt; such that &lt;code&gt;$Tv_1 = Tv_2$&lt;/code&gt;, we have &lt;code&gt;$v_1 - v_2 \in \mathrm{null}\, T$&lt;/code&gt;, so &lt;code&gt;$v_1 + \mathrm{null}\, T$&lt;/code&gt; and &lt;code&gt;$v_2 + \mathrm{null}\, T$&lt;/code&gt; are the same element of &lt;code&gt;$V/(\mathrm{null}\, T)$&lt;/code&gt;. This makes &lt;code&gt;$\widetilde{T}$&lt;/code&gt; injective. Because &lt;code&gt;$\mathrm{range}\,\widetilde{T}=\mathrm{range}\, T$&lt;/code&gt;, &lt;code&gt;$\widetilde{T}$&lt;/code&gt; is also surjective onto &lt;code&gt;$\mathrm{range}\, T$&lt;/code&gt;.&lt;/p&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.63 &lt;em&gt;invertibility&lt;/em&gt; $\iff$ &lt;em&gt;injectivity and surjectivity&lt;/em&gt;
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
A linear map is invertible if and only if it is injective and surjective.
&lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;3.63 shows us that &lt;code&gt;$\widetilde{T}$&lt;/code&gt; is invertible, and according to the definition of isomorphic, &lt;code&gt;$V/(\mathrm{null}\, T)$&lt;/code&gt; and &lt;code&gt;$\mathrm{range}\,T$&lt;/code&gt; are isomorphic vector spaces and &lt;code&gt;$\widetilde{T}$&lt;/code&gt; is their isomorphism.&lt;/p&gt;
&lt;details open disabled class="kg-card kg-callout kg-callout-definition" data-callout-type="definition"&gt;
&lt;summary class="kg-callout-title" tabindex="-1"&gt;
&lt;i aria-hidden="true" class="kg-callout-type fas fa-book"&gt;&lt;/i&gt;
3.69 definition: &lt;em&gt;isomorphism, isomorphic&lt;/em&gt;
&lt;/summary&gt;
&lt;div class="kg-callout-content"&gt;
&lt;ul&gt;
&lt;li&gt;An &lt;em&gt;isomorphism&lt;/em&gt; is an invertible linear map.&lt;/li&gt;
&lt;li&gt;Two vector spaces are called isomorphic if there is an isomorphism from one vector space onto the other one.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/details&gt;
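&lt;p&gt;As a concrete sketch (the map below is chosen here for illustration), let &lt;code&gt;$T : \mathbf{R}^2 \to \mathbf{R}$&lt;/code&gt; be the projection &lt;code&gt;$T(x, y) = x$&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[\begin{align*}
\text{null } T &amp;amp;= \{(0, y) : y \in \mathbf{R}\} \\
(x, y) + \text{null } T &amp;amp;= \{(x, y') : y' \in \mathbf{R}\} \\
\widetilde{T}\big((x, y) + \text{null } T\big) &amp;amp;= x
\end{align*}\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each translate is a vertical line, and &lt;code&gt;$\widetilde{T}$&lt;/code&gt; sends the line to the &lt;code&gt;$x$&lt;/code&gt;-coordinate shared by all of its points. Distinct lines have distinct images and every real number is reached, so &lt;code&gt;$\widetilde{T}$&lt;/code&gt; is an isomorphism from &lt;code&gt;$\mathbf{R}^2/(\text{null } T)$&lt;/code&gt; onto &lt;code&gt;$\text{range } T = \mathbf{R}$&lt;/code&gt;.&lt;/p&gt;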
&lt;p&gt;One of the key uses of &lt;code&gt;$\widetilde{T}$&lt;/code&gt; is demonstrating a canonical isomorphism. For any linear map &lt;code&gt;$T \in \mathcal{L}(V, W)$&lt;/code&gt;, the quotient space &lt;code&gt;$V/(\text{null } T)$&lt;/code&gt; is isomorphic to the image space &lt;code&gt;$\text{range } T$&lt;/code&gt;. This shows that the quotient space &lt;code&gt;$V/(\text{null } T)$&lt;/code&gt; serves as a way to &amp;ldquo;mod out&amp;rdquo; the non-injective part of &lt;code&gt;$T$&lt;/code&gt;.&lt;/p&gt;</description><category domain="https://blog.iany.me/post/">Posts</category><category domain="https://blog.iany.me/tags/math/">Math</category><category domain="https://blog.iany.me/tags/linear-algebra/">Linear Algebra</category></item><item><title>Study on Alias Method</title><link>https://blog.iany.me/2010/05/study-on-alias-method/</link><pubDate>Sat, 29 May 2010 00:00:00 +0000</pubDate><author>me@iany.me (Ian Yang)</author><guid>https://blog.iany.me/2010/05/study-on-alias-method/</guid><description>&lt;p&gt;&lt;a href="http://twitter.com/miloyip"&gt;@miloyip&lt;/a&gt; recently published a &lt;a href="http://www.cnblogs.com/miloyip/archive/2010/05/27/reply_discrete.html"&gt;post&lt;/a&gt; that mentioned the Alias Method, which generates a discrete random variable in &lt;em&gt;O(1)&lt;/em&gt;. After some research, I found that it is a neat and clever algorithm. The following are some notes from my study of it.&lt;/p&gt;
&lt;h2 id="what-is-alias-method"&gt;What is Alias Method&lt;/h2&gt;
&lt;p&gt;The alias method is an efficient algorithm for generating a discrete random variable with a specified probability mass function from a uniformly distributed random variable.&lt;/p&gt;
&lt;p&gt;Let &lt;code&gt;$Z$&lt;/code&gt; be the discrete random variable, which has &lt;code&gt;$n$&lt;/code&gt; possible outcomes &lt;code&gt;$z_0,z_1,\ldots,z_{n-1}$&lt;/code&gt;. To simplify the discussion below, we study another variable &lt;code&gt;$Y$&lt;/code&gt;, where &lt;code&gt;$P\{Y=i\}=P\{Z=z_i\}$&lt;/code&gt;. When &lt;code&gt;$Y$&lt;/code&gt; takes the value &lt;code&gt;$i$&lt;/code&gt;, let &lt;code&gt;$Z$&lt;/code&gt; be &lt;code&gt;$z_i$&lt;/code&gt;. So &lt;code&gt;$Z$&lt;/code&gt; can be generated from &lt;code&gt;$Y$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let the random variable &lt;code&gt;$X$&lt;/code&gt; be uniformly distributed in &lt;code&gt;$(0, n)$&lt;/code&gt;, with probability density function&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
f(x) = \left\{
\begin{array}{rl}
1/n &amp;amp; \text{if } 0 &amp;lt; x &amp;lt; n\\
0 &amp;amp; \text{otherwise}\\
\end{array} \right.
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now generate a variable &lt;code&gt;$Y'$&lt;/code&gt; from a value &lt;code&gt;$x$&lt;/code&gt; of &lt;code&gt;$X$&lt;/code&gt; such that&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
Y' = \left\{
\begin{array}{rl}
\lfloor x \rfloor &amp;amp; \text{if } (x - \lfloor x \rfloor) &amp;lt; F(\lfloor x \rfloor)\\
A(\lfloor x \rfloor) &amp;amp; \text{otherwise}\\
\end{array} \right.
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;$A(i)$&lt;/code&gt; is the alias function. When &lt;code&gt;$x$&lt;/code&gt; falls in the range &lt;code&gt;$[i, i + 1)$&lt;/code&gt; (where &lt;code&gt;$i$&lt;/code&gt; is an integer), &lt;code&gt;$Y'$&lt;/code&gt; equals &lt;code&gt;$i$&lt;/code&gt; with probability &lt;code&gt;$F(i)$&lt;/code&gt; and &lt;code&gt;$A(i)$&lt;/code&gt; with probability &lt;code&gt;$1 - F(i)$&lt;/code&gt;. Because &lt;code&gt;$x$&lt;/code&gt; is uniformly distributed,&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\begin{aligned}
P\{x \in [i, i + F(i))\} &amp;amp;= \displaystyle\int_i^{i+F(i)}\frac{1}{n}dx\\
&amp;amp;= (i + F(i) - i) \times 1/n\\
&amp;amp;= F(i)/n,\\
\\
P\{x \in [i + F(i), i + 1)\} &amp;amp;= \displaystyle\int_{i+F(i)}^{i+1}\frac{1}{n}dx\\
&amp;amp;= (i + 1 - (i + F(i))) \times 1/n\\
&amp;amp;= (1-F(i))/n
\end{aligned}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&amp;rsquo;s denote the set of values &lt;code&gt;$j$&lt;/code&gt; that satisfy &lt;code&gt;$A(j) = i$&lt;/code&gt; as &lt;code&gt;$A^{-1}(i)$&lt;/code&gt;. The generated variable &lt;code&gt;$Y'$&lt;/code&gt; has the following probability mass function:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
P\{Y' = i\} = F(i)/n + \sum_{j \in A^{-1}(i)}\frac{1-F(j)}{n}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The alias method is the algorithm that constructs &lt;code&gt;$A$&lt;/code&gt; and &lt;code&gt;$F$&lt;/code&gt; so that &lt;code&gt;$P\{Y' = i\}$&lt;/code&gt; equals &lt;code&gt;$P\{Y = i\}$&lt;/code&gt; for all &lt;code&gt;$i$&lt;/code&gt;. Because the domain of both &lt;code&gt;$A$&lt;/code&gt; and &lt;code&gt;$F$&lt;/code&gt; is the set of integers &lt;code&gt;$0,1,\ldots,n-1$&lt;/code&gt;, they can be stored in arrays and their values looked up in &lt;em&gt;O(1)&lt;/em&gt;, using &lt;em&gt;O(n)&lt;/em&gt; space.&lt;/p&gt;
&lt;p&gt;In miloyip&amp;rsquo;s implementation, &lt;code&gt;$A$&lt;/code&gt; and &lt;code&gt;$F$&lt;/code&gt; are stored in &lt;code&gt;std::vector&amp;lt;AliasItem&amp;gt; mAliasTable&lt;/code&gt;, where &lt;code&gt;$A$&lt;/code&gt;&amp;rsquo;s values are stored in &lt;code&gt;AliasItem::index&lt;/code&gt; and &lt;code&gt;$F$&lt;/code&gt;&amp;rsquo;s values are &lt;code&gt;AliasItem::prob&lt;/code&gt;.&lt;/p&gt;
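&lt;p&gt;The lookup that defines &lt;code&gt;$Y'$&lt;/code&gt; can be sketched in a few lines of Python. This is a minimal sketch, not miloyip&amp;rsquo;s C++ code: the names &lt;code&gt;alias_sample&lt;/code&gt; and &lt;code&gt;draw&lt;/code&gt;, the plain lists &lt;code&gt;F&lt;/code&gt; and &lt;code&gt;A&lt;/code&gt;, and the three-outcome tables are assumptions made for illustration.&lt;/p&gt;

```python
import math
import random


def alias_sample(F, A, x):
    """Map a value x in [0, n) to a value of Y' using the tables F and A."""
    i = math.floor(x)   # slot index, floor(x)
    frac = x - i        # position of x inside slot i
    # Keep i when the fractional part is below F(i); otherwise return the alias.
    return i if F[i] > frac else A[i]


def draw(F, A, rng=random):
    """Draw one value of Y' from a uniform variate on [0, n)."""
    return alias_sample(F, A, rng.random() * len(F))


# Hypothetical tables for n = 3 and probabilities (1/6, 1/2, 1/3).
F = [0.5, 1.0, 1.0]
A = [1, 1, 2]
print(draw(F, A))
```

&lt;p&gt;With these tables, slot 0 returns 0 on its first half and its alias 1 on its second half, while slots 1 and 2 always return their own index. Each draw costs one uniform variate and two array lookups.&lt;/p&gt;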
&lt;h2 id="algorithm"&gt;Algorithm&lt;/h2&gt;
&lt;h3 id="construct-steps"&gt;Construct Steps&lt;/h3&gt;
&lt;p&gt;Initialize the set &lt;code&gt;$S$&lt;/code&gt; to be &lt;code&gt;$\{0,1,\ldots,n-1\}$&lt;/code&gt; and &lt;code&gt;$n$&lt;/code&gt; variables &lt;code&gt;$p_i$&lt;/code&gt; with values:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[ p_i = P\{Y=i\}, i \in S \]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Denote the number of elements in &lt;code&gt;$S$&lt;/code&gt; as &lt;code&gt;$\|S\|$&lt;/code&gt;. We have an important invariant:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[ \sum_{i \in S}{p_i} = \|S\| / n \]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At the beginning of the algorithm, &lt;code&gt;$\|S\| = n$&lt;/code&gt;, so the invariant holds because all probabilities must sum to 1.&lt;/p&gt;
&lt;p&gt;The algorithm proceeds in the following steps.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;If there is an element &lt;code&gt;$i$&lt;/code&gt; in set &lt;code&gt;$S$&lt;/code&gt; such that &lt;code&gt;$p_i &amp;lt; 1/n$&lt;/code&gt;, there must be a &lt;code&gt;$j$&lt;/code&gt; in set &lt;code&gt;$S$&lt;/code&gt; such that &lt;code&gt;$p_j &amp;gt; 1/n$&lt;/code&gt;.&lt;sup id="fnref:1"&gt;&lt;span class="footnote-ref" role="doc-noteref"&gt;1&lt;/span&gt;&lt;/sup&gt; Let &lt;code&gt;$A(i) = j$&lt;/code&gt; and &lt;code&gt;$F(i) = p_i / (1/n) = p_i \times n$&lt;/code&gt;. Remove &lt;code&gt;$i$&lt;/code&gt; from &lt;code&gt;$S$&lt;/code&gt; and subtract &lt;code&gt;$1/n - p_i$&lt;/code&gt; from &lt;code&gt;$p_j$&lt;/code&gt;. It is easy to verify that the invariant still holds after these changes.&lt;sup id="fnref:2"&gt;&lt;span class="footnote-ref" role="doc-noteref"&gt;2&lt;/span&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;Repeat step 1 until &lt;code&gt;$S$&lt;/code&gt; is empty or there are no more elements &lt;code&gt;$i$&lt;/code&gt; in &lt;code&gt;$S$&lt;/code&gt; such that &lt;code&gt;$p_i &amp;lt; 1/n$&lt;/code&gt;. If &lt;code&gt;$S$&lt;/code&gt; is empty, the algorithm finishes. Otherwise, every remaining &lt;code&gt;$i$&lt;/code&gt; in &lt;code&gt;$S$&lt;/code&gt; must have &lt;code&gt;$p_i = 1/n$&lt;/code&gt;.&lt;sup id="fnref:3"&gt;&lt;span class="footnote-ref" role="doc-noteref"&gt;3&lt;/span&gt;&lt;/sup&gt; Let &lt;code&gt;$A(i)=i$&lt;/code&gt; and &lt;code&gt;$F(i)=p_i\times n=1$&lt;/code&gt; for all remaining &lt;code&gt;$i$&lt;/code&gt;, and remove them from the set &lt;code&gt;$S$&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The algorithm finishes when &lt;code&gt;$S$&lt;/code&gt; becomes empty, and an element is removed only when its corresponding &lt;code&gt;$A$&lt;/code&gt; and &lt;code&gt;$F$&lt;/code&gt; have been determined, so all values of &lt;code&gt;$A$&lt;/code&gt; and &lt;code&gt;$F$&lt;/code&gt; have been generated.&lt;/p&gt;
&lt;p&gt;In miloyip&amp;rsquo;s implementation, &lt;code&gt;$p_i$&lt;/code&gt; is stored in &lt;code&gt;AliasItem::prob&lt;/code&gt; before &lt;code&gt;$i$&lt;/code&gt; is removed from the set. When &lt;code&gt;$i$&lt;/code&gt; is removed from the set, &lt;code&gt;AliasItem::prob&lt;/code&gt; is set to &lt;code&gt;$F(i)$&lt;/code&gt;.&lt;/p&gt;
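&lt;p&gt;The construction steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not miloyip&amp;rsquo;s implementation: the function name &lt;code&gt;build_alias_tables&lt;/code&gt; and the &lt;code&gt;small&lt;/code&gt; and &lt;code&gt;large&lt;/code&gt; worklists are bookkeeping choices of this sketch, and &lt;code&gt;fractions.Fraction&lt;/code&gt; is used so that the invariant holds exactly instead of up to floating-point rounding.&lt;/p&gt;

```python
from fractions import Fraction


def build_alias_tables(p):
    """Build the tables F and A from probabilities p that sum to exactly 1."""
    n = len(p)
    unit = Fraction(1, n)
    p = [Fraction(q) for q in p]                   # working copies of p_i
    F = [None] * n
    A = [None] * n
    small = [i for i in range(n) if unit > p[i]]   # elements with p_i below 1/n
    large = [i for i in range(n) if p[i] > unit]   # elements with p_i above 1/n

    # Step 1: pair each small element i with some large element j.
    while small:
        i = small.pop()
        j = large[-1]              # exists because of the invariant
        A[i] = j
        F[i] = p[i] * n
        p[j] -= unit - p[i]        # subtract 1/n - p_i from p_j
        if unit >= p[j]:           # j is no longer above 1/n
            large.pop()
            if unit > p[j]:
                small.append(j)    # j now needs an alias of its own

    # Step 2: every element left in S has p_i = 1/n.
    for i in range(n):
        if F[i] is None:
            A[i], F[i] = i, Fraction(1)
    return F, A


# Decimal strings keep the inputs exact; floats such as 0.1 would not be.
F, A = build_alias_tables(["0.16", "0.1", "0.32", "0.22", "0.2"])
print(F, A)
```

&lt;p&gt;Which &lt;code&gt;$j$&lt;/code&gt; is paired with which &lt;code&gt;$i$&lt;/code&gt; depends on the order in which the worklists are visited, so the resulting tables are one valid solution among several.&lt;/p&gt;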
&lt;h3 id="correctness"&gt;Correctness&lt;/h3&gt;
&lt;p&gt;The invariant holds at the beginning and at the end of each step, which guarantees that the algorithm terminates; this is easy to prove by mathematical induction. So we only need to prove &lt;code&gt;$P\{Y'=i\}=P\{Y=i\}$&lt;/code&gt; for every &lt;code&gt;$i$&lt;/code&gt;, i.e.,&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[ P\{Y = i\} = F(i)/n + \sum_{j \in A^{-1}(i)}\frac{1-F(j)}{n} \]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let &lt;code&gt;$p'_i$&lt;/code&gt; denote the value of &lt;code&gt;$p_i$&lt;/code&gt; when &lt;code&gt;$i$&lt;/code&gt; is removed from set &lt;code&gt;$S$&lt;/code&gt;. Looking at the construction steps again, we get the following properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;No &lt;code&gt;$p_i$&lt;/code&gt; can increase. Thus &lt;code&gt;$p_i \leq P\{Y=i\}$&lt;/code&gt; in all steps and &lt;code&gt;$p'_i \leq P\{Y=i\}$&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$p_i$&lt;/code&gt; decreases only when its initial value &lt;code&gt;$P\{Y=i\}&amp;gt;1/n$&lt;/code&gt;. So if &lt;code&gt;$P\{Y=i\} \leq 1/n$&lt;/code&gt;, &lt;code&gt;$p_i = P\{Y=i\}$&lt;/code&gt; throughout the algorithm and &lt;code&gt;$p'_i=P\{Y=i\}$&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$F(i) = p'_i \times n$&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$i$&lt;/code&gt; is removed only when &lt;code&gt;$p_i \leq 1/n$&lt;/code&gt;, i.e., &lt;code&gt;$p'_i \leq 1/n$&lt;/code&gt;, thus &lt;code&gt;$F(i)=p'_i \times n \leq 1$&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$A(j)$&lt;/code&gt; is set to a value &lt;code&gt;$i \neq j$&lt;/code&gt; only if &lt;code&gt;$p_i &amp;gt; 1/n$&lt;/code&gt; (see step 1), i.e., &lt;code&gt;$P\{Y=i\}&amp;gt;1/n$&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Now consider the three cases &lt;code&gt;$P\{Y=i\}&amp;lt;1/n$&lt;/code&gt;, &lt;code&gt;$P\{Y=i\}=1/n$&lt;/code&gt; and &lt;code&gt;$P\{Y=i\}&amp;gt;1/n$&lt;/code&gt; separately.&lt;/p&gt;
&lt;h4 id="pyi--1n"&gt;P{Y=i} &amp;lt; 1/n&lt;/h4&gt;
&lt;p&gt;If &lt;code&gt;$P\{Y=i\} &amp;lt; 1/n$&lt;/code&gt;, from property 2 and property 3, &lt;code&gt;$F(i) = p'_i \times n = P\{Y=i\} \times n$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Apparently &lt;code&gt;$A^{-1}(i) = \emptyset$&lt;/code&gt;, because &lt;code&gt;$A$&lt;/code&gt; is only ever set to a value &lt;code&gt;$j$&lt;/code&gt; where &lt;code&gt;$p_j&amp;gt;1/n$&lt;/code&gt; in step 1, or to a value &lt;code&gt;$k$&lt;/code&gt; where &lt;code&gt;$p_k = 1/n$&lt;/code&gt; in step 2.&lt;/p&gt;
&lt;p&gt;Thus&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\begin{aligned}
&amp;amp;F(i)/n + \sum_{j \in A^{-1}(i)}\frac{1-F(j)}{n}\\
=&amp;amp;F(i)/n\\
=&amp;amp;P\{Y=i\} \times n / n\\
=&amp;amp;P\{Y=i\}
\end{aligned}
\]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which completes the proof.&lt;/p&gt;
&lt;h4 id="pyi--1n-1"&gt;P{Y=i} = 1/n&lt;/h4&gt;
&lt;p&gt;If &lt;code&gt;$P\{Y=i\} = 1/n$&lt;/code&gt;, apparently &lt;code&gt;$A(i) = i$&lt;/code&gt;. If another value &lt;code&gt;$j\neq~i$&lt;/code&gt; also satisfied &lt;code&gt;$A(j) = i$&lt;/code&gt;, then by property 5, &lt;code&gt;$P\{Y=i\} &amp;gt; 1/n$&lt;/code&gt;, which contradicts the condition. So &lt;code&gt;$A^{-1}(i) = \{i\}$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Thus&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[ \begin{aligned}
&amp;amp;F(i)/n + \sum_{j \in A^{-1}(i)}\frac{1-F(j)}{n}\\
=&amp;amp;F(i)/n + (1-F(i))/n\\
=&amp;amp;1/n
\end{aligned} \]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which completes the proof.&lt;/p&gt;
&lt;h4 id="pyi--1n-2"&gt;P{Y=i} &amp;gt; 1/n&lt;/h4&gt;
&lt;p&gt;When &lt;code&gt;$P\{Y=i\} &amp;gt; 1/n$&lt;/code&gt;, &lt;code&gt;$i$&lt;/code&gt; can be in &lt;code&gt;$A^{-1}(i)$&lt;/code&gt; only if it is handled in step 2, where &lt;code&gt;$F(i) = 1$&lt;/code&gt; and &lt;code&gt;$p'_i = 1/n$&lt;/code&gt;, so its terms in the sums below are zero.&lt;/p&gt;
&lt;p&gt;Consider every other value &lt;code&gt;$j$&lt;/code&gt; in set &lt;code&gt;$A^{-1}(i)$&lt;/code&gt;. When &lt;code&gt;$j$&lt;/code&gt; is removed from &lt;code&gt;$S$&lt;/code&gt;, &lt;code&gt;$A(j)$&lt;/code&gt; is set to &lt;code&gt;$i$&lt;/code&gt; and &lt;code&gt;$1/n - p'_j$&lt;/code&gt; is subtracted from &lt;code&gt;$p_i$&lt;/code&gt;. Thus&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[ p'_i = P\{Y=i\} - \sum_{j \in A^{-1}(i)}(1/n - p'_j) \]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[ \begin{aligned}
&amp;amp;F(i)/n + \sum_{j \in A^{-1}(i)}\frac{1-F(j)}{n}\\
=&amp;amp;p'_i \times n / n + \sum_{j \in A^{-1}(i)}\frac{1-(p'_j \times~n)}{n}\\
=&amp;amp;P\{Y=i\} - \sum_{j \in A^{-1}(i)}(1/n - p'_j)\ + \sum_{j \in A^{-1}(i)}(1/n - p'_j)\\
=&amp;amp;P\{Y=i\}
\end{aligned} \]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Therefore &lt;code&gt;$P\{Y'=i\} = P\{Y=i\}$&lt;/code&gt; for all &lt;code&gt;$i$&lt;/code&gt;, which completes the proof.&lt;/p&gt;
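&lt;p&gt;The identity proved here can also be checked mechanically for any given pair of tables. The sketch below evaluates the formula for &lt;code&gt;$P\{Y'=i\}$&lt;/code&gt; directly; the function name &lt;code&gt;induced_pmf&lt;/code&gt; and the three-outcome tables are hypothetical, chosen only to illustrate the check.&lt;/p&gt;

```python
from fractions import Fraction


def induced_pmf(F, A):
    """Return the list of P{Y' = i} implied by the tables F and A."""
    n = len(F)
    pmf = [F[i] / n for i in range(n)]   # F(i)/n contributed by slot i itself
    for j in range(n):
        pmf[A[j]] += (1 - F[j]) / n      # (1 - F(j))/n goes to the alias A(j)
    return pmf


# Hypothetical tables for n = 3 and target probabilities (1/6, 1/2, 1/3).
F = [Fraction(1, 2), Fraction(1), Fraction(1)]
A = [1, 1, 2]
print(induced_pmf(F, A) == [Fraction(1, 6), Fraction(1, 2), Fraction(1, 3)])
```

&lt;p&gt;A slot with &lt;code&gt;$A(j) = j$&lt;/code&gt; and &lt;code&gt;$F(j) = 1$&lt;/code&gt; adds zero in the loop, so no special case is needed for the elements handled in step 2.&lt;/p&gt;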
&lt;h2 id="intuitive-presentation"&gt;Intuitive Presentation&lt;/h2&gt;
&lt;p&gt;The algorithm also has an intuitive interpretation. The range &lt;code&gt;$(0, n]$&lt;/code&gt; is split into &lt;code&gt;$n$&lt;/code&gt; consecutive sub-ranges (slots) &lt;code&gt;$(i, i + 1]$&lt;/code&gt; for &lt;code&gt;$i = 0, 1, \ldots, n - 1$&lt;/code&gt;. The probability that &lt;code&gt;$X$&lt;/code&gt; falls into any one slot is &lt;code&gt;$(i + 1 - i) \times 1/n = 1/n$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;$P\{Y=i\} = 1/n$&lt;/code&gt;, we can allocate the whole slot &lt;code&gt;$i$&lt;/code&gt; to it. Let &lt;code&gt;$Y=i$&lt;/code&gt; when &lt;code&gt;$x$&lt;/code&gt; falls in &lt;code&gt;$(i, i + 1]$&lt;/code&gt;, which has probability &lt;code&gt;$1/n$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;$P\{Y=i\} &amp;lt; 1/n$&lt;/code&gt;, we can allocate the starting part &lt;code&gt;$(i,i+n\times~P\{Y=i\}]$&lt;/code&gt; of &lt;code&gt;$(i,i+1]$&lt;/code&gt;. Let &lt;code&gt;$Y = i$&lt;/code&gt; when &lt;code&gt;$x$&lt;/code&gt; falls in &lt;code&gt;$(i, i + n\times P\{Y=i\}]$&lt;/code&gt;, which has probability &lt;code&gt;$n\times~P\{Y=i\}\times(1/n)=P\{Y=i\}$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;$P\{Y=i\} &amp;gt; 1/n$&lt;/code&gt;, we can allocate the unused ranges &lt;code&gt;$(j + n\times P\{Y=j\}, j + 1]$&lt;/code&gt; of any &lt;code&gt;$j$&lt;/code&gt; such that &lt;code&gt;$P\{Y=j\} &amp;lt; 1/n$&lt;/code&gt;. However, an unused range cannot be split again.&lt;/p&gt;
&lt;p&gt;The figure below demonstrates how to generate &lt;code&gt;$Y$&lt;/code&gt; with the probability mass function listed after it, where &lt;code&gt;$n = 5$&lt;/code&gt;.&lt;/p&gt;
&lt;figure class="kg-image-card"&gt;
&lt;img alt="Alias Method" class="kg-image" loading="lazy" src="https://blog.iany.me/2010/05/study-on-alias-method/alias-method_hu_31ea1afc8864766a.png" srcset="https://blog.iany.me/2010/05/study-on-alias-method/alias-method_hu_e264010de3a2ddd9.png 400w, https://blog.iany.me/2010/05/study-on-alias-method/alias-method_hu_31ea1afc8864766a.png 600w" sizes="(max-width: 400px) 100vw, 600px" /&gt;
&lt;figcaption &gt;Alias Method&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;$P\{Y=0\} = 0.16$&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$P\{Y=1\} = 0.1$&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$P\{Y=2\} = 0.32$&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$P\{Y=3\} = 0.22$&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$P\{Y=4\} = 0.2$&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;$P\{Y=4\}=1/n$&lt;/code&gt;, so let &lt;code&gt;$Y = 4$&lt;/code&gt; only when &lt;code&gt;$x$&lt;/code&gt; falls in &lt;code&gt;$(4, 5]$&lt;/code&gt;, which has probability &lt;code&gt;$(5-4)\times 0.2 = 0.2$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$P\{Y=0\}=0.16&amp;lt;0.2$&lt;/code&gt;, so let &lt;code&gt;$Y = 0$&lt;/code&gt; only when &lt;code&gt;$x$&lt;/code&gt; falls in &lt;code&gt;$(0,0.16\times~5]$&lt;/code&gt;, i.e., &lt;code&gt;$(0,0.8]$&lt;/code&gt;, which has probability &lt;code&gt;$(0.8-0)\times~0.2=0.16$&lt;/code&gt;. The range &lt;code&gt;$(0.8,1]$&lt;/code&gt; is unused.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$P\{Y=1\}$&lt;/code&gt; is handled the same way: &lt;code&gt;$(1,1.5]$&lt;/code&gt; is allocated and &lt;code&gt;$(1.5,2]$&lt;/code&gt; is unused.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$P\{Y=2\} = 0.32 &amp;gt; 0.2$&lt;/code&gt;, so it needs ranges with total length &lt;code&gt;$0.32\times~5=1.6$&lt;/code&gt;. We allocate the unused ranges &lt;code&gt;$(0.8, 1]$&lt;/code&gt; and &lt;code&gt;$(1.5, 2]$&lt;/code&gt;. The remaining length is &lt;code&gt;$1.6-0.2-0.5=0.9&amp;lt;1$&lt;/code&gt;, so we can allocate part of its own slot. In the end, three ranges are allocated: &lt;code&gt;$(0.8,1]$&lt;/code&gt;, &lt;code&gt;$(1.5,2]$&lt;/code&gt; and &lt;code&gt;$(2,2.9]$&lt;/code&gt;. The range &lt;code&gt;$(2.9,3]$&lt;/code&gt; is unused.&lt;/p&gt;
&lt;p&gt;Follow the same steps to handle &lt;code&gt;$Y=3$&lt;/code&gt;. The final allocation is depicted in &lt;code&gt;$D$&lt;/code&gt;. The allocation is not unique; &lt;code&gt;$F$&lt;/code&gt; depicts another solution.&lt;/p&gt;
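&lt;p&gt;As a check on this walkthrough, the allocation described in the text corresponds to the tables &lt;code&gt;$F = (0.8, 0.5, 0.9, 1, 1)$&lt;/code&gt; and &lt;code&gt;$A = (2, 2, 3, 3, 4)$&lt;/code&gt;. These values are read off the text rather than the figure, so treat them as an assumption about what the figure depicts. Substituting them into the formula for &lt;code&gt;$P\{Y' = i\}$&lt;/code&gt; recovers the two probabilities larger than &lt;code&gt;$1/n$&lt;/code&gt; (zero terms are omitted):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-katex"&gt;\[
\begin{aligned}
P\{Y' = 2\} &amp;amp;= \frac{0.9}{5} + \frac{1 - 0.8}{5} + \frac{1 - 0.5}{5} = 0.18 + 0.04 + 0.10 = 0.32,\\
P\{Y' = 3\} &amp;amp;= \frac{1}{5} + \frac{1 - 0.9}{5} = 0.20 + 0.02 = 0.22
\end{aligned}
\]
&lt;/code&gt;&lt;/pre&gt;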
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://portal.acm.org/citation.cfm?id=355749"&gt;An Efficient Method for Generating Discrete Random Variables with General Distributions&lt;/a&gt;,
Alastair J. Walker&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.cnblogs.com/miloyip/archive/2010/05/27/reply_discrete.html"&gt;回应CSDN肖舸《做程序，要“专注”和“客观”》，实验比较各离散采样算法 - Milo的游戏开发 - 博客园&lt;/a&gt;,
Milo Yip&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;Suppose instead that &lt;code&gt;$p_j \leq 1/n$&lt;/code&gt; for every &lt;code&gt;$j$&lt;/code&gt; in &lt;code&gt;$S$&lt;/code&gt; other than &lt;code&gt;$i$&lt;/code&gt;. Summing these
inequalities over all &lt;code&gt;$j$&lt;/code&gt; together with &lt;code&gt;$p_i &amp;lt; 1/n$&lt;/code&gt; gives
&lt;code&gt;$\sum_{i \in S}{p_i} &amp;lt; \|S\| / n$&lt;/code&gt;, which contradicts the invariant.&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;The right side decreases by &lt;code&gt;$1/n$&lt;/code&gt; because &lt;code&gt;$\|S\|$&lt;/code&gt; decreases by 1. The
left side decreases by &lt;code&gt;$p_i + (1/n - p_i) = 1/n$&lt;/code&gt;, because &lt;code&gt;$i$&lt;/code&gt; is removed from the
set and &lt;code&gt;$(1/n - p_i)$&lt;/code&gt; is subtracted from &lt;code&gt;$p_j$&lt;/code&gt;. Both sides decrease by the same
amount, so the equality still holds.&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;Because no &lt;code&gt;$p_i &amp;lt; 1/n$&lt;/code&gt;, every &lt;code&gt;$p_i \geq 1/n$&lt;/code&gt;. To satisfy the invariant, no &lt;code&gt;$p_i$&lt;/code&gt;
can be larger than &lt;code&gt;$1/n$&lt;/code&gt;. Thus for all &lt;code&gt;$i$&lt;/code&gt; in &lt;code&gt;$S$&lt;/code&gt;, &lt;code&gt;$p_i = 1/n$&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description><category domain="https://blog.iany.me/post/">Posts</category><category domain="https://blog.iany.me/tags/algorithm/">Algorithm</category><category domain="https://blog.iany.me/tags/math/">Math</category><category domain="https://blog.iany.me/tags/probability/">Probability</category></item></channel></rss>