Surprisingly Popular Algorithm

TL;DR: A summary view of SPA - a mathematical model to extract correct answers from crowd wisdom using signal theory and Bayesian inference.

Summary sheet [1]

Theory

  • Model to be built by generalizing to mm coins, each coin having nn possible sides

  • Theorem 1 deals with a negative result to reveal limitations of votes and confidences

  • Subsequently show determination of answer in three complex settings:

    1. m=2,n2m = 2, n \geq 2
    2. m=n>2m = n > 2
    3. m,n2m, n \geq 2
  • Thus, theorem 2 proves SPA

  • Extensions for multiple choice (m>2m > 2 coins) rely on key Lemma applied in theorem 3

Model

  • Generalize to an arbitrary number mm of possible worlds
    • one of the worlds is actual, the rest are counterfactual
  • A correct answer occurs in actual world
    • this random variable should ideally belong to set {a1,...,am}\{a_1,...,a_m\} of mm possible answers to multiple-choice question
  • Distinguish between respondents’ vote and evidence
    • evidence of respondent rr is a private signal SrS^r that belong to set {s1,...,sn}\{s_1,...,s_n\}
  • Respondent’s vote VrV^r is:
Vr=V(Sr)V^r = V(S^r)

that maps signals to votes where Vr{v1,...,vm}V^r \in \{v_1,...,v_m\} for mm possible answers

  • Conditional of world aia_i, signals of different respondents distributed with p(skai)p(s_k \mid a_i) probability
  • Beliefs about signals received by other respondents:
p(sjsk)=ip(sjai)p(aisk)p(ai)p(s_j \mid s_k) = \sum_i p(s_j \mid a_i) p(a_i \mid s_k) p(a_i)

explicitly,

p(sjqskr)=Pr(Sq=sjSr=sk)p(s^q_j \mid s^r_k) = P_r(S^q = s_j \mid S^r = s_k)

Theorem 1

Stated as: The correct answer cannot be deduced by any algorithm relying exclusively on knowledge of actual signal probabilities, p(skai),k=1,...,np(s_k|a_{i^*}), k=1,...,n and posterior probabilities over answers implied by these signals, p(aisk),k=1,...,n,i=1,...,mp(a_i|s_k), k=1,...,n, i=1,...,m.

Theorem 2

Stated as: Assume that not everyone votes for the correct answer. Then the average estimate of the votes for the correct answer will be underestimated.

  • Applicable for the two worlds, many signals case m=2,n2m=2, n \geq 2
  • Proof: attempting to show that actual votes for correct answer exceed that of counter-factual ones i.e., p(viai)>p(viak),kip(v_{i^*} \mid a_{i^*}) > p(v_{i^*} \mid a_k), k \neq i^* as:
p(viai)p(viak)=p(aivi)p(ak)p(akvi)p(ai)=p(aivi)(1p(aivi))(1p(ai))p(ai)\frac{p(v_{i^*} \mid a_{i^*})}{p(v_{i^*} \mid a_k)} = \frac{p(a_{i^*} \mid v_{i^*}) p(a_k)}{p(a_k \mid v_{i^*}) p(a_{i^*})} = \frac{p(a_{i^*} \mid v_{i^*})}{(1 - p(a_{i^*} \mid v_{i^*}))} \cdot \frac{(1 - p(a_{i^*}))}{p(a_{i^*})}

References

[1] Supplementary information: A solution to the single question crowd wisdom problem - readcube.com