For the mathematical model, we have the K items,
, and total budget is T and we assume each label cost 1 so we are going to have T labels eventually. We assume each items has true label
which positive or negative, this binomial cases and we can extended to multiple class, labeling cases, this a singular idea. And the positive set
is defined as the set of items whose true label is positive. And
also defined a soft-label,
for each item which number between 0 and 1, and we define
as underlying probability of being labeled as positive by a member randomly picked from a group of perfect workers.
In this first case, we assume for every worker is perfect, it means they all reliable, but being perfect doesn’t means this worker gives the same answer or right answer. It just means they will try their best to figure out the best answer in their mind, and suppose everyone is perfect worker, just randomly picked one of them, and with
probability, we going to get a guy who believe this one is positive. That is how we explain
. So we are assume a label
is drawn from Bernoulli(
), and
must be consistent with the true label, which means
is greater or equal to 0.5 if and only if this item is positive with a true positive label. So our goal is to learn H*, the set of positive items. In other word, we want to make an inferred positive set H based on collected labels to maximize:

It can also be written as:

Before show the Bayesian framework, the paper use an example to mention why we choose Bayesian instead of frequency approach, such that we can propose some posterior of prior distribution on the soft-label
. We assume each
is drawn from a known Beta prior:

And the matrix:

So we know that the Bernoulli conjugate of beta, so once we get a new label for item i, we going to update posterior distribution, the beta distribution by:




Depending on the label is positive or negative.
Here is the whole procedure in the high level, we have T stage,
. And in current stage we look at matrix S, which summarized the posterior distribution information for all the 

We are going to make a decision, choose the next item to label
,
.
And depending what the label is positive or negative, we add a matrix to getting a label:




Above all, this is the whole framework.
When the t labels are collected, we can make an inference about the positive set Ht based on posterior distribution given by St

So here become the Bernoulli selection problem, we just take to look at the probability of being positive or being negative conditional
to see is greater than 0.5 or not, if it is greater than 0.5, then we prove this item into the current infer positive set
so this is a cost form for current optimal solution
based on the information in
.
After know what is optimal solution, then the paper show what is the optimal value. Plug
in the optimal function,

This function is just a single function which choose the larger one between the conditional probability of being positive and being negative. Once we get one more label for item i, we take a difference between this value, before and after we get a new label, we can see this conditional probability can actually simplify as follows:

The positive item being positive only depends on the beta posterior, therefore, if only the function of parameter of beta distribution function are a and b, as

One more label for this particular item, we double change the posterior function, so all of this items can be cancel except 1, so this is the change for whole accuracy and we defined as stage-wise reward: improvement the inference accuracy by one more sample. Of course this label have two positive value, we’ve get positive label or negative label, take average for this two, get expect reward. We just choose item to be label such that the expect reward is maximized using Knowledge Gradient:

They are multiple items, let us know how do we break the ties. If we break the tie deterministically, which means we choose the smallest index. We are going to have a problem because this is not consistent which means the positive stage
does not converge to the true positive stage
.
So we can also try to break the ties randomly, it works, however, we will see the performance is almost like uniform sampling, is the best reward. The writer’s policy is kinds of more greedy, instead of choosing the average in stage once reward, we can actually calculate the larger one, the max of the two stage possible reward, so Optimistic Knowledge Gradient:

And we know under optimistic knowledge gradient, the final inference accuracy converge to 100%. Above is based on every worker is perfect, however, in practice, workers are not always responsible. So if in imperfect workers, we assume K items,
.

The probability of item
being labeled as positive by a perfect worker.
M workers,
,
The probability of worker
giving the same label as a perfect worker. Distribution of the label
from worker
to item
:

And the action space is that

where
, label matrix: 
It is difficult to calculate, so we can use Variational Bayesian methods[5] of 