Guided Labeling Episode 6: Comparing Active Learning with Weak Supervision

By Paolo Tamagnini.

Welcome to the sixth episode of our Guided Labeling Blog Series. In the last episode, we made an analogy with a number of “friends” labeling “movies” with three different outcomes: “good movie” (👍), “not seen movie” (–), “bad movie” (👎). We saw how to train a machine learning model on those opinions, how to predict labels for movies that no friend has watched, and how to add feature data about the movies to the model.

Let’s pick up where we left off.

You can blend friends’ movie opinions into a single model, but how is this useful if you don’t have any labels to train a generic supervised model? How can weak supervision become an alternative to active learning in a generic classification task? How can many “friends” labeling “movies” work better than the single human expert used in active learning?

Weak Supervision Instead of Active Learning

The key feature that differentiates active learning from weak supervision is the source of the labels we are using to train a generic classification model from an unlabeled dataset.

Unique vs. Flexible

In active learning, the source of labels, referred to in the literature as the “oracle,” is usually one of a kind, which makes it expensive and hard to find. The oracle can be an expensive experiment but, more often than not, it is a subject matter expert (SME), that is, a human with domain expertise. In weak supervision, a weak source can be a human with less expertise who makes mistakes, but it can also be something else, such as a heuristic that applies only to a subset of the dataset. For example:

IF “movie budget category” is “low”
AND “actor popularity” is “none”:
MOVIE LABEL = “👎”
ELSE:
MOVIE LABEL = “-”

Of course, this rule (or heuristic) is far from accurate and only applies to some movies, but it can be treated as a weak source in weak supervision, that is, as a labeling function. In most cases, you will still need an expensive human expert to build those heuristics, but this takes less time than labeling the data manually. Once you have a set of heuristics, you can apply them to millions of data points within a few seconds.
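The rule above can be written as a small Python function. This is only a minimal sketch: the field names `budget_category` and `actor_popularity`, the sample movies, and the use of `None` for “no opinion” are assumptions made for illustration, and we assume the label produced by the rule stands for “bad movie.”

```python
from typing import Optional

GOOD, BAD = "good", "bad"
ABSTAIN = None  # assumption: None means the labeling function has no opinion


def lf_low_budget_unknown_cast(movie: dict) -> Optional[str]:
    """Weak rule from the text: low budget and no popular actors -> bad movie."""
    if (movie.get("budget_category") == "low"
            and movie.get("actor_popularity") == "none"):
        return BAD
    return ABSTAIN  # the rule does not apply, so it outputs a missing label


# Hypothetical movies, only used to show the function in action.
movies = [
    {"title": "A", "budget_category": "low", "actor_popularity": "none"},
    {"title": "B", "budget_category": "high", "actor_popularity": "high"},
    {"title": "C", "budget_category": "low", "actor_popularity": "high"},
]

weak_labels = [lf_low_budget_unknown_cast(m) for m in movies]
print(weak_labels)  # ['bad', None, None]
```

Only movie A gets a label. The other two stay unlabeled by this source, which is exactly why several labeling functions are usually combined.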

Solid vs. Weak

In active learning, the label source is assumed, at least in theory, to always provide a 100 percent accurate label. In weak supervision, the weak sources may be unable to label every sample and can be less accurate.

Single vs. Multiple

Active learning is usually described as a system that relies on a single, expensive source of labels. Weak supervision relies on many less accurate sources.

Human-in-the-Loop vs. Prior Model Training

In active learning, the labels are provided as the model improves within the human-in-the-loop process. In comparison, in weak supervision, the noisy labels are provided from all weak sources before the model is trained.
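To make the difference in ordering concrete, here is a toy sketch of the human-in-the-loop side. Everything in it is hypothetical: the `oracle` function stands in for the human expert, the “model” is just a threshold on a single number, and the query strategy is plain uncertainty sampling (ask about the point closest to the current decision boundary). It is not the workflow from the earlier episodes, only an illustration of labels arriving one at a time while the model is being refit.

```python
def oracle(x):
    """Hypothetical expert: the true (hidden) rule is x >= 6."""
    return x >= 6


def fit_threshold(labeled):
    """Toy model: threshold halfway between the largest negative
    and the smallest positive labeled point."""
    negatives = [x for x, y in labeled.items() if not y]
    positives = [x for x, y in labeled.items() if y]
    return (max(negatives) + min(positives)) / 2


pool = list(range(11))                     # unlabeled points 0..10
labeled = {0: oracle(0), 10: oracle(10)}   # two seed labels to start with
threshold = fit_threshold(labeled)
queries = []

for _ in range(3):                         # three human-in-the-loop iterations
    unlabeled = [x for x in pool if x not in labeled]
    # Uncertainty sampling: query the point closest to the decision boundary.
    query = min(unlabeled, key=lambda x: abs(x - threshold))
    labeled[query] = oracle(query)         # the label arrives during training
    queries.append(query)
    threshold = fit_threshold(labeled)     # refit after every new label

print(queries, threshold)
```

In weak supervision there is no such loop: all weak sources are applied to the whole dataset first, and the model is trained afterwards on their combined output.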

From Movie Opinions to Any Classification Task

Our example about blending movie opinions from people was helpful in explaining the weak supervision framework with an intuitive example. However, for movie recommendation use cases, there are better algorithms than weak supervision (e.g., collaborative filtering). Weak supervision is powerful because it can be used wherever:

There is a classification task to be solved
You want to use supervised machine learning
The dataset to train your model is unlabeled
You can use weak label sources

Those requirements are quite flexible, making weak supervision versatile for a number of use cases where active learning would have been far more time-consuming in terms of manual labeling.

Your unlabeled dataset of documents, images, or customer data can have weak label sources just like you had “opinions from friends” on “movies.” These “friends” can be considered labeling functions that can label only a subset of your rows (in the example, that would be only those “movies” they have watched) with accuracy better than random. The “opinions” we had (“👍” or “👎”) are the output labels of the labeling functions.

We can then extend this solution to any machine learning classification problem with missing labels. Those output labels can be only two for binary classification, like in our example, or even more for the multi-class problem. If a labeling function is not able to label a sample, it can output a missing value (“–”).

In active learning, the expensive expert provides labels row by row. In weak supervision, we can simply ask the expert to provide a number of labeling functions. By labeling function, we mean any heuristic that, in the expert’s opinion, can correctly label a subset of the rows. The expert should provide as many labeling functions as possible, covering as many rows as possible with as high an accuracy as possible (see Figure 1 below).
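How many rows a set of labeling functions covers can be checked without any ground truth. The sketch below assumes the outputs of three hypothetical labeling functions are stored in a small matrix (rows are data points, columns are labeling functions, `None` is a missing label). The helper names are made up for this example. Accuracy, in contrast, cannot be measured this way, because it needs at least a few true labels.

```python
ABSTAIN = None  # assumption: None marks a missing label

# Hypothetical outputs: rows are data points, columns are labeling functions.
label_matrix = [
    ["good", None,   "good"],
    [None,   None,   "bad"],
    ["bad",  "good", None],
    [None,   None,   None],
]


def coverage(matrix, column):
    """Fraction of rows for which one labeling function does not abstain."""
    votes = [row[column] for row in matrix]
    return sum(v is not ABSTAIN for v in votes) / len(votes)


def total_coverage(matrix):
    """Fraction of rows labeled by at least one labeling function."""
    labeled_rows = sum(any(v is not ABSTAIN for v in row) for row in matrix)
    return labeled_rows / len(matrix)


def conflict_rate(matrix):
    """Fraction of rows where at least two labeling functions disagree."""
    conflicts = sum(
        len({v for v in row if v is not ABSTAIN}) > 1 for row in matrix
    )
    return conflicts / len(matrix)


per_lf = [coverage(label_matrix, j) for j in range(3)]
print(per_lf)                        # [0.5, 0.25, 0.5]
print(total_coverage(label_matrix))  # 0.75
print(conflict_rate(label_matrix))   # 0.25
```

A low total coverage suggests asking the expert for more labeling functions, while a high conflict rate shows where the sources disagree and need to be reconciled.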

Figure 1: **A possible weak supervision framework** — a Domain Expert provides Labeling Functions to the system. The produced weak label sources are fed to the Label Model, which outputs the Probabilistic Labels to train the final Discriminative Model.

Labeling functions are only one example of weak label sources, though. You can, for example, use the predictions of an old model that only worked for the older data points in the training set. You can blend in a public dataset or information crawled from the internet, or ask cheaper non-experts to label your data and treat them as weak label sources. Any strategy that can label a subset of your rows with accuracy better than random labeling can be added to your weak supervision input. The theory behind the Label Model (Figure 1) algorithm requires all label sources to be independent. However, recent research suggests that the approach still works in practice with a wide variety of weak label sources.
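To give an intuition for how several weak sources turn into probabilistic labels, here is a deliberately simple stand-in. It is a normalized vote count, not the Label Model algorithm from Figure 1, which also estimates how accurate each source is. The class names, the example matrix, and the function name are assumptions made for illustration.

```python
from collections import Counter


def vote_fractions(row, classes=("good", "bad")):
    """Turn one row of weak labels into a probability per class.

    Missing labels (None) are ignored. A row with no votes at all
    gets a uniform distribution over the classes.
    """
    votes = Counter(v for v in row if v is not None)
    total = sum(votes.values())
    if total == 0:
        return {c: 1 / len(classes) for c in classes}
    return {c: votes[c] / total for c in classes}


# Hypothetical outputs of three weak sources for three data points.
label_matrix = [
    ["good", None,   "good"],   # two sources agree
    ["bad",  "good", "bad"],    # sources disagree
    [None,   None,   None],     # nobody has an opinion
]

probabilistic_labels = [vote_fractions(row) for row in label_matrix]
for probs in probabilistic_labels:
    print(probs)
```

The first row ends up fully “good”, the second leans two thirds towards “bad”, and the third stays at 50/50. Soft labels of this kind are what the final discriminative model would be trained on.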

When dealing with tons of data and no labels at all, weak supervision’s flexibility in blending knowledge from different generic sources can be a solution in training an accurate model without asking any expensive expert to label thousands of samples.

In the next Guided Labeling Blog Post episode, we will look at how to train a document classifier in this way, using movie reviews: one more movie example via interactive views! Stay tuned!
