Click to learn more about author Paolo Tamagnini.
Welcome to the sixth episode of our Guided Labeling Blog Series. In the last episode, we made an analogy with a number of “friends” labeling “movies” with three different outcomes: “good movie,” “not seen movie” (–), and “bad movie.” We saw how to train a machine learning model on those opinions, how it can also predict movies no friend has watched before, and how additional feature data about such movies can be added to the model.
The other episodes are here:
- Guided Labeling Episode 1: An Introduction to Active Learning
- Guided Labeling Episode 2: Label Density
- Guided Labeling Episode 3: Model Uncertainty
- Guided Labeling Episode 4: From Exploration to Exploitation
- Guided Labeling Episode 5: Blending Knowledge with Weak Supervision
Let’s pick up where we left off.
You can blend friends’ movie opinions into a single model, but how is this useful if you don’t have any labels to train a generic supervised model? How can weak supervision become an alternative to active learning in a generic classification task? How can this analogy with many “friends” labeling “movies” work better than a single human expert like in active learning?
Weak Supervision Instead of Active Learning
The key feature that differentiates active learning from weak supervision is the source of the labels we are using to train a generic classification model from an unlabeled dataset.
Unique vs. Flexible
In active learning, the source of labels, referred to in the literature as the “oracle,” is usually a single, scarce resource, which makes it expensive and hard to find. The oracle can be an expensive experiment but, more often than not, it is a subject matter expert (SME): a human with domain expertise. In weak supervision, a weak source can be a human with less expertise who makes mistakes, but it can also be something else, such as a heuristic that applies only to a subset of the dataset. For example, a heuristic for movies could be:
IF “movie budget category” is “low”
AND “actor popularity” is “none”:
MOVIE LABEL = “bad movie”
ELSE:
MOVIE LABEL = “–”
Of course, this rule (or heuristic) is far from accurate and applies only to some movies, but it can be treated as a weak source in weak supervision, that is, as a labeling function. In most cases, you will need an expensive human expert to build those heuristics, but this is still less time-consuming than manual labeling work. Once you have a set of heuristics, you can apply them to millions of data points within a few seconds.
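To make the idea concrete, the movie rule can be sketched as a small Python function. This is only an illustration: the field names, the sample movies, and the integer encoding of the labels (1 for good, 0 for bad, -1 for "no label") are assumptions made for this sketch, not part of the original workflow.

```python
# Hypothetical label encoding, chosen only for this sketch.
GOOD, BAD, ABSTAIN = 1, 0, -1


def lf_low_budget_no_stars(movie: dict) -> int:
    """Label a movie BAD if its budget is low and no actor is popular; otherwise abstain."""
    if movie["budget_category"] == "low" and movie["actor_popularity"] == "none":
        return BAD
    return ABSTAIN


# Made-up rows standing in for an unlabeled movie dataset.
movies = [
    {"title": "A", "budget_category": "low", "actor_popularity": "none"},
    {"title": "B", "budget_category": "high", "actor_popularity": "high"},
    {"title": "C", "budget_category": "low", "actor_popularity": "high"},
]

# Applying the rule is a single pass over the rows, however many there are.
labels = [lf_low_budget_no_stars(m) for m in movies]
```

Only movie A gets a label here; B and C are left unlabeled for other weak sources to handle.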
Solid vs. Weak
In active learning, the label source is assumed, at least in theory, to always provide a 100 percent accurate label. In weak supervision, the weak sources may be unable to label every sample and can be less accurate.
Single vs. Multiple
Active learning is usually described as a system that relies on a single, expensive source of labels. Weak supervision relies on many less accurate sources.
Human-in-the-Loop vs. Prior Model Training
In active learning, the labels are provided as the model improves within the human-in-the-loop process. In comparison, in weak supervision, the noisy labels are provided from all weak sources before the model is trained.
From Movie Opinions to Any Classification Task
Our example about blending movie opinions from people was helpful for explaining the weak supervision framework with an intuitive example. However, for movie recommendation use cases, there are better algorithms than weak supervision (e.g., collaborative filtering). Weak supervision is powerful because it can be used wherever:
- There is a classification task to be solved
- You want to use supervised machine learning
- The dataset to train your model is unlabeled
- You can use weak label sources
Those requirements are quite flexible, making weak supervision versatile for a number of use cases where active learning would have been far more time-consuming in terms of manual labeling.
Your unlabeled dataset of documents, images, or customer data can have weak label sources just like you had “opinions from friends” on “movies.” These “friends” can be considered labeling functions that can label only a subset of your rows (in the example, that would be only those “movies” they have watched) with accuracy better than random. The “opinions” we had (“good movie” or “bad movie”) are the output labels of the labeling functions.
We can then extend this solution to any machine learning classification problem with missing labels. There can be just two output labels for binary classification, as in our example, or more for a multi-class problem. If a labeling function is not able to label a sample, it can output a missing value (“–”).
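As a rough illustration of the multi-class case, here is a sketch of a keyword-based labeling function for documents. The three classes, the keyword lists, and the sample texts are all invented for this example, and `None` is used as an assumed stand-in for the missing value.

```python
# None plays the role of the missing value ("–") in this sketch.
ABSTAIN = None

# Hypothetical classes and keywords, invented for illustration.
KEYWORDS = {
    "sports": {"match", "goal", "league"},
    "finance": {"stock", "bond", "dividend"},
    "travel": {"flight", "hotel", "itinerary"},
}


def lf_keywords(text: str):
    """Return a class if exactly one keyword set matches the text; otherwise abstain."""
    words = set(text.lower().split())
    hits = [label for label, keywords in KEYWORDS.items() if words & keywords]
    return hits[0] if len(hits) == 1 else ABSTAIN


docs = [
    "The league match ended with a late goal",
    "Dividend payments lifted the stock",
    "We booked a flight after the match",
    "Nothing relevant here",
]
outputs = [lf_keywords(d) for d in docs]
```

The third document matches two classes and the fourth matches none, so the function abstains on both rather than guessing.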
In active learning, the expensive expert provides labels row by row. In weak supervision, we can simply ask the expert to provide a number of labeling functions. By labeling function, we mean any heuristic that, in the expert’s opinion, can correctly label a subset of rows. The expert should provide as many labeling functions as possible, covering as many rows as possible with as high an accuracy as possible (see Figure 1 below).
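Coverage is the easiest of these qualities to check, since it needs no ground truth. The sketch below assumes the outputs of three labeling functions have been collected in a small matrix (rows are samples, columns are labeling functions) with -1 as an assumed encoding for "no label"; the numbers are made up.

```python
ABSTAIN = -1  # assumed encoding for a missing label

# Hypothetical outputs: one row per sample, one column per labeling function.
label_matrix = [
    [1, -1, 1],
    [-1, -1, 0],
    [0, 0, -1],
    [-1, 1, 1],
]


def coverage(matrix, column):
    """Fraction of rows on which the given labeling function does not abstain."""
    votes = [row[column] for row in matrix]
    return sum(v != ABSTAIN for v in votes) / len(votes)


# Coverage of each labeling function on its own.
coverages = [coverage(label_matrix, j) for j in range(3)]

# Fraction of rows labeled by at least one labeling function.
covered_rows = sum(any(v != ABSTAIN for v in row) for row in label_matrix) / len(label_matrix)
```

Accuracy, by contrast, cannot be measured this way without at least a small set of trusted labels.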
Labeling functions are only one example of weak label sources, though. You can, for example, use the predictions of an old model that worked only for older data points in the training set. You can blend in a public dataset or information crawled from the internet, or ask cheaper non-experts to label your data and treat them as weak label sources. Any strategy that can label a subset of your rows with accuracy better than random labeling can be added to your weak supervision input. The theory behind the Label Model algorithm (Figure 1) requires all label sources to be independent. However, recent research shows that this requirement holds even with a wide variety of weak label sources.
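To give a feel for what blending several weak sources means, here is a deliberately simple stand-in: a majority vote over the sources' outputs. This is not the Label Model algorithm, only a simpler baseline, and both the -1 encoding for "no label" and the vote matrix are assumptions made for this sketch.

```python
from collections import Counter

ABSTAIN = -1  # assumed encoding for a missing label


def majority_vote(votes):
    """Return the most common non-missing vote; abstain on ties or when nobody voted."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # no clear winner
    return ranked[0][0]


# Hypothetical outputs: one row per sample, one column per weak source.
label_matrix = [
    [1, -1, 1],
    [-1, -1, 0],
    [0, 0, -1],
    [-1, 1, 1],
    [1, 0, -1],
    [-1, -1, -1],
]
blended = [majority_vote(row) for row in label_matrix]
```

The last two rows stay unlabeled: one is a tie between two sources and the other was not labeled by anyone.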
When dealing with tons of data and no labels at all, weak supervision’s flexibility in blending knowledge from different generic sources can be a way to train an accurate model without asking an expensive expert to label thousands of samples.
In the next episode of the Guided Labeling Blog Series, we will look at how to train a document classifier in this way, using movie reviews: one more movie example via interactive views! Stay tuned!