CrowdFlower lets users crowdsource massive, repetitive jobs to an immense workforce of reliable contributors. Here’s how it works.
Fig. 1: Explanation of Jobs, Rows, Pages, and Judgments
Users come to the CrowdFlower Platform seeking help to complete a job. Jobs come in many shapes and sizes, but tend to have a few things in common:
- Jobs are typically sizable tasks that would be unreasonable or inefficient for one person (or even a small team of people) to complete on their own.
- They can be completed on a computer but usually cannot be fully automated.
- They can be divided into smaller tasks that members of the crowd, or contributors, can complete within the context of the greater job at hand.
Jobs can include anything from interpreting images and providing subjective feedback on a logo to tagging and categorizing a series of webpages or manually transforming data into a consistent format. Many types of jobs are well suited to crowdsourcing.
Rows and Pages
When users log in to CrowdFlower and create a new job, the first option that appears is to load data into the job. Whether data is uploaded as a data file or loaded from elsewhere on the web, it is represented as a data file on the CrowdFlower platform. Each row of data represents an individual data point that can be presented to contributors to work on.
Fig. 2: Example of a "Row" in the platform
Contributors working on your job receive rows in batches. These batches of rows are called Pages. Contributors are paid for each page they submit, not for each individual row. The number of rows per page can be set on the Job Settings Page. An average page consists of 5-10 rows.
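The relationship between rows, pages, and per-page payment can be sketched in a few lines of Python. This is only an illustration, not the platform's implementation: the `paginate` helper, the example rows, and the payment of 10 cents per page are all hypothetical.

```python
def paginate(rows, rows_per_page):
    """Split rows into consecutive pages of at most rows_per_page rows each."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


# Hypothetical data: 12 rows, each pointing at a photo to review.
rows = [{"id": n, "url": f"https://example.com/photo/{n}"} for n in range(12)]
pages = paginate(rows, rows_per_page=5)

# Payment is per submitted page, not per row (amount in cents is an assumption).
payment_per_page_cents = 10
cost_for_one_pass_cents = len(pages) * payment_per_page_cents

print([len(page) for page in pages])  # [5, 5, 2]
print(cost_for_one_pass_cents)        # 30
```

Note that the last page can be shorter than the others when the number of rows is not a multiple of the page size.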
The CrowdFlower Platform is designed to let users efficiently process data at scale. But the rate at which data can be processed means little if the quality of results isn’t top notch. That’s why the CrowdFlower Platform has built-in mechanisms that maintain high-quality data by identifying trustworthy contributors from the crowd, and using their responses to corroborate others.
Picture an individual row within a page that is presented to a contributor in the crowd, such as a photo to be moderated for inappropriate content. A well-designed job would provide clear, objective instructions explaining what constitutes appropriate content in a photo. To be sure you get the right answer, you can collect multiple judgments and either compare them to one another or aggregate them into a single top response. For each job, you can specify the number of judgments you would like each row to receive. If you would like five judgments per row, five different contributors will need to provide an answer to every row before the job is finished.
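As a rough illustration of aggregating judgments, the sketch below picks the most common answer for a row and reports how much of the crowd agreed with it. It assumes a simple unweighted vote, which may differ from how the platform actually aggregates; the `aggregate` function and the answer labels are hypothetical.

```python
from collections import Counter


def aggregate(judgments):
    """Return (top_answer, agreement) for one row's judgments.

    agreement is the share of judgments that match the top answer.
    If two answers tie, the one that was seen first wins.
    """
    if not judgments:
        raise ValueError("a row needs at least one judgment")
    top_answer, votes = Counter(judgments).most_common(1)[0]
    return top_answer, votes / len(judgments)


# Five judgments collected for one photo (hypothetical labels).
row_judgments = ["ok", "ok", "inappropriate", "ok", "inappropriate"]
print(aggregate(row_judgments))  # ('ok', 0.6)
```

A low agreement value is a useful signal that a row is ambiguous or that the instructions do not cover it well.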
Test Questions are "hidden" questions with predetermined answers that are used to test contributors and calculate their individual Trust Scores. The Trust Score is a measure of the contributor's accuracy over all of the Test Questions they’ve submitted. Test Questions are the primary quality control feature of the CrowdFlower platform.
Test Questions are used in three ways:
- Before being able to complete any work, Contributors must prove their understanding of your job by passing Quiz Mode. Quiz Mode is a page composed entirely of Test Questions. Contributors must complete the Quiz with a higher accuracy than the Minimum Accuracy set on the Settings page in order to enter the job. Contributors who don't pass Quiz Mode will not be paid and will not be allowed to work on your job.
- Contributors are tested on an ongoing basis as they work through a job. We randomly insert a Test Question into every page. Contributors will not be able to tell which questions are Test Questions and which ones aren't. Their accuracy on these Test Questions allows us to calculate the contributor’s Trust Score. If a contributor's Trust Score falls below the pre-defined accuracy threshold within a job, we will stop them from working on your task, discard all their work, and re-launch the rows they worked on. This way we ensure that only work from Trusted contributors is collected.
- If a contributor answers a Test Question incorrectly, they are shown the correct answer and a reason why it is correct. This is a fantastic opportunity to further train your contributors and give guidance about edge cases in your data that might not have been fully covered in your instructions. Test Question reasons can be a powerful teaching tool, so be sure to make them as thorough as possible.
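The Trust Score logic described above can be sketched as follows. This is a simplified model under stated assumptions, not the platform's actual calculation: the function names, the data layout, and the choice to treat a score exactly equal to the threshold as passing are all hypothetical.

```python
def trust_score(answers, gold):
    """Return the share of answered Test Questions that match the known answer.

    answers: {question_id: contributor's answer}
    gold:    {question_id: predetermined answer}, Test Questions only
    Returns None if the contributor has not answered any Test Question yet.
    """
    scored = [qid for qid in answers if qid in gold]
    if not scored:
        return None
    correct = sum(1 for qid in scored if answers[qid] == gold[qid])
    return correct / len(scored)


def split_by_trust(judgments, scores, min_accuracy):
    """Keep judgments from trusted contributors and list the rows to re-launch.

    judgments: list of (contributor, row_id, answer) tuples for regular rows
    scores:    {contributor: trust score, or None if unknown}
    """
    def trusted(contributor):
        score = scores.get(contributor)
        # Assumption: a score exactly at the threshold still counts as trusted.
        return score is not None and score >= min_accuracy

    kept = [j for j in judgments if trusted(j[0])]
    relaunch = sorted({row for who, row, _ in judgments if not trusted(who)})
    return kept, relaunch


gold = {"t1": "ok", "t2": "inappropriate", "t3": "ok", "t4": "ok"}
test_answers = {
    "alice": {"t1": "ok", "t2": "inappropriate", "t3": "ok", "t4": "ok"},
    "bob": {"t1": "ok", "t2": "ok", "t3": "inappropriate", "t4": "ok"},
}
scores = {name: trust_score(a, gold) for name, a in test_answers.items()}
judgments = [("alice", 1, "ok"), ("bob", 1, "ok"), ("bob", 2, "inappropriate")]
kept, relaunch = split_by_trust(judgments, scores, min_accuracy=0.7)

print(scores)    # {'alice': 1.0, 'bob': 0.5}
print(kept)      # [('alice', 1, 'ok')]
print(relaunch)  # [1, 2]
```

Here bob's work is discarded because his score of 0.5 is below the 0.7 threshold, so both rows he touched go back to the crowd for new judgments.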
A CrowdFlower Job comprises a set of tasks, each of which requires contributors to perform one or more actions. But some jobs are long and complex, and delivering the entire scope of the work to the crowd in one job could be inefficient and confusing. In such cases, it may be preferable to set up a Workflow: a series of interrelated jobs that can be run either simultaneously or sequentially. In the sequential case, results from one job flow into a second job, where they act as the source data for a follow-up task.
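A sequential Workflow can be pictured as one job's aggregated results being filtered into the source rows of the next job. The sketch below shows that hand-off in plain Python; it does not use any CrowdFlower API, and the `next_job_rows` helper, the field names, and the example question are hypothetical.

```python
def next_job_rows(first_job_results, keep_answer):
    """Build source rows for a second job from the first job's results.

    Only rows whose aggregated answer equals keep_answer are passed on,
    and the first job's answer is dropped from the new rows.
    """
    return [
        {"id": result["id"], "url": result["url"]}
        for result in first_job_results
        if result["answer"] == keep_answer
    ]


# Job 1 (hypothetical): "Does this photo contain a logo?"
first_job_results = [
    {"id": 1, "url": "https://example.com/photo/1", "answer": "yes"},
    {"id": 2, "url": "https://example.com/photo/2", "answer": "no"},
    {"id": 3, "url": "https://example.com/photo/3", "answer": "yes"},
]

# Job 2 (hypothetical): "Which brand does the logo belong to?"
second_job_rows = next_job_rows(first_job_results, keep_answer="yes")
print([row["id"] for row in second_job_rows])  # [1, 3]
```

Splitting the work this way keeps each job's instructions short, and the second job only pays for rows that actually need the follow-up question.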