
How to Audit a Pixel Labeling for Semantic Segmentation Job

As with any other CrowdFlower job, we recommend doing frequent, systematic, and thorough audits of your data. This is a must-do for estimating the accuracy of the most important data you’re feeding into your algorithm, and it will highlight areas of improvement in your job. A detailed audit will also allow you to communicate internally with your team about the core priorities for your data enrichment tasks.

Follow the step-by-step guide below to audit your annotated images.

Understand your data accuracy needs

With semantic segmentation, your ontology (the list of classes or objects you are annotating) may be 5, 10, 15, or 25+ items long. With a very long ontology you may not be able to audit every single class - imagine checking 35 classes across 50 images; each audit would take days.

Instead, we recommend that you identify the top 3 to 5 classes in your ontology for your use case. If you’re training an autonomous vehicle, think pedestrians and road surface. If you’re training an object classifier to identify kittens in scenes, audit the kittens and not the puppies.

This allows you to audit the accuracy of your core classes (let’s say they’re sky, road, and pedestrians) and define accuracy goals and metrics for those fields. For example, you may want 95% accuracy in annotating pedestrians, and 80% in annotating the sky.

Know where to go to see your finalized images

In a semantic segmentation job, the output data is a raster mask of the image that is not perfectly human readable. The output is encoded as a black-to-red image, with each pixel value representing a class in the ontology. Therefore we allow you to visualize the completed images in a finished or running job via the Data page.
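
If you ever need to inspect the raw output outside the platform, something like the sketch below can help. It assumes the mask is downloaded as an RGB PNG whose red channel stores each pixel’s class index - the filename and channel layout here are illustrative assumptions, not the documented output format.

  import numpy as np
  from PIL import Image

  # Load the raster mask downloaded from the job (hypothetical filename).
  mask = np.array(Image.open("unit_12345_mask.png"))

  # Assumption: the red channel holds the ontology class index for each pixel.
  class_indices = mask[..., 0]

  # Count how many pixels were assigned to each class index.
  classes, counts = np.unique(class_indices, return_counts=True)
  for cls, count in zip(classes, counts):
      print(f"class {cls}: {count} pixels")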

First, go to your job and access the Data page. Sort the page based on the “Judgments” column to access images with judgments > 0.

Notice that the unit ids (which are generated as metadata every time you upload data into the CrowdFlower platform) are links. Each link leads you to a preview-like mode of your image with its annotation.

In this view you’ll find what you need to audit the images:

  1. The image and the annotation
  2. The ability to interact with the image by changing the opacity slider, erasing or modifying annotations, etc.
  3. The ability to pop out the tool using the green button in the top right to see the image and its annotation in full screen

Take special note of the URL on this page; it should look something like this:

https://make.crowdflower.com/jobs/{{your_job_id_here}}/units/{{your_unit_id_here}}

Randomly sample 100 rows of recently finished data

It’s up to you to determine the right sample size for your audit, but at a minimum we recommend 30 images for each run of data you’d like to audit. You may need to audit more images each time if the class you’ve selected to grade on occurs infrequently.

Find the unit ids of the rows you’d like to audit - we recommend random sampling. There are many ways to do this, but one fast and easy way (with a scripted alternative after this list) is:

  1. Download the full report from your job
  2. If desired, filter down to rows completed in the previous day or iteration
  3. Create a new column next to the unit ids and use the RAND function in Excel or Google Sheets to generate a random number for each row
  4. Order the sheet based on the random column
  5. Select the first 30-100 rows of unit ids for your audit
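
If you’re comfortable with a few lines of Python, the same sampling can be scripted - a minimal sketch, assuming your full report was saved as full_report.csv and contains a _unit_id column (check the column names in your own report):

  import pandas as pd

  # Load the full report downloaded from the job (hypothetical filename).
  report = pd.read_csv("full_report.csv")

  # If desired, filter to rows completed in the previous day or iteration
  # here, using whatever date column your report contains.

  # Randomly sample 100 unit ids; random_state makes the sample reproducible.
  sample = report["_unit_id"].drop_duplicates().sample(n=100, random_state=42)
  sample.to_frame().to_csv("audit_unit_ids.csv", index=False)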

Set up your worksheet

For this step we recommend using Google Sheets - that way you can conduct the rest of the audit in a web browser, easily delegate it to an intern or contractor, and collaborate with your team.

You’ll be working towards a worksheet with one row per audited unit, a link to its preview, and one scoring column per class.

To set up the audit worksheet, follow these steps:
  1. Paste your randomly sampled unit ids to audit in Column A.
  2. Create a URL to the Data Page Unit Preview mentioned above for each unit to audit, using the CONCAT function in Google Sheets to join the first part of the URL (which is always the same) with each unit id.
    • =CONCAT("https://make.crowdflower.com/jobs/{{your_job_id_here}}/units/",A1) - where A1 is the cell with your unit id.
  3. Once you have a link to each unit you want to audit, you can create a column for every field in your ontology you want to audit on.
  4. We also recommend making a comments column to jot down any notes on that unit. If you’d rather build the worksheet programmatically, see the sketch after this list.
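
Here is a minimal sketch of generating the worksheet as a CSV you can then import into Google Sheets. The class names, filenames, and the audit_url column are illustrative assumptions; substitute your own job id into the URL template:

  import pandas as pd

  # Unit ids sampled in the previous step (hypothetical filename).
  units = pd.read_csv("audit_unit_ids.csv")

  # Build the preview link for each unit; replace the placeholder job id.
  base = "https://make.crowdflower.com/jobs/{{your_job_id_here}}/units/"
  units["audit_url"] = base + units["_unit_id"].astype(str)

  # One empty scoring column per audited class, plus a comments column.
  for column in ["sky", "road", "pedestrians", "comments"]:
      units[column] = ""

  # Import this file into Google Sheets to start the audit.
  units.to_csv("audit_worksheet.csv", index=False)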

Conduct the audit

You can now score every image you want to audit on each class you selected. You can assign the following scores:

  • “0” - Giving a zero is appropriate when:
    • The class was not annotated or was misclassified (i.e., the kitten was marked as a dog)
    • There are many instances of the class (e.g., many leaves on a tree) and fewer than half were annotated correctly
  • “1” - Giving a perfect score of 1 is appropriate when nothing was missed, and the annotation was done cleanly and precisely. You would do nothing differently.
  • “0.1” to “0.9” - Feel free to assign partial credit when an annotation is only partially correct. For example, if there are two trees, and one was done perfectly and the other was not, you may give a 0.5, indicating 50% accuracy on that class.

Calculate Accuracy

You can calculate accuracy for each field by averaging all the values in the column you scored it in. You can also average the scores across classes for each image, and then count how many images were perfect (a score of 1) versus imperfect.
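
If you export the completed sheet back to CSV, the same numbers can be computed with pandas - a sketch, reusing the hypothetical column names from the worksheet above:

  import pandas as pd

  # Completed audit exported from Google Sheets (hypothetical filename).
  audit = pd.read_csv("audit_worksheet.csv")
  class_columns = ["sky", "road", "pedestrians"]

  # Per-class accuracy: the mean score down each class column.
  print(audit[class_columns].mean())

  # Per-image score: the mean across the audited classes for each row.
  audit["image_score"] = audit[class_columns].mean(axis=1)

  # Count of perfectly annotated images (all audited classes scored 1).
  print((audit["image_score"] == 1).sum())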

There are many other metrics (e.g., recall and precision) you can derive by building off this framework of auditing.

