Integrating with the API

The CrowdFlower API allows you to post data to, launch, and receive results from CrowdFlower jobs in an automated fashion. This document provides an overview of the process for completing a typical CrowdFlower API integration.

 

API Key

CrowdFlower uses a RESTful API that accepts data as URL-encoded key-value pairs. Responses are returned in JSON format only, and authentication is key-based.

Prior to integrating with the CrowdFlower API, you will need to find your API key.

1. First click the Account link located in the menu that appears when you hover over your username in the top right.


2. Click the "Your Details" tab and find your key listed under the "Your API Key" section.

 

Posting Data

With CrowdFlower’s API you can post data unit by unit (row by row) or via CSV or JSON upload. Posting unit by unit is preferred because it allows you to send data in real time and facilitates more granular monitoring and error handling capabilities.

Below are examples of the two ways to post data to the CrowdFlower API: unit by unit and with a CSV file. Please observe the following conventions:

URLs: All URLs begin with the following pattern - https://api.crowdflower.com/v1/...

Authentication: Each request to the API must contain a key parameter with your API key (see examples below).

Format: The API currently only supports JSON. You must set the HTTP header "Accept: application/json" or append .json to the URL of the request (see examples below).
 

Unit by Unit Posting Examples

cURL:

curl -d 'unit[data][column1]=helloworld' -d 'unit[data][column2]=helloworld2' 'https://api.crowdflower.com/v1/jobs/{job_id}/units.json?key={Your_CrowdFlower_API_Key}'

Ruby HTTParty Gem:

require 'httparty'

API_KEY = "Your_CrowdFlower_API_Key"
job_id = "Your_Job_ID_or_Alias"
data = { column1: "helloworld", column2: "helloworld2" }

# Post one unit; note the #{job_id} interpolation in the URL.
HTTParty.post(
  "https://api.crowdflower.com/v1/jobs/#{job_id}/units.json",
  :query => {
    :key => API_KEY,
    :unit => {
      :data => data,
      :state => :new
    }
  },
  :timeout => 5
)

Ruby CrowdFlower Gem (https://github.com/dolores/ruby-crowdflower):

require 'crowdflower'

API_KEY = "Your_CrowdFlower_API_Key"
DOMAIN_BASE = "https://api.crowdflower.com"
job_id = "Your_Job_ID_or_Alias"

CrowdFlower::Job.connect! API_KEY, DOMAIN_BASE
job = CrowdFlower::Job.new(job_id)
unit = CrowdFlower::Unit.new(job)
unit.create({ column1: "helloworld", column2: "helloworld2" })

Python Requests Lib:

import requests
import json

API_KEY = "Your_CrowdFlower_API_Key"
job_id = "Your_Job_ID_or_Alias"

data = {'column1': 'helloworld', 'column2': 'helloworld2'}

request_url = "https://api.crowdflower.com/v1/jobs/{}/units.json".format(job_id)
headers = {'content-type': 'application/json'}

# The API key is sent in the JSON body alongside the unit data.
payload = {
    'key': API_KEY,
    'unit': {
        'data': data
    }
}

requests.post(request_url, data=json.dumps(payload), headers=headers)
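The cURL example above sends each column as a form parameter named unit[data][column]. As a minimal sketch, assuming that naming convention applies to every column, a URL-encoded body of that shape can be built with the Python standard library alone. The helper name encode_unit is hypothetical and not part of any CrowdFlower library.

```python
from urllib.parse import parse_qs, urlencode


def encode_unit(data):
    """Return a URL-encoded body using the unit[data][column] naming
    shown in the cURL example (hypothetical helper)."""
    pairs = [("unit[data][{}]".format(column), value)
             for column, value in data.items()]
    return urlencode(pairs)


body = encode_unit({"column1": "helloworld", "column2": "helloworld2"})

# Decoding the body again recovers the original parameter names and values.
decoded = parse_qs(body)
```

The resulting string can be sent as the body of a POST request with the content type application/x-www-form-urlencoded.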

 

CSV-Based Posting Examples

cURL:

curl -X PUT -T 'sampledata.csv' -H 'Content-Type: text/csv' 'https://api.crowdflower.com/v1/jobs/{job_id}/upload.json?key={Your_CrowdFlower_API_Key}'

Note: the file uploaded must be UTF-8 encoded.

 
Ruby HTTParty Gem:
 

require 'httparty'

API_KEY = "Your_CrowdFlower_API_Key"
job_id = "Your_Job_ID_or_Alias"

# Upload the CSV over HTTPS, as required for all API requests.
HTTParty.put(
  "https://api.crowdflower.com/v1/jobs/#{job_id}/upload",
  :body => File.read("sample.csv"),
  :headers => { "content-type" => "text/csv" },
  :query => { :key => API_KEY }
)

Ruby CrowdFlower Gem (https://github.com/dolores/ruby-crowdflower):

require 'crowdflower'

API_KEY = "Your_CrowdFlower_API_Key"
DOMAIN_BASE = "https://api.crowdflower.com"
job_id = "Your_Job_ID_or_Alias"

CrowdFlower::Job.connect! API_KEY, DOMAIN_BASE
job = CrowdFlower::Job.new(job_id)
job.upload(File.dirname(__FILE__) + "/sample.csv", "text/csv")

Python Requests Lib:

import requests

API_KEY = "Your_CrowdFlower_API_Key"
job_id = "Your_Job_ID_or_Alias"

file_path = "sample.csv"

request_url = "https://api.crowdflower.com/v1/jobs/{}/upload".format(job_id)
headers = {'content-type': 'text/csv'}
payload = {'key': API_KEY}

# Open in binary mode so the file's bytes are uploaded unchanged,
# and close the file once the request has completed.
with open(file_path, 'rb') as csv_file:
    requests.put(request_url, data=csv_file, params=payload, headers=headers)
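Because an uploaded file must be UTF-8 encoded, it may help to check the file locally before sending it. The following is a minimal sketch of such a check; the function names is_utf8 and file_is_utf8 are hypothetical, and the check only confirms that the bytes decode as UTF-8, not that the CSV itself is well formed.

```python
def is_utf8(raw_bytes):
    """Return True if raw_bytes decodes cleanly as UTF-8."""
    try:
        raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path):
    """Read the whole file in binary mode and test its encoding."""
    with open(path, "rb") as handle:
        return is_utf8(handle.read())
```

For example, file_is_utf8("sample.csv") could be called before the upload request, and the upload skipped or the file re-encoded if it returns False.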

 

Messaging

After your request, an HTTP status code will be sent in response, indicating whether your request was successful (see the Status Codes section below for a list of status codes and their definitions). If the status code of a response is something other than 200, you will find a JSON message in the body of the response that gives the status of the operation or reports an error. CrowdFlower returns one of three message types:

 

{"message": {"success": "Job created successfully."}}
{"message": {"notice": "Your job is being completed."}}
{"message": {"error": "Job could not be canceled."}}
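A client can tell the three message types apart by checking which key appears inside message. The sketch below assumes the response body is valid JSON with quoted keys and exactly the nesting shown above; read_message is a hypothetical helper name.

```python
import json


def read_message(body):
    """Return (kind, text) for a body shaped like {"message": {kind: text}}.

    kind is "success", "notice" or "error"; (None, None) is returned
    when the body carries no recognised message.
    """
    message = json.loads(body).get("message", {})
    for kind in ("success", "notice", "error"):
        if kind in message:
            return kind, message[kind]
    return None, None


kind, text = read_message('{"message": {"error": "Job could not be canceled."}}')
```

A caller might log notices, treat errors as failures, and ignore bodies that return (None, None).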

 

API Security

CrowdFlower uses authentication keys to ensure that APIs are only accessible to those with the proper privileges. All API postings are made over a Secure Sockets Layer (SSL) connection, which encrypts communications between the user and web server to ensure data remains private. Note that this means all request URLs must begin with https://.

 

PHP API Integration

To integrate with the CrowdFlower API using PHP, please visit https://github.com/dadeg/php-crowdflower/.

Note: https://github.com/dadeg/php-crowdflower/ is an open source package. CrowdFlower is not responsible for the maintenance and functionality of this package. Please contact the GitHub owner if you have any questions about the package.

 

Receiving Results via Webhook URI

Real-time delivery of CrowdFlower results is handled via a webhook that provides callbacks between CrowdFlower and your web application.

Enter the webhook URI in the Job Settings - API tab.

CrowdFlower will fire HTTP POST requests to your Webhook URI as each unit in a job reaches its finalized state.  Each POST will feature three parameters: signal, payload and signature.

  • The signal defines the payload. Results for a unit will be delivered with a "unit complete" signal.
  • The payload parameter contains a JSON representation of all the data associated with the unit (See below for a verbose explanation of the payload parameter).
  • The signature contains a SHA-1 hash of the payload concatenated with your secret API key, i.e., signature = sha1(payload + api_key).


Messaging

Your webhook server will need to provide a 200 response with valid JSON regardless of whether your application accepts the request.
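As a rough sketch of such an endpoint, the handler below uses only the Python standard library. It assumes the three parameters arrive as a form-encoded POST body, which this article does not state explicitly, and the shape of the JSON reply is arbitrary. The names parse_webhook_body and WebhookHandler are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs


def parse_webhook_body(raw_body):
    """Split a form-encoded POST body (an assumption) into the
    signal, payload and signature strings."""
    params = parse_qs(raw_body.decode("utf-8"))
    return {name: params.get(name, [""])[0]
            for name in ("signal", "payload", "signature")}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_webhook_body(self.rfile.read(length))
        # Application-specific processing of `fields` would go here.
        # Always answer 200 with valid JSON, whether or not the
        # application accepted the request.
        reply = json.dumps({"received": fields["signal"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)
```

The handler can be served locally for testing with http.server.HTTPServer(("", 8000), WebhookHandler).serve_forever(); a production deployment would normally sit behind HTTPS.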


Webhook Security


Using the webhook's signature: You can verify that a post came from CrowdFlower by validating the SHA-1 signature sent with the payload. The signature is a SHA-1 hash of the unparsed payload concatenated with your secret API key (i.e., signature = sha1(payload + your_api_key)). You can validate the webhook by generating the same SHA-1 hash and comparing it to the signature sent with the payload (e.g., webhook.valid? if signature == sha1(payload + my_api_key)).

Via basic authentication: If your endpoint (webhook) supports basic authentication, you can set the webhook in your job using the following convention: 'http://username:password@webhook'. The webhook will be posted to your server using HTTP basic authentication via the username:password credentials that you provide.
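The signature-based check described above can be sketched in Python as follows. This assumes the signature is sent as a lowercase hexadecimal digest and that the payload and key are combined as UTF-8 text; neither detail is confirmed by this article. The function names are hypothetical.

```python
import hashlib
import hmac


def expected_signature(payload, api_key):
    """SHA-1 hex digest of the unparsed payload followed by the API key."""
    return hashlib.sha1((payload + api_key).encode("utf-8")).hexdigest()


def is_valid_webhook(payload, signature, api_key):
    """Compare the received signature with the locally computed one."""
    expected = expected_signature(payload, api_key)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature.encode("utf-8"))
```

The payload passed in must be the raw string exactly as received, before any JSON parsing, since re-serializing it may change the bytes and therefore the hash.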

 

The Payload Attribute of a Webhook Post

At the top level in the payload you will encounter the following attributes:

  • results – Any unit that has gathered trusted judgments will also have a results attribute. The results attribute is a hash containing all of the results and judgment data for the unit.
  • job_id – The numeric ID of the job on CrowdFlower’s platform.
  • updated_at – The timestamp for when the unit was last updated.
  • data – The data attribute is a hash containing the source data for the unit.
  • judgment_count – The total count of judgments on the unit.
  • created_at – The timestamp for when the unit was created.
  • state – The state of the unit: golden (a test question), judgeable (an unfinalized unit), or finalized (a finalized unit).
  • id – CrowdFlower’s unique numeric ID for the unit.

Results

It’s important to note that the results attribute may feature different fields from unit to unit depending on the following:

  1. The number of judgments that were collected for the unit
  2. The content of each of the judgments (what questions in the form were responded to).

Each of the child attributes of the results attribute is discussed in greater detail below.

Field Results

Within the results hash, there is a field result attribute for each question in the job that received a judgment. This attribute represents the final answer of that question, and is itself presented as a hash. These attributes will be keyed according to how they are named in the job.

Example:


  "field_name":{
    "confidence":1.0,
    "agg":"true"
  }

 

Here, confidence represents the confidence value for the response that CrowdFlower deems to be correct. Agg represents the value of the correct answer itself.

Note: Often, the form in a job is designed with logical contingencies that determine the questions that are asked for any given unit. Field results only exist for questions/fields that received judgments. So if every CrowdFlower contributor answered the same for a given unit without exposing a logic-hidden field, that field will not exist in the results hash.
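As an illustration, the final answers can be pulled out of a payload by walking the results hash and reading agg and confidence for each field. This sketch assumes every field result is a hash with those two keys, as in the example above, and it skips non-field entries such as the judgments array described in the next section. The helper name final_answers and the sample payload are made up for the example.

```python
import json


def final_answers(payload_json):
    """Map each field name to (agg, confidence), skipping non-field entries."""
    results = json.loads(payload_json).get("results", {})
    return {
        name: (value.get("agg"), value.get("confidence"))
        for name, value in results.items()
        if name != "judgments" and isinstance(value, dict)
    }


# Made-up payload in the shape described above.
sample = ('{"id": 1, "results": {"judgments": [], '
          '"field_name": {"confidence": 1.0, "agg": "true"}}}')
answers = final_answers(sample)
```

Because logic-hidden fields may be absent, callers should look fields up with answers.get("field_name") rather than assuming every key exists.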

Judgments

Results also contains a top-level attribute, judgments, which contains a JSON array of all the judgments that have been gathered so far during your job. This section of the unit.json is analogous to a row in a CrowdFlower full CSV, which contains every value for every judgment submitted for a job. An example is shown first, followed by the definition of each key.


  "judgments":[{
    "worker_id":131327,
    "city":"San Francisco",
    "job_id":110870,
    "external_type":"mob",
    "tainted":false,
    "data":{
      "field_name1":"answer1",
      "field_name2":"answer2"
    },
    "unit_data":{
      "your_data":"your_data",
      "your_data1":"your_data1",
      "your_data2":"your_data2"
    },
    "trust":1.0,
    "golden":false,
    "judgment":1,
    "created_at":"2012-07-11T19:31:18+00:00",
    "unit_id":181077044,
    "unit_state":"finalized",
    "region":"CA",
    "country":"USA",
    "rejected":null,
    "started_at":"2012-07-11T19:30:18+00:00",
    "id":518147165,
    "worker_trust":0.995059523809524, 
    "missed":null
  }]

 

Let’s see what it looks like with definitions instead:


  "judgments":[{
    "worker_id": CrowdFlower contributor ID,
    "city": Worker city,
    "job_id": Job number (ID),
    "external_type": Channel via which the contributor entered the job,
    "tainted": Boolean - is the judgment from an untrusted contributor,
    "data":{
      "data" is a hash attribute that contains a worker's responses to each of the questions they responded to. It is the judgment-level version of the field result attribute:
      "field_name1":"answer1",
      "field_name2":"answer2"
    },
    "unit_data":{
      "unit_data" is a hash attribute that contains the initial data that you posted for this unit (also called "source data"):
      "your_data":"your_data",
      "your_data1":"your_data1",
      "your_data2":"your_data2"
    },
    "trust": The trust of the contributor,
    "golden": Boolean - is this a gold (test) unit or not,
    "judgment": Numeric index of the judgment for this unit,
    "created_at": Timestamp for the submission of the judgment,
    "unit_id": CrowdFlower’s numeric ID for the unit,
    "unit_state": State of the unit; should always read "finalized",
    "region": Worker region,
    "country": Worker country code,
    "rejected": Whether the contributor was rejected,
    "started_at": Timestamp for the start of the judgment,
    "id": Judgment ID (unique identifier),
    "worker_trust": Contributor trust (accuracy on hidden test questions),
    "missed": You may safely ignore this attribute.
  }]
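As one way to work with the judgments array, the sketch below computes how long each contributor spent on a judgment from started_at and created_at, skipping tainted judgments by default. It assumes the timestamps are ISO 8601 strings with a UTC offset, as in the example above; the helper name judgment_durations is hypothetical.

```python
from datetime import datetime


def judgment_durations(judgments, include_tainted=False):
    """Return {judgment id: seconds between started_at and created_at}."""
    durations = {}
    for judgment in judgments:
        if judgment.get("tainted") and not include_tainted:
            continue
        started = datetime.fromisoformat(judgment["started_at"])
        created = datetime.fromisoformat(judgment["created_at"])
        durations[judgment["id"]] = (created - started).total_seconds()
    return durations


# Values taken from the example judgment above.
sample = [{
    "id": 518147165,
    "tainted": False,
    "started_at": "2012-07-11T19:30:18+00:00",
    "created_at": "2012-07-11T19:31:18+00:00",
}]
durations = judgment_durations(sample)
```

The same loop structure applies to any per-judgment analysis, such as grouping answers by worker_id or filtering on trust.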


Status Codes

The potential HTTP status codes that may be returned following an API request are listed in the Responses & Messaging article.

 

Rate Limiting

There are two rate limits enforced in CrowdFlower's API:

  1. 40 requests every second
  2. 6200 requests every hour
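A client can stay under both limits by throttling itself before each request. The following is a minimal sketch of a sliding-window throttle; how CrowdFlower actually measures its windows is not stated here, so the sliding-window behaviour is an assumption, and the class name RateLimiter is hypothetical. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque


class RateLimiter:
    """Client-side throttle for several (max_requests, window_seconds) limits."""

    def __init__(self, limits=((40, 1.0), (6200, 3600.0)),
                 clock=time.monotonic, sleep=time.sleep):
        self.limits = limits
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests, oldest first

    def wait_time(self, now):
        """Seconds to wait before another request is allowed at time `now`."""
        longest = max(window for _, window in self.limits)
        # Forget requests that have left even the longest window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        delay = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # A slot frees up once this request leaves the window.
                delay = max(delay, recent[-max_requests] + window - now)
        return delay

    def acquire(self):
        """Block until a request may be sent, then record it."""
        delay = self.wait_time(self.clock())
        if delay > 0:
            self.sleep(delay)
        self.sent.append(self.clock())
```

Calling limiter.acquire() immediately before each API request keeps a single-process client within both limits; several processes sharing one API key would need a shared counter instead.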
