Using Cohen's Kappa to Evaluate IRR
Training Center tests can be created to evaluate the level of agreement between two coders on a specific set of codes and excerpts. The Dedoose Training Center is designed to assist research teams in building and maintaining inter-rater reliability for both coding (the application of codes to excerpts) and code weighting/rating (the application of specified weighting/rating scales to code applications).
This feature is available for text-based documents only. If you have a different form of data, please reference the Collaborative Code and Compare guide.
Training sessions (‘tests’) are specified based on the coding and rating of an ‘expert’ coder/rater. Creating a training session requires selecting the codes to be included in the ‘test,’ selecting previously coded/rated excerpts to include, and giving the test a name and description. When accessing the test, ‘trainees’ are prompted to apply codes or weights to the set of excerpts provided. When the session has been completed, overall and code-specific results include Cohen’s Kappa coefficient for code application tests and Pearson’s correlation coefficient for code weighting/rating tests. Further, details are provided to examine agreement/disagreement between the ‘expert’ and ‘trainee’ on an excerpt-by-excerpt basis.
Caution: Kappa is a reliable statistic only if sufficient data are available and included in the test. You will get better results by testing fewer codes and more excerpts associated with those codes. You may want to begin by testing only codes that are essential to your study and/or used frequently. For example, if you choose 5 essential codes and 15 excerpts per code, the resulting test will contain 75 excerpts for the test-taker to evaluate.
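To make the statistic concrete, here is a minimal sketch of how Cohen's Kappa can be computed for a single code, treating each excerpt as a binary decision (code applied or not). This is the standard formulation of the statistic, not Dedoose's internal implementation, and the expert/trainee decisions shown are hypothetical:

```python
# Cohen's Kappa for one code: each excerpt is a True/False decision
# (code applied or not) made independently by an expert and a trainee.

def cohens_kappa(expert, trainee):
    """Kappa for two raters' parallel binary decisions."""
    assert len(expert) == len(trainee) and expert
    n = len(expert)
    # Observed agreement: proportion of excerpts where both raters match.
    p_o = sum(e == t for e, t in zip(expert, trainee)) / n
    # Chance agreement from each rater's marginal rate of applying the code.
    p_yes_e = sum(expert) / n
    p_yes_t = sum(trainee) / n
    p_e = p_yes_e * p_yes_t + (1 - p_yes_e) * (1 - p_yes_t)
    if p_e == 1.0:  # both raters fully one-sided: kappa is undefined
        return 1.0 if p_o == 1.0 else 0.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical decisions for one code across 10 excerpts
expert  = [True, True, False, True, False, False, True, False, True, False]
trainee = [True, False, False, True, False, False, True, False, True, True]
print(round(cohens_kappa(expert, trainee), 2))  # prints 0.6
```

Kappa corrects the raw observed agreement (0.80 in this example) for the agreement expected by chance given each rater's marginal rates (0.50 here), which is why frequently used codes with a good mix of true and false excerpts yield more stable estimates.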
Step 1: Create a Test
To create a new test:
- Click the ‘Create New Test’ button in the lower right corner of the Training Center workspace
- Select either ‘Code Application’ or ‘Code Weighting’ (the setup process is identical for both; the illustration here focuses on code application tests) and click ‘Next’
- Select the codes to be included in the test and click ‘Next’
- A window with the Excerpts panel opens. Select the excerpts to include in the test. To see which excerpts contain the codes you will use for testing, check the box next to the relevant codes in the Columns panel. The column for each code in the Excerpts panel will then show ‘true’ or ‘false’ in each row, indicating which excerpts contain that code. After selecting the desired excerpts, click ‘Next’
- Provide a title and description for the test, click ‘Save,’ and setup is complete.
Tip for Selecting Appropriate Excerpts:
- Use the Columns Panel on the left to choose the codes you are including in the test
- Navigate to the Filters Panel and the desired code; select the box "True" in order to see only excerpts that have the desired code applied. You can also select excerpts that fall into the "False" category if you want to test excerpts that did not have the code applied.
- Select the excerpts from the Excerpts Panel that you want to include
- Repeat for every code you have in the test
Step 2: Take a Test
Once a test is saved to the training center test library, a team member can take the test by:
- Clicking the test in the list and then choosing the ‘Take this test’ button in the lower right corner of the panel
- In a code application test, the trainee is presented with each excerpt and the codes designated for the test and then expected to apply the appropriate code(s) to each excerpt. They can move back and forth through the test using the ‘Back’ and ‘Next’ buttons until they are finished.
Step 3: Evaluate Results
- Upon completion, the results of the test are presented, including a pooled Cohen’s Kappa coefficient and a Kappa for each code included in the test. You will also find documentation and citations for interpreting these inter-rater reliability results and reporting them in manuscripts and reports.
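One common way to pool Kappa across codes is to average the observed and chance agreement over all codes before computing a single coefficient. The sketch below illustrates that idea with hypothetical expert/trainee data; Dedoose's exact pooling method may differ:

```python
# Pooled Kappa across several codes: per-code observed and chance agreement
# are averaged, then one coefficient is computed from the averages.
# Illustrative only; not necessarily how Dedoose pools its results.

def pooled_kappa(decisions):
    """decisions: {code name: (expert booleans, trainee booleans)}."""
    p_o_sum = p_e_sum = 0.0
    for expert, trainee in decisions.values():
        n = len(expert)
        # Observed agreement for this code
        p_o_sum += sum(e == t for e, t in zip(expert, trainee)) / n
        # Chance agreement from each rater's marginal rates for this code
        p_yes_e, p_yes_t = sum(expert) / n, sum(trainee) / n
        p_e_sum += p_yes_e * p_yes_t + (1 - p_yes_e) * (1 - p_yes_t)
    p_o = p_o_sum / len(decisions)
    p_e = p_e_sum / len(decisions)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical decisions for two codes across 5 excerpts each
decisions = {
    "Emotion": ([True, True, False, False, True], [True, True, False, True, True]),
    "Family":  ([False, True, False, True, False], [False, True, False, True, True]),
}
print(round(pooled_kappa(decisions), 2))
```

Comparing the pooled value against the per-code values helps spot codes whose criteria need refinement: a high pooled Kappa can mask one poorly performing code.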
Note: By clicking the ‘View Code Applications’ button, teams can review, excerpt by excerpt, how the "trainer/PI" and "trainee/coder" each coded the excerpt. This information, and the conversations it inspires, is invaluable in developing and documenting code application criteria and in building and maintaining desirable levels of inter-rater reliability.