
How to Read and Actually Use a Test Item Analysis Report

  • Writer: Jacklyn DelPrete
  • Feb 8
  • 4 min read

If you’ve ever opened a test item analysis report and thought, I have no idea where to start with this, then welcome. You’re in very good company.


Item analysis reports are packed with numbers, short on explanation, and often dropped into your inbox with zero guidance. But buried in that spreadsheet is information that can make your exams fairer, clearer, and easier to defend—once you know what to look for.


Below are the five item analysis statistics that matter most, what they mean, and how to use them without spiraling.



1. Item Difficulty (How many students got this question right?)


Item difficulty is reported as a proportion or percentage (usually between 0.00 and 1.00). Despite the name, it does not describe how hard the question is; it only describes how students performed, and a higher value means an easier item.

  • 0.90 = 90% of students answered correctly

  • 0.40 = 40% answered correctly


Why this matters: Item difficulty helps you determine whether a question functioned as intended. Every exam should have a range of difficulty—easy recall, moderate application, and harder synthesis questions.


Example: A question on basic infection control principles has a difficulty of 0.38.

That’s concerning because:

  • This content is foundational

  • It’s emphasized heavily in lecture

  • Students should demonstrate mastery


But what if a complex prioritization question has a difficulty of 0.38? That may be completely appropriate.


How to use it:

  • Compare difficulty to importance and timing of content

  • Look for items that are unexpectedly low or high

  • Don’t revise questions based on difficulty alone—pair it with discrimination
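To make the arithmetic concrete, here is a minimal Python sketch of how difficulty is computed; the response vector is hypothetical, not taken from a real exam.

```python
def item_difficulty(scores):
    """Proportion of students who answered the item correctly (0.0 to 1.0)."""
    return sum(scores) / len(scores)

# Hypothetical 0/1 scores for one question across 10 students.
responses = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(item_difficulty(responses))  # 0.8 -> 80% answered correctly
```

A value near 0.8 on a foundational item is reassuring; the same value on a hard synthesis question might mean it was too easy.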



2. Discrimination Index (Did strong students perform better than weaker students?)


Discrimination tells you whether a question can differentiate between students who understand the material and those who don’t. This is one of the most important indicators of question quality.


High discrimination means:

  • High-performing students answered correctly

  • Lower-performing students were more likely to miss it

Low discrimination means the item barely separated the two groups; negative discrimination means the opposite pattern, with lower-performing students more likely than your top students to answer correctly.


Example: A question has:

  • Difficulty: 0.65

  • Discrimination: –0.12


This means students who did poorly overall were more likely to answer this question correctly than your top students.


That’s a sign of:

  • Ambiguous wording

  • A misleading stem

  • A “trick” question

  • Or a correct answer that isn’t clearly correct


How to use it:

  • Flag items with low or negative discrimination first

  • Review stem clarity and answer defensibility

  • Ask, “Is this question testing what I meant to test?”


This stat often identifies flawed items even when difficulty looks fine.
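One common way this index is calculated is the upper-lower method: compare the top-scoring and bottom-scoring groups (often 27% each) on the item. A minimal Python sketch with made-up scores:

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination: proportion correct in the top group
    minus proportion correct in the bottom group (range -1.0 to 1.0)."""
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank students by total exam score, then take bottom and top groups.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_scores[i] for i in high) / k
    p_low = sum(item_scores[i] for i in low) / k
    return p_high - p_low

# Hypothetical data: 0/1 item scores and overall exam totals for 10 students.
item = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
totals = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45]
print(round(discrimination_index(item, totals), 2))  # 0.67 -> item separates the groups well
```

A value near zero or below zero on this sketch would flag the same problems described above.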



3. Point-Biserial Correlation (Does this question align with overall exam performance?)


The point-biserial measures how performance on one item relates to performance on the entire exam. Think of it as a consistency check.


A positive point-biserial means:

  • Students who did well overall tended to get this item right


A negative value suggests:

  • High-performing students missed this question

  • Lower-performing students got it right

That’s a major warning sign.


Example: A pharmacology question has acceptable difficulty and a negative point-biserial.


This often means:

  • More than one answer seems correct

  • The “best” answer isn’t clearly the best

  • The question rewards test-taking strategy instead of knowledge


How to use it:

  • Treat negative point-biserial items as high priority for review

  • Look for subtle wording issues or outdated content

  • Consider whether the question aligns with course objectives
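For reference, the point-biserial is just the correlation between a 0/1 item and total exam scores, and it can be computed with Python's standard-library `statistics` module; the numbers below are invented for illustration.

```python
from statistics import mean, pstdev

def point_biserial(item_scores, total_scores):
    """Correlation between a dichotomous (0/1) item and total exam scores."""
    m1 = mean(t for i, t in zip(item_scores, total_scores) if i == 1)  # mean total of those who got it right
    m0 = mean(t for i, t in zip(item_scores, total_scores) if i == 0)  # mean total of those who got it wrong
    p = mean(item_scores)        # item difficulty
    s = pstdev(total_scores)     # spread of total scores
    return (m1 - m0) / s * (p * (1 - p)) ** 0.5

# Hypothetical: the strongest students got the item right -> positive value.
item = [1, 1, 1, 0, 0]
totals = [90, 80, 70, 60, 50]
print(round(point_biserial(item, totals), 2))  # 0.87
```

Flip the item scores so the weakest students get it right and the value goes negative, which is exactly the warning sign described above.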



4. Distractor Analysis (Are the wrong answers actually working?)


Distractor analysis shows how often each incorrect answer was selected. This tells you whether your distractors are plausible and meaningful.


Good distractors:

  • Attract students with partial understanding

  • Reflect common misconceptions

  • Are selected by some students


Bad distractors:

  • Are rarely or never chosen

  • Are obviously wrong

  • Inflate item difficulty without improving discrimination


Example: A four-option question:

  • Correct answer: 70%

  • Distractor A: 25%

  • Distractor B: 3%

  • Distractor C: 2%


Distractors B and C aren’t doing any work.


How to use it:

  • Revise or replace distractors chosen by <5% of students

  • Use real student errors from assignments or exams to construct distractors

  • Strong distractors improve discrimination without increasing difficulty


Better distractors = better questions.
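If your software only gives raw response counts, the percentages and the under-5% flag are easy to compute yourself. Here is a Python sketch using the hypothetical 100-student split from the example above, with option A as the keyed answer.

```python
from collections import Counter

def distractor_percentages(choices):
    """Percentage of students selecting each answer option."""
    counts = Counter(choices)
    n = len(choices)
    return {option: 100 * count / n for option, count in sorted(counts.items())}

# Hypothetical responses: A is correct; B, C, D are distractors.
choices = ["A"] * 70 + ["B"] * 25 + ["C"] * 3 + ["D"] * 2
pct = distractor_percentages(choices)
weak = [opt for opt, p in pct.items() if opt != "A" and p < 5]
print(pct)   # {'A': 70.0, 'B': 25.0, 'C': 3.0, 'D': 2.0}
print(weak)  # ['C', 'D'] -> candidates for revision
```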



5. Reliability (KR-20 or Cronbach’s Alpha) (How consistent is the exam as a whole?)


Reliability measures whether your exam consistently assesses student knowledge across items. This is an exam-level statistic—not a judgment of individual questions or your competence as a faculty member.


Higher values indicate greater consistency, but context matters.


Example: An exam reliability of 0.70 may be perfectly acceptable for:

  • Short exams

  • New courses

  • Early semesters

  • Exams with diverse content areas

Reliability improves over time as poor-performing items are revised or removed.


How to use it:

  • Track reliability across semesters, not single exams

  • Use it to justify gradual test improvement

  • Pair it with item-level data to guide revisions
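KR-20 itself is a short formula over the 0/1 score matrix (one row per student, one column per item). A minimal Python sketch with a tiny hypothetical matrix:

```python
def kr20(score_matrix):
    """KR-20 reliability for dichotomous (0/1) items.
    score_matrix: one row per student, one column per item."""
    n_students = len(score_matrix)
    n_items = len(score_matrix[0])
    # Per-item difficulty p and item variance p * (1 - p).
    p = [sum(row[j] for row in score_matrix) / n_students for j in range(n_items)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Population variance of total scores.
    totals = [sum(row) for row in score_matrix]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

# Hypothetical 4 students x 3 items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(scores))  # 0.75
```

Real exams have far more students and items; the point of the sketch is that the statistic rises as items pull in the same direction as the total score.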



How Faculty Should Actually Use Item Analysis


You are not expected to fix every imperfect item immediately.


A realistic, defensible approach:

  • Identify 3–5 items to revise per exam

  • Prioritize low discrimination and negative point-biserial

  • Document your review and revisions

  • Improve exams incrementally over time


That’s how strong, defensible assessments are built.


Check out this FREE 1-page guide on test analysis!


© 2026 by The Elevated NP LLC.
