Comparing free and paid LLMs for health information inequality

Reading: Readability and Information Quality in Cancer Information From a Free vs Paid Chatbot, Musheyev et al. (2024), JAMA Network Open.

An interesting question: if you can afford the paid version of ChatGPT, do you get better health information than from the free one?

If you prompt the LLM for readability, the answer is no:

These findings suggest that a chatbot can exacerbate health information inequities, but precise prompting is associated with increased readability without compromising information quality.

What they did:

  1. Considered five types of cancer: lung, breast, prostate, colorectal, skin.
  2. Gathered the top five Google Trends queries for each cancer.
  3. Fed these into ChatGPT and recorded the responses.

This is nice: they’ve taken popular questions people ask of “Dr Google” and seen what an LLM gives back.

They tried each question twice: once on its own, and once prefixed with “Explain the following at a sixth grade reading level”. That gives a total of 100 questions and responses (5 cancers x 5 queries x 2 LLMs x 2 prompts = 100).
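To make the arithmetic concrete, here is a minimal sketch that enumerates the study design as a grid of conditions. The variable names, the "free"/"paid" labels, and the use of rank numbers in place of the actual Google Trends queries are my own placeholders, not anything from the paper.

```python
from itertools import product

cancers = ["lung", "breast", "prostate", "colorectal", "skin"]
query_ranks = range(1, 6)  # placeholder for the top five Google Trends queries
models = ["free", "paid"]  # hypothetical labels for the two ChatGPT versions
prompts = [
    "",  # the question on its own
    "Explain the following at a sixth grade reading level: ",  # separator is an assumption
]

# One condition per (cancer, query, model, prompt) combination.
conditions = list(product(cancers, query_ranks, models, prompts))
print(len(conditions))
```

Each tuple in `conditions` corresponds to one question sent to one chatbot, so the length of the list is the number of responses the authors had to score.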

The paper’s authors scored the quality of the information using a questionnaire called DISCERN. It asks things like “Is this information relevant?” and each item is scored from 1 (no) to 5 (yes). They also measured the reading level of the text.
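For a feel of what "reading level" means in practice, here is a rough sketch of the Flesch-Kincaid grade level formula. This is an illustration only: I am assuming this formula as an example of a readability measure, the syllable counter is a crude heuristic of my own, and the two sample sentences are made up rather than taken from the study.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels, with a minimum of one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' usually does not add a syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate US school grade needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * (len(words) / len(sentences))
        + 11.8 * (syllables / len(words))
        - 15.59
    )


# Made-up examples, not responses from the study.
plain = "Skin cancer is when skin cells grow too fast. A doctor can check your skin."
dense = (
    "Cutaneous malignancies originate from uncontrolled proliferation "
    "of epidermal keratinocytes or melanocytes."
)

print(flesch_kincaid_grade(plain))
print(flesch_kincaid_grade(dense))
```

Short sentences with short words pull the grade down, which is what the "sixth grade reading level" prefix is asking the chatbot to do.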

They found:

  • prompting for readability improves readability, for both paid and free LLMs.
  • the quality of information is about the same for paid and free.
Figure from the paper: “(C) DISCERN scores, scored from 1 (low quality) to 5 (high quality). The lines within boxes indicate medians, the top and bottom of the boxes indicate IQRs, the whiskers indicate 95% CIs, and the circles indicate outliers.”

This was at the start of 2024, so they were comparing ChatGPT 3.5 to 4.0.