Quality indicators & the wisdom of crowds

The Collaboration & Verification step of the archetype development process poses a particular challenge for determining appropriate quality criteria, as it is performed online using the Clinical Knowledge Manager and, in the main, on a volunteer basis. This is effectively a form of crowd-sourcing in a Web 2.0 environment, often described as 'crowd wisdom', and from my research and enquiries there do not appear to be other examples of quality assessments based on crowd-sourced information. I would be very pleased if anyone could identify similar efforts in this or other domains that I have missed. While the benefits of 'the wisdom of crowds' have been recognised since at least 1907, beginning with Francis Galton's account of fair-goers guessing an ox's weight, there are clearly also some valid concerns that need to be addressed. How do we determine that 'the crowd' is appropriate for the collective task and that the outcome is safe and credible?

No doubt some will express concern that we have no specific knowledge of each individual participating in a voluntary and collaborative process - that they have not been appointed as experts for the task. It is true that they may have fewer credentials than those chosen for appointment to a formal committee, but they may contribute critical grassroots knowledge that would not otherwise be available or, conversely, be more qualified than the experts we could gather together in one physical location at one specific time. It is quite possible that the calibre of the CKM review group may actually be greater than that of a group appointed by peers. The effect of individuals who hold an 'outlier', or extreme, opinion will usually be balanced by the effect of the 'crowd'.

Consider also that it is simply not possible to appoint a single group of experts, qualified and accredited in the traditional sense, who are equipped to arbitrate on the fitness of every single clinical model; model quality would be compromised by the sheer breadth and depth of clinical knowledge that we need to capture in archetypes. For a task this large and diverse, one committee to verify every archetype will be woefully inadequate, and specific committees constituted by profession or domain expertise will not fulfil the task either, as many archetypes cross professional and domain boundaries. The ideal scenario is to draw together a committee of experts appropriate for each individual archetype: those who have expressed an explicit interest in nurturing the archetype through to publication and implementation ('Adopters' in CKM); representation from a broad range of relevant professional backgrounds, domain expertise and geographical locations; and still others who can be identified as having specific expertise in the area. Only then can we hope to ensure that all aspects of the archetype are adequately addressed.

Pointers to the current maturity of any model and its readiness for publication through the Collaboration and Verification process might include:

Collaboration & Verification Process: Criteria and Indicators

PROCESS: An appropriate, collaborative peer review process has been followed
  • Number of review rounds completed <CKM derived>
  • Number of completed individual reviews <CKM derived>
An appropriate reviewer community has contributed to the reviews
  • Total number of unique reviewers <CKM derived>
  • Professions (more than one may be selected):
    • Number of Clinicians <CKM derived>
      • Medical <CKM derived>
      • Nursing <CKM derived>
      • etc.
    • Number of Health informaticians/technical experts <CKM derived>
    • Number of Terminologists <CKM derived>
    • Number of Administrators <CKM derived>
    • Number of Consumers <CKM derived>
  • Domain expertise breakdown (more than one may be selected):
    • Cardiology <CKM derived>
    • General Practice <CKM derived>
    • Allied health <CKM derived>
    • etc.
  • Geographical location of reviewers <CKM derived>
EVIDENCE: Appropriate references have been supplied, where relevant
  • <Insert all references> <CKM derived>
FIT FOR PURPOSE: The model fulfils identified requirements (e.g. business, technical and stakeholder requirements)
  • Program X: <Manually assessed and inserted>
  • Standard Y: <Manually assessed and inserted>
  • Specification Z: <Manually assessed and inserted>
  • ...
The state of reviewer consensus is appropriate
  • Latest review round recommendations <CKM derived>:
    • Accept – ready for publication, or minor changes have been identified that do not impact the meaning of the model when corrected (e.g. typos)
    • Minor changes – content-related issues identified (e.g. modification of metadata, element naming or descriptions) that are not structural
    • Major changes – structural issues identified that require significant re-organisation or re-working of the model
    • Reject – the model is fundamentally flawed
    • Abstain
Clinical safety assessment has been completed and is up to date
  • <Manually assessed and inserted>
Clinical safety assessment status is appropriate
  • <Manually assessed and inserted>

Acknowledgements: Ian McNicoll, Sebastian Garde, Hugh Leslie, Mary Kelaher, Stephen Chu.
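
As a very rough illustration of how these pointers might be brought together, the sketch below captures the CKM-derived indicators from the table in a simple structure. It is hypothetical: the field names, the Recommendation categories and the idea of a ReviewIndicators record are mine for illustration, not part of any real CKM data model or API.

```python
# Hypothetical sketch of the indicators in the table above; not a real CKM data model.
from dataclasses import dataclass, field
from enum import Enum


class Recommendation(Enum):
    """Review-round recommendations, as listed in the table above."""
    ACCEPT = "accept"
    MINOR_CHANGES = "minor changes"
    MAJOR_CHANGES = "major changes"
    REJECT = "reject"
    ABSTAIN = "abstain"


@dataclass
class ReviewIndicators:
    """Indicators for one archetype's review history."""
    review_rounds_completed: int = 0
    completed_reviews: int = 0
    unique_reviewers: int = 0
    professions: dict[str, int] = field(default_factory=dict)       # e.g. {"clinician": 6, "terminologist": 1}
    domain_expertise: dict[str, int] = field(default_factory=dict)  # e.g. {"cardiology": 2}
    reviewer_countries: set[str] = field(default_factory=set)
    latest_round_recommendations: list[Recommendation] = field(default_factory=list)
    references_supplied: bool = False
    clinical_safety_assessment_current: bool = False                # manually assessed, then recorded
```

Items such as fit for purpose against particular programs, standards or specifications, and the clinical safety assessments, would still be assessed manually and recorded alongside the counts that CKM can derive automatically.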

From the indicators above it is possible to draw some simple conclusions about the reviewer community: its size, and its balance in terms of domain expertise, professional involvement and geographical distribution. For example, if reviewer numbers are small for a common archetype concept, it may be necessary to suspend the process until this can be rectified; on the other hand, small numbers may be acceptable for an obscure or specialised archetype concept. Few clinicians reviewing an archetype for a clinical concept should also raise alarm bells. In addition, the level of community consensus gives some sense of progress: a majority of 'Minor change' recommendations after 4 review rounds can indicate good progress; all 'Accept' recommendations after 'x' review rounds indicate consensus and agreement, provided the review community is well balanced; and 'Major change' or 'Reject' recommendations after a number of review rounds can indicate an archetype in trouble and requiring investigation.
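
Purely to make those rules of thumb concrete, here is one way they might be expressed against the hypothetical ReviewIndicators structure sketched earlier. The thresholds (a minimum reviewer count, a minimum clinician share, the four-round cut-off) and the profession label are placeholders of my own choosing, not agreed values:

```python
def review_warning_flags(ind: ReviewIndicators,
                         common_concept: bool,
                         min_reviewers: int = 10,         # placeholder threshold, not an agreed value
                         min_clinician_share: float = 0.5) -> list[str]:
    """Return warnings suggesting the review of an archetype may need attention."""
    flags = []

    # Small reviewer numbers for a common concept may mean the process should be
    # suspended until the community is broadened; for an obscure or specialised
    # concept the same numbers may be acceptable.
    if common_concept and ind.unique_reviewers < min_reviewers:
        flags.append("Reviewer community is small for a common archetype concept.")

    # Few clinicians reviewing a clinical concept should raise alarm bells.
    # (The 'clinician' profession label is a placeholder.)
    clinicians = ind.professions.get("clinician", 0)
    if ind.unique_reviewers and clinicians / ind.unique_reviewers < min_clinician_share:
        flags.append("Few clinicians among the reviewers of a clinical concept.")

    # 'Major change' or 'Reject' recommendations after a number of review rounds
    # can indicate an archetype in trouble and requiring investigation.
    if ind.review_rounds_completed >= 4 and any(
            r in (Recommendation.MAJOR_CHANGES, Recommendation.REJECT)
            for r in ind.latest_round_recommendations):
        flags.append("Major change/Reject recommendations persist after several review rounds.")

    return flags
```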

In order to support the crowd-wisdom approach, we need to provide adequate opportunities for those observing from 'outside' to gain insight into the collaboration and verification process, and to make sure that the decision-making process is transparent, via the Clinical Knowledge Manager tool. A single Governance Committee could be appointed to oversee the CKM processes, using the quality criteria and indicators as markers, and to enable formal sign-off of the models prior to publication. This group will ensure that:

  • a quality process has been followed in the development and evolution of the archetype; and
  • the quality indicators, as measured or assessed, confirm that the process has been appropriate for that archetype.
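
Purely as an illustration of the idea, and not a proposed CKM feature, such a sign-off could be thought of as a final gate over the same hypothetical indicators and warning flags. The specific checks below are my own; a real Governance Committee would apply judgement rather than a formula:

```python
def ready_for_publication(ind: ReviewIndicators, flags: list[str]) -> bool:
    """Illustrative publication gate combining the two checks above."""
    # 1. A quality process has been followed: at least one completed review round
    #    and a current clinical safety assessment.
    process_followed = (ind.review_rounds_completed >= 1
                        and ind.clinical_safety_assessment_current)

    # 2. The indicators confirm the process was appropriate for this archetype:
    #    no outstanding warning flags and unanimous 'Accept' in the latest round.
    recs = ind.latest_round_recommendations
    indicators_ok = (not flags and bool(recs)
                     and all(r is Recommendation.ACCEPT for r in recs))

    return process_followed and indicators_ok
```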

Harnessing the 'wisdom of the crowd' has the potential to be more powerful than traditional quality processes for a task of this nature: determining the quality of the clinical content in our electronic health records. We need to learn how best to tap into that wisdom.

It will not be perfect, and the fail-safe is that if something has been missed or done inappropriately, the CKM community of independent individuals has mechanisms to point out the flaws, errors or concerns to the wider community. The Editors and the Governance Committee, always accountable to the community, will be required to respond to and resolve the issues raised to the satisfaction of the CKM community.

'Appropriate' is a word I've used quite a bit in this post. I'm fully aware that it is in defining 'appropriate' for each use case that we meet our next challenge...

This post reflects my current thoughts; no doubt these will be refined further as we gain experience. Will keep you posted :)