State of the clinical modelling program and international CKM

I sent this email to the openEHR lists yesterday. Published here for broader sharing, and perhaps as a resource for future benchmarking…

22 July 2019

Dear colleagues,

We recently passed the eleventh anniversary of the first upload to the international CKM – the body temperature archetype. As Europe readies itself for summer holidays and the clinical review season slows down, it is a good time to review the progress of the openEHR clinical modelling program.

 Roughly 6 weeks ago I created and downloaded a number of reports from CKM. I’ve spent some time analysing the data and thought I’d share what I learned with you.

This exploration was triggered by a tweet from Ewan Davis last December asking:

“How many person hours do you think has gone in to creating the openEHR archetypes available via CKM - I think it must be in excess of 100,000 hours (40 person years)”

 It took a while to gather the data and propose reasonable assumptions so that we could make time and effort estimates, but here goes…

CKM stats

(As of July 5 2019):

  1. Community

    • Registered users – 2239

    • Countries represented – 95

  2. Archetype library

    • Total archetypes – 785

    • Active archetypes

      • Published – 93

      • Published as v1, needing reassessment – 6

      • In review – 31, with at least 7 about to be published

      • Draft – 351

      • Initial (in incubators) – 110

    • Proposed archetypes – 10

Behind the scenes

(from CKM reports, May 2019)

  1. Number of archetypes which have completed or are undergoing a review process – 130

  2. Number of review rounds completed – 295

  3. Number of archetype reviews completed by all reviewers – 2995

  4. Number of unique reviewers – 272

  5. Reviews completed per review round – 10.15

  6. Average number of reviews per archetype – 23.04

  7. Average number of reviews per reviewer – 11.01

  8. On average, during the past 12 months, approximately 100 unique reviewers logged into CKM on 900 occasions per month.
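The three averages in the list above follow directly from the raw counts. As a minimal sketch for anyone who wants to re-run the arithmetic (the variable and function names are my own, chosen for illustration, and are not CKM report fields):

```python
# Raw counts quoted in the list above (CKM reports, May 2019).
archetypes_reviewed = 130   # archetypes that completed or are undergoing review
review_rounds = 295         # review rounds completed
reviews_completed = 2995    # individual reviews by all reviewers
unique_reviewers = 272


def average(total: int, count: int) -> float:
    """Return total / count, rounded to two decimal places."""
    return round(total / count, 2)


reviews_per_round = average(reviews_completed, review_rounds)
reviews_per_archetype = average(reviews_completed, archetypes_reviewed)
reviews_per_reviewer = average(reviews_completed, unique_reviewers)

print(f"Reviews per review round: {reviews_per_round}")
print(f"Reviews per archetype:    {reviews_per_archetype}")
print(f"Reviews per reviewer:     {reviews_per_reviewer}")
```

The results match the 10.15, 23.04 and 11.01 quoted in items 5 to 7.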

Time estimates

This is where things become interesting…


My estimate comes to a total of 16289 hours, which equates to roughly 8.5 person years.

Obviously, I have made some assumptions about the average time for many activities, and if we factor in incidental conversations, pondering modelling conundrums, or cross-pollination between CKMs, we could reasonably increase the estimate to 10 person years. However, try as I might, there was no way I could justify bumping the figures up to reach estimates of 20, much less 40, person years.

These numbers reflect the work for archetypes that are owned and managed in the international CKM. This includes an estimation of work done by the reviewers and editors from the Apperta and Norwegian CKMs if their archetypes are now residing in the international CKM, or multiple CKMs. It does not reflect the work done on reviews from the now retired Australian CKM, although estimates of design time have been part of the assumptions.

I interpret Ewan’s estimate to reflect his impression that the effort to achieve what we have done so far was huge. I too believed that the effort was epic, but in my head it was still only in the ballpark of about half of his initial estimate. That the actual effort appears to be only 8-10 person years totally surprised me. Initially my figures were considerably lower; I went back to them and tried to massage them upward, because this is obviously a rather inexact science, more like an educated guesstimate, but this is as far as I feel comfortable going.

In addition, Thomas Beale estimates that on average there are 14 clinically significant data elements per archetype, according to the ADL Workbench. These are the relevant data points that we design, review and maintain. So 785 archetypes x 14 data points/archetype suggests that we have a library of approximately 10,990 data points, none of which are duplicates or overlapping in the governed archetypes. And if we agree with my estimate of a total of 16289 hours, the amount of time per data element is 16289/10990, or only 1.48 hours each, covering design, review, maintenance and governance.
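The per-element figure can be reproduced in a few lines. This is only a sketch of the arithmetic in the paragraph above; the variable names are illustrative, and the inputs are the estimates quoted there rather than anything read from CKM directly:

```python
# Inputs quoted in the paragraph above.
total_archetypes = 785        # archetypes in the international CKM
elements_per_archetype = 14   # Thomas Beale's average, per the ADL Workbench
total_hours = 16289           # estimated total effort in hours

# Approximate size of the library at data element level.
data_points = total_archetypes * elements_per_archetype

# Estimated effort per data element, covering design, review,
# maintenance and governance.
hours_per_data_point = total_hours / data_points

print(f"Data points:          {data_points}")
print(f"Hours per data point: {hours_per_data_point:.2f}")
```

Because both inputs are averages and estimates, the 1.48 hours per element is best read as an order-of-magnitude figure rather than a measurement.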

 What conclusions can we draw?

  • Firstly, modelling ‘openEHR style’ seems to be quite efficient, surprising even those of us who are involved daily, and this unique collaborative and crowdsourced approach to standardisation of clinical data is working well. On top of that, if you remember that more than 95% of the editorial work and reviewers’ time has been volunteered, then this truly has been an extraordinary community endeavour.

  • Secondly, the ratio of reviewer time to design time is noteworthy – 1498 hours of review, compared to 10437 hours of design. In effect, we have successfully minimised reviewer effort by making each 30-minute review count as much as possible, and that has been achieved by attention to detail and by spending time investigating and developing strong design patterns before we send archetypes out for review. Over the years we have made some bad design choices and had to rethink our approach. Gradually we have been developing some good patterns and, before you ask where we have documented them, I will point you to the published archetypes – each of them functions as a potential pattern for the next archetype we intend to develop – and we reference and reuse the patterns as much as possible. In this way our library is growing, and our modelling is improving. As an example, a current area of serious rework is the Physical examination archetypes, which are being ‘renovated’ at present. It does make me think that every hour spent in design is a good investment of time and effort. That may not have been apparent in the early days, but I think we are finding that it is paying off for the archetypes we are designing years later, based on the (good and bad) learnings from those earliest archetype designs.

  • Thirdly, we have some insights into the modelling community, and for the first time we have some idea about the level of activity by those with various roles and activities. We also have an estimate of the size of the data library at data element level, so that we are able to compare to other similar modelling efforts elsewhere in the world.
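The review-versus-design comparison in the second point can be checked with a short calculation. This is a sketch under one stated assumption: that the review hours come from the 2995 completed reviews at 30 minutes each, which is how I read the text. The variable names are illustrative only.

```python
# Figures quoted in the post.
reviews_completed = 2995    # individual reviews by all reviewers
hours_per_review = 0.5      # assumption: each review takes 30 minutes
design_hours = 10437        # estimated design effort in hours
total_hours = 16289         # estimated total effort in hours

# 2995 reviews at half an hour each; the post quotes this as 1498 hours.
review_hours = reviews_completed * hours_per_review

# How many hours of design sit behind each hour of review.
design_to_review_ratio = design_hours / review_hours

# Whatever is left over covers the remaining activities,
# which are not itemised in the text.
other_hours = total_hours - design_hours - review_hours

print(f"Review hours:           {review_hours}")
print(f"Design : review ratio:  {design_to_review_ratio:.1f} : 1")
print(f"Share spent on design:  {design_hours / total_hours:.0%}")
print(f"Share spent on review:  {review_hours / total_hours:.0%}")
print(f"Other hours:            {other_hours}")
```

On these numbers there are roughly seven hours of design behind every hour of review, which is the point being made about front-loading effort into design patterns.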

I would particularly like to thank my co-lead, Silje Ljosland Bakke, and Ian McNicoll for their dedicated efforts, and of course all of the other Editors, Reviewers and Translators who have so generously volunteered their time and expertise to create a strong, free and public foundation for digital health data standards.

We should all be very proud of this work. This will be our legacy, one that will live on well after we’ve all retired.

Kind regards

Heather Leslie