Chasing data patterns

Everyone’s crying out for us to publish a set of patterns for openEHR archetypes. And I’ve always pushed back because it’s just not that simple. I’ve spent inordinate amounts of time exploring patterns that have turned out to be dead ends. No one has done this stuff before. Our only foundation has been the brilliant, abstract RM classes, which have underpinned our modelling work and provided a solid base for all our models. Conceived nearly 25 years ago, the classes are more than a decade older than the first-ever, ubiquitous Blood Pressure archetype, and yet with each archetype I build I further appreciate their elegance and the foresight of the original openEHR specs developers, @wolandscat and @samheardoi.

Consider how to model smoking and alcohol data. Back in 2009 we had this model for the concept of ‘Tobacco use summary’ (still in CKM, but now rejected). It was a specialisation of a more generic ‘Substance use summary’, the theory being, quite reasonably, that all substance use might follow a similar pattern.

Specialisation of 'Substance use summary' for all tobacco use

However, managing specialisation across a family of substance use archetypes proved too hard. Pattern-demanders, please take note: the clinical recording requirements have a lot in common, but just enough difference to make a neat parent-child specialisation pattern impossible.

In 2013 we developed a stand-alone EVALUATION for ‘Tobacco use summary’ (also now rejected). We struggled with a tension in the clinical requirements: the need to record ‘typical’ or summary tobacco use alongside the specific data of a smoking diary, which would require an OBSERVATION class of archetype to capture the data as point-in-time or interval events.

First version of the Stand-alone 'Tobacco use summary' archetype

In 2016 we abandoned that archetype and tried again to create the perfect model. This time, after considerable consultation, we narrowed the scope to ‘Tobacco smoking summary’ rather than all 'Use'. The new archetype focused on the health priority of smoke inhalation. The specifics of nicotine ingestion, vaping/e-cigarettes, chewing tobacco and snus were explicitly excluded from scope and will need to be described in separate archetypes. See the recent first draft EVALUATION for ‘Smokeless tobacco summary’ as an example of a related concept modelled as part of the tobacco family of archetypes.

This archetype was eventually revised and refined. In 2016 we identified an addiction specialist who was willing to act as subject matter expert. This accelerated progress, and this version of ‘Tobacco smoking summary’ was finally published in November 2016.

Published 'Tobacco smoking summary' archetype

Trying to be good modellers who identify patterns, we clinician modellers assumed that this pattern could be applied to alcohol consumption as well: it needed to identify episodes, types of alcohol and so on. We thought we just needed to edit some of the words, remove ‘Pack years’ and change some units, and the archetype would be pretty good to go. So we did just that. We created an ‘Alcohol use summary’ that mirrored the published tobacco smoking archetype as much as possible. You can see the similarity for yourself.

'Alcohol use summary' archetype based on the published tobacco pattern

But, despite our best efforts, our wise clinical community has pointed out the flaws in our pattern logic! Alcohol actually needs to be recorded slightly differently. The priority is the number of standard drinks per period, and the type of alcohol is usually incidental, the exact opposite of the way we’ve modelled tobacco smoking. Even though we are only part way through facilitating the first review round for ‘Alcohol use summary’, this is the proposed new pattern.

Latest proposed pattern incorporating major changes reflecting priority of the 'Per episode' over the type of alcohol, as at August 23, 2018
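
For engineer readers, the flip in priority can be sketched in code. This is only an illustration of the idea, not the archetypes themselves: every class and field name below is hypothetical and none of it is taken from CKM or the openEHR RM. In the tobacco-like shape the type is the organising key and is mandatory; in the proposed alcohol-like shape the quantity per period is mandatory and the type is optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TypeDetails:
    """Tobacco-like entry (hypothetical): the type is the organising key."""
    typical_daily_quantity: float
    product_type: str = field(kw_only=False, default=None) if False else None  # placeholder, replaced below


# Redefine cleanly: the type is required, so it has no default value.
@dataclass
class TypeDetails:  # noqa: F811 (intentional redefinition for clarity)
    """Tobacco-like entry (hypothetical): the type is required."""
    product_type: str               # e.g. "Cigarettes" (illustrative value)
    typical_daily_quantity: float   # typical use of this product per day


@dataclass
class TypeFirstSummary:
    """Summary organised per type, as in the tobacco-like pattern."""
    per_type: List[TypeDetails] = field(default_factory=list)


@dataclass
class Episode:
    """Alcohol-like entry (hypothetical): quantity first, type incidental."""
    standard_drinks: float               # number of standard drinks ...
    period: str                          # ... per this period, e.g. "week"
    beverage_type: Optional[str] = None  # often left unrecorded


@dataclass
class QuantityFirstSummary:
    """Summary organised per episode, as in the proposed alcohol pattern."""
    per_episode: List[Episode] = field(default_factory=list)


# A typical alcohol record has no type at all, which the type-first
# shape cannot express without inventing a placeholder type.
alcohol = QuantityFirstSummary(per_episode=[Episode(standard_drinks=14, period="week")])
tobacco = TypeFirstSummary(per_type=[TypeDetails("Cigarettes", 20)])
```

Simply renaming the fields of the type-first shape does not produce the quantity-first one, because what is mandatory and what is optional have swapped places.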

As a result, given that we really do seek patterns where we can, this new insight requires us to go back to the Tobacco smoking archetype and check whether the new pattern might improve that model further.

Clinicians will have a lot more empathy with the messy journey that we’ve been on, because clinical medicine and the recording thereof is inherently messy and follows the ‘rules were meant to be broken’ philosophy rather than following nice, neat patterns.

Dear engineer colleagues, please try to understand that the domain we are modelling is not simple or clear cut. The reality is that every time we think we have identified a common pattern, almost immediately we find a use case that breaks it. We know this isn’t ideal, but it is our reality. That said, we have identified some useful patternish things and we will endeavour to document them better in the future.