Results of a Novel and Comprehensive Training Program for Standardization of 2-Dimensional Ultrasound Interpretation as a Precursor to a Multicenter Trial
We report the process employed to design a curriculum and to set minimum standards for experts from 3 disciplines at 8 clinical sites in obtaining and interpreting standardized endoanal ultrasound (EUS) for the NIH-sponsored “Behavioral Therapy versus Usual Care in Primiparous Women with Anal Sphincter Tears and Fecal Incontinence” (BOOST) Trial. The BOOST Trial was originally designed as a multicenter, randomized trial of behavioral therapy for fecal incontinence (FI) in primiparous women who sustained an obstetrical anal sphincter injury. At 6 weeks postpartum, participants who had FI were randomized to behavioral therapy or usual care. After the trial began, the rates of FI were lower than predicted, and it was not feasible to complete the trial in a reasonable timeframe [3].
Endoanal Ultrasound Equipment and Settings
A representative image from the midanal canal (MAC) was obtained from each site as a Joint Photographic Experts Group (JPEG) file that identified: 1) site number, 2) megahertz (MHz) settings, 3) age and parity status of the subject, if known, and 4) symptomatology. All images were stripped of other identifying data, including site name, subject name, and medical record number. These images were used to compare the existing best-practice images from each site and to identify outliers. The protocol committee reviewed each of the images in detail. Following review of the MAC images, other images from each site were requested for inclusion in the testing/training program at UNC. Images requested included: 1) one normal examination at the high, mid, and low anal canal levels; 2) at least 1 and up to 4 abnormal examinations that included at least 1 abnormal finding at the MAC level; and 3) one examination thought to be uninterpretable (over- or underexposed) or with an undetermined sphincter defect (scar, fragmentation).
Sonographic Definitions of Anal Canal Levels, Anal Sphincters, and Anal Sphincter Defects
The sonographic anal canal levels, anal sphincters, and sphincter defects were defined as previously described by Corton [4] and are summarized below.
Anal Canal Levels
The high anal canal (HAC) was defined as the region from the lowest level of the puborectalis muscle “slings” to the level where the external anal sphincter (EAS) formed a complete ring anteriorly (see Figure 1).
The puborectalis slings were defined as the right and left portions of the puborectalis muscle that extended anteriorly toward the inner surface of the pubic bones. This level was called HAC 1 (Figure 1). Still in the HAC, but 1–5 mm distal to the lowest level of the puborectalis, the hyperechoic external anal sphincter may or may not have formed a complete ring anteriorly around the anal canal. Thus, absence of continuity of the EAS anteriorly at this level was not considered a defect. This level was called HAC 2 (Figure 1).
The mid anal canal (MAC) was defined as the region where the EAS muscle formed a complete ring anteriorly around the hypoechoic internal anal sphincter (IAS) (Figure 2). It extended inferiorly to the most distal end of the IAS.
The low anal canal (LAC) was defined as the region below the end of the IAS muscle where only the EAS was identified (Figure 3).
Figure 2. Cross-sectional image at the mid anal canal level in a patient with normal sphincters. a = anal submucosa; b = internal anal sphincter; c = external anal sphincter.
Figure 3. Cross-sectional image at the low anal canal level illustrating complete continuity of the external anal sphincter (c). Note that the internal anal sphincter is no longer visualized at this level.
Anal Sphincter Defects
Figure 4. Representative image at the mid anal canal level in a patient with internal (b) and external anal sphincter (c) defects. Arrows indicate lateral borders of internal anal sphincter defect; arrowheads indicate lateral borders of external anal sphincter defect.
The number of ultrasound images to be reviewed was set so that each rater’s qualifications could be assessed and participants could be certified during the training session. With 21 images and 15 readers per ultrasound image, there was at least 80% power to detect that an intraclass correlation coefficient (ICC) of 0.70 (the alternative hypothesis) was significantly different from 0.50 (the null hypothesis), with a one-sided Type I error of 0.05 [5]. The BOOST Ultrasound Committee defined the qualification criteria as part of the training curriculum: a reader was considered qualified if s/he was at least 75% concordant with the expert radiologist’s assessment (the gold standard) for categorizing a defect in the internal anal sphincter (IAS) and at least 50% concordant with the expert for interpreting defects in the external anal sphincter (EAS).
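The power statement above can be checked approximately. The following is a minimal sketch, assuming the large-sample approximation for ICC reliability studies that is commonly attributed to Walter, Eliasziw, and Donner. The text does not say which method the cited reference describes, so the choice of formula and the function names `icc_required_images` and `icc_power` are illustrative assumptions, not the authors' calculation.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def _log_c0(n_readers, rho0, rho1):
    """Log of C0 = (1 + n*theta0) / (1 + n*theta1), with theta = rho / (1 - rho)."""
    theta0 = rho0 / (1.0 - rho0)
    theta1 = rho1 / (1.0 - rho1)
    return log((1.0 + n_readers * theta0) / (1.0 + n_readers * theta1))


def icc_required_images(n_readers, rho0, rho1, alpha=0.05, power=0.80):
    """Approximate number of images needed (ASSUMED approximation, one-sided test)."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha) + _STD_NORMAL.inv_cdf(power)
    denom = _log_c0(n_readers, rho0, rho1) ** 2 * (n_readers - 1)
    return 1.0 + 2.0 * z ** 2 * n_readers / denom


def icc_power(n_images, n_readers, rho0, rho1, alpha=0.05):
    """Approximate power for a given number of images, inverting the same formula."""
    lc0 = _log_c0(n_readers, rho0, rho1)
    z_total = sqrt((n_images - 1) * lc0 ** 2 * (n_readers - 1) / (2.0 * n_readers))
    return _STD_NORMAL.cdf(z_total - _STD_NORMAL.inv_cdf(1.0 - alpha))


if __name__ == "__main__":
    # Design values taken from the text: 15 readers, ICC 0.50 (null) vs 0.70 (alternative).
    print(round(icc_required_images(15, 0.50, 0.70), 1))
    print(round(icc_power(21, 15, 0.50, 0.70), 3))
```

Under this approximation, 21 images read by 15 readers give a power close to 0.80 for the stated hypotheses, which is in line with the design described above.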
Data on study interpretability (yes/no), anal canal level (distal, mid, proximal), and presence or absence of an EAS and/or IAS defect were summarized. ICCs among all readers, excluding the expert radiologist, and corresponding 95% confidence intervals (95% CI) were calculated separately for IAS and EAS to assess inter-reader concordance [6]. However, the value of the ICC depends on the coding of the responses (calculations used 1 = no, 2 = yes, and 0 = not applicable). Because there is no natural ordering for yes, no, and not applicable, kappa estimates of agreement among multiple raters for nominal data [x] were also calculated as confirmatory measures; these treat the EUS responses as discrete values (presence or absence of a sphincter defect, or not applicable if image quality was poor). We used the method described by Fleiss (1976) for nominal or ordinal scale data [7]. This method calculates the correct standard error and accommodates more than two rating categories. The overall kappa is a weighted average of the category-specific kappas. Values of <0.40 were considered poor to slight agreement, 0.41–0.60 fair to moderate, 0.61–0.80 good, and 0.81–1.00 very good agreement [8].
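To make the multi-rater kappa concrete, the sketch below computes the standard Fleiss kappa point estimate from a table of category counts and maps it to the agreement bands listed above. It is illustrative only: the example counts are hypothetical, the function names are assumptions, the standard error and confidence interval are not computed, and treating the band boundaries as inclusive upper limits (for example, 0.40 counted as "poor to slight") is an assumption.

```python
def fleiss_kappa(counts):
    """Fleiss kappa point estimate.

    counts[i][j] is the number of readers who assigned image i to category j.
    Every image must be rated by the same number of readers.
    """
    n_images = len(counts)
    n_readers = sum(counts[0])
    if any(sum(row) != n_readers for row in counts):
        raise ValueError("every image must be rated by the same number of readers")
    n_categories = len(counts[0])
    total = n_images * n_readers

    # Overall proportion of ratings falling in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement among reader pairs for each image.
    p_img = [
        (sum(c * c for c in row) - n_readers) / (n_readers * (n_readers - 1))
        for row in counts
    ]
    p_bar = sum(p_img) / n_images       # mean observed agreement
    p_exp = sum(p * p for p in p_cat)   # agreement expected by chance
    return (p_bar - p_exp) / (1.0 - p_exp)


def agreement_label(kappa):
    """Map kappa to the bands used in the text (boundary handling is ASSUMED)."""
    if kappa <= 0.40:
        return "poor to slight"
    if kappa <= 0.60:
        return "fair to moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


if __name__ == "__main__":
    # HYPOTHETICAL data: 4 images, 5 readers, categories = (no defect, defect, not applicable).
    example = [[5, 0, 0], [1, 4, 0], [0, 5, 0], [2, 2, 1]]
    k = fleiss_kappa(example)
    print(round(k, 3), agreement_label(k))
```

Because the categories are handled as unordered labels, the result does not depend on how "yes", "no", and "not applicable" are numerically coded, which is the property that motivated using kappa alongside the ICC.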
Fourteen of the 15 identified readers attended the one-day training session. Site experts represented the following specialties: gynecology (87%), urology (7%), and gastroenterology (7%). Agreement (concordance) among the readers was calculated using all 21 images. Thirteen of the 14 readers met the predefined passing level for IAS and EAS at the training session (test set #1), while one reader passed by completing a second test (test set #2). The one reader who did not attend the training session was trained by protocol leaders using test set #1 and then successfully completed only test set #2 for certification. Because this reader did not complete test set #1 for qualification, this reader’s results are not included in the concordance analysis.
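The passing rule applied here (at least 75% concordance with the expert for IAS and at least 50% for EAS) can be sketched as follows. This is a hedged illustration: it assumes concordance means the simple proportion of images on which the reader's category matches the expert's, which the text does not spell out, and the function names and example readings are hypothetical.

```python
def concordance(reader, expert):
    """Proportion of images where the reader's category equals the expert's (ASSUMED definition)."""
    if not expert or len(reader) != len(expert):
        raise ValueError("reader and expert must rate the same, non-empty set of images")
    matches = sum(r == e for r, e in zip(reader, expert))
    return matches / len(expert)


def is_qualified(ias_reader, ias_expert, eas_reader, eas_expert,
                 ias_min=0.75, eas_min=0.50):
    """Apply the thresholds stated in the text: IAS >= 75% and EAS >= 50% concordant."""
    return (concordance(ias_reader, ias_expert) >= ias_min
            and concordance(eas_reader, eas_expert) >= eas_min)


if __name__ == "__main__":
    # HYPOTHETICAL readings for 4 images; "na" = not applicable (poor image quality).
    expert_ias = ["yes", "no", "no", "na"]
    reader_ias = ["yes", "no", "yes", "na"]
    expert_eas = ["yes", "yes", "no", "na"]
    reader_eas = ["no", "yes", "no", "yes"]
    print(concordance(reader_ias, expert_ias), concordance(reader_eas, expert_eas))
    print(is_qualified(reader_ias, expert_ias, reader_eas, expert_eas))
```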
Concordance among the readers (for presence of sphincter defect) was ICC = 0.54 (95% CI: 0.41, 0.69) for IAS and 0.50 (95% CI: 0.37, 0.66) for EAS. Agreement using kappa was 0.58 (95% CI: 0.55, 0.61) for IAS and 0.55 (95% CI: 0.52, 0.58) for EAS.
Table 1 summarizes the degree of concordance between each reader and the expert radiologist (JF) for IAS, EAS, interpretability, and anal canal level on the original qualification test. The biostatistician also reviewed, in a blinded fashion, reader responses that were discrepant from the expert’s for IAS, EAS, interpretability, and anal canal level to determine whether specific images were most problematic for readers. Most participants had difficulty interpreting defects in the high and low anal canal and recognizing when an image was not interpretable. This most often occurred when determining whether the image was at the correct location to interpret the high anal canal.
This study demonstrated that a single-day, centralized training course focused on interpreting endoanal ultrasound images, delivered to investigators experienced in performing endoanal ultrasound, can result in moderate agreement between pelvic floor physicians and an expert radiologist. It should be no surprise that raters trained to interpret images using the same standards as an ‘expert’ will read more like the ‘expert’. However, our study suggests that, when preparing for a research trial, study planners who use important contextual elements and outcomes of the clinical trial as the frame of reference for the training program can also achieve moderate agreement among study investigators [9]. The use of the classification system of sphincteric injury allowed for a systematic pattern of ultrasound image interpretation and likely resulted in better agreement among readers. Another important finding is that this level of agreement can be achieved with a sizable and diverse multi-specialty group of practitioners. This is an important characteristic for ensuring that the findings of studies that rely on interpretability of imaging or standardization of procedures are generalizable to practitioners at trial completion.
Prior studies have investigated intraobserver and interobserver agreement in endoanal sonography. Gold et al. studied 51 consecutive patients, including 43 women, who were referred for possible sphincter abnormalities [10]. Images were reviewed by two experienced sonographers, each unaware of the other’s findings. Although both observers agreed in 27 patients with intact sphincters, this study was limited by the use of only two sonographers and by the fact that the majority of images were from patients with normal or intact sphincters (35 out of 51). Fowler et al. developed and validated a pictorial chart to document defects from endoanal ultrasound examination by having two independent assessors review 296 endoanal ultrasound scans in patients recruited for a longitudinal cohort assessing occult anal sphincter injury after vaginal delivery [11]. There was strong agreement between reviewers (kappa 0.99) in categorizing normal vs. abnormal in 60 out of 296 scans, but when these images were compared with an “expert” reader, the agreement was highly variable among readers with different levels of experience in endoanal ultrasonography.
A strength of this study is the use of multiple readers from clinically diverse backgrounds. Most studies investigating interobserver agreement in interpreting endoanal ultrasound have had a limited number of readers. In previous studies that have investigated multiple readers, experience with endoanal ultrasound was highly variable. This study used 15 readers who specialize in the evaluation of pelvic floor disorders.
The study had several limitations. The logistics of the course required that content be taught in a group; all readers were assessed immediately following training at the end of the course, when recall was likely best. We do not know whether readers would retain this level of agreement if the assessment were performed months or even years after the training session. In the setting of a large research trial, it would be important to reassess the individuals reading images at intervals during the study to determine whether a refresher course is needed to maintain agreement for the entire duration of the study. Additionally, readers did not perform the endoanal ultrasound exams; they only read the images as part of the training. Therefore, this study does not provide insight into the actual psychomotor performance of the endoanal ultrasound technique on patients and can only comment on agreement in image interpretability. Finally, the study did not assess intraobserver agreement. Future studies should investigate the reliability of endoanal ultrasound performance in combination with image interpretation, as this is what is commonly done in practice. Reliability may improve if the person interpreting the images actually performed the procedure. Unfortunately, the BOOST trial was discontinued due to issues with study enrollment, so the longer-term impact of the training session could not be assessed.
Additionally, future studies should determine whether these findings correlate with clinically important outcomes such as fecal incontinence.
In conclusion, multicenter certification of the competence of 15 experienced individuals in interpreting EUS images was performed in a carefully planned single-day event. Agreement for diagnosis of sphincter defects using endoanal ultrasound images was moderate. This strategy achieved an acceptable level of EUS image characterization compared with a gold standard radiology expert.
Clinical Trials Registry: NCT01166399
Précis: Centralized training for 2-dimensional endoanal ultrasound with experienced investigators resulted in moderate agreement of anal sphincter diagnoses with an expert radiologist.
11. Fowler GE, Adams EJ, Bolderson J, Hosker G, Lowe D, et al. Liverpool Ultrasound Pictorial Chart: the development of a new method of documenting anal sphincter injury diagnosed by endoanal ultrasound. BJOG. 2008;115(6):767-772.