From direct observation to artificial intelligence

Enhancing surgical skill assessment

Surgical skill is a comprehensive and dynamic concept that includes a surgeon's ability to execute exact maneuvers, manage difficulties, and make informed decisions in the operating room. In this context, we distinguish between technical skills, such as manual dexterity and tissue handling, and non-technical skills, such as situational awareness, decision-making, communication, teamwork, and leadership.1 Both technical and non-technical skills are considered essential for achieving the best patient outcomes in the dynamic field of surgical practice, and both are refined by experience and constant learning.

Educators must weigh all of these factors when designing surgical assessments and consider how to measure and improve surgical skills effectively. Dexterity, the skill with which a surgeon manipulates instruments, is an important factor to assess, and educators need examinations that capture the complexities of manual proficiency. Cognitive aptitude, which includes problem-solving ability and decision-making under pressure, demands examinations that reflect the complexities of real-world surgical circumstances. Situational awareness, a critical component, requires evaluating a surgeon's capacity to respond to dynamic and unpredictable events in the operating room. Surgical educators must design assessments that reflect the cumulative influence of experience, recognizing that true excellence is the result of ongoing learning and refinement over time. Beyond mastery of the technical components, surgical skill includes good communication within the healthcare team, adaptability in difficult situations, and a sophisticated grasp of patient care. It is a synthesis of art and science in which the surgeon's hands, head, and interpersonal skills all work together to ensure safe, successful, and patient-centered outcomes.

Recognizing and describing the nuanced nature of surgical skill is critical to refining training techniques and building a culture of continuous improvement in surgery. Although surgical skill is more than just performance, surgical skill assessments are increasingly being shown to predict clinical outcomes.2 "Direct observation" refers to the long-standing practice in which surgical supervisors assess surgeons in training through personal, real-time evaluation. While useful, this strategy has inherent limitations, such as subjectivity and potential bias.

To improve surgical skill assessment, research in surgical education is moving away from traditional techniques of skill evaluation, which rely mostly on direct observation by more experienced supervisors, and toward the incorporation of artificial intelligence technologies.

The term "artificial intelligence" introduces the idea of using machine learning algorithms and data-driven methodologies to supplement and enhance human observation. Artificial intelligence adds objectivity, defined evaluation measures, and the ability to process massive amounts of data, allowing for a more comprehensive and theoretically unbiased evaluation of surgical skills.

As a first step to reduce the subjectivity of skill assessment by direct observation, checklist-based scoring systems were introduced. In 1997, Martin et al. proposed the Objective Structured Assessment of Technical Skill (OSATS) score.3 OSATS assesses technical skill in seven dimensions (respect for tissue, time and motion, instrument handling, knowledge of instruments, use of assistants, flow of operation and forward planning, and knowledge of specific procedure) on a 5-point Likert scale. Of note, the interrater reliability of OSATS, which is the agreement of observers when assessing the same candidate, was only moderate (intraclass correlation coefficient [ICC] 0.64-0.72). In 2005, Vassiliou et al. developed a scoring system for laparoscopic surgery called GOALS (Global Operative Assessment of Laparoscopic Skills).4 Five dimensions of technical skill (depth perception, bimanual dexterity, efficiency, tissue handling, and autonomy) are assessed on a 5-point Likert scale. Interrater reliability of GOALS was good, with ICCs ranging from 0.81 to 0.89 depending on the expert level of the observer. In 2012, Goh et al. proposed the Global Evaluative Assessment of Robotic Skills (GEARS) score, specifically for assessing robotic skills during prostatectomy.5 Similar to GOALS, the GEARS score assesses technical skill in six dimensions (depth perception, bimanual dexterity, efficiency, force sensitivity, autonomy, and robotic control) on a 5-point Likert scale. As a measure of interrater reliability, the ICC for GEARS scores among raters was 0.80.
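To make the notion of interrater reliability more concrete, the following is a minimal sketch of how an ICC can be computed from a table of scores. It assumes the ICC(2,1) form of Shrout and Fleiss (two-way random effects, absolute agreement, single rater); the cited studies may have used a different ICC variant. The function name `icc_2_1` and the example scores are hypothetical and not taken from any of the cited papers.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per candidate, one column per rater.
    Assumes at least two candidates, two raters, and no missing scores.
    """
    n = len(ratings)       # number of candidates
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-candidate mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical total scores: four candidates, each scored by three raters.
example_scores = [
    [18, 20, 19],
    [25, 27, 24],
    [30, 31, 33],
    [22, 21, 24],
]
print(round(icc_2_1(example_scores), 2))
```

Values close to 1 indicate that raters largely agree on each candidate, while values near 0 indicate that most of the score variation comes from the raters rather than from the candidates.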

With the introduction of minimally invasive surgery, video recordings of surgical procedures became easily available. This allows surgical skills to be assessed postoperatively, which relieves some of the time constraints on the expert surgeons performing the assessment. Video-based assessment of surgical skills has been applied to numerous fields of minimally invasive surgery.6

Miskovic et al. showed that video-based assessment can reliably assess the technical skill level of surgeons undergoing the National Training Programme in Laparoscopic Colorectal Surgery in England. Experts scored significantly better than trainees.7 Birkmeyer et al. were among the first to demonstrate a correlation between surgical skills, as assessed by video-based assessment, and postoperative outcomes in laparoscopic Roux-en-Y gastric bypass surgery.8 Since then, video-based assessment has confirmed the correlation between surgical skills and clinical outcomes for numerous procedures, including laparoscopic gastrectomy9 and laparoscopic colorectal surgery.10 The latter study by Stulberg et al. even suggested that higher technical skill scores in colorectal procedures are associated with lower complication rates for colorectal and non-colorectal procedures. This highlights the importance of technical skill assessment not only from an educational perspective but also with a view to improving outcome quality.

Despite its importance, surgical skill assessment is not routinely implemented in surgical practice because it is time-consuming and requires surgical expertise.

Therefore, research efforts have attempted to automate surgical skill assessment.11 Lavanchy et al. trained a machine learning algorithm to track surgical tools in videos of laparoscopic cholecystectomies. Based on the movement patterns of the tracked tools, they trained a model to predict the surgical skill level of the surgeon on a 5-point Likert scale. A smaller range of instrument motion was correlated with higher surgical skills, and the model accurately predicted good and poor skill levels in 87% of cases.12
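As an illustration of what "movement patterns" can mean in practice, the sketch below derives simple motion features from a tracked instrument position. It is not the feature set used by Lavanchy et al.; the function name `motion_features`, the choice of features, and the example coordinates are assumptions made for illustration, with positions given as hypothetical (x, y) pixel coordinates per video frame.

```python
import math


def motion_features(track):
    """Summarize a tool-tip track given as a list of (x, y) positions,
    one per video frame. Assumes the track has at least one position."""
    xs = [p[0] for p in track]
    ys = [p[1] for p in track]
    # Total distance travelled between consecutive frames
    path_length = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    return {
        "range_x": max(xs) - min(xs),   # horizontal extent of motion
        "range_y": max(ys) - min(ys),   # vertical extent of motion
        "path_length": path_length,
    }


# Hypothetical track of a tool tip over five frames.
example_track = [(100, 120), (104, 123), (110, 123), (108, 118), (101, 119)]
print(motion_features(example_track))
```

Features of this kind could then serve as input to a separate model that predicts a skill score, in line with the observation that a smaller range of instrument motion was associated with higher skill.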

When machine learning is used to train a surgical skill assessment model, the input is most often surgical video or kinematic data from robotic platforms or wearable sensors. The model output is usually an ordinal skill level (novice, intermediate, expert) or a binary classification (good vs. poor skill). Because surgical skill assessment is complex, even when relying on human expert knowledge, a reliable and valid machine learning model that comprehensively assesses surgical skills for every surgical procedure and provides actionable feedback is still missing. However, a very promising way forward to generic surgical skill assessment and individualized feedback is the analysis of surgical gestures (spread, dissect, cut, etc.) and their correlation with clinical outcomes.13
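The binary output format and the kind of accuracy figure reported for such models can be sketched as follows. The cutoff of 3.0 on the 5-point scale, the function names `to_binary` and `accuracy`, and the example scores are hypothetical and are not taken from any cited study.

```python
def to_binary(likert_score, cutoff=3.0):
    """Map a 5-point Likert skill score to a binary label.
    The cutoff is a hypothetical threshold chosen for illustration."""
    return "good" if likert_score >= cutoff else "poor"


def accuracy(predicted, reference):
    """Fraction of cases in which the predicted label matches the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical model predictions and expert ratings for four videos.
model_scores = [4.2, 2.1, 3.4, 4.8]
expert_scores = [4.0, 2.5, 2.8, 5.0]
predicted = [to_binary(s) for s in model_scores]
reference = [to_binary(s) for s in expert_scores]
print(accuracy(predicted, reference))  # prints 0.75
```

An ordinal output (novice, intermediate, expert) would follow the same pattern with more than one cutoff, although agreement is then usually reported with measures that account for the ordering of the classes.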

Machine learning may have a significant impact on the evaluation of surgical abilities, particularly in the development of Entrustable Professional Activities (EPAs).

EPAs are specific tasks within a profession that can be delegated to a learner based on demonstrated competence. In surgical assessment, machine learning may contribute significantly by offering objective and data-driven insights into a surgeon's performance.

First, machine learning algorithms can evaluate vast datasets of surgical procedures, discovering patterns and trends that the human eye may miss. This enables a more comprehensive evaluation of a surgeon's entrustability. Second, machine learning facilitates real-time assessment during surgeries. Advanced tools, such as computer vision systems, can analyze surgical movements, instrument usage, and overall technique, providing immediate feedback to both surgical trainees and educators. This real-time assessment aligns with the dynamic nature of surgical practice and allows for on-the-spot adjustments and improvements.

Moreover, machine learning can aid in personalized learning paths for surgeons. By analyzing individual performance data, algorithms can identify specific areas for improvement and tailor educational interventions accordingly. This not only enhances the overall skill development of surgeons but also aligns with the concept of entrustability, ensuring that practitioners are entrusted with tasks commensurate with their demonstrated abilities.

However, it is critical to proceed with caution when using machine learning in surgical skill assessment. 

The human touch, which includes mentorship, communication skills, and the ability to deal with unexpected obstacles, is still essential in surgical practice. 

Machine learning should be viewed as a complementary tool to human judgment, leading to a more holistic and effective approach to evaluating trustworthy professional behavior in surgery.


1.     Crossley J, Marriott J, Purdie H, Beard JD. Prospective observational study to evaluate NOTSS (Non-Technical Skills for Surgeons) for assessing trainees’ non-technical performance in the operating theatre. Br J Surg. 2011;98:1010–1020.

2.     Azari D, Greenberg C, Pugh C, Wiegmann D, Radwin R. In Search of Characterizing Surgical Skill. J Surg Educ. 2019;76:1348–1363.

3.     Martin JA, Regehr G, Reznick R, Macrae H, Murnaghan J, Hutchison C, Brown M. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84:273–278.

4.     Vassiliou MC, Feldman LS, Andrew CG, Bergman S, Leffondré K, Stanbridge D, Fried GM. A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg. 2005;190:107–113.

5.     Goh AC, Goldfarb DW, Sander JC, Miles BJ, Dunkin BJ. Global Evaluative Assessment of Robotic Skills: Validation of a Clinical Assessment Tool to Measure Robotic Surgical Skills. J Urol. 2012;187:247–252.

6.     Grüter AAJ, Van Lieshout AS, Van Oostendorp SE, Henckens SPG, Ket JCF, Gisbertz SS, Toorenvliet BR, Tanis PJ, Bonjer HJ, Tuynman JB. Video-based tools for surgical quality assessment of technical skills in laparoscopic procedures: a systematic review. Surg Endosc. 2023;37:4279–4297.

7.     Miskovic D, Ni M, Wyles SM, Kennedy RH, Francis NK, Parvaiz A, Cunningham C, Rockall TA, Gudgeon AM, Coleman MG, Hanna GB. Is Competency Assessment at the Specialist Level Achievable? A Study for the National Training Programme in Laparoscopic Colorectal Surgery in England. Ann Surg. 2013;257:476–482.

8.     Birkmeyer JD, Finks JF, O’Reilly A, Oerline M, Carlin AM, Nunn AR, Dimick J, Banerjee M, Birkmeyer NJO. Surgical Skill and Complication Rates after Bariatric Surgery. N Engl J Med. 2013;369:1434–1442.

9.     Fecso AB, Bhatti JA, Stotland PK, Quereshy FA, Grantcharov TP. Technical Performance as a Predictor of Clinical Outcomes in Laparoscopic Gastric Cancer Surgery. Ann Surg. 2019;270:115–120.

10.   Stulberg JJ, Huang R, Kreutzer L, Ban K, Champagne BJ, Steele SR, Johnson JK, Holl JL, Greenberg CC, Bilimoria KY. Association Between Surgeon Technical Skills and Patient Outcomes. JAMA Surg. 2020;155:960.

11.   Pedrett R, Mascagni P, Beldi G, Padoy N, Lavanchy JL. Technical skill assessment in minimally invasive surgery using artificial intelligence: a systematic review. Surg Endosc. 2023;37:7412–7424.

12.   Lavanchy JL, Zindel J, Kirtac K, Twick I, Hosgor E, Candinas D, Beldi G. Automation of surgical skill assessment using a three-stage machine learning algorithm. Sci Rep. 2021;11:5197.

13.   Ma R, Ramaswamy A, Xu J, Trinh L, Kiyasseh D, Chu TN, Wong EY, Lee RS, Rodriguez I, DeMeo G, Desai A, Otiato MX, Roberts SI, Nguyen JH, Laca J, Liu Y, Urbanova K, Wagner C, Anandkumar A, Hu JC, Hung AJ. Surgical gestures as a method to quantify surgical performance and predict patient outcomes. Npj Digit Med. 2022;5:187.


