Enhancing Local Context of Histology Features in Vision Transformers

Wood R.; Sirinukunwattana K.; Domingo E.; Sauer A.; Lafarge MW.; Koelzer VH.; Maughan TS.; Rittscher J.

Predicting complete response to radiotherapy in rectal cancer patients by applying deep learning to morphological features extracted from histology biopsies provides a quick, low-cost and effective way to assist clinical decision making. We propose adjustments to the Vision Transformer (ViT) network to improve its use of the contextual information present in whole slide images (WSIs). Firstly, our position restoration embedding (PRE) preserves the spatial relationship between tissue patches by using their original positions on a WSI. Secondly, a clustering analysis of extracted tissue features explores morphological motifs that capture fundamental biological processes found in the tumour micro-environment. The result of this analysis is introduced into the ViT network as a cluster label token, helping the model to differentiate between tissue types. The proposed methods are demonstrated on two large independent rectal cancer datasets of patients selectively treated with radiotherapy and capecitabine in two UK clinical trials. Experiments demonstrate that both models, PREViT and ClusterViT, improve prediction over baseline models.
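The abstract does not give the exact formulation of either component, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes each patch comes with its (row, col) grid position on the slide and an integer cluster label, encodes the position with a standard 2D sinusoidal scheme, and adds a per-cluster vector from a lookup table. The function names, the sinusoidal encoding and the additive combination are all illustrative choices, not the authors' method.

```python
import numpy as np

def sincos_1d(positions, dim):
    """Standard 1D sinusoidal encoding: (n,) positions -> (n, dim)."""
    assert dim % 2 == 0
    half = dim // 2
    freqs = 1.0 / (10000 ** (np.arange(half) / half))
    angles = positions[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def position_embedding(coords, dim):
    """Encode (row, col) slide-grid positions of patches as (n, dim) vectors.

    Assumption: half the channels encode the row, half the column.
    """
    assert dim % 4 == 0
    rows = sincos_1d(coords[:, 0], dim // 2)
    cols = sincos_1d(coords[:, 1], dim // 2)
    return np.concatenate([rows, cols], axis=1)

def build_tokens(features, coords, cluster_labels, cluster_table):
    """Combine patch features with position and cluster information.

    Assumption: both signals are simply added to the patch features;
    cluster_table stands in for a learned (n_clusters, dim) embedding.
    """
    dim = features.shape[1]
    pos = position_embedding(coords, dim)
    return features + pos + cluster_table[cluster_labels]

# Hypothetical toy input: 5 patches, 16-dim features, 3 tissue clusters.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 16))
coords = np.array([[0, 0], [0, 1], [3, 7], [10, 2], [10, 3]])
cluster_labels = np.array([0, 2, 1, 1, 0])
cluster_table = rng.normal(size=(3, 16))
tokens = build_tokens(features, coords, cluster_labels, cluster_table)
```

Because the embedding depends on where a patch sat on the slide rather than on its index in the sampled sequence, two patches keep their relative spatial relationship even when only a subset of patches is fed to the transformer.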

Original publication

DOI: 10.1007/978-3-031-19660-7_15
Type: Conference paper
Publication Date: 01/01/2022
Volume: 13602 LNCS
Pages: 154-163
