The Bialystok Dataset contains 162 images, each 3500x3500 px, showing a total of 2,419 cells annotated in accordance with the Bethesda System. The images are derived from whole slide images (WSIs) of routine cervical smears.

The dataset is described in detail in the paper "Conventional Cervical Cytology Image Dataset with Cell Outline Annotations", available soon in IEEE Access.

To access the Bialystok Dataset, please fill out this form.


The dataset aims to present the realistic diversity of smears, ranging from simple, easily differentiated cells to challenging examples full of stained mucus, large dark cell clusters, and abundant neutrophilic cells that obscure the view.



The dataset contains artefacts commonly found in Pap smears:



The dataset is accompanied by annotations in the form of cytoplasm maps divided into six Bethesda categories and two unidentifiable categories: Unidentifiable cells and Unidentifiable cell clusters. The unidentifiable categories reflect the nature of a cytodiagnostician's work: the spectrum of cell condition is continuous, so in some cases a cytodiagnostician cannot make a definitive decision. Such cells are still outlined in the dataset to allow segmentation of all cells in the image.
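As an illustration of how such a category map might be used, the sketch below separates "all outlined cells" (useful for segmentation) from "cells with a Bethesda category" (useful for classification). The on-disk format of the annotations is not stated here, so the integer label encoding, the constant names and the function split_masks are all hypothetical and serve only to make the idea concrete.

```python
import numpy as np

# Hypothetical label encoding: the dataset's real format and IDs may differ.
BACKGROUND = 0
BETHESDA_IDS = tuple(range(1, 7))   # six Bethesda categories (names not assumed)
UNIDENTIFIABLE_CELL = 7             # "Unidentifiable cells"
UNIDENTIFIABLE_CLUSTER = 8          # "Unidentifiable cell clusters"


def split_masks(label_map):
    """Return (all_cells, classifiable) boolean masks for one label map."""
    # Every outlined pixel counts for segmentation, including unidentifiable ones.
    all_cells = label_map != BACKGROUND
    # Only pixels carrying one of the six Bethesda categories are classifiable.
    classifiable = np.isin(label_map, BETHESDA_IDS)
    return all_cells, classifiable


# Tiny synthetic map standing in for a 3500x3500 annotation.
demo = np.array([
    [0, 1, 1, 0],
    [0, 7, 7, 0],
    [3, 3, 0, 8],
    [0, 0, 0, 8],
])
all_cells, classifiable = split_masks(demo)
print(int(all_cells.sum()), int(classifiable.sum()))
```

Keeping the two unidentifiable categories in the all-cells mask mirrors the stated intent that every cell in the image remains available for segmentation, even when no diagnostic label could be assigned.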



In summary, the dataset provides:

  • Realistic fragments of WSIs of routine smears
  • Annotations consistent with a common reporting system (the Bethesda System)
  • Suitability for segmentation, classification and detection
  • A challenging benchmark
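One reason outline annotations can serve detection as well as segmentation is that a bounding box can be derived from each cell mask. The following is a minimal sketch of that step, assuming a single cell is available as a boolean NumPy mask. The function name mask_to_bbox and the inclusive (row_min, col_min, row_max, col_max) convention are illustrative choices, not part of the dataset.

```python
import numpy as np


def mask_to_bbox(mask):
    """Return (row_min, col_min, row_max, col_max), inclusive, or None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))  # indices of rows containing the cell
    cols = np.flatnonzero(mask.any(axis=0))  # indices of columns containing the cell
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])


# Synthetic 6x6 mask with one rectangular "cell".
cell = np.zeros((6, 6), dtype=bool)
cell[2:5, 1:4] = True
print(mask_to_bbox(cell))  # (2, 1, 4, 3)
```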