Introduction
In recent years, Convolutional Neural Networks (CNNs) have emerged as a dominant paradigm in machine learning, particularly for tasks
involving image recognition, object detection, and classification. Their architecture, inspired by the human visual system, enables them
to effectively capture spatial hierarchies in data, making them highly effective for extracting features across varying
levels of granularity. While CNNs have demonstrated remarkable success in numerous domains, most applications are designed
to produce flat, single-level predictions, without explicitly leveraging the hierarchical relationships inherent in many
classification tasks.
Hierarchical classification tasks, such as taxonomic classification of species, inherently
involve nested relationships between classes. For example, biological taxonomies organize organisms in a hierarchy of taxonomic
ranks (e.g., class, order, family, and species), while diseases are categorized by systems and subtypes. Exploiting these
hierarchical structures within a classification framework has the potential to improve both the accuracy and interpretability
of machine learning models. Hierarchical CNNs, which either embed hierarchical structures into a single model or use a collection of CNN models
working together hierarchically, represent a promising approach to address this challenge.
The concept of hierarchical CNNs encompasses two primary strategies. The first approach integrates hierarchical relationships directly
into the architecture or loss function of a single CNN, enabling it to model dependencies among different levels of classification.
The second approach involves a collection of CNN models, each specialized to operate at a specific level of the hierarchy, progressively
refining predictions from coarse categories (e.g., order, family) to finer ones (e.g., species). This latter method closely mirrors
hierarchical decision-making, where general categories are identified first, followed by finer distinctions.
Hierarchical CNNs have shown potential in domains like taxonomy, where large datasets often contain imbalances across levels of the hierarchy.
This paper explores the application of hierarchical CNNs in Mollusca. We discuss the advantages of leveraging hierarchical structures and
analyze existing methodologies.
Hierarchical Classification in Biological Taxonomy Using CNNs on Images
CNNs are widely used to detect and classify organisms. It is common to use a traditional "flat" classification,
whereby CNN models do not explicitly consider evolutionary relationships between taxa [1].
However, hierarchical classification has been applied in deep learning to manage large datasets with numerous classes effectively.
For the Mollusca dataset presented here, which includes over 100,000 species [11],
a hierarchical classification approach is particularly well-suited.
Hierarchical classification has been attempted for many organisms, such as arthropods [3].
Tresson et al. (2021) [3] used a two-step approach, first classifying arthropods into five super-classes and
then using another model to classify cropped images at the species level.
Ong SQ & Hamid SA [4] have shown that each taxonomic level has its own optimal algorithm. Moreover, each taxon gave
a different performance. They demonstrated that a single deep learning architecture was not robust enough to classify specimens
across different taxonomic levels. Another approach was used by Wu et al. [6], who used a hierarchical cross-entropy (HXE)
loss function to organize a "flat" network of leaves into a tree. However, results varied widely among different datasets.
Gupta et al. [4] combined hierarchical classification with YOLOv3 object detection in fish.
The same approach was used by Iwano et al. [14], who developed a hierarchical framework that combines
YOLOv7 and EfficientNetV2, demonstrating its efficacy in plant disease diagnosis and hierarchical classification.
While hierarchical classification has already been developed for species classification, it has to our knowledge not been carried out
for Mollusca.
Approaches to use CNN for Hierarchical Classification
Figure 1: "Flat" classification approach using a multi-class classifier to predict the leaf nodes (Species).
The grey area represents the classifier.
There are several approaches to create a hierarchical CNN for a taxonomic tree with many child nodes per parent node [9].
The easiest method is to classify only at the most specific level, species (the lowest level, the leaves), ignoring the
hierarchical structure. This approach behaves like a traditional classifier during training and testing and provides only
indirect information about the hierarchy outside of the classifier. Figure 1 illustrates this approach. This "flat"
classification might become unwieldy when there are many species.
Figure 2: The Local Classifier per Parent Node uses a multi-class classifier to predict the nodes at the next level.
The grey squares represent the classifiers.
Another approach is the "Local Classifier per Parent Node", where a separate CNN model is trained for each node in the taxonomic tree.
Each classifier is responsible for discriminating between the child nodes of its parent node [13].
It is also known as the top-down approach (see figure 2). In this top-down approach, for each new example in the test set, the system first
predicts its first-level (coarse-grained) class, then uses that predicted class to narrow the choices of classes to be
predicted at the second level (the only valid candidate second-level classes are the children of the class predicted at the first level),
and so on, recursively, until the most specific (fine-grained) prediction is made.
A disadvantage of the top-down class-prediction approach is that an error at one level will be
propagated down the hierarchy, unless some procedure for avoiding this problem is used (see next chapter).
For example, one CNN at the class level differentiates gastropods from bivalves, then another CNN for gastropods distinguishes orders,
and so on.
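The top-down procedure described above can be sketched as a simple loop over per-node classifiers. The dictionary layout and names below are illustrative assumptions, not the implementation used in this study:

```python
# Minimal sketch of top-down inference with a Local Classifier per Parent Node.
# Each parent node has its own classifier; the class predicted at one level
# selects the classifier applied at the next level.

def predict_top_down(image, classifiers, root="Mollusca"):
    """Walk the taxonomy from the root to a leaf, one classifier per step."""
    path = []
    node = root
    while node in classifiers:           # stop when the node has no child model
        node = classifiers[node](image)  # predict the child class at this level
        path.append(node)
    return path                          # e.g. [order, family, genus, species]
```

Because each step only chooses among the children of the previously predicted node, the returned path is always a consistent branch of the taxonomic tree.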
Figure 3: Local Classifier per Level classification approach using a multi-class classifier to classify the nodes at a level.
The grey area represents the classifier.
A compromise between the previous two is the Local Classifier per Level. In this approach
one CNN per taxonomic level (one for class, one for order, one for family, etc.) is trained.
The classifiers at higher levels have fewer classes and are trained on coarse-grained
features, while the classifier at the lowest level is trained on many more classes and on fine-grained features. Figure 3
gives a schematic presentation of this approach. One possible way of classifying test examples with classifiers
trained by this approach is, when a new test example is presented, to get the output of all
classifiers (one classifier per level) and combine this information into the final classification. This approach was
used by S. Kiel on fossil bivalves [20] (a subset of the Mollusca).
This research focused on training CNNs at different taxonomic levels (family, order, and subclass) and combining
their predictions to address the issue of convergent evolution, where unrelated species develop similar features.
The study found that incorporating predictions from higher taxonomic levels improved the accuracy of family-level clustering,
particularly for families not included in the training data. This finding supports the concept of hierarchical CNNs being able
to leverage broader morphological knowledge to refine fine-grained distinctions.
The major drawback of this class-prediction approach is that it is prone to class-membership inconsistency.
Figure 4: The Global multi-class classifier to predict the nodes at the leaf level but using hierarchy information.
The grey squares represent the classifiers.
Finally, there is the approach of taking the hierarchy into account in a global classifier.
This gives a single, but complex, classification model built from the training set.
For example, the CNN's loss function could incorporate the hierarchical structure.
Such an approach penalizes errors that are further apart in the taxonomic tree more heavily [12].
Only one CNN is trained on the complete dataset, so this corresponds with the approach described in figure 1, but
with the advantage of having the hierarchy included in the model.
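As a minimal illustration of such a hierarchy-aware loss, one option is to weight the probability mass a prediction assigns to each class by that class's tree distance from the true class. The distance matrix and the loss form below are illustrative assumptions, not the formulation of [12]:

```python
import numpy as np

# Sketch of a distance-weighted loss for a global classifier, assuming a
# precomputed matrix D where D[i, j] is the distance between leaves i and j
# in the taxonomic tree (D[i, i] = 0). Probability mass placed on
# taxonomically distant classes incurs a larger penalty than mass placed
# on close relatives of the true class.

def tree_distance_loss(probs, true_idx, D):
    """Expected tree distance of the predicted class from the true class."""
    return float(np.dot(probs, D[true_idx]))
```

For example, with three leaves where classes 0 and 1 are siblings (distance 1) and class 2 is in another branch (distance 2), misplacing probability on class 2 costs twice as much as misplacing it on class 1.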
Rationale for using a local classifier per node approach
In this article the "Local Classifier per Parent Node" (LCPN) approach [9] was selected.
In LCPN a multi-class classifier is trained for each parent node to distinguish between
its child nodes. For example, a classifier at the family level would be trained to classify all species within that family.
Using a separate model for each node provides much more flexibility during training. The hyperparameters were optimized for
each node.
Ong SQ & Hamid SA [5]
have demonstrated that using a separate model for each taxonomic level gives better results.
The hyperparameters were different for each model, depending on the size of the image dataset, the quality of the images
and, most importantly, the morphological features of each taxon (order, family, genus). Some genera have species that are
easy to recognize, while other genera have species that are very difficult to identify.
Therefore, by having specialized classifiers for each node, LCPN can potentially improve overall classification accuracy.
Each classifier focuses on a specific subset of the data and learns the distinct characteristics of that taxonomic group. This can
be particularly beneficial when dealing with diverse and complex datasets, as is often the case in biological taxonomy.
LCPN has the potential to handle imbalanced datasets at different levels of the hierarchy. In biological
taxonomy, certain taxa may have significantly fewer representatives than others. By training separate classifiers for each node, LCPN can
address this imbalance by focusing on the specific characteristics of each group, even if some groups have limited data.
This allows working in an iterative manner, adding more nodes over time. It also enables more experimentation, because the time and
IT resources needed to train a model for a new node, especially the leaves of the tree, are limited. Furthermore, because the Mollusca
taxonomy changes regularly, only a few nodes need retraining when it does.
Finally, this approach is cost-effective because it requires much less IT infrastructure compared to training on the complete
dataset of images.
A common disadvantage of this approach is the potential for inconsistencies in classification across different levels of the hierarchy. This can occur when a
classifier at a lower level assigns an instance to a class that contradicts the classification made by a classifier at a higher level.
For example, when classifying mammals, if a classifier at the order level assigns a specimen to the order Primates, but a classifier at the
family level assigns it to the family Felidae (cats), this creates an inconsistency. But because a top-down approach is followed
during inference, no inconsistencies will occur. This involves starting at the root of the hierarchy and progressing down, using the predictions at each level to guide the selection
of the next classifier. Thresholds can be applied to the classifier outputs to ensure that an instance is only assigned to a child node
if the parent node's prediction exceeds a certain confidence level [9].
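Such a thresholded descent might be sketched as follows; the classifier interface (each node's model returning a child name and a confidence) and the 0.8 default threshold are illustrative assumptions:

```python
# Sketch of thresholded top-down inference: descend only while the current
# classifier is confident enough; otherwise stop and return the (coarser)
# prediction made so far. `classifiers` maps a node name to a function
# returning (child_name, confidence).

def predict_with_threshold(image, classifiers, root="Mollusca", threshold=0.8):
    path, node = [], root
    while node in classifiers:
        child, conf = classifiers[node](image)
        if conf < threshold:        # not confident: keep the coarser label
            break
        path.append(child)
        node = child
    return path                     # may stop early, e.g. at the family level
```

An uncertain example then receives a correct but coarse label (say, an order) instead of a likely wrong species name, which limits error propagation down the tree.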
Another challenge is the increased complexity of managing and training a large number of individual classifiers, especially in a
complex hierarchy like biological taxonomy. This requires a careful design of the inference process [10].
Using efficient algorithms and data structures can help reduce the computational burden of training and managing multiple classifiers.
This could involve optimizing the code for specific IT architectures or using specialized libraries for hierarchical classification.
Methods
Data collection
Data collection was described previously. All taxonomic information was retrieved from MolluscaBase [11] or WoRMS [19].
For a reliable classification, a minimum of 25 images per species is needed.
Training and testing the convolutional neural network
Model selection was described here.
For each node in the taxonomy, the hyperparameters were optimized. The following table shows the starting values.
Default hyperparameters

| Hyperparameter | Initial value |
| --- | --- |
| Epochs | 50 |
| Learning rate | 0.0005 |
| Early stopping | Yes |
| Batch size | 64 |
| Dropout top layer | 0.20 |
| Optimizer | Adam |
All models were trained using TensorFlow and Keras.
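The defaults in the table above might be wired into a Keras training run as sketched below. Only the hyperparameter values come from the table; the early-stopping `patience` and the overall structure are illustrative assumptions:

```python
# Sketch: translate the default hyperparameters into Keras training arguments.
# The EarlyStopping patience value is an assumption, not taken from the text.

DEFAULTS = {
    "epochs": 50,
    "learning_rate": 0.0005,
    "batch_size": 64,
    "dropout_top_layer": 0.20,
    "optimizer": "adam",
    "early_stopping": True,
}

def fit_kwargs(defaults=DEFAULTS):
    """Build the arguments for model.compile() / model.fit()."""
    from tensorflow import keras  # lazy import: requires TensorFlow
    optimizer = keras.optimizers.Adam(learning_rate=defaults["learning_rate"])
    callbacks = []
    if defaults["early_stopping"]:
        callbacks.append(keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True))
    return {"optimizer": optimizer, "callbacks": callbacks,
            "epochs": defaults["epochs"], "batch_size": defaults["batch_size"]}
```

Keeping the defaults in one dictionary makes it easy to override a single value (e.g. learning rate or dropout) per node during the per-node optimization described above.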
Model evaluation
The output layer (softmax) of the CNN consists of a vector with a confidence value for each class (i.e., taxon) included
in the model. The class with the highest value is the predicted class for an image. An image was considered correctly predicted
if the predicted class matched the ground-truth class. The Python module sklearn.metrics was used to calculate all common metrics:
recall, precision, F1 score and accuracy.
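The per-image evaluation just described amounts to an argmax over the softmax vector followed by standard sklearn.metrics calls. The labels and scores below are illustrative, and macro averaging is an assumption:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative softmax outputs for three images over three classes.
softmax = np.array([[0.1, 0.7, 0.2],   # argmax -> class 1
                    [0.8, 0.1, 0.1],   # argmax -> class 0
                    [0.2, 0.3, 0.5]])  # argmax -> class 2
y_true = [1, 0, 1]
y_pred = softmax.argmax(axis=1)        # predicted class = highest confidence

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
```

With imbalanced classes, macro averaging weights every taxon equally, so a poorly performing minority class (like Nuculanoidea in the results below) is not hidden by the large majority classes.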
Results
Criteria to create the tree structure
The taxonomic tree for Mollusca can have many levels. For example, the tree for Callista chione (Linnaeus, 1758), a bivalve from the
North Atlantic Ocean and Mediterranean Sea, is given below [11]. There are 12 taxonomic levels between Mollusca and the species,
Callista chione, in this example.
Creating a CNN model for each node would make the collection very complex.
Figure 6.
A simplified approach is to use a maximum of only 4 levels, with the most common levels being
Phylum > "Order" > "Family" > Genus > Species. This means that in the maximal case, 4 separate CNN models are created to
classify a species. The classification begins with the root node, the phylum Mollusca, where the first CNN model determines
the order. Here "order" is placed in quotes because this level can vary; it may correspond to a higher (e.g., subterclass)
or lower taxonomic rank. In some cases this intermediate level
is skipped, and the family is directly determined starting from the phylum (refer to the dotted line in figure 6).
The input of the next model is the classification made by the Mollusca model. Depending on the predicted class, another CNN model is
activated to determine the child class.
For example, consider the hierarchy for
Callista chione.
The correct class predicted by the Mollusca model is
Euheterodonta (see table I). A Euheterodonta
model is then activated to predict the family; the correct classification in this example is
Veneridae.
The next step is the activation of the Veneridae model to determine the genus. Finally, a Callista model
determines the species.
If not enough species are available in a particular genus (for example, only one or two species with enough images),
it might be more efficient to create a model that classifies
the species starting from the family. This is shown in figure 6 with the second dotted line.
This hierarchical approach ensures an efficient and targeted classification process,
with each CNN model specializing in specific levels of the taxonomy.
The root model: Mollusca
The classes selected for the first taxonomic level, the Mollusca (phylum) parent node, are listed
in the next table. It has 22 classes, which classify molluscs at the "order" level (see figure 6). The taxonomic
level is given in table I and can be subterclass, infraclass, order, superfamily or family. When a class has the taxonomic
level family, we consider this first level to be skipped (see figure 6). The last column shows the number of images
available for each class. There is a large imbalance in the dataset.
Table I: Classes of the Mollusca model

| Name Taxon | Taxonomic level | # Images |
| --- | --- | --- |
| Archiheterodonta | Subterclass | 1473 |
| Arcida | Order | 5567 |
| Caenogastropoda incertae sedis | Order | 14023 |
| Chitonida | Order | 800 |
| Cycloneritida | Order | 8072 |
| Dentaliidae | Family | 473 |
| Euheterodonta | Subterclass | 16658 |
| Euthyneura | Infraclass | 4637 |
| Lepetellida | Order | 14463 |
| Lepidopleurida | Order | 434 |
| Limida | Order | 1408 |
| Littorinimorpha | Order | 131468 |
| Lottioidea | Superfamily | 1884 |
| Mytilida | Order | 4542 |
| Neogastropoda | Order | 170973 |
| Nuculanoidea | Superfamily | 540 |
| Nuculoidea | Superfamily | 1067 |
| Patelloidea | Superfamily | 5322 |
| Pectinida | Order | 11878 |
| Pleurotomariida | Order | 3182 |
| Seguenziida | Order | 638 |
| Trochida | Order | 20693 |
Many orders were not included because not enough images were available (for example, Laevidentaliidae, Neomphalida, Mesoneura, etc.).
Extinct orders were also excluded. The number of images available is a snapshot of January 2025.
The criteria to select a particular taxonomic level were the number of classes available for the 2nd level (families) and the
number of images. Having between 5 and 20 classes per model makes the model manageable and easier to analyse and fine-tune.
While the order taxonomic level was selected in most instances, sometimes a higher (subterclass) or lower taxonomic
level (superfamily, family) was selected. Even after careful selection of the classes, the dataset remained imbalanced.
Including class weights [15], a focal loss function [16], augmentation [17] and/or undersampling [18] in model training
partly alleviated the problems caused by the imbalance among the classes in the dataset.
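Two of these counter-measures are easy to sketch. The inverse-frequency weighting below follows sklearn's "balanced" heuristic, and the focal-loss exponent gamma = 2.0 is an illustrative assumption rather than the setting used in this study:

```python
import math

# Class weights inversely proportional to class frequency: rare classes get a
# larger weight so their errors count more during training.
def balanced_class_weights(counts):
    """counts: {class_name: n_images} -> {class_name: weight}."""
    total, k = sum(counts.values()), len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

# Focal loss for a single example: down-weights easy, well-classified
# examples so training focuses on the hard ones.
def focal_loss(p_true, gamma=2.0):
    """Loss for the probability assigned to the true class (0 < p_true <= 1)."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)
```

For the counts in Table I, Dentaliidae (473 images) would receive a far larger weight than Littorinimorpha (131468 images), and a confidently correct prediction (p = 0.9) contributes almost nothing to the focal loss compared to an uncertain one (p = 0.5).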
Table II: Performance of all classes of the Mollusca model

| Name Taxon | Recall | Precision | F1 |
| --- | --- | --- | --- |
| Archiheterodonta | 1.000 | 1.000 | 1.000 |
| Arcida | 0.386 | 0.809 | 0.523 |
| Caenogastropoda incertae sedis | 0.982 | 0.823 | 0.896 |
| Chitonida | 1.000 | 0.428 | 0.600 |
| Cycloneritida | 0.958 | 0.884 | 0.920 |
| Dentaliidae | 1.000 | 0.500 | 0.666 |
| Euheterodonta | 0.947 | 0.831 | 0.885 |
| Euthyneura | 1.000 | 0.900 | 0.947 |
| Lepetellida | 0.951 | 0.975 | 0.963 |
| Lepidopleurida | 1.000 | 1.000 | 1.000 |
| Limida | 1.000 | 0.833 | 0.909 |
| Littorinimorpha | 0.975 | 0.987 | 0.981 |
| Lottioidea | 1.000 | 0.500 | 0.666 |
| Mytilida | 0.944 | 0.894 | 0.918 |
| Neogastropoda | 0.973 | 0.989 | 0.981 |
| Nuculanoidea | 1.000 | 0.166 | 0.285 |
| Patelloidea | 0.888 | 0.841 | 0.864 |
| Pectinida | 1.000 | 1.000 | 1.000 |
| Pleurotomariida | 1.000 | 0.833 | 0.909 |
| Seguenziida | 1.000 | 0.800 | 0.888 |
| Trochida | 0.958 | 0.985 | 0.971 |

Class weights were used, but no other adjustments were made to improve the imbalance.
The 3 top layers were unfrozen and a regularization value of 0.0001 was used. All other parameters
had default values. The dataset has 424406 images.
The results show that some classes perform poorly, most likely due to the large imbalance between classes in the
dataset.
Because of this large imbalance, class weights, focal loss, augmentation of the minority classes (rotate, zoom out,
change brightness or contrast) and undersampling of the majority classes were tested.
Table III: Focal loss, augmentation and undersampling. F1 scores are given per class, with the number of images used in parentheses.

| # images (total) | Max. per class | Train. acc. | Val. acc. | Train. loss | Val. loss | Dentaliidae F1 (# images) | Nuculanoidea F1 (# images) | Littorinimorpha F1 (# images) | Neogastropoda F1 (# images) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 12235 | 1000 | 0.986 | 0.938 | 0.010 | 0.033 | 0.928 (473) | 0.914 (540) | 0.849 (1000) | 0.870 (1000) |
| 26526 | 2500 | 0.989 | 0.959 | 0.009 | 0.025 | 0.914 (473) | 0.980 (540) | 0.943 (2500) | 0.897 (2500) |
| 44485 | 5000 | 0.983 | 0.964 | 0.012 | 0.022 | 1.000 (473) | 0.923 (540) | 0.952 (5000) | 0.948 (5000) |
| 68895 | 10000 | 0.985 | 0.971 | 0.012 | 0.018 | 1.000 (473) | 0.870 (540) | 0.967 (10000) | 0.930 (10000) |
| 103489 | 25000 | 0.980 | 0.971 | 0.012 | 0.019 | 1.000 (473) | 0.875 (540) | 0.961 (25000) | 0.963 (25000) |
| No undersampling | - | 0.960 | 0.964 | 0.030 | 0.033 | 0.666 (473) | 1.000 (540) | 0.981 (131468) | 0.981 (170973) |
Table III shows the results of undersampling.
Increasing the maximum number of images per class improves the F1 score for the majority classes. The overall
validation accuracy also increases until the maximum reaches 10 000 images per class; using 25 000 images for the majority classes
increases performance only marginally. The F1 scores for the minority classes give a mixed result: one class improves,
while the others deteriorate. Analysis of other classes with a low number of images shows a similar tendency (not shown).
Focal loss, regularization and top-layer dropout were kept constant. The last row shows the results without undersampling. While the
results are ambiguous, they still show that undersampling improves the performance slightly.
All tests with augmentation of the minority classes (rotate, change brightness and contrast, and zoom out) gave no better results
(not shown).
We further fine-tuned the model by making more top layers trainable. This increases the overall performance, with 10 trainable
layers being optimal.
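The unfreezing step might be sketched as follows. The helper works on any Keras-style model exposing a `layers` list; it is an illustration under that assumption, not the code used in this study:

```python
# Sketch of the fine-tuning step: freeze the whole pretrained backbone,
# then make only the top n layers trainable again before retraining.
# n = 10 is the optimum reported in the text.

def unfreeze_top_layers(model, n=10):
    """Freeze every layer, then re-enable training for the last n layers."""
    for layer in model.layers:
        layer.trainable = False
    for layer in model.layers[-n:]:
        layer.trainable = True
    return model
```

After this call the model must be recompiled (in Keras, `trainable` changes only take effect on the next `compile`), typically with a low learning rate so the pretrained weights are not destroyed.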
Table IV: Fine-tuning. F1 scores are given per class, with the number of images used in parentheses.

| # images (total) | Max. per class | Train. acc. | Val. acc. | Train. loss | Val. loss | Dentaliidae F1 (# images) | Nuculanoidea F1 (# images) | Littorinimorpha F1 (# images) | Neogastropoda F1 (# images) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 44485 | 5000 | 0.991 | 0.970 | 0.008 | 0.019 | 1.000 (473) | 0.952 (540) | 0.984 (3000) | 0.937 (3000) |
| 103489 | 25000 | 0.989 | 0.976 | 0.013 | 0.016 | 1.000 (473) | 1.000 (540) | 0.977 (15000) | 0.970 (15000) |
| 133389 | 50000 | 0.982 | 0.975 | 0.023 | 0.018 | 0.666 (473) | 1.000 (540) | 0.982 (30000) | 0.979 (30000) |
Fine-tuning further improved the model, and again shows that undersampling improves the performance.
The best model has a maximum of 25000 images per class, split into 15000 images for the training dataset and 5000 each for the
validation and test datasets. To have more images for training, we also omitted the test dataset, using the validation
dataset for testing. The results are similar (not shown).
Analyzing the models at the genus taxonomic level of the Cypraeidae family
The species of the family
Cypraeidae have a similar form, with a wide variation in pattern and colours. We created models for 18
genera and compared them to see if the hyperparameters correlate with the size
of the dataset or the number of classes (species) in the dataset. Table V shows the results for all genera with the
same (initial) hyperparameters and for the final optimized, fine-tuned models.
Table V: Cypraeidae genus models. Columns marked "init." give the model trained with the same (initial) hyperparameters; the remaining columns give the optimized hyperparameters and the resulting fine-tuned model.

| Genus | # images | # species | Train. acc. (init.) | Val. acc. (init.) | Train. loss (init.) | Val. loss (init.) | Learning rate | Top-layer dropout | Unfrozen top layers | Regularization | Train. acc. (tuned) | Val. acc. (tuned) | Train. loss (tuned) | Val. loss (tuned) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Austrasiatica | 1396 | 3 | 0.942 | 0.943 | 0.248 | 0.203 | 0.0005 | 0.2 | - | - | 0.942 | 0.943 | 0.248 | 0.203 |
| Bistolida | 4440 | 11 | 0.837 | 0.837 | 0.266 | 0.075 | 0.001 | 0.2 | 20 | 0.01 | 0.965 | 0.919 | 0.087 | 0.085 |
| Cypraeovula | 3830 | 14 | 0.885 | 0.876 | 0.867 | 0.885 | 0.0005 | 0.2 | 3 | 0.0001 | 0.956 | 0.914 | 0.266 | 0.289 |
| Eclogavena | 2110 | 5 | 0.892 | 0.879 | 0.485 | 0.376 | 0.00025 | 0.2 | 3 | - | 0.959 | 0.950 | 0.182 | 0.189 |
| Erronea | 9338 | 16 | 0.860 | 0.892 | 1.122 | 0.362 | 0.0005 | 0.2 | 3 | 0.0001 | 0.953 | 0.935 | 0.324 | 0.211 |
| Leporicypraea | 2394 | 4 | 0.913 | 0.914 | 0.546 | 0.244 | 0.0005 | 0.2 | 3 | 0.0001 | 0.968 | 0.960 | 0.155 | 0.126 |
| Lyncina | 9210 | 11 | 0.912 | 0.934 | 1.279 | 0.226 | 0.0005 | 0.2 | 3 | 0.0001 | 0.978 | 0.964 | 0.291 | 0.123 |
| Mauritia | 8299 | 8 | 0.815 | 0.853 | 0.125 | 0.051 | 0.0005 | 0.2 | 3 | 0.0001 | 0.953 | 0.919 | 0.024 | 0.031 |
| Melicerona | 623 | 2 | 0.938 | 0.863 | 0.230 | 0.293 | 0.00025 | 0.25 | 3 | 0.0001 | 0.96 | 0.903 | 0.157 | 0.241 |
| Naria | 23518 | 24 | 0.879 | 0.925 | 0.924 | 0.296 | 0.001 | 0.1 | 3 | 0.0001 | 0.987 | 0.965 | 0.018 | 0.026 |
| Notocypraea | 2230 | 5 | 0.825 | 0.796 | 0.659 | 0.559 | 0.001 | 0.2 | 3 | 0.001 | 0.959 | 0.888 | 0.167 | 0.408 |
| Palmadusta | 7058 | 13 | 0.947 | 0.945 | 0.278 | 0.175 | 0.0005 | 0.2 | - | - | 0.947 | 0.945 | 0.278 | 0.175 |
| Pseudozonaria | 1753 | 4 | 0.962 | 0.980 | 0.152 | 0.089 | 0.0005 | 0.2 | - | - | 0.962 | 0.980 | 0.152 | 0.089 |
| Purpuradusta | 2575 | 8 | 0.870 | 0.849 | 1.233 | 0.533 | 0.0005 | 0.2 | 3 | 0.0001 | 0.932 | 0.882 | 0.663 | 0.650 |
| Pustularia | 4332 | 10 | 0.837 | 0.848 | 0.854 | 0.471 | 0.0005 | 0.2 | 20 | 0.01 | 0.967 | 0.916 | 0.050 | 0.060 |
| Talparia | 1075 | 2 | 0.949 | 0.977 | 0.222 | 0.086 | 0.0005 | 0.2 | - | - | 0.949 | 0.977 | 0.222 | 0.086 |
| Umbilia | 2898 | 7 | 0.846 | 0.889 | 0.947 | 0.341 | 0.0005 | 0.2 | 3 | - | 0.941 | 0.941 | 0.408 | 0.213 |
| Zoila | 7036 | 27 | 0.855 | 0.848 | 1.264 | 0.502 | 0.0005 | 0.2 | 3 | 0.0001 | 0.977 | 0.916 | 0.231 | 0.299 |
Using the same default parameters for the first training, the results vary widely, from a
training accuracy of 0.815 for the Mauritia model to 0.962 for the Pseudozonaria model. Optimizing the hyperparameters
and further fine-tuning always gave better results, with all models reaching a validation accuracy of at least 0.9.
When plotting the number of images in the dataset, or the number of species (classes) in the dataset, against
the accuracy, no correlation was found. The size of the dataset cannot explain the variation in accuracy among
the models.
Conclusion
This study has successfully demonstrated the power of hierarchical convolutional neural networks (CNNs) as a robust tool for the
identification of Mollusca species. Our results provide compelling evidence for the efficacy of hierarchical classification strategies,
which can be implemented through diverse architectures. These range from a single, end-to-end CNN capable of refining predictions
from coarse taxonomic levels (e.g., class) down to fine-grained distinctions (e.g., species), to an ensemble of interconnected CNNs,
where each model is specialized for a specific node within the taxonomic hierarchy. In this work, we elected to pursue the latter
approach, constructing a hierarchical network of independent CNNs. As detailed in previous sections, this architectural decision
was driven by the inherent advantages it offers in terms of modularity, optimization, and feature specificity. The strong performance
of our model validates this choice and underscores the potential of hierarchical CNNs for accurate and efficient Mollusca classification.
The observed variations in performance across different taxonomic groups, particularly within the family Cypraeidae, offer critical insights
into the benefits of our chosen architecture. By utilizing a dedicated model for each node within the hierarchy, we are empowered to fine-tune
hyperparameters at each level of classification. This granular level of optimization ensures that each model is ideally suited to the specific
challenges posed by its target taxonomic group. Consequently, this strategy leads to demonstrably superior outcomes compared to a single,
"flat" model attempting to classify all Mollusca species simultaneously. In such a monolithic approach, optimal hyperparameters for one
group might prove detrimental to another, hindering overall accuracy.
Furthermore, the adoption of a hierarchical network of specialized models allows each CNN to focus on learning the distinct
morphological characteristics relevant to its assigned genus. Our findings suggest that the presence of these genus-specific
features is a more significant determinant of model performance than either the sheer size of the dataset or the total number of
classes within a given node. This implies that even within a complex and diverse group like Mollusca, a well-structured hierarchy
can leverage subtle, genus-level morphological differences to achieve accurate identification.
The success of our hierarchical CNN approach has significant implications for the field of conchology and biodiversity research.
It provides a framework for developing automated, accurate, and efficient tools for species identification, which can be invaluable
for tasks such as biodiversity monitoring, ecological surveys, conservation efforts and shell collection.
Investigating the interpretability of the learned features within each model could provide further insights into the specific
morphological characteristics that distinguish different Mollusca genera and species.