pointtree.evaluation

Evaluation tools.

class pointtree.evaluation.PerformanceTracker[source]

Bases: object

A tracker that stores the execution time and memory usage of different code sections.

reset()[source]

Deletes all tracked data.

save(
desc: str,
wall_clock_time: float,
cpu_time: float,
memory_usage: float,
memory_increment: float,
)[source]

Save the execution time and memory usage of a certain code section.

Parameters:
  • desc (str) – Description of the tracked code. If a value has already been saved for the description, the values are summed.

  • wall_clock_time (float) – Wallclock time needed for the execution of the tracked code.

  • cpu_time (float) – CPU time needed for the execution of the tracked code.

  • memory_usage (float) – Peak memory usage by the tracked code.

  • memory_increment (float) – Increase in allocated memory before entering and after exiting the tracked code section.

to_pandas() → DataFrame[source]
Returns:

Tracked execution times and memory usage data as pandas.DataFrame with the columns "Description", "Wallclock Time [s]", "CPU Time [s]", "Memory Usage [GB]", and "Memory Increment [GB]".

class pointtree.evaluation.Profiler(
desc: str,
performance_tracker: PerformanceTracker,
)[source]

Bases: object

A context manager that tracks the execution time and memory usage of the contained code.

Parameters:
  • desc (str) – Description of the tracked code.

  • performance_tracker (PerformanceTracker) – Performance tracker in which the measured performance metrics are to be stored.
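The mechanism behind this pair of classes can be illustrated with a minimal, self-contained sketch using only the standard library. This is a toy illustration of the concept (timing and memory tracking via a context manager), not pointtree's actual implementation; the class names MiniTracker and MiniProfiler are hypothetical:

```python
import time
import tracemalloc


class MiniTracker:
    """Toy stand-in for PerformanceTracker: sums metrics per description."""

    def __init__(self):
        self.records = {}

    def save(self, desc, wall_clock_time, cpu_time, memory_usage, memory_increment):
        prev = self.records.get(desc, (0.0, 0.0, 0.0, 0.0))
        self.records[desc] = (
            prev[0] + wall_clock_time,
            prev[1] + cpu_time,
            prev[2] + memory_usage,
            prev[3] + memory_increment,
        )


class MiniProfiler:
    """Toy context manager that measures the wrapped code and stores the result."""

    def __init__(self, desc, tracker):
        self.desc = desc
        self.tracker = tracker

    def __enter__(self):
        tracemalloc.start()
        self._start_mem, _ = tracemalloc.get_traced_memory()
        self._start_wall = time.perf_counter()
        self._start_cpu = time.process_time()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        wall = time.perf_counter() - self._start_wall
        cpu = time.process_time() - self._start_cpu
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        gb = 1024**3  # report memory in GB, as in PerformanceTracker.to_pandas
        self.tracker.save(
            self.desc, wall, cpu, peak / gb, (current - self._start_mem) / gb
        )
        return False  # do not suppress exceptions


tracker = MiniTracker()
with MiniProfiler("allocate list", tracker):
    data = list(range(100_000))
```

With the real classes, usage follows the same shape: construct a PerformanceTracker, wrap code sections in Profiler(desc, tracker) blocks, and read the results via to_pandas().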

pointtree.evaluation.evaluate_instance_segmentation(
xyz: ndarray[tuple[Any, ...], dtype[float32 | float64]],
target: ndarray[tuple[Any, ...], dtype[int64]],
prediction: ndarray[tuple[Any, ...], dtype[int64]],
*,
detection_metrics_matching_method: Literal['panoptic_segmentation', 'point2tree', 'for_instance', 'for_ai_net', 'for_ai_net_coverage', 'tree_learn'] = 'panoptic_segmentation',
segmentation_metrics_matching_method: Literal['panoptic_segmentation', 'point2tree', 'for_instance', 'for_ai_net', 'for_ai_net_coverage', 'tree_learn'] = 'for_ai_net_coverage',
include_unmatched_instances_in_seg_metrics: bool = True,
invalid_instance_id: int = -1,
uncertain_instance_id: int = -2,
compute_partition_metrics: bool = True,
num_partitions: int = 10,
) → Tuple[DataFrame, DataFrame, DataFrame | None, DataFrame | None, DataFrame | None, DataFrame | None][source]

Evaluates the quality of an instance segmentation by computing the following types of metrics:

  • Instance detection metrics (precision, commission error, recall, omission error, \(\text{F}_1\)-score)

  • Instance segmentation metrics averaged over all pairs of ground-truth instances and corresponding predicted instances (mIoU, mPrecision, mRecall)

  • Instance segmentation metrics for each pair of a ground-truth instance and a corresponding predicted instance

  • Instance segmentation metrics for different spatial partitions of the instances, averaged over all pairs of ground-truth instances and corresponding predicted instances

  • Instance segmentation metrics for different spatial partitions of the instances for each pair of a ground-truth instance and a corresponding predicted instance.

For more details on the individual metrics, see the documentation of pointtorch.evaluation.instance_detection_metrics, pointtorch.evaluation.instance_segmentation_metrics, and pointtorch.evaluation.instance_segmentation_metrics_per_partition. The metric calculations are based on a matching of ground-truth instances and predicted instances.

Parameters:
  • xyz (ndarray[tuple[Any, ...], dtype[float32 | float64]]) – Coordinates of all points.

  • target (ndarray[tuple[Any, ...], dtype[int64]]) – Ground truth instance ID for each point.

  • prediction (ndarray[tuple[Any, ...], dtype[int64]]) – Predicted instance ID for each point.

  • detection_metrics_matching_method (Literal['panoptic_segmentation', 'point2tree', 'for_instance', 'for_ai_net', 'for_ai_net_coverage', 'tree_learn']) – Method to be used for matching ground-truth and predicted instances for computing the instance detection metrics. (default: 'panoptic_segmentation')

  • segmentation_metrics_matching_method (Literal['panoptic_segmentation', 'point2tree', 'for_instance', 'for_ai_net', 'for_ai_net_coverage', 'tree_learn']) – Method to be used for matching ground-truth and predicted instances for computing the instance segmentation metrics. (default: 'for_ai_net_coverage')

  • include_unmatched_instances_in_seg_metrics (bool) – Whether ground-truth instances that cannot be matched with a predicted instance should be included in the computation of the instance segmentation metrics. (default: True)

  • invalid_instance_id (int) – ID that is assigned to points not assigned to any instance / to instances that could not be matched and are considered to be false negative or false positive instances. (default: -1)

  • uncertain_instance_id (int) – ID that is assigned to predicted instances that could not be matched to any target instance but still should not be counted as false positive instances. Must be equal to or smaller than invalid_instance_id. (default: -2)

  • compute_partition_metrics (bool) – Whether the metrics per partition should be computed. (default: True)

  • num_partitions (int) – Number of partitions for the computation of instance segmentation metrics per partition. (default: 10)

Returns:

Tuple of six pandas.DataFrames:
  • Instance detection metrics and the instance segmentation metrics averaged over all instance pairs. The dataframe has the following columns: "DetectionTP", "DetectionFP", "DetectionFN", "DetectionPrecision", "DetectionComissionError", "DetectionRecall", "DetectionOmissionError", "DetectionF1Score", "SegmentationMeanIoU", "SegmentationMeanPrecision", and "SegmentationMeanRecall".

  • Instance segmentation metrics for each instance pair.

  • Instance segmentation metrics for different horizontal partitions, averaged over all instance pairs.

  • Instance segmentation metrics for different horizontal partitions for each instance pair.

  • Instance segmentation metrics for different vertical partitions, averaged over all instance pairs.

  • Instance segmentation metrics for different vertical partitions for each instance pair.

The elements containing the metrics per partition are None when compute_partition_metrics is set to False.

pointtree.evaluation.instance_detection_metrics(
target: ndarray[tuple[Any, ...], dtype[int64]],
prediction: ndarray[tuple[Any, ...], dtype[int64]],
matched_predicted_ids: ndarray[tuple[Any, ...], dtype[int64]],
matched_target_ids: ndarray[tuple[Any, ...], dtype[int64]],
*,
invalid_instance_id: int = -1,
uncertain_instance_id: int = -2,
)[source]

Computes metrics to measure the instance detection quality. Based on a given matching of ground-truth instances \(\mathcal{G}_i\) and corresponding predicted instances \(\mathcal{P}_i\), the instances are categorized as true positives (\(TP\)), false positives (\(FP\)), or false negatives (\(FN\)). As proposed in Henrich, Jonathan, et al. “TreeLearn: A Deep Learning Method for Segmenting Individual Trees from Ground-Based LiDAR Forest Point Clouds.” Ecological Informatics 84 (2024): 102888, unmatched predicted instances are not counted as false positives if less than min_precision_fp of their points belong to labeled ground-truth instances. This is because such cases often correspond to instances that are correctly detected but not labeled in the ground truth.

Based on the number of true positives, false positives, and false negatives, the following instance detection metrics are calculated:

\[\text{Precision} = \frac{TP}{TP + FP}\]
\[\text{Commission error} = \frac{FP}{TP + FP}\]
\[\text{Recall} = \frac{TP}{TP + FN}\]
\[\text{Omission error} = \frac{FN}{TP + FN}\]
\[\text{F}_1\text{-Score} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}\]
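These formulas can be evaluated directly from the counts. The following is a minimal sketch of the arithmetic (the helper detection_metrics is hypothetical, not the library function, which also performs the categorization itself):

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Instance detection metrics computed from TP/FP/FN counts."""
    return {
        "Precision": tp / (tp + fp),
        "CommissionError": fp / (tp + fp),
        "Recall": tp / (tp + fn),
        "OmissionError": fn / (tp + fn),
        "F1Score": 2 * tp / (2 * tp + fp + fn),
    }


# 8 correctly detected instances, 2 spurious predictions, 2 missed instances
metrics = detection_metrics(tp=8, fp=2, fn=2)
# Precision = Recall = F1Score = 0.8, CommissionError = OmissionError = 0.2
```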
Parameters:
  • target (ndarray[tuple[Any, ...], dtype[int64]]) – Ground truth instance ID for each point.

  • prediction (ndarray[tuple[Any, ...], dtype[int64]]) – Predicted instance ID for each point.

  • matched_predicted_ids (ndarray[tuple[Any, ...], dtype[int64]]) – ID of the matched predicted instance for each ground-truth instance.

  • matched_target_ids (ndarray[tuple[Any, ...], dtype[int64]]) – ID of the matched ground-truth instance for each predicted instance.

  • invalid_instance_id (int) – ID that is assigned to points not assigned to any instance / to instances that could not be matched and are considered to be false negative or false positive instances. (default: -1)

  • uncertain_instance_id (int) – ID that is assigned to predicted instances that could not be matched to any target instance but still should not be counted as false positive instances. Must be equal to or smaller than invalid_instance_id. (default: -2)

Raises:
  • ValueError – If uncertain_instance_id is larger than invalid_instance_id.

  • ValueError – If target and prediction have different lengths.

  • ValueError – If the length of matched_predicted_ids is not equal to the number of ground-truth instances.

  • ValueError – If the length of matched_target_ids is not equal to the number of predicted instances.

  • ValueError – If the unique target and predicted instance IDs don’t start with the same number.

Returns:

A dictionary with the following keys: "TP", "FP", "FN", "Precision", "CommissionError", "Recall", "OmissionError", "F1Score".

Shape:
  • target: \((N)\)

  • prediction: \((N)\)

  • matched_predicted_ids: \((G)\)

  • matched_target_ids: \((P)\)

where

\(N = \text{ number of points}\)
\(G = \text{ number of ground-truth instances}\)
\(P = \text{ number of predicted instances}\)
pointtree.evaluation.instance_segmentation_metrics(
target: ndarray[tuple[Any, ...], dtype[int64]],
prediction: ndarray[tuple[Any, ...], dtype[int64]],
matched_predicted_ids: ndarray[tuple[Any, ...], dtype[int64]],
*,
invalid_instance_id: int = -1,
include_unmatched_instances: bool = True,
) → Tuple[Dict[str, float], DataFrame][source]

Given pairs of ground-truth instances \(\mathcal{G}_i\) and matched predicted instances \(\mathcal{P}_i\), the following metrics are calculated to measure the quality of the point-wise segmentation:

\[\text{IoU}(\mathcal{G}_i, \mathcal{P}_i) = \frac{|\mathcal{G}_i \cap \mathcal{P}_{i}|}{|\mathcal{G}_i \cup \mathcal{P}_{i}|}\]
\[\text{Precision}(\mathcal{G}_i, \mathcal{P}_i) = \frac{|\mathcal{G}_i \cap \mathcal{P}_{i}|}{|\mathcal{P}_{i}|}\]
\[\text{Recall}(\mathcal{G}_i, \mathcal{P}_i) = \frac{|\mathcal{G}_i \cap \mathcal{P}_{i}|}{|\mathcal{G}_i|}\]

Then, the metrics are averaged over all instance pairs of target instances and matched predicted instances:

\[\text{mIoU} = \frac{1}{N_G} \sum_{i=1}^{N_G} \text{IoU}(\mathcal{G}_i, \mathcal{P}_i)\]
\[\text{mPrecision} = \frac{1}{N_G} \sum_{i=1}^{N_G} \text{Precision}(\mathcal{G}_i, \mathcal{P}_i)\]
\[\text{mRecall} = \frac{1}{N_G} \sum_{i=1}^{N_G} \text{Recall}(\mathcal{G}_i, \mathcal{P}_i)\]
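The per-pair formulas above amount to simple set arithmetic on boolean masks. A minimal numpy sketch (the helper pair_metrics is hypothetical, not the library function):

```python
import numpy as np


def pair_metrics(target, prediction, target_id, predicted_id):
    """IoU / precision / recall for one ground-truth instance and its matched prediction."""
    g = target == target_id  # points of the ground-truth instance
    p = prediction == predicted_id  # points of the matched predicted instance
    intersection = np.count_nonzero(g & p)
    union = np.count_nonzero(g | p)
    return (
        intersection / union,
        intersection / np.count_nonzero(p),
        intersection / np.count_nonzero(g),
    )


# toy point cloud: instance IDs per point, -1 = not assigned to any instance
target = np.array([0, 0, 0, 1, 1, -1], dtype=np.int64)
prediction = np.array([0, 0, 1, 1, 1, -1], dtype=np.int64)

iou, precision, recall = pair_metrics(target, prediction, target_id=0, predicted_id=0)
# iou = 2/3, precision = 1.0, recall = 2/3
```

Averaging these values over all matched pairs yields mIoU, mPrecision, and mRecall.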
Parameters:
  • target (ndarray[tuple[Any, ...], dtype[int64]]) – Ground truth instance ID for each point.

  • prediction (ndarray[tuple[Any, ...], dtype[int64]]) – Predicted instance ID for each point.

  • matched_predicted_ids (ndarray[tuple[Any, ...], dtype[int64]]) – ID of the matched predicted instance for each ground-truth instance.

  • invalid_instance_id (int) – ID that is assigned to points not assigned to any instance / to instances that could not be matched. (default: -1)

  • include_unmatched_instances (bool) – Whether ground-truth instances that were not matched with a predicted instance should be included in the computation of the instance segmentation metrics. (default: True)

Raises:
  • ValueError – If target and prediction have different lengths.

  • ValueError – If the length of matched_predicted_ids is not equal to the number of ground-truth instances.

  • ValueError – If the unique target and predicted instance IDs don’t start with the same number.

Returns:

A tuple with two elements:
  • A dictionary containing the segmentation metrics averaged over all instance pairs. The dictionary contains the following keys: "MeanIoU", "MeanPrecision", and "MeanRecall".

  • A pandas.DataFrame containing the segmentation metrics for each instance pair. The dataframe contains the following columns: "TargetID", "PredictionID", "IoU", "Precision", "Recall".

Shape:
  • target: \((N)\)

  • prediction: \((N)\)

  • matched_predicted_ids: \((G)\)

where

\(N = \text{ number of points}\)
\(G = \text{ number of ground-truth instances}\)
pointtree.evaluation.instance_segmentation_metrics_per_partition(
xyz: ndarray[tuple[Any, ...], dtype[float32 | float64]],
target: ndarray[tuple[Any, ...], dtype[int64]],
prediction: ndarray[tuple[Any, ...], dtype[int64]],
matched_predicted_ids: ndarray[tuple[Any, ...], dtype[int64]],
partition: Literal['xy', 'z'],
include_unmatched_instances: bool = True,
invalid_instance_id: int = -1,
num_partitions: int = 10,
) → Tuple[DataFrame, DataFrame][source]

Calculates instance segmentation metrics for different spatial partitions of a tree instance as proposed in Henrich, Jonathan, et al. “TreeLearn: A Deep Learning Method for Segmenting Individual Trees from Ground-Based LiDAR Forest Point Clouds.” Ecological Informatics 84 (2024): 102888. The reasoning behind this is that not all parts of a tree are equally difficult to segment. For example, points near the trunk are usually easier to assign than points at the crown boundary, where there are many interactions with other trees. To quantify how well different tree parts are segmented, the points of the ground-truth tree instances and the corresponding predicted instances are partitioned into num_partitions subsets. The segmentation metrics are then calculated separately for each subset and averaged across all pairs of ground-truth instances and corresponding predicted instances.

Henrich et al. propose two axes for partitioning: (1) horizontal distance to the trunk, and (2) vertical distance to the forest ground. For the horizontal partition, the \(i\)-th subset contains all points with a horizontal distance to the ground-truth trunk between \(\frac{i-1}{N_p}\cdot r\) and \(\frac{i}{N_p}\cdot r\), where \(N_p\) is the number of partitions and \(r\) is the maximum distance to the trunk over all points in the ground-truth instance. For the vertical partition, the \(i\)-th subset contains all points with a vertical distance to the ground between \(\frac{i-1}{N_p}\cdot h\) and \(\frac{i}{N_p}\cdot h\), where \(N_p\) is the number of partitions and \(h\) is the height of the ground-truth tree instance. Points of a prediction that are farther away than \(r\) and \(h\) from the trunk and ground, respectively, are not taken into account in this part of the evaluation.
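The horizontal binning rule can be sketched as follows, assuming each point is assigned to one of \(N_p\) radial bins by its distance to the trunk position. This is a simplified illustration (the function name horizontal_partition and the trunk_xy argument are assumptions, not the library's API):

```python
import numpy as np


def horizontal_partition(xy, trunk_xy, num_partitions):
    """Assign each point to a radial bin by its horizontal distance to the trunk.

    Bin i (0-based) covers distances in [i/N_p * r, (i+1)/N_p * r), where r is
    the maximum distance among the instance's points.
    """
    dist = np.linalg.norm(xy - trunk_xy, axis=1)
    r = dist.max()
    bins = np.floor(dist / r * num_partitions).astype(np.int64)
    # the point at distance exactly r would land in bin N_p; clamp it to the last bin
    return np.minimum(bins, num_partitions - 1)


xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
bins = horizontal_partition(xy, trunk_xy=np.array([0.0, 0.0]), num_partitions=10)
# distances are [0, 1, 2, 5], so with r = 5 the bins are [0, 2, 4, 9]
```

The vertical partition follows the same pattern with the height above ground in place of the radial distance.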

Given pairs of ground-truth instances \(\mathcal{G}_i\) and matched predicted instances \(\mathcal{P}_i\), the following metrics are calculated for each partition:

\[\text{IoU}(\mathcal{G}_i, \mathcal{P}_i) = \frac{|\mathcal{G}_i \cap \mathcal{P}_{i}|}{|\mathcal{G}_i \cup \mathcal{P}_{i}|}\]
\[\text{Precision}(\mathcal{G}_i, \mathcal{P}_i) = \frac{|\mathcal{G}_i \cap \mathcal{P}_{i}|}{|\mathcal{P}_{i}|}\]
\[\text{Recall}(\mathcal{G}_i, \mathcal{P}_i) = \frac{|\mathcal{G}_i \cap \mathcal{P}_{i}|}{|\mathcal{G}_i|}\]

Then, the metrics are averaged over all instance pairs of target instances and matched predicted instances:

\[\text{mIoU} = \frac{1}{N_G} \sum_{i=1}^{N_G} \text{IoU}(\mathcal{G}_i, \mathcal{P}_i)\]
\[\text{mPrecision} = \frac{1}{N_G} \sum_{i=1}^{N_G} \text{Precision}(\mathcal{G}_i, \mathcal{P}_i)\]
\[\text{mRecall} = \frac{1}{N_G} \sum_{i=1}^{N_G} \text{Recall}(\mathcal{G}_i, \mathcal{P}_i)\]
Parameters:
  • xyz (ndarray[tuple[Any, ...], dtype[float32 | float64]]) – Coordinates of all points.

  • target (ndarray[tuple[Any, ...], dtype[int64]]) – Ground truth instance ID for each point.

  • prediction (ndarray[tuple[Any, ...], dtype[int64]]) – Predicted instance ID for each point.

  • matched_predicted_ids (ndarray[tuple[Any, ...], dtype[int64]]) – ID of the matched predicted instance for each ground-truth instance.

  • partition (Literal['xy', 'z']) – Partitioning scheme to be used: "xy" (horizontal) or "z" (vertical).

  • include_unmatched_instances (bool) – Whether ground-truth instances that were not matched with a predicted instance should be included in the computation of the instance segmentation metrics. (default: True)

  • invalid_instance_id (int) – ID that is assigned to points not assigned to any instance / to instances that could not be matched. (default: -1)

  • num_partitions (int) – Number of partitions. (default: 10)

Raises:
  • ValueError – If partition is set to an invalid value.

  • ValueError – If xyz and target have different lengths.

  • ValueError – If target and prediction have different lengths.

  • ValueError – If the length of matched_predicted_ids is not equal to the number of ground-truth instances.

  • ValueError – If the unique target and predicted instance IDs don’t start with the same number.

Returns:

A tuple of two pandas.DataFrames:
  • Segmentation metrics for each partition averaged over all instance pairs. The dataframe contains the following columns: "Partition", "MeanIoU", "MeanPrecision", and "MeanRecall".

  • Segmentation metrics for each partition and each instance pair. The dataframe contains the following columns: "Partition", "TargetID", "PredictionID", "IoU", "Precision", "Recall".

Shape:
  • xyz: \((N, 3)\)

  • target: \((N)\)

  • prediction: \((N)\)

  • matched_predicted_ids: \((G)\)

where

\(N = \text{ number of points}\)
\(G = \text{ number of ground-truth instances}\)
pointtree.evaluation.match_instances(
target: ndarray[tuple[Any, ...], dtype[int64]],
prediction: ndarray[tuple[Any, ...], dtype[int64]],
xyz: ndarray[tuple[Any, ...], dtype[float32 | float64]],
method: Literal['panoptic_segmentation', 'point2tree', 'for_instance', 'for_ai_net', 'for_ai_net_coverage', 'tree_learn'],
*,
invalid_instance_id: int = -1,
uncertain_instance_id: int = -2,
min_tree_height_fp: float = 0.0,
min_precision_fp: float = 0.0,
labeled_mask: ndarray | None = None,
) → Tuple[ndarray[tuple[Any, ...], dtype[int64]], ndarray[tuple[Any, ...], dtype[int64]], Dict[str, ndarray[tuple[Any, ...], dtype[int64]]]][source]

Matches ground-truth and predicted instances. This function implements several instance matching methods proposed in the literature; the matching strategy is selected via the method parameter.

Parameters:
  • target (ndarray[tuple[Any, ...], dtype[int64]]) – Target instance ID for each point. Instance IDs must be integers forming a continuous range. The smallest instance ID in target must be equal to the smallest instance ID in prediction.

  • prediction (ndarray[tuple[Any, ...], dtype[int64]]) – Predicted instance ID for each point. Instance IDs must be integers forming a continuous range. The smallest instance ID in prediction must be equal to the smallest instance ID in target.

  • xyz (ndarray[tuple[Any, ...], dtype[float32 | float64]]) – Coordinates of all points.

  • method (Literal['panoptic_segmentation', 'point2tree', 'for_instance', 'for_ai_net', 'for_ai_net_coverage', 'tree_learn']) – Instance matching method to be used.

  • invalid_instance_id (int) – ID that is used as label for points not assigned to any instance in target and prediction. In the returned instance matchings, the matched instance ID is set to invalid_instance_id for target instances that were not matched with any predicted instance and for predicted instances that were not matched with any target instance and are considered as false positives according to min_tree_height_fp and min_precision_fp. (default: -1)

  • uncertain_instance_id (int) – ID that is used to mark predicted instances that were not matched with any target instance but are not counted as false positives according to min_tree_height_fp or min_precision_fp. Must be equal to or smaller than invalid_instance_id. (default: -2)

  • min_tree_height_fp (float) – Minimum height an unmatched predicted tree instance must have in order to be counted as a false positive. The height of a tree is defined as the maximum distance between its points and a digital terrain model. If a predicted tree instance could not be matched with any target instance but its height is below min_tree_height_fp, its matched instance ID is set to uncertain_instance_id. (default: 0.0)

  • min_precision_fp (float) – Minimum percentage of points of an unmatched predicted instance that must be labeled in order to count the predicted instance as a false positive. If labeled_mask is not None, the points for which the mask is True are considered as labeled points. If labeled_mask is None, all points that are labeled as instances are considered as labeled points (i.e., points labeled with invalid_instance_id are considered unlabeled). If a predicted instance could not be matched with any target instance but its percentage of labeled points is below min_precision_fp, its matched instance ID is set to uncertain_instance_id. (default: 0.0)

  • labeled_mask (ndarray | None) – Boolean mask indicating which points are labeled. This mask is used to mark false positive instances that mainly consist of unlabeled points. (default: None)

Returns: A tuple with the following elements:
  • matched_target_ids: IDs of the matched target instance for each predicted instance. If the predicted instance is not matched to any target instance, its entry is set to either invalid_instance_id (false positive) or uncertain_instance_id (uncertain predicted instance).

  • matched_predicted_ids: IDs of the matched predicted instance for each target instance. If the target instance is not matched to any predicted instance, its entry is set to invalid_instance_id.

  • metrics: Dictionary with the keys "tp", "fp", "fn". The values are tensors whose length is equal to the number of target instances and that contain the number of true positive, false positive, and false negative points between the matched instances. For target instances not matched to any prediction, the true and false positives are set to zero and the false negatives to the number of target points.

Shape:
  • xyz: \((N, 3)\)

  • target: \((N)\)

  • prediction: \((N)\)

  • Output:
    • matched_target_ids: \((P)\)

    • matched_predicted_ids: \((T)\)

    • metrics: Dictionary whose values are tensors of length \((T)\)

where

\(N\) = number of points
\(P\) = number of predicted instances
\(T\) = number of target instances
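Under the common panoptic-segmentation convention, a predicted instance is matched to a ground-truth instance when their point-wise IoU exceeds 0.5, which makes the matching unique. A simplified numpy sketch of such a matching, assuming 0-based contiguous instance IDs and ignoring the min_tree_height_fp / min_precision_fp refinements (the function match_by_iou is an illustration, not the library's 'panoptic_segmentation' implementation):

```python
import numpy as np


def match_by_iou(target, prediction, num_target, num_predicted, invalid_instance_id=-1):
    """Match each predicted instance to the target instance with IoU > 0.5, if any."""
    matched_target_ids = np.full(num_predicted, invalid_instance_id, dtype=np.int64)
    matched_predicted_ids = np.full(num_target, invalid_instance_id, dtype=np.int64)
    for p in range(num_predicted):
        p_mask = prediction == p
        for t in range(num_target):
            t_mask = target == t
            iou = np.count_nonzero(p_mask & t_mask) / np.count_nonzero(p_mask | t_mask)
            if iou > 0.5:  # IoU > 0.5 allows at most one match per instance
                matched_target_ids[p] = t
                matched_predicted_ids[t] = p
    return matched_target_ids, matched_predicted_ids


target = np.array([0, 0, 0, 1, 1, -1], dtype=np.int64)
prediction = np.array([0, 0, 1, 1, 1, -1], dtype=np.int64)
matched_target_ids, matched_predicted_ids = match_by_iou(target, prediction, 2, 2)
# both predicted instances exceed IoU 0.5 with their counterparts:
# matched_target_ids = [0, 1], matched_predicted_ids = [0, 1]
```

The other matching methods differ in the matching criterion (e.g., coverage instead of IoU) and in how unmatched predictions are handled, but produce outputs of the same shape.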
pointtree.evaluation.semantic_segmentation_metrics(
target: ndarray,
prediction: ndarray,
class_map: Dict[str, int],
aggregate_classes: Dict[str, List[int]] | None = None,
) → Dict[str, float][source]

Calculates semantic segmentation metrics.

Parameters:
  • target (ndarray) – Ground truth semantic class IDs for each point.

  • prediction (ndarray) – Predicted semantic class IDs for each point.

  • class_map (Dict[str, int]) – A dictionary mapping class names to numeric class IDs.

  • aggregate_classes (Dict[str, List[int]] | None) – A dictionary with which aggregations of classes can be defined. The keys are the names of the aggregated classes and the values are lists of the IDs of the classes to be aggregated. (default: None)

Returns:

A dictionary containing the following keys for each semantic class: "<class_name>IoU", "<class_name>Precision", "<class_name>Recall". For each aggregated class, the keys "<class_name>IoUAggregated", "<class_name>PrecisionAggregated", "<class_name>RecallAggregated" are provided.
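The per-class metrics can be sketched with numpy, keyed as in the documented return value. This is a minimal illustration of the computation (ignoring the aggregate_classes option), not the library's implementation:

```python
import numpy as np


def per_class_metrics(target, prediction, class_map):
    """Per-class IoU / precision / recall from point-wise class labels."""
    metrics = {}
    for name, class_id in class_map.items():
        t = target == class_id
        p = prediction == class_id
        intersection = np.count_nonzero(t & p)
        metrics[f"{name}IoU"] = intersection / np.count_nonzero(t | p)
        metrics[f"{name}Precision"] = intersection / np.count_nonzero(p)
        metrics[f"{name}Recall"] = intersection / np.count_nonzero(t)
    return metrics


target = np.array([0, 0, 1, 1], dtype=np.int64)
prediction = np.array([0, 1, 1, 1], dtype=np.int64)
metrics = per_class_metrics(target, prediction, {"tree": 0, "ground": 1})
# treeIoU = 0.5, treePrecision = 1.0, treeRecall = 0.5
# groundIoU = 2/3, groundPrecision = 2/3, groundRecall = 1.0
```

An aggregated class would be handled the same way, with the boolean masks formed by np.isin over the list of member class IDs.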