Automatic Classification

The automatic classification module groups rupture planes into clusters representing distinct fault sets. Rupture planes with similar orientations and close spatial proximity are assigned to the same active fault, using orientation-based clustering followed by optional spatial sub-clustering. For details, see the original HyFI publication (Truttmann et al., 2023).

Core Concepts

Fisher Distribution

The von Mises-Fisher distribution models directional data on the unit sphere. Its concentration parameter κ measures how tightly the rupture-plane normal vectors cluster around their mean direction:

κ = 0: Uniform distribution (no preferred direction)
κ = 10: Moderate clustering
κ > 100: Very tight clustering
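To make κ concrete, here is a minimal sketch that estimates it for a set of unit normal vectors with the classic Fisher estimator κ ≈ (N - 1)/(N - R), where N is the number of vectors and R is the length of their vector sum. The function name is hypothetical, and HyFI may estimate κ differently (for example by maximum likelihood inside the mixture model).

```python
import numpy as np


def fisher_kappa(normals):
    """Estimate the Fisher concentration kappa from N unit vectors (N x 3).

    Illustrative sketch using the classic estimator kappa = (N - 1) / (N - R),
    where R is the length of the resultant (summed) vector.
    """
    v = np.asarray(normals, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)  # enforce unit length
    n = len(v)
    r = np.linalg.norm(v.sum(axis=0))  # resultant length, 0 <= R <= N
    if n - r <= 1e-12:
        return float("inf")  # all vectors identical: perfectly concentrated
    return (n - 1) / (n - r)
```

Vectors spread evenly in all directions give R close to 0 and a κ near 1 or below, while nearly parallel normals push N - R toward 0 and κ well above 100.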

Computational Workflow

The auto classification module follows a structured multi-stage pipeline:

Step 1: Data Validation & Preparation

Check if the required fault plane data is available:

Check required columns: rupt_plane_azi, rupt_plane_dip
Filter valid data: Remove events with NaN fault plane parameters
Output: Subset DataFrame with only valid fault planes for clustering

Quality gate: Skip classification if fewer than 3 valid events
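A minimal pandas sketch of this step. The function name select_valid_planes is hypothetical, and returning None to signal "skip classification" is an assumption about how the quality gate might be surfaced. The filtered subset keeps the original DataFrame index.

```python
import pandas as pd

REQUIRED_COLUMNS = ["rupt_plane_azi", "rupt_plane_dip"]
MIN_VALID_EVENTS = 3  # quality gate: fewer valid events -> skip classification


def select_valid_planes(df: pd.DataFrame):
    """Return the subset of events with complete fault plane data, or None to skip."""
    if any(col not in df.columns for col in REQUIRED_COLUMNS):
        return None  # required fault plane columns are not available
    valid = df.dropna(subset=REQUIRED_COLUMNS)  # drop events with NaN azimuth or dip
    if len(valid) < MIN_VALID_EVENTS:
        return None  # too few valid rupture planes to cluster
    return valid  # original index is preserved for mapping results back
```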

Step 2: Enhanced Point Cloud Generation (Optional, RECOMMENDED)

If enabled (use_fault_plane_points_for_clustering: true), generate additional synthetic points (“hypocenters”) along the calculated rupture planes to densify the relatively sparse hypocenter catalog. For each valid rupture plane:

Generate a circular rupture plane with a magnitude-based radius (Leonard, 2014)
Create concentric rings at fixed radial intervals set by fault_plane_radius_interval_meters (spacing between rings, typically 10-25 m)
Fill the rings with points at the density set by fault_plane_point_density_meters (point spacing along each ring, typically 10-25 m)
Keep a mapping from each synthetic point to the index of its source rupture plane

This produces a new point cloud, the “enhanced point cloud”, with ~100-1000× more points than the original catalog while preserving the full 3D rupture plane geometry. This can improve both the spatial separation of nearby faults and the later interpolation.
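The ring-and-fill sampling can be sketched as below. Several assumptions are not taken from HyFI: rupt_plane_azi is treated as the dip direction, coordinates are in meters with x = east, y = north, z = up, the disk is centered on the hypocenter, and the radius is passed in directly rather than computed from the Leonard (2014) scaling relation. The function name is hypothetical.

```python
import numpy as np


def rupture_disk_points(center, dip_dir_deg, dip_deg, radius, ring_spacing, point_spacing):
    """Sample points on a circular rupture plane (illustrative sketch).

    Assumes x = east, y = north, z = up and a plane given by dip direction and dip.
    """
    a, d = np.radians(dip_dir_deg), np.radians(dip_deg)
    # Two orthogonal in-plane unit vectors: horizontal along strike and down-dip.
    along_strike = np.array([-np.cos(a), np.sin(a), 0.0])
    down_dip = np.array([np.sin(a) * np.cos(d), np.cos(a) * np.cos(d), -np.sin(d)])
    center = np.asarray(center, dtype=float)
    points = [center]  # radius-0 "ring" is the hypocenter itself
    for r in np.arange(ring_spacing, radius + 1e-9, ring_spacing):
        # Number of points on this ring so that neighbors are ~point_spacing apart.
        n = max(1, int(round(2 * np.pi * r / point_spacing)))
        theta = 2 * np.pi * np.arange(n) / n
        ring = center + r * (np.outer(np.cos(theta), along_strike)
                             + np.outer(np.sin(theta), down_dip))
        points.extend(ring)
    return np.vstack(points)
```

To keep the mapping to the source fault, each plane's points can be tagged with its index, for example np.full(len(pts), fault_idx), before stacking the points of all planes.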

Step 3: Hemispherical Consistency

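This step ensures that all fault plane normal vectors point into the same hemisphere. Because a normal n and its opposite -n describe the same plane, mixed signs would otherwise make the clustering ambiguous.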

Reference vector: Take the first fault plane normal as the reference
Flip if needed: For each other normal, check whether its angular distance to the reference exceeds 90°
Correction: If it does, negate all components (x, y, z) → (-x, -y, -z)
Result: All vectors lie in the same hemisphere, consistent for clustering
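Since an angular distance above 90° is equivalent to a negative dot product with the reference, the flip reduces to a sign test. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np


def align_hemisphere(normals):
    """Flip normals so that all lie in the hemisphere of the first one."""
    v = np.asarray(normals, dtype=float).copy()
    ref = v[0]
    # Angular distance > 90 deg to the reference <=> negative dot product.
    flip = v @ ref < 0
    v[flip] *= -1.0  # (x, y, z) -> (-x, -y, -z)
    return v, flip
```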

Step 4: Automatic Cluster Number Determination (Optional)

If enabled, this step determines the optimal number of orientation clusters automatically:

Test range: Evaluate k = 2 to max_clusters
Scoring for each k:
- Apply clustering algorithm
- Calculate Fisher concentration (κ) for each cluster
- Compute the silhouette score or dispersion metric
Selection strategy:
- Choose k with highest score
- Penalize k > 3 to favor simpler solutions
- Only accept a more complex solution if its score is more than 40% better than that of simpler alternatives
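The selection rule can be sketched as follows. The section does not define the exact penalty, so this hypothetical choose_k assumes positive scores (higher is better) and interprets the rule as: a k above 3 is accepted only if its score exceeds the best score among k ≤ 3 by more than 40%. Scores per k could come, for example, from scikit-learn's silhouette_score with metric="cosine".

```python
def choose_k(scores, simple_max_k=3, min_improvement=0.40):
    """Pick a cluster count from {k: score}, favoring k <= simple_max_k.

    Hypothetical interpretation of the selection strategy; assumes k = 2 is
    always evaluated and that all scores are positive.
    """
    simple = {k: s for k, s in scores.items() if k <= simple_max_k}
    best_simple_k = max(simple, key=simple.get)
    best_k = max(scores, key=scores.get)
    if best_k <= simple_max_k:
        return best_k
    # Accept the complex solution only if it clearly beats the best simple one.
    if scores[best_k] > (1.0 + min_improvement) * scores[best_simple_k]:
        return best_k
    return best_simple_k
```

For example, scores of 0.50, 0.55 and 0.60 for k = 2, 3, 4 yield k = 3, because 0.60 is not more than 40% above 0.55.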

Step 5: Orientation Clustering

Group fault planes by similar orientations using one of the following directional clustering algorithms:

A. Spherical K-Means (SKM)

Optimizes cluster centers on unit sphere
Minimizes within-cluster angular distances
Fast, deterministic (seeded)
Good for roughly equal-sized clusters
Convergence: max_iter=300, tol=1e-4 (default)
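A minimal spherical k-means sketch in NumPy, assuming the unit normals were already aligned to one hemisphere (Step 3). It mirrors the listed defaults (max_iter=300, tol=1e-4) and seeds the random initialization, but the function is hypothetical rather than HyFI's own implementation; here tol is compared against the largest change in 1 - cos between old and new centers.

```python
import numpy as np


def spherical_kmeans(x, k, max_iter=300, tol=1e-4, seed=0):
    """Cluster unit vectors by cosine similarity; returns (labels, centers)."""
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)  # seeded -> reproducible initialization
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each vector to the center with the highest cosine similarity.
        labels = np.argmax(x @ centers.T, axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members):
                m = members.sum(axis=0)
                new_centers[j] = m / np.linalg.norm(m)  # project mean back onto sphere
        shift = np.max(1.0 - np.sum(centers * new_centers, axis=1))
        centers = new_centers
        if shift < tol:
            break
    labels = np.argmax(x @ centers.T, axis=1)
    return labels, centers
```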

B. Von Mises-Fisher Soft (VMF Soft)

Probabilistic mixture model on sphere
Assigns soft membership probabilities
Better for overlapping clusters
Fisher concentration (κ) parameter auto-tuned
Convergence: max_iter=300, tol=1e-6 (default)

C. Von Mises-Fisher Hard (VMF Hard)

Hard-assignment variant of the VMF mixture model
Hard cluster assignments (exactly one cluster per event)
Better for distinct clusters
Convergence: max_iter=300, tol=1e-6 (default)
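Both VMF variants rest on the same membership computation: the probability that a normal x belongs to cluster j is proportional to w_j C3(κ_j) exp(κ_j μ_j · x), where C3(κ) = κ / (4π sinh κ) is the normalizing constant on the unit sphere. The sketch below computes these soft memberships (as in VMF Soft); taking the argmax of each row gives a hard assignment (as in VMF Hard). Weights w_j, mean directions μ_j and κ_j are taken as given; the step that re-estimates them, and HyFI's actual implementation, are not shown, and the function names are hypothetical.

```python
import numpy as np


def log_vmf_norm_const(kappa):
    """log C3(kappa) for the vMF (Fisher) density on the unit sphere, kappa > 0."""
    kappa = np.asarray(kappa, dtype=float)
    # log(sinh k) = k + log(1 - exp(-2k)) - log 2, stable for large kappa
    log_sinh = kappa + np.log1p(-np.exp(-2.0 * kappa)) - np.log(2.0)
    return np.log(kappa) - np.log(4.0 * np.pi) - log_sinh


def vmf_responsibilities(x, means, kappas, weights):
    """Soft memberships r[i, j] = P(cluster j | x_i) under a vMF mixture."""
    x, means = np.asarray(x, dtype=float), np.asarray(means, dtype=float)
    kappas, weights = np.asarray(kappas, dtype=float), np.asarray(weights, dtype=float)
    log_p = np.log(weights) + log_vmf_norm_const(kappas) + kappas * (x @ means.T)
    log_p -= log_p.max(axis=1, keepdims=True)  # log-sum-exp trick for stability
    r = np.exp(log_p)
    return r / r.sum(axis=1, keepdims=True)
```

A hard labeling then follows from vmf_responsibilities(...).argmax(axis=1).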

This generates orientation cluster labels (0, 1, 2, …, n_clusters-1) in the DataFrame column orient_cluster.

Step 6: Spatial Sub-Clustering (Optional)

Step 5 grouped the rupture planes into sets of similar orientation. This step can further split spatially distinct groups of rupture planes within the same orientation set into separate fault segments. Spatial clustering can use the original hypocenter locations alone or also include the enhanced point cloud, if it was generated in Step 2 (use_fault_plane_points_for_clustering: true).

This step writes the spatial sub-cluster ID within each orientation cluster to the column spatial_cluster, and a temporary global fault identifier combining both IDs to final_cluster_id (e.g., “0.1”, “0.2”, “1.0”).
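This section does not state which spatial clustering algorithm HyFI uses. As a stand-in, the sketch below splits each orientation cluster into groups of points connected through neighbors closer than a hypothetical max_dist threshold (single-linkage connected components), then composes final_cluster_id as "<orient>.<spatial>". Density-based methods such as DBSCAN are a common alternative. If enhanced points are used, their labels would still have to be transferred back to events through the source fault index, which is not shown.

```python
import numpy as np


def distance_components(points, max_dist):
    """Label connected components linking points closer than max_dist (O(n^2))."""
    p = np.asarray(points, dtype=float)
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    adjacent = d <= max_dist
    labels = np.full(len(p), -1)
    current = 0
    for start in range(len(p)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first flood fill over the neighbor graph
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i] & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def final_cluster_ids(orient, points, max_dist):
    """Spatial sub-clusters within each orientation cluster, plus combined IDs."""
    orient = np.asarray(orient)
    points = np.asarray(points, dtype=float)
    spatial = np.zeros(len(orient), dtype=int)
    for o in np.unique(orient):
        idx = np.flatnonzero(orient == o)
        spatial[idx] = distance_components(points[idx], max_dist)
    final = [f"{o}.{s}" for o, s in zip(orient, spatial)]
    return spatial, final
```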

Step 7: Post-Clustering Quality Control

This step enforces a minimum cluster size: clusters containing fewer events than min_events_per_cluster are removed as spurious, which prevents over-segmentation.
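A minimal pandas sketch of the size filter. What HyFI does with events in removed clusters is not specified here; this hypothetical drop_small_clusters assumes they become unassigned (NaN).

```python
import pandas as pd


def drop_small_clusters(labels: pd.Series, min_events_per_cluster: int) -> pd.Series:
    """Set labels of clusters smaller than the threshold to NaN (assumed behavior)."""
    counts = labels.map(labels.value_counts())  # cluster size for each event
    return labels.where(counts >= min_events_per_cluster)
```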

Step 8: Results Mapping & Output

Finally, the clustering results are mapped back to the full DataFrame df_hyfi.
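Assuming the subset DataFrame from Step 1 retains the original index of df_hyfi, mapping back can rely on pandas index alignment. A minimal sketch (map_back is a hypothetical name); events that were filtered out or not clustered receive NaN:

```python
import pandas as pd


def map_back(df_hyfi: pd.DataFrame, results: pd.DataFrame,
             columns=("orient_cluster", "spatial_cluster", "final_cluster_id")) -> pd.DataFrame:
    """Copy clustering columns from the valid-event subset onto the full catalog."""
    out = df_hyfi.copy()
    for col in columns:
        # reindex aligns on the event index; missing events get NaN
        out[col] = results[col].reindex(out.index)
    return out
```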

Main Outputs

The following columns are added to the HyFI_results.csv output file:

Orientation Clustering:

orient_cluster: Fault system ID based on orientation

Spatial Clustering (if enabled):

spatial_cluster: Sub-cluster within each orientation group

Final Fault Identifiers:

final_cluster_id: Global fault system identifier
final_cluster_id_local: Sequence-specific fault identifier

Metadata:

sequence_label: Which sequence this event belongs to (multi-sequence context)
segmentation_level: Hierarchical level (A, B, C, etc.)

References

Truttmann, S., Diehl, T., & Herwegh, M. (2023). Hypocenter-based 3D imaging of active faults: Method and applications in the Southwestern Swiss Alps. Journal of Geophysical Research: Solid Earth, 128, e2023JB026352. https://doi.org/10.1029/2023JB026352

Happy fault imaging! 🎉