FXC-5747: Avoid unnecessary deep copies in monitor and unstructured data#3303
FXC-5747: Avoid unnecessary deep copies in monitor and unstructured data#3303momchil-flex wants to merge 2 commits intodevelopfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes performance by avoiding unnecessary deep copies in data update operations across monitor data, simulation data, and unstructured grid data. The changes add deep=False to copy/updated_copy calls where update values are already independent objects, and validate=False where updates are already the correct pydantic type. Additionally, it replaces symmetry_expanded_copy with symmetry_expanded for read-only access patterns.
Changes:
- Add
deep=Falseto copy operations where update values are new independent objects - Add
validate=Falsewhere update values are already correctly typed (TCAD symmetry expansion, unstructured grid operations, sim data with model-typed updates) - Replace
symmetry_expanded_copywithsymmetry_expandedfor read-only field access - Document the safety constraint regarding xarray arithmetic operations losing subclass types
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
data/monitor_data.py |
Adds deep=False to various copy operations (normalize, time_reversed_copy, translated_copy, mode operations, flux/projection normalize); switches to symmetry_expanded for read-only access |
data/sim_data.py |
Adds deep=False, validate=False to simulation renormalize operations |
data/unstructured/base.py |
Adds deep=False, validate=False to unstructured data operations (rename, clean, arithmetic, real/imag/abs, sel, isel, interpolation) |
data/unstructured/triangular.py |
Adds deep=False, validate=False to triangular grid reflection operations |
eme/data/sim_data.py |
Adds deep=False, validate=False to EME field reconstruction |
mode/data/sim_data.py |
Adds deep=False, validate=False to mode sorting operations |
tcad/data/monitor_data/abstract.py |
Adds deep=False, validate=False to TCAD symmetry expansion |
tcad/data/monitor_data/charge.py |
Adds deep=False, validate=False to capacitance data symmetry expansion |
components/medium.py |
Changes self.model_fields to type(self).model_fields for correct class attribute access |
docs/notebooks |
Updates subproject commit reference |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add deep=False to copy/updated_copy calls where update values are already new objects, avoiding redundant deep copies of large data arrays. Also add validate=False where update values are already the correct pydantic type (e.g. TCAD symmetry expansion, unstructured grid operations). Notably, validate=False is NOT safe when xarray arithmetic produces plain DataArray objects that need pydantic coercion back to tidy3d subclasses. Key changes: - TCAD HeatChargeMonitorData/SteadyCapacitanceData symmetry expansion - UnstructuredGridDataset operations (rename, clean, arithmetic, sel, isel) - TriangularGridDataset.reflect for normal-axis reflection_only - Various MonitorData methods (normalize, time_reversed_copy, etc.) - SimulationData.renormalize, EME/Mode sim data operations - Use symmetry_expanded instead of symmetry_expanded_copy for read-only access Co-authored-by: Cursor <cursoragent@cursor.com>
15c8e94 to
c279242
Compare
|
@yaugenst could you take a look especially at this comment I'm adding to the base model copy method? One thing this is pointing to is that it would be nice if we can split type coercion "validation" from our custom validators, so we can at least avoid running the latter in some instances. |
- Remove validate=False from unstructured base.py updated_copy calls (xarray ops return plain DataArray, losing IndexedDataArray). - Remove validate=False from EME sim_data (drop_vars loses subclass). - Remove validate=False from SimulationData.renormalize (restores _check_normalize_index validator). - Add deep/validate safety guidance to Tidy3dBaseModel.copy() docstring. - Remove redundant comment from monitor_data.py. Co-authored-by: Cursor <cursoragent@cursor.com>
99596d7 to
4d88630
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| return self.updated_copy(symmetry=(0, 0, 0), **new_field_components) | ||
| return self.updated_copy( | ||
| symmetry=(0, 0, 0), **new_field_components, deep=False, validate=False | ||
| ) |
There was a problem hiding this comment.
Unsafe validate=False with xarray-derived SpatialDataArray values
Medium Severity
The symmetry_expanded_copy now uses validate=False, but _symmetry_expanded_copy_base processes SpatialDataArray field components through xarray operations (.sortby() inside SpatialDataArray.reflect and .isel() inside sel_inside) which, per the PR's own documentation in base.py, lose the tidy3d DataArray subclass. Without validation, those plain xr.DataArray objects are stored directly instead of being coerced back to SpatialDataArray. This affects TemperatureData and SteadyPotentialData which accept SpatialDataArray fields.
Diff CoverageDiff: origin/develop...HEAD, staged and unstaged changes
Summary
tidy3d/components/data/monitor_data.pyLines 654-662 654 if eig_val < 0:
655 update[field_name] = field * self.grid_dual_correction
656 else:
657 update[field_name] = field * self.grid_primal_correction
! 658 return field_data.copy(deep=False, update=update)
659
660 @property
661 def intensity(self) -> ScalarFieldDataArray:
662 """Return the sum of the squared absolute electric field components."""tidy3d/components/data/unstructured/base.pyLines 432-440 432 result = getattr(ufunc, method)(*inputs, **kwargs)
433
434 if type(result) is tuple:
435 # multiple return values
! 436 return tuple(self.updated_copy(values=x, deep=False) for x in result)
437 elif method == "at":
438 # no return value
439 return None
440 else: |


Summary
deep=Falsetocopy/updated_copycalls across monitor data, simulation data, and unstructured grid data where update values are already new independent objects, avoiding redundant deep copies of potentially large data arrays.validate=Falsewhere update values are already the correct pydantic type (TCAD symmetry expansion, unstructured grid operations, sim data with model-typed updates).symmetry_expanded_copywithsymmetry_expandedin two read-only call sites inFieldData(to_sourceand_make_adjoint_sources).Important note on
validate=Falsevalidate=Falseis not safe when update values come from xarray arithmetic/operations (e.g.field / amps,np.conj(field),field * factor,field.isel(...)) because these produce plainxarray.DataArrayobjects that lose the tidy3d subclass (e.g.ScalarFieldDataArray). Pydantic validation is needed to coerce them back. A comment documenting this has been added nearElectromagneticFieldData.time_reversed_copy.Files changed
data/monitor_data.pydeep=Falseon normalize, time_reversed_copy, grid_corrected_copy, translated_copy, mode reorder/subset/interp, flux/projection normalize;symmetry_expandedfor read-only accessdata/sim_data.pydeep=False, validate=Falseonrenormalizedata/unstructured/base.pydeep=False, validate=Falseon rename, clean, arithmetic, real/imag/abs, sel, isel, non-spatial interpdata/unstructured/triangular.pydeep=False, validate=Falseon reflect (normal-axis, reflection_only)eme/data/sim_data.pydeep=False, validate=Falseon EME field reconstructionmode/data/sim_data.pydeep=False, validate=Falseon sort_modestcad/data/monitor_data/abstract.pydeep=False, validate=Falseon symmetry_expanded_copytcad/data/monitor_data/charge.pydeep=False, validate=Falseon SteadyCapacitanceData.symmetry_expanded_copyTest plan
tests/test_data/test_monitor_data.py— 352 passedtests/test_data/test_sim_data.py— 84 passedtests/test_components/test_heat_charge.py— 57 passedtests/test_components/test_heat.py— 20 passedtests/test_components/test_eme.py— 18 passedtests/test_eme.py(numerical) — 20 passedMade with Cursor
Note
Medium Risk
Touches many copy/update pathways for core data containers; incorrect
deep=False/validate=Falseusage could introduce shared-mutation bugs or type-coercion/validator regressions, though changes are largely performance-oriented.Overview
Reduces memory/time overhead when creating modified copies of large simulation datasets by switching many
copy()/updated_copy()call sites todeep=False(and selectivelyvalidate=False) across monitor data, simulation data, and unstructured grid data.Also updates
Tidy3dBaseModel.copydocstring to clarify safety tradeoffs fordeepandvalidate, and replaces two read-only uses ofsymmetry_expanded_copywithsymmetry_expandedinFieldDatato avoid extra copying when only accessing expanded fields.Written by Cursor Bugbot for commit 4d88630. This will update automatically on new commits. Configure here.