Skip to content

Improve the overall speed and memory use of libCellML#1372

Open
agarny wants to merge 12 commits intocellml:mainfrom
agarny:issue1368
Open

Improve the overall speed and memory use of libCellML#1372
agarny wants to merge 12 commits intocellml:mainfrom
agarny:issue1368

Conversation

@agarny
Copy link
Copy Markdown
Contributor

@agarny agarny commented Apr 14, 2026

Fixes #1368.

agarny added 3 commits April 9, 2026 13:54
Note that the test is disabled since it performs repeated parse/analyse/generate loops, which take several minutes to complete. So, it should be enabled only when needed, and not as part of our regular test suite.
... rather than the generic areEquivalentVariables() method which doesn't include caching.
Indeed, in our WebAssembly module, the key was 32-bit (while 64-bit in C++/Python) which caused collisions and incorrect cache hits.

So, we replaced the Cantor-pairing key and std::map-based cache with a typed VariableKeyPair and std::unordered_map. This makes the cached-equivalence lookup more explicit, portable, and efficient.
Copilot AI review requested due to automatic review settings April 14, 2026 06:26
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR focuses on improving libCellML runtime performance and memory usage (issue #1368) by reducing allocations/copies, introducing caching, and replacing some heavier utilities/containers with lighter alternatives.

Changes:

  • Reduced copying/allocation across core code paths (e.g., reserve() on hot strings/vectors, fewer temporary objects, std::unordered_set/std::unordered_map in places).
  • Added/expanded caching for variable equivalence and internal/analyser variable lookup to speed up repeated queries.
  • Reworked some XML/Math handling to avoid std::regex overhead (e.g., printer math normalisation and libxml error string cleanup), and updated coverage tests accordingly.

Reviewed changes

Copilot reviewed 57 out of 59 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
CMakeLists.txt Sets global C++ standard to C++20.
src/CMakeLists.txt Adds CODE_COVERAGE_ENABLED define for coverage builds.
src/analyser.cpp Performance-oriented refactors (caching, sets, reduced repeated work) and MathML element name comparisons.
src/analyser_p.h Adds caches/sets to accelerate internal analysis bookkeeping.
src/analyserexternalvariable.cpp Minor lambda capture optimization; dependencies accessor returns by reference.
src/analyserequation.cpp Reduces allocations via reserve(); several accessors return by reference.
src/analyserequationast.cpp Returns AST value by reference to avoid copies.
src/analysermodel.cpp Adds caches for analyser variable lookup and variable equivalence checks; returns vectors by reference with static empties for invalid models.
src/analysermodel_p.h Replaces equivalence cache keying structure with hashed pointer-pair key.
src/analyservariable.cpp Reserves capacity when materializing weak dependencies.
src/annotator.cpp Reduces string allocations and uses std::format for some formatting.
src/commonutils.cpp Simplifies/optimizes owningModel() by walking parent chain directly.
src/component.cpp Avoids copies in lambdas; math() returns by reference.
src/componententity.cpp Avoids copies in lambdas; encapsulationId() returns by reference.
src/debug.cpp Reserves output string capacity.
src/entity.cpp id() returns by reference.
src/generator_p.h Updates generator internals to use sets and addCode() helper.
src/generatorprofile.cpp Many getters return by reference to avoid string copies.
src/generatorprofiletools.cpp Reserves buffer for profile serialization.
src/generatorvariabletracker.cpp Avoids intermediate vector copies and reserves capacity.
src/importedentity.cpp importReference() returns by reference.
src/importer.cpp Uses unordered containers and iterator-based lookups to reduce repeated map searches/copies.
src/importsource.cpp url() returns by reference.
src/internaltypes.h Switches several internal maps to unordered variants for speed.
src/issue.cpp description() returns by reference.
src/model.cpp Avoids copies in lambdas; uses set to avoid duplicate import URLs.
src/namedentity.cpp name() returns by reference.
src/parser.cpp Avoids string copies; reserves description capacity.
src/printer.cpp Replaces regex-based Math normalization/whitespace cleanup; reduces duplicates via sets; adds reserves.
src/reset.cpp Reset value/id getters return by reference.
src/units.cpp Avoids temporary copies for unit attributes and updates by in-place mutation.
src/utilities.cpp Reserves in hot helpers; uses std::format for numeric conversion; optimizes equivalence traversal with seen set; returns analyser-variable lists by reference.
src/utilities.h analyserVariables(AnalyserVariablePtr) now returns by reference.
src/validator.cpp Replaces repeated linear searches with unordered_sets; removes regex formatting; uses std::format; speeds MathML element membership checks.
src/xmlattribute.cpp Avoids allocating namespace URI strings repeatedly; handles null namespace pointers explicitly.
src/xmldoc.cpp Replaces regex newline replacement with std::replace.
src/xmlnode.cpp Speeds element checks; removes hasAttribute() and adjusts attribute handling; introduces rawName().
src/xmlnode.h Replaces hasAttribute() with rawName() API.
src/api/libcellml/analyserexternalvariable.h dependencies() returns by reference.
src/api/libcellml/analyserequation.h Several vector accessors return by reference.
src/api/libcellml/analyserequationast.h value() returns by reference.
src/api/libcellml/analysermodel.h Several vector accessors return by reference.
src/api/libcellml/component.h math() returns by reference.
src/api/libcellml/componententity.h encapsulationId() returns by reference.
src/api/libcellml/entity.h id() returns by reference.
src/api/libcellml/generatorprofile.h Many string getters return by reference.
src/api/libcellml/importedentity.h importReference() returns by reference.
src/api/libcellml/importsource.h url() returns by reference.
src/api/libcellml/issue.h description() returns by reference.
src/api/libcellml/namedentity.h name() returns by reference.
src/api/libcellml/reset.h Several string getters return by reference.
src/api/libcellml/variable.h initialValue() / interfaceType() return by reference.
tests/bindings/javascript/generator.test.js Adds (commented-out) stress-test scaffold for repeated big-model runs.
tests/bindings/python/test_generator.py Adds (commented-out) stress-test scaffold for repeated big-model runs.
tests/coverage/coverage.cpp Adds coverage tests for generator version strings and printer math parsing.
tests/generator/generator.cpp Adds (commented-out) stress-test scaffold for repeated big-model runs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/printer.cpp Outdated
@agarny agarny force-pushed the issue1368 branch 2 times, most recently from 605139e to ea96b87 Compare April 14, 2026 07:14
Only support the three most recent versions of macOS, not to mention that we want to be able to use `std::format()` which is only available on macOS 13 and later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve the overall speed and memory use of libCellML

2 participants