Improve the overall speed and memory use of libCellML#1372
Open
agarny wants to merge 12 commits intocellml:mainfrom
Open
Improve the overall speed and memory use of libCellML#1372agarny wants to merge 12 commits intocellml:mainfrom
agarny wants to merge 12 commits intocellml:mainfrom
Conversation
Note that the test is disabled since it performs repeated parse/analyse/generate loops, which take several minutes to complete. So, it should be enabled only when needed, and not as part of our regular test suite.
... rather than the generic areEquivalentVariables() method which doesn't include caching.
Indeed, in our WebAssembly module, the key was 32-bit (while 64-bit in C++/Python) which caused collisions and incorrect cache hits. So, we replaced the Cantor-pairing key and std::map-based cache with a typed VariableKeyPair and std::unordered_map. This makes the cached-equivalence lookup more explicit, portable, and efficient.
There was a problem hiding this comment.
Pull request overview
This PR focuses on improving libCellML runtime performance and memory usage (issue #1368) by reducing allocations/copies, introducing caching, and replacing some heavier utilities/containers with lighter alternatives.
Changes:
- Reduced copying/allocation across core code paths (e.g.,
reserve()on hot strings/vectors, fewer temporary objects,std::unordered_set/std::unordered_mapin places). - Added/expanded caching for variable equivalence and internal/analyser variable lookup to speed up repeated queries.
- Reworked some XML/Math handling to avoid
std::regexoverhead (e.g., printer math normalisation and libxml error string cleanup), and updated coverage tests accordingly.
Reviewed changes
Copilot reviewed 57 out of 59 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| CMakeLists.txt | Sets global C++ standard to C++20. |
| src/CMakeLists.txt | Adds CODE_COVERAGE_ENABLED define for coverage builds. |
| src/analyser.cpp | Performance-oriented refactors (caching, sets, reduced repeated work) and MathML element name comparisons. |
| src/analyser_p.h | Adds caches/sets to accelerate internal analysis bookkeeping. |
| src/analyserexternalvariable.cpp | Minor lambda capture optimization; dependencies accessor returns by reference. |
| src/analyserequation.cpp | Reduces allocations via reserve(); several accessors return by reference. |
| src/analyserequationast.cpp | Returns AST value by reference to avoid copies. |
| src/analysermodel.cpp | Adds caches for analyser variable lookup and variable equivalence checks; returns vectors by reference with static empties for invalid models. |
| src/analysermodel_p.h | Replaces equivalence cache keying structure with hashed pointer-pair key. |
| src/analyservariable.cpp | Reserves capacity when materializing weak dependencies. |
| src/annotator.cpp | Reduces string allocations and uses std::format for some formatting. |
| src/commonutils.cpp | Simplifies/optimizes owningModel() by walking parent chain directly. |
| src/component.cpp | Avoids copies in lambdas; math() returns by reference. |
| src/componententity.cpp | Avoids copies in lambdas; encapsulationId() returns by reference. |
| src/debug.cpp | Reserves output string capacity. |
| src/entity.cpp | id() returns by reference. |
| src/generator_p.h | Updates generator internals to use sets and addCode() helper. |
| src/generatorprofile.cpp | Many getters return by reference to avoid string copies. |
| src/generatorprofiletools.cpp | Reserves buffer for profile serialization. |
| src/generatorvariabletracker.cpp | Avoids intermediate vector copies and reserves capacity. |
| src/importedentity.cpp | importReference() returns by reference. |
| src/importer.cpp | Uses unordered containers and iterator-based lookups to reduce repeated map searches/copies. |
| src/importsource.cpp | url() returns by reference. |
| src/internaltypes.h | Switches several internal maps to unordered variants for speed. |
| src/issue.cpp | description() returns by reference. |
| src/model.cpp | Avoids copies in lambdas; uses set to avoid duplicate import URLs. |
| src/namedentity.cpp | name() returns by reference. |
| src/parser.cpp | Avoids string copies; reserves description capacity. |
| src/printer.cpp | Replaces regex-based Math normalization/whitespace cleanup; reduces duplicates via sets; adds reserves. |
| src/reset.cpp | Reset value/id getters return by reference. |
| src/units.cpp | Avoids temporary copies for unit attributes and updates by in-place mutation. |
| src/utilities.cpp | Reserves in hot helpers; uses std::format for numeric conversion; optimizes equivalence traversal with seen set; returns analyser-variable lists by reference. |
| src/utilities.h | analyserVariables(AnalyserVariablePtr) now returns by reference. |
| src/validator.cpp | Replaces repeated linear searches with unordered_sets; removes regex formatting; uses std::format; speeds MathML element membership checks. |
| src/xmlattribute.cpp | Avoids allocating namespace URI strings repeatedly; handles null namespace pointers explicitly. |
| src/xmldoc.cpp | Replaces regex newline replacement with std::replace. |
| src/xmlnode.cpp | Speeds element checks; removes hasAttribute() and adjusts attribute handling; introduces rawName(). |
| src/xmlnode.h | Replaces hasAttribute() with rawName() API. |
| src/api/libcellml/analyserexternalvariable.h | dependencies() returns by reference. |
| src/api/libcellml/analyserequation.h | Several vector accessors return by reference. |
| src/api/libcellml/analyserequationast.h | value() returns by reference. |
| src/api/libcellml/analysermodel.h | Several vector accessors return by reference. |
| src/api/libcellml/component.h | math() returns by reference. |
| src/api/libcellml/componententity.h | encapsulationId() returns by reference. |
| src/api/libcellml/entity.h | id() returns by reference. |
| src/api/libcellml/generatorprofile.h | Many string getters return by reference. |
| src/api/libcellml/importedentity.h | importReference() returns by reference. |
| src/api/libcellml/importsource.h | url() returns by reference. |
| src/api/libcellml/issue.h | description() returns by reference. |
| src/api/libcellml/namedentity.h | name() returns by reference. |
| src/api/libcellml/reset.h | Several string getters return by reference. |
| src/api/libcellml/variable.h | initialValue() / interfaceType() return by reference. |
| tests/bindings/javascript/generator.test.js | Adds (commented-out) stress-test scaffold for repeated big-model runs. |
| tests/bindings/python/test_generator.py | Adds (commented-out) stress-test scaffold for repeated big-model runs. |
| tests/coverage/coverage.cpp | Adds coverage tests for generator version strings and printer math parsing. |
| tests/generator/generator.cpp | Adds (commented-out) stress-test scaffold for repeated big-model runs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
605139e to
ea96b87
Compare
Only support the three most recent versions of macOS, not to mention that we want to be able to use `std::format()` which is only available on macOS 13 and later.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1368.