Skip to content

Add initial BEAST X backend integration#44

Open
hhua361 wants to merge 40 commits into
mainfrom
haoyuan/beastx-substitution-tiles
Open

Add initial BEAST X backend integration#44
hhua361 wants to merge 40 commits into
mainfrom
haoyuan/beastx-substitution-tiles

Conversation

@hhua361

@hhua361 hhua361 commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Summary

This change introduces the BEAST X backend integration for PhyloSpec. The current main branch does not yet include BEAST X support, so this adds the BEAST X integration module, backend state/model layer, tile library, runner entry point, MCMC execution support, XML export, XML validation, and representative tests.

Main changes include:

  • Adds the BEAST X integration module, including Maven setup, README notes, tile library registration, and runtime access to the BEAST X Java API.

  • Adds BEAST X tiles for input data, distributions, priors, tree priors, substitution models, site models, branch-rate models, PhyloCTMC likelihoods, continuous traits, RPN calculations, MCMC configuration, loggers, and operators.

  • Registers the BEAST X tile library through META-INF/services/org.phylospec.tiling.TileLibrary.

  • Adds a structured BEAST X backend state/model layer used by the tiler and runner. This includes parameter wrappers, tree/likelihood/prior model records, operator builders, MCMC builders, result summaries, and validation helpers.

  • Adds PhyloSpecRunner support for BEAST X so PhyloSpec source strings or .phylospec files can be built and run through one backend entry point, without directly exposing the lower-level BEAST X builders.

  • Adds BEAST X MCMC execution support through the BEAST X Java API. Representative tests verify that selected models can run and write parameter/tree logs.

  • Adds BEAST X XML export from the structured BEAST X backend state. XML generation is implemented with explicit XML component builders for parameters, priors, tree priors, alignments, tree models, substitution models, site models, branch-rate models, PhyloCTMC likelihoods, operators, and loggers.

  • Adds BEAST X XML parse/run validation using the BEAST X XML parser. This covers prior-only models, tree-prior models, nucleotide/protein/trait PhyloCTMC models, showcase models, and runner-level XML entry points.

  • Adds and reorganizes BEAST X tests and .phylospec fixtures around backend capability areas, including tiling scripts, expected BEAST X state, runner behavior, MCMC construction, XML component builders, XML execution, representative models, and showcase models.

Test plan

Checked whitespace and diff formatting:

git diff --check origin/main

Ran core tests, excluding two pre-existing converter tests:

mvn -pl core/java '-Dtest=!org.phylospec.converters.LPhyConverterTest,!org.phylospec.converters.RevConverterTest' test

Ran BEAST 3 integration tests:

mvn -pl integrations/beast3/java test

Ran BEAST X integration tests:

mvn -pl integrations/beastx/java test

Checked working tree status:

git status

Notes

  • A plain top-level mvn test currently fails because of two pre-existing converter tests in phylospec-core: LPhyConverterTest and RevConverterTest. These are in the LPhy/Rev converter layer and are unrelated to the BEAST X backend changes in this PR.

  • BEAST X XML export is generated from the structured BEAST X backend state rather than from arbitrary runtime BEAST X objects. I did not find a general BEAST X runtime-object-to-XML serializer, so this follows a BEAUti-style approach: keep a structured model/configuration representation and write XML from that representation.

  • Full PhyloCTMC XML execution requires a working BEAGLE native library installation. Without BEAGLE, some XML models can still be generated and parsed, but full tree-likelihood execution may fail.

  • Follow-up work includes more general dimension/type constraints such as Simplex(4), data-type-aware observed alignment matching.

@hhua361 hhua361 requested a review from tochsner June 15, 2026 12:24
tochsner and others added 29 commits June 20, 2026 18:04
Adds an install-beast-mcmc.sh helper that downloads the official BEAST X
v10.5.0 release tgz and registers the bundled fat beast.jar as
dr:beast-mcmc:10.5.0 in the local Maven repository, plus a README
documenting the one-time setup. The integrations/beastx/java pom now
declares this dependency.

Adds a first concrete tile, JC69Tile, mirroring the BEAST 2 pattern.
BEAST X has no dedicated JukesCantor class, so the tile builds an HKY
with kappa=1.0 and equal nucleotide frequencies, which is equivalent.
Registers the tile in BeastXCoreTileLibrary and adds a unit test
exercising the BEAST X classpath at runtime.
@tochsner tochsner force-pushed the haoyuan/beastx-substitution-tiles branch from 55528cf to e63b237 Compare June 20, 2026 16:05

@tochsner tochsner left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! A first batch of comments. They mainly cover the tiles. I still have to finish the comments for the tiling package.

  • Most points are minor. Quite often, a remark applies to multiple places. I tried to find all of them, but you know the code better.
  • Feel free to respond first before making changes, I'm happy to discuss my comments :)

import beast.base.core.*;
import beast.base.inference.*;
import org.phylospec.tiling.TypeToken;
import tiling.TypeToken;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason behind this and all other changes in commit 40f93be ? Is the only change there the duplicated tiling classes in order to get rid of the generic type parameter for the state class? Cause in that case, I would argue that the code duplication is worse than having slightly more verbose types.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, it breaks the git version history of these files.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose to drop this commit unless there is a reason for it I'm missing, wdyt?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I checked this more carefully.

The related changes seem to appear in three BEAST 3 files:

  • integrations/beast3/java/src/test/java/tiling/TilingScriptFilesTest.java
  • integrations/beast3/java/src/test/java/beaststate/BeastStateScriptFilesTest.java
  • integrations/beast3/java/src/main/java/beastconfig/BEASTState.java

These ended up in this PR because I was trying to sync my branch with the latest BEAST 3 backend changes before continuing the BEAST X work. I used the recent fix-example branch as a reference for those BEAST 3 files, and that branch also has BEASTState.java importing tiling.TypeToken.

So this was not intended as a BEAST X design change. I agree that these BEAST 3 tiling changes should not be mixed into this BEAST X PR, especially if they duplicate the core tiling classes just to avoid the generic state type.

I’ll remove/revert this BEAST 3 update commit unless there is a separate reason to keep it here.


EvaluateTiles<BEASTState> evaluateTiles = new EvaluateTiles<>(new BeastCoreTileLibrary().getTiles(), new ArrayList<>(), variableResolver, stochasticityResolver);
List<Tile<?, BEASTState>> bestTilings = null;
EvaluateTiles evaluateTiles = new EvaluateTiles(TileLibrary.loadAll(), OperatorTileLibrary.getTiles(), variableResolver, stochasticityResolver);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change from 40f93be I like though :)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above


EvaluateTiles<BEASTState> evaluateTiles = new EvaluateTiles(new BeastCoreTileLibrary().getTiles(), new ArrayList<>(), variableResolver, stochasticityResolver);
List<Tile<?, BEASTState>> bestTilings = null;
EvaluateTiles evaluateTiles = new EvaluateTiles(TileLibrary.loadAll(), OperatorTileLibrary.getTiles(), variableResolver, stochasticityResolver);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, I'm happy to keep that

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above


public class RelaxedClockTile extends GeneratorTile<DiscretizedBranchRates, BeastXState> {

private static final double MEAN_TOLERANCE = 1.0e-6;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a global PhyloSpec-wide threshold for these kind of things (e.g. also for the Simplex sum). Because I also do these checks in the type checker sometimes and right now it's a bit of a wild west with different numbers floating around. We don't have to change anything here, but it might make sense to do a short PR on this in the future.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The exception would be when the engine itself does checks with specific thresholds, but as far as I can see this is not the case for BEAST X here)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. The check here is meant to ensure that the RelaxedClock base distribution has mean 1.0, so that the overall mean rate is controlled by clockRate.

I agree that the tolerance itself should probably be shared PhyloSpec-wide, together with similar checks like the Simplex sum. I’ll keep it local for now unless you prefer changing it in this PR, and we can move it to a shared helper/constant in a follow-up PR.

Comment on lines +91 to +98
if (numBranches <= 0) {
throw new TileApplicationError(
this.getRootNode(),
"RelaxedClock requires a tree with at least one branch.",
"Use a tree with at least two taxa.",
List.of("Tree tree ~ Yule(birthRate=1.0, taxa=taxa)")
);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how thorough you are!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you !!!

throw new FailedTilingAttempt.Irrelevant();
}

AutoOperatorConfigTile tile =

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is only relevant in a beastx block, you should add a block check too. Smth like

if (!(assignment.block instanceof Stmt.Block.Custom(String blockName)) || !Objects.equals(blockName, "beastx")) {
            throw new FailedTilingAttempt.Irrelevant();
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is relevant for all block-level tiles. A different way to fix this is with a template tile (check out ChainLengthTile in the BEAST 2 parser for instance). Notably, template tiles don't match the variable name so this would still work here.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This then simplifies the logic as well as it eliminates the need to manually parse the literal, you can just specify a PositiveReal input.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. This tile is meant to be BEAST X-specific, so it should not be able to match assignments outside a beastx block.

I’ll add the block check here, and I’ll also look at the ChainLengthTile pattern to see if this should be rewritten as a template tile instead. That would probably make the matching cleaner and avoid the manual literal parsing.

import java.util.Map;
import java.util.Set;

public class AutoOperatorConfigTile extends Tile<Void, BeastXState> implements CandidateTile<BeastXState> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, the level of control regarding things like operator settings in a PhyloSpec model is an entire discussion in itself. While engine developers tend to prefer to give all the control to users, I personally would err on the cautious side to not end up with feature creep. One thing to keep in mind for example is backwards-compatibility. We can keep this in for now, but it would make sense to define a general strategy when it comes to the level of control.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a fair concern. I added this mainly to make the BEAST X runs configurable while testing different operator setups, especially for the BEAST X vs BEAST 3 comparisons. I agree we should be careful not to turn every backend option into a PhyloSpec language feature. I’ll keep this limited for now, and we can revisit the general strategy for engine-specific controls before expanding this further.

import java.util.Map;
import java.util.Set;

public class FileLoggerTile extends Tile<Void, BeastXState> implements CandidateTile<BeastXState> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use a template tile like in FileLoggerTile of the BEAST 2 parser? I think this would heavily simplify this no?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same holds for the other two logger tiles.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll refactor these using the BEAST 2 FileLoggerTile pattern. The current BEAST X logger tiles do too much manual matching, so a template tile should simplify the file, screen, and tree logger cases.


import java.util.IdentityHashMap;

public class ObservedAsIntTile extends TemplateTile<IntScalar<? extends Int>, BeastXState> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason we need this tile and the ObservedAsNonNegativeIntTile? Shouldn't it be covered by ObservedAsTile?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. ObservedAsTile currently only handles RealScalar; the name is a bit misleading.

The Int and NonNegativeInt versions exist because BEAST X binds observed values through different backend wrappers (BeastXRealScalarParam vs BeastXIntScalarParam), and BoundDistribution is typed by that concrete wrapper.

I agree the duplicated logic should be cleaned up. I can either rename ObservedAsTile to ObservedAsRealTile, or factor the shared logic into a helper with small typed wrappers.

import java.util.HashSet;
import java.util.Set;

public class BeastXBMTraitLikelihoodSpec extends AbstractModelLikelihood {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind adding short docstrings to these classes explaining why we need them? I try to add docstrings to all public classes and public methods. One exception are the tiles, because most of them are obvious. (I'm sure though that I have forgotten it at some places too tbj)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I’ll add short JavaDocs for these helper/model classes. The tiles are usually clear from the class name, but classes like this one do need a bit of context about why they exist in the BEAST X backend.

@tochsner

tochsner commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator

Something else: For me at least, to be able to run anything using BEAGLE in IntelliJ, I have to specify these VM options in the run config: -Djava.library.path=/usr/local/lib, otherwise I get the error Failed to load BEAGLE library: no hmsbeagle-jni in java.library.path. Do you have the same? If yes, we should probably document this in the README for future collaborators.

@tochsner

Copy link
Copy Markdown
Collaborator

Personally, I'm a big fan of opinionated code formatters as soon as multiple people work on the same code base to have a unified code style. I mainly know them from other languages (e.g. prettier for JS, black for python, gofmt for go...). For Java, reddit seems to recommend https://github.com/diffplug/spotless . What do you think? We could try it out in a future PR.

@tochsner tochsner left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, the pipeline to build the BEAST X state and model and then execute it seems a bit overcomplicated to me (or at least is a bit overwhelming). Generally, it seems that the builder and runner classes try to cover every possible edge case, even though most of them migh tnever be needed in practice. I would start with the actual use cases in the current code base (i.e. executing a .phylospec file or PhyloSpec string, generating the XML for a .phylospec file or PhyloSpec string, and the tests) and then prune everything else.
What do you think?


public class BeastXBMTraitLikelihoodSpec extends AbstractModelLikelihood {

private final Alignment observedTraits;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and BeastXOUTraitLikelihoodSpec: do I understand it correctly that we need them because BEAST X does not implement the OU or BM models?
Then it's part of the same discussion wrt. e.g. the Bernoulli distribution. I'm fine with leaving them here now, but it's definitely something we should discuss with them directly. Best case, they feel incentivized to implement it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, while the Bernoulli seems simple enough, it's probably harder to implement something like the OU likelihood which works correctly and is as fast as possible without their help.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is my understanding as well. I added these local specs because I could not find directly usable BEAST X BM/OU trait likelihood components for this backend path, so this was a way to keep the PhyloSpec trait models executable for now.

But I agree this is not ideal as a long-term solution, especially for OU where correctness and performance are much less trivial than for something like Bernoulli. I’m happy to leave these in for now if useful, but this should definitely go on the list of things to discuss with the BEAST X team.

Comment on lines +64 to +79
public BeagleTreeLikelihood materializeBeagleTreeLikelihood() {
SitePatterns patterns =
new SitePatterns(this.observedAlignment);

return new BeagleTreeLikelihood(
patterns,
this.treeModel,
this.branchModel,
this.siteRateModel,
this.branchRateModel,
null,
false,
PartialsRescalingScheme.DEFAULT,
false
);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than for consistency with the other two likelihood specs, is there a reason why we could not directly create the BeagleTreeLikelihood in the corresponding tile?
This would also remove the materialization logic in the model builder.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for this spec layer was to keep the likelihood construction path consistent with the BM/OU trait likelihood specs, and to delay the actual BEAST X likelihood creation until the model builder had collected the full model state. For PhyloCTMC, though, the required BEAST X objects are already available at that point, so I don’t think there is a strong reason to keep the extra materialization step here. I’ll try moving the BeagleTreeLikelihood creation into the tile and remove the redundant model-builder logic if that keeps the pipeline cleaner.


import java.util.function.Consumer;

public class TreeDistribution<L extends AbstractModelLikelihood> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we loosen the type guard T extends BeastXParam to T in BoundDistribution, can we remove this class?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was the main reason for introducing TreeDistribution: tree priors did not fit well while BoundDistribution was restricted to T extends BeastXParam. I’ll check whether TreeDistribution has any tree-prior-specific logic left. If it is just a wrapper around the bound object and likelihood, then loosening BoundDistribution should let us remove it.

Comment on lines +157 to +162
TreeModel existingTreeModel =
this.treeModelsByPhyloSpecName.get(id);

if (existingTreeModel != null) {
return existingTreeModel;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit risky. Right now, addTreePriorDistribution is only called in TreeDrawTile and the ID should always be unique (because PhyloSpec var names have to be unique). However, this could (and probably should) be also called for drawn arguments (something like Age rootAge = rootAge(tree~Yule(...))). If we have a line like this twice, we will only register it once presuming the id is tree for both calls.
Can we just drop the if check? Or am I missing a case where this is relevant.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I added this as a defensive reuse path, but I see the risk if tree-producing functions are used more than once with the same default id.

I don’t think we need this check here, since the PhyloSpec variable name should already give us a unique tree registration. I’ll drop the reuse branch so duplicate tree registration fails more explicitly instead of silently reusing the first tree.

Comment on lines +220 to +222
public void addMCMCLogger(Logger logger) {
this.mcmcLoggers.add(logger);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, addScreenLoggerSpec(long logEvery) and addFileLoggerSpec(long logEvery, String fileName) seem to be dead code. If this is the case, can we remove it?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those are leftovers from the earlier logger implementation. addMCMCLogger(...) is the current path now, so I’ll remove the old addScreenLoggerSpec and addFileLoggerSpec methods.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, this file in combination with BeastXRunPipeline is a bit overwhelming to me. There are many small methods, most of which call like two other functions. Some also seem to never actually been called or only be called by dead code.
How do clients actually use these two classes? Could we boil it down to a handful of slightly longer methods?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can we somehow merge this with BeastXRunPipeline?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the issue. I split this quite aggressively because I wanted each step of the runner pipeline to be testable while I was wiring the backend together, but it has ended up making the public path harder to follow.

I’ll simplify this around the actual entry points we need — running from a file/string and writing XML from a file/string — and see whether BeastXRunPipeline can be merged back into PhyloSpecRunner or reduced to a private helper.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is never used right?
If we want to document the compatibilities, we could do it the same way as the BEAST 2 parser with this csv file which can be rendered on the website.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right, this is not part of the actual BEAST X execution path. I added it as an internal summary of the current backend coverage, but a Java class is probably not the right place for that.

I’ll remove this file from the runtime code and move the capability information to a CSV/documentation format, following the BEAST 2 parser approach.

Comment on lines +56 to +68
private static String defaultRunName(Path sourcePath) {
String fileName =
sourcePath.getFileName().toString();

int extensionStart =
fileName.lastIndexOf('.');

if (extensionStart <= 0) {
return fileName;
}

return fileName.substring(0, extensionStart);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method already exists in PhyloSpecRunner. Do we need both?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is some repetition in this section. I will carefully check these parts to make them clearer.

Comment on lines +3 to +8
public enum RunMode {
BUILD_STATE,
BUILD_MODEL,
BUILD_MCMC,
EXECUTE_MCMC
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need all these options? E.g. when would we just want to build the state or to build the model?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair question. These modes mostly came from the way I split the pipeline while debugging and testing each stage separately, rather than from clear user-facing use cases.

I’ll review where they are actually used. If only XML generation and MCMC execution are needed in practice, I’ll simplify this and remove the intermediate run modes instead of exposing them as part of the runner API.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the other .phylospec file in this folder, could we move them to the test resources?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this section is included here because I encountered a problem while testing mcmc run. Because I didn't add expected state to the .phylospec file, it caused an error every time TilingScriptFilesTest was executed. I will remove it later.

@hhua361

hhua361 commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

Something else: For me at least, to be able to run anything using BEAGLE in IntelliJ, I have to specify these VM options in the run config: -Djava.library.path=/usr/local/lib, otherwise I get the error Failed to load BEAGLE library: no hmsbeagle-jni in java.library.path. Do you have the same? If yes, we should probably document this in the README for future collaborators.

For BEAGLE, yes, I have run into the same kind of issue on Windows. In my case I usually add the BEAGLE library directory to PATH rather than setting -Djava.library.path, but it is the same native-library loading problem. I think documenting this in the README would be useful, with notes for both macOS/Linux and Windows.

For formatting, I also like the idea. Having a project-wide formatter would help avoid noisy diffs once more people are working on the Java code. I would keep that separate from this PR, but trying Spotless in a small follow-up PR sounds good to me.

@hhua361

hhua361 commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

Generally, the pipeline to build the BEAST X state and model and then execute it seems a bit overcomplicated to me (or at least is a bit overwhelming). Generally, it seems that the builder and runner classes try to cover every possible edge case, even though most of them migh tnever be needed in practice. I would start with the actual use cases in the current code base (i.e. executing a .phylospec file or PhyloSpec string, generating the XML for a .phylospec file or PhyloSpec string, and the tests) and then prune everything else.
What do you think?

I think that is a fair point. I tried to keep the backend pieces separated so that XML generation, model construction, and execution could be tested independently, but I agree the current pipeline may be more general than we actually need right now. I’ll review this from the current use cases first: running a .phylospec file/string, generating XML from a .phylospec file/string, and the tests. Then I can simplify or remove the extra layers that are not serving those paths clearly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants