Releases: MyrtleSoftware/vollo-sdk
Vollo SDK 28.0.0
- MMIO optimisations
- Speed up model compilation
- Optimise pointwise operations
- Support partial updates with inputs that have more than 65536 elements
- Support for transposed dynamic weight matmuls
- Add functions to check if a model was compiled with `generate_state_reset = True`:
  - `vollo_rt_model_is_resettable` in Vollo RT C/C++ API
  - `vollo_rt.VolloRTContext.model_is_resettable` in Vollo RT Python bindings
  - `vollo_compiler.Program.model_is_resettable` in `vollo-compiler`
- Add:
  - `vollo_rt_add_device` in Vollo RT C/C++ API as a more versatile version of `vollo_rt_add_accelerator`
  - `vollo_rt.VolloRTContext.add_device` in Vollo RT Python bindings
- Handle SIGINT in long-running `vollo_compiler` functions
- Support for non-streaming `torch.nn.Conv1d`
- Support `torch.sum`/`torch.Tensor.sum` over multiple dimensions
- Support variable-length arg for `dims` in `torch.Tensor.permute`
- Fix handling of exponentiation on infinities
Vollo SDK 27.1.0
- State reset with `vollo_rt_add_reset_job` (for models compiled with `generate_state_reset = True`)
- Support for non-zero LSTM initial state
- Support GELU (exact and tanh approximation)
- Faster SiLU
- Improve operator fusion in some cases
- Fix a bug handling strides in slices
- Fix HwConfig info (`vollo-tool read-hw-config`) of Artena bitstreams
- Standardize naming of `vollo_compiler.Config` presets
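
For readers unsure what the two GELU variants in this release refer to, the following is a minimal sketch of the standard definitions in plain Python. It only illustrates the mathematics; it makes no claim about how Vollo computes GELU on hardware, and the function names `gelu_exact` and `gelu_tanh` are hypothetical.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU, using the usual 0.044715 cubic coefficient."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Compare the two variants at a few sample points.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```

In PyTorch the two variants are selected with `torch.nn.GELU(approximate="none")` and `torch.nn.GELU(approximate="tanh")`.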
Vollo SDK 27.0.1
- Reduce tensor RAM usage in certain models
- Add experimental `vollo_rt_prepare_raw_buffer_output_completion` and `vollo_rt_check_raw_buffer_output_completion` functions to Vollo RT for completion detection from a different thread to the one holding the `vollo_rt_context`
Vollo SDK 27.0.0
- Add support for Silicom Artena
- IO optimization when using MMIO
- Add experimental support for specifying which cores to allocate PyTorch operations to using `vollo_torch.CorePartition`
- Add support for specifying which cores to allocate models to in a multi-model program by passing `core_indices` to `vollo_compiler.ProgramBuilder.add_nnir`
- Optimize sigmoid and SiLU activation functions
- Improve spaced latency for some stateful models that use dynamic weights
- Reduce tensor RAM usage of state in stateful models
- `vollo_torch.Fp8Weights` now errors if used on operations which require bf16 weights, such as dynamic weights
Vollo SDK 26.2.0
- Support for FP8 (E4M3) weights on Versal devices using `vollo_torch.Fp8Weights` context manager
- Support for `torch.exp`, `torch.exp2` at FP32 precision using `vollo_torch.Fp32Activations` context manager
- Support for `matmul` operations where both inputs are dynamic (non-constant) tensors, when using `allow_dynamic_weights`
- Optimize accumulations on Versal devices, improving performance of layers such as `LayerNorm` and `RMSNorm`, and `Linear` layers with small output features
- Add support for multiple state tensors in `vollo_torch.nn.Scan`
- Add `allow_unserializable` flag to `vollo_compiler.NNIR.to_program` for testing programs which can't be serialized
- Fix multi-model programs that use dynamic weights
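
To give a feel for the precision that FP8 (E4M3) weights offer, here is a rough sketch that rounds a Python float to the nearest E4M3 value. It assumes the common E4M3 layout (4 exponent bits with bias 7, 3 mantissa bits, largest finite value 448, no infinities), round-to-nearest-even, and saturation on overflow. These are assumptions for illustration, not a description of how `vollo_torch.Fp8Weights` converts weights, and `quantize_e4m3` is a hypothetical helper.

```python
import math


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in E4M3 (assumed layout)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    # Saturate at the largest finite E4M3 magnitude (assumption: no infinities).
    a = min(abs(x), 448.0)
    # frexp gives a = m * 2**ex with 0.5 <= m < 1, so the binary exponent is ex - 1.
    _, ex = math.frexp(a)
    # Subnormals share the smallest normal exponent, -6.
    e = max(ex - 1, -6)
    # 3 mantissa bits means 8 steps per power of two.
    step = 2.0 ** (e - 3)
    # Python's round() resolves ties to even, matching round-to-nearest-even.
    return sign * round(a / step) * step


if __name__ == "__main__":
    for w in (1.0, 1.1, -0.3, 1000.0):
        print(w, "->", quantize_e4m3(w))
```

With only 3 mantissa bits, neighbouring representable values differ by 1/8 of the leading power of two, which is why such weights are typically paired with higher-precision accumulation.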
Vollo SDK 26.1.2
- Optimize handling of biases in `Linear` layers when using `allow_dynamic_weights`
- Speed up model compilation
- Add `random_seeds` argument to `vollo_compiler.NNIR.to_program`
Vollo SDK 26.1.1
- Fix V80LL initialization bitstream so that the V80LL memory can be flashed over JTAG
- Optimize handling of biases in `Linear` layers when using `allow_dynamic_weights`
- Add support for multiple outputs to `vollo_torch.nn.Scan`
- Speed up loading `.vollo` programs
Vollo SDK 26.1.0
- Fix DMA bug introduced on V80 in Vollo SDK 26.0.0
- Add Alveo V80LL bitstream and `vollo_compiler.Config.v80ll_c6b32` hardware config
- Add support for `Linear` layers where the contracted dimension is not the data dimension via the `allow_dynamic_weights` flag for `vollo_compiler.NNIR.to_program`
- Add support for multiple inputs to `vollo_torch.nn.Scan`
- Add support for indexing with negative indices in: `torch.stack`, `torch.sum`, `torch.permute`, `torch.squeeze`, `torch.unsqueeze`
- Add support for `torch.nn.functional.linear`
- Add optional `bias` argument to `vollo_torch.nn.PaddedConv1d`
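
As a reminder of what negative dimension indices mean in the PyTorch functions listed above, the sketch below shows the usual convention of counting from the last dimension. It is a plain-Python illustration, not Vollo's implementation, and `normalize_dim` is a hypothetical helper.

```python
def normalize_dim(dim: int, ndim: int) -> int:
    """Map a possibly negative dimension index onto the range [0, ndim)."""
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for a tensor with {ndim} dimensions")
    # Negative indices count back from the last dimension.
    return dim + ndim if dim < 0 else dim


if __name__ == "__main__":
    # For a 3-dimensional tensor, -1 refers to dimension 2 and -3 to dimension 0.
    print(normalize_dim(-1, 3), normalize_dim(-3, 3))
```

For `torch.stack` and `torch.unsqueeze` the result has one more dimension than the input, so the index is interpreted against `ndim + 1`.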
Vollo SDK 26.0.2
- Update example/partial_update.c to allow multiple inputs and mixed precision
- Fix bug in FP32/multi-input partial updates
- Speed up model compilation
Vollo SDK 26.0.1
- Make `vollo-tool license` use the system's CA certificates
- Fix bug in FP32 partial updates