This folder contains Docker configurations for creating custom Databricks cluster images with pre-installed packages and specialized environments.
R-based runtime environments for data science and statistical analysis.
Use Cases:
- R packages pre-installed
- Statistical modeling environments
- Data analysis workflows
Alpine Linux-based minimal images for optimized performance.
Use Cases:
- Lightweight containers
- Minimal overhead
- Fast startup times
Minimal Ubuntu 20.04 configurations.
Use Cases:
- Clean Ubuntu 20.04 base
- Essential packages only
- Custom from-scratch builds
Python environment configurations with common data science packages.
Use Cases:
- Python-specific workloads
- Data science libraries
- Machine learning environments
R base configurations with core R installation.
Use Cases:
- Basic R runtime
- Foundation for R projects
- Minimal R environment
Standard R configurations with commonly used packages.
Use Cases:
- R with standard libraries
- Enterprise R environments
- Pre-configured R setups
Standard Ubuntu 20.04 images with common tools.
Use Cases:
- General-purpose environments
- Standard tooling
- Balanced configuration
Browse the folders above and select the configuration that matches your needs.
cd <folder-name>
docker build -t your-image-name:tag .

# Tag for your registry
docker tag your-image-name:tag your-registry.azurecr.io/your-image-name:tag

# Push to Azure Container Registry
docker push your-registry.azurecr.io/your-image-name:tag

In your cluster configuration, specify the custom container:
{
  "docker_image": {
    "url": "your-registry.azurecr.io/your-image-name:tag"
  }
}

For detailed information on using custom containers with Azure Databricks, refer to the Azure Databricks documentation.
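The build, tag, and push commands and the cluster configuration fragment above all derive from the same three values (registry, image name, tag). The following is a minimal Python sketch of that relationship, not an official tool. The helper names `image_ref`, `docker_commands`, and `cluster_docker_config` are hypothetical. The optional `basic_auth` block is an assumption about what a registry requiring credentials may need and does not appear in the original configuration, so check it against the Databricks Clusters API documentation before relying on it.

```python
import json
from typing import Dict, List, Optional


def image_ref(registry: str, name: str, tag: str) -> str:
    """Compose a fully qualified image reference: registry/name:tag."""
    return f"{registry}/{name}:{tag}"


def docker_commands(registry: str, name: str, tag: str) -> List[List[str]]:
    """Return the build, tag, and push commands as argument lists.

    The commands are only constructed here, not executed; pass each list
    to subprocess.run(cmd, check=True) to actually run it.
    """
    local = f"{name}:{tag}"
    remote = image_ref(registry, name, tag)
    return [
        ["docker", "build", "-t", local, "."],
        ["docker", "tag", local, remote],
        ["docker", "push", remote],
    ]


def cluster_docker_config(
    url: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, dict]:
    """Build the docker_image fragment of a cluster configuration."""
    config: Dict[str, dict] = {"docker_image": {"url": url}}
    # Assumption: credentials, when needed, go in a basic_auth block.
    if username is not None and password is not None:
        config["docker_image"]["basic_auth"] = {
            "username": username,
            "password": password,
        }
    return config


if __name__ == "__main__":
    # Placeholder values taken from the commands above.
    registry, name, tag = "your-registry.azurecr.io", "your-image-name", "tag"
    for cmd in docker_commands(registry, name, tag):
        print(" ".join(cmd))
    print(json.dumps(cluster_docker_config(image_ref(registry, name, tag)), indent=2))
```

Building the commands as argument lists rather than shell strings avoids quoting problems if an image name or tag ever contains unusual characters.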
Best Practices:
- Base Image Selection: Choose the minimal base image that meets your requirements
- Layer Optimization: Combine RUN commands to reduce layer count
- Package Versions: Pin specific versions for reproducibility
- Security: Scan images for vulnerabilities before deployment
- Size: Keep images as small as possible for faster startup
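The version-pinning and layer-count recommendations above can be checked mechanically before a build. The following is a minimal Python sketch under stated assumptions: `unpinned` and `pip_run_line` are hypothetical helper names, and the version numbers in the usage example are illustrative only, not versions recommended by this repository.

```python
from typing import Dict, List


def unpinned(requirements: List[str]) -> List[str]:
    """Return the requirement strings that lack an exact '==' version pin."""
    return [req for req in requirements if "==" not in req]


def pip_run_line(pins: Dict[str, str]) -> str:
    """Render a single Dockerfile RUN instruction for all pinned packages.

    Installing everything in one RUN keeps the layer count down, and
    --no-cache-dir stops pip from leaving its download cache in the layer.
    """
    specs = " ".join(
        f"{name}=={version}" for name, version in sorted(pins.items())
    )
    return f"RUN pip install --no-cache-dir {specs}"


if __name__ == "__main__":
    # Illustrative versions only; pin whatever your workloads are tested against.
    print(unpinned(["numpy==1.24.4", "pandas", "scikit-learn>=1.0"]))
    print(pip_run_line({"pandas": "1.5.3", "numpy": "1.24.4"}))
```

Sorting the package names keeps the generated line stable between runs, so the Docker build cache is not invalidated by a change in dictionary order alone.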
Python packages:
RUN pip install numpy pandas scikit-learn

R packages:
RUN R -e "install.packages(c('dplyr', 'ggplot2'), repos='https://cran.r-project.org')"

System packages:
RUN apt-get update && apt-get install -y \
    package1 \
    package2 \
    && rm -rf /var/lib/apt/lists/*

- Custom images must be compatible with Databricks runtime
- Images must include required Databricks components
- Test images thoroughly before production use
- Keep images updated with security patches
Image too large:
- Use multi-stage builds
- Clean up package manager caches
- Remove unnecessary files
Build fails:
- Check base image compatibility
- Verify package availability
- Review Dockerfile syntax
Cluster won't start:
- Verify image is accessible from Databricks
- Check container registry credentials
- Review Databricks logs