Graphenus integrates different machine learning libraries to allow data science teams to develop any type of model, abstracting them from the complexity of distributed data management (infrastructure, configuration, processing, etc.).

Ease of use

  • Support for multiple programming languages: Java, Python, Scala, R.

  • Out-of-the-box integration with any data source supported by Graphenus.

  • Abstraction of the complexity of distributed computing.

High performance and scalability

  • Execution of machine learning models and algorithms on Spark.

  • Availability of multiple optimised models.

  • Guaranteed scalability thanks to Graphenus and its container architecture.

Broad ecosystem

  • High volume of ML algorithms available (classification, regression, recommendation, clustering, etc.).

  • Workflow utilities for feature transformation, pipeline definition, model evaluation, model persistence, etc.

  • Neural network models, genetic algorithms, tensors, etc.


Graphenus has a solid foundation to support the development of machine learning models.

Building on the available Spark ML base, new libraries are added to provide Graphenus with distinctive ML capabilities:

  • Graphenus is fully compatible with the main ML libraries: scikit-learn, pandas, TensorFlow, PyTorch, MLflow and Spark MLlib.

  • Thanks to Spark, ML models can be run in a fully distributed manner, using virtually any data source and in a way that is transparent to the developer.

  • Transfer learning: with this technique, Graphenus allows the knowledge (weights and biases) of a trained model to be stored and subsequently applied to a different problem.

  • Explainable ML: upcoming releases of Graphenus will feature SHAP (SHapley Additive exPlanations), the game-theory-based reference library for explaining machine learning models.