RStudio and the Qubole Platform Come Together at Your Fingertips | Qubole

August 6, 2020. Updated March 21, 2024

The integration of both platforms accelerates data science and scientific research with single-click access to large datasets within the RStudio Integrated Development Environment (IDE) for data scientists.

Data scientists use RStudio for Machine Learning (ML), Artificial Intelligence (AI), and data exploration. With the vast amounts of data now accessible from enterprise and other sources, processing it for ML and exploration requires the power of cluster computing frameworks like Apache Spark. While R has traditionally integrated with Spark through libraries like SparkR and sparklyr, the learning curve and cluster administration burden are significant, resulting in many headaches for data scientists.

We introduce a seamless, non-disruptive RStudio Server Pro integration with Qubole, combining the power of RStudio Server Pro with the enhanced Apache Spark capabilities and near-zero cluster administration provided by Qubole. With this integration, data scientists, analysts, and business users can easily crunch large datasets and derive actionable insights for their projects, closer to real-world scenarios and in significantly less time.

Addressing the Enterprise Needs of RStudio Users

As we worked with leading RStudio users in various enterprises and talked to the community’s active users, we organized and classified needs, tested solutions, and focused on delivering the following benefits through the RStudio and Qubole integration:

  1. Simplified access to and processing of large datasets
    • Users can now access very large datasets for R-based AI/ML projects and deliver real-world scenario results faster.
  2. Increased productivity
    • Data scientists can continue using familiar tools and languages to run jobs on Qubole, skipping the steep learning curve.
  3. Optimized TCO with scalable infrastructure
    • Administrators can boost ROI and achieve significant cost savings for their enterprise data science projects without disrupting existing workflows.

Features of the Integration

The integration features have focused on helping data scientists realize the above-mentioned benefits by removing friction points to access large datasets; not disrupting the RStudio IDE experience; and giving the security of working within their enterprise cloud environment with Qubole, regardless of their public cloud of choice (AWS, Google or Azure).

  • Single Click Integration
    Single-click integration provides RStudio Server Pro users with a seamless and best-in-class experience for accessing Qubole-managed Spark clusters.
  • Automatic Persistence
    Data scientists now have on-demand access to RStudio on ephemeral Spark clusters on Qubole without the need to manage any servers or licenses. Qubole’s RStudio integration transparently and automatically persists all files and resources created in a user’s home folders to cloud storage, and restores their state on cluster restart. This way, users’ workspaces are persisted and managed automatically.
  • Cluster Package Manager
    Data scientists can now use the unified package management solution available in Qubole for Spark clusters using Conda, PyPI, CRAN, and RSPM repositories. This allows users to define cluster-wide R and Python dependencies for Spark applications while providing a centralized location for all repositories.
  • Pre-Installed Packages
    Qubole’s Package Manager also provides RStudio users with various pre-packaged libraries for starting their data science journey with R at an enterprise level. In addition, it offers an easy and convenient way to start Spark sessions on Qubole clusters through sparklyr, without any administrative burden.
  • Performance Optimizations
    Data scientists and admins can now benefit from Qubole’s Spark optimizations in projects where results are time-sensitive, query performance SLAs are important, and budgets are monitored closely. Qubole clusters automatically scale up when the sparklyr application needs more resources, and scale down when resources are not in use.
  • Ease of Use and Access to Any Data Source
    The Qubole platform’s metastore gives RStudio users an instant view of the available tables and their associated metadata via the RStudio Connections tab. In addition, RStudio now includes links to the Spark UI, Resource Manager UI, Spark driver logs, and other resources, all conveniently accessible to every user.
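
To make the Automatic Persistence behavior described above more concrete, here is a minimal Python sketch of the snapshot-and-restore idea. It is an illustration only, not Qubole's implementation: the function names `persist_home` and `restore_home` are hypothetical, and a local directory stands in for cloud storage.

```python
import shutil
import tempfile
from pathlib import Path


def persist_home(home: Path, store: Path) -> None:
    """Copy a user's home folder into the backing store (hypothetical helper).

    Any older snapshot is replaced so the store mirrors the latest state.
    """
    if store.exists():
        shutil.rmtree(store)
    shutil.copytree(home, store)


def restore_home(store: Path, home: Path) -> None:
    """Rebuild the home folder on a fresh cluster from the last snapshot, if any."""
    if store.exists():
        shutil.copytree(store, home, dirs_exist_ok=True)


def demo() -> str:
    """Simulate one cluster shutdown and restart; return the restored file's text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Assumption: a local folder plays the role of a cloud storage prefix.
        store = root / "object-store" / "alice"

        old_home = root / "cluster-1" / "home" / "alice"
        old_home.mkdir(parents=True)
        (old_home / "analysis.R").write_text("summary(mtcars)\n")

        persist_home(old_home, store)      # snapshot before the cluster terminates
        shutil.rmtree(root / "cluster-1")  # the ephemeral cluster goes away

        new_home = root / "cluster-2" / "home" / "alice"
        new_home.mkdir(parents=True)
        restore_home(store, new_home)      # cluster restarts with an empty disk
        return (new_home / "analysis.R").read_text()
```

Calling `demo()` returns the contents of `analysis.R` as written on the first simulated cluster, showing that the workspace survives the loss of the cluster's local disk. A real integration would presumably sync incrementally rather than copy whole folders, but the lifecycle is the same.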

You can find a short video demonstration showcasing RStudio capabilities at RStudio Demo.

How to Enable RStudio Integration

This product integration comes with zero license management. There is no need to procure separate licenses to try RStudio Server Pro with Qubole. Users can obtain on-demand licenses via Qubole to enable RStudio Server Pro within the Qubole platform, or use their existing RStudio Server Pro Enterprise named user licenses with Qubole. We encourage you to connect with your Qubole account team member or contact our support team to enable RStudio Server Pro in your account.

To learn more about the implementation click here

Read our Press Release here

To experience Qubole with your own data, start your journey with a Free Trial.
