Plugging in Presto UDFs

March 4, 2015 (updated September 11, 2021)

Presto is a great query engine for a variety of SQL workloads. We’ve been offering Presto-as-a-Service for many months now, and a frequent question that comes up is:

“How can I plug in custom user-defined functions in Presto?”

In this blog post, we will answer this very question. We’ve created a Presto UDF project on GitHub that simplifies this process considerably. Here’s the layout of the code for ease of exposition.

└── com
    └── facebook
        └── presto
            └── udfs
                ├── aggregation
                │   ├── …
                │   └── state
                │       ├── …
                │       ├── …
                │       └── …
                └── scalar
                    ├── …
                    └── …

When the Presto server launches, it asks the PluginManager to find and load plugins under the $PRESTO_HOME/plugin/ directory. Plugins can contribute a ConnectorFactory, a FunctionFactory, and Types, among other things. For this blog post, we’re interested in FunctionFactory. In this setup, UdfPlugin supplies a FunctionFactory that contains all the UDFs in the project.
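The plugin entry point can be sketched roughly as follows. This is an illustration only: the `UdfFunctionFactory` name is hypothetical, and the exact packages of `Plugin` and `FunctionFactory` depend on the Presto version you build against.

```java
// Hypothetical sketch of the plugin entry point. In the Presto SPI of this
// era, Plugin exposes getServices(Class<T>), and the PluginManager queries
// it for each service type it knows about. Package names vary by version.
import com.facebook.presto.spi.Plugin;

import java.util.Collections;
import java.util.List;

public class UdfPlugin implements Plugin {
    @Override
    public <T> List<T> getServices(Class<T> type) {
        // When the PluginManager asks for FunctionFactory implementations,
        // hand back the factory that enumerates our UDFs.
        // (FunctionFactory's package differs across Presto versions;
        // UdfFunctionFactory is a placeholder for the project's factory.)
        if (type.getSimpleName().equals("FunctionFactory")) {
            return Collections.singletonList(type.cast(new UdfFunctionFactory()));
        }
        return Collections.emptyList();
    }
}
```

The key idea is simply that the server discovers UDFs by asking each plugin for `FunctionFactory` services, so all our factory has to do is return the function list.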

User-defined scalar functions go under the scalar subdirectory and user-defined aggregates go under the aggregation subdirectory. You can create a jar file using mvn package and place it under the $PRESTO_HOME/plugin/udfs/ directory. Restart the Presto coordinator and you should find the UDFs available for use. In a cluster setup, repeat these steps on all worker nodes. You should then be able to query using the radians UDF that is part of the project.
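The build-and-install steps above look roughly like this (the jar name is a placeholder; use whatever `mvn package` produces in your checkout):

```shell
# Build the UDF jar and install it into the Presto plugin directory.
# Adjust PRESTO_HOME and the jar name to match your setup.
mvn package
mkdir -p "$PRESTO_HOME/plugin/udfs"
cp target/presto-udfs-*.jar "$PRESTO_HOME/plugin/udfs/"
```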


presto:default> select radians(180);


Now, a little explanation of the internals. The UdfPlugin class provides the UdfFactory to the PrestoServer. UdfFactory peeks into its own jar and iterates over all classes to find UDFs. It looks for scalar functions in classes under the com.facebook.presto.udfs.scalar namespace (and similarly for UDAFs under aggregation). An alternate (and simpler) implementation of UdfFactory could iterate over a static list of classes. But that’s no fun now, is it 🙂
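The class-discovery step can be sketched in plain Java: given the entry names of the plugin’s own jar (e.g. from java.util.jar.JarFile), keep only the classes under the UDF namespace. The helper below is an illustration of the idea, not the factory’s actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class UdfScanner {
    // Given jar entry names, return the fully qualified names of the
    // top-level classes that live under the given package.
    public static List<String> classNamesUnder(List<String> jarEntries, String packagePrefix) {
        String pathPrefix = packagePrefix.replace('.', '/') + "/";
        List<String> result = new ArrayList<>();
        for (String entry : jarEntries) {
            // Skip non-class entries and inner classes (names containing '$')
            if (entry.startsWith(pathPrefix) && entry.endsWith(".class") && !entry.contains("$")) {
                String className = entry
                        .substring(0, entry.length() - ".class".length())
                        .replace('/', '.');
                result.add(className);
            }
        }
        return result;
    }
}
```

With the project’s layout, scanning for the prefix `com.facebook.presto.udfs.scalar` would pick out the scalar UDF classes, which the factory can then load and register.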

In the Qubole world, Presto clusters are ephemeral: they are brought up when required, auto-scale, and shut down when not in use. Therefore, you’ll need to install the UDF jars every time a cluster is launched. The node bootstrap functionality allows you to run arbitrary commands when the Presto cluster comes up. Your script can download the jars from an accessible location (e.g. an S3 bucket) and copy them to the /usr/lib/presto/plugin/udfs/ directory (be sure to create this directory first). It can then restart the Presto worker using this command:
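A node bootstrap script along these lines would do the job. The bucket and jar names below are placeholders, and the example assumes the AWS CLI is available on the node:

```shell
# Hypothetical node bootstrap script: install the UDF jar and restart Presto.
set -e
UDF_DIR=/usr/lib/presto/plugin/udfs
mkdir -p "$UDF_DIR"
# Fetch the UDF jar from an accessible location, e.g. an S3 bucket
aws s3 cp s3://my-bucket/presto-udfs.jar "$UDF_DIR/"
# Restart the worker so the new plugin is picked up
/usr/lib/presto/bin/presto server restart
```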


/usr/lib/presto/bin/presto server restart


For details on how to write Presto UDFs, take a look at the Presto documentation and the examples in the codebase.
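As a flavor of what a scalar UDF looks like, here is a sketch modeled on the radians example. The annotations (`@ScalarFunction`, `@Description`, `@SqlType`) come from the Presto SPI, and their package names have moved between Presto versions, so check the version you build against; the imports are omitted here for that reason.

```java
// Sketch of a scalar UDF, modeled on the project's radians example.
// Imports of @Description, @ScalarFunction, @SqlType, and StandardTypes
// are omitted: their packages depend on the Presto version in use.
public final class RadiansFunction {
    private RadiansFunction() {}

    @Description("converts an angle in degrees to radians")
    @ScalarFunction("radians")
    @SqlType(StandardTypes.DOUBLE)
    public static double radians(@SqlType(StandardTypes.DOUBLE) double degrees) {
        // Presto calls this static method for each row
        return Math.toRadians(degrees);
    }
}
```

The annotations tell Presto the SQL name and the argument/return types; the method body is plain Java invoked per row.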

We hope you find this little project useful and we welcome ideas and pull requests! Please send us a note at [email protected] if you’d like to talk to us about it.
