Nathan's Blog computer stuff

September 14, 2012

Notes on Mysql UDFs

Filed under: mysql — Tags: , , , — @ 8:02 am


This is a collection of “gotchas” I encountered while exploring MySQL udf extensions, areas where maybe the documentation was  present but easy to miss, was vague, unclear, or downright missing.


1)   string arguments are not guaranteed to be NULL terminated.  This is fine, because in your function you will know the lengths of the arguments from args->lengths .   However, many C functions that accept strings will require null termination, so if you need to call string functions with the arguments passed, you’re going to have to allocate buffers large enough to hold the argument (plus terminating null).  A logical place to do this is in the init function and store these buffer(s) in a data structure pointed to by initid->ptr, then copy the args to that buffer.  The args->lengths[i] value within init gives you the maximum length that the string will be – the max length within the result set – so you can use this to size your buffers guaranteeing it will be big enough for the largest value in the result set.  The args->lengths[i] from within your main function will give you the length of the argument for that specific call, so you can use this to know how many bytes to copy into your buffer.  You can reuse the buffers for each call of the main function, then free them in the deinit function.  Within the main function, you will need to copy the values into the buffer(s) and ensure that they are null terminated, remembering to zero them within main() so that they’re “reset” between calls (at least for non-statics – see below).

2)   The init (and deinit) functions are only called ONCE PER RESULT SET.  The main function is called FOR EVERY ROW IN THE RESULT SET.  initid->ptr points to the same place FOR EVERY ROW IN THE RESULT SET.

3)   Any arguments that are present in args->args at init are considered CONSTANT values, i.e., ‘some constant string value’ or 2+2 passed in when the function was called, and any dynamic values, such as sometable.somecolumn will not be present here.  Within init, args->args[i] will be NULL for any arguments that contain dynamic data.  To require that a parameter be constant, you can check for null in args->args[i] in the init function, and if null, you can then set an error message and return non-zero.

4)   If you’re going to give custom error messages, you must do it from within the init function.  Errors that happen within the main function can be signaled by setting *error to 1, but this does not halt the query or return any indication to the caller that a problem occurred, other than it returns a NULL for every subsequent row processed (apparently it stops calling the function).  What this boils down to is that you can only do sanity checks with useful error messages on the values of function parameters that are constant, since these are the only ones that will be available to the init function. So I guess the lessons here are: check types in init with args->arg_type[i] rigorously (or at least coerce them by setting it), require constants where that makes sense, and be as flexible yet careful as possible handling the dynamic parameters inside the main function as there is no graceful way to return an error there.


Here is a link to a UDF I created for writing error messages to syslog().

« Newer PostsOlder Posts »

Powered by WordPress