Did you know that there's a null-safe operator in Azure Data Factory's expression syntax? No? Well, now you do. Watch the video to see how you can safely reference an activity output that might not always exist.
In this video, we'll take a look at how to safely reference a nullable activity output in Synapse Pipelines and Azure Data Factory. So I'm in Synapse Pipelines here which, for those that don't know, is essentially Azure Data Factory embedded within Azure Synapse Analytics. We've got a simple pipeline here with just the two activities that are required to show what we want to show in this video with regards to nullable activity outputs.
So the first activity is just Get Metadata. And what we want to do here is get details about a folder that's in my Data Lake. This folder is referenced by the dataset "MyFolderThatMightNotExist". It takes a few parameters, which are year, month, and day. And what we want to return from this activity is a list of the "childItems", i.e. the files and directories within the folder, and whether or not this directory actually exists - so that's just a flag.
Let's just quickly look at the structure of the Data Lake. We've got the year, month, day taxonomy here, and we've got a "day = 14", which has a file within it, but that's the only directory that we have within the month directory.
And then the second activity here is where we're just setting a variable. And this is just to illustrate trying to get the output of this "Get Metadata" activity and set it as a variable. So the expression here is very simple. We're referencing an activity output just like you do with any activity - you put the name of the activity within the parentheses and you do ".output" and then "." whatever the property name is. In this case it's "childItems".
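For readers following along without the video, the expression described here would look roughly like the line below. The activity name 'Get Metadata' is an assumption, since the real name is only visible on screen; substitute whatever the activity is called in your own pipeline.

```
@activity('Get Metadata').output.childItems
```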
Debugging the pipeline - existent directory
If I put a breakpoint on this first activity, we'll see exactly what this activity produces. So I go debug. Then we see we've got a new pipeline debug going. And if I refresh, this is quite a quick activity. So we can see we've got the input and output boxes here. If we click on the output, then we get the things that we've asked for: we've got "exists" is true. That directory does exist, we know it does, we just saw it. And the "childItems" is just the one file, and the type is file. Okay, that's great. But that's because we have the file there.
Debugging the pipeline - non-existent directory
Now, if we change this to 15 and debug again, then refresh, that again returned successfully. It returned successfully even though the directory is not there, because we've added this "exists" field: this will always return and tell us whether or not the directory exists. But note that we haven't got the "childItems" here. Because the directory doesn't exist, it doesn't have any "childItems" to return. And instead of returning the property with an empty array, it just doesn't return the property at all.
Error - referencing an output property that does not exist
So why would this be an issue? If we un-debug that - remember, this is the expression that we're using for the "Set Variable" activity. We're referencing the activity and saying "go to the activity's output, and then give me the childItems from it". Let's see what happens when we hit debug.
So we've got a failure here, and what's the failure? It says the expression "bla de blah" cannot be evaluated because property "childItems" does not exist. The available properties are: "exists", and if there were other ones, it would tell us the rest of them. So this is a user error. Okay. Really, what I want is if there are no files then give me whatever, whether it would be null or an empty array, give me whatever I would expect that to default to.
Using a conditional statement to ensure null-handling
Now, one thing you might think is, "okay, we'll just handle this separately". So what we can do is use an "if" expression to see whether the previous activity's output contains the property that we're after. So we're saying, "Does this object contain the property 'childItems'? If it does, return the 'childItems', if it doesn't, return 'null'." And if we try that again now, after updating that...
Brilliant - it's succeeded. And that null has been converted - because it's an array variable - it's been converted into an empty array. So we have no files. That's what we'd expect. We don't have a directory for that date. But having to wrap this in an 'if' condition is a bit cumbersome.
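As a sketch of the conditional described in this section, the expression could be written along these lines, using the built-in `if` and `contains` functions. The activity name 'Get Metadata' is again an assumption standing in for whatever the activity is called in the pipeline.

```
@if(contains(activity('Get Metadata').output, 'childItems'), activity('Get Metadata').output.childItems, null)
```

Note that the activity reference has to be repeated, once for the check and once for the value, which is part of what makes this form cumbersome.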
Using the null-safe operator
Is there a way that we can essentially shorten the expression down a bit? So if we go back to the original expression, I'm going to add the magic sauce here. Sorry - we've got an extra bracket on the end there. And the magic sauce is this question mark. And this is a "null-safe operator". So what this means is: if that property exists, return me that property; if it doesn't, return null. And that's just by putting a question mark on the end of "output".
So if we finish here again, debug again, succeeded - again, we have an empty array. And that's a really useful bit of syntax to know, and at the time of recording this, it's not found anywhere in the documentation. I hope that you will find this useful for things like deciding whether to iterate through things, rather than having the whole pipeline break. So you can get an array of files, for example, and use it in a subsequent activity to iterate through. And if there are no files, then that's fine. But it's better using that syntax with the null-safe operator than using a conditional every time you want to check whether a property actually exists in the output of a previous activity.
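Written out, the null-safe version of the original expression would look something like this, with the question mark placed directly after "output". As before, the activity name 'Get Metadata' is an assumption for illustration.

```
@activity('Get Metadata').output?.childItems
```

When "childItems" is missing from the output, this evaluates to null instead of raising the "property does not exist" error.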
I hope that helps.