To implement either of the two options I suggested to Martin, you do not need to know the rules for identifying NRT or definitive ancillary data.
The rules are quite complex (overly so, if I were to have an opinion on the subject...), but the underlying goal is to select the 'best available'. Since
(in total for the soup-to-nuts affair) there are a variety of primary and secondary sources, each having its own schedule of availability and applicability,
knowing which set is 'best available' at any given time (but particularly in NRT) is a wee bit of a monster. We've tamed that monster, so all you should
need to do is trust the result of the getanc.py script.
Your issue is not identifying the files but doing so efficiently, and having every machine in a cluster act on its own is not
the most efficient approach. Reread my response to Martin and see if you can make either of the two options work for your environment.
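
To illustrate the general idea of not having every node act alone, here is a rough sketch in Python. It is not necessarily either of the two options I gave Martin, just one way a cluster could share a single answer per scene. Everything named here is hypothetical: `cached_anc_lookup`, the `cache_dir` on a shared filesystem, the JSON cache format, and the `lookup` callable, which stands in for however you choose to run getanc.py and collect its result as a dictionary.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict


def cached_anc_lookup(
    scene: str,
    cache_dir: Path,
    lookup: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Return the ancillary file list for a scene, asking `lookup` only on a cache miss.

    `lookup` is a hypothetical stand-in for whatever wraps getanc.py in your
    environment; `cache_dir` is assumed to be visible to every node.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / (Path(scene).name + ".anc.json")

    # Another node may already have done the work for this scene.
    if cache_file.exists():
        return json.loads(cache_file.read_text())

    result = lookup(scene)

    # Write to a temporary file and rename it into place, so a node reading
    # the cache never sees a half-written entry.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(result, handle)
    os.replace(tmp_name, cache_file)
    return result
```

Two nodes that miss at the same moment will both run the lookup, but the rename keeps the cached entry intact either way. Keep in mind that 'best available' changes over time, particularly in NRT, so a cache like this would need some expiry rule before you relied on it for anything but definitive processing.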