In my last post I mentioned a project which required me to move documents from one list to another list in a different farm, one folder at a time. Along with that came a requirement to set various field values (metadata) based on patterns in the folder name and/or filename. I needed a reasonably flexible way to accomplish this, considering that the client didn’t actually have a clue as to what they really wanted the rules to be. I already had a command (gl-replacefieldvalues) which let me set the value of an existing field, but it didn’t allow me to do it based on the values of other fields and it had no real filtering capability. So I built a new command called gl-setmetadata which allows me to pass in an XML file containing various rules.

There’s really not much to the code: the bulk of it is just parsing the XML and figuring out what to do. There are two core methods. The first, ProcessFolder, is responsible for getting the collection of items that should be processed using the provided rules. It does this with an SPQuery object, passing in the Query XML node if one is present. The second method, ApplyRule, is called by ProcessFolder once per item for each Rule node found in the XML, and it is responsible for setting any field data based on that rule.

public SetMetaData()
{
    SPParamCollection parameters = new SPParamCollection();
    parameters.Add(new SPParam("url", "url", true, null, new SPNonEmptyValidator(), "Please specify the url to search."));
    parameters.Add(new SPParam("quiet", "q"));
    parameters.Add(new SPParam("test", "t"));
    parameters.Add(new SPParam("inputfile", "input", true, null, new SPFileExistsValidator()));
    parameters.Add(new SPParam("logfile", "log", false, null, new SPDirectoryExistsAndValidFileNameValidator()));
    parameters.Add(new SPParam("recursefolders", "recurse"));

    StringBuilder sb = new StringBuilder();
    sb.Append("\r\n\r\nUpdates list field values based on the rules defined in the provided input file.  Use -test to verify your updates before executing.\r\n\r\nParameters:");
    sb.Append("\r\n\t-url <list folder url>");
    sb.Append("\r\n\t-inputfile <input file containing meta data rules>");
    sb.Append("\r\n\t[-recursefolders]");
    sb.Append("\r\n\t[-quiet]");
    sb.Append("\r\n\t[-test]");
    sb.Append("\r\n\t[-logfile <log file>]");

    Init(parameters, sb.ToString());
}

/// <summary>
/// Gets the help message.
/// </summary>
/// <param name="command">The command.</param>
/// <returns></returns>
public override string GetHelpMessage(string command)
{
    return HelpMessage;
}

/// <summary>
/// Runs the specified command.
/// </summary>
/// <param name="command">The command.</param>
/// <param name="keyValues">The key values.</param>
/// <param name="output">The output.</param>
/// <returns></returns>
public override int Execute(string command, StringDictionary keyValues, out string output)
{
    output = string.Empty;

    string url = Params["url"].Value.TrimEnd('/');
    bool quiet = Params["quiet"].UserTypedIn;
    bool testMode = Params["test"].UserTypedIn;
    string logFile = Params["logfile"].Value;
    XmlDocument metaDataDoc = new XmlDocument();
    string inputFile = Params["inputfile"].Value;
    bool recurseFolders = Params["recursefolders"].UserTypedIn;

    Verbose = !quiet;
    LogFile = logFile;

    metaDataDoc.Load(inputFile);

    using (SPSite site = new SPSite(url))
    using (SPWeb web = site.OpenWeb())
    {
        SPFolder folder = web.GetFolder(url);

        // The null check is unnecessary but it makes me feel better.
        // It has to come first, otherwise folder.Exists would throw before it is ever reached.
        if (folder == null || !folder.Exists)
            throw new SPException("The specified list folder was not found.");

        SPList list = null;
        try
        {
            list = web.Lists[folder.ParentListId];
        }
        catch (ArgumentException)
        {
            // Intentionally ignored: a missing list is reported by the null check below.
        }
        if (list == null) // This should never happen if we found a folder but again, it makes me feel better having it.
            throw new SPException("The specified list was not found.");

        // Process the folder.
        ProcessFolder(folder, list, metaDataDoc, recurseFolders, testMode);
    }
    return OUTPUT_SUCCESS;
}

/// <summary>
/// Processes the folder.
/// </summary>
/// <param name="folder">The folder.</param>
/// <param name="list">The list.</param>
/// <param name="metaDataDoc">The meta data doc.</param>
/// <param name="recurseFolders">if set to <c>true</c> [recurse folders].</param>
/// <param name="testMode">if set to <c>true</c> [test mode].</param>
private static void ProcessFolder(SPFolder folder, SPList list, XmlDocument metaDataDoc, bool recurseFolders, bool testMode)
{
    // If we don't have any rules to process then there's no sense continuing so error out.
    XmlNodeList ruleNodes = metaDataDoc.SelectNodes("//Rule");
    if (ruleNodes.Count == 0)
        throw new SPException("Missing \"Rule\" node(s) which should be a child of the root \"MetaData\" node.");

    // Get a namespace manager so that we can retrieve the Query element if present.
    XmlNamespaceManager nsManager = new XmlNamespaceManager(metaDataDoc.NameTable);
    nsManager.AddNamespace("sp", "http://schemas.microsoft.com/sharepoint/");

    // Look for a Query element
    XmlElement queryElement = (XmlElement)metaDataDoc.SelectSingleNode("//sp:Query", nsManager);
    SPQuery query = new SPQuery();
    if (recurseFolders)
        query.ViewAttributes = "Scope=\"Recursive\"";
    // Set the root folder to query
    query.Folder = folder;
    if (queryElement != null)
    {
        // We have a query element so do an initial filtering using the provided filter
        query.Query = queryElement.OuterXml;
    }
    // If the user didn't provide any query parameters the query is empty (no filtering).
    SPListItemCollection items = list.GetItems(query);

    Log("Beginning processing of {0} items...", items.Count.ToString());
    int modificationCount = 0;

    for (int i = 0; i < items.Count; i++)
    {
        SPListItem item = items[i];
        Log("Progress: Processing item {0}: {1}\r\n", item.ID.ToString(), item["ServerUrl"].ToString());

        if (item.FileSystemObjectType == SPFileSystemObjectType.Folder)
        {
            // Currently not handling folders - no particular reason, I just don't need this ability.
            // Commenting out this block will not hurt anything.
            Log("Progress: Item {0} is a folder - skipping.", item.ID.ToString());
            continue;
        }

        bool modified = false;

        // Loop through each rule element and apply the rule's changes
        foreach (XmlElement ruleElement in ruleNodes)
        {
            if (ApplyRule(item, ruleElement))
                modified = true;
        }

        if (modified)
        {
            // The rules resulted in modified data so update the item if not in test mode.
            if (!testMode)
                item.SystemUpdate();
            modificationCount++;
            Log("Progress: Item ID {0} was modified.", item.ID.ToString());
        }
        else
        {
            // There were no modifications made
            Log("Progress: Item ID {0} was NOT modified.", item.ID.ToString());
        }

        Log("Progress: Finished Processing item {0}\r\n\r\n", item.ID.ToString());
    }
    Log("Finished processing items.  {0} out of {1} items were modified.\r\n", modificationCount.ToString(), items.Count.ToString());
}

/// <summary>
/// Applies the rule.
/// </summary>
/// <param name="item">The item.</param>
/// <param name="ruleElement">The rule element.</param>
/// <returns></returns>
private static bool ApplyRule(SPListItem item, XmlElement ruleElement)
{
    bool modified = false;
    string ruleName = ruleElement.GetAttribute("Name");

    XmlElement matchElement = (XmlElement)ruleElement.SelectSingleNode("Match");
    bool isMatch = true;

    // The match element is optional and just provides some additional regular expression filtering beyond what the Query element can provide
    if (matchElement != null)
    {
        bool isAnd = true;
        if (matchElement.HasAttribute("Op"))
            isAnd = matchElement.GetAttribute("Op").ToLowerInvariant() == "and";
        // For "And" operations we default our starter item to true as everything must come back as true to be a match
        // For "Or" operations we default our starter item to false as we only need one item to come back as true to
        // be a match and we don't want that one item to be the starter item.
        bool fieldMatches = isAnd;

        // If we have a Match element then we need at least one Field element otherwise what's the point.
        if (matchElement.SelectNodes("Field").Count == 0)
            throw new SPException("Missing \"Field\" node(s) which should be a child of the \"Match\" node.");

        foreach (XmlElement fieldElement in matchElement.SelectNodes("Field"))
        {
            // The Field element needs a Name attribute and a value to use as the search pattern string
            if (!fieldElement.HasAttribute("Name"))
                throw new SPException("Missing \"Name\" attribute of \"Field\" node.");
            if (string.IsNullOrEmpty(fieldElement.InnerText.Trim()))
                throw new SPException(string.Format("Missing search pattern string value for match field '{0}'", fieldElement.GetAttribute("Name")));

            // We use the internal name for all field names
            SPField field = item.Fields.GetFieldByInternalName(fieldElement.GetAttribute("Name"));

            // A field with no value comes back as null; treat it as an empty string
            // so the pattern is still evaluated instead of throwing a NullReferenceException.
            object matchValue = item[field.Id];
            string matchText = matchValue == null ? string.Empty : matchValue.ToString();

            // Determine if we have a match for this field.
            bool fieldMatch = Regex.IsMatch(matchText, fieldElement.InnerText);

            // Apply the match results to our fieldMatches variable to track the overall result
            if (isAnd)
                fieldMatches = fieldMatches && fieldMatch;
            else
                fieldMatches = fieldMatches || fieldMatch;
        }
        // Set the overall result
        isMatch = fieldMatches;
    }
    if (!isMatch)
    {
        Log("Progress: Unable to find match for rule '{0}'.", ruleName);
        return modified; // No match so evaluate the next rule
    }
    else
        Log("Progress: Found match for rule '{0}'.", ruleName);

    // Every Rule element must have one and only one Set element
    XmlElement setElement = (XmlElement) ruleElement.SelectSingleNode("Set");
    if (setElement == null)
        throw new SPException("Missing \"Set\" node.");

    // Every Set element must have at least one Field element
    if (setElement.SelectNodes("Field").Count == 0)
        throw new SPException("Missing \"Field\" node(s) which should be a child of the \"Set\" node.");

    // Loop through all the Field elements and apply the indicated values
    foreach (XmlElement fieldElement in setElement.SelectNodes("Field"))
    {
        // Every Field element must have a Name attribute - the value can be empty which is the same as setting the field to null.
        if (!fieldElement.HasAttribute("Name"))
            throw new SPException("Missing \"Name\" attribute of \"Field\" node.");

        string fieldName = fieldElement.GetAttribute("Name");
        string fieldData = fieldElement.InnerText;
        SPField field = item.Fields.GetFieldByInternalName(fieldName);

        if (field.ReadOnlyField)
        {
            // We can't update read-only fields so log a warning and move on.
            Log("WARNING: Field '{0}' is read only and will not be updated.", EventLogEntryType.Warning, field.InternalName);
            continue;
        }

        if (field.Type == SPFieldType.Computed)
        {
            // We can't update computed fields so log a warning and move on.
            Log("WARNING: Field '{0}' is a computed column and will not be updated.", EventLogEntryType.Warning, field.InternalName);
            continue;
        }

        // Read the current value once; it is null when the field has no value.
        object existingValue = item[field.Id];
        string existingText = existingValue == null ? null : existingValue.ToString();

        // If a SearchPattern attribute was provided then do a regular expression replace instead of just a straight up set.
        if (fieldElement.HasAttribute("SearchPattern"))
        {
            if (string.IsNullOrEmpty(fieldElement.GetAttribute("SearchPattern")))
                throw new SPException(string.Format("SearchPattern attribute of Field node '{0}' is empty.", fieldName));

            if (existingText == null)
            {
                // We can't do a regex on a null value so move on
                Log("Progress: Value of field '{0}' is 'null' - no replace operation will be performed.", field.InternalName);
                continue;
            }
            fieldData = Regex.Replace(existingText, fieldElement.GetAttribute("SearchPattern"), fieldData);
        }
        // If the fieldData is empty then make sure it's set to null
        if (string.IsNullOrEmpty(fieldData))
            fieldData = null;

        // Comparing the text values also treats "null to null" as no change.
        if (existingText != fieldData)
        {
            // The modified field data is different from the source so go ahead and apply the change
            Log("Progress: Applying modification to field '{0}' per rule '{1}'", fieldName, ruleName);
            if (field.Type == SPFieldType.URL)
                item[field.Id] = new SPFieldUrlValue(fieldData);
            else
                item[field.Id] = fieldData;

            modified = true;
        }
        else
        {
            Log("Progress: No change required for field '{0}' per rule '{1}'.", fieldName, ruleName);
        }
    }
    if (!modified)
        Log("Progress: Set rules resulted in no change from existing data for rule '{0}'.", ruleName);

    return modified;
}
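
To make the AND/OR handling in ApplyRule a little easier to follow, here is a minimal stand-alone sketch of just that piece, with no SharePoint dependency. The class name MatchSketch and the dictionary that stands in for the list item are my own inventions for illustration and are not part of the command. The sketch also assumes a missing or null field value is treated as an empty string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-alone sketch; not part of the gl-setmetadata source.
public static class MatchSketch
{
    // fieldValues: stand-in for the list item (internal field name -> current value).
    // patterns:    stand-in for the Field children of a Match element (field name -> regex).
    // isAnd:       true for Op="And" (the default), false for Op="Or".
    public static bool IsMatch(IDictionary<string, string> fieldValues,
                               IEnumerable<KeyValuePair<string, string>> patterns,
                               bool isAnd)
    {
        // AND starts out true (every field must match);
        // OR starts out false (a single matching field is enough).
        bool result = isAnd;

        foreach (KeyValuePair<string, string> pattern in patterns)
        {
            string value;
            // Assumption: a missing or null value is treated as an empty string.
            if (!fieldValues.TryGetValue(pattern.Key, out value) || value == null)
                value = string.Empty;

            bool fieldMatch = Regex.IsMatch(value, pattern.Value);
            result = isAnd ? (result && fieldMatch) : (result || fieldMatch);
        }
        return result;
    }
}
```

With a FileLeafRef of "Budget Eng.doc", a Title of "Budget", and the same English pattern applied to both fields, this returns true for an "Or" match and false for an "And" match, because only the filename contains the language marker.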

The core thing to understand with this command is the structure of the input file, and this is where things get a little more complicated. I don’t currently have an XSD for this (I may create one to aid in validation but I just didn’t have the time). So, failing a good XSD, here’s a reasonably detailed example XML file with comments:

<MetaData>
    <!-- Query is an optional CAML element and is used to filter the items that are to be considered.  Anything you can do with a standard CAML Query element you can put here (be sure to include the namespace attribute) -->
    <Query xmlns="http://schemas.microsoft.com/sharepoint/">
        <Where>
            <BeginsWith>
                <FieldRef Name="FileRef" />
                <Value Type="string">/Documents/Sub-Folder1/</Value>
            </BeginsWith>
        </Where>
    </Query>
    <!-- There must be at least one Rule element - multiple elements are processed in the order they appear -->
    <!-- The Rule element may contain an optional Name attribute which is a simple label used for logging -->
    <Rule Name="Set Content Type">
        <!-- Every Rule element must have one and only one Set element -->
        <Set>
            <!-- The Set element must contain one or more Field elements -->
            <!-- The Field element must have a Name attribute which corresponds to the field's internal name -->
            <!-- The value of the Field element is what will be set to the list item for that field -->
            <!-- A Field element may contain an optional SearchPattern attribute which can be used to update an existing value via a Regex.Replace() call -->
            <!-- If no SearchPattern attribute is present then existing data is ignored -->
            <Field Name="ContentType">Dublin Core Columns</Field>
        </Set>
    </Rule>
    <Rule Name="Set English Language">
        <!-- A Rule element can contain one optional Match element which is used to provide regular expression based filtering -->
        <!-- The Match element can contain an optional Op attribute used to indicate whether the match logic is "AND" or "OR" (default is "AND" if not present) -->
        <Match Op="OR">
            <!-- The Field element must have a Name attribute which corresponds to the field's internal name -->
            <!-- The value of the Field element is used in a Regex.IsMatch() call to determine whether the item should be processed -->
            <Field Name="FileLeafRef">(?i:.* Eng.*|.*ENGLISH ONLY.*|.*-EN.*)</Field>
            <Field Name="Title">(?i:.* Eng.*|.*ENGLISH ONLY.*|.*-EN.*)</Field>
        </Match>
        <Set>
            <Field Name="FileLeafRef" SearchPattern="(?i: -?Eng|ENGLISH ONLY)|-EN">-English</Field>
            <Field Name="Language">English</Field>
        </Set>
    </Rule>
    <Rule Name="Set Korean Language">
        <Match Op="And">
            <Field Name="FileLeafRef">(?i:.* Kor.*|.*KOREAN ONLY.*|.*-KO.*)</Field>
        </Match>
        <Set>
            <Field Name="FileLeafRef" SearchPattern="(?i: -?Kor|KOREAN ONLY)|-KO">-Korean</Field>
            <Field Name="Language">Korean</Field>
        </Set>
    </Rule>
</MetaData>
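
Since there is no XSD, one way to catch structural mistakes before touching a list is a small pre-flight check like the sketch below. It mirrors the checks the command makes as it runs (at least one Rule, one Set per Rule, named Field elements, non-empty match patterns). The class RulesFileCheck is hypothetical and not part of the command. It is also a bit stricter than the command, because it only looks for Rule elements directly under the MetaData root and it complains about more than one Match element.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; it is not part of the gl-setmetadata source.
public static class RulesFileCheck
{
    // Returns a description of every structural problem found.
    // An empty list means the file has the shape described in the comments above.
    public static List<string> Validate(XmlDocument doc)
    {
        List<string> problems = new List<string>();

        // Assumption: rules sit directly under the MetaData root (the command itself searches for //Rule).
        XmlNodeList rules = doc.SelectNodes("/MetaData/Rule");
        if (rules.Count == 0)
        {
            problems.Add("No Rule elements found under the MetaData root.");
            return problems;
        }

        foreach (XmlElement rule in rules)
        {
            // Name is optional, so fall back to a placeholder for the messages.
            string label = rule.HasAttribute("Name") ? rule.GetAttribute("Name") : "(unnamed)";

            // Exactly one Set element, holding at least one Field.
            XmlNodeList sets = rule.SelectNodes("Set");
            if (sets.Count != 1)
                problems.Add(string.Format("Rule '{0}': expected exactly one Set element, found {1}.", label, sets.Count));
            else if (sets[0].SelectNodes("Field").Count == 0)
                problems.Add(string.Format("Rule '{0}': Set has no Field elements.", label));

            foreach (XmlElement field in rule.SelectNodes("Set/Field"))
            {
                if (!field.HasAttribute("Name"))
                    problems.Add(string.Format("Rule '{0}': a Set Field is missing its Name attribute.", label));
            }

            // At most one Match element; if present it needs named Fields with non-empty patterns.
            XmlNodeList matches = rule.SelectNodes("Match");
            if (matches.Count > 1)
                problems.Add(string.Format("Rule '{0}': more than one Match element.", label));

            foreach (XmlElement match in matches)
            {
                XmlNodeList matchFields = match.SelectNodes("Field");
                if (matchFields.Count == 0)
                    problems.Add(string.Format("Rule '{0}': Match has no Field elements.", label));

                foreach (XmlElement field in matchFields)
                {
                    if (!field.HasAttribute("Name"))
                        problems.Add(string.Format("Rule '{0}': a Match Field is missing its Name attribute.", label));
                    if (field.InnerText.Trim().Length == 0)
                        problems.Add(string.Format("Rule '{0}': a Match Field has an empty search pattern.", label));
                }
            }
        }
        return problems;
    }
}
```

Loading the file with XmlDocument.Load and printing whatever Validate returns would be enough for a quick sanity check. It says nothing about whether the field names exist on the target list, which only the command itself can tell you.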

Note that I don’t claim to be a regular expression expert. I’ve not extensively tested the regular expressions in the examples above, and I know that they have issues with more complex data, but for the purpose of a simple demonstration they do well enough. The example above will return all documents whose path begins with “/Documents/Sub-Folder1/” and will set the content type of every item to “Dublin Core Columns”. It will then standardize the name of the file (FileLeafRef) so that the language marker becomes -English or -Korean, using information in the filename (or, for English, the title), and it will also set the Language field to English or Korean using this same information.
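
To see what the SearchPattern replace does to a filename, here is a small stand-alone sketch that applies the pattern and replacement text from the “Set English Language” rule with Regex.Replace, which is the same call the command uses. The class and method names (RenameSketch, Normalize) and the sample filenames are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-alone sketch; not part of the gl-setmetadata source.
public static class RenameSketch
{
    // Pattern and replacement text copied from the "Set English Language" rule.
    // Only the part inside (?i: ... ) is case-insensitive; the trailing -EN is case-sensitive.
    private const string SearchPattern = "(?i: -?Eng|ENGLISH ONLY)|-EN";
    private const string Replacement = "-English";

    public static string Normalize(string fileName)
    {
        return Regex.Replace(fileName, SearchPattern, Replacement);
    }
}
```

With this sketch, "Budget Eng.doc" and "Budget-EN.doc" both become "Budget-English.doc", and "Budget-English.doc" is left unchanged, so running the rule a second time is harmless. It also shows one of the issues mentioned above: "Budget ENGLISH ONLY.doc" becomes "Budget-EnglishLISH ONLY.doc", because the " -?Eng" alternative matches the start of " ENGLISH" before the "ENGLISH ONLY" alternative gets a chance.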

Probably the most important thing to remember when constructing your XML is that you need to use the internal field name and not the display name.

You can also do additional filtering from the command line: leave off -recursefolders to stay within a single folder, or point -url at a sub-folder instead of the list’s root folder. The syntax of the command can be seen below:

C:\>stsadm -help gl-setmetadata

stsadm -o gl-setmetadata

Updates list field values based on the rules defined in the provided input file.  Use -test to verify your updates before executing.

Parameters:
        -url <list folder url>
        -inputfile <input file containing meta data rules>
        [-recursefolders]
        [-quiet]
        [-test]
        [-logfile <log file>]

Here’s an example of how you would execute this command using the XML shown above as an input:

stsadm -o gl-setmetadata -url http://portal/documents -inputfile c:\metadata.xml -recursefolders -logfile c:\metadata.log

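Like many of my commands that do batch updating, you can run this command in a test mode by passing in a “-test” parameter. In test mode the rules are still evaluated and the results logged, but the items are never actually updated.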