Finding Duplicate Strings

This guide demonstrates how to find duplicate strings using RhinoScript.

Problem

Imagine you have an array of strings that contains duplicates. RhinoScript has a method to cull the duplicate strings, but rather than cull them, you would like to find them with a routine that returns the indices of the duplicate items.

Better yet, the routine would return sets of indices, with each set containing the indices at which a particular string occurs. For example, if an array contained “Curve”, “Surface”, “curve”, “surface”, a case-insensitive search should return an array containing [0,2] and [1,3].
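For contrast, here is a minimal sketch of what culling gives you. It assumes the culling method referred to above is Rhino.CullDuplicateStrings and that its optional second argument controls case sensitivity; check the RhinoScript help for the exact signature. The result tells you which strings are unique, but not where the duplicates were located.

```vbscript
Dim arrStrings, arrCulled

arrStrings = Array("Curve", "Surface", "curve", "surface")

' Assumption: passing False requests a case-insensitive comparison
arrCulled = Rhino.CullDuplicateStrings(arrStrings, False)

' The culled array holds only the surviving strings; the original
' positions of the duplicates are lost.
If IsArray(arrCulled) Then
  Call Rhino.Print("Unique strings = " & CStr(UBound(arrCulled) + 1))
End If
```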

Solution

VBScript’s Dictionary is a useful tool for storing associative data, or data in the form of (key, item) pairs. In the problem outlined above, you could use a Dictionary to track each string and the indices where it appears in the array. In other words, use a Dictionary to store (string, indices) pairs.
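As a small illustration of (key, item) pairs, the sketch below uses a Dictionary to count how often each string occurs, ignoring case. The sample strings and variable names are made up for this example; it stores a simple count as the item rather than a list of indices.

```vbscript
Dim objDict, strKey

Set objDict = CreateObject("Scripting.Dictionary")

' The compare mode must be set while the dictionary is still empty
objDict.CompareMode = vbTextCompare

' Store (string, count) pairs
For Each strKey In Array("Curve", "Surface", "curve")
  If objDict.Exists(strKey) Then
    objDict(strKey) = objDict(strKey) + 1
  Else
    objDict.Add strKey, 1
  End If
Next

' Prints "Curve = 2" and "Surface = 1"
For Each strKey In objDict.Keys
  Call Rhino.Print(strKey & " = " & CStr(objDict(strKey)))
Next
```

Because the comparison is case-insensitive, “curve” matches the existing “Curve” key, and the key keeps the spelling under which it was first added.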

To store the indices, you are going to need an array. But growing a VBScript array one item at a time is awkward, because it has to be resized with ReDim Preserve before every addition. So, you might consider using a .NET ArrayList object. A .NET ArrayList is a COM-enabled object, which means it can be used in VBScript.
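As a rough illustration of why an ArrayList is convenient here, the sketch below grows a list without any ReDim calls and then converts it to an ordinary VBScript array. It assumes the .NET Framework is installed and registered for COM on the machine; the variable names are made up for this example.

```vbscript
Dim objList, arrIndices, nIndex

' Assumption: this ProgID is available (requires the .NET Framework)
Set objList = CreateObject("System.Collections.ArrayList")

' Append items; the list resizes itself as needed
objList.Add 0
objList.Add 2
objList.Add 5

' Convert to a regular array that UBound and For Each understand
arrIndices = objList.ToArray()
Call Rhino.Print("Count = " & CStr(objList.Count))
For Each nIndex In arrIndices
  Call Rhino.Print("Index = " & CStr(nIndex))
Next
```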

The following sample function demonstrates how you can use a Dictionary of strings and .NET ArrayList objects to find the indices of duplicate string items in an array.

Function FindDuplicateStrings(arrStrings, blnCase)

  ' Local variables
  Dim objDict, strKey, objItem, arrItems
  Dim i, nCount
  Dim arrResults()

  ' Default return value. Bail out early if the input is not an array.
  FindDuplicateStrings = Null
  If Not IsArray(arrStrings) Then Exit Function

  ' Create a dictionary object and set its compare mode
  Set objDict = CreateObject("Scripting.Dictionary")
  If (blnCase = True) Then
    objDict.CompareMode = vbBinaryCompare
  Else
    objDict.CompareMode = vbTextCompare
  End If

  ' Process input strings. If the string is not in the dictionary,
  ' then add it along with a new, empty ArrayList. In either case,
  ' add the string's index to the dictionary item's ArrayList.
  For i = 0 To UBound(arrStrings)
    strKey = arrStrings(i)
    If Not objDict.Exists(strKey) Then
      objDict.Add strKey, CreateObject("System.Collections.ArrayList")
    End If      
    objDict(strKey).Add(i)
  Next

  ' Find all of the dictionary items that have more than one index,
  ' and add their index arrays to the result array.
  nCount = 0
  arrItems = objDict.Items
  For Each objItem In arrItems
    If (objItem.Count > 1) Then
      ReDim Preserve arrResults(nCount)    
      arrResults(nCount) = objItem.ToArray()
      nCount = nCount + 1
    End If
  Next

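  ' Done! Return the result array only if duplicates were found.
  ' Otherwise keep the Null default, since calling UBound on an
  ' unallocated dynamic array raises a runtime error.
  If (nCount > 0) Then FindDuplicateStrings = arrResults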

End Function

Here is an example of how to use the above function:

Sub TestFindDuplicateStrings

  Dim arrStrings, arrResults, arrItem, nItem, i

  arrStrings = Array("Curve",   _
                     "Surface", _
                     "Mesh",    _
                     "Point",   _
                     "Surface", _
                     "Curve",   _
                     "Curve")

  arrResults = FindDuplicateStrings(arrStrings, False)
  If IsArray(arrResults) Then
    Call Rhino.Print("Duplicate Sets = " & CStr(UBound(arrResults) + 1))
    For i = 0 To UBound(arrResults)
      Call Rhino.Print("Set = " & CStr(i + 1))
      arrItem = arrResults(i)
      For Each nItem In arrItem
        Call Rhino.Print("Item " & CStr(nItem) & " = " & arrStrings(nItem))
      Next
    Next
  End If

End Sub