web
statistics
  • Finding Duplicate Files Recursively

    I was cleaning up a folder with files duplicated in subdirectories. Here is a simple Powershell code that identifies duplications recursively.

    There might be a shorter form, but I think this is as I short I can get without becoming very cryptic.

    function FindDuplicates{
     # Receives a list of ArrayLists
     # iteratively prints the full path of each ArrayList item
     # where the ArrayList contains multiple paths (i.e duplicates)
    Process
    {
     if ($_.Count -gt 1)
           {foreach($d in $_){$d.FullName}} }
    }
    
    function GetFilesInBuckets($dir,$filter = "*.*")
    {
     # Creates a hashtable of arraylists (1 entry per unique file name)
     # Returns the list of arraylists only (not the entire hashtable)
      $files = @{}
     foreach ($fl in (gci $dir -filter $filter -r))
      {
        if(!$files.ContainsKey($fl.Name))
        {
        $files[$fl.Name] = New-Object System.Collections.ArrayList
        }
        [void]$files[$fl.Name].Add($fl)
      }
     $files.Values
    }
    
    GetFilesInBuckets "C:\MyRootFolder" -f "*.txt"
        |FindDuplicates

    Additional filtering methods will look similar to as FindDuplicates, and can be piped at the end of the last line of code:

    GetFilesInBuckets "C:\MyRootFolder" -f "*.txt"|
        FindDuplicates|
        FilterOutOlder

    Leave a comment

  • Nant And Xml Programming

    I have used NAnt (the .Net port of the Java build tool Ant) in most of the .Net/VB6 projects I have worked on. The decision for using this tool was not always mine, but I guess I did not find (nor look for) a better alternative.

    As these projects grew, so did their build process and the complexity the build scripts. NAnt was flexible enough to cope with this. If a suitable task does not exist, you can download NAntContrib tasks, other tasks from the web, or you can create yours. If a task is too simple a language construct for the job, you can call an executable and pass the right parameter, create a .Net in-line function, or call a Powershell script.

    One of the systems I worked on was particularly complex but very well designed. So was its build script. Common functionality was not duplicated, but kept as targets or tasks in separate NAnt files, that where included in the main NAnt script.

    When reusing these blocks of code, you usually have to pass context-specific parameters to it. You do this by setting properties before invoking the NAnt target. For example:

    ...
    <property name="SpreadsheetPath" value="${Config} + ${CurrentProject}" />
    <call target="ProcessSpreadsheet" />
    ...

    This syntax becomes more awkward when more properties need to be passed to the target.

    Worse, the lack of explicit input and outputs to a NAnt target make this syntax a bad practice: code calling a target needs to know the name of the properties the target needs set, and the name of the property containing the expected output of the target. Let me picture this with an example:

    (Known inputs: SpreadsheetPath and BusinessArea)
    <property name="SpreadsheetPath"
        value="${Config} + ${CurrentProject}" />
    <property name="BusinessArea" value='Security' />
    
    <call target="GenerateCodeFromSpreadsheet" />
    
    (Known output: NumberOfGeneratedFiles)
    <property name="TotalCodeFiles" value="${NumberOfGeneratedFiles}" />

    The problem: When XML is not enough

    I don’t have the intention (nor the knowledge!) to enter the XML vs DSL discussion but I see a real disadvantage in using Ant/NAnt for big and complex systems. Generally speaking is not a good idea to program in an XML based DSL (like NAnt). It makes sense to declare targets that know what to do and how to do it, but as the code base grows and scripts are modularised for better reusability, then is only a matter of time until you find yourself programming on XML.

    The XML/DSL discussion is a very interesting one, and so is the classification of programming paradigms ( what is NAnt programming. Goal-Oriented? Declarative? Or simply a markup language?), but I’m not interested on them now. I think I’m done with Ant/NAnt…I’ll stop trying to extract oil from stones, as Spaniards say. There’s only so much you can ask xml-based languages, but soon you’ll need a proper programming language, or you’ll become an xmlholic.

    If you know Maven for Java solutions (NMaven is only an idea at the time of writing), you know this is an XML based tool too so it will suffer the same problems I just mentioned. Maven is, however, more flexible thanks to the life cycle structure and the ability of recursively build projects (as specified in their Project Object Model xml files) in a folder structure. This benefit is not enough to marry xml programming for life…​one size doesn’t fit all, in fact in terms of build process; standardisation is the exception rather than the rule. I still need to look into Gradle, but it may be good answer to Ant/Maven approaches.

    The solution:

    I don’t know really. Sorry if you’ve read this far, but I just wanted to brain dump my discontent with XML programming. Maybe is a useful question for us developers using or building frameworks on a daily basis. Where should we allow for a more multi-purpose, less templated approach? Extensibility is not always the answer. Some frameworks like NAnt are very extensible, but offer very little at their core. The extended functionality is more useful than the basic framework itself.

    So no, I don’t know the solution for avoiding XML programming (NAnt or Maven) for the time being I’m moving towards a build script fully written in PowerShell, with a flavour of Maven’s life cycles. I will write in detail about this later, it is not rocket science and is not a full solution, but is a start, just like this blog entry.

    Leave a comment