The problem

As part of my new role at Fujitsu here in Wellington I’ve been running a whole load of reports from the SAN I’ve been assigned to. I’m not going to go into any detail about the SAN itself other than to say it’s from HP (for obvious reasons) but the reports I get from the reporting tools aren’t that useful in themselves. Many tens of thousands of lines of reporting data is hard to read as it is and unfortunately these particular reports aren’t uniformly arranged i.e. there is a header at the top of the file, a set of performance data, a blank line and then another set of headers followed by more performance data. This pattern repeats for the entire file.

Microsoft Excel is ok at importing data from CSV files but can’t handle these files too well as there’s no uniform pattern that it seems to be able to match (it matches to the human eye but I guess not to Excel’s internal algorithms or something).

The solution

Anyway, to import these files into Excel I needed to do 2 things which seem pretty simple but that you really don’t want to do when the files are as big as these ones – 40MB+ of text. Yuck. Please note that the first step below works in both Excel for Windows and Excel for Mac but the second part, because it contains VBA script does not work in Excel for Mac (VBA isn’t supported in Excel 2008 for Mac). See the criticism section of this Wikipedia page for information on that.

Job 1: Delete all the empty rows. Thankfully this one’s easy. Select the entire worksheet or range of cells you’re working with, hit F5, select Special, check the box labelled Blanks and hit OK. This will select all the blank cells in the selection. From here select Edit, Delete and when prompted, select Entire Row. Alt-E then D will work in Excel 2007 as it doesn’t have an Edit menu as such. This will do exactly what it says and delete the entire row containing the selected blank cells. Be careful doing this with non-tabular data.

Job 2: Delete all the headers except the top row. Unfortunately I haven’t made this work without deleting the top row too but if you copy the top headers to a new sheet first you can easily paste them back later. In Excel 2007 hit the Developer tab (it can be enabled in Excel options if it’s not showing), click Visual Basic and add a new Module by right-clicking the top pane on the left. Paste the following code into the module that gets created:

The script

Sub DeleteRows()
    Dim Rg As Range
    On Error Resume Next
    For i = ActiveSheet.UsedRange.Rows.Count To 1 Step -1
        Set Rg = Nothing
        Set Rg = Rows(i).Find("Time")
        If Not Rg Is Nothing Then Rows(i).Delete
    Next
End Sub

Change “Time” to a text pattern that will match the rows you want to delete. In the data I’m working on all the header rows have a “Time” column so this works for me.

From here it’s just a case of hitting the Run button and all the rows containing the word “Time” will be removed. Please note that in Excel 2007 you’ll need to save the file as an Excel Macro-Enabled Workbook once a file contains macros or VBA code.

Handy and much faster than deleting them manually.