My previous article discussed the various different metadata fields that are available in Microsoft Office documents.
This article will discuss ways of accessing these fields in multiple documents. It will start with looking at VBA, and then talk about an easier way.
As shown in the VBA article, it is possible to read document properties in VBA. In order to read multiple files, there is a three-step process.
This example will focus on a VBA module in Microsoft Excel which will open all Microsoft Excel spreadsheets in the folder C:\b\, and retrieve the “Author” property.
As an introduction, variables are initialised and a reference to the current spreadsheet is made.
Dim introw As Integer, mySheet Dim strFolder As String, strFile As String, strExtension As String Sub Catalog() introw = 1 Set mySheet = ActiveSheet strFolder = "C:\b\" strExtension = "*.xls*" 'INSERT NEXT CODE HERE End Sub
Firstly, generating the list of files can be done in several ways. One of the easiest is to use the DIR function as follows. The following will retrieve the first file, and then loop, accessing the next file, until there are no files left.
strFile = Dir(strFolder & strExtension)
Do Until strFile = ""
Call OpenFileAndCatalog
strFile = Dir
DoEvents
Loop
Once you have the name of the next file, you can then open the file within Excel as follows.
Sub OpenFileAndCatalog() On Error GoTo MyError Workbooks.Open strFolder & strFile, UpdateLinks:=False, ReadOnly:=True 'INSERT NEXT CODE HERE Exit Sub MyError: Exit Sub End Sub
Now that the file is open, you can access any of the document properties. For this case, the “Author” property gets written into the initial spreadsheet, along with the file name, and the row number is updated.
mySheet.Cells(introw, 1) = strFile
mySheet.Cells(introw, 2) = ActiveWorkbook.BuiltinDocumentProperties("Author")
introw = introw + 1
'INSERT NEXT CODE HERE
Now the file which has just been opened has to be closed, and the next file name is retrieved.
ActiveWorkbook.Close SaveChanges:=False
The completed workbook can be opened below.
But is this a good method?
The main advantage of using the above is that the code is yours, and you can adjust it as you wish. While the spreadsheet is opened, you can extract any additional data from the document and save it into your workbook. You can also define which folders or files you want to use.
One of the main disadvantages of using the above revolves on the fact that you have to open the spreadsheet. This means that:
A better way is to be able to examine the metadata without needing to open the document. The advantages to this are:
Why reinvent the wheel? If you can have a program that can do all of that, then you can use that as the basis of any additional analysis.
Two programs which can do the above are Filecats Professional and Filecats Metadata. Both create a table of files, folders and subfolders with hyperlinks and metadata from Microsoft Office, photographs, audio, video and MSG (exported from Outlook) files.
There is a free 7-day trial, and should save you a lot of time, allowing you to concentrate on using this date. Why not download it now?