What Document Properties are actually used in Word documents?
Common Metadata for Microsoft Word Documents
- All of the files include the Date accessed, Date created and Date modified (from Windows Explorer) and Content created (from Microsoft Office). Most of the files also include Date last saved (from Microsoft Office). Note that the Windows Explorer dates are not necessarily the same as the Microsoft Office dates, and that the Microsoft Office dates should be preferred. For more information, see the article "Why are the Windows Explorer dates not reliable?"
- 19 out of 20 files (95%) include a Revision number and Template. Of these, 90% are based on a “Normal” template, leaving only 10% based on a different template.
- Around 19 out of 20 files also include Microsoft Office statistics (character count, word count, line count, paragraph count and pages), which raises the question: why do the other 5.5% of the files not include them? They were probably created outside Microsoft Office and saved as .doc files by a program that does not write all of the metadata which Microsoft Word provides.
- Additionally, 16 files have a template of “Normal_Wordconv” and were therefore clearly converted from another application. They include some of the statistics (such as Word count and Pages), but have a zero Line count and Paragraph count. Two of these documents have a character count of -32766, which clearly indicates something wrong with the metadata calculation (perhaps an overflow error).
- Over 9 out of 10 files have Last saved by, Authors, Creators and Participants properties. Although Windows shows the last three as separate metadata properties, in each case the Author was the same as the Creator and the Participants. Also, only 5 of these files had more than one author.
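For .docx files, most of the properties discussed above are stored in two XML parts inside the ZIP package: docProps/core.xml (author, last saved by, revision, dates, title) and docProps/app.xml (template, statistics, company, manager). The following is a minimal sketch, using only the Python standard library, of how they could be read. The function name read_docx_properties is hypothetical, and the sketch does not handle legacy .doc files, which use a different binary container.

```python
import zipfile
import xml.etree.ElementTree as ET

# The two package parts that hold the built-in document properties.
PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")


def read_docx_properties(path):
    """Return a flat dict of core and extended properties from a .docx file.

    `path` may be a file name or a binary file-like object.
    """
    props = {}
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        for part in PROPERTY_PARTS:
            if part not in names:
                continue  # both parts are optional; some producers omit them
            root = ET.fromstring(archive.read(part))
            for element in root:
                # ElementTree reports tags as "{namespace}name"; keep the name.
                key = element.tag.rsplit("}", 1)[-1]
                # Nested elements (e.g. HeadingPairs) have no direct text and
                # are stored as an empty string in this sketch.
                props[key] = (element.text or "").strip()
    return props
```

Calling read_docx_properties on a document returns keys such as creator, lastModifiedBy, revision, Template, Words and Company when the producing application wrote them. A missing key is itself informative, since it shows which properties a given file lacks.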
Lesser Used Document Properties for Microsoft Word Files
- In 58% of these files the "Last saved by" differs from the Author, and 6% of the files have a user of "User", "Usuario", "Admin", "Owner" or "Preferred Customer".
- Nearly 6 out of 10 files had a Company indicated. This field is usually filled in when Microsoft Office is installed, which may indicate that more than 4 out of 10 of these documents were created by home users. However, 45 documents had a Company of “Hewlett-Packard”, 39 had “Microsoft”, and 19 had “Home”, which indicates unreliable data. This leaves over 50% which presumably have reliable Company data.
- Over 1 in 2 files had a Last Printed date and Title information. The 51% for Title can be broken down into 78% for .doc documents (Office 2003 format) and only 24% for .docx documents (Office 2007 format). This may be because, when a document was initially saved, earlier versions of Microsoft Word saved the first line of the document as the Title, a practice that ceased when Office 2007 came out.
- Infrequently used properties are Tags (about 1 file in 40), Byte count (1 in 50), Categories (1 in 75) and Manager (less than 1 file in 100). Although these properties are available, their use has not caught on.
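The checks described in this list (placeholder user names, a "Last saved by" that differs from the Author, and Company values that look like installation defaults) can be sketched as a small Python function. This is an illustration only: the function name classify_properties, the dictionary keys creator, lastModifiedBy and Company, and the case-insensitive comparison are assumptions, not part of the original analysis.

```python
# Values taken from the findings above; matching is case-insensitive (assumption).
PLACEHOLDER_USERS = {"user", "usuario", "admin", "owner", "preferred customer"}
SUSPECT_COMPANIES = {"hewlett-packard", "microsoft", "home"}


def classify_properties(props):
    """Flag questionable author and company metadata in one document.

    `props` is a dict of property name to string value (hypothetical keys).
    """
    author = (props.get("creator") or "").strip().casefold()
    last_saved_by = (props.get("lastModifiedBy") or "").strip().casefold()
    company = (props.get("Company") or "").strip().casefold()
    return {
        # Either name is a generic account name rather than a real person.
        "placeholder_user": author in PLACEHOLDER_USERS
        or last_saved_by in PLACEHOLDER_USERS,
        # Only compared when both values are present.
        "saved_by_differs": bool(author and last_saved_by)
        and author != last_saved_by,
        "company_present": bool(company),
        # Company looks like a vendor or installation default.
        "company_suspect": company in SUSPECT_COMPANIES,
    }
```

For example, a document with Author "User", Last saved by "Jane" and Company "Home" would have all four flags set, while a document whose Author and Last saved by are both "Jane" and whose Company is empty would have none.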
The analysis is contained in the spreadsheet below.
Analysis of Microsoft Office documents - 24 October 2014
ByLanguage141024.xls Microsoft Excel sheet [1.8 MB]