VBA Project Storage

Introduction

As described in the first article of this series [link to StructuredStorage.php], prior to Word 2007, Word Documents (and Excel Workbooks, amongst others) were held in a containing structure that mimics a mini file system. Despite creating an entirely new XML‑based format for documents themselves, Microsoft still use the old format for storing VBA Projects within the new structure. In this article, the first of several pages looking at the detail of the ‘old’ structure, I begin by tearing apart a VBA project.

It is not compulsory for you to follow along, but I will describe what I do in a way that you can if you wish. I will say now, and may say again, that VBA is not really the language of choice for this, but I like to use VBA to demonstrate because it is available to all users of Word: you don’t need special software to copy what I do.

A VBA Project

As an arbitrary starting point, create a new document, and insert a new module, “Module1”; copy the VBA code from the previous page [link to StructuredStorage.php#TheCode] into the new module and save the document, in Word 2007‑format as a macro‑enabled document, calling it, say, “Arbitrary.docm”. I was writing an article on the 2007‑format files before I allowed myself to be sidetracked into writing this one, and do not propose to provide any specific details here, other than those necessary for the matter at hand. If you rename the document file as “Arbitrary.docm.zip”, open the resultant zip folder, and navigate to the “word” directory inside it, the VBA project will be, by default, in the file “vbaProject.bin”; if you extract this file you will have something to work with. When you have extracted the file, you can close the folder and rename the file back to “Arbitrary.docm”. You don’t actually have to do this yourself: I have done it for you and you can download a zip folder containing the two files, by clicking here: [link to the file on this site at files/ArbitrarySample.zip]

The sample code that I posted is just some scaffolding on which to build; it shows how to navigate the physical file, but it doesn’t really offer much help when you want to work with the logical contents of the file that are held inside the physical wrapper. Any type of file, and file structure, can be held inside a compound binary and, before going any further, one needs to see whatever is inside the file being examined. I did work through the basic process in writing on the previous page, but now it’s time for some code. The code shown below recursively walks the trees of Storage children from left to right, throwing details of each element to the document body as it is extracted. Copy this code, as is, and add it to the “Module1” module; at the end is as good as anywhere.

Function ListChildren(ByVal Parent As Long, ByVal Depth As Long, Optional FileName As String)

    Dim Nodes()             As Long

    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    ' This walks the binary tree of children, recursively. Working left to right is a '
    ' fairly standard way of doing things but it does not, in this case, produce the  '
    ' results in any kind of recognisable order. Feel free to order it yourself!      '
    ' Before the walk, a single line is output, indicating the file itself as being   '
    ' the parent of all that follows.                                                 '
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    If Depth = 1 Then AddStructureLineToDocument FileName, 0, True

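    ' An empty Storage has no child to walk. This guard assumes that RootChild is
    ' held as a Long, so that "no child" shows up as a value of zero or less.
    If Directory(Parent).RootChild <= 0 Then Exit Function

    ReDim Nodes(0 To 0)
    Nodes(0) = Directory(Parent).RootChild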

    Do
        If Directory(Nodes(UBound(Nodes))).LeftSibling > 0 Then
            ReDim Preserve Nodes(LBound(Nodes) To UBound(Nodes) + 1)
            Nodes(UBound(Nodes)) = Directory(Nodes(UBound(Nodes) - 1)).LeftSibling
        Else
            Do

                With Directory(Nodes(UBound(Nodes)))
                    AddStructureLineToDocument .EntryName, Depth, .EntryType = STGTY_Storage
                    If .EntryType = STGTY_Storage Then
                        ListChildren Nodes(UBound(Nodes)), Depth + 1
                    End If
                End With

                Nodes(UBound(Nodes)) = Nodes(UBound(Nodes)) * -1

                If Directory(Abs(Nodes(UBound(Nodes)))).RightSibling > 0 Then
                    ReDim Preserve Nodes(LBound(Nodes) To UBound(Nodes) + 1)
                    Nodes(UBound(Nodes)) = Directory(Abs(Nodes(UBound(Nodes) - 1))).RightSibling
                    Exit Do
                Else
                    Do
                        If UBound(Nodes) = LBound(Nodes) Then Exit Do
                        ReDim Preserve Nodes(LBound(Nodes) To UBound(Nodes) - 1)
                    Loop While Nodes(UBound(Nodes)) < 0
                
                End If

            Loop Until Nodes(UBound(Nodes)) < 0
        End If

    Loop Until Nodes(UBound(Nodes)) < 0

End Function
 
Sub AddStructureLineToDocument(Name As String, Depth As Long, Storage As Boolean)

    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    ' This is just a bit of fun, really. No explanations, work it out!                '
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '

    With MacroContainer.Range
        .Collapse wdCollapseEnd
        .InsertParagraph
        .Collapse wdCollapseEnd
        .ParagraphFormat.LeftIndent = 15 * Depth - (Not Storage) * 3
        .InsertSymbol Font:="Wingdings", CharacterNumber:=Storage - 4046, Unicode:=True
    End With

    With MacroContainer.Range
        .Collapse wdCollapseEnd
        .InsertAfter Space(2 - (Not Storage) * 2)
        .Collapse wdCollapseEnd
        .InsertAfter Name
    End With

End Sub
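For comparison, the same left-to-right walk can be written recursively, which is shorter than the explicit stack used in ListChildren at the cost of one call per node. What follows is only a sketch under stated assumptions: the SketchEntry type, the Sketch array and the little tree loaded into it are made up for illustration (they stand in for the Directory array from the previous page and are not the real layout of any file), and, as in ListChildren, a sibling value of zero or less is taken to mean that there is no sibling.

```vba
Option Explicit

' Hypothetical stand-in for the Directory array from the previous page;
' only the fields that the walk needs are modelled here.
Private Type SketchEntry
    EntryName       As String
    LeftSibling     As Long
    RightSibling    As Long
End Type

Private Sketch(0 To 3) As SketchEntry

Private Sub SetEntry(ByVal Index As Long, ByVal EntryName As String, _
                     ByVal LeftSibling As Long, ByVal RightSibling As Long)
    With Sketch(Index)
        .EntryName = EntryName
        .LeftSibling = LeftSibling
        .RightSibling = RightSibling
    End With
End Sub

' In-order walk: everything to the left, then the node itself, then
' everything to the right. Names are gathered into a single string.
Private Sub WalkInOrder(ByVal Node As Long, ByRef Names As String)
    If Node <= 0 Then Exit Sub          ' zero or less: no sibling here
    WalkInOrder Sketch(Node).LeftSibling, Names
    Names = Names & Sketch(Node).EntryName & vbCrLf
    WalkInOrder Sketch(Node).RightSibling, Names
End Sub

Sub DemoWalkInOrder()
    Dim Names As String

    ' A made-up tree: entry 0 plays the part of the root entry, and entry 2
    ' is the child from which the walk starts.
    SetEntry 0, "Root Entry", -1, -1
    SetEntry 1, "VBA", -1, -1
    SetEntry 2, "PROJECT", 1, 3
    SetEntry 3, "PROJECTwm", -1, -1

    WalkInOrder 2, Names
    Debug.Print Names
End Sub
```

Running DemoWalkInOrder writes VBA, PROJECT and PROJECTwm, in that order, to the Immediate window. The compound file specification orders siblings by name length first and only then by an upper-cased character comparison, which goes some way to explaining why the output of a walk like this does not look sorted.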

To incorporate this into the project, a couple of things are needed. Firstly, the right file must be opened; assuming you have downloaded and unzipped the file above, the document containing the code and the extracted VBA project will both be in the same folder, and the demo lines:

    FilePath = "C:\Path\To\Your\"
    FileName = "Document.doc"

.. can be changed to pick up the file without you needing to know, and hard code, exactly where it is:

    FilePath = MacroContainer.Path & Application.PathSeparator
    FileName = "vbaProject.bin"
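Before reading anything from the file, it can be reassuring to confirm that what has been picked up really is a compound binary. This is a minimal sketch, assuming only the well-known eight-byte signature (D0 CF 11 E0 A1 B1 1A E1) found at the start of such files; the names LooksLikeCompoundFile and DemoLooksLikeCompoundFile are hypothetical and are not part of the code from the previous page.

```vba
' Hypothetical helper: True if the file exists and starts with the
' eight-byte compound file signature.
Function LooksLikeCompoundFile(ByVal FullName As String) As Boolean

    Dim Signature(0 To 7)   As Byte
    Dim Expected            As Variant
    Dim FileNo              As Integer
    Dim i                   As Long

    ' Opening a missing file for Binary can create it, so check first.
    If Len(FullName) = 0 Then Exit Function
    If Len(Dir$(FullName)) = 0 Then Exit Function

    Expected = Array(&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1)

    FileNo = FreeFile
    Open FullName For Binary Access Read As #FileNo

    If LOF(FileNo) >= 8 Then
        Get #FileNo, 1, Signature       ' file positions are 1-based
        LooksLikeCompoundFile = True
        For i = 0 To 7
            If Signature(i) <> Expected(i) Then LooksLikeCompoundFile = False
        Next i
    End If

    Close #FileNo

End Function

Sub DemoLooksLikeCompoundFile()
    Debug.Print LooksLikeCompoundFile(MacroContainer.Path & _
                                      Application.PathSeparator & _
                                      "vbaProject.bin")
End Sub
```

The helper opens and closes its own file number, so it does not interfere with the one used by the driver routine.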

To run the code, it must be invoked at some point after the basic file structure has been unraveled. The observant reader will realise that sufficient is known by the time the ExtractDirectory routine has been run, but I prefer to let the ExtractShortSectorStream routine also run before trying to do anything else. Having said this is the first thing to do, it should be the first thing done! Adding a call immediately after the aforementioned ExtractShortSectorStream routine is, thus, the way forward. You can, if you wish, comment out the extraction of the “dir” stream as it is not (quite) yet relevant, but it does no harm to leave it alone. The driver routine now looks like this:

Sub SimpleTestDriver()

    Dim FilePath            As String
    Dim FileName            As String
    Dim Stream()            As Byte

    FilePath = MacroContainer.Path & Application.PathSeparator
    FileName = "vbaProject.bin"

    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    ' There are various ways to read files of binary data, none of which are      '
    ' entirely satisfactory. Here, the simple built-in mechanism is used.         '
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    
    FileNo = FreeFile
    Open FilePath & FileName For Binary As FileNo
    
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    ' Positioning the file at the beginning for clarity, read the File Header.    '
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    
    Seek FileNo, 1
    Get FileNo, , FileHeader
    
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    ' Gather the structural elements of the file, to enable what follows.         '
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    
    ExtractSAT
    ExtractSSAT
    ExtractDirectory
    ExtractShortSectorStream
    
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    ' Look at the logical file structure.                                         '
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    
    ListChildren 0, 1, FileName
    
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    ' Everything is now in place to extract any Stream we want by following the   '
    ' appropriate pointers. For this example, the "dir" stream is extracted.      '
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    
    Stream = ExtractStream("dir")
    
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    ' For this example, the file is no longer needed, so close it.                '
    ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '
    
    Close FileNo

End Sub

If you replace the existing driver routine with this version, and run it, the file structure will be written to the body of the document, and will look like this:

What you should see in your document

Word can't make much sense of this and it is likely that most of it will have squiggly underlining of one colour or another. Feel free to adjust your own copy if you don't like how it looks.

This is a pretty simple Project. Immediately within the file you can see the “VBA” Storage that holds the VBA, a Stream called “PROJECT”, and one called “PROJECTwm”. The distinction between the Project and the VBA is rather blurry, and the two streams, which contain a mixed bag of Project information, will be examined later. For now, the contents of the VBA Storage are more interesting.

The VBA Storage

You will immediately recognise “Module1” and “ThisDocument” as being the two modules (or, for pedants, the one module and the one class module) that belong to the Project, but what of the others?

The _VBA_PROJECT stream can be ignored, but it is instructive to read the way in which Microsoft state this; they say it contains five fields:

Having ignored the _VBA_PROJECT stream, you come to the one you do need to read in order to be able to understand everything else: the “dir” stream. To avoid this page becoming too long, I am writing about this over two further pages, only one of which is yet written:

⇒ Compression of data in the dir Stream [link to StreamCompression.php on this site]

Something (Slightly) More Complex

If you have followed what I did, now save the amended document. If you then follow the procedure at the start of this page to extract the vbaProject.bin file, replace the one you previously had with it, re-open the document, and re-run the code, unchanged, the output this time will look like this:

The structure of a compiled VBA Project

You now see a handful of Streams, with names that begin “__SRP_”. These have been created by running the code in the Project, or, more accurately, by compiling as much of the Project as necessary to run the code, something that has been done behind the scenes for you. Microsoft, in their only comment on these, say that they specify ‘an implementation-specific and version-dependent performance cache’, and, somewhat disingenuously, I think, that they MUST be ignored on read, and MUST NOT be present on write.
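If you are reading a project outside Word, these cache streams can be skipped on name alone. This is a small sketch, assuming that the “__SRP_” prefix seen above is a sufficient test; the function name IsPerformanceCacheStream is hypothetical, and “__SRP_0” is used only as an example of such a name.

```vba
' Hypothetical helper: True for stream names that carry the "__SRP_" prefix.
Function IsPerformanceCacheStream(ByVal StreamName As String) As Boolean

    Const CachePrefix As String = "__SRP_"

    ' A binary comparison is forced, so Option Compare Text cannot change the answer.
    IsPerformanceCacheStream = _
        (StrComp(Left$(StreamName, Len(CachePrefix)), CachePrefix, vbBinaryCompare) = 0)

End Function

Sub DemoIsPerformanceCacheStream()
    Debug.Print IsPerformanceCacheStream("__SRP_0")     ' True
    Debug.Print IsPerformanceCacheStream("Module1")     ' False
End Sub
```

A check like this could sit immediately before the call to AddStructureLineToDocument if you would rather not list the cache streams at all.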

If you have read the previous page, you should have enough knowledge to delete these Streams from the Compound Binary container. Although you would have to make some other changes to make the file strictly valid, you would find that Word could cope just fine and that these streams are unnecessary. I couldn’t recommend that you actually try this, or that it is in any way sensible, but it does rather seem as though the documentation that says they should not be present may be correct.

The only other thing of consequence that you might expect to see in a VBA Project is a UserForm. I added a simple one to the document I have been demonstrating with here, and this is the result of running the code:

The structure of a document with a UserForm

Here you see some rather unhelpfully named streams in the UserForm itself, and a new VBA Module for the code in the UserForm. It is worth noting that the stream names will be the same in all UserForms, so you can see the importance of following all the pointers to make sure you find the one you want, in the right UserForm, if you are working outside Word, and not relying on the name alone.

One very minor point to be aware of is that the leading character on the “└VBFrame” stream name is 0x03. Behind the scenes, Word uses this as a code for the Footnote Separator, and, if you save the document containing the output from the VBA code, when you re‑open it, the character will display as a Footnote Separator character, even though it makes no sense in the context.
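One way round this, should it bother you, is to make control characters visible before the name is written to the document. The following is a sketch only: the function name PrintableName and the choice of “<03>” as the visible form are assumptions of mine, not anything that Word or the file format prescribes.

```vba
' Hypothetical helper: replaces each control character (code below 32) in a
' stream name with its two-digit hex code in angle brackets.
Function PrintableName(ByVal StreamName As String) As String

    Dim i       As Long
    Dim Ch      As String
    Dim Code    As Long

    For i = 1 To Len(StreamName)
        Ch = Mid$(StreamName, i, 1)
        Code = AscW(Ch)                 ' can be negative for high code units
        If Code >= 0 And Code < 32 Then
            PrintableName = PrintableName & "<" & Right$("0" & Hex$(Code), 2) & ">"
        Else
            PrintableName = PrintableName & Ch
        End If
    Next i

End Function

Sub DemoPrintableName()
    Debug.Print PrintableName(ChrW$(3) & "VBFrame")     ' <03>VBFrame
End Sub
```

Passing PrintableName(.EntryName) rather than .EntryName to AddStructureLineToDocument would keep the saved output stable, at the price of no longer showing the name exactly as it is stored.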

There isn’t really anything else to say that is relevant to this page - there’s plenty more on the general topic, but I am writing other pages for you to read as you will. You will see many bigger files, but nothing really any more complex than what you see here, in VBA projects.