SharePoint Automation Gary Lapointe – Founding Partner, Aptillon, Inc.

12Aug/1062

Getting an Inventory of All SharePoint Documents Using Windows PowerShell

I got an email today asking if I had anything that would generate a report detailing all the documents throughout an entire SharePoint Farm. As this wasn’t the first time I’ve been asked this same question I decided that I’d just go ahead and post the script for generating such a report.

The script is really quite straightforward – it simply iterates through all Web Applications, Site Collections, Webs, Lists, and finally, List Items. I skip any List that is not a Document Library (as well as the Central Admin site) and then build a hash table containing all the data I want to capture. I then convert that hash table to an object which is written to the pipeline.

All of this is placed in a function which I can call and then pipe the output to something like the Out-GridView cmdlet or the Export-Csv cmdlet. I also wrote the script so that it works with either SharePoint 2007 or SharePoint 2010 so that I don’t have to maintain two versions (I could have used cmdlets such as Get-SPWebApplication, Get-SPSite, and Get-SPWeb but there was little benefit to doing so and the script would be limited to SharePoint 2010).

One word of caution – in a large Farm this script should be run off hours or at least on a back facing server (not your WFE) – it’s going to generate a lot of traffic to your database.

function Get-DocInventory() {
    [void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint")
    $farm = [Microsoft.SharePoint.Administration.SPFarm]::Local
    foreach ($spService in $farm.Services) {
        if (!($spService -is [Microsoft.SharePoint.Administration.SPWebService])) {
            continue;
        }

        foreach ($webApp in $spService.WebApplications) {
            if ($webApp -is [Microsoft.SharePoint.Administration.SPAdministrationWebApplication]) { continue }

            foreach ($site in $webApp.Sites) {
                foreach ($web in $site.AllWebs) {
                    foreach ($list in $web.Lists) {
                        if ($list.BaseType -ne "DocumentLibrary") {
                            continue
                        }
                        foreach ($item in $list.Items) {
                            $data = @{
                                "Web Application" = $webApp.ToString()
                                "Site" = $site.Url
                                "Web" = $web.Url
                                "list" = $list.Title
                                "Item ID" = $item.ID
                                "Item URL" = $item.Url
                                "Item Title" = $item.Title
                                "Item Created" = $item["Created"]
                                "Item Modified" = $item["Modified"]
                                "File Size" = $item.File.Length/1KB
                            }
                            New-Object PSObject -Property $data
                        }
                    }
                    $web.Dispose();
                }
                $site.Dispose()
            }
        }
    }
}
Get-DocInventory | Out-GridView
#Get-DocInventory | Export-Csv -NoTypeInformation -Path c:\inventory.csv

Comments (62) Trackbacks (1)
  1. can you also get the number of versions for each document?

  2. Yup – you would just add that as another property to expose:

    “Item Versions” = $item.Versions.Count

  3. Hi – thanks for posting the inventory script. When I run it, I am able to get a CSV file with some info and then powershell errors out.

    “Get-DocInventory : Exception has been thrown by the target of an invocation.
    At C:\getdocs.ps1:41 char:17
    + Get-DocInventory <<<< | Export-Csv -NoTypeInformation -Path c:\inventory.csv
    + CategoryInfo : NotSpecified: (:) [Get-DocInventory], TargetInvocationExcepti
    + FullyQualifiedErrorId : System.Reflection.TargetInvocationException,Get-DocInventory

    Any thoughts as to what I might be doing wrong?

  4. Not sure what the issue is – you might try adding some Write-Host statements in the script to see where exactly it is failing (or just use the $Error variable to dig into the stack trace if you know how to do that).

  5. In this case I don’t think code or powershell is the right way to go due to the overhead and speed (lack thereof) of sharepoint api at grabbing a dataset like this. I much prefere (and use) this direct SQL query instead:

    SELECT
    Webs.Title AS ‘Web’,
    AllLists.tp_Title AS ‘List Name’,
    AllDocs.DirName AS ‘URL’,
    AllDocs.LeafName AS ‘Name’,
    AllDocs.ExtensionForFile AS ‘File Type’,
    AllDocStreams.Size AS ‘File Size’,
    AllDocs.TimeCreated
    FROM AllDocs
    JOIN AllLists
    ON AllLists.tp_ID = AllDocs.ListId
    JOIN AllDocStreams
    ON AllDocStreams.Id = AllDocs.Id
    JOIN Webs
    ON Webs.Id = AllLists.tp_WebId
    ORDER BY Webs.Title

    ^takes literally a thousand times less overall processing time than sharepoint api.

    You can also filter results easily using something like this:
    WHERE Webs.Title NOT LIKE ‘%SiteName%’
    AND AllDocs.TimeCreated > ’2010-05-10 00:00:00.000′

  6. Yes, hitting the database directly is always going to be faster but it’s also not supported and puts you in an unsupported state with Microsoft. The supported route is to use the API – it’s slower but…

  7. A little Select statement never hurt nobody :)

  8. I have two questions:

    1. Does this script work on 2010 as well? It worked fine for me on 2007.
    2. I need to use a similar approach to get the following properties from each list (not list item):
    a. Content types configured on the list, and the scope of the content type (list, web or site)
    b. Columns configured on the list, and again the scope (list or content type, and if content type, which one)

    How do I go about finding out what properties are available on a list in PowerShell?

  9. Yes, this will work in both 2007 and 2010. To get the properties you can either use the SDK or, if you have an instance of a list (an SPList object), you can pass that object to the Get-Member cmdlet to see all the methods and properties that are available.

  10. I find it absurd that Microsoft dosn’t give you an easy way to determine the number of documents in a document library, not to mention listing the items out by folder. I’ve wasted a few days trying to find what I missed, it turns out that I haven’t missed anything – it dosn’t exist!!!!! Nice! I’ll keep working on this, while Bill gives my money to charities.

  11. Is there a way to retrieve the file extention as well? I’d like to also see what type of file (how many of each) I’m dealing with.

    Thanks

  12. Finally – i have been looking how to do this for ages.
    That is great
    Thankyou very much.

  13. Is there a way to get the author of the document – Thanks

  14. Gary, how would I change the script to against an individual Site Collection?
    Thanks

    • function Get-DocumentInventory([string]$siteUrl) {
      $site = New-Object Microsoft.SharePoint.SPSite $siteUrl
      foreach ($web in $site.AllWebs) {
      foreach ($list in $web.Lists) {
      if ($list.BaseType -ne “DocumentLibrary”) {
      continue
      }
      foreach ($item in $list.Items) {
      $data = @{ … }
      New-Object PSObject -Property $data
      }
      }
      $web.Dispose();
      }
      $site.Dispose()
      }

  15. Hey Gary,

    Just a quick tip that may improve performance…

    To get a filtered set of lists that are only of type DocumentLibrary, you can use this:

    $lists = $web.GetListsOfType([Microsoft.SharePoint.SPBaseType]::DocumentLibrary)

    Cheers,
    Ben

    • Thanks Ben. I always forget about that method. Note that you can also just do this:
      $lists = $web.GetListsOfType(“DocumentLibrary”)
      PowerShell’s adaptive type system will convert the string to the proper enum type.

  16. Gary,
    I tried the the example you gave for running against a Site Collection but I’m doing something wrong.
    What do I need to correct?
    Thanks. Terry

    function Get-DocumentInventory([string]$siteUrl) {
    $site = New-Object Microsoft.SharePoint.SPSite $siteUrl = “http://sharepoint.aa.com/sites/SCname”
    foreach ($web in $site.AllWebs) {
    foreach ($list in $web.Lists) {
    if ($list.BaseType -ne “DocumentLibrary”) {
    continue
    }
    foreach ($item in $list.Items) {

    $data = @{
    “Web Application” = $webApp.ToString()
    “Item Modified” = $item["Modified"]
    “Item Created” = $item["Created"]
    “File Size” = $item.File.Length/1KB
    “Web” = $web.Url
    “Site” = $site.Url
    “Item URL” = $item.Url
    }
    New-Object PSObject -Property $data
    }
    }
    $web.Dispose();
    }
    $site.Dispose()
    }

    Get-DocInventory | Out-GridView
    Get-DocInventory | Export-Csv -NoTypeInformation -Path c:\PowerShell\LOGS\Report_SC_Docs.csv

    • Change this line:
      $site = New-Object Microsoft.SharePoint.SPSite $siteUrl = “http://sharepoint.aa.com/sites/SCname”
      To this:
      $site = New-Object Microsoft.SharePoint.SPSite $siteUrl

      Then pass in the site URL:
      Get-DocInventory “http://sharepoint.aa.com/sites/SCname” | Out-GridView

  17. Hey Gary,

    When I run the code I get this error:

    Get-DocInventory : Exception has been thrown by the target of an invocation.
    At line:39 char:17
    + Get-DocInventory <<<< | Out-GridView
    + CategoryInfo : NotSpecified: (:) [Get-DocInventory], TargetInvocationException
    + FullyQualifiedErrorId : System.Reflection.TargetInvocationException,Get-DocInventory

    What am I doing wrong?

  18. Is there a way to break out the file name from the URL? We still need the URL but not in the same column as Filename.

    Thanks!

  19. Hi again Gary, it was a copy & paste issue as you said…I got that error cleared up. However, now when I run the script, nothing at all happens, no errors, nothing. Any ideas?

  20. This is a great script :-) Helps a lot. But as i am new to powershell: Is it possible to get the content if a list? So e.g. i have a List with the same name accross the Site Collection and in every site i have a list with the same name. And in this lists i have a column which is called “Topic” could i extract that data also with powershell?

    Also i wanted to search only a certain List with the name “persons” so i changed “if ($list.BaseType -ne “DocumentLibrary”) ” to “if ($list.Title -eq “persons”)” but it doen’t work :-( What is wrong ?

  21. What about files attached to lists that are not doc libraries?

  22. Gary –

    Have you tried to an export on all returned values from this script? I have several hundreds of subsites that we want to dispose of; I need to collect ALL documents from their libraries and migrate to a single library under the Site Collection.

  23. Hi,

    If I would like to filter out file extensions before running the script, i.e only quering for documents with .docx extension, is that something that is possible and how would the code look like and where do i place it?

    Many Thanks

    • It would involve use CAML to query items from a list rather than iterating through every item (though you could still iterate over all items and just check the file extension but that’s not as efficient).

  24. Hi Gary,

    First of all, I would like to thank you for this article. I looked for various methods to getting all documents list and I think this is the best way to realize what I want.

    I faced up a problem, an exception has been thrown about file exporting, I am copying error details. Someone also encounter this problem but nobody tells how to solve it. May be I miss some steps or do incorrectly because I’m new at powershell admin (I started to learn and use it today :) ). After the copied your script, I saved it on a text file then save name as osman.ps1 then invoked it from SPpowershell script.

    PS C:\Users\Administrator> c:\osman.ps1
    Get-DocInventory : Exception has been thrown by the target of an invocation.
    At C:\osman.ps1:42 char:17
    + Get-DocInventory <<< c:\osman.ps1
    Get-DocInventory : Exception has been thrown by the target of an invocation.
    At C:\osman.ps1:42 char:17
    + Get-DocInventory <<<< | Export-Csv -NoTypeInformation -Path c:\inventory.csv
    + CategoryInfo : NotSpecified: (:) [Get-DocInventory], TargetInvo
    cationException
    + FullyQualifiedErrorId : System.Reflection.TargetInvocationException,Get-
    DocInventory

    • Unfortunately that error is just a wrapper so I can’t say exactly what is going on without more information. After you get the error try to outout the $Error automatic variable to see if you can get more details:
      $Error[0] | select *

      0 is the last error that occurs so it should give you what you need (every once in a while I need to check a higher index because of things that happen behind the scenes in powershell which add to this collection but it’s pretty rare).

      • I had exactly the same message and I believe it’s a permissions issue. I tried running the script as the farm admin and had no problems.

  25. Hey there,

    This is a tremendous resource and I’ve had some great results. Thank you for your work and follow up throughout this thread.

    I’m looking for a full listing of all major and minor versions, including comments that have been appended to each version.

    We have the Author (I saw that parameter listed above), and I assume this is capturing who the last editor was.

    thanks again,

  26. Hi,
    To start off with I like your site and have made use of some of the information with great success.

    My question is, is it possible to find out who and when a file in a Document Library has been accessed (read only) if Auditing isn’t turned on. i.e. is there a property that logs the history of a file, like what you get from auditing, even though auditing is not turned on.

    Regards,

    Chris

    • There’s no way out of the box to do that. You’d have to create a custom handler for the specific document type (like I did with the PDF handler on one of my recent posts) and log the information that way.

  27. This is nice. So I’m wondering how we could modify this so we can download the document to a file share. I have a requirement to copy a boatload of documents to a filesystem. I guess I’ll check the SPListItem class and see if there is a download method.

  28. Gary, this is an excellent script thank you!

    Could it be modified to find unpublished documents in a farm?

  29. I’m going to show my ignorance here, but whats the moderation status?

    I tried googling the term in assication to SP2010 files but noting really came up appart from normal approval status.

  30. Hi Gary,
    After executing these below command I am getting below error . I am using SharePoint 2010 Management shell . I would require all the available file name along with their size in a SharePoint Site Collection .Could you please help me on this .

    Add-PsSnapin Microsoft.Sharepoint.Powershell
    function Get-DocumentInventory([string]$siteUrl) {
    $site = New-Object Microsoft.SharePoint.SPSite $siteUrl
    Get-DocInventory “http://sharepoint.share.com” | Out-GridView
    foreach ($web in $site.AllWebs) {
    foreach ($list in $web.Lists) {
    if ($list.BaseType -ne “DocumentLibrary”) {
    continue
    }
    foreach ($item in $list.Items) {

    $data = @{
    “Web Application” = $webApp.ToString()
    “Item Modified” = $item["Modified"]
    “Item Created” = $item["Created"]
    “File Size” = $item.File.Length/1KB
    “Web” = $web.Url
    “Site” = $site.Url
    “Item URL” = $item.Url
    }
    New-Object PSObject -Property $data
    }
    }
    $web.Dispose();
    }
    $site.Dispose()
    }

    Get-DocInventory | Out-GridView
    Get-DocInventory | Export-Csv -NoTypeInformation -Path c:\Report Docs.csv

    The term ‘Get-DocInventory’ is not recognized as the name of a cmdlet, function, script file, or operabl
    e program. Check the spelling of the name, or if a path was included, verify that the path is correct an
    d try again.
    At C:\Temp\report.ps1:28 char:17
    + Get-DocInventory <<<< | Out-GridView
    + CategoryInfo : ObjectNotFound: (Get-DocInventory:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

    The term 'Get-DocInventory' is not recognized as the name of a cmdlet, function, script file, or operabl
    e program. Check the spelling of the name, or if a path was included, verify that the path is correct an
    d try again.
    At C:\Temp\report.ps1:29 char:17
    + Get-DocInventory <<<< | Export-Csv -NoTypeInformation -Path c:\Report Docs.csv
    + CategoryInfo : ObjectNotFound: (Get-DocInventory:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException


Leave a comment

(required)