Getting an Inventory of All SharePoint Documents Using Windows PowerShell
I got an email today asking if I had anything that would generate a report detailing all the documents throughout an entire SharePoint Farm. As this wasn’t the first time I’ve been asked this same question I decided that I’d just go ahead and post the script for generating such a report.
The script is really quite straightforward – it simply iterates through all Web Applications, Site Collections, Webs, Lists, and finally, List Items. I skip any List that is not a Document Library (as well as the Central Admin site) and then build a hash table containing all the data I want to capture. I then convert that hash table to an object which is written to the pipeline.
All of this is placed in a function which I can call and then pipe the output to something like the Out-GridView cmdlet or the Export-Csv cmdlet. I also wrote the script so that it works with either SharePoint 2007 or SharePoint 2010 so that I don’t have to maintain two versions (I could have used cmdlets such as Get-SPWebApplication, Get-SPSite, and Get-SPWeb but there was little benefit to doing so and the script would be limited to SharePoint 2010).
One word of caution – in a large Farm this script should be run off hours or at least on a back facing server (not your WFE) – it’s going to generate a lot of traffic to your database.
function Get-DocInventory() { [void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint") $farm = [Microsoft.SharePoint.Administration.SPFarm]::Local foreach ($spService in $farm.Services) { if (!($spService -is [Microsoft.SharePoint.Administration.SPWebService])) { continue; } foreach ($webApp in $spService.WebApplications) { if ($webApp -is [Microsoft.SharePoint.Administration.SPAdministrationWebApplication]) { continue } foreach ($site in $webApp.Sites) { foreach ($web in $site.AllWebs) { foreach ($list in $web.Lists) { if ($list.BaseType -ne "DocumentLibrary") { continue } foreach ($item in $list.Items) { $data = @{ "Web Application" = $webApp.ToString() "Site" = $site.Url "Web" = $web.Url "list" = $list.Title "Item ID" = $item.ID "Item URL" = $item.Url "Item Title" = $item.Title "Item Created" = $item["Created"] "Item Modified" = $item["Modified"] "File Size" = $item.File.Length/1KB } New-Object PSObject -Property $data } } $web.Dispose(); } $site.Dispose() } } } } Get-DocInventory | Out-GridView #Get-DocInventory | Export-Csv -NoTypeInformation -Path c:\inventory.csv
August 17th, 2010 - 14:17
can you also get the number of versions for each document?
August 18th, 2010 - 00:09
Yup – you would just add that as another property to expose:
“Item Versions” = $item.Versions.Count
October 11th, 2011 - 05:30
Hi Gary… I’m still quite new to the world of PowerShell but getting there…
Is it possible to break out from the version count how many Major and Minor versions there are?
October 21st, 2011 - 09:56
you’d have to look at the properties on the object and add them all up but it should be possible.
August 23rd, 2010 - 10:07
Hi – thanks for posting the inventory script. When I run it, I am able to get a CSV file with some info and then powershell errors out.
“Get-DocInventory : Exception has been thrown by the target of an invocation.
At C:\getdocs.ps1:41 char:17
+ Get-DocInventory <<<< | Export-Csv -NoTypeInformation -Path c:\inventory.csv
+ CategoryInfo : NotSpecified: (:) [Get-DocInventory], TargetInvocationExcepti
+ FullyQualifiedErrorId : System.Reflection.TargetInvocationException,Get-DocInventory
Any thoughts as to what I might be doing wrong?
August 23rd, 2010 - 11:10
Not sure what the issue is – you might try adding some Write-Host statements in the script to see where exactly it is failing (or just use the $Error variable to dig into the stack trace if you know how to do that).
August 23rd, 2010 - 18:51
In this case I don’t think code or powershell is the right way to go due to the overhead and speed (lack thereof) of sharepoint api at grabbing a dataset like this. I much prefere (and use) this direct SQL query instead:
SELECT
Webs.Title AS ‘Web’,
AllLists.tp_Title AS ‘List Name’,
AllDocs.DirName AS ‘URL’,
AllDocs.LeafName AS ‘Name’,
AllDocs.ExtensionForFile AS ‘File Type’,
AllDocStreams.Size AS ‘File Size’,
AllDocs.TimeCreated
FROM AllDocs
JOIN AllLists
ON AllLists.tp_ID = AllDocs.ListId
JOIN AllDocStreams
ON AllDocStreams.Id = AllDocs.Id
JOIN Webs
ON Webs.Id = AllLists.tp_WebId
ORDER BY Webs.Title
^takes literally a thousand times less overall processing time than sharepoint api.
You can also filter results easily using something like this:
WHERE Webs.Title NOT LIKE ‘%SiteName%’
AND AllDocs.TimeCreated > ’2010-05-10 00:00:00.000′
September 16th, 2011 - 08:22
I paste this and i am getting error. if there anything i am missing do i need to make any changes before i can use this.
please help thanks.
yp
September 23rd, 2011 - 10:47
What’s the error?
August 26th, 2010 - 10:32
Yes, hitting the database directly is always going to be faster but it’s also not supported and puts you in an unsupported state with Microsoft. The supported route is to use the API – it’s slower but…
January 13th, 2012 - 17:36
Gary is correct. IF you must do direct queries, you should be using the NOLOCK switch in the query. Otherwise, you could inadvertently be causing blocking you did not intend in your production farm.
http://support.microsoft.com/kb/841057
http://www.sharepoint4arabs.com/AymanElHattab/Lists/Posts/Post.aspx?ID=99
December 10th, 2010 - 08:59
A little Select statement never hurt nobody
December 16th, 2010 - 09:27
I have two questions:
1. Does this script work on 2010 as well? It worked fine for me on 2007.
2. I need to use a similar approach to get the following properties from each list (not list item):
a. Content types configured on the list, and the scope of the content type (list, web or site)
b. Columns configured on the list, and again the scope (list or content type, and if content type, which one)
How do I go about finding out what properties are available on a list in PowerShell?
May 18th, 2011 - 05:51
Hey David, we are getting errors when we tried to run this script on MOSS 2007 version. Error is because of -Property .
Pleae need your help.
Thanks
Manasa
June 3rd, 2011 - 09:22
Not sure who David is but I suspect your issue is because you are running PowerShell V1 and not V2?
December 16th, 2010 - 10:11
Yes, this will work in both 2007 and 2010. To get the properties you can either use the SDK or, if you have an instance of a list (an SPList object), you can pass that object to the Get-Member cmdlet to see all the methods and properties that are available.
March 28th, 2011 - 13:14
I find it absurd that Microsoft dosn’t give you an easy way to determine the number of documents in a document library, not to mention listing the items out by folder. I’ve wasted a few days trying to find what I missed, it turns out that I haven’t missed anything – it dosn’t exist!!!!! Nice! I’ll keep working on this, while Bill gives my money to charities.
March 30th, 2011 - 16:58
Did you try the ItemCount property?
$web = Get-SPWeb http://demo
$list = $web.GetList(“Pages”)
$list.ItemCount
March 30th, 2011 - 17:06
Oh, and if you want folders, just do this:
$web = Get-SPWeb http://demo
$list = $web.GetList(“Documents”)
Write-Host “Total Items: $($list.ItemCount)”
$list.Folders | % {
$hash = @{“Url”=$_.Url;”Count”=$_.Folder.ItemCount}
New-Object PSObject -Property $hash
}
April 18th, 2011 - 07:19
Is there a way to retrieve the file extention as well? I’d like to also see what type of file (how many of each) I’m dealing with.
Thanks
April 18th, 2011 - 07:28
You could use this to get the file extension: [System.IO.Path]::GetExtension($item.File.Name)
April 18th, 2011 - 09:10
Thanks! That worked perfect!
September 22nd, 2011 - 08:17
Finally – i have been looking how to do this for ages.
That is great
Thankyou very much.
September 22nd, 2011 - 08:27
Is there a way to get the author of the document – Thanks
September 23rd, 2011 - 10:46
Just add:
“Author” = $item["CreatedBy"]
November 29th, 2011 - 08:46
When I add this snip to retrieve the Author, the value is blank for all documents in the report.
Why would the value be blank?
December 8th, 2011 - 14:29
Try $item.File.Author.
September 23rd, 2011 - 10:28
Gary, how would I change the script to against an individual Site Collection?
Thanks
September 23rd, 2011 - 10:45
function Get-DocumentInventory([string]$siteUrl) {
$site = New-Object Microsoft.SharePoint.SPSite $siteUrl
foreach ($web in $site.AllWebs) {
foreach ($list in $web.Lists) {
if ($list.BaseType -ne “DocumentLibrary”) {
continue
}
foreach ($item in $list.Items) {
$data = @{ … }
New-Object PSObject -Property $data
}
}
$web.Dispose();
}
$site.Dispose()
}
October 3rd, 2011 - 08:11
Hey Gary,
Just a quick tip that may improve performance…
To get a filtered set of lists that are only of type DocumentLibrary, you can use this:
$lists = $web.GetListsOfType([Microsoft.SharePoint.SPBaseType]::DocumentLibrary)
Cheers,
Ben
October 3rd, 2011 - 08:34
Thanks Ben. I always forget about that method. Note that you can also just do this:
$lists = $web.GetListsOfType(“DocumentLibrary”)
PowerShell’s adaptive type system will convert the string to the proper enum type.
October 5th, 2011 - 08:46
Gary,
I tried the the example you gave for running against a Site Collection but I’m doing something wrong.
What do I need to correct?
Thanks. Terry
function Get-DocumentInventory([string]$siteUrl) {
$site = New-Object Microsoft.SharePoint.SPSite $siteUrl = “http://sharepoint.aa.com/sites/SCname”
foreach ($web in $site.AllWebs) {
foreach ($list in $web.Lists) {
if ($list.BaseType -ne “DocumentLibrary”) {
continue
}
foreach ($item in $list.Items) {
$data = @{
“Web Application” = $webApp.ToString()
“Item Modified” = $item["Modified"]
“Item Created” = $item["Created"]
“File Size” = $item.File.Length/1KB
“Web” = $web.Url
“Site” = $site.Url
“Item URL” = $item.Url
}
New-Object PSObject -Property $data
}
}
$web.Dispose();
}
$site.Dispose()
}
Get-DocInventory | Out-GridView
Get-DocInventory | Export-Csv -NoTypeInformation -Path c:\PowerShell\LOGS\Report_SC_Docs.csv
November 2nd, 2011 - 11:01
Change this line:
$site = New-Object Microsoft.SharePoint.SPSite $siteUrl = “http://sharepoint.aa.com/sites/SCname”
To this:
$site = New-Object Microsoft.SharePoint.SPSite $siteUrl
Then pass in the site URL:
Get-DocInventory “http://sharepoint.aa.com/sites/SCname” | Out-GridView
November 2nd, 2011 - 09:26
Hey Gary,
When I run the code I get this error:
Get-DocInventory : Exception has been thrown by the target of an invocation.
At line:39 char:17
+ Get-DocInventory <<<< | Out-GridView
+ CategoryInfo : NotSpecified: (:) [Get-DocInventory], TargetInvocationException
+ FullyQualifiedErrorId : System.Reflection.TargetInvocationException,Get-DocInventory
What am I doing wrong?
November 2nd, 2011 - 11:00
My guess is a copy and paste issue?
November 2nd, 2011 - 10:56
Is there a way to break out the file name from the URL? We still need the URL but not in the same column as Filename.
Thanks!
November 2nd, 2011 - 10:59
Just use $item.File.Name.
November 4th, 2011 - 06:41
Hi again Gary, it was a copy & paste issue as you said…I got that error cleared up. However, now when I run the script, nothing at all happens, no errors, nothing. Any ideas?
November 4th, 2011 - 09:42
I assume you’re calling the function after loading it?
November 11th, 2011 - 07:10
This is a great script
Helps a lot. But as i am new to powershell: Is it possible to get the content if a list? So e.g. i have a List with the same name accross the Site Collection and in every site i have a list with the same name. And in this lists i have a column which is called “Topic” could i extract that data also with powershell?
Also i wanted to search only a certain List with the name “persons” so i changed “if ($list.BaseType -ne “DocumentLibrary”) ” to “if ($list.Title -eq “persons”)” but it doen’t work
What is wrong ?
December 8th, 2011 - 14:18
It’s definitely possible to do everything you are wanting – I’d have to see your code to understand why it’s not working for you.
December 6th, 2011 - 12:38
What about files attached to lists that are not doc libraries?
December 8th, 2011 - 11:56
You’d have to modify the code to account for lists with attachments enabled and then iterate over the Attachments property and use it to get the SPFile object.
December 8th, 2011 - 10:00
Gary –
Have you tried to an export on all returned values from this script? I have several hundreds of subsites that we want to dispose of; I need to collect ALL documents from their libraries and migrate to a single library under the Site Collection.
December 8th, 2011 - 11:51
Should be pretty easy – just a matter of calling the OpenBinary() method on the SPFile object and saving it to the appropriate path using the [System.IO.File]::WriteAllBytes() method.
December 19th, 2011 - 08:38
Hi,
If I would like to filter out file extensions before running the script, i.e only quering for documents with .docx extension, is that something that is possible and how would the code look like and where do i place it?
Many Thanks
January 14th, 2012 - 10:40
It would involve use CAML to query items from a list rather than iterating through every item (though you could still iterate over all items and just check the file extension but that’s not as efficient).
December 26th, 2011 - 05:54
Hi Gary,
First of all, I would like to thank you for this article. I looked for various methods to getting all documents list and I think this is the best way to realize what I want.
I faced up a problem, an exception has been thrown about file exporting, I am copying error details. Someone also encounter this problem but nobody tells how to solve it. May be I miss some steps or do incorrectly because I’m new at powershell admin (I started to learn and use it today
). After the copied your script, I saved it on a text file then save name as osman.ps1 then invoked it from SPpowershell script.
PS C:\Users\Administrator> c:\osman.ps1
Get-DocInventory : Exception has been thrown by the target of an invocation.
At C:\osman.ps1:42 char:17
+ Get-DocInventory <<< c:\osman.ps1
Get-DocInventory : Exception has been thrown by the target of an invocation.
At C:\osman.ps1:42 char:17
+ Get-DocInventory <<<< | Export-Csv -NoTypeInformation -Path c:\inventory.csv
+ CategoryInfo : NotSpecified: (:) [Get-DocInventory], TargetInvo
cationException
+ FullyQualifiedErrorId : System.Reflection.TargetInvocationException,Get-
DocInventory
January 14th, 2012 - 10:31
Unfortunately that error is just a wrapper so I can’t say exactly what is going on without more information. After you get the error try to outout the $Error automatic variable to see if you can get more details:
$Error[0] | select *
0 is the last error that occurs so it should give you what you need (every once in a while I need to check a higher index because of things that happen behind the scenes in powershell which add to this collection but it’s pretty rare).
March 1st, 2012 - 12:33
I had exactly the same message and I believe it’s a permissions issue. I tried running the script as the farm admin and had no problems.
February 16th, 2012 - 08:37
Hey there,
This is a tremendous resource and I’ve had some great results. Thank you for your work and follow up throughout this thread.
I’m looking for a full listing of all major and minor versions, including comments that have been appended to each version.
We have the Author (I saw that parameter listed above), and I assume this is capturing who the last editor was.
thanks again,
March 9th, 2012 - 08:04
To get the versions you’ll have to loop through the Versions property. Look up SPListItemVersion to see what members are available with that object.
March 14th, 2012 - 07:03
Hi,
To start off with I like your site and have made use of some of the information with great success.
My question is, is it possible to find out who and when a file in a Document Library has been accessed (read only) if Auditing isn’t turned on. i.e. is there a property that logs the history of a file, like what you get from auditing, even though auditing is not turned on.
Regards,
Chris
March 19th, 2012 - 10:01
There’s no way out of the box to do that. You’d have to create a custom handler for the specific document type (like I did with the PDF handler on one of my recent posts) and log the information that way.
March 23rd, 2012 - 12:17
This is nice. So I’m wondering how we could modify this so we can download the document to a file share. I have a requirement to copy a boatload of documents to a filesystem. I guess I’ll check the SPListItem class and see if there is a download method.
March 31st, 2012 - 20:03
There’s an OpenBinary() method on the SPFile object. Just use that in conjunction with the [System.IO.File]::WriteAllBytes() method.
March 29th, 2012 - 07:13
Gary, this is an excellent script thank you!
Could it be modified to find unpublished documents in a farm?
March 31st, 2012 - 20:01
Yeah, definitely could – just a matter of checking the moderation status. The Get-SPCheckedOutFiles cmdlet could also be useful.
April 6th, 2012 - 09:44
I’m going to show my ignorance here, but whats the moderation status?
I tried googling the term in assication to SP2010 files but noting really came up appart from normal approval status.
May 15th, 2012 - 20:16
It’s the approval status – moderation status is how the API refers to it.
April 24th, 2012 - 03:22
Hi Gary,
After executing these below command I am getting below error . I am using SharePoint 2010 Management shell . I would require all the available file name along with their size in a SharePoint Site Collection .Could you please help me on this .
Add-PsSnapin Microsoft.Sharepoint.Powershell
function Get-DocumentInventory([string]$siteUrl) {
$site = New-Object Microsoft.SharePoint.SPSite $siteUrl
Get-DocInventory “http://sharepoint.share.com” | Out-GridView
foreach ($web in $site.AllWebs) {
foreach ($list in $web.Lists) {
if ($list.BaseType -ne “DocumentLibrary”) {
continue
}
foreach ($item in $list.Items) {
$data = @{
“Web Application” = $webApp.ToString()
“Item Modified” = $item["Modified"]
“Item Created” = $item["Created"]
“File Size” = $item.File.Length/1KB
“Web” = $web.Url
“Site” = $site.Url
“Item URL” = $item.Url
}
New-Object PSObject -Property $data
}
}
$web.Dispose();
}
$site.Dispose()
}
Get-DocInventory | Out-GridView
Get-DocInventory | Export-Csv -NoTypeInformation -Path c:\Report Docs.csv
The term ‘Get-DocInventory’ is not recognized as the name of a cmdlet, function, script file, or operabl
e program. Check the spelling of the name, or if a path was included, verify that the path is correct an
d try again.
At C:\Temp\report.ps1:28 char:17
+ Get-DocInventory <<<< | Out-GridView
+ CategoryInfo : ObjectNotFound: (Get-DocInventory:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
The term 'Get-DocInventory' is not recognized as the name of a cmdlet, function, script file, or operabl
e program. Check the spelling of the name, or if a path was included, verify that the path is correct an
d try again.
At C:\Temp\report.ps1:29 char:17
+ Get-DocInventory <<<< | Export-Csv -NoTypeInformation -Path c:\Report Docs.csv
+ CategoryInfo : ObjectNotFound: (Get-DocInventory:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
May 15th, 2012 - 20:12
You changed the function name from Get-DocInventory to Get-DocumentInventory but you didn’t change how you’re calling the function.