chore: complete v1.0 milestone
Archive 5 phases (36 plans) to milestones/v1.0-phases/. Archive roadmap, requirements, and audit to milestones/. Evolve PROJECT.md with shipped state and validated requirements. Collapse ROADMAP.md to one-line milestone summary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
756
.planning/milestones/v1.0-phases/03-storage/03-RESEARCH.md
Normal file
@@ -0,0 +1,756 @@
|
||||
# Phase 3: Storage and File Operations - Research
|
||||
|
||||
**Researched:** 2026-04-02
|
||||
**Domain:** CSOM StorageMetrics, SharePoint KQL Search, WPF DataGrid, duplicate detection
|
||||
**Confidence:** HIGH
|
||||
|
||||
---
|
||||
|
||||
<phase_requirements>
|
||||
## Phase Requirements
|
||||
|
||||
| ID | Description | Research Support |
|----|-------------|-----------------|
| STOR-01 | User can view storage consumption per library on a site | CSOM `Folder.StorageMetrics` (one Load call per folder) + flat DataGrid with indent column |
| STOR-02 | User can view storage consumption per site with configurable folder depth | Recursive `Collect-FolderStorage` pattern translated to async CSOM; depth guard via split-count |
| STOR-03 | Storage metrics include total size, version size, item count, and last modified date | `StorageMetrics.TotalSize`, `TotalFileStreamSize`, `TotalFileCount`, `StorageMetrics.LastModified`; version size = TotalSize - TotalFileStreamSize |
| STOR-04 | User can export storage metrics to CSV | New `StorageCsvExportService`; same UTF-8 BOM pattern as Phase 2 |
| STOR-05 | User can export storage metrics to interactive HTML with collapsible tree view | New `StorageHtmlExportService`; port PS lines 1621-1780; toggle() JS + nested table rows |
| SRCH-01 | User can search files across sites using multiple criteria | `KeywordQuery` + `SearchExecutor` (CSOM search); KQL built from filter params; client-side Regex post-filter |
| SRCH-02 | User can configure maximum search results (up to 50,000) | SharePoint Search `StartRow` hard cap is 50,000 (boundary); 500 rows/batch × 100 pages = 50,000 max |
| SRCH-03 | User can export search results to CSV | New `SearchCsvExportService` |
| SRCH-04 | User can export search results to interactive HTML (sortable, filterable) | New `SearchHtmlExportService`; port PS lines 2112-2233; sortable columns via data attributes |
| DUPL-01 | User can scan for duplicate files by name, size, creation date, modification date | Search API (same as SRCH) + client-side GroupBy composite key; no content hashing needed |
| DUPL-02 | User can scan for duplicate folders by name, subfolder count, file count | `SharePointPaginationHelper.GetAllItemsAsync` with CAML `FSObjType=1`; read `FolderChildCount`, `ItemChildCount` from field values |
| DUPL-03 | User can export duplicate report to HTML with grouped display and visual indicators | New `DuplicatesHtmlExportService`; port PS lines 2235-2406; collapsible group cards, ok/diff badges |
|
||||
</phase_requirements>
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
Phase 3 introduces three feature areas (Storage Metrics, File Search, Duplicate Detection), each requiring a dedicated ViewModel, View, Service, and export services. All three areas can be implemented without adding new NuGet packages — `Microsoft.SharePoint.Client.Search.dll` is already in the output folder as a transitive dependency of PnP.Framework 1.18.0.
|
||||
|
||||
**Storage** uses CSOM `Folder.StorageMetrics` (loaded via `ctx.Load(folder, f => f.StorageMetrics)`). One CSOM round-trip per folder. Version size is derived as `TotalSize - TotalFileStreamSize`. The data model is a recursive tree (site → library → folder → subfolder), flattened to a `DataGrid` with an indent-level column for WPF display. The HTML export ports the PS `Export-StorageToHTML` function (PS lines 1621-1780) with its toggle(i) JS pattern.
|
||||
|
||||
**File Search** uses `Microsoft.SharePoint.Client.Search.Query.KeywordQuery` + `SearchExecutor`. KQL is assembled from UI filter fields (extension, date range, creator, editor, library path). Pagination is `StartRow += 500` per batch; the hard ceiling is `StartRow = 50,000` (SharePoint Search boundary), which means the 50,000 max-results requirement (SRCH-02) is exactly the platform limit. Client-side Regex is applied after retrieval. The HTML export ports PS lines 2112-2233.
|
||||
|
||||
**Duplicate Detection** uses the same Search API for file duplicates (with an all-documents query) and `SharePointPaginationHelper.GetAllItemsAsync` with an `FSObjType` CAML filter for folder duplicates. Items are grouped client-side by a composite key (name + optional size/dates/counts). No content hashing is needed: the DUPL-01/02/03 requirements specify name+size+dates, which matches the PS reference implementation.
|
||||
|
||||
**Primary recommendation:** Three ViewModels (StorageViewModel, SearchViewModel, DuplicatesViewModel), three service interfaces, and five export services (storage CSV/HTML, search CSV/HTML, duplicates HTML; a duplicates CSV export would be an optional sixth), all extending existing Phase 2 patterns.
|
||||
|
||||
---
|
||||
|
||||
## User Constraints
|
||||
|
||||
No CONTEXT.md exists for Phase 3 (no /gsd:discuss-phase was run). All decisions below are from the locked technology stack in the prompt.
|
||||
|
||||
### Locked Decisions
|
||||
- .NET 10 LTS + WPF + MVVM (CommunityToolkit.Mvvm 8.4.2)
|
||||
- PnP.Framework 1.18.0 (CSOM-based SharePoint access)
|
||||
- No new major packages preferred — only add if truly necessary
|
||||
- Microsoft.Extensions.Hosting DI
|
||||
- Serilog logging
|
||||
- xUnit 2.9.3 tests
|
||||
|
||||
### Deferred / Out of Scope
|
||||
- Content hashing for duplicate detection (v2)
|
||||
- Storage charts/graphs (v2 requirement VIZZ-01/02/03)
|
||||
- Cross-tenant file search
|
||||
|
||||
---
|
||||
|
||||
## Standard Stack
|
||||
|
||||
### Core (no new packages needed)
|
||||
|
||||
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| PnP.Framework | 1.18.0 | CSOM access, `ClientContext` | Already in project |
| Microsoft.SharePoint.Client.Search.dll | (via PnP.Framework) | `KeywordQuery`, `SearchExecutor` | Transitive dep; confirmed present in `bin/Debug/net10.0-windows/` |
| CommunityToolkit.Mvvm | 8.4.2 | `[ObservableProperty]`, `AsyncRelayCommand` | Already in project |
| Microsoft.Extensions.Hosting | 10.x | DI container | Already in project |
| Serilog | 4.3.1 | Structured logging | Already in project |
| xUnit | 2.9.3 | Tests | Already in project |
| Moq | 4.20.72 | Mock interfaces in tests | Already in project |
|
||||
|
||||
**No new NuGet packages required.** `Microsoft.SharePoint.Client.Search.dll` ships as a transitive dependency of PnP.Framework — confirmed present at `SharepointToolbox/bin/Debug/net10.0-windows/Microsoft.SharePoint.Client.Search.dll`.
|
||||
|
||||
### New Models Needed
|
||||
|
||||
| Model | Location | Fields |
|-------|----------|--------|
| `StorageNode` | `Core/Models/StorageNode.cs` | `string Name`, `string Url`, `string SiteTitle`, `string Library`, `long TotalSizeBytes`, `long FileStreamSizeBytes`, `long TotalFileCount`, `DateTime? LastModified`, `int IndentLevel`, `List<StorageNode> Children` |
| `SearchResult` | `Core/Models/SearchResult.cs` | `string Title`, `string Path`, `string FileExtension`, `DateTime? Created`, `DateTime? LastModified`, `string Author`, `string ModifiedBy`, `long SizeBytes` |
| `DuplicateGroup` | `Core/Models/DuplicateGroup.cs` | `string GroupKey`, `string Name`, `List<DuplicateItem> Items` |
| `DuplicateItem` | `Core/Models/DuplicateItem.cs` | `string Name`, `string Path`, `string Library`, `long? SizeBytes`, `DateTime? Created`, `DateTime? Modified`, `int? FolderCount`, `int? FileCount` |
| `StorageScanOptions` | `Core/Models/StorageScanOptions.cs` | `bool PerLibrary`, `bool IncludeSubsites`, `int FolderDepth` |
| `SearchOptions` | `Core/Models/SearchOptions.cs` | `string SiteUrl` (used by `BuildKql` to build the `Path:` filter), `string[] Extensions`, `string? Regex`, `DateTime? CreatedAfter`, `DateTime? CreatedBefore`, `DateTime? ModifiedAfter`, `DateTime? ModifiedBefore`, `string? CreatedBy`, `string? ModifiedBy`, `string? Library`, `int MaxResults` |
| `DuplicateScanOptions` | `Core/Models/DuplicateScanOptions.cs` | `string Mode` ("Files"/"Folders"), `bool MatchSize`, `bool MatchCreated`, `bool MatchModified`, `bool MatchSubfolderCount`, `bool MatchFileCount`, `bool IncludeSubsites`, `string? Library` |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Patterns
|
||||
|
||||
### Recommended Project Structure (additions only)
|
||||
|
||||
```
SharepointToolbox/
├── Core/Models/
│   ├── StorageNode.cs                     # new
│   ├── SearchResult.cs                    # new
│   ├── DuplicateGroup.cs                  # new
│   ├── DuplicateItem.cs                   # new
│   ├── StorageScanOptions.cs              # new
│   ├── SearchOptions.cs                   # new
│   └── DuplicateScanOptions.cs            # new
├── Services/
│   ├── IStorageService.cs                 # new
│   ├── StorageService.cs                  # new
│   ├── ISearchService.cs                  # new
│   ├── SearchService.cs                   # new
│   ├── IDuplicatesService.cs              # new
│   ├── DuplicatesService.cs               # new
│   └── Export/
│       ├── StorageCsvExportService.cs     # new
│       ├── StorageHtmlExportService.cs    # new
│       ├── SearchCsvExportService.cs      # new
│       ├── SearchHtmlExportService.cs     # new
│       └── DuplicatesHtmlExportService.cs # new
├── ViewModels/Tabs/
│   ├── StorageViewModel.cs                # new
│   ├── SearchViewModel.cs                 # new
│   └── DuplicatesViewModel.cs             # new
└── Views/Tabs/
    ├── StorageView.xaml                   # new
    ├── StorageView.xaml.cs                # new
    ├── SearchView.xaml                    # new
    ├── SearchView.xaml.cs                 # new
    ├── DuplicatesView.xaml                # new
    └── DuplicatesView.xaml.cs             # new
```
|
||||
|
||||
### Pattern 1: CSOM StorageMetrics Load
|
||||
|
||||
**What:** Load `Folder.StorageMetrics` with a single round-trip per folder. StorageMetrics is a child object — you must include it in the Load expression or it will not be fetched.
|
||||
|
||||
**When to use:** Whenever reading storage data for a folder or library root.
|
||||
|
||||
**Example:**
|
||||
```csharp
// Source: https://learn.microsoft.com/en-us/dotnet/api/microsoft.sharepoint.client.storagemetrics
// + https://longnlp.github.io/load-storage-metric-from-SPO

// Get folder by server-relative URL (library root or subfolder)
Folder folder = ctx.Web.GetFolderByServerRelativeUrl(serverRelativeUrl);
ctx.Load(folder,
    f => f.StorageMetrics,     // pulls TotalSize, TotalFileStreamSize, TotalFileCount, LastModified
    f => f.TimeLastModified,   // alternative timestamp if StorageMetrics.LastModified is null
    f => f.ServerRelativeUrl,
    f => f.Name);
await ExecuteQueryRetryHelper.ExecuteQueryRetryAsync(ctx, progress, ct);

long totalBytes   = folder.StorageMetrics.TotalSize;
long streamBytes  = folder.StorageMetrics.TotalFileStreamSize; // current-version files only
long versionBytes = Math.Max(0L, totalBytes - streamBytes);    // version overhead
long fileCount    = folder.StorageMetrics.TotalFileCount;
DateTime? lastMod = folder.StorageMetrics.IsPropertyAvailable("LastModified")
    ? folder.StorageMetrics.LastModified
    : folder.TimeLastModified;
```
|
||||
|
||||
**Unit:** `TotalSize` and `TotalFileStreamSize` are in **bytes** (Int64). `TotalFileStreamSize` is the aggregate stream size for current-version file content only — it excludes version history, metadata, and attachments (confirmed by [MS-CSOMSPT]). Version storage = `TotalSize - TotalFileStreamSize`.
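
The arithmetic above can be isolated in a small pure helper, which makes it easy to unit-test without a SharePoint connection. This is a minimal sketch; `StorageMath` and its members are hypothetical names for illustration, not part of CSOM or the project.

```csharp
using System;

// Hypothetical helper (name assumed for illustration only).
public static class StorageMath
{
    // Version overhead in bytes, clamped at zero as in the snippet above.
    public static long VersionBytes(long totalSize, long totalFileStreamSize) =>
        Math.Max(0L, totalSize - totalFileStreamSize);

    // Share of total storage taken by version history, in percent.
    // Returns 0 for empty folders to avoid dividing by zero.
    public static double VersionPercent(long totalSize, long totalFileStreamSize) =>
        totalSize <= 0
            ? 0.0
            : 100.0 * VersionBytes(totalSize, totalFileStreamSize) / totalSize;
}
```

For example, a folder reporting `TotalSize = 1000` and `TotalFileStreamSize = 600` has 400 bytes of version overhead, or 40% of its total.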
|
||||
|
||||
### Pattern 2: KQL Search with Pagination
|
||||
|
||||
**What:** Use `KeywordQuery` + `SearchExecutor` (in `Microsoft.SharePoint.Client.Search.Query`) to execute a KQL query, paginating 500 rows at a time via `StartRow`.
|
||||
|
||||
**When to use:** SRCH-01/02/03/04 (file search) and DUPL-01 (file duplicate detection).
|
||||
|
||||
**Example:**
|
||||
```csharp
// Source: https://learn.microsoft.com/en-us/dotnet/api/microsoft.sharepoint.client.search.query.searchexecutor
// + https://usefulscripts.wordpress.com/2015/09/11/how-to-fetch-all-results-from-sharepoint-search-using-dot-net-managed-csom/

using System.Linq;
using Microsoft.SharePoint.Client.Search.Query;

// namespace: Microsoft.SharePoint.Client.Search.Query
// assembly: Microsoft.SharePoint.Client.Search.dll (via PnP.Framework transitive dep)

var allResults = new List<IDictionary<string, object>>();
int startRow = 0;
const int batchSize = 500;

do
{
    ct.ThrowIfCancellationRequested();

    var kq = new KeywordQuery(ctx)
    {
        QueryText = kql,          // e.g. "ContentType:Document AND FileExtension:pdf"
        StartRow = startRow,
        RowLimit = batchSize,
        TrimDuplicates = false
    };
    // Explicit managed properties to retrieve
    foreach (var prop in new[]
    {
        "Title", "Path", "Author", "LastModifiedTime",
        "FileExtension", "Created", "ModifiedBy", "Size"
    })
    {
        kq.SelectProperties.Add(prop);
    }

    var executor = new SearchExecutor(ctx);
    ClientResult<ResultTableCollection> clientResult = executor.ExecuteQuery(kq);
    await ExecuteQueryRetryHelper.ExecuteQueryRetryAsync(ctx, progress, ct);
    // Note: ctx.ExecuteQuery() is called inside ExecuteQueryRetryAsync; do NOT call it again

    var table = clientResult.Value
        .FirstOrDefault(t => t.TableType == KnownTableTypes.RelevantResults);
    if (table == null) break;

    int retrieved = table.RowCount;
    if (retrieved == 0) break; // no more rows: stop paging early

    // ResultRows yields IDictionary<string, object> (see Pitfall 2), not Hashtable.
    // Copy each row and replace null values with an empty string.
    foreach (IDictionary<string, object> row in table.ResultRows)
    {
        allResults.Add(row.ToDictionary(e => e.Key, e => e.Value ?? string.Empty));
    }

    progress.Report(new OperationProgress(allResults.Count, maxResults, $"Retrieved {allResults.Count} results…"));
    startRow += batchSize;
}
while (startRow < maxResults && startRow <= 50_000 // platform hard cap
       && allResults.Count < maxResults);
```
|
||||
|
||||
**Critical detail:** `ExecuteQueryRetryHelper.ExecuteQueryRetryAsync` wraps `ctx.ExecuteQuery()`. Call it AFTER `executor.ExecuteQuery(kq)` — do NOT call `ctx.ExecuteQuery()` directly afterward.
|
||||
|
||||
**StartRow limit:** SharePoint Search imposes a hard boundary of 50,000 for `StartRow`. With batch size 500, max pages = 100, max results = 50,000. This exactly satisfies SRCH-02.
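
To make the page arithmetic concrete, here is a minimal sketch of a paging plan in plain C#. `SearchPaging` and `Plan` are hypothetical names used only for illustration, and the 50,000 cap is taken from the statement above rather than verified independently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (names assumed for illustration only).
public static class SearchPaging
{
    public const int BatchSize = 500;
    public const int PlatformCap = 50_000; // StartRow boundary as stated in this document

    // Yields (StartRow, RowLimit) pairs covering at most min(maxResults, PlatformCap) rows.
    // The last page is shortened so the total never exceeds the requested maximum.
    public static IEnumerable<(int StartRow, int RowLimit)> Plan(int maxResults)
    {
        int limit = Math.Min(Math.Max(0, maxResults), PlatformCap);
        for (int start = 0; start < limit; start += BatchSize)
        {
            yield return (start, Math.Min(BatchSize, limit - start));
        }
    }
}
```

With `maxResults = 50_000` this produces 100 pages, the last starting at row 49,500. With `maxResults = 1_200` it produces three pages of 500, 500, and 200 rows.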
|
||||
|
||||
**KQL field mappings (from PS reference lines 4747-4763):**
- Extension: `FileExtension:pdf OR FileExtension:docx`
- Created after/before: `Created>=2024-01-01` / `Created<=2024-12-31`
- Modified after/before: `Write>=2024-01-01` / `Write<=2024-12-31`
- Created by: `Author:"First Last"`
- Modified by: `ModifiedBy:"First Last"`
- Library path: `Path:"https://tenant.sharepoint.com/sites/x/Shared Documents*"`
- Documents only: `ContentType:Document`
|
||||
|
||||
### Pattern 3: Folder Enumeration for Duplicate Folders
|
||||
|
||||
**What:** Use `SharePointPaginationHelper.GetAllItemsAsync` with a CAML filter on `FSObjType = 1` (folders). Read `FolderChildCount` and `ItemChildCount` from `FieldValues`.
|
||||
|
||||
**When to use:** DUPL-02 (folder duplicate scan).
|
||||
|
||||
**Example:**
|
||||
```csharp
// Source: PS reference lines 5010-5036; Phase 2 SharePointPaginationHelper pattern

var camlQuery = new CamlQuery
{
    ViewXml = @"<View Scope='RecursiveAll'>
                  <Query>
                    <Where>
                      <Eq>
                        <FieldRef Name='FSObjType' />
                        <Value Type='Integer'>1</Value>
                      </Eq>
                    </Where>
                  </Query>
                  <RowLimit>2000</RowLimit>
                </View>"
};

await foreach (var item in SharePointPaginationHelper.GetAllItemsAsync(ctx, list, camlQuery, ct))
{
    var fv = item.FieldValues;
    var name = fv["FileLeafRef"]?.ToString() ?? string.Empty;
    var fileRef = fv["FileRef"]?.ToString() ?? string.Empty;
    var subCount = Convert.ToInt32(fv["FolderChildCount"] ?? 0);
    var childCount = Convert.ToInt32(fv["ItemChildCount"] ?? 0);
    var fileCount = Math.Max(0, childCount - subCount);
    var created = fv["Created"] is DateTime cr ? cr : (DateTime?)null;
    var modified = fv["Modified"] is DateTime md ? md : (DateTime?)null;
    // ...build DuplicateItem
}
```
|
||||
|
||||
### Pattern 4: Duplicate Composite Key (name+size+date grouping)
|
||||
|
||||
**What:** Build a string composite key from the fields the user selected, then `GroupBy(key).Where(g => g.Count() >= 2)`.
|
||||
|
||||
**When to use:** DUPL-01 (files) and DUPL-02 (folders).
|
||||
|
||||
**Example:**
|
||||
```csharp
// Source: PS reference lines 4942-4949 (MakeKey function)

private static string MakeKey(DuplicateItem item, DuplicateScanOptions opts)
{
    var parts = new List<string> { item.Name.ToLowerInvariant() };
    if (opts.MatchSize && item.SizeBytes.HasValue) parts.Add(item.SizeBytes.Value.ToString());
    if (opts.MatchCreated && item.Created.HasValue) parts.Add(item.Created.Value.Date.ToString("yyyy-MM-dd"));
    if (opts.MatchModified && item.Modified.HasValue) parts.Add(item.Modified.Value.Date.ToString("yyyy-MM-dd"));
    if (opts.MatchSubfolderCount && item.FolderCount.HasValue) parts.Add(item.FolderCount.Value.ToString());
    if (opts.MatchFileCount && item.FileCount.HasValue) parts.Add(item.FileCount.Value.ToString());
    return string.Join("|", parts);
}

var groups = allItems
    .GroupBy(i => MakeKey(i, opts))
    .Where(g => g.Count() >= 2)
    .Select(g => new DuplicateGroup
    {
        GroupKey = g.Key,
        Name = g.First().Name,
        Items = g.ToList()
    })
    .OrderByDescending(g => g.Items.Count)
    .ToList();
```
|
||||
|
||||
### Pattern 5: Storage Recursive Tree → Flat Row List for DataGrid
|
||||
|
||||
**What:** Flatten the recursive tree (site → library → folder → subfolder) into a flat `List<StorageNode>` where each node carries an `IndentLevel`. The WPF `DataGrid` renders a `Margin` on the name cell based on `IndentLevel`.
|
||||
|
||||
**When to use:** STOR-01/02 WPF display.
|
||||
|
||||
**Rationale for DataGrid over TreeView:** WPF `TreeView` needs a `HierarchicalDataTemplate` and tends to virtualize poorly with deep nesting. A flat `DataGrid` with `VirtualizingPanel.IsVirtualizing="True"` stays performant for thousands of rows and is trivially sortable.
|
||||
|
||||
**Example:**
|
||||
```csharp
// Flatten tree to observable list for DataGrid binding
private static void FlattenTree(StorageNode node, int level, List<StorageNode> result)
{
    node.IndentLevel = level;
    result.Add(node);
    foreach (var child in node.Children)
        FlattenTree(child, level + 1, result);
}
```

```xml
<!-- WPF DataGrid cell template for name column with indent -->
<DataGridTemplateColumn Header="Library / Folder" Width="*">
    <DataGridTemplateColumn.CellTemplate>
        <DataTemplate>
            <TextBlock Text="{Binding Name}"
                       Margin="{Binding IndentLevel, Converter={StaticResource IndentConverter}}" />
        </DataTemplate>
    </DataGridTemplateColumn.CellTemplate>
</DataGridTemplateColumn>
```
|
||||
|
||||
Use `IValueConverter` mapping `IndentLevel` → `new Thickness(IndentLevel * 16, 0, 0, 0)`.
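
A minimal sketch of such a converter follows. The class name `IndentConverter` is chosen to match the resource key used in the XAML above; the 16-pixel step comes from the line above, and everything else (namespace, registration) is an assumption about how the project would wire it up.

```csharp
using System;
using System.Globalization;
using System.Windows;
using System.Windows.Data;

// Sketch only: maps an integer indent level to a left margin.
public sealed class IndentConverter : IValueConverter
{
    private const double PixelsPerLevel = 16;

    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        // Treat anything that is not a non-negative int as level 0.
        int level = value is int i ? Math.Max(0, i) : 0;
        return new Thickness(level * PixelsPerLevel, 0, 0, 0);
    }

    // The binding is one-way; converting a margin back to a level is not needed.
    public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
        => Binding.DoNothing;
}
```

The converter would then be registered once in the view's resources under the key `IndentConverter` so that the `StaticResource` lookup in the cell template resolves.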
|
||||
|
||||
### Pattern 6: Storage HTML Collapsible Tree
|
||||
|
||||
**What:** The HTML export uses inline nested tables with `display:none` rows toggled by `toggle(i)` JS. Each library/folder that has children gets a unique numeric index.
|
||||
|
||||
**When to use:** STOR-05 export.
|
||||
|
||||
**Key design (from PS lines 1621-1780):**
|
||||
- A global `_togIdx` counter assigns unique IDs to collapsible rows: `<tr id='sf-{i}' style='display:none'>`.
|
||||
- A `<button onclick='toggle({i})'>` triggers `row.style.display = visible ? 'none' : 'table-row'`.
|
||||
- Library rows embed a nested `<table class='sf-tbl'>` inside the collapsible row (colspan spanning all columns).
|
||||
- This is a pure inline pattern — no external JS or CSS dependencies.
|
||||
- In C# the counter is a field on `StorageHtmlExportService` reset at the start of each `BuildHtml()` call.
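
The key design points above can be sketched as follows. This is an illustrative outline only, not the ported PS function: `ToggleRowSketch`, `Build`, and `AppendCollapsible` are hypothetical names, and the real `StorageHtmlExportService` would emit full rows with size columns and styling.

```csharp
using System.Net;
using System.Text;

// Hypothetical sketch (names assumed for illustration only).
public sealed class ToggleRowSketch
{
    private int _togIdx; // unique index per collapsible row, reset on every Build call

    public string Build(string libraryName, string nestedRowsHtml, int columnCount)
    {
        _togIdx = 0;
        var sb = new StringBuilder();
        sb.Append("<table>");
        AppendCollapsible(sb, libraryName, nestedRowsHtml, columnCount);
        sb.Append("</table>");
        // Inline script: no external JS or CSS dependencies.
        sb.Append("<script>function toggle(i){var r=document.getElementById('sf-'+i);")
          .Append("r.style.display=(r.style.display==='none')?'table-row':'none';}</script>");
        return sb.ToString();
    }

    private void AppendCollapsible(StringBuilder sb, string name, string nestedRowsHtml, int columnCount)
    {
        int i = _togIdx++;
        // Visible header row with the toggle button.
        sb.Append($"<tr><td colspan='{columnCount}'><button onclick='toggle({i})'>+</button> ")
          .Append(WebUtility.HtmlEncode(name))
          .Append("</td></tr>");
        // Hidden row holding the nested table, spanning all columns.
        sb.Append($"<tr id='sf-{i}' style='display:none'><td colspan='{columnCount}'>")
          .Append("<table class='sf-tbl'>").Append(nestedRowsHtml).Append("</table></td></tr>");
    }
}
```

Resetting the counter inside `Build` keeps the generated IDs stable between exports, so two exports of the same data produce identical HTML.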
|
||||
|
||||
### Anti-Patterns to Avoid
|
||||
|
||||
- **Loading StorageMetrics without including it in ctx.Load:** `folder.StorageMetrics.TotalSize` throws `PropertyOrFieldNotInitializedException` if `StorageMetrics` is not included in the Load expression. Always use `ctx.Load(folder, f => f.StorageMetrics, ...)`.
|
||||
- **Calling ctx.ExecuteQuery() after executor.ExecuteQuery(kq):** The search executor pattern requires calling `ctx.ExecuteQuery()` ONCE (inside `ExecuteQueryRetryAsync`). Calling it twice is a no-op at best, throws at worst.
|
||||
- **StartRow > 50,000:** SharePoint Search hard boundary — will return zero results or error. Cap loop exit at `startRow <= 50_000`.
|
||||
- **Modifying ObservableCollection from Task.Run:** Same rule as Phase 2 — accumulate in `List<T>` on background thread, then `Dispatcher.InvokeAsync(() => StorageResults = new ObservableCollection<T>(list))`.
|
||||
- **Recursive CSOM calls without depth guard:** Without a depth guard, `Collect-FolderStorage` on a deep site can make thousands of CSOM round-trips. Always pass `MaxDepth` and check `currentDepth >= maxDepth` before recursing.
|
||||
- **Building a TreeView for storage display:** WPF `TreeView` does not virtualize its items by default and tends to become sluggish beyond roughly 1,000 visible items. Use a `DataGrid` with `IndentLevel`.
|
||||
- **Version size from index:** The Search API's `Size` property is the current-version file size, not total including versions. Only `StorageMetrics.TotalFileStreamSize` vs `TotalSize` gives accurate version overhead.
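
For the depth guard mentioned in the list above, one option is the split-count approach noted under STOR-02: derive a folder's depth from the number of path segments below the library root. The sketch below is a plain C# illustration; `FolderDepth`, `RelativeDepth`, and `ShouldRecurse` are hypothetical names, not existing project members.

```csharp
using System;

// Hypothetical helper (names assumed for illustration only).
public static class FolderDepth
{
    // Depth of a folder below the root, counted as path segments after the root URL.
    public static int RelativeDepth(string rootServerRelativeUrl, string folderServerRelativeUrl)
    {
        string root = rootServerRelativeUrl.TrimEnd('/');
        string folder = folderServerRelativeUrl.TrimEnd('/');

        bool underRoot = folder.StartsWith(root, StringComparison.OrdinalIgnoreCase)
            && (folder.Length == root.Length || folder[root.Length] == '/');
        if (!underRoot)
            throw new ArgumentException("Folder is not under the given root.", nameof(folderServerRelativeUrl));

        string rest = folder.Substring(root.Length).Trim('/');
        return rest.Length == 0 ? 0 : rest.Split('/').Length;
    }

    // Recurse only while the current depth is still below the configured maximum.
    public static bool ShouldRecurse(int currentDepth, int maxDepth) => currentDepth < maxDepth;
}
```

Checking `ShouldRecurse` before issuing the next CSOM round-trip keeps the number of requests bounded by the user's folder-depth setting.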
|
||||
|
||||
---
|
||||
|
||||
## Don't Hand-Roll
|
||||
|
||||
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| CSOM throttle retry | Custom retry loop | `ExecuteQueryRetryHelper.ExecuteQueryRetryAsync` (Phase 1) | Already handles 429/503 with exponential backoff |
| List pagination | Raw `ExecuteQuery` loop | `SharePointPaginationHelper.GetAllItemsAsync` (Phase 1) | Handles 5000-item threshold, CAML position continuation |
| Search pagination | Manual `do/while` per search | Same `KeywordQuery`+`SearchExecutor` pattern (internal to SearchService) | Wrap in a helper method inside `SearchService` to avoid duplication across SRCH and DUPL features |
| HTML header/footer boilerplate | New template each export service | Copy from existing `HtmlExportService` pattern (Phase 2) | Consistent `<!DOCTYPE>`, viewport meta, `Segoe UI` font stack |
| CSV field escaping | Custom escaping | RFC 4180 `Csv()` helper pattern from Phase 2 `CsvExportService` | Already handles quotes, empty values, UTF-8 BOM |
| OperationProgress reporting | New progress model | `OperationProgress.Indeterminate(msg)` + `new OperationProgress(current, total, msg)` (Phase 1) | Already wired to UI via `FeatureViewModelBase` |
| Tenant context management | Directly create `ClientContext` | `ISessionManager.GetOrCreateContextAsync` (Phase 1) | Handles MSAL cache, per-tenant context pooling |
|
||||
|
||||
---
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
### Pitfall 1: StorageMetrics PropertyOrFieldNotInitializedException
|
||||
**What goes wrong:** `folder.StorageMetrics.TotalSize` throws `PropertyOrFieldNotInitializedException` at runtime.
|
||||
**Why it happens:** CSOM lazy-loading — if `StorageMetrics` is not in the Load expression, the proxy object exists but has no data.
|
||||
**How to avoid:** Always include `f => f.StorageMetrics` in the `ctx.Load(folder, ...)` lambda.
|
||||
**Warning signs:** Exception message contains "The property or field 'StorageMetrics' has not been initialized".
|
||||
|
||||
### Pitfall 2: Search ResultRows Type Is IDictionary-like But Not Strongly Typed
|
||||
**What goes wrong:** Accessing `row["Size"]` returns object — Size comes back as a string `"12345"` not a long.
|
||||
**Why it happens:** `ResultTable.ResultRows` is `IEnumerable<IDictionary<string, object>>`. All values are strings from the search index.
|
||||
**How to avoid:** Always parse with `long.TryParse(row["Size"]?.ToString() ?? "0", out var sizeBytes)`. Strip non-numeric characters as PS does: `Regex.Replace(sizeStr, "[^0-9]", "")`.
|
||||
**Warning signs:** `InvalidCastException` when binding Size to a numeric column.
|
||||
|
||||
### Pitfall 3: Search API Returns Duplicates for Versioned Files
|
||||
**What goes wrong:** Files with many versions appear multiple times in results via `/_vti_history/` paths.
|
||||
**Why it happens:** SharePoint indexes each version as a separate item in some cases.
|
||||
**How to avoid:** Exclude items for which `Path.Contains("/_vti_history/", StringComparison.OrdinalIgnoreCase)` is true (port of PS line 4973).
|
||||
**Warning signs:** Duplicate file paths in results with `_vti_history` segment.
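
A minimal sketch of that exclusion in plain C#, operating on path strings only. `VersionHistoryFilter` is a hypothetical name; in the real service the same predicate would be applied to `SearchResult.Path`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (names assumed for illustration only).
public static class VersionHistoryFilter
{
    // True when the path points into the version-history area.
    public static bool IsVersionHistoryPath(string path) =>
        path.Contains("/_vti_history/", StringComparison.OrdinalIgnoreCase);

    // Keeps only paths that are not version-history entries.
    public static List<string> ExcludeVersionHistory(IEnumerable<string> paths) =>
        paths.Where(p => !IsVersionHistoryPath(p)).ToList();
}
```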
|
||||
|
||||
### Pitfall 4: StorageMetrics.LastModified May Be DateTime.MinValue
|
||||
**What goes wrong:** `LastModified` shows as 01/01/0001 for empty folders.
|
||||
**Why it happens:** SharePoint returns a default DateTime for folders with no modifications.
|
||||
**How to avoid:** Check `lastModified > DateTime.MinValue` before formatting. Fall back to `folder.TimeLastModified` if `StorageMetrics.LastModified` is unset.
|
||||
**Warning signs:** "01/01/0001" in the LastModified column.
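
The fallback can be expressed as a small pure function, sketched below. `LastModifiedResolver` is a hypothetical name; returning `null` when both timestamps are unset is an assumption that fits the `DateTime? LastModified` field on `StorageNode`.

```csharp
using System;

// Hypothetical helper (names assumed for illustration only).
public static class LastModifiedResolver
{
    // Prefers StorageMetrics.LastModified, falls back to Folder.TimeLastModified,
    // and returns null when neither carries a real value.
    public static DateTime? Resolve(DateTime metricsLastModified, DateTime folderTimeLastModified)
    {
        if (metricsLastModified > DateTime.MinValue) return metricsLastModified;
        if (folderTimeLastModified > DateTime.MinValue) return folderTimeLastModified;
        return null;
    }
}
```

A `null` result lets the grid and the exports show an empty cell instead of "01/01/0001".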
|
||||
|
||||
### Pitfall 5: KQL Query Text Exceeds 4096 Characters
|
||||
**What goes wrong:** Search query silently fails or returns error for very long KQL strings.
|
||||
**Why it happens:** SharePoint Search has a 4096-character KQL text boundary.
|
||||
**How to avoid:** For extension filters with many extensions, use `(FileExtension:a OR FileExtension:b OR ...)` and validate total length before calling. Warn user if limit approached.
|
||||
**Warning signs:** Zero results returned when many extensions entered; no CSOM exception.
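
One way to validate the length before calling the Search API is sketched below. `KqlLengthGuard` is a hypothetical name, the 4096 figure is the limit stated in this pitfall, and the 90% warning threshold is an arbitrary assumption.

```csharp
using System;

// Hypothetical helper (names and warning threshold assumed for illustration only).
public static class KqlLengthGuard
{
    public const int MaxKqlLength = 4096; // limit as stated in this document
    public const double WarnRatio = 0.9;  // assumed "approaching the limit" threshold

    public enum Verdict { Ok, NearLimit, TooLong }

    public static Verdict Check(string kql)
    {
        if (kql.Length > MaxKqlLength) return Verdict.TooLong;
        if (kql.Length >= MaxKqlLength * WarnRatio) return Verdict.NearLimit;
        return Verdict.Ok;
    }
}
```

The ViewModel could run this check on the built KQL string, block the search on `TooLong`, and surface a warning on `NearLimit`.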
|
||||
|
||||
### Pitfall 6: CAML FSObjType Field Name
|
||||
**What goes wrong:** CAML query for folders returns no results.
|
||||
**Why it happens:** The internal CAML field name is `FSObjType`, not `FileSystemObjectType`. Using the wrong name returns no matches silently.
|
||||
**How to avoid:** Use `<FieldRef Name='FSObjType' />` (integer) with `<Value Type='Integer'>1</Value>`. Confirmed by PS reference line 5011 which uses CSOM `FileSystemObjectType.Folder` comparison.
|
||||
**Warning signs:** Zero items returned from folder CAML query on a library known to have folders.
|
||||
|
||||
### Pitfall 7: StorageService Needs Web.ServerRelativeUrl to Compute Site-Relative Path
|
||||
**What goes wrong:** The PS reference calls `Get-PnPFolderStorageMetric -FolderSiteRelativeUrl`, which expects a path relative to the web root (e.g., `Shared Documents`), whereas CSOM works with server-relative paths (e.g., `/sites/MySite/Shared Documents`). Mixing the two forms makes the folder lookup fail.
**Why it happens:** CSOM folder lookups use server-relative URLs, so the site-relative form needed for display and report URLs must be derived by stripping the web's `ServerRelativeUrl` prefix.
**How to avoid:** Load `ctx.Web.ServerRelativeUrl` first, then compute: `siteRelUrl = rootFolder.ServerRelativeUrl.Substring(webSrl.Length).TrimStart('/')`. Pass the full server-relative path (not the site-relative one) to `ctx.Web.GetFolderByServerRelativeUrl(...)`.
|
||||
**Warning signs:** 404/FileNotFoundException from CSOM when calling StorageMetrics.
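
The prefix stripping can be sketched as a pure function with a guard against mismatched inputs. `SiteRelativeUrl.FromServerRelative` is a hypothetical name used for illustration only.

```csharp
using System;

// Hypothetical helper (names assumed for illustration only).
public static class SiteRelativeUrl
{
    // Strips the web's server-relative prefix from a folder's server-relative URL.
    public static string FromServerRelative(string webServerRelativeUrl, string folderServerRelativeUrl)
    {
        // A root web has ServerRelativeUrl "/", which trims to an empty prefix.
        string webSrl = webServerRelativeUrl.TrimEnd('/');
        if (!folderServerRelativeUrl.StartsWith(webSrl, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Folder is not under the given web.", nameof(folderServerRelativeUrl));

        return folderServerRelativeUrl.Substring(webSrl.Length).TrimStart('/');
    }
}
```

The guard turns a silent wrong path into an immediate `ArgumentException`, which is easier to diagnose than a later 404 from CSOM.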
|
||||
|
||||
---
|
||||
|
||||
## Code Examples
|
||||
|
||||
### Loading StorageMetrics (STOR-01/02/03)
|
||||
|
||||
```csharp
// Source: MS Learn, StorageMetrics Class; [MS-CSOMSPT] TotalFileStreamSize definition

ctx.Load(ctx.Web, w => w.ServerRelativeUrl, w => w.Url, w => w.Title);
await ExecuteQueryRetryHelper.ExecuteQueryRetryAsync(ctx, progress, ct);

string webSrl = ctx.Web.ServerRelativeUrl.TrimEnd('/');

// Per-library: iterate document libraries
ctx.Load(ctx.Web.Lists, lists => lists.Include(
    l => l.Title, l => l.BaseType, l => l.Hidden, l => l.RootFolder.ServerRelativeUrl));
await ExecuteQueryRetryHelper.ExecuteQueryRetryAsync(ctx, progress, ct);

foreach (var list in ctx.Web.Lists)
{
    if (list.Hidden || list.BaseType != BaseType.DocumentLibrary) continue;

    string siteRelUrl = list.RootFolder.ServerRelativeUrl.Substring(webSrl.Length).TrimStart('/');
    Folder rootFolder = ctx.Web.GetFolderByServerRelativeUrl(list.RootFolder.ServerRelativeUrl);
    ctx.Load(rootFolder,
        f => f.StorageMetrics,
        f => f.TimeLastModified,
        f => f.ServerRelativeUrl);
    await ExecuteQueryRetryHelper.ExecuteQueryRetryAsync(ctx, progress, ct);

    var node = new StorageNode
    {
        Name = list.Title,
        Url = $"{ctx.Web.Url.TrimEnd('/')}/{siteRelUrl}",
        SiteTitle = ctx.Web.Title,
        Library = list.Title,
        TotalSizeBytes = rootFolder.StorageMetrics.TotalSize,
        FileStreamSizeBytes = rootFolder.StorageMetrics.TotalFileStreamSize,
        TotalFileCount = rootFolder.StorageMetrics.TotalFileCount,
        LastModified = rootFolder.StorageMetrics.LastModified > DateTime.MinValue
            ? rootFolder.StorageMetrics.LastModified
            : rootFolder.TimeLastModified,
        IndentLevel = 0,
        Children = new List<StorageNode>()
    };

    // Recursive subfolder collection up to maxDepth
    if (maxDepth > 0)
        await CollectSubfoldersAsync(ctx, list.RootFolder.ServerRelativeUrl, node, 1, maxDepth, progress, ct);
}
```
|
||||
|
||||
### KQL Build from SearchOptions
|
||||
|
||||
```csharp
// Source: PS reference lines 4747-4763

private static string BuildKql(SearchOptions opts)
{
    var parts = new List<string> { "ContentType:Document" };

    if (opts.Extensions.Length > 0)
    {
        var extParts = opts.Extensions.Select(e => $"FileExtension:{e.TrimStart('.').ToLowerInvariant()}");
        parts.Add($"({string.Join(" OR ", extParts)})");
    }
    if (opts.CreatedAfter.HasValue)
        parts.Add($"Created>={opts.CreatedAfter.Value:yyyy-MM-dd}");
    if (opts.CreatedBefore.HasValue)
        parts.Add($"Created<={opts.CreatedBefore.Value:yyyy-MM-dd}");
    if (opts.ModifiedAfter.HasValue)
        parts.Add($"Write>={opts.ModifiedAfter.Value:yyyy-MM-dd}");
    if (opts.ModifiedBefore.HasValue)
        parts.Add($"Write<={opts.ModifiedBefore.Value:yyyy-MM-dd}");
    if (!string.IsNullOrEmpty(opts.CreatedBy))
        parts.Add($"Author:\"{opts.CreatedBy}\"");
    if (!string.IsNullOrEmpty(opts.ModifiedBy))
        parts.Add($"ModifiedBy:\"{opts.ModifiedBy}\"");
    if (!string.IsNullOrEmpty(opts.Library))
        parts.Add($"Path:\"{opts.SiteUrl.TrimEnd('/')}/{opts.Library.TrimStart('/')}*\"");

    return string.Join(" AND ", parts);
}
```
|
||||
|
||||
### Parsing Search ResultRows
|
||||
|
||||
```csharp
|
||||
// Source: PS reference lines 4971-4987
|
||||
|
||||
private static SearchResult ParseRow(IDictionary<string, object> row)
|
||||
{
|
||||
static string Str(IDictionary<string, object> r, string key) =>
|
||||
r.TryGetValue(key, out var v) ? v?.ToString() ?? string.Empty : string.Empty;
|
||||
|
||||
static DateTime? Date(IDictionary<string, object> r, string key)
|
||||
{
|
||||
var s = Str(r, key);
|
||||
return DateTime.TryParse(s, out var dt) ? dt : null;
|
||||
}
|
||||
|
||||
static long ParseSize(IDictionary<string, object> r, string key)
|
||||
{
|
||||
var raw = Str(r, key);
|
||||
// Keep digits only so that a value containing separators or other text still parses.
var digits = System.Text.RegularExpressions.Regex.Replace(raw, "[^0-9]", "");
|
||||
return long.TryParse(digits, out var v) ? v : 0L;
|
||||
}
|
||||
|
||||
return new SearchResult
|
||||
{
|
||||
Title = Str(row, "Title"),
|
||||
Path = Str(row, "Path"),
|
||||
FileExtension = Str(row, "FileExtension"),
|
||||
Created = Date(row, "Created"),
|
||||
LastModified = Date(row, "LastModifiedTime"),
|
||||
Author = Str(row, "Author"),
|
||||
ModifiedBy = Str(row, "ModifiedBy"),
|
||||
SizeBytes = ParseSize(row, "Size")
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Localization Keys Needed
|
||||
|
||||
The following keys are needed for Phase 3 Views. Keys from the PS reference (lines 2747-2813) are remapped to the C# `Strings.resx` naming convention. Existing keys already in `Strings.resx` are marked with (existing).
|
||||
|
||||
### Storage Tab
|
||||
|
||||
| Key | EN Value | Notes |
|
||||
|-----|----------|-------|
|
||||
| `tab.storage` | `Storage` | (existing — already in Strings.resx line 77) |
|
||||
| `chk.per.lib` | `Per-Library Breakdown` | new |
|
||||
| `chk.subsites` | `Include Subsites` | new |
|
||||
| `lbl.folder.depth` | `Folder depth:` | (existing — shared with permissions) |
|
||||
| `chk.max.depth` | `Maximum (all levels)` | (existing — shared with permissions) |
|
||||
| `stor.note` | `Note: deeper folder scans on large sites may take several minutes.` | new |
|
||||
| `btn.gen.storage` | `Generate Metrics` | new |
|
||||
| `btn.open.storage` | `Open Report` | new |
|
||||
| `stor.col.library` | `Library` | new |
|
||||
| `stor.col.site` | `Site` | new |
|
||||
| `stor.col.files` | `Files` | new |
|
||||
| `stor.col.size` | `Size` | new |
|
||||
| `stor.col.versions` | `Versions` | new |
|
||||
| `stor.col.lastmod` | `Last Modified` | new |
|
||||
| `stor.col.share` | `Share of Total` | new |
|
||||
|
||||
### File Search Tab
|
||||
|
||||
| Key | EN Value | Notes |
|
||||
|-----|----------|-------|
|
||||
| `tab.search` | `File Search` | (existing — already in Strings.resx line 79) |
|
||||
| `grp.search.filters` | `Search Filters` | new |
|
||||
| `lbl.extensions` | `Extension(s):` | new |
|
||||
| `ph.extensions` | `docx pdf xlsx` | new (placeholder) |
|
||||
| `lbl.regex` | `Name / Regex:` | new |
|
||||
| `ph.regex` | `Ex: report.* or \.bak$` | new (placeholder) |
|
||||
| `chk.created.after` | `Created after:` | new |
|
||||
| `chk.created.before` | `Created before:` | new |
|
||||
| `chk.modified.after` | `Modified after:` | new |
|
||||
| `chk.modified.before` | `Modified before:` | new |
|
||||
| `lbl.created.by` | `Created by:` | new |
|
||||
| `ph.created.by` | `First Last or email` | new (placeholder) |
|
||||
| `lbl.modified.by` | `Modified by:` | new |
|
||||
| `ph.modified.by` | `First Last or email` | new (placeholder) |
|
||||
| `lbl.library` | `Library:` | new |
|
||||
| `ph.library` | `Optional relative path e.g. Shared Documents` | new (placeholder) |
|
||||
| `lbl.max.results` | `Max results:` | new |
|
||||
| `btn.run.search` | `Run Search` | new |
|
||||
| `btn.open.search` | `Open Results` | new |
|
||||
| `srch.col.name` | `File Name` | new |
|
||||
| `srch.col.ext` | `Extension` | new |
|
||||
| `srch.col.created` | `Created` | new |
|
||||
| `srch.col.modified` | `Modified` | new |
|
||||
| `srch.col.author` | `Created By` | new |
|
||||
| `srch.col.modby` | `Modified By` | new |
|
||||
| `srch.col.size` | `Size` | new |
|
||||
|
||||
### Duplicates Tab
|
||||
|
||||
| Key | EN Value | Notes |
|
||||
|-----|----------|-------|
|
||||
| `tab.duplicates` | `Duplicates` | (existing — already in Strings.resx line 83) |
|
||||
| `grp.dup.type` | `Duplicate Type` | new |
|
||||
| `rad.dup.files` | `Duplicate files` | new |
|
||||
| `rad.dup.folders` | `Duplicate folders` | new |
|
||||
| `grp.dup.criteria` | `Comparison Criteria` | new |
|
||||
| `lbl.dup.note` | `Name is always the primary criterion. Check additional criteria:` | new |
|
||||
| `chk.dup.size` | `Same size` | new |
|
||||
| `chk.dup.created` | `Same creation date` | new |
|
||||
| `chk.dup.modified` | `Same modification date` | new |
|
||||
| `chk.dup.subfolders` | `Same subfolder count` | new |
|
||||
| `chk.dup.filecount` | `Same file count` | new |
|
||||
| `chk.include.subsites` | `Include subsites` | new |
|
||||
| `ph.dup.lib` | `All (leave empty)` | new (placeholder) |
|
||||
| `btn.run.scan` | `Run Scan` | new |
|
||||
| `btn.open.results` | `Open Results` | new |
|
||||
|
||||
---
|
||||
|
||||
## Duplicate Detection Scale — Known Concern Resolution
|
||||
|
||||
The STATE.md concern ("Duplicate detection at scale (100k+ files) — Graph API hash enumeration limits") is resolved: the PS reference does NOT use file hashes. It uses name+size+date grouping, which is exactly what DUPL-01/02/03 specify. The requirements do not mention hash-based deduplication.
|
||||
|
||||
**Scale analysis:**
|
||||
- File duplicates use the Search API, which caps paging at StartRow=50,000. A site with 100k+ files therefore returns only roughly its first 50,000 results. This is the same cap as SRCH-02 and is a known, accepted limitation.
|
||||
- Folder duplicates use CAML pagination. `SharePointPaginationHelper.GetAllItemsAsync` handles arbitrary folder counts with RowLimit=2000 pagination — no effective upper bound.
|
||||
- Client-side GroupBy on 50,000 items is a single hash-based O(n) pass and is negligible next to the network time of the search itself.
|
||||
- **No Graph API or SHA256 content hashing is needed.** The concern was about a potential v2 enhancement not required by DUPL-01/02/03.
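
The name+size+date grouping described above can be sketched as follows. This is an illustration under assumptions, not the reference implementation: `FileEntry`, the `|` separator, case-insensitive name matching, and comparing dates at day precision are all choices made here for the example; only the `MakeKey` name comes from the test map.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical shape for one search hit (assumption).
public sealed record FileEntry(string Name, long SizeBytes, DateTime? Created, DateTime? Modified);

public static class DuplicateGrouping
{
    // Name is always part of the key; the other criteria are optional.
    public static string MakeKey(FileEntry f, bool matchSize, bool matchCreated, bool matchModified)
    {
        var parts = new List<string> { f.Name.ToLowerInvariant() };
        if (matchSize)
            parts.Add(f.SizeBytes.ToString(CultureInfo.InvariantCulture));
        if (matchCreated)
            parts.Add(f.Created?.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) ?? string.Empty);
        if (matchModified)
            parts.Add(f.Modified?.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) ?? string.Empty);
        return string.Join("|", parts);
    }

    // One hash-based pass; only keys shared by two or more items are kept.
    public static List<List<FileEntry>> FindGroups(
        IEnumerable<FileEntry> files, bool matchSize, bool matchCreated, bool matchModified) =>
        files.GroupBy(f => MakeKey(f, matchSize, matchCreated, matchModified))
             .Where(g => g.Count() > 1)
             .Select(g => g.ToList())
             .ToList();
}
```

Because the key is a plain string, the same function can serve both the grouping itself and the DUPL-01 unit test.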
|
||||
|
||||
---
|
||||
|
||||
## State of the Art
|
||||
|
||||
| Old Approach | Current Approach | When Changed | Impact |
|
||||
|--------------|------------------|--------------|--------|
|
||||
| `Get-PnPFolderStorageMetric` (PS cmdlet) | CSOM `Folder.StorageMetrics` | Phase 3 migration | One CSOM round-trip per folder; no PnP PS module required |
|
||||
| `Submit-PnPSearchQuery` (PS cmdlet) | CSOM `KeywordQuery` + `SearchExecutor` | Phase 3 migration | Same pagination model; TrimDuplicates=false explicit |
|
||||
| `Get-PnPListItem` for folders (PS) | `SharePointPaginationHelper.GetAllItemsAsync` with CAML | Phase 3 migration | Reuses Phase 1 helper; handles 5000-item threshold |
|
||||
| Storage TreeView control | Flat DataGrid with IndentLevel + IValueConverter | Phase 3 design decision | Better UI virtualization for large sites |
|
||||
|
||||
---
|
||||
|
||||
## Validation Architecture
|
||||
|
||||
### Test Framework
|
||||
|
||||
| Property | Value |
|
||||
|----------|-------|
|
||||
| Framework | xUnit 2.9.3 |
|
||||
| Config file | none (SDK auto-discovery) |
|
||||
| Quick run command | `dotnet test SharepointToolbox.Tests/SharepointToolbox.Tests.csproj --filter "Category!=Integration"` |
|
||||
| Full suite command | `dotnet test SharepointToolbox.slnx` |
|
||||
|
||||
### Phase Requirements → Test Map
|
||||
|
||||
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|
||||
|--------|----------|-----------|-------------------|-------------|
|
||||
| STOR-01/02 | `StorageService.CollectStorageAsync` returns `StorageNode` list | unit (mock ISessionManager) | `dotnet test --filter "StorageServiceTests"` | ❌ Wave 0 |
|
||||
| STOR-03 | VersionSizeBytes = TotalSizeBytes - FileStreamSizeBytes | unit | `dotnet test --filter "StorageNodeTests"` | ❌ Wave 0 |
|
||||
| STOR-04 | `StorageCsvExportService.BuildCsv` produces correct header and rows | unit | `dotnet test --filter "StorageCsvExportServiceTests"` | ❌ Wave 0 |
|
||||
| STOR-05 | `StorageHtmlExportService.BuildHtml` contains toggle JS and nested tables | unit | `dotnet test --filter "StorageHtmlExportServiceTests"` | ❌ Wave 0 |
|
||||
| SRCH-01 | `SearchService` builds correct KQL from `SearchOptions` | unit | `dotnet test --filter "SearchServiceTests"` | ❌ Wave 0 |
|
||||
| SRCH-02 | Search loop exits when `startRow > 50_000` | unit | `dotnet test --filter "SearchServiceTests"` | ❌ Wave 0 |
|
||||
| SRCH-03 | `SearchCsvExportService.BuildCsv` produces correct header | unit | `dotnet test --filter "SearchCsvExportServiceTests"` | ❌ Wave 0 |
|
||||
| SRCH-04 | `SearchHtmlExportService.BuildHtml` contains sort JS and filter input | unit | `dotnet test --filter "SearchHtmlExportServiceTests"` | ❌ Wave 0 |
|
||||
| DUPL-01 | `MakeKey` function groups identical name+size+date items | unit | `dotnet test --filter "DuplicatesServiceTests"` | ❌ Wave 0 |
|
||||
| DUPL-02 | CAML query targets `FSObjType=1`; `FileCount = ItemChildCount - FolderChildCount` | unit (logic only) | `dotnet test --filter "DuplicatesServiceTests"` | ❌ Wave 0 |
|
||||
| DUPL-03 | `DuplicatesHtmlExportService.BuildHtml` contains group cards with ok/diff badges | unit | `dotnet test --filter "DuplicatesHtmlExportServiceTests"` | ❌ Wave 0 |
|
||||
|
||||
**Note:** `StorageService`, `SearchService`, and `DuplicatesService` depend on live CSOM, so their service-level tests are stubbed and marked `Skip`, as in `PermissionsServiceTests`. ViewModel tests mock `IStorageService`, `ISearchService`, and `IDuplicatesService` with Moq, following the `PermissionsViewModelTests` pattern. Export service tests need no CSOM and are fully unit-testable.
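
The SRCH-02 cap can be exercised without live CSOM if the paging loop is kept separate from the query call. A minimal sketch, assuming a hypothetical `fetchPage(startRow, rowLimit)` delegate that wraps the actual `SearchExecutor` round-trip; `SearchPaging` and `FetchAllAsync` are illustrative names, and the 500-row page size follows the RowLimit boundary cited in Sources:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class SearchPaging
{
    private const int MaxStartRow = 50_000; // SharePoint Search StartRow boundary

    // fetchPage is a hypothetical seam (assumption): in production it would run
    // one KeywordQuery with the given StartRow and RowLimit.
    public static async Task<List<T>> FetchAllAsync<T>(
        Func<int, int, Task<IReadOnlyList<T>>> fetchPage,
        int maxResults,
        int pageSize = 500,
        CancellationToken ct = default)
    {
        var all = new List<T>();
        var startRow = 0;

        // Exit when startRow > 50_000 or when enough rows have been collected.
        while (startRow <= MaxStartRow && all.Count < maxResults)
        {
            ct.ThrowIfCancellationRequested();
            var page = await fetchPage(startRow, pageSize);
            if (page.Count == 0) break; // no more results
            all.AddRange(page);
            startRow += pageSize;
        }

        // Trim any overshoot from the last page.
        if (all.Count > maxResults)
            all.RemoveRange(maxResults, all.Count - maxResults);
        return all;
    }
}
```

A test can then pass an in-memory delegate and count how many pages were requested.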
|
||||
|
||||
### Sampling Rate
|
||||
|
||||
- **Per task commit:** `dotnet test SharepointToolbox.Tests/SharepointToolbox.Tests.csproj`
|
||||
- **Per wave merge:** `dotnet test SharepointToolbox.slnx`
|
||||
- **Phase gate:** Full suite green before `/gsd:verify-work`
|
||||
|
||||
### Wave 0 Gaps
|
||||
|
||||
- [ ] `SharepointToolbox.Tests/Services/StorageServiceTests.cs` — covers STOR-01/02 (stub + Skip like PermissionsServiceTests)
|
||||
- [ ] `SharepointToolbox.Tests/Services/Export/StorageCsvExportServiceTests.cs` — covers STOR-04
|
||||
- [ ] `SharepointToolbox.Tests/Services/Export/StorageHtmlExportServiceTests.cs` — covers STOR-05
|
||||
- [ ] `SharepointToolbox.Tests/Services/SearchServiceTests.cs` — covers SRCH-01/02 (KQL build + pagination cap logic)
|
||||
- [ ] `SharepointToolbox.Tests/Services/Export/SearchCsvExportServiceTests.cs` — covers SRCH-03
|
||||
- [ ] `SharepointToolbox.Tests/Services/Export/SearchHtmlExportServiceTests.cs` — covers SRCH-04
|
||||
- [ ] `SharepointToolbox.Tests/Services/DuplicatesServiceTests.cs` — covers DUPL-01/02 composite key logic
|
||||
- [ ] `SharepointToolbox.Tests/Services/Export/DuplicatesHtmlExportServiceTests.cs` — covers DUPL-03
|
||||
- [ ] `SharepointToolbox.Tests/ViewModels/StorageViewModelTests.cs` — covers STOR-01 ViewModel (Moq IStorageService)
|
||||
- [ ] `SharepointToolbox.Tests/ViewModels/SearchViewModelTests.cs` — covers SRCH-01/02 ViewModel
|
||||
- [ ] `SharepointToolbox.Tests/ViewModels/DuplicatesViewModelTests.cs` — covers DUPL-01/02 ViewModel
|
||||
|
||||
---
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **StorageMetrics.LastModified vs TimeLastModified**
|
||||
- What we know: `StorageMetrics.LastModified` exists per the API docs. `Folder.TimeLastModified` is a separate CSOM property.
|
||||
- What's unclear: Whether `StorageMetrics.LastModified` can return `DateTime.MinValue` for recently created empty folders in all SharePoint Online tenants.
|
||||
- Recommendation: Load both (`f => f.StorageMetrics, f => f.TimeLastModified`) and prefer `StorageMetrics.LastModified` when it is `> DateTime.MinValue`, falling back to `TimeLastModified`.
|
||||
|
||||
2. **Search index freshness for duplicate detection**
|
||||
- What we know: SharePoint Search is eventually consistent — newly created files may not appear for up to 15 minutes.
|
||||
- What's unclear: Whether users expect real-time accuracy or accept eventual consistency.
|
||||
- Recommendation: Document in UI that search-based results (files) reflect the search index, not the current state. Add a note in the log output.
|
||||
|
||||
3. **Multiple-site file search scope**
|
||||
- What we know: The PS reference scopes search to `$siteUrl` context only (one site per search). SRCH-01 says "across sites" in the goal description but the requirements only specify search criteria, not multi-site.
|
||||
- What's unclear: Whether SRCH-01 requires multi-site search in one operation or per-site.
|
||||
- Recommendation: Implement per-site search (matching PS reference). Multi-site search would require separate `ClientContext` per site plus result merging — treat as a future enhancement.
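
The fallback recommended under question 1 can be sketched as a pure function, which keeps it testable without CSOM. `StorageDates` and `ResolveLastModified` are hypothetical names, and returning `null` when neither value is usable is an assumption made for this example:

```csharp
using System;

public static class StorageDates
{
    // Prefer StorageMetrics.LastModified; fall back to Folder.TimeLastModified
    // when the metrics value is DateTime.MinValue.
    public static DateTime? ResolveLastModified(DateTime metricsLastModified, DateTime folderTimeLastModified)
    {
        if (metricsLastModified > DateTime.MinValue) return metricsLastModified;
        if (folderTimeLastModified > DateTime.MinValue) return folderTimeLastModified;
        return null; // neither source has a usable date (assumption: show as blank)
    }
}
```

The caller would load both properties in the same round-trip and pass them in, so the extra property costs no additional request.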
|
||||
|
||||
---
|
||||
|
||||
## Sources
|
||||
|
||||
### Primary (HIGH confidence)
|
||||
|
||||
- [StorageMetrics Class — MS Learn CSOM reference](https://learn.microsoft.com/en-us/dotnet/api/microsoft.sharepoint.client.storagemetrics?view=sharepoint-csom) — properties TotalSize, TotalFileStreamSize, TotalFileCount, LastModified confirmed
|
||||
- [StorageMetrics.TotalSize — MS Learn](https://learn.microsoft.com/en-us/dotnet/api/microsoft.sharepoint.client.storagemetrics.totalsize?view=sharepoint-csom) — confirmed as Int64, ReadOnly
|
||||
- [[MS-CSOMSPT] TotalFileStreamSize](https://learn.microsoft.com/en-us/openspecs/sharepoint_protocols/ms-csomspt/635464fc-8505-43fa-97d7-02229acdb3c5) — confirmed definition: "Aggregate stream size in bytes for all files... Excludes version, metadata, list item attachment, and non-customized document sizes"
|
||||
- [SearchExecutor Class — MS Learn CSOM reference](https://learn.microsoft.com/en-us/dotnet/api/microsoft.sharepoint.client.search.query.searchexecutor?view=sharepoint-csom) — namespace `Microsoft.SharePoint.Client.Search.Query`, assembly `Microsoft.SharePoint.Client.Search.Portable.dll`
|
||||
- [Search limits for SharePoint — MS Learn](https://learn.microsoft.com/en-us/sharepoint/search-limits) — StartRow max 50,000 (boundary), RowLimit max 500 (boundary) confirmed
|
||||
- [SharepointToolbox/bin/Debug output] — `Microsoft.SharePoint.Client.Search.dll` confirmed present as transitive dep
|
||||
|
||||
### Secondary (MEDIUM confidence)
|
||||
|
||||
- [Load storage metric from SPO — longnlp.github.io](https://longnlp.github.io/load-storage-metric-from-SPO) — CSOM Load pattern: `ctx.Load(folder, f => f.StorageMetrics)` verified
|
||||
- [Fetch all results from SharePoint Search using CSOM — usefulscripts.wordpress.com](https://usefulscripts.wordpress.com/2015/09/11/how-to-fetch-all-results-from-sharepoint-search-using-dot-net-managed-csom/) — KeywordQuery + SearchExecutor pagination pattern with StartRow; confirmed against official docs
|
||||
- PowerShell reference `Sharepoint_ToolBox.ps1` lines 1621-1780 (Export-StorageToHTML), 2112-2233 (Export-SearchResultsToHTML), 2235-2406 (Export-DuplicatesToHTML), 4432-4534 (storage scan), 4747-4808 (file search), 4937-5059 (duplicate scan) — authoritative reference implementation
|
||||
|
||||
### Tertiary (LOW confidence — implementation detail, verify when coding)
|
||||
|
||||
- [SharePoint CSOM Q&A — Getting size of subsite](https://learn.microsoft.com/en-us/answers/questions/1518977/getting-size-of-a-subsite-using-csom) — general pattern confirmed; specific edge cases not verified
|
||||
- [Pagination for large result sets — MS Learn](https://learn.microsoft.com/en-us/sharepoint/dev/general-development/pagination-for-large-result-sets) — DocId-based pagination beyond 50k exists but is not needed for Phase 3
|
||||
|
||||
---
|
||||
|
||||
## Metadata
|
||||
|
||||
**Confidence breakdown:**
|
||||
- Standard Stack: HIGH — no new packages needed; Search.dll confirmed present; all APIs verified against MS docs
|
||||
- Architecture Patterns: HIGH — direct port of working PS reference; CSOM API shapes confirmed
|
||||
- Pitfalls: HIGH for StorageMetrics loading, search result typing, vti_history filter (all from PS reference or official docs); MEDIUM for KQL length limit (documented but not commonly hit)
|
||||
- Localization keys: HIGH — directly extracted from PS reference lines 2747-2813
|
||||
|
||||
**Research date:** 2026-04-02
|
||||
**Valid until:** 2026-07-01 (CSOM APIs stable; SharePoint search limits stable; re-verify if PnP.Framework upgrades past 1.18)
|
||||