Sharepoint-Toolbox/.planning/research/SUMMARY.md
Kawa 0c2e26e597 docs: complete project research for SharePoint Toolbox rewrite
Research covers stack (NET10/WPF/PnP.Framework), features (v1 parity + v1.x
differentiators), architecture (MVVM four-layer pattern), and pitfalls
(10 critical pitfalls all addressed in foundation phase). SUMMARY.md
synthesizes findings with phase-structured roadmap implications.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 10:07:47 +02:00


Project Research Summary

Project: SharePoint Toolbox (C#/WPF SharePoint Online Administration Desktop Tool)
Domain: SharePoint Online administration, auditing, and provisioning (MSP / IT admin)
Researched: 2026-04-02
Confidence: HIGH

Executive Summary

This project is a full rewrite of a PowerShell-based SharePoint Online administration toolbox into a standalone C#/WPF desktop application targeting MSP administrators who manage 10-30 client tenants simultaneously. The research confirms that the correct technical path is .NET 10 LTS with WPF, PnP.Framework (not PnP.Core SDK) as the SharePoint library, and CommunityToolkit.Mvvm for the MVVM layer. The key architectural constraint is that multi-tenant session caching (holding MSAL token caches per tenant with MsalCacheHelper) must be the very first infrastructure component built, because every feature depends on it. The recommended architecture is a strict four-layer MVVM pattern (View → ViewModel → Service → Infrastructure) with no WPF types below the ViewModel layer, constructor-injected interfaces throughout, and AsyncRelayCommand for every SharePoint operation.

The feature scope is well-defined: parity with the existing PowerShell tool is the v1 MVP (permissions reports, storage metrics, file search, bulk operations, site templates, duplicate detection, error reporting, EN/FR localization). Three new features are justified for a v1.x release once core parity is validated — user access export across sites, simplified plain-language permissions view, and storage charts by file type. These represent genuine competitive differentiation against SaaS tools like ShareGate and ManageEngine, which are cloud-based, subscription-priced, and do not offer local offline operation or MSP-grade multi-tenant context switching.

The most dangerous risk is not technical complexity but porting discipline: the existing codebase has 38 silent catch blocks and no async discipline. The single highest-priority constraint for the entire project is that async patterns (AsyncRelayCommand, IProgress<T>, CancellationToken, ExecuteQueryRetryAsync) must be established in the foundation phase and enforced through code review before any feature work begins. Retrofitting these patterns after the fact is among the most expensive refactors possible in a WPF codebase. Similarly, the write-then-replace JSON persistence pattern and SharePoint pagination helpers must be built once in the foundation and reused everywhere; building them per feature guarantees divergence and bugs.

Key Findings

The stack is fully resolved with high confidence. All package versions are confirmed on NuGet as of 2026-04-02. The runtime is .NET 10 LTS (EOL November 2028); .NET 8 was explicitly rejected because it reaches EOL in November 2026, which is too soon for a new project. PnP.Framework 1.18.0 is the correct SharePoint library choice because this is a CSOM-heavy migration from PnP.PowerShell patterns and the PnP Provisioning Engine (required for site templates) lives only in PnP.Framework, not in PnP.Core SDK. Do not use PublishTrimmed=true: PnP.Framework and MSAL use reflection and are not trim-safe. The self-contained EXE will be approximately 150-200 MB, which is acceptable per project constraints.

Core technologies:

  • .NET 10 LTS + WPF: Windows-only per constraint; richer MVVM binding than WinForms (the existing framework)
  • PnP.Framework 1.18.0: CSOM operations, PnP Provisioning Engine, site templates — the direct C# equivalent of PnP.PowerShell
  • Microsoft.Graph 5.103.0: Teams, groups, user enumeration across tenants — Graph-native operations only
  • MSAL.NET 4.83.1 + Extensions.Msal 4.83.3 + Desktop 4.82.1: Multi-tenant token cache per tenant, Windows broker (WAM) support
  • CommunityToolkit.Mvvm 8.4.2: Source-generated [ObservableProperty], [RelayCommand], AsyncRelayCommand — eliminates MVVM boilerplate
  • Microsoft.Extensions.Hosting 10.x: DI container (IServiceCollection), app lifetime, IConfiguration
  • Serilog 4.3.1 + file sink: Structured logging to rolling files in %AppData%\SharepointToolbox\logs\ — essential for diagnosing the silent failures in the existing app
  • ScottPlot.WPF 5.1.57: Pie and bar charts for storage metrics — stable MIT-licensed library (LiveCharts2 WPF is still RC as of April 2026)
  • System.Text.Json (built-in): JSON profiles, settings, templates — no Newtonsoft.Json dependency
  • CsvHelper: CSV export — replaces manual string concatenation
  • .resx localization: EN/FR compile-time-safe resource files

Expected Features

The feature scope is well-researched. Competitive analysis against ShareGate, ManageEngine SharePoint Manager Plus, and AdminDroid confirms that local offline operation, instant multi-tenant switching, plain-language permissions, and folder structure provisioning are genuine differentiators that no competitor SaaS tool offers.

Must have (table stakes — v1 parity):

  • Tenant profile registry + multi-tenant session caching — everything gates on this
  • Permissions report (site-level) with CSV + HTML export
  • Storage metrics per site
  • File search across sites
  • Bulk operations (member add, site creation, transfer) with progress and cancellation
  • Site template management + folder structure provisioning
  • Duplicate file detection
  • Error reporting (replace 38 silent catch blocks with visible failures)
  • Localization (EN/FR) — existing users depend on this

Should have (competitive differentiators — v1.x):

  • User access export across selected sites — "everything User X can access across 15 sites" — no native M365 equivalent
  • Simplified permissions view (plain language) — "can edit files" instead of "Contribute"
  • Storage graph by file type (pie + bar toggle) — file-type breakdown competitors don't provide

Defer (v2+):

  • Scheduled scan runs via Windows Task Scheduler (requires stable CLI/headless mode first)
  • Permission comparison/diff between two time points (requires snapshot storage)
  • XLSX export (CSV opens in Excel adequately for v1)

Anti-features to reject outright: real-time permission change alerts (requires persistent Azure service), automated remediation (liability risk), cloud sync, AI governance recommendations (Microsoft's own roadmap).

Architecture Approach

The recommended architecture is a strict four-layer MVVM pattern hosted in Microsoft.Extensions.Hosting. The application is organized as: Views (XAML only, zero code-behind) → ViewModels (CommunityToolkit.Mvvm, one per feature tab) → Services (domain logic, stateless, constructor-injected via interfaces) → Infrastructure (PnP.Framework, Microsoft.Graph, local JSON files). Cross-ViewModel communication uses WeakReferenceMessenger (e.g., tenant-switched event resets all feature VM state). A singleton SessionManager is the only class that holds ClientContext objects — services request a context per operation and never store it. The Core/ folder contains pure C# models and interfaces with no WPF references, making all services independently testable.
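
The layering described above can be sketched in plain C#. This is a minimal illustration under stated assumptions, not the project's actual code: `SiteStorage`, `IStorageService`, `FakeStorageService`, and `StorageViewModel` are hypothetical names, the service returns fixed data instead of calling PnP.Framework, and the ViewModel omits the CommunityToolkit.Mvvm attributes a real one would use.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Core layer: plain models and interfaces, no WPF references.
public sealed record SiteStorage(string SiteUrl, long BytesUsed);

public interface IStorageService
{
    Task<IReadOnlyList<SiteStorage>> GetStorageAsync(
        IReadOnlyList<string> siteUrls, CancellationToken ct);
}

// Service layer: stateless domain logic behind the interface.
// Hypothetical stand-in that fabricates sizes so the sketch runs anywhere.
public sealed class FakeStorageService : IStorageService
{
    public Task<IReadOnlyList<SiteStorage>> GetStorageAsync(
        IReadOnlyList<string> siteUrls, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        var result = new List<SiteStorage>();
        foreach (var url in siteUrls)
            result.Add(new SiteStorage(url, BytesUsed: 1024L * url.Length));
        return Task.FromResult<IReadOnlyList<SiteStorage>>(result);
    }
}

// ViewModel layer: receives the service through its constructor and
// depends only on the interface, so it can be unit-tested with a fake.
public sealed class StorageViewModel
{
    private readonly IStorageService _storage;

    public StorageViewModel(IStorageService storage) => _storage = storage;

    public IReadOnlyList<SiteStorage> Results { get; private set; } =
        Array.Empty<SiteStorage>();

    public async Task LoadAsync(IReadOnlyList<string> siteUrls, CancellationToken ct)
        => Results = await _storage.GetStorageAsync(siteUrls, ct);
}
```

Because the ViewModel sees only `IStorageService`, a test can construct it with the fake and assert on `Results` without starting WPF or touching a tenant.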

Major components:

  1. AuthService / SessionManager — multi-tenant MSAL token cache, TenantSession per tenant, active profile state; singleton; every feature gates on this
  2. Feature Services (6) — PermissionsService, StorageService, SearchService, TemplateService, DuplicateService, BulkOpsService — stateless, cancellable, progress-reporting; registered as transient
  3. ReportExportService + CsvExportService — self-contained HTML reports (embedded JS/CSS) and CSV generation; called after operation completes
  4. SettingsService — JSON profiles, templates, settings with write-then-replace pattern and SemaphoreSlim concurrency guard; singleton
  5. MainWindowViewModel — shell navigation, tenant selector, log panel; delegates all feature logic to feature ViewModels via DI
  6. Feature ViewModels (7) — one per tab (Permissions, Storage, Search, Templates, Duplicates, BulkOps, Settings); own CancellationTokenSource and ObservableCollection<T> per operation
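
The per-tenant caching role of SessionManager can be illustrated with a small generic cache. This is a sketch under assumptions: `SessionCache<TSession>` and its `connect` delegate are hypothetical, and the real component would wrap MSAL and PnP.Framework types rather than a generic parameter.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class SessionCache<TSession>
{
    // One lazily created session task per tenant key (case-insensitive).
    private readonly ConcurrentDictionary<string, Lazy<Task<TSession>>> _sessions =
        new(StringComparer.OrdinalIgnoreCase);

    // Hypothetical factory that authenticates and builds a session for a tenant.
    private readonly Func<string, Task<TSession>> _connect;

    public SessionCache(Func<string, Task<TSession>> connect) => _connect = connect;

    // Returns the cached session for a tenant; the Lazy wrapper ensures the
    // factory runs at most once per key even when callers race.
    public Task<TSession> GetAsync(string tenantKey) =>
        _sessions.GetOrAdd(
            tenantKey,
            key => new Lazy<Task<TSession>>(() => _connect(key))).Value;

    // Drops a cached session, e.g. behind a "Clear cached sessions" action.
    public bool Clear(string tenantKey) => _sessions.TryRemove(tenantKey, out _);
}
```

One design caveat of this sketch: a failed connection attempt stays cached as a faulted task until `Clear` is called, so a production version would probably evict faulted entries.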

Critical Pitfalls

10 pitfalls were identified. All 10 are addressed in Phase 1 (Foundation) — none can be deferred to feature phases.

  1. Sync calls on the UI thread — Never use .Result/.Wait() on the UI thread; every PnP call must use await with the async overload or Task.Run; use AsyncRelayCommand for all commands. Establish this pattern before any feature work begins or retrofitting costs will be severe.

  2. Porting silent error suppression — The existing app has 38 empty catch blocks. Every catch in the C# rewrite must do one of three things: log-and-recover, log-and-rethrow, or log-and-surface to the user. Treat empty catch as a build defect from day one.

  3. SharePoint 5,000-item list view threshold — All CSOM list enumeration must use CamlQuery with RowLimit ≤ 2,000 and ListItemCollectionPosition pagination. Build a shared pagination helper in Phase 1 and mandate its use in every feature that enumerates list items.

  4. Multi-tenant token cache race conditions — Use MsalCacheHelper (Microsoft.Identity.Client.Extensions.Msal) for file-based per-tenant token cache serialization. Scope IPublicClientApplication per ClientId, not per tenant URL. Provide a "Clear cached sessions" UI action.

  5. JSON settings file corruption on concurrent writes — Use write-then-replace (filename.tmp → validate → File.Move) plus SemaphoreSlim(1) per file. Implement before any feature persists data. Known bug in the existing app per CONCERNS.md.

  6. WPF ObservableCollection updates from background threads — Collect results into List<T> on background thread, then assign new ObservableCollection<T>(list) atomically via Dispatcher.InvokeAsync. Use IProgress<T> for streaming. Never modify ObservableCollection from Task.Run.

  7. async void command handlers — Use AsyncRelayCommand exclusively for async operations. async void swallows exceptions post-await. Wire Application.DispatcherUnhandledException and TaskScheduler.UnobservedTaskException as last-resort handlers.

  8. API throttling (429/503) — Always use ExecuteQueryRetryAsync (never ExecuteQuery). For Graph SDK, the default retry handler respects Retry-After automatically. Surface retry events to the user as progress messages.

  9. ClientContext resource disposal gaps — Always obtain ClientContext inside using or await using. Verify Dispose() is called on cancellation via unit tests.

  10. WPF trimming breaks self-contained EXE: Never set PublishTrimmed=true. Accept the ~150-200 MB EXE size. Use PublishReadyToRun=true for startup performance instead.
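
Pitfalls 1 and 6 both come down to the same shape of method: await the work, report progress through IProgress<T>, honor a CancellationToken, and hand back a complete list for the ViewModel to publish. The following is a minimal sketch of that shape; `ScanSketch` and `ScanSitesAsync` are hypothetical names, and `Task.Delay` stands in for an awaited SharePoint call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ScanSketch
{
    // Processes each site without blocking the caller, reports progress
    // after every site, and returns the full result list at the end.
    public static async Task<List<string>> ScanSitesAsync(
        IReadOnlyList<string> siteUrls,
        IProgress<string> progress,
        CancellationToken ct)
    {
        var results = new List<string>(siteUrls.Count);
        for (int i = 0; i < siteUrls.Count; i++)
        {
            // Stop promptly between sites when the user cancels.
            ct.ThrowIfCancellationRequested();

            // Stand-in for an awaited SharePoint call (assumption).
            await Task.Delay(10, ct).ConfigureAwait(false);

            results.Add($"scanned {siteUrls[i]}");
            progress.Report($"Scanning {i + 1} of {siteUrls.Count} sites");
        }
        return results;
    }
}
```

In a WPF ViewModel, a `Progress<string>` created on the UI thread posts its callbacks back to that thread's synchronization context, so the progress handler can update bound properties safely while the returned list is assigned to the collection in one step.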

Implications for Roadmap

Based on the combined research, the dependency graph from ARCHITECTURE.md and FEATURES.md, and the pitfall-to-phase mapping from PITFALLS.md, the following phase structure is strongly recommended:

Phase 1: Foundation and Infrastructure

Rationale: All 10 critical pitfalls must be resolved before feature work begins. The dependency graph in FEATURES.md shows that every feature requires the tenant profile registry and session caching layer. Establishing async patterns, error handling, DI container, logging, and JSON persistence now prevents the most expensive retrofits.
Delivers: Runnable WPF shell with tenant selector, multi-tenant session caching (MSAL + MsalCacheHelper), DI container wiring, Serilog logging, SettingsService with write-then-replace persistence, ResX localization scaffolding, shared pagination helper, shared AsyncRelayCommand pattern, global exception handlers.
Addresses: Tenant profile registry (prerequisite for all features), EN/FR localization scaffolding, error reporting infrastructure.
Avoids: All 10 pitfalls: async deadlocks, silent errors, list view threshold failures, token cache races, JSON corruption, ObservableCollection threading, async void, throttling, disposal gaps, trimming.
Research flag: Standard patterns. Microsoft.Extensions.Hosting + CommunityToolkit.Mvvm + MsalCacheHelper are well-documented. No additional research needed.
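
The write-then-replace persistence that Phase 1 delivers (Pitfall 5: temp file, validate, move, guarded by a SemaphoreSlim) might look roughly like the sketch below. `JsonFileStore<T>` is a hypothetical name, and the validation step shown here (re-reading and deserializing the temp file) is one possible interpretation of "validate".

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public sealed class JsonFileStore<T> where T : class
{
    private readonly string _path;

    // One writer or reader at a time for this file.
    private readonly SemaphoreSlim _gate = new(1, 1);

    public JsonFileStore(string path) => _path = path;

    public async Task SaveAsync(T value, CancellationToken ct = default)
    {
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            var tmp = _path + ".tmp";
            var json = JsonSerializer.Serialize(value);
            await File.WriteAllTextAsync(tmp, json, ct).ConfigureAwait(false);

            // Validate: the temp file must parse before it replaces the real file.
            var check = await File.ReadAllTextAsync(tmp, ct).ConfigureAwait(false);
            _ = JsonSerializer.Deserialize<T>(check);

            // Replace the target only after the new content is known to be good.
            File.Move(tmp, _path, overwrite: true);
        }
        finally
        {
            _gate.Release();
        }
    }

    public async Task<T> LoadAsync(T fallback, CancellationToken ct = default)
    {
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            if (!File.Exists(_path)) return fallback;
            var json = await File.ReadAllTextAsync(_path, ct).ConfigureAwait(false);
            return JsonSerializer.Deserialize<T>(json) ?? fallback;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

If serialization or validation throws, the original file is left untouched and only the `.tmp` file is affected, which is the property the pattern exists to provide.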

Phase 2: Permissions and Audit Core

Rationale: Permissions reporting is the highest-value daily-use feature and the canonical audit use case. Building it second validates that the auth layer and pagination helper work under real conditions before other features depend on them. It also forces the error reporting UX to be finalized early.
Delivers: Site-level permissions report with recursive scan (configurable depth), CSV export, self-contained HTML export, plain progress feedback ("Scanning X of Y sites"), error surface for failed scans (no silent failures).
Addresses: Permissions report (table stakes P1), CSV + HTML export (table stakes P1), error reporting (table stakes P1).
Avoids: 5,000-item threshold (pagination helper reuse), silent errors (error handling from Phase 1), sync/async deadlock (AsyncRelayCommand from Phase 1).
Research flag: Standard patterns. PnP Framework permission scanning is well-documented. PnP permissions API is HIGH confidence.
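
The shared pagination helper that this phase reuses can be sketched independently of CSOM. In the sketch below, `Paging.ReadAllAsync` and the `fetchPage` delegate are hypothetical; in the real helper the cursor would presumably be the `ListItemCollectionPosition` returned with each page, which is null once no pages remain.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class Paging
{
    // Streams every item from a paged source. fetchPage receives the cursor
    // of the previous page (null for the first page) and returns the items
    // plus the cursor for the next page (null when the source is exhausted).
    public static async IAsyncEnumerable<T> ReadAllAsync<T, TCursor>(
        Func<TCursor?, CancellationToken, Task<(IReadOnlyList<T> Items, TCursor? Next)>> fetchPage,
        [EnumeratorCancellation] CancellationToken ct = default)
        where TCursor : class
    {
        TCursor? cursor = null;
        do
        {
            ct.ThrowIfCancellationRequested();
            var (items, next) = await fetchPage(cursor, ct).ConfigureAwait(false);
            foreach (var item in items)
                yield return item;
            cursor = next;
        } while (cursor != null);
    }
}
```

Callers consume the result with `await foreach`, so a feature never holds more than one page in memory unless it chooses to collect the items itself.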

Phase 3: Storage Metrics and File Operations

Rationale: Storage metrics and file search are the other two daily-use features in the existing tool. They reuse the auth session and export infrastructure from Phases 1-2. Duplicate detection depends on the file enumeration infrastructure built for file search, so these belong together.
Delivers: Storage metrics per site (total + breakdown), file search across sites (KQL-based), duplicate file detection (hash or name+size matching), storage data export (CSV + HTML).
Addresses: Storage metrics (P1), file search (P1), duplicate detection (P1).
Avoids: Large collection streaming (IProgress pattern from Phase 1), Graph SDK pagination (PageIterator), API throttling (retry handler from Phase 1).
Research flag: Duplicate detection against large tenants under Graph throttling may need tactical research during planning; hash-based detection at scale has specific pagination constraints.
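
Of the two matching strategies named above, name+size matching is the simpler one and can be sketched with LINQ. `FileEntry` and `DuplicateFinder` are hypothetical names, and the case-insensitive name comparison is an assumption; hash-based matching would replace the grouping key with a content hash.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record FileEntry(string Url, string Name, long SizeBytes);

public static class DuplicateFinder
{
    // Groups files that share a name (case-insensitive) and a size,
    // and keeps only the groups with more than one member.
    public static IReadOnlyList<IReadOnlyList<FileEntry>> FindByNameAndSize(
        IEnumerable<FileEntry> files)
    {
        return files
            .GroupBy(f => (Name: f.Name.ToUpperInvariant(), f.SizeBytes))
            .Where(g => g.Count() > 1)
            .Select(g => (IReadOnlyList<FileEntry>)g.ToList())
            .ToList();
    }
}
```

Name+size matching is a heuristic: it can flag different files that happen to share both values, which is why the source lists hash comparison as the alternative.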

Phase 4: Bulk Operations and Provisioning

Rationale: Bulk operations (member add, site creation, transfer) and site/folder template management are the remaining P1 features. They are the highest-complexity features (HIGH implementation cost in FEATURES.md) and benefit from stable async/cancel/progress infrastructure from Phase 1. Folder provisioning depends on site template management, so the two should be built together.
Delivers: Bulk member add/remove, bulk site creation, ownership transfer, site template capture and apply, folder structure provisioning from template.
Addresses: Bulk operations with progress/cancel (P1), site template management (P1), folder structure provisioning (P1).
Avoids: Operation cancellation (CancellationToken threading from Phase 1), partial-failure reporting (error surface from Phase 2), API throttling (retry handler from Phase 1).
Research flag: PnP Provisioning Engine for site templates may need specific research during planning. Template schema and apply behavior are documented, but edge cases (Teams-connected sites, modern vs. classic) need validation.
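
Partial-failure reporting for a bulk operation can be sketched as a loop that records an outcome per item instead of stopping at the first error. `BulkItemResult`, `BulkRunner`, and the `action` delegate are hypothetical; the delegate stands in for whatever per-item SharePoint call a feature performs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed record BulkItemResult(string Item, bool Succeeded, string? Error);

public static class BulkRunner
{
    // Runs the action for each item in order, recording success or failure
    // per item and reporting the number of items processed so far.
    public static async Task<IReadOnlyList<BulkItemResult>> RunAsync(
        IReadOnlyList<string> items,
        Func<string, CancellationToken, Task> action,
        IProgress<int> progress,
        CancellationToken ct)
    {
        var results = new List<BulkItemResult>(items.Count);
        for (int i = 0; i < items.Count; i++)
        {
            ct.ThrowIfCancellationRequested();
            try
            {
                await action(items[i], ct).ConfigureAwait(false);
                results.Add(new BulkItemResult(items[i], true, null));
            }
            catch (OperationCanceledException)
            {
                throw; // cancellation ends the whole run
            }
            catch (Exception ex)
            {
                // Record the failure and continue with the next item.
                results.Add(new BulkItemResult(items[i], false, ex.Message));
            }
            progress.Report(i + 1);
        }
        return results;
    }
}
```

The returned list gives the UI everything it needs to show which items failed and why, which is the opposite of the empty catch blocks the rewrite is meant to eliminate.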

Phase 5: New Differentiating Features (v1.x)

Rationale: These three features are new capabilities (not existing-tool parity) that depend on stable v1 infrastructure. User access export across sites requires multi-site permissions scan from Phase 2. Storage charts require storage metrics from Phase 3. Plain-language permissions view is a presentation layer on top of the permissions data model from Phase 2. Grouping them as v1.x avoids blocking the v1 release on new development.
Delivers: User access export across arbitrary site subsets (cross-site access report for a single user), simplified plain-language permissions view (jargon-free labels, color coding), storage graph by file type (pie/bar toggle via ScottPlot.WPF).
Addresses: User access export (P2), simplified permissions view (P2), storage graph by file type (P2).
Uses: ScottPlot.WPF 5.1.57, existing PermissionsService and StorageService from Phases 2-3.
Research flag: User access export across sites involves enumerating group memberships, direct assignments, and inherited access across N sites; the Graph API volume and correct enumeration approach may need targeted research.
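
Because the plain-language view is a presentation layer over existing permissions data, its core can be a lookup from permission level name to a friendly label. In this sketch only the "Contribute" to "can edit files" mapping comes from the research; `PlainLanguagePermissions` and the other labels are hypothetical placeholders, and real labels would come from the EN/FR resource files.

```csharp
using System;
using System.Collections.Generic;

public static class PlainLanguagePermissions
{
    // Permission level name -> plain-language label.
    // All wording except "Contribute" is an illustrative assumption.
    private static readonly Dictionary<string, string> Labels =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["Full Control"] = "can manage everything",
            ["Edit"] = "can edit files and lists",
            ["Contribute"] = "can edit files",
            ["Read"] = "can view files",
        };

    // Falls back to the original level name for custom or unknown levels,
    // so nothing is hidden from the report.
    public static string Describe(string permissionLevel) =>
        Labels.TryGetValue(permissionLevel, out var label) ? label : permissionLevel;
}
```

Falling back to the raw level name keeps custom permission levels visible rather than mapping them to a misleading label.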

Phase 6: Distribution and Hardening

Rationale: Packaging, end-to-end validation on clean machines, FR locale completeness check, and the "looks done but isn't" checklist from PITFALLS.md. Must be done before any release, not as an afterthought.
Delivers: Single self-contained EXE (PublishSingleFile=true, SelfContained=true, PublishTrimmed=false, win-x64), validated on a machine with no .NET runtime, FR locale fully tested, throttling recovery verified, JSON corruption recovery verified, cancellation verified, 5,000+ item library tested.
Avoids: WPF trimming crash (Pitfall 10), "works on dev machine" surprises.
Research flag: Standard patterns. dotnet publish single-file configuration is well-documented.

Phase Ordering Rationale

  • Foundation first is mandatory: all 10 pitfalls map to Phase 1. The auth layer and async patterns are prerequisites for every subsequent phase. Starting features before the foundation is solid replicates the original app's architectural problems.
  • Permissions before storage/search because permissions validates the pagination helper, auth layer, and export pipeline under real conditions with the most complex data model.
  • Bulk ops and provisioning after core read operations because they have higher risk (they write to client tenants) and should be tested against a validated auth layer and error surface.
  • New v1.x features after v1 parity to avoid blocking the release on non-parity features. The three P2 features are all presentation or cross-cutting enhancements on top of stable Phase 2-3 data models.
  • Distribution last because EXE packaging must be validated against the complete feature set.

Research Flags

Phases likely needing /gsd:research-phase during planning:

  • Phase 3 (Duplicate detection): Hash-based detection under Graph throttling constraints at large scale — specific pagination strategy and concurrency limits for file enumeration need validation.
  • Phase 4 (Site templates): PnP Provisioning Engine behavior for Teams-connected sites, modern site template schema edge cases, and apply-template behavior on non-empty sites need verification.
  • Phase 5 (User access export): Graph API approach for enumerating all permissions for a single user across N sites (group memberships + direct assignments + inherited) — the correct API sequence and volume implications need targeted research.

Phases with standard patterns (skip research-phase):

  • Phase 1 (Foundation): Microsoft.Extensions.Hosting + CommunityToolkit.Mvvm + MsalCacheHelper patterns are extensively documented in official Microsoft sources.
  • Phase 2 (Permissions): PnP Framework permission scanning APIs are HIGH confidence from official PnP documentation.
  • Phase 6 (Distribution): dotnet publish single-file configuration is straightforward and well-documented.

Confidence Assessment

| Area | Confidence | Notes |
|------|------------|-------|
| Stack | HIGH | All package versions verified on NuGet; .NET lifecycle dates confirmed on Microsoft support policy page; PnP.Framework vs PnP.Core SDK choice verified against authoritative GitHub issue |
| Features | MEDIUM | Microsoft docs (permissions reports, storage reports, Graph API) are HIGH; competitor feature analysis from marketing pages is MEDIUM; no direct API testing performed |
| Architecture | HIGH | MVVM patterns from Microsoft Learn (official); PnP Framework auth patterns from official PnP docs; MsalCacheHelper from official MSAL.NET docs |
| Pitfalls | HIGH | Critical pitfalls verified via official docs, PnP GitHub issues, and direct audit of the existing codebase (CONCERNS.md); async deadlock and WPF trimming pitfalls confirmed via dotnet/wpf GitHub issues |

Overall confidence: HIGH

Gaps to Address

  • PnP Provisioning Engine for Teams-connected sites: The behavior of PnP.Framework's provisioning engine when applied to Teams-connected modern team sites (vs. classic or communication sites) is not fully documented. Validate during Phase 4 planning with a dedicated research spike.
  • User cross-site access enumeration via Graph API: The correct Graph API sequence for "all permissions for user X across N sites" (covering group memberships, direct site assignments, and SharePoint group memberships) has multiple possible approaches with different throttling profiles. Validate the most efficient approach during Phase 5 planning.
  • Graph API volume for duplicate detection: Enumerating file hashes across a large tenant (100k+ files) via driveItem Graph calls has unclear throttling limits at that scale. The practical concurrency limit and whether SHA256 computation must happen client-side needs validation.
  • ScottPlot.WPF XAML integration: ScottPlot 5.x WPF XAML control integration patterns are less documented than the WinForms equivalent. Validate the WpfPlot control binding approach during Phase 5 planning.

Sources

Primary (HIGH confidence)

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

  • NuGet: ScottPlot.WPF XAML control documentation — sparse; WpfPlot binding patterns need hands-on validation

Research completed: 2026-04-02
Ready for roadmap: yes