Files
Sharepoint-Toolbox/.planning/research/PITFALLS.md
Kawa 0c2e26e597 docs: complete project research for SharePoint Toolbox rewrite
Research covers stack (NET10/WPF/PnP.Framework), features (v1 parity + v1.x
differentiators), architecture (MVVM four-layer pattern), and pitfalls
(10 critical pitfalls all addressed in foundation phase). SUMMARY.md
synthesizes findings with phase-structured roadmap implications.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 10:07:47 +02:00

384 lines
33 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Pitfalls Research
**Domain:** C#/WPF SharePoint Online administration desktop tool (PowerShell-to-C# rewrite)
**Researched:** 2026-04-02
**Confidence:** HIGH (critical pitfalls verified via official docs, PnP GitHub issues, and known existing codebase problems)
---
## Critical Pitfalls
### Pitfall 1: Calling PnP/CSOM Methods Synchronously on the UI Thread
**What goes wrong:**
`AuthenticationManager.GetContext()`, `ExecuteQuery()`, and similar PnP Framework / CSOM calls are blocking network operations. If called directly on the WPF UI thread — even inside a button click handler — the entire window freezes until the call completes. This is precisely what causes the UI freezes in the current PowerShell app, and the problem migrates verbatim into C# if async patterns are not used from day one.
A subtler variant: using `.Result` or `.Wait()` on a `Task` from the UI thread. The UI thread holds a `SynchronizationContext`; the async continuation needs that same context to resume; deadlock ensues. The application hangs with no exception and no feedback.
**Why it happens:**
Developers migrating from PowerShell think in sequential terms and instinctively port one-liner calls directly to event handler bodies. The WPF framework does not prevent synchronous blocking — it just stops processing messages, which looks like a freeze.
**How to avoid:**
- Every SharePoint/PnP call must be wrapped in `await Task.Run(...)` or use the async overloads directly (`ExecuteQueryRetryAsync`, `GetContextAsync`).
- Never use `.Result`, `.Wait()`, or `Task.GetAwaiter().GetResult()` on the UI thread.
- Establish a project-wide convention: all ViewModels execute SharePoint operations through `async Task` methods with `CancellationToken` parameters. Codify this in architecture docs from Phase 1.
- Use `ConfigureAwait(false)` in all service/repository layer code (below ViewModel level) so continuations do not need to return to the UI thread unnecessarily.
**Warning signs:**
- Any `void` method containing a PnP call.
- Any `Task.Result` or `.Wait()` in ViewModel or code-behind.
- Button click handlers that are not `async`.
- Application hangs for seconds at a time when switching tenants or starting operations.
**Phase to address:** Foundation/infrastructure phase (first phase). This pattern must be established before any feature work begins. Retrofitting async throughout a codebase is one of the most expensive rewrites possible.
---
### Pitfall 2: Replicating Silent Error Suppression from the PowerShell Original
**What goes wrong:**
The existing codebase has 38 empty `catch` blocks and 27 instances of `-ErrorAction SilentlyContinue`. During a rewrite, developers under time pressure port the "working" behavior, which means they replicate the silent failures. The C# version appears to work in demos but hides the same class of bugs: group member additions that silently did nothing, storage scans that silently skipped folders, JSON loads that silently returned empty defaults from corrupted files.
**Why it happens:**
Port-from-working-code instinct. The original returned a result (even if wrong), so the C# version is written to also return a result without questioning whether an error was swallowed. Also, `try { ... } catch (Exception) { }` in C# is syntactically shorter and less ceremonial than PowerShell's equivalent, making it easy to write reflexively.
**How to avoid:**
- Treat every `catch` block as code that requires a positive decision: log and recover, log and rethrow, or log and surface to the user. A `catch` that does none of these three things is a bug.
- Adopt a structured logging pattern (e.g., `ILogger<T>` with `Microsoft.Extensions.Logging`) from Phase 1 so logging is never optional.
- Create a custom `SharePointOperationException` hierarchy that preserves original exceptions and adds context (which site, which operation, which user) before rethrowing. This prevents exception swallowing during the port.
- In PR reviews, flag any empty or log-only catch blocks that do not surface the error to the user as a defect.
**Warning signs:**
- Any `catch (Exception ex) { }` with no body.
- Any `catch` block that only calls `_logger.LogWarning` but returns a success result to the caller.
- Operations that complete in < 1 second when they should take 510 seconds (silent skip).
- Users reporting "the button did nothing" with no error shown.
**Phase to address:** Foundation/infrastructure phase. Define the error handling strategy and base exception types before porting any features.
---
### Pitfall 3: SharePoint List View Threshold (5 000 Items) Causing Unhandled Exceptions
**What goes wrong:**
Any CSOM or PnP Framework call that queries a SharePoint list without explicit pagination throws a `Microsoft.SharePoint.Client.ServerException` with message "The attempted operation is prohibited because it exceeds the list view threshold" when the list contains more than 5 000 items. In the current PowerShell code this is partially masked by `-ErrorAction SilentlyContinue`. In C# it becomes an unhandled exception that crashes the operation unless explicitly caught and handled.
Real tenant libraries with 5 000+ files are common. Permissions reports, storage scans, and file search are all affected.
**Why it happens:**
Developers test against small tenant sites during development. The threshold is not hit, tests pass, the feature ships. First production use against a real client library fails.
**How to avoid:**
- All `GetItems`, `GetListItems`, and folder-enumeration calls must use `CamlQuery` with `RowLimit` set to a page size (5002 000), iterating with `ListItemCollectionPosition` until exhausted.
- For Graph SDK paths, use the `PageIterator` pattern; never call `.GetAsync()` on a collection without a `$top` parameter.
- The storage recursion function (`Collect-FolderStorage` equivalent) must default to depth 34, not 999, and show estimated time before starting.
- Write an integration test against a seeded list of 6 000 items before shipping each feature that enumerates list items.
**Warning signs:**
- Any `GetItems` call without a `CamlQuery` with explicit `RowLimit`.
- Any Graph SDK call to list items without `.Top(n)`.
- `ServerException` appearing in logs from client sites but not in dev testing.
**Phase to address:** Each feature phase that touches list enumeration (permissions, storage, file search). The pagination helper should be a shared utility written in the foundation phase and reused everywhere.
---
### Pitfall 4: Multi-Tenant Token Cache Race Conditions and Stale Tokens
**What goes wrong:**
The design requires cached authentication sessions so users can switch between client tenants without re-authenticating. MSAL.NET token caches are not thread-safe by default. If two background operations run concurrently against different tenants, cache read/write races produce corrupted cache state, silent auth failures, or one tenant's token being used for another tenant's request.
A secondary problem: when an Azure AD app registration's permissions change (e.g., a new Graph scope is granted), MSAL returns the cached token for the old scope. The operation fails with a 403 but looks like a permissions error, not a stale cache error, sending the developer on a false debugging path.
**Why it happens:**
Multi-tenant caching is not covered in most MSAL.NET tutorials, which show single-tenant flows. The token cache API (`TokenCacheCallback`, `BeforeAccessNotification`, `AfterAccessNotification`) is low-level and easy to implement incorrectly.
**How to avoid:**
- Use `Microsoft.Identity.Client.Extensions.Msal` (`MsalCacheHelper`) for file-based, cross-process-safe token persistence. This is the Microsoft-recommended approach for desktop public client apps.
- The `AuthenticationManager` instance in PnP Framework accepts a `tokenCacheCallback`; wire it to `MsalCacheHelper` so cache is persisted safely per-tenant.
- Scope the `IPublicClientApplication` instance per-ClientId (app registration), not per-tenant URL. Different tenants share the same client app but have different account entries in the cache.
- Implement an explicit "clear cache for tenant" action in the UI so users can force re-authentication when permissions change.
- Never share a single `AuthenticationManager` instance across concurrent operations on different tenants without locking.
**Warning signs:**
- Intermittent 401 or 403 errors that resolve after restarting the app.
- User reports "wrong tenant data shown" (cross-tenant token bleed).
- `MsalUiRequiredException` thrown only on the second or third operation of a session.
**Phase to address:** Authentication/multi-tenant infrastructure phase (early, before any feature uses the auth layer).
---
### Pitfall 5: WPF ObservableCollection Updates from Background Threads
**What goes wrong:**
Populating a `DataGrid` or `ListView` bound to an `ObservableCollection<T>` from a background `Task` or `Task.Run` throws a `NotSupportedException`: "This type of CollectionView does not support changes to its SourceCollection from a thread different from the Dispatcher thread." The exception crashes the background operation. If it is swallowed (see Pitfall 2), the UI simply does not update.
This maps directly to the current app's runspace-to-UI communication via synchronized hashtables polled by a timer. The C# version must use the Dispatcher or the MVVM toolkit equivalently.
**Why it happens:**
In a `Task.Run` lambda, the continuation runs on a thread pool thread, not the UI thread. Developers add items to the collection inside that lambda. It works in small-scale testing (timing may work) but fails under load.
**How to avoid:**
- Never add items to an `ObservableCollection<T>` from a non-UI thread.
- Preferred pattern: collect results into a plain `List<T>` on the background thread, then `await Application.Current.Dispatcher.InvokeAsync(() => { Items = new ObservableCollection<T>(list); })` in one atomic swap.
- For streaming progress (show items as they arrive), use `BindingOperations.EnableCollectionSynchronization` with a lock object at initialization, then add items with the lock held.
- Use `IProgress<T>` with `Progress<T>` (captures the UI `SynchronizationContext` at construction) to report incremental results safely.
**Warning signs:**
- `InvalidOperationException` or `NotSupportedException` in logs referencing `CollectionView`.
- UI lists that do not update despite background operation completing.
- Items appearing out of order or partially in lists.
**Phase to address:** Foundation/infrastructure phase. Define the progress-reporting and collection-update patterns before porting any feature that returns lists of results.
---
### Pitfall 6: WPF Trimming Breaks Self-Contained EXE
**What goes wrong:**
Publishing a WPF app as a self-contained single EXE with `PublishTrimmed=true` silently removes types that WPF and XAML use via reflection at runtime. The app compiles and publishes successfully but crashes at startup or throws `TypeInitializationException` when opening a window whose XAML references a type that was trimmed. PnP Framework and MSAL also use reflection heavily; trimming removes their internal types.
**Why it happens:**
The .NET trimmer performs static analysis and removes code it cannot prove is referenced. XAML data binding, converters, `DataTemplateSelector`, `IValueConverter`, and `DynamicResource` are resolved at runtime via reflection — the trimmer cannot see these references.
**How to avoid:**
- Do not use `PublishTrimmed=true` for WPF + PnP Framework + MSAL projects. The EXE will be larger (~150 MB self-contained is expected and acceptable per PROJECT.md).
- Use `PublishSingleFile=true` with `SelfContained=true` and `IncludeAllContentForSelfExtract=true`, but without trimming. This bundles the runtime into the EXE correctly.
- Verify the single-file output in CI by running the EXE on a clean machine (no .NET installed) before each release.
- Set `<PublishReadyToRun>true</PublishReadyToRun>` for startup performance improvement instead of trimming.
**Warning signs:**
- Publish profile has `<PublishTrimmed>true</PublishTrimmed>`.
- "Works on dev machine, crashes on client machine" with `TypeInitializationException` or `MissingMethodException`.
- EXE is suspiciously small (< 50 MB for a self-contained WPF app).
**Phase to address:** Distribution/packaging phase. Establish the publish profile with correct flags before any release packaging work.
---
### Pitfall 7: Async Void in Command Handlers Swallows Exceptions
**What goes wrong:**
In WPF, button `Click` event handlers are `void`-returning delegates. Developers writing `async void` handlers (e.g., `private async void OnRunButtonClick(...)`) create methods where exceptions thrown after an `await` are raised on the `SynchronizationContext` rather than returned as a faulted `Task`. These exceptions cannot be caught by a caller and will crash the process (or be silently eaten by `Application.DispatcherUnhandledException` without the stack context needed to debug them).
**Why it happens:**
MVVM `ICommand` requires a `void Execute(object parameter)` signature. New C# developers write `async void Execute(...)` without understanding the consequence. The `CommunityToolkit.Mvvm` provides `AsyncRelayCommand` to solve this correctly, but it is not the obvious choice.
**How to avoid:**
- Never write `async void` anywhere in the codebase except the required WPF event handler entry points in code-behind, and only when those entry points immediately delegate to an `async Task` ViewModel method.
- Use `AsyncRelayCommand` from `CommunityToolkit.Mvvm` for all commands that invoke async operations. It wraps the `Task`, exposes `ExecutionTask`, `IsRunning`, and `IsCancellationRequested`, and handles exceptions via `AsyncRelayCommandOptions.FlowExceptionsToTaskScheduler`.
- Wire a global `Application.DispatcherUnhandledException` handler and `TaskScheduler.UnobservedTaskException` handler that log full stack traces and show a user-facing error dialog. This is the last line of defense.
**Warning signs:**
- Any `async void` method outside of a `MainWindow.xaml.cs` entry point.
- Commands implemented as `async void Execute(...)` in ViewModels.
- Exceptions that appear in logs with no originating ViewModel context.
**Phase to address:** Foundation/infrastructure phase (MVVM base classes and command patterns established before any feature code).
---
### Pitfall 8: SharePoint API Throttling Not Handled (429/503)
**What goes wrong:**
SharePoint Online and Microsoft Graph enforce per-app, per-tenant throttling. Bulk operations (permissions scan across 50+ sites, storage scan on 10 000+ folders, bulk member additions) generate enough API calls to trigger HTTP 429 or 503 responses. Without explicit retry-after handling, the operation fails partway through with an unhandled `HttpRequestException` and leaves the user with partial results and no indication of how to resume.
**Why it happens:**
PnP.PowerShell handled this invisibly for the PowerShell app. PnP Framework in C# does have built-in retry via `ExecuteQueryRetryAsync`, but developers unfamiliar with C#-side PnP may use the raw CSOM `ExecuteQuery()` or direct `HttpClient` calls that lack this protection.
**How to avoid:**
- Always use `ExecuteQueryRetryAsync` (never `ExecuteQuery`) for all CSOM batch calls.
- When using Graph SDK, use the `GraphServiceClient` with the default retry handler enabled — it handles 429 with `Retry-After` header respect automatically.
- For multi-site bulk operations, add a short delay (100300 ms) between site connections to avoid burst throttling. Implement a configurable concurrency limit (default: sequential or max 3 parallel).
- Surface throttling events in the progress log: "Rate limited, retrying in 15s…" so the user knows the operation is paused, not hung.
**Warning signs:**
- Raw `ExecuteQuery()` calls anywhere in the codebase.
- `HttpRequestException` with 429 status in logs.
- Operations that fail consistently at the same approximate item count across multiple runs.
**Phase to address:** Foundation/infrastructure phase for the retry handler; each feature phase must use the established pattern.
---
### Pitfall 9: Resource Disposal Gaps in Long-Running Operations
**What goes wrong:**
`ClientContext` objects returned by `AuthenticationManager.GetContext()` are `IDisposable`. If a background `Task` is cancelled or throws an exception mid-operation, a `ClientContext` created in the try block is not disposed if the `finally` block is missing. Over a long session (MSP workflow: dozens of tenant switches, multiple scans), leaked `ClientContext` objects accumulate unmanaged resources and eventually cause connection refusals or memory degradation. This is the C# equivalent of the runspace disposal gaps in the current codebase.
**Why it happens:**
`using` statements are the idiomatic C# solution, but they do not compose well with async cancellation. Developers use `try/catch` without `finally`, or structure the code so the `using` scope is exited before the `Task` completes.
**How to avoid:**
- Always obtain `ClientContext` inside a `using` statement or `await using` if using C# 8+ disposable pattern: `await using var ctx = await authManager.GetContextAsync(url, token)`.
- Wrap the entire operation body in `try/finally` with disposal in the `finally` block when `await using` is not applicable.
- When a `CancellationToken` is triggered, let the `OperationCanceledException` propagate naturally; the `using` / `finally` will still execute.
- Add a unit test for the "cancelled mid-operation" path that verifies `ClientContext.Dispose()` is called.
**Warning signs:**
- `GetContext` calls without `using`.
- `catch (Exception) { return; }` that bypasses a `ClientContext` created earlier in the method.
- Memory growth over a multi-hour MSP session visible in Task Manager.
**Phase to address:** Foundation/infrastructure phase (define the context acquisition pattern) and validated in each feature phase.
---
### Pitfall 10: JSON Settings Corruption on Concurrent Writes
**What goes wrong:**
The app writes profiles, settings, and templates to JSON files on disk. If the user triggers two rapid operations (e.g., saves a profile while a background scan completes and updates settings), both code paths may attempt to write the same file simultaneously. The second write overwrites a partially-written first write, producing a truncated or syntactically invalid JSON file. On next startup, the file fails to parse and silently returns empty defaults — erasing all user profiles.
This is a known bug in the current app (CONCERNS.md: "Profile JSON file: no transaction semantics").
**Why it happens:**
File I/O is not inherently thread-safe. `System.Text.Json`'s `JsonSerializer.SerializeAsync` writes to a stream but does not protect the file from concurrent access by another code path.
**How to avoid:**
- Serialize all writes to each JSON file through a single `SemaphoreSlim(1)` per file. Acquire before reading or writing, release in `finally`.
- Use write-then-replace: write to `filename.tmp`, validate the JSON by deserializing it, then `File.Move(tmp, original, overwrite: true)`. An interrupted write leaves the original intact.
- On startup, if the primary file is invalid, check for a `.tmp` or `.bak` version before falling back to defaults — and log which fallback was used.
**Warning signs:**
- Profile file occasionally empty after normal use.
- `JsonException` on startup that the user cannot reproduce on demand.
- App loaded with correct profiles yesterday, empty profiles today.
**Phase to address:** Foundation/infrastructure phase (data access layer). Must be solved before any feature persists data.
---
## Technical Debt Patterns
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|----------|-------------------|----------------|-----------------|
| Copy PowerShell logic verbatim into a `Task.Run` | Fast initial port, works locally | Inherits all silent failures, no cancellation, no progress reporting | Never — always re-examine the logic |
| `async void` command handlers | Compiles and runs | Exceptions crash app silently; no cancellation propagation | Only for WPF event entry points that immediately call `async Task` |
| Direct `ExecuteQuery()` without retry | Simpler call site | Crashes on throttling for real client tenants | Never — use `ExecuteQueryRetryAsync` |
| Single shared `AuthenticationManager` instance | Simple instantiation | Token cache race conditions under concurrent operations | Only if all operations are strictly sequential (initial MVP, clearly documented) |
| Load entire list into memory before display | Simple binding | `OutOfMemoryException` on libraries with 50k+ items | Only for lists known to be small and bounded (e.g., profiles list) |
| No `CancellationToken` propagation | Simpler method signatures | Operations cannot be cancelled; UI stuck waiting | Never for operations > 2 seconds |
| Hard-code English fallback strings in code | Quick to write | Breaks FR locale; strings diverge from key system | Never — always use resource keys |
---
## Integration Gotchas
| Integration | Common Mistake | Correct Approach |
|-------------|----------------|------------------|
| PnP Framework `GetContext` | Calling on UI thread synchronously | Always `await Task.Run(() => authManager.GetContext(...))` or use `GetContextAsync` |
| MSAL token cache (multi-tenant) | One `IPublicClientApplication` per call | One `IPublicClientApplication` per ClientId, long-lived, with `MsalCacheHelper` wired |
| SharePoint list enumeration | No `RowLimit` in `CamlQuery` | Always paginate with `RowLimit` ≤ 2 000 and `ListItemCollectionPosition` |
| Graph SDK paging | Calling `.GetAsync()` on collections without `$top` | Use `PageIterator` or explicit `.Top(n)` on every collection request |
| PnP `ExecuteQueryRetryAsync` | Forgetting to `await`; using synchronous `ExecuteQuery` | Always `await ctx.ExecuteQueryRetryAsync()` |
| WPF `ObservableCollection` | Modifying from `Task.Run` lambda | Collect into `List<T>`, then assign via `Dispatcher.InvokeAsync` |
| PnP Management Shell client ID | Using the shared PnP app ID in a multi-tenant production tool | Register a dedicated Azure AD app per deployment; don't rely on PnP's shared registration |
| SharePoint Search API (KQL) | No result limit, assuming all results returned | Always set `RowLimit`; results capped at 500 per page, max 50 000 total |
---
## Performance Traps
| Trap | Symptoms | Prevention | When It Breaks |
|------|----------|------------|----------------|
| Loading all `ObservableCollection` items before displaying any | UI freezes until entire operation completes | Use `IProgress<T>` to stream items as they arrive; enable UI virtualization | Any list > ~500 items |
| WPF virtualization disabled by `ScrollViewer.CanContentScroll=False` or grouping | DataGrid scroll is sluggish with 200+ rows | Never disable `CanContentScroll`; set `VirtualizingPanel.IsVirtualizingWhenGrouping=True` | > 200 rows in a DataGrid |
| Adding items to `ObservableCollection` one-by-one from background | Thousands of UI binding notifications; UI jank | Batch-load: assign `new ObservableCollection<T>(list)` once | > 50 items added in a loop |
| Permissions scan without depth limit | Scan takes hours on deep folder structures | Default depth 34; show estimated time; require explicit user override for deeper | Sites with > 5 folder levels |
| HTML report built entirely in memory | `OutOfMemoryException` or report generation takes minutes | Stream HTML to file; write rows as they are produced, not after full scan | > 10 000 rows in report |
| Sequential site processing for multi-site reports | Report for 20 sites takes 20× single-site time | Process up to 3 sites concurrently with `SemaphoreSlim`; show per-site progress | > 5 sites selected |
| Duplicate `Connect-PnPOnline` calls per operation | Redundant browser popups or token refreshes | Cache authenticated `ClientContext` per (tenant, clientId) for session lifetime | Any operation that reconnects unnecessarily |
---
## Security Mistakes
| Mistake | Risk | Prevention |
|---------|------|------------|
| Storing Client ID in plaintext JSON profile | Low on its own (Client ID is not a secret), but combined with tenant URL it eases targeted phishing | Document that Client ID is not a secret; optionally encrypt the profile file with DPAPI `ProtectedData.Protect` for defence-in-depth |
| Writing temp files with tenant credentials to `%TEMP%` | File readable by other processes on the same user account; not cleaned up on crash | Use `SecureString` in-memory for transient auth data; delete temp files in `finally` blocks; prefer named pipes or in-memory channels |
| No validation of tenant URL format before connecting | Typo sends auth token to wrong endpoint; user confused by misleading auth error | Validate against regex `^https://[a-zA-Z0-9-]+\.sharepoint\.com` before any connection attempt |
| Logging full exception messages that include HTTP request URLs | Tenant URLs and item paths exposed in log files readable on shared machines | Strip or redact SharePoint URLs in log output at `Debug` level; keep them out of `Information`-level user-visible logs |
| Bundling PnP Management Shell client ID (shared multi-tenant app) | App uses a shared identity not owned by the deploying organisation; harder to audit and revoke | Require each deployment to use a dedicated app registration; document the registration steps clearly |
---
## UX Pitfalls
| Pitfall | User Impact | Better Approach |
|---------|-------------|-----------------|
| No cancellation for operations > 5 seconds | User closes app via Task Manager; loses in-progress results; must restart | Every operation exposed in UI must accept a `CancellationToken`; show a "Cancel" button that is always enabled during operation |
| Progress bar with no ETA or item count | User cannot judge whether to wait or cancel | Show "Scanned X of Y sites" or "X items found"; update every 0.5 s minimum |
| Error messages showing raw exception text | Non-technical admin users see stack traces and `ServerException: CSOM call failed` | Translate known error types to plain-language messages; offer a "Copy technical details" link for support escalation |
| Silent success on bulk operations with partial failures | User thinks all 50 members were added; 12 failed silently | Show a per-item result summary: "38 added successfully, 12 failed — see details" |
| Language switches require app restart | FR-speaking users see flickering English then French on startup | Load correct language before any UI is shown; apply language from settings before `InitializeComponent` |
| Permissions report jargon ("Full Control", "Contribute", "Limited Access") shown raw | Non-technical stakeholders do not understand the report | Map SharePoint permission levels to plain-language equivalents in the report output; keep raw names in a "technical details" expandable section |
---
## "Looks Done But Isn't" Checklist
- [ ] **Multi-tenant session switching:** Verify that switching from Tenant A to Tenant B does not return Tenant A's data. Test with two real tenants, not two sites in the same tenant.
- [ ] **Operation cancellation:** Verify that pressing Cancel stops the operation within 2 seconds and leaves no zombie threads or unreleased `ClientContext` objects.
- [ ] **5 000+ item libraries:** Verify permissions report and storage scan complete without `ServerException` on a real library with > 5 000 items (not a test tenant with 50 items).
- [ ] **Self-contained EXE on clean machine:** Install the EXE on a machine with no .NET runtime installed; verify startup and a complete workflow before every release.
- [ ] **JSON file corruption recovery:** Corrupt a profile JSON file manually; verify the app starts, logs the corruption, does not silently return empty profiles, and preserves the backup.
- [ ] **Concurrent writes:** Simultaneously trigger "Save profile" and "Export settings" from two rapid button clicks; verify neither file is truncated.
- [ ] **Large HTML reports:** Generate a permissions report for a site with > 5 000 items; verify the HTML file opens in a browser in < 10 seconds and the DataGrid is scrollable.
- [ ] **FR locale completeness:** Switch to French; verify no UI string shows an untranslated key or hardcoded English text.
- [ ] **Throttling recovery:** Simulate a 429 response; verify the operation pauses, logs "Retrying in Xs", and completes successfully after the retry interval.
---
## Recovery Strategies
| Pitfall | Recovery Cost | Recovery Steps |
|---------|---------------|----------------|
| Async/sync deadlocks introduced in foundation | HIGH — requires refactoring all affected call chains | Identify all `.Result`/`.Wait()` calls with a codebase grep; convert bottom-up (services first, then ViewModels) |
| Silent failures ported from PowerShell | MEDIUM — requires audit of every catch block | Search all `catch` blocks; classify each as log-and-recover, log-and-rethrow, or log-and-surface; fix one feature at a time |
| Token cache corruption | LOW — clear the cache file and re-authenticate | Expose a "Clear cached sessions" action in the UI; document in troubleshooting guide |
| JSON profile file corruption | LOW if backup exists, HIGH if no backup | Implement write-then-replace before first release; add backup-on-corrupt logic to deserializer |
| WPF trimming breaks EXE | MEDIUM — need to republish with trimming disabled | Update publish profile, re-run publish, retest EXE on clean machine |
| Missing pagination on large lists | MEDIUM — need to refactor per-feature enumeration | Create shared pagination helper; replace calls feature by feature; test each against 6 000-item library |
---
## Pitfall-to-Phase Mapping
| Pitfall | Prevention Phase | Verification |
|---------|------------------|--------------|
| Sync/async deadlocks on UI thread | Phase 1: Foundation — establish async-first patterns | Code review checklist: no `.Result`/`.Wait()` in any ViewModel or event handler |
| Silent error suppression replication | Phase 1: Foundation — define error handling strategy and base types | Automated lint rule (Roslyn analyser or SonarQube) flagging empty catch blocks |
| SharePoint 5 000-item threshold | Phase 1: Foundation — write shared paginator; reused in all features | Integration test against 6 000-item library for every feature that enumerates lists |
| Multi-tenant token cache race | Phase 1: Foundation — auth layer with `MsalCacheHelper` | Test: two concurrent operations on different tenants return correct data |
| ObservableCollection cross-thread updates | Phase 1: Foundation — define progress-reporting pattern | Automated test: populate collection from background thread; verify no exception |
| WPF trimming breaks EXE | Final distribution phase | CI step: run published EXE on a clean Windows VM, assert startup and one workflow completes |
| Async void command handlers | Phase 1: Foundation — establish MVVM base with `AsyncRelayCommand` | Code review: no `async void` in ViewModel files |
| API throttling unhandled | Phase 1: Foundation — retry handler; applied by every feature | Load test: run storage scan against a tenant with rate-limiting; verify retry log entry |
| Resource disposal gaps | Phase 1: Foundation — context acquisition pattern | Unit test: cancel a long operation mid-run; verify `ClientContext.Dispose` called |
| JSON concurrent write corruption | Phase 1: Foundation — write-then-replace + `SemaphoreSlim` | Stress test: 100 concurrent save calls; verify file always parseable after all complete |
---
## Sources
- PnP Framework GitHub issue #961: `AuthenticationManager.GetContext` freeze in C# desktop app — https://github.com/pnp/pnpframework/issues/961
- PnP Framework GitHub issue #447: `AuthenticationManager.GetContext` hanging in ASP.NET — https://github.com/pnp/pnpframework/issues/447
- Microsoft Learn: Token cache serialization (MSAL.NET) — https://learn.microsoft.com/en-us/entra/msal/dotnet/how-to/token-cache-serialization
- Microsoft Learn: SharePoint Online list view threshold — https://learn.microsoft.com/en-us/troubleshoot/sharepoint/lists-and-libraries/items-exceeds-list-view-threshold
- Microsoft Learn: Single-file publishing overview — https://learn.microsoft.com/en-us/dotnet/core/deploying/single-file/overview
- dotnet/wpf GitHub issue #4216: `PublishTrimmed` causes `Unhandled Exception` in self-contained WPF app — https://github.com/dotnet/wpf/issues/4216
- dotnet/wpf GitHub issue #6096: Trimming for WPF — https://github.com/dotnet/wpf/issues/6096
- Microsoft .NET Blog: Await, and UI, and deadlocks — https://devblogs.microsoft.com/dotnet/await-and-ui-and-deadlocks-oh-my/
- Microsoft Learn: AsyncRelayCommand (CommunityToolkit.Mvvm) — https://learn.microsoft.com/en-us/dotnet/communitytoolkit/mvvm/asyncrelaycommand
- Microsoft Learn: Graph SDK paging — https://learn.microsoft.com/en-us/graph/sdks/paging
- Microsoft Learn: Graph throttling guidance — https://learn.microsoft.com/en-us/graph/throttling
- Rick Strahl's Web Log: Async and Async Void Event Handling in WPF — https://weblog.west-wind.com/posts/2022/Apr/22/Async-and-Async-Void-Event-Handling-in-WPF
- Existing codebase CONCERNS.md audit (2026-04-02) — `.planning/codebase/CONCERNS.md`
---
*Pitfalls research for: C#/WPF SharePoint Online administration desktop tool (PowerShell-to-C# rewrite)*
*Researched: 2026-04-02*