Research covers stack (NET10/WPF/PnP.Framework), features (v1 parity + v1.x differentiators), architecture (MVVM four-layer pattern), and pitfalls (10 critical pitfalls all addressed in foundation phase). SUMMARY.md synthesizes findings with phase-structured roadmap implications. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
384 lines
33 KiB
Markdown
384 lines
33 KiB
Markdown
# Pitfalls Research
|
||
|
||
**Domain:** C#/WPF SharePoint Online administration desktop tool (PowerShell-to-C# rewrite)
|
||
**Researched:** 2026-04-02
|
||
**Confidence:** HIGH (critical pitfalls verified via official docs, PnP GitHub issues, and known existing codebase problems)
|
||
|
||
---
|
||
|
||
## Critical Pitfalls
|
||
|
||
### Pitfall 1: Calling PnP/CSOM Methods Synchronously on the UI Thread
|
||
|
||
**What goes wrong:**
|
||
`AuthenticationManager.GetContext()`, `ExecuteQuery()`, and similar PnP Framework / CSOM calls are blocking network operations. If called directly on the WPF UI thread — even inside a button click handler — the entire window freezes until the call completes. This is precisely what causes the UI freezes in the current PowerShell app, and the problem migrates verbatim into C# if async patterns are not used from day one.
|
||
|
||
A subtler variant: using `.Result` or `.Wait()` on a `Task` from the UI thread. The UI thread holds a `SynchronizationContext`; the async continuation needs that same context to resume; deadlock ensues. The application hangs with no exception and no feedback.
|
||
|
||
**Why it happens:**
|
||
Developers migrating from PowerShell think in sequential terms and instinctively port one-liner calls directly to event handler bodies. The WPF framework does not prevent synchronous blocking — it just stops processing messages, which looks like a freeze.
|
||
|
||
**How to avoid:**
|
||
- Every SharePoint/PnP call must be wrapped in `await Task.Run(...)` or use the async overloads directly (`ExecuteQueryRetryAsync`, `GetContextAsync`).
|
||
- Never use `.Result`, `.Wait()`, or `Task.GetAwaiter().GetResult()` on the UI thread.
|
||
- Establish a project-wide convention: all ViewModels execute SharePoint operations through `async Task` methods with `CancellationToken` parameters. Codify this in architecture docs from Phase 1.
|
||
- Use `ConfigureAwait(false)` in all service/repository layer code (below ViewModel level) so continuations do not need to return to the UI thread unnecessarily.
|
||
|
||
**Warning signs:**
|
||
- Any `void` method containing a PnP call.
|
||
- Any `Task.Result` or `.Wait()` in ViewModel or code-behind.
|
||
- Button click handlers that are not `async`.
|
||
- Application hangs for seconds at a time when switching tenants or starting operations.
|
||
|
||
**Phase to address:** Foundation/infrastructure phase (first phase). This pattern must be established before any feature work begins. Retrofitting async throughout a codebase is one of the most expensive rewrites possible.
|
||
|
||
---
|
||
|
||
### Pitfall 2: Replicating Silent Error Suppression from the PowerShell Original
|
||
|
||
**What goes wrong:**
|
||
The existing codebase has 38 empty `catch` blocks and 27 instances of `-ErrorAction SilentlyContinue`. During a rewrite, developers under time pressure port the "working" behavior, which means they replicate the silent failures. The C# version appears to work in demos but hides the same class of bugs: group member additions that silently did nothing, storage scans that silently skipped folders, JSON loads that silently returned empty defaults from corrupted files.
|
||
|
||
**Why it happens:**
|
||
Port-from-working-code instinct. The original returned a result (even if wrong), so the C# version is written to also return a result without questioning whether an error was swallowed. Also, `try { ... } catch (Exception) { }` in C# is syntactically shorter and less ceremonial than PowerShell's equivalent, making it easy to write reflexively.
|
||
|
||
**How to avoid:**
|
||
- Treat every `catch` block as code that requires a positive decision: log and recover, log and rethrow, or log and surface to the user. A `catch` that does none of these three things is a bug.
|
||
- Adopt a structured logging pattern (e.g., `ILogger<T>` with `Microsoft.Extensions.Logging`) from Phase 1 so logging is never optional.
|
||
- Create a custom `SharePointOperationException` hierarchy that preserves original exceptions and adds context (which site, which operation, which user) before rethrowing. This prevents exception swallowing during the port.
|
||
- In PR reviews, flag any empty or log-only catch blocks that do not surface the error to the user as a defect.
|
||
|
||
**Warning signs:**
|
||
- Any `catch (Exception ex) { }` with no body.
|
||
- Any `catch` block that only calls `_logger.LogWarning` but returns a success result to the caller.
|
||
- Operations that complete in < 1 second when they should take 5–10 seconds (silent skip).
|
||
- Users reporting "the button did nothing" with no error shown.
|
||
|
||
**Phase to address:** Foundation/infrastructure phase. Define the error handling strategy and base exception types before porting any features.
|
||
|
||
---
|
||
|
||
### Pitfall 3: SharePoint List View Threshold (5 000 Items) Causing Unhandled Exceptions
|
||
|
||
**What goes wrong:**
|
||
Any CSOM or PnP Framework call that queries a SharePoint list without explicit pagination throws a `Microsoft.SharePoint.Client.ServerException` with message "The attempted operation is prohibited because it exceeds the list view threshold" when the list contains more than 5 000 items. In the current PowerShell code this is partially masked by `-ErrorAction SilentlyContinue`. In C# it becomes an unhandled exception that crashes the operation unless explicitly caught and handled.
|
||
|
||
Real tenant libraries with 5 000+ files are common. Permissions reports, storage scans, and file search are all affected.
|
||
|
||
**Why it happens:**
|
||
Developers test against small tenant sites during development. The threshold is not hit, tests pass, the feature ships. First production use against a real client library fails.
|
||
|
||
**How to avoid:**
|
||
- All `GetItems`, `GetListItems`, and folder-enumeration calls must use `CamlQuery` with `RowLimit` set to a page size (500–2 000), iterating with `ListItemCollectionPosition` until exhausted.
|
||
- For Graph SDK paths, use the `PageIterator` pattern; never call `.GetAsync()` on a collection without a `$top` parameter.
|
||
- The storage recursion function (`Collect-FolderStorage` equivalent) must default to depth 3–4, not 999, and show estimated time before starting.
|
||
- Write an integration test against a seeded list of 6 000 items before shipping each feature that enumerates list items.
|
||
|
||
**Warning signs:**
|
||
- Any `GetItems` call without a `CamlQuery` with explicit `RowLimit`.
|
||
- Any Graph SDK call to list items without `.Top(n)`.
|
||
- `ServerException` appearing in logs from client sites but not in dev testing.
|
||
|
||
**Phase to address:** Each feature phase that touches list enumeration (permissions, storage, file search). The pagination helper should be a shared utility written in the foundation phase and reused everywhere.
|
||
|
||
---
|
||
|
||
### Pitfall 4: Multi-Tenant Token Cache Race Conditions and Stale Tokens
|
||
|
||
**What goes wrong:**
|
||
The design requires cached authentication sessions so users can switch between client tenants without re-authenticating. MSAL.NET token caches are not thread-safe by default. If two background operations run concurrently against different tenants, cache read/write races produce corrupted cache state, silent auth failures, or one tenant's token being used for another tenant's request.
|
||
|
||
A secondary problem: when an Azure AD app registration's permissions change (e.g., a new Graph scope is granted), MSAL returns the cached token for the old scope. The operation fails with a 403 but looks like a permissions error, not a stale cache error, sending the developer on a false debugging path.
|
||
|
||
**Why it happens:**
|
||
Multi-tenant caching is not covered in most MSAL.NET tutorials, which show single-tenant flows. The token cache API (`TokenCacheCallback`, `BeforeAccessNotification`, `AfterAccessNotification`) is low-level and easy to implement incorrectly.
|
||
|
||
**How to avoid:**
|
||
- Use `Microsoft.Identity.Client.Extensions.Msal` (`MsalCacheHelper`) for file-based, cross-process-safe token persistence. This is the Microsoft-recommended approach for desktop public client apps.
|
||
- The `AuthenticationManager` instance in PnP Framework accepts a `tokenCacheCallback`; wire it to `MsalCacheHelper` so cache is persisted safely per-tenant.
|
||
- Scope the `IPublicClientApplication` instance per-ClientId (app registration), not per-tenant URL. Different tenants share the same client app but have different account entries in the cache.
|
||
- Implement an explicit "clear cache for tenant" action in the UI so users can force re-authentication when permissions change.
|
||
- Never share a single `AuthenticationManager` instance across concurrent operations on different tenants without locking.
|
||
|
||
**Warning signs:**
|
||
- Intermittent 401 or 403 errors that resolve after restarting the app.
|
||
- User reports "wrong tenant data shown" (cross-tenant token bleed).
|
||
- `MsalUiRequiredException` thrown only on the second or third operation of a session.
|
||
|
||
**Phase to address:** Authentication/multi-tenant infrastructure phase (early, before any feature uses the auth layer).
|
||
|
||
---
|
||
|
||
### Pitfall 5: WPF ObservableCollection Updates from Background Threads
|
||
|
||
**What goes wrong:**
|
||
Populating a `DataGrid` or `ListView` bound to an `ObservableCollection<T>` from a background `Task` or `Task.Run` throws a `NotSupportedException`: "This type of CollectionView does not support changes to its SourceCollection from a thread different from the Dispatcher thread." The exception crashes the background operation. If it is swallowed (see Pitfall 2), the UI simply does not update.
|
||
|
||
This maps directly to the current app's runspace-to-UI communication via synchronized hashtables polled by a timer. The C# version must use the Dispatcher or the MVVM toolkit equivalently.
|
||
|
||
**Why it happens:**
|
||
In a `Task.Run` lambda, the continuation runs on a thread pool thread, not the UI thread. Developers add items to the collection inside that lambda. It works in small-scale testing (timing may work) but fails under load.
|
||
|
||
**How to avoid:**
|
||
- Never add items to an `ObservableCollection<T>` from a non-UI thread.
|
||
- Preferred pattern: collect results into a plain `List<T>` on the background thread, then `await Application.Current.Dispatcher.InvokeAsync(() => { Items = new ObservableCollection<T>(list); })` in one atomic swap.
|
||
- For streaming progress (show items as they arrive), use `BindingOperations.EnableCollectionSynchronization` with a lock object at initialization, then add items with the lock held.
|
||
- Use `IProgress<T>` with `Progress<T>` (captures the UI `SynchronizationContext` at construction) to report incremental results safely.
|
||
|
||
**Warning signs:**
|
||
- `InvalidOperationException` or `NotSupportedException` in logs referencing `CollectionView`.
|
||
- UI lists that do not update despite background operation completing.
|
||
- Items appearing out of order or partially in lists.
|
||
|
||
**Phase to address:** Foundation/infrastructure phase. Define the progress-reporting and collection-update patterns before porting any feature that returns lists of results.
|
||
|
||
---
|
||
|
||
### Pitfall 6: WPF Trimming Breaks Self-Contained EXE
|
||
|
||
**What goes wrong:**
|
||
Publishing a WPF app as a self-contained single EXE with `PublishTrimmed=true` silently removes types that WPF and XAML use via reflection at runtime. The app compiles and publishes successfully but crashes at startup or throws `TypeInitializationException` when opening a window whose XAML references a type that was trimmed. PnP Framework and MSAL also use reflection heavily; trimming removes their internal types.
|
||
|
||
**Why it happens:**
|
||
The .NET trimmer performs static analysis and removes code it cannot prove is referenced. XAML data binding, converters, `DataTemplateSelector`, `IValueConverter`, and `DynamicResource` are resolved at runtime via reflection — the trimmer cannot see these references.
|
||
|
||
**How to avoid:**
|
||
- Do not use `PublishTrimmed=true` for WPF + PnP Framework + MSAL projects. The EXE will be larger (~150 MB self-contained is expected and acceptable per PROJECT.md).
|
||
- Use `PublishSingleFile=true` with `SelfContained=true` and `IncludeAllContentForSelfExtract=true`, but without trimming. This bundles the runtime into the EXE correctly.
|
||
- Verify the single-file output in CI by running the EXE on a clean machine (no .NET installed) before each release.
|
||
- Set `<PublishReadyToRun>true</PublishReadyToRun>` for startup performance improvement instead of trimming.
|
||
|
||
**Warning signs:**
|
||
- Publish profile has `<PublishTrimmed>true</PublishTrimmed>`.
|
||
- "Works on dev machine, crashes on client machine" with `TypeInitializationException` or `MissingMethodException`.
|
||
- EXE is suspiciously small (< 50 MB for a self-contained WPF app).
|
||
|
||
**Phase to address:** Distribution/packaging phase. Establish the publish profile with correct flags before any release packaging work.
|
||
|
||
---
|
||
|
||
### Pitfall 7: Async Void in Command Handlers Swallows Exceptions
|
||
|
||
**What goes wrong:**
|
||
In WPF, button `Click` event handlers are `void`-returning delegates. Developers writing `async void` handlers (e.g., `private async void OnRunButtonClick(...)`) create methods where exceptions thrown after an `await` are raised on the `SynchronizationContext` rather than returned as a faulted `Task`. These exceptions cannot be caught by a caller and will crash the process (or be silently eaten by `Application.DispatcherUnhandledException` without the stack context needed to debug them).
|
||
|
||
**Why it happens:**
|
||
MVVM `ICommand` requires a `void Execute(object parameter)` signature. New C# developers write `async void Execute(...)` without understanding the consequence. The `CommunityToolkit.Mvvm` provides `AsyncRelayCommand` to solve this correctly, but it is not the obvious choice.
|
||
|
||
**How to avoid:**
|
||
- Never write `async void` anywhere in the codebase except the required WPF event handler entry points in code-behind, and only when those entry points immediately delegate to an `async Task` ViewModel method.
|
||
- Use `AsyncRelayCommand` from `CommunityToolkit.Mvvm` for all commands that invoke async operations. It wraps the `Task`, exposes `ExecutionTask`, `IsRunning`, and `IsCancellationRequested`, and handles exceptions via `AsyncRelayCommandOptions.FlowExceptionsToTaskScheduler`.
|
||
- Wire a global `Application.DispatcherUnhandledException` handler and `TaskScheduler.UnobservedTaskException` handler that log full stack traces and show a user-facing error dialog. This is the last line of defense.
|
||
|
||
**Warning signs:**
|
||
- Any `async void` method outside of a `MainWindow.xaml.cs` entry point.
|
||
- Commands implemented as `async void Execute(...)` in ViewModels.
|
||
- Exceptions that appear in logs with no originating ViewModel context.
|
||
|
||
**Phase to address:** Foundation/infrastructure phase (MVVM base classes and command patterns established before any feature code).
|
||
|
||
---
|
||
|
||
### Pitfall 8: SharePoint API Throttling Not Handled (429/503)
|
||
|
||
**What goes wrong:**
|
||
SharePoint Online and Microsoft Graph enforce per-app, per-tenant throttling. Bulk operations (permissions scan across 50+ sites, storage scan on 10 000+ folders, bulk member additions) generate enough API calls to trigger HTTP 429 or 503 responses. Without explicit retry-after handling, the operation fails partway through with an unhandled `HttpRequestException` and leaves the user with partial results and no indication of how to resume.
|
||
|
||
**Why it happens:**
|
||
PnP.PowerShell handled this invisibly for the PowerShell app. PnP Framework in C# does have built-in retry via `ExecuteQueryRetryAsync`, but developers unfamiliar with C#-side PnP may use the raw CSOM `ExecuteQuery()` or direct `HttpClient` calls that lack this protection.
|
||
|
||
**How to avoid:**
|
||
- Always use `ExecuteQueryRetryAsync` (never `ExecuteQuery`) for all CSOM batch calls.
|
||
- When using Graph SDK, use the `GraphServiceClient` with the default retry handler enabled — it handles 429 with `Retry-After` header respect automatically.
|
||
- For multi-site bulk operations, add a short delay (100–300 ms) between site connections to avoid burst throttling. Implement a configurable concurrency limit (default: sequential or max 3 parallel).
|
||
- Surface throttling events in the progress log: "Rate limited, retrying in 15s…" so the user knows the operation is paused, not hung.
|
||
|
||
**Warning signs:**
|
||
- Raw `ExecuteQuery()` calls anywhere in the codebase.
|
||
- `HttpRequestException` with 429 status in logs.
|
||
- Operations that fail consistently at the same approximate item count across multiple runs.
|
||
|
||
**Phase to address:** Foundation/infrastructure phase for the retry handler; each feature phase must use the established pattern.
|
||
|
||
---
|
||
|
||
### Pitfall 9: Resource Disposal Gaps in Long-Running Operations
|
||
|
||
**What goes wrong:**
|
||
`ClientContext` objects returned by `AuthenticationManager.GetContext()` are `IDisposable`. If a background `Task` is cancelled or throws an exception mid-operation, a `ClientContext` created in the try block is not disposed if the `finally` block is missing. Over a long session (MSP workflow: dozens of tenant switches, multiple scans), leaked `ClientContext` objects accumulate unmanaged resources and eventually cause connection refusals or memory degradation. This is the C# equivalent of the runspace disposal gaps in the current codebase.
|
||
|
||
**Why it happens:**
|
||
`using` statements are the idiomatic C# solution, but they do not compose well with async cancellation. Developers use `try/catch` without `finally`, or structure the code so the `using` scope is exited before the `Task` completes.
|
||
|
||
**How to avoid:**
|
||
- Always obtain `ClientContext` inside a `using` statement or `await using` if using C# 8+ disposable pattern: `await using var ctx = await authManager.GetContextAsync(url, token)`.
|
||
- Wrap the entire operation body in `try/finally` with disposal in the `finally` block when `await using` is not applicable.
|
||
- When a `CancellationToken` is triggered, let the `OperationCanceledException` propagate naturally; the `using` / `finally` will still execute.
|
||
- Add a unit test for the "cancelled mid-operation" path that verifies `ClientContext.Dispose()` is called.
|
||
|
||
**Warning signs:**
|
||
- `GetContext` calls without `using`.
|
||
- `catch (Exception) { return; }` that bypasses a `ClientContext` created earlier in the method.
|
||
- Memory growth over a multi-hour MSP session visible in Task Manager.
|
||
|
||
**Phase to address:** Foundation/infrastructure phase (define the context acquisition pattern) and validated in each feature phase.
|
||
|
||
---
|
||
|
||
### Pitfall 10: JSON Settings Corruption on Concurrent Writes
|
||
|
||
**What goes wrong:**
|
||
The app writes profiles, settings, and templates to JSON files on disk. If the user triggers two rapid operations (e.g., saves a profile while a background scan completes and updates settings), both code paths may attempt to write the same file simultaneously. The second write overwrites a partially-written first write, producing a truncated or syntactically invalid JSON file. On next startup, the file fails to parse and silently returns empty defaults — erasing all user profiles.
|
||
|
||
This is a known bug in the current app (CONCERNS.md: "Profile JSON file: no transaction semantics").
|
||
|
||
**Why it happens:**
|
||
File I/O is not inherently thread-safe. `System.Text.Json`'s `JsonSerializer.SerializeAsync` writes to a stream but does not protect the file from concurrent access by another code path.
|
||
|
||
**How to avoid:**
|
||
- Serialize all writes to each JSON file through a single `SemaphoreSlim(1)` per file. Acquire before reading or writing, release in `finally`.
|
||
- Use write-then-replace: write to `filename.tmp`, validate the JSON by deserializing it, then `File.Move(tmp, original, overwrite: true)`. An interrupted write leaves the original intact.
|
||
- On startup, if the primary file is invalid, check for a `.tmp` or `.bak` version before falling back to defaults — and log which fallback was used.
|
||
|
||
**Warning signs:**
|
||
- Profile file occasionally empty after normal use.
|
||
- `JsonException` on startup that the user cannot reproduce on demand.
|
||
- App loaded with correct profiles yesterday, empty profiles today.
|
||
|
||
**Phase to address:** Foundation/infrastructure phase (data access layer). Must be solved before any feature persists data.
|
||
|
||
---
|
||
|
||
## Technical Debt Patterns
|
||
|
||
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|
||
|----------|-------------------|----------------|-----------------|
|
||
| Copy PowerShell logic verbatim into a `Task.Run` | Fast initial port, works locally | Inherits all silent failures, no cancellation, no progress reporting | Never — always re-examine the logic |
|
||
| `async void` command handlers | Compiles and runs | Exceptions crash app silently; no cancellation propagation | Only for WPF event entry points that immediately call `async Task` |
|
||
| Direct `ExecuteQuery()` without retry | Simpler call site | Crashes on throttling for real client tenants | Never — use `ExecuteQueryRetryAsync` |
|
||
| Single shared `AuthenticationManager` instance | Simple instantiation | Token cache race conditions under concurrent operations | Only if all operations are strictly sequential (initial MVP, clearly documented) |
|
||
| Load entire list into memory before display | Simple binding | `OutOfMemoryException` on libraries with 50k+ items | Only for lists known to be small and bounded (e.g., profiles list) |
|
||
| No `CancellationToken` propagation | Simpler method signatures | Operations cannot be cancelled; UI stuck waiting | Never for operations > 2 seconds |
|
||
| Hard-code English fallback strings in code | Quick to write | Breaks FR locale; strings diverge from key system | Never — always use resource keys |
|
||
|
||
---
|
||
|
||
## Integration Gotchas
|
||
|
||
| Integration | Common Mistake | Correct Approach |
|
||
|-------------|----------------|------------------|
|
||
| PnP Framework `GetContext` | Calling on UI thread synchronously | Always `await Task.Run(() => authManager.GetContext(...))` or use `GetContextAsync` |
|
||
| MSAL token cache (multi-tenant) | One `IPublicClientApplication` per call | One `IPublicClientApplication` per ClientId, long-lived, with `MsalCacheHelper` wired |
|
||
| SharePoint list enumeration | No `RowLimit` in `CamlQuery` | Always paginate with `RowLimit` ≤ 2 000 and `ListItemCollectionPosition` |
|
||
| Graph SDK paging | Calling `.GetAsync()` on collections without `$top` | Use `PageIterator` or explicit `.Top(n)` on every collection request |
|
||
| PnP `ExecuteQueryRetryAsync` | Forgetting to `await`; using synchronous `ExecuteQuery` | Always `await ctx.ExecuteQueryRetryAsync()` |
|
||
| WPF `ObservableCollection` | Modifying from `Task.Run` lambda | Collect into `List<T>`, then assign via `Dispatcher.InvokeAsync` |
|
||
| PnP Management Shell client ID | Using the shared PnP app ID in a multi-tenant production tool | Register a dedicated Azure AD app per deployment; don't rely on PnP's shared registration |
|
||
| SharePoint Search API (KQL) | No result limit, assuming all results returned | Always set `RowLimit`; results capped at 500 per page, max 50 000 total |
|
||
|
||
---
|
||
|
||
## Performance Traps
|
||
|
||
| Trap | Symptoms | Prevention | When It Breaks |
|
||
|------|----------|------------|----------------|
|
||
| Loading all `ObservableCollection` items before displaying any | UI freezes until entire operation completes | Use `IProgress<T>` to stream items as they arrive; enable UI virtualization | Any list > ~500 items |
|
||
| WPF virtualization disabled by `ScrollViewer.CanContentScroll=False` or grouping | DataGrid scroll is sluggish with 200+ rows | Never disable `CanContentScroll`; set `VirtualizingPanel.IsVirtualizingWhenGrouping=True` | > 200 rows in a DataGrid |
|
||
| Adding items to `ObservableCollection` one-by-one from background | Thousands of UI binding notifications; UI jank | Batch-load: assign `new ObservableCollection<T>(list)` once | > 50 items added in a loop |
|
||
| Permissions scan without depth limit | Scan takes hours on deep folder structures | Default depth 3–4; show estimated time; require explicit user override for deeper | Sites with > 5 folder levels |
|
||
| HTML report built entirely in memory | `OutOfMemoryException` or report generation takes minutes | Stream HTML to file; write rows as they are produced, not after full scan | > 10 000 rows in report |
|
||
| Sequential site processing for multi-site reports | Report for 20 sites takes 20× single-site time | Process up to 3 sites concurrently with `SemaphoreSlim`; show per-site progress | > 5 sites selected |
|
||
| Duplicate `Connect-PnPOnline` calls per operation | Redundant browser popups or token refreshes | Cache authenticated `ClientContext` per (tenant, clientId) for session lifetime | Any operation that reconnects unnecessarily |
|
||
|
||
---
|
||
|
||
## Security Mistakes
|
||
|
||
| Mistake | Risk | Prevention |
|
||
|---------|------|------------|
|
||
| Storing Client ID in plaintext JSON profile | Low on its own (Client ID is not a secret), but combined with tenant URL it eases targeted phishing | Document that Client ID is not a secret; optionally encrypt the profile file with DPAPI `ProtectedData.Protect` for defence-in-depth |
|
||
| Writing temp files with tenant credentials to `%TEMP%` | File readable by other processes on the same user account; not cleaned up on crash | Use `SecureString` in-memory for transient auth data; delete temp files in `finally` blocks; prefer named pipes or in-memory channels |
|
||
| No validation of tenant URL format before connecting | Typo sends auth token to wrong endpoint; user confused by misleading auth error | Validate against regex `^https://[a-zA-Z0-9-]+\.sharepoint\.com` before any connection attempt |
|
||
| Logging full exception messages that include HTTP request URLs | Tenant URLs and item paths exposed in log files readable on shared machines | Strip or redact SharePoint URLs in log output at `Debug` level; keep them out of `Information`-level user-visible logs |
|
||
| Bundling PnP Management Shell client ID (shared multi-tenant app) | App uses a shared identity not owned by the deploying organisation; harder to audit and revoke | Require each deployment to use a dedicated app registration; document the registration steps clearly |
|
||
|
||
---
|
||
|
||
## UX Pitfalls
|
||
|
||
| Pitfall | User Impact | Better Approach |
|
||
|---------|-------------|-----------------|
|
||
| No cancellation for operations > 5 seconds | User closes app via Task Manager; loses in-progress results; must restart | Every operation exposed in UI must accept a `CancellationToken`; show a "Cancel" button that is always enabled during operation |
|
||
| Progress bar with no ETA or item count | User cannot judge whether to wait or cancel | Show "Scanned X of Y sites" or "X items found"; update every 0.5 s minimum |
|
||
| Error messages showing raw exception text | Non-technical admin users see stack traces and `ServerException: CSOM call failed` | Translate known error types to plain-language messages; offer a "Copy technical details" link for support escalation |
|
||
| Silent success on bulk operations with partial failures | User thinks all 50 members were added; 12 failed silently | Show a per-item result summary: "38 added successfully, 12 failed — see details" |
|
||
| Language switches require app restart | FR-speaking users see flickering English then French on startup | Load correct language before any UI is shown; apply language from settings before `InitializeComponent` |
|
||
| Permissions report jargon ("Full Control", "Contribute", "Limited Access") shown raw | Non-technical stakeholders do not understand the report | Map SharePoint permission levels to plain-language equivalents in the report output; keep raw names in a "technical details" expandable section |
|
||
|
||
---
|
||
|
||
## "Looks Done But Isn't" Checklist
|
||
|
||
- [ ] **Multi-tenant session switching:** Verify that switching from Tenant A to Tenant B does not return Tenant A's data. Test with two real tenants, not two sites in the same tenant.
|
||
- [ ] **Operation cancellation:** Verify that pressing Cancel stops the operation within 2 seconds and leaves no zombie threads or unreleased `ClientContext` objects.
|
||
- [ ] **5 000+ item libraries:** Verify permissions report and storage scan complete without `ServerException` on a real library with > 5 000 items (not a test tenant with 50 items).
|
||
- [ ] **Self-contained EXE on clean machine:** Install the EXE on a machine with no .NET runtime installed; verify startup and a complete workflow before every release.
|
||
- [ ] **JSON file corruption recovery:** Corrupt a profile JSON file manually; verify the app starts, logs the corruption, does not silently return empty profiles, and preserves the backup.
|
||
- [ ] **Concurrent writes:** Simultaneously trigger "Save profile" and "Export settings" from two rapid button clicks; verify neither file is truncated.
|
||
- [ ] **Large HTML reports:** Generate a permissions report for a site with > 5 000 items; verify the HTML file opens in a browser in < 10 seconds and the DataGrid is scrollable.
|
||
- [ ] **FR locale completeness:** Switch to French; verify no UI string shows an untranslated key or hardcoded English text.
|
||
- [ ] **Throttling recovery:** Simulate a 429 response; verify the operation pauses, logs "Retrying in Xs", and completes successfully after the retry interval.
|
||
|
||
---
|
||
|
||
## Recovery Strategies
|
||
|
||
| Pitfall | Recovery Cost | Recovery Steps |
|
||
|---------|---------------|----------------|
|
||
| Async/sync deadlocks introduced in foundation | HIGH — requires refactoring all affected call chains | Identify all `.Result`/`.Wait()` calls with a codebase grep; convert bottom-up (services first, then ViewModels) |
|
||
| Silent failures ported from PowerShell | MEDIUM — requires audit of every catch block | Search all `catch` blocks; classify each as log-and-recover, log-and-rethrow, or log-and-surface; fix one feature at a time |
|
||
| Token cache corruption | LOW — clear the cache file and re-authenticate | Expose a "Clear cached sessions" action in the UI; document in troubleshooting guide |
|
||
| JSON profile file corruption | LOW if backup exists, HIGH if no backup | Implement write-then-replace before first release; add backup-on-corrupt logic to deserializer |
|
||
| WPF trimming breaks EXE | MEDIUM — need to republish with trimming disabled | Update publish profile, re-run publish, retest EXE on clean machine |
|
||
| Missing pagination on large lists | MEDIUM — need to refactor per-feature enumeration | Create shared pagination helper; replace calls feature by feature; test each against 6 000-item library |
|
||
|
||
---
|
||
|
||
## Pitfall-to-Phase Mapping
|
||
|
||
| Pitfall | Prevention Phase | Verification |
|
||
|---------|------------------|--------------|
|
||
| Sync/async deadlocks on UI thread | Phase 1: Foundation — establish async-first patterns | Code review checklist: no `.Result`/`.Wait()` in any ViewModel or event handler |
|
||
| Silent error suppression replication | Phase 1: Foundation — define error handling strategy and base types | Automated lint rule (Roslyn analyser or SonarQube) flagging empty catch blocks |
|
||
| SharePoint 5 000-item threshold | Phase 1: Foundation — write shared paginator; reused in all features | Integration test against 6 000-item library for every feature that enumerates lists |
|
||
| Multi-tenant token cache race | Phase 1: Foundation — auth layer with `MsalCacheHelper` | Test: two concurrent operations on different tenants return correct data |
|
||
| ObservableCollection cross-thread updates | Phase 1: Foundation — define progress-reporting pattern | Automated test: populate collection from background thread; verify no exception |
|
||
| WPF trimming breaks EXE | Final distribution phase | CI step: run published EXE on a clean Windows VM, assert startup and one workflow completes |
|
||
| Async void command handlers | Phase 1: Foundation — establish MVVM base with `AsyncRelayCommand` | Code review: no `async void` in ViewModel files |
|
||
| API throttling unhandled | Phase 1: Foundation — retry handler; applied by every feature | Load test: run storage scan against a tenant with rate-limiting; verify retry log entry |
|
||
| Resource disposal gaps | Phase 1: Foundation — context acquisition pattern | Unit test: cancel a long operation mid-run; verify `ClientContext.Dispose` called |
|
||
| JSON concurrent write corruption | Phase 1: Foundation — write-then-replace + `SemaphoreSlim` | Stress test: 100 concurrent save calls; verify file always parseable after all complete |
|
||
|
||
---
|
||
|
||
## Sources
|
||
|
||
- PnP Framework GitHub issue #961: `AuthenticationManager.GetContext` freeze in C# desktop app — https://github.com/pnp/pnpframework/issues/961
|
||
- PnP Framework GitHub issue #447: `AuthenticationManager.GetContext` hanging in ASP.NET — https://github.com/pnp/pnpframework/issues/447
|
||
- Microsoft Learn: Token cache serialization (MSAL.NET) — https://learn.microsoft.com/en-us/entra/msal/dotnet/how-to/token-cache-serialization
|
||
- Microsoft Learn: SharePoint Online list view threshold — https://learn.microsoft.com/en-us/troubleshoot/sharepoint/lists-and-libraries/items-exceeds-list-view-threshold
|
||
- Microsoft Learn: Single-file publishing overview — https://learn.microsoft.com/en-us/dotnet/core/deploying/single-file/overview
|
||
- dotnet/wpf GitHub issue #4216: `PublishTrimmed` causes `Unhandled Exception` in self-contained WPF app — https://github.com/dotnet/wpf/issues/4216
|
||
- dotnet/wpf GitHub issue #6096: Trimming for WPF — https://github.com/dotnet/wpf/issues/6096
|
||
- Microsoft .NET Blog: Await, and UI, and deadlocks — https://devblogs.microsoft.com/dotnet/await-and-ui-and-deadlocks-oh-my/
|
||
- Microsoft Learn: AsyncRelayCommand (CommunityToolkit.Mvvm) — https://learn.microsoft.com/en-us/dotnet/communitytoolkit/mvvm/asyncrelaycommand
|
||
- Microsoft Learn: Graph SDK paging — https://learn.microsoft.com/en-us/graph/sdks/paging
|
||
- Microsoft Learn: Graph throttling guidance — https://learn.microsoft.com/en-us/graph/throttling
|
||
- Rick Strahl's Web Log: Async and Async Void Event Handling in WPF — https://weblog.west-wind.com/posts/2022/Apr/22/Async-and-Async-Void-Event-Handling-in-WPF
|
||
- Existing codebase CONCERNS.md audit (2026-04-02) — `.planning/codebase/CONCERNS.md`
|
||
|
||
---
|
||
|
||
*Pitfalls research for: C#/WPF SharePoint Online administration desktop tool (PowerShell-to-C# rewrite)*
|
||
*Researched: 2026-04-02*
|