Sharepoint-Toolbox/.planning/research/PITFALLS.md
2026-04-08 10:57:27 +02:00

Pitfalls Research

Domain: C#/WPF SharePoint Online administration desktop tool (PowerShell-to-C# rewrite)
Researched: 2026-04-02
Confidence: HIGH (critical pitfalls verified via official docs, PnP GitHub issues, and known existing codebase problems)


Critical Pitfalls

Pitfall 1: Calling PnP/CSOM Methods Synchronously on the UI Thread

What goes wrong: AuthenticationManager.GetContext(), ExecuteQuery(), and similar PnP Framework / CSOM calls are blocking network operations. If called directly on the WPF UI thread — even inside a button click handler — the entire window freezes until the call completes. This is precisely what causes the UI freezes in the current PowerShell app, and the problem migrates verbatim into C# if async patterns are not used from day one.

A subtler variant: using .Result or .Wait() on a Task from the UI thread. The UI thread holds a SynchronizationContext; the async continuation needs that same context to resume; deadlock ensues. The application hangs with no exception and no feedback.

Why it happens: Developers migrating from PowerShell think in sequential terms and instinctively port one-liner calls directly to event handler bodies. The WPF framework does not prevent synchronous blocking — it just stops processing messages, which looks like a freeze.

How to avoid:

  • Every SharePoint/PnP call must be wrapped in await Task.Run(...) or use the async overloads directly (ExecuteQueryRetryAsync, GetContextAsync).
  • Never use .Result, .Wait(), or Task.GetAwaiter().GetResult() on the UI thread.
  • Establish a project-wide convention: all ViewModels execute SharePoint operations through async Task methods with CancellationToken parameters. Codify this in architecture docs from Phase 1.
  • Use ConfigureAwait(false) in all service/repository layer code (below ViewModel level) so continuations do not need to return to the UI thread unnecessarily.
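
The convention above can be sketched as a service-layer method. This is illustrative only; SharePointService and GetWebTitleAsync are invented names, while AuthenticationManager, ClientContext, and ExecuteQueryRetryAsync are the PnP Framework / CSOM types the text refers to.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SharePoint.Client;
using PnP.Framework;

public class SharePointService
{
    private readonly AuthenticationManager _authManager;

    public SharePointService(AuthenticationManager authManager) => _authManager = authManager;

    // Service layer: async end to end, with ConfigureAwait(false) because
    // nothing below the ViewModel needs to resume on the UI thread.
    public async Task<string> GetWebTitleAsync(string siteUrl, CancellationToken token)
    {
        token.ThrowIfCancellationRequested();
        using ClientContext ctx = await _authManager.GetContextAsync(siteUrl).ConfigureAwait(false);
        ctx.Load(ctx.Web, w => w.Title);
        await ctx.ExecuteQueryRetryAsync().ConfigureAwait(false);
        return ctx.Web.Title;
    }
}
```

A ViewModel then simply writes string title = await _service.GetWebTitleAsync(url, token); with no .Result and no .Wait() anywhere in the chain.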

Warning signs:

  • Any void method containing a PnP call.
  • Any Task.Result or .Wait() in ViewModel or code-behind.
  • Button click handlers that are not async.
  • Application hangs for seconds at a time when switching tenants or starting operations.

Phase to address: Foundation/infrastructure phase (first phase). This pattern must be established before any feature work begins. Retrofitting async throughout a codebase is one of the most expensive rewrites possible.


Pitfall 2: Replicating Silent Error Suppression from the PowerShell Original

What goes wrong: The existing codebase has 38 empty catch blocks and 27 instances of -ErrorAction SilentlyContinue. During a rewrite, developers under time pressure port the "working" behavior, which means they replicate the silent failures. The C# version appears to work in demos but hides the same class of bugs: group member additions that silently did nothing, storage scans that silently skipped folders, JSON loads that silently returned empty defaults from corrupted files.

Why it happens: Port-from-working-code instinct. The original returned a result (even if wrong), so the C# version is written to also return a result without questioning whether an error was swallowed. Also, try { ... } catch (Exception) { } in C# is syntactically shorter and less ceremonial than PowerShell's equivalent, making it easy to write reflexively.

How to avoid:

  • Treat every catch block as code that requires a positive decision: log and recover, log and rethrow, or log and surface to the user. A catch that does none of these three things is a bug.
  • Adopt a structured logging pattern (e.g., ILogger<T> with Microsoft.Extensions.Logging) from Phase 1 so logging is never optional.
  • Create a custom SharePointOperationException hierarchy that preserves original exceptions and adds context (which site, which operation, which user) before rethrowing. This prevents exception swallowing during the port.
  • In PR reviews, flag any empty or log-only catch blocks that do not surface the error to the user as a defect.
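
A minimal sketch of the context-preserving exception type suggested above (the class name matches the text; the exact properties are an assumption):

```csharp
using System;

// Context-preserving exception: the original failure is kept as InnerException,
// and operation/site context is added for logs and user-facing messages.
public class SharePointOperationException : Exception
{
    public string Operation { get; }
    public string SiteUrl { get; }

    public SharePointOperationException(string operation, string siteUrl, Exception inner)
        : base($"Operation '{operation}' failed against {siteUrl}.", inner)
    {
        Operation = operation;
        SiteUrl = siteUrl;
    }
}
```

A port-time catch then becomes a positive decision rather than a swallow: log with the original exception, then throw new SharePointOperationException("AddGroupMember", siteUrl, ex) so the failure surfaces to the caller with context intact.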

Warning signs:

  • Any catch (Exception ex) { } with no body.
  • Any catch block that only calls _logger.LogWarning but returns a success result to the caller.
  • Operations that complete in < 1 second when they should take 5–10 seconds (silent skip).
  • Users reporting "the button did nothing" with no error shown.

Phase to address: Foundation/infrastructure phase. Define the error handling strategy and base exception types before porting any features.


Pitfall 3: SharePoint List View Threshold (5 000 Items) Causing Unhandled Exceptions

What goes wrong: Any CSOM or PnP Framework call that queries a SharePoint list without explicit pagination throws a Microsoft.SharePoint.Client.ServerException with message "The attempted operation is prohibited because it exceeds the list view threshold" when the list contains more than 5 000 items. In the current PowerShell code this is partially masked by -ErrorAction SilentlyContinue. In C# it becomes an unhandled exception that crashes the operation unless explicitly caught and handled.

Real tenant libraries with 5 000+ files are common. Permissions reports, storage scans, and file search are all affected.

Why it happens: Developers test against small tenant sites during development. The threshold is not hit, tests pass, the feature ships. First production use against a real client library fails.

How to avoid:

  • All GetItems, GetListItems, and folder-enumeration calls must use CamlQuery with RowLimit set to a page size (500–2 000), iterating with ListItemCollectionPosition until exhausted.
  • For Graph SDK paths, use the PageIterator pattern; never call .GetAsync() on a collection without a $top parameter.
  • The storage recursion function (Collect-FolderStorage equivalent) must default to depth 3–4, not 999, and show estimated time before starting.
  • Write an integration test against a seeded list of 6 000 items before shipping each feature that enumerates list items.
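
The shared pagination helper could look roughly like this (a sketch; the helper name and page size are assumptions, while CamlQuery, RowLimit, and ListItemCollectionPosition are the actual CSOM paging mechanism):

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SharePoint.Client;

public static class ListPaginator
{
    public static async Task<List<ListItem>> GetAllItemsPagedAsync(
        ClientContext ctx, List list, int pageSize = 2000, CancellationToken token = default)
    {
        var results = new List<ListItem>();
        var query = new CamlQuery
        {
            // Scope='RecursiveAll' also returns items inside subfolders.
            ViewXml = $"<View Scope='RecursiveAll'><RowLimit>{pageSize}</RowLimit></View>"
        };

        do
        {
            token.ThrowIfCancellationRequested();
            ListItemCollection page = list.GetItems(query);
            ctx.Load(page);
            await ctx.ExecuteQueryRetryAsync();
            results.AddRange(page);
            // Position is null once the final page has been read.
            query.ListItemCollectionPosition = page.ListItemCollectionPosition;
        }
        while (query.ListItemCollectionPosition != null);

        return results;
    }
}
```

Because each page stays under the threshold, this never triggers the list view threshold exception regardless of total list size.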

Warning signs:

  • Any GetItems call without a CamlQuery with explicit RowLimit.
  • Any Graph SDK call to list items without .Top(n).
  • ServerException appearing in logs from client sites but not in dev testing.

Phase to address: Each feature phase that touches list enumeration (permissions, storage, file search). The pagination helper should be a shared utility written in the foundation phase and reused everywhere.


Pitfall 4: Multi-Tenant Token Cache Race Conditions and Stale Tokens

What goes wrong: The design requires cached authentication sessions so users can switch between client tenants without re-authenticating. MSAL.NET token caches are not thread-safe by default. If two background operations run concurrently against different tenants, cache read/write races produce corrupted cache state, silent auth failures, or one tenant's token being used for another tenant's request.

A secondary problem: when an Azure AD app registration's permissions change (e.g., a new Graph scope is granted), MSAL returns the cached token for the old scope. The operation fails with a 403 but looks like a permissions error, not a stale cache error, sending the developer on a false debugging path.

Why it happens: Multi-tenant caching is not covered in most MSAL.NET tutorials, which show single-tenant flows. The token cache API (TokenCacheCallback, BeforeAccessNotification, AfterAccessNotification) is low-level and easy to implement incorrectly.

How to avoid:

  • Use Microsoft.Identity.Client.Extensions.Msal (MsalCacheHelper) for file-based, cross-process-safe token persistence. This is the Microsoft-recommended approach for desktop public client apps.
  • The AuthenticationManager instance in PnP Framework accepts a tokenCacheCallback; wire it to MsalCacheHelper so cache is persisted safely per-tenant.
  • Scope the IPublicClientApplication instance per-ClientId (app registration), not per-tenant URL. Different tenants share the same client app but have different account entries in the cache.
  • Implement an explicit "clear cache for tenant" action in the UI so users can force re-authentication when permissions change.
  • Never share a single AuthenticationManager instance across concurrent operations on different tenants without locking.
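
Wiring the file-based cache might look like this (a sketch assuming Microsoft.Identity.Client and Microsoft.Identity.Client.Extensions.Msal; the cache file name and folder are illustrative choices):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Identity.Client;
using Microsoft.Identity.Client.Extensions.Msal;

public static class AuthSetup
{
    // One long-lived IPublicClientApplication per ClientId, shared across tenants.
    public static async Task<IPublicClientApplication> CreateAppAsync(string clientId)
    {
        IPublicClientApplication app = PublicClientApplicationBuilder
            .Create(clientId)
            .WithAuthority(AzureCloudInstance.AzurePublic, "organizations")
            .WithDefaultRedirectUri()
            .Build();

        var storageProperties = new StorageCreationPropertiesBuilder(
                "msal_cache.dat",
                Path.Combine(
                    Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
                    "SharepointToolbox"))
            .Build();

        // MsalCacheHelper serializes cache access safely across threads/processes.
        MsalCacheHelper cacheHelper = await MsalCacheHelper.CreateAsync(storageProperties);
        cacheHelper.RegisterCache(app.UserTokenCache);
        return app;
    }
}
```

Each tenant then appears as a separate account entry inside the one cache, which is what allows safe switching without re-authentication.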

Warning signs:

  • Intermittent 401 or 403 errors that resolve after restarting the app.
  • User reports "wrong tenant data shown" (cross-tenant token bleed).
  • MsalUiRequiredException thrown only on the second or third operation of a session.

Phase to address: Authentication/multi-tenant infrastructure phase (early, before any feature uses the auth layer).


Pitfall 5: WPF ObservableCollection Updates from Background Threads

What goes wrong: Populating a DataGrid or ListView bound to an ObservableCollection<T> from a background Task or Task.Run throws a NotSupportedException: "This type of CollectionView does not support changes to its SourceCollection from a thread different from the Dispatcher thread." The exception crashes the background operation. If it is swallowed (see Pitfall 2), the UI simply does not update.

This maps directly to the current app's runspace-to-UI communication via synchronized hashtables polled by a timer. The C# version must marshal updates through the Dispatcher or an equivalent MVVM-toolkit mechanism.

Why it happens: In a Task.Run lambda, the continuation runs on a thread pool thread, not the UI thread. Developers add items to the collection inside that lambda. It works in small-scale testing (the timing happens to avoid the race) but fails reliably under load.

How to avoid:

  • Never add items to an ObservableCollection<T> from a non-UI thread.
  • Preferred pattern: collect results into a plain List<T> on the background thread, then await Application.Current.Dispatcher.InvokeAsync(() => { Items = new ObservableCollection<T>(list); }) in one atomic swap.
  • For streaming progress (show items as they arrive), use BindingOperations.EnableCollectionSynchronization with a lock object at initialization, then add items with the lock held.
  • Use IProgress<T> with Progress<T> (captures the UI SynchronizationContext at construction) to report incremental results safely.
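
Both safe patterns can be sketched as follows (statements assumed to live inside an async ViewModel method; ScanFolders, FolderInfo, and Results are illustrative names, not WPF APIs):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Threading.Tasks;
using System.Windows;

// (1) Atomic swap: collect off-thread into a plain list, publish once on the UI thread.
List<FolderInfo> items = await Task.Run(() => ScanFolders(token));
await Application.Current.Dispatcher.InvokeAsync(
    () => Results = new ObservableCollection<FolderInfo>(items));

// (2) Streaming: Progress<T> captures the UI SynchronizationContext at
// construction, so the callback always runs on the UI thread.
var progress = new Progress<FolderInfo>(item => Results.Add(item));
await Task.Run(() => ScanFolders(token, progress));
```

Pattern (1) is simplest for short operations; pattern (2) is preferable for long scans where users need to see results arriving.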

Warning signs:

  • InvalidOperationException or NotSupportedException in logs referencing CollectionView.
  • UI lists that do not update despite background operation completing.
  • Items appearing out of order or partially in lists.

Phase to address: Foundation/infrastructure phase. Define the progress-reporting and collection-update patterns before porting any feature that returns lists of results.


Pitfall 6: WPF Trimming Breaks Self-Contained EXE

What goes wrong: Publishing a WPF app as a self-contained single EXE with PublishTrimmed=true silently removes types that WPF and XAML use via reflection at runtime. The app compiles and publishes successfully but crashes at startup or throws TypeInitializationException when opening a window whose XAML references a type that was trimmed. PnP Framework and MSAL also use reflection heavily; trimming removes their internal types.

Why it happens: The .NET trimmer performs static analysis and removes code it cannot prove is referenced. XAML data binding, converters, DataTemplateSelector, IValueConverter, and DynamicResource are resolved at runtime via reflection — the trimmer cannot see these references.

How to avoid:

  • Do not use PublishTrimmed=true for WPF + PnP Framework + MSAL projects. The EXE will be larger (~150 MB self-contained is expected and acceptable per PROJECT.md).
  • Use PublishSingleFile=true with SelfContained=true and IncludeAllContentForSelfExtract=true, but without trimming. This bundles the runtime into the EXE correctly.
  • Verify the single-file output in CI by running the EXE on a clean machine (no .NET installed) before each release.
  • Set <PublishReadyToRun>true</PublishReadyToRun> for startup performance improvement instead of trimming.
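
The resulting publish settings might look like this in the csproj or .pubxml (a sketch; all four properties are real MSBuild publish flags, the runtime identifier is an assumption):

```xml
<!-- Publish configuration per the guidance above: single self-contained EXE, no trimming. -->
<PropertyGroup>
  <PublishSingleFile>true</PublishSingleFile>
  <SelfContained>true</SelfContained>
  <RuntimeIdentifier>win-x64</RuntimeIdentifier>
  <IncludeAllContentForSelfExtract>true</IncludeAllContentForSelfExtract>
  <PublishReadyToRun>true</PublishReadyToRun>
  <!-- Deliberately absent: PublishTrimmed. Trimming breaks WPF/PnP/MSAL reflection. -->
</PropertyGroup>
```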

Warning signs:

  • Publish profile has <PublishTrimmed>true</PublishTrimmed>.
  • "Works on dev machine, crashes on client machine" with TypeInitializationException or MissingMethodException.
  • EXE is suspiciously small (< 50 MB for a self-contained WPF app).

Phase to address: Distribution/packaging phase. Establish the publish profile with correct flags before any release packaging work.


Pitfall 7: Async Void in Command Handlers Swallows Exceptions

What goes wrong: In WPF, button Click event handlers are void-returning delegates. Developers writing async void handlers (e.g., private async void OnRunButtonClick(...)) create methods where exceptions thrown after an await are raised on the SynchronizationContext rather than returned as a faulted Task. These exceptions cannot be caught by a caller and will crash the process (or be silently eaten by Application.DispatcherUnhandledException without the stack context needed to debug them).

Why it happens: MVVM ICommand requires a void Execute(object parameter) signature. New C# developers write async void Execute(...) without understanding the consequence. The CommunityToolkit.Mvvm provides AsyncRelayCommand to solve this correctly, but it is not the obvious choice.

How to avoid:

  • Never write async void anywhere in the codebase except the required WPF event handler entry points in code-behind, and only when those entry points immediately delegate to an async Task ViewModel method.
  • Use AsyncRelayCommand from CommunityToolkit.Mvvm for all commands that invoke async operations. It wraps the Task, exposes ExecutionTask, IsRunning, and IsCancellationRequested, and handles exceptions via AsyncRelayCommandOptions.FlowExceptionsToTaskScheduler.
  • Wire a global Application.DispatcherUnhandledException handler and TaskScheduler.UnobservedTaskException handler that log full stack traces and show a user-facing error dialog. This is the last line of defense.
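
With the CommunityToolkit.Mvvm source generators, the command pattern above reduces to a sketch like this (ScanViewModel and _scanService are illustrative; the [RelayCommand] attribute and generated AsyncRelayCommand are real toolkit features):

```csharp
using System.Threading;
using System.Threading.Tasks;
using CommunityToolkit.Mvvm.ComponentModel;
using CommunityToolkit.Mvvm.Input;

public partial class ScanViewModel : ObservableObject
{
    // [RelayCommand] generates an AsyncRelayCommand named RunScanCommand;
    // IncludeCancelCommand also generates a RunScanCancelCommand for the UI.
    [RelayCommand(IncludeCancelCommand = true)]
    private async Task RunScanAsync(CancellationToken token)
    {
        // Exceptions thrown here fault the command's ExecutionTask rather than
        // escaping through an async void handler.
        await _scanService.RunAsync(token);
    }
}
```

XAML then binds directly to RunScanCommand, and no async void ever appears in the ViewModel.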

Warning signs:

  • Any async void method outside of a MainWindow.xaml.cs entry point.
  • Commands implemented as async void Execute(...) in ViewModels.
  • Exceptions that appear in logs with no originating ViewModel context.

Phase to address: Foundation/infrastructure phase (MVVM base classes and command patterns established before any feature code).


Pitfall 8: SharePoint API Throttling Not Handled (429/503)

What goes wrong: SharePoint Online and Microsoft Graph enforce per-app, per-tenant throttling. Bulk operations (permissions scan across 50+ sites, storage scan on 10 000+ folders, bulk member additions) generate enough API calls to trigger HTTP 429 or 503 responses. Without explicit retry-after handling, the operation fails partway through with an unhandled HttpRequestException and leaves the user with partial results and no indication of how to resume.

Why it happens: PnP.PowerShell handled this invisibly for the PowerShell app. PnP Framework in C# does have built-in retry via ExecuteQueryRetryAsync, but developers unfamiliar with C#-side PnP may use the raw CSOM ExecuteQuery() or direct HttpClient calls that lack this protection.

How to avoid:

  • Always use ExecuteQueryRetryAsync (never ExecuteQuery) for all CSOM batch calls.
  • When using Graph SDK, use the GraphServiceClient with the default retry handler enabled — it handles 429 with Retry-After header respect automatically.
  • For multi-site bulk operations, add a short delay (100–300 ms) between site connections to avoid burst throttling. Implement a configurable concurrency limit (default: sequential or max 3 parallel).
  • Surface throttling events in the progress log: "Rate limited, retrying in 15s…" so the user knows the operation is paused, not hung.
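
The bounded-concurrency loop for multi-site operations can be sketched as follows (ProcessSiteAsync and siteUrls are illustrative; SemaphoreSlim and Task.WhenAll are the actual .NET primitives):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Cap concurrency at 3 parallel sites, as suggested above.
var gate = new SemaphoreSlim(3);
IEnumerable<Task> tasks = siteUrls.Select(async url =>
{
    await gate.WaitAsync(token);
    try
    {
        await ProcessSiteAsync(url, token);  // uses ExecuteQueryRetryAsync internally
        await Task.Delay(200, token);        // short gap between sites to avoid burst throttling
    }
    finally
    {
        gate.Release();
    }
});
await Task.WhenAll(tasks);
```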

Warning signs:

  • Raw ExecuteQuery() calls anywhere in the codebase.
  • HttpRequestException with 429 status in logs.
  • Operations that fail consistently at the same approximate item count across multiple runs.

Phase to address: Foundation/infrastructure phase for the retry handler; each feature phase must use the established pattern.


Pitfall 9: Resource Disposal Gaps in Long-Running Operations

What goes wrong: ClientContext objects returned by AuthenticationManager.GetContext() are IDisposable. If a background Task is cancelled or throws an exception mid-operation, a ClientContext created in the try block is not disposed if the finally block is missing. Over a long session (MSP workflow: dozens of tenant switches, multiple scans), leaked ClientContext objects accumulate unmanaged resources and eventually cause connection refusals or memory degradation. This is the C# equivalent of the runspace disposal gaps in the current codebase.

Why it happens: using statements are the idiomatic C# solution, but they do not compose well with async cancellation. Developers use try/catch without finally, or structure the code so the using scope is exited before the Task completes.

How to avoid:

  • Always obtain ClientContext inside a using statement: using var ctx = await authManager.GetContextAsync(url). (ClientContext implements IDisposable, not IAsyncDisposable, so plain using rather than await using is the correct form.)
  • Wrap the entire operation body in try/finally with disposal in the finally block when await using is not applicable.
  • When a CancellationToken is triggered, let the OperationCanceledException propagate naturally; the using / finally will still execute.
  • Add a unit test for the "cancelled mid-operation" path that verifies ClientContext.Dispose() is called.
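
The disposal-safe acquisition pattern can be sketched like this (GetStorageUsedAsync is an illustrative method name; ClientContext, Site.Usage, and ExecuteQueryRetryAsync are real CSOM/PnP members):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SharePoint.Client;
using PnP.Framework;

public class StorageService
{
    private readonly AuthenticationManager _authManager;

    public StorageService(AuthenticationManager authManager) => _authManager = authManager;

    public async Task<long> GetStorageUsedAsync(string siteUrl, CancellationToken token)
    {
        // 'using' guarantees Dispose runs even when OperationCanceledException
        // or a ServerException propagates out of the method.
        using ClientContext ctx = await _authManager.GetContextAsync(siteUrl);
        ctx.Load(ctx.Site, s => s.Usage);
        token.ThrowIfCancellationRequested();
        await ctx.ExecuteQueryRetryAsync();
        return ctx.Site.Usage.Storage;
    }
}
```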

Warning signs:

  • GetContext calls without using.
  • catch (Exception) { return; } that bypasses a ClientContext created earlier in the method.
  • Memory growth over a multi-hour MSP session visible in Task Manager.

Phase to address: Foundation/infrastructure phase (define the context acquisition pattern) and validated in each feature phase.


Pitfall 10: JSON Settings Corruption on Concurrent Writes

What goes wrong: The app writes profiles, settings, and templates to JSON files on disk. If the user triggers two rapid operations (e.g., saves a profile while a background scan completes and updates settings), both code paths may attempt to write the same file simultaneously. The second write overwrites a partially-written first write, producing a truncated or syntactically invalid JSON file. On next startup, the file fails to parse and silently returns empty defaults — erasing all user profiles.

This is a known bug in the current app (CONCERNS.md: "Profile JSON file: no transaction semantics").

Why it happens: File I/O is not inherently thread-safe. System.Text.Json's JsonSerializer.SerializeAsync writes to a stream but does not protect the file from concurrent access by another code path.

How to avoid:

  • Serialize all writes to each JSON file through a single SemaphoreSlim(1) per file. Acquire before reading or writing, release in finally.
  • Use write-then-replace: write to filename.tmp, validate the JSON by deserializing it, then File.Move(tmp, original, overwrite: true). An interrupted write leaves the original intact.
  • On startup, if the primary file is invalid, check for a .tmp or .bak version before falling back to defaults — and log which fallback was used.
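
A minimal sketch of the SemaphoreSlim plus write-then-replace strategy (JsonFileStore and its member names are illustrative; System.Text.Json and File.Move with overwrite are the actual .NET APIs):

```csharp
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public static class JsonFileStore
{
    private static readonly SemaphoreSlim Gate = new(1, 1); // one writer at a time

    public static async Task SaveAsync<T>(string path, T data)
    {
        await Gate.WaitAsync();
        try
        {
            string tmp = path + ".tmp";
            string json = JsonSerializer.Serialize(data);
            JsonSerializer.Deserialize<T>(json);       // validate before replacing
            await File.WriteAllTextAsync(tmp, json);
            File.Move(tmp, path, overwrite: true);     // interrupted write leaves original intact
        }
        finally
        {
            Gate.Release();
        }
    }
}
```

In practice one SemaphoreSlim instance should be scoped per file path (e.g. via a ConcurrentDictionary keyed on path) so writes to different files do not serialize against each other.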

Warning signs:

  • Profile file occasionally empty after normal use.
  • JsonException on startup that the user cannot reproduce on demand.
  • App loaded with correct profiles yesterday, empty profiles today.

Phase to address: Foundation/infrastructure phase (data access layer). Must be solved before any feature persists data.


Technical Debt Patterns

| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
| --- | --- | --- | --- |
| Copy PowerShell logic verbatim into a Task.Run | Fast initial port, works locally | Inherits all silent failures, no cancellation, no progress reporting | Never — always re-examine the logic |
| async void command handlers | Compiles and runs | Exceptions crash app silently; no cancellation propagation | Only for WPF event entry points that immediately call async Task |
| Direct ExecuteQuery() without retry | Simpler call site | Crashes on throttling for real client tenants | Never — use ExecuteQueryRetryAsync |
| Single shared AuthenticationManager instance | Simple instantiation | Token cache race conditions under concurrent operations | Only if all operations are strictly sequential (initial MVP, clearly documented) |
| Load entire list into memory before display | Simple binding | OutOfMemoryException on libraries with 50k+ items | Only for lists known to be small and bounded (e.g., profiles list) |
| No CancellationToken propagation | Simpler method signatures | Operations cannot be cancelled; UI stuck waiting | Never for operations > 2 seconds |
| Hard-code English fallback strings in code | Quick to write | Breaks FR locale; strings diverge from key system | Never — always use resource keys |

Integration Gotchas

| Integration | Common Mistake | Correct Approach |
| --- | --- | --- |
| PnP Framework GetContext | Calling on UI thread synchronously | Always await Task.Run(() => authManager.GetContext(...)) or use GetContextAsync |
| MSAL token cache (multi-tenant) | One IPublicClientApplication per call | One IPublicClientApplication per ClientId, long-lived, with MsalCacheHelper wired |
| SharePoint list enumeration | No RowLimit in CamlQuery | Always paginate with RowLimit ≤ 2 000 and ListItemCollectionPosition |
| Graph SDK paging | Calling .GetAsync() on collections without $top | Use PageIterator or explicit .Top(n) on every collection request |
| PnP ExecuteQueryRetryAsync | Forgetting to await; using synchronous ExecuteQuery | Always await ctx.ExecuteQueryRetryAsync() |
| WPF ObservableCollection | Modifying from Task.Run lambda | Collect into List<T>, then assign via Dispatcher.InvokeAsync |
| PnP Management Shell client ID | Using the shared PnP app ID in a multi-tenant production tool | Register a dedicated Azure AD app per deployment; don't rely on PnP's shared registration |
| SharePoint Search API (KQL) | No result limit, assuming all results returned | Always set RowLimit; results capped at 500 per page, max 50 000 total |

Performance Traps

| Trap | Symptoms | Prevention | When It Breaks |
| --- | --- | --- | --- |
| Loading all ObservableCollection items before displaying any | UI freezes until entire operation completes | Use IProgress<T> to stream items as they arrive; enable UI virtualization | Any list > ~500 items |
| WPF virtualization disabled by ScrollViewer.CanContentScroll=False or grouping | DataGrid scroll is sluggish with 200+ rows | Never disable CanContentScroll; set VirtualizingPanel.IsVirtualizingWhenGrouping=True | > 200 rows in a DataGrid |
| Adding items to ObservableCollection one-by-one from background | Thousands of UI binding notifications; UI jank | Batch-load: assign new ObservableCollection<T>(list) once | > 50 items added in a loop |
| Permissions scan without depth limit | Scan takes hours on deep folder structures | Default depth 3–4; show estimated time; require explicit user override for deeper | Sites with > 5 folder levels |
| HTML report built entirely in memory | OutOfMemoryException or report generation takes minutes | Stream HTML to file; write rows as they are produced, not after full scan | > 10 000 rows in report |
| Sequential site processing for multi-site reports | Report for 20 sites takes 20× single-site time | Process up to 3 sites concurrently with SemaphoreSlim; show per-site progress | > 5 sites selected |
| Duplicate Connect-PnPOnline calls per operation | Redundant browser popups or token refreshes | Cache authenticated ClientContext per (tenant, clientId) for session lifetime | Any operation that reconnects unnecessarily |

Security Mistakes

| Mistake | Risk | Prevention |
| --- | --- | --- |
| Storing Client ID in plaintext JSON profile | Low on its own (Client ID is not a secret), but combined with tenant URL it eases targeted phishing | Document that Client ID is not a secret; optionally encrypt the profile file with DPAPI ProtectedData.Protect for defence-in-depth |
| Writing temp files with tenant credentials to %TEMP% | File readable by other processes on the same user account; not cleaned up on crash | Use SecureString in-memory for transient auth data; delete temp files in finally blocks; prefer named pipes or in-memory channels |
| No validation of tenant URL format before connecting | Typo sends auth token to wrong endpoint; user confused by misleading auth error | Validate against regex ^https://[a-zA-Z0-9-]+\.sharepoint\.com before any connection attempt |
| Logging full exception messages that include HTTP request URLs | Tenant URLs and item paths exposed in log files readable on shared machines | Strip or redact SharePoint URLs in log output at Debug level; keep them out of Information-level user-visible logs |
| Bundling PnP Management Shell client ID (shared multi-tenant app) | App uses a shared identity not owned by the deploying organisation; harder to audit and revoke | Require each deployment to use a dedicated app registration; document the registration steps clearly |

UX Pitfalls

| Pitfall | User Impact | Better Approach |
| --- | --- | --- |
| No cancellation for operations > 5 seconds | User closes app via Task Manager; loses in-progress results; must restart | Every operation exposed in UI must accept a CancellationToken; show a "Cancel" button that is always enabled during operation |
| Progress bar with no ETA or item count | User cannot judge whether to wait or cancel | Show "Scanned X of Y sites" or "X items found"; update every 0.5 s minimum |
| Error messages showing raw exception text | Non-technical admin users see stack traces and ServerException: CSOM call failed | Translate known error types to plain-language messages; offer a "Copy technical details" link for support escalation |
| Silent success on bulk operations with partial failures | User thinks all 50 members were added; 12 failed silently | Show a per-item result summary: "38 added successfully, 12 failed — see details" |
| Language switches require app restart | FR-speaking users see flickering English then French on startup | Load correct language before any UI is shown; apply language from settings before InitializeComponent |
| Permissions report jargon ("Full Control", "Contribute", "Limited Access") shown raw | Non-technical stakeholders do not understand the report | Map SharePoint permission levels to plain-language equivalents in the report output; keep raw names in a "technical details" expandable section |

"Looks Done But Isn't" Checklist

  • Multi-tenant session switching: Verify that switching from Tenant A to Tenant B does not return Tenant A's data. Test with two real tenants, not two sites in the same tenant.
  • Operation cancellation: Verify that pressing Cancel stops the operation within 2 seconds and leaves no zombie threads or unreleased ClientContext objects.
  • 5 000+ item libraries: Verify permissions report and storage scan complete without ServerException on a real library with > 5 000 items (not a test tenant with 50 items).
  • Self-contained EXE on clean machine: Install the EXE on a machine with no .NET runtime installed; verify startup and a complete workflow before every release.
  • JSON file corruption recovery: Corrupt a profile JSON file manually; verify the app starts, logs the corruption, does not silently return empty profiles, and preserves the backup.
  • Concurrent writes: Simultaneously trigger "Save profile" and "Export settings" from two rapid button clicks; verify neither file is truncated.
  • Large HTML reports: Generate a permissions report for a site with > 5 000 items; verify the HTML file opens in a browser in < 10 seconds and the DataGrid is scrollable.
  • FR locale completeness: Switch to French; verify no UI string shows an untranslated key or hardcoded English text.
  • Throttling recovery: Simulate a 429 response; verify the operation pauses, logs "Retrying in Xs", and completes successfully after the retry interval.

Recovery Strategies

| Pitfall | Recovery Cost | Recovery Steps |
| --- | --- | --- |
| Async/sync deadlocks introduced in foundation | HIGH — requires refactoring all affected call chains | Identify all .Result/.Wait() calls with a codebase grep; convert bottom-up (services first, then ViewModels) |
| Silent failures ported from PowerShell | MEDIUM — requires audit of every catch block | Search all catch blocks; classify each as log-and-recover, log-and-rethrow, or log-and-surface; fix one feature at a time |
| Token cache corruption | LOW — clear the cache file and re-authenticate | Expose a "Clear cached sessions" action in the UI; document in troubleshooting guide |
| JSON profile file corruption | LOW if backup exists, HIGH if no backup | Implement write-then-replace before first release; add backup-on-corrupt logic to deserializer |
| WPF trimming breaks EXE | MEDIUM — need to republish with trimming disabled | Update publish profile, re-run publish, retest EXE on clean machine |
| Missing pagination on large lists | MEDIUM — need to refactor per-feature enumeration | Create shared pagination helper; replace calls feature by feature; test each against 6 000-item library |

Pitfall-to-Phase Mapping

| Pitfall | Prevention Phase | Verification |
| --- | --- | --- |
| Sync/async deadlocks on UI thread | Phase 1: Foundation — establish async-first patterns | Code review checklist: no .Result/.Wait() in any ViewModel or event handler |
| Silent error suppression replication | Phase 1: Foundation — define error handling strategy and base types | Automated lint rule (Roslyn analyser or SonarQube) flagging empty catch blocks |
| SharePoint 5 000-item threshold | Phase 1: Foundation — write shared paginator; reused in all features | Integration test against 6 000-item library for every feature that enumerates lists |
| Multi-tenant token cache race | Phase 1: Foundation — auth layer with MsalCacheHelper | Test: two concurrent operations on different tenants return correct data |
| ObservableCollection cross-thread updates | Phase 1: Foundation — define progress-reporting pattern | Automated test: populate collection from background thread; verify no exception |
| WPF trimming breaks EXE | Final distribution phase | CI step: run published EXE on a clean Windows VM, assert startup and one workflow completes |
| Async void command handlers | Phase 1: Foundation — establish MVVM base with AsyncRelayCommand | Code review: no async void in ViewModel files |
| API throttling unhandled | Phase 1: Foundation — retry handler; applied by every feature | Load test: run storage scan against a tenant with rate-limiting; verify retry log entry |
| Resource disposal gaps | Phase 1: Foundation — context acquisition pattern | Unit test: cancel a long operation mid-run; verify ClientContext.Dispose called |
| JSON concurrent write corruption | Phase 1: Foundation — write-then-replace + SemaphoreSlim | Stress test: 100 concurrent save calls; verify file always parseable after all complete |

Sources





v2.2 Pitfalls: Report Branding & User Directory

Milestone: v2.2 — HTML report branding (MSP/client logos) + user directory browse mode
Researched: 2026-04-08
Confidence: HIGH for logo handling and Graph pagination (multiple authoritative sources); MEDIUM for print CSS specifics (verified via MDN/W3C but browser rendering varies)

These pitfalls are specific to adding logo branding to the existing HTML export services and replacing the people-picker search with a full directory browse mode. They complement the v1.0 foundation pitfalls above.


Critical Pitfalls (v2.2)

Pitfall v2.2-1: Base64 Logo Encoding Bloats Every Report File

What goes wrong: The five existing HTML export services (HtmlExportService, UserAccessHtmlExportService, StorageHtmlExportService, SearchHtmlExportService, DuplicatesHtmlExportService) are self-contained by design — no external dependencies. The natural instinct is to embed logos as inline data:image/...;base64,... strings in the <style> or <img src> tag of every report. This works, but base64 encoding inflates image size by ~33%. A 200 KB PNG logo becomes 267 KB of base64 text, inlined into every single exported HTML file. An MSP generating 10 reports per client per month accumulates significant bloat per file, and the logo data is re-read, re-encoded, and re-concatenated into the StringBuilder on every export call.

The secondary problem is that StringBuilder.AppendLine with a very long base64 string (a 500 KB logo becomes ~667 KB of text) causes a single string allocation of that size per report, wasted immediately after the file is written.

Why it happens: The "self-contained HTML" design goal (no external files) is correct for portability. Developers apply it literally and embed every image inline. They test with a small 20 KB PNG and never notice. Production logos from clients are often 300–600 KB originals.

Consequences:

  • Report files 300–700 KB larger than necessary — not catastrophic, but noticeable when opening in a browser.
  • Logo bytes are re-allocated in memory on every export call — fine for occasional use, wasteful in batch scenarios.
  • If the same logo is stored in AppSettings or TenantProfile as a raw file path, it is read from disk and re-encoded on every export. File I/O error at export time if the path is invalid.

Prevention:

  1. Enforce a file size limit at import time: reject logos > 512 KB. Display a warning in the settings UI. This keeps base64 strings under ~700 KB worst case.
  2. Cache the base64 string. Store it in the AppSettings/TenantProfile model as the pre-encoded base64 string (not the original file path), so it is computed once on import and reused on every export. TenantProfile and AppSettings already serialize to JSON — base64 strings serialize cleanly.
  3. Enforce image dimensions in the import UI: warn if the image is wider than 800 px and suggest the user downscale. A 200×60 px logo at 72 dpi is sufficient for an HTML report header.
  4. When reading from the JSON-persisted base64 string, do not re-decode and re-encode. Inject it directly into the <img src="data:image/png;base64,{cachedBase64}"> tag.
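The import-time encoding can be sketched as a small helper; LogoImporter and its method names are illustrative, not existing code, but the 512 KB limit matches point 1 above:

```csharp
using System;
using System.IO;

public static class LogoImporter
{
    public const int MaxLogoBytes = 512 * 1024; // reject anything larger at import time

    // Read the file once, validate size, and return the base64 string to persist
    // in AppSettings/TenantProfile JSON. The original file path is then discarded.
    public static string ImportLogo(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        if (bytes.Length > MaxLogoBytes)
            throw new InvalidOperationException(
                $"Logo is {bytes.Length / 1024} KB; the limit is 512 KB.");
        return Convert.ToBase64String(bytes);
    }

    // At export time, inject the cached string directly; no re-read, no re-encode.
    public static string BuildLogoImgTag(string cachedBase64) =>
        $"<img class=\"report-logo\" src=\"data:image/png;base64,{cachedBase64}\" alt=\"logo\" />";
}
```

The export services then receive the cached string, never a file path.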

Detection:

  • Export a report and check the generated HTML file size. If it is > 100 KB before any data rows are added, the logo is too large.
  • Profile BuildHtml with a 500 KB logo attached — memory allocation spike is visible in the .NET diagnostic tools.

Phase to address: Logo import/settings phase. The size validation and pre-encoding strategy must be established before any export service is modified to accept logo parameters. If the export services are modified first with raw file-path injection, every caller must be updated again later.


Pitfall v2.2-2: Graph API Full Directory Listing Requires Explicit Pagination — 999-User Hard Cap Per Page

What goes wrong: The existing GraphUserSearchService uses $filter with startsWith and $top=10 — a narrow search, not a full listing. The new user directory browse mode needs to fetch all users in a tenant. Graph API GET /users returns a maximum of 999 users per page (not 1000 — the valid range for $top is 1–999). Without explicit pagination using @odata.nextLink, the call silently returns at most 999 users regardless of tenant size. A 5 000-user tenant appears to have 999 users in the directory with no error or indication of truncation.

Why it happens: Developers see $top=999 and assume a single call returns everything for "normal" tenants. The Graph SDK's .GetAsync() call returns a UserCollectionResponse with a Value list and an OdataNextLink property. If OdataNextLink is not checked, pagination stops after the first page. The existing SearchUsersAsync intentionally returns only 10 results — the pagination concern was never encountered there.

Consequences:

  • The directory browse mode silently shows fewer users than the tenant contains.
  • An MSP auditing a 3 000-user client tenant sees only 999 users with no warning.
  • Guest/service accounts in the first 999 may appear; those after page 1 are invisible.

Prevention: Use the Graph SDK's PageIterator<User, UserCollectionResponse> for all full directory fetches. This is the Graph SDK's built-in mechanism for transparent pagination:

var users = new List<User>();
var response = await graphClient.Users.GetAsync(config =>
{
    config.QueryParameters.Select = new[] { "displayName", "userPrincipalName", "mail", "userType" };
    config.QueryParameters.Top = 999;
    config.QueryParameters.Orderby = new[] { "displayName" };
}, ct);

var pageIterator = PageIterator<User, UserCollectionResponse>.CreatePageIterator(
    graphClient,
    response,
    user => { users.Add(user); return true; },
    request => { request.Headers.Add("ConsistencyLevel", "eventual"); return request; });

await pageIterator.IterateAsync(ct);

Always pass CancellationToken through the iterator. For tenants with 10 000+ users, this will make multiple sequential API calls — surface progress to the user ("Loading directory... X users loaded").

Detection:

  • Request $count=true with ConsistencyLevel: eventual on the first page call. Compare the returned @odata.count to the number of items received after full iteration. If they differ, pagination was incomplete.
  • Test against a tenant with > 1 000 users before shipping the directory browse feature.

Phase to address: User directory browse implementation phase. The interface IGraphUserSearchService will need a new method GetAllUsersAsync alongside the existing SearchUsersAsync — do not collapse them.


Pitfall v2.2-3: Graph API Directory Listing Returns Guest, Service, and Disabled Accounts Without Filtering

What goes wrong: GET /users returns all user objects in the tenant: active members, disabled accounts, B2B guest users (userType eq 'Guest'), on-premises sync accounts, and service/bot accounts. In an MSP context, a client's SharePoint tenant may have dozens of guest users from external collaborators and several service accounts (e.g., sharepoint@clientdomain.com, MicrosoftTeams@clientdomain.com). If the directory browse mode shows all 3 000 raw entries, admins spend time scrolling past noise to find real staff.

Filtering on userType helps for guests, but there is no clean Graph filter for "service accounts" — it is a naming convention, not a Graph property. Disabled accounts can be excluded with accountEnabled eq true in the standard $filter syntax; only negated operators (ne, not) on accountEnabled require ConsistencyLevel: eventual.

Why it happens: The people-picker search in v1.1 is text-driven — the user types a name, noise is naturally excluded. A browse mode showing all users removes that implicit filter and exposes the raw directory.

Consequences:

  • Directory appears larger and noisier than expected for MSP clients.
  • Admin selects the wrong account (service account instead of user) and runs an audit that returns no meaningful results.
  • Guest accounts from previous collaborations appear as valid targets.

Prevention: Apply a default filter in the directory listing that excludes obvious non-staff entries, while allowing the user to toggle the filter off:

  • Default: $filter=accountEnabled eq true and userType eq 'Member' — this excludes guests and disabled accounts. Requires no ConsistencyLevel header (supported in standard filter mode).
  • Provide a checkbox in the directory browse UI: "Include guest accounts" that adds or userType eq 'Guest' to the filter.
  • For service account noise: apply a client-side secondary filter that hides entries where displayName contains common service patterns (SharePoint, Teams, No Reply, Admin) — this is a heuristic and should be opt-in, not default.

Note: filtering accountEnabled eq true in the $filter parameter without ConsistencyLevel: eventual works on the v1.0 /users endpoint. Verify before release.
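The default filter and the guest toggle reduce to plain filter-string construction plus a client-side heuristic. The class/method names and the pattern list below are illustrative sketches, not existing code:

```csharp
using System;
using System.Linq;

public static class DirectoryFilters
{
    // Default Graph $filter: enabled members only. The toggle widens userType
    // with explicit parentheses so operator precedence stays unambiguous.
    public static string BuildUserFilter(bool includeGuests) =>
        includeGuests
            ? "accountEnabled eq true and (userType eq 'Member' or userType eq 'Guest')"
            : "accountEnabled eq true and userType eq 'Member'";

    // Opt-in, client-side heuristic for service-account noise; this is a naming
    // convention check, not a Graph filter.
    private static readonly string[] ServiceNamePatterns =
        { "SharePoint", "Teams", "No Reply", "Admin" };

    public static bool LooksLikeServiceAccount(string? displayName) =>
        !string.IsNullOrEmpty(displayName) &&
        ServiceNamePatterns.Any(p =>
            displayName.Contains(p, StringComparison.OrdinalIgnoreCase));
}
```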

Detection:

  • Count the raw user total vs. the filtered total for a test tenant. If they differ by more than 20%, the default filter is catching real users — review the filter logic.

Phase to address: User directory browse implementation phase, before the UI is built. The filter strategy must be baked into the service interface so the ViewModel does not need to know about it.


Pitfall v2.2-4: Full Directory Load Hangs the UI Without Progress Feedback

What goes wrong: Fetching 3 000 users with page iteration takes 3–8 seconds depending on tenant size and Graph latency. The existing people-picker search is a debounced 500 ms call that returns quickly. The directory browse "Load All" operation is fundamentally different in character. Without progress feedback, the user sees a frozen list and either waits or clicks the button again (triggering a second concurrent load).

The existing IsBusy / IsRunning pattern on AsyncRelayCommand will disable the button, but there is no count feedback in the existing ViewModel pattern for this case.

Why it happens: Developers implement the API call first, wire it to a button, and test with a 50-user dev tenant where it returns in < 500 ms. The latency problem is only discovered when testing against a real client.

Consequences:

  • On first use with a large tenant, the admin thinks the feature is broken and restarts the app.
  • If the command is not properly guarded, double-clicks trigger two concurrent Graph requests populating the same ObservableCollection.

Prevention:

  • Add a DirectoryLoadStatus observable property: "Loading... X users" updated via IProgress<int> inside the PageIterator callback.
  • Use BindingOperations.EnableCollectionSynchronization on the users ObservableCollection so items can be streamed in as each page arrives rather than waiting for full iteration.
  • The AsyncRelayCommand CanExecute must return false while loading is in progress (the toolkit does this automatically when IsRunning is true — verify it is wired).
  • Add a cancellation button that is enabled during the load, using the same CancellationToken passed to PageIterator.IterateAsync.
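The count-feedback pattern can be exercised without Graph by simulating pages; inside the real PageIterator item callback the body is the same. DirectoryLoadSimulator and the page shape are illustrative test scaffolding, not existing code:

```csharp
using System;
using System.Collections.Generic;

public static class DirectoryLoadSimulator
{
    // Mirrors what the PageIterator item callback does: add the item, then
    // report the running count so the ViewModel can show "Loading... X users".
    public static List<string> LoadAll(
        IEnumerable<IReadOnlyList<string>> pages, IProgress<int> progress)
    {
        var users = new List<string>();
        foreach (var page in pages)
        {
            foreach (var user in page)
            {
                users.Add(user);
                progress.Report(users.Count);
            }
        }
        return users;
    }
}
```

In the ViewModel, a Progress&lt;int&gt; created on the UI thread marshals its Report callbacks back to that thread automatically, so the status property can be updated directly.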

Detection:

  • Test with a mock that simulates 10 pages of 999 users each, adding a 200 ms delay between pages. The UI should show incrementing count feedback throughout.

Phase to address: User directory browse ViewModel phase.


Pitfall v2.2-5: Logo File Format Validation Is Skipped, Causing Broken Images in Reports

What goes wrong: The OpenFileDialog filter (*.png;*.jpg;*.jpeg) prevents selecting a .exe file, but it does not validate that the selected file is actually a valid image. A user may select a file that was renamed with a .png extension but is actually a PDF, a corrupted download, or an SVG (which is XML text, not a binary image format). When the file is read and base64-encoded, the string is valid base64, but the browser renders a broken image icon in the HTML report.

WPF's BitmapImage will throw an exception on corrupt or unsupported binary files. SVG files loaded as a BitmapImage throw because SVG is not a WPF-native raster format.

A second failure mode: BitmapImage throws NotSupportedException or FileFormatException for EXIF-corrupt JPEGs. This is a known .NET issue where WPF's BitmapImage is strict about EXIF metadata validity.

Why it happens: The file picker filter is treated as sufficient validation. EXIF corruption is not anticipated because it is invisible to casual inspection.

Consequences:

  • Report is generated successfully from the app's perspective, but every page has a broken image icon where the logo should appear.
  • The user does not see the error until they open the HTML file.
  • EXIF-corrupt JPEG from a phone camera or scanner is a realistic scenario in an MSP workflow.

Prevention: After file selection and before storing the path or encoding:

  1. Load the file as a BitmapImage in a try/catch. If it throws, reject the file and show a user-friendly error: "The selected file could not be read as an image. Please select a valid PNG or JPEG file."
  2. Check BitmapImage.PixelWidth and PixelHeight after load — a 0×0 image is invalid.
  3. For EXIF-corrupt JPEGs: BitmapCreateOptions.IgnoreColorProfile and BitmapCacheOption.OnLoad reduce (but do not eliminate) EXIF-related exceptions. Wrap the load in a retry with these options if the initial load fails.
  4. Do not accept SVG files. The file filter should explicitly include only *.png;*.jpg;*.jpeg;*.bmp;*.gif. SVG requires a third-party library (e.g., SharpVectors) to rasterize — out of scope for this milestone.
  5. After successful load, verify the resulting base64 string decodes back to a valid image (round-trip check) before persisting to JSON.
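As a lightweight pre-check before the BitmapImage load (which remains the authoritative validation in step 1), the file's magic bytes can be compared against the accepted raster formats. This helper is an assumption for illustration, not existing code:

```csharp
public static class ImageSignature
{
    // Returns true only if the first bytes match an accepted raster format.
    // A .txt renamed to .png, an SVG (XML text), or a PDF fails immediately.
    public static bool HasKnownRasterSignature(byte[] bytes)
    {
        if (bytes is null || bytes.Length < 4) return false;
        bool png  = bytes[0] == 0x89 && bytes[1] == 0x50 && bytes[2] == 0x4E && bytes[3] == 0x47;
        bool jpeg = bytes[0] == 0xFF && bytes[1] == 0xD8 && bytes[2] == 0xFF;
        bool bmp  = bytes[0] == 0x42 && bytes[1] == 0x4D;                     // "BM"
        bool gif  = bytes[0] == 0x47 && bytes[1] == 0x49 && bytes[2] == 0x46; // "GIF"
        return png || jpeg || bmp || gif;
    }
}
```

This check is cheap and gives a precise error message before WPF's less predictable decode exceptions come into play.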

Detection:

  • Unit test: attempt to load a .txt file renamed to .png and a known EXIF-corrupt JPEG. Verify both are rejected with a user-visible error, not a silent crash.

Phase to address: Logo import/settings phase. Validation must be in place before the logo path or base64 is persisted.


Pitfall v2.2-6: Logo Path Stored in JSON Settings Becomes Stale After EXE Redistribution

What goes wrong: The simplest implementation of logo storage is to persist the file path (C:\Users\admin\logos\msp-logo.png) in AppSettings JSON. This works on the machine where the logo was imported. When the tool is redistributed to another MSP technician (or when the admin reinstalls Windows), the path no longer exists. The export service reads the path, the file is missing, and the logo is silently omitted from new reports — or worse, throws an unhandled FileNotFoundException.

Why it happens: Path storage is the simplest approach. Base64 storage feels "heavy." The problem is only discovered when a colleague opens the tool on their own machine.

Consequences:

  • Client-branded reports stop including the logo without any warning.
  • The user does not know the logo is missing until a client complains about the unbranded report.
  • The AppSettings.DataFolder pattern is already established in the codebase — the team may assume all assets follow the same pattern, but logos are user-supplied files, not app-generated data.

Prevention: Store logos as base64 strings directly in AppSettings and TenantProfile JSON, not as file paths. The import action reads the file once, encodes it, stores the string, and the original file path is discarded after import. This makes the settings file fully portable across machines.

The concern about JSON file size is valid but manageable: a 512 KB PNG becomes ~700 KB of base64, which increases the settings JSON file by that amount. For a tool that already ships as a 200 MB EXE, a 1 MB settings file is acceptable. Document this design decision explicitly.

Alternative if file-path storage is preferred: copy the logo file into a logos/ subdirectory of AppSettings.DataFolder at import time (use a stable filename like msp-logo.png), store only the relative path in JSON, and resolve it relative to DataFolder at export time. This is portable as long as the DataFolder travels with the settings.
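That alternative can be sketched as a small helper; LogoStorage and its method names are illustrative, but the logos/ subfolder and the stable msp-logo filename follow the paragraph above:

```csharp
using System.IO;

public static class LogoStorage
{
    // Copy the user-selected file into DataFolder/logos under a stable name
    // and return the relative path to persist in JSON.
    public static string ImportToDataFolder(string sourcePath, string dataFolder)
    {
        string logosDir = Path.Combine(dataFolder, "logos");
        Directory.CreateDirectory(logosDir);
        string fileName = "msp-logo" + Path.GetExtension(sourcePath);
        File.Copy(sourcePath, Path.Combine(logosDir, fileName), overwrite: true);
        return Path.Combine("logos", fileName);
    }

    // At export time, resolve the stored relative path against the current DataFolder.
    public static string Resolve(string relativePath, string dataFolder) =>
        Path.Combine(dataFolder, relativePath);
}
```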

Detection:

  • After importing a logo, manually edit AppSettings.json and verify the logo data is stored correctly.
  • Move the settings JSON to a different machine and verify a report is generated with the logo intact.

Phase to address: Logo import/settings phase. The storage strategy must be decided and implemented before any export service accepts logo data.


Moderate Pitfalls (v2.2)

Pitfall v2.2-7: Logo Breaks HTML Report Print Layout

What goes wrong: The existing HTML export services produce print-friendly reports (flat tables, no JavaScript required for static reading). Adding a logo <img> tag to the report header introduces two print layout risks:

  1. Logo too large: An <img> without explicit CSS constraints stretches to its natural pixel size. A 1200×400 px banner image pushes the stats cards and table off the first page, breaking the expected report layout.
  2. Image not printed: Some users open HTML reports and use "Print to PDF." Browsers apply @media print rules when printing. By default, most browsers omit CSS background-image in print output but do render inline <img> elements — so an <img> tag is the right vehicle for the logo; however, logos inside <div> containers with overflow:hidden or certain CSS transforms may still be clipped or omitted in print rendering.

Why it happens: Logo sizing is set by the designer in the settings UI but the reports are opened in diverse browsers (Chrome, Edge, Firefox) with varying print margin defaults. The logo is tested visually on-screen but not in a print preview.

Prevention:

  • Constrain all logo <img> elements with explicit CSS: max-height: 60px; max-width: 200px; object-fit: contain;. This prevents the image from overflowing its container regardless of the original image dimensions.
  • Add a @media print block in the report's inline CSS that keeps the logo visible and appropriately sized: @media print { .report-logo { max-height: 48px; max-width: 160px; } }.
  • Use break-inside: avoid on the header <div> containing both logos and the report title so a page break never splits the header from the first stat card.
  • Test "Print to PDF" in Edge (Chromium) before shipping — it is the most common browser for MSP tools on Windows.
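Kept in the export services' C# style, the constraints above can live as a single CSS constant appended into each report's <style> block. The class names and the flex layout are illustrative; the size values come from the prevention list:

```csharp
public static class ReportCss
{
    // Screen and print constraints for the branded header.
    public const string LogoBlock = @"
.report-header { display: flex; align-items: center; gap: 16px; break-inside: avoid; }
.report-logo   { max-height: 60px; max-width: 200px; object-fit: contain; }
@media print {
  .report-logo { max-height: 48px; max-width: 160px; }
}";
}
```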

Detection:

  • Open a generated report in Edge, use Ctrl+P, check print preview. Verify the logo appears on page 1 and the table is not pushed to page 2 by an oversized image.

Phase to address: HTML report template phase when logo injection is added to BuildHtml.


Pitfall v2.2-8: ConsistencyLevel Header Amplifies Graph Throttling for Directory Listing

What goes wrong: The existing GraphUserSearchService already uses ConsistencyLevel: eventual with $count=true for its startsWith filter query. This is required for the advanced filter syntax. However, applying ConsistencyLevel: eventual to a full directory listing with $top=999 and $orderby=displayName forces Graph to route requests through a consistency-checked path rather than a lightweight read cache. Microsoft documentation confirms this increases the cost of each request against throttling limits.

For a tenant with 10 000 users (11 pages of 999), firing 11 consecutive requests with ConsistencyLevel: eventual is significantly more expensive than 11 standard read requests. Under sustained MSP use (multiple tenants audited back-to-back), this can trigger per-app throttling (HTTP 429) after 2–3 directory loads in quick succession.

Why it happens: ConsistencyLevel: eventual is already in the existing service and developers copy it to the new GetAllUsersAsync method because it was needed for $count support.

Prevention: For GetAllUsersAsync, evaluate whether ConsistencyLevel: eventual is actually needed:

  • $orderby=displayName on /users does not require ConsistencyLevel: eventual — standard $orderby on displayName is supported without it.
  • $count=true does require ConsistencyLevel: eventual. If user count is needed for progress feedback, request it only on the first page, then use the returned @odata.count value without adding the header to subsequent page requests. The PageIterator does not automatically carry the header to next-link requests — verify this behaviour.
  • If ConsistencyLevel: eventual is not needed for the primary listing, omit it from GetAllUsersAsync. Use it only when $search or $count are required.

Detection:

  • Load the full directory for two different tenants back-to-back. Check for HTTP 429 responses in the Serilog output. If throttling occurs within the first two loads, ConsistencyLevel overhead is the likely cause.

Phase to address: User directory browse service implementation phase.


Pitfall v2.2-9: WPF ListView with 5 000+ Users Freezes Without UI Virtualization

What goes wrong: A WPF ListView or DataGrid bound to an ObservableCollection<DirectoryUser> with 5 000 items renders all 5 000 item containers on first bind if UI virtualization is disabled or inadvertently defeated. This causes a 5–10 second freeze when the directory loads and ~200 MB of additional memory for the rendered rows, even though only ~20 rows are visible in the viewport.

Virtualization is defeated by any of these common mistakes:

  • The ListView is inside a ScrollViewer that wraps both the list and other content (ScrollViewer.CanContentScroll=False is the kill switch).
  • The ItemsPanel is overridden with a non-virtualizing panel (StackPanel instead of VirtualizingStackPanel).
  • Items are added one-by-one to the ObservableCollection (each addition fires a CollectionChanged notification, causing incremental layout passes — 5 000 separate layout passes are expensive).

Why it happens: The existing people-picker SearchResults collection has at most 10 items — virtualization was never needed and its absence was never noticed. The directory browse ObservableCollection is a different scale.

Prevention:

  • Use a ListView with its default VirtualizingStackPanel (do not override ItemsPanel).
  • Set VirtualizingPanel.IsVirtualizing="True", VirtualizingPanel.VirtualizationMode="Recycling", and ScrollViewer.CanContentScroll="True" explicitly — do not rely on defaults being correct after a XAML edit.
  • Never add items to the collection one-by-one from the background thread. Use BindingOperations.EnableCollectionSynchronization and assign new ObservableCollection<T>(loadedList) in one operation after all pages have been fetched, or batch-swap when each page arrives.
  • For 5 000+ items, add a search-filter input above the directory list that filters the bound ICollectionView — this reduces the rendered item count to a navigable size without requiring the user to scroll 5 000 rows.
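A minimal XAML sketch with the virtualization settings stated explicitly (the binding name is illustrative); note that no ItemsPanel override is present, so the default VirtualizingStackPanel stays in effect:

```xml
<ListView ItemsSource="{Binding DirectoryUsersView}"
          VirtualizingPanel.IsVirtualizing="True"
          VirtualizingPanel.VirtualizationMode="Recycling"
          ScrollViewer.CanContentScroll="True" />
```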

Detection:

  • Load a 3 000-user directory into the ListView. Open Windows Task Manager. The WPF process should not spike above 300 MB during list rendering. Scroll should be smooth (60 fps) with recycling enabled.

Phase to address: User directory browse View/XAML phase.


Pitfall v2.2-10: Dual Logo Injection Requires Coordinated Changes Across All Five HTML Export Services

What goes wrong: There are five independent HtmlExportService-style classes, each with its own BuildHtml method that builds the full HTML document from scratch using StringBuilder. Adding logo support means changing all five methods. If logos are added to only two or three services (the ones the developer remembers), the other reports ship without branding. The inconsistency is subtle — the tool "works," but branded exports alternate with unbranded exports depending on which tab generated the report.

Why it happens: Each export service was written independently and shares no base class. There is no shared "HTML report header" component that all services delegate to. Each service owns its complete <!DOCTYPE html> block.

Consequences:

  • Permissions report is branded; duplicates report is not.
  • Client notices inconsistency and questions the tool's reliability.
  • Future changes to the report header (adding a timestamp, changing the color scheme) must be applied to all five files separately.

Prevention: Before adding logo injection to any service, extract a shared HtmlReportHeader helper method (or a small HtmlReportBuilder base class/utility) that generates the <head>, <style>, and branded header <div> consistently. All five services call this shared method with a BrandingOptions parameter (MSP logo base64, client logo base64, report title). This is a refactoring prerequisite — not optional if branding consistency is required.

The refactoring is low-risk: the CSS blocks in all five services are nearly identical (confirmed by reading the code), so consolidation is straightforward.
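A sketch of the shared helper, assuming an illustrative BrandingOptions record and HtmlReportHeader class (names are not existing code):

```csharp
using System.Text;

public sealed record BrandingOptions(
    string? MspLogoBase64, string? ClientLogoBase64, string ReportTitle);

public static class HtmlReportHeader
{
    // Single place that emits the branded header <div>; all five export
    // services call this instead of building their own header markup.
    public static void Append(StringBuilder sb, BrandingOptions branding)
    {
        sb.AppendLine("<div class=\"report-header\">");
        AppendLogo(sb, branding.MspLogoBase64, "MSP logo");
        sb.AppendLine($"<h1>{branding.ReportTitle}</h1>");
        AppendLogo(sb, branding.ClientLogoBase64, "Client logo");
        sb.AppendLine("</div>");
    }

    private static void AppendLogo(StringBuilder sb, string? base64, string alt)
    {
        if (!string.IsNullOrEmpty(base64))
            sb.AppendLine(
                $"<img class=\"report-logo\" src=\"data:image/png;base64,{base64}\" alt=\"{alt}\" />");
    }
}
```

Either logo is optional, so a report with only an MSP logo (no client logo configured) still renders a consistent header.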

Detection:

  • After branding is implemented, export one report from each of the five export services. Open all five in a browser side by side and verify logos appear in all five.

Phase to address: HTML report template refactoring phase — this must be done before logo injection, not after.


Minor Pitfalls (v2.2)

Pitfall v2.2-11: User.Read.All Permission Scope May Not Be Granted for Full Directory Listing

What goes wrong: The existing SearchUsersAsync uses startsWith filter queries that work with User.ReadBasic.All (the least-privileged scope for user listing). Full directory browse with all user properties may require User.Read.All, depending on which properties are selected. If the Azure AD app registration used by MSP clients only has User.ReadBasic.All consented (which is sufficient for the v1.1 people-picker), the GetAllUsersAsync call may silently return partial data or throw a 403.

User.ReadBasic.All returns only: displayName, givenName, id, mail, photo, securityIdentifier, surname, userPrincipalName. Requesting accountEnabled or userType (needed for filtering out guests/disabled accounts per Pitfall v2.2-3) requires User.Read.All.

Prevention:

  • Define the exact $select fields needed for the directory browse feature and verify each field is accessible under User.ReadBasic.All before assuming User.Read.All is required.
  • If User.Read.All is required, update the app registration documentation and display a clear message in the tool if the required permission is missing (catch the 403 and surface it as "Insufficient permissions — User.Read.All is required for directory browse mode").
  • Add User.Read.All to the requested scopes in MsalClientFactory alongside existing scopes.

Detection:

  • Test the directory browse against a tenant where the app registration has only User.ReadBasic.All consented. Verify the error message is user-readable, not a raw ServiceException.

Phase to address: User directory browse service interface phase.


Pitfall v2.2-12: Logo Preview in Settings UI Holds a File Lock

What goes wrong: When showing a logo preview in the WPF settings UI using BitmapImage with a file URI (new BitmapImage(new Uri(filePath))), WPF may hold a read lock on the file until the BitmapImage is garbage collected. If the user then tries to re-import a different logo (which involves overwriting the same file), the file write fails with a sharing violation. This is a known WPF BitmapImage quirk.

Prevention: Load logo previews with BitmapCacheOption.OnLoad and set UriSource then call EndInit():

var bitmap = new BitmapImage();
bitmap.BeginInit();
bitmap.UriSource = new Uri(filePath);
bitmap.CacheOption = BitmapCacheOption.OnLoad;
bitmap.EndInit();
bitmap.Freeze(); // Immutable and thread-safe; the file handle was already released by OnLoad

BitmapCacheOption.OnLoad is the critical setting — it forces the image to be fully decoded into memory during EndInit() and releases the file handle immediately, preventing file locks. Freeze() then makes the bitmap immutable and safe to share across threads.

Detection:

  • Import a logo, then immediately try to overwrite the source file using Windows Explorer. With the default cache option, the file is locked. With BitmapCacheOption.OnLoad, the overwrite succeeds.

Phase to address: Settings UI / logo import phase.


Phase-Specific Warnings (v2.2)

Phase Topic Likely Pitfall Mitigation
Logo import + settings persistence Base64 bloat (v2.2-1) + path staleness (v2.2-6) Store pre-encoded base64 in JSON; enforce 512 KB import limit
Logo import + settings persistence Invalid/corrupt image file (v2.2-5) Validate via BitmapImage load before persisting; load with BitmapCacheOption.OnLoad to release the handle (v2.2-12)
HTML report template refactoring Inconsistent branding across 5 services (v2.2-10) Extract shared header builder before touching any service
HTML report template Print layout broken by oversized logo (v2.2-7) Add max-height/max-width CSS and @media print block
Graph directory service Silent truncation at 999 users (v2.2-2) Use PageIterator; request $count on first page for progress
Graph directory service Guest/service account noise (v2.2-3) Default filter accountEnabled eq true and userType eq 'Member'; UI toggle for guests
Graph directory service Throttling from ConsistencyLevel header (v2.2-8) Omit ConsistencyLevel: eventual from standard listing; use only when $search or $count required
Graph directory service Missing permission scope (v2.2-11) Verify User.Read.All vs. User.ReadBasic.All against required fields; update app registration docs
Directory browse ViewModel UI freeze during load (v2.2-4) Stream pages via IProgress<int>; cancellable AsyncRelayCommand
Directory browse View (XAML) ListView freeze with 5 000+ items (v2.2-9) Explicit virtualization settings; batch ObservableCollection assignment; filter input

v2.2 Integration Gotchas

Integration Common Mistake Correct Approach
Logo base64 in AppSettings JSON Store file path; re-encode on every export Store pre-encoded base64 string at import time; inject directly into <img src>
BitmapImage logo preview Default BitmapImage constructor holds file lock Use BeginInit/EndInit with BitmapCacheOption.OnLoad and call Freeze()
Graph GetAllUsersAsync Single GetAsync call; no pagination Always use PageIterator<User, UserCollectionResponse>
Graph $top parameter $top=1000 — invalid; silently rounds down Maximum valid value is 999
Graph directory filter No filter — returns all account types Default: accountEnabled eq true and userType eq 'Member'
ConsistencyLevel: eventual Applied to all Graph requests by habit Required only for $search, $filter with non-standard operators, and $count
HTML export services Logo injected in only the modified services Extract shared header builder; all five services use it
WPF ListView with large user list No virtualization settings, items added one-by-one Explicit VirtualizingPanel settings; assign new ObservableCollection<T>(list) once

v2.2 "Looks Done But Isn't" Checklist

  • Logo size limit enforced: Import a 600 KB PNG. Verify the UI rejects it with a clear message and does not silently accept it.
  • Corrupt image rejected: Rename a .txt file to .png and attempt to import. Verify rejection with user-friendly error.
  • Logo portability: Import a logo on machine A, copy the settings JSON to machine B (without the original file), generate a report. Verify the logo appears.
  • All five report types branded: Export one report from each of the five HTML export services. Open all five in a browser and verify logos appear in all.
  • Print layout intact: Open each branded report type in Edge, Ctrl+P, print preview. Verify logo appears on page 1 and table is not displaced.
  • Directory listing complete (large tenant): Connect to a tenant with > 1 000 users. Load the full directory. Verify user count matches the Azure AD count shown in the Azure portal.
  • Directory load cancellation: Start a directory load and click Cancel before it completes. Verify the list shows partial results or is cleared, no crash, and the button re-enables.
  • Guest account filter: Verify guests are excluded by default. Verify the "Include guests" toggle adds them back.
  • ListView performance: Load 3 000 users into the directory list. Verify scroll is smooth and memory use is reasonable (< 400 MB total).
  • FR locale for new UI strings: All logo import labels, error messages, and directory browse UI strings must have FR translations. Verify no untranslated keys appear when FR is active.

v2.2 pitfalls appended: 2026-04-08