-
Notifications
You must be signed in to change notification settings - Fork 90
Description
The purpose of this issue is to share some learnings (and some salt about DbgHelp) that I have gained in the past few days furthering my descent into madness my understanding of symbols; whether any of this should be considered an issue that prompts changes to DbgShell or an endorsement of any particular approach I leave to the readers' judgement.
SymbolSearchPerf.cs.txt
- SymSearch results are a superset of SymEnumSymbols - there is no symbol that SymEnumSymbols will return that SymSearch can not also return.
"But wait!" you say, "I have done searches that get results from SymEnumSymbols but not SymSearch!" And indeed, you are correct. So I shall share the sad tale of a simple, unassuming symbol, known to its friends as mshtml!g_hIsolatedHeap.
> Get-DbgSymbol "mshtml!g_hIsolatedHeap" -UseSymEnum
0x642b1ba0 g_hIsolatedHeap
As expected. And what about SymSearch?
> Get-DbgSymbol "mshtml!g_hIsolatedHeap" -UseSymSearch
Nothing. Have I lied to you? No! I have not! It is DbgHelp that lies! This symbol's proper name is not mshtml!g_hIsolatedHeap at all, but in truth is mshtml!_g_hIsolatedHeap!
> Get-DbgSymbol "mshtml!_g_hIsolatedHeap" -UseSymSearch
0x642b1ba0 g_hIsolatedHeap
SymEnumSymbols succceeds where SymSearch does not because DbgHelp secretly inserts search globs to cover for the lies it tells! But at what cost, you ask? "No, I didn-" Yes, you did. But unfortunately, there are so many different ways that DbgHelp is heckin' slow, you can't unambiguously pin it to one single thing† - but one can at least quantify how slow the various ways of calling into it to ask about our friend mshtml!_g_hIsolatedHeap are!
| Method | Mean | Error | Ratio |
|---|---|---|---|
| FindSymbol_Enum | 1,751,811.693 us | 77,915.3682 us | 1.000 |
| FindSymbol_Enum_ExplicitStar | 1,225,950.550 us | 22,614.1927 us | 0.691 |
| FindSymbol_Enum_NoType | 1,167,999.680 us | 17,032.5053 us | 0.668 |
| FindSymbol_Enum_Mangled_SymOpts | 705,314.202 us | 34,711.8878 us | 0.421 |
| FindSymbol_Enum_PublicsOnly_Mangled_SymOpts | 670,834.033 us | 16,164.5092 us | 0.384 |
| FindSymbol_Search_ForcedStarGlob | 3,409,318.860 us | 37,855.2651 us | 1.949 |
| FindSymbol_Search_SecretUncodumentedWildcard | 3,017,108.117 us | 156,676.2463 us | 1.736 |
| FindSymbol_Search_Mangled_SymOpts | 368,730.864 us | 6,861.2060 us | 0.210 |
(spoiler alert: Benchmarkdotnet picked microseconds for this table because some of the rows I cut out actually need it)
There are few things you can tell from that table (should the names happen to make sense to you, or should you reference against the attached code). First is that with the default settings DbgHelp does try to match against your original search terms before searching again with the injected star glob. Next you might notice that SymSearch manages to take twice as long (unless you set symopts to search against still-"decorated" streams, a trival concession for our symbol with no decoration to be had). I'm not really sure how it does that. One might also notice that there are undocumented wildcards (that one might want to, you know, escape). In this case, I used #, which seems to mean "match the preceding character 0 or 1 times"‡, but there's also some handling for a [a-z] style syntax.
Now, if one were to examine DbgHelp more closely††, one might notice that the public apis seem to pretty much just be messy wrappers aound DIA functions. And there is an undocumented SymGetDiaSession, what might one get from that? Crashes, unless one also notices the SymFreeDiaString and surmises that DbgHelp's DIA is special and doesn't use SysAllocString to allocate "BSTR" values.
| Method | Mean | Error | Ratio |
|---|---|---|---|
| findChildren_Mangled_DiaComInterop | 2,816.546 us | 45.3609 us | 0.002 |
| findChildren_Mangled_DiaCppCli | 111.870 us | 0.5413 us | 0.000 |
| findChildren_Unmangled_DiaCppCli | 338,699.820 us | 6,055.1215 us | 0.194 |
| findChildren_Mangled_DiaCppCli_BuiltinSearchGlob | 146,313.182 us | 7,739.2609 us | 0.084 |
Wow, using .NET COM interop with an exact search string gets us an answer 100 times faster (as long as we don't look at any string properties...). C++/CLI our way around the string issue and we get a whopping 3,000 times faster. Even undecoration & search globs can't get us as slow as the DbgHelp APIs go. But wait! There's more... most of those 112 microsecond were just fetching the DiaSession!
| Method | Mean | Error | Ratio |
|---|---|---|---|
| findChildren_Mangled_DiaCppCli_CachedSession | 4.629 us | 0.0850 us | 0.000 |
But why stop there? If you profile it, it's painfully obvious that DbgHelp's DIA is hamstrung by some inefficient UTF8 -> UTF16 conversion logic... and the disassembly tarot cards will tell you that the version of DIA in msdia140 somewhat improves on that conversion, in addition to a statically linked towlower... so how would that fare?
| Method | Mean | Error | Ratio |
|---|---|---|---|
| findChildren_Mangled_msdia140CppCli | 3.573 us | 0.1180 us | 0.000 |
| findChildren_Unmangled_msdia140CppCli | 361,605.138 us | 7,670.6590 us | 0.206 |
| findChildren_Mangled_msdia140CppCli_BuiltinSearchGlob | 140,708.329 us | 5,434.0552 us | 0.080 |
Well, shoot. Those numbers are rather remarkably close to DbgHelp's... I profiled them to make sure I wasn't accidentally double testing. I'm starting to suspect there was a reason the DbgHelp DIA used LocalAlloc instead of SysAllocString... msdia140 will use LocalAlloc if you use a secret alternate GUID to get your IDiaDataSource but I think I should post this and go to sleep before I get lured into testing that
( SymbolSearchPerf.cs.txt contains the benchmark code, which should compile & run against the head of my personal DbgShell repository)
† aside from the dynamically linked towlower call in DbgHelp!nextToken
‡ except in mshtml!_#_hIsolatedHeap where it means "get stuck in an infinite loop"
†† Not by diassembling it, of course, that would be against the terms of use!