Learning Notes on The Art of Unix Programming 12 - On Optimization Timing and Techniques
=============== Optimization
Regarding optimization, the first question in Unix is when not to optimize, and only then how to optimize.
The most effective optimizations are often things other than optimization itself, such as a clear and clean design.
12.1 Don't Just Do Something, Stand There
The most powerful optimization technique in the programmer's toolkit is to do nothing. Reasons not to optimize: the hardware will get faster on its own within a few months, and the time saved can go to more meaningful work, such as reducing algorithmic complexity and producing a design that is clean, clear, well structured, transparent, and visible.

12.2 Measure First, Then Optimize
If there is concrete evidence that an application is running too slowly, then (and only then) should you consider optimizing the code. But before changing anything, measure. Unix has profilers: make good use of them. Interpreting profiler output is an art.

12.3 The Perils of Non-Locality
The most effective way to optimize code is to keep it short and simple. Never let core data structures and time-critical loops fall out of the cache. Small is beautiful. Loading instructions often takes more time than executing them.

12.4.1 Batch Processing
Persistent service daemons are a more typically Unix-style example of batching. There are two reasons to write persistent daemons (as opposed to CLI servers that are started fresh for each session): one is obvious, and one is a bit more subtle. The obvious reason is to control updates to shared resources. The less obvious reason is that a daemon, even one that handles no updates, amortizes the cost of reading its backend database across multiple requests.

--------------------------------
Why does the author particularly like analyzing examples such as email? Is it because email services are exceptionally well implemented on Unix, making them representative of Unix in networking, or because the author's own background is in networking and email? The examples include DNS, POP3, SMTP, and IMAP.
XML markup language: protocol language.
--------------------------------

12.4 Achieving Both: Low Latency, High Throughput
Caching the results of operations can reduce latency. For example, a binary cache can eliminate the overhead of parsing a text database file. Some Unix variants have used this technique to speed up access to password information.
Problems Arising: All code that touches the binary cache must check the timestamps of both files; if the primary text has been updated, the cache must be regenerated. Alternatively, all changes to the primary text must be made through a wrapper that also updates the binary format. This approach can be made to work, but it has every drawback that the SPOT (Single Point of Truth) rule would lead us to expect. The duplicated data means no storage is saved; it is purely a speed optimization. But the real problem is that the code that keeps the cache consistent with the primary text is highly prone to leaks and bugs. Frequently updated cache files can lead to elusive race conditions simply because timestamps have one-second resolution.
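The timestamp check and the one-second race described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular Unix implements its caches: the colon-separated file format, the file names, the use of pickle as the binary format, and the helper names parse_text_db, load, and demo are all hypothetical. Timestamps are truncated to whole seconds on purpose, and os.utime is used to simulate two writes landing in the same second.

```python
import os
import pickle
import tempfile


def parse_text_db(path):
    """Parse a colon-separated text database into {name: [fields]}."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                name, *fields = line.split(":")
                records[name] = fields
    return records


def load(text_path, cache_path):
    """Return the records, rebuilding the binary cache when it looks stale.

    Timestamps are truncated to whole seconds to mimic a filesystem
    with one-second resolution (an assumption made for this sketch).
    """
    if os.path.exists(cache_path):
        text_mtime = int(os.stat(text_path).st_mtime)
        cache_mtime = int(os.stat(cache_path).st_mtime)
        if text_mtime <= cache_mtime:  # cache looks fresh: skip parsing
            with open(cache_path, "rb") as f:
                return pickle.load(f)
    records = parse_text_db(text_path)
    with open(cache_path, "wb") as f:
        pickle.dump(records, f)
    return records


def demo():
    """Show a stale read when both files carry the same whole-second mtime."""
    with tempfile.TemporaryDirectory() as d:
        text = os.path.join(d, "users.txt")
        cache = os.path.join(d, "users.cache")

        with open(text, "w", encoding="utf-8") as f:
            f.write("alice:1000:/home/alice\n")
        first = load(text, cache)  # no cache yet, so it is built here

        # The primary text changes after the cache was written...
        with open(text, "w", encoding="utf-8") as f:
            f.write("alice:1000:/home/alice\nbob:1001:/home/bob\n")
        # ...but both files end up with the same one-second timestamp.
        os.utime(text, (1_000_000, 1_000_000))
        os.utime(cache, (1_000_000, 1_000_000))
        stale = load(text, cache)  # cache wrongly judged fresh

        # Once the text is visibly newer, the cache is rebuilt.
        os.utime(text, (1_000_001, 1_000_001))
        fresh = load(text, cache)
        return first, stale, fresh


if __name__ == "__main__":
    first, stale, fresh = demo()
    print("bob visible in stale read:", "bob" in stale)
    print("bob visible after rebuild:", "bob" in fresh)
```

Treating equal timestamps as stale would close this particular gap at the cost of occasional redundant rebuilds, but concurrent writers would still need locking or atomic replacement, which is exactly the kind of edge case these notes warn about.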
The more complex the update pattern of the primary text, the more likely the code that synchronizes the cached copy is to leak. Several Unix variants that use caching to accelerate access to critical system databases are notorious for the system administrator 'horror stories' they generate, which reflects precisely this point (this sentence and example are very impactful: the author is not speaking idly but has ample evidence).
Conclusion: binary cache files are a brittle technique and should be avoided as much as possible. When you believe caching is urgently needed, the wise approach is to look one level deeper and ask why the caching is necessary, that is, where the real bottleneck lies. Solving that underlying problem is often easier than getting all of the cache's boundary conditions right.