Part of what I look for in my research on green data centers is techniques that last and have a big impact.
Monitoring systems are complex and often ineffective, but they are a necessary evil. Why are these systems so hard to use? The aha moment: monitoring systems have not embraced the fundamental idea of hints in computer systems.
It started with reading James Hamilton’s post on Butler Lampson. Curious, I found Butler Lampson’s paper on hints, written in July 1983. Here is the part that got my attention.
Use hints to speed up normal execution. A hint, like a cache entry, is the saved result of some computation. It is different in two ways: it may be wrong, and it is not necessarily reached by an associative lookup. Because a hint may be wrong, there must be a way to check its correctness before taking any unrecoverable action. It is checked against the ‘truth’, information that must be correct but can be optimized for this purpose rather than for efficient execution. Like a cache entry, the purpose of a hint is to make the system run faster. Usually this means that it must be correct nearly all the time.
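To make the quoted idea concrete, here is a minimal Python sketch of a hint that is checked against the truth before it is used. Everything in it is hypothetical and chosen only for illustration: the `HintedLookup` class, the `servers` table, and the `find_by_scan` and `holds` helpers are not from Lampson's paper. The slow scan plays the role of the "truth", and the cheap membership test is the check.

```python
from typing import Callable, Dict

# Hypothetical example data: which server currently holds which file.
servers = {"alpha": {"report.txt"}, "beta": set()}


def find_by_scan(name: str) -> str:
    """The 'truth': always correct, but it scans every server."""
    for server, files in servers.items():
        if name in files:
            return server
    raise KeyError(name)


def holds(name: str, server: str) -> bool:
    """Cheap check: does the hinted server really hold the file?"""
    return name in servers.get(server, set())


class HintedLookup:
    """Resolve a key with a possibly wrong hint, verified before use."""

    def __init__(self, truth: Callable[[str], str],
                 check: Callable[[str, str], bool]) -> None:
        self._truth = truth
        self._check = check
        self._hints: Dict[str, str] = {}
        self.slow_lookups = 0  # counts how often the hint did not help

    def resolve(self, key: str) -> str:
        hint = self._hints.get(key)
        # Use the hint only if the check confirms it is still correct.
        if hint is not None and self._check(key, hint):
            return hint
        # Missing or wrong hint: fall back to the truth and refresh the hint.
        self.slow_lookups += 1
        value = self._truth(key)
        self._hints[key] = value
        return value
```

The hint pays off only if it is right nearly all the time. A wrong hint costs one failed check plus one slow lookup, but it never produces a wrong answer.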
Lampson’s description of hints makes sense as a model for how monitoring systems should be designed.
- Monitoring should speed up execution of changes.
- Speed is traded for accuracy, so monitoring data needs a way to be checked for correctness, because a hint can be wrong. Any monitoring data could be wrong, yet who designs redundancy into monitoring?
- Monitoring data that must be correct (the ‘truth’) is optimized for checking rather than for efficient execution.
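As a rough illustration of the points above, the sketch below treats a monitoring reading as a hint and cross-checks it against redundant sensors before allowing a hard-to-undo change. The function names, the 27.0 °C limit, and the 1.5 °C tolerance are all assumptions made up for this example, not values taken from any real monitoring product or standard.

```python
from statistics import median
from typing import Optional, Sequence


def verify_reading(primary: float, redundant: Sequence[float],
                   tolerance: float) -> Optional[float]:
    """Return the primary reading if redundant sensors agree with it, else None."""
    if not redundant:
        return None  # nothing to check against, so the hint stays unverified
    reference = median(redundant)  # the median tolerates one bad sensor
    if abs(primary - reference) <= tolerance:
        return primary
    return None


def may_raise_setpoint(primary: float, redundant: Sequence[float],
                       limit_c: float = 27.0,       # hypothetical limit
                       tolerance_c: float = 1.5) -> bool:  # hypothetical tolerance
    """Allow the change only when the reading is verified and below the limit."""
    reading = verify_reading(primary, redundant, tolerance_c)
    return reading is not None and reading < limit_c
```

For day-to-day dashboards the unverified primary reading may be good enough. The redundancy check only needs to run before an action that is expensive to reverse.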
What is a bit confusing is that the paper itself is about hints, and the most useful hint was to use hints.
Hints for Computer System Design[1]
Butler W. Lampson
Computer Science Laboratory
Xerox Palo Alto Research Center
Palo Alto, CA 94304
Abstract
Studying the design and implementation of a number of computer systems has led to some general hints for system design. They are described here and illustrated by many examples, ranging from hardware such as the Alto and the Dorado to application programs such as Bravo and Star.
1. Introduction
Designing a computer system is very different from designing an algorithm:
- The external interface (that is, the requirement) is less precisely defined, more complex, and more subject to change.
- The system has much more internal structure, and hence many internal interfaces.
- The measure of success is much less clear.
The designer usually finds himself floundering in a sea of possibilities, unclear about how one choice will limit his freedom to make other choices, or affect the size and performance of the entire system. There probably isn’t a ‘best’ way to build the system, or even any major part of it; much more important is to avoid choosing a terrible way, and to have clear division of responsibilities among the parts.
I have designed and built a number of computer systems, some that worked and some that didn’t. I have also used and studied many other systems, both successful and unsuccessful. From this experience come some general hints for designing successful systems. I claim no originality for them; most are part of the folk wisdom of experienced designers. Nonetheless, even the expert often forgets, and after the second system [6] comes the fourth one.
[1] This paper was originally presented at the 9th ACM Symposium on Operating Systems Principles and appeared in Operating Systems Review 15, 5, Oct. 1983, p 33-48. The present version is slightly revised.
I can have a lot of fun with this topic. And, I’ll start working on a paper using this method after I have researched it further.