Book Review: Microsoft Windows Internals, Fourth Edition

by svante@axantum.com 2008-02-07 12:46

Microsoft Windows Internals, Fourth Edition, by Mark E. Russinovich and David A. Solomon, Microsoft Press, LOCCN 2004115221

Many years ago, before the release of NT 3.1, I read a book entitled "Inside Windows NT" by Helen Custer. It was a great book, basically a textbook on operating system theory - as exemplified by Windows NT. It covered the theory of how to implement an operating system kernel, showing how it was done in Windows NT. It did not talk about APIs so much as about the data structures and logic behind the scenes, and the theory of the basic functions of an operating system such as memory management and the I/O system.

As I'm now getting back into some heavy-duty C++ coding for the Windows environment, I thought this might be a good refresher for me to (re-)learn about internal structures and enable me to find the right places to implement the functionality I need.

With these expectations I was a bit disappointed by "Windows Internals, Fourth Edition". It's a very different kind of book compared to the original first edition - in fact it's not the fourth edition of "Inside Windows NT" - it's really the second or third edition of "Windows Internals". So, what kind of book is it then?

"Windows Internals" is a cross between a troubleshooting manual for very advanced system managers, a hacker's memoirs, an applied user's guide to the Sysinternals utilities and the documentation Microsoft didn't produce for Windows.

It's almost like an independent black-box investigator's report of findings after many years of peering into the internals of Windows - from the outside. Instead of describing how Windows is designed from the designers' point of view, it describes a process of external discovery based on reverse-engineering and observation. Instead of just describing how it works, the book focuses on "experiments" whereby, with the help of a bunch of very nifty utilities from Sysinternals, you can "see" how it works.

I find the approach a little strange. I was expecting a more authoritative text, not an experimental guide to 'discovery'. I don't think one should use experimental approaches to learning about a piece of commercial software. Software is an engineering practice - and it should be described, not discovered. It should not be a research project to find out how Windows works - it should be a matter of reading documentation and backgrounders, which was what I was hoping for when purchasing the book.

Having read all 870 pages, what did I learn? I learnt that Sysinternals (http://technet.microsoft.com/en-us/sysinternals/default.aspx) has some very cool utilities (which I already knew), and I learnt a bit about how they do what they do, and how to use them to inspect the state of a Windows system for troubleshooting purposes. As such, it should really be labelled "The essential Sysinternals companion", because that's what it really is. It shows you a zillion ways to use the utilities for troubleshooting, which is all well and good as far as it goes and very useful in itself.

To summarize, this is not really the book to read if you want an authoritative reference about the Windows operating system, although you will learn quite a bit along the way - after all, there is quite a bit of information here. If you're a system manager and/or facing extremely complicated troubleshooting scenarios, then this book is indeed for you. Also, if you're a more practical-minded person, and just want to discover the 'secrets' of Windows, you'll find all the tools here. I would have preferred that Microsoft documented things, instead of leaving it for 'discovery' (and then hiring the people doing the discovering if they're too good at it, and then making them write a book about it - which is what happened here).


General | Programming

Lock object sharing with hashes

by svante@axantum.com 2008-01-08 23:53

In web applications we frequently need to serialize access to resources, since ASP.NET is inherently concurrent. A typical scenario is that we have several loosely connected objects that all apply to the same user, and we need to ensure single-threaded access during a read-read or read-modify-write cycle to get a consistent view or update.

A user is usually represented via some sort of string identifier, perhaps an e-mail address. In C# what we want to do is something like:

lock (user)
{
    // Do something that needs single-thread access
}

The problem is: what are we to use as a lock object? C# can use any object as a lock, but which one to pick? We must ensure that multiple threads will always get the same object instance for the same user, regardless of when in the application life-time the need arises, so in effect these objects must live for the life of the application. This can lead to massive memory consumption. Assume a system with one million users - after a while we'll have to keep one million objects around, probably in a hash table indexed by the e-mail string. That can mean some serious memory problems.
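To make the memory problem concrete, here is a minimal sketch of that naive table. The class name PerUserLocks and its members are hypothetical, made up for illustration only; the point is simply that an entry is created for every user ever seen and is never removed.

```csharp
using System;
using System.Collections.Generic;

// Naive approach (illustration only): one lock object per user, kept forever.
public class PerUserLocks
{
    // Guards the dictionary itself, which is not safe for concurrent writes.
    private readonly object _tableLock = new object();

    // Keyed by user identifier, compared case-insensitively.
    private readonly Dictionary<string, object> _locks =
        new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);

    public object GetLockObject(string user)
    {
        lock (_tableLock)
        {
            object lockObject;
            if (!_locks.TryGetValue(user, out lockObject))
            {
                // First time this user is seen: the entry stays for the
                // life of the application.
                lockObject = new object();
                _locks.Add(user, lockObject);
            }
            return lockObject;
        }
    }

    // Number of lock objects held; grows with the number of distinct users.
    public int Count
    {
        get { lock (_tableLock) { return _locks.Count; } }
    }
}
```

Note that every lookup also has to take the single table lock, so besides the memory growth there is one extra point of contention shared by all users.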

One approach would be to clean up the table when no-one is using a specific lock object, but this is complicated and fraught with its own threading problems.

After a few false starts, I came up with the following scheme which has now been tested in the wild and been found quite effective as a trade-off between memory and lock contention.

In actual fact, there is usually a rather limited number of possible concurrent actions, limited basically by the number of threads that are active. This number is typically 100 per processor in ASP.NET, and in most applications, even with many users, the number of actual concurrent requests at any given time is even lower. So, assuming 100 concurrent threads, and assuming that they will only acquire one user lock (our example here) at a time, we really only need at most 100 lock objects - not a million. But how to do this?

The algorithm I've come up with is probably not new, but I've not seen it before in the literature nor when actively searching on the web, so it's at least a bit unusual. Here's how it works:

1. Allocate an array with a fixed number of elements, perhaps twice the number of estimated concurrent accesses.
2. Fill the array with objects to be used as locks.
3. To acquire a lock for a given user, take a hash of the user identifier, typically with the GetHashCode() method, and then take that modulo the number of lock objects. The result is the index into the lock table; lock on the object found at that index.

At best, you'll get a free lock and acquire the lock.

At second best, another thread is actually holding the lock for the same user, and your thread is put on hold as it should be.

At worst, another thread is actually holding the lock but for a different user that happens to use the same lock when calculating the index via the hash modulo the lock array size. By having good hash algorithms and an appropriate number of locks in relation to the number of concurrent accesses, this should be a very infrequent occurrence. But even if it happens, nothing bad happens except that your thread will have to wait a little longer than was absolutely necessary.

This simple algorithm will require a fixed number of locks in relation to the level of concurrency instead of the number of potential objects that require locking, and at a very low cost. Sample code follows:

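    using System;

    // A fixed pool of lock objects, shared by all keys (users) via hashing.
    public class LockObjects
    {
        private readonly object[] _lockObjects;

        public LockObjects(int numberOfLockObjects)
        {
            _lockObjects = GetInitialLockObjects(numberOfLockObjects);
        }

        // Default pool size; aim for roughly twice the expected concurrency.
        public LockObjects()
            : this(20)
        {
        }

        private object[] GetInitialLockObjects(int numberOfLockObjects)
        {
            object[] lockObjects = new object[numberOfLockObjects];
            for (int i = 0; i < lockObjects.Length; ++i)
            {
                lockObjects[i] = new object();
            }
            return lockObjects;
        }

        // Maps one or more string keys to a lock object from the pool.
        // Keys are lower-cased first, so they match case-insensitively.
        public virtual object GetLockObject(params string[] keys)
        {
            int lockHash = 0;
            foreach (string key in keys)
            {
                // The sum may wrap around, which is harmless for a hash.
                unchecked
                {
                    lockHash += key.ToLowerInvariant().GetHashCode();
                }
            }
            // Clear the sign bit instead of calling Math.Abs(), which throws
            // an OverflowException when the value is int.MinValue.
            lockHash = (lockHash & int.MaxValue) % _lockObjects.Length;

            return _lockObjects[lockHash];
        }
    }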
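As a usage illustration, here is a sketch of guarding a read-modify-write cycle with the shared lock table. The UserAccount and VisitRecorder classes and the pool size of 200 are hypothetical, invented for this example. A condensed copy of the lock table is included only so that the snippet compiles on its own.

```csharp
using System;

// Condensed copy of the lock table, so this sketch is self-contained.
public class LockObjects
{
    private readonly object[] _lockObjects;

    public LockObjects(int numberOfLockObjects)
    {
        _lockObjects = new object[numberOfLockObjects];
        for (int i = 0; i < _lockObjects.Length; ++i)
        {
            _lockObjects[i] = new object();
        }
    }

    public object GetLockObject(params string[] keys)
    {
        int lockHash = 0;
        foreach (string key in keys)
        {
            unchecked { lockHash += key.ToLowerInvariant().GetHashCode(); }
        }
        return _lockObjects[(lockHash & int.MaxValue) % _lockObjects.Length];
    }
}

// Hypothetical per-user state that needs single-threaded updates.
public class UserAccount
{
    public string Email;
    public int Visits;
}

public static class VisitRecorder
{
    // One table for the whole application; 200 is an assumed pool size.
    private static readonly LockObjects UserLocks = new LockObjects(200);

    public static void RecordVisit(UserAccount account)
    {
        // All threads working on the same e-mail address get the same lock
        // object; threads working on other users usually get another one.
        lock (UserLocks.GetLockObject(account.Email))
        {
            int current = account.Visits;   // read
            account.Visits = current + 1;   // modify and write
        }
    }
}
```

The table must be shared by every thread that touches the user, which is why it is held in a static field here. The lock only protects state that belongs to that one user, since two different users will normally be holding different lock objects at the same time.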


General | ASP.NET | C# | Programming

Book Review: XSLT 2.0 Programmer's Reference Third Edition

by svante@axantum.com 2007-11-26 18:18

XSLT 2.0 Programmer's Reference Third Edition, by Michael Kay, Wiley Publishing, Inc., ISBN 0-7645-6909-0

XSLT, or XSL, is a subject that I'm no expert in, but I've come across it from time to time and have generally had a hard time really grasping the how and the why of it. In most cases I can program, or at least tweak, just about anything with very little introduction. Fixing and tweaking the XSLT stylesheets that I've come upon has been a tougher experience, where I've felt myself reduced to guesswork and magic. That's not a feeling I like, so I decided to do some background studying.

A Programmer's Reference is perhaps not the first choice as an introduction to a subject, but in this case it was hard to find just where to start, and I felt that I was experienced enough to go for some core literature from the start, which also would have the benefit of being useful in a real situation as reference literature.

Since I'm a newcomer to XSLT this review will have to be both about the book as such, and also about the subject matter. Let's start with the book.

Michael Kay is certainly an authority, being the editor of the W3C XSLT 2.0 specification. The book is also authoritative and extremely carefully written, with an extraordinary focus on details. I did find a few typos, errors and editorial mistakes, but taking the amount of text into account it's still a very, very good piece of work.

This book is not written to be read cover to cover, which is what I did, but that's still not a bad way to get a thorough introduction to XSLT. Be prepared for quite a few hours though; I spent about 20 hours reading this book. It's entitled XSLT 2.0, and was written before XSLT 2.0 was actually approved as an official recommendation, which happened on 23 January 2007. I've not checked, but there are sure to be some minor differences between the final recommendation and the drafts upon which the book was based. As a consequence of it being such a recent standard, there are very few XSLT 2.0 compliant implementations in existence, so XSLT 1.0 is still very much in use. The book is careful to keep track of differences and changes, and should work well for XSLT 1.0 use as well.

It's very heavy reading indeed, but if you only want to get one book about XSLT 2.0 this is very probably the one to get.

The real question though, which I must raise after reading this and getting a good feel for XSLT, is: Do you want to get any book about XSLT at all?

XSLT is about XML transformation, or actually transformation in general. It doesn't really have to be from or to XML; it can be from plain text to HTML or any number of other combinations, depending on the requirements and capabilities of the parsers and processors available. This is obviously extremely useful - to be able to massage data from and to different forms - and is frequently used in one-off applications and in various integration projects. The target use of XSLT is also to fit in alongside CSS 2.0 as a way to perform formatting for presentation that is not possible with CSS - that's why it's called Extensible Stylesheet Language Transformations.

So XSLT certainly addresses an important area. However, sadly, I must conclude that it's not a very good tool in my opinion. Even if supplemented with good development environments with color-coding and syntax-checking editors, it's still simply not very human eye-friendly. Too many angle brackets and colons, one might say. Syntax does matter! The real problem though is that it's a functional programming language, not a procedural language, and this simply does not lend itself to performing complex tasks in the real world.

Functional languages focus on defining the program in terms of functions that are state-less and without variables. Everything is defined as functions without side-effects, that is to say, each call to a function with the same parameters will always return the same result. Iteration is replaced with recursion - even when iteration is the natural way to address a problem, because an iterator must be updated for each round in a loop, and you can't do that. This means that while anything can be programmed in a functional language, it must frequently be done in ways that are not well known to the majority of developers.
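To illustrate the difference in style, here is a small sketch of the same task, summing a list of numbers, written both ways. It is written in C# rather than XSLT, to match the other sample code here, and the class and method names are made up for the example. In XSLT the second form would typically be a named template or function calling itself.

```csharp
using System;

public static class SumExamples
{
    // Procedural style: a loop counter and an accumulator are updated
    // in place on every round of the loop.
    public static int SumIterative(int[] values)
    {
        int total = 0;
        for (int i = 0; i < values.Length; ++i)
        {
            total += values[i];
        }
        return total;
    }

    // Functional style: nothing is ever reassigned. The sum is defined as
    // the first remaining value plus the sum of the rest, and an empty
    // remainder sums to zero.
    public static int SumRecursive(int[] values, int start)
    {
        return start >= values.Length
            ? 0
            : values[start] + SumRecursive(values, start + 1);
    }
}
```

Both return the same result for the same input, for instance SumExamples.SumRecursive(numbers, 0) for an int array called numbers. The recursive form is the one that tends to feel unfamiliar to developers used to loops.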

There's a reason why functional languages like Lisp, ML and Scheme have not become commercially successful, although they have been loved by the academic community for decades. Basically I think it's a question of maintenance and complexity. In the real world of commercial programming, systems must be maintained for decades by perhaps thousands of different developers over the years. This has always been an uphill task, but no functional language, with the possible exception of Erlang, has succeeded in combining expressive power with robustness, documentability and maintainability.

XSLT is certainly expressive, but I categorize it as being of the class of write-once and write-only languages. Integrated with XPath 2.0, it's possible to write programs that are so smart that even the author will have trouble understanding them the next day.

There's nothing wrong with the basic concept of defining a standard way of transforming documents between different representations, and making it possible to choose between doing the processing in the browser or on the server. It's neat and it's cool. However, doing anything but trivial transformations is a maintenance nightmare. That the functional programming model is very little known among main-stream developers does not make it any better.

Somehow, it feels like XSLT is 90% geared towards the internal needs of the W3C - it's used extensively to format and publish the various specifications published by the W3C. But this actually means, as far as I can judge, that the specifications are written in raw XML using plain text editors and doing the markup manually - something that won't exactly work for any other organization.

So, unfortunately, in the end I feel that XSLT 2.0 is a technology that's elegant, but will never be used on a wider scale. However, if you do have the situation of having many, many documents in some kind of structured format (not necessarily XML, surprisingly enough) and want to transform them to XML or an XML-like format like HTML, then XSLT may well be just what you need. Be prepared for a very high entrance cost though, and rest assured that as author of the stylesheets you'll have a very high level of job-security.

There are also serious performance issues with XSLT. Due to the functional style of programming, compilers and optimizers have a hard time generating decent code for the underlying procedural architecture of our computers. In theory, functional programming could come into its own performance-wise as multi-core architectures become more common, because it does make it easier to realize parallel computation, but today other problems overshadow this, and I'm fairly sure that in many cases performance with XSLT will be unacceptable.

So, to summarize: if you want to learn and use XSLT 1.0 or 2.0, this book is probably the one to get, but you should not assume that XSLT is a silver bullet for XML transformation. There are many caveats.


General | ASP.NET | C# | Programming

Programming it real

Name of author

When I'm not riding my bike, I keep fairly busy trying to make a living as a self-employed programmer.
