Update: Getting Started with OpenSL on Android

A while ago I made a post about the use of OpenSL on Android. That post has an error, outlined below:

While the OpenSL documentation for the SL_PLAYEVENT_HEADATEND event seems to suggest that the sound has finished playing when the event is fired, this isn’t actually the case, at least on Android. That event fires when the sound’s underlying buffer has been processed, but that’s different from the sound having actually been played out of the speakers. In most cases, the difference is negligible.

However, when playing sounds at certain frequencies (which likely vary from system to system, as I believe it’s certain multiples of the hardware’s native output sampling rate), the playback engine will render the audio in a mode where it will process the source data into some sort of intermediate buffer and then play the audio back from that. In that case, the SL_PLAYEVENT_HEADATEND event will be delivered a significant fraction of a second before the audio makes it to the speakers. Stopping the player immediately will, in those cases, clip off the end of your sound.

Unfortunately, there’s no nice way to work around this. The correct solution is to first keep track of when you started playing the sound:

#ifdef TARGET_ANDROID
(*player_buf_q)->Enqueue( player_buf_q, clip_buffer, clip_size );
#endif
 
is_playing = true;
is_done_buffer = false;
 
(*player)->SetPlayState( player, SL_PLAYSTATE_PLAYING );
play_time = current_time();

and to then only stop the player when you have both received the SL_PLAYEVENT_HEADATEND event and enough time has elapsed for the sound to play plus a few milliseconds (1-2ms seems sufficient) to account for latency within the audio pipeline:

if( is_playing && is_done_buffer &&
    current_time() - play_time > clip_length + two_milliseconds )
{
    (*player)->SetPlayState( player, SL_PLAYSTATE_STOPPED );
 
#ifdef TARGET_ANDROID
    (*player_buf_q)->Clear( player_buf_q );
#endif
 
    is_playing = false;
}

Make sure you implement current_time using a monotonic system-time based timer. You don’t want to time this against a clock which might jump around if the user crosses between time zones or otherwise tinkers with the system clock.
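For context, the is_done_buffer flag gets flipped from the player’s event callback. A minimal sketch, assuming the callback was registered against the SLPlayItf with RegisterCallback and an event mask that includes SL_PLAYEVENT_HEADATEND (both standard OpenSL ES calls):

//registered earlier with something like:
//  (*player)->RegisterCallback( player, play_callback, NULL );
//  (*player)->SetCallbackEventsMask( player, SL_PLAYEVENT_HEADATEND );
 
static void play_callback( SLPlayItf caller, void *context, SLuint32 event )
{
    if( event & SL_PLAYEVENT_HEADATEND )
        is_done_buffer = true; //see the note on data races below
}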

One final note: the original code technically has a data race in it. The is_done_buffer variable is accessed from multiple threads and isn’t protected in any way. The code, as written, shouldn’t normally produce errors, but it might start misbehaving under newer compilers or at higher levels of optimization. If you’re writing your code in C++, I’d strongly recommend redeclaring it as std::atomic<bool>.


WebGL Index Validation

If you’ve ever browsed through the WebGL spec, you’ve likely seen section 6: Differences Between WebGL and OpenGL ES 2.0. Right at the top of that section, we find section 6.1: Buffer Object Binding. That section reads as follows:

In the WebGL API, a given buffer object may only be bound to one of the ARRAY_BUFFER or ELEMENT_ARRAY_BUFFER binding points in its lifetime. This restriction implies that a given buffer object may contain either vertices or indices, but not both.

The type of a WebGLBuffer is initialized the first time it is passed as an argument to bindBuffer. A subsequent call to bindBuffer which attempts to bind the same WebGLBuffer to the other binding point will generate an INVALID_OPERATION error, and the state of the binding point will remain untouched.

This is in stark contrast to the language in the glBindBuffer documentation for both OpenGL and OpenGL ES:

Once created, a named buffer object may be re-bound to any target as often as needed. However, the GL implementation may make choices about how to optimize the storage of a buffer object based on its initial binding target.

The reason for the discrepancy is security, or rather the lack of security in most OpenGL (ES included) implementations. The basic OpenGL standard states that out-of-range access of any resource type results in undefined behavior, and performance-minded implementers the world over historically took this to mean that it’s OK to crash the process or even the entire operating system when given invalid indices. (I still remember a few years back when I could consistently blue-screen my computer with a given combination of small vertex buffers and an index buffer with many huge index values in it.)

While the situation’s been improving ever since DX10-grade hardware started coming out (Microsoft mandated deterministic non-crashing behavior for out of range access on DX10-level hardware, and that safety net’s been leaking into GL implementations ever since), we’re still not at a point where OpenGL implementations could be considered secure against DoS (or worse) attacks. (In fact, newer OpenGL specifications make safe array access an optional feature.)

Since web browsers deal with inherently untrustworthy content and have to support all sorts of GL implementations (everything from old implementations that didn’t care at all about security to newer implementations with buggy safety features), the WebGL specification mandates that browsers strictly validate all input data before it’s sent to the graphics driver. And that includes the contents of all index (ELEMENT_ARRAY) buffers.

The restriction in 6.1 exists to make it easier for WebGL implementers to validate input indices and cache the results of that validation.
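In practice, the restriction plays out like this (a sketch, assuming gl is a WebGLRenderingContext):

const buf = gl.createBuffer();
 
gl.bindBuffer(gl.ARRAY_BUFFER, buf);         //buf is now a vertex buffer for life
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, buf); //INVALID_OPERATION; binding unchanged
 
console.log(gl.getError() === gl.INVALID_OPERATION); //true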

You can find further discussion of the issue here, on the WebGL mailing list.

WebGL makes other restrictions in the name of security as well. Check out section 4 of the spec for details.


HenchLua: Representing Values

Values

Lua supports the following standard types:

  • Nil
  • Booleans
  • Numbers (doubles)
  • Strings (that is, byte-strings)
  • Tables
  • UserData (arbitrary object references)
  • Functions (both Lua functions and callable user code)
  • Threads (coroutine state)

Similarly to .NET types, that list can be split into reference and value types. Nil, booleans, and numbers are true value types, while strings behave like value types due to their immutability. The rest are reference types. Now, since any variable may be of any type, and since that type may change dynamically, the backing storage for values needs to accept all of the above types.

Values in Standard Lua

In standard Lua this is done using a tagged union, which looks something like this (not a direct cut from the Lua source, I’ve evaluated some of the macros and typedefs and reformatted for clarity – this may also come out differently on other architectures):

struct lua_TValue
{
    union
    {
        struct
        {
            union Value
            {
                GCObject *gc;    /* collectable objects */
                void *p;         /* light userdata */
                int b;           /* booleans */
                lua_CFunction f; /* light C functions */
            } v__;
            int tt__;
        } i;
        double d__;
    } u;
};

Alright, so let’s make sense of this. A Lua value is a union of a double (d__, for storing number values) and a struct (i) which is a combination of fields for storing all the other possible types of values (v__) and a field (tt__) which keeps track of the value’s actual type.

The tt__ field’s position in the i struct and its values are all carefully chosen such that the useful number values and non-number types are distinguishable (if you try to read a non-number as a number you’ll see some kind of NaN, and the Lua VM asserts on arithmetic operations that produce NaNs).

This makes lua_TValue eight bytes long, which is a wonderfully efficient state of affairs.

Values in HenchLua

Unfortunately, there’s no way to match the 8-byte value type in C# without boxing (which is counter to one of the primary design goals – being nice to the GC). So what can be done? A naive approach would be to create a struct with a field for each of the value types and a field of type object for the reference types, along with yet another field to act as the equivalent of tt__.

 
public enum ValueType
{
    // ...
}
 
public struct Value
{
    private object asRefType;
    private double asNumber;
    private bool asBool;
    private ValueType type;
}

Unfortunately, that Value type would be somewhere around twenty bytes long (alignment, padding – let’s not even bring up x64). That’s unacceptably large.

My first attempt to reduce Value‘s size was the FieldOffset attribute, which is the obvious way to implement unions in C#. I didn’t have much success with that approach. For one thing, the object field cannot be overlaid over the other fields (just imagine the havoc it would play with the GC), so all I have to play with are asNumber, asBool, and type. While that does indeed bring our struct size down to twelve bytes (which happens to be optimal), it’s brittle since I can’t actually put the type field where I want it on all platforms – there’s no way to dynamically compute the offset values to account for differences in endianness and alignment between architectures, and there goes the goal of working on all sorts of .NET runtimes.

So I took a step back and looked at the fields one at a time. The first thing that struck me was that asBool could easily be replaced by reinterpreting asNumber – zero is false, one is true. The annoying thing about that was that I’d constantly be loading and testing an 8-byte register when there’s really only one bit’s worth of data around.

But these tests would always come after testing type. So the first change I made was to split ValueType.Bool into ValueType.True and ValueType.False (naturally this could be hidden behind a public interface that only exposes a Bool enumerant, to keep things simple for external code).
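The intermediate enumeration might have looked something like this (a sketch, not HenchLua’s actual declaration):

public enum ValueType
{
    Nil,
    True,
    False,
    Number,
    String,
    Table,
    UserData,
    Function,
    Thread,
}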

After that change, all that was left to eliminate was the type field. I already knew that overlaying it over asNumber wouldn’t work, so all that was left was somehow overlaying it over asRefType. Sentinel object instances to the rescue:

public struct Value
{
    internal object RefVal;
    internal double NumVal;
 
    internal static readonly object NumTypeTag = new object();
}
 
internal sealed class BoolBox
{
    public readonly bool Value;
    private BoolBox( bool value ) { Value = value; }
 
    public static readonly BoolBox True = new BoolBox( true );
    public static readonly BoolBox False = new BoolBox( false );
}

The final semantics are fairly straightforward. RefVal always carries the type information. For reference types, the already existing .NET type info is sufficient to identify the actual value. For true value types, I either use a preallocated and immutable boxed value (for booleans), or I use the sentinel value Value.NumTypeTag (which tells us that the actual value is in the NumVal field). And null obviously means nil.
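In code, those semantics boil down to something like the following accessors on Value (a sketch, not a direct cut from HenchLua):

internal void Set( double value )
{
    RefVal = NumTypeTag;
    NumVal = value;
}
 
internal void Set( bool value )
{
    RefVal = value ? BoolBox.True : BoolBox.False;
}
 
public bool IsNil { get { return RefVal == null; } }
public bool IsNumber { get { return RefVal == NumTypeTag; } }
public bool IsBool { get { return RefVal is BoolBox; } }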

Strings

Lua strings are byte arrays. That’s unfortunate, because it means the standard System.String type can’t be used directly. So, first step in writing a custom type is gathering requirements. Lua strings are:

  • Byte arrays – they can readily contain embedded zeroes
  • Immutable
  • Very often used as keys to a hashtable
  • Sometimes used to hold large blocks of data
  • Reference types

So our implementation needs to be compact and quick to compare. The naive approach follows:

class LString
{
    private byte[] data;
 
    //cache the hash code to keep things snappy
    private int hashCode;
}

Unfortunately, due to how ubiquitous strings are in Lua, this type violates some of our fundamental requirements. First, it’s actually two objects, and that wastes memory since each object has some overhead. Second, it adds a level of indirection: when we need the string data (say, to compare values) we first need to load the LString object and only then can we read the data object. That’s up to two cache misses where there should only be one, which is relevant in a tight loop like the one at the heart of the GC’s marking phase.

Fortunately, while strings are reference types, their immutability makes them behave like value types, which allows us to expose the public interface through a struct, with no loss of clarity, while internally handling the data as a byte array:

struct LString
{
    //the first four bytes contain the hash of the remaining data
    internal byte[] InternalData;
 
    public bool IsNil { get { return InternalData == null; } }
    public int Length
    {
        get
        {
            return InternalData != null ? InternalData.Length - 4 : 0;
        }
    }
 
    // ...
}
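Construction, then, might go something like this (a sketch – ComputeHash stands in for whatever hash function is actually used):

public LString( byte[] data )
{
    InternalData = new byte[data.Length + 4];
    Array.Copy( data, 0, InternalData, 4, data.Length );
 
    int hash = ComputeHash( data ); //hypothetical helper
 
    InternalData[0] = (byte)hash;
    InternalData[1] = (byte)(hash >> 8);
    InternalData[2] = (byte)(hash >> 16);
    InternalData[3] = (byte)(hash >> 24);
}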

When constructing a Value which contains a string, we don’t box the LString struct, we just grab the InternalData field directly. A RefVal of type byte[] is understood to mean string.

But what if the user gives us a byte[] as user data? This is probably a rare case (relative to how ubiquitous strings are in Lua), so we handle it by allocating a small proxy object around the user data. This is hidden from the user, so the library interface stays simple.

A small aside: I had originally named the type String, which was fine and worked well inside the HenchLua namespace. However, outside that namespace, it was constantly conflicting with System.String, and after the fourth or fifth time I wrote using LString = HenchLua.String; I decided to just rename the type for my sanity’s sake.

Callables

Callable Lua objects are represented using any of the following types:

public abstract class Function { /* ... */ }
internal class Proto : Function { /* ... */ }
internal class Closure : Function { /* ... */ }
public abstract class UserFunction : Function { /* ... */ }
 
public delegate int UserCallback( Thread thread );

That last one, the delegate, complicates things for us. In all the other cases, it would be enough to treat types derived from Function specially. However, delegates are too useful to ignore (particularly since they can be cleanly constructed around lambdas, and anonymous and static methods).

I use a trick similar to the LString struct to keep things clear. The Callable struct wraps either a Function or a UserCallback in the public interface, while the raw object values are passed around internally without any boxed or proxy objects being in the way.
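By analogy with LString, Callable might look something like this (a sketch):

public struct Callable
{
    internal object Val; //a Function or a UserCallback
 
    public Callable( Function f ) { Val = f; }
    public Callable( UserCallback f ) { Val = f; }
}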

Table

And, finally, this brings me to Lua’s core structured type: the table. I’m not going to go into too much detail on the underlying algorithms here, as Table is a fairly direct port of Lua’s implementation. If you’re curious, either the Table code or Lua’s luaH_* functions will tell you everything you need to know, though Table.cs might be easier for someone new to Lua to understand since nothing is buried in macros.

Table mainly differs from the standard Lua implementation in its interface. Since tables are ordinary .NET objects, with no reliance on any sort of global state, there’s no need to hide them behind Lua’s cumbersome stack interface. Raw access is directly exposed as an indexer. (Non-raw access requires an execution context, in case metamethods need to be called, and must therefore be done through a Thread object.)

The only interesting implementation detail is the way table members are stored. The thing with tables is that their storage is often rather sparse (especially in the hashtable part), and using full Value types would be wasteful. Tables instead use the internal CompactValue type, which works like Value except that it doesn’t have a separate NumVal field. Instead, numbers are boxed.

One thing to note, however, is that they aren’t boxed using the standard .NET boxing mechanism. The reason for this is that I wanted to reuse the boxes when possible, to keep allocation pressure low, and the standard boxes are impossible to efficiently mutate in C#.

This is the main exception to the “don’t allocate where standard Lua wouldn’t” rule. Lua can allocate memory when setting nil fields to non-nil values. HenchLua can, in addition, allocate when setting non-number fields to number values. However, updating a number value won’t allocate additional boxes. That compromise was made to save memory and to make the GC run a little faster when scanning a table’s fields, and I think it’s a net win since, in my experience, the type of a table element doesn’t often change dynamically in common Lua code.
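For illustration, the mutable box could be as simple as this (a hypothetical sketch):

//unlike a standard .NET box, the Value field can be
//overwritten in place when a number field is updated
internal sealed class NumBox
{
    public double Value;
    public NumBox( double value ) { Value = value; }
}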


Introducing HenchLua

This is the first of a series of posts on the subject of HenchLua. HenchLua is an implementation of the Lua VM in C#. It’s targeted at projects running in otherwise limited .NET contexts, such as web-based Unity games (the Unity plugin, I believe, requires pure verifiable CIL), mobile apps (which are memory-limited and must meet the limitations of Mono’s full AOT compiler), or apps that run on the .NET Compact Framework (whose garbage collector has some serious performance issues, as anyone who’s written an XNA game targeted at the Xbox can attest).

Studying the standard Lua runtime and reimplementing it in a fundamentally different environment has been an enlightening (and at times maddening) experience. I’m writing this series to share some of the insights I’ve had along the way, both with respect to .NET programming and in relation to the standard Lua implementation.

Design Goals

Unlike KopiLua, which aims for the highest possible degree of compatibility with standard Lua, HenchLua’s first goal is efficiency, followed closely by a useful degree of compatibility with the standard. To that end, I’ve made a number of compromises. So, what exactly does that mean?

First, HenchLua is designed to be gentle to the garbage collector. HenchLua avoids transient objects at all costs (as even small and short-lived allocations can trigger expensive collection cycles at inopportune times on some .NET runtimes). The rule is simple: if standard Lua doesn’t touch the heap when executing a given operation, then (apart from a few limited exceptions) it’s a bug for HenchLua to do otherwise. What this means for the user is that if you’re careful about how you structure your scripts, you can be reasonably sure that HenchLua won’t be the trigger of an unexpected collection cycle.

Further, when a collection cycle does happen, HenchLua does its best to maintain a minimal impact. This is mainly achieved by keeping the object graph small and simple. So in addition to avoiding the construction of ephemeral objects, HenchLua also avoids creating wrapper objects (read: bloat) and unnecessary references among objects (read: cache misses). And while we’re on the subject of garbage collection, HenchLua directly uses the .NET GC. Apart from avoiding the silliness of implementing a garbage collector in a garbage-collected language, this immensely reduces the number of inter-object references.

In addition, HenchLua compiles to pure, verifiable, CIL, it needs no special permissions to run, and it avoids advanced features of the .NET framework. As awesome as it would have been to use Reflection.Emit or Expression.Compile, those techniques don’t work on the Compact Framework or with Mono’s AOT compiler, and broad compatibility is definitely a goal.

Of course, some of these goals complicate the implementation of the VM. Fortunately, the situation isn’t all that bad since Lua is incredibly simple to begin with.

The API is also vastly different from Lua’s. Since Lua objects and .NET objects live in the same conceptual memory space and are both subject to the same garbage collector, there’s no need to firewall Lua objects behind the standard runtime’s stack API. Lua objects are directly accessible to .NET code, to the extent that HenchLua’s Table type can almost be used like a Dictionary<Value, Value> (there are some semantic differences concerning the way nil keys and values are treated).
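That is, usage roughly along these lines (a hypothetical sketch – the exact Value constructors are glossed over here):

//hypothetical usage sketch
var t = new Table();
 
t[new Value( 1.0 )] = new Value( "one" );
Value v = t[new Value( 1.0 )];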

The only exception to this is Lua’s function objects. While strings and tables can be directly constructed and manipulated, Lua functions can’t be called directly. There’s a good deal of state that needs to be tracked when running a Lua function, and for that we have the Thread object, whose job it is to execute the Lua bytecode contained in Lua Function objects.

What Works

HenchLua is a work in progress. As of today, you can set up an environment with (parts of) the Lua standard library and your own callbacks, load compiled bytecode with respect to that environment, and execute that code – provided that it doesn’t rely on missing features. The reality is that HenchLua is being developed as part of a larger project, and it’s getting features on an as-needed basis, so apart from the core VM, the current feature set is somewhat eclectic.

Don’t be too put off by the missing features. The codebase is clean, and it’s not very difficult to add the missing bits in. Furthermore, as limited as the current feature set is, it’s sufficient to run some fairly complex code. If you don’t rely too heavily on features in the What’s Missing list, chances are HenchLua would be useful to you, even in its current state, with only minimal effort.

What’s Missing

A lot is missing at this stage. Most notably, the following are absent:

  • Coroutines
  • Metamethods that aren’t __index
  • The Lua compiler (HenchLua loads bytecode produced by the standard Lua 5.2 compiler)
  • Weak keys and values (HenchLua uses the .NET GC directly – working around it to implement these features would be burdensome, to say the least)
  • Some of the implicit string-number conversions
  • The debug libraries (Debug info is loaded, and it’s even useful in a debugger, but the routines to parse it and implement the actual debug API haven’t been ported)
  • Most of the standard libraries

Moving Onward…

That’s about it for the introduction. Next time: on to the implementation!

  1. Representing Values

Bicubic Filtering in Fewer Taps

Author’s Note

This post is based on the technique described in GPU Gems 2, chapter 20, Fast Third-Order Texture Filtering. While that’s certainly a good read, I found that the authors skipped over a lot of detail and optimized a little prematurely, making the result rather difficult to parse. If you’ve read and understood their paper, then this isn’t going to be news to you.

Bicubic Filtering Has a Terrible Worst Case

The trouble with naively implementing bicubic filtering in a shader is that you end up doing sixteen texture taps. That’s rather inefficient. A simple improvement might be to separate the horizontal and vertical passes, bringing it down to eight taps; however, you then incur an extra render target swap, as well as having to come up with the memory for that extra render target, which can be a deal-breaker on some architectures.

Exploiting Bilinear Filtering

However, GPUs come with texture sampling hardware that takes and blends four individual taps at once. We call this bilinear filtering, and it’s the most commonly used texture filtering in 3D. By carefully selecting our coordinates, we can take up to four of our taps at once, bringing the best case for bicubic filtering down to four taps – even better than separating the filter out into horizontal and vertical passes.

The rest of this post will show how to exploit bilinear filtering to implement a 4-tap B-Spline bicubic filter and a 9-tap Catmull-Rom filter.

The Reference Implementation

How does this work?

Let’s start with the naive implementation:

Texture2D g_Tex; //the texture we're zooming in on
SamplerState g_Lin; //a sampler configured for bilinear filtering
 
float4 ps_main( float2 iTc : TEXCOORD0 )
{
    //get into the right coordinate system
 
    float2 texSize;
    g_Tex.GetDimensions( texSize.x, texSize.y );
    float2 invTexSize = 1.0 / texSize;
 
    iTc *= texSize;

This bit could be replaced with a couple of multiplications and some global uniforms. I’m including it here so it’s utterly clear what coordinate space we’re in, as that’s very important.

    //round tc *down* to the nearest *texel center*
 
    float2 tc = floor( iTc - 0.5 ) + 0.5;

The one-half offsets are important here. We’re doing our own filtering here, so we want each of our samples to land directly on a texel center, so that no filtering is done by the hardware, even if our sampler is set to bilinear.

    //compute the fractional offset from that texel center
    //to the actual coordinate we want to filter at
 
    float2 f = iTc - tc;
 
    //we'll need the second and third powers
    //of f to compute our filter weights
 
    float2 f2 = f * f;
    float2 f3 = f2 * f;
 
    //compute the filter weights
 
    float2 w0 = //...
    float2 w1 = //...
    float2 w2 = //...
    float2 w3 = //...

Remember, we’ve got two sets of four weights. One set is horizontal, one vertical. We can generally compute the corresponding horizontal and vertical pairs at once.

So w0.x is the first horizontal weight, w0.y is the first vertical weight. Similarly, w1.x is the second horizontal weight, and so on.

The actual weight equations vary depending on the filtering curve you’re using, so I’m just going to omit that detail for now.

We also need to compute the coordinates of our sixteen taps. Again, these are separable in the horizontal and vertical directions, so we just have four coordinates for each, which we’ll combine later on:

    //get our texture coordinates
 
    float2 tc0 = tc - 1;
    float2 tc1 = tc;
    float2 tc2 = tc + 1;
    float2 tc3 = tc + 2;
 
    /*
        If we're only using a portion of the texture,
        this is where we need to clamp tc2 and tc3 to
        make sure we don't sample off into the unused
        part of the texture (tc0 and tc1 only need to
        be clamped if our subrectangle doesn't start
        at the origin).
    */
 
    //convert them to normalized coordinates
 
    tc0 *= invTexSize;
    tc1 *= invTexSize;
    tc2 *= invTexSize;
    tc3 *= invTexSize;

And finally, we take and blend our sixteen taps.

    return
        g_Tex.Sample( g_Lin, float2( tc0.x, tc0.y ) ) * w0.x * w0.y
      + g_Tex.Sample( g_Lin, float2( tc1.x, tc0.y ) ) * w1.x * w0.y
      + g_Tex.Sample( g_Lin, float2( tc2.x, tc0.y ) ) * w2.x * w0.y
      + g_Tex.Sample( g_Lin, float2( tc3.x, tc0.y ) ) * w3.x * w0.y
 
      + g_Tex.Sample( g_Lin, float2( tc0.x, tc1.y ) ) * w0.x * w1.y
      + g_Tex.Sample( g_Lin, float2( tc1.x, tc1.y ) ) * w1.x * w1.y
      + g_Tex.Sample( g_Lin, float2( tc2.x, tc1.y ) ) * w2.x * w1.y
      + g_Tex.Sample( g_Lin, float2( tc3.x, tc1.y ) ) * w3.x * w1.y
 
      + g_Tex.Sample( g_Lin, float2( tc0.x, tc2.y ) ) * w0.x * w2.y
      + g_Tex.Sample( g_Lin, float2( tc1.x, tc2.y ) ) * w1.x * w2.y
      + g_Tex.Sample( g_Lin, float2( tc2.x, tc2.y ) ) * w2.x * w2.y
      + g_Tex.Sample( g_Lin, float2( tc3.x, tc2.y ) ) * w3.x * w2.y
 
      + g_Tex.Sample( g_Lin, float2( tc0.x, tc3.y ) ) * w0.x * w3.y
      + g_Tex.Sample( g_Lin, float2( tc1.x, tc3.y ) ) * w1.x * w3.y
      + g_Tex.Sample( g_Lin, float2( tc2.x, tc3.y ) ) * w2.x * w3.y
      + g_Tex.Sample( g_Lin, float2( tc3.x, tc3.y ) ) * w3.x * w3.y;
}

Again, this bears repeating: it doesn’t matter that g_Lin is set for bilinear filtering. All of these taps are landing dead center on a single texel, so no filtering is being done in any of them.

Collapsing Adjacent Taps

OK. So, starting with that. What have we got? Well, as mentioned, these filters are fully separable, so we can carry right on treating both dimensions identically, and things will just work. So let’s keep things simple and work with just one dimension for now.

We’re blending the values of four adjacent texels $T$, at offsets $-1$, $0$, $+1$, and $+2$. Let’s call these values $T_{-1}$, $T_0$, $T_{+1}$, and $T_{+2}$ (these are sampled from our texture at tc0.x, tc1.x, etc – the subscripts correspond to the offsets, I’m just switching to math-friendly notation). Each of those gets multiplied by the corresponding weight, $w_{-1}$, $w_0$, $w_{+1}$, $w_{+2}$. We also know that our weights add up to $1$ (because we’re using a well-behaved weight function).

If we look at just the last two adjacent samples, we’ve got the following:

$$C_{+1,+2} = w_{+1}T_{+1} + w_{+2}T_{+2}$$

Now, if we did a linear (not bilinear, we’re working in 1D at the moment) sample somewhere between those two texels, at coordinate $+(1+t)$ (that’s $t$ units to the right of the offset $+1$, where $0 \le t \le 1$), we’d end up with the following:

$$L_{+(1+t)} = (1-t)T_{+1} + tT_{+2}$$

And that’s pretty close to the equation that we want. We just need to find a $t$ that yields an equivalent expression.

First, we take a look at the weights in the linear blend ($t$ and $1-t$) which clearly add up to $1$, whereas $w_{+1}$ and $w_{+2}$ clearly don’t. To start, we’ll need to scale our weights by some value $s$ so that they have the same property:

$$
\begin{align}
s(w_{+1} + w_{+2}) &= 1 \\
s &= \frac{1}{w_{+1} + w_{+2}}
\end{align}
$$

Playing with these a little more we get:

$$
\begin{align}
sw_{+1} + sw_{+2} &= 1 \\
sw_{+1} &= 1 - sw_{+2}
\end{align}
$$

And that makes $sw_{+2}$ look suspiciously like the $t$ we’re looking for. Plugging it in to check (remembering to multiply the left side of the blend equation by $s$ to match what we did to our weights):

$$
\begin{align}
C_{+1,+2} &= w_{+1}T_{+1} + w_{+2}T_{+2} \\
sC_{+1,+2} &= sw_{+1}T_{+1} + sw_{+2}T_{+2} \\
sC_{+1,+2} &= (1-sw_{+2})T_{+1} + sw_{+2}T_{+2}
\end{align}
$$

Substituting $t=sw_{+2}$:

$$
\begin{align}
sC_{+1,+2} &= (1-t)T_{+1} + tT_{+2} \\
sC_{+1,+2} &= L_{+(1+t)} \\
C_{+1,+2} &= s^{-1}L_{+(1+t)} \\
&= (w_{+1} + w_{+2})L_{+(1+t)}
\end{align}
$$

And we’ve just turned two individual taps into a single linear tap. If we apply this in two dimensions at once, we can turn four taps into a single bilinear tap, reducing the original sixteen-sample shader to a much more manageable four samples:

    //get our texture coordinates
 
    float2 s0 = w0 + w1;
    float2 s1 = w2 + w3;
 
    float2 f0 = w1 / (w0 + w1);
    float2 f1 = w3 / (w2 + w3);
 
    float2 t0 = tc - 1 + f0;
    float2 t1 = tc + 1 + f1;
 
    //convert them to normalized coordinates
 
    t0 *= invTexSize;
    t1 *= invTexSize;
 
    //and sample and blend
 
    return
        g_Tex.Sample( g_Lin, float2( t0.x, t0.y ) ) * s0.x * s0.y
      + g_Tex.Sample( g_Lin, float2( t1.x, t0.y ) ) * s1.x * s0.y
      + g_Tex.Sample( g_Lin, float2( t0.x, t1.y ) ) * s0.x * s1.y
      + g_Tex.Sample( g_Lin, float2( t1.x, t1.y ) ) * s1.x * s1.y;
}

We can also exploit the fact that these weights add up to one to turn most of those multiplies into a trio of lerp calls, if we know our hardware is better at executing those than a few extra multiplications.

And there it is! Bicubic filtering in four taps.

Not so Fast!

Now, we can’t actually go blithely applying this optimization to just any bicubic filter. If you were paying attention, you’ll have noticed that there’s actually a restriction that we must satisfy, or the result will just be wrong. Going back to our example:

$$
\begin{array}{rcccl}
0 &\le& t &\le& 1 \\
0 &\le& sw_{+2} &\le& 1 \\
0 &\le& \frac{w_{+2}}{w_{+1} + w_{+2}} &\le& 1
\end{array}
$$

So we can’t actually apply this optimization unless we know how our weights will vary as our fractional offset ($f$, corresponding to the value f from near the top of our shader) varies from zero to one. So let’s look at some weighting functions:

The B-Spline Weighting Function

The B-Spline weight function is defined as follows:

$$W(d) =
\frac{1}{6}\begin{cases}
4 + 3|d|^3 - 6|d|^2 & \text{for } 0 \le |d| \le 1 \\
(2 - |d|)^3 & \text{for } 1 \lt |d| \le 2 \\
0 & \text{otherwise}
\end{cases}$$

where $d$ is the distance from the sampling point to the texel being weighted.

Now, the piecewise nature of the function makes reasoning about it a little daunting, as do all the absolute values we’re taking, but it’s actually not bad. We’re sampling at four points, and we already know what the distances from our sampling point to those points are, because we computed those sampling points:

$$
\begin{align}
|d_{-1}| &= f + 1 \\
|d_0| &= f \\
|d_{+1}| &= 1 – f \\
|d_{+2}| &= 2 – f \\
\end{align}
$$

And given that $0 \le f \lt 1$, we can see that each weight cleanly falls into one case of the weighting function or another, and its piecewise definition no longer matters:

$$
\begin{align}
w_{-1} &= \frac{1}{6}(1 - f)^3 \\
w_0 &= \frac{1}{6}\left(4 + 3f^3 - 6f^2\right) \\
w_{+1} &= \frac{1}{6}\left(4 + 3(1 - f)^3 - 6(1 - f)^2\right) \\
w_{+2} &= \frac{1}{6}f^3
\end{align}
$$
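In shader terms, those weights expand into polynomials that drop straight into the w0 through w3 slots of the reference implementation. A sketch (here w0 holds $w_{-1}$, w1 holds $w_0$, and so on):

    //compute the B-Spline weights
 
    float2 w0 = (1.0 / 6.0) * (-f3 + 3.0 * f2 - 3.0 * f + 1.0); //(1 - f)^3 / 6
    float2 w1 = (1.0 / 6.0) * (3.0 * f3 - 6.0 * f2 + 4.0);
    float2 w2 = (1.0 / 6.0) * (-3.0 * f3 + 3.0 * f2 + 3.0 * f + 1.0);
    float2 w3 = (1.0 / 6.0) * f3;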

In order to merge the sixteen taps down to four, we need to combine the first pair of taps into a linear tap and the second pair into another linear tap (remember, a 2:1 reduction in 1D becomes a 4:1 reduction in 2D, taking 16 taps down to 4). So we need to prove that $0 \le \frac{w_0}{w_{-1} + w_0} \le 1$ and $0 \le \frac{w_{+2}}{w_{+1} + w_{+2}} \le 1$.

This is easy enough – just go to this awesome graphing calculator and drop in the equations. You’ll see that everything is well behaved over that range, and we can therefore reduce this filter to just four taps.

The Catmull-Rom Weighting Function

This one’s a little trickier. Here’s the definition:

$$W(d) =
\frac{1}{6}\begin{cases}
9|d|^3 - 15|d|^2 + 6 & \text{for } 0 \le |d| \le 1 \\
-3|d|^3 + 15|d|^2 - 24|d| + 12 & \text{for } 1 \lt |d| \le 2 \\
0 & \text{otherwise}
\end{cases}$$

As above, this gives us four functions weighting each of our taps:

$$
\begin{align}
w_{-1} &= \frac{1}{6}\left(-3f^3 + 6f^2 - 3f\right) \\
w_0 &= \frac{1}{6}\left(9f^3 - 15f^2 + 6\right) \\
w_{+1} &= \frac{1}{6}\left(-9f^3 + 12f^2 + 3f\right) \\
w_{+2} &= \frac{1}{6}\left(3f^3 - 3f^2\right)
\end{align}
$$

Unfortunately, plugging these equations into Desmos (yes, it really is worth linking twice – check it out!) quickly shows that we can’t optimize a Catmull-Rom filter down to four taps like we did the B-Spline filter.

$$
\begin{array}{rcl}
\frac{w_0}{w_{-1} + w_0} &\notin& [0, 1] \text{ for } 0 \le f \le 1 \\
\frac{w_{+2}}{w_{+1} + w_{+2}} &\notin& [0, 1] \text{ for } 0 \le f \le 1 \\
\end{array}
$$

Now, all is not lost. The reason those ratios escape from the range we’re interested in is that the outer weights ($w_{-1}$ and $w_{+2}$) are negative, where the rest are positive. This makes the denominator smaller than the numerator, yielding a final value greater than one. However, the middle weights ($w_0$ and $w_{+1}$) are well-behaved and in the range $[0, 1]$. This means that $0 \le \frac{w_{+1}}{w_0 + w_{+1}} \le 1$.

So, in 1D, we can compute the filter in three taps – one for the leftmost texel, one for the center two, and one for the rightmost one. In 2D, that yields nine taps, which is a hell of a lot better than sixteen.
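Concretely, the nine-tap version might look something like this (a sketch reusing the names from the reference shader – w0 through w3 are the Catmull-Rom weights, and only the middle pair gets collapsed into a linear tap):

    //only the middle taps (weights w1 and w2) can be merged
 
    float2 s12 = w1 + w2;
 
    float2 tc0 = (tc - 1) * invTexSize;
    float2 tc12 = (tc + w2 / s12) * invTexSize;
    float2 tc3 = (tc + 2) * invTexSize;
 
    //nine taps: every combination of tc0, tc12, and tc3 in x and y
 
    return
        g_Tex.Sample( g_Lin, float2( tc0.x,  tc0.y  ) ) * w0.x  * w0.y
      + g_Tex.Sample( g_Lin, float2( tc12.x, tc0.y  ) ) * s12.x * w0.y
      + g_Tex.Sample( g_Lin, float2( tc3.x,  tc0.y  ) ) * w3.x  * w0.y
 
      + g_Tex.Sample( g_Lin, float2( tc0.x,  tc12.y ) ) * w0.x  * s12.y
      + g_Tex.Sample( g_Lin, float2( tc12.x, tc12.y ) ) * s12.x * s12.y
      + g_Tex.Sample( g_Lin, float2( tc3.x,  tc12.y ) ) * w3.x  * s12.y
 
      + g_Tex.Sample( g_Lin, float2( tc0.x,  tc3.y  ) ) * w0.x  * w3.y
      + g_Tex.Sample( g_Lin, float2( tc12.x, tc3.y  ) ) * s12.x * w3.y
      + g_Tex.Sample( g_Lin, float2( tc3.x,  tc3.y  ) ) * w3.x  * w3.y;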

Other Optimizations

These filters are separable, and the weighting functions are identical both vertically and horizontally, making it easy to compute both sets of weights in one go:

    //compute the Catmull-Rom weights
 
    float2 w0 = f2 - 0.5 * (f3 + f);
    float2 w1 = 1.5 * f3 - 2.5 * f2 + 1.0;
    float2 w2 = -1.5 * f3 + 2 * f2 + 0.5 * f;
    float2 w3 = 0.5 * (f3 - f2);

We also know that these weights add up to one, so we don’t actually need to compute all four:

    float2 w0 = f2 - 0.5 * (f3 + f);
    float2 w1 = 1.5 * f3 - 2.5 * f2 + 1.0;
    float2 w3 = 0.5 * (f3 - f2);
    float2 w2 = 1.0 - w0 - w1 - w3;

And then there are some repeated multiplications down in our final blend, which can be factored out (though the compiler is probably already doing this for us):

    return
        (g_Tex.Sample( g_Lin, float2( t0.x, t0.y ) ) * s0.x
      +  g_Tex.Sample( g_Lin, float2( t1.x, t0.y ) ) * s1.x) * s0.y
      + (g_Tex.Sample( g_Lin, float2( t0.x, t1.y ) ) * s0.x
      +  g_Tex.Sample( g_Lin, float2( t1.x, t1.y ) ) * s1.x) * s1.y;

Simple Flip Book Animation in WPF

WPF makes it easy to animate numbers, colors, sizes, and a host of other properties. Unfortunately, it isn’t easy to animate an ImageSource property, which is what we’re usually looking for when implementing a flip book animation. The closest we get out of the box is ObjectAnimationUsingKeyFrames, which works, but it’s very tedious to set up all of the individual key frame times.

What we really want is a more specialized animation type, and we’re going to have to make it ourselves. The full code is available right here.

Following the official guidelines, we begin by creating an abstract animation timeline base for ImageSources. This is all pretty much boilerplate, so I’m not going to go into any detail about ImageSourceAnimationBase.

The implementation of our actual FlipBookAnimation class is split into three main sections.

The Properties

The first is the class’s properties. Because these types are Freezable, we need to take care when defining them.

The first thing to note is that, because freezable objects are deeply frozen, all of our properties must themselves be freezable. So we aren’t going to use a simple collection to store our frames. We’re going to use a FreezableCollection<T>. This also affects our actual Frames property, as we need to disallow changing it once our object is frozen.

The Frames property comes with two other related bits of code. One is at the very top of our class:

[ContentProperty( "Frames" )]

This tells the XAML parser that our element can have ImageSource declarations nested directly within its element, and that those declarations should be routed to the Frames property.

The other bit of code, which implements the IAddChild interface, can be ignored – it’s the old way of accomplishing what the ContentProperty attribute does, and is just there for compatibility.

The FrameTime property is, thankfully, much easier. Dependency properties automatically work for all Freezable types, so we only need to define it and we’re done.

Implementing Freezable

Types derived from Freezable are required to override CreateInstanceCore, plus several others if they store information outside of dependency properties. We do store information outside of dependency properties, so we need to implement the whole lot.

This is, again, boilerplate. All four methods are very similar, just optimized to different tasks, so I’ll just look at one:

protected override void CloneCore( Freezable sourceFreezable )
{
    base.CloneCore( sourceFreezable );
 
    var source = (FlipBookAnimation)sourceFreezable;
 
    if( source.frames != null )
    {
        frames = (FreezableCollection<ImageSource>)
            source.frames.Clone();
        OnFreezablePropertyChanged( null, frames );
    }
}

This is very straightforward. We start by calling the base implementation, which takes care of all of our dependency properties. All that’s left is the Frames property, so we clone it manually and ensure that the clone is correctly linked to its parent. That’s it.

Evaluating the Animation

And, finally, we can compute our actual animation:

protected override ImageSource GetCurrentValueCore(
    ImageSource defaultOriginValue,
    ImageSource defaultDestinationValue,
    AnimationClock animationClock )
{
    if( frames == null || frames.Count == 0 )
        return defaultDestinationValue;
 
    var now = animationClock.CurrentTime.Value;
 
    long frame = now.Ticks / FrameTime.Ticks;
 
    if( frame <= 0 )
        return frames[0];
 
    return frames[(int)(frame % frames.Count)];
}

We start by taking care of the trivial case. If we don’t have any frames defined, we do nothing, and simply return the default value.

Otherwise, all we’re doing is some simple math. We start by dividing the running time of the animation by FrameTime, the amount of time we’re devoting to displaying each frame. This gives us the index of the frame we should be displaying. We do a quick sanity check, in case the animation clock supplied us with a negative time value (can that even happen?), and then we wrap the frame number by the number of frames we have defined, causing the animation to repeat if the clock runs on past the end.

protected override Duration GetNaturalDurationCore( Clock clock )
{
    int numFrames = frames != null ? frames.Count : 0;
    return new Duration( new TimeSpan(
        FrameTime.Ticks * numFrames ) );
}

And the natural length of our animation is simply our frame time multiplied by the number of frames we have.

Putting It to Use

And that’s it. Using the animation type is just like using any other. You create a definition that looks something like this:

<vec3:FlipBookAnimation
    FrameTime="0:0:0.042"
 
    Storyboard.TargetName="TickingImage"
    Storyboard.TargetProperty="Source"
    >                            
    <ImageSource>./Tick00.png</ImageSource>
    <ImageSource>./Tick01.png</ImageSource>
    <ImageSource>./Tick02.png</ImageSource>
    <ImageSource>./Tick03.png</ImageSource>
    <ImageSource>./Tick04.png</ImageSource>
 
    <!-- etc -->
</vec3:FlipBookAnimation>

And then you just drop it into any storyboard you like, and off you go.
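For instance, a minimal way to kick it off is an EventTrigger on the containing window (a sketch, assuming the vec3 namespace mapping and the TickingImage element from the snippet above):

<Window.Triggers>
    <EventTrigger RoutedEvent="FrameworkElement.Loaded">
        <BeginStoryboard>
            <Storyboard RepeatBehavior="Forever">
                <!-- the FlipBookAnimation definition from above -->
            </Storyboard>
        </BeginStoryboard>
    </EventTrigger>
</Window.Triggers>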

Here’s another link to the code for those that skimmed right past the first one.


Using Win32 Icons in WPF

Using custom icons can be a little tricky in WPF. It’s simple enough if you want to use your application’s main icon or an icon file that you can refer to using a pack URI – so long as you do that, everything just works.

However, if your icon data is anywhere else, then things can get a little tricky.

The Icon Property

It seems like setting a window’s Icon property to any old ImageSource should be enough, and indeed that generally works.

However, there’s a snag. An ImageSource typically refers to just one image, whereas Windows requires two separate images. These images have different sizes, according to the current system metrics. The larger one needs to be SM_CXICON by SM_CYICON pixels, and is used in the task-switcher dialog and on the Windows 7 task bar. The smaller one is SM_CXSMICON by SM_CYSMICON, and is used in the window’s caption and on the task bar (in the preview thumbnails that pop up on Windows 7).

If you set the window’s icon to a simple bitmap image, then WPF will simply scale it to the two sizes and pass those images to Windows. Unfortunately, images which work well at one size (usually 32 by 32) tend to look bad at the other (16 by 16). That’s why Windows icon files have individually authored images for each size – the two images will be different, each created specifically with its size in mind. We can’t do that by just throwing any old ImageSource at the Icon property.

And yet everything works fine if we set the property to a URI that refers to a Windows icon file – the system will happily find the correct image in the icon data. So what does WPF do with that URI, and how do we replicate it if we haven’t got our image data in a URI-friendly location?

The BitmapFrame Class

The trick is that when WPF decodes an icon, it returns a BitmapFrame object. That object keeps a reference back to the decoder which parsed the icon file. When you set the Icon property to a BitmapFrame, WPF will go and look at the frame’s decoder’s output and see if that decoder found more than a single image in the source file. If it did, WPF will choose the two images from that set which best match the required resolution and color depth, scale those if they’re not exact matches, and then pass those images to Windows.

So all we need to do is decode a multi-image file and pass one of the resulting images to the Icon property, and WPF will do the rest.

Loading a Windows Icon From a Stream

The typical multi-image file format that’s used for Windows icons is, unsurprisingly, the Windows Icon (.ico) format. Loading one of these is trivial. All you need to do is get your data into a Stream, and you can pass it to IconBitmapDecoder‘s constructor. Once the decoder is constructed, simply set the target window’s Icon property to any one of the frames that the decoder loaded from the file:

Stream icoData = //load the data from wherever it is
 
var ico = new IconBitmapDecoder( icoData, BitmapCreateOptions.None,
    BitmapCacheOption.Default );
 
window.Icon = ico.Frames[0];

Loading From a Windows Resource

One of the more common places to find icon resources is embedded into PE (executable) files. Loading icon resources from these is a little tricky, since the icon’s parts are split up into multiple resource entries, and IconBitmapDecoder can’t handle that directly.

Fortunately, we know how to fix that. We simply load the icon resource into a MemoryStream using that code and pass the stream to IconBitmapDecoder.
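Putting the pieces together, a sketch (ExtractAssemblyIcon is the helper built in the Extracting Icons from PE Files post below, and BitmapCacheOption.OnLoad makes the decoder read the stream up front so we can close it right away):

using( Stream icoData = ExtractAssemblyIcon( Assembly.GetEntryAssembly() ) )
{
    var ico = new IconBitmapDecoder( icoData,
        BitmapCreateOptions.None, BitmapCacheOption.OnLoad );
 
    window.Icon = ico.Frames[0];
}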


Extracting Icons from PE Files

There are times when you need an icon file, but all you have is an icon resource embedded in a PE (executable) file. Getting at these is a little tricky, since icon files aren’t stored as a simple blob in the PE file. In fact, they’re split up into a number of different entries. Fortunately, it isn’t very hard to combine these entries into an ICO-format data blob which you can then save to file or pass to an API that expects it.

I’ll be writing the sample code for this post in C#, and as such I’ll be using the Win32ResourceStream class from my last post. For this particular example, I’ll be loading the current assembly’s main icon.

public static Stream ExtractAssemblyIcon( Assembly asm )
{
    var module = asm.ManifestModule;
    var resId = (ushort)32512;
 
    //extract the icon here...
}

(If you’re wondering where the weird 32512 comes from, it’s the value of IDI_APPLICATION, and is the resource ID assigned to the assembly icon by the C# compiler.)

The Icon Header

The first thing we’ll need to load is the icon’s header. It’s stored in its own resource and consists of an array of image descriptions. I’m not going to get into any detail about many of the fields since we’ll generally just be writing them to our output stream.

We’ll need to store the entries in an array. Here’s the struct defining each entry:

struct MemIconEntry
{
    public byte Width;
    public byte Height;
    public byte ColorCount;
    public byte Reserved;
    public ushort Planes;
    public ushort BitCount;
    public uint BytesInRes;
    public ushort Id;
}

and here’s how we’ll load them:

MemIconEntry[] entries;
 
using( var resStream = new Win32ResourceStream( module,
    resId, Win32ResourceType.GroupIcon ) )
{
    var reader = new BinaryReader( resStream );
 
    if( reader.ReadUInt16() != 0 )
        throw new InvalidDataException();
    if( reader.ReadUInt16() != 1 )
        throw new InvalidDataException();
 
    var numEntries = reader.ReadUInt16();
 
    entries = new MemIconEntry[numEntries];
    for( int i = 0; i < entries.Length; i++ )
    {
        entries[i].Width = reader.ReadByte();
        entries[i].Height = reader.ReadByte();
 
        entries[i].ColorCount = reader.ReadByte();
 
        entries[i].Reserved = reader.ReadByte();
 
        entries[i].Planes = reader.ReadUInt16();
        entries[i].BitCount = reader.ReadUInt16();
 
        entries[i].BytesInRes = reader.ReadUInt32();
        entries[i].Id = reader.ReadUInt16();
    }
}

Now that we have those, we’re ready to start writing our icon data. We’ll be writing it to a MemoryStream here, though it could just as easily be written to file:

var ret = new MemoryStream();
var writer = new BinaryWriter( ret );
 
writer.Write( (ushort)0 );
writer.Write( (ushort)1 );
writer.Write( (ushort)entries.Length );
 
//each entry has an offset to the start of that
//icon's image data, we start that offset at the
//byte immediately following the header data
uint offset = 6U + 16U * (uint)entries.Length;
 
foreach( var e in entries )
{
    writer.Write( e.Width );
    writer.Write( e.Height );
 
    writer.Write( e.ColorCount );
 
    writer.Write( e.Reserved );
 
    writer.Write( e.Planes );
    writer.Write( e.BitCount );
 
    writer.Write( e.BytesInRes );
    writer.Write( offset );
 
    offset += e.BytesInRes;
}
 
writer.Flush();

And finally we load each individual image’s data and append it to the output.

foreach( var e in entries )
{
    using( var imgData = new Win32ResourceStream( module,
        e.Id, Win32ResourceType.Icon ) )
    {
        if( imgData.Length != e.BytesInRes )
            throw new InvalidDataException();
 
        imgData.CopyTo( ret );
    }
}

And that’s it. We just rewind our output stream and return it to finish.

ret.Position = 0;
 
return ret;

Working With Win32 Resources in .NET

Most native applications make extensive use of Win32 resources. While the .NET Framework provides a far more useful resource API, it’s sometimes necessary to access the old style Win32 resources. Fortunately, this isn’t very difficult.

The Win32 Resource API

The first thing we’ll need is some P/Invoke to get at the resources. Let’s start with that.

The HINSTANCE

Windows refers to each loaded module (EXE or DLL) through a handle called an HINSTANCE. There are two ways to get one of these. If the target module is part of a loaded .NET assembly, then we can use Marshal.GetHINSTANCE. Otherwise, we need to use GetModuleHandle. It has the following P/Invoke signature:

static partial class Native
{
    [DllImport( "kernel32.dll", CharSet = CharSet.Auto )]
    public static extern IntPtr GetModuleHandle( string modName );
}

Opening the Resource

Win32 resources are identified by their type and ID. These parameters can be specified either as a string or as an integer. They are located using FindResource, which we’ll import several times since there’s no other way to do pointer-casting tricks with P/Invoke.

static partial class Native
{
    [DllImport( "kernel32.dll", CharSet = CharSet.Auto )]
    public static extern IntPtr FindResource( IntPtr hModule,
        string name, string type );
    [DllImport( "kernel32.dll", CharSet = CharSet.Auto )]
    public static extern IntPtr FindResource( IntPtr hModule,
        string name, IntPtr type );
    [DllImport( "kernel32.dll", CharSet = CharSet.Auto )]
    public static extern IntPtr FindResource( IntPtr hModule,
        IntPtr name, IntPtr type );
}

Yes, it’s also legal to use an integer name and string type, but that’s a less likely usage so we won’t provide an overload for it. If necessary, it’s always possible to pass an integer ID as a string by prepending its string representation with a hash sign (#).
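For example, to open a resource with integer ID 101 and a hypothetical custom type named MYDATA through the string overload:

//"#101" and "MYDATA" are made-up values for illustration
var hRes = Native.FindResource( hModule, "#101", "MYDATA" );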

Getting the Actual Data

We need three functions to get the actual data. SizeofResource will tell us the size of our resource. LoadResource and LockResource are used together to get the actual data pointer:

static partial class Native
{
    [DllImport( "kernel32.dll", ExactSpelling = true )]
    public static extern uint SizeofResource( IntPtr hModule,
        IntPtr hResInfo );
    [DllImport( "kernel32.dll", ExactSpelling = true )]
    public static extern IntPtr LoadResource( IntPtr hModule,
        IntPtr hRes );
    [DllImport( "kernel32.dll", ExactSpelling = true )]
    public static extern IntPtr LockResource( IntPtr hRes );
}

Wrapping it All Up

These can be wrapped up in a simple class, which I’ll call Win32ResourceStream. We’ll derive from UnmanagedMemoryStream in order to make things simple:

public class Win32ResourceStream : UnmanagedMemoryStream
{
    public Win32ResourceStream( Module managedModule,
        string resName, string resType );
    public Win32ResourceStream( string moduleName,
        string resName, string resType );
 
    private IntPtr GetModuleHandle( string name );
    private IntPtr GetModuleHandle( Module module );
 
    protected void Initialize( IntPtr hModule,
        string resName, string resType );
    protected unsafe void Initialize(
        IntPtr hModule, IntPtr hResource );
}

The implementation is pretty simple. We’ll start with the constructors, which get the necessary HINSTANCE value and pass it along to Initialize, which will do the rest of the work. There’s also a bit of validation going on in the GetModuleHandle implementations.

public Win32ResourceStream( Module managedModule,
    string resName, string resType )
{
    Initialize( GetModuleHandle( managedModule ),
        resName, resType );
}
 
public Win32ResourceStream( string moduleName,
    string resName, string resType )
{
    Initialize( GetModuleHandle( moduleName ),
        resName, resType );
}
 
private IntPtr GetModuleHandle( string name )
{
    if( name == null )
        throw new ArgumentNullException( "name" );
 
    var hModule = Native.GetModuleHandle( name );
    if( hModule == IntPtr.Zero )
        throw new FileNotFoundException();
 
    return hModule;
}
 
private IntPtr GetModuleHandle( Module module )
{
    if( module == null )
        throw new ArgumentNullException( "module" );
 
    var hModule = Marshal.GetHINSTANCE( module );
    if( hModule == (IntPtr)(-1) )
        throw new ArgumentException( "Module has no HINSTANCE." );
 
    return hModule;
}

And that just leaves us with Initialize. This one isn’t hard. We just validate our parameters and call the necessary functions in sequence, checking for errors as we go:

protected void Initialize( IntPtr hModule,
    string resName, string resType )
{
    if( hModule == IntPtr.Zero )
        throw new ArgumentNullException( "hModule" );
 
    if( resName == null )
        throw new ArgumentNullException( "resName" );
    if( resType == null )
        throw new ArgumentNullException( "resType" );
 
    var hRes = Native.FindResource( hModule, resName, resType );
    Initialize( hModule, hRes );
}
 
protected unsafe void Initialize( IntPtr hModule, IntPtr hRes )
{
    if( hModule == IntPtr.Zero || hRes == IntPtr.Zero )
        throw new FileNotFoundException();
 
    var size = Native.SizeofResource( hModule, hRes );
    var hResData = Native.LoadResource( hModule, hRes );
    var pResData = Native.LockResource( hResData );
 
    Initialize( (byte*)pResData, size, size, FileAccess.Read );
}

Cleanup

There isn’t any! Windows cleans everything up when the module is unloaded. Just take care that you don’t unload the module before you’re finished with the stream.

Standard Type IDs and a Convenient Overload

The built-in resource types have predefined IDs. These IDs are 16-bit integers which are passed in place of a type name by casting them to a string pointer (or IntPtr, in our case). We can stick these in an enumeration to make things more convenient when working with the built-in types:

public enum Win32ResourceType : ushort
{
    Accelerator = 9,
    AnimatedCursor = 21,
    AnimatedIcon = 22,
    Bitmap = 2,
    Cursor = 1,
    Dialog = 5,
    Font = 8,
    FontDir = 7,
    GroupCursor = 12,
    GroupIcon = 14,
    Icon = 3,
    Html = 23,
    Menu = 4,
    Manifest = 24,
    MessageTable = 11,
    UserData = 10,
    String = 6,
    Version = 16,
    PlugAndPlay = 19,
}

And then we can add some overloads to make it easy to use the enumeration:

public Win32ResourceStream( Module managedModule,
    string resName, Win32ResourceType resType )
{
    Initialize( GetModuleHandle( managedModule ),
        resName, (ushort)resType );
}
 
public Win32ResourceStream( string moduleName,
    string resName, ushort resType )
{
    Initialize( GetModuleHandle( moduleName ),
        resName, resType );
}
 
protected void Initialize( IntPtr hModule,
    string resName, ushort resType )
{
    if( hModule == IntPtr.Zero )
        throw new ArgumentNullException( "hModule" );
 
    if( resName == null )
        throw new ArgumentNullException( "resName" );
 
    var hRes = Native.FindResource( hModule,
        resName, (IntPtr)resType );
    Initialize( hModule, hRes );
}

Opening Resources by Integer ID

We’ll add one more overload to make it easy to open resources by integer ID.

public Win32ResourceStream( Module managedModule,
    ushort resId, Win32ResourceType resType )
{
    Initialize( GetModuleHandle( managedModule ),
        resId, (ushort)resType );
}
 
public Win32ResourceStream( string moduleName,
    ushort resId, ushort resType )
{
    Initialize( GetModuleHandle( moduleName ),
        resId, resType );
}
 
protected void Initialize( IntPtr hModule,
    ushort resId, ushort resType )
{
    if( hModule == IntPtr.Zero )
        throw new ArgumentNullException( "hModule" );
 
    var hRes = Native.FindResource( hModule,
        (IntPtr)resId, (IntPtr)resType );
    Initialize( hModule, hRes );
}

And that’s that!
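As a final usage sketch, here’s how the stream might be opened over the current assembly’s group-icon resource (ID 32512 – see the Extracting Icons from PE Files post above):

//usage sketch: open this assembly's group-icon resource
var module = Assembly.GetExecutingAssembly().ManifestModule;
 
using( var res = new Win32ResourceStream( module,
    (ushort)32512, Win32ResourceType.GroupIcon ) )
{
    var data = new byte[res.Length];
    res.Read( data, 0, data.Length );
}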


Windowed Fullscreen

Windowed (fake) fullscreen is probably my favorite graphics option ever when it comes to PC games. It lets me have my nice fullscreen game, but doesn’t lock me out of using my other monitor, and any programs running behind the game are an instant ALT+TAB away. Games that can go from fully windowed to fake-fullscreened in an instant are also super cool, and not all that difficult to write. So how does one implement such a thing?

Creating the Window

Let’s start off with a game running in a regular window. We begin by making an ordinary window and then setting up our 3D context.

One note, before we get to the code – we’re going to be changing the window border styles as we go, and that means that the thickness of the borders will change. Since our content is what really matters, the window position will be specified in terms of the client rectangle and later adjusted as necessary.

//pick out our window styles
 
DWORD style = WS_OVERLAPPED | WS_SYSMENU |
    WS_BORDER | WS_THICKFRAME | WS_CAPTION |
    WS_MAXIMIZEBOX | WS_MINIMIZEBOX;
DWORD ex_style = WS_EX_APPWINDOW;
 
//pick out our desired client rect coordinates
//these are relative to the desktop rectangle
 
int left = CW_USEDEFAULT;
int top = CW_USEDEFAULT;
int width = 800;
int height = 600;
 
//convert the client rectangle into a window
//rectangle for CreateWindow
 
RECT rc = { 0, 0, 200, 200 };
AdjustWindowRectEx( &rc, style, FALSE, ex_style );
 
if( left != CW_USEDEFAULT )
    left += rc.left;
if( top != CW_USEDEFAULT )
    top += rc.top;
 
if( width != CW_USEDEFAULT )
    width += (rc.right - rc.left) - 200;
if( height != CW_USEDEFAULT )
    height += (rc.bottom - rc.top) - 200;
 
//create the window
 
HWND hwnd = CreateWindowEx( ex_style, _T( "MyWindowClassName" ),
    _T( "Sample Window" ), style, left, top, width, height,
    NULL, NULL, GetModuleHandle( NULL ), NULL );
 
ShowWindow( hwnd, SW_SHOW );

Transitioning to Fullscreen

The first thing we need to do before transitioning into fullscreen mode is save our window’s position, so that we can restore it when we transition back out:

RECT saved_pos;
bool is_fullscreen = false;
 
//in our to-fullscreen function
 
if( is_fullscreen )
    //already fullscreen, nothing more to do
    return;
 
GetClientRect( hwnd, &saved_pos );
 
POINT pt = { 0, 0 };
ClientToScreen( hwnd, &pt );
 
//GetClientRect gives us a rect based at (0, 0), so shift
//the whole thing into screen coordinates
OffsetRect( &saved_pos, pt.x, pt.y );

Next up, we need to find a rectangle that covers the monitor we’re going to go fullscreen on. In this case, I’m going to take the monitor that the window is on (or mostly on). You could just as easily ask DXGI to hand you the desktop rectangle associated with whatever adapter you’d like to render on, or use some other API.

HMONITOR target_monitor = MonitorFromWindow( hwnd,
    MONITOR_DEFAULTTONEAREST );
 
MONITORINFO info;
info.cbSize = sizeof( MONITORINFO );
GetMonitorInfo( target_monitor, &info );
 
RECT dest_pos = info.rcMonitor;
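
If you’d rather take the DXGI route mentioned above, the sketch below pulls the desktop rectangle off an adapter’s first output. It assumes you already hold an IDXGIAdapter pointer (called adapter here) for whatever device you’re rendering with, and the output index is yours to choose:

//sketch: 'adapter' is assumed to be the IDXGIAdapter we render on
IDXGIOutput *output = NULL;
if( SUCCEEDED( adapter->EnumOutputs( 0, &output ) ) )
{
    DXGI_OUTPUT_DESC desc;
    if( SUCCEEDED( output->GetDesc( &desc ) ) )
        dest_pos = desc.DesktopCoordinates;
 
    output->Release();
}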

Once we have our target rectangle, we’re ready to make the transition. We get rid of the window’s borders and move it so that it covers the entire target rectangle:

DWORD style = WS_POPUP;
DWORD ex_style = WS_EX_APPWINDOW;
 
//future-proofing in case MS fiddles with the meaning
//of "no borders, please"
 
RECT rc = dest_pos;
AdjustWindowRectEx( &rc, style, FALSE, ex_style );
 
//update the styles
 
if( IsWindowVisible( hwnd ) )
    //important: odd bugs arise otherwise
    style |= WS_VISIBLE;
 
SetWindowLong( hwnd, GWL_STYLE, style );
SetWindowLong( hwnd, GWL_EXSTYLE, ex_style );
 
//move the window
 
SetWindowPos( hwnd, NULL, rc.left, rc.top, rc.right - rc.left,
    rc.bottom - rc.top, SWP_NOACTIVATE | SWP_NOZORDER |
    SWP_FRAMECHANGED );
 
//and note the new state
 
is_fullscreen = true;

And there we are! Our window is now fullscreen. One small problem – the taskbar covers it, as do any desktop toolbar apps, and we don’t want that. So let’s change the last bit:

SetWindowPos( hwnd, HWND_TOPMOST, rc.left, rc.top,
    rc.right - rc.left, rc.bottom - rc.top,
    SWP_NOACTIVATE | SWP_FRAMECHANGED );

OK, better, except that now we can’t ALT+TAB to other programs. Well, we can, and they become active, but we can’t see them if they’re underneath our window because they can’t be brought up on top of it. We fix this by giving up our topmost status whenever our window loses focus and taking it back when focus returns. Somewhere in the window’s message procedure:

case WM_ACTIVATE:
    if( is_fullscreen )
    {
        SetWindowPos( hwnd, LOWORD( wParam ) != WA_INACTIVE ?
            HWND_TOPMOST : HWND_NOTOPMOST, 0, 0, 0, 0,
            SWP_NOACTIVATE | SWP_NOMOVE | SWP_NOSIZE );
    }
    break;

Transitioning Back

Transitioning back to windowed mode is also straightforward: we simply put our old window border back, move the window to its original location, give up our topmost status, and carry on as usual:

//in our to-windowed function
 
if( !is_fullscreen )
    //already windowed, nothing more to do
    return;
 
DWORD style = WS_OVERLAPPED | WS_SYSMENU |
    WS_BORDER | WS_THICKFRAME | WS_CAPTION |
    WS_MAXIMIZEBOX | WS_MINIMIZEBOX;
DWORD ex_style = WS_EX_APPWINDOW;
 
RECT rc = saved_pos;
AdjustWindowRectEx( &rc, style, FALSE, ex_style );
 
//update the styles
 
if( IsWindowVisible( hwnd ) )
    style |= WS_VISIBLE;
 
SetWindowLong( hwnd, GWL_STYLE, style );
SetWindowLong( hwnd, GWL_EXSTYLE, ex_style );
 
//move the window
 
SetWindowPos( hwnd, HWND_NOTOPMOST, rc.left, rc.top,
    rc.right - rc.left, rc.bottom - rc.top,
    SWP_NOACTIVATE | SWP_FRAMECHANGED );
 
is_fullscreen = false;

And there we are. Or are we?

Handling Maximized Windows

The above code will work wonderfully in all cases except when the window is maximized before transitioning into fullscreen. Handling that case is a bit trickier, since we have to save not just the window’s current (maximized) position, but also its pre-maximized position, so that the window restores properly after switching to fullscreen and back. Thankfully, it’s not hard to get and set all of this info at once. We need to modify the first bit of transitioning to fullscreen as follows:

union
{
    RECT rc;
    WINDOWPLACEMENT placement;
} saved_pos;
 
bool is_fullscreen = false;
bool saved_as_placement = false;
 
//in our to-fullscreen function
 
if( is_fullscreen )
    //already fullscreen, nothing more to do
    return;
 
saved_as_placement = IsZoomed( hwnd );
if( saved_as_placement )
{
    saved_pos.placement.length = sizeof( WINDOWPLACEMENT );
    GetWindowPlacement( hwnd, &saved_pos.placement );
}
else
{
    GetClientRect( hwnd, &saved_pos.rc );
 
    POINT pt = { 0, 0 };
    ClientToScreen( hwnd, &pt );
 
    //again, shift the saved client rect into screen coordinates
    OffsetRect( &saved_pos.rc, pt.x, pt.y );
}

And restoring back to windowed mode becomes this:

//in our to-windowed function
 
if( !is_fullscreen )
    //already windowed, nothing more to do
    return;
 
DWORD style = WS_OVERLAPPED | WS_SYSMENU |
    WS_BORDER | WS_THICKFRAME | WS_CAPTION |
    WS_MAXIMIZEBOX | WS_MINIMIZEBOX;
DWORD ex_style = WS_EX_APPWINDOW;
 
//update the styles
 
if( IsWindowVisible( hwnd ) )
    style |= WS_VISIBLE;
 
SetWindowLong( hwnd, GWL_STYLE, style );
SetWindowLong( hwnd, GWL_EXSTYLE, ex_style );
 
if( saved_as_placement )
{
    SetWindowPlacement( hwnd, &saved_pos.placement );
 
    SetWindowPos( hwnd, HWND_NOTOPMOST, 0, 0, 0, 0,
        SWP_NOACTIVATE | SWP_NOMOVE | SWP_NOSIZE |
        SWP_FRAMECHANGED );
}
else
{
    RECT rc = saved_pos.rc;
    AdjustWindowRectEx( &rc, style, FALSE, ex_style );
 
    //move the window
 
    SetWindowPos( hwnd, HWND_NOTOPMOST, rc.left, rc.top,
        rc.right - rc.left, rc.bottom - rc.top,
        SWP_NOACTIVATE | SWP_FRAMECHANGED );
}
 
is_fullscreen = false;

Why do we use the two different save modes? According to MSDN, WINDOWPLACEMENT is supposed to contain everything there is to know about the window’s location on the desktop, so we should be able to get away with always just saving that…right? Well, no. If you do that then, for whatever reason, the window won’t play well with Windows 7’s WIN+ARROW shortcuts. Don’t ask me why, I haven’t got a clue.
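
One last convenience: hooking the two transitions up to ALT+ENTER takes only a few lines in the message procedure. Here’s a sketch, assuming to_fullscreen and to_windowed are hypothetical helpers wrapping the code above:

case WM_SYSKEYDOWN:
    //ALT+ENTER toggles fullscreen; bit 30 of lParam
    //filters out auto-repeats while the key is held
    if( wParam == VK_RETURN && !(lParam & (1 << 30)) )
    {
        if( is_fullscreen )
            to_windowed();
        else
            to_fullscreen();
 
        return 0;
    }
    break;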

Filed in code, graphics, Windows | Comment Now