Visual C#: Phantom Breakpoints

So I’ve got a large solution with a lot of projects (mostly build tools) and every now and then I would get suspicious phantom breakpoints, meaning that the debugger would break where no breakpoint had been placed. This would usually happen after closing and reopening the project, and it only ever seemed to afflict the executables in my solution – never the libraries.

It turns out there’s a bug in how Visual C# saves and restores breakpoints. While they restore correctly in the UI, they can “leak” into other identically named files behind the scenes, creating phantom breakpoints in those files.

So if your debugger won’t stop breaking on an unassuming line in Program.cs, take a peek at the Program.cs files in your solution’s other projects – chances are one has a breakpoint set on that line, and clearing it will fix the phantom for you.

Filed in code | Comment Now

Getting Started with OpenSL on Android

Edit: This article, as originally posted, has an error in it, which has been corrected. Anyone who just wants to see the fix can check out this update.

So, you want to play some audio on an Android device. You’ve got your NDK set up, you grab your code, you hit compile and there’s a problem: OpenAL’s not supported on this platform. What we have instead is partial support for OpenSL ES 1.0.

And the problem with SLES is that it’s very hard to find a decent tutorial on how to use it. The specification has some examples at the end, but they’re not entirely easy to follow, and they reference features not supported on Android. So, without further ado, a brief OpenSL ES 1.0 tutorial:

Before I get into actually using the API, a brief word on how it’s structured, so that everything is easy to follow.

Objects

OpenSL is object-oriented. Objects are exposed via interfaces, which are somewhat similar to COM. Like COM objects, OpenSL objects expose a number of interfaces, each of which is identified and retrieved using a predefined IID.

//COM
 
IUnknown *obj;
IDXGIObject *ifc;
 
HRESULT res = obj->QueryInterface( IID_IDXGIObject, &ifc );
 
if( SUCCEEDED( res ) )
    //got our object
 
//OpenSL ES
 
SLObjectItf obj;
SLEngineItf ifc;
 
SLresult res = (*obj)->GetInterface( obj, SL_IID_ENGINE, &ifc );
 
if( res == SL_RESULT_SUCCESS )
    //got our object

One major difference between the two is that OpenSL doesn’t declare its objects as C++ classes, so you have to dereference the object and follow the link to the v-table yourself (like when accessing a COM object from plain C). The generated code is identical in both cases.

Another key difference between the two is that OpenSL interfaces do not inherit from SLObjectItf the way COM interfaces inherit from IUnknown, so you have to store the SLObjectItf alongside any useful interfaces you get from it so that you can manage the object’s lifetime and query it for other interfaces.

Creating Objects

OpenSL objects are created by a root engine object (except, of course, the engine object itself). Creating the engine is simple:

SLObjectItf engine_obj;
slCreateEngine( &engine_obj, 0, nullptr, 0, nullptr, nullptr );

The zeroes and nullptr arguments tell the API to use default options and that we don’t require any special interfaces (more on that later).

Creating other objects is, as mentioned earlier, done through the engine object. Once the engine is set up and you’ve retrieved its SLEngineItf interface you can create other objects by using its family of Create methods.

SLEngineItf engine; //initialized elsewhere
 
SLObjectItf output_mix_obj;
 
const SLInterfaceID ids[] = { SL_IID_VOLUME };
const SLboolean req[] = { SL_BOOLEAN_FALSE };
(*engine)->CreateOutputMix( engine, &output_mix_obj, 1, ids, req );

Alright, time to get to those parameters, which you’ll find on every Create function, that I glossed over before. OpenSL objects usually expose at least one interface besides SLObjectItf, but all other interfaces are optional and may not be supported. Furthermore, you need to tell OpenSL up front which interfaces you’re going to be requesting (the ids array), whether you absolutely need them or not (the req array), and obviously how many there are (the 1). If you mark an interface as required, but the implementation cannot provide it, object creation will fail and return SL_RESULT_FEATURE_UNSUPPORTED.

Realizing Objects

OpenSL objects come in three basic states. The initial state is SL_OBJECT_STATE_UNREALIZED. An object in this state has not allocated the resources it will ultimately use. An object in this state is useless until a call to Realize succeeds. And by useless I mean you can’t do anything with it except check the state, register a state callback, or call Realize. You can’t even retrieve the object’s main interface when it’s in this state.

The second state, SL_OBJECT_STATE_SUSPENDED, is something you may or may not encounter normally. It’s like the unrealized state, in that the object is useless, except that it still holds its resources, maintains the values of its properties, and the caller can query them. You get out of this state by calling Resume.

Successfully calling either Realize or Resume puts the object in the SL_OBJECT_STATE_REALIZED. An object in this state is fully useable. Objects can transition out of this state if there’s a system event (speakers unplugged, hardware error) or if another application takes control of the audio device. You can register a callback function to be notified if that happens, or you can catch the error you’ll get from a subsequent command to that object.

Realized objects can transition either to the suspended or to the unrealized state. It’s up to you to handle either case. The main difference between the two is that any interface pointers you hold on this object remain valid while it’s in SL_OBJECT_STATE_SUSPENDED (though you can’t set its properties or make it do anything useful). If an object falls into the SL_OBJECT_STATE_UNREALIZED state, you need to discard all interface pointers (except to the main SLObjectItf), get them back after a call to Realize succeeds, and then fully reinitialize all of the object’s properties.

Anyway, we’ve created an object, and it’s in the unrealized state. Let’s realize it so that we can use it.

SLObjectItf obj;
(*obj)->Realize( obj, SL_BOOLEAN_FALSE );

The second parameter to Realize enables asynchronous activation. If you set it to SL_BOOLEAN_TRUE then Realize will return immediately, but the object won’t actually be realized until some later time. You can register a call back or poll the object via GetState to see when that is.

Destroying Objects

OpenSL also differs from COM in that its objects aren’t reference-counted. When you’re done with an object, call Destroy on its object interface:

SLObjectItf obj;
(*obj)->Destroy( obj );

The object and all interfaces you got from it are invalid after that call is made.

Error Checking

I’m omitting error checking for the sake of clarity. Always be sure to check return codes, especially from functions which create objects and from Realize.

The Audio Graph

If you’ve used XAudio2, you’ll be right at home with OpenSL. The API is structured around the concept of an audio graph, where AudioPlayer objects read sources and stream to other objects (such as the output mixer).

I’m not really going to go into much detail here, mainly because Android only supports a small subset of the possible features, and I think things will become clear as I go through the example.

Using OpenSL

Right! On to the code! Again, and I can’t possibly stress this enough, check the return values. You will run into errors and unsupported features, especially on a platform as diverse as Android. Don’t expect things to just work.

Since each object will be accessed by a number of interfaces, I’m using a simple naming convention to make it clear when multiple interfaces all refer to the same object. The object’s interface variables will all have a common root. The main SLObjectItf bears the suffix _obj, the main object-specific interface has no suffix, and any secondary or optional interfaces get their own suffix.

Initializing OpenSL

The first thing we need to do is initialize OpenSL for playback. That means creating an Engine and an OutputMix.

//create the Engine object
 
SLObjectItf engine_obj;
SLEngineItf engine;
 
slCreateEngine( &engine_obj, 0, nullptr, 0, nullptr, nullptr );
(*engine_obj)->Realize( engine_obj, SL_BOOLEAN_FALSE );
(*engine_obj)->GetInterface( engine_obj, SL_IID_ENGINE, &engine );
 
//create the main OutputMix, try to get a volume interface for it
 
SLObjectItf output_mix_obj;
SLVolumeItf output_mix_vol;
 
const SLInterfaceID ids[] = { SL_IID_VOLUME };
const SLboolean req[] = { SL_BOOLEAN_FALSE };
 
(*engine)->CreateOutputMix( engine, &output_mix_obj, 1, ids, req );
 
(*output_mix_obj)->Realize( output_mix_obj, SL_BOOLEAN_FALSE );
 
if( (*output_mix_obj)->GetInterface( output_mix_obj,
    SL_IID_VOLUME, &output_mix_vol ) != SL_RESULT_SUCCESS )
    output_mix_vol = nullptr;

Alright, so we create and realize our engine. So far, so good.

The output mix is a little trickier. We’d like to have a global volume control in one place, but the OutputMix object doesn’t necessarily support it (and on current Android builds it doesn’t), so we ask for it, and if we don’t get it then we set output_mix_vol to NULL.

Playing a Clip

To play a clip, we create an AudioPlayer object linking our source data and our OutputMix, set the volume, and send it a command to start playback. For this example I’m going to assume we’ve got 16-bit mono audio samples stored in a single buffer in memory:

const void *clip_samples;             //the raw samples
unsigned int clip_num_samples;        //how many samples there are
unsigned int clip_samples_per_sec;    //the sample rate in Hz

Creating an AudioPlayer

Given that, let’s set up our player. First we need to set up our input link to the audio buffer, which OpenSL calls a DataLocator.

#ifdef TARGET_ANDROID
SLDataLocator_AndroidSimpleBufferQueue in_loc;
in_loc.locatorType = SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE;
in_loc.numBuffers = 1;
#else
SLDataLocator_Address in_loc;
in_loc.locatorType = SL_DATALOCATOR_ADDRESS;
in_loc.pAddress = clip_samples;
in_loc.length = clip_num_samples * 2;
#endif

While OpenSL defines a simple data locator designed for in-memory buffers, Android doesn’t actually support it, so when compiling for Android we’ll have to use an Android extension called the Simple Buffer Queue.

Once we have our data locator defined, we need to define what’s in it and link the two together into an SLDataSource structure:

SLDataFormat_PCM format;
format.formatType = SL_DATAFORMAT_PCM;
format.numChannels = 1;
format.samplesPerSec = clip_samples_per_sec() * 1000; //mHz
format.bitsPerSample = SL_PCMSAMPLEFORMAT_FIXED_16;
format.containerSize = 16;
format.channelMask = SL_SPEAKER_FRONT_CENTER;
format.endianness = SL_BYTEORDER_LITTLEENDIAN;
 
SLDataSource src;
src.pLocator = &in_loc;
src.pFormat = &format;

Fairly straighforward. One odd thing is that the samplesPerSec is misnamed. OpenSL actually requires the sample rate in millihertz.

Moving on to the output. This link goes straight to our OutputMix object.

SLDataLocator_OutputMix out_loc;
out_loc.locatorType = SL_DATALOCATOR_OUTPUTMIX;
out_loc.outputMix = output_mix_obj;
 
SLDataSink dst;
dst.pLocator = &out_loc;
dst.pFormat = nullptr;

Alright. Now we create our AudioPlayer object. We want volume controls and if we’re on Android then we need it to support the Simple Buffer Queue extension as well:

//some variables to store interfaces
 
SLObjectItf player_obj;
SLPlayItf player;
SLVolumeItf player_vol;
 
#ifdef TARGET_ANDROID
SLAndroidSimpleBufferQueueItf player_buf_q;
#endif
 
//create the object
 
#ifdef TARGET_ANDROID
const SLInterfaceID ids[] = { SL_IID_VOLUME,
    SL_IID_ANDROIDSIMPLEBUFFERQUEUE };
const SLboolean req[] = { SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE };
#else
const SLInterfaceID ids[] = { SL_IID_VOLUME };
const SLboolean req[] = { SL_BOOLEAN_TRUE };
#endif
 
(*engine)->CreateAudioPlayer( engine,
    &player_obj, &src, &dst, lengthof( ids ), ids, req );
 
(*player_obj)->Realize( player_obj, SL_BOOLEAN_FALSE );
 
(*player_obj)->GetInterface( player_obj,
    SL_IID_PLAY, &player );
(*player_obj)->GetInterface( player_obj,
    SL_IID_VOLUME, &player_vol );
 
#ifdef TARGET_ANDROID
(*player_obj)->GetInterface( player_obj,
    SL_IID_ANDROIDSIMPLEBUFFERQUEUE, &player_buf_q );
#endif

Knowing When to Stop

Alright, here’s where it gets interesting. AudioPlayer objects don’t automatically transition their state to SL_PLAYSTATE_STOPPED when they reach the end of their data source. So we’ll need to register a callback in order to find out when playback is actually complete. And this callback, unfortunately, runs on a background thread and is required to return very quickly, so we can’t do much from directly inside of it.

//some flags to keep track of playback state
//I have those next to my player interface variables
//make sure you read the note below about thread safety!
bool is_playing, is_done_buffer;
 
//define our callback
 
void SLAPIENTRY play_callback( SLPlayItf player,
    void *context, SLuint32 event )
{
    if( event & SL_PLAYEVENT_HEADATEND )
        is_done_buffer = true;
}
 
//register the callback
 
(*player)->RegisterCallback( player, play_callback, nullptr );
(*player)->SetCallbackEventsMask( player, SL_PLAYEVENT_HEADATEND );

That last parameter to RegisterCallback is passed to the callback in the context parameter, so if you’ve got all these variables embedded in a custom object instead of sitting around as globals (as I do) you can just pass your object’s pointer there.

The callback simply sets a flag, which we’ll pick up on our main thread which will be regularly polling the playback state.

Checking up on Playback

While the AudioPlayer is playing, we need to periodically check up on it to see if the sound is finished. If we don’t do this, then we won’t know when to release the object. Once a frame, we do something like the following:

if( is_playing && is_done_buffer )
{
    (*player)->SetPlayState( player, SL_PLAYSTATE_STOPPED );
 
#ifdef TARGET_ANDROID
    (*player_buf_q)->Clear( player_buf_q );
#endif
 
    is_playing = false;
}

Note that you might have to guard the is_done_buffer variable with some sort of synchronization primitive, as it will be accessed from multiple threads. While the code written above will generally work when compiled by the NDK, it’s technically incorrect, and might break under aggressive optimization levels. If you’re using C++ and aren’t stuck on an old version of the NDK, it suffices to declare is_done_buffer as std::atomic<bool>.

Oh, Android…

Unfortunately, there’s one little wrinkle. On Android, the callback can sometimes (depending on the sample rate of the audio clip and the particulars of the device’s audio hardware) fire before the sound has finished playing. This happens when the system plays the audio through an intermediate buffer. When the last bit of the sound is processed into the intermediate buffer, OpenSL is done with the input buffer, and it fires the event we’re looking for then, despite the fact that it might be a good half second or more before the intermediate buffer’s contents make it out to the speakers. So we also need to set a timer for ourselves.

We do this by adding another variable (play_start_time) next to is_done_buffer to track the time we started playing the sound, and then modifying our periodic check as follows:

if( is_playing && is_done_buffer &&
    current_time() - play_start_time > length_of_clip )

You can use whatever timer you like to implement current_time. Just make sure it’s based on the system’s run time and is monotonic (that is, it won’t suddenly jump if the user turns their phone on after a flight and the system clock adjusts to a new time zone).

Play!

And now we’re ready to start playing the clip:

#ifdef TARGET_ANDROID
(*player_buf_q)->Enqueue( player_buf_q, clip_buffer, clip_size );
#endif
 
is_playing = true;
is_done_buffer = false;
 
(*player)->SetPlayState( player, SL_PLAYSTATE_PLAYING );
play_start_time = current_time();

Again, on Android we have to use the special buffer queue interface, so set that up first off.

After that we set up our state tracking variables, so that we know what’s going on.

And finally, set the AudioPlayer to the SL_PLAYSTATE_PLAYING state, which begins actual playback.

Stop!

(*player)->SetPlayState( player, SL_PLAYSTATE_STOPPED );
 
#ifdef TARGET_ANDROID
(*player_buf_q)->Clear( player_buf_q );
#endif
 
is_playing = false;

Pretty straightforward. The only interesting thing here is the Clear command, which resets the input source for a subsequent play command.

Volume Controls

Controlling the volume on an individual sound is easy. The global volume, however, is tricky since Android doesn’t give us a volume control interface on our OutputMix.

I’m assuming the volume setting is coming in as a “gain” value (that is, as a linear 0-1 “loudness”).

//assuming you have these values kicking around
float sound_gain, global_gain;
 
//update the gain on a sound
 
float g = sound_gain;
 
if( !output_mix_vol )
    g *= global_gain;
 
(*player_vol)->SetVolumeLevel( player_vol,
    (SLmillibel)(gain_to_attenuation( g ) * 100) );

OpenSL takes its volume as an attenuation, or as the number of decibels to change the volume from its default loudness. These values are typically going to be negative decibels:

float gain_to_attenuation( float gain )
{
    return gain < 0.01F ? -96.0F : 20 * log10( gain );
}

Also note the check against output_mix_vol. If we don’t have a global volume control, then we need to run this code to adjust every active AudioPlayer’s volume whenever global_gain changes.

Cleaning Up

Audio resources are limited, and some Android devices have buggy firmware, so it’s important that we’re very careful about cleaning up and that we don’t just blindly trust the OS to do it for us.

AudioPlayer Objects

Timely cleanup is especially important when it comes to the AudioPlayer objects. You can only create so many of them before you run out of system resources and Realize starts to fail. So when we’re doing our status polling and we notice that an AudioPlayer is not playing and we’re sure it won’t be asked to play again, we do the following:

(*player_obj)->Destroy( player_obj );
 
player_obj = nullptr;
player = nullptr;
player_vol = nullptr;
 
#ifdef TARGET_ANDROID
player_buf_q = nullptr;
#endif

This ensures that we don’t run out of available playback resources.

The OutputMix and Engine

And when we’re done playing sound in general (say at application shutdown), we destroy all AudioPlayers and then the OutputMix and Engine objects:

(*output_mix_obj)->Destroy( output_mix_obj );
output_mix_obj = nullptr;
output_mix_vol = nullptr;
 
(*engine_obj)->Destroy( engine_obj );
engine_obj = nullptr;
engine = nullptr;
Filed in Android, code | 8 Comments

Abstracting Low-Level I/O

One of the common requirements in game development is that we need to load large blocks of (usually) compressed data in as little time as possible. This, however, is somewhat easier said than done. Ideally, what we’re looking for is a simple asynchronous I/O API.

Unfortunately, PCs and consoles and mobile devices differ significantly on the sort of APIs they offer. Not only do the types of API differ, but the particular restrictions they make on read sizes also vary. And so do the performance characteristics of each. Some platforms don’t even offer much in the way of options: different drives end up exposed through entirely different APIs.

I recently settled on a fairly simple abstraction to make it easy to work with all these different APIs in a consistent manner. Instead of treating a file as a string of bytes, the loader code treats it as a series of variably-sized chunks. The interface consists of one core function which returns the address and size of the next chunk of data. This lets me easily and efficiently wrap up almost any API.

Got asynchronous I/O? Great! Allocate some buffers, kick off a bunch of requests, and hand the chunks to the decoder as they come in. It’s trivial to make additional read requests or block the loader thread as needed when it asks for the next chunk.

No luck? Stuck with a synchronous stream? Well, alright. Allocate a chunk-sized buffer, fill it with data, and each time the loader asks for the next page just go ahead and block the thread with another read into that same buffer. Or go crazy and spawn another thread to emulate async I/O, if that’s somehow faster.

Memory-mapped files the fastest way to go? Maybe just for certain files? No problem. Map the file and hand the loader a single large chunk.

Happen to somehow have your data sitting in memory already? That’s no different from the memory-mapped case.

So long as the loader doesn’t make assumptions about the number and size of chunks, the lower-level code is free to make those buffers in whatever number and at whatever size is the most efficient for the target device.

Filed in code | Comment Now

Simple Digital Signatures

One of the things that comes up when sending data over the internet is verifying that it hasn’t been corrupted. This is generally a simple thing to resolve: send the data and a good hash (MD5 or SHA-1) of the data together. Recompute the hash on the client side and compare it to the hash you sent. If any bits have changed, the two won’t match, and you know you need to redownload the file. I suppose it’s possible both the data and hash could be corrupted in such a way that they match, but if your hash function is any good then the likelihood of this happening by chance is so astonishingly low that it doesn’t bear consideration.

But what if you’re worried that someone might be tampering with your file? An attacker editing the data could also replace the hash, leaving your app none the wiser. This is what digital signatures were made for.

Digital signatures are based on public key cryptography, for instance the RSA algorithm. In a nutshell, what you produce is a pair of keys (these are basically a related pair of enormously large numbers, though the details vary depending on your encryption algorithm of choice), and data which is encrypted with one key can only be decrypted with the other. One key, the private key, is kept secret. The other key, the public key, is attached to your program.

Once you’ve got your keys set up, you produce a digital signature by encrypting the hash you originally computed with your private key. Your app then decrypts it with its copy of the public key and compares it to the hash of the data which it computes. So long as you keep your private key closely guarded, an attacker will be unable to create a signature which decrypts with your app’s public key.

Implementation

So how do we implement this? Well, it’s not particularly difficult to do, but be warned, you must be very careful when implementing as it is very easy to make mistakes which defeat the entire system. In fact, the sample code below should not be used verbatim. It’s been pulled from a larger block of code and trimmed and modified significantly for public consumption. Don’t assume I haven’t broken something in the process. You have been warned!

Another note before I begin: the toolchain accompanying the engine I’ve integrated this into is written in C#. That means I’m going to be using the standard .NET cryptography API to create my signature. The details may change if you use a different API. I’ll be using SHA-1 hashes and RSA encryption.

Creating the Signature

Creating the signature is fairly straight-forward. Generate a hash and encrypt it:

using System.IO;
using System.Security.Cryptography;
 
//...
 
byte[] CreateSignature( Stream data, RSAParameters privateKey )
{
	//first we compute the hash of the data
 
	byte[] hash;
	using( var hasher = new SHA1Managed() )
		hash = hasher.ComputeHash( stream );
 
	//and then we sign it
 
	using( var rsa = new RSACryptoServiceProvider() )
	{
		rsa.ImportParameters( privateKey );
		return rsa.SignHash( hash, "SHA1" );
	}
}

Creating the hash is fairly straightforward, but if we’re going to use that signature later then we need to know what RSACryptoServiceProvider.SignHash is doing. Well, the first thing it does is it takes our hash and appends it to a little block of data describing the hash algorithm we’ve used to create it (that’s why it needs the last parameter). After that, it attaches PKCS 1.5 padding to the data in order to make it large enough to encrypt (RSA encrypts blocks of data equal in size to the key you use). Once that’s done, it encrypts the data using the private key you supplied earlier and returns the encrypted blob (also equal in size to the key you’re using).

Now in my case I attach both the signature and the public key (making it easier to use multiple keys) to the data that I’m sending, but this isn’t required. If you’ll only ever support one key you can omit that portion, and there’s really no rule that says the data has to be attached to the same data stream – it could be stored anywhere.

void SignStream( Stream data, RSAParameters key )
{
	var signature = CreateSignature( data, key );
 
	//creating the signature leaves the stream's position
	//at the end of the data, so we're good to write more
 
	var writer = new BinaryWriter( data );
 
	//write some header info (the key size)
 
	writer.Write( key.Modulus.Length );
	writer.Write( key.Exponent.Length );
 
	//write the public portion of the key
 
	writer.Write( key.Modulus );
	writer.Write( key.Exponent );
 
	//and write the signature
 
	writer.Write( signature );
}

Great, so now we’ve got our signature tacked onto our file. Now what?

Verifying the Signature

Well, first we’ll need an implementation of SHA-1 and RSA. The former is easy. The latter, well, not so much. Many of the common system libraries are set up such that they only accept keys from their own secure stores, making it a pain to set them up for use. In my case I also want this code to be portable, and rewriting (and testing!) it for each and every platform just isn’t something I’m about to do. Fortunately, there are several free RSA implementations available.

For this example, I’ll be using the one in the axTLS project. I’ve picked this one because its RSA implementation is small and self-contained (many others depend on massive math libraries) which makes it easy to integrate (take rsa.c and bigint.c plus the headers they need and make sure CONFIG_SSL_CERT_VERIFICATION is defined). It’s also got a handy SHA-1 implementation sitting right next to its RSA code. Another good option is LibTomCrypt (but be warned, it takes some effort to get it compiling on Windows).

#include "axTLS/crypto/crypto.h"
#define HASH_SIZE SHA1_SIZE //SHA1_SIZE = 20
#define MAX_KEY_LEN 512 //max possible key.Modulus.Length

Once that’s done, we need to load our file and parse out the signature. Let’s skip the tedious IO code and assume we’ve got everything in memory as follows:

const void *data = /* the signed data */ ;
size_t data_size = /* the size of the data */ ;
 
size_t mod_size = /* key.Modulus.Length */ ;
size_t exp_size = /* key.Exponent.Length */ ;
 
const void *mod = /* key.Modulus data */ ;
const void *exp = /* key.Exponent data */ ;
 
const void *sig = /* the signature */ ;

The first step is to make sure that this is indeed our public key (you can skip this if you’re using only one key and haven’t got it attached to each signature).

//valid_keys is a list of all the public
//keys that match trusted private keys
 
bool is_trusted_key = false;
 
for( auto p = valid_keys.begin(); p != valid_keys.end(); ++p )
{
	if( p->mod_size != mod_size || p->exp_size != exp_size )
		continue;
 
	if( memcmp( p->mod, mod, mod_size ) != 0 )
		continue;
 
	if( memcmp( p->exp, exp, exp_size ) != 0 )
		continue;
 
	//found a matching key
	is_trusted_key = true;
	break;
}
 
if( !is_trusted_key )
	return ERR_KEY_NOT_TRUSTED;

The next step is computing the SHA-1 hash of the data.

unsigned char hash[SHA1_SIZE];
 
SHA1_CTX md;
SHA1_Init( &md );
SHA1_Update( &md, (const uint8_t*)data, data_size );
SHA1_Final( hash, &md );

Next up, we decrypt the signature.

//initialize an RSA context
 
RSA_CTX *rsa = NULL;
RSA_pub_key_new( &rsa, mod, mod_size, exp, exp_size );
 
if( !rsa )
	return ERR_OUT_OF_MEMORY;
 
//decrypt the data
 
uint8_t sig_bytes[MAX_KEY_LEN];
int len = RSA_decrypt( rsa, (const uint8_t*)sig, sig_bytes, 0 );
 
//clean up
 
RSA_free( rsa );
 
//check for errors
 
if( len == -1 )
	return ERR_INVALID_SIGNATURE;

Now RSA_decrypt takes care of unpadding the data, but the signature is still preceded by the hash identifier. So we need to take just the last HASH_SIZE bytes of the decoded buffer. (Note: it’s not a bad idea to validate hash ID, too. I’m keeping things simple to illustrate the basic process.)

if( len < HASH_SIZE )
	return ERR_INVALID_SIGNATURE;
 
uint8_t *sig_hash = sig_bytes + len - HASH_SIZE;

And all that’s left now is to compare the hashes:

if( memcmp( hash, sig_hash, HASH_SIZE ) != 0 )
	//they don't match, therefore the data
	//has changed since we made the signature
	return ERR_INVALID_DATA;
 
//everything checks out, the hashes match
 
return SUCCESS;

And that’s that. If we get a value of SUCCESS then we can be certain of the following:

  • The data isn’t randomly corrupted.
  • The data hasn’t been tampered with.
  • The data was produced by someone who has a trusted private key.

And if we know that we’ve kept our private keys safe, then we can be sure that we’re the ones that made the data.

Filed in code, security | Comment Now

The Player’s Name vs Localization

From a game developer’s perspective, the English language is incredibly simple. Our grammar is only minimally inflectional, making it easy to author strings like “@(PLAYER) runs away!” and “Give this to @(TARGET).” and use simple text substitution to replace tokens like “@(TARGET)” with the name of a player, NPC, or object as needed. There are some places where this doesn’t quite work (dealing with numbers and plurals or dialog which might be referring to either males or females), but they’re either uncommon or easy to ignore (in many cases it’s unlikely the player will have just one of something and we can just use the plural).

However, this is far from the case in many other languages. Now that’s obviously common sense, but I’ve seen authors underestimate the magnitude of the problem many times, so an example is in order. Let’s take Russian and assume we’ve got two players, Антон (Anton) and Юлия (Julia). Let’s take a look at how some simple examples play out, and mind the italics.

The first example I gave works easily with both: “Антон убегает.” and “Юлия убегает.” are both correct. However, if we put the sentence in the past tense, things change: “Антон убежал.” and “Юлия убежала.” It’s no longer enough to just have the player name around, you now need a system that knows the player’s gender and can match the appropriate verb form to it. You’ve now got a template that includes little functions like “@(PLAYER) @(match-gender, PLAYER, убежал, убежала).” and the substitution code just got a hell of a lot more complex.

The second example is even worse, because it’s not just the verbs that change in Russian. The nouns do too, including the ones you’re trying to drop into sentences. Let’s take a look: “Дайте это Юлии.” and “Дайте это Антону.” Awesome. More endings changing. These ones reflect the noun’s role in the sentence, and they’re particularly nasty because the rules depend not just on the word’s gender and role in the sentence, but also on the particular word itself. Oh, and adjectives also change along with the noun they’re attached to, too.

And then there’s numbers. One year, two years, three years, so on. In Russian that’s один год, два года, три года, so far so good, the plural’s a bit more complex than just adding an S (it varies by gender) but no big deal… until we get to the number five: пять лет… and then the pattern repeats at twenty one, thirty one, forty one…

And that’s just a few examples in one language. Other languages present their own difficulties and will need their own sets of complex substitution rules (like our gender-matching example above). So how do we deal with this?

Well, we can ignore it. The translation team will usually be able to work around the worst offenses, though the resulting sentences may come out stilted or ambiguous. It’s unfortunate and looks much worse in some languages than “Anton picks up one boxes.” does to us, but sometimes that’s just life.

Alternately, we can develop a sophisticated system of substitution functions. This makes it possible to author sentences that correctly morph to accommodate different words. However, this is an incredibly difficult task in some languages. Furthermore, the more complex the system gets, the more the translators need to be able to think like programmers in order to really leverage the functions. So, realistically, this approach only goes so far, and we’ll want to work with the localization team to come up with the minority of features which will cover the majority of cases.

And finally, we can plan ahead and author the content to be less reliant on substitutions in the first place. Use pronouns where possible, and restrict substitutions to the simplest of sentences. Don’t let the players type in a name every single thing they see and everyone they meet. Things like that.

Filed in design, language, Uncategorized | Comment Now

iOS Compiler Bug: -O3 FIXES It?!

This is officially the strangest optimization-related compiler bug I’ve ever seen.

It turned up as an error decompressing some data. Now, the decompression isn’t some crazy custom algorithm. It’s zlib, one of the most portable and well-tested bits of code in the world. And it’s C, so it isn’t as though the compiler’s got any crazy templates or other such goodies to trip over.

The really weird bit was that the bug would only manifest on an iOS device. The data would decompress just fine on a PC or in the simulator. Normally, this would suggest some sort of uninitialized-memory bug, but after memsetting everything I could think of to zero in the code driving zlib, the bug persisted, and I was at a loss.

So off I went into the disassembly. It turns out that the compiler decided to turn this:

if (state->lens[256] == 0) {
        strm->msg = (char *)"invalid code -- missing end-of-block";
        state->mode = BAD;
        break;
}

into this:

if (state->lens[0] == 0) {
        strm->msg = (char *)"invalid code -- missing end-of-block";
        state->mode = BAD;
        break;
}

The variable state is a pointer to a struct, and lens is an array of 16-bit integers declared inline to the structure, so there isn’t even an extra indirection. The compiler was just miscalculating the offset it needs to add to the pointer to get the address of the 256th element.

The fix? Turning optimization on (even at -O3). That’s something I’ve never seen before. Optimization is usually the thing that breaks code, not the thing that fixes it… Especially not in a language as simple and as old as C.

Spotted on XCode 4.2.1 using Apple’s LLVM Compiler 3.0 targeting an armv6 device.

It’ll be interesting to see if the bug reproduces on an armv7 device, to find out if it’s a matter of Apple rushing out the new compiler or if they just decided to skip QA for systems targeting legacy devices.

Filed in build, iOS | Comment Now

Using Win32 Asynchronous I/O

Recently wrote some asynchronous I/O code for a fast data loader. The data file was logically a stream of separate objects, so it made sense to parse it a chunk at a time. That’s a situation which practically screams for asynchronous I/O. Unfortunately, it’s rather hard to find a useful example on how to use the relevant APIs…

Setup

So the general idea is to read the data a chunk at a time. That means we need to have space available for each chunk to be read into. In addition, Windows tracks asynchronous file I/O via the address of an OVERLAPPED struct. That means that the OVERLAPPED objects must persist for the duration of each read request, so they’ll need to be allocated ahead of time as well. And, to keep things simple, we’re going to use events for synchronization, so we’ll need to make a few of those.

We also need to keep one interesting thing in mind – getting the best performance out of asynchronous I/O requires us to open the file in unbuffered mode. The trouble with that is that it imposes restrictions on the alignment of our target buffers and on the size of the reads we’re allowed to make. We’re going to deal with these by doing everything in multiples of the system’s page size, which will be such that it satisfies the API.

//get the system's page size and compute
//the size of data chunk we'll be dealing with
 
SYSTEM_INFO osInfo;
GetSystemInfo( &osInfo );
 
DWORD chunkSize = osInfo.dwPageSize * 32;
 
//the maximum number of concurrent read requests
//we'll issue - plus one!
#define NUM_REQS 16
 
HANDLE ev[NUM_REQS];
OVERLAPPED olp[NUM_REQS];
void *buf[NUM_REQS];
 
//allocate the data buffers
 
buf[0] = VirtualAlloc( NULL, chunkSize * NUM_REQS,
	MEM_COMMIT, PAGE_READWRITE );
for( int i = 1; i < NUM_REQS; i++ )
	buf[i] = (char*)buf[i - 1] + chunkSize;
 
//create the events
 
for( int i = 0; i < NUM_REQS; i++ )
	ev[i] = CreateEvent( NULL, TRUE, FALSE, NULL );

And we’re now ready to open the file.

Opening the File

Opening the file is straightforward. We just need to add a couple flags to the usual call to CreateFile. In particular, we need FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED.

We’re also going to query the size of the file so we know how many chunks we’ll need to process in total.

HANDLE file = CreateFile( path, GENERIC_READ, FILE_SHARE_READ,
	NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL |
	FILE_FLAG_SEQUENTIAL_SCAN | FILE_FLAG_NO_BUFFERING |
	FILE_FLAG_OVERLAPPED, NULL );
 
if( file == INVALID_HANDLE_VALUE )
	//handle errors
	;
 
LARGE_INTEGER size;
GetFileSizeEx( file, &size );
ULONGLONG numChunks = (size.QuadPart + chunkSize - 1) / chunkSize;

And now it’s time to read our file.

Starting the Read

So, now that the file’s open, we need to start kicking off read requests via ReadFile. These requests all look rather alike, so let’s make a helper function for this.

void RequestChunk( ULONGLONG chunkNum )
{
	int n = chunkNum % NUM_REQS;
 
	OVERLAPPED *o = &olp[n];
	void *b = buf[n];
 
	memset( o, 0, sizeof( OVERLAPPED ) );
 
	LARGE_INTEGER ofs;
	ofs.QuadPart = chunkNum * chunkSize;
	o.Offset = ofs.LowPart;
	o.OffsetHigh = ofs.HighPart;
	o.hEvent = ev[i];
 
	ReadFile( file, b, chunkSize, NULL, o );
}

One thing to note is that we do not do anything special when requesting the last chunk. Even if the file isn’t an even multiple of the chunk size in length, we must still request an even multiple of the block size at a time. This is most easily handled by requesting a full chunk and letting the OS sort it out if there isn’t enough actual data in the file to fill the buffer.

Now, asynchronous I/O works best when we keep multiple requests in flight. We’re going to start by kicking off the number of concurrent requests we plan on running at once. Note that NUM_REQS is one greater than this number. This is because the parsing code will be handling one chunk at any given time, and we can’t be reading into it while that’s happening.

for( int i = 0; i < (int)min( numPages, NUM_REQS - 1 ); i++ )
	RequestChunk( i );

Parsing

And now we (finally) come to the heart of algorithm. This is where we actually loop over each chunk and process the data within it. Synchronization is handled for us by Windows (so long as we set the hEvent field in the OVERLAPPED struct). The call we make to GetOverlappedResult will block if the data isn’t ready yet (that is, if we’re parsing the data more quickly than the disk can provide it).

for( ULONGLONG i = 0; i < numChunks; i++ )
{
	int n = i % NUM_REQS;
 
	OVERLAPPED *o = &olp[n];
	void *b = buf[n];
 
	DWORD cb;
	GetOverlappedResult( file, o, &cb, TRUE );
 
	ULONGLONG nextRequest = i + NUM_REQS - 1;
	if( nextRequest < numChunks)
		RequestChunk( nextRequest );
 
	//b points at our current chunk,
	//which has cb bytes of data in it
 
	ParseChunk( b, cb );
}

Whew. And now that we’re done with the file, we can clean things up.

Cleanup

Not much to do here. We just need to close the open file handle, delete the synchronization events, and free our buffer.

CloseHandle( file );
for( int i = 0; i < NUM_REQS; i++ )
	CloseHandle( ev[i] );
VirtualFree( buf[0] );

And we’re done.

Tuning

The number I picked for NUM_REQS and the multiplier for chunkSize come from experimentation on a few machines with a fairly limited set of files. The optimal values are likely going to be different depending on the drive you’re reading from, the amount of data you’re reading, and how quickly you’re processing the data (relative to the drive’s read speed). It takes some experimentation to find good values.

My advice is to tune the algorithm for the average system you plan to target. It won’t be any slower on a better machine, and isn’t very likely to be any worse on a low end machine than plain synchronous reads would be.

Filed in code | 2 Comments

iOs Compiler (Bug?): “invalid offset, value too big”

This is a bug that’s bit me a few times already. Basically, the iOS compiler fails to generate some function somewhere in the code file causing it to bail with the error “invalid offset, value too big”. Problem is, it doesn’t tell you which function it failed on (the numbers surrounding the error seem largely meaningless).

Fixing it is simple, if tedious. Go through each function definition (including C++ methods and Obj-C message handlers), starting with the largest, and comment out the body. If commenting the body out makes the file compile, you’ve found your culprit. All that’s left is to break the function out into several parts and call them all in sequence from the original function. Note that while this bug usually comes up around complex flow-control (giant switch-case statements, hugely nested ifs, etc), that isn’t always the case (though perhaps the compiler is failing on some inlined-in flow-control).

I don’t know if this is a bug in the compiler’s code generator or not (though that seems to be the case). It may just be a limitation of the target platform, in which case the real bug is how useless the error message is.

Filed in build, iOS | Comment Now

Holy crap, it’s finally shipped. Now I may die peacefully…

Pretty damn proud of how it turned out in the end. You can check it out here or on iTunes. Definitely recommend it if you like rhythm games.

Posted on by phill | Comment Now

WPF: Subtle Binding Crash

Got a crash in, of all things, System.Windows.Controls.Primitives.Popup.OnWindowResize(Object sender, AutoResizedEventArgs e). A NullReferenceException, to be precise. Ages later, it turns out that this error was actually caused by a binding operation on one of the controls in the popup failing because it was trying to instantiate itself as TwoWay with a read-only source property. Somehow, the binding error managed to turn into the NullReferenceException in the layout pass…

Just throwing that out there for anyone that might be seeing something similarly awful.

Filed in code, WPF | Comment Now