Knight’s Tour in C# and C++ with Warnsdorff’s Algorithm

C# code is not inherently slow, and writing code in C++ does not guarantee that your programs will execute fast. For me, these facts were again highlighted by taking part in last month’s Programming Challenge from the C/C++/C# Blog by David Bolton at (which is part of The New York Times Company). The challenge was to calculate a 20x20 Knight’s Tour in the fastest possible time. The final results are out, and my C++ and C# entries ended up in the first and third places, respectively. When comparing my two entries with each other, one can get an idea of how C# and C++ perform when they implement essentially the same logic. Furthermore, one can compare the performance of this C# entry with all the C++ entries and realize that C# code can in fact be very fast.

The actual performance of C# code depends somewhat on the hardware that the C# code is executed on and JIT-compiled, as well as the choices between Mono or the Microsoft .NET Framework and the version of whatever framework is used. Together, these factors can easily double or halve your execution time. But among all the C++ entries, the average time was about 1 millisecond to calculate a tour, which is quite fast for the challenge involving the specified board size; it can easily take thousands or millions times longer for brute-force algorithms that include backtracking. Still, that makes my C# entry more than 20 times faster than the average C++ entry. The point is, even C++ code can be relatively slow, often much slower than C# code. When it comes to writing fast code, it is more about the algorithm and the way you structure your code than it is about your choice of programming language.

Having said that, C++ code can still beat C# code in terms of speed, as illustrated by the drastically different performance of my C# versus C++ entries; the core fragments that were measured for the challenge are almost identical between the two entries. Despite the similarities, the C# version ends up roughly two (or perhaps three) times slower when compared to the C++ version; generally I would expect a difference in the order of only 15% to 30%. So, the difference in this specific case was much bigger than I expected, but this is probably something that cannot be generalized. Nonetheless, my C# entry still ended up being very fast in comparison with the average C++ entry. So, when performance is absolutely essential, prefer C++ over C#, but then first make sure you get the algorithm and structure of the code right or it won’t matter anyway.

The code that I wrote and submitted for Programming Challenge 35 can be downloaded from the links at the end. The software is released under the MIT License (those files that include the license header). The programs implement the Knight’s Tour using Warnsdorff’s Algorithm. It is also interesting to note that, although the C# program uses unsafe code to eliminate the bounds checking for arrays, the resulting performance gain is only in the order of about 10%, and, if you are looking for a “safe” implementation, you would need to change only two or three lines of code.

Overall, I think C# is a good language in terms of general execution speed and a developer’s productivity gains resulting from the .NET Framework; for example, my C# entry also includes code to render the Knights Tour as an image (similar to the one shown below), which would be much more difficult to implement if you insisted on using C++ alone. Of course, you can always combine C++ and the power of .NET through C++/CLI, if you want.

Illustration of the resulting moves for the 20x20 Knight's Tour



Also available in D

Leonardo Maffi ported my Knight's Tour code to the D programming language. Read more about the D version on his journal, or download the code from his “useless software” page, where he also shares code of numerous interesting projects (in addition to the projects on the “better software” page).

Clarification question

Hi, Tiaan.

I am so interesting for your so efficient implementation that I spent many hours to study it. Although I know the Warnsdorff rules, at last I failed to understand your genius codes after I have debugged them many times. Maybe some knowledge or skills I am short of have blocked my understanding. Whatever, I have tried my best and now I am very depressed, so could you please add some comments in the source code or give me some links to help me understand your implementation.

Any help will be greatly appreciated.
Best Wishes.

Re: Clarification question

Hello zilf,

There is really not all that much to understand about the code, but some of the code may look a bit confusing at first. The logic involving the array indexes may seem slightly awkward and is implemented the way it is to improvement execution speed. However, the rest is really just an implementation of the Warnsdorff rule, which states that the next move should go to the block from which the knight will have the fewest onward moves.

Here are some hints to help clarify the implementation; I am mainly referring to the C# code, but the C++ implementation is almost identical:

  • The data stored in isAvailable keeps track of which blocks must still be visited. The actual grid being solved is 20x20 blocks in size, but isAvailable represents a 32x28 grid for performance reasons (i.e., 32 blocks wide by 28 blocks high).
  • Only an inner region of 20x20 blocks inside the larger 32x28 grid is really traversed by the knight, but there is a margin surrounding the 20x20 region that prevents the knight from leaving the inner region. The margin is 4 blocks wide on the top, left and bottom sides, while being 8 blocks wide on the right side.
  • The direction constants define the index deltas required to move the knight from the current block to the 8 locations of a possible next move. (North is defined as upwards, South as downwards, East towards the right and West to the left.)
  • The two lines involving the fixed keyword are, from an algorithmic perspective, the same as the two preceding lines that have been commented out; they just allocate two arrays (one for isAvailable that was discussed above, and another for sequenceValues that stores the resulting order in which we traverse the board).
  • We start the tour in the lop-left corner of the inner grid, corresponding with index TopLeftPosition in terms of the outer grid, which is actually 4 blocks down and 4 blocks toward the right of the outer top-left corner (this is to avoid the top and left margins).
  • The nested for loops carve out the inner 20x20 region where the knight is initially allowed to move.
  • The main while loop picks each next block based on the score that is calculated for that block, which can be one of 8 possible blocks (i.e., NNW, NNE, WNW, ENE, WSW, ESE, SSW or SSE from the current block).
  • The score for each of the eight possible moves is determined by how many blocks the knight will be able to move to in a subsequent (i.e., onward) move, should that possible block be selected.
  • The score can never be higher than 7, since the subsequent move can always be in only 7 of the 8 possible directions; the eighth subsequent direction will never be possible, since that one subsequent direction will always correspond with the block that is evaluated for the current iteration. For example, if the next move being considered is NNW, the subsequent move cannot be SSE, since that will return us to the current occupied location; it is always the opposite direction of the next move that the subsequent move cannot be.
  • For each iteration in the main loop, the block with the lowest score wins.
  • The block number in the context of the smaller grid is calculated from the index number defined in the context of the larger grid, using some bit-manipulation logic for improved speed.
  • You can ignore the Debug.Assert statements from an algorithmic perspective, since those are just sanity checks.
    • Hopefully these additional comments will clarify the code for you.

Why 32 by 28 blocks?

Hi Tiaan,

Your so detailed reply really surprised me. They are very neat and very helpful. Thank you so much.

BTW: Whether the large grid is designed 32*28, which you said is for performance reasons, is based on some design philosophy or theoretical basis?

Your sincerely. Best Wishes.

Re: Why 32 by 28 blocks?

Yes, zilf, the size of the larger grid was carefully chosen, especially the width to be exactly 32 blocks. First of all, for performance reasons, a 1-dimensional array is used to store the 2-dimensional grid data in isAvailable instead of using a 2-dimensional array. The data is stored row by row, one block after another, and we want the number of blocks per row to be chosen in such a way so that it will allow fast “jumping” between rows, which end up being 32 blocks wide for this specific 20x20 grid problem, as explained further below. The number of rows is 28 just so that it can also hold the margin data, but it doesn’t have to be bigger than that.

The important number is the width being 32 blocks, so that the values of sequenceValues can be calculated using bitwise-and and bit-shift operations, instead of using division that is slow. Bit operations are extremely fast. (I suspect that even addition and subtraction operations may sometimes be faster when the number-space has similar power-of-two boundaries that could minimize the carry-over effect between neighboring bits, although this will likely depend on the actual CPU hardware being used.)

Anyway, we end up with the number 32 for the width since it is the next larger power-of-two number after 28. The constants that are defined in the program are mathematically related as follows:

SideLength = 20
MarginCount = 4
HeightCount = (SideLength + (2 × MarginCount)) = 28
WidthShift = ceiling(log2(HeightCount)) = 5
WidthCount = (2WidthShift) = 32
WidthMask = (WidthCount - 1) = 31

Given the equations above, one can change the code to solve grids with different sizes from the current 20x20 grid. To give the correct results in all cases, I think one may have to add SquareOneOffset (i.e., 1) to HeightCount before taking the logarithm in the equation for calculating WidthShift (even though the answer would remain the same for the 20x20 solution), or one might consider rather switching to zero-based values for sequenceValues so that this is not necessary. If the math looks a bit unfamiliar to you, you might want to read up more about binary numeral systems and bitwise operations.

Everything is clear

Hi Tiaan,

Everything is clear.
I must say that you are so kind and so patient. I will always keep my eyes on your blog. Good luck!

Thank you sincerely.
Best wishes.