-
Notifications
You must be signed in to change notification settings - Fork 0
Description
The problem
After thinking a little following @mundya's #143 which makes the routing table generator merge routes which share keys and masks. I'm presuming that this above change was made in an effort to support N:M (multi-source, multi-sink) nets within Rig's net model, rather supporting some black-magic post-processing stuff and so I'll just consider N:M nets here.
Rig's "Net" model of multicast routes is that they are 1:M (single-source, multiple sink). Further the assumption is made that keys/masks are globally unique, one per net, and completely unambiguous in all situations (which use of BitFields can help ensure). This model has the nice property that it can straight-forwardly map to valid routing tables in the general case. (As an aside: this set of restrictions isn't clear in the docs and needs emphasising).
N:M nets can clearly be a useful thing to have: consider a case one population of neurons, a, is connected all-to-all with a second population, b. Here an N:M net could be used whose N sources were the processors simulating a and M sinks are the processors simulating b.
Clearly, such a net can exist in SpiNNaker, for example here's an example where the two populations are split across two processors each and reside on separate chips:
Indeed, this route could be generated by using two 1:M nets with the same key (after #143, anyway). There are plenty of others examples you could concoct where this approach also works correctly.
Unfortunately, N:M nets with one key/mask are not possible in the general case on SpiNNaker. Consider the following concrete example:
Here the two populations a and b are split into two processors each but this time there's an a and b processor on each chip. On the top chip, a route going South and to the b core would be created. On the bottom chip, a route going North and to the core with b would be created. The result is that any packet in the net will be infinitely bounced back-and-forth between the two chips creating duplicate packets for the two b cores and generally killing the network.
What next
This is by no means an unknown problem and, as far as I'm aware, something all existing software has had to solve independently, by hand. Unfortunately it is cripplingly easy to fall silently into this trap accidentally with Rig, especially following #143. This duplication of effort and also the ease-of-silent-failure in Rig seems like a Bad-Thing (TM). The following solutions come to mind:
-
Support general N:M nets as the underlying primitive abstraction in Rig. To do this Rig's Net abstraction would need a major re-work requiring far more involvement in the key-allocation process and also complicating the routing process a little.
As such I'm rather against this since it adds a lot of complexity for what isn't an equally major expense. It will also substantially break backward compatibility and a huge amount of implementation work. Volunteers are welcome of course...
-
Support N:M nets where all N sources are placed on the same chip. A new N:M net type would be defined which, in the initial problem description, acts as a general N:M net. The placement and allocation steps should not require any significant work to support such a net. After placement, a new processing step would split any N:M nets with source vertices placed on multiple chips into a new set of N:M nets for which all source vertices are on the same chip. The routing algorithm can then treat these N:M nets with all N sources on a single chip the same way as a 1:M net is currently treated.
The user-application is then burdened with assigning unique keys/masks to any nets created as a result of splitting up N:M nets, a process which should be straight-forward. An additional side-effect of this process is that a larger number of routing entries may be produced than a 'smart' general N:M router. Though such entries should be very readily compressible using @mundya's algorithm, the routing algorithm would inherently operate with less information than a true N:M router and therefore may generate less efficient routes in general.
This approach would be a modest amount of work for a pretty good (though not perfect) result. It could even be done in a completely backward-compatible way if required. (It would probably be best done by making the base Net type an N:M net breaking backwards compatibility very slightly.)
-
Revert Merge entries with equivalent keys/masks #143 and add a set of sanity checks to the
build_routing_tableswhich cause the process to fail where nets with the same key/mask cross. Obviously this doesn't fully enforce the Rig Net model (e.g. it won't catch ambiguous but different keys/masks) but it would help things fail-early. Applications would be required to create N 1:M nets in place of a N:M net.This is clearly the easiest approach and doesn't (majorly) break backwards compatibility. It does however come at a relatively significant cost in terms of key space and routing entries.
Opinions are sought from both @mundya and @neworderofjamie on this matter since I imagine this is something you both have had to deal with in some form or another in your respective codebases. In particular I'd be interested to know what your existing solution is and whether you even consider this worth making Rig's problem? Either way, I presume you'd agree that making more noise if you violate Rig's Net model (in places where a Rig utility function like build_routing_tables depends on it) would be a good thing?

