Color refinement algorithm
From Wikipedia, the free encyclopedia
In graph theory and theoretical computer science, the color refinement algorithm also known as the naive vertex classification, or the 1-dimensional version of the Weisfeiler-Leman algorithm, is a routine used for testing whether two graphs are isomorphic.[1] While it solves graph isomorphism on almost all graphs, there are graphs such as all regular graphs that cannot be distinguished using color refinement.
History
Description
The algorithm takes as an input a graph with vertices. It proceeds in iterations and in each iteration produces a new coloring of the vertices. Formally a "coloring" is a function from the vertices of this graph into some set (of "colors"). In each iteration, we define a sequence of vertex colorings as follows:
- is the initial coloring. If the graph is unlabelled, the initial coloring assigns a trivial color to each vertex . If the graph is labelled, is the label of vertex .
- For all vertices , we set .
In other words, the new color of the vertex is the pair formed from the previous color and the multiset of the colors of its neighbours. This algorithm keeps refining the current coloring. At some point it stabilises, i.e., if and only if . This final coloring is called the stable coloring.
Graph Isomorphism
Color refinement can be used as a subroutine for an important computational problem: graph isomorphism. In this problem we have as input two graphs and our task is to determine whether they are isomorphic. Informally, this means that the two graphs are the same up to relabelling of vertices.
To test if and are isomorphic we could try the following. Run color refinement on both graphs. If the stable colorings produced are different we know that the two graphs are not isomorphic. However, it could be that the same stable coloring is produced despite the two graphs not being isomorphic; see below.
Complexity
It is easy to see that if color refinement is given a vertex graph as input, a stable coloring is produced after at most iterations. Conversely, there exist graphs where this bound is realised.[4] This leads to a implementation where is the number of vertices and the number of edges.[5] This complexity has been proven to be optimal under reasonable assumptions.[6]
Expressivity
We say that two graphs and are distinguished by color refinement if the algorithm yields a different output on as on . There are simple examples of graphs that are not distinguished by color refinement. For example, it does not distinguish a cycle of length 6 from a pair of triangles (example V.1 in [7]). Despite this, the algorithm is very powerful in that a random graph will be identified by the algorithm asymptotically almost surely.[8] Even stronger, it has been shown that as increases, the proportion of graphs that are not identified by color refinement decreases exponentially in order .[9]
Equivalent Characterizations
For two graphs and with the same number of vertices, the following conditions are equivalent:
- and are indistinguishable by color refinement.
- The minimum fibration bases of and are isomorphic.
- and are fractionally isomorphic.[10][11]
- and have a common coarsest equitable partition.
- and have the same universal cover.[12]
- For all trees , there are an equal number of homomorphisms from to as there are from to .[13]
- and cannot be distinguished by the two variable fragment of first order logic with counting.[14]
- Any message passing graph neural network will map and to the same output, if the input node features are the initial colors .[15][16]
- Any synchronous anonymous algorithm with broadcast/mailbox message passing in which the input depends on the color only will generate an output that depends, again, on the initial color only.[17]