For a graph G with n vertices, the notion of distance we will use is the edit distance. That is, we say that the distance between two graphs is the smallest ε such that one can add and/or delete εn2 edges and get from the first graph to the second. Under a reasonable representation of graphs, this is equivalent to the earlier Hamming distance definition (up to possibly a change of constants).
To make precise the general notions of property testing in the context of graphs, we say a tester for graph property P should distinguish with at least two-thirds probability between the cases of G satisfying P and the cases where G is ε-far in edit distance from satisfying P. The tester can access some oracle to query whether a pair of vertices has an edge between them in G or not. The query complexity is the number of such oracle queries. Say the tester has one-sided error if it has false positives and not false negatives, i.e. if G satisfies P, the tester always outputs the correct answer. [4][5]
We can only differentiate between graphs that satisfy P versus those far from P, as opposed to satisfying versus not satisfying P. In the latter case, consider two graphs: G satisfying P and H not satisfying P by changing only a few edges. One example is testing triangle-freeness with H a graph with exactly one triangle and G having one of these edges removed. Then, the tester cannot tell them apart unless it queries every edge, which it cannot do.
The field of graph property testing was first introduced by Goldreich, Goldwasser, and Ron. In their seminal paper published in 1998, an abstract graph partition problem is analyzed and some testers provided. These include as special cases several important graph properties like bipartiteness, k-colorability, having a large clique, and having a large cut. [4] In particular, the natural algorithms that sample a subgraph and check whether it satisfy the property are all correct, albeit with perhaps-suboptimal query complexities.
Since then, several related discoveries have been made
- In 1992, Alon, Duke, Lefmann, Rödl, and Yuster showed that for every graph H, the property of not containing H as a subgraph is testable. [6]
- In 1999, Alon, Fischer, Krivelevich, and Szegedy showed that for every graph H, the property of not containing H as an induced subgraph subgraph is testable. [7]
- In 2005, Alon and Shapira showed that any monotone graph property (one that is preserved under vertex and edge deletion) is testable with one-sided error. [8]
- In 2008, Alon and Shapira exhibited testers with one-sided error for all hereditary graph properties. They also characterized properties that are easy to test. Namely, these natural properties are semi-hereditary. These statements will be clarified below. [2]
A graph property is hereditary if it is preserved under deletion of vertices, or equivalently, if it is preserved under taking induced subgraphs. A few important hereditary properties are H-freeness (for some graph H), k-colorability, and planarity. All hereditary properties are testable.
- Theorem (Alon & Shapira 2008). Every hereditary graph property is testable with one-sided error. [2]
The proof relies on a version of the graph removal lemma for infinite families of induced subgraphs. The query complexity using this regularity approach is large due to the tower function bound in the Szemerédi regularity lemma.
- Theorem (Infinite graph removal lemma). For each (possibly infinite) set of graphs H and ε > 0, there exist h0 and δ > 0 so that, if G is an n-vertex graph with fewer than δnv(H) copies of H for every H ∈ H with at most h0 vertices, then G can be made induced H-free by adding/removing fewer than εn2 edges. [9]
Informally, an oblivious tester is oblivious to the size of the input. For a graph property P, it is an algorithm that takes as input a parameter ε and graph G, and then runs as a property testing algorithm on G for the property P with proximity parameter ε that makes exactly q(ε) queries to G.
- Definition. An oblivious tester is an algorithm that takes as input a parameter ε. It computes an integer q(ε) and then asks an oracle for an induced subgraph H on exactly q(ε) vertices from G chosen uniformly at random. It then accepts or rejects (possibly randomly) according to ε and H. We say it tests for the property P if it accepts with probability at least 2/3 for G that has property P, and rejects with probability at least 2/3 or G that is ε-far from having property P. [2][1][10]
Crucially, the number of queries an oblivious tester makes is a constant dependent only on ε and not the size of the input graph G. In complete analogy with property testing algorithms, we can talk about oblivious testers with one-sided error.
We can contrive some graph properties for which a tester must access the number of vertices.
- Example. A graph G satisfies property P if it is bipartite with an even number of vertices or perfect with an odd number of vertices.[2]
In this case, the tester cannot even differentiate which property (bipartiteness or perfectness) to test unless it knows the number of vertices. There are many examples of such unnatural properties. In fact, the characterization of graph properties testable by an oblivious tester with one-sided error leads to a class of natural properties.
- Definition. A graph property H is semi-hereditary if there exists a hereditary graph property H such that any graph satisfying P satisfies H, and for every ε > 0, there is an M(ε) such that every graph of size at least M(ε) that is ε-far from satisfying P contains an induced subgraph that does not satisfy H. [2]
Trivially, hereditary properties are also semi-hereditary. This characterization partially answers the converse to the other Alon & Shapira theorem above: the properties that are easy to test properties (having oblivious testers with one-sided error) are almost hereditary. In the same paper, they showed that
- Theorem (Alon & Shapira 2008). A graph property P has an oblivious one-sided error tester if and only if P is semi-hereditary. [2]
In this section, we will give some natural oblivious testing algorithms with one-sided error for triangle-freeness, bipartiteness, and k-colorability. They are natural in the sense that we follow the natural idea of randomly sampling some subset X of vertices of G and checking whether the graph property holds on the subgraph spanned by X by brute-force search. We have one-sided error since these properties are actually hereditary: if G satisfy the property, so must the induced subgraph spanned by X, so our tester always accepts.
For triangle-freeness, the tester is an application of the triangle removal lemma. In particular, it tells us that if graph G is ε-far from being triangle-free, then there is a (computable) constant δ = δ(ε) so that G has at least δn3 triangles.
Example (Triangle-freeness Testing Algorithm).
- Given graph G, choose a random set X of q(ε) = 1/δ triples of vertices independently at random, where δ is as above.
- For every triple of vertices in X, query whether all three pairs of vertices are adjacent in G.
- The algorithm accepts if no triple of vertices induces a triangle, and rejects otherwise. [1]
For bipartiteness and k-colorability, let δ be the desired upper bound on error probability for the following testers. Note that query complexity should not be confused with running time. The latter is often exponential (as is the case of both) due to a lack of polynomial time decision algorithm to test the property on the induced subgraph. We instead check by brute-force search. [4]
Example (Bipartite Testing Algorithm).
- Given graph G, choose a random set X of q(ε) = O(log(1/(εδ))/ε2) vertices.
- For every pair of vertices in X, query whether they are adjacent in G.
- It accepts if the induced subgraph of G on X is bipartite and rejects otherwise. [4]
Example (k-colorability Testing Algorithm).
- Given graph G, choose a random set X of q(ε) = O(k4 log2(k/δ)/ε3) vertices.
- For every pair of vertices in X, query if they are adjacent in G.
- It accepts if the induced subgraph of G on X is k-colorable and rejects otherwise. [4]