It is important that you can very efficiently calculate the cost of placing a rectangle at certain position. A little bit of math is needed to do this.

Let $P$ be the set of all $(x,y)$ coordinates. For each pixel $p \in P$, let $v_p$ be its color in the input. That is, $v_p$ is a vector of size 3, where $v_{p,0}$ is the red component, $v_{p,1}$ is the green component, and $v_{p,2}$ is the blue component.

Let us partition $P$ in two disjoint sets, $X \cup Y = P$ and $X \cap Y = \emptyset$. In our application, $X$ is the “rectangle” and $Y$ is the “background”, but this is not yet important here.

We choose color $a$ for set $X$ and color $b$ for set $Y$. Again, these are vectors of size 3.

For each pixel $p \in X$ and color component $c$ the error is $a_c - v_{p,c}$, and for each pixel $p \in Y$ and color component $c$ the error is $b_c - v_{p,c}$. The total cost over all pixels and all color components is therefore
$$
f(X,Y,a,b)
= \sum_c \sum_{p \in X} \left(a_c - v_{p,c}\right)^2 + \sum_c \sum_{p \in Y} \left(b_c - v_{p,c}\right)^2,
$$
and our task is to find $(X,Y,a,b)$ that **minimizes the cost** $f(X,Y,a,b)$. Equally well we can **maximize the utility**
$$
g(X,Y,a,b) = - f(X,Y,a,b) + \sum_c \sum_{p \in P} v_{p,c}^2
$$

For any $Z \subseteq P$, let us introduce the shorthand notation $$ v_{Z,c} = \sum_{p \in Z} v_{p,c}. $$ We can rewrite the cost as follows: $$ \begin{split} f(X,Y,a,b) &= \sum_c \left( \sum_{p \in X} a_c^2 - \sum_{p \in X} 2a_c v_{p,c} + \sum_{p \in X} v_{p,c}^2 + \sum_{p \in Y} b_c^2 - \sum_{p \in Y} 2b_c v_{p,c} + \sum_{p \in Y} v_{p,c}^2 \right) \\ &= \sum_c \left( |X| \cdot a_c^2 + |Y| \cdot b_c^2 - 2a_c v_{X,c} - 2b_c v_{Y,c} + \sum_{p \in P} v_{p,c}^2 \right). \end{split} $$ Therefore the utility that we want to maximize is simply $$ g(X,Y,a,b) = \sum_c \left( 2a_c v_{X,c} + 2b_c v_{Y,c} - |X| \cdot a_c^2 - |Y| \cdot b_c^2 \right). $$

To maximize $g(X,Y,a,b)$, we try all possible partitions $(X,Y)$, and for each partition we find the best colors $(a^*,b^*)$ that maximize $g(X,Y,a^*,b^*)$. By differentiation, it is easy to see that the optimal colors are the arithmetic means $$ a^*_c = \frac{v_{X,c}}{|X|}, \quad b^*_c = \frac{v_{Y,c}}{|Y|}. $$ We use the shorthand $h(X,Y) = g(X,Y,a^*,b^*)$ for the utility of partition $(X,Y)$ for the best choice of colors. Note that this function takes the following very simple form: $$ h(X,Y) = \sum_c \left( \frac{v_{X,c}^2}{|X|} + \frac{v_{Y,c}^2}{|Y|} \right). $$

Therefore to find an optimal segmentation, it is sufficient to check all possible rectangles $X$ and calculate $h(X,Y)$. To do that, we only need to know $v_{X,c}$ and $v_{Y,c}$. Naturally, $v_{Y,c} = v_{P,c} - v_{X,c}$. Hence to calculate $h(X,Y)$ efficiently, we only need to know how to calculate $v_{X,c}$ efficiently, for any rectangle $X$.

Note that $v_{X,c}$ is simply the sum of color component $c$ for all pixels in rectangle $X$. With a little bit of preprocessing, we can calculate this sum in constant time.

Let us write $s(x_0,y_0,x_1,y_1,c) = v_{R,c}$, where $R = [x_0,x_1-1] \times [y_0,y_1-1]$ is the rectangle with the corners $(x_0,y_0)$ and $(x_1,y_1)$. Then by the inclusion–exclusion principle (see the illustration below), $$ \begin{split} s(x_0,y_0,x_1,y_1,c) {}={} &s(0,0,x_1,y_1,c) \\ {}-{} &s(0,0,x_0,y_1,c) \\ {}-{} &s(0,0,x_1,y_0,c) \\ {}+{} &s(0,0,x_0,y_0,c). \end{split} $$ So we only need to precompute $s(0,0,x,y,c)$ for all $x$, $y$, and $c$. With this precomputed information, we can then calculate $h(X,Y)$ for any rectangle $X$ in constant time. Vector types will help us to do all work conveniently for all color components in parallel.

For further performance improvements, it may be helpful to organise the loops so that the outer loop tries out different sizes of the rectangle, and the inner loop tries all possible positions of the rectangle. Then $1/|X|$ and $1/|Y|$ remain constant in the inner loop.

Here is one approach: First use a coarse grid, with e.g. 10 × 10 pixels per grid cell. Then try all possible ways to place the rectangle in this coarse grid. Each coarse location represents a set of approx. 10000 fine-grained locations (so it is much faster to try out all possible coarse locations). For each coarse location, calculate the following two estimates (efficiently, in constant time):

**An upper bound:**at most how much is the cost if I place the rectangle in some of the fine-grained locations that are represented by this coarse-grained location?**A lower bound:**at least how much is the cost if I place the rectangle in some of the fine-grained locations that are represented by this coarse-grained location?

After these calculations, you can hopefully rule out a majority of coarse-grained locations: you know that there are other locations where the cost is **at most** some value *s*, so you can then ignore all coarse-grained locations where the cost is **larger than** *s*.

Finally, zoom in to those coarse-grained locations that you have not yet ruled out.

You can also apply the grid idea recursively.