Potential outcomes selection bias

I am having trouble following the derivation of the selection bias in the Rubin potential outcomes framework.

I am looking at this slide, which shows how to go from the difference in means between treated and control units to the ATT plus selection bias.
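
For context, the textbook form of this decomposition usually looks something like the following. The notation is assumed here and may not match the slide or its step numbering: $D$ is the treatment indicator, $Y^1$ and $Y^0$ are the potential outcomes, and $Y = D\,Y^1 + (1-D)\,Y^0$ is the observed outcome.

$$
\begin{aligned}
E[Y \mid D=1] - E[Y \mid D=0]
  &= E[Y^1 \mid D=1] - E[Y^0 \mid D=0] \\
  &= E[Y^1 \mid D=1] - E[Y^0 \mid D=1] + E[Y^0 \mid D=1] - E[Y^0 \mid D=0] \\
  &= \underbrace{E[Y^1 - Y^0 \mid D=1]}_{\text{ATT}}
   + \underbrace{E[Y^0 \mid D=1] - E[Y^0 \mid D=0]}_{\text{selection bias}}
\end{aligned}
$$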

However, I don't get how to go from Step 2 to Step 3.

Why is consistency invoked here?

And how do you go from a subtraction in Step 1 to an addition in Step 2?

