It's more or less along these lines, but there are more optimizations, that you are unlikely to see unless you keep submitting the problem.
I actually solved the equation X^N + Y^N = 1, reducing the problem to whether there are two N-th powers that sum to p+1. Finding two consecutive powers works just as fine.
As mentioned, since the map a -> a^n is essentially "random", we expect to find powers that satisfy with complexity O(sqrt(p)), as opposed to O(p). This map can also be efficiently computed by
exponentiation by squaring.
However, there's still a catch. It's not hard to prove (use the fact that the multiplicative group of a field of prime size has a primitive root) that the image of the endomorphism a -> a^n has size (p-1)/gcd(n,p-1).
Because of this, the judge can make it very unlikely for us to find a solution by sending an n which shares lots of common factors with p-1. That way, the set will be very small, and you'll have to scan a large portion of the residues to find a solution, if one exists at all.
The solution to this is simple. Keep track of the size of the set of n-th power residues. If, at any point, you reach (p-1)/gcd(n,p-1), give up and say it's impossible.
I found this solution pretty neat when I found it. There are two cases: (1) the image of the endomorphism is large and we are very likely to find a solution in O(sqrt(p)) because of "randomness", (2) the image of the endomorphism is small, and we are very likely to find all its elements in O(sqrt(p)), again because of randomness.
It should also be pointed out that a -> a^n is not "random" if n = 1 mod (p-1), since by Fermat's Little Theorem, it's just the identity map. But this case is trivial and we can simply write down a solution.
DM me if you want the exact code (I don't want to post it on the Internet so anyone can submit it and get the problem), but the idea is:
if p is 2, output impossible
if p is not two, but n = 1 mod (p-1), output 1 1 2
declare a map m
for all numbers k, from 1 to p-1:
compute a <- k^n
set m[a] = k
if (m[p-1-a] is not null) output m[a], m[p-1-a], 1 and break
if (m.size = (p-1)/gcd(n,p-1)), output impossible and break
The implementation of the map has to be very efficient, because of the tight time limit. If you're using C++, it's just knowing which functions to call to make it run fast.