fsel 2006-08-08 - By Michel Colman
I know PPC code is probably not very high on anybody's priority list right now, but this is really weird:
I was trying to write efficient code by doing things like "x >= 0 ? y : z", knowing that this translates into a single fsel instruction. However, the compiler produces this:
double fseltest1(double x, double y, double z)
{
    return x >= 0 ? y : z;
}

__Z9fseltest1ddd:
LFB1542:
    lis r2,ha16(LC0)
    lfd f0,lo16(LC0)(r2)
    fcmpu cr7,f1,f0
    cror 30,29,30
    beq cr7,L2
    fmr f2,f3
L2:
    fmr f1,f2
    blr
I thought I had missed a compiler option checkbox "allow blatantly obvious optimizations" somewhere, but then I tried a different comparison:
double fseltest2(double x, double y, double z)
{
    return x < 0 ? y : z;
}

__Z9fseltest2ddd:
LFB1543:
    fneg f0,f1
    fsel f1,f1,f3,f2
    fsel f1,f0,f1,f3
    blr
So the compiler DOES know how to use fsel! (It took me some time to realise that you can't just write "fsel f1, f1, f3, f2" because f1 can be a NaN)
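To make the NaN issue concrete, here is a rough sketch in C that models fsel in software. It assumes the usual semantics of "fsel fD,fA,fC,fB" (fD = fC if fA >= 0.0, otherwise fB, with a NaN in fA selecting fB). The names fsel_model, select_ge, select_lt_naive and select_lt_two_fsel are made up for this illustration and are not compiler output.

```c
#include <math.h>

/* Hypothetical software model of "fsel fD,fA,fC,fB":
   fD = (fA >= 0.0) ? fC : fB.
   A NaN in fA fails the comparison, so fB is selected. */
double fsel_model(double a, double c, double b)
{
    return a >= 0.0 ? c : b;
}

/* x >= 0 ? y : z -- a single fsel already gives the C result,
   because a NaN x falls through to z in both cases. */
double select_ge(double x, double y, double z)
{
    return fsel_model(x, y, z);           /* fsel f1,f1,f2,f3 */
}

/* x < 0 ? y : z -- a single fsel with swapped operands.
   Wrong for NaN: fsel falls through to y, but C requires z. */
double select_lt_naive(double x, double y, double z)
{
    return fsel_model(x, z, y);           /* fsel f1,f1,f3,f2 */
}

/* x < 0 ? y : z -- the three-instruction sequence from the listing.
   The second fsel tests -x, so a NaN x ends up selecting z. */
double select_lt_two_fsel(double x, double y, double z)
{
    double f0 = -x;                       /* fneg f0,f1       */
    double f1 = fsel_model(x, z, y);      /* fsel f1,f1,f3,f2 */
    return fsel_model(f0, f1, z);         /* fsel f1,f0,f1,f3 */
}
```

With x = NaN, select_lt_naive returns y while select_lt_two_fsel returns z, which is what "x < 0 ? y : z" has to give. The model says nothing about signaling-NaN exceptions, only about which value gets selected.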
Anyway, why is the second example optimized, while the first is not? I have read that fsel is not fully IEEE754-compliant because it doesn't cause an exception for signaling NaNs, but I would think a single extra dummy instruction with result in f0 would take care of that. (fneg f0, f1; fsel f1, f1, f2, f3)
So, why all the branching?
Michel