16 Jul 20:17
Re: Question about doloop_end pattern
From: Ramana Radhakrishnan <ramana.r <at> gmail.com>
Subject: Re: Question about doloop_end pattern
Newsgroups: gmane.comp.gcc.devel
Date: 2008-07-16 18:17:10 GMT
Hi Bingfeng,
> Hello,
> I tried to use the doloop_end pattern to reduce loop overhead for our target
> processor, which features a dedicated loop instruction. Somehow even a
> simple loop cannot pass the test in doloop_condition_get, which
> requires the following canonical pattern.
I checked this on our private port of GCC, which is based on the 4.3
branch and is what we are working on right now. We do use the
doloop pattern for these cases in our port, and I can confirm that
for our case we generate the following code. Our tree does have a
few other tweaks that we maintain and would like to contribute once
the copyright assignments are in place.
Unroll:
c2c $c5,$c2
i2cs $c4,63
.L2:
ldw $c2,($c5)+=1
add $c2,$c1,$c2
stw ($c3)+=1,$c2
brinzdec $c4,.L2
brz $zero,$link
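For readers who do not know this instruction set, the loop above can be modelled roughly in C as follows. This is only a sketch under stated assumptions: that brinzdec branches while the counter is nonzero and then decrements it, that "+=1" is a post-increment by one element, and that $c1, $c2 and $c3 hold s, b_inout and out on entry. The function name unroll_model is made up for illustration.

```c
/* Rough C model of the generated loop. Assumptions (not confirmed by the
   port's documentation): brinzdec branches if the counter is nonzero and
   then decrements it; "+=1" post-increments by one element. */
static void unroll_model(short s, const int *b_inout, int *out)
{
    const int *src = b_inout;   /* c2c  $c5,$c2 */
    int *dst = out;             /* $c3 on entry (assumed) */
    int count = 63;             /* i2cs $c4,63 */

    for (;;) {
        int v = *src++;         /* ldw  $c2,($c5)+=1 */
        v += s;                 /* add  $c2,$c1,$c2 */
        *dst++ = v;             /* stw  ($c3)+=1,$c2 */
        if (count == 0)         /* brinzdec $c4,.L2: fall through at zero */
            break;
        count--;                /* otherwise decrement and loop again */
    }
}
```

Under these assumptions, a counter initialised to 63 gives 64 executions of the body, which matches the trip count of the original loop.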
You probably want to look at the mt backend for an example of how to
do it. It looks similar to how we do it in ours.
cheers
Ramana
----
Ramana Radhakrishnan
Icera Semiconductor
On Wed, Jul 16, 2008 at 12:05 PM, Bingfeng Mei <bmei <at> broadcom.com> wrote:
> Hello,
> I tried to use the doloop_end pattern to reduce loop overhead for our target
> processor, which features a dedicated loop instruction. Somehow even a
> simple loop cannot pass the test in doloop_condition_get, which
> requires the following canonical pattern.
>
> /* The canonical doloop pattern we expect has one of the following
> forms:
>
> 1) (parallel [(set (pc) (if_then_else (condition)
>                                       (label_ref (label))
>                                       (pc)))
>               (set (reg) (plus (reg) (const_int -1)))
>               (additional clobbers and uses)])
>
> The branch must be the first entry of the parallel (also required
> by jump.c), and the second entry of the parallel must be a set of
> the loop counter register. Some targets (IA-64) wrap the set of
> the loop counter in an if_then_else too.
>
> 2) (set (reg) (plus (reg) (const_int -1))
>    (set (pc) (if_then_else (reg != 0)
>                            (label_ref (label))
>                            (pc))). */
>
>
> Here is the simple function I used; it should meet all doloop optimization
> requirements.
> void Unroll( short s, int * restrict b_inout, int *restrict out, int N)
> {
>   int i;
>   for (i=0; i<64; i++)
>   {
>     out[i] = b_inout[i] + s;
>   }
> }
>
>
> In the tree ivcanon pass, it is converted to:
> ;; Function Unroll (Unroll)
>
> Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> {
> unsigned int ivtmp.14;
> int pretmp.9;
> long unsigned int pretmp.8;
> int storetmp.6;
> int i;
> int D.1459;
> int D.1458;
> int D.1457;
> int * D.1456;
> int * D.1455;
> long unsigned int D.1454;
> long unsigned int D.1453;
>
> <bb 2>:
> pretmp.9_8 = (int) s_12(D);
>
> <bb 3>:
> # ivtmp.14_13 = PHI <ivtmp.14_21(4), 64(2)>
> # i_19 = PHI <i_15(4), 0(2)>
> D.1453_3 = (long unsigned int) i_19;
> D.1454_4 = D.1453_3 * 4;
> D.1455_6 = out_5(D) + D.1454_4;
> D.1456_10 = b_inout_9(D) + D.1454_4;
> D.1457_11 = *D.1456_10;
> D.1459_14 = pretmp.9_8 + D.1457_11;
> *D.1455_6 = D.1459_14;
> i_15 = i_19 + 1;
> ivtmp.14_21 = ivtmp.14_13 - 1;
> if (ivtmp.14_21 != 0)
> goto <bb 4>;
> else
> goto <bb 5>;
>
> <bb 4>:
> goto <bb 3>;
>
> <bb 5>:
> return;
>
> }
>
>
> This should match the requirements of doloop_condition_get. But after
> the ivopts pass, the code is transformed to:
>
> ;; Function Unroll (Unroll)
>
> Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> {
> long unsigned int ivtmp.21;
> unsigned int ivtmp.14;
> int pretmp.9;
> long unsigned int pretmp.8;
> int storetmp.6;
> int i;
> int D.1459;
> int D.1458;
> int D.1457;
> int * D.1456;
> int * D.1455;
> long unsigned int D.1454;
> long unsigned int D.1453;
>
> <bb 2>:
> pretmp.9_8 = (int) s_12(D);
>
> <bb 3>:
> # ivtmp.21_7 = PHI <ivtmp.21_16(4), 0(2)>
> D.1457_11 = MEM[base: b_inout_9(D), index: ivtmp.21_7];
> D.1459_14 = pretmp.9_8 + D.1457_11;
> MEM[base: out_5(D), index: ivtmp.21_7] = D.1459_14;
> ivtmp.21_16 = ivtmp.21_7 + 4;
> if (ivtmp.21_16 != 256)
> goto <bb 4>;
> else
> goto <bb 5>;
>
> <bb 4>:
> goto <bb 3>;
>
> <bb 5>:
> return;
>
> }
>
>
> This is no longer the required canonical form, and later RTL-level
> optimizations cannot convert it back. Since it doesn't pass the
> doloop_condition_get test, the modulo scheduling pass doesn't work either.
> Am I missing something here? Any hint is greatly appreciated.
>
> Cheers,
> Bingfeng Mei
>
>
>
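To make the difference between the two quoted dumps concrete, here is a small C sketch of just their exit tests. The function names are hypothetical and the loop bodies are reduced to a trip counter. It only illustrates that both forms iterate the same number of times while having different shapes; it does not model what GCC itself does.

```c
/* Exit test as left by ivcanon: a dedicated counter that runs down to 0.
   This has the "decrement, then compare with zero" shape. */
static unsigned trips_countdown(unsigned n)
{
    unsigned trips = 0;
    unsigned ivtmp = n;               /* ivtmp.14 starts at 64 */
    do {
        trips++;                      /* stands in for the loop body */
        ivtmp = ivtmp - 1;
    } while (ivtmp != 0);
    return trips;
}

/* Exit test as left by ivopts: the byte offset doubles as the counter,
   so the step is +4 and the comparison is against 256, not 0. */
static unsigned trips_offset(unsigned long bound, unsigned long step)
{
    unsigned trips = 0;
    unsigned long ivtmp = 0;          /* ivtmp.21 starts at 0 */
    do {
        trips++;                      /* stands in for the loop body */
        ivtmp = ivtmp + step;
    } while (ivtmp != bound);
    return trips;
}
```

With the values from the dumps, trips_countdown(64) and trips_offset(256, 4) both run the body 64 times, but only the first is written as a decrement by one followed by a comparison against zero.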