RE: Question about doloop_end pattern
From: Bingfeng Mei <bmei <at> broadcom.com>
Subject: RE: Question about doloop_end pattern
Newsgroups: gmane.comp.gcc.devel
Date: 2008-07-17 08:27:13 GMT
Thanks. I was looking at bfin. MT's implementation looks similar but
simpler.
> -----Original Message-----
> From: Ramana Radhakrishnan [mailto:ramana.r <at> gmail.com]
> Sent: 16 July 2008 19:17
> To: Bingfeng Mei
> Cc: gcc <at> gcc.gnu.org
> Subject: Re: Question about doloop_end pattern
>
> Hi Bingfeng,
>
> > Hello,
> > I tried to use the doloop_end pattern to reduce loop overhead for our
> > target processor, which features a dedicated loop instruction. Somehow
> > even a simple loop cannot pass the test in doloop_condition_get, which
> > requires the following canonical pattern.
>
>
> I checked this on our private port of GCC, which is based on the 4.3
> branch that we are working on right now. We do use the doloop pattern
> to generate these cases in our port, and I can confirm that in our
> case we generate the following code. Our tree does have a few other
> tweaks that we maintain and would like to contribute once the
> copyright assignments are in place.
>
> Unroll:
>         c2c      $c5,$c2
>         i2cs     $c4,63
> .L2:
>         ldw      $c2,($c5)+=1
>         add      $c2,$c1,$c2
>         stw      ($c3)+=1,$c2
>         brinzdec $c4,.L2
>         brz      $zero,$link
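The listing loads the counter with 63 (`i2cs $c4,63`) for a loop that must run 64 times, which suggests that `brinzdec` tests the counter before it decrements it. The C sketch below models that reading. The semantics of `brinzdec` here are an assumption inferred from the listing, not taken from any documentation of the port, and `brinzdec` and `trip_count` are hypothetical helper names.

```c
#include <stdbool.h>

/* Hypothetical model of the port's "brinzdec" instruction.  ASSUMED
   semantics (inferred from the listing, not from an ISA manual): if the
   counter is non-zero, decrement it and take the branch; otherwise fall
   through. */
static bool brinzdec(unsigned *counter)
{
    if (*counter == 0)
        return false;      /* fall through: the loop exits */
    *counter -= 1;         /* decrement and take the branch */
    return true;
}

/* Executes a loop shaped like the quoted one and returns how many times
   the body ran for a given initial counter value. */
static unsigned trip_count(unsigned init)
{
    unsigned c4 = init;    /* i2cs $c4,<init> */
    unsigned trips = 0;
    do {
        trips++;           /* stands in for the ldw / add / stw body */
    } while (brinzdec(&c4));
    return trips;
}
```

Under this assumption `trip_count(63)` is 64, matching the 64 iterations of the source loop: a counter that is tested before being decremented has to start one below the trip count.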
>
> You probably want to look at the mt backend for an example of how to
> do it. It looks similar to how we do it in ours.
>
>
> cheers
> Ramana
>
> ----
> Ramana Radhakrishnan
> Icera Semiconductor
>
> On Wed, Jul 16, 2008 at 12:05 PM, Bingfeng Mei
> <bmei <at> broadcom.com> wrote:
> > Hello,
> > I tried to use the doloop_end pattern to reduce loop overhead for our
> > target processor, which features a dedicated loop instruction. Somehow
> > even a simple loop cannot pass the test in doloop_condition_get, which
> > requires the following canonical pattern.
> >
> > /* The canonical doloop pattern we expect has one of the following
> >    forms:
> >
> >    1) (parallel [(set (pc) (if_then_else (condition)
> >                                          (label_ref (label))
> >                                          (pc)))
> >                  (set (reg) (plus (reg) (const_int -1)))
> >                  (additional clobbers and uses)])
> >
> >    The branch must be the first entry of the parallel (also required
> >    by jump.c), and the second entry of the parallel must be a set of
> >    the loop counter register. Some targets (IA-64) wrap the set of
> >    the loop counter in an if_then_else too.
> >
> >    2) (set (reg) (plus (reg) (const_int -1))
> >       (set (pc) (if_then_else (reg != 0)
> >                               (label_ref (label))
> >                               (pc))). */
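A minimal sketch of what "matching form 2" means, using a hypothetical two-instruction toy IR: `toy_insn`, `toy_kind` and `toy_is_doloop_form2` are illustrative names, not GCC internals. It compares shapes only and shows why a loop whose counter is stepped by +4 and compared against 256, like the ivopts output later in this message, does not fit form 2. The real doloop_condition_get works on RTL, also accepts form 1, and is considerably more detailed.

```c
#include <stdbool.h>

/* Toy instruction kinds; hypothetical and far simpler than GCC's RTL. */
enum toy_kind { TOY_ADD_IMM, TOY_BRANCH_NE_IMM };

struct toy_insn {
    enum toy_kind kind;
    int reg;    /* register operand (dest == src for TOY_ADD_IMM) */
    long imm;   /* addend, or the constant the branch compares against */
};

/* Accepts only the shape of form 2:
     (set (reg) (plus (reg) (const_int -1)))
     (set (pc) (if_then_else (reg != 0) (label_ref (label)) (pc)))
   i.e. a decrement by one of some register, followed by a branch that
   tests that same register against zero. */
static bool toy_is_doloop_form2(const struct toy_insn *dec,
                                const struct toy_insn *br)
{
    return dec->kind == TOY_ADD_IMM && dec->imm == -1
        && br->kind == TOY_BRANCH_NE_IMM && br->imm == 0
        && br->reg == dec->reg;
}
```

A pair such as `reg1 = reg1 + -1; if (reg1 != 0) goto L` is accepted, while `reg2 = reg2 + 4; if (reg2 != 256) goto L` is rejected, even though both describe a 64-iteration loop when started from 64 and 0 respectively.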
> >
> >
> > Here is a simple function I used; it should meet all the doloop
> > optimization requirements.
> > void Unroll (short s, int * restrict b_inout, int * restrict out, int N)
> > {
> >   int i;
> >   for (i = 0; i < 64; i++)
> >     {
> >       out[i] = b_inout[i] + s;
> >     }
> > }
> >
> >
> > In the tree ivcanon pass, it is converted to:
> > ;; Function Unroll (Unroll)
> >
> > Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> > {
> >   unsigned int ivtmp.14;
> >   int pretmp.9;
> >   long unsigned int pretmp.8;
> >   int storetmp.6;
> >   int i;
> >   int D.1459;
> >   int D.1458;
> >   int D.1457;
> >   int * D.1456;
> >   int * D.1455;
> >   long unsigned int D.1454;
> >   long unsigned int D.1453;
> >
> > <bb 2>:
> >   pretmp.9_8 = (int) s_12(D);
> >
> > <bb 3>:
> >   # ivtmp.14_13 = PHI <ivtmp.14_21(4), 64(2)>
> >   # i_19 = PHI <i_15(4), 0(2)>
> >   D.1453_3 = (long unsigned int) i_19;
> >   D.1454_4 = D.1453_3 * 4;
> >   D.1455_6 = out_5(D) + D.1454_4;
> >   D.1456_10 = b_inout_9(D) + D.1454_4;
> >   D.1457_11 = *D.1456_10;
> >   D.1459_14 = pretmp.9_8 + D.1457_11;
> >   *D.1455_6 = D.1459_14;
> >   i_15 = i_19 + 1;
> >   ivtmp.14_21 = ivtmp.14_13 - 1;
> >   if (ivtmp.14_21 != 0)
> >     goto <bb 4>;
> >   else
> >     goto <bb 5>;
> >
> > <bb 4>:
> >   goto <bb 3>;
> >
> > <bb 5>:
> >   return;
> >
> > }
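To make the ivcanon result easier to read, here is a hand translation of that GIMPLE back into C. It is a sketch for illustration only: `unroll_ivcanon` is a hypothetical name, and the unused `N` parameter is dropped. The point is the separate down-counter (`ivtmp.14` in the dump), which starts at 64 and is tested against zero right after being decremented. That is the decrement-then-test shape of form 2 in the doloop_condition_get comment.

```c
/* Hand translation of the post-ivcanon loop.  The original induction
   variable i still drives the addressing, while ivtmp (ivtmp.14 in the
   dump) only counts the remaining iterations. */
static void unroll_ivcanon(short s, const int *restrict b_inout,
                           int *restrict out)
{
    int pretmp = (int) s;          /* pretmp.9_8 */
    unsigned ivtmp = 64;           /* ivtmp.14: PHI <..., 64(2)> */
    int i = 0;                     /* i_19: PHI <..., 0(2)> */
    do {
        out[i] = pretmp + b_inout[i];
        i = i + 1;
        ivtmp = ivtmp - 1;         /* (set (reg) (plus (reg) (const_int -1))) */
    } while (ivtmp != 0);          /* (if_then_else (reg != 0) ...) */
}
```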
> >
> >
> > This should match the requirements of doloop_condition_get, but after
> > the ivopts pass the code is transformed to:
> >
> > ;; Function Unroll (Unroll)
> >
> > Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> > {
> >   long unsigned int ivtmp.21;
> >   unsigned int ivtmp.14;
> >   int pretmp.9;
> >   long unsigned int pretmp.8;
> >   int storetmp.6;
> >   int i;
> >   int D.1459;
> >   int D.1458;
> >   int D.1457;
> >   int * D.1456;
> >   int * D.1455;
> >   long unsigned int D.1454;
> >   long unsigned int D.1453;
> >
> > <bb 2>:
> >   pretmp.9_8 = (int) s_12(D);
> >
> > <bb 3>:
> >   # ivtmp.21_7 = PHI <ivtmp.21_16(4), 0(2)>
> >   D.1457_11 = MEM[base: b_inout_9(D), index: ivtmp.21_7];
> >   D.1459_14 = pretmp.9_8 + D.1457_11;
> >   MEM[base: out_5(D), index: ivtmp.21_7] = D.1459_14;
> >   ivtmp.21_16 = ivtmp.21_7 + 4;
> >   if (ivtmp.21_16 != 256)
> >     goto <bb 4>;
> >   else
> >     goto <bb 5>;
> >
> > <bb 4>:
> >   goto <bb 3>;
> >
> > <bb 5>:
> >   return;
> >
> > }
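After ivopts the only induction variable left is the byte offset `ivtmp.21`. It starts at 0, advances by 4 and is compared with 256. The sketch below renders that loop in C and adds a simplified trip-count helper. The helper shows that the iteration count, (256 - 0) / 4 = 64, can still be derived from this form even though no register counts down to zero. `unroll_ivopts` and `niter_ne` are hypothetical names. `niter_ne` assumes a non-zero step and a distance that is an exact multiple of it; it is not GCC's number-of-iterations analysis, which must also handle wrap-around and unknown bounds.

```c
#include <stddef.h>

/* Simplified trip count of a loop of the form
     for (iv = base; iv != bound; iv += step)
   ASSUMING step != 0 and (bound - base) is an exact multiple of step. */
static size_t niter_ne(size_t base, size_t bound, size_t step)
{
    return (bound - base) / step;
}

/* Hand translation of the post-ivopts loop: one byte offset (ivtmp.21)
   addresses both arrays, and the exit test compares it with the end
   offset instead of counting a register down to zero. */
static void unroll_ivopts(short s, const int *restrict b_inout,
                          int *restrict out)
{
    int pretmp = (int) s;                      /* pretmp.9_8 */
    size_t off = 0;                            /* ivtmp.21: byte offset */
    do {
        /* MEM[base: b_inout, index: off] */
        int v = *(const int *) ((const char *) b_inout + off);
        /* MEM[base: out, index: off] = pretmp + v */
        *(int *) ((char *) out + off) = pretmp + v;
        off += sizeof (int);                   /* ivtmp.21 + 4 */
    } while (off != 64 * sizeof (int));        /* != 256 for a 4-byte int */
}
```

Using `sizeof (int)` keeps the sketch correct on hosts where `int` is not 4 bytes; on the target in the dump it reduces to the literal 4 and 256.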
> >
> >
> > It is no longer in the required canonical form, and later RTL-level
> > optimizations cannot convert it back. Since it doesn't pass the
> > doloop_condition_get test, the modulo scheduling pass doesn't work
> > either. Am I missing something here? Any hint is greatly appreciated.
> >
> > Cheers,
> > Bingfeng Mei