Julius Thyssen | 2 May 2008 22:24

Stripping slack from all php, htm*, js, xml on a webserver

Hi,

I'm looking for a way to minify all the production-level
static code on some webservers, i.e. to get all the
wasted whitespace out of those files.

I found a couple of sed samples:

's/^[[:blank:]]\{1,\}//;s/[[:blank:]]\{1,\}$//'

's/ *|/|/g'

's/ *N/' <<< "N"

'/^$/d;s/^[ \t]*//;s/[ \t]*$//'

and more.
What I would like to do is strip all tabs (indents)
and spaces at the start of lines, all empty lines
(i.e. a newline straight after another one),
all trailing whitespace on code lines, and double spaces
in HTML (except inside PRE or TT elements);
in short, all the useless garbage.
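A minimal sketch of those wishes as one sed invocation, assuming GNU sed (the sed CentOS ships). It reads stdin and writes stdout; `strip_ws` is a hypothetical helper name. It does NOT honour the PRE/TT exception, and it will also squeeze spaces inside string literals or PHP heredocs, so treat it as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper: whitespace cleanup from stdin to stdout (GNU sed assumed).
# Caveat: <pre>/<tt> content, JS strings and PHP heredocs are squeezed too.
strip_ws() {
  sed -e 's/^[[:blank:]]*//' \
      -e 's/[[:blank:]]*$//' \
      -e '/^$/d' \
      -e 's/[[:blank:]]\{2,\}/ /g'
  # 1. strip leading spaces/tabs   2. strip trailing spaces/tabs
  # 3. delete lines that are now empty (this also catches whitespace-only lines)
  # 4. squeeze runs of two or more blanks into a single space
}
```

Usage: `strip_ws < page.html > page.min.html`. Stripping first and deleting `^$` afterwards means lines that held only tabs or spaces are removed as well.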

I also like this idea:
http://sed.sourceforge.net/grabbag/scripts/strip_html_comments.sed

but I'm not really sure how to apply all this.
I have installed sed on a CentOS 4 server, but
how do I make it work on an entire web document root
(including subdirectories)?
(Assuming I have a backup of the whole thing, of course.)
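One way to do the recursive part is a sketch that pairs `find` with `xargs` and GNU `sed -i` (in-place editing, a GNU extension). The function name `clean_tree` is hypothetical, and the file patterns simply mirror the ones in the subject line. It edits files in place, so run it only against a copy or after a verified backup.

```shell
#!/bin/sh
# Hypothetical helper: clean_tree DIR
# Recursively edits *.php, *.htm*, *.js and *.xml files IN PLACE (GNU sed/xargs assumed).
clean_tree() {
  root=$1
  # -print0 / -0 keep file names with spaces intact; -r skips sed if nothing matched
  find "$root" -type f \
       \( -name '*.php' -o -name '*.htm*' -o -name '*.js' -o -name '*.xml' \) -print0 |
    xargs -0 -r sed -i \
      -e 's/^[[:blank:]]*//' \
      -e 's/[[:blank:]]*$//' \
      -e '/^$/d'
}
```

Usage: `clean_tree /var/www/html` (path assumed). `-print0`/`xargs -0` is used rather than a plain loop so odd file names survive. The double-space squeeze is left out here because it is the riskiest edit for PRE/TT blocks and string literals.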

If any of you have an example shell command for this,
that would really help me on my way.

How do I best combine my wishes into one script?
Can I just batch them one after another?
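On batching: sed runs every command in a script top to bottom on each line, so the separate samples can simply be listed one per line in a script file and applied in a single pass with `-f`. A sketch, where the file name `tidy.sed` is hypothetical:

```shell
#!/bin/sh
# Write a combined sed script (hypothetical name tidy.sed) to the current directory.
# Lines starting with '#' are comments in a sed script.
cat > tidy.sed <<'EOF'
# strip leading blanks
s/^[[:blank:]]*//
# strip trailing blanks
s/[[:blank:]]*$//
# drop lines that are now empty
/^$/d
EOF
```

Usage: `sed -f tidy.sed page.html > page.min.html`. Scripts that do their own multi-line buffering, such as the comment stripper linked above (assuming it is saved locally as `strip_html_comments.sed`), are easier to keep as a separate pass: `sed -f strip_html_comments.sed page.html | sed -f tidy.sed > page.min.html`. Removing comments first lets the second pass delete any lines they leave empty.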

Perhaps some of you already have something that does this?
I used to have something like this in Perl with regular
expressions, but I hear sed is better for it.

Thanks in advance if you find the time to read and respond.
Meanwhile I'll read some more sed docs to see if I can get
this idea off the ground.

Julius
