I need a sed ninja

Everything Unix Technical

Started by BeanDog December 04, 2007 04:23 PM

4 comments, last by BeanDog 17 years, 1 month ago

BeanDog

Author

1,065

December 04, 2007 04:23 PM

I have a large text file that's the output of mysqldump. This means that the output looks like this:


-- This is a comment
-- another comment.

USE `dbname`;

-- More comments here

DROP TABLE IF EXISTS `achentry`;
CREATE TABLE `achentry` (
  `EntryId` int(11) NOT NULL auto_increment,
  `BatchId` int(11) NOT NULL,
  `Info` blob NOT NULL,
  `Created` datetime NOT NULL,
  `Amount` decimal(50,2) NOT NULL,
  PRIMARY KEY  (`EntryId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

My goal is to get each SQL statement onto a single line. As far as I can tell, comments never appear on a line with any actual SQL code, so we don't have to worry about stripping trailing comments. Essentially, I see this working like this: 1) Get rid of all comments 2) Remove the newline at the end of all lines that don't end in a semicolon Step 1 is easy: cat dump.sql | sed 's/--.*//' > dump2.sql Step 2 is tough. sed operates on single lines--how would I make it strip out the newline at the end of lines not ending in semicolons? If it matters, I'm using bash on CentOS5 (essentially RHEL5).

Sneftel

1,788

December 04, 2007 04:58 PM

Use tr to delete newlines, then use sed to change all semicolons to semicolon-newlines.

ToohrVyk

1,596

December 04, 2007 05:07 PM

sed -n -e '/--/d; H; $ { g; s/\n//g; s/ \+/ /g; s/;/;\n/g; p }' dump.sql > result.sql

sed does not really operate on individual lines. It merely adds text to the pattern space one line at a time, and flushes the pattern space in-between lines. If you wish to work on several lines at once, just append these lines to the work space using the H command. The above source does, for every line inside the input file:

/--/d       # if a line contains '--' stop processing itH           # append the line to the work buffer, separated by newline$ {         # only do this stuff when done reading all input  g         # load the work space into the pattern space  s/\n//g   # remove all newlines in the pattern space  s/ \+/ /g # remove all redundant whitespace in the pattern space  s/;/;\n/g # add a newline after every semicolon in pattern space  p         # print pattern space to output}

The -n argument specifies that the only script output shall come from the 'p' instruction inside, instead of echoing the pattern space after each line read.

Blog — Facebook

BeanDog

Author

1,065

December 04, 2007 05:09 PM

Quote: Original post by ToohrVyk

sed -n -e '/--/d; H; $ { g; s/\n//g; s/ \+/ /g; s/;/;\n/g; p }' dump.sql > result.sql

Thanks, that worked perfectly! Rating++

Sander

1,332

December 04, 2007 06:33 PM

Out of curiosity: Why do you need them to be on a single line?

<hr />
Sander Marechal<small>[Lone Wolves][Hearts for GNOME][E-mail][Forum FAQ]</small>

BeanDog

Author

1,065

December 05, 2007 03:12 PM

I'm playing with uni/cluster, a middleware clustering solution for MySQL. Its command-line client to interface with the MySQL cluster (the equivalent of the mysql CLI) will run each line of the file as a query if the file is piped into the CLI (unlike the regular mysql CLI).

I need a sed ninja

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

I need a sed ninja

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines