grep '<\+' {training-data}
grep '\+>' {training-data}
preprocessForParser2 {training-data}
cat {training-data} | tr -s '\011' '\012' | tr -s ' ' '\012' | grep '(' | sort | uniq > TMP
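For readers without the Unix tools at hand, the tokenizing pipeline above can be approximated in Python. This is a minimal sketch, not part of the original workflow: the function name paren_tokens is hypothetical, and Python sorts by code point, which may differ from the locale-dependent order of sort.

```python
def paren_tokens(text):
    """Return the sorted, de-duplicated whitespace-separated tokens containing '('."""
    # str.split() with no argument splits on runs of whitespace,
    # approximating the two `tr -s` steps (tabs and spaces to newlines).
    tokens = text.split()
    # Keep only tokens containing an opening paren, as `grep '('` does,
    # then de-duplicate and sort, as `sort | uniq` does.
    return sorted({tok for tok in tokens if "(" in tok})


if __name__ == "__main__":
    sample = "( (IP-MAT (NP-SBJ (PRO I))\t(VBP see) (NP-OB1 (PRO it))))"
    for tok in paren_tokens(sample):
        print(tok)
```

Since each surviving token starts at an opening paren, the result is in effect an inventory of the labels that occur in the training data.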
set TO = /home1/b/beatrice/parser-results/better/corpus-data/1
cd $TO
scp beatrice@babel.ling.upenn.edu:/home/migration/other/MIDENG/PPCMBE/parsingExperiments/training-plain.psd .
scp beatrice@babel.ling.upenn.edu:/home/migration/other/MIDENG/PPCMBE/parsingExperiments/testing-plain.psd .
scp beatrice@babel.ling.upenn.edu:/home/migration/other/MIDENG/PPCMBE/parsingExperiments/eval-plain.psd .
mkdir train test eval
cd /home1/b/beatrice/parser-config
The command expects input with wrapper parens.
The file Makefile09 referenced below is analogous to Makefile, but uses the old 0.9.9c-modified version of the parser.
make -f Makefile09 \
DOMAIN=better \
MODEL=1 \
INFILE=/home1/b/beatrice/parser-results/better/corpus-data/1/trainingData-0324.psd \
OUTDIR=/home1/b/beatrice/parser-results/better/corpus-data/1/train/ \
preparedata
make -f Makefile09 \
DOMAIN=better \
MODEL=1 \
INFILE=/home1/b/beatrice/parser-results/better/corpus-data/1/testingData-0324.psd \
OUTDIR=/home1/b/beatrice/parser-results/better/corpus-data/1/test/ \
preparedata
make -f Makefile09 \
DOMAIN=better \
MODEL=1 \
FROM=/home1/b/beatrice/parser-results/better/corpus-data/1/test/trees.txt \
maketest
python ./parser-config/split-file.py \
/home1/b/beatrice/parser-results/better/corpus-data/1/test/ \
trees.txt.preparse \
10
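The script split-file.py itself is not reproduced in these notes. As a rough illustration only, the sketch below splits a file into a given number of parts named FILE.part1 through FILE.partN, matching the part names used later (e.g. trees.txt.preparse.part1). It assumes one sentence per line and contiguous, evenly sized chunks; the real script may divide the file differently. The function name split_file is hypothetical.

```python
import os


def split_file(directory, filename, n_parts):
    """Split directory/filename into n_parts contiguous chunks of lines.

    Writes filename.part1 .. filename.partN next to the input and returns
    the list of paths written. Assumes one sentence per line (an assumption,
    not something the original notes state).
    """
    src = os.path.join(directory, filename)
    with open(src, encoding="utf-8") as handle:
        lines = handle.readlines()

    # Spread the lines as evenly as possible: the first `extra` parts
    # receive one line more than the rest.
    base, extra = divmod(len(lines), n_parts)
    written = []
    start = 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        part_path = f"{src}.part{i + 1}"
        with open(part_path, "w", encoding="utf-8") as out:
            out.writelines(lines[start:start + size])
        written.append(part_path)
        start += size
    return written
```

Called as split_file("/path/to/test/", "trees.txt.preparse", 10), this mirrors the three arguments passed to split-file.py above.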
source ~sgeadmin/sge/nlp/common/settings.csh
DOMAIN is the directory where the files are set up (here, ~/parser-results/better).
MODEL is the subdirectory of DOMAIN indicating the model number (here, 1).
Be sure that the MODEL directory has a subdirectory named model
(e.g. parser-results/better/1/model).
BETTER_CORPUS_LOC is the subdirectory of DOMAIN with the training and testing data (here, .../better/corpus-data).
ifeq ($(DOMAIN)$(MODEL),better1)
TRAINING_FILE=${BETTER_CORPUS_LOC}/1/train/trees.txt
TRAINING_SETTINGS_FILE=${DBPARSER_MYEXT_LOC}/settings/ppceme3R1.properties
endif
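The conditional above keys on the concatenation of DOMAIN and MODEL, so DOMAIN=better with MODEL=1 yields the string better1. The same lookup can be sketched in Python to make the selection logic explicit. The dictionary and function names are hypothetical, and the ${...} placeholders are left unexpanded, as in the Makefile fragment.

```python
# Hypothetical mirror of the Makefile09 conditional: one entry per
# DOMAIN+MODEL key, with the make variables left unexpanded.
TRAINING_CONFIG = {
    "better1": {
        "TRAINING_FILE": "${BETTER_CORPUS_LOC}/1/train/trees.txt",
        "TRAINING_SETTINGS_FILE": "${DBPARSER_MYEXT_LOC}/settings/ppceme3R1.properties",
    },
}


def training_config(domain, model):
    """Return the settings selected by ifeq ($(DOMAIN)$(MODEL),...)."""
    key = f"{domain}{model}"  # e.g. "better" + "1" -> "better1"
    if key not in TRAINING_CONFIG:
        # Unlike make, which would silently leave the variables unset,
        # this sketch fails loudly when no entry matches.
        raise KeyError(f"no Makefile09 entry for DOMAIN={domain} MODEL={model}")
    return TRAINING_CONFIG[key]
```

Raising an error on a missing key is a choice made for this sketch; it surfaces a forgotten Makefile09 entry before a training job is submitted.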
make -n -f Makefile09 DOMAIN=better MODEL=1 train2
./submittrain2better.sh better 1
/mnt/castor/seas_home/b/beatrice/parser-config> qstat
job-ID  prior    name        user      state  submit/start at      queue                           slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6227    0.55500  run_job.sh  yugu      r      03/30/2011 11:28:51  default@nlpgrid01.seas.upenn.e  1
6230    0.00000  model1-bet  beatrice  qw     04/01/2011 11:26:12                                  1
ifeq ($(DOMAIN)$(RUN),better1)
MODEL=1
TEST_FILE_ORIG=${BETTER_CORPUS_LOC}/1/test/trees.txt.preparse
TESTING_SETTINGS_FILE=${DBPARSER_MYEXT_LOC}/settings/ppceme3R2.properties
GOLD_FILE=$(BETTER_CORPUS_LOC)/1/test/trees.txt
endif
cd $BETTER/0
mkdir runm0-4R2
cd runm0-4R2
cp $BETTER/corpus-data/0/test/trees.txt.preparse .
or (with split input files)
cp $BETTER/corpus-data/0/test/*preparse.part* .
make -n -f Makefile09 \
DOMAIN=better \
MODEL=1 \
RUN=2 \
TEST_FILE_BASE=trees.txt.preparse.part1 \
testnocopy
submit-all-better.sh
./submittestbetter.sh better 1 2 trees.txt.preparse.part1
make -f Makefile09 DOMAIN=better MODEL=1 RUN=2 \
TEST_FILE_BASE=trees.txt.preparse.part1 testnocopy
make -f Makefile09 DOMAIN=better MODEL=1 RUN=2 \
TEST_FILE_BASE=trees.txt.preparse.part2 testnocopy
...
make -f Makefile09 DOMAIN=better MODEL=1 RUN=2 \
TEST_FILE_BASE=trees.txt.preparse.part10 testnocopy
Commands like those above can of course be collected in a shell script.
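As an alternative to writing the script by hand, the ten command lines can be generated. The sketch below only builds and prints the commands shown above; it does not run them. The function name build_test_commands and its default arguments are illustrative assumptions taken from the example values in these notes.

```python
def build_test_commands(domain="better", model=1, run=2, n_parts=10,
                        base="trees.txt.preparse"):
    """Build one `make ... testnocopy` command line per split input file."""
    commands = []
    for i in range(1, n_parts + 1):
        commands.append(
            f"make -f Makefile09 DOMAIN={domain} MODEL={model} RUN={run} "
            f"TEST_FILE_BASE={base}.part{i} testnocopy"
        )
    return commands


if __name__ == "__main__":
    # Print the commands; redirect the output to a file to obtain a script.
    for command in build_test_commands():
        print(command)
```

Printing rather than executing keeps the sketch safe to run anywhere and leaves the decision of how to submit the jobs to the user.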
job-ID  prior    name  user     state  submit/start at      queue                           slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6219    0.55500  run2  skulick  r      03/27/2011 16:38:36  default@nlpgrid05.seas.upenn.e  1
make -f Makefile09 DOMAIN=better RUN=1 \
INDIR=/mnt/castor/seas_home/b/beatrice/parser-results/better/1/run1 \
conversion4
So: if the actual file for evaluation is
DOMAIN/MODEL/RUN/anotherName.parsed
the Makefile09 entry will be
DOMAIN/MODEL/test/anotherName
Example:
berkeley/0/runm0/enhancedtest.parsed
is referenced as
berkeley/0/test/enhancedtest
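The naming convention just described can be expressed as a small path transformation. This is a sketch under the assumption that the parsed file always sits exactly three levels deep (DOMAIN/MODEL/RUN) and ends in .parsed; the function name makefile_entry is hypothetical.

```python
from pathlib import PurePosixPath


def makefile_entry(parsed_path):
    """Map DOMAIN/MODEL/RUN/name.parsed to DOMAIN/MODEL/test/name."""
    path = PurePosixPath(parsed_path)
    if path.suffix != ".parsed":
        raise ValueError(f"expected a .parsed file, got {parsed_path!r}")
    # path.parent is DOMAIN/MODEL/RUN; its parent is DOMAIN/MODEL.
    model_dir = path.parent.parent
    # path.stem drops only the final ".parsed" extension.
    return str(model_dir / "test" / path.stem)


if __name__ == "__main__":
    print(makefile_entry("berkeley/0/runm0/enhancedtest.parsed"))
```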
gold file = DOMAIN/MODEL/test/someName
Example: ~/berkeley/0/test/trees.txt
grep ' [A-Z]\+[ )]' {parsed-file} | grep -v ' [AIO])'
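The two-stage grep above can be mirrored in Python, which may help when the filter needs to be embedded in a larger script. This is a sketch of the same line-level logic, with hypothetical names: a line is kept if it contains a space, one or more uppercase letters, and then a space or closing paren, and it is dropped again if it contains a space followed by A, I, or O and a closing paren anywhere. Python's [A-Z] is strictly ASCII, whereas grep's range can depend on the locale.

```python
import re

# Space, one or more uppercase letters, then a space or closing paren.
CANDIDATE = re.compile(r" [A-Z]+[ )]")
# Space, a single A, I, or O, then a closing paren (the `grep -v` step).
EXCLUDED = re.compile(r" [AIO]\)")


def matching_lines(lines):
    """Return the lines that pass the first grep and survive the second."""
    return [
        line for line in lines
        if CANDIDATE.search(line) and not EXCLUDED.search(line)
    ]
```

As with the original pipeline, the exclusion applies to the whole line, so a line containing both " I)" and another match is dropped.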
Bikel: null → ( (CODE NULL))
Berkeley: (()) → ( (CODE NULL))
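The two replacements above can be applied in one pass. The sketch below assumes that the parser output has one parse per line and that a failed parse appears as the marker alone on its line; neither assumption is stated in these notes, and the names used are hypothetical.

```python
# Failure markers listed above: Bikel emits "null", Berkeley emits "(())".
FAILED_PARSE_MARKERS = {"null", "(())"}
PLACEHOLDER = "( (CODE NULL))"


def replace_failed_parses(lines):
    """Replace lines consisting solely of a failure marker with the placeholder.

    Trailing newlines are stripped from every returned line.
    """
    result = []
    for line in lines:
        if line.strip() in FAILED_PARSE_MARKERS:
            result.append(PLACEHOLDER)
        else:
            result.append(line.rstrip("\n"))
    return result
```

Matching the whole stripped line, rather than searching for the marker as a substring, avoids rewriting a sentence that merely contains the word null.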
~/ut/add-wrapper-parens {parsed-file} > {userRedirectsOutput}
CS3 ~/queries/raisePunc.q {parsed-file}
CS3 ~/queries/raisePunc.q {parsed-file}.out
CS3 ~/queries/raisePunc.q {parsed-file}.out.out
and so on
~/ut/rm-wrapper-parens {parsed-file} > {userRedirectsOutput}
make -f Makefile09 DOMAIN=better RUN=2 eval
make -f Makefile09 DOMAIN=better RUN=2 parsedebug
CS3 ~/queries/reformat.q *.psd
CS3 expects its input files to have wrapper parens, which you may have to add.
~/ut/add-wrapper-parens {parsed-file} > {userRedirectsOutput}
The output of the reformatting query has the same name as its input,
but with a .fmt extension.
rm-id.prl expects wrapper parens around tokens and blank lines
between tokens.
~/ut/rm-id.prl {parsed-file} > {userRedirectsOutput}
~/ut/re-ref *.psd
re-ref expects wrapper parens around tokens and blank lines between tokens.
The output of the script has the same name as its input, but with a .with.id extension.