Multimodal Summarization Demo - IARPA Research, 2018, Columbia University
Conversational Abstractive/Extractive Summarization - Vishal Anand, Jessica Ouyang, Kathy McKeown
Demo inputs (as presented in the interface): the text to be summarized, the summary method (injection), and the amplifier input scale.
Morphemes injected
safe come fine way reached just time faring yes mm maybe mmhm days home good

Summary output
work is work that has become too much maybe if we get a chance to come visit you . mm and how are yours ? ohh hear these days where do you work ? mm .



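The demo's morpheme injection and amplifier scale can be sketched roughly as follows. This is an illustrative assumption, not the demo's actual code: it models injection as amplifying the embedding rows of the injected morpheme tokens by the chosen scale factor. The function name `inject_morphemes` and the toy vocabulary are hypothetical.

```python
def inject_morphemes(embeddings, vocab, morphemes, scale):
    # ASSUMPTION: injection modeled as scaling each injected morpheme's
    # embedding vector by the amplifier scale; other rows are unchanged.
    out = [row[:] for row in embeddings]  # copy the embedding table
    for tok in morphemes:
        idx = vocab.get(tok)
        if idx is not None:
            out[idx] = [scale * x for x in out[idx]]
    return out

# Toy vocabulary and embeddings for illustration only.
vocab = {"safe": 0, "home": 1, "work": 2}
emb = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
amplified = inject_morphemes(emb, vocab, ["safe", "home"], scale=3.0)
print(amplified[0])  # [3.0, 3.0]
```

In the demo, the injected morphemes (e.g. "safe", "home") would be the tokens listed under "Morphemes injected", and the scale would come from the amplifier input selector.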
Code base: details on using the system in a non-GPU setup with acceleration are given in the following section. Usage is a single line (see below for details):
./conversational.sh '2 3 4 5 6 7 8 9 10 11 12'
The Mechanical Turk qualitative analysis for the current system uses the parallel dataset at: https://research.google.com/audioset/download.html#split
The results have also been validated on another parallel dataset, SQuAD (Stanford Question Answering Dataset), where an in-house re-implementation of a bidirectional LSTM system (BiDAF) was tested with and without morpheme injection; the results are summarized below. The scores count exact matches in extractive summaries, but questions with a "why" component require abstractive reasoning to produce an exact match. Since exact matches on "why" questions rose by 14.28%, morpheme injection evidently helps the global module learn to better prioritize relevant details (the question's entities). The morpheme-infusion approach therefore chiefly improves abstractive and abstractive-based extractive summarization, as also exemplified in the qualitative experiments above (the live example).
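The SQuAD comparison above rests on exact-match (EM) scoring. As an illustrative sketch (not the project's evaluation code; `exact_match_rate` and `percent_increase` are hypothetical helpers), standard SQuAD-style answer normalization and the percent-increase arithmetic look like this:

```python
import re
import string

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation,
    drop the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match_rate(predictions, references):
    # Fraction of predictions that exactly match their reference after normalization.
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

def percent_increase(baseline, injected):
    # Relative improvement of the injected run over the baseline, in percent.
    return 100.0 * (injected - baseline) / baseline

# Toy illustration: an EM rate rising from 0.70 to 0.80 is a 14.29% relative increase.
print(round(percent_increase(0.70, 0.80), 2))  # 14.29
```

The 0.70/0.80 figures are invented for illustration; the reported 14.28% refers to the increase on "why" questions only.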
Usage script (invoke_shell_script.sh):
#!/usr/bin/env bash
set -x

python code/pre_batch.py \
  -glove ../GoogleNews-vectors-negative300.bin \
  -query ./questions/query.txt \
  -preparse input_fed.txt \
  -features 1000 \
  -topics 100 \
  -words 15 \
  -word_tf_input 65
  # -verbose

python code/make_datafiles.py \
  -stories ./input_fed.txt \
  -tokenized ./data_file/tokenized \
  -package ./data_file/finished_files

# SCALE=1
SUMMARIZATION_DIR="summarization_dir"
mkdir -p ${SUMMARIZATION_DIR}
DECODED_DIR="./pretrained_model/decode_test_400maxenc_4beam_35mindec_100maxdec_ckpt-238410"
SCALE_ARR=$1

for SCALE in ${SCALE_ARR[@]}
do
  echo $SCALE
  SUMMARY="./$SUMMARIZATION_DIR/summarization_beam_search_$SCALE.txt"
  SUMMARY_LOG="./$SUMMARIZATION_DIR/log_summarization_beam_search_$SCALE.txt"
  rm -rf ${DECODED_DIR}
  python code/run_summarization.py \
    --mode=decode \
    --data_path=./data_file/finished_files/chunked/test_* \
    --vocab_path=./data_file/finished_files/vocab \
    --log_root=. \
    --exp_name=pretrained_model \
    --max_enc_steps=400 \
    --max_dec_steps=100 \
    --coverage=1 \
    --single_pass=1 \
    --scale=$SCALE \
    > ${SUMMARY_LOG}
  python code/parse_decoding.py \
    -decode ${DECODED_DIR}/decoded/ \
    -summary ${SUMMARY}
  sort $SUMMARY > "./$SUMMARIZATION_DIR/summary_sort_$SCALE.csv"
done
set +x
README for the accelerated (non-GPU) hyperparameter beam search:
$ `virtualenv env`
$ `source env/bin/activate`
$ `pip install -r requirements.txt`
$ `chmod +x ./conversational.sh`
$ `./conversational.sh '2 3 4 5 6 7 8 9 10 11 12'` `# ./conversational.sh '<list of scaling factors to be evaluated>'`
Vishal Anand

Department of Computer Science, Columbia University

Jessica Ouyang

Department of Computer Science, Columbia University

Prof. Kathy McKeown

Department of Computer Science, Columbia University