CS 533
Rutgers - CS 533
  • 1 Page geniafix2
    Geniafix2

    School: Rutgers

    # Refinder3.txt # to change field divider from / to _ in Genia 3.0 POS file # and add _SYM tag to = open(INPUT,"GENIA3_0_pos.txt"); open(OUTPUT,">genia30pos4.txt"); @lines = <INPUT>; $n=0; while ($lines[$n]) { $_ = $lines[$n]; s/(.*)

  • 72 Pages Genia30counts2
    Genia30counts2

    School: Rutgers

    4) 323 ] 7, 8, 38 : 10 79 Both 221 Neither 492 both 787 either 1243 neither 48 AND 1291 or 5 +/6+ 1362 plus 49 AND 1292 or 1851 x 80 Both 493 both 222 Neither 494 both 788 either 50 AND 9 -2 1793 two13 1 19 2 25 3 27 4 35 5 167 I 18 247464 28 4 37 9.3 127

  • 4 Pages geniafix4
    Geniafix4

    School: Rutgers

    # Refinder5.txt # to clean up illegal tags in genia30pos1.txt > genia30pos2.txt open(INPUT,"genia30pos1.txt"); @lines = <INPUT>; close(INPUT); open(OUTPUT,">genia30pos2.txt"); $n=0; while ($lines[$n]) { $_ = $lines[$n]; if (/.*\_\-$/) { chop; pr

  • 2 Pages geniaTBL2
    GeniaTBL2

    School: Rutgers

    if (!open(INPUT,"GeniaOut1.txt")) {goto end;} @textlines=<INPUT>; close (INPUT); print @textlines[0,1,2,3]; $nx=scalar(@textlines); print "Number of text lines = $nx\n"; # # split input lines into tag and dtag # for ($j=0;$j<$nx;$j++) { $tag[$j]=(spli

  • 3 Pages geniaTBL1
    GeniaTBL1

    School: Rutgers

    if (!open(INPUT,"GeniaOut1.txt")) {goto end;} @textlines=<INPUT>; close (INPUT); print @textlines[0,1,2,3]; $nx=scalar(@textlines); print "Number of text lines = $nx\n"; # # split input lines into tag and dtag # for ($j=0;$j<$nx;$j++) { $tag[$j]=(spli

  • 16 Pages CS533pres
    CS533pres

    School: Rutgers

    BioNLP Tools from the GENIA Corpora Mark Sharp & Lu Liu CS533 Term Project Spring 2003 Prof. Stone 1 BioNLP NLP in the biomedical domain Unique lexicon; highly technical terminology; changes rapidly; but simple syntax (~100% declarative) Text only, no sp

  • 132 Pages Genia30counts1
    Genia30counts1

    School: Rutgers

    7+ 11 18 20-epi 39 9-cis 55 AP-2 61 A 81 B 117 E. 130 ERK1 131 ERK2 132 ER 142 FCS 148 G. 150 GAS-motif 154 GR 174 Hence 192 JAK/STAT 203 LPS 205 LTR 210 M. 212 M. 225 Murine 233 NFAT 239 NXS 243 Northern 244 Northern 251 P. 252 PBL 280 S. 297 Southern 34

  • 17 Pages CS533proposal
    CS533proposal

    School: Rutgers

    CS533 TERM PROJECT PROPOSAL Mark Sharp & Lu Liu Spring 2003 We want to use the GENIA bioNLP corpora (http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/) to train a tagger and see how well it tags an arbitrary biomedical text, say, a set of Medline abstracts. The

  • 4 Pages genia30badtags
    Genia30badtags

    School: Rutgers

    GENIA 3.0 POS BAD TAGS AND CORRECTIONS (N=193) C:\PERL\MARK>perl refinder5.txt LINE# BAD TAG CORRECTION -1202 ER_NN. ER_NN 25466 A_NN. A_NN 27556 transcription_NN. transcription_NN 42300 B_NN. B_NN 50755 granulocyte-macrophage_JJ! granulocyte-macrophage_

  • 1 Page geniafix3
    Geniafix3

    School: Rutgers

    # Refinder4.txt # to print records with .*_.*\/.* or .*_.*_.* in genia30pos4.txt open(INPUT,"genia30pos4.txt"); @lines = <INPUT>; $n=0; while ($lines[$n]) { $_ = $lines[$n]; if (/.*_.*\/.*/) { print "$_"; }

  • 1 Page geniafix1
    Geniafix1

    School: Rutgers

    # Refinder2.txt # to change field divider from / to _ in Genia 3.0 POS file # and add _SYM tag to = open(INPUT,"GENIA3_0_pos.txt"); open(OUTPUT,">genia30pos2.txt"); open(OUTPUT2,">genia30pos3.txt"); @lines = <INPUT>; $n=0; while ($lines[$n]) {

  • 9 Pages CS533final
    CS533final

    School: Rutgers

    CS533 TERM PROJECT Mark Sharp & Lu Liu msharp@scils.rutgers.edu luliu@scils.rutgers.edu Spring 2003 GENIA is an information extraction project targeted to the biomedical domain. The project has made available to the BioNLP community a variety of resources
