BIOC*2580 Lecture 8. Determining the amino acid sequence of a protein.
 
Synopsis: In practice, proteins need to be hydrolyzed into shorter peptides for sequencing. Selective hydrolysis of the polypeptide chain by proteases or with chemicals cuts very long polypeptides into specific fragments of more manageable sizes. Using tandem mass spectrometry, proteins can be sequenced and their identity determined by searching protein databases with search tools like BLAST.

Reading: Lehninger p. 82‐84, 92‐100 (4th ed p. 88‐89, 96‐101).
 
Selective Hydrolysis
 
Selective hydrolysis of polypeptides allows a long polypeptide to be cut at specific locations, to give shorter oligopeptides. If the oligopeptides are no longer than 20‐30 amino acids, their sequences can be determined by Edman's method or mass spectrometry.
 
 Proteases
 
Selective hydrolysis can be achieved with the help of digestive enzymes called proteases. Enzymes are proteins that catalyze a specific reaction, in this case, hydrolysis of the targeted peptide bond. See Lehninger p. 96‐97 and Fig 3‐27 (4th ed p. 99‐100 and Fig. 3‐27).
 
Trypsin is an enzyme that binds a polypeptide and cuts the peptide bond on the carboxyl side of its target residues, Arg or Lys.
 
Chymotrypsin cuts the polypeptide on the carboxyl side of Phe, Tyr or Trp.

 
In both cases, if the next amino acid after the target is proline, the polypeptide fails to bind to the enzyme and can't be cut at that point. Proline has an unusual shape because its side chain is bonded to the alpha amino N.

 
 
e.g. for trypsin:
Gly‐‐‐‐‐‐Lys‐X‐‐‐‐‐‐‐Arg‐Y‐‐‐‐‐‐Lys‐Pro‐‐‐‐‐‐Asn

Gly‐‐‐‐‐‐Lys + X‐‐‐‐‐‐‐Arg + Y‐‐‐‐‐‐Lys‐Pro‐‐‐‐‐‐Asn
 
 
 
and for chymotrypsin:
Gly‐‐‐‐‐‐Phe‐X‐‐‐‐‐‐‐Trp‐Y‐‐‐‐‐‐Phe‐Pro‐‐‐‐‐‐Asn

Gly‐‐‐‐‐‐Phe + X‐‐‐‐‐‐‐Trp + Y‐‐‐‐‐‐Phe‐Pro‐‐‐‐‐‐Asn
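
The cutting rules for the two proteases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `digest`, the constants `TRYPSIN` and `CHYMOTRYPSIN`, and the example sequences are hypothetical, and the only rules modelled are the ones given in these notes (cut after the target residue unless the next residue is proline). Sequences are written in one-letter code, N-terminus first.

```python
TRYPSIN = "KR"        # targets Lys (K) and Arg (R)
CHYMOTRYPSIN = "FYW"  # targets Phe (F), Tyr (Y) and Trp (W)


def digest(sequence, targets, blocked_by="P"):
    """Cut after each target residue unless the next residue is proline."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        # No peptide bond follows the last residue, so it is never a cut site.
        if residue in targets and not is_last and sequence[i + 1] not in blocked_by:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])  # whatever remains at the C-terminus
    return fragments


# Hypothetical sequences, chosen so that one target is followed by Pro.
print(digest("GAKLVRSTKPEN", TRYPSIN))
print(digest("GAFLVWSTFPEN", CHYMOTRYPSIN))
```

In both example sequences the last target residue is followed by Pro, so the final fragment stays uncut at that position, as in the schematic examples above.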
 
 
Chemical Hydrolysis
 
The chemical reagent cyanogen bromide, CNBr, may also be used. Cyanogen bromide cleaves the peptide bond on the carboxyl side of methionine, converting the methionine to homoserine, Hse. Being a chemical reagent, not a catalyst, cyanogen bromide is consumed in the reaction:

Gly‐‐‐Met‐X‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐Met‐Y‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐Asn

Gly‐‐‐Hse + X‐‐‐‐‐‐‐‐‐‐‐‐‐Hse + Y‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐Asn
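
The same bookkeeping can be written out for cyanogen bromide. The sketch below is an assumption-laden illustration, not a model of the chemistry: the function name `cnbr_cleave` and the example peptide are hypothetical, residues are given as three-letter codes, and a Met at the very C-terminus (which has no peptide bond on its carboxyl side) is simply left unchanged.

```python
def cnbr_cleave(residues):
    """Split a list of three-letter residue codes after each internal Met.

    The Met at each cut site is rewritten as Hse (homoserine).
    """
    fragments = []
    current = []
    for i, residue in enumerate(residues):
        if residue == "Met" and i < len(residues) - 1:
            current.append("Hse")      # Met is converted at the cut site
            fragments.append(current)
            current = []
        else:
            current.append(residue)
    fragments.append(current)          # C-terminal fragment
    return fragments


# Hypothetical peptide with two internal Met residues.
peptide = ["Gly", "Ala", "Met", "Ser", "Val", "Met", "Thr", "Asn"]
for fragment in cnbr_cleave(peptide):
    print("-".join(fragment))
```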
 
 
The Overlap Method
 
The sequence of myoglobin showing sites where the polypeptide chain can be cut: red for sites where chymotrypsin attacks; blue where trypsin attacks (note the underlined Lys‐Pro is not cut); and green where cyanogen bromide attacks Met.
 
 
 
 
If myoglobin is digested with chymotrypsin, all the red‐labelled sites will be hydrolysed at the peptide bonds immediately following the target amino acid; it's not possible to attack at one location at a time. Similarly, all the sites labelled in blue will be cut by trypsin.
 
Digestion of a polypeptide by trypsin or chymotrypsin creates a series of oligopeptides with a characteristic pattern of molar masses that is unique to a given polypeptide. The oligopeptide masses are easily measured by mass spectrometry (see lecture 6), and can be used to identify a particular protein.
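
As a rough illustration of such a mass pattern, the sketch below digests a sequence with the trypsin rule and lists the masses of the resulting peptides. The assumptions are: the residue masses are the ones tabulated later in these notes; one water (18.01056 Da) is added per peptide because residue masses exclude it; the function names and the example sequences are hypothetical; and Python 3.7 or later is needed because `re.split` is used on a zero-width pattern.

```python
import re

# Residue masses (Da) from the table in these notes, one-letter codes.
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "I": 113.08407,
    "L": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04264, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER = 18.01056  # added once per free peptide (H on the N-terminus, OH on the C-terminus)


def tryptic_peptides(sequence):
    """Split after K or R, except when the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if p]  # drop the empty piece left by a C-terminal K/R


def peptide_mass(peptide):
    """Neutral mass of a peptide: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def mass_fingerprint(sequence):
    """Sorted list of tryptic peptide masses for one protein sequence."""
    return sorted(round(peptide_mass(p), 3) for p in tryptic_peptides(sequence))


# Two hypothetical sequences differing by a single residue (E versus Q).
print(mass_fingerprint("GAKLVRSTKPEN"))
print(mass_fingerprint("GAKLVRSTKPQN"))
```

Even a single substitution changes one of the peptide masses, which is why the set of masses can act as a signature for a protein.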
 
In experiments to determine the complete amino acid sequence of a protein, selective hydrolysis is first carried out, and the resulting oligopeptides are separated by chromatography. Usually ion exchange, reversed phase or gel filtration techniques are used. The individual peptides can then be sequenced by Edman's method.

 
After all oligopeptide sequences have been determined, the complete polypeptide sequence is deduced by the overlap method; see Lehninger Fig 3‐27, p. 97 (4th ed Fig 3‐27 p. 101).
 
The overlap method is demonstrated in an animation separately from these notes. See the Extra Stuff section on Courselink.
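
The idea behind the overlap method can also be sketched in code. The version below is a brute-force illustration, not the procedure in Lehninger: it tries every ordering of one set of fragments and keeps the orderings in which every fragment from a second, differently cut set appears intact. The function name `assemble_by_overlap` and the fragment sequences are hypothetical, and trying all permutations is only practical for a handful of fragments.

```python
from itertools import permutations


def assemble_by_overlap(fragments_a, fragments_b):
    """Return every ordering of fragments_a that contains all of fragments_b.

    fragments_a and fragments_b are peptide sequences from two different
    selective hydrolyses of the same polypeptide (one-letter codes).
    """
    candidates = set()
    for order in permutations(fragments_a):
        candidate = "".join(order)
        # A correct ordering must contain each fragment of the other digest,
        # because those fragments span the junctions between fragments_a.
        if all(fragment in candidate for fragment in fragments_b):
            candidates.add(candidate)
    return sorted(candidates)


# Hypothetical data: fragments as they might come off a column, in no order.
tryptic = ["SWTKPEN", "GAK", "LFVR"]
chymotryptic = ["VRSW", "TKPEN", "GAKLF"]
print(assemble_by_overlap(tryptic, chymotryptic))
```

Here the chymotryptic fragment GAKLF shows that GAK is followed by LFVR, and VRSW shows that LFVR is followed by SWTKPEN, so only one ordering survives.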
 
 
 
Using Mass Spectrometry to Sequence and Identify Proteins
 
Proteins can be sequenced directly using tandem mass spectrometry (Tandem MS or MS/MS). This is a modern technique that is commonly used in the analysis of the entire protein complement of an organism or cell (a technique called proteomics). Since very small amounts of protein are required, individual bands on 2D gels (see lecture 6) can be cut out and sequenced without the need for complicated protein purification techniques.
 
The protein sample is hydrolyzed into a mixture of shorter peptides using a protease or through chemical means. This mixture is then injected into a tandem MS, essentially two mass spectrometers in series.
 
In the first MS, peptides of different masses are separated. Each peptide is then introduced into a collision cell where each peptide molecule fragments only once, usually at a peptide bond. In the second MS, the masses of the peptide fragments are measured.
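
To see what the second MS would be expected to record, the sketch below lists the masses of the C-terminal fragments of a peptide when each molecule breaks at exactly one peptide bond. Several things here are assumptions rather than statements from these notes: the fragments are treated as singly protonated ions (so one water and one proton are added to the residue masses), the function name `y_ion_ladder` and the peptide GASK are hypothetical, and only the residue masses needed for the example are included (values from the residue-mass table in these notes).

```python
# Subset of the residue-mass table in these notes (Da), one-letter codes.
RESIDUE_MASS = {"G": 57.02147, "A": 71.03712, "S": 87.03203, "K": 128.09497}
WATER = 18.01056   # the C-terminal fragment keeps the terminal OH and gains an H
PROTON = 1.00728   # assumed single positive charge


def y_ion_ladder(peptide):
    """Masses of C-terminal fragments, shortest first, rounded to 2 decimals."""
    ladder = []
    running = WATER + PROTON
    for residue in reversed(peptide):   # walk from the C-terminus inward
        running += RESIDUE_MASS[residue]
        ladder.append(round(running, 2))
    return ladder


# Hypothetical peptide; each step up the ladder adds one residue mass.
print(y_ion_ladder("GASK"))
```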

 
 
The resulting spectrum looks something like this:
 
 
 


Ion    Mass       Δ mass (m_n − m_n‐1)    AA
y1      147.11    ‐‐                      ‐‐
y2      234.14     87.03                  Ser
y3      321.17     87.03                  Ser
y4      495.22    174.05                  Cys*
y5      610.26    115.04                  Asp
y6      707.31     97.05                  Pro
y7      822.34    115.03                  Asp
y8      985.40    163.06                  Tyr
y9     1148.47    163.07                  Tyr

Cys*: modified with acrylamide.

Yasothornsrikul, et al (2003) PNAS 100:9590‐9595.
Since breaking one peptide bond generates these fragments, each successive peak differs in mass by one amino acid residue. That difference in mass identifies the amino acid that was lost. The only ambiguity involves Ile and Leu, which have the same mass.
 
Amino Acid Residue Masses (Da)

Glycine        57.02147    Isoleucine      113.08407    Methionine      131.04049
Alanine        71.03712    Leucine         113.08407    Histidine       137.05891
Serine         87.03203    Asparagine      114.04293    Phenylalanine   147.06842
Proline        97.05277    Aspartic acid   115.02695    Arginine        156.10112
Valine         99.06842    Glutamine       128.05858    Tyrosine        163.06333
Threonine     101.04768    Lysine          128.09497    Tryptophan      186.07932
Cysteine      103.00919    Glutamic acid   129.04264
 
In the example above, the peaks in the spectrum are from fragments starting at the C‐terminus. As a result, the sequence of the peptide, read from the N‐terminus, is: YYDPDCSS
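
The reading of the spectrum above can be reproduced with a short script: take the differences between successive peaks, look each one up in the residue-mass table, and reverse the order because the ladder starts at the C-terminus. The function names and the 0.02 Da matching tolerance are assumptions made for this sketch, and the mass used for Cys* (cysteine plus 71.03711 Da for an added acrylamide) is also an assumption, chosen because it agrees with the 174.05 step in the table.

```python
# Residue masses (Da) from the table in these notes, three-letter codes.
RESIDUE_MASS = {
    "Gly": 57.02147, "Ala": 71.03712, "Ser": 87.03203, "Pro": 97.05277,
    "Val": 99.06842, "Thr": 101.04768, "Cys": 103.00919, "Ile": 113.08407,
    "Leu": 113.08407, "Asn": 114.04293, "Asp": 115.02695, "Gln": 128.05858,
    "Lys": 128.09497, "Glu": 129.04264, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06842, "Arg": 156.10112, "Tyr": 163.06333, "Trp": 186.07932,
}
# Assumed mass for acrylamide-modified Cys: Cys plus C3H5NO (71.03711 Da).
RESIDUE_MASS["Cys*"] = RESIDUE_MASS["Cys"] + 71.03711

# Peak masses y1..y9 copied from the spectrum table above.
Y_IONS = [147.11, 234.14, 321.17, 495.22, 610.26, 707.31, 822.34, 985.40, 1148.47]


def match_residue(delta, tolerance=0.02):
    """Name the residue(s) whose mass is within tolerance of delta."""
    hits = [name for name, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tolerance]
    return "/".join(hits) if hits else "?"


def read_ladder(y_ions, tolerance=0.02):
    """Residues implied by successive peak differences, N-terminus first."""
    deltas = [later - earlier for earlier, later in zip(y_ions, y_ions[1:])]
    c_to_n = [match_residue(delta, tolerance) for delta in deltas]
    return list(reversed(c_to_n))  # the ladder grows from the C-terminus


print("-".join(read_ladder(Y_IONS)))
print(match_residue(113.08))  # Ile and Leu cannot be told apart by mass
```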
 
 
 
 
The NCBI Database and BLAST searching
 
The sequence of a peptide can be compared with databanks of protein sequences of all known proteins.
 
The NCBI (National Center for Biotechnology Information) database (www.ncbi.nlm.nih.gov) contains a vast amount of sequence information. With the explosion of DNA sequencing available today and our ability to convert those DNA sequences into the protein sequences that they encode, the amount of protein sequence information available is astronomical, and is growing every day.
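
Converting a DNA sequence into the protein it encodes is a table lookup, codon by codon. The sketch below assumes the standard genetic code, a coding strand that is already in frame, and translation that stops at the first stop codon; the function name `translate` and the example DNA sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position; * = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence.
print(translate("ATGTATTATGATCCGGATTGTAGCAGCAAATAA"))
```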
 
One tool that is available through the NCBI is a search of protein sequences, called BLAST (Basic Local Alignment Search Tool). With this tool, one can enter a peptide or protein primary sequence and generate a list of the sequences that show the highest homology to it. Several results can occur:
1. If the protein being analyzed is already in the protein database, then a 100% match will be generated.
2. If it is a protein that is not in the database, but has a close relative from another organism or has a similar isoform in the same organism, then a very close match will result. These kinds of similar proteins are called homologs; e.g. myoglobin from horse and whale, or α‐actin and β‐actin in humans. This is often enough to identify the kind of protein.
3. If it is a completely new protein, then there will be very little homology in the database and the identity of the protein will not be apparent.
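
The three outcomes can be illustrated with a toy search. This is not BLAST and does not use any NCBI interface: it simply slides the query along each database sequence without gaps and reports the best percentage of identical residues. The function names, the database entries and their sequences are all hypothetical.

```python
def best_identity(query, subject):
    """Best ungapped percent identity of query at any position in subject."""
    if not query or len(query) > len(subject):
        return 0.0
    best = 0
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        matches = sum(q == s for q, s in zip(query, window))
        best = max(best, matches)
    return 100.0 * best / len(query)


def search(query, database):
    """Rank database entries by best percent identity, highest first."""
    hits = [(name, best_identity(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical database: an exact match, a close relative, an unrelated protein.
DATABASE = {
    "protein_A": "MGAKYYDPDCSSKLVR",
    "protein_B": "MGAKYYDPECSSKLVR",
    "protein_C": "MWWHHGGTTNNQQIIL",
}
for name, identity in search("YYDPDCSS", DATABASE):
    print(f"{name}: {identity:.1f}% identity")
```

With this query, protein_A corresponds to case 1 (a 100% match), protein_B to case 2 (a close match differing at one position), and protein_C to case 3 (no meaningful match).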
 
Beyond sequence alignments, performing biochemical tests with a purified protein is the most definitive method of determining the identity of a protein.
 
Tandem MS also has the added advantage of being able to identify the location and type of any posttranslational modifications that may be present on the protein, as shown in the example above with the modification of the Cys residue with acrylamide.
 
 
 
 

This note was uploaded on 09/21/2011 for the course BIOOC 2580 taught by Professor Douger during the Fall '10 term at University of Guelph.
