Course: HST 512, Spring 2006
School: MIT


Harvard-MIT Division of Health Sciences and Technology
HST.512: Genomic Medicine
Prof. Zoltan Szallasi

Limitations of massively parallel technologies
Zoltan Szallasi, MD
Children's Hospital Informatics Program
www.chip.org

New technology
- All problems will be solved within a couple of years
- Realistic expectations (limitations)

Limitations (you want to make predictions):
- Accuracy: noise
- Sensitivity: completeness
- Inherent limitations (think about unpredictability > chaos)

NOISE:
- What is noise? (And what is signal?)
- Noise as an inherent feature of complex systems
- Noise in continuous and discrete measurements
- Noise as the limitation of the technology
- What can be done about noise?
  Statistics
  Normalization as a way to deal with systematic errors

Noise, defined:
c : an unwanted signal or a disturbance (as static or a variation of voltage) in an electronic device or instrument (as radio or television); broadly : a disturbance interfering with the operation of a usually mechanical device or system
d : electromagnetic radiation (as light or radio waves) that is composed of several frequencies and that involves random changes in frequency or amplitude
e : irrelevant or meaningless data or output occurring along with desired information

Noise may turn out to be an important signal!
- Penzias and Wilson >>> cosmic background radiation
- Discovery of the chemotherapeutic agent cis-platinum

What we perceive as noise/error might be a key component of biological processes:
1) Mutations in evolution
2) "Junk" DNA
3) Asymmetric cell division may contribute to differentiation
4) Stochastic fluctuations may be important for the stability of complex physicochemical systems

Genetic networks are stochastic systems:
1) A couple of hundred copies of a given transcription factor per nucleus
2) The intracellular environment is not a free solution
3) Reaction kinetics are often slow, etc.

[Figure: four panels, each labeled A B C D E F]

Please see Science. 2002 Aug 16; 297(5584):1183-6.
Comment in: Science. 2002 Aug 16; 297(5584):1129-31.
Stochastic gene expression in a single cell. Elowitz MB, Levine AJ, Siggia ED, Swain PS.

- We are measuring population-averaged data. That is true even if single cells are quantified, due to stochasticity > two cells can get from a given state to another one via different paths.

Noise in measurements
There is no measurement without noise (it is the accuracy/sensitivity of your measurement that is low).
For continuous variables, data are expected to show a certain "spread".
Consequently, statistics was invented:
- 0.5, -0.3, 0.2, 1.4, -1.5, ... etc.: what is the true value of the observed variable?
- Did the variable change due to a given treatment? Etc.
This requires lots of measurements and/or a fairly good idea about the nature of the noise (e.g. normal distribution).

Statistical analysis in biology:
1) What is the true value of a given parameter?
2) The most common analysis Bayesian
3) You don't believe the measurements >> normalization
4) There are too many numbers >> permutation etc.

Biological measurements are often expensive!
A large number of papers relating to cancer were published in Nature/Science ... based on single microarray measurements.

STATISTICS
Reliable numbers cannot be produced without replicates.

The central problem:
In massively parallel biological measurements, quantitative or qualitative calls are supposed to be made on a large number of heterogeneous variables using only a few replicates.

Noise of continuous variables, e.g. microarray measurements
Tissue (or tissue under influence) > RNA > cDNA/cRNA tagged with fluorescent dye > microarray of genes (aka gene chips)

Ideally: 1 copy of a given RNA will produce 1 unit of a specific signal!
1) cDNA is produced from RNA (initiation of the RT step, RT might drop off, etc.)
2) cRNA is produced in the presence of fluorescent dyes (cRNA production is not linear; dye incorporation)
3) Breaking down cRNA into small pieces
4) Hybridization/cross-hybridization
Final signal = (all of the above)

The situation is further complicated by other experimental issues >>> two-color cDNA microarray
- The ratio is influenced by background calculations
- Equal amounts of labelled cDNA samples
- There is no truly blank spot!

[Figure: oligonucleotide probe design. An mRNA reference sequence (5' to 3') is covered by spaced DNA probe pairs. Reference sequence: ... TGTGATGGTGGGAATGGGTCAGAAGGACTCCTATGTGGGTGACGGAGGCC ... Perfect match oligo: AATGGGTCAGAAGGACTCCTATGTGGGTG. Mismatch oligo: AATGGGTCAGAACGACTCCTATGTGGGTG. The fluorescence intensity image shows perfect match probe cells and mismatch probe cells.]

Data representation
If we express gene expression measurements as "per unit RNA", then a decrease in the level of a given message unavoidably leads to a relative increase in the level of other messages.

Systematic error
[Figure: density plot of the probe intensities of several Affymetrix data sets belonging to the same set of experiments. Density (x = x[, 1], from = 4, to = 16); N = 131822, Bandwidth = 0.1128.]

Normalization
You don't believe the numbers:
1) "Most or certain things do not change"
2) Error model
Shifting the means or medians and adjusting the distributions by cubic spline fit, Lowess etc. (Overfitting!)

[Figure: two density plots. Density (x = x[, 1], from = 4, to = 16); N = 131822, Bandwidth = 0.1128. Density (x = y[, 1], from = 4, to = 16); N = 131822, Bandwidth = 0.09808.]

cDNA microarray: the R/G ratios are intensity dependent
Values should scatter about zero.
Courtesy of Natalie Thorne. Used with permission.

Overview of normalization:
- To correct for systematic errors
1) Choose the set of elements that will be used
   - housekeeping genes
   - special control genes etc.
2) Determine the normalization function
   - global mean/median normalization
   - intensity dependent normalization
Microarray Gene Expression Data Society www.mged.org

Intensity dependent normalization by error models
Error model (Rocke, Vingron):
  Low concentrations:  $x = \mu + \varepsilon$
  High concentrations: $x = \mu e^{\eta}$
  Both together:       $x = \mu e^{\eta} + \varepsilon$
  with $\varepsilon \sim N(0, \sigma_{\varepsilon}^2)$ and $\eta \sim N(0, \sigma_{\eta}^2)$

Noise will limit the useful information content of measurements:
A reliable detection of 2-fold differences seems to be the practical limit of massively parallel quantitation. (Estimate: optimistic and not cross-platform.)

[Figure: level of gene expression versus time; measurements with error bars; time window.]

A rational experiment will sample gene expression according to a time series in which each consecutive time point is expected to produce an expression-level difference at least as large as the error of measurement: approximately 5 min intervals in yeast, 15-30 min intervals in mammalian cells.

Limitations (you want to make predictions):
- Accuracy: noise
- Sensitivity: completeness
- Inherent limitations (think about unpredictability > chaos)

Sensitivity: completeness
How many parameters are we measuring?
How many parameters should we measure?
How many bionodes?
Cautious estimate: on the order of 1-2 x 10^5
- 10,000-20,000 active genes per cell
- < 3 posttranslational modifications/protein in yeast
- 3-6 (?) posttranslational modifications/protein in humans
The number of bionodes is probably less than 10 times the number of genes.
Splice variants < > modules

The coverage of microarray chips and proteomics keeps increasing >>>> complete genome

Holland MJ. Transcript abundance in yeast varies over six orders of magnitude. J Biol Chem. 2002
Sensitivity: 2 copies/cell
MOST transcripts are not seen by microarray.
Please see J Biol Chem. 2002 Apr 26; 277(17): 14363-6. Epub 2002 Mar 06. Transcript abundance in yeast varies over six orders of magnitude. Holland MJ.
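
The error model from the intensity-dependent normalization slide can be explored numerically. The sketch below assumes the two-component form x = mu * exp(eta) + eps, with eta and eps drawn from zero-mean normal distributions. The function names and the values chosen for sigma_eps and sigma_eta are illustrative assumptions, not taken from the lecture.

```python
import math
import random
import statistics


def simulate(mu, sigma_eps, sigma_eta, n, rng):
    """Draw n readings x = mu * exp(eta) + eps for a true level mu.

    eta ~ N(0, sigma_eta^2) is the multiplicative error,
    eps ~ N(0, sigma_eps^2) is the additive (background) error.
    """
    return [
        mu * math.exp(rng.gauss(0.0, sigma_eta)) + rng.gauss(0.0, sigma_eps)
        for _ in range(n)
    ]


def mean_and_sd(mu, sigma_eps=20.0, sigma_eta=0.1, n=20000, seed=0):
    """Return the sample mean and standard deviation of simulated readings.

    The default sigma values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    xs = simulate(mu, sigma_eps, sigma_eta, n, rng)
    return statistics.mean(xs), statistics.stdev(xs)


if __name__ == "__main__":
    for mu in (0.0, 100.0, 1000.0, 100000.0):
        mean, sd = mean_and_sd(mu)
        print(f"mu={mu:>9.0f}  mean={mean:>10.1f}  sd={sd:>8.1f}")
```

Under these assumptions the standard deviation stays close to sigma_eps at low levels and grows roughly in proportion to the signal at high levels, which is consistent with the slides' case for making the normalization intensity dependent.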
The utmost goal of technology: single copy / single cell

BUT even if you measure everything accurately, there might be problems with predictions.
Even a relatively simple set of ODEs can produce rather strange behavior.
Edward Lorenz: 3 linked ODEs produced a behavior very sensitive to the initial conditions. (Chaos theory, bifurcations etc.)
Small changes in the initial conditions can cause huge changes at later time points.

The problem of way too many correlated numbers:
Can this be due to chance?
- Analytical solution
- Computational solution: permute and look for similar patterns
In some cases an analytical solution may exist.

Six breast cancer cell lines yielded 13 consistently mis-regulated genes (H-cadherin, S1002A, keratin 5 etc.)
Can this be due to chance?
"E" different cell lines, "N"-gene microarray, Mi genes mis-regulated in the "i"-th cell line, K consistently mis-regulated across all E cell lines.
What is the probability that the K genes were mis-regulated by chance?
This translates into a simple combinatorics problem, BUT:
- What if more genes are involved?

[Figure: distribution of pair-wise correlation coefficients in cancer-associated gene expression data; randomized versus real.]

The problem of way too many correlated numbers is a particularly nasty one.
Significance can be off by orders of magnitude when comparing completely random permutations with "structural permutations".

Noise in discrete measurements: DNA sequences
Measurement error: sequencing errors (0.1%-1%)
Solution: sequence a lot

AAATAACTCGGTGACCAAAAAAGAGTGTGAGGATAGATGTCA
GAATGGTTGCTAAGGCACCTATTATTAGGTCGCTTATTAGTTTT
CATGCCGTACATTGCACCTGGCAGACCTTGCCTTATTTCTCTGT
ACATTTTTATTTTCCCGCGTGCTGCGCGGTGTTACACTGCGTTG
TGTATTGCGCTGTGCACGGGGTCTGCGTAAGCGATGTTTTAGG
GCACGGTTTGCTTCTAGAGTGGCCTCTCGCTCTTTTATTACCTCG
CGCTTGTCAATTAGCTTTTTACCTCGCGCAAGGGATATAAGAA
GCTTCGCGCGGCCGTTCCTGAAATAAAACTTGATGGGCACCAG
GGTTATACCAGG........................ 3 billion

- Find genes, introns, exons, transcription factor binding sites etc.
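
The "sequence a lot" remedy for sequencing errors can be given a rough number. The sketch below assumes that each read errs independently at a given base with the same probability and that the consensus is called by majority vote over an odd number of reads. It then bounds the chance of a wrong call by the binomial probability that more than half of the reads are wrong. The independence assumption and the function name are illustrative, not from the lecture.

```python
from math import comb


def majority_error_bound(per_read_error, n_reads):
    """Upper bound on the probability that a majority-vote base call is wrong.

    Assumes independent reads with identical error probability. The call can
    only be wrong if more than half of the reads are wrong, so the binomial
    tail P(errors > n_reads / 2) is an upper bound (errors may also disagree
    with each other, which only helps the correct base).
    """
    if n_reads < 1 or n_reads % 2 == 0:
        raise ValueError("n_reads must be a positive odd number")
    k_min = n_reads // 2 + 1
    return sum(
        comb(n_reads, k)
        * per_read_error**k
        * (1.0 - per_read_error) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )


if __name__ == "__main__":
    # Error rates taken from the 0.1%-1% range quoted on the slide.
    for p in (0.001, 0.01):
        for n in (1, 3, 5, 9):
            print(f"p={p}  reads={n}  bound={majority_error_bound(p, n):.3e}")
```

Under these assumptions a handful of overlapping reads pushes the per-base error bound down by orders of magnitude. Real sequencing errors are often correlated with sequence context, so the bound is optimistic.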
Help can be found: cDNA libraries etc.
BUT
1) Yelin et al. Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol. 2003:379-86.
   ~1600 ACTUALLY transcribed antisense transcriptional units
2) Kapranov et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science, 2002
   As much as one order of magnitude more of the genomic sequence is transcribed than accounted for by the predicted and characterized exons.

TF binding site: TGGACT
It can also be: TGCACT, TGG/CACT, TCG/CNCT
Try to add constraints:
1) Within 500 bp from the ATG
2) Tends to cluster in the same region
Even if you do all this, you will find that many "obviously" TF binding site-looking sequences do not function as such (due to higher level DNA organization etc.)
AND you often do not know what sequence to start with.

1. Statistical overrepresentation
You define the rules
2. Cross-species conservation
3. Using artificial intelligence/machine learning
   Hidden Markov models for exon/intron/gene identification (GENIE)

Please see Nature. 2003 May 15; 423(6937): 241-54. Sequencing and comparison of yeast species to identify genes and regulatory elements. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES.

S. cerevisiae, S. bayanus, S. mikatae, S. paradoxus
Number of genes ~ 5,500
High level of synteny
Courtesy of Eric Lander. Used with permission.

Slow and rapid evolution:
YBR184W: 32% nucleotide and 13% aa identity
MATa2: 100% nucleotide and 100% aa identity!
Courtesy of Eric Lander. Used with permission.

XYZn(0-21)ABC
Intergenic conservation
Intergenic vs. genic conservation
Upstream vs. downstream conservation
A given motif is also enriched in front of genes with similar function
Courtesy of Eric Lander. Used with permission.
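
The TF binding site search with the "within 500 bp from the ATG" constraint can be sketched as a degenerate motif scan. The example below assumes the alternatives TGGACT and TGCACT are encoded with the IUPAC code S (G or C), giving TGSACT. That encoding, the function names, and the short test sequence are illustrative assumptions. Only the given strand is scanned, and a match says nothing about whether the site is functional.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex; the lookahead allows overlaps."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + pattern + "))")


def find_sites(sequence, atg_index, motif, window=500):
    """Return start positions of motif matches upstream of the start codon.

    Only matches lying entirely within `window` bp before `atg_index`
    (the position of the A in the ATG) are reported.
    """
    start = max(0, atg_index - window)
    region = sequence[start:atg_index].upper()
    return [start + m.start() for m in motif_to_regex(motif).finditer(region)]


if __name__ == "__main__":
    # Made-up sequence; the start codon of interest begins at index 19.
    seq = "CCTGGACTAAATGCACTGGATGAAA"
    print(find_sites(seq, 19, "TGSACT"))
```

Here `find_sites(seq, 19, "TGSACT")` reports both the TGGACT and the TGCACT occurrence, while the exact motif "TGGACT" reports only the first. The clustering constraint from the slide could be layered on top by counting hits per window.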