MESSAGE
DATE | 2016-10-27 |
FROM | Ruben Safir
|
SUBJECT | Re: [Learn] Phylogenetics educational links
|
From learn-bounces-at-nylxs.com Thu Oct 27 23:08:57 2016 Return-Path: X-Original-To: archive-at-mrbrklyn.com Delivered-To: archive-at-mrbrklyn.com Received: from www.mrbrklyn.com (www.mrbrklyn.com [96.57.23.82]) by mrbrklyn.com (Postfix) with ESMTP id CE30C161312; Thu, 27 Oct 2016 23:08:56 -0400 (EDT) X-Original-To: learn-at-nylxs.com Delivered-To: learn-at-nylxs.com Received: from [10.0.0.62] (flatbush.mrbrklyn.com [10.0.0.62]) by mrbrklyn.com (Postfix) with ESMTP id 4FA83160E77 for ; Thu, 27 Oct 2016 23:08:53 -0400 (EDT) To: learn-at-nylxs.com References: <8a528f28-6298-2fe8-0a01-8899000d0244-at-mrbrklyn.com> <8760odd05h.fsf-at-contrapunctus.net> <0baaaa4e-9412-c72b-7cfe-183648b929c2-at-panix.com> From: Ruben Safir Message-ID: Date: Thu, 27 Oct 2016 23:08:53 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <0baaaa4e-9412-c72b-7cfe-183648b929c2-at-panix.com> Subject: Re: [Learn] Phylogenetics educational links X-BeenThere: learn-at-nylxs.com X-Mailman-Version: 2.1.17 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: learn-bounces-at-nylxs.com Sender: "Learn"
FWIW
https://www.youtube.com/watch?v=3D4FUr3_2tzOc
the future I think is in ctscans
On 10/27/2016 10:47 PM, Ruben Safir wrote: > On 10/27/2016 04:09 PM, Christopher League wrote: >> >> Here's a set of introductory slides on inference of phylogenetic trees. >> >> >> >> Based on what I've learned today, "small" parsimony algorithms like >> Fitch and Sankoff rely on labeling a *given* tree shape (aka topology), >> so we already have to know (or hypothesize) the ancestral relationships. >> The algorithm just determines which labels (mutations) to assign to >> interior nodes. That's really unsatisfying to me. >> >> But the "big" parsimony techniques have to search the entire tree space, >> which is ENORMOUS. The problem is formally NP-hard. Now -- people do >> manage to solve (or approximately solve) NP-hard problems every day by >> using piles of dirty tricks. When it comes to searching gigantic spaces, >> those dirty tricks are the classic techniques of artificial >> intelligence. >> >> So the lecture slides get into techniques like greedy algorithms, >> hill-climbing, simulated annealing, genetic algorithms, etc. Anyway, >> there could be a lot of meat here that fits under the heading of AI + >> phylogenetics. It's much more accessible stuff than trying to >> automatically interpret 3D model data to take measurements of maxillary >> bones. >> >> CL >> >> >> Ruben Safir writes: >> >>> http://telliott99.blogspot.com/2010/03/fitch-and-sankoff-algorithms-for= .html >>> >>> "The Fitch algorithm considers the sites (or characters) one at a time.= At each tip in the tree, we create a set containing those nucleotides (sta= tes) that are observed or are compatible with the observation. Thus, if we = see an A, we create the set {A}. If we see an ambiguity such as R, we creat= e the set {AG}. Now we move down the tree [away from the tips]. In algorith= mic terms, we do a postorder tree traversal. At each interior node we creat= e a set that is the intersection of sets at the two descendant nodes. Howev= er, if that set is empty, we instead create the set that is the union of th= e two sets at the descendant nodes. Every time we create such a union, we a= lso count one change of state." >>> >>> >>> https://en.wikipedia.org/wiki/Non-parametric >>> Nonparametric statistics are statistics not based on parameterized fami= lies of probability distributions. They include both descriptive and infere= ntial statistics. The typical parameters are the mean, variance, etc. Unlik= e parametric statistics, nonparametric statistics make no assumptions about= the probability distributions of the variables being assessed. The differe= nce between parametric models and non-parametric models is that the former = has a fixed number of parameters, while the latter grows the number of para= meters with the amount of training data.[1] Note that the non-parametric mo= del does, counterintuitively, contain parameters: the distinction is that p= arameters are determined by the training data in the case of non-parametric= statistics, not the model. >>> >>> https://en.wikipedia.org/wiki/Fitch-Margoliash_algorithm >>> Distance-matrix methods >>> >>> Distance-matrix methods of phylogenetic analysis explicitly rely on a m= easure of "genetic distance" between the sequences being classified, and th= erefore they require an MSA (multiple sequence alignment) as an input. Dist= ance is often defined as the fraction of mismatches at aligned positions, w= ith gaps either ignored or counted as mismatches.[1] Distance methods attem= pt to construct an all-to-all matrix from the sequence query set describing= the distance between each sequence pair. From this is constructed a phylog= enetic tree that places closely related sequences under the same interior n= ode and whose branch lengths closely reproduce the observed distances betwe= en sequences. Distance-matrix methods may produce either rooted or unrooted= trees, depending on the algorithm used to calculate them. They are frequen= tly used as the basis for progressive and iterative types of multiple seque= nce alignment. The main disadvantage of distance-matrix methods is their in= ability to efficiently use =
>>> information about local high-variation regions that appear across mult= iple subtrees.[2] >>> >>> >>> -- =
>>> So many immigrant groups have swept through our town >>> that Brooklyn, like Atlantis, reaches mythological >>> proportions in the mind of the world - RI Safir 1998 >>> http://www.mrbrklyn.com =
>>> >>> DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002 >>> http://www.nylxs.com - Leadership Development in Free Software >>> http://www2.mrbrklyn.com/resources - Unpublished Archive >>> http://www.coinhangout.com - coins! >>> http://www.brooklyn-living.com >>> >>> Being so tracked is for FARM ANIMALS and and extermination camps, >>> but incompatible with living as a free human being. -RI Safir 2013 >>> _______________________________________________ >>> Learn mailing list >>> Learn-at-nylxs.com >>> http://lists.mrbrklyn.com/mailman/listinfo/learn >> >> >> >> _______________________________________________ >> Learn mailing list >> Learn-at-nylxs.com >> http://lists.mrbrklyn.com/mailman/listinfo/learn >> > =
> Since there are n OTUs to insert, the outer loop iterates n > =E2=88=92 > 1 times. > When there are n leaves in the tree, there are > 2n > =E2=88=92 > 1 places to insert the new OTU, and determining the cost of the result > requires O(n)time. Thus, a greedy algorithm for inferring a phylogeny is > O(n3). > =
> =
> Any chance you can explain this better? How did it get the N cubed? > I'm sick of not understanding this. > =
> =
-- =
So many immigrant groups have swept through our town that Brooklyn, like Atlantis, reaches mythological proportions in the mind of the world - RI Safir 1998 http://www.mrbrklyn.com
DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002 http://www.nylxs.com - Leadership Development in Free Software http://www2.mrbrklyn.com/resources - Unpublished Archive http://www.coinhangout.com - coins! http://www.brooklyn-living.com
Being so tracked is for FARM ANIMALS and and extermination camps, but incompatible with living as a free human being. -RI Safir 2013 _______________________________________________ Learn mailing list Learn-at-nylxs.com http://lists.mrbrklyn.com/mailman/listinfo/learn
|
|