MESSAGE
DATE | 2016-10-27 |
FROM | Ruben Safir
|
SUBJECT | Re: [Learn] Phylogenetics educational links
|
From learn-bounces-at-nylxs.com Thu Oct 27 22:47:54 2016 Return-Path: X-Original-To: archive-at-mrbrklyn.com Delivered-To: archive-at-mrbrklyn.com Received: from www.mrbrklyn.com (www.mrbrklyn.com [96.57.23.82]) by mrbrklyn.com (Postfix) with ESMTP id 4A0FD161312; Thu, 27 Oct 2016 22:47:54 -0400 (EDT) X-Original-To: learn-at-nylxs.com Delivered-To: learn-at-nylxs.com Received: from mailbackend.panix.com (mailbackend.panix.com [166.84.1.89]) by mrbrklyn.com (Postfix) with ESMTP id 305C5160E77 for ; Thu, 27 Oct 2016 22:47:50 -0400 (EDT) Received: from [10.0.0.62] (www.mrbrklyn.com [96.57.23.82]) by mailbackend.panix.com (Postfix) with ESMTPSA id 6B19519F59 for ; Thu, 27 Oct 2016 22:47:50 -0400 (EDT) To: learn-at-nylxs.com References: <8a528f28-6298-2fe8-0a01-8899000d0244-at-mrbrklyn.com> <8760odd05h.fsf-at-contrapunctus.net> From: Ruben Safir Message-ID: <0baaaa4e-9412-c72b-7cfe-183648b929c2-at-panix.com> Date: Thu, 27 Oct 2016 22:47:49 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <8760odd05h.fsf-at-contrapunctus.net> Subject: Re: [Learn] Phylogenetics educational links X-BeenThere: learn-at-nylxs.com X-Mailman-Version: 2.1.17 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: learn-bounces-at-nylxs.com Sender: "Learn"
On 10/27/2016 04:09 PM, Christopher League wrote: > =
> Here's a set of introductory slides on inference of phylogenetic trees. > =
> > =
> Based on what I've learned today, "small" parsimony algorithms like > Fitch and Sankoff rely on labeling a *given* tree shape (aka topology), > so we already have to know (or hypothesize) the ancestral relationships. > The algorithm just determines which labels (mutations) to assign to > interior nodes. That's really unsatisfying to me. > =
> But the "big" parsimony techniques have to search the entire tree space, > which is ENORMOUS. The problem is formally NP-hard. Now -- people do > manage to solve (or approximately solve) NP-hard problems every day by > using piles of dirty tricks. When it comes to searching gigantic spaces, > those dirty tricks are the classic techniques of artificial > intelligence. > =
> So the lecture slides get into techniques like greedy algorithms, > hill-climbing, simulated annealing, genetic algorithms, etc. Anyway, > there could be a lot of meat here that fits under the heading of AI + > phylogenetics. It's much more accessible stuff than trying to > automatically interpret 3D model data to take measurements of maxillary > bones. > =
> CL > =
> =
> Ruben Safir writes: > =
>> http://telliott99.blogspot.com/2010/03/fitch-and-sankoff-algorithms-for.= html >> >> "The Fitch algorithm considers the sites (or characters) one at a time. = At each tip in the tree, we create a set containing those nucleotides (stat= es) that are observed or are compatible with the observation. Thus, if we s= ee an A, we create the set {A}. If we see an ambiguity such as R, we create= the set {AG}. Now we move down the tree [away from the tips]. In algorithm= ic terms, we do a postorder tree traversal. At each interior node we create= a set that is the intersection of sets at the two descendant nodes. Howeve= r, if that set is empty, we instead create the set that is the union of the= two sets at the descendant nodes. Every time we create such a union, we al= so count one change of state." >> >> >> https://en.wikipedia.org/wiki/Non-parametric >> Nonparametric statistics are statistics not based on parameterized famil= ies of probability distributions. They include both descriptive and inferen= tial statistics. The typical parameters are the mean, variance, etc. Unlike= parametric statistics, nonparametric statistics make no assumptions about = the probability distributions of the variables being assessed. The differen= ce between parametric models and non-parametric models is that the former h= as a fixed number of parameters, while the latter grows the number of param= eters with the amount of training data.[1] Note that the non-parametric mod= el does, counterintuitively, contain parameters: the distinction is that pa= rameters are determined by the training data in the case of non-parametric = statistics, not the model. >> >> https://en.wikipedia.org/wiki/Fitch-Margoliash_algorithm >> Distance-matrix methods >> >> Distance-matrix methods of phylogenetic analysis explicitly rely on a me= asure of "genetic distance" between the sequences being classified, and the= refore they require an MSA (multiple sequence alignment) as an input. Dista= nce is often defined as the fraction of mismatches at aligned positions, wi= th gaps either ignored or counted as mismatches.[1] Distance methods attemp= t to construct an all-to-all matrix from the sequence query set describing = the distance between each sequence pair. From this is constructed a phyloge= netic tree that places closely related sequences under the same interior no= de and whose branch lengths closely reproduce the observed distances betwee= n sequences. Distance-matrix methods may produce either rooted or unrooted = trees, depending on the algorithm used to calculate them. They are frequent= ly used as the basis for progressive and iterative types of multiple sequen= ce alignment. The main disadvantage of distance-matrix methods is their ina= bility to efficiently use =
>> information about local high-variation regions that appear across multi= ple subtrees.[2] >> >> >> -- =
>> So many immigrant groups have swept through our town >> that Brooklyn, like Atlantis, reaches mythological >> proportions in the mind of the world - RI Safir 1998 >> http://www.mrbrklyn.com =
>> >> DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002 >> http://www.nylxs.com - Leadership Development in Free Software >> http://www2.mrbrklyn.com/resources - Unpublished Archive >> http://www.coinhangout.com - coins! >> http://www.brooklyn-living.com >> >> Being so tracked is for FARM ANIMALS and and extermination camps, >> but incompatible with living as a free human being. -RI Safir 2013 >> _______________________________________________ >> Learn mailing list >> Learn-at-nylxs.com >> http://lists.mrbrklyn.com/mailman/listinfo/learn > =
> =
> =
> _______________________________________________ > Learn mailing list > Learn-at-nylxs.com > http://lists.mrbrklyn.com/mailman/listinfo/learn > =
Since there are n OTUs to insert, the outer loop iterates n =E2=88=92 1 times. When there are n leaves in the tree, there are 2n =E2=88=92 1 places to insert the new OTU, and determining the cost of the result requires O(n)time. Thus, a greedy algorithm for inferring a phylogeny is O(n3).
Any chance you can explain this better? How did it get the N cubed? I'm sick of not understanding this.
-- =
So many immigrant groups have swept through our town that Brooklyn, like Atlantis, reaches mythological proportions in the mind of the world - RI Safir 1998 http://www.mrbrklyn.com
DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002 http://www.nylxs.com - Leadership Development in Free Software http://www2.mrbrklyn.com/resources - Unpublished Archive http://www.coinhangout.com - coins! http://www.brooklyn-living.com
Being so tracked is for FARM ANIMALS and and extermination camps, but incompatible with living as a free human being. -RI Safir 2013 _______________________________________________ Learn mailing list Learn-at-nylxs.com http://lists.mrbrklyn.com/mailman/listinfo/learn
|
|