The mostly complete chart of Neural Networks, explained


The zoo of neural network types grows exponentially. One needs a map to navigate between the many emerging architectures and approaches.

Fortunately, Fjodor van Veen from the Asimov Institute compiled a wonderful cheat sheet on NN topologies. If you are not new to machine learning, you have probably seen it before. In this story, I will go through every mentioned topology and try to explain how it works and where it is used. Ready? Let's go!

Perceptron. The simplest and oldest model of a neuron as we know it: it takes some inputs, sums them up, applies an activation function and passes the result to the output layer. No magic here.

Feed forward neural networks are also quite old; the approach originates from the 1950s. The way they work is described in one of my previous articles ("The old school matrix NN"), but generally they follow these rules: all nodes are fully connected, activation flows from the input layer to the output without back loops, and there is one layer between input and output (the hidden layer). In most cases this type of network is trained with the backpropagation method.

RBF neural networks are actually FF (feed forward) NNs that use a radial basis function as the activation function instead of the logistic function. What difference does that make? The logistic function maps an arbitrary value to a 0…1 range, answering a "yes or no" question. It is good for classification and decision-making systems, but works poorly for continuous values. Radial basis functions, by contrast, answer the question "how far are we from the target?", which makes them a good fit for function approximation and machine control (as a replacement for PID controllers, for example). In short, these are just FF networks with a different activation function and a different field of application.

DFF neural networks opened the Pandora's box of deep learning in the early 1990s. These are just FF NNs with more than one hidden layer. So what makes them so different? If you have read my previous article on backpropagation, you may have noticed that, when training a traditional FF, we pass only a small amount of error back to the previous layer. Because of that, stacking more layers led to exponential growth of training times, making DFFs quite impractical. Only in the early 2000s did we develop approaches that allowed DFFs to be trained effectively; now they form the core of modern machine learning systems, covering the same purposes as FFs, but with much better results.

Recurrent neural networks introduce a different type of cell: the recurrent cell. The first network of this type was the so-called Jordan network, in which each hidden cell received its own output with a fixed delay of one or more iterations. Apart from that, it was like a common FF network. Of course, there are many variations, such as passing the state to the input nodes or using variable delays, but the main idea remains the same. This type of NN is mainly used when context is important, that is, when decisions from past iterations or samples can influence the current ones. The most common example of such a context is text: a word can be analysed only in the context of the previous words or sentences.

LSTM networks introduce a memory cell, a special cell that can process data with time gaps (or lags). RNNs can process text by "keeping in mind" ten previous words; an LSTM network can process a video frame while "keeping in mind" something that happened many frames ago. LSTM networks are also widely used for writing and speech recognition.

Memory cells are actually composed of a couple of elements, called gates, that are recurrent and control how information is remembered and forgotten. The structure is well illustrated in the "Long short-term memory" article on en.wikipedia.org (note that there are no activation functions between the blocks).

The (x) elements on the graph are gates, and they have their own weights and sometimes activation functions. On each sample they decide whether to pass the data forward, erase the memory, and so on (you can read a more detailed explanation here). The input gate decides how much information from the last sample will be kept in memory; the output gate regulates the amount of data passed to the next layer; and the forget gate controls the rate at which stored memory decays. This is, however, a very simple view of an LSTM cell; many other architectures exist.
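To make the gate mechanics a bit more concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard forget/input/output gate formulation. The dimensions, parameter names and random toy data are illustrative assumptions, not something taken from the chart or any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step: the gates decide what to forget, what to store and what to emit."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])       # previous hidden state + current input
    f = sigmoid(Wf @ z + bf)              # forget gate: how much old memory to keep
    i = sigmoid(Wi @ z + bi)              # input gate: how much new information to store
    o = sigmoid(Wo @ z + bo)              # output gate: how much memory to expose
    c_tilde = np.tanh(Wc @ z + bc)        # candidate memory content
    c = f * c_prev + i * c_tilde          # updated memory cell
    h = o * np.tanh(c)                    # new hidden state passed on to the next step/layer
    return h, c

# toy dimensions: 4 inputs, 8 hidden units
n_in, n_hid = 4, 8
rng = np.random.default_rng(0)
params = [rng.normal(0, 0.1, (n_hid, n_hid + n_in)) for _ in range(4)] + \
         [np.zeros(n_hid) for _ in range(4)]
h = c = np.zeros(n_hid)
for x in rng.normal(size=(10, n_in)):     # run the cell over a sequence of 10 samples
    h, c = lstm_step(x, h, c, params)
```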
GRUs are LSTMs with different gating. Period. It sounds simple, but the lack of an output gate makes it easier to repeat the same output for a given input multiple times; they are currently used most often in sound (music) and speech synthesis. The actual composition, though, is a bit different: all the LSTM gates are combined into a so-called update gate, and the reset gate is closely tied to the input. GRUs consume fewer resources than LSTMs and are almost as effective.

Autoencoders are used for classification, clustering and feature compression. When you train FF neural networks for classification, you mostly have to feed them X examples in Y categories and expect one of the Y output cells to be activated. This is called "supervised learning". AEs, on the other hand, can be trained without supervision. Their structure, in which the number of hidden cells is smaller than the number of input cells (and the number of output cells equals the number of input cells), together with training the AE so that the output is as close to the input as possible, forces AEs to generalise data and search for common patterns.

VAEs, compared to AEs, compress probabilities instead of features. Despite that seemingly simple change, where AEs answer the question "how can we generalise data?", VAEs answer the question "how strong is the connection between two events? Should we distribute the error between the two events, or are they completely independent?". A slightly more in-depth explanation (with some code) is available here.

While AEs are cool, they sometimes, instead of finding the most robust features, just adapt to the input data (which is actually an example of overfitting). DAEs add a bit of noise to the input cells: varying the data by a random bit, randomly switching bits in the input, and so on. By doing that, one forces the DAE to reconstruct the output from a slightly noisy input, making it more general and forcing it to pick more common features.

An SAE is yet another autoencoder type that in some cases can reveal hidden grouping patterns in data. The structure is the same as in an AE, but the hidden cell count is bigger than the input/output layer cell count.

Markov chains are a pretty old concept: graphs where each edge carries a probability. In the old days they were used to construct text along the lines of "after the word hello we might have the word dear with 0.0053% probability and the word you with 0.03551% probability" (your T9, by the way, uses MCs to predict your input). MCs are not neural networks in the classic sense, but they can be used for classification based on probabilities (like Bayesian filters), for clustering (of some sort), and as finite state machines.

Hopfield networks are trained on a limited set of samples so that they respond to a known sample with the same sample. Each cell serves as an input cell before training, as a hidden cell during training, and as an output cell when used. As HNs try to reconstruct the trained sample, they can be used for denoising and restoring inputs: given half of a learned picture or sequence, they will return the full sample.

Boltzmann machines are very similar to HNs, except that some cells are marked as input and the rest remain hidden. The input cells become output cells as soon as each hidden cell has updated its state (during training, BMs/HNs update cells one by one, not in parallel). This was the first network topology to be successfully trained using the simulated annealing approach. Multiple stacked Boltzmann machines can form a so-called deep belief network (see below), which is used for feature detection and extraction.

RBMs resemble BMs in structure but, being restricted, can be trained with backpropagation just like FFs (with the only difference that, before the backpropagation pass, the data is passed back to the input layer once).

DBNs, mentioned above, are actually a stack of Boltzmann machines (surrounded by VAEs). They can be chained together (with one NN training another) and can be used to generate data from an already learned pattern.
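As a rough illustration of how an RBM, the building block of a DBN, picks up patterns, here is a minimal NumPy sketch of one-step contrastive divergence (CD-1) on toy binary data. The class layout, layer sizes and learning rate are my own illustrative assumptions rather than anything prescribed by the chart.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible-layer biases
        self.b_h = np.zeros(n_hidden)    # hidden-layer biases
        self.lr = lr

    def sample_h(self, v):
        p_h = sigmoid(v @ self.W + self.b_h)
        return p_h, (rng.random(p_h.shape) < p_h).astype(float)

    def sample_v(self, h):
        p_v = sigmoid(h @ self.W.T + self.b_v)
        return p_v, (rng.random(p_v.shape) < p_v).astype(float)

    def cd1_step(self, v0):
        # positive phase: clamp the data on the visible layer
        p_h0, h0 = self.sample_h(v0)
        # negative phase: one Gibbs step back and forth
        p_v1, v1 = self.sample_v(h0)
        p_h1, _ = self.sample_h(p_v1)
        # move the weights toward the data and away from the reconstruction
        self.W += self.lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
        self.b_v += self.lr * (v0 - p_v1).mean(axis=0)
        self.b_h += self.lr * (p_h0 - p_h1).mean(axis=0)

# toy binary data: 64 random 6-bit patterns
data = rng.integers(0, 2, size=(64, 6)).astype(float)
rbm = RBM(n_visible=6, n_hidden=3)
for epoch in range(100):
    rbm.cd1_step(data)
```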
DCNs are nowadays the stars of artificial neural networks. They feature convolution cells (or pooling layers) and kernels, each serving a different purpose. Convolution kernels actually process the input data, and pooling layers simplify it (mostly using non-linear functions, like max), reducing unnecessary features. Typically used for image recognition, they operate on a small subset of the image (around 20x20 pixels). The input window slides along the image, pixel by pixel, and the data is passed through convolution layers that form a funnel (compressing the detected features). In image-recognition terms, the first layer detects gradients, the second lines, the third shapes, and so on, up to the scale of particular objects. DFFs are commonly attached to the final convolutional layer for further processing.
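A toy NumPy sketch of the two core operations, convolution and max pooling, may help here. The 20x20 input patch and the hand-picked edge-detection kernel are purely illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most deep learning libraries)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keeps only the strongest response in each window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    fm = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return fm.max(axis=(1, 3))

image = np.random.rand(20, 20)             # a tiny grayscale patch
edge_kernel = np.array([[1., 0., -1.],     # simple vertical-edge detector
                        [2., 0., -2.],
                        [1., 0., -1.]])
features = max_pool(np.maximum(conv2d(image, edge_kernel), 0))  # conv -> ReLU -> pool
print(features.shape)                      # (9, 9)
```

Real DCNs stack many such convolution/pooling stages and learn the kernels from data instead of hand-picking them, which is exactly the "funnel" described above.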
DNs are DCNs reversed. A DCN takes a cat image and produces a vector like {dog: 0, lizard: 0, horse: 0, cat: 1}; a DN can take this vector and draw a cat image from it. I tried to find a solid demo, but the best one is on YouTube.

DCIGNs (oh my god, this name is long) look like a DCN and a DN glued together, but that is not quite correct: a DCIGN is actually an autoencoder. The DCN and DN do not act as separate networks; instead, they serve as the input and output ends of the network. Mostly used for image processing, these networks can handle images they have not been trained on previously. Thanks to their abstraction levels, they can remove certain objects from an image, re-paint it, or replace horses with zebras, like the famous CycleGAN did.

GANs represent a huge family of double networks composed of a generator and a discriminator. They constantly try to fool each other: the generator tries to generate some data, and the discriminator, receiving both sample data and generated data, tries to tell them apart. Constantly evolving, this type of neural network can generate realistic images, provided you are able to maintain the training balance between the two networks. pix2pix is an excellent example of this approach.

An LSM is a sparse (not fully connected) neural network in which activation functions are replaced by threshold levels. A cell accumulates values from sequential samples and emits an output only when the threshold is reached, resetting its internal counter to zero. The idea is taken from the human brain, and these networks have been applied to computer vision and speech recognition systems, but so far without major breakthroughs.

An ELM is an attempt to reduce the complexity behind FF networks by creating sparse hidden layers with random connections. They require less computational power, but their actual efficiency heavily depends on the task and the data.

An ESN is a subtype of recurrent network with a special training approach. The data is passed to the input, then the output is monitored for multiple iterations (allowing the recurrent features to kick in); only the weights between hidden cells are updated after that. Personally, I know of no real application of this type apart from various theoretical benchmarks. Feel free to add yours.

A DRN is a deep network in which part of the input data is passed on to later layers. This feature allows them to be really deep (up to 300 layers), although in a sense they are a kind of RNN without an explicit delay.

A KN introduces the "distance to cell" feature. Mostly used for classification, this type of network tries to adjust its cells for a maximal reaction to a particular input. When a cell is updated, its closest neighbours are updated as well. Like SVMs, these networks are not always considered to be "real" neural networks.

An SVM is used for binary classification tasks. No matter how many dimensions (or inputs) the net may process, the answer is always "yes" or "no". SVMs are not always considered to be neural networks.

Huh, the last one! Neural networks are kind of black boxes: we can train them, get results and enhance them, but the actual decision path is mostly hidden from us. The NTM is an attempt to fix this; it is an FF network with the memory cells extracted. Some authors also say that it is an abstraction over the LSTM. The memory is addressed by its contents, and the network can read from and write to the memory depending on the current state, representing a Turing-complete neural network.

Hope you liked this overview. If you think I made a mistake, feel free to comment, and subscribe for future articles about Machine Learning (also, check my DIY AI series if you are interested in the topic). See you soon!


