The Essential Guide to Neural Network Architectures - V7 Labs


What are Neural Networks and how do they work? Learn about different Artificial Neural Network architectures, their characteristics, and limitations.

16 min read · March 8, 2022
Pragati Baheti, Microsoft

Contents:
What are Neural Networks?
Feed-Forward Neural Networks
Recurrent Neural Networks (RNN)
Convolutional Neural Networks (CNN)
Generative Adversarial Networks (GANs)
Transformer Networks

Here's the fact: deep learning, and specifically Neural Networks, is a boiling-hot area of research. Countless new Neural Network architectures are proposed and updated every single day.

Earlier, the use of Neural Networks was restricted to simple classification problems, like detecting spam messages, but they have since advanced to domains like visual search engines, recommendation engines, chatbots, and medicine. Small Artificial Neural Networks that could only handle a few data samples have evolved into architectures consisting of millions of parameters trained on huge amounts of data.

After reading this article, you'll understand the following:

What are Neural Networks?
Feed-Forward Neural Networks
Recurrent Neural Networks (RNN)
Convolutional Neural Networks (CNN)
Generative Adversarial Networks (GANs)
Transformer Networks

Ready? Let's start with the basics.

What are Neural Networks?

Neural Networks are the functional unit of deep learning and are known to mimic the behavior of the human brain to solve complex data-driven problems. The input data is processed through different layers of artificial neurons stacked together to produce the desired output. From speech recognition and person identification to healthcare and marketing, Neural Networks have been used in a wide variety of domains.

Key Components of the Neural Network Architecture

The Neural Network architecture is made of individual units called neurons that mimic the biological behavior of the brain.
Here are the various components of a neuron.

Input - It is the set of features that are fed into the model for the learning process. For example, the input in object detection can be an array of pixel values pertaining to an image.

Weight - Its main function is to give importance to those features that contribute more towards the learning. It does so by introducing scalar multiplication between the input value and the weight matrix. For example, a negative word would impact the decision of a sentiment analysis model more than a pair of neutral words.

Transfer function - The job of the transfer function is to combine multiple inputs into one output value so that the activation function can be applied. It is done by a simple summation of all the inputs to the transfer function.

Activation function - It introduces non-linearity into the working of the perceptron. Without it, the output would just be a linear combination of the input values, and the network would not be able to model non-linear relationships.

💡 Pro tip: Looking for a perfect source for a recap of activation functions? Check out Types of Neural Network Activation Functions.

Bias - The role of bias is to shift the value produced by the activation function. Its role is similar to the role of a constant in a linear function.

When multiple neurons are stacked together in a row, they constitute a layer, and multiple layers piled next to each other are called a multi-layer neural network. We've described the main components of this type of structure below.

Input Layer

The data that we feed to the model is loaded into the input layer from external sources like a CSV file or a web service. It is the only visible layer in the complete Neural Network architecture; it passes the information from the outside world on without any computation.

Hidden Layers

The hidden layers are what make deep learning what it is today. They are intermediate layers that do all the computations and extract the features from the data. There can be multiple interconnected hidden layers, each searching for different hidden features in the data. For example, in image processing, the first hidden layers are responsible for lower-level features like edges, shapes, or boundaries, while the later hidden layers perform more complicated tasks like identifying complete objects (a car, a building, a person).

Output Layer

The output layer takes input from the preceding hidden layers and comes to a final prediction based on the model's learnings. It is the most important layer, where we get the final result. In the case of classification/regression models, the output layer generally has a single node, but this is completely problem-specific and depends on the way the model was built.

Standard Neural Networks

The Perceptron

The perceptron is the simplest Neural Network architecture. It is a type of Neural Network that takes a number of inputs, applies certain mathematical operations on these inputs, and produces an output. It takes a vector of real-valued inputs and performs a linear combination of each attribute with its corresponding weight. The weighted inputs are summed into a single value and passed through an activation function. A minimal sketch of such a neuron is shown below.
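To make the components above concrete, here is a minimal sketch of a single artificial neuron in Python. The feature values, weights, and bias are purely hypothetical, and sigmoid stands in for whichever activation function a given model actually uses.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum (transfer function) -> activation."""
    z = np.dot(inputs, weights) + bias       # transfer function: weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-z))          # activation function: sigmoid non-linearity

# Hypothetical values, purely for illustration
x = np.array([0.5, -1.2, 3.0])   # input features
w = np.array([0.8, 0.1, -0.4])   # weights giving importance to each feature
b = 0.2                          # bias shifts the activation
print(neuron(x, w, b))
```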
These perceptron units are combined to form a bigger Artificial Neural Network architecture.

Feed-Forward Networks

The perceptron represents how a single neuron works.

But what about a series of perceptrons stacked in a row and piled into different layers? How does the model learn then?

A feed-forward network is a multi-layer Neural Network in which, as the name suggests, the information is passed in the forward direction, from left to right. In the forward pass, the information enters the model through the input layer, passes through the series of hidden layers, and finally reaches the output layer. This architecture is forward in nature: the information does not loop back, and later layers give no feedback to earlier layers. The basic learning process of Feed-Forward Networks remains the same as for the perceptron.

Residual Networks (ResNet)

Now that you know more about Feed-Forward Networks, one question might have popped up in your head: how do we decide on the number of layers in our neural network architecture?

A naive answer would be: the greater the number of hidden layers, the better the learning process, since more layers enrich the levels of features.

But is that so?

Very deep Neural Networks are extremely difficult to train due to vanishing and exploding gradient problems. ResNets provide an alternate pathway for data to flow, which makes the training process much faster and easier. This is different from the feed-forward approach of earlier Neural Network architectures.

The core idea behind ResNet is that a deeper network can be made from a shallow network by copying weights from the shallow counterpart and using identity mappings for the added layers. The data from earlier layers is fast-forwarded and copied further ahead in the network through what we call skip connections, first introduced in Residual Networks to resolve the vanishing gradient problem.

Recurrent Neural Networks (RNNs)

The basic deep learning architecture has a fixed input size, and this acts as a blocker in scenarios where the input size is not fixed. Also, the decisions made by such a model are based only on the current input, with no memory of the past.

Recurrent Neural Networks work very well with sequences of data as input. Their usefulness can be seen in NLP problems like sentiment analysis and spam filtering, as well as time-series problems like sales forecasting and stock market prediction.

The Recurrent Neural Network (RNN)

Recurrent Neural Networks have the power to remember what they have learned in the past and apply it to future predictions. The input is sequential data that is fed into the RNN, which has a hidden internal state that gets updated every time it reads the next element of the input sequence. The internal hidden state is fed back to the model, and the RNN produces an output at every timestamp. The update can be written as h_t = f_W(h_{t-1}, x_t), in the simplest case h_t = tanh(W_hh · h_{t-1} + W_xh · x_t), with output y_t = W_hy · h_t.

💡 Note: We use the same function and parameters at every timestamp. A minimal sketch of this recurrence is shown below.
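The following sketch implements the recurrence above with numpy. The layer sizes and random weights are toy values chosen only to illustrate that the same parameters are reused at every step.

```python
import numpy as np

# A vanilla RNN: the same weights are reused at every timestamp.
# h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t);  y_t = W_hy @ h_t
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 3           # toy sizes, chosen arbitrarily
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

def rnn_forward(sequence):
    h = np.zeros(hidden_dim)                          # initial hidden state
    outputs = []
    for x_t in sequence:                              # data is consumed one step at a time
        h = np.tanh(W_hh @ h + W_xh @ x_t)            # hidden state carries the "memory"
        outputs.append(W_hy @ h)                      # an output at every timestamp
    return outputs

seq = [rng.normal(size=input_dim) for _ in range(5)]  # a toy sequence of 5 steps
print(len(rnn_forward(seq)), "outputs produced")
```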
The Long Short-Term Memory Network (LSTM)

In an RNN, each prediction looks only one timestamp back, so the network has a very short-term memory and doesn't use information from further back. To rectify this, we can take our Recurrent Neural Network structure and expand it by adding some more pieces to it.

The critical part that we add is memory: we want the network to be able to remember what happened many timestamps ago. To achieve this, we add extra structures called gates.

Cell state (c_t): It corresponds to the long-term memory content of the network.

Forget gate: Some information in the cell state is no longer needed and is erased. The gate receives two inputs, x_t (the current timestamp input) and h_{t-1} (the previous hidden state), which are multiplied with the relevant weight matrices before a bias is added. The result is sent into an activation function, which outputs a value between 0 and 1 that decides whether the information is retained or forgotten.

Input gate: It decides what piece of new information is to be added to the cell state. It is similar to the forget gate, using the current timestamp input and the previous hidden state, with the only difference being that it multiplies them with a different set of weights.

Output gate: The output gate's job is to extract meaningful information from the current cell state and provide it as an output.

Echo State Networks (ESN)

An Echo State Network is an RNN with a sparsely connected hidden layer, typically with around 1% connectivity. The connectivity and weights of the hidden neurons are fixed and randomly assigned; the only weights that need to be learned are those of the output layer. The output can therefore be seen as a linear model on top of the hidden activations produced from the weighted input.

The main idea is to keep the early layers fixed. The only weights modified during training are those of the connections from the hidden layer to the output layer. This makes the loss function simple and easy to differentiate, and training becomes uncomplicated, assuming linear output units. The only thing to keep in mind is to set the random connections very carefully.

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are a type of Feed-Forward Neural Network used in tasks like image analysis, natural language processing, and other complex image classification problems. A CNN has hidden convolutional layers that form the base of ConvNets.

Features refer to minute details in the image data like edges, borders, shapes, textures, objects, circles, etc. At a high level, convolutional layers detect these patterns in the image data with the help of filters. The low-level details are taken care of by the first few convolutional layers, and the deeper the network goes, the more sophisticated the pattern searching becomes. For example, in later layers, rather than edges and simple shapes, filters may detect specific objects like eyes or ears, and eventually a cat, a dog, and whatnot.

Feature Extraction and Classification in Convolutional Neural Networks

When adding a convolutional layer to a network, we need to specify the number of filters. A filter can be thought of as a relatively small matrix for which we decide the number of rows and columns; its values are initialized with random numbers. When the convolutional layer receives the pixel values of the input data, each filter convolves over every patch of the input matrix.

The output of the convolutional layer is usually passed through the ReLU activation function to bring non-linearity to the model; it takes the feature map and replaces all the negative values with zero.

Pooling is a very important step in ConvNets, as it reduces computation and makes the model tolerant towards distortions and variations. A fully connected dense layer then uses a flattened feature matrix and makes predictions according to the use case.

The Deconvolutional Neural Network (DNN)

Deconvolutional Neural Networks are CNNs that work in a reverse manner. When we use convolutional layers and max pooling, the size of the image is reduced; to get back to the original size, we use upsampling and transpose convolutional layers. Upsampling has no trainable parameters: it just repeats the rows and columns of the image data by the corresponding factors. A transpose convolutional layer applies a convolution operation and upsampling at the same time; it is represented as Conv2DTranspose(number of filters, filter size, stride). If we set stride = 1, there is no upsampling and the output has the same size as the input. A minimal ConvNet and transpose-convolution sketch is shown below.
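Here is a minimal PyTorch sketch of the convolution, ReLU, pooling, flatten, and dense pipeline described above, plus one transpose-convolution layer for upsampling. All layer sizes and channel counts are illustrative and not taken from any particular paper.

```python
import torch
import torch.nn as nn

# A minimal ConvNet: convolution -> ReLU -> pooling -> flatten -> fully connected classifier.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 16 filters convolve over the image
    nn.ReLU(),                                   # replaces negative feature-map values with zero
    nn.MaxPool2d(2),                             # pooling halves the spatial resolution
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                                # flatten the feature maps for the dense layer
    nn.Linear(32 * 8 * 8, 10),                   # assumes 32x32 input images and 10 classes
)

x = torch.randn(1, 3, 32, 32)                    # one fake RGB image
print(model(x).shape)                            # -> torch.Size([1, 10])

# A transpose-convolution layer, as used in deconvolutional networks, to upsample:
up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
print(up(torch.randn(1, 32, 8, 8)).shape)        # -> torch.Size([1, 16, 16, 16])
```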
AlexNet

AlexNet was trained on the ImageNet dataset of roughly 15 million high-resolution images, resized to 256x256x3. It has multiple convolutional layers and is deeper than the LeNet artificial neural network. Here are the characteristics of AlexNet:

Dropout is added in this architecture to prevent overfitting.
Data augmentation was performed as a pre-training process.
The ReLU activation function was used instead of saturating functions like sigmoid and tanh.
Training was carried out on GPUs.
Overlapping pooling was done in order to prevent information loss.

It had five convolutional-pooling layer blocks followed by three fully connected dense layers for classification.

Overfeat

This Neural Network architecture explores three well-known vision tasks, classification, localization, and detection, within a single framework. It trains the model on all three tasks simultaneously to boost accuracy. It is a modification of AlexNet. It predicts bounding boxes at each spatial location and scale, and for localization, the classification head is replaced by a regression network.

VGG

VGG stands for Visual Geometry Group. The thought behind VGG was that if AlexNet performed better than LeNet by being bigger and deeper, why not keep pushing further?

One path we could take was to add more dense layers, but this would bring with it more computation. The next possible approach was to add more convolutional layers, but this didn't work out, as it was very tedious to define each convolutional layer separately.

So, the best solution was to group convolutional layers into blocks.

The question was: is it better to use fewer, wider convolutional blocks or more narrow ones? Eventually, the researchers concluded that more layers of narrow (3x3) convolutions were more powerful than a smaller number of wider convolutions.

A VGG block has a bunch of 3x3 convolutions padded by 1, to keep the output size the same as that of the input, followed by max pooling to halve the resolution (a sketch of such a block is shown after this section). The architecture has a number of VGG blocks followed by three fully connected dense layers.

Network-in-network

Convolutional layers need relatively few parameters; it is the last few layers of fully connected neurons that bring a huge spike in the number of parameters.

One way to solve this is to get rid of the fully connected layers.

But, although this sounds easy in theory, it is pretty difficult to implement: convolutions and pooling reduce the resolution, but at some point we still need to map it to the corresponding classes. Therefore, the idea is to reduce the resolution as we go deeper and increase the number of channels by using 1x1 convolutions. This gives us high-quality information per channel.

In the network-in-network architecture, the last fully connected layer is replaced by a global average-pooling layer, making the model lighter.
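The following PyTorch sketch shows a VGG-style block as described above: a few 3x3 convolutions with padding 1, followed by 2x2 max pooling that halves the resolution. The channel counts and input size are illustrative.

```python
import torch
import torch.nn as nn

def vgg_block(num_convs, in_channels, out_channels):
    """A VGG-style block: several 3x3 convolutions (padding 1 keeps the spatial size),
    followed by 2x2 max pooling that halves the resolution."""
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.ReLU()]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

block = vgg_block(num_convs=2, in_channels=3, out_channels=64)
print(block(torch.randn(1, 3, 224, 224)).shape)   # -> torch.Size([1, 64, 112, 112])
```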
GoogLeNet and Inception

The Inception architecture uses parallel convolutional layers with different filter sizes together with max pooling. Filters of different sizes are used because the relevant information in an image can vary hugely in size and location, which makes it very difficult to choose a single right filter size: a small filter captures information from a small area, while a bigger filter captures a bigger unit of information.

The GoogLeNet architecture consists of Inception blocks in which 1x1, 3x3, and 5x5 convolutional layers and a 3x3 max-pooling layer with padding (to make the output the same shape as the input) are applied to the previous layer, and their outputs are concatenated.

SqueezeNet

It aims for smaller CNNs so that there is less communication across servers during distributed training. The changes it makes to the AlexNet architecture are as follows:

Replace 3x3 filters with 1x1 filters to reduce the number of parameters.
Downsample later in the architecture so that the convolutional layers have large activation maps.

It squeezes the features with squeeze layers consisting of 1x1 convolutional layers and then expands them with a combination of 1x1 and 3x3 convolutional layers. Each squeeze-expand pair placed together is known as a fire module.

Xception

The convolutional layer, the basic building block of all CNNs, involves a convolution operation. Each convolution operation slides a filter over all the patches of the input pixel array, and each time, the number of multiplications performed is equal to the number of elements in the filter.

In standard convolution, filtering across all input channels and the combination of these values are done in a single step. Depthwise separable convolutions, proposed in the Xception architecture, break this operation down into two parts: a depthwise convolution and a pointwise convolution.

MobileNets

MobileNets use depthwise separable convolutions to build lightweight deep Neural Networks. They are very small, low-latency models used for applications like robots, self-driving cars, etc. They are considered best for mobile devices, hence the name MobileNets.

In a simple CNN structure, a filter is superimposed on a block of the input image, and the dot product is calculated between the two overlapping components; the details inside one channel are computed together with the relationships between different channels. Instead of having one large filter, MobileNets have two: one goes through one channel at a time to detect how all the pixels in a channel are related, and the other goes through all the channels at the same position to see how the channels of one pixel are related to each other. A sketch of a depthwise separable convolution is shown below.

💡 Pro tip: Check out V7's pre-trained model (VoVNet) for object detection, which outperforms many state-of-the-art architectures.
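Here is a minimal PyTorch sketch of a depthwise separable convolution of the kind used in Xception and MobileNets: a per-channel (depthwise) 3x3 convolution followed by a 1x1 (pointwise) convolution that mixes channels. The channel counts are illustrative; the parameter comparison at the end shows why this decomposition is lighter than a standard convolution.

```python
import torch
import torch.nn as nn

in_ch, out_ch = 32, 64
depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)  # one filter per channel
pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)                          # mixes channels per pixel

x = torch.randn(1, in_ch, 56, 56)
y = pointwise(depthwise(x))
print(y.shape)                                   # -> torch.Size([1, 64, 56, 56])

# Parameter comparison against a standard 3x3 convolution:
standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(depthwise) + count(pointwise), "vs", count(standard))  # far fewer parameters
```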
Capsule Networks

There were some problems with Convolutional Neural Networks. They are trained on images so that the lower layers learn about edges and curvatures and, as we go up the hierarchy, the network learns more complex features. However, sub-sampling or pooling loses spatial relationships.

To help you understand it better: it's not enough for the model to learn that the image contains a nose, eyes, or a mouth; it should also understand that the eyes are above the nose, and the nose is between the eyes and above the mouth, right?

You see, Convolutional Neural Networks perform poorly at detecting an object in a different position, for example rotated; it has to be in a position similar to the images they were trained on. And this is a problem.

Instead of invariance, the network should strive for equivariance: its internal representation should change in a predictable way when the input is translated or rotated, so that it can still recognize the object no matter what position or rotation it is in.

In simple words: we need a network that generalizes more easily.

Here's the main idea: Artificial Neural Networks should handle translation and rotation in a much more efficient way. These networks should have local capsules that perform complex internal computations on their inputs and then encapsulate the results into a small vector of highly informative outputs.

Now, try to keep this in mind and start thinking about using a capsule instead of a neuron. Sounds interesting, right?

Instead of adding a layer, a Capsule Network nests a new layer inside a layer. This nested layer is called a capsule, which is a group of neurons. Instead of making the structure deeper in terms of layers, it nests another layer within the same layer, which makes the model more robust.

Generative Adversarial Network (GAN)

Generative modeling comes under the umbrella of unsupervised learning, where new, synthetic data is generated based on the patterns discovered in the input data. A GAN is a generative model used to generate entirely new synthetic data by learning these patterns, and it is an active area of AI research.

GANs have two components, a generator and a discriminator, that work in a competitive fashion.

The generator's job is to create synthetic data based on the features it learns during training. It takes random noise as input and returns a generated image after performing certain transformations. The discriminator acts as a critic: it has an overall idea of the problem domain and of what generated images look like, and it classifies generated images as fake or genuine. The discriminator returns a probabilistic prediction in the range of 0 to 1, where 1 means an authentic image and 0 a fake one.

The generator network produces samples based on its learning. Its adversary, the discriminator, strives to distinguish between samples from the training data and samples produced by the generator, and its feedback is fed to the generator to improve the generator's performance. When the discriminator successfully distinguishes between real and fake examples, it is working well and no changes need to be applied to its parameters. The generator is penalized when it fails to generate images convincing enough to fool the discriminator; if it succeeds in making the discriminator classify a generated image as real, the training of the generator is moving in the right direction. So the ultimate aim of the generator is to fool the discriminator, while the discriminator's aim is to tell the generated samples apart from the real ones. A minimal sketch of one training step is shown below.

GANs are used in scenarios like predicting the next frame in a video, text-to-image generation, image-to-image translation such as style transfer, image denoising, etc.
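The following PyTorch sketch shows one generator step and one discriminator step of the adversarial game described above, on toy 1-D data. The network sizes, learning rates, and fake "real" distribution are all illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(64, data_dim) + 3.0             # pretend "real" samples
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Discriminator step: label real data 1 and generated data 0.
fake = G(torch.randn(64, latent_dim)).detach()
d_loss = bce(D(real), ones) + bce(D(fake), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: it is penalized when the discriminator calls its samples fake,
# so it tries to make D output 1 for generated data.
fake = G(torch.randn(64, latent_dim))
g_loss = bce(D(fake), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```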
Transformer Neural Networks

The truth is, RNNs are slow and take too much time to train. They are not very good with long sequences of data and suffer from vanishing gradients. LSTMs, which were introduced to bring memory into RNNs, are even slower to train. For both RNNs and LSTMs, we need to feed the data sequentially, which does not make good use of GPUs.

How do we parallelize training on sequential data? The answer is Transformers.

These networks employ an encoder-decoder structure, with the difference that the input data can be passed in parallel. In an RNN, one word at a time is passed through the input layer, but in Transformers there is no concept of timestamps for passing the input: we feed the complete sentence together and get the embeddings for all the words together.

How do Transformer Neural Networks do this? Consider an example of English-to-French translation.

Input embeddings: Computers don't understand words; they understand numbers and vectors. Each word is mapped to a point in a space called the embedding space, and a pre-trained embedding space is used to map a word to a vector. The same word in a different sentence can have a different meaning.

Positional encoder: a vector that gives context based on the position of the word in the sentence. So: input embeddings + positional encoding = input embeddings with context information.

We pass this into an encoder block, where it goes through a multi-head attention layer and a feed-forward layer. The attention layer determines which parts of the input sentence the model should focus on.

During training, the corresponding French sentence embeddings are fed to the decoder, which has three main components. Self-attention blocks generate attention vectors for every word in the sentence, representing how much each word is related to every other word in the same sentence. These attention vectors and the encoder's vectors are passed into another attention block called the encoder-decoder attention block. This block determines how related each word vector is to the others, and this is where the English-to-French mapping occurs.

A big change in architecture was proposed here: RNNs had the drawbacks of not using parallel computing and of losing critical information across the sequenced, timestamped data. In contrast, Transformers are based on attention, which takes in all the sequential data in a single step and uses a self-attention mechanism at the core of the architecture to preserve important information. A sketch of scaled dot-product self-attention is shown below.
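To illustrate the self-attention mechanism just described, here is a minimal single-head, scaled dot-product attention sketch in PyTorch. The projection matrices and sentence length are toy stand-ins; a real Transformer uses multiple heads plus layer normalization and feed-forward layers around this core.

```python
import torch
import torch.nn.functional as F

def self_attention(x, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a whole sentence at once:
    every token attends to every other token, so there is no sequential loop."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)   # how related each word is to each word
    weights = F.softmax(scores, dim=-1)                       # attention vectors
    return weights @ v

seq_len, d_model = 5, 16                     # toy sizes for a 5-word sentence
x = torch.randn(seq_len, d_model)            # word embeddings + positional encoding
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)
print(self_attention(x, W_q, W_k, W_v).shape)   # -> torch.Size([5, 16])
```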
BERT

BERT (Bidirectional Encoder Representations from Transformers) outperforms LSTMs. These models are faster, as the words can be processed simultaneously, and the context of words is learned better, since they learn from both directions at once. If we stack the Transformer encoders, we get the BERT model.

Learning strategies of BERT:

Masked Language Modeling: BERT takes input sentences and replaces some random words with [MASK] tokens. The goal of the model is to predict the original words at the masked positions based on the context provided by the other, non-masked, words in the sequence. The model calculates the probability of each word in the vocabulary with the softmax function. This helps BERT understand bidirectional context within a sentence.

Next Sentence Prediction: In this case, BERT takes two sentences as input and determines whether the second sentence actually follows the first. This helps BERT understand context across different sentences. To help the model distinguish between the two sentences in training, the input is processed in the following way before entering the model: a [CLS] token is inserted at the beginning of the first sentence, and a [SEP] token is inserted at the end of each sentence; a sentence embedding indicating Sentence A or Sentence B is added to each token so the model can tell the sentences apart; and a positional embedding is added to each token to indicate its position in the sequence, which helps the model keep track of position when learning from both directions.

When training the BERT model, Masked LM and Next Sentence Prediction are trained together to minimize the combined loss function of the two strategies and obtain a good understanding of the language.

GPT, GPT-2, GPT-3

GPT (Generative Pre-Training) is a language model used to predict the probability of a sequence of words. Language models trained generatively do not require human-labeled data. GPT-1 has two training steps: unsupervised pre-training on unlabeled data with a language-modeling objective, followed by supervised fine-tuning on the target task without requiring a separate task-specific model. GPT uses the Transformer decoder architecture.

With GPT-2, the purpose of the model shifted more towards text generation. It is an autoregressive language model: it is trained on an input sequence, and its target is to predict the next token at each point of the sequence. It consists of a single stack of Transformer blocks with an attention mechanism; it has slightly lower dimensionality than BERT, with more Transformer blocks (48) and a larger sequence length.

The basic structure of GPT-3 is similar to that of GPT-2, with the differences that it has more Transformer blocks (96) and is trained on more data. The input sequence size also doubled compared to GPT-2. At the time of its release, it was by far the largest neural network architecture, containing the most parameters of any model. A sketch of the autoregressive next-token objective is shown below.
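The following PyTorch sketch illustrates the autoregressive language-modeling objective used by GPT-style models: at every position, the target is simply the next token of the same sequence. The vocabulary, embedding, and linear "model" are toy stand-ins; a real GPT would run masked self-attention blocks between the embedding and the output logits.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
to_logits = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 10))        # one toy sequence of 10 token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]       # predict token t+1 from tokens up to t

logits = to_logits(embed(inputs))                     # a real GPT applies masked self-attention here
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(float(loss))                                    # ~log(vocab_size) for an untrained model
```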
Momentum Contrast (MoCo)

The idea behind this model is that unsupervised pre-training can surpass its supervised counterpart in computer vision tasks like detection or segmentation. We have already seen that models like BERT and GPT, which are based on unsupervised learning, have been a huge success in the NLP domain. In NLP tasks, a model is given an input sentence and is required to predict one or more following words; if we have a dictionary of all possible words, we can define the loss as a simple dictionary look-up problem.

Now, let's say an image is passed through an encoder; the encoded feature of the image can be called a query. The dictionary in this case is a set of features of a large set of images. Such a dictionary is very hard to create, as images and their corresponding features are not readily available, so a dynamic dictionary is prepared by applying the encoder model to a set of images. This methodology is called Contrastive Learning: images are encoded into a representation space in which pairwise affinities are computed.

MoCo addresses two challenges in Contrastive Learning: how to make the dynamic dictionary large enough, and how to keep it consistent while the encoder is being updated.

To make a large dictionary in the Contrastive Learning framework, the features of previous batches of images are maintained as a queue. The dictionary consists of the current and prior batches and is therefore not limited by the batch size. However, the features in this dictionary are produced by an encoder that is constantly being updated, which reduces the overall consistency of the dictionary. To solve this consistency problem, a momentum encoder is introduced that gets updated slowly.

SimCLR

In SimCLR, Contrastive Learning is combined with strong data augmentation, larger batch sizes, more training epochs, and wider networks. SimCLR strongly augments the unlabeled training data and feeds it through a standard ResNet architecture followed by a small neural network.

The images are passed to a base encoder to get embeddings, and these embeddings are passed through a two-layer neural network head to get another set of embeddings. A modified version of cross-entropy is used which says that the similarity between two embeddings should be high if they form an image and augmented-image pair; in other words, those embeddings should attract. On the other hand, embeddings of images that do not form such a pair should repel.

Summary

Each Neural Network architecture has its own set of pros and cons. Standard Neural Networks like Feed-Forward Neural Networks are most commonly used for classification and regression problems on simple structured data. Recurrent Neural Networks are much more powerful at remembering information for a long time and are used for sequential data like text, audio, and video. Recent research shows that Transformers based on the attention mechanism outperform RNNs and have almost replaced them in every field. For complex data like images, we can use ConvNets for classification tasks, while Generative Adversarial Networks perform best for generating images or for style-transfer-related tasks.

💡 Read more:
The Beginner's Guide to Self-Supervised Learning
Overfitting vs. Underfitting: What's the Difference?
The Beginner's Guide to Deep Reinforcement Learning [2022]
The Complete Guide to Ensemble Learning
A Newbie-Friendly Guide to Transfer Learning
The Essential Guide to Zero-Shot Learning [2022]
Supervised vs. Unsupervised Learning: What's the Difference?
The Complete Guide to CVAT - Pros & Cons [2022]
5 Alternatives to Scale AI
YOLO: Real-Time Object Detection Explained


