Artificial Neural Networks for Total Beginners | by Rich Stureborg


Published in Towards Data Science

Easy and Clear Explanation of Neural Nets (with Pictures!)

Machine learning drives much of the technology we interact with nowadays, with applications in everything from search results on Google to ETA prediction on the road to tumor diagnosis. But despite its clear importance to our everyday lives, most of us are left wondering how this stuff works. We might have heard the term "artificial neural network," but what does that really mean? Is it a robot that thinks in the same way a human does? Is it a supercomputer owned by Apple? Or is it just a fancy math equation?

Machine learning actually covers everything from simple decision trees (similar to the ones you made in your Intro to Business Management course) to neural networks, the complex algorithms that mimic the function of a brain. This article will dive into neural networks, since they are what's behind most of the very impressive machine learning these days.

First, an Illustrative Example

To understand what machine learning is, consider the task of trying to predict the height of a tree based on the soil content in the ground. Now, since this is machine learning we are talking about, let's assume we can get some really good data on this task: thousands of soil samples from all over the world.

There are a lot of measurements you can make on soil contents. Things like moisture levels, iron levels, grain size, acidity, etc. They all have some effect on the health of a tree and how tall it grows. So let's say that we examine thousands of trees in the world (all of the same kind, of course) and collect both data about their soil contents as well as the trees' heights. We have just created a perfect dataset for machine learning, with both features (the soil contents) and labels (the heights). Our goal is to predict the labels using the features.

That definitely seems like a daunting task. Even if there is a relationship between soil contents and tree height, it certainly seems impossible to make accurate predictions, right? Well, machine learning isn't always perfectly analogous to how our brains work, even if neural networks are modeled from brains.
The important thing to remember is that these models aren't making wild guesses as we humans might. Instead, they come up with exact equations that determine their predictions.

Let's start by simplifying the problem a bit. It's quite easy to imagine that a single feature like moisture will have a significant effect on tree height. Too dry, and the tree won't grow; too moist, and the roots may rot. We could make an equation based on this single measurement, but it wouldn't be very accurate, because many more factors go into the growth of a tree.

See how the hypothetical relationship above is not a great estimate? The line follows the general trend of the dots, but if that's what you use to make your predictions on height, you'll be wrong most of the time. Consider the case where there is a perfect amount of moisture, but the soil is way too acidic. The tree won't grow very well, but our model only considers moisture, so it will assume that it will. If we consider both measurements, however, we might get a more accurate prediction: we would only say that the tree will be very tall when both the moisture and acidity are at good levels, and if one or both of them are at bad levels, we would predict that the tree will be short.

So what if we consider more factors? We could look at the effect of moisture and acidity at the same time by combining the two relationships into one equation. Excellent. Now we have a more complex equation that describes the tree's height, and it considers two features (measurements). We can combine even more features to make an even more complex equation. For the sake of clarity, I will call the final, combined equation our "model": it models how the features affect height.

Combining simple equations like this into a multi-dimensional model is pretty straightforward, and we can create a very complex model pretty fast. But for every tweak you can make to one of the simple equations (choosing a slightly different curve for the relationship between height and moisture, say), there are now thousands if not millions of 'models' to try, all slightly different from one another. One of these models might be great at capturing the relationship between soil content and height, but most are probably really bad at it.

This is where machine learning comes in. It will create a model composed of many simpler equations, and then test how well it works.
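The idea of combining simple per-feature equations into one model can be sketched in a few lines of Python. The curve shapes, peak values, and the `max_height` scale here are all made up for illustration; they are not from the article's data:

```python
import math

# Each simple equation maps one soil measurement to a "growth score" in (0, 1],
# and the combined model multiplies them into one height prediction.

def moisture_effect(moisture):
    # Hypothetical curve: peak growth near 50% moisture; too dry or too wet hurts.
    return math.exp(-((moisture - 50.0) / 20.0) ** 2)

def acidity_effect(ph):
    # Hypothetical curve: peak growth near a mildly acidic pH of 6.5.
    return math.exp(-((ph - 6.5) / 1.5) ** 2)

def model(moisture, ph, max_height=100.0):
    # Combine the two simple equations into one two-feature model.
    return max_height * moisture_effect(moisture) * acidity_effect(ph)

def error(predicted, actual):
    # "How wrong the prediction is" -- here just the squared difference.
    return (predicted - actual) ** 2

# Perfect moisture but very acidic soil -> the model predicts a short tree,
# which a moisture-only model would miss entirely.
print(model(50, 6.5))  # both factors ideal: near the maximum height
print(model(50, 4.0))  # acidity drags the prediction way down
```

Every tweak the text mentions corresponds to nudging one of these curves (its peak position, width, or scale), which is exactly the space of models the machine has to search.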
Based on its error (that is, how wrong the predictions are), it then tweaks the simpler equations only slightly, and tests how well that version works. When it tweaks the simpler equations, it is simply altering one of the graphs in the image above to look slightly different. It may shift a graph to the right or up and down, or it could slightly elongate the peaks or increase the size of the valleys. Through a process similar to evolution, it will arrive at the best, or at least a good, solution. In fact, that's why it's called "machine learning": the machine learns the pattern on its own, without humans having to tell it even simple information like "moisture is good for trees." If you're curious about how a machine learning model picks the next combination of equations, you should read further about model training. Specifically, the concepts to master are stochastic gradient descent and backpropagation.

Side note: if you ever studied Fourier series at university, they are a useful analogy for a neural network. In school, we learn that you can create complex waves, like a square wave, using a combination of simple sine waves. Well, we can also create a machine learning model from many simple equations in a similar fashion.

What are the Components of a Neural Network?

Neural networks are specifically designed based on the inner workings of biological brains. These models imitate the functions of interconnected neurons by passing input features through several layers of what are referred to as perceptrons (think 'neurons'), each transforming the input using a set of functions. This section will explain the components of a perceptron, the smallest component of a neural network.

The structure of a perceptron

A perceptron (above) is typically made up of three main math operations: scalar multiplication, a summation, and then a transformation using a distinct equation called an activation function. Since a perceptron represents a single neuron in the brain, we can put together many perceptrons to represent a brain. That would be called a neural network, but more on that later.

Input

The inputs are simply the measures of our features. For a single soil sample, this would be an array of values, one per measurement. For example, we may have an input of [58, 1.3, 11], representing 58% moisture, 1.3 mm grain size, and 11 micrograms of iron per kg of soil weight. These inputs are what will be modified by the perceptron.
Weights

Weights represent scalar multiplications. Their job is to assess the importance of each input, as well as its directionality. For example, does more iron contribute a lot or a little to height? Does it make the tree taller or shorter? Getting these weights right is a very difficult task, and there are many different values to try. Let's say we tried values for all three weights at 0.1 increments over the range -10 to 10, and the weights that showed the best results were w0 = 0.2, w1 = 9.6, and w2 = -0.9. Notice that these weights don't have to add up to 100. The important thing is how large they are and in what direction, compared to one another. If we then multiply these weights by the inputs we had from before, we get the following result:

These values will then be passed on to the next component of the perceptron, the transfer function.

Transfer Function

The transfer function is different from the other components in that it takes multiple inputs. The job of the transfer function is to combine multiple inputs into one output value so that the activation function can be applied. This is usually done with a simple summation of all the inputs to the transfer function. On its own, this scalar value is supposed to represent some information about the soil content. It has already factored in the importance of each measurement, using the weights. Now it is a single value that we can actually use. You can almost think of it as an arbitrary weighted index of the soil's components. If we have a lot of these indexes, it might become easier to predict tree height using them. Before the value is sent out of the perceptron as the final output, however, it is transformed using an activation function.

Activation Function

An activation function transforms the number from the transfer function into a value that dramatizes the input. Oftentimes, the activation function is non-linear. If you haven't taken linear algebra at university, you might think that non-linear means the function doesn't look like a line, but it's a bit more complicated than that. For now, just remember that introducing non-linearity to the perceptron keeps the output from varying linearly with the inputs, and therefore allows for greater complexity in the model. Below are two common activation functions.
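Both can be written in a couple of lines of Python, applied here to the weighted sum from the Weights section above (0.2 · 58 + 9.6 · 1.3 + (-0.9) · 11 = 14.18); this is only a sketch of the math, not a library implementation:

```python
import math

def relu(z):
    # ReLU: negative inputs become zero, positive inputs pass through.
    return max(0.0, z)

def sigmoid(z):
    # Sigmoid: squashes any input into (0, 1), changing fastest around zero.
    return 1.0 / (1.0 + math.exp(-z))

# The weighted sum of the example inputs and weights from the text.
z = 0.2 * 58 + 9.6 * 1.3 + (-0.9) * 11

print(relu(z))       # about 14.18: positive, so ReLU leaves it unaffected
print(relu(-z))      # 0.0: negative, so ReLU clips it to zero
print(sigmoid(0.0))  # exactly 0.5, the midpoint of the sigmoid
```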
ReLU is a simple function that compares zero with the input and picks the maximum. That means that any negative input comes out as zero, while positive inputs are unaffected. This is useful in situations where negative values don't make much sense, or for removing linearity without having to do any heavy computations.

The sigmoid function does a good job of separating values into different thresholds. It is particularly useful for values such as z-scores, where values toward the mean (zero) need to be looked at carefully, since a small change near the mean may significantly affect a specific behavior, but where values far from the mean probably indicate the same thing about the data. For example, if soil has lots and lots of moisture, a small addition of moisture probably won't affect tree height, but if it has a very average level of moisture, then removing some small amount of moisture could affect the tree height significantly. The sigmoid emphasizes the difference between values if they are closer to zero.

When you think of activation functions, just remember that they are non-linear functions that make the input more dramatic. That is, inputs closer to zero are typically affected more than inputs far away from zero. The function basically forces values like 4 and 4.1 to be much closer together, while values like 0 and 0.1 become more spread apart. The purpose of this is to allow us to pick more distinct decision boundaries. If, for example, we are trying to classify a tree as either "tall," "medium," or "short," values of 5 or -5 very obviously represent tall and short. But what about values like 1.5? Around these numbers it may be more difficult to determine a decision boundary, so by dramatizing the input it may be easier to split the three categories.

We pick an activation function before training our model, so the function itself is always the same. It is not one of the parameters we toggle when testing thousands of different models. That only happens to the weights. The output of the ReLU activation function will be:

Bias

Up until now, I have ignored one element of the perceptron that is essential to its success: an additional input of 1. This input always stays the same, in every perceptron. It is multiplied by a weight just like the other inputs are, and its purpose is to allow the value before the activation function to be shifted up and down, independent of the inputs themselves.
This allows the other weights (for the actual inputs, not the weight for the bias) to be more specific, since they don't have to also try to balance the total sum to be around 0. To be more specific, bias might shift graphs like the left graph into something like the right graph:

And that's it! We've now built a single perceptron, a model that imitates the brain's neuron. We also understand that while that sounds fancy, it really just means that we can create complex multi-dimensional equations by altering a few weights. As you saw, the components are surprisingly simple. In fact, they can all be summarized by the following equation:

From here on out, I will represent this equation (i.e. a single perceptron) with a green circle. All of the components we have seen so far (inputs, bias, weights, transfer function, and activation function) are present in every single green circle. An arrow pointing into a green circle represents an individual input node, and an arrow pointing out of it represents the final output value.

Multi-Layer Perceptrons

To represent a network of perceptrons, we simply plug the output of one into the input of another. We connect many of these perceptrons in chains, flowing from one end to the other. This is called a Multi-Layer Perceptron (MLP), and, as the name suggests, there are multiple layers of interconnected perceptrons. For simplicity, we will look at a fully-connected MLP, where every perceptron in one layer is connected to every perceptron in the next layer.

You might be wondering what a 'layer' is. A layer is just a row of perceptrons that are not connected to each other. Perceptrons in an MLP are connected to every perceptron in the layer before it and every perceptron in the layer after it, but not to any of the perceptrons within the same layer.

Let's look at an MLP with two input values, two hidden layers, and a single output value. Let's say the first hidden layer has two perceptrons and the second hidden layer has three. The perceptrons here will all take in the inputs (arrows pointing toward the circle), perform the operations described in the previous section, and then push the output forward (the arrow pointing out of the circle). This is done many times to create more and more complex equations, all considering the same information multiple times to make an accurate prediction.
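A minimal sketch of this structure in Python, using the perceptron equation from above with a ReLU activation. The weight and bias values here are placeholders for illustration, not weights from the article:

```python
def perceptron(inputs, weights, bias):
    # One perceptron: scalar multiplications, a summation (the transfer
    # function), a bias term, and then a ReLU activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, z)  # ReLU

def layer(inputs, layer_weights, layer_biases):
    # A layer is a row of perceptrons that all see the same inputs
    # but are not connected to each other.
    return [perceptron(inputs, w, b)
            for w, b in zip(layer_weights, layer_biases)]

def mlp(features, all_weights, all_biases):
    # Feed the outputs of each layer forward into the next layer.
    values = features
    for lw, lb in zip(all_weights, all_biases):
        values = layer(values, lw, lb)
    return values[0]  # the single output perceptron

# The 2 -> 2 -> 3 -> 1 network described above, with placeholder weights.
weights = [
    [[0.5, -1.2], [2.0, 0.3]],              # hidden layer 1 (2 perceptrons)
    [[1.0, 0.5], [-0.4, 0.9], [0.2, 0.2]],  # hidden layer 2 (3 perceptrons)
    [[0.3, 0.6, -0.1]],                     # output layer (1 perceptron)
]
biases = [[0.0, 0.0], [0.0, 0.0, 0.0], [0.0]]

prediction = mlp([58, 1.3], weights, biases)
print(prediction)
```

Note that every perceptron in one layer receives the full list of outputs from the previous layer, which is exactly what "fully-connected" means.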
Now, although this article is meant to remove "the magic" from neural networks, it is very difficult to explain why this helps make more accurate predictions. In fact, the method I am describing is often referred to as a "black box" approach, because we don't know why the equations it picks are important. It is currently an active area of research. What we can understand, however, is what the neural network is doing. That is as simple as following the weights through each and every perceptron.

The reason we call the layers between the input layer and the output layer "hidden" is that once the values are fed from the input, it doesn't serve us well to look at how they are transformed until they exit the last output node. This is because these intermediary values are never used to evaluate the performance of our model (i.e. getting error values for predictions made on sample data).

And that's really it. Combining many of these perceptrons helps us create even more sophisticated equations than a single perceptron can create. The output of an MLP like this is capable of making predictions on height using soil content measurements. Of course, picking the correct weights inside every single perceptron takes a lot of computational power, but this is exactly what a 'neural network' does.

Let's see it in Action!

Here I will take two measurements from before through an entire neural network. The structure will be the same as the network I showed above. This will be very tedious, but you may follow along if you wish. I will be ignoring the bias for the sake of simplicity.

Here are the values of the two features I will use. They represent 58% moisture and 1.3 mm grain size. I will use the following (random) weights and activation functions for each perceptron. Recall that the ReLU activation function turns negative values into 0 and does not transform positive values.

So let's get to it! The first two perceptrons both take the two inputs (blue), multiply them by the associated weights (yellow), add the products (purple), and then apply the ReLU function (green). These outputs become the inputs for each perceptron in the third layer. So every perceptron in the second hidden layer (there are three) will use 338.9 and 42 as inputs. Those perceptrons follow these equations:

For the next layer, however, notice that we now have three inputs, not two: 89.9, 16.22, and 0. All three inputs have to be
included in the equation of the last perceptron, and it will therefore have three weights (in yellow below). Its equation is still as straightforward as the others.

As a summary, here are the values each perceptron produced, given its inputs:

And there you have it! This neural network predicted a tree with a height of 165.72 feet. Now we have to compare the predicted result to the actual height of the sample tree in our data. Calculating some error value can be as straightforward as taking the difference between our predicted height and the actual height. Then we repeat this process with slightly different weights, over and over, until we find weights that predict tree height well for many samples. But that takes much too long for a human to do, so we need a machine to compute the optimal weights.

Important takeaways:

- The weights were totally random, to simulate the starting point of a neural network. This model is clearly not 'trained' and therefore won't do well once we put another sample into it. We would use the results above to determine how to alter the weights.
- The intermediary values don't tell us much at all. For example, the output from the top node in the first hidden layer is 338.9, but that's nowhere close to the value that the neural network predicted, 166 ft. It's important not to try to interpret the intermediary values as having a real-world meaning. This is why we call these layers 'hidden.'

That's it! Thanks for reading :) Consider giving the article a round of applause, it really helps me out! To learn more about machine learning, check out this article about CNNs, the model behind computer vision.
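As a final illustration, the repeat-with-slightly-different-weights process described above can be sketched as a naive random search. This is a stand-in for real training (which would use stochastic gradient descent and backpropagation, not random tweaks), and the samples and weight ranges here are made up:

```python
import random

def predict(features, weights):
    # A bias-free single perceptron with ReLU, standing in for the full
    # network just to keep the training loop readable.
    z = sum(w * x for w, x in zip(weights, features))
    return max(0.0, z)

def total_error(weights, samples):
    # Sum of squared differences between predicted and actual heights.
    return sum((predict(x, weights) - y) ** 2 for x, y in samples)

# Made-up (features, height) pairs: [moisture %, grain size mm, iron ug/kg].
samples = [([58, 1.3, 11], 80.0),
           ([40, 2.0, 9], 60.0),
           ([65, 0.9, 14], 90.0)]

random.seed(0)
weights = [random.uniform(-10, 10) for _ in range(3)]  # random starting point
best = start_error = total_error(weights, samples)

for _ in range(5000):
    # Tweak one weight slightly; keep the change only if the error drops.
    i = random.randrange(3)
    candidate = weights[:]
    candidate[i] += random.uniform(-0.1, 0.1)
    err = total_error(candidate, samples)
    if err < best:
        weights, best = candidate, err

print(start_error, "->", best)  # the error should shrink as weights improve
```

The loop mirrors the evolution analogy from earlier: propose a small mutation, keep it if it helps, discard it if it doesn't.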


