{{FeatureDetectionCompVisNavbox}}

'''Scale-invariant feature transform''' (or '''SIFT''') is an algorithm in [[computer vision]] to detect and describe local features in images. The algorithm was published by [[David Lowe (computer scientist)|David Lowe]] in 1999.<ref name="Lowe1999"/>

Applications include [[object recognition]], [[robotic mapping]] and navigation, [[image stitching]], [[3D modeling]], [[gesture recognition]], [[video tracking]], individual identification of wildlife and [[match moving]].

The algorithm is patented in the US; the owner is the [[University of British Columbia]].<ref name="patent"/>

== Overview ==
{{technical|date=October 2010}}

For any object in an image, interesting points on the object can be extracted to provide a "feature description" of the object. This description, extracted from a training image, can then be used to identify the object when attempting to locate the object in a test image containing many other objects. To perform reliable recognition, it is important that the features extracted from the training image be detectable even under changes in image scale, noise and illumination. Such points usually lie on high-contrast regions of the image, such as object edges.

Another important characteristic of these features is that the relative positions between them in the original scene should not change from one image to another. For example, if only the four corners of a door were used as features, they would work regardless of the door's position; but if points in the frame were also used, the recognition would fail if the door is opened or closed. Similarly, features located in articulated or flexible objects would typically not work if any change in their internal geometry happens between two images in the set being processed. However, in practice SIFT detects and uses a much larger number of features from the images, which reduces the contribution of the errors caused by these local variations to the average error of all feature-matching errors.

Lowe's patented method<ref name=patent/> can robustly identify objects even among clutter and under partial occlusion, because his SIFT feature descriptor is invariant to [[Scaling (geometry)|uniform scaling]], [[Orientation (geometry)|orientation]], and partially invariant to [[Affine transformation|affine distortion]] and illumination changes.<ref name="Lowe1999"/> This section summarizes Lowe's object recognition method and mentions a few competing techniques available for object recognition under clutter and partial occlusion.

=== David Lowe's method ===
SIFT keypoints of objects are first extracted from a set of reference images<ref name=Lowe1999 /> and stored in a database. An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on [[Euclidean distance]] of their feature vectors. From the full set of matches, subsets of keypoints that agree on the object and its location, scale, and orientation in the new image are identified to filter out good matches. The determination of consistent clusters is performed rapidly by using an efficient [[hash table]] implementation of the generalized [[Hough transform]]. Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed model verification and subsequently outliers are discarded. Finally the probability that a particular set of features indicates the presence of an object is computed, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.<ref name="Lowe2004"/>

{| class="wikitable"
|-
! Problem
! Technique
! Advantage
|-
| key localization / scale / rotation
| DoG / scale-space pyramid / orientation assignment
| accuracy, stability, scale & rotational invariance
|-
| geometric distortion
| blurring / resampling of local image orientation planes
| affine invariance
|-
| indexing and matching
| nearest neighbor / Best Bin First search
| efficiency / speed
|-
| cluster identification
| Hough transform voting
| reliable pose models
|-
| model verification / outlier detection
| linear least squares
| better error tolerance with fewer matches
|-
| hypothesis acceptance
| Bayesian probability analysis
| reliability
|}

===Key stages===

====Scale-invariant feature detection====

Lowe's method for image feature generation transforms an image into a large collection of feature vectors, each of which is invariant to image translation, scaling, and rotation, partially invariant to illumination changes, and robust to local geometric distortion. These features share similar properties with neurons in [[inferior temporal cortex]] that are used for object recognition in primate vision.<ref name="Serre2005"/> Key locations are defined as maxima and minima of the result of a [[difference of Gaussians]] function applied in [[scale space]] to a series of smoothed and resampled images. Low-contrast candidate points and edge response points along an edge are discarded. Dominant orientations are assigned to localized keypoints. These steps ensure that the keypoints are more stable for matching and recognition. SIFT descriptors robust to local affine distortion are then obtained by considering pixels around a radius of the key location, blurring, and resampling of local image orientation planes.

====Feature matching and indexing====
Indexing consists of storing SIFT keys and identifying matching keys from the new image. Lowe used a modification of the [[k-d tree]] algorithm called the '''[[Best Bin First|Best-bin-first]] search''' method<ref name="Beis1997"/> that can identify the [[nearest neighbor search|nearest neighbors]] with high probability using only a limited amount of computation. The BBF algorithm uses a modified search ordering for the [[k-d tree]] algorithm so that bins in feature space are searched in the order of their closest distance from the query location. This search order requires the use of a [[heap (data structure)|heap]]-based [[priority queue]] for efficient determination of the search order. The best candidate match for each keypoint is found by identifying its nearest neighbor in the database of keypoints from training images. The nearest neighbors are defined as the keypoints with minimum [[Euclidean distance]] from the given descriptor vector. The probability that a match is correct can be determined by taking the ratio of distance from the closest neighbor to the distance of the second closest.

Lowe<ref name=Lowe2004 /> rejected all matches in which the distance ratio is greater than 0.8, which eliminates 90% of the false matches while discarding less than 5% of the correct matches. To further improve the efficiency of the best-bin-first algorithm, search was cut off after checking the first 200 nearest-neighbor candidates. For a database of 100,000 keypoints, this provides a speedup over exact nearest-neighbor search by about 2 orders of magnitude, yet results in less than a 5% loss in the number of correct matches.

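The distance-ratio criterion is straightforward to express in code. Below is a minimal Python/NumPy sketch of it, with brute-force distances standing in for the approximate best-bin-first search; function and variable names are illustrative:

<syntaxhighlight lang="python">
import numpy as np

def ratio_test_matches(query, train, ratio=0.8):
    """Match query descriptors to database descriptors with the ratio test.

    query: (n, 128) array of SIFT descriptors from the new image.
    train: (m, 128) array of descriptors from the training images.
    Returns (query_index, train_index) pairs that pass the test.
    """
    matches = []
    for i, d in enumerate(query):
        # Euclidean distance to every database descriptor (brute force;
        # Lowe uses best-bin-first search on a k-d tree instead).
        dists = np.linalg.norm(train - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        # Keep the match only if the nearest neighbor is clearly better
        # than the second nearest (distance ratio not greater than 0.8).
        if dists[nearest] <= ratio * dists[second]:
            matches.append((i, nearest))
    return matches
</syntaxhighlight>
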
====Cluster identification by Hough transform voting====
The [[Hough transform]] is used to cluster reliable model hypotheses to search for keys that agree upon a particular model [[Pose (computer vision)|pose]]. The Hough transform identifies clusters of features with a consistent interpretation by using each feature to vote for all object poses that are consistent with the feature. When clusters of features are found to vote for the same pose of an object, the probability of the interpretation being correct is much higher than for any single feature. An entry in a [[hash table]] is created predicting the model location, orientation, and scale from the match hypothesis. The hash table is searched to identify all clusters of at least 3 entries in a bin, and the bins are sorted into decreasing order of size.

Each of the SIFT keypoints specifies 2D location, scale, and orientation, and each matched keypoint in the database has a record of its parameters relative to the training image in which it was found. The similarity transform implied by these 4 parameters is only an approximation to the full 6 degree-of-freedom pose space for a 3D object and also does not account for any non-rigid deformations. Therefore, Lowe<ref name=Lowe2004 /> used broad bin sizes of 30 degrees for orientation, a factor of 2 for scale, and 0.25 times the maximum projected training image dimension (using the predicted scale) for location. The SIFT key samples generated at the larger scale are given twice the weight of those at the smaller scale. This means that the larger scale is in effect able to filter the most likely neighbours for checking at the smaller scale. This also improves recognition performance by giving more weight to the least-noisy scale. To avoid the problem of boundary effects in bin assignment, each keypoint match votes for the 2 closest bins in each dimension, giving a total of 16 entries for each hypothesis and further broadening the pose range.

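A simplified sketch of this binning scheme follows; it votes only for the single containing bin, whereas Lowe votes for the 2 closest bins in each dimension, and the pose-tuple layout and names are assumptions made for illustration:

<syntaxhighlight lang="python">
import numpy as np
from collections import defaultdict

def hough_pose_clusters(pose_hypotheses, max_dim):
    """Cluster pose hypotheses with the broad bins described above:
    30 degrees for orientation, a factor of 2 for scale, and
    0.25 * max_dim for location.

    pose_hypotheses: list of (x, y, scale, theta) tuples, theta in degrees.
    max_dim: maximum projected training image dimension.
    """
    bins = defaultdict(list)
    for k, (x, y, scale, theta) in enumerate(pose_hypotheses):
        key = (int(np.floor(x / (0.25 * max_dim))),
               int(np.floor(y / (0.25 * max_dim))),
               int(np.floor(np.log2(scale))),   # factor-of-2 scale bins
               int((theta % 360.0) // 30))      # 30-degree orientation bins
        bins[key].append(k)
    # Only clusters of at least 3 consistent matches go on to verification.
    return {key: idx for key, idx in bins.items() if len(idx) >= 3}
</syntaxhighlight>
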
====Model verification by linear least squares====
Each identified cluster is then subject to a verification procedure in which a [[linear least squares (mathematics)|linear least squares]] solution is performed for the parameters of the [[affine transformation]] relating the model to the image. The [[affine transformation]] of a model point [x y]<sup>T</sup> to an image point [u v]<sup>T</sup> can be written as below:

:<math>
\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}
</math>

where the model translation is [t<sub>x</sub> t<sub>y</sub>]<sup>T</sup> and the affine rotation, scale, and stretch are represented by the parameters m<sub>1</sub>, m<sub>2</sub>, m<sub>3</sub> and m<sub>4</sub>. To solve for the transformation parameters, the equation above can be rewritten to gather the unknowns into a column vector:

:<math>
\begin{bmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & \vdots & & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} = \begin{bmatrix} u \\ v \\ \vdots \end{bmatrix}
</math>

This equation shows a single match, but any number of further matches can be added, with each match contributing two more rows to the first and last matrix. At least 3 matches are needed to provide a solution.

We can write this linear system as

:<math>A\hat{\mathbf{x}} \approx \mathbf{b},</math>

where ''A'' is a known ''m''-by-''n'' [[Matrix (mathematics)|matrix]] (usually with ''m'' > ''n''), '''x''' is an unknown ''n''-dimensional parameter [[vector space|vector]], and '''b''' is a known ''m''-dimensional measurement vector.

Therefore, the minimizing vector <math>\hat{\mathbf{x}}</math> is a solution of the '''normal equation'''

:<math> A^T \! A \hat{\mathbf{x}} = A^T \mathbf{b}. </math>

The solution of the system of linear equations is given in terms of the matrix <math>(A^TA)^{-1}A^T</math>, called the [[Moore-Penrose pseudoinverse|pseudoinverse]] of ''A'', by

:<math> \hat{\mathbf{x}} = (A^T\!A)^{-1} A^T \mathbf{b}, </math>

which minimizes the sum of the squares of the distances from the projected model locations to the corresponding image locations.

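A minimal NumPy sketch of this fit, which builds the stacked system above and solves it with a standard least-squares routine (equivalent to applying the pseudoinverse):

<syntaxhighlight lang="python">
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine transformation from model to image points.

    model_pts, image_pts: (n, 2) arrays of corresponding points, n >= 3.
    Returns (M, t) such that image_pt ~= M @ model_pt + t.
    """
    n = len(model_pts)
    A = np.zeros((2 * n, 6))
    b = np.asarray(image_pts, dtype=float).reshape(-1)  # [u1, v1, u2, v2, ...]
    for i, (x, y) in enumerate(model_pts):
        A[2 * i] = [x, y, 0, 0, 1, 0]       # row producing u
        A[2 * i + 1] = [0, 0, x, y, 0, 1]   # row producing v
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p[:4].reshape(2, 2), p[4:]       # [[m1, m2], [m3, m4]], [tx, ty]
</syntaxhighlight>
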
====Outlier detection====
[[Outlier]]s can now be removed by checking for agreement between each image feature and the model, given the parameter solution. Given the [[linear least squares (mathematics)|linear least squares]] solution, each match is required to agree within half the error range that was used for the parameters in the [[Hough transform]] bins. As outliers are discarded, the linear least squares solution is re-solved with the remaining points, and the process iterated. If fewer than 3 points remain after discarding [[outlier]]s, then the match is rejected. In addition, a top-down matching phase is used to add any further matches that agree with the projected model position, which may have been missed from the [[Hough transform]] bin due to the similarity transform approximation or other errors.

The final decision to accept or reject a model hypothesis is based on a detailed probabilistic model.<ref name="Lowe2001"/> This method first computes the expected number of false matches to the model pose, given the projected size of the model, the number of features within the region, and the accuracy of the fit. A [[Bayesian probability]] analysis then gives the probability that the object is present based on the actual number of matching features found. A model is accepted if the final probability for a correct interpretation is greater than 0.98. Lowe's SIFT based object recognition gives excellent results except under wide illumination variations and under non-rigid transformations.

===Competing methods for scale invariant object recognition under clutter / partial occlusion===
RIFT<ref name="Lazebnik2004"/> is a rotation-invariant generalization of SIFT. The RIFT descriptor is constructed using circular normalized patches divided into concentric rings of equal width, and within each ring a gradient orientation histogram is computed. To maintain rotation invariance, the orientation is measured at each point relative to the direction pointing outward from the center.

G-RIF<ref name="Sungho2006"/> (Generalized Robust Invariant Feature) is a general context descriptor which encodes edge orientation, edge density and hue information in a unified form, combining perceptual information with spatial encoding. The object recognition scheme uses neighbouring context based voting to estimate object models.

[[SURF]]<ref name="Bay2006"/> (Speeded Up Robust Features) is a high-performance scale- and rotation-invariant interest point detector / descriptor claimed to approximate or even outperform previously proposed schemes with respect to repeatability, distinctiveness, and robustness. [[SURF]] relies on integral images for image convolutions to reduce computation time and builds on the strengths of the leading existing detectors and descriptors (using a fast [[Hessian matrix]]-based measure for the detector and a distribution-based descriptor). It describes a distribution of [[Haar wavelet]] responses within the interest point neighbourhood. Integral images are used for speed and only 64 dimensions are used, reducing the time for feature computation and matching. The indexing step is based on the sign of the [[Laplacian]], which increases the matching speed and the robustness of the descriptor.

PCA-SIFT<ref name="Ke2004"/> and [[GLOH]]<ref name="Mikolajczyk2005"/> are variants of SIFT. The PCA-SIFT descriptor is a vector of image gradients in the x and y directions computed within the support region. The gradient region is sampled at 39x39 locations, so the vector has dimension 3042. The dimension is reduced to 36 with [[Principal component analysis|PCA]]. Gradient location-orientation histogram ([[GLOH]]) is an extension of the SIFT descriptor designed to increase its robustness and distinctiveness. The SIFT descriptor is computed for a log-polar location grid with three bins in the radial direction (the radius set to 6, 11, and 15) and 8 in the angular direction, which results in 17 location bins. The central bin is not divided in angular directions. The gradient orientations are quantized in 16 bins, resulting in a 272-bin histogram. The size of this descriptor is reduced with [[Principal component analysis|PCA]]. The [[covariance matrix]] for [[Principal component analysis|PCA]] is estimated on image patches collected from various images. The 128 largest [[eigenvector]]s are used for description.

Wagner et al. developed two object recognition algorithms especially designed with the limitations of current mobile phones in mind.<ref name="Wagner2008"/> In contrast to the classic SIFT approach, Wagner et al. use the FAST [[Corner detection|corner detector]] for feature detection. The algorithm also distinguishes between the off-line preparation phase where features are created at different scale levels and the on-line phase where features are only created at the current fixed scale level of the phone's camera image. In addition, features are created from a fixed patch size of 15x15 pixels and form a SIFT descriptor with only 36 dimensions. The approach has been further extended by integrating a [[Scalable Vocabulary Tree]] in the recognition pipeline.<ref name="Henze2009"/> This allows the efficient recognition of a larger number of objects on mobile phones. The approach is mainly restricted by the amount of available [[Random-access memory|RAM]].

== Features ==
The detection and description of local image features can help in object recognition. The SIFT features are local and based on the appearance of the object at particular interest points, and are invariant to image scale and rotation. They are also robust to changes in illumination, noise, and minor changes in viewpoint. In addition to these properties, they are highly distinctive, relatively easy to extract and allow for correct object identification with low probability of mismatch. They are relatively easy to match against a (large) database of local features, but the high dimensionality can be an issue, and generally probabilistic algorithms such as [[k-d tree]]s with [[best bin first]] search are used. Object description by a set of SIFT features is also robust to partial occlusion; as few as 3 SIFT features from an object are enough to compute its location and pose. Recognition can be performed in close-to-real time, at least for small databases and on modern computer hardware.{{Citation needed|date=August 2008}}

== Algorithm ==

=== Scale-space extrema detection ===
This is the stage where the interest points, which are called keypoints in the SIFT framework, are detected. For this, the image is convolved with Gaussian filters at different scales, and then the differences of successive Gaussian-blurred images are taken. Keypoints are then taken as maxima/minima of the [[Difference of Gaussians]] (DoG) that occur at multiple scales. Specifically, a DoG image <math>D \left( x, y, \sigma \right)</math> is given by

:<math>D \left( x, y, \sigma \right) = L \left( x, y, k_i\sigma \right) - L \left( x, y, k_j\sigma \right),</math>

where <math>L \left( x, y, k\sigma \right)</math> is the convolution of the original image <math>I \left( x, y \right)</math> with the [[Gaussian blur]] <math>G \left( x, y, k\sigma \right)</math> at scale <math>k\sigma</math>, i.e.,

:<math>L \left( x, y, k\sigma \right) = G \left( x, y, k\sigma \right) * I \left( x, y \right)</math>

Hence a DoG image between scales <math>k_i\sigma</math> and <math>k_j\sigma</math> is just the difference of the Gaussian-blurred images at scales <math>k_i\sigma</math> and <math>k_j\sigma</math>. For [[scale space]] extrema detection in the SIFT algorithm, the image is first convolved with Gaussian-blurs at different scales. The convolved images are grouped by octave (an octave corresponds to doubling the value of <math>\sigma</math>), and the value of <math>k_i</math> is selected so that we obtain a fixed number of convolved images per octave. Then the Difference-of-Gaussian images are taken from adjacent Gaussian-blurred images per octave.

Once DoG images have been obtained, keypoints are identified as local minima/maxima of the DoG images across scales. This is done by comparing each pixel in the DoG images to its eight neighbors at the same scale and nine corresponding neighboring pixels in each of the neighboring scales. If the pixel value is the maximum or minimum among all compared pixels, it is selected as a candidate keypoint.

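The following toy Python sketch illustrates the procedure for a single octave, using SciPy's Gaussian filter; Lowe's implementation builds a full multi-octave pyramid with downsampling, which is omitted here for brevity:

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, sigma=1.6, k=2 ** (1.0 / 3.0), num_scales=5):
    """Detect DoG extrema within one octave (slow reference sketch)."""
    # Gaussian-blurred images at successive scales within the octave.
    L = [gaussian_filter(image.astype(float), sigma * k ** i)
         for i in range(num_scales)]
    # Difference-of-Gaussian images from adjacent blurred images.
    D = np.stack([L[i + 1] - L[i] for i in range(num_scales - 1)])
    keypoints = []
    for s in range(1, D.shape[0] - 1):
        for y in range(1, D.shape[1] - 1):
            for x in range(1, D.shape[2] - 1):
                patch = D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                # Candidate if it is the maximum or minimum of all
                # 26 neighbors in space and scale (plus itself).
                if D[s, y, x] in (patch.max(), patch.min()):
                    keypoints.append((x, y, s))
    return keypoints
</syntaxhighlight>
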
This keypoint detection step is a variation of one of the [[blob detection]] methods developed by Lindeberg by detecting scale-space extrema of the scale-normalized Laplacian,<ref name="Lindeberg1998"/> that is, detecting points that are local extrema with respect to both space and scale, in the discrete case by comparisons with the nearest 26 neighbours in a discretized scale-space volume. The difference of Gaussians operator can be seen as an approximation to the Laplacian, with the implicit normalization in the [[pyramid (image processing)|pyramid]] also constituting a discrete approximation of the scale-normalized Laplacian.<ref name="Lindeberg2012"/>

Another real-time implementation of scale-space extrema of the Laplacian operator has been presented by Lindeberg and Bretzner based on a hybrid pyramid representation.<ref>{{cite journal
 | author = Lindeberg, Tony and Bretzner, Lars
 | year = 2003
 | title = Real-time scale selection in hybrid multi-scale representations
 | journal = Proc. Scale-Space'03, Springer Lecture Notes in Computer Science
 | series = Lecture Notes in Computer Science
 | volume = 2695
 | pages = 148–163
 | doi = 10.1007/3-540-44935-3_11
 | isbn = 978-3-540-40368-5
 | url = http://www.nada.kth.se/cvap/abstracts/cvap279.html
}}</ref>

=== Keypoint localization ===
[[Image:Sift keypoints filtering.jpg|thumb|After scale space extrema are detected (their location being shown in the uppermost image) the SIFT algorithm discards low-contrast keypoints (remaining points are shown in the middle image) and then filters out those located on edges. The resulting set of keypoints is shown in the last image.]] Scale-space extrema detection produces too many keypoint candidates, some of which are unstable. The next step in the algorithm is to perform a detailed fit to the nearby data for accurate location, scale, and ratio of [[principal curvatures]]. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.

==== Interpolation of nearby data for accurate position ====
First, for each candidate keypoint, interpolation of nearby data is used to accurately determine its position. The initial approach was to just locate each keypoint at the location and scale of the candidate keypoint.<ref name=Lowe1999/> The new approach calculates the interpolated location of the extremum, which substantially improves matching and stability.<ref name=Lowe2004>{{cite journal
 | author = Lowe, David G.
 | year = 2004
 | title = Distinctive Image Features from Scale-Invariant Keypoints
 | journal = International Journal of Computer Vision
 | volume = 60
 | issue = 2
 | pages = 91–110
 | doi = 10.1023/B:VISI.0000029664.99615.94
 | url = http://citeseer.ist.psu.edu/lowe04distinctive.html
}}</ref> The interpolation is done using the quadratic [[Taylor expansion]] of the Difference-of-Gaussian scale-space function, <math>D \left( x, y, \sigma \right)</math>, with the candidate keypoint as the origin. This Taylor expansion is given by:

:<math>D(\textbf{x}) = D + \frac{\partial D^T}{\partial \textbf{x}}\textbf{x} + \frac{1}{2}\textbf{x}^T \frac{\partial^2 D}{\partial \textbf{x}^2} \textbf{x}</math>

where D and its derivatives are evaluated at the candidate keypoint and <math>\textbf{x} = \left( x, y, \sigma \right)</math> is the offset from this point. The location of the extremum, <math>\hat{\textbf{x}}</math>, is determined by taking the derivative of this function with respect to <math>\textbf{x}</math> and setting it to zero. If the offset <math>\hat{\textbf{x}}</math> is larger than <math>0.5</math> in any dimension, then that is an indication that the extremum lies closer to another candidate keypoint. In this case, the candidate keypoint is changed and the interpolation performed instead about that point. Otherwise the offset is added to its candidate keypoint to get the interpolated estimate for the location of the extremum. A similar subpixel determination of the locations of scale-space extrema is performed in the real-time implementation based on hybrid pyramids developed by Lindeberg and his co-workers.<ref name="Lindenberg2003"/>

==== Discarding low-contrast keypoints ====
To discard the keypoints with low contrast, the value of the second-order Taylor expansion <math>D(\textbf{x})</math> is computed at the offset <math>\hat{\textbf{x}}</math>. If this value is less than <math>0.03</math> in magnitude, the candidate keypoint is discarded. Otherwise it is kept, with final scale-space location <math>\textbf{y} + \hat{\textbf{x}}</math>, where <math>\textbf{y}</math> is the original location of the keypoint.

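A sketch of this refinement and the subsequent contrast test, with finite differences standing in for the analytic derivatives (the indexing convention D[s, y, x] and the assumption that D is normalized to [0, 1] are choices of this sketch):

<syntaxhighlight lang="python">
import numpy as np

def refine_keypoint(D, x, y, s, contrast_threshold=0.03):
    """Quadratic refinement of a DoG extremum plus low-contrast rejection.

    Returns (offset, value) or None if the candidate is rejected.
    """
    # Gradient of D at the sample point (central differences).
    g = 0.5 * np.array([D[s, y, x + 1] - D[s, y, x - 1],
                        D[s, y + 1, x] - D[s, y - 1, x],
                        D[s + 1, y, x] - D[s - 1, y, x]])
    # Hessian of D via finite differences.
    dxx = D[s, y, x + 1] - 2 * D[s, y, x] + D[s, y, x - 1]
    dyy = D[s, y + 1, x] - 2 * D[s, y, x] + D[s, y - 1, x]
    dss = D[s + 1, y, x] - 2 * D[s, y, x] + D[s - 1, y, x]
    dxy = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                  - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    dxs = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1]
                  - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
    dys = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x]
                  - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    offset = -np.linalg.solve(H, g)          # extremum of the Taylor expansion
    if np.any(np.abs(offset) > 0.5):
        return None    # true extremum lies closer to a neighboring sample
    value = D[s, y, x] + 0.5 * g @ offset    # interpolated DoG value
    if abs(value) < contrast_threshold:
        return None    # low contrast: discard
    return offset, value
</syntaxhighlight>
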
==== Eliminating edge responses ====
The DoG function will have strong responses along edges, even if the candidate keypoint is not robust to small amounts of noise. Therefore, in order to increase stability, we need to eliminate the keypoints that have poorly determined locations but have high edge responses.

For poorly defined peaks in the DoG function, the [[principal curvature]] across the edge would be much larger than the principal curvature along it. Finding these principal curvatures amounts to solving for the [[eigenvalues]] of the second-order [[Hessian matrix]], '''H''':

:<math> \textbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} </math>

The eigenvalues of '''H''' are proportional to the principal curvatures of D. It turns out that only the ratio of the two eigenvalues matters for SIFT's purposes: say <math>\alpha</math> is the larger eigenvalue and <math>\beta</math> the smaller one, with ratio <math>r = \alpha/\beta</math>. The trace of '''H''', i.e., <math>D_{xx} + D_{yy}</math>, gives us the sum of the two eigenvalues, while its determinant, i.e., <math>D_{xx} D_{yy} - D_{xy}^2</math>, yields the product. The ratio <math> \text{R} = \operatorname{Tr} \left( \textbf{H} \right)^2/\operatorname{Det} \left( \textbf{H} \right)</math> can be shown to be equal to <math>\left( r+1 \right)^2/r</math>, which depends only on the ratio of the eigenvalues rather than their individual values. R is minimum when the eigenvalues are equal to each other. Therefore, the higher the [[absolute difference]] between the two eigenvalues, which is equivalent to a higher absolute difference between the two principal curvatures of D, the higher the value of R. It follows that, for some threshold eigenvalue ratio <math>r_{\text{th}}</math>, if R for a candidate keypoint is larger than <math>\left( r_{\text{th}} + 1 \right)^2/r_{\text{th}}</math>, that keypoint is poorly localized and hence rejected. The new approach uses <math>r_{\text{th}} = 10</math>.<ref name=Lowe2004/>

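The test itself amounts to a few lines; a sketch in Python, given the second derivatives of D at the keypoint:

<syntaxhighlight lang="python">
def passes_edge_test(dxx, dyy, dxy, r_th=10.0):
    """Keep a keypoint only if the trace/determinant ratio of the 2x2
    spatial Hessian stays below the (r_th + 1)^2 / r_th bound."""
    tr = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:
        return False  # curvatures differ in sign: reject outright
    return tr ** 2 / det < (r_th + 1) ** 2 / r_th
</syntaxhighlight>
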
This processing step for suppressing responses at edges is a transfer of a corresponding approach in the Harris operator for [[corner detection]]. The difference is that the measure for thresholding is computed from the Hessian matrix instead of a second-moment matrix (see [[structure tensor]]).

=== Orientation assignment ===
In this step, each keypoint is assigned one or more orientations based on local image gradient directions. This is the key step in achieving [[rotational invariance|invariance to rotation]] as the keypoint descriptor can be represented relative to this orientation and therefore achieve invariance to image rotation.

First, the Gaussian-smoothed image <math>L \left( x, y, \sigma \right)</math> at the keypoint's scale <math>\sigma</math> is taken so that all computations are performed in a scale-invariant manner. For an image sample <math>L \left( x, y \right)</math> at scale <math>\sigma</math>, the gradient magnitude, <math>m \left( x, y \right)</math>, and orientation, <math>\theta \left( x, y \right)</math>, are precomputed using pixel differences:

:<math>m \left( x, y \right) = \sqrt{\left( L \left( x+1, y \right) - L \left( x-1, y \right) \right)^2 + \left( L \left( x, y+1 \right) - L \left( x, y-1 \right) \right)^2}</math>

:<math>\theta \left( x, y \right) = \mathrm{atan2}\left(L \left( x, y+1 \right) - L \left( x, y-1 \right), L \left( x+1, y \right) - L \left( x-1, y \right) \right)</math>

The magnitude and direction calculations for the gradient are done for every pixel in a neighboring region around the keypoint in the Gaussian-blurred image L. An orientation histogram with 36 bins is formed, with each bin covering 10 degrees. Each sample in the neighboring window added to a histogram bin is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a <math>\sigma</math> that is 1.5 times that of the scale of the keypoint. The peaks in this histogram correspond to dominant orientations. Once the histogram is filled, the orientations corresponding to the highest peak and local peaks that are within 80% of the highest peaks are assigned to the keypoint. In the case of multiple orientations being assigned, an additional keypoint is created having the same location and scale as the original keypoint for each additional orientation.

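A simplified Python sketch of the histogram construction (it thresholds at 80% of the maximum instead of detecting true local peaks, uses a fixed window radius, and omits the interpolation refinements of full implementations):

<syntaxhighlight lang="python">
import numpy as np

def orientation_histogram(L, x, y, sigma, radius=8):
    """36-bin gradient orientation histogram around keypoint (x, y).

    L: Gaussian-smoothed image at the keypoint's scale.
    Samples are weighted by gradient magnitude and by a Gaussian
    window of width 1.5 * sigma.
    """
    hist = np.zeros(36)
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            yy, xx = y + j, x + i
            if not (0 < yy < L.shape[0] - 1 and 0 < xx < L.shape[1] - 1):
                continue
            dx = L[yy, xx + 1] - L[yy, xx - 1]       # pixel differences
            dy = L[yy + 1, xx] - L[yy - 1, xx]
            mag = np.hypot(dx, dy)
            theta = np.degrees(np.arctan2(dy, dx)) % 360.0
            w = np.exp(-(i * i + j * j) / (2.0 * (1.5 * sigma) ** 2))
            hist[int(theta // 10) % 36] += w * mag   # 10 degrees per bin
    # Orientations at the highest peak and peaks within 80% of it.
    peaks = np.where(hist >= 0.8 * hist.max())[0]
    return hist, peaks * 10
</syntaxhighlight>
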
=== Keypoint descriptor ===
Previous steps found keypoint locations at particular scales and assigned orientations to them. This ensured invariance to image location, scale and rotation. Now we want to compute a descriptor vector for each keypoint such that the descriptor is highly distinctive and partially invariant to the remaining variations such as illumination, 3D viewpoint, etc. This step is performed on the image closest in scale to the keypoint's scale.

First, a set of orientation histograms is created on 4x4 pixel neighborhoods with 8 bins each. These histograms are computed from magnitude and orientation values of samples in a 16 x 16 region around the keypoint such that each histogram contains samples from a 4 x 4 subregion of the original neighborhood region. The magnitudes are further weighted by a Gaussian function with <math>\sigma</math> equal to one half the width of the descriptor window. The descriptor then becomes a vector of all the values of these histograms. Since there are 4 x 4 = 16 histograms each with 8 bins, the vector has 128 elements. This vector is then normalized to unit length in order to enhance invariance to affine changes in illumination. To reduce the effects of non-linear illumination, a threshold of 0.2 is applied and the vector is again normalized.

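The normalization steps translate directly into code; a sketch, assuming the 4x4 grid of 8-bin histograms has already been accumulated:

<syntaxhighlight lang="python">
import numpy as np

def normalize_descriptor(hists_4x4x8):
    """Turn a 4x4 grid of 8-bin histograms into the final 128-d vector."""
    v = np.asarray(hists_4x4x8, dtype=float).reshape(128)
    v /= np.linalg.norm(v) + 1e-12  # unit length: affine illumination invariance
    v = np.minimum(v, 0.2)          # damp large gradients (non-linear lighting)
    v /= np.linalg.norm(v) + 1e-12  # renormalize after thresholding
    return v
</syntaxhighlight>
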
Although the dimension of the descriptor, i.e. 128, seems high, descriptors with lower dimension than this do not perform as well across the range of matching tasks<ref name=Lowe2004/> and the computational cost remains low due to the approximate BBF method (described above) used for finding the nearest neighbor. Longer descriptors continue to do better, but not by much, and there is an additional danger of increased sensitivity to distortion and occlusion. It is also shown that feature matching accuracy is above 50% for viewpoint changes of up to 50 degrees. Therefore, SIFT descriptors are invariant to minor affine changes. To test the distinctiveness of the SIFT descriptors, matching accuracy is also measured against a varying number of keypoints in the testing database, and it is shown that matching accuracy decreases only very slightly for very large database sizes, thus indicating that SIFT features are highly distinctive.

==Comparison of SIFT features with other local features==
There has been an extensive study done on the performance evaluation of different local descriptors, including SIFT, using a range of detectors.<ref name=Mikolajczyk2005>{{cite journal
 | author = Mikolajczyk, K.; Schmid, C.
 | year = 2005
 | title = A performance evaluation of local descriptors
 | journal = IEEE Transactions on Pattern Analysis and Machine Intelligence
 | volume = 27
 | issue = 10
 | pages = 1615–1630
 | doi = 10.1109/TPAMI.2005.188
 | pmid = 16237996
 | url = http://research.microsoft.com/users/manik/projects/trade-off/papers/MikolajczykPAMI05.pdf
}}</ref> The main results are summarized below:

* SIFT and SIFT-like [[GLOH]] features exhibit the highest matching accuracies (recall rates) for an affine transformation of 50 degrees. After this transformation limit, results start to become unreliable.

* Distinctiveness of descriptors is measured by summing the eigenvalues of the descriptors, obtained by the [[Principal components analysis]] of the descriptors normalized by their variance. This corresponds to the amount of variance captured by different descriptors, therefore, to their distinctiveness. PCA-SIFT (Principal Components Analysis applied to SIFT descriptors), GLOH and SIFT features give the highest values.

* SIFT-based descriptors outperform other contemporary local descriptors on both textured and structured scenes, with the difference in performance larger on the textured scene.

* For scale changes in the range 2-2.5 and image rotations in the range 30 to 45 degrees, SIFT and SIFT-based descriptors again outperform other contemporary local descriptors with both textured and structured scene content.

* Introduction of blur affects all local descriptors, especially those based on edges, like [[shape context]], because edges disappear in the case of a strong blur. But GLOH, PCA-SIFT and SIFT still performed better than the others. This is also true for evaluation in the case of illumination changes.

The evaluations carried out suggest strongly that SIFT-based descriptors, which are region-based, are the most robust and distinctive, and are therefore best suited for feature matching. However, more recent feature descriptors such as [[SURF]] were not evaluated in this study.

[[SURF]] has later been shown to have similar performance to SIFT, while at the same time being much faster.<ref name="SURF"/> Another study concludes that when speed is not critical, SIFT outperforms SURF.<ref name="SURFvsSIFT"/>

Recently, a slight variation of the descriptor employing an irregular histogram grid has been proposed that significantly improves its performance.<ref name="IrrGrid"/> Instead of using a 4x4 grid of histogram bins, all bins extend to the center of the feature. This improves the descriptor's robustness to scale changes.

The SIFT-Rank<ref name="Toews2009"/> descriptor was shown to improve the performance of the standard SIFT descriptor for affine feature matching. A SIFT-Rank descriptor is generated from a standard SIFT descriptor, by setting each histogram bin to its rank in a sorted array of bins. The Euclidean distance between SIFT-Rank descriptors is invariant to arbitrary monotonic changes in histogram bin values, and is related to [[Spearman's rank correlation coefficient]].

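The rank transform itself is compact in NumPy; in this sketch, ties between equal bins are broken arbitrarily:

<syntaxhighlight lang="python">
import numpy as np

def sift_rank(descriptor):
    """Replace each histogram bin by its rank among the sorted bins."""
    return np.argsort(np.argsort(np.asarray(descriptor)))
</syntaxhighlight>
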
==Applications==

===Object recognition using SIFT features===
Given SIFT's ability to find distinctive keypoints that are invariant to location, scale and rotation, and robust to [[affine transformations]] (changes in [[Linear scale|scale]], [[rotation]], [[Shear mapping|shear]], and position) and changes in illumination, SIFT keypoints are usable for object recognition. The steps are given below.

* First, SIFT features are obtained from the input image using the algorithm described above.

* These features are matched to the SIFT feature database obtained from the training images. This feature matching is done through a Euclidean-distance based nearest neighbor approach. To increase robustness, matches are rejected for those keypoints for which the ratio of the nearest neighbor distance to the second nearest neighbor distance is greater than 0.8. This discards many of the false matches arising from background clutter. Finally, to avoid the expensive search required for finding the Euclidean-distance-based nearest neighbor, an approximate algorithm called the best-bin-first algorithm is used.<ref name=Beis1997>{{cite conference | author = Beis, J. | coauthors = Lowe, David G. | year = 1997 | title = Shape indexing using approximate nearest-neighbour search in high-dimensional spaces | booktitle = Conference on Computer Vision and Pattern Recognition, Puerto Rico: sn | pages = 1000–1006 | doi = 10.1109/CVPR.1997.609451 | url = http://www.cs.ubc.ca/~lowe/papers/cvpr97.pdf }}</ref> This is a fast method for returning the nearest neighbor with high probability, and can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time (a matching sketch in OpenCV is given after this list).

* Although the distance ratio test described above discards many of the false matches arising from background clutter, we still have matches that belong to different objects. Therefore, to increase robustness to object identification, we want to cluster those features that belong to the same object and reject the matches that are left out in the clustering process. This is done using the [[Hough transform]]. This will identify clusters of features that vote for the same object pose. When clusters of features are found to vote for the same pose of an object, the probability of the interpretation being correct is much higher than for any single feature. Each keypoint votes for the set of object poses that are consistent with the keypoint's location, scale, and orientation. ''Bins'' that accumulate at least 3 votes are identified as candidate object/pose matches.

* For each candidate cluster, a least-squares solution for the best estimated affine projection parameters relating the training image to the input image is obtained. If the projection of a keypoint through these parameters lies within half the error range that was used for the parameters in the Hough transform bins, the keypoint match is kept. If fewer than 3 points remain after discarding outliers for a bin, then the object match is rejected. The least-squares fitting is repeated until no more rejections take place. This works better for planar surface recognition than 3D object recognition since the affine model is no longer accurate for 3D objects.

* In this journal article,<ref name="Sirmacek2009"/> the authors proposed a new approach to using SIFT descriptors for multiple object detection purposes. The proposed multiple object detection approach is tested on aerial and satellite images.

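As a concrete illustration of the detection and ratio-test stages of this pipeline, the following OpenCV sketch (file names are placeholders) extracts SIFT features from a training and a test image and keeps the matches passing the 0.8 ratio test; the Hough clustering and least-squares verification stages are not shown:

<syntaxhighlight lang="python">
import cv2

train_img = cv2.imread("training.png", cv2.IMREAD_GRAYSCALE)
test_img = cv2.imread("test.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(train_img, None)  # keypoints + descriptors
kp2, des2 = sift.detectAndCompute(test_img, None)

# Approximate nearest neighbors (FLANN with k-d trees) plus the ratio test.
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
pairs = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in (p for p in pairs if len(p) == 2)
        if m.distance < 0.8 * n.distance]
print(len(good), "matches passed the ratio test")
</syntaxhighlight>
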
SIFT features can essentially be applied to any task that requires identification of matching locations between images. Work has been done on applications such as recognition of particular object categories in 2D images, 3D reconstruction, motion tracking and segmentation, robot localization, image panorama stitching and epipolar calibration. Some of these are discussed in more detail below.

===Robot localization and mapping===
In this application,<ref name="Se2001"/> a trinocular stereo system is used to determine 3D estimates for keypoint locations. Keypoints are used only when they appear in all 3 images with consistent disparities, resulting in very few outliers. As the robot moves, it localizes itself using feature matches to the existing 3D map, and then incrementally adds features to the map while updating their 3D positions using a Kalman filter. This provides a robust and accurate solution to the problem of robot localization in unknown environments.

| ===Panorama stitching===
| |
| SIFT feature matching can be used in [[image stitching]] for fully automated [[panorama]] reconstruction from non-panoramic images. The SIFT features extracted from the input images are matched against each other to find ''k'' nearest-neighbors for each feature. These correspondences are then used to find ''m'' candidate matching images for each image. [[homography|Homographies]] between pairs of images are then computed using [[RANSAC]] and a probabilistic model is used for verification. Because there is no restriction on the input images, graph search is applied to find connected components of image matches such that each connected component will correspond to a panorama. Finally for each connected component [[Bundle adjustment]] is performed to solve for joint camera parameters, and the panorama is rendered using [[multi-band blending]]. Because of the SIFT-inspired object recognition approach to panorama stitching, the resulting system is insensitive to the ordering, orientation, scale and illumination of the images. The input images can contain multiple panoramas and noise images (some of which may not even be part of the composite image), and panoramic sequences are recognized and rendered as output.<ref name="Brown2003"/>
===3D scene modeling, recognition and tracking===
This application uses SIFT features for [[3D object recognition]] and [[3D modeling]] in the context of [[augmented reality]], in which synthetic objects with accurate pose are superimposed on real images. SIFT matching is done for a number of 2D images of a scene or object taken from different angles. This is used with [[bundle adjustment]] to build a sparse 3D model of the viewed scene and to simultaneously recover camera poses and calibration parameters. Then the position, orientation and size of the virtual object are defined relative to the coordinate frame of the recovered model. For online [[match moving]], SIFT features are again extracted from the current video frame and matched to the features already computed for the world model, resulting in a set of 2D-to-3D correspondences. These correspondences are then used to compute the current camera pose for the virtual projection and final rendering. A regularization technique is used to reduce the jitter in the virtual projection.<ref name="Gordon2006"/> 3D extensions of SIFT have also been evaluated for true 3D object recognition and retrieval.<ref name=Flitton2010/><ref name="flitton13interestpoint">{{cite journal |author=Flitton, G.T., Breckon, T.P., Megherbi, N. |year=2013 |title=A Comparison of 3D Interest Point Descriptors with Application to Airport Baggage Object Detection in Complex CT Imagery |journal=Pattern Recognition |publisher=Elsevier |doi=10.1016/j.patcog.2013.02.008}}</ref>
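In modern terms, the per-frame pose computation from 2D-to-3D correspondences is a perspective-n-point (PnP) problem. The following is a minimal sketch using OpenCV rather than the cited system's own solver; the feature matching, the intrinsic matrix ''K'' from the offline calibration stage, and the jitter regularization are all assumed to be provided elsewhere.

<syntaxhighlight lang="python">
import cv2
import numpy as np

def camera_pose_from_matches(pts3d, pts2d, K):
    """Recover the camera pose of the current frame from 2D-to-3D
    correspondences (SIFT features in the frame matched to points of
    the sparse 3D model) using RANSAC-based PnP.  K is the 3x3 camera
    intrinsic matrix, assumed known from calibration."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d, dtype=np.float32),   # N x 3 model points
        np.asarray(pts2d, dtype=np.float32),   # N x 2 image points
        K, None)                               # no lens distortion assumed
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    return R, tvec               # world point X maps to camera frame as R @ X + tvec
</syntaxhighlight>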
===3D SIFT-like descriptors for human action recognition===
Extensions of the SIFT descriptor to 2+1-dimensional spatio-temporal data in the context of human action recognition in video sequences have been studied.<ref name="Flitton2010"/><ref name="Laptev2004"/><ref name="Laptev2007"/><ref name="Scovanner2007"/> The computation of local position-dependent histograms in the 2D SIFT algorithm is extended from two to three dimensions to describe SIFT features in a spatio-temporal domain. For application to human action recognition in a video sequence, sampling of the training videos is carried out either at spatio-temporal interest points or at randomly determined locations, times and scales. The spatio-temporal regions around these interest points are then described using the 3D SIFT descriptor. These descriptors are then clustered to form a spatio-temporal [[Bag of words model|bag-of-words model]]. 3D SIFT descriptors extracted from the test videos are then matched against these ''words'' for human action classification.

The authors report much better results with their 3D SIFT descriptor approach than with other approaches like simple 2D SIFT descriptors and Gradient Magnitude.<ref name="Niebles2006"/>
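The clustering and matching stage of this approach amounts to a standard bag-of-words pipeline. The sketch below uses k-means from scikit-learn as the vocabulary builder; the 3D SIFT descriptor extraction itself is assumed to be done elsewhere, and the vocabulary size and final classifier are illustrative choices.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(train_descriptor_sets, k=200):
    """Cluster 3D SIFT descriptors (one row per descriptor) pooled
    from all training videos into k visual 'words'."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(train_descriptor_sets))

def word_histogram(vocab, descriptors):
    """Represent one video as a normalized histogram of word counts."""
    words = vocab.predict(np.asarray(descriptors, dtype=float))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
</syntaxhighlight>

A test video is then classified by comparing its histogram with those of the training videos, for example with a nearest-neighbour or support-vector classifier.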
===Analyzing the human brain in 3D magnetic resonance images===
The Feature-based Morphometry (FBM) technique<ref name=Toews2010/> uses extrema in a difference-of-Gaussian scale-space to analyze and classify 3D magnetic resonance images (MRIs) of the human brain. FBM models the image probabilistically as a collage of independent features, conditional on image geometry and group labels, e.g. healthy subjects and subjects with Alzheimer's disease (AD). Features are first extracted in individual images from a 4D difference-of-Gaussian scale-space, then modeled in terms of their appearance, geometry and group co-occurrence statistics across a set of images. FBM was validated in the analysis of AD using a set of ~200 volumetric MRIs of the human brain, automatically identifying established indicators of AD in the brain and classifying mild AD in new images with a rate of 80%.<ref name=Toews2010/>
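The extrema-detection stage underlying FBM can be approximated with standard array tools. The following SciPy sketch finds difference-of-Gaussian maxima in a single 3D volume; the sigma schedule and threshold are illustrative assumptions, minima would be found analogously, and FBM's probabilistic modeling of appearance, geometry and group co-occurrence is not shown.

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_maxima_3d(volume, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.02):
    """Find local maxima in a difference-of-Gaussian scale-space of a
    3D image volume.  Returns an array whose rows are (scale index,
    z, y, x) coordinates of the detected points."""
    blurred = [gaussian_filter(volume.astype(float), s) for s in sigmas]
    dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    # A voxel is kept if it is the maximum of its 3x3x3x3 neighbourhood
    # across space and adjacent DoG levels, and clears the threshold.
    local_max = maximum_filter(dog, size=3) == dog
    return np.argwhere(local_max & (dog > thresh))
</syntaxhighlight>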
== See also ==
*[[Autostitch]]
*[[Scale space]]
*[[Scale space implementation]]
== References ==
{{reflist|refs=
<ref name=Lowe1999>{{cite conference |author=Lowe, David G. |year=1999 |title=Object recognition from local scale-invariant features |booktitle=Proceedings of the International Conference on Computer Vision |volume=2 |pages=1150–1157 |doi=10.1109/ICCV.1999.790410 |url=http://doi.ieeecs.org/10.1109/ICCV.1999.790410}}</ref>
<ref name=patent>{{US patent|6,711,293}}, "Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image", David Lowe's patent for the SIFT algorithm, March 23, 2004</ref>
<ref name=Lowe2004>Lowe, D. G., "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60, 2, pp. 91–110, 2004.</ref>
<ref name=Serre2005>Serre, T., Kouh, M., Cadieu, C., Knoblich, U., Kreiman, G., Poggio, T., "[http://cbcl.mit.edu/projects/cbcl/publications/ai-publications/2005/AIM-2005-036.pdf A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex]", Computer Science and Artificial Intelligence Laboratory Technical Report, December 19, 2005, MIT-CSAIL-TR-2005-082.</ref>
<ref name=Beis1997>Beis, J., and Lowe, D. G., "Shape indexing using approximate nearest-neighbour search in high-dimensional spaces", Conference on Computer Vision and Pattern Recognition, Puerto Rico, 1997, pp. 1000–1006.</ref>
<ref name=Lowe2001>Lowe, D. G., "Local feature view clustering for 3D object recognition", IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, 2001, pp. 682–688.</ref>
<ref name=Lazebnik2004>Lazebnik, S., Schmid, C., and Ponce, J., "[http://hal.archives-ouvertes.fr/docs/00/54/85/42/PDF/bmvc04.pdf Semi-Local Affine Parts for Object Recognition]", Proceedings of the British Machine Vision Conference, 2004.</ref>
<ref name=Sungho2006>Sungho Kim, Kuk-Jin Yoon, In So Kweon, "Object Recognition Using a Generalized Robust Invariant Feature and Gestalt's Law of Proximity and Similarity", Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), 2006.</ref>
<ref name=Bay2006>Bay, H., Tuytelaars, T., Gool, L. V., "[http://www.vision.ee.ethz.ch/~surf/eccv06.pdf SURF: Speeded Up Robust Features]", Proceedings of the ninth European Conference on Computer Vision, May 2006.</ref>
<ref name=Ke2004>Ke, Y., and Sukthankar, R., "[http://www.cs.cmu.edu/~rahuls/pub/cvpr2004-keypoint-rahuls.pdf PCA-SIFT: A More Distinctive Representation for Local Image Descriptors]", Computer Vision and Pattern Recognition, 2004.</ref>
<ref name=Mikolajczyk2005>Mikolajczyk, K., and Schmid, C., "A performance evaluation of local descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, 10, 27, pp. 1615–1630, 2005.</ref>
<ref name=Wagner2008>D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg, "[http://mi.eng.cam.ac.uk/~gr281/docs/WagnerIsmar08NFT.pdf Pose tracking from natural features on mobile phones]", Proceedings of the International Symposium on Mixed and Augmented Reality, 2008.</ref>
<ref name=Henze2009>N. Henze, T. Schinke, and S. Boll, "What is That? Object Recognition from Natural Features on a Mobile Phone", Proceedings of the Workshop on Mobile Interaction with the Real World, 2009.</ref>
<ref name=Lindeberg1998>{{cite journal |author=Lindeberg, Tony |year=1998 |title=Feature detection with automatic scale selection |journal=International Journal of Computer Vision |volume=30 |issue=2 |pages=79–116 |doi=10.1023/A:1008045108935 |url=http://www.nada.kth.se/cvap/abstracts/cvap198.html}}</ref>
<ref name=Lindeberg2012>{{cite journal |author=Lindeberg, Tony |year=2012 |title=Scale invariant feature transform |journal=Scholarpedia |volume=7 |issue=5 |pages=10491 |url=http://www.scholarpedia.org/article/Scale_Invariant_Feature_Transform}}</ref>
<ref name=Lindenberg2003>{{cite journal |author=Lindeberg, Tony and Bretzner, Lars |year=2003 |title=Real-time scale selection in hybrid multi-scale representations |journal=Proc. Scale-Space'03, Springer Lecture Notes in Computer Science |volume=2695 |pages=148–163 |doi=10.1007/3-540-44935-3_11 |isbn=978-3-540-40368-5 |url=http://www.nada.kth.se/cvap/abstracts/cvap279.html}}</ref>
<ref name=SURF>[http://www.tu-chemnitz.de/etit/proaut/rsrc/iav07-surf.pdf TU-chemnitz.de]</ref>
<ref name=SURFvsSIFT>Edouard Oyallon, Julien Rabin, "[http://www.ipol.im/pub/pre/69/ An Analysis and Implementation of the SURF Method, and its Comparison to SIFT]", Image Processing On Line</ref>
<ref name=IrrGrid>{{cite conference |first=Y. |last=Cui |coauthors=Hasler, N.; Thormaehlen, T.; Seidel, H.-P. |title=Scale Invariant Feature Transform with Irregular Orientation Histogram Binning |booktitle=Proceedings of the International Conference on Image Analysis and Recognition (ICIAR 2009) |publisher=Springer |date=July 2009 |location=Halifax, Canada |url=http://www.mpi-inf.mpg.de/~hasler/download/CuiHasThoSei09igSIFT.pdf}}</ref>
<ref name=Toews2009>{{cite conference |author=Matthew Toews, William M. Wells III |year=2009 |title=SIFT-Rank: Ordinal Descriptors for Invariant Feature Correspondence |booktitle=IEEE International Conference on Computer Vision and Pattern Recognition |pages=172–177 |doi=10.1109/CVPR.2009.5206849 |url=http://www.matthewtoews.com/papers/cvpr09-matt.final.pdf}}</ref>
<ref name=Se2001>{{cite conference |author=Se, S.; Lowe, David G.; Little, J. |year=2001 |title=Vision-based mobile robot localization and mapping using scale-invariant features |booktitle=Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) |volume=2 |pages=2051 |doi=10.1109/ROBOT.2001.932909 |url=http://citeseer.ist.psu.edu/425735.html}}</ref>
<ref name=Brown2003>{{cite conference |author=Brown, M.; Lowe, David G. |year=2003 |title=Recognising Panoramas |booktitle=Proceedings of the ninth IEEE International Conference on Computer Vision |volume=2 |pages=1218–1225 |doi=10.1109/ICCV.2003.1238630 |url=http://graphics.cs.cmu.edu/courses/15-463/2005_fall/www/Papers/BrownLowe.pdf}}</ref>
<ref name=Gordon2006>Iryna Gordon and David G. Lowe, "[http://www.cs.ubc.ca/labs/lci/papers/docs2006/lowe_gordon.pdf What and where: 3D object recognition with accurate pose]", in Toward Category-Level Object Recognition, Springer-Verlag, 2006, pp. 67–82.</ref>
<ref name=Laptev2004>{{cite conference |author=Laptev, Ivan and Lindeberg, Tony |year=2004 |title=Local descriptors for spatio-temporal recognition |booktitle=ECCV'04 Workshop on Spatial Coherence for Visual Motion Analysis, Springer Lecture Notes in Computer Science, Volume 3667 |pages=91–103 |doi=10.1007/11676959_8 |url=ftp://ftp.nada.kth.se/CVAP/reports/LapLin04-SCVMA.pdf}}</ref>
<ref name=Laptev2007>{{cite journal |author=Ivan Laptev, Barbara Caputo, Christian Schuldt and Tony Lindeberg |year=2007 |title=Local velocity-adapted motion events for spatio-temporal recognition |journal=Computer Vision and Image Understanding |volume=108 |issue=3 |pages=207–229 |doi=10.1016/j.cviu.2006.11.023 |url=http://www.csc.kth.se/cvap/abstracts/LapCapSchLin07-CVIU.html}}</ref>
<ref name=Scovanner2007>{{cite conference |author=Scovanner, Paul |coauthors=Ali, S; Shah, M |year=2007 |title=A 3-dimensional sift descriptor and its application to action recognition |booktitle=Proceedings of the 15th International Conference on Multimedia |pages=357–360 |doi=10.1145/1291233.1291311}}</ref>
<ref name=Flitton2010>{{cite conference |author=Flitton, G. |coauthors=Breckon, T. |year=2010 |title=Object Recognition using 3D SIFT in Complex CT Volumes |booktitle=Proceedings of the British Machine Vision Conference |pages=11.1–12 |doi=10.5244/C.24.11 |url=http://www.durham.ac.uk/toby.breckon/publications/papers/flitton10baggage.pdf}}</ref>
<ref name=Niebles2006>{{cite conference |author=Niebles, J. C., Wang, H. and Li, Fei-Fei |url=http://vision.cs.princeton.edu/niebles/humanactions.htm |title=Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words |booktitle=Proceedings of the British Machine Vision Conference (BMVC) |location=Edinburgh |year=2006 |accessdate=2008-08-20}}</ref>
<ref name=Sirmacek2009>{{cite journal |author=Beril Sirmacek and Cem Unsalan |year=2009 |title=Urban Area and Building Detection Using SIFT Keypoints and Graph Theory |journal=IEEE Transactions on Geoscience and Remote Sensing |volume=47 |issue=4 |pages=1156–1167 |doi=10.1109/TGRS.2008.2008440}}</ref>
<ref name=Toews2010>{{cite journal |author=Matthew Toews, William M. Wells III, D. Louis Collins, Tal Arbel |year=2010 |title=Feature-based Morphometry: Discovering Group-related Anatomical Patterns |journal=NeuroImage |volume=49 |issue=3 |pages=2318–2327 |doi=10.1016/j.neuroimage.2009.10.032 |pmid=19853047 |url=http://www.matthewtoews.com/papers/matt_neuroimage10.pdf}}</ref>
}}
== External links ==
* [http://www.scholarpedia.org/article/SIFT Scale-Invariant Feature Transform (SIFT) in Scholarpedia]
* [http://robwhess.github.com/opensift/ Rob Hess's implementation of SIFT] accessed 21 Nov 2012
* [http://www.jprr.org/index.php/jprr/article/view/26 The Invariant Relations of 3D to 2D Projection of Point Sets, Journal of Pattern Recognition Research] [http://www.jprr.org (JPRR)], Vol. 3, No. 1, 2008
* [http://citeseer.ist.psu.edu/lowe04distinctive.html Lowe, D. G., "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60, 2, pp. 91–110, 2004]
* [http://lear.inrialpes.fr/pubs/2005/MS05/ Mikolajczyk, K., and Schmid, C., "A performance evaluation of local descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, 10, 27, pp. 1615–1630, 2005]
* [http://www.cs.cmu.edu/~yke/pcasift/ PCA-SIFT: A More Distinctive Representation for Local Image Descriptors]
* [http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/bmvc04.pdf Lazebnik, S., Schmid, C., and Ponce, J., Semi-Local Affine Parts for Object Recognition, BMVC, 2004]
* [http://www.ipol.im/pub/algo/my_affine_sift/ ASIFT (Affine SIFT)]: large viewpoint matching with SIFT, with source code and online demonstration
* [http://www.vlfeat.org/ VLFeat], an open source computer vision library in C (with a MEX interface to MATLAB), including an implementation of SIFT
* [http://www.cs.cityu.edu.hk/~wzhao2/lip-vireo.htm LIP-VIREO], a toolkit for keypoint feature extraction (binaries for Windows, Linux and SunOS), including an implementation of SIFT
* [https://sites.google.com/site/btabibian/projects/3d-reconstruction/code (Parallel) SIFT in C#], the SIFT algorithm in C# using Emgu CV, plus a modified parallel version of the algorithm
* [http://www.mathworks.com/matlabcentral/fileexchange/38782 DoH & LoG + affine], a blob detector adapted from a SIFT toolbox
* [http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/ A simple step-by-step guide to SIFT]
* [http://www.berilsirmacek.com/sift_multiple_object_detection.html SIFT for multiple object detection]
* "[http://www.ipol.im/pub/pre/82/ The Anatomy of the SIFT Method]" in Image Processing On Line, a detailed study of every step of the algorithm with an open source implementation and a web demo to try different parameters
* [https://sourceforge.net/p/ezsift/ ezSIFT: an easy-to-use, self-contained SIFT implementation in C/C++] that does not require other libraries
* [http://www.matthewtoews.com/fba/featExtract1.3.zip A 3D SIFT implementation: detection and matching in volumetric images]
{{DEFAULTSORT:Scale-Invariant Feature Transform}}
[[Category:Feature detection]]
[[Category:Object recognition and categorization]]

{{Link FA|fr}}