Adam Cerny - cernyad2(-at-)fel(...)cvut(...)cz





TASK: Develop a general theory for solving the CAPTCHA problem




TO DO
- Priority: implement SWT (Stroke Width Transform) + modifications
- Look into support vector machines - they may prove useful for graph covering
- For now, study as much about the topic as possible
- In spare time, work on the solution approach, see IDEAS
- Study fuzzy c-means clustering - an interesting solution for "dotted" CAPTCHAs, but also for segmenting connected text in a CAPTCHA - needs a fair number of added heuristics, does not always work
- Also study thin plate splines - for reconstructing non-linear transformations
- Survey of everything ever written about CAPTCHAs
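
A minimal sketch of fuzzy c-means for the "dotted" CAPTCHA case from the TO DO list. It assumes the dot centers have already been extracted as (x, y) points and that the number of characters is known; both are assumptions of this sketch, as are the function name and the farthest-point initialisation. The heuristics the notes call for are not included.

```python
import math

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50):
    """Cluster 2-D points; return (centers, memberships).

    memberships[k][i] is the degree (0..1) to which point k belongs to
    cluster i; each row sums to 1. m > 1 is the usual fuzziness exponent.
    """
    # Assumption: farthest-point initialisation, chosen only to keep the
    # sketch deterministic. Random initial centers are more common.
    centers = [points[0]]
    while len(centers) < n_clusters:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1))
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append([
                1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                for d_i in dists])
        # Center update: mean of the points weighted by u_ki^m
        centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[axis] for w, p in zip(weights, points)) / total
                for axis in range(2)))
    return centers, memberships
```

Dots with no dominant membership (for example all values below 0.6) are natural candidates for the extra heuristics, since they sit between two characters.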




Collection of CAPTCHAs - for now only links to other sites, the best-known projects + statistics on how well their authors can crack them.
1) PWNtcha project: also includes comments on the difficulty of individual CAPTCHAs
2) Russian project: unfortunately in Cyrillic, but instructive nonetheless
3) Chinese crackers: very interesting, sorted from easiest to hardest + apparently the price per solved CAPTCHA. Yahoo, Google, MySpace and Hotmail are the hardest.
4) OCR RESEARCH TEAM: CAPTCHAs rated by difficulty, development of their own "3D" CAPTCHA, quite instructive


Hypothesis: Exploiting every strategy used by CAPTCHA designers will most likely be impossible - the task always breaks down into subproblems
that cannot be decided in a uniform way. For example, segmentation is often impossible, or the transformation is so complex that
character clusters cannot be identified unambiguously - the text is often not a dictionary word, so one cannot even decide what does and does not make sense.
The consequence is that we cannot do without an individual approach to each CAPTCHA generation technique.




IDEAS:
6) (25.7.2011)
- Modify the connected component (CC) function in the Stroke Width Transform
- A CC will contain those pixels whose stroke width changes continuously and below a certain slope
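
A minimal sketch of the modified grouping rule, assuming the stroke width map is already computed and given as a list of rows with 0 for background. The function name, the 8-neighbourhood and the absolute-difference test with `max_slope` are assumptions of this sketch, not a fixed part of SWT.

```python
from collections import deque

def stroke_components(swt, max_slope=1.0):
    """Label connected components of a stroke width map.

    swt: list of rows; 0 marks background, positive values are stroke widths.
    Two 8-neighbouring pixels are joined only if their stroke widths differ
    by at most max_slope (width change per one-pixel step).
    Returns a label map of the same shape (0 = background).
    """
    height, width = len(swt), len(swt[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for y0 in range(height):
        for x0 in range(width):
            if swt[y0][x0] <= 0 or labels[y0][x0]:
                continue
            # Start a new component and flood-fill it breadth-first.
            next_label += 1
            labels[y0][x0] = next_label
            queue = deque([(y0, x0)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if not (0 <= ny < height and 0 <= nx < width):
                            continue
                        if swt[ny][nx] <= 0 or labels[ny][nx]:
                            continue
                        # Continuity test: small local change in stroke width.
                        if abs(swt[ny][nx] - swt[y][x]) <= max_slope:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels

print(stroke_components([[2, 3, 4, 9, 9]]))  # [[1, 1, 1, 2, 2]]
```

In the example the width grows gradually from 2 to 4 and stays in one component, while the jump from 4 to 9 starts a new one.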

5) (15.7.2011)
- Idea for an unbreakable CAPTCHA - everything connected, non-segmentable, solvable only in exponential time - at least by the known solving techniques

4) (14.7.2011)
- Based on the material studied and on IDEAS 3), deterministic generators are easy to exploit: heuristics make segmentation possible, and recognition can use neural networks
- I still hold that exploiting the generator of a system S is essentially ideal, see the hypothesis - segmentation -> recognition (often neural networks trained on the given character set)
- There are also generators that are non-deterministic in the sense that their character sets evolve - this can be quite problematic, but not unsolvable as long as segmentation works
- The hardest challenge for now will be segmenting the CAPTCHAs from Google and Yahoo; I suspect IDEAS 1) solves this problem, but whether the solution is unambiguous is a matter for further thought and more complex algorithms
- It is possible that the class of characters is fairly limited and the system can be generalized, but so far no clear patterns emerge

3) (11.7.2011)
- In response to the hypothesis above, the only thing that comes to mind is to reduce the problem to studying the text transformations in use
- That is, can we describe those transformations and find them in the image at low computational cost?
- Brute force, but for now the only method that looks like a way out.
- EDIT: The idea converges to reverse engineering - very difficult, but it would lead to a 100% success rate.

2) (11.7.2011)
- Approximating human perception - the eye sees a CAPTCHA in different colors, layer by layer
- In-depth pixel analysis - individual segments (easy to find, a simple filter) can be linked into a graph in which we search for acceptable patterns
- Computationally expensive, so heuristics are needed - will it work?
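
A minimal sketch of the first two steps of this idea, assuming the image is already quantized to a small set of color values and given as rows of hashable values. The function names and the 4-neighbourhood are assumptions of this sketch; the pattern search on the resulting graph is left open.

```python
from collections import deque

def color_segments(image):
    """Split an image into 4-connected segments of equal color.

    Returns (labels, colors): labels has the image's shape and holds a
    segment index per pixel; colors[i] is the color of segment i.
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    colors = []
    for y0 in range(height):
        for x0 in range(width):
            if labels[y0][x0] != -1:
                continue
            seg = len(colors)
            colors.append(image[y0][x0])
            labels[y0][x0] = seg
            queue = deque([(y0, x0)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == -1
                            and image[ny][nx] == image[y0][x0]):
                        labels[ny][nx] = seg
                        queue.append((ny, nx))
    return labels, colors

def segment_graph(labels):
    """Return the set of edges (a, b), a < b, between touching segments."""
    height, width = len(labels), len(labels[0])
    edges = set()
    for y in range(height):
        for x in range(width):
            # Looking only down and right visits every neighbour pair once.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < height and nx < width and labels[ny][nx] != labels[y][x]:
                    a, b = labels[y][x], labels[ny][nx]
                    edges.add((min(a, b), max(a, b)))
    return edges
```

The segmentation itself is linear in the number of pixels; the expensive part the notes worry about is the later pattern search over the graph.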

1) (6.7.2011 - today)
- Polygonal approximation of already segmented objects (in general any connected regions - it does not matter which)
- Conversion to a graph - telescopic graphs must be introduced (a certain relation between graphs, needed to relax the matching conditions, i.e. a cycle on six vertices will, under a given heuristic, be in relation with the ideal character O even though that character is mapped onto 5 vertices)
- Mapping of "ideal" characters into the graph + heuristics (position in the bitmap, angles, essentially parametric functions)
- The final step is covering the given graph with characters
- NOTE: Put on hold for now after a discussion with Prof. Matas




READ:
- A Computational Approach to Edge Detection, John Canny, 1986 - 25.7.2011
- Character-Stroke Detection for Text-Localization and Extraction, Krishna Subramanian, Prem Natarajan, Michael Decerbo, David Castan 2005 - 25.7.2011
- Detecting Text in Natural Scenes with Stroke Width Transform, Boris Epshtein, Eyal Ofek, Yonatan Wexler, 2010 - 25.7.2011
- Optical Character Recognition An illustrated guide to the frontier, George Nagy, Thomas A. Nartker, Stephen V. Rice - 19.7.2011
- ScatterType a Reading CAPTCHA Resistant to Segmentation Attack, Henry S. Baird, Terry Riopka, 2005 - 19.7.2011
- Style Consistent Classification of Isogenous Patterns, Prateek Sarkar, George Nagy, 2005 - 19.7.2011
- Building Segmentation Based Human-Friendly Human Interaction Proofs, Kumar Chellapilla, Kevin Larson, Patrice Y. Simard, Mary Czerwinski, 2005 - 19.7.2011
- Whats Up CAPTCHA A CAPTCHA Based On Image Orientation, Rich Gossweiler, Maryam Kamvar, Shumeet Baluja, 2009 - 18.7.2011
- Using Character Recognition and Segmentation to Tell Computer from Humans, Patrice Y. Simard, Richard Szeliski, Josh Benaloh, Julien Couvreur, and Iulian Calinov, 2003 - 18.7.2011
- Using Machine Learning to Break Visual HIPs, Kumar Chellapilla, Patrice Y. Simard - 18.7.2011
- Distortion Estimation Techniques in Solving Visual CAPTCHAs, Gabriel Moy, Nathan Jones, Curt Harkless, Randall Potter, 2004 - 18.7.2011 - a very interesting approach to solving a certain limited class of generators
- Telling Humans and Computers Apart, Luis von Ahn, Manuel Blum, John Langford, 2000 - 18.7.2011
- A Low-cost Attack on a Microsoft CAPTCHA, Jeff Yan, Ahmad Salah El Ahmad, 2007 - 18.7.2011
- Drag and Drop: A Better Approach to CAPTCHA, Arpan Desai, Pragnesh Patadia, 2009 - 18.7.2011
- CAPTCHA Using Strangeness in Machine Translation, Takumi Yamamoto, J. D. Tygar, Masakatsu Nishigaki, 2010 - 18.7.2011 - fairly interesting tests, e.g. telling a machine translation apart from a meaningful sentence
- Advanced Collage CAPTCHA, Mohammad Shirali-Shahreza, Sajad Shirali-Shahreza, 2008 - 18.7.2011
- DESIGNING CAPTCHA ALGORITHM: SPLITTING AND ROTATING THE IMAGES AGAINST OCRs, Ibrahim Furkan Ince, Ilker Yengin, Yucel Batu Salman, Hwan-Gue Cho, Tae-Cheon Yang, 2008 - 15.7.2011
- Zhang’s CAPTCHA Architecture Based on Intelligent Interaction via RIA, Wenjun Zhang, 2010 - 15.7.2011
- CAPTCHA Design Based on Moving Object, JingSong Cui, LiJing Wang, JingTing Mei, Da Zhang, Xia Wang, Yang Peng, WuZhou Zhang, 2008 - 14.7.2011
- A Proposal of Four-panel cartoon CAPTCHA The Concept, Takumi Yamamoto, Tokuichiro Suzuki, Masakatsu Nishigaki, 2010 - 14.7.2011
- Pattern Recognition, Achint Oommen Thomas, Amalia Rusu, Venu Govindaraju, 2009 - 14.7.2011
- Algorithm for secured online authentication using CAPTCHA, Prof. (Mrs.) A.A. Chandavale and Prof. Dr.A.M. Sapkal, 2010 - 13.7.2011
- CAPTCHAs: The Good, the Bad, and the Ugly, Paul Baecher, Marc Fischlin, Lior Gordon, Robert Langenberg, Michael Lutzow, Dominique Schroder, 2010/2011 - 13.7.2011
- Kluever_-_Character_Segmentation_Classification, Kurt Alfred Kluever - 13.7.2011
- http://network-security-research.blogspot.com/ - 13.7.2011, good material that shares the hypothesis (see above), very instructive about more advanced algorithms and techniques
- Usability of CAPTCHAs Or usability issues in CAPTCHA design - Jeff Yan, Ahmad Salah El Ahmad, 2008 - 12.7.2011
- A Projection-based Segmentation Algorithm, Shih-Yu Huang, Yeuan-Kuen Lee, Graeme Bell and Zhan-he Ou, 2008 - 12.7.2011
- Reverse Engineering CAPTCHAs, Abram Hindle, Michael W. Godfrey, Richard C. Holt - 11.7.2011
- Leveraging the CAPTCHA Problem, Daniel Lopresti, 2005 - 11.7.2011
- A Highly Legible CAPTCHA that Resists Segmentation Attacks, Henry S. Baird, Michael A. Moll, Sui-Yu Wang, 2005 - 11.7.2011
- CAPTCHA Security A Case Study, Jeff Yan, Ahmad Salah El Ahmad, 2009 - 11.7.2011
- Pitfalls in CAPTCHA design and implementation, Carlos Javier Hernandez-Castro, Arturo Ribagorda, 2010 - 11.7.2011
- http://cs.joensuu.fi/~koles/approximation/Ch3_0.html - A nice introduction to polygonal approximation - 8.7.2011
- Algorithm To Break Visual CAPTCHA - Prof. (Mrs.) A.A. Chandavale, 2010 - 7.7.2011
- The Robustness of CAPTCHAs: A Security Engineering Perspective - J. Yan, A. S. El Ahmad - University of Newcastle upon Tyne, 2009 - 7.7.2011
- Vytezovani textu ze strojove psanych dokumentu - Bachelor's thesis, MFF UK - Hubert Kindermann, 2011 - 4.7.2011




TO READ: To be specified - there is a lot, I need to filter the content by how good it is
- Image Processing, Analysis, and Machine Vision - 60/800 - 14.7.2011 - Understand the main operations used in the various materials on cracking CAPTCHAs.
- Learning OpenCV


Updated: 25.7.2011