-
Optical Character Recognition
- Scan text documents, then analyze and process the resulting image files to extract the text and layout information
- Neural network: pattern recognition (tutorial)
-
Implementation in the Android source tree
- external/tesseract/*
- Build:
$ cd external/tesseract/
$ mm
This produces libocr.so (liblept, libtess, libjpeg). Push it to /system/lib/ on the device; it can also be bundled in the application's install package instead.
-
doOcr
- /**
   * Runs OCR on an image.
   *
   * @param bitmap
   *            the image to recognize
   * @param language
   *            the recognition language
   * @return the recognized text as a string
   */
  public String doOcr(Bitmap bitmap, String language) {
      TessBaseAPI baseApi = new TessBaseAPI();
      // Initialize with the path to the OCR character-set data: getSDPath() = "/mnt/sdcard"
      baseApi.init(getSDPath(), language);
      // baseApi.init(".", language);
      // This line is required: tess-two expects the bitmap in this configuration
      bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true);
      baseApi.setImage(bitmap);
      String text = baseApi.getUTF8Text();
      baseApi.clear();
      baseApi.end();
      return text;
  }
-
getUTF8Text
- /**
   * The recognized text is returned as a String which is coded as UTF8.
   *
   * @return the recognized text
   */
  public String getUTF8Text() {
      // Trim because the text will have extra line breaks at the end
      String text = nativeGetUTF8Text();
      return text.trim();
  }
-
Native methods
- private native String nativeGetUTF8Text();
- public class TessBaseAPI {
      /**
       * Used by the native implementation of the class.
       */
      private int mNativeData;

      static {
          // liblept is loaded first because libtess depends on it
          System.loadLibrary("lept");
          System.loadLibrary("tess");
          nativeClassInit();
      }
      // ... (remainder of the class omitted)
  }
-
Algorithm
- Segmentation, normalization, feature extraction, comparison against a database, output of the result
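The first two steps above (segmentation and normalization) can be sketched in plain Java. This is only an illustration under simple assumptions, not Tesseract's actual code: the image is already binarized (true = ink), glyphs are separated by fully blank columns, and normalization is nearest-neighbor scaling to a fixed square grid. The class name `SegmentSketch` and its helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of segmentation and normalization; not Tesseract's implementation. */
final class SegmentSketch {

    /** Builds a binary image from text rows; '#' marks ink, anything else is background. */
    static boolean[][] parse(String... rows) {
        boolean[][] img = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            img[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) {
                img[y][x] = rows[y].charAt(x) == '#';
            }
        }
        return img;
    }

    /** Splits the image into column ranges [start, end) separated by blank columns. */
    static List<int[]> segmentColumns(boolean[][] img) {
        List<int[]> spans = new ArrayList<>();
        int width = img.length == 0 ? 0 : img[0].length;
        int start = -1;
        for (int x = 0; x < width; x++) {
            boolean ink = false;
            for (boolean[] row : img) {
                if (row[x]) { ink = true; break; }
            }
            if (ink && start < 0) {
                start = x;                        // a glyph begins here
            } else if (!ink && start >= 0) {
                spans.add(new int[] {start, x});  // a glyph ended at the previous column
                start = -1;
            }
        }
        if (start >= 0) spans.add(new int[] {start, width});
        return spans;
    }

    /** Scales columns [x0, x1) of img to a size-by-size grid by nearest-neighbor sampling. */
    static boolean[][] normalize(boolean[][] img, int x0, int x1, int size) {
        int h = img.length;
        int w = x1 - x0;
        boolean[][] out = new boolean[size][size];
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                out[y][x] = img[y * h / size][x0 + x * w / size];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[][] img = parse("#..##.", "#..#..", "#..##.");
        for (int[] span : segmentColumns(img)) {
            System.out.println("glyph columns " + span[0] + ".." + (span[1] - 1));
        }
    }
}
```

Each normalized grid would then go on to feature extraction and matching; real text also needs line segmentation and handling of touching characters, which this sketch leaves out.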
-
Recognition rate
- a search strategy
- classification engine
-
Feature extraction
- characterized by
- having a large set of symbols
- Matching capability
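As one concrete illustration of feature extraction, the sketch below computes zoning features: the glyph is divided into a grid of zones and the fraction of ink pixels in each zone becomes one feature. Zoning is a common textbook choice and is used here as an assumption; these notes do not say which features Tesseract itself extracts. The class name `ZoningSketch` is hypothetical.

```java
/** Hypothetical zoning-feature sketch; an assumed technique, not taken from Tesseract. */
final class ZoningSketch {

    /** Builds a binary glyph from text rows; '#' marks ink. */
    static boolean[][] parse(String... rows) {
        boolean[][] img = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            img[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) {
                img[y][x] = rows[y].charAt(x) == '#';
            }
        }
        return img;
    }

    /**
     * Splits the glyph into zones-by-zones cells and returns the ink fraction
     * of each cell in row-major order. Cells that cover no pixels stay at 0.
     */
    static double[] zoneDensities(boolean[][] glyph, int zones) {
        int h = glyph.length;
        int w = h == 0 ? 0 : glyph[0].length;
        double[] ink = new double[zones * zones];
        int[] area = new int[zones * zones];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int zone = (y * zones / h) * zones + (x * zones / w);
                area[zone]++;
                if (glyph[y][x]) ink[zone]++;
            }
        }
        for (int i = 0; i < ink.length; i++) {
            if (area[i] > 0) ink[i] /= area[i];
        }
        return ink;
    }

    public static void main(String[] args) {
        boolean[][] glyph = parse("##..", "##..", "...#", "....");
        System.out.println(java.util.Arrays.toString(zoneDensities(glyph, 2)));
    }
}
```

A short feature vector like this is cheaper to compare than raw pixels and tolerates small shifts, which matters when the symbol set is large.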
-
Matching algorithms
- Template matching
- Artificial neural network training
- Structural analysis, feature statistics
- Training data
- http://code.google.com/p/tesseract-ocr/wiki/Documentation
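Of the matching approaches listed above, template matching is the simplest to show. The sketch below is a minimal illustration, assuming glyphs and templates are binary grids of identical size and using Hamming distance (the count of differing cells) as the similarity measure. The class `TemplateMatchSketch` and the tiny 3x3 templates are hypothetical and are not part of Tesseract or its training data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical nearest-template classifier; a sketch, not Tesseract's classifier. */
final class TemplateMatchSketch {

    /** Builds a binary grid from text rows; '#' marks ink. */
    static boolean[][] parse(String... rows) {
        boolean[][] img = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            img[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) {
                img[y][x] = rows[y].charAt(x) == '#';
            }
        }
        return img;
    }

    /** Counts the cells in which two equally sized grids differ (Hamming distance). */
    static int distance(boolean[][] a, boolean[][] b) {
        int diff = 0;
        for (int y = 0; y < a.length; y++) {
            for (int x = 0; x < a[y].length; x++) {
                if (a[y][x] != b[y][x]) diff++;
            }
        }
        return diff;
    }

    /** Returns the label of the closest template, or null if there are no templates. */
    static Character classify(boolean[][] glyph, Map<Character, boolean[][]> templates) {
        Character best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<Character, boolean[][]> e : templates.entrySet()) {
            int d = distance(glyph, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Character, boolean[][]> templates = new LinkedHashMap<>();
        templates.put('I', parse(".#.", ".#.", ".#."));
        templates.put('L', parse("#..", "#..", "###"));
        boolean[][] noisy = parse("#..", "#..", "##.");  // an 'L' with one pixel missing
        System.out.println(classify(noisy, templates));
    }
}
```

Template matching degrades quickly with font or size variation, which is one reason trained classifiers and structural features appear in the same list.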
-
Applications
- http://www.i2ocr.com/
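- Automatic mail-sorting systems based on postal code recognition
- Hanwang; abroad: Toshiba, IBM, HP, NEC
- Light, color
- Human cognition / AI / pattern recognition
- The cosmic holographic law: everything is a mapping
- 0/1 information technology: storage (Si disc), computation (Si CPU), display (LCD): the Tao begets one, one begets two, two begets three, three begets all things.
- Computer graphics: image binarization algorithm (Nobuyuki Otsu, 1979)
- QRCode (two-dimensional barcode)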
- DC
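The binarization algorithm attributed to Otsu (1979) in the list above picks the gray-level threshold that maximizes the between-class variance of the two pixel groups it creates. The following is a minimal sketch of that idea, assuming 8-bit gray values (0..255) in a flat array; the class name `OtsuSketch` and the choice that "brighter than the threshold" maps to true are illustrative assumptions.

```java
/** Hypothetical sketch of Otsu's threshold selection on 8-bit gray values. */
final class OtsuSketch {

    /** Returns the threshold t (0..255) that maximizes the between-class variance. */
    static int threshold(int[] gray) {
        int[] hist = new int[256];
        for (int g : gray) hist[g]++;              // assumes every value is in 0..255

        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

        long total = gray.length;
        long weightLow = 0;                        // pixel count with value <= t
        long sumLow = 0;                           // sum of those pixel values
        double best = -1.0;
        int bestT = 0;
        for (int t = 0; t < 256; t++) {
            weightLow += hist[t];
            sumLow += (long) t * hist[t];
            if (weightLow == 0) continue;          // low class still empty
            long weightHigh = total - weightLow;
            if (weightHigh == 0) break;            // high class empty from here on
            double meanLow = (double) sumLow / weightLow;
            double meanHigh = (double) (sumAll - sumLow) / weightHigh;
            double between = (double) weightLow * weightHigh
                    * (meanLow - meanHigh) * (meanLow - meanHigh);
            if (between > best) {
                best = between;
                bestT = t;
            }
        }
        return bestT;
    }

    /** Maps each pixel to true if it is brighter than the threshold. */
    static boolean[] binarize(int[] gray, int t) {
        boolean[] out = new boolean[gray.length];
        for (int i = 0; i < gray.length; i++) out[i] = gray[i] > t;
        return out;
    }

    public static void main(String[] args) {
        int[] gray = {10, 10, 12, 200, 210, 205};
        int t = threshold(gray);
        System.out.println("threshold = " + t);
        System.out.println(java.util.Arrays.toString(binarize(gray, t)));
    }
}
```

For OCR the ink is usually the darker class, so a caller would typically treat the pixels at or below the threshold as text.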