计算机视觉学习 BOW模型图像搜索

BOW模型 bag of words

1.简介

bag of words是文档的一种建模方法，它可以把一个文档表示成向量数据，从而使计算机处理文档数据更加方便。

例如：

把文档中的单词抽取出来，可以构成一个单词表：

{

“John”: 1,

“likes”: 2,

“to”: 3,

“watch”: 4,

“movies”: 5,

“also”: 6,

“football”: 7,

“games”: 8,

“Mary”: 9,

“too”: 10

}

这个单词表中有10个不同的词，根据单词表中每个词的索引号，可以把两个文档表示成如下的两个含有10个元素的向量：

其中，元素值代表其索引号在单词表中对应的词在文档中出现的次数。比如第一个文档对应的向量，前两个元素值为1和2，1代表“John”在该文档中的出现1次，2代表“likes”在该文档中出现2次。

这样的建模，只提取了单词，忽略了语法和语序，等于把单词一个一个放进一个袋子里，所以是词袋模型。

2, BOW在图像中的应用

详见 /s/blog_65f81ec601012sd5.html

要应用到图像中，我们要把一幅图看成一个文档，图片分割成的patch对应的sift特征看成单词。所以首先要做的是单词表的构造。由于图像中的词汇不像文本文档中的那样是现成的，我们需要首先从图像中提取出相互独立的视觉词汇，这通常需要经过三个步骤：（1）特征检测，（2）特征表示，（3）单词本的生成。如图：

bow模型的实现主要有以下三步：

1）把图像分割成一个个patch，并对每个patch的中心点计算sift特征。sift算法可以提取图像中的局部不变特征，这一步是做dense sift.

2）利用K-Means算法构造单词表。将这些patch的中心点的sift特征聚成为k个类，用这k个聚类中心来构造单词表。因为一幅图像中能提取出成千上万个sift向量，而每个sift向量是128维的，为了减小计算量，需要对这些sift向量做聚类，把相似的patch合并，取聚类中心作代表，构造单词表。

3）利用单词表的中词汇表示图像。利用SIFT算法，可以从每幅图像中提取很多个特征点，这些特征点都可以用单词表中的单词近似代替，通过统计单词表中每个单词在图像中出现的次数，可以将图像表示成为一个K=4维数值向量

3.Bag-of-words模型在计算机视觉的应用

计算机视觉领域的研究者们尝试将同样的思想应用到图像处理和识别领域，建立了由文本处理技术向图像领域的过渡。但是图像分类问题与文本分类问题又有差别。因为文本是由单词组成的，提取文本关键词的过程没有特别限制。但对于图像来讲，如何定义图像的“单词”，较有难度。通过对 BoW 模型进行研究和探索，提出了采用k -means聚类方法，将具有相似性较强的特征归入到一个聚类类别里，定义每个聚类的中心即为图像的“单词”，聚类类别的数量即为整个视觉词典的大小。这样，每个图像就可以由一系列具有代表性的视觉单词来表示。

在应用BoW模型来表述图像时，图像被看作是文档，而图像中的关键特征被看作为“单词”，其应用于图像分类时主要包括三个步骤：

1.特征提取和描述；

2.视觉词典构造；

3.单词表的中词汇表示图像

1.图像特征提取和描述

(1) 规则网格(Regular Grid)方法是特征提取的最简单且有效的方法之一，该方法将图像应用均匀网格进行划分，从而得到一些图像的局部区域特征，此方法在应用于自然场景分类时收到了良好的效果。图2给出了利用规则网格方法得到的特征提取结果。

采用规则网格法的优点在于：(1)可以人为地设定网格的划分级别，得到想要的特征数目；(2)在划分过程中可以对一些特征进行精确的定位；(3)可以充分利用图像的数据信息，最大限度的做到信息的完整性。然而该方法也存在一定的缺点，例如引入了大量的冗余（背景）信息，而降低了对象本身所提供的有用信息的价值。

(2) 兴趣点检测方法；兴趣点检测子和兴趣区域检测子的实现方法都是通过数学计算，去抽取满足一定数学条件的特征点或者区域，常用的检测子有edge-laplace、harris-laplace、hessian-laplace、harris-affine、hessian-affine、MSER、salient regions实际上，针对具体任务不同以及应用的数据库不同，最佳检测子的选择也很不相同。

2.构建视觉词典

利用聚类算法(如：K-Means算法)对步骤1提取的特征描述子构造单词表（词典），特征描述子分为K个簇，以使簇内具有较高的相似度，而簇间相似度较低，将词义相近的词汇合并，作为单词表中的基础词汇，聚类类别的数量K即为整个视觉词典的大小基础词汇的个数。

3.单词表中的词汇表示图像

从每幅图像中提取很多个特征点，这些特征点都可以用单词表中的单词近似代替，通过统计单词表中每个单词在图像中出现的次数，可以将图像表示成为一个K维数值向量。

代码实现

生成视觉词汇

addImage.py

# -*- coding: utf-8 -*-import picklefrom PCV.imagesearch import imagesearchfrom PCV.localdescriptors import siftfrom sqlite3 import dbapi2 as sqlitefrom PCV.tools.imtools import get_imlist#获取图像列表imlist = get_imlist('first1000/')nbr_images = len(imlist)#获取特征列表featlist = [imlist[i][:-3]+'sift' for i in range(nbr_images)]# load vocabulary#载入词汇with open('first1000/vocabulary.pkl', 'rb') as f:voc = pickle.load(f)#创建索引indx = imagesearch.Indexer('testImaAdd.db',voc)indx.create_tables()# go through all images, project features on vocabulary and insert#遍历所有的图像，并将它们的特征投影到词汇上for i in range(nbr_images)[:1000]:locs,descr = sift.read_features_from_file(featlist[i])indx.add_to_index(imlist[i],descr)# commit to database#提交到数据库indx.db_commit()con = sqlite.connect('testImaAdd.db')print (con.execute('select count (filename) from imlist').fetchone())print (con.execute('select * from imlist').fetchone())

cocabulary.py

# -*- coding: utf-8 -*-import picklefrom PCV.imagesearch import vocabularyfrom PCV.tools.imtools import get_imlistfrom PCV.localdescriptors import sift#获取图像列表imlist = get_imlist('first1000/')nbr_images = len(imlist)#获取特征列表featlist = [imlist[i][:-3]+'sift' for i in range(nbr_images)]#提取文件夹下图像的sift特征for i in range(nbr_images):sift.process_image(imlist[i], featlist[i])#生成词汇voc = vocabulary.Vocabulary('ukbenchtest')voc.train(featlist, 1000, 10)#保存词汇# saving vocabularywith open('first1000/vocabulary.pkl', 'wb') as f:pickle.dump(voc, f)print ('vocabulary is:', voc.name, voc.nbr_words)

查询图片

reranking.py

# -*- coding: utf-8 -*-import picklefrom PCV.localdescriptors import siftfrom PCV.imagesearch import imagesearchfrom PCV.geometry import homographyfrom PCV.tools.imtools import get_imlist# load image list and vocabulary#载入图像列表imlist = get_imlist('first1000/')nbr_images = len(imlist)#载入特征列表featlist = [imlist[i][:-3]+'sift' for i in range(nbr_images)]#载入词汇with open('first1000/vocabulary.pkl', 'rb') as f:voc = pickle.load(f)src = imagesearch.Searcher('testImaAdd.db',voc)# index of query image and number of results to return#查询图像索引和查询返回的图像数q_ind = 0nbr_results = 20# regular query# 常规查询(按欧式距离对结果排序)res_reg = [w[1] for w in src.query(imlist[q_ind])[:nbr_results]]print ('top matches (regular):', res_reg)# load image features for query image#载入查询图像特征q_locs,q_descr = sift.read_features_from_file(featlist[q_ind])fp = homography.make_homog(q_locs[:,:2].T)# RANSAC model for homography fitting#用单应性进行拟合建立RANSAC模型model = homography.RansacModel()rank = {}# load image features for result#载入候选图像的特征for ndx in res_reg[1:]:locs,descr = sift.read_features_from_file(featlist[ndx]) # because 'ndx' is a rowid of the DB that starts at 1# get matchesmatches = sift.match(q_descr,descr)ind = matches.nonzero()[0]ind2 = matches[ind]tp = homography.make_homog(locs[:,:2].T)# compute homography, count inliers. if not enough matches return empty listtry:H,inliers = homography.H_from_ransac(fp[:,ind],tp[:,ind2],model,match_theshold=4)except:inliers = []# store inlier countrank[ndx] = len(inliers)# sort dictionary to get the most inliers firstsorted_rank = sorted(rank.items(), key=lambda t: t[1], reverse=True)res_geom = [res_reg[0]]+[s[0] for s in sorted_rank]print ('top matches (homography):', res_geom)# 显示查询结果imagesearch.plot_results(src,res_reg[:8]) #常规查询imagesearch.plot_results(src,res_geom[:8]) #重排后的结果

运行结果:

test_candidates.py

# -*- coding: utf-8 -*-import picklefrom PCV.imagesearch import imagesearchfrom PCV.localdescriptors import siftfrom sqlite3 import dbapi2 as sqlitefrom PCV.tools.imtools import get_imlist#获取图像列表imlist = get_imlist('first1000/')nbr_images = len(imlist)#获取特征列表featlist = [imlist[i][:-3]+'sift' for i in range(nbr_images)]#载入词汇f = open('first1000/vocabulary.pkl', 'rb')voc = pickle.load(f)f.close()src = imagesearch.Searcher('testImaAdd.db',voc)locs,descr = sift.read_features_from_file(featlist[0])iw = voc.project(descr)print ('ask using a histogram...')print (src.candidates_from_histogram(iw)[:10])src = imagesearch.Searcher('testImaAdd.db',voc)print ('try a query...')nbr_results = 12res = [w[1] for w in src.query(imlist[0])[:nbr_results]]imagesearch.plot_results(src,res)

运行结果：

原始图片为