Just a File-Node Class for Our Project's Data Processing
Published: 2019-06-25


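  I'm now in the second semester of my second year of grad school, and I've spent almost two years writing code for this retrieval project. I first got involved with it in the second semester of my senior year of undergrad. In my first year of grad school I had no idea how to do research, I was the only person on the project, and I wrote code haphazardly; I also had a lot on my mind. If I had to sum up that period in one word, it would be: chaos.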

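  The project only really got going in October of the first semester of my second year, when a classmate joined. After our system was finished, I got the urge to tidy up the code I had written before. Because we have to write a paper, we need a lot of data processing for the experimental comparisons. I had already written similar data-processing code in my first year, yet I had to write it again now, because rewriting it took less time than remembering how the old code worked. That is how I noticed how much code I had written over and over, and it is why I started organizing it. What I really want is to use a file-tree data structure plus data-processing algorithms to turn our data-processing module into a pipeline, so that the code can be reused later. Anyone stuck doing grunt work should look for ways to work more efficiently. With that idea in mind, I implemented the class below. It is written in Python, because Python is really convenient for data processing, string handling, and batch jobs.

  Honestly, this class may only be useful to me. So why write a blog post about it? Probably because later, when I supervise a first-year grad student on a paper, I will have them read back through the code we wrote and reuse it. I don't have much time to mentor a new student, so I'll have them read my blog.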

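  My data structure is really just an n-ary tree that mirrors the directory structure. Each node is one file (a folder counts as a file too) and keeps a list of its sons, a link to its next brother, and a link to its parent. Traversal and node insertion are implemented with a stack and a queue. Here is the code; I'll come back and write more comments when I have time: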

import os
from collections import deque

from strOp import strExt        # project-specific helper module, not included in this post
from tblOp import tblConcat     # project-specific helper module, not included in this post


class FileNode:
    def __init__(self, _fileName_s='',
                 _brothers=None,
                 _sons=None,
                 _isDir_b=False,
                 _parent=None
                 ):
        self.fileName_s = _fileName_s
        self.bro = _brothers        # next brother (the sons of one parent form a linked list)
        # A mutable default ([]) would be shared by every node created without _sons.
        self.sons = _sons if _sons is not None else []
        self.isDir_b = _isDir_b
        self.parent = _parent


def addNodeUnderPathUnrecur(root, _path_s):
    ''' inputs:
            root -> the root of the directory tree. It must be the root of the directory tree.
            _path_s -> add the files under _path_s as sons.
                        If _path_s is 'D:\\CS_DATA\\',
                        then every file under it is added as a son of the node named 'CS_DATA'.
        outputs:
            Adds all the files directly under _path_s (non-recursively) as its sons.
    '''
    if _path_s[-1] != '\\':
        _path_s += '\\'
    node = searchNodeFromGivenFilePath(root, _path_s)
    if node is None:
        print('can not find the node %s' % _path_s)
        return
    for fileName_s in os.listdir(_path_s):
        newNode = FileNode(fileName_s, None, [], os.path.isdir(_path_s + fileName_s), node)
        if len(node.sons) > 0:
            node.sons[-1].bro = newNode     # link the previous last son to the new one
        node.sons.append(newNode)
        # No isSameName check: the file system guarantees unique names within one folder.


def searchNodeFromGivenFilePath(root, _path_s):
    ''' inputs:
            root -> must be the root of the directory tree.
            _path_s -> the absolute path of a node, e.g. 'D:\\CS_DATA\\'
        output:
            Searches the tree from root for the node whose fileName_s is 'CS_DATA'.
            The path must be absolute; both 'D:\\CS_DATA\\' and 'D:\\CS_DATA' are accepted.
            Returns None if the node cannot be found.
    '''
    if _path_s[-1] != '\\':
        _path_s += '\\'
    folderStructure = _path_s.split('\\')
    # Only the root has no parent (the last son also has bro == None, so bro is not a valid test).
    if root.parent is not None:
        print('input root is not root of file tree')
        return None
    if folderStructure[0] != root.fileName_s:
        print('the head of input path is not same as root')
        return None
    # Descend one level per path component; the stack never holds more than one node.
    stack = [root]
    for i in range(1, len(folderStructure) - 1):
        node = stack.pop()
        for son in node.sons:
            if folderStructure[i] == son.fileName_s:
                stack.append(son)
                break
        else:
            print('can not find the folder %s' % folderStructure[i])
            return None
    return stack.pop()


def addNodeAsSonFromGivenNode(root, _sonPath_s):
    ''' inputs:
            root -> the root of the directory tree you want to add the node to.
            _sonPath_s -> the absolute path of the node to add.
            Example: 'D:\\CS_DATA\\tree\\' adds the node named 'tree' to its parent 'CS_DATA'.
        outputs:
            The directory tree with the added node.
    '''
    if _sonPath_s[-1] != '\\':
        _sonPath_s += '\\'
    fileStructure = _sonPath_s.split('\\')
    if len(fileStructure) <= 2:
        print('there is no son in the input path %s' % _sonPath_s)
        return
    _sonFileName_s = fileStructure[-2]
    _parentPath_s = '\\'.join(fileStructure[:-2]) + '\\'
    _addNodeAsSonFromGivenNode(root, _parentPath_s, _sonFileName_s)


def _addNodeAsSonFromGivenNode(root, _parentPath_s, _sonFileName_s):
    ''' inputs:
            root -> the root of the directory tree.
            _parentPath_s -> the absolute path of the parent.
            _sonFileName_s -> the file name of the node to add.
        outputs:
            Helper function for addNodeAsSonFromGivenNode.
    '''
    if _parentPath_s[-1] != '\\':
        _parentPath_s += '\\'
    parentNode = searchNodeFromGivenFilePath(root, _parentPath_s)
    if parentNode is None:
        print('can not find the parent folder %s' % _parentPath_s)
        return None
    newNode = FileNode(_sonFileName_s, None, [], os.path.isdir(_parentPath_s + _sonFileName_s), parentNode)
    if isSameName(parentNode, newNode):
        return
    if len(parentNode.sons) > 0:
        parentNode.sons[-1].bro = newNode
    parentNode.sons.append(newNode)


def isSameName(parentNode, sonNode):
    ''' inputs:
            parentNode -> the parent node.
            sonNode -> the son node.
        outputs:
            Returns True if a son with the same name as sonNode is already in parentNode.sons.
    '''
    for node in parentNode.sons:
        if node.fileName_s == sonNode.fileName_s:
            print('has same node %s\\%s -> %s' % (parentNode.fileName_s, node.fileName_s, sonNode.fileName_s))
            return True
    return False


def addNodeUnderPathRecur(root, _path_s):
    ''' inputs:
            root -> the root of the directory tree.
            _path_s -> the absolute path to add, e.g. 'D:\\CS_DATA\\'
        outputs:
            1. Adds every file node under _path_s recursively (breadth-first, with a queue).
            2. _path_s must already exist in the tree rooted at root.
            Returns the node for _path_s.
        Unsafe:
            1. Some system directories cannot be listed, e.g. 'D:\\System Volume Information'.
            2. Duplicate names are not checked when adding, and sons already under _path_s are kept.
            3. So this function relies on the operating system to guarantee unique names.
    '''
    if _path_s[-1] != '\\':
        _path_s += '\\'
    fileStructure = _path_s.split('\\')
    if fileStructure[0] == root.fileName_s and len(fileStructure) == 2:
        print('_path_s can not be the root')
        return
    returnNode = searchNodeFromGivenFilePath(root, _path_s)
    if returnNode is None:
        print('can not find the path')
        return
    queue = deque()
    for fileName_s in os.listdir(_path_s):
        queue.append(FileNode(fileName_s, None, [], os.path.isdir(_path_s + fileName_s), returnNode))
    while queue:
        newNode = queue.popleft()
        currentNode = newNode.parent
        if len(currentNode.sons) > 0:
            currentNode.sons[-1].bro = newNode
        currentNode.sons.append(newNode)
        if newNode.isDir_b:
            fullPathOfNewNode = getFullPathOfNode(newNode)
            for subFileName_s in os.listdir(fullPathOfNewNode):
                queue.append(FileNode(subFileName_s, None, [],
                                      os.path.isdir(fullPathOfNewNode + subFileName_s), newNode))
    return returnNode


def printBrosOfGivenNode(root, _path_s):
    ''' inputs:
            root -> the root of the directory tree.
            _path_s -> e.g. 'D:\\CS_DATA' or 'D:\\CS_DATA\\'
        outputs:
            Prints the brothers of 'CS_DATA' (including 'CS_DATA' itself) for 'D:\\CS_DATA',
            and the sons of 'CS_DATA' for 'D:\\CS_DATA\\'.
    '''
    node = searchNodeFromGivenFilePath(root, _path_s)
    if node is None:
        print('can not find the node')
        return
    if _path_s[-1] != '\\':
        if node.parent is None:
            print('the root has no brothers')
            return
        # Walk the brother linked list, starting from the parent's first son.
        names = []
        sibling = node.parent.sons[0]
        while sibling is not None:
            names.append(sibling.fileName_s)
            sibling = sibling.bro
    else:
        if len(node.sons) == 0:
            print('its sons is empty')
            return
        names = [son.fileName_s for son in node.sons]
    print(','.join(names))


def crtFileTreeFromPath(_path_s):
    ''' inputs:
            _path_s -> e.g. 'D:\\sketchDataset\\'
        outputs:
            Creates the root node 'D:',
            then calls addNodeUnderPathUnrecur to add the files under 'D:\\',
            then calls it again to add the files under 'D:\\sketchDataset\\',
            and so on for every level down to the last separator of _path_s.
    '''
    if _path_s[-1] != '\\':
        _path_s += '\\'
    fileStructure = _path_s.split('\\')
    root = FileNode(_fileName_s=fileStructure[0], _isDir_b=os.path.isdir(fileStructure[0]))
    fileStr = root.fileName_s + '\\'
    addNodeUnderPathUnrecur(root, fileStr)
    for file_s in fileStructure[1:-1]:
        fileStr = fileStr + file_s + '\\'
        addNodeUnderPathUnrecur(root, fileStr)
    return root


def searchLeafNodeUnderGivenNode(root, _path_s):
    ''' inputs:
            root -> the root of the directory tree.
            _path_s -> the absolute path of the node whose leaves you want.
        outputs:
            Returns all the leaves under _path_s (an empty list if the node is not found).
            A leaf is a node that has no sons and is not a directory.
    '''
    node = searchNodeFromGivenFilePath(root, _path_s)
    if node is None:
        print('can not find the node in searchLeafNodeUnderGivenNode')
        return []
    leafs = []
    queue = deque([node])
    while queue:
        currentNode = queue.popleft()
        if len(currentNode.sons) == 0 and not currentNode.isDir_b:
            leafs.append(currentNode)
        else:
            queue.extend(currentNode.sons)
    return leafs


def getFullPathOfNode(givenNode):
    ''' Returns the full (absolute) path of the node, ending with a backslash. '''
    tmpNode = givenNode
    fullPathOfNode = tmpNode.fileName_s + '\\'
    while tmpNode.parent is not None:
        tmpNode = tmpNode.parent
        fullPathOfNode = tmpNode.fileName_s + '\\' + fullPathOfNode
    return fullPathOfNode
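The class above is built around Windows-style `'\\'` paths. As a hedged illustration of the same idea, and not the code used in the project, here is a minimal Python 3 sketch: a node with a children list and a parent link, a breadth-first build driven by a queue, path lookup one component at a time, and breadth-first leaf collection. The names `Node`, `build_tree`, `find` and `leaves` are hypothetical. It drops the `bro` pointer (the children list already keeps sibling order) and uses `os.scandir` and `os.path.join` so it is not tied to one platform.

```python
import os
from collections import deque


class Node:
    """Hypothetical simplified file node: a name, a children list and a parent link."""

    def __init__(self, name, is_dir=False, parent=None):
        self.name = name
        self.is_dir = is_dir
        self.parent = parent
        self.children = []  # a fresh list per node, never a shared default

    def full_path(self):
        """Join the names on the way up to the root with the OS separator."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return os.path.join(*reversed(parts))


def build_tree(root_path):
    """Breadth-first build: each directory taken off the queue gets its entries as children."""
    root = Node(root_path, is_dir=True)
    queue = deque([root])
    while queue:
        node = queue.popleft()
        with os.scandir(node.full_path()) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            # follow_symlinks=False: symlinked directories are not descended into
            child = Node(entry.name, entry.is_dir(follow_symlinks=False), node)
            node.children.append(child)
            if child.is_dir:
                queue.append(child)
    return root


def find(root, rel_path):
    """Descend one path component at a time; accepts '/' or '\\'; returns None if missing."""
    node = root
    for part in rel_path.replace('\\', '/').split('/'):
        if not part:
            continue  # skip empty pieces from leading or trailing separators
        node = next((c for c in node.children if c.name == part), None)
        if node is None:
            return None
    return node


def leaves(node):
    """Breadth-first collection of the non-directory nodes under node."""
    result = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if current.is_dir:
            queue.extend(current.children)
        else:
            result.append(current)
    return result


if __name__ == '__main__':
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, 'category', 'chair'))
        open(os.path.join(tmp, 'category', 'chair', 'm1.off'), 'w').close()
        tree = build_tree(tmp)
        print([leaf.name for leaf in leaves(find(tree, 'category\\chair\\'))])  # prints ['m1.off']
```

Building the whole tree eagerly is fine for a dataset folder but would be slow on an entire drive, which is presumably why the original only expands the levels on the way down (`crtFileTreeFromPath`) and then expands one subtree fully (`addNodeUnderPathRecur`).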

 For example, to compute the validation set for sketch retrieval, I append the following code after the code above:

if __name__ == '__main__':
    root = crtFileTreeFromPath('D:\\sketchDataset\\')
    # Load everything under the category folder and collect its files (the 3D models).
    addNodeUnderPathRecur(root, 'D:\\sketchDataset\\category\\')
    leafs = searchLeafNodeUnderGivenNode(root, 'D:\\sketchDataset\\category\\')
    # immediate parent folder name (category) -> list of model ids it contains
    containModel_t = {}
    for leaf in leafs:
        containModel_t.setdefault(leaf.parent.fileName_s, []).append(
            strExt.extractModelIdWithSuffix(leaf.fileName_s, suffix_s='.off'))
    # sketch name -> category folder name (the first category seen wins)
    categoryNode = addNodeUnderPathRecur(root, 'D:\\sketchDataset\\all_categorized_sketches\\')
    sketchToCate_t = {}
    for son in categoryNode.sons:
        for sketchNode in son.sons:
            sketchName = strExt.extractSketchNameWithSuffix(sketchNode.fileName_s, suffix_s='.txt')
            if sketchName not in sketchToCate_t:
                sketchToCate_t[sketchName] = son.fileName_s
    # Presumably joins sketch -> category with category -> models (helper not shown in this post).
    wanted = tblConcat.concatTableByKey_ValAndVal_Vals(sketchToCate_t, containModel_t)
    print(wanted)

 The result is shown below; for example, the validation models for sketch No. 165 are 'm1646.off', 'm1647.off', and so on:

{'s165.txt': ['m1646.off', 'm1647.off', 'm1648.off', 'm1649.off', 'm1650.off', 'm1651.off', 'm1652.off', 'm1653.off', 'm1654.off', 'm1655.off', 'm1656.off', 'm1657.off', 'm1658.off', 'm1659.off', 'm1660.off', 'm1661.off', 'm1662.off', 'm1663.off', 'm1664.off', 'm1665.off'] ......}
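The example script relies on two helpers that are not shown in this post, `strExt` and `tblConcat.concatTableByKey_ValAndVal_Vals`. Judging only from the output above, the join turns sketch -> category plus category -> [models] into sketch -> [models]. The sketch below is a guess at that behaviour in plain Python, with hypothetical names `group_by_parent` and `compose` and made-up sample data. How the real helper treats a category that has no models is unknown; here such sketches are simply dropped (an assumption).

```python
from collections import defaultdict


def group_by_parent(pairs):
    """Hypothetical helper: (parent_folder, file_name) pairs -> {parent_folder: [file_name, ...]}."""
    table = defaultdict(list)
    for parent, name in pairs:
        table[parent].append(name)
    return dict(table)


def compose(key_to_val, val_to_vals):
    """Hypothetical join: {sketch: category} + {category: [models]} -> {sketch: [models]}.

    Assumption: sketches whose category has no models are dropped.
    Each value is a copy, so two sketches in one category do not share a list.
    """
    return {key: list(val_to_vals[val])
            for key, val in key_to_val.items()
            if val in val_to_vals}


if __name__ == '__main__':
    # Made-up sample data, for illustration only.
    models = group_by_parent([('chair', 'm1.off'), ('chair', 'm2.off'), ('table', 'm3.off')])
    sketches = {'s1.txt': 'chair', 's2.txt': 'table', 's3.txt': 'lamp'}
    print(compose(sketches, models))  # prints {'s1.txt': ['m1.off', 'm2.off'], 's2.txt': ['m3.off']}
```

`group_by_parent` plays the role of the `containModel_t` loop in the script, and `compose` the role of the table join.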

 

Reposted from: https://www.cnblogs.com/Key-Ky/p/4461700.html
