数据结构

字典排序

res = sorted(a.iteritems(), key = lambda kv:(kv[1], kv[0]),reverse=True) 

多字段排序

其实就是将要排序的字段组成一个元组

L = [(12, 12), (34, 13), (32, 15), (12, 24), (32, 64), (32, 11)]
L.sort(key=lambda x: (x[0], -x[1])) 

Operator 模块功能允许多级排序。 例如,按 grade 排序,然后按 age 排序:

from operator import itemgetter, attrgetter

# 元组、字典类型,使用itemgetter
student_tuples = [ ('dave', 'B', 10), ('john', 'A', 15), ('jane', 'B', 12)]  # 元组类型
student_dicts = [{'name':'dave','grade':'B','age':10},{'name':'john','grade':'A','age':15},{'name':'jane','grade':'B','age':12}] # 字典类型

sorted(student_tuples, key=itemgetter(1,2))
sorted(student_tuples, key=itemgetter('grade', 'age'))
# [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

# 考虑 Student 对象,使用attrgetter
class Student:
    def __init__(self, name, grade, age):
            self.name = name
            self.grade = grade
            self.age = age
    def __repr__(self):
            return repr((self.name, self.grade, self.age))

student_objects = [
    Student('jane', 'B', 12),
    Student('john', 'A', 12),
    Student('dave', 'B', 10),
]
sorted(student_objects, key=attrgetter('grade', 'age'))
# [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

显示字符串原始内容,不进行转义

s=["s\nsdf\r","s\n","d\r\n","sdf\n\r"]
for i in s:
    print repr(i)

默认字典 带数据的初始化

 d = defaultdict(lambda: 0, {"1":1 , "0":0 })

python从后向前遍历

for i in range(len(arr)-1, -1, -1):
    print(arr[i])

最后个-1表示逆向,从len(arr)-1也就是数组最后一位开始,到0,因为 range 是开始区间,因此为了取到 0,用-1表示结束位置。

字符串操作

删除特定的字符

问题:

  1. 过滤用户输入中前后多余的空白字符:++++abc123---

  2. 过滤某windows下编辑文本中的\rhello world \r\n

  3. 去掉文本中unicode组合字符、音调:Zhào Qián Sūn Lǐ Zhōu Wú Zhèng Wáng

解决方案

  1. 去掉两端字符串: strip(), rstrip(), lstrip()

    s = '  -----abc123++++       '
    
    # 删除两边空字符
    print(s.strip())
    # 删除左边空字符
    print(s.rstrip())
    # 删除右边空字符
    print(s.lstrip())
    # 删除两边 - + 和空字符
    print(s.strip().strip('-+'))
    
    print("北门吹雪:http://www.cnblogs.com/2bjiujiu/")
    
  2. 删除单个固定位置字符: 切片 + 拼接

    s = 'abc:123'
    # 字符串拼接方式去除冒号
    new_s = s[:3] + s[4:]
    print(new_s)
    
  3. 删除任意位置字符同时删除多种不同字符:replace(), re.sub()

    # 去除字符串中相同的字符
    s = '\tabc\t123\tisk'
    print(s.replace('\t', ''))
    
    print("北门吹雪: http://www.cnblogs.com/2bjiujiu/")
    
    import re
    # 去除\r\n\t字符
    s = '\r\nabc\t123\nxyz'
    print(re.sub('[\r\n\t]', '', s))
    
  4. 多种不同字符映射转换:translate()

    s = 'abc123xyz'
    # a _> x, b_> y, c_> z,字符映射加密
    print(str.maketrans('abcxyz', 'xyzabc'))
    # translate把其转换成字符串
    print(s.translate(str.maketrans('abcxyz', 'xyzabc')))
    
  5. 去掉unicode字符中音调

    #!/usr/bin/python3
    
    import sys
    import unicodedata
    s = "Zhào Qián Sūn Lǐ Zhōu Wú Zhèng Wáng"
    remap = {
        # ord返回ascii值
        ord('\t'): '',
        ord('\f'): '',
        ord('\r'): None
        }
    # 去除\t, \f, \r
    a = s.translate(remap)
    '''
      通过使用dict.fromkeys() 方法构造一个字典,每个Unicode 和音符作为键,对于的值全部为None
      然后使用unicodedata.normalize() 将原始输入标准化为分解形式字符
      sys.maxunicode : 给出最大Unicode代码点的值的整数,即1114111(十六进制的0x10FFFF)。
      unicodedata.combining:将分配给字符chr的规范组合类作为整数返回。 如果未定义组合类,则返回0。
    '''
    cmb_chrs = dict.fromkeys(c for c in range(sys.maxunicode) if unicodedata.combining(chr(c))) #此部分建议拆分开来理解
    b = unicodedata.normalize('NFD', a)
    '''
       调用translate 函数删除所有重音符
    '''
    print(b.translate(cmb_chrs))
    

    https://www.cnblogs.com/2bjiujiu/p/7257744.html

文件读写

读取多个文件

with open('file1') as f1, open('file2') as f2, open('file3') as f3:
    for i in f1:
        j = f2.readline()
        k = f3.readline()
        print(i,j,k)

还有一种优雅一点的写法(已过时):

from contextlib import nested

with nested(open('file1'), open('file2'), open('file3')) as (f1,f2,f3):
    for i in f1:
        j = f2.readline()
        k = f3.readline()
        print(i,j,k)

readline报错 Unicode error

The default for errors is ‘strict’, meaning that encoding errors raise a UnicodeError. Other possible values are ‘ignore’, ‘replace’, ‘xmlcharrefreplace’, ‘backslashreplace’ and any other name registered via codecs.register_error(), see section Error Handlers. For a list of possible encodings, see section Standard Encodings.

In Python 3, pass an appropriate errors= value (such as errors=ignore or errors=replace) on creating your file object (presuming it to be a subclass of io.TextIOWrapper – and if it isn’t, consider wrapping it in one!); also, consider passing a more likely encoding than charmap (when you aren’t sure, utf-8 is always a good place to start).

For instance:

f = open('misc-notes.txt', encoding='utf-8', errors='ignore')

In Python 2, the read() operation simply returns bytes; the trick, then, is decoding them to get them into a string (if you do, in fact, want characters as opposed to bytes). If you don’t have a better guess for their real encoding:

your_string.decode('utf-8', 'replace')

…to replace unhandled characters, or

your_string.decode('utf-8', 'ignore')

to simply ignore them.

遍历文件夹

source
  -- dir
     -- 文件3.txt
  -- 文件1.txt
  -- 文件2.txt

listdir:罗列出指定目录下的一级文件以及目录不递归进入子文件夹

import os
import sys

filesystemencoding = sys.getfilesystemencoding() # 获取文件系统的编码
files = os.listdir("source/")
for file in files:
    file = file.decode(filesystemencoding).encode("utf-8")
    print file

walk:罗列出指定目录下全部的文件、文件夹

import os
import sys

filesystemencoding = sys.getfilesystemencoding()
for pwd, dirs, files in os.walk("source/"):
    for f in files:
        f = f.decode(filesystemencoding).encode("utf-8")
        print f

写入文件头部

new_data = "ssss"
with open("1.txt", 'r+') as fw:
    old_data = fw.read()    # 读取全部文本内容,此时指针会到文件尾部
    fw.seek(0)                    # 通过seek回到文件头部
    fw.write(new_data)
    fw.write(old_data)

https://blog.csdn.net/junbujianwpl/article/details/73194846

打印显示

在一行内显示进度

Python3版本

import sys
import time

for i in range(0, 100):
    time.sleep(0.1)
    print("\r" + "#" * i, end="")
    sys.stdout.flush()

python2版本,逗号是重点

import sys
import time

for i in range(100):
    time.sleep(0.1)
    print "\r"+"#"*i,   # 重点是这个逗号
    sys.stdout.flush()

format 函数的格式化语言

format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

其他

python 查看所有模块

python

>>> help()
Welcome to Python 3.7's help utility!...........
help> modules

使用其他目录的模块

PROJECT_PATH = path.dirname(path.dirname(path.abspath(__file__))) # 当前文件所在目录的前两级目录
sys.path.append(PROJECT_PATH)  # 添加在环境变量中
from conf import config        # 就可以使用conf目录中的config.py文件了

sys.exit 与 exit、quit 的区别

exit和quit函数

这两个函数的作用,就是直接退出程序,可以带一个参数作为程序的返回码,如果不带参数,默认就是返回0.

xinlin@ubuntu:~/test$ python3 -q
>>> exit(111)
xinlin@ubuntu:~/test$ echo $?
111
xinlin@ubuntu:~/test$ python3 -q
>>> quit(222)
xinlin@ubuntu:~/test$ echo $?
222
xinlin@ubuntu:~/test$ python3 -q
>>> exit()
xinlin@ubuntu:~/test$ echo $?
0
xinlin@ubuntu:~/test$ python3 -q
>>> quit()
xinlin@ubuntu:~/test$ echo $?
0

sys.exit()函数

sys.exit()函数会抛出一个SystemExit异常,Python代码可以捕获这个异常来进行一些程序退出前的清理工作sys.exit函数同样可以带一个参数来作为程序的退出码,默认是0.

实践中,完整的使用sys.exit函数的逻辑应该是如下这样的代码:

import sys

def main():
    # do something
    sys.exit(123)
    return

if __name__ == '__main__':
    try:
        main()
    except SystemExit as e:
        if str(e) == '123':
            print('---123---')
            exit(123)

使用sys.exit函数退出程序,可以有个异常捕获机制来做清理扫尾的工作,程序会更加灵活健壮。

评论