python技巧

玖玖

发布于：Oct 25, 2019

2k words

10 min

数据结构

字典排序

res = sorted(a.iteritems(), key = lambda kv:(kv[1], kv[0]),reverse=True)

多字段排序

其实就是将要排序的字段组成一个元组

L = [(12, 12), (34, 13), (32, 15), (12, 24), (32, 64), (32, 11)]
L.sort(key=lambda x: (x[0], -x[1]))

Operator 模块功能允许多级排序。例如，按 grade 排序，然后按 age 排序：

from operator import itemgetter, attrgetter

# 元组、字典类型，使用itemgetter
student_tuples = [ ('dave', 'B', 10), ('john', 'A', 15), ('jane', 'B', 12)]  # 元组类型
student_dicts = [{'name':'dave','grade':'B','age':10},{'name':'john','grade':'A','age':15},{'name':'jane','grade':'B','age':12}] # 字典类型

sorted(student_tuples, key=itemgetter(1,2))
sorted(student_tuples, key=itemgetter('grade', 'age'))
# [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

# 考虑 Student 对象，使用attrgetter
class Student:
    def __init__(self, name, grade, age):
            self.name = name
            self.grade = grade
            self.age = age
    def __repr__(self):
            return repr((self.name, self.grade, self.age))

student_objects = [
    Student('jane', 'B', 12),
    Student('john', 'A', 12),
    Student('dave', 'B', 10),
]
sorted(student_objects, key=attrgetter('grade', 'age'))
# [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

显示字符串原始内容，不进行转义

s=["s\nsdf\r","s\n","d\r\n","sdf\n\r"]
for i in s:
    print repr(i)

默认字典带数据的初始化

 d = defaultdict(lambda: 0, {"1":1 , "0":0 })

python从后向前遍历

for i in range(len(arr)-1, -1, -1):
    print(arr[i])

最后个-1表示逆向，从len(arr)-1也就是数组最后一位开始，到0，因为 range 是开始区间，因此为了取到 0，用-1表示结束位置。

字符串操作

删除特定的字符

问题：

过滤用户输入中前后多余的空白字符：++++abc123---
过滤某windows下编辑文本中的\r：hello world \r\n
去掉文本中unicode组合字符、音调：Zhào Qián Sūn Lǐ Zhōu Wú Zhèng Wáng

解决方案：

去掉两端字符串： strip(), rstrip(), lstrip()

s = '  -----abc123++++       '

# 删除两边空字符
print(s.strip())
# 删除左边空字符
print(s.rstrip())
# 删除右边空字符
print(s.lstrip())
# 删除两边 - + 和空字符
print(s.strip().strip('-+'))

print("北门吹雪:http://www.cnblogs.com/2bjiujiu/")

删除单个固定位置字符：切片 + 拼接

s = 'abc:123'
# 字符串拼接方式去除冒号
new_s = s[:3] + s[4:]
print(new_s)

删除任意位置字符同时删除多种不同字符：replace(), re.sub()

# 去除字符串中相同的字符
s = '\tabc\t123\tisk'
print(s.replace('\t', ''))

print("北门吹雪: http://www.cnblogs.com/2bjiujiu/")

import re
# 去除\r\n\t字符
s = '\r\nabc\t123\nxyz'
print(re.sub('[\r\n\t]', '', s))

多种不同字符映射转换：translate()

s = 'abc123xyz'
# a _> x, b_> y, c_> z，字符映射加密
print(str.maketrans('abcxyz', 'xyzabc'))
# translate把其转换成字符串
print(s.translate(str.maketrans('abcxyz', 'xyzabc')))

去掉unicode字符中音调

#!/usr/bin/python3

import sys
import unicodedata
s = "Zhào Qián Sūn Lǐ Zhōu Wú Zhèng Wáng"
remap = {
    # ord返回ascii值
    ord('\t'): '',
    ord('\f'): '',
    ord('\r'): None
    }
# 去除\t, \f, \r
a = s.translate(remap)
'''
　　通过使用dict.fromkeys() 方法构造一个字典，每个Unicode 和音符作为键，对于的值全部为None
　　然后使用unicodedata.normalize() 将原始输入标准化为分解形式字符
　　sys.maxunicode : 给出最大Unicode代码点的值的整数，即1114111（十六进制的0x10FFFF）。
　　unicodedata.combining:将分配给字符chr的规范组合类作为整数返回。 如果未定义组合类，则返回0。
'''
cmb_chrs = dict.fromkeys(c for c in range(sys.maxunicode) if unicodedata.combining(chr(c))) #此部分建议拆分开来理解
b = unicodedata.normalize('NFD', a)
'''
　　　调用translate 函数删除所有重音符
'''
print(b.translate(cmb_chrs))

https://www.cnblogs.com/2bjiujiu/p/7257744.html

文件读写

读取多个文件

with open('file1') as f1, open('file2') as f2, open('file3') as f3:
    for i in f1:
        j = f2.readline()
        k = f3.readline()
        print(i,j,k)

还有一种优雅一点的写法（已过时）：

from contextlib import nested

with nested(open('file1'), open('file2'), open('file3')) as (f1,f2,f3):
    for i in f1:
        j = f2.readline()
        k = f3.readline()
        print(i,j,k)

readline报错 Unicode error

The default for errors is ‘strict’, meaning that encoding errors raise a UnicodeError. Other possible values are ‘ignore’, ‘replace’, ‘xmlcharrefreplace’, ‘backslashreplace’ and any other name registered via codecs.register_error(), see section Error Handlers. For a list of possible encodings, see section Standard Encodings.

In Python 3, pass an appropriate errors= value (such as errors=ignore or errors=replace) on creating your file object (presuming it to be a subclass of io.TextIOWrapper – and if it isn’t, consider wrapping it in one!); also, consider passing a more likely encoding than charmap (when you aren’t sure, utf-8 is always a good place to start).

For instance:

f = open('misc-notes.txt', encoding='utf-8', errors='ignore')

In Python 2, the read() operation simply returns bytes; the trick, then, is decoding them to get them into a string (if you do, in fact, want characters as opposed to bytes). If you don’t have a better guess for their real encoding:

your_string.decode('utf-8', 'replace')

…to replace unhandled characters, or

your_string.decode('utf-8', 'ignore')

to simply ignore them.

遍历文件夹

source
  -- dir
     -- 文件3.txt
  -- 文件1.txt
  -- 文件2.txt

listdir：罗列出指定目录下的一级文件以及目录，不递归进入子文件夹

import os
import sys

filesystemencoding = sys.getfilesystemencoding() # 获取文件系统的编码
files = os.listdir("source/")
for file in files:
    file = file.decode(filesystemencoding).encode("utf-8")
    print file

walk：罗列出指定目录下全部的文件、文件夹

import os
import sys

filesystemencoding = sys.getfilesystemencoding()
for pwd, dirs, files in os.walk("source/"):
    for f in files:
        f = f.decode(filesystemencoding).encode("utf-8")
        print f

写入文件头部

new_data = "ssss"
with open("1.txt", 'r+') as fw:
    old_data = fw.read()    # 读取全部文本内容，此时指针会到文件尾部
    fw.seek(0)                    # 通过seek回到文件头部
    fw.write(new_data)
    fw.write(old_data)

https://blog.csdn.net/junbujianwpl/article/details/73194846

打印显示

在一行内显示进度

Python3版本

import sys
import time

for i in range(0, 100):
    time.sleep(0.1)
    print("\r" + "#" * i, end="")
    sys.stdout.flush()

python2版本，逗号是重点

import sys
import time

for i in range(100):
    time.sleep(0.1)
    print "\r"+"#"*i,   # 重点是这个逗号
    sys.stdout.flush()

format 函数的格式化语言

format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

其他

python 查看所有模块

python

>>> help()
Welcome to Python 3.7's help utility!...........
help> modules

使用其他目录的模块

PROJECT_PATH = path.dirname(path.dirname(path.abspath(__file__))) # 当前文件所在目录的前两级目录
sys.path.append(PROJECT_PATH)  # 添加在环境变量中
from conf import config        # 就可以使用conf目录中的config.py文件了

sys.exit 与 exit、quit 的区别

exit和quit函数

这两个函数的作用，就是直接退出程序，可以带一个参数作为程序的返回码，如果不带参数，默认就是返回0.

xinlin@ubuntu:~/test$ python3 -q
>>> exit(111)
xinlin@ubuntu:~/test$ echo $?
111
xinlin@ubuntu:~/test$ python3 -q
>>> quit(222)
xinlin@ubuntu:~/test$ echo $?
222
xinlin@ubuntu:~/test$ python3 -q
>>> exit()
xinlin@ubuntu:~/test$ echo $?
0
xinlin@ubuntu:~/test$ python3 -q
>>> quit()
xinlin@ubuntu:~/test$ echo $?
0

sys.exit()函数

sys.exit()函数会抛出一个SystemExit异常，Python代码可以捕获这个异常来进行一些程序退出前的清理工作。sys.exit函数同样可以带一个参数来作为程序的退出码，默认是0.

实践中，完整的使用sys.exit函数的逻辑应该是如下这样的代码：

import sys

def main():
    # do something
    sys.exit(123)
    return

if __name__ == '__main__':
    try:
        main()
    except SystemExit as e:
        if str(e) == '123':
            print('---123---')
            exit(123)

使用sys.exit函数退出程序，可以有个异常捕获机制来做清理扫尾的工作，程序会更加灵活健壮。

更新于：Feb 4, 2020

python

推荐网站

推荐一些发现的高质量学习网站 python python 3.6 入门指南 python cookbook 非常多的实践经验 Python 3 标准库实例教程 pandas 1...

开发环境搭建与效率技巧

Mac、linux、window这 3 大系统的开发环境搭建推荐、遇到的相关问题的解决方案记录通用环境Python 环境macOS安装 anaconda 更多请查看 anaco...