[雪峰磁针石博客] Python 3 Quick-Start Tutorial 2, Data Structures 2: Strings

Quick start

Strings can be enclosed in single quotes or double quotes.

>>> 'spam eggs'  # single quotes
'spam eggs'
>>> 'doesn\'t'  # use \' to escape the single quote...
"doesn't"
>>> "doesn't"  # ...or use double quotes instead
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'

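The interpreter displays a string the way you would type it as a literal. It is normally enclosed in single quotes. If the string contains a single quote and no double quotes, it is enclosed in double quotes instead. Otherwise it stays in single quotes, and any single quotes inside are escaped with a backslash.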
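The interpreter's echo is the same text that the built-in repr() returns. The sketch below prints repr() for the strings from the transcript above, so you can see which quote character Python chooses in each case. The list name `samples` is just for illustration.

```python
# repr() returns the same text the interactive interpreter echoes,
# so it shows which quote character Python picks for each string.
samples = [
    'spam eggs',              # no quotes inside -> single quotes
    "doesn't",                # single quote inside -> double quotes
    '"Yes," he said.',        # only double quotes inside -> single quotes
    '"Isn\'t," she said.',    # both kinds inside -> single quotes, \' escaped
]

for text in samples:
    print(repr(text))
```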

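The print() function produces more readable output. It drops the enclosing quotes and prints escaped and special characters as the characters themselves: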

>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'
>>> print('"Isn\'t," she said.')
"Isn't," she said.
>>> s = 'First line.\nSecond line.'  # \n means newline
>>> s  # without print(), \n is included in the output
'First line.\nSecond line.'
>>> print(s)  # with print(), \n produces a new line
First line.
Second line.
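Two points follow from the transcript above. The escape sequence \n is a single character, and print(repr(s)) reproduces what the interpreter echoes when you type s on its own. A minimal check:

```python
s = 'First line.\nSecond line.'

# \n is one newline character, not a backslash followed by n.
print(len('\n'))   # 1

# print(repr(s)) shows the escaped form the REPL echoes;
# print(s) writes the actual characters, so the newline breaks the line.
print(repr(s))     # 'First line.\nSecond line.'
print(s)
```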

Prefixing a string literal with r creates a raw string, in which backslashes are not treated as escape characters:

>>> r'C:\Program Files\foo\bar\'
  File "<stdin>", line 1
    r'C:\Program Files\foo\bar\'
                               ^
SyntaxError: EOL while scanning string literal
>>> r'C:\Program Files\foo\bar''\\'
'C:\\Program Files\\foo\\bar\\'
>>> 

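A raw string cannot end with a single backslash. In other words, the last character of a raw string cannot be a backslash unless you escape it, and even then the escaping backslash remains part of the string. If the last character before the closing quote is an unescaped backslash, Python treats it as escaping the quote. It therefore cannot tell that the string ends there, and it reports a SyntaxError.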
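The sketch below shows that a raw string keeps its backslashes literally. It also compares three ways to build a path that ends in one backslash. The variable names are only illustrative.

```python
# In a raw string the backslash is kept, so r'\n' is two characters.
print(len(r'\n'), len('\n'))   # 2 1

# Three ways to get a path ending in a single backslash:
path1 = r'C:\Program Files\foo\bar' + '\\'   # raw part + an escaped backslash
path2 = 'C:\\Program Files\\foo\\bar\\'      # ordinary string, every backslash doubled
path3 = r'C:\Program Files\foo\bar' '\\'     # adjacent literals, as in the transcript above

print(path1 == path2 == path3)   # True

# Escaping the final backslash in a raw string keeps BOTH backslashes.
print(r'C:\foo\\'[-2:])          # \\
```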

Strings that span multiple lines are usually written with triple quotes, that is, three single quotes or three double quotes:

print("""\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""")
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to

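Note the backslash right after the opening triple quotes. It keeps the first newline out of the output. In general, a backslash at the end of a line inside the string means the line continues on the next line.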
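Printing repr() makes the effect of the line-continuation backslash visible. The variable names here are only for illustration.

```python
with_backslash = """\
Usage: thingy [OPTIONS]
"""
without_backslash = """
Usage: thingy [OPTIONS]
"""

# The backslash after the opening quotes removes the leading newline.
print(repr(with_backslash))      # 'Usage: thingy [OPTIONS]\n'
print(repr(without_backslash))   # '\nUsage: thingy [OPTIONS]\n'

# A backslash at the end of any line inside the string joins it to the next line.
continued = """one \
two"""
print(repr(continued))           # 'one two'
```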

Strings can be concatenated with the + operator and repeated with *:

>>> 3 * 'un' + 'ium'
'unununium'
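A few more cases of + and * on strings follow. The sketch shows operator precedence, repetition by zero or a negative count, and why a number must be converted explicitly before it can be concatenated.

```python
# * binds more tightly than +, so this is ('un' * 3) + 'ium'.
print(3 * 'un' + 'ium')     # unununium

# Repeating zero or a negative number of times gives the empty string.
print(repr('un' * 0))       # ''
print(repr('un' * -2))      # ''

# + only joins str with str; convert other types with str() first
# ('Python ' + 3 would raise TypeError).
print('Python ' + str(3))   # Python 3
```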

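Two or more string literals placed next to each other are concatenated automatically. This works only with literals, not with variables or expressions. To combine those, use +: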

>>> 'Py' 'thon'
'Python'
>>> prefix = 'Py'
>>> prefix 'thon'  # can't concatenate a variable and a string literal
  File "<stdin>", line 1
    prefix 'thon'
                ^
SyntaxError: invalid syntax
>>> ('un' * 3) 'ium'
  File "<stdin>", line 1
    ('un' * 3) 'ium'
                   ^
SyntaxError: invalid syntax
>>> prefix + 'thon'
'Python'
# This is especially useful for breaking long strings across lines:
>>> text = ('Put several strings within parentheses '
...             'to have them joined together.')
>>> text
'Put several strings within parentheses to have them joined together.'
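Automatic concatenation also has a well-known pitfall. Adjacent literals are joined when the code is compiled, so a forgotten comma in a list of strings merges two items without raising any error. A minimal illustration, using a hypothetical `colors` list:

```python
colors = [
    'red',
    'green'   # <- comma forgotten here, so 'green' and 'blue' are joined
    'blue',
]
print(colors)       # ['red', 'greenblue']
print(len(colors))  # 2
```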

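Strings can be indexed (subscripted), as in C, with the first character at index 0. There is no separate character type: a character is simply a string of length 1. Indices can also be negative, which counts from the right: -1 is the last character, -2 the second-to-last, and so on. An index that does not exist raises IndexError.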

>>> word = 'Python'
>>> word[0]  # character in position 0
'P'
>>> word[5]  # character in position 5
'n'
>>> word[-1]  # last character
'n'
>>> word[-2]  # second-last character
'o'
>>> word[-6]
'P'
>>> word[-16]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> word[16]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
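The sketch below checks two claims from the paragraph above: a "character" is just a str of length 1, and a negative index -i refers to position len(word) - i. The function char_at is a hypothetical helper, not a built-in. It shows one way to catch IndexError when an index may be out of range.

```python
word = 'Python'

# A character is an ordinary str of length 1.
print(type(word[0]), len(word[0]))        # <class 'str'> 1

# word[-i] is the same as word[len(word) - i].
print(word[-2] == word[len(word) - 2])    # True

def char_at(s, i, default=''):
    """Hypothetical helper: return s[i], or default if i is out of range."""
    try:
        return s[i]
    except IndexError:
        return default

print(repr(char_at(word, 16)))            # ''
print(char_at(word, -6))                  # P
```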

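Strings also support slicing, written as two indices separated by a colon. The first index is the start position. It is included and defaults to 0. The second index is the end position. It is excluded and defaults to the end of the string. As a result, s[:i] + s[i:] is always equal to s: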

>>> word[0:2]  # characters from position 0 (included) to 2 (excluded)
'Py'
>>> word[2:5]  # characters from position 2 (included) to 5 (excluded)
'tho'
>>> word[:2] + word[2:]
'Python'
>>> word[:4] + word[4:]
'Python'
>>> word[:2]  # character from the beginning to position 2 (excluded)
'Py'
>>> word[4:]  # characters from position 4 (included) to the end
'on'
>>> word[-2:] # characters from the second-last (included) to the end
'on'
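A short loop confirms that s[:i] + s[i:] == s holds for every split point, including negative and out-of-range ones. The same sketch also shows two common slices that omit indices.

```python
word = 'Python'

# s[:i] + s[i:] == s for every i, even negative or out-of-range values.
for i in range(-10, 11):
    assert word[:i] + word[i:] == word

print(word[:])     # Python  (both indices omitted: the whole string)
print(word[:-2])   # Pyth    (everything except the last two characters)
```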

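One way to remember how slicing works is to think of the indices as pointing between characters. The left edge of the first character is numbered 0, and the right edge of the last character of a string of length n is numbered n. For example: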

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1

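The first row of numbers gives the positions of the indices 0...6 in the string. The second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j.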

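For non-negative indices, the length of a slice is the difference between the two indices, provided both are within bounds. For example, word[1:3] is 'yt', and its length is 3 - 1 = 2.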
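The length rule can be checked exhaustively for a short string. The sketch below tries every pair of in-bounds, non-negative indices i <= j.

```python
word = 'Python'
n = len(word)

# For 0 <= i <= j <= len(word), word[i:j] has exactly j - i characters.
for i in range(n + 1):
    for j in range(i, n + 1):
        assert len(word[i:j]) == j - i

print(word[1:3], len(word[1:3]))   # yt 2
```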

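Unlike indexing, slicing does not raise an error when an index is out of range. Out-of-range slice indices are simply clamped to the boundaries of the string: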

>>> word[4:42]
'on'
>>> word[43:42]
''
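The built-in slice type has an indices(length) method. It returns the (start, stop, step) values that Python actually uses for a sequence of that length, which makes the clamping in the examples above visible.

```python
word = 'Python'

# slice.indices(length) returns the bounds after clamping to the string length.
print(slice(4, 42).indices(len(word)))    # (4, 6, 1)
print(slice(43, 42).indices(len(word)))   # (6, 6, 1): start == stop, so the slice is empty
print(word[4:42] == word[4:6])            # True
```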

Python strings are immutable, so assigning to an indexed position in a string raises an error:

>>> word[0] = 'J'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
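Because item assignment is not allowed, "changing" one character means building a new string from slices. replace_char below is a hypothetical helper written for this illustration. It accepts negative indices and raises IndexError for out-of-range ones, the same way plain indexing does.

```python
def replace_char(s, index, new):
    """Hypothetical helper: return a copy of s with s[index] replaced by new.

    The original string is not modified, because str objects are immutable.
    """
    if not -len(s) <= index < len(s):
        raise IndexError('string index out of range')
    if index < 0:
        index += len(s)   # convert a negative index to the equivalent positive one
    return s[:index] + new + s[index + 1:]

word = 'Python'
print(replace_char(word, 0, 'J'))    # Jython
print(replace_char(word, -1, 'N'))   # PythoN
print(word)                          # Python (unchanged)
```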

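If you need a different string, create a new one. Concatenation with + makes this simple and efficient (note: this operation is not efficient in Jython):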

>>> 'J' + word[1:]
'Jython'
>>> word[:2] + 'py'
'Pypy'
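When a string is built from many pieces rather than two or three, the common Python idiom is str.join. This is a general convention, not something the tutorial above states. Each + creates a new string object, and how costly repeated += is depends on the implementation. CPython can sometimes optimize it, and Jython, as noted above, does not. The list `parts` is only example data.

```python
parts = ['Py', 'thon', ' ', '3']

# str.join builds the result in a single call.
joined = ''.join(parts)
print(joined)              # Python 3

# Same result with +=, creating a new intermediate string on each iteration.
result = ''
for p in parts:
    result += p
print(result == joined)    # True
```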

The built-in function len() returns the length of a string:

>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34
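In Python 3, len() counts characters (Unicode code points), not bytes. The difference shows up with non-ASCII text, such as the Chinese word for "string":

```python
print(len('Python'))                   # 6
print(len('字符串'))                    # 3 characters
print(len('字符串'.encode('utf-8')))    # 9 bytes: each of these characters takes 3 bytes in UTF-8
print(len(''))                         # 0
```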

Exercises

1. Which of the following string definitions is invalid?

A. r'C:\Program Files\foo\bar'  B. r'C:\Program Files\foo\bar\'  C. r'C:\Program Files\foo\bar\'  D. r'C:\Program Files\foo\bar\\'

2. What is the result of min('abcd')?

A. a  B. b  C. c  D. d

3. What is the result of max('abcd3A')?

A. a  B. 3  C. A  D. d
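To check your answers in the interpreter, you can use the sketch below. It uses compile() to find out which raw-string literals from question 1 are valid, and it compares characters by code point (ord()) for the min()/max() questions. Options B and C are copied exactly as listed above. The names `options` and `results` are only for illustration.

```python
# Question 1: compile() raises SyntaxError for an invalid literal.
options = {
    'A': r"r'C:\Program Files\foo\bar'",
    'B': r"r'C:\Program Files\foo\bar\'",
    'C': r"r'C:\Program Files\foo\bar\'",
    'D': r"r'C:\Program Files\foo\bar\\'",
}
results = {}
for label, source in options.items():
    try:
        compile(source, '<quiz>', 'eval')
        results[label] = 'valid'
    except SyntaxError:
        results[label] = 'SyntaxError'
print(results)

# Questions 2 and 3: min() and max() compare characters by code point.
print(min('abcd'), max('abcd3A'))
print([(c, ord(c)) for c in sorted('abcd3A')])   # digits < uppercase < lowercase
```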
