Python String

Python Tutorial 5 – Python String Objects

1. Declaring a Python String Object

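1.1 Single quotes: '

Use single quotes to declare an ordinary string object, or when the string itself contains double quotes ("). This works essentially the same as declaring with double quotes, and it cannot declare a multi-line string object: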

my_string = 'My name is Jimmy'
print(my_string)
# My name is Jimmy

my_string = 'Jimmy is "really" handsome '
print(my_string)
# Jimmy is "really" handsome

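1.2 Double quotes: "

Use double quotes to declare an ordinary string object, or when the string itself contains single quotes ('). This works essentially the same as declaring with single quotes, and it cannot declare a multi-line string object: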

my_string = "My name is Jimmy"
print(my_string)
# My name is Jimmy

my_string = "Jimmy is 'really' handsome "
print(my_string)
# Jimmy is 'really' handsome

1.3 Triple quotes: ''', """

The biggest difference from the two forms above is that triple quotes can declare a multi-line string object:

my_string = """My name is Jimmy"""
print(my_string)
# My name is Jimmy

my_string = '''Jimmy is 'really' handsome'''
print(my_string)
# Jimmy is 'really' handsome

my_string = """My name is Jimmy
Jimmy is "really" handsome"""
print(my_string)
# My name is Jimmy
# Jimmy is "really" handsome

my_string = '''My name is Jimmy
Jimmy is 'really' handsome'''
print(my_string)
# My name is Jimmy
# Jimmy is 'really' handsome
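As a small sketch of what the multi-line form stores (the variable names here are illustrative only): the line break inside a triple-quoted literal becomes an ordinary \n character, so the same string can also be written on one line with an explicit escape.

```python
# A triple-quoted literal keeps the line break as a newline character.
multi_line = """My name is Jimmy
Jimmy is "really" handsome"""

# The same text written on one line with an explicit \n escape.
single_line = 'My name is Jimmy\nJimmy is "really" handsome'

print(multi_line == single_line)
# True
print(multi_line.splitlines())
# ['My name is Jimmy', 'Jimmy is "really" handsome']
```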

1.4 str class

You can also create a string object by calling the str class, which accepts the following parameters:

str(object, encoding=encoding, errors=errors)

1.4.1 parameter – object

The first parameter is any object that can be converted to a string:

my_string = str('Hello world')
print(my_string)
# Hello world

my_string = str(True)
print(my_string)
# True

my_string = str(None)
print(my_string)
# None

my_string = str(range(2))
print(my_string)
# range(0, 2)

1.4.2 parameter – encoding

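When the object passed as the first argument is a bytes object, encoding specifies which codec is used to decode it into a string. The default is UTF-8 (note that str(b) with neither encoding nor errors does not decode at all; it returns the printable form b'...'):

# Declare a bytes object
b = bytes('Café', encoding='utf-8')

# Try to decode the UTF-8 bytes as ASCII
print(str(b, encoding='ascii'))
# UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

If the bytes cannot be decoded, a UnicodeDecodeError is raised. The third argument can be used to handle this error.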

1.4.3 parameter – errors

The third parameter, errors, decides how bytes that cannot be decoded are handled. When decoding, the options are:

  • strict – the default; raises a UnicodeDecodeError exception on failure
  • ignore – drops the undecodable bytes from the result
  • replace – replaces each undecodable byte with the replacement character U+FFFD (�)
  • backslashreplace – inserts a \xNN escape sequence for each undecodable byte

The handlers xmlcharrefreplace and namereplace also exist, but they only apply when encoding a string to bytes (str.encode()), not when decoding.

Source: Python String encode() – Programiz

b = bytes('Café', encoding='utf-8')

print(str(b, encoding='ascii', errors='ignore'))
# Caf
print(str(b, encoding='ascii', errors='replace'))
# Caf��
print(str(b, encoding='ascii', errors='backslashreplace'))
# Caf\xc3\xa9
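A quick sketch, assuming the same b as above: str(b, encoding, errors) gives the same result as b.decode(encoding, errors), and decoding with the codec that was used for encoding returns the original text.

```python
b = bytes('Café', encoding='utf-8')

# Decoding with the matching codec recovers the original text.
text = str(b, encoding='utf-8')
print(text)
# Café

# str(bytes, encoding, errors) and bytes.decode(encoding, errors) agree.
ignored = str(b, encoding='ascii', errors='ignore')
print(ignored == b.decode('ascii', errors='ignore'))
# True

# With errors='replace', each undecodable byte becomes U+FFFD.
replaced = str(b, encoding='ascii', errors='replace')
print(replaced == 'Caf\ufffd\ufffd')
# True
```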

1.5 F-string, f-string (Python 3.6+)

An f-string is usually used to format strings (the next section explains why string formatting is needed), but it can also declare a plain string object. Put F or f in front of any of the quoted forms shown above:

my_string = F'Hello World'
print(my_string)
# Hello World

my_string = f'Hello World'
print(my_string)
# Hello World

my_string = f"""Hello
World"""
print(my_string)
# Hello
# World

2. Python String Formatting

String formatting serves two main purposes:

  • letting a string take in variables
  • converting variables of different types to strings directly

Python offers the following ways to format strings:

2.1 % Operator

The % operator replaces conversion specifiers inside the string with the arguments that follow it. The specifiers are:

  • %c – single character
  • %s – string
  • %d, %i – signed decimal integer
  • %f – decimal floating point
  • %o – octal integer
  • %x – hexadecimal integer using lowercase letters (a-f)
  • %X – hexadecimal integer using uppercase letters (A-F)
  • %u – obsolete; identical to %d
  • %e – exponential notation with a lowercase 'e'
  • %E – exponential notation with an uppercase 'E'
  • %g – uses %e if the exponent is less than -4 or not less than the precision, %f otherwise
  • %G – same as %g, but uses %E instead of %e

The % can be followed directly by an object or a variable of the matching type; if the types do not match, an error may be raised:

my_string = '%d is an integer' % 10
print(my_string)
# 10 is an integer

my_integer = 10
my_string = '%d is an integer' % my_integer
print(my_string)
# 10 is an integer

my_string = '%d is an integer' % 'a'
print(my_string)
# TypeError: %d format: a real number is required, not str

Examples:

my_char = 'a'
my_string = '%c is a character' % my_char
print(my_string)
# a is a character

my_int = 10
my_string = '%i is an integer' % my_int
print(my_string)
# 10 is an integer

my_float = 10.5
my_string = '%f is a float' % my_float
print(my_string)
# 10.500000 is a float

my_int = 15
my_string = "%i in octal is %o" % (my_int, my_int)
print(my_string)
# 15 in octal is 17

my_int = 15
my_string = "%i in hex is %x" % (my_int, my_int)
print(my_string)
# 15 in hex is f

Put a number after % to set the minimum number of characters the value occupies (a negative number left-aligns it):

my_city = "Taipei"
my_string = "%s is my home." % my_city
print(my_string)
# Taipei is my home.

my_city = "Taipei"
my_string = "%10s is my home." % my_city
print(my_string) 
#     Taipei is my home.

my_city = "Taipei"
my_string = "%-10s is my home." % my_city
print(my_string)
# Taipei     is my home.

For %f, put .N after % to set how many digits follow the decimal point (6 by default):

my_float = 10.6354687
my_string = "There is a float: %f" % my_float
print(my_string)
# There is a float: 10.635469

my_float = 10.6354687
my_string = "There is a float: %.2f" % my_float
print(my_string)
# There is a float: 10.64

For %x, put 0N after % to set the minimum width of the hexadecimal number; shorter values are padded with zeros:

my_int = 15
my_string = "%i in hex is %x" % (my_int, my_int)
print(my_string)
# 15 in hex is f

my_int = 15
my_string = "%i in hex is %04x" % (my_int, my_int)
print(my_string)
# 15 in hex is 000f
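Width and precision can be combined in one specifier. The following is a minimal sketch that lines up a small table; the data and the names items and row_format are made up for illustration.

```python
# Hypothetical data for illustration only.
items = [("apple", 1.5), ("watermelon", 12.25)]

# %-12s : left-align the name in 12 characters
# %8.2f : right-align the price in 8 characters with 2 decimals
row_format = "%-12s|%8.2f"

rows = [row_format % (name, price) for name, price in items]
for row in rows:
    print(row)
# apple       |    1.50
# watermelon  |   12.25
```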

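2.2 str.format() (Python 2.6+)

str.format() has been available since Python 2.6 (the empty, auto-numbered {} used below needs Python 2.7 or 3.1+). The string uses {} as replacement fields, and the variables or objects passed as arguments to format() replace them in order: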

my_string = 'My name is {}, my age is {}'.format('Jimmy', 10)
print(my_string)
# My name is Jimmy, my age is 10

name = 'Jimmy'
age = 10
my_string = 'My name is {}, my age is {}'.format(name, age)
print(my_string)
# My name is Jimmy, my age is 10

A replacement field can hold the positional index of the argument to use:

my_string = 'My name is {1}, my age is {0}'.format(10, 'Jimmy')
print(my_string)
# My name is Jimmy, my age is 10

It can also hold the name of a keyword argument:

my_string = 'My name is {name}, my age is {age}'.format(age = 10, name = 'Jimmy')
print(my_string)
# My name is Jimmy, my age is 10

2.2.1 Formatting numbers

Use the following to add , as a thousands separator:

'{:,}'.format(1234567890)
# '1,234,567,890'

2.2.2 Formatting datetime

import datetime
d = datetime.datetime(2010, 7, 4, 12, 15, 58)
'{:%Y-%m-%d %H:%M:%S}'.format(d)
# '2010-07-04 12:15:58'
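The width, alignment, precision, and zero-padding options from the % section have counterparts after the colon inside {}. A short sketch (the variable names are illustrative):

```python
# Width and alignment: < left, > right, ^ center
left = '{:<10}|'.format('Taipei')
right = '{:>10}|'.format('Taipei')
center = '{:^10}|'.format('Taipei')
print(left)
# Taipei    |
print(right)
#     Taipei|
print(center)
#   Taipei  |

# Precision for floats and zero-padded hexadecimal
two_decimals = '{:.2f}'.format(10.6354687)
padded_hex = '{:04x}'.format(15)
print(two_decimals)
# 10.64
print(padded_hex)
# 000f
```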

2.3 F-string / f-string ( Python 3.6+ )

F-strings, supported since Python 3.6, make code more readable.

Suppose a string needs many variables passed in; with str.format() the call becomes rather long:

my_name = 'Jimmy'
my_age = 10
my_city = 'Taipei city'
my_string = 'My name is {my_name}, my age is {my_age}, I\'m living in {my_city}'.format(
    my_name = my_name,
    my_age = my_age,
    my_city= my_city
)
print(my_string)
# My name is Jimmy, my age is 10, I'm living in Taipei city

With an f-string, the variables go straight into the string:

my_name = 'Jimmy'
my_age = 10
my_city = 'Taipei city'
my_string = f'My name is {my_name}, my age is {my_age}, I\'m living in {my_city}'
print(my_string)
# My name is Jimmy, my age is 10, I'm living in Taipei city

You can also evaluate expressions directly inside {}:

my_age = 10
years = 15
my_string = f'My age after {years} years is {my_age + years}'
print(my_string)
# My age after 15 years is 25
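F-strings accept the same format specifications as str.format(), written after a colon inside the braces. A brief sketch with made-up values:

```python
price = 1234567.891
name = 'Jimmy'

# Thousands separator plus two decimals
formatted_price = f'{price:,.2f}'
print(formatted_price)
# 1,234,567.89

# !r applies repr() to the value before formatting
quoted_name = f'{name!r}'
print(quoted_name)
# 'Jimmy'

# Width and alignment work the same way as in str.format()
aligned_name = f'{name:>10}|'
print(aligned_name)
#      Jimmy|
```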

2.4 Template Strings (Standard Library)

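f-strings and str.format() are convenient, but if the format string itself comes from user input, str.format() can give the user a way to read sensitive information:

# A sensitive variable
SECRET = 'this-is-a-secret'

class Error:
    def __init__(self):
        pass

# The user reaches the global variables through the Error instance passed to format()
user_input = '{error.__init__.__globals__[SECRET]}'
err = Error()
print(user_input.format(error=err))
# this-is-a-secret

Template strings avoid this situation.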

from string import Template

user_input = '${error.__init__.__globals__[SECRET]}'
Template(user_input).substitute(error=err)
# ValueError: Invalid placeholder in string: line 1, col 1

Before using a Template string, import the Template class from the string module. To mark a variable for substitution, prefix its name with '$':

from string import Template

s = Template('Hey, $name!')
print(s.substitute(name = 'Jimmy'))
# Hey, Jimmy!
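Template also offers safe_substitute() for the case where not every placeholder has a value. A minimal sketch (the template text is made up for illustration):

```python
from string import Template

t = Template('Hey, $name! You live in $city.')

# substitute() raises KeyError if a placeholder has no value
try:
    t.substitute(name='Jimmy')
except KeyError as exc:
    missing = exc.args[0]
    print(f'missing placeholder: {missing}')
# missing placeholder: city

# safe_substitute() leaves unknown placeholders untouched
partial = t.safe_substitute(name='Jimmy')
print(partial)
# Hey, Jimmy! You live in $city.
```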

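2.5 Conclusion – which string formatting method to use

  • the format string comes directly from user input – use Template strings
  • the format string does not come from user input and you are on Python 3.6+ – use f-strings
  • the format string does not come from user input and you are below Python 3.6 – use str.format()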
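For a side-by-side view, here is a small sketch that builds the same sentence with each of the four methods covered in this section; the variable names are illustrative.

```python
from string import Template

name = 'Jimmy'
age = 10

with_percent = 'My name is %s, my age is %d' % (name, age)
with_format = 'My name is {}, my age is {}'.format(name, age)
with_fstring = f'My name is {name}, my age is {age}'
with_template = Template('My name is $name, my age is $age').substitute(
    name=name, age=age
)

# All four approaches produce the same string
print(with_percent == with_format == with_fstring == with_template)
# True
```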


3. String Operator

3.1 + (Concatenate Operator)

string1 = 'Hello '
string2 = 'World'

print(string1 + string2)
# Hello World
print(string1)
# Hello
print(string2)
# World

3.2 * (Repetition Operator)

string = 'Hi'
print(string * 3)
# HiHiHi

3.3 [] (Slicing Operator)

string = 'Jimmy'

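# Get the character at a given index
print(string[0])
# J

# Get the last character
print(string[-1])
# y

# Get the substring from index 1 up to index 4 (start included, end excluded)
print(string[1:4])
# imm

# Get the substring from index 1 onward (index 1 included)
print(string[1:])
# immy

# Get the substring before index 4 (index 4 excluded)
print(string[:4])
# Jimm

# Reverse the string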
print(string[::-1])
# ymmiJ
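A few more slicing behaviors are worth knowing. This sketch shows a step other than -1, a negative start, and the difference between slicing and indexing past the end of the string:

```python
string = 'Jimmy'

# Every second character (step of 2)
every_second = string[::2]
print(every_second)
# Jmy

# Negative indexes count from the end: the last three characters
last_three = string[-3:]
print(last_three)
# mmy

# A slice that runs past the end is clipped instead of raising
clipped = string[2:100]
print(clipped)
# mmy

# Indexing a single position past the end raises IndexError
try:
    string[100]
except IndexError as exc:
    message = str(exc)
    print(message)
# string index out of range
```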

3.4 in, not in (Membership Operator)

Checks whether a string contains another string:

string = 'Jimmy'
print('J' in string)
# True
print('Ji' in string)
# True
print('j' in string)
# False
print('k' in string)
# False
print('k' not in string)
# True

3.5 \ (Escape Sequence Operator)

Used to represent special characters in a string:

  • \' – single quote (')
  • \" – double quote (")
  • \\ – backslash (\)
  • \n – new line
  • \r – carriage return (moves the cursor to the start of the line)
  • \t – tab
  • \b – backspace
  • \f – form feed
  • \xhh – character with hex value hh
  • \ooo – character with octal value ooo

string = 'Jimmy is handsome \nbut poor'
print(string)
# Jimmy is handsome
# but poor

string = 'Jimmy is \n\tgreat \n\ttall \n\tpoor'
print(string)
# Jimmy is
# 	great
# 	tall
# 	poor
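The hex and octal escapes take digits after the backslash. A short sketch showing both forms, and that each escape sequence counts as a single character:

```python
# \x41 is hex 41 (decimal 65), the letter A
hex_a = '\x41'
print(hex_a)
# A

# \101 is octal 101 (decimal 65), also the letter A
octal_a = '\101'
print(octal_a)
# A

# An escape sequence is a single character in the resulting string
print(len('\n'), len('\t'), len('\\'))
# 1 1 1

# Escaping the quote avoids ending the string early
escaped_quote = 'I\'m Jimmy'
print(escaped_quote)
# I'm Jimmy
```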

4. String Attribute

Ways to get information about a string object

4.1 len()

Returns the length of the string object

greet = 'Hello'
len(greet)
# 5

4.2 r""

Declares a raw string, in which backslashes are kept as-is instead of starting an escape sequence

string = 'Jimmy is handsome \nbut poor'
print(string)
# Jimmy is handsome
# but poor

string = r'Jimmy is handsome \nbut poor'
print(string)
# Jimmy is handsome \nbut poor
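Raw strings are commonly used for Windows paths and regular expressions, where backslashes are frequent. A small sketch with a made-up path:

```python
# In a normal string, \n and \t in this path become control characters.
normal = 'C:\new\table'
raw = r'C:\new\table'

print(len(normal), len(raw))
# 10 12
print(raw)
# C:\new\table

# A raw string is equal to the normal string with doubled backslashes
print(raw == 'C:\\new\\table')
# True
```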

4.3 find

str.find(sub[, start[, end]])

Searches for sub within str[start:end] and returns the index of the first match, or -1 if there is none

string = 'Hello World'
print(string.find('W'))
# 6
print(string.find('W', 0, 5))
# -1
print(string.find('or'))
# 7
print(string.find('z'))
# -1

4.4 index

str.index(sub[, start[, end]])

Like find, but raises ValueError when sub is not found

string = 'Hello World'
print(string.index('W'))
# 6
print(string.index('or'))
# 7
print(string.index('z'))
# ValueError: substring not found

4.5 rfind

str.rfind(sub[, start[, end]])

Like find, but returns the index of the last match; returns -1 if there is none

string = 'Hello World'
print(string.rfind('o'))
# 7
print(string.rfind('o', 0, 5))
# 4
print(string.rfind('z'))
# -1

4.6 rindex

str.rindex(sub[, start[, end]])

Like index, but returns the index of the last match; raises ValueError if there is none

string = 'Hello World'
print(string.rindex('o'))
# 7
print(string.rindex('or'))
# 7
print(string.rindex('z'))
# ValueError: substring not found
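The two families differ only in how they report a miss, which changes how the calling code is written. A minimal sketch of both styles, plus a loop that collects every match position (the names position and positions are illustrative):

```python
string = 'Hello World'

# find() signals "not found" with -1, so check the value before using it
position = string.find('z')
if position == -1:
    print('not found')
# not found

# index() raises instead, so wrap it in try/except
try:
    string.index('z')
except ValueError as exc:
    print(f'error: {exc}')
# error: substring not found

# Collect every position of 'o' by restarting the search after each match
positions = []
start = string.find('o')
while start != -1:
    positions.append(start)
    start = string.find('o', start + 1)
print(positions)
# [4, 7]
```

Checking for -1 matters because -1 is also a valid index: using an unchecked find() result as string[position] would silently return the last character.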

4.7 count

str.count(sub[, start[, end]])

Counts the number of non-overlapping occurrences of sub in str

string = 'Hello World'
print(string.count('l'))
# 3
print(string.count('l', 0, 5))
# 2
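count() only counts non-overlapping matches. The sketch below shows the difference, using a manual scan (one possible approach, not the only one) to count overlapping matches:

```python
string = 'aaaa'

# count() does not count overlapping matches:
# 'aa' is matched at index 0 and index 2 only
non_overlapping = string.count('aa')
print(non_overlapping)
# 2

# Counting overlapping matches needs a manual scan
overlapping = sum(
    1 for i in range(len(string)) if string.startswith('aa', i)
)
print(overlapping)
# 3
```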

5. String Methods – Operations (do not modify the original object)

An introduction to commonly used built-in string methods

5.1 upper

Converts all characters to uppercase

string = 'hello world'
print(string.upper())
# HELLO WORLD
print(string)
# hello world

5.2 lower

Converts all characters to lowercase

string = 'Hello WORLD'
print(string.lower())
# hello world
print(string)
# Hello WORLD

5.3 capitalize

Capitalizes the first character and lowercases the rest

string = 'hello world'
print(string.capitalize())
# Hello world
print(string)
# hello world

5.4 swapcase

Swaps uppercase and lowercase

string = 'Hello World'
print(string.swapcase())
# hELLO wORLD
print(string)
# Hello World

5.5 replace

str.replace(old, new[, count])

Replaces occurrences of a substring in the string

string = 'Hello World'
print(string.replace('H', ''))
# ello World
print(string.replace('Hello', 'Wow'))
# Wow World
print(string)
# Hello World

5.6 strip

str.strip([chars])

The optional chars argument is a set of characters to remove from both ends of the string (any combination of them, not an exact substring); if omitted, whitespace is removed

'   spacious   '.strip()
# 'spacious'

'www.example.com'.strip('cmowz.')
# 'example'

comment_string = '#....... Section 3.2.1 Issue #32 .......'
comment_string.strip('.#! ')
# 'Section 3.2.1 Issue #32'

5.7 removeprefix (3.9+)

Removes a prefix from the string if it is present

string = 'HelloWorld'
print(string.removeprefix('Hello'))
# World
print(string.removeprefix('World'))
# HelloWorld
print(string)
# HelloWorld

string = 'tw-text-bold'
print(string.removeprefix('tw-'))
# text-bold
print(string)
# tw-text-bold

5.8 removesuffix (3.9+)

Removes a suffix from the string if it is present

string = 'HelloWorld'
print(string.removesuffix('Hello'))
# HelloWorld
print(string.removesuffix('World'))
# Hello
print(string)
# HelloWorld
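A common mistake is to use strip() or rstrip() to remove a suffix. Because their argument is a set of characters, they can remove more than intended. A sketch with a made-up filename (removesuffix requires Python 3.9+):

```python
filename = 'happy.py'

# rstrip() treats its argument as a set of characters: it keeps removing
# '.', 'p' and 'y' from the right until it meets a different character
stripped = filename.rstrip('.py')
print(stripped)
# ha

# removesuffix() removes the exact suffix once (Python 3.9+)
without_suffix = filename.removesuffix('.py')
print(without_suffix)
# happy

# lstrip() and rstrip() work on one side only
left_only = '   spacious   '.lstrip()
print(left_only + '|')
# spacious   |
```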

6. String Methods – Converting to other objects

6.1 split

str.split(sep=None, maxsplit=-1)

Turns the string into a list; pass the sep argument to choose the delimiter:

string = 'good great bad'
print(string.split())
# ['good', 'great', 'bad']

string = 'apple,banana,orange'
print(string.split(','))
# ['apple', 'banana', 'orange']
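A short sketch of the maxsplit argument, the difference between split() and split(' '), and str.join() as the reverse operation:

```python
# maxsplit limits the number of splits
limited = 'apple,banana,orange'.split(',', 1)
print(limited)
# ['apple', 'banana,orange']

# split() with no argument collapses runs of whitespace,
# while split(' ') yields empty strings between consecutive spaces
print('good  great'.split())
# ['good', 'great']
print('good  great'.split(' '))
# ['good', '', 'great']

# str.join() is the inverse operation
parts = 'apple,banana,orange'.split(',')
joined = '-'.join(parts)
print(joined)
# apple-banana-orange
```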

6.2 partition

str.partition(sep)

Splits the string into three parts at the first occurrence of sep and returns them as a tuple:

string = 'I\'m very handsome!'
print(string.partition('very'))
# ("I'm ", 'very', ' handsome!')
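A sketch of two related cases: rpartition(), which splits at the last occurrence, and what partition() returns when sep is missing. The sample string is made up for illustration.

```python
string = 'key=value=more'

# partition() splits at the first occurrence of sep
first = string.partition('=')
print(first)
# ('key', '=', 'value=more')

# rpartition() splits at the last occurrence
last = string.rpartition('=')
print(last)
# ('key=value', '=', 'more')

# When sep is missing, the whole string comes back first
missing = string.partition(':')
print(missing)
# ('key=value=more', '', '')
```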

6.3 encode

str.encode(encoding='utf-8', errors='strict')

Encodes the str to bytes using the specified encoding

my_bytes = 'hello'.encode()
print(my_bytes)
# b'hello'
print(my_bytes.__class__)
# <class 'bytes'>
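With non-ASCII text the encoded bytes differ in length from the string. A sketch of a round trip and of the errors parameter when encoding:

```python
text = 'Café'

# 'é' takes two bytes in UTF-8, so the bytes object is longer than the str
data = text.encode('utf-8')
print(data)
# b'Caf\xc3\xa9'
print(len(text), len(data))
# 4 5

# decode() reverses encode() when the same encoding is used
print(data.decode('utf-8') == text)
# True

# The errors parameter also applies to encoding
replaced = text.encode('ascii', errors='replace')
print(replaced)
# b'Caf?'
xml_ref = text.encode('ascii', errors='xmlcharrefreplace')
print(xml_ref)
# b'Caf&#233;'
```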

7. Iterate string

7.1 for … in

string = 'Hello'
for s in string:
    print(s)
# H
# e
# l
# l
# o

7.2 for … in with enumerate

Besides the value, you can also get the index:

string = 'Hello'
for i, s in enumerate(string):
    print(f'index: {i}, char: {s}')
# index: 0, char: H
# index: 1, char: e
# index: 2, char: l
# index: 3, char: l
# index: 4, char: o
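enumerate() also accepts a start value, and it works inside a list comprehension. A small sketch (the name l_positions is illustrative):

```python
string = 'Hello'

# enumerate(..., start=1) begins counting at 1 instead of 0
for i, s in enumerate(string, start=1):
    print(i, s)
# 1 H
# 2 e
# 3 l
# 4 l
# 5 o

# Collect the indexes where a given character appears
l_positions = [i for i, s in enumerate(string) if s == 'l']
print(l_positions)
# [2, 3]
```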

References

Python str() function – GeeksforGeeks
Built-in Types — Python 3.11.4 documentation
Python String encode() – Programiz
[Python] 字串格式化. 前言| by Tsung-Yu | Tom’s blog – Medium
Python String Formatting Best Practices
Python String Interpolation with the Percent (%) Operator
string — Common string operations — Python 3.11.4 documentation
Examples of String Operators in Python – EDUCBA
Iterating each character in a string using Python – Stack Overflow
