forked from zpoint/CPython-Internals
# enum
# contents
* [related file](#related-file)
* [memory layout](#memory-layout)
* [example](#example)
    * [normal](#normal)
    * [en_longindex](#en_longindex)
# related file
* cpython/Objects/enumobject.c
* cpython/Include/enumobject.h
* cpython/Objects/clinic/enumobject.c.h
# memory layout
**enumerate** is a **type**. An instance of **enumerate** is an iterator that wraps another iterable: it iterates over the real delegated object and counts the index at the same time.
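
The claims above can be checked from Python without looking at the C source. This is a minimal sketch; the variable names are only for illustration.

```python
from collections.abc import Iterator

e = enumerate(["I", "am", "handsome"])

# enumerate is a class, not a plain function
is_type = isinstance(enumerate, type)

# an enumerate instance is an iterator, and iter() returns the same object
is_iterator = isinstance(e, Iterator)
is_own_iterator = iter(e) is e

# each step yields an (index, item) pair taken from the delegated object
pairs = list(e)
print(is_type, is_iterator, is_own_iterator, pairs)
```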

# example
## normal
```python
>>> def gen():
...     yield "I"
...     yield "am"
...     yield "handsome"
...
>>> e = enumerate(gen())
>>> type(e)
<class 'enumerate'>
```
Before iterating over **e**, the **en_index** field is 0, **en_sit** stores the actual generator object being iterated, and **en_result** points to a two-element tuple whose items are both `None`.
We will see the meaning of **en_longindex** later.

```python
>>> t1 = next(e)
>>> t1
(0, 'I')
>>> id(t1)
4469348888
```

Now **en_index** becomes 1, and the tuple in **en_result** is the tuple that was just returned. Its elements have changed, but the address stored in **en_result** has not. This is not because of the [free-list mechanism in tuple](https://github.com/zpoint/CPython-Internals/blob/master/BasicObject/tuple/tuple.md#free-list); it is a trick in the **enumerate** iterating function:

```c
static PyObject *
enum_next(enumobject *en)
{
    /* omit */
    PyObject *result = en->en_result;
    /* omit */
    if (result->ob_refcnt == 1) {
        /* the reference count of the tuple object is 1,
           and that only reference is held by the current enumerate object.
           since nobody else needs the two old elements in the tuple,
           we can reset it in place instead of creating a new one
        */
        Py_INCREF(result);
        old_index = PyTuple_GET_ITEM(result, 0);
        old_item = PyTuple_GET_ITEM(result, 1);
        PyTuple_SET_ITEM(result, 0, next_index);
        PyTuple_SET_ITEM(result, 1, next_item);
        Py_DECREF(old_index);
        Py_DECREF(old_item);
        return result;
    }
    /*
       if we reach here, there are other references to the old tuple,
       so we must create a new one instead of resetting it
    */
    result = PyTuple_New(2);
    if (result == NULL) {
        Py_DECREF(next_index);
        Py_DECREF(next_item);
        return NULL;
    }
    PyTuple_SET_ITEM(result, 0, next_index);
    PyTuple_SET_ITEM(result, 1, next_item);
    return result;
}
```

Now it is clear: only the current enumerate object keeps a reference to the old tuple `(None, None)`, so **enum_next** resets element 0 of the tuple to `0` and element 1 to `'I'` instead of allocating a new tuple. That is why the address **en_result** points to stays the same.

Because the reference count of the tuple `(0, 'I') # id(4469348888)` is now 2 (one reference from the enumerate object and one from the variable `t1`), **enum_next** creates and returns a new tuple instead of resetting the one stored in **en_result**.
**en_index** is incremented, and **en_result** still points to the old tuple `(0, 'I') # id(4469348888)`.

```python
>>> next(e)
(1, 'am')
```

After `del t1`, the reference count of the tuple `(0, 'I') # id(4469348888)` drops back to 1, so **enum_next** resets the tuple that **en_result** points to and returns it again.

```python
>>> del t1 # decrement the reference count of the object referenced by t1
>>> next(e)
(2, 'handsome')
```
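
The three steps above can be replayed as a script. This is a sketch, not a guarantee: whether the tuple is reused is an implementation detail of **enum_next**, so `reused` is only printed, and the result may differ between CPython versions or other interpreters. The names `first_id`, `distinct_while_held` and `reused` are made up for this example.

```python
def gen():
    yield "I"
    yield "am"
    yield "handsome"

e = enumerate(gen())

t1 = next(e)          # en_result is handed out; t1 now holds a second reference
first_id = id(t1)

t2 = next(e)          # t1 is still alive, so a fresh tuple has to be returned
distinct_while_held = t2 is not t1

del t1                # only the enumerate object references the old tuple now
t3 = next(e)          # the old tuple may be reset in place and returned again
reused = id(t3) == first_id   # implementation detail, do not rely on it

print(t2, t3, distinct_while_held, reused)
```

The only behaviour you can rely on is that a tuple you still hold is never modified behind your back; the reuse is purely an allocation-saving optimization.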

The termination state is tracked by the object in the **en_sit** field; nothing changes in the **enumerate** object itself.

```python
>>> next(e)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```
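
A small sketch of the same point: the `StopIteration` comes from the delegated generator, and the generator itself stays exhausted afterwards. The names `g`, `raised` and `underlying_exhausted` are only for illustration.

```python
def gen():
    yield "I"
    yield "am"
    yield "handsome"

g = gen()             # keep our own handle on the object stored in en_sit
e = enumerate(g)
pairs = list(e)       # consume everything

try:
    next(e)
    raised = False
except StopIteration:
    raised = True

# the delegated generator is the one that is finished
underlying_exhausted = next(g, None) is None
print(pairs, raised, underlying_exhausted)
```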

## en_longindex
Usually the index of an enumerate object is stored in the **en_index** field. **en_index** is of type **Py_ssize_t** in C, and **Py_ssize_t** is defined as:

```c
#ifdef HAVE_SSIZE_T
typedef ssize_t         Py_ssize_t;
#elif SIZEOF_VOID_P == SIZEOF_SIZE_T
typedef Py_intptr_t     Py_ssize_t;
#else
#   error "Python needs a typedef for Py_ssize_t in pyport.h."
#endif
```

Most of the time it is **ssize_t**, which is typically **int** on a 32-bit OS and **long int** on a 64-bit Unix-like OS.
On my machine it is **long int**.
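
You can check the width of **Py_ssize_t** on your own machine from Python: `sys.maxsize` is documented as the largest value a **Py_ssize_t** variable can take. A minimal sketch (the name `width` is only for illustration):

```python
import sys

PY_SSIZE_T_MAX = sys.maxsize

# number of value bits plus one sign bit gives the width of Py_ssize_t
width = PY_SSIZE_T_MAX.bit_length() + 1

print(width, PY_SSIZE_T_MAX)
```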
What if the index is too big for a signed 64-bit integer to hold?

```python
e = enumerate(gen(), 1 << 62)
```

`1 << 62` still fits, so **en_index** can hold the value.

The maximum value **en_index** can represent is `(1 << 63) - 1` (**PY_SSIZE_T_MAX**).
When the actual index is larger than **PY_SSIZE_T_MAX**, as with the start value `(1 << 63) + 100` below, **en_longindex** is used to represent the actual index.
**en_longindex** points to a [PyLongObject(python type int)](https://github.com/zpoint/CPython-Internals/blob/master/BasicObject/long/long.md), which can represent integers of arbitrary size.

```python
>>> e = enumerate(gen(), (1 << 63) + 100)
```
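
The hand-off between the two fields can be sketched in pure Python. `EnumModel` is a hypothetical stand-in, not CPython code. It assumes that **en_index** is pinned at **PY_SSIZE_T_MAX** once the index no longer fits and that counting then continues in **en_longindex**, with `None` standing in for a C `NULL`. It does not model the tuple reuse in **en_result**.

```python
import sys

PY_SSIZE_T_MAX = sys.maxsize


class EnumModel:
    """Hypothetical pure-Python model of the enumerate fields."""

    def __init__(self, iterable, start=0):
        self.en_sit = iter(iterable)
        self.en_result = (None, None)   # kept only to mirror the initial state
        if -PY_SSIZE_T_MAX - 1 <= start <= PY_SSIZE_T_MAX:
            self.en_index = start       # fits in a Py_ssize_t
            self.en_longindex = None
        else:
            self.en_index = PY_SSIZE_T_MAX   # marker: use en_longindex
            self.en_longindex = start

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the delegated iterator propagates unchanged,
        # before any counter is touched
        item = next(self.en_sit)
        if self.en_index == PY_SSIZE_T_MAX:
            if self.en_longindex is None:
                self.en_longindex = PY_SSIZE_T_MAX
            index = self.en_longindex
            self.en_longindex = index + 1
        else:
            index = self.en_index
            self.en_index += 1
        return (index, item)


big = (1 << 63) + 100
m = EnumModel(["I", "am", "handsome"], big)
print(m.en_index == PY_SSIZE_T_MAX, m.en_longindex == big)
print(list(m) == list(enumerate(["I", "am", "handsome"], big)))
```

Starting at `sys.maxsize - 1` exercises the switch in the middle of an iteration: the first index comes from **en_index**, the following ones from **en_longindex**.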
