1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39 """
40 Provides the implementation for various knapsack algorithms.
41
42 Knapsack algorithms are "fit" algorithms, used to take a set of "things" and
43 decide on the optimal way to fit them into some container. The focus of this
44 code is to fit files onto a disc, although the interface (in terms of item,
45 item size and capacity size, with no units) is generic enough that it can
46 be applied to items other than files.
47
48 All of the algorithms implemented below assume that "optimal" means "use up as
49 much of the disc's capacity as possible", but each produces slightly different
50 results. For instance, the best fit and first fit algorithms tend to include
51 fewer files than the worst fit and alternate fit algorithms, even if they use
52 the disc space more efficiently.
53
54 Usually, for a given set of circumstances, it will be obvious to a human which
55 algorithm is the right one to use, based on trade-offs between number of files
56 included and ideal space utilization. It's a little more difficult to do this
57 programmatically. For Cedar Backup's purposes (i.e. trying to fit a small
58 number of collect-directory tarfiles onto a disc), worst-fit is probably the
59 best choice if the goal is to include as many of the collect directories as
60 possible.
61
62 @sort: firstFit, bestFit, worstFit, alternateFit
63
64 @author: Kenneth J. Pronovici <pronovic@ieee.org>
65 """
66
67
68
69
70
71
72
73
74
76
77 """
78 Implements the first-fit knapsack algorithm.
79
80 The first-fit algorithm proceeds through an unsorted list of items until
81 running out of items or meeting capacity exactly. If capacity is exceeded,
82 the item that caused capacity to be exceeded is thrown away and the next one
83 is tried. This algorithm generally performs more poorly than the other
84 algorithms both in terms of capacity utilization and item utilization, but
85 can be as much as an order of magnitude faster on large lists of items
86 because it doesn't require any sorting.
87
88 The "size" values in the items and capacity arguments must be comparable,
89 but they are unitless from the perspective of this function. Zero-sized
90 items and capacity are considered degenerate cases. If capacity is zero,
91 no items fit, period, even if the items list contains zero-sized items.
92
93 The dictionary is indexed by its key, and then includes its key. This
94 seems kind of strange on first glance. It works this way to facilitate
95 easy sorting of the list on key if needed.
96
97 The function assumes that the list of items may be used destructively, if
98 needed. This avoids the overhead of having the function make a copy of the
99 list, if this is not required. Callers should pass C{items.copy()} if they
100 do not want their version of the list modified.
101
102 The function returns a list of chosen items and the unitless amount of
103 capacity used by the items.
104
105 @param items: Items to operate on
106 @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer
107
108 @param capacity: Capacity of container to fit to
109 @type capacity: integer
110
111 @returns: Tuple C{(items, used)} as described above
112 """
113
114
115 included = { }
116
117
118 used = 0
119 remaining = capacity
120 for key in items.keys():
121 if remaining == 0:
122 break
123 if remaining - items[key][1] >= 0:
124 included[key] = None
125 used += items[key][1]
126 remaining -= items[key][1]
127
128
129 return (included.keys(), used)
130
131
132
133
134
135
137
138 """
139 Implements the best-fit knapsack algorithm.
140
141 The best-fit algorithm proceeds through a sorted list of items (sorted from
142 largest to smallest) until running out of items or meeting capacity exactly.
143 If capacity is exceeded, the item that caused capacity to be exceeded is
144 thrown away and the next one is tried. The algorithm effectively includes
145 the minimum number of items possible in its search for optimal capacity
146 utilization. For large lists of mixed-size items, it's not ususual to see
147 the algorithm achieve 100% capacity utilization by including fewer than 1%
148 of the items. Probably because it often has to look at fewer of the items
149 before completing, it tends to be a little faster than the worst-fit or
150 alternate-fit algorithms.
151
152 The "size" values in the items and capacity arguments must be comparable,
153 but they are unitless from the perspective of this function. Zero-sized
154 items and capacity are considered degenerate cases. If capacity is zero,
155 no items fit, period, even if the items list contains zero-sized items.
156
157 The dictionary is indexed by its key, and then includes its key. This
158 seems kind of strange on first glance. It works this way to facilitate
159 easy sorting of the list on key if needed.
160
161 The function assumes that the list of items may be used destructively, if
162 needed. This avoids the overhead of having the function make a copy of the
163 list, if this is not required. Callers should pass C{items.copy()} if they
164 do not want their version of the list modified.
165
166 The function returns a list of chosen items and the unitless amount of
167 capacity used by the items.
168
169 @param items: Items to operate on
170 @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer
171
172 @param capacity: Capacity of container to fit to
173 @type capacity: integer
174
175 @returns: Tuple C{(items, used)} as described above
176 """
177
178
179 included = { }
180
181
182 itemlist = items.items()
183 itemlist.sort(lambda x,y: cmp(y[1][1], x[1][1]))
184 keys = []
185 for item in itemlist:
186 keys.append(item[0])
187
188
189 used = 0
190 remaining = capacity
191 for key in keys:
192 if remaining == 0:
193 break
194 if remaining - items[key][1] >= 0:
195 included[key] = None
196 used += items[key][1]
197 remaining -= items[key][1]
198
199
200 return (included.keys(), used)
201
202
203
204
205
206
208
209 """
210 Implements the worst-fit knapsack algorithm.
211
212 The worst-fit algorithm proceeds through an a sorted list of items (sorted
213 from smallest to largest) until running out of items or meeting capacity
214 exactly. If capacity is exceeded, the item that caused capacity to be
215 exceeded is thrown away and the next one is tried. The algorithm
216 effectively includes the maximum number of items possible in its search for
217 optimal capacity utilization. It tends to be somewhat slower than either
218 the best-fit or alternate-fit algorithm, probably because on average it has
219 to look at more items before completing.
220
221 The "size" values in the items and capacity arguments must be comparable,
222 but they are unitless from the perspective of this function. Zero-sized
223 items and capacity are considered degenerate cases. If capacity is zero,
224 no items fit, period, even if the items list contains zero-sized items.
225
226 The dictionary is indexed by its key, and then includes its key. This
227 seems kind of strange on first glance. It works this way to facilitate
228 easy sorting of the list on key if needed.
229
230 The function assumes that the list of items may be used destructively, if
231 needed. This avoids the overhead of having the function make a copy of the
232 list, if this is not required. Callers should pass C{items.copy()} if they
233 do not want their version of the list modified.
234
235 The function returns a list of chosen items and the unitless amount of
236 capacity used by the items.
237
238 @param items: Items to operate on
239 @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer
240
241 @param capacity: Capacity of container to fit to
242 @type capacity: integer
243
244 @returns: Tuple C{(items, used)} as described above
245 """
246
247
248 included = { }
249
250
251 itemlist = items.items()
252 itemlist.sort(lambda x,y: cmp(x[1][1], y[1][1]))
253 keys = []
254 for item in itemlist:
255 keys.append(item[0])
256
257
258 used = 0
259 remaining = capacity
260 for key in keys:
261 if remaining == 0:
262 break
263 if remaining - items[key][1] >= 0:
264 included[key] = None
265 used += items[key][1]
266 remaining -= items[key][1]
267
268
269 return (included.keys(), used)
270
271
272
273
274
275
277
278 """
279 Implements the alternate-fit knapsack algorithm.
280
281 This algorithm (which I'm calling "alternate-fit" as in "alternate from one
282 to the other") tries to balance small and large items to achieve better
283 end-of-disk performance. Instead of just working one direction through a
284 list, it alternately works from the start and end of a sorted list (sorted
285 from smallest to largest), throwing away any item which causes capacity to
286 be exceeded. The algorithm tends to be slower than the best-fit and
287 first-fit algorithms, and slightly faster than the worst-fit algorithm,
288 probably because of the number of items it considers on average before
289 completing. It often achieves slightly better capacity utilization than the
290 worst-fit algorithm, while including slighly fewer items.
291
292 The "size" values in the items and capacity arguments must be comparable,
293 but they are unitless from the perspective of this function. Zero-sized
294 items and capacity are considered degenerate cases. If capacity is zero,
295 no items fit, period, even if the items list contains zero-sized items.
296
297 The dictionary is indexed by its key, and then includes its key. This
298 seems kind of strange on first glance. It works this way to facilitate
299 easy sorting of the list on key if needed.
300
301 The function assumes that the list of items may be used destructively, if
302 needed. This avoids the overhead of having the function make a copy of the
303 list, if this is not required. Callers should pass C{items.copy()} if they
304 do not want their version of the list modified.
305
306 The function returns a list of chosen items and the unitless amount of
307 capacity used by the items.
308
309 @param items: Items to operate on
310 @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer
311
312 @param capacity: Capacity of container to fit to
313 @type capacity: integer
314
315 @returns: Tuple C{(items, used)} as described above
316 """
317
318
319 included = { }
320
321
322 itemlist = items.items()
323 itemlist.sort(lambda x,y: cmp(x[1][1], y[1][1]))
324 keys = []
325 for item in itemlist:
326 keys.append(item[0])
327
328
329 used = 0
330 remaining = capacity
331
332 front = keys[0:len(keys)/2]
333 back = keys[len(keys)/2:len(keys)]
334 back.reverse()
335
336 i = 0
337 j = 0
338
339 while remaining > 0 and (i < len(front) or j < len(back)):
340 if i < len(front):
341 if remaining - items[front[i]][1] >= 0:
342 included[front[i]] = None
343 used += items[front[i]][1]
344 remaining -= items[front[i]][1]
345 i += 1
346 if j < len(back):
347 if remaining - items[back[j]][1] >= 0:
348 included[back[j]] = None
349 used += items[back[j]][1]
350 remaining -= items[back[j]][1]
351 j += 1
352
353
354 return (included.keys(), used)
355