Archive for the ‘Programming’ Category
沒有見過比這個更丑的Python程序了
可以大言不慚地吼著說:“我寫的!”
複用不好,很多冗餘代碼。設計不好,接口不明。寫法丑,不夠Pythonic。
唯一的優點是它確實能解決我的問題。數學建模一個題目“最佳陣容”,要算出所有得分會大于某個特定分數的出場陣容。
其實是一個0-1背包問題,但是約束條件比較多,算法自己又不在行,用回溯剪枝寫不出來。Lingo的規劃又只能做出一個解。
還好,動手算了一下,這個組合數也不過百萬級別,這對於現代的電腦完全是小case。所以用最丑的辦法寫了一個出來,當然,還是有點優化的。
程序我也不打算別人看懂,自己以后估計也不會再看,所以就不多解釋了。放這里做個留念,看看自己寫的程序有多丑。
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Liu Wei
def combination(items, n=None):
if n is None:
n = len(items)
for i in xrange(len(items)):
v = items[i:i+1]
if n == 1:
yield v
else:
rest = items[i+1:]
for c in combination(rest, n-1):
yield v + c
def permutation(items, n=None):
if n is None:
n = len(items)
for i in xrange(len(items)):
v = items[i:i+1]
if n == 1:
yield v
else:
rest = items[:i] + items[i+i:]
for p in permutation(rest, n-1):
yield v + p
# highest score for each player
scores = [
[9.4, 9.8, 10, 9.5, 9.4, 9.9, 10, 10, 9.4, 9.7],
[10, 9.4, 9.5, 9.9, 9.7, 9.9, 10, 10, 9.8, 9.5],
[9.8, 10, 9.4, 9.7, 9.3, 9.1, 9.3, 9.9, 10, 9.6],
[9.9, 9.6, 10, 10, 9.9, 9.4, 9.8, 9.8, 9.9, 9.8]
]
# final array
array = [
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]
# sparse array for 6 cols
sparse_array = [
[0, 0, 0, 0, 0, 0], \
[0, 0, 0, 0, 0, 0], \
[0, 0, 0, 0, 0, 0], \
[0, 0, 0, 0, 0, 0], \
]
def __sparse_array_check_valid():
global sparse_array
# 6 cols
for i in xrange(6):
sum = 0
for j in xrange(4):
sum += sparse_array[j][i]
if sum > 3:
return False
return True
def __sparse_array_init():
global sparse_array
sparse_array = [[0, 0, 0, 0, 0, 0], \
[0, 0, 0, 0, 0, 0], \
[0, 0, 0, 0, 0, 0], \
[0, 0, 0, 0, 0, 0], \
]
def __sparse_array_set_row(p, r):
global sparse_array
a, b = p
sparse_array[r][a] = 1
sparse_array[r][b] = 1
def __sparse_array_clear_row(r):
global sparse_array
for i in xrange(len(sparse_array[r])):
sparse_array[r][i] = 0
def sparse_array_output():
global sparse_array
print sparse_array
# generator for sparse arrays
def sparse_array_generator():
global sparse_array
# 4 rows
__sparse_array_init()
for r0 in combination(range(6), 2):
__sparse_array_clear_row(0)
__sparse_array_set_row(r0, 0)
for r1 in combination(range(6), 2):
__sparse_array_clear_row(1)
__sparse_array_set_row(r1, 1)
for r2 in combination(range(6), 2):
__sparse_array_clear_row(2)
__sparse_array_set_row(r2, 2)
for r3 in combination(range(6), 2):
__sparse_array_clear_row(3)
__sparse_array_set_row(r3, 3)
if __sparse_array_check_valid():
yield sparse_array
# sparse_array_output()
def __array_set_col(c):
global array
for i in xrange(len(array)):
array[i][c] = 1
def __array_clean_col(c):
global array
for i in xrange(len(array)):
array[i][c] = 0
def __array_init():
global array
array = [
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]
def set_allround_player(a):
for i in a:
__array_set_col(i)
def set_individual_player(sa, s):
global array
for i in range(4):
for j in range(6):
array[i][s[j]] = sa[i][j]
def array_generator():
global array
for g in combination(range(10), 4):
# select cols for sparse array in big array
s = [i for i in range(10) if i not in g]
__array_init();
# start to manipulate the array
# first select all-round player
set_allround_player(g)
# then cope with individual event player
for sa in sparse_array_generator():
set_individual_player(sa, s)
yield array
def calc_score(a):
sum = 0
for i in range(len(a)):
for j in range(len(a[0])):
sum += a[i][j] * scores[i][j]
return sum
def final_generator():
for a in array_generator():
total_score = calc_score(a)
if total_score >= 236.2:
yield [a, total_score]
import os
os.chdir("d:/liuw/")
f = file("output", 'a')
for solution, total_score in final_generator():
f.write(str(solution))
f.write('\n')
f.write(str(total_score))
f.write('\n\n')
f.close()
用到了大量的generator,因為怕會內存不足。但是總yield一個全局數組總覺得有點別扭。話說我們這次的做題目用到了Lingo、Python、Matlab和C,果然是程序員的命,全部學計算機的人組個隊還是有好處的。
最後在電腦上跑了5分鐘左右,結果就出來了。假如想再快點,還可以做雙核並行計算。不過自己這方面沒什麽研究,程序也要得比較急,就沒有多管。
發一段以前寫的elisp
無任何實用價值,無聊的時候寫的,做ASCII Art動畫。
(defun cinema (file fps)
(interactive "sFile: \nnFPS: ")
(let ((int-time (/ 1.0 fps)))
(find-file file)
(while t
(scroll-up 25)
(sit-for int-time))))
具體效果見 ASCII Art Bad Apple Emacs 。
Cheet sheet for locking
Pete Zaitcev gives the following summary:
- If you are in a process context (any syscall) and want to lock other process out, use a semaphore. You can take a semaphore and sleep ( copy_from_user* or kmalloc(x,GFP_KERNEL) ).
- Otherwise (== data can be touched in an interrupt), use spin_lock_irqsave() and spin_unlock_irqrestore().
- Avoid holding spinlock for more than 5 lines of code and across any function call (except accessors like readb).
RCU revisited
I used to find some papers on Read-Copy Update mechanism, but I didn’t quite get the point, that the Updater is sure that no reader will hold a reference to old data when all CPU has been scheduled at least once so that the Updater can safely reclaim the old data.
I found the answer in Linux Device Drivers 3rd edition, as quoted.
On the read side, code using an RCU-protected data structure should bracket its references with calls to rcu_read_lock and rcu_read_unlock. as a result, RCU code tends to look like:
struct my_stuff *stuff;
rcu_read_lock();
stuff = find_the_stuff(args…);
do_something_with(stuff);
rcu_read_unlock();The rcu_read_lock call is fast; it disables kernel preemption but does not wait for anything. The code that executes while the read “lock” is held must be atomic. No reference to the protected resource may be used after the call to rcu_read_unlock.
[...snip...]
All that remains is to free the old version. The problem, of course, is that code running on other processors may still have a reference to the older data, so it cannot be freed immediately. Instead, the write code must wait until it knows that no such reference can exist. Since all code holding references to this data stucture must (by the rules) be atomic, we know that once every processor on the system has been scheduled at least once, all references must be gone. So that is what RCU does; it sets aside a callback that waits until all processors have scheduled; that callback is then run to perform the cleanup work.
As the bold text mentioned above, RCU is working mostly by the rules.
locale惹的祸
换了一台新机器去做TPM硬件驱动的开发,于是从远程仓库把代码clone下来。试着把代码编译一次,出问题了,满版的错误,费点时间看了一下,其实是stddef.h没有找到造成的。
新机器和原来的机器都是一样系统,工具链也是一样的,代码也已经在我原来的机器上编译通过了。用find找了一下stddef.h,在机器上也是有的。仔细对比了一下通过和不通过的CFLAGS,发现不通过的情况下,头文件路径中没有gcc的包含路径,问题就出在这里了。
再看Makefile,gcc路径是通过如下的方法取得的。
GCC_INSTALL = $(shell gcc --print-serarch-dirs | sed -n -e 's/install: \(.*\)/\1/p')
使用sed匹配了以install开头的行。
但是,我现在这个系统的locale是中文的,所以“install”没有出现,取而代之的是“安装”,匹配不成功,GCC_INSTALL就成了空值,那么也没有把路径加入。
追踪了半天,没想到是locale出了问题。因为我原来一直用的是英文系统,所以没有遇到过这种情况。所以以后装系统,最好还是用英文的locale,不是说崇洋媚外不喜欢中文,要是遇到我今天的情况,估计还是挺恶心的,看着满版的错误,挺郁闷的。
Python中跟踪函数/方法调用的Decorator
我们的项目需要改Xen用户空间的工具。Xen用户空间的工具是用Python写的,代码量还不小,用人肉方法去跟踪执行路径那是很笨的办法。我们需要trace调用,又不能改变函数原来的行为,那么最好的办法就是用decorator了。上网找一下,已经有人做过这样的工作了,我综合一下,改成下面的样子。
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Decorators used to trace function/method calls
# derive from:
# http://wordaligned.org/svn/etc/echo/echo.py
# http://svn.osafoundation.org/chandler/trunk/chandler/util/autolog.py
#
# liuw
# 2010-5-27
import types
# Should use python log facility?
def do_log(s):
print s
def _format_arg_value(arg_val):
arg, val = arg_val
return '%s=%r' % (arg, val)
def log_wrapper(fn, display_name=None):
import functools
if not display_name: display_name = fn.__name__
code = fn.func_code
arg_count = code.co_argcount
arg_names = code.co_varnames[:arg_count]
fn_defaults = fn.func_defaults or list()
arg_defs = dict(zip(arg_names[-len(fn_defaults):], fn_defaults))
@functools.wraps(fn)
def wrapped(*v, **k):
global indent
positional = map(_format_arg_value, zip(arg_names, v))
defaulted = [_format_arg_value((a, arg_defs[a]))
for a in arg_names[len(v):] if a not in k]
nameless = map(repr, v[arg_count:])
keyword = map(_format_arg_value, k.items())
args = positional + defaulted + nameless + keyword
do_log('%s(%s)' % (display_name, ', '.join(args)))
return fn(*v, **k)
return wrapped
def log_class(cls, methodAsFunction=False):
"""Python does not support class decorator, use it this way:
class ClassName:
...
ClassName = log_class(ClassName)
"""
# disallow ALL builtin functions/methods
allow = lambda s: not (s.startswith('__') and s.endswith('__'))
names = cls.__dict__.keys()
for name in names:
if not allow(name): continue
value = getattr(cls, name)
if methodAsFunction and callable(value):
setattr(cls, name, log_wrapper(value))
elif isinstance(value, types.MethodType):
# a normal instance method
if value.im_self == None:
setattr(cls, name, log_wrapper(value))
# class and static method are more complex
# class method
elif value.im_self == cls:
w = log_wrapper(value.im_func,
display_name='%s.%s' % (cls.__name__,
value.__name__))
setattr(cls, name, classmethod(w))
else:
assert False
# static method
elif isinstance(value, types.FunctionType):
w = log_wrapper(value,
display_name='%s.%s' % (cls.__name__,
value.__name__))
setattr(cls, name, staticmethod(w))
return cls
def log_module(m):
# import m if m hasn't been imported
if isinstance(m, str):
d = {}
exec 'import %s' % m in d
import sys
m = sys.modules[m]
names = m.__dict__.keys()
for name in names:
value = getattr(m, name)
if isinstance(value, type):
setattr(m, name, log_class(value))
elif isinstance(value, types.FunctionType):
setattr(m, name, log_wrapper(value))
# ------------------------------TEST----------------------------------
class C1:
@classmethod
def f1(self, a, b=1, c=2):
"""f1"""
pass
@staticmethod
def f2(d , e=3, f=4):
"""f2"""
pass
def f3(self, g):
"""f3"""
pass
C1 = log_class(C1)
@log_wrapper
def test(x):
pass
if __name__ == '__main__':
inst = C1()
inst.f1(10)
inst.f2(20)
inst.f3(1000)
test(321)
functools.wraps这个decorator可以保证被修饰函数/方法的其他attribute与被wrap之前无异。比较让我感慨的是,我参考的有一部分是Chandler的代码,这个项目现在仍健在,实在不易。
散列表长度和素数的关系
看到很多散列表(hash table)的实现中,长度都是一个素数。至于为什么一定要用素数,一下子也没想出所以然来。上网搜索了一下相关的文章,发现很多人也在讨论这个问题。
http://www.cs.unm.edu/~saia/numtheory.html
http://www.codexon.com/posts/hash-functions-the-modulo-prime-myth
http://www.codexon.com/posts/hash-functions-the-modulo-prime-myth-2
http://blog.csdn.net/ilibaba/archive/2009/03/05/3960142.aspx
长度是否要是素数,能不能是合数,这个问题大家的意见得不到统一。支持用素数的人的理由是,素数可以使得散列的分布更加均匀。
自己总结一下看这些文章的心得。一个散列运算可以等价如下式:
hash(k) = mix(k) mod m
散列计算中,求余往往是最后一步,因为没有机器拥有无限的内存,所以必须把结果放入到有限的桶(bucket)中。求余前的运算,假如混淆(mix)得比较好的话,那么m使用素数还是合数,分布是一样的。假如混淆(mix)得不够好的话,那么一个素数可以把一些common pattern加以区别。至于什么样的数据混淆用什么样的mix算法比较好,这个得经统计才可以得出结论。总的来讲,使用一个素数总没有错,虽然运行慢了点,却可以得到相对较好的统计特性。个人看法。
Knuth的The Art of Computer Programming中,也用的是一样的理由推荐用素数的m。手上没有书,他应该没有提到合数的问题。上面也仅仅是我个人的一些想法,没有经过严格的证明,我也没有能力去证明。当然,很欢迎有人来指正我的错误。
手上有《算法导论》,第十一章散列表,对这个问题也没有给出严格的证明。但是它提到这样的一个问题,若m=2^p-1,则所有以2^p为基数的解释的字符串,无论字符位置怎么变化,得出的散列值都是一样的。这个证明比较简单,用同余的性质就可以证明,最后的值,其实是各个字符值的和再对m取余,所以散列值不会随字符的位置发生变化。
UPDATE:
StackOverflow上也有对这个问题的讨论。
http://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus
上面的文章的观点和我总结出来的基本一致。
通过一次malloc生成char**的方法
看到类似这样的代码。
char **arr; int len; arr = produce_array(&len); /* 此时生成了有len个元素的含有内容的数组。用一次free来释放? */ free(arr);
原来看到free()那里,觉得会有问题,我的想法是要为每一个char *分别malloc内存,所以需要有len次free。其实不然,C语言太灵活了,不要被自己原来固有的想法束缚了才好。完全可以通过一次malloc申请到所有的空间,再活用这些空间。
/* 设有 n 个以 NULL 结尾的字符串放在 data 中,它们的总长为 dlen */
char **produce_array(int *len)
{
char **ret;
int i;
*len = n;
ret = malloc(sizeof(char *) * n + dlen);
ret[0] = (char *)(ret + n);
memcpy(ret[0], data, dlen);
for (i = 1; i < n; i++)
ret[i] = ret[i-1] + strlen(ret[i-1]) + 1;
return ret;
}
Milestone 2 is coming
New release, new features.
New stubdom target domb.
Working Dom0 daemon dombd, responsible for communicating with DomB.
Usable domain builder inside DomB.
Coming soon…
Python @staticmethod vs @classmethod
There are two predefined decorators in Python: @staticmethod and @classmethod.
I’m somewhat confused. For someone who has a shallow Java background, static method and class method are just the same.
See this for short.
And see Python PEP and manuals for details.