データ構造とアルゴリズム - 期末試験

計算量 (20 点)

下記左側にある計算量を持つプログラム断片を右側に書きなさい。記述は使い慣れたプログラム言語でも疑似コードでもよい。文法の間違いのではなく、計算量があっているかどうかで採点される。変数の宣言は不要。なお、n*n、n*log(n) などは使用禁止。
For each of the computational complexities below to the left, on the right write a program fragment that has this complexity. You can use a programming language you know well or pseudocode. Not syntax errors, but whether the complexity matches will be graded. You do not need to declare variables. However, you are not allowed to use expressions such as n*n or n*log(n).

例： `O`(`n`)	for (i=0; i<n; i++) sum += i*i;
`O`(`n`²)	for (i=0; i<n; i++) for (j=0; j<n; i++) { sum += i*i; }
`O`(`n`³)	for (i=0; i<n; i++) for (j=0; j<n; j++) for (k=0; k<n; k++) sum += i*j+k;
`O`(log `n`)	for (i=n; i>0; i/=2) { sum += i*i; }
`O`(`n`!)	fun(n) { if (n<=1) sum += n; else for (i=0; i<n; i++) fun(n-1); }
`O`(5ⁿ)	fun(n) { if (n<=1) sum += 1; else for (i=0; i<5; i++) fun(n-1); }

`O` 記法の理由 / The Reason for the Big-O Notation (15 点)

O 記法は定義上いくつかの特徴を持っている。その特徴はなぜアルゴリズムの評価に適しているのかそれぞれ説明しなさい。
Big-O notation by definition has a number of properties. For each property, explain why it is suitable for the evaluation of algorithms.

定数の倍数が無視される。 / A constant multiple is ignored.

色々な計算機の実行速度が違うし、プログラム言語や実装によっても速度の差が出るし、最近は並列化も注目されているが、それによって一定の倍数の差が出るが、それ以上の差が出ません。アルゴリズムの評価では実装の詳細のではなく、アルゴリズム (すなわちアイディア) そのものを評価したいので、定数の倍数を無視した方がいい。

一定数までの差が無視される。 / Difference up to a constant is ignored.

一定の差 (例えば数ミリ秒や数秒) は初期化や準備の実装の差などで出る可能性があるが、データの項目数が増えると全体の実行時間がどんどん伸びるので、一定時間の差はアルゴリズムの比較に必要な根本的な差のではなく、実装の詳細による差に過ぎないので、無視した方がいいです。

対数の底が不要。 / The base of a logarithm is irrelevant.

これは定数の倍数を無視できることの結果である。なぜかというと、対数の場合、底の変更は定数との掛け算で可能である。具体例としては log_ax = log_bx * log_ab である。log_ab は定数なので、無視可能。

B 木と B+ 木の比較 (15 点) / Comparing B-Trees and B+ Trees

次のデータと外部メモリを与えられたと仮定する。 / Assume you have the following data and external memory:
合計の項目の数 / total number of data items: 131072 (= 2¹⁷)
データの大きさ (一項目) / data size (one item): 640 バイト / bytes
ページの大きさ / page size: 2048 バイト / bytes; キーの大きさ / key size: 8 バイト / bytes
ページ番号の大きさ / page number size: 8 バイト / bytes; 最低占有率 / minimum occupancy: 0.5

B+ 木の場合、授業で習ったやり方に沿って途中結果も書きながら、木の最大の高さを計算しなさい。
For a B+ tree, calculate the maximum tree height giving intermediate results the way you learned it in the lecture.

葉には最低2のデータ項目を入れられる。よって、葉の最低の数が 65536。内部節には最低 2048/(8+8)/2 = 64個の子がある。 65536/64 = 1024 (下から2層目); 1024/64= 16 (2層目); 16/64=1 (4層目、根) なので、木の高さは葉を含め最大で 4 である。

B 木の場合、途中結果も書きながら、木の最大の高さを計算しなさい (ヒント: 根から計算してみる)。
For a B tree, calculate the maximum tree height giving intermediate results (hint: try starting from the root).

B 木は B+ 木と違って、葉と内部節の違いがありません。一つの節には最低 1つのデータとその鍵、そして 2つの子のページ番号が入る。ルートのページにデータが 1つ、次の層に 2個 (合計3)、3層目に 4個 (合計 7 = 2³-1)、 4層目に 8個 (合計 15 = 2⁴-1) などなので、合計 2¹¹ 個のためには 2¹²-1 の容量を用意する12層が最大必要。

一つのページにアクセスするに 0.1 秒がかかる場合、二つの木を比べなさい。 / Compare both trees assuming a page access time of 0.1 seconds.

B+ 木の方が階層の数、すなわちデータにアクセスするために取り寄せる必要なページ数が B 木の 1/3 で、時間が 1.2秒から 0.4 秒に減らせる。前者は人間に非常に気付きやすいのに、後者は気付きにくい。

ソートのアルゴリズムの説明 (20 点)

下記の左側にある図はソートのアルゴリズムの可視化の実行中に撮ったものである。図ごとにソートのアルゴリズムの名前、図にみられるアルゴリズムの特徴 (なぜそのアルゴリズムなのかの理由) を述べ、計算量などを含めアルゴリズムを説明しなさい。
The figures below on the left shows the visualization of an in-progress sorting algorithm. For each figure, give the name of the algorithm, the characteristics of the algorithm as they can be seen in the figure (why this figure represents that algorithm), and an explanation of the algorithm including its computational complexity.

ヒープソート。なぜかというと、ヒープソートは配列を使ってヒープ (優先順位キュー) を作成し、一番「優先が高い」(大きい) ものを順次ヒープの根から取り出し、ヒープを小さくしながら取り出したものをヒープの後ろに回している。図のところには完全に整列されている右側 (配列の後ろ) の12個ほどの要素、左側にヒープがみられる。ヒープの根は一番左側で次に右に移動されるものが整列されてないもののなかの一番おおきもので、ヒープの右側の方は、ヒープの特徴のせいで完全ではないが徐々に小さくなる傾向が見られる。計算量が O(n log n)。

選択ソート。左側に全く変更が必要なくてすでに完璧にソート済みの領域ができて、右側に左側の領域の一番大きい要素より大きいものしか残らない。そこから選択ソートが次々と残りの一番小さい要素を選んで、整列済みの領域のすぐ右にある要素と交換し、左側の領域を大きくする。計算量が O((n²) で、他の同じ計算量のバブルソートや挿入ソートよりデータの移動が少ないので、特に一項目あたりのデータの量が大きい時に有利。

挿入ソート。左の部分ではその中が整列済みですが、まだまだ右側の方から左側の部分に挿入する要素が一杯残っている。挿入ソートは整列済みの部分のすぐ右の要素を、適切な場所が見つかるまでに整列済みの要素を右から一つずつ左にすらしながら挿入するアルゴリズム。計算量は O(n²) だが、既に殆ど整列されているデータの場合に非常に有効である。

アルゴリズムの設計方針 (30 点)

下記の課題のため、下記の最初の四つの部分問題のアルゴリズムの設計方針や原理を使って、それぞれアルゴリズムの概略を提案し、予想される計算量とその根拠、設計方針を使った場合の問題点について述べなさい。
For the problem explained below, in the first four subproblems propose ideas for algorithms using the respective algorithm design strategies. In each case, indicate the expected time complexity and the reason for it, and the problems when using this design strategy.

注：この問題の目的は、できるだけ最適な結果のアルゴリズムの発見よりも、設計方針の知識の応用である。
Caution: The purpose of this problem is to show you can apply your knowledge about design strategies, rather than to design the best algorithm.

課題: 地図作製の会社の社長から、ある地図にある n 個の地区の色を、隣り合っている地区が同じ色にならないように、かつできるだけ色の種類が少ないように塗るように依頼されている。地区の数 n は膨大で、百万に近い。
Problem: The CEO of a map-making company asks you to color a map with n areas so that neighboring areas do not have the same color and the total number of colors is as low as possible. The map is very big, with the number n of areas being close to one million.

分割統治法 / divide and conquer

地図を大きく二つの領域に分け、これを再帰的に地区が一つになるまで行う。一つの地区の色を適当に決める。そこから領域を統合する。その時、境界線に沿って色が同じ場所を見つけ、それを検討したうえで片方の領域の色を問題がなくなるように置き換える。計算量が O(n log n) になる可能性が高いが、色の数が最低になる保証がない。

動的計画法 / dynamic programming

隣り合っている地区の二つ組、三つ組、… の順に色を決める。三つ組の色を決める時に、二つ組の色の付け方を考慮する、…。計算量は O(n³) やそれ以上になるが、色の数が最低になる保証がない (分割統治よりはよくなるかも)。

貪欲アルゴリズム / greedy algorithm

地図の片隅からスタートし、一つの地区の色を決める。そこから徐々に隣の地区に移動し、既に色を決めている隣り合っている地区を確認し、それと違う色を選ぶ。地区を「隣の地区の色が決まっている割合」の順に順位キューで管理し、順番を決める。順位キューの関係で計算量が O(n log n) になる。問題は、色の数が最低に抑えられない可能性がある。

遺伝的アルゴリズム / genetic algorithm

色を多く使っても、何個かの解を作っておく。そこから世代ごとに複数の回の組み合わせ、交叉、突然変異を交え、解を増やす。自然淘汰で悪い解を中心に解を減らす。問題は、解を沢山作るに相当時間がかかると解の組み合わせは簡単ではない、のである。

社長に「大事な地図で、絶対に明日までに三色の解を出してくれ。」と言われたときの返事の概略を書いてください。(三色で色付け可能かどうかが NP 問題であることが知られている。) Please write your answer when the CEO tells you "This is a very important map, so you have to find a solution with three colors by tomorrow at all costs.". (It is known that deciding whether a map can be colored with 3 colors or not is an NP problem.)

大変申し訳ございませんが、三色で色を決められるかどうかは NP 問題で、この問題と同様な問題が数多く存在している。この問題を素早く (すなわち、多項式時間で) 解ける方法が万が一見つかったら世紀の大発見になり、一億円以上の賞も貰える。多くの専門家はそれが無理だと思っているが、その証明もいまだできていません。四色や五色なら明日までは問題ありません。

Knuth-Morris-Pratt のアルゴリズム / Knuth-Morris-Pratt Algorithm (18 点)

Knuth-Morris-Pratt のアルゴリズムの実際の文字列照合の部分の疑似コードを書きなさい。入力として text とその長さ t_length, pattern とその長さ p_length、さらに事前計算の結果 next が与えられている。
In pseudocode, write the string matching part of the Knuth-Morris-Pratt algorithm. The inputs are text and its length t_length, pattern and its length p_length, and the result of the precomputation next.

algorithm knuth_morris_pratt
    inputs: text, t_length, (text to be searched and its length)
            pattern, p_length, (pattern to be found and its length)
            next (result of precomputation)
    output: position of pattern in text or not_found
  
    text_index ← pattern_index ← 0                    (initialization)
    
    while text_index < t_length
        if text[text_index] = pattern[pattern_index] (characters match)
            text_index ← text_index + 1              (move by one position)
            pattern_index ← pattern_index + 1
            if pattern_index = pattern_length        (at end of pattern?)
                return text_index - pattern_index    (pattern found)
            end
        else
            pattern_index ← next[pattern_index]
            if pattern_index = -1                    (special case: )
                text_index ← text_index + 1  (move to next character in text)
                pattern_index ← 0            (back to start of pattern)
            end
        end
    end
    return not_found
end

授業へのコメント / Comment about Course (6 点)

この授業で一番分かりにくかったことを書きなさい。 (決まった正解はありません。)
What was most difficult to understand in this course? (there is no definite answer)

@@@@
@@@@

この授業で一番勉強になったことを書きなさい。 (決まった正解はありません。)
What topic in this course was most interesting for you? (there is no definite answer)

@@@@
@@@@

用語の説明 / Explanation of Terms (32 点)

次の英語の用語に相当する日本語の用語を書いて、簡単に説明しなさい。日本語の用語ではカタカナをできるだけ避けなさい。
For each of the English terms below, write the corresponding Japanese term and give a short explanation. For the Japanese terms, avoid Katakana as much as possible.

universal hashing: 万能ハッシュ法; 攻撃しにくいように、ハッシュ関数の詳細を実行ごとに変える方法
reduction: 帰着; ある問題を別の問題に置き換えて解くことで問題の相対的な難しさを示す
decision tree: 決定木; 問題の解決に必要な決定の構造を表す木
balanced tree: 平衡木; 幅や深さの制限である程度バランスを保たれた木構造
simulated annealing: 焼き鈍し法; 解を乱数の変化で、「温度」を提げながら見つける近似アルゴリズム
top-down: 下向き; データ構造や問題を上から下に向かって処理する
average complexity: 平均計算量; 色々な入力を想定した場合の平均的な計算量
open addressing: 解番地法; ハッシュ表の激突対策の一つ、Ruby で去年から採用
brute force algorithm: 腕力法; 効率的なアルゴリズムとの比較に使われる単純なもの
inorder: 通りがけ順、(二分) 木の辿り方で、親を左のこの後、右のこの前に処理する
amortized analysis: 償却分析; アルゴリズムの計算量を操作ごとのではなく全体的に見る分析
asymptotic lower bound: 漸近的下界; 最低でもこのぐらい増えること、アルゴリズムより問題の場合に多く使用
independent set (problem): 独立集合 (問題); NP 問題の一つで、グラフでできるだけ多くの繋がってない頂点の発見
recurrence relation: 漸化式; 再帰的に定義されている数学的な関数、解けにくい場合がある
local optimum: 局所的な最適解; 山登り法という近似アルゴリズムの場合発生する問題
stable sorting: 安定な整列法; 整列基準において同等の様子の順番を入れ替えない整列法

アルゴリズムの表現方法 / Expression of Algorithms (8 点)

プログラミング言語 C をアルゴリズムの表現に使う教科書がある。その選択肢の様々な利点と問題点を説明しなさい。
There are coursebooks that use the programming language C to express algorithms. Explain the various advantages and problems of this choice.

利点 / advantages: プログラミング言語なので、正確で、そのまま (コンパイル後) 実行が可能。C は実行が早いので、他のプログラミン言語より早い。 (コメントなど以外) 自然言語に依存しないので、いろいろな母語の人に読みやすい。

問題点 / problems:プログラミング言語一般でもその問題があるが、 C は特に細かいので、アルゴリズムのアイディアより細かくなりがちで、アルゴリズムそのものが見えなくなる可能性がある。特に変数 (の型) の宣言が必要なので、例えば整列の場合、整数の整列のアルゴリズムがあっても、これを浮動小数点数に応用したい場合書き直さないといけません。

青山学院大学

計算量 (20 点)

`O` 記法の理由 / The Reason for the Big-O Notation (15 点)

B 木と B+ 木の比較 (15 点) / Comparing B-Trees and B+ Trees

青山学院大学

ソートのアルゴリズムの説明 (20 点)

アルゴリズムの設計方針 (30 点)

青山学院大学

Knuth-Morris-Pratt のアルゴリズム / Knuth-Morris-Pratt Algorithm (18 点)

授業へのコメント / Comment about Course (6 点)

用語の説明 / Explanation of Terms (32 点)

アルゴリズムの表現方法 / Expression of Algorithms (8 点)

青山学院大学

計算量 (20 点)

O 記法の理由 / The Reason for the Big-O Notation (15 点)

B 木と B+ 木の比較 (15 点) / Comparing B-Trees and B+ Trees

青山学院大学

ソートのアルゴリズムの説明 (20 点)

アルゴリズムの設計方針 (30 点)

青山学院大学

Knuth-Morris-Pratt のアルゴリズム / Knuth-Morris-Pratt Algorithm (18 点)

授業へのコメント / Comment about Course (6 点)

用語の説明 / Explanation of Terms (32 点)

アルゴリズムの表現方法 / Expression of Algorithms (8 点)

`O` 記法の理由 / The Reason for the Big-O Notation (15 点)