データ構造とアルゴリズム - 期末試験

用語の説明 / Explanation of Terms (40 点)

次の英語の用語に相当する日本語の用語を書いて、簡単に説明しなさい。日本語の用語ではカタカナをできるだけ避けなさい。部分問題 1.11 以降は説明を長くしなさい。
For each of the English terms below, write the corresponding Japanese term and give a short explanation. For the Japanese terms, avoid Katakana as much as possible. For subproblem 1.11 and later, give a longer explanation.

tractable problem: 手に負える問題; 計算量が多項式である問題 (指数的計算量との区別のため)
selection sort: 選択整列; 整列されてない領域から一番小さい項目を選び、先頭に置くことを繰り返す整列法
asymptotic growth: 漸近的な増加; 関数の引数の増加によって関数の結果がおよそどの勢いで増えるか
perfect hash function: 完全ハッシュ関数; あらかじめ決まったデータに対し、激突が起こらないハッシュ関数
brute force algorithm: 総当たりアルゴリズム; 効率的なアルゴリズムとの比較に使われる単純なもの
amortized analysis: 償却分析; アルゴリズムの計算量を操作ごとのではなく全体的に見る分析
simulated annealing: 焼き鈍し法; 「温度を下げる」ふりをする一般的な最適近似法
dictionary: 辞書; データ項目をキーを使って探索、挿入、削除できる抽象データ型
preorder: 行きがけ順、(二分) 木の辿り方で、親を全ての子の前に処理する
cryptographic hash function: 暗号技術的ハッシュ関数; 電子著名などに使う強いハッシュ関数
recurrence relation: 漸化式; 再帰的に定義されている数学的な関数、解けにくい場合がある
@@@
invariant: 不変条件; データ構造など保たされる条件で、挿入や削除などで修復する必要がある
@@@
priority queue: 優先順位キュー; 順位の優先度が一番高い項目を常に先に出す抽象データ型
@@@
chaining: 連鎖法; ハッシュ表で激突回避の方法。ハッシュ関数の結果、同じビンになる項目を連続リストでつなぐ。ハッシュ表の効率を保つため、平均のリストの長さを例えば5に制限する。
abstract datatype: 抽象データ型；データ上の操作だけが定義; 実装はいろいろ
@@@

B 木と B+ 木の比較 / Comparing B-Trees and B+Trees (合計 20 点)

B 木と B+ 木の違いを説明しなさい。 / Explain the difference between B-Trees and B+Trees (5 点)

B 木の場合、全ての層の節はキー、データ、そして他の節への参照をもつ。
B+ 木の場合、データは全部一番下の層に保存され、それより上の層にはキーと他の節への参照しかありません。

ページの大きさが 4096バイト、キーの大きさが12バイト、キーを抜く一項目あたりのデータの大きさが ~~500~~496 バイト、ページ番号の大きさが4バイト、データ項目の数が2¹⁸の場合、 B 木と B+ 木の最低の高さを求めなさい。
For a page size of 4096 bytes, a key size of 12 bytes, data per item of 496 bytes, page numbers of 4 bytes, and 2¹⁶¹⁸ data items, calculate the minimum height of a B-tree and B+tree.

B 木 / B-tree (5 点): 木の高さを最低にするには、各節にできるだけデータなどを詰める必要がある。一節当たり、最大8個のキー、データ、ページ番号が詰められる。各層のデータの数が最大で 2³, 2⁶, 2⁹, 2¹², 2¹⁵ まで5層でまだ足りないので、@@@@合計6層が必要。

B+ 木 / B+tree (5 点): 葉の層にはデータ8項目を詰められるので、葉の数が 2¹⁵ となる。
内部節からは 4096/16=256=2⁸子を参照できるので、
層の大きさは 2⁷、1と続いて、@@@合計3層となる。

なぜ木の高さが重要なのか説明しなさい。 / Explain why the height of the tree is important. (5 点)

B/B+木は外部メモリ (HD/SSD) に使われる。各ノードは外部メモリのページに相当し、一つのページはまとめて読み込むことが可能ですが、一定時間がかかる (HD の場合例えば 100ms)。データを探索するため、木をルートから葉までたどることが必要なので、木の高さが探索の実際にかかる時間に大きな影響を及ぼす。

授業へのコメント / Comment about Course (6 点)

この授業で一番分かりにくかったことを書きなさい。 (決まった正解はありません。)
What was most difficult to understand in this course? (there is no definite answer)

@@@@
@@@@

この授業で一番勉強になったことを書きなさい。 (決まった正解はありません。)
What topic in this course was most interesting for you? (there is no definite answer)

@@@@
@@@@

整列のアルゴリズムの選択 / Selection of Sorting Algorithms (合計 22 点)

下記の三つの事情に合わせて、整列のアルゴリズムを実装するように頼まれた。アルゴリズムを選んで、そのアルゴリズムの名前、仕組み、選んだ理由と実装の祭に特に注意すべき点を記述しなさい。
You have been asked to implement a sorting algorithm for each of three situations below. Select algorithms and give name, workings, the reason for the selection, and points that need special care during implementation.

ライブラリに組み込むために本格的な実装が欲しい A serious implementation is needed, to become part of a library (10 点):: クイックソートを選ぶ。クイックソートはデータを量ではなくできるだけ乱数で選んだ分割要素を境界に再帰的二に分割する。最悪の計算量は O(n²) だが、平均では O(n log n)。他にも同じ計算量のアルゴリズムはありますが、クイックソートは操作が少ないので、その中で一番早い。注意すべき点は特に分割要素の選択 (三項目の中央値や乱数) だが、他に同値の項目の取り扱いやスタックが深すぎてオーバフローになる問題に対応しないと、本格的な実装にならない。
扱うデータ項目が常にほとんど整列された並び順である。そのための素早く実装できる早いアルゴリズムが欲しい The data items are always almost in sorted order. A quick implementation for such situations is needed (6 点):: 挿入ソートを選ぶ。挿入ソートは項目を左から一つ一つその左側にすでに整列できている部分に挿入する。そのため、ほとんど整列されているデータの場合、非常に効果的である。実装もわりと簡単で、注意点が少ない。しかし最悪の場合、計算量は O(n²) なので、本当にデータが常に殆ど整列されているかをよく確認しないといけない。
扱うデータの量が膨大で、現代の計算機のメインメモリに入りきらない The data to be processed is really huge, so that it doesn't fit into the main memory of today's computers (6 点):: マージソートを選ぶ。マージソートはデータを再帰的に分割し、部分ごとに整列し、整列済み部分を順番を考慮して併合するアルゴリズムなので、ハードディスクやテープなど二次メモリが必要な時に最適です。計算量は O(n log n)。部分の整列ですが、メインメモリに入る大きさの部分に分け、その部分にクイックソートなどを使うと全体の効率が上がる。ハードディスクの数に合わせて二分割のではなく、三分割なども考えられる。

アルゴリズムの設計方針 / Algorith Design Strategies (25 点)

下記の課題のため、下記の最初の四つの部分問題のアルゴリズムの設計方針や原理を使って、それぞれアルゴリズムの概略を提案し、予想される計算量とその根拠、設計方針を使った場合の問題点について述べなさい。
For the problem explained below, in the first four subproblems propose ideas for algorithms using the respective algorithm design strategies. In each case, indicate the expected time complexity and the reason for it, and the problems when using this design strategy.

注：この問題の目的は、最適な結果をもたらすアルゴリズムの発見よりも、設計方針の知識の応用である。
Caution: The purpose of this problem is to show you can apply your knowledge about design strategies, rather than to design the best algorithm.

課題: 最長共通部分列問題は、n 個の記号列から、全て共通の一番長い記号列を探す問題です。「部分列」は元の列の順番を維持するが、選んだ記号はもともとの列で連続する必要がありません。 / The longest common subsequence problem finds a longest common subsequence of symbols from n symbol sequences. "Subsequence" means that the order of the symbols in the original sequences is maintained, but these symbols do not have to be contiguous.

分割統治法 / divide and conquer

@@@

貪欲アルゴリズム / greedy algorithm

@@@

遺伝的アルゴリズム / genetic algorithm

@@@

動的計画法 / dynamic programming

@@@

この問題は NP困難問題であると想定される。社長から、「この問題を明日まで絶対に解けないと会社が大変な損失を被る可能性がある。」と言われたときの返事の概略を書いてください。
It is assumed that this problem is NP-hard. Write your answer when your company's president tells you "If we don't solve this problem until tomorrow, our company may suffer a terrible financial loss.".

@@@大変申し訳ございませんが、三色で色を決められるかどうかは NP 問題で、この問題と同様な問題が数多く存在している。この問題を素早く (すなわち、多項式時間で) 解ける方法が万が一見つかったら世紀の大発見になり、一億円以上の賞も貰える。多くの専門家はそれが無理だと思っているが、その証明もいまだできていません。四色や五色なら明日までは問題ありません。

単純な文字列照合アルゴリズム / Simple String Matching Algorithm (合計 22 点)

疑似コード / Pseudocode (12 点)

長さ n の文書 text の中に、長さ m のパターン pattern の最初の出現場所を見つけ、その場所または not_found を返すアルゴリズムの疑似コードを書きなさい。なお、単純なアルゴリズムでも、必要以上遅くならないようなものにすること。
Write the pseudocode for a simple string matching algorithm finding the location of the first occurrence of pattern of length m within text of length n, or returning not_found. Make sure you do not use an overly slow variant of the algorithm.

function simple_string_matching
    inputs: text (length n),
            pattern (length m)

    output: shift or not_found
    
    for i from 0 to n-m
        for j from 0 to m-1
            if text[i+j] ≠ pattern[j]
                break
            end
            if j = m-1
                return i
            end
        end
    end
    return not_found
end

上記のアルゴリズムが極端に遅くなる入力の例とその場合の計算量を書きなさい。
Give an example input where the above algorithm can be extremely slow, and its computational complexity. (4 点)

例えば text が xxxxxxxxxxx....xxxy で、パターンが xxxxxxxxxxy の場合、計算量が O(m·n) になる。

前の部分問題の計算量を回避できるアルゴリズムを選んで、明記したうえでどうやってその問題を回避するかを説明しなさい。/ Name an algorithm that avoids the computational complexity in the previous subproblem, and explain how it can be avoided. (6 点)

Rabin-Carp アルゴリズムを選択します。文書の中のパーターンの長さの部分列のハッシュ関数を一つ前のハッシュ関数の結果から O(1) で計算し、パーターンのハッシュ関数と比較する。ハッシュ関数の設計に気を付ければ、全体の計算量は O(n) となる。

Fibonacci 関数の計算量
Computational Complexity of the Fibonacci Function (合計 17点)

Fibonacci 関数 fib(n) は、引数が大きくなると結果が膨大になることでよく知られています。大きな数値の場合、データの項目数や演算の数だけではなく、一項目のデータの桁数も計算量に考慮する必要があります。このような数値はメモリの一つのワードではなくて、配列で表現します。多くのプログラム言語でそのような膨大な数値の計算を扱うライブラリがあります。
The Fibonacci function fib(n) is known for the fact that as its parameter grows, the result gets really huge. In the case of huge numbers, not only the number of data items and operations, but also the number of digits have to be considered for the computational complexity. Such numbers cannot be represented in a single memory word, but are represented as arrays. Libraries for such operations are available in many programming languages.

Fibonacci 関数は次のように定義されています。 / The Fibonacci function is defined as follows:

fib(0) = 0
fib(1) = 1
fib(n) = fib(n-1) + fib(n-2) [n≥2]

普通の筆算の方法を想定して、二つの k 桁の整数の足し算・引き算の計算量とその理由を書きなさい。 / For two numbers of k digits, assuming long addition/substraction as usual, give and explain the computational complexity of addition/substraction. (2 点)

桁ごとに足し算・引き算して、繰り越しを考慮する必要があるので、計算量は O(k)。

普通の筆算の方法を想定して、二つの k 桁の整数の掛け算の計算量とその理由を書きなさい。 / For two numbers of k digits, assuming long multiplication as usual, give and explain the computational complexity of multiplication. (2 点)

二つの数値の書く桁をそれぞれ掛け合わせる必要があるので、計算量は O(k²)。

上記の fib(n) の定義をそのまま再帰的なプログラムにして、それにメモ化を加え、なおかつ数値の桁数を考慮しない場合、fib(n) の計算量とその説明を書きなさい。 / Converting the above definition of fib(n) directly into a program, and adding memoization, give and explain the computational complexity of fib(n) ignoring the number of digits. (3 点)

0<=x<=n のところの fib(x) の関数の値をそれぞれ一回計算しますので、計算量は O(n)。

Fibonacci 関数の桁数はおおむね n に比例し、十進法の場合では ≤0.3n であると分かる。直前の部分問題で、桁数を考慮するとどうなるか書きなさい。 / It is known that the number of digits of the Fibonacci function is roughly proportional to n, and for decimal numbers, ≤0.3n. Write how and why the answer to the previous subproblem changes when considering the number of digits. (4 点)

n が大きくなるにつれ、結果の桁数が大きくなるので、計算量は O(n*0.3n/2) = O(n²) となる。

Fibonacci 関数について、次の性質が知られている / The following law about the Fibonacci function is known:
fib(a+b) = fib(a) · fib(b+1) + fib(a-1) · fib(b) (∀n≥0, m≥1)
この性質を使った Fibonacci 関数を早く計算するアルゴリズムを提案し、その設計方針や数値の桁数を考慮しない計算量などを書きなさい。 / Using this law, propose a fast algorithm to calculate the Fibonacci function. Give the algorithm design strategy used and the computational complexity ignoring the number of digits. (6 点)

n = a+b で、a と b をほぼ同じ大きさにすると、分割統治法が使えます。a と b を適切に選ぶと、n を半分にする場合、最大3つの Fibonacci 関数を計算することとなる。計算量は O(3^{log₂ n}) 以下になると分かる。

3-SAT 問題と帰着
The 3-SAT Problem and Reduction (合計 12 点)

次の 3-SAT 問題を独立集合問題に帰着しなさい。 / Reduce the following 3-SAT problem to an Independent Set problem. (6 点)

(x ∨ ￢y ∨ z) ∧ (x ∨ ￢y ∨ ￢z) ∧ (￢x ∨ y ∨ ￢z)

帰着と NP 問題の分析においての帰着の役割を説明しなさい。
Explain reduction and its role in the analysis of NP problems. (6 点)

複数の問題の相対的な難しさを調査するときに、ある問題のデータを他の問題へ変更し、解いて、解答をもとの問題に戻すことを帰着と言う。 NP 問題の分析で帰着によって、ほとんどすべての問題の難しさが同じで、それより難しい NP 問題がないということが分かって、このような問題を NP 完全問題と言う。

プログラミング言語によるアルゴリズムの表現
Algorithm Representation using a Programming Language (9 点)

アルゴリズムをプログラミング言語で表現する場合があります。 / Algorithms can be represented by in a Programming Language.

その場合の利点を説明しなさい。 / Describe the advantages of this approach. (2 点)

プログラミン言語は正確性が高く、実行が可能。

その場合の欠点を書きなさい。 / Describe the disadvantages of this approach. (2 点)

プログラミング言語を知る必要があって、場合によって必要以上に細かい。

自分的にアルゴリズムの表現に使いたいプログラミン言語を選んで、明記の上で選んだ理由を書きなさい。
Select the programming language that you would use personally to represent algorithms, give its name, and describe the reasons for your selection. (5 点)

@@@ 決まった正解がないが、ただ「一番知っている」とか「好きだから」では足りません。

青山学院大学

用語の説明 / Explanation of Terms (40 点)

B 木と B+ 木の比較 / Comparing B-Trees and B+Trees (合計 20 点)

授業へのコメント / Comment about Course (6 点)

青山学院大学

整列のアルゴリズムの選択 / Selection of Sorting Algorithms (合計 22 点)

アルゴリズムの設計方針 / Algorith Design Strategies (25 点)

青山学院大学

単純な文字列照合アルゴリズム / Simple String Matching Algorithm (合計 22 点)

Fibonacci 関数の計算量
Computational Complexity of the Fibonacci Function (合計 17点)

青山学院大学

3-SAT 問題と帰着
The 3-SAT Problem and Reduction (合計 12 点)

プログラミング言語によるアルゴリズムの表現
Algorithm Representation using a Programming Language (9 点)

青山学院大学

用語の説明 / Explanation of Terms (40 点)

B 木と B+ 木の比較 / Comparing B-Trees and B+Trees (合計 20 点)

授業へのコメント / Comment about Course (6 点)

青山学院大学

整列のアルゴリズムの選択 / Selection of Sorting Algorithms (合計 22 点)

アルゴリズムの設計方針 / Algorith Design Strategies (25 点)

青山学院大学

単純な文字列照合アルゴリズム / Simple String Matching Algorithm (合計 22 点)

Fibonacci 関数の計算量Computational Complexity of the Fibonacci Function (合計 17点)

青山学院大学

3-SAT 問題と帰着The 3-SAT Problem and Reduction (合計 12 点)

プログラミング言語によるアルゴリズムの表現Algorithm Representation using a Programming Language (9 点)

Fibonacci 関数の計算量
Computational Complexity of the Fibonacci Function (合計 17点)

3-SAT 問題と帰着
The 3-SAT Problem and Reduction (合計 12 点)

プログラミング言語によるアルゴリズムの表現
Algorithm Representation using a Programming Language (9 点)