[Algorithm/알고리즘] Exercise 3: Algorithm Analysis(알고리즘 분석)

Task 1

Consider an array A with n distinct integers that are sorted in an ascending order and an integer t.

a) The C funtion linear search traverses the integers in A, one after another, from the beginning. If t is found in A linear search returns 1, otherwise 0. Complete the C funtion linear search(int A[], int n, int t) in task1.c file.

int linear_search(int A[], int n, int t) {
    for (int i=0; i<n; i++) {
        if (A[i] == t) {
            return 1;
        }
    }
    return 0;
}

답은 이거이긴 한데, 왜 A[i]가 t보다 클 때 중단하지 않는거지...?

b) The C funtion binary search(int A[], int n, int t) that employs binary search to find integer t in A. Reference the pseudocode for binary search in SL02 to implement the binary search function. If t is found in A binary search returns 1, otherwise 0. Complete the C funtion binary search(int A[], int n, int t) in task1.c file.

int binary_search(int A[], int n, int t) {
    int low=0; int high=n-1; // 배열이므로 0부터 시작해야 함
    int mid; 
    while (low<=high) {
        mid=(low+high)/2; // int이므로 내림하지 않아도 됨
        if (t==A[mid]) {
            return 1;
        } else if (t>A[mid]) {
            low=mid+1;
        } else {
            high=mid-1;
        }
    }
    return 0;
}

c) What are the asymptotic complexity for the C funtions linear search and binary search.

linear serach: O(n)

binary search: O(log2_n)

d) Compile task1.c file. Run your codes with the following parameters for n and t:

n = 1000000, t = 1000000

n = 10000000, t = 10000000

n = 100000000, t = 100000000

Report the run time growth for linear search and binary search, respectively.

// 100,000 실행시
Linear search takes : 6.000000 millseconds
Binary search takes : 0.000000 millseconds

// 1,000,000 실행시
Linear search takes : 52.000000 millseconds
Binary search takes : 0.000000 millseconds

// 10,000,000 실행시
Linear search takes : 505.000000 millseconds
Binary search takes : 0.000000 millseconds

뭐지... Binary 설계 잘못한건가..?

Task 2

Below is a pseudocode of a function named whatDoesItDo, which takes an array A[1..n] of n integers and an integer k as inputs.

Note: In the above pseudocode, for j = i to n by k means we do not increase j by 1, but each time, we increase it by k, i.e., j=j+k.

a) Perform exact analysis of the running time of the algorithm.

Instruction	No. of times executed	Cost
result = -1000	1	c1
for i=1 to n do	n+1	c2
current = 0	n	c3
for j=i to n by k do	n(n-1)/2k + 2n	c4
current = current + A[j]	n(n-1)/2k + n	c5
if current > result then	n	c6
result = current	αn (α ∈ [0, 1])	c7
return result	1	c8

c4의 수행횟수에 대한 설명은 다음과 같다.

i=1, n=10, k=3이라고 생각하면, 1에서 한번 무조건 실행하고, 4, 7, 10에서 실행한 뒤, 13을 체크한다.

따라서 i=1일 때 수행횟수는 1+(n-1)/k+1이다.

그러므로 {1+(n-1)/k+1} + {1+(n-2)/k+1} + ... + {1+(n-n)/k+1} = n(n-1)/2k + 2n이다.

마찬가지로 c5의 수행횟수는 위와 비슷하지만 +1이 없기 때문에 {1+(n-1)/k} + {1+(n-2)/k} + ... + {1+(n-n)/k} = n(n-1)/2k + n이다.

따라서 T(n) = c1 + (n+1)*c2 + n*c3 + {n(n-1)/2k + 2n}*c4 + {n(n-1)/2k + n}*c5 + n*c6+ αn*c7 + c8이다.

b) Determine the asymptotic complexity of the algorithm?

O(n^2)이다.

Task 3

Calculate the asymptotic tight bound for the following functions and rank them by their order of growth (lowest first). Clearly work out the calculation step by step in your solution.

f1(n) ∈ Θ((2n+3)!)

f2(n) = 2logn^2*log6 + logπ + 2logn + n^3 = 4log6*logn + logπ + 2logn + n^3 ∈ Θ(n^3)

f3(n) = n^(log2_4) = n^2 ∈ Θ(n^2)

f4(n) = 12√ n + 10^223 + n*log5 ∈ Θ(n)

f5(n) ∈ Θ(n^4)

f6(n) = (2n+1)*logn = 2n*logn + logn ∈ Θ(nlogn)

f7(n) ∈ Θ(√ n)

f8(n) ∈ Θ(1)

따라서 f8 < f7 < f4 < f6 < f3 < f2 < f5 < f1이다.

Task 4

Consider the algorithm algo1. The input parameters are an array A[1..n] with n distinct integers and k ≤ n.

a) Specify the pre/post conditions of the algo1 algorithm.

먼저 알고리즘이 어떻게 작동하는지 예시로 살펴보자.

A[] = {7,2,4,3}; n = 4; k = 2라고 해보자.

그러면 i=1일 때, maxi = 1, sum = 7, swp = A[1] = 7, A[1] = A[1], A[1] = A[1]이므로 A[] = {7,2,4,3} 그대로이고,

i=2일 때, maxi = 3, sum = 7+4, swp = A[2], A[2] = A[3], A[3] = A[3]이므로 A[] = {7,4,2,3}이 된다.

즉, n개의 배열 중 k번째 큰 값까지의 합을 리턴한다.

참고로 pre condition은 input을 가리키는 것이고, post condition은 output을 가리키는 것이다.

따라서 pre condition은 array A[1...n]과 k이고,

post condition은 the sum of those first k-largest elements이다.

b) For the two for loops in the algorithm:

i. Determine if the loop is up loop or down loop.

두 루프 모두 작은 수에서 큰 수로 진행되는 up loop이다.

ii. Determine the invariants of these two loops and verify whether they are hold in three stages: initialization, maintenance and termination.

먼저 안쪽 루프를 살펴보자.

모든 j 즉 i부터 n까지의 경우, A[maxi] >= A[j]이다. 따라서 invariants는 ∀k ∈ [i..n] : A[maxi] ≥ A[k]

Initialization: j=i이므로 배열 A[i..j]는 오직 A[i] = A[maxi]이다. 따라서 A[maxi]가 가장 큰 요소이다.

⇒ When j = i, A[i, j] contains only one element A[i]. At this moment (before the loop starts), we have maxi = i and A[maxi] is the only, and the biggest element in A[i...i].

Maintenance: i < j < n인 경우, i와 j-1 사이의 모든 k에 대해 A[maxi] >= A[k]이다. 만약 A[j] > A[maxi]이면 maxi는 j가 된다. 그러면 i와 j 사이의 모든 m에 대해 A[maxi] >= A[m]이다.

⇒ When i < j < n, A[maxi] ≥ A[k] ∀k ∈ [i..j − 1]. If A[j] > A[maxi], then maxi is assigned to be j. It follows that A[maxi] ≥ A[m], ∀m ∈ [i...j].

Termination: j=n일때 루프는 끝난다. 동일하게 만약 A[j] > A[maxi]이면 maxi는 j가 된다. 그러면 i와 n 사이의 모든 m에 대해 A[maxi] >= A[m]이다.

⇒ The inner loop terminates when j = n. At this time, if A[j] > A[maxi], then maxi is assigned to be n. It follows that A[maxi] ≥ A[m], ∀m ∈ [i...n]. In other words, A[maxi] is the biggest element in A[i...n].

이어서 바깥쪽 루프를 살펴보자.

참고로 이 알고리즘은 위에서 살펴봤듯이 k번째 큰 값까지 리턴하는 알고리즘이다.

따라서 모든 1부터 k번째 배열이 이후 배열보다 크다. 따라서 invariants는 ∀q ∈ [i..k], ∀p ∈ [k+1..n] : A[q] >= A[p]

Initialization: i=1이므로 안쪽 루프에서는 가장 큰 값을 A[maxi]로 내게 되고 그 값이 A[1]이 된다. 따라서 2와 n 사이의 모든 p에 대해 A[1] >= A[p]가 된다.

⇒ When i = 1, inner loop terminates and gives the largest number in A[1...n], and maxi = i. The swap operation moves A[maxi] to A[1]. It follows that p ∈ [2...n], A[1] ≥ A[p]. Hence ∀q ∈ [1..i], ∀p ∈ [i + 1...n], A[q] ≥ A[p] holds true when i = 1.

Maintenance: 1 < i < k에 대하여, A[1..i-1]까지는 내림차순으로 정렬되어 있다. 따라서 1부터 i-1번째 요소가 i부터 n번째 요소보다 항상 같거나 크다. 그리고 안쪽 루프에 의해 i번째 요소는 i부터 n번째 중 가장 큰 값을 가져오기 때문에, 1부터 i번째의 요소가 i+1부터 n번째까지의 요소보다 항상 같거나 크다.

⇒ When 1 < i < k, we have top i − 1 largest numbers that are sorted in descending order in A[1...i − 1]. We have ∀q ∈ [1..i − 1], ∀p ∈ [i...n], A[q] ≥ A[p] Inner loops terminates and guarantees that A[i] is the largest element in A[i...n] and ∀q ∈ [1..i − 1], A[q] ≥ A[i]. Hence it also holds true that ∀q ∈ [1..i], ∀p ∈ [i + 1...n], A[q] ≥ A[p]

Termination: 루프는 i=k일 때 끝난다. 위와 동일하게 1부터 k-1번째 요소가 k부터 n번째 요소보다 항상 같거나 큰데, 안쪽 루프는 k부터 n번째 요소 중 가장 큰 값을 k번째로 내보내기 때문에 참이다.

⇒ The loop terminates when i = k, Inner loops terminates and guarantees that A[k] is the largest element in A[k...n] and ∀q ∈ [1..k − 1], A[q] ≥ A[k]. It follows ∀q ∈ [1..k], ∀p ∈ [k + 1...n], A[q] ≥ A[p].

c) Identify some edges cases of the algorithm and verify if the algorithm has the correct output.

경계 조건은 영어로 edge case로서, 하나의 매개변수 값이 극단적인 최대값 또는 최소값이어서 로직에 문제가 발생할 수 있는 경우를 말한다.
출처: https://knight76.tistory.com/

(1) 만약 배열 A[1..n]이 비어있다면?

sum=0를 리턴한다

(2) 만약 배열 A[1..n]의 요소가 하나라면?

바깥쪽 루프가 한 번 실행되는데 그 A[1]의 값을 0에 더해서 리턴한다.

(3) 만약 k=n이라면?

모든 값의 합을 리턴한다.

d) Conduct an exact analysis of the running time of algorithm algo1.

Instruction	No. of time executed	Cost
sum = 0	1	c1
for i=1 to k do	k+1	c2
maxi = i	k	c3
for j=i to n do	(2kn-k^2+k)/2	c4
if A[j] > A[maxi] then	(2kn-k^2-k)/2	c5
maxi = j	α*(2kn-k^2-k)/2 (0<= α <=1])	c6
sum = sum + A[maxi]	k	c7
swp = A[i]	k	c8
A[i] = A[maxi]	k	c9
A[maxi] = swp	k	c10
return sum	1	c11

c4의 수행횟수: (n+1) + (n) + ... + (n-k+2) = (n+1)n/2 - (n-k+1)(n-k)/2 = (2kn-k^2+k)/2

c5의 수행횟수: (n) + (n-1) + ... + (n-k+1) = n(n-1)/2 - (n-k)(n-k-1)/2 = (2kn-k^2-k)/2

T(n) = c1 + c2*(k+1) + c3*k + c4*(2kn-k^2+k)/2 + c5*(2kn-k^2-k)/2 + c6*α*(2kn-k^2-k)/2 + (c7+c8+c9+c10)*k + c11

e) Determine the best and the worst case of the algorithm. What is the running time and asymptotic complexity in each case?

최선의 경우에도 최악의 경우에도 다른 부분은 차이가 없고 오직 if문에서만 차이가 발생한다.

즉 최선의 경우에는 α가 0인 거고, 최악의 경우에는 α가 1인 거다.

하지만 이미 c5에서 kn이 있기 때문에 두 경우 모두 O(kn)이다.

기출문제

Assume f1(n) = O(1), f2(n) = O(N^2), and f3(n) = O(NlogN). From these complexities it follows that f1(n) + f2(n) + f3(n) = O(NlogN).

N^2이 가장 큰 영향을 미치므로 False.

단어장
traverse 가로지르다, 횡단하다

'Master in Data Science > Informatics 2' 카테고리의 다른 글

[Algorithm/알고리즘] Divide and Conquer, Recurrences(분할정복, 점화식) (0)	2023.04.03
[Algorithm/알고리즘] Algorithm Analysis(알고리즘 분석) (0)	2023.03.28
[Algorithm/알고리즘] Exercise 2: Recursion(재귀) (0)	2023.03.28
[Algorithm/알고리즘] Exercise 1: Sorting(정렬) (0)	2023.03.27
[Algorithm/알고리즘] 정렬(Sorting)과 재귀(Recursion) (0)	2023.03.21