Knowledge World: B.Sc. IT BT0066 (Semester 1, Database Management Systems) Assignment

Spring 2012

Bachelor of Science in Information Technology (BScIT) – Semester 1/

Diploma in Information Technology (DIT) – Semester 1

BT0066 – Database Management Systems – 3 Credits (Book ID: B0950)

Assignment Set – 1 (60 Marks)

1. Differentiate between physical data independence and logical data independence.

Ans.- Physical data independence:-Physical data independence allows changes in the physical storage devices or organization of the files to be made without requiring changes in the conceptual view or any of the external views and hence in the application programs using the database. Thus, the files may migrate from one type of physical media to another or the file structure may change without any need for changes in the application programs. Logical data independence implies that application programs need not be changed if fields are added to an existing record; nor do they have to be changed if fields not used by application programs are deleted.

Logical data independence: -Logical data independence indicates that the conceptual schema can be changed without affecting the existing external schemas. Data independence is advantageous in the database environment, since it allows for changes at one level of the database, without affecting other levels. These changes are absorbed by the mappings between the levels. Logical data independence is more difficult to achieve than physical independence. Since application programs are heavily dependent on the logical structure of the data they access.

2. Explain the three level architecture of DBMS.

Ans. - These three levels are the external level, the conceptual level and the internal level.

External Level or Subschema:-The external level is at the highest level of database abstraction where only those portions of the database of concern to a user or application program are included. Any number of user views (some of which may be identical) may exist for a given global or conceptual view.

Conceptual Level or Conceptual Schema:-At this level of database abstraction all the database entities and the relationships among them are included. One conceptual view represents the entire database. This conceptual view is defined by the conceptual schema. It describes all the records and relationships included in the conceptual view and, therefore, in the database. There is only one conceptual schema per database. This schema also contains the method of deriving the objects in the conceptual view from the objects in the internal view.

Internal Level or Physical Schema: - We find this view at the lowest level of abstraction, closest to the physical storage method used. It indicates how the data will be stored and describes the data structures and access methods to be used by the database. The internal view is expressed by the internal schema, which contains the definition of the stored record, the method of representing the data fields, and the access aids used.

3. Explain the distinctions among the terms primary key, candidate key, and super key.

Ans: - Super keys- A super key is a set of one or more attributes that, taken collectively, allow us to uniquely identify a tuple in the relation.E.g.—the customer-id attribute of the relation customer is sufficient to distinguish one customer tuple from another. Thus customer-id is a super key. Also the combination of customer-name and customer-id is a superkey.Customer name cannot be a super key since several people may have same name. A super key may consist of extraneous attributes.

Candidate keys- Super keys for which no proper subset is a super key are called candidate keys. Several distinct sets of attributes can serve as a candidate key. Suppose the combination {customer-name, customer-street} uniquely identifies a relation in a table.
{Customer-id} can also identify a relation uniquely. Then, the combination {customer-name, customer-id} is not a candidate key because customer-id itself alone is a candidate key.

Primary key- It is a candidate key that is chosen by the database designer as the principle means of identifying tuples within a relation. The primary keyshould be chosen such that its values are never, or rarely changed.

4. Explain various storage devices and their characteristics.

Ans. - Several types of data storage exist in most computer systems. These storage media are classified by the speed with which data can be accessed, by the cost per unit of data to buy the medium, and by the medium’s reliability. Among the media typically available are these:

Cache. The cache is the fastest and most costly form of storage. Cache memory is small; its use is managed by the computer system hardware. We shall not be concerned about managing cache storage in the database system.

Main memory. The storage medium used for data that are available to be operated on is main memory. The general-purpose machine instructions operate on main memory. Although main memory may contain many megabytes of data, or even gigabytes of data in large server systems, it is generally too small (or too expensive) for storing the entire database. The contents of main memory are usually lost if a power failure or system crash occurs.

Flash memory. Also known as electrically erasable programmable read-only memory (EEPROM), flash memory differs from main memory in that

Data survive power failure. Reading data from flash memory takes less than 100 nanoseconds (a nanosecond is 1/1000 of a microsecond), which is roughly as fast as reading data from main memory.

Magnetic-disk storage. The primary medium for the long-term on-line storage of data is the magnetic disk. Usually, the entire database is stored on magnetic disk. The system must move the data from disk to main memory, so that they can be accessed. After the system has performed the designated operations, the data that have been modified must be written to disk.

Optical storage. The most popular forms of optical storage are the compact disks (CD), which can hold about 640 megabytes of data, and the digital video disk (DVD) which can hold 4.7 or 8.5 gigabytes of data per side of the disk (or up to 17 gigabytes on a two-sided disk). Data are stored optically on a disk, and are read by a laser. The optical disks used in read-only compact disks (CD-ROM) or read-only digital video disk (DVD-ROM) cannot be written, but are supplied with data prerecorded.

Tape storage. Tape storage is used primarily for backup and archival data. Although magnetic tape is much cheaper than disks, access to data is much slower, because the tape must be accessed sequentially from the beginning. For this reason, tape storage is referred to as sequential-access storage. In contrast, disk storage is referred to as direct-access storage because it is possible to read data from any location on disk.

5. What are the benefits of making the system catalogs relations?

Ans. - We can store a relation using one of several alternative file structures, and we can create one or more indexes each stored as a file on every relation. Conversely, in a relational DBMS, every file contains either the tuples in a relation or the entries in an index. The collection of files corresponding to users' relations and indexes represents the data in the database. A fundamental property of a database system is that it maintains a description of all the data that it contains. A relational DBMS maintains information about every relation and index that it contains. The DBMS also maintains information about views, for which no tuples are stored explicitly; rather, a definition of the view is stored and used to compute the tuples that belong in the view when the view is queried. This information is stored in a collection of relations, maintained by the system, called the catalog relations; an example of a catalog relation is shown in Figure 5.8. The catalog relations are also called the system catalog, the catalog, or the data dictionary. The system catalog is sometimes referred to as metadata; that is, not data, but descriptive information about the data. The information in the system catalog is used extensively for query optimization.

Let us consider what is stored in the system catalog. At a minimum we have system wide information, such as the size of the buffer pool and the page size, and the following information about individual relations, indexes, and views: For each relation:

Its relation name, the file name (or some identifier), and the file structure (e.g., heap file) of the file in which it is stored.

The attribute name and type of each of its attributes.

The index name of each index on the relation.

The integrity constraints (e.g., primary key and foreign key constraints) on the relation.

8. Define the term functional dependency.

Ans. - As the concept of dependency is very important, it is essential that we first understand it well and then proceed to the idea of normalization. There is no fool-proof algorithmic method of identifying dependency. We have to use our commonsense and judgment of specify dependencies.

Let X and Y are the two attributes of a relation. Given the value of X, if there is only one value of Y corresponding to it, then Y is said to be functionally dependent on X. This is indicated by the notation:

X Y

For example, given the value of item code, there is only one value of item name for it. Thus item name is functionally dependent on item code. This is shown as:

Item code item name

9. Discuss the relative advantages of centralized and distributed databases.

Ans. - Advantages of Distributed Systems over Centralized ones
1: Incremental growth: Computing power can be added in small increments
2: Reliability: If one machine crashes, the system as a whole can still survive
3: Speed: A distributed system may have more total computing power than a mainframe
4: Open system: This is the most important point and the most characteristic point of a distributed system. Since it is an open system it is always ready to communicate with other systems. an open system that scales has an advantage over a perfectly closed and self-contained system.

10. List a few requirements for multimedia data management.

Ans. - In modern office information or other multi-media systems, data includes not only text and numbers but also images, graphics and digital audio and video. Such multimedia data is typically stored as sequences of bytes with variable lengths, and segments of data are linked together for easy reference. The variable length data structure cannot fit well into the relational framework, which mainly deals with fixed-format records. Furthermore, applications may require access to multimedia data on the basis of the structure of a graphical item or by following logical links. Conventional query languages were not designed for such applications.

Spring 2012

Bachelor of Science in Information Technology (BScIT) – Semester 1/

Diploma in Information Technology (DIT) – Semester 1

BT0066 – Database Management Systems – 3 Credits (Book ID: B0950)

Assignment Set – 2 (60 Marks)

11. Describe the circumstances in which you would choose to use embedded SQL rather than SQL alone or only a general-purpose programming language.

Ans. - SQL functions are primarily a mechanism for extending the power of SQL to handle attributes of complex data types (like images), or to perform complex and non-standard operations. Embedded SQL is useful when imperative actions like displaying results and interacting with the user are needed. These cannot be done conveniently in an SQL only environment. Embedded SQL can be used instead of SQL functions by retrieving data and then performing the function's operations on the SQL result. However a drawback is that a lot of query-evaluation functionality may end up getting repeated in the host language code.

13. What is an unsafe query? Give an example and explain why it is important to disallow such queries.

Ans. - This query is syntactically correct. However, it asks for all tuples S such that S is not in (the given instance of) Sailors. The set of such S tuples is obviously infinite, in the context of infinite domains such as the set of all integers. This simple example illustrates an unsafe query. It is desirable to restrict relational calculus to disallow unsafe queries.

14. How do you differentiate a relational algebra and relational calculus?

Ans. - We have presented two formal query languages for the relational model. Are they equivalent in power? Can every query that can be expressed in relational algebra also be expressed in relational calculus? The answer is yes, it can. Can every query that can be expressed in relational calculus also be expressed in relational algebra? Before we answer this question, we consider a major problem with the calculus as we have presented it. Consider the query {S | ¬(S ε Sailors)}. This query is syntactically correct. However, it asks for all tuples S such that S is not in (the given instance of) Sailors. The set of such S tuples is obviously infinite, in the context of infinite domains such as the set of all integers. This simple example illustrates an unsafe query. It is desirable to restrict relational calculus to disallow unsafe queries. We now sketch how calculus queries are restricted to be safe. Consider a set I of relation instances, with one instance per relation that appears in the query Q. Let Dom (Q, I) be the set of all constants that appear in these relation instances I, or in the formulation of the query Q itself. Since we only allow finite instances I, Dom (Q, I) is also finite. For a calculus formula Q to be considered safe, at a minimum we want to ensure that for any given I, the set of answers for Q contains only values that are in Dom(Q, I).

While this restriction is obviously required, it is not enough. Not only do we want the set of answers to be composed of constants in Dom (Q, I), we wish to compute the set of answers by only examining tuples that contain constants in Dom (Q, I)! This wish leads to a subtle point associated with the use of quantifiers For all and For any: Given a TRC formula of the form For any R(p(R)), we want to find all values for variable R, that make this formula true by checking only tuples that contain constants in Dom(Q, I). Similarly, given a TRC formula of the form For all R(p(R)), we want to find any values for variable R, that make this formula false, by checking only tuples that contain constants in Dom(Q, I). We therefore define a safe TRC formula Q to be a formula such that:

1. For any given I, the set of answers for Q contains only values that are in Dom (Q, I).

2. For each sub expression of the form For any R(p(R)) in Q, if a tuple r (assigned to variable R) makes the formula true, then r contains only constants in Dom(Q, I).

3. For each sub expression of the form For all R (p(R)) in Q, if a tuple r (assigned to variable R) contains a constant that is not in Dom (Q, I), then r must make the formula true.

Note that this definition is not constructive, that is, it does not tell us how to check if a query is safe.

The query Q = {S |¬(S 2 Sailors)} is unsafe by this definition. Dom (Q, I) is the set of all values that appear in (an instance I of) Sailors. Consider the instance S1 shown in Figure 8.1. The answer to this query obviously includes values that do not appear in Dom (Q, S1). Returning to the question of expressiveness, we can show that every query that can be expressed using a safe relational calculus query, can also be expressed as a relational algebra query. The expressive power of relational algebra is often used as a metric of how powerful a relational database query language is. If a query language can express all the queries that we can express in relational algebra, it is said to be relationally complete. A practical query language is expected to be relationally complete; in addition, commercial query languages typically support features that allow us to express some queries that cannot be expressed in relational algebra.

15. What types of anomalies are found in relational database?

Ans. - There are several standardtypes of anomaly in a database. It doesn't need to be a relational database; the same anomalies are present in any database. A properly designed relational database specifically aims to eliminate these anomalies.

Insertion Anomaly: - It is a failure to place information about a new database entry into all the places in the database where information about the new entry needs to be stored. In a properly normalized database, information about a new entry needs to be inserted into only one place in the database, in an inadequately normalized database, information about a new entry may need to be inserted into more than one place, and human fallibility being what it is, some of the needed additional insertions may be missed.

Deletion Anomaly: - It is a failure to remove information about an existing database entry when it is time to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database, in an inadequately normalized database, information about that old entry may need to be deleted from more than one place.

Update Anomaly: - An update of a database involves modifications that may be additions, deletions, or both. Thus “update anomalies” can be either of the kinds discussed above.

All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the database. Properly normalized database are much less susceptible to corruption than are un-normalized databases.

16. Give a set of FDs for the relation schema R (A, B, C, D) with primary key AB under which R is in 1NF but not in 2NF.

Ans. - Consider the set of FD: AB → CD and B → C. AB is obviously a key for this

relation since AB → CD implies AB → ABCD. It is a primary key since there are

no smaller subsets of keys that hold over R(A,B,C,D). The FD: B → C violates

2NF since:

C B is false; that is, it is not a trivial FD

B is not a super key

C is not part of some key for R

B is a proper subset of the key AB (transitive dependency)

17. When is it useful to have replication or fragmentation of data? Explain your answer.

Ans. - Replication is useful when there are many read-only transactions at different sites

Wanting access to the same data. They can all execute quickly in parallel, accessing local data.

But updates become difficult with replication. Fragmentation is useful if transactions on different

Sites tend to access different parts of the database.

19. What is the meaning of multimedia data?

Ans. - There are number of data types that can be characterized as multimedia data types. These are typically the elements for the building blocks of ore generalized multimedia environments, platforms, or integrating tools. The basic types can be described as follows:

Text: The form in which the text can be stored can vary greatly. In addition to ASCII based files, text is typically stored in processor files, spreadsheets, databases and annotations on more general multimedia objects. With availability and proliferation of GUIs, text fonts the job of storing text is becoming complex allowing special effects (color, shades...).

Images: There is great variance in the quality and size of storage for still images. Digitalized images are sequence of pixels that represents a region in the user's graphical display. The space overhead for still images varies on the basis of resolution, size, complexity, and compression scheme used to store image. The popular image formats are jpg, png, bmp, tiff.

Audio: An increasingly popular data type being integrated in most of applications is Audio. Its quite space intensive. One minute of sound can take up to 2-3 Mbs of space. Several techniques are used to compress it in suitable format.

Video: One on the most space consuming multimedia data type is digitalized video. The digitalized videos are stored as sequence of frames. Depending upon its resolution and size a single frame can consume upto 1 MB. Also to have realistic video playback, the transmission, compression, and decompression of digitalized require continuous transfer rate.

Graphic Objects: These consist of special data structures used to define 2D and 3D shapes through which we can define multimedia objects. These include various formats used by image, video editing applications. Examples are CAD / CAM objects.

20. What do you mean by late binding of methods? Give an example of inheritance that illustrates the need for dynamic binding.

Ans. - Late binding is a computer programming mechanism in which the method being called upon an object is looked up by name at runtime. This is informally known as duck typing or name binding.

Late binding is often confused with dynamic dispatch, but there are significant differences. With early binding the compiler statically verifies that there are one or more methods with the appropriate method name and signature. This is usually stored in the compiled program as an offset in a virtual method table("v-table") and is very efficient. With late binding the compiler does not have enough information to verify the method even exists, let alone bind to its particular slot on the v-table. Instead the method is looked up by name at runtime.

The primary advantage of using late binding in Component Object Model(COM) programming is that it does not require the compiler to reference the libraries that contain the object at compile time. This makes the compilation process more resistant to version conflicts, in which the class's v-table may be accidentally modified. (This is not a concern in JIT-compiled platforms such as .NET or Java, because the v-table is created by the secondary compiler against the libraries as they are being loaded into the running application.)

For More Assignments Click Here

Pages

Saturday, April 13, 2013

B.Sc. IT BT0066 (Semester 1, Database Management Systems) Assignment

No comments:

Post a Comment

Twitter