What is an algorithm?

A programming algorithm is a procedure, much like a recipe, that tells your computer precisely what steps to take to solve a problem or reach a goal.

How old is data management?

Early data management includes the file and database systems that were designed before the relational database was introduced in the 1970s: flat file data management systems, hierarchical data management systems, and network data management systems.

What are the different generations of Databases?

1. Flat file data model

An organized set of data stored on a long-term storage medium, such as disk or magnetic tape.

2. Hierarchical data model

Files are related in a parent/child manner, with each child file having at most one parent file.

3. Network data model

Made of data records linked together. Data records are known as “nodes” and the links as “edges”. A record is not restricted to one parent record. The model has two parts: a schema and the database itself. It was standardized in 1971 by the CODASYL group (Conference on Data Systems Languages).

4. Relational database model (1970 – E. F. Codd)

A relational database management system (RDBMS) is an application made of multiple programs that manage data and allow users to add, update, read, and delete data. It is designed to use a common, standardized language for manipulating data, called SQL. The minimal requirements to implement an RDBMS include four components: storage management programs, memory management programs, a data dictionary, and a query language.

Major components of an application built on a relational database:

User interface | Business logic | Database code

Disadvantages: Relational databases struggle to support large volumes of read and write operations, low-latency response times, and high availability all at once. When too many users accessed a relational database, the usual remedy was to add CPUs and memory to the server, which only worked up to a certain point.

SQL statement: SELECT first_name, last_name FROM employees;
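To see the statement run, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the sample rows are made up for illustration, and an ORDER BY clause is added only so the output order is predictable.

```python
import sqlite3

# In-memory database; the table layout and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, last_name TEXT, position TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Lovelace", "Analyst"), ("Grace", "Hopper", "Engineer")],
)

# The statement from the notes: return two columns for every row.
rows = conn.execute(
    "SELECT first_name, last_name FROM employees ORDER BY last_name"
).fetchall()
print(rows)  # [('Grace', 'Hopper'), ('Ada', 'Lovelace')]
```

The position column is stored but not returned, because SELECT lists only the columns you ask for.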

5. Object-oriented database model

It supports the modeling and creation of data as objects. It can efficiently manage a large number of different data types. Objects with complex behaviors are easy to handle using inheritance, polymorphism, and similar features.

Four characteristics of data management systems that are particularly important for large-scale data management:

Scalability | Cost | Flexibility | Availability

Scaling out? Adding servers as needed, depending on the traffic.

Scaling up? Upgrading an existing database server by adding processors, memory, or network bandwidth to improve performance.

Scaling out is more flexible than scaling up.

Entity-Relationship Model:

An entity-relationship diagram (ERD) shows the relationships between the entity sets stored in a database. An entity set is a collection of similar entities. These entities can have attributes that define their properties. For example, an HR schema might include employees, managers, and departments, and an inventory schema might include warehouses, products, and suppliers.

What are some limitations of Flat Files?

It is inefficient to access data in any way other than the way it was organized in the file.

Changes to the file structure require changes to programs.

Different kinds of data have different security requirements.

Data can be stored in multiple files, making it difficult to maintain consistent sets of data. 

List the major types of data storage and examples of each type.

Punched cards: The leading card format until the 1950s was the IBM 80-column card, with 80 columns by 12 rows.

Magnetic tapes: Originally designed for sound recording. Formats include 1) half-inch tape formats, which have their origin in IBM reel-to-reel tape (IBM cartridges, StorageTek cartridges, DLT, and LTO); 2) quarter-inch and 8-mm QIC tape formats (3M QIC, SLR, and Travan).

Magnetic disks: The primary component in modern storage systems. They began with the IBM 350 Disk File [15], developed by the IBM team led by Reynold B. Johnson.

Optical/magneto-optical storage media: Storage media that record information by changing the photo-physical form of their recording surface and read it back by emitting a light beam against the surface and sensing its reflection. An early example is LaserDisc (LD), based on optical disc technology invented by David Paul Gregg.

Storage class memory: Nonmechanical storage media, such as flash memory, are currently deployed for secondary storage in computer systems. Flash memory is a type of electrically erasable programmable read-only memory (EEPROM).

Storage networking: Connects arbitrary storage devices and computers via a network, often one designed specifically for connecting storage devices. A storage area network (SAN) links together multiple storage devices and provides block-level storage that can be accessed by servers.

Cloud storage and the future: Storage service providers (SSPs) started to manage customers’ storage systems in their data centers, where customers could access their business data via broadband networks. From the customers’ viewpoint, this trend was regarded as storage management outsourcing, enabled by the emerging storage virtualization technology. Major cloud-based storage services currently include Amazon S3, Windows Azure Storage, and Google Cloud Storage.

What are some similarities/differences between network storage and the cloud? 

Cloud storage means renting space from a provider. You can imagine it as renting space on several NAS devices located somewhere in the world. Network attached storage (NAS) provides space for a whole local network and typically sits on your own premises, such as your dorm room. Both let you store and share data over a network; in addition, most NAS devices can also run as servers (for a website, FTP, and so on).

Similarities between SQL and Python? 

Used together, they give you the power and flexibility to answer almost any question about your data. SQL is needed to build the data set into a final table that has all of the necessary attributes. Then, from this large data set, you can use Python to spin off deeper analysis.
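The division of labor can be sketched in a few lines: SQL joins the raw tables into one flat data set, and Python does the follow-up analysis. The tables, names, and salaries below are invented for illustration.

```python
import sqlite3
from statistics import mean

# Illustrative tables and rows; none of these names come from a real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER, salary REAL)")
conn.executemany("INSERT INTO departments VALUES (?, ?)", [(1, "HR"), (2, "IT")])
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", 1, 50000.0), ("Ben", 2, 70000.0), ("Cy", 2, 90000.0)],
)

# Step 1 (SQL): join the tables into one final table with all needed attributes.
dataset = conn.execute(
    "SELECT e.name, d.name, e.salary "
    "FROM employees e JOIN departments d ON e.dept_id = d.id"
).fetchall()

# Step 2 (Python): analyze the resulting data set.
salaries_by_dept = {}
for _employee, dept, salary in dataset:
    salaries_by_dept.setdefault(dept, []).append(salary)

average_salary = {dept: mean(vals) for dept, vals in salaries_by_dept.items()}
print(average_salary)
```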

Why is there a need for NoSQL databases?

NoSQL databases are built to run as distributed systems, which offer some level of operational simplicity because servers can be added and removed as needed.

Four types of NoSQL databases?

1. Key-value database: based on keys, which are identifiers for looking up data, and values (like a baggage tag and its suitcase).

2. Document database: also uses identifiers to look up values, but the values are typically more complex. Documents are collections of data items stored together in a flexible structure (first_name, last_name, position, office_number, ...).

3. Column family database: organizes data in rows and columns, which gives it some similarities with relational databases. Related columns are grouped together for improved performance, and joins between tables are typically not supported.

4. Graph database: suited to modeling objects and the relationships between objects (graphs).
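One way to picture the four types is by the shape of the data each one stores. The sketch below uses plain Python dictionaries and lists as stand-ins; it illustrates the data shapes only, and none of the names or keys come from a real database product.

```python
# Key-value: an opaque value looked up by a unique key (the baggage tag).
key_value = {"bag:1234": "blue suitcase"}

# Document: the value is a structured document with named fields.
document = {
    "emp:1": {
        "first_name": "Ada",
        "last_name": "Lovelace",
        "position": "Analyst",
        "office_number": 12,
    }
}

# Column family: row key -> group of related columns -> column values.
column_family = {
    "emp:1": {
        "name": {"first": "Ada", "last": "Lovelace"},
        "job": {"position": "Analyst"},
    }
}

# Graph: objects (nodes) plus the relationships between them (edges).
graph = {
    "nodes": {"ada": {"type": "person"}, "grace": {"type": "person"}},
    "edges": [("ada", "KNOWS", "grace")],
}

print(key_value["bag:1234"])
print(document["emp:1"]["position"])
print(column_family["emp:1"]["name"]["last"])
print([dst for src, _rel, dst in graph["edges"] if src == "ada"])
```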

What are the three Vs of big data?  Volume | Variety | Velocity

CAP theorem?

The CAP theorem, formulated by Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two of the following: Consistency: Every read receives the most recent write or an error. Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write. Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

What are ACID and BASE?

ACID provides a safe environment in which to operate on your data. Atomic: All operations in a transaction succeed or every operation is rolled back. Consistent: On the completion of a transaction, the database is structurally sound. Isolated: Transactions do not contend with one another. Contentious access to data is moderated by the database so that transactions appear to run sequentially. Durable: The results of applying a transaction are permanent, even in the presence of failures.
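Atomicity is the easiest of the four properties to demonstrate. The following is a minimal sketch using Python's sqlite3 module; the accounts table, the CHECK constraint, and the transfer function are assumptions made up for this example. A transfer that would overdraw an account fails halfway through, and the half that already ran is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
after_rejected = balances(conn)                 # unchanged: the credit to bob was undone

accepted = transfer(conn, "alice", "bob", 30)
after_accepted = balances(conn)
print(rejected, after_rejected, accepted, after_accepted)
```

In the rejected transfer the credit to bob runs first and succeeds, so the unchanged balances afterward show that the whole transaction was rolled back, not just the failing statement.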

BASE: Basic availability: The database appears to work most of the time. Soft state: Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time. Eventual consistency: Stores exhibit consistency at some later point. Overall, the BASE consistency model provides a less strict assurance than ACID: data will be consistent at some point in the future, for example at read time.
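Eventual consistency can be pictured with two copies of the data and a queue of changes that has not been replicated yet. This is a toy sketch under that assumption, not the replication protocol of any particular database; the names primary, replica, pending, and sync are all hypothetical.

```python
primary = {}
replica = {}
pending = []  # changes accepted by the primary but not yet copied to the replica


def write(key, value):
    """Accept the write immediately on the primary and queue it for the replica."""
    primary[key] = value
    pending.append((key, value))


def sync():
    """Apply queued changes to the replica in the order they were written."""
    while pending:
        key, value = pending.pop(0)
        replica[key] = value


write("x", 1)
stale_read = replica.get("x")  # None: the replica has not caught up yet (soft state)

sync()
fresh_read = replica.get("x")  # the replicas now agree (eventual consistency)
print(stale_read, fresh_read)
```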

In graph databases, what is a node? What is a relationship? 

A node is an object that has an identifier and a set of attributes. A relationship is a link between two nodes that contains attributes about their relation. For example, nodes can represent people, and relationships can represent their connections in a social network.
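The social network example can be sketched with plain Python structures. The node identifiers, the FRIEND relationship type, and the since attribute are invented for illustration; a real graph database would provide its own query language for this.

```python
# Nodes: an identifier mapped to a set of attributes.
nodes = {
    "p1": {"name": "Ana"},
    "p2": {"name": "Ben"},
    "p3": {"name": "Cy"},
}

# Relationships: a link between two nodes, with attributes about the link itself.
relationships = [
    {"from": "p1", "to": "p2", "type": "FRIEND", "since": 2019},
    {"from": "p2", "to": "p3", "type": "FRIEND", "since": 2021},
]


def friends_of(node_id):
    """Names of nodes linked to node_id by a FRIEND relationship, in either direction."""
    found = []
    for rel in relationships:
        if rel["type"] != "FRIEND":
            continue
        if rel["from"] == node_id:
            found.append(nodes[rel["to"]]["name"])
        elif rel["to"] == node_id:
            found.append(nodes[rel["from"]]["name"])
    return sorted(found)


print(friends_of("p2"))  # ['Ana', 'Cy']
```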

Features of Key/Value databases?

A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier. Both keys and values can be anything, ranging from simple objects to complex compound objects. Key-value databases are highly partitionable and allow horizontal scaling at scales that other types of databases cannot achieve. For example, Amazon DynamoDB allocates additional partitions to a table if an existing partition fills to capacity and more storage space is required.
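A rough idea of why key-value data partitions so easily: the key alone decides where a pair lives, so pairs can be spread over any number of partitions. The sketch below uses simple hash partitioning with a fixed partition count. It is an assumption-laden illustration, not a description of how Amazon DynamoDB or any other product assigns partitions.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Three in-memory dictionaries stand in for three servers.
partitions = [{} for _ in range(3)]


def put(key, value):
    partitions[partition_for(key, len(partitions))][key] = value


def get(key):
    return partitions[partition_for(key, len(partitions))].get(key)


for user_id in range(10):
    put(f"user:{user_id}", {"id": user_id})

print(get("user:7"))
print([len(p) for p in partitions])  # how the ten keys were spread out
```

Because a lookup needs only the key, each request touches exactly one partition, which is what makes adding partitions an effective way to scale.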

What is a distributed system?

A distributed system is a system with multiple components located on different machines that communicate and coordinate actions in order to appear as a single coherent system to the end user. In other words, the system runs on multiple servers instead of just one computer.

Describe a two-phase commit. 

Phase 1: The database writes, or commits, the data to the disk of the primary server.
Phase 2: The database writes the data to the disk of the backup server.
The write is complete only when both phases succeed. This helps ensure consistency: because both copies hold the same data, the system can switch to the backup database if the primary server fails.
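The two phases can be sketched as follows. This follows the simplified primary-then-backup description in these notes, not the prepare/commit voting protocol that the term two-phase commit usually refers to in the distributed transactions literature. The Server class and the undo step are assumptions of this sketch.

```python
class Server:
    """Hypothetical server whose 'disk' is just a dictionary."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.disk = {}

    def write(self, key, value):
        if not self.healthy:
            raise OSError(f"{self.name} is unavailable")
        self.disk[key] = value


def two_phase_write(primary, backup, key, value):
    """Write to the primary, then the backup; succeed only if both writes land."""
    missing = object()  # sentinel meaning "key was not on the primary before"
    previous = primary.disk.get(key, missing)
    try:
        primary.write(key, value)  # phase 1
        backup.write(key, value)   # phase 2
    except OSError:
        # Undo phase 1 (an assumption of this sketch) so both copies still match.
        if previous is missing:
            primary.disk.pop(key, None)
        else:
            primary.disk[key] = previous
        return False
    return True


primary, backup = Server("primary"), Server("backup")
ok = two_phase_write(primary, backup, "order:1", "paid")

backup.healthy = False
failed = two_phase_write(primary, backup, "order:1", "refunded")
print(ok, failed, primary.disk == backup.disk)  # True False True
```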

Describe monotonic write consistency. Why is it so important?

If you were to issue several update commands, they would be executed in the order you had issued them. 
This ensures that the results of a set of commands are predictable. Repeating the same commands with the same starting data will yield the same results.
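A small worked example shows why the order matters. Two updates are issued in sequence but arrive out of order; applying them by sequence number gives the intended result, while applying them in arrival order does not. The sequence numbers and the apply_in_order helper are hypothetical.

```python
def apply_in_order(start, updates):
    """Apply (sequence_number, function) updates strictly in the order issued."""
    value = start
    for _seq, update in sorted(updates, key=lambda pair: pair[0]):
        value = update(value)
    return value


# Issued order: first add 10, then double.
issued = [(1, lambda x: x + 10), (2, lambda x: x * 2)]

# Suppose the network delivers them in the opposite order.
arrived = [issued[1], issued[0]]

result = apply_in_order(5, arrived)  # (5 + 10) * 2 = 30

# Without the ordering guarantee, the updates run as they arrive.
arrival_order_result = 5
for _seq, update in arrived:
    arrival_order_result = update(arrival_order_result)  # 5 * 2 + 10 = 20

print(result, arrival_order_result)  # 30 20
```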

How many values can be stored with a single key in a key-value database? One.

What is a namespace? Why is it important in a key-value database?

A namespace is a collection of identifiers. Keys must be unique within a namespace, so the same key can be reused in different namespaces without conflict.
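A minimal sketch of namespaces, assuming a toy in-memory store (the KeyValueStore class and the namespace names are made up): the same key holds different values in different namespaces, while writing to an existing key within one namespace replaces its single value.

```python
class KeyValueStore:
    """Toy key-value store in which every key lives inside a namespace."""

    def __init__(self):
        self._namespaces = {}

    def put(self, namespace, key, value):
        # A key holds exactly one value; writing again replaces it.
        self._namespaces.setdefault(namespace, {})[key] = value

    def get(self, namespace, key):
        return self._namespaces[namespace][key]


store = KeyValueStore()
store.put("customers", "1001", "Ana")
store.put("orders", "1001", "2 widgets")     # same key, different namespace: no conflict
store.put("customers", "1001", "Ana Lopez")  # same key, same namespace: value replaced

print(store.get("customers", "1001"))  # Ana Lopez
print(store.get("orders", "1001"))     # 2 widgets
```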