What Is Database Management?
First, what’s a database? In one regard, a database is the hardware that stores information, whether it’s an address book, a list of products, or another type of interrelated knowledge. The termdatabasecan also refer to the data itself. In addition, the database can mean the combination of the data, the hardware, and thedatabase management system(DBMS), which is the software that allows users to access and work with the stored data. We will use the last definition in this article.
Database managementrefers to the actions that businesses take to work with and process their data, which can include creating, validating, retrieving, updating, maintaining, linking, aligning, archiving, and deleting data when it’s no longer needed. To ensure control over the data and its use, a business must protect its data.
The span of time from when an organization creates data until it archives or deletes it is known as thedata lifecycle. The lifecycle will have different components depending on the business, and even within a business (sales, tax, and HR data will have different lifecycles, though they’re all from the same company). For example, the life of HR data might look like this:
The company creates a file when it hires a new person.
The data is accessed and updated yearly with results from the annual review.
The data is unlinked from one team and linked to another when the person changes positions.
When the person leaves the company, their data is archived and eventually deleted.
All of these actions allow the business to leverage their data for operations and business intelligence.
But asRaghu Ramakrishnan, co-author of a book on database management, says, “It’s a topic in flux. The underlying role of data is changing rapidly. Twenty years ago, people would have said relational databases. But today it’s murkier; it’s not just relational databases. There’s more unstructured and semi-structured databases, such as in Hadoop systems. All are versions of data management.”
Roles in Database Management
There are many roles for people who want to be involved in database management. Key positions include the following:
IT/tech management
Data scientists
Database architects
Database administrators
Computer system analysts
Software developers
What Are Database Management Systems?
As mentioned above, a DBMS is the software that allows the company to access, manipulate, process, store, update, archive, and delete data. It acts as the gatekeeper, separating the database from the users and apps that want to view or change the data.
DBMS Terms
Before delving deeper into database management systems, you’ll need to understand some key terminology:
Data Management Framework or Data Governance Framework: A very high-level process to choose and create guidelines and rules on how data will be created and manipulated. In big data, the termdata frameworkrefers to software like Hadoop and Spark.
Data Model: The requirements, business rules, and design that drive how the schema is created.Data modelcan also refer to the DBMS type.
Schema: The code that manages the organization and storage of the data, as well as how objects such as tables, views, and stored procedures are grouped and connected. It includes types for each field (e.g., text only, alphanumeric), along with their lengths.
View: A window into the data that is customized for a user or user type. The view allows them to make queries and manipulate the results. This enforces security by limiting access to only the data that each user or user type can see.
查询: A request for data that is submitted by a user.
Transactional Application: An application that processes online transactions.
How to Choose the Right DBMS for Your Needs
There are numerous types of DBMSs, each with strengths and weaknesses. Below are some common types and examples of what they are used for.
DBMS Type | Description | Examples | Common Applications |
---|---|---|---|
Flat File | The records have no relation to one another. | Access, FileMaker Pro | Address lists |
Object-Oriented | A type of relational database where data is represented by objects (like those in object-oriented programming) rather than in rows and columns. | db4o, Perst | Science and engineering, large-scale manufacturing |
Relational | In the best-known type of database, records are organized in rows, and relationships can be created between records. SQL is the most common language used. | MySQL, SQL Server, SQLite, Oracle Database | Libraries, flight reservations, parts/retail inventory, ATMs, business transactions, data warehouses |
Hierarchical | Data is connected in a treelike format. | IBM Information Management System | Banking, telecommunications |
Network | Similar to hierarchical, but the structure is more flexible. | Integrated Data Store, TurboIMAGE | Customer sales records |
Navigational | Combination of hierarchical and network | IBM Information Management System, CA IDMS | U.S. Federal Reserve and other large banking organizations |
Columnar | A variant of the the relational database, but records are in columns rather than rows | Sybase IQ, VectorWise, Druid | Patient records |
Desktop | Runs on a PC Can use SQL or noSQL |
Access, OpenOffice Base, Axisbase | Personal or small-business data |
Post-relational | Also known as non-relational or NoSQL/NewSQL, good for large volumes of freeform data (such as text or photos) | MongoDB, Cassandra, HBase | Online transaction processing |
There are many variations of the database and DBMS types listed above, including the following:
Knowledge Base: A knowledge base stores information to help answer questions by customer service organizations or other reference bodies. It can be hierarchical or relational.
Document-oriented: A type of NoSQL database that is designed for document storage.
Deductive: A relational database subtype overlaid with software that can make deductions from the data according to coded rules.
Probabilistic: A relational database that tries to take the possible states of uncertain information into consideration. Used for insurance risk analysis or to predict elections.
In-memory: Uses a computer's main memory rather than a storage device (like a hard drive). In-memory systems have faster response times than databases using storage devices, and are often deployed for telecommunications, mobile advertising, and data analytics.
Hypertext: Text and objects can be related via hyperlink, similar to the way the web works. Hypertext is often used for online encyclopedias and is also called超媒体.
Embedded: These database tools are tightly integrated with an application, so a discrete database program is unnecessary.
Federated: A number of databases, each with their own DBMS but managed by an overarching DBMS, that operate as a single database. This configuration is sometimes called amultidatabase.
Graph Database: A NoSQL database that incorporates graph structures to link the data.
Array: A NoSQL database that storesmultidimensional arrays, which are large collections of data points, such as very high-resolution satellite photographs.
Mobile: This type of database is optimized for access via a mobile device.
Operational: Businesses use this for customer data, sales transactions, and employee data.
Parallel: This architecture is designed to improve performance by using parallel processors. These can include shared memory, shared disk, and shared nothing architectures.
Real-time: An architecture that processes data at a high speed and returns results quickly enough for immediate action.
Active: A database in which certain events trigger actions, such as alarms for hacking or notifications generated when conditions are met (such as a sales threshold).
Hadoop and Spark are not DBMSs in the traditional sense, but they serve some of the same functions. They are designed to perform data processing and analysis tasks on large volumes of data on distributed networks or big data. Hadoop is an older product. Some businesses are replacing Hadoop with Spark, while others are finding ways to make them work in concert.
A Brief History of Databases and Data Management
The key concepts in data management come from the worlds of accounting, statistics, and logistics.
Databases were preceded by punch card systems and weren’t practical until the electronic era. In the 1960s, IBM created the Information Management System (IMS), one of the first commercially available databases. It was a hierarchical system.
随着计算能力的增加和成本的降低,performance and ability improved, allowing greater storage and quicker responses to queries.The Association of Data Processing Service Organizations (ADAPSO) was one of many groups that pushed for best practices in data management in the 1960s; ADAPSO later changed its name to the Information Technology Association of America (ITAA), as its focus shifted to the available technology. A similar group, the National Microfilm Association, changed its name to the Association for Information and Image Management (AIIM).
E. F. “Ted” Codd conceived of the relational model while working at IBM in the 1970s; it became the prominent model during that decade. This is still true today, though new models are challenging its dominance. IBM was devoted to IMS and the hierarchical model, so Michael Stonebreaker and Eugene Wong at UC Berkeley researched and popularized the relational model. IBM eventually got involved in the relational model and developed SQL.
The NoSQL model started to become more popular in the late 1980 and early 1990s due to the data management demands created by the internet. Also in the 1980s, the Data Management Association International (DAMA) formed to promote data management education. DAMA created the Data Management Body of Knowledge (DAMA DMBOK), which is used as a standard for data management functions and methods.
Other key people in the history of databases include Ralph Kimball and Bill Inmon, who were involved in creating data warehouses, and Jim Gray, who helped pioneer many foundational concepts and practices.
Data Management Best Practices
Data is a corporate asset. It’s used for operations such as billing, customer acquisition, inventory tracking, and planning for the future and requires stewardship. Therefore, a company's data needs to be secure and accurate. Following these best practices will help with that goal, as well as prevent events that may degrade efficiency and revenue (i.e. monetizing the data) and can increase data accuracy and usefulness, as well as improve business intelligence.
Control data through the data lifecycle. From creating to archiving or deleting, ensure that that the data is secure and only changed by authorized parties.
Protect the organization’s data. Keep it secure from prying external eyes.
Prevent legal and compliance risk. Ensure that data use and retention policies meet all applicable laws and regulations.
Plan for storage and capacity needs. Ensure that there is enough space to store new data as it comes in, in addition to what needs to be retained.Implement professional training. Not only should you train IT on how to manage data as needs and methods change, but also teach users how to query data and utilize the results.
Keep data-driven applications performing at their best. This makes sure that response times are quick and query returns are accurate.
In addition, stay on top of these details and features to get the most out of your database:
Partitioning: Divide large relational database tables into smaller tables to speed query response times.
Replication: Data shared between multiple databases (such as a distributed database arrangement) should be replicated frequently so that all users have access.
Masking: Hide sensitive data (such as Social Security numbers) from potential attackers and thieves.
Rollbacks and restarts: A rollback reverts data records to the previous state when a transaction fails. This makes sure that all fields in a record are in sync if an update does not complete properly. As with a computer, you sometimes need to restart your database. You must have a process in place to ensure that you don’t lose any transactions, updates, or deletions when the database is down.
Logging and auditing: Keep track of who accessed the data, and when data is changed, archived, or deleted.
Consistent data processing: This not only makes sure that the database is more accurate, but reduces duplicated data and, thus, storage needs.
Access Privileges: Ensure that users only see what they need to perform their job.
Change Management: You will need to update your data, so have a plan in place on how to do this.Metrics: Create and use reports that show you are maintaining data quality.
Standard APIs: APIs (application programming interfaces) allow different programs to communicate with each other and exchange data. The use of standard APIs ensures data doesn’t get corrupted during transfer between applications.
Deduplication: This is the process of making sure that each piece of data exists only once. Deduplicating data helps keep it accurate and updated.
数据治理和质量阿舒尔ance: Implement checks and balances to ensure that data is accurate and does not get corrupted.
Concurrency:这是多个用户访问的能力and update data simultaneously. Begin with a business question and acquire data to answer the question. Sharing data is preferable to making copies, as it reduces the chance for errors getting into the master file.
Performance Monitoring and Tuning: Like all systems, databases need to be maintained.
Materialized Views: Store frequently accessed views and common queries in memory to increase response time and lower processing needs.
Benefits of Data Management
Data management can have positive impacts for any business. Some critical advantages are listed below.
Data becomes an asset and can be used in many ways for business operations.
Using data to evaluate past performance and predict future conditions can make a business more competitive and drive revenue and growth.
The use of analytics techniques can enhance the value of data.
Efficient use of data can lead to reduced TCO (total cost of operations).
Data management helps enhance business intelligence and corporate performance management.
As the volume of data continues to increase, data management techniques can prevent the negative effects of this growth, as well as archive or delete data that’s no longer useful.
Data management can optimize storage by minimizing duplication, which contributes to lower costs of both acquisitions and operations.
Because more people are using mobile devices as a computing platform, data should be accessible to mobile apps and users, as well as those using traditional computers.
Backing up data eases recovery from crashes.
Database management preserves data integrity.
Uniform data administration procedures are easier to manage.
When organized logically, data is easier to query and work with.
Data abstraction, the difference between the logical structure and the conceptual schema, allows the data to be viewed in different ways, which gives it more value.
You can increase customer loyalty and engagement as well as add value to customer interactions by presenting a personalized customer experience, such as content or offers based on their past actions or their demographics information.
By preventing concurrent updates, you keep data more accurate.
Challenges of Database Management Systems
The world of data management is in flux, and the practice and practitioners face many pressures and changes. These include the following:
Rapid data growth strains memory use and system resources, and requires more capacity.
Collecting data is easy, but managing it is difficult. This often results from a lack of planning. The old “garbage in, garbage out” idiom applies here.
Poor application performance means that data is inaccurate and response times are slow.
Compliance risks, including from recent regulations like General Data Protection Regulation (GDPR),Sarbanes-Oxley(SOX), the California Consumer Privacy Act of 2018, Basel III, and Payment Card Industry Data Security Standard (PCI-DSS) policies create stricter requirements for data retention, privacy and consumer protection, and security.
DBMSs are declining in popularity due to competitive pressures from cloud infrastructure and commodity hardware.
Vendor consolidation means fewer options.
Database Platform as a Service (dbPaaS) and other software-only approaches require new paradigms and ways of connecting servers and storing data.
Open-source DBMS (OSDBMS) means that vendors of legacy programs will have to lower their prices or face going out of business.
Data breachesreduce confidence in data and data management.
Recent Developments in Database Management Systems
The biggest recent changes and future trends in data management and databases are the result of big data. This means greater data volumes, an increase in unstructured data, greater use of graph databases, and an onslaught of data from multiple sources (such as the internet of things) in varied formats. This requires new paradigms (like key-value stores) and software (such as Hadoop and Spark) to store, process, analyze, and archive data. Data management will be even more important as data volume grows.
Professor Raghu Ramakrishnan says, “The volume of data will become even more ridiculous than it already is, so tools will need to be increasingly sophisticated to match.”
The desire for more real-time information needs require more in-database analytics and event stream processing. “You’ll see more real-time queries and interactions. Easy-to-use tools will be important,” says Ramakrishnan.
As for careers, Ramakrishnan offers the following advice:
Because data has become ubiquitous, people who are experts in it will not be the only people making decisions based on it. The use of data will expand. Data literacy will be a fundamental skill for any job, so we’ll have to shift our educational priorities to match. For people in traditional IT roles, systems skills, coding and solid engineering skills, and systems and query skills will still be important.
What Is Distributed Database Management?
Adistributed databaseoperates on storage devices that are not all connected to a common processor. Distributed database management is often used when data is stored in the cloud. Ideally, it shouldn’t matter to users where the data is stored, and they don’t need to know.
What Is Spatial Database Management?
Aspatial databaseis optimized for data representing objects defined in a geometric space, from simple items (such as lines and polygons) to more complex items (such as 3D objects). They are valuable for modeling cities in urban planning and mapping. Spatial database management requires many of the same skills as regular database management, as well as a familiarity with geometry and geography.
数据和数据库管理书籍
For more on database management, see these titles:
An Introduction to Database Systemsby Christopher J. Date
Fundamentals of Database Systemsby Ramez Elmasri
Database Management Systemsby Johannes Gehrke and Raghu Ramakrishnan
Principles of Data Managementby Keith Gordon
Improve Database & IT Operations Management with Smartsheet
Empower your people to go above and beyond with a flexible platform designed to match the needs of your team — and adapt as those needs change.
The Smartsheet platform makes it easy to plan, capture, manage, and report on work from anywhere, helping your team be more effective and get more done. Report on key metrics and get real-time visibility into work as it happens with roll-up reports, dashboards, and automated workflows built to keep your team connected and informed.
When teams have clarity into the work getting done, there’s no telling how much more they can accomplish in the same amount of time.Try Smartsheet for free, today.