Are nulls in a relational database okay?


yourdevel 2020. 11. 2. 19:52

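There is a school of thought that null values should not be allowed in a relational database. That is, a table's attribute (column) should not allow null values. Coming from a software development background, I really don't understand this. It seems to me that if null is valid within the context of the attribute, it should be allowed. This is very common in Java, where object references are often null. Not having extensive database experience, I wonder if I'm missing something here.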

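Nulls are viewed negatively from the perspective of database normalization. The idea is that if a value can be nothing, you really should split it out into another sparse table, so that you don't need rows for items that have no value.

It's an effort to make sure all data is valid and valued.

In some cases having a null field is useful, though, especially when you want to avoid yet another join for performance reasons (although this shouldn't be an issue if the database engine is set up properly, except in extraordinary high-performance scenarios).

-Adam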

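One argument against nulls is that they don't have a well-defined interpretation. If a field is null, that could be interpreted as any of the following:

The value is "Nothing" or the "Empty set".
There is no value that makes sense for that field.
The value is unknown.
The value hasn't been entered yet.
The value is an empty string (for databases that don't distinguish between nulls and empty strings).
Some application-specific meaning (e.g., "If the value is null, then use a default value").
An error has occurred, causing the field to have a null value when it really shouldn't.

Some schema designers demand that all values and data types have well-defined interpretations, and therefore consider nulls bad.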

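Null markers are fine. Really, they are.

It depends.

As long as you understand why you are allowing NULLs in your database (the choice needs to be made on a per-column basis) and how you will interpret, ignore, or otherwise deal with them, they are fine.

For instance, take a column like NUM_CHILDREN. What do you do if you don't know the answer? It should be NULL. In my mind, there is no better option for this column's design (even if you have a flag to determine whether the NUM_CHILDREN column is valid, you still have to have a value in this column).

On the other hand, if you don't allow NULLs and instead reserve special values for certain cases (rather than flags), such as -1 for the number of children when it is really unknown, you have to address these in a similar way, in terms of conventions, documentation, and so on.

So, ultimately, the issues have to be addressed with conventions, documentation, and consistency.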
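
To make that trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names person_null and person_magic and the sample rows are made up for illustration; only NUM_CHILDREN and the -1 convention come from the discussion above. It suggests why a magic value needs at least as much discipline as NULL: aggregates skip NULL on their own, but treat -1 as real data unless every query remembers to exclude it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one design allows NULL, the other reserves -1 for "unknown".
conn.execute("CREATE TABLE person_null (name TEXT NOT NULL, num_children INTEGER)")
conn.execute("CREATE TABLE person_magic (name TEXT NOT NULL, num_children INTEGER NOT NULL)")
conn.executemany("INSERT INTO person_null VALUES (?, ?)",
                 [("Ann", 2), ("Bob", 0), ("Cy", None)])
conn.executemany("INSERT INTO person_magic VALUES (?, ?)",
                 [("Ann", 2), ("Bob", 0), ("Cy", -1)])

# AVG ignores NULL rows, so the unknown value does not distort the result.
avg_null = conn.execute("SELECT AVG(num_children) FROM person_null").fetchone()[0]

# The magic value is averaged like any other number unless the query filters it out.
avg_magic_naive = conn.execute("SELECT AVG(num_children) FROM person_magic").fetchone()[0]
avg_magic_guarded = conn.execute(
    "SELECT AVG(num_children) FROM person_magic WHERE num_children <> -1"
).fetchone()[0]

print(avg_null, avg_magic_naive, avg_magic_guarded)
```

Either design can work; the sketch only shows that the convention has to be written down and applied in every query.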

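The alternative apparently espoused by Adam Davis in the answer above, normalizing the columns out to sparse tables (or not so sparse, in the NUM_CHILDREN example, where most of the data has known values), can eliminate all NULLs but is not workable in general practice.

In many cases where an attribute is unknown, it makes little sense to join to another table for each and every column that could simply allow NULLs in a simpler design. The overhead of the joins and the space required for the extra primary keys make little sense in the real world.

This brings to mind the way duplicate rows can be eliminated by adding a cardinality column. In theory that solves the problem of not having a unique key, but in practice it is sometimes impossible, for example with large-scale data. The purists are then quick to suggest a surrogate PK instead, yet the idea that a meaningless surrogate can form part of a tuple (row) in a relation (table) is laughable from the point of view of relational theory.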

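There are several different objections to the use of NULL. Some of the objections are based on database theory. In theory, there is no difference between theory and practice. In practice, there is.

It is true that a fully normalized database can get along without NULLs at all. Any place where a data value has to be left out is a place where an entire row can be left out with no loss of information.

In practice, decomposing tables to this extent serves no great useful purpose, and the programming needed to perform simple CRUD operations on the database becomes more tedious and error-prone.

There are places where the use of NULLs can cause problems. Essentially these revolve around the following question: what does missing data really mean? All a NULL really conveys is that there is no value stored in a given field. But the inferences application programmers draw from missing data are sometimes incorrect, and that causes a lot of problems.

Data can be missing from a location for a variety of reasons. Here are a few: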

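The data is inapplicable in this context. E.g., the spouse's first name for a single person.
The user of a data entry form left a field blank, and the application does not require an entry in the field.
The data is copied to the database from some other database or file, and there was missing data in the source.
There is an optional relationship encoded in a foreign key.
An empty string was stored in an Oracle database.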

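Here are some guidelines about when to avoid NULLs:

If, in the course of normal expected programming, query writers have to write a lot of ISNULL, NVL, COALESCE, or similar code in order to substitute a valid value for the NULL. It is sometimes better to make the substitution at store time, provided what is being stored is "reality".

If counts are likely to be off because rows containing a NULL were counted. Often this can be avoided by selecting count(MyField) instead of count(*).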
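
As a small illustration of the counting guideline, the sketch below uses Python's built-in sqlite3 module with a hypothetical table mytable; only the MyField column name is taken from the text above. COUNT(*) counts rows, while COUNT(MyField) counts only rows where MyField is not NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a nullable column named after the example in the text.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, MyField TEXT)")
conn.executemany("INSERT INTO mytable (MyField) VALUES (?)",
                 [("a",), (None,), ("b",), (None,)])

# COUNT(*) counts every row; COUNT(MyField) skips rows where MyField IS NULL.
count_all, count_field = conn.execute(
    "SELECT COUNT(*), COUNT(MyField) FROM mytable"
).fetchone()

print(count_all, count_field)  # 4 2
```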

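Here is one place where you had better get used to NULLs and program accordingly: whenever you start using outer joins, like LEFT JOIN and RIGHT JOIN. The whole point of an outer join, as distinct from an inner join, is to get rows even when some matching data is missing. The missing data is returned as NULLs.

My bottom line: don't dismiss theory without understanding it. But learn when to depart from theory as well as how to follow it.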

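There is nothing wrong with using NULL for data fields. You have to be careful when setting keys to null. Primary keys should never be NULL. Foreign keys can be null, but you have to be careful not to create orphan records.

If something is "non-existent", then you should use NULL instead of an empty string or some other kind of flag.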
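
The foreign key remark above (nullable is fine, orphans are not) can be sketched with Python's built-in sqlite3 module. The department and employee tables are hypothetical, and note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other engines enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER REFERENCES department (id)  -- nullable: relationship is optional
);
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (1, 'Ann', 1);
INSERT INTO employee VALUES (2, 'Bob', NULL);  -- allowed: no department yet
""")

# A non-NULL key that points at no department would be an orphan, so it is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (3, 'Cy', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(orphan_rejected, employee_count)
```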

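Instead of writing up all the issues of NULL, tristate vs. boolean logic, and so on, I'll offer this pithy advice:

Don't allow NULL in your columns until you find yourself adding a magic value to represent missing or incomplete data.
Since you are asking this question, you should be very careful in how you approach NULL. There are a lot of nonobvious pitfalls. When in doubt, don't use NULL.

There is another alternative to using "N/A", "N/K", or the empty string: a separate table.

E.g., if we may or may not know a customer's telephone number: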

CREATE TABLE Customer (ID int PRIMARY KEY, Name varchar(100) NOT NULL, Address varchar(200) NOT NULL);
CREATE TABLE CustomerPhone (ID int PRIMARY KEY, Phone varchar(20) NOT NULL, CONSTRAINT FK_CustomerPhone_Customer FOREIGN KEY (ID) REFERENCES Customer (ID));

If we don't know the phone number, we simply don't add a row to the second table.
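
Here is a rough sketch of how that two-table design behaves, run through Python's built-in sqlite3 module. The DDL is the one shown above; the sample customers and phone number are invented. An inner join returns only customers with a known phone, and as soon as you ask for every customer with an outer join, the missing phone comes back as NULL anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ID int PRIMARY KEY, Name varchar(100) NOT NULL, Address varchar(200) NOT NULL);
CREATE TABLE CustomerPhone (ID int PRIMARY KEY, Phone varchar(20) NOT NULL,
    CONSTRAINT FK_CustomerPhone_Customer FOREIGN KEY (ID) REFERENCES Customer (ID));
-- Sample data (made up): Bob's phone is unknown, so he gets no CustomerPhone row.
INSERT INTO Customer VALUES (1, 'Ann', '1 Main St'), (2, 'Bob', '2 Side St');
INSERT INTO CustomerPhone VALUES (1, '555-0100');
""")

# Inner join: only customers whose phone number is known.
known = conn.execute(
    "SELECT c.Name, p.Phone FROM Customer c JOIN CustomerPhone p ON p.ID = c.ID"
).fetchall()

# Outer join: every customer; the absent row surfaces as NULL (None in Python).
everyone = conn.execute(
    "SELECT c.Name, p.Phone FROM Customer c "
    "LEFT JOIN CustomerPhone p ON p.ID = c.ID ORDER BY c.ID"
).fetchall()

print(known)
print(everyone)
```

So the stored tables are NULL-free, but query results need not be.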

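I would say that nulls should definitely be used. There is no other right way to represent a lack of data. For example, it would be wrong to use an empty string to represent a missing address line, or 0 to represent a missing age, because both an empty string and 0 are data. Null is the best way to represent such a scenario.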

Do not underestimate the complexity you create by making a field NULLable. For example, the following where clause looks like it will match all rows (bits can only be 1 or 0, right?):

where bitfield in (1,0)

But if the bitfield is NULLable, it will miss some rows. Or take the following query:

select * from mytable
where id not in (select id from excludetable)

Now if excludetable contains a null and a 1, this translates to:

select * from mytable
where id <> NULL and id <> 1

But "id <> NULL" evaluates to unknown, never true, for any value of id, so this query will never return any rows. This catches even experienced database developers by surprise.
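
Both surprises can be reproduced in a few lines with Python's built-in sqlite3 module. The table and column names (mytable, excludetable, id, bitfield) follow the queries above; the sample rows are invented, and the NOT EXISTS rewrite at the end is one common workaround rather than something the original answer proposed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, bitfield INTEGER);
INSERT INTO mytable VALUES (1, 1), (2, 0), (3, NULL);
CREATE TABLE excludetable (id INTEGER);
INSERT INTO excludetable VALUES (1), (NULL);
""")

# The row with a NULL bitfield is neither 1 nor 0, so IN (1, 0) skips it.
in_rows = conn.execute(
    "SELECT id FROM mytable WHERE bitfield IN (1, 0) ORDER BY id"
).fetchall()

# A single NULL in the subquery makes NOT IN unknown or false for every row.
not_in_rows = conn.execute(
    "SELECT id FROM mytable WHERE id NOT IN (SELECT id FROM excludetable)"
).fetchall()

# NOT EXISTS compares row by row and is not derailed by the NULL.
not_exists_rows = conn.execute(
    "SELECT id FROM mytable m WHERE NOT EXISTS "
    "(SELECT 1 FROM excludetable e WHERE e.id = m.id) ORDER BY id"
).fetchall()

print(in_rows, not_in_rows, not_exists_rows)
```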

Given that most people can be caught off guard by NULL, I try to avoid it whenever possible.

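NULLs are a huge can of worms, because they can mean so many things:

No date of death, because the person is still alive.
No cell phone number, because we don't know what it is, or even whether it exists.
No social security number, because that person is known not to have one.

Some of these can be avoided by normalization, some can be avoided by storing a value in that column ("N/A"), and some can be mitigated by having a separate column that explains the presence of the NULL ("N/K", "N/A", and so on).

They are also a can of worms because the SQL syntax needed to find them differs from that for non-null values, it is difficult to join on them, and in some database engines they are not included in index entries.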

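Because of the former reason, you're going to find cases where a null is unavoidable.

Because of the latter reason, you should still do your best to minimize their number.

Regardless, always use NOT NULL constraints to guard against nulls where a value is required.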
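
A minimal sketch of that last point, using Python's built-in sqlite3 module and a hypothetical customer table: the required column is declared NOT NULL, the optional one is left nullable, and the database rejects a missing required value instead of storing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: name is required, fax is genuinely optional.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, fax TEXT)")

# A NULL in the optional column is accepted.
conn.execute("INSERT INTO customer (name, fax) VALUES (?, ?)", ("Ann", None))

# A NULL in the required column violates the constraint and is rejected.
try:
    conn.execute("INSERT INTO customer (name, fax) VALUES (?, ?)", (None, "555-0199"))
    null_name_rejected = False
except sqlite3.IntegrityError:
    null_name_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(null_name_rejected, row_count)
```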

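The main issue with nulls is that they have special semantics that can produce unexpected results with comparisons, aggregates, and joins.

Nothing is ever equal to null, and nothing is ever not equal to, greater than, or less than null, so you have to set nulls to a placeholder value if you want to do any bulk comparison.
This is also a problem with composite keys that might be used in a join. Where the natural key includes a nullable column, you might want to consider using a synthetic key.
Nulls can drop out of counts, which may not be the semantics you want.
Nulls in a column that you join against will eliminate rows from an inner join. In general this is probably the desired behaviour, but it can lay elephant traps for people doing reporting.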
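
The comparison rule in the first item can be checked directly. This is a small sketch using Python's built-in sqlite3 module; the reading table and the choice of -1 as the placeholder are assumptions made for the example. sqlite3 returns SQL NULL as Python's None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Every ordinary comparison against NULL yields NULL (unknown), never true.
eq, ne, lt = conn.execute("SELECT NULL = NULL, NULL <> 1, NULL < 1").fetchone()

conn.executescript("""
CREATE TABLE reading (id INTEGER PRIMARY KEY, value INTEGER);
INSERT INTO reading VALUES (1, 10), (2, NULL), (3, 30);
""")

# "= NULL" matches nothing; IS NULL is the dedicated test.
equals_null = conn.execute("SELECT id FROM reading WHERE value = NULL").fetchall()
is_null = conn.execute("SELECT id FROM reading WHERE value IS NULL").fetchall()

# Bulk comparison through a placeholder: COALESCE maps NULL to -1 before comparing.
below_20 = conn.execute(
    "SELECT id FROM reading WHERE COALESCE(value, -1) < 20 ORDER BY id"
).fetchall()

print(eq, ne, lt, equals_null, is_null, below_20)
```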

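There are quite a few other subtleties to nulls. Joe Celko's SQL for Smarties has a whole chapter on the subject, and is a good book worth reading anyway. Some examples of places where nulls are a good solution:

Optional relationships where a joined entity may or may not be present. Null is the only way to represent an optional relationship on a foreign key column.
Columns where you want nulls precisely so that they drop out of counts.
Optional numeric (e.g., currency) values that may or may not be present. There is no effective placeholder value for "not recorded" in number systems (particularly where zero is a legal value), so null is really the only good way to do this.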

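Some examples of places where you might want to avoid nulls because they are likely to cause subtle bugs:

"Not recorded" values on code fields with an FK against a reference table. Use a placeholder value, so that you (or some random business analyst down the track) don't inadvertently drop rows out of result sets when querying the database.
Description fields where nothing has been entered: an empty string ('') works fine for this. This saves having to treat nulls as a special case.
Optional columns on a reporting or data warehouse system. For this situation, make a placeholder row for "Not recorded" in the dimension and join against that. This simplifies querying and plays nicely with ad-hoc reporting tools.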
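
The placeholder-row idea can be sketched as follows with Python's built-in sqlite3 module. All table and column names here (dim_region, fact_sale_nullable, fact_sale_placeholder, region_key) and the key value 0 for "Not recorded" are assumptions for the example. With a nullable key, a plain inner join silently drops the unrecorded sale from the report; with the placeholder row, it shows up under its own label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region_name TEXT NOT NULL);
INSERT INTO dim_region VALUES (0, 'Not recorded'), (1, 'North'), (2, 'South');

-- Variant A: the fact table allows a NULL key.
CREATE TABLE fact_sale_nullable (
    amount INTEGER NOT NULL,
    region_key INTEGER REFERENCES dim_region (region_key));
INSERT INTO fact_sale_nullable VALUES (100, 1), (50, 2), (25, NULL);

-- Variant B: the key is NOT NULL and defaults to the placeholder row.
CREATE TABLE fact_sale_placeholder (
    amount INTEGER NOT NULL,
    region_key INTEGER NOT NULL DEFAULT 0 REFERENCES dim_region (region_key));
INSERT INTO fact_sale_placeholder VALUES (100, 1), (50, 2), (25, 0);
""")

# The same report query, run against each variant of the fact table.
report_sql = (
    "SELECT d.region_name, SUM(f.amount) FROM {table} f "
    "JOIN dim_region d ON d.region_key = f.region_key "
    "GROUP BY d.region_key, d.region_name ORDER BY d.region_key"
)
nullable_report = conn.execute(report_sql.format(table="fact_sale_nullable")).fetchall()
placeholder_report = conn.execute(report_sql.format(table="fact_sale_placeholder")).fetchall()

print(nullable_report)     # the 25 sale is missing
print(placeholder_report)  # the 25 sale appears under 'Not recorded'
```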

Again, Celko's book is a good treatment of the subject.

The best thing to know about normal forms is that they are guides, and guides should not be doggedly adhered to. When the world of academia clashes with the actual world, you seldom find many surviving warriors of academia.

The answer to this question is that it's OK to use nulls. Just evaluate your situation and decide whether you want them to show up in the table, or collapse the data into another related table if you feel your ratio of null values to actual values is too high.

As a friend is fond of saying, "Don't let the perfect be the enemy of the good". I think Voltaire also said that. 8)

According to strict relational algebra, nulls are not needed. However for any practical project, they are needed.

First, much real-world data is unknown or not applicable and nulls implement that behavior well. Second, they make views and outer joins much more practical.

You'll find with step-by-step data acquisition systems that you can't avoid having nulls in a database because the order of asking questions / data gathering very rarely matches the logical data model.

Or you can default the values (requiring code to handle these default values). You can assume all strings are empty instead of null, for example, in your model.

Or you can have staging database tables for data acquisition that continues until all the data is obtained before you populate the actual database tables. This is a lot of extra work.

To a database, null translates to "I don't have a value for this". Which means that (interestingly), a boolean column that allows nulls is perfectly acceptable, and appears in many database schemas. In contrast, if you have a boolean in your code that can have a value of 'true', 'false' or 'undefined', you're likely to see your code wind up on thedailywtf sooner or later :)

So yes, if you need to allow for the possibility of a field not having any value at all, then allowing nulls on the column is perfectly acceptable. It's significantly better than the potential alternatives (empty strings, zero, etc)

Nulls can be hard to work with, but they make sense in some cases.

Suppose you have an invoice table with a column "PaidDate" which has a date value. What do you put in that column before the invoice has been paid (assuming you don't know beforehand when it will be paid)? It can't be an empty string, because that's not a valid date. It doesn't make sense to give it an arbitrary date (e.g. 1/1/1900) because that date simply isn't correct. It seems the only reasonable value is NULL, because it does not have a value.
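
A quick sketch of the PaidDate case, using Python's built-in sqlite3 module. The invoice table layout and sample rows are invented, and because SQLite has no dedicated date type the dates are stored as ISO-8601 text. With NULL for "not paid yet", unpaid invoices are easy to find, and date aggregates are not polluted the way they would be by a stand-in such as 1/1/1900.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL, PaidDate TEXT);
INSERT INTO invoice VALUES (1, 100, '2020-10-01'), (2, 250, NULL), (3, 75, '2020-10-15');
""")

# Unpaid invoices are exactly the rows with no PaidDate.
unpaid_ids = [row[0] for row in conn.execute("SELECT id FROM invoice WHERE PaidDate IS NULL")]

# MIN skips NULL, so the earliest real payment is returned.
earliest_payment = conn.execute("SELECT MIN(PaidDate) FROM invoice").fetchone()[0]

print(unpaid_ids, earliest_payment)
```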

Working with nulls in a database has a few challenges, but databases handle them well. The real problems are when you load nulls from your database into your application code. That's where I've found that things are more difficult. For example, in .NET, a date in a strongly typed dataset (mimicking your DB structure) is a value type and cannot be null. So you have to build workarounds.

Avoid nulls when you can, but don't rule them out because they have valid uses.

I think you're confusing Conceptual Data Modeling with Physical Data Modeling.

In CDMs, if an object has an optional field, you should subtype the object and create a new object for when that field is not null. That's the theory in CDMs.

In the physical world we make all sorts of compromises for the real world. In the real world, NULLs are more than fine; they are essential.

I agree with many of the answers above and also believe that NULL can be used, where appropriate, in a normalized schema design - particularly where you may wish to avoid using some kind of "magic number" or default value which, in turn, could be misleading!

Ultimately though, I think usage of null needs to be well thought out (rather than by default) to avoid some of the assumptions listed in the answers above, particularly where NULL might be assumed to be 'nothing' or 'empty', 'unknown' or the 'value hasn't been entered yet'.

One gotcha if you are using an Oracle database. If you save an empty string to a CHAR type column then Oracle will coerce the value to be NULL without asking. So it can be quite difficult to avoid NULL values in string columns in Oracle.

If you are using NULL values, learn to use the SQL function COALESCE, especially with string values. You can then prevent NULL values propagating into your programming language. For example, imagine a person having a FirstName, MiddleName and FamilyName, but you want to return a single field:

  SELECT FullName = COALESCE(FirstName + ' ', '') + COALESCE(MiddleName+ ' ', '') + COALESCE(FamilyName, '') FROM Person

If you don't use COALESCE and any column contains a NULL value, the whole expression returns NULL.
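
To try this outside SQL Server, here is a rough equivalent run through Python's built-in sqlite3 module. SQLite concatenates with || rather than +, and the sample person is made up; the Person table and its column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (FirstName TEXT, MiddleName TEXT, FamilyName TEXT);
INSERT INTO Person VALUES ('Ada', NULL, 'Lovelace');
""")

# Without COALESCE, one NULL part turns the whole concatenation into NULL.
# With COALESCE, each missing part (and its trailing space) becomes ''.
naive, safe = conn.execute("""
SELECT FirstName || ' ' || MiddleName || ' ' || FamilyName,
       COALESCE(FirstName || ' ', '') || COALESCE(MiddleName || ' ', '') || COALESCE(FamilyName, '')
FROM Person
""").fetchone()

print(repr(naive), repr(safe))
```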

Technically, nulls are illegal in relational math on which the relational database is based. So from a purely technical, semantic relational model point of view, no, they are not okay.

In the real world, denormalization and some violations of the model are okay. But, in general, nulls are an indicator that you should look at your overall design more closely.

I am always very wary of nulls and try to normalize them out whenever I can. But that doesn't mean that they aren't the best choice sometimes. But I would definitely lean to the side of "no nulls" unless you are really sure that having the nulls is better in your particular case.

NULL rocks. If it wasn't necessary in some cases, SQL would not have IS NULL and IS NOT NULL as special-case operators. NULL is the root of the conceptual universal, all else is NOT NULL. Use NULLs freely, whenever it may be possible for a data value to be absent but not missed. Default values can only compensate for NULL if they are absolutely correct all of the time. For example, if I have a single-bit field "IsReady" it may make perfect sense for this field to have a default value of false and NULL not be allowed, but this implicitly asserts that we know that the whatever is not ready, when in fact we may have no such knowledge. Chances are, in a workflow scenario, the person who decides ready-or-not just hasn't had the chance to enter their opinion yet, so a default of false could actually be dangerous, leading them to overlook a decision that appears to have been made but was in fact only defaulted.

As an aside, and in reference to the middle-initial example, my father had no middle name, therefore his middle initial would be NULL - not blank, space, or asterisk - except in the Army where his middle initial was NMI = No Middle Initial. How silly was that?

While technically NULLs are OK as a field value, they are quite frequently frowned upon. Depending on how data is written to your database, it is possible (and common) to end up with an empty string value in the field as opposed to a NULL. So any query that has this field as part of the WHERE clause would need to handle both scenarios, which means needless keystrokes.

Null means no value, while 0 does not. If you see a 0 you don't know what it means; if you see a null you know it is a missing value.

I think nulls are much clearer. 0 and '' are confusing, since they don't clearly show the intent of the value stored.

Don't take my words as sarcasm; I mean it. Unless you are working with toy databases, NULLs are inevitable, and in the real world we cannot avoid NULL values.

Just consider: how can you have a first name, middle name, and last name for every person? (If middle name and last name are optional, NULLs are there for you.) And how can you have a fax, business phone, and office phone for everybody in the blog list?

NULLs are fine, and you have to handle them properly on retrieval. SQL Server 2008 has a concept of sparse columns, which also lets you avoid the space taken by NULLs.

Don't confuse NULLs with zeros or any other value. People do that and say it is right.

Thanks Naveen

My controversial opinion for the day - the default of allowing NULLs in database columns was probably the worst universally accepted design decision in all of RDBMS land. Every vendor does it, and it's wrong. NULLs are fine in certain, specific, well thought out instances, but the idea that you have to explicitly disallow NULLs for every column makes negligent nullability way more common than it should be.

Personally, I think that nulls should only be used when you are using the field as a foreign key to another table, to symbolize that this record doesn't link to anything in the other table. Other than that, I find that null values are actually very troublesome when programming application logic. Because there is no direct representation of a database null in most programming languages for many data types, it ends up creating a lot of application code to deal with the meaning of these null values. When a DB encounters a null integer and tries, for instance, to add a value of 1 to it (aka null + 1), the database will return null, as that is how the logic is defined. However, when a programming language tries to add null and 1, it will usually throw an exception. So your code ends up littered with checks of what to do when the value is null, which often just equates to converting to 0 for numbers, empty string for text, and some null date (1900/1/1?) for date fields.
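
The mismatch described here is easy to see from Python with its built-in sqlite3 module, which maps SQL NULL to None. This is only a sketch of the general point; other languages and drivers differ in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQL, NULL + 1 is simply NULL; the driver hands it back as None.
db_result = conn.execute("SELECT NULL + 1").fetchone()[0]

# In application code, the same arithmetic on None raises an exception.
value = None
try:
    value + 1
    python_raised = False
except TypeError:
    python_raised = True

# Converting in the query (here to 0) is the kind of check that ends up everywhere.
coalesced = conn.execute("SELECT COALESCE(NULL, 0) + 1").fetchone()[0]

print(db_result, python_raised, coalesced)
```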

I think the question comes down to what you interpret a value of NULL to signify. Yes, there are many interpretations for a NULL value, however some of them posted here should never be used. The true meaning of NULL is determined by the context of your application and should never mean more than one thing. For example, one suggestion was that NULL on a date of death field would indicate the person was still alive. This is dangerous.

In all simplicity, define NULL and stick to it. I use it to mean "the value in this field is unknown at this time". It means that and ONLY that. If you need it to mean something else AS WELL, then you need to re-examine your data model.

Nulls are absolutely fine.

It all comes down to normalization versus ease of use and performance issues.

If you are going to stick to complete normalization rules you are going to end up writing stuff that looks like:

Select c.id, c.lastname,....... from customer c left join customerphonenumber cpn on c.id = cpn.customerid left join customeraddress ca on c.id = ca.customerid left join customerphonenumber2 cpn2 on c.id = cpn2.customerid etc, etc, etc

Reference URL: https://stackoverflow.com/questions/163434/are-nulls-in-a-relational-database-okay
