halayli 5 hours ago [-]
This topic can't be discussed without also talking about disks. SSDs write a 4K page at a time, meaning that if you're going to update 1 bit, the disk reads the 4K page, you update the bit, and it writes the 4K page back to a new slot. So the penalty for copying varies depending on the disk type.
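As a rough illustration of that penalty (the 4 KiB page size here is an assumption typical of flash pages; real drives complicate this with caches and much larger erase blocks):

```python
# Toy write-amplification estimate for a page-based SSD.
# Assumption: the drive can only write whole 4 KiB pages.
PAGE_SIZE = 4096  # bytes per flash page (typical, but device-specific)

def write_amplification(bytes_changed: int, page_size: int = PAGE_SIZE) -> float:
    """Ratio of bytes physically written to bytes logically changed."""
    # Changing even 1 bit forces a read-modify-write of every touched page.
    pages_touched = -(-bytes_changed // page_size)  # ceiling division
    return (pages_touched * page_size) / bytes_changed

print(write_amplification(1))     # flipping one byte rewrites a full page: 4096.0
print(write_amplification(8192))  # page-aligned bulk writes amortize to 1.0
```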
justinclift 6 hours ago [-]
As a data point, there are easily noticeable typos on that docs page. Might be a good idea to have someone run a spell checker over it at some point?
philippemnoel 5 hours ago [-]
The whole ParadeDB team is really excited for OrioleDB and Supabase to ship this :) It's long overdue in the Postgres ecosystem!
fforflo 9 hours ago [-]
Yeah, I've been keeping an eye on the pgsql-hackers discussions. Alexander+team are doing great work.
fweimer 11 hours ago [-]
The big advantage is that you do not need any extra space if your workload mostly consists of INSERTs (followed by table drops). And it's generally unnecessary to split up insertion transactions because there is no size limit as such (neither on the generated data or the total count of rows changed). There is a limit on statements in a transaction, but you can sidestep that by using COPY FROM if you do not have to switch tables too frequently. From a DBA point of view, there is no need to manage a rollback/undo space separately from table storage.
Every application is a bit different, but it's not that the PostgreSQL design is a loser in all regards. It's not like bubble sort.
indulona 10 hours ago [-]
> but it's not that the PostgreSQL design is a loser in all regards
the article literally says that pg's mvcc design is from the 90s and no one does it like that any more. that is technology that is outdated by over 30 years. i'd say it does not make it a loser in all regards, but in the most important aspects.
mikeocool 9 hours ago [-]
When it comes to your data store, some people might consider using technology that’s been reliably used in production by many organizations for 30 years a feature not a bug.
I’d prefer not to be the first person running up against a limit or discovering a bug in my DB software.
mannyv 8 hours ago [-]
Well every product has issues. The question is, do you feel like dealing with those issues or not?
Flat files have also been reliably used in production for decades. That doesn't mean they're ideal...although amusingly enough s3 and its equivalent of flat files is what we've migrated to as a data store.
naranha 9 hours ago [-]
At least couchdb is also append only with vacuum. So it's maybe not completely outdated.
jbellis 8 hours ago [-]
High performance has never been a reason to use couchdb.
kunley 9 hours ago [-]
Still, I'm very happy to use, every day, technology designed in the early '70s by Ken Thompson and colleagues. In that specific field many have tried to invent something more "modern" and "better" and failed, with the exception of a certain Finnish clone of that tech, itself started in the '90s, by the way.
So, newer does not always mean better, just saying.
nine_k 7 hours ago [-]
Speaking of which, if you try an actual System V in an emulator, or look at C code in K&R style, certain progress, as in "much more actually usable", can be noticed.
While persisting key architectural ideas certainly has benefits, so does evolving their implementations.
throwawayie6 6 hours ago [-]
> exception of a certain Finnish clone of that tech
Are you referring to C++? That was actually created by a Danish guy, who was in turn inspired by the object-oriented Simula language created in the '60s.
nneonneo 6 hours ago [-]
Pretty sure the OP was referring to UNIX and its “Finnish clone” Linux.
nightfly 11 hours ago [-]
> MySQL and Oracle store a compact delta between the new and current versions (think of it like a git diff).
Doesn't git famously _not_ store diffs and instead follows the same storage pattern postgres uses here and stores the full new and old objects?
ChadNauseam 11 hours ago [-]
TBF, the quoted section doesn't say that git stores diffs (or anything about git storage), it just says that what MySQL and Oracle stores is similar to a git diff.
zdragnar 3 hours ago [-]
It's a little too easy to misinterpret if you're skimming and still have memories of working with SVN, mercurial, perforce, and probably others (I've intentionally repressed everything about tfvc).
jmholla 11 hours ago [-]
That is correct. Each version of a file is a separate blob. There is some compression done by packing to make cloning faster, but the raw form git works with is these blobs.
quotemstr 9 hours ago [-]
git's model is a good example of layered architecture. Most of the code works in terms of whole blobs. The blob storage system, as an implementation detail, stores some blobs with diffs. The use of diffs doesn't leak into the rest of the system. Good separation of concerns
thaumasiotes 11 hours ago [-]
[flagged]
ori_b 10 hours ago [-]
Git does both. When you create a commit, it stores a full (zipped) copy of the object, without any deltas.
Periodically (I believe it used to be every thousand commits, though I'm not sure what the heuristic is today), git will take the loose objects and compress them into a pack.
The full blob format is how objects are manipulated by git internally: to do anything useful, the objects need to be extracted from the blob, with all deltas applied, before anything can be done with them.
It's also worth noting that accessing a deltified object is slow (O(n) in the number of deltas), so the length of the delta chain is limited. Because deltification is really just a compression format, it doesn't matter how or where the deltas are done -- the trivial "no deltas" option will work just fine if you want to implement that.
You can trivially verify this by creating commits and looking in '.git/objects/*' for loose objects, running 'git repack', and then looking in '.git/objects/pack' for the deltified packs.
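That verification recipe, spelled out as a shell session (assumes `git` is on PATH; the directory names are throwaway and the object hashes will differ per run):

```shell
# Work in a throwaway repo
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you

echo "hello" > f.txt
git add f.txt && git commit -qm one
echo "hello world" > f.txt
git commit -qam two

# Loose objects: one zipped file per object, no deltas
find .git/objects -type f -not -path '*pack*' -not -path '*info*'

# Repack everything; -a packs all objects, -d deletes the now-redundant loose ones
git repack -ad

# The pack (with delta-compressed objects) now lives here
ls .git/objects/pack/
```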
simonw 10 hours ago [-]
Saying "that's incorrect" is a lot more productive than saying "that's a lie".
Calling something a lie implies that the incorrect information was deliberate.
haradion 10 hours ago [-]
The file contents are logically distinct blobs. Packfiles will aggregate and delta-compress similar blobs, but that's all at a lower level than the logical model.
arp242 7 hours ago [-]
Sjeez, tone it down. People can be incorrect without lying.
epcoa 7 hours ago [-]
Others have mentioned that it said “git diffs”. However git does use deltas in pack files as a low level optimization, similar to the MySQL comparison. You don’t get back diffs from a SQL query either.
paulddraper 10 hours ago [-]
1. The comparison was to MySQL and Oracle storage using git diff format as an analogy, not git storage.
2. git storage does compress, and the compression is "diff-based" of sorts, but it is not based on commit history as one might naively expect.
Hendrikto 10 hours ago [-]
Git diffs are generated on the fly, but diffs are still diffs.
dfox 9 hours ago [-]
> Oracle and MySQL do not have this problem in their MVCC implementation because their secondary indexes do not store the physical addresses of new versions. Instead, they store a logical identifier (e.g., tuple id, primary key) that the DBMS then uses to look up the current version’s physical address. Now this may make secondary index reads slower since the DBMS has to resolve a logical identifier, but these DBMS have other advantages in their MVCC implementation to reduce overhead.
Interesting behavior of MySQL that I have observed (~500GB database, with a schema that is more document-oriented than relational) is that when you update a single row, doing SELECT id WHERE something; UPDATE what WHERE id=id is orders of magnitude faster than UPDATE what WHERE something. I suspect this is the reason for that behavior. But then, a normal workload will not do that, and this only slows down ad-hoc DML when you fix some inconsistency.
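The quoted design difference can be sketched with plain dictionaries (a toy model; all names and structures here are illustrative, not either engine's real internals):

```python
# Toy contrast: what a secondary index stores in each design.
heap = {}            # physical slot -> full row tuple
primary = {}         # primary key -> physical slot
sec_physical = {}    # Postgres-style: email -> physical slot
sec_logical = {}     # InnoDB-style:  email -> primary key

def insert(slot, pk, email, name):
    heap[slot] = (pk, email, name)
    primary[pk] = slot
    sec_physical[email] = slot
    sec_logical[email] = pk

insert(slot=1, pk=100, email="a@x.com", name="Ann")

# An UPDATE writes a complete new version into a new physical slot:
heap[2] = (100, "a@x.com", "Ann B.")
primary[100] = 2

# The logical index still resolves correctly via the primary key lookup...
assert heap[primary[sec_logical["a@x.com"]]][2] == "Ann B."
# ...but the physical index now points at the old version and must be updated too:
assert heap[sec_physical["a@x.com"]][2] == "Ann"
```

The extra hop through `primary` is the "slower secondary index reads" trade-off the quote mentions; the stale `sec_physical` entry is the write-side cost Postgres pays instead.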
whazor 8 hours ago [-]
A SELECT is a readonly operation and can be performed in parallel. However, an UPDATE actually writes and might lock the table. Whereas UPDATE id=id allows for row level locking. There is also the risk of missing newly inserted records between the SELECT and the UPDATE.
nine_k 8 hours ago [-]
SELECT FOR UPDATE was invented to address this.
fipar 8 hours ago [-]
Or just select + update in a transaction, which, IIRC, with the default isolation level will use optimistic locking for the select part, unlike select for update.
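The lock-before-read pattern under discussion, as a runnable sketch (sqlite3 is used as a stand-in so the snippet is self-contained; BEGIN IMMEDIATE plays the role that SELECT ... FOR UPDATE plays in Postgres/MySQL, taking the write lock before the read):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

# Take the write lock *before* reading, so the value we read cannot
# change under us between the SELECT and the UPDATE.
conn.execute("BEGIN IMMEDIATE")  # stand-in for SELECT ... FOR UPDATE
(balance,) = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchone()
conn.execute("UPDATE accounts SET balance = ? WHERE id = 1", (balance - 30,))
conn.execute("COMMIT")

(balance,) = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchone()
print(balance)  # 70
```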
thih9 8 hours ago [-]
> Then in the 2010s, it was MongoDB because non-durable writes made it “webscale“.
It was designed by former DoubleClick engineers as an afterthought DIY db for another service because no other db met their requirements. Supposedly version 4.2.8 (2020) is fairly solid, i.e. no dirty writes. https://en.wikipedia.org/wiki/MongoDB#Technical_criticisms
mbreese 4 hours ago [-]
So, if this is such a problem, my question is — are the poor MVCC choices of Postgres enough to make the authors (or people here) recommend another RDBMS?
abenga 3 hours ago [-]
The last couple of paragraphs of the article answer this. (The answer is No).
mbreese 2 hours ago [-]
Thanks - I completely missed the “concluding remarks” paragraph the first time. After the “problems” sections, I apparently just stopped reading.
For others who are curious:
> But please don’t misunderstand our diatribe to mean that we don’t think you should ever use PostgreSQL. Although its MVCC implementation is the wrong way to do it, PostgreSQL is still our favorite DBMS. To love something is to be willing to work with its flaws.
thr0w 9 hours ago [-]
Don't agree with their characterization of `pg_repack`. `VACUUM FULL` is definitely crushing, but that's why repack exists as a faster/lighter alternative. Anyone have a different experience?
fovc 6 hours ago [-]
The requirement of having two copies of the table simultaneously hurts on systems that make it easy to add storage but not to subtract it. Otherwise pg_repack has worked really well.
We solved the 2x storage with partitions, but it feels like the tail wagging the dog
dfox 9 hours ago [-]
pg_repack is a hackish solution to do what VACUUM FULL does without completely locking the relation in question. But then, when you care about either of these things, your workload has significant issues, with the typical case being using pgsql as a backend for something that was originally a thick client designed for some kind of RDBMS based on shared files (InterBase mentioned in TFA, MS Jet, whatever…)
srcreigh 3 hours ago [-]
It’s really annoying to see people write that Postgres has a “primary index” and “secondary indexes”. No. That’s not what those words mean. Every index in Postgres is a secondary index.
OrvalWintermute 8 hours ago [-]
This article is incorrect IMO - the following section in particular.
“ In the 2000s, the conventional wisdom selected MySQL because rising tech stars like Google and Facebook were using it. Then in the 2010s, it was MongoDB because non-durable writes made it “webscale“. In the last five years, PostgreSQL has become the Internet’s darling DBMS. And for good reasons! It’s dependable, feature-rich, extensible, and well-suited for most operational workloads.”
Smart engineers were choosing Postgres not because of the argumentum ad populum fallacy, but for the following reasons:
Data safety - not MyIsam,
ACID,
Similarity to Oracle,
MVCC,
SQL standards adherence,
Postgres team,
Helpful awesome community,
Data types,
High performance,
BSD flexibility
Above are the reasons I selected Postgres while at AT&T in the early 2000s, and our Oracle DBA found it a very easy transition. While MySQL went through rough transitions, PG has gone from strength to strength on an ever-improving path.
I think Bruce Momjian is a big part of this success; they truly have an excellent community.
<3
derefr 10 hours ago [-]
Question: is storing full new row-tuple versions something fundamental to Postgres as a whole, or is it just a property of the default storage engine / “table access method”?
dfox 8 hours ago [-]
From the PQ protocol PoV the way this works is pretty much irrelevant, but the actual implementation of PostgreSQL contains a ridiculous number of places that depend on the “backward” MVCC implementation of the tuple heaps.
paulddraper 10 hours ago [-]
You could store partial tuples without disrupting the interface.
Though full tuples are pretty fundamental to the underlying implementation: MVCC, VACUUM, etc. It'd be a massive change, to say the least.
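A toy sketch of the full-tuple approach being discussed (an illustrative model only; real Postgres visibility involves xmin/xmax pairs, commit status, and snapshot machinery far beyond this):

```python
# Toy Postgres-style MVCC: every UPDATE appends a complete new tuple version.
# Visibility rule (simplified): a snapshot taken at transaction id T sees the
# newest version whose creating transaction (xmin) is <= T.

versions = []  # list of (xmin, full_row_tuple) -- the "heap"

def update(xid, row):
    versions.append((xid, row))  # a full copy of the row, not a delta

def read(snapshot_xid):
    visible = [row for xmin, row in versions if xmin <= snapshot_xid]
    return visible[-1] if visible else None

update(1, ("alice", 100))
update(3, ("alice", 70))

print(read(2))        # an old snapshot still sees ('alice', 100)
print(read(3))        # a newer snapshot sees ('alice', 70)
print(len(versions))  # 2 -- dead versions linger until a VACUUM-like pass
```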
msie 10 hours ago [-]
Nicely written article. Easy to read and understand!
avg_dev 11 hours ago [-]
pretty informative. now i understand why people are often having issues with vacuum-related stuff. i like the diagrams too.
I’ve been trashed for suggesting as much. People here seem to think any team or startup not choosing Postgres is committing some type of negligence.
rnewme 9 hours ago [-]
To be fair, that's sort of a positive self-fulfilling prophecy. The more it's used, the better it's going to get.
jongjong 8 hours ago [-]
For most cases, MVCC sounds like over-engineering. From the problem description:
> The goal of MVCC in a DBMS is to allow multiple queries to read and write to the database simultaneously without interfering with each other when possible.
How is that a problem for most use cases?
If there is a read query taking a long time, over many rows, and some of the later rows happen to be updated mid-read while the earlier rows are not... it's not really a problem for the vast majority of applications. Why is it better for all rows to be delivered out of date versus just the first half fetched being out of date? It's not ideal in either case, but it's unavoidable that some requests will sometimes return out-of-date data. It seems like a tiny advantage.
I suspect the real need to implement MVCC arose out of the desire for databases like Postgres to implement atomic transactions as a magical black box.
IMO, two-phase commit is a simpler solution to this problem. It's not possible to fully hide concurrency complexity from the user; it ends up with tradeoffs.
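For contrast, the message flow of two-phase commit can be sketched in a few lines (a toy, in-memory coordinator; it deliberately omits the genuinely hard parts of real 2PC, namely participant failures, timeouts, and coordinator recovery):

```python
# Toy two-phase commit: phase 1 collects votes, phase 2 broadcasts the decision.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "init"

    def prepare(self):  # phase 1: vote yes/no and hold resources if yes
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):  # phase 2: apply the coordinator's decision
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]  # phase 1
    decision = all(votes)                        # commit only on unanimous yes
    for p in participants:                       # phase 2
        p.finish(decision)
    return decision

print(two_phase_commit([Participant("a"), Participant("b")]))                  # True
print(two_phase_commit([Participant("a"), Participant("b", can_commit=False)]))  # False
```

Note that even this toy shows the trade-off: a single "no" vote (or, in reality, a silent participant) blocks everyone, which is part of why MVCC-style concurrency is not obviously over-engineering.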
kccqzy 8 hours ago [-]
One person's over engineering is another person's essential feature. I personally like the fact that Postgres supports the serializable isolation level that simplifies application programming.
> It's not really a problem for the vast majority of applications.
This is true, but I don't even want to think about when it is indeed not really a problem and in the few cases when it is a problem.
magicalhippo 8 hours ago [-]
> I personally like the fact that Postgres supports the serializable isolation level that simplifies application programming.
Not sure how PG implements it, but I tried it in a case where I did need it in SQLAnywhere, and only found out a bit too late that while the docs stated it was very detrimental to performance, the docs didn't explicitly say why, and it was much worse than I had assumed.
I assumed it meant the transaction would lock the table, do its thing, and release on commit/rollback. And of course, that would hurt performance a lot if there was high contention. But no, that's not what it did. It was much, much worse.
Instead of taking a lock on the whole table, it locked all the rows. Which went as swimmingly as you could expect on a table with thousands upon thousands of rows.
Not sure why they did it this way, but yeah had to ditch that and went with the good old retry loop.
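The "good old retry loop" sketched generically (`SerializationFailure` here is a stand-in for the driver-specific error, e.g. what a Postgres driver raises for SQLSTATE 40001 under serializable isolation):

```python
import random
import time

class SerializationFailure(Exception):
    """Stand-in for the conflict error a serializable transaction raises."""

def with_retries(txn, attempts=5, base_delay=0.01):
    """Run txn(); on a serialization failure, back off with jitter and retry."""
    for attempt in range(attempts):
        try:
            return txn()
        except SerializationFailure:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt) * random.random())

# A transaction that conflicts twice, then succeeds:
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure()
    return "committed"

print(with_retries(flaky_txn))  # committed
```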
sitharus 48 minutes ago [-]
One of the best things about postgresql is the documentation. They document not only the features, but the constraints and considerations for using it and why they exist.
Oriole has joined us at supabase now and it’s being worked on full time by Alexander and his team. Here is the patch set:
https://www.orioledb.com/docs#patch-set
It will be available to try on the supabase platform later this year too
Off topic, it was marketing all along: https://news.ycombinator.com/item?id=15124306
The Part of PostgreSQL We Hate the Most (2023)
https://news.ycombinator.com/item?id=41892830
https://news.ycombinator.com/item?id=41837317
So from reading https://www.postgresql.org/docs/17/transaction-iso.html#XACT... we can tell that using serializable transactions only locks data actually used.