CGamesPlay 22 hours ago [-]
If you are interested in using S3 as a git remote but are concerned with privacy, I built a tool a while ago to use S3 as an untrusted git remote using Restic. https://github.com/CGamesPlay/git-remote-restic
Scribbd 2 days ago [-]
This is something I was trying to implement myself. I am surprised it can be done with just an S3 bucket. I was messing with API Gateways, Lambda functions, and DynamoDB tables to support the S3 bucket. It didn't occur to me to implement it client side.
I might have stuck a bit too much to the lfs test server implementation. https://github.com/git-lfs/lfs-test-server
chx 2 days ago [-]
Client side is, while interesting, of limited use, as every CI and similar tool won't work with this. This seems like a sort of automation of wormhole, which I guess is neat: https://github.com/cxw42/git-tools/blob/master/wormhole
Actually, moto is just one bandaid for that problem - there are SO MANY S3 storage implementations, including the pre-license-switch Apache 2 version of minio (one need not use a bleeding-edge version for something as relatively stable as the S3 API)
notpushkin 2 days ago [-]
> there are SO MANY s3 storage implementations
I suppose given this is under the AWS Labs org, they don’t really care about non-AWS S3 implementations.
mdaniel 2 days ago [-]
Well, I look forward to their `docker run awslabs/the-real-s3:latest` implementation then. Until such time, monkeypatching api calls to always give the exact answer the consumer is looking for is damn cheating
notpushkin 2 days ago [-]
Agreed, haha. Well, I think it should work with Minio & co. just as well, but be prepared to have your issues closed as unsupported. (Personally, I might give it a go with Backblaze B2 just to play around, yeah)
chrsig 2 days ago [-]
it wouldn't be unprecedented. dynamodb-local exists.
remram 20 hours ago [-]
Unfortunately there have been a few vulnerabilities since that old Minio release. For something you expose to users, it's a problem.
mdaniel 20 hours ago [-]
I would hope my mentioning moto made it clear my comment was about having an S3 implementation for testing. Presumably one should not expose moto to users, either
Happy 10,000th Day to you :-D Yes, moto and its friend localstack are just fantastic for being able to play with AWS without spending money, or to reproduce kabooms that only happen once a month with the real API
I believe moto has an "embedded" version such that one need not even have it listen on a network port, but I find it much, much less mental gymnastics to just supersede the "endpoint" address in the actual AWS SDKs to point to 127.0.0.1:4566 and off to the races. The AWS SDKs are even so friendly as to not mandate TLS or have allowlists of endpoint addresses, unlike their misguided Azure colleagues
Just remember, the minimum billing increment for file size is 128KB in real AWS S3. So your Git repo may be a lot more expensive than you would think if you have a giant source tree full of small files.
justin_oaks 20 hours ago [-]
That 128KB only applies to non-standard S3 storage tiers (Glacier, Infrequent Access, One Zone, etc.)
S3 standard, which is likely what people would use for git storage, doesn't have that minimum file size charge.
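To make the correction concrete, here is a toy calculation of what a per-object minimum does to a tree of small files (the 128 KB minimum applies to tiers like Standard-IA and Glacier Instant Retrieval, not to S3 Standard; the object counts are made up):

```python
KB = 1024

def billed_size(actual_bytes, per_object_min=128 * KB):
    """Billable size of one object under a per-object minimum,
    as on S3's infrequent-access/Glacier tiers."""
    return max(actual_bytes, per_object_min)

# A repo of 10,000 loose 4 KB objects on such a tier:
actual = 10_000 * 4 * KB               # ~40 MB of real data
billed = 10_000 * billed_size(4 * KB)  # ~1.25 GB billed
```

On S3 Standard, `billed` would simply equal `actual`, which is why the tier choice matters far more than the raw repo size here.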
I’ve used this guy’s CloudFormation template since forever for LFS on S3.
GitHub has to lower its egregious LFS pricing.
kernelsanderz 14 hours ago [-]
I’ve been using https://github.com/jasonwhite/rudolfs - which is written in rust. It’s high performance but doesn’t have all the features (auth) that you might need.
milkey_mouse 1 day ago [-]
You can also do this with Cloudflare Workers for fewer setup steps/moving parts: https://github.com/milkey-mouse/git-lfs-s3-proxy
Been using it to store datasets via lfs. Written in rust and has been very reliable.
matrss 19 hours ago [-]
There is also git-annex, which supports S3 as well as a bunch of other storage backends (and it is very easy to implement your own, it just has to loosely resemble a key-value store). Git-annex can use any of its special remotes as git remotes, like what the presented tool does for just S3.
bagavi 1 day ago [-]
Dvc is a great tool!
lenova 1 day ago [-]
I haven't heard of dvc, so I had to google it, which took me to: https://dvc.org/
But I'm still confused as to what dvc is after a cursory glance at their homepage.
chatmasta 1 day ago [-]
It was on the front page contemporaneously with this comment that recommended it, so you know it was an unbiased recommendation. :)
philsnow 2 days ago [-]
I'm surprised they just punt on concurrent updates [0] instead of locking with something like dynamodb, like terraform does.
S3 recently got conditional writes and you can do locking entirely in S3 - I don't think they are using this though. Must be too recent an addition.
Yeah, it might still be possible to implement a mutex based on just the existence of an object, but it'll be harder to add expiration/liveness which I find essential.
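As a sketch of why expiry is the hard part: create-if-absent (S3's new `If-None-Match: *` conditional put) gives you acquire, but safely taking over a *stale* lock needs a second compare-and-swap step that plain object existence doesn't provide. A dict stands in for the bucket below, and every name is made up:

```python
import time

class ExpiringLock:
    """Mutex over a create-if-absent key-value store (a dict here; on
    real S3 the create would be put_object(..., IfNoneMatch='*'))."""

    def __init__(self, store, key, ttl=30.0):
        self.store, self.key, self.ttl = store, key, ttl

    def acquire(self, now=None):
        now = time.time() if now is None else now
        stamp = self.store.get(self.key)
        if stamp is not None and now - stamp < self.ttl:
            return False            # lock is held and still live
        # On real S3 this step is the hard part: replacing an expired
        # lock safely needs If-Match on the old ETag, not If-None-Match,
        # or two racing takers could both think they won.
        self.store[self.key] = now
        return True

    def release(self):
        self.store.pop(self.key, None)
```

A holder that keeps re-writing its timestamp stays alive; one that crashes stops heartbeating and its lock becomes claimable after `ttl` seconds.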
mdaniel 2 days ago [-]
I thank goodness I have access to a non-stupid Terraform state provider[1], so I've never tried that S3+DynamoDB setup. But if I understand the situation correctly, introducing Yet Another AWS Service ™ into this mix would mandate that callers also be given a `dynamo:WriteSomething` IAM perm, unlike S3, where one can -- at their discretion -- set the policies on the bucket itself such that it works without any explicit caller IAM
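For instance, a hypothetical bucket policy along those lines (the account ID, role name, and bucket name are placeholders), granting a caller S3 access with no IAM policy attached to the caller itself:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowRepoReadWrite",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/ci-runner" },
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-git-bucket",
        "arn:aws:s3:::my-git-bucket/*"
      ]
    }
  ]
}
```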
I think this is more about storing the entire repository on s3, not just large files as git-lfs and git-annex are usually concerned with. But coincidentally, git-annex somewhat recently got the feature to use any of its special remotes as normal git remotes (https://git-annex.branchable.com/git-remote-annex/), including s3, webdav, anything that rclone supports, and a few more.
mattxxx 22 hours ago [-]
This seems wrong, since you can't push transactionally + consistently in S3.
But it seems like this is just the wrong tool for the job (hosting git repos).
xena 23 hours ago [-]
How do you install this? Homebrew broke global pip install. Is there a homebrew package or something?
mdaniel 20 hours ago [-]
FWIW, their helpers make it pretty cheap to create a new Formula yourself
$ brew create --python --set-license Apache-2 https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
Formula name [git-remote-s3]:
==> Downloading https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
==> Downloading from https://codeload.github.com/awslabs/git-remote-s3/tar.gz/refs/tags/v0.1.19
##O=-# #
Warning: Cannot verify integrity of '84b0a9a6936ebc07a39f123a3e85cd23d7458c876ac5f42e9f3ffb027dcb3a0f--git-remote-s3-0.1.19.tar.gz'.
No checksum was provided.
For your reference, the checksum is:
sha256 "3faa1f9534c4ef2ec130fac2df61428d4f0a525efb88ebe074db712b8fd2063b"
==> Retrieving PyPI dependencies for "https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz"...
==> Retrieving PyPI dependencies for excluded ""...
==> Getting PyPI info for "boto3==1.35.44"
==> Getting PyPI info for "botocore==1.35.44"
==> Excluding "git-remote-s3==0.1.19"
==> Getting PyPI info for "jmespath==1.0.1"
==> Getting PyPI info for "python-dateutil==2.9.0.post0"
==> Getting PyPI info for "s3transfer==0.10.3"
==> Getting PyPI info for "six==1.16.0"
==> Getting PyPI info for "urllib3==2.2.3"
==> Updating resource blocks
Please run the following command before submitting:
HOMEBREW_NO_INSTALL_FROM_API=1 brew audit --new git-remote-s3
Editing /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/g/git-remote-s3.rb
They also support building from git directly, if you want to track non-tagged releases (see the "--head" option to create)
x3n0ph3n3 1 day ago [-]
Wow, AWS really wants to get rid of CodeCommit.
fortran77 2 days ago [-]
Amazon has deprecated AWS CodeCommit, so this may be an interesting alternative.
adobrawy 1 day ago [-]
In what use case can it be an interesting alternative?
Limited access control (e.g. CI pass required), so not very useful for end users. For machine-to-machine it's an additional layer of abstraction when a regular tarball is fine.
tonymet 2 days ago [-]
how does it handle incremental changes? If it’s writing your entire repo on a loop, I could see why AWS would promote it.
EDIT: They probably do not, I'm guessing they mean https://docs.getmoto.org/en/latest/index.html ?
I use this, and testing.postgresql, for unit testing my API servers with barely any mocks used at all.
https://testcontainers-python.readthedocs.io/en/latest/
Sorry, not sure what you mean?
See the asterisk sections in https://aws.amazon.com/s3/pricing/
[0] https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
Doesn't S3 provide primitives to do the same? At least since moving to strong read-after-write consistency?
PS: I wrote the above package. Happy to answer questions about it.
GCS also allows for conditional overwrites using `If-Match: <etag>` which means you can do optimistic concurrency control. https://cloud.google.com/storage/docs/request-preconditions
1: https://docs.gitlab.com/ee/user/infrastructure/iac/terraform...
They address this directly in their section on concurrent writes: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
And in their design: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...