Ticket #1207 (new enhancement)

Opened 8 years ago

Last modified 8 years ago

Add xz support to expatch

Reported by: irishespresso Owned by:
Priority: Sometime Milestone:
Component: repositories/e/ebuild Version: 0.70.1
Keywords: Cc:
Blocked By: Blocking:
Distribution: Exherbo


xz compression has been around for several years and is being used to compress an increasing number of mainstream tarballs, including several common GNU utilities and, most recently, the linux kernel itself. So I think it is now safe to say that xz is stable and that it is time to add xz support to expatch. That would also make expatch consistent with the rest of paludis, which has supported xz for a while. The attached patch adds xz support to expatch.
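For readers without the attachment, here is a minimal sketch of the kind of change being proposed. It is not the code from the attached patch: it assumes expatch picks a decompressor from the file suffix, and the function name patch_decompressor is hypothetical.

```shell
# Hypothetical helper, NOT the code from the attached patch.
# Prints the command that writes the decompressed patch to stdout,
# chosen from the file name suffix (assumption: suffix-based dispatch).
patch_decompressor() {
    case "$1" in
        *.gz|*.Z|*.z) echo "gzip -dc" ;;
        *.bz2)        echo "bzip2 -dc" ;;
        *.xz)         echo "xz -dc" ;;    # the new case this ticket asks for
        *)            echo "cat" ;;       # uncompressed patch
    esac
}
```

A caller could then run something like `$(patch_decompressor fix.patch.xz) fix.patch.xz | patch -p1`, so an xz-compressed patch goes through the same path as any other.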


0001-Add-xz-support-to-expatch.patch Download (876 bytes) - added by irishespresso 8 years ago.
lvv.txt Download (887 bytes) - added by irishespresso 8 years ago.
output from SCM "xz -lvv foo.xz"
lvvrobot.txt Download (266 bytes) - added by irishespresso 8 years ago.
output from SCM "xz -lvv --robot foo.xz"

Change History

Changed 8 years ago by irishespresso


comment:1 Changed 8 years ago by ciaranm

Has upstream promised that xz is stable now? We had problems with lzma changing formats and breaking backwards compatibility. I'd rather not commit us to supporting .xz long term if it might change again.

comment:2 Changed 8 years ago by irishespresso

I could not find anything as formal as a promise on their website, but the format has not changed since August 2009, over two years ago, so I would say one could make a good case that it is stable at this point. And if it were still unstable, I doubt it would be used for things like the tarballs of coreutils and the linux kernel. Could you imagine the uproar if the linux-3.1.tar.xz tarball suddenly became unreadable? In any case, it's not as if paludis does not currently support xz; paludis already handles source tarballs that are compressed with xz, and has for a while. This patch merely extends that ability to patch files. And since only a handful of patch files are compressed anyway, this is not a disruptive change that would be likely to break anything. It merely gives expatch the capability to handle xz-compressed patches if it happens to encounter them.

comment:3 Changed 8 years ago by irishespresso

After some googling, it looks like xz went officially stable when they went from xz-4.999 beta to the xz-5.0 release. Not quite a promise, but close.

comment:4 Changed 8 years ago by irishespresso

And how does adding xz support to src_prepare, where 99% of patch files are uncompressed, represent a long-term commitment when src_unpack, where tarballs compressed with xz are becoming extremely common, has supported xz for years? In my humble opinion, this patch merely makes src_prepare consistent with src_unpack rather than representing a new long-term commitment to xz.

That said, it's your call. If you have a development branch for a version of paludis with a targeted release date of a year or more from now, that may be the best place to put this, since by 2013 I think all remaining doubts about the stability of xz will have been resolved.

comment:5 Changed 8 years ago by irishespresso

I have sent a message to the xz development mailing list asking for a promise that xz is now stable. I will tell you what their response is.

comment:6 Changed 8 years ago by irishespresso

Here is upstream's response. New features will be added, but backward compatibility will be maintained and the format has been stable since it left the alpha phase. If you want to see the thread, xz-devel is archived at tukaani.org:

It is stable in sense that new tools will always be able to decompress old .xz files that have been created with a stable release of XZ Utils. It is possible and even somewhat likely that new features will be added in the future which old programs won't support.

Compare to the .zip format. It has got support for new compression methods and other features over the years, including LZMA support. When maximum portability is needed, people stick to the Deflate algorithm which all non-ancient .zip implementations support.

The old .lzma format hasn't changed since it was introduced in LZMA SDK and also used by LZMA Utils. There were development versions of the .xz format that used also the .lzma suffix, but no one has claimed that those alpha versions would be stable. If someone has thought the development versions were stable, it has been a major misunderstanding.

comment:7 Changed 8 years ago by ciaranm

Adding new features but keeping the extension the same presents us with some fairly major stability issues. Although it's not an issue for exheres-0, for exheres-1 we'll need to be able to lock down to a particular version and say that anything using exheres-1 isn't allowed to use a newer version of the xz format. Is there some way of finding out which version of the xz format a file uses?

comment:8 Changed 8 years ago by irishespresso

That is a good point. I have asked that question on the xz-devel mailing list and will tell you what their response is. My guess is that you are looking for a way to restrict xz support to version x.y.z and under, which is a sensible thing to do.

comment:9 Changed 8 years ago by irishespresso

One member of the xz-devel mailing list recommended using "xz --test foo.xz", which will return an error if xz cannot uncompress foo.xz for any reason, which I assume includes new unrecognized features in the format.
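A minimal sketch of that suggestion, wrapped as a shell function. The name xz_can_decompress is hypothetical. Note that this only answers whether the installed xz can handle the file, not whether some older named version could.

```shell
# Hypothetical wrapper around "xz --test".
# Returns 0 if the installed xz can decompress the file, non-zero otherwise
# (including when the file is missing or xz itself is not installed).
xz_can_decompress() {
    # --quiet twice suppresses error messages as well as warnings
    xz --test --quiet --quiet -- "$1" 2>/dev/null
}
```

Usage would look like `xz_can_decompress foo.xz || echo "foo.xz is not readable by this xz"`.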

comment:10 Changed 8 years ago by ciaranm

That's not really what's needed. We need a way to verify that a specific named version of xz, which might be older than the version someone has installed, will be able to uncompress a particular file.

comment:11 Changed 8 years ago by irishespresso

Unfortunately, "xz --list" does not show the version information that you are looking for. That said, the xz format is forward-compatible as long as one sticks to the defaults. From the xz maintainer:

Right now there is no way to get a version number of the format.

I could make xz -lvv show the oldest XZ Utils version that will decompress the file. It can only work for files that are supported by the xz tool, so it's not possible to make an old xz tool to display how much newer xz is required for a given file; the old tool could only tell that it doesn't support it. I don't know if this could be good enough for you.

To understand the reason for the above, it's good to understand how incompatible additions may happen:

(1) A new filter/method ID may be added into the official .xz format specification. Old tools will show that there is an unsupported filter ID and cannot decompress such files (will display an error).

(2) Third-party developers may use custom filter IDs which aren't in the official specification and aren't supported by XZ Utils. If they don't deviate from the .xz specification in any other way, this is OK. Old tools cannot distinguish this situation from (1).

(3) A new .xz format specification may add new features to the container format. The old tools will detect such files as unsupported (they won't claim them to be corrupt). With old tools, the difference to (1) and (2) is that the old tools won't be able to list even the filter IDs.

If incompatible additions are made, the xz tool won't use them by default. Maybe they might become a default after several years have passed and old xz versions aren't common anymore. But it won't be done easily, because it would make people angry if the default settings created files that many wouldn't be able to decompress without extra work.

comment:12 Changed 8 years ago by ciaranm

We don't need to know "which version of xz will be able to decompress this?". We need to know "could version x.y.z have decompressed this?", where x.y.z is less than or equal to the version of xz that is installed.

In other words, we can guarantee for exheres-1 that at least version x.y.z of xz will be installed. But for validation purposes, we'd like to be able to verify that any given xz file will decompress with version x.y.z (where x.y.z is less than or equal to the current version, since we can force a particular xz version to be installed for a given EAPI to be supported), even if the user is running with a much newer version of xz.

Otherwise, we'll get horrible breakages because most users will be running a recent xz version anyway, and so the only people who would notice such breakage if we didn't have detection code would be people trying to upgrade from an old image.

comment:13 Changed 8 years ago by irishespresso

What output should the xz developers add to the --list option to satisfy your requirement? The maintainer has proposed adding information to --list -vv such that "xz --list -vv foo.xz" will show the minimum version x.y.z of xz-utils required to decompress foo.xz. If it returns "5.0.0", that means that anyone who has a stable, post-beta version of xz-utils will be able to decompress foo.xz. If it returns "5.0.3", only the most recent version will work. I suspect this is close to, but not quite, what you are looking for.

Could you give me an example of the xz command invocation and the resulting output that you are looking for? That would help to clear up any misunderstandings. Thanks.

comment:14 Changed 8 years ago by ciaranm

I need to be able to determine whether a given file will be extractable by any version of xz that is greater than or equal to, say, 5.1.2. This can be done via 'ever at_least' if necessary, so long as xz's versions are compatible with exheres-0 versions.

comment:15 Changed 8 years ago by irishespresso

The proposed enhancement to "xz -lvv foo.xz" will add to its output the minimum version of xz required to decompress the file. Thus something like this would work:

v=$(xz -lvv foo.xz | grep -Po '(?<=version )[.0-9]*')
ever at_least "$v"

Therefore, it seems like the maintainer's proposed enhancement meets your requirement, unless I am missing something.

comment:16 Changed 8 years ago by ciaranm

Probably, although parsing textual output like that is a horrible pain and prone to all kinds of breakages when either upstream tweaks the output slightly or when a user decides for some reason that they want program output in a language other than English...

comment:17 Changed 8 years ago by irishespresso

I agree that parsing can be a pain and have asked upstream to make this information easily parsable. xz has a "robot" option aimed at making the output of "xz --list" easier to parse, so "xz -lvv --robot foo.xz | grep ... " may be the way to go.

comment:18 Changed 8 years ago by irishespresso

Here is another update from the xz maintainer. He has added the minimum required version feature in SCM and it will be part of the xz 5.1.x series of stable releases. As I suspected, he recommended that we use the --robot option for easy and consistent parsing. Unfortunately, he has not yet set a release date for xz 5.1, but at least the feature is now in the pipeline. The maintainer said:

The feature is now available in the git repository. It will be in 5.1.2alpha, but I don't know when it will be released. It won't be in 5.0.x because I won't add any new features into a stable branch.

The info is also in xz -lvv --robot output so it should be easy to parse. The idea of --robot is to make parsing simple and stable across xz versions.

I didn't update the man page yet.
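A rough sketch of how the --robot output might be parsed. The column layout is an assumption, not something stated in this thread: it takes the minimum version to be the fourth tab-separated field of the "summary" line, which should be checked against the man page of the xz release in use. The function name xz_robot_min_version is hypothetical.

```shell
# Hypothetical parser: reads "xz -lvv --robot FILE" output on stdin and
# prints the minimum-required-version field.
# ASSUMPTION: that field is column 4 of the tab-separated "summary" line.
xz_robot_min_version() {
    awk -F '\t' '$1 == "summary" { print $4 }'
}
```

It would be used as `xz -lvv --robot foo.xz | xz_robot_min_version`. Matching on the line type and a field position avoids grepping translated, human-readable text.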

Changed 8 years ago by irishespresso

output from SCM "xz -lvv foo.xz"

Changed 8 years ago by irishespresso

output from SCM "xz -lvv --robot foo.xz"

comment:19 Changed 8 years ago by irishespresso

Obviously this is alpha code and thus subject to change. That said, the newly attached files show what the "xz -lvv" output looks like in the SCM version of xz. The robot version shows the version number as "50000002"; I am not sure how to translate that into x.y.z, but I am sure that the manpage will clarify this when it is updated. I am not going to worry about that now since, as I said, this is alpha code, so the final version of the output may be slightly different anyway.

comment:20 Changed 8 years ago by irishespresso

The version number translation is xyyyzzzs, where s is 0 for alpha, 1 for beta, and 2 for stable. Therefore 50000002 splits into 5, 000, 000 and 2, which is 5.0.0 stable.
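The translation above can be sketched with plain shell arithmetic. The function name xz_decode_version is hypothetical, and the sketch assumes the xyyyzzzs layout exactly as described in this comment.

```shell
# Hypothetical decoder for the xyyyzzzs encoding described above.
# Prints "x.y.z stability" for an encoded version such as 50000002.
xz_decode_version() {
    v=$1
    major=$(( v / 10000000 ))        # x
    minor=$(( v / 10000 % 1000 ))    # yyy
    patch=$(( v / 10 % 1000 ))       # zzz
    case $(( v % 10 )) in            # s
        0) stability=alpha ;;
        1) stability=beta ;;
        2) stability=stable ;;
        *) stability=unknown ;;
    esac
    echo "$major.$minor.$patch $stability"
}
```

For example, `xz_decode_version 50000002` prints `5.0.0 stable`, and 5.1.2alpha would be encoded as 50010020.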

comment:21 Changed 8 years ago by ciaranm

Which means all we'd need to do is pick some version and hard code it every time we do an EAPI. Ok, guess we can go with this then.
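A minimal sketch of what that check might look like. Everything here is hypothetical: the function names, the idea of a per-EAPI lookup function, and the pinned value for exheres-1 are placeholders, not decisions made in this ticket. It relies on the xyyyzzzs encoding sorting correctly as a plain integer.

```shell
# Hypothetical: the xz version hard-coded for each EAPI, in xyyyzzzs form.
xz_pinned_version() {
    case "$1" in
        exheres-1) echo 50000002 ;;   # placeholder value, not a real decision
        *)         return 1 ;;        # no pinned version known for this EAPI
    esac
}

# Hypothetical check: $1 = EAPI, $2 = encoded minimum version the file needs.
# Returns 0 if the pinned xz could decompress the file, 1 if the file needs
# a newer xz, 2 if the EAPI has no pinned version.
xz_file_allowed() {
    pinned=$(xz_pinned_version "$1") || return 2
    [ "$2" -le "$pinned" ]
}
```

With these placeholders, `xz_file_allowed exheres-1 50000002` succeeds, while a file that reports 50010020 (5.1.2alpha) is rejected even if the installed xz could read it.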
